From 5b6291c214eca26df1c1b33683da351bbd9d61ee Mon Sep 17 00:00:00 2001 From: Kurt Hindenburg Date: Sun, 13 Nov 2011 16:34:50 -0500 Subject: [PATCH] Add some test files to check Konsole's unicode handling. --- tests/GLASS.utf8 | 170 ++++++++++++++++++++++++++++++++++ tests/UTF-8-demo.txt | 212 +++++++++++++++++++++++++++++++++++++++++++ tests/nfd_unicode.py | 16 ++++ tests/unicode.txt | 37 ++++++++ 4 files changed, 435 insertions(+) create mode 100644 tests/GLASS.utf8 create mode 100644 tests/UTF-8-demo.txt create mode 100644 tests/nfd_unicode.py create mode 100644 tests/unicode.txt diff --git a/tests/GLASS.utf8 b/tests/GLASS.utf8 new file mode 100644 index 00000000..610b595e --- /dev/null +++ b/tests/GLASS.utf8 @@ -0,0 +1,170 @@ +I Can Eat Glass +In various languages + +Adopted from http://www.columbia.edu/kermit/utf8.html#glass +Do not edit. Submit additions to the URL above and resynch. + +Permission is granted by the Kermit project (http://www.columbia.edu/kermit/) +to redistribute this file, with absolutely no warranty. + + + +Sanskrit: काचं शक्नोम्यत्तुम् । नोपहिनस्ति माम् ॥ +Sanskrit (standard transcription): kācaṃ śaknomyattum; nopahinasti mām. +Classical Greek: ὕαλον ϕαγεῖν δύναμαι· τοῦτο οὔ με βλάπτει. +Greek: Μπορώ να φάω σπασμένα γυαλιά χωρίς να πάθω τίποτα. +Etruscan: (NEEDED) +Latin: Vitrum edere possum; mihi non nocet. +Old French: Je puis mangier del voirre. Ne me nuit. +French: Je peux manger du verre, ça ne me fait pas de mal. +Provençal / Occitan: Pòdi manjar de veire, me nafrariá pas. +Québécois: J'peux manger d'la vitre, ça m'fa pas mal. +Walloon: Dji pou magnî do vêre, çoula m' freut nén må. +Champenois: (NEEDED) +Lorrain: (NEEDED) +Picard: Ch'peux mingi du verre, cha m'foé mie n'ma. +Corsican: (NEEDED) +Kreyòl Ayisyen: Mwen kap manje vè, li pa blese'm. +Basque: Kristala jan dezaket, ez dit minik ematen. +Catalan: Puc menjar vidre que no em fa mal. +Spanish: Puedo comer vidrio, no me hace daño. +Aragones: Puedo minchar beire, no me'n fa mal . 
+Galician: Eu podo xantar cristais e non cortarme. +Portuguese: Posso comer vidro, não me faz mal. +Brazilian Portuguese (7): Posso comer vidro, não me machuca. +Caboverdiano: M' podê cumê vidru, ca ta maguâ-m'. +Papiamentu: Ami por kome glas anto e no ta hasimi daño. +Italian: Posso mangiare il vetro e non mi fa male. +Milanese: Sôn bôn de magnà el véder, el me fa minga mal. +Roman: Me posso magna' er vetro, e nun me fa male. +Napoletano: M' pozz magna' o'vetr, e nun m' fa mal. +Sicilian: Puotsu mangiari u vitru, nun mi fa mali. +Venetian: Mi posso magnare el vetro, no'l me fa mae. +Zeneise (Genovese): Pòsso mangiâ o veddro e o no me fà mâ. +Rheto-Romance / Romansch: (NEEDED) +Romany / Tsigane: (NEEDED) +Romanian: Pot să mănânc sticlă și ea nu mă rănește. +Esperanto: Mi povas manĝi vitron, ĝi ne damaĝas min. +Pictish: (NEEDED) +Breton: (NEEDED) +Cornish: Mý a yl dybry gwéder hag éf ny wra ow ankenya. +Welsh: Dw i'n gallu bwyta gwydr, 'dyw e ddim yn gwneud dolur i mi. +Manx Gaelic: Foddym gee glonney agh cha jean eh gortaghey mee. +Old Irish (Ogham): ᚛᚛ᚉᚑᚅᚔᚉᚉᚔᚋ ᚔᚈᚔ ᚍᚂᚐᚅᚑ ᚅᚔᚋᚌᚓᚅᚐ᚜ +Old Irish (Latin): Con·iccim ithi nglano. Ním·géna. +Irish: Is féidir liom gloinne a ithe. Ní dhéanann sí dochar ar bith dom. +Scottish Gaelic: S urrainn dhomh gloinne ithe; cha ghoirtich i mi. +Anglo-Saxon (Runes): ᛁᚳ᛫ᛗᚨᚷ᛫ᚷᛚᚨᛋ᛫ᛖᚩᛏᚪᚾ᛫ᚩᚾᛞ᛫ᚻᛁᛏ᛫ᚾᛖ᛫ᚻᛖᚪᚱᛗᛁᚪᚧ᛫ᛗᛖ᛬ +Anglo-Saxon (Latin): Ic mæg glæs eotan ond hit ne hearmiað me. +Middle English: Ich canne glas eten and hit hirtiþ me nouȝt. +English: I can eat glass and it doesn't hurt me. +English (IPA): [aɪ kæn iːt glɑːs ænd ɪt dɐz nɒt hɜːt miː] (Received Pronunciation) +English (Braille): ⠊⠀⠉⠁⠝⠀⠑⠁⠞⠀⠛⠇⠁⠎⠎⠀⠁⠝⠙⠀⠊⠞⠀⠙⠕⠑⠎⠝⠞⠀⠓⠥⠗⠞⠀⠍⠑ +Lalland Scots / Doric: Ah can eat gless, it disnae hurt us. +Glaswegian: (NEEDED) +Gothic (4): 𐌼𐌰𐌲 𐌲𐌻𐌴𐍃 𐌹̈𐍄𐌰𐌽, 𐌽𐌹 𐌼𐌹𐍃 𐍅𐌿 𐌽𐌳𐌰𐌽 𐌱𐍂𐌹𐌲𐌲𐌹𐌸. +Old Norse (Runes): ᛖᚴ ᚷᛖᛏ ᛖᛏᛁ ᚧ ᚷᛚᛖᚱ ᛘᚾ ᚦᛖᛋᛋ ᚨᚧ ᚡᛖ ᚱᚧᚨ ᛋᚨᚱ +Old Norse (Latin): Ek get etið gler án þess að verða sár. +Norsk / Norwegian (Nynorsk): Eg kan eta glas utan å skada meg. 
+Norsk / Norwegian (Bokmål): Jeg kan spise glass uten å skade meg. +Føroyskt / Faroese: (NEEDED) +Íslenska / Icelandic: Ég get etið gler án þess að meiða mig. +Svenska / Swedish: Jag kan äta glas utan att skada mig. +Dansk / Danish: Jeg kan spise glas, det gør ikke ondt på mig. +Soenderjysk: Æ ka æe glass uhen at det go mæ naue. +Frysk / Frisian: Ik kin glês ite, it docht me net sear. +Nederlands / Dutch: Ik kan glas eten, het doet mij geen kwaad. +Kirchröadsj/Bôchesserplat: Iech ken glaas èèse, mer 't deet miech jing pieng. +Afrikaans: Ek kan glas eet, maar dit doen my nie skade nie. +Lëtzebuergescht / Luxemburgish: Ech kan Glas iessen, daat deet mir nët wei. +Deutsch / German: Ich kann Glas essen, ohne mir weh zu tun. +Ruhrdeutsch: Ich kann Glas verkasematuckeln, ohne dattet mich wat jucken tut. +Lausitzer Mundart ("Lusatian"): Ich koann Gloos assn und doas dudd merr ni wii. +Odenwälderisch: Iech konn glaasch voschbachteln ohne dass es mir ebbs daun doun dud. +Sächsisch / Saxon: 'sch kann Glos essn, ohne dass'sch mer wehtue. +Pfälzisch: Isch konn Glass fresse ohne dasses mer ebbes ausmache dud. +Schwäbisch / Swabian: I kå Glas frässa, ond des macht mr nix! +Bayrisch / Bavarian: I koh Glos esa, und es duard ma ned wei. +Allemannisch: I kaun Gloos essen, es tuat ma ned weh. +Schwyzerdütsch: Ich chan Glaas ässe, das tuet mir nöd weeh. +Hungarian: Meg tudom enni az üveget, nem lesz tőle bajom. +Suomi / Finnish: Voin syödä lasia, se ei vahingoita minua. +Sami (Northern): Sáhtán borrat lása, dat ii leat bávččas. +Erzian: Мон ярсан суликадо, ды зыян эйстэнзэ а ули. +Karelian: (NEEDED) +Vepsian: (NEEDED) +Votian: (NEEDED) +Livonian: (NEEDED) +Estonian: Ma võin klaasi süüa, see ei tee mulle midagi. +Latvian: Es varu ēst stiklu, tas man nekaitē. +Lithuanian: Aš galiu valgyti stiklą ir jis manęs nežeidžia +Old Prussian: (NEEDED) +Sorbian (Wendish): (NEEDED) +Czech: Mohu jíst sklo, neublíží mi. +Slovak: Môžem jesť sklo. Nezraní ma. 
+Polska / Polish: Mogę jeść szkło i mi nie szkodzi. +Slovenian: Lahko jem steklo, ne da bi mi škodovalo. +Croatian: Ja mogu jesti staklo i ne boli me. +Serbian (Latin): Mogu jesti staklo a da mi ne škodi. +Serbian (Cyrillic): Могу јести стакло а да ми не шкоди. +Macedonian: Можам да јадам стакло, а не ме штета. +Russian: Я могу есть стекло, оно мне не вредит. +Belarusian (Cyrillic): Я магу есці шкло, яно мне не шкодзіць. +Belarusian (Lacinka): Ja mahu jeści škło, jano mne ne škodzić. +Ukrainian: Я можу їсти шкло, й воно мені не пошкодить. +Bulgarian: Мога да ям стъкло, то не ми вреди. +Georgian: მინას ვჭამ და არა მტკივა. +Armenian: Կրնամ ապակի ուտել և ինծի անհանգիստ չըներ։ +Albanian: Unë mund të ha qelq dhe nuk më gjen gjë. +Turkish: Cam yiyebilirim, bana zararı dokunmaz. +Turkish (Ottoman): جام ييه بلورم بڭا ضررى طوقونمز +Bangla / Bengali: আমি কাঁচ খেতে পারি, তাতে আমার কোনো ক্ষতি হয় না। +Marathi: मी काच खाऊ शकतो, मला ते दुखत नाही. +Hindi: मैं काँच खा सकता हूँ, मुझे उस से कोई पीडा नहीं होती. +Tamil: நான் கண்ணாடி சாப்பிடுவேன், அதனால் எனக்கு ஒரு கேடும் வராது. +Urdu(2): میں کانچ کھا سکتا ہوں اور مجھے تکلیف نہیں ہوتی ۔ +Pashto(2): زه شيشه خوړلې شم، هغه ما نه خوږوي +Farsi / Persian: .من می توانم بدونِ احساس درد شيشه بخورم +Arabic(2): أنا قادر على أكل الزجاج و هذا لا يؤلمني. +Aramaic: (NEEDED) +Hebrew(2): אני יכול לאכול זכוכית וזה לא מזיק לי. +Yiddish(2): איך קען עסן גלאָז און עס טוט מיר נישט װײ. +Judeo-Arabic: (NEEDED) +Ladino: (NEEDED) +Gǝʼǝz: (NEEDED) +Amharic: (NEEDED) +Twi: Metumi awe tumpan, ɜnyɜ me hwee. +Hausa (Latin): Inā iya taunar gilāshi kuma in gamā lāfiyā. +Hausa (Ajami) (2): إِنا إِىَ تَونَر غِلَاشِ كُمَ إِن غَمَا لَافِىَا +Yoruba(3): Mo lè je̩ dígí, kò ní pa mí lára. +(Ki)Swahili: Naweza kula bilauri na sikunyui. +Malay: Saya boleh makan kaca dan ia tidak mencederakan saya. +Tagalog: Kaya kong kumain nang bubog at hindi ako masaktan. +Chamorro: Siña yo' chumocho krestat, ti ha na'lalamen yo'. +Javanese: Aku isa mangan beling tanpa lara. 
+Burmese: (NEEDED) +Vietnamese (quốc ngữ): Tôi có thể ăn thủy tinh mà không hại gì. +Vietnamese (nôm) (4): 些 𣎏 世 咹 水 晶 𦓡 空 𣎏 害 咦 +Khmer: (NEEDED) +Lao: (NEEDED) +Thai: ฉันกินกระจกได้ แต่มันไม่ทำให้ฉันเจ็บ +Mongolian (Cyrillic): Би шил идэй чадна, надад хортой биш +Mongolian (Classic) (5): ᠪᠢ ᠰᠢᠯᠢ ᠢᠳᠡᠶᠦ ᠴᠢᠳᠠᠨᠠ ᠂ ᠨᠠᠳᠤᠷ ᠬᠣᠤᠷᠠᠳᠠᠢ ᠪᠢᠰᠢ +Dzongkha: (NEEDED) +Nepali: (NEEDED) +Tibetan: ཤེལ་སྒོ་ཟ་ནས་ང་ན་གི་མ་རེད། +Chinese: 我能吞下玻璃而不伤身体。 +Chinese (Traditional): 我能吞下玻璃而不傷身體。 +Taiwanese(6): Góa ē-tàng chia̍h po-lê, mā bē tio̍h-siong. +Japanese: 私はガラスを食べられます。それは私を傷つけません。 +Korean: 나는 유리를 먹을 수 있어요. 그래도 아프지 않아요 +Bislama: Mi save kakae glas, hemi no save katem mi. +Hawaiian: Hiki iaʻu ke ʻai i ke aniani; ʻaʻole nō lā au e ʻeha. +Marquesan: E koʻana e kai i te karahi, mea ʻā, ʻaʻe hauhau. +Chinook Jargon: Naika məkmək kakshət labutay, pi weyk ukuk munk-sik nay. +Navajo: Tsésǫʼ yishą́ągo bííníshghah dóó doo shił neezgai da. +Cherokee (and Cree, Ojibwa, Inuktitut, and other Native American languages): (NEEDED) +Garifuna: (NEEDED) +Gullah: (NEEDED) +Lojban: mi kakne le nu citka le blaci .iku'i le se go'i na xrani mi +Nórdicg: Ljœr ye caudran créneþ ý jor cẃran. diff --git a/tests/UTF-8-demo.txt b/tests/UTF-8-demo.txt new file mode 100644 index 00000000..4363f27b --- /dev/null +++ b/tests/UTF-8-demo.txt @@ -0,0 +1,212 @@ + +UTF-8 encoded sample plain-text file +‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ + +Markus Kuhn [ˈmaʳkʊs kuːn] — 2002-07-25 + + +The ASCII compatible UTF-8 encoding used in this plain-text file +is defined in Unicode, ISO 10646-1, and RFC 2279. 
+ + +Using Unicode/UTF-8, you can write in emails and source code things such as + +Mathematics and sciences: + + ∮ E⋅da = Q, n → ∞, ∑ f(i) = ∏ g(i), ⎧⎡⎛┌─────┐⎞⎤⎫ + ⎪⎢⎜│a²+b³ ⎟⎥⎪ + ∀x∈ℝ: ⌈x⌉ = −⌊−x⌋, α ∧ ¬β = ¬(¬α ∨ β), ⎪⎢⎜│───── ⎟⎥⎪ + ⎪⎢⎜⎷ c₈ ⎟⎥⎪ + ℕ ⊆ ℕ₀ ⊂ ℤ ⊂ ℚ ⊂ ℝ ⊂ ℂ, ⎨⎢⎜ ⎟⎥⎬ + ⎪⎢⎜ ∞ ⎟⎥⎪ + ⊥ < a ≠ b ≡ c ≤ d ≪ ⊤ ⇒ (⟦A⟧ ⇔ ⟪B⟫), ⎪⎢⎜ ⎲ ⎟⎥⎪ + ⎪⎢⎜ ⎳aⁱ-bⁱ⎟⎥⎪ + 2H₂ + O₂ ⇌ 2H₂O, R = 4.7 kΩ, ⌀ 200 mm ⎩⎣⎝i=1 ⎠⎦⎭ + +Linguistics and dictionaries: + + ði ıntəˈnæʃənəl fəˈnɛtık əsoʊsiˈeıʃn + Y [ˈʏpsilɔn], Yen [jɛn], Yoga [ˈjoːgɑ] + +APL: + + ((V⍳V)=⍳⍴V)/V←,V ⌷←⍳→⍴∆∇⊃‾⍎⍕⌈ + +Nicer typography in plain text files: + + ╔══════════════════════════════════════════╗ + ║ ║ + ║ • ‘single’ and “double” quotes ║ + ║ ║ + ║ • Curly apostrophes: “We’ve been here” ║ + ║ ║ + ║ • Latin-1 apostrophe and accents: '´` ║ + ║ ║ + ║ • ‚deutsche‘ „Anführungszeichen“ ║ + ║ ║ + ║ • †, ‡, ‰, •, 3–4, —, −5/+5, ™, … ║ + ║ ║ + ║ • ASCII safety test: 1lI|, 0OD, 8B ║ + ║ ╭─────────╮ ║ + ║ • the euro symbol: │ 14.95 € │ ║ + ║ ╰─────────╯ ║ + ╚══════════════════════════════════════════╝ + +Combining characters: + + STARGΛ̊TE SG-1, a = v̇ = r̈, a⃑ ⊥ b⃑ + +Greek (in Polytonic): + + The Greek anthem: + + Σὲ γνωρίζω ἀπὸ τὴν κόψη + τοῦ σπαθιοῦ τὴν τρομερή, + σὲ γνωρίζω ἀπὸ τὴν ὄψη + ποὺ μὲ βία μετράει τὴ γῆ. + + ᾿Απ᾿ τὰ κόκκαλα βγαλμένη + τῶν ῾Ελλήνων τὰ ἱερά + καὶ σὰν πρῶτα ἀνδρειωμένη + χαῖρε, ὦ χαῖρε, ᾿Ελευθεριά! + + From a speech of Demosthenes in the 4th century BC: + + Οὐχὶ ταὐτὰ παρίσταταί μοι γιγνώσκειν, ὦ ἄνδρες ᾿Αθηναῖοι, + ὅταν τ᾿ εἰς τὰ πράγματα ἀποβλέψω καὶ ὅταν πρὸς τοὺς + λόγους οὓς ἀκούω· τοὺς μὲν γὰρ λόγους περὶ τοῦ + τιμωρήσασθαι Φίλιππον ὁρῶ γιγνομένους, τὰ δὲ πράγματ᾿ + εἰς τοῦτο προήκοντα, ὥσθ᾿ ὅπως μὴ πεισόμεθ᾿ αὐτοὶ + πρότερον κακῶς σκέψασθαι δέον. οὐδέν οὖν ἄλλο μοι δοκοῦσιν + οἱ τὰ τοιαῦτα λέγοντες ἢ τὴν ὑπόθεσιν, περὶ ἧς βουλεύεσθαι, + οὐχὶ τὴν οὖσαν παριστάντες ὑμῖν ἁμαρτάνειν. 
ἐγὼ δέ, ὅτι μέν + ποτ᾿ ἐξῆν τῇ πόλει καὶ τὰ αὑτῆς ἔχειν ἀσφαλῶς καὶ Φίλιππον + τιμωρήσασθαι, καὶ μάλ᾿ ἀκριβῶς οἶδα· ἐπ᾿ ἐμοῦ γάρ, οὐ πάλαι + γέγονεν ταῦτ᾿ ἀμφότερα· νῦν μέντοι πέπεισμαι τοῦθ᾿ ἱκανὸν + προλαβεῖν ἡμῖν εἶναι τὴν πρώτην, ὅπως τοὺς συμμάχους + σώσομεν. ἐὰν γὰρ τοῦτο βεβαίως ὑπάρξῃ, τότε καὶ περὶ τοῦ + τίνα τιμωρήσεταί τις καὶ ὃν τρόπον ἐξέσται σκοπεῖν· πρὶν δὲ + τὴν ἀρχὴν ὀρθῶς ὑποθέσθαι, μάταιον ἡγοῦμαι περὶ τῆς + τελευτῆς ὁντινοῦν ποιεῖσθαι λόγον. + + Δημοσθένους, Γ´ ᾿Ολυνθιακὸς + +Georgian: + + From a Unicode conference invitation: + + გთხოვთ ახლავე გაიაროთ რეგისტრაცია Unicode-ის მეათე საერთაშორისო + კონფერენციაზე დასასწრებად, რომელიც გაიმართება 10-12 მარტს, + ქ. მაინცში, გერმანიაში. კონფერენცია შეჰკრებს ერთად მსოფლიოს + ექსპერტებს ისეთ დარგებში როგორიცაა ინტერნეტი და Unicode-ი, + ინტერნაციონალიზაცია და ლოკალიზაცია, Unicode-ის გამოყენება + ოპერაციულ სისტემებსა, და გამოყენებით პროგრამებში, შრიფტებში, + ტექსტების დამუშავებასა და მრავალენოვან კომპიუტერულ სისტემებში. + +Russian: + + From a Unicode conference invitation: + + Зарегистрируйтесь сейчас на Десятую Международную Конференцию по + Unicode, которая состоится 10-12 марта 1997 года в Майнце в Германии. + Конференция соберет широкий круг экспертов по вопросам глобального + Интернета и Unicode, локализации и интернационализации, воплощению и + применению Unicode в различных операционных системах и программных + приложениях, шрифтах, верстке и многоязычных компьютерных системах. 
+ +Thai (UCS Level 2): + + Excerpt from a poetry on The Romance of The Three Kingdoms (a Chinese + classic 'San Gua'): + + [----------------------------|------------------------] + ๏ แผ่นดินฮั่นเสื่อมโทรมแสนสังเวช พระปกเกศกองบู๊กู้ขึ้นใหม่ + สิบสองกษัตริย์ก่อนหน้าแลถัดไป สององค์ไซร้โง่เขลาเบาปัญญา + ทรงนับถือขันทีเป็นที่พึ่ง บ้านเมืองจึงวิปริตเป็นนักหนา + โฮจิ๋นเรียกทัพทั่วหัวเมืองมา หมายจะฆ่ามดชั่วตัวสำคัญ + เหมือนขับไสไล่เสือจากเคหา รับหมาป่าเข้ามาเลยอาสัญ + ฝ่ายอ้องอุ้นยุแยกให้แตกกัน ใช้สาวนั้นเป็นชนวนชื่นชวนใจ + พลันลิฉุยกุยกีกลับก่อเหตุ ช่างอาเพศจริงหนาฟ้าร้องไห้ + ต้องรบราฆ่าฟันจนบรรลัย ฤๅหาใครค้ำชูกู้บรรลังก์ ฯ + + (The above is a two-column text. If combining characters are handled + correctly, the lines of the second column should be aligned with the + | character above.) + +Ethiopian: + + Proverbs in the Amharic language: + + ሰማይ አይታረስ ንጉሥ አይከሰስ። + ብላ ካለኝ እንደአባቴ በቆመጠኝ። + ጌጥ ያለቤቱ ቁምጥና ነው። + ደሀ በሕልሙ ቅቤ ባይጠጣ ንጣት በገደለው። + የአፍ ወለምታ በቅቤ አይታሽም። + አይጥ በበላ ዳዋ ተመታ። + ሲተረጉሙ ይደረግሙ። + ቀስ በቀስ፥ ዕንቁላል በእግሩ ይሄዳል። + ድር ቢያብር አንበሳ ያስር። + ሰው እንደቤቱ እንጅ እንደ ጉረቤቱ አይተዳደርም። + እግዜር የከፈተውን ጉሮሮ ሳይዘጋው አይድርም። + የጎረቤት ሌባ፥ ቢያዩት ይስቅ ባያዩት ያጠልቅ። + ሥራ ከመፍታት ልጄን ላፋታት። + ዓባይ ማደሪያ የለው፥ ግንድ ይዞ ይዞራል። + የእስላም አገሩ መካ የአሞራ አገሩ ዋርካ። + ተንጋሎ ቢተፉ ተመልሶ ባፉ። + ወዳጅህ ማር ቢሆን ጨርስህ አትላሰው። + እግርህን በፍራሽህ ልክ ዘርጋ። + +Runes: + + ᚻᛖ ᚳᚹᚫᚦ ᚦᚫᛏ ᚻᛖ ᛒᚢᛞᛖ ᚩᚾ ᚦᚫᛗ ᛚᚪᚾᛞᛖ ᚾᚩᚱᚦᚹᛖᚪᚱᛞᚢᛗ ᚹᛁᚦ ᚦᚪ ᚹᛖᛥᚫ + + (Old English, which transcribed into Latin reads 'He cwaeth that he + bude thaem lande northweardum with tha Westsae.' 
and means 'He said + that he lived in the northern land near the Western Sea.') + +Braille: + + ⡌⠁⠧⠑ ⠼⠁⠒ ⡍⠜⠇⠑⠹⠰⠎ ⡣⠕⠌ + + ⡍⠜⠇⠑⠹ ⠺⠁⠎ ⠙⠑⠁⠙⠒ ⠞⠕ ⠃⠑⠛⠔ ⠺⠊⠹⠲ ⡹⠻⠑ ⠊⠎ ⠝⠕ ⠙⠳⠃⠞ + ⠱⠁⠞⠑⠧⠻ ⠁⠃⠳⠞ ⠹⠁⠞⠲ ⡹⠑ ⠗⠑⠛⠊⠌⠻ ⠕⠋ ⠙⠊⠎ ⠃⠥⠗⠊⠁⠇ ⠺⠁⠎ + ⠎⠊⠛⠝⠫ ⠃⠹ ⠹⠑ ⠊⠇⠻⠛⠹⠍⠁⠝⠂ ⠹⠑ ⠊⠇⠻⠅⠂ ⠹⠑ ⠥⠝⠙⠻⠞⠁⠅⠻⠂ + ⠁⠝⠙ ⠹⠑ ⠡⠊⠑⠋ ⠍⠳⠗⠝⠻⠲ ⡎⠊⠗⠕⠕⠛⠑ ⠎⠊⠛⠝⠫ ⠊⠞⠲ ⡁⠝⠙ + ⡎⠊⠗⠕⠕⠛⠑⠰⠎ ⠝⠁⠍⠑ ⠺⠁⠎ ⠛⠕⠕⠙ ⠥⠏⠕⠝ ⠰⡡⠁⠝⠛⠑⠂ ⠋⠕⠗ ⠁⠝⠹⠹⠔⠛ ⠙⠑ + ⠡⠕⠎⠑ ⠞⠕ ⠏⠥⠞ ⠙⠊⠎ ⠙⠁⠝⠙ ⠞⠕⠲ + + ⡕⠇⠙ ⡍⠜⠇⠑⠹ ⠺⠁⠎ ⠁⠎ ⠙⠑⠁⠙ ⠁⠎ ⠁ ⠙⠕⠕⠗⠤⠝⠁⠊⠇⠲ + + ⡍⠔⠙⠖ ⡊ ⠙⠕⠝⠰⠞ ⠍⠑⠁⠝ ⠞⠕ ⠎⠁⠹ ⠹⠁⠞ ⡊ ⠅⠝⠪⠂ ⠕⠋ ⠍⠹ + ⠪⠝ ⠅⠝⠪⠇⠫⠛⠑⠂ ⠱⠁⠞ ⠹⠻⠑ ⠊⠎ ⠏⠜⠞⠊⠊⠥⠇⠜⠇⠹ ⠙⠑⠁⠙ ⠁⠃⠳⠞ + ⠁ ⠙⠕⠕⠗⠤⠝⠁⠊⠇⠲ ⡊ ⠍⠊⠣⠞ ⠙⠁⠧⠑ ⠃⠑⠲ ⠔⠊⠇⠔⠫⠂ ⠍⠹⠎⠑⠇⠋⠂ ⠞⠕ + ⠗⠑⠛⠜⠙ ⠁ ⠊⠕⠋⠋⠔⠤⠝⠁⠊⠇ ⠁⠎ ⠹⠑ ⠙⠑⠁⠙⠑⠌ ⠏⠊⠑⠊⠑ ⠕⠋ ⠊⠗⠕⠝⠍⠕⠝⠛⠻⠹ + ⠔ ⠹⠑ ⠞⠗⠁⠙⠑⠲ ⡃⠥⠞ ⠹⠑ ⠺⠊⠎⠙⠕⠍ ⠕⠋ ⠳⠗ ⠁⠝⠊⠑⠌⠕⠗⠎ + ⠊⠎ ⠔ ⠹⠑ ⠎⠊⠍⠊⠇⠑⠆ ⠁⠝⠙ ⠍⠹ ⠥⠝⠙⠁⠇⠇⠪⠫ ⠙⠁⠝⠙⠎ + ⠩⠁⠇⠇ ⠝⠕⠞ ⠙⠊⠌⠥⠗⠃ ⠊⠞⠂ ⠕⠗ ⠹⠑ ⡊⠳⠝⠞⠗⠹⠰⠎ ⠙⠕⠝⠑ ⠋⠕⠗⠲ ⡹⠳ + ⠺⠊⠇⠇ ⠹⠻⠑⠋⠕⠗⠑ ⠏⠻⠍⠊⠞ ⠍⠑ ⠞⠕ ⠗⠑⠏⠑⠁⠞⠂ ⠑⠍⠏⠙⠁⠞⠊⠊⠁⠇⠇⠹⠂ ⠹⠁⠞ + ⡍⠜⠇⠑⠹ ⠺⠁⠎ ⠁⠎ ⠙⠑⠁⠙ ⠁⠎ ⠁ ⠙⠕⠕⠗⠤⠝⠁⠊⠇⠲ + + (The first couple of paragraphs of "A Christmas Carol" by Dickens) + +Compact font selection example text: + + ABCDEFGHIJKLMNOPQRSTUVWXYZ /0123456789 + abcdefghijklmnopqrstuvwxyz £©µÀÆÖÞßéöÿ + –—‘“”„†•…‰™œŠŸž€ ΑΒΓΔΩαβγδω АБВГДабвгд + ∀∂∈ℝ∧∪≡∞ ↑↗↨↻⇣ ┐┼╔╘░►☺♀ fi�⑀₂ἠḂӥẄɐː⍎אԱა + +Greetings in various languages: + + Hello world, Καλημέρα κόσμε, コンニチハ + +Box drawing alignment tests: █ + ▉ + ╔══╦══╗ ┌──┬──┐ ╭──┬──╮ ╭──┬──╮ ┏━━┳━━┓ ┎┒┏┑ ╷ ╻ ┏┯┓ ┌┰┐ ▊ ╱╲╱╲╳╳╳ + ║┌─╨─┐║ │╔═╧═╗│ │╒═╪═╕│ │╓─╁─╖│ ┃┌─╂─┐┃ ┗╃╄┙ ╶┼╴╺╋╸┠┼┨ ┝╋┥ ▋ ╲╱╲╱╳╳╳ + ║│╲ ╱│║ │║ ║│ ││ │ ││ │║ ┃ ║│ ┃│ ╿ │┃ ┍╅╆┓ ╵ ╹ ┗┷┛ └┸┘ ▌ ╱╲╱╲╳╳╳ + ╠╡ ╳ ╞╣ ├╢ ╟┤ ├┼─┼─┼┤ ├╫─╂─╫┤ ┣┿╾┼╼┿┫ ┕┛┖┚ ┌┄┄┐ ╎ ┏┅┅┓ ┋ ▍ ╲╱╲╱╳╳╳ + ║│╱ ╲│║ │║ ║│ ││ │ ││ │║ ┃ ║│ ┃│ ╽ │┃ ░░▒▒▓▓██ ┊ ┆ ╎ ╏ ┇ ┋ ▎ + ║└─╥─┘║ │╚═╤═╝│ │╘═╪═╛│ │╙─╀─╜│ ┃└─╂─┘┃ ░░▒▒▓▓██ ┊ ┆ ╎ ╏ ┇ ┋ ▏ + ╚══╩══╝ └──┴──┘ ╰──┴──╯ ╰──┴──╯ ┗━━┻━━┛ ▗▄▖▛▀▜ └╌╌┘ ╎ ┗╍╍┛ ┋ ▁▂▃▄▅▆▇█ + ▝▀▘▙▄▟ diff --git a/tests/nfd_unicode.py b/tests/nfd_unicode.py new file mode 100644 index 00000000..9b27c502 --- /dev/null +++ b/tests/nfd_unicode.py @@ -0,0 +1,16 @@ +#! 
/usr/bin/python +import unicodedata + +# https://bugs.kde.org/show_bug.cgi?id=96536 + +print "The same word should be displayed 4 times." +print +u = u'Ha\u0308mikon' +u1 = unicodedata.normalize('NFC', u) +u2 = unicodedata.normalize('NFD', u) +u3 = unicodedata.normalize('NFKD', u) +u4 = unicodedata.normalize('NFKC', u) +print u1, u2, u3, u4 + + + diff --git a/tests/unicode.txt b/tests/unicode.txt new file mode 100644 index 00000000..404e6801 --- /dev/null +++ b/tests/unicode.txt @@ -0,0 +1,37 @@ +#!/usr/bin/python + +# A list of some trouble-some chars + +# Some taken from https://bugs.kde.org/show_bug.cgi?id=210329 +print(u'\u0307') +print(u'\u2500') +print(u'\u2501') +print(u'\u2502') +print(u'\u2503') +print(u'\u2504') +print(u'\u2505') +print(u'\u2506') +print(u'\u2507') +print(u'\u2508') +print(u'\u2509') +print(u'\u250A') +print(u'\u250B') +print(u'\u250C') +print(u'\u250D') +print(u'\u250E') +print(u'\u250F') + +print(u'\u254C') +print(u'\u254D') +print(u'\u254E') +print(u'\u254F') + +print(u'\u256D') +print(u'\u256E') +print(u'\u256F') + +print(u'\u2570') +print(u'\u2571') +print(u'\u2572') +print(u'\u2573') +
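Note on tests/nfd_unicode.py: the script prints the word "Ha\u0308mikon" in the four Unicode normalization forms and relies on the reader to confirm visually that all four render the same. A minimal sketch, assuming Python 3 and using only the standard unicodedata module, of how the underlying code points could be listed next to each form so a rendering difference in the terminal can be told apart from a difference in the data. The helper names normalized_forms and describe are hypothetical and not part of this patch.

```python
import unicodedata

# Same test word as tests/nfd_unicode.py: "a" followed by U+0308 COMBINING DIAERESIS.
WORD = "Ha\u0308mikon"


def normalized_forms(text):
    """Map each normalization form name to the normalized string (hypothetical helper)."""
    return {
        form: unicodedata.normalize(form, text)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }


def describe(text):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    # The composed forms (NFC, NFKC) use the single code point U+00E4;
    # the decomposed forms (NFD, NFKD) keep "a" plus the combining mark.
    for form, value in normalized_forms(WORD).items():
        print(f"{form:4} len={len(value)} {describe(value)} {value}")
```

If the terminal handles combining characters correctly, the last column looks identical on all four lines even though the code point lists differ.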