#1
Bamboozled!
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across
10,753 Posts
There is something not quite right about the forum software's treatment of many Unicode characters. All code points up to U+007F are handled correctly. Given that this is the range of traditional US-ASCII, that is what should be expected.
The pound sterling, £, comes through unscathed. This is U+00A3 and I deduce that everything up to U+00FF is fine. Likewise, the Euro, €, is handled correctly. This one is U+20AC.
https://mersenneforum.org/showpost.p...2&postcount=34 contains a few characters with code points just above U+12000. One of them is the Sumerian character GAL, which has code point U+120F2. If I type a GAL ( 𒃲 ) into this composition window it now displays perfectly. It also displays perfectly in a Preview window. In the post referred to above, it is replaced by six U+FFFD characters. These display as a white question mark within a black diamond on my display.
Subsequent posts will attempt a binary search on the range U+2000 through U+12000 in an attempt to discover what the forum software handles correctly and where it breaks. My guess is that the breaking point lies at a binary boundary such as U+4000, U+8000 or U+10000.
Note that the breakage may lie in any of the software between accepting a post and serving it up to a reader. Given the experience with Preview, my guess is that the underlying database is not UTF-8 clean.
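For what it's worth, the planned binary search could be sketched roughly as follows. This is only an illustration: `survives` is a hypothetical callback standing in for the manual step of posting a character and checking whether it comes back intact, and the simulated pipeline that breaks at U+10000 is purely an assumption taken from the guess above, not a finding. The sketch also assumes that every code point below the breaking point survives and every one at or above it fails.

```python
def first_broken(lo, hi, survives):
    """Return the lowest code point in (lo, hi] that does not survive.

    Assumes `lo` is known to survive, `hi` is known to fail, and that
    survival is monotone (good below some threshold, bad from it upward).
    """
    # Surrogates (U+D800..U+DFFF) are not characters, so leave them out.
    candidates = [cp for cp in range(lo, hi + 1) if not 0xD800 <= cp <= 0xDFFF]
    good, bad = 0, len(candidates) - 1  # indices of known-good / known-bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if survives(candidates[mid]):
            good = mid
        else:
            bad = mid
    return candidates[bad]


def simulated_survives(cp):
    # Hypothetical stand-in for "post chr(cp) and read it back".
    # Pretends the pipeline breaks at U+10000; that is an assumption.
    return cp < 0x10000


if __name__ == "__main__":
    cp = first_broken(0x2000, 0x12000, simulated_survives)
    print(f"first broken: U+{cp:04X}")
```

Done by hand, each call to `survives` corresponds to one test post, so the range U+2000 through U+12000 needs only a handful of posts rather than thousands.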

#2
Bamboozled!
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across
10,753 Posts
Quote:
The GAL now displays perfectly in the quoted post (and this one), yet the characters in the referenced post are still incorrect. Very tempted to go back to that post and edit it.
Last fiddled with by xilman on 2020-03-18 at 18:45 Reason: Added "(and this one)"

#3
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
9,787 Posts
Quote:

#4
Bamboozled!
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across
10,753 Posts
Thanks. I confirm it still displays correctly in your reply.
However, the GAL now shows as six U+FFFD characters in post #2 yet remains correct everywhere else (for the time being). Curiouser and curiouser.
Could you report whether you see the same as me in the amantes linguam Latinam post, please? Or, for that matter, whether the GAL turns into six U+FFFD characters in any of the above.
Last fiddled with by xilman on 2020-03-18 at 20:29

#5
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
9,787 Posts
I see rhomboids with ? inside each, in that thread and in your quote. Is it your entry method? Alt-#### vs cut-and-paste?

#6
Bamboozled!
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across
10,753 Posts
Quote:
The characters you see are the common representation of U+FFFD.
Ah, just found Ctrl-Shift-u-xxxx. Giving it a try ... Yup, works a treat. Let's see whether the GAL above survives a few round trips.
Last fiddled with by xilman on 2020-03-18 at 21:09 Reason: Found Ctrl-Shift-u-0-1-2-0-f-2

#7
Bamboozled!
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across
10101000000001₂ Posts
Nope. It didn't even survive its first exposure. Not that I expected it to be treated any differently.
This is really irritating. The specific case of Sumerian cuneiform is relatively unimportant, but it is a symptom of an underlying cause which may be more serious. Something in the pipeline is quite clearly not fully UTF-8 compatible. (Incidentally, we saw a number of such issues in FlyBase when it migrated circa 2008. It was a real bugger tracking down and fixing them.)
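As a minimal illustration of why a character like GAL might be treated differently from £ or €: it lies outside the Basic Multilingual Plane, so it needs four bytes in UTF-8 and a surrogate pair in UTF-16, whereas the other two need fewer. The helper name `widths` is made up for this sketch, and nothing here identifies which stage of the forum pipeline is actually at fault.

```python
def widths(ch):
    """Return (UTF-8 byte count, UTF-16 code unit count) for one character."""
    utf8_len = len(ch.encode("utf-8"))
    utf16_units = len(ch.encode("utf-16-be")) // 2  # two bytes per code unit
    return utf8_len, utf16_units


SAMPLES = [
    ("POUND SIGN", "\u00a3"),
    ("EURO SIGN", "\u20ac"),
    ("CUNEIFORM SIGN GAL", "\U000120F2"),
]

if __name__ == "__main__":
    for name, ch in SAMPLES:
        utf8_len, utf16_units = widths(ch)
        raw = ch.encode("utf-8").hex()
        print(f"U+{ord(ch):04X} {name}: UTF-8 {raw} "
              f"({utf8_len} bytes), UTF-16 units: {utf16_units}")
```

Any component that only copes with up to three bytes per character in UTF-8, or with a single 16-bit unit per character, would pass the first two samples and fail the third.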

#8
Romulan Interpreter
Jun 2011
Thailand
2²·3³·89 Posts
Rhomboids with question marks here too.
Is that (maybe) a local setting I can change? (Like fonts to download, or changing the forum default font?)

#9
Bamboozled!
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across
10,753 Posts
Quote:
To test this possibility I saved the downloaded bytes and examined them with od(1). The glyphs you describe are indeed U+FFFD characters and not the original cuneiform; your browser is simply displaying what it was sent.
Precisely two posts in this thread still contain the correct characters: my original and uncwilly's first response. All the rest have been corrupted in some manner.
No idea what's going on; I suspect that only someone with superuser access to the forum software and database will be able to take corrective action. The only workaround I can suggest is hinted at by my signature:
Last fiddled with by xilman on 2020-03-22 at 17:01
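For anyone who wants to repeat the byte-level check without od(1), a rough Python equivalent might look like the sketch below. It simply counts how often the UTF-8 encoding of U+FFFD and of GAL (U+120F2) occurs in a blob of bytes. The function name `classify` and the two sample byte strings are invented for illustration; they are not the actual bytes served by the forum.

```python
REPLACEMENT = "\ufffd".encode("utf-8")  # b'\xef\xbf\xbd'
GAL = "\U000120F2".encode("utf-8")      # b'\xf0\x92\x83\xb2'


def classify(page: bytes) -> dict:
    """Count replacement characters and intact GAL signs in raw page bytes."""
    return {
        "replacement_chars": page.count(REPLACEMENT),
        "gal_signs": page.count(GAL),
    }


# Hypothetical samples: one intact GAL, and one replaced by six U+FFFD.
intact = "GAL: \U000120F2".encode("utf-8")
corrupted = ("GAL: " + "\ufffd" * 6).encode("utf-8")

if __name__ == "__main__":
    print(classify(intact))
    print(classify(corrupted))
```

If the saved page bytes already contain EF BF BD where the cuneiform should be, the damage happened on the server side and no font or browser setting will bring the original characters back.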

#10
"Mike"
Aug 2002
2020₁₆ Posts
𒀀 𒀁 𒀂 𒀃 𒀄 𒀅 𒀆 𒀇 𒀈 𒀉 𒀊 𒀋 𒀌 𒀍 𒀎 𒀏 𒀐 𒀑 𒀒 𒀓 𒀔 𒀕 𒀖 𒀗 𒀘 𒀙 𒀚 𒀛 𒀜 𒀝 𒀞 𒀟 𒀠 𒀡 𒀢 𒀣 𒀤 𒀥 𒀦 𒀧 𒀨 𒀩 𒀪 𒀫 𒀬 𒀭 𒀮 𒀯 𒀰 𒀱 𒀲 𒀳 𒀴 𒀵 𒀶 𒀷 𒀸 𒀹 𒀺 𒀻 𒀼 𒀽 𒀾 𒀿 𒁀 𒁁 𒁂 𒁃 𒁄 𒁅 𒁆 𒁇 𒁈 𒁉 𒁊 𒁋 𒁌 𒁍 𒁎 𒁏 𒁐 𒁑 𒁒 𒁓 𒁔 𒁕 𒁖 𒁗 𒁘 𒁙 𒁚 𒁛 𒁜 𒁝 𒁞 𒁟 𒁠 𒁡 𒁢 𒁣 𒁤 𒁥 𒁦 𒁧 𒁨 𒁩 𒁪 𒁫 𒁬 𒁭 𒁮 𒁯 𒁰 𒁱 𒁲 𒁳 𒁴 𒁵 𒁶 𒁷 𒁸 𒁹 𒁺 𒁻 𒁼 𒁽 𒁾 𒁿 𒂀 𒂁 𒂂 𒂃 𒂄 𒂅 𒂆 𒂇 𒂈 𒂉 𒂊 𒂋 𒂌 𒂍 𒂎 𒂏 𒂐 𒂑 𒂒 𒂓 𒂔 𒂕 𒂖 𒂗 𒂘 𒂙 𒂚 𒂛 𒂜 𒂝 𒂞 𒂟 𒂠 𒂡 𒂢 𒂣 𒂤 𒂥 𒂦 𒂧 𒂨 𒂩 𒂪 𒂫 𒂬 𒂭 𒂮 𒂯 𒂰 𒂱 𒂲 𒂳 𒂴 𒂵 𒂶 𒂷 𒂸 𒂹 𒂺 𒂻 𒂼 𒂽 𒂾 𒂿 𒃀 𒃁 𒃂 𒃃 𒃄 𒃅 𒃆 𒃇 𒃈 𒃉 𒃊 𒃋 𒃌 𒃍 𒃎 𒃏 𒃐 𒃑 𒃒 𒃓 𒃔 𒃕 𒃖 𒃗 𒃘 𒃙 𒃚 𒃛 𒃜 𒃝 𒃞 𒃟 𒃠 𒃡 𒃢 𒃣 𒃤 𒃥 𒃦 𒃧 𒃨 𒃩 𒃪 𒃫 𒃬 𒃭 𒃮 𒃯 𒃰 𒃱 𒃲 𒃳 𒃴 𒃵 𒃶 𒃷 𒃸 𒃹 𒃺 𒃻 𒃼 𒃽 𒃾 𒃿 𒄀 𒄁 𒄂 𒄃 𒄄 𒄅 𒄆 𒄇 𒄈 𒄉 𒄊 𒄋 𒄌 𒄍 𒄎 𒄏 𒄐 𒄑 𒄒 𒄓 𒄔 𒄕 𒄖 𒄗 𒄘 𒄙 𒄚 𒄛 𒄜 𒄝 𒄞 𒄟 𒄠 𒄡 𒄢 𒄣 𒄤 𒄥 𒄦 𒄧 𒄨 𒄩 𒄪 𒄫 𒄬

#11
"Mike"
Aug 2002
2⁵·257 Posts
Any previously entered text has probably been mangled by the forum software.
New text should be treated properly.