mersenneforum.org Accents and diacritical marks on PrimeNet lists

 2014-09-06, 16:18 #1 Rodrigo     Jun 2010 Pennsylvania 1637₈ Posts Accents and diacritical marks on PrimeNet lists I just noticed that Top Producers listings that include characters with accent marks are not showing up properly, whereas they did until very recently. Has something changed to affect how accented characters are displayed? This is happening on every PC I've checked, so I know it's not a settings change at the user end. To give a couple of examples: #485 in the TF Top Producers, André Jordi, is showing up as "Andr� Jordi", with a small box where the "é" should be. And #713, Jean-François Nies, is rendered as "Jean-Fran�ois Nies." You get the idea. What happened to change this, and can it be fixed? R�drig� (just kidding there)
 2014-09-06, 16:52 #2 Mark Rose     "/X\(‘-‘)/X\" Jan 2013 23·359 Posts It looks like the pages are all setting their character set to UTF-8, but I bet nobody converted the existing names in the database, which are probably ISO-8859-1 encoded.
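Editor's illustration of the mismatch described above: a minimal Python sketch, assuming (as the post guesses, not as confirmed fact) that the stored names are single-byte ISO-8859-1 while the page is declared as UTF-8. The variable names are illustrative only.

```python
# Assumption: the name is stored as ISO-8859-1 (Latin-1) bytes.
name = "André Jordi"
stored_bytes = name.encode("iso-8859-1")  # the "é" becomes the single byte 0xE9

# A client told the data is UTF-8 cannot decode the lone 0xE9 byte,
# so it substitutes U+FFFD, the replacement character.
as_seen_in_browser = stored_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding the bytes were written in recovers the name.
as_intended = stored_bytes.decode("iso-8859-1")

print(as_seen_in_browser)
print(as_intended)
```

The same thing happens to the "ç" in "Jean-François": one invalid byte, one replacement character.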
 2014-09-07, 06:01 #3 kladner     "Kieren" Jul 2011 In My Own Galaxy! 3×3,329 Posts I am seeing the '�' character under my normal Unicode encoding setting in Firefox. Changing the encoding to Western shows that space as containing- (see attachment). It displays yet other sets of characters under other settings like Central European Windows or ISO. So this is just a bit of cleanup on the migration, as I understand your answer, Mark. Last fiddled with by kladner on 2014-09-07 at 06:03 Reason: forgot attachment
2014-09-07, 16:26   #4
Xyzzy

"Mike"
Aug 2002

7,703 Posts

Quote:
 I am seeing the '�' character under my normal Unicode encoding setting in Firefox. Changing the encoding to Western shows that space as containing- (see attachment).
A sample from Chrome set to UTF-8.

 2014-09-07, 19:11 #5 Rodrigo     Jun 2010 Pennsylvania 927₁₀ Posts FWIW, when I manually typed in the special characters to show what they were supposed to look like on the producers list, I entered the é as ALT-130, while the ç was ALT-135 (in Windows). Rodrigo
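Editor's illustration of the Alt codes mentioned above: a small Python sketch, assuming the three-digit Alt codes are interpreted in the OEM code page 437 (the usual default on US-English Windows; other locales may use a different OEM page). It shows that 130 and 135 are not the Latin-1 byte values for these letters.

```python
# Assumption: Alt+130 / Alt+135 are looked up in OEM code page 437.
alt_codes = [130, 135]
from_oem = {code: bytes([code]).decode("cp437") for code in alt_codes}

# The same byte values mean something else in Windows-1252 ("ANSI").
from_ansi = {code: bytes([code]).decode("cp1252") for code in alt_codes}

# In ISO-8859-1 the letters sit at different byte values entirely.
latin1_values = {ch: ch.encode("iso-8859-1")[0] for ch in "éç"}

print(from_oem)
print(from_ansi)
print(latin1_values)
```

So the number typed on the keypad says nothing about how the character ends up stored; that depends on the encoding used further down the line.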
 2014-09-08, 02:05 #6 Xyzzy     "Mike" Aug 2002 7,703 Posts It is interesting to see how the special characters appear when "viewed" by od:
Code:
$ echo "To give a couple of examples: #485 in the TF Top Producers, André Jordi, is showing up as "Andr� Jordi", with a small box where the "é" should be. And #713, Jean-François Nies, is rendered as "Jean-Fran�ois Nies." You get the idea." | od -c
0000000   T   o       g   i   v   e       a       c   o   u   p   l   e
0000020       o   f       e   x   a   m   p   l   e   s   :       #   4
0000040   8   5       i   n       t   h   e       T   F       T   o   p
0000060       P   r   o   d   u   c   e   r   s   ,       A   n   d   r
0000100 303 251       J   o   r   d   i   ,       i   s       s   h   o
0000120   w   i   n   g       u   p       a   s       A   n   d   r 357
0000140 277 275       J   o   r   d   i   ,       w   i   t   h       a
0000160       s   m   a   l   l       b   o   x       w   h   e   r   e
0000200       t   h   e     303 251       s   h   o   u   l   d       b
0000220   e   .       A   n   d       #   7   1   3   ,       J   e   a
0000240   n   -   F   r   a   n 303 247   o   i   s       N   i   e   s
0000260   ,       i   s       r   e   n   d   e   r   e   d       a   s
0000300       J   e   a   n   -   F   r   a   n 357 277 275   o   i   s
0000320       N   i   e   s   .       Y   o   u       g   e   t       t
0000340   h   e       i   d   e   a   .  \n
0000351
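Editor's illustration of how to read the octal values in that od dump: a short Python sketch that prints the UTF-8 bytes of a string in octal. The helper name `octal_bytes` is made up for this example.

```python
def octal_bytes(text):
    """Return the UTF-8 bytes of text as three-digit octal strings, as od -c shows them."""
    return [format(byte, "03o") for byte in text.encode("utf-8")]

print(octal_bytes("é"))       # the intact accented letter
print(octal_bytes("ç"))
print(octal_bytes("\ufffd"))  # U+FFFD, the replacement character
```

The two-byte sequences 303 251 and 303 247 are the real "é" and "ç" in UTF-8. The three-byte sequence 357 277 275 is U+FFFD itself, which means the text that was pasted already contained the replacement character and the original byte was gone by that point.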
 2014-09-08, 22:48 #7 Rodrigo     Jun 2010 Pennsylvania 3²·103 Posts I guess that the questions now are, is it possible to fix this, and what would the fix involve? Maybe somebody has a way to convert the characters in some automated fashion. Or maybe somebody could volunteer to take guesses at what the special characters are supposed to be, and then feed them to the right person. I'd have to verify, but I'm confident I could get most of them right (though not nearly all without research, about 13/21 from the Top TF list). Curiously, not all special characters got messed up this way. Check out #1951 on that list:
Code:
Ś�ṇȩł
Heh - a member name made up entirely of letters with diacritical marks. Maybe figuring out why some characters made it while others didn't will yield useful clues. Rodrigo Last fiddled with by Rodrigo on 2014-09-08 at 22:48 Reason: typo
 2014-09-10, 05:46 #8 Rodrigo     Jun 2010 Pennsylvania 3²×103 Posts I see now that the special characters have been fixed -- fabulous! Rodrigo
2014-09-22, 16:46   #9
Serpentine Vermin Jar

Jul 2014

6315₈ Posts

Quote:
 Originally Posted by Rodrigo I see now that the special characters have been fixed -- fabulous! Rodrigo
The hourly reports are a dump from SQL, and the resulting file was not UTF-8. The fix was to do a post-dump conversion from ANSI to UTF-8, and that did the trick. We also made sure the encoding from the web server is UTF-8. There are also a few ad-hoc reports that pull data from SQL and may or may not contain upper-ASCII characters, so those are slowly being fixed to make sure UTF-8 is used to output anything where that could happen.
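Editor's illustration of a post-dump conversion of this kind: a minimal Python sketch, not the actual PrimeNet fix. It assumes "ANSI" means Windows-1252, and the function names and file paths are hypothetical.

```python
def ansi_to_utf8(raw, source_encoding="cp1252"):
    """Re-encode raw bytes from the assumed ANSI code page to UTF-8."""
    # Decoding fails loudly on bytes that are undefined in the source
    # code page, which is safer than silently writing garbage.
    return raw.decode(source_encoding).encode("utf-8")


def convert_dump(src_path, dst_path, source_encoding="cp1252"):
    """Read a dumped report file and write a UTF-8 copy of it."""
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(ansi_to_utf8(raw, source_encoding))


print(ansi_to_utf8(b"Jean-Fran\xe7ois Nies"))
```

Running the conversion twice on the same file would corrupt it, since UTF-8 bytes would be reinterpreted as Windows-1252, so a step like this belongs at exactly one point in the pipeline.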

The database itself is using varchar for some columns where ideally it should be nvarchar, but after looking at it in total, changing those column definitions and all the other things that touch it would be a larger task, so we're hoping to "fix" it for now on the PHP side of things.

As it is, while accented characters are being stored and output in ANSI, I wondered what would happen if a Japanese user tried to set their public username to something in Hiragana ... the system probably wouldn't allow it or, if it did, the mix of encoding would produce some bizarre results. So yeah, it'd be good to use some proper fields long term. For now I'm just happy that the hourly reports are showing accented western characters properly now.
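Editor's illustration of the Hiragana concern: a small Python sketch, assuming the varchar columns use a single-byte Windows-1252 code page. The database's own behaviour on such input may differ (rejecting it or substituting characters); this only shows that the code page has no room for these characters.

```python
# Assumption: the column's code page is Windows-1252.
name = "ひらがな"

try:
    name.encode("cp1252")
    storable = True
except UnicodeEncodeError:
    # None of these characters exist in a Western single-byte code page.
    storable = False

# A lossy conversion replaces every unrepresentable character with "?".
lossy = name.encode("cp1252", errors="replace")

print(storable)
print(lossy)
```

Accented Western letters such as "é" and "ç" do fit in that code page, which is why they could be repaired at output time while a Unicode column type would be needed for scripts like this one.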


All times are UTC.
