mersenneforum.org  

Great Internet Mersenne Prime Search > PrimeNet

2014-09-06, 16:18   #1
Rodrigo

Jun 2010
Pennsylvania

1637₈ Posts
Accents and diacritical marks on PrimeNet lists

I just noticed that Top Producers listings that include characters with accent marks are not showing up properly, although they displayed correctly until very recently. Has something changed in how accented characters are displayed? This is happening on every PC I've checked, so I know it's not a settings change on the user end.

To give a couple of examples: #485 in the TF Top Producers, André Jordi, is showing up as "Andr� Jordi", with a small box where the "é" should be. And #713, Jean-François Nies, is rendered as "Jean-Fran�ois Nies." You get the idea.

What happened to change this, and can it get fixed?

R�drig� (just kidding there)
2014-09-06, 16:52   #2
Mark Rose

"/X\(‘-‘)/X\"
Jan 2013

2³·359 Posts

It looks like the pages are all setting their character set to UTF-8, but I bet nobody converted the existing names in the database, which are probably ISO-8859-1 encoded.
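A minimal Python sketch of that mismatch, assuming the names are stored as single ISO-8859-1 bytes and served under a UTF-8 charset. In ISO-8859-1 the é is the lone byte 0xE9, which is not a valid UTF-8 sequence on its own, so a UTF-8 decoder substitutes the replacement character U+FFFD (the '�' box). The helper name `decode_as_utf8` is hypothetical.

```python
def decode_as_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, replacing invalid sequences with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


# Assumption: the database holds the name as ISO-8859-1 (one byte per letter).
stored = "André Jordi".encode("iso-8859-1")
print(stored)                        # b'Andr\xe9 Jordi'

# A page that claims UTF-8 shows the lone 0xE9 byte as U+FFFD.
print(decode_as_utf8(stored))

# Decoding with the encoding the data was actually written in restores it.
print(stored.decode("iso-8859-1"))   # André Jordi
```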
2014-09-07, 06:01   #3
kladner

"Kieren"
Jul 2011
In My Own Galaxy!

3×3,329 Posts

I am seeing the '�' character under my normal Unicode encoding setting in Firefox. Changing the encoding to Western shows different characters in that position (see attachment). It displays yet other sets of characters under other settings, such as Central European (Windows) or Central European (ISO).

So, as I understand your answer, Mark, this just needs a bit of cleanup after the migration.
Attached image: western_encode.JPG

Last fiddled with by kladner on 2014-09-07 at 06:03 Reason: forgot attachment
2014-09-07, 16:26   #4
Xyzzy

"Mike"
Aug 2002

7,703 Posts

Quote:
I am seeing the '�' character under my normal Unicode encoding setting in Firefox. Changing the encoding to Western shows that space as containing- (see attachment).
A sample from Chrome set to UTF-8.
Attached image: special-characters.png
2014-09-07, 19:11   #5
Rodrigo

Jun 2010
Pennsylvania

927₁₀ Posts

FWIW, when I manually typed in the special characters to show what they were supposed to look like on the producers list, I entered the é as ALT-130 and the ç as ALT-135 (in Windows).

Rodrigo
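Those ALT codes line up with the old DOS code page rather than with ISO-8859-1. A small Python check, assuming a US-English Windows setup where ALT plus a number without a leading zero uses the OEM code page 437 (typing ALT-0233, with a leading zero, would use the Windows ANSI code page instead). `alt_code_char` is a hypothetical helper.

```python
def alt_code_char(code: int, oem_codepage: str = "cp437") -> str:
    """Return the character an ALT+numpad code (no leading zero) maps to."""
    return bytes([code]).decode(oem_codepage)


for code in (130, 135):
    ch = alt_code_char(code)
    # The same letter has a different number in Unicode / ISO-8859-1.
    print(f"ALT-{code} -> {ch}  (U+{ord(ch):04X}, ISO-8859-1 byte 0x{ord(ch):02X})")
```

So the é typed as ALT-130 is stored on the server side as 0xE9 in ISO-8859-1, or as the two bytes 0xC3 0xA9 in UTF-8.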
2014-09-08, 02:05   #6
Xyzzy

"Mike"
Aug 2002

7,703 Posts

It is interesting to see how the special characters appear when "viewed" by od:
Code:
$ echo "To give a couple of examples: #485 in the TF Top Producers, André Jordi, is showing up as "Andr� Jordi", with a small box where the "é" should be. And #713, Jean-François Nies, is rendered as "Jean-Fran�ois Nies." You get the idea." | od -c
0000000   T   o       g   i   v   e       a       c   o   u   p   l   e
0000020       o   f       e   x   a   m   p   l   e   s   :       #   4
0000040   8   5       i   n       t   h   e       T   F       T   o   p
0000060       P   r   o   d   u   c   e   r   s   ,       A   n   d   r
0000100 303 251       J   o   r   d   i   ,       i   s       s   h   o
0000120   w   i   n   g       u   p       a   s       A   n   d   r 357
0000140 277 275       J   o   r   d   i   ,       w   i   t   h       a
0000160       s   m   a   l   l       b   o   x       w   h   e   r   e
0000200       t   h   e     303 251       s   h   o   u   l   d       b
0000220   e   .       A   n   d       #   7   1   3   ,       J   e   a
0000240   n   -   F   r   a   n 303 247   o   i   s       N   i   e   s
0000260   ,       i   s       r   e   n   d   e   r   e   d       a   s
0000300       J   e   a   n   -   F   r   a   n 357 277 275   o   i   s
0000320       N   i   e   s   .       Y   o   u       g   e   t       t
0000340   h   e       i   d   e   a   .  \n
0000351
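Reading the octal escapes in that od dump as UTF-8 shows what was actually copied: 303 251 is 0xC3 0xA9 (é), 303 247 is 0xC3 0xA7 (ç), and 357 277 275 is 0xEF 0xBF 0xBD, the UTF-8 encoding of the replacement character U+FFFD. In other words, the box was pasted as a real character, so the original letters are gone from that copied text. A short Python sketch; `od_octal_to_text` is a hypothetical helper.

```python
def od_octal_to_text(octal_bytes: str) -> str:
    """Turn a run of `od -c` octal escapes, e.g. '303 251', back into text."""
    raw = bytes(int(token, 8) for token in octal_bytes.split())
    return raw.decode("utf-8")


for seq in ("303 251", "303 247", "357 277 275"):
    ch = od_octal_to_text(seq)
    print(f"{seq:12} -> U+{ord(ch):04X}")
```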
2014-09-08, 22:48   #7
Rodrigo

Jun 2010
Pennsylvania

3²·103 Posts

I guess the questions now are: is it possible to fix this, and what would the fix involve?

Maybe somebody has a way to convert the characters in some automated fashion. Or maybe somebody could volunteer to guess what the special characters are supposed to be and then pass them on to the right person. I'd have to verify, but I'm confident I could get most of them right, though without research not nearly all: about 13 of the 21 on the Top TF list.
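Whether an automated conversion can work depends on where the damage happened. If the database still holds the original single-byte values, re-decoding them with the right code page is lossless; once a letter has been turned into U+FFFD, different letters become indistinguishable and only guessing can restore them. A Python sketch, assuming the stored bytes are Windows-1252; the helper names `repair` and `damage` are hypothetical.

```python
def repair(raw: bytes, source_encoding: str = "cp1252") -> str:
    """Re-decode raw single-byte data with the (assumed) original code page."""
    return raw.decode(source_encoding)


def damage(raw: bytes) -> str:
    """What a UTF-8 decoder with replacement does to the same bytes."""
    return raw.decode("utf-8", errors="replace")


for name in ("André", "Andrè", "Andrê"):
    raw = name.encode("cp1252")
    # repair() gives back each distinct name; damage() maps all three to
    # the same string ending in U+FFFD, so the difference is lost.
    print(repr(repair(raw)), repr(damage(raw)))
```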

Curiously, not all special characters got messed up this way. Check out #1951 on that list:

Code:
Ś�ṇȩł
Heh - a member name made up entirely of letters with diacritical marks. Maybe figuring out why some characters made it through while others didn't will yield useful clues.

Rodrigo

Last fiddled with by Rodrigo on 2014-09-08 at 22:48 Reason: typo
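One possible clue, offered as an assumption rather than anything confirmed in this thread: the letters in #1951 that survived (Ś, ṇ, ȩ, ł) are not in the Western Windows code page at all, so they could never have been stored as single ANSI bytes. If such names were saved in some other ASCII-safe form, for example as HTML numeric character references, they would be unaffected by the switch to UTF-8. A Python sketch that checks each letter against Windows-1252 (assumed to be the "ANSI" code page); `fits_code_page` is a hypothetical helper.

```python
import html


def fits_code_page(ch: str, codepage: str = "cp1252") -> bool:
    """True if `ch` can be stored as a single byte in the given code page."""
    try:
        ch.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


survivors = "\u015a\u1e47\u0229\u0142"  # the intact letters of #1951: Śṇȩł
for ch in "\u00e9\u00e7" + survivors:   # é and ç for comparison
    print(ch, f"U+{ord(ch):04X}", fits_code_page(ch))

# Hypothetical storage form: numeric character references are plain ASCII,
# so they pass through any code page unchanged and still render correctly.
as_refs = "".join(f"&#{ord(c)};" for c in survivors)
print(as_refs, "->", html.unescape(as_refs))
```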
2014-09-10, 05:46   #8
Rodrigo

Jun 2010
Pennsylvania

3²×103 Posts

I see now that the special characters have been fixed -- fabulous!

Rodrigo
2014-09-22, 16:46   #9
Madpoo
Serpentine Vermin Jar

Jul 2014

6315₈ Posts

Quote:
Originally Posted by Rodrigo
I see now that the special characters have been fixed -- fabulous!

Rodrigo
The hourly reports are a dump from SQL, and the resulting file was not UTF-8. The fix was a post-dump conversion from ANSI to UTF-8, and that did the trick. We also made sure the web server's output encoding is UTF-8. A few ad-hoc reports also pull data from SQL that may or may not include some upper-ASCII characters, so those are slowly being fixed to use UTF-8 for any output where that could happen.
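A post-dump conversion like this amounts to decoding the file with the code page it was written in and re-encoding it as UTF-8. A minimal Python sketch, assuming "ANSI" means Windows-1252 (the usual Western default; change `source_encoding` otherwise). The function names and file names are hypothetical, not the actual PrimeNet scripts.

```python
import tempfile
from pathlib import Path


def ansi_bytes_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode bytes from a single-byte 'ANSI' code page to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def convert_dump_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Write a UTF-8 copy of an ANSI-encoded report file (bytes in, bytes out)."""
    dst.write_bytes(ansi_bytes_to_utf8(src.read_bytes(), source_encoding))


# Usage with a throwaway file standing in for the hourly SQL dump.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "report_ansi.txt"
    dst = Path(tmp) / "report_utf8.txt"
    src.write_bytes("485\tAndré Jordi\r\n713\tJean-François Nies\r\n".encode("cp1252"))
    convert_dump_to_utf8(src, dst)
    print(dst.read_bytes().decode("utf-8"))
```

Working on raw bytes rather than text mode keeps the dump's line endings exactly as they were. Python's strict cp1252 decoder also raises an error on the five byte values Windows-1252 leaves undefined, which is a cheap sanity check that the assumed code page is plausible.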

The database itself uses varchar for some columns where ideally it should use nvarchar, but after looking at the whole picture, changing those column definitions and everything else that touches them would be a larger task, so for now we're hoping to "fix" it on the PHP side of things.

As it is, accented characters are stored and output in ANSI, which made me wonder what would happen if a Japanese user tried to set their public username to something in Hiragana. The system probably wouldn't allow it, or, if it did, the mix of encodings would produce some bizarre results. So yeah, it'd be good to use proper Unicode (nvarchar) fields long term. For now, I'm just happy that the hourly reports are showing accented Western characters properly.
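To illustrate the Hiragana question: a Western single-byte code page simply has no slots for those characters, so they cannot be stored without loss, while UTF-8 (or the UTF-16 storage behind an nvarchar column) handles them fine. A Python sketch, assuming Windows-1252 as the "ANSI" code page; whether the real system would reject such a name or substitute '?' (a common fallback when converting to a code page) is not known from this thread. `encode_lossy` is a hypothetical helper.

```python
def encode_lossy(text: str, codepage: str = "cp1252") -> bytes:
    """Encode to a code page, replacing unsupported characters with '?'."""
    return text.encode(codepage, errors="replace")


name = "\u3072\u3089\u304c\u306a"  # ひらがな, "hiragana" written in Hiragana

try:
    name.encode("cp1252")
except UnicodeEncodeError as exc:
    print("cp1252 cannot store it:", exc.reason)

print(encode_lossy(name))             # b'????'
print(len(name.encode("utf-8")))      # 12 bytes: 3 per Hiragana character
print(len(name.encode("utf-16-le")))  # 8 bytes: 2 per character
```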

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.