mersenneforum.org Language localization for GIMPS software.

2021-08-07, 12:45   #23
kar_bon

Mar 2006
Germany

2×5×293 Posts

In my Wiki I have the option to include all languages, each on its own page.

For example, I've just created a page for the readme.txt file, with the text simply wrapped in pre tags (not all of it is included).
Quote:
To include another language, a subpage with the language-code is used:
So the German page is found under
Quote:
The first line in those pages is the template to show all available languages and to switch easily between those.

2021-08-07, 16:23   #24
M344587487

"Composite as Heck"
Oct 2017

53·7 Posts

Personally I don't see the point of localising the program, I know it's an easy thing to say as a native english speaker but adding technical debt for minimal benefit doesn't seem worthwhile. That said like any good internet slacker if it has to be done I have opinions on how to do it:
- Enforce strict UTF-8 as the character encoding, it's the winner of the format war all other encodings need not apply. Validation is simple, editing is natural, and at least with a Linux terminal display seems to just work after setting a UTF-8 locale (my terminal is UTF-8 by default albeit en_GB, no idea what significance if any there is to printing with regional locales). Alternatively ignore locale and instead convert non-ascii code points to "\uxxxE" format if that turns out to increase portability. Either way whether it actually displays properly is largely out of your hands AFAIK, there's probably a cross-platform display library if it isn't trivially portable
- +1 to iso dates
- It looks ugly in code, but as suggested replacing individual words/phrases and static sentences could make sense. The one thing you do not want to do is expose entire printf format strings replacement-codes-and-all to users, that would be a nightmare to sanitise and would be a trivial way to break the program
- I'd be inclined to compile the translations into the binaries instead of exposing user-breakable ini files to users. Set up a git repo with the english reference that translators can do pull requests to to add translations, the validation could also be done at compile time
- If compiling the translations in you could potentially allow translation of full printf format strings containing replacement codes, but the parameters would still need to be in the same order
- Variable args in any order would require translators to define the entire printf command instead of just part/all of a format string. To do that while allowing the language to be chosen at runtime probably requires function pointer wrappers for every printf command unless there's magic to be done (macro or otherwise), essentially the translation file creates a bank of output functions for a language that can be switched to at runtime if that language is chosen
- Alternatively, baking a single language in at compile time could be done easily by just replacing all printf's with unique defines that are defined in the translation file, easier format and implementation but a binary per language is not ideal.
- Please no "british english" translation, I know american is a strange dialect but I think we can muddle through

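The "validation is simple" remark about strict UTF-8 can be made concrete. The following is a minimal sketch in C of a well-formedness check for a translation buffer, assuming the rules of RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF). The function name utf8_is_valid is hypothetical and does not come from the prime95 source.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if buf[0..len) is well-formed UTF-8
 * (no overlong forms, no surrogates U+D800..U+DFFF, nothing > U+10FFFF). */
bool utf8_is_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b0 = buf[i];
        size_t need;        /* number of continuation bytes expected */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point allowed for this length */

        if (b0 < 0x80)                { i++; continue; }  /* plain ASCII */
        else if ((b0 & 0xE0) == 0xC0) { need = 1; cp = b0 & 0x1F; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { need = 2; cp = b0 & 0x0F; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { need = 3; cp = b0 & 0x07; min = 0x10000; }
        else return false;  /* stray continuation byte or invalid lead byte */

        if (len - i - 1 < need) return false;             /* truncated sequence */
        for (size_t k = 1; k <= need; k++) {
            unsigned char b = buf[i + k];
            if ((b & 0xC0) != 0x80) return false;         /* not a continuation */
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min) return false;                       /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   /* surrogate */
        if (cp > 0x10FFFF) return false;                  /* out of range */
        i += need + 1;
    }
    return true;
}
```

A check like this could run once per translation, either when a file is loaded or in a build step, so that a badly encoded translation is rejected before anything is printed.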
2021-08-07, 17:37   #25
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²·3·7·73 Posts

Quote:
 Originally Posted by M344587487 Personally I don't see the point of localising the program... That said like any good internet slacker if it has to be done I have opinions on how to do it
How someone else ought do it, right? And you've plenty company.
I think there are better uses for George's time or other program authors than generating x OS-specific times y localization-specific executables and zip files and uploading the lot. And redoing that at every version update or bug fix update.
Suppose the localization was in a plain text file, and included a line of special secret security sauce that was content-dependent.
If the file is called for by a localization line in local.txt, and passes the security check, it gets loaded. If it fails the security check the program falls back to the author's originally coded language, which is typically American english.
That would allow x OS-specific images to load any one of y-1 or y (depending on implementation) localization files upon user choice, and prevent end user tampering.

Last fiddled with by kriesel on 2021-08-07 at 17:42
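The load-or-fall-back idea in the post above can be sketched as follows. This is only an illustration under stated assumptions: the file layout (a first line "check=" plus 8 hex digits), the FNV-1a hash, and the names fnv1a32, loc_check_line and loc_verify are all hypothetical. A plain hash like this catches accidental or casual edits only; anything meant to resist deliberate tampering would need a keyed or signed scheme.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* FNV-1a, 32-bit: a simple non-cryptographic hash over the given bytes. */
uint32_t fnv1a32(const char *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)data[i];
        h *= 16777619u;
    }
    return h;
}

/* Maintainer side: writes the content-dependent check line for `body`
 * into out. Returns snprintf's result (15 when the line fits). */
int loc_check_line(const char *body, char *out, size_t n)
{
    return snprintf(out, n, "check=%08lx\n",
                    (unsigned long)fnv1a32(body, strlen(body)));
}

/* Program side. Assumed layout: first line "check=XXXXXXXX", then the
 * translated text. Returns a pointer to the text if the check line
 * matches, or NULL so the caller falls back to the built-in language. */
const char *loc_verify(const char *file_text)
{
    if (strncmp(file_text, "check=", 6) != 0) return NULL;
    for (int i = 0; i < 8; i++)
        if (!isxdigit((unsigned char)file_text[6 + i])) return NULL;
    if (file_text[14] != '\n') return NULL;

    uint32_t declared = 0;
    for (int i = 0; i < 8; i++) {           /* parse the 8 hex digits */
        int c = tolower((unsigned char)file_text[6 + i]);
        declared = declared * 16 + (uint32_t)(isdigit(c) ? c - '0' : c - 'a' + 10);
    }

    const char *body = file_text + 15;
    if (fnv1a32(body, strlen(body)) != declared) return NULL;
    return body;
}
```

With this shape, one executable per OS can accept any number of localization files, and a file that fails the check is simply ignored in favour of the compiled-in English text.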

2021-08-07, 19:20   #26
M344587487

"Composite as Heck"
Oct 2017

53·7 Posts

Quote:
 Originally Posted by kriesel How someone else ought do it, right? And you've plenty company.
Of course, as I said I don't think it's necessary so am unlikely to be inspired to implement it, but it doesn't stop me from reasoning out how it could be done.

Quote:
 Originally Posted by kriesel I think there are better uses for George's time or other program authors than generating x OS-specific times y localization-specific executables and zip files and uploading the lot. And redoing that at every version update or bug fix update. Suppose the localization was in a plain text file, and included a line of special secret security sauce that was content-dependent. If the file is called for by a localization line in local.txt, and passes the security check, it gets loaded. If it fails the security check the program falls back to the author's originally coded language, which is typically American english. That would allow x OS-specific images to load any one of y-1 or y (depending on implementation) localization files upon user choice, and prevent end user tampering.
Compiling localisations into the binary does not mean a binary per language, only one of the bullet points mentioned that and not in a favourable way. The third from bottom bullet point is what I settled on as a good enough compromise between programmer-ease and user-ease. Once an english translation file was implemented a translator would copy it, rename it from en to whatever (the functions/variables/array-pointers can be auto-mangled with __FILE__ if necessary), then the translator proceeds to edit only the s/printf lines within the functions to do the translation and they are done. The wrapper function names and parameters remain untouched, but passed to the wrapped s/printf in any order by the translator as required. It would only take a few comments to teach a non-programmer how to edit the file to do the translation, validation from translator-error would be largely free thanks to the compiler and a cursory glance can weed out anything malicious. Whenever a new language is ready it just requires a few minor edits to the code to include the new translation file and register the language as an option (that could be automated too but probably not worth the effort).

What you're suggesting could work but involves "special secret security sauce", special handling and a custom format for argument ordering. You end up with something far more complicated than just sucking all output into hot-swappable function pointers which is all my suggestion boils down to.
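A minimal sketch of the "hot-swappable function pointers" approach described above, assuming one wrapper function per message and one table of wrappers per language. The message text, the struct layout and every name here (lang_output, msg_progress, set_language) are hypothetical illustrations, not taken from prime95.

```c
#include <stdio.h>
#include <string.h>

/* One member per message. Names and parameters are fixed for all languages. */
struct lang_output {
    const char *code;
    int (*msg_progress)(char *buf, size_t n, unsigned long exponent, double pct);
};

/* English reference (what en.c would contain). */
static int en_msg_progress(char *buf, size_t n, unsigned long exponent, double pct)
{
    return snprintf(buf, n, "Exponent %lu is %.1f%% complete", exponent, pct);
}

/* German copy: the translator edits only the snprintf line and is free to
 * pass the wrapper's parameters in a different order. */
static int de_msg_progress(char *buf, size_t n, unsigned long exponent, double pct)
{
    return snprintf(buf, n, "%.1f%% von Exponent %lu abgeschlossen", pct, exponent);
}

/* Registering a new language means adding one row here. */
static const struct lang_output languages[] = {
    { "en", en_msg_progress },
    { "de", de_msg_progress },
};

/* Currently active bank of output functions; defaults to English. */
static const struct lang_output *out = &languages[0];

/* Switches language at runtime. Returns 0 on success, or -1 if the code
 * is unknown, in which case the active bank is left unchanged. */
int set_language(const char *code)
{
    for (size_t i = 0; i < sizeof languages / sizeof languages[0]; i++) {
        if (strcmp(languages[i].code, code) == 0) {
            out = &languages[i];
            return 0;
        }
    }
    return -1;
}
```

Call sites would use out->msg_progress(buf, sizeof buf, p, pct) instead of a direct sprintf. Because every wrapper is ordinary C, the compiler checks each format string against its arguments, which is the "largely free" validation the post refers to.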

2021-08-07, 22:19   #28
Uncwilly
6809 > 6502

"""""""""""""""""""
Aug 2003
101×103 Posts

2×3×1,709 Posts

The localization does not need to require compilation for each new language. There aren't that many messages. Have a single variable in local.ini that is set at install with the preferred language. Have a single file that contains all the items that are language variant, in all the languages. On boot, Prime95 reads local.ini and then reads in all the messages in that language from the language file. The language file can even have the language choices in the first section. As new languages are added, the language file can be updated with no need to change the code. There can be some minor bit shift or something to prevent the average user from corrupting the language file by manual editing (because they won't understand what they are seeing).
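As a rough illustration of the single-language-file idea above, the sketch below looks up one message for the chosen language in a file that holds all languages. The layout (one "[code]" section per language, then "key=value" lines) and the function name lang_lookup are assumptions made for this example. The post does not specify a format, and the suggested bit-shift obfuscation is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup: finds `key` inside the "[lang]" section of `text`
 * and copies its value into out (size n). Returns 0 on success, or -1 if
 * the language or key is missing or the value does not fit.
 * Lines are assumed to end in '\n' (no "\r\n" handling in this sketch). */
int lang_lookup(const char *text, const char *lang, const char *key,
                char *out, size_t n)
{
    size_t lang_len = strlen(lang), key_len = strlen(key);
    int in_section = 0;
    const char *p = text;

    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);

        if (len >= 2 && p[0] == '[' && p[len - 1] == ']') {
            /* Section header: is this the requested language? */
            in_section = (len - 2 == lang_len &&
                          memcmp(p + 1, lang, lang_len) == 0);
        } else if (in_section && len > key_len && p[key_len] == '=' &&
                   memcmp(p, key, key_len) == 0) {
            size_t vlen = len - key_len - 1;
            if (vlen >= n) return -1;          /* value would not fit */
            memcpy(out, p + key_len + 1, vlen);
            out[vlen] = '\0';
            return 0;
        }
        p = eol ? eol + 1 : p + len;           /* advance to the next line */
    }
    return -1;
}
```

For example, with a file containing "[en]", "start=Starting worker", "[de]", "start=Worker wird gestartet", a language setting of de makes lang_lookup(file, "de", "start", buf, sizeof buf) fill buf with the German text. A missing language or key returns -1, so the caller can keep the built-in English string.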
2021-08-07, 22:28   #29
M344587487

"Composite as Heck"
Oct 2017

53×7 Posts

There are at least a few hundred messages containing English, maybe up to 400 or so, spread over sprintf, fprintf and printf, excluding sqlite, json and three of the four architecture directories that have identical source. It's hard to tell precisely even with grep, as many calls are spread over multiple lines and many contain no English, so a simple grep won't tell the whole story. cout and snprintf don't appear to be a factor; I'm probably missing some other functions that are, though.
Code:
grep -r -P "[ \t][fs]{0,1}printf" ./
2021-08-08, 08:06   #30
S485122

"Jacob"
Sep 2006
Brussels, Belgium

1777₁₀ Posts

In my opinion, the messages from the software do not need to be translated: most of the concepts are new to the users and, even if translated, need explanation. One would also need to translate the users' possible answers and their handling; mprime, for instance, has some dialogues that expect a text answer such as "y/n". The proposed localisation work would make it logical to extend it to the results. This in turn would mean the server's routines should also be adapted.

On the other hand, and as some others have already said, the readme and undoc files would benefit from translation. This is a simple thing to do even if it would take time and energy. Those "translations" could also be expanded to explain the concepts, the program messages and settings (even in English ;-)

As a side note, amongst the languages proposed I miss the USA's second language: Spanish. It has a user base equivalent to Hindi.

Last fiddled with by S485122 on 2021-08-08 at 08:10 Reason: fiddling after posting ;-(

2021-08-08, 08:15   #31
xilman
Bamboozled!

"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

3·5·743 Posts

Quote:
 Originally Posted by S485122 In my opinion, the messages from the software do not need to be translated : most of the concepts are new to the users and even if translated need explanation. One would also need to translate the users possible answers and their treatment, mprime for instance has some dialogues that expect a text answer as "y/n". The proposed localisation work would make it logical to extend it to the results. This in turn would mean the servers routines should also be adapted. On the other hand, and as some others have already said, the readme and undoc files would benefit from translation. This is a simple thing to do even if it would take time and energy. Those "translations" could also be expanded to explain the concepts, the program messages and settings (even in English ;-) As a side note amongst the languages proposed I miss the USA's second language : Spanish. It has a user base equivalent to Hindi.
+1

2021-08-08, 14:48   #32
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²·3·7·73 Posts

Spanish is indeed widely used; #4 for total speakers, #2 for native speakers. And like English, there are regional differences in Spanish; fortunately less in written than spoken form. https://en.wikipedia.org/wiki/Spanis..._and_varieties
Arabic is also common. https://www.k-international.com/blog/learn-a-language/

Number of people per language is not quite the metric we're looking for. (Total speakers is an available proxy for number of readers.) Something like steepest increase of GIMPS participation or GHD/day gained per unit localization effort expended by the available volunteer pool might be closer to the mark, and very hard to estimate.

A mostly-computerized search I made in prime95 v30.5b2 source code could be summarized as follows.
1. Search all source files *.c or *.cpp for lines containing strings printf or cout, dump into file each occurrence.
2. Strip preceding indentation or conditionals or labels
3. Remove records that are only comments, or that produce JSON format results records, which won't be localized.
4. Sort the result of the above. Note there was no cout found.
5. Remove duplicates to obtain a sorted list of unique printf statements.
6. Remove lines that appear to be irrelevant to localization considerations. This involves some judgment.
Result is ~482 distinct cases. Probably a low figure, since following lines of multiline printfs get missed using this method, yielding only "sprintf(buf," in several cases that probably differed on their following lines. I think the actual number is ~490 ±~10% unique print statements relevant to localization. Occurrence rate of a specific *printf line was seen to vary significantly; maximum was 12. Average is around 1.7, estimating by original file size to final file size comparison.

The grep line posted earlier, while very usefully concise, would have missed all the _tprintf (12) and _stprintf (1) occurring in the prime95 source if I read it correctly. Or in Gpuowl, would have missed snprintf, or vaprintf which IIRC was used in some versions.

"The ultimate introductory guide to software localization" advises to place each language's translation in a separate file, and to use a standard such as JSON for the file content's format, determined partly by support in the development environment, which IIRC is MS VS for prime95. Keeping the executable size smaller will be an advantage for those who run code on small-ram devices such as Raspberry Pis, compute sticks, cell phones, ancient laptops, etc.

I assume localization would apply initially to some future version released. There may be some utility to also backfitting earlier versions later. Note the variety of version numbers versus OS on https://www.mersenne.org/download/, or the common use still of Gpuowl v6.11-380 or -364 or v7.2-53.

Last fiddled with by kriesel on 2021-08-08 at 15:00

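The mechanical part of the search procedure described above (filter, strip indentation, drop comment-only lines, sort, remove duplicates) can be sketched in C as below. This is a simplified stand-in for what was actually run: the function name count_unique_prints is hypothetical, lines are assumed to be already read into memory, and the stripping of conditionals or labels, the JSON exclusion and the final judgment pass are not modelled. Matching on the substring "printf" also picks up variants such as _tprintf, _stprintf and snprintf.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Counts distinct print statements among `lines`:
 * skip leading whitespace, drop lines that begin as comments, keep lines
 * containing "printf" or "cout", sort, then count unique entries.
 * Returns (size_t)-1 on allocation failure. */
size_t count_unique_prints(const char *const *lines, size_t n)
{
    const char **kept = malloc((n ? n : 1) * sizeof *kept);
    if (kept == NULL) return (size_t)-1;
    size_t m = 0;

    for (size_t i = 0; i < n; i++) {
        const char *s = lines[i];
        while (isspace((unsigned char)*s)) s++;          /* strip indentation */
        if (strncmp(s, "//", 2) == 0 || strncmp(s, "/*", 2) == 0)
            continue;                                    /* comment-only line */
        if (strstr(s, "printf") != NULL || strstr(s, "cout") != NULL)
            kept[m++] = s;
    }

    qsort(kept, m, sizeof *kept, cmp_str);               /* sort */

    size_t unique = 0;                                   /* remove duplicates */
    for (size_t i = 0; i < m; i++)
        if (i == 0 || strcmp(kept[i], kept[i - 1]) != 0)
            unique++;

    free(kept);
    return unique;
}
```

Like the manual search, this undercounts multi-line calls: several different statements that all start with "sprintf(buf," on their first line collapse into one entry.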
2021-08-08, 15:21   #33
xilman
Bamboozled!

"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

10101110001001₂ Posts

Quote:
 Originally Posted by kriesel Spanish is indeed widely used; #4 for total speakers, #2 for native speakers. And like English, there are regional differences in Spanish; fortunately less in written than spoken form. https://en.wikipedia.org/wiki/Spanis..._and_varieties
Indeed. I am (slowly) learning Canario.

Last fiddled with by xilman on 2021-08-08 at 15:54
