mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   Getting reliable LL from unreliable hardware (https://www.mersenneforum.org/showthread.php?t=22471)

Mark Rose 2017-08-23 20:03

[QUOTE=kriesel;466222]I prefer fully addressing the message integrity question once. Doing checksums or CRCs on individual fields or less, feels to me like using half measures, and more likely to be revisited later.[/QUOTE]

That would be my preference.

kriesel 2017-08-23 20:59

[QUOTE=Mark Rose;466232]That would be my preference.[/QUOTE]

Do you see addition of a message CRC as impacting mfloop or llloop or anything derived from them (Ernst's python script for mlucas, IIRC)?

Mark Rose 2017-08-23 21:26

[QUOTE=kriesel;466235]Do you see addition of a message CRC as impacting mfloop or llloop or anything derived from them (Ernst's python script for mlucas, IIRC)?[/QUOTE]

mfloop does parse lines, so if software starts putting JSON into text, it will require work. The addition of a hash on top of that would be trivial.

preda 2017-08-24 04:41

What if a result doesn't have the "checksum" field, would the server accept it unprotected or reject with "checksum error"?

I think the checksum should be seen by the user as a help protecting him from making accidental errors, not as barrier against intentional changes. Let me give an example, let's say the software generates the result without the computer-name or with the wrong computer-name. Now the user wants to edit the result before submission (to update computer-name). He sees he can't, and is really pissed off, looks for alternatives, until he finds on the internet the tool "checksum updater", but he's still pissed off.

Though, maybe the checksum should protect only what the user would never legitimately want to edit/update, i.e. exponent+residue.
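
To make the two questions above concrete, here is a minimal sketch, assuming zlib's CRC-32 and a checksum that covers only exponent+residue. The field names (`exponent`, `residue`, `computer`, `checksum`), the `require_checksum` switch and the status strings are all hypothetical; nothing here is an agreed PrimeNet format.

```python
import zlib

def core_crc(exponent, residue):
    # Checksum only the fields a user should never need to edit.
    data = f"{exponent}:{residue.upper()}".encode("ascii")
    return f"{zlib.crc32(data):08X}"

def check_result(record, require_checksum=False):
    # Hypothetical server-side policy: a missing checksum is either
    # tolerated (legacy behaviour) or rejected, depending on one switch.
    supplied = record.get("checksum")
    if supplied is None:
        return "rejected" if require_checksum else "accepted-unprotected"
    expected = core_crc(record["exponent"], record["residue"])
    return "accepted" if supplied.upper() == expected else "checksum error"

# Made-up example values, not a real result.
record = {"exponent": 1234577, "residue": "0123456789ABCDEF", "computer": "old-name"}
record["checksum"] = core_crc(record["exponent"], record["residue"])
record["computer"] = "new-name"   # a legitimate edit; the checksum still matches
print(check_result(record))
```

With this scope the user can fix the computer name freely, while an accidental change to the exponent or residue is still caught.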

Madpoo 2017-08-24 05:43

[QUOTE=preda;466251]What if a result doesn't have the "checksum" field, would the server accept it unprotected or reject with "checksum error"?

I think the checksum should be seen by the user as a help protecting him from making accidental errors, not as barrier against intentional changes. Let me give an example, let's say the software generates the result without the computer-name or with the wrong computer-name. Now the user wants to edit the result before submission (to update computer-name). He sees he can't, and is really pissed off, looks for alternatives, until he finds on the internet the tool "checksum updater", but he's still pissed off.

Though, maybe the checksum should protect only what the user would never legitimately want to edit/update, i.e. exponent+residue.[/QUOTE]

Good question... I imagine if Primenet and any clients changed the way manual reports are submitted, that'd be a good time to specify what fields should be mandatory and make values a bit more well-defined than now. Saying "there must be a CRC32 hash of all values except the CRC field itself" would be okay in my book.
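
As a rough illustration of "a CRC32 hash of all values except the CRC field itself", the sketch below seals and verifies a JSON result. It assumes zlib's CRC-32 and a canonical serialization (sorted keys, compact separators) so that client and server hash identical bytes. The field name `crc32` and the sample fields are hypothetical, and which CRC-32 variant a server would actually adopt is not settled here.

```python
import json
import zlib

CRC_FIELD = "crc32"  # hypothetical field name

def canonical(fields):
    # Serialize everything except the CRC field in a fixed, repeatable way.
    body = {k: v for k, v in fields.items() if k != CRC_FIELD}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")

def seal(fields):
    # Return a copy of the record with the CRC field added.
    sealed = dict(fields)
    sealed[CRC_FIELD] = f"{zlib.crc32(canonical(fields)):08X}"
    return sealed

def verify(fields):
    # True only if a CRC is present and matches the other fields.
    supplied = fields.get(CRC_FIELD)
    if supplied is None:
        return False
    return supplied.upper() == f"{zlib.crc32(canonical(fields)):08X}"

# Made-up example values, not a real result.
line = json.dumps(seal({"exponent": 1234577, "residue": "0123456789ABCDEF",
                        "program": "example-gpu-app"}))
print(line, verify(json.loads(line)))
```

The canonical form matters more than the hash choice: if key order or whitespace can differ between client and server, even an untouched record will fail the check.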

For backwards compatibility the server would need to accept current lines of text using the regex stuff, but with notes about it being deprecated. Not sure how long that would fly... the server still accepts connections from v4 clients using some custom stuff. Last v4 client was what... 2008? 9 years ago?

Note that talk of crc32 (and I'd stick with just plain CRC32, not any of the b/c variants) is just about message integrity during the paste into the manual results form. Although, to be honest, it'd be a decent idea to change the automatic process to use json instead of query string parameters...

Now, Prime95 clients (prime95/mprime) have their own checksum code to make sure the result isn't just made up and submitted for credit. It looks at the exponent, residue, other info, and sends that checksum to the server. The server will then calculate the same thing and compare.

Alternative clients like CUDALucas, clLucas, etc. don't have that check. The server will accept them but really, for purposes of verifying results, that's probably one more reason why there needs to be a Prime95 and GPU program involved, not two GPU programs doing both tests.

It'd be nice if there was some clever way around that... right now though, if the hashing code were known or included in public source code, then it defeats the purpose. Something to think on at any rate.
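
To show the client-computes, server-recomputes pattern and why publishing the code defeats it, here is a sketch that swaps in a standard keyed hash (HMAC-SHA256). This is explicitly not Prime95's actual algorithm, which is not public; the secret, the field layout and the function names are hypothetical.

```python
import hashlib
import hmac

def security_code(secret, exponent, residue, computer):
    # Keyed hash over the fields being protected (layout is an assumption).
    msg = f"{exponent}|{residue}|{computer}".encode("utf-8")
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()[:16].upper()

def server_accepts(secret, exponent, residue, computer, code):
    # The server repeats the client's calculation and compares.
    expected = security_code(secret, exponent, residue, computer)
    return hmac.compare_digest(expected, code)

SECRET = b"known only to client binary and server"  # hypothetical
code = security_code(SECRET, 1234577, "0123456789ABCDEF", "spica")
print(server_accepts(SECRET, 1234577, "0123456789ABCDEF", "spica", code))
print(server_accepts(SECRET, 1234577, "FFFFFFFFFFFFFFFF", "spica", code))
```

Anyone who can read `SECRET` out of public source can mint a valid code for a made-up residue, which is exactly the problem described above for open-source clients.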

GP2 2017-08-24 14:23

[QUOTE=Madpoo;466256]For backwards compatibility the server would need to accept current lines of text using the regex stuff, but notes about it being deprecated. Not sure how long that would fly... the server still accepts connections from v4 clients using some custom stuff. Last v4 client was what... 2008? 9 years ago?[/QUOTE]

There's exactly one user with a "v4_computers" machine name which continues to send in a few results every year, the latest being on 2017-06-23:

[CODE]
65845331,spica,v4_computers,CFE0826B189160__,,2015-09-23 21:34
68373079,spica,v4_computers,F2685063DE5D17__,,2015-01-07 21:35
68373499,spica,v4_computers,364E1B5376A617__,,2015-09-16 19:46
71977547,spica,v4_computers,9461428E20B01E__,,2016-09-21 20:12
74715103,spica,v4_computers,F35BDC84D67E2D__,,2016-07-16 11:44
75759427,spica,v4_computers,5E2E8312D3B4AB__,,2017-06-23 21:41
[/CODE]

kriesel 2017-08-24 15:54

[QUOTE=Madpoo;466256] Alternative clients like CUDALucas, clLucas, etc. don't have that check. The server will accept them but really, for purposes of verifying results, that's probably one more reason why there needs to be a Prime95 and GPU program involved, not two GPU programs doing both tests.

It'd be nice if there was some clever way around that... right now though, if the hashing code were known or included in public source code, then it defeats the purpose. Something to think on at any rate.[/QUOTE]
Yes, somewhat; that's why I raised the question of who should know what it is, in post #149. There are actually multiple purposes to the check value. It can be added to the gpu codes.

There are multiple reasons to prefer first and second primality tests to be done on different system types, software, algorithms, and by different users:
1) security code on at least one, for confidence in the residue result
2) different software to dodge possible bugs
3) different hardware architecture to dodge possible hardware design flaws
4) it takes more effort to fake results in multiple formats, especially if at least one has a good nonpublic security code built in
5) different hardware can guard against an unreliable unit of hardware contaminating the checked results (that cpu or gpu on which one ran). Imagine a stuck-on-zero gpu memory cell that reliably gives repeatable wrong residues. Virtual memory mapping differently from run to run may mitigate that on a cpu run like mlucas or prime95, unless it is in cpu cache memory, but gpu memory is not virtualized so it won't help a gpu application. (There are ways to attack the bad memory issue on a gpu but it has not been done yet.)
6) different users, ideally, for the same reason an auditor is not the same accountant that prepared the books; whether intentional or mistake, inaccuracy is less likely if two are required to be involved
7) pseudorandom assignment of first and second checks to different parties makes collusion a challenge
8) social/psychological; it promotes trust and teamwork
9) possibilities of speed or error-detection enhancements and tradeoffs between algorithms (PRP3 with error detection along the way, and LL for definitive primality, for example)
10) if sufficiently different software and hardware are used for the two runs, the libraries may also be different. A bug in a standard library could introduce repeatable error otherwise, even for correct different application software and well functioning hardware. This is an argument for using different CUDA levels in two NVIDIA gpu runs. There are known cases of some CUDA levels producing some very wrong results in CUDALucas. (See the bad residues and false primes threads.) The OpenCL calls present a potential common source of error for clLucas and GpuOwl.
11) if sufficiently different software is used, it lowers the chances of an extant bug being present in reused code from one application to another. (See the parentage diagram on the Available Software thread.)
12) the reasons I didn't think of (this entry was originally following 7 as last item, not 11, so there may still be more)

I've been thinking about a tool that sits alongside multiple gpu applications running on a system, periodically finds their new results, adds the CRC-32 or whatever to each, condenses them into a set, and delivers it to PrimeNet, keeping track of what has been sent (so it doesn't resend old results later) and of which gpu each came from, for possible later error diagnosis etc. (Or even a PrimeNet-client-for-gpus that updates the PrimeNet server on progress along the way.) It still wouldn't stop a hypothetical someone intent on fraud from "stuffing the ballot box" by faking records in the gpu app's results file, to be picked up and given a CRC and so made to look legit. But it would add some assurance that the message was delivered without transmission errors, or transcription errors, from the point where the record is picked up.
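
A bare-bones sketch of the bookkeeping such a tool would need, assuming result lines have already been read from each application's results file. The function name, the gpu labels, the record layout and the sample lines are hypothetical, and zlib's CRC-32 stands in for "CRC-32 or whatever".

```python
import zlib

def collect_new(results_by_gpu, already_sent):
    """Build a batch of unsent result lines, each tagged with its gpu and a CRC.

    results_by_gpu: {gpu label: list of result lines from that app's file}
    already_sent:   set of lines delivered earlier (updated in place)
    """
    batch = []
    for gpu, lines in sorted(results_by_gpu.items()):
        for line in lines:
            line = line.strip()
            if not line or line in already_sent:
                continue  # skip blanks and anything sent before
            crc = zlib.crc32(line.encode("utf-8"))
            batch.append({"gpu": gpu, "line": line, "crc32": f"{crc:08X}"})
            already_sent.add(line)
    return batch

sent = set()
files = {"gpu0": ["made-up result line A"], "gpu1": ["made-up result line B"]}
print(len(collect_new(files, sent)))   # both lines are new on the first pass
files["gpu1"].append("made-up result line C")
print(len(collect_new(files, sent)))   # only the newly appended line remains
```

In a real tool `already_sent` would have to be persisted between runs, and the CRC only vouches for the line from the moment it was picked up, as noted above.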

It would separate the completely open source CUDALucas et al. from the presumably partly closed source transmission code, by making them completely separate applications. It may not be necessary to go that far, according to some recent posts about GPLv3 allowing some closed source.

For a hypothetical determined faker, what's to prevent him from posing as a new developer who needs the limited-distribution code? Even if the security source is not shared with him, only a DLL or OBJ file or the Linux etc. equivalent?

Maybe the best we can do is pick a decent CRC standard, for the gpu apps, that is different from the security that Prime95 uses, have it public, define a standard for recommended use, implement it in some if not all the gpu apps, and accept that it isn't as airtight. Some is better than none.

Then, as Mihai points out, there will still be users who want to alter the records, which would require stripping off the CRC and the PrimeNet server accepting records without a CRC. At present it does not matter whether the record includes the UID for the user to get cpu/gpu time credit to their account; what matters is whether they are logged in when submitting the manual results. The UID would matter for PrimeNet-interfaced automatic reporting, which to my knowledge is not implemented yet in any gpu application. Misfit, I think, has it, but it is a separate case and has provisions for adding the user info.

The "legacy" no-CRC format is likely to hang around for years in any event, as the V4-client example showed. Adding CRC does not provide complete certainty, but does provide some increased confidence for the message to which it's attached.

preda 2017-08-27 14:13

I'd like to share that I'm currently working on switching gpuOwl to the PRP-3 algorithm with the new "robust error check".
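
For readers who have not seen the idea, here is a toy sketch of a base-3 PRP test on 2^p - 1 with a running-product consistency check. It assumes the "robust error check" is the one Robert Gerbicz proposed, and it is in no way gpuOwl's code. The function name, the `block` size, the `fault_at` injection hook and the rollback policy are all illustrative.

```python
def prp3_mersenne(p, block=8, fault_at=None):
    """Return (is_probable_prime, errors_detected) for N = 2^p - 1."""
    n = (1 << p) - 1
    total = -(-p // block) * block    # round p up to a multiple of block
    x, d, i = 3, 3, 0                 # x = 3^(2^i) mod n, d = product of x at block ends
    good = (i, x, d)                  # last state that passed the check
    result = None
    detected = 0
    while i < total:
        x = x * x % n
        i += 1
        if i == fault_at:
            x ^= 2                    # simulate a one-time hardware bit flip
            fault_at = None
        if i == p:
            result = x                # 3^(2^p) mod n, trusted once the next check passes
        if i % block == 0:
            d_new = d * x % n
            # With no errors, d_new == 3 * d^(2^block) (mod n).
            if d_new == 3 * pow(d, 1 << block, n) % n:
                d = d_new
                good = (i, x, d)
            else:
                detected += 1
                i, x, d = good        # roll back to the last verified state
    # 3^(2^p) == 9 (mod N) is equivalent to 3^(N-1) == 1 (mod N).
    return result == 9 % n, detected

print(prp3_mersenne(127))
print(prp3_mersenne(127, fault_at=10))
```

Checking after every block doubles the work in this toy version; a practical implementation would update the product often but run the expensive comparison much more rarely.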

Prime95 2017-08-27 15:57

[QUOTE=preda;466435]I'd like to share that I'm currently working on switching gpuOwl to the PRP-3 algorithm with the new "robust error check".[/QUOTE]

Do not make that the default option as the PrimeNet server is not ready to handle PRP residues.

henryzz 2017-08-27 17:20

[QUOTE=preda;466435]I'd like to share that I'm currently working on switching gpuOwl to the PRP-3 algorithm with the new "robust error check".[/QUOTE]

Did you ever make any progress on implementing alternative moduli such as k*2^n+-1?

preda 2017-08-27 23:06

[QUOTE=henryzz;466448]Did you ever make any progress on implementing alternative moduli such as k*2^n+-1?[/QUOTE]

No, sorry, and I don't anticipate doing this soon. I have my hands full with: performance tuning, fixing for Vega, and adding the next FFT step up as it's getting close. (But the code is open-source, and I try to simplify it as much as I can.)


All times are UTC.