mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   Getting reliable LL from unreliable hardware (https://www.mersenneforum.org/showthread.php?t=22471)

preda 2017-08-27 23:18

[QUOTE=Prime95;466442]Do not make that the default option as the PrimeNet server is not ready to handle PRP residues.[/QUOTE]

It's not about a default -- I plan to replace LL completely (offer only PRP). (This keeps my work simple).

When can primeNet start accepting the new residues?

Accepting the new residues could be done adding a DB field specifying the "kind of residue" ("LL" or "PRP3"). On the interface side (e.g. web manual submit), a string prefix to the 16-char residue could be used to pass this info in a first step,
LL-0x0011..ff
P3-0x0011..ff

Or I can do whatever the server prefers.

(Of course, I don't want to start polluting LL-residues with PRP3 residues.)

preda 2017-08-28 10:40

[QUOTE=Madpoo;466256]..[/QUOTE]

I propose this extension of the residue with an checksum:

given a 64-bit residue, format it to 16 hex-digits, with leading zeros if needed, without leading "0x".
form a string containing these data, separated with minus '-':
algorithm-id ("LL" or "P3"),
exponent, decimal digits without padding,
iteration, decimal digits without padding,
residue, 16 hex-digits.

Compute the CRC32 of this string. Obtain a 32bit value, format it to 8 hex-digits zero padded. Append this check string to the end of the residue with '-' separator.

Example:
The PRP-3 residue of M(57885161), at iteration 30800000, is de91f60a865d2cb9.
So compute the CRC32 of this string:
"P3-57885161-30800000-de91f60a865d2cb9"
it turns out the crc is "0cc1209c",

So the "residue with check" string is:
"P3-de91f60a865d2cb9-0cc1209c", and this string is included either in a JSON field, or as a simple string in the current manual result form, etc.

Note that the iteration for residues "at the end" is always N-2 for LL, and N-1 for PRP-3. (but including it in the CRC makes sure there's no mismatch).

The goal of this checksum is to protect against unintentional errors, such as typing or cut-paste. It offers no additional trust or security.

What do you think?

kriesel 2017-08-28 14:56

[QUOTE=preda;466494]I propose this extension of the residue with an checksum:

given a 64-bit residue, format it to 16 hex-digits, with leading zeros if needed, without leading "0x".
form a string containing these data, separated with minus '-':
algorithm-id ("LL" or "P3"),
exponent, decimal digits without padding,
iteration, decimal digits without padding,
residue, 16 hex-digits.

Compute the CRC32 of this string. Obtain a 32bit value, format it to 8 hex-digits zero padded. Append this check string to the end of the residue with '-' separator.

Example:
The PRP-3 residue of M(57885161), at iteration 30800000, is de91f60a865d2cb9.
So compute the CRC32 of this string:
"P3-57885161-30800000-de91f60a865d2cb9"
it turns out the crc is "0cc1209c",

So the "residue with check" string is:
"P3-de91f60a865d2cb9-0cc1209c", and this string is included either in a JSON field, or as a simple string in the current manual result form, etc.

Note that the iteration for residues "at the end" is always N-2 for LL, and N-1 for PRP-3. (but including it in the CRC makes sure there's no mismatch).

The goal of this checksum is to protect against unintentional errors, such as typing or cut-paste. It offers no additional trust or security.

What do you think?[/QUOTE]

Creating and including a CRC for a string not present in the message seems like "bad form". Standard practice is the CRC is applied to the entire message, and appended. I've not heard of a case where the CRC is for a string not present in the message, that must be constructed by the receiver from the message, then the CRC applied to that to check a construct not present in the message against the CRC appended to the message. [url]https://en.wikipedia.org/wiki/Cyclic_redundancy_check[/url].

Previous discussion had settled on the CRC being its own JSON field, as I recall.

Also, I encourage you to continue including the LL algorithm in your code and give the user the choice of which to run, PRP3 or LL. Your software could remain useful in LL as double checking for some years to come. Users like choice. If you want more users of GpuOwl, provide choice. Prime95 is popular in part because it provides lots of user choice.

preda 2017-08-29 01:26

[QUOTE=kriesel;466501]Creating and including a CRC for a string not present in the message seems like "bad form". Standard practice is the CRC is applied to the entire message, and appended. I've not heard of a case where the CRC is for a string not present in the message, that must be constructed by the receiver from the message, then the CRC applied to that to check a construct not present in the message against the CRC appended to the message. [URL]https://en.wikipedia.org/wiki/Cyclic_redundancy_check[/URL].

Previous discussion had settled on the CRC being its own JSON field, as I recall.
[/QUOTE]

Because the residue is meaningless without an exponent, the exponent is always present next to the residue. That's why the checksum is not over a field "not included in the message". But including the exponent "under" the checksum protection prevents misaligning residue with exponent. Imagine that the checksum was only over the residue, than the user may mis-match the checked residue with another exponent, yet the checksum wouldn't trigger -- not a good thing.

I see this checksum as becoming an integral part of the residue. Anywhere you see a residue, it has a checksum and can be validated. Like when you send it by email to a friend. For comparison, if your credit-card number has a check digit (to insure correct transcription), you would not send the CC number anywhere without the check digit.

So this is focused on the residue, which is independent of the question how you package the residue -- in JSON, in email, over the phone, etc. The check field for a JSON message is a different topic. What if some software, let's say CUDALucas or the "manual result entry form", decides not to use JSON, they can still extend the residue with the "check digit". And when JSON is used, JSON can have its own independent message-checksum if desired there.

Now maybe 8 hexa-digits is overkill. I would find ok 2 hexa-digits if people prefer that, for a more compact residue.

BTW, may this is a good point to revisit the "transition to 128-bit residue" topic. (because what I propose here is some change to the residue form)

preda 2017-08-29 01:38

[QUOTE=kriesel;466501]
Also, I encourage you to continue including the LL algorithm in your code and give the user the choice of which to run, PRP3 or LL. Your software could remain useful in LL as double checking for some years to come. Users like choice. If you want more users of GpuOwl, provide choice. Prime95 is popular in part because it provides lots of user choice.[/QUOTE]

I'm trying to play nice by putting the code in the open, and by creating a convenient branch ("LL") in git starting at the point where LL function has not been altered and can be continued from.

But my goal is not to create the "ultimate general primality checking tool". For example, a major restriction is the single FFT-size. I want to make a tool that does useful work for the Mersenne search, *faster*. (of course, getting "bad" results faster is not useful, so checking/validation as discussed in this thread is an implied goal).

Specifically on double-checking, I would like to hope there will be much less need for double-checking in the future due to the new "robust results", and more computing would be used at the frontier instead.

Mark Rose 2017-08-29 02:13

[QUOTE=preda;466550]Specifically on double-checking, I would like to hope there will be much less need for double-checking in the future due to the new "robust results", and more computing would be used at the frontier instead.[/QUOTE]

That may take a while. Many people are still using version 27 because "28 makes my overclocked computer unstable". Many are still using 28, because "29 doesn't load the cores up the way I want for my stress test".

And there are people who just haven't had a need to upgrade. So there will be new LL tests to DC for many years to come. Prime95 can always handle those, I suppose.

kriesel 2017-08-29 03:31

[QUOTE=preda;466550]I'm trying to play nice by putting the code in the open, and by creating a convenient branch ("LL") in git starting at the point where LL function has not been altered and can be continued from.

But my goal is not to create the "ultimate general primality checking tool". For example, a major restriction is the single FFT-size. I want to make a tool that does useful work for the Mersenne search, *faster*. (of course, getting "bad" results faster is not useful, so checking/validation as discussed in this thread is an implied goal).

Specifically on double-checking, I would like to hope there will be much less need for double-checking in the future due to the new "robust results", and more computing would be used at the frontier instead.[/QUOTE]

You are probably right that double checking activity and backlog will reduce eventually. That single 4M fft length in an LL version of GpuOwl has a fairly secure future though, in double-checking _already-existing_ singly LL-tested exponents, along with other existing LL codes. Looking at past milestones, double check progresses by about 3M annually. There's currently a 33M+ span between the lowest untested exponent and the lowest single-checked exponent by LL. "All exponents below 40 392 769 have been tested and verified.
All exponents below 73 867 097 have been tested at least once.
"https://www.mersenne.org/report_milestones/ tonight.
And until such time as the code in Prime95 and some others also implement PRP3, and then the bulk of user throughput follows, there will be a plentiful supply of new exponents to DC by LL added to the queue. Those between 50M and 78M will be in reach. So another decade or more of utility of V0.6 GpuOwl in LL double-check seems indicated. (Faster hardware and slower bigger exponents tend to offset.)

It's clear we see some things differently. I don't see what is stopping a program from having a fast 4M LL, fast 4M PRP3, and a selector between, or for that matter also a 6M or 8M of one or both. Only one goes in the GPU at a time. Most of the inactive code can also be paged out of system memory. (I run mine for weeks or months at a time so program load time is trivial.) You've already written the 4M LL and the 4M PRP3, and a 2M LL. (And very quickly too it seems to me!)

On the other hand, first time tests progress at about 8M annually. There's a fraction of a year left within reach of the 4M fft in GpuOwl for first time tests. Cat 0, Cat 1, and about 5% of Cat 2. [url]https://www.mersenne.org/thresholds/[/url]

Sticking with your previously stated power-of-two-ffts-only policy, an 8M fft being useful around 100-155M will wait for the leading edge of the first-time test wavefront to reach it until late 2019. (100M-82M)/8 =~ 2.25 years. Or do you anticipate GpuOwl users leaping ahead of the pack, from 78M to 100M+?

A 6M fft would bridge those years nicely, at estimated 75M-117M usability.

Ready availability of executables for GpuOwl would help adoption. Lack of easy availability will slow adoption.

Even if adoption of your software is rare, its creation and your posing questions have already brought new ideas in, that may benefit other software in this effort too. The full benefit of that will take a while to unfold. (Years, not months, in my opinion. Some of the software has an active user base but not an active maintainer.)

If what motivates you is making the perfect filet knife for medium size fish, while others make assorted sizes in sets, and Leatherman Multitools, saws, hammers, etc, it's your time so spend it how you like. I'm spending time studying all the existing tools for strengths, weaknesses, capabilities and limits (and getting ready for sharpening some) while using several.

Have fun!

kriesel 2017-08-29 03:46

[QUOTE=preda;466548]Because the residue is meaningless without an exponent, the exponent is always present next to the residue. That's why the checksum is not over a field "not included in the message". But including the exponent "under" the checksum protection prevents misaligning residue with exponent. Imagine that the checksum was only over the residue, than the user may mis-match the checked residue with another exponent, yet the checksum wouldn't trigger -- not a good thing.

I see this checksum as becoming an integral part of the residue. Anywhere you see a residue, it has a checksum and can be validated. Like when you send it by email to a friend. For comparison, if your credit-card number has a check digit (to insure correct transcription), you would not send the CC number anywhere without the check digit.

So this is focused on the residue, which is independent of the question how you package the residue -- in JSON, in email, over the phone, etc. The check field for a JSON message is a different topic. What if some software, let's say CUDALucas or the "manual result entry form", decides not to use JSON, they can still extend the residue with the "check digit". And when JSON is used, JSON can have its own independent message-checksum if desired there.

Now maybe 8 hexa-digits is overkill. I would find ok 2 hexa-digits if people prefer that, for a more compact residue.

BTW, may this is a good point to revisit the "transition to 128-bit residue" topic. (because what I propose here is some change to the residue form)[/QUOTE]

CRC calculated based on an exponent, separator characters, label characters (P3), iteration number, and residue, is what you described, and what you described attaching the resulting CRC to is a residue. The exponent is located elsewhere in the record. Is the iteration count in the record at all? CRC depends on more than the full set of symbols, it depends on their position also, as you know. If the symbols are shuffled, or some are left out of the message, the check of the CRC fails.

Record size growth needs to be limited!

ewmayer 2017-08-29 23:23

[QUOTE=preda;466548]BTW, may this is a good point to revisit the "transition to 128-bit residue" topic. (because what I propose here is some change to the residue form)[/QUOTE]

My code has been computing/outputting the well-known Selfridge-Hurwitz residues (mod 2^36,2^35-1,2^36-1) to the exponent-associated results file (pXXXXX.stat, stat short for 'status') for many years now. Those are standard in reporting results of Pepin tests of Fermat numbers.

Since 2^36 is obviously just the low 9 hexits of the GIMPS-standard Res64, that gives 135 bits of 'residueness' (64+35+36).

preda 2017-08-30 06:03

gpuOwl head on github now does the "robust PRP-3". The (new) residue looks like this:
P3-fffffffffffffffc0d (the last 2 digits are check-digits).

(for example, if I state that "for M(57885161) I got the above residue, it can be checked by verifying that
crc32("P3-57885161-57885160-fffffffffffffffc") & 0xff == 0x0d ).

These new residues aren't yet accepted by PrimeNet / manual result form. Hopefully there is little danger of submitting them as LL-residues by mistake.

preda 2017-08-30 06:18

A few words about the check implementation in gpuOwl.
M = 2^p - 1

The computation proceeds in blocks of 1K iterations.
At the start of each block, there is a pair of two values: "residue" and "check", both are mod M.
Initially (at iteration 0):
residue==3, check==1.

Each block proceeds:
1. update check: check = residue * check. (one modMul, modular multiplication).
2. update residue: residue = residue^(2^1000). (1000 modSquare).

After 100 blocks (100K iterations), a verification is done:
residue * check == check^(2^1000) * 3. (1 modMul + 1000 modSquare)

Towards the end, the last block, which extends past p-1, is computed and verified. The "residue at p-1" is extracted from the inside of that block.


All times are UTC. The time now is 21:50.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.