mersenneforum.org  

Old 2020-06-18, 21:03   #34
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1000001110001₂ Posts

Quote:
Originally Posted by ewmayer View Post
These are Prime95/mprime checkpoint files? Because that's way bigger than needed for that exponent, even with 2 full-length PRP residues - 1 for the PRP tests, 1 for the Gerbicz-check residue - taken into account. A 91M expo yields a residue of ceiling(exponent/8) ~ 11.4Mbytes, a minimal-length checkpoint file will only be of that size plus a few more bytes for metadata.
I vaguely recall George describing conditions under which prime95 saves more than one residue per file. I happen to have some recent .bu files handy, and see ~1x, 2x, and 3x the size you estimate for the same exponent. Not so surprisingly, a 2x file compresses easily in IZArc to only a few percent larger than a 1x file. These are for PRP. The 1x files I have are 91M LL files.
The .7z below is the product of compressing the .bu4 file.

Code:
06/18/2020  03:34 PM        35,639,672 p95038813
06/18/2020  03:52 PM        12,043,108 p95038813.7z
06/18/2020  03:04 PM        35,639,672 p95038813.bu
06/18/2020  02:34 PM        35,639,672 p95038813.bu2
06/18/2020  12:21 PM        23,759,816 p95038813.bu3
06/18/2020  06:17 AM        23,759,816 p95038813.bu4
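
As a rough cross-check of the listing above, the sketch below computes the ceiling(exponent/8) estimate and compares it with two of the listed file sizes. It assumes the file name p95038813 encodes the exponent 95038813; the helper name residue_bytes is made up for illustration and is not taken from Prime95.

```python
def residue_bytes(exponent: int) -> int:
    """Bytes needed for one full-length residue: ceiling(exponent / 8)."""
    return (exponent + 7) // 8


EXPONENT = 95038813  # assumption: read off the save-file name p95038813

# File sizes in bytes, copied from the directory listing above.
listed_sizes = {
    "p95038813": 35639672,
    "p95038813.bu3": 23759816,
}

one = residue_bytes(EXPONENT)
for name, size in listed_sizes.items():
    count = round(size / one)    # nearest whole number of residues
    extra = size - count * one   # bytes left over beyond the residues
    print(f"{name}: about {count} residues + {extra} bytes")
```

Under that assumption, the 35,639,672-byte files come out as three residues plus 116 bytes, and the 23,759,816-byte files as two residues plus 112 bytes.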
Old 2020-06-18, 23:27   #35
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3×23×61 Posts

Quote:
Originally Posted by Prime95 View Post
Double-checking has always lagged first-time testing and the lag gets worse every year. Imagine if 90% of the first time tests did not need a DC? Double-checking could close the gap within a few years.
Some scenarios:
  1. Business as usual. More first tests are done than DCs, while the proof development continues and gets tested, and rolled out. (Months) The backlog builds. All of those first tests will need DC because the LL first tests are not subject to the proof, and the PRP first tests mostly won't have the proof done either, and any gpuowl users may be scattered in regard to power and topk choices on proof attempts.
  2. Some level of deemphasizing first tests and increasing the DC rate, starting soon. It might consist of "whatever makes the most sense" becoming equivalent to LL DC, if that's not already in place. Fewer first tests for a while, maybe even catch up some on DC. Eventually deemphasizing LL relative to PRP even more than now; maybe even limiting LL first time assignments entirely.
  3. Pick a proof power and topk set soon and ask gpuowl PRP testers to start using them routinely, and to save the proof files as much as practical. Combine with #2 above. Development of verification continues in parallel; verification of the early proofs occurs later.
  4. Eventually there's a go-live of PRP proof verification, with a backlog to catch up on for PRP gpuowl users, if what we end up with is compatible with earlier proof runs. Prime95/mprime and Mlucas get adapted too.
  5. Further out there is a mass conversion of clients, motivated by a lack of availability of first test assignments without PRP and proof generation capability.
A complicating factor is COVID19 leaving some systems inaccessible to users for application updating, administration, or ordinary operation. And that's likely to go on another year.

Re credit, I suggest proof and verification by users count as some moderate multiple above what the same number of hours would get traditionally for LL or PRP, to encourage adoption. Remember that some hardware can't do PRP currently but can do LL or TF or P-1, and TF/other is a large ratio already for gpus.

A DC backlog measure versus DC wavefront or year is posted at https://www.mersenneforum.org/showpo...4&postcount=15
Substantial adoption of PRP with proof, and timely verification, would not only help cut the backlog; it could also reduce the workload for the strategic double and triple check effort. It would also offer the possibility of quicker feedback about client reliability, in a time frame where reliability issues can be addressed, rather than leaving more bad runs to accumulate in the database and be found bad several years later.

Last fiddled with by kriesel on 2020-06-18 at 23:33
Old 2020-06-19, 11:57   #36
ATH
Einyen
 
Dec 2003
Denmark

5502₈ Posts

Quote:
Originally Posted by ewmayer View Post
Anyhow, try your compression magic on either an Mlucas or gpuowl savefile, you'll see the expected "maximal entropy, effectively no redundancy" result.
Yeah you are correct, the smaller files do not compress. It was probably because the big file had the same residue several times.
Old 2020-06-19, 12:06   #37
axn
 
Jun 2003

13×359 Posts

Quote:
Originally Posted by ewmayer View Post
These are Prime95/mprime checkpoint files? Because that's way bigger than needed for that exponent, even with 2 full-length PRP residues - 1 for the PRP tests, 1 for the Gerbicz-check residue - taken into account.
You need 3 in the worst case -- current iteration, GEC base, and GEC cumulative product. At least, that's what I did for the cudaWagstaff stuff.
Old 2020-06-19, 18:46   #38
ewmayer
2ω=0
 
Sep 2002
República de California

26204₈ Posts

Quote:
Originally Posted by ATH View Post
Yeah you are correct, the smaller files do not compress. It was probably because the big file had the same residue several times.
IIRC George's checkpoint files don't use bytewise 'compressed' residues, i.e. there is some 0-bits fat in there.

Quote:
Originally Posted by axn View Post
You need 3 in the worst case -- current iteration, GEC base, and GEC cumulative product. At least, that's what I did for the cudaWagstaff stuff.
I init the GEC product to the PRP-test seed, 3 - is there any good reason to do otherwise?
Old 2020-06-19, 19:58   #39
R. Gerbicz
 
"Robert Gerbicz"
Oct 2005
Hungary

3⁴×17 Posts

Quote:
Originally Posted by ewmayer View Post
IIRC George's checkpoint files don't use bytewise 'compressed' residues, i.e. there is some 0-bits fat in there.



I init the GEC product to the PRP-test seed, 3 - is there any good reason to do otherwise?
If you want a flexible check, where L (the interval used for the check) is not fixed, then you need to save the base at the point where you restarted the check with a new L. You can restart at every error-checked residue, because restarting at t means only that for the new residue sequence
r(m+t)=base^(2^m) mod N will be true for base=r(t)=3^(2^t) mod N
The only change is that at the error check you need to multiply by base, and here base is in general big, not a small number like base=3. The overhead of this will be very small, just one mulmod per error check.
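
The restart identity above can be tried on a toy modulus. The sketch below is a simplified illustration, not code from any GIMPS client: it runs the check after every block of L squarings, uses the small Mersenne prime 2^61-1 as N, and the names gec_run and corrupt_at are hypothetical. The running product d is initialised to the base, so each check verifies d_new = base * d_old^(2^L) mod N, which is where the one extra multiply by base comes from.

```python
def gec_run(base, n_blocks, L, N, corrupt_at=None):
    """Square `base` n_blocks*L times mod N, checking after each block.

    Returns (residue, ok). `corrupt_at` is a hypothetical test hook that
    flips one bit of the residue at that iteration to mimic a hardware error.
    """
    base %= N
    u = base          # current residue, u(i) = base^(2^i) mod N
    d = base          # running product of u(0), u(L), u(2L), ...
    it = 0
    for _ in range(n_blocks):
        d_prev = d
        for _ in range(L):
            u = u * u % N
            it += 1
            if it == corrupt_at:
                u ^= 1  # simulated bit flip
        d = d_prev * u % N
        # Error check: d must equal base * d_prev^(2^L) mod N.
        if d != base * pow(d_prev, 1 << L, N) % N:
            return u, False
    return u, True


N = (1 << 61) - 1   # toy modulus, assumption for the demo
t, m = 20, 21

# First stretch: t = 4*5 squarings from the seed 3, with L = 5.
r_t, ok_first = gec_run(3, n_blocks=4, L=5, N=N)
# Restart at t with base = r(t) and a new L = 7 for m = 3*7 squarings.
r_end, ok_second = gec_run(r_t, n_blocks=3, L=7, N=N)
print(ok_first, ok_second, r_end == pow(3, 1 << (t + m), N))

# Same restart, but with a bit flip injected at the end of the second block.
_, ok_bad = gec_run(r_t, n_blocks=3, L=7, N=N, corrupt_at=14)
print(ok_bad)
```

The first line printed is True True True (both stretches pass and r(m+t) matches 3^(2^(m+t)) mod N), and the second is False, because the injected flip is caught by the check at the end of its block. A production check would run far less often than every block; checking every block here only keeps the sketch short.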
Old 2020-06-19, 20:07   #40
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

7013₁₀ Posts

Quote:
Originally Posted by ewmayer View Post
IIRC George's checkpoint files don't use bytewise 'compressed' residues, i.e. there is some 0-bits fat in there.
The files are "fat free".

After a GEC check, both matching GEC values are written to the save file. Why? If we only write one value and bit rot sets in after the GEC check and before the save file is written, then the save file is corrupt. Prime95 goes to great lengths to make sure there are always two GEC values so that corruption is near impossible.
Old 2020-06-19, 20:56   #41
ewmayer
2ω=0
 
Sep 2002
República de California

2²·7·11·37 Posts

Quote:
Originally Posted by Prime95 View Post
The files are "fat free".

After a GEC check, both matching GEC values are written to the save file. Why? If we only write one value and bit rot sets in after the GEC check and before the save file is written, then the save file is corrupt. Prime95 goes to great lengths to make sure there are always two GEC values so that corruption is near impossible.
I supplement the GEC residue written to the savefile with the same kind of auxiliary checksum I use for the PRP-test residue. In my case, for more or less historical reasons, that is the triplet of Selfridge-Hurwitz residues: R mod (2^64, 2^35-1, 2^36-1). The first is just the GIMPS Res64 and is all but useless, but the other 2 combine to give a greater than 1 in 2^70 check strength. If the same set of checksums computed on-the-fly from the residue read from the savefile mismatches the reference ones, we try the redundant secondary savefile. If that also mismatches, we can try the last-good-GEC savefile, written every 1M iterations. If that also mismatches, and our iteration count is > 10M, we can try the last every-10M-iter persistent savefile.

Per your recommendation, I also take great care to verify the integrity of the RAM-stored GEC residue used by the running program.
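
A minimal sketch of the checksum-and-fallback idea described above, for illustration only. It is not Mlucas code: the function names, the (label, residue, checksums) tuple layout, and the labels are assumptions; only the three moduli and the order of the fallbacks come from the post.

```python
# Moduli of the Selfridge-Hurwitz triplet: 2^64, 2^35-1, 2^36-1.
SH_MODULI = (1 << 64, (1 << 35) - 1, (1 << 36) - 1)


def selfridge_hurwitz(residue: int) -> tuple:
    """Return the residue reduced modulo each of the three moduli."""
    return tuple(residue % m for m in SH_MODULI)


def first_good(candidates):
    """Return (label, residue) for the first candidate whose recomputed
    checksums match the stored ones, or None if every candidate fails.

    `candidates` is an iterable of (label, residue, stored_checksums)
    in order of preference (hypothetical layout).
    """
    for label, residue, stored in candidates:
        if selfridge_hurwitz(residue) == stored:
            return label, residue
    return None


# Toy demonstration with a made-up residue and a simulated damaged primary file.
good = pow(3, 1 << 20, (1 << 127) - 1)
sums = selfridge_hurwitz(good)
damaged = good ^ (1 << 40)  # flip one bit to mimic on-disk corruption

candidates = [
    ("primary", damaged, sums),
    ("secondary", good, sums),
    ("last-good-GEC", good, sums),
]
print(first_good(candidates)[0])
```

In this toy run the primary copy fails its checksums and the secondary copy is selected. A real client would read each candidate from its own file, and the older savefiles would hold earlier iterations rather than the same residue.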

Last fiddled with by ewmayer on 2020-06-19 at 20:57
Old 2020-06-20, 00:02   #42
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1071₁₆ Posts

Quote:
Originally Posted by ewmayer View Post
I supplement the GEC residue written to the savefile with the same kind of auxiliary checksum I use for the PRP-test residue. In my case, for more or less historical reasons, that is the triplet of Selfridge-Hurwitz residues: R mod (2^64, 2^35-1, 2^36-1). The first is just the GIMPS Res64 and is all but useless, but the other 2 combine to give a greater than 1 in 2^70 check strength. If the same set of checksums computed on-the-fly from the residue read from the savefile mismatches the reference ones, we try the redundant secondary savefile. If that also mismatches, we can try the last-good-GEC savefile, written every 1M iterations. If that also mismatches, and our iteration count is > 10M, we can try the last every-10M-iter persistent savefile.

Per your recommendation, I also take great care to verify the integrity of the RAM-stored GEC residue used by the running program.
What does the errored bit pattern look like?
How often are 1M iterations lost as a result?
How often are up to 10M iterations lost as a result?

Last fiddled with by kriesel on 2020-06-20 at 00:02
Old 2020-06-20, 00:15   #43
ewmayer
2ω=0
 
Sep 2002
República de California

2²×7×11×37 Posts

Quote:
Originally Posted by kriesel View Post
What does the errored bit pattern look like?
How often are 1M iterations lost as a result?
How often are up to 10M iterations lost as a result?
o Don't know;
o Only ever happened on my notoriously flaky Haswell CPU, perhaps 1x per 100M iter, on average (max was 4 GEC failures on a ~104M expo, George did PRP-DC using his code, we matched);
o Never happened to me yet, over at least 50 PRP tests on Haswell, NUC and multiple Android broke-o-phones.
Old 2020-06-20, 03:01   #44
axn
 
Jun 2003

123B₁₆ Posts

Quote:
Originally Posted by ewmayer View Post
I init the GEC product to the PRP-test seed, 3 - is there any good reason to do otherwise?
In which case, you'd need to save the current iteration, the last verified GEC check (for rolling back), and the current GEC cumulative product. Still 3 needed.

I'm wondering now how you're managing with just two?

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.