mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   Archiving the halfway point in a primality test. (https://www.mersenneforum.org/showthread.php?t=8586)

jasong 2007-07-04 06:29

Archiving the halfway point in a primality test.
 
Hard drive space is getting super-cheap. I went to tigerdirect.com, looked for the biggest hard drive on the main list (750 GB) and did a little math. The 750 GB hard drive sells for about $240.00. That's about 33.5 MB for a penny. A Prime95 first-pass save file is about 6-7 MB, so you could store about 4-5 checkpoint files in a penny's worth of space.
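For what it's worth, the arithmetic above can be redone in a few lines (this assumes decimal gigabytes and an average file size of 6.5 MB; the exact MB-per-penny figure shifts a little depending on whether you count GB or GiB):

```python
price_cents = 240 * 100               # $240 drive, in pennies
capacity_mb = 750 * 1000              # 750 GB, decimal megabytes
mb_per_penny = capacity_mb / price_cents
files_per_penny = mb_per_penny / 6.5  # assuming ~6.5 MB per save file

print(round(mb_per_penny, 1))         # ~31.2 MB per penny
print(round(files_per_penny, 1))      # ~4.8, i.e. 4-5 files per penny
```

Either way, the conclusion stands: several checkpoint files fit in a penny's worth of disk.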

Here's my idea, take it or leave it. I haven't actually crunched the project in a while; this is just me brainstorming. First, take each exponent that's about to be tested and divide it by some fixed number, like 75,000,000 or 50,000,000, rounding the result up. This is the number of checkpoints you would do. Then add 1, AFTER the rounding, to get the number of segments. So if you were doing 42,000,000 with a divisor of 50,000,000, your answer would be 0.84, which rounds up to 1, and adding 1 gives 2. So you mentally divide the exponent by 2 (in this case 21,000,000), and you'd have exactly one checkpoint, at iteration 21,000,000. When the checkpoint is reached, the save file, along with its checksum, gets uploaded to a server and kept for when someone runs the double-check on that exponent.
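The checkpoint arithmetic described above could be sketched like this (the function name and chunk size are just illustrative):

```python
import math

def checkpoint_iterations(exponent, chunk=50_000_000):
    """Interior checkpoint positions for a test of 2^exponent - 1,
    per the scheme above: divide the exponent by a fixed chunk size,
    round up, add one to get the segment count, then place a
    checkpoint at each segment boundary."""
    segments = math.ceil(exponent / chunk) + 1
    step = exponent // segments
    # Only interior boundaries; the final iteration is the normal result.
    return [step * i for i in range(1, segments)]

print(checkpoint_iterations(42_000_000))  # [21000000]
```

For a 42,000,000-digit exponent this reproduces the single checkpoint at 21,000,000 worked out in the post; a 120,000,000 exponent would get checkpoints at 30M, 60M, and 90M.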

But here's where the "magic" is. What if someone runs the double-check and the results DON'T match at the halfway point? Normally they wouldn't even realize it; the double-check would run to the end, nobody would have a clue which result was right, and it would take a minimum of three full tests to straighten things out. But now we have the halfway point of both tests. So we store both save files in the database and start a third test from the very beginning. (The first test has been run fully; the second is only halfway done but can be resumed at any time.) When the triple check reaches the checkpoint, a comparison is made, and hopefully it matches one of the other two tests; whichever one it matches, we know the other is bogus and can be discarded. If it matches the first, the third test is continued and hopefully matches the first at the end; if it finishes and doesn't match, it can be restarted from the halfway point, because we know for a fact the work up to there was correct. If the third test reaches the halfway point and matches the second instead, the first test is discarded, and BOTH remaining tests are taken up by two different people, each starting from the save file.
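The decision logic at the halfway checkpoint can be condensed into a small sketch (the function and the returned action strings are hypothetical, just to make the three branches explicit):

```python
def triage_at_halfway(first_half, second_half, third_half):
    """Given the halfway-point results of the first test, the
    mismatching double-check, and the fresh triple check, decide
    which earlier run to trust, per the scheme described above."""
    if third_half == first_half:
        return "discard second; continue third and compare to first at the end"
    if third_half == second_half:
        return "discard first; resume both matching runs from the save file"
    return "no two runs agree; start over from scratch"
```

For example, `triage_at_halfway("a", "b", "b")` returns the "discard first" action.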

For those reading this, you either understand what I'm saying and are able to mentally fill in the rest, or you're confused. Well, at the moment, it's very late at night and I need to get to bed. I'll be able to respond to this topic in about 14 hours, after I get home from work.

Mini-Geek 2007-07-04 12:16

I like the idea, but a file that big is a lot to upload and download, especially if the person is on dial-up. My current exponent is in the 38xxxxxx's, and its backup files are 8 MB, but they compress easily (.7z fast compression, taking 6 seconds on an Athlon XP 1800+ @ 1610 MHz) to ~5.15 MB. I'm a gamer on a shared internet connection, and I sure wouldn't want it to start uploading or downloading a 5 MB file while I'm in a game or browsing the web. Uploading or downloading while browsing makes things go very noticeably slower, and it completely kills the game.
For a person on dial-up, it would take over 17 minutes of being connected to be able to upload or download a 5 MB file, along with having the above problems.
For someone not connected (who might get their work through the manual forms on one connected computer), this idea becomes almost completely infeasible. ("Almost" because it would be possible to transfer the file to the connected computer and have that computer upload it.)

I like the idea in theory, but there are just too many problems with it to implement.
Also, the benefits aren't very large: in the ~1-5% of exponents that have errors (would it double by having two computers, then halve because it's at the halfway point? I'm not sure; I'm not very knowledgeable about probability), it would only eliminate part of the extra time of a triple-check (if I understand it correctly).
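The "over 17 minutes" figure above is consistent with an effective dial-up throughput of roughly 40 kbit/s (an assumption on my part; nominal 56k modems rarely sustain their rated speed):

```python
file_bits = 5 * 2**20 * 8   # a 5 MB (binary) file, in bits
dialup_bps = 40_000         # assumed effective dial-up throughput
minutes = file_bits / dialup_bps / 60

print(round(minutes, 1))    # ~17.5 minutes
```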

Xyzzy 2007-07-04 12:42

Why send the save file?

Just send the interim checksums after every million iterations.

hhh 2007-07-04 13:12

[QUOTE=Xyzzy;109578]Why send the save file?

Just send the interim checksums after every million iterations.[/QUOTE]

Imagine the interim checksums match but the last one doesn't. You still need a full triple test. If you have the file, you save half a test.

BUT: the file isn't needed the first time around. If we start saving the checksums now, then in a few years, when the double-checks are performed, bandwidth will no longer be an issue, and the files can be saved then.

Right?

Yours H.

akruppa 2007-07-04 13:27

The idea was discussed on the Mersenne mailing list a long time ago. Send 64-bit residues to the server periodically, say every 1M iterations, during the first-time test. During the double-check, compare these residues; if a residue doesn't match, we know that at least one run must have had an error. A third test can be started to see which (if either) of the first two tests was correct. If (1) the first and third runs' residues agree, carry on with the third run and keep comparing residues against the first. If (2) the second and third runs' residues agree, continue both. If (3) none of the residues agree, scrap everything and start over from the beginning with a new first test, perhaps with a higher FFT length.
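A minimal sketch of the comparison step, assuming interim 64-bit residues reported every 1M iterations (the helper names are made up for illustration):

```python
def first_mismatch(res_a, res_b):
    """Index of the first interim residue where two runs disagree,
    or None if every residue reported so far matches."""
    for i, (a, b) in enumerate(zip(res_a, res_b)):
        if a != b:
            return i
    return None

def classify_third_run(r1, r2, r3):
    """The three cases above, at a given checkpoint:
    1 -> third matches first: abort the erroneous double check early
    2 -> third matches second: continue both runs
    3 -> no agreement: start over, perhaps with a larger FFT length"""
    if r3 == r1:
        return 1
    if r3 == r2:
        return 2
    return 3
```

So `first_mismatch` flags the earliest checkpoint where the double-check diverged, and `classify_third_run` decides what to do once the triple check reaches it.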

The advantage is that in case (1), the erroneous double check can be aborted early.

From what I remember, everyone agreed it was a good idea, but AFAIK it never got implemented in the server.

Alex

jasong 2007-07-04 20:01

Even if this idea is only partially implemented (maybe people on phone lines would upload only residues, while people with faster lines would upload the whole file), it would still be very useful. Assuming 1 in 100 tests gets an error, that's about a nickel's worth of hard drive space to save a few dollars' worth of electricity.

Edit: correction, about 25 cents worth of hard drive space. That's what, 10-15 Euros?

Edit2: Damn, apparently I need to practice mentally applying decimal points.

tha 2007-07-04 20:21

[QUOTE=jasong;109612]
Edit: correction, about 25 cents worth of hard drive space. That's what, 10-15 Euros?[/QUOTE]

25 US cents is about 20 Euro cents.

tha 2007-07-04 20:34

I proposed this idea half a decade ago or so. I suggested it because a substantial number of exponents are reported to the server as 50%, 80% or more complete and then never finished. These exponents then get reassigned. As far as I remember, the v4 server's reports may not reflect reality on this point, and uploading all intermediate results was considered too much traffic.

After the v5 server has had its first year of operation, it may be worth reconsidering, now that most PCs are connected via ADSL.

davieddy 2007-07-07 10:38

We are talking 40 million bits here (= 5 MB), aren't we? Surely this is worth doing these days.
The only questions are:
a) How often?
b) By whom?

David

davieddy 2007-07-07 11:08

[quote=akruppa;109581]
...
If (3) none of the residues agree, scrap everything and start over from the start with a new first test, perhaps with a higher FFT length.
...
From what I remember, everyone agreed that it was a good idea, but afaik it never got implemented in the server.

Alex[/quote]

Re FFT size, the boundaries are on the conservative side of optimal: George et al don't want the error rate (due to FFT size) to approach 20%, because "I don't want to tell someone he missed a prime because of an avoidable error".

I sympathise with his point of view.

David

