mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Hardware (https://www.mersenneforum.org/forumdisplay.php?f=9)
-   -   Best use of large capacitor server (https://www.mersenneforum.org/showthread.php?t=20423)

Mark Rose 2015-08-20 16:54

[QUOTE=airsquirrels;408401]I have noticed that a ton of CPU cycles in GIMPS get wasted on half completed assignments by users that abandon, even amongst some of my coworkers who I convinced to run prime95 on their machines... Until they stopped. In 2015 with fast and very present Internet wouldn't it be possible to actively checkpoint to PrimeNet so other users could pick up where an exponent was left off? Just wait till the residue for a specific iteration is small and upload a checkpoint :)[/QUOTE]

Yes, it would be possible. The checkpoint files aren't that big -- a few MB apiece. The problem is one of resources: finding someone to code the necessary changes and paying for the TBs of storage.

Mark Rose 2015-08-20 17:00

[QUOTE=airsquirrels;408387]That said, if I can get under full steam (~24GhzDay/Day) three weeks of work should nearly clear the DCTF pile to current release levels. Of course that all depends on adequately addressing the power, cooling, and supporting infrastructure issues....[/QUOTE]

There is about 9 PHz days of DCTF work. Not all of it is being held by GPU72. Even at 30 THz-d/d, that's still going to take 10 months... but it would be pretty awesome to have that cleared in a year!
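
The 10-month figure follows from simple division. A minimal sketch using only the numbers quoted in this post (9 PHz-days taken as 9000 THz-days, and a constant 30 THz-d/d); the function name is made up for illustration:

```python
def days_to_clear(backlog_thz_days: float, rate_thz_days_per_day: float) -> float:
    """Days needed to clear a backlog at a constant throughput."""
    return backlog_thz_days / rate_thz_days_per_day


# 9 PHz-days = 9000 THz-days of work at 30 THz-d/d.
print(days_to_clear(9000, 30))  # 300.0 days, roughly 10 months
```

The result scales linearly with the backlog estimate, so a smaller estimate shortens the time proportionally.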

Prime95 2015-08-20 17:01

[QUOTE=airsquirrels;408401]I have noticed that a ton of CPU cycles in GIMPS get wasted on half completed assignments by users that abandon, even amongst some of my coworkers who I convinced to run prime95 on their machines... Until they stopped. In 2015 with fast and very present Internet wouldn't it be possible to actively checkpoint to PrimeNet so other users could pick up where an exponent was left off?[/QUOTE]

There is a downside to implementing this. The quality of the result is only as good as the worst computer to work on the LL test. If an overclocker puts a few million low-quality iterations in, then a highly reliable machine may waste tens of millions of iterations finishing the LL test.

VictordeHolland 2015-08-20 17:01

[QUOTE=airsquirrels;408401]Just wait till the residue for a specific iteration is small and upload a checkpoint :)[/QUOTE]
You don't have to wait for a specific iteration, you can save the residue at every iteration, but to limit I/O writes it is usually done only every 5-15 minutes.
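
A minimal sketch of periodic checkpointing in a Lucas-Lehmer loop, assuming plain Python integers rather than Prime95's FFT arithmetic. The JSON layout, the names `ll_test` and `save_checkpoint`, and the iteration-count interval are all hypothetical choices for illustration; Prime95 has its own save-file format and saves on a time interval.

```python
import json
import os


def save_checkpoint(path, p, iteration, residue):
    """Write to a temp file, then rename, so a crash never leaves a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"p": p, "iteration": iteration, "residue": hex(residue)}, f)
    os.replace(tmp, path)


def ll_test(p, checkpoint_path=None, checkpoint_every=1000):
    """Lucas-Lehmer test of 2**p - 1 for an odd prime p, resuming from a checkpoint if present."""
    m = (1 << p) - 1
    s, done = 4, 0  # current residue and number of completed iterations
    if checkpoint_path and os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
        if state["p"] == p:  # ignore checkpoints written for another exponent
            s, done = int(state["residue"], 16), state["iteration"]
    while done < p - 2:
        s = (s * s - 2) % m
        done += 1
        if checkpoint_path and done % checkpoint_every == 0:
            save_checkpoint(checkpoint_path, p, done, s)
    return s == 0
```

The exponent, the iteration count, and the residue are enough to resume on any machine, which is why the residue can be saved at any iteration.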

airsquirrels 2015-08-20 17:03

[QUOTE=Mark Rose;408404]Yes, it would be possible. The checkpoint files aren't that big -- a few MB apiece. The problem is one of resources: finding someone to code the necessary changes and paying for the TBs of storage.[/QUOTE]

Hmm, well one problem at a time I suppose. It would also add a pretty significant benefit in that double checks could fail as soon as they don't match a checkpoint instead of needing a complete run. Last I heard a significant percentage still fail, so that's a not so insignificant amount of resources. Ideally that would lead to less DC backlog and also the ability for those 'slower' computers to contribute to the LL by advancing exponents the same way we do with TF, one iteration level at a time.

I know I'm new here so I don't want to overreach, but I'm just as happy to contribute code and storage when the time comes.

airsquirrels 2015-08-20 17:08

[QUOTE=Prime95;408406]There is a downside to implementing this. The quality of the result is only as good as the worst computer to work on the LL test. If an overclocker puts a few million low-quality iterations in, then a highly reliable machine may waste tens of millions of iterations finishing the LL test.[/QUOTE]

One (perhaps unpopular) way to mitigate this would be real-time DC, with each iteration group assigned to two users. The benefit of catching and correcting errors early would probably save enough resources to be worth it. How many resources are wasted when the first low-quality computer completes weeks of work on an exponent, when its mistake could have been detected within the first few days?

I imagine it would also be easier to quickly flag and quarantine bad actors in that case.
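
A rough sketch of that idea: two runs of the same exponent report a residue at fixed iteration intervals, and the first disagreement pins the error down to one interval. The function names, the `fault_at` parameter (a simulated single-bit hardware error), and the small exponent are assumptions for illustration, not anything PrimeNet implements.

```python
def ll_residues(p, every, fault_at=None):
    """Yield (iteration, residue) every `every` iterations of the LL sequence mod 2**p - 1.

    If fault_at is set, flip the lowest bit of the residue after that
    iteration to simulate a hardware error.
    """
    m = (1 << p) - 1
    s = 4
    for i in range(1, p - 1):  # iterations 1 .. p-2
        s = (s * s - 2) % m
        if i == fault_at:
            s ^= 1
        if i % every == 0:
            yield i, s


def first_mismatch(run_a, run_b):
    """Return the first checkpoint iteration where two runs disagree, or None if they match."""
    for (i, a), (j, b) in zip(run_a, run_b):
        if i != j:
            raise ValueError("runs are not checkpointed at the same iterations")
        if a != b:
            return i
    return None


good = ll_residues(127, every=10)
bad = ll_residues(127, every=10, fault_at=37)
print(first_mismatch(good, bad))  # 40: the first checkpoint after the fault
```

A mismatch only says that one of the two runs is wrong, not which one, so a third run of that interval would still be needed to decide.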

airsquirrels 2015-08-20 17:09

[QUOTE=VictordeHolland;408407]You don't have to wait for a specific iteration, you can save the residue at every iteration, but to limit I/O writes it is usually done only every 5-15 minutes.[/QUOTE]

Sorry for the triple reply. My thought on waiting was that at some points the residue is going to be much smaller than at others, saving storage and bandwidth. Perhaps that is a premature optimization.

chalsall 2015-08-20 17:21

[QUOTE=Mark Rose;408405]There is about 9 PHz days of DCTF work. Not all of it is being held by GPU72. Even at 30 THz-d/d, that's still going to take 10 months... but it would be pretty awesome to have that cleared in a year![/QUOTE]

771.7 THz Days by my calculations.

This includes everything still to be worked, not just that held by GPU72.

That being said, it's been interesting watching the [URL="https://www.gpu72.com/reports/estimated_completion/primenet/"]Estimated Days to Complete Trial Factoring for all Candidates[/URL] drop precipitously in the DCTF table the last few days! :smile:

chalsall 2015-08-20 17:30

[QUOTE=airsquirrels;408412]Sorry for the triple reply. My thought on waiting was that at some points the residue is going to be much smaller than at others, saving storage and bandwidth.[/QUOTE]

Likely also wrong.

My understanding is the true residue is about as close to true noise as you can get. Read: incompressible.

(Unless, of course, the candidate is an MP, at which point at the very last step it's *very* compressible!)

Mark Rose 2015-08-20 17:36

[QUOTE=chalsall;408416]771.7 THz Days by my calculations.[/quote]

[url]http://imgur.com/szouqSx[/url]

[quote]
That being said, it's been interesting watching the [URL="https://www.gpu72.com/reports/estimated_completion/primenet/"]Estimated Days to Complete Trial Factoring for all Candidates[/URL] drop precipitously in the DCTF table the last few days! :smile:[/QUOTE]

Yeah, it was 1400 not long ago!

airsquirrels 2015-08-20 17:43

[QUOTE=chalsall;408418]Likely also wrong.

My understanding is the true residue is about as close to true noise as you can get. Read: incompressible.

(Unless, of course, the candidate is an MP, at which point at the very last step it's *very* compressible!)[/QUOTE]

Well, there is certainly some debate as to how truly chaotic the residue sequence is, but for practical purposes, and until we advance the state of the theory there, we can treat it as a truly random number between 1 and 2^p-2. Any specific iteration (10,000, etc.) is going to be essentially random, but there will be points in the sequence where the residue is much closer to 1, and thus would take fewer bits to store. The more I think about this, the less it seems worth the effort to optimize for those opportunities to checkpoint cheaply.
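
A back-of-the-envelope sketch of why the saving is small, assuming the residue really does behave like a uniform random value (the model described above, not a proven property). The function names are made up for illustration. Under that model, a residue that saves at least k bits turns up with probability of about 2^-k, so over roughly p iterations the best saving expected to occur even once is about log2(p) bits.

```python
import math
from fractions import Fraction


def bits_saved(residue, p):
    """Bits saved versus a full p-bit field when the residue is stored minimally."""
    return p - residue.bit_length()


def fraction_saving_at_least(p, k):
    """Fraction of residues in [0, 2**p - 2] that save at least k bits (0 <= k <= p)."""
    total = (1 << p) - 1
    count = min(1 << (p - k), total)  # residues below 2**(p - k)
    return Fraction(count, total)


def best_expected_saving_bits(p):
    """Largest saving expected about once in ~p uniform draws: log2(p) bits."""
    return math.log2(p)


print(fraction_saving_at_least(7, 1))  # 64/127, just over one half
print(best_expected_saving_bits(70_000_000))
```

For an illustrative exponent of 70 million (an assumed size, not a figure from this thread), that comes to roughly 26 bits, a few bytes out of a checkpoint of several MB.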


All times are UTC.