mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing
2017-07-26, 10:45   #12
henryzz ("David", Sep 2007, Liverpool (GMT/BST))

Quote:
Originally Posted by retina View Post
Torture tests should only be on known verified results. So torture tests need to be redundant, and therefore useless in terms of progressing the "goal" forward.
If a mismatch occurs, then the torture-testing PC can run a triple-check with a different FFT length.
2017-07-26, 13:27   #13
CRGreathouse (Aug 2006)

Quote:
Originally Posted by henryzz View Post
I suppose torture tests could actually be doing useful doublechecks.
Quote:
Originally Posted by retina View Post
Torture tests should only be on known verified results. So torture tests need to be redundant, and therefore useless in terms of progressing the "goal" forward.
Torture tests could be non-redundant, but then ~5% of them would return "indeterminate" because the residues wouldn't match and it wouldn't be clear which was wrong.

Edit: henryzz beat me to it.
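The arithmetic behind that ~5% can be sketched quickly. Assuming each run independently produces a bad residue with some probability, and any error yields a distinct wrong residue, a pair of runs disagrees unless both are correct; the per-run rate used below is illustrative, not a measured GIMPS figure.

```python
# If each run independently produces a bad residue with probability p
# (and any error yields a distinct wrong residue), a pair of runs
# disagrees unless both runs are correct.
def pair_mismatch_rate(p):
    return 1 - (1 - p) ** 2

# e.g. a 2.5% per-run error rate gives roughly 5% indeterminate pairs
print(f"{pair_mismatch_rate(0.025):.3f}")  # 0.049
```

For small p this is approximately 2p, which is why the fraction of indeterminate pairs is about double the per-run error rate.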

Last fiddled with by CRGreathouse on 2017-07-26 at 13:28
2017-07-26, 15:56   #14
Madpoo (Jul 2014)

Quote:
Originally Posted by Mark Rose View Post
If 5% are bad, and we do a triple check 5% of the time, that's only 5/205 or 2.4% work saved, no?
LOL... look, I didn't say I worked out all the math exactly.

Actually, I had one of those "late at night" moments where I thought back to this post and realized just what you mentioned: the increased throughput isn't really 5% but roughly half that, or less, since, as you note, the double-checks are sometimes wrong too, leading to quadruple+ checks just to get two that finally match.

Anyway, the point was: in theory it's good, but in practice there needs to be a balance, making sure results are not only accurate but also legitimate.
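Mark Rose's 5/205 figure works out as follows. This is just a sketch of the quoted back-of-envelope arithmetic, not a full model of how checks would actually be scheduled:

```python
# Back-of-envelope: per 100 exponents, first tests plus double-checks
# cost 200 runs; a 5% mismatch rate adds 5 triple-checks, for 205 runs
# total. If torture tests absorbed those re-runs, only 5 of the 205
# runs are saved.
exponents = 100
base_runs = 2 * exponents         # first test + double-check each
triple_checks = 0.05 * exponents  # mismatched pairs needing a third run
total_runs = base_runs + triple_checks
saving = triple_checks / total_runs
print(f"{saving:.1%}")  # 2.4%
```

As Madpoo concedes, the real saving is smaller still once some triple-checks themselves mismatch and require fourth runs.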
2017-07-26, 16:08   #15
Madpoo (Jul 2014)

Quote:
Originally Posted by ewmayer View Post
It's also easy in theory to envision how one might do this in practice: the first-time tester's interim checkpoint files (say every 10M iterations) get deposited on the Primenet server, and the DCer's Res64s at those same iterations get diffed against those, etc. But that would require a massive increase in both storage capacity at the server end and comms bandwidth between the users' machines and the server. The low-bandwidth alternative would be to let the interim CP files remain on the users' machines and have the server direct the restart-both-runs-from-last-matching-CP mechanism, but I shudder to think of the effort needed to support that functionality, not to mention the myriad real-world reasons why it would be inherently fragile.
If we had a system where pairs of users were working on the same exponent, they could send periodic 64-bit residues like the final one, but in increments of every 5% of progress or something like that. It would also save that interim file in case a roll back is needed to one of those points.

If at any point they mismatched (during their daily checkins maybe the server informs them of a mismatch at iteration XXX) they roll back to that point and try again.

The trouble is, pairs of users on the same exponent are unlikely to progress at the same rate, so you could have one system racing ahead and finishing entirely by the time the other one even hits 10%. Or maybe the other system stops reporting entirely.

You'd have to also build in some kind of system that tries to evenly match users and systems to mitigate that, but even the most regular contributor sometimes takes a system down for maintenance or has a failure of some kind. I noticed that many of Curtis' machines have stopped reporting in (I noticed because he had a bunch reserved in the category 0 first-time-check ranges).

In terms of making sure the horsepower is evenly matched, it makes the most sense to have the same user doing both checks, but again we get into that question of how to keep out the bogus results.

I haven't really thought about it much, but it seems like maybe there would be a way for systems to report in a partial residue at some lower iteration count. Then if we had reason to suspect something was odd, we could run up to that point and compare that residue just to see if they even bothered going that far. Someone trying to cheat ... would they really run it 10% of the way or would they just make stuff up entirely? Who knows. People are weird, but we could probably count on most of them being lazy enough to not only cheat, but to do it in the laziest way possible.
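The rollback scheme described above can be sketched server-side. Everything here, the dict-of-checkpoints shape, the residue values, the function name, is illustrative and not Primenet's actual API:

```python
# Server-side sketch: each runner reports a 64-bit residue (hex string)
# at periodic checkpoints (iteration -> res64). Find the earliest common
# checkpoint where the two runs disagree, so both can roll back to the
# last matching one.
def first_mismatch(run_a, run_b):
    for it in sorted(run_a.keys() & run_b.keys()):
        if run_a[it] != run_b[it]:
            return it
    return None  # all common checkpoints agree

a = {1_000_000: "ab12cd34ef567890", 2_000_000: "0011223344556677"}
b = {1_000_000: "ab12cd34ef567890", 2_000_000: "deadbeefdeadbeef"}
print(first_mismatch(a, b))  # 2000000: roll both back to 1,000,000
```

Comparing only common checkpoints also sidesteps the pacing problem Madpoo raises: a faster runner's extra checkpoints simply wait until the slower runner catches up.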
2017-07-26, 16:41   #16
GP2 (Sep 2003)

If cloud computing eats the world, and everyone eventually starts using it instead of home server farms, then we'll be in an interesting situation, because those cloud servers have an observed error rate of zero with mprime (certainly bounded below 0.05% in any case, based on several thousand error-free double-checked residues on AWS so far). Those cloud servers have ECC memory, strictly controlled ambient temperatures in the data centers, and they are routinely taken out of service whenever automated monitoring detects that they have started acting up.

In that hypothetical future it simply won't make sense to double-check exponents that were first-time checked using the cloud, because the payoff will be so low. Except that, to prevent any sort of fraud or shenanigans, the actual LL testing would probably need to be centralized, so it would be certifiably known that the LL tests were performed on an approved server type in an approved cloud. So people would pay some amount in $ or € to mersenne.org and purchase a batch of, say, 10 LL tests at a time. Would that still be as much fun, or too passive and boring? Hard to say.

Last fiddled with by GP2 on 2017-07-26 at 16:56
2017-07-26, 19:32   #17
S485122 ("Jacob", Sep 2006, Brussels, Belgium)

Quote:
Originally Posted by GP2 View Post
... So people would pay ... and purchase ... Would that still be as much fun, or too passive and boring? Hard to say.
"Hard to say?"
It would immediately stop me from wasting energy on a project whose only, but huge, merit is fun and cooperation: both would be replaced by sending money to some big corporation.
Would I be alone?

Jacob

P.S.: Another thing: what data did you use to produce that 0.05% error rate for cloud computing?

P.P.S.: I suppose that by paying some more money one would get a positive result: one of the exponents that yields a prime? Oh! What fun! What a pleasure!

Last fiddled with by S485122 on 2017-07-26 at 19:38
2017-07-27, 01:09   #18
GP2 (Sep 2003)

Quote:
Originally Posted by S485122 View Post
P.S.: Another thing: what data did you use to produce that 0.05% error rate for cloud computing?
0.05% would be 1 in 2000.

I've personally run about 3000 double-checks with zero errors so far; every mismatch with the original test was resolved in my favor, except the most recent ten or so, which are still pending.

Also, a few years ago, a now-dormant account named "Amazon EC2" ran about 27000 LL tests. The vast majority remain unverified, but about 2900 so far have been verified by double-checking. The account name strongly suggests all those LL tests were run on the cloud, and the sheer quantity suggests some entity with large financial resources, whose name might even start with Ama and rhyme with zon.

So that 0.05% is actually just a conservative upper bound. The actual error rate would be considerably lower still.
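That bound can be sanity-checked with the "rule of three": zero errors observed in n independent trials puts the 95% upper confidence limit on the error rate at roughly 3/n. This assumes the checks are independent, which is a simplification:

```python
# "Rule of three": with zero errors observed in n independent trials,
# the true error rate is below about 3/n with 95% confidence.
def upper_bound(n):
    return 3 / n

n = 3000 + 2900  # GP2's own double-checks plus verified "Amazon EC2" results
print(f"{upper_bound(n):.4%}")  # ~0.0508%, consistent with the 0.05% figure
```

As more of the unverified "Amazon EC2" results get double-checked without errors, n grows and the bound tightens proportionally.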
2017-07-27, 01:29   #19
ewmayer (Sep 2002, República de California)

Quote:
Originally Posted by GP2 View Post
Also, a few years ago, a now-dormant account named "Amazon EC2" ran about 27000 LL tests. The vast majority remain unverified, but about 2900 so far have been verified by double-checking. The account name strongly suggests all those LL tests were run on the cloud, and the sheer quantity suggests some entity with large financial resources, whose name might even start with Ama and rhyme with zon.
This lady? Just curious, what pointed you in her direction?
2017-07-27, 21:11   #20
Madpoo (Jul 2014)

Quote:
Originally Posted by GP2 View Post
0.05% would be 1 in 2000.

I've personally run about 3000 double-checks with zero errors so far; every mismatch with the original test was resolved in my favor, except the most recent ten or so, which are still pending.

Also, a few years ago, a now-dormant account named "Amazon EC2" ran about 27000 LL tests. The vast majority remain unverified, but about 2900 so far have been verified by double-checking. The account name strongly suggests all those LL tests were run on the cloud, and the sheer quantity suggests some entity with large financial resources, whose name might even start with Ama and rhyme with zon.

So that 0.05% is actually just a conservative upper bound. The actual error rate would be considerably lower still.
A few notes...

I've had the pleasure of double-checking some of those "Amazon EC2" results and they've all matched (part of my thing to make sure each computer has at least one DC'd result in its history).

So, based on the results so far, it's safe to say that ECC, no overclocking, and climate-controlled environments make a big difference in reliability. I see that on my own systems too: no bad results (due to hardware) over thousands and thousands of double/triple+ checks.

Another note: most people participating in GIMPS are using their desktops, not servers. While cloud computing may be on a tear lately (will it be sustained, or is it a fad or a long-term niche?), I don't think people will be swapping out their home machines (or work desktops, school computers, etc.) for any typical kind of cloud solution.

I could be wrong... I've been wrong before when making bold predictions (I predicted Facebook stock would hover in the low $20s). I just have a hard time thinking society is ready to re-adopt the "dumb terminal" paradigm where simple equipment was connecting to some huge mainframe to do the heavy lifting.

I think the cloud server market is kind of weak and lazy personally, but then I'm biased about that. I just can't figure out why someone would pay more money to use someone else's computers when you can do it for half the cost (or get twice+ as much power for the same price) if you roll your own. Whatever... some people don't change their own oil either, or mow their own lawns.
2017-07-27, 21:35   #21
Mark Rose ("/X\(‘-‘)/X\", Jan 2013, https://pedan.tech/)

Quote:
Originally Posted by Madpoo View Post
I think the cloud server market is kind of weak and lazy personally, but then I'm biased about that. I just can't figure out why someone would pay more money to use someone else's computers when you can do it for half the cost (or get twice+ as much power for the same price) if you roll your own. Whatever... some people don't change their own oil either, or mow their own lawns.
1. Capital efficiency. The less money I have to raise to pay up-front for hardware, the less dilution I suffer.

2. Unknown future hardware requirements. The cloud essentially lets you pick very short lease terms.

3. Bursty hardware requirements.

4. Outsourcing non-expertise. I don't want to run multiple data centers, a CDN, etc.
2017-07-27, 21:42   #22
kriesel ("TF79LL86GIMPS96gpu17", Mar 2017, US midwest)
Interim 64-bit residue checks via Primenet for very long runs

Quote:
Originally Posted by Madpoo View Post
If we had a system where pairs of users were working on the same exponent, they could send periodic 64-bit residues like the final one, but in increments of every 5% of progress or something like that. It would also save that interim file in case a roll back is needed to one of those points.

If at any point they mismatched (during their daily checkins maybe the server informs them of a mismatch at iteration XXX) they roll back to that point and try again.

The trouble is, pairs of users on the same exponent are unlikely to progress at the same rate, so you could have one system racing ahead and finishing entirely by the time the other one even hits 10%. Or maybe the other system stops reporting entirely.

You'd have to also build in some kind of system that tries to evenly match users and systems to mitigate that, but even the most regular contributor sometimes takes a system down for maintenance or has a failure of some kind. I noticed that many of Curtis' machines have stopped reporting in (I noticed because he had a bunch reserved in the category 0 first-time-check ranges).

In terms of making sure the horsepower is evenly matched, it makes the most sense to have the same user doing both checks, but again we get into that question of how to keep out the bogus results.

I haven't really thought about it much, but it seems like maybe there would be a way for systems to report in a partial residue at some lower iteration count. Then if we had reason to suspect something was odd, we could run up to that point and compare that residue just to see if they even bothered going that far. Someone trying to cheat ... would they really run it 10% of the way or would they just make stuff up entirely? Who knows. People are weird, but we could probably count on most of them being lazy enough to not only cheat, but to do it in the laziest way possible.
Interesting thread. The QA team for Prime95 versions 18 through 20 faced similar issues years ago: long run times and mismatched equipment speeds, though on smaller exponents and slower hardware.

QA team members were volunteers. Each of us was paired with another member running the same exponents at the same time, emailing 64-bit interim residues back and forth and generating and saving save files for possible restarts. The emails cc'ed prime95 and one of the volunteers, who had additionally volunteered to put up an ftp server to store interim save files, an early form of cloud backup. To exercise lots of code paths on lots of hardware types quickly, some thoughtfully chosen 400-iteration candidate exponents were selected, and interim residues were produced that had been double-checked by different software on different hardware architectures.

So: substitute the Primenet server for email; expand its detailed exponent report to optionally include lsb-masked interim residues every 10M or 100M iterations, the way residues are already shown for not-yet-double-checked LL tests; and extend manual submission to accommodate a new interim 64-bit residue result type. The server could then indicate any of the following when an interim result is submitted:
- exponent not assigned to you
- first occurrence of that iteration-count residue for that exponent; validity TBD
- not the first occurrence; matches a report from another user ID
- at least one other report received for that iteration count and exponent, but no match
- at least one other report received for that iteration count and exponent, and yours matches a previous report
- multiple matching reports from other user IDs for that iteration count and exponent, but your reported residue does not match them

It could also involve a new type of assignment, called something like coordinated co-test.

I don't think there's much chance of someone cheating to fake a double-check if they're already going to the trouble of computing a long test, though there also seems to be no simple way of preventing it.
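kriesel's response list amounts to classifying a new report against the prior reports on file. This is only a sketch; the function name, data shape, and return strings are illustrative, not Primenet's:

```python
# Classify an incoming interim-residue report against prior reports for
# the same (exponent, iteration). `prior` is a list of (user_id, res64)
# pairs; the strings paraphrase the outcomes listed above.
def classify(new_res64, prior):
    if not prior:
        return "first occurrence, validity TBD"
    if any(r == new_res64 for _, r in prior):
        return "matches a previous report"
    # fewer distinct residues than reports => some prior reports agree
    # with each other, and the new one conflicts with that matching set
    if len({r for _, r in prior}) < len(prior):
        return "does not match an existing set of matching reports"
    return "no match among prior reports"

print(classify("aa11", []))
print(classify("aa11", [("u1", "aa11")]))
print(classify("bb22", [("u1", "aa11"), ("u2", "aa11")]))
```

A real server would also need the "exponent not assigned to you" check against the assignment table before any residue comparison.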