mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing > GpuOwl

2020-05-20, 12:43   #2190
preda
"Mihai Preda"
Apr 2015

3×457 Posts

Quote:
Originally Posted by LaurV
This should include shifts and proper file-names, and keep the history, as cudaLucas does
What is the problem with the file-names?
2020-05-20, 13:31   #2191
LaurV
Romulan Interpreter
Jun 2011
Thailand

7²·197 Posts

Quote:
Originally Posted by preda
Hi, in a recent commit ...

1. Jacobi check will be done every 1M iterations.
2. Savefiles:
3. Moving backwards:
4. Logging:
Hi Mihai, thanks for the fast answer. I totally agree with what you said. 1M is often enough; usually, for a 100M or a 332M test respectively, this will run every 5M or 10M iterations or more, and 20 minutes in 3 days or, in the larger case, 33 minutes in two to three weeks, is not too much. I could happily afford this time if it keeps the test reasonably sane.
Points 3 and 4 are already working like that in the version I run.

I was never either for or against GC/JC (the Gerbicz check and the Jacobi check), and personally they will neither hurt nor help me. (You may have noticed that I didn't make any comment about GC/JC.)
For years now, my "test case" has been TWO cards running in parallel, checking the residues at every checkpoint. This way, no time is lost when a mismatch happens, and I consider that such a procedure can't be further optimized, no matter what other people think. I still do this with cudaLucas, but now it is your fault: you made gpuOwl faster, I can't stop wishing to switch to it, and therefore I can't stop bothering you to make it to my liking.

What does one need for my "way"?

First, two identical cards; or, however many cards one has, they have to come in pairs. This I have.

Second, fast software. For now, that is gpuOwl.

Third, because both copies run on the same computer, they should operate on different sets of data. This will catch software bugs: as long as both copies run on different data and get the same result, the software is "sane" and the FFT squaring is "sane". Otherwise we don't know, unless a P95 (or other) test is run and we can compare the results. GC is still prone to errors, and when you have a hundred million bits iterated a hundred million times, the errors are usually cleverer than we are. The chance is negligible, but not zero. Additionally, how do you "convince" PrimeNet to accept your DCs? If both were done as "no-shift" tests, they are not two different tests. That just ensures the hardware is sane, as I already said, but it says nothing about the sanity of the software.

Fourth, a history. Sometimes, despite all our precautions, one card gets ahead of the other because we do other work on the computer, play games, watch videos, etc., and when a mismatch happens, one card may be "more than two checkpoints" ahead of the other. At that moment, the only way to continue the test is to restart both instances from scratch, wasting a lot of time: days, or even weeks. That is because you don't know which one was wrong, and the program only keeps the last two checkpoints. All checkpoints should be kept, the way cudaLucas does, and deleted manually by the user at the end of the test. You can provide an option to delete them automatically for lazy users, but I strongly DO consider that manually checking your folders and cleaning your checkpoints once or twice per month (one 332M test takes about 17 days on an R7, and about double that on all the others) is not "too much" for the user. I mean, how lazy can one be?
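The shift (offset) asked for in the third point works because multiplying by 2^s modulo M_p = 2^p - 1 is just a p-bit rotation. A minimal sketch, assuming plain Python integers instead of an FFT (so it only illustrates the idea and is not how gpuowl, CUDALucas, or Prime95 implement it): each Lucas-Lehmer step s -> s^2 - 2 is applied to the rotated value, the rotation doubles every iteration, and undoing it at the end must give the same residue whatever shift was chosen. The function name `ll_residue` is hypothetical.

```python
def ll_residue(p: int, shift: int = 0) -> int:
    """Toy Lucas-Lehmer test of M_p = 2^p - 1 (p odd, p >= 3), computed on
    data rotated left by `shift` bits; returns the *unshifted* final residue."""
    m = (1 << p) - 1
    s = shift % p                    # current rotation amount
    u = (4 << s) % m                 # stored value: s_0 * 2^s mod M_p
    for _ in range(p - 2):
        # (x * 2^s)^2 = x^2 * 2^(2s), so the "- 2" must be rotated by 2s too:
        # u' = u^2 - 2 * 2^(2s) = (x^2 - 2) * 2^(2s)  (mod M_p)
        u = (u * u - pow(2, (2 * s + 1) % p, m)) % m
        s = (2 * s) % p              # the rotation doubles every iteration
    # Undo the rotation: 2^p == 1 (mod M_p), so 2^(-s) == 2^(p - s).
    return u * pow(2, (p - s) % p, m) % m
```

For example, `ll_residue(11, 5) == ll_residue(11)` holds even though the stored intermediate values differ between the two runs, which is what lets two runs with different shifts count as independent tests of the same exponent.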

You may say that such errors won't happen often, and that the chance of one happening while one card is ahead of the other is slim, especially with GC active. But trust me, these things DO happen. They may be "extremely rare", but every time such a thing happened, losing a week or two on two cards would be totally pissing off, and I love my monitor so much that I don't want to break it with my head.

In fact, in such a situation it would be better to let both tests finish and report both residues, in the hope that one of them is good and you didn't waste the time, at least for one card. But this is another can of worms: you would end up with everybody reporting two results, one correct and one fake, and claiming they ran two tests in parallel when in fact they did only one (some people may do this for credits, or whatever).
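Why keeping the whole history matters can be shown with a small sketch (hypothetical function names, made-up residues): given the checkpoints each run has kept, as iteration -> residue maps, the safe resume point is the last iteration before the first disagreement. If each run keeps only its two newest checkpoints and one card is further ahead, there may be no agreeing checkpoint left at all, and both runs must start over.

```python
from typing import Dict, Optional

def resume_point(run_a: Dict[int, str], run_b: Dict[int, str]) -> Optional[int]:
    """Last iteration at which both runs kept a checkpoint with the same residue,
    scanning only up to the first disagreement (an error corrupts every later
    residue, so nothing after it can be trusted)."""
    last_good = None
    for it in sorted(run_a.keys() & run_b.keys()):
        if run_a[it] != run_b[it]:
            break
        last_good = it
    return last_good

def last_two(run: Dict[int, str]) -> Dict[int, str]:
    """What survives if a program keeps only its two newest checkpoints."""
    return dict(sorted(run.items())[-2:])

# Made-up residues: card A went wrong between 3M and 4M and is one checkpoint ahead.
card_a = {1_000_000: "1111111111111111", 2_000_000: "2222222222222222",
          3_000_000: "3333333333333333", 4_000_000: "aaaaaaaaaaaaaaaa",
          5_000_000: "bbbbbbbbbbbbbbbb"}
card_b = {1_000_000: "1111111111111111", 2_000_000: "2222222222222222",
          3_000_000: "3333333333333333", 4_000_000: "4444444444444444"}

print(resume_point(card_a, card_b))                      # full history: 3000000
print(resume_point(last_two(card_a), last_two(card_b)))  # last two only: None
```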

And I could argue like that endlessly... The moral is that we need a history, i.e. instead of deleting old files, rename them "exponent.iteration.residue.whatever", where the first 3 fields are a MUST, so that our comparison/resume tools work with minimal changes.
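A sketch of the proposed naming scheme, assuming the 64-bit residue is written as 16 lowercase hex digits and that everything after the third dot is the free-form "whatever" part (both are assumptions, and the helper names are hypothetical). Sorting should use the numeric fields, because a plain string sort puts iteration 12000000 before 9000000.

```python
from typing import NamedTuple, Tuple

class Checkpoint(NamedTuple):
    exponent: int
    iteration: int
    residue: str      # 64-bit residue as 16 hex digits (assumed format)
    rest: str         # the free-form "whatever" part, e.g. the extension

def checkpoint_name(exponent: int, iteration: int, residue: int, rest: str = "ll") -> str:
    """Build exponent.iteration.residue.rest, residue as 16 lowercase hex digits."""
    return f"{exponent}.{iteration}.{residue:016x}.{rest}"

def parse_checkpoint_name(name: str) -> Checkpoint:
    """Split a name built by checkpoint_name(); raises ValueError if malformed."""
    exponent, iteration, residue, rest = name.split(".", 3)
    if len(residue) != 16:
        raise ValueError(f"residue field is not 16 hex digits: {residue!r}")
    int(residue, 16)  # raises ValueError if the field is not hexadecimal
    return Checkpoint(int(exponent), int(iteration), residue.lower(), rest)

def sort_key(name: str) -> Tuple[int, int]:
    """Sort numerically by (exponent, iteration), not lexicographically."""
    cp = parse_checkpoint_name(name)
    return (cp.exponent, cp.iteration)
```

Usage: `sorted(os.listdir(folder), key=sort_key)` lists a folder's checkpoints in test order, provided every file in it follows the pattern.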

Last fiddled with by LaurV on 2020-05-20 at 13:33
2020-05-20, 13:42   #2192
LaurV
Romulan Interpreter
Jun 2011
Thailand

9653₁₀ Posts

Quote:
Originally Posted by preda
What is the problem with the file-names?
Crosspost. I already replied, but to make it clear:

The file names should contain the exponent, the iteration, and the residue. This is a must, for easy sorting, comparing, etc. Old files should not be deleted, but renamed properly and kept in the folder. The residue is needed in the name because (in case of a shift/offset) the contents of the files are different and cannot be used for comparison. We discussed this in the past, and you came up with the idea of putting the file header inside. That is very good, but why would I need to open all the >50MB files and "cat" them to get the residues? If I have the same files in both folders, the test is running smoothly. If one folder has 55418387.12000000.adef1234cdeb9876.ll and the other folder has 55418387.12000000.def1234cdeb98765.ll instead, I know immediately that one card is in the weeds, and I can stop and resume both from the last matching checkpoint. I don't need any tool for that, just sharp eyes and fast fingers.
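The "sharp eyes" comparison of two folders could also be automated along these lines. This is a sketch assuming files follow the exponent.iteration.residue.ext pattern from this post; `residues_in` and `mismatches` are hypothetical names, and files that do not match the pattern are ignored.

```python
import os
from typing import Dict, List, Tuple

def residues_in(folder: str, exponent: int) -> Dict[int, str]:
    """Map iteration -> residue for files named exponent.iteration.residue.ext."""
    found: Dict[int, str] = {}
    for name in os.listdir(folder):
        parts = name.split(".")
        if len(parts) >= 4 and parts[0] == str(exponent) and parts[1].isdigit():
            found[int(parts[1])] = parts[2].lower()
    return found

def mismatches(folder_a: str, folder_b: str, exponent: int) -> List[Tuple[int, str, str]]:
    """(iteration, residue_a, residue_b) for every iteration saved in both
    folders whose residues differ, in increasing iteration order."""
    a = residues_in(folder_a, exponent)
    b = residues_in(folder_b, exponent)
    return [(it, a[it], b[it]) for it in sorted(a.keys() & b.keys()) if a[it] != b[it]]
```

An empty list means both cards agree on every checkpoint they have in common; the first entry, if any, shows where one card went into the weeds.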

Last fiddled with by LaurV on 2020-05-20 at 13:44
2020-05-20, 13:49   #2193
preda
"Mihai Preda"
Apr 2015

3·457 Posts

Quote:
Originally Posted by LaurV
For years now, my "test case" has been TWO cards running in parallel, checking the residues at every checkpoint. This way, no time is lost when a mismatch happens, and I consider that such a procedure can't be further optimized, no matter what other people think.
OK, I understand. At some point I was doing something similar, but on a single GPU, by running every iteration twice and comparing the residues for equality. As you say, that allows errors to be detected as early as possible, but at the cost of halving the throughput.

Running PRP, you'd detect the errors just as well, and you'd double the capacity. Give it a try: if you succeed in producing a failure of the check (i.e. an undetected error), as you suspect may be possible, that would be a momentous achievement (but also much more difficult than simply finding the next Mersenne prime, IMO :)
2020-05-20, 13:56   #2194
LaurV
Romulan Interpreter
Jun 2011
Thailand

7²·197 Posts

Quote:
Originally Posted by preda
Give it a try: if you succeed in producing a failure of the check (i.e. an undetected error), as you suspect may be possible, that would be a momentous achievement (but also much more difficult than simply finding the next Mersenne prime, IMO :)
How can I? Do you mean to run a million PRP tests, report the results, and then wait for somebody to run P95 on them?
The problem is exactly THAT: you run two tests that will always match as long as there are no glitches in the hardware, no matter what you do in the software, because you always do the same thing applied to the same data. You don't know whether they have an error unless you have an etalon (a reference result). Maybe that is why no error has been found up to now, and not because GC is so strong... (ranting here... I understand the math part).
2020-05-20, 14:00   #2195
preda
"Mihai Preda"
Apr 2015

3·457 Posts

Quote:
Originally Posted by LaurV
Crosspost. I already replied, but to make it clear:

The file names should contain the exponent, the iteration, and the residue. This is a must, for easy sorting, comparing, etc. Old files should not be deleted, but renamed properly and kept in the folder. The residue is needed in the name because (in case of a shift/offset) the contents of the files are different and cannot be used for comparison. We discussed this in the past, and you came up with the idea of putting the file header inside. That is very good, but why would I need to open all the >50MB files and "cat" them to get the residues? If I have the same files in both folders, the test is running smoothly. If one folder has 55418387.12000000.adef1234cdeb9876.ll and the other folder has 55418387.12000000.def1234cdeb98765.ll instead, I know immediately that one card is in the weeds, and I can stop and resume both from the last matching checkpoint. I don't need any tool for that, just sharp eyes and fast fingers.
For PRP, every savefile is validated before being written, so the possibility of having different savefile residues for the same iteration of the same exponent simply does not exist.
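Roughly, the Gerbicz check behind that validation rests on one identity for repeated squaring x_{i+1} = x_i^2 mod N with x_0 = 3: if d_t is the product of the checkpoint values x_0, x_L, ..., x_{tL}, then d_t = x_0 · d_{t-1}^(2^L) mod N. A toy sketch with Python integers (not gpuowl's code, which verifies far less often than it updates d in order to keep the overhead small); the `fault_at` hook that corrupts one iteration is a hypothetical test aid.

```python
from typing import Optional

def squaring_with_gerbicz(p: int, block: int, blocks: int,
                          fault_at: Optional[int] = None) -> Optional[int]:
    """Square x_0 = 3 repeatedly modulo N = 2^p - 1 for block * blocks iterations.

    After every block of `block` squarings the running product d of checkpoint
    values is updated and verified with the Gerbicz identity. Returns None if all
    checks pass, else the index of the first block whose check failed.
    `fault_at` (a hypothetical test hook) corrupts x right after that iteration.
    """
    n = (1 << p) - 1
    x0 = 3
    x = x0
    d = x0                                  # d_0 = x_0
    iteration = 0
    for b in range(blocks):
        d_prev = d
        for _ in range(block):
            x = x * x % n
            iteration += 1
            if iteration == fault_at:
                x = (x + 1) % n             # simulated hardware glitch
        d = d * x % n                       # d_t = d_{t-1} * x_{tL}
        # Check d_t == x_0 * d_{t-1}^(2^L)  (mod N); costs another L squarings.
        if d != x0 * pow(d_prev, 1 << block, n) % n:
            return b
    return None
```

A glitch anywhere inside a block changes that block's checkpoint value, so the identity fails at the end of that block, e.g. `squaring_with_gerbicz(127, 100, 10, fault_at=450)` reports block 4.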

I'll think about doing something better about keeping those files around.

It shouldn't be hard to make a script (bash, perl, etc.) that renames the files, adding to the file name the residue, which is easily parsed from the file's first line, if desired.

(but anyway the proper fix is to switch to PRP)
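A sketch of the script preda describes, written in Python rather than bash or perl. The thread does not describe the header layout, so the regular expression assumes a hypothetical first line of the form `LL 1 <exponent> <iteration> <16-hex-digit residue>`; adapt it to the actual header. It copies instead of renaming, on the assumption that the running program still expects its own file name; `os.replace` would be the choice for files the program has already abandoned.

```python
import os
import re
import shutil
from typing import Optional

# ASSUMPTION: a hypothetical first line "LL 1 <exponent> <iteration> <residue>",
# with the residue as 16 hex digits. Change this pattern to match the real header.
HEADER_RE = re.compile(rb"^\S+ \d+ (\d+) (\d+) ([0-9a-fA-F]{16})\b")

def archive_name(path: str) -> Optional[str]:
    """exponent.iteration.residue<ext> in the same folder, from the file's first line."""
    with open(path, "rb") as f:
        first_line = f.readline(4096)       # the rest of the savefile is binary
    m = HEADER_RE.match(first_line)
    if m is None:
        return None
    exponent, iteration, residue = (g.decode("ascii") for g in m.groups())
    ext = os.path.splitext(path)[1]
    return os.path.join(os.path.dirname(path),
                        f"{exponent}.{iteration}.{residue.lower()}{ext}")

def archive_checkpoint(path: str) -> Optional[str]:
    """Copy `path` to its archive name (only once); return that name or None."""
    target = archive_name(path)
    if target is not None and not os.path.exists(target):
        shutil.copy2(path, target)
    return target
```

Run periodically (e.g. from a loop with `time.sleep`, like the batch file mentioned later in the thread), this builds up the history without touching the program's own savefile.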
2020-05-20, 14:03   #2196
preda
"Mihai Preda"
Apr 2015

3·457 Posts

Quote:
Originally Posted by LaurV
How can I? Do you mean to run a million PRP tests, report the results, and then wait for somebody to run P95 on them?
The problem is exactly THAT: you run two tests that will always match as long as there are no glitches in the hardware, no matter what you do in the software, because you always do the same thing applied to the same data. You don't know whether they have an error unless you have an etalon (a reference result). Maybe that is why no error has been found up to now, and not because GC is so strong... (ranting here... I understand the math part).
Well, I guess you'd run it twice and detect a difference... Running it with a different FFT setup would be stronger than just a different offset.

(no, I don't actually recommend doing that; it would be a waste of valuable resources)
2020-05-20, 14:13   #2197
LaurV
Romulan Interpreter
Jun 2011
Thailand

7²×197 Posts

Quote:
Originally Posted by preda
For PRP, every savefile is validated before being written, so the possibility of having different savefile residues for the same iteration of the same exponent simply does not exist.
We're running in circles here. You assume GC never fails, and if that were the case, all this discussion would be futile.

Quote:
It shouldn't be hard to make a script (bash, perl, etc.) that renames the files, adding to the file name the residue, which is easily parsed from the file's first line, if desired.
I said in the forum that I already made one (a batch file: check if the file exists, rename it, sleep 10 minutes or so, repeat).

Quote:
(but anyway the proper fix is to switch to PRP)
I promise you I will switch to PRP and keep running 2 instances of the same test in parallel until I find that GC failure, if you give me the shift and the history (to be able to resume efficiently and to convince PrimeNet to accept my DCs; otherwise I only get candies for half of the effort and waste the other half, hehe).

And when I catch you in Thailand, after all this craziness with corona ends, I will force you to drink all the beer I find in the fridge.

Last fiddled with by LaurV on 2020-05-20 at 14:14
2020-05-20, 15:40   #2198
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,419 Posts

Quote:
Originally Posted by axn
LL is not the future. PRP is. Non-zero shift is a relic of LL days without effective error check. It is completely unnecessary with PRP/GEC.

IMO, it is high time we made PRP the default test type and started forcing everyone to use it instead of first-time LL tests. [Yes, I know why it can't happen: damn older clients.]
I agree that a PRP first test with the excellent GEC is preferable to LL with the Jacobi check (with its 50% error detection probability), or to LL without the Jacobi check, as in CUDALucas and most LL-capable gpuowl versions.
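For context on the 50% figure: for the LL sequence s_0 = 4, s_{n+1} = s_n^2 - 2 mod M_p, the value s_n - 2 is congruent to 12 times a perfect square for every n >= 1, so the Jacobi symbol (s_n - 2 | M_p) is -1 (barring a common factor with M_p). A value that breaks this is certainly wrong, but a random corruption preserves it about half the time. A minimal sketch with hypothetical function names, not gpuowl's implementation:

```python
def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a | n) for odd n > 0, by the standard reciprocity algorithm."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:             # factors of 2: (2 | n) = -1 iff n = 3, 5 mod 8
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                   # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def passes_jacobi_check(s: int, p: int) -> bool:
    """True if an LL value s_n (n >= 1) mod M_p = 2^p - 1 has (s_n - 2 | M_p) == -1."""
    return jacobi(s - 2, (1 << p) - 1) == -1
```

With M_7 = 127 the first LL value is s_1 = 14. Corrupting it to 15 is caught (the symbol becomes +1), but corrupting it to 16 passes the check unnoticed, which is exactly the weakness of a roughly 50% detection rate.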
But the realities are that a great deal of LL is still being done by various programs.
https://www.mersenne.org/assignments...&exp1=1&extf=1 is almost all LL first tests. A check I did several months ago showed a nearly 50/50 mix of PRP and LL in recent results. And a great deal of LL was done in the past.
Even in PRP, shift has advantages. Suppose that the exponent under test is close enough to the limit of an fft length that roundoff error becomes an issue. A different shift may avoid the case where roundoff error repeatedly generates a Gerbicz error.
I think the case for preferring or requiring PRP first testing is stronger when pseudorandom or specifiable nonzero shift becomes available in gpuowl.
DC is still necessary for PRP for multiple reasons. (Errors have been observed outside the GEC check; users make manual reporting errors, and there is no PrimeNet API connection for gpu programs; there is no reliable built-in validation code to confirm actual work done; a rare few users submit falsified results intentionally.)

Gpuowl can't double-check gpuowl without a different shift: the server does not accept the result.
As I recall, Ernst opined that adding shift has little if any effect on performance (from his mlucas development). It may be that Mihai and George choose to spend their time now on obtaining further performance. (And we are very appreciative of their efforts and results in this area.) Diminishing returns will occur. Perhaps they'll add shift later. When they do, I hope it is to both LL and PRP in gpuowl. I think the ideal situation would be the default is pseudorandom shift, and the user could specify a specific shift for QA test purposes.

Last fiddled with by kriesel on 2020-05-20 at 16:13
2020-05-20, 16:07   #2199
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

12453₈ Posts

Running two gpuowl instances on the same GPU reportedly helps total testing throughput, but not always. Test what you run: GTX10x0 seems not to benefit in PRP.
A particularly severe case of lowered throughput I saw recently on a Radeon VII follows.
1 instance, 48M FFT PRP: 10510 us/iter, 95.15 iter/sec;
1 instance, 8M FFT LL: 1382 us/iter, 723.6 iter/sec;
Both run together: 8M FFT LL at 6610 us/iter (151.3 iter/sec, 20.9% of solo throughput), plus 48M FFT PRP at 52438 us/iter (19.07 iter/sec, 20.04% of solo throughput), combined for just 40.94% of solo throughput.
It's probably best to run same computation type, same fft size, or perhaps very similar size.
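The percentages above follow from simple arithmetic: iterations per second is 10^6 divided by the microseconds per iteration, and each job's share is its solo time per iteration divided by its shared time per iteration. Summing the shares gives the total work done relative to running the jobs one after another (below 1.0 means sharing loses). A small sketch using the Radeon VII numbers from this post; the function names are hypothetical.

```python
from typing import Iterable, Tuple

def iters_per_sec(us_per_iter: float) -> float:
    """Convert a timing in microseconds per iteration to iterations per second."""
    return 1e6 / us_per_iter

def combined_efficiency(jobs: Iterable[Tuple[float, float]]) -> float:
    """Sum over jobs of (solo us/iter) / (shared us/iter).

    Each term is the fraction of its solo throughput a job keeps while sharing
    the GPU; a total below 1.0 means running the jobs together does less work
    than running them one after another.
    """
    return sum(solo / shared for solo, shared in jobs)

# Radeon VII timings from the post: (solo us/iter, shared us/iter)
jobs = {"8M FFT LL": (1382, 6610), "48M FFT PRP": (10510, 52438)}
for name, (solo, shared) in jobs.items():
    print(f"{name}: {iters_per_sec(solo):.2f} -> {iters_per_sec(shared):.2f} iter/sec, "
          f"{100 * solo / shared:.2f}% of solo")
print(f"combined: {100 * combined_efficiency(jobs.values()):.2f}% of solo throughput")
```

Computed without intermediate rounding, the combined figure comes out at about 40.95%; the 40.94% quoted is the sum of the already-rounded shares 20.9% and 20.04%.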

Last fiddled with by kriesel on 2020-05-20 at 16:14
2020-05-20, 16:58   #2200
LaurV
Romulan Interpreter
Jun 2011
Thailand

7²·197 Posts

My bad wording, sorry. I meant 2 instances, each on its own card.

All times are UTC.





Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.