#2179
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts
Here it is. See the commit descriptions on GitHub for what's new or changed. https://github.com/preda/gpuowl
#2180
|
"Mihai Preda"
Apr 2015
3·457 Posts
Quote:
#2181
|
Romulan Interpreter
Jun 2011
Thailand
7²×197 Posts
Mihai, you miss the point. It doesn't matter how and where we got those assignments from. The tool doesn't work properly.

Assuming there is a yet-to-be-uncovered error in gpuOwl's multiplication routines, always starting with shift zero means always processing the same data, therefore always producing the same incorrect result, and this we can't check. This is most relevant for LL tests, or for P-1, where there is no GC (Gerbicz check). A random (or specified) shift at startup is a must-have: it ensures not only the sanity of the tests, allowing re-testing/DCing/etc. of ANY former result (including results produced by gpuOwl itself) and therefore adding more utility to the tool, but also the sanity of the code itself. This is where we (and Ken) are barking. This is by no means an attempt to undermine your work. You made an amazing effort to implement all the FFT and related machinery yourself from scratch, to make it faster, and to share it with the community, and we really appreciate you for this. As a programmer myself, I can testify to the huge effort and knowledge needed for such a task.

Now, why don't you want to do it... properly? That should include shifts, proper file names, and keeping the history, as CUDALucas does. Until then, many of us will still prefer CUDALucas, though they don't have the time/guts/whatever to say so publicly here. Everybody wants to use gpuOwl, because it is faster. But as it is now, its usefulness is quite limited: it can only be used to double-check old runs which were NOT done with a zero shift. We are too paranoid to use it for new tests, and assuming the happiest case where more and more people start to use it, we will reach a point where gpuOwl users will have to WAIT for other people to complete P95 tests just to have something to DC for themselves. As a Radeon VII is about 6-7 times faster at PRP than a 10-core i7, it is enough for 1/7 of the users to put their cards to work and we are in the mud. This may seem far-fetched and long in the future, because many users don't have a Radeon VII, but they don't have 10-core CPUs either.

The future may be sooner than most of us imagine. I already have a list of tests which were PRP'd and DC'd in parallel runs on two cards, and I could not report the DC because of the identical shift. I don't cry for credit or candies, but first of all this is a waste of resources, and it slows the project down in the long run, as somebody will have to re-do in the future the work I already did and cannot report. Also, every time gpuOwl produces a mismatch, we will still need to wait for the (slow) P95 run for the TC. And there are many other situations. But this is not even the worst of it. The worst is that, despite having done TWO RUNS in parallel, I am still not confident that the result is correct. I am only confident that there was no hardware error: both runs produced the same final residue, so my hardware is sane. But I cannot be sure (and I mean the general "I" here) that the FFT implementation is correct, because both instances started with the same shift, so they processed the same data throughout the test. If there is an error in the code, then both runs have the error. And my paranoia won't let me sleep... hehe... Your job should be to offer an alternative to P95, not to be a secretary to it.

Last fiddled with by LaurV on 2020-05-20 at 05:27
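[Editorial aside: the shift mechanics LaurV is asking for can be illustrated at toy scale. This is a minimal sketch with my own function names, not gpuowl's or CUDALucas's actual code: a run started with shift s carries u_i = v_i · 2^(s_i) (mod Mp), the shift doubles modulo p at every squaring (since 2^p ≡ 1 mod Mp), and the constant 2 is shifted along with the residue. Two runs with different shifts square different bit patterns yet must agree once the shift is removed.]

```python
# Toy-scale model of a shifted Lucas-Lehmer test (an illustration,
# not gpuowl's actual FFT code).

def lucas_lehmer(p, shift=0):
    """Return the unshifted LL residue v_(p-2) of Mp = 2^p - 1."""
    Mp = (1 << p) - 1
    s = shift % p
    u = (4 << s) % Mp                 # shifted seed: 4 * 2^s
    for _ in range(p - 2):
        s = (2 * s) % p               # the shift doubles at each squaring
        u = (u * u - (2 << s)) % Mp   # subtract the *shifted* constant 2
    # remove the final shift: multiply by the inverse of 2^s (mod Mp)
    return (u * pow(2, -s, Mp)) % Mp if s else u

print(lucas_lehmer(13, shift=0))   # 0: M13 = 8191 is prime
print(lucas_lehmer(13, shift=5))   # 0 again, via different intermediate data
print(lucas_lehmer(11, shift=0) == lucas_lehmer(11, shift=7))  # True
```

A multiplier bug that corrupts only certain bit patterns would make the two p = 11 runs disagree after unshifting; that cross-check is exactly what two zero-shift runs cannot provide.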
#2182
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts
For us hard-core GIMPSters, the throughput gap between gpuowl and CPUs will become much larger than what LaurV has stated. One modest 4-core CPU can support several Radeon VII cards (or Radeon Pro VII, when they come out), given a suitable power supply, motherboard, and chassis. Something like that is what George and Ernst are now doing, and others too. The power/performance efficiency of the Radeon VII will drive things that way.

We need nonzero shift in gpuowl, for both PRP and LL. You've done it before for LL; please bring it back. Other error-detection measures would be very welcome too. (You've implemented the Jacobi check before as well.)
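[Editorial aside: the other error-detection measure in play here is the Gerbicz check, the "GC" LaurV mentioned as available for PRP but not LL. A stdlib-only sketch, with my own function names rather than gpuowl's code: for x_{i+1} = x_i² (mod N) with x_0 = 3, and d_t defined as the product of x_0, x_L, ..., x_{tL}, the identity d_{t+1} = d_t^(2^L) · x_0 (mod N) lets every block of L squarings be verified at a cost of only about L extra squarings.]

```python
# Toy model of the Gerbicz error check for a base-3 PRP test
# (an illustration, not gpuowl's implementation).

def prp_with_gerbicz(N, L, blocks):
    """Run blocks*L PRP squarings mod N, Gerbicz-checking every block."""
    x = 3
    d = x                              # d_0 = x_0 = 3
    for _ in range(blocks):
        d_old = d
        for _ in range(L):             # one block of L squarings
            x = (x * x) % N
        d = (d * x) % N                # d_{t+1} = d_t * x_{(t+1)L}
        # verify the block independently: d_{t+1} == d_t^(2^L) * 3 (mod N)
        if pow(d_old, 1 << L, N) * 3 % N != d:
            raise RuntimeError("Gerbicz check failed")
    return x

# M23 = 2^23 - 1 (composite): 100 blocks of 32 squarings, all checks pass
print(prp_with_gerbicz((1 << 23) - 1, 32, 100) != 0)  # True
```

Because any single-squaring error propagates into d and breaks the identity, PRP with the Gerbicz check is essentially self-verifying, which is the root of the PRP-versus-LL asymmetry discussed in this thread.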
#2183
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts
Quote:
mprime/prime95; Mlucas; CUDALucas. (I think clLucas did not, and it was not used much.)

Last fiddled with by kriesel on 2020-05-20 at 09:51
#2184
|
Jun 2003
2×3×7×11² Posts
Quote:
IMO, it is high time we made PRP the default test type and started pushing everyone to use it instead of first-time LL tests. [Yes, I know why it can't happen -- damn older clients.]
#2185
|
Oct 2018
Slovakia
2×3×11 Posts
@axn:
But LL double/triple checks are still here. This morning I started a triple check via gpuowl, but when I read this thread, I stopped it.
#2186
|
"Mihai Preda"
Apr 2015
3·457 Posts
Quote:
Talking about LL and PRP (i.e. ignoring P-1 for a moment), I think the offset is most useful for LL. For LL with gpuowl, the focus is on double-checking past LL results. The majority of past LL tests were done with a non-zero offset in mprime. Validating an mprime non-zero-offset result with gpuowl is very strong, stronger than validating different offsets within a single piece of software. So, what is the use case, the pain point for you, that is not covered? Are you doing first-time LL on GPUs? If so, maybe you should do PRP instead of LL. The number of first-time LL tests done with gpuowl should be a tiny minority, and that minority can be checked with mprime without any difficulty.
#2187
|
"Mihai Preda"
Apr 2015
2533₈ Posts
Hi, in a recent commit https://github.com/preda/gpuowl/comm...a13478c192bb3d I try to bring back the Jacobi check for LL. This is how it works and what changes for LL:

1. When: by default, a Jacobi check is done every 1M iterations. This can be configured with the -jacobi <step> command-line argument, giving it a number of iterations. The check is rather slow (on the order of 1 minute) and takes up one CPU core, so I think it shouldn't be done too often (thus the default of 1M iterations).

2. Savefiles: an LL state is only ever saved after a successful Jacobi check. There is no way to write an LL savefile that did not pass Jacobi. Combined with the point above about frequency, this means savefiles are written less often (by default, every 1M iterations). The Jacobi check is also triggered on exit (Ctrl-C), so if the user is willing to wait the ~1 minute after Ctrl-C, the savefile will be up to date. OTOH, if there's a power cut, no luck.

3. Moving backwards: the check runs in the background on the CPU while the LL test keeps advancing. If the background Jacobi fails, the test automatically resumes from the most recent savepoint.

4. Logging: the LL log lines now contain these codes:
   "LL": a plain, unchecked LL log line
   "OK": an iteration that passed Jacobi
   "EE": an iteration that failed Jacobi

There may be bugs, as usual.
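[Editorial aside: the mathematics behind this check can be sketched with a stdlib-only toy; the helper names `jacobi` and `ll_with_jacobi` are mine, not gpuowl's. For the LL sequence v_0 = 4, v_{i+1} = v_i² − 2 (mod Mp), the identities v_{i+1} − 2 = (v_i − 2)(v_i + 2) and v_{i+1} + 2 = v_i² give jacobi(v_i − 2, Mp) = jacobi(12, Mp) = −1 for every i ≥ 1, as long as the iterates stay coprime to Mp; any other value flags a computation error, so one cheap CPU-side Jacobi symbol catches roughly half of all random errors.]

```python
# Toy model of the LL Jacobi error check (an illustration, not
# gpuowl's implementation).

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, via the binary algorithm."""
    a %= n
    t = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):       # (2/n) = -1 when n = 3 or 5 (mod 8)
                t = -t
        a, n = n, a                   # quadratic reciprocity step
        if a % 4 == 3 and n % 4 == 3:
            t = -t
        a %= n
    return t if n == 1 else 0

def ll_with_jacobi(p, iters):
    """Run `iters` LL squarings mod Mp, verifying the invariant each step."""
    Mp = (1 << p) - 1
    v = 4
    for i in range(iters):
        v = (v * v - 2) % Mp
        if jacobi(v - 2, Mp) != -1:   # in gpuowl this runs on a CPU core
            raise RuntimeError(f"Jacobi check failed at iteration {i + 1}")
    return v

ll_with_jacobi(31, 20)   # 20 iterations on M31; all checks pass
```

In a real run the Jacobi symbol of a multi-million-digit residue takes on the order of a minute on the CPU, which is why the check is done only every 1M iterations rather than every step.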
#2188
|
Jun 2003
1001111011010₂ Posts
Sure. But it doesn't make sense to invest time and effort in a dead end; there are CUDALucas, clLucas, older versions of gpuowl, etc. for that purpose. Also, looking at the points Preda made, it might be possible to double-check with zero shift if the original was non-zero. I say "might" because I don't know whether the server will accept or reject it; it should accept it, but I don't know.
#2189
|
"Mihai Preda"
Apr 2015
3·457 Posts
Quote:
The only problem appears when attempting to double-check a gpuowl LL result with gpuowl; that is not a good idea.