|
|
#254 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3×13×139 Posts |
Hi,
I recently pulled down the Windows executable zip file for gpuowl 1.9 from http://www.mersenneforum.org/showpos...&postcount=226, unzipped, read its README.md, which says in part: Code:
## gpuowl -help outputs:
```
gpuOwL v0.6 GPU Lucas-Lehmer primality checker; Mon Aug 21 23:47:40 2017
Command line options:
-logstep <N> : to log every <N> iterations (default 20000)
-savestep <N> : to persist checkpoint every <N> iterations (default 500*logstep == 10000000)
-checkstep <N> : do Jacobi-symbol check every <N> iterations (default 50*logstep == 1000000)
-uid user/machine : set UID: string to be prepended to the result line
-cl "<OpenCL compiler options>", e.g. -cl "-save-temps=tmp/ -O2"
-selftest : perform self tests from 'selftest.txt'
Self-test mode does not load/save checkpoints, worktodo.txt or results.txt.
-time kernels : to benchmark kernels (logstep must be > 1)
-legacy : use legacy kernels
-device <N> : select specific device among:
0 : 64x1630MHz gfx900; OpenCL 1.2
```
Code:
if not exist gpuowlstart.txt gpuowl -selftest >>gpuowlstart.txt
gpuowl -help >>gpuowlstart.txt
Code:
gpuOwL v1.9- GPU Mersenne primality checker
Argument '-selftest' not understood
gpuOwL v1.9- GPU Mersenne primality checker
Argument '-help' not understood
Please update it to cover gpuowl 1.x also. Thanks! |
|
|
|
|
|
#255 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5421₁₀ Posts |
Try null parameters:
Code:
$ gpuowl
gpuOwL v1.9- GPU Mersenne primality checker
Radeon 500 Series 8 @f:0.0, gfx804 1203MHz
Can't open 'worktodo.txt' (mode 'r')
Bye
Code:
$ gpuowl --help
gpuOwL v1.9- GPU Mersenne primality checker
Command line options:
-size 2M|4M|8M : override FFT size.
-fft DP|SP|M61|M31 : choose FFT variant [default DP]:
DP : double precision floating point.
SP : single precision floating point.
M61 : Fast Galois Transform (FGT) modulo M(61).
M31 : FGT modulo M(31).
-user <name> : specify the user name.
-cpu <name> : specify the hardware name.
-legacy : use legacy kernels
-dump <path> : dump compiled ISA to the folder <path> that must exist.
-verbosity <level> : change amount of information logged. [0-2, default 0].
-device <N> : select specific device among:
0 : Radeon 500 Series 8 @f:0.0, gfx804 1203MHz
1 : 12 @0:0.0, Intel(R) Xeon(R) CPU E5645 @ 2.40GHz 2394MHz
$
What are the legacy kernels? What became of the self test? Code:
gpuOwL v1.9- GPU Mersenne primality checker
Radeon 500 Series 8 @f:0.0, gfx804 1203MHz
OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=76812401u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 )
C:\Users\ken\AppData\Local\Temp\\OCL4520T3.cl:1:10: fatal error: 'gpuowl.cl' file not found
#include "gpuowl.cl"
^
1 error generated.
error: Clang front-end compilation failed!
Frontend phase failed compilation.
Error: Compiling CL to IR
Ok, now it seems to be working, with initial output showing 11.94-12.04 ms/iteration on the RX550 for
PRP-3: FFT 4M (1024 * 2048 * 2) of 76812401 (18.31 bits/word)
Last fiddled with by kriesel on 2018-01-19 at 19:07 |
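The 18.31 bits/word figure in that banner can be checked by hand. The sketch below assumes (the thread does not say how gpuowl derives it) that bits/word is simply the exponent divided by the number of FFT words, with the word count taken from the `1024 * 2048 * 2` in the banner. The function name is made up for illustration.

```python
def bits_per_word(exponent: int, width: int, height: int) -> float:
    """Average number of exponent bits carried by each FFT word.

    Assumption: word count = width * height * 2, as printed in the banner.
    """
    n_words = width * height * 2
    return exponent / n_words


bpw = bits_per_word(76812401, 1024, 2048)
print(f"{bpw:.2f} bits/word")  # prints "18.31 bits/word"
```

The same word count, 2^22, also appears as `-DLOG_NWORDS=22u` in the compiler arguments quoted in the error message.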
|
|
|
|
|
#256 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3·13·139 Posts |
(Later...) Whoa, what happened in that middle line? Over a minute per iteration computed (>5000 times the preceding and following). Momentarily projecting runtime of over 150 years!
Code:
OK 80000 / 76812401 [ 0.10%], 11.99 ms/it; ETA 10d 15:27; 6ee0f8a8a97d7812 [2018-01-19 13:19:36 Central Standard Time]
OK 100000 / 76812401 [ 0.13%], 63362.75 ms/it; ETA 56258d 15:28; 3fb24c04ec7569db [2018-01-19 13:23:43 Central Standard Time]
OK 150000 / 76812401 [ 0.20%], 11.99 ms/it; ETA 10d 15:25; 10bf91703f69c302 [2018-01-19 13:33:50 Central Standard Time]
Last fiddled with by kriesel on 2018-01-19 at 20:00 |
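A rough check of the "over 150 years" remark, assuming the ETA is just remaining iterations times the reported ms/it. That assumption reproduces the day count in the log; the hours and minutes gpuowl prints are not reproduced here. `eta_days` is an illustrative helper, not gpuowl code.

```python
MS_PER_DAY = 86_400_000


def eta_days(total_iters: int, done_iters: int, ms_per_iter: float) -> float:
    """Projected remaining run time in days at a constant iteration rate."""
    return (total_iters - done_iters) * ms_per_iter / MS_PER_DAY


normal = eta_days(76812401, 80000, 11.99)      # line before the glitch
bogus = eta_days(76812401, 100000, 63362.75)   # the glitched line

print(f"normal ETA: {normal:.1f} days")
print(f"bogus ETA:  {int(bogus)} days, about {bogus / 365.25:.0f} years")
print(f"rate ratio: {63362.75 / 11.99:.0f}x")
```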
|
|
|
|
|
#257 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3×13×139 Posts |
Quote:
Code:
OK 400000 / 76812401 [ 0.52%], 11.99 ms/it; ETA 10d 14:35; 1048549c8bed4e64 [2018-01-19 14:24:25 Central Standard Time]
OK 450000 / 76812401 [ 0.59%], 25352.29 ms/it; ETA 22407d 03:22; 22b21a753ab5f8b5 [2018-01-19 14:34:31 Central Standard Time]
OK 500000 / 76812401 [ 0.65%], 11.99 ms/it; ETA 10d 14:15; a7265bc29bf827f1 [2018-01-19 14:44:39 Central Standard Time]
Code:
OK 800000 / 76812401 [ 1.04%], 11.99 ms/it; ETA 10d 13:08; 6dd5e2686143154e [2018-01-19 15:44:59 Central Standard Time]
OK 900000 / 76812401 [ 1.17%], 12682.14 ms/it; ETA 11142d 19:40; e6f4ebf42ae5ab20 [2018-01-19 16:05:05 Central Standard Time]
OK 1000000 / 76812401 [ 1.30%], 11.99 ms/it; ETA 10d 12:28; 4562268d31760723 [2018-01-19 16:25:11 Central Standard Time]
Code:
OK 1145000 / 76812401 [ 1.49%], 11.95 ms/it; ETA 10d 11:12; 4243e97c3e1113e1 [2018-01-19 16:56:16 Central Standard Time]
OK 1150000 / 76812401 [ 1.50%], 253415.12 ms/it; ETA 221923d 00:32; 43c530d6fa88a445 [2018-01-19 16:57:24 Central Standard Time]
OK 1160000 / 76812401 [ 1.51%], 11.99 ms/it; ETA 10d 11:56; 059216a2fa1889b9 [2018-01-19 16:59:32 Central Standard Time]
Log output apparently commits to disk at program exit. I'd prefer it to flush to disk at least hourly, so that less log output is lost in the case of a system or application crash. Built-in logging is a great feature in my opinion.
For comparison, cllucas at half the fft length, on the same GPU, does:
Code:
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2048K, clLucas v1.04 err = 0.0781 (1:20 real, 8.0871 ms/iter, ETA 83:25:53)
CUDALucas runs around 35 ms/iter for 4M fft length on a Quadro 2000 (2.92 times as long!); the mersenne.ca benchmarks indicate the Quadro 2000 and RX550 cards are quite comparable at LL performance near 75M. Time to check into OpenCL on NVIDIA perhaps.
Last fiddled with by kriesel on 2018-01-20 at 00:05 |
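The hourly flush requested in this post could look roughly like the following. This is a minimal sketch in Python, not gpuowl's actual logging code (gpuowl is not written in Python); the class name `FlushingLog` and the `max_delay_s` parameter are hypothetical.

```python
import os
import time


class FlushingLog:
    """Append-only log that forces data to disk at least every max_delay_s."""

    def __init__(self, path: str, max_delay_s: float = 3600.0):
        self._f = open(path, "a", encoding="utf-8")
        self._max_delay_s = max_delay_s
        self._last_flush = time.monotonic()

    def write(self, line: str) -> None:
        self._f.write(line + "\n")
        # Flush only when the allowed delay has elapsed, to keep writes cheap.
        if time.monotonic() - self._last_flush >= self._max_delay_s:
            self.flush()

    def flush(self) -> None:
        self._f.flush()              # push Python's buffer to the OS
        os.fsync(self._f.fileno())   # ask the OS to commit it to disk
        self._last_flush = time.monotonic()

    def close(self) -> None:
        self.flush()
        self._f.close()
```

With `max_delay_s=3600` at most about an hour of log lines would be at risk in a crash; a smaller value trades a little I/O for less potential loss.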
|
|
|
|
|
|
#258 |
|
"Mihai Preda"
Apr 2015
55B₁₆ Posts |
Thanks for the feedback! It seems there is a bug in the time-per-iteration computation; I'll look into that.
What OS and driver are you using? I'll try to investigate why the log is not flushed, but I suspect it's caused by different behavior on Linux vs. Windows.
Yes, the readme needs updating too. And other important things need to be done:
- new FFT sizes allowing efficient transition to "after 4M FFT"
- automatic communication with primenet. I don't know how hard this is. |
|
|
|
|
|
#259 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3·13·139 Posts |
Quote:
Windows 7 is on the system used for the reported observations and on most of my other GPU-equipped systems; the rest run Vista. See one of the attachments for driver info for what I reported.
As often happens, the situation with logging is less simple than I first thought. See the log-lag attachment.
Time and calendar computations have long been the bane of programmers' lives. Counters wrap, standards abound, and timekeeping and the calendars' histories are riddled with rules changes and special cases. (There's an xkcd comic about that, which I can't find at the moment.) Happy hunting.
Aside from the occasional very long ms/iter values, the wall-clock time between successive log lines output by GPUOwL exceeds ms/iter * iteration delta by about 7 seconds. Perhaps that's the time to update the save files? My impression is that other applications include the save file time in determining their iteration rates, while it looks like gpuowl does not. (Are you not doing memory buffer/asynchronous write to allow iteration to continue during save to disk?)
Something else that I find interesting is that gpuowl seems resistant to command-line redirection of its console output. (Is it writing to stderr, not stdout?) Built-in logging makes that less important, though the log lag observed makes it more significant.
Documentation of a moving target is not easy. Efforts to catch it up soon (and periodically) would be appreciated. In other applications, authors have moved on, and it did not get done, years later, while the apps are still in use. It does not get easier with the passage of time.
As I understand it, gpuowl now has 2M, 4M, and 8M fft lengths, so one is not stuck doing DC when first time tests suitable for 4M are soon exhausted, but the exponents suited to 4M < l < 6M could benefit a lot from a 6M fft length (or other sufficiently fast intermediate lengths). If implementing many fft lengths, selecting the right one has presented challenges. From what I've read, cudalucas handles fft length selection, for speed within the error constraints, better than cllucas thus far.
When you eventually get around to trying primenet linkage, see http://mersenneforum.org/showpost.ph...&postcount=406 and there's a tabulation I made of other folks' efforts at http://www.mersenneforum.org/showthr...t=22450&page=3 It may be useful to look at a number of cases of how others have approached it.
There's https://www.explainxkcd.com/wiki/ind...th_the_Time%3F Of course all the numbers change when considering n users of gpuowl, depending on how you weight the value of your time versus theirs, and n. (It's logical to weight an hour of code authors' time as more valuable than users' time, since the number of coders able to handle gpu programming, ffts, etc., and willing to spend the amounts of time required to make code effective and reliable for mersenne hunting is small compared to the user base.)
Last fiddled with by kriesel on 2018-01-20 at 14:04 |
|
|
|
|
|
|
#260 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5421₁₀ Posts |
Now 12+ hours. Values of ms/iter roughly 6347 have become common (9 of the last 16). Note the log contents halt in mid record.
Latest version of GPU-Z and the RX550 seem to be failing to communicate about clocks and temperature. |
|
|
|
|
|
#261 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3·13·139 Posts |
Quote:
I have sequences that differ near start; iteration counts differ from line to line, by 1k,4k,5k,10k,20k...,50k,...,100k
Code:
OK 0 / 76812401 [ 0.00%], 0.00 ms/it; ETA 0d 00:00; 0000000000000003 [2018-01-19 13:02:44 Central Standard Time]
OK 1000 / 76812401 [ 0.00%], 11.96 ms/it; ETA 10d 15:06; aadc1acf24bf7d60 [2018-01-19 13:03:04 Central Standard Time]
OK 5000 / 76812401 [ 0.01%], 11.94 ms/it; ETA 10d 14:44; 3db0edb3db578456 [2018-01-19 13:03:59 Central Standard Time]
OK 10000 / 76812401 [ 0.01%], 12.04 ms/it; ETA 10d 16:46; 261173187b4c2ca3 [2018-01-19 13:05:07 Central Standard Time]
OK 20000 / 76812401 [ 0.03%], 11.99 ms/it; ETA 10d 15:41; 32413ccc78fbad77 [2018-01-19 13:07:14 Central Standard Time]
OK 40000 / 76812401 [ 0.05%], 11.99 ms/it; ETA 10d 15:36; 45e38194e8bad318 [2018-01-19 13:11:21 Central Standard Time]
...
1k,2k,5k,10k,..,20k,...40k,50k...,100k...,200k...
Code:
OK 1142000 / 76812401 [ 1.49%], 0.00 ms/it; ETA 0d 00:00; 3db0227e36cac730 [2018-01-19 16:55:25 Central Standard Time]
OK 1143000 / 76812401 [ 1.49%], 11.96 ms/it; ETA 10d 11:26; 1b8377874165d88d [2018-01-19 16:55:45 Central Standard Time]
OK 1145000 / 76812401 [ 1.49%], 11.95 ms/it; ETA 10d 11:12; 4243e97c3e1113e1 [2018-01-19 16:56:16 Central Standard Time]
OK 1150000 / 76812401 [ 1.50%], 253415.12 ms/it; ETA 221923d 00:32; 43c530d6fa88a445 [2018-01-19 16:57:24 Central Standard Time]
OK 1160000 / 76812401 [ 1.51%], 11.99 ms/it; ETA 10d 11:56; 059216a2fa1889b9 [2018-01-19 16:59:32 Central Standard Time]
OK 1170000 / 76812401 [ 1.52%], 11.99 ms/it; ETA 10d 11:58; f517fce8185f4163 [2018-01-19 17:01:39 Central Standard Time]
OK 1180000 / 76812401 [ 1.54%], 11.99 ms/it; ETA 10d 11:55; 2e9b327003b87b79 [2018-01-19 17:03:47 Central Standard Time]
OK 1200000 / 76812401 [ 1.56%], 11.99 ms/it; ETA 10d 11:50; 505bc0afc72a4862 [2018-01-19 17:07:54 Central Standard Time]
Last fiddled with by kriesel on 2018-01-20 at 22:16 |
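The roughly 7-second gap between wall-clock time and ms/it times iteration delta, mentioned earlier in the thread, can be measured from log lines like these. The sketch below parses four consecutive intervals from the first run's log; the regular expression and function names are illustrative assumptions about the line format as it appears in this thread, not something taken from gpuowl.

```python
import re
from datetime import datetime

LOG = """\
OK 1000 / 76812401 [ 0.00%], 11.96 ms/it; ETA 10d 15:06; aadc1acf24bf7d60 [2018-01-19 13:03:04 Central Standard Time]
OK 5000 / 76812401 [ 0.01%], 11.94 ms/it; ETA 10d 14:44; 3db0edb3db578456 [2018-01-19 13:03:59 Central Standard Time]
OK 10000 / 76812401 [ 0.01%], 12.04 ms/it; ETA 10d 16:46; 261173187b4c2ca3 [2018-01-19 13:05:07 Central Standard Time]
OK 20000 / 76812401 [ 0.03%], 11.99 ms/it; ETA 10d 15:41; 32413ccc78fbad77 [2018-01-19 13:07:14 Central Standard Time]
OK 40000 / 76812401 [ 0.05%], 11.99 ms/it; ETA 10d 15:36; 45e38194e8bad318 [2018-01-19 13:11:21 Central Standard Time]
"""

# Captures: iteration count, ms/it, timestamp (time zone name is ignored).
LINE = re.compile(
    r"OK\s+(\d+) / \d+ \[.*?\], ([\d.]+) ms/it; .*"
    r"\[(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)"
)


def parse(text):
    """Return (iteration, ms_per_iter, timestamp) for each log line."""
    return [
        (int(m.group(1)), float(m.group(2)),
         datetime.strptime(m.group(3), "%Y-%m-%d %H:%M:%S"))
        for m in LINE.finditer(text)
    ]


def overheads(rows):
    """Wall-clock seconds per interval not explained by ms/it * iterations."""
    result = []
    for (i0, _, t0), (i1, ms, t1) in zip(rows, rows[1:]):
        wall_s = (t1 - t0).total_seconds()
        compute_s = (i1 - i0) * ms / 1000.0
        result.append(round(wall_s - compute_s, 2))
    return result


extra = overheads(parse(LOG))
print(extra)
```

Timestamps have one-second resolution, so each value is only good to about a second; all four intervals still land between 7 and 8 seconds regardless of interval length, which fits a fixed per-line cost better than a per-iteration one.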
|
|
|
|
|
|
#262 | |
|
"Composite as Heck"
Oct 2017
3·5²·11 Posts |
Quote:
http://www.mersenneforum.org/mayer/README.html#reserve |
|
|
|
|
|
|
#263 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3·13·139 Posts |
Log updated to disk last night, ~21 hours after the previous, and is now 5 million iterations and about 17 hours behind the program's progress.
As the Gerbicz check interval increased, ms/iter times became less drastic when off but more frequently off. Now that the Gerbicz check has advanced to 500k intervals, the ms/iter times are consistently too high, with values ~2534.03n + 11.99, where n is 1 or 2 and 11.99 (about 12) is the expected iteration time, in msec/iteration. That corresponds to time errors of 1,267,015,000 or 1.18 x 2^30 msec (~352 hours or 14 2/3 days) somewhere, occurring once or twice per 500k output interval. Taking the most common values of the 200k intervals, 6347.07-11.99 = 6335.08 ms/iter corresponds to an excess of 1,267,016,000 msec, very close to the 500k value; 1.208*2^20 seconds.
Last fiddled with by kriesel on 2018-01-21 at 18:41 |
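The same constant shows up in every anomalous line quoted in this thread. The sketch below multiplies the excess ms/it by the iteration interval for each case; the intervals are read off the neighbouring log lines, and 11.99 ms/it is assumed as the normal rate throughout (the line before the 1150000 entry actually shows 11.95).

```python
NORMAL_MS = 11.99  # assumed normal rate for all intervals

# (iterations in the interval, reported ms/it) from the quoted log lines
observations = [
    (20_000, 63362.75),    # 80000 -> 100000
    (50_000, 25352.29),    # 400000 -> 450000
    (100_000, 12682.14),   # 800000 -> 900000
    (5_000, 253415.12),    # 1145000 -> 1150000
    (200_000, 6347.07),    # most common value at 200k intervals
]

excess_ms = [(ms - NORMAL_MS) * n for n, ms in observations]

for (n, ms), e in zip(observations, excess_ms):
    print(f"interval {n:>7}: excess {e:,.0f} ms "
          f"= {e / 1000 / 2**20:.3f} * 2^20 s = {e / 86_400_000:.2f} days")
```

Each interval gives an excess within about a second of 1,267,015 s, which points to one fixed time offset being added rather than anything that scales with the work done.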
|
|
|
|
|
#264 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3×13×139 Posts |
Quote:
It seems to me from my limited NVIDIA testing that running dissimilar code instances together can produce significant increases in total throughput. For GpuOwL, it might be interesting to try one instance with -fft DP alongside one with -fft M61, to work both the DP and integer units. |
|
|
|
|