#1409 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
Nonzero pseudorandomly selected shift for gpuowl PRP would be useful. It would make life easier for uncwilly et al in the double, triple, quad checking effort, and gpuowl results could be checked with gpuowl.
#1410 | |
|
Jun 2003
1001111011101₂ Posts |
Quote:
/IIRC |
#1411 |
|
Romulan Interpreter
Jun 2011
Thailand
3×3,221 Posts |
Nope. Different shifts can DC an exponent, even if the same program was used. See my own DC history.
Which is not very good, because it can be easily abused: in CudaLucas, for example, there is no checksum, CRC, or secret key. This has been discussed many times in the past, but the current state has its advantages, and I personally would not like it changed. I would rather have a "short list" of "trusted" users who won't abuse it (and of course, I must be the first on the list; a mismatch in my self-DC-ed work is yet to be found, hehe). But this is not easy to implement. For example, even with a "short list" of users, the system is easy to abuse, since you can report (fake) work in another user's name and lower his credibility (is "denigrate" a word? ha, it seems it is!).
Last fiddled with by LaurV on 2019-10-02 at 15:42 |
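To make the shift idea concrete, here is a minimal Python sketch. It is not code from gpuowl or CUDALucas. It assumes a toy PRP-style computation of 3^(2^p) mod M_p in which the start value is multiplied by 2^shift. Because 2^p ≡ 1 (mod 2^p − 1), the shift simply doubles modulo p at every squaring and can be removed at the end. The function name `prp3_residue` and its parameters are hypothetical.

```python
def prp3_residue(p: int, shift: int = 0) -> int:
    """Toy PRP-style residue 3^(2^p) mod (2^p - 1), computed from a shifted start value.

    Illustration only: real programs work on FFT-based bignums and report
    a 64-bit slice of the residue, not the full value.
    """
    m = (1 << p) - 1              # Mersenne number M_p
    s = shift % p                 # 2^p == 1 (mod M_p), so shifts live modulo p
    x = (3 * pow(2, s, m)) % m    # shifted start value 3 * 2^s
    for _ in range(p):
        x = (x * x) % m           # squaring the value ...
        s = (2 * s) % p           # ... doubles the shift
    # Undo the accumulated shift by multiplying with 2^(p - s).
    return (x * pow(2, (p - s) % p, m)) % m
```

Runs started with different shifts pass through different intermediate values, yet `prp3_residue(p, s)` should agree with `prp3_residue(p, 0)` for every `s`. That agreement is what allows one run to double-check another, even when the same program is used for both.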
#1412 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
Quote:
Mprime/prime95 are limited to various FFT lengths, and therefore exponents, as a function of CPU capability, with at least FMA3 required to exceed 596M and only AVX512 able to exceed 920M and the mersenne.org 1G limit. Gpuowl (3.3G), Mlucas (~4.3G) and CUDALucas (2.1G) can far exceed that. The CPU-dependent limitation of prime95 affects P-1 as well as PRP and LL. (Running exponents above 10^9 is to be discouraged, since they are very slow, and there is now no online site like mersenne.org or mersenne.ca at which to coordinate effort or submit any such results.) My FMA3 hardware is scarce and AVX512 hardware nonexistent. But I have several GPUs capable of large exponents. Perhaps someday George (working with Mihai?) will produce special builds of gpuowl (Windows and Linux flavors) that include the security code and are considered trusted.
Last fiddled with by kriesel on 2019-10-02 at 16:25 |
#1413 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
19·397 Posts |
I have an unexplained gpuowl slowdown. The card is running 2 gpuowl instances. Two PRP tests completed within 7 seconds of each other. Upon starting the next tests a 6% slowdown is observed. I stopped the tests and resumed them (with a 30+ second stagger) and speeds are back to normal. Here are the two log files:
Code:
2019-10-11 15:59:54 radeon6.2 89048789 88800000 99.72%; 1718 us/sq; ETA 0d 00:07; 98485251fa66b1a7
2019-10-11 16:01:20 radeon6.2 89048789 88850000 99.78%; 1718 us/sq; ETA 0d 00:06; 890e528a5883bd6f
2019-10-11 16:02:46 radeon6.2 89048789 88900000 99.83%; 1715 us/sq; ETA 0d 00:04; 22e741d0d93ca955
2019-10-11 16:04:12 radeon6.2 89048789 88950000 99.89%; 1718 us/sq; ETA 0d 00:03; ad1698b105f02f48
2019-10-11 16:05:40 radeon6.2 89048789 OK 89000000 99.94%; 1718 us/sq; ETA 0d 00:01; 3c2df17632340d45 (check 2.40s)
2019-10-11 16:07:04 radeon6.2 CC 89048789 / 89048789, 45fcf6f11b116fYY
2019-10-11 16:07:06 radeon6.2 89048789 OK 89049000 100.00%; 1718 us/sq; ETA 0d 00:00; c1ec7cf569dc62YY (check 2.13s)
2019-10-11 16:07:06 radeon6.2 {"exponent":"89048789", "worktype":"PRP-3", "status":"C", "program":{"name":"gpuowl", "version":"v6.5-84-g30c0508"}, "timestamp":"2019-10-11 20:07:06 UTC", "user":"gw2", "computer":"radeon6.2", "aid":"3F98B6BAF4453D8B86F66870E33ED5DF", "fft-length":5242880, "res64":"45fcf6f11b116fYY", "residue-type":1}
2019-10-11 16:07:06 radeon6.2 89048803 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 16.98 bits/word
2019-10-11 16:07:06 radeon6.2 using short carry kernels
2019-10-11 16:07:06 radeon6.2 OpenCL args "-DEXP=89048803u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0x1.02ba3352d6a7ap+0 -DIWEIGHT_STEP=0x1.fa9a51aca2cfdp-1 -DWEIGHT_BIGSTEP=0x1.306fe0a31b715p+0 -DIWEIGHT_BIGSTEP=0x1.ae89f995ad3adp-1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-10-11 16:07:09 radeon6.2 OpenCL compilation in 2760 ms
2019-10-11 16:07:10 radeon6.2 89048803.owl not found, starting from the beginning.
2019-10-11 16:07:16 radeon6.2 89048803 OK 2000 0.00%; 987 us/sq; ETA 1d 00:25; 5c53bf84b606b38c (check 1.38s)
2019-10-11 16:08:42 radeon6.2 89048803 50000 0.06%; 1798 us/sq; ETA 1d 20:27; 0bce9df7b774451e
2019-10-11 16:10:14 radeon6.2 89048803 100000 0.11%; 1823 us/sq; ETA 1d 21:02; 6a7d31c61f3cae1f
2019-10-11 16:11:45 radeon6.2 89048803 150000 0.17%; 1821 us/sq; ETA 1d 20:58; 338e4f4e278beb78
Code:
2019-10-11 16:00:02 radeon6.1 89048411 88800000 99.72%; 1717 us/sq; ETA 0d 00:07; 0d9cd8ae231e6238
2019-10-11 16:01:28 radeon6.1 89048411 88850000 99.78%; 1719 us/sq; ETA 0d 00:06; 2c4b47d2e1394951
2019-10-11 16:02:54 radeon6.1 89048411 88900000 99.83%; 1721 us/sq; ETA 0d 00:04; 783b5263315e1130
2019-10-11 16:04:20 radeon6.1 89048411 88950000 99.89%; 1718 us/sq; ETA 0d 00:03; ca770a9a3e2a1db9
2019-10-11 16:05:48 radeon6.1 89048411 OK 89000000 99.94%; 1714 us/sq; ETA 0d 00:01; 8e609e2b77d4fa96 (check 2.47s)
2019-10-11 16:07:10 radeon6.1 CC 89048411 / 89048411, a0a0e9062a5434ZZ
2019-10-11 16:07:13 radeon6.1 89048411 OK 89049000 100.00%; 1682 us/sq; ETA 0d 00:00; 355a39c82bb90aZZ (check 2.20s)
2019-10-11 16:07:13 radeon6.1 {"exponent":"89048411", "worktype":"PRP-3", "status":"C", "program":{"name":"gpuowl", "version":"v6.5-84-g30c0508"}, "timestamp":"2019-10-11 20:07:13 UTC", "user":"gw2", "computer":"radeon6.1", "aid":"227598F5DFD8F69D5CB01A83AFF90933", "fft-length":5242880, "res64":"a0a0e9062a5434ZZ", "residue-type":1}
2019-10-11 16:07:13 radeon6.1 89048419 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 16.98 bits/word
2019-10-11 16:07:13 radeon6.1 using short carry kernels
2019-10-11 16:07:13 radeon6.1 OpenCL args "-DEXP=89048419u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0x1.02bd9028ab4b4p+0 -DIWEIGHT_STEP=0x1.fa93bc3216fp-1 -DWEIGHT_BIGSTEP=0x1.306fe0a31b715p+0 -DIWEIGHT_BIGSTEP=0x1.ae89f995ad3adp-1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-10-11 16:07:16 radeon6.1 OpenCL compilation in 2630 ms
2019-10-11 16:07:17 radeon6.1 89048419.owl not found, starting from the beginning.
2019-10-11 16:07:26 radeon6.1 89048419 OK 2000 0.00%; 1817 us/sq; ETA 1d 20:57; 991b5af4f773d55c (check 2.24s)
2019-10-11 16:08:53 radeon6.1 89048419 50000 0.06%; 1822 us/sq; ETA 1d 21:03; 953d0916398fa0ad
2019-10-11 16:10:24 radeon6.1 89048419 100000 0.11%; 1820 us/sq; ETA 1d 20:58; 8499cbc26e58f8d4
2019-10-11 16:11:55 radeon6.1 89048419 150000 0.17%; 1822 us/sq; ETA 1d 21:00; 111910b7676c7555
2019-10-11 16:13:26 radeon6.1 89048419 200000 0.22%; 1823 us/sq; ETA 1d 21:00; 3497e36e9c9e0b4b
My guess is that the 2 PRP tests somehow allocated unaligned memory, or the various weights, sin/cos, and FFT allocs were a "bad" distance apart. My concern is that I might experience similar problems on reboots. Any thoughts, preda? |
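The size of the slowdown can be read straight off the logs. The sketch below is a hypothetical helper, not part of gpuowl. It assumes the log format shown above, pulls the `us/sq` figure out of each line, and compares the averages before and after the new tests started, using a few radeon6.1 lines copied from the second log.

```python
import re
from statistics import mean

# Matches the per-iteration timing field, e.g. "1717 us/sq".
US_PER_SQ = re.compile(r"(\d+) us/sq")

def us_per_sq(lines):
    """Return the us/sq timing from every log line that reports one."""
    return [int(m.group(1)) for line in lines if (m := US_PER_SQ.search(line))]

def slowdown_pct(before, after):
    """Percent increase of the mean iteration time from `before` to `after`."""
    return (mean(after) / mean(before) - 1.0) * 100.0

BEFORE = [
    "2019-10-11 16:00:02 radeon6.1 89048411 88800000 99.72%; 1717 us/sq; ETA 0d 00:07; 0d9cd8ae231e6238",
    "2019-10-11 16:01:28 radeon6.1 89048411 88850000 99.78%; 1719 us/sq; ETA 0d 00:06; 2c4b47d2e1394951",
    "2019-10-11 16:02:54 radeon6.1 89048411 88900000 99.83%; 1721 us/sq; ETA 0d 00:04; 783b5263315e1130",
    "2019-10-11 16:04:20 radeon6.1 89048411 88950000 99.89%; 1718 us/sq; ETA 0d 00:03; ca770a9a3e2a1db9",
]
AFTER = [
    "2019-10-11 16:08:53 radeon6.1 89048419 50000 0.06%; 1822 us/sq; ETA 1d 21:03; 953d0916398fa0ad",
    "2019-10-11 16:10:24 radeon6.1 89048419 100000 0.11%; 1820 us/sq; ETA 1d 20:58; 8499cbc26e58f8d4",
    "2019-10-11 16:11:55 radeon6.1 89048419 150000 0.17%; 1822 us/sq; ETA 1d 21:00; 111910b7676c7555",
    "2019-10-11 16:13:26 radeon6.1 89048419 200000 0.22%; 1823 us/sq; ETA 1d 21:00; 3497e36e9c9e0b4b",
]

pct = slowdown_pct(us_per_sq(BEFORE), us_per_sq(AFTER))
print(f"{pct:.1f}% slower")
```

On these eight lines the result is about 6%, in line with the figure quoted in the post. Feeding the same helper the lines logged after a staggered restart would show whether the timings really return to normal.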
#1414 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1010100111101₂ Posts |
What is single-instance iteration time on the same gpu?
#1415 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
1110101110111₂ Posts |
#1416 | |
|
"Mihai Preda"
Apr 2015
3×457 Posts |
Quote:
Some kernels are more memory-heavy and some more compute-heavy. Maybe if they happen by chance to hit a bad phase where the kernels from the two instances are running memory-bound at the same time (or compute-bound at the same time), it would produce lower performance. I have no idea, though, whether such a phase pattern is stable. But all this is just guessing; I don't have much experience with running two instances in parallel. |
#1417 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
19×397 Posts |
ROCm 2.9 warning: Expect a 4+% slowdown if you "upgrade" from ROCm 2.5 and are running one instance of gpuowl. I saw times go from ~909 us to ~949 us.
The good news is my 2 instance timings dropped from ~1729 us to ~1723 us. |
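As a rough check on what these timings mean for throughput: with two instances each taking t us per squaring, the card completes two squarings every t us, so the effective cost is t/2 per squaring. A small sketch, using only the approximate figures quoted above (the function name is made up for illustration):

```python
def dual_instance_gain_pct(single_us: float, dual_us: float) -> float:
    """Throughput gain (%) of two instances at dual_us each over one instance at single_us."""
    effective_us = dual_us / 2.0          # two squarings finish every dual_us
    return (single_us / effective_us - 1.0) * 100.0

rocm25 = dual_instance_gain_pct(909, 1729)   # ROCm 2.5 timings from the post
rocm29 = dual_instance_gain_pct(949, 1723)   # ROCm 2.9 timings from the post
print(f"ROCm 2.5: {rocm25:.1f}%  ROCm 2.9: {rocm29:.1f}%")
```

On these numbers, two instances come out roughly 5% ahead of one under ROCm 2.5 and roughly 10% ahead under ROCm 2.9, so the single-instance regression does not carry over to the two-instance setup.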
#1418 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
19·397 Posts |
Quote:
Interestingly, the 10% slowdown happens every time 2 tests end and the next two begin. Of 5 GPUs, this is the only one that exhibits a slowdown. Weird. |
#1419 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1010100111101₂ Posts |
Quote:
I've seen throughput advantages to staggering multiple runs of other applications. (Sometimes requiring considerable desynch; up to an hour for CUDAPm1.) Presumably you've already looked for possible differences among the gpus (model, BIOS version), supporting system, and workloads.
Last fiddled with by kriesel on 2019-10-21 at 04:46 |