mersenneforum.org  

Old 2019-10-02, 15:04   #1409
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts
And...

Nonzero pseudorandomly selected shift for gpuowl PRP would be useful. It would make life easier for uncwilly et al in the double, triple, quad checking effort, and gpuowl results could be checked with gpuowl.
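
As a toy illustration of what a shift does (a sketch on small exponents, not gpuowl's or prime95's actual code): the value is stored multiplied by 2^s, each squaring doubles the shift mod p, and undoing the shift at the end gives back the same residue. The name `prp3_shifted` and the choice to compare 3^(2^p) against 9 are assumptions made for this example.

```python
def prp3_shifted(p: int, shift: int) -> int:
    """Return 3^(2^p) mod (2^p - 1), computed on a value shifted by `shift` bits.

    Toy model only: real programs do this with FFT multiplication on huge numbers.
    """
    m = (1 << p) - 1
    s = shift % p
    y = (3 << s) % m              # y = x * 2^s (mod m), with x = 3
    for _ in range(p):
        y = (y * y) % m           # (x * 2^s)^2 = x^2 * 2^(2s)
        s = (2 * s) % p           # 2^p = 1 (mod m), so shift counts wrap mod p
    # Undo the shift: multiplying by 2^(p - s) is the same as dividing by 2^s mod m.
    return (y << ((p - s) % p)) % m


# If 2^p - 1 is prime then 3^(2^p - 2) = 1, hence 3^(2^p) = 9 (mod 2^p - 1).
for p in (7, 11, 13):
    residues = {prp3_shifted(p, s) for s in range(p)}
    print(p, "probable prime" if residues == {9} else "composite", residues)
```

Every shift lands on the same final residue although the intermediate bit patterns differ, which is the property that makes a rerun with a different shift a meaningful check.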
Old 2019-10-02, 15:12   #1410
axn
 
Jun 2003

1001111011101₂ Posts

Quote:
Originally Posted by kriesel View Post
Nonzero pseudorandomly selected shift for gpuowl PRP would be useful. It would make life easier for uncwilly et al in the double, triple, quad checking effort, and gpuowl results could be checked with gpuowl.
I believe untrusted software cannot be doublechecked by another untrusted software, so at least one test must be done by P95/mprime. This is regardless of shift count.
/IIRC
Old 2019-10-02, 15:39   #1411
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

3×3,221 Posts

Nope. Different shifts can DC an exponent, even if the same program was used. See my own DC history.

This is not ideal, because it can easily be abused: CudaLucas, for example, has no checksum, CRC, or secret key. This has been discussed many times in the past, but the current state has its advantages, and I personally would not like it changed. I would rather have a "short list" of "trusted" users who won't abuse it (and of course, I must be the first on the list; a mismatch in my self-DC-ed work is yet to be found, hehe). But this is not easy to implement. For example, even with a "short list" of users, you can easily abuse the system by reporting (fake) work in another user's name and lowering his credibility (is "denigrate" a word? ha, it seems it is!).

Last fiddled with by LaurV on 2019-10-02 at 15:42
Old 2019-10-02, 16:11   #1412
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts

Quote:
Originally Posted by axn View Post
I believe untrusted software cannot be doublechecked by another untrusted software, so at least one test must be done by P95/mprime. This is regardless of shift count.
/IIRC
Interesting position. Yet CUDALucas on a gpu, having neither security codes, Jacobi check, nor Gerbicz check, went 18 for 18 good in a batch of strategic double checks I ran, while 8 of the 18 illegal sumout first tests in the batch (which I think were from prime95) were mismatches and subsequently verified bad by triple check. Gpuowl with the Gerbicz check may be technically untrusted, yet considerably more reliable than some prime95/mprime installations. The highest doublechecked exponents are mixed. https://www.mersenne.org/report_ll/?...exfactor=1&B1=

Mprime/prime95 are limited in FFT length, and therefore in exponent, as a function of CPU capability: at least FMA3 is required to exceed 596M, and only AVX512 is able to exceed 920M and the mersenne.org 1G limit. Gpuowl (3.3G), Mlucas (~4.3G) and CUDALucas (2.1G) can far exceed that. The CPU-dependent limitation of prime95 affects P-1 as well as PRP and LL. (Running exponents above 10^9 is to be discouraged, since they are very slow, and there is currently no online site like mersenne.org or mersenne.ca at which to coordinate effort or submit any such results.)
My FMA3 hardware is scarce and AVX512 hardware nonexistent. But I have several gpus capable of large exponents.

Perhaps someday George (working with Mihai?) will produce special builds of gpuowl that include the security code and are considered trusted. (Windows and linux flavors)

Last fiddled with by kriesel on 2019-10-02 at 16:25
Old 2019-10-12, 01:10   #1413
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

19·397 Posts
Mysterious slowdown

I have an unexplained gpuowl slowdown. The card is running 2 gpuowl instances. Two PRP tests completed within 7 seconds of each other. Upon starting the next tests a 6% slowdown is observed. I stopped the tests and resumed them (with a 30+ second stagger) and speeds are back to normal. Here are the two log files:

Code:
2019-10-11 15:59:54 radeon6.2 89048789    88800000 99.72%; 1718 us/sq; ETA 0d 00:07; 98485251fa66b1a7
2019-10-11 16:01:20 radeon6.2 89048789    88850000 99.78%; 1718 us/sq; ETA 0d 00:06; 890e528a5883bd6f
2019-10-11 16:02:46 radeon6.2 89048789    88900000 99.83%; 1715 us/sq; ETA 0d 00:04; 22e741d0d93ca955
2019-10-11 16:04:12 radeon6.2 89048789    88950000 99.89%; 1718 us/sq; ETA 0d 00:03; ad1698b105f02f48
2019-10-11 16:05:40 radeon6.2 89048789 OK 89000000 99.94%; 1718 us/sq; ETA 0d 00:01; 3c2df17632340d45 (check 2.40s)
2019-10-11 16:07:04 radeon6.2 CC 89048789 / 89048789, 45fcf6f11b116fYY
2019-10-11 16:07:06 radeon6.2 89048789 OK 89049000 100.00%; 1718 us/sq; ETA 0d 00:00; c1ec7cf569dc62YY (check 2.13s)
2019-10-11 16:07:06 radeon6.2 {"exponent":"89048789", "worktype":"PRP-3", "status":"C", "program":{"name":"gpuowl", "version":"v6.5-84-g30c0508"}, "timestamp":"2019-10-11 20:07:06 UTC", "user":"gw2", "computer":"radeon6.2", "aid":"3F98B6BAF4453D8B86F66870E33ED5DF", "fft-length":5242880, "res64":"45fcf6f11b116fYY", "residue-type":1}
2019-10-11 16:07:06 radeon6.2 89048803 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 16.98 bits/word
2019-10-11 16:07:06 radeon6.2 using short carry kernels
2019-10-11 16:07:06 radeon6.2 OpenCL args "-DEXP=89048803u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0x1.02ba3352d6a7ap+0 -DIWEIGHT_STEP=0x1.fa9a51aca2cfdp-1 -DWEIGHT_BIGSTEP=0x1.306fe0a31b715p+0 -DIWEIGHT_BIGSTEP=0x1.ae89f995ad3adp-1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-10-11 16:07:09 radeon6.2 OpenCL compilation in 2760 ms
2019-10-11 16:07:10 radeon6.2 89048803.owl not found, starting from the beginning.
2019-10-11 16:07:16 radeon6.2 89048803 OK     2000  0.00%;  987 us/sq; ETA 1d 00:25; 5c53bf84b606b38c (check 1.38s)
2019-10-11 16:08:42 radeon6.2 89048803       50000  0.06%; 1798 us/sq; ETA 1d 20:27; 0bce9df7b774451e
2019-10-11 16:10:14 radeon6.2 89048803      100000  0.11%; 1823 us/sq; ETA 1d 21:02; 6a7d31c61f3cae1f
2019-10-11 16:11:45 radeon6.2 89048803      150000  0.17%; 1821 us/sq; ETA 1d 20:58; 338e4f4e278beb78
and

Code:
2019-10-11 16:00:02 radeon6.1 89048411    88800000 99.72%; 1717 us/sq; ETA 0d 00:07; 0d9cd8ae231e6238
2019-10-11 16:01:28 radeon6.1 89048411    88850000 99.78%; 1719 us/sq; ETA 0d 00:06; 2c4b47d2e1394951
2019-10-11 16:02:54 radeon6.1 89048411    88900000 99.83%; 1721 us/sq; ETA 0d 00:04; 783b5263315e1130
2019-10-11 16:04:20 radeon6.1 89048411    88950000 99.89%; 1718 us/sq; ETA 0d 00:03; ca770a9a3e2a1db9
2019-10-11 16:05:48 radeon6.1 89048411 OK 89000000 99.94%; 1714 us/sq; ETA 0d 00:01; 8e609e2b77d4fa96 (check 2.47s)
2019-10-11 16:07:10 radeon6.1 CC 89048411 / 89048411, a0a0e9062a5434ZZ
2019-10-11 16:07:13 radeon6.1 89048411 OK 89049000 100.00%; 1682 us/sq; ETA 0d 00:00; 355a39c82bb90aZZ (check 2.20s)
2019-10-11 16:07:13 radeon6.1 {"exponent":"89048411", "worktype":"PRP-3", "status":"C", "program":{"name":"gpuowl", "version":"v6.5-84-g30c0508"}, "timestamp":"2019-10-11 20:07:13 UTC", "user":"gw2", "computer":"radeon6.1", "aid":"227598F5DFD8F69D5CB01A83AFF90933", "fft-length":5242880, "res64":"a0a0e9062a5434ZZ", "residue-type":1}
2019-10-11 16:07:13 radeon6.1 89048419 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 16.98 bits/word
2019-10-11 16:07:13 radeon6.1 using short carry kernels
2019-10-11 16:07:13 radeon6.1 OpenCL args "-DEXP=89048419u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0x1.02bd9028ab4b4p+0 -DIWEIGHT_STEP=0x1.fa93bc3216fp-1 -DWEIGHT_BIGSTEP=0x1.306fe0a31b715p+0 -DIWEIGHT_BIGSTEP=0x1.ae89f995ad3adp-1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-10-11 16:07:16 radeon6.1 OpenCL compilation in 2630 ms
2019-10-11 16:07:17 radeon6.1 89048419.owl not found, starting from the beginning.
2019-10-11 16:07:26 radeon6.1 89048419 OK     2000  0.00%; 1817 us/sq; ETA 1d 20:57; 991b5af4f773d55c (check 2.24s)
2019-10-11 16:08:53 radeon6.1 89048419       50000  0.06%; 1822 us/sq; ETA 1d 21:03; 953d0916398fa0ad
2019-10-11 16:10:24 radeon6.1 89048419      100000  0.11%; 1820 us/sq; ETA 1d 20:58; 8499cbc26e58f8d4
2019-10-11 16:11:55 radeon6.1 89048419      150000  0.17%; 1822 us/sq; ETA 1d 21:00; 111910b7676c7555
2019-10-11 16:13:26 radeon6.1 89048419      200000  0.22%; 1823 us/sq; ETA 1d 21:00; 3497e36e9c9e0b4b

My guess is that the 2 PRP tests somehow allocated unaligned memory or the various weights, sin/cos, and FFT allocs were a "bad" distance apart.

My concern is that I might experience similar problems on reboots. Any thoughts preda?
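
For anyone wanting to check their own logs for the same step, here is a rough sketch that averages the us/sq figures per exponent. The regex is inferred only from the progress lines quoted above and is an assumption, not a documented gpuowl log format; `mean_us_per_exponent` and `min_iter` are names made up for this example. Lines below `min_iter` are skipped because the very first 2000-iteration report can be unrepresentative (987 us/sq in one of the logs above).

```python
import re
from collections import defaultdict
from statistics import mean

# Matches progress lines such as:
# 2019-10-11 16:10:14 radeon6.2 89048803      100000  0.11%; 1823 us/sq; ETA ...
LINE = re.compile(
    r"^\S+ \S+ (?P<host>\S+) (?P<exp>\d+)\s+(?:OK\s+)?(?P<iter>\d+)\s+"
    r"[\d.]+%;\s+(?P<us>\d+) us/sq;"
)


def mean_us_per_exponent(log_text, min_iter=50_000):
    """Map exponent -> mean us/sq over progress lines at or past min_iter."""
    samples = defaultdict(list)
    for line in log_text.splitlines():
        m = LINE.match(line.strip())
        if m and int(m["iter"]) >= min_iter:
            samples[int(m["exp"])].append(int(m["us"]))
    return {exp: mean(values) for exp, values in samples.items()}


SAMPLE = """\
2019-10-11 16:04:12 radeon6.2 89048789    88950000 99.89%; 1718 us/sq; ETA 0d 00:03; ad1698b105f02f48
2019-10-11 16:05:40 radeon6.2 89048789 OK 89000000 99.94%; 1718 us/sq; ETA 0d 00:01; 3c2df17632340d45 (check 2.40s)
2019-10-11 16:07:16 radeon6.2 89048803 OK     2000  0.00%;  987 us/sq; ETA 1d 00:25; 5c53bf84b606b38c (check 1.38s)
2019-10-11 16:10:14 radeon6.2 89048803      100000  0.11%; 1823 us/sq; ETA 1d 21:02; 6a7d31c61f3cae1f
2019-10-11 16:11:45 radeon6.2 89048803      150000  0.17%; 1821 us/sq; ETA 1d 20:58; 338e4f4e278beb78
"""

means = mean_us_per_exponent(SAMPLE)
slowdown = means[89048803] / means[89048789] - 1
print(means, f"slowdown {slowdown:.1%}")
```

On these five lines the two averages are 1718 and 1822 us/sq, i.e. roughly the 6% step described above.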
Old 2019-10-12, 02:12   #1414
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1010100111101₂ Posts

What is single-instance iteration time on the same gpu?
Old 2019-10-12, 03:59   #1415
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1110101110111₂ Posts

Quote:
Originally Posted by kriesel View Post
What is single-instance iteration time on the same gpu?
907 us.

Running two instances at 1718 us gives better throughput (but uses more electricity).
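
The arithmetic behind that, as a quick sketch (the helper name is made up for this example; the timings are the ones quoted in this thread):

```python
def iterations_per_second(us_per_iteration: float, instances: int = 1) -> float:
    """Combined throughput of `instances` runs that each take us_per_iteration."""
    return instances * 1_000_000 / us_per_iteration


single = iterations_per_second(907)                # one instance at 907 us
dual = iterations_per_second(1718, instances=2)    # two instances at 1718 us each
slowed = iterations_per_second(1822, instances=2)  # two instances after the slowdown

print(f"single {single:.0f} it/s, dual {dual:.0f} it/s, gain {dual / single - 1:+.1%}")
print(f"dual after slowdown {slowed:.0f} it/s, gain {slowed / single - 1:+.1%}")
```

Two instances at 1718 us come out about 5.6% ahead of one instance at 907 us. At the slowed rate of roughly 1822 us the pair falls slightly behind a single instance, so the slowdown cancels the whole benefit of running two.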
Old 2019-10-12, 07:07   #1416
preda
 
"Mihai Preda"
Apr 2015

3×457 Posts

Quote:
Originally Posted by Prime95 View Post
[...] My guess is that the 2 PRP tests somehow allocated unaligned memory or the various weights, sin/cos, and FFT allocs were a "bad" distance apart.

My concern is that I might experience similar problems on reboots. Any thoughts preda?
The initial buffer setup (done CPU-side) should take much less than 10 s (on the order of 1 s?), so I don't expect the buffer memory allocations of the two instances to be interleaved.

Some kernels are more memory-heavy and some more compute-heavy. Maybe if they happen by chance to hit a bad phase, where the kernels from the two instances are memory-bound at the same time (or compute-bound at the same time), it would produce lower performance. I have no idea though whether such a phase pattern is stable. But all this is just guessing -- I don't have much experience with running two instances in parallel.
Old 2019-10-20, 05:25   #1417
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

19×397 Posts

ROCm 2.9 warning: Expect a 4+% slowdown if you "upgrade" from ROCm 2.5 and are running one instance of gpuowl. I saw times go from ~909 us to ~949 us.

The good news is my 2 instance timings dropped from ~1729 us to ~1723 us.
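
Put as percentages, a small sketch using only the timings quoted in this post (`relative_change` is a made-up helper name):

```python
def relative_change(before_us: float, after_us: float) -> float:
    """Fractional change in iteration time; positive means slower."""
    return (after_us - before_us) / before_us


one_instance = relative_change(909, 949)     # single-instance timing
two_instances = relative_change(1729, 1723)  # per-instance timing with two running

print(f"one instance:  {one_instance:+.1%}")
print(f"two instances: {two_instances:+.1%}")
```

That works out to about 4.4% slower for one instance and about 0.3% faster for two.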
Old 2019-10-21, 01:54   #1418
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

19·397 Posts

Quote:
Originally Posted by Prime95 View Post
My guess is that the 2 PRP tests somehow allocated unaligned memory or the various weights, sin/cos, and FFT allocs were a "bad" distance apart.

My concern is that I might experience similar problems on reboots. Any thoughts preda?

Interestingly, the 10% slowdown happens every time 2 tests end and the next two begin. Of 5 GPUs, this is the only one that exhibits a slowdown.

Weird.
Old 2019-10-21, 04:34   #1419
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1010100111101₂ Posts

Quote:
Originally Posted by Prime95 View Post
Interestingly, the 10% slowdown happens every time 2 tests end and the next two begin. Of 5 GPUs, this is the only one that exhibits a slowdown.

Weird.
That implies the runs are essentially synchronized. Judging by https://www.mersenneforum.org/showpo...postcount=1413 a minor desynch is sufficient.

I've seen throughput advantages to staggering multiple runs of other applications. (Sometimes requiring considerable desynch; up to an hour for CUDAPm1.)

Presumably you've already looked for possible differences among the gpus (model, BIOS version), supporting system, and workloads.
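
One way to enforce a stagger when starting several instances is a small launcher like the sketch below. It is generic and makes no assumption about any program's command-line options: the placeholder command is there only so the example runs anywhere, and `launch_staggered` and `stagger_seconds` are made-up names. The 30 second default mirrors the "30+ second stagger" mentioned earlier in the thread.

```python
import subprocess
import sys
import time


def launch_staggered(commands, stagger_seconds=30.0):
    """Start each command in order, sleeping between launches; return the Popen handles."""
    procs = []
    for i, cmd in enumerate(commands):
        if i > 0:
            time.sleep(stagger_seconds)  # delay so the instances do not start in lockstep
        procs.append(subprocess.Popen(cmd))
    return procs


# Placeholder commands (a Python process that exits immediately); substitute the
# real per-instance command lines for the program being run.
placeholder = [sys.executable, "-c", "pass"]
procs = launch_staggered([placeholder, placeholder], stagger_seconds=0.1)
exit_codes = [p.wait() for p in procs]
print(exit_codes)
```

This only controls the initial offset. It does not help when two tests finish within seconds of each other and the next pair starts in step, which is the situation described above.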

Last fiddled with by kriesel on 2019-10-21 at 04:46

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.