mersenneforum.org gpuOwL: an OpenCL program for Mersenne primality testing

2020-01-10, 14:09   #1750
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2^6×5×23 Posts

Quote:
 Originally Posted by preda Ken, I'm aware of your complaint against those warnings, and I did look into them. IMO those warnings are invalid, a compiler problem. They could be silenced with some effort, but again IMO that effort is not worth expending because the [invalid] warnings are an inconvenience only for the person building the program (Ken) but not for the users.
Better you than me to look into whether the warnings indicate possible improper program execution that could cause problems. And they will recur for anyone else doing a similar build in a similar way, perhaps on a commit that I skip.

2020-01-10, 19:06   #1751
preda

"Mihai Preda"
Apr 2015

5A1_16 Posts

Quote:
 Originally Posted by Prime95 If P-1 finds a factor are both the P-1 and PRP lines deleted?
Related to the "implicit P-1 before PRP" feature, would it be possible to request manual assignments of type "first-time PRP without P-1 done", for somebody who's willing to do both P-1 and PRP?

Right now for my own testing, I request P-1 and fudge the type to PRP.
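For anyone wondering what that fudge looks like on disk, here is a sketch assuming Prime95-style worktodo.txt syntax (the AID, trial-factoring depth, and tests-saved fields below are placeholders, and the exact field layout may differ between clients):

```
# Line as handed out for a P-1 assignment (hypothetical values):
Pfactor=0123456789ABCDEF0123456789ABCDEF,1,2,94000013,-1,76,2

# Hand-edited so the client runs a PRP test on the same exponent instead:
PRP=0123456789ABCDEF0123456789ABCDEF,1,2,94000013,-1,76,2
```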

2020-01-10, 19:38   #1752
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2^3×1,019 Posts

Quote:
 Originally Posted by preda Related to the "implicit P-1 before PRP" feature, would it be possible to request manual assignments of type "first-time PRP without P-1 done", for somebody who's willing to do both P-1 and PRP? Right now for my own testing, I request P-1 and fudge the type to PRP.
Try requesting a P-1 assignment, then unreserve it, and then request a PRP assignment with a range of one exponent (the exponent that P-1 returned).

2020-01-10, 19:43   #1753
PhilF

"6800 descendent"
Feb 2005

1011100000_2 Posts

Quote:
 Originally Posted by preda Right now for my own testing, I request P-1 and fudge the type to PRP.
I've been doing just the opposite.

I request a first-time PRP, but I fudge the type to P-1 to report the P-1 results. It seems to work, but that may be only because I am getting CAT3 and CAT4 exponents; none of them seems to have had any P-1 factoring done at all.

2020-01-10, 20:13   #1754
PhilF

"6800 descendent"
Feb 2005

2E0_16 Posts

Quote:
 Originally Posted by preda if I simply drop the PRP assignment from worktodo.txt on P-1 factor found, it would still be assigned on the server even if the factor is reported?
I had an exponent (M103464293) that I had manually checked out as a PRP test. When a P-1 factor was found and the results from gpuowl were submitted, the result was accepted and the PRP test was removed.

2020-01-11, 01:19   #1755
preda

"Mihai Preda"
Apr 2015

11·131 Posts

Quote:
 Originally Posted by PhilF I had an exponent (M103464293) that I had manually checked out as a PRP test. When a P-1 factor was found and the results from gpuowl were submitted, the result was accepted and the PRP test was removed.
Good, this is the behavior we want from the server (i.e. to drop the PRP assignment when a factor-found is submitted by the same user). Do you remember the AID behavior -- did you submit the P-1 result with the AID of the PRP, or with no AID? (AID == Assignment ID)

2020-01-11, 01:45   #1756
PhilF

"6800 descendent"
Feb 2005

2^5·23 Posts

Quote:
 Originally Posted by preda Good, this is the behavior we want from the server (i.e. to drop the PRP assignment when a factor-found is submitted by the same user). Do you remember the AID behavior -- did you submit the P-1 result with the AID of the PRP, or with no AID? (AID == Assignment ID)
I'm pretty sure in that case I submitted it with the AID. But I have also submitted a few P-1 no factor results without the AID, which were also accepted and properly credited.

While I have your attention, I have a question. If a factor is found during the stage 1 GCD, does that stop the already-running stage 2 so it can move on to the next exponent?

2020-01-11, 03:05   #1757
preda

"Mihai Preda"
Apr 2015

11×131 Posts

Quote:
 Originally Posted by PhilF If a factor is found during the stage 1 GCD, does that stop the already-running stage 2 so it can move on to the next exponent?
Yes, that's how it's supposed to work. If it doesn't, it's a bug.
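The idea being discussed -- run the stage 1 GCD in the background while stage 2 is already underway, and abandon stage 2 as soon as the GCD turns up a factor -- can be sketched as follows. This is an illustration only, not gpuowl's actual code; the function name and structure are invented for the example:

```python
import threading
from math import gcd

def p1_overlap_sketch(stage1_residue, N, stage2_blocks):
    """Run the stage 1 GCD in a background thread while stage 2 proceeds.

    If the GCD finds a factor, an event flag makes the stage 2 loop
    stop early so the program can move on to the next exponent.
    """
    factor_found = threading.Event()
    result = {}

    def stage1_gcd():
        g = gcd(stage1_residue, N)
        if 1 < g < N:
            result["factor"] = g
            factor_found.set()   # signal stage 2 to stop

    t = threading.Thread(target=stage1_gcd)
    t.start()

    blocks_done = 0
    for block in stage2_blocks:
        if factor_found.is_set():
            break                # abandon stage 2; factor already known
        blocks_done += 1         # (real code would do stage 2 work here)

    t.join()
    return result.get("factor"), blocks_done
```

The key design point is that the (comparatively slow) GCD never blocks stage 2; it only gets the chance to cut it short.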

2020-01-11, 12:08   #1758
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

7360_10 Posts

Quote:
 Originally Posted by PhilF If a factor is found during the stage 1 GCD, does that stop the already-running stage 2 so it can move on to the next exponent?
Yes. And I've observed it to correctly report B1 and not B2 in that case. (IIRC, at an earlier version still reporting B2 for a stage 1 factor was a bug, subsequently fixed.)
Code:
2019-11-23 07:54:23 414000127 P1  4500000  99.97%; 23332 us/sq; ETA 0d 00:00; 2fe7a97d66c4de7a
2019-11-23 07:54:53 414000127 P1  4501162 100.00%; 24325 us/sq; ETA 0d 00:00; ca35d9148b84d827
2019-11-23 07:54:53 414000127 P2 using blocks [104 - 2494] to cover 3507310 primes
2019-11-23 07:54:55 414000127 P2 using 25 buffers of 192.0 MB each
2019-11-23 08:09:29 414000127 P2   25/2880: setup 2950 ms; 28503 us/prime, 30578 primes
2019-11-23 08:09:29 414000127 P1 GCD: 17000407212943276068260591201
2019-11-23 08:09:30 {"exponent":"414000127", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-9-g9ae3189"}, "timestamp":"2019-11-23 14:09:30 UTC", "user":"kriesel", "computer":"emu/gtx1080", "aid":"0", "fft-length":25165824, "B1":3120000, "factors":["17000407212943276068260591201"]}
2019-11-23 08:09:30 419000017 FFT 24576K: Width 256x4, Height 256x4, Middle 12; 16.65 bits/word
2019-11-23 08:09:31 OpenCL args "-DEXP=419000017u -DWIDTH=1024u -DSMALL_HEIGHT=1024u -DMIDDLE=12u -DWEIGHT_STEP=0xa.33167595f77ap-3 -DIWEIGHT_STEP=0xc.8cafe8fb59668p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DORIG_X2=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-11-23 08:09:34

2019-11-23 08:09:34 OpenCL compilation in 3547 ms
2019-11-23 08:09:41 419000017 P1 B1=3140000, B2=75360000; 4529903 bits; starting at 0
2019-11-23 08:13:36 419000017 P1    10000   0.22%; 23475 us/sq; ETA 1d 05:28; 52356fa6dbf7ef75

2020-01-11, 16:53   #1759
PhilF

"6800 descendent"
Feb 2005
Colorado

2^5×23 Posts

I have an interesting observation concerning gpuowl and thermal throttling with the Radeon VII.

It is so cold here that I decided today to crank my Radeon VII to my "high power" setting, which pulls 185W and has the fan set at 95%. The junction temperature was hovering around 92 degrees. This gave me a consistent 893us timing on a 102M exponent, but the "check" timing was varying between 0.48s and 0.54s. That was strange, because on my cooler "medium power" setting of 165W and 85 degrees, the check timing was always quite consistent.

On a whim I decided to crank the fan up to 99%, giving me only an extra 100 RPM, but that dropped the junction temperature 2 degrees, down to 90. Now the check timing has returned to a consistent 0.53s, while the iteration timing is unchanged at 893us.

I can only assume that 893us is too short a sample time to detect the onset of thermal throttling, meaning the "check" timing code can detect it first. I submit that if your check time is varying but your iteration time is consistent, then your card is at the onset of thermal throttling.

Thoughts?
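The heuristic proposed above -- steady per-iteration timings but wandering "check" timings signal the onset of throttling -- can be written down as a tiny check. This is only an illustration of the idea, not gpuowl code; the function name and the 5% spread threshold are made up for the example:

```python
def throttling_onset(iter_times_us, check_times_s, rel_spread=0.05):
    """Flag the onset of thermal throttling per the heuristic above:
    each iteration is too short a sample to catch throttling, so its
    timings look steady, while the much longer 'check' timings wander.

    rel_spread is an arbitrary threshold: a (max-min)/min ratio above
    it counts as 'varying'.
    """
    def varies(xs):
        return (max(xs) - min(xs)) / min(xs) > rel_spread

    return varies(check_times_s) and not varies(iter_times_us)
```

With the numbers from the post, iteration timings of a constant 893us together with check timings between 0.48s and 0.54s would be flagged, while a steady 0.53s check timing would not.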
2020-01-11, 18:55   #1760
nomead

"Sam Laur"
Dec 2018
Turku, Finland

317 Posts

Ok, some benchmarks on the RTX 2080 again, clock locked to 1920 MHz. Testing 94000013 with -fft +2, which still seems to be the fastest combination on this card (width 512, height 512, middle 10).

On the 2020 Jan 02 version (commits up to f1b00d1) the big difference was between CARRY32 and CARRY64. Oddly enough, on this card the performance was the opposite of what was expected:

CARRY64: 3.618 ms/iter
CARRY32: 3.792 ms/iter

But even with CARRY64 it was a couple percent faster than when I last tested it (2019 Dec 11), so there are definitely improvements here and there.

On the freshest 2020 Jan 11 version (commits up to 61f00d9) with the default options, and with either CARRY64 or CARRY32, the program exits with errors:

Code:
2020-01-11 20:10:25 GeForce RTX 2080-0 94000013 OK 0 loaded: blockSize 400, 0000000000000003
2020-01-11 20:10:29 GeForce RTX 2080-0 94000013 EE 800 0.00%; 3474 us/it; ETA 3d 18:42; 599534af704d7e17 (check 1.43s)
2020-01-11 20:10:30 GeForce RTX 2080-0 94000013 OK 0 loaded: blockSize 400, 0000000000000003
2020-01-11 20:10:35 GeForce RTX 2080-0 94000013 EE 800 0.00%; 3473 us/it; ETA 3d 18:42; 599534af704d7e17 (check 1.43s) 1 errors
2020-01-11 20:10:36 GeForce RTX 2080-0 94000013 OK 0 loaded: blockSize 400, 0000000000000003
2020-01-11 20:10:40 GeForce RTX 2080-0 94000013 EE 800 0.00%; 3473 us/it; ETA 3d 18:42; 599534af704d7e17 (check 1.43s) 2 errors
2020-01-11 20:10:40 GeForce RTX 2080-0 3 sequential errors, will stop.
2020-01-11 20:10:40 GeForce RTX 2080-0 Exiting because "too many errors"
2020-01-11 20:10:40 GeForce RTX 2080-0 Bye

This seems to be connected to the MiddleMul1 implementations somehow. I first thought it came from the trig routines, but changing them only reduced the errors and didn't remove them completely. So now, one at a time, everything still with CARRY64...

- Trig options, with ORIGINAL_METHOD:
ORIG_SLOWTRIG: 3.618 ms (which matches the Jan 02 performance, as it should)
NEW_SLOWTRIG,MORE_ACCURATE: 3.527 ms
NEW_SLOWTRIG,LESS_ACCURATE: 3.513 ms

- MiddleMul1 options, with ORIG_SLOWTRIG:
ORIGINAL_TWEAKED: 3.618 ms (same as ORIGINAL_METHOD, weird?)
FANCY_MIDDLEMUL1: 3.605 ms
MORE_SQUARES_MIDDLEMUL1: 3.620 ms
CHEBYSHEV_METHOD: 3.563 ms (wow!)
CHEBYSHEV_METHOD_FMA: fails checks, but gives the same 3.563 ms timing, so never mind.

So, everything combined:
./gpuowl -use NO_ASM,CARRY64,NEW_SLOWTRIG,LESS_ACCURATE,CHEBYSHEV_METHOD -yield -log 10000 -prp 94000013 -iters 100000 -fft +2
Hmm... 3.463 ms/iter, but it fails checks after 20k iterations, then gets stuck there and exits after three errors.

Try again:
./gpuowl -use NO_ASM,CARRY64,NEW_SLOWTRIG,MORE_ACCURATE,CHEBYSHEV_METHOD -yield -log 10000 -prp 94000013 -iters 100000 -fft +2
Now it's 3.471 ms/iter, but it still fails checks after 30k iterations. Crap.

Third time, perhaps?
./gpuowl -use NO_ASM,CARRY64,NEW_SLOWTRIG,MORE_ACCURATE,FANCY_MIDDLEMUL1 -yield -log 10000 -prp 94000013 -iters 100000 -fft +2
Now we're back to 3.514 ms... and it runs smoothly for 100k iters at least.

Let's see if the trig options still help a bit:
./gpuowl -use NO_ASM,CARRY64,NEW_SLOWTRIG,LESS_ACCURATE,FANCY_MIDDLEMUL1 -yield -log 10000 -prp 94000013 -iters 100000 -fft +2
Works, 3.500 ms/iter pretty much spot on (in 100k iters, lowest 3.496, highest 3.500). So that's still more than 3% off the default, neat.
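Sweeping those -use combinations by hand is tedious; a small driver script can generate the command lines. This is only a sketch built from the flags shown in the post above -- it assumes a ./gpuowl binary in the current directory and just prints the commands rather than running them and parsing the timings:

```python
import itertools
import shlex
import subprocess

# Option values taken from the benchmark post above.
TRIG = ["ORIG_SLOWTRIG", "NEW_SLOWTRIG,MORE_ACCURATE", "NEW_SLOWTRIG,LESS_ACCURATE"]
MIDDLE = ["ORIGINAL_TWEAKED", "FANCY_MIDDLEMUL1",
          "MORE_SQUARES_MIDDLEMUL1", "CHEBYSHEV_METHOD"]

def commands(exponent=94000013, iters=100000):
    """Yield one gpuowl command line per trig/MiddleMul1 combination."""
    for trig, middle in itertools.product(TRIG, MIDDLE):
        use = ",".join(["NO_ASM", "CARRY64", trig, middle])
        yield (f"./gpuowl -use {use} -yield -log 10000 "
               f"-prp {exponent} -iters {iters} -fft +2")

for cmd in commands():
    print(cmd)
    # To actually run each benchmark, uncomment:
    # subprocess.run(shlex.split(cmd), check=False)
```

Each run would then need its ms/iter figure pulled out of gpuowl's log by hand (or with a further script) to build a table like the one above.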

