2020-01-10, 14:09  #1750
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^{6}×5×23 Posts 
2020-01-10, 19:06  #1751
"Mihai Preda"
Apr 2015
5A1_{16} Posts 
Related to the "implicit P-1 before PRP" feature: would it be possible to request a manual assignment of type "first-time PRP without P-1 done", for somebody who is willing to do both the P-1 and the PRP?
Right now, for my own testing, I request P-1 and fudge the type to PRP.
2020-01-10, 19:38  #1752
P90 years forever!
Aug 2002
Yeehaw, FL
2^{3}×1,019 Posts 
Try requesting a P-1 assignment, then unreserve it, and then request a PRP assignment with a range of one exponent (the exponent that the P-1 request returned).

2020-01-10, 19:43  #1753
"6800 descendent"
Feb 2005
Colorado
1011100000_{2} Posts 
I request a first-time PRP, but I fudge the type to P-1 to report the P-1 results. It seems to work, but that may only be because I am getting CAT3 and CAT4 exponents. None of them seems to have had any P-1 factoring done on it at all.

2020-01-10, 20:13  #1754
"6800 descendent"
Feb 2005
Colorado
2E0_{16} Posts 
I had an exponent (M103464293) that I had manually checked out as a PRP test. When a P-1 factor was found and the results from gpuowl were submitted, the result was accepted and the PRP test was removed.

2020-01-11, 01:19  #1755
"Mihai Preda"
Apr 2015
11·131 Posts 
Good, this is the behavior we want from the server (i.e. to drop the PRP assignment when a factor-found result is submitted by the same user). Do you remember the AID behavior: did you submit the P-1 result with the AID of the PRP, or with no AID? (AID == Assignment ID)

2020-01-11, 01:45  #1756
"6800 descendent"
Feb 2005
Colorado
2^{5}·23 Posts 
While I have your attention, I have a question. If a factor is found during the stage 1 GCD, does that stop the already-running stage 2 so it can move on to the next exponent?

2020-01-11, 03:05  #1757
"Mihai Preda"
Apr 2015
11×131 Posts 

2020-01-11, 12:08  #1758
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7360_{10} Posts 
Code:
20191123 07:54:23 414000127 P1 4500000 99.97%; 23332 us/sq; ETA 0d 00:00; 2fe7a97d66c4de7a
20191123 07:54:53 414000127 P1 4501162 100.00%; 24325 us/sq; ETA 0d 00:00; ca35d9148b84d827
20191123 07:54:53 414000127 P2 using blocks [104  2494] to cover 3507310 primes
20191123 07:54:55 414000127 P2 using 25 buffers of 192.0 MB each
20191123 08:09:29 414000127 P2 25/2880: setup 2950 ms; 28503 us/prime, 30578 primes
20191123 08:09:29 414000127 P1 GCD: 17000407212943276068260591201
20191123 08:09:30 {"exponent":"414000127", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.119g9ae3189"}, "timestamp":"20191123 14:09:30 UTC", "user":"kriesel", "computer":"emu/gtx1080", "aid":"0", "fftlength":25165824, "B1":3120000, "factors":["17000407212943276068260591201"]}
20191123 08:09:30 419000017 FFT 24576K: Width 256x4, Height 256x4, Middle 12; 16.65 bits/word
20191123 08:09:31 OpenCL args "DEXP=419000017u DWIDTH=1024u DSMALL_HEIGHT=1024u DMIDDLE=12u DWEIGHT_STEP=0xa.33167595f77ap3 DIWEIGHT_STEP=0xc.8cafe8fb59668p4 DWEIGHT_BIGSTEP=0xd.744fccad69d68p3 DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p4 DORIG_X2=1 I. clfastrelaxedmath clstd=CL2.0"
20191123 08:09:34
20191123 08:09:34 OpenCL compilation in 3547 ms
20191123 08:09:41 419000017 P1 B1=3140000, B2=75360000; 4529903 bits; starting at 0
20191123 08:13:36 419000017 P1 10000 0.22%; 23475 us/sq; ETA 1d 05:28; 52356fa6dbf7ef75
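For anyone who wants to sanity-check a reported factor before submitting it, here is a minimal Python sketch. It is not part of gpuowl; the function names are made up for illustration. It assumes the result line is JSON with the "exponent" and "factors" fields seen in the log above, and it relies on two standard facts: q divides 2^p - 1 exactly when 2^p ≡ 1 (mod q), and for an odd prime p every factor of 2^p - 1 is ≡ 1 (mod 2p) and ≡ ±1 (mod 8).

```python
import json


def divides_mersenne(p: int, q: int) -> bool:
    """True if q divides 2**p - 1, i.e. 2**p is congruent to 1 modulo q."""
    return pow(2, p, q) == 1


def has_mersenne_factor_form(p: int, q: int) -> bool:
    """Necessary form of a factor of 2**p - 1 for an odd prime p:
    q = 2*k*p + 1 for some k, and q is 1 or 7 modulo 8."""
    return q % (2 * p) == 1 and q % 8 in (1, 7)


def check_result_line(line: str) -> dict:
    """Map each factor in a JSON result line to whether it really divides 2**p - 1.

    Assumes the fields "exponent" (string) and "factors" (list of strings),
    as in the result line quoted in the log above.
    """
    rec = json.loads(line)
    p = int(rec["exponent"])
    return {int(f): divides_mersenne(p, int(f)) for f in rec.get("factors", [])}


# Result record from the log above, trimmed to the fields used here.
sample = ('{"exponent":"414000127", "worktype":"PM1", "status":"F", '
          '"factors":["17000407212943276068260591201"]}')
print(check_result_line(sample))
```

The modular exponentiation is cheap even for a 400M-bit Mersenne number, because `pow(2, p, q)` only ever works with numbers the size of q.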

2020-01-11, 16:53  #1759
"6800 descendent"
Feb 2005
Colorado
2^{5}×23 Posts 
I have an interesting observation concerning gpuowl and thermal throttling with the Radeon VII.
It is so cold here that I decided today to crank my Radeon VII to my "high power" setting, which pulls 185W and has the fan set at 95%. The junction temperature was hovering around 92 degrees. This gave me a consistent 893us timing on a 102M exponent, but the "check" timing was varying between 0.48s and 0.54s. That was strange, because on my cooler "medium power" setting of 165W and 85 degrees the check timing was always quite consistent. On a whim I decided to crank the fan up to 99%, which gave me only an extra 100 RPM, but that dropped the junction temperature by 2 degrees, down to 90. Now the check timing has returned to a consistent 0.53s, while the iteration timing is unchanged at 893us. I can only assume that 893us is too short a sample time to detect the onset of thermal throttling, meaning the "check" timing code can detect it first. I submit that if your check time is varying but the iteration time is consistent, then your card is at the onset of thermal throttling. Thoughts?
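The heuristic above (steady iteration time, wandering check time) can be sketched as a small log scanner. This is only an illustration, not a gpuowl feature: the function names and both tolerance thresholds are assumptions picked for the example, and the regular expression assumes progress lines of the form "... 893 us/it; ... (check 0.48s)" as seen in gpuowl logs quoted in this thread.

```python
import re
from statistics import mean

# Matches the per-iteration time and the check time on one progress line.
LINE_RE = re.compile(r"(\d+) us/it;.*\(check ([\d.]+)s\)")


def spread(values):
    """Relative spread: (max - min) / mean."""
    return (max(values) - min(values)) / mean(values)


def throttling_hint(log_lines, iter_tol=0.01, check_tol=0.05):
    """True if iteration times are steady while check times vary.

    iter_tol and check_tol are hypothetical thresholds; tune them per card.
    """
    iters, checks = [], []
    for line in log_lines:
        m = LINE_RE.search(line)
        if m:
            iters.append(float(m.group(1)))
            checks.append(float(m.group(2)))
    if len(iters) < 2:
        return False  # not enough samples to judge
    return spread(iters) <= iter_tol and spread(checks) > check_tol
```

With the numbers from the post, a run at 893 us/it whose check times range from 0.48s to 0.54s would be flagged, while a run with a steady 0.53s check time would not.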
2020-01-11, 18:55  #1760
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts 
OK, some benchmarks on the RTX 2080 again, clock locked to 1920 MHz. Testing 94000013 with fft +2, which still seems to be the fastest combination on this card (width 512, height 512, middle 10).
On the 2020 Jan 02 version (commits up to f1b00d1) the big difference was between CARRY32 and CARRY64. Oddly enough, on this card, the performance was the opposite of what was expected:
CARRY64: 3.618 ms/iter
CARRY32: 3.792 ms/iter
But even with CARRY64 it was a couple percent faster than when I tested it the last time (2019 Dec 11), so there are definitely improvements here and there.

On the freshest 2020 Jan 11 version (commits up to 61f00d9) with the default options, and with either CARRY64 or CARRY32, the program exits with errors:
Code:
20200111 20:10:25 GeForce RTX 20800 94000013 OK 0 loaded: blockSize 400, 0000000000000003
20200111 20:10:29 GeForce RTX 20800 94000013 EE 800 0.00%; 3474 us/it; ETA 3d 18:42; 599534af704d7e17 (check 1.43s)
20200111 20:10:30 GeForce RTX 20800 94000013 OK 0 loaded: blockSize 400, 0000000000000003
20200111 20:10:35 GeForce RTX 20800 94000013 EE 800 0.00%; 3473 us/it; ETA 3d 18:42; 599534af704d7e17 (check 1.43s) 1 errors
20200111 20:10:36 GeForce RTX 20800 94000013 OK 0 loaded: blockSize 400, 0000000000000003
20200111 20:10:40 GeForce RTX 20800 94000013 EE 800 0.00%; 3473 us/it; ETA 3d 18:42; 599534af704d7e17 (check 1.43s) 2 errors
20200111 20:10:40 GeForce RTX 20800 3 sequential errors, will stop.
20200111 20:10:40 GeForce RTX 20800 Exiting because "too many errors"
20200111 20:10:40 GeForce RTX 20800 Bye

Trig options, with ORIGINAL_METHOD:
ORIG_SLOWTRIG: 3.618 ms (which matches the Jan 02 performance, as it should)
NEW_SLOWTRIG,MORE_ACCURATE: 3.527 ms
NEW_SLOWTRIG,LESS_ACCURATE: 3.513 ms

MiddleMul1 options, with ORIG_SLOWTRIG:
ORIGINAL_TWEAKED: 3.618 ms (same as ORIGINAL_METHOD, weird?)
FANCY_MIDDLEMUL1: 3.605 ms
MORE_SQUARES_MIDDLEMUL1: 3.620 ms
CHEBYSHEV_METHOD: 3.563 ms (wow!)
CHEBYSHEV_METHOD_FMA: Fails checks, but gives the same 3.563 ms timing, so never mind.

So, everything combined:
./gpuowl use NO_ASM,CARRY64,NEW_SLOWTRIG,LESS_ACCURATE,CHEBYSHEV_METHOD yield log 10000 prp 94000013 iters 100000 fft +2
Hmm... 3.463 ms/iter, but it fails checks after 20k iterations, then gets stuck there and exits after three errors.

Try again:
./gpuowl use NO_ASM,CARRY64,NEW_SLOWTRIG,MORE_ACCURATE,CHEBYSHEV_METHOD yield log 10000 prp 94000013 iters 100000 fft +2
Now it's 3.471 ms/iter, but it still fails checks after 30k iterations. Crap.

Third time, perhaps?
./gpuowl use NO_ASM,CARRY64,NEW_SLOWTRIG,MORE_ACCURATE,FANCY_MIDDLEMUL1 yield log 10000 prp 94000013 iters 100000 fft +2
Now we're back to 3.514 ms... and it runs smoothly for 100k iters at least.

Let's see if the trig options still help a bit?
./gpuowl use NO_ASM,CARRY64,NEW_SLOWTRIG,LESS_ACCURATE,FANCY_MIDDLEMUL1 yield log 10000 prp 94000013 iters 100000 fft +2
Works, 3.500 ms/iter pretty much spot on (in 100k iters, lowest 3.496, highest 3.500). So that's still more than 3% off, neat.
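To make the "more than 3% off" figure easy to reproduce, here is a short Python sketch that ranks the timings reported in this post against the 3.618 ms/iter baseline. The numbers are copied from the post; the two combinations that failed checks (3.463 and 3.471 ms/iter) are deliberately left out, and the variable and function names are just illustrative.

```python
BASELINE_MS = 3.618  # ORIG_SLOWTRIG with ORIGINAL_METHOD and CARRY64, from the post

# Per-iteration timings (ms) reported in the post, excluding runs that failed checks.
reported_ms = {
    "NEW_SLOWTRIG,MORE_ACCURATE": 3.527,
    "NEW_SLOWTRIG,LESS_ACCURATE": 3.513,
    "FANCY_MIDDLEMUL1": 3.605,
    "MORE_SQUARES_MIDDLEMUL1": 3.620,
    "CHEBYSHEV_METHOD": 3.563,
    "NEW_SLOWTRIG,MORE_ACCURATE,FANCY_MIDDLEMUL1": 3.514,
    "NEW_SLOWTRIG,LESS_ACCURATE,FANCY_MIDDLEMUL1": 3.500,
}


def time_saved_pct(baseline_ms: float, tuned_ms: float) -> float:
    """Percentage of per-iteration time saved relative to the baseline."""
    return 100.0 * (baseline_ms - tuned_ms) / baseline_ms


# Fastest first; a negative percentage means slower than the baseline.
for name, ms in sorted(reported_ms.items(), key=lambda kv: kv[1]):
    print(f"{ms:.3f} ms/iter  {time_saved_pct(BASELINE_MS, ms):+.2f}%  {name}")
```

Note that this measures time saved per iteration; expressed as a throughput gain (baseline divided by tuned time) the same comparison comes out slightly higher.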