mersenneforum.org  

2020-01-10, 14:09   #1750
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2·17·139 Posts

Quote:
Originally Posted by preda
Ken, I'm aware of your complaint against those warnings, and I did look into them. IMO those warnings are invalid, a compiler problem. They could be silenced with some effort, but again IMO that effort is not worth expending because the [invalid] warnings are an inconvenience only for the person building the program (Ken) but not for the users.

Better you than me to look at whether the warnings may indicate improper program execution that could cause problems. And they will recur for anyone else doing a similar build in a similar way, perhaps on a commit that I skip.

2020-01-10, 19:06   #1751
preda
"Mihai Preda"
Apr 2015
2460₈ Posts

Quote:
Originally Posted by Prime95
If P-1 finds a factor, are both the P-1 and PRP lines deleted?

Related to the "implicit P-1 before PRP" feature, would it be possible to request manual assignments "first-time PRP without P-1 done", for somebody who's willing to do both P-1 and PRP?

Right now for my own testing, I request P-1 and fudge the type to PRP.

2020-01-10, 19:38   #1752
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
1BF6₁₆ Posts

Quote:
Originally Posted by preda
Related to the "implicit P-1 before PRP" feature, would it be possible to request manual assignments "first-time PRP without P-1 done", for somebody who's willing to do both P-1 and PRP? Right now for my own testing, I request P-1 and fudge the type to PRP.

Try requesting a P-1 assignment, then unreserve it, and then request a PRP assignment with a range of one exponent (the exponent that the P-1 request returned).

2020-01-10, 19:43   #1753
PhilF
Feb 2005
Colorado
3²·61 Posts

Quote:
Originally Posted by preda
Right now for my own testing, I request P-1 and fudge the type to PRP.

I've been doing just the opposite.

I request a first-time PRP, but I fudge the type to P-1 to report the P-1 results. Seems to work, but that may be only because I am getting CAT3 and CAT4 exponents. None of them seems to have had any P-1 factoring done on them at all.

2020-01-10, 20:13   #1754
PhilF
Feb 2005
Colorado
3²·61 Posts

Quote:
Originally Posted by preda
if I simply drop the PRP assignment from worktodo.txt on P-1 factor found, it would still be assigned on the server even if the factor is reported?

I had an exponent (M103464293) that I had manually checked out as a PRP test. When a P-1 factor was found and the results from gpuowl were submitted, the result was accepted and the PRP test was removed.

2020-01-11, 01:19   #1755
preda
"Mihai Preda"
Apr 2015
10100110000₂ Posts

Quote:
Originally Posted by PhilF
I had an exponent (M103464293) that I had manually checked out as a PRP test. When a P-1 factor was found and the results from gpuowl were submitted, the result was accepted and the PRP test was removed.

Good, this is the behavior we want from the server (i.e. to drop the PRP assignment when a factor-found is submitted by the same user). Do you remember the AID behavior -- did you submit the P-1 result with the AID of the PRP, or with no AID? (AID == Assignment ID)

2020-01-11, 01:45   #1756
PhilF
Feb 2005
Colorado
549₁₀ Posts

Quote:
Originally Posted by preda
Good, this is the behavior we want from the server (i.e. to drop the PRP assignment when a factor-found is submitted by the same user). Do you remember the AID behavior -- did you submit the P-1 result with the AID of the PRP, or with no AID? (AID == Assignment ID)

I'm pretty sure in that case I submitted it with the AID. But I have also submitted a few P-1 no-factor results without the AID, which were also accepted and properly credited.

While I have your attention, I have a question. If a factor is found during the stage 1 GCD, does that stop the already-running stage 2 so it can move on to the next exponent?

2020-01-11, 03:05   #1757
preda
"Mihai Preda"
Apr 2015
2⁴×83 Posts

Quote:
Originally Posted by PhilF
If a factor is found during the stage 1 GCD, does that stop the already-running stage 2 so it can move on to the next exponent?

Yes, that's how it's supposed to work. If it doesn't, it's a bug.

2020-01-11, 12:08   #1758
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2·17·139 Posts

Quote:
Originally Posted by PhilF
If a factor is found during the stage 1 GCD, does that stop the already-running stage 2 so it can move on to the next exponent?

Yes. And I've observed it to correctly report B1 and not B2 in that case. (IIRC, in an earlier version, still reporting B2 for a stage 1 factor was a bug, subsequently fixed.)
Code:
2019-11-23 07:54:23 414000127 P1  4500000  99.97%; 23332 us/sq; ETA 0d 00:00; 2fe7a97d66c4de7a
2019-11-23 07:54:53 414000127 P1  4501162 100.00%; 24325 us/sq; ETA 0d 00:00; ca35d9148b84d827
2019-11-23 07:54:53 414000127 P2 using blocks [104 - 2494] to cover 3507310 primes
2019-11-23 07:54:55 414000127 P2 using 25 buffers of 192.0 MB each
2019-11-23 08:09:29 414000127 P2   25/2880: setup 2950 ms; 28503 us/prime, 30578 primes
2019-11-23 08:09:29 414000127 P1 GCD: 17000407212943276068260591201
2019-11-23 08:09:30 {"exponent":"414000127", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-9-g9ae3189"}, "timestamp":"2019-11-23 14:09:30 UTC", "user":"kriesel", "computer":"emu/gtx1080", "aid":"0", "fft-length":25165824, "B1":3120000, "factors":["17000407212943276068260591201"]}
2019-11-23 08:09:30 419000017 FFT 24576K: Width 256x4, Height 256x4, Middle 12; 16.65 bits/word
2019-11-23 08:09:31 OpenCL args "-DEXP=419000017u -DWIDTH=1024u -DSMALL_HEIGHT=1024u -DMIDDLE=12u -DWEIGHT_STEP=0xa.33167595f77ap-3 -DIWEIGHT_STEP=0xc.8cafe8fb59668p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DORIG_X2=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-11-23 08:09:34 

2019-11-23 08:09:34 OpenCL compilation in 3547 ms
2019-11-23 08:09:41 419000017 P1 B1=3140000, B2=75360000; 4529903 bits; starting at 0
2019-11-23 08:13:36 419000017 P1    10000   0.22%; 23475 us/sq; ETA 1d 05:28; 52356fa6dbf7ef75
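
For anyone post-processing logs like the one above, here is a minimal Python sketch (not part of gpuowl) that pulls the JSON result out of a log line and checks whether the factor was reported with B1 only. The helper names `parse_result` and `found_in_stage1` are made up for illustration, and reading "B1 present, B2 absent" as "stage 1 factor" is an assumption based on the behavior described in this post.

```python
import json

# The result line copied from the log above: a timestamp prefix, then a JSON object.
LOG_LINE = (
    '2019-11-23 08:09:30 {"exponent":"414000127", "worktype":"PM1", "status":"F", '
    '"program":{"name":"gpuowl", "version":"v6.11-9-g9ae3189"}, '
    '"timestamp":"2019-11-23 14:09:30 UTC", "user":"kriesel", '
    '"computer":"emu/gtx1080", "aid":"0", "fft-length":25165824, '
    '"B1":3120000, "factors":["17000407212943276068260591201"]}'
)


def parse_result(line):
    """Return the JSON object embedded in a log line, or None if there is none."""
    start = line.find("{")
    if start < 0:
        return None
    return json.loads(line[start:])


def found_in_stage1(result):
    """Assumed heuristic: a factor reported with B1 but without B2 came from stage 1."""
    return result.get("status") == "F" and "B1" in result and "B2" not in result


result = parse_result(LOG_LINE)
print("exponent:", result["exponent"])
print("B1:", result["B1"], "B2 reported:", "B2" in result)
print("stage 1 factor (by the heuristic):", found_in_stage1(result))
```

Lines without a JSON object (the ordinary progress lines) simply come back as `None`, so the same function can be run over a whole log file.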

2020-01-11, 16:53   #1759
PhilF
Feb 2005
Colorado
3²·61 Posts

I have an interesting observation concerning gpuowl and thermal throttling with the Radeon VII.

It is so cold here that I decided today to crank my Radeon VII to my "high power" setting, which pulls 185W and has the fan set at 95%. The junction temperature was hovering around 92 degrees.

This gave me a consistent 893us timing on a 102M exponent, but the "check" timing was varying between 0.48s and 0.54s. That was strange, because on my cooler "medium power" setting of 165W and 85 degrees the check timing was always quite consistent. On a whim I decided to crank the fan up to 99%, giving me only an extra 100 RPM, but that dropped the junction temperature 2 degrees, down to 90.

Now the check timing has returned to a consistent 0.53s, but the iteration timing is unchanged at 893us.

I can only assume that 893us is too short of a sample time to detect the onset of thermal throttling, meaning the "check" timing code can detect it first. I submit that if your check time is varying but the iteration time is consistent, then your card is at the onset of thermal throttling.
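
A rough way to look for that pattern in a log is sketched below in Python. It is only an illustration of the heuristic: the sample lines are synthetic (built to mirror the 893us and 0.48s to 0.54s figures above), the log layout is assumed to match the "us/it ... (check N.NNs)" lines gpuowl prints for PRP, and the two tolerances are arbitrary values, not measured thresholds.

```python
import re
from statistics import mean, pstdev

# Matches the "NNN us/it" and "(check N.NNs)" fields of a PRP progress line.
LINE_RE = re.compile(r"(\d+) us/it;.*\(check ([\d.]+)s\)")


def relative_spread(values):
    """Population standard deviation divided by the mean (0.0 for constant data)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def throttling_suspected(log_lines, iter_tol=0.005, check_tol=0.02):
    """Flag steady iteration timing combined with jittery check timing.

    iter_tol and check_tol are illustrative guesses, not calibrated values.
    """
    iters, checks = [], []
    for line in log_lines:
        m = LINE_RE.search(line)
        if m:
            iters.append(float(m.group(1)))
            checks.append(float(m.group(2)))
    if len(iters) < 2:
        return False  # not enough samples to say anything
    return relative_spread(iters) <= iter_tol and relative_spread(checks) > check_tol


def make_line(us_per_it, check_s):
    """Build a synthetic log line; only the two timing fields matter here."""
    return (f"102000000 OK 1000 0.00%; {us_per_it} us/it; ETA 1d 00:00; "
            f"0123456789abcdef (check {check_s:.2f}s)")


jittery = [make_line(893, c) for c in (0.48, 0.54, 0.51, 0.49)]
steady = [make_line(893, 0.53) for _ in range(4)]
print(throttling_suspected(jittery), throttling_suspected(steady))
```

With real logs the tolerances would need tuning per card, since some check-time variation is probably normal even on a cool GPU.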

Thoughts?

2020-01-11, 18:55   #1760
nomead
"Sam Laur"
Dec 2018
Turku, Finland
2³×41 Posts

OK, some benchmarks on the RTX 2080 again, clock locked to 1920 MHz. Testing 94000013 with -fft +2, which still seems to be the fastest combination on this card (width 512, height 512, middle 10).

On the 2020 Jan 02 version (commits up to f1b00d1) the big difference was between CARRY32 and CARRY64. Oddly enough, on this card, the performance was the opposite of what was expected:
CARRY64: 3.618 ms/iter
CARRY32: 3.792 ms/iter
But even with CARRY64 it was a couple of percent faster than when I last tested it (2019 Dec 11), so there are definitely improvements here and there.

On the freshest 2020 Jan 11 version (commits up to 61f00d9) with the default options, and with either CARRY64 or CARRY32, the program exits with errors:
Code:
2020-01-11 20:10:25 GeForce RTX 2080-0 94000013 OK        0 loaded: blockSize 400, 0000000000000003
2020-01-11 20:10:29 GeForce RTX 2080-0 94000013 EE      800   0.00%; 3474 us/it; ETA 3d 18:42; 599534af704d7e17 (check 1.43s)
2020-01-11 20:10:30 GeForce RTX 2080-0 94000013 OK        0 loaded: blockSize 400, 0000000000000003
2020-01-11 20:10:35 GeForce RTX 2080-0 94000013 EE      800   0.00%; 3473 us/it; ETA 3d 18:42; 599534af704d7e17 (check 1.43s) 1 errors
2020-01-11 20:10:36 GeForce RTX 2080-0 94000013 OK        0 loaded: blockSize 400, 0000000000000003
2020-01-11 20:10:40 GeForce RTX 2080-0 94000013 EE      800   0.00%; 3473 us/it; ETA 3d 18:42; 599534af704d7e17 (check 1.43s) 2 errors
2020-01-11 20:10:40 GeForce RTX 2080-0 3 sequential errors, will stop.
2020-01-11 20:10:40 GeForce RTX 2080-0 Exiting because "too many errors"
2020-01-11 20:10:40 GeForce RTX 2080-0 Bye
This seems to be connected to the MiddleMul1 implementations somehow. I first thought it came from the trig routines, but changing them only reduced the errors, and didn't remove them completely. So now, one at a time, everything still with CARRY64...

- trig options, with ORIGINAL_METHOD:
ORIG_SLOWTRIG: 3.618 ms (which matches the Jan 02 performance as it should)
NEW_SLOWTRIG,MORE_ACCURATE: 3.527 ms
NEW_SLOWTRIG,LESS_ACCURATE: 3.513 ms

- MiddleMul1 options, with ORIG_SLOWTRIG:
ORIGINAL_TWEAKED: 3.618 ms (same as ORIGINAL_METHOD, weird?)
FANCY_MIDDLEMUL1: 3.605 ms
MORE_SQUARES_MIDDLEMUL1: 3.620 ms
CHEBYSHEV_METHOD: 3.563 ms (wow!)
CHEBYSHEV_METHOD_FMA: Fails checks, but gives the same 3.563 ms timing, so never mind.

So, everything combined:
./gpuowl -use NO_ASM,CARRY64,NEW_SLOWTRIG,LESS_ACCURATE,CHEBYSHEV_METHOD -yield -log 10000 -prp 94000013 -iters 100000 -fft +2
Hmm... 3.463 ms/iter but fails checks after 20k iterations and then gets stuck there and exits after three errors.

Try again:
./gpuowl -use NO_ASM,CARRY64,NEW_SLOWTRIG,MORE_ACCURATE,CHEBYSHEV_METHOD -yield -log 10000 -prp 94000013 -iters 100000 -fft +2
Now it's 3.471 ms/iter but it still fails checks after 30k iterations. Crap.

Third time, perhaps?
./gpuowl -use NO_ASM,CARRY64,NEW_SLOWTRIG,MORE_ACCURATE,FANCY_MIDDLEMUL1 -yield -log 10000 -prp 94000013 -iters 100000 -fft +2
Now we're back to 3.514 ms... and it runs smoothly for 100k iters at least. Let's see if the trig options still help a bit?

./gpuowl -use NO_ASM,CARRY64,NEW_SLOWTRIG,LESS_ACCURATE,FANCY_MIDDLEMUL1 -yield -log 10000 -prp 94000013 -iters 100000 -fft +2
Works, 3.500 ms/iter pretty much spot on. (in 100k iters, lowest 3.496 highest 3.500)

So that's still more than 3% off, neat.
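
To make the arithmetic behind that figure explicit, here is a small Python sketch that compares the two combinations that passed the checks against the 3.618 ms baseline. The numbers are the ones quoted in this post; the function name `percent_saved` is just a label chosen for the example.

```python
# ms/iter figures quoted above (RTX 2080, exponent 94000013, CARRY64).
BASELINE_MS = 3.618  # ORIG_SLOWTRIG with ORIGINAL_METHOD, same as the Jan 02 build

# Only the combinations that ran 100k iterations without failing checks.
passing_configs = {
    "NEW_SLOWTRIG,MORE_ACCURATE,FANCY_MIDDLEMUL1": 3.514,
    "NEW_SLOWTRIG,LESS_ACCURATE,FANCY_MIDDLEMUL1": 3.500,
}


def percent_saved(baseline_ms, new_ms):
    """Percentage reduction in time per iteration relative to the baseline."""
    return 100.0 * (baseline_ms - new_ms) / baseline_ms


for name, ms in passing_configs.items():
    saved = percent_saved(BASELINE_MS, ms)
    print(f"{name}: {ms:.3f} ms/iter, {saved:.2f}% less time per iteration")
```

The faster CHEBYSHEV_METHOD timings are left out on purpose, since those runs failed the checks.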

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.