mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing > GpuOwl

Old 2021-12-26, 19:55   #2751
ewmayer

Quote:
Originally Posted by Prime95
I suspect if you run the same exponent on an AMD GPU you would have the same problem. But maybe not, as gpuowl chooses different default optimizations for the different architectures.

IIRC, Mihai was pretty aggressive in choosing FFT size -- something like being willing to tolerate a 0.5% chance of a bad run. I think I lobbied for at most 0.1%. I don't remember what was finally settled on. No matter what target we chose, this scenario was going to be a possibility -- even a likelihood.
Still, there's no good reason the run should just abort, since the context - an exponent close to an FFT-length breakover point - points to fatal ROE (not detected by gpuOwl except in this backdoor way) as the likely cause of the repeated GEC errors. The easy thing to do is for the program to auto-switch to the next-larger FFT length. Or rerun an iteration interval bracketing the error with ROE checking enabled. Or restart from the last savefile and fiddle the residue shift - one more reason why supporting shift is useful - since a different shift at the same iteration yields different error behavior.
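A minimal Python sketch of that recovery policy (run_block, restart, next_fft_length and new_shift are hypothetical stand-ins, not gpuOwl's API):
Code:
def run_with_recovery(run_block, restart, next_fft_length, new_shift,
                      fft_len, shift, max_shift_retries=2):
    """On a GEC failure, retry from the last savefile with a different residue
    shift; after repeated failures, assume a fatal ROE near the FFT breakover
    and switch to the next-larger FFT length instead of aborting."""
    failures = 0
    while True:
        status = run_block(fft_len, shift)       # one Gerbicz-checked block: 'ok', 'gec_error' or 'done'
        if status == 'done':
            return fft_len
        if status == 'ok':
            failures = 0
            continue
        failures += 1                            # status == 'gec_error'
        if failures <= max_shift_retries:
            shift = new_shift()                  # same iteration, different shift, different error behavior
        else:
            fft_len = next_fft_length(fft_len)   # more headroom per word
            failures = 0
        restart(fft_len, shift)                  # reload the last savefile and continue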
Old 2022-01-03, 14:57   #2752
ATH

Quote:
Originally Posted by ATH
I can find 19 "PRP Bad" results, which is a lot more than I thought, but they come from only 4 different users:
A new PRP DC mismatch: 85090169
Old 2022-02-08, 16:21   #2753
Xyzzy

3060 Ti
Code:
20220208 09:22:45 GpuOwl VERSION v7.2-86-gddf3314
20220208 09:22:45 config: -maxAlloc 4096M
20220208 09:22:45 config: -yield
20220208 09:22:45 config: -iters 1000000 -prp 77936867 
20220208 09:22:45 device 0, unique id ''
20220208 09:22:45 NVIDIA GeForce RTX 3060 Ti-0 77936867 FFT: 4M 1K:8:256 (18.58 bpw)
20220208 09:22:46 NVIDIA GeForce RTX 3060 Ti-0 77936867 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1 -DWEIGHT_STEP=0.33644726404543274 -DIWEIGHT_STEP=-0.25174750481886216 -DIWEIGHTS={0,-0.44011820345520131,-0.37306474779553728,-0.29798072935699788,-0.21390437908665341,-0.11975874301407295,-0.014337887291734644,-0.44814572555075455,} -DFWEIGHTS={0,0.78609128957452257,0.5950610473469905,0.42446232150303748,0.2721098723818392,0.1360521812214803,0.014546452690911484,0.81207258201996746,}  -cl-std=CL2.0 -cl-finite-math-only "
20220208 09:22:47 NVIDIA GeForce RTX 3060 Ti-0 77936867 OpenCL compilation in 1.17 s
20220208 09:22:47 NVIDIA GeForce RTX 3060 Ti-0 77936867 maxAlloc: 4.0 GB
20220208 09:22:47 NVIDIA GeForce RTX 3060 Ti-0 77936867 P1(0) 0 bits
20220208 09:22:47 NVIDIA GeForce RTX 3060 Ti-0 77936867 PRP starting from beginning
20220208 09:22:49 NVIDIA GeForce RTX 3060 Ti-0 77936867 OK         0 on-load: blockSize 400, 0000000000000003
20220208 09:22:49 NVIDIA GeForce RTX 3060 Ti-0 77936867 validating proof residues for power 8
20220208 09:22:49 NVIDIA GeForce RTX 3060 Ti-0 77936867 Proof using power 8
20220208 09:22:53 NVIDIA GeForce RTX 3060 Ti-0 77936867 OK       800   0.00% 1579c241dc63eca6 3387 us/it + check 1.39s + save 0.09s; ETA 3d 01:20
20220208 09:23:24 NVIDIA GeForce RTX 3060 Ti-0 77936867     10000 fc4f135f7cf4ad29 3400
20220208 09:23:58 NVIDIA GeForce RTX 3060 Ti-0 77936867     20000 3cd1bd9d5e09cbc5 3370
20220208 09:24:32 NVIDIA GeForce RTX 3060 Ti-0 77936867     30000 c4e0ff35e3290d98 3368
20220208 09:25:05 NVIDIA GeForce RTX 3060 Ti-0 77936867     40000 dffe1b1b0d748128 3386
20220208 09:25:39 NVIDIA GeForce RTX 3060 Ti-0 77936867     50000 52e286945371ed29 3389
20220208 09:26:13 NVIDIA GeForce RTX 3060 Ti-0 77936867     60000 0945da4dc08bdd95 3389
20220208 09:26:47 NVIDIA GeForce RTX 3060 Ti-0 77936867     70000 7131fa4eb77f4bb2 3389
20220208 09:27:21 NVIDIA GeForce RTX 3060 Ti-0 77936867     80000 8d76071d27ee4221 3389
20220208 09:27:55 NVIDIA GeForce RTX 3060 Ti-0 77936867     90000 0bacff453b2f470e 3389
20220208 09:28:29 NVIDIA GeForce RTX 3060 Ti-0 77936867    100000 6d7296b9e2830f50 3389
20220208 09:29:03 NVIDIA GeForce RTX 3060 Ti-0 77936867    110000 8cbfd4435622bda7 3389
20220208 09:29:37 NVIDIA GeForce RTX 3060 Ti-0 77936867    120000 79ae5dad855057ad 3389
20220208 09:30:10 NVIDIA GeForce RTX 3060 Ti-0 77936867    130000 50c97bcbf876231f 3389
20220208 09:30:44 NVIDIA GeForce RTX 3060 Ti-0 77936867    140000 e1db15f897271496 3389
20220208 09:31:18 NVIDIA GeForce RTX 3060 Ti-0 77936867    150000 127631386c6a9b17 3389
20220208 09:31:52 NVIDIA GeForce RTX 3060 Ti-0 77936867    160000 25b7b6206fc6f085 3389
20220208 09:32:26 NVIDIA GeForce RTX 3060 Ti-0 77936867    170000 416816b0d9f4bba8 3389
20220208 09:33:00 NVIDIA GeForce RTX 3060 Ti-0 77936867    180000 6bee5d054f770861 3391
20220208 09:33:34 NVIDIA GeForce RTX 3060 Ti-0 77936867    190000 f37f068f014b18a0 3387
20220208 09:34:09 NVIDIA GeForce RTX 3060 Ti-0 77936867 OK    200000   0.26% f0b04b45b0855bd2 3395 us/it + check 1.39s + save 0.10s; ETA 3d 01:18
Old 2022-03-21, 04:11   #2754
LaurV

Today we got an error when restarting the 332M PRP tests with gpuOwl on two of our Colab instances. They were using the same remote storage, which somehow got either disconnected or temporarily full (100 GB, but used for other things too), and the last proof checkpoint file was not saved in either case.

[Attached screenshots: 332252653.JPG, 332252719.JPG]
Now, as seen in the screenshots, one is 38% and the other 40% into the rabbit hole, so we can either (option A) restart from scratch and lose another 5 days (probably more, because in the beginning we got better cards, which we will not get again; as the job gets older and uses the resources constantly, the hardware we get will "deprecate" over time), but play nice and generate proof files, or (option B) play selfish and finish our tests, which will take another 7-8 days, get our credit, and let others waste their cycles doing a double-check.

This time, over the weekend, we didn't run that script to periodically rename the ".old" files to keep them, the Colab instance used up its time and got shut down, and we were not in front of the terminal for two days, so we cannot resume from a recent stage. We wasted some time trying to recover older ".old" files or other checkpoints, but as the space is shared, the chances were quite small, and the success rate was zero. We have JUST the FORMER proof checkpoints, saved on the 19th (the "crash" was yesterday, the 20th of March, but we resumed today in the morning), i.e. for the instance which wants 119403396 we have 118105533, and for the one which wants 124594752 we have 123296890.

Double you tea eff man?

Just a few hours behind. If we had the former ".old" file saved (i.e. the one before the current .old), then everything would have been OK.

Which extremely freaks us out. Better not to cross our path today!

So, we use the "opportunity" to advocate again for an option in gpuOwl to keep all the checkpoint residue files, the same way cudaLucas does! And, if possible, with proper names (that include the residue in the file name - easy to compare from the OS without actually opening the files, etc. - the same arguments we already made here in the past - in a situation like this, it would save a lot of time for the project).

We will choose option B: play selfish this time, and let the tests finish. Sorry for whoever may have to DC them in the future.
(read as "fcuk whoever will do the DC")

Last fiddled with by LaurV on 2022-03-21 at 04:17 Reason: spaces
Old 2022-03-21, 05:19   #2755
Zhangrc

Quote:
Originally Posted by LaurV
Today we got an error when restarting the 332M PRP test with gpuOwl on two of our Colab instances.
Try chmod 777 and deleting memlock-0; if that doesn't work, try copying the files to your own computer and running on your own GPU. If nothing works, wait for an update of GPUOwl in the hope that it addresses the problem.
Old 2022-03-27, 05:59   #2756
preda

Quote:
Originally Posted by LaurV
So, we use the "opportunity" to advocate again for an option in gpuOwl to keep all the checkpoint residue files, the same way cudaLucas does! And, if possible, with proper names (that include the residue in the file name - easy to compare from the OS without actually opening the files, etc. - the same arguments we already made here in the past - in a situation like this, it would save a lot of time for the project).
Laur, the version I'm using (v7.2-86) does keep many checkpoints to restart from, and what's more, the number is configurable (so it's possible to keep many more if desired), and they also have the iteration number in the filenames. E.g. this is one of my exponents:
Code:
13968008 Mar 26 00:50 111743537-032000000.prp
13968008 Mar 26 04:09 111743537-040000000.prp
13968007 Mar 26 08:19 111743537-050000000.prp
13968008 Mar 26 08:49 111743537-051200000.prp
13968008 Mar 26 12:29 111743537-060000000.prp
13968006 Mar 26 14:09 111743537-064000000.prp
13968008 Mar 26 18:43 111743537-075000000.prp
13968008 Mar 26 19:28 111743537-076800000.prp
13968008 Mar 26 20:48 111743537-080000000.prp
13968007 Mar 27 00:08 111743537-088000000.prp
13968008 Mar 27 00:48 111743537-089600000.prp
13968007 Mar 27 00:58 111743537-090000000.prp
13968007 Mar 27 04:28 111743537-096000000.prp
13968009 Mar 27 06:08 111743537-100000000.prp
13968009 Mar 27 07:08 111743537-102400000.prp
13968008 Mar 27 07:47 111743537-104000000.prp
13968009 Mar 27 08:12 111743537-105000000.prp
13968009 Mar 27 08:28 111743537-105600000.prp
13968009 Mar 27 08:38 111743537-106000000.prp
13968009 Mar 27 08:53 111743537-106600000.prp
13967983 Mar 25 10:00 111743537-3000000.p1final
            55 Mar 25 13:03 111743537-3000000.p2
        4096 Mar 27 08:50 proof
PS: I see, you want the res64, not the iteration, in the filenames.


What about this:
Code:
for x in *.prp ; do echo $x `head -1 $x | cut -d" " -f7` ; done

111743537-032000000.prp affd93bd13cfb071
111743537-040000000.prp 3d05381d87207c4b
111743537-050000000.prp 0140f43472f39ce1
111743537-051200000.prp 0dc6e42d148c8275
111743537-060000000.prp 6007eb9aece3c817
111743537-064000000.prp e62ca752af63566f
111743537-075000000.prp 3d4bd49596228310
111743537-076800000.prp 1b400ecb2ef20e6b
111743537-080000000.prp e0b6bb9244a2965a
111743537-088000000.prp 4d1d31fda051a481
111743537-089600000.prp 56452ee0d3ae12ca
111743537-090000000.prp 602810e957581fe9
111743537-096000000.prp a68afb3f1a879db0
111743537-100000000.prp 2b02a9ff23545edd
111743537-102400000.prp 4d2c1d36435613dc
111743537-104000000.prp 50ee35d43dff387f
111743537-105000000.prp 249c27886315981b
111743537-105600000.prp 1fc9a0880409bb10
111743537-106000000.prp c6dbae6d5bb96124
111743537-107200000.prp 6fd6e637124904c0

Last fiddled with by preda on 2022-03-27 at 06:12
Old 2022-03-27, 06:16   #2757
preda

Quote:
Originally Posted by LaurV
Today we got an error when restarting the 332M PRP tests with gpuOwl on two of our Colab instances. They were using the same remote storage, which somehow got either disconnected or temporarily full (100 GB, but used for other things too), and the last proof checkpoint file was not saved in either case.
But maybe something should be done to fix the root of the problem, which is, IMO, that the proof file was not saved but the checkpoint was advanced.

Edit: I see now what the code does. If it can't dump a proof residue to disk, it keeps it around in RAM to write later (in the hope that the user frees up disk space in the meantime -- that was the motivation for doing it that way).

OTOH this seems a fragile solution for unattended runs. Would it be better to just stop if the proof residue can't be written? Or is there a better solution?
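One way that could look, as a rough Python sketch of the retry-then-stop policy (illustration only, not gpuOwl's current behavior):
Code:
import time
from pathlib import Path

def save_proof_residue(path: Path, data: bytes, retries: int = 5, wait_s: float = 60.0) -> None:
    """Try to persist a proof residue; stop the run if it still can't be written."""
    for attempt in range(1, retries + 1):
        try:
            tmp = path.with_suffix('.tmp')
            tmp.write_bytes(data)        # write to a temporary file first
            tmp.replace(path)            # then rename into place
            return
        except OSError as e:             # disk full, share disconnected, permissions, ...
            print(f'proof residue write failed ({e}); retry {attempt}/{retries} in {wait_s}s')
            time.sleep(wait_s)
    # Still failing: stop before the checkpoint advances, so the run can
    # resume cleanly once disk space is available again.
    raise SystemExit(f'cannot write proof residue {path}; stopping')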

Last fiddled with by preda on 2022-03-27 at 06:28
Old 2022-03-27, 10:14   #2758
LaurV

Quote:
Originally Posted by preda
What about this:
And do what with it? We already discussed in the past why that's not useful for what I want. Of course, there are (many) ways around it, like writing scripts or just modifying the line you wrote to rename the files instead of just "echo"-ing them. But workarounds are just that, workarounds...
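For the record, here is roughly what such a workaround could look like (a hypothetical helper script, not a gpuOwl feature), assuming the header layout Mihai's one-liner relies on, i.e. the res64 in the 7th space-separated field of the first line:
Code:
import shutil
from pathlib import Path

def tag_savefiles_with_res64(directory: str = '.') -> None:
    """Copy each .prp savefile to <name>-<res64>.prp so the residue is
    visible straight from the directory listing."""
    for prp in sorted(Path(directory).glob('*.prp')):
        with prp.open('rb') as f:
            fields = f.readline().split()
        if len(fields) < 7:                # unexpected header layout; skip it
            continue
        res64 = fields[6].decode('ascii', 'replace')
        target = prp.with_name(f'{prp.stem}-{res64}.prp')
        if not target.exists():
            shutil.copy2(prp, target)      # keep the original file untouched

if __name__ == '__main__':
    tag_savefiles_with_res64()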

(sorry for being a pain in the butt)

Care to share a Linux build of the version you use? (so I can try to "colab" it, hopefully). A few older checkpoints there would be gold (compared with the 6.x version I use now in Colab, compiled years ago by kriesel - I was still using that because it does stand-alone P-1, but since George did the magic with P95 stage 2, I think it is time to update the Colab to the new gpuOwl 7, or whatever the new version is).

Last fiddled with by LaurV on 2022-03-27 at 10:20
Old 2022-04-12, 00:41   #2759
kriesel

Gpuowl PRP proof generation and validation related times

The GPU appears underutilized during PRP proof generation at the end of a gpuowl PRP run; ~37% average indicated in GPU-Z on Windows 10.
There's significant GPU memory occupancy during PRP proof generation; ~6 GB of 16 GiB on a Radeon VII for M998999999, V6.11-380.
PRP proof generation is indicated as single-threaded on the CPU (2.1% ~ 1/48 of a 24-core HT system in Task Manager).
PRP proof generation can take several minutes, up to nearly an hour, depending on exponent, proof power and CPU core (~26.5 minutes on an E5-2697 v2 Xeon core at power 9 for M998999999; extrapolate to ~50 minutes for power 10).
The next assignment is not started on the GPU while this is occurring. (Verified by examining log entries for a worktodo file that had multiple PRP worktodo lines.)
For exponents at or below the first-test wavefront, the GPU-underutilized time during proof file generation is still appreciable. (At power 9, for an 82,589,687 PRP DC with proof, ~4 minutes. At 100-37% = 63% underutilization, that is ~0.25% of the 16 hour 54 minute total run time lost, or ~22 hours per year; the arithmetic is worked below.)
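Worked out explicitly from the figures above (16 h 54 min = 1014 min; ~8766 h in a year):

\[
(1 - 0.37) \times 4\,\text{min} \approx 2.5\,\text{min idle},\qquad
\frac{2.5\,\text{min}}{1014\,\text{min}} \approx 0.25\%,\qquad
0.0025 \times 8766\,\text{h/yr} \approx 22\,\text{h/yr}.
\]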

Something similar, although individually less lengthy, occurs at each resumption of a PRP run in progress, when its saved residues for later proof generation are validated. (~4.4 minutes on a 940M exponent, power 9, 36.5% complete, per resumption, on V7.2-53-ge27846f.) I think the run time increases nearing completion, since there would be more residue files to validate; extrapolating linearly to near 100% complete, ~12 minutes per resumption.
Since runs of 3xxM exponents take 2 weeks and 9xxM take months even on a Radeon VII, systems and power are not 100% reliable, and shorter work may sometimes be inserted ahead of such a long run (~3078 hours for 940M), multiple resumptions are likely for large exponents. These validations at resumption may average ~6 minutes for large exponents and could be done in parallel with the last bit of a preceding exponent. Possible savings: ~6 minutes/resume × 10 resumes / 3078 hours × 1 hour/60 minutes ≈ 0.03%. (Figures based on the 940M run.) Not very compelling.

These constitute small percentages of total run time, but they do present some possibilities for small reductions of total run time per exponent.
Running multiple instances could reduce their impact on total throughput, at no cost in programming effort but some cost in end-user management effort.
The impact of other tuning, such as increasing the block size from the default, is more significant than the possible resumption-validation parallelism.
Old 2022-05-12, 11:11   #2760
preda

Quote:
Originally Posted by kriesel
These validations at resumption may average ~6 minutes for large exponents and could be done in parallel with the last bit of a preceding exponent.
Ken, I don't understand what you propose there -- to do the validation in parallel with the last bit of a preceding exponent, upon resume?
Old 2022-05-15, 00:00   #2761
kriesel

Yes. I think at least two of the five cases I've identified are feasible ways to save a little time. The savings would be most significant when running a single gpuowl instance per GPU. Running a single instance is sometimes the most productive option, despite these effects.

1) Proof generation at the end of a PRP run, using a single CPU core heavily and the GPU lightly, is somewhat analogous to a P-1 GCD (which used to pause activity on the GPU while doing the GCD on the CPU, but now occurs in a separate thread while execution continues speculatively on the GPU, betting on the no-factor-found case being most probable). If available GPU and system memory allow, the next worktodo item could be started while waiting for the proof generation to complete, increasing GPU utilization; a rough sketch follows the log below. PRP or P-1 stage 1 seem feasible as short-term "keep the GPU busy" work during proof generation without too much demand for GPU RAM. (Running dual instances would also address GPU utilization, but may require using a lower -maxAlloc setting and so reduce the efficiency of P-1.)
Code:
2022-05-14 14:52:26 asr2-radeonvii3 109653007 OK 109656000 100.00%;  926 us/it; ETA 0d 00:00; 3904ac10e553____ (check 1.98s)
2022-05-14 14:52:26 asr2-radeonvii3 proof: building level 1, hash 61255733492c____
2022-05-14 14:52:27 asr2-radeonvii3 proof: building level 2, hash 97e28a93d584____
2022-05-14 14:52:28 asr2-radeonvii3 proof: building level 3, hash 30eee9924f90____
2022-05-14 14:52:30 asr2-radeonvii3 proof: building level 4, hash c8039a19a760____
2022-05-14 14:52:34 asr2-radeonvii3 proof: building level 5, hash 53f89bd8ea7e____
2022-05-14 14:52:42 asr2-radeonvii3 proof: building level 6, hash 3a9a483df815____
2022-05-14 14:52:56 asr2-radeonvii3 proof: building level 7, hash 391688b5bc95____
2022-05-14 14:53:24 asr2-radeonvii3 proof: building level 8, hash 2b81d622f78a____
2022-05-14 14:54:21 asr2-radeonvii3 proof: building level 9, hash 713417ef599b____
2022-05-14 14:56:18 asr2-radeonvii3 proof: building level 10, hash 35e43a2cadb3____
2022-05-14 15:00:28 asr2-radeonvii3 proof: building level 11, hash 30a5cd616a87____
2022-05-14 15:08:52 asr2-radeonvii3 PRP-Proof 'proof\109653007-11.proof' generated
2022-05-14 15:08:52 asr2-radeonvii3 Proof: cleaning up temporary storage
2022-05-14 15:08:56 asr2-radeonvii3 {"status":"C", "exponent":"109653007", "worktype":"PRP-3", "res64":"(redacted)", "residue-type":"1", "errors":{"gerbicz":"0"}, "fft-length":"6291456", "proof":{"version":"1", "power":"11", "hashsize":"64", "md5":"28120e1f645c5fbd3bc51663e6c5____"}, "program":{"name":"gpuowl", "version":"v6.11-382-g98ff9c7-dirty"}, "user":"kriesel", "computer":"asr2-radeonvii3", "aid":"redacted", "timestamp":"2022-05-14 20:08:56 UTC"}
2022-05-14 15:08:56 asr2-radeonvii3 405001661 FFT: 22M 1K:11:1K (17.56 bpw)
2022-05-14 15:08:56 asr2-radeonvii3 Expected maximum carry32: 703B0000
2022-05-14 15:08:59 asr2-radeonvii3 OpenCL args "-DEXP=405001661u -DWIDTH=1024u -DSMALL_HEIGHT=1024u -DMIDDLE=11u -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0xb.857604c77ddfp-5 -DIWEIGHT_STEP_MINUS_1=-0x8.78a7aaf626898p-5 -DNO_ASM=1  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2022-05-14 15:09:04 asr2-radeonvii3 OpenCL compilation in 4.57 s
2022-05-14 15:10:01 asr2-radeonvii3 405001661 OK  9310000 loaded: blockSize 10000, 889ab9a1802fcfb0
2022-05-14 15:10:01 asr2-radeonvii3 validating proof residues for power 11
2022-05-14 15:10:01 asr2-radeonvii3 Can't open '.\405001661\proof\197755' (mode 'rb')
2022-05-14 15:10:01 asr2-radeonvii3 validating proof residues for power 12
2022-05-14 15:10:01 asr2-radeonvii3 Can't open '.\405001661\proof\98878' (mode 'rb')
2022-05-14 15:10:01 asr2-radeonvii3 validating proof residues for power 11
2022-05-14 15:10:01 asr2-radeonvii3 Can't open '.\405001661\proof\197755' (mode 'rb')
2022-05-14 15:10:01 asr2-radeonvii3 validating proof residues for power 10
2022-05-14 15:10:23 asr2-radeonvii3 Proof using power 10 (vs 11) for 405001661
2022-05-14 15:12:00 asr2-radeonvii3 405001661 OK  9320000   2.30%; 4812 us/it; ETA 22d 00:57; f5d52a30056294db (check 49.38s)
2022-05-14 15:13:38 asr2-radeonvii3 405001661 OK  9330000   2.30%; 4827 us/it; ETA 22d 02:35; 4b826ec2974e3d23 (check 49.57s)
The first two attached GPU-Z screen captures were saved at 15:04 and 15:13 during the run above, corresponding to the bold log portions. From ~15:00-15:12, GPU loading looked like the first; occasional brief spikes with a low average loading.
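A rough Python sketch of the overlap idea in (1) - illustration only; build_proof and run_prp are hypothetical stand-ins for gpuowl's internals, not its actual API:
Code:
from concurrent.futures import ThreadPoolExecutor

def finish_and_start_next(build_proof, run_prp, finished_exponent, next_exponent):
    """Overlap CPU-bound proof generation for the just-finished exponent with
    GPU-bound PRP iterations on the next worktodo item."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        proof = pool.submit(build_proof, finished_exponent)  # CPU-heavy, GPU mostly idle
        run_prp(next_exponent)                               # keeps the GPU busy in the meantime
        return proof.result()                                # normally done long before the PRP finishes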

2) PRP proof-residue validation at resumption of a previously paused PRP/proof run is CPU and disk intensive but leaves the GPU idle. The case arises if other work is inserted in the worktodo file ahead of a task already partially completed: for example, inserting as the first worktodo line a PRP on exponent b (109M in the attached pics and preceding example) before completing an already begun PRP on exponent a (405M in the attached pics and preceding example), then stopping and restarting gpuowl. When b finishes, a is resumed, and the GPU goes largely idle while a's residues get validated. Sometimes later assignments may get prioritized by the user and earlier ones interrupted, leading to this case. If gpuowl could validate the next worktodo item (a) in parallel with the last minutes of the currently completing PRP assignment (b), or in parallel on another CPU core during b's proof generation, a bit of lookahead would increase GPU utilization overall. If the lookahead started early enough, the GPU could stay fully loaded during proof generation, and residue validation of a would complete by the time proof generation of b does. Timelines in parallel:
Code:
first worktodo line   . . . PRP . . . ( GPU-intensive ) . . . PRP . . . last iters | proof generation & cleanup (low GPU use)
second worktodo line lookahead, estimate required validation time, start validation | start or resume PRP (GPU intensive)
The alignment in time of the transitions to/from high GPU load for the two worktodo items doesn't need to be exact to save considerable time. There's an ~18 minute span of low loading in the example log above that could be closed somewhat; shortening it by 15 minutes is a win. A little overlap could be OK too.

3) At program launch, resuming a previously paused PRP/proof as the first worktodo item is CPU and disk intensive but leaves the GPU idle during savefile load and proof-residue validation. Unless the second worktodo item has no such startup delay and could be worked on temporarily while the first gets loaded and validated, I can't think of a way to avoid that GPU-idle time. (P-1 stage 1 on the second item would probably be productive.)

4) There might also be other, shorter time periods that could be capitalized on. I see periodic dropouts of GPU load, wattage, GPU clock, etc. while monitoring a GPU running PRP. These can be seen and logged with GPU monitoring utilities such as GPU-Z.
What happens to GPU utilization during the GEC check, the saving of an interim residue file, and console output?
(These are very short in human time terms, of order ~0.6 second each as observed at 405M PRP on a Radeon VII, but they span hundreds of iterations each. Is the overhead of loading an alternate task and saving its brief progress low enough to get a net throughput gain here too? Would the secondary task need to be already in GPU RAM, at the ready? Or does it make more sense to copy the current task's state to a memory buffer and continue the current task in parallel, assuming the GEC will pass and the file save will succeed? A sketch of that last idea follows the log below.)
These load dropouts occur frequently during an undisturbed ongoing PRP run. The third attachment shows some brief GPU load dropouts during PRP.
There's a quick 20 C dip in the GPU hot-spot temperature when these brief idles occur; avoiding these dips might be better for GPU longevity. I managed to catch a few with fast GPU-Z logging. (See the attached zip file: 10:45:02, 10:45:51 and 10:47:28 in a log covering 10:43:31 to 10:48:13; only 10:45:51 coincides with a gpuowl log line of output; 3 × ~0.6 sec is ~0.64% of the 282-second logged period.)
This was with gpuowl v6.11-382-dirty, -maxAlloc 15000 -use NO_ASM -block 2000 -log 20000 -proof 11 -yield
Code:
2022-05-12 10:43:24 asr2-radeonvii3 405001661 OK  8880000   2.19%; 4861 us/it; ETA 22d 06:55; a5c8406225b54ead (check 50.01s)
2022-05-12 10:45:51 asr2-radeonvii3 405001661 OK  8900000   2.20%; 4847 us/it; ETA 22d 05:16; 8fd79621a3ad4101 (check 49.36s)
2022-05-12 10:48:16 asr2-radeonvii3 405001661 OK  8920000   2.20%; 4818 us/it; ETA 22d 02:03; 4da84b7ac2877d80 (check 49.35s)
The third attached GPU-Z screen capture corresponds to this log portion, with a GPU load dropout occurring at the bold log line, as shown in the GPU-Z log file in the zip.
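A sketch of the buffer-and-continue idea from (4), again with hypothetical stand-ins (snapshot_state, write_savefile, continue_iterations) rather than gpuowl code:
Code:
import threading

def speculative_save(snapshot_state, write_savefile, continue_iterations):
    """Copy the current residue to a host buffer, write it on a background
    thread, and keep iterating on the GPU, betting on the common case that
    the GEC check passes and the file save succeeds."""
    buffer = snapshot_state()                                  # quick device-to-host copy
    writer = threading.Thread(target=write_savefile, args=(buffer,))
    writer.start()                                             # disk I/O off the GPU's critical path
    continue_iterations()                                      # GPU keeps running meanwhile
    writer.join()                                              # settle before the next checkpoint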

5) For large proof powers, cleaning up the temporaries can run for several seconds, and I think the log flow indicates the GPU is idle during this time. The following was a test using an excessive proof power; power 9 would be more efficient for a DC, I think.
Code:
2022-04-15 22:41:45 asr2-radeonvii3 63376373 OK 63378000 100.00%; 520 us/it; ETA 0d 00:00; 9685d55a113605dd (check 0.61s)
2022-04-15 22:41:46 asr2-radeonvii3 proof: building level 1, hash 5c52c913291bcc77
2022-04-15 22:41:46 asr2-radeonvii3 proof: building level 2, hash 63994d163f83b9d5
2022-04-15 22:41:47 asr2-radeonvii3 proof: building level 3, hash 288f005cdbf83e96
2022-04-15 22:41:48 asr2-radeonvii3 proof: building level 4, hash 507147e13938b1d7
2022-04-15 22:41:51 asr2-radeonvii3 proof: building level 5, hash bbb53c67b6942ecf
2022-04-15 22:41:56 asr2-radeonvii3 proof: building level 6, hash 933a8f8dbedb1934
2022-04-15 22:42:06 asr2-radeonvii3 proof: building level 7, hash 640f5c1ff94249de
2022-04-15 22:42:23 asr2-radeonvii3 proof: building level 8, hash aa23ed3bc552e11e
2022-04-15 22:42:59 asr2-radeonvii3 proof: building level 9, hash 666e2661676c3301
2022-04-15 22:44:10 asr2-radeonvii3 proof: building level 10, hash a0aa5fc250008a33
2022-04-15 22:46:37 asr2-radeonvii3 proof: building level 11, hash 18d0b8852bb6775c
2022-04-15 22:51:53 asr2-radeonvii3 proof: building level 12, hash 873d71f96030372d
2022-04-15 23:02:10 asr2-radeonvii3 PRP-Proof 'proof\63376373-12.proof' generated
2022-04-15 23:02:10 asr2-radeonvii3 Proof: cleaning up temporary storage
2022-04-15 23:02:23 asr2-radeonvii3 {"status":"C", "exponent":"63376373", "worktype":"PRP-3", "res64":"2b6aa647bab5da4b", "residue-type":"1", "errors":{"gerbicz":"0"}, "fft-length":"3670016", "proof":{"version":"1", "power":"12", "hashsize":"64", "md5":"b72d071b4aeec030f9b9f9e58ce3f583"}, "program":{"name":"gpuowl", "version":"v6.11-382-g98ff9c7-dirty"}, "user":"kriesel", "computer":"asr2-radeonvii3", "aid":"5CF2EE9E86D3F9DF863E280769E02857", "timestamp":"2022-04-16 04:02:23 UTC"}

I think there is more than 1.5% additional throughput that could be gained by doing some subset of these. Whether increasing utilization and throughput during these times is worth the effort to change the code and the risk of creating bugs with the increased complexity, or whether some of these are even feasible, is your call.
[Attached screen captures: building level 12 proof on M109653007 gpuowl 6-11-382 radeon vii.png; validating M405M residues and resumption after a M109M PRP proof generation completed.png; gpuowl load dropouts.png. Attached file: GPU-Z Sensor Log radeon 7 gpuowl6.11-382 405M prp.zip]