mersenneforum.org > GPU Computing > GpuOwl
Old 2020-08-27, 21:10   #2399
ewmayer

E-mail exchange between me and George over the last 24 hours which may be of use to gpuowl users preparing to switch to the latest build, which supports the verified-proof mechanism:
Quote:
EWM: 2 questions re. PRP-with-verify runs:

1. Will existing first-time PRP assignments - i.e. ones reserved before PRP-with-verify went live in gpuowl and Prime95 - automatically be run in PRP-with-verify mode when started with the new builds?

2. Can in-progress first-time PRP assignments be switched to PRP-with-verify mode mid-run - I suspect not - or should one finish each such assignment, then kill the program instance in question and delete any savefiles created by same, before starting afresh with a new PRP-with-verify-supporting build?

GW: Yes, new gpuowl builds will create proofs for old PRP assignments.

You can switch to the new gpuowl mid-run. You won't get a proof, but the next assignment will.

EWM: OK, restarted all jobs on my 2 multi-GPU systems using the new build yesterday afternoon. In-progress jobs restarted using the new build get a "Proof disabled because of missing checkpoints" info-message in the logfile. The first new PRP job using the new build should start in a couple of hours.

One more question re. the gpuowl primenet.py script - I currently just run that once a week in single-shot (t=0) mode, each time requesting 10 PRPs, enough for over a week ... I suspect I'm going to want to switch to the default several-times-per-day mode so that the latest version of the script can upload cert-related files more frequently - is that right?

GW: Your call on how to use the python script. I use the old one which only gets assignments. I wrote a cron job that moves gpuowl proofs to the prime95 directory for uploads. I find that I need prime95's rate-limited upload feature.

With 6 GPUs, about 9 PRPs a day -- that's 63 uploads if run once a week. The server should be able to handle that kind of bursty behavior.

EWM: With my downclocking settings (average sclk=3 across the 6 R7s, consisting of a mix of 2,3 and 4) it's closer to 7 PRPs per day, but the difference does not signify. How much bandwidth are we talking about for 50 PRP proof files for current-wavefront exponents?

GW: 50 proofs @ power=8 is about 50 * 9 * 13MB

If I upload at unthrottled speeds (12Mbps), my download speed is affected (it drops from 200Mbps to much less). This was explained to me as the download being unable to get its many required ACK responses sent because the upload queue is clogged. In short, wife is not happy if videos get choppy.

I've set prime95 to rate limit and send the proofs late at night. Works well.
So I'm looking at ~6GB of proof-file uploads per week ... yes, definitely going to switch to the default mode (no explicit setting of the primenet.py script's -t flag), which translates to hourly updates:
Code:
parser.add_argument('-t', dest='timeout', type=int, default=3600, help="Seconds to sleep between updates")
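
To put the ~6GB/week figure in concrete terms, here is a minimal back-of-the-envelope sketch of George's "50 * 9 * 13MB" estimate above (the 9 reflects a power-8 proof carrying power+1 residues, and ~13MB is roughly p/8 bytes for a ~105M exponent; these are rough figures from the thread, not exact gpuowl file sizes):
Code:
# Rough weekly proof-upload volume, using the figures quoted above.
proofs_per_week    = 50            # ~7 PRP completions/day across 6 Radeon VIIs
residues_per_proof = 9             # power-8 proof: power + 1 residues
bytes_per_residue  = 13_000_000    # ~p/8 bytes for a ~105M exponent

weekly = proofs_per_week * residues_per_proof * bytes_per_residue
print(f"~{weekly / 1e9:.1f} GB per week")   # ~5.9 GB, i.e. the ~6GB quoted above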
Old 2020-08-30, 20:44   #2400
moebius

Uploaded my first proof file generated with gpuowl:
Code:
uploader ********** 104984071-8.proof
MD5 of 104984071-8.proof is ******************************************
Proof file exponent is 104984071
Filesize of 104984071-8.proof is 118107139
Success!
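
As a quick consistency check on that file size: a power-8 proof appears to hold 9 full residues of roughly ceil(p/8) bytes each, which lines up with the uploader output above almost exactly (a sketch, not the exact gpuowl file layout):
Code:
from math import ceil

p        = 104984071
expected = 9 * ceil(p / 8)      # 118,107,081 bytes for 9 residues
actual   = 118107139            # size reported by the uploader above
print(expected, actual, actual - expected)   # difference of 58 bytes, presumably header/metadata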
Old 2020-08-30, 21:35   #2401
ewmayer

In addition to the upload-data-volume issue detailed in my previous post, I'm finding that on my older Haswell system, which started life as a barebones Sandy Bridge system I bought used from our own Xyzzy, I need to make sure to delete old test-related folders and files at least once a month. That system has a relatively tiny 40GB SSD [yes, I can easily buy a bigger one, but as Xyzzy has noted with a countrified turn of phrase, I'm (in)famous for "making a hardware-purchase dollar holler"], and in PRP-with-proof mode there is a whole slew of interim residue files that get generated and, on completion of the PRP test, get combined into the final proof file.

For example, one of the 4 ongoing PRP tests (2 on each of the 2 installed Radeon VII cards) is of p = 106770193. The proof/ subdirectory under that exponent-named main run directory is showing one 13346280-byte savefile getting added every 417072 = 2^4*3*8689 iterations, so at run-completion time, there will be ceiling(p/417072) = 256 = 2^8 such savefiles, corresponding to proof power 8. Said files use ~3.2GB; worst-case scenario is all 4 runs completing around the same time, in which case we have 1024 interim proof-related files using ~13GB, which will be processed into smaller proof files and then deleted.

That leaves just a little over 20GB available for all other user files, including older-run savefiles and proof files. Xmas-gift-to-self this year will definitely be a larger SSD for that system. :)
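
A minimal sketch of that disk-space arithmetic, using the figures quoted above (exact gpuowl file counts and sizes may vary slightly):
Code:
from math import ceil

# Interim proof-residue disk usage for one PRP run at p = 106770193, proof power 8.
p              = 106770193
interval       = 417072        # iterations between interim residue files
bytes_per_file = 13346280      # ~p/8 bytes per residue file

files_per_run = ceil(p / interval)               # 256 = 2^8 files at run completion
per_run_bytes = files_per_run * bytes_per_file   # ~3.4e9 bytes (~3.2 GiB)
four_runs     = 4 * per_run_bytes                # ~13.7e9 bytes (~12.7 GiB)

print(files_per_run, per_run_bytes / 2**30, four_runs / 2**30)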

Old 2020-08-31, 14:51   #2402
storm5510

Here is an oddity. Anyone want to take a guess as to what the 491.74 seconds is about?
[Attached thumbnail: Capture.JPG, 44.4 KB]
Old 2020-08-31, 18:49   #2403
M344587487

Apologies for being lazy but I'm being lazy, no apologies ;)


I think -L/opt/rocm-3.7.0/opencl/lib (and/or, generically, -L/opt/rocm/opencl/lib) needs to be added to LIBPATH in the Makefile; gpuowl doesn't compile on Ubuntu 20.04.1 using ROCm 3.7.0 without it (there appears to be no x86_64 directory anymore, as all the libraries are now directly in the lib dir).
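
A rough sketch of the kind of change meant here, assuming the Makefile exposes a LIBPATH variable as described (the exact variable layout in gpuowl's Makefile may differ; the paths are the ones mentioned above):
Code:
# Hypothetical LIBPATH addition for ROCm 3.7 on Ubuntu 20.04: the OpenCL
# libraries now sit directly in .../opencl/lib, with no x86_64 subdirectory.
LIBPATH = -L/opt/rocm/opencl/lib -L/opt/rocm-3.7.0/opencl/lib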


As an aside, ROCm says it supports Ubuntu 20.04 but still relies on gcc-7, which is a bit of a pain as Ubuntu 20.04 uses gcc-9 by default (gcc-7 is still installable but requires the Universe repo - may be useful knowledge for those doing a fresh install). The current status quo is gcc-7 or gcc-5; hopefully gcc-9 gets added to the supported list within the next few minor releases to put a bow on 20.04 support, as it feels ever so slightly dirty to install an old version of gcc.
Old 2020-09-02, 13:48   #2404
ATH

I have been getting failed LL DC tests with gpuowl on Colab P100 and V100 even though they have ECC RAM.

I think it is because gpuowl is choosing too small an FFT size: 2.75M for a 54.1M exponent.

I'm trying to force a 3M FFT now to see if it helps. I am using a version from several weeks ago ("version":"v6.11-278-ga39cc1a"). Did anything change in FFT size selection?


Code:
2020-09-01 01:28:48 config: -use ORIG_SLOWTRIG,CARRY32,NO_T2_SHUFFLE,OUT_WG=64,OUT_SIZEX=8,OUT_SPACING=4,IN_WG=128,IN_SIZEX=16,IN_SPACING=4
2020-09-01 01:28:48 device 0, unique id ''
2020-09-01 01:28:48 Exception gpu_error:  clGetPlatformIDs(16, platforms, (unsigned *) &nPlatforms) at clwrap.cpp:71 getDeviceIDs
2020-09-01 01:28:48 Bye
2020-09-01 01:30:23 config: -use ORIG_SLOWTRIG,CARRY32,NO_T2_SHUFFLE,OUT_WG=64,OUT_SIZEX=8,OUT_SPACING=4,IN_WG=128,IN_SIZEX=16,IN_SPACING=4
2020-09-01 01:30:24 device 0, unique id ''
2020-09-01 01:30:25 Tesla V100-SXM2-16GB-0 54094109 FFT: 2.75M 256:11:512 (18.76 bpw)
2020-09-01 01:30:25 Tesla V100-SXM2-16GB-0 Expected maximum carry32: 50F80000
2020-09-01 01:30:26 Tesla V100-SXM2-16GB-0 NO_T2_SHUFFLE not used
2020-09-01 01:30:26 Tesla V100-SXM2-16GB-0 OpenCL args "-DEXP=54094109u -DWIDTH=256u -DSMALL_HEIGHT=512u -DMIDDLE=11u -DWEIGHT_STEP=0x1.2e79643428021p+0 -DIWEIGHT_STEP=0x1.b155357f46494p-1 -DWEIGHT_BIGSTEP=0x1.ae89f995ad3adp+0 -DIWEIGHT_BIGSTEP=0x1.306fe0a31b715p-1 -DPM1=0 -DMM_CHAIN=2u -DMM2_CHAIN=3u -DCARRY32=1 -DIN_SIZEX=16 -DIN_SPACING=4 -DIN_WG=128 -DNO_T2_SHUFFLE=1 -DORIG_SLOWTRIG=1 -DOUT_SIZEX=8 -DOUT_SPACING=4 -DOUT_WG=64  -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-09-01 01:30:30 Tesla V100-SXM2-16GB-0 

2020-09-01 01:30:30 Tesla V100-SXM2-16GB-0 OpenCL compilation in 3.84 s
2020-09-01 01:30:30 Tesla V100-SXM2-16GB-0 54094109 LL        0 loaded: 0000000000000004
2020-09-01 01:30:59 Tesla V100-SXM2-16GB-0 54094109 LL   100000   0.18%;  291 us/it; ETA 0d 04:22; 51ac2a394d14ff92
2020-09-01 01:31:28 Tesla V100-SXM2-16GB-0 54094109 LL   200000   0.37%;  291 us/it; ETA 0d 04:22; 5e302a3c7092adc4
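
For context, the bits-per-word figure in the log above is simply the exponent divided by the FFT length; moving from 2.75M to 3M brings it down noticeably. A minimal sketch of that arithmetic (not gpuowl's actual selection logic, and what counts as a "safe" bpw depends on the FFT implementation):
Code:
# Bits-per-word for the two FFT sizes discussed above; gpuowl logs this value
# (e.g. "FFT: 2.75M ... (18.76 bpw)").  Higher bpw means less headroom for
# round-off error.
exponent = 54094109

for label, words in [("2.75M", int(2.75 * 2**20)), ("3M", 3 * 2**20)]:
    print(f"{label}: {exponent / words:.2f} bits/word")

# 2.75M: 18.76 bits/word   (matches the log line above)
# 3M:    17.20 bits/word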

Old 2020-09-02, 15:32   #2405
Prime95

Quote:
Originally Posted by ATH
I am using a version from several weeks ago ("version":"v6.11-278-ga39cc1a"). Did anything change in FFT size selection?
Quite a lot has changed. When you update to a newer version, consider not using ORIG_SLOWTRIG.
Old 2020-09-02, 23:51   #2406
ATH

Quote:
Originally Posted by Prime95
Quite a lot has changed. When you update to a newer version, consider not using ORIG_SLOWTRIG.
I just tried the newest version, v6.11-380-g79ea0cc, and it still chose a 2.75M FFT for a 54.1M exponent, and I did not use ORIG_SLOWTRIG.

I forgot that the version I run LL tests with is several months old, not weeks (because it was a tad faster than the newest versions for some reason). The one I use for PRP CERT tests is only a few weeks old.

Old 2020-09-03, 16:42   #2407
storm5510

Quote:
Originally Posted by ATH
I just tried the newest version v6.11-380-g79ea0cc...
I see I am behind, again. This pretty much voids what I was going to write. Nothing of significance though.
Old 2020-09-03, 17:08   #2408
Viliam Furik

Could somebody, please, compile it for Windows 10, 64-bit?
Old 2020-09-03, 18:43   #2409
kriesel

Quote:
Originally Posted by Viliam Furik
Could somebody, please, compile it for Windows 10, 64-bit?
What do you need in the gpuowl executable that v6.11-364 hasn't got? My impression from reviewing the git commits is that subsequent changes have been mostly, if not entirely, in primenet.py. v6.11-364 is currently the latest version given in the dedicated gpuowl Windows builds thread, https://www.mersenneforum.org/showthread.php?t=25624, which links back to older versions. Every build I've made was on Windows 7 x64, and I've never seen one of those fail to run on Windows 10 x64. The first post of that thread also links to a how-to-build post.
