Old 2018-05-29, 02:56   #1
kriesel
 
gpuOwL-specific reference material

This thread is intended to hold only reference material specifically for GpuOwL.
(Suggestions are welcome. Discussion posts in this thread are not encouraged. Please use the reference material discussion thread http://www.mersenneforum.org/showthread.php?t=23383.)


To get started in gpuowl on Linux, see Ernst Mayer's directions: https://mersenneforum.org/showthread.php?t=25601
For Windows, see below.
In either case, note that the computation types, hardware supported, fft size limits, file formats, etc. have varied greatly and rapidly over the course of the hundreds of versions. Choose a version according to what you want to run and what each version offers.

Start from a completed install of Windows (preferably with Windows updates brought current), then:
Enable or install and configure any remote desktop software you plan to use; Windows Remote Desktop, TightVNC, etc.
Create a working folder. (I create one for each instance for each gpu.) Do create it as a subdirectory of the user's default folder. Permissions will be ok there. DO NOT place it in system directories, Program Files, etc. Permission problems are common if attempting to run there.

Get and install the gpuowl software
Either
download a current build from the end of the Gpuowl Windows build thread https://www.mersenneforum.org/showthread.php?t=25624 or for an earlier version https://www.mersenneforum.org/showpo...39&postcount=4, or from the download mirror https://download.mersenne.ca/ which has many Windows versions and a few Linux versions. Unzip it into a working folder under the user's home directory (NOT in Program Files or other restricted areas; some have attempted that and run into permissions problems).
Or
follow the Windows build instructions at https://www.mersenneforum.org/showpo...4&postcount=21 to create a build environment (once) and follow the compile and link section there as needed.
Next, decide whether you will run it manually or use primenet.py.

I recommend running one instance of gpuowl manually at first, to learn what normal operation looks like, so that if/when issues with operation appear, you're familiar with the program. (Add complexity later, after learning the basics.) It will also give you a greater appreciation for the automation built into primenet.py and other programs such as prime95 and mprime.

If using gpuowl's primenet.py or certain other tools provided for gpuowl, you'll need a Python 3 installation. To install Python, follow a good set of instructions, such as https://docs.python.org/3/using/wind...l#windows-full (I've been exploring compiling primenet.py into a standalone executable, but haven't yet worked out how to get one small enough to post on the forum as an attachment.)

Note, not all gpuowl versions include all the features described below. Some are rather recent additions.

Create a config.txt
Suggested contents:
-user your-primenet-uid -cpu systemname-gpumodel-number-winstance -device n -maxAlloc gpuram-delta
For example, my primenet-uid is kriesel, and the system is asr2, second Radeon VII gpu, instance 2. The gpu ram is 16GB total, but if I have 2 instances running P-1 on the same gpu and their stage 2s might coincide, I might give each instance half, so maxAlloc = 8000 (gpuram share) - 500 (delta) = 7500 for each instance.
Then the config.txt line for the second gpu, second instance would be
-user kriesel -cpu asr2-radeonvii-2-w2 -device 1 -maxAlloc 7500
(Device numbers start at 0 in gpuowl and some other GIMPS gpu applications.)
Other -options are given in help output. -yield or -proof 9 may be useful. Whoever runs the cert will appreciate the reduced effort that a higher proof power requires for the cert, and the overall efficiency. (And occasionally that may be you!)
The format of command line options and config.txt is the same.
However, config.txt must be one line, followed by a return.
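
The config.txt recipe above can be sketched in Python, for anyone who would rather generate the line than type it. This is only an illustration: the function names are made up for this example, the gpu ram figure is treated as 16000 to match the 8000 per-instance share used above, and the only options emitted are the ones already shown in this post.

```python
from pathlib import Path


def max_alloc(gpu_ram, instances, delta):
    """Per-instance -maxAlloc: an equal share of gpu ram minus a safety margin."""
    return gpu_ram // instances - delta


def config_line(user, system, gpu_model, gpu_number, instance, device, alloc):
    """Build the single config.txt line in the layout suggested above."""
    cpu = f"{system}-{gpu_model}-{gpu_number}-w{instance}"
    return f"-user {user} -cpu {cpu} -device {device} -maxAlloc {alloc}"


def write_config(folder, line):
    """config.txt must be one line followed by a return."""
    Path(folder, "config.txt").write_text(line + "\n")
```

For the second instance on the second Radeon VII, config_line("kriesel", "asr2", "radeonvii", 2, 2, 1, max_alloc(16000, 2, 500)) reproduces the example line above; write_config then saves it into the instance's working folder.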

Batch file or shell script
I find it useful to create a short batch file also, and desktop shortcut to it. The batch file can be as simple as g6.bat:
Code:
title %cd%
gpuowl-win
Adding distinctive colors for text and background can be useful. Background color associated with gpu, and text color associated with instance number for example.
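
As a rough sketch of the color idea, the following Python generates batch file text with a background chosen by gpu and a text color chosen by instance. It relies on the Windows cmd "color" command, which takes two hex digits (background first, then text). The particular palettes and the function name are arbitrary choices for this example; the two palettes share no digit, so text and background never coincide.

```python
def batch_text(gpu_index, instance, exe="gpuowl-win"):
    """Return batch file contents: window title, colors, then the program."""
    backgrounds = "1245"  # blue, green, red, purple (one per gpu)
    foregrounds = "7EFB"  # white, light yellow, bright white, light aqua
    bg = backgrounds[gpu_index % len(backgrounds)]
    fg = foregrounds[instance % len(foregrounds)]
    return f"title %cd%\ncolor {bg}{fg}\n{exe}\n"
```

Writing batch_text(0, 1) to g6.bat gives the two-line batch file above plus a "color 1E" line (light yellow text on blue).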

The shortcut command should not be a direct invocation of the batch file. Use a cmd /k prefix so the window lingers in case of problems, to give you long enough to see and read any error message.

Having the help.txt and use-flags.txt files in the working directory for ready reference is also convenient.

Create a worktodo.txt
Go to https://www.mersenne.org/manual_assignment/ to get a single PRP assignment or PRP DC assignment. The excellent Gerbicz error check (GEC) on PRP work will determine whether the gpu is producing reliable interim results. Verify the gpu and system combination is reliable in PRP/GEC before attempting any P-1 factoring or LL DC, which have less error detection.
Open worktodo.txt for editing.
Paste the assignment into worktodo.txt and follow it with a return.
Save the modified file.
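
The paste-and-save steps can also be scripted. The sketch below is a minimal Python version, with hypothetical function names; it takes whatever assignment line the manual assignment page returned and does not check its format. It makes sure the new line is followed by a return and that any existing last line is terminated first.

```python
from pathlib import Path


def append_line(existing, line):
    """Return file contents with `line` added on its own line, ending in a return."""
    if existing and not existing.endswith("\n"):
        existing += "\n"  # terminate a last line that lacked a return
    return existing + line.strip() + "\n"


def add_assignment(worktodo, assignment):
    """Append one assignment line to worktodo.txt, creating the file if needed."""
    path = Path(worktodo)
    existing = path.read_text() if path.exists() else ""
    path.write_text(append_line(existing, assignment))
```

Usage would be add_assignment("worktodo.txt", line), where line is the text copied from the assignment page.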

Try running gpuowl via the batch file at a Windows command prompt: g6
If it works, it should look something like the following, allowing for differences in parameters entered in config.txt, gpuowl version, exponent, work type, etc. If not, fix and retry.

In a PRP run, progress lines should contain "OK". "EE" instead means errors or trouble: perhaps clocks set too high, unreliable system ram, a failed fan, or an fft length specified too short for the exponent and computation type. A PRP start:
Code:
2020-05-29 15:03:44 config: -device 1 -user kriesel -cpu roa/rx550 -use NO_ASM -maxAlloc 1500
2020-05-29 15:03:44 device 1, unique id ''
2020-05-29 15:03:44 roa/rx550 94955299 FFT: 5M 1K:10:256 (18.11 bpw)
2020-05-29 15:03:44 roa/rx550 Expected maximum carry32: 48210000
2020-05-29 15:03:45 roa/rx550 OpenCL args "-DEXP=94955299u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xe.cfec567b14fd8p-3 -DIWEIGHT_STEP=0x8.a43aff8beae48p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1  -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-05-29 15:03:52 roa/rx550 OpenCL compilation in 6.31 s
2020-05-29 15:03:58 roa/rx550 94955299 OK        0 loaded: blockSize 400, 0000000000000003
2020-05-29 15:04:16 roa/rx550 94955299 OK      800   0.00%; 14232 us/it; ETA 15d 15:24; 69f923b24568ac18 (check 5.88s)
2020-05-29 15:51:37 roa/rx550 94955299 OK   200000   0.21%; 14233 us/it; ETA 15d 14:38; 986d9b55f22ac736 (check 5.88s)
A P-1 run start:
Code:
2020-08-15 11:30:03 gpuowl v6.11-340-g41d435f
2020-08-15 11:30:04 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -proof 8
2020-08-15 11:30:04 device 0, unique id ''
2020-08-15 11:30:04 condorella/rx480 183000023 FFT: 10M 1K:10:512 (17.45 bpw)
2020-08-15 11:30:04 condorella/rx480 Expected maximum carry32: 43400000
2020-08-15 11:30:07 condorella/rx480 OpenCL args "-DEXP=183000023u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=10u -DPM1=1 -DAMDGPU=1 -DCARRYM64=1 -DWEIGHT_STEP_MINUS_1=0xe.c72a0862a91p-5 -DIWEIGHT_STEP_MINUS_1=-0xa.1bff0fe0af57p-5  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-08-15 11:30:07 condorella/rx480 ASM compilation failed, retrying compilation using NO_ASM
2020-08-15 11:30:15 condorella/rx480 OpenCL compilation in 8.00 s
2020-08-15 11:30:15 condorella/rx480 183000023 P1 B1=700000, B2=26000000; 1009635 bits; starting at 0
2020-08-15 11:31:24 condorella/rx480 183000023 P1    10000   0.99%; 6840 us/it; ETA 0d 01:54; 58a6331a302132cb
2020-08-15 11:32:32 condorella/rx480 183000023 P1    20000   1.98%; 6877 us/it; ETA 0d 01:53; 38e0ea00e8d3dfcd
2020-08-15 11:33:41 condorella/rx480 183000023 P1    30000   2.97%; 6897 us/it; ETA 0d 01:53; 52d0881a817b7a2d
2020-08-15 11:34:50 condorella/rx480 183000023 P1    40000   3.96%; 6904 us/it; ETA 0d 01:52; 5cc6128ee3cf27dd
2020-08-15 11:35:16 condorella/rx480 saved
2020-08-15 11:36:00 condorella/rx480 183000023 P1    50000   4.95%; 6928 us/it; ETA 0d 01:51; 4980d669e97a532f
Manually report the result first (BEFORE uploading the proof file)
Open https://www.mersenne.org/manual_result/ in a web browser. Verify you're logged in (see upper right of the web page.)
Open the results.txt file in an editor. Copy the result. Paste it into the results field of the web page. Click on "Submit".
Optionally, enter a note in the results.txt file that the results line has been reported. I usually place "reported mm/dd/yy" filling in the date of report, on a separate line after the last reported line. This helps avoid duplicate reports, and scans easily.
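
The optional "reported" marker is easy to automate. A minimal Python sketch follows; the function names are invented for this example, and the mm/dd/yy layout simply follows the habit described above.

```python
from datetime import date
from pathlib import Path


def reported_note(day):
    """Build the marker line, e.g. 'reported 05/29/20'."""
    return "reported " + day.strftime("%m/%d/%y")


def mark_reported(results_file, day=None):
    """Append the marker on its own line after the last reported result."""
    path = Path(results_file)
    text = path.read_text()
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + reported_note(day or date.today()) + "\n")
```

Calling mark_reported("results.txt") right after a successful submission leaves everything above the marker known to be reported.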

Upload the proof file (AFTER reporting the result record)
There's only a proof file for PRP runs begun with proof-capable versions of gpuowl. It's found in subfolder proof, and has a name composed of the exponent and proof power. For example, exponent 1234567 power 8 would be 1234567-8.proof in folder workingdirectory\proof. There are several possible upload methods listed at https://www.mersenneforum.org/showpo...0&postcount=26, some of which have their steps described there.
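
The naming rule above is simple enough to express in a few lines of Python, which can be handy in a script that looks for finished proofs. The function name and the example working directory are hypothetical; only the exponent-power.proof pattern and the proof subfolder come from the description above.

```python
from pathlib import PureWindowsPath


def proof_path(working_dir, exponent, power):
    """Expected location of the proof file: <working_dir>\\proof\\<exponent>-<power>.proof"""
    return PureWindowsPath(working_dir) / "proof" / f"{exponent}-{power}.proof"
```

For example, proof_path(r"C:\Users\me\gpuowl", 1234567, 8) names the file 1234567-8.proof inside the proof subfolder of that working directory.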

Using multiple instances
There are two ways that running multiple instances on the same gpu at the same time may increase throughput.
1. Aggregate throughput may be higher running two instances on the same gpu at the same time. Any wait time that occurs for one instance on the gpu while the cpu performs the GEC, or reads from or writes to files or the console, or moves data over the PCIe bus between gpu ram and system ram, may be usable by the other instance.
2. Emptying the worktodo file or halting on an error condition by one instance does not completely stop progress; any other running instance then can use resources the stopped instance is no longer using. A single instance if halted does not leave the gpu idle for hours or days if at least one other instance is still running on it.
Multiple instances are not guaranteed to improve sustained throughput. Throughput seems to be better if the two instances are running similar code and parameters; two 5M fft PRPs for example (not a 5M and an 8M, or P-1 and LLDC).
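
Whether two instances beat one is a matter of arithmetic on the us/it figures shown in the console output. The Python sketch below converts iteration times to iterations per second and compares the totals. The numbers in the usage note are made up for illustration, not measurements.

```python
def iterations_per_second(us_per_it):
    """Convert an iteration time in microseconds to iterations per second."""
    return 1_000_000 / us_per_it


def aggregate(us_per_it_values):
    """Combined iterations per second of several instances sharing one gpu."""
    return sum(iterations_per_second(v) for v in us_per_it_values)


def gain(single_us_per_it, multi_us_per_it_values):
    """Fractional throughput change of the multi-instance setup versus one instance."""
    return aggregate(multi_us_per_it_values) / iterations_per_second(single_us_per_it) - 1
```

If a single instance ran at 1000 us/it and two simultaneous instances each ran at 1900 us/it, gain(1000, [1900, 1900]) comes out a little over 5%. If each slowed to more than 2000 us/it, the gain would be negative and one instance would be the better choice. Compare like with like (same exponent and fft length) when taking the timings.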

The conceptually simplest way to run multiple instances is to create a separate working folder for each, with its own set of files including the gpuowl executable, same as the first instance. A more compact way is for all instances on all gpus to share one executable in a common folder, IF updating the version on all instances simultaneously is ok.
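
For the one-folder-per-instance layout, a small Python sketch can create the folders. The gpuN-wM naming scheme and the function names are just one possible convention invented for this example. Each folder still needs its own config.txt and worktodo.txt; copying those from a running instance would make two instances work the same assignment.

```python
from pathlib import Path


def instance_folders(base, gpu_count, instances_per_gpu):
    """List one working folder per instance per gpu, e.g. base/gpu0-w1."""
    return [
        Path(base) / f"gpu{g}-w{w}"
        for g in range(gpu_count)
        for w in range(1, instances_per_gpu + 1)
    ]


def create_folders(folders):
    """Create the folders, leaving any that already exist untouched."""
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
```

Usage might be create_folders(instance_folders(Path.home() / "gpuowl", 2, 2)), which keeps everything under the user's home directory as recommended earlier.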

Backups
Now might be a good time to refresh your Windows backup, system restore point, etc.
Also verify that your gpuowl folders, containing assignments, results, interim save files, configuration, etc. will be backed up.

Using gpuowl's primenet.py
There's a separate post detailing primenet.py setup and use at https://www.mersenneforum.org/showpo...2&postcount=25. It's probably best to rename the results file containing already-reported results before starting to use primenet.py for reporting.

Pool
I haven't tried it yet, but particularly with multiple instances, multiple gpus of the same type, or both, it appears it could make manual work assignment and result reporting easier. The help output says:
Code:
-pool <dir>        : specify a directory with the shared (pooled) worktodo.txt and results.txt
                     Multiple GpuOwl instances, each in its own directory, can share a pool of assignments
So presumably putting the -pool option in each config.txt does it.


Check throughput / iteration times are about as expected, error rate is low.
Occurrence of EE in the console or log output should be low, ideally less than once a week. For iteration times, see https://drive.google.com/file/d/10fC...enkBdAaRP/view and run time posts in this thread, and note iteration times vary greatly by exponent (more precisely fft length) and hardware and tuning.
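
To keep an eye on error rate and remaining time without reading the whole log, something like the following Python sketch can tally the progress lines. It assumes the line layout of the PRP sample output in this post, and it assumes that an EE line has the same layout with EE in place of OK, which is not shown here. Other versions may format lines differently, so treat the pattern as a starting point.

```python
import re

# Assumed layout, from the sample PRP log in this post:
# date time name exponent OK|EE iteration percent%; N us/it; ...
PROGRESS = re.compile(
    r"^\S+ \S+ \S+ (\d+) (OK|EE)\s+(\d+)\s+[\d.]+%; (\d+) us/it"
)


def summarize(lines):
    """Count OK and EE progress lines and estimate the time remaining."""
    ok = ee = 0
    last = None
    for line in lines:
        m = PROGRESS.match(line)
        if not m:
            continue  # config, compile, "loaded", P1 and other lines are skipped
        exponent, status, iteration, us_per_it = m.groups()
        if status == "OK":
            ok += 1
        else:
            ee += 1
        last = (int(exponent), int(iteration), int(us_per_it))
    eta_seconds = None
    if last:
        exponent, iteration, us_per_it = last
        # A PRP test takes about as many iterations as the exponent.
        eta_seconds = (exponent - iteration) * us_per_it / 1e6
    return {"ok": ok, "ee": ee, "eta_seconds": eta_seconds}
```

Pass it any iterable of lines, such as an open log file. On the last sample PRP line earlier in this post (iteration 200000 at 14233 us/it) the estimate works out to about 15.6 days, in line with the ETA the program printed.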


Table of contents
  1. This post
  2. Run time versus exponent or fft length for the RX550 or RX480 or Radeon VII of gpuOwL from V1.9 up. Currently up to v7.2-69 on RX480 or Radeon VII http://www.mersenneforum.org/showpos...35&postcount=2
  3. gpuOwL bug and wish list http://www.mersenneforum.org/showpos...37&postcount=3
  4. Getting started with gpuOwL http://www.mersenneforum.org/showpos...39&postcount=4
  5. gpuOwL requirements http://www.mersenneforum.org/showpos...76&postcount=5
  6. Features and requests http://www.mersenneforum.org/showpos...77&postcount=6
  7. Feature / version announcements http://www.mersenneforum.org/showpos...83&postcount=7
  8. Determining upper exponent limit for a transform type and fft length https://www.mersenneforum.org/showpo...31&postcount=8
  9. FFT lengths https://www.mersenneforum.org/showpo...36&postcount=9
  10. PRP-3 run time scaling in V5.0-9c13870 (no P-1) https://www.mersenneforum.org/showpo...6&postcount=10
  11. Gpuowl .owl file header style versus gpuowl version samples https://www.mersenneforum.org/showpo...7&postcount=11
  12. Gpuowl PRP3 continuation compatibility https://www.mersenneforum.org/showpo...7&postcount=12
  13. Validation and verification runs https://www.mersenneforum.org/showpo...1&postcount=13
  14. gpuowl-win V6.5-c48d46f run times on AMD and NVIDIA vs. CUDALucas https://www.mersenneforum.org/showpo...7&postcount=14
  15. Gpuowl residue type etc versus version https://www.mersenneforum.org/showpo...3&postcount=15
  16. Gpuowl v6.5-84-g30c0508 -h help output https://www.mersenneforum.org/showpo...9&postcount=16
  17. Gpuowl P-1 run time scaling on AMD and NVIDIA https://www.mersenneforum.org/showpo...5&postcount=17
  18. Gerbicz error check detection rate https://www.mersenneforum.org/showpo...4&postcount=18
  19. Increased throughput with simultaneous runs https://www.mersenneforum.org/showpo...6&postcount=19
  20. What's a good P-1 factoring strategy? Best? https://www.mersenneforum.org/showpo...9&postcount=20
  21. Compiling Gpuowl https://www.mersenneforum.org/showpo...4&postcount=21
  22. Save file size versus exponent or fft length https://www.mersenneforum.org/showpo...7&postcount=22
  23. Setting up in Linux https://www.mersenneforum.org/showpo...5&postcount=23
  24. Gpuowl gpu ram use scaling with exponent https://www.mersenneforum.org/showpo...3&postcount=24
  25. Using gpuowl's primenet.py on Windows https://www.mersenneforum.org/showpo...2&postcount=25
  26. Methods of uploading proof files https://www.mersenneforum.org/showpo...0&postcount=26
  27. User base OS mix indications https://www.mersenneforum.org/showpo...3&postcount=27
  28. Performance variables for gpuowl https://www.mersenneforum.org/showpo...6&postcount=28
  29. P-1 speed, 103M, V6.11-380 vs. v7.2-53 https://www.mersenneforum.org/showpo...9&postcount=29
  30. Gpuowl automatic P-1 bounds selection vs. version and exponent https://www.mersenneforum.org/showpo...6&postcount=30
  31. Gpuowl primenet interface https://www.mersenneforum.org/showpo...4&postcount=31
  32. etc tbd

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-09-23 at 15:08 Reason: added link to moebius's benchmark list