mersenneforum.org  

Old 2018-05-29, 02:56   #1
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·5·593 Posts
Default gpuOwL-specific reference material

This thread is intended to hold only reference material specifically for GpuOwL.
(Suggestions are welcome. Discussion posts in this thread are not encouraged. Please use the reference material discussion thread http://www.mersenneforum.org/showthread.php?t=23383.)


To get started with gpuowl on Linux, see Ernst Mayer's directions: https://mersenneforum.org/showthread.php?t=25601
For Windows, see below.
In either case, note that the computation types, hardware supported, fft size limits, file formats, etc. have varied greatly and rapidly over the course of the hundreds of versions. Choose a version according to what you want to run and what each version offers.

Starting from a completed install of Windows (preferably with Windows updates brought current):
Enable, or install and configure, any remote desktop software you plan to use: Windows Remote Desktop, TightVNC, etc.
Create a working folder. (I create one for each instance for each gpu.) Create it as a subdirectory of the user's default folder, where permissions will be ok. DO NOT place it in system directories, Program Files, etc. Permission problems are common when attempting to run there.

Get and install the gpuowl software
Either
download a current build from the end of the Gpuowl Windows build thread https://www.mersenneforum.org/showthread.php?t=25624 or, for an earlier version, from https://www.mersenneforum.org/showpo...39&postcount=4 or from the download mirror https://download.mersenne.ca/ which has many Windows versions and a few Linux versions. Unzip it into a working folder under the user's home directory (NOT in Program Files or other restricted areas, where some have tried and run into permission problems).
Or
follow the Windows build instructions at https://www.mersenneforum.org/showpo...4&postcount=21 to create a build environment (once) and follow the compile and link section there as needed.
Next, decide whether you will run it manually or use primenet.py.

I recommend running one instance of gpuowl manually at first, to learn what normal operation looks like, so that if/when issues with operation appear, you're familiar with the program. (Add complexity later, after learning the basics.) It will also give you a greater appreciation for the automation built into primenet.py and other programs such as prime95 and mprime.

If using gpuowl's primenet.py or certain other tools provided for gpuowl, you'll need a Python 3 installation. To install Python, follow a good set of instructions, such as https://docs.python.org/3/using/wind...l#windows-full (I've been exploring compiling primenet.py into a standalone executable, but haven't yet worked out how to get one small enough to post on the forum as an attachment.)

Note, not all gpuowl versions include all the features described below. Some are rather recent additions.

Create a config.txt
Suggested contents:
-user your-primenet-uid -cpu systemname-gpumodel-number-winstance -device n -maxAlloc gpuram-delta
For example, my primenet-uid is kriesel, and the case is system asr2, second Radeon VII gpu, instance 2. The gpu ram is 16GB total, but if I have 2 instances running P-1 on the same gpu and their stage 2s might coincide, I might give each instance half the gpu ram less a margin: maxAlloc = gpuram 8000 - delta 500 = 7500 for each instance.
Then the config.txt line for the second gpu, second instance would be
-user kriesel -cpu asr2-radeonvii-2-w2 -device 1 -maxAlloc 7500
(Device numbers start at 0 in gpuowl and some other GIMPS gpu applications.)
Other options are listed in the help output. -yield or -power 9 may be useful. A higher proof power reduces the effort required for the cert, which whoever runs the cert will appreciate, along with the overall efficiency. (And occasionally that may be you!)
The format of command line options and of config.txt is the same.
However, config.txt must be a single line, followed by a return.
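The naming scheme and maxAlloc arithmetic above can be sketched in Python. This is only an illustration, not part of gpuowl: the function name and its parameters are hypothetical, the values are assumed to be in MB, and only the option names (-user, -cpu, -device, -maxAlloc) are taken from the suggested contents above.

```python
def build_config_line(user, system, gpu_model, gpu_number, instance,
                      device, gpu_ram_mb, instances_per_gpu, delta_mb):
    """Return a one-line config.txt body, ending in a newline.

    All parameter names here are hypothetical; only the option names
    come from the suggested config.txt contents.
    """
    # Split the gpu ram evenly among the instances, then keep a margin.
    max_alloc = gpu_ram_mb // instances_per_gpu - delta_mb
    if max_alloc <= 0:
        raise ValueError("delta leaves no gpu ram for this instance")
    cpu_name = f"{system}-{gpu_model}-{gpu_number}-w{instance}"
    return f"-user {user} -cpu {cpu_name} -device {device} -maxAlloc {max_alloc}\n"


# Second Radeon VII (device 1, since device numbers start at 0), instance 2.
line = build_config_line("kriesel", "asr2", "radeonvii", 2, 2,
                         device=1, gpu_ram_mb=16000,
                         instances_per_gpu=2, delta_mb=500)
print(line, end="")
```

With these inputs the sketch reproduces the example line given above, including the trailing return that config.txt needs.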

Batch file or shell script
I find it useful to also create a short batch file, and a desktop shortcut to it. The batch file can be as simple as g6.bat:
Code:
title %cd%
gpuowl-win
Adding distinctive colors for text and background can be useful. Background color associated with gpu, and text color associated with instance number for example.

The shortcut command should not be a direct invocation of the batch file. Use a cmd /k prefix (for example, cmd /k g6.bat) so the window lingers in case of problems, long enough for you to read any error message.

Having the help.txt and use-flags.txt files in the working directory for ready reference is also convenient.

Create a worktodo.txt
Go to https://www.mersenne.org/manual_assignment/ to get a single PRP assignment or PRP DC assignment. The excellent Gerbicz error check (GEC) on PRP work will determine whether the gpu is producing reliable interim results. Verify that the gpu and system combination is reliable in PRP/GEC before attempting any P-1 factoring or LL DC, which have less error detection.
Open worktodo.txt for editing.
Paste the assignment into worktodo.txt and follow it with a return.
Save the modified file.
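For those who would rather script the paste step, here is a minimal Python sketch. It is an assumption-laden illustration, not a gpuowl tool: the function name is hypothetical and the assignment strings in the demo are placeholders, not real assignment records. The one point it takes from the steps above is that each assignment is followed by a return.

```python
import tempfile
from pathlib import Path


def append_assignment(worktodo_path, assignment):
    """Append one assignment line to worktodo.txt, followed by a newline."""
    path = Path(worktodo_path)
    existing = path.read_text() if path.exists() else ""
    # Make sure earlier content ends with a newline before adding more.
    if existing and not existing.endswith("\n"):
        existing += "\n"
    path.write_text(existing + assignment.strip() + "\n")


# Demo in a temporary folder, using placeholder text instead of real assignments.
with tempfile.TemporaryDirectory() as folder:
    demo = Path(folder) / "worktodo.txt"
    demo.write_text("first-assignment-placeholder")  # no trailing newline
    append_assignment(demo, "second-assignment-placeholder")
    demo_text = demo.read_text()
print(demo_text, end="")
```

In real use the second argument would be the line copied from the manual assignment page, exactly as given there.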

Try running gpuowl via the batch file at a Windows command prompt: g6
If it works, it should look something like the following, allowing for differences in parameters entered in config.txt, gpuowl version, exponent, work type, etc. If not, fix and retry.

A PRP start follows. Progress lines should contain "OK". "EE" instead means errors: trouble such as clocks set too high, unreliable system ram, or a failed fan, or an fft length specified too short for the exponent and computation type.
Code:
2020-05-29 15:03:44 config: -device 1 -user kriesel -cpu roa/rx550 -use NO_ASM -maxAlloc 1500
2020-05-29 15:03:44 device 1, unique id ''
2020-05-29 15:03:44 roa/rx550 94955299 FFT: 5M 1K:10:256 (18.11 bpw)
2020-05-29 15:03:44 roa/rx550 Expected maximum carry32: 48210000
2020-05-29 15:03:45 roa/rx550 OpenCL args "-DEXP=94955299u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xe.cfec567b14fd8p-3 -DIWEIGHT_STEP=0x8.a43aff8beae48p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1  -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-05-29 15:03:52 roa/rx550 OpenCL compilation in 6.31 s
2020-05-29 15:03:58 roa/rx550 94955299 OK        0 loaded: blockSize 400, 0000000000000003
2020-05-29 15:04:16 roa/rx550 94955299 OK      800   0.00%; 14232 us/it; ETA 15d 15:24; 69f923b24568ac18 (check 5.88s)
2020-05-29 15:51:37 roa/rx550 94955299 OK   200000   0.21%; 14233 us/it; ETA 15d 14:38; 986d9b55f22ac736 (check 5.88s)
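As a rough cross-check of the ETA column in the PRP lines above, the remaining time can be estimated as remaining iterations times the reported us/it. The Python sketch below assumes a PRP test takes about as many iterations as the exponent; the function name is hypothetical and this is not how gpuowl itself computes the figure.

```python
def eta_days_hours_minutes(exponent, iteration, us_per_iteration):
    """Estimate remaining PRP run time as (days, hours, minutes)."""
    # Assumption: total iterations is approximately the exponent.
    remaining_iterations = exponent - iteration
    remaining_seconds = remaining_iterations * us_per_iteration / 1_000_000
    total_minutes = int(remaining_seconds // 60)
    days, rest = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rest, 60)
    return days, hours, minutes


# Values taken from the iteration-800 line of the PRP log above.
days, hours, minutes = eta_days_hours_minutes(94955299, 800, 14232)
print(f"ETA {days}d {hours:02d}:{minutes:02d}")
```

The estimate should land within a minute or so of the ETA printed in the log; small differences are expected from rounding of the us/it figure.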
A P-1 run start:
Code:
2020-08-15 11:30:03 gpuowl v6.11-340-g41d435f
2020-08-15 11:30:04 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -proof 8
2020-08-15 11:30:04 device 0, unique id ''
2020-08-15 11:30:04 condorella/rx480 183000023 FFT: 10M 1K:10:512 (17.45 bpw)
2020-08-15 11:30:04 condorella/rx480 Expected maximum carry32: 43400000
2020-08-15 11:30:07 condorella/rx480 OpenCL args "-DEXP=183000023u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=10u -DPM1=1 -DAMDGPU=1 -DCARRYM64=1 -DWEIGHT_STEP_MINUS_1=0xe.c72a0862a91p-5 -DIWEIGHT_STEP_MINUS_1=-0xa.1bff0fe0af57p-5  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-08-15 11:30:07 condorella/rx480 ASM compilation failed, retrying compilation using NO_ASM
2020-08-15 11:30:15 condorella/rx480 OpenCL compilation in 8.00 s
2020-08-15 11:30:15 condorella/rx480 183000023 P1 B1=700000, B2=26000000; 1009635 bits; starting at 0
2020-08-15 11:31:24 condorella/rx480 183000023 P1    10000   0.99%; 6840 us/it; ETA 0d 01:54; 58a6331a302132cb
2020-08-15 11:32:32 condorella/rx480 183000023 P1    20000   1.98%; 6877 us/it; ETA 0d 01:53; 38e0ea00e8d3dfcd
2020-08-15 11:33:41 condorella/rx480 183000023 P1    30000   2.97%; 6897 us/it; ETA 0d 01:53; 52d0881a817b7a2d
2020-08-15 11:34:50 condorella/rx480 183000023 P1    40000   3.96%; 6904 us/it; ETA 0d 01:52; 5cc6128ee3cf27dd
2020-08-15 11:35:16 condorella/rx480 saved
2020-08-15 11:36:00 condorella/rx480 183000023 P1    50000   4.95%; 6928 us/it; ETA 0d 01:51; 4980d669e97a532f
Manually report the result first (BEFORE uploading the proof file)
Open https://www.mersenne.org/manual_result/ in a web browser. Verify you're logged in (see upper right of the web page.)
Open the results.txt file in an editor. Copy the result. Paste it into the results field of the web page. Click on "Submit".
Optionally, enter a note in the results.txt file that the results line has been reported. I usually place "reported mm/dd/yy" filling in the date of report, on a separate line after the last reported line. This helps avoid duplicate reports, and scans easily.
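The optional "reported mm/dd/yy" note could also be appended by a small script. The Python sketch below is only an illustration of that bookkeeping habit; the function name is hypothetical and the result line in the demo is a placeholder, not a real result record.

```python
import tempfile
from datetime import date
from pathlib import Path


def mark_reported(results_path, when):
    """Append a 'reported mm/dd/yy' marker line after the last result line."""
    path = Path(results_path)
    text = path.read_text()
    # Keep the marker on its own line even if the file lacks a final newline.
    if text and not text.endswith("\n"):
        text += "\n"
    marker = "reported " + when.strftime("%m/%d/%y")
    path.write_text(text + marker + "\n")
    return marker


# Demo in a temporary folder with a placeholder result line.
with tempfile.TemporaryDirectory() as folder:
    demo = Path(folder) / "results.txt"
    demo.write_text("result-line-placeholder\n")
    marker = mark_reported(demo, date(2020, 5, 29))
    demo_text = demo.read_text()
print(demo_text, end="")
```

In real use, pass date.today() as the second argument after submitting the results on the web page.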

Upload the proof file (AFTER reporting the result record)
There's only a proof file for PRP runs begun with proof-capable versions of gpuowl. It's found in subfolder proof, and has a name composed of the exponent and proof power. For example, exponent 1234567 power 8 would be 1234567-8.proof in folder workingdirectory\proof. Several possible upload methods are listed at https://www.mersenneforum.org/showpo...0&postcount=26, some with their steps described there.
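The proof file naming rule described above is easy to express in Python. This sketch only composes the expected path; the function name and the example working directory are hypothetical.

```python
from pathlib import PureWindowsPath


def proof_file_path(working_directory, exponent, power):
    """Compose the expected proof file path: proof subfolder, exponent-power.proof."""
    return PureWindowsPath(working_directory) / "proof" / f"{exponent}-{power}.proof"


# Hypothetical working directory; exponent and power from the example above.
path = proof_file_path(r"C:\Users\me\gpuowl-w1", 1234567, 8)
print(path)
```

PureWindowsPath is used so the sketch formats Windows-style paths on any operating system without touching the disk.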

Using multiple instances
There are two ways that running multiple instances on the same gpu at the same time may increase throughput.
1. Aggregate throughput may be higher running two instances on the same gpu at the same time. Any wait time that occurs for one instance on the gpu while the cpu performs the GEC, or reads from or writes to files or the console, or moves data over the PCIe bus between gpu ram and system ram, may be usable by the other instance.
2. Emptying the worktodo file or halting on an error condition by one instance does not completely stop progress; any other running instance can then use the resources the stopped instance is no longer using. A halted instance does not leave the gpu idle for hours or days if at least one other instance is still running on it.
Multiple instances are not guaranteed to improve sustained throughput. Throughput seems to be better if the two instances are running similar code and parameters; two 5M fft PRPs for example (not a 5M and an 8M, or P-1 and LLDC).

The conceptually simplest way to run multiple instances is to create a separate working folder for each, with its own set of files including the gpuowl executable, the same as for the first instance. A more compact way is for all instances on all gpus to share one executable in a common folder, IF updating the version for all instances simultaneously is ok.

Backups
Now might be a good time to refresh your Windows backup, system restore point, etc.
Also verify that your gpuowl folders, containing assignments, results, interim save files, configuration, etc. will be backed up.

Using gpuowl's primenet.py
There's a separate post detailing primenet.py setup and use at https://www.mersenneforum.org/showpo...2&postcount=25. It's probably best to rename the results file containing already-reported results before starting to use primenet.py for reporting.

Pool
I haven't tried it yet, but particularly with multiple instances, multiple gpus of the same type, or both, it appears it could make manual work assignment and result reporting easier. The help output says:
Code:
-pool <dir>        : specify a directory with the shared (pooled) worktodo.txt and results.txt
                     Multiple GpuOwl instances, each in its own directory, can share a pool of assignments
So presumably putting the -pool option in each config.txt does it.


Check that throughput / iteration times are about as expected and that the error rate is low.
Occurrence of EE in the console or log output should be low, ideally less than once a week. For iteration times, see https://drive.google.com/file/d/10fC...enkBdAaRP/view and run time posts in this thread, and note iteration times vary greatly by exponent (more precisely fft length) and hardware and tuning.
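One way to keep an eye on both figures is to scan the log for progress lines. The Python sketch below counts "OK" and "EE" lines and averages the reported us/it. It assumes that EE lines have the same layout as the OK lines shown in the PRP example earlier in this post, which is an assumption rather than something taken from the gpuowl source; the function name is hypothetical.

```python
import re

# Matches progress lines of the form
# "<date> <time> <name> <exponent> OK <iteration> <percent>; <n> us/it; ..."
PROGRESS = re.compile(r"\s(OK|EE)\s+(\d+)\s.*?(\d+) us/it")


def summarize_log(lines):
    """Count OK and EE progress lines and average the reported us/it."""
    counts = {"OK": 0, "EE": 0}
    timings = []
    for line in lines:
        match = PROGRESS.search(line)
        if match:
            counts[match.group(1)] += 1
            timings.append(int(match.group(3)))
    mean_us = sum(timings) / len(timings) if timings else None
    return counts, mean_us


# Three lines copied from the PRP start example; the first has no us/it field.
sample = [
    "2020-05-29 15:03:58 roa/rx550 94955299 OK        0 loaded: blockSize 400, 0000000000000003",
    "2020-05-29 15:04:16 roa/rx550 94955299 OK      800   0.00%; 14232 us/it; ETA 15d 15:24; 69f923b24568ac18 (check 5.88s)",
    "2020-05-29 15:51:37 roa/rx550 94955299 OK   200000   0.21%; 14233 us/it; ETA 15d 14:38; 986d9b55f22ac736 (check 5.88s)",
]
counts, mean_us = summarize_log(sample)
print(counts, mean_us)
```

Lines without a us/it field, such as the "loaded" line, are skipped, so only real progress lines enter the counts and the average.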


Table of contents
  1. This post
  2. Run time versus exponent or fft length for the RX550 or RX480 or Radeon VII of gpuOwL from V1.9 up. Currently up to v7.2-69 on RX480 or Radeon VII http://www.mersenneforum.org/showpos...35&postcount=2
  3. gpuOwL bug and wish list http://www.mersenneforum.org/showpos...37&postcount=3
  4. Getting started with gpuOwL http://www.mersenneforum.org/showpos...39&postcount=4
  5. gpuOwL requirements http://www.mersenneforum.org/showpos...76&postcount=5
  6. Features and requests http://www.mersenneforum.org/showpos...77&postcount=6
  7. Feature / version announcements http://www.mersenneforum.org/showpos...83&postcount=7
  8. Determining upper exponent limit for a transform type and fft length https://www.mersenneforum.org/showpo...31&postcount=8
  9. FFT lengths https://www.mersenneforum.org/showpo...36&postcount=9
  10. PRP-3 run time scaling in V5.0-9c13870 (no P-1) https://www.mersenneforum.org/showpo...6&postcount=10
  11. Gpuowl .owl file header style versus gpuowl version samples https://www.mersenneforum.org/showpo...7&postcount=11
  12. Gpuowl PRP3 continuation compatibility https://www.mersenneforum.org/showpo...7&postcount=12
  13. Validation and verification runs https://www.mersenneforum.org/showpo...1&postcount=13
  14. gpuowl-win V6.5-c48d46f run times on AMD and NVIDIA vs. CUDALucas https://www.mersenneforum.org/showpo...7&postcount=14
  15. Gpuowl residue type etc versus version https://www.mersenneforum.org/showpo...3&postcount=15
  16. Gpuowl v6.5-84-g30c0508 -h help output https://www.mersenneforum.org/showpo...9&postcount=16
  17. Gpuowl P-1 run time scaling on AMD and NVIDIA https://www.mersenneforum.org/showpo...5&postcount=17
  18. Gerbicz error check detection rate https://www.mersenneforum.org/showpo...4&postcount=18
  19. Increased throughput with simultaneous runs https://www.mersenneforum.org/showpo...6&postcount=19
  20. What's a good P-1 factoring strategy? Best? https://www.mersenneforum.org/showpo...9&postcount=20
  21. Compiling Gpuowl https://www.mersenneforum.org/showpo...4&postcount=21
  22. Save file size versus exponent or fft length https://www.mersenneforum.org/showpo...7&postcount=22
  23. Setting up in Linux https://www.mersenneforum.org/showpo...5&postcount=23
  24. Gpuowl gpu ram use scaling with exponent https://www.mersenneforum.org/showpo...3&postcount=24
  25. Using gpuowl's primenet.py on Windows https://www.mersenneforum.org/showpo...2&postcount=25
  26. Methods of uploading proof files https://www.mersenneforum.org/showpo...0&postcount=26
  27. User base OS mix indications https://www.mersenneforum.org/showpo...3&postcount=27
  28. Performance variables for gpuowl https://www.mersenneforum.org/showpo...6&postcount=28
  29. P-1 speed, 103M, V6.11-380 vs. v7.2-53 https://www.mersenneforum.org/showpo...9&postcount=29
  30. Gpuowl automatic P-1 bounds selection vs. version and exponent https://www.mersenneforum.org/showpo...6&postcount=30
  31. Gpuowl primenet interface https://www.mersenneforum.org/showpo...4&postcount=31
  32. etc tbd

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-09-23 at 15:08 Reason: added link to moebius's benchmark list
Old 2018-05-29, 03:10   #2
kriesel
 
Default GpuOwL run time vs exponent or fft length or version

RX550 data

gpuOwL v2, 5000K fft, RX550 gpu, MSI 18.2.3 driver of Feb 26 2018; iteration time in a quick test (~40,000 iterations each) was:
short carry 17.3 ms/iter,
medium 17.6,
long 17.4,
compared to V1.9 gpuOwL on the same gpu, same pcie physical connection, April 2017 MSI driver,
10.9 ms/iter for -fft DP -legacy -size 4M;
18.9 ms/iter -fft M61 -size 4M;
21.4 ms/iter -fft DP -legacy -size 8M.

The driver change coincided with an increase of about 5% in iteration time, on the same gpu, in V1.9 gpuOwL. http://www.mersenneforum.org/showpos...&postcount=370
See the first attachment below for V1.9 on an RX550.

See also the 4-program speed comparison in the general reference thread. http://www.mersenneforum.org/showpos...76&postcount=8

For an RX480, my data indicate it is 3.4-3.6 times faster than the RX550 on the same exponents and gpuOwL versions; see http://www.mersenneforum.org/showpos...&postcount=386 and subsequent posts.

An Intel IGP HD620 could run V0.5 or V1.9, but it was not worth doing. On mine, the resulting hit on prime95 throughput was larger than the gpuOwL throughput gained. More detail on the V0.5 try (LL): http://www.mersenneforum.org/showpos...&postcount=176 (I discontinued running gpuOwL on the IGP. The tradeoff with mfakto there was much better.)
Detail on the V1.9 try (PRP): http://www.mersenneforum.org/showpos...&postcount=285
A listing of V3.5 OpenOwL command line options and fft lengths can be found at http://www.mersenneforum.org/showpos...&postcount=565

Detail on benchmarking V3.3 and V3.5 OpenOwL fft lengths on RX480 can be found at http://www.mersenneforum.org/showpos...&postcount=570

Second attachment below tabulates ms/iteration timings for various versions, V3.x - V3.9, V4.6, and V5.0, and fft lengths, on an RX480, and includes some graphs and ratios.

Third attachment compares V6.2, 5.0, 3.8, 2.0, and 1.9. Each is fastest for some fft length / exponent ranges, except v2.0. The trend line fit for asymptotic scaling of the fastest version versus fft length or exponent is iteration time proportional to p^1.078, so run time proportional to p^2.078, for exponents 100M<p<~2520M (6M to 144M fft length).
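To get a feel for what that fit implies, the Python sketch below scales a time by the ratio of exponents raised to the fitted power (1.078 for iteration time, 2.078 for run time, since the number of iterations is itself about p). The function name is hypothetical, and the fit only applies within the exponent range quoted above.

```python
def scaled_time(known_time, known_exponent, new_exponent, power):
    """Scale a measured time by (new_exponent / known_exponent) ** power."""
    return known_time * (new_exponent / known_exponent) ** power


ITERATION_POWER = 1.078  # fitted power for iteration time versus exponent p
RUN_POWER = 2.078        # one more power of p, for the iteration count

# Effect of doubling the exponent from 100M to 200M, relative to a time of 1.
iteration_ratio = scaled_time(1.0, 100e6, 200e6, ITERATION_POWER)
run_ratio = scaled_time(1.0, 100e6, 200e6, RUN_POWER)
print(f"doubling p: iteration time x{iteration_ratio:.2f}, run time x{run_ratio:.2f}")
```

Under this fit, doubling the exponent costs roughly 2.1 times the iteration time and roughly 4.2 times the total run time.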

Updated timings for RX480 and Radeon VII under Windows 7 and 10 respectively, up to Gpuowl v7.2-69 are included in the fourth and fifth attachments. These are works in progress currently. (Lots of data points, so reading glasses and zoom.)


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf speeds and limits.pdf (13.3 KB, 315 views)
File Type: pdf openowl v3x 46 50 timings.pdf (28.3 KB, 274 views)
File Type: pdf v6.2 etc benchmarks.pdf (37.6 KB, 277 views)
File Type: pdf owl many versions benchmarked rx480.pdf (79.8 KB, 99 views)
File Type: pdf owl many versions benchmarked radeon7.pdf (72.6 KB, 103 views)

Last fiddled with by kriesel on 2021-04-16 at 20:46 Reason: more benchmarks for Radeon VII
Old 2018-05-29, 04:34   #3
kriesel
 
Default gpuOwL bug and wish list

Here is the latest posted version of the list I am maintaining for gpuOwL. As always, this is in appreciation of the authors' past contributions. Users may want to browse this for workarounds included in some of the descriptions, or for an awareness of some known pitfalls. Please respond with any comments, additions or suggestions you may have, preferably by PM to kriesel.


(last updated 2018-12-31)


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf gpuowl bug and wish list.pdf (76.0 KB, 297 views)

Last fiddled with by kriesel on 2019-11-17 at 14:48
Old 2018-05-29, 04:44   #4
kriesel
 
Default Getting started with gpuOwL

This is an old post, but kept in place for its documentation of what can be done with the very old builds, and the long list of (mostly Windows, plus rarely Linux) builds available.


See the Available Software guide portion for gpuOwL, for where to get code, a brief summary of capability, and a discussion thread for it. Or scroll to the bottom of this post. Note this was originally written for very early versions, and that has been left in place here for those occasions when an old version is the best tool for a particular task.

It's pretty simple to get started with gpuOwL. Get the version that supports the fft length corresponding to the exponents you want to run, and build it for your operating system, or find a suitable executable someone else has already built. Kracker was kind enough to post build directions for Windows (including setting up a free open source build environment) at http://www.mersenneforum.org/showpos...&postcount=356 and a Windows build or two in the past in that same thread.

Install the OpenCL drivers on your system and confirm function with a separate OpenCL query utility.

Make sure you have gpuowl.cl in the working directory. For V1.9, depending on the transform type used, you may want nttshared.h in there too, such as if using -fft M61. (See http://www.mersenneforum.org/showpos...&postcount=224)

No ini file. Very little setup.

Manually check out some exponents for PRP test or PRP double check (or, if you're using an old version that does LL, get that type of assignment instead) and put those records in a file called worktodo.txt, just as mersenne.org's manual checkout gives them.

You may want to use a small shell script or batch file depending on which OS you're using.

Syntax and options change with gpuowl version.
https://www.mersenneforum.org/showpo...&postcount=353
V0.6 syntax example:
Code:
gpuowl -logstep 5000 -savestep 2000000 -checkstep 250000 -uid kriesel/condorella-rx550
For V1.9, which has multiple power-of-two fft lengths, I use a simple batch file as follows (allowing switching options with a couple keystrokes and cutting way down on typos):
Code:
:set opts=-fft M61 -size 4M
set opts=-legacy
set dev=2

gpuowl -user kriesel -cpu condorella-rx550 -device %dev% -verbosity 2 %opts%
Observed memory requirements for 8M fft, 150M exponent, V1.9 gpuOwL: ~475-490MB on the gpu, and about 100MB on the cpu side (380MB peak). For FFT sizes, transforms, and RX550 speeds see http://www.mersenneforum.org/showpos...&postcount=313

For V2 it's also simple, though somewhat different:
Code:
gpuowl -device 0 -user kriesel -cpu condorella-rx480 -carry long
In my case I needed to update the MSI display adapter driver from April 2017 to Feb 2018 version to get V2.0 gpuOwL to run on an RX550 on Windows 7.
http://www.mersenneforum.org/showpos...&postcount=370

In V2 there's a -step option; see http://www.mersenneforum.org/showpos...&postcount=353

V3.x is different yet. See for example http://www.mersenneforum.org/showpos...&postcount=565
As is V4.x. As is V5.

Code (for Windows unless otherwise indicated)
For gpuOwL Windows code, and source see http://www.mersenneforum.org/showthread.php?t=22204
An early guide for compiling 0.x on windows with msys64+mingw64 http://www.mersenneforum.org/showpos...3&postcount=26
Windows in current versions includes the ability to handle .zip files but does not include support for some other compressed archive forms. IZArc is available for free download. It supports many formats, popular with/for Windows or Linux. https://www.izarc.org/
May 2017 v0.1 version Windows build (kracker) .zip http://www.mersenneforum.org/showpos...&postcount=112
May 2017 V0.3 Windows binary (kracker) .zip http://www.mersenneforum.org/showpos...&postcount=168
Jun 2017 V0.5 Windows binary (kracker) .zip http://www.mersenneforum.org/showpos...&postcount=170
(LL discontinued, PRP with Gerbicz block error check beginning V0.7)

Sep 2017 V1.0 binaries for Windows (kracker) .zip http://www.mersenneforum.org/showpos...&postcount=190
Nov 2017 V1.9 binaries for Windows (kracker) .zip http://www.mersenneforum.org/showpos...&postcount=226
Jan 2018 V1.9 binaries updated for Windows (kracker) .zip http://www.mersenneforum.org/showpos...&postcount=272
Aug 2018 V2.0 binary for Windows 64 bit .exe http://www.mersenneforum.org/showpos...&postcount=556
Aug 2018 V3.3 binary for Windows 64 bit .7z http://www.mersenneforum.org/showpos...&postcount=558
Aug 2018 V3.5 binary for Windows 64 bit .7z http://www.mersenneforum.org/showpos...&postcount=560
Aug 2018 V3.6 binary for Windows 64-bit .7z http://www.mersenneforum.org/showpos...&postcount=581
Aug 2018 V3.8 binary for Windows 64-bit (this and all the above are for OpenCl) .7z http://www.mersenneforum.org/showpost.php?p=494169&postcount=615
Aug 2018 V3.9 binary for Windows 64 bit .7z http://www.mersenneforum.org/showpos...&postcount=666

Nov 2018 V4.3 binary for Windows 64 bit .7z https://www.mersenneforum.org/showpo...&postcount=832
Nov 2018 V4.6 binary for Windows 64 bit .7z https://www.mersenneforum.org/showpo...&postcount=828
Oct 2018 V4.7 binary for Windows 64 bit .7z https://www.mersenneforum.org/showpo...&postcount=792 (not recommended, fails for me)

Nov 2018 V5.0 binary for Windows 64 bit .7z https://www.mersenneforum.org/showpo...&postcount=831
and with some fixes and new shorter fft lengths, .7z https://www.mersenneforum.org/showpo...&postcount=867
v5.0-9c13870 .7z https://www.mersenneforum.org/showpo...&postcount=869

Feb 2019 V6.0 binary for Windows 64 bit .7z https://www.mersenneforum.org/showpo...&postcount=967
V6.1 do not use the posted binary for V6.1 or for an early commit of V6.2. There was a bug that caused primes to be indicated composite in both.
Feb 2019 V6.2 binary for Windows 64 bit .zip https://www.mersenneforum.org/showpo...&postcount=983
Apr 2019 V6.4 binary for Windows 64 bit .zip https://www.mersenneforum.org/showpo...postcount=1057
May 2019 V6.5 binary for Windows 64 bit (AMD or NVIDIA!) .7z https://www.mersenneforum.org/showpo...postcount=1171
July 2019 V6.5-84-30c0508 for Windows 64 bit residue type 1 .7z https://www.mersenneforum.org/showpo...postcount=1274
(V6.6)
V6.7-4-g278407a Windows build .7z https://www.mersenneforum.org/showpo...postcount=1343
(V6.8)
version uncertain, Woltman's test version .zip file of source suitable for Linux building https://mersenneforum.org/showpost.p...postcount=1364
(V6.9)
V6.10-9-g54cba1d Windows build .zip https://mersenneforum.org/showpost.p...postcount=1385
V6.11-9-ga9e3189 Windows build .7z https://mersenneforum.org/showpost.p...postcount=1403
Woltman's dropbox Windows build .exe https://mersenneforum.org/showpost.p...postcount=1510
Another Woltman dropbox version .exe https://mersenneforum.org/showpost.p...postcount=1539
V6.11-83-ge270393 Windows build .7z https://www.mersenneforum.org/showpo...postcount=1584
v6.11-88 build for Windows .7z https://mersenneforum.org/showpost.p...postcount=1629
gpuowl v6.11-99-gdd8527b Windows build .7z https://www.mersenneforum.org/showpo...postcount=1652
v6.11-104-g91ef9a8 .zip https://mersenneforum.org/showpost.p...postcount=1664
v6.11-112-gf1b00d1 Windows build .7z https://mersenneforum.org/showpost.p...postcount=1682
January 2020 V6.11-116-g5ca090d P-1 PRP assignment split rewrite Windows build .7z https://www.mersenneforum.org/showpo...postcount=1740
v6.11-132-gfd01ee5 Windows build .7z https://mersenneforum.org/showpost.p...postcount=1787
January 2020 V6.11-134-g1e0ce1d Windows build .7z https://mersenneforum.org/showpost.p...postcount=1796
February 2020 V6.11-142-gf54af2e Windows build .zip https://mersenneforum.org/showpost.p...postcount=1829
v6.11-145-g6146b6d Windows build .zip https://mersenneforum.org/showpost.p...postcount=1840
v6.11-147-g3b8b00e Windows build .zip https://mersenneforum.org/showpost.p...postcount=1866
v6.11-148-gfc93773 Windows build .7z https://mersenneforum.org/showpost.p...postcount=1877
March 2020 v6.11-163-gec98bfe Windows build .7z https://mersenneforum.org/showpost.p...postcount=1903
v6.11-198-g628f3cd Windows build .7z https://mersenneforum.org/showpost.p...postcount=1959
v6.11-219-ge70ec99 ffts up to 192M Windows build .7z https://mersenneforum.org/showpost.p...postcount=1984
v6.11-?-af403e2 (by kracker) the return of LL? Windows build .zip https://mersenneforum.org/showpost.p...postcount=2047
v6.11-255-g81fa7c3 max fft 96M Windows build .7z https://mersenneforum.org/showpost.p...postcount=2063
v6.11-257-g39fc002 Windows build .7z https://mersenneforum.org/showpost.p...postcount=2073
v6.11-259-g83434d8 Windows build .7z https://mersenneforum.org/showpost.p...postcount=2089
April 2020 v6.11-264-g5c977d4-dirty Windows build .7z https://mersenneforum.org/showpost.p...postcount=2095
v6.11-268-g0d07d21 Windows build .7z https://mersenneforum.org/showpost.p...postcount=2106
v6.11-270-gf1fd1f7 Windows build .7z https://mersenneforum.org/showpost.p...postcount=2124
v6.11-272-g07718b9 Windows build .7z https://mersenneforum.org/showpost.p...postcount=2139
May 2020 v6.11-278-ga39cc1a Windows build .7z https://mersenneforum.org/showpost.p...postcount=2161
v6.11-285-gf25ecbd Windows build .7z https://mersenneforum.org/showpost.p...postcount=2179
v6.11-288-g20c4213 Jacobi check returns! .7z https://mersenneforum.org/showpost.p...postcount=2202
v6.11-292-gecab9ae Windows build .7z https://mersenneforum.org/showpost.p...postcount=2220
June 2020 v6.11-295-gaecf041 (the last I could build until ~ -316) .7z https://mersenneforum.org/showpost.p...postcount=2274
v6.11-318-g3109989 Windows build, max fft 120M, includes PRP proof capability .7z https://mersenneforum.org/showpost.p...99&postcount=1
gpuowl for Windows 7 or up 64-bit v6.11-325-g7c09e38 .7z https://mersenneforum.org/showpost.p...29&postcount=2
gpuowl for Windows v611-327-g43cdf1c by Dylan14 .7z https://mersenneforum.org/showpost.p...25&postcount=3
gpuowl commit e5a8f2c for Google Colaboratory Linux environment built by Fan Ming .zip https://mersenneforum.org/showpost.p...&postcount=958
gpuowl for Windows v6.11-330-ge5a8f2c .7z https://www.mersenneforum.org/showpo...30&postcount=4
July 2020 Gpuowl-win v6.11-335-gff60b08 .7z https://mersenneforum.org/showpost.p...31&postcount=5
Gpuowl-win v6.11-340-g41d435f .7z https://www.mersenneforum.org/showpo...87&postcount=6
Gpuowl-win v6.11-357-g1f41292 build .7z https://mersenneforum.org/showpost.p...37&postcount=7
Gpuowl-win v6.11-364-g36f4e2a .7z https://mersenneforum.org/showpost.p...94&postcount=8
August 2020 Gpuowl v6.11-366-gf887d6e for Linux Google Colab .7z https://www.mersenneforum.org/showpo...postcount=1020
(Note, August development focused more on primenet.py and less on the gpuowl executable.)
September 2020 Gpuowl for Linux v6.11-380-g79ea0cc .7z https://mersenneforum.org/showpost.p...1&postcount=40
Gpuowl for Windows v6.11-380-g79ea0cc .7z https://mersenneforum.org/showpost.p...92&postcount=9

October 2020 Gpuowl for Windows v7.0-18-g69c2b85 .7z (LL and standalone P-1 removed, joint P-1/PRP introduced) https://www.mersenneforum.org/showpo...7&postcount=10
Gpuowl-win v7.0-26-g8e6a1d1 .7z https://www.mersenneforum.org/showpo...1&postcount=11
gpuowl-win v7.0-35-gf06bc5b .7z https://www.mersenneforum.org/showpo...7&postcount=12
gpuowl-win v7.0-40-gb62d4fd .7z https://www.mersenneforum.org/showpo...8&postcount=13
gpuowl-win v7.0-47-ga8664fe .7z https://www.mersenneforum.org/showpo...2&postcount=14
gpuowl-win v7.0-66-gebe49cc .7z https://www.mersenneforum.org/showpo...1&postcount=15

Note, do not use the self-verify option with v7.1, or the resulting proof files will be bad.
gpuowl-win v7.1-1-g0f73d04 .7z https://www.mersenneforum.org/showpo...2&postcount=16
(Ethan EO multiple vendors' OpenCL flavors) gpuowl-win v7.1-7 .7z https://www.mersenneforum.org/showpo...postcount=2558
GpuOwl-win v7.1-11-g97cfbd2 2xSP fft experimentation .7z https://www.mersenneforum.org/showpo...9&postcount=17

November 2020 GpuOwl-win v7.2-2-ga135d8d .7z https://www.mersenneforum.org/showpo...5&postcount=18 or .zip
gpuowl-win v7.2-13-g266aed4 .7z https://www.mersenneforum.org/showpo...7&postcount=23
gpuowl-win v7.2-21-g28dbf88 .zip https://www.mersenneforum.org/showpo...1&postcount=24
February 2021 gpuowl-win v7.2-39-ga87a679 .zip https://mersenneforum.org/showpost.p...3&postcount=25
gpuowl-win v7.2-53-ge27846f https://mersenneforum.org/showpost.p...5&postcount=26
gpuowl-win v7.2-63-ge47361b https://mersenneforum.org/showpost.p...5&postcount=27
March 2021 gpuowl-win v7.2-69-g23c14a1 https://mersenneforum.org/showpost.p...9&postcount=28
gpuowl 7.2-70 for Linux https://mersenneforum.org/showpost.p...73&postcount=3
November 2021 gpuowl-win v7.2-86-gddf3314 https://mersenneforum.org/showpost.p...9&postcount=29


For the current version source (and previous versions too), see https://github.com/preda/gpuowl
A separate forum thread was created for Windows gpuowl build posting. It is here


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-11-28 at 21:10 Reason: case consistency
Old 2018-06-03, 20:29   #5
kriesel
Default gpuOwL requirements

My current understanding of the requirements, from the Windows point of view

OpenCL installed, at least version 1.2 if not 2.0. (Some of the more recent versions require OpenCL 2.0)
One or more units of OpenCL compatible hardware, with corresponding driver(s) supporting OpenCL of the required level, such as certain AMD GPUs, Intel IGPs, or CPUs. NVIDIA gpus with compute capability somewhere above 3.0 began to be supported at gpuowl v6.5. Currently GTX10xx and newer NVIDIA model gpus work, and somewhat older too.

gpuOwL below v6.5 does not currently run on NVIDIA gpus, on Linux or Windows, and to my knowledge Preda's releases before that did not. http://www.mersenneforum.org/showpos...&postcount=277 Someone reported porting a long ago version for his own use. http://www.mersenneforum.org/showpos...&postcount=107

Per instance, gpuOwL v1.9 at 8M fft length running exponents ~150M showed, in Task Manager, ~115MB private working set, 145MB working set, and 382MB peak working set on Windows 7 64-bit. Meanwhile GPU memory occupancy was ~475-490MB per instance.

Discrete (add-in card) GPUs give better performance because of their dedicated memory. Integrated graphics processors use memory and TDP budget shared with the CPU core(s) and will affect performance of CPU applications. IGPs may lack DP support or otherwise lack compatibility with the AMD-oriented gpuOwL DP code, and so require running V1.9 -fft M61, which is slower.

In case of difficulty, it's recommended to verify the successful installation of OpenCL and compatible drivers with a utility, such as clinfo, oclDeviceQuery.exe, or the advanced tab of GPU-Z (a Windows gpu status graphical utility).
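As a minimal sketch of that check, the Python below runs clinfo if it is found on the PATH and keeps only the output lines that mention a version. The function names are hypothetical, and the assumption that the relevant lines contain the word "Version" may not hold for every utility or release.

```python
import shutil
import subprocess


def opencl_report(tool="clinfo"):
    """Return the text printed by the query tool, or None if it is not installed."""
    path = shutil.which(tool)
    if path is None:
        return None
    result = subprocess.run([path], capture_output=True, text=True, timeout=60)
    return result.stdout


def version_lines(report):
    """Keep lines mentioning 'version' (assumed to carry the OpenCL level)."""
    return [line.strip() for line in report.splitlines()
            if "version" in line.lower()]


if __name__ == "__main__":
    text = opencl_report()
    if text is None:
        print("clinfo not found; install it or use another utility")
    else:
        for line in version_lines(text):
            print(line)
```

If no version lines are printed at all, that suggests the OpenCL runtime or driver is not installed correctly.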

Memory requirements are modest. I'm seeing only about 290MB occupied during 4M fft length -DP transform on an RX550. That may scale to roughly 1.3GB for a future 16M fft implementation, 2.7GB? for 32M, which would not fit on that 2GB card. (It would probably also run way too slowly for that card to be practical, at roughly estimated 2-3 years per exponent.) It illustrates that for primality testing, 4GB is probably enough for a long time.

gpuOwL v2 on Windows 7 sp1 with current updates failed with an MSI-sourced driver dated April 2017 on an MSI RX550.
It worked with the MSI driver 18.2.3 dated Feb 26 2018.
There's a March 23 2018 driver available from AMD, v18.3.4, or probably more recent by now, that I have not tried on v2.0. https://support.amd.com/en-us/download


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-08-09 at 19:17 Reason: update for v6.5
Old 2018-06-03, 20:38   #6
kriesel
Default gpuOwL features and requests

(caution, some of this is outdated. gpuowl is up to v6.5 as of 2019-05-09.)

Nice features
The Gerbicz check, of course, detecting errors and allowing timely rollback to ensure an accurate result.

Full time logging in addition to console display.

The -step argument in gpuOwl
After a quick look at the source, it appears to give the user the ability to override the automatic program behavior with a specified constant step count between Gerbicz checks. I interpret this to mean -step requires a step count parameter in the range 1000 to 500000 that is 1, 2, or 5 times a power of ten: 1000, 2000, 5000, 10000, ... 500000. Or perhaps larger also.
It may be both output and Gerbicz check interval.
Two use cases I've run into are:
1) Hardware and software are very stable, exponent is far from an fft length limit, overhead of starting at small step sizes is not necessary, run a large step size from the start. (This case might benefit from adaptive step size after starting large if an error occurs during the run. Also user settable number of consecutive retries if an error occurs)
2) Repeatable error has occurred, such as the exponent is slightly too large for the fft length, I'd like to determine as finely as possible, at what iteration it occurs, with a rerun from last known good save file, using minimum step size until encountering the error again. (This case might benefit from a user set limit of retries (0 - ~9) before giving up on the exponent and starting or resuming the next worktodo entry.) Source fragments supporting the opinion are at http://www.mersenneforum.org/showpos...&postcount=353
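A minimal sketch of that reading of -step, in Python. It assumes the 1-2-5 pattern and the 1000 to 500000 range described above, which are my interpretation and not confirmed program behavior; the function names are hypothetical and not part of gpuOwL.

```python
def allowed_steps(lo=1000, hi=500000):
    """List values of the form 1, 2, or 5 times a power of ten within [lo, hi]."""
    steps = []
    power = 1
    while power <= hi:
        for multiplier in (1, 2, 5):
            value = multiplier * power
            if lo <= value <= hi:
                steps.append(value)
        power *= 10
    return steps


def is_allowed_step(n, lo=1000, hi=500000):
    """True if n would be accepted under the assumed 1-2-5 rule."""
    return n in allowed_steps(lo, hi)


print(allowed_steps())
```

Under these assumptions, use case 1 would pick the last value in the list and use case 2 the first.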

The ability to switch transform midstream
Per Preda the author of gpuOwL, the save file is in compacted bits format (independent of the transform). see http://www.mersenneforum.org/showpos...&postcount=312

Feature requests
Save frequency option
Is there a command line option to control how often a disaster-mitigation interim file is saved? In V1.9 one seems to be produced at 10^7-iteration intervals. I would like to try running at 5M-iteration intervals for safety files. I don't see any such option in the V1.10 source. I suppose I could run some little batch file.
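The "little batch file" idea could look something like the following Python sketch. It copies save files to a backup folder with a timestamp suffix. The "*.owl" pattern, the folder names, and the function name are assumptions for illustration only; check what the version in use actually writes. Note that an outside script can only work on a time interval, not an iteration count.

```python
import shutil
import time
from pathlib import Path


def backup_savefiles(work_dir, backup_dir, pattern="*.owl"):
    """Copy each file matching pattern into backup_dir, adding a timestamp suffix."""
    work, dest = Path(work_dir), Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    copied = []
    for src in sorted(work.glob(pattern)):
        if src.is_file():
            target = dest / f"{src.name}.{stamp}"
            shutil.copy2(src, target)  # copy2 also preserves the modification time
            copied.append(target)
    return copied
```

Calling backup_savefiles from a loop with time.sleep between calls, or from Windows Task Scheduler, would give periodic safety copies without any change to gpuOwL itself.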

Gpu operation priority lower for gpuOwL computation or periodic yielding by gpuOwL
When running gpuOwL on the same card running the display, and using the local display rather than remote access, the screen seemed sluggish; I'm not aware of any option in gpuOwL equivalent to the -polite option in CUDALucas, which gives display operations a turn now and then (with frequency user settable).

A port to NVIDIA!

More fft lengths where useful, integrated into one application; 4M and 8M DP and M61 fft; 5000K DP, and anything new.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2019-11-17 at 14:49
Old 2018-06-03, 22:08   #7
kriesel
Default Feature / version announcements

Gpuowl began as LL on AMD: initial github commit d5c48dd 2017-04-11
Introducing gpuOwL http://www.mersenneforum.org/showpos...32&postcount=1 2017-04-19
V0.2 http://www.mersenneforum.org/showpos...&postcount=135 2017-05-21
V0.3 (offsets) http://www.mersenneforum.org/showpos...&postcount=147 2017-05-26
V0.4 http://www.mersenneforum.org/showpos...&postcount=169 2017-06-05
V0.5 http://www.mersenneforum.org/showpos...&postcount=171 2017-06-12
V0.6 Addition of Jacobi check to LL flavor of gpuOwL, zero offset in my test http://www.mersenneforum.org/showpos...5&postcount=46 2017-08-08
Nonzero offset dropped and -supersafe option added http://www.mersenneforum.org/showpos...5&postcount=61 2017-08-10

switch from LL to PRP occurs. See also https://www.mersenneforum.org/showpo...3&postcount=15 for residue type versus version

V0.7 commit ccb7ed2 2017-08-27
V1.0 http://www.mersenneforum.org/showpos...&postcount=186 2017-08-30 PRP residue type 4
V1.1 http://www.mersenneforum.org/showpos...&postcount=191
V1.2-1.4 ?
V1.5 http://www.mersenneforum.org/showpos...&postcount=223 2017-09-30 PRP residue type 1
V1.7 f5198fc 2017-10-26
V1.8 http://www.mersenneforum.org/showpos...&postcount=224 2017-11-08
V1.8 help http://www.mersenneforum.org/showpos...&postcount=225 2017-11-08
V1.9 ?
V1.10 commit 83001d4 2018-01-27 (seen on github https://github.com/preda/gpuowl/blob/NTT/README.md)

V2.0 http://www.mersenneforum.org/showpos...&postcount=320 2018-02-07
perf tune and -time option http://www.mersenneforum.org/showpos...&postcount=331
V2.1-2.3 ?

V3.0 ?
V3.1 commit 5495ecf 2018-07-07
V3.2 ?
V3.3 fft lengths 4, 5, 8, 10, 16, 20M http://www.mersenneforum.org/showpos...&postcount=468 2018-07-13
V3.4 ?
V3.5 "Moar fft" A lot more lengths, from 0.5M to 144M (up to ~2.5x10^9 exponent) http://www.mersenneforum.org/showpos...&postcount=505 2018-07-15
V3.6 2018-08-11 commit f7c3865 see http://www.mersenneforum.org/showpos...&postcount=581
V3.7 TF integrated, TF works on OpenCL Linux ROCm 1.8.2 only http://www.mersenneforum.org/showpos...&postcount=586 2018-08-16
V3.8 commit a7ef0e5 2018-08-17
V3.8 fixes http://www.mersenneforum.org/showpos...&postcount=612 2018-08-17
V3.9 commit 4c4e034 2018-08-21

V4.0 commit fe7cd08 2018-09-10
V4.1 commit d77c6f0 2018-09-18
V4.3 PRP & P-1 combined https://www.mersenneforum.org/showpo...&postcount=694 2018-09-20 PRP residue type 4
V4.6 commit bb691cb 2018-10-20
V4.7 commit 12c6b75 2018-10-23 https://www.mersenneforum.org/showpo...&postcount=765 and see also post 766

V5.0 commit 1339429 2018-10-24 PRP & two stages of P-1 https://www.mersenneforum.org/showpo...&postcount=796
and see also https://www.mersenneforum.org/showpo...&postcount=798 2018-10-31

V6.0 PRP, a primenet.py script added for getting and queuing work and reporting results, and P-1 has been removed. https://www.mersenneforum.org/showpo...&postcount=912 2019-01-03
https://www.mersenneforum.org/showpo...&postcount=913
V6.1, commit c02a6ce, support for standalone P-1 has been added https://www.mersenneforum.org/showpo...&postcount=945
https://www.mersenneforum.org/showpo...&postcount=946
v6.2, commit 5b26497 2019-01-27, fft lengths up to 160M, some speedups https://www.mersenneforum.org/showpo...&postcount=956
v6.4 commit f6d3153 2019-04-09, added command line options -prp -pm1 https://www.mersenneforum.org/showpo...postcount=1056
v6.5 added command line option -dir for working directory; max fft length 192M https://www.mersenneforum.org/showpo...postcount=1062
V6.5-30c0508 switched back from prp residue type 4 to type 1 https://www.mersenneforum.org/showpo...postcount=1273 2019-07-10
V6.7-4, P-1 on NVIDIA https://www.mersenneforum.org/showpo...postcount=1343 2019-09-05
v6.8 per-exponent savefile folders https://www.mersenneforum.org/showpo...postcount=1335 2019-09-06
v6.9 https://www.mersenneforum.org/showpo...postcount=1361
v6.10-9-g54cba1d P-1 savefiles added https://www.mersenneforum.org/showpo...postcount=1384
v6.11-9-g9ae3189 NVIDIA CPU yield https://www.mersenneforum.org/showpo...postcount=1403
v6.11-83-ge270393 increased performance with various -use options https://www.mersenneforum.org/showpo...postcount=1584

V7.0-18 drops LLDC, merges P-1 into PRP https://www.mersenneforum.org/showthread.php?t=26007 2020-10-07
V7.1 proof V2 https://www.mersenneforum.org/showpo...&postcount=110 2020-10-22

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: zip gpuowl-v0.5-92b94b8.zip (289.0 KB, 215 views)
File Type: 7z gpuowl-0.6-LL.7z (155.1 KB, 233 views)

Last fiddled with by kriesel on 2020-10-23 at 10:13 Reason: added v7, 7.1
Old 2018-10-18, 14:30   #8
kriesel
Default Determining upper exponent limit for a transform type and fft length

In gpuOwL V1.9, a lengthy experiment was conducted on how to efficiently determine the upper exponent limit for PRP with Gerbicz check for a given fft length and transform type. This was conducted on the M61 transform type and 4M fft length, for which the program author did not know the limit. (It was not possible to do it on 8M fft length, since maximum exponent is higher for the M61 transform than the corresponding DP transform, and the program's maximum exponent was capped at the 8M length DP transform approximate limit. It would also have taken more than 4 times as long.) The calculations were performed on a relatively slow RX550, which contributed significantly to the experiment's calendar duration.

By approaching the limit from above, generating error failures quickly in relatively few iterations, convergence to an approximate limit is achieved much faster than by approaching the limit from below with exponents fully run to completion. It might seem that many iterations would be wasted on runs that generate errors. However, gpuOwL in v1.9 and later had the useful property of storing interim results in a form that could be continued by a different program version and transform type. So a run that produces errors at a few million or even tens of millions of iterations with one transform type and fft length can be continued to completion by a different program version, transform type, and fft length. Many of the exponents that generated errors as M61 4M have been run to completion with newer, faster DP fft lengths as PRP tests or PRP DC tests on an RX480, as will be a few more still remaining.

Tabulating the exponents tried, their success or failure, and the number of iterations completed, and fitting the failure data and success data separately, produces a good picture of a limit estimate. In this way, the limit of completion can be determined fairly accurately while doing work useful to the GIMPS project in first-time or double-check effort. Tabulation along the way, with spreadsheet-generated regression fits, somewhat guided the selection of the next trial exponent. When practical, avoiding overlap with existing assignments was also considered in trial exponent selection.

In the example attachment, about 1.06 exponents' equivalent of failed run iterations, plus 5 completed exponents, were used, to determine a limit value around 83869400, within a span of about 190 out of ~84 million, or 2.24ppm. Approaching strictly from above, stopping when two exponents are completed, and using less closely spaced test exponents, one could reduce the work to less than the equivalent of 3 completed exponent runs.

At a cost of at most one full run per trial, the experiment could be extended to give about one bit more precision in the limit per additional trial. The practical utility of adding more bits precision to the limit is low, since there are only 5 currently unfactored candidates between 83869319 and 83869507, all of which have a LL or PRP result reported currently, and there are considerably faster fft length/transform combinations available now for performing primality or pseudoprimality tests in that exponent range. https://www.mersenne.org/report_exponent/?exp_lo=83869319&exp_hi=83869507&full=1
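The arithmetic behind the span and the "one bit per trial" remark can be sketched in Python as below. The halving count is an idealization: it assumes each trial cleanly splits the remaining interval in two, whereas the errors seen in practice are statistical. The variable names are only for illustration.

```python
import math

low, high = 83869319, 83869507   # bracket around the limit, from the text above
estimate = 83869400              # limit value quoted above

span = high - low                          # width of the bracket in exponent units
ppm = span / estimate * 1e6                # bracket width in parts per million
halvings = math.ceil(math.log2(span))      # idealized trials to narrow the span to 1

print(span, round(ppm, 2), halvings)
```

So on the order of eight more full-run trials would, at best, pin the limit down to a single exponent, which supports the point that the extra precision has little practical value.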

It's worthwhile to note that the limit value determined is not a guarantee that an exponent below that value would be certain to run to completion without error. It's merely a limit below which no error was seen, in the hundreds of millions of iterations required for the 4 exponents that completed. The error occurrences are not predictable, within a span of exponent of 15,000, and seem to behave statistically.

The limit of the M61 4M transform appears to occur at about 19.996 bits/word, or approximately 20 - 1/256, somewhere between exponents 83869319 and 83869507.
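As a quick check of the bits/word figure, here is a small Python sketch dividing the exponent by the number of words in a 4M fft (4 x 1024^2). The function name is hypothetical.

```python
FFT_WORDS_4M = 4 * 1024 ** 2   # 4M fft length in words


def bits_per_word(exponent, words=FFT_WORDS_4M):
    """Average number of bits of the exponent carried per fft word."""
    return exponent / words


for p in (83869319, 83869400, 83869507):
    print(p, round(bits_per_word(p), 3))

print("20 - 1/256 =", 20 - 1 / 256)
```

All three exponents round to 19.996 bits/word, a little under 20 - 1/256 = 19.99609375.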


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached thumbnail: limit of m61.png (70.3 KB)

Last fiddled with by kriesel on 2019-11-17 at 14:50
Old 2018-11-05, 13:32   #9
kriesel
Default gpuowl V6.11, 6.7, 6.5, 6.2, 6.0, V5.0-9c13870 fft lengths, and earlier

For gpuowl fft lengths, K=1024, M=1024^2. Prior to the V5.0-9c13870 commit, the M=3 entries in the V5.0 list were not available.
Prior to V5.0-df2bdf2, <0.5M were not available.
Prior to ~V3.5, the M=5 and M=9 were not available.
V3.3 supported 4, 5, 8, 10, 16, 20M.
V2.0 supported 5000K only.
V1.9 supported 2, 4, and 8M.
V1.0 and earlier were 4M only. (I think the earliest PRP was at V0.7, prior to that it was LL)
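The relation between the W, H, and middle values and the fft length in the lists that follow can be sketched in Python. The rule length = 2 x W x H x Middle is inferred from the listed entries (for example 512 x 512 with M=5 giving 2.5M), not taken from the gpuowl source, and the function names are hypothetical.

```python
K = 1024
M = 1024 ** 2


def parse_dim(token):
    """Turn a field such as '256' or '4K' into an integer."""
    if token.upper().endswith("K"):
        return int(token[:-1]) * K
    return int(token)


def fft_length(config):
    """FFT length in words for a 'W-H' or 'W-H-Middle' string (inferred rule)."""
    parts = [parse_dim(t) for t in config.split("-")]
    width, height = parts[0], parts[1]
    middle = parts[2] if len(parts) > 2 else 1   # no third field means middle = 1
    return 2 * width * height * middle


for cfg in ("64-64", "512-512-5", "1K-256-10", "4K-2K-9"):
    print(cfg, fft_length(cfg) / M, "M")
```

The same rule covers the older W / H / M column layout by joining the three columns with dashes.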

V4.3:
Code:
openowl -list fft
2021-03-30 19:51:07 gpuowl 4.3-537c681
2021-03-30 19:51:07    FFT  maxExp    W    H M
2021-03-30 19:51:07   0.5M   10.3M  512  512 1
2021-03-30 19:51:07   1.0M   20.3M 1024  512 1
2021-03-30 19:51:07   1.0M   20.3M  512 1024 1
2021-03-30 19:51:07   2.0M   39.8M 1024 1024 1
2021-03-30 19:51:07   2.0M   39.8M  512 2048 1
2021-03-30 19:51:07   2.0M   39.8M 2048  512 1
2021-03-30 19:51:07   2.5M   49.4M  512  512 5
2021-03-30 19:51:07   4.0M   78.0M 1024 2048 1
2021-03-30 19:51:07   4.0M   78.0M 2048 1024 1
2021-03-30 19:51:07   4.0M   78.0M 4096  512 1
2021-03-30 19:51:07   4.5M   87.5M  512  512 9
2021-03-30 19:51:07   5.0M   96.9M 1024  512 5
2021-03-30 19:51:07   5.0M   96.9M  512 1024 5
2021-03-30 19:51:07   8.0M  153.0M 2048 2048 1
2021-03-30 19:51:07   8.0M  153.0M 4096 1024 1
2021-03-30 19:51:07   9.0M  171.6M 1024  512 9
2021-03-30 19:51:07   9.0M  171.6M  512 1024 9
2021-03-30 19:51:07  10.0M  190.0M 1024 1024 5
2021-03-30 19:51:07  10.0M  190.0M  512 2048 5
2021-03-30 19:51:07  10.0M  190.0M 2048  512 5
2021-03-30 19:51:07  16.0M  300.0M 4096 2048 1
2021-03-30 19:51:07  18.0M  336.3M 1024 1024 9
2021-03-30 19:51:07  18.0M  336.3M  512 2048 9
2021-03-30 19:51:07  18.0M  336.3M 2048  512 9
2021-03-30 19:51:07  20.0M  372.5M 1024 2048 5
2021-03-30 19:51:07  20.0M  372.5M 2048 1024 5
2021-03-30 19:51:07  20.0M  372.5M 4096  512 5
2021-03-30 19:51:07  36.0M  659.0M 1024 2048 9
2021-03-30 19:51:07  36.0M  659.0M 2048 1024 9
2021-03-30 19:51:07  36.0M  659.0M 4096  512 9
2021-03-30 19:51:07  40.0M  730.0M 2048 2048 5
2021-03-30 19:51:07  40.0M  730.0M 4096 1024 5
2021-03-30 19:51:07  72.0M 1290.9M 2048 2048 9
2021-03-30 19:51:07  72.0M 1290.9M 4096 1024 9
2021-03-30 19:51:07  80.0M 1429.8M 4096 2048 5
2021-03-30 19:51:07 144.0M 2527.5M 4096 2048 9
V5.0:
Code:
gpuowl 5.0-9c13870
 -list fft
   FFT  maxExp    W    H M
  0.1M    2.6M  256  256 1
  0.2M    5.2M  256  512 1
  0.2M    5.2M  512  256 1
  0.4M    7.7M  256  256 3
  0.5M   10.2M 1024  256 1
  0.5M   10.2M  256 1024 1
  0.5M   10.2M  512  512 1
  0.6M   12.7M  256  256 5
  0.8M   15.1M  256  512 3
  0.8M   15.1M  512  256 3
  1.0M   20.0M 1024  512 1
  1.0M   20.0M  256 2048 1
  1.0M   20.0M  512 1024 1
  1.0M   20.0M 2048  256 1
  1.1M   22.5M  256  256 9
  1.2M   24.9M  256  512 5
  1.2M   24.9M  512  256 5
  1.5M   29.7M 1024  256 3
  1.5M   29.7M  256 1024 3
  1.5M   29.7M  512  512 3
  2.0M   39.3M 1024 1024 1
  2.0M   39.3M  512 2048 1
  2.0M   39.3M 2048  512 1
  2.0M   39.3M 4096  256 1
  2.2M   44.1M  256  512 9
  2.2M   44.1M  512  256 9
  2.5M   48.9M 1024  256 5
  2.5M   48.9M  256 1024 5
  2.5M   48.9M  512  512 5
  3.0M   58.4M 1024  512 3
  3.0M   58.4M  256 2048 3
  3.0M   58.4M  512 1024 3
  3.0M   58.4M 2048  256 3
  4.0M   77.3M 1024 2048 1
  4.0M   77.3M 2048 1024 1
  4.0M   77.3M 4096  512 1
  4.5M   86.7M 1024  256 9
  4.5M   86.7M  256 1024 9
  4.5M   86.7M  512  512 9
  5.0M   96.1M 1024  512 5
  5.0M   96.1M  256 2048 5
  5.0M   96.1M  512 1024 5
  5.0M   96.1M 2048  256 5
  6.0M  114.7M 1024 1024 3
  6.0M  114.7M  512 2048 3
  6.0M  114.7M 2048  512 3
  6.0M  114.7M 4096  256 3
  8.0M  151.8M 2048 2048 1
  8.0M  151.8M 4096 1024 1
  9.0M  170.3M 1024  512 9
  9.0M  170.3M  256 2048 9
  9.0M  170.3M  512 1024 9
  9.0M  170.3M 2048  256 9
 10.0M  188.7M 1024 1024 5
 10.0M  188.7M  512 2048 5
 10.0M  188.7M 2048  512 5
 10.0M  188.7M 4096  256 5
 12.0M  225.3M 1024 2048 3
 12.0M  225.3M 2048 1024 3
 12.0M  225.3M 4096  512 3
 16.0M  298.1M 4096 2048 1
 18.0M  334.3M 1024 1024 9
 18.0M  334.3M  512 2048 9
 18.0M  334.3M 2048  512 9
 18.0M  334.3M 4096  256 9
 20.0M  370.4M 1024 2048 5
 20.0M  370.4M 2048 1024 5
 20.0M  370.4M 4096  512 5
 24.0M  442.3M 2048 2048 3
 24.0M  442.3M 4096 1024 3
 36.0M  656.2M 1024 2048 9
 36.0M  656.2M 2048 1024 9
 36.0M  656.2M 4096  512 9
 40.0M  727.0M 2048 2048 5
 40.0M  727.0M 4096 1024 5
 48.0M  868.1M 4096 2048 3
 72.0M 1287.5M 2048 2048 9
 72.0M 1287.5M 4096 1024 9
 80.0M 1426.4M 4096 2048 5
 144.0M 2525.2M 4096 2048 9
V6.0:
Code:
C:\msys64\home\ken\gpuowl-compile\v6.0-b7bb1c3>openowl -list fft
2019-02-04 23:05:21 gpuowl 6.0-b7bb1c3
2019-02-04 23:05:21 -list fft
2019-02-04 23:05:21 FFT    8K [  0.01M -    0.18M]  64-64
2019-02-04 23:05:21 FFT   24K [  0.04M -    0.51M]  64-64-3
2019-02-04 23:05:21 FFT   32K [  0.05M -    0.68M]  64-256 256-64
2019-02-04 23:05:21 FFT   40K [  0.06M -    0.85M]  64-64-5
2019-02-04 23:05:21 FFT   64K [  0.10M -    1.34M]  64-512 512-64
2019-02-04 23:05:21 FFT   72K [  0.11M -    1.50M]  64-64-9
2019-02-04 23:05:21 FFT   96K [  0.15M -    1.99M]  64-256-3 256-64-3
2019-02-04 23:05:21 FFT  128K [  0.20M -    2.63M]  1K-64 64-1K 256-256
2019-02-04 23:05:21 FFT  160K [  0.25M -    3.27M]  64-256-5 256-64-5
2019-02-04 23:05:21 FFT  192K [  0.29M -    3.91M]  64-512-3 512-64-3
2019-02-04 23:05:21 FFT  256K [  0.39M -    5.18M]  64-2K 256-512 512-256 2K-64
2019-02-04 23:05:21 FFT  288K [  0.44M -    5.81M]  64-256-9 256-64-9
2019-02-04 23:05:21 FFT  320K [  0.49M -    6.44M]  64-512-5 512-64-5
2019-02-04 23:05:21 FFT  384K [  0.59M -    7.69M]  1K-64-3 64-1K-3 256-256-3
2019-02-04 23:05:21 FFT  512K [  0.79M -   10.18M]  1K-256 256-1K 512-512 4K-64
2019-02-04 23:05:21 FFT  576K [  0.88M -   11.42M]  64-512-9 512-64-9
2019-02-04 23:05:21 FFT  640K [  0.98M -   12.66M]  1K-64-5 64-1K-5 256-256-5
2019-02-04 23:05:21 FFT  768K [  1.18M -   15.12M]  64-2K-3 256-512-3 512-256-3 2K-64-3
2019-02-04 23:05:21 FFT    1M [  1.57M -   20.02M]  1K-512 256-2K 512-1K 2K-256
2019-02-04 23:05:21 FFT 1152K [  1.77M -   22.45M]  1K-64-9 64-1K-9 256-256-9
2019-02-04 23:05:21 FFT 1280K [  1.97M -   24.88M]  64-2K-5 256-512-5 512-256-5 2K-64-5
2019-02-04 23:05:21 FFT 1536K [  2.36M -   29.72M]  1K-256-3 256-1K-3 512-512-3 4K-64-3
2019-02-04 23:05:21 FFT    2M [  3.15M -   39.34M]  1K-1K 512-2K 2K-512 4K-256
2019-02-04 23:05:21 FFT 2304K [  3.54M -   44.13M]  64-2K-9 256-512-9 512-256-9 2K-64-9
2019-02-04 23:05:21 FFT 2560K [  3.93M -   48.90M]  1K-256-5 256-1K-5 512-512-5 4K-64-5
2019-02-04 23:05:21 FFT    3M [  4.72M -   58.41M]  1K-512-3 256-2K-3 512-1K-3 2K-256-3
2019-02-04 23:05:21 FFT    4M [  6.29M -   77.30M]  1K-2K 2K-1K 4K-512
2019-02-04 23:05:21 FFT 4608K [  7.08M -   86.70M]  1K-256-9 256-1K-9 512-512-9 4K-64-9
2019-02-04 23:05:21 FFT    5M [  7.86M -   96.07M]  1K-512-5 256-2K-5 512-1K-5 2K-256-5
2019-02-04 23:05:21 FFT    6M [  9.44M -  114.74M]  1K-1K-3 512-2K-3 2K-512-3 4K-256-3
2019-02-04 23:05:21 FFT    8M [ 12.58M -  151.83M]  2K-2K 4K-1K
2019-02-04 23:05:21 FFT    9M [ 14.16M -  170.28M]  1K-512-9 256-2K-9 512-1K-9 2K-256-9
2019-02-04 23:05:21 FFT   10M [ 15.73M -  188.68M]  1K-1K-5 512-2K-5 2K-512-5 4K-256-5
2019-02-04 23:05:21 FFT   12M [ 18.87M -  225.32M]  1K-2K-3 2K-1K-3 4K-512-3
2019-02-04 23:05:21 FFT   16M [ 25.17M -  298.13M]  4K-2K
2019-02-04 23:05:21 FFT   18M [ 28.31M -  334.34M]  1K-1K-9 512-2K-9 2K-512-9 4K-256-9
2019-02-04 23:05:21 FFT   20M [ 31.46M -  370.44M]  1K-2K-5 2K-1K-5 4K-512-5
2019-02-04 23:05:21 FFT   24M [ 37.75M -  442.34M]  2K-2K-3 4K-1K-3
2019-02-04 23:05:21 FFT   36M [ 56.62M -  656.22M]  1K-2K-9 2K-1K-9 4K-512-9
2019-02-04 23:05:21 FFT   40M [ 62.91M -  727.03M]  2K-2K-5 4K-1K-5
2019-02-04 23:05:21 FFT   48M [ 75.50M -  868.07M]  4K-2K-3
2019-02-04 23:05:21 FFT   72M [113.25M - 1287.53M]  2K-2K-9 4K-1K-9
2019-02-04 23:05:21 FFT   80M [125.83M - 1426.38M]  4K-2K-5
2019-02-04 23:05:21 FFT  144M [226.49M - 2525.23M]  4K-2K-9
V6.2-4a213af:
Code:
FFT Configurations:
FFT    8K [  0.01M -    0.18M]  64-64
FFT   32K [  0.05M -    0.68M]  64-256 256-64
FFT   48K [  0.07M -    1.01M]  64-64-6
FFT   64K [  0.10M -    1.34M]  64-512 512-64
FFT   72K [  0.11M -    1.50M]  64-64-9
FFT   80K [  0.12M -    1.66M]  64-64-10
FFT  128K [  0.20M -    2.63M]  1K-64 64-1K 256-256
FFT  192K [  0.29M -    3.91M]  64-256-6 256-64-6
FFT  256K [  0.39M -    5.18M]  64-2K 256-512 512-256 2K-64
FFT  288K [  0.44M -    5.81M]  64-256-9 256-64-9
FFT  320K [  0.49M -    6.44M]  64-256-10 256-64-10
FFT  384K [  0.59M -    7.69M]  64-512-6 512-64-6
FFT  512K [  0.79M -   10.18M]  1K-256 256-1K 512-512 4K-64
FFT  576K [  0.88M -   11.42M]  64-512-9 512-64-9
FFT  640K [  0.98M -   12.66M]  64-512-10 512-64-10
FFT  768K [  1.18M -   15.12M]  1K-64-6 64-1K-6 256-256-6
FFT    1M [  1.57M -   20.02M]  1K-512 256-2K 512-1K 2K-256
FFT 1152K [  1.77M -   22.45M]  1K-64-9 64-1K-9 256-256-9
FFT 1280K [  1.97M -   24.88M]  1K-64-10 64-1K-10 256-256-10
FFT 1536K [  2.36M -   29.72M]  64-2K-6 256-512-6 512-256-6 2K-64-6
FFT    2M [  3.15M -   39.34M]  1K-1K 512-2K 2K-512 4K-256
FFT 2304K [  3.54M -   44.13M]  64-2K-9 256-512-9 512-256-9 2K-64-9
FFT 2560K [  3.93M -   48.90M]  64-2K-10 256-512-10 512-256-10 2K-64-10
FFT    3M [  4.72M -   58.41M]  1K-256-6 256-1K-6 512-512-6 4K-64-6
FFT    4M [  6.29M -   77.30M]  1K-2K 2K-1K 4K-512
FFT 4608K [  7.08M -   86.70M]  1K-256-9 256-1K-9 512-512-9 4K-64-9
FFT    5M [  7.86M -   96.07M]  1K-256-10 256-1K-10 512-512-10 4K-64-10
FFT    6M [  9.44M -  114.74M]  1K-512-6 256-2K-6 512-1K-6 2K-256-6
FFT    8M [ 12.58M -  151.83M]  2K-2K 4K-1K
FFT    9M [ 14.16M -  170.28M]  1K-512-9 256-2K-9 512-1K-9 2K-256-9
FFT   10M [ 15.73M -  188.68M]  1K-512-10 256-2K-10 512-1K-10 2K-256-10
FFT   12M [ 18.87M -  225.32M]  1K-1K-6 512-2K-6 2K-512-6 4K-256-6
FFT   16M [ 25.17M -  298.13M]  4K-2K
FFT   18M [ 28.31M -  334.34M]  1K-1K-9 512-2K-9 2K-512-9 4K-256-9
FFT   20M [ 31.46M -  370.44M]  1K-1K-10 512-2K-10 2K-512-10 4K-256-10
FFT   24M [ 37.75M -  442.34M]  1K-2K-6 2K-1K-6 4K-512-6
FFT   36M [ 56.62M -  656.22M]  1K-2K-9 2K-1K-9 4K-512-9
FFT   40M [ 62.91M -  727.03M]  1K-2K-10 2K-1K-10 4K-512-10
FFT   48M [ 75.50M -  868.07M]  2K-2K-6 4K-1K-6
FFT   72M [113.25M - 1287.53M]  2K-2K-9 4K-1K-9
FFT   80M [125.83M - 1426.38M]  2K-2K-10 4K-1K-10
FFT   96M [150.99M - 1702.92M]  4K-2K-6
FFT  144M [226.49M - 2525.23M]  4K-2K-9
FFT  160M [251.66M - 2797.39M]  4K-2K-10
As far as I know, up to and including some commit of V6.5 the fft list is the same as for V6.2.
For v6.5-c48d46f (but note, don't use combinations with height 64 and a middle step, per https://www.mersenneforum.org/showpost.php?p=517774&postcount=1204; assuming the fft list is again W-H-Middle, those are the entries below with 64 as the second field plus a third field, such as 64-64-6 and 256-64-6):
Code:
FFT Configurations: 
FFT    8K [  0.01M -    0.18M]  64-64 
FFT   32K [  0.05M -    0.68M]  64-256 256-64 
FFT   48K [  0.07M -    1.01M]  64-64-6
FFT   64K [  0.10M -    1.34M]  64-512 512-64 
FFT   72K [  0.11M -    1.50M]  64-64-9
FFT   80K [  0.12M -    1.66M]  64-64-10 
FFT  128K [  0.20M -    2.63M]  1K-64 64-1K 256-256 
FFT  192K [  0.29M -    3.91M]  64-256-6 256-64-6 
FFT  256K [  0.39M -    5.18M]  64-2K 256-512 512-256 2K-64 
FFT  288K [  0.44M -    5.81M]  64-256-9 256-64-9 
FFT  320K [  0.49M -    6.44M]  64-256-10 256-64-10 
FFT  384K [  0.59M -    7.69M]  64-512-6 512-64-6 
FFT  512K [  0.79M -   10.18M]  1K-256 256-1K 512-512 4K-64 
FFT  576K [  0.88M -   11.42M]  64-512-9 512-64-9 
FFT  640K [  0.98M -   12.66M]  64-512-10 512-64-10 
FFT  768K [  1.18M -   15.12M]  1K-64-6 64-1K-6 256-256-6 
FFT    1M [  1.57M -   20.02M]  1K-512 256-2K 512-1K 2K-256 
FFT 1152K [  1.77M -   22.45M]  1K-64-9 64-1K-9 256-256-9 
FFT 1280K [  1.97M -   24.88M]  1K-64-10 64-1K-10 256-256-10 
FFT 1536K [  2.36M -   29.72M]  64-2K-6 256-512-6 512-256-6 2K-64-6 
FFT    2M [  3.15M -   39.34M]  1K-1K 512-2K 2K-512 4K-256 
FFT 2304K [  3.54M -   44.13M]  64-2K-9 256-512-9 512-256-9 2K-64-9 
FFT 2560K [  3.93M -   48.90M]  64-2K-10 256-512-10 512-256-10 2K-64-10 
FFT    3M [  4.72M -   58.41M]  1K-256-6 256-1K-6 512-512-6 4K-64-6 
FFT    4M [  6.29M -   77.30M]  1K-2K 2K-1K 4K-512 
FFT 4608K [  7.08M -   86.70M]  1K-256-9 256-1K-9 512-512-9 4K-64-9 
FFT    5M [  7.86M -   96.07M]  1K-256-10 256-1K-10 512-512-10 4K-64-10 
FFT    6M [  9.44M -  114.74M]  1K-512-6 256-2K-6 512-1K-6 2K-256-6 
FFT    8M [ 12.58M -  151.83M]  2K-2K 4K-1K 
FFT    9M [ 14.16M -  170.28M]  1K-512-9 256-2K-9 512-1K-9 2K-256-9 
FFT   10M [ 15.73M -  188.68M]  1K-512-10 256-2K-10 512-1K-10 2K-256-10 
FFT   12M [ 18.87M -  225.32M]  1K-1K-6 512-2K-6 2K-512-6 4K-256-6 
FFT   16M [ 25.17M -  298.13M]  4K-2K 
FFT   18M [ 28.31M -  334.34M]  1K-1K-9 512-2K-9 2K-512-9 4K-256-9 
FFT   20M [ 31.46M -  370.44M]  1K-1K-10 512-2K-10 2K-512-10 4K-256-10 
FFT   24M [ 37.75M -  442.34M]  1K-2K-6 2K-1K-6 4K-512-6 
FFT   36M [ 56.62M -  656.22M]  1K-2K-9 2K-1K-9 4K-512-9 
FFT   40M [ 62.91M -  727.03M]  1K-2K-10 2K-1K-10 4K-512-10 
FFT   48M [ 75.50M -  868.07M]  2K-2K-6 4K-1K-6 
FFT   72M [113.25M - 1287.53M]  2K-2K-9 4K-1K-9 
FFT   80M [125.83M - 1426.38M]  2K-2K-10 4K-1K-10 
FFT   96M [150.99M - 1702.92M]  4K-2K-6 
FFT  144M [226.49M - 2525.23M]  4K-2K-9 
FFT  160M [251.66M - 2797.39M]  4K-2K-10
At some point between v6.5-c48d46f (2019-05-07) and v6.5-76-g1ca08e2 (2019-05-27), fft sizes up to 192M were added, allowing approximately gigadigit tests. V6.5-84-g30c0508 is the last build I have indicating FFT 192M [301.99M - 3339.40M] 4K-2K-12 (upper limit 1.005 gigadigits). I think that, due to experience with error rates, the limits specific to fft lengths were then revised downward slightly.
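The gigadigit figure can be checked with a short Python sketch: the number of decimal digits of 2^p - 1 is floor(p x log10(2)) + 1. The function name is hypothetical, and the exponents are the rounded upper limits shown in the lists, so the results are approximate.

```python
import math


def decimal_digits(p):
    """Number of decimal digits of the Mersenne number 2**p - 1, for p >= 1."""
    return math.floor(p * math.log10(2)) + 1


v65_limit = 3_339_400_000   # 192M upper limit shown for V6.5-84 (3339.40M)
v67_limit = 3_315_860_000   # 192M upper limit shown for V6.7-4 (3315.86M)

print(decimal_digits(v65_limit) / 1e9)   # a little over 1 gigadigit
print(decimal_digits(v67_limit) / 1e9)   # a little under 1 gigadigit
```

By this estimate the revised 192M limit in the V6.7-4 list falls slightly short of a full gigadigit.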

Following are for V6.7-4.
Code:
FFT Configurations:
FFT    8K [  0.01M -    0.17M]  64-64
FFT   32K [  0.05M -    0.68M]  64-256 256-64
FFT   64K [  0.10M -    1.33M]  64-512 512-64
FFT  128K [  0.20M -    2.62M]  1K-64 64-1K 256-256
FFT  192K [  0.29M -    3.89M]  64-256-6
FFT  224K [  0.34M -    4.52M]  64-256-7
FFT  256K [  0.39M -    5.15M]  64-2K 256-512 512-256 2K-64
FFT  288K [  0.44M -    5.77M]  64-256-9
FFT  320K [  0.49M -    6.40M]  64-256-10
FFT  352K [  0.54M -    7.02M]  64-256-11
FFT  384K [  0.59M -    7.64M]  64-256-12 64-512-6
FFT  448K [  0.69M -    8.88M]  64-512-7
FFT  512K [  0.79M -   10.12M]  1K-256 256-1K 512-512 4K-64
FFT  576K [  0.88M -   11.35M]  64-512-9
FFT  640K [  0.98M -   12.58M]  64-512-10
FFT  704K [  1.08M -   13.81M]  64-512-11
FFT  768K [  1.18M -   15.03M]  64-512-12 64-1K-6 256-256-6
FFT  896K [  1.38M -   17.47M]  64-1K-7 256-256-7
FFT    1M [  1.57M -   19.89M]  1K-512 256-2K 512-1K 2K-256
FFT 1152K [  1.77M -   22.32M]  64-1K-9 256-256-9
FFT 1280K [  1.97M -   24.73M]  64-1K-10 256-256-10
FFT 1408K [  2.16M -   27.14M]  64-1K-11 256-256-11
FFT 1536K [  2.36M -   29.54M]  64-1K-12 64-2K-6 256-256-12 256-512-6 512-256-6
FFT 1792K [  2.75M -   34.33M]  64-2K-7 256-512-7 512-256-7
FFT    2M [  3.15M -   39.10M]  1K-1K 512-2K 2K-512 4K-256
FFT 2304K [  3.54M -   43.85M]  64-2K-9 256-512-9 512-256-9
FFT 2560K [  3.93M -   48.59M]  64-2K-10 256-512-10 512-256-10
FFT 2816K [  4.33M -   53.32M]  64-2K-11 256-512-11 512-256-11
FFT    3M [  4.72M -   58.04M]  1K-256-6 64-2K-12 256-512-12 256-1K-6 512-256-12 512-512-6
FFT 3584K [  5.51M -   67.44M]  1K-256-7 256-1K-7 512-512-7
FFT    4M [  6.29M -   76.81M]  1K-2K 2K-1K 4K-512
FFT 4608K [  7.08M -   86.15M]  1K-256-9 256-1K-9 512-512-9
FFT    5M [  7.86M -   95.46M]  1K-256-10 256-1K-10 512-512-10
FFT 5632K [  8.65M -  104.74M]  1K-256-11 256-1K-11 512-512-11
FFT    6M [  9.44M -  114.00M]  1K-256-12 1K-512-6 256-1K-12 256-2K-6 512-512-12 512-1K-6 2K-256-6
FFT    7M [ 11.01M -  132.46M]  1K-512-7 256-2K-7 512-1K-7 2K-256-7
FFT    8M [ 12.58M -  150.85M]  2K-2K 4K-1K
FFT    9M [ 14.16M -  169.18M]  1K-512-9 256-2K-9 512-1K-9 2K-256-9
FFT   10M [ 15.73M -  187.45M]  1K-512-10 256-2K-10 512-1K-10 2K-256-10
FFT   11M [ 17.30M -  205.67M]  1K-512-11 256-2K-11 512-1K-11 2K-256-11
FFT   12M [ 18.87M -  223.85M]  1K-512-12 1K-1K-6 256-2K-12 512-1K-12 512-2K-6 2K-256-12 2K-512-6 4K-256-6
FFT   14M [ 22.02M -  260.08M]  1K-1K-7 512-2K-7 2K-512-7 4K-256-7
FFT   16M [ 25.17M -  296.17M]  4K-2K
FFT   18M [ 28.31M -  332.13M]  1K-1K-9 512-2K-9 2K-512-9 4K-256-9
FFT   20M [ 31.46M -  367.98M]  1K-1K-10 512-2K-10 2K-512-10 4K-256-10
FFT   22M [ 34.60M -  403.74M]  1K-1K-11 512-2K-11 2K-512-11 4K-256-11
FFT   24M [ 37.75M -  439.40M]  1K-1K-12 1K-2K-6 512-2K-12 2K-512-12 2K-1K-6 4K-256-12 4K-512-6
FFT   28M [ 44.04M -  510.47M]  1K-2K-7 2K-1K-7 4K-512-7
FFT   36M [ 56.62M -  651.81M]  1K-2K-9 2K-1K-9 4K-512-9
FFT   40M [ 62.91M -  722.13M]  1K-2K-10 2K-1K-10 4K-512-10
FFT   44M [ 69.21M -  792.25M]  1K-2K-11 2K-1K-11 4K-512-11
FFT   48M [ 75.50M -  862.18M]  1K-2K-12 2K-1K-12 2K-2K-6 4K-512-12 4K-1K-6
FFT   56M [ 88.08M - 1001.57M]  2K-2K-7 4K-1K-7
FFT   72M [113.25M - 1278.70M]  2K-2K-9 4K-1K-9
FFT   80M [125.83M - 1416.57M]  2K-2K-10 4K-1K-10
FFT   88M [138.41M - 1554.04M]  2K-2K-11 4K-1K-11
FFT   96M [150.99M - 1691.15M]  2K-2K-12 4K-1K-12 4K-2K-6
FFT  112M [176.16M - 1964.39M]  4K-2K-7
FFT  144M [226.49M - 2507.57M]  4K-2K-9
FFT  160M [251.66M - 2777.78M]  4K-2K-10
FFT  176M [276.82M - 3047.18M]  4K-2K-11
FFT  192M [301.99M - 3315.86M]  4K-2K-12
As of V6.11-219, the list was:
Code:
FFT Configurations:
FFT  128K [  0.20M -    2.62M]  256-256
FFT  256K [  0.39M -    5.15M]  256-512 512-256
FFT  512K [  0.79M -   10.12M]  1K-256 256-256-4 256-1K 512-512
FFT  768K [  1.18M -   15.03M]  256-256-6
FFT  896K [  1.38M -   17.47M]  256-256-7
FFT    1M [  1.57M -   19.89M]  1K-512 256-256-8 256-512-4 256-2K 512-256-4 512-1K 2K-256
FFT 1152K [  1.77M -   22.32M]  256-256-9
FFT 1280K [  1.97M -   24.73M]  256-256-10
FFT 1408K [  2.16M -   27.14M]  256-256-11
FFT 1536K [  2.36M -   29.54M]  256-256-12 256-512-6 512-256-6
FFT 1792K [  2.75M -   34.33M]  256-512-7 512-256-7
FFT    2M [  3.15M -   39.10M]  1K-256-4 1K-1K 256-512-8 256-1K-4 512-256-8 512-512-4 512-2K 2K-512 4K-256
FFT 2304K [  3.54M -   43.85M]  256-512-9 512-256-9
FFT 2560K [  3.93M -   48.59M]  256-512-10 512-256-10
FFT 2816K [  4.33M -   53.32M]  256-512-11 512-256-11
FFT    3M [  4.72M -   58.04M]  1K-256-6 256-512-12 256-1K-6 512-256-12 512-512-6
FFT 3584K [  5.51M -   67.44M]  1K-256-7 256-1K-7 512-512-7
FFT    4M [  6.29M -   76.81M]  1K-256-8 1K-512-4 1K-2K 256-1K-8 256-2K-4 512-512-8 512-1K-4 2K-256-4 2K-1K 4K-512
FFT 4608K [  7.08M -   86.15M]  1K-256-9 256-1K-9 512-512-9
FFT    5M [  7.86M -   95.46M]  1K-256-10 256-1K-10 512-512-10
FFT 5632K [  8.65M -  104.74M]  1K-256-11 256-1K-11 512-512-11
FFT    6M [  9.44M -  114.00M]  1K-256-12 1K-512-6 256-1K-12 256-2K-6 512-512-12 512-1K-6 2K-256-6
FFT    7M [ 11.01M -  132.46M]  1K-512-7 256-2K-7 512-1K-7 2K-256-7
FFT    8M [ 12.58M -  150.85M]  1K-512-8 1K-1K-4 256-2K-8 512-1K-8 512-2K-4 2K-256-8 2K-512-4 2K-2K 4K-256-4 4K-1K
FFT    9M [ 14.16M -  169.18M]  1K-512-9 256-2K-9 512-1K-9 2K-256-9
FFT   10M [ 15.73M -  187.45M]  1K-512-10 256-2K-10 512-1K-10 2K-256-10
FFT   11M [ 17.30M -  205.67M]  1K-512-11 256-2K-11 512-1K-11 2K-256-11
FFT   12M [ 18.87M -  223.85M]  1K-512-12 1K-1K-6 256-2K-12 512-1K-12 512-2K-6 2K-256-12 2K-512-6 4K-256-6
FFT   14M [ 22.02M -  260.08M]  1K-1K-7 512-2K-7 2K-512-7 4K-256-7
FFT   16M [ 25.17M -  296.17M]  1K-1K-8 1K-2K-4 512-2K-8 2K-512-8 2K-1K-4 4K-256-8 4K-512-4 4K-2K
FFT   18M [ 28.31M -  332.13M]  1K-1K-9 512-2K-9 2K-512-9 4K-256-9
FFT   20M [ 31.46M -  367.98M]  1K-1K-10 512-2K-10 2K-512-10 4K-256-10
FFT   22M [ 34.60M -  403.74M]  1K-1K-11 512-2K-11 2K-512-11 4K-256-11
FFT   24M [ 37.75M -  439.40M]  1K-1K-12 1K-2K-6 512-2K-12 2K-512-12 2K-1K-6 4K-256-12 4K-512-6
FFT   28M [ 44.04M -  510.47M]  1K-2K-7 2K-1K-7 4K-512-7
FFT   32M [ 50.33M -  581.27M]  1K-2K-8 2K-1K-8 2K-2K-4 4K-512-8 4K-1K-4
FFT   36M [ 56.62M -  651.81M]  1K-2K-9 2K-1K-9 4K-512-9
FFT   40M [ 62.91M -  722.13M]  1K-2K-10 2K-1K-10 4K-512-10
FFT   44M [ 69.21M -  792.25M]  1K-2K-11 2K-1K-11 4K-512-11
FFT   48M [ 75.50M -  862.18M]  1K-2K-12 2K-1K-12 2K-2K-6 4K-512-12 4K-1K-6
FFT   56M [ 88.08M - 1001.57M]  2K-2K-7 4K-1K-7
FFT   64M [100.66M - 1140.39M]  2K-2K-8 4K-1K-8 4K-2K-4
FFT   72M [113.25M - 1278.70M]  2K-2K-9 4K-1K-9
FFT   80M [125.83M - 1416.57M]  2K-2K-10 4K-1K-10
FFT   88M [138.41M - 1554.04M]  2K-2K-11 4K-1K-11
FFT   96M [150.99M - 1691.15M]  2K-2K-12 4K-1K-12 4K-2K-6
FFT  112M [176.16M - 1964.39M]  4K-2K-7
FFT  128M [201.33M - 2236.48M]  4K-2K-8
FFT  144M [226.49M - 2507.57M]  4K-2K-9
FFT  160M [251.66M - 2777.78M]  4K-2K-10
FFT  176M [276.82M - 3047.18M]  4K-2K-11
FFT  192M [301.99M - 3315.86M]  4K-2K-12
Subsequently the maximum fft was pared back by v6.11-255-g81fa7c3 to 96M. By v6.11-318-g3109989 the max fft became 120M, allowing a maximum exponent of ~2^31. As of v6.11-330-ge5a8f2c, the fft lengths are:
Code:
FFT Configurations (specify with -fft <width>:<middle>:<height> from the set below):
FFT  128K [  0.20M -    2.63M]  256:1:256
FFT  256K [  0.39M -    5.18M]  256:1:512 512:1:256
FFT  384K [  0.59M -    7.72M]  256:3:256
FFT  512K [  0.79M -   10.25M]  256:4:256
FFT  640K [  0.98M -   12.72M]  256:5:256
FFT  768K [  1.18M -   15.22M]  256:6:256 256:3:512 512:3:256
FFT  896K [  1.38M -   17.68M]  256:7:256
FFT    1M [  1.57M -   20.20M]  256:8:256 256:4:512 512:4:256
FFT 1152K [  1.77M -   22.62M]  256:9:256
FFT 1.25M [  1.97M -   25.07M]  256:10:256 256:5:512 512:5:256
FFT 1408K [  2.16M -   27.52M]  256:11:256
FFT 1.50M [  2.36M -   30.00M]  1K:3:256 256:12:256 256:6:512 256:3:1K 512:6:256 512:3:512
FFT 1664K [  2.56M -   32.44M]  256:13:256
FFT 1.75M [  2.75M -   34.85M]  256:14:256 256:7:512 512:7:256
FFT 1920K [  2.95M -   37.23M]  256:15:256
FFT    2M [  3.15M -   39.82M]  1K:4:256 256:8:512 256:4:1K 512:8:256 512:4:512
FFT 2.25M [  3.54M -   44.57M]  256:9:512 512:9:256
FFT 2.50M [  3.93M -   49.41M]  1K:5:256 256:10:512 256:5:1K 512:10:256 512:5:512
FFT 2.75M [  4.33M -   54.24M]  256:11:512 512:11:256
FFT    3M [  4.72M -   59.13M]  1K:6:256 1K:3:512 256:12:512 256:6:1K 512:12:256 512:6:512 512:3:1K
FFT 3.25M [  5.11M -   63.93M]  256:13:512 512:13:256
FFT 3.50M [  5.51M -   68.67M]  1K:7:256 256:14:512 256:7:1K 512:14:256 512:7:512
FFT 3.75M [  5.90M -   73.37M]  256:15:512 512:15:256
FFT    4M [  6.29M -   78.46M]  1K:8:256 1K:4:512 256:8:1K 512:8:512 512:4:1K
FFT 4.50M [  7.08M -   87.83M]  1K:9:256 256:9:1K 512:9:512
FFT    5M [  7.86M -   97.36M]  1K:10:256 1K:5:512 256:10:1K 512:10:512 512:5:1K
FFT 5.50M [  8.65M -  106.88M]  1K:11:256 256:11:1K 512:11:512
FFT    6M [  9.44M -  116.51M]  1K:12:256 1K:6:512 1K:3:1K 256:12:1K 512:12:512 512:6:1K 4K:3:256
FFT 6.50M [ 10.22M -  125.95M]  1K:13:256 256:13:1K 512:13:512
FFT    7M [ 11.01M -  135.29M]  1K:14:256 1K:7:512 256:14:1K 512:14:512 512:7:1K
FFT 7.50M [ 11.80M -  144.55M]  1K:15:256 256:15:1K 512:15:512
FFT    8M [ 12.58M -  154.59M]  1K:8:512 1K:4:1K 512:8:1K 4K:4:256
FFT    9M [ 14.16M -  173.03M]  1K:9:512 512:9:1K
FFT   10M [ 15.73M -  191.79M]  1K:10:512 1K:5:1K 512:10:1K 4K:5:256
FFT   11M [ 17.30M -  210.53M]  1K:11:512 512:11:1K
FFT   12M [ 18.87M -  229.51M]  1K:12:512 1K:6:1K 512:12:1K 4K:6:256 4K:3:512
FFT   13M [ 20.45M -  248.10M]  1K:13:512 512:13:1K
FFT   14M [ 22.02M -  266.49M]  1K:14:512 1K:7:1K 512:14:1K 4K:7:256
FFT   15M [ 23.59M -  284.71M]  1K:15:512 512:15:1K
FFT   16M [ 25.17M -  304.49M]  1K:8:1K 4K:8:256 4K:4:512
FFT   18M [ 28.31M -  340.79M]  1K:9:1K 4K:9:256
FFT   20M [ 31.46M -  377.72M]  1K:10:1K 4K:10:256 4K:5:512
FFT   22M [ 34.60M -  414.63M]  1K:11:1K 4K:11:256
FFT   24M [ 37.75M -  451.99M]  1K:12:1K 4K:12:256 4K:6:512 4K:3:1K
FFT   26M [ 40.89M -  488.59M]  1K:13:1K 4K:13:256
FFT   28M [ 44.04M -  524.79M]  1K:14:1K 4K:14:256 4K:7:512
FFT   30M [ 47.19M -  560.64M]  1K:15:1K 4K:15:256
FFT   32M [ 50.33M -  599.62M]  4K:8:512 4K:4:1K
FFT   36M [ 56.62M -  671.04M]  4K:9:512
FFT   40M [ 62.91M -  743.74M]  4K:10:512 4K:5:1K
FFT   44M [ 69.21M -  816.39M]  4K:11:512
FFT   48M [ 75.50M -  889.11M]  4K:12:512 4K:6:1K
FFT   52M [ 81.79M -  961.97M]  4K:13:512
FFT   56M [ 88.08M - 1033.20M]  4K:14:512 4K:7:1K
FFT   60M [ 94.37M - 1103.74M]  4K:15:512
FFT   64M [100.66M - 1177.31M]  4K:8:1K
FFT   72M [113.25M - 1321.02M]  4K:9:1K
FFT   80M [125.83M - 1464.31M]  4K:10:1K
FFT   88M [138.41M - 1607.03M]  4K:11:1K
FFT   96M [150.99M - 1751.79M]  4K:12:1K
FFT  104M [163.58M - 1893.52M]  4K:13:1K
FFT  112M [176.16M - 2035.14M]  4K:14:1K
FFT  120M [188.74M - 2172.36M]  4K:15:1K
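Reading the v6.11-330 table, every listed configuration appears to satisfy fft length = 2 x width x middle x height (for example 256:1:256 gives 128K and 4K:15:1K gives 120M). A minimal Python sketch of that arithmetic follows. The function names are made up for illustration and are not part of gpuowl, and the factor of 2 is inferred from the table, not taken from the gpuowl source.

```python
def parse_dim(token):
    """Convert a table token such as '256', '1K' or '4K' to an integer."""
    token = token.strip().upper()
    if token.endswith("K"):
        return int(token[:-1]) * 1024
    return int(token)


def fft_length(spec):
    """FFT length in words for a '<width>:<middle>:<height>' spec.

    Assumption inferred from the table: length = 2 * width * middle * height.
    """
    width, middle, height = (parse_dim(t) for t in spec.split(":"))
    return 2 * width * middle * height


def format_length(words):
    """Format a length in whole M or whole K words (binary units)."""
    if words % (1024 * 1024) == 0:
        return f"{words // (1024 * 1024)}M"
    return f"{words // 1024}K"


for spec in ("256:1:256", "1K:3:256", "4K:15:1K"):
    print(spec, format_length(fft_length(spec)))
```

Note that this sketch prints 1.5M as 1536K, whereas the v6.11-330 listing writes it as 1.50M; only the formatting differs.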
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-03-31 at 00:54 Reason: minor edits, added v4.3 fft choices
Old 2018-12-14, 19:46   #10
kriesel
 
Default PRP3 run time scaling in V5.0-9c13870 (no P-1)

Gpuowl PRP3 has been run on all known Mersenne prime exponents feasible on its currently available fft lengths, mostly in ascending order. This provides run time scaling, a reliability check on the hardware, and a check for any occurrence of false negatives or error detections, all from the same run set. The test is being run on an RX480 under Windows 7 x64, along with a running instance of prime95 and mfakto running on an RX550 in the same system.

For the exponents below 216091, the minimum available fft length, 128K, is too large, giving bits/word below 1.5, and in most cases immediate fatal errors. p=132049 runs briefly, at 1.01 bits/word, but detects Gerbicz check errors repeatably in the initial 800 iteration block and exits after 3 rounds of that.
For exponents 216091 to 1398269, the run time is very nearly linear in the exponent, since they are all run at fft length 128K; the fit is p^0.99.
For exponents above 1398269, since the fft length is chosen approximately proportional to the exponent, it seems reasonable to expect the scaling to approximate a power law with exponent slightly above 2, since fft multiplication time is, per Knuth and other sources, proportional to n ln n ln ln n. A full PRP3 test takes n-1 iterations, so the total is approximately n^2 ln n ln ln n for large n.
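The bits/word figures quoted above for the 128K fft length are simply the exponent divided by the fft length in words. A small Python sketch of that check, assuming the 1.5 bits/word lower limit mentioned in the text (the names here are illustrative only, not gpuowl identifiers):

```python
FFT_128K = 128 * 1024      # smallest fft length available here, in words
MIN_BITS_PER_WORD = 1.5    # lower limit quoted in the text


def bits_per_word(exponent, fft_words=FFT_128K):
    """Average number of bits of the exponent carried per fft word."""
    return exponent / fft_words


# A few known Mersenne prime exponents around the 128K limits.
for p in (110503, 132049, 216091, 1398269):
    bpw = bits_per_word(p)
    status = "ok" if bpw >= MIN_BITS_PER_WORD else "too few bits/word"
    print(f"p={p:>8}  bits/word={bpw:5.2f}  {status}")
```

For p=132049 this gives 1.01 bits/word, matching the figure above, and p=216091 is the first Mersenne prime exponent to clear 1.5.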

In the attachment for CUDALucas run time scaling at https://www.mersenneforum.org/showpo...23&postcount=2 the run time scales as p^1.85 for 10^6<p<10^7, and as p^2.095 for 10^7<p<10^8.
Run time scaling for prime95 for 86243<=p<=2976221 was p^2.094: https://www.mersenneforum.org/showpo...78&postcount=2
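As a rough cross-check on the figures above, the n^2 ln n ln ln n model can be turned into a local power-law exponent, i.e. the slope on a log-log plot: d(ln T)/d(ln n) = 2 + 1/ln n + 1/(ln n ln ln n). The sketch below just evaluates that expression. It assumes the idealized model holds exactly, which real software only approximates.

```python
import math


def local_exponent(n):
    """Slope d(ln T)/d(ln n) for T(n) = n^2 * ln(n) * ln(ln(n))."""
    ln_n = math.log(n)
    return 2.0 + 1.0 / ln_n + 1.0 / (ln_n * math.log(ln_n))


for n in (10**6, 10**7, 10**8, 10**9):
    print(f"n = {n:.0e}: local exponent = {local_exponent(n):.3f}")
```

For n between 10^7 and 10^8 the model's local exponent is roughly 2.07 to 2.09, in the same neighborhood as the p^2.095 and p^2.094 fits quoted above.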

The scaling for gpuowl appears to be lower than expected and lower than seen for other applications. For 1398269<p<10^7, run time scales as p^1.518; for 10^7<p<10^8 it is p^1.72 to p^1.88, which implies a per-iteration (fft multiplication) time that grows less than linearly with p, similar to a lower exponent range in CUDALucas. Perhaps gpuowl does not reach asymptotic scaling until higher exponents. From 100M exponent to 100Mdigit, the gpuowl scaling was p^2.04, consistent with that. Low n runs appear to be affected by setup overhead in CUDALucas and clLucas also, reducing the power seen in scaling fits. For gpuOwL, the OpenCL compilation each time contributes 2 to 3 seconds of overhead. Frequent console or log output may also be contributing.
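To illustrate how a fixed startup cost can pull down a fitted power, here is a minimal least-squares fit on log-log axes in Python. The timings are synthetic placeholders generated inside the script (a pure p^2 law with a made-up scale factor, plus a hypothetical 2.5 second overhead). They are not measurements from gpuowl or any other program.

```python
import math


def fit_power_law(exponents, seconds):
    """Least-squares fit of ln(t) = ln(c) + k*ln(p); returns (k, c)."""
    xs = [math.log(p) for p in exponents]
    ys = [math.log(t) for t in seconds]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    c = math.exp(mean_y - k * mean_x)
    return k, c


# Synthetic data only: pure p^2 growth with an arbitrary scale factor.
OVERHEAD_S = 2.5  # hypothetical per-run setup time, e.g. kernel compilation
ps = [1398269, 2976221, 6972593, 13466917, 30402457, 77232917]
pure = [1e-12 * p ** 2.0 for p in ps]
with_overhead = [t + OVERHEAD_S for t in pure]

print("pure power law fit:    k = %.3f" % fit_power_law(ps, pure)[0])
print("with startup overhead: k = %.3f" % fit_power_law(ps, with_overhead)[0])
```

The second fit comes out below 2 because the constant term matters most for the shortest runs, which flattens the low end of the log-log line.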

Finally, and importantly, no false negatives and no detected errors were observed.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf gpuowl Mp PRP run times.pdf (16.0 KB, 239 views)

Last fiddled with by kriesel on 2019-11-17 at 14:52
Old 2019-02-13, 01:55   #11
kriesel
 
Default gpuowl .owl file header style versus gpuowl version samples

To view a save file's header, use more <n>.owl or head <n>.owl. Samples by gpuowl version follow.

Z:\sources\mersennes\gpuowl\ken\v10test>more c77500079.ll
(none)
Code:
Pí² d   gÜ?
?g¦ s¢?
gpuowl V1.9 line 1 of 38000009.owl
Code:
OWL 3 38000009 103000 0 500
C:\msys64\home\ken\gpuowl-compile\v1.10>head 77230663.owl
Code:
OWL 3 77230663 1414500 0 500
C:\msys64\home\ken\gpuowl-compile\v2.0>more 89000167.owl
Code:
OWL 3 89000167 41500 0 500
-------------------didn't see a file format 4------------
C:\msys64\home\ken\gpuowl-compile\v3.3>more 89000167.owl
Code:
OWL 5
Comment: gpuOwL v3.3-bc4a29f; 2018-11-04 03:07:53 UTC
Type: PRP
Exponent: 89000167
Iteration: 22000
PRP-block-size: 400
Residue-64: 0xb90013de9a857278
Errors: 0
End-of-header:
gpuowl 3.8 .owl file header
Code:
OWL 5
Comment: gpuOwL v3.8-91c52fa; 2019-02-12 23:14:59 UTC
Type: PRP
Exponent: 299000059
Iteration: 93760000
PRP-block-size: 400
Residue-64: 0x95d3c1aae6883a8b
Errors: 0
End-of-header:
C:\msys64\home\ken\gpuowl-compile\v3.9>type 89000167.owl | more
Code:
OWL 5
Comment: gpuOwL v3.9-da61ebd; 2018-11-04 03:43:17 UTC
Type: PRP
Exponent: 89000167
Iteration: 26400
PRP-block-size: 400
Residue-64: 0x0a03f10ca11565dc
Errors: 0
End-of-header:
-----------------------didn't see a file format 6------

C:\msys64\home\ken\gpuowl-compile\v4.3>head 89000167.owl
Code:
OWL PRP 7 89000167 144000 0 400 624cac006596e5bb
C:\msys64\home\ken\gpuowl-compile\v4.6>more 89000167.owl
Code:
OWL PRP 7 89000167 22000 0 400 b90013de9a857278
C:\msys64\home\ken\gpuowl-compile\v4.7>more 89000167.owl
Code:
OWL PRP 7 89000167 0 0 400 0000000000000003
C:\msys64\home\ken\gpuowl-compile\v5.0>more 89000167.owl
Code:
OWL PRP 8 89000167 44000 0 400 57049b5adf2df847 1 0
C:\msys64\home\ken\gpuowl-compile\v5.0-9c13870>more 81885841.owl
Code:
OWL PRP 8 81885841 81760000 860000 400 35d1c3b4bd099ce1 1 0
PRP-1 (PRP&P-1 combined): "OWL PRP" file version 8, exponent iteration B1 blocksize res64 ? ?
PRP-only has B1=0


C:\msys64\home\ken\gpuowl-compile\v6.2-e2ffe65>more 86243.owl
Code:
OWL PRP 9 86243 800 400 47fcdf05631f4989
Caveats:
Does not include the old 0.1-0.6 LL file formats.
Does not include header info for any version above v6.2.
Didn't find a way to compile the TF-capable versions on Windows, so no such files to look into.
Haven't attempted any P-1-only.
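The text headers sampled above can be picked apart with a few lines of Python. This is only a sketch based on the samples in this post: the version 8 field names follow the note above, the version 9 names are a guess by analogy, and the function name is invented for illustration. It expects just the header text, not the binary data that follows it in a real file.

```python
def parse_owl_header(text):
    """Parse the text header of a gpuowl .owl file into a dict.

    Handles the one-line 'OWL PRP <n> ...' style and the multi-line
    'OWL 5' style. Unnamed values stay available under 'fields'.
    """
    lines = text.splitlines()
    first = lines[0].split() if lines else []
    if len(first) < 2 or first[0] != "OWL":
        raise ValueError("not an OWL text header")

    if first[1] == "PRP":
        version = int(first[2])
        fields = first[3:]
        if version == 8:
            # Per the note above; the trailing two values are unknown.
            names = ["exponent", "iteration", "B1", "blocksize", "res64"]
        elif version == 9:
            # Guess by analogy with version 8 without the B1 field.
            names = ["exponent", "iteration", "blocksize", "res64"]
        else:
            names = []
        header = {"type": "PRP", "version": version, "fields": fields}
        header.update(zip(names, fields))
        return header

    header = {"version": int(first[1]), "fields": first[2:]}
    # 'OWL 5' style: 'Key: value' lines up to 'End-of-header:'.
    for line in lines[1:]:
        if line.startswith("End-of-header:"):
            break
        key, sep, value = line.partition(":")
        if sep:
            header[key.strip()] = value.strip()
    return header


sample = "OWL PRP 8 81885841 81760000 860000 400 35d1c3b4bd099ce1 1 0"
print(parse_owl_header(sample))
```

All values are kept as strings, so convert with int(value) or int(value, 16) as needed.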


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2019-11-17 at 14:54 Reason: added line for upper version limit of content

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.