2018-05-29, 02:56 | #1 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3^{3}×263 Posts |
gpuOwL-specific reference material
This thread is intended to hold only reference material specifically for GpuOwL.
(Suggestions are welcome. Discussion posts in this thread are not encouraged. Please use the reference material discussion thread http://www.mersenneforum.org/showthread.php?t=23383.)

To get started in gpuowl on Linux, see Ernst Mayer's directions: https://mersenneforum.org/showthread.php?t=25601
For Windows, see below, which assumes the GPU's driver, AMD OpenCL, etc. are installed and working, and confirmed so with an OpenCL test utility.
In either case, note that the computation types, hardware supported, fft size limits, file formats, etc. have varied greatly and rapidly over the course of the hundreds of versions. Choose a version according to what you want to run and what each offers.

On a completed install of Windows (may as well have done Windows updates to current also):
Enable or install and configure any remote desktop software you plan to use; Windows Remote Desktop, TightVNC, etc.
Create a working folder. (I create one for each instance for each GPU.) Do create it as a subdirectory of the user's default folder. Permissions will be ok there. DO NOT place it in system directories, Program Files, etc. Permission problems are common if attempting to run there.

Backups
Now might be a good time to refresh your Windows backup, system restore point, etc. Also verify that your gpuowl folders, containing assignments, results, interim save files, configuration, etc. will be backed up.

Get and install the gpuowl software
Either download a current build from the end of the Gpuowl Windows build thread https://www.mersenneforum.org/showthread.php?t=25624 or for an earlier version https://www.mersenneforum.org/showpo...39&postcount=4, or from the download mirror https://download.mersenne.ca/ which has many Windows versions and a few Linux versions.
Unzip it into a working folder under the user's home directory (NOT in Program Files or other restricted areas; some have attempted that and run into permissions problems).
Or follow the Windows build instructions at https://www.mersenneforum.org/showpo...4&postcount=21 to create a build environment (once) and follow the compile and link section there as needed.

Next, decide whether you will run it manually or use primenet.py. I recommend running one instance of gpuowl, manually, at first, to learn what normal operation looks like, so that if/when issues with operation appear, you're familiar with the program. (Start with the simplest possible configuration/scenario. Add complexity later, after learning the basics.) It will also give you a greater appreciation for the automation built into primenet.py and other programs such as prime95 and mprime.
If using gpuowl's primenet.py or certain other tools provided for gpuowl, you'll need a Python 3 installation. To install Python, follow a good set of instructions, such as https://docs.python.org/3/using/wind...l#windows-full
(I've been exploring compiling primenet.py into a standalone executable, but haven't quite worked out how to get one small enough to post it on the forum as an attachment yet.)
Note, not all gpuowl versions include all the features described below. Some are rather recent additions.

Confirm gpuowl will run, and can find OpenCL-supported GPU(s), by locally generating a help output from it. Code:
gpuowl-win -h >help.txt
type help.txt
If gpuowl's help.txt generated on your system does not list at least one detected GPU, there may be a problem with OpenCL support. Resolve that before continuing. (See a sample help.txt in a downloaded Windows build zip file for comparison. Explore OpenCL related test utilities.)

Create a config.txt
Suggested contents:
-user your-primenet-uid -cpu systemname-gpumodel-number-winstance -device n -maxAlloc gpuram-delta
For example, my primenet-uid is kriesel, and the system is asr2, second Radeon VII gpu, instance 2. The gpuram is 16GB total, but if I have 2 instances running P-1 on the same gpu and their stage 2s might coincide, I might use maxAlloc = gpuram 8000 - delta 500 = 7500 for each instance. Then the config.txt line for the second gpu, second instance would be
-user kriesel -cpu asr2-radeonvii-2-w2 -device 1 -maxAlloc 7500
(Device numbers start at 0 in gpuowl and some other GIMPS GPU applications.)
Other -options are given in help output. -yield or -proof 9 may be useful. Whoever runs the cert will appreciate the reduced effort required for the cert and the overall efficiency. (And occasionally that may be you!)
The format of command line options and config.txt is the same. However, config.txt must be one line and followed by a return.

Batch file or shell script
I find it useful to create a short batch file also, and a desktop shortcut to it. The batch file can be as simple as g6.bat: Code:
title %cd%
gpuowl-win
The shortcut command should not be a direct invocation of the batch file. Use a cmd /k prefix so the window lingers in case of problems, long enough for you to read any error message. Having the help.txt and use-flags.txt files in the working directory for ready reference is also convenient.

Create a worktodo.txt
Go to https://www.mersenne.org/manual_assignment/ to get a single PRP assignment or PRP DC assignment. The excellent Gerbicz error check (GEC) on PRP work will determine whether the GPU is producing reliable interim results. Verify the gpu and system combination is reliable in PRP/GEC before attempting any P-1 factoring or LL DC, which have less error detection.
Open worktodo.txt for editing. Paste the assignment into worktodo.txt and follow it with a return. Save the modified file.

Try running gpuowl via the batch file at a Windows command prompt: g6
If it works, it should look something like the following, allowing for differences in parameters entered in config.txt, gpuowl version, exponent, work type, etc. If not, fix and retry.
A PRP start (lines should contain "OK"; "EE" instead means errors, trouble, perhaps clocks set too high, unreliable system ram, a failed fan, or an fft length specified too short for the exponent and computation type): Code:
2020-05-29 15:03:44 config: -device 1 -user kriesel -cpu roa/rx550 -use NO_ASM -maxAlloc 1500
2020-05-29 15:03:44 device 1, unique id ''
2020-05-29 15:03:44 roa/rx550 94955299 FFT: 5M 1K:10:256 (18.11 bpw)
2020-05-29 15:03:44 roa/rx550 Expected maximum carry32: 48210000
2020-05-29 15:03:45 roa/rx550 OpenCL args "-DEXP=94955299u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xe.cfec567b14fd8p-3 -DIWEIGHT_STEP=0x8.a43aff8beae48p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-05-29 15:03:52 roa/rx550 OpenCL compilation in 6.31 s
2020-05-29 15:03:58 roa/rx550 94955299 OK 0 loaded: blockSize 400, 0000000000000003
2020-05-29 15:04:16 roa/rx550 94955299 OK 800 0.00%; 14232 us/it; ETA 15d 15:24; 69f923b24568ac18 (check 5.88s)
2020-05-29 15:51:37 roa/rx550 94955299 OK 200000 0.21%; 14233 us/it; ETA 15d 14:38; 986d9b55f22ac736 (check 5.88s)
Code:
2020-08-15 11:30:03 gpuowl v6.11-340-g41d435f
2020-08-15 11:30:04 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -proof 8
2020-08-15 11:30:04 device 0, unique id ''
2020-08-15 11:30:04 condorella/rx480 183000023 FFT: 10M 1K:10:512 (17.45 bpw)
2020-08-15 11:30:04 condorella/rx480 Expected maximum carry32: 43400000
2020-08-15 11:30:07 condorella/rx480 OpenCL args "-DEXP=183000023u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=10u -DPM1=1 -DAMDGPU=1 -DCARRYM64=1 -DWEIGHT_STEP_MINUS_1=0xe.c72a0862a91p-5 -DIWEIGHT_STEP_MINUS_1=-0xa.1bff0fe0af57p-5 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-08-15 11:30:07 condorella/rx480 ASM compilation failed, retrying compilation using NO_ASM
2020-08-15 11:30:15 condorella/rx480 OpenCL compilation in 8.00 s
2020-08-15 11:30:15 condorella/rx480 183000023 P1 B1=700000, B2=26000000; 1009635 bits; starting at 0
2020-08-15 11:31:24 condorella/rx480 183000023 P1 10000 0.99%; 6840 us/it; ETA 0d 01:54; 58a6331a302132cb
2020-08-15 11:32:32 condorella/rx480 183000023 P1 20000 1.98%; 6877 us/it; ETA 0d 01:53; 38e0ea00e8d3dfcd
2020-08-15 11:33:41 condorella/rx480 183000023 P1 30000 2.97%; 6897 us/it; ETA 0d 01:53; 52d0881a817b7a2d
2020-08-15 11:34:50 condorella/rx480 183000023 P1 40000 3.96%; 6904 us/it; ETA 0d 01:52; 5cc6128ee3cf27dd
2020-08-15 11:35:16 condorella/rx480 saved
2020-08-15 11:36:00 condorella/rx480 183000023 P1 50000 4.95%; 6928 us/it; ETA 0d 01:51; 4980d669e97a532f

Open https://www.mersenne.org/manual_result/ in a web browser. Verify you're logged in (see upper right of the web page.) Open the results.txt file in an editor. Copy the result. Paste it into the results field of the web page. Click on "Submit".
Optionally, enter a note in the results.txt file that the results line has been reported. I usually place "reported mm/dd/yy" filling in the date of report, on a separate line after the last reported line. This helps avoid duplicate reports, and scans easily.
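The OK / EE check described for the PRP console output can also be scripted. What follows is only a minimal Python sketch, assuming progress lines keep the layout of the sample PRP output in this post (timestamp, -cpu name, exponent, OK or EE, iteration count). The function name and the regular expressions are illustrative and not part of gpuowl, and the EE line layout is assumed to mirror the OK lines; other gpuowl versions may log differently.

```python
import re

# Assumed layout, taken from the sample PRP output in this post:
# 2020-05-29 15:04:16 roa/rx550 94955299 OK 800 0.00%; 14232 us/it; ...
PROGRESS = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+ (\d+) (OK|EE)\s+(\d+)\b(.*)$"
)
US_PER_IT = re.compile(r"(\d+) us/it")


def summarize_log(lines):
    """Count OK and EE progress lines and collect the us/it timings."""
    summary = {"OK": 0, "EE": 0, "us_per_it": []}
    for line in lines:
        m = PROGRESS.match(line.strip())
        if not m:
            continue  # config, compilation, P1 and "saved" lines are skipped
        summary[m.group(2)] += 1
        timing = US_PER_IT.search(m.group(4))
        if timing:
            summary["us_per_it"].append(int(timing.group(1)))
    return summary


sample = [
    "2020-05-29 15:03:52 roa/rx550 OpenCL compilation in 6.31 s",
    "2020-05-29 15:03:58 roa/rx550 94955299 OK 0 loaded: blockSize 400, 0000000000000003",
    "2020-05-29 15:04:16 roa/rx550 94955299 OK 800 0.00%; 14232 us/it; ETA 15d 15:24; 69f923b24568ac18 (check 5.88s)",
    "2020-05-29 15:51:37 roa/rx550 94955299 OK 200000 0.21%; 14233 us/it; ETA 15d 14:38; 986d9b55f22ac736 (check 5.88s)",
]
result = summarize_log(sample)
print(result["OK"], result["EE"], result["us_per_it"])
```

To use it on a real run, pass the lines of the log file instead of the sample list. A nonzero EE count is the cue to look at clocks, cooling, memory, or fft length as described above.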
Upload the proof file (AFTER reporting the result record) There's only a proof file for PRP runs begun with proof-capable versions of gpuowl. It's found in sub-folder proof, and has a name composed of the exponent and proof power. For example, exponent 1234567 power 8 would be 1234567-8.proof in folder workingdirectory\proof. There are several possible upload methods listed at https://www.mersenneforum.org/showpo...0&postcount=26 for which some have the steps described there. Explore performance tuning Different versions, different parameter settings. -block larger than default to reduce GEC overhead on a reliable GPU. -log less frequently than default to limit log size and minor overhead. Some gpuowl versions offer more tuning variables than others. Using multiple instances There are two ways that running multiple instances on the same GPU at the same time may increase throughput. 1. Aggregate throughput may be higher running two instances on the same GPU at the same time. Any wait time that occurs for one instance on the GPU while the cpu performs the GEC, or reads from or writes to files or the console, or moves data over the PCIe bus between GPU ram and system ram, may be usable by the other instance. 2. Emptying the worktodo file or halting on an error condition by one instance does not completely stop progress; any other running instance then can use resources the stopped instance is no longer using. A single instance if halted does not leave the GPU idle for hours or days if at least one other instance is still running on it. Multiple instances are not guaranteed to improve sustained throughput. Throughput seems to be better if the two instances are running similar code and parameters; two 5M fft PRPs for example (not a 5M and an 8M, or P-1 and LLDC). The conceptually simplest way to run multiple instances is to create a separate working folder with its own set of files including gpuowl executable, same as the first instance. 
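With a separate working folder per instance, each folder needs its own one-line config.txt. The following is a rough Python sketch of generating that line, following the config.txt suggestions and the asr2 / Radeon VII example given earlier in this post. The function and parameter names are made up for illustration, and the 500 MB delta and the even split of GPU RAM between instances are just the example's assumptions, not gpuowl requirements.

```python
def config_line(user, system, gpu_model, gpu_number, instance,
                gpu_ram_mb, instances_per_gpu, delta_mb=500):
    """Build the single config.txt line for one gpuowl instance.

    gpu_number and instance are 1-based, as in the -cpu naming example;
    -device is 0-based, so it is gpu_number - 1.
    """
    # Share of GPU RAM for this instance, less a safety margin (delta).
    max_alloc = gpu_ram_mb // instances_per_gpu - delta_mb
    cpu = f"{system}-{gpu_model}-{gpu_number}-w{instance}"
    # config.txt must be one line followed by a return.
    return (f"-user {user} -cpu {cpu} "
            f"-device {gpu_number - 1} -maxAlloc {max_alloc}\n")


line = config_line("kriesel", "asr2", "radeonvii", 2, 2, 16000, 2)
print(line, end="")
```

This reproduces the example line for the second gpu, second instance. Writing the returned string to config.txt in each instance's folder is left to the reader.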
A more compact way is to use the executable in a common folder by all instances of all GPUs, IF updating version on all instances simultaneously is okay. Using gpuowl's primenet.py There's a separate post detailing primenet.py setup and use at https://www.mersenneforum.org/showpo...2&postcount=25. It's probably best to rename the results file containing already-reported results before starting to use primenet.py for reporting. Pool I haven't tried it yet, but particularly with multiple instances, multiple GPUs of the same type, or both, it appears it could make manual work assignment and result reporting easier. The help output says: Code:
-pool <dir> : specify a directory with the shared (pooled) worktodo.txt and results.txt
Multiple GpuOwl instances, each in its own directory, can share a pool of assignments

Check that throughput and iteration times are about as expected, and that the error rate is low
Occurrence of EE in the console or log output should be low, ideally less than once a week. For iteration times, see https://drive.google.com/file/d/10fC...enkBdAaRP/view and run time posts in this thread, and note iteration times vary greatly by exponent (more precisely fft length), hardware, and tuning.

Table of contents
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2022-08-07 at 17:20 Reason: added Loss of periodic console output, logging to gpuowl.log, and Jacobi checks |
2018-05-29, 03:10 | #2 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3^{3}×263 Posts |
GpuOwL run time vs exponent or fft length or version
RX550 data
gpuOwl v2, 5000k fft, RX550 gpu, MSI 18.2.3 driver (Feb 26 2018): timing in a quick test (~40,000 iterations each) was short carry 17.3 ms/iter, medium 17.6, long 17.4.
Compare V1.9 gpuOwL on the same gpu, same PCIe physical connection, April 2017 MSI driver: 10.9 ms/iter for -fft DP -legacy -size 4M; 18.9 ms/iter for -fft M61 -size 4M; 21.4 ms/iter for -fft DP -legacy -size 8M.
The driver change coincided with an increase of about 5% in iteration time, on the same gpu, in V1.9 gpuOwL. http://www.mersenneforum.org/showpos...&postcount=370
See the first attachment below for V1.9 on an RX550. See also the 4-program speed comparison in the general reference thread. http://www.mersenneforum.org/showpos...76&postcount=8

For an RX480, my data indicates it is 3.4-3.6 times faster than the RX550, on the same exponents and gpuOwL versions, at http://www.mersenneforum.org/showpos...&postcount=386 and subsequently.

An Intel IGP HD620 could run V0.5 or V1.9 but it was not worth doing. On mine, the resulting hit on prime95 throughput was larger than the gpuOwL throughput gained.
More detail on the V0.5 try (LL): http://www.mersenneforum.org/showpos...&postcount=176 (I discontinued running gpuOwl on the IGP. The tradeoff with mfakto there was much better.)
Detail on the V1.9 try (PRP): http://www.mersenneforum.org/showpos...&postcount=285

A listing of V3.5 OpenOwL command line options and fft lengths can be found at http://www.mersenneforum.org/showpos...&postcount=565
Detail on benchmarking V3.3 and V3.5 OpenOwL fft lengths on RX480 can be found at http://www.mersenneforum.org/showpos...&postcount=570
The second attachment below tabulates ms/iteration timings for various versions (V3.x - V3.9, V4.6, and V5.0) and fft lengths, on an RX480, and includes some graphs and ratios.
The third attachment compares V6.2, 5.0, 3.8, 2.0, and 1.9. Each is fastest for some fft length / exponent ranges, except v2.0.
The trend line fit for asymptotic scaling of the fastest version versus fft length or exponent is iteration time ∝ p^{1.078}, so run time ∝ p^{2.078} (a test takes about p iterations), for exponents 100M<p<~2520M (6M to 144M fft length).
Updated timings for RX480 and Radeon VII under Windows 7 and 10 respectively, up to Gpuowl v7.2-69, are included in the fourth and fifth attachments. These are works in progress currently. (Lots of data points, so reading glasses and zoom.)
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2021-04-16 at 20:46 Reason: more benchmarks for Radeon VII |
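The trend line fit in the post above can be turned into a quick estimator. This is a hedged Python sketch, not a benchmark replacement: it assumes the fitted exponent 1.078 applies (the fit was for the fastest version on an RX480, 100M<p<~2520M), and that a test takes about p iterations. The function names are illustrative only.

```python
def scaled_iteration_time(t_ref_us, p_ref, p_new, fit_exponent=1.078):
    """Scale a measured us/iteration from exponent p_ref to p_new."""
    return t_ref_us * (p_new / p_ref) ** fit_exponent


def run_time_days(t_iter_us, p):
    """Run time of one test, assuming about p iterations."""
    return t_iter_us * 1e-6 * p / 86400


# Doubling the exponent: iteration time grows by 2**1.078 (about 2.11),
# run time by 2**2.078 (about 4.22).
iter_ratio = scaled_iteration_time(1.0, 100_000_000, 200_000_000)
run_ratio = (run_time_days(iter_ratio, 200_000_000)
             / run_time_days(1.0, 100_000_000))
print(f"{iter_ratio:.2f} {run_ratio:.2f}")
```

As a sanity check of run_time_days alone, 14232 us/it at exponent 94955299 comes to a little over 15.6 days, in line with the ETA of 15d 15:24 shown in the sample PRP output in the first post.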
2018-05-29, 04:34 | #3 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7101_{10} Posts |
gpuOwL bug and wish list
Here is the latest posted version of the list I am maintaining for gpuOwL. As always, this is in appreciation of the authors' past contributions. Users may want to browse this for workarounds included in some of the descriptions, or for an awareness of some known pitfalls. Please respond with any comments, additions or suggestions you may have, preferably by PM to kriesel.
(attachment last updated 2018-12-31) V7.2-53 (really V7.1.1 and up): results.txt indicates V1 proof, while header of proof file indicates V2 for prp/proof. Not sure if there are also issues with hashes etc. or whether one is JSON version, the other proof file version. Confusing at best. Results line and console output and log file say version 1, proof file header correctly says version 2. Transition in gpuowl from version 1 to 2 proofs was at v7.1.1, so it appears dozens of commits report different proof version. It does not appear to interfere with verification, since M58847203 run with v7.2-63 which also has the issue was successfully verified. The latest commit v7.2-86 also reports proof version 1. It could be proof file version versus JSON version are intended to be different. Or it could be an inconsistency in json implementation. The json results produced by prime95 show version 2. Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2022-02-10 at 17:39 |
2018-05-29, 04:44 | #4 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3^{3}×263 Posts |
Getting started with gpuOwL
This is an old post, but kept in place for its documentation of what can be done with the very old builds, and the long list of (mostly Windows, plus rarely Linux) builds available.
See the Available Software guide portion for gpuOwL, for where to get code, a brief summary of capability, and a discussion thread for it. Or scroll to the bottom of this post. Note this was originally written for very early versions, and that has been left in place here for those occasions when an old version is the best tool for a particular task. It's pretty simple to get started with gpuOwL. Get the version that supports the fft length corresponding to the exponents you want to run, and build it for your operating system, or find a suitable executable someone else has already built. Kracker was kind enough to post build directions for Windows (including setting up a free open source build environment) at http://www.mersenneforum.org/showpos...&postcount=356 and a Windows build or two in the past in that same thread. Install the OpenCL drivers on your system and confirm function with a separate OpenCL query utility. Make sure you have gpuowl.cl in the working directory. For V1.9, depending on the transform type used, you may want nttshared.h in there too, such as if using -fft M61. (See http://www.mersenneforum.org/showpos...&postcount=224) No ini file. Very little setup. Manually check out some exponents for PRP test or PRP double check (unless you're using an old version that does LL, get that type of assignment instead) and put those records in a file called worktodo.txt, just as mersenne.org's manual checkout gives them. You may want to use a small shell script or batch file depending on which OS you're using. Syntax and options change with gpuowl version. https://www.mersenneforum.org/showpo...&postcount=353 V0.6 syntax example: Code:
gpuowl -logstep 5000 -savestep 2000000 -checkstep 250000 -uid kriesel/condorella-rx550
Code:
:set opts=-fft M61 -size 4M
set opts=-legacy
set dev=2
gpuowl -user kriesel -cpu condorella-rx550 -device %dev% -verbosity 2 %opts%
For V2 it's also simple, and somewhat differs: Code:
gpuowl -device 0 -user kriesel -cpu condorella-rx480 -carry long http://www.mersenneforum.org/showpos...&postcount=370 In V2 there's a -step option; see http://www.mersenneforum.org/showpos...&postcount=353 V3.x is different yet. See for example http://www.mersenneforum.org/showpos...&postcount=565 As is V4.x. As is V5. Code (for Windows unless otherwise indicated) For gpuOwL Windows code, and source see http://www.mersenneforum.org/showthread.php?t=22204 An early guide for compiling 0.x on windows with msys64+mingw64 http://www.mersenneforum.org/showpos...3&postcount=26 Windows in current versions includes the ability to handle .zip files but does not include support for some other compressed archive forms. IZArc is available for free download. It supports many formats, popular with/for Windows or Linux. https://www.izarc.org/ May 2017 v0.1 version Windows build (kracker) .zip http://www.mersenneforum.org/showpos...&postcount=112 May 2017 V0.3 Windows binary (kracker) .zip http://www.mersenneforum.org/showpos...&postcount=168 Jun 2017 V0.5 Windows binary (kracker) .zip http://www.mersenneforum.org/showpos...&postcount=170 (LL discontinued, PRP with Gerbicz block error check beginning V0.7) Sep 2017 V1.0 binaries for Windows (kracker) .zip http://www.mersenneforum.org/showpos...&postcount=190 Nov 2017 V1.9 binaries for Windows (kracker) .zip http://www.mersenneforum.org/showpos...&postcount=226 Jan 2018 V1.9 binaries updated for Windows (kracker) .zip http://www.mersenneforum.org/showpos...&postcount=272 Aug 2018 V2.0 binary for Windows 64 bit .exe http://www.mersenneforum.org/showpos...&postcount=556 Aug 2018 V3.3 binary for Windows 64 bit .7z http://www.mersenneforum.org/showpos...&postcount=558 Aug 2018 V3.5 binary for Windows 64 bit .7z http://www.mersenneforum.org/showpos...&postcount=560 Aug 2018 V3.6 binary for Windows 64-bit .7z http://www.mersenneforum.org/showpos...&postcount=581 Aug 2018 V3.8 binary for Windows 64-bit (this and all the above are for 
OpenCl) .7z http://www.mersenneforum.org/showpost.php?p=494169&postcount=615 Aug 2018 V3.9 binary for Windows 64 bit .7z http://www.mersenneforum.org/showpos...&postcount=666 Nov 2018 V4.3 binary for Windows 64 bit .7z https://www.mersenneforum.org/showpo...&postcount=832 Nov 2018 V4.6 binary for Windows 64 bit .7z https://www.mersenneforum.org/showpo...&postcount=828 Oct 2018 V4.7 binary for Windows 64 bit .7z https://www.mersenneforum.org/showpo...&postcount=792 (not recommended, fails for me) Nov 2018 V5.0 binary for Windows 64 bit .7z https://www.mersenneforum.org/showpo...&postcount=831 and with some fixes and new shorter fft lengths, .7z https://www.mersenneforum.org/showpo...&postcount=867 v5.0-9c13870 .7z https://www.mersenneforum.org/showpo...&postcount=869 Feb 2019 V6.0 binary for Windows 64 bit .7z https://www.mersenneforum.org/showpo...&postcount=967 V6.1 do not use the posted binary for V6.1 or for an early commit of V6.2. There was a bug that caused primes to be indicated composite in both. Feb 2019 V6.2 binary for Windows 64 bit .zip https://www.mersenneforum.org/showpo...&postcount=983 Apr 2019 V6.4 binary for Windows 64 bit .zip https://www.mersenneforum.org/showpo...postcount=1057 May 2019 V6.5 binary for Windows 64 bit (AMD or NVIDIA!) 
.7z https://www.mersenneforum.org/showpo...postcount=1171 July 2019 V6.5-84-30c0508 for Windows 64 bit residue type 1 .7z https://www.mersenneforum.org/showpo...postcount=1274 (V6.6) V6.7-4-g278407a Windows build .7z https://www.mersenneforum.org/showpo...postcount=1343 (V6.8) version uncertain, Woltman's test version .zip file of source suitable for Linux building https://mersenneforum.org/showpost.p...postcount=1364 (V6.9) V6.10-9-g54cba1d Windows build .zip https://mersenneforum.org/showpost.p...postcount=1385 V6.11-9-ga9e3189 Windows build .7z https://mersenneforum.org/showpost.p...postcount=1403 Woltman's dropbox Windows build .exe https://mersenneforum.org/showpost.p...postcount=1510 Another Woltman dropbox version .exe https://mersenneforum.org/showpost.p...postcount=1539 V6.11-83-ge270393 Windows build .7z https://www.mersenneforum.org/showpo...postcount=1584 v6.11-88 build for Windows .7z https://mersenneforum.org/showpost.p...postcount=1629 gpuowl v6.11-99-gdd8527b Windows build .7z https://www.mersenneforum.org/showpo...postcount=1652 v6.11-104-g91ef9a8 .zip https://mersenneforum.org/showpost.p...postcount=1664 v6.11-112-gf1b00d1 Windows build .7z https://mersenneforum.org/showpost.p...postcount=1682 January 2020 V6.11-116-g5ca090d P-1 PRP assignment split rewrite Windows build .7z https://www.mersenneforum.org/showpo...postcount=1740 v6.11-132-gfd01ee5 Windows build .7z https://mersenneforum.org/showpost.p...postcount=1787 January 2020 V6.11-134-g1e0ce1d Windows build .7z https://mersenneforum.org/showpost.p...postcount=1796 February 2020 V6.11-142-gf54af2e Windows build .zip https://mersenneforum.org/showpost.p...postcount=1829 v6.11-145-g6146b6d Windows build .zip https://mersenneforum.org/showpost.p...postcount=1840 v6.11-147-g3b8b00e Windows build .zip https://mersenneforum.org/showpost.p...postcount=1866 v6.11-148-gfc93773 Windows build .7z https://mersenneforum.org/showpost.p...postcount=1877 March 2020 v6.11-163-gec98bfe Windows build .7z 
https://mersenneforum.org/showpost.p...postcount=1903 v6.11-198-g628f3cd Windows build .7z https://mersenneforum.org/showpost.p...postcount=1959 v6.11-219-ge70ec99 ffts up to 192M Windows build .7z https://mersenneforum.org/showpost.p...postcount=1984 v6.11-?-af403e2 (by kracker) the return of LL? Windows build .zip https://mersenneforum.org/showpost.p...postcount=2047 v6.11-255-g81fa7c3 max fft 96M Windows build .7z https://mersenneforum.org/showpost.p...postcount=2063 v6.11-257-g39fc002 Windows build .7z https://mersenneforum.org/showpost.p...postcount=2073 v6.11-259-g83434d8 Windows build .7z https://mersenneforum.org/showpost.p...postcount=2089 April 2020 v6.11-264-g5c977d4-dirty Windows build .7z https://mersenneforum.org/showpost.p...postcount=2095 v6.11-268-g0d07d21 Windows build .7z https://mersenneforum.org/showpost.p...postcount=2106 v6.11-270-gf1fd1f7 Windows build .7z https://mersenneforum.org/showpost.p...postcount=2124 v6.11-272-g07718b9 Windows build .7z https://mersenneforum.org/showpost.p...postcount=2139 May 2020 v6.11-278-ga39cc1a Windows build .7z https://mersenneforum.org/showpost.p...postcount=2161 v6.11-285-gf25ecbd Windows build .7z https://mersenneforum.org/showpost.p...postcount=2179 v6.11-288-g20c4213 Jacobi check returns! 
.7z https://mersenneforum.org/showpost.p...postcount=2202 v6.11-292-gecab9ae Windows build .7z https://mersenneforum.org/showpost.p...postcount=2220 June 2020 v6.11-295-gaecf041 (the last I could build until ~ -316) .7z https://mersenneforum.org/showpost.p...postcount=2274 v6.11-318-g3109989 Windows build, max fft 120M, includes PRP proof capability .7z https://mersenneforum.org/showpost.p...99&postcount=1 gpuowl for Windows 7 or up 64-bit v6.11-325-g7c09e38 .7z https://mersenneforum.org/showpost.p...29&postcount=2 gpuowl for Windows v611-327-g43cdf1c by Dylan14 .7z https://mersenneforum.org/showpost.p...25&postcount=3 gpuowl commit e5a8f2c for Google Colaboratory Linux environment built by Fan Ming .zip https://mersenneforum.org/showpost.p...&postcount=958 gpuowl for Windows v6.11-330-ge5a8f2c .7z https://www.mersenneforum.org/showpo...30&postcount=4 July 2020 Gpuowl-win v6.11-335-gff60b08 .7z https://mersenneforum.org/showpost.p...31&postcount=5 Gpuowl-win v6.11-340-g41d435f .7z https://www.mersenneforum.org/showpo...87&postcount=6 Gpuowl-win v6.11-357-g1f41292 build .7z https://mersenneforum.org/showpost.p...37&postcount=7 Gpuowl-win v6.11-364-g36f4e2a .7z https://mersenneforum.org/showpost.p...94&postcount=8 August 2020 Gpuowl v6.11-366-gf887d6e for Linux Google Colab .7z https://www.mersenneforum.org/showpo...postcount=1020 (Note, August development focused more on primenet.py and less on the gpuowl executable.) 
September 2020 Gpuowl for Linux v6.11-380-g79ea0cc .7z https://mersenneforum.org/showpost.p...1&postcount=40 Gpuowl for Windows v6.11-380-g79ea0cc .7z https://mersenneforum.org/showpost.p...92&postcount=9 April 2022 Gpuowl-win v6.11-382-g98ff9c7-dirty (proof powers 1-12) .zip https://mersenneforum.org/showpost.p...5&postcount=32 October 2020 Gpuowl for Windows v7.0-18-g69c2b85 .7z (LL and standalone P-1 removed, joint P-1/PRP introduced) https://www.mersenneforum.org/showpo...7&postcount=10 Gpuowl-win v7.0-26-g8e6a1d1 .7z https://www.mersenneforum.org/showpo...1&postcount=11 gpuowl-win v7.0-35-gf06bc5b .7z https://www.mersenneforum.org/showpo...7&postcount=12 gpuowl-win v7.0-40-gb62d4fd .7z https://www.mersenneforum.org/showpo...8&postcount=13 gpuowl-win v7.0-47-ga8664fe .7z https://www.mersenneforum.org/showpo...2&postcount=14 gpuowl-win v7.0-66-gebe49cc .7z https://www.mersenneforum.org/showpo...1&postcount=15 Note, do not use the self-verify option with v7.1, or the resulting proof files will be bad. 
gpuowl-win v7.1-1-g0f73d04 .7z https://www.mersenneforum.org/showpo...2&postcount=16 (Ethan EO multiple vendors' OpenCL flavors) gpuowl-win v7.1-7 .7z https://www.mersenneforum.org/showpo...postcount=2558 GpuOwl-win v7.1-11-g97cfbd2 2xSP fft experimentation .7z https://www.mersenneforum.org/showpo...9&postcount=17 November 2020 GpuOwl-win v7.2-2-ga135d8d .7z https://www.mersenneforum.org/showpo...5&postcount=18 or .zip gpuowl-win v7.2-13-g266aed4 .7z https://www.mersenneforum.org/showpo...7&postcount=23 gpuowl-win v7.2-21-g28dbf88 .zip https://www.mersenneforum.org/showpo...1&postcount=24 Febrary 2021 gpuowl-win v7.2-39-ga87a679 .zip https://mersenneforum.org/showpost.p...3&postcount=25 gpuowl-win v7.2-53-ge27846f https://mersenneforum.org/showpost.p...5&postcount=26 gpuowl-win v7.2-63-ge47361b https://mersenneforum.org/showpost.p...5&postcount=27 March 2021 gpuowl-win v7.2-69-g23c14a1 https://mersenneforum.org/showpost.p...9&postcount=28 gpuowl 7.2-70 for Linux https://mersenneforum.org/showpost.p...73&postcount=3 November 2021 gpuowl-win v7.2-86-gddf3314 https://mersenneforum.org/showpost.p...9&postcount=29 April 2022 gpuowl-win v7.2-93-ga5402c5-dirty (proof powers 1-12) .zip https://mersenneforum.org/showpost.p...4&postcount=30 October 2022 gpuowl-win v7.2-112-gd6ad1e0-dirty (proof powers 1-12) https://mersenneforum.org/showpost.p...2&postcount=34 For the current version source (and previous too) https://github.com/preda/gpuowl A separate forum thread was created for Windows gpuowl build posting. It is here Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2022-10-16 at 17:03 Reason: v7.2 build with latest commits, proof powers 1 to 12 |
2018-06-03, 20:29 | #5 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3^{3}×263 Posts |
gpuOwL requirements
My current understanding of the requirements, from the Windows point of view
OpenCL installed, at least version 1.2 if not 2.0. (Some of the more recent versions require OpenCL 2.0) One or more units of OpenCL compatible hardware, with corresponding driver(s) supporting OpenCL of the required level, such as certain AMD GPUs, Intel IGPs, or CPUs. NVIDIA gpus with compute capability somewhere above 3.0 began to be supported at gpuowl v6.5. Currently GTX10xx and newer NVIDIA model gpus work, and somewhat older too. gpuOwL below v6.5 does not currently run on NVIDIA gpus, on Linux or Windows, and to my knowledge Preda's releases before that did not. http://www.mersenneforum.org/showpos...&postcount=277 Someone reported porting a long ago version for his own use. http://www.mersenneforum.org/showpos...&postcount=107 Per instance, gpuOwL v1.9 on 8M fft length running exponents ~150M, exhibited in Task Manager, ~115MB private working set, 145MB working set, 382MB peak working set on Windows 7 64-bit. Meanwhile GPU occupancy was ~475-490MB each. Discrete (add-in card) GPUs give better performance because of their dedicated memory. Integrated graphics processors use memory and TDP budget shared with the CPU core(s) and will affect performance of CPU applications. IGPs may lack DP support or otherwise lack compatibility with the AMD-oriented gpuOwL DP code, and so require running V1.9 -fft M61, which is slower. In case of difficulty, it's recommended to verify the successful installation of OpenCL and compatible drivers with a utility, such as clinfo, oclDeviceQuery.exe, or the advanced tab of GPU-Z (a Windows gpu status graphical utility). Memory requirements are modest. I'm seeing only about 290MB occupied during 4M fft length -DP transform on an RX550. That may scale to roughly 1.3GB for a future 16M fft implementation, 2.7GB? for 32M, which would not fit on that 2GB card. (It would probably also run way too slowly for that card to be practical, at roughly estimated 2-3 years per exponent.) 
It illustrates that for primality testing, 4GB is probably enough for a long time. gpuOwL v2 on Windows 7 sp1 with current updates failed with an MSI-sourced driver dated April 2017 on an MSI RX550. It worked with the MSI driver 18.2.3 dated Feb 26 2018. There's a March 23 2018 driver available from AMD, v18.3.4, or probably more recent by now, that I have not tried on v2.0. https://support.amd.com/en-us/download Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2020-08-09 at 19:17 Reason: update for v6.5 |
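The memory estimate in the post above can be checked with simple arithmetic. A small Python sketch follows, assuming GPU memory use scales in plain proportion to fft length from the one measured point (about 290MB at 4M fft on the RX550). That proportionality is an assumption made here for illustration. It gives somewhat lower figures than the rough 1.3GB and 2.7GB quoted in the post, so treat its output as a floor rather than a prediction.

```python
def estimated_gpu_mb(fft_m, ref_fft_m=4, ref_mb=290):
    """Proportional extrapolation from the 4M fft, ~290 MB data point."""
    return ref_mb * fft_m / ref_fft_m


CARD_MB = 2048  # the 2GB RX550 discussed in the post

estimates = {fft_m: estimated_gpu_mb(fft_m) for fft_m in (4, 16, 32)}
for fft_m, mb in estimates.items():
    print(f"{fft_m}M fft: about {mb:.0f} MB, fits in 2GB: {mb <= CARD_MB}")
```

Even this optimistic scaling puts a 32M fft beyond a 2GB card, which agrees with the conclusion in the post.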
2018-06-03, 20:38 | #6 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3^{3}·263 Posts |
gpuOwL features and requests
(caution, some of this is outdated. gpuowl is up to v6.5 as of 2019-05-09.)
Nice features
The Gerbicz check, of course, detecting errors and allowing timely rollback to ensure an accurate result.
Full-time logging in addition to console display.

The -step argument in gpuOwL
It seems to me, after a quick look at the source, that this gives the user the ability to override the automatic program behavior with a specified constant step count between Gerbicz checks. I interpret this to mean -step requires a step count parameter in the range 1000 to 500000 that is 1, 2, or 5 times a power of ten; 1000 or 2000 or 5000 or 10000 ... 500000. Or perhaps larger also. It may be both the output and the Gerbicz check interval. Two use cases I've run into are:
1) Hardware and software are very stable, the exponent is far from an fft length limit, the overhead of starting at small step sizes is not necessary; run a large step size from the start. (This case might benefit from an adaptive step size after starting large if an error occurs during the run. Also a user-settable number of consecutive retries if an error occurs.)
2) A repeatable error has occurred, such as when the exponent is slightly too large for the fft length. I'd like to determine as finely as possible at what iteration it occurs, with a rerun from the last known good save file, using the minimum step size until encountering the error again. (This case might benefit from a user-set limit of retries (0 - ~9) before giving up on the exponent and starting or resuming the next worktodo entry.)
Source fragments supporting the opinion are at http://www.mersenneforum.org/showpos...&postcount=353

The ability to switch transform midstream
Per Preda, the author of gpuOwL, the save file is in compacted bits format (independent of the transform). See http://www.mersenneforum.org/showpos...&postcount=312

Feature requests
Save frequency option
Is there a command line option to control the frequency of saving a disaster-mitigator interim file, which seems to be produced at 10^7 iteration intervals in V1.9? I would like to try running at 5M iteration intervals for safety files. I don't see any in the V1.10 source. I suppose I could run some little batch file.

GPU operation priority lower for gpuOwL computation, or periodic yielding by gpuOwL
When running gpuOwL on the same card running the display, and using the local display rather than remote access, the screen seemed sluggish. I'm not aware of any option in gpuOwL equivalent to the -polite option in CUDALucas, which gives display operations a turn now and then (with user-settable frequency).

A port to NVIDIA!

More fft lengths where useful, integrated into one application; 4M and 8M DP and M61 fft; 5000K DP, and anything new.
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Last fiddled with by kriesel on 2019-11-17 at 14:49 |
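The reading of -step given above (a value in the range 1000 to 500000 that is 1, 2, or 5 times a power of ten) can be written down as a small check. This is only a sketch of that interpretation, not gpuowl's actual argument handling; the function names and the default range bounds are assumptions taken from the post's wording.

```python
def is_plausible_step(step, lowest=1000, highest=500000):
    """Return True if step is 1, 2, or 5 times a power of ten within range.

    Encodes the post's interpretation of -step only; the real program
    may accept other values (the post itself says "perhaps larger also").
    """
    if not isinstance(step, int) or not lowest <= step <= highest:
        return False
    # Strip trailing factors of ten; what remains must be 1, 2, or 5.
    while step % 10 == 0:
        step //= 10
    return step in (1, 2, 5)


def plausible_steps(lowest=1000, highest=500000):
    """List every step value that passes is_plausible_step, ascending."""
    values = []
    power = 1
    while power <= highest:
        for multiplier in (1, 2, 5):
            candidate = multiplier * power
            if lowest <= candidate <= highest:
                values.append(candidate)
        power *= 10
    return values


if __name__ == "__main__":
    print(plausible_steps())
```

For use case 2 above one would pick the first value in the list; for use case 1, the last.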
2018-06-03, 22:08 | #7 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3^{3}·263 Posts |
Feature / version announcements
Gpuowl began as LL on AMD: initial github commit d5c48dd 2017-04-11
Introducing gpuOwL http://www.mersenneforum.org/showpos...32&postcount=1 2017-04-19
V0.2 http://www.mersenneforum.org/showpos...&postcount=135 2017-05-21
V0.3 (offsets) http://www.mersenneforum.org/showpos...&postcount=147 2017-05-26
V0.4 http://www.mersenneforum.org/showpos...&postcount=169 2017-06-05
V0.5 http://www.mersenneforum.org/showpos...&postcount=171 2017-06-12
V0.6 Addition of Jacobi check to LL flavor of gpuOwL, zero offset in my test http://www.mersenneforum.org/showpos...5&postcount=46 2017-08-08
Nonzero offset dropped and -supersafe option added http://www.mersenneforum.org/showpos...5&postcount=61 2017-08-10
switch from LL to PRP occurs. See also https://www.mersenneforum.org/showpo...3&postcount=15 for residue type versus version
V0.7 commit ccb7ed2 2017-08-27
V1.0 http://www.mersenneforum.org/showpos...&postcount=186 2017-08-30 PRP residue type 4
V1.1 http://www.mersenneforum.org/showpos...&postcount=191
V1.2-1.4 ?
V1.5 http://www.mersenneforum.org/showpos...&postcount=223 2017-09-30 PRP residue type 1
V1.7 f5198fc 2017-10-26
V1.8 http://www.mersenneforum.org/showpos...&postcount=224 2017-11-08
V1.8 help http://www.mersenneforum.org/showpos...&postcount=225 2017-11-08
V1.9 ?
V1.10 commit 83001d4 2018-01-27 (seen on github https://github.com/preda/gpuowl/blob/NTT/README.md)
V2.0 http://www.mersenneforum.org/showpos...&postcount=320 2018-02-07
perf tune and -time option http://www.mersenneforum.org/showpos...&postcount=331
V2.1-2.3 ?
V3.0 ?
V3.1 commit 5495ecf 2018-07-07
V3.2 ?
V3.3 fft lengths 4, 5, 8, 10, 16, 20M http://www.mersenneforum.org/showpos...&postcount=468 2018-07-13
V3.4 ?
V3.5 "Moar fft" A lot more lengths, from 0.5M to 144M (up to ~2.5x10^{9} exponent) http://www.mersenneforum.org/showpos...&postcount=505 2018-07-15
V3.6 2018-08-11 commit f7c3865 see http://www.mersenneforum.org/showpos...&postcount=581
V3.7 TF integrated, TF works on OpenCL Linux ROCm 1.8.2 only http://www.mersenneforum.org/showpos...&postcount=586 2018-08-16
V3.8 commit a7ef0e5 2018-08-17
V3.8 fixes http://www.mersenneforum.org/showpos...&postcount=612 2018-08-17
V3.9 commit 4c4e034 2018-08-21
V4.0 commit fe7cd08 2018-09-10
V4.1 commit d77c6f0 2018-09-18
V4.3 PRP & P-1 combined https://www.mersenneforum.org/showpo...&postcount=694 2018-09-20 PRP residue type 4
V4.6 commit bb691cb 2018-10-20
V4.7 commit 12c6b75 2018-10-23 https://www.mersenneforum.org/showpo...&postcount=765 and see also post 766
V5.0 commit 1339429 2018-10-24 PRP & two stages of P-1 https://www.mersenneforum.org/showpo...&postcount=796 and see also https://www.mersenneforum.org/showpo...&postcount=798 2018-10-31
V6.0 PRP, a primenet.py script added for getting and queuing work and reporting results, and P-1 has been removed. https://www.mersenneforum.org/showpo...&postcount=912 2019-01-03 https://www.mersenneforum.org/showpo...&postcount=913
V6.1, commit c02a6ce, support for standalone P-1 has been added https://www.mersenneforum.org/showpo...&postcount=945 https://www.mersenneforum.org/showpo...&postcount=946
v6.2, commit 5b26497 2019-01-27, fft lengths up to 160M, some speedups https://www.mersenneforum.org/showpo...&postcount=956
v6.4 commit f6d3153 2019-04-09, added command line options -prp -pm1 https://www.mersenneforum.org/showpo...postcount=1056
v6.5 added command line option -dir for working directory; max fft length 192M https://www.mersenneforum.org/showpo...postcount=1062
V6.5-30c0508 switched back from prp residue type 4 to type 1 https://www.mersenneforum.org/showpo...postcount=1273 2019-07-10
V6.7-4, P-1 on NVIDIA https://www.mersenneforum.org/showpo...postcount=1343 2019-09-05
v6.8 per-exponent savefile folders https://www.mersenneforum.org/showpo...postcount=1335 2019-09-06
v6.9 https://www.mersenneforum.org/showpo...postcount=1361
v6.10-9-g54cba1d P-1 savefiles added https://www.mersenneforum.org/showpo...postcount=1384
v6.11-9-g9ae3189 NVIDIA CPU yield https://www.mersenneforum.org/showpo...postcount=1403
v6.11-83-ge270393 increased performance with various -use options https://www.mersenneforum.org/showpo...postcount=1584
V7.0-18 drops LLDC, merges P-1 into PRP https://www.mersenneforum.org/showthread.php?t=26007 2020-10-07
V7.1 proof V2 https://www.mersenneforum.org/showpo...&postcount=110 2020-10-22
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Last fiddled with by kriesel on 2020-10-23 at 10:13 Reason: added v7, 7.1 |
2018-10-18, 14:30 | #8 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3^{3}·263 Posts |
Determining upper exponent limit for a transform type and fft length
In gpuOwL V1.9, a lengthy experiment was conducted on how to efficiently determine the upper exponent limit for PRP with Gerbicz check for a given fft length and transform type. This was conducted on the M61 transform type and 4M fft length, for which the program author did not know the limit. (It was not possible to do it on 8M fft length, since maximum exponent is higher for the M61 transform than the corresponding DP transform, and the program's maximum exponent was capped at the 8M length DP transform approximate limit. It would also have taken more than 4 times as long.) The calculations were performed on a relatively slow RX550, which contributed significantly to the experiment's calendar duration.
By approaching the limit from above, generating error failures quickly in relatively few iterations, convergence to an approximate limit is achieved much faster than by approaching the limit from below, with exponents fully run to completion. It might seem that a lot of iterations would be wasted on runs that generate errors. However, gpuOwL in v1.9 and later had the useful property of storing interim results in a form that could be continued by a different program version and transform type. So a run that produces errors at a few million or even tens of millions of iterations with one transform type and fft length can be continued to completion by a different program version, transform type, and fft length. Many of the exponents that generated errors as M61 4M have been run to completion with newer, faster DP fft lengths as PRP tests or PRP DC tests on an RX480, as will be a few more still remaining.

Tabulating the exponents tried, their success or failure, and the number of iterations completed, with separate fits through the failure data and the success data, produces a good picture of a limit estimate. In this way, it can be determined fairly accurately where the limit of completion lies, while doing work useful to the GIMPS project's progression in first-time or double-check effort. Tabulating along the way, with spreadsheet-generated regression fits, was used to somewhat guide the selection of the next trial exponent. When practical, avoiding overlap with existing assignments was also considered in trial exponent selection.

In the example attachment, about 1.06 exponents' equivalent of failed run iterations, plus 5 completed exponents, were used to determine a limit value around 83869400, within a span of about 190 out of ~84 million, or 2.24ppm. Approaching strictly from above, stopping when two exponents are completed, and using less closely spaced test exponents, one could reduce the work to less than the equivalent of 3 completed exponent runs.
At a cost of at most one full run per trial, the experiment could be extended to give about one bit more precision in the limit per additional trial. The practical utility of adding more bits of precision to the limit is low, since there are only 5 currently unfactored candidates between 83869319 and 83869507, all of which currently have an LL or PRP result reported, and there are considerably faster fft length/transform combinations available now for performing primality or pseudoprimality tests in that exponent range. https://www.mersenne.org/report_exponent/?exp_lo=83869319&exp_hi=83869507&full=1

It's worthwhile to note that the limit value determined is not a guarantee that an exponent below that value would run to completion without error. It's merely a limit below which no error was seen, in the hundreds of millions of iterations required for the 4 exponents that completed. The error occurrences are not predictable within an exponent span of about 15,000, and seem to behave statistically. The limit of the M61 4M transform appears to occur at about 19.996 bits/word, or approximately 20 - 1/256, somewhere between 83869319 and 83869507.
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Last fiddled with by kriesel on 2019-11-17 at 14:50 |
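The bits/word and ppm figures quoted above follow from simple arithmetic, which the short sketch below reproduces. It assumes that "word" here means one element of the 4M-length transform (4 x 1024^2 words), which is consistent with the 19.996 figure; the function and variable names are illustrative only.

```python
FFT_4M_WORDS = 4 * 1024 ** 2  # 4M fft length, taking M = 1024^2

# Bracketing exponents and approximate limit quoted in the post.
LOWER_EXPONENT = 83869319
UPPER_EXPONENT = 83869507
APPROX_LIMIT = 83869400


def bits_per_word(exponent, fft_words):
    """Average number of bits of the exponent carried per fft word."""
    return exponent / fft_words


def span_ppm(lower, upper, reference):
    """Width of the bracket [lower, upper] in parts per million of reference."""
    return (upper - lower) / reference * 1e6


if __name__ == "__main__":
    print(f"bits/word at the approximate limit: "
          f"{bits_per_word(APPROX_LIMIT, FFT_4M_WORDS):.4f}")
    print(f"20 - 1/256 for comparison:          {20 - 1 / 256:.4f}")
    print(f"bracket width: {UPPER_EXPONENT - LOWER_EXPONENT} exponents, "
          f"{span_ppm(LOWER_EXPONENT, UPPER_EXPONENT, APPROX_LIMIT):.2f} ppm")
```

The same two functions can be pointed at any other fft length and bracket to compare limits between transforms on an equal footing.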
2018-11-05, 13:32 | #9 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3^{3}×263 Posts |
gpuowl V6.11, 6.7, 6.5, 6.2, 6.0, V5.0-9c13870 fft lengths, and earlier
For gpuowl fft lengths, K=1024, M=1024^{2}. (In the V4.3 and V5.0 lists below, the column headed M is a separate quantity, apparently the middle step, not 1024^{2}.) Prior to the V5.0-9c13870 commit, the M=3 entries in the V5.0 list were not available.
Prior to V5.0-df2bdf2, fft lengths below 0.5M were not available. Prior to ~V3.5, the M=5 and M=9 entries were not available. V3.3 supported 4, 5, 8, 10, 16, 20M. V2.0 supported 5000K only. V1.9 supported 2, 4, and 8M. V1.0 and earlier were 4M only. (I think the earliest PRP was at V0.7; prior to that it was LL.) V4.3: Code:
openowl -list fft
2021-03-30 19:51:07 gpuowl 4.3-537c681
2021-03-30 19:51:07    FFT    maxExp     W     H  M
2021-03-30 19:51:07   0.5M     10.3M   512   512  1
2021-03-30 19:51:07   1.0M     20.3M  1024   512  1
2021-03-30 19:51:07   1.0M     20.3M   512  1024  1
2021-03-30 19:51:07   2.0M     39.8M  1024  1024  1
2021-03-30 19:51:07   2.0M     39.8M   512  2048  1
2021-03-30 19:51:07   2.0M     39.8M  2048   512  1
2021-03-30 19:51:07   2.5M     49.4M   512   512  5
2021-03-30 19:51:07   4.0M     78.0M  1024  2048  1
2021-03-30 19:51:07   4.0M     78.0M  2048  1024  1
2021-03-30 19:51:07   4.0M     78.0M  4096   512  1
2021-03-30 19:51:07   4.5M     87.5M   512   512  9
2021-03-30 19:51:07   5.0M     96.9M  1024   512  5
2021-03-30 19:51:07   5.0M     96.9M   512  1024  5
2021-03-30 19:51:07   8.0M    153.0M  2048  2048  1
2021-03-30 19:51:07   8.0M    153.0M  4096  1024  1
2021-03-30 19:51:07   9.0M    171.6M  1024   512  9
2021-03-30 19:51:07   9.0M    171.6M   512  1024  9
2021-03-30 19:51:07  10.0M    190.0M  1024  1024  5
2021-03-30 19:51:07  10.0M    190.0M   512  2048  5
2021-03-30 19:51:07  10.0M    190.0M  2048   512  5
2021-03-30 19:51:07  16.0M    300.0M  4096  2048  1
2021-03-30 19:51:07  18.0M    336.3M  1024  1024  9
2021-03-30 19:51:07  18.0M    336.3M   512  2048  9
2021-03-30 19:51:07  18.0M    336.3M  2048   512  9
2021-03-30 19:51:07  20.0M    372.5M  1024  2048  5
2021-03-30 19:51:07  20.0M    372.5M  2048  1024  5
2021-03-30 19:51:07  20.0M    372.5M  4096   512  5
2021-03-30 19:51:07  36.0M    659.0M  1024  2048  9
2021-03-30 19:51:07  36.0M    659.0M  2048  1024  9
2021-03-30 19:51:07  36.0M    659.0M  4096   512  9
2021-03-30 19:51:07  40.0M    730.0M  2048  2048  5
2021-03-30 19:51:07  40.0M    730.0M  4096  1024  5
2021-03-30 19:51:07  72.0M   1290.9M  2048  2048  9
2021-03-30 19:51:07  72.0M   1290.9M  4096  1024  9
2021-03-30 19:51:07  80.0M   1429.8M  4096  2048  5
2021-03-30 19:51:07 144.0M   2527.5M  4096  2048  9
Code:
gpuowl 5.0-9c13870 -list fft
   FFT    maxExp     W     H  M
  0.1M      2.6M   256   256  1
  0.2M      5.2M   256   512  1
  0.2M      5.2M   512   256  1
  0.4M      7.7M   256   256  3
  0.5M     10.2M  1024   256  1
  0.5M     10.2M   256  1024  1
  0.5M     10.2M   512   512  1
  0.6M     12.7M   256   256  5
  0.8M     15.1M   256   512  3
  0.8M     15.1M   512   256  3
  1.0M     20.0M  1024   512  1
  1.0M     20.0M   256  2048  1
  1.0M     20.0M   512  1024  1
  1.0M     20.0M  2048   256  1
  1.1M     22.5M   256   256  9
  1.2M     24.9M   256   512  5
  1.2M     24.9M   512   256  5
  1.5M     29.7M  1024   256  3
  1.5M     29.7M   256  1024  3
  1.5M     29.7M   512   512  3
  2.0M     39.3M  1024  1024  1
  2.0M     39.3M   512  2048  1
  2.0M     39.3M  2048   512  1
  2.0M     39.3M  4096   256  1
  2.2M     44.1M   256   512  9
  2.2M     44.1M   512   256  9
  2.5M     48.9M  1024   256  5
  2.5M     48.9M   256  1024  5
  2.5M     48.9M   512   512  5
  3.0M     58.4M  1024   512  3
  3.0M     58.4M   256  2048  3
  3.0M     58.4M   512  1024  3
  3.0M     58.4M  2048   256  3
  4.0M     77.3M  1024  2048  1
  4.0M     77.3M  2048  1024  1
  4.0M     77.3M  4096   512  1
  4.5M     86.7M  1024   256  9
  4.5M     86.7M   256  1024  9
  4.5M     86.7M   512   512  9
  5.0M     96.1M  1024   512  5
  5.0M     96.1M   256  2048  5
  5.0M     96.1M   512  1024  5
  5.0M     96.1M  2048   256  5
  6.0M    114.7M  1024  1024  3
  6.0M    114.7M   512  2048  3
  6.0M    114.7M  2048   512  3
  6.0M    114.7M  4096   256  3
  8.0M    151.8M  2048  2048  1
  8.0M    151.8M  4096  1024  1
  9.0M    170.3M  1024   512  9
  9.0M    170.3M   256  2048  9
  9.0M    170.3M   512  1024  9
  9.0M    170.3M  2048   256  9
 10.0M    188.7M  1024  1024  5
 10.0M    188.7M   512  2048  5
 10.0M    188.7M  2048   512  5
 10.0M    188.7M  4096   256  5
 12.0M    225.3M  1024  2048  3
 12.0M    225.3M  2048  1024  3
 12.0M    225.3M  4096   512  3
 16.0M    298.1M  4096  2048  1
 18.0M    334.3M  1024  1024  9
 18.0M    334.3M   512  2048  9
 18.0M    334.3M  2048   512  9
 18.0M    334.3M  4096   256  9
 20.0M    370.4M  1024  2048  5
 20.0M    370.4M  2048  1024  5
 20.0M    370.4M  4096   512  5
 24.0M    442.3M  2048  2048  3
 24.0M    442.3M  4096  1024  3
 36.0M    656.2M  1024  2048  9
 36.0M    656.2M  2048  1024  9
 36.0M    656.2M  4096   512  9
 40.0M    727.0M  2048  2048  5
 40.0M    727.0M  4096  1024  5
 48.0M    868.1M  4096  2048  3
 72.0M   1287.5M  2048  2048  9
 72.0M   1287.5M  4096  1024  9
 80.0M   1426.4M  4096  2048  5
144.0M   2525.2M  4096  2048  9
Code:
C:\msys64\home\ken\gpuowl-compile\v6.0-b7bb1c3>openowl -list fft 2019-02-04 23:05:21 gpuowl 6.0-b7bb1c3 2019-02-04 23:05:21 -list fft 2019-02-04 23:05:21 FFT 8K [ 0.01M - 0.18M] 64-64 2019-02-04 23:05:21 FFT 24K [ 0.04M - 0.51M] 64-64-3 2019-02-04 23:05:21 FFT 32K [ 0.05M - 0.68M] 64-256 256-64 2019-02-04 23:05:21 FFT 40K [ 0.06M - 0.85M] 64-64-5 2019-02-04 23:05:21 FFT 64K [ 0.10M - 1.34M] 64-512 512-64 2019-02-04 23:05:21 FFT 72K [ 0.11M - 1.50M] 64-64-9 2019-02-04 23:05:21 FFT 96K [ 0.15M - 1.99M] 64-256-3 256-64-3 2019-02-04 23:05:21 FFT 128K [ 0.20M - 2.63M] 1K-64 64-1K 256-256 2019-02-04 23:05:21 FFT 160K [ 0.25M - 3.27M] 64-256-5 256-64-5 2019-02-04 23:05:21 FFT 192K [ 0.29M - 3.91M] 64-512-3 512-64-3 2019-02-04 23:05:21 FFT 256K [ 0.39M - 5.18M] 64-2K 256-512 512-256 2K-64 2019-02-04 23:05:21 FFT 288K [ 0.44M - 5.81M] 64-256-9 256-64-9 2019-02-04 23:05:21 FFT 320K [ 0.49M - 6.44M] 64-512-5 512-64-5 2019-02-04 23:05:21 FFT 384K [ 0.59M - 7.69M] 1K-64-3 64-1K-3 256-256-3 2019-02-04 23:05:21 FFT 512K [ 0.79M - 10.18M] 1K-256 256-1K 512-512 4K-64 2019-02-04 23:05:21 FFT 576K [ 0.88M - 11.42M] 64-512-9 512-64-9 2019-02-04 23:05:21 FFT 640K [ 0.98M - 12.66M] 1K-64-5 64-1K-5 256-256-5 2019-02-04 23:05:21 FFT 768K [ 1.18M - 15.12M] 64-2K-3 256-512-3 512-256-3 2K-64-3 2019-02-04 23:05:21 FFT 1M [ 1.57M - 20.02M] 1K-512 256-2K 512-1K 2K-256 2019-02-04 23:05:21 FFT 1152K [ 1.77M - 22.45M] 1K-64-9 64-1K-9 256-256-9 2019-02-04 23:05:21 FFT 1280K [ 1.97M - 24.88M] 64-2K-5 256-512-5 512-256-5 2K-64-5 2019-02-04 23:05:21 FFT 1536K [ 2.36M - 29.72M] 1K-256-3 256-1K-3 512-512-3 4K-64-3 2019-02-04 23:05:21 FFT 2M [ 3.15M - 39.34M] 1K-1K 512-2K 2K-512 4K-256 2019-02-04 23:05:21 FFT 2304K [ 3.54M - 44.13M] 64-2K-9 256-512-9 512-256-9 2K-64-9 2019-02-04 23:05:21 FFT 2560K [ 3.93M - 48.90M] 1K-256-5 256-1K-5 512-512-5 4K-64-5 2019-02-04 23:05:21 FFT 3M [ 4.72M - 58.41M] 1K-512-3 256-2K-3 512-1K-3 2K-256-3 2019-02-04 23:05:21 FFT 4M [ 6.29M - 77.30M] 1K-2K 2K-1K 4K-512 2019-02-04 
23:05:21 FFT 4608K [ 7.08M - 86.70M] 1K-256-9 256-1K-9 512-512-9 4K-64-9 2019-02-04 23:05:21 FFT 5M [ 7.86M - 96.07M] 1K-512-5 256-2K-5 512-1K-5 2K-256-5 2019-02-04 23:05:21 FFT 6M [ 9.44M - 114.74M] 1K-1K-3 512-2K-3 2K-512-3 4K-256-3 2019-02-04 23:05:21 FFT 8M [ 12.58M - 151.83M] 2K-2K 4K-1K 2019-02-04 23:05:21 FFT 9M [ 14.16M - 170.28M] 1K-512-9 256-2K-9 512-1K-9 2K-256-9 2019-02-04 23:05:21 FFT 10M [ 15.73M - 188.68M] 1K-1K-5 512-2K-5 2K-512-5 4K-256-5 2019-02-04 23:05:21 FFT 12M [ 18.87M - 225.32M] 1K-2K-3 2K-1K-3 4K-512-3 2019-02-04 23:05:21 FFT 16M [ 25.17M - 298.13M] 4K-2K 2019-02-04 23:05:21 FFT 18M [ 28.31M - 334.34M] 1K-1K-9 512-2K-9 2K-512-9 4K-256-9 2019-02-04 23:05:21 FFT 20M [ 31.46M - 370.44M] 1K-2K-5 2K-1K-5 4K-512-5 2019-02-04 23:05:21 FFT 24M [ 37.75M - 442.34M] 2K-2K-3 4K-1K-3 2019-02-04 23:05:21 FFT 36M [ 56.62M - 656.22M] 1K-2K-9 2K-1K-9 4K-512-9 2019-02-04 23:05:21 FFT 40M [ 62.91M - 727.03M] 2K-2K-5 4K-1K-5 2019-02-04 23:05:21 FFT 48M [ 75.50M - 868.07M] 4K-2K-3 2019-02-04 23:05:21 FFT 72M [113.25M - 1287.53M] 2K-2K-9 4K-1K-9 2019-02-04 23:05:21 FFT 80M [125.83M - 1426.38M] 4K-2K-5 2019-02-04 23:05:21 FFT 144M [226.49M - 2525.23M] 4K-2K-9 Code:
FFT Configurations: FFT 8K [ 0.01M - 0.18M] 64-64 FFT 32K [ 0.05M - 0.68M] 64-256 256-64 FFT 48K [ 0.07M - 1.01M] 64-64-6 FFT 64K [ 0.10M - 1.34M] 64-512 512-64 FFT 72K [ 0.11M - 1.50M] 64-64-9 FFT 80K [ 0.12M - 1.66M] 64-64-10 FFT 128K [ 0.20M - 2.63M] 1K-64 64-1K 256-256 FFT 192K [ 0.29M - 3.91M] 64-256-6 256-64-6 FFT 256K [ 0.39M - 5.18M] 64-2K 256-512 512-256 2K-64 FFT 288K [ 0.44M - 5.81M] 64-256-9 256-64-9 FFT 320K [ 0.49M - 6.44M] 64-256-10 256-64-10 FFT 384K [ 0.59M - 7.69M] 64-512-6 512-64-6 FFT 512K [ 0.79M - 10.18M] 1K-256 256-1K 512-512 4K-64 FFT 576K [ 0.88M - 11.42M] 64-512-9 512-64-9 FFT 640K [ 0.98M - 12.66M] 64-512-10 512-64-10 FFT 768K [ 1.18M - 15.12M] 1K-64-6 64-1K-6 256-256-6 FFT 1M [ 1.57M - 20.02M] 1K-512 256-2K 512-1K 2K-256 FFT 1152K [ 1.77M - 22.45M] 1K-64-9 64-1K-9 256-256-9 FFT 1280K [ 1.97M - 24.88M] 1K-64-10 64-1K-10 256-256-10 FFT 1536K [ 2.36M - 29.72M] 64-2K-6 256-512-6 512-256-6 2K-64-6 FFT 2M [ 3.15M - 39.34M] 1K-1K 512-2K 2K-512 4K-256 FFT 2304K [ 3.54M - 44.13M] 64-2K-9 256-512-9 512-256-9 2K-64-9 FFT 2560K [ 3.93M - 48.90M] 64-2K-10 256-512-10 512-256-10 2K-64-10 FFT 3M [ 4.72M - 58.41M] 1K-256-6 256-1K-6 512-512-6 4K-64-6 FFT 4M [ 6.29M - 77.30M] 1K-2K 2K-1K 4K-512 FFT 4608K [ 7.08M - 86.70M] 1K-256-9 256-1K-9 512-512-9 4K-64-9 FFT 5M [ 7.86M - 96.07M] 1K-256-10 256-1K-10 512-512-10 4K-64-10 FFT 6M [ 9.44M - 114.74M] 1K-512-6 256-2K-6 512-1K-6 2K-256-6 FFT 8M [ 12.58M - 151.83M] 2K-2K 4K-1K FFT 9M [ 14.16M - 170.28M] 1K-512-9 256-2K-9 512-1K-9 2K-256-9 FFT 10M [ 15.73M - 188.68M] 1K-512-10 256-2K-10 512-1K-10 2K-256-10 FFT 12M [ 18.87M - 225.32M] 1K-1K-6 512-2K-6 2K-512-6 4K-256-6 FFT 16M [ 25.17M - 298.13M] 4K-2K FFT 18M [ 28.31M - 334.34M] 1K-1K-9 512-2K-9 2K-512-9 4K-256-9 FFT 20M [ 31.46M - 370.44M] 1K-1K-10 512-2K-10 2K-512-10 4K-256-10 FFT 24M [ 37.75M - 442.34M] 1K-2K-6 2K-1K-6 4K-512-6 FFT 36M [ 56.62M - 656.22M] 1K-2K-9 2K-1K-9 4K-512-9 FFT 40M [ 62.91M - 727.03M] 1K-2K-10 2K-1K-10 4K-512-10 FFT 48M [ 75.50M - 
868.07M] 2K-2K-6 4K-1K-6 FFT 72M [113.25M - 1287.53M] 2K-2K-9 4K-1K-9 FFT 80M [125.83M - 1426.38M] 2K-2K-10 4K-1K-10 FFT 96M [150.99M - 1702.92M] 4K-2K-6 FFT 144M [226.49M - 2525.23M] 4K-2K-9 FFT 160M [251.66M - 2797.39M] 4K-2K-10 For v6.5-c48d46f (but note, don't use combinations with height 64 and a middle step; https://www.mersenneforum.org/showpost.php?p=517774&postcount=1204 assuming the fft list is again W H Middle, that's the ones below in bold): Code:
FFT Configurations: FFT 8K [ 0.01M - 0.18M] 64-64 FFT 32K [ 0.05M - 0.68M] 64-256 256-64 FFT 48K [ 0.07M - 1.01M] 64-64-6 FFT 64K [ 0.10M - 1.34M] 64-512 512-64 FFT 72K [ 0.11M - 1.50M] 64-64-9 FFT 80K [ 0.12M - 1.66M] 64-64-10 FFT 128K [ 0.20M - 2.63M] 1K-64 64-1K 256-256 FFT 192K [ 0.29M - 3.91M] 64-256-6 256-64-6 FFT 256K [ 0.39M - 5.18M] 64-2K 256-512 512-256 2K-64 FFT 288K [ 0.44M - 5.81M] 64-256-9 256-64-9 FFT 320K [ 0.49M - 6.44M] 64-256-10 256-64-10 FFT 384K [ 0.59M - 7.69M] 64-512-6 512-64-6 FFT 512K [ 0.79M - 10.18M] 1K-256 256-1K 512-512 4K-64 FFT 576K [ 0.88M - 11.42M] 64-512-9 512-64-9 FFT 640K [ 0.98M - 12.66M] 64-512-10 512-64-10 FFT 768K [ 1.18M - 15.12M] 1K-64-6 64-1K-6 256-256-6 FFT 1M [ 1.57M - 20.02M] 1K-512 256-2K 512-1K 2K-256 FFT 1152K [ 1.77M - 22.45M] 1K-64-9 64-1K-9 256-256-9 FFT 1280K [ 1.97M - 24.88M] 1K-64-10 64-1K-10 256-256-10 FFT 1536K [ 2.36M - 29.72M] 64-2K-6 256-512-6 512-256-6 2K-64-6 FFT 2M [ 3.15M - 39.34M] 1K-1K 512-2K 2K-512 4K-256 FFT 2304K [ 3.54M - 44.13M] 64-2K-9 256-512-9 512-256-9 2K-64-9 FFT 2560K [ 3.93M - 48.90M] 64-2K-10 256-512-10 512-256-10 2K-64-10 FFT 3M [ 4.72M - 58.41M] 1K-256-6 256-1K-6 512-512-6 4K-64-6 FFT 4M [ 6.29M - 77.30M] 1K-2K 2K-1K 4K-512 FFT 4608K [ 7.08M - 86.70M] 1K-256-9 256-1K-9 512-512-9 4K-64-9 FFT 5M [ 7.86M - 96.07M] 1K-256-10 256-1K-10 512-512-10 4K-64-10 FFT 6M [ 9.44M - 114.74M] 1K-512-6 256-2K-6 512-1K-6 2K-256-6 FFT 8M [ 12.58M - 151.83M] 2K-2K 4K-1K FFT 9M [ 14.16M - 170.28M] 1K-512-9 256-2K-9 512-1K-9 2K-256-9 FFT 10M [ 15.73M - 188.68M] 1K-512-10 256-2K-10 512-1K-10 2K-256-10 FFT 12M [ 18.87M - 225.32M] 1K-1K-6 512-2K-6 2K-512-6 4K-256-6 FFT 16M [ 25.17M - 298.13M] 4K-2K FFT 18M [ 28.31M - 334.34M] 1K-1K-9 512-2K-9 2K-512-9 4K-256-9 FFT 20M [ 31.46M - 370.44M] 1K-1K-10 512-2K-10 2K-512-10 4K-256-10 FFT 24M [ 37.75M - 442.34M] 1K-2K-6 2K-1K-6 4K-512-6 FFT 36M [ 56.62M - 656.22M] 1K-2K-9 2K-1K-9 4K-512-9 FFT 40M [ 62.91M - 727.03M] 1K-2K-10 2K-1K-10 4K-512-10 FFT 48M [ 75.50M - 
868.07M] 2K-2K-6 4K-1K-6 FFT 72M [113.25M - 1287.53M] 2K-2K-9 4K-1K-9 FFT 80M [125.83M - 1426.38M] 2K-2K-10 4K-1K-10 FFT 96M [150.99M - 1702.92M] 4K-2K-6 FFT 144M [226.49M - 2525.23M] 4K-2K-9 FFT 160M [251.66M - 2797.39M] 4K-2K-10 Following are for V6.7-4. Code:
FFT Configurations: FFT 8K [ 0.01M - 0.17M] 64-64 FFT 32K [ 0.05M - 0.68M] 64-256 256-64 FFT 64K [ 0.10M - 1.33M] 64-512 512-64 FFT 128K [ 0.20M - 2.62M] 1K-64 64-1K 256-256 FFT 192K [ 0.29M - 3.89M] 64-256-6 FFT 224K [ 0.34M - 4.52M] 64-256-7 FFT 256K [ 0.39M - 5.15M] 64-2K 256-512 512-256 2K-64 FFT 288K [ 0.44M - 5.77M] 64-256-9 FFT 320K [ 0.49M - 6.40M] 64-256-10 FFT 352K [ 0.54M - 7.02M] 64-256-11 FFT 384K [ 0.59M - 7.64M] 64-256-12 64-512-6 FFT 448K [ 0.69M - 8.88M] 64-512-7 FFT 512K [ 0.79M - 10.12M] 1K-256 256-1K 512-512 4K-64 FFT 576K [ 0.88M - 11.35M] 64-512-9 FFT 640K [ 0.98M - 12.58M] 64-512-10 FFT 704K [ 1.08M - 13.81M] 64-512-11 FFT 768K [ 1.18M - 15.03M] 64-512-12 64-1K-6 256-256-6 FFT 896K [ 1.38M - 17.47M] 64-1K-7 256-256-7 FFT 1M [ 1.57M - 19.89M] 1K-512 256-2K 512-1K 2K-256 FFT 1152K [ 1.77M - 22.32M] 64-1K-9 256-256-9 FFT 1280K [ 1.97M - 24.73M] 64-1K-10 256-256-10 FFT 1408K [ 2.16M - 27.14M] 64-1K-11 256-256-11 FFT 1536K [ 2.36M - 29.54M] 64-1K-12 64-2K-6 256-256-12 256-512-6 512-256-6 FFT 1792K [ 2.75M - 34.33M] 64-2K-7 256-512-7 512-256-7 FFT 2M [ 3.15M - 39.10M] 1K-1K 512-2K 2K-512 4K-256 FFT 2304K [ 3.54M - 43.85M] 64-2K-9 256-512-9 512-256-9 FFT 2560K [ 3.93M - 48.59M] 64-2K-10 256-512-10 512-256-10 FFT 2816K [ 4.33M - 53.32M] 64-2K-11 256-512-11 512-256-11 FFT 3M [ 4.72M - 58.04M] 1K-256-6 64-2K-12 256-512-12 256-1K-6 512-256-12 512-512-6 FFT 3584K [ 5.51M - 67.44M] 1K-256-7 256-1K-7 512-512-7 FFT 4M [ 6.29M - 76.81M] 1K-2K 2K-1K 4K-512 FFT 4608K [ 7.08M - 86.15M] 1K-256-9 256-1K-9 512-512-9 FFT 5M [ 7.86M - 95.46M] 1K-256-10 256-1K-10 512-512-10 FFT 5632K [ 8.65M - 104.74M] 1K-256-11 256-1K-11 512-512-11 FFT 6M [ 9.44M - 114.00M] 1K-256-12 1K-512-6 256-1K-12 256-2K-6 512-512-12 512-1K-6 2K-256-6 FFT 7M [ 11.01M - 132.46M] 1K-512-7 256-2K-7 512-1K-7 2K-256-7 FFT 8M [ 12.58M - 150.85M] 2K-2K 4K-1K FFT 9M [ 14.16M - 169.18M] 1K-512-9 256-2K-9 512-1K-9 2K-256-9 FFT 10M [ 15.73M - 187.45M] 1K-512-10 256-2K-10 512-1K-10 2K-256-10 FFT 11M [ 
17.30M - 205.67M] 1K-512-11 256-2K-11 512-1K-11 2K-256-11 FFT 12M [ 18.87M - 223.85M] 1K-512-12 1K-1K-6 256-2K-12 512-1K-12 512-2K-6 2K-256-12 2K-512-6 4K-256-6 FFT 14M [ 22.02M - 260.08M] 1K-1K-7 512-2K-7 2K-512-7 4K-256-7 FFT 16M [ 25.17M - 296.17M] 4K-2K FFT 18M [ 28.31M - 332.13M] 1K-1K-9 512-2K-9 2K-512-9 4K-256-9 FFT 20M [ 31.46M - 367.98M] 1K-1K-10 512-2K-10 2K-512-10 4K-256-10 FFT 22M [ 34.60M - 403.74M] 1K-1K-11 512-2K-11 2K-512-11 4K-256-11 FFT 24M [ 37.75M - 439.40M] 1K-1K-12 1K-2K-6 512-2K-12 2K-512-12 2K-1K-6 4K-256-12 4K-512-6 FFT 28M [ 44.04M - 510.47M] 1K-2K-7 2K-1K-7 4K-512-7 FFT 36M [ 56.62M - 651.81M] 1K-2K-9 2K-1K-9 4K-512-9 FFT 40M [ 62.91M - 722.13M] 1K-2K-10 2K-1K-10 4K-512-10 FFT 44M [ 69.21M - 792.25M] 1K-2K-11 2K-1K-11 4K-512-11 FFT 48M [ 75.50M - 862.18M] 1K-2K-12 2K-1K-12 2K-2K-6 4K-512-12 4K-1K-6 FFT 56M [ 88.08M - 1001.57M] 2K-2K-7 4K-1K-7 FFT 72M [113.25M - 1278.70M] 2K-2K-9 4K-1K-9 FFT 80M [125.83M - 1416.57M] 2K-2K-10 4K-1K-10 FFT 88M [138.41M - 1554.04M] 2K-2K-11 4K-1K-11 FFT 96M [150.99M - 1691.15M] 2K-2K-12 4K-1K-12 4K-2K-6 FFT 112M [176.16M - 1964.39M] 4K-2K-7 FFT 144M [226.49M - 2507.57M] 4K-2K-9 FFT 160M [251.66M - 2777.78M] 4K-2K-10 FFT 176M [276.82M - 3047.18M] 4K-2K-11 FFT 192M [301.99M - 3315.86M] 4K-2K-12 Code:
FFT Configurations: FFT 128K [ 0.20M - 2.62M] 256-256 FFT 256K [ 0.39M - 5.15M] 256-512 512-256 FFT 512K [ 0.79M - 10.12M] 1K-256 256-256-4 256-1K 512-512 FFT 768K [ 1.18M - 15.03M] 256-256-6 FFT 896K [ 1.38M - 17.47M] 256-256-7 FFT 1M [ 1.57M - 19.89M] 1K-512 256-256-8 256-512-4 256-2K 512-256-4 512-1K 2K-256 FFT 1152K [ 1.77M - 22.32M] 256-256-9 FFT 1280K [ 1.97M - 24.73M] 256-256-10 FFT 1408K [ 2.16M - 27.14M] 256-256-11 FFT 1536K [ 2.36M - 29.54M] 256-256-12 256-512-6 512-256-6 FFT 1792K [ 2.75M - 34.33M] 256-512-7 512-256-7 FFT 2M [ 3.15M - 39.10M] 1K-256-4 1K-1K 256-512-8 256-1K-4 512-256-8 512-512-4 512-2K 2K-512 4K-256 FFT 2304K [ 3.54M - 43.85M] 256-512-9 512-256-9 FFT 2560K [ 3.93M - 48.59M] 256-512-10 512-256-10 FFT 2816K [ 4.33M - 53.32M] 256-512-11 512-256-11 FFT 3M [ 4.72M - 58.04M] 1K-256-6 256-512-12 256-1K-6 512-256-12 512-512-6 FFT 3584K [ 5.51M - 67.44M] 1K-256-7 256-1K-7 512-512-7 FFT 4M [ 6.29M - 76.81M] 1K-256-8 1K-512-4 1K-2K 256-1K-8 256-2K-4 512-512-8 512-1K-4 2K-256-4 2K-1K 4K-512 FFT 4608K [ 7.08M - 86.15M] 1K-256-9 256-1K-9 512-512-9 FFT 5M [ 7.86M - 95.46M] 1K-256-10 256-1K-10 512-512-10 FFT 5632K [ 8.65M - 104.74M] 1K-256-11 256-1K-11 512-512-11 FFT 6M [ 9.44M - 114.00M] 1K-256-12 1K-512-6 256-1K-12 256-2K-6 512-512-12 512-1K-6 2K-256-6 FFT 7M [ 11.01M - 132.46M] 1K-512-7 256-2K-7 512-1K-7 2K-256-7 FFT 8M [ 12.58M - 150.85M] 1K-512-8 1K-1K-4 256-2K-8 512-1K-8 512-2K-4 2K-256-8 2K-512-4 2K-2K 4K-256-4 4K-1K FFT 9M [ 14.16M - 169.18M] 1K-512-9 256-2K-9 512-1K-9 2K-256-9 FFT 10M [ 15.73M - 187.45M] 1K-512-10 256-2K-10 512-1K-10 2K-256-10 FFT 11M [ 17.30M - 205.67M] 1K-512-11 256-2K-11 512-1K-11 2K-256-11 FFT 12M [ 18.87M - 223.85M] 1K-512-12 1K-1K-6 256-2K-12 512-1K-12 512-2K-6 2K-256-12 2K-512-6 4K-256-6 FFT 14M [ 22.02M - 260.08M] 1K-1K-7 512-2K-7 2K-512-7 4K-256-7 FFT 16M [ 25.17M - 296.17M] 1K-1K-8 1K-2K-4 512-2K-8 2K-512-8 2K-1K-4 4K-256-8 4K-512-4 4K-2K FFT 18M [ 28.31M - 332.13M] 1K-1K-9 512-2K-9 2K-512-9 4K-256-9 FFT 20M [ 31.46M 
- 367.98M] 1K-1K-10 512-2K-10 2K-512-10 4K-256-10 FFT 22M [ 34.60M - 403.74M] 1K-1K-11 512-2K-11 2K-512-11 4K-256-11 FFT 24M [ 37.75M - 439.40M] 1K-1K-12 1K-2K-6 512-2K-12 2K-512-12 2K-1K-6 4K-256-12 4K-512-6 FFT 28M [ 44.04M - 510.47M] 1K-2K-7 2K-1K-7 4K-512-7 FFT 32M [ 50.33M - 581.27M] 1K-2K-8 2K-1K-8 2K-2K-4 4K-512-8 4K-1K-4 FFT 36M [ 56.62M - 651.81M] 1K-2K-9 2K-1K-9 4K-512-9 FFT 40M [ 62.91M - 722.13M] 1K-2K-10 2K-1K-10 4K-512-10 FFT 44M [ 69.21M - 792.25M] 1K-2K-11 2K-1K-11 4K-512-11 FFT 48M [ 75.50M - 862.18M] 1K-2K-12 2K-1K-12 2K-2K-6 4K-512-12 4K-1K-6 FFT 56M [ 88.08M - 1001.57M] 2K-2K-7 4K-1K-7 FFT 64M [100.66M - 1140.39M] 2K-2K-8 4K-1K-8 4K-2K-4 FFT 72M [113.25M - 1278.70M] 2K-2K-9 4K-1K-9 FFT 80M [125.83M - 1416.57M] 2K-2K-10 4K-1K-10 FFT 88M [138.41M - 1554.04M] 2K-2K-11 4K-1K-11 FFT 96M [150.99M - 1691.15M] 2K-2K-12 4K-1K-12 4K-2K-6 FFT 112M [176.16M - 1964.39M] 4K-2K-7 FFT 128M [201.33M - 2236.48M] 4K-2K-8 FFT 144M [226.49M - 2507.57M] 4K-2K-9 FFT 160M [251.66M - 2777.78M] 4K-2K-10 FFT 176M [276.82M - 3047.18M] 4K-2K-11 FFT 192M [301.99M - 3315.86M] 4K-2K-12 Code:
FFT Configurations (specify with -fft <width>:<middle>:<height> from the set below):
FFT 128K [ 0.20M - 2.63M] 256:1:256
FFT 256K [ 0.39M - 5.18M] 256:1:512 512:1:256
FFT 384K [ 0.59M - 7.72M] 256:3:256
FFT 512K [ 0.79M - 10.25M] 256:4:256
FFT 640K [ 0.98M - 12.72M] 256:5:256
FFT 768K [ 1.18M - 15.22M] 256:6:256 256:3:512 512:3:256
FFT 896K [ 1.38M - 17.68M] 256:7:256
FFT 1M [ 1.57M - 20.20M] 256:8:256 256:4:512 512:4:256
FFT 1152K [ 1.77M - 22.62M] 256:9:256
FFT 1.25M [ 1.97M - 25.07M] 256:10:256 256:5:512 512:5:256
FFT 1408K [ 2.16M - 27.52M] 256:11:256
FFT 1.50M [ 2.36M - 30.00M] 1K:3:256 256:12:256 256:6:512 256:3:1K 512:6:256 512:3:512
FFT 1664K [ 2.56M - 32.44M] 256:13:256
FFT 1.75M [ 2.75M - 34.85M] 256:14:256 256:7:512 512:7:256
FFT 1920K [ 2.95M - 37.23M] 256:15:256
FFT 2M [ 3.15M - 39.82M] 1K:4:256 256:8:512 256:4:1K 512:8:256 512:4:512
FFT 2.25M [ 3.54M - 44.57M] 256:9:512 512:9:256
FFT 2.50M [ 3.93M - 49.41M] 1K:5:256 256:10:512 256:5:1K 512:10:256 512:5:512
FFT 2.75M [ 4.33M - 54.24M] 256:11:512 512:11:256
FFT 3M [ 4.72M - 59.13M] 1K:6:256 1K:3:512 256:12:512 256:6:1K 512:12:256 512:6:512 512:3:1K
FFT 3.25M [ 5.11M - 63.93M] 256:13:512 512:13:256
FFT 3.50M [ 5.51M - 68.67M] 1K:7:256 256:14:512 256:7:1K 512:14:256 512:7:512
FFT 3.75M [ 5.90M - 73.37M] 256:15:512 512:15:256
FFT 4M [ 6.29M - 78.46M] 1K:8:256 1K:4:512 256:8:1K 512:8:512 512:4:1K
FFT 4.50M [ 7.08M - 87.83M] 1K:9:256 256:9:1K 512:9:512
FFT 5M [ 7.86M - 97.36M] 1K:10:256 1K:5:512 256:10:1K 512:10:512 512:5:1K
FFT 5.50M [ 8.65M - 106.88M] 1K:11:256 256:11:1K 512:11:512
FFT 6M [ 9.44M - 116.51M] 1K:12:256 1K:6:512 1K:3:1K 256:12:1K 512:12:512 512:6:1K 4K:3:256
FFT 6.50M [ 10.22M - 125.95M] 1K:13:256 256:13:1K 512:13:512
FFT 7M [ 11.01M - 135.29M] 1K:14:256 1K:7:512 256:14:1K 512:14:512 512:7:1K
FFT 7.50M [ 11.80M - 144.55M] 1K:15:256 256:15:1K 512:15:512
FFT 8M [ 12.58M - 154.59M] 1K:8:512 1K:4:1K 512:8:1K 4K:4:256
FFT 9M [ 14.16M - 173.03M] 1K:9:512 512:9:1K
FFT 10M [ 15.73M - 191.79M] 1K:10:512 1K:5:1K 512:10:1K 4K:5:256
FFT 11M [ 17.30M - 210.53M] 1K:11:512 512:11:1K
FFT 12M [ 18.87M - 229.51M] 1K:12:512 1K:6:1K 512:12:1K 4K:6:256 4K:3:512
FFT 13M [ 20.45M - 248.10M] 1K:13:512 512:13:1K
FFT 14M [ 22.02M - 266.49M] 1K:14:512 1K:7:1K 512:14:1K 4K:7:256
FFT 15M [ 23.59M - 284.71M] 1K:15:512 512:15:1K
FFT 16M [ 25.17M - 304.49M] 1K:8:1K 4K:8:256 4K:4:512
FFT 18M [ 28.31M - 340.79M] 1K:9:1K 4K:9:256
FFT 20M [ 31.46M - 377.72M] 1K:10:1K 4K:10:256 4K:5:512
FFT 22M [ 34.60M - 414.63M] 1K:11:1K 4K:11:256
FFT 24M [ 37.75M - 451.99M] 1K:12:1K 4K:12:256 4K:6:512 4K:3:1K
FFT 26M [ 40.89M - 488.59M] 1K:13:1K 4K:13:256
FFT 28M [ 44.04M - 524.79M] 1K:14:1K 4K:14:256 4K:7:512
FFT 30M [ 47.19M - 560.64M] 1K:15:1K 4K:15:256
FFT 32M [ 50.33M - 599.62M] 4K:8:512 4K:4:1K
FFT 36M [ 56.62M - 671.04M] 4K:9:512
FFT 40M [ 62.91M - 743.74M] 4K:10:512 4K:5:1K
FFT 44M [ 69.21M - 816.39M] 4K:11:512
FFT 48M [ 75.50M - 889.11M] 4K:12:512 4K:6:1K
FFT 52M [ 81.79M - 961.97M] 4K:13:512
FFT 56M [ 88.08M - 1033.20M] 4K:14:512 4K:7:1K
FFT 60M [ 94.37M - 1103.74M] 4K:15:512
FFT 64M [100.66M - 1177.31M] 4K:8:1K
FFT 72M [113.25M - 1321.02M] 4K:9:1K
FFT 80M [125.83M - 1464.31M] 4K:10:1K
FFT 88M [138.41M - 1607.03M] 4K:11:1K
FFT 96M [150.99M - 1751.79M] 4K:12:1K
FFT 104M [163.58M - 1893.52M] 4K:13:1K
FFT 112M [176.16M - 2035.14M] 4K:14:1K
FFT 120M [188.74M - 2172.36M] 4K:15:1K
Last fiddled with by kriesel on 2021-03-31 at 00:54 Reason: minor edits, added v4.3 fft choices |
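As an illustration of reading the listing above, here is a minimal Python sketch that parses a few rows and reports which FFT sizes list a given exponent inside their bracketed range. It is not part of gpuowl; the function names `parse_rows` and `candidates` are hypothetical, and it assumes the "M" in the ranges means 10^6. The bracketed ranges are wide, so this only shows which sizes are listed for an exponent, not which one is fastest.

```python
import re

# Three rows copied from the listing above (width:middle:height form).
ROWS = """\
FFT 4.50M [ 7.08M - 87.83M] 1K:9:256 256:9:1K 512:9:512
FFT 5M [ 7.86M - 97.36M] 1K:10:256 1K:5:512 256:10:1K 512:10:512 512:5:1K
FFT 5.50M [ 8.65M - 106.88M] 1K:11:256 256:11:1K 512:11:512
"""

# FFT <size> [ <low>M - <high>M] <config> <config> ...
ROW_RE = re.compile(r"FFT\s+(\S+)\s+\[\s*([\d.]+)M\s*-\s*([\d.]+)M\]\s+(.*)")


def parse_rows(text):
    """Return a list of (size, low_exponent, high_exponent, configs)."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            size, lo, hi, configs = m.groups()
            # Assumption: "M" means 10**6 exponent units.
            rows.append((size, float(lo) * 1e6, float(hi) * 1e6, configs.split()))
    return rows


def candidates(exponent, rows):
    """FFT sizes (with their configs) whose listed range contains the exponent."""
    return [(size, cfgs) for size, lo, hi, cfgs in rows if lo <= exponent <= hi]


if __name__ == "__main__":
    for size, cfgs in candidates(89000167, parse_rows(ROWS)):
        print(size, " ".join(cfgs))
```

A configuration found this way would be passed with the -fft option in the width:middle:height form named in the listing's first line.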
2018-12-14, 19:46 | #10 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1101110111101_{2} Posts |
PRP3 run time scaling in V5.0-9c13870 (no P-1)
Gpuowl PRP3 has been run on all known Mersenne prime exponents feasible at its currently available fft lengths, mostly in ascending order. From a single run set, this provides run time scaling, a reliability check on the hardware, and a check for any false negatives or detected errors. The tests were run on an RX480 under Windows 7 x64, alongside a running instance of prime95, and mfakto running on an RX550 in the same system.
For the exponents below 216091, the minimum available fft length, 128K, is too large, giving bits/word below 1.5 and, in most cases, immediate fatal errors. p=132049 runs briefly, at 1.01 bits/word, but repeatably detects Gerbicz check errors in the initial 800-iteration block and exits after 3 rounds of that.
For exponents 216091 to 1398269, the run time is very nearly linear in the exponent, p^{0.99}, since they are all run at fft length 128K.
For exponents above 1398269, since the fft length is chosen approximately proportional to the exponent, it seems reasonable to expect the scaling to approximate a power law with exponent above 2, since fft multiplication time is, per Knuth and other sources, proportional to n ln n ln ln n. A full PRP3 test takes n-1 iterations, so the total is approximately n^{2} ln n ln ln n for large n. The attachment for CUDALucas run time scaling at https://www.mersenneforum.org/showpo...23&postcount=2 shows scaling as p^{1.85} for 10^{6}<p<10^{7}, and as p^{2.095} for 10^{7}<p<10^{8}. Run time scaling for prime95 for 86243<=p<=2976221 was p^{2.094} (https://www.mersenneforum.org/showpo...78&postcount=2).
The scaling for gpuowl appears to be lower than expected and lower than seen for other applications. For 1398269<p<10^{7}, run time scales as p^{1.518}; for 10^{7}<p<10^{8} it is p^{1.72 to 1.88}. That implies a per-iteration (fft multiplication) time that grows less than linearly with p, similar to what CUDALucas shows in a lower exponent range. Perhaps gpuowl does not reach asymptotic scaling until higher exponents. From 100M exponent to 100Mdigit, the gpuowl scaling was p^{2.04}, consistent with that.
Low n runs appear to be affected by setup overhead in CUDALucas and clLucas also, reducing the power seen in scaling fits. For gpuOwL, the OpenCL compilation at each start contributes 2 to 3 seconds of overhead. Frequent console or log output may also contribute.
Finally, and importantly, no false negatives and no detected errors were observed.
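For readers who want to reproduce this kind of scaling figure from their own timings, here is a minimal Python sketch. It is not the method used to produce the numbers above; the function names are hypothetical, bits/word is assumed to be exponent divided by fft length in words, and the data points fed to the fit are synthetic values generated from the idealized n^{2} ln n ln ln n model, not measured gpuowl run times.

```python
import math


def bits_per_word(exponent, fft_words):
    """Average bits of the exponent carried per FFT word (assumed definition)."""
    return exponent / fft_words


def fit_power_law(ps, ts):
    """Least-squares fit of t = c * p**k in log-log space; returns (k, c)."""
    xs = [math.log(p) for p in ps]
    ys = [math.log(t) for t in ts]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    k = sxy / sxx
    c = math.exp(my - k * mx)
    return k, c


def model_time(p):
    """Idealized run time up to a constant: ~p iterations of a p ln p ln ln p multiply."""
    return p * p * math.log(p) * math.log(math.log(p))


if __name__ == "__main__":
    # 128K-word FFT, as in the text above.
    print(f"p=132049 at 128K: {bits_per_word(132049, 128 * 1024):.2f} bits/word")

    # SYNTHETIC points from the idealized model between about 1e7 and 1e8.
    ps = [10_000_000 * 2 ** (i / 4) for i in range(14)]
    ts = [model_time(p) for p in ps]
    k, _ = fit_power_law(ps, ts)
    print(f"fitted power for the idealized model: {k:.3f}")
```

With real data, ps would be the exponents run and ts the measured run times; a fitted power well below the idealized value over some range is what suggests fixed overhead or sub-asymptotic behavior there.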
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2019-11-17 at 14:52 |
2019-02-13, 01:55 | #11 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3^{3}×263 Posts |
gpuowl .owl file header style versus gpuowl version samples
more <n>.owl or head <n>.owl
Z:\sources\mersennes\gpuowl\ken\v10test>more c77500079.ll (none)
Code:
Pí² d gÜ? ?g¦ s¢?
Code:
OWL 3 38000009 103000 0 500
Code:
OWL 3 77230663 1414500 0 500
Code:
OWL 3 89000167 41500 0 500
C:\msys64\home\ken\gpuowl-compile\v3.3>more 89000167.owl
Code:
OWL 5
Comment: gpuOwL v3.3-bc4a29f; 2018-11-04 03:07:53 UTC
Type: PRP
Exponent: 89000167
Iteration: 22000
PRP-block-size: 400
Residue-64: 0xb90013de9a857278
Errors: 0
End-of-header:
Code:
OWL 5
Comment: gpuOwL v3.8-91c52fa; 2019-02-12 23:14:59 UTC
Type: PRP
Exponent: 299000059
Iteration: 93760000
PRP-block-size: 400
Residue-64: 0x95d3c1aae6883a8b
Errors: 0
End-of-header:
Code:
OWL 5
Comment: gpuOwL v3.9-da61ebd; 2018-11-04 03:43:17 UTC
Type: PRP
Exponent: 89000167
Iteration: 26400
PRP-block-size: 400
Residue-64: 0x0a03f10ca11565dc
Errors: 0
End-of-header:
C:\msys64\home\ken\gpuowl-compile\v4.3>head 89000167.owl
Code:
OWL PRP 7 89000167 144000 0 400 624cac006596e5bb
Code:
OWL PRP 7 89000167 22000 0 400 b90013de9a857278
Code:
OWL PRP 7 89000167 0 0 400 0000000000000003
Code:
OWL PRP 8 89000167 44000 0 400 57049b5adf2df847 1 0
Code:
OWL PRP 8 81885841 81760000 860000 400 35d1c3b4bd099ce1 1 0
PRP-only has B1=0
C:\msys64\home\ken\gpuowl-compile\v6.2-e2ffe65>more 86243.owl
Code:
OWL PRP 9 86243 800 400 47fcdf05631f4989
Does not include the old 0.1-0.6 LL file formats.
Does not include header info for any version above v6.2.
Didn't find a way to compile the TF-capable versions on Windows, so no such files to look into.
Haven't attempted any P-1-only.
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Last fiddled with by kriesel on 2019-11-17 at 14:54 Reason: added line for upper version limit of content |
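To show how the single-line headers above can be read programmatically, here is a minimal Python sketch for the "OWL PRP 7", "OWL PRP 8" and "OWL PRP 9" samples. The field names are assumptions inferred only from the samples and the "PRP-only has B1=0" note (exponent, iteration, B1, block size, 64-bit residue), not taken from the gpuowl source; the meaning of the two trailing values in the version 8 samples is not stated here, so they are returned uninterpreted. The function name `parse_owl_prp_header` is hypothetical.

```python
def parse_owl_prp_header(line):
    """Split a one-line 'OWL PRP <version> ...' header into named fields.

    Field names are inferred from the samples in this post (assumption).
    """
    tokens = line.split()
    if tokens[:2] != ["OWL", "PRP"] or len(tokens) < 3:
        raise ValueError("not a single-line OWL PRP header")
    version = int(tokens[2])
    fields = tokens[3:]
    if version in (7, 8):
        names = ["exponent", "iteration", "b1", "block_size", "res64"]
    elif version == 9:
        names = ["exponent", "iteration", "block_size", "res64"]
    else:
        raise ValueError(f"no sample layout known for version {version}")
    if len(fields) < len(names):
        raise ValueError("header has fewer fields than expected")
    header = {"version": version}
    for name, token in zip(names, fields):
        # Keep the residue as a hex string; everything else is a decimal integer.
        header[name] = token if name == "res64" else int(token)
    # Anything beyond the inferred fields is kept uninterpreted.
    header["extra"] = fields[len(names):]
    return header


if __name__ == "__main__":
    sample = "OWL PRP 8 81885841 81760000 860000 400 35d1c3b4bd099ce1 1 0"
    print(parse_owl_prp_header(sample))
```

Only the first line of a save file should be passed in; the rest of the file is binary residue data, and the older "OWL 3" and multi-line "OWL 5" layouts are deliberately not handled by this sketch.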