#463 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
Quote:
Or run v2.0 which has only 5000K, while you wait. It should work up to around 93M. It's around 5.25ms/iter on an RX-480. Code:
gpuOwL v2.0- GPU Mersenne primality checker
Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
Note: using long carry and fused tail kernels
OpenCL compilation in 1544 ms, with " -DEXP=83871259u -I. -cl-fast-relaxed-math -cl-kernel-arg-info "
PRP-3: FFT 5000K (625 * 4096 * 2) of 83871259 (16.38 bits/word)
[2018-07-05 16:49:02 Central Daylight Time] |
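The "16.38 bits/word" figure in the log above can be reproduced by dividing the exponent by the FFT length in words. This is only a small sketch of that arithmetic; the function name `bits_per_word` is made up for illustration and is not part of gpuOwL.

```python
def bits_per_word(exponent: int, fft_words: int) -> float:
    """Average number of bits of the exponent stored per FFT word."""
    return exponent / fft_words


# FFT length as printed in the log: 625 * 4096 * 2 words = 5000K (K = 1024).
fft_words = 625 * 4096 * 2
fft_k = fft_words // 1024

# Exponent from the log line.
bpw = bits_per_word(83871259, fft_words)
print(f"FFT {fft_k}K, {bpw:.2f} bits/word")

# The "around 93M" limit quoted in the post would correspond to roughly
# this many bits per word at the same FFT length.
print(f"{bits_per_word(93_000_000, fft_words):.2f} bits/word at 93M")
```

The higher the bits/word, the closer the run is to the edge where rounding errors start to appear, which is why the same 5000K length stops being usable somewhere above 93M.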
|
|
|
|
|
|
#464 | |
|
2·4,391 Posts |
Quote:
|
|
|
|
|
#465 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
Quote:
a) save a safety duplicate copy of the checkpoint file and then try the current exponent on v2.0
b) run the current exponent on 8M to completion in 3.x
c) switch to a different exponent that can run on v2.0 from the start in 5000K fft length
d) wait on the current exponent until Preda provides a 5M length in 3.x
Last fiddled with by kriesel on 2018-07-10 at 19:59 |
|
|
|
|
|
|
#466 | |
|
3×389 Posts |
Quote:
|
|
|
|
|
#467 |
|
2×3×967 Posts |
gpuOwl Memory Consumption (smemstat -m):
PID    Swap    USS      PSS      RSS       D  User  Command
1230   0.000   57.246   69.139   116.152   ↑  sel   ./openowl
1301   0.000   55.887   67.527   113.723   ↑  sel   ./openowl
1278   0.000   52.070   64.017   111.398   ↑  sel   ./openowl
1253   0.000   51.504   63.448   110.934   ↑  sel   ./openowl
1326   0.000   43.871   55.805   103.395   ↑  sel   ./openowl
4078   0.000   0.246    0.445    2.848     ↑  root  smemstat -m
Note: Memory reported in units of megabytes. |
|
|
|
#468 |
|
"Mihai Preda"
Apr 2015
3·457 Posts |
Hi, I'm happy to bring some exciting news: I have upgraded openOwl's FFT framework to make incorporating some NPOT sizes easier; and added a new factor-5 "middle step". Thus, openOwl should now support these FFT sizes: 4M, 5M, 8M, 10M, 16M, 20M.
What is good is that two of these sizes are particularly useful: 5M for wavefront PRP (80M -- 96M), and 20M for "100 million digits" PRP. The speeds (on my Vega64, stock, air, 1400MHz, 150W) are roughly 2.5ms/it for 5M FFT, and 9.77ms/it for 20M FFT.

The FFT size is by default chosen automatically based on the exponent, but can also be "forced" on the command line with -fft, e.g.: "-fft 8M", or "-fft +1" or "-fft -1" (use the next higher/lower size, relative to the auto-selected size).

Another piece of "news" is that openOwl uses a "rolling offset", which means that it dynamically changes the offset when an error is encountered. This trick allows an exponent to continue at the very upper edge of a given FFT size, where numerical errors are present. In my observations the benefit is small, allowing an exponent-size extension of (less than) 0.5%.

The "-block" command line argument sets the "block size" of the GEC ("Gerbicz Error Checking"). The values accepted now are 100, 200, 400. An error check is done every block^2 iterations (thus, 10K iterations for -block 100, and 160K iterations for -block 400). So, a smaller block detects errors earlier because it checks more often. The drawback is the cost: block 100 has an overhead of roughly 3%, while block 400 has an overhead of roughly 0.75%. Default is block 200, overhead 1.5%, check every 40K iterations. The block size can only be set when starting a new exponent; it is fixed afterwards for that exponent.

Bugs are expected.

Last fiddled with by preda on 2018-07-13 at 09:38 |
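The block-size trade-off described above can be tabulated with a few lines of Python. This is a sketch, not gpuOwl code: the check interval block^2 is taken from the post, while the `300 / block` overhead formula is only an assumed fit to the three quoted figures (3%, 1.5%, 0.75%), and the 2.5 ms/it timing is the 5M FFT figure quoted for the Vega64.

```python
BLOCK_SIZES = (100, 200, 400)  # values accepted by -block, per the post
MS_PER_ITER = 2.5              # quoted 5M FFT speed on a Vega64


def check_interval(block: int) -> int:
    """Iterations between two error checks: block squared."""
    return block * block


def approx_overhead_percent(block: int) -> float:
    """Assumed fit to the quoted overheads; not an official formula."""
    return 300.0 / block


def seconds_between_checks(block: int, ms_per_iter: float = MS_PER_ITER) -> float:
    """Wall-clock time between checks at the given iteration speed."""
    return check_interval(block) * ms_per_iter / 1000.0


for block in BLOCK_SIZES:
    print(
        f"-block {block}: check every {check_interval(block)} iterations, "
        f"about {seconds_between_checks(block):.0f} s apart, "
        f"overhead roughly {approx_overhead_percent(block):.2f}%"
    )
```

At the quoted speed the gap between checks ranges from about 25 seconds for block 100 to about 400 seconds for block 400.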
|
|
|
|
|
#469 |
|
Jun 2003
5,087 Posts |
Is it difficult to do a factor 3 FFT? That would give you 3M & 6M (although 3M wouldn't be useful for PRP tests as there are no candidates available for it).
|
|
|
|
|
|
#470 | |
|
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2·3³·109 Posts |
Quote:
|
|
|
|
|
|
|
#471 | |
|
"Mihai Preda"
Apr 2015
1371₁₀ Posts |
Quote:
It *should* be easy to replace the "5" with either 3 or 3*3. But a 6M FFT is not really "hot" yet (it may become useful later, when the wavefront moves past 5M FFT, but that may take years). OTOH 9 would allow an 18M FFT, which might be fastest for 100M-digits. |
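To see which lengths each "middle step" factor could reach, here is a small enumeration. It is a sketch under stated assumptions: a size is taken to be the middle factor times a power of two, and "M" is taken to mean 2^20 words (consistent with 5000K = 625 * 4096 * 2 in the v2.0 log). The helper `fft_sizes` is hypothetical, and the output only shows what is arithmetically possible, not what gpuOwl implements.

```python
M = 1 << 20  # assumption: "M" means 2**20 words


def fft_sizes(middle_factors, min_words=3 * M, max_words=20 * M):
    """List sizes of the form middle * 2**k within [min_words, max_words]."""
    sizes = set()
    for middle in middle_factors:
        words = middle
        while words <= max_words:
            if words >= min_words:
                sizes.add(words)
            words *= 2
    return sorted(sizes)


def label(words: int) -> str:
    """Format a word count in units of M, e.g. 5242880 -> '5M'."""
    return f"{words / M:g}M"


# Middle factors 1 and 5 reproduce the sizes announced in the thread.
print([label(w) for w in fft_sizes([1, 5], min_words=4 * M)])

# Factors 3 and 9 would add the sizes discussed in the replies.
print([label(w) for w in fft_sizes([3])])
print([label(w) for w in fft_sizes([9])])
```

With factor 3 the list includes 3M and 6M, and with factor 9 it includes 18M, matching the sizes mentioned in the posts above.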
|
|
|
|
|
|
#472 | |
|
7·557 Posts |
Quote:
Currently testing the latest version. It selected the 5M FFT for the current exponents around 85M. The timing is 4-5 ms/it. Waiting for completion. |
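For a rough idea of how long "waiting for completion" means, the per-iteration timings can be turned into days. This sketch assumes a PRP test of 2^p - 1 takes about p iterations and ignores error-check overhead and restarts; the exponent 85,000,000 stands in for the "85M" mentioned above, and `prp_days` is a made-up helper name.

```python
SECONDS_PER_DAY = 86400


def prp_days(exponent: int, ms_per_iter: float) -> float:
    """Estimated run time in days, assuming about `exponent` iterations."""
    seconds = exponent * ms_per_iter / 1000.0
    return seconds / SECONDS_PER_DAY


exponent = 85_000_000  # placeholder for an exponent "around 85M"

# Timings quoted in the thread: 4-5 ms/it here, 2.5 ms/it on a Vega64.
for ms in (2.5, 4.0, 5.0):
    print(f"{ms} ms/it -> about {prp_days(exponent, ms):.1f} days")
```

At 4-5 ms/it this works out to roughly four to five days per exponent, versus about two and a half days at the 2.5 ms/it quoted for the Vega64.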
|
|
|
|
#473 |
|
"Mihai Preda"
Apr 2015
3×457 Posts |
|
|
|
|