#485 | |
|
"Mihai Preda"
Apr 2015
3·457 Posts |
Quote:
(The situation is different in CUDA, where the FFT sizes are "compact" (small jumps) and some sizes are particularly inefficient, so sometimes it makes sense to go to a slightly larger FFT than strictly needed.)
Last fiddled with by preda on 2018-07-13 at 15:14 |
|
|
|
|
|
|
#486 | ||
|
739 Posts |
Quote:
Quote:
This is what I think: as long as you select the same FFT size, v3.3 is faster than previous versions. The 4M size is too small for an 85M exponent: I tested it and it failed, with bits/word > 20. The 5M size is the fastest for 85M-86M exponents. |
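The bits/word figure mentioned above is simply the exponent divided by the number of FFT words. A minimal sketch, assuming "4M" and "5M" mean 4·2^20 and 5·2^20 words and using 85,000,000 as a stand-in exponent; the 20-bit threshold is taken from the observation in this post, not from any documented limit, and the function name is made up for illustration:

```python
M_WORDS = 1024 * 1024  # assumption: "1M" FFT means 2**20 words


def bits_per_word(exponent: int, fft_words: int) -> float:
    """Average number of exponent bits packed into each FFT word."""
    return exponent / fft_words


exponent = 85_000_000  # illustrative 85M exponent, not a real assignment
for size_m in (4, 5):
    bpw = bits_per_word(exponent, size_m * M_WORDS)
    # 20 bits/word is the failure point reported in the post for the 4M size
    note = "too dense (failed in the post)" if bpw > 20 else "ok"
    print(f"{size_m}M FFT: {bpw:.2f} bits/word -> {note}")
```

With these assumptions the 4M size lands a little above 20 bits/word and the 5M size at roughly 16.2.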
||
|
|
|
#487 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
In V1.9, -fft M61 -size 4M can handle exponents up to ~83.87M (the limit is somewhere below 83871433).
On the RX550, it is ~7.4-9.2% slower than V2.0 5000K DP; on the RX480, 4M M61 is about 6.3% faster than V2.0 5000K DP. Both comparisons used Windows 7 x64 with the same gpu, driver, etc.
Code:
gpuOwL v2.0- GPU Mersenne primality checker
Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
Note: using long carry and fused tail kernels
OpenCL compilation in 1598 ms, with " -DEXP=83871259u -I. -cl-fast-relaxed-math -cl-kernel-arg-info "
PRP-3: FFT 5000K (625 * 4096 * 2) of 83871259 (16.38 bits/word) [2018-07-11 09:20:32 Central Daylight Time]
OK 75500000 / 83871259 [90.02%], 5.25 ms/it [5.18, 5.81], check 4.76s; ETA 0d 12:12; 68406df4ababed55 [00:45:22]
OK 76000000 / 83871259 [90.61%], 5.25 ms/it [5.18, 5.72], check 4.80s; ETA 0d 11:28; 26f19477440580e1 [01:29:10]
OK 76500000 / 83871259 [91.21%], 5.25 ms/it [5.19, 5.82], check 4.79s; ETA 0d 10:45; e9bf0d6048a79a9b [02:13:00]
Code:
gpuOwL v1.9- GPU Mersenne primality checker
Radeon (TM) RX 480 Graphics 36 @28:0.0, Ellesmere 1266MHz
OpenCL compilation in 9860 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=83870041u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFGT_61=1 -DLOG_ROOT2=49u "
Warning: high word size of 20.00 bits may result in errors
Note: using long carry kernels
PRP-3: FFT 4M (1024 * 2048 * 2) of 83870041 (20.00 bits/word) [2018-07-13 13:41:07 Central Daylight Time]
OK 3710000 / 83870041 [ 4.42%], 4.92 ms/it [4.87, 5.34] CV 2.1%, check 4.09s; ETA 4d 13:33; 1683dd3b253ee628 [13:44:44]
OK 3720000 / 83870041 [ 4.44%], 4.93 ms/it [4.88, 5.35] CV 2.2%, check 3.89s; ETA 4d 13:41; d286de5a16debe4d [13:45:38]
OK 3740000 / 83870041 [ 4.46%], 4.92 ms/it [4.87, 5.36] CV 2.1%, check 4.08s; ETA 4d 13:35; b84105343fdda8a3 [13:47:20]
OK 3760000 / 83870041 [ 4.48%], 4.93 ms/it [4.87, 5.36] CV 2.3%, check 3.88s; ETA 4d 13:42; 9c0b1d4ddcf1bd2c [13:49:03]
OK 3780000 / 83870041 [ 4.51%], 4.93 ms/it [4.88, 5.29] CV 2.1%, check 4.03s; ETA 4d 13:46; 44b788f4b55511a2 [13:50:45]
OK 3800000 / 83870041 [ 4.53%], 4.93 ms/it [4.88, 5.28] CV 2.0%, check 3.90s; ETA 4d 13:37; 6b319d3fa95c5f7a [13:52:28]
OK 3850000 / 83870041 [ 4.59%], 4.95 ms/it [4.89, 5.81] CV 2.8%, check 3.91s; ETA 4d 13:56; 9fbf9129fb4516a8 [13:56:39]
OK 3900000 / 83870041 [ 4.65%], 4.99 ms/it [4.88, 6.78] CV 5.4%, check 3.96s; ETA 4d 14:55; 08250aa4bd6bf9c1 [14:00:53]
OK 3950000 / 83870041 [ 4.71%], 4.97 ms/it [4.88, 5.98] CV 3.3%, check 3.98s; ETA 4d 14:14; b6958fdc5d218140 [14:05:05]
OK 4000000 / 83870041 [ 4.77%], 4.93 ms/it [4.88, 5.39] CV 2.2%, check 3.88s; ETA 4d 13:27; 079471495f27e70b [14:09:15]
OK 4050000 / 83870041 [ 4.83%], 4.96 ms/it [4.87, 6.11] CV 3.8%, check 3.90s; ETA 4d 14:03; 8b6870aa4b408e7f [14:13:28]
OK 4100000 / 83870041 [ 4.89%], 4.98 ms/it [4.88, 5.53] CV 3.2%, check 3.90s; ETA 4d 14:16; 4cb590ea14c87340 [14:17:40]
OK 4150000 / 83870041 [ 4.95%], 4.93 ms/it [4.88, 5.38] CV 2.2%, check 4.09s; ETA 4d 13:12; ba6740911ef7be07 [14:21:51] |
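The "about 6.3%" figure for the RX480 can be reproduced from the ms/it values in the two logs. A small sketch, assuming 5.25 ms/it for the V2.0 5000K run and 4.92 ms/it (the lowest average shown) for the V1.9 4M M61 run; the function name is only illustrative:

```python
def percent_faster(baseline_ms: float, candidate_ms: float) -> float:
    """How much less time per iteration the candidate needs, in percent of the baseline."""
    return (baseline_ms - candidate_ms) / baseline_ms * 100


v20_5000k_ms = 5.25  # ms/it from the V2.0 5000K DP log
v19_4m_m61_ms = 4.92  # ms/it from the V1.9 4M M61 log (lowest average shown)

print(f"{percent_faster(v20_5000k_ms, v19_4m_m61_ms):.1f}% faster")
```

The V1.9 averages drift between 4.92 and 4.99 ms/it in the log, so the gain depends on which interval is picked.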
|
|
|
|
|
#488 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
Quote:
Never mind about the AMD system monitor utility. It is V1.0.0.9 from 2012, and it reliably produced only an error message box on my AMD gpu system in Win 7 x64. |
|
|
|
|
|
|
#489 | |
|
1111101110010₂ Posts |
Quote:
Started a 150M exponent, gpuowl selected 8M FFT, timing is 6-7 ms/it, ETA 11d. |
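As a sanity check on the ETA: a PRP test needs roughly one squaring per bit of the exponent, so the run time is about exponent × time per iteration. A rough sketch using the numbers from this post (150,000,000 is a stand-in for "a 150M exponent", and the helper name is hypothetical):

```python
SECONDS_PER_DAY = 86_400


def eta_days(exponent: int, ms_per_it: float) -> float:
    """Approximate run time in days, assuming about `exponent` iterations in total."""
    return exponent * ms_per_it / 1000 / SECONDS_PER_DAY


exponent = 150_000_000  # illustrative 150M exponent
for ms in (6.0, 7.0):
    print(f"{ms} ms/it -> about {eta_days(exponent, ms):.1f} days")
```

At 6-7 ms/it this gives somewhere between 10 and 12.5 days, which brackets the reported 11d.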
|
|
|
|
#490 |
|
"Mihai Preda"
Apr 2015
2533₈ Posts |
I made a rough theoretical estimate of the amount of GPU memory used by gpuowl.
When working with an FFT of size N (i.e. N words), the total amount of GPU memory is [a bit more than] 8*8*N bytes. For example, a 20M FFT comes to about 1.25GB, and a 5M FFT to about 320MB.
Last fiddled with by preda on 2018-07-14 at 08:44 |
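The 8*8*N estimate is easy to tabulate. A minimal sketch, assuming "20M" and "5M" mean 20·2^20 and 5·2^20 words and reporting binary units (MiB); the function name is made up for illustration, and the result is the theoretical lower bound from the post, not a measured value:

```python
MIB = 1024 * 1024  # bytes per MiB; also the assumed meaning of "1M" words


def estimated_gpu_bytes(fft_words: int) -> int:
    """Theoretical GPU memory estimate from the post: 8 * 8 * N bytes."""
    return 8 * 8 * fft_words


for size_m in (5, 20):
    n_words = size_m * MIB
    print(f"{size_m}M FFT: about {estimated_gpu_bytes(n_words) / MIB:.0f} MiB")
```

This gives 320 MiB for 5M and 1280 MiB (1.25 GiB) for 20M, matching the figures quoted above.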
|
|
|
|
|
#491 | |
|
2·11·421 Posts |
Quote:
Lm-sensors is so slow that it is almost unusable for adequate monitoring, and anyway it does not show GPU RAM. |
|
|
|
|
#492 |
|
2³×3×19 Posts |
|
|
|
|
#493 |
|
"Mihai Preda"
Apr 2015
3·457 Posts |
With the 20M FFT it may be worth running somewhat higher exponents, around 332M, which reach into the "100M digits" domain. You can get such exponents from the "manual assignments" page, under "first time 100M digits PRP".
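To see where the "100M digits" domain starts: 2^p − 1 has floor(p·log10(2)) + 1 decimal digits. A short sketch of that arithmetic (the helper name is hypothetical; it only counts digits and says nothing about which exponents are actually handed out):

```python
import math


def mersenne_digits(p: int) -> int:
    """Number of decimal digits of 2**p - 1."""
    return math.floor(p * math.log10(2)) + 1


# Smallest exponent (prime or not) whose Mersenne number has 100,000,000 digits
threshold = math.ceil((10**8 - 1) / math.log10(2))

print(threshold, mersenne_digits(threshold))
print(mersenne_digits(332_000_000))  # just short of 100M digits
```

So "332M" is shorthand: the exponent has to be a little above 332.19M before the number reaches 100 million digits.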
|
|
|
|
|
|
#494 | |
|
"Mihai Preda"
Apr 2015
3×457 Posts |
Quote:
That's why my memory info is "theoretical", not reported from the GPU. |
|
|
|
|
|
|
#495 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
In msys2/mingw64 (there does not seem to be a make available there), see the attachment for warnings/errors.
|
|
|
|