#67
Romulan Interpreter
"name field"
Jun 2011
Thailand
5×11²×17 Posts
To be clear, the error is caused by the hardware, not by the program. The program does a wonderful job in catching the error!
My concern is (first) with resuming properly after an error or a Ctrl+C, and (second) with having all the residue files saved and properly named (CL style).
#68
"David"
Jul 2015
Ohio
11·47 Posts
Quote:
I've set up a different Ubuntu 16.30 based system with the latest AMDGPU-PRO driver 17.10 on a FuryX. In this case, the timing is much improved.
Code:
gpuOwL v0.1 GPU Lucas-Lehmer primality checker
Fiji; OpenCL 1.2 AMD-APP (2348.3)
LL FFT 4096K (1024*2048*2) of 75002911 (17.88 bits/word) at iteration 0
OpenCL setup: 2419 ms
00020000 / 75002911 [0.03%], ms/iter: 2.400, ETA: 2d 02:00; 777b6635d6b78b75 error 0.125 (max 0.125)
00040000 / 75002911 [0.05%], ms/iter: 2.432, ETA: 2d 02:38; b9fc5678347cad9f error 0.125 (max 0.125)
00060000 / 75002911 [0.08%], ms/iter: 2.432, ETA: 2d 02:38; e7fab5c1f11d0f39 error 0.125 (max 0.125)

Code:
....
Iteration 10000 M( 75002911 )C, 0xc9a6d6ecad1fb00c, n = 4096K, clLucas v1.04 err = 0.1914 (0:48 real, 4.7985 ms/iter, ETA 99:57:18)
Iteration 20000 M( 75002911 )C, 0x777b6635d6b78b75, n = 4096K, clLucas v1.04 err = 0.1914 (0:48 real, 4.8387 ms/iter, ETA 100:46:48)
....

Code:
(clLucas)
Iteration 10000 M( 75002911 )C, 0xc9a6d6ecad1fb00c, n = 4096K, clLucas v1.04 err = 0.1914 (0:52 real, 5.1695 ms/iter, ETA 107:40:57)
Iteration 20000 M( 75002911 )C, 0x777b6635d6b78b75, n = 4096K, clLucas v1.04 err = 0.1914 (0:52 real, 5.1787 ms/iter, ETA 107:51:42)
Iteration 30000 M( 75002911 )C, 0x0f0c343e5174fa89, n = 4096K, clLucas v1.04 err = 0.1914 (0:52 real, 5.2092 ms/iter, ETA 108:28:54)
(gpuOwl)
gpuOwL v0.1 GPU Lucas-Lehmer primality checker
Ellesmere; OpenCL 1.2 AMD-APP (2348.3)
LL FFT 4096K (1024*2048*2) of 75002911 (17.88 bits/word) at iteration 0
OpenCL setup: 2419 ms
00020000 / 75002911 [0.03%], ms/iter: 3.677, ETA: 3d 04:35; 777b6635d6b78b75 error 0.125 (max 0.125)
00040000 / 75002911 [0.05%], ms/iter: 3.691, ETA: 3d 04:51; b9fc5678347cad9f error 0.125 (max 0.125)

Code:
(W9100 - gpuOwl)
gpuOwL v0.1 GPU Lucas-Lehmer primality checker
Hawaii; OpenCL 2.0 AMD-APP (1912.5)
LL FFT 4096K (1024*2048*2) of 75002911 (17.88 bits/word) at iteration 0
OpenCL setup: 888 ms
00020000 / 75002911 [0.03%], ms/iter: 3.180, ETA: 2d 18:14; 777b6635d6b78b75 error 0.140625 (max 0.140625)
00040000 / 75002911 [0.05%], ms/iter: 3.138, ETA: 2d 17:21; b9fc5678347cad9f error 0.132812 (max 0.140625)
(W9100 - clLucas)
Iteration 10000 M( 75002911 )C, 0xc9a6d6ecad1fb00c, n = 4096K, clLucas v1.04 err = 0.1914 (0:47 real, 4.7813 ms/iter, ETA 99:35:46)
Iteration 20000 M( 75002911 )C, 0x777b6635d6b78b75, n = 4096K, clLucas v1.04 err = 0.1914 (0:48 real, 4.7354 ms/iter, ETA 98:37:42)

The doubled performance is pretty amazing - now we just need more FFT sizes :)
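For readers who want to check the figures in these logs, here is a minimal Python sketch. It recomputes the "bits/word" value and the rough speed-up from the quoted ms/iter numbers. The function names are made up for illustration, and pairing the unlabeled clLucas log with the FuryX (Fiji) run is an assumption, since the quote does not say so explicitly.

```python
# Figures copied from the logs quoted above; helper names are hypothetical.
EXPONENT = 75_002_911
FFT_WORDS = 4096 * 1024  # "FFT 4096K"


def bits_per_word(exponent: int, fft_words: int) -> float:
    """Average number of bits of the exponent carried by each FFT word."""
    return exponent / fft_words


def speedup(old_ms_per_iter: float, new_ms_per_iter: float) -> float:
    """How many times faster the new timing is than the old one."""
    return old_ms_per_iter / new_ms_per_iter


def eta_hours(exponent: int, iteration: int, ms_per_iter: float) -> float:
    """Rough time left, assuming the per-iteration time stays constant."""
    return (exponent - iteration) * ms_per_iter / 1000 / 3600


if __name__ == "__main__":
    print(f"bits/word: {bits_per_word(EXPONENT, FFT_WORDS):.2f}")
    # Assumption: the first clLucas log line (4.7985 ms/iter) is the FuryX run.
    print(f"Fiji speed-up: {speedup(4.7985, 2.400):.2f}x")
    print(f"Ellesmere speed-up: {speedup(5.1695, 3.677):.2f}x")
    print(f"Hawaii speed-up: {speedup(4.7813, 3.180):.2f}x")
    print(f"Fiji ETA: {eta_hours(EXPONENT, 20_000, 2.400):.1f} h")
```

Under that pairing, only the Fiji numbers come out at roughly 2x; the other two cards show a smaller gain.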
#69
"Mihai Preda"
Apr 2015
2650₈ Posts
Quote:
- persist checkpoint every "savestep" (new command line argument, defaulting to 500 * logstep).
- use new name format for persist checkpoints (but with final extension .ll)
- use new checkpoint format. The human-readable info is now at the end. Can be printed nicely with: "tail -1 c<N>.ll" (i.e. use "tail" to print only the very last line, which is the human-readable part).
- in general, use file naming in the style of CUDALucas but with .ll extension

There may be bugs/problems with these new things, looking for feedback :)
Not done yet: no sub-folders.
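The checkpoint behaviour listed above can be sketched roughly as follows. This is only an illustration in Python, not gpuOwl's actual code or file layout: the function names, the ".tmp" suffix, the write-then-rename step, and the wording of the info line are all assumptions. Only the "500 * logstep" default, the ".ll" extension, and the idea of a human-readable last line come from the post.

```python
import os
import tempfile
from typing import Optional


def should_persist(iteration: int, logstep: int, savestep: Optional[int] = None) -> bool:
    """True when a persistent checkpoint is due at this iteration."""
    if savestep is None:
        savestep = 500 * logstep  # default stated in the post
    return iteration > 0 and iteration % savestep == 0


def save_checkpoint(path: str, state: bytes, info: str) -> None:
    """Write the binary state, then one human-readable line at the very end."""
    tmp = path + ".tmp"  # hypothetical temporary name
    with open(tmp, "wb") as f:
        f.write(state)
        f.write(b"\n" + info.encode("ascii") + b"\n")
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # rename only after the data is fully written


def last_line(path: str) -> str:
    """Rough equivalent of `tail -1 <path>`."""
    with open(path, "rb") as f:
        return f.read().rstrip(b"\n").rsplit(b"\n", 1)[-1].decode("ascii")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "c75002911.ll")  # assumes <N> is the exponent
        save_checkpoint(path, bytes(range(256)), "hypothetical info line")
        print(last_line(path))
```

With a logstep of 20000, as in the logs earlier in the thread, the default savestep would come to 10,000,000 iterations.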
#70
"Mihai Preda"
Apr 2015
2³·181 Posts
And a couple of other fixes:
- add a trivial checksum, to catch partially-written checkpoints.
- correctly handle multiple OpenCL "platforms" (discover all the devices in some multi-device setups)
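As an illustration of the first fix, a trivial checksum could look something like the sketch below. The post does not say which checksum gpuOwl uses or where it is stored, so the byte sum, the 4-byte little-endian trailer, and the names `seal` and `unseal` are all assumptions.

```python
import struct


def checksum(payload: bytes) -> int:
    """A deliberately trivial checksum: the byte sum, truncated to 32 bits."""
    return sum(payload) & 0xFFFFFFFF


def seal(payload: bytes) -> bytes:
    """Append the checksum as a 4-byte little-endian trailer (assumed layout)."""
    return payload + struct.pack("<I", checksum(payload))


def unseal(blob: bytes) -> bytes:
    """Return the payload, or raise ValueError if the trailer does not match."""
    if len(blob) < 4:
        raise ValueError("checkpoint too short")
    payload = blob[:-4]
    (stored,) = struct.unpack("<I", blob[-4:])
    if checksum(payload) != stored:
        raise ValueError("checksum mismatch: corrupt or partially written")
    return payload


if __name__ == "__main__":
    blob = seal(bytes(range(256)) * 4)
    print(len(unseal(blob)))
    try:
        unseal(blob[:-10])  # simulate a write that was cut short
    except ValueError as e:
        print("rejected:", e)
```

A plain sum like this will not catch every possible corruption, but a file that was cut off mid-write will usually fail the comparison, which is the case the post is concerned with.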
#71
"Mihai Preda"
Apr 2015
2³×181 Posts
Quote:
OK, what FFT sizes do you need? (and why?)
#72
"Mihai Preda"
Apr 2015
2³·181 Posts
BTW, did you notice the improved error margin as well? It's not a huge deal, but it does extend the exponent range available for a given FFT size a bit. That 'radically' changes the cost for exponents 'on the border', which now fit in the lower, power-of-two (POT) FFT.
#73
Romulan Interpreter
"name field"
Jun 2011
Thailand
5×11²×17 Posts
Yes, we said in the very first post that some 77M exponents can be done with the 4096K FFT, even though c*Lucas wants more, and we appreciate this!
Short question: does the new file format imply that I cannot resume from the old format? (If so, I will have to finish 76453229 first before playing with the new version, sorry. You do not have to do anything in this direction; whatever format you choose for the future is OK with us.)
Last fiddled with by LaurV on 2017-05-01 at 11:57
#74
"/X\(‘-‘)/X\"
Jan 2013
2×19×83 Posts
Well, DC is basically beyond 2048K at this time, mostly in the 2560K range but starting to reach into the 3072K range. LL, as you have discovered, is in the 4096K and 4608K range.
But it really comes down to which exact FFT sizes are fastest on the hardware.
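To make the earlier point about 'border' exponents concrete, here is a small Python sketch. It picks the smallest FFT size, from the sizes named in this post, that keeps the bits/word under a limit. The two limits used below (18.0 and 18.3) are invented purely for illustration; the thread does not state the real limits of gpuOwl or of c*Lucas.

```python
from typing import Optional

# FFT sizes (in K words) mentioned in this post.
FFT_SIZES_K = [2048, 2560, 3072, 4096, 4608]


def bits_per_word(exponent: int, fft_k: int) -> float:
    """Average bits of the exponent carried by each FFT word."""
    return exponent / (fft_k * 1024)


def smallest_fft(exponent: int, max_bits_per_word: float) -> Optional[int]:
    """Smallest listed FFT size whose bits/word stays within the limit."""
    for fft_k in sorted(FFT_SIZES_K):
        if bits_per_word(exponent, fft_k) <= max_bits_per_word:
            return fft_k
    return None  # none of the listed sizes is large enough


if __name__ == "__main__":
    exponent = 76_453_229  # the exponent mentioned earlier in the thread
    print(f"{bits_per_word(exponent, 4096):.3f} bits/word at 4096K")
    # Hypothetical limits: a slightly larger margin keeps the exponent at 4096K.
    print(smallest_fft(exponent, 18.0))
    print(smallest_fft(exponent, 18.3))
```

With the tighter hypothetical limit the exponent is pushed up to 4608K, while the looser one lets it stay at 4096K, which is the kind of cost difference described above.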
#75
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
1011110101000₂ Posts
There are many people who would appreciate a fast FFT modulo k*2^n-1 on the GPU for the LLR test.
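For context on why a fast FFT modulo k*2^n-1 matters here: the main loop of the LLR (Lucas-Lehmer-Riesel) test is a chain of modular squarings, just like the LL test. The sketch below uses plain Python integers rather than an FFT, and it only covers the simple case where k is odd, not divisible by 3, and smaller than 2^n (it uses the starting value V_k(4, 1)). It is a textbook-style illustration, not code from LLR or gpuOwl, and the function names are made up.

```python
def lucas_v(k: int, p: int, modulus: int) -> int:
    """V_k(p, 1) mod modulus, using V_0 = 2, V_1 = p, V_{i+1} = p*V_i - V_{i-1}."""
    v_prev, v = 2 % modulus, p % modulus
    if k == 0:
        return v_prev
    for _ in range(k - 1):
        v_prev, v = v, (p * v - v_prev) % modulus
    return v


def llr_is_prime(k: int, n: int) -> bool:
    """LLR test for N = k*2^n - 1, restricted to odd k with 3 not dividing k."""
    if n < 3 or k < 1 or k % 2 == 0 or k % 3 == 0 or k >= 2**n:
        raise ValueError("this sketch needs n >= 3, odd k < 2^n, 3 not dividing k")
    N = k * 2**n - 1
    if N % 3 == 0:
        return False  # divisible by 3, so composite (N > 3 here)
    u = lucas_v(k, 4, N)  # starting value for this case
    for _ in range(n - 2):
        u = (u * u - 2) % N  # the squaring a GPU FFT would speed up
    return u == 0


if __name__ == "__main__":
    print(llr_is_prime(1, 7))   # k = 1 is the Mersenne case: 2^7 - 1 = 127
    print(llr_is_prime(5, 4))   # 5*2^4 - 1 = 79
    print(llr_is_prime(5, 6))   # 5*2^6 - 1 = 319 = 11 * 29
```

For k = 1 this reduces to the ordinary Lucas-Lehmer test with starting value 4. For general k the squaring has to be done modulo k*2^n-1 instead of modulo 2^n-1, which is where the request for a different FFT comes from.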
#76
"/X\(‘-‘)/X\"
Jan 2013
C52₁₆ Posts
#77
"Mark"
Apr 2003
Between here and the
7,069 Posts |