#936
|
"David"
Jul 2015
Ohio
11·47 Posts
Quote:
Happy to donate a few hundred of our Acorn boards to the cause, or a dozen VU9Ps. I also have access to HBM FPGAs, but I don't expect them to beat NVIDIA GPUs with the same memory bandwidth, since bandwidth seems to be the bottleneck.
#937
|
Sep 2003
2·5·7·37 Posts
Quote:
Trial factoring doesn't need much memory bandwidth, and it doesn't need DP (double precision) either. Is there any hope of running mfakto (OpenCL) on an FPGA?
#938
|
"Mihai Preda"
Apr 2015
2²·3·11² Posts
Quote:
- extract tiny, streamlined, simplified OpenCL components from a trial factorer, e.g. a very basic sieve or a simple modular exponentiation;
- test and adapt each one for the FPGA in isolation;
- repeat with the next component;
- when all the basic pieces work, put them together into an FPGA TFer.
Starting with mfakto as a whole may not work as easily. Anyway, somebody with more FPGA experience should try, I guess.
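To make the two components named in that list concrete, here is a minimal Python sketch of trial factoring M_p = 2^p - 1 for prime p. It relies on two standard facts: every factor q of M_p has the form q = 2kp + 1, and q ≡ ±1 (mod 8). The sieve strikes out the k whose q has a small prime factor (for each small prime s those k form one arithmetic progression), and the modular exponentiation checks whether 2^p ≡ 1 (mod q). The function names and the sieve depth `SMALL_PRIMES` are illustrative assumptions; this is not how mfakto is implemented.

```python
# Sketch of Mersenne trial factoring: a simple sieve over k plus a modular
# exponentiation test. Illustrative only; not mfakto's implementation.

SMALL_PRIMES = (3, 5, 7, 11, 13, 17, 19, 23)  # hypothetical sieve depth


def sieve_k(p, k_max, small_primes=SMALL_PRIMES):
    """alive[k] is False if q = 2kp + 1 has a prime factor from small_primes."""
    alive = [True] * (k_max + 1)
    alive[0] = False
    for s in small_primes:
        if p % s == 0:
            continue  # then 2kp + 1 ≡ 1 (mod s), never divisible by s
        # 2kp + 1 ≡ 0 (mod s)  <=>  k ≡ -(2p)^-1 (mod s)
        k0 = (-pow(2 * p, -1, s)) % s
        for k in range(k0, k_max + 1, s):
            if 2 * k * p + 1 != s:  # keep q == s itself, it is prime
                alive[k] = False
    return alive


def trial_factor(p, k_max):
    """Return every q = 2kp + 1 with 1 <= k <= k_max that survives the sieve and divides 2^p - 1."""
    factors = []
    alive = sieve_k(p, k_max)
    for k in range(1, k_max + 1):
        if not alive[k]:
            continue
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):
            continue  # factors of M_p are ≡ ±1 (mod 8)
        if pow(2, p, q) == 1:  # modular exponentiation: q divides 2^p - 1
            factors.append(q)
    return factors


print(trial_factor(11, 10))  # [23, 89]; M11 = 2047 = 23 * 89
print(trial_factor(29, 40))  # [233, 1103, 2089]; M29 = 233 * 1103 * 2089
```

On an FPGA the two pieces map naturally onto separate units: the sieve is bit-marking over a window of k, and the test is a fixed-width modular squaring loop.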
#939
|
Feb 2009
2²·7 Posts
Quote:
Code:
gpuOwL v1.9- GPU Mersenne primality checker
AMD Radeon HD 5800 Series 20 @1:0.0, Cypress 850MHz
OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=76812401u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 )
Error: aclBinary init failure
".\gpuowl.cl", line 67: warning: OpenCL extension is now part of core
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
^
OpenCL compilation in 2771 ms, with "-I. -cl-fast-relaxed-math -DEXP=76812401u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 "
PRP-3: FFT 4M (1024 * 2048 * 2) of 76812401 (18.31 bits/word) [2019-01-09 07:49:05]
Starting at iteration 0
OK 0 / 76812401 [ 0.00%], 0.00 ms/it; ETA 0d 00:00; 0000000000000003 [07:49:21]
EE 1000 / 76812401 [ 0.00%], 27.18 ms/it; ETA 24d 03:55; c89d15ae90d209ec [07:50:03]
EE 1000 / 76812401 [ 0.00%], 27.18 ms/it; ETA 24d 03:55; c89d15ae90d209ec [07:50:45] (1 errors)
EE 1000 / 76812401 [ 0.00%], 27.15 ms/it; ETA 24d 03:12; c89d15ae90d209ec [07:51:28] (2 errors)
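For context on the log above, a PRP-3 test of M_p can be sketched as p modular squarings starting from 3. If M_p is prime, Fermat's little theorem gives 3^(M_p - 1) ≡ 1, so 3^(2^p) = 3^(M_p + 1) ≡ 9 (mod M_p). This toy version uses Python's exact big integers rather than an FFT, so it is only practical for tiny exponents, and the final comparison is one common formulation, not necessarily gpuOwL's exact check. The `res64` helper assumes the 16-hex-digit column in the log is the low 64 bits of the current value; that matches the `0000000000000003` at iteration 0 but is an assumption.

```python
# Toy PRP-3 test for M_p = 2^p - 1 using exact big-integer squaring.
# gpuOwL performs the squarings with a floating-point FFT; this sketch does not.


def prp3_residue(p):
    """Return 3^(2^p) mod M_p, computed by p modular squarings starting from 3."""
    m = (1 << p) - 1
    x = 3
    for _ in range(p):
        x = x * x % m
    return x


def is_prp3(p):
    """True if M_p is a base-3 Fermat probable prime (assumes M_p > 3)."""
    m = (1 << p) - 1
    # For prime M_p: 3^(M_p - 1) ≡ 1, so 3^(2^p) = 3^(M_p + 1) ≡ 9 (mod M_p)
    return prp3_residue(p) == 9 % m


def res64(x):
    """Low 64 bits as 16 hex digits (assumed format of the log's residue column)."""
    return format(x & 0xFFFFFFFFFFFFFFFF, "016x")


print(is_prp3(31), is_prp3(11))  # True False (M31 is prime, M11 = 23 * 89)
print(res64(3))                  # 0000000000000003, the value before any squaring
```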
#940
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴×3×163 Posts
#941
Feb 2009
34₈ Posts
#942
2·3·1,409 Posts
Quote:
Your FFT size may be too small for the exponent. Try specifying the argument "-fft 5M".
#943
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts
Belay that: V1.9 predates the 5M FFT, which Preda implemented in V2.0. The purpose of running V1.9 was to get back to a version that does not require OpenCL 2.0. If FFT size is an issue, he could try -fft M61 instead of DP. It's slower than 4M DP but gives about a 7% higher maximum exponent for the 4M size, and it's faster than 8M DP. However, the OpenCL version still appears to be an issue for his old GPU's driver, even at V1.9. As I recall, the 4M DP transform in gpuOwL could handle exponents up to about 78M.
Last fiddled with by kriesel on 2019-01-15 at 15:07
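The FFT-size arithmetic in this exchange is easy to check. The log's "18.31 bits/word" is the exponent divided by the number of FFT words, where "4M" is 1024 × 2048 × 2 = 4,194,304 words. The sketch below turns the recollection of a roughly 78M limit for 4M DP into an approximate bits-per-word ceiling; that ceiling is an assumption derived from the post, not a documented gpuOwL constant.

```python
# FFT sizes are counted in words; "FFT 4M (1024 * 2048 * 2)" in the log.
FFT_4M = 1024 * 2048 * 2  # 4,194,304 words

# Assumption: ~78M max exponent for 4M DP (recalled above), giving ≈ 18.6 bits/word.
MAX_BITS_PER_WORD_DP = 78_000_000 / FFT_4M


def bits_per_word(exponent, fft_words):
    """Average number of bits of the exponent carried by each FFT word."""
    return exponent / fft_words


def fits(exponent, fft_words, limit=MAX_BITS_PER_WORD_DP):
    """True if the exponent stays under the assumed bits-per-word ceiling."""
    return bits_per_word(exponent, fft_words) <= limit


print(f"{bits_per_word(76812401, FFT_4M):.2f}")  # 18.31, matching the log
print(fits(76812401, FFT_4M))                    # True
```

On this estimate 76,812,401 sits below the recalled 4M DP limit, which is consistent with the point that the FFT size alone should not be the problem here.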
#944
2³×7×79 Posts
Quote:
So there is no hope for this version?
Last fiddled with by SELROC on 2019-01-15 at 15:18
#945
"Mihai Preda"
Apr 2015
2654₈ Posts
It is my pleasure to announce... P-1 in GpuOwl. Good old classic P-1.
1. worktodo.txt
PFactor=90551623
PFactor=AID,1,2,90551623,-1,77,2
PFactor=N/A,1,2,90551623,-1,77,2
(In all the PFactor cases above, only the exponent and the AID are used.)
By default, the P-1 task is processed with B1=1M and B2=30 * B1. These can be overridden by prepending the limits to any PFactor line above, with this syntax:
B1=2000000;PFactor=90551623
B1=500000,B2=10000000;PFactor=90551623
The P-1 in GpuOwl always has E=2 (a parameter in stage 2). The D parameter ("block size") is normally computed automatically from the amount of memory available on the GPU. It can also be specified on the command line, e.g. -D 210. The block size D must be a multiple of 210. Good values are D=2310 (but that wouldn't fit in a GPU with 8GB RAM), and D=210 or small multiples of 210.
P-1 does not save its progress to a savefile. If stopped (crash, etc.), the progress is lost.
At this stage I'm very interested in bug reports. Most important are cases where a factor that should be detected given the B1/B2 is not found.
#946
"Mihai Preda"
Apr 2015
2²·3·11² Posts
GpuOwl v6.1, just committed on GitHub, has P-1. It needs GMP (for the GCD, which is done on the CPU, as before with PRP-1).
I must say, P-1 stage 2 was rather hard for me to understand (in hindsight it doesn't look so terrible; I think I could now explain it simply). I found Alexander Kruppa's thesis useful: https://tel.archives-ouvertes.fr/fil...name/thesis.ps (although even that was not easy reading).
Last fiddled with by preda on 2019-01-24 at 09:47