mersenneforum.org  

2019-01-09, 00:01   #936
airsquirrels
"David"
Jul 2015
Ohio
11·47 Posts

Quote:
Originally Posted by preda
While OpenCL *may* be portable to some degree, the performance is not portable (and thus, IMO, the whole point of OpenCL "portability" is moot). I mean that even if it did run on an FPGA, it would probably run extremely slowly before performance tuning.

In practice, I strongly expect GpuOwl not to run at all on an FPGA. It uses LDS (Local Data Share), which is likely not available on an FPGA. It also uses double-precision floating point (DP FP) heavily, which may not be present as dedicated hardware blocks on the FPGA, and would thus be very expensive and rather slow to implement in plain FPGA logic.

For FPGAs, I think a different design that plays to the FPGA's strengths is needed. And maybe some specialized DP units would help too.
I have access to quite a few different FPGAs, and would happily provide them to anyone that wants to develop such a thing for trial factoring or prime searching.

I'm happy to donate a few hundred of our Acorn boards to the cause, or a dozen VU9Ps. I also have access to HBM FPGAs, but I don’t expect them to beat Nvidia GPUs with the same memory bandwidth, since bandwidth seems to be the issue.
2019-01-09, 00:52   #937
GP2
Sep 2003
2·5·7·37 Posts

Quote:
Originally Posted by airsquirrels
but I don’t expect them to beat Nvidia GPUs with the same memory bandwidth, since bandwidth seems to be the issue.
OK, for LL testing there would be issues, but what about factoring?

Factoring doesn't need memory bandwidth. It doesn't need DP either.

Would there be any hope of running mfakto (OpenCL) on an FPGA?
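A minimal Python sketch of why trial factoring fits that description (this is not mfakto's code; the name `divides_mersenne` is made up for illustration): a candidate q divides 2^p - 1 exactly when 2^p ≡ 1 (mod q), and that can be checked by square-and-multiply over the bits of p using only integer multiplies and remainders. Each candidate needs only a few integers of state, so there is no large FFT array to stream through memory and no floating point.

```python
def divides_mersenne(p: int, q: int) -> bool:
    """Return True if q divides 2**p - 1, using only integer arithmetic."""
    x = 1
    # Left-to-right square-and-multiply over the bits of the exponent p.
    for bit in bin(p)[2:]:
        x = (x * x) % q          # square
        if bit == "1":
            x = (x * 2) % q      # multiply by the base 2
    return x == 1


# M11 = 2047 = 23 * 89; both factors have the form 2*k*11 + 1.
print(divides_mersenne(11, 23), divides_mersenne(11, 89), divides_mersenne(11, 25))
# True True False
```

Python's arbitrary-precision integers hide the hard part for hardware: real TF candidates are often wider than 64 bits, so an FPGA design would need wide fixed-width modular multiplication.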
2019-01-09, 03:41   #938
preda
"Mihai Preda"
Apr 2015
2²·3·11² Posts

Quote:
Originally Posted by GP2
OK, for LL testing there would be issues, but what about factoring?

Factoring doesn't need memory bandwidth. Doesn't need DP either.

Would there be any hope of running mfakto (OpenCL) on an FPGA?
I don't have experience with FPGA development, so I'm not able to help here. But this is the approach I would take:

- extract tiny, streamlined, simplified OpenCL components from a trial factorer, e.g. a very basic and simple sieve, or a simple modular exponentiation
- test and adapt each one for the FPGA in isolation
- repeat with the next component

- when all the basic simple pieces work, put them together into an FPGA TFer.

Starting with mfakto as a whole may not work as easily. Anyway, I guess somebody with more FPGA experience should try.
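As an illustration of the kind of "very basic and simple sieve" component described above, here is a minimal Python sketch (hypothetical names, not taken from mfakto). It relies on two standard facts about factors of 2^p - 1 for prime p: every factor has the form q = 2kp + 1, and q ≡ ±1 (mod 8). Candidates with a small prime factor are also dropped; the survivors go to the modular-exponentiation test, which here is simply Python's built-in three-argument `pow`.

```python
SMALL_PRIMES = (3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def candidates(p: int, k_max: int):
    """Yield (k, q) with q = 2*k*p + 1 that survive a simple sieve."""
    for k in range(1, k_max + 1):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):       # factors of 2^p - 1 are +-1 mod 8
            continue
        # Drop q with a small prime factor (but keep q if it is that prime).
        if any(q % s == 0 and q != s for s in SMALL_PRIMES):
            continue
        yield k, q

def trial_factor(p: int, k_max: int):
    """Return factors of 2**p - 1 of the form 2*k*p + 1 with k <= k_max."""
    return [q for _, q in candidates(p, k_max) if pow(2, p, q) == 1]

print(trial_factor(11, 10))   # [23, 89]
print(trial_factor(29, 40))   # [233, 1103, 2089]
```

Keeping the sieve and the exponentiation as separate functions mirrors the "one component at a time" idea: each can be checked against the other (or against a CPU reference) before they are combined.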
2019-01-09, 04:58   #939
clarke
Feb 2009
2²·7 Posts

Quote:
Originally Posted by kriesel
It's not broken, it's just your setup is incompatible.
Code:
gpuOwL v1.9- GPU Mersenne primality checker
Radeon 500 Series 8 @f:0.0, gfx804 1203MHz

OpenCL compilation in 2147 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0  -DEXP=76812401u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 "
PRP-3: FFT 4M (1024 * 2048 * 2) of 76812401 (18.31 bits/word) [2018-01-23 12:43:49 Central Standard Time]
Starting at iteration 25373000
OK 25373000 / 76812401 [33.03%], 0.00 ms/it; ETA 0d 00:00; 6d6a6ebc97092826 [12:43:57]
OK 25374000 / 76812401 [33.03%], 11.97 ms/it; ETA 7d 03:05; bb937b8a48c69d60 [12:44:17]
OK 25375000 / 76812401 [33.03%], 11.97 ms/it; ETA 7d 03:04; b81a6f51602c2bd8 [12:44:36]
OK 25380000 / 76812401 [33.04%], 11.96 ms/it; ETA 7d 02:50; 60bcb33b85922094 [12:45:44]
OK 25390000 / 76812401 [33.05%], 12.00 ms/it; ETA 7d 03:26; 516093b7988f8ac4 [12:47:52]
OK 25400000 / 76812401 [33.07%], 12.00 ms/it; ETA 7d 03:23; 5313239afe8bcffe [12:50:00]
OK 25420000 / 76812401 [33.09%], 12.00 ms/it; ETA 7d 03:20; d04bc7fd72b07e36 [12:54:07]
OK 25440000 / 76812401 [33.12%], 12.00 ms/it; ETA 7d 03:15; 679e6f34ac35a983
Your setup is an RX 5xx with OpenCL 2.0. Indeed, something is wrong on my end:
Code:
gpuOwL v1.9- GPU Mersenne primality checker
AMD Radeon HD 5800 Series 20 @1:0.0, Cypress  850MHz
OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -cl-std=CL2.0  -DEXP=76812401u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 )
Error: aclBinary init failure

".\gpuowl.cl", line 67: warning: OpenCL extension is now part of core
  #pragma OPENCL EXTENSION cl_khr_fp64 : enable
                           ^


OpenCL compilation in 2771 ms, with "-I. -cl-fast-relaxed-math  -DEXP=76812401u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 "
PRP-3: FFT 4M (1024 * 2048 * 2) of 76812401 (18.31 bits/word) [2019-01-09 07:49:05]
Starting at iteration 0
OK        0 / 76812401 [ 0.00%], 0.00 ms/it; ETA 0d 00:00; 0000000000000003 [07:49:21]
EE     1000 / 76812401 [ 0.00%], 27.18 ms/it; ETA 24d 03:55; c89d15ae90d209ec [07:50:03]
EE     1000 / 76812401 [ 0.00%], 27.18 ms/it; ETA 24d 03:55; c89d15ae90d209ec [07:50:45] (1 errors)
EE     1000 / 76812401 [ 0.00%], 27.15 ms/it; ETA 24d 03:12; c89d15ae90d209ec [07:51:28] (2 errors)
I'm wondering whether anybody has run v1.9 successfully on a 5xxx/6xxx-series card.
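The numbers in these log lines can be sanity-checked by hand. A minimal Python sketch, assuming (as "FFT 4M (1024 * 2048 * 2)" in the log suggests) 4,194,304 words, and assuming the ETA is simply the remaining iterations times the current ms/it; the helper names are hypothetical, not gpuOwL functions.

```python
def bits_per_word(exponent: int, words: int) -> float:
    """Average number of bits of the exponent stored per FFT word."""
    return exponent / words

def eta(exponent: int, iteration: int, ms_per_it: float):
    """Estimate time left as (days, hours, minutes) from the current rate."""
    seconds = (exponent - iteration) * ms_per_it / 1000
    minutes = int(seconds // 60)
    days, minutes = divmod(minutes, 24 * 60)
    hours, minutes = divmod(minutes, 60)
    return days, hours, minutes

WORDS_4M = 1024 * 2048 * 2
print(round(bits_per_word(76812401, WORDS_4M), 2))  # 18.31
print(eta(76812401, 1000, 27.18))                   # (24, 3, 55)
```

This reproduces the "18.31 bits/word" and "ETA 24d 03:55" shown in the HD 5800 log, i.e. that card would need roughly 24 days for the exponent even if the error check passed, versus about 7 days on the RX 5xx.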
2019-01-09, 11:25   #940
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴×3×163 Posts

Quote:
Originally Posted by clarke
Wondering if somebody has run 1.9 with 5xxx/6xxx series successfully.
Have you tried mfakto? It might keep your HD 5870 usefully busy while you look for a solution or save for a new card.
2019-01-09, 19:44   #941
clarke
Feb 2009
34₈ Posts

Quote:
Originally Posted by kriesel
Have you tried mfakto? (Might keep your HD5870 usefully busy while you look for a solution or save for a new card)
Yep, thank you, mfakto works well. For now, I'll try to figure out whether different OpenCL releases (15.7.1/15.11.1) make a difference for gpuOwL.
2019-01-15, 10:23   #942
SELROC
2·3·1,409 Posts

Quote:
Originally Posted by clarke
Wondering if somebody has run 1.9 with 5xxx/6xxx series successfully.

Your FFT size may be too small for the exponent.
Try specifying the argument "-fft 5M".
2019-01-15, 15:04   #943
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts

Quote:
Originally Posted by SELROC
It may be possible that your FFT size is too small for the exponent.
Try to specify the argument "-fft 5M".
Belay that; v1.9 predates the 5M FFT, which Preda implemented in v2.0. The purpose of running v1.9 was to get back to a version that does not require OpenCL 2.0. If FFT size is an issue, he could try -fft M61 instead of DP. It's slower than 4M DP but gives about a 7% higher maximum exponent at the 4M size, and it is faster than 8M DP. But the OpenCL version still appears to be the problem for his old GPU's driver at v1.9. As I recall, the 4M DP transform in gpuOwL could handle exponents up to about 78M.
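These figures can be turned into a quick check. Taking the recalled ~78M as the 4M DP limit and "about 7% higher" for M61 at the same size (both approximate, from memory, not measured), a minimal Python sketch with a hypothetical `fits` helper shows that clarke's exponent 76812401 already fits in the 4M DP transform, which supports the view that FFT size is not what is failing here.

```python
WORDS_4M = 1024 * 2048 * 2                # 4,194,304 words
DP_4M_LIMIT = 78_000_000                  # ~78M, as recalled in the post
M61_4M_LIMIT = DP_4M_LIMIT * 107 // 100   # "about 7% higher" for M61

def fits(exponent: int, limit: int) -> bool:
    """True if the exponent is within the (approximate) FFT limit."""
    return exponent <= limit

print(round(DP_4M_LIMIT / WORDS_4M, 2))  # 18.6 (bits/word at the limit)
print(M61_4M_LIMIT)                      # 83460000
print(fits(76812401, DP_4M_LIMIT))       # True
```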

Last fiddled with by kriesel on 2019-01-15 at 15:07
2019-01-15, 15:18   #944
SELROC
23×7×79 Posts

Quote:
Originally Posted by kriesel
Belay that; V1.9 was before Preda implemented a 5M fft in V2.0. The purpose of running V1.9 was to try to get back to a version not requiring OpenCl V2. If fft size is an issue he could try -fft M61 instead of DP. It's slower than 4M DP but gives about 7% higher max exponent for the 4M size, and is faster than 8M DP. But the OpenCl version appears to still be an issue for his old gpu's driver at V1.9. The 4M DP transform in gpuOwL was capable of 78M exponent as I recall.

So there is no hope for this version?

Last fiddled with by SELROC on 2019-01-15 at 15:18
2019-01-24, 09:24   #945
preda
"Mihai Preda"
Apr 2015
2654₈ Posts
P-1

It is my pleasure to announce... P-1 in GpuOwl. Good old classic P-1.

1. worktodo.txt

PFactor=90551623
PFactor=AID,1,2,90551623,-1,77,2
PFactor=N/A,1,2,90551623,-1,77,2

(In all the PFactor cases above, only the exponent and the AID are used.)
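A minimal Python sketch of pulling the exponent and the AID out of the three PFactor forms listed above. This is a hypothetical parser, not GpuOwl's; "AID" is the placeholder used in the post, and the 7-field layout is assumed to follow Prime95's k*b^n+c convention (AID,k,b,n,c,bits,tests_saved), so the exponent is the fourth field.

```python
def parse_pfactor(line: str):
    """Return (aid, exponent) from one of the PFactor= forms in the post.

    aid is None when it is absent or written as N/A. The 7-field layout is
    assumed to be Prime95-style AID,k,b,n,c,bits,tests_saved (exponent = n).
    """
    key, _, rest = line.strip().partition("=")
    if key != "PFactor":
        raise ValueError(f"not a PFactor line: {line!r}")
    fields = [f.strip() for f in rest.split(",")]
    if len(fields) == 1:                      # PFactor=<exponent>
        return None, int(fields[0])
    if len(fields) == 7:                      # PFactor=<AID>,1,2,<n>,-1,77,2
        aid = None if fields[0] == "N/A" else fields[0]
        return aid, int(fields[3])
    raise ValueError(f"unrecognized PFactor line: {line!r}")

for line in ("PFactor=90551623",
             "PFactor=AID,1,2,90551623,-1,77,2",
             "PFactor=N/A,1,2,90551623,-1,77,2"):
    print(parse_pfactor(line))
```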

By default, the P-1 task is processed with B1=1M and B2=30 * B1. These can be overridden by prepending the limits to any PFactor line above, with this syntax:
B1=2000000;PFactor=90551623
B1=500000,B2=10000000;PFactor=90551623
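A minimal Python sketch of splitting off that optional bounds prefix and applying the defaults (hypothetical helper, not GpuOwl's parser). One assumption is not stated in the post: when only B1 is given, B2 is taken as 30 times that B1.

```python
DEFAULT_B1 = 1_000_000
B2_MULTIPLIER = 30

def split_bounds(line: str):
    """Split an optional 'B1=...[,B2=...];' prefix off a worktodo line.

    Returns (b1, b2, task). Defaults follow the post: B1=1M, B2=30*B1.
    Assumption: if only B1 is given, B2 defaults to 30 times that B1.
    """
    b1, b2 = DEFAULT_B1, None
    prefix, sep, task = line.strip().partition(";")
    if not sep:                      # no ';' -> no bounds prefix
        task, prefix = prefix, ""
    for item in filter(None, prefix.split(",")):
        name, _, value = item.partition("=")
        if name == "B1":
            b1 = int(value)
        elif name == "B2":
            b2 = int(value)
        else:
            raise ValueError(f"unknown bound {name!r} in {line!r}")
    if b2 is None:
        b2 = B2_MULTIPLIER * b1
    return b1, b2, task

for line in ("PFactor=90551623",
             "B1=2000000;PFactor=90551623",
             "B1=500000,B2=10000000;PFactor=90551623"):
    print(split_bounds(line))
# (1000000, 30000000, 'PFactor=90551623')
# (2000000, 60000000, 'PFactor=90551623')
# (500000, 10000000, 'PFactor=90551623')
```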


The P-1 in GpuOwl always uses E=2 (a stage 2 parameter). The D parameter ("block size") is normally computed automatically based on the amount of memory available on the GPU. It can also be specified on the command line, e.g. -D 210. The block size D must be a multiple of 210. Good values are D=2310 (but that wouldn't fit on a GPU with 8GB of RAM) and D=210 or small multiples of 210.
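To see why a large D costs so much memory, here is a minimal Python sketch of the usual baby-step/giant-step layout for P-1 stage 2. This is an assumption about the general technique, not a description of GpuOwl's internals, and the function names are hypothetical. Each prime q in (B1, B2] is written as q = k·D ± j with j < D/2 and gcd(j, D) = 1, and one precomputed residue is kept per such j. There are φ(D)/2 of them: 24 for D=210 and 240 for D=2310. If each residue occupies one 4M-word buffer of 8-byte doubles (32 MiB), D=2310 needs about 7.5 GiB for those buffers alone, which is consistent with it not fitting in 8GB.

```python
from math import gcd

def baby_steps(d: int):
    """Residues j with 1 <= j < d/2 and gcd(j, d) == 1 (one buffer each)."""
    return [j for j in range(1, d // 2) if gcd(j, d) == 1]

def decompose(q: int, d: int):
    """Write prime q (coprime to d) as k*d + s*j with s = +1 or -1."""
    k, r = divmod(q, d)
    if r > d // 2:
        return k + 1, -1, d - r     # q = (k+1)*d - (d - r)
    return k, +1, r                 # q = k*d + r

def buffer_bytes(d: int, fft_words: int, bytes_per_word: int = 8) -> int:
    """Memory for one FFT-sized buffer per baby step (assumed layout)."""
    return len(baby_steps(d)) * fft_words * bytes_per_word

FFT_4M = 1024 * 2048 * 2
print(len(baby_steps(210)), len(baby_steps(2310)))   # 24 240
print(buffer_bytes(2310, FFT_4M) / 2**30)            # 7.5 (GiB)
print(decompose(1103, 210))                          # (5, 1, 53)
```

Under this layout a larger D means fewer giant steps (about (B2 - B1)/D of them) but more resident buffers, which is the trade-off an automatic choice of D based on GPU memory has to make.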

P-1 does not save its work to a savefile. If it is stopped (crash, etc.), the progress is lost.

At this stage I'm very interested in bug reports, most importantly situations where a factor that should be detected given the B1/B2 is not found.
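Before reporting a missed factor, it can help to check whether a known factor q of 2^p - 1 should be found at all. A minimal Python sketch of the textbook criterion, with `should_find` as a hypothetical helper (not part of GpuOwl). It assumes stage 1 covers every prime power up to B1 plus the factor 2p, and that stage 2 covers a single extra prime in (B1, B2]; any lucky extras from stage 2 extensions are ignored.

```python
def factorize(n: int) -> dict:
    """Trial-division factorization (fine for the small k used here)."""
    fac, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            fac[d] = fac.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        fac[n] = fac.get(n, 0) + 1
    return fac

def should_find(q: int, p: int, b1: int, b2: int) -> bool:
    """Would P-1 with bounds B1/B2 find the factor q of 2**p - 1?"""
    if (q - 1) % (2 * p):
        return False                      # not of the form 2*k*p + 1
    k = (q - 1) // (2 * p)
    stage2_used = False
    for r, a in factorize(k).items():
        if r ** a <= b1:
            continue                      # covered by stage 1
        if a == 1 and b1 < r <= b2 and not stage2_used:
            stage2_used = True            # the one prime stage 2 may cover
            continue
        return False
    return True

# 2**29 - 1 = 233 * 1103 * 2089; k = 4, 19 and 36 respectively.
print([should_find(q, 29, 10, 30) for q in (233, 1103, 2089)])  # [True, True, True]
print(should_find(1103, 29, 10, 15))                            # False
```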
2019-01-24, 09:29   #946
preda
"Mihai Preda"
Apr 2015
2²·3·11² Posts

GpuOwl v6.1, just committed on GitHub, has P-1. It needs GMP (for the GCD, which is done on the CPU, as it was before with PRP-1).
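The GCD is the step that turns P-1's modular arithmetic into an actual factor. A toy-sized Python sketch of stage 1 (an illustration, not GpuOwl's code): Python's `math.gcd` stands in for GMP, and the base 3 with an extra 2p folded into the exponent is an assumption in line with common practice for Mersenne numbers. It computes x = 3^E mod 2^p - 1, where E is 2p times every prime power up to B1, and then takes gcd(x - 1, 2^p - 1).

```python
from math import gcd

def primes_upto(n: int):
    """Simple sieve of Eratosthenes."""
    is_p = [True] * (n + 1)
    is_p[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if is_p[i]:
            is_p[i * i::i] = [False] * len(is_p[i * i::i])
    return [i for i, v in enumerate(is_p) if v]

def pminus1_stage1(p: int, b1: int) -> int:
    """Toy P-1 stage 1 on M = 2**p - 1; returns gcd(3**E - 1, M)."""
    m = (1 << p) - 1
    x = pow(3, 2 * p, m)                 # fold in 2p: factors are 2kp + 1
    for r in primes_upto(b1):
        r_pow = r
        while r_pow * r <= b1:           # largest power of r not above B1
            r_pow *= r
        x = pow(x, r_pow, m)
    return gcd(x - 1, m)                 # the CPU-side GCD step

print(pminus1_stage1(29, 10))   # 486737 == 233 * 2089
```

With B1=10, the factors 233 (k=4) and 2089 (k=36) of 2^29 - 1 come out in stage 1, while 1103 (k=19) needs a stage 2 that reaches 19.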

I must say, P-1 stage 2 was rather hard for me to understand (after the fact it doesn't look so terrible; I think I could explain it simply now).

I found Alexander Kruppa's thesis useful:
https://tel.archives-ouvertes.fr/fil...name/thesis.ps
(although even that was not easy reading).

Last fiddled with by preda on 2019-01-24 at 09:47

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
