#1816
Romulan Interpreter
Jun 2011
Thailand
258B₁₆ Posts
First, CudaLucas was never intended to run on AMD cards. On native CUDA (Nvidia) cards it is still faster than anything else, at least for everything I run in my rigs, old cards (like the 580 and the classic/Black Titans) and new cards (like the 1080 Ti and 2080 Ti) included. Second, there is "almost nothing" to improve in CudaLucas (well, there are some minor things, hence the quotes, but the big picture won't change much). This toy is just a "square, subtract 2, repeat" tool, which uses Nvidia's CUDA FFT library (cuFFT) to do the squaring. That library has indeed fallen behind, as you said. It has not been updated by Nvidia for ages, and if we could convince them to make (or make ourselves) a cuFFT library a hundred times faster than the current one, all CL would need is a recompilation. For the owl (gpuOwl), Preda wrote the FFT code from scratch, and it is well tuned for OpenCL, but Nvidia cards are not as good at OpenCL; they are faster when native CUDA is used.
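The "square, subtract 2, repeat" loop is the Lucas-Lehmer test. Below is a minimal Python sketch of it, using Python's built-in big integers in place of the FFT-based modular squaring that CudaLucas hands to cuFFT; the function name `lucas_lehmer` is just illustrative, and this is nowhere near GPU speed.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if M_p = 2**p - 1 is prime (p must be an odd prime)."""
    m = (1 << p) - 1          # the Mersenne number M_p
    s = 4                     # s_0 = 4
    for _ in range(p - 2):    # p - 2 iterations of "square, subtract 2"
        s = (s * s - 2) % m   # this squaring is the part CudaLucas does with cuFFT
    return s == 0             # M_p is prime iff s_{p-2} == 0


print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 31) if lucas_lehmer(p)])
# → [3, 5, 7, 13, 17, 19, 31]
```

Real programs spend essentially all their time in that one squaring line, which is why swapping in a faster FFT library would be "just a recompilation".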
Last fiddled with by LaurV on 2020-02-02 at 05:22
|
|
|
|
|
#1817
"Eric"
Jan 2018
USA
212₁₀ Posts
This statement is a bit misleading, since with the recent gpuowl updates it has become significantly more efficient in memory bandwidth usage. I am seeing significant speedups on GPUs with a high DP ratio like the K80, P100, V100 and Titan V. There is indeed not much difference for the GTX and RTX cards, since most of them are DP bound rather than memory bound.
Last fiddled with by xx005fs on 2020-02-02 at 05:26
|
|
|
|
#1818
"Sam Laur"
Dec 2018
Turku, Finland
2×3×53 Posts
Nope, on my RTX 2080 at least, the current version of gpuowl is about 20-30% faster than cudalucas, varying a bit from one FFT size to another. The big improvement came at the beginning of December 2019, and smaller optimizations have accumulated since then, so if you tested gpuowl before that, please test again.
|
|
|
|
|
#1819
Romulan Interpreter
Jun 2011
Thailand
258B₁₆ Posts
You may be totally right... We haven't moved to such fancy new things yet... Edit: @nomead, crosspost; I was replying to xx, but what you say is really tempting. BRB soon.
Last fiddled with by LaurV on 2020-02-02 at 08:00
|
|
|
|
#1820
"Eric"
Jan 2018
USA
2²·53 Posts
Quote:
|
|
|
|
|
|
#1821
"Mihai Preda"
Apr 2015
3·457 Posts
It seems that your OpenCL compiler does not like __attribute__((opencl_unroll_hint(1))). To work around that, simply pass "-use UNROLL_ALL" (and none of the other UNROLL_ options), or, if running on an Nvidia card, don't pass any UNROLL option at all.
|
|
|
|
|
|
#1822
"Mihai Preda"
Apr 2015
3·457 Posts
As the error says, you can't use "WORKINGOUT4" with that FFT size.
Did you try running the program without any -use options? Does that work?
|
|
|
|
|
|
#1823
"Jorge Coveiro"
Nov 2006
Moura, Portugal
2·13 Posts
I was just testing the "optimized settings" for Nvidia cards, but it seems that I can't use WORKINGOUT4. Going to test again and publish the results for the GTX 1660.
Last fiddled with by JCoveiro on 2020-02-02 at 20:45
|
|
|
|
|
#1824
"William Garnett III"
Oct 2002
Bensalem, PA
2·43 Posts
However, even with iteration times a couple of milliseconds slower on gpuOwL than on CUDALucas (plus a couple of milliseconds of slowdown to Prime95 if it is also running), gpuOwL eliminates the need for a double-check, which makes it the overall time-saving winner over CUDALucas for me. I have only done one PRP double-check with gpuOwL, and I occasionally do LL double-checks with CUDALucas. Since the 1/32 double-precision ratio is terrible, I mostly stick with trial factoring using mfaktc.
Last fiddled with by wfgarnett3 on 2020-02-06 at 09:34
|
|
|
|
|
#1825
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5363₁₀ Posts
But it doesn't. There is a PRP DC work type for good reasons:
1) errors may occur outside the code that the GEC (Gerbicz error check) covers, both in the software and in the manual reporting process, and some have already been confirmed to occur;
2) PRP DC guards against someone forging PRP first-test submissions;
3) the PRP GEC itself has a very low error rate, but not zero. Gerbicz himself has given error rate estimates.
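For context on the GEC: one way to run a PRP test of M_p is to compute u_i = 3^(2^i) mod M_p and check that u_p ≡ 9, since 3^(2^p) = 3^(M_p + 1), which is equivalent to a base-3 Fermat test. Gerbicz's check keeps a running product d of every L-th residue and verifies the identity d_t = u_0 · d_{t-1}^(2^L), which an error anywhere in the squaring chain breaks with overwhelming probability. The Python sketch below is a simplified illustration, not gpuOwl's implementation. It checks after every block (real programs check far less often to keep the overhead small), it does not cover the final partial block, and `flip_at` is a hypothetical knob for injecting a fake error. It also cannot see anything that happens after the test, such as the reporting step in point 1).

```python
from typing import Optional, Tuple


def prp_gerbicz(p: int, L: int = 8,
                flip_at: Optional[int] = None) -> Tuple[bool, bool]:
    """Base-3 PRP test of M_p with a simplified Gerbicz error check.

    Returns (is_probable_prime, every_check_passed).
    """
    n = (1 << p) - 1
    u0 = 3                      # base 3: every M_p with prime p is a base-2 Fermat PRP
    u = u0                      # u_i = 3^(2^i) mod n
    d = u0                      # checksum d_t = u_0 * u_L * u_2L * ... * u_tL mod n
    checks_ok = True
    for i in range(1, p + 1):
        u = u * u % n
        if i == flip_at:
            u = (u + 1) % n     # simulated hardware error
        if i % L == 0:          # end of a block of L squarings
            prev_d, d = d, d * u % n
            # Gerbicz identity: d_t == u_0 * d_{t-1}^(2^L) (costs L extra squarings)
            if d != u0 * pow(prev_d, 1 << L, n) % n:
                checks_ok = False  # a real program would roll back to the last good state
    # 3^(2^p) = 3^(M_p + 1), which is 9 mod M_p if M_p passes the base-3 Fermat test
    return u == 9 % n, checks_ok


print(prp_gerbicz(127))              # (True, True)
print(prp_gerbicz(127, flip_at=20))  # check fails: second value is False
```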
|
|
|
|
|
|
#1826
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
31×173 Posts
CUDALucas still has its place:
1) it is faster than gpuowl on a few GPU models;
2) it will run on older NVIDIA GPUs that are entirely incapable of running gpuowl because they don't support the OpenCL level gpuowl requires;
3) relatively current gpuowl versions don't do LL, so they can't do LLDC (although gpuowl v0.5 and v0.6 can, with a 4M FFT).
It would be great if CUDALucas had the Jacobi check.
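The Jacobi check relies on a property of the LL sequence s_0 = 4, s_i = s_{i-1}^2 - 2 mod M_p: for every i ≥ 1, the Jacobi symbol (s_i - 2 | M_p) is -1 (barring a common factor with M_p, which is practically never an issue). A random error turns it into +1 about half the time, and it then stays +1, so a periodic check catches roughly half of all errors. That is weaker than the GEC for PRP but far better than nothing. Below is a minimal Python sketch, not code from CUDALucas or Prime95; `check_every` and `flip_at` are hypothetical parameters for illustration.

```python
from typing import Optional, Tuple


def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a | n) for odd n > 0 (standard binary algorithm)."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:          # (2 | n) = -1 iff n = 3 or 5 (mod 8)
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def lucas_lehmer_jacobi(p: int, check_every: int = 1000,
                        flip_at: Optional[int] = None) -> Tuple[bool, bool]:
    """LL test of M_p with a periodic Jacobi check.

    Returns (is_prime, every_check_passed).
    """
    m = (1 << p) - 1
    s = 4
    checks_ok = True
    for i in range(1, p - 1):      # computes s_1 .. s_{p-2}
        s = (s * s - 2) % m
        if i == flip_at:
            s = (s + 1) % m        # simulated hardware error
        if i % check_every == 0 and jacobi(s - 2, m) != -1:
            checks_ok = False      # a real program would roll back to a saved state
    return s == 0, checks_ok


print(lucas_lehmer_jacobi(13, check_every=1))  # (True, True)
```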
|
|