#1

robertfrost
Oct 2018
22 Posts
I'm currently performing a Lucas-Lehmer test on a 100-million-digit candidate using CudaLucas. Can it handle numbers that large?
#2

kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^4·3·163 Posts
Quote:
#3

robertfrost
Oct 2018
22 Posts
Thanks - I searched for ages without finding that (before I asked here). The exponent in question is 3.3*10^8, which looks to be above the limit. Does that mean I must abandon my test and find another way?

EDIT... SORRY, IT GOES UP TO 1*10^9, doesn't it? So I'm okay. Not sure if I'm being daft.

Last fiddled with by robertfrost on 2018-10-26 at 14:08
#4

kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^4·3·163 Posts
Actually, it turns out upon further investigation that CUDALucas theoretically goes up to 2^31 - 1. It will run FFT benchmarks and thread benchmarks up to 256M FFT length, and its maximum exponent is capped at 2147483647. See the attachment at post 3 of
https://www.mersenneforum.org/showthread.php?t=23371 and the CUDALucas reference thread linked from that thread.
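As a minimal illustration (this is not CUDALucas source; the names are made up), the 2147483647 cap is exactly what a signed 32-bit exponent field implies, and an exponent of roughly 3.3*10^8 sits well under it:
Code:
#include <stdint.h>
#include <stdio.h>

/* Illustration only (not CUDALucas code): an exponent held in a signed
 * 32-bit integer cannot exceed INT32_MAX = 2^31 - 1 = 2147483647. */
int main(void)
{
    const int32_t max_exponent = INT32_MAX;   /* 2147483647 */

    /* The exponent in question is about 3.3e8 (a 100-million-digit
     * candidate), far below the cap. */
    const int32_t p = 330000000;

    printf("cap = %d\n", (int)max_exponent);
    printf("p   = %d  -> %s the cap\n", (int)p,
           p <= max_exponent ? "fits under" : "exceeds");
    return 0;
}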
#5

LaurV
Romulan Interpreter
"name field"
Jun 2011
Thailand
41·251 Posts
Yes, cudaLucas is limited to a signed 32-bit word for the exponent, but you will sooner reach the FFT limit imposed by the card's memory, unless you rewrite the cuFFT library yourself.
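A back-of-the-envelope sketch of why card memory bites first (assumption: one double per FFT element and nothing else; real CUDALucas/cuFFT usage is several times larger because of transform workspace, weights and carry data, so treat these numbers as floors, not predictions):
Code:
#include <stdio.h>

/* Back-of-the-envelope only: n doubles of 8 bytes each for an FFT of length n.
 * 57344K and 256M (262144K) are lengths mentioned in this thread; 131072K is
 * just an intermediate example. */
int main(void)
{
    const double lengths_K[] = { 57344.0, 131072.0, 262144.0 };
    const int n = (int)(sizeof lengths_K / sizeof lengths_K[0]);

    for (int i = 0; i < n; i++) {
        double elements = lengths_K[i] * 1024.0;
        double gib = elements * 8.0 / (1024.0 * 1024.0 * 1024.0);
        printf("FFT length %8.0fK -> at least %.2f GiB for the data alone\n",
               lengths_K[i], gib);
    }
    return 0;
}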
#6

kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
17220_8 Posts
Quote:
Code:
Wed Jan 09 04:41:05 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 378.78                 Driver Version: 378.78                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro 2000        WDDM  | 0000:02:00.0      On |                  N/A |
|100%   78C    P0    N/A /  N/A |     88MiB /  1024MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108... WDDM  | 0000:03:00.0     Off |                  N/A |
| 66%   82C    P2   220W / 250W |   1619MiB / 11264MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1868    C   ... Documents\mfaktc q2000\mfaktc-win-64.exe    N/A   |
|    1      4644    C   ...CUDALucas2.06beta-CUDA8.0-Windows-x64.exe    N/A   |
+-----------------------------------------------------------------------------+
Code:
Continuing M999999937 @ iteration 4302 with fft length 57344K, 0.00% done
|   Date     Time    |   Test    Num      Residue           |    FFT   Error     ms/It     Time  |      ETA       Done  |
|  Jan 09  04:45:26  | M999999937  5000  0xb723ad2cf90fefd5 | 57344K  0.18750  40.3755  28.18s | 473:09:25:34   0.00% |
|  Jan 09  04:46:07  | M999999937  6000  0x00c230e56a4bc3ca | 57344K  0.20313  40.6178  40.61s | 472:20:17:29   0.00% |
|  Jan 09  04:46:48  | M999999937  7000  0x7d01674dde8ecc02 | 57344K  0.18945  40.9224  40.92s | 472:22:59:37   0.00% |
Extrapolating linearly (which is optimistic; above 2G, code gets a bit bigger) and note, while I was composing this, as the gpu warmed up, the projected run time increased about 0.5% beyond what's tabulated here:
Code:
p       VRAM GB   runtime (years per exponent)
M1G       1.62     1.3
M2G       3.24     2.6
M3G       4.86     3.9
M3.7G     5.99     4.8
M4G       6.48     5.2
M5G       8.10     6.5
M6G       9.72     7.8
M6.8G    11.02     8.8
M7G      11.34     9.1
Any idea why signed int was used instead of unsigned for exponent, or how hard it would be to change (hidden complications)?

Last fiddled with by kriesel on 2019-01-09 at 11:18
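As a rough cross-check of the linear extrapolation above (a sketch only: it multiplies the observed ~40.6 ms/iteration from the log by the p - 2 squarings an LL test needs, and ignores warm-up, error checks and checkpointing):
Code:
#include <stdio.h>

/* Sketch: the linear per-exponent estimate is just (p - 2) squarings times
 * the observed ms/iteration, which reproduces roughly the 1.3-year M1G
 * figure tabulated above. */
int main(void)
{
    const double p           = 999999937.0;  /* exponent from the log above      */
    const double ms_per_iter = 40.6;         /* observed ms/It on the second gpu */

    double seconds = (p - 2.0) * ms_per_iter / 1000.0;
    double days    = seconds / 86400.0;
    double years   = days / 365.25;

    printf("M%.0f: about %.0f days, i.e. %.2f years per LL test\n",
           p, days, years);
    return 0;
}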
#7

kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^4·3·163 Posts
Please disregard the run times in the preceding post. The only one that's credible is the 1.3 years for M1G. The run times should scale as approximately p^2.1, not linearly. The extrapolation table has been adjusted and extended to include estimates for some typical gpu memory capacities, and posted at https://www.mersenneforum.org/showpo...93&postcount=7
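A sketch of that scaling rule, anchored to the 1.3 years at M1G (the form time(p) ≈ 1.3 · (p/10^9)^2.1 is an assumption based on the figures quoted here; the properly adjusted table is in the linked post, so the computed numbers below are only illustrative):
Code:
#include <math.h>
#include <stdio.h>

/* Illustrative only: scale the credible 1.3-year M1G figure by (p / 1e9)^2.1
 * instead of linearly.  This just shows the shape of the curve; see the
 * linked post for the adjusted table.  Build with -lm. */
int main(void)
{
    const double base_p      = 1.0e9;  /* M1G reference exponent           */
    const double base_years  = 1.3;    /* credible M1G estimate from above */
    const double exponents[] = { 3.32e8, 1.0e9, 2.0e9 };

    for (int i = 0; i < 3; i++) {
        double p     = exponents[i];
        double years = base_years * pow(p / base_p, 2.1);
        printf("p = %.3g -> roughly %.2f years on the same gpu\n", p, years);
    }
    return 0;
}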
#8

LaurV
Romulan Interpreter
"name field"
Jun 2011
Thailand
10100000110011_2 Posts
For the record, cuFFT uses more memory than gwlib/P95 does, and not always in a way that is transparent to the user. I was never able to run a 100M-digit LL test (332M+ exponent) with my GTX 580s with 1.5GB memory (I still own 4 of them, only 2 in production, the other 2 shelved for lack of available PCIe slots). It will not say that it can't run, but you get a lot of strange errors and mismatches somewhere after a million iterations (for example), and you are never able to finish a test.

With the 3GB version of the same card you can go to about 550M (I can't remember the exact numbers; I had 2 such cards and sold them years ago). However, my 6GB Titans are currently testing M666666667 (ETA in ~4 months) and there is no problem with it.

Your CPU does the calculation sequentially, so one iteration of LL does not need much memory. On the GPU, all the butterflies are done at the same time in parallel, so cuFFT operates on all the data at once, more or less (well, this is not really true, but that is the idea), and therefore it needs more memory than the few MB you give P95 for LL tests. More than that I can't say; you don't know whether it works until you really do a complete test at that size, backed up by a parallel run on a second card, of course, otherwise you waste the time. I get mismatch errors and need to resume every week or two (2-3 times per month) at the clocks I push the Titans to.

Last fiddled with by LaurV on 2019-01-10 at 09:22
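One way to make cuFFT's memory appetite a little less opaque is to ask it for a workspace estimate up front. The sketch below assumes a double-precision complex-to-complex 1-D transform (CUFFT_Z2Z), which may well differ from what CUDALucas actually plans, and the estimate excludes the signal buffers themselves:
Code:
#include <stdio.h>
#include <cufft.h>

/* Sketch: query cuFFT's estimated workspace before committing a card.
 * CUFFT_Z2Z, 1-D, single batch is an assumption for illustration only.
 * Build: nvcc estimate.cu -lcufft */
int main(void)
{
    const int nx = 57344 * 1024;   /* 57344K, an FFT length seen in this thread */
    size_t work_bytes = 0;

    if (cufftEstimate1d(nx, CUFFT_Z2Z, 1, &work_bytes) != CUFFT_SUCCESS) {
        fprintf(stderr, "cufftEstimate1d failed\n");
        return 1;
    }
    printf("estimated cuFFT workspace for %dK: %.2f GiB\n",
           nx / 1024, work_bytes / (1024.0 * 1024.0 * 1024.0));
    return 0;
}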
#9

kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^4·3·163 Posts
Have you tried using nvidia-smi to show gpu memory usage? GPU-Z is useful for some things, but compared to nvidia-smi it seems to report memory usage approximately mod 4GB.
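For a programmatic cross-check of what nvidia-smi and GPU-Z report, a small CUDA runtime sketch like the one below prints per-device free/total memory; note this is device-wide usage, not the per-process breakdown nvidia-smi's process table gives:
Code:
#include <stdio.h>
#include <cuda_runtime.h>

/* Sketch: print free/total memory for each CUDA device via the runtime API.
 * Build: nvcc meminfo.cu */
int main(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "no CUDA devices visible\n");
        return 1;
    }
    for (int d = 0; d < count; d++) {
        cudaSetDevice(d);
        size_t free_b = 0, total_b = 0;
        if (cudaMemGetInfo(&free_b, &total_b) == cudaSuccess) {
            printf("device %d: %.0f MiB free of %.0f MiB\n", d,
                   free_b / (1024.0 * 1024.0), total_b / (1024.0 * 1024.0));
        }
    }
    return 0;
}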