#199
Feb 2012
the Netherlands
Quote:
So far I have tested up to 30000000 on the B2 value. I'll do some further testing, but there are no errors anymore.
#200
Mar 2010
With e=12 and the 64-bit binary, the maximum working FFT length seems to be 1024K.
Once it jumps to 1120K, stage 2 doesn't run because there isn't enough vRAM available.
Code:
CUDAPm1 -d 0 -threads 512 -c 10000 -t -polite 0 -b1 1 -b2 1000 -e2 12 18900103
CUDAPm1 v0.10
Warning: Couldn't parse ini file option WorkFile; using default "worktodo.txt"
CUDA reports 5746M of 6143M GPU memory free.
Using e=12, d=2310, nrp=480
Using approximately 4395M GPU memory.
B1 should be at least 2, increasing it.
B2 should be at least 750750, increasing it.
Starting stage 1 P-1, M18900103, B1 = 2, B2 = 750750, e = 12, fft length = 1120K
Doing 27 iterations
M18900103, 0xd9cdc4241fd69cb5, offset = 0, n = 1120K, CUDAPm1 v0.10
Stage 1 complete, estimated total time = 0:01
Starting stage 1 gcd.
M18900103 Stage 1 found no factor (P-1, B1=2, B2=750750, e=12, n=1120K CUDAPm1 v0.10)
Starting stage 2.
C:/Users/childers/Dropbox/NFS/cudapm1/build/cudapm1-code-21/cudapm1-code-21/trunk/CUDAPm1.cu(2640) : cudaSafeCall() Runtime API error 2: out of memory.
Code:
CUDAPm1 -d 0 -threads 512 -c 10000 -t -polite 0 -b1 1 -b2 1000 -e2 12 18800137
CUDAPm1 v0.10
Warning: Couldn't parse ini file option WorkFile; using default "worktodo.txt"
CUDA reports 5754M of 6143M GPU memory free.
Using e=12, d=2310, nrp=480
Using approximately 4019M GPU memory.
B1 should be at least 2, increasing it.
B2 should be at least 750750, increasing it.
Starting stage 1 P-1, M18800137, B1 = 2, B2 = 750750, e = 12, fft length = 1024K
Doing 27 iterations
M18800137, 0x2c4be40be0856b5b, offset = 0, n = 1024K, CUDAPm1 v0.10
Stage 1 complete, estimated total time = 0:00
Starting stage 1 gcd.
M18800137 Stage 1 found no factor (P-1, B1=2, B2=750750, e=12, n=1024K CUDAPm1 v0.10)
Starting stage 2.
Zeros: 26762, Ones: 45718, Pairs: 14576
itime: 71.914939, transforms: 1, average: 71914.939000
ptime: 42.435831, transforms: 95060, average: 0.446411
ETA: 0:00
Stage 2 complete, estimated total time = 1:54
Accumulated Product: M18800137, 0xda62bc92cb243523, n = 1024K, CUDAPm1 v0.10
Starting stage 2 gcd.
M18800137 Stage 2 found no factor (P-1, B1=2, B2=750750, e=12, n=1024K CUDAPm1 v0.10)
#201
Jul 2003
So Cal
This is using the 64-bit binary? It looks suspicious that it fails right when crossing 4096M.
When running the second case, check whether it really uses about 4019M. That value is just an estimate based on what we expect cuFFT to use. If it's accurate, then CUDA is lying when it says we can use 5746M, since the failing run asks for well below that. If this is the 64-bit binary, then I may need to limit memory use to 4096M on Windows. This won't affect most users. :-) In the meantime, you can increase the FFT size until nrp drops to 240, or decrease d to 210 using -d2 210.
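A back-of-envelope check of the 4096M theory, as a minimal Python sketch. It assumes (this is not CUDAPm1's actual accounting) that stage 2 memory is dominated by nrp residues of n doubles each, so it ignores the cuFFT workspace and lands somewhat below the program's own "Using approximately ...M" figures. The helper names are hypothetical.

```python
# Back-of-envelope stage 2 memory check.
# ASSUMPTION (not CUDAPm1's real accounting): the dominant buffer is nrp residues,
# each fft_len_k * 1024 doubles (8 bytes). cuFFT workspace and other buffers are
# ignored, so this lands somewhat below the program's "Using approximately ...M".

BYTES_PER_ELEMENT = 8     # one double per FFT element (assumed)
SUSPECTED_CAP_MIB = 4096  # the ~4096M allocation ceiling suspected above


def residue_buffer_mib(nrp: int, fft_len_k: int) -> float:
    """MiB needed for nrp residues at an FFT length given in K (x1024 elements)."""
    return nrp * fft_len_k * 1024 * BYTES_PER_ELEMENT / 2**20


def under_cap(nrp: int, fft_len_k: int, cap_mib: int = SUSPECTED_CAP_MIB) -> bool:
    """True if the estimated residue buffer stays below the suspected cap."""
    return residue_buffer_mib(nrp, fft_len_k) < cap_mib


if __name__ == "__main__":
    # (480, 1024) and (480, 1120) mirror the two runs in #200; nrp=240 and
    # nrp=48 (at most phi(210) = 48 with -d2 210) are the suggested workarounds.
    for nrp, n in [(480, 1024), (480, 1120), (240, 1120), (48, 1120)]:
        print(f"nrp={nrp:3d} n={n}K ~{residue_buffer_mib(nrp, n):7.1f} MiB "
              f"under cap: {under_cap(nrp, n)}")
```

Under that assumption the 1024K run needs about 3840 MiB and the 1120K run about 4200 MiB, which is consistent with the failure appearing exactly where the total crosses 4096M; dropping nrp to 240, or to at most 48 via -d2 210, brings it well under.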
#202
Mar 2010
Yes, it's the 64-bit binary.
I used MSI Afterburner to measure it: 273 MB used before CUDAPm1, 401 MB during stage 1, 4345 MB during stage 2. 4345 - 273 = 4072 MB, which is "close" to the reported 4019 MB. This is what I call the MSI Afterburner delta method. ProcessXP reported committed GPU memory as 4,116,400K. CUDA may well report free memory correctly, but for some reason it can't allocate a single chunk of more than ~4096 MB. That's my guess. Using -d2 dropped vRAM usage to 615 MB.
Last fiddled with by Karl M Johnson on 2013-05-04 at 09:30
#203
Apr 2010
Over the rainbow
Is there a switch to make the on-screen output less frequent? (Low exponent, without stage 2 done.) I get a new line every 2 seconds in stage 1 and it's a tad annoying.
#204
Mar 2010
The -c flag?
#205
Apr 2010
Over the rainbow
thanks
#206
"Bill Staffen"
Jan 2013
Pittsburgh, PA, USA
Running it on the 580 ftw with the default worktodo, it starts out well:
Code:
CUDAPm1 v0.10
Selected B1=605000, B2=16637500, 4.1% chance of finding a factor
CUDA reports 2766M of 3072M GPU memory free.
Using e=6, d=2310, nrp=80
Using approximately 2529M GPU memory.
Starting stage 1 P-1, M61262347, B1 = 605000, B2 = 16637500, e = 6, fft length = 3360K
Doing 873133 iterations
Iteration 1000 M61262347, 0xf19a7f6041953a97, n = 3360K, CUDAPm1 v0.10 err = 0.19531 (0:09 real, 9.1117 ms/iter, ETA 2:12:26)
Iteration 2000 M61262347, 0xaf1d15aad49fcee8, n = 3360K, CUDAPm1 v0.10 err = 0.19531 (0:06 real, 5.7928 ms/iter, ETA 1:24:06)
Iteration 3000 M61262347, 0xb702298e7a8c9a8e, n = 3360K, CUDAPm1 v0.10 err = 0.19922 (0:06 real, 5.9176 ms/iter, ETA 1:25:49)
Iteration 4000 M61262347, 0xc53d1695707d3dc0, n = 3360K, CUDAPm1 v0.10 err = 0.19141 (0:06 real, 5.8142 ms/iter, ETA 1:24:13)
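The ETAs in this log can be reproduced from the printed numbers alone. A minimal Python sketch, assuming ETA = (total iterations - current iteration) x current ms/iter, truncated to whole seconds; that formula is inferred from the log, not read from the CUDAPm1 source, and the function names are hypothetical.

```python
# ASSUMPTION: ETA = (total - done) * current ms/iter, truncated to whole seconds.
# Inferred from the four log lines above, not taken from the CUDAPm1 source.


def format_hms(seconds: int) -> str:
    """Format a duration as H:MM:SS, matching the log's style."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def stage1_eta(total_iters: int, done: int, ms_per_iter: float) -> str:
    """Remaining stage 1 time at the current per-iteration speed."""
    remaining_s = int((total_iters - done) * ms_per_iter / 1000)
    return format_hms(remaining_s)


TOTAL = 873133  # "Doing 873133 iterations"
LOG = [  # (iteration, ms/iter, ETA as printed in the log)
    (1000, 9.1117, "2:12:26"),
    (2000, 5.7928, "1:24:06"),
    (3000, 5.9176, "1:25:49"),
    (4000, 5.8142, "1:24:13"),
]

if __name__ == "__main__":
    for done, ms, printed in LOG:
        print(done, stage1_eta(TOTAL, done, ms), printed)
```

All four printed ETAs come out exactly under this assumption; the first one is high only because the first 1000 iterations ran at 9.1 ms/iter instead of about 5.8.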
#207
If I May
"Chris Halsall"
Sep 2002
Barbados
Quote:
Not that I'm complaining! Nice work Karl, Fred et al!
#208
"James Heinrich"
May 2004
ex-Northern Ontario
#209
Jul 2003
So Cal
nrp must be a divisor of phi(d); a seg fault is likely otherwise.
With p = the smallest prime that does not divide d:
b1 < b2 / p / 53 will not pair some smaller primes, so it may give incorrect results.
b2 / p < d * (2 * e + 1) will give incorrect results.
b2 / p < b1 will produce a seg fault at the onset of stage 2.
Last fiddled with by frmky on 2013-05-05 at 21:00
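The rules above can be turned into a pre-flight check. This is a minimal Python sketch that transcribes the conditions literally; phi, smallest_prime_not_dividing, min_b2 and check_params are hypothetical helpers for illustration, not CUDAPm1 code.

```python
# Literal transcription of the parameter rules in the post above.
# All functions here are hypothetical helpers, not part of CUDAPm1.


def phi(n: int) -> int:
    """Euler's totient by trial division (fine for small d such as 210 or 2310)."""
    result, m, q = n, n, 2
    while q * q <= m:
        if m % q == 0:
            while m % q == 0:
                m //= q
            result -= result // q
        q += 1
    if m > 1:
        result -= result // m
    return result


def is_prime(k: int) -> bool:
    return k >= 2 and all(k % j for j in range(2, int(k ** 0.5) + 1))


def smallest_prime_not_dividing(d: int) -> int:
    """p in the post: the smallest prime that does not divide d."""
    p = 2
    while not (is_prime(p) and d % p != 0):
        p += 1
    return p


def min_b2(d: int, e: int) -> int:
    """Smallest b2 that avoids b2/p < d*(2e+1), i.e. p*d*(2e+1)."""
    return smallest_prime_not_dividing(d) * d * (2 * e + 1)


def check_params(b1: int, b2: int, d: int, e: int, nrp: int) -> list[str]:
    """Problems the post predicts for these settings (empty list = none)."""
    problems = []
    if phi(d) % nrp != 0:
        problems.append("nrp does not divide phi(d): seg fault likely")
    p = smallest_prime_not_dividing(d)
    if b2 / p < b1:
        problems.append("b2/p < b1: seg fault at the onset of stage 2")
    if b2 / p < d * (2 * e + 1):
        problems.append("b2/p < d*(2e+1): incorrect results")
    if b1 < b2 / p / 53:
        problems.append("b1 < b2/p/53: some smaller primes not paired")
    return problems


if __name__ == "__main__":
    print(phi(2310), smallest_prime_not_dividing(2310), min_b2(2310, 12))
    print(check_params(605000, 16637500, 2310, 6, 80))  # settings from #206
```

For d=2310 (p=13) and e=12, min_b2 returns 750750, matching the "B2 should be at least 750750" floor in the #200 logs. Read literally, the b1 rule also flags the B1=2 test runs there, while the #206 settings (B1=605000, B2=16637500, e=6, d=2310, nrp=80) pass every check.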