mersenneforum.org  

2013-05-03, 03:43   #177
kladner
"Kieren"
Jul 2011
In My Own Galaxy!
27AE₁₆ Posts

Quote:
Originally Posted by Aramis Wyler
Will there be a new Windows build with the stage1 fix in? I have an unusual 580 that I would be willing to run some tests against.

Dang! That is unusual! Nice amount of RAM, too. Does it OC at all?

2013-05-03, 04:34   #178
Aramis Wyler
"Bill Staffen"
Jan 2013
Pittsburgh, PA, USA
2³·53 Posts

Well, the 1.5 GB version runs at 850, so as a lark I tried to crank this one up to 850/1700 as well. Sure enough, it has been stable. I have never tried to take it past 850/1700.

EDIT: Saying it clocks at 850 doesn't always mean anything in speed terms, so I grabbed this out of the mfaktc window for reference.
Code:
got assignment: exp=63249397 bit_min=73 bit_max=74 (30.25 GHz-days)
Starting trial factoring M63249397 from 2^73 to 2^74 (30.25 GHz-days)
 k_min = 74662632479820
 k_max = 149325264962435
Using GPU kernel "barrett76_mul32_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
May 03 00:35 | 3827  82.8% |  5.773  15m53s |    471.52    69941    n.a.%
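
For anyone checking the numbers in that log: mfaktc looks for factors of the form q = 2kp + 1, so the k range follows from the bit range, and the GHz-d/day rate implies a total run time. A rough sketch in Python follows. The helper names are hypothetical, and rounding k_min down to a multiple of the class count (4620 assumed here) is only my guess at why the printed k_min is slightly below the plain quotient.

```python
# Illustrative check of the mfaktc log above; function names are hypothetical.
P = 63249397          # exponent from the log
NUM_CLASSES = 4620    # assumption: number of classes used by this mfaktc build

def k_bounds(p, bit_min, bit_max):
    """k range for candidate factors q = 2*k*p + 1 between 2^bit_min and 2^bit_max."""
    return (1 << bit_min) // (2 * p), (1 << bit_max) // (2 * p)

def hours_for(ghz_days, ghz_days_per_day):
    """Wall-clock hours implied by a work estimate and a throughput figure."""
    return 24.0 * ghz_days / ghz_days_per_day

k_lo, k_hi = k_bounds(P, 73, 74)
k_lo_rounded = k_lo - k_lo % NUM_CLASSES   # assumed rounding to a class multiple
hours = hours_for(30.25, 471.52)

print("k_min (plain quotient):", k_lo)
print("k_min (rounded down):  ", k_lo_rounded)
print("k_max:                 ", k_hi)
print("estimated hours: %.2f" % hours)
```

The run time comes out at roughly an hour and a half, which is in line with the ETA column (about 16 minutes left at 82.8% done).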

Last fiddled with by Aramis Wyler on 2013-05-03 at 04:36 Reason: a speed reference.

2013-05-03, 07:17   #179
frmky
Jul 2003
So Cal
2²·23² Posts

Still don't have the motivation to track down the problem reading text from ini files, but here's the next version to try.
https://www.dropbox.com/s/2b840sgu33...1_20130502.zip

Again, completely untested in Windows by me.

2013-05-03, 07:30   #180
frmky
Jul 2003
So Cal
2²·23² Posts

Quote:
Originally Posted by Stef42
Code:
B2 should be at least 1560000, increasing it.
Starting stage 1 P-1, M9090017, B1 = 120000, B2 = 1560000, e = 6, fft length = 512K
I'm not that good at figuring out what it's bound to. An example might help, though.

The limits depend on B1, B2, d2, and e2. It's somewhat non-trivial, which is why the code handles it automatically.

2013-05-03, 07:33   #181
frmky
Jul 2003
So Cal
844₁₆ Posts

Quote:
Originally Posted by owftheevil
Code:
Starting stage 1 gcd.
M55824233 has a factor: 833043841114609831879 (P-1, B1=839, B2=12625000, e=6, n=3072K CUDAPm1 v0.00)

If the factor is found in stage 1, should the value of B2 in the output be equal to B1 as in the following:
M55824233 has a factor: 833043841114609831879 (P-1, B1=839, B2=839, e=6, n=3072K CUDAPm1 v0.00)

If so, that's an easy change.
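
To make the question concrete, here is a small Python sketch of the proposed output rule. It is not CUDAPm1 code: `format_factor_line` and its parameters are invented for illustration, and it only mimics the result line quoted above.

```python
def format_factor_line(exponent, factor, b1, b2, e, fft_k, found_in_stage1,
                       version="CUDAPm1 v0.00"):
    """Build a result line, reporting B2 = B1 when the factor came from stage 1."""
    # Stage 2 never ran for a stage 1 factor, so B2 carries no information there.
    reported_b2 = b1 if found_in_stage1 else b2
    return ("M%d has a factor: %d (P-1, B1=%d, B2=%d, e=%d, n=%dK %s)"
            % (exponent, factor, b1, reported_b2, e, fft_k, version))

line = format_factor_line(55824233, 833043841114609831879, 839, 12625000, 6, 3072,
                          found_in_stage1=True)
print(line)
```

With `found_in_stage1=False` the same call would keep the requested B2 in the output.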

2013-05-03, 08:07   #182
Karl M Johnson
Mar 2010
19B₁₆ Posts

Yay, it works!
Managed to get to stage 2 with -b1 500 -b2 0.5M; it was using around 1.8 GB of VRAM, as the program calculated.
Now, how do I change the 'e' parameter to increase memory usage (is there a way at all, indirect perhaps?)?
Code:
CUDAPm1 -d 0 -threads 512 -c 10000 -t -polite 0 -b1 99000 -b2 99000000 8000117
Warning: Couldn't parse ini file option WorkFile; using default "worktodo.txt"
Warning: Couldn't parse ini file option ResultsFile; using default "results.txt"
------- DEVICE 0 -------
name                GeForce GTX TITAN
totalGlobalMem      -1
sharedMemPerBlock   49152
regsPerBlock        65536
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      2147483647,65535,65535
totalConstMem       65536
Compatibility       3.5
clockRate (MHz)     928
textureAlignment    512
deviceOverlap       1
multiProcessorCount 14

CUDA reports 4095M of 4095M GPU memory free.
Using e=6, d=2310, nrp=480
Using approximately 1737M GPU memory.
B1 should be at least 143687, increasing it.
Starting stage 1 P-1, M8000117, B1 = 143687, B2 = 99000000, e = 6, fft length = 448K
Doing 207401 iterations
Iteration 10000 M8000117, 0xcdeeefc9ed0c8af2, n = 448K, CUDAPm1 v0.00 err = 0.03418 (0:10 real, 1.0008 ms/iter, ETA 3:17)
Iteration 20000 M8000117, 0xc1f6fb554bd5366d, n = 448K, CUDAPm1 v0.00 err = 0.03809 (0:10 real, 0.9963 ms/iter, ETA 3:06)
Iteration 30000 M8000117, 0xa8c3682070917470, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:10 real, 0.9987 ms/iter, ETA 2:57)
Iteration 40000 M8000117, 0x8641b21065c7c3c4, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:10 real, 0.9896 ms/iter, ETA 2:45)
Iteration 50000 M8000117, 0xdde465fe55ac1ecb, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:10 real, 0.9817 ms/iter, ETA 2:34)
Iteration 60000 M8000117, 0xa795e30debbb03a1, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 0.9906 ms/iter, ETA 2:26)
Iteration 70000 M8000117, 0xbe53ab8c34cac0e3, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:10 real, 0.9818 ms/iter, ETA 2:14)
Iteration 80000 M8000117, 0x3f4e80a97be2b8b8, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:10 real, 0.9974 ms/iter, ETA 2:07)
Iteration 90000 M8000117, 0x64c10d213e1edda8, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 0.9985 ms/iter, ETA 1:57)
Iteration 100000 M8000117, 0xb85ce1dc3b7d9537, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:10 real, 0.9842 ms/iter, ETA 1:45)
Iteration 110000 M8000117, 0xa031e593b3e4eb0e, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0001 ms/iter, ETA 1:37)
Iteration 120000 M8000117, 0x33806a25d8628703, n = 448K, CUDAPm1 v0.00 err = 0.03418 (0:10 real, 1.0034 ms/iter, ETA 1:27)
Iteration 130000 M8000117, 0x4d78d18fdbe49d31, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0009 ms/iter, ETA 1:17)
Iteration 140000 M8000117, 0x8340c832411bb464, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0004 ms/iter, ETA 1:07)
Iteration 150000 M8000117, 0xf9531f22d8b5d8fb, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0039 ms/iter, ETA 0:57)
Iteration 160000 M8000117, 0xa5ee8cd34b352e31, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:10 real, 1.0033 ms/iter, ETA 0:47)
Iteration 170000 M8000117, 0x978568529d0b8b98, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0025 ms/iter, ETA 0:37)
Iteration 180000 M8000117, 0xe27641ef5a0da890, n = 448K, CUDAPm1 v0.00 err = 0.03442 (0:10 real, 1.0026 ms/iter, ETA 0:27)
Iteration 190000 M8000117, 0x28680bf513e7c074, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 0.9994 ms/iter, ETA 0:17)
Iteration 200000 M8000117, 0xc34709e98a8b4b98, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0015 ms/iter, ETA 0:07)
M8000117, 0x930fd9ded05878a5, offset = 0, n = 448K, CUDAPm1 v0.00
Stage 1 complete, estimated total time = 3:27
Starting stage 1 gcd.
M8000117 has a factor: 418913928878609399 (P-1, B1=143687, B2=99000000, e=6, n=448K CUDAPm1 v0.00)
Running the same binary with the same options, but on the GTX 480 (1.5 GB VRAM), results in great success!
Code:
CUDAPm1 -d 1 -threads 512 -c 10000 -t -polite 0 -b1 99000 -b2 99000000 8000117

Warning: Couldn't parse ini file option WorkFile; using default "worktodo.txt"
Warning: Couldn't parse ini file option ResultsFile; using default "results.txt"
------- DEVICE 1 -------
name                GeForce GTX 480
totalGlobalMem      1610285056
sharedMemPerBlock   49152
regsPerBlock        32768
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
totalConstMem       65536
Compatibility       2.0
clockRate (MHz)     1600
textureAlignment    512
deviceOverlap       1
multiProcessorCount 15

CUDA reports 1404M of 1535M GPU memory free.
Using e=6, d=2310, nrp=240
Using approximately 897M GPU memory.
B1 should be at least 143687, increasing it.
Starting stage 1 P-1, M8000117, B1 = 143687, B2 = 99000000, e = 6, fft length = 448K
Doing 207401 iterations
Iteration 10000 M8000117, 0xcdeeefc9ed0c8af2, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2635 ms/iter, ETA 4:09)
Iteration 20000 M8000117, 0xc1f6fb554bd5366d, n = 448K, CUDAPm1 v0.00 err = 0.03418 (0:12 real, 1.2055 ms/iter, ETA 3:45)
Iteration 30000 M8000117, 0xa8c3682070917470, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:12 real, 1.2043 ms/iter, ETA 3:33)
Iteration 40000 M8000117, 0x8641b21065c7c3c4, n = 448K, CUDAPm1 v0.00 err = 0.03418 (0:13 real, 1.2206 ms/iter, ETA 3:24)
Iteration 50000 M8000117, 0xdde465fe55ac1ecb, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2151 ms/iter, ETA 3:11)
Iteration 60000 M8000117, 0xa795e30debbb03a1, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.1971 ms/iter, ETA 2:56)
Iteration 70000 M8000117, 0xbe53ab8c34cac0e3, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2213 ms/iter, ETA 2:47)
Iteration 80000 M8000117, 0x3f4e80a97be2b8b8, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2120 ms/iter, ETA 2:34)
Iteration 90000 M8000117, 0x64c10d213e1edda8, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2175 ms/iter, ETA 2:22)
Iteration 100000 M8000117, 0xb85ce1dc3b7d9537, n = 448K, CUDAPm1 v0.00 err = 0.03418 (0:12 real, 1.2178 ms/iter, ETA 2:10)
Iteration 110000 M8000117, 0xa031e593b3e4eb0e, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:13 real, 1.2204 ms/iter, ETA 1:58)
Iteration 120000 M8000117, 0x33806a25d8628703, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2092 ms/iter, ETA 1:45)
Iteration 130000 M8000117, 0x4d78d18fdbe49d31, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2287 ms/iter, ETA 1:35)
Iteration 140000 M8000117, 0x8340c832411bb464, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2191 ms/iter, ETA 1:22)
Iteration 150000 M8000117, 0xf9531f22d8b5d8fb, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2197 ms/iter, ETA 1:10)
Iteration 160000 M8000117, 0xa5ee8cd34b352e31, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:13 real, 1.2187 ms/iter, ETA 0:57)
Iteration 170000 M8000117, 0x978568529d0b8b98, n = 448K, CUDAPm1 v0.00 err = 0.03369 (0:12 real, 1.1983 ms/iter, ETA 0:44)
Iteration 180000 M8000117, 0xe27641ef5a0da890, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:12 real, 1.2121 ms/iter, ETA 0:33)
Iteration 190000 M8000117, 0x28680bf513e7c074, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2194 ms/iter, ETA 0:21)
Iteration 200000 M8000117, 0xc34709e98a8b4b98, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2186 ms/iter, ETA 0:09)
M8000117, 0x930fd9ded05878a5, offset = 0, n = 448K, CUDAPm1 v0.00
Stage 1 complete, estimated total time = 4:12
Starting stage 1 gcd.
M8000117 has a factor: 418913928878609399 (P-1, B1=143687, B2=99000000, e=6, n=448K CUDAPm1 v0.00)
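
A quick sanity check on the two logs, sketched in Python. Stage 1 of P-1 needs roughly B1 / ln 2 (about 1.44 × B1) squarings, and the iteration count times the per-iteration time should roughly reproduce the reported totals. The function names are hypothetical, and applying the last reported ms/iter to the whole run is an approximation, not necessarily how CUDAPm1 arrives at its estimate.

```python
import math

def stage1_iterations_estimate(b1):
    """Rule-of-thumb number of squarings for stage 1: about B1 / ln(2)."""
    return b1 / math.log(2)

def mmss(seconds):
    """Format seconds as m:ss, truncating any fraction."""
    s = int(seconds)
    return "%d:%02d" % (s // 60, s % 60)

ITERATIONS = 207401                       # "Doing 207401 iterations" in both logs
est = stage1_iterations_estimate(143687)  # B1 after the program raised it
titan = mmss(ITERATIONS * 1.0015e-3)      # GTX TITAN, last reported ms/iter
gtx480 = mmss(ITERATIONS * 1.2186e-3)     # GTX 480, last reported ms/iter

print("estimated iterations: %.0f (log says %d)" % (est, ITERATIONS))
print("TITAN:", titan, "  GTX 480:", gtx480)
```

Both times line up with the "estimated total time" lines, and the GTX 480 takes roughly 20% longer per iteration at this FFT length.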

Last fiddled with by Karl M Johnson on 2013-05-03 at 09:05

2013-05-03, 08:30   #183
firejuggler
Apr 2010
Over the rainbow
101000101110₂ Posts

I would suggest fiddling with the e value.

2013-05-03, 08:59   #184
frmky
Jul 2003
So Cal
2²×23² Posts

Quote:
Originally Posted by Karl M Johnson
Now, how do I change the 'e' parameter to increase memory usage (is there a way at all, indirect perhaps?)?

Yes. e can be 2, 4, 6, 8, 10, or 12. Just use -e2 12.

2013-05-03, 09:13   #185
Karl M Johnson
Mar 2010
3×137 Posts

Setting the Brent-Suyama exponent to the max increased memory usage only slightly, by around 20 MB.
Now, since the other parameter is nrp (-nrp2 n?), I've tried increasing it too, but the program ignored the switch.
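
For what it's worth, the two logs posted earlier give a rough picture of how memory scales with nrp. The Python sketch below is only a two-point linear fit, so treat it as a guess rather than the program's actual formula. The variable names are made up, and the idea that each nrp entry is one buffer of n double-precision values is an assumption.

```python
# (nrp, reported MB) pairs from the TITAN and GTX 480 logs, both at n = 448K, e = 6
samples = [(480, 1737), (240, 897)]

(nrp_a, mb_a), (nrp_b, mb_b) = samples
mb_per_nrp = (mb_a - mb_b) / (nrp_a - nrp_b)   # slope of the two-point fit
fixed_mb = mb_a - mb_per_nrp * nrp_a           # what would remain at nrp = 0

# Assumption: one nrp entry is a buffer of n doubles (8 bytes each).
fft_len = 448 * 1024
buffer_mb = fft_len * 8 / 2**20

print("MB per nrp: %.2f, fixed part: %.0f MB" % (mb_per_nrp, fixed_mb))
print("one 448K buffer of doubles: %.2f MB" % buffer_mb)
```

Under that model nrp dominates memory use, which would be consistent with e = 12 adding only about 20 MB.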

Last fiddled with by Karl M Johnson on 2013-05-03 at 09:19

2013-05-03, 09:16   #186
firejuggler
Apr 2010
Over the rainbow
5056₈ Posts

Then... a higher exponent? Or higher bounds?

Last fiddled with by firejuggler on 2013-05-03 at 09:21

2013-05-03, 09:22   #187
Karl M Johnson
Mar 2010
411₁₀ Posts

We do not seek simple solutions.
I will try to find the current binary's VRAM threshold. I can confirm it is NOT this: 2147483647 bytes (memPitch).

Last fiddled with by Karl M Johnson on 2013-05-03 at 09:29

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.