mersenneforum.org  

Old 2013-05-03, 03:43   #177
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

2·3·1,693 Posts

Quote:
Originally Posted by Aramis Wyler View Post
Will there be a new windows build with the stage1 fix in? I have an unusual 580 that I would be willing to run some tests against.
Dang! That is unusual! Nice amount of RAM, too. Does it OC at all?
Old 2013-05-03, 04:34   #178
Aramis Wyler
 
"Bill Staffen"
Jan 2013
Pittsburgh, PA, USA

23×53 Posts

Well, the 1.5 GB version runs at 850, so as a lark I tried to crank this one up to 850/1700 as well. Sure enough, it has been stable. I have never tried to take it past 850/1700.

EDIT: Saying it clocks at 850 doesn't always mean anything in speed terms, so I grabbed this out of the mfaktc window for reference.
Code:
got assignment: exp=63249397 bit_min=73 bit_max=74 (30.25 GHz-days)
Starting trial factoring M63249397 from 2^73 to 2^74 (30.25 GHz-days)
 k_min = 74662632479820
 k_max = 149325264962435
Using GPU kernel "barrett76_mul32_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
May 03 00:35 | 3827  82.8% |  5.773  15m53s |    471.52    69941    n.a.%
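
For anyone wondering where k_min and k_max in that log come from: any factor of M_p has the form 2kp+1, so sieving from 2^73 to 2^74 amounts to a range of k values. Below is a rough Python sketch of that arithmetic. The helper name `k_bounds` and the rounding of k_min down to a multiple of 4620 (the number of residue classes I am assuming mfaktc uses here) are my own illustration, not code taken from mfaktc.

```python
P = 63249397  # exponent from the log above


def k_bounds(p, bit_min, bit_max, classes=4620):
    """Range of k such that 2*k*p + 1 covers 2**bit_min .. 2**bit_max.

    classes=4620 is an assumption about how the lower bound is aligned;
    it is not read from the mfaktc source.
    """
    two_p = 2 * p
    k_lo = (1 << bit_min) // two_p
    k_hi = (1 << bit_max) // two_p
    k_lo -= k_lo % classes  # align the start to a class boundary
    return k_lo, k_hi


print(k_bounds(P, 73, 74))
```

Under those assumptions the two numbers should line up with the k_min and k_max lines printed above.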

Last fiddled with by Aramis Wyler on 2013-05-03 at 04:36 Reason: a speed reference.
Old 2013-05-03, 07:17   #179
frmky
 
Jul 2003
So Cal

2²×23² Posts

Still don't have the motivation to track down the problem reading text from ini files, but here's the next version to try.
https://www.dropbox.com/s/2b840sgu33...1_20130502.zip

Again, completely untested in Windows by me.
Old 2013-05-03, 07:30   #180
frmky
 
Jul 2003
So Cal

844₁₆ Posts

Quote:
Originally Posted by Stef42 View Post
Code:
B2 should be at least 1560000, increasing it.
Starting stage 1 P-1, M9090017, B1 = 120000, B2 = 1560000, e = 6, fft length = 512K
I'm not that good at figuring out what it's bound to. An example might help, though.
The limits depend on B1, B2, d2, and e2. It's somewhat non-trivial, which is why the code handles it automatically.
Old 2013-05-03, 07:33   #181
frmky
 
Jul 2003
So Cal

2²×23² Posts

Quote:
Originally Posted by owftheevil View Post
Code:
Starting stage 1 gcd.
M55824233 has a factor: 833043841114609831879 (P-1, B1=839, B2=12625000, e=6, n=3072K CUDAPm1 v0.00)
If the factor is found in stage 1, should the value of B2 in the output be equal to B1 as in the following:
M55824233 has a factor: 833043841114609831879 (P-1, B1=839, B2=839, e=6, n=3072K CUDAPm1 v0.00)

If so, that's an easy change.
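
To illustrate why B2 plays no part in a stage 1 hit, here is a toy stage 1 in Python. It is only a sketch of the textbook method on a tiny exponent, not CUDAPm1's code (which does its multiplications with FFTs on the GPU). The function names and the choice of base 3 are assumptions made for the example.

```python
from math import gcd


def primes_up_to(n):
    """Primes <= n by a simple sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def pm1_stage1(p, b1, base=3):
    """Toy P-1 stage 1 on M_p = 2**p - 1; returns gcd(base**E - 1, M_p).

    E is 2*p times every prime power <= b1. Only b1 appears here;
    a second bound would matter only in stage 2.
    """
    n = (1 << p) - 1
    x = pow(base, 2 * p, n)  # factors of M_p look like 2*k*p + 1
    for q in primes_up_to(b1):
        qk = q
        while qk * q <= b1:  # largest power of q not exceeding b1
            qk *= q
        x = pow(x, qk, n)
    return gcd(x - 1, n)


# M29 = 233 * 1103 * 2089, and 233 - 1 = 2^3 * 29, so B1 = 8 is enough for 233.
print(pm1_stage1(29, 8))
```

The result is a divisor of M29 that is a multiple of 233; whether the other two primes also show up depends on the order of the base modulo each, so I am not quoting an exact output.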
Old 2013-05-03, 08:07   #182
Karl M Johnson
 
Mar 2010

3·137 Posts

Yay, it works!
Managed to get to stage 2 with -b1 500 -b2 0.5M; it was using around 1.8 GB of vRAM, as the program calculated.
Now, how do I change the 'e' parameter to increase memory usage (is there a way at all, indirect perhaps?)?
Code:
CUDAPm1 -d 0 -threads 512 -c 10000 -t -polite 0 -b1 99000 -b2 99000000 8000117
Warning: Couldn't parse ini file option WorkFile; using default "worktodo.txt"
Warning: Couldn't parse ini file option ResultsFile; using default "results.txt"
------- DEVICE 0 -------
name                GeForce GTX TITAN
totalGlobalMem      -1
sharedMemPerBlock   49152
regsPerBlock        65536
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      2147483647,65535,65535
totalConstMem       65536
Compatibility       3.5
clockRate (MHz)     928
textureAlignment    512
deviceOverlap       1
multiProcessorCount 14

CUDA reports 4095M of 4095M GPU memory free.
Using e=6, d=2310, nrp=480
Using approximately 1737M GPU memory.
B1 should be at least 143687, increasing it.
Starting stage 1 P-1, M8000117, B1 = 143687, B2 = 99000000, e = 6, fft length = 448K
Doing 207401 iterations
Iteration 10000 M8000117, 0xcdeeefc9ed0c8af2, n = 448K, CUDAPm1 v0.00 err = 0.03418 (0:10 real, 1.0008 ms/iter, ETA 3:17)
Iteration 20000 M8000117, 0xc1f6fb554bd5366d, n = 448K, CUDAPm1 v0.00 err = 0.03809 (0:10 real, 0.9963 ms/iter, ETA 3:06)
Iteration 30000 M8000117, 0xa8c3682070917470, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:10 real, 0.9987 ms/iter, ETA 2:57)
Iteration 40000 M8000117, 0x8641b21065c7c3c4, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:10 real, 0.9896 ms/iter, ETA 2:45)
Iteration 50000 M8000117, 0xdde465fe55ac1ecb, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:10 real, 0.9817 ms/iter, ETA 2:34)
Iteration 60000 M8000117, 0xa795e30debbb03a1, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 0.9906 ms/iter, ETA 2:26)
Iteration 70000 M8000117, 0xbe53ab8c34cac0e3, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:10 real, 0.9818 ms/iter, ETA 2:14)
Iteration 80000 M8000117, 0x3f4e80a97be2b8b8, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:10 real, 0.9974 ms/iter, ETA 2:07)
Iteration 90000 M8000117, 0x64c10d213e1edda8, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 0.9985 ms/iter, ETA 1:57)
Iteration 100000 M8000117, 0xb85ce1dc3b7d9537, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:10 real, 0.9842 ms/iter, ETA 1:45)
Iteration 110000 M8000117, 0xa031e593b3e4eb0e, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0001 ms/iter, ETA 1:37)
Iteration 120000 M8000117, 0x33806a25d8628703, n = 448K, CUDAPm1 v0.00 err = 0.03418 (0:10 real, 1.0034 ms/iter, ETA 1:27)
Iteration 130000 M8000117, 0x4d78d18fdbe49d31, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0009 ms/iter, ETA 1:17)
Iteration 140000 M8000117, 0x8340c832411bb464, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0004 ms/iter, ETA 1:07)
Iteration 150000 M8000117, 0xf9531f22d8b5d8fb, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0039 ms/iter, ETA 0:57)
Iteration 160000 M8000117, 0xa5ee8cd34b352e31, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:10 real, 1.0033 ms/iter, ETA 0:47)
Iteration 170000 M8000117, 0x978568529d0b8b98, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0025 ms/iter, ETA 0:37)
Iteration 180000 M8000117, 0xe27641ef5a0da890, n = 448K, CUDAPm1 v0.00 err = 0.03442 (0:10 real, 1.0026 ms/iter, ETA 0:27)
Iteration 190000 M8000117, 0x28680bf513e7c074, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 0.9994 ms/iter, ETA 0:17)
Iteration 200000 M8000117, 0xc34709e98a8b4b98, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0015 ms/iter, ETA 0:07)
M8000117, 0x930fd9ded05878a5, offset = 0, n = 448K, CUDAPm1 v0.00
Stage 1 complete, estimated total time = 3:27
Starting stage 1 gcd.
M8000117 has a factor: 418913928878609399 (P-1, B1=143687, B2=99000000, e=6, n=448K CUDAPm1 v0.00)
Running the same binary with the same options on the GTX 480 (1.5 GB vRAM) results in great success!
Code:
CUDAPm1 -d 1 -threads 512 -c 10000 -t -polite 0 -b1 99000 -b2 99000000 8000117

Warning: Couldn't parse ini file option WorkFile; using default "worktodo.txt"
Warning: Couldn't parse ini file option ResultsFile; using default "results.txt"
------- DEVICE 1 -------
name                GeForce GTX 480
totalGlobalMem      1610285056
sharedMemPerBlock   49152
regsPerBlock        32768
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
totalConstMem       65536
Compatibility       2.0
clockRate (MHz)     1600
textureAlignment    512
deviceOverlap       1
multiProcessorCount 15

CUDA reports 1404M of 1535M GPU memory free.
Using e=6, d=2310, nrp=240
Using approximately 897M GPU memory.
B1 should be at least 143687, increasing it.
Starting stage 1 P-1, M8000117, B1 = 143687, B2 = 99000000, e = 6, fft length = 448K
Doing 207401 iterations
Iteration 10000 M8000117, 0xcdeeefc9ed0c8af2, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2635 ms/iter, ETA 4:09)
Iteration 20000 M8000117, 0xc1f6fb554bd5366d, n = 448K, CUDAPm1 v0.00 err = 0.03418 (0:12 real, 1.2055 ms/iter, ETA 3:45)
Iteration 30000 M8000117, 0xa8c3682070917470, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:12 real, 1.2043 ms/iter, ETA 3:33)
Iteration 40000 M8000117, 0x8641b21065c7c3c4, n = 448K, CUDAPm1 v0.00 err = 0.03418 (0:13 real, 1.2206 ms/iter, ETA 3:24)
Iteration 50000 M8000117, 0xdde465fe55ac1ecb, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2151 ms/iter, ETA 3:11)
Iteration 60000 M8000117, 0xa795e30debbb03a1, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.1971 ms/iter, ETA 2:56)
Iteration 70000 M8000117, 0xbe53ab8c34cac0e3, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2213 ms/iter, ETA 2:47)
Iteration 80000 M8000117, 0x3f4e80a97be2b8b8, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2120 ms/iter, ETA 2:34)
Iteration 90000 M8000117, 0x64c10d213e1edda8, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2175 ms/iter, ETA 2:22)
Iteration 100000 M8000117, 0xb85ce1dc3b7d9537, n = 448K, CUDAPm1 v0.00 err = 0.03418 (0:12 real, 1.2178 ms/iter, ETA 2:10)
Iteration 110000 M8000117, 0xa031e593b3e4eb0e, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:13 real, 1.2204 ms/iter, ETA 1:58)
Iteration 120000 M8000117, 0x33806a25d8628703, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2092 ms/iter, ETA 1:45)
Iteration 130000 M8000117, 0x4d78d18fdbe49d31, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2287 ms/iter, ETA 1:35)
Iteration 140000 M8000117, 0x8340c832411bb464, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2191 ms/iter, ETA 1:22)
Iteration 150000 M8000117, 0xf9531f22d8b5d8fb, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2197 ms/iter, ETA 1:10)
Iteration 160000 M8000117, 0xa5ee8cd34b352e31, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:13 real, 1.2187 ms/iter, ETA 0:57)
Iteration 170000 M8000117, 0x978568529d0b8b98, n = 448K, CUDAPm1 v0.00 err = 0.03369 (0:12 real, 1.1983 ms/iter, ETA 0:44)
Iteration 180000 M8000117, 0xe27641ef5a0da890, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:12 real, 1.2121 ms/iter, ETA 0:33)
Iteration 190000 M8000117, 0x28680bf513e7c074, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2194 ms/iter, ETA 0:21)
Iteration 200000 M8000117, 0xc34709e98a8b4b98, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2186 ms/iter, ETA 0:09)
M8000117, 0x930fd9ded05878a5, offset = 0, n = 448K, CUDAPm1 v0.00
Stage 1 complete, estimated total time = 4:12
Starting stage 1 gcd.
M8000117 has a factor: 418913928878609399 (P-1, B1=143687, B2=99000000, e=6, n=448K CUDAPm1 v0.00)
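
As a sanity check on the two "estimated total time" lines, the totals are just iterations times ms/iter. A small Python sketch follows; the per-iteration figures of 1.000 ms and 1.215 ms are my eyeballed averages from the logs above, not values the program reports.

```python
def stage1_seconds(iterations, ms_per_iter):
    """Wall time in seconds for a run of fixed-cost iterations."""
    return iterations * ms_per_iter / 1000.0


def as_min_sec(seconds):
    """Format seconds as m:ss, rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"


ITERATIONS = 207401  # "Doing 207401 iterations" in both logs

titan = stage1_seconds(ITERATIONS, 1.000)   # assumed average for the TITAN
gtx480 = stage1_seconds(ITERATIONS, 1.215)  # assumed average for the GTX 480
print(as_min_sec(titan), as_min_sec(gtx480))
```

With those averages the sketch gives 3:27 and 4:12, the same as the two stage 1 totals in the logs.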

Last fiddled with by Karl M Johnson on 2013-05-03 at 09:05
Old 2013-05-03, 08:30   #183
firejuggler
 
Apr 2010
Over the rainbow

2×1,303 Posts

I would suggest fiddling with the e value.
Old 2013-05-03, 08:59   #184
frmky
 
Jul 2003
So Cal

2²·23² Posts

Quote:
Originally Posted by Karl M Johnson View Post
Now, how do I change the 'e' parameter to increase memory usage (is there a way at all, indirect perhaps?)?
Yes. e can be 2, 4, 6, 8, 10, or 12. Just use -e2 12.
Old 2013-05-03, 09:13   #185
Karl M Johnson
 
Mar 2010

19B₁₆ Posts

Setting the Brent-Suyama exponent to the max resulted in a slight increase in memory usage, around 20 MB more.
Now, since the other parameter is nrp (-nrp2 n?), I've tried increasing it too, but the program ignored the switch.
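
A back-of-the-envelope way to see why nrp matters far more than e: the two logs earlier in the thread (nrp=480 using about 1737M, nrp=240 using about 897M, both at a 448K FFT) fit a simple model of (nrp + 16) arrays of doubles, one double per FFT element. The Python sketch below encodes that fit. The constant 16 and the function name are inferred from those two log lines only; they are not taken from the CUDAPm1 source.

```python
BYTES_PER_DOUBLE = 8


def estimate_stage2_mib(fft_len_k, nrp, extra_buffers=16):
    """Rough stage 2 footprint in MiB.

    Models memory as (nrp + extra_buffers) arrays of fft_len doubles.
    extra_buffers=16 is a fit to two log lines at e=6, not a documented value.
    """
    fft_len = fft_len_k * 1024
    total_bytes = (nrp + extra_buffers) * fft_len * BYTES_PER_DOUBLE
    return total_bytes / (1024 * 1024)


print(estimate_stage2_mib(448, 480))  # TITAN run, log says ~1737M
print(estimate_stage2_mib(448, 240))  # GTX 480 run, log says ~897M
```

Each array at 448K is 3.5 MiB, so if the overhead grows by one array per unit of e (a guess), going from e=6 to e=12 would add about 21 MiB, which is in line with the roughly 20 MB seen here.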

Last fiddled with by Karl M Johnson on 2013-05-03 at 09:19
Old 2013-05-03, 09:16   #186
firejuggler
 
Apr 2010
Over the rainbow

101000101110₂ Posts

Then... a higher exponent? Or a higher bound?

Last fiddled with by firejuggler on 2013-05-03 at 09:21
Old 2013-05-03, 09:22   #187
Karl M Johnson
 
Mar 2010

633₈ Posts

We do not seek simple solutions.
I will try to find the current binary's vRAM threshold. I can confirm it is NOT 2147483647 bytes (the memPitch value).

Last fiddled with by Karl M Johnson on 2013-05-03 at 09:29

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.