mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   The P-1 factoring CUDA program (https://www.mersenneforum.org/showthread.php?t=17835)

kladner 2013-05-03 03:43

[QUOTE=Aramis Wyler;339072]Will there be a new windows build with the stage1 fix in? I have an [URL="http://www.evga.com/products/pdf/03G-P3-1591.pdf"]unusual 580[/URL] that I would be willing to run some tests against.[/QUOTE]

Dang! That [U]is[/U] unusual! Nice amount of RAM, too. Does it OC at all?

Aramis Wyler 2013-05-03 04:34

Well, the 1.5gb version runs at 850, so as a lark I tried to crank this one up to 850/1700 as well. Sure enough, it has been stable. I have never tried to take it past 850/1700.

EDIT: Saying it clocks at 850 doesn't always mean anything in speed terms, so I grabbed this out of the mfactc window for reference.
[CODE]
got assignment: exp=63249397 bit_min=73 bit_max=74 (30.25 GHz-days)
Starting trial factoring M63249397 from 2^73 to 2^74 (30.25 GHz-days)
k_min = 74662632479820
k_max = 149325264962435
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
May 03 00:35 | 3827 82.8% | 5.773 15m53s | 471.52 69941 n.a.%
[/CODE]

frmky 2013-05-03 07:17

Still don't have the motivation to track down the problem reading text from ini files, but here's the next version to try.
[URL="https://www.dropbox.com/s/2b840sgu33bqm6l/cudapm1_20130502.zip"]https://www.dropbox.com/s/2b840sgu33bqm6l/cudapm1_20130502.zip[/URL]

Again, completely untested in Windows by me.

frmky 2013-05-03 07:30

[QUOTE=Stef42;339030][CODE]B2 should be at least 1560000, increasing it.
Starting stage 1 P-1, M9090017, B1 = 120000, B2 = 1560000, e = 6, fft length = 5
12K[/CODE]

I'm not that good in figuring out what it's bound too. Example might help tough.[/QUOTE]

The limits depend on B1, B2, d2, and e2. It's somewhat non-trivial, which is why the code handles it automatically.

frmky 2013-05-03 07:33

[QUOTE=owftheevil;339065][CODE]
Starting stage 1 gcd.
M55824233 has a factor: 833043841114609831879 (P-1, B1=839, B2=12625000, e=6, n=3072K CUDAPm1 v0.00)
[/CODE][/QUOTE]
If the factor is found in stage 1, should the value of B2 in the output be equal to B1 as in the following:
M55824233 has a factor: 833043841114609831879 (P-1, B1=839, B2=839, e=6, n=3072K CUDAPm1 v0.00)

If so, that's an easy change.

Karl M Johnson 2013-05-03 08:07

Yay, it works!
Manged to get to stage 2 with -b1 500 -b2 0.5M, it was using around 1.8GB of vRAM, as the program calculated.
Now, how do I change the 'e' parameter to increase memory usage (is there a way at all, indirect perhaps?)?
[CODE]
CUDAPm1 -d 0 -threads 512 -c 10000 -t -polite 0 -b1 99000 -b2 99000000 8000117
Warning: Couldn't parse ini file option WorkFile; using default "worktodo.txt"
Warning: Couldn't parse ini file option ResultsFile; using default "results.txt"
------- DEVICE 0 -------
name GeForce GTX TITAN
totalGlobalMem -1
sharedMemPerBlock 49152
regsPerBlock 65536
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 2147483647,65535,65535
totalConstMem 65536
Compatibility 3.5
clockRate (MHz) 928
textureAlignment 512
deviceOverlap 1
multiProcessorCount 14

CUDA reports 4095M of 4095M GPU memory free.
Using e=6, d=2310, nrp=480
Using approximately 1737M GPU memory.
B1 should be at least 143687, increasing it.
Starting stage 1 P-1, M8000117, B1 = 143687, B2 = 99000000, e = 6, fft length = 448K
Doing 207401 iterations
Iteration 10000 M8000117, 0xcdeeefc9ed0c8af2, n = 448K, CUDAPm1 v0.00 err = 0.03418 (0:10 real, 1.0008 ms/iter, ETA 3:17)
Iteration 20000 M8000117, 0xc1f6fb554bd5366d, n = 448K, CUDAPm1 v0.00 err = 0.03809 (0:10 real, 0.9963 ms/iter, ETA 3:06)
Iteration 30000 M8000117, 0xa8c3682070917470, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:10 real, 0.9987 ms/iter, ETA 2:57)
Iteration 40000 M8000117, 0x8641b21065c7c3c4, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:10 real, 0.9896 ms/iter, ETA 2:45)
Iteration 50000 M8000117, 0xdde465fe55ac1ecb, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:10 real, 0.9817 ms/iter, ETA 2:34)
Iteration 60000 M8000117, 0xa795e30debbb03a1, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 0.9906 ms/iter, ETA 2:26)
Iteration 70000 M8000117, 0xbe53ab8c34cac0e3, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:10 real, 0.9818 ms/iter, ETA 2:14)
Iteration 80000 M8000117, 0x3f4e80a97be2b8b8, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:10 real, 0.9974 ms/iter, ETA 2:07)
Iteration 90000 M8000117, 0x64c10d213e1edda8, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 0.9985 ms/iter, ETA 1:57)
Iteration 100000 M8000117, 0xb85ce1dc3b7d9537, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:10 real, 0.9842 ms/iter, ETA 1:45)
Iteration 110000 M8000117, 0xa031e593b3e4eb0e, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0001 ms/iter, ETA 1:37)
Iteration 120000 M8000117, 0x33806a25d8628703, n = 448K, CUDAPm1 v0.00 err = 0.03418 (0:10 real, 1.0034 ms/iter, ETA 1:27)
Iteration 130000 M8000117, 0x4d78d18fdbe49d31, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0009 ms/iter, ETA 1:17)
Iteration 140000 M8000117, 0x8340c832411bb464, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0004 ms/iter, ETA 1:07)
Iteration 150000 M8000117, 0xf9531f22d8b5d8fb, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0039 ms/iter, ETA 0:57)
Iteration 160000 M8000117, 0xa5ee8cd34b352e31, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:10 real, 1.0033 ms/iter, ETA 0:47)
Iteration 170000 M8000117, 0x978568529d0b8b98, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0025 ms/iter, ETA 0:37)
Iteration 180000 M8000117, 0xe27641ef5a0da890, n = 448K, CUDAPm1 v0.00 err = 0.03442 (0:10 real, 1.0026 ms/iter, ETA 0:27)
Iteration 190000 M8000117, 0x28680bf513e7c074, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 0.9994 ms/iter, ETA 0:17)
Iteration 200000 M8000117, 0xc34709e98a8b4b98, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0015 ms/iter, ETA 0:07)
M8000117, 0x930fd9ded05878a5, offset = 0, n = 448K, CUDAPm1 v0.00
Stage 1 complete, estimated total time = 3:27
Starting stage 1 gcd.
M8000117 has a factor: 418913928878609399 (P-1, B1=143687, B2=99000000, e=6, n=448K CUDAPm1 v0.00)[/CODE]

Running the same binary with the same options, but for GTX 480(1.5GB vRAM) results in great success!
[CODE]CUDAPm1 -d 1 -threads 512 -c 10000 -t -polite 0 -b1 99000 -b2 99000000 8000117

Warning: Couldn't parse ini file option WorkFile; using default "worktodo.txt"
Warning: Couldn't parse ini file option ResultsFile; using default "results.txt"
------- DEVICE 1 -------
name GeForce GTX 480
totalGlobalMem 1610285056
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
Compatibility 2.0
clockRate (MHz) 1600
textureAlignment 512
deviceOverlap 1
multiProcessorCount 15

CUDA reports 1404M of 1535M GPU memory free.
Using e=6, d=2310, nrp=240
Using approximately 897M GPU memory.
B1 should be at least 143687, increasing it.
Starting stage 1 P-1, M8000117, B1 = 143687, B2 = 99000000, e = 6, fft length = 448K
Doing 207401 iterations
Iteration 10000 M8000117, 0xcdeeefc9ed0c8af2, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2635 ms/iter, ETA 4:09)
Iteration 20000 M8000117, 0xc1f6fb554bd5366d, n = 448K, CUDAPm1 v0.00 err = 0.03418 (0:12 real, 1.2055 ms/iter, ETA 3:45)
Iteration 30000 M8000117, 0xa8c3682070917470, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:12 real, 1.2043 ms/iter, ETA 3:33)
Iteration 40000 M8000117, 0x8641b21065c7c3c4, n = 448K, CUDAPm1 v0.00 err = 0.03418 (0:13 real, 1.2206 ms/iter, ETA 3:24)
Iteration 50000 M8000117, 0xdde465fe55ac1ecb, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2151 ms/iter, ETA 3:11)
Iteration 60000 M8000117, 0xa795e30debbb03a1, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.1971 ms/iter, ETA 2:56)
Iteration 70000 M8000117, 0xbe53ab8c34cac0e3, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2213 ms/iter, ETA 2:47)
Iteration 80000 M8000117, 0x3f4e80a97be2b8b8, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2120 ms/iter, ETA 2:34)
Iteration 90000 M8000117, 0x64c10d213e1edda8, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2175 ms/iter, ETA 2:22)
Iteration 100000 M8000117, 0xb85ce1dc3b7d9537, n = 448K, CUDAPm1 v0.00 err = 0.03418 (0:12 real, 1.2178 ms/iter, ETA 2:10)
Iteration 110000 M8000117, 0xa031e593b3e4eb0e, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:13 real, 1.2204 ms/iter, ETA 1:58)
Iteration 120000 M8000117, 0x33806a25d8628703, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2092 ms/iter, ETA 1:45)
Iteration 130000 M8000117, 0x4d78d18fdbe49d31, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2287 ms/iter, ETA 1:35)
Iteration 140000 M8000117, 0x8340c832411bb464, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2191 ms/iter, ETA 1:22)
Iteration 150000 M8000117, 0xf9531f22d8b5d8fb, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2197 ms/iter, ETA 1:10)
Iteration 160000 M8000117, 0xa5ee8cd34b352e31, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:13 real, 1.2187 ms/iter, ETA 0:57)
Iteration 170000 M8000117, 0x978568529d0b8b98, n = 448K, CUDAPm1 v0.00 err = 0.03369 (0:12 real, 1.1983 ms/iter, ETA 0:44)
Iteration 180000 M8000117, 0xe27641ef5a0da890, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:12 real, 1.2121 ms/iter, ETA 0:33)
Iteration 190000 M8000117, 0x28680bf513e7c074, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2194 ms/iter, ETA 0:21)
Iteration 200000 M8000117, 0xc34709e98a8b4b98, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2186 ms/iter, ETA 0:09)
M8000117, 0x930fd9ded05878a5, offset = 0, n = 448K, CUDAPm1 v0.00
Stage 1 complete, estimated total time = 4:12
Starting stage 1 gcd.
M8000117 has a factor: 418913928878609399 (P-1, B1=143687, B2=99000000, e=6, n=448K CUDAPm1 v0.00)[/CODE]

firejuggler 2013-05-03 08:30

I would suggest to fiddle with the E value.

frmky 2013-05-03 08:59

[QUOTE=Karl M Johnson;339106]
Now, how do I change the 'e' parameter to increase memory usage (is there a way at all, indirect perhaps?)?
[/QUOTE]
Yes. e can be 2, 4, 6, 8, 10, or 12. Just use -e2 12.

Karl M Johnson 2013-05-03 09:13

Setting the Brent-Suyama exponent to the max resulted in a slight increase of memory usage, around 20 additional MBs.
Now, since the other parameter is nrp(-nrp2 n?), I've tried increasing it too, but the program ignored the switch.

firejuggler 2013-05-03 09:16

then... higher exponent? or higher bound.

Karl M Johnson 2013-05-03 09:22

We do not seek simple solutions:smile:
Will try to find out the threshold of current binary for vRAM, I can confirm it is NOT this: 2147483647 bytes(mempitch).


All times are UTC. The time now is 23:19.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.