mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   The P-1 factoring CUDA program (https://www.mersenneforum.org/showthread.php?t=17835)

Karl M Johnson 2013-05-07 04:32

It doesn't, so if you get an error, you have to start from the very beginning.

As for the error, I remember getting the same error (unknown error 30, which Oliver explained in an adjacent thread) in CUDALucas from time to time.
What's curious is that after that error the core clock refused to go higher than 525 MHz, while the memory clock remained the same.
That could be solved by a reboot, as the WDDM timeout is disabled.
So, to prevent that pesky error from happening, I had to downclock the GPU's memory.

Aramis Wyler 2013-05-07 11:06

[QUOTE=Aramis Wyler;339421]It didn't end well.[/QUOTE]

Well, good news. I ran the program again on default settings and it failed during the third ptime section again. But I looked at the error and thought that, since it came from the sync function, maybe the problem was related to CPU usage. I turned off Prime95 and ran it again, and sure enough it completed:
[code]Iteration 871000 M61262347, 0x268993cb3b899d21, n = 3360K, CUDAPm1 v0.10 err = 0.19531 (0:06 real, 5.8528 ms/iter, ETA 0:12)
Iteration 872000 M61262347, 0xdf828a4cb19fc49d, n = 3360K, CUDAPm1 v0.10 err = 0.20313 (0:06 real, 5.8514 ms/iter, ETA 0:06)
Iteration 873000 M61262347, 0x92b46441f57f0dc1, n = 3360K, CUDAPm1 v0.10 err = 0.19531 (0:06 real, 5.8502 ms/iter, ETA 0:00)
M61262347, 0xfd7ab9d857ea4a36, offset = 0, n = 3360K, CUDAPm1 v0.10
Stage 1 complete, estimated total time = 1:25:43
Starting stage 1 gcd.
M61262347 Stage 1 found no factor (P-1, B1=605000, B2=16637500, e=6, n=3360K CUDAPm1 v0.10)
Starting stage 2.
Zeros: 748288, Ones: 847712, Pairs: 172477
itime: 34.363595, transforms: 1, average: 34363.595000
ptime: 945.240834, transforms: 322686, average: 2.929290
ETA: 1:21:38
itime: 42.020420, transforms: 1, average: 42020.420000
ptime: 948.034002, transforms: 322970, average: 2.935363
ETA: 1:05:39
itime: 45.361230, transforms: 1, average: 45361.230000
ptime: 942.894161, transforms: 321866, average: 2.929462
ETA: 49:17
itime: 46.910720, transforms: 1, average: 46910.720000
ptime: 942.954856, transforms: 322050, average: 2.927977
ETA: 32:53
itime: 48.828722, transforms: 1, average: 48828.722000
ptime: 942.975542, transforms: 322794, average: 2.921292
ETA: 16:27
itime: 49.518076, transforms: 1, average: 49518.076000
ptime: 941.506243, transforms: 322458, average: 2.919780
ETA: 0:00
Stage 2 complete, estimated total time = 1:38:50
Accumulated Product: M61262347, 0xb7550a14cb5172b6, n = 3360K, CUDAPm1 v0.10
Starting stage 2 gcd.
M61262347 Stage 2 found no factor (P-1, B1=605000, B2=16637500, e=6, n=3360K CUDAPm1 v0.10)[/code]

Though I don't know if it was supposed to find a factor. :)

Aramis Wyler 2013-05-07 11:23

I think there might be a problem somewhere with the calculations, because looking closer I see that after CUDAPm1 finished the default assignment, it went on and did a number that I had put in there from one of my old P-1 assignments:

[code]Selected B1=530000, B2=12985000, 3.11% chance of finding a factor
CUDA reports 2777M of 3072M GPU memory free.
Using e=6, d=2310, nrp=80
Using approximately 2529M GPU memory.
Starting stage 1 P-1, M61394569, B1 = 530000, B2 = 12985000, e = 6, fft length = 3360K
Doing 764962 iterations
Iteration 1000 M61394569, 0x8888b22cb0159fe4, n = 3360K, CUDAPm1 v0.10 err = 0.20703 (0:09 real, 9.1390 ms/iter, ETA 1:56:21)
Iteration 2000 M61394569, 0x22ce4679c47bde53, n = 3360K, CUDAPm1 v0.10 err = 0.19531 (0:06 real, 5.8427 ms/iter, ETA 1:14:17)
Iteration 3000 M61394569, 0x4199d13a32c43ec1, n = 3360K, CUDAPm1 v0.10 err = 0.19531 (0:06 real, 5.8484 ms/iter, ETA 1:14:16)
...
Iteration 762000 M61394569, 0x72f2c43f0662fa7d, n = 3360K, CUDAPm1 v0.10 err = 0.22949 (0:06 real, 5.8454 ms/iter, ETA 0:17)
Iteration 763000 M61394569, 0x5d768a7b9cc19fc1, n = 3360K, CUDAPm1 v0.10 err = 0.19727 (0:06 real, 5.8017 ms/iter, ETA 0:11)
Iteration 764000 M61394569, 0xa9c8c0938a1354e6, n = 3360K, CUDAPm1 v0.10 err = 0.20313 (0:05 real, 5.8006 ms/iter, ETA 0:05)
M61394569, 0xe6ed39c645d90fd3, offset = 0, n = 3360K, CUDAPm1 v0.10
Stage 1 complete, estimated total time = 1:14:26
Starting stage 1 gcd.
M61394569 Stage 1 found no factor (P-1, B1=530000, B2=12985000, e=6, n=3360K CUDAPm1 v0.10)
Starting stage 2.
Zeros: 576475, Ones: 669125, Pairs: 135475
itime: 34.168611, transforms: 1, average: 34168.611000
ptime: 742.552935, transforms: 254220, average: 2.920907
ETA: 1:04:43
itime: 41.946698, transforms: 1, average: 41946.698000
ptime: 743.830499, transforms: 254674, average: 2.920716
ETA: 52:04
itime: 45.455219, transforms: 1, average: 45455.219000
ptime: 740.867106, transforms: 253650, average: 2.920824
ETA: 39:08
itime: 46.824025, transforms: 1, average: 46824.025000
ptime: 741.681265, transforms: 253924, average: 2.920879
ETA: 26:08
itime: 48.740888, transforms: 1, average: 48740.888000
ptime: 743.663183, transforms: 254586, average: 2.921069
ETA: 13:05
itime: 49.611376, transforms: 1, average: 49611.376000
ptime: 742.008431, transforms: 254036, average: 2.920879
ETA: 0:00
Stage 2 complete, estimated total time = 1:18:41
Accumulated Product: M61394569, 0xc7cca920aa444fbe, n = 3360K, CUDAPm1 v0.10
Starting stage 2 gcd.
M61394569 Stage 2 found no factor (P-1, B1=530000, B2=12985000, e=6, n=3360K CUDAPm1 v0.10)[/code]
The problem there is that when I ran that exponent with Prime95, it did find a factor:

[Tue Apr 30 13:09:28 2013]
P-1 found a factor in stage #2, B1=580000, B2=12035000, E=12.
UID: staffen/Romeo, M61394569 has a factor: 189843460261039170580823, AID: cc392de5c69eef9aeaf12ea5c839f9e7

Now, I see that in the Prime95 run E was 12 (vs. e=6 for CUDAPm1), but its B2 was actually smaller than CUDAPm1's (12035000 vs. 12985000), although its B1 was larger (580000 vs. 530000).
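
One way to sanity-check the Prime95 result independently of either program is sketched below in Python. It tests whether the reported factor f really divides 2^p - 1, and then trial-divides k = (f - 1) / (2p) to see how smooth it is relative to a chosen bound. The function names are made up for this illustration and are not part of Prime95 or CUDAPm1. As a rule of thumb, plain P-1 finds f when every prime factor of k is at most B1, except possibly one that is at most B2; the Brent-Suyama extension (the E/e value) can occasionally catch a larger one.

```python
def divides_mersenne(p: int, f: int) -> bool:
    """Return True if f divides 2**p - 1 (checked by modular exponentiation)."""
    return pow(2, p, f) == 1


def split_by_bound(n: int, bound: int):
    """Trial-divide n by every d <= bound.

    Returns (prime factors found that are <= bound, leftover cofactor).
    A leftover of 1 means n is bound-smooth.
    """
    factors = []
    d = 2
    while d <= bound and d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1 if d == 2 else 2  # after 2, try odd candidates only
    if 1 < n <= bound:
        # Whatever remains here has no divisor <= sqrt(n), so it is prime.
        factors.append(n)
        n = 1
    return factors, n


# Numbers taken from the Prime95 result quoted above.
p = 61394569
f = 189843460261039170580823

print("f divides 2^p - 1:", divides_mersenne(p, f))
k, rem = divmod(f - 1, 2 * p)
print("f == 1 (mod 2p):", rem == 0)
# Use Prime95's B2 as the trial-division bound; may take a few seconds.
print("factors of k up to B2, leftover:", split_by_bound(k, 12035000))
```

If the leftover is 1 and at most one of the listed primes exceeds 530000, a correct run with CUDAPm1's bounds would be expected to find the factor as well.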

owftheevil 2013-05-07 12:14

The first one should have found a factor. I'm testing the second exponent to check if we get the same residues. If so, there's definitely something wrong in the calculations.

Edit: On the first three iterations, the residues match but the errors are different.

Stef42 2013-05-07 13:51

[CODE]Processing 457 - 480 of 480 relative primes
itime: 18.458630, transforms: 6906, average: 2.672840
ptime: 148.896680, transforms: 52262, average: 2.849043
ETA: 0:00
Stage 2 complete, estimated total time = 55:19
Accumulated Product: M61394569, 0xe849edfe1bbc661b, n = 3360K, CUDAPm1 v0.10
Starting stage 2 gcd.
M61394569 Stage 2 found no factor (P-1, B1=530000, B2=6890000, e=6, n=3360K CUDAPm1 v0.10)
[/CODE]

Ran it myself, no factor found either.

owftheevil 2013-05-07 13:59

Stef42, if you still have the full output of that run, could you PM it to me?

Stef42 2013-05-07 14:01

I would love to, but I have already closed the command prompt.
Even so, it only showed the end of stage 1. Could you suggest a logging function or tool?
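
For anyone with the same problem, a small wrapper script is one possible way to keep the full output. The sketch below, in Python, runs a command, echoes each line to the console, and appends it to a log file as it arrives. The function name and the executable name in the usage comment are assumptions for illustration; nothing here is a CUDAPm1 feature.

```python
import subprocess
import sys


def run_and_log(cmd, log_path):
    """Run cmd, echo its output line by line, and save a copy to log_path.

    Returns the exit code of the command.
    """
    with open(log_path, "w", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold error messages into the same log
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
            log.flush()  # keep the file current in case the run is interrupted
        return proc.wait()


# Hypothetical usage (the executable name is an assumption):
# run_and_log(["CUDAPm1"], "cudapm1_run.log")
```

Note that some programs buffer their output differently when it is piped rather than written to a console, so lines may show up in bursts.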

owftheevil 2013-05-07 14:11

Never mind then. What I really wanted to do was compare your residues with mine. Any part towards the end of stage 1 would have been sufficient. Aramis Wyler's and mine disagree at iteration 45000.
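
A comparison like this can be automated. The following Python sketch pulls the exponent, iteration number, and residue out of "Iteration ..." lines in the format shown earlier in the thread, then reports the first iteration at which two logs disagree. The function names are hypothetical, and the second sample log is deliberately altered for illustration; it is not real output.

```python
import re

# Matches e.g. "Iteration 1000 M61394569, 0x8888b22cb0159fe4, n = 3360K, ..."
LINE = re.compile(r"Iteration (\d+) M(\d+), (0x[0-9a-fA-F]+)")


def residues(log_text):
    """Map (exponent, iteration) -> residue string for every matching line."""
    return {
        (int(m.group(2)), int(m.group(1))): m.group(3).lower()
        for m in LINE.finditer(log_text)
    }


def first_mismatch(log_a, log_b):
    """Return the first (exponent, iteration) present in both logs whose
    residues differ, or None if all shared checkpoints agree."""
    a, b = residues(log_a), residues(log_b)
    for key in sorted(a.keys() & b.keys()):
        if a[key] != b[key]:
            return key
    return None


# Three lines copied from the M61394569 run above.
LOG_A = """\
Iteration 1000 M61394569, 0x8888b22cb0159fe4, n = 3360K, CUDAPm1 v0.10 err = 0.20703
Iteration 2000 M61394569, 0x22ce4679c47bde53, n = 3360K, CUDAPm1 v0.10 err = 0.19531
Iteration 3000 M61394569, 0x4199d13a32c43ec1, n = 3360K, CUDAPm1 v0.10 err = 0.19531
"""
# Same lines with the last residue changed by hand to simulate a divergence.
LOG_B = LOG_A.replace("0x4199d13a32c43ec1", "0x0000000000000000")

print(first_mismatch(LOG_A, LOG_A))
print(first_mismatch(LOG_A, LOG_B))
```

Only checkpoints that appear in both logs are compared, so the two runs need to print residues at the same iteration interval for the result to be meaningful.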

Stef42 2013-05-07 14:24

[QUOTE=owftheevil;339558]Never mind then. What I really wanted to do was compare your residues with mine. Any part towards the end of stage 1 would have been sufficient. Aramis Wyler's and mine disagree at iteration 45000.[/QUOTE]

I've run this exponent up to iteration 50,000 for you.
[url]https://dl.dropboxusercontent.com/u/27359940/CUDAPm1%2050000.txt[/url]

firejuggler 2013-05-07 14:26

Hmm, don't bother. (I was going to report the first few iterations, which seems useless now.)

owftheevil 2013-05-07 14:29

As for the cudaDeviceSynchronize errors people are seeing, I'm almost convinced it is an Nvidia driver bug. On Linux, I'm getting something similar, only it's a timeout error (error 6) instead of an unidentified error. Here's what I know about it.

1. It occurs only with Nvidia drivers with version number >= 300.

2. It occurs only if the card CPm1 is running on is also driving the display.

3. cufftbench sees similar errors which I have traced back to a cufftPlan1d call being unable to allocate resources.

It's as if the driver, going about its business managing the display, is interfering with cufft or some other kernel in CPm1. I need more testing on my card that is not driving the display, and I am also going to make a test version that does away with all the error checking and host synchronizing to see which call is actually failing.


All times are UTC.