mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   The P-1 factoring CUDA program (https://www.mersenneforum.org/showthread.php?t=17835)

ixfd64 2013-03-03 01:43

I may be wrong, but I believe stage 2 can be easily parallelized. If that's true, then the GPU firepower will really come in handy.

owftheevil 2013-03-03 03:59

I have now seen one case where an exponent, 59756923, passes the round-off test but shows no factor. Increasing the FFT length correctly finds the factor. Going to have to slow it down a little.

frmky 2013-03-03 04:55

For that case, the max error is significantly larger than the average error. Looks like an average error < [STRIKE]0.15[/STRIKE] 0.1 might be an appropriate check with some safety margin?

[CODE][childers@physicstitan cudapm1]$ ./CUDA-Pm1 59756923, -b1 1100

Starting Stage 1 P-1, M59756923, B1 = 1100, fft length = 3072K
Doing 1637 iterations
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a longer FFT.
Iteration = 27 < 1000 && err = 0.50000 >= 0.35, increasing n from 3072K
Starting Stage 1 P-1, M59756923, B1 = 1100, fft length = 3200K
Doing 1637 iterations
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a longer FFT.
Iteration 100, average error = 0.08286, max error = 0.27734
Iteration 200, average error = 0.08826, max error = 0.29688
Iteration 300, average error = 0.09689, max error = 0.28125
Iteration 400, average error = 0.10635, max error = 0.30859
Iteration 500, average error = 0.11458, max error = 0.27344
Iteration 600, average error = 0.11786, max error = 0.28125
Iteration 700, average error = 0.11941, max error = 0.28125
Iteration 800, average error = 0.12019, max error = 0.28125
Iteration 900, average error = 0.12194, max error = 0.31250
Iteration 1000, average error = 0.12054 < 0.25 (max error = 0.31250), continuing test.
M59756923, 0xd040e885dd81e22d, offset = 0, n = 3200K, CUDA-P-1 v0.00
Stage 1 complete, estimated total time = 0:53
M59756923 has a factor: 1
[/CODE]

frmky 2013-03-03 05:07

[QUOTE=owftheevil;331717]./CUDA-pm1 60593041, -b1 1000, [-f 3360k][/QUOTE]
How is this factor found with B1=1000?
P-1 = 2^3 * 3^5 * [B]2551[/B] * 60593041 * [B]P9[/B] * [B]P23[/B]

Batalov 2013-03-03 07:32

It's composite, so that's ok.
[CODE]2105528336291622770155712978260232660484461209 =
969488657 * 19874517449 * 2208979902697 * 49468643416729[/CODE]
each of which passes with B1=41 (!).
There are more factors known for [URL="http://mersenne.org/report_exponent/?exp_lo=60593041&exp_hi=10000&B1=Get+status"]this Mp[/URL].
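
A minimal sketch of why a composite result is expected, on a toy exponent. It assumes the textbook form of stage 1 for Mersenne numbers: raise a base (3 here, an assumption about the program) to E = 2 * p * (product of the largest prime powers not exceeding B1) and take the gcd with Mp. The function names are made up for illustration and are not taken from the CUDA-Pm1 source.

```python
from math import gcd

def primes_up_to(n):
    """Return all primes <= n by trial division (fine for tiny bounds)."""
    return [q for q in range(2, n + 1)
            if all(q % d for d in range(2, int(q ** 0.5) + 1))]

def stage1_exponent(p, b1):
    """E = 2 * p * product of the largest prime powers not exceeding b1."""
    e = 2 * p
    for q in primes_up_to(b1):
        power = q
        while power * q <= b1:
            power *= q
        e *= power
    return e

def pm1_stage1(p, b1, base=3):
    """Return gcd(base^E - 1, 2^p - 1); a result of 1 means no factor was found."""
    n = (1 << p) - 1
    return gcd(pow(base, stage1_exponent(p, b1), n) - 1, n)

# M29 = 233 * 1103 * 2089, with 233 - 1 = 2^3 * 29,
# 1103 - 1 = 2 * 19 * 29 and 2089 - 1 = 2^3 * 3^2 * 29.
small = pm1_stage1(29, 5)    # B1 = 5 is enough to pick up 233
large = pm1_stage1(29, 19)   # B1 = 19 covers all three primes at once
print(small, large)
```

With B1 = 19 every prime factor q of M29 has q - 1 dividing E, so the gcd is the whole of M29, a composite "factor". The same thing happens above: the gcd collects every prime whose q - 1 is smooth enough, and their product is what gets reported.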

frmky 2013-03-03 07:34

Ah, makes sense. Thanks! :smile:

Dubslow 2013-03-03 10:51

[QUOTE=owftheevil;331733]I'm not sure, but I think Dubslow is responsible for the roundoff test part. It's hard to tell who did what on CuLu.

Edit: I didn't see the PS. I have so far been too lazy to make a different message for no factor found. I was thinking of just adding "but you already knew that, didn't you."[/QUOTE]
I added a bit, but msft had the gist of it.

[QUOTE=henryzz;331735]I can see that cpus are going to become obsolete for P-1 stage 1 soon. This should help kill the P-1 deficit.[/QUOTE]
I wouldn't quite go that far yet, especially w.r.t. stage 2. :smile:

[QUOTE=frmky;331757]For that case, the max error is significantly larger than the average error. Looks like an average error < [STRIKE]0.15[/STRIKE] 0.1 might be an appropriate check with some safety margin?
[/QUOTE]

Wowzers... I've not seen anything like that before. Perhaps a better solution would be maxerr/avgerr < 1.5 (or maybe 2)?
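
A rough sketch comparing the checks discussed so far, applied to the final values logged for M59756923 at 3200K (average error 0.12054, max error 0.31250). The helper names are hypothetical, and the thresholds are just the ones floated in this thread, not what the program ships with.

```python
# Final round-off values logged for M59756923 at fft length 3200K.
AVG_ERR = 0.12054
MAX_ERR = 0.31250

def passes_average_check(avg_err, limit):
    """True if the average round-off error is below the limit."""
    return avg_err < limit

def passes_ratio_check(avg_err, max_err, ratio_limit):
    """True if the max error is less than ratio_limit times the average."""
    return max_err / avg_err < ratio_limit

checks = {
    "average < 0.25 (current)": passes_average_check(AVG_ERR, 0.25),
    "average < 0.1": passes_average_check(AVG_ERR, 0.1),
    "max/avg < 1.5": passes_ratio_check(AVG_ERR, MAX_ERR, 1.5),
    "max/avg < 2": passes_ratio_check(AVG_ERR, MAX_ERR, 2.0),
}

for name, ok in checks.items():
    print(f"{name}: {'pass' if ok else 'restart with longer FFT'}")
```

Only the current 0.25 check lets this run through. The max/avg ratio here is about 2.6, so either of the stricter proposals would have forced a longer FFT for this exponent.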

owftheevil 2013-03-03 14:58

1 Attachment(s)
Ok, this should be better. 3%-4% slower, but gives correct results, even when the max error is allowed to go as high as 0.42. Also fixes the error reporting problem.

ET_ 2013-03-03 18:59

[QUOTE=owftheevil;331784]Ok, this should be better. 3%-4% slower, but gives correct results, even when the max error is allowed to go as high as 0.42. Also fixes the error reporting problem.[/QUOTE]

With this update I got a couple of strange behaviors:

1:
[code]
luigi@luigi-ubuntu:~/luigi/CUDA/cudapm1-0.00$ ./CUDA-Pm1 4170308402961950452420687314125107372845632692860124825390003761727514150572517983869509135472975278394865154210790597209778982578895669768763371749038447454396115727404741278971617695528084038894140322072199744865271524521758726031117787322230290427036555791315034863880063825719334586180093, -b1 1000

Can't open workfile worktodo.txt
[/code]

I suppose the P-1 program only works with Mersenne exponents, which is not a bad thing, but searching the source code for the string "Can't open workfile worktodo.txt" returned nothing :sad:


2:
[code]
luigi@luigi-ubuntu:~/luigi/CUDA/cudapm1-0.00$ ./CUDA-Pm1 60593041, -b1 1000

Starting Stage 1 P-1, M60593041, B1 = 1000, fft length = 3200K
Doing 1475 iterations
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a longer FFT.
Iteration 100, average error = 0.22696, max error = 0.34664
^C SIGINT caught, writing checkpoint.Iteration 200, average error = 0.26131, max error = 0.34241
Iteration 300, average error = 0.27237, max error = 0.33556
^C^C SIGINT caught, writing checkpoint. SIGINT caught, writing checkpoint.Iteration 400, average error = 0.27678, max error = 0.36553
Iteration 500, average error = 0.28131, max error = 0.33734
Iteration 600, average error = 0.28373, max error = 0.33752
Iteration 700, average error = 0.28460, max error = 0.34566
Iteration 800, average error = 0.28564, max error = 0.35941
Iteration 900, average error = 0.28658, max error = 0.35676
Iteration 1000, average error = 0.28638 < 0.25 (max error = 0.36553), continuing test.
Estimated time spent so far: 0:39
[/code]

That is, Ctrl-C is caught during the error-checking routine, but the program does not stop.

Luigi

Dubslow 2013-03-04 00:22

The second is deliberate (though I forget why). It should quit immediately after the roundoff test is finished (as it seems it did).
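
A minimal sketch of that deferred-quit pattern, assuming the handler only records the request and the main loop acts on it once the round-off test is over. The names, and simulating Ctrl-C with a parameter, are illustrative and not taken from the actual source.

```python
import signal

state = {"quit": False}

def on_sigint(signum, frame):
    """Record the request; the main loop decides when to act on it."""
    state["quit"] = True
    print("SIGINT caught, writing checkpoint.")

def run(total_iterations, roundoff_iterations, interrupt_at=None):
    """Return the iteration at which the loop stopped."""
    state["quit"] = False
    for i in range(1, total_iterations + 1):
        if i == interrupt_at:
            on_sigint(signal.SIGINT, None)  # simulate Ctrl-C for the demo
        # ... one iteration of the real computation would go here ...
        # Honour the request only once the round-off test has finished.
        if state["quit"] and i >= roundoff_iterations:
            return i
    return total_iterations

if __name__ == "__main__":
    signal.signal(signal.SIGINT, on_sigint)  # a real Ctrl-C takes the same path
    print(run(1475, 1000, interrupt_at=200))
```

An interrupt at iteration 200 is acknowledged right away but only takes effect at iteration 1000, which matches the log above: the message appears immediately and the run stops once the test completes.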

As for the first, it's probably a printf substitution -- search for "Can't open workfile %s" or, more safely, search for "Can't open" or "Can't open workfile".
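
A quick illustration of why the full message is not in the source, assuming the file name is substituted into a format string at run time. Both the format string and the source line below are hypothetical stand-ins, not copied from the program.

```python
# Hypothetical line as it might appear in the C source.
source_line = 'fprintf(stderr, "Can\'t open workfile %s\\n", filename);'

# What the user sees once the file name has been substituted in.
printed = "Can't open workfile %s" % "worktodo.txt"

found_full = printed in source_line                  # the literal message is absent
found_prefix = "Can't open workfile" in source_line  # the fixed prefix is present
print(printed)
print(found_full, found_prefix)
```

Searching for the shorter prefix avoids guessing exactly how the format string was written.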

owftheevil 2013-03-06 00:09

[QUOTE=ixfd64;331747]I may be wrong, but I believe stage 2 can be easily parallelized. If that's true, then the GPU firepower will really come in handy.[/QUOTE]

You are right. The way I see it now, stage 2 naturally splits into 3 tasks that can be separated into different streams. I'm not sure yet how much this will speed things up and how much the different streams will step on each other's toes.


All times are UTC.
