mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   The P-1 factoring CUDA program (https://www.mersenneforum.org/showthread.php?t=17835)

henryzz 2013-04-25 09:48

1024 of memory: as it is close, it might be worth slightly reducing B2 in order to fit. B1 can be increased to compensate.

James Heinrich 2013-04-25 10:47

[QUOTE=frmky;338230]George has indicated that he will add support for the results output soon.[/QUOTE]

Can you please post a sample of possible variations in the output? I need to add support to both mersenne.org and mersenne.ca.

owftheevil 2013-04-25 11:02

[QUOTE=henryzz;338248]1024 of memory: as it is close, it might be worth slightly reducing B2 in order to fit. B1 can be increased to compensate.[/QUOTE]

Only e2 and nrp2 have an effect on memory use. b1 and b2 have an effect on how many transforms need to be done.

The reason frmky said to try low b1 is that you don't know until the end of stage 1 whether the memory needed for stage 2 is available. So until you are confident that you will be able to allocate all the stage 2 memory, it's best to do short stage 1 runs.
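
To put a rough number on the transform count: textbook P-1 stage 1 raises a base to the product of all prime powers not exceeding B1, and binary exponentiation costs about one modular squaring per exponent bit, each squaring being done with FFTs of the chosen length. The Python sketch below estimates that bit count. It is an illustration only, not CUDAPm1's code; `primes_up_to` and `stage1_exponent_bits` are hypothetical names, and the small extra factor of 2p that Mersenne-specific implementations typically fold into the exponent is ignored.

```python
import math


def primes_up_to(n):
    """Return all primes <= n (n >= 2) using a sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            # Cross out multiples of i starting at i*i.
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def stage1_exponent_bits(b1):
    """Approximate bit length of the product of all prime powers <= b1.

    Each bit corresponds to roughly one modular squaring in stage 1.
    """
    bits = 0.0
    for q in primes_up_to(b1):
        power = q
        while power * q <= b1:  # largest power of q not exceeding b1
            power *= q
        bits += math.log2(power)
    return bits


# B1 from the run quoted later in the thread; used here only as sample input.
print(round(stage1_exponent_bits(605000)))
```

The result comes out at roughly 1.44 × B1, since the logarithm of that product grows like B1 / ln 2. Raising B1 therefore lengthens stage 1 roughly in proportion without changing the memory footprint, which is consistent with the advice above to keep stage 1 short while testing stage 2 allocation.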

ET_ 2013-04-25 11:13

[QUOTE=owftheevil;338255]Only e2 and nrp2 have an effect on memory use. b1 and b2 have an effect on how many transforms need to be done.

The reason frmky said to try low b1 is that you don't know until the end of stage 1 whether the memory needed for stage 2 is available. So [B]until you are confident that you will be able to allocate all the stage 2 memory, it's best to do short stage 1 runs[/B].[/QUOTE]

Good advice! This will help me shorten the time required for the speed tests :smile:

Luigi

henryzz 2013-04-25 12:58

[QUOTE=owftheevil;338255]Only e2 and nrp2 have an effect on memory use. b1 and b2 have an effect on how many transforms need to be done.

The reason frmky said to try low b1 is that you don't know until the end of stage 1 whether the memory needed for stage 2 is available. So until you are confident that you will be able to allocate all the stage 2 memory, it's best to do short stage 1 runs.[/QUOTE]

B2 normally affects memory usage with Prime95 and GMP-ECM. My mistake for misunderstanding.

Stef42 2013-04-25 17:16

No Windows version yet? :redface:

James Heinrich 2013-04-25 17:26

[QUOTE=Stef42;338288]No Windows version yet? :redface:[/QUOTE]

I would also very much appreciate the ability to try it out :smile:

frmky 2013-04-26 00:45

Autoselection of B1, B2, and GPU memory related parameters (d2, e, nrp) should now work. It tries to use as much GPU memory as it thinks is safe, so let me know if you get memory allocation errors.

It still lacks proper error checking in many places, checkpointing, and the ability to interrupt stage 2 with a Ctrl-C. If anyone wants to dive into that code, feel free!

Also, I'm not set up to compile on Windows with both CUDA and GMP. If anyone here is, I'm sure it will be appreciated! :smile:

frmky 2013-04-26 06:48

Haven't posted timing in a while.

M60973753 P-1, B1=535000, B2=10432500, e=6, n=3584K
Time for stage 1: 56:27
Time for stage 2: 47:18

This was on a K20. A GTX Titan should be a bit faster. My GTX 480 will likely take nearly twice as long.

And it does find factors. :smile:
M60870653 has a factor: 87951105041429114235889 (P-1, B1=600000, B2=12000000, e=6, n=3584K CUDAPm1 v0.00)
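
A reported factor like this is cheap to check independently. A number f divides the Mersenne number 2^p - 1 exactly when 2^p ≡ 1 (mod f), and any such factor has the form 2kp + 1; P-1 finds f when k is smooth with respect to the chosen bounds. The Python sketch below shows both checks. The function names are hypothetical and nothing here comes from CUDAPm1 itself; the demonstration values 29 and 233 are a small known case (233 divides 2^29 - 1).

```python
def divides_mersenne(p, f):
    """True if f divides 2**p - 1, tested by modular exponentiation."""
    return pow(2, p, f) == 1


def cofactor_k(p, f):
    """Return k such that f = 2*k*p + 1, or None if f is not of that form."""
    k, remainder = divmod(f - 1, 2 * p)
    return k if remainder == 0 else None


def prime_factors(n):
    """Prime factors of n with multiplicity, by simple trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


# Small known case: 233 = 2*4*29 + 1 divides 2**29 - 1.
print(divides_mersenne(29, 233), cofactor_k(29, 233), prime_factors(4))

# The factor reported above, checked the same way (output not asserted here).
p, f = 60870653, 87951105041429114235889
print(divides_mersenne(p, f), cofactor_k(p, f))
```

Calling `prime_factors` on the k returned for the reported factor would show how its prime factors compare with B1=600000 and B2=12000000. Trial division is fine when k is smooth but can take a while otherwise.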

ET_ 2013-04-26 08:18

[QUOTE=frmky;338341]Autoselection of B1, B2, and GPU memory related parameters (d2, e, nrp) should now work. It tries to use as much GPU memory as it thinks is safe, so let me know if you get memory allocation errors.

It still lacks proper error checking in many places, checkpointing, and the ability to interrupt stage 2 with a Ctrl-C. If anyone wants to dive into that code, feel free!

Also, I'm not set up to compile on Windows with both CUDA and GMP. If anyone here is, I'm sure it will be appreciated! :smile:[/QUOTE]

If the autoselection now works, I guess I can quit my work on timing different exponents with different e, d and nrp... :davieddy::bangheadonwall:

I will get the alpha from sourceforge...

Luigi

owftheevil 2013-04-26 11:07

@ET_: The timings are interesting, but the work on how big nrp can get without a crash is still just as important.

A recent run on a 570:

Selected B1=605000, B2=16637500, 4.1% chance of finding a factor
CUDA reports 732M of 1279M GPU memory free.
Using e=6, d=2310, nrp=10
Starting stage 1 P-1, M61410829, B1 = 605000, B2 = 16637500, e = 6, fft length = 3360K
.
.
.
Stage 1 complete, estimated total time = 1:41:14
.
.
.
Stage 2 complete, estimated total time = 3:42:23
M61410829 Stage 2 found no factor (P-1, B1=605000, B2=16637500, e=6, n=3360K CUDAPm1 v0.00)


There was ~350 MB of free memory during stage 2.
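
A back-of-envelope reading of those memory figures, as a sketch: assuming an FFT length of 3360K means 3360 × 1024 double-precision values, and that the "M" in the CUDA report means MiB, one full-length array takes 26.25 MiB. Neither assumption is confirmed in the thread, and the names below are hypothetical.

```python
BYTES_PER_DOUBLE = 8  # assumption: residues are stored as 64-bit floats


def buffer_mib(fft_len_k):
    """Size in MiB of one array of fft_len_k * 1024 doubles."""
    return fft_len_k * 1024 * BYTES_PER_DOUBLE / 2**20


free_before_mib = 732   # "CUDA reports 732M of 1279M GPU memory free"
free_during_mib = 350   # approximate figure quoted for stage 2

per_buffer = buffer_mib(3360)                     # 26.25
used_by_stage2 = free_before_mib - free_during_mib
buffers_in_use = used_by_stage2 / per_buffer

print(per_buffer, used_by_stage2, round(buffers_in_use, 1))
```

Under those assumptions stage 2 was holding about 14.5 arrays' worth of memory with e=6 and nrp=10, and the remaining ~350 MB would have had room for roughly 13 more. How CUDAPm1 actually divides its allocations among nrp, e, and working storage is not stated here, so this only indicates scale.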


All times are UTC.