mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   The P-1 factoring CUDA program (https://www.mersenneforum.org/showthread.php?t=17835)

bcp19 2013-04-18 03:07

[QUOTE=axn;337412]8h/53min ~= 8x. But later found that only half the things were being processed, so it is more like 4x?[/QUOTE]
Actually, I'm pretty sure it was stage 2 vs stage 2, so 8x would still apply.

That doesn't really matter, though, since you need to compare against something fairly fixed. If the GPU is not using a CPU core, then what do you compare it to? If I have a GTX 480 in an i5 2500 I get a speedup, but if I run that same 480 in a Core2Quad I get a more significant speedup relative to the system it is in.

I'd say you'd need to do what's done for CPU-based P-1: compare the time it takes to run the GPU P-1 against the time it takes to run that same exponent on CUDALucas (or mfaktc/mfakto?). Otherwise the 'speed increase' is technically unknown.

axn 2013-04-18 03:46

[QUOTE=NBtarheel_33;337164]Factoring to 7x bits (assuming an increase of one bit level) gives you (roughly) a 1/7x = 1.27-1.43% chance of finding a factor.

P-1 with decent bounds will typically give you a 5-8% chance of finding a factor.

So, given 125 TF attempts, we'd expect roughly 1.6-1.8 factors found. On the other hand, 25 P-1 attempts should yield roughly 1.25-2.0 factors found.

If GPU P-1 allows us to increase bounds or make more frequent use of the Brent-Suyama extension, the expected number of successes will be at or above the higher end of this range. In that case, it would make complete sense to trade 125x TF for 25x P-1.

Note also that GPU P-1 will make use of the *GPU* RAM, rather than the system RAM. This could bring in P-1'ers who were previously unable to dedicate large quantities of RAM to Stage 2.[/QUOTE]

This looks like a good jumping-off point for the discussion. For those of you who are thinking of switching from GPU TF to GPU P-1 and are worried it will hurt the number of expos cleared, I propose the following model.
Compare the efficiency (expos cleared per unit time) of doing the _last bit_ of TF to the efficiency of doing P-1. Simple. Assuming the last-bit work is 73->74, count how many "last bit" TFs can be done in a day on a particular GPU and how many P-1 runs can be done in the same time, then calculate the expected number of factors for each. Whichever is higher wins. If they're approximately the same (within 20% of each other), picking either one should be fine.
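The model above boils down to a few lines of arithmetic. Here is a minimal sketch in Python; the per-day throughput figures (125 TF, 25 P-1) are placeholders borrowed from NBtarheel_33's quote, not measurements, and the success probabilities use the rule-of-thumb values from earlier in the thread (~1/b for bit level b, ~5% for P-1 at the low end):

```python
# Sketch of the proposed comparison: expected factors per day from doing
# the last bit of TF (73->74) versus running P-1 on the same GPU.
# Throughput numbers below are placeholders; plug in timings for your card.

def expected_factors(attempts_per_day, p_success):
    """Expected number of exponents cleared per day."""
    return attempts_per_day * p_success

p_tf = 1.0 / 74   # chance of a factor in bit level b is roughly 1/b
p_pm1 = 0.05      # P-1 with decent bounds: ~5-8%; use the low end

tf_per_day = 125   # placeholder: "last bit" TF runs per day
pm1_per_day = 25   # placeholder: P-1 runs per day

e_tf = expected_factors(tf_per_day, p_tf)
e_pm1 = expected_factors(pm1_per_day, p_pm1)

print(f"TF : {e_tf:.2f} expected factors/day")
print(f"P-1: {e_pm1:.2f} expected factors/day")

# Within ~20% of each other, either choice is fine; otherwise pick the larger.
if max(e_tf, e_pm1) <= 1.2 * min(e_tf, e_pm1):
    print("roughly a wash; pick either")
else:
    print("P-1 wins" if e_pm1 > e_tf else "TF wins")
```

With these particular placeholder numbers TF comes out ahead (about 1.69 vs 1.25 expected factors per day), but the whole point is that the real answer depends on the timings you measure on your own GPU.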

frmky 2013-04-20 22:31

[QUOTE=garo;337309]P-1 with 2GB memory in the 61M range gives a probability of success of 3.3-3.6% depending on the TF level. Dunno where you got 5-8%.[/QUOTE]

What typical values of B1, B2, and e does Prime95 choose at 61M with 2GB memory and TF to 73 bits?

garo 2013-04-21 15:41

[QUOTE=frmky;337743]What typical values of B1, B2, and e does Prime95 choose at 61M with 2GB memory and TF to 73 bits?[/QUOTE]

No factor to 74 bits:
M61482791 completed P-1, B1=545000, B2=10355000

To 73 bits:
M59518889 completed P-1, B1=555000, B2=10961250

e=0 in both cases.
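Incidentally, the B2/B1 ratio in both of those picks is close to 20, which is easy to check directly (bounds taken from the two results above):

```python
# B1/B2 bounds Prime95 chose, from the two results quoted above.
bounds = {
    61482791: (545000, 10355000),   # no factor to 74 bits
    59518889: (555000, 10961250),   # TF'd to 73 bits
}

for p, (b1, b2) in bounds.items():
    print(f"M{p}: B2/B1 = {b2 / b1:.2f}")
# -> 19.00 and 19.75 respectively
```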

axn 2013-04-21 16:31

[QUOTE=garo;337793]e=0 in both cases.[/QUOTE]

Which is secretly e=2 :smile:

garo 2013-04-23 05:40

[QUOTE=axn;337795]Which is secretly e=2 :smile:[/QUOTE]

Ah, didn't know that. Thanks.

frmky 2013-04-25 06:38

With owftheevil's permission, I have posted a [I]very[/I] early version at Sourceforge, [URL="https://sourceforge.net/projects/cudapm1/?source=directory"]https://sourceforge.net/projects/cudapm1/?source=directory[/URL]

It reads Pfactor lines from worktodo.txt and writes output to results.txt, and George has indicated that he will add support for the results output soon. The core routines have survived testing on 30+ known factors over the past few days. Autoselection of FFT sizes may need tweaking. It does not yet intelligently select B1 and B2; for now the parameters should be specified manually (it defaults to B1=600k, B2=12M, e=6, which is reasonable for current ~61M exponents). Error checking still needs to be added in many places, and it does not support checkpointing. In summary, it is still very alpha.
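For reference, a worktodo.txt entry of the sort it reads would look something like this (assuming the usual Prime95-style Pfactor syntax of k, b, n, c, how far the exponent has been TF'd, and how many primality tests a factor would save; the exponent is just garo's example from above, purely illustrative):

```
Pfactor=1,2,61482791,-1,74,2
```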

frmky 2013-04-25 07:46

The default parameters will require ~900 MB of GPU memory. If you do not have that available, try using -nrp2 10 or -nrp2 4. You can also save a little memory by using -e2 4 or -e2 2. For really low memory cards, use -d2 30 -e2 2 -nrp2 2. Autoselection of these parameters based on available GPU memory is on the TODO.

firejuggler 2013-04-25 07:57

My 560 has 1024 MB. Might be a bit tight.

frmky 2013-04-25 08:09

[QUOTE=firejuggler;338239]My 560 has 1024. Might be a bit tight.[/QUOTE]

Try a run with a small B1, say 3000, just to see if stage 2 starts successfully without waiting a long time for B1 to finish.

ET_ 2013-04-25 09:44

[QUOTE=frmky;338242]Try a run with a small B1, say 3000, just to see if stage 2 starts successfully without waiting a long time for B1 to finish.[/QUOTE]

Hi Greg, I am doing some time testing on the e, d and nrp values for different B2 values.
It will take some time to check all the combinations for 5 distinct test-cases, but I think it may be useful to automate the choice of these parameters.

I hope I am not stepping on anyone's toes.

Luigi

P.S. Thanks again to Carl, who started the project... :smile:

