#111
Oct 2011
7×97 Posts
Quote:
This doesn't matter, though, since you need a comparison against a fairly fixed baseline. If the GPU is not using a CPU core, what do you compare it to? If I have a GTX 480 in an i5-2500 I get a speedup, but if I run that same 480 in a Core 2 Quad I get a more significant speedup relative to the system it is in. I'd say you need to do what is done for CPU-based P-1: compare the time it takes to run the GPU P-1 against the time it takes to run that same exponent on CUDALucas (or mfaktc/mfakto?). Otherwise the 'speed increase' is technically unknown.
#112
Jun 2003
5,087 Posts
Quote:
Compare the efficiency (exponents cleared per unit time) of doing the _last bit_ of TF to the efficiency of doing P-1. Simple. Assuming the last-bit work is 73->74, how many "last bit" TFs can be done in a day on a particular GPU, and how many P-1 runs can be done in the same time? Then calculate the expected number of factors for each. Whichever is higher wins. If they're approximately the same (within 20% of each other), picking either one should be fine.
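A rough sketch of that comparison in Python, for anyone who wants to plug in their own numbers. Everything here is an assumption for illustration: the function names are made up, the throughput figures are hypothetical, the TF success chance uses the usual rule of thumb of roughly 1/b for the step from 2^b to 2^(b+1), and the P-1 success chance is just a placeholder you would replace with the estimate for your own bounds.

```python
def expected_factors_per_day(runs_per_day: float, p_factor: float) -> float:
    """Expected number of factors found per day for one work type."""
    return runs_per_day * p_factor


def pick_work(tf_per_day, p_tf, pm1_per_day, p_pm1, tolerance=0.20):
    """Return ("TF" | "P-1" | "either", tf_rate, pm1_rate).

    "either" means the two rates are within `tolerance` (20% by default)
    of each other, measured relative to the larger rate.
    """
    tf = expected_factors_per_day(tf_per_day, p_tf)
    pm1 = expected_factors_per_day(pm1_per_day, p_pm1)
    hi, lo = max(tf, pm1), min(tf, pm1)
    if hi == 0 or (hi - lo) / hi <= tolerance:
        return "either", tf, pm1
    return ("TF" if tf > pm1 else "P-1"), tf, pm1


# Hypothetical GPU: 20 last-bit TF runs (73->74) per day at ~1/73 each,
# versus 8 P-1 runs per day at an assumed 3% success chance each.
choice, tf_rate, pm1_rate = pick_work(20, 1 / 73, 8, 0.03)
print(choice, round(tf_rate, 3), round(pm1_rate, 3))
```

The 20% band is measured against the larger of the two rates; measuring against the smaller one would make the "either" zone slightly narrower.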
#113
Jul 2003
So Cal
2⁴·7·19 Posts
#114
Aug 2002
Termonfeckin, IE
2²×691 Posts
Quote:
M61482791 completed P-1, B1=545000, B2=10355000
To 73 bits: M59518889 completed P-1, B1=555000, B2=10961250
e=0 in both cases.
Last fiddled with by garo on 2013-04-21 at 15:41
#115
Jun 2003
5,087 Posts
#116
Aug 2002
Termonfeckin, IE
ACC₁₆ Posts
#117
Jul 2003
So Cal
2128₁₀ Posts
With owftheevil's permission, I have posted a very early version at Sourceforge: https://sourceforge.net/projects/cud...urce=directory
It reads Pfactor lines from worktodo.txt and writes output to results.txt, and George has indicated that he will add support for the results output soon. The core routines have survived testing on 30+ known factors over the past few days.
Autoselection of FFT sizes may need tweaking. It does not yet select B1 and B2 intelligently, so for now parameters should be specified manually (it defaults to B1=600k, B2=12M, e=6, which is reasonable for current ~61M exponents). Error checking should be added in many places, and checkpointing is not supported. In summary, it is still very alpha.
Last fiddled with by frmky on 2013-04-25 at 06:55
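For reference, here is a minimal Python sketch of reading a Pfactor line. It is not the program's actual parser. It assumes the Prime95-style layout `Pfactor=[assignment_id,]k,b,n,c,tf_bits,tests_saved`; the `PfactorJob` class and `parse_pfactor` function are names invented for this example. The three default constants are the ones quoted in the post above.

```python
from dataclasses import dataclass
from typing import Optional

# Defaults quoted in the post (B1=600k, B2=12M, e=6).
DEFAULT_B1 = 600_000
DEFAULT_B2 = 12_000_000
DEFAULT_E = 6


@dataclass
class PfactorJob:
    assignment_id: Optional[str]  # None when the line carries no ID
    k: int
    b: int
    n: int            # the exponent, for k*b^n+c
    c: int
    tf_bits: int      # how far the exponent has been trial factored
    tests_saved: float


def parse_pfactor(line: str) -> Optional[PfactorJob]:
    """Parse one worktodo.txt line; return None if it is not a Pfactor line.

    Assumed layout: Pfactor=[assignment_id,]k,b,n,c,tf_bits,tests_saved
    Non-numeric fields raise ValueError.
    """
    line = line.strip()
    prefix = "Pfactor="
    if not line.startswith(prefix):
        return None
    fields = line[len(prefix):].split(",")
    aid = None
    if len(fields) == 7:  # optional leading assignment ID
        aid, fields = fields[0], fields[1:]
    if len(fields) != 6:
        return None
    k, b, n, c, tf_bits = (int(x) for x in fields[:5])
    return PfactorJob(aid, k, b, n, c, tf_bits, float(fields[5]))


job = parse_pfactor("Pfactor=1,2,61482791,-1,73,2")
print(job.n, job.tf_bits, DEFAULT_B1, DEFAULT_B2, DEFAULT_E)
```

Lines for other work types come back as None, so a caller can skip them and leave them in worktodo.txt untouched.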
#118
Jul 2003
So Cal
2⁴×7×19 Posts
The default parameters require ~900 MB of GPU memory. If you do not have that much available, try -nrp2 10 or -nrp2 4. You can also save a little memory with -e2 4 or -e2 2. For really low-memory cards, use -d2 30 -e2 2 -nrp2 2. Autoselection of these parameters based on available GPU memory is on the TODO list.
Last fiddled with by frmky on 2013-04-25 at 07:47
#119
Apr 2010
Over the rainbow
2·1,303 Posts
My 560 has 1024 MB. Might be a bit tight.
#120
Jul 2003
So Cal
100001010000₂ Posts
#121
Banned
"Luigi"
Aug 2002
Team Italia
61×79 Posts
Quote:
It will take some time to check all the combinations for 5 distinct test cases, but I think it may be useful for automating the choice of these parameters. I hope I am not stepping on anyone's toes.
Luigi
P.S. Thanks again to Carl, who started the project...
Last fiddled with by ET_ on 2013-04-25 at 09:45