20130530, 00:01  #276 
"Carl Darby"
Oct 2012
Spring Mountains, Nevada
3^{2}·5·7 Posts 
Past the edit time limit. I realized I left out a factor of 53 in the bit about b1 and b2 restrictions.
b2 / d / 53 <= b1, otherwise some small primes > b1 do not get included. 
20130530, 11:00  #277  
Romulan Interpreter
Jun 2011
Thailand
2^{4}·13·47 Posts 
Quote:
Example: 5*7*11=385, or in binary 101*111*1011=110000001, in the "one by one" you do 2+2+3=7 squaring and 1+2+2=5 shift/add, in the "mul first" case you do 8 squaring and 2 shift/add, certainly advantage for "one by one" for a small base. 7*61=427 (I deliberately select primes with many ones!) = 111*111101=110101011, (7+6) against (8+5). If there are many factors, you must do their sum(size[i]1) squarings, but when you multiply first you do size(sum1), i.e. you waste a squaring for each aditional factor. 75 factors, each having 20 bits, you do 75*19=1425 squarings if you take them one by one (because you ignore the first bit being 1 every time in exponentiation), but if you do the product first, you have also 75*(201)=1425 squarings. Statistically, in 1425 bits you have to do 713 multiplications (they can be 1 or 0), in any case. Even if not, computing them one by one has a lot of other advantages (easy to extend the B1, lower waiting state, etc). Also, for the "one huge prime variation", (the stage 2 part), you always can get a compromise which is "almost" same fast, but used much less memory (this does not work if you want to have the BrS extension, but if you don't want it, you can do stage 2 almost as same fast, but using less memory, which will allow much higher B2, and will also put the "cheap cards" in the game, right now only people with a lot of mem on the card can do "good" P1 with the GPU). You can have a look to a pari implementation attached below, I think I posted it in the past, but I don't know where. Rename to delete the ".txt", load it in pari with \r, and use "mpm1(257,,10^6,10^7)" as a sample command. The stage 2 of that is VERY fast (well, for pari!) and it does not consume memory (of course, no BrS extension, just classical). I am also working to implement something like this with cuda, but I stopped due to the lack of math knowledge related to fft and due to poor skill with cuda, which I am still learning. The last cuda version works with my gtx580 with a kind of karatsuba multiplication (divide and conquer, integer only) and it is quite slow. I can give parameters for the number of iterations after which a GCD() is taken, which (depending of the expo) can speed up the things. For example if B2 is very big, better take a couple of GCD's, one might get lucky and find the factor faster, etc. You can consider these things (and many other) for the future versions of cudaPm1. Last fiddled with by LaurV on 20130530 at 11:26 

20130530, 12:17  #278  
"Carl Darby"
Oct 2012
Spring Mountains, Nevada
3^{2}×5×7 Posts 
Quote:
Quote:
Again, thanks for your input. Carl 

20130530, 12:38  #279 
Romulan Interpreter
Jun 2011
Thailand
2^{4}×13×47 Posts 
On the bright side, look what I have found in the logs when I just arrived home now:
Code:
M631247 has a factor: 83947913480780207864691737 (P1, B1=3100000, B2=3100000, e=6, n=40K CUDAPm1 v0.10) BTW, for all of them, the B2 was automatically extended to 40300000, which is what the log shows (the screen redirected in a text file), the test was done, in a single "gulp" (1 to 480 primes), but the results.txt only shows 3M1 for both limits (?!), I don't know if the test was really done to which B2 limit. For the factor found, the "k" is smoother than B1 so this proves nothing, there was no stage 2. On the dark side, PrimeNet says: Code:
No factor lines found: 0 Mfaktc no factor lines found: 0 Mfakto no factor lines found: 0 CUDAPm1factor lines found: 1 Insufficient information for accurate CPU credit. For stats purposes, assuming factor was found using ECM with B1 = 50000. CPU credit is 0.0170 GHzdays. CUDAPm1nofactor lines found: 38 CPU credit is 0.1130 GHzdays. CPU credit is 0.1130 GHzdays. CPU credit is 0.1130 GHzdays. <etc> Last fiddled with by LaurV on 20130530 at 12:41 
20130530, 13:04  #280  
"James Heinrich"
May 2004
exNorthern Ontario
3502_{10} Posts 
Quote:
I've posted this before somewhere a while back, but this is the abbreviated logic for how PrimeNet currently guesses as to the factor method: Code:
add_user_msg( $resp, "Insufficient information for accurate CPU credit."); if ( $k == 1 && $b == 2 && $n % 4096 == 0 && $c == 1 ) { add_user_msg( $resp, "For stats purposes, assuming factor was found using ECM with B1 = 44M."); } else if ( $k != 1  $b != 2  $c != 1  $n < 16000000) { if ($bits > 55) { add_user_msg( $resp, "For stats purposes, assuming factor was found using ECM with B1 = 50000."); } else { add_user_msg( $resp, "For stats purposes, assuming factor was found by trial factoring using prime95."); } } else { if ($bits > $t_results[tflimit]) { add_user_msg( $resp, "For stats purposes, assuming factor was found using P1 with B1 = 800000."); } else { add_user_msg( $resp, "For stats purposes, assuming factor was found by trial factoring using prime95."); } } 

20130530, 13:26  #281 
Romulan Interpreter
Jun 2011
Thailand
2^{4}×13×47 Posts 
Ignore that! I was stupid. The results are stored properly in the results file, with the right extended B2, and the test is done to extended B2, too (just found a new factor). After a shower and a some dinner I can read properly the results.txt

20130530, 13:31  #282 
Romulan Interpreter
Jun 2011
Thailand
2^{4}×13×47 Posts 
Looking into my log file, I see that the "factor" lines were searched first, and the "no factor" were searched only after (see the former post, log included). Maybe that is the problem? Adding "no factor lines in front would be no use in this case (like in mfaktx would have no preliminary "no factor" line, misfit is changing a line order, problem solved). Just an idea...

20130530, 13:40  #283 
"James Heinrich"
May 2004
exNorthern Ontario
3502_{10} Posts 
The whole manual results parsing code needs to be reworked. Fortunately I already have nicelyworking code on mersenne.ca that I can pretty much just copy over. I just need a little while to sit down and work through it.

20130530, 13:45  #284 
Banned
"Luigi"
Aug 2002
Team Italia
3×1,609 Posts 

20130530, 14:21  #285  
"Carl Darby"
Oct 2012
Spring Mountains, Nevada
3^{2}×5×7 Posts 
From LaurV:
Quote:


20130531, 18:50  #286 
"Carl Darby"
Oct 2012
Spring Mountains, Nevada
3^{2}×5×7 Posts 
LaurV,
I took a look at your pari implementation of p1. What you are doing is essentially the case e = 1 and d = 6. CUDAPm1 currently can deal with d = 6, but to get e =1, I would have to include clauses in some of the stage 2 initialization functions, and provide for incrementing the main stage 2 loop counter by 1 instead of 2. Not hard to do. Any coder that doesn't suck could do it in an hour. I might be able to do it in a day. It would save 16 * (fft length) bytes of device memory, but take a performance hit directly related to the number of prime pairs 6*k  5, 6*k +5 and 6*k 1, 6*k + 1 between b2 / 5 and b2 compared to using e = 2 and d = 6. By the way, very clean and easy to read code. I wish I could do that. Last fiddled with by owftheevil on 20130531 at 19:19 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
mfaktc: a CUDA program for Mersenne prefactoring  TheJudger  GPU Computing  3506  20210918 00:04 
World's seconddumbest CUDA program  fivemack  Programming  112  20150212 22:51 
World's dumbest CUDA program?  xilman  Programming  1  20091116 10:26 
Factoring program need help  Citrix  Lone Mersenne Hunters  8  20050916 02:31 
Factoring program  ET_  Programming  3  20031125 02:57 