mersenneforum.org  

Old 2013-05-13, 11:54   #265
owftheevil
 
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

100111011₂ Posts

No, too big an fft will cause errors too. I think it has to do with how far the carries get propagated.
Old 2013-05-19, 16:04   #266
owftheevil
 
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

3²×5×7 Posts

Stage 1 save files are now implemented. It's not very polite, in that it doesn't clean these up when it's done. Some of you will want to keep these for extending b1 later. I'm starting work on stage 2 save files and will figure out the cleanup when that's ready.
Old 2013-05-19, 16:30   #267
c10ck3r
 
Aug 2010
Kansas

547 Posts

Quote:
Originally Posted by owftheevil
Stage 1 save files are now implemented. It's not very polite, in that it doesn't clean these up when it's done. Some of you will want to keep these for extending b1 later. I'm starting work on stage 2 save files and will figure out the cleanup when that's ready.
Do you have a Win-32-bit compiled version of this available?
Old 2013-05-19, 16:44   #268
owftheevil
 
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

13B₁₆ Posts

Not yet. frmky has been doing the Windows builds. I don't know when he will have time to get to it.
Old 2013-05-20, 00:39   #269
owftheevil
 
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

100111011₂ Posts

Just wanted to mention that without frmky's help none of this would be available until later this summer or maybe even fall.
Old 2013-05-21, 06:58   #270
frmky
 
Jul 2003
So Cal

4104₈ Posts

Windows binaries with latest changes, untested as usual.
Win32
https://www.dropbox.com/s/ecwuwbezul...2_20130520.zip
x64
https://www.dropbox.com/s/ik1g9eza96...4_20130520.zip
Old 2013-05-21, 08:19   #271
Stef42
 
Feb 2012
the Netherlands

58₁₀ Posts

Thank you very much!
Old 2013-05-25, 08:56   #272
Karl M Johnson
 
Mar 2010

633₈ Posts

The latest and greatest 64-bit binary works here.
I stopped and resumed a couple of times during stage 1; here are the end results on the new WHQL ForceWare driver:
Code:
Accumulated product stage 1: M63137587, 0x1f2595c1236f31dc, n = 3456K, CUDAPm1 v0.10
Accumulated product stage 2: M63137587, 0x412ca727e7d21026, n = 3456K, CUDAPm1 v0.10
Old 2013-05-29, 19:43   #273
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

22666₈ Posts

I'm having trouble with CUDAPm1. When I use "-b1 3100000" on the command line it works, but it spends a long time in the CPU routine that computes the product. A PARI one-liner, "n=3*10^6; lgn=log(n); z=prod(x=1,n,if(isprime(x),x^floor(lgn/log(x)),1)); ceil(log(z)/log(2))", returns in about the same time, against all logic and reason (PARI should be much slower!).

But this is not the main problem. All values between 3200000 and 20M are parsed wrong: it says "B1 need to be at least 1" and does a test with B1=1 and B2=393xxx or so, which does find a factor, if one exists for these values.

I am not sure whether smaller values starting with 1 (like -b1 150000) are parsed wrong too.

When I use a -b1 value over 20M, it is parsed right (but it never returns from the CPU multiplication routine, not even after half an hour).

So, what are the restrictions for B1? Or are there no restrictions and I am doing something completely silly? (I would like to run "CUDAPm1 160403 -b1 12000000 -b2 12000000", for example. The max value I can use is around 3.1M, which is not enough, since the former one is 10M. And that is totally ignoring the fact that it wants B2 to be 13 times higher than B1, which is total nonsense for these numbers.)

Also, how can we "extend" a former B1?

I tried the test cases:

CUDAPm1 58610467 -b1 70843 -b2 694201

and

CUDAPm1 58610467 -b1 694201 -b2 694201

They both find the factor [edit: the first one in stage 2, the second one in stage 1, as is normal] if started from scratch (deleting the checkpoint file in between). But now, assuming I have done a run with the first, I want the second run to continue from where B1 left off. This is not possible, as the former B1 is recorded in the file, and if I leave the file there, it totally ignores my command line, says "found limits in the file" and only runs stage 2. If I delete the file, it obviously starts from scratch, duplicating most of the work. This is not what was intended when we talked about "extending B1". OTOH, resuming stage 1 works very nicely, and I believe it is only a matter of ignoring the former B1 stored in the file (I did not look into the sources, however, and for the record, I use the Win7 64-bit binaries).

Question: why are you doing that whole product in the beginning? You could do an exponentiation for every prime; this would make it easy to "extend" the B1 limit, and you would not need to stress "only" the CPU (the GPU is idle during this time, for minutes, depending on how big B1 is).

Code:
>CUDAPm1 630893 -b1 3100000

mkdir: cannot create directory `savefiles': File exists
CUDA reports 1306M of 1535M GPU memory free.
Using e=6, d=2310, nrp=480
Using approximately 155M GPU memory.
B2 should be at least 390390, increasing it.
B2 should be at least 40300000, increasing it.
<<<<  here it stays about 2 minutes, GPU is idle, CPU hard computing the product, then everything continues normally.
Starting stage 1 P-1, M630893, B1 = 3100000, B2 = 40300000, e = 6, fft length = 40K
Doing 4471985 iterations
Iteration 10000 M630893, 0x280b630169a8b5f7, n = 40K, CUDAPm1 v0.10 err = 0.00049 (0:17 real, 1.6675 ms/iter, ETA 2:04:00)
Iteration 20000 M630893, 0xfb3b1f4975308539, n = 40K, CUDAPm1 v0.10 err = 0.00046 (0:01 real, 0.1044 ms/iter, ETA 7:44)
Iteration 30000 M630893, 0xc90545f20507538b, n = 40K, CUDAPm1 v0.10 err = 0.00046 (0:01 real, 0.1039 ms/iter, ETA 7:41)
Iteration 40000 M630893, 0x3ff1f732d6ebab86, n = 40K, CUDAPm1 v0.10 err = 0.00046 (0:01 real, 0.1041 ms/iter, ETA 7:41)

Last fiddled with by LaurV on 2013-05-29 at 20:00
Old 2013-05-29, 21:58   #274
owftheevil
 
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

3²×5×7 Posts

LaurV, thanks for your input. I'll have time for a more complete response in about an hour, but for now I'll just say that most of what you are talking about hasn't been implemented yet, or hasn't been cleaned up yet. I was unaware of any problems parsing b1; I'll take a look as soon as I have time.
Old 2013-05-29, 22:50   #275
owftheevil
 
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

13B₁₆ Posts

Quote:
I'm having trouble with CUDAPm1. When I use "-b1 3100000" on the command line it works, but it spends a long time in the CPU routine that computes the product. A PARI one-liner, "n=3*10^6; lgn=log(n); z=prod(x=1,n,if(isprime(x),x^floor(lgn/log(x)),1)); ceil(log(z)/log(2))", returns in about the same time, against all logic and reason (PARI should be much slower!).
My lack of imagination strikes again. As in, who the heck would want to spend that much time doing p-1? Well, I know the answer to that question now. Currently, the computation of the products of powers of primes is rather inefficient. And now that I realize some people will want to use huge b1's, I should probably split large b1's into two parts: a reasonable-length large exponent and then piecewise smaller exponents to fill in the gap.
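
To make the "product of powers of primes" concrete, here is a minimal Python sketch. It is not CUDAPm1's code: the function names, the product-tree approach, and the `first_limit` / `chunk` parameters are all assumptions for illustration. It builds the same quantity as the PARI line in the quote (every prime up to b1 raised to the largest power not exceeding b1), multiplies the factors pairwise so the operands stay balanced in size, and then shows one possible way to split the work into a first large piece plus smaller pieces whose product is the full exponent.

```python
from math import isqrt

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]

def prime_power(p, b1):
    """Largest power of p that does not exceed b1 (expects p <= b1)."""
    q = p
    while q * p <= b1:
        q *= p
    return q

def product_tree(values):
    """Multiply pairwise, level by level, so both operands have similar size."""
    values = list(values) or [1]
    while len(values) > 1:
        paired = [values[i] * values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            paired.append(values[-1])  # odd element carries over to the next level
        values = paired
    return values[0]

def stage1_exponent(b1):
    """Product over primes p <= b1 of the largest power of p not exceeding b1."""
    return product_tree(prime_power(p, b1) for p in primes_up_to(b1))

def split_exponent(b1, first_limit, chunk):
    """Hypothetical split: one piece for primes <= first_limit,
    then pieces of at most `chunk` primes each for the remaining primes."""
    primes = primes_up_to(b1)
    pieces = [product_tree(prime_power(p, b1) for p in primes if p <= first_limit)]
    rest = [p for p in primes if p > first_limit]
    for start in range(0, len(rest), chunk):
        pieces.append(product_tree(prime_power(p, b1) for p in rest[start:start + chunk]))
    return pieces

print(stage1_exponent(20))                      # product for b1 = 20
print(len(split_exponent(1000, 100, 25)))       # number of pieces for b1 = 1000
```

The exponent grows to roughly 1.44 × b1 bits, so for b1 in the millions the way the multiplications are ordered matters far more than the sieve.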

Quote:
But this is not the main problem. All values between 3200000 and 20M are parsed wrong: it says "B1 need to be at least 1" and does a test with B1=1 and B2=393xxx or so, which does find a factor, if one exists for these values.

I am not sure whether smaller values starting with 1 (like -b1 150000) are parsed wrong too.

When I use a -b1 value over 20M, it is parsed right (but it never returns from the CPU multiplication routine, not even after half an hour).
Like I said earlier, I was not aware of this problem. I'll look into it.

Quote:
So, what are the restrictions for B1? Or are there no restrictions and I am doing something completely silly? (I would like to run "CUDAPm1 160403 -b1 12000000 -b2 12000000", for example. The max value I can use is around 3.1M, which is not enough, since the former one is 10M. And that is totally ignoring the fact that it wants B2 to be 13 times higher than B1, which is total nonsense for these numbers.)
Currently there are a few silly restrictions caused by my lack of boundary case considerations in the initialization of stage 2. These are first on the list to be removed after stage 2 save files are working. Exactly what the restrictions are depends on many factors, so it is hard to say exactly how big b1 must be. If e is the B-S exponent, d is the primorial being used, and p is the smallest prime which does not divide d, then b2 / p <= b1 and b2 / p / d >= 2 * e + 1 are the primary restrictions.
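
The two "increasing it" lines in LaurV's log can be reproduced from e, d and p. The sketch below is only an illustration with made-up function names, and it rests on one assumption: the post writes the first restriction as b2 / p <= b1, whereas the log shows the program raising B2 to 13 × B1 = 40300000, so the code follows the log and treats p × b1 as a lower bound on B2.

```python
def is_prime(n):
    """Trial division; fine for the tiny values needed here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def smallest_prime_not_dividing(d):
    """Smallest prime p with d % p != 0 (13 for the primorial 2310)."""
    p = 2
    while not (is_prime(p) and d % p != 0):
        p += 1
    return p

def minimum_b2(b1, e, d):
    """Lower bounds on B2 as suggested by the log output (assumed reading).

    Returns (p, bound from stage 2 init, bound from b1, the larger of the two).
    """
    p = smallest_prime_not_dividing(d)
    from_stage2_init = (2 * e + 1) * d * p   # b2 / p / d >= 2 * e + 1
    from_b1 = p * b1                         # log: B2 raised to p * B1
    return p, from_stage2_init, from_b1, max(from_stage2_init, from_b1)

# Values from the log: e=6, d=2310, B1=3100000
print(minimum_b2(3100000, 6, 2310))
```

With e=6 and d=2310 this gives p=13, and the two bounds come out as 390390 and 40300000, the same two figures the program printed before starting stage 1.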

Quote:
Also, how can we "extend" a former B1?
You can't yet. It's on the list of things to do. The code for splitting large b1's up will automatically provide most of this.
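
For readers wondering why splitting the exponent would give B1 extension almost for free, here is a small Python sketch of the arithmetic. It is an assumption-laden illustration, not the planned implementation: it assumes the stage 1 save file holds the residue 3^E(old b1) mod the Mersenne number, it ignores any extra factors a real program might fold into the exponent, and all names are hypothetical. The point is that raising the saved residue to E(new b1) / E(old b1) gives the same result as starting over with the larger b1.

```python
from math import isqrt

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]

def prime_power(p, b1):
    """Largest power of p that does not exceed b1 (expects p <= b1)."""
    q = p
    while q * p <= b1:
        q *= p
    return q

def stage1_exponent(b1):
    """Product over primes p <= b1 of the largest power of p not exceeding b1."""
    e = 1
    for p in primes_up_to(b1):
        e *= prime_power(p, b1)
    return e

def extension_exponent(old_b1, new_b1):
    """E(new_b1) / E(old_b1), built prime by prime without forming either E."""
    ext = 1
    for p in primes_up_to(new_b1):
        old_pow = prime_power(p, old_b1) if p <= old_b1 else 1
        ext *= prime_power(p, new_b1) // old_pow
    return ext

# Toy check with a small Mersenne number standing in for the real one.
modulus = 2**67 - 1
saved = pow(3, stage1_exponent(100), modulus)            # "save file" residue
extended = pow(saved, extension_exponent(100, 1000), modulus)
print(extended == pow(3, stage1_exponent(1000), modulus))
```

Only the new primes and the extra powers of old primes need to be processed, which is the work LaurV wanted to avoid duplicating.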

Quote:
Question: why are you doing that whole product in the beginning? You can do exponentiation for every prime, this would make it easy to "extend" the B1 limit, and you would not need to stress the CPU "only" (the GPU is idle in this time, for minutes, depends how big B1 is).
Speed. 0's in the binary representation of the exponent require a squaring, 1's require an additional multiplication by the base. If the base is 3, this can be done with a modified normalization kernel with negligible increase in time, but with a huge integer base, it requires an additional fft multiplication.
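
The trade-off can be counted on a toy example. The Python sketch below is illustrative only (hypothetical names, plain integers instead of FFT arithmetic on the GPU): it compares one long left-to-right exponentiation with base 3 against an exponentiation per prime power, counting squarings and separating the cheap "times 3" steps from multiplications by a full-size residue.

```python
def prime_powers_up_to(b1):
    """For each prime p <= b1, the largest power of p not exceeding b1."""
    powers = []
    for n in range(2, b1 + 1):
        if all(n % k for k in range(2, int(n ** 0.5) + 1)):  # n is prime
            pk = n
            while pk * n <= b1:
                pk *= n
            powers.append(pk)
    return powers

def pow3_single_exponent(exponent, modulus):
    """3^exponent mod modulus, left to right.

    Every bit costs a squaring; a 1 bit costs an extra multiplication by 3,
    which is the step described above as nearly free.
    """
    result, squarings, small_mults = 1, 0, 0
    for bit in bin(exponent)[2:]:
        result = result * result % modulus
        squarings += 1
        if bit == "1":
            result = result * 3 % modulus
            small_mults += 1
    return result, squarings, small_mults

def pow3_per_prime(prime_powers, modulus):
    """Raise the running residue to each prime power in turn.

    Here a 1 bit costs a multiplication by the current full-size residue,
    i.e. an additional FFT multiplication in the real program.
    """
    x, squarings, big_mults = 3, 0, 0
    for pk in prime_powers:
        base = x
        for bit in bin(pk)[2:][1:]:      # leading 1 bit is the base itself
            x = x * x % modulus
            squarings += 1
            if bit == "1":
                x = x * base % modulus
                big_mults += 1
    return x, squarings, big_mults

modulus = 2**127 - 1
powers = prime_powers_up_to(1000)
exponent = 1
for pk in powers:
    exponent *= pk

print(pow3_single_exponent(exponent, modulus)[1:])  # (squarings, cheap multiplications by 3)
print(pow3_per_prime(powers, modulus)[1:])          # (squarings, full-size multiplications)
```

Both routes give the same residue. The per-prime route never needs more squarings, but each of its extra multiplications is a full-size one, which is the cost being avoided by forming the whole product first.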

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.