mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing

Reply
 
Thread Tools
Old 2010-09-02, 23:45   #364
ckdo
 
ckdo's Avatar
 
Dec 2007
Cleves, Germany

2·5·53 Posts
Default

Quote:
Originally Posted by TheJudger View Post
Which variant do you prefer?
The latter. Windows console windows are 80 chars wide by default, and the current style lines are slightly longer (at least for me).
ckdo is offline   Reply With Quote
Old 2010-09-03, 09:01   #365
Karl M Johnson
 
Karl M Johnson's Avatar
 
Mar 2010

3×137 Posts
Default

I like the new style.
Karl M Johnson is offline   Reply With Quote
Old 2010-09-03, 11:32   #366
gjmccrac
 
gjmccrac's Avatar
 
Aug 2009
Ontario, Canada

131 Posts
Default

Quote:
Originally Posted by TheJudger View Post
Find attached mfaktc 0.11.
Oliver - I took a look at your sieving code and there might be a small change that will help improve performance.

I notice the following:

within your sieve_init_class you are calculating ii and jj within the loop.
Code:
    ii=(2ULL * (exp%p) * (k_start%p))%p;
    jj=(2ULL * (exp%p) * (NUM_CLASSES%p))%p;
It does not appear that these values will change once they have been calculated. If you calculate them just before the loop you should save time with the repeated calculations - of which some of them are mods.

Just an observation - do with it what you will.

Grant.
gjmccrac is offline   Reply With Quote
Old 2010-09-03, 11:56   #367
TheJudger
 
TheJudger's Avatar
 
"Oliver"
Mar 2005
Germany

11×101 Posts
Default

Hi Grant,

Quote:
Originally Posted by gjmccrac View Post
Oliver - I took a look at your sieving code and there might be a small change that will help improve performance.

I notice the following:

within your sieve_init_class you are calculating ii and jj within the loop.
Code:
    ii=(2ULL * (exp%p) * (k_start%p))%p;
    jj=(2ULL * (exp%p) * (NUM_CLASSES%p))%p;
It does not appear that these values will change once they have been calculated. If you calculate them just before the loop you should save time with the repeated calculations - of which some of them are mods.

Just an observation - do with it what you will.

Grant.
The p changes on every iteration of the loop (first line within the loop):
Code:
p=primes[i];
I didn't spent much time on optimizing sieve_init_class(), it is only called once per class and doesn't take much time. The performance-critical function is sieve_candidates() which might (due to lack of documentation ) no so easy to understand.

Oliver
TheJudger is offline   Reply With Quote
Old 2010-09-03, 12:37   #368
gjmccrac
 
gjmccrac's Avatar
 
Aug 2009
Ontario, Canada

2038 Posts
Default

Quote:
Originally Posted by TheJudger View Post
The p changes on every iteration of the loop (first line within the loop):
Code:
p=primes[i];
I missed that
gjmccrac is offline   Reply With Quote
Old 2010-09-03, 19:59   #369
lavalamp
 
lavalamp's Avatar
 
Oct 2007
Manchester, UK

23·59 Posts
Default

There seem to be an awful lot of modulo operations in here:
Code:
    ii=(2ULL * (exp%p) * (k_start%p))%p;
    jj=(2ULL * (exp%p) * (NUM_CLASSES%p))%p;
Would it be quicker to store 2ULL * (exp%p) in another variable?

Do you really need to take those values modulo p in every step? Surely doing it just once at the end would eat fewer cycles. Assuming of course that 2ULL * exp * (k_start|NUM_CLASSES) will fit in whatever data type you're using to hold it.
Code:
    t=2ULL * exp;
    ii=(t * k_start)%p;
    jj=(t * NUM_CLASSES)%p;
Or does the compiler take care of stuff like this?

Edit: If only p changes in the loop, then you could actually do this instead:
Code:
    ii=2ULL * exp * k_start;
    jj=2ULL * exp * NUM_CLASSES;

    while(whocares){
      ii_loop=ii%p;
      jj_loop=jj%p;
    }

Last fiddled with by lavalamp on 2010-09-03 at 20:04
lavalamp is offline   Reply With Quote
Old 2010-09-03, 20:41   #370
TheJudger
 
TheJudger's Avatar
 
"Oliver"
Mar 2005
Germany

45716 Posts
Default

Hi,

Quote:
Originally Posted by lavalamp View Post
Do you really need to take those values modulo p in every step? Surely doing it just once at the end would eat fewer cycles. Assuming of course that 2ULL * exp * (k_start|NUM_CLASSES) will fit in whatever data type you're using to hold it.
thats the point, it doesn't fit. exp goes up to the biggest prime below 2^32 and k_start is allready a 64bit integer which goes up to ~2^63.
I could save the modulo on NUM_CLASSES...

I won't spent much time optimizing on this function, if you make it 10 times faster you won't recognize it on a regular run.

Oliver

P.S. If you can tell me how to speedup sieve_candidates() I would be very happy!
TheJudger is offline   Reply With Quote
Old 2010-09-03, 21:11   #371
ET_
Banned
 
ET_'s Avatar
 
"Luigi"
Aug 2002
Team Italia

61·79 Posts
Default

Quote:
Originally Posted by TheJudger View Post
Hi,



thats the point, it doesn't fit. exp goes up to the biggest prime below 2^32 and k_start is allready a 64bit integer which goes up to ~2^63.
I could save the modulo on NUM_CLASSES...

I won't spent much time optimizing on this function, if you make it 10 times faster you won't recognize it on a regular run.

Oliver

P.S. If you can tell me how to speedup sieve_candidates() I would be very happy!
I have an idea...

Did you already check how I implemented the sieve on Factor5?

Luigi
ET_ is offline   Reply With Quote
Old 2010-09-03, 21:31   #372
lavalamp
 
lavalamp's Avatar
 
Oct 2007
Manchester, UK

25158 Posts
Default

Quote:
Originally Posted by TheJudger View Post
P.S. If you can tell me how to speedup sieve_candidates() I would be very happy!
I've downloaded the code, but not yet dug into it.

Even when I do, don't hold your breath waiting for any improvements I might bring to it.
lavalamp is offline   Reply With Quote
Old 2010-09-06, 20:22   #373
TheJudger
 
TheJudger's Avatar
 
"Oliver"
Mar 2005
Germany

11·101 Posts
Default

Hi Luigi,

Quote:
Originally Posted by ET_ View Post
I have an idea...

Did you already check how I implemented the sieve on Factor5?

Luigi
please give me a hint, I've taken a look on your sieve code.

You split the factor candidates into classes (mod 60 which is 2 * 2 * 3 * 5), I do the same except that I'm using classes mod 420 or 4620 (2 * 2 * 3 * 5 * 7 (* 11)). This removes multiples of 2, 3, 5 (, 7(, 11)) and candidates which are not +/- 1 mod 8 totally out of the sieve.

You're using segments for sieving, your segment size is 272272 (16 * 7 * 11 * 13 * 17), I'm doing the same, my segments are N * 11 * 13 * 17 * 19 (for mod 420 classes) or N * 13 * 17 * 19 * 23 (for mod 4620 classes) where N is choosen in that way that the segment size fits into to L1-cache of the CPU (SIEVE_SIZE_LIMIT in params.h).
This allows presieved segments (presieve up to 17 in your implementation and 19 or 23 in my code).

Differences: you need a whole 32bit integers per factor candidate while I use a single bit per candidate in the sieve.

Unless I've missed something
our both sieve implementations are basically the same.


Oliver
TheJudger is offline   Reply With Quote
Old 2010-09-07, 10:37   #374
ET_
Banned
 
ET_'s Avatar
 
"Luigi"
Aug 2002
Team Italia

61×79 Posts
Default

Quote:
Originally Posted by TheJudger View Post
Hi Luigi,


Unless I've missed something
our both sieve implementations are basically the same.


Oliver
No way, yours is more memory-effective!

My question was about "sieving" itself. I didn't have the time to look in deep at your code. The whole process of sieving after the initialization consists in clearing bits, and testing those that were not cleared. Am I correct?

Luigi
ET_ is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
mfakto: an OpenCL program for Mersenne prefactoring Bdot GPU Computing 1676 2021-06-30 21:23
The P-1 factoring CUDA program firejuggler GPU Computing 753 2020-12-12 18:07
gr-mfaktc: a CUDA program for generalized repunits prefactoring MrRepunit GPU Computing 32 2020-11-11 19:56
mfaktc 0.21 - CUDA runtime wrong keisentraut Software 2 2020-08-18 07:03
World's second-dumbest CUDA program fivemack Programming 112 2015-02-12 22:51

All times are UTC. The time now is 06:00.


Fri Aug 6 06:00:54 UTC 2021 up 14 days, 29 mins, 1 user, load averages: 3.13, 3.16, 3.14

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.