![]() |
|
|
#364 |
|
Dec 2007
Cleves, Germany
2·5·53 Posts |
|
|
|
|
|
|
#365 |
|
Mar 2010
3×137 Posts |
I like the new style.
|
|
|
|
|
|
#366 |
|
Aug 2009
Ontario, Canada
131 Posts |
Oliver - I took a look at your sieving code and there might be a small change that will help improve performance.
I notice the following: within your sieve_init_class you are calculating ii and jj within the loop. Code:
ii=(2ULL * (exp%p) * (k_start%p))%p;
jj=(2ULL * (exp%p) * (NUM_CLASSES%p))%p;
Just an observation - do with it what you will. Grant. |
|
|
|
|
|
#367 | |
|
"Oliver"
Mar 2005
Germany
11×101 Posts |
Hi Grant,
Quote:
Code:
p=primes[i]; ) no so easy to understand.Oliver |
|
|
|
|
|
|
#368 |
|
Aug 2009
Ontario, Canada
2038 Posts |
|
|
|
|
|
|
#369 |
|
Oct 2007
Manchester, UK
23·59 Posts |
There seem to be an awful lot of modulo operations in here:
Code:
ii=(2ULL * (exp%p) * (k_start%p))%p;
jj=(2ULL * (exp%p) * (NUM_CLASSES%p))%p;
Do you really need to take those values modulo p in every step? Surely doing it just once at the end would eat fewer cycles. Assuming of course that 2ULL * exp * (k_start|NUM_CLASSES) will fit in whatever data type you're using to hold it. Code:
t=2ULL * exp;
ii=(t * k_start)%p;
jj=(t * NUM_CLASSES)%p;
Edit: If only p changes in the loop, then you could actually do this instead: Code:
ii=2ULL * exp * k_start;
jj=2ULL * exp * NUM_CLASSES;
while(whocares){
ii_loop=ii%p;
jj_loop=jj%p;
}
Last fiddled with by lavalamp on 2010-09-03 at 20:04 |
|
|
|
|
|
#370 | |
|
"Oliver"
Mar 2005
Germany
45716 Posts |
Hi,
Quote:
I could save the modulo on NUM_CLASSES... I won't spent much time optimizing on this function, if you make it 10 times faster you won't recognize it on a regular run. Oliver P.S. If you can tell me how to speedup sieve_candidates() I would be very happy! |
|
|
|
|
|
|
#371 | |
|
Banned
"Luigi"
Aug 2002
Team Italia
61·79 Posts |
Quote:
![]() Did you already check how I implemented the sieve on Factor5? Luigi |
|
|
|
|
|
|
#372 |
|
Oct 2007
Manchester, UK
25158 Posts |
|
|
|
|
|
|
#373 | |
|
"Oliver"
Mar 2005
Germany
11·101 Posts |
Hi Luigi,
Quote:
You split the factor candidates into classes (mod 60 which is 2 * 2 * 3 * 5), I do the same except that I'm using classes mod 420 or 4620 (2 * 2 * 3 * 5 * 7 (* 11)). This removes multiples of 2, 3, 5 (, 7(, 11)) and candidates which are not +/- 1 mod 8 totally out of the sieve. You're using segments for sieving, your segment size is 272272 (16 * 7 * 11 * 13 * 17), I'm doing the same, my segments are N * 11 * 13 * 17 * 19 (for mod 420 classes) or N * 13 * 17 * 19 * 23 (for mod 4620 classes) where N is choosen in that way that the segment size fits into to L1-cache of the CPU (SIEVE_SIZE_LIMIT in params.h). This allows presieved segments (presieve up to 17 in your implementation and 19 or 23 in my code). Differences: you need a whole 32bit integers per factor candidate while I use a single bit per candidate in the sieve. Unless I've missed something our both sieve implementations are basically the same. Oliver |
|
|
|
|
|
|
#374 | |
|
Banned
"Luigi"
Aug 2002
Team Italia
61×79 Posts |
Quote:
![]() My question was about "sieving" itself. I didn't have the time to look in deep at your code. The whole process of sieving after the initialization consists in clearing bits, and testing those that were not cleared. Am I correct? Luigi |
|
|
|
|
![]() |
| Thread Tools | |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| mfakto: an OpenCL program for Mersenne prefactoring | Bdot | GPU Computing | 1676 | 2021-06-30 21:23 |
| The P-1 factoring CUDA program | firejuggler | GPU Computing | 753 | 2020-12-12 18:07 |
| gr-mfaktc: a CUDA program for generalized repunits prefactoring | MrRepunit | GPU Computing | 32 | 2020-11-11 19:56 |
| mfaktc 0.21 - CUDA runtime wrong | keisentraut | Software | 2 | 2020-08-18 07:03 |
| World's second-dumbest CUDA program | fivemack | Programming | 112 | 2015-02-12 22:51 |