mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

ckdo 2010-09-02 23:45

[quote=TheJudger;228207]Which variant do you prefer?[/quote]
The latter. Windows console windows are 80 chars wide by default, and the current style lines are slightly longer (at least for me).

Karl M Johnson 2010-09-03 09:01

I like the new style.

gjmccrac 2010-09-03 11:32

[QUOTE=TheJudger;228094]
Find attached mfaktc 0.11. :smile:
[/QUOTE]

Oliver - I took a look at your sieving code and there might be a small change that will help improve performance.

I notice the following:

within your sieve_init_class you are calculating ii and jj within the loop.
[CODE] ii=(2ULL * (exp%p) * (k_start%p))%p;
jj=(2ULL * (exp%p) * (NUM_CLASSES%p))%p;
[/CODE]
It does not appear that these values will change once they have been calculated. If you calculate them just before the loop you should save time with the repeated calculations - of which some of them are mods.

Just an observation - do with it what you will.

Grant.

TheJudger 2010-09-03 11:56

Hi Grant,

[QUOTE=gjmccrac;228263]Oliver - I took a look at your sieving code and there might be a small change that will help improve performance.

I notice the following:

within your sieve_init_class you are calculating ii and jj within the loop.
[CODE] ii=(2ULL * (exp%p) * (k_start%p))%p;
jj=(2ULL * (exp%p) * (NUM_CLASSES%p))%p;
[/CODE]
It does not appear that these values will change once they have been calculated. If you calculate them just before the loop you should save time with the repeated calculations - of which some of them are mods.

Just an observation - do with it what you will.

Grant.[/QUOTE]

The p changes on every iteration of the loop (first line within the loop):
[CODE]p=primes[i];[/CODE]

I didn't spent much time on optimizing sieve_init_class(), it is only called once per class and doesn't take much time. The performance-critical function is sieve_candidates() which might (due to lack of documentation :rolleyes:) no so easy to understand.

Oliver

gjmccrac 2010-09-03 12:37

[QUOTE=TheJudger;228265]
The p changes on every iteration of the loop (first line within the loop):
[CODE]p=primes[i];[/CODE]
[/QUOTE]

I missed that :blush:

lavalamp 2010-09-03 19:59

There seem to be an awful lot of modulo operations in here:[code] ii=(2ULL * (exp%p) * (k_start%p))%p;
jj=(2ULL * (exp%p) * (NUM_CLASSES%p))%p;[/code]Would it be quicker to store 2ULL * (exp%p) in another variable?

Do you really need to take those values modulo p in every step? Surely doing it just once at the end would eat fewer cycles. Assuming of course that 2ULL * exp * (k_start|NUM_CLASSES) will fit in whatever data type you're using to hold it.[code] t=2ULL * exp;
ii=(t * k_start)%p;
jj=(t * NUM_CLASSES)%p;[/code]Or does the compiler take care of stuff like this?

Edit: If only p changes in the loop, then you could actually do this instead:[code] ii=2ULL * exp * k_start;
jj=2ULL * exp * NUM_CLASSES;

while(whocares){
ii_loop=ii%p;
jj_loop=jj%p;
}[/code]

TheJudger 2010-09-03 20:41

Hi,

[QUOTE=lavalamp;228330]Do you really need to take those values modulo p in every step? Surely doing it just once at the end would eat fewer cycles. Assuming of course that 2ULL * exp * (k_start|NUM_CLASSES) will fit in whatever data type you're using to hold it.[/QUOTE]

thats the point, it doesn't fit. exp goes up to the biggest prime below 2^32 and k_start is allready a 64bit integer which goes up to ~2^63.
I could save the modulo on NUM_CLASSES...

I won't spent much time optimizing on this function, if you make it 10 times faster you won't recognize it on a regular run.

Oliver

P.S. If you can tell me how to speedup sieve_candidates() I would be very happy!

ET_ 2010-09-03 21:11

[QUOTE=TheJudger;228342]Hi,



thats the point, it doesn't fit. exp goes up to the biggest prime below 2^32 and k_start is allready a 64bit integer which goes up to ~2^63.
I could save the modulo on NUM_CLASSES...

I won't spent much time optimizing on this function, if you make it 10 times faster you won't recognize it on a regular run.

Oliver

P.S. If you can tell me how to speedup sieve_candidates() I would be very happy![/QUOTE]

I have an idea... :smile:

Did you already check how I implemented the sieve on Factor5?

Luigi

lavalamp 2010-09-03 21:31

[QUOTE=TheJudger;228342]P.S. If you can tell me how to speedup sieve_candidates() I would be very happy![/QUOTE]I've downloaded the code, but not yet dug into it.

Even when I do, don't hold your breath waiting for any improvements I might bring to it.

TheJudger 2010-09-06 20:22

Hi Luigi,

[QUOTE=ET_;228349]I have an idea... :smile:

Did you already check how I implemented the sieve on Factor5?

Luigi[/QUOTE]

please give me a hint, I've taken a look on your sieve code.

You split the factor candidates into classes (mod 60 which is 2 * 2 * 3 * 5), I do the same except that I'm using classes mod 420 or 4620 (2 * 2 * 3 * 5 * 7 (* 11)). This removes multiples of 2, 3, 5 (, 7(, 11)) and candidates which are not +/- 1 mod 8 totally out of the sieve.

You're using segments for sieving, your segment size is 272272 (16 * 7 * 11 * 13 * 17), I'm doing the same, my segments are N * 11 * 13 * 17 * 19 (for mod 420 classes) or N * 13 * 17 * 19 * 23 (for mod 4620 classes) where N is choosen in that way that the segment size fits into to L1-cache of the CPU (SIEVE_SIZE_LIMIT in params.h).
This allows presieved segments (presieve up to 17 in your implementation and 19 or 23 in my code).

Differences: you need a whole 32bit integers per factor candidate while I use a single bit per candidate in the sieve.
[B]
Unless I've missed something[/B] our both sieve implementations are basically the same.


Oliver

ET_ 2010-09-07 10:37

[QUOTE=TheJudger;228711]Hi Luigi,

[B]
Unless I've missed something[/B] our both sieve implementations are basically the same.


Oliver[/QUOTE]

No way, yours is more memory-effective! :smile:

My question was about "sieving" itself. I didn't have the time to look in deep at your code. The whole process of sieving after the initialization consists in clearing bits, and testing those that were not cleared. Am I correct?

Luigi


All times are UTC. The time now is 22:55.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.