mersenneforum.org

GPU Computing: The P-1 factoring CUDA program (https://www.mersenneforum.org/showthread.php?t=17835)

ET_ 2013-03-02 13:42

[QUOTE=firejuggler;331659]1 h and 34 min? That's 5 to 7 times faster than an ordinary CPU. Nice.[/QUOTE]

Please notice the B1=580,000... :smile:

Is the B1 limit actually a fixed one?

Luigi

firejuggler 2013-03-02 13:54

Yup.
1 h 34 min is about 5,700 seconds.
The attached run on a similar-sized exponent took 24,400 seconds in total for stage 1. OK... so the speedup is only about 4.3 times.

ET_ 2013-03-02 14:13

[QUOTE=firejuggler;331664]Yup.
1 h 34 min is about 5,700 seconds.
The attached run on a similar-sized exponent took 24,400 seconds in total for stage 1. OK... so the speedup is only about 4.3 times.[/QUOTE]

Please note that not everyone here has a high-clocked AVX processor available... Not to mention that the code here is still a proof of concept with only a few optimizations.

Luigi

firejuggler 2013-03-02 14:45

Please note that I'm genuinely impressed by the "proof-of-concept" speedup.

As for my CPU, it is a stock-speed i5-2500K, which is pretty 'ordinary' for today's computer enthusiasts (the run was done on one core).

kracker 2013-03-02 15:42

[QUOTE=chalsall;331651]+1!!! :smile:

Ditto. I only have a 560, but it is the 2 GB version, so when you need some Stage 2 testing... :wink:[/QUOTE]

Now give us half of your first-born child. Naow! :razz:

chalsall 2013-03-02 15:50

[QUOTE=kracker;331677]Now give us half of your first-born child. Naow! :razz:[/QUOTE]

Sure. Where should half of my non-existent first-born be delivered? :wink:

owftheevil 2013-03-02 16:32

[QUOTE=henryzz;331652]
I am curious how fast this would run small numbers up to a much higher B1. How much does the exponent affect the runtime? Could you try a sub-1000-bit exponent? Maybe a range of exponents of different sizes would be appropriate.[/QUOTE]

It could be coerced into taking exponents that small (I assume you mean exponents p with Mp < 1000 bits), but it wouldn't be very efficient. Toom-Cook multiplication would be better, or even grammar-school (schoolbook) multiplication if you go small enough. A very rough upper bound on the number of iterations you need for a given B1 is log2(B1) times the number of primes below B1. Iteration times will be close to what CuLu gets for the same FFT length; this is, after all, only a slight modification of CuLu. For very large B1, some final segment of the run will be about 5-10% slower.
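A minimal Python sketch of where that iteration bound comes from, assuming the usual stage-1 exponent E = product of p^k over primes p <= B1 (k the largest power with p^k <= B1, so E = lcm(1..B1)) and the common Mersenne choice of computing 3^(2*p*E) mod Mp. Square-and-multiply needs about one squaring per bit of the exponent, and each prime power contributes at most log2(B1) bits, which gives the log2(B1) x (number of primes) bound. All function names are hypothetical, and Python's built-in `pow` stands in for the FFT squaring CuLu uses, so this is only practical for tiny exponents and is not the program's own code.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]

def stage1_exponent(b1):
    """E = product of p**k over primes p <= B1, k largest with p**k <= B1 (= lcm(1..B1))."""
    e = 1
    for p in primes_up_to(b1):
        pk = p
        while pk * p <= b1:
            pk *= p
        e *= pk
    return e

def rough_iteration_bound(b1):
    """Rough bound from the thread: log2(B1) * number of primes (here counted <= B1)."""
    return math.log2(b1) * len(primes_up_to(b1))

def pm1_stage1(p, b1, base=3):
    """Hypothetical P-1 stage 1 on Mp = 2**p - 1: gcd(base**(2*p*E) - 1, Mp).
    Uses big-int pow (no FFT), so it is only usable for small p."""
    m = (1 << p) - 1
    x = pow(base, 2 * p * stage1_exponent(b1), m)
    return math.gcd(x - 1, m)

if __name__ == "__main__":
    for b1 in (1_000, 10_000, 100_000):
        bits = stage1_exponent(b1).bit_length()
        print(f"B1={b1}: E has {bits} bits, rough bound {rough_iteration_bound(b1):.0f}")
    print(pm1_stage1(29, 5))
```

For example, `stage1_exponent(10)` is 2520 (lcm(1..10)), and `pm1_stage1(29, 5)` returns a divisor of M29 = 233 x 1103 x 2089 that is a multiple of 233, because 233 - 1 = 2^3 x 29 divides 2*29*E. Since ln E is Chebyshev's psi(B1), which is close to B1, E has roughly 1.44*B1 bits, so the thread's bound is a little loose but of the right size.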

owftheevil 2013-03-02 16:40

[QUOTE=ET_;331661]Please notice the B1=580,000... :smile:

Is the B1 limit actually a fixed one?

Luigi[/QUOTE]

Yes, at the moment I have to rebuild it every time I want to use a different B1. In fact, the next thing I'm going to do is make it read B1 from the command line.

[QUOTE]
Please note that not everyone here has a high-clocked AVX processor available... Not to mention that the code here is still a proof of concept with only a few optimizations.

Luigi[/QUOTE]
It won't get much faster than it is now.

owftheevil 2013-03-02 16:45

[QUOTE=chalsall;331678]Sure. Where should half of my non-existent first-born be delivered? :wink:[/QUOTE]

E-mail would be fine. But if it's half of a boy, don't bother; my wife only wants it if it's half of a girl.

And as for the half night in the gap hotel, I presume I have to find my own way to Barbados? Or are you also going to provide transportation for half of the way there?

ET_ 2013-03-02 16:55

[QUOTE=owftheevil;331679]It could be coerced into taking exponents that small (I assume you mean exponents p with Mp < 1000 bits), but it wouldn't be very efficient. Toom-Cook multiplication would be better, or even grammar-school (schoolbook) multiplication if you go small enough. A very rough upper bound on the number of iterations you need for a given B1 is log2(B1) times the number of primes below B1. Iteration times will be close to what CuLu gets for the same FFT length; this is, after all, only a slight modification of CuLu. For very large B1, some final segment of the run will be about 5-10% slower.[/QUOTE]

Maybe Montgomery/Barrett reduction could fill the gap between Toom-Cook and grammar-school multiplication, but for now I'd rather have a "fully functional" P-1 for current exponents, leaving the smaller ones to either mfakt or gpu-ecm...
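Montgomery reduction, one of the two techniques mentioned, replaces the division in each modular multiplication with multiplications, a bit mask and a shift by a power of two. Below is a minimal Python sketch of the textbook REDC step and a left-to-right square-and-multiply built on it, for any odd modulus n. All names are hypothetical and this is not code from the CUDA program; for a Mersenne modulus 2^p - 1 the reduction can also be done by simply folding the high bits back onto the low bits.

```python
def montgomery_setup(n, r_bits):
    """Return n' with n * n' == -1 (mod R), where R = 2**r_bits > n and n is odd."""
    r = 1 << r_bits
    if n % 2 == 0 or n >= r:
        raise ValueError("need an odd modulus n < R")
    return (-pow(n, -1, r)) % r  # modular inverse via pow(x, -1, m), Python 3.8+

def redc(t, n, r_bits, n_prime):
    """Montgomery reduction: return t * R^-1 mod n, valid for 0 <= t < n * R."""
    mask = (1 << r_bits) - 1
    m = ((t & mask) * n_prime) & mask  # m == -t * n^-1 (mod R)
    u = (t + m * n) >> r_bits          # t + m*n is divisible by R, and u < 2n
    return u - n if u >= n else u

def mont_pow(base, exp, n):
    """Compute base**exp mod n, reducing every product with REDC."""
    r_bits = n.bit_length()            # R = 2**r_bits is the smallest power of two > n
    n_prime = montgomery_setup(n, r_bits)
    x = (1 << r_bits) % n              # 1 in Montgomery form (R mod n)
    b = ((base % n) << r_bits) % n     # base in Montgomery form (base * R mod n)
    for bit in bin(exp)[2:]:           # left-to-right square-and-multiply
        x = redc(x * x, n, r_bits, n_prime)
        if bit == "1":
            x = redc(x * b, n, r_bits, n_prime)
    return redc(x, n, r_bits, n_prime)  # convert back out of Montgomery form
```

Checking `mont_pow` against Python's built-in three-argument `pow` is an easy way to validate the reduction before porting the idea to a GPU kernel.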

You rock, owftheevil! :et_:

Luigi

chalsall 2013-03-02 17:00

[QUOTE=owftheevil;331683]e-mail would be fine. But if its half of a boy, don't bother. My wife only wants it if its half of a girl.[/QUOTE]

Actually it's transgendered... MtF... :wink:

[QUOTE=owftheevil;331683]And as for the half night in the gap hotel, I presume I have to find my own way to Barbados? Or are you also going to provide transportation for half of the way there?[/QUOTE]

Since I just landed a big contract, I'll also provide half a trip here. You'll have to swim the rest of the way... :smile:

Sincerely though, thanks [I][U]very[/U][/I] much for your work! :bow:


All times are UTC.