mersenneforum.org  

Old 2013-03-02, 13:42   #34
ET_
Banned
"Luigi"
Aug 2002
Team Italia
3×1,619 Posts

Quote:
Originally Posted by firejuggler
1 h 34 min? That's 5 to 7 times faster than an ordinary CPU. Nice.
Please notice the B1=580,000...

Is the B1 limit actually a fixed one?

Luigi

Last fiddled with by ET_ on 2013-03-02 at 13:42 Reason: Add another annoying question...
Old 2013-03-02, 13:54   #35
firejuggler
"Vincent"
Apr 2010
Over the rainbow
2²·7·103 Posts

Yup,
1 h 34 min is about 5700 seconds.
Below is a run of a similar-sized exponent: the total run time for phase 1 is 24400 seconds. OK, so the speedup is only about 4.3 times.
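As a quick check of the arithmetic, here is a minimal sketch that uses only the two timings quoted in this post. The function and variable names are illustrative, not taken from any program in the thread.

```python
def to_seconds(hours, minutes):
    """Convert an hours/minutes duration to seconds."""
    return hours * 3600 + minutes * 60


def speedup(cpu_seconds, gpu_seconds):
    """Ratio of CPU run time to GPU run time."""
    return cpu_seconds / gpu_seconds


# GPU stage 1 run: 1 h 34 min; CPU run on one core: 24400 seconds.
gpu_seconds = to_seconds(1, 34)
cpu_seconds = 24400
ratio = speedup(cpu_seconds, gpu_seconds)

if __name__ == "__main__":
    print(f"GPU: {gpu_seconds} s, CPU: {cpu_seconds} s, speedup: {ratio:.2f}x")
```

With the rounded figure of 5700 seconds instead of the exact conversion, the ratio comes out slightly lower, but still a little above 4.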
Attached thumbnail: P-1.jpg
Old 2013-03-02, 14:13   #36
ET_
Banned
"Luigi"
Aug 2002
Team Italia
3·1,619 Posts

Quote:
Originally Posted by firejuggler
Yup,
1 h 34 min is about 5700 seconds.
Below is a run of a similar-sized exponent: the total run time for phase 1 is 24400 seconds. OK, so the speedup is only about 4.3 times.
Please note that not everyone here has a high-clocked AVX2 processor available... Not to mention that the code here is still a proof of concept with just a few optimizations.

Luigi
Old 2013-03-02, 14:45   #37
firejuggler
"Vincent"
Apr 2010
Over the rainbow
101101000100₂ Posts

Please note that I'm genuinely impressed by the "proof-of-concept" speedup.

As for my CPU, it is a stock-speed i5-2500K, which is pretty 'ordinary' for today's computer enthusiast (the run was done on one core).
Old 2013-03-02, 15:42   #38
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
2⁷×17 Posts

Quote:
Originally Posted by chalsall
+1!!!

Ditto. I only have a 560, but it is the 2GB version, so when you need some Stage 2 testing....
Now give us your half-first born child. Naow!
Old 2013-03-02, 15:50   #39
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
2³×3×461 Posts

Quote:
Originally Posted by kracker
Now give us your half-first born child. Naow!
Sure. Where should half of my non-existent first-born be delivered?
Old 2013-03-02, 16:32   #40
owftheevil
"Carl Darby"
Oct 2012
Spring Mountains, Nevada
3²×5×7 Posts

Quote:
Originally Posted by henryzz
I am intrigued how fast this would run small numbers up to a much higher B1. How much does the exponent affect the runtime? Could you try a sub-1000-bit exponent? Maybe a range of different-sized exponents would be appropriate.
It could be coerced into taking exponents that small (I assume you mean exponents p with Mp < 1000 bits), but it wouldn't be very efficient. Toom-Cook multiplication would be better, or even grammar-school multiplication if you go small enough. A very rough upper bound on the number of iterations you need for a given B1 is log2(B1) × (the number of primes < B1). Iteration times will be close to what CuLu gets for the same FFT; this is, after all, only a slight modification of CuLu. For very large B1, things will be about 5-10% slower for some final segment.
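The rough bound in this post can be checked numerically. The sketch below counts the primes up to B1 with a simple sieve, evaluates log2(B1) × (number of primes), and for comparison computes the bit length of the product of the largest prime powers not exceeding B1, which is the exponent a textbook P-1 stage 1 would use. The textbook form, the function names, counting primes ≤ B1 rather than < B1, and ignoring the extra factor 2p usually included for Mersenne numbers are all assumptions made for illustration; none of this is taken from the program discussed in the thread.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: return all primes <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            # Clear every multiple of i starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def rough_iteration_bound(b1):
    """The rough bound from the post: log2(B1) * (number of primes <= B1)."""
    return math.log2(b1) * len(primes_up_to(b1))


def stage1_exponent_bits(b1):
    """Bit length of the product of the largest prime powers q^k <= B1.

    With binary exponentiation, the number of squarings is roughly this
    bit length (assumption: textbook stage 1, extra factor 2p ignored).
    """
    e = 1
    for q in primes_up_to(b1):
        power = q
        while power * q <= b1:
            power *= q
        e *= power
    return e.bit_length()


if __name__ == "__main__":
    b1 = 580_000  # the B1 mentioned earlier in the thread
    print("rough bound:", round(rough_iteration_bound(b1)))
    print("exponent bits:", stage1_exponent_bits(b1))
```

Each prime power in the product is at most B1, so it contributes at most log2(B1) bits; the bit length therefore never exceeds the rough bound by more than one, which is why the bound is safe but loose.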
Old 2013-03-02, 16:40   #41
owftheevil
"Carl Darby"
Oct 2012
Spring Mountains, Nevada
3²×5×7 Posts

Quote:
Originally Posted by ET_
Please notice the B1=580,000...

Is the B1 limit actually a fixed one?

Luigi
Yes, at the moment I have to rebuild it every time I want to use a different B1. In fact, the next thing I'm going to do is enable it to get the B1 from the command line.

Quote:
Please note that not everyone here has a high-clocked AVX2 processor available... Not to mention that the code here is still a proof of concept with just a few optimizations.

Luigi
It won't get much faster than it is now.
Old 2013-03-02, 16:45   #42
owftheevil
"Carl Darby"
Oct 2012
Spring Mountains, Nevada
13B₁₆ Posts

Quote:
Originally Posted by chalsall
Sure. Where should half of my non-existent first-born be delivered?
E-mail would be fine. But if it's half of a boy, don't bother. My wife only wants it if it's half of a girl.

And as for the half night in the gap hotel, I presume I have to find my own way to Barbados? Or are you also going to provide transportation for half of the way there?

Last fiddled with by owftheevil on 2013-03-02 at 16:46
Old 2013-03-02, 16:55   #43
ET_
Banned
"Luigi"
Aug 2002
Team Italia
3·1,619 Posts

Quote:
Originally Posted by owftheevil
It could be coerced into taking exponents that small (I assume you mean exponents p with Mp < 1000 bits), but it wouldn't be very efficient. Toom-Cook multiplication would be better, or even grammar-school multiplication if you go small enough. A very rough upper bound on the number of iterations you need for a given B1 is log2(B1) × (the number of primes < B1). Iteration times will be close to what CuLu gets for the same FFT; this is, after all, only a slight modification of CuLu. For very large B1, things will be about 5-10% slower for some final segment.
Maybe Montgomery/Barrett could fill the gap between Toom-Cook and grammar-school multiplication, but at the moment I'd rather have a "fully functional" P-1 on current exponents, leaving the smaller ones to either mfakt or gpu-ecm...

You rock, owftheevil!

Luigi
Old 2013-03-02, 17:00   #44
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
2³·3·461 Posts

Quote:
Originally Posted by owftheevil
E-mail would be fine. But if it's half of a boy, don't bother. My wife only wants it if it's half of a girl.
Actually it's transgendered... MtF...

Quote:
Originally Posted by owftheevil
And as for the half night in the gap hotel, I presume I have to find my own way to Barbados? Or are you also going to provide transportation for half of the way there?
Since I just got a big contract, I'll also provide half a trip here. You'll have to swim the rest of the way....

Sincerely though, thanks very much for your work!

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
