mersenneforum.org  

Old 2009-11-06, 06:27   #34
frmky
 
Jul 2003
So Cal

5044₈ Posts

Version k runs at .0141 sec/iter for the 2048K FFT and .0264 sec/iter for the 4096K FFT on the C1060.
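
As a rough aid for reading these timings: a Lucas-Lehmer test of exponent p takes p - 2 squaring iterations, so a sec/iter figure converts directly into a wall-clock estimate. The Python sketch below is only that arithmetic. The function name and the exponent 40,000,000 are illustrative assumptions, not values from the thread, and whether such an exponent fits a 2048K FFT is not checked here.

```python
def ll_test_days(exponent, sec_per_iter):
    """Estimate the wall-clock days for one Lucas-Lehmer test.

    A test of exponent p performs p - 2 modular squarings, so the
    estimate is simply iterations * seconds-per-iteration.
    """
    iterations = exponent - 2
    return iterations * sec_per_iter / 86400.0  # 86400 seconds per day


# Hypothetical exponent, combined with the 2048K FFT timing quoted above.
print(round(ll_test_days(40_000_000, 0.0141), 2), "days")
```

With these assumed inputs the estimate comes to about six and a half days per test.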
Old 2009-11-06, 10:05   #35
nucleon
 
Mar 2003
Melbourne

5×103 Posts

Cool. Can't wait for the 3xx series - Dec 2009 release date apparently (well, according to Wikipedia).

-- Craig
Old 2009-11-06, 10:20   #36
BigBrother
 
Feb 2005
The Netherlands

2×109 Posts

After some fiddling, I managed to compile and run this under Windows. I had to replace two memalign() calls with malloc(), because memalign() is apparently obsolete. Here are two results:

Code:
D:\Code\MaclucasFFTW.cuda.k>a 11213
 too small Exponent
Code:
D:\Code\MaclucasFFTW.cuda.k>a 216091
 1 131072
 10001 131072
 20001 131072
 30001 131072
 40001 131072
 50001 131072
 60001 131072
 70001 131072
 80001 131072
 90001 131072
 100001 131072
 110001 131072
 120001 131072
 130001 131072
 140001 131072
 150001 131072
 160001 131072
 170001 131072
 180001 131072
 190001 131072
 200001 131072
 210001 131072
M( 216091 )C, 0xfffffffffffffffd, n = 131072, MacLucasFFTW v8.1  Ballester
This last one ran for only about 30 seconds.

Other exponents that are big enough have the same results: 0xfffffffffffffffd after a very short while.

My video card is a 9600 M GS, and is capable of running Folding@Home.
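
For reference, the test this program performs can be stated in a few lines with exact integer arithmetic. The Python sketch below is not the MacLucasFFTW code, which multiplies with floating-point FFTs; it is a slow but exact Lucas-Lehmer check, and the function names are chosen here for illustration only. M216091 is a known Mersenne prime, so a correct run should end with a zero residue rather than a nonzero one such as 0xfffffffffffffffd.

```python
def lucas_lehmer_residue(p):
    """Return the final Lucas-Lehmer residue for an odd prime exponent p.

    Starts from s = 4 and applies s -> s*s - 2 (mod 2**p - 1) exactly
    p - 2 times. The Mersenne number 2**p - 1 is prime iff the result is 0.
    """
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s


def low_64_bits_hex(residue):
    """Format the low 64 bits of a residue, similar to the program's output."""
    return hex(residue & 0xFFFFFFFFFFFFFFFF)


# Small exponents run instantly: 2**521 - 1 is prime, 2**11 - 1 is not.
print(low_64_bits_hex(lucas_lehmer_residue(521)))
print(lucas_lehmer_residue(11) != 0)
```

Running lucas_lehmer_residue(216091) this way is slow in pure Python, but it gives an independent check against the GPU result.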
Old 2009-11-06, 13:35   #37
jasonp
Tribal Bullet
 
Oct 2004

3²×5×79 Posts

Quote:
Originally Posted by BigBrother View Post
My video card is a 9600 M GS, and is capable of running Folding@Home.
Are you sure this card supports double precision? If it doesn't, it will use single precision FP internally, and generate completely wrong answers.
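
The precision point can be illustrated without a GPU. FFT-based multiplication only works if every output coefficient, which is an integer, can be recovered exactly by rounding. Single precision carries a 24-bit significand, so integers above 2^24 already stop being exact, while double precision is exact up to 2^53. The Python sketch below shows just that limit. The helper names are hypothetical and it does not model the FFT itself.

```python
import struct


def round_to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def exact_in_float32(n):
    """True if the integer n survives a round trip through single precision."""
    return round_to_float32(float(n)) == n


def exact_in_float64(n):
    """True if the integer n survives conversion to double precision."""
    return float(n) == n


print(exact_in_float32(2**24), exact_in_float32(2**24 + 1))
print(exact_in_float64(2**24 + 1), exact_in_float64(2**53 + 1))
```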
Old 2009-11-07, 04:38   #38
msft
 
Jul 2009
Tokyo

2·5·61 Posts

Hi, frmky
Quote:
Originally Posted by frmky View Post
.0141 sec/iter for the 2048K FFT and .0264 sec/iter for the 4096K FFT on the C1060.
The 4096K FFT performance is reasonable; my GTX260's 4096K FFT performance is not.

Hi, nucleon
Quote:
Originally Posted by nucleon View Post
Cool. Can't wait for the 3xx series - Dec 2009 release date apparently (well, according to Wikipedia).
Me too.

Hi, BigBrother
Quote:
Originally Posted by BigBrother View Post
Code:
D:\Code\MaclucasFFTW.cuda.k>a 11213
 too small Exponent
The exponent must be larger than 131072: the aint() function needs the exponent to be larger than the FFT size.
Quote:
Originally Posted by BigBrother View Post
My video card is a 9600 M GS, and is capable of running Folding@Home.
Sorry, only the 2xx series supports DP (double precision).

Hi, jasonp
Quote:
Originally Posted by jasonp View Post
Are you sure this card supports double precision? If it doesn't, it will use single precision FP internally, and generate completely wrong answers.
Nice support, thank you.
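
One way to read the exponent limit mentioned above: an FFT-based squaring splits the p-bit number into n words, so each word carries p / n bits on average. With the fixed n = 131072 shown in the output, an exponent below n would leave less than one bit per word. The Python sketch below only restates that arithmetic. The function names are hypothetical and the check is an interpretation of the remark, not code taken from the program.

```python
def average_bits_per_word(exponent, fft_length):
    """Average number of bits each FFT word must hold for a p-bit number."""
    return exponent / fft_length


def exponent_large_enough(exponent, fft_length):
    """Rule as stated in the thread: the exponent must exceed the FFT size."""
    return exponent > fft_length


FFT_LENGTH = 131072  # the n printed in the runs above

for p in (11213, 216091):
    print(p, exponent_large_enough(p, FFT_LENGTH),
          round(average_bits_per_word(p, FFT_LENGTH), 3))
```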
Old 2009-11-07, 05:22   #39
msft
 
Jul 2009
Tokyo

2·5·61 Posts

Hi,

Version o runs at .0134 sec/iter for the 2048K FFT and .0320 sec/iter for the 4096K FFT on the GTX260.
Attached Files
File Type: gz MacLucasFFTW.cuda.o.tar.gz (30.6 KB, 506 views)
Old 2009-11-07, 07:30   #40
frmky
 
Jul 2003
So Cal

2596₁₀ Posts

Excellent! I have another calculation running now, so I won't be able to bench it on the C1060 for a few days.

Two questions... First, can this be adapted to use non-power-of-2 FFTs, and if so, would there be speed gains using FFT sizes comparable to those used by Prime95? Second, can this be multithreaded with the calculation split over multiple GPUs, or, since the devices can't talk directly to each other, will the required memory transfers to and from the host kill the speed? I ask this last question because I'm actually using an S1070 with four C1060s.
Old 2009-11-07, 10:24   #41
msft
 
Jul 2009
Tokyo

2×5×61 Posts

Hi, frmky
Quote:
Originally Posted by frmky View Post
First, can this be adapted to use non-power-of-2 FFTs, and if so, would there be speed gains using FFT sizes comparable to those used by Prime95?
I will consider it.
Quote:
Originally Posted by frmky View Post
Second, can this be multithreaded with the calculation split over multiple GPUs, or, since the devices can't talk directly to each other, will the required memory transfers to and from the host kill the speed? I ask this last question because I'm actually using an S1070 with four C1060s.
My question is: "How is a 1D FFT supported on the S1070?"
Old 2009-11-07, 17:50   #42
frmky
 
Jul 2003
So Cal

101000100100₂ Posts

Quote:
Originally Posted by msft View Post
My question is: "How is a 1D FFT supported on the S1070?"
The S1070 is really just four discrete C1060s housed in a separate unit. It is no different from installing four GTX260s in your computer. Each card must be addressed individually from a different program thread, and the cards cannot directly communicate with each other.
Old 2009-11-08, 09:01   #43
msft
 
Jul 2009
Tokyo

1142₈ Posts

Hi,
Quote:
Originally Posted by frmky View Post
can this be adapted to use non-power-of-2 FFTs,
I made a non-power-of-2 FFT version with cufftExecD2Z(), but cufftExecD2Z() is two times slower than cufftExecZ2Z().
Can someone tell me how to use the complex FFT method?

Thank you,
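
One common reading of the "complex FFT method" is the standard trick of packing a length-N real sequence into a length-N/2 complex sequence, running a single complex transform, and then untangling the result. The Python sketch below illustrates that trick with a naive O(n²) DFT standing in for cufftExecZ2Z(). It is an assumption about what is being asked, not msft's code, and all function names are hypothetical. The naive DFT accepts any length, so N need not be a power of 2, only even.

```python
import cmath


def dft(values):
    """Naive forward DFT of any length (stand-in for a complex FFT call)."""
    n = len(values)
    return [sum(values[t] * cmath.exp(-2j * cmath.pi * t * k / n)
                for t in range(n))
            for k in range(n)]


def real_dft_via_half_complex(x):
    """Return bins 0..N/2 of the DFT of real x using one length-N/2 complex DFT."""
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("length must be even and at least 2")
    m = n // 2
    # Pack even samples into the real part, odd samples into the imaginary part.
    packed = dft([complex(x[2 * k], x[2 * k + 1]) for k in range(m)])
    out = []
    for k in range(m + 1):
        zk = packed[k % m]
        zc = packed[(m - k) % m].conjugate()
        even = (zk + zc) / 2      # DFT of the even-indexed samples
        odd = (zk - zc) / 2j      # DFT of the odd-indexed samples
        out.append(even + cmath.exp(-2j * cmath.pi * k / n) * odd)
    return out


samples = [float((3 * i * i + 1) % 7) for i in range(12)]  # length 12, not a power of 2
half = real_dft_via_half_complex(samples)
full = dft(samples)
print(max(abs(a - b) for a, b in zip(half, full)))  # difference from the direct DFT
```

The remaining bins follow from conjugate symmetry, which is also why a real-to-complex transform returns only N/2 + 1 outputs.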
Old 2009-11-08, 09:06   #44
msft
 
Jul 2009
Tokyo

2·5·61 Posts

Hi,
Quote:
Originally Posted by frmky View Post
the cards cannot directly communicate with each other.
Nobody has reported 1D FFT performance on the S1070; that is the answer.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
