mersenneforum.org  

Old 2009-10-30, 22:59   #23
msft
 
Jul 2009
Tokyo

2×5×61 Posts

I understand, lycorn.

Thank you,
Old 2009-10-30, 23:43   #24
fivemack
(loop (#_fork))
 
Feb 2006
Cambridge, England

2×7×461 Posts

Hi msft.

This is great work: many thanks! I had to copy the .h files from a separate install of MacLucasFFTW into the directory and modify some of the paths in the makefile to get it to work, but it works now.

It's a bit slower than I expected, 3m40s on a GTX275 to test 216091, but that's probably because 131072 is a very large FFT size to use in double precision for so small a number.
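To put a number on the FFT-size remark: dividing the exponent by the FFT length gives the average number of bits of the Mersenne number carried in each double-precision element. The sketch below is only that arithmetic. The function name `bits_per_element` is made up for illustration and is not part of MacLucasFFTW, and the smaller lengths in the loop are shown for comparison only, not as sizes the program is known to support.

```python
def bits_per_element(exponent: int, fft_length: int) -> float:
    """Average number of bits of M(exponent) stored in each FFT element."""
    return exponent / fft_length


if __name__ == "__main__":
    exponent = 216091
    # 131072 is the FFT length reported for this exponent; the shorter
    # lengths are hypothetical, listed only to show how the density changes.
    for fft_length in (131072, 65536, 32768, 16384):
        density = bits_per_element(exponent, fft_length)
        print(f"n = {fft_length:6d}: {density:5.2f} bits per element")
```

At n = 131072 each double carries fewer than two bits of the number, which is why the transform looks oversized for this exponent.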

Unfortunately my computer crashed the second time I tried testing 216091; I think the graphics card is a bit flaky.

Last fiddled with by fivemack on 2009-10-31 at 00:21
Old 2009-10-31, 01:55   #25
frmky
 
Jul 2003
So Cal

7²·53 Posts

This is getting interesting! I decided to try the exponent 24036583. On a Tesla C1060 GPU, CUDA MacLucasFFTW runs at 0.0153 sec/iter using a 2048K FFT. Using one thread on a 2GHz Opteron K10 CPU, Prime95 runs the same exponent at 0.055 sec/iter using a 1280K FFT. So, comparing a top-of-the-line GPU with a CPU that is notoriously slow for Prime95, the GPU version runs about 3.6x faster. Also interesting is that after adding

cudaSetDeviceFlags(cudaDeviceBlockingSync);
cudaSetDevice(0);

near the top of the main() function, MacLucasFFTW uses only about 5% of a cpu core.

Assuming the computer doesn't get reset in the next 5 days or that restarting works, I'll let this run to completion.
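The speed ratio and the multi-day estimate both follow from the two per-iteration timings quoted in this post. A minimal sketch of that arithmetic, assuming a Lucas-Lehmer test of exponent p needs p - 2 squaring iterations and ignoring checkpointing or restart overhead (the function names are hypothetical):

```python
GPU_SEC_PER_ITER = 0.0153  # Tesla C1060, 2048K FFT (figure from the post)
CPU_SEC_PER_ITER = 0.055   # one 2GHz Opteron K10 thread, 1280K FFT (figure from the post)
EXPONENT = 24036583


def speedup(cpu_sec_per_iter: float, gpu_sec_per_iter: float) -> float:
    """How many times faster the GPU iteration is than the CPU iteration."""
    return cpu_sec_per_iter / gpu_sec_per_iter


def ll_runtime_days(exponent: int, sec_per_iter: float) -> float:
    """Estimated wall-clock days for a full test, assuming p - 2 iterations."""
    iterations = exponent - 2
    return iterations * sec_per_iter / 86400.0


if __name__ == "__main__":
    print(f"speedup: {speedup(CPU_SEC_PER_ITER, GPU_SEC_PER_ITER):.2f}x")
    print(f"GPU run: {ll_runtime_days(EXPONENT, GPU_SEC_PER_ITER):.2f} days")
    print(f"CPU run: {ll_runtime_days(EXPONENT, CPU_SEC_PER_ITER):.2f} days")
```

The GPU estimate comes out a little over four days, consistent with leaving the machine alone for "the next 5 days".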

Last fiddled with by frmky on 2009-10-31 at 01:57
Old 2009-10-31, 09:27   #26
msft
 
Jul 2009
Tokyo

610₁₀ Posts

Thank you for testing this program, fivemack.
Quote:
Originally Posted by fivemack
I had to copy the .h files from a separate install of MacLucasFFTW into the directory and modify some of the paths in the makefile to get it to work, but it works now.
Sorry, it is my first Makefile.

Quote:
Originally Posted by fivemack
It's a bit slower than I expected, 3m40s on a GTX275 to test 216091, but that's probably because 131072 is a very large FFT size to use in double precision for so small a number.
That is a side effect of parallelization: the GPU needs more threads to stay busy, so the tuning target is an FFT size of 2048K or larger.

Quote:
Originally Posted by fivemack
Unfortunately my computer crashed the second time I tried testing 216091; I think the graphics card is a bit flaky.
Exactly. What are GTX275s made of?
Old 2009-10-31, 09:52   #27
msft
 
Jul 2009
Tokyo

262₁₆ Posts

Hi, frmky.
Quote:
Originally Posted by frmky
I decided to try the exponent 24036583. On a Tesla C1060 GPU, CUDA MacLucasFFTW runs at 0.0153 sec/iter using a 2048K FFT.
I immediately checked the 2000-iteration checksum for 24036583; it is correct.

Quote:
Originally Posted by frmky
cudaSetDeviceFlags(cudaDeviceBlockingSync);
cudaSetDevice(0);
near the top of the main() function, MacLucasFFTW uses only about 5% of a cpu core.
The CPU was spending 95% of its time in a spin loop. I will add these calls, thank you.

Quote:
Originally Posted by frmky
Assuming the computer doesn't get reset in the next 5 days or that restarting works, I'll let this run to completion.
Thank you for all the work.
Old 2009-11-04, 08:30   #28
frmky
 
Jul 2003
So Cal

A25₁₆ Posts

Quote:
Originally Posted by msft
I immediately checked the 2000-iteration checksum for 24036583; it is correct.
And so were the next 24 million.

M( 24036583 )P, n = 2097152, MacLucasFFTW v8.1 Ballester
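For readers unfamiliar with what each of those iterations does: the Lucas-Lehmer test starts from s = 4 and repeats s = s² - 2 (mod 2^p - 1) a total of p - 2 times; 2^p - 1 is prime exactly when the final s is 0, and the trailing "P" in the output line above appears to mark that outcome. The sketch below uses Python's built-in big integers purely for illustration. It is not how MacLucasFFTW performs the squaring (that program uses FFT-based multiplication), and it is only practical for small exponents.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if 2**p - 1 is prime, for a prime exponent p."""
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the loop below assumes odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        # One "iteration" in the sense used in this thread: square, subtract 2,
        # reduce modulo the Mersenne number.
        s = (s * s - 2) % m
    return s == 0


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31):
        verdict = "P" if lucas_lehmer(p) else "C"
        print(f"M( {p} ){verdict}")
```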
Old 2009-11-04, 11:03   #29
msft
 
Jul 2009
Tokyo

2·5·61 Posts

Quote:
Originally Posted by frmky
M( 24036583 )P, n = 2097152, MacLucasFFTW v8.1 Ballester
Prima!!! Thank you.
Old 2009-11-04, 12:33   #30
lycorn
 
"GIMFS"
Sep 2002
Oeiras, Portugal

1570₁₀ Posts

Congrats, msft. The code seems to be running fine. 0.0153 sec/iter on an exponent that Prime95 handles with a 1280K FFT is better than I can get on a Core 2 Duo T8300 with BOTH cores crunching the same exponent (my best result is ~0.017).
Old 2009-11-04, 14:45   #31
msft
 
Jul 2009
Tokyo

2·5·61 Posts

Thank you, lycorn

New version on GTX260.

$ tar -zxvf MacLucasFFTW.cuda.k.tar.gz
$ make
$ time ./MacLucasFFTW 216091

M( 216091 )P, n = 131072, MacLucasFFTW v8.1 Ballester

real 6m34.691s
user 0m10.025s
sys 0m0.188s

$ time ./MacLucasFFTW 2976221

M( 2976221 )P, n = 262144, MacLucasFFTW v8.1 Ballester

real 129m52.509s
user 19m27.337s
sys 0m1.232s

$ time ./MacLucasFFTW 33333333
10001 2097152

real 2m44.702s
user 0m22.469s
sys 0m1.136s

2048k fft sec/iter = 0.0165

$ time ./MacLucasFFTW 63333333
10001 4194304

real 7m0.095s
user 1m43.026s
sys 0m1.160s

4096k fft sec/iter = 0.042

For M131101 to M1548619, the 1000-iteration checksums were compared with Glucas; they are correct.
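The sec/iter figures quoted for the two timed runs can be reproduced from the `real` times. A minimal sketch, assuming the first number printed by each run (10001) is the number of iterations performed; that reading is a guess, though it matches the quoted 0.0165 and 0.042. The helper names are hypothetical.

```python
def parse_real_time(real: str) -> float:
    """Convert a `time` value such as '2m44.702s' to seconds."""
    minutes, seconds = real.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def sec_per_iter(real: str, iterations: int) -> float:
    """Wall-clock seconds per iteration for a run of the given length."""
    return parse_real_time(real) / iterations


if __name__ == "__main__":
    # (label, real time from the transcript, assumed iteration count)
    runs = [
        ("2048k fft", "2m44.702s", 10001),
        ("4096k fft", "7m0.095s", 10001),
    ]
    for label, real, iterations in runs:
        print(f"{label} sec/iter = {sec_per_iter(real, iterations):.4f}")
```

Because the figure is derived from wall-clock time, it includes program start-up and any FFT setup, so the steady-state cost per iteration may be slightly lower.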

Thank you,
Attached Files
File Type: gz MacLucasFFTW.cuda.k.tar.gz (30.0 KB, 503 views)
Old 2009-11-05, 09:46   #32
nucleon
 
Mar 2003
Melbourne

5·103 Posts

Is there any advantage in doing multiple FFTs at the same time on the GPU? I.e., can we get 2x prime checks at the same time in, say, <50% of the time of doing one check?

-- Craig
Old 2009-11-05, 15:06   #33
msft
 
Jul 2009
Tokyo

2·5·61 Posts
Default

Hi, nucleon

Quote:
Originally Posted by nucleon
Is there any advantage in doing multiple FFTs at the same time on the GPU? I.e., can we get 2x prime checks at the same time in, say, <50% of the time of doing one check?
Unfortunately, PCI was too slow for the LL test.

Thank you,



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
