mersenneforum.org  

Great Internet Mersenne Prime Search > Hardware > GPU Computing

2009-10-25, 14:55   #12
msft
Jul 2009
Tokyo
2·5·61 Posts

Thank you,

I can't predict the performance.
I look forward to 3xx.
I think running Prime95 (2 or 3 threads) plus MacLucasFFTW (CUDA) on a quad core would enhance total throughput.

2009-10-26, 02:27   #13
Shinzok
Oct 2008
Zevenbergen, NL
2×3 Posts

So how exactly do I run this program?

2009-10-26, 03:20   #14
msft
Jul 2009
Tokyo
1001100010₂ Posts

Hi,

$ mkdir mers
$ cd mers
$ wget http://www.garlic.com/%7Ewedgingt/mers.tar.gz
$ tar -zxvf mers.tar.gz
$ tar -zxvf MacLucasFFTW.cuda.e.tar.gz
$ make
$ time ./MacLucasFFTW 11213
1 1024
1001 1024
2001 1024
3001 1024
4001 1024
5001 1024
6001 1024
7001 1024
8001 1024
9001 1024
10001 1024
11001 1024
M( 11213 )P, n = 1024, MacLucasFFTW v8.1 Ballester

real 0m1.746s
user 0m1.008s
sys 0m0.700s

Is that enough?
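For reference, the `M( 11213 )P` line is the verdict of a Lucas-Lehmer test: start from s = 4, replace s with s*s - 2 modulo M(p) = 2^p - 1 a total of p - 2 times, and for an odd prime p, M(p) is prime exactly when the final s is 0 (`P` = prime, `C` = composite). The sketch below is a plain big-integer Python version for illustration only; it is not MacLucasFFTW's code, which performs the squarings with FFTs on the GPU, and it is only practical for small exponents like this one.

```python
def is_mersenne_prime(p: int) -> bool:
    """Lucas-Lehmer test: True if M(p) = 2**p - 1 is prime (p must be prime)."""
    if p == 2:
        return True  # M(2) = 3; the recurrence below is for odd primes p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        # One LL step. MacLucasFFTW does this squaring with an FFT;
        # here Python's built-in big integers do it directly.
        s = (s * s - 2) % m
    return s == 0


if __name__ == "__main__":
    print(is_mersenne_prime(11213))  # the exponent from the run above
```

Running it prints `True`, matching the `P` verdict; the time grows quickly with p, which is why real programs switch to FFT-based multiplication.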
Attached Files: MacLucasFFTW.cuda.e.tar.gz (25.2 KB)

Last fiddled with by msft on 2009-10-26 at 04:03

2009-10-26, 19:56   #15
TheJudger
"Oliver"
Mar 2005
Germany
2×557 Posts

Hi msft,

a 32-bit binary works fine here on openSUSE 11.1 x86_64 with CUDA 2.3.
Factory-overclocked GTX 275:
Core: 648MHz (Nvidia reference is 633MHz)
Shader: 1458MHz (Nvidia reference is 1404MHz)
Memory: 1188MHz (Nvidia reference is 1134MHz)

Code:
time ./MacLucasFFTW 11213
M( 11213 )P, n = 1024, MacLucasFFTW v8.1  Ballester

real    0m1.639s
user    0m1.072s
sys     0m0.564s
Code:
time ./MacLucasFFTW 216091
M( 216091 )P, n = 16384, MacLucasFFTW v8.1  Ballester

real    1m32.206s
user    1m8.348s
sys     0m23.861s

Last fiddled with by TheJudger on 2009-10-26 at 19:56

2009-10-27, 01:20   #16
msft
Jul 2009
Tokyo
2·5·61 Posts

Thank you, TheJudger.

My GIGABYTE GV-N26OC-896I (GTX 260) runs at:
Core: 650MHz
Shader: 1400MHz
Memory: 1000MHz

Your GTX 275 being 10% faster is a reasonable result.

2009-10-27, 13:30   #17
msft
Jul 2009
Tokyo
2×5×61 Posts

Hi,

New results on the GTX 260.

$ time ./MacLucasFFTW 216091

M( 216091 )P, n = 16384, MacLucasFFTW v8.1 Ballester

real 1m1.812s
user 0m29.146s
sys 0m32.666s

$ time ./MacLucasFFTW 859433

M( 859433 )P, n = 65536, MacLucasFFTW v8.1 Ballester

real 11m7.542s
user 4m28.421s
sys 6m39.149s

$ time ./MacLucasFFTW 2976221

M( 2976221 )P, n = 262144, MacLucasFFTW v8.1 Ballester

real 140m5.919s
user 52m42.382s
sys 87m24.016s

$ time ./MacLucasFFTW 33333333
10001 2097152

real 3m36.574s
user 1m19.181s
sys 2m17.393s

2048k fft sec/iter = 0.022

$ time ./MacLucasFFTW 63333333
10001 4194304

real 7m15.839s
user 2m38.638s
sys 4m37.229s

4096k fft sec/iter = 0.044
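The sec/iter lines follow from the `real` times above divided by the iteration count. A small Python sketch, assuming each run did roughly 10001 iterations (the last progress line printed) and that startup time is negligible; both assumptions are inferred from the output, not reported by the program.

```python
def bash_time_to_seconds(t: str) -> float:
    """Convert a bash `time` field such as '3m36.574s' to seconds."""
    minutes, seconds = t.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def sec_per_iter(real: str, iterations: int) -> float:
    """Average wall-clock seconds per iteration (startup overhead included)."""
    return bash_time_to_seconds(real) / iterations


if __name__ == "__main__":
    # Wall times from the two runs above; 10001 iterations is assumed.
    for fft, real in (("2048K", "3m36.574s"), ("4096K", "7m15.839s")):
        print(f"{fft} FFT sec/iter = {sec_per_iter(real, 10001):.3f}")
```

This prints 0.022 and 0.044, matching the figures above; doubling the FFT length roughly doubled the time per iteration (435.839 s / 216.574 s ≈ 2.01).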

Thank you,
Attached Files: MacLucasFFTW.cuda.h.tar.gz (25.6 KB)

2009-10-27, 20:24   #18
TheJudger
"Oliver"
Mar 2005
Germany
2×557 Posts

Hi msft,

so this is getting interesting now. :)
It's faster than a single core of a current CPU (but less energy efficient, I guess).
You have doubled the speed every few days; how long can you keep up the pace?

Did you verify that the results are OK?
For example, run about 1000 iterations on various exponents (near the FFT-length limits) and compare the results with a "known good" CPU application.

2009-10-28, 01:53   #19
msft
Jul 2009
Tokyo
2×5×61 Posts

Good suggestion, TheJudger.

I'll try it.

Thank you,

2009-10-28, 12:45   #20
msft
Jul 2009
Tokyo
2×5×61 Posts

Hi,

Tuning has now stopped; result verification has started.

MacLucasFFTW.cuda.i.tar.gz: stable version (the base for verifying results)

Check example:

Mlucas:
M18760031 iteration = 10000 clocks = 00:01:54.879. Res64: 6F3B0CA04650A35D
M18760069 iteration = 10000 clocks = 00:01:55.260. Res64: 7C5D1E38F5880285
M18760153 iteration = 10000 clocks = 00:01:55.400. Res64: 72E6FA57F10F1AA0
M18760309 iteration = 10000 clocks = 00:01:56.569. Res64: 7D5F0967A6C0CDF7
M18760349 iteration = 10000 clocks = 00:01:55.290. Res64: F5EFF9C512E8E0C6
M18760381 iteration = 10000 clocks = 00:01:55.670. Res64: 2CD35948AD199A4F
M18760393 iteration = 10000 clocks = 00:01:56.250. Res64: CF9E63DE37304EA6
M18760451 iteration = 10000 clocks = 00:01:55.400. Res64: 0BE49436D31B79E9
M18760519 iteration = 10000 clocks = 00:01:55.519. Res64: E708514A5EB70096
M18760529 iteration = 10000 clocks = 00:01:55.709. Res64: 2ED23475BBD672F7
M18760531 iteration = 10000 clocks = 00:01:56.120. Res64: 2CE83E69F963DC84
M18760561 iteration = 10000 clocks = 00:01:55.049. Res64: 02F4AD7139B1B8C4
M18760589 iteration = 10000 clocks = 00:01:55.939. Res64: 64B9DBC08126BE60
M18760667 iteration = 10000 clocks = 00:01:55.879. Res64: 8B0F66B09BF10933
M18760681 iteration = 10000 clocks = 00:01:57.200. Res64: 917A77D412CE5B0B
M18760699 iteration = 10000 clocks = 00:01:57.260. Res64: 2E0DD3E2EAE80054
M18760727 iteration = 10000 clocks = 00:01:57.310. Res64: 09C34222F1C37AED
M18760747 iteration = 10000 clocks = 00:01:57.090. Res64: F96FF6B17E1E8367
M18760789 iteration = 10000 clocks = 00:01:57.430. Res64: 225E4973690D03B6
M18760793 iteration = 10000 clocks = 00:01:57.239. Res64: 18359C85D903175C
M18760799 iteration = 10000 clocks = 00:01:57.409. Res64: 9C62E310CBBE3EC8
M18760829 iteration = 10000 clocks = 00:01:58.010. Res64: 68432B797C6759F5
M18760837 iteration = 10000 clocks = 00:01:57.469. Res64: B3C6FC522CD49416
M18760909 iteration = 10000 clocks = 00:01:57.879. Res64: 6DD7EE83DCE2886F
M18761003 iteration = 10000 clocks = 00:01:57.620. Res64: 55ACE9289C2610C8
M18761009 iteration = 10000 clocks = 00:01:57.480. Res64: F5AA08E6FAA03F9C
M18761021 iteration = 10000 clocks = 00:01:57.459. Res64: 87719DD801DBCE61
M18761087 iteration = 10000 clocks = 00:01:57.230. Res64: FB00679524AB92AF
M18761161 iteration = 10000 clocks = 00:01:57.310. Res64: 6F2C976DFED11A5F
M18761179 iteration = 10000 clocks = 00:01:57.709. Res64: AE9CAAC2751B55A0

MacLucasFFTW:
M( 18760031 )C, 0x6f3b0ca04650a35d, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 18760069 )C, 0x7c5d1e38f5880285, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760153 )C, 0x72e6fa57f10f1aa0, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 18760309 )C, 0x7d5f0967a6c0cdf7, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 18760349 )C, 0xf5eff9c512e8e0c6, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760381 )C, 0x2cd35948ad199a4f, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760393 )C, 0xcf9e63de37304ea6, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760451 )C, 0x0be49436d31b79e9, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 18760519 )C, 0xe708514a5eb70096, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760529 )C, 0x2ed23475bbd672f7, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 18760531 )C, 0x2ce83e69f963dc84, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760561 )C, 0x02f4ad7139b1b8c4, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760589 )C, 0x64b9dbc08126be60, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760667 )C, 0x8b0f66b09bf10933, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760681 )C, 0x917a77d412ce5b0b, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 18760699 )C, 0x2e0dd3e2eae80054, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760727 )C, 0x09c34222f1c37aed, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760747 )C, 0xf96ff6b17e1e8367, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760789 )C, 0x225e4973690d03b6, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 18760793 )C, 0x18359c85d903175c, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760799 )C, 0x9c62e310cbbe3ec8, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760829 )C, 0x68432b797c6759f5, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760837 )C, 0xb3c6fc522cd49416, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760909 )C, 0x6dd7ee83dce2886f, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 18761003 )C, 0x55ace9289c2610c8, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18761009 )C, 0xf5aa08e6faa03f9c, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 18761021 )C, 0x87719dd801dbce61, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18761087 )C, 0xfb00679524ab92af, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18761161 )C, 0x6f2c976dfed11a5f, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18761179 )C, 0xae9caac2751b55a0, n = 1048576, MacLucasFFTW v8.1 Ballester
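TheJudger's check amounts to pairing up the 64-bit residues (Res64) exponent by exponent. A small Python sketch that does this for the two listings above; the regular expressions are written against the exact line formats shown here (an assumption: other versions of either program may print differently), and since the MacLucasFFTW lines do not show the iteration count, it assumes both programs ran the same number of iterations.

```python
from __future__ import annotations

import re

# Patterns for the two line formats shown above.
MLUCAS_LINE = re.compile(r"M(\d+) iteration = \d+ .*Res64: ([0-9A-Fa-f]{16})")
MACLUCAS_LINE = re.compile(r"M\( (\d+) \)[PC], 0x([0-9A-Fa-f]{16})")


def residues(pattern: re.Pattern, text: str) -> dict[int, str]:
    """Map exponent -> Res64 (lower-case hex) for every matching line."""
    return {int(p): res.lower() for p, res in pattern.findall(text)}


def compare(mlucas_text: str, maclucas_text: str) -> tuple[list[int], list[int]]:
    """Return (exponents found in only one listing, exponents whose Res64 differs)."""
    a = residues(MLUCAS_LINE, mlucas_text)
    b = residues(MACLUCAS_LINE, maclucas_text)
    unpaired = sorted(a.keys() ^ b.keys())
    differ = sorted(p for p in a.keys() & b.keys() if a[p] != b[p])
    return unpaired, differ
```

For example, the contents of Mlucas.18760031-18767081.txt and MacLucasFFTW.18760031-18767081.txt from file.tar.gz could be passed in, assuming they use the same formats as these listings; a result of `([], [])` means every exponent paired up and every residue matched.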

Thank you,
Attached Files: MacLucasFFTW.cuda.i.tar.gz (25.7 KB)

2009-10-30, 09:26   #21
msft
Jul 2009
Tokyo
1001100010₂ Posts

Hi,

Some verification results:

$ tar -ztvf file.tar.gz
MacLucasFFTW.18760031-18767081.txt: Exponents 18760031 to 18767081, iterations = 10000
MacLucasFFTW.9400009-9456961.txt: Exponents 9400009 to 9456961, iterations = 2000
Mlucas.18760031-18767081.txt: Exponents 18760031 to 18767081, iterations = 10000
Mlucas.9400009-9456961.txt: Exponents 9400009 to 9456961, iterations = 2000

All results are correct.

Thank you,
Attached Files: file.tar.gz (46.3 KB)

2009-10-30, 15:20   #22
lycorn
"GIMFS"
Sep 2002
Oeiras, Portugal
1,571 Posts

Great work, msft.
You might as well try running a complete double check (a 22M exponent) to check the reliability of the hardware running your code. The current FFT length is 1280K, so it shouldn't take too long; your code is running fast.
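A rough estimate of how long such a double check would take: a full Lucas-Lehmer test of M(p) needs p - 2 squarings, so the wall time is about (p - 2) × sec/iter. The inputs below are assumptions for illustration: 22,000,000 stands in for "a 22M exponent", 0.022 is msft's 2048K figure from earlier in the thread, and 0.011 is simply half of that, a guess for a 1024K transform based on the 2048K to 4096K step roughly doubling the time.

```python
SECONDS_PER_DAY = 86_400


def ll_test_days(exponent: int, sec_per_iter: float) -> float:
    """Estimated wall-clock days for a full Lucas-Lehmer test of M(exponent)."""
    return (exponent - 2) * sec_per_iter / SECONDS_PER_DAY


if __name__ == "__main__":
    exponent = 22_000_000  # illustrative stand-in for "a 22M exponent"
    for label, spi in (("2048K, measured", 0.022), ("1024K, assumed half", 0.011)):
        print(f"{label}: {ll_test_days(exponent, spi):.1f} days")
```

Under these assumptions it prints 5.6 days and 2.8 days respectively.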

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
