#12 |
Jul 2009
Tokyo
2·5·61 Posts |
Thank you,
I can't predict the performance. I look forward to 3xx. I think Prime95 (2 or 3 threads) + MacLucasFFTW (CUDA) on a quad core would enhance throughput. |
#13 |
Oct 2008
Zevenbergen, NL
2×3 Posts |
So how exactly do I run this program? |
#14 |
Jul 2009
Tokyo
1001100010₂ Posts |
Hi,
$ mkdir mers
$ cd mers
$ wget http://www.garlic.com/%7Ewedgingt/mers.tar.gz
$ tar -zxvf mers.tar.gz
$ tar -zxvf MacLucasFFTW.cuda.e.tar.gz
$ make
$ time ./MacLucasFFTW 11213
1 1024
1001 1024
2001 1024
3001 1024
4001 1024
5001 1024
6001 1024
7001 1024
8001 1024
9001 1024
10001 1024
11001 1024
M( 11213 )P, n = 1024, MacLucasFFTW v8.1 Ballester
real 0m1.746s
user 0m1.008s
sys 0m0.700s
Is that not enough?
Last fiddled with by msft on 2009-10-26 at 04:03 |
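For readers unfamiliar with the output: a line such as `M( 11213 )P` reports a Lucas-Lehmer test on the Mersenne number 2^11213 - 1, and the trailing letter appears to mean P for prime and C for composite (that reading of the suffix is an assumption based on the runs in this thread). The following is a minimal pure-Python sketch of the test itself, not the FFT-based CUDA code in MacLucasFFTW; the function name `lucas_lehmer` is hypothetical and chosen only for illustration.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if the Mersenne number 2**p - 1 is prime.

    Valid for odd prime exponents p. This uses Python's built-in big
    integers, not an FFT multiplication as MacLucasFFTW does.
    """
    m = (1 << p) - 1           # the Mersenne number M(p)
    s = 4                      # standard Lucas-Lehmer starting value
    for _ in range(p - 2):     # p - 2 square-and-subtract steps
        s = (s * s - 2) % m
    return s == 0              # zero final residue means M(p) is prime


if __name__ == "__main__":
    for p in (7, 11, 11213):
        verdict = "P" if lucas_lehmer(p) else "C"
        # Mimics the look of the program output; the format is copied
        # from the thread, not taken from the program's source.
        print(f"M( {p} ){verdict}")
```

Plain big-integer squaring like this is only practical for small exponents. Programs such as MacLucasFFTW replace the squaring with an FFT-based multiplication, which is where the `n = 1024` FFT length in the output comes from.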
#15 |
"Oliver"
Mar 2005
Germany
2×557 Posts |
Hi msft,
a 32-bit binary works fine here on openSUSE 11.1 x86_64 with CUDA 2.3.
Factory overclocked GTX 275:
Core: 648MHz (Nvidia reference is 633MHz)
Shader: 1458MHz (Nvidia reference is 1404MHz)
Memory: 1188MHz (Nvidia reference is 1134MHz)
Code:
time ./MacLucasFFTW 11213
M( 11213 )P, n = 1024, MacLucasFFTW v8.1 Ballester
real 0m1.639s
user 0m1.072s
sys 0m0.564s
Code:
time ./MacLucasFFTW 216091
M( 216091 )P, n = 16384, MacLucasFFTW v8.1 Ballester
real 1m32.206s
user 1m8.348s
sys 0m23.861s
Last fiddled with by TheJudger on 2009-10-26 at 19:56 |
#16 |
Jul 2009
Tokyo
2·5·61 Posts |
Thank you, TheJudger.
My GIGABYTE GV-N26OC-896I runs at:
Core: 650MHz
Shader: 1400MHz
Memory: 1000MHz
Your GTX 275 being about 10% faster is a reasonable result. |
#17 |
Jul 2009
Tokyo
2×5×61 Posts |
Hi,
New results on a GTX 260.
$ time ./MacLucasFFTW 216091
M( 216091 )P, n = 16384, MacLucasFFTW v8.1 Ballester
real 1m1.812s
user 0m29.146s
sys 0m32.666s
$ time ./MacLucasFFTW 859433
M( 859433 )P, n = 65536, MacLucasFFTW v8.1 Ballester
real 11m7.542s
user 4m28.421s
sys 6m39.149s
$ time ./MacLucasFFTW 2976221
M( 2976221 )P, n = 262144, MacLucasFFTW v8.1 Ballester
real 140m5.919s
user 52m42.382s
sys 87m24.016s
$ time ./MacLucasFFTW 33333333 10001 2097152
real 3m36.574s
user 1m19.181s
sys 2m17.393s
2048k fft sec/iter = 0.022
$ time ./MacLucasFFTW 63333333 10001 4194304
real 7m15.839s
user 2m38.638s
sys 4m37.229s
4096k fft sec/iter = 0.044
Thank you, |
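The sec/iter figures quoted for the 2048k and 4096k FFT runs can be reproduced from the `real` times, assuming each of those two runs covered 10,000 iterations (an assumption read off the "10001" in the transcript, not something the post states). A small sketch of that arithmetic, with hypothetical helper names `parse_time` and `sec_per_iter`:

```python
import re


def parse_time(text: str) -> float:
    """Convert a `time` field such as '3m36.574s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised time format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def sec_per_iter(real_time: str, iterations: int) -> float:
    """Wall-clock seconds per iteration for one timed run."""
    return parse_time(real_time) / iterations


# Assumption: both timed runs performed 10,000 iterations.
ITERATIONS = 10_000

if __name__ == "__main__":
    for label, real in (("2048k", "3m36.574s"), ("4096k", "7m15.839s")):
        rate = sec_per_iter(real, ITERATIONS)
        print(f"{label} fft sec/iter = {rate:.3f}")
```

Under that assumption the two rates round to the 0.022 and 0.044 reported in the post. Because they are based on `real` time, they include any start-up overhead of the run.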
#18 |
"Oliver"
Mar 2005
Germany
2×557 Posts |
Hi msft,
So this is getting interesting now. :) It is faster than a single core of a current CPU (but less energy efficient, I guess). You have doubled the speed every few days; how long can you keep up the pace? Did you verify that the results are OK? E.g. run some 1000 iterations of various exponents (near FFT limits) and compare them with a "known good" application on the CPU? |
#19 |
Jul 2009
Tokyo
2×5×61 Posts |
Good suggestion, TheJudger.
I will try it. Thank you, |
#20 |
Jul 2009
Tokyo
2×5×61 Posts |
Hi,
Tuning has stopped for now; result verification has started.
MacLucasFFTW.cuda.i.tar.gz: stable version (the basis for the verification results)
Check example:
Mlucas:
M18760031 iteration = 10000 clocks = 00:01:54.879. Res64: 6F3B0CA04650A35D
M18760069 iteration = 10000 clocks = 00:01:55.260. Res64: 7C5D1E38F5880285
M18760153 iteration = 10000 clocks = 00:01:55.400. Res64: 72E6FA57F10F1AA0
M18760309 iteration = 10000 clocks = 00:01:56.569. Res64: 7D5F0967A6C0CDF7
M18760349 iteration = 10000 clocks = 00:01:55.290. Res64: F5EFF9C512E8E0C6
M18760381 iteration = 10000 clocks = 00:01:55.670. Res64: 2CD35948AD199A4F
M18760393 iteration = 10000 clocks = 00:01:56.250. Res64: CF9E63DE37304EA6
M18760451 iteration = 10000 clocks = 00:01:55.400. Res64: 0BE49436D31B79E9
M18760519 iteration = 10000 clocks = 00:01:55.519. Res64: E708514A5EB70096
M18760529 iteration = 10000 clocks = 00:01:55.709. Res64: 2ED23475BBD672F7
M18760531 iteration = 10000 clocks = 00:01:56.120. Res64: 2CE83E69F963DC84
M18760561 iteration = 10000 clocks = 00:01:55.049. Res64: 02F4AD7139B1B8C4
M18760589 iteration = 10000 clocks = 00:01:55.939. Res64: 64B9DBC08126BE60
M18760667 iteration = 10000 clocks = 00:01:55.879. Res64: 8B0F66B09BF10933
M18760681 iteration = 10000 clocks = 00:01:57.200. Res64: 917A77D412CE5B0B
M18760699 iteration = 10000 clocks = 00:01:57.260. Res64: 2E0DD3E2EAE80054
M18760727 iteration = 10000 clocks = 00:01:57.310. Res64: 09C34222F1C37AED
M18760747 iteration = 10000 clocks = 00:01:57.090. Res64: F96FF6B17E1E8367
M18760789 iteration = 10000 clocks = 00:01:57.430. Res64: 225E4973690D03B6
M18760793 iteration = 10000 clocks = 00:01:57.239. Res64: 18359C85D903175C
M18760799 iteration = 10000 clocks = 00:01:57.409. Res64: 9C62E310CBBE3EC8
M18760829 iteration = 10000 clocks = 00:01:58.010. Res64: 68432B797C6759F5
M18760837 iteration = 10000 clocks = 00:01:57.469. Res64: B3C6FC522CD49416
M18760909 iteration = 10000 clocks = 00:01:57.879. Res64: 6DD7EE83DCE2886F
M18761003 iteration = 10000 clocks = 00:01:57.620. Res64: 55ACE9289C2610C8
M18761009 iteration = 10000 clocks = 00:01:57.480. Res64: F5AA08E6FAA03F9C
M18761021 iteration = 10000 clocks = 00:01:57.459. Res64: 87719DD801DBCE61
M18761087 iteration = 10000 clocks = 00:01:57.230. Res64: FB00679524AB92AF
M18761161 iteration = 10000 clocks = 00:01:57.310. Res64: 6F2C976DFED11A5F
M18761179 iteration = 10000 clocks = 00:01:57.709. Res64: AE9CAAC2751B55A0
MacLucasFFTW:
M( 18760031 )C, 0x6f3b0ca04650a35d, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 18760069 )C, 0x7c5d1e38f5880285, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760153 )C, 0x72e6fa57f10f1aa0, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 18760309 )C, 0x7d5f0967a6c0cdf7, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 18760349 )C, 0xf5eff9c512e8e0c6, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760381 )C, 0x2cd35948ad199a4f, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760393 )C, 0xcf9e63de37304ea6, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760451 )C, 0x0be49436d31b79e9, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 18760519 )C, 0xe708514a5eb70096, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760529 )C, 0x2ed23475bbd672f7, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 18760531 )C, 0x2ce83e69f963dc84, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760561 )C, 0x02f4ad7139b1b8c4, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760589 )C, 0x64b9dbc08126be60, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760667 )C, 0x8b0f66b09bf10933, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760681 )C, 0x917a77d412ce5b0b, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 18760699 )C, 0x2e0dd3e2eae80054, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760727 )C, 0x09c34222f1c37aed, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760747 )C, 0xf96ff6b17e1e8367, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760789 )C, 0x225e4973690d03b6, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 18760793 )C, 0x18359c85d903175c, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760799 )C, 0x9c62e310cbbe3ec8, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760829 )C, 0x68432b797c6759f5, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760837 )C, 0xb3c6fc522cd49416, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760909 )C, 0x6dd7ee83dce2886f, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 18761003 )C, 0x55ace9289c2610c8, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18761009 )C, 0xf5aa08e6faa03f9c, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 18761021 )C, 0x87719dd801dbce61, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18761087 )C, 0xfb00679524ab92af, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18761161 )C, 0x6f2c976dfed11a5f, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18761179 )C, 0xae9caac2751b55a0, n = 1048576, MacLucasFFTW v8.1 Ballester
Thank you, |
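Comparing sixty residues by eye is error-prone, so a check like this is easier to automate. The sketch below pairs up the two output formats by exponent and compares the 64-bit residues case-insensitively. The regular expressions are inferred from the lines quoted in this post rather than from either program's documentation, and the names `residues` and `mismatches` are hypothetical.

```python
import re

# Patterns inferred from the output quoted in the thread (an assumption).
MLUCAS_RE = re.compile(
    r"M(\d+) iteration = \d+ clocks = [\d:.]+\s+Res64: ([0-9A-Fa-f]{16})"
)
MACLUCAS_RE = re.compile(r"M\( (\d+) \)[PC], 0x([0-9A-Fa-f]{16}), n = \d+")

# Three sample lines from each program, copied from the post.
MLUCAS_SAMPLE = """\
M18760031 iteration = 10000 clocks = 00:01:54.879. Res64: 6F3B0CA04650A35D
M18760069 iteration = 10000 clocks = 00:01:55.260. Res64: 7C5D1E38F5880285
M18760153 iteration = 10000 clocks = 00:01:55.400. Res64: 72E6FA57F10F1AA0
"""
MACLUCAS_SAMPLE = """\
M( 18760031 )C, 0x6f3b0ca04650a35d, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 18760069 )C, 0x7c5d1e38f5880285, n = 1048576, MacLucasFFTW v8.1 Ballester
M( 18760153 )C, 0x72e6fa57f10f1aa0, n = 2097152, MacLucasFFTW v8.1 Ballester
"""


def residues(text: str, pattern: re.Pattern) -> dict:
    """Map exponent -> lower-case 64-bit residue for every matching line."""
    return {int(exp): res.lower() for exp, res in pattern.findall(text)}


def mismatches(mlucas_text: str, maclucas_text: str) -> list:
    """Exponents whose residues differ or appear in only one of the logs."""
    ref = residues(mlucas_text, MLUCAS_RE)
    got = residues(maclucas_text, MACLUCAS_RE)
    return sorted(e for e in ref.keys() | got.keys() if ref.get(e) != got.get(e))


if __name__ == "__main__":
    bad = mismatches(MLUCAS_SAMPLE, MACLUCAS_SAMPLE)
    print("all residues match" if not bad else f"mismatched exponents: {bad}")
```

Treating an exponent that is missing from one log as a mismatch is a deliberate choice here, so that a truncated output file cannot pass silently.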
#21 |
Jul 2009
Tokyo
1001100010₂ Posts |
Hi,
Some verification results.
$ tar -ztvf file.tar.gz
MacLucasFFTW.18760031-18767081.txt: exponents 18760031 to 18767081, iterations = 10000
MacLucasFFTW.9400009-9456961.txt: exponents 9400009 to 9456961, iterations = 2000
Mlucas.18760031-18767081.txt: exponents 18760031 to 18767081, iterations = 10000
Mlucas.9400009-9456961.txt: exponents 9400009 to 9456961, iterations = 2000
All results are correct.
Thank you, |
#22 |
"GIMFS"
Sep 2002
Oeiras, Portugal
1,571 Posts |
Great work, msft.
You might as well try to run a complete double check (a 22M exponent) to check the reliability of the hardware running your code. The current FFT length is 1280K, so it shouldn't take too long; your code is running fast. |