mersenneforum.org  

Old 2013-06-01, 14:43   #1926
owftheevil
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

3²·5·7 Posts

Or just multiply the FFT length by 15 or so and pick a prime close to that.

But getting it distributed so that you can get results soon would be a problem.
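
A minimal Python sketch of that idea, assuming "fft" means the FFT length in elements (for example 3072K = 3072 × 1024) and that 15 is a rough bits-per-element figure. The function names are hypothetical and not part of CUDALucas.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for exponents below about 10^8."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def prime_near(target: int) -> int:
    """Return the prime closest to target (ties go to the smaller one)."""
    offset = 0
    while True:
        for candidate in (target - offset, target + offset):
            if is_prime(candidate):
                return candidate
        offset += 1


def benchmark_exponent(fft_k: int, bits_per_element: int = 15) -> int:
    """Pick a prime exponent near (FFT length in elements) * bits_per_element."""
    return prime_near(fft_k * 1024 * bits_per_element)


if __name__ == "__main__":
    print(benchmark_exponent(3072))
```

Trial division is slow in general but fine here, since each candidate needs only a few thousand divisions.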

Old 2013-06-01, 15:04   #1927
owftheevil
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

3²·5·7 Posts

Another thought: actual exponents might not be necessary for just a timing benchmark.

Old 2013-06-01, 19:52   #1928
owftheevil
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

473₈ Posts

Here's what it looks like so far. I've done a few trial runs with exponents near those specified in the table. Maximum errors are coming in around .25, and with all the error checking and checkpoint processing, iteration times are .1%-.7% higher in the actual test than in the benchmark results.

Code:
filbert@filbert:~/Build/CudaLucas-2.052$ ./CUDALucas -cufftbench 3000 4000 5 -d 1

------- DEVICE 1 -------
name                GeForce GTX 560 Ti
Compatibility       2.1
clockRate (MHz)     1900
memClockRate (MHz)  2080
totalGlobalMem      1073414144
totalConstMem       65536
l2CacheSize         524288
sharedMemPerBlock   49152
regsPerBlock        32768
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsPerMP     1536
multiProcessorCount 8
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
textureAlignment    512
deviceOverlap       1

CUFFT bench testing fft sizes 3000K to 4000K, doing 5 passes.
Pass 1, fft size = 3000K, exp up to 55296000, ave time = 9.306 msec, max-ave = 0.00000
Pass 1, fft size = 3024K, exp up to 55738368, ave time = 8.174 msec, max-ave = 0.00000

.
.
.

Pass 5, fft size = 3000K, exp up to 55296000, ave time = 9.307 msec, max-ave = 0.00132
Pass 5, fft size = 3024K, exp up to 55738368, ave time = 8.173 msec, max-ave = 0.00080
Pass 5, fft size = 3072K, exp up to 56623104, ave time = 8.443 msec, max-ave = 0.00111
Pass 5, fft size = 3087K, exp up to 56899584, ave time = 9.114 msec, max-ave = 0.00192
Pass 5, fft size = 3125K, exp up to 57600000, ave time = 10.053 msec, max-ave = 0.00158
Pass 5, fft size = 3136K, exp up to 57802752, ave time = 8.120 msec, max-ave = 0.00069
Pass 5, fft size = 3150K, exp up to 58060800, ave time = 9.222 msec, max-ave = 0.00037
Pass 5, fft size = 3200K, exp up to 58982400, ave time = 8.691 msec, max-ave = 0.00017
Pass 5, fft size = 3240K, exp up to 59719680, ave time = 9.396 msec, max-ave = 0.00124
Pass 5, fft size = 3360K, exp up to 61931520, ave time = 9.655 msec, max-ave = 0.00039
Pass 5, fft size = 3375K, exp up to 62208000, ave time = 10.542 msec, max-ave = 0.00272
Pass 5, fft size = 3402K, exp up to 62705664, ave time = 9.492 msec, max-ave = 0.00043
Pass 5, fft size = 3430K, exp up to 63221760, ave time = 10.350 msec, max-ave = 0.00118
Pass 5, fft size = 3456K, exp up to 63700992, ave time = 9.091 msec, max-ave = 0.00074
Pass 5, fft size = 3500K, exp up to 64512000, ave time = 10.426 msec, max-ave = 0.00123
Pass 5, fft size = 3528K, exp up to 65028096, ave time = 9.722 msec, max-ave = 0.00029
Pass 5, fft size = 3584K, exp up to 66060288, ave time = 9.315 msec, max-ave = 0.00096
Pass 5, fft size = 3600K, exp up to 66355200, ave time = 10.074 msec, max-ave = 0.00073
Pass 5, fft size = 3645K, exp up to 67184640, ave time = 11.112 msec, max-ave = 0.00154
Pass 5, fft size = 3675K, exp up to 67737600, ave time = 11.758 msec, max-ave = 0.00391
Pass 5, fft size = 3750K, exp up to 69120000, ave time = 12.185 msec, max-ave = 0.00084
Pass 5, fft size = 3780K, exp up to 69672960, ave time = 11.054 msec, max-ave = 0.00065
Pass 5, fft size = 3840K, exp up to 70778880, ave time = 11.030 msec, max-ave = 0.00142
Pass 5, fft size = 3888K, exp up to 71663616, ave time = 10.535 msec, max-ave = 0.00155
Pass 5, fft size = 3920K, exp up to 72253440, ave time = 10.931 msec, max-ave = 0.00047
Pass 5, fft size = 3969K, exp up to 73156608, ave time = 11.438 msec, max-ave = 0.00165
Pass 5, fft size = 4000K, exp up to 73728000, ave time = 11.430 msec, max-ave = 0.00124
filbert@filbert:~/Build/CudaLucas-2.052$
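
One way to read the table, sketched in Python. In the output above, the "exp up to" column equals FFT size × 1024 × 18 on every line shown, and a length is only worth using if no larger length is faster. The helper names are hypothetical, and the "exp up to" value is treated as an upper bound only; the real usable limit depends on the errors seen in an actual test.

```python
import re

# A few "Pass 5" lines copied from the output above.
SAMPLE = """\
Pass 5, fft size = 3000K, exp up to 55296000, ave time = 9.307 msec, max-ave = 0.00132
Pass 5, fft size = 3024K, exp up to 55738368, ave time = 8.173 msec, max-ave = 0.00080
Pass 5, fft size = 3072K, exp up to 56623104, ave time = 8.443 msec, max-ave = 0.00111
Pass 5, fft size = 3087K, exp up to 56899584, ave time = 9.114 msec, max-ave = 0.00192
Pass 5, fft size = 3125K, exp up to 57600000, ave time = 10.053 msec, max-ave = 0.00158
Pass 5, fft size = 3136K, exp up to 57802752, ave time = 8.120 msec, max-ave = 0.00069
"""

LINE = re.compile(
    r"Pass (\d+), fft size = (\d+)K, exp up to (\d+), ave time = ([\d.]+) msec"
)


def parse(text, wanted_pass):
    """Return (fft_k, max_exp, msec) tuples for one pass, sorted by FFT size."""
    rows = []
    for m in LINE.finditer(text):
        if int(m.group(1)) == wanted_pass:
            rows.append((int(m.group(2)), int(m.group(3)), float(m.group(4))))
    return sorted(rows)


def worth_keeping(rows):
    """Keep a size only if every larger size is slower."""
    kept = []
    best = float("inf")
    for fft_k, max_exp, msec in reversed(rows):  # walk from the largest size down
        if msec < best:
            kept.append((fft_k, max_exp, msec))
            best = msec
    return kept[::-1]


if __name__ == "__main__":
    rows = parse(SAMPLE, 5)
    # Check the observed relation between FFT size and the exponent limit.
    assert all(max_exp == fft_k * 1024 * 18 for fft_k, max_exp, _ in rows)
    for fft_k, max_exp, msec in worth_keeping(rows):
        print(f"{fft_k}K  up to {max_exp}  {msec} msec")
```

On this excerpt only 3136K survives, since it is faster than every smaller length listed.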

Last fiddled with by owftheevil on 2013-06-01 at 20:20

Old 2013-06-01, 21:22   #1929
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario

2³·149 Posts

Quote:
Originally Posted by owftheevil View Post
Here's what it looks like so far.
iteration times are .1%-.7% higher in the actual test than in the benchmark results.
That's beautiful!
1% deviation from actual isn't a problem for my purposes, where I'm aggregating results from many people on many GPUs for approximate average performance for a given architecture.

Now I'll just have to wait until this new benchmark mode is widely available (meaning, in many cases, Windows binaries).

Old 2013-06-05, 18:53   #1930
TheJudger
"Oliver"
Mar 2005
Germany

457₁₆ Posts

Hi Carl,

Quote:
Originally Posted by owftheevil View Post
Here's version 0.13 with Oliver's requested fflush(NULL) statements.

Oliver, thanks for showing me that.
Two more wishes:
  1. add "clean"-target to the Makefile
  2. add "--generate-code arch=compute_30,code=sm_30" to the CUFLAGS in the Makefile

Your memtest refuses to run on GTX 690 (GK104? GK10x?) when compiled with the default CUFLAGS (Linux, nvcc V0.2.1221 (CUDA 5.0), driver 319.23):
Code:
./memtest 10 10 0

Initializing test using 250MiB of memory on device 0

memtest.cu(187) : cufftSafeCall() CUFFT error 6: CUFFT_EXEC_FAILED
It doesn't matter how much memory, how many iterations, or which device I choose.
When I add compute_30,sm_30 to the Makefile and recompile, it runs fine here (GTX 690, TITAN, Tesla K20).
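
A small Python sketch of wish 2, assuming the Makefile takes its architecture flags as a plain string. `gencode_flags` is a hypothetical helper, not part of memtest or nvcc. The 3.5 entry is an assumption added for illustration (the compute capability of the TITAN and K20); the request above only asks for 3.0.

```python
def gencode_flags(capabilities):
    """Build one nvcc --generate-code option per compute capability, e.g. "30"."""
    return " ".join(
        f"--generate-code arch=compute_{cc},code=sm_{cc}" for cc in capabilities
    )


if __name__ == "__main__":
    # Text that could be appended to CUFLAGS in the Makefile.
    print(gencode_flags(["30", "35"]))
```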

As you can see I'm really using your tool!

Oliver

P.S. I remember this one: http://www.mersenneforum.org/showpos...5&postcount=8: should Carl's memtest get its own thread?

Last fiddled with by TheJudger on 2013-06-05 at 18:55

Old 2013-06-11, 18:11   #1931
msft
Jul 2009
Tokyo

262₁₆ Posts

Hi,
I made a Radeon FFT benchmark.
HD7750:
Quote:
$ sh -x ./run.sh
+ rm *.o a.out
+ g++ -c main.cpp -I /opt/AMDAPP/include/ -I /opt/clAmdFft-1.8.291/include/
+ g++ -c clFFTPlans.cpp -I /opt/AMDAPP/include/ -I /opt/clAmdFft-1.8.291/include/
+ g++ main.o clFFTPlans.o /opt/clAmdFft-1.8.291/lib64/libclAmdFft.Runtime.so -lOpenCL -lfftw3
+ export LD_LIBRARY_PATH=:/opt/clAmdFft-1.8.291/lib64/
+ time ./a.out
Using device: Capeverde
AmdFFT_Z2Z size= 524288 time= 2.780000 msec
Everything went fine!
31.54user 249.49system 4:41.49elapsed 99%CPU (0avgtext+0avgdata 1334368maxresident)k
0inputs+45368outputs (0major+151072minor)pagefaults 0swaps
$
Attached Files
File Type: bz2 0.77.tar.bz2 (3.6 KB, 73 views)
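
To put that time on a common scale, here is a rough Python calculation using the usual 5·N·log2(N) flop-count convention for a complex FFT. It assumes the reported 2.78 msec is for a single double-precision complex transform of 524288 points; the Z2Z label suggests that, but the output does not say so. `nominal_gflops` is a hypothetical helper name.

```python
import math


def nominal_gflops(n: int, msec: float) -> float:
    """Nominal complex-FFT rate using the 5*N*log2(N) flop-count convention."""
    flops = 5.0 * n * math.log2(n)
    return flops / (msec * 1e-3) / 1e9


if __name__ == "__main__":
    # 524288 = 2^19 points in 2.78 msec, as reported above.
    print(f"{nominal_gflops(524288, 2.78):.1f} nominal GFLOPS")
```

That works out to about 17.9 nominal GFLOPS. The figure is a convention for comparing FFT timings, not a count of the operations the card actually performed.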

Old 2013-06-11, 18:39   #1932
kracker
"Mr. Meeseeks"
Jan 2012
California, USA

2³×271 Posts

Quote:
Originally Posted by msft View Post
Hi,
I made a Radeon FFT benchmark.
HD7750:
Wow. At last someone...
If you need testers for anything, I have two 7770s.

Old 2013-06-11, 19:33   #1933
owftheevil
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

3²×5×7 Posts

Hi Oliver,

Quote:
Two more wishes:
  1. add "clean"-target to the Makefile
  2. add "--generate-code arch=compute_30,code=sm_30" to the CUFLAGS in the Makefile
Noted and added to the list. Thanks as always.

Carl

Old 2013-06-11, 19:36   #1934
owftheevil
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

473₈ Posts

Hi msft,

At first glance that time seems a bit slow, but then I have no idea how strong or weak a card the 7750 is. What would a comparable Nvidia card be?
In any event, those are interesting results. Do you plan on making an LL test out of this?

Carl

Last fiddled with by owftheevil on 2013-06-11 at 19:44

Old 2013-06-11, 19:36   #1935
kracker
"Mr. Meeseeks"
Jan 2012
California, USA

4170₈ Posts

Quote:
Originally Posted by owftheevil View Post
Hi msft,

At first glance that time seems a bit slow, but then I have no idea how strong or weak a card the 7750 is. What would a comparable Nvidia card be?

Carl
A 650. Or a 550. Not the Ti.

EDIT: Sorry, wasn't my question to answer, just realized that.

Last fiddled with by kracker on 2013-06-11 at 19:37

Old 2013-06-11, 19:45   #1936
owftheevil
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

3²×5×7 Posts

Hi Kracker,

No problem, the question was for anyone with information. Thanks. The time seems much more reasonable now.

Last fiddled with by owftheevil on 2013-06-11 at 19:50

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.