mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

owftheevil 2013-06-01 14:43

Or just multiply the FFT length by 15 or so and pick a prime close to that.

But getting it distributed so that you can get results soon would be a problem.

owftheevil 2013-06-01 15:04

Another thought: actual exponents might not be necessary for just a timing benchmark.

owftheevil 2013-06-01 19:52

Here's what it looks like so far. I've done a few trial runs with exponents near those specified in the table. Maximum errors are coming in around .25 and with all the error checking and checkpoint processing, iteration times are .1%-.7% higher in the actual test than in the benchmark results.

[CODE]filbert@filbert:~/Build/CudaLucas-2.052$ ./CUDALucas -cufftbench 3000 4000 5 -d 1

------- DEVICE 1 -------
name GeForce GTX 560 Ti
Compatibility 2.1
clockRate (MHz) 1900
memClockRate (MHz) 2080
totalGlobalMem 1073414144
totalConstMem 65536
l2CacheSize 524288
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsPerMP 1536
multiProcessorCount 8
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
textureAlignment 512
deviceOverlap 1

CUFFT bench testing fft sizes 3000K to 4000K, doing 5 passes.
Pass 1, fft size = 3000K, exp up to 55296000, ave time = 9.306 msec, max-ave = 0.00000
Pass 1, fft size = 3024K, exp up to 55738368, ave time = 8.174 msec, max-ave = 0.00000

.
.
.

Pass 5, fft size = 3000K, exp up to 55296000, ave time = 9.307 msec, max-ave = 0.00132
Pass 5, fft size = 3024K, exp up to 55738368, ave time = 8.173 msec, max-ave = 0.00080
Pass 5, fft size = 3072K, exp up to 56623104, ave time = 8.443 msec, max-ave = 0.00111
Pass 5, fft size = 3087K, exp up to 56899584, ave time = 9.114 msec, max-ave = 0.00192
Pass 5, fft size = 3125K, exp up to 57600000, ave time = 10.053 msec, max-ave = 0.00158
Pass 5, fft size = 3136K, exp up to 57802752, ave time = 8.120 msec, max-ave = 0.00069
Pass 5, fft size = 3150K, exp up to 58060800, ave time = 9.222 msec, max-ave = 0.00037
Pass 5, fft size = 3200K, exp up to 58982400, ave time = 8.691 msec, max-ave = 0.00017
Pass 5, fft size = 3240K, exp up to 59719680, ave time = 9.396 msec, max-ave = 0.00124
Pass 5, fft size = 3360K, exp up to 61931520, ave time = 9.655 msec, max-ave = 0.00039
Pass 5, fft size = 3375K, exp up to 62208000, ave time = 10.542 msec, max-ave = 0.00272
Pass 5, fft size = 3402K, exp up to 62705664, ave time = 9.492 msec, max-ave = 0.00043
Pass 5, fft size = 3430K, exp up to 63221760, ave time = 10.350 msec, max-ave = 0.00118
Pass 5, fft size = 3456K, exp up to 63700992, ave time = 9.091 msec, max-ave = 0.00074
Pass 5, fft size = 3500K, exp up to 64512000, ave time = 10.426 msec, max-ave = 0.00123
Pass 5, fft size = 3528K, exp up to 65028096, ave time = 9.722 msec, max-ave = 0.00029
Pass 5, fft size = 3584K, exp up to 66060288, ave time = 9.315 msec, max-ave = 0.00096
Pass 5, fft size = 3600K, exp up to 66355200, ave time = 10.074 msec, max-ave = 0.00073
Pass 5, fft size = 3645K, exp up to 67184640, ave time = 11.112 msec, max-ave = 0.00154
Pass 5, fft size = 3675K, exp up to 67737600, ave time = 11.758 msec, max-ave = 0.00391
Pass 5, fft size = 3750K, exp up to 69120000, ave time = 12.185 msec, max-ave = 0.00084
Pass 5, fft size = 3780K, exp up to 69672960, ave time = 11.054 msec, max-ave = 0.00065
Pass 5, fft size = 3840K, exp up to 70778880, ave time = 11.030 msec, max-ave = 0.00142
Pass 5, fft size = 3888K, exp up to 71663616, ave time = 10.535 msec, max-ave = 0.00155
Pass 5, fft size = 3920K, exp up to 72253440, ave time = 10.931 msec, max-ave = 0.00047
Pass 5, fft size = 3969K, exp up to 73156608, ave time = 11.438 msec, max-ave = 0.00165
Pass 5, fft size = 4000K, exp up to 73728000, ave time = 11.430 msec, max-ave = 0.00124
filbert@filbert:~/Build/CudaLucas-2.052$ [/CODE]
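One way to read the table above, as a rough sketch only: in this output every "exp up to" value equals 18 times the FFT length in words (with K = 1024), and an FFT size is arguably only worth using if no larger size is faster. Both readings are inferred from the numbers shown here, not taken from CUDALucas documentation, and the names `SAMPLE`, `parse` and `worth_keeping` are made up for this illustration.

```python
import re

# The first eight "Pass 5" lines copied from the benchmark output above.
SAMPLE = """\
Pass 5, fft size = 3000K, exp up to 55296000, ave time = 9.307 msec, max-ave = 0.00132
Pass 5, fft size = 3024K, exp up to 55738368, ave time = 8.173 msec, max-ave = 0.00080
Pass 5, fft size = 3072K, exp up to 56623104, ave time = 8.443 msec, max-ave = 0.00111
Pass 5, fft size = 3087K, exp up to 56899584, ave time = 9.114 msec, max-ave = 0.00192
Pass 5, fft size = 3125K, exp up to 57600000, ave time = 10.053 msec, max-ave = 0.00158
Pass 5, fft size = 3136K, exp up to 57802752, ave time = 8.120 msec, max-ave = 0.00069
Pass 5, fft size = 3150K, exp up to 58060800, ave time = 9.222 msec, max-ave = 0.00037
Pass 5, fft size = 3200K, exp up to 58982400, ave time = 8.691 msec, max-ave = 0.00017
"""

LINE = re.compile(
    r"Pass (\d+), fft size = (\d+)K, exp up to (\d+), ave time = ([\d.]+) msec"
)


def parse(text):
    """Return (fft_size_in_K, max_exponent, msec) tuples from benchmark lines."""
    rows = []
    for m in LINE.finditer(text):
        rows.append((int(m.group(2)), int(m.group(3)), float(m.group(4))))
    return rows


def worth_keeping(rows):
    """Keep only the sizes that are faster than every larger size in the list."""
    kept = []
    best_above = float("inf")
    # Walk from the largest size down, tracking the fastest time seen so far.
    for size_k, max_exp, msec in sorted(rows, reverse=True):
        if msec < best_above:
            kept.append((size_k, max_exp, msec))
            best_above = msec
    return sorted(kept)


rows = parse(SAMPLE)
# Observed relation in this output: limit = 18 * (FFT length in words).
assert all(max_exp == 18 * size_k * 1024 for size_k, max_exp, _ in rows)
for size_k, max_exp, msec in worth_keeping(rows):
    print(f"{size_k}K  up to {max_exp}  {msec:.3f} msec")
```

On these eight lines only 3136K and 3200K survive the filter, since 3136K is faster than every smaller size listed.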

James Heinrich 2013-06-01 21:22

[QUOTE=owftheevil;342267]Here's what it looks like so far.
iteration times are .1%-.7% higher in the actual test than in the benchmark results.[/QUOTE]
That's beautiful!
A 1% deviation from actual isn't a problem for my purposes, where I'm aggregating results from many people on many GPUs for approximate average performance for a given architecture.

Now I'll just have to wait until this new benchmark mode is widely available (meaning, in many cases, Windows binaries).
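To put the benchmark numbers in perspective, here is a back-of-the-envelope sketch of how an iteration time turns into a full test time. It relies on the fact that a Lucas-Lehmer test of 2^p - 1 takes p - 2 squaring iterations; the exponent below is a hypothetical round number just under the 3136K limit from the table (it is not a real assignment and was not checked for primality), and `ll_test_days` is a name invented for this example.

```python
def ll_test_days(exponent, msec_per_iter, overhead=0.0):
    """Estimate wall-clock days for a Lucas-Lehmer test of 2**exponent - 1.

    The test performs exponent - 2 squaring iterations; `overhead` is the
    fractional slowdown of the real run relative to the benchmark time.
    """
    iterations = exponent - 2
    seconds = iterations * msec_per_iter * (1.0 + overhead) / 1000.0
    return seconds / 86400.0


# Hypothetical exponent below the 3136K limit of 57802752;
# 8.120 msec is the Pass 5 time reported for 3136K on the GTX 560 Ti.
exponent = 57_800_000
low = ll_test_days(exponent, 8.120, overhead=0.001)   # +0.1%
high = ll_test_days(exponent, 8.120, overhead=0.007)  # +0.7%
print(f"{low:.2f} to {high:.2f} days")
```

The 0.1%-0.7% spread moves the estimate by well under an hour over a run of several days, which fits the point that the deviation does not matter for aggregate comparisons.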

TheJudger 2013-06-05 18:53

Hi Carl,

[QUOTE=owftheevil;340301]Here's version 0.13 with Oliver's requested fflush(NULL) statements.

Oliver, thanks for showing me that.[/QUOTE]

Two more wishes:[LIST=1][*]add a "clean" target to the Makefile[*]add "--generate-code arch=compute_30,code=sm_30" to the CUFLAGS in the Makefile[/LIST]
Your memtest refuses to run on GTX 690 (GK104? GK10x?) when compiled with the default CUFLAGS (Linux, nvcc V0.2.1221 (CUDA 5.0), driver 319.23):
[CODE]./memtest 10 10 0

Initializing test using 250MiB of memory on device 0

memtest.cu(187) : cufftSafeCall() CUFFT error 6: CUFFT_EXEC_FAILED[/CODE]
It doesn't matter how much memory, how many iterations, or which device I choose.
When I add compute_30,sm_30 to the Makefile and recompile, it runs fine here (GTX 690, TITAN, Tesla K20).

As you can see I'm really using your tool!

Oliver

P.S. I'm remembering this one: [url]http://www.mersenneforum.org/showpost.php?p=197975&postcount=8[/url]: should Carl's memtest get its own thread?
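For anyone who wants to try the second wish before the Makefile is updated, the option asks nvcc to emit code for compute capability 3.0 devices such as the GTX 690. The small sketch below only assembles the option string quoted in the post; `gencode_flags` is a hypothetical helper, and the variable name CUFLAGS is taken from the post rather than from the actual Makefile, which is not shown in this thread.

```python
def gencode_flags(architectures):
    """Build nvcc --generate-code options, one per architecture number."""
    return " ".join(
        f"--generate-code arch=compute_{arch},code=sm_{arch}"
        for arch in architectures
    )


# "30" is the value from the post; any further entries would be assumptions
# about which GPUs the build should support.
print("CUFLAGS +=", gencode_flags(["30"]))
```

The printed line can be pasted into the Makefile next to the existing CUFLAGS definition.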

msft 2013-06-11 18:11

1 Attachment(s)
Hi,
I made a Radeon FFT benchmark.
HD7750:
[QUOTE]
$ sh -x ./run.sh
+ rm *.o a.out
+ g++ -c main.cpp -I /opt/AMDAPP/include/ -I /opt/clAmdFft-1.8.291/include/
+ g++ -c clFFTPlans.cpp -I /opt/AMDAPP/include/ -I /opt/clAmdFft-1.8.291/include/
+ g++ main.o clFFTPlans.o /opt/clAmdFft-1.8.291/lib64/libclAmdFft.Runtime.so -lOpenCL -lfftw3
+ export LD_LIBRARY_PATH=:/opt/clAmdFft-1.8.291/lib64/
+ time ./a.out
Using device: Capeverde
AmdFFT_Z2Z size= 524288 time= 2.780000 msec
Everything went fine!
31.54user 249.49system 4:41.49elapsed 99%CPU (0avgtext+0avgdata 1334368maxresident)k
0inputs+45368outputs (0major+151072minor)pagefaults 0swaps
$
[/QUOTE]:smile:
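To make the 2.78 msec figure easier to compare across cards, it can be turned into a nominal throughput with the usual 5·N·log2(N) flop-count convention for complex FFTs. This is a sketch under assumptions: that Z2Z means a double-precision complex-to-complex transform, that 524288 is the number of complex points, and that the reported time is for a single transform. None of this is confirmed by the output above, and `fft_gflops` is a name made up for the example.

```python
import math


def fft_gflops(n_points, msec):
    """Nominal GFLOPS for a complex FFT using the 5*N*log2(N) flop convention."""
    flops = 5.0 * n_points * math.log2(n_points)
    return flops / (msec / 1000.0) / 1e9


# Size and time taken from the HD7750 output above.
rate = fft_gflops(524288, 2.78)
print(f"{rate:.1f} GFLOPS (nominal)")
```

The convention counts nominal operations rather than what the hardware actually executes, so the figure is only useful for comparing FFTs of the same kind and size.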

kracker 2013-06-11 18:39

[QUOTE=msft;343077]Hi,
I made a Radeon FFT benchmark.
HD7750:
:smile:[/QUOTE]

:shock: Wow. At last someone...
If you need testers for anything, I have two 7770s.

owftheevil 2013-06-11 19:33

Hi Oliver,

[QUOTE]Two more wishes:[LIST=1][*]add a "clean" target to the Makefile[*]add "--generate-code arch=compute_30,code=sm_30" to the CUFLAGS in the Makefile[/LIST][/QUOTE]

Noted and added to the list. Thanks as always.

Carl

owftheevil 2013-06-11 19:36

Hi msft,

At first glance that time seems a bit slow, but then I have no idea how strong or weak a card the 7750 is. What would a comparable Nvidia card be?
In any event, those are interesting results. Do you plan on making an LL test out of this?

Carl

kracker 2013-06-11 19:36

[QUOTE=owftheevil;343087]Hi msft,

At first glance that time seems a bit slow, but then I have no idea how strong or weak a card the 7750 is. What would a comparable Nvidia card be?

Carl[/QUOTE]

A 650 or a 550 (not the Ti).

EDIT: Sorry, wasn't my question to answer, just realized that.

owftheevil 2013-06-11 19:45

Hi Kracker,

No problem, the question was for anyone with information. Thanks. The time seems much more reasonable now.


All times are UTC.