Or just multiply fft by 15 or so and pick a prime close to that.
But to get it distributed so that you can get results soon would be a problem. |
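A minimal sketch of the "multiply and pick a nearby prime" idea, in Python. It assumes the FFT length is given in K (1024 words) and that the multiplier is the rough 15 suggested above. The function names are hypothetical and are not part of CUDALucas.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality check; fine for exponents of this size."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def prime_near(target: int) -> int:
    """Return the prime closest to target (on a tie, the smaller one)."""
    for offset in range(target):
        if is_prime(target - offset):
            return target - offset
        if is_prime(target + offset):
            return target + offset
    raise ValueError("no prime found near target")


def benchmark_exponent(fft_k: int, factor: float = 15.0) -> int:
    """Pick a prime exponent close to factor * FFT length.

    Assumption: fft_k is in units of K = 1024 words, and factor = 15
    is only the rough multiplier mentioned in the post.
    """
    return prime_near(int(fft_k * 1024 * factor))


if __name__ == "__main__":
    for fft_k in (3000, 3072, 4000):
        print(fft_k, benchmark_exponent(fft_k))
```

Trial division is enough here because the exponents stay well below 10^9; a real tool would probably take the exponent from a precomputed list instead.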
Another thought: actual exponents might not be necessary for just a timing benchmark.
|
Here's what it looks like so far. I've done a few trial runs with exponents near those specified in the table. Maximum errors are coming in around .25 and with all the error checking and checkpoint processing, iteration times are .1%-.7% higher in the actual test than in the benchmark results.
[CODE]filbert@filbert:~/Build/CudaLucas-2.052$ ./CUDALucas -cufftbench 3000 4000 5 -d 1

------- DEVICE 1 -------
name                GeForce GTX 560 Ti
Compatibility       2.1
clockRate (MHz)     1900
memClockRate (MHz)  2080
totalGlobalMem      1073414144
totalConstMem       65536
l2CacheSize         524288
sharedMemPerBlock   49152
regsPerBlock        32768
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsPerMP     1536
multiProcessorCount 8
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
textureAlignment    512
deviceOverlap       1

CUFFT bench testing fft sizes 3000K to 4000K, doing 5 passes.
Pass 1, fft size = 3000K, exp up to 55296000, ave time = 9.306 msec, max-ave = 0.00000
Pass 1, fft size = 3024K, exp up to 55738368, ave time = 8.174 msec, max-ave = 0.00000
.
.
.
Pass 5, fft size = 3000K, exp up to 55296000, ave time = 9.307 msec, max-ave = 0.00132
Pass 5, fft size = 3024K, exp up to 55738368, ave time = 8.173 msec, max-ave = 0.00080
Pass 5, fft size = 3072K, exp up to 56623104, ave time = 8.443 msec, max-ave = 0.00111
Pass 5, fft size = 3087K, exp up to 56899584, ave time = 9.114 msec, max-ave = 0.00192
Pass 5, fft size = 3125K, exp up to 57600000, ave time = 10.053 msec, max-ave = 0.00158
Pass 5, fft size = 3136K, exp up to 57802752, ave time = 8.120 msec, max-ave = 0.00069
Pass 5, fft size = 3150K, exp up to 58060800, ave time = 9.222 msec, max-ave = 0.00037
Pass 5, fft size = 3200K, exp up to 58982400, ave time = 8.691 msec, max-ave = 0.00017
Pass 5, fft size = 3240K, exp up to 59719680, ave time = 9.396 msec, max-ave = 0.00124
Pass 5, fft size = 3360K, exp up to 61931520, ave time = 9.655 msec, max-ave = 0.00039
Pass 5, fft size = 3375K, exp up to 62208000, ave time = 10.542 msec, max-ave = 0.00272
Pass 5, fft size = 3402K, exp up to 62705664, ave time = 9.492 msec, max-ave = 0.00043
Pass 5, fft size = 3430K, exp up to 63221760, ave time = 10.350 msec, max-ave = 0.00118
Pass 5, fft size = 3456K, exp up to 63700992, ave time = 9.091 msec, max-ave = 0.00074
Pass 5, fft size = 3500K, exp up to 64512000, ave time = 10.426 msec, max-ave = 0.00123
Pass 5, fft size = 3528K, exp up to 65028096, ave time = 9.722 msec, max-ave = 0.00029
Pass 5, fft size = 3584K, exp up to 66060288, ave time = 9.315 msec, max-ave = 0.00096
Pass 5, fft size = 3600K, exp up to 66355200, ave time = 10.074 msec, max-ave = 0.00073
Pass 5, fft size = 3645K, exp up to 67184640, ave time = 11.112 msec, max-ave = 0.00154
Pass 5, fft size = 3675K, exp up to 67737600, ave time = 11.758 msec, max-ave = 0.00391
Pass 5, fft size = 3750K, exp up to 69120000, ave time = 12.185 msec, max-ave = 0.00084
Pass 5, fft size = 3780K, exp up to 69672960, ave time = 11.054 msec, max-ave = 0.00065
Pass 5, fft size = 3840K, exp up to 70778880, ave time = 11.030 msec, max-ave = 0.00142
Pass 5, fft size = 3888K, exp up to 71663616, ave time = 10.535 msec, max-ave = 0.00155
Pass 5, fft size = 3920K, exp up to 72253440, ave time = 10.931 msec, max-ave = 0.00047
Pass 5, fft size = 3969K, exp up to 73156608, ave time = 11.438 msec, max-ave = 0.00165
Pass 5, fft size = 4000K, exp up to 73728000, ave time = 11.430 msec, max-ave = 0.00124
filbert@filbert:~/Build/CudaLucas-2.052$ [/CODE] |
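One way to use output like this is to parse the per-size lines and pick the fastest FFT length whose "exp up to" limit covers a given exponent. The Python sketch below does that for a few lines copied from the run above. The regular expression and function names are illustrative assumptions, not part of CUDALucas. In this particular output every "exp up to" value equals 18 × (fft size × 1024); that is an observation from the posted numbers, not a documented rule.

```python
import re

# Matches one result line of the -cufftbench output shown above.
LINE_RE = re.compile(
    r"Pass\s+(\d+), fft size = (\d+)K, exp up to (\d+), "
    r"ave time = ([\d.]+) msec"
)


def parse_bench(text):
    """Return {fft_k: (max_exp, ave_msec)}, keeping the last pass seen per size."""
    results = {}
    for match in LINE_RE.finditer(text):
        _pass_no, fft_k, max_exp, msec = match.groups()
        results[int(fft_k)] = (int(max_exp), float(msec))
    return results


def fastest_fft(results, exponent):
    """Fastest FFT size (in K) whose limit covers exponent, or None."""
    candidates = [
        (msec, fft_k)
        for fft_k, (max_exp, msec) in results.items()
        if max_exp >= exponent
    ]
    return min(candidates)[1] if candidates else None


# A few pass-5 lines copied from the benchmark run in this thread.
SAMPLE = """
Pass 5, fft size = 3000K, exp up to 55296000, ave time = 9.307 msec, max-ave = 0.00132
Pass 5, fft size = 3024K, exp up to 55738368, ave time = 8.173 msec, max-ave = 0.00080
Pass 5, fft size = 3072K, exp up to 56623104, ave time = 8.443 msec, max-ave = 0.00111
Pass 5, fft size = 3087K, exp up to 56899584, ave time = 9.114 msec, max-ave = 0.00192
Pass 5, fft size = 3125K, exp up to 57600000, ave time = 10.053 msec, max-ave = 0.00158
Pass 5, fft size = 3136K, exp up to 57802752, ave time = 8.120 msec, max-ave = 0.00069
Pass 5, fft size = 3150K, exp up to 58060800, ave time = 9.222 msec, max-ave = 0.00037
Pass 5, fft size = 3200K, exp up to 58982400, ave time = 8.691 msec, max-ave = 0.00017
"""

if __name__ == "__main__":
    bench = parse_bench(SAMPLE)
    print(fastest_fft(bench, 57_000_000))  # 3136 for the sample lines above
```

For an exponent of 57,000,000 the smallest covering size in the sample is 3125K, yet 3136K is about 2 msec faster per iteration, which is the kind of choice a benchmark table like this makes visible.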
[QUOTE=owftheevil;342267]Here's what it looks like so far.
iteration times are .1%-.7% higher in the actual test than in the benchmark results.[/QUOTE]That's beautiful! 1% deviation from actual isn't a problem for my purposes, where I'm aggregating results from many people on many GPUs for approximate average performance for a given architecture. Now I'll just have to wait until this new benchmark mode is widely available (meaning, in many cases, Windows binaries). |
Hi Carl,
[QUOTE=owftheevil;340301]Here's version 0.13 with Oliver's requested fflush(NULL) statements. Oliver, thanks for showing me that.[/QUOTE]
Two more wishes:
[LIST=1]
[*]add "clean"-target to the Makefile
[*]add "--generate-code arch=compute_30,code=sm_30" to the CUFLAGS in the Makefile
[/LIST]
Your memtest refuses to run on a GTX 690 (GK104? GK10x?) when compiled with the default CUFLAGS (Linux, nvcc V0.2.1221 (CUDA 5.0), driver 319.23):
[CODE]./memtest 10 10 0
Initializing test using 250MiB of memory on device 0
memtest.cu(187) : cufftSafeCall() CUFFT error 6: CUFFT_EXEC_FAILED[/CODE]
It doesn't matter how much memory, how many iterations, or which device I choose. When I add compute_30,sm_30 to the Makefile and recompile, it runs fine here (GTX 690, TITAN, Tesla K20).

As you can see, I'm really using your tool!

Oliver

P.S. I'm remembering this one: [url]http://www.mersenneforum.org/showpost.php?p=197975&postcount=8[/url]: should Carl's memtest get its own thread? |
Hi,
I made a Radeon FFT benchmark. HD7750:
[QUOTE]
$ sh -x ./run.sh
+ rm *.o a.out
+ g++ -c main.cpp -I /opt/AMDAPP/include/ -I /opt/clAmdFft-1.8.291/include/
+ g++ -c clFFTPlans.cpp -I /opt/AMDAPP/include/ -I /opt/clAmdFft-1.8.291/include/
+ g++ main.o clFFTPlans.o /opt/clAmdFft-1.8.291/lib64/libclAmdFft.Runtime.so -lOpenCL -lfftw3
+ export LD_LIBRARY_PATH=:/opt/clAmdFft-1.8.291/lib64/
+ time ./a.out
Using device: Capeverde
AmdFFT_Z2Z size= 524288 time= 2.780000 msec
Everything went fine!
31.54user 249.49system 4:41.49elapsed 99%CPU (0avgtext+0avgdata 1334368maxresident)k
0inputs+45368outputs (0major+151072minor)pagefaults 0swaps
$
[/QUOTE]:smile: |
[QUOTE=msft;343077]Hi,
I made a Radeon FFT benchmark. HD7750: :smile:[/QUOTE] :shock: Wow. At last someone... If you need testers for anything, I have two 7770s. |
Hi Oliver,
[QUOTE]Two more wishes:
[LIST=1]
[*]add "clean"-target to the Makefile
[*]add "--generate-code arch=compute_30,code=sm_30" to the CUFLAGS in the Makefile
[/LIST][/QUOTE]
Noted and added to the list. Thanks as always.

Carl |
Hi msft,
At first glance that time seems a bit slow, but then I have no idea how strong or weak a card the 7750 is. What would a comparable Nvidia card be? In any event, those are interesting results. Do you plan on making an LL test out of this?

Carl |
[QUOTE=owftheevil;343087]Hi msft,
At first glance that time seems a bit slow, but then I have no idea how strong or weak a card the 7750 is. What would a comparable Nvidia card be? Carl[/QUOTE] A 650 or a 550 (non-Ti). EDIT: Sorry, wasn't my question to answer, just realized that. |
Hi Kracker,
No problem, the question was for anyone with information. Thanks. The time seems much more reasonable now. |