mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

UBR47K 2015-08-26 12:07

I am getting this error in Arch Linux:
[code]device_number >= device_count ... exiting
(This is probably a driver problem)
[/code]

I have two Titan X cards with no SLI bridge, and the SLI option in xorg.conf is disabled.
I tried both the precompiled CUDA 6.5 version and a build from svn with CUDA 7.0, to no avail.

kladner 2015-09-02 18:44

Sorry if this has been addressed, but I have a puzzling response from CuLu 2.05.1. On my first DC run of a 36M assignment, the FFT selected was 2048K. This produced errors in the 0.05 range. I now have a 34M exponent to DC with the GTX580. Thinking I could get better times with a smaller FFT, I tried inserting 1728K and 1792K both in the worktodo.txt and on the command line. These were either ignored, or the program stated that they are too small and put 2048K back in. I vaguely remember a command "FFT2=" rather than just ",1792K,". Is this what I need to use to get CuLu to test a smaller FFT than 2048?

EDIT: I have deleted the checkpoint files between tests.

frmky 2015-09-02 23:14

[QUOTE=kladner;409447]Sorry if this has been addressed, but I have a puzzling response from CuLu 2.05.1. On my first DC run of a 36M assignment, the FFT selected was 2048K. This produced errors in the 0.05 range. I now have a 34M exponent to DC with the GTX580. Thinking I could get better times with a smaller FFT, I tried inserting 1728K and 1792K both in the worktodo.txt and on the command line. These were either ignored, or the program stated that they are too small and put 2048K back in. I vaguely remember a command "FFT2=" rather than just ",1792K,". Is this what I need to use to get CuLu to test a smaller FFT than 2048?[/QUOTE]

For CUDALucas, you just add ",1792K" to the end of the line in worktodo.txt. However, looking back at the ones I've done, nothing in the 34M range used an FFT that small. Most were done with 1890K or 2000K, depending on the GPU and the version of cufft used. Check the timing for all three of 1890K, 2000K, and 2048K to see which is fastest for you.

frmky 2015-09-02 23:22

[QUOTE=UBR47K;408850]I am getting this error in Arch Linux:
[code]device_number >= device_count ... exiting
(This is probably a driver problem)
[/code]

I have two Titan X cards with no SLI bridge, and the SLI option in xorg.conf is disabled.
I tried both the precompiled CUDA 6.5 version and a build from svn with CUDA 7.0, to no avail.[/QUOTE]

Are /dev/nvidia0 and /dev/nvidiactl present? Does the command nvidia-smi show your card present? If no to either, then the driver isn't installed and active.
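
The two checks above can be scripted. This is only a rough sketch, not part of CUDALucas: the function name `check_nvidia_driver` is made up for illustration, and it assumes `nvidia-smi -L` (which lists the GPUs the driver can see) is the right thing to call on your system.

```python
import os
import shutil
import subprocess


def check_nvidia_driver(device_nodes=("/dev/nvidia0", "/dev/nvidiactl")):
    """Hypothetical helper: report which device nodes exist and whether
    nvidia-smi is on PATH and lists at least one GPU."""
    report = {node: os.path.exists(node) for node in device_nodes}

    smi = shutil.which("nvidia-smi")
    if smi is None:
        report["nvidia-smi"] = False
        return report

    try:
        # `nvidia-smi -L` prints one "GPU n: ..." line per visible card.
        result = subprocess.run([smi, "-L"], capture_output=True, text=True)
        report["nvidia-smi"] = result.returncode == 0 and "GPU" in result.stdout
    except OSError:
        report["nvidia-smi"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_nvidia_driver().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

If any entry comes back as missing, the driver is most likely not loaded, which matches the "device_number >= device_count" symptom.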

kladner 2015-09-03 03:55

[QUOTE=frmky;409465]For CUDALucas, you just add ",1792K" to the end of the line in worktodo.txt. However, looking back at the ones I've done, nothing in the 34M range used an FFT that small. Most were done with 1890K or 2000K, depending on the GPU and the version of cufft used. Check the timing for all three of 1890K, 2000K, and 2048K to see which is fastest for you.[/QUOTE]

1890K and 2000K don't show up in 'GeForce GTX 580 fft.txt', but I'll plug them in and see what comes out.

I think I ran CUFFTbench such that it limited the output in some way. It might be good to rerun it unfiltered.
Here's that immediate region of the current fft.txt:
[CODE]1600 30232693 2.6039
1728 32597297 2.7957
1792 33778141 2.8987
2048 38492887 2.9924
2304 43194913 3.7102
2592 48471289 3.9732
2880 53735041 4.7980
[/CODE]Thanks for the suggestions! :smile:

EDIT: Here are the results for the 3 FFTs:
[QUOTE]M347xxxxx 5000 0xfd897327f15981c1 | 1890K 0.14111 3.5447 17.72s | 1:10:11:45 0.01%
M347xxxxx 10000 0xe66eeb94b6e3a4e9 | 1890K 0.14258 3.5466 17.73s | 1:10:12:00 0.02%

M347xxxxx 5000 0xfd897327f15981c1 | 2000K 0.03125 3.4163 17.08s | 1:08:57:26 0.01%
M347xxxxx 10000 0xe66eeb94b6e3a4e9 | 2000K 0.03027 3.4354 17.17s | 1:09:02:41 0.02%

M34733759 5000 0xfd897327f15981c1 | 2048K 0.02051 3.0706 15.35s | 1:05:37:19 0.01%
M34733759 10000 0xe66eeb94b6e3a4e9 | 2048K 0.02100 3.0900 15.45s | 1:05:42:40 0.02%[/QUOTE]These are pretty much in keeping with FFT.txt. 1792K is the next lowest FFT with shorter times. I guess the [STRIKE]0.05[/STRIKE] 0.02 error rate is normal in this range.
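
As a sanity check on the log lines above, the ETA column can be roughly reproduced from the ms/iter column. The sketch below assumes the masked M347xxxxx lines are the same exponent as the 2048K lines (the residues match), and that a full LL test takes p - 2 iterations. The helper names are made up for illustration.

```python
def full_run_seconds(exponent, ms_per_iter):
    # A Lucas-Lehmer test of M(p) needs p - 2 squarings.
    return (exponent - 2) * ms_per_iter / 1000.0


def as_dhms(seconds):
    # Format as days:hours:minutes:seconds, like the ETA column in the log.
    seconds = int(round(seconds))
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}:{hours:02d}:{minutes:02d}:{secs:02d}"


EXPONENT = 34733759  # taken from the 2048K lines; assumed for the other two
timings_ms = {"1890K": 3.5447, "2000K": 3.4163, "2048K": 3.0706}

for fft, ms in timings_ms.items():
    print(fft, as_dhms(full_run_seconds(EXPONENT, ms)))
```

For 2048K this comes out at about 1 day 5 h 37 min, in line with the ETA in the log, and roughly four and a half hours shorter than 1890K.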

LaurV 2015-09-03 07:30

[QUOTE=kladner;409473]I think I ran CUFFTbench such that it limited the output in some way.[/QUOTE]
"some way" means that all sizes of the FFT which were eliminated are [B][U]slower[/U][/B] (for your card) than a larger FFT which remained. So, for exponents between 33.77M and 38.49M, you will be fastest using the 2M FFT (i.e. 2048K in your file). 1792K is too small for them, and any intermediate size will be slower than 2048K. When you do the "tuning", a list of all sizes is created and then parsed from the end, and any line which has a longer time than a line already parsed (i.e. a shorter FFT which takes longer than a longer FFT) is eliminated. There is no mystery in it.

Note that a smaller FFT does not always mean a faster iteration time. This depends on how "smooth" the FFT value is, i.e. how [B][U]your card*[/U][/B] can split it into small pieces and combine those [URL="https://en.wikipedia.org/wiki/Butterfly_diagram"]butterflies[/URL] ([URL="https://www.google.co.th/search?q=fft+butterfly&tbm=isch"]yaaarrr[/URL]!). That is why a "power of two" is usually faster than its neighbors: fast multiplication is kinda "[URL="https://en.wikipedia.org/wiki/Divide_and_conquer_algorithms"]divide et impera[/URL]". It splits the stuff in two, solves the halves, and puts the results together (well, kind of...), so when you split in two something which is not a multiple of two (or a power of two, for multiple splits), one chunk is bigger than the other, and you have to do more splits and more work to put it together at the end.

----
* It depends on how many threads the card can run at the same time, the memory for each, whether it multiplies on 24 or on 32 bits, etc. That is why you have to TUNE the card when you start using CUDALucas on it.
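
To make "smooth" concrete, here is a small sketch that factors the FFT lengths discussed in this thread. It assumes "K" means 1024 elements, and `factorize` is just a plain trial-division helper written for this example, not anything from CUDALucas or cufft.

```python
def factorize(n):
    """Prime factorization by trial division: returns {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


# FFT sizes mentioned in the thread, assuming 1K = 1024 elements.
for k in (1728, 1792, 1890, 2000, 2048):
    length = k * 1024
    print(f"{k}K = {length}: {factorize(length)}")
```

2048K is a pure power of two, while 1890K drags in factors of 3, 5 and 7 all at once. All of these are still "smooth" (no prime above 7), but how well each mix of radices runs is card- and cufft-dependent, which is why the benchmark has to be measured rather than predicted.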

------------
Edit: yes, an error between 0.0006 and 0.24 is perfect. The higher the better within that range (you will have a faster iteration time if you select a lower FFT from that file, but the error is bigger). Some go "as high as 0.35", but this is risky if you do not check the rounding error at every iteration.
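
For anyone wondering what that error number measures: after the floating-point FFT multiplication every output value should be an integer, and the round-off error is how far the worst value sits from the nearest integer. The sketch below is a toy illustration of that definition with made-up sample values, not the actual CUDALucas code. The 0.35 limit is the figure mentioned above.

```python
def max_roundoff_error(values):
    """Largest distance between any value and its nearest integer.
    Anything approaching 0.5 means the rounding can no longer be trusted."""
    return max(abs(v - round(v)) for v in values)


def is_risky(error, limit=0.35):
    # `limit` defaults to the "as high as 0.35" figure from the post above.
    return error > limit


# Made-up sample outputs of a floating-point convolution.
sample = [3.0002, 17.98, -4.03]
err = max_roundoff_error(sample)
print(f"max round-off error: {err:.4f}, risky: {is_risky(err)}")
```

A smaller FFT packs more bits into each element, so the values drift further from integers and this number goes up.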

kladner 2015-09-03 08:01

Thanks, LaurV. It has been a while since I messed with this stuff. I actually pulled a full list, and saw exactly what you are saying: the ones that don't show aren't worth looking at.
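
The filtering LaurV describes can be sketched as follows. This is only an illustration of the "parse from the end and drop anything slower than a larger FFT" idea, not the real benchmark code. The function names are hypothetical. The 1890K and 2000K timings are taken from the run log earlier in the thread rather than from a benchmark, so mixing them in is an assumption made just to show two entries being dropped.

```python
def prune_fft_table(rows):
    """rows: (fft_k, ms_per_iter) pairs sorted by ascending FFT size.
    Walk from the largest FFT down and keep a row only if it is faster
    than every larger FFT already kept."""
    kept = []
    best_time = float("inf")
    for fft_k, ms in reversed(rows):
        if ms < best_time:
            kept.append((fft_k, ms))
            best_time = ms
    kept.reverse()
    return kept


def pick_fft(table, exponent):
    """table: (fft_k, max_exponent, ms_per_iter) rows as in fft.txt.
    Return the smallest listed FFT whose max exponent covers `exponent`."""
    for fft_k, max_exp, _ms in table:
        if exponent <= max_exp:
            return fft_k
    return None


full_list = [
    (1600, 2.6039), (1728, 2.7957), (1792, 2.8987),
    (1890, 3.5447), (2000, 3.4163),  # from the run log, not the benchmark
    (2048, 2.9924), (2304, 3.7102), (2592, 3.9732), (2880, 4.7980),
]
print([k for k, _ in prune_fft_table(full_list)])

fft_txt = [
    (1600, 30232693, 2.6039), (1728, 32597297, 2.7957),
    (1792, 33778141, 2.8987), (2048, 38492887, 2.9924),
    (2304, 43194913, 3.7102), (2592, 48471289, 3.9732),
    (2880, 53735041, 4.7980),
]
print(pick_fft(fft_txt, 34733759))
```

With these numbers, 1890K and 2000K drop out because 2048K is faster than both, and a 34.7M exponent lands on 2048K, which is what the program picked.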

UBR47K 2015-09-03 12:37

[QUOTE=frmky;409466]Are /dev/nvidia0 and /dev/nvidiactl present? Does the command nvidia-smi show your card present? If no to either, then the driver isn't installed and active.[/QUOTE]

I found out that it works after I rebooted my computer. Apparently I had installed a new driver update without being aware of it (a month of updates had piled up).

frmky 2015-09-03 21:02

[QUOTE=LaurV;409482]"some way" means that all sizes of the FFT which were eliminated are [B][U]slower[/U][/B] (for your card)[/QUOTE]
And version of CUDA/cufft used to compile the binary. If you switch to a binary compiled with a different version of CUDA, rerun the benchmark.

LaurV 2015-09-04 12:59

[QUOTE=frmky;409524]And version of CUDA/cufft used to compile the binary. If you switch to a binary compiled with a different version of CUDA, rerun the benchmark.[/QUOTE]right! :tu:

Xyzzy 2015-09-14 13:46

[QUOTE=Oddball;223390]Fighting off a pit bull is quite hard, but you'll probably make it out alive. Now if you had to fight off a whole pack of wolves instead...[/QUOTE][URL]http://www.cnn.com/2015/09/12/us/new-york-pit-bull-attacks/[/URL]

