mersenneforum.org

LL with OpenCL (https://www.mersenneforum.org/showthread.php?t=18297)

msft 2014-03-18 05:05

Hi,
>clFFT problems?
Yes.
>What does the OOURA code do? Does it only check rounding errors?
It handles the translation from complex FFT to half-complex FFT.

AK76 2014-10-25 20:41

I have run my first LL test on an ATI GPU:

M( 64847711 )C, 0xffffffff80000000, n = 131072, clLucas v1.02

My card is Gigabyte R9 290.

The test took 5h 30m.

Batalov 2014-10-25 20:51

[QUOTE=AK76;386081]M( 64847711 )C, [COLOR=Red]0xffffffff80000000[/COLOR], n = 131072, clLucas v1.02
[/QUOTE]
This is not a valid result, though. You may want to tune down the OC on your card, check for dust, etc. The usual things.

Bdot 2014-10-25 20:54

More likely, the FFT size is way too small. Let clLucas autoselect the FFT size, then use the next bigger power of two.

It is a fast card, but 5 hours can't be real (unfortunately).

owftheevil 2014-10-25 20:58

The fft is too small. For an exponent of that size, you should be using n=4194304.

clLucas still only works with 2^k ffts, right?
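A rough way to see why n = 131072 cannot work for this exponent is to look at the average number of bits each FFT word has to carry, i.e. the exponent divided by the FFT length. The sketch below is only an illustration: `bits_per_word` is a made-up helper, not part of clLucas, and the real limit depends on the rounding error of the particular implementation.

```python
def bits_per_word(exponent: int, fft_length: int) -> float:
    """Average number of bits of the residue carried by each FFT word."""
    return exponent / fft_length


p = 64847711
for n in (131072, 4194304):
    # Print the load per word for the bad and the recommended FFT length.
    print(f"n = {n:>8}: {bits_per_word(p, n):.2f} bits per word")
```

A double has a 53-bit significand, so several hundred bits per word cannot possibly survive the multiplication, while a load in the mid-teens is in the range where the rounding error stays small.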

kracker 2014-10-25 22:00

[QUOTE=owftheevil;386087]The fft is too small. For an exponent of that size, you should be using n=4194304.

clLucas still only works with 2^k ffts, right?[/QUOTE]

Well, yes and no. Non-2^k FFT sizes work, but they are slow.

EDIT: I'm surprised clLucas didn't stop right away...

AK76 2014-10-26 10:19

I did not set the -f parameter myself; clLucas chose this number. I will run the same test one more time. My ATI card is not overclocked.

Bdot 2014-10-26 11:28

For me, it looks like this (HD7950 OC'd to the max (1100/1400) for this test):

[code]
C:\projects\clLucas>clLucas_x64.exe -aggressive 64847711
Platform :Advanced Micro Devices, Inc.
Device 0 : Tahiti

Build Options are : -D KHR_DP_EXTENSION

start M64847711 fft length = 3276800
err = 0.5, increasing n from 3276800

start M64847711 fft length = 3538944
Iteration 10000 0xf063d79e2bcf83d7, n = 3538944 err = 0.2813 (2:50 real, 16.9916 ms/iter, ETA 305:59:27)
^C caught. Writing checkpoint.

C:\projects\clLucas>rm c64847711

C:\projects\clLucas>clLucas_x64.exe -f 4194304 -aggressive 64847711
Platform :Advanced Micro Devices, Inc.
Device 0 : Tahiti

Build Options are : -D KHR_DP_EXTENSION

start M64847711 fft length = 4194304
Iteration 10000 0xf063d79e2bcf83d7, n = 4194304 err = 0.005371 (1:04 real, 6.3979 ms/iter, ETA 115:12:58)
Iteration 20000 0x929ed99e32ae2e48, n = 4194304 err = 0.005371 (1:04 real, 6.3386 ms/iter, ETA 114:07:48)
Iteration 30000 0x20c4df779543306b, n = 4194304 err = 0.005371 (1:03 real, 6.3383 ms/iter, ETA 114:06:26)
^C caught. Writing checkpoint.
[/code]
Without "-aggressive" it is a bit slower, but the FFT selection is the same. AK76, did you maybe experiment with the -f parameter and this exponent at some time so that there were checkpoint files containing the bad FFT size?
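The ETA values in the log can be roughly reproduced from the ms/iter figure. The sketch below assumes an LL test of M(p) needs p - 2 iterations in total and that the ETA is simply the remaining iterations times the current ms/iter. `eta_hms` is a hypothetical helper and is not how clLucas necessarily computes its ETA, so the result differs from the log by a minute or so.

```python
def eta_hms(exponent: int, iterations_done: int, ms_per_iter: float) -> str:
    """Estimate the remaining time as H:MM:SS, assuming p - 2 iterations in total."""
    remaining = (exponent - 2) - iterations_done
    seconds = round(remaining * ms_per_iter / 1000)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Timings taken from the two runs above, both at iteration 10000.
print(eta_hms(64847711, 10000, 6.3979))   # fft length 4194304
print(eta_hms(64847711, 10000, 16.9916))  # fft length 3538944
```

This gives about 115 hours and about 306 hours respectively. Finishing this exponent in 5h 30m would need roughly 0.3 ms/iter, which is why that result cannot be real.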

AK76 2014-10-26 13:23

I set the FFT size manually to 4194304 and the result is:


clLucas_x64 -f 4194304 64847711

Iteration 100000 0x8bc61577dc9b0813, n = 4194304 err = 0.004395 (1:06 real, 6.6892 ms/iter, ETA 120:17:37)

====================================

When run on automatic settings:


clLucas_x64 64847711

Iteration 20000 0x929ed99e32ae2e48, n = 3538944 err = 0.2344 (4:05 real, 24.5069 ms/iter, ETA 441:15:37)


The difference is biiiig.
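To put numbers on it, here is a small sketch that takes the autoselected FFT length, rounds it up to a power of two, and compares the two ms/iter figures above. `next_power_of_two` is a hypothetical helper and not a clLucas function. Note that it returns n itself if n is already a power of two.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


auto_n = 3538944  # length picked on automatic settings
print(next_power_of_two(auto_n))

# Ratio of the per-iteration times: automatic length vs. -f 4194304.
slowdown = 24.5069 / 6.6892
print(f"{slowdown:.2f}x slower on automatic settings")
```

The rounded-up length is 4194304, the value passed with -f above, and the automatic setting comes out a bit more than 3.6 times slower per iteration.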

kracker 2014-10-26 14:39

[QUOTE=AK76;386141]I did not set the -f parameter myself; clLucas chose this number. I will run the same test one more time. My ATI card is not overclocked.[/QUOTE]

Very strange... :loco:

Bdot 2014-10-26 16:39

[QUOTE=AK76;386147]I set the FFT size manually to 4194304 and the result is:


clLucas_x64 -f 4194304 64847711

Iteration 100000 0x8bc61577dc9b0813, n = 4194304 err = 0.004395 (1:06 real, 6.6892 ms/iter, ETA 120:17:37)

====================================

When run on automatic settings:


clLucas_x64 64847711

Iteration 20000 0x929ed99e32ae2e48, n = 3538944 err = 0.2344 (4:05 real, 24.5069 ms/iter, ETA 441:15:37)


The difference is biiiig.[/QUOTE]

Yes, the difference is very big, but it compares well to my test results. You can try to set "-aggressive" to improve the timings, but apart from that, "-f 4194304" is the best thing to do. With this FFT size, you can even run tests with slightly bigger exponents as [URL="http://www.mersenneforum.org/showpost.php?p=366726&postcount=294"]LaurV explained[/URL] very nicely.


All times are UTC.