mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GpuOwl (https://www.mersenneforum.org/forumdisplay.php?f=171)
-   -   gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

SELROC 2018-08-23 10:15

[QUOTE=preda;494513]About 2days. Depending on cooling/frequency, I get between 2.05 - 2.15 ms/it.[/QUOTE]


On an RX 580 it takes a little over 4 days, at 3.50-4.00 ms/it.

preda 2018-08-23 10:58

[QUOTE=SELROC;494514]On an RX 580 it takes a little over 4 days, at 3.50-4.00 ms/it.[/QUOTE]

Did you try the FFT variants, e.g. passing "-fft +1" or "-fft +2", to see which is fastest?

Edit: Never mind, at the wavefront the default FFT should be the fastest.

SELROC 2018-08-23 11:08

[QUOTE=preda;494516]Did you try the FFT variants, e.g. passing "-fft +1" or "-fft +2", to see which is fastest?[/QUOTE]

Not yet. But it is coming.

I am currently fighting amdgpu errors, and unfortunately the latest version of the amdgpu driver (18.30) does not install on Debian.

I wrote to the AMD community in the hope of getting help:

[URL]https://community.amd.com/thread/206833[/URL]
(My messages may not appear yet; they are still being moderated.)

I have also tried ROCm on Ubuntu 18.04, but apparently it does not support my hardware.

SELROC 2018-08-23 13:42

[QUOTE=SELROC;494519]Not yet. But it is coming.[/QUOTE]


Without the -fft option:
[CODE]FFT 4608K: Width 512 (64x8), Height 512 (64x8), Middle 9; 17.94 bits/word
Note: using short carry kernels
Ellesmere-36x1360-@2:0.0 Radeon RX 580 Series
OpenCL compilation in 2989 ms, with "-DEXP=84674323u -DWIDTH=512u -DSMALL_HEIGHT=512u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
[2018-08-23 15:25:02 CEST] PRP M(84674323), FFT 4608K, 17.94 bits/word
OK loaded: 72354800/84674323, blockSize 400, c05550b6947ecc38
OK initial check: c05550b6947ecc38
OK 2018-08-23 15:25:13 0 72355600/84674323 [85.45%], 3.84 ms/it [3.83, 3.85]; ETA 0d 13:09; 361312417c909cfa (check 2.37s) (saved)

Stopping, please wait..
OK 2018-08-23 15:28:08 0 72400000/84674323 [85.50%], 3.90 ms/it [3.89, 3.97]; ETA 0d 13:17; 553db60311ce3007 (check 2.33s) (saved)

Bye
[/CODE]

With -fft +1:

[CODE]gpuowl-OpenCL 3.6--mod
FFT 5120K: Width 1024 (256x4), Height 512 (64x8), Middle 5; 16.15 bits/word
Note: using short carry kernels
Ellesmere-36x1360-@2:0.0 Radeon RX 580 Series
OpenCL compilation in 1005 ms, with "-DEXP=84674323u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=5u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
[2018-08-23 15:28:48 CEST] PRP M(84674323), FFT 5120K, 16.15 bits/word
OK loaded: 72400000/84674323, blockSize 400, 553db60311ce3007
OK initial check: 553db60311ce3007
OK 2018-08-23 15:29:01 0 72400800/84674323 [85.50%], 4.29 ms/it [4.28, 4.30]; ETA 0d 14:38; 16610202fac94885 (check 2.56s) (saved)

Stopping, please wait..
OK 2018-08-23 15:29:07 0 72401600/84674323 [85.51%], 4.42 ms/it [4.31, 4.53]; ETA 0d 15:03; 64ae36294a2a5e46 (check 2.56s) (saved)

Bye
[/CODE]

With -fft +2:
[CODE]gpuowl-OpenCL 3.6--mod
FFT 4096K: Width 4096 (512x8), Height 512 (64x8); 20.19 bits/word
FFT size too small for exponent (20.19 bits/word).
gpuowl-OpenCL 3.6--mod
FFT 5120K: Width 512 (64x8), Height 1024 (256x4), Middle 5; 16.15 bits/word
Note: using short carry kernels
Ellesmere-36x1360-@2:0.0 Radeon RX 580 Series
OpenCL compilation in 879 ms, with "-DEXP=84674323u -DWIDTH=512u -DSMALL_HEIGHT=1024u -DMIDDLE=5u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
[2018-08-23 15:30:15 CEST] PRP M(84674323), FFT 5120K, 16.15 bits/word
OK loaded: 72401600/84674323, blockSize 400, 64ae36294a2a5e46
OK initial check: 64ae36294a2a5e46
OK 2018-08-23 15:30:28 0 72402400/84674323 [85.51%], 4.71 ms/it [4.70, 4.72]; ETA 0d 16:04; 29cd7a69cd8234dd (check 2.72s) (saved)

Stopping, please wait..
OK 2018-08-23 15:31:58 0 72420800/84674323 [85.53%], 4.74 ms/it [4.73, 4.75]; ETA 0d 16:08; 6f0303e6cb8819b6 (check 2.71s) (saved)

Bye[/CODE]
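
To compare the three runs side by side, the status lines can be parsed and the remaining time recomputed. The following is a minimal sketch in Python, not part of gpuowl: the regular expression and the helper names `parse_status` and `eta_hours` are assumptions made for illustration, and the input lines are copied from the logs above with the trailing fields omitted.

```python
import re

# Matches "<iteration>/<exponent> [<percent>%], <time> ms/it" in a status line.
STATUS = re.compile(
    r"(?P<iter>\d+)/(?P<exp>\d+) \[[\d.]+%\], (?P<ms>[\d.]+) ms/it"
)

def parse_status(line):
    """Return (iteration, exponent, ms_per_iteration), or None if no match."""
    m = STATUS.search(line)
    if m is None:
        return None
    return int(m["iter"]), int(m["exp"]), float(m["ms"])

def eta_hours(iteration, exponent, ms_per_it):
    """Remaining iterations times the per-iteration time, in hours."""
    return (exponent - iteration) * ms_per_it / 1000 / 3600

# First status line of each run (trailing residue and check fields omitted).
runs = {
    "default (4608K)": "OK 2018-08-23 15:25:13 0 72355600/84674323 [85.45%], 3.84 ms/it [3.83, 3.85]; ETA 0d 13:09",
    "-fft +1 (5120K)": "OK 2018-08-23 15:29:01 0 72400800/84674323 [85.50%], 4.29 ms/it [4.28, 4.30]; ETA 0d 14:38",
    "-fft +2 (5120K)": "OK 2018-08-23 15:30:28 0 72402400/84674323 [85.51%], 4.71 ms/it [4.70, 4.72]; ETA 0d 16:04",
}

for name, line in runs.items():
    it, exp, ms = parse_status(line)
    print(f"{name}: {ms:.2f} ms/it, about {eta_hours(it, exp, ms):.1f} h left")

# The run with the smallest time per iteration is the fastest.
fastest = min(runs, key=lambda name: parse_status(runs[name])[2])
print("fastest:", fastest)
```

The recomputed hours should land close to the ETA fields in the logs; small differences are expected because the logged ms/it values are rounded.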




With -fft -1, gpuowl aborts because bits/word is above 20.
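
The bits/word figures in the logs are consistent with dividing the exponent by the number of FFT words. A short Python sketch under that assumption (the function name `bits_per_word` is made up here, and the exact limit at which gpuowl rejects an FFT size is not stated in this thread, so none is hard-coded):

```python
def bits_per_word(exponent, fft_k):
    """Average bits per FFT word; fft_k is the FFT size in units of 1024 words."""
    return exponent / (fft_k * 1024)

EXPONENT = 84674323  # exponent from the logs above

for fft_k in (4096, 4608, 5120):
    print(f"FFT {fft_k}K: {bits_per_word(EXPONENT, fft_k):.2f} bits/word")

# Prints:
# FFT 4096K: 20.19 bits/word
# FFT 4608K: 17.94 bits/word
# FFT 5120K: 16.15 bits/word
```

The 4096K value is the one the log reports as too small for this exponent, while 4608K and 5120K both leave more headroom at the cost of a larger transform.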

preda 2018-08-24 00:56

Valerio, about TF not working on amdgpu-pro: I wouldn't worry that much, a bit more or less TF is unlikely to make any significant difference. In the meantime you can either use mfakto if you want to deepen the TF, or better just skip the TF altogether and jump right into PRP :)

SELROC 2018-08-24 06:51

[QUOTE=preda;494556]Valerio, about TF not working on amdgpu-pro: I wouldn't worry that much, a bit more or less TF is unlikely to make any significant difference. In the meantime you can either use mfakto if you want to deepen the TF, or better just skip the TF altogether and jump right into PRP :)[/QUOTE]


I am enjoying the speed of my GPUs with mfakto. In a few days I returned something like 500 results and found more than 10 factors.


On my dual-core hyperthreaded CPU, a trial factoring run takes some hours, while mfakto completes a run in approx. 13 minutes.

xx005fs 2018-08-24 21:36

PRP speed
 
[QUOTE=preda;494513]About 2days. Depending on cooling/frequency, I get between 2.05 - 2.15 ms/it.[/QUOTE]

I use a utility called amdcovc on Linux to raise the memory clock, which improved the performance of my Vega 56 quite a lot: from 2.15 ms/it on the stock Vega 64 liquid BIOS to about 1.9 ms/it.

preda 2018-08-24 23:46

[QUOTE=xx005fs;494626]I use a utility called amdcovc on Linux to raise the memory clock, which improved the performance of my Vega 56 quite a lot: from 2.15 ms/it on the stock Vega 64 liquid BIOS to about 1.9 ms/it.[/QUOTE]

Nice! I'll try it out. One good thing about PRP is that it reliably detects if one pushes the overclock too far.
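
The logs earlier in the thread show a blockSize and a periodic "check", but the thread does not name the method. As an illustration of how a PRP run can catch hardware errors, here is a minimal Python sketch that assumes a Gerbicz-style block check; it is not taken from gpuowl's source, and the function name and the `corrupt_at` knob are hypothetical.

```python
def prp_with_check(p, block_size, corrupt_at=None):
    """Squaring chain mod 2^p - 1 with a Gerbicz-style block check.

    Returns True if every check passed, False as soon as one fails.
    corrupt_at (hypothetical) flips one bit after that iteration
    to imitate a hardware error.
    """
    n = (1 << p) - 1
    x = 3  # x_0 = 3, afterwards x_i = 3^(2^i) mod n
    d = 3  # running product of x at the block boundaries
    for i in range(1, p + 1):
        x = x * x % n
        if i == corrupt_at:
            x ^= 1  # simulated bit flip
        if i % block_size == 0:
            d_prev, d = d, d * x % n
            # Identity for an error-free block:
            # d_new == 3 * d_prev^(2^block_size) (mod n)
            if d != 3 * pow(d_prev, 1 << block_size, n) % n:
                return False
    return True

print(prp_with_check(127, 10))                 # clean run
print(prp_with_check(127, 10, corrupt_at=25))  # run with one flipped bit
```

The check recomputes the running product in an independent way, so a single wrong squaring inside a block makes the two sides disagree at the next block boundary. In this sketch the clean run passes every check and the corrupted run fails at iteration 30.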

xx005fs 2018-08-25 01:28

[QUOTE=preda;494635]Nice! I'll try it out. One good thing about PRP is that it reliably detects if one pushes the overclock too far.[/QUOTE]

The problem with amdcovc is that it currently has no voltage control, so I run GpuOwL on Windows, where I can tune the GPU with WattMan. After tuning, my Vega is far more efficient on Windows than at stock voltage on Linux.

xx005fs 2018-08-25 06:18

GpuOwL wrong speed reporting???
 
I also recently noticed that on my system, when at least one CPU thread is running at 100%, the Vega GPU's reported speed improves, usually by 0.02 to 0.06 ms/it. For example, when I run prime95 alongside GpuOwL, the time per iteration drops from 2.00 ms/it to 1.94 ms/it. I know the difference is minute, but is it caused by the GPU architecture, or is it a bug in the software? Also, would enabling a system timer such as HPET resolve it?

preda 2018-08-25 07:28

[QUOTE=xx005fs;494659]I also recently noticed that on my system, when at least one CPU thread is running at 100%, the Vega GPU's reported speed improves, usually by 0.02 to 0.06 ms/it. For example, when I run prime95 alongside GpuOwL, the time per iteration drops from 2.00 ms/it to 1.94 ms/it. I know the difference is minute, but is it caused by the GPU architecture, or is it a bug in the software? Also, would enabling a system timer such as HPET resolve it?[/QUOTE]

Yes, I saw the same thing. IMO it looks like a problem in the GPU driver area. It may be that the transition from some CPU sleep state to active is slower when the CPU is not busy.


All times are UTC.