mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GpuOwl (https://www.mersenneforum.org/forumdisplay.php?f=171)
-   -   gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

kriesel 2017-12-15 19:02

[QUOTE=M344587487;474076]Can anyone with a GTX 1080 and 1080 Ti post their 4096K iteration timings, please? There are Vega and Fury X timings dotted about the thread; it would be interesting to compare.

edit: Am I being dumb for wanting to bench Nvidia with an OpenCL program? Thinking about it, the GTX cards are conspicuous by their absence in this thread.[/QUOTE]

Have you looked at the TF (trial factoring) performance benchmarks at
[url]http://www.mersenne.ca/mfaktc.php?sort=ghdpd[/url]

or the LL (Lucas-Lehmer) ones at [url]http://www.mersenne.ca/cudalucas.php?sort=gflops[/url]?

The 75M column is the one relevant to a 4096K FFT length.
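Why the 75M column lines up with a 4096K FFT: the p bits of the number being tested are spread across the FFT's words, so the average load per word is p divided by the FFT length. A minimal sketch (the helper name is illustrative); gpuOwL prints the same ratio as "bits/word" at startup, e.g. 14.31 for exponent 60000877 at 4M FFT later in this thread.

```python
def bits_per_word(exponent: int, fft_words: int) -> float:
    """Average number of bits of 2^p - 1 carried by each FFT word."""
    return exponent / fft_words


FFT_4096K = 4096 * 1024  # 4,194,304 words ("4096K" / "4M")

print(f"{bits_per_word(75_000_000, FFT_4096K):.2f}")  # 17.88
print(f"{bits_per_word(60_000_877, FFT_4096K):.2f}")  # 14.31
```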

M344587487 2017-12-15 20:36

Thanks for the mountain of stats. What I was really after was a comparison between the Vega 56/64 and the GTX 1080/1080 Ti at their best-case prime hunting, so I should have picked CUDALucas as Nvidia's best case straight away.

I don't know the best OpenCL card to fit a 60W limit, but I know that in the context of Monero mining the minimum efficient wattages are ~100W for the RX570, RX580 and Vega56, so the best AMD offering that [I]might[/I] do the job is probably the RX560. Power consumption tends to be one of AMD's GPU weaknesses, so with such a low power limit you might find an Nvidia card preferable, even if your workload is OpenCL.

preda 2017-12-16 23:07

[QUOTE=M344587487;474099]Thanks for the mountain of stats. What I was really after was a comparison between the Vega 56/64 and the GTX 1080/1080 Ti at their best-case prime hunting, so I should have picked CUDALucas as Nvidia's best case straight away.

I don't know the best OpenCL card to fit a 60W limit, but I know that in the context of Monero mining the minimum efficient wattages are ~100W for the RX570, RX580 and Vega56, so the best AMD offering that [I]might[/I] do the job is probably the RX560. Power consumption tends to be one of AMD's GPU weaknesses, so with such a low power limit you might find an Nvidia card preferable, even if your workload is OpenCL.[/QUOTE]

One data point: my air-cooled Vega 64, clocked at 1401MHz, is pulling 145W while doing 1.63 ms/it PRP at 4M FFT.

That is surprisingly low power usage for Vega, clearly in the same range as an RX580.
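To put the 145W figure in per-test terms, here is a back-of-envelope sketch. It assumes a PRP test of exponent p takes about p iterations at a constant 1.63 ms/it and a constant 145W, and uses p = 76,000,000 (the wavefront mentioned later in the thread); real runs add overhead and the power draw varies.

```python
def prp_energy_kwh(exponent: int, ms_per_iter: float, watts: float) -> tuple[float, float]:
    """Return (hours, kWh) for one PRP test under a constant-rate, constant-power model.

    Assumes roughly one modular squaring per bit of the exponent, i.e. about
    `exponent` iterations; checkpointing and other overhead are ignored.
    """
    hours = exponent * ms_per_iter / 1000.0 / 3600.0
    kwh = watts * hours / 1000.0
    return hours, kwh


hours, kwh = prp_energy_kwh(76_000_000, 1.63, 145.0)  # Vega 64 figures from the post
print(f"{hours:.1f} h, {kwh:.2f} kWh per exponent")  # prints: 34.4 h, 4.99 kWh per exponent
```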

kriesel 2017-12-17 14:33

[QUOTE=M344587487;474099]Thanks for the mountain of stats. What I was really after was a comparison between the Vega 56/64 and the GTX 1080/1080 Ti at their best-case prime hunting, so I should have picked CUDALucas as Nvidia's best case straight away.

I don't know the best OpenCL card to fit a 60W limit, but I know that in the context of Monero mining the minimum efficient wattages are ~100W for the RX570, RX580 and Vega56, so the best AMD offering that [I]might[/I] do the job is probably the RX560. Power consumption tends to be one of AMD's GPU weaknesses, so with such a low power limit you might find an Nvidia card preferable, even if your workload is OpenCL.[/QUOTE]

You're welcome re the stats, but the real thanks go to James Heinrich for creating and maintaining those huge, searchable, filterable pages of benchmarks.

My interest in a sub-60W AMD card is less about maximum output per watt-hour than about having something on which to test and run OpenCL software. It would be attached to systems via 1x/16x PCIe extenders and mounted externally, to get around case space and cooling limits safely. I have a pretty NVIDIA/Intel-centric fleet here. Some of the software seems to be GPU-specific within OpenCL (mfakto, as I recall). OpenCL on IGPs works in some cases and not others, and can reduce total system throughput. Quadro 2000s are getting scarce, so something like the 50W RX550 is an alternative at about the same speed as the Quadro 2000; I have an RX550 on order. The next whole system I bring up might have a GTX 1080 or a Vega inside the box, TBD.

Mfakto ini file excerpt:
[CODE]# Different GPUs may have their best performance with different kernels
# Here, you can give a hint to mfakto on how to optimize the kernels.
#
# Possible values:
# GPUType=AUTO try to auto-detect, if that does not work: let me know
# GPUType=GCN Tahiti et al. (HD77xx-HD79xx), also assumed for unknown devices.
# GPUType=VLIW4 Cayman (HD69xx)
# GPUType=VLIW5 most other AMD GPUs (HD4xxx, HD5xxx, HD62xx-HD68xx)
# GPUType=APU all APUs (C-30 - C-60, E-240 - E-450, A2-3200 - A8-3870K) not sure if the "small" APUs would work better as VLIW5.
# GPUType=CPU all CPUs (when GPU not found, or forced to CPU)
# GPUType=NVIDIA reserved for Nvidia-OpenCL. Currently mapped to "CPU" and not yet functional on Nvidia Hardware.
# GPUType=INTEL reserved for Intel-OpenCL (e.g. HD4000). Not yet functional.
#
# Default: GPUType=AUTO[/CODE]
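For anyone scripting around mfakto, here is a small illustrative Python helper that pulls the GPUType setting out of mfakto.ini-style text and checks it against the values listed in the excerpt. It is hypothetical: the "last uncommented line wins" rule, case-insensitive matching, and falling back to AUTO when nothing is set (the documented default) are assumptions of this sketch, not mfakto's own parsing logic.

```python
import re

# GPUType values listed in the mfakto.ini excerpt.
KNOWN_GPU_TYPES = {"AUTO", "GCN", "VLIW4", "VLIW5", "APU", "CPU", "NVIDIA", "INTEL"}


def read_gpu_type(ini_text: str, default: str = "AUTO") -> str:
    """Return the GPUType setting from mfakto.ini-style text (hypothetical helper).

    Assumptions of this sketch: lines starting with '#' are comments, the last
    uncommented GPUType=... line wins, and matching is case-insensitive.
    Unknown values raise ValueError.
    """
    value = default
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments such as "# GPUType=AUTO ..."
        m = re.fullmatch(r"GPUType\s*=\s*(\S+)", line, flags=re.IGNORECASE)
        if m:
            value = m.group(1).upper()
    if value not in KNOWN_GPU_TYPES:
        raise ValueError(f"unknown GPUType: {value}")
    return value


print(read_gpu_type("# Default: GPUType=AUTO\nGPUType=VLIW5\n"))  # VLIW5
```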

moebius 2017-12-19 22:21

The following error occurred for me. Does anyone have an idea how to fix this?


[CODE]C:\Users\name\Desktop\gpuowl-v1.9-94aa58f>gpuowl
gpuOwL v1.9- GPU Mersenne primality checker
GeForce GTX 560 Ti, 8x1800MHz

OpenCL compilation in 452 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=8171XXXXu -DWIDTH=2048u -DHEIGHT=2048u -DLOG_NWORDS=23u -DFP_DP=1 "
Note: using long carry kernels
PRP-3: FFT 8M (2048 * 2048 * 2) of 8171XXXX (9.74 bits/word)
Starting at iteration 0
error -5
Assertion failed!

Program: C:\Users\name\Desktop\gpuowl-v1.9-94aa58f\gpuowl.exe
File: clwrap.h, Line 234

Expression: check(clEnqueueReadBuffer(queue, buf, blocking, start, size, data, 0, __null, __null))

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.[/CODE]
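The compile flags and the "FFT 8M (2048 * 2048 * 2)" line in this log fit together arithmetically: WIDTH × HEIGHT × 2 gives the FFT length in words, and LOG_NWORDS is its base-2 logarithm. A small illustrative check, with the layouts read straight off the two logs in this thread (the function name is made up):

```python
def fft_words(width: int, height: int) -> tuple[int, int]:
    """Return (FFT length in words, log2 of it) for a WIDTH x HEIGHT x 2 layout."""
    nwords = width * height * 2
    # Both layouts seen in the logs are powers of two, so log2 is exact.
    if nwords & (nwords - 1):
        raise ValueError("expected a power-of-two FFT length")
    return nwords, nwords.bit_length() - 1


print(fft_words(2048, 2048))  # (8388608, 23) -> "FFT 8M", -DLOG_NWORDS=23u
print(fft_words(1024, 2048))  # (4194304, 22) -> "FFT 4M", -DLOG_NWORDS=22u
```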

preda 2017-12-21 15:58

[QUOTE=moebius;474406]The following error occurred for me. Does anyone have an idea how to fix this?
[CODE]error -5
Assertion failed!
...
Expression: check(clEnqueueReadBuffer(queue, buf, blocking, start, size, data, 0, __null, __null))[/CODE][/QUOTE]

Sorry, I don't know why this happens. Error code -5 is CL_OUT_OF_RESOURCES, but I don't know why clEnqueueReadBuffer would return that.

Could you try a lower exponent, around 77M, to see whether you get the same error?
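The "error -5" lines are raw OpenCL status codes. A small lookup like the following turns them into the names defined in the OpenCL headers. It is purely illustrative Python: only a handful of codes are listed, and the `check` helper merely mimics the log's "error -5 (carryConv)" format; it is not gpuOwL's actual code in clwrap.h.

```python
# A few status codes as defined in the OpenCL headers (CL/cl.h).
CL_ERRORS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -54: "CL_INVALID_WORK_GROUP_SIZE",
}


def describe_cl_error(code: int) -> str:
    """Map a raw OpenCL status code to its symbolic name, if known."""
    return CL_ERRORS.get(code, f"unknown OpenCL status {code}")


def check(code: int, what: str = "") -> None:
    """Hypothetical check(): raise on any non-zero OpenCL status, naming the call."""
    if code != 0:
        where = f" ({what})" if what else ""
        raise RuntimeError(f"error {code}{where}: {describe_cl_error(code)}")


try:
    check(-5, "carryConv")
except RuntimeError as e:
    print(e)  # error -5 (carryConv): CL_OUT_OF_RESOURCES
```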

moebius 2017-12-21 22:14

Similar error, but not the same error output:

[CODE]gpuOwL v1.9- GPU Mersenne primality checker
GeForce GTX 560 Ti, 8x1800MHz

OpenCL compilation in 1684 ms, with "-I. -cl-fast-relaxed-math -cl-s
EXP=60000877u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=
PRP-3: FFT 4M (1024 * 2048 * 2) of 60000877 (14.31 bits/word)
Starting at iteration 0
error -5 (carryConv)
Assertion failed!

Program: C:\Users\name\Desktop\gpuowl-v1.9-94aa58f\gpuowl.exe
File: clwrap.h, Line 230

Expression: check(clEnqueueNDRangeKernel(queue, kernel, 1, __null, &
roupSize, 0, __null, __null), name.c_str())

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.[/CODE]

preda 2017-12-22 05:07

[QUOTE=moebius;474541]Similar error, but not the same error output:
[/QUOTE]

It seems gpuOwL is not working on Nvidia for some reason. Perhaps the compiler requests too many VGPRs, or the kernels use too much shared memory, or something else entirely. I may look into this myself when I get an Nvidia GPU to test on. In the meantime you can use CUDALucas.

moebius 2017-12-23 21:12

[QUOTE=preda;474565]I may look into this myself when I get an Nvidia GPU to test on. In the meantime you can use CUDALucas.[/QUOTE]

I also think that only the low-level, system-specific commands need to be adjusted.
The future success of PRP depends on whether all types of graphics cards are supported.

xx005fs 2018-01-18 02:12

[QUOTE=preda;471917]Some performance data from my hardware at 4M FFT (adequate for the current wavefront around 76M).

This is with ROCm 1.6-180. ROCm seems to generate better-optimized code than AMDGPU-PRO, so performance is generally better. All hardware is stock and air-cooled; nothing has been changed.

Vega64: 1.63 ms/it (under-clocked to 1401MHz for thermal reasons)
FuryX: 1.89 ms/it
R9-Nano: 2.05 ms/it (the card downclocks itself for thermal reasons)
390x: 2.17 ms/it

Broadly speaking, this comes out to a bit under 2 days per exponent.[/QUOTE]

How do you get such high performance numbers? The best I can get is 3.2 ms/it on my Vega, even overclocked to 1800MHz. Do I have to change the FFT size, and if so, how?
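The "bit under 2 days per exponent" figure in the quoted post can be reproduced from the listed timings. A rough sketch, assuming a PRP test at the ~76M wavefront takes about 76 million iterations at the quoted rate and ignoring checkpoint and startup overhead:

```python
# Per-iteration times at 4M FFT, copied from the quoted post.
TIMINGS_MS = {"Vega64": 1.63, "FuryX": 1.89, "R9-Nano": 2.05, "390x": 2.17}


def days_per_exponent(ms_per_iter: float, exponent: int = 76_000_000) -> float:
    """Rough days for one PRP test, assuming about `exponent` iterations."""
    seconds = exponent * ms_per_iter / 1000.0
    return seconds / 86_400.0


for card, ms in TIMINGS_MS.items():
    print(f"{card:8} {days_per_exponent(ms):.2f} days")
```

Under those assumptions the four cards land between roughly 1.4 and 1.9 days per exponent.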

preda 2018-01-19 12:41

[QUOTE=xx005fs;477799]How do you get such high performance numbers? The best I can get is 3.2 ms/it on my Vega, even overclocked to 1800MHz. Do I have to change the FFT size, and if so, how?[/QUOTE]

Those numbers were at 4M FFT, as stated. The 8M FFT is not performance-tuned yet.
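As a rough intuition for why an 8M FFT is slower per iteration than 4M even before tuning: under the textbook n·log2(n) cost model for an FFT, doubling the length from 4M to 8M words a bit more than doubles the work. This sketch ignores memory bandwidth, caching and kernel tuning, which matter a lot on GPUs, so treat its number as a ballpark rather than a prediction.

```python
import math


def fft_cost_ratio(n_big: int, n_small: int) -> float:
    """Ratio of n*log2(n) work between two FFT lengths (a crude cost model)."""
    return (n_big * math.log2(n_big)) / (n_small * math.log2(n_small))


M = 1 << 20  # 1M words
ratio = fft_cost_ratio(8 * M, 4 * M)  # = 2 * 23/22, about 2.09
print(f"8M vs 4M work ratio: {ratio:.2f}")
print(f"naive 8M estimate from 1.63 ms/it at 4M: {1.63 * ratio:.2f} ms/it")
```

On scaling alone, then, per-iteration times at 8M of roughly double those at 4M are to be expected.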


All times are UTC.