mersenneforum.org  

2017-12-15, 19:02   #243
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5421₁₀ Posts

Quote:
Originally Posted by M344587487
Can anyone with a 1080 and 1080ti post their 4096K iteration timings please? There are Vega and Fury X timings dotted about the thread, it would be interesting to compare.

edit: Am I being dumb for wanting to bench nvidia with an OpenCL program? Thinking about it the GTX cards are conspicuous in their absence in this thread.

Have you looked at the TF performance benchmarks at
http://www.mersenne.ca/mfaktc.php?sort=ghdpd

or the LL benchmarks at http://www.mersenne.ca/cudalucas.php?sort=gflops ?

The 75M column is what's relevant to the 4096K FFT length.

2017-12-15, 20:36   #244
M344587487
"Composite as Heck"
Oct 2017
3×5²×11 Posts

Thanks for the mountain of stats. What I was really after was a comparison between Vega 56/64 and 1080/Ti in their best-case prime hunting, so I should have picked CUDALucas as nvidia's best case straight away.

I don't know the best OpenCL card to fit a 60W limit, but I know that in the context of Monero mining the minimum efficient wattages are ~100W for the RX570, RX580 and Vega56, so AMD's best offering that might do the job is probably the RX560. One of AMD's GPU weaknesses tends to be power consumption, so with such a low power limit an nvidia card might be preferable, even if your workload is OpenCL.

2017-12-16, 23:07   #245
preda
"Mihai Preda"
Apr 2015
3×457 Posts

Quote:
Originally Posted by M344587487
Thanks for the mountain of stats. What I was really after was a comparison between Vega 56/64 and 1080/Ti in their best-case prime hunting, so I should have picked CUDALucas as nvidia's best case straight away.

I don't know the best OpenCL card to fit a 60W limit, but I know that in the context of Monero mining the minimum efficient wattages are ~100W for the RX570, RX580 and Vega56, so AMD's best offering that might do the job is probably the RX560. One of AMD's GPU weaknesses tends to be power consumption, so with such a low power limit an nvidia card might be preferable, even if your workload is OpenCL.

One data point: my air Vega64 clocked at 1401MHz is pulling 145W while doing 1.63ms/it PRP 4M.

Surprisingly low power usage for Vega, clearly in the range of RX580.
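
As a rough illustration of what that data point means in energy terms, here is a minimal sketch. It assumes the 145W reading is the average draw during the run and that a PRP test of a ~76M exponent takes about 76 million iterations; both are assumptions for illustration, and the function names are made up.

```python
def joules_per_iteration(watts: float, ms_per_iter: float) -> float:
    """Energy used by one iteration, in joules (W * s)."""
    return watts * ms_per_iter / 1000.0


def kwh_per_test(watts: float, ms_per_iter: float, iterations: int) -> float:
    """Energy for a whole test, in kWh (1 kWh = 3.6e6 J)."""
    return joules_per_iteration(watts, ms_per_iter) * iterations / 3.6e6


# Reported data point: 145 W at 1.63 ms/it.
# ASSUMPTION: ~76 million iterations for one test near the current wavefront.
print(f"{joules_per_iteration(145, 1.63):.3f} J per iteration")
print(f"{kwh_per_test(145, 1.63, 76_000_000):.2f} kWh per test")
```

Under those assumptions the figure comes out at roughly a quarter of a joule per iteration and about 5 kWh per test.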

2017-12-17, 14:33   #246
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
152D₁₆ Posts

Quote:
Originally Posted by M344587487
Thanks for the mountain of stats. What I was really after was a comparison between Vega 56/64 and 1080/Ti in their best-case prime hunting, so I should have picked CUDALucas as nvidia's best case straight away.

I don't know the best OpenCL card to fit a 60W limit, but I know that in the context of Monero mining the minimum efficient wattages are ~100W for the RX570, RX580 and Vega56, so AMD's best offering that might do the job is probably the RX560. One of AMD's GPU weaknesses tends to be power consumption, so with such a low power limit an nvidia card might be preferable, even if your workload is OpenCL.

You're welcome re the stats, and the real thanks go to James Heinrich for creating and maintaining those huge, searchable and filterable pages of benchmarks.

My interest in AMD < 60W is less about maximum output per watt-hour than about having something on which to test and run OpenCL software, attached to systems via 1x/16x PCIe extenders and mounted externally, so that I can safely exceed the case space and cooling limits. I have a pretty NVIDIA/Intel-centric fleet here. Some of the software seems to be GPU-specific within OpenCL (mfakto, as I recall). OpenCL on IGPs works in some cases and not others, and can have a negative effect on total system throughput. Quadro 2000s are getting scarce, so something like the 50W RX550 is an alternative at about the same speed as the Quadro 2000. So I have an RX 550 on order. The next whole system I bring up might have a GTX1080, or might have a Vega, inside the box, tbd.

Mfakto ini file excerpt:
# Different GPUs may have their best performance with different kernels
# Here, you can give a hint to mfakto on how to optimize the kernels.
#
# Possible values:
# GPUType=AUTO try to auto-detect, if that does not work: let me know
# GPUType=GCN Tahiti et al. (HD77xx-HD79xx), also assumed for unknown devices.
# GPUType=VLIW4 Cayman (HD69xx)
# GPUType=VLIW5 most other AMD GPUs (HD4xxx, HD5xxx, HD62xx-HD68xx)
# GPUType=APU all APUs (C-30 - C-60, E-240 - E-450, A2-3200 - A8-3870K) not sure if the "small" APUs would work better as VLIW5.
# GPUType=CPU all CPUs (when GPU not found, or forced to CPU)
# GPUType=NVIDIA reserved for Nvidia-OpenCL. Currently mapped to "CPU" and not yet functional on Nvidia Hardware.
# GPUType=INTEL reserved for Intel-OpenCL (e.g. HD4000). Not yet functional.
#
# Default: GPUType=AUTO
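
To make the setting concrete, here is a minimal sketch of reading a `GPUType=` line from ini-style text and checking it against the values listed in the excerpt. This is not mfakto's own parser; the function name `read_gpu_type` and the parsing rules (skip `#` comments, last setting wins, case-insensitive value) are assumptions for illustration only.

```python
# Values copied from the ini excerpt above.
VALID_GPU_TYPES = {"AUTO", "GCN", "VLIW4", "VLIW5", "APU", "CPU", "NVIDIA", "INTEL"}


def read_gpu_type(ini_text: str, default: str = "AUTO") -> str:
    """Return the GPUType setting from ini-style text (hypothetical helper)."""
    value = default
    for line in ini_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        key, sep, val = line.partition("=")
        if sep and key.strip() == "GPUType":
            value = val.strip().upper()  # ASSUMPTION: last occurrence wins
    if value not in VALID_GPU_TYPES:
        raise ValueError(f"unsupported GPUType: {value}")
    return value


print(read_gpu_type("# hint for kernel selection\nGPUType=GCN\n"))
```

With no `GPUType=` line present the sketch falls back to `AUTO`, matching the default stated in the excerpt.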

Last fiddled with by kriesel on 2017-12-17 at 14:47

2017-12-19, 22:21   #247
moebius
Jul 2009
Germany
607 Posts

I got the following error. Does anyone have an idea how to fix this?


C:\Users\name\Desktop\gpuowl-v1.9-94aa58f>gpuowl
gpuOwL v1.9- GPU Mersenne primality checker
GeForce GTX 560 Ti, 8x1800MHz


OpenCL compilation in 452 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=8171XXXXu -DWIDTH=2048u -DHEIGHT=2048u -DLOG_NWORDS=23u -DFP_DP=1 "
Note: using long carry kernels
PRP-3: FFT 8M (2048 * 2048 * 2) of 8171XXXX (9.74 bits/word)
Starting at iteration 0
error -5
Assertion failed!

Program: C:\Users\name\Desktop\gpuowl-v1.9-94aa58f\gpuowl.exe
File: clwrap.h, Line 234

Expression: check(clEnqueueReadBuffer(queue, buf, blocking, start, size, data, 0, __null, __null))

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

2017-12-21, 15:58   #248
preda
"Mihai Preda"
Apr 2015
1371₁₀ Posts

Quote:
Originally Posted by moebius
I got the following error. Does anyone have an idea how to fix this?

C:\Users\name\Desktop\gpuowl-v1.9-94aa58f>gpuowl
gpuOwL v1.9- GPU Mersenne primality checker
GeForce GTX 560 Ti, 8x1800MHz


OpenCL compilation in 452 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=8171XXXXu -DWIDTH=2048u -DHEIGHT=2048u -DLOG_NWORDS=23u -DFP_DP=1 "
Note: using long carry kernels
PRP-3: FFT 8M (2048 * 2048 * 2) of 8171XXXX (9.74 bits/word)
Starting at iteration 0
error -5
Assertion failed!

Program: C:\Users\name\Desktop\gpuowl-v1.9-94aa58f\gpuowl.exe
File: clwrap.h, Line 234

Expression: check(clEnqueueReadBuffer(queue, buf, blocking, start, size, data, 0, __null, __null))

Sorry, I don't know why this happens. The error code -5 is CL_OUT_OF_RESOURCES, but why you get that on clEnqueueReadBuffer I don't know.

Would you try with a lower exponent around 77M, to see if you get the same?
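
For readers decoding similar failures, a minimal sketch of mapping the numeric code to its name is below. The table is a small subset of the error constants defined in the OpenCL header `CL/cl.h`; the helper name `cl_error_name` is made up for illustration and is not part of gpuOwL. One possible reading, offered only as a guess and not a diagnosis, is that kernel launches are asynchronous, so a failure in an earlier kernel may surface at a later blocking call such as a buffer read.

```python
# Subset of OpenCL error codes as defined in CL/cl.h.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -30: "CL_INVALID_VALUE",
    -54: "CL_INVALID_WORK_GROUP_SIZE",
}


def cl_error_name(code: int) -> str:
    """Translate a numeric OpenCL status into its symbolic name (hypothetical helper)."""
    return OPENCL_ERRORS.get(code, f"unknown OpenCL error {code}")


print(cl_error_name(-5))
```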

2017-12-21, 22:14   #249
moebius
Jul 2009
Germany
1001011111₂ Posts

Similar error, but not the same error output:

gpuOwL v1.9- GPU Mersenne primality checker
GeForce GTX 560 Ti, 8x1800MHz

OpenCL compilation in 1684 ms, with "-I. -cl-fast-relaxed-math -cl-s
EXP=60000877u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=
PRP-3: FFT 4M (1024 * 2048 * 2) of 60000877 (14.31 bits/word)
Starting at iteration 0
error -5 (carryConv)
Assertion failed!

Program: C:\Users\name\Desktop\gpuowl-v1.9-94aa58f\gpuowl.exe
File: clwrap.h, Line 230

Expression: check(clEnqueueNDRangeKernel(queue, kernel, 1, __null, &
roupSize, 0, __null, __null), name.c_str())

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
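
The "bits/word" figure in the PRP-3 log lines appears to be the exponent divided by the number of FFT words (width × height × 2). A minimal sketch that reproduces the 14.31 value from the second log follows; the function name is made up, and the formula is inferred from the log output rather than taken from the gpuOwL source.

```python
def bits_per_word(exponent: int, width: int, height: int) -> float:
    """Average number of bits of the exponent stored per FFT word."""
    n_words = width * height * 2  # e.g. 1024 * 2048 * 2 = 4M words
    return exponent / n_words


# Values from the log: FFT 4M (1024 * 2048 * 2) of 60000877
print(f"{bits_per_word(60000877, 1024, 2048):.2f} bits/word")
```

By the same formula, any exponent run at the 8M FFT (2048 × 2048 × 2) has half the bits/word it would have at 4M, which fits the 9.74 figure in the first log.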

2017-12-22, 05:07   #250
preda
"Mihai Preda"
Apr 2015
3·457 Posts

Quote:
Originally Posted by moebius
Similar error, but not the same error output:

It seems it's not working on Nvidia for some reason. Maybe too many VGPRs requested by the compiler, or too much shared memory, or who knows what. I may look into this myself when I get an Nvidia GPU to test on. In the meantime you can use CUDALucas.

2017-12-23, 21:12   #251
moebius
Jul 2009
Germany
607 Posts

Quote:
Originally Posted by preda
I may look into this myself when I get an Nvidia GPU to test on. In the meantime you can use CUDALucas.

I also think that only the low-level, hardware-specific instructions need to be adapted.
The future success of PRP depends on whether all types of graphics cards are supported.

2018-01-18, 02:12   #252
xx005fs
"Eric"
Jan 2018
USA
324₈ Posts

Quote:
Originally Posted by preda
Some performance data that I see on my hardware, at 4M FFT (adequate for the current wavefront around 76M).

This is with ROCm 1.6-180. ROCm seems to generate better optimized code compared to AMDGPU-pro, so in general better performance. All hardware is standard, air-cooled, nothing changed.

Vega64: 1.63 ms/it (under-clocked to 1401MHz for thermal reasons)
FuryX : 1.89 ms/it
R9-Nano: 2.05 ms/it (the card downclocks itself for thermal reasons)
390x: 2.17 ms/it

Broadly speaking this comes out to a bit under 2 days per exponent.

How do you get such high performance numbers? The best I can get is 3.2 ms/it on my Vega, even overclocked to 1800MHz. Do I have to change the FFT size, and if so, how?

2018-01-19, 12:41   #253
preda
"Mihai Preda"
Apr 2015
3·457 Posts

Quote:
Originally Posted by xx005fs
How do you get such high performance numbers? The best I can get is 3.2 ms/it on my Vega, even overclocked to 1800MHz. Do I have to change the FFT size, and if so, how?

Those numbers were with 4M FFT, as stated. The 8M FFT is not performance-tuned yet.
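
For context, the "a bit under 2 days per exponent" estimate for the 4M FFT timings can be checked with a short calculation. This is a minimal sketch that assumes a test of a ~76M exponent needs about 76 million iterations and ignores checkpointing and other overhead; the names are made up for illustration.

```python
# ms/it figures at 4M FFT as reported in the thread.
MS_PER_ITER = {"Vega64": 1.63, "FuryX": 1.89, "R9-Nano": 2.05, "390x": 2.17}

# ASSUMPTION: one iteration per bit of the exponent, exponent near 76M.
EXPONENT = 76_000_000
SECONDS_PER_DAY = 86_400


def days_per_test(ms_per_iter: float, exponent: int = EXPONENT) -> float:
    """Wall-clock days for one test at a constant iteration time."""
    return ms_per_iter * 1e-3 * exponent / SECONDS_PER_DAY


for card, ms in MS_PER_ITER.items():
    print(f"{card}: {days_per_test(ms):.2f} days")
```

Under those assumptions all four cards land between roughly 1.4 and 1.9 days, while 3.2 ms/it at the same exponent would be closer to 2.8 days.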
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.