mersenneforum.org  

Old 2019-06-04, 15:33   #1233
SELROC
 

2^5·43 Posts

Quote:
Originally Posted by kriesel View Post
I read through the commit listings back to mid-January and saw that Preda had acknowledged numerous contributions there from several individuals. A crude summary follows:
Code:
valeriob01  -w argument; readme.md work; description of cmd line arguments 
            & updates, display of parameters; primenet.py date & time;
            makefile fix
 
k3ack3r     fix some msys2 warnings; update makefile

chengsun    fix alignment violation causing OUT_OF_RESOURCES error on NVIDIA
            GPUs
 
sillygitter add -iters argument

gwoltman    allow making small test kernels; new X2 definition; fft8 cleanup +
            documentation; new sq macro; overhaul/comment fft5/fft10 macros;
            improved pairSq and pairMul; faster 6m fft using new fft12 middle;
            new 5.5m fft using new fft11 middle; increased precision of fft11 
            constants; inline X2; fft7 middle step; shorter multiply chains in
            middle
Thanks to you all!

Thanks, though the readme.md work is incomplete. The argument listing has been growing and I haven't had time to follow.
Old 2019-06-05, 09:33   #1234
SELROC
 

5^2·17·23 Posts

Quote:
Originally Posted by kriesel View Post
The latest makefile seems to get the strip step right on Windows; it requires specifying the target as gpuowl-win.exe.
Code:
$ make gpuowl-win.exe
cat head.txt gpuowl.cl tail.txt > gpuowl-wrap.cpp
echo \"`git describe --long --dirty --always`\" > version.new
diff -q -N version.new version.inc >/dev/null || mv version.new version.inc
echo Version: `cat version.inc`
Version: "v6.5-75-g4902439-dirty"
g++ -MT Pm1Plan.o -MMD -MP -MF .d/Pm1Plan.Td -Wall -O2 -std=c++17   -c -o Pm1Plan.o Pm1Plan.cpp
g++ -MT GmpUtil.o -MMD -MP -MF .d/GmpUtil.Td -Wall -O2 -std=c++17   -c -o GmpUtil.o GmpUtil.cpp
g++ -MT Worktodo.o -MMD -MP -MF .d/Worktodo.Td -Wall -O2 -std=c++17   -c -o Worktodo.o Worktodo.cpp
g++ -MT common.o -MMD -MP -MF .d/common.Td -Wall -O2 -std=c++17   -c -o common.o common.cpp
g++ -MT main.o -MMD -MP -MF .d/main.Td -Wall -O2 -std=c++17   -c -o main.o main.cpp
g++ -MT Gpu.o -MMD -MP -MF .d/Gpu.Td -Wall -O2 -std=c++17   -c -o Gpu.o Gpu.cpp
g++ -MT clwrap.o -MMD -MP -MF .d/clwrap.Td -Wall -O2 -std=c++17   -c -o clwrap.o clwrap.cpp
g++ -MT Task.o -MMD -MP -MF .d/Task.Td -Wall -O2 -std=c++17   -c -o Task.o Task.cpp
g++ -MT checkpoint.o -MMD -MP -MF .d/checkpoint.Td -Wall -O2 -std=c++17   -c -o checkpoint.o checkpoint.cpp
g++ -MT timeutil.o -MMD -MP -MF .d/timeutil.Td -Wall -O2 -std=c++17   -c -o timeutil.o timeutil.cpp
g++ -MT Args.o -MMD -MP -MF .d/Args.Td -Wall -O2 -std=c++17   -c -o Args.o Args.cpp
g++ -MT state.o -MMD -MP -MF .d/state.Td -Wall -O2 -std=c++17   -c -o state.o state.cpp
g++ -MT Signal.o -MMD -MP -MF .d/Signal.Td -Wall -O2 -std=c++17   -c -o Signal.o Signal.cpp
g++ -MT FFTConfig.o -MMD -MP -MF .d/FFTConfig.Td -Wall -O2 -std=c++17   -c -o FFTConfig.o FFTConfig.cpp
g++ -MT clpp.o -MMD -MP -MF .d/clpp.Td -Wall -O2 -std=c++17   -c -o clpp.o clpp.cpp
g++ -MT gpuowl-wrap.o -MMD -MP -MF .d/gpuowl-wrap.Td -Wall -O2 -std=c++17   -c -o gpuowl-wrap.o gpuowl-wrap.cpp
g++ -o gpuowl-win.exe Pm1Plan.o GmpUtil.o Worktodo.o common.o main.o Gpu.o clwrap.o Task.o checkpoint.o timeutil.o Args.o state.o Signal.o FFTConfig.o clpp.o gpuowl-wrap.o -lstdc++fs -lOpenCL -lgmp -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L. -static
strip gpuowl-win.exe
What does it mean that it's labeled dirty? Perhaps that the conversion to u32 is not complete?

Code:
>gpuowl-win -prp 3321928097
2019-06-03 14:22:00 gpuowl v6.5-75-g4902439-dirty
2019-06-03 14:22:00 Exception St12out_of_range: stol
2019-06-03 14:22:00 Bye

>gpuowl-win -prp 2147483659
2019-06-03 14:28:16 gpuowl v6.5-75-g4902439-dirty
2019-06-03 14:28:17 Exception St12out_of_range: stol
2019-06-03 14:28:17 Bye

>gpuowl-win -prp 2147483647 -use FMA_X2
2019-06-03 14:29:52 gpuowl v6.5-75-g4902439-dirty
2019-06-03 14:29:52 Note: no config.txt file found
2019-06-03 14:29:52 config: -prp 2147483647 -use FMA_X2
2019-06-03 14:29:52 2147483647 FFT 147456K: Width 512x8, Height 256x8, Middle 9; 14.22 bits/word
2019-06-03 14:29:52 using long carry kernels
2019-06-03 14:30:00 OpenCL args "-DEXP=2147483647u -DWIDTH=4096u -DSMALL_HEIGHT=2048u -DMIDDLE=9u -DWEIGHT_STEP=0xd.b745787f2c4cp-3 -DIWEIGHT_STEP=0x9.550d2c9e837e8p-4 -DWEIGHT_BIGSTEP=0x8.b95c1e3ea8bd8p-3 -DIWEIGHT_BIGSTEP=0xe.ac0c6e7dd2438p-4 -DFMA_X2=1 -DFMA_X2=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-06-03 14:30:04 OpenCL compilation in 4704 ms
2019-06-03 14:30:28 2147483647.owl not found, starting from the beginning.
2019-06-03 14:42:03 2147483647 OK     2000  0.00%; 162.835 ms/sq; ETA 4047d 06:30; fb12c8169932aa03 (check 172.72s)
(Above was on an RX480.
2147483647 < 2^31 < 2147483659;
log10(2^3321928097 - 1) > 10^9)
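
For anyone who wants to check the numbers in the quoted note, here is a small Python sketch. The helper names are made up and are not part of gpuowl. It assumes the stol exception comes from parsing the exponent into a 32-bit long (long is 32 bits on Windows toolchains, and St12out_of_range is the mangled name of std::out_of_range); the log itself does not confirm that. The last lines recompute the bits/word figure from the FFT size shown in the log.

```python
import math

INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer


def fits_in_int32(n: int) -> bool:
    """True if n is representable as a signed 32-bit integer."""
    return -2**31 <= n <= INT32_MAX


def mersenne_decimal_digits(p: int) -> int:
    """Decimal digit count of 2**p - 1, without building the number.

    2**p is never a power of 10 for p >= 1, so 2**p - 1 has the same
    number of digits as 2**p.
    """
    return math.floor(p * math.log10(2)) + 1


# The three exponents tried in the quoted post.
for exponent in (2147483647, 2147483659, 3321928097):
    print(exponent, fits_in_int32(exponent), mersenne_decimal_digits(exponent))

# Bits per FFT word for the run that did start ("FFT 147456K" in the log).
fft_words = 147456 * 1024
print(round(2147483647 / fft_words, 2))
```

Only 2147483647 fits in 32 bits, which matches the one command line that got past argument parsing.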

The -dirty version tag means you have local modifications. You can drop them with "git stash" before upgrading.
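
As an illustration of how that version string breaks down, here is a small Python sketch that splits the output of `git describe --long --dirty` into its parts. The class and function names are hypothetical; per the make output above, gpuowl itself only echoes the string into version.inc.

```python
import re
from typing import NamedTuple


class Describe(NamedTuple):
    tag: str                # most recent reachable tag, e.g. "v6.5"
    commits_since_tag: int  # commits on top of that tag
    short_hash: str         # abbreviated commit hash (after the "g" prefix)
    dirty: bool             # True if the working tree had local modifications


# Layout produced by `git describe --long --dirty`: <tag>-<n>-g<hash>[-dirty]
_PATTERN = re.compile(
    r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<hash>[0-9a-f]+)(?P<dirty>-dirty)?$"
)


def parse_describe(version: str) -> Describe:
    """Parse a describe string, tolerating the surrounding quotes."""
    match = _PATTERN.match(version.strip().strip('"'))
    if match is None:
        # Happens e.g. for the bare-hash fallback that --always can produce.
        raise ValueError(f"unrecognised describe string: {version!r}")
    return Describe(
        match["tag"],
        int(match["count"]),
        match["hash"],
        match["dirty"] is not None,
    )


print(parse_describe('"v6.5-75-g4902439-dirty"'))
```

So the build in the quoted log is 75 commits past the v6.5 tag, at commit 4902439, with uncommitted local changes.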
Old 2019-06-05, 13:49   #1235
preda
 
 
"Mihai Preda"
Apr 2015

3·457 Posts

After a recent commit, it seems it's now fine to upgrade to ROCm 2.5 (which will be released soon); i.e., the performance degradation that appeared in ROCm 2.3 has been worked around in GpuOwl.
Old 2019-06-06, 15:36   #1236
SELROC
 

110010001011_2 Posts

Quote:
Originally Posted by preda View Post
After a recent commit, it seems it's now fine to upgrade to ROCm 2.5 (which will be released soon); i.e., the performance degradation that appeared in ROCm 2.3 has been worked around in GpuOwl.

332299993 is not prime, with ROCm 2.4:


Code:
2019-06-06 17:13:37 RadeonVII 332299993    332290000 100.00%; 3447 us/sq; ETA 0d 00:01; f6bea554d5dd44f0
2019-06-06 17:14:12 RadeonVII CC 332299992 / 332299993, e7ad0dddd78cd94c
2019-06-06 17:14:14 RadeonVII 332299993 OK 332300000 100.00%; 3500 us/sq; ETA 0d 00:00; 5254b6ede6bf9ca1 (check 1.86s)

https://www.mersenne.org/report_expo...2299993&full=1
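
The per-iteration timings in these logs translate directly into run times. A rough Python sketch, assuming about one squaring per bit of the exponent and a constant time per squaring. The helper name is hypothetical, and this is not how gpuowl computes its own ETA.

```python
def format_eta(remaining_iterations: int, seconds_per_iteration: float) -> str:
    """Format remaining time as 'Dd HH:MM', truncating to whole minutes."""
    total_minutes = int(remaining_iterations * seconds_per_iteration // 60)
    days, rest = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rest, 60)
    return f"{days}d {hours:02d}:{minutes:02d}"


# Full PRP test of 332299993 at the 3447 us/sq shown for the Radeon VII.
print(format_eta(332299993, 3447e-6))

# Remaining work for 2147483647 after 2000 iterations at 162.835 ms/sq (RX480).
print(format_eta(2147483647 - 2000, 162.835e-3))
```

The first figure comes out at roughly 13 days for the whole test. The second lands in the same 4047-day range as the ETA in the RX480 log quoted earlier in the thread.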

Last fiddled with by SELROC on 2019-06-06 at 15:37
Old 2019-06-17, 13:31   #1237
mnd9
 
Jun 2019
Boston, MA

3×13 Posts

I'm a bit of a GPU computing newbie, but was hoping to get some use out of a VERY old GPU I had lying around -- it's an ATI Radeon HD 4650.

I was able to get mfakto up and running and generate about 25-30 GHz-days/day of output, but it's slowing my Prime95 output a bit, so it's on pause for now.

I was wondering if there is any way to get gpuOwL running and attempt a PRP test on this old card? It seems to support OpenCL 1.1, so I'm not sure whether it meets the minimum specs.

Any help would be greatly appreciated!
Old 2019-06-17, 15:13   #1238
ET_
Banned
 
 
"Luigi"
Aug 2002
Team Italia

11322_8 Posts

Quote:
Originally Posted by mnd9 View Post
I'm a bit of a GPU computing newbie, but was hoping to get some use out of a VERY old GPU I had lying around -- it's an ATI Radeon HD 4650.

I was able to get mfakto up and running and generate about 25-30 GHz-days/day of output, but it's slowing my Prime95 output a bit, so it's on pause for now.

I was wondering if there is any way to get gpuOwL running and attempt a PRP test on this old card? It seems to support OpenCL 1.1, so I'm not sure whether it meets the minimum specs.

Any help would be greatly appreciated!
IIRC, the minimum required for gpuOwL was OpenCL 1.2.
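
A quick way to compare a card against that requirement is to parse the version string the OpenCL driver reports. The sketch below assumes the CL_DEVICE_VERSION layout from the OpenCL specification ("OpenCL <major>.<minor> <vendor info>") and takes the 1.2 minimum from the post above, which was itself stated from memory. The sample strings and function names are illustrative only.

```python
import re
from typing import Tuple


def parse_opencl_version(device_version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a CL_DEVICE_VERSION-style string."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", device_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {device_version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(device_version: str, minimum: Tuple[int, int] = (1, 2)) -> bool:
    """Compare the reported version against an assumed minimum of OpenCL 1.2."""
    return parse_opencl_version(device_version) >= minimum


# Made-up sample strings; a real driver reports its own vendor suffix.
for sample in ("OpenCL 1.1 example-vendor",
               "OpenCL 1.2 example-vendor",
               "OpenCL 2.0 example-vendor"):
    print(sample, meets_minimum(sample))
```

Comparing tuples rather than raw strings keeps a version such as 1.10 from sorting below 1.2.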
Old 2019-06-17, 16:06   #1239
mnd9
 
Jun 2019
Boston, MA

3×13 Posts

Quote:
Originally Posted by ET_ View Post
IIRC, the minimum required for gpuOwL was OpenCL 1.2.
I see -- is there another software package to implement LL/PRP on AMD GPUs or am I only able to do TF work on this card?
Old 2019-06-17, 20:51   #1240
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

152B_16 Posts

Quote:
Originally Posted by mnd9 View Post
I see -- is there another software package to implement LL/PRP on AMD GPUs or am I only able to do TF work on this card?
See http://www.mersenneforum.org/showpos...91&postcount=2
https://www.mersenneforum.org/showthread.php?t=23401
and https://www.mersenneforum.org/showpo...35&postcount=2

You could try cllucas, but your gpu is old and slow, and cllucas is about half the speed of gpuowl for the same hardware and parameters; it also lacks both the Gerbicz error check and the Jacobi LL check. The same 86M primality test that would take 3.8 days on an RX480 with gpuowl may (if it successfully runs) take around 4.5 months on your gpu.
I never did get an answer to https://mersenneforum.org/showpost.p...&postcount=425 re what OpenCL level cllucas requires.

Maybe this is the justification you're looking for to upgrade your gpu. Either way, try tuning your mfakto.ini gpu sieving parameters.

Welcome to the hunt.

Last fiddled with by kriesel on 2019-06-17 at 20:54
Old 2019-06-18, 13:49   #1241
mnd9
 
Jun 2019
Boston, MA

3×13 Posts

Quote:
Originally Posted by kriesel View Post
See http://www.mersenneforum.org/showpos...91&postcount=2
https://www.mersenneforum.org/showthread.php?t=23401
and https://www.mersenneforum.org/showpo...35&postcount=2

You could try cllucas, but your gpu is old and slow, and cllucas is about half the speed of gpuowl for the same hardware and parameters; it also lacks both the Gerbicz error check and the Jacobi LL check. The same 86M primality test that would take 3.8 days on an RX480 with gpuowl may (if it successfully runs) take around 4.5 months on your gpu.
I never did get an answer to https://mersenneforum.org/showpost.p...&postcount=425 re what OpenCL level cllucas requires.

Maybe this is the justification you're looking for to upgrade your gpu. Either way, try tuning your mfakto.ini gpu sieving parameters.

Welcome to the hunt.
Thanks so much for your detailed response -- I may put GPU computing on pause for now until I get a better card, as you suggested, since it's hurting my Prime95 throughput for not much gain.

On a somewhat unrelated topic, I recently discovered a very basic mistake I had made when building my PC years back; correcting it literally doubled my Prime95 output. I had erroneously installed my RAM DIMMs in single-channel mode by placing them in adjacent slots. When I moved the sticks to slots #1 and #3, my mobo instantly switched to dual-channel mode, and my ms/iter literally halved!! Apparently memory bandwidth was really bottlenecking my throughput.

Does anyone have a sense of whether adding additional RAM always contributes to Prime95 throughput? E.g. I currently have 2 x 8GB DDR3 memory @ 1333 MHz -- would adding another 2 sticks help? Is it safe to assume memory bandwidth is always limiting?

Is there an easy way to test whether my current limiting factor for throughput is CPU or memory bandwidth?
Old 2019-06-18, 14:12   #1242
M344587487
 
 
"Composite as Heck"
Oct 2017

1100111001_2 Posts

Quote:
Originally Posted by mnd9 View Post
...

Does anyone have a sense of whether adding additional RAM always contributes to Prime95 throughput? E.g. I currently have 2 x 8GB DDR3 memory @ 1333 MHz -- would adding another 2 sticks help? Is it safe to assume memory bandwidth is always limiting?

Is there an easy way to test whether my current limiting factor for throughput is CPU or memory bandwidth?

Adding more RAM will not help in a meaningful way (maybe single-digit percentage improvements, if anything); swapping to faster RAM such as 1600 MHz should. The thing that doubled your speed was going from single channel to dual channel, and desktop motherboards top out at dual channel. HEDT platforms range from triple to hex channel, and server platforms go from quad to eight channel; Intel has a 12-channel part, but it costs more than my car and is more like dual hex channel anyway. Memory bandwidth isn't always limiting, but for any modern Intel desktop quad core or better it's probably pushing the limits of whatever speeds a motherboard of that era supports.
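
To put rough numbers on the channel count: theoretical peak bandwidth is approximately transfer rate times bus width times number of channels. A back-of-the-envelope Python sketch, assuming a 64-bit bus per channel and reading the 1333 and 1600 figures as DDR3 transfer rates in MT/s. The function name is made up for illustration.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float,
                        channels: int,
                        bus_width_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


# Configurations discussed in the thread.
for label, rate, channels in (
    ("DDR3-1333, single channel", 1333, 1),
    ("DDR3-1333, dual channel", 1333, 2),
    ("DDR3-1600, dual channel", 1600, 2),
):
    print(f"{label}: {peak_bandwidth_gb_s(rate, channels):.1f} GB/s")
```

These are upper bounds; sustained bandwidth in practice is lower. The doubling from single to dual channel is in line with the halved ms/iter reported above, while a move from 1333 to 1600 would add at most about 20%.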
Old 2019-06-18, 14:29   #1243
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,419 Posts

Quote:
Originally Posted by mnd9 View Post
Thanks so much for your detailed response -- I may put GPU computing on pause for now until I get a better card, as you suggested, since it's hurting my Prime95 throughput for not much gain.
mfakto on a discrete gpu, and configured for gpu sieving, should have very little effect on prime95 throughput. Check whether your mfakto.ini is configured to use CPU sieving instead of GPU sieving of trial factor candidates. Also check whether all your cooling fans are working and system ventilation is adequate.

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.