mersenneforum.org  

2020-06-22, 19:30   #2355
ewmayer
2ω=0
 
Sep 2002
República de California

9,791 Posts

Quote:
Originally Posted by Uncwilly
gpuOwl is all video card based, IIRC.
That is the idea, but Jan S, please see my note in the OP of the neighboring how-to-for-Linux thread about running the program with -maxAlloc set according to your card's (usable) memory capacity and the number of jobs you are running on the card. On my R7 with 16GB, running without that flag led to out-of-memory occurrences when both jobs happened to be running the memory-hungry stage 2 of P-1 at the same time. That led the 'card OS' to swap to system RAM, causing a massive, order-of-100x slowdown.

Mike/Xyzzy has reported that this can happen even on cards running just 1 instance, e.g. if the card is also driving the user's display, in which case the available HBM may be less than advertised, possibly significantly so. Mike reported he needed to set -maxAlloc to something like 1GB less than his card's memory.

I suggest using -maxAlloc to make sure the instances running on a given card use, in total, no more than 90% of the card's memory.
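As a rough illustration of that budget, here is a minimal shell sketch (max_alloc_per_instance is a hypothetical name, not part of gpuowl) that splits 90% of a card's usable memory evenly across the instances sharing it. It assumes -maxAlloc takes a plain number of MB, as in the "-maxAlloc 7500" config lines later in this thread; if the card also drives a display, pass its memory minus what the display needs.

```shell
#!/bin/bash
# Hypothetical helper: per-instance -maxAlloc value (MB) so that all gpuowl
# instances on one card together stay within a percentage of its memory.
# Assumes -maxAlloc is given in MB (e.g. "-maxAlloc 7500").
max_alloc_per_instance() {
    local card_mb=$1     # usable card memory in MB
    local instances=$2   # number of gpuowl instances sharing the card
    local pct=${3:-90}   # total budget as a percentage of card memory
    echo $(( card_mb * pct / 100 / instances ))
}

# 16 GB Radeon VII running two instances:
max_alloc_per_instance 16384 2                  # -> 7372
# Same card, one instance, with ~1 GB held back for driving a display:
max_alloc_per_instance $(( 16384 - 1024 )) 1    # -> 13824
```

Integer division rounds down, which errs on the safe side of the budget.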

Last fiddled with by ewmayer on 2020-06-22 at 19:33
2020-06-22, 22:53   #2356
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2³·577 Posts

Quote:
Originally Posted by Jan S
When I use GPUowl for P-1 factoring, does stage 2 use system RAM or video RAM?
VRAM, under normal circumstances, except for the GCD and miscellaneous activity. On-GPU memory bandwidth is much higher than what is available through the PCIe interface.

Here's what I think the distribution is: gpuowl LL, PRP, P-1 stage 1, and P-1 stage 2 computations occur on the GPU cores and in GPU VRAM, except for the following, which occur on the CPU and in system RAM:
- part of the Gerbicz check comparison for PRP
- file save and load, and file checksum generation or comparison
- setup, and transfers to / from the GPU
- the Jacobi check for LL
- the P-1 stage 1 GCD and the P-1 stage 2 GCD
- housekeeping functions such as console output and logging
The GCDs use one CPU core. If there is other work to do, the GPU is given that work to do in parallel while the GCD runs.

Last fiddled with by kriesel on 2020-06-22 at 22:59
2020-06-25, 07:17   #2357
paulunderwood
 
Sep 2002
Database er0rr

D76₁₆ Posts

Quote:
Originally Posted by paulunderwood
I just upgraded to ROCm-3.5.1 and clinfo thereafter said 0 devices found. Now trying to go back to 3.3.

After more shenanigans, i.e. apt-get autoremove rocm*, changing the apt sources file, and apt-get install rocm-dev3.3.0, I got clinfo to return to reason. I changed the gpuOwl Makefile and recompiled.

2 instances at 5.5M FFT with sclk 4
3.5.0: 1312 µs/it
3.3.0: 1248 µs/it

A huge difference!
I managed to get ROCm 3.5.1 running and installed its rocm-smi. The timing is:

3.5.1: 1250 µs/it
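To put that difference in numbers, a small shell sketch (the helper name pct_slower is made up for illustration) computing how much slower each ROCm version is than 3.3.0, from the µs/it timings above. It uses awk because bash arithmetic is integer-only.

```shell
#!/bin/bash
# Percent by which timing $1 (µs/it) is slower than baseline $2 (µs/it).
# awk does the floating-point math; LC_ALL=C forces '.' as the decimal point.
pct_slower() {
    LC_ALL=C awk -v t="$1" -v base="$2" 'BEGIN { printf "%.1f\n", (t - base) / base * 100 }'
}

echo "3.5.0: $(pct_slower 1312 1248)% slower than 3.3.0"   # 5.1%
echo "3.5.1: $(pct_slower 1250 1248)% slower than 3.3.0"   # 0.2%
```

So 3.5.0 was about 5% slower per iteration, while 3.5.1 is essentially on par with 3.3.0.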

Last fiddled with by paulunderwood on 2020-06-26 at 00:26
2020-06-25, 19:33   #2358
ewmayer
2ω=0
 
Sep 2002
República de California

9791₁₀ Posts

Quote:
Originally Posted by kriesel
The gcds use one cpu core. If there is other work to do, the gpu is given that to do in parallel while the gcd is run.
Mihai, is that true? Are you using the GNU MP GCD and running it on the CPU?
2020-06-25, 20:11   #2359
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1001000001000₂ Posts

If the way the GCD is done has changed since https://www.mersenneforum.org/showpo...&postcount=946 or https://www.mersenneforum.org/showpo...postcount=1323, it would be good to know.
(Found these by computer search for "gcd" through my offline notes for this thread.)
GCD on the CPU is the same approach CUDAPm1 took. I haven't encountered any GPU implementation of GCD in GIMPS software.
But GPU GCD has been used for RSA: http://www.cs.hiroshima-u.ac.jp/cs/_...apdcm15gcd.pdf and for polynomials: https://domino.mpi-inf.mpg.de/intranet/ag1/ag1publ.nsf/0/4c78b0f12389e4e8c125788600538d92/$FILE/paper.pdf

Last fiddled with by kriesel on 2020-06-25 at 20:21
2020-06-26, 06:10   #2360
preda
 
"Mihai Preda"
Apr 2015

50B₁₆ Posts

Quote:
Originally Posted by ewmayer
Mihai, is that true? Are you using the Gnu MP gcd, and running that on the CPU?
Yes, the GCD is done on the CPU using GNU MP. It's a convenient solution from the coding point of view. The GCD is infrequent, and one GCD takes on the order of 1 min on one CPU core; no big deal.

Porting the fancy GCD algorithm to the GPU would be a lot of work. It would be worth it if somebody were doing mainly GCDs, but that's not the case for gpuowl at the moment.
2020-06-26, 21:40   #2361
Xyzzy
 
"Mike"
Aug 2002

1E1A₁₆ Posts

Warning!

We are not-so-good at programming!

The following code works but it is a brute-force approach and highly un-optimized!

Xyzzy's "Fan Monitor v0.01":

Code:
#!/bin/bash
# Simple fan controller for two AMD cards: every second, nudge each card's
# fan PWM (0-255) up by one if it is hotter than the target, down by one if cooler.
# The hwmon3/hwmon4 indices are specific to this machine; check
# /sys/class/drm/cardN/device/hwmon/ on yours (they may change between boots).

H0=/sys/class/drm/card0/device/hwmon/hwmon3
H1=/sys/class/drm/card1/device/hwmon/hwmon4

# Target temperature in millidegrees C, the unit of temp2_input (90000 = 90 C).
z=90000

# Put both fans under manual PWM control (1 = manual, 2 = automatic).
echo 1 > $H0/pwm1_enable
echo 1 > $H1/pwm1_enable

# On Ctrl-C or kill, hand fan control back to the driver before exiting.
trap "echo 2 > $H0/pwm1_enable; echo 2 > $H1/pwm1_enable; exit" INT TERM

# Start both fans at full speed.
f0=255
f1=255

echo $f0 > $H0/pwm1
echo $f1 > $H1/pwm1

while true
do
    t0=$(cat $H0/temp2_input)
    t1=$(cat $H1/temp2_input)

    # Status line; remove once you are happy that it works.
    echo card0 $t0 $f0 card1 $t1 $f1

    # Too hot: one step faster (max 255). Too cool: one step slower (min 0).
    # Exactly on target: leave the fan speed unchanged.
    if (( t0 > z )); then
        f0=$(( f0 < 255 ? f0 + 1 : 255 ))
    elif (( t0 < z )); then
        f0=$(( f0 > 0 ? f0 - 1 : 0 ))
    fi

    if (( t1 > z )); then
        f1=$(( f1 < 255 ? f1 + 1 : 255 ))
    elif (( t1 < z )); then
        f1=$(( f1 > 0 ? f1 - 1 : 0 ))
    fi

    echo $f0 > $H0/pwm1
    echo $f1 > $H1/pwm1

    sleep 1
done
Note that you can fill in your desired temperature, sleep interval, starting value, step value and all sorts of stuff.

We have this set up for two cards.

If your video card catches on fire please don't yell at us!

Also, after you are happy that it works, you can remove the statement that prints stuff to the screen.
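If you want to tune the step value or add more cards without copying the if/elif block for each one, the update rule can be factored into a function. A minimal sketch, assuming the same 0-255 PWM range and millidegree temperatures as the script above; next_pwm is a hypothetical name, not part of any tool.

```shell
#!/bin/bash
# next_pwm TEMP TARGET PWM [STEP]
#   TEMP, TARGET: temperatures in millidegrees C (as read from temp2_input)
#   PWM:          current fan PWM value, 0-255
#   STEP:         PWM change per interval (default 1)
# Prints the new PWM value: STEP higher if too hot, STEP lower if too cool,
# unchanged if exactly on target, clamped to 0-255.
next_pwm() {
    local temp=$1 target=$2 pwm=$3 step=${4:-1}
    if [ "$temp" -gt "$target" ]; then
        pwm=$((pwm + step))
    elif [ "$temp" -lt "$target" ]; then
        pwm=$((pwm - step))
    fi
    [ "$pwm" -gt 255 ] && pwm=255
    [ "$pwm" -lt 0 ] && pwm=0
    echo "$pwm"
}

# Inside the monitoring loop, each card's if/elif block then reduces to, e.g.:
#   f0=$(next_pwm "$t0" "$z" "$f0" 2)
#   f1=$(next_pwm "$t1" "$z" "$f1" 2)
```

A larger step reacts faster to load changes but makes the fan speed oscillate more around the target.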

2020-07-20, 02:26   #2362
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

4616₁₀ Posts

First attempt to verify, on an RX550, a PRP proof produced by an RX480 running the same software version:

Code:
2020-07-19 21:18:50 gpuowl v6.11-340-g41d435f
2020-07-19 21:18:50 config: -device 1 -user kriesel -cpu condorella/r550 -yield -maxAlloc 7500
2020-07-19 21:18:50 config: -device 1 -verify 137000009
2020-07-19 21:18:50 device 1, unique id ''
2020-07-19 21:18:50 condorella/r550 - 137000009\proof
2020-07-19 21:18:50 condorella/r550 Can't open '1' (mode 'rb')
2020-07-19 21:18:50 condorella/r550 Exception NSt10filesystem7__cxx1116filesystem_errorE: filesystem error: can't open file: No error [137000009]
2020-07-19 21:18:50 condorella/r550 Bye
Code:
Directory of C:\msys64\home\...\gpuowl-v6.11-340-g41d435f\rx550\137000009\proof
          ..
07/19/2020  12:06 PM       154,125,076 137000009-8.proof
Examination of the proof file indicates it does not contain the ASCII header shown in the file spec linked in https://www.mersenneforum.org/showpo...6&postcount=87

Last fiddled with by kriesel on 2020-07-20 at 02:39
2020-07-20, 11:26   #2363
preda
 
"Mihai Preda"
Apr 2015

1,291 Posts

Quote:
Originally Posted by kriesel
Examination of the proof file indicates it does not contain the ASCII header shown in the file spec linked in https://www.mersenneforum.org/showpo...6&postcount=87
What header does it contain? No secret there; feel free to post the few textual lines at the beginning of the proof file.

Try passing a full file path to -verify.

In the meantime I'll have a look; I suspect -verify does not work in this situation (exponent vs. full path).
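Until -verify accepts an exponent directly, a small shell sketch for expanding an exponent into the proof path(s) to pass to it. It assumes the <exponent>/proof/<exponent>-<power>.proof layout seen in the directory listing above, relative to the gpuowl working directory; find_proofs is a hypothetical helper, not a gpuowl feature.

```shell
#!/bin/bash
# find_proofs EXPONENT [WORKDIR]
# Prints each proof file found for EXPONENT, one per line; returns 1 if none.
# Assumed layout: WORKDIR/<exponent>/proof/<exponent>-<power>.proof
find_proofs() {
    local exp=$1 dir=${2:-.} f status=1
    for f in "$dir/$exp/proof/$exp"-*.proof; do
        [ -e "$f" ] || continue    # the glob matched nothing
        echo "$f"
        status=0
    done
    return "$status"
}

# Usage, from the gpuowl directory (binary name as used on Windows in this thread):
#   find_proofs 137000009 | while read -r f; do gpuowl-win -device 1 -verify "$f"; done
```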
2020-07-20, 13:05   #2364
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2³·577 Posts

Quote:
Originally Posted by preda
What header does it contain? no secret there, feel free to post the few textual lines at the beginning of the proof file.
My bad: MORE shows a header that looks correct. Notepad had displayed none at all. MORE output:
Code:
PRP PROOF
VERSION=1
HASHSIZE=64
POWER=8
NUMBER=M137000009
(binary data follows)
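Since Notepad showed nothing while MORE did show the header, a shell one-liner may be a more reliable way to look at just the text part of a proof file. A minimal sketch, assuming (as in the output above) that the NUMBER= line is the last text line before the binary data; proof_header is a hypothetical name.

```shell
#!/bin/bash
# Print the text header of a PRP proof file, stopping after the NUMBER= line
# so the binary data that follows is not dumped to the terminal.
proof_header() {
    sed '/^NUMBER=/q' "$1"
}

# Example (path as in this thread, with forward slashes for a Unix-style shell):
#   proof_header ./137000009/proof/137000009-8.proof
```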
Quote:
Try passing a full file path to -verify
Please clarify the path\file requirement in the built-in help. Since the help said "file", I had been using only filename.type and trying various locations for the file. The following seems to be working:

Code:
gpuowl-win -device 1 -verify .\137000009\proof\137000009-8.proof
2020-07-20 07:51:43 gpuowl v6.11-340-g41d435f
2020-07-20 07:51:43 config: -device 1 -user kriesel -cpu condorella/r550 -yield -maxAlloc 7500
2020-07-20 07:51:43 config: -device 1 -verify .\137000009\proof\137000009-8.proof
2020-07-20 07:51:43 device 1, unique id ''
2020-07-20 07:51:43 condorella/r550 137000009 FFT: 7.50M 1K:15:256 (17.42 bpw)
2020-07-20 07:51:43 condorella/r550 Expected maximum carry32: 38070000
2020-07-20 07:51:45 condorella/r550 OpenCL args "-DEXP=137000009u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=15u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0xf.d1f815ca64eap-5 -DIWEIGHT_STEP_MINUS_1=-0xa.9621adaf56208p-5  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-07-20 07:51:45 condorella/r550 ASM compilation failed, retrying compilation using NO_ASM
2020-07-20 07:51:51 condorella/r550 OpenCL compilation in 5.46 s
2020-07-20 07:51:51 condorella/r550 proof: doing 184 iterations
2020-07-20 07:52:40 condorella/r550 proof verification: doing 535157 iterations
2020-07-20 08:00:39 condorella/r550 20000 / 535157, 23962 us/it
So it should finish in a few hours, whereas a PRP DC would take ~5.4 WEEKS on the RX550.
Thanks!
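The "few hours" versus "~5.4 WEEKS" comparison follows from multiplying iteration counts by the observed rate. A shell sketch (eta is a hypothetical helper), assuming the 23962 µs/it from the log above holds throughout and that a full PRP test of M137000009 takes roughly one squaring per bit of the exponent, i.e. about 137000009 iterations.

```shell
#!/bin/bash
# eta ITERATIONS US_PER_ITERATION
# Prints the estimated run time in hours, days and weeks.
eta() {
    LC_ALL=C awk -v n="$1" -v us="$2" 'BEGIN {
        s = n * us / 1e6                    # total seconds
        printf "%.2f hours = %.2f days = %.2f weeks\n", s / 3600, s / 86400, s / 604800
    }'
}

eta 535157 23962      # proof verification: 3.56 hours
eta 137000009 23962   # full PRP double check: 5.43 weeks
```

The verification therefore costs roughly 1/256 of a full double check on the same card.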

Last fiddled with by kriesel on 2020-07-20 at 13:10
2020-07-20, 16:38   #2365
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2³×577 Posts
Verification completed, per log, but no entry in results file

gpuowl-win v6.11-340 gpuowl.log excerpt:
Code:
2020-07-20 07:51:43 config: -device 1 -user kriesel -cpu condorella/r550 -yield -maxAlloc 7500
2020-07-20 07:51:43 config: -device 1 -verify .\137000009\proof\137000009-8.proof 
2020-07-20 07:51:43 device 1, unique id ''
2020-07-20 07:51:43 condorella/r550 137000009 FFT: 7.50M 1K:15:256 (17.42 bpw)
2020-07-20 07:51:43 condorella/r550 Expected maximum carry32: 38070000
2020-07-20 07:51:45 condorella/r550 OpenCL args "-DEXP=137000009u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=15u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0xf.d1f815ca64eap-5 -DIWEIGHT_STEP_MINUS_1=-0xa.9621adaf56208p-5  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-07-20 07:51:45 condorella/r550 ASM compilation failed, retrying compilation using NO_ASM
2020-07-20 07:51:51 condorella/r550 OpenCL compilation in 5.46 s
2020-07-20 07:51:51 condorella/r550 proof: doing 184 iterations
2020-07-20 07:52:40 condorella/r550 proof verification: doing 535157 iterations
2020-07-20 08:00:39 condorella/r550 20000 / 535157, 23962 us/it
2020-07-20 08:08:27 condorella/r550 40000 / 535157, 23398 us/it
2020-07-20 08:16:14 condorella/r550 60000 / 535157, 23403 us/it
2020-07-20 08:24:02 condorella/r550 80000 / 535157, 23392 us/it
2020-07-20 08:31:49 condorella/r550 100000 / 535157, 23393 us/it
2020-07-20 08:39:37 condorella/r550 120000 / 535157, 23397 us/it
2020-07-20 08:47:24 condorella/r550 140000 / 535157, 23384 us/it
2020-07-20 08:55:11 condorella/r550 160000 / 535157, 23391 us/it
2020-07-20 09:02:58 condorella/r550 180000 / 535157, 23375 us/it
2020-07-20 09:10:46 condorella/r550 200000 / 535157, 23388 us/it
2020-07-20 09:18:33 condorella/r550 220000 / 535157, 23387 us/it
2020-07-20 09:26:20 condorella/r550 240000 / 535157, 23383 us/it
2020-07-20 09:34:07 condorella/r550 260000 / 535157, 23388 us/it
2020-07-20 09:41:55 condorella/r550 280000 / 535157, 23383 us/it
2020-07-20 09:49:42 condorella/r550 300000 / 535157, 23394 us/it
2020-07-20 09:57:29 condorella/r550 320000 / 535157, 23391 us/it
2020-07-20 10:05:17 condorella/r550 340000 / 535157, 23386 us/it
2020-07-20 10:13:04 condorella/r550 360000 / 535157, 23375 us/it
2020-07-20 10:20:51 condorella/r550 380000 / 535157, 23393 us/it
2020-07-20 10:28:38 condorella/r550 400000 / 535157, 23389 us/it
2020-07-20 10:36:26 condorella/r550 420000 / 535157, 23386 us/it
2020-07-20 10:44:13 condorella/r550 440000 / 535157, 23391 us/it
2020-07-20 10:52:00 condorella/r550 460000 / 535157, 23381 us/it
2020-07-20 10:59:48 condorella/r550 480000 / 535157, 23398 us/it
2020-07-20 11:07:35 condorella/r550 500000 / 535157, 23395 us/it
2020-07-20 11:15:23 condorella/r550 520000 / 535157, 23399 us/it
2020-07-20 11:21:17 condorella/r550 proof: 137000009 proved composite
2020-07-20 11:21:17 condorella/r550 proof '.\137000009\proof\137000009-8.proof' verified
2020-07-20 11:21:17 condorella/r550 Bye
Results.txt contains zero bytes.
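Until -verify results are written to results.txt, the outcome can be pulled from gpuowl.log instead. A minimal sketch, assuming the log line formats in the excerpt above ("proof: <exponent> proved ..." and "proof '<path>' verified"); verify_results is a hypothetical helper.

```shell
#!/bin/bash
# Print proof-verification outcome lines from a gpuowl log (default: gpuowl.log).
verify_results() {
    grep -E "proof: [0-9]+ proved [a-z]+|proof '.*' verified" "${1:-gpuowl.log}"
}

# Example: verify_results gpuowl.log
```

Progress lines such as "proof: doing 184 iterations" do not match, because the pattern requires digits right after "proof: ".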

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.