gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

ewmayer 2020-06-22 19:30

[QUOTE=Uncwilly;548811]gpuOwl is all video card based, IIRC.[/QUOTE]

That is the idea, but JanS, please see my note in the OP of the neighboring how-to-for-linux thread about running the program with -maxAlloc set based on your card's (usable) memory capacity and #jobs you are running on the card. On my R7 with 16GB, running sans the above flag led to out-of-memory occurrences when both jobs happened to be running the memory-hungry stage 2 of p-1 at the same time. That led the 'card OS' to swap to RAM, leading to a massive, order-of-100x, slowdown.

Mike/Xyzzy has reported that this can happen even on cards running just 1 instance, if e.g. the card is being used for driving video on the user's system, in which case the available HBM may be less than advertised, possibly significantly - Mike reported he needed to cut the HBM allocation in -maxAlloc to something like 1GB less than his card's memory.

I suggest using -maxAlloc to make sure the set of instances running on a given card uses no more than 90% of card memory, in total.
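A minimal sketch of that sizing rule, assuming -maxAlloc takes a value in MB (as the `-maxAlloc 7500` config lines elsewhere in this thread suggest). The helper name `max_alloc_mb` is made up for illustration and is not part of gpuowl:

```bash
#!/bin/bash
# Hypothetical helper (not part of gpuowl): per-instance memory cap so that
# all instances together stay within 90% of the card's memory.
# Assumes all sizes are in MB.
max_alloc_mb() {
    local total_mb=$1 instances=$2
    echo $(( total_mb * 90 / (100 * instances) ))
}

# Example: a 16 GB card (16384 MB) shared by two instances.
max_alloc_mb 16384 2
```

Each instance would then be started with the printed value as its -maxAlloc argument. If the card also drives a display, the usable total is probably lower than the advertised figure, so pass a smaller `total_mb`.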

kriesel 2020-06-22 22:53

[QUOTE=Jan S;548810]When i use GPUowl for P-1 factoring, this process uses for stage 2 RAM or videoRAM?[/QUOTE]Vram, except for the gcd & misc. activity, under normal circumstances. On-gpu bandwidth is much faster than accessing through the PCIe interface.

Here's what I think the distribution is: gpuowl LL, PRP, P-1 stage 1, and P-1 stage 2 computations occur on the gpu cores and in gpu vram. The exceptions, which run on the cpu and in system ram, are:
part of the Gerbicz check comparison for PRP; file save and load, and file checksum generation or comparison; setup and transfer to / from the gpu; the Jacobi check for LL; the P-1 stage 1 gcd; the P-1 stage 2 gcd; and housekeeping functions like console output and logging. The gcds use one cpu core. If there is other work to do, the gpu is given that to do in parallel while the gcd is run.

paulunderwood 2020-06-25 07:17

[QUOTE=paulunderwood;548571]I just upgraded to ROCm-3.5.1 and clinfo thereafter said 0 devices found. Now trying to go back to 3.3. :rant:

After more shenanigans i.e. [C]apt-get autoremove rocm*[/C], changing the apt sources file and [C]apt-get install rocm-dev3.3.0[/C], I got my clinfo to return to reason. I changed gpuOwl Makefile and recompiled.

2 instances at 5.5M FFT with sclk 4
3.5.0: 1312 µs/it
3.3.0: 1248 µs/it

A huge difference![/QUOTE]

I managed to get ROCm 3.5.1 running and installed its rocm-smi. The timing is:

3.5.1 1250 µs/it
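To put numbers on those timings, here is a small sketch that expresses one µs/it figure as a percentage above another. The function name `pct_slower` is made up for illustration, and it relies on awk for the floating-point arithmetic:

```bash
#!/bin/bash
# Hypothetical helper: how many percent slower timing $1 is than timing $2
# (both in microseconds per iteration), printed to one decimal place.
pct_slower() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}

pct_slower 1312 1248   # the 3.5.0 figure vs the 3.3.0 figure
pct_slower 1250 1248   # the 3.5.1 figure vs the 3.3.0 figure
```

By this measure the 3.5.0 figure is roughly 5% slower than 3.3.0, while the 3.5.1 figure is within a fraction of a percent of it.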

ewmayer 2020-06-25 19:33

[QUOTE=kriesel;548832]The gcds use one cpu core. If there is other work to do, the gpu is given that to do in parallel while the gcd is run.[/QUOTE]

Mihai, is that true? Are you using the Gnu MP gcd, and running that on the CPU?

kriesel 2020-06-25 20:11

If how the gcd is done has changed since [URL]https://www.mersenneforum.org/showpost.php?p=506749&postcount=946[/URL] or [URL]https://www.mersenneforum.org/showpost.php?p=525223&postcount=1323[/URL] it would be good to know that.
(Found these by computer search for "gcd" through my offline notes for this thread.)
Gcd on cpu is the same approach CUDAPm1 took. I haven't encountered any gpu implementation of gcd in GIMPS software.
But gpu gcd has been used for RSA [URL]http://www.cs.hiroshima-u.ac.jp/cs/_media/apdcm15gcd.pdf[/URL] and for polynomials [URL]https://domino.mpi-inf.mpg.de/intranet/ag1/ag1publ.nsf/0/4c78b0f12389e4e8c125788600538d92/$FILE/paper.pdf[/URL]

preda 2020-06-26 06:10

[QUOTE=ewmayer;549075]Mihai, is that true? Are you using the Gnu MP gcd, and running that on the CPU?[/QUOTE]

Yes, the GCD is done on the CPU using GNU-MP. It's a convenient solution from the coding POV. The GCD is infrequent, and one GCD takes on the order of 1min on one core of the CPU, no big deal.

Porting the fancy GCD algo to GPU would be a lot of work. Worth it if somebody was doing mainly GCDs, but that's not the case for gpuowl ATM.

Xyzzy 2020-06-26 21:40

Warning!

We are not-so-good at programming!

The following code works but it is a brute-force approach and highly un-optimized!

Xyzzy's "Fan Monitor v0.01":

[CODE]#!/bin/bash
# Crude two-card fan controller: once per interval, step each fan's PWM
# value up or down by one to hold the cards near a target temperature.
# Writing to these sysfs files normally requires root.

h0=/sys/class/drm/card0/device/hwmon/hwmon3
h1=/sys/class/drm/card1/device/hwmon/hwmon4

# Switch both fans to manual PWM control
echo 1 > "$h0/pwm1_enable"
echo 1 > "$h1/pwm1_enable"

# Starting fan values (PWM range is 0-255)
f0=255
f1=255

echo "$f0" > "$h0/pwm1"
echo "$f1" > "$h1/pwm1"

# Target temperature in millidegrees Celsius (90000 = 90 C)
z=90000

while true
do
    t0=$(cat "$h0/temp2_input")
    t1=$(cat "$h1/temp2_input")

    echo "card0 $t0 $f0 card1 $t1 $f1"

    # Above target: one step faster; below target: one step slower.
    # Values stay within 0-255.
    if (( t0 > z && f0 < 255 )); then
        f0=$(( f0 + 1 ))
    elif (( t0 < z && f0 > 0 )); then
        f0=$(( f0 - 1 ))
    fi

    if (( t1 > z && f1 < 255 )); then
        f1=$(( f1 + 1 ))
    elif (( t1 < z && f1 > 0 )); then
        f1=$(( f1 - 1 ))
    fi

    echo "$f0" > "$h0/pwm1"
    echo "$f1" > "$h1/pwm1"

    sleep 1
done[/CODE]Note that you can fill in your desired temperature, sleep interval, starting value, step value and all sorts of stuff.

We have this set up for two cards.

If your video card catches on fire please don't yell at us!

Also, after you are happy that it works you can remove the statement that prints out stuff to the screen.
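The hwmon3 and hwmon4 numbers in the script are specific to one machine and can differ on another system or after a reboot. A possible way to look the directory up instead of hard-coding it is sketched below. The function name `find_hwmon` is made up for illustration, and the sketch assumes each card exposes exactly one hwmon directory:

```bash
#!/bin/bash
# Hypothetical helper: print the first hwmon directory found under a card's
# device directory, so the hwmonN number need not be hard-coded.
# Returns non-zero if none exists.
find_hwmon() {
    local dev_dir=$1 d
    for d in "$dev_dir"/hwmon/hwmon*; do
        if [ -d "$d" ]; then
            echo "$d"
            return 0
        fi
    done
    return 1
}

# Example: prints nothing if card0 has no hwmon directory.
find_hwmon /sys/class/drm/card0/device || true
```

In the fan script, the two hard-coded directories could then be replaced by calls such as `find_hwmon /sys/class/drm/card1/device`.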

:mike:

kriesel 2020-07-20 02:26

First attempt to verify, on an RX550, a PRP proof produced by an RX480 with the same software version:

[CODE]2020-07-19 21:18:50 gpuowl v6.11-340-g41d435f
2020-07-19 21:18:50 config: -device 1 -user kriesel -cpu condorella/r550 -yield -maxAlloc 7500
2020-07-19 21:18:50 config: -device 1 -verify 137000009
2020-07-19 21:18:50 device 1, unique id ''
2020-07-19 21:18:50 condorella/r550 - 137000009\proof
2020-07-19 21:18:50 condorella/r550 Can't open '1' (mode 'rb')
2020-07-19 21:18:50 condorella/r550 Exception NSt10filesystem7__cxx1116filesystem_errorE: filesystem error: can't open file: No error [137000009]
2020-07-19 21:18:50 condorella/r550 Bye[/CODE][CODE]Directory of C:\msys64\home\...\gpuowl-v6.11-340-g41d435f\rx550\137000009\proof
..
07/19/2020 12:06 PM 154,125,076 137000009-8.proof[/CODE]Examination of the proof file indicates it does not contain the ASCII header shown in the file spec linked in [url]https://www.mersenneforum.org/showpost.php?p=549786&postcount=87[/url]

preda 2020-07-20 11:26

[QUOTE=kriesel;551071]Examination of the proof file indicates it does not contain the ASCII header shown in the file spec linked in [url]https://www.mersenneforum.org/showpost.php?p=549786&postcount=87[/url][/QUOTE]

What header does it contain? No secret there; feel free to post the few textual lines at the beginning of the proof file.

Try passing a full file path to -verify.

In the meantime I'll have a look, I suspect -verify is not working in this situation (exponent vs. full path).
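As a stopgap, the full path can be built from the exponent. This is only a sketch: the function name `find_proof` is made up, and it assumes the `<exponent>/proof/<exponent>-<power>.proof` layout seen in the directory listing earlier in this thread:

```bash
#!/bin/bash
# Hypothetical helper: print the path of the first proof file found for an
# exponent under a base directory (default: current directory).
# Assumes the layout <base>/<exponent>/proof/<exponent>-<power>.proof.
find_proof() {
    local exponent=$1 base=${2:-.} f
    for f in "$base/$exponent/proof/$exponent"-*.proof; do
        if [ -f "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}
```

The printed path could then be handed to -verify, for example `-verify "$(find_proof 137000009)"` from a bash or MSYS shell.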

kriesel 2020-07-20 13:05

[QUOTE=preda;551087]What header does it contain? no secret there, feel free to post the few textual lines at the beginning of the proof file.[/QUOTE]My bad, "MORE" shows a header that looks correct. Notepad had displayed none at all. More output:
[CODE]PRP PROOF
VERSION=1
HASHSIZE=64
POWER=8
NUMBER=M137000009
(binary proof data follows)[/CODE] [QUOTE]
Try passing a full file path to -verify[/QUOTE]Please clarify the path\file requirement in the built-in help. Since it stated "file", I had been using only filename.type and trying various locations for the file. The following seems to be working:

[CODE]gpuowl-win -device 1 -verify .\137000009\proof\137000009-8.proof
2020-07-20 07:51:43 gpuowl v6.11-340-g41d435f
2020-07-20 07:51:43 config: -device 1 -user kriesel -cpu condorella/r550 -yield -maxAlloc 7500
2020-07-20 07:51:43 config: -device 1 -verify .\137000009\proof\137000009-8.proof
2020-07-20 07:51:43 device 1, unique id ''
2020-07-20 07:51:43 condorella/r550 137000009 FFT: 7.50M 1K:15:256 (17.42 bpw)
2020-07-20 07:51:43 condorella/r550 Expected maximum carry32: 38070000
2020-07-20 07:51:45 condorella/r550 OpenCL args "-DEXP=137000009u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=15u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0xf.d1f815ca64eap-5 -DIWEIGHT_STEP_MINUS_1=-0xa.9621adaf56208p-5 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-07-20 07:51:45 condorella/r550 ASM compilation failed, retrying compilation using NO_ASM
2020-07-20 07:51:51 condorella/r550 OpenCL compilation in 5.46 s
2020-07-20 07:51:51 condorella/r550 proof: doing 184 iterations
2020-07-20 07:52:40 condorella/r550 proof verification: doing 535157 iterations
2020-07-20 08:00:39 condorella/r550 20000 / 535157, 23962 us/it
[/CODE]So it should finish in a few hours. A PRP DC would take ~5.4 weeks on the RX550.
Thanks!
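For reference, the arithmetic behind those estimates can be sketched as iterations times time per iteration. The function name `eta_seconds` is made up, and 23400 us/it is a rounded figure taken from the log lines in this thread:

```bash
#!/bin/bash
# Hypothetical helper: estimated run time in whole seconds, given an
# iteration count and a speed in microseconds per iteration.
eta_seconds() {
    local iterations=$1 us_per_it=$2
    echo $(( iterations * us_per_it / 1000000 ))
}

eta_seconds 535157 23400      # proof verification
eta_seconds 137000009 23400   # full PRP run at the same speed
```

That works out to about 12,500 seconds (roughly 3.5 hours) for the verification and about 3.2 million seconds (a little over 5 weeks) for a full PRP run, in line with the figures above.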

kriesel 2020-07-20 16:38

Verification completed, per log, but no entry in results file
 
gpuowl-win v6.11-340 gpuowl.log excerpt:[CODE]2020-07-20 07:51:43 config: -device 1 -user kriesel -cpu condorella/r550 -yield -maxAlloc 7500
2020-07-20 07:51:43 config: -device 1 -verify .\137000009\proof\137000009-8.proof
2020-07-20 07:51:43 device 1, unique id ''
2020-07-20 07:51:43 condorella/r550 137000009 FFT: 7.50M 1K:15:256 (17.42 bpw)
2020-07-20 07:51:43 condorella/r550 Expected maximum carry32: 38070000
2020-07-20 07:51:45 condorella/r550 OpenCL args "-DEXP=137000009u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=15u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0xf.d1f815ca64eap-5 -DIWEIGHT_STEP_MINUS_1=-0xa.9621adaf56208p-5 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-07-20 07:51:45 condorella/r550 ASM compilation failed, retrying compilation using NO_ASM
2020-07-20 07:51:51 condorella/r550 OpenCL compilation in 5.46 s
2020-07-20 07:51:51 condorella/r550 proof: doing 184 iterations
2020-07-20 07:52:40 condorella/r550 proof verification: doing 535157 iterations
2020-07-20 08:00:39 condorella/r550 20000 / 535157, 23962 us/it
2020-07-20 08:08:27 condorella/r550 40000 / 535157, 23398 us/it
2020-07-20 08:16:14 condorella/r550 60000 / 535157, 23403 us/it
2020-07-20 08:24:02 condorella/r550 80000 / 535157, 23392 us/it
2020-07-20 08:31:49 condorella/r550 100000 / 535157, 23393 us/it
2020-07-20 08:39:37 condorella/r550 120000 / 535157, 23397 us/it
2020-07-20 08:47:24 condorella/r550 140000 / 535157, 23384 us/it
2020-07-20 08:55:11 condorella/r550 160000 / 535157, 23391 us/it
2020-07-20 09:02:58 condorella/r550 180000 / 535157, 23375 us/it
2020-07-20 09:10:46 condorella/r550 200000 / 535157, 23388 us/it
2020-07-20 09:18:33 condorella/r550 220000 / 535157, 23387 us/it
2020-07-20 09:26:20 condorella/r550 240000 / 535157, 23383 us/it
2020-07-20 09:34:07 condorella/r550 260000 / 535157, 23388 us/it
2020-07-20 09:41:55 condorella/r550 280000 / 535157, 23383 us/it
2020-07-20 09:49:42 condorella/r550 300000 / 535157, 23394 us/it
2020-07-20 09:57:29 condorella/r550 320000 / 535157, 23391 us/it
2020-07-20 10:05:17 condorella/r550 340000 / 535157, 23386 us/it
2020-07-20 10:13:04 condorella/r550 360000 / 535157, 23375 us/it
2020-07-20 10:20:51 condorella/r550 380000 / 535157, 23393 us/it
2020-07-20 10:28:38 condorella/r550 400000 / 535157, 23389 us/it
2020-07-20 10:36:26 condorella/r550 420000 / 535157, 23386 us/it
2020-07-20 10:44:13 condorella/r550 440000 / 535157, 23391 us/it
2020-07-20 10:52:00 condorella/r550 460000 / 535157, 23381 us/it
2020-07-20 10:59:48 condorella/r550 480000 / 535157, 23398 us/it
2020-07-20 11:07:35 condorella/r550 500000 / 535157, 23395 us/it
2020-07-20 11:15:23 condorella/r550 520000 / 535157, 23399 us/it
2020-07-20 11:21:17 condorella/r550 proof: 137000009 proved composite
2020-07-20 11:21:17 condorella/r550 proof '.\137000009\proof\137000009-8.proof' verified
2020-07-20 11:21:17 condorella/r550 Bye
[/CODE]Results.txt contains zero bytes.
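With nothing written to the results file, the log is the only record of the outcome. A small sketch for pulling the verdict lines out of it follows. The function name `proof_verdict` is made up, and the patterns assume the message wording shown in the excerpt above:

```bash
#!/bin/bash
# Hypothetical helper: print the proof verdict lines from a gpuowl log.
# The patterns assume the message wording seen in this thread's log
# excerpt. Returns non-zero if no such line is found.
proof_verdict() {
    grep -E "proof: [0-9]+ proved|proof '.*' verified" "$1"
}

# Usage: proof_verdict gpuowl.log
```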


All times are UTC.