mersenneforum.org

gpuowL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

paulunderwood 2019-07-11 19:17

[QUOTE=Prime95;521326]Happy me.

I tried to return the XFX Radeon VII for a replacement. Amazon was out of stock so they simply refunded the purchase. I ordered an Asrock Radeon VII instead. Installed today and preliminary results look great. First, the stock voltage is 50mV less. A memory overclock of 15% with a 40mV undervolting gives no errors during a short test. 0.85ms / iteration at 5M FFT! Longer testing required of course.

A question for the Linux gurus:

I use "crontab -e" to run mprime at boot. This does not work for gpuowl which must run as root. I tried "sudo crontab -e", but either messed up the entries or this does not work as I expected. What is the recommended way for root to start gpuowl at boot?[/QUOTE]

[c]sudo whoami[/c] should print [c]root[/c], which confirms that sudo works. Then edit root's crontab with [c]sudo crontab -u root -e[/c]. You might want to have it start a shell script that first configures the card and then launches gpuowl. In the crontab have something like:

[code]
@reboot /bin/bash /path-to/mystartupgpuowl.sh
[/code]

And in that shell script have:
[code]
# Apply the saved card settings, then start gpuowl from its own directory.
/opt/rocm/bin/rocm-smi --load path-to-and-config-file
cd /home/george/gpuowl && ./gpuowl &
[/code]

preda 2019-07-11 21:33

[QUOTE=GP2;521118]I was thinking that there might be some relatively trivial modification.

Like I mentioned earlier, the Wagstaff PRP calculation for type 5 is [c]3^(2^p) mod (2^p + 1)[/c] whereas for Mersenne (where type 1 and type 5 are the same thing), it's [c]3^(2^p − 2) mod (2^p − 1)[/c]. I don't know if there is a similarly simple modification for type 4 or type 2 residues.
[/QUOTE]
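The two residue formulas in the quote can be tried out on small exponents with Python's built-in three-argument [c]pow[/c]. This is only an illustrative sketch; the function names are made up here and nothing in it reflects how gpuowl computes residues. For a Mersenne prime the result is 1 by Fermat's little theorem. For a Wagstaff prime (2^p + 1)/3 with p > 3 the result is 9, since 3^(2^p − 2) is 1 modulo the Wagstaff number and 3^(2^p) is also divisible by 3.

```python
def mersenne_prp_residue(p):
    """Return 3^(2^p - 2) mod (2^p - 1), the Mersenne formula from the quote."""
    n = (1 << p) - 1
    return pow(3, n - 1, n)


def wagstaff_prp_residue(p):
    """Return 3^(2^p) mod (2^p + 1), the Wagstaff formula from the quote."""
    n = (1 << p) + 1
    return pow(3, 1 << p, n)


# 2^11 - 1 = 2047 = 23 * 89 is composite, so its Mersenne residue is not 1.
# (2^p + 1)/3 is prime for p = 5, 7, 11 and 13, so each Wagstaff residue is 9.
for p in (5, 7, 11, 13):
    print(p, mersenne_prp_residue(p), wagstaff_prp_residue(p))
```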

This is what I don't know (maybe somebody could enlighten me) about the implementation of "mod 2^p+1":

In the Mersenne case, we want a cyclic convolution. The simple weighting that is done before/after the FFT achieves that.

For the "mod 2^p+1", we want a negacyclic convolution. Can this be achieved through a similar weighting (with different weights)? Or is something more involved needed?

To add a bit more detail: in the Mersenne case, the weights are real. If for 2^p+1 we need weighting with complex weights, this changes the implementation significantly because the FFT input is not real anymore.

Prime95 2019-07-12 00:13

[QUOTE=preda;521342]This is what I don't know (maybe somebody could enlighten me) about the implementation of "mod 2^p+1"[/QUOTE]

There are two weights. You still have the real weights to distribute the p bits uniformly over the FFTLEN words.

You also need to apply complex roots-of-minus-one to "trick" the FFT into doing a negacyclic convolution instead of a cyclic convolution. You don't need any extra FFT memory, but you do need a modified first pass that takes real inputs and produces weighted complex FFT'ed outputs. Not easy, but not hard either.

Next you need a new, simpler second pass that scraps all the Hermitian symmetry computations before the point-wise squaring.
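The "roots-of-minus-one" trick can be shown on a toy example. The sketch below is a plain-Python illustration with a naive O(n^2) DFT, not gpuowl or Prime95 code, and all function names are invented for it. Multiplying input t by exp(i*pi*t/n) before a cyclic convolution, and dividing output t by the same factor afterwards, flips the sign of every wrapped-around term, which is exactly a negacyclic convolution.

```python
import cmath


def dft(values, sign):
    """Naive DFT; sign=-1 is the forward transform, sign=+1 the unscaled inverse."""
    n = len(values)
    return [
        sum(v * cmath.exp(sign * 2j * cmath.pi * t * k / n) for t, v in enumerate(values))
        for k in range(n)
    ]


def cyclic_convolve(a, b):
    """Cyclic convolution: pointwise product in the transform domain."""
    n = len(a)
    product = [x * y for x, y in zip(dft(a, -1), dft(b, -1))]
    return [v / n for v in dft(product, +1)]


def negacyclic_convolve(a, b):
    """Negacyclic convolution via weighting with (2n)-th roots of unity."""
    n = len(a)
    roots = [cmath.exp(1j * cmath.pi * t / n) for t in range(n)]  # roots[t]**n == (-1)**t
    weighted_a = [x * r for x, r in zip(a, roots)]
    weighted_b = [x * r for x, r in zip(b, roots)]
    result = cyclic_convolve(weighted_a, weighted_b)
    return [v / r for v, r in zip(result, roots)]


def negacyclic_reference(a, b):
    """Schoolbook product modulo x^n + 1, used only to check the result."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                out[i + j] += a[i] * b[j]
            else:
                out[i + j - n] -= a[i] * b[j]
    return out


a, b = [1, 2, 3, 4], [5, 6, 7, 8]
print([round(v.real) for v in cyclic_convolve(a, b)])      # [66, 68, 66, 60]
print([round(v.real) for v in negacyclic_convolve(a, b)])  # [-56, -36, 2, 60]
```

Note that the weighted inputs are complex even though [c]a[/c] and [c]b[/c] are real, which is why the real-input first pass has to be modified.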

SELROC 2019-07-12 06:03

[QUOTE=Prime95;521326]Happy me.

I tried to return the XFX Radeon VII for a replacement. Amazon was out of stock so they simply refunded the purchase. I ordered an Asrock Radeon VII instead. Installed today and preliminary results look great. First, the stock voltage is 50mV less. A memory overclock of 15% with a 40mV undervolting gives no errors during a short test. 0.85ms / iteration at 5M FFT! Longer testing required of course.

A question for the Linux gurus:

I use "crontab -e" to run mprime at boot. This does not work for gpuowl which must run as root. I tried "sudo crontab -e", but either messed up the entries or this does not work as I expected. What is the recommended way for root to start gpuowl at boot?[/QUOTE]


I think you have to write a systemd service file.
Something like this: gpuowl.service


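[code]
[Unit]
Description=GpuOwl
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/home/george/gpuowl <arguments>
Restart=on-failure
RestartSec=1minute
# Note: WatchdogSec expects the service to send sd_notify "WATCHDOG=1" keep-alives;
# without them systemd treats the service as hung once this interval passes.
WatchdogSec=20minutes
TimeoutStopSec=150seconds
StandardOutput=syslog
NotifyAccess=main
KillSignal=SIGINT

[Install]
WantedBy=multi-user.target
[/code]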

SELROC 2019-07-14 07:04

[QUOTE=SELROC;521383]I think you have to write a systemd service file.
Something like this: gpuowl.service


[Unit]
Description=GpuOwl
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/home/george/gpuowl <arguments>
Restart=on-failure
RestartSec=1minute
WatchdogSec=20minutes
TimeoutStopSec=150seconds
StandardOutput=syslog
NotifyAccess=main
KillSignal=SIGINT

[Install]
WantedBy=multi-user.target[/QUOTE]




Here is a good guide:


[url]https://www.digitalocean.com/community/tutorials/understanding-systemd-units-and-unit-files[/url]

SELROC 2019-07-14 07:17

Checkpoint file management
 
It seems that mfakto manages its checkpoint files itself: once a result has been computed, the checkpoints are removed.
Also, if a checkpoint is invalid, mfakto renames it (to mark it as bad) and loads the previous checkpoint.
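
The policy described above could look roughly like the following sketch. It is not mfakto's or gpuowl's code: the file naming, the SHA-256 header used for validation, and every function name are assumptions made for the illustration.

```python
import hashlib
import os


def write_checkpoint(directory, iteration, payload):
    """Store payload with a SHA-256 header so corruption can be detected later."""
    path = os.path.join(directory, f"{iteration:010d}.ckpt")
    digest = hashlib.sha256(payload).hexdigest().encode()
    with open(path, "wb") as handle:
        handle.write(digest + b"\n" + payload)
    return path


def read_checkpoint(path):
    """Return the payload, or None if the stored digest does not match."""
    with open(path, "rb") as handle:
        header, _, payload = handle.read().partition(b"\n")
    if hashlib.sha256(payload).hexdigest().encode() != header:
        return None
    return payload


def load_latest_valid(directory):
    """Try checkpoints newest first; rename invalid ones to *.bad and fall back."""
    names = sorted((n for n in os.listdir(directory) if n.endswith(".ckpt")), reverse=True)
    for name in names:
        path = os.path.join(directory, name)
        payload = read_checkpoint(path)
        if payload is not None:
            return int(name[: -len(".ckpt")]), payload
        os.replace(path, path + ".bad")  # mark as bad, keep it for inspection
    return None


def remove_checkpoints(directory):
    """Called once a result has been reported; files marked *.bad are left alone."""
    for name in os.listdir(directory):
        if name.endswith(".ckpt"):
            os.remove(os.path.join(directory, name))
```

Zero-padding the iteration number in the file name lets a plain lexical sort order the checkpoints by age.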

SELROC 2019-07-15 07:23

[QUOTE=SELROC;521383]I think you have to write a systemd service file.
Something like this: gpuowl.service


[Unit]
Description=GpuOwl
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/home/george/gpuowl <arguments>
Restart=on-failure
RestartSec=1minute
WatchdogSec=20minutes
TimeoutStopSec=150seconds
StandardOutput=syslog
NotifyAccess=main
KillSignal=SIGINT

[Install]
WantedBy=multi-user.target[/QUOTE]


It is important to use SIGINT instead of SIGQUIT.

SIGINT behaves like Control-C in the terminal: it lets gpuowl save a checkpoint before stopping.
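
As a rough illustration of why a catchable signal matters, here is a generic sketch of the pattern, in which the handler only sets a flag and the main loop saves a checkpoint before it exits. This is not gpuowl's implementation; the [c]Worker[/c] class, the JSON checkpoint format and the [c]on_iteration[/c] hook are all hypothetical.

```python
import json
import os
import signal


class Worker:
    """Toy stand-in for a long-running PRP job."""

    def __init__(self, checkpoint_path):
        self.checkpoint_path = checkpoint_path
        self.iteration = 0
        self.stop_requested = False

    def request_stop(self, signum, frame):
        # Keep the handler minimal: just record that a stop was requested.
        self.stop_requested = True

    def save_checkpoint(self):
        # Write to a temporary file first so an interrupted write cannot
        # clobber the previous good checkpoint.
        tmp_path = self.checkpoint_path + ".tmp"
        with open(tmp_path, "w") as handle:
            json.dump({"iteration": self.iteration}, handle)
        os.replace(tmp_path, self.checkpoint_path)

    def run(self, total_iterations, on_iteration=None):
        while self.iteration < total_iterations and not self.stop_requested:
            self.iteration += 1  # stand-in for one modular squaring
            if on_iteration is not None:
                on_iteration(self)
        self.save_checkpoint()
        return self.iteration


def install_sigint_handler(worker):
    """Route SIGINT (Control-C, or KillSignal=SIGINT under systemd) to the worker."""
    signal.signal(signal.SIGINT, worker.request_stop)
```

A program would call [c]install_sigint_handler(worker)[/c] once from its main thread before [c]worker.run(...)[/c]. A signal that cannot be caught gives the loop no chance to reach [c]save_checkpoint()[/c].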

Bulldozer 2019-07-16 00:15

I'm having a problem running gpuowl on my laptop. It has an integrated GPU (Intel HD 620) and an AMD Radeon R5 530. When I run this program, it always runs on the HD 620 and I get a bunch of errors. It never runs on the R5 530. I tried re-installing both drivers, re-installing Windows, and setting the program to high-performance mode in Radeon Settings. None of these worked. I hope for an answer.

kriesel 2019-07-16 01:15

[QUOTE=Bulldozer;521694]I'm having a problem running gpuowl on my laptop. It has an integrated GPU (Intel HD 620) and an AMD Radeon R5 530. When I run this program, it always runs on the HD 620 and I get a bunch of errors. It never runs on the R5 530. I tried re-installing both drivers, re-installing Windows, and setting the program to high-performance mode in Radeon Settings. None of these worked. I hope for an answer.[/QUOTE]
Does this happen regardless of which device number you specify? Are both the AMD and Intel OpenCL drivers OK? (I've seen installing software for one knock the other out.)

See [URL]https://www.mersenneforum.org/showpost.php?p=488474&postcount=6[/URL] for utilities to check that OpenCL is seeing both devices, etc.

ATH 2019-07-19 22:22

So I got my Radeon VII, but I'm a bit lost. It has been many years since I had an AMD card, well before GPUs were used for any calculations, and I'm also new to gpuowl.

I installed the newest drivers: Adrenalin 2019 19.7.2. I already had "gpuowl-win7-x64-v6.5-c48d46f.7z" from [URL="https://mersenneforum.org/showpost.php?p=516704&postcount=1171"]post #1171[/URL] on my hard drive from 2 months ago; I think I got it to confirm that OpenCL really worked on my RTX 2080, which it did.

Now when I run it with -device 1 (Radeon VII) it only writes the first few lines but never gets to the "OpenCL compilation in ..." line and it never starts running.

[QUOTE]2019-07-19 23:38:23 config: -device 1
2019-07-19 23:38:23 80293033 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 17.02 bits/word
2019-07-19 23:38:23 using short carry kernels[/QUOTE]

When I use -device 0 it works fine and runs on my RTX 2080.


I tried downloading "gpuowl-win-v6.5-84-g30c0508.7z" from [URL="https://mersenneforum.org/showpost.php?p=521225&postcount=1274"]post #1274[/URL], but it does not start at all on either card:

[CODE]2019-07-20 00:05:56 config: -device 1
2019-07-20 00:05:56 80293033 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 17.02 bits/word
2019-07-20 00:05:56 using short carry kernels
2019-07-20 00:05:56 OpenCL args "-DEXP=80293033u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -DWEIGHT_STEP=0xf.d1f3073e091p-3 -DIWEIGHT_STEP=0x8.17498299a4db8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-07-20 00:05:56 OpenCL compilation error -11 (args -DEXP=80293033u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -DWEIGHT_STEP=0xf.d1f3073e091p-3 -DIWEIGHT_STEP=0x8.17498299a4db8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-07-20 00:05:56 C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: error: implicit declaration of function '__asm' is invalid in C99
X2(u[0], u[2]);
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:174:2: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x)); \
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: error: expected ')'
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:174:35: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x)); \
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: note: to match this '('
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:174:7: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x)); \
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: error: expected ')'
X2(u[0], u[2]);
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:175:35: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.y) : "v" (t.y), "v" (b.y)); \
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: note: to match this '('
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:175:7: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.y) : "v" (t.y), "v" (b.y)); \
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:198:3: error: expected ')'
X2_mul_t4(u[1], u[3]);
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:180:35: note: expanded from macro 'X2_mul_t4'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (t.x) : "v" (b.x), "v" (t.x)); \
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:198:3: note: to match this '('
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:180:7: note: expanded from macro 'X2_mul_t4'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (t.x) : "v" (b.x), "v" (t.x)); \
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:198
2019-07-20 00:05:56 Exception 9gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:215 build
2019-07-20 00:05:56 Bye[/CODE]


Are there any more Windows executables collected somewhere?
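
For readers unfamiliar with the "FFT 4608K ... 17.02 bits/word" line in the logs above, the figures can be reproduced with simple arithmetic. The sketch assumes that the FFT length in words is 2 x WIDTH x SMALL_HEIGHT x MIDDLE. That relation is an inference from the numbers printed in the log ([c]-DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u[/c] against "4608K"), not something stated in the thread, and the function names are invented.

```python
def fft_words(width, small_height, middle):
    """FFT length in words, assuming 2 * WIDTH * SMALL_HEIGHT * MIDDLE."""
    return 2 * width * small_height * middle


def bits_per_word(exponent, words):
    """Average number of bits of the exponent-sized residue stored per FFT word."""
    return exponent / words


words = fft_words(1024, 256, 9)
print(words // 1024, "K")                         # 4608 K
print(round(bits_per_word(80293033, words), 2))   # 17.02
```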

ewmayer 2019-07-19 23:27

[QUOTE=preda;521342]This is what I don't know (maybe somebody could enlighten me) about the implementation of "mod 2^p+1":

In the Mersenne case, we want a cyclic convolution. The simple weighting that is done before/after the FFT achieves that.[/quote]
No, the FFT is inherently cyclic-convolutional ... the IBDWT weighting allows us to use a prime-length "bit folding" boundary in conjunction with an underlying polynomial multiply, which most naturally lends itself to a bit length that is highly composite, being a multiple of the transform length.
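
A toy version of that weighting, in the spirit of the Crandall & Fagin construction, is sketched below. Assumptions: a naive O(n^2) DFT stands in for the FFT, carry propagation is delegated to Python's big integers, and the function names are invented. Real implementations such as gpuowl are organised very differently. Word j starts at bit ceil(p*j/n), so the p bits are spread over n words of two adjacent sizes, and the real weight 2^(ceil(p*j/n) - p*j/n) makes an ordinary cyclic convolution wrap around at exactly 2^p.

```python
import cmath


def dft(values, sign):
    """Naive DFT; sign=-1 is the forward transform, sign=+1 the unscaled inverse."""
    n = len(values)
    return [
        sum(v * cmath.exp(sign * 2j * cmath.pi * t * k / n) for t, v in enumerate(values))
        for k in range(n)
    ]


def word_offsets(p, n):
    """offsets[j] = ceil(p*j/n): the bit position at which word j starts."""
    return [(p * j + n - 1) // n for j in range(n + 1)]


def ibdwt_square(x, p, n):
    """Square x modulo 2^p - 1 with a length-n weighted cyclic convolution."""
    offsets = word_offsets(p, n)
    weights = [2.0 ** (offsets[j] - p * j / n) for j in range(n)]
    # Split x into n variable-size words.
    words = [
        (x >> offsets[j]) & ((1 << (offsets[j + 1] - offsets[j])) - 1)
        for j in range(n)
    ]
    spectrum = dft([w * d for w, d in zip(weights, words)], -1)
    squared = dft([v * v for v in spectrum], +1)
    # Undo the weighting and the DFT scaling, then round to the nearest integer.
    coeffs = [round(squared[j].real / (n * weights[j])) for j in range(n)]
    # Recombine; Python's big integers absorb the carries.
    return sum(c << offsets[j] for j, c in enumerate(coeffs)) % ((1 << p) - 1)


offsets = word_offsets(13, 4)
print([offsets[j + 1] - offsets[j] for j in range(4)])  # [4, 3, 3, 3]
print(ibdwt_square(1234, 13, 4), pow(1234, 2, 8191))    # 7421 7421
```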

[quote]For the "mod 2^p+1", we want a negacyclic convolution. Can this be achieved through a similar weighting (with different weights)? Or is something more involved needed?

To add a bit more detail: in the Mersenne case, the weights are real. If for 2^p+1 we need weighting with complex weights, this changes the implementation significantly because the FFT input is not real anymore.[/QUOTE]
As George noted, for (mod 2^p+1) you need 2 distinct weightings: the IBDWT one to allow for a prime-length bit-folding, and the standard acyclic-effecting weighting, which for a length-n transform uses the first n complex (2*n)th roots of unity. That needs a complex FFT algorithm; for a length-n real input vector you can use a length-(n/2) complex FFT. Noting that the [j]th and [j+n/2]th acyclic weights (call them 'awt') are related by awt[j+n/2] = I*awt[j], you can see that in this context it makes sense to group pairs of real inputs together not via the usual (x[j],x[j+1])-treated-as-a-complex-datum scheme but rather in (x[j],x[j+n/2]) pairs, since applying the acyclic weights turns those 2 reals into (awt[j]*x[j],I*awt[j]*x[j+n/2]), i.e. we can pull out the shared complex acyclic multiplier awt[j] = exp(I*Pi*j/n) to get a weighted complex input awt[j]*(x[j] + I*x[j+n/2]). This is the so-called "right-angle transform" trick. Crandall & Fagin recapped it (since it wasn't new) in the Fermat-mod section of the same 1994 paper where they introduced the Mersenne-mod IBDWT.
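
The pairing can be checked numerically on a small case. The sketch below is only an illustration, with a naive DFT in place of an FFT and invented function names. It packs (x[j], x[j+n/2]) into one complex value, applies awt[j] = exp(I*Pi*j/n), runs a length-(n/2) complex cyclic convolution, and reads the two halves of the negacyclic result back from the real and imaginary parts.

```python
import cmath


def dft(values, sign):
    """Naive DFT; sign=-1 is the forward transform, sign=+1 the unscaled inverse."""
    n = len(values)
    return [
        sum(v * cmath.exp(sign * 2j * cmath.pi * t * k / n) for t, v in enumerate(values))
        for k in range(n)
    ]


def negacyclic_right_angle(a, b):
    """Length-n negacyclic convolution of real inputs via length-(n/2) complex DFTs."""
    n = len(a)
    half = n // 2
    # awt[j]**half == I**j, so wrapped terms pick up a factor I instead of 1.
    awt = [cmath.exp(1j * cmath.pi * j / n) for j in range(half)]

    def pack(x):
        # Pair x[j] with x[j + n/2] and apply the shared weight awt[j].
        return [awt[j] * complex(x[j], x[j + half]) for j in range(half)]

    product = [u * v for u, v in zip(dft(pack(a), -1), dft(pack(b), -1))]
    conv = dft(product, +1)
    folded = [conv[j] / (half * awt[j]) for j in range(half)]
    # Real parts are outputs 0..n/2-1, imaginary parts are outputs n/2..n-1.
    return [z.real for z in folded] + [z.imag for z in folded]


a, b = [1, 2, 3, 4], [5, 6, 7, 8]
print([round(v) for v in negacyclic_right_angle(a, b)])  # [-56, -36, 2, 60]
```

The expected values are the coefficients of (1 + 2x + 3x^2 + 4x^3)(5 + 6x + 7x^2 + 8x^3) reduced modulo x^4 + 1.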

