mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing > GpuOwl
2019-07-11, 19:17   #1277
paulunderwood
Sep 2002
Database er0rr
3,739 Posts

Quote:
Originally Posted by Prime95
Happy me.

I tried to return the XFX Radeon VII for a replacement. Amazon was out of stock so they simply refunded the purchase. I ordered an Asrock Radeon VII instead. Installed today and preliminary results look great. First, the stock voltage is 50mV less. A memory overclock of 15% with a 40mV undervolting gives no errors during a short test. 0.85ms / iteration at 5M FFT! Longer testing required of course.

A question for the Linux gurus:

I use "crontab -e" to run mprime at boot. This does not work for gpuowl which must run as root. I tried "sudo crontab -e", but either messed up the entries or this does not work as I expected. What is the recommended way for root to start gpuowl at boot?

"sudo whoami" should tell you something (it should print "root"). Then use "sudo crontab -u root -e". You might want to start a shell script that first configures the card and then launches gpuowl. In crontab have something like:

Code:
@reboot /bin/bash /path-to/mystartupgpuowl.sh
And in that shell script have:
Code:
#!/bin/bash
/opt/rocm/bin/rocm-smi --load /path-to/config-file   # settings saved earlier with rocm-smi --save
cd /home/george/gpuowl || exit 1
./gpuowl &

Last fiddled with by paulunderwood on 2019-07-11 at 20:05

2019-07-11, 21:33   #1278
preda
"Mihai Preda"
Apr 2015
3×457 Posts

Quote:
Originally Posted by GP2
I was thinking that there might be some relatively trivial modification.

Like I mentioned earlier, the Wagstaff PRP calculation for type 5 is 3^(2^p) mod (2^p + 1) whereas for Mersenne (where type 1 and type 5 are the same thing), it's 3^(2^p − 2) mod (2^p − 1). I don't know if there is a similarly simple modification for type 4 or type 2 residues.
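A quick numeric check of the two residues quoted above, using Python's built-in three-argument pow on tiny exponents. The function names are made up for illustration, and this is not how gpuowl computes the residues. If M = 2^p - 1 is prime, Fermat's little theorem makes 3^(2^p - 2) mod M equal to 1. If W = (2^p + 1)/3 is prime (p > 3), then 2^p = 3W - 1, so 3^(2^p) = 3^(3(W-1)+2) is 9 mod W and 0 mod 3, and the residue mod 2^p + 1 comes out as exactly 9.

```python
def mersenne_prp_residue(p: int) -> int:
    """Return 3^(2^p - 2) mod (2^p - 1); this is 1 when 2^p - 1 is prime."""
    m = (1 << p) - 1
    return pow(3, m - 1, m)


def wagstaff_prp_residue(p: int) -> int:
    """Return 3^(2^p) mod (2^p + 1); this is 9 when (2^p + 1)/3 is prime (p > 3)."""
    n = (1 << p) + 1
    return pow(3, 1 << p, n)


if __name__ == "__main__":
    for p in (7, 11, 13):
        print(p, mersenne_prp_residue(p), wagstaff_prp_residue(p))
```

For p = 11, 2^11 - 1 = 2047 = 23 × 89 is composite and its Mersenne residue is not 1, while (2^11 + 1)/3 = 683 is prime and its Wagstaff residue is 9.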
This is what I don't know (maybe somebody could enlighten me) about the implementation of "mod 2^p+1":

In the Mersenne case, we want a cyclic convolution. The simple weighting that is done before/after the FFT achieves that.

For the "mod 2^p+1", we want a negacyclic convolution. Can this be achieved through a similar weighting (with different weights)? Or is something more involved needed?

To add a bit more detail: in the Mersenne case, the weights are real. If for 2^p+1 we need weighting with complex weights, this changes the implementation significantly because the FFT input is not real anymore.

Last fiddled with by preda on 2019-07-11 at 21:37

2019-07-12, 00:13   #1279
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
11101011001102 Posts

Quote:
Originally Posted by preda
This is what I don't know (maybe somebody could enlighten me) about the implementation of "mod 2^p+1"
There are two weights. You still have the real weights to distribute the p bits uniformly over the FFTLEN words.

You also need to apply complex roots-of-minus-one to "trick" the FFT into doing a negacyclic convolution instead of a cyclic convolution. You don't need any extra FFT memory, but you do need a modified first pass that takes real inputs and produces weighted complex FFT'ed outputs. Not easy, but not hard either.

Next you need a new, simpler second pass that scraps all the Hermitian symmetry computations before the point-wise squaring.
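A minimal Python sketch of the two-weight idea for one squaring mod 2^p+1, under simplifying assumptions. A toy recursive length-n complex FFT stands in for the modified first and second passes described above. It does no real-input packing, so it uses twice the memory a real implementation would. Carries are resolved with Python big integers. The real weights 2^(ceil(p*j/n) - p*j/n) spread the p bits over n words. The complex weights exp(I*pi*j/n) are powers of a root of minus one (omega^n = -1), and they turn the FFT's cyclic convolution into the negacyclic one that reduction mod 2^p+1 needs. Names and sizes are illustrative.

```python
import cmath
import math


def fft(a, invert=False):
    """Recursive radix-2 complex FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def square_mod_fermat_like(x, p, n):
    """Square x (0 <= x < 2^p) modulo 2^p + 1 with real IBDWT and complex negacyclic weights."""
    # Word j starts at bit s[j] = ceil(p*j/n); s[n] = p closes the last word.
    s = [-((-p * j) // n) for j in range(n + 1)]
    digits = [(x >> s[j]) & ((1 << (s[j + 1] - s[j])) - 1) for j in range(n)]
    real_w = [2.0 ** (s[j] - p * j / n) for j in range(n)]        # IBDWT weights in [1, 2)
    cplx_w = [cmath.exp(1j * math.pi * j / n) for j in range(n)]   # omega**j with omega**n == -1
    spectrum = fft([digits[j] * real_w[j] * cplx_w[j] for j in range(n)])
    conv = fft([v * v for v in spectrum], invert=True)
    total = 0
    for m in range(n):
        # Undo both weights (and the 1/n of the inverse FFT); what remains is an
        # integer, possibly negative, column of the negacyclic convolution.
        c = round((conv[m] / (n * cplx_w[m])).real / real_w[m])
        total += c << s[m]
    return total % ((1 << p) + 1)


if __name__ == "__main__":
    x = 3 ** 80
    print(square_mod_fermat_like(x, 127, 8) == pow(x, 2, (1 << 127) + 1))
```

The same digit split and real weights work for 2^p - 1; only the complex weights (and hence the sign flip on wraparound) are specific to 2^p + 1.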

2019-07-12, 06:03   #1280
SELROC
211B16 Posts

Quote:
Originally Posted by Prime95
A question for the Linux gurus:

I use "crontab -e" to run mprime at boot. This does not work for gpuowl which must run as root. I tried "sudo crontab -e", but either messed up the entries or this does not work as I expected. What is the recommended way for root to start gpuowl at boot?

I think you have to write a systemd service file. Something like this, saved as gpuowl.service:

Code:
[Unit]
Description=GpuOwl
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/home/george/gpuowl <arguments>
Restart=on-failure
RestartSec=1minute
TimeoutStopSec=150seconds
StandardOutput=syslog
NotifyAccess=main
KillSignal=SIGINT

[Install]
WantedBy=multi-user.target

2019-07-14, 07:04   #1281
SELROC
365410 Posts

Quote:
Originally Posted by Prime95
... What is the recommended way for root to start gpuowl at boot?
Quote:
Originally Posted by SELROC
I think you have to write a systemd service file.
Something like this: gpuowl.service
...

Here is a good guide:

https://www.digitalocean.com/communi...and-unit-files

2019-07-14, 07:17   #1282
SELROC
22 Posts
Checkpoint file management

It seems that mfakto manages its checkpoint files: after a result is computed, the checkpoints are removed.
Also, if a checkpoint is invalid, mfakto renames it (to mark it as bad) and loads the previous checkpoint.
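The policy described above can be sketched in Python: delete checkpoints once the result is in, and on a bad checkpoint, rename it and fall back to the previous one. The file naming (name, name.prev, name.bad) and the CRC32 trailer used to detect an invalid file are assumptions for illustration, not mfakto's actual format.

```python
import os
import zlib

# Hypothetical layout: "<name>" is the newest checkpoint, "<name>.prev" the one
# before it; each file holds the payload followed by a 4-byte little-endian CRC32.


def save_checkpoint(path: str, payload: bytes) -> None:
    """Write a new checkpoint, keeping the previous one as path + '.prev'."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(payload + zlib.crc32(payload).to_bytes(4, "little"))
    if os.path.exists(path):
        os.replace(path, path + ".prev")
    os.replace(tmp, path)


def _read_if_valid(path: str):
    """Return the payload if the CRC32 trailer matches, else None."""
    with open(path, "rb") as f:
        blob = f.read()
    if len(blob) < 4:
        return None
    payload, crc = blob[:-4], int.from_bytes(blob[-4:], "little")
    return payload if zlib.crc32(payload) == crc else None


def load_checkpoint(path: str):
    """Load the newest valid checkpoint; rename invalid ones to '*.bad' and fall back."""
    for candidate in (path, path + ".prev"):
        if not os.path.exists(candidate):
            continue
        payload = _read_if_valid(candidate)
        if payload is not None:
            return payload
        os.replace(candidate, candidate + ".bad")  # keep it for inspection, marked bad
    return None


def remove_checkpoints(path: str) -> None:
    """Delete the checkpoints once the result has been computed and recorded."""
    for candidate in (path, path + ".prev"):
        if os.path.exists(candidate):
            os.remove(candidate)
```

Writing to a temporary file and renaming it into place means a crash mid-write leaves the older checkpoints untouched.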

2019-07-15, 07:23   #1283
SELROC
23×3×5×19 Posts

Quote:
Originally Posted by SELROC
I think you have to write a systemd service file.
Something like this: gpuowl.service
...
KillSignal=SIGINT
...

It is important to use SIGINT instead of SIGQUIT.

SIGINT behaves like Ctrl-C in the terminal: it lets gpuowl save a checkpoint before stopping.

2019-07-16, 00:15   #1284
Bulldozer
Jun 2019
1516 Posts

I'm having a problem running gpuowl on my laptop. It has an integrated GPU (Intel HD 620) and an AMD Radeon R5 530. When I run this program, it always runs on my HD 620 and gets a bunch of errors. It never runs on my R5 530. I tried reinstalling both drivers, reinstalling Windows, and setting the program to high-performance mode in Radeon Settings. None of these worked. I hope someone has an answer.

2019-07-16, 01:15   #1285
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×7×383 Posts

Quote:
Originally Posted by Bulldozer
I'm having a problem running gpuowl on my laptop. It has an integrated CPU (Intel HD 620) and a AMD Radeon R5 530. When I run this program, it always runs on my HD 620 and get a bunch of errors. It never runs on my R5 530. I tried re-installing both of the drivers, re-installing Windows, and set the program in high-performance mode in Radeon Settings. None of these works. I hope for an answer.
Does that happen regardless of which device number you specify (gpuowl's -device option)? Are both the AMD and Intel OpenCL drivers OK? (I've seen installing software for one knock the other out.)

See https://www.mersenneforum.org/showpo...74&postcount=6 for utilities to check that OpenCL is seeing both devices, etc.

2019-07-19, 22:22   #1286
ATH
Einyen
Dec 2003
Denmark
35·13 Posts

So I got my Radeon VII, but I'm a bit lost. It has been many, many years since I had an AMD card, way before using GPUs for any calculations, and I'm also new to gpuowl.

I installed the newest drivers: Adrenalin 2019 19.7.2. I already had "gpuowl-win7-x64-v6.5-c48d46f.7z" from post #1171 on my hard drive from 2 months ago; I think I got it to confirm that OpenCL really worked on my RTX 2080, which it did.

Now when I run it with -device 1 (Radeon VII), it only writes the first few lines; it never gets to the "OpenCL compilation in ..." line and never starts running.

Quote:
2019-07-19 23:38:23 config: -device 1
2019-07-19 23:38:23 80293033 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 17.02 bits/word
2019-07-19 23:38:23 using short carry kernels
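The "17.02 bits/word" figure in these log lines is the exponent divided by the FFT length in words, assuming "K" here means 1024. With 1000 the ratio would be about 17.42, which does not match the log. A one-line check (the helper name is hypothetical):

```python
def bits_per_word(exponent: int, fft_len_k: int) -> float:
    """Average bits per FFT word, assuming 'K' in the FFT size means 1024 words."""
    return exponent / (fft_len_k * 1024)


print(f"{bits_per_word(80293033, 4608):.2f}")  # prints 17.02
```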
When I use -device 0 it works fine and runs on my RTX 2080.


I tried downloading "gpuowl-win-v6.5-84-g30c0508.7z" from post #1274, but it does not start at all on either card:

Code:
2019-07-20 00:05:56 config: -device 1 
2019-07-20 00:05:56 80293033 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 17.02 bits/word
2019-07-20 00:05:56 using short carry kernels
2019-07-20 00:05:56 OpenCL args "-DEXP=80293033u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -DWEIGHT_STEP=0xf.d1f3073e091p-3 -DIWEIGHT_STEP=0x8.17498299a4db8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-07-20 00:05:56 OpenCL compilation error -11 (args -DEXP=80293033u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -DWEIGHT_STEP=0xf.d1f3073e091p-3 -DIWEIGHT_STEP=0x8.17498299a4db8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4  -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-07-20 00:05:56 C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: error: implicit declaration of function '__asm' is invalid in C99
  X2(u[0], u[2]);
  ^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:174:2: note: expanded from macro 'X2'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x)); \
        ^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: error: expected ')'
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:174:35: note: expanded from macro 'X2'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x)); \
                                         ^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: note: to match this '('
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:174:7: note: expanded from macro 'X2'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x)); \
             ^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: error: expected ')'
  X2(u[0], u[2]);
  ^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:175:35: note: expanded from macro 'X2'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.y) : "v" (t.y), "v" (b.y)); \
                                         ^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: note: to match this '('
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:175:7: note: expanded from macro 'X2'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.y) : "v" (t.y), "v" (b.y)); \
             ^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:198:3: error: expected ')'
  X2_mul_t4(u[1], u[3]);
  ^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:180:35: note: expanded from macro 'X2_mul_t4'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (t.x) : "v" (b.x), "v" (t.x)); \
                                         ^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:198:3: note: to match this '('
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:180:7: note: expanded from macro 'X2_mul_t4'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (t.x) : "v" (b.x), "v" (t.x)); \
             ^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:198
2019-07-20 00:05:56 Exception 9gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:215 build
2019-07-20 00:05:56 Bye

Are there any more Windows executables collected somewhere?

Last fiddled with by ATH on 2019-07-19 at 22:23

2019-07-19, 23:27   #1287
ewmayer
2ω=0
Sep 2002
República de California
103×113 Posts

Quote:
Originally Posted by preda
This is what I don't know (maybe somebody could enlighten me) about the implementation of "mod 2^p+1":

In the Mersenne case, we want a cyclic convolution. The simple weighting that is done before/after the FFT achieves that.
No, the FFT is inherently cyclic-convolutional ... the IBDWT weighting allows us to use a prime-length "bit folding" boundary in conjunction with an underlying polynomial multiply, which most naturally lends itself to a bitness that is highly composite, by way of being a multiple of the transform length.
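A minimal Python sketch of one squaring mod 2^p - 1 along those lines. It assumes a toy recursive double-precision FFT and illustrative sizes (p = 89 or 127 on n = 8 words), and it uses Python big integers in place of a real carry pass. The transform's own cyclic convolution supplies the wraparound. The real IBDWT weights 2^(ceil(p*j/n) - p*j/n) are what let the prime p be folded onto a power-of-two number of variable-width words. This is not Mlucas or gpuowl code, and all names are hypothetical.

```python
import cmath
import math


def fft(a, invert=False):
    """Recursive radix-2 complex FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def square_mod_mersenne(x, p, n):
    """Square x (0 <= x < 2^p) modulo 2^p - 1 with IBDWT real weights and a cyclic FFT convolution."""
    s = [-((-p * j) // n) for j in range(n + 1)]   # word j begins at bit ceil(p*j/n)
    digits = [(x >> s[j]) & ((1 << (s[j + 1] - s[j])) - 1) for j in range(n)]
    weights = [2.0 ** (s[j] - p * j / n) for j in range(n)]
    spectrum = fft([d * w for d, w in zip(digits, weights)])
    conv = fft([v * v for v in spectrum], invert=True)
    total = 0
    for m in range(n):
        # After unweighting, each column is an exact non-negative integer.
        c = round(conv[m].real / (n * weights[m]))
        total += c << s[m]
    return total % ((1 << p) - 1)


if __name__ == "__main__":
    x = 3 ** 80
    print(square_mod_mersenne(x, 127, 8) == pow(x, 2, (1 << 127) - 1))
```

Without the weights, the same FFT would still compute a cyclic convolution, but the wraparound would land at a multiple of n words rather than at bit p.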

Quote:
For the "mod 2^p+1", we want a negacyclic convolution. Can this be achieved through a similar weighting (with different weights)? Or is something more involved needed?

To add a bit more detail: in the mersenne case, the weights are real. IF for 2^p+1 we need weighting with complex weights, this changes the implementation significantly because the FFT input is not real anymore.
As George noted, for (mod 2^p+1) you need 2 distinct weightings: the IBDWT one to allow for a prime-length bit-folding, and the standard acyclic-effecting weighting, which for a length-n transform uses the first n complex (2*n)th roots of unity. That needs a complex FFT algorithm; for a length-n real input vector you can use a length-(n/2) complex FFT.

Noting that the [j]th and [j+n/2]th acyclic weights (call them 'awt') are related by awt[j+n/2] = I*awt[j], you can see that in this context it makes sense to group pairs of real inputs together not via the usual (x[j],x[j+1])-treated-as-a-complex-datum scheme but rather in (x[j],x[j+n/2]) pairs. Applying the acyclic weights turns those 2 reals into (awt[j]*x[j], I*awt[j]*x[j+n/2]), i.e. we can pull out the shared complex acyclic multiplier awt[j] = exp(2*pi*I*j/(2*n)) to get a weighted complex input awt[j]*(x[j] + I*x[j+n/2]). This is the so-called "right-angle transform" trick. Crandall & Fagin recapped it (since it wasn't new) in the Fermat-mod section of the same 1994 paper where they introduced the Mersenne-mod IBDWT.
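The right-angle trick can be checked numerically. This Python sketch uses hypothetical names, a toy recursive FFT, and small integer vectors. It packs x[j] and x[j+n/2] into one complex datum, multiplies by awt[j] = exp(2*pi*I*j/(2*n)), and does a length-(n/2) cyclic convolution. After unweighting, the real parts give the low half and the imaginary parts the high half of the length-n negacyclic product, which is compared against a direct O(n^2) reference. The sketch omits the IBDWT weights, so it shows only the negacyclic part of a mod 2^p+1 multiply.

```python
import cmath
import math


def fft(a, invert=False):
    """Recursive radix-2 complex FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def negacyclic_direct(a, b):
    """Reference O(n^2) product of two polynomials mod t^n + 1."""
    n = len(a)
    out = [0] * n
    for j in range(n):
        for k in range(n):
            if j + k < n:
                out[j + k] += a[j] * b[k]
            else:
                out[j + k - n] -= a[j] * b[k]   # t^n = -1 flips the sign
    return out


def negacyclic_right_angle(a, b):
    """Product mod t^n + 1 of two real length-n vectors via a length-(n/2) complex FFT."""
    n = len(a)
    h = n // 2
    awt = [cmath.exp(1j * math.pi * j / n) for j in range(h)]   # exp(2*pi*I*j/(2*n))
    # Pair x[j] with x[j+n/2] as one complex datum, then apply the acyclic weight.
    za = fft([awt[j] * complex(a[j], a[j + h]) for j in range(h)])
    zb = fft([awt[j] * complex(b[j], b[j + h]) for j in range(h)])
    conv = fft([u * v for u, v in zip(za, zb)], invert=True)
    out = [0] * n
    for m in range(h):
        r = conv[m] / (h * awt[m])   # unweight and normalize the inverse FFT
        out[m] = round(r.real)       # low half of the negacyclic product
        out[m + h] = round(r.imag)   # high half
    return out


if __name__ == "__main__":
    a = [3, -1, 4, 1, -5, 9, 2, -6]
    b = [2, 7, -1, 8, 2, -8, 1, 8]
    print(negacyclic_right_angle(a, b) == negacyclic_direct(a, b))
```

Only n/2 complex points are transformed for n real inputs, which is the saving over weighting a full length-n complex FFT. Combining this with the real IBDWT weights gives the mod 2^p+1 setting discussed in this thread.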

Last fiddled with by ewmayer on 2019-07-19 at 23:29

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.