[QUOTE=Prime95;521326]Happy me.
I tried to return the XFX Radeon VII for a replacement. Amazon was out of stock so they simply refunded the purchase. I ordered an Asrock Radeon VII instead. Installed today and preliminary results look great. First, the stock voltage is 50mV less. A memory overclock of 15% with a 40mV undervolting gives no errors during a short test. 0.85ms / iteration at 5M FFT! Longer testing required of course.
A question for the Linux gurus: I use "crontab -e" to run mprime at boot. This does not work for gpuowl which must run as root. I tried "sudo crontab -e", but either messed up the entries or this does not work as I expected. What is the recommended way for root to start gpuowl at boot?[/QUOTE]
[c]sudo whoami[/c] should print [c]root[/c], confirming that sudo works for your account. Then edit root's crontab with [c]sudo crontab -u root -e[/c]. You might want to start a shell script that first configures the card and then launches gpuowl. In crontab have something like:
[code]
@reboot /bin/bash /path-to/mystartupgpuowl.sh
[/code]
And in that shell script have:
[code]
/opt/rocm/bin/rocm-smi --load path-to-and-config-file
cd /home/george/gpuowl; ./gpuowl &
[/code] |
[QUOTE=GP2;521118]I was thinking that there might be some relatively trivial modification.
Like I mentioned earlier, the Wagstaff PRP calculation for type 5 is [c]3^(2^p) mod (2^p + 1)[/c] whereas for Mersenne (where type 1 and type 5 are the same thing), it's [c]3^(2^p − 2) mod (2^p − 1)[/c].
I don't know if there is a similarly simple modification for type 4 or type 2 residues. [/QUOTE]
This is what I don't know (maybe somebody could enlighten me) about the implementation of "mod 2^p+1":
In the Mersenne case, we want a cyclic convolution. The simple weighting that is done before/after the FFT achieves that.
For the "mod 2^p+1", we want a negacyclic convolution. Can this be achieved through a similar weighting (with different weights)? Or is something more involved needed?
To add a bit more detail: in the Mersenne case, the weights are real. If for 2^p+1 we need weighting with complex weights, this changes the implementation significantly because the FFT input is not real anymore. |
[QUOTE=preda;521342]This is what I don't know (maybe somebody could enlighten me) about the implementation of "mod 2^p+1"[/QUOTE]
There are two weights. You still have the real weights to distribute the p bits uniformly over the FFTLEN words. You also need to apply complex roots-of-minus-one to "trick" the FFT into doing a negacyclic convolution instead of a cyclic convolution. You don't need any extra FFT memory, but you do need a modified first pass that takes real inputs and produces weighted complex FFT'ed outputs. Not easy, but not hard either. Next you need a new, simpler second pass that scraps all the Hermitian symmetry computations before the point-wise squaring. |
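The "roots-of-minus-one" weighting from the reply above can be illustrated with a small Python sketch. It is not gpuowl or Prime95 code: it uses a naive O(n^2) DFT in place of a real FFT, leaves out the real IBDWT weights entirely, and all function names are made up for the illustration. It only shows that multiplying element j by exp(i*pi*j/n) before a cyclic convolution, and dividing it out afterwards, yields a negacyclic convolution.

```python
import cmath

def dft(a, sign):
    """Naive O(n^2) DFT: sign=-1 is the forward transform, sign=+1 the unscaled inverse."""
    n = len(a)
    return [sum(a[m] * cmath.exp(sign * 2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def cyclic_convolution(a, b):
    """Cyclic convolution through the transform domain (pointwise product)."""
    n = len(a)
    fa, fb = dft(a, -1), dft(b, -1)
    return [v / n for v in dft([u * w for u, w in zip(fa, fb)], +1)]

def negacyclic_via_weights(x, y):
    """Negacyclic convolution of integer sequences using complex weights."""
    n = len(x)
    # weight[j] = exp(i*pi*j/n); weight[1]**n == -1, so wrapped-around terms change sign.
    weight = [cmath.exp(1j * cmath.pi * j / n) for j in range(n)]
    xw = [w * v for w, v in zip(weight, x)]
    yw = [w * v for w, v in zip(weight, y)]
    zw = cyclic_convolution(xw, yw)
    # Undo the weighting and round away floating-point noise.
    return [round((v / w).real) for v, w in zip(zw, weight)]

def negacyclic_direct(x, y):
    """Reference: polynomial product reduced modulo t^n + 1."""
    n = len(x)
    z = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                z[i + j] += x[i] * y[j]
            else:
                z[i + j - n] -= x[i] * y[j]
    return z
```

For example, `negacyclic_via_weights([1, 2, 3, 4], [5, 6, 7, 8])` agrees with `negacyclic_direct` on the same inputs. The weighted inputs are complex even though `x` and `y` are real, which is the implementation cost raised in the question.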
[QUOTE=Prime95;521326]Happy me.
I tried to return the XFX Radeon VII for a replacement. Amazon was out of stock so they simply refunded the purchase. I ordered an Asrock Radeon VII instead. Installed today and preliminary results look great. First, the stock voltage is 50mV less. A memory overclock of 15% with a 40mV undervolting gives no errors during a short test. 0.85ms / iteration at 5M FFT! Longer testing required of course.
A question for the Linux gurus: I use "crontab -e" to run mprime at boot. This does not work for gpuowl which must run as root. I tried "sudo crontab -e", but either messed up the entries or this does not work as I expected. What is the recommended way for root to start gpuowl at boot?[/QUOTE]
I think you have to write a systemd service file. Something like this:
gpuowl.service
[Unit]
Description=GpuOwl
After=network-online.target
Wants=network-online.target
[Service]
ExecStart=/home/george/gpuowl <arguments>
Restart=on-failure
RestartSec=1minute
WatchdogSec=20minutes
TimeoutStopSec=150seconds
StandardOutput=syslog
NotifyAccess=main
KillSignal=SIGINT
[Install]
WantedBy=multi-user.target
|
[QUOTE=Prime95;521326]Happy me.
I tried to return the XFX Radeon VII for a replacement. Amazon was out of stock so they simply refunded the purchase. I ordered an Asrock Radeon VII instead. Installed today and preliminary results look great. First, the stock voltage is 50mV less. A memory overclock of 15% with a 40mV undervolting gives no errors during a short test. 0.85ms / iteration at 5M FFT! Longer testing required of course.
A question for the Linux gurus: I use "crontab -e" to run mprime at boot. This does not work for gpuowl which must run as root. I tried "sudo crontab -e", but either messed up the entries or this does not work as I expected. What is the recommended way for root to start gpuowl at boot?[/QUOTE]
[QUOTE=SELROC;521383]I think you have to write a systemd service file. Something like this:
gpuowl.service
[Unit]
Description=GpuOwl
After=network-online.target
Wants=network-online.target
[Service]
ExecStart=/home/george/gpuowl <arguments>
Restart=on-failure
RestartSec=1minute
WatchdogSec=20minutes
TimeoutStopSec=150seconds
StandardOutput=syslog
NotifyAccess=main
KillSignal=SIGINT
[Install]
WantedBy=multi-user.target[/QUOTE]
Here is a good guide: [url]https://www.digitalocean.com/community/tutorials/understanding-systemd-units-and-unit-files[/url] |
Checkpoint file management
It seems that mfakto manages its checkpoint files: after a result is computed, the checkpoints are removed.
Also, if a checkpoint is invalid, mfakto renames it (to mark it as bad) and loads the previous checkpoint. |
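The checkpoint policy described in the post above (remove checkpoints after a result, rename an invalid checkpoint and fall back to the previous one) could look roughly like the following Python sketch. This is an illustration only, not mfakto's or gpuowl's actual code or file format: the `.ckpt` and `.bad` suffixes, the CRC32 trailer, and every function name are assumptions made for the example.

```python
import os
import zlib

SUFFIX = ".ckpt"  # hypothetical checkpoint extension, not a real mfakto/gpuowl name

def write_checkpoint(directory, iteration, payload):
    """Store the payload followed by its CRC32 so corruption can be detected later."""
    path = os.path.join(directory, f"{iteration:012d}{SUFFIX}")
    with open(path, "wb") as f:
        f.write(payload + zlib.crc32(payload).to_bytes(4, "big"))
    return path

def read_if_valid(path):
    """Return the payload if the trailing CRC32 matches, otherwise None."""
    with open(path, "rb") as f:
        blob = f.read()
    if len(blob) < 4:
        return None
    payload, stored = blob[:-4], blob[-4:]
    return payload if zlib.crc32(payload).to_bytes(4, "big") == stored else None

def load_latest_valid(directory):
    """Try checkpoints newest first; mark bad ones by renaming and fall back to older ones."""
    names = sorted((n for n in os.listdir(directory) if n.endswith(SUFFIX)), reverse=True)
    for name in names:
        path = os.path.join(directory, name)
        payload = read_if_valid(path)
        if payload is not None:
            return path, payload
        os.rename(path, path + ".bad")  # keep the bad file for inspection
    return None, None

def remove_checkpoints(directory):
    """Called once a result has been produced: checkpoints are no longer needed."""
    for name in os.listdir(directory):
        if name.endswith(SUFFIX):
            os.remove(os.path.join(directory, name))
```

Zero-padding the iteration number in the file name keeps lexicographic order equal to iteration order, so "previous checkpoint" is simply the next name in the reversed listing.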
[QUOTE=SELROC;521383]I think you have to write a systemd service file.
Something like this:
gpuowl.service
[Unit]
Description=GpuOwl
After=network-online.target
Wants=network-online.target
[Service]
ExecStart=/home/george/gpuowl <arguments>
Restart=on-failure
RestartSec=1minute
WatchdogSec=20minutes
TimeoutStopSec=150seconds
StandardOutput=syslog
NotifyAccess=main
KillSignal=SIGINT
[Install]
WantedBy=multi-user.target[/QUOTE]
It is important to use SIGINT instead of SIGQUIT. SIGINT behaves like Control-C in the terminal: it lets gpuowl save a checkpoint before stopping. |
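The reason SIGINT allows a clean stop can be shown with a generic Python sketch of the pattern: the signal handler only sets a flag, and the work loop checks the flag between iterations and saves a checkpoint before returning. This is not how gpuowl itself is written (gpuowl is a C++ program); `StopRequest`, `install_sigint_flag`, and `run` are hypothetical names used only here.

```python
import signal

class StopRequest:
    """Flag set by the signal handler and polled by the work loop."""
    def __init__(self):
        self.requested = False

    def handler(self, signum, frame):
        # Do as little as possible inside the handler: just record the request.
        self.requested = True

def install_sigint_flag():
    """Route SIGINT (Control-C, or KillSignal=SIGINT under systemd) to a StopRequest."""
    stop = StopRequest()
    signal.signal(signal.SIGINT, stop.handler)
    return stop

def run(total_iters, step, save_checkpoint, stop, state=0):
    """Iterate until done or until a stop is requested; checkpoint on early stop."""
    done = 0
    while done < total_iters and not stop.requested:
        state = step(state)
        done += 1
    if stop.requested:
        save_checkpoint(done, state)  # state is consistent: we are between iterations
    return done, state
```

A long-running program would call `install_sigint_flag()` once at startup and pass the returned object to `run`. Because the flag is only examined between iterations, the checkpoint never captures a half-finished step.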
I'm having a problem running gpuowl on my laptop. It has an integrated GPU (Intel HD 620) and an AMD Radeon R5 530. When I run the program, it always runs on the HD 620 and I get a bunch of errors. It never runs on the R5 530. I tried reinstalling both drivers, reinstalling Windows, and setting the program to high-performance mode in Radeon Settings. None of these worked. I hope for an answer.
|
[QUOTE=Bulldozer;521694]I'm having a problem running gpuowl on my laptop. It has an integrated GPU (Intel HD 620) and an AMD Radeon R5 530. When I run the program, it always runs on the HD 620 and I get a bunch of errors. It never runs on the R5 530. I tried reinstalling both drivers, reinstalling Windows, and setting the program to high-performance mode in Radeon Settings. None of these worked. I hope for an answer.[/QUOTE]
Does this happen regardless of which device number you specify? Are both the AMD and Intel OpenCL drivers OK? (I've seen installing software for one knock the other out.) See [URL]https://www.mersenneforum.org/showpost.php?p=488474&postcount=6[/URL] for utilities to check that OpenCL is seeing both devices, etc. |
So I got my Radeon VII but I'm a bit lost. It has been many years since I had an AMD card, and that was way before GPUs were used for any calculations; I'm also new to gpuowl.
I installed the newest drivers: Adrenalin 2019 19.7.2. I already had "gpuowl-win7-x64-v6.5-c48d46f.7z" from [URL="https://mersenneforum.org/showpost.php?p=516704&postcount=1171"]post #1171[/URL] on my hard drive from 2 months ago. I think I got it to confirm that OpenCL really worked on my RTX 2080, which it did. Now when I run it with -device 1 (Radeon VII) it only writes the first few lines but never gets to the "OpenCL compilation in ..." line, and it never starts running.
[QUOTE]2019-07-19 23:38:23 config: -device 1
2019-07-19 23:38:23 80293033 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 17.02 bits/word
2019-07-19 23:38:23 using short carry kernels[/QUOTE]
When I use -device 0 it works fine and runs on my RTX 2080. I tried downloading "gpuowl-win-v6.5-84-g30c0508.7z" from [URL="https://mersenneforum.org/showpost.php?p=521225&postcount=1274"]post #1274[/URL] but it does not start at all on either card:
[CODE]2019-07-20 00:05:56 config: -device 1
2019-07-20 00:05:56 80293033 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 17.02 bits/word
2019-07-20 00:05:56 using short carry kernels
2019-07-20 00:05:56 OpenCL args "-DEXP=80293033u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -DWEIGHT_STEP=0xf.d1f3073e091p-3 -DIWEIGHT_STEP=0x8.17498299a4db8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-07-20 00:05:56 OpenCL compilation error -11 (args -DEXP=80293033u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -DWEIGHT_STEP=0xf.d1f3073e091p-3 -DIWEIGHT_STEP=0x8.17498299a4db8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-07-20 00:05:56 C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: error: implicit declaration of function '__asm' is invalid in C99
X2(u[0], u[2]);
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:174:2: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x)); \
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: error: expected ')'
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:174:35: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x)); \
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: note: to match this '('
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:174:7: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x)); \
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: error: expected ')'
X2(u[0], u[2]);
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:175:35: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.y) : "v" (t.y), "v" (b.y)); \
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: note: to match this '('
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:175:7: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.y) : "v" (t.y), "v" (b.y)); \
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:198:3: error: expected ')'
X2_mul_t4(u[1], u[3]);
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:180:35: note: expanded from macro 'X2_mul_t4'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (t.x) : "v" (b.x), "v" (t.x)); \
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:198:3: note: to match this '('
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:180:7: note: expanded from macro 'X2_mul_t4'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (t.x) : "v" (b.x), "v" (t.x)); \
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:198
2019-07-20 00:05:56 Exception 9gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:215 build
2019-07-20 00:05:56 Bye[/CODE]
Are there any more Windows executables collected somewhere? |
[QUOTE=preda;521342]This is what I don't know (maybe somebody could enlighten me) about the implementation of "mod 2^p+1":
In the Mersenne case, we want a cyclic convolution. The simple weighting that is done before/after the FFT achieves that.[/quote]
No, the FFT is inherently cyclic-convolutional ... the IBDWT weighting allows us to use a prime-length "bit folding" boundary in conjunction with an underlying polynomial-multiply which most naturally lends itself to a bitness which is highly composite, by way of being a multiple of the transform length.
[quote]For the "mod 2^p+1", we want a negacyclic convolution. Can this be achieved through a similar weighting (with different weights)? Or is something more involved needed?
To add a bit more detail: in the Mersenne case, the weights are real. If for 2^p+1 we need weighting with complex weights, this changes the implementation significantly because the FFT input is not real anymore.[/QUOTE]
As George noted, for (mod 2^p+1) you need 2 distinct weightings: the IBDWT one to allow for a prime-length bit-folding, and the standard negacyclic-effecting weighting, which for a length-n transform uses the first n complex (2*n)th roots of unity. That needs a complex FFT algorithm; for a length-n real input vector you can use a length-(n/2) complex FFT.
Noting that the [j]th and [j+n/2]th negacyclic weights (call them 'awt') are related by awt[j+n/2] = I*awt[j], you can see that in this context it makes sense to group pairs of real inputs together not via the usual (x[j],x[j+1])-treated-as-a-complex-datum scheme but rather in (x[j],x[j+n/2]) pairs, since applying the weights turns those 2 reals into (awt[j]*x[j],I*awt[j]*x[j+n/2]), i.e. we can pull out the shared complex multiplier awt[j] = exp(2*Pi*I*j/(2*n)) = exp(I*Pi*j/n) to get a weighted complex input awt[j]*(x[j] + I*x[j+n/2]).
This is the so-called "right-angle transform" trick. Crandall & Fagin recapped it (since it wasn't new) in the Fermat-mod section of the same 1994 paper where they introduced the Mersenne-mod IBDWT. |
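The pairing described above can be checked numerically with a small Python sketch. It is a toy illustration, not code from gpuowl, Mlucas, or Prime95: it uses a naive O(n^2) DFT rather than an FFT, omits the IBDWT weights, and the function names are invented here. It packs (x[j], x[j+n/2]) into one complex value, applies awt[j] = exp(I*Pi*j/n), squares via a length-(n/2) complex transform, and unpacks the negacyclic square of the length-n real input.

```python
import cmath

def dft(a, sign):
    """Naive O(n^2) DFT: sign=-1 is the forward transform, sign=+1 the unscaled inverse."""
    n = len(a)
    return [sum(a[m] * cmath.exp(sign * 2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def negacyclic_square_right_angle(x):
    """Square x modulo t^n + 1 using a half-length complex transform (n must be even)."""
    n = len(x)
    h = n // 2
    # First n/2 of the (2n)th roots of unity; awt[j + n/2] would equal I * awt[j].
    awt = [cmath.exp(1j * cmath.pi * j / n) for j in range(h)]
    # Pack x[j] and x[j + n/2] as real and imaginary parts, then apply the shared weight.
    packed = [awt[j] * complex(x[j], x[j + h]) for j in range(h)]
    spectrum = dft(packed, -1)
    squared = dft([v * v for v in spectrum], +1)
    # Undo the transform scaling and the weighting.
    unweighted = [squared[k] / (h * awt[k]) for k in range(h)]
    # Real parts give outputs 0..n/2-1, imaginary parts give outputs n/2..n-1.
    return [round(v.real) for v in unweighted] + [round(v.imag) for v in unweighted]

def negacyclic_square_direct(x):
    """Reference: polynomial square reduced modulo t^n + 1."""
    n = len(x)
    z = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                z[i + j] += x[i] * x[j]
            else:
                z[i + j - n] -= x[i] * x[j]
    return z
```

For `x = [1, 2, 3, 4]` both functions return the same four coefficients. The half-length transform works because t^n + 1 factors as (t^(n/2) - I)(t^(n/2) + I), and for real input the two residues are complex conjugates, so one of them carries all the information.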