mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

axn 2017-05-27 05:48

IIRC the issue was when the shift was just a bit smaller than p (i.e. very close to p).

preda 2017-05-27 07:03

[QUOTE=ewmayer;459834][...] since all our shift arithmetic (specifically the shiftcount doubling on each iteration) is done (mod p). I could see not-properly-modded shifts *greater* than p being a problem, but in the normal course of the LL test the shift should always be in [0,p-1].[/QUOTE]

Yes, that's how I see "offset", a value in [0, Exponent - 1]. (randomly generated at the start of a new exponent, unless forced with -offset).
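
To make the shift bookkeeping concrete, here is a minimal Python sketch of an LL test on a shifted residue. It is not gpuOwL code: the function names are made up for illustration, and it uses plain big integers instead of FFT words. It only shows that doubling the shift (mod p) on each iteration keeps it in [0, p-1], and that the "-2" has to be shifted by the same amount.

```python
def ll_residue(p):
    """Plain Lucas-Lehmer: return s_(p-2) mod (2^p - 1)."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s


def ll_residue_shifted(p, offset):
    """Same test, but the stored value is s * 2^shift mod (2^p - 1).

    Returns (unshifted final residue, final shift).
    """
    m = (1 << p) - 1
    shift = offset % p
    x = (4 << shift) % m
    for _ in range(p - 2):
        # Squaring doubles the shift; 2^p == 1 mod m, so reduce mod p.
        shift = (2 * shift) % p
        # The "-2" of the LL step must carry the same shift as x*x.
        x = (x * x - (2 << shift)) % m
    # Undo the shift by multiplying with 2^(p - shift).
    return (x << ((p - shift) % p)) % m, shift
```

For example, `ll_residue_shifted(31, 17)[0]` should agree with `ll_residue(31)` for any starting offset.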

preda 2017-05-27 07:08

[QUOTE=LaurV;459836]If it is more convenient for you to limit the initial shift to 16 or even 8 bits, then do so :smile:[/QUOTE]
Even when starting with a small offset, doubling on each iteration makes it grow large quickly, so I have to handle the full range anyhow; there is not much benefit in limiting the starting value.
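
As a rough illustration of how quickly that happens, the Python sketch below counts the doublings before a starting offset first reaches or exceeds p and has to be reduced. The helper name is hypothetical; the exponent in the usage note is one of those that appears in a worktodo line later in this thread.

```python
def iterations_until_wrap(p, offset=1):
    """Count doublings until the offset first reaches or exceeds p."""
    if offset <= 0:
        raise ValueError("offset 0 never grows; use a positive offset")
    n = 0
    while 2 * offset < p:
        offset *= 2
        n += 1
    return n + 1  # the next doubling is the one that needs reduction mod p
```

With p = 40159129, a starting offset of 1 already needs its first reduction after 26 iterations, and an 8-bit offset such as 255 after 18, out of roughly 40 million iterations in total.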

GP2 2017-05-27 07:18

[QUOTE=Madpoo;459826]I don't know the details, but there was another funny bug that showed up when I was doing (mostly) unnecessary triple-checks of previously verified work.

If the shift was smaller than the exponent (do I have that right?) it would cause a problem. It was rare, especially once the exponent sizes got larger, but we did find a few cases where it was an issue.

Specifically I noticed it when doing triple checks of every exponent below 3M or whatever, and the shift count in some cases was smaller than 3e6 so I was getting residues that didn't match.

It might not apply to your algorithm... I forget what the exact problem was (if I ever even knew the details) :smile:[/QUOTE]

See [URL="http://mersenneforum.org/showthread.php?t=20372&page=30"]p. 30 of the Strategic Double Checking thread[/URL], it was discovered and discussed there by you and George.

The bug was triggered when the shift was greater than p − 64. Obviously, for p in the tens of millions it was extremely rare, but when you ran triple checks of very low exponents it was triggered a few times.

preda 2017-05-27 07:25

[QUOTE=kracker;459821]I'm playing around with v0.3 atm.. i'm getting 5.15ms/iter compared to 5ms/iter from v0.2 without offset[/QUOTE]

3% sounds like a lot. What I'd like to achieve is no-offset performance parity when the user sets -offset 0. (This is not the case now, the perf impact is the same regardless of the offset being 0 or something else).

preda 2017-05-27 14:04

[QUOTE=GP2;459841]See [URL="http://mersenneforum.org/showthread.php?t=20372&page=30"]p. 30 of the Strategic Double Checking thread[/URL], it was discovered and discussed there by you and George.

The bug was triggered when the shift was greater than p − 64. Obviously, for p in the tens of millions it was extremely rare, but when you ran triple checks of very low exponents it was triggered a few times.[/QUOTE]

Thanks for pointing out that thread. Having read it, yes, such a bug makes sense, so I looked over my residue-computing code to make sure that the word index correctly "rolls over" in the unlikely case that the 64 bits span the "end" boundary.
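
A small Python sketch of that roll-over, treating the shifted residue as one p-bit integer rather than an array of words. This is an illustration under assumptions (p >= 64, x already reduced below 2^p - 1), with made-up function names; it is neither the gpuOwL code nor a reconstruction of the bug discussed in the other thread.

```python
MASK64 = (1 << 64) - 1


def res64_direct(x, p, shift):
    """Reference: undo the rotation on the whole p-bit value, then mask."""
    m = (1 << p) - 1
    s = (x << ((p - shift) % p)) % m
    return s & MASK64


def res64_windowed(x, p, shift):
    """Read 64 bits starting at bit `shift`, wrapping past bit p-1."""
    out = 0
    for i in range(64):
        pos = (shift + i) % p  # roll over at the "end" boundary
        out |= ((x >> pos) & 1) << i
    return out


def res64_no_wrap(x, p, shift):
    """Illustrative mistake: no roll-over, so bits past p-1 read as zero."""
    return (x >> shift) & MASK64
```

The version without roll-over agrees with the other two as long as shift <= p - 64, and can differ only when the 64-bit window crosses the end, which matches the "shift greater than p − 64" condition quoted above.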

Madpoo 2017-05-27 16:46

[QUOTE=GP2;459841]See [URL="http://mersenneforum.org/showthread.php?t=20372&page=30"]p. 30 of the Strategic Double Checking thread[/URL], it was discovered and discussed there by you and George.

The bug was triggered when the shift was greater than p − 64. Obviously, for p in the tens of millions it was extremely rare, but when you ran triple checks of very low exponents it was triggered a few times.[/QUOTE]

Ah, thanks for the reminder. Yeah, I was pretty sure it was a rare condition and I just couldn't remember the details (and was too lazy to go back and look). :smile:

preda 2017-05-28 09:19

[QUOTE=preda;459842]3% sounds like a lot. What I'd like to achieve is no-offset performance parity when the user sets -offset 0. (This is not the case now, the perf impact is the same regardless of the offset being 0 or something else).[/QUOTE]

I just submitted changes to avoid [most of] the performance penalty when the offset is zero (-offset 0 on exponent start). So even with v0.3, you should now see the same perf as with v0.2 when not using offset.

LaurV 2017-05-28 11:18

After endless struggle we succeeded in building our own owl. Or owl own. Not clear how.
The struggle is not due to the program, but due to our "excellent knowledge" about using the tools. But, well, we succeeded at last, to get a partial selftest ok. Partial because it is still running.

We mainly followed Victor's guide, with few deviations. Deviations were due to the fact that we had to install cygwin too, needed for mingw and msys. Then, within msys, we couldn't run make, because... well... there is no make. There is a mingw32-make.exe, but we were not sure if it is the same toy or not.

After some more search, we found that we need to run

> pacman -S msys/make

Then it asked us what the hack is g++. So we also had to run

> pacman -S msys/gcc

Then the infamous "make" command produced an .exe file. Which is now self-testing. Up to now, everything ok.

We have a small issue however: it does not get out with ctrl+c (not ctrl+break). Most probably we missed a switch or a link or something..

LaurV 2017-05-28 13:10

Ok. Selftests passed, but since the worktodo.h separation, it seems we can't avoid skipping all lines from the worktodo, no matter what we put there. What are we doing wrong? (also, the harmless warning, caused by unused variable)

[CODE]
e:\99 - Prime\gpuOwl>gpuowl -logstep 1000 -savestep 100000
gpuOwL v0.3 GPU Lucas-Lehmer primality checker; Sun May 28 20:07:11 2017
Config: -logstep 1000 -savestep 100000 -cl ""
32x1050MHz Tahiti; OpenCL 1.2 AMD-APP (2348.3)
OpenCL compilation log:
An invalid option was specified.

Falling back to CL1.x compilation (error -11)
OpenCL compilation log:
"C:\blahblah\gpuowl.cl", line 529: warning: variable
"H" was declared but never referenced
uint W = 1024, H = 2048, GW = W / 64;
^

Compile : 760 ms
General setup : 320 ms
worktodo.txt line 'DoubleCheck=blahblah,40159129,72,1' skipped
worktodo.txt line 'DoubleCheck=blahblah,40159153,72,1' skipped

Bye

e:\99 - Prime\gpuOwl>[/CODE]

preda 2017-05-28 13:42

[QUOTE=LaurV;459914]Ok. Selftests passed, but since the worktodo.h separation, it seems we can't avoid skipping all lines from the worktodo, no matter what we put there. What are we doing wrong? (also, the harmless warning, caused by unused variable)
[/QUOTE]

Most likely they are skipped because of the bounds check on the exponent, 50M - 78M..? (is that the reason?)

I updated the code to be explicit about that; you may git pull again.
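
For illustration only, here is a Python sketch of what an explicit skip reason could look like. The bounds are simply the 50M - 78M guess from above, and the function, constants and messages are hypothetical; this is not the actual gpuOwL parser or its output.

```python
MIN_EXP = 50_000_000  # assumed lower bound, from the "50M - 78M" guess above
MAX_EXP = 78_000_000  # assumed upper bound, same caveat


def check_worktodo_line(line):
    """Return (exponent, None) if accepted, or (None, reason) if skipped."""
    _kind, sep, rest = line.strip().partition("=")
    fields = rest.split(",")
    if not sep or len(fields) < 2 or not fields[1].isdigit():
        return None, "unrecognized format"
    exponent = int(fields[1])
    if not (MIN_EXP <= exponent <= MAX_EXP):
        return None, f"exponent {exponent} outside [{MIN_EXP}, {MAX_EXP}]"
    return exponent, None
```

Under these assumed bounds, the line 'DoubleCheck=blahblah,40159129,72,1' from the log above would be skipped with an "outside" reason, since 40159129 is below 50M.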


All times are UTC.