mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GpuOwl (https://www.mersenneforum.org/forumdisplay.php?f=171)
-   -   gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

axn 2017-08-30 06:16

[QUOTE=preda;466649]- all LL error protections have been removed: loop detection, Jacobi-check, rounding-error, etc. (as the new check is stronger).
[/QUOTE]

Hopefully you've put in a "zero check". Without it, the whole thing becomes worthless.

EDIT: Also, rounding error still needs to be detected, since it could mean you need to move to a larger FFT size.

preda 2017-08-30 08:01

[QUOTE=axn;466653]Hopefully you've put in a "zero check". Without it, the whole thing becomes worthless.

EDIT: Also, rounding error still needs to be detected, since it could mean you need to move to a larger FFT size.[/QUOTE]

Yes, the zero check is in (as required by the new check). Rounding errors should be caught by the normal check, though, so why should there be dedicated rounding detection?

It's true that a rounding error strongly suggests "higher FFT", but the same conclusion can be reached from a simple bits/word computation as well.
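
As a rough illustration of the bits/word idea, here is a minimal sketch. The function names and the 18.0 limit are placeholders chosen for the example, not values taken from gpuOwL; the real safe limit depends on the FFT implementation and precision.

```python
def bits_per_word(exponent, fft_words):
    """Average number of bits of the Mersenne exponent stored per FFT word."""
    return exponent / fft_words


def needs_larger_fft(exponent, fft_words, max_bits_per_word=18.0):
    """Return True if the average load per word exceeds the chosen limit.

    max_bits_per_word is an assumed placeholder, not gpuOwL's actual limit.
    """
    return bits_per_word(exponent, fft_words) > max_bits_per_word


if __name__ == "__main__":
    p = 77_000_000
    for words in (4 * 1024 * 1024, 8 * 1024 * 1024):
        print(words, round(bits_per_word(p, words), 3), needs_larger_fft(p, words))
```

A check like this runs before the test starts, so it can pick the FFT size up front, whereas a rounding-error check only reports the problem once the computation is already under way.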

kracker 2017-09-05 01:46

Windows binaries from latest commit(55d094a)... not tested, sorry!

preda 2017-09-05 03:15

[QUOTE=kracker;467125]Windows binaries from latest commit(55d094a)... not tested, sorry![/QUOTE]
Thanks Kracker!

A summary of the changes in 1.1:
- savefile name change, for exponent NNNN they're now called:
NNNN.ll
NNNN-prev.ll
NNNN-temp.ll
NNNN.<iteration>.ll

i.e. all the savefiles for a given exponent start with that exponent.
To see details about a savefile, just print the last line of the file, like this:
tail -n1 file.ll
(the last line is human-readable text; everything before it is binary).

- savefile format change. The savefile signature is now "LL4".
An LL3 savefile can be carried over to LL4 format if it is edited with care: append a "0" to the last line and update the signature to LL4. (The added "0" is the number of error rollbacks.)
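
For anyone without `tail` at hand (e.g. on Windows), the same "print the last line" step can be sketched in Python. This is only an illustration: `last_line` is a made-up helper name, and the sketch assumes nothing about the savefile beyond what is stated above (binary data followed by a final line of text).

```python
from pathlib import Path


def last_line(path):
    """Return the final line of a file whose earlier content may be binary."""
    data = Path(path).read_bytes()
    # Drop a trailing newline, if any, so the split below finds the real last line.
    data = data.rstrip(b"\r\n")
    tail = data.rsplit(b"\n", 1)[-1]
    # The last line is expected to be text; replace any undecodable bytes.
    return tail.decode("utf-8", errors="replace")


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(last_line(sys.argv[1]))
```

The sketch reads the whole file into memory, which keeps it short; for very large savefiles, seeking from the end of the file would be the more economical choice.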

And, though it's not news: it does PRP-3. It writes a JSON-formatted result to results.txt. Soon it will be possible to submit this result format.

preda 2017-09-11 00:06

[QUOTE=airsquirrels;465777]Updated timings from the Windows driver (Which I'm told is still using the previous closed compiler, while Linux has transitioned to LLVM/rocm - apparently regrettably)

RX64 Air:
ms/iter: 1.639

This is with stock clocks. If I use the "stable" 1000Mhz memory clocks that pass self test I get 1.600

Both kernels work as expected in Windows, so likely an llvm regression.[/QUOTE]

The recent head version now works correctly on VEGA on Linux with amdgpu-pro 17.30, in both modes (-legacy or not).

It appears the "workaround" that fixed the behavior was the removal of the "max-error" computation in the amalgamation kernel (a kernel that is only used in non-legacy mode).

Other small improvements also bring the speed to 1.54 ms/it on an air-cooled Vega at standard settings (though with quite a lot of heat generated).

kriesel 2017-09-11 18:52

[QUOTE=preda;467130]Thanks Kracker!

A summary of the changes in 1.1:
- savefile name change, for exponent NNNN they're now called:
NNNN.ll
NNNN-prev.ll
NNNN-temp.ll
NNNN.<iteration>.ll

i.e. all the savefiles for a given exponent start with that exponent.
To see details about a savefile just print the last line of the file, like this:
tail -n1 file.ll
(the last line is human-readable text, all before that is binary).

- savefile format change. See the savefile signature that is now changed to "LL4".
It is possible to bring over an LL3 savefile to LL4 format, if it is edited with care by appending a "0" on the last line, and updating the signature to LL4. (this "0" that was added is the number of error rollbacks).

And, though it's not news: it does PRP-3. It writes a JSON-formatted result to results.txt. Soon it will be possible to submit this result format.[/QUOTE]

Why the file extension .ll if you're doing a PRP-3 computation, not Lucas-Lehmer?

preda 2017-09-11 22:07

[QUOTE=kriesel;467562]Why file extension .ll if you're doing a PRP-3 computation not Lucas-Lehmer?[/QUOTE]

Legacy? -- it simply remained unchanged from LL times. I can change it if desired.

GP2 2017-09-12 00:30

[QUOTE=preda;467585]Legacy? -- it simply remained unchanged from LL times. I can change it if desired.[/QUOTE]

It does create confusion.

preda 2017-09-14 04:18

[QUOTE=GP2;467591]It does create confusion.[/QUOTE]

OK. I plan to change the file extension to "owl".

Prime95 2017-09-20 21:04

I'm trying to make sure gpuOwl interim residues match prime95 interim residues. As the person that implemented completely non-standard iteration numbers (off by 2) for LL testing, let me check on the standard for PRP tests.

I view a squaring and optional mul-by-3 as one iteration. Thus, when PRPing a Mersenne number, I think the interim residue for iteration 1 is (3^2)*3 = 27. There are N-1 iterations to PRP 2^N-1.

preda 2017-09-20 23:35

[QUOTE=Prime95;468213]I'm trying to make sure gpuOwl interim residues match prime95 interim residues. As the person that implemented completely non-standard iteration numbers (off by 2) for LL testing, let me check on the standard for PRP tests.

I view a squaring and optional mul-by-3 as one iteration. Thus, when PRPing a Mersenne number, I think the interim residue for iteration 1 is (3^2)*3 = 27. There are N-1 iterations to PRP 2^N-1.[/QUOTE]

A brief description of my implementation is here:
[url]http://www.mersenneforum.org/showthread.php?p=466655#post466655[/url]

In the residue computation, there is no mul-by-3 at any point.

The mul-by-3 is only involved in the verification, thus does not affect the residue.

My values are:
iteration 0: residue is 3.
iteration 1: residue is 3^2.
iteration 2: residue is 3^4.

For M = 2^p - 1,
the final residue is 3^(2^(p-1)). This final residue is -3 for a PRP.
Note, computing 3^(2^(p-1)) does not require any mul-by-3, only squarings.
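
For reference, the reason the final residue is -3 when M is prime can be sketched with Euler's criterion and quadratic reciprocity (this derivation is standard number theory added for clarity; it assumes p is an odd prime and M = 2^p - 1 is prime):

```latex
\begin{align*}
M &= 2^p - 1 \equiv 3 \pmod 4, \qquad M \equiv 1 \pmod 3
   && \text{(for odd } p\text{)} \\
\left(\frac{3}{M}\right)\left(\frac{M}{3}\right)
  &= (-1)^{\frac{3-1}{2}\cdot\frac{M-1}{2}} = -1
   && \text{(quadratic reciprocity, } \tfrac{M-1}{2} \text{ odd)} \\
\left(\frac{M}{3}\right) &= \left(\frac{1}{3}\right) = 1
  \;\Longrightarrow\; \left(\frac{3}{M}\right) = -1 \\
3^{\frac{M-1}{2}} &= 3^{2^{p-1}-1} \equiv -1 \pmod M
   && \text{(Euler's criterion)} \\
3^{2^{p-1}} &= 3 \cdot 3^{2^{p-1}-1} \equiv -3 \pmod M
\end{align*}
```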


See also [url]http://www.mersenneforum.org/showpost.php?p=466054&postcount=138[/url]
where I proposed adding 3 to the final residue so that the final value for a PRP would be 0. I abandoned that idea in the end, because it would make the final residue inconsistent with the residues at other positions.
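
The iteration numbering described in this post can be illustrated with a small sketch using Python's big integers. It is not gpuOwL code (which works on the GPU with FFT multiplication), and the function names are made up for the example; it only reproduces the convention "iteration k holds 3^(2^k) mod M, squarings only".

```python
def prp3_residues(p):
    """Yield the residue at iterations 0 .. p-1 for M = 2^p - 1.

    Iteration 0 is 3; each further iteration is a single squaring mod M,
    so iteration k holds 3^(2^k) mod M. No mul-by-3 is involved.
    """
    m = (1 << p) - 1
    r = 3 % m
    yield r
    for _ in range(p - 1):
        r = (r * r) % m
        yield r


def is_prp3(p):
    """Return True if the final residue 3^(2^(p-1)) mod M equals -3 mod M."""
    m = (1 << p) - 1
    final = None
    for final in prp3_residues(p):
        pass
    return final == (m - 3) % m


if __name__ == "__main__":
    print(list(prp3_residues(7))[:3])  # [3, 9, 81]
    print(is_prp3(7), is_prp3(11))     # True False
```

Under the other convention discussed in this thread (a squaring plus a mul-by-3 counted as one iteration), the interim residues would differ from these, so any comparison of interim residues between programs has to settle the convention first.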

