gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

0PolarBearsHere 2017-07-19 10:10

Win 8.1, i7-3930K, GTX1080, gpuowl 0.5
Doing a self-test I got an error -9999, and then the exe crashed.
[QUOTE]LL FFT 4096K (1024*2048*2) of 77002949 (18.36 bits/word) offset 2000 iteration 0
error -9999[/QUOTE]
Any ideas? I couldn't find that error in this thread using a quick search.
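
The "18.36 bits/word" figure in the quoted log line can be checked by hand. Below is a minimal sketch in C, assuming (this is a reading of the log format, not something stated in the thread) that "4096K (1024*2048*2)" is the number of FFT words and that bits/word is simply the exponent divided by that count. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Average number of bits of the exponent stored per FFT word.
 * Assumption: the bits are spread evenly over all words. */
double bits_per_word(unsigned long exponent, unsigned long words) {
    assert(words > 0);
    return (double)exponent / (double)words;
}

/* Hypothetical helper: prints the figure for the self-test exponent
 * and FFT size that appear in the quoted log line. */
void print_selftest_bits(void) {
    unsigned long words = 1024UL * 2048UL * 2UL;   /* 4194304 = 4096K */
    printf("%.2f bits/word\n", bits_per_word(77002949UL, words));
}
```

With the values from the log, the quotient is roughly 18.36, which agrees with the number gpuowl printed.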

thyw 2017-07-19 11:09

I'm not sure about this, but after looking at the source code on GitHub, clwrap.h lines 11-12 read:
[CODE]
#define CHECK(err) { int e = err; if (e != CL_SUCCESS) { fprintf(stderr, "error %d\n", e); assert(false); }}
#define CHECK2(err, mes) { int e = err; if (e != CL_SUCCESS) { fprintf(stderr, "error %d (%s)\n", e, mes); assert(false); }}[/CODE]
These must be the lines that print the message (I searched all the sources for anything that prints "error ...").
CL_SUCCESS is documented at [URL]https://www.khronos.org/registry/OpenCL/sdk/1.1/docs/man/xhtml/clGetEventInfo.html[/URL], so -9999 is the value of an OpenCL error code.

[URL]https://streamhpc.com/blog/2013-04-28/opencl-error-codes/[/URL]
[CODE]Errors thrown by Vendors
Code   Vendor  Function(s)             Description
-9999  NVidia  clEnqueueNDRangeKernel  Illegal read or write to a buffer
[/CODE]
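
To make such messages easier to read, the numeric code could be translated to a name before printing. The following is only a sketch: `cl_error_name` and `cl_check` are hypothetical helpers, not part of gpuowl or of OpenCL. The listed values are a small subset of the standard codes from the CL/cl.h header, and the -9999 entry rests solely on the vendor table quoted above, since that value is not defined by the OpenCL specification.

```c
#include <assert.h>
#include <stdio.h>

/* Name for a few OpenCL status codes (subset only).
 * -9999 is vendor-specific; its description is taken from the
 * table quoted above, not from the OpenCL specification. */
const char *cl_error_name(int e) {
    switch (e) {
    case 0:     return "CL_SUCCESS";
    case -1:    return "CL_DEVICE_NOT_FOUND";
    case -4:    return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:    return "CL_OUT_OF_RESOURCES";
    case -6:    return "CL_OUT_OF_HOST_MEMORY";
    case -11:   return "CL_BUILD_PROGRAM_FAILURE";
    case -30:   return "CL_INVALID_VALUE";
    case -54:   return "CL_INVALID_WORK_GROUP_SIZE";
    case -9999: return "vendor-specific (NVIDIA: illegal buffer read/write)";
    default:    return "unknown";
    }
}

/* Hypothetical non-aborting variant of CHECK2: returns 1 on success,
 * otherwise prints the code with its name and returns 0. */
int cl_check(int err, const char *what) {
    assert(what != NULL);
    if (err == 0) return 1;
    fprintf(stderr, "error %d (%s) in %s\n", err, cl_error_name(err), what);
    return 0;
}
```

Calling `cl_check(-9999, "clEnqueueNDRangeKernel")` would print the code together with the vendor description instead of a bare "error -9999".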

airsquirrels 2017-08-16 20:05

Vega timings. I had to use the legacy kernels; otherwise all residues came out as 0xfffff... This was done on a stack known to still be pretty raw and unoptimized (rocm: 1.6.127).

(Testing on 77002949)
[B]RX64 Air (1630 MHz, but it quickly dropped to 1536 MHz due to heat):[/B]
ms/iter: 2.048

[B]FE Liquid (1600 MHz):[/B]
ms/iter: 1.898

Similar to Fury at present.
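
For a sense of scale, a ms/iter figure can be turned into a rough duration for a whole test. This is a back-of-the-envelope sketch, assuming a test needs about as many iterations as the exponent and that the rate stays constant (thermal throttling, as mentioned above, would change that). `estimated_hours` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/* Rough wall-clock estimate, in hours, for one full test.
 * Assumption: about `exponent` iterations at a constant ms/iter. */
double estimated_hours(unsigned long exponent, double ms_per_iter) {
    assert(ms_per_iter > 0.0);
    return (double)exponent * ms_per_iter / 1000.0 / 3600.0;
}

/* Prints the estimates for the two timings reported above. */
void print_vega_estimates(void) {
    printf("RX64 Air:  %.1f h\n", estimated_hours(77002949UL, 2.048));
    printf("FE Liquid: %.1f h\n", estimated_hours(77002949UL, 1.898));
}
```

Under these assumptions the two cards would need roughly 44 and 41 hours for exponent 77002949.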

preda 2017-08-17 11:07

Thank you, David, for the info. Yes, I can repro this now (I got my RX Vega just moments ago). The "-2" residue is clearly a problem; I'll investigate.

airsquirrels 2017-08-17 17:45

Updated timings from the Windows driver (Which I'm told is still using the previous closed compiler, while Linux has transitioned to LLVM/rocm - apparently regrettably)

RX64 Air:
ms/iter: 1.639


FE: Unknown; the card isn't recognized by the same driver as the RX.

This is with stock clocks. If I use the "stable" 1000 MHz memory clocks that pass the self-test, I get 1.600.

Both kernels work as expected in Windows, so this is likely an LLVM regression.

preda 2017-08-21 00:31

[QUOTE=airsquirrels;465777]Updated timings from the Windows driver (Which I'm told is still using the previous closed compiler, while Linux has transitioned to LLVM/rocm - apparently regrettably)

RX64 Air:
ms/iter: 1.639


FE: Unknown; the card isn't recognized by the same driver as the RX.

This is with stock clocks. If I use the "stable" 1000 MHz memory clocks that pass the self-test, I get 1.600.

Both kernels work as expected in Windows, so this is likely an LLVM regression.[/QUOTE]

I've been investigating this for a while now on Linux. I'm mystified by the behavior I observe. Clearly it is something related to memory/cache and timing, but I can't track it down to a cause in my code. Frustrating.

airsquirrels 2017-08-21 01:30

[QUOTE=preda;466027]I've been investigating this for a while now on Linux. I'm mystified by the behavior I observe. Clearly it is something related to memory/cache and timing, but I can't track it down to a cause in my code. Frustrating.[/QUOTE]

I saw your post over on ROCm. I've seen numerous regressions since the switch to LLVM from the older proprietary compiler. In my opinion, switching amdgpu-pro to ROCm was premature, and I'm disappointed that they essentially made a beta compiler the only option in their production driver. Clearly they have the compiler code for Vega: Windows uses it.

kracker 2017-08-28 02:06

Attached is a Windows binary built from the latest commit (676be1c).
[B]NOTE: Untested![/B]

preda 2017-08-28 02:59

[QUOTE=kracker;466482]Windows binary from the latest commit(676be1c).
[B]NOTE: Untested![/B][/QUOTE]

I would hold off a bit before using 1.0: it's still a work in progress and I'd like to do more testing myself. Also, it's not backwards compatible with savefiles.

PS: I'll make a post with a summary of the changes when it's good to go.

preda 2017-08-30 05:45

gpuOwl 1.0
 
[QUOTE=preda;466484]I'll make a post with a summary of the changes when it's good to go.[/QUOTE]

As promised, here's the note with recent changes to gpuOwl:

gpuOwl 1.0 ( [url]https://github.com/preda/gpuowl[/url] )

The "1.0" is not an indication of stability or of being bug-free, only of radical changes from the previous version (0.6), so expect the usual amount of bugs (or more) and rough edges.

gpuOwl does not implement LL (Lucas-Lehmer primality test) anymore. Instead, it does the "PRP-3" probable prime test, which is:

For M = 2^p-1, compute R := 3^(2^(p-1)) mod M (that is, start from 3 and square it p-1 times modulo M).
If R == -3 (mod M), then M is a probable prime. In this situation, an LL test should be run to verify that the *probable* prime is actually prime.

(In practice, though, somebody lucky enough to find a PRP-3 probable Mersenne prime should probably simply claim a prime, and the community would do the required LL validation.)
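
The test can be tried on tiny exponents with plain 64-bit arithmetic. This is a toy sketch, not gpuOwl's FFT-based implementation: it computes R by p-1 successive squarings of 3, i.e. R = 3^(2^(p-1)) mod M, which is the exponent for which a Mersenne prime gives R == -3 (mod M). The function name `prp3` and the limit p <= 31 are choices made for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Toy PRP-3 test for M = 2^p - 1, limited to 3 <= p <= 31 so that
 * every product fits in 64 bits.
 * Returns 1 if M is a base-3 probable prime, 0 otherwise. */
int prp3(unsigned p) {
    assert(p >= 3 && p <= 31);
    uint64_t m = (UINT64_C(1) << p) - 1;
    uint64_t r = 3;
    for (unsigned i = 0; i < p - 1; ++i) {
        r = (r * r) % m;      /* after the loop, r = 3^(2^(p-1)) mod M */
    }
    return r == m - 3;        /* -3 (mod M) is represented as M - 3 */
}
```

For example, `prp3(7)` reports a probable prime (M = 127), while `prp3(11)` does not (M = 2047 = 23 * 89).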

The change to PRP-3 is motivated by a nice self-validating side computation described by Robert Gerbicz: [url]http://www.mersenneforum.org/showpost.php?p=465431&postcount=88[/url]

This affords confidence in the correctness of the result at a small additional cost.
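
The linked post is not reproduced here. As a rough illustration of how a side computation of this kind can validate a chain of squarings, here is a toy sketch of the idea as it is commonly described: keep a running product d of every `block`-th residue, and verify it with `block` extra squarings. This is an assumption-laden toy, not gpuOwl's code; the name `gerbicz_toy`, the error injection, and the limit p <= 31 are all made up for the example, and the tail after the last full block is left unchecked.

```c
#include <assert.h>
#include <stdint.h>

/* Squaring chain u[0] = 3, u[i+1] = u[i]^2 mod M, with M = 2^p - 1.
 * Every `block` squarings, d is multiplied by the current u, and the
 * identity  d_new == 3 * d_old^(2^block)  (mod M)  is verified.
 * `bad_iter` corrupts u right after that squaring (0 = no error).
 * Returns 1 if every check passes, 0 if a mismatch is detected. */
int gerbicz_toy(unsigned p, unsigned block, unsigned bad_iter) {
    assert(p >= 3 && p <= 31 && block >= 1);
    uint64_t m = (UINT64_C(1) << p) - 1;
    uint64_t u = 3, d = 3;
    for (unsigned i = 1; i <= p - 1; ++i) {
        u = (u * u) % m;
        if (i == bad_iter) u = (u + 1) % m;   /* simulated hardware error */
        if (i % block == 0) {
            uint64_t d_old = d;
            d = (d * u) % m;
            uint64_t t = d_old;               /* becomes d_old^(2^block) */
            for (unsigned k = 0; k < block; ++k) t = (t * t) % m;
            if (d != (3 * t) % m) return 0;   /* chain was corrupted */
        }
    }
    return 1;
}
```

In this toy the check costs as many squarings as the chain itself. The scheme becomes cheap when the verification is run only occasionally rather than at every block, which this sketch does not attempt to show.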

Other main changes:
- the residue computed is not an LL residue anymore. It should not be compared, reported, or mixed with LL residues. To make the distinction easier, the new residue looks different, e.g. "P3-0223c56fae4cfdef6b" (you may notice it's a bit longer too; this is because the last 2 hex digits are check digits).

- while the end-residue for an LL prime is 0, the end-residue for this PRP-3 "probable prime" is "-3", which shows up as 0xfffffffffffffffc. Not as nice as 0, I agree :). E.g. P3-fffffffffffffffc0d with the check digits.

- all LL error protections have been removed: loop detection, Jacobi-check, rounding-error, etc. (as the new check is stronger).
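
Why "-3" shows up as 0xfffffffffffffffc can be seen with a few lines of C. The sketch below assumes the printed residue is the low 64 bits of the value reduced mod M, which is how the example in the list above reads. Since -3 (mod M) is M - 3 = 2^p - 4, and 2^p contributes nothing to the low 64 bits once p >= 64, the result is just -4 in 64-bit wraparound arithmetic. The function names are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Low 64 bits of -3 (mod 2^p - 1), i.e. of 2^p - 4. */
uint64_t low64_of_minus3(unsigned long p) {
    assert(p >= 3);
    uint64_t two_p = (p < 64) ? (UINT64_C(1) << p) : 0;  /* 2^p mod 2^64 */
    return two_p - 4;   /* unsigned wraparound when p >= 64 */
}

/* Prints the value as 16 hex digits, without any check digits. */
void print_minus3_residue(unsigned long p) {
    printf("%016" PRIx64 "\n", low64_of_minus3(p));
}
```

For any exponent in the range gpuOwl handles, this gives fffffffffffffffc, matching the first 16 hex digits of the example residue above.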

1.0 looks OK to me for general use now. Looking for bug reports & feedback.

PS: on RX Vega, use "-legacy" flag.

preda 2017-08-30 05:52

gpuOwl 1.0 does not load savefiles from previous versions (a PRP test cannot load or continue an LL test).
So please stay with the LL version until the ongoing exponent is done, then switch and start a new exponent from the beginning.


All times are UTC.