|
|
#133 |
|
"Mihai Preda"
Apr 2015
55B₁₆ Posts |
|
|
|
|
|
|
#134 | |
|
"Mihai Preda"
Apr 2015
3·457 Posts |
Quote:
|
|
|
|
|
|
|
#135 | |
|
"Mihai Preda"
Apr 2015
1371₁₀ Posts |
I updated gpuOwL on github ( https://github.com/preda/gpuowl ), bumping version to 0.2. Here is a summary of the changes:
1. "Amalgamation kernel". I merged 4 previously distinct kernels into one "big" kernel. This saves about 3 global-memory round-trips. It does not change the double-precision complexity though. As the previous kernels were close to being double-precision bound (and close to memory-bound too), the performance gain from using the "amalgamation" is a modest 5%-10%.
2. Added option -legacy, which forces the old behavior (i.e. not using the "amalgamation kernel").
3. Added define -D NO_ERR to disable computation of the max-error. This gains about 1% performance, but I don't recommend it because the max-error is useful info. Similarly, added a define -D LOW_LDS to use a variant of the amalgamation with low LDS usage. These defines are passed on the command line like this (an example):
./gpuowl -logstep 10000 -cl "-DNO_ERR -DLOW_LDS"
(note there is a single argument after -cl, enclosed in quotes if needed)
4. Changed the carry propagation to stay in double-precision (previously an intermediate integer phase was involved, but the double-to-long conversion is expensive on GCN).
5. The carry propagation length is much shorter, only 3 words now. This raises the exponent lower bound to about 12 bits-per-word. (-legacy is not affected.)

The checkpoint (save) format is not changed.

As usual, I'd recommend doing a -selftest (~30 minutes) and one successful double-check LL before starting first-time LL.

This amalgamation kernel is big. For performance it must be compiled in under 128 VGPRs, but AMD's OpenCL compiler (LLVM-based) is very poor at optimizing VGPR allocation and I had to fight it to fit under 128 VGPRs. If some compiler does not make the 128 VGPR limit, the amalgamation kernel takes a serious performance hit.

Below, on FuryX, 4M FFT just barely over 2 ms/it:
Quote:
|
|
|
|
|
|
|
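The numbers above (a 4M FFT, a lower bound of about 12 bits-per-word, and roughly 2 ms/it) can be turned into a quick back-of-the-envelope calculation. This is only a sketch: it assumes "4M" means 4 · 1024 · 1024 words and uses the fact that an LL test of exponent p runs p − 2 iterations. The function names are made up for illustration and are not part of gpuOwL.

```cpp
#include <cstdint>

// Words in an FFT whose size is given in "M" (assumed binary: 4M = 4 * 1024 * 1024).
constexpr std::uint64_t fftWords(std::uint64_t sizeInM) {
    return sizeInM * 1024 * 1024;
}

// Smallest exponent that still gives `minBitsPerWord` bits per word on average.
constexpr std::uint64_t minExponent(std::uint64_t words, std::uint64_t minBitsPerWord) {
    return words * minBitsPerWord;
}

// Rough run time in hours: an LL test of exponent p performs p - 2 iterations.
constexpr double estimatedHours(std::uint64_t exponent, double msPerIteration) {
    return static_cast<double>(exponent - 2) * msPerIteration / 1000.0 / 3600.0;
}
```

With these assumptions, minExponent(fftWords(4), 12) is 50331648, so exponents below roughly 50.3 million would fall under the 12 bits-per-word bound at 4M. An illustrative exponent near 75 million at 2 ms/it would come out at a bit under 42 hours.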
#136 | ||
|
"Mihai Preda"
Apr 2015
3·457 Posts |
A couple more timings:
Fury Nano, with max-error disabled, 2.24 ms/it Quote:
Quote:
|
||
|
|
|
|
|
#137 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
23·271 Posts |
Got this error while trying to compile:
Code:
$ g++ -c gpuowl.cpp
gpuowl.cpp: In function 'int main(int, char**)':
gpuowl.cpp:702:5: error: 'uint' was not declared in this scope
uint baseBitlen = (int) floorl(E / (long double) N);
^~~~
gpuowl.cpp:714:25: error: 'baseBitlen' was not declared in this scope
mega1K.setArgs (baseBitlen, buf1, bufCarry, bufReady, bufErr, bufA, bufI, bufTrig1K);
^~~~~~~~~~
|
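For context on the error above: `uint` is not a standard C++ type. Some platforms' system headers provide it as a typedef (glibc's <sys/types.h> does, for example), which would explain why the same source builds on one toolchain and not on another. Below is a minimal sketch of the same computation spelled with the standard type `unsigned`. The meaning of E (exponent) and N (number of FFT words) is assumed from the names in the error message, and this is not necessarily how the repository was fixed.

```cpp
#include <cmath>

// Same computation as the line quoted in the error message, but using the
// standard type `unsigned` instead of the platform-specific `uint` typedef.
// E = exponent, N = number of FFT words (meaning assumed from the names).
unsigned baseBitlen(unsigned E, unsigned N) {
    return static_cast<unsigned>(std::floor(E / static_cast<long double>(N)));
}
```

An alternative with the same effect would be a local `typedef unsigned int uint;` near the top of the file.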
|
|
|
|
|
#138 | |
|
"Mihai Preda"
Apr 2015
10101011011₂ Posts |
Quote:
|
|
|
|
|
|
|
#139 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
23×271 Posts |
Thanks!
I'm getting this now though when I start it...
Code:
...
Compile : 2160 ms
General setup : 476 ms
Assertion failed!
Program: C:\Users\Back\Desktop\gpuowl\gpuowl.exe
File: gpuowl.cpp, Line 100
Expression: bits == baseBits || bits == baseBits +1
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
|
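The failed expression says that every word must be either baseBits or baseBits + 1 bits wide. In the usual irrational-base layout, word k holds ceil((k+1)·E/N) − ceil(k·E/N) bits, which is always one of those two values. The sketch below checks that property with exact integer arithmetic, so the result does not depend on floating-point rounding. It assumes this standard layout; the thread does not show how gpuowl.cpp computes the widths, and all function names here are hypothetical.

```cpp
#include <cstdint>

// ceil(a / b) for non-negative integers.
std::uint64_t ceilDiv(std::uint64_t a, std::uint64_t b) {
    return (a + b - 1) / b;
}

// Bit length of word k when E bits are spread over N words:
// ceil((k + 1) * E / N) - ceil(k * E / N).
unsigned wordBits(std::uint64_t E, std::uint64_t N, std::uint64_t k) {
    return static_cast<unsigned>(ceilDiv((k + 1) * E, N) - ceilDiv(k * E, N));
}

// True if every word is baseBits or baseBits + 1 wide and the widths sum to E.
bool checkWordBits(std::uint64_t E, std::uint64_t N) {
    const unsigned baseBits = static_cast<unsigned>(E / N);
    std::uint64_t total = 0;
    for (std::uint64_t k = 0; k < N; ++k) {
        const unsigned bits = wordBits(E, N, k);
        if (bits != baseBits && bits != baseBits + 1) return false;
        total += bits;
    }
    return total == E;
}
```

For a toy case with E = 100 and N = 8, baseBits is 12, word 0 gets 13 bits and word 1 gets 12 bits.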
|
|
|
|
#140 |
|
"Mihai Preda"
Apr 2015
3×457 Posts |
Could you please tell me which C++ compiler, which exponent, and which platform (mingw?) you are using? I'd like to reproduce this. (Probably the exponent is all I need to repro...)
Last fiddled with by preda on 2017-05-22 at 03:40 |
|
|
|
|
|
#141 | |
|
"David"
Jul 2015
Ohio
517₁₀ Posts |
Quote:
Just from looking over the high-level numbers, I can say it does look like something about gpuowl results in occasional bad results where clLucas had been reliable. It is hard to know whether this comes from stressing cards closer to their limit by extracting all the performance, or from some other factor. So far I haven't found an issue that is repeatable for any one exponent, so I don't believe there are math/logic issues unless they are timing related.

It is worth noting that we should cause gpuowl to fail if it reaches 00...0002 or all zero at any point in the calculation. One of the most interesting results I have had lately hit the 00...02 residue at one point. What made it most interesting is that it happened on a FirePro W8100, which should be more resilient than the typical card due to ECC memory and better binning.
|
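The reason those two residues are worth trapping: the LL iteration maps s to s² − 2, so a residue of 2 stays at 2 forever, and a residue of 0 goes to −2 and then to 2. Once either value appears before the final iteration, every later residue carries no information. Below is a minimal sketch of such a check. It assumes a hypothetical host-side representation (non-negative words, least-significant first, fully carried), which is not how gpuOwL stores its GPU buffers, and the function name is invented for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical representation: the residue as non-negative words,
// least-significant word first, already fully carried/normalized.
// Returns true if the value is exactly 0 or exactly 2.
bool isDegenerateResidue(const std::vector<std::uint32_t>& words) {
    if (words.empty()) return true;  // an empty residue is treated as zero
    // Every word above the lowest must be zero for the value to be 0 or 2.
    for (std::size_t i = 1; i < words.size(); ++i) {
        if (words[i] != 0) return false;
    }
    return words[0] == 0 || words[0] == 2;
}
```

Such a check would only apply to intermediate iterations, since a zero residue at the very last iteration is the signal that the Mersenne number is prime.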
|
|
|
|
|
#142 | |
|
"Mr. Meeseeks"
Jan 2012
California, USA
23×271 Posts |
Quote:
I'll tinker around with it some more later though.. |
|
|
|
|
|
|
#143 |
|
"David"
Jul 2015
Ohio
11·47 Posts |
181 successful LL tests with gpuOwl.
7 bad, 4 good from one particular Fury X, vs. a perfect record on clLucas. I've dialed the core clock back to 975 and will continue to monitor.
1 bad on a Fury X from a bad power supply system.
1 bad possibly from a power outage on one core of a 295x2. (The other core matched.)
2 bad (out of 4 total completed) on my FirePro W9100/W8100 system. I'm monitoring this; it may be a driver problem or some other issue. This system is rock solid on clLucas, but I updated drivers around the same time I switched it to gpuOwl.

I also discovered that another one of my Titan Blacks seems to have reached failure mode, which was clouding my bad results list. I feel bad for blaming gpuOwl for those red marks.

In total, v0.1 seems very reliable aside from the cases listed above. Even including them, I see 181 good / 11 bad, for a 94.27% success rate.

The new version 0.2 is definitely running ~12.5% faster for me on most cards. I will report back in a week or so if I see any errors from that version.
|
|
|