#1673
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2·29·127 Posts
Quote:
#1674
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×29×127 Posts
It took ~1.74 days of run time, spread over several Colab sessions, with an executable provided by Fan Ming. https://www.mersenne.org/report_expo...0000031&full=1 Current projections, based on runtime scaling and the buffer-count trend, are that higher data points will take 2-4 days each, and that runs throughout the mersenne.org range will be possible. The run times can probably be improved; I'm not using any of the performance-enhancing T2_shuffle or merged-middle -use options in these runs.
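A rough sketch of a runtime-scaling projection like the one mentioned above, from a single measured data point. It assumes per-iteration time grows like p·log p (FFT length taken proportional to the exponent p) and that total time is iterations × per-iteration time; iteration counts are parameters because the post does not give them. The function name and the model are illustrative assumptions, not how the 2-4 day figures were derived, and the model ignores FFT-size steps and the buffer-count effect.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: first-order run-time projection from one measured run.
// Assumes per-iteration cost ~ N log N with FFT length N proportional to p,
// and total time = iteration count * per-iteration time.
double projectDays(double measuredDays, double pMeasured, double itersMeasured,
                   double pTarget, double itersTarget) {
  double perIterRatio = (pTarget * std::log(pTarget)) /
                        (pMeasured * std::log(pMeasured));
  return measuredDays * (itersTarget / itersMeasured) * perIterRatio;
}
```

With identical inputs the projection returns the measured time unchanged; doubling p at a fixed iteration count gives a factor of a little over 2 (about 2.08 near p = 10^8).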
Last fiddled with by kriesel on 2019-12-31 at 03:01
#1675
Aug 2010
Republic of Belarus
2·89 Posts
Hello!
How do I switch gpuOwl to show the traditional "ms/it" instead of "us/sq"?
#1676
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
16306₈ Posts
Edit the source code and recompile.
Last fiddled with by kriesel on 2019-12-31 at 12:33
#1677
Aug 2010
Republic of Belarus
2·89 Posts
#1678
Sep 2002
Database er0rr
1188₁₆ Posts
Quote:
Code:
```
grep -B3 -A3 "us/it" Gpu.cpp
static string makeLogStr(u32 E, string_view status, u32 k, u64 res, float secsPerIt, u32 nIters) {
  char buf[256];
  snprintf(buf, sizeof(buf), "%u %2s %8d %6.2f%%; %4.0f us/it; ETA %s; %s",
           E, status.data(), k, k / float(nIters) * 100,
           secsPerIt * 1'000'000, getETA(k, nIters, secsPerIt).c_str(),
           hex(res).c_str());
```
%4.0f us/it ---> %4.3f ms/it
1'000'000 ---> 1'000
And recompile. Or just divide by 1000 in your head.
Last fiddled with by paulunderwood on 2019-12-31 at 12:52
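For illustration, a self-contained sketch of the suggested edit applied to a simplified, hypothetical formatter (makeLogStrMs is not a gpuowl function). It prints milliseconds per iteration using %4.3f and a 1'000 multiplier; the ETA and residue fields are dropped because their helpers (getETA, hex) aren't reproduced here.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical, reduced version of a gpuowl-style progress line that reports
// ms/it instead of us/it. Only the exponent, status, iteration, percent and
// timing fields are kept.
std::string makeLogStrMs(unsigned E, const std::string& status, unsigned k,
                         unsigned nIters, float secsPerIt) {
  char buf[256];
  std::snprintf(buf, sizeof(buf), "%u %2s %8u %6.2f%%; %4.3f ms/it",
                E, status.c_str(), k, k / float(nIters) * 100,
                secsPerIt * 1'000);  // seconds -> milliseconds (was 1'000'000 for us)
  return std::string(buf);
}
```

For example, makeLogStrMs(100, "OK", 50, 100, 0.004f) yields a line containing "50.00%" and "4.000 ms/it". The sketch uses %8u for the unsigned iteration count, whereas the quoted line uses %8d with a u32.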
#1679
Aug 2010
Republic of Belarus
262₈ Posts
#1680
Random Account
Aug 2009
Not U. + S.A.
7·19² Posts
Quote:
#1681
"Mihai Preda"
Apr 2015
2²·19² Posts
George has contributed a new optimization: it uses only 32 bits to store the carry-out from each word after the convolution. The theoretical analysis of whether this carry value always fits in 32 bits is not very clear AFAIK, but the rough idea is that the larger the FFT size, the larger the expected value of the carry. The new CARRY32 has been tested quite a bit at the wavefront (5M FFT) and never produced an error; OTOH the situation may be different at larger FFT sizes.
The performance gain is significant, about 3-5%. Given the above, CARRY32 is now enabled by default. To get the old behavior, pass "-use CARRY64" to gpuowl. PRP should detect a carry overflow (when using CARRY32) if one occurs: it reports the usual error, retries, and stops after the error repeats 3 times. OTOH P-1 has no such check, so it's probably safer to keep using CARRY64 for P-1, especially with FFT sizes larger than 5M (the only FFT size tested extensively so far). If anybody sees an error that seems to be caused by CARRY32 (at any FFT size), please report it.
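A simplified, hypothetical illustration of what keeping the carry in 32 bits means, and of the overflow check it needs. It propagates carries across integer convolution outputs with a fixed number of bits per word. gpuowl's real GPU kernels work differently (variable word sizes, balanced digits, wrap-around for the Mersenne modulus), so propagateCarry32 and CarryResult are illustrative names only. The carry out of a word is roughly that word's value divided by 2^bits, so larger convolution outputs, which the post above associates with larger FFT sizes, need wider carries; when a carry does not fit in int32_t, the sketch reports failure instead of silently truncating.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical result type: normalized words plus the final carry-out.
struct CarryResult {
  std::vector<int64_t> words;  // each word in [0, 2^bits)
  int32_t carryOut;            // carry left over after the last word
};

// Hypothetical sketch: propagate carries through integer convolution outputs,
// keeping the running carry in 32 bits. Returns std::nullopt if any carry
// would not fit in int32_t (the overflow a 32-bit carry has to detect).
std::optional<CarryResult> propagateCarry32(const std::vector<int64_t>& conv, int bits) {
  const int64_t base = int64_t(1) << bits;
  CarryResult r{std::vector<int64_t>(conv.size()), 0};
  int32_t carry = 0;
  for (std::size_t i = 0; i < conv.size(); ++i) {
    int64_t x = conv[i] + carry;
    int64_t digit = x & (base - 1);     // low `bits` bits, always non-negative
    int64_t wide = (x - digit) / base;  // exact: x - digit is a multiple of base
    if (wide < std::numeric_limits<int32_t>::min() ||
        wide > std::numeric_limits<int32_t>::max()) {
      return std::nullopt;              // carry would overflow 32 bits
    }
    carry = static_cast<int32_t>(wide);
    r.words[i] = digit;
  }
  r.carryOut = carry;
  return r;
}
```

For example, with bits = 2 the outputs {5, 3} normalize to words {1, 0} with a carry-out of 1 (5 + 3·4 = 17 = 1 + 0·4 + 1·16), while an output of 2^40 produces a carry of 2^38 and is rejected.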
#1682
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×29×127 Posts
This should have the -use CARRY32 default that Preda described above. I've only run -h on it so far. The build again produced the usual shower of warnings.
Just when I think we've reached diminishing returns, or the end of the optimizations, George provides another pleasant surprise.
Last fiddled with by kriesel on 2020-01-04 at 15:02
#1683
Sep 2002
Database er0rr
2³×3×11×17 Posts
Quote:
Last fiddled with by paulunderwood on 2020-01-04 at 16:11