mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing > GpuOwl

2019-12-11, 21:15   #1574
ATH
Einyen
Dec 2003
Denmark
110001010110₂ Posts

Quote:
Originally Posted by kriesel
So, the second is also not at system startup.

No, only if you have autologon enabled for some user on the system.


You can create a task in "Task Scheduler" with the trigger "At startup" and check "Run whether user is logged on or not". However, you need some form of admin privileges on the system to create such a task.

Last fiddled with by ATH on 2019-12-11 at 21:24
2019-12-11, 21:42   #1575
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
2×67×73 Posts

Quote:
Originally Posted by ATH
You can create a task in "Task Scheduler" with trigger "At startup" and checkmark "Run whether user is logged on or not". But you need some form of admin privileges on the system to create such a task.

Code:
@reboot ~/prime/mprime -d </dev/null >>~/prime/mprime.log 2>/dev/null &
...under an unprivileged account.

Sorry; couldn't resist...
2019-12-11, 22:21   #1576
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1010100111101₂ Posts

Quote:
Originally Posted by kracker
What we really need is the equivalent of mfakto's --perftest for gpuowl. While I don't mind doing it manually, it takes a decent amount of time, and others may not be inclined to do something like this... Plus, I'm sure retests will be necessary as things are changed or added over time.

Or more like cufftbench and threadbench in CUDALucas.
Programmatically spin through all the possibilities for a given FFT length or range, and write files listing what to use for each FFT length on a given GPU. Program, benchmark, and tune thyself.
The price of that is whatever else Mihai would be doing, such as increasing performance or adding features, if he were not programming benchmarking instead. And that benchmarking code would be a moving target, as George or Mihai come up with additional -use options and change or add underlying code paths.
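A minimal Python sketch of the "lists in files for what to use for what FFT length" step, assuming the per-iteration timings have already been measured (by hand or by a sweep) and are held in a plain dictionary. The function names and the output line format are made up for illustration; gpuowl does not read such a file.

```python
from typing import Dict, Tuple

# Assumed input layout: {(fft_size, use_options): microseconds_per_iteration}
Timings = Dict[Tuple[str, str], float]
Best = Dict[str, Tuple[str, float]]


def best_options_per_fft(timings: Timings) -> Best:
    """For each FFT size, keep the -use option string with the lowest time."""
    best: Best = {}
    for (fft, opts), us in timings.items():
        if fft not in best or us < best[fft][1]:
            best[fft] = (opts, us)
    return best


def format_table(best: Best) -> str:
    """One line per FFT size, in a hypothetical '<fft> -use <opts>  # <us> us/it' format."""
    return "".join(f"{fft} -use {opts}  # {us:g} us/it\n" for fft, (opts, us) in best.items())


if __name__ == "__main__":
    # Three of the 5120K (5M) timings from the GTX 1080 Ti test in post #1584.
    measured = {
        ("5M", "NO_ASM"): 3706,
        ("5M", "NO_ASM,MERGED_MIDDLE,WORKINGIN4"): 3515,
        ("5M", "NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4"): 3490,
    }
    # Prints a single 5M line selecting the WORKINGIN4,WORKINGOUT4 combination.
    print(format_table(best_options_per_fft(measured)), end="")
```

Writing the returned string to a file (for example with `pathlib.Path(name).write_text(...)`) gives one such list per GPU.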

Meanwhile, we can use batch files or shell scripts with the right options, assuming of course that we know what the right options and combinations are. That is generally not the case for the latest commit, or the latest several. For example, how does T2_SHUFFLE combine with the other options that applied to v6.11-79?
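Along those lines, a minimal Python sketch that generates the command lines for such a script: it enumerates every subset of a list of candidate defines (here George's four T2_SHUFFLE options) on top of a fixed base set and builds one gpuowl invocation per subset. It only uses options listed in gpuowl's -h output later in this thread (-prp, -iters, -time, and -use with a comma-separated list); the executable path, exponent, and base set are placeholders, and whether -prp and -iters combine as intended should be checked against the build in use.

```python
from itertools import combinations
from typing import List, Sequence


def sweep_commands(exe: str, exponent: int, base: Sequence[str],
                   candidates: Sequence[str], iters: int = 10000) -> List[str]:
    """One command line per subset of `candidates`, each added on top of `base`."""
    if iters % 10000 != 0:
        # gpuowl's help text says -iters must be a multiple of 10000
        raise ValueError("iters must be a multiple of 10000")
    cmds = []
    for k in range(len(candidates) + 1):
        for subset in combinations(candidates, k):
            defines = list(base) + list(subset)
            cmd = f"{exe} -prp {exponent} -iters {iters} -time"
            if defines:
                cmd += " -use " + ",".join(defines)
            cmds.append(cmd)
    return cmds


if __name__ == "__main__":
    shuffles = ["T2_SHUFFLE_WIDTH", "T2_SHUFFLE_MIDDLE",
                "T2_SHUFFLE_HEIGHT", "T2_SHUFFLE_REVERSELINE"]
    # 2**4 = 16 subsets, so 16 lines; redirect into a .sh or .bat file.
    for cmd in sweep_commands("./gpuowl", 89796247, ["NO_ASM", "MERGED_MIDDLE"], shuffles):
        print(cmd)
```

Exhaustive subsets grow as 2^n, which is fine for four options but not for a dozen; beyond that, a greedy approach (tune one group at a time) is the practical choice.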

Last fiddled with by kriesel on 2019-12-11 at 22:25
2019-12-11, 22:31   #1577
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
19·397 Posts

Four new options to try (using gpuowl.cl from the git fork gwoltman2/gpuowl): T2_SHUFFLE_WIDTH, T2_SHUFFLE_MIDDLE, T2_SHUFFLE_HEIGHT, T2_SHUFFLE_REVERSELINE.

I'll ask preda to include this change soon.

For me, all but T2_SHUFFLE_HEIGHT result in better performance. I've been fighting the ROCm optimizer, trying to figure out why this one case is slower.
2019-12-11, 22:51   #1578
paulunderwood
Sep 2002
Database er0rr
111010110011₂ Posts

Quote:
Originally Posted by Prime95
Four new options to try (using gpuowl.cl from git fork in gwoltman2/gpuowl). T2_SHUFFLE_WIDTH,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE

I'll ask preda to include this change soon.

For me, all but T2_SHUFFLE_HEIGHT result in better performance. I've been fighting the rocm optimizer trying to figure out why this one case is slower.

For mine, at ~99 million bits:

Code:
1033us with ./gpuowl
936us with ./gpuowl -use MERGED_MIDDLE
875us with ./gpuowl -use MERGED_MIDDLE -use T2_SHUFFLE_WIDTH
866us with ./gpuowl -use MERGED_MIDDLE -use T2_SHUFFLE_WIDTH -use T2_SHUFFLE_REVERSELINE -use T2_SHUFFLE_MIDDLE
"sensors" shows power draw rising from 195 W to 215 W (setsclk 4) between the second and fourth commands.

Another giant leap
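To put these numbers in relative terms, a small Python sketch that computes each command's speedup over plain ./gpuowl and, for the two commands with a "sensors" reading, a rough energy per iteration (time × power). Treating the quoted wattage as the average draw over the whole run is an assumption.

```python
# Timings (us per iteration) and "sensors" power readings from this post.
# Power was only quoted for the 2nd and 4th commands; treating it as the
# steady-state draw for the whole run is an assumption.
runs = [
    ("./gpuowl", 1033, None),
    ("-use MERGED_MIDDLE", 936, 195),
    ("-use MERGED_MIDDLE -use T2_SHUFFLE_WIDTH", 875, None),
    ("-use MERGED_MIDDLE -use T2_SHUFFLE_WIDTH -use T2_SHUFFLE_REVERSELINE"
     " -use T2_SHUFFLE_MIDDLE", 866, 215),
]


def speedup(base_us: float, us: float) -> float:
    """How many times faster than the baseline (ratio of per-iteration times)."""
    return base_us / us


def energy_mj(us: float, watts: float) -> float:
    """Energy per iteration: microseconds x watts = microjoules; /1000 gives millijoules."""
    return us * watts / 1000.0


if __name__ == "__main__":
    base_us = runs[0][1]
    for label, us, watts in runs:
        line = f"{us:5d} us  x{speedup(base_us, us):.3f}"
        if watts is not None:
            line += f"  ~{energy_mj(us, watts):.1f} mJ/iter"
        print(f"{line}  {label}")
```

On these figures the fourth command is about 8% faster per iteration than the second (936/866), while its energy per iteration comes out about 2% higher (186.19 vs 182.52 mJ); with only two spot power readings, that comparison is rough at best.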

Last fiddled with by paulunderwood on 2019-12-11 at 23:16
2019-12-11, 22:51   #1579
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
23×271 Posts

Quote:
Originally Posted by Prime95
Four new options to try (using gpuowl.cl from git fork in gwoltman2/gpuowl). T2_SHUFFLE_WIDTH,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE

I'll ask preda to include this change soon.

For me, all but T2_SHUFFLE_HEIGHT result in better performance. I've been fighting the rocm optimizer trying to figure out why this one case is slower.

With just NO_ASM and MERGED_MIDDLE, I'm getting this:
Code:
2019-12-11 22:49:23 Exception gpu_error: OUT_OF_RESOURCES tailFused at clwrap.cpp:312 run
EDIT: This is on a P100 (Colab).

Last fiddled with by kracker on 2019-12-11 at 22:53
2019-12-12, 01:14   #1580
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
19×397 Posts

Quote:
Originally Posted by kracker
With just NO_ASM and MERGED_MIDDLE, I'm getting this:
Code:
2019-12-11 22:49:23 Exception gpu_error: OUT_OF_RESOURCES tailFused at clwrap.cpp:312 run
EDIT: P100/Colab

Try using just T2_SHUFFLE_WIDTH and T2_SHUFFLE_MIDDLE. The other two options double the amount of local memory required by tailFused.
2019-12-12, 02:16   #1581
xx005fs
"Eric"
Jan 2018
USA
324₈ Posts

Quote:
Originally Posted by kracker
With just NO_ASM and MERGED_MIDDLE, I'm getting this:
Code:
2019-12-11 22:49:23 Exception gpu_error: OUT_OF_RESOURCES tailFused at clwrap.cpp:312 run
EDIT: P100/Colab

Getting the same issue. It seems to affect only Nvidia GPUs.
2019-12-12, 02:36   #1582
nomead
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts

Quote:
Originally Posted by xx005fs
Getting same issue. It seems to be only attributed to Nvidia GPUs.

Yup, same here on an RTX 2080; I now get that error even with just NO_ASM.
2019-12-12, 02:39   #1583
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
19×397 Posts

Quote:
Originally Posted by xx005fs
Getting same issue. It seems to be only attributed to Nvidia GPUs.

On the off chance that it is an OpenCL compile issue, go to tailFused and change the declaration of lds to size SMALL_HEIGHT*2 rather than SMALL_HEIGHT*complicated_expression.
2019-12-12, 06:55   #1584
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts
V6.11-83-ge270393

Building gpuowl v6.11-83 for Windows with msys2/mingw64, git, and make emits quite a few warnings, but the build succeeds:
Code:
$ make gpuowl-win.exe
cat head.txt gpuowl.cl tail.txt > gpuowl-wrap.cpp
echo \"`git describe --long --dirty --always`\" > version.new
diff -q -N version.new version.inc >/dev/null || mv version.new version.inc
echo Version: `cat version.inc`
Version: "v6.11-83-ge270393"
g++ -MT Pm1Plan.o -MMD -MP -MF .d/Pm1Plan.Td -Wall -O2 -std=c++17   -c -o Pm1Plan.o Pm1Plan.cpp
g++ -MT GmpUtil.o -MMD -MP -MF .d/GmpUtil.Td -Wall -O2 -std=c++17   -c -o GmpUtil.o GmpUtil.cpp
g++ -MT Worktodo.o -MMD -MP -MF .d/Worktodo.Td -Wall -O2 -std=c++17   -c -o Worktodo.o Worktodo.cpp
In file included from Worktodo.cpp:6:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
       log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~
g++ -MT common.o -MMD -MP -MF .d/common.Td -Wall -O2 -std=c++17   -c -o common.o common.cpp
In file included from common.cpp:4:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
       log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~
g++ -MT main.o -MMD -MP -MF .d/main.Td -Wall -O2 -std=c++17   -c -o main.o main.cpp
In file included from main.cpp:8:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
       log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~
g++ -MT Gpu.o -MMD -MP -MF .d/Gpu.Td -Wall -O2 -std=c++17   -c -o Gpu.o Gpu.cpp
In file included from ProofSet.h:6,
                 from Gpu.cpp:4:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
       log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~
g++ -MT clwrap.o -MMD -MP -MF .d/clwrap.Td -Wall -O2 -std=c++17   -c -o clwrap.o clwrap.cpp
In file included from clwrap.cpp:4:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
       log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~
g++ -MT Task.o -MMD -MP -MF .d/Task.Td -Wall -O2 -std=c++17   -c -o Task.o Task.cpp
In file included from Task.cpp:7:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
       log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~
g++ -MT checkpoint.o -MMD -MP -MF .d/checkpoint.Td -Wall -O2 -std=c++17   -c -o checkpoint.o checkpoint.cpp
In file included from checkpoint.h:5,
                 from checkpoint.cpp:3:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
       log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~
g++ -MT timeutil.o -MMD -MP -MF .d/timeutil.Td -Wall -O2 -std=c++17   -c -o timeutil.o timeutil.cpp
g++ -MT Args.o -MMD -MP -MF .d/Args.Td -Wall -O2 -std=c++17   -c -o Args.o Args.cpp
In file included from Args.cpp:4:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
       log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~

g++ -MT state.o -MMD -MP -MF .d/state.Td -Wall -O2 -std=c++17   -c -o state.o state.cpp
g++ -MT Signal.o -MMD -MP -MF .d/Signal.Td -Wall -O2 -std=c++17   -c -o Signal.o Signal.cpp
g++ -MT FFTConfig.o -MMD -MP -MF .d/FFTConfig.Td -Wall -O2 -std=c++17   -c -o FFTConfig.o FFTConfig.cpp
g++ -MT AllocTrac.o -MMD -MP -MF .d/AllocTrac.Td -Wall -O2 -std=c++17   -c -o AllocTrac.o AllocTrac.cpp
g++ -MT gpuowl-wrap.o -MMD -MP -MF .d/gpuowl-wrap.Td -Wall -O2 -std=c++17   -c -o gpuowl-wrap.o gpuowl-wrap.cpp
g++ -o gpuowl-win.exe Pm1Plan.o GmpUtil.o Worktodo.o common.o main.o Gpu.o clwrap.o Task.o checkpoint.o timeutil.o Args.o state.o Signal.o FFTConfig.o AllocTrac.o gpuowl-wrap.o -lstdc++fs -lOpenCL -lgmp -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L. -static
strip gpuowl-win.exe
Run the help:
Code:
$ ./gpuowl-win.exe -h
2019-12-11 17:34:31 gpuowl v6.11-83-ge270393

Command line options:

-dir <folder>      : specify local work directory (containing worktodo.txt, results.txt, config.txt, gpuowl.log)
-pool <dir>        : specify a directory with the shared (pooled) worktodo.txt and results.txt
                     Multiple GpuOwl instances, each in its own directory, can share a pool of assignments and report
                     the results back to the common pool.
-user <name>       : specify the user name.
-cpu  <name>       : specify the hardware name.
-time              : display kernel profiling information.
-fft <size>        : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value>     : PRP GEC block size. Default 400. Smaller block is slower but detects errors sooner.
-log <step>        : log every <step> iterations, default 200000. Multiple of 10000.
-carry long|short  : force carry type. Short carry may be faster, but requires high bits/word.
-B1                : P-1 B1 bound, default 500000
-B2                : P-1 B2 bound, default B1 * 30
-rB2               : ratio of B2 to B1. Default 30, used only if B2 is not explicitly set
-cleanup           : delete save files at end of run
-prp <exponent>    : run a single PRP test and exit, ignoring worktodo.txt
-pm1 <exponent>    : run a single P-1 test and exit, ignoring worktodo.txt
-results <file>    : name of results file, default 'results.txt'
-iters <N>         : run next PRP test for <N> iterations and exit. Multiple of 10000.
-maxAlloc          : limit GPU memory usage to this value in MB (needed on non-AMD GPUs)
-yield             : enable work-around for CUDA busy wait taking up one CPU core
-nospin            : disable progress spinner
-use NEW_FFT8,OLD_FFT5,NEW_FFT10: comma separated list of defines, see the #if tests in gpuowl.cl (used for perf tuning)
-device <N>        : select a specific device:
 0 : Ellesmere-Radeon (TM) RX 480 Graphics AMD
 1 : gfx804-Radeon 550 Series AMD

FFT Configurations:
FFT    8K [  0.01M -    0.17M]  64-64
FFT   32K [  0.05M -    0.68M]  64-256 256-64
FFT   64K [  0.10M -    1.33M]  64-512 512-64
FFT  128K [  0.20M -    2.62M]  1K-64 64-1K 256-256
FFT  192K [  0.29M -    3.89M]  64-256-6
FFT  224K [  0.34M -    4.52M]  64-256-7
FFT  256K [  0.39M -    5.15M]  64-2K 256-512 512-256 2K-64
FFT  288K [  0.44M -    5.77M]  64-256-9
FFT  320K [  0.49M -    6.40M]  64-256-10
FFT  352K [  0.54M -    7.02M]  64-256-11
FFT  384K [  0.59M -    7.64M]  64-256-12 64-512-6
FFT  448K [  0.69M -    8.88M]  64-512-7
FFT  512K [  0.79M -   10.12M]  1K-256 256-1K 512-512 4K-64
FFT  576K [  0.88M -   11.35M]  64-512-9
FFT  640K [  0.98M -   12.58M]  64-512-10
FFT  704K [  1.08M -   13.81M]  64-512-11
FFT  768K [  1.18M -   15.03M]  64-512-12 64-1K-6 256-256-6
FFT  896K [  1.38M -   17.47M]  64-1K-7 256-256-7
FFT    1M [  1.57M -   19.89M]  1K-512 256-2K 512-1K 2K-256
FFT 1152K [  1.77M -   22.32M]  64-1K-9 256-256-9
FFT 1280K [  1.97M -   24.73M]  64-1K-10 256-256-10
FFT 1408K [  2.16M -   27.14M]  64-1K-11 256-256-11
FFT 1536K [  2.36M -   29.54M]  64-1K-12 64-2K-6 256-256-12 256-512-6 512-256-6
FFT 1792K [  2.75M -   34.33M]  64-2K-7 256-512-7 512-256-7
FFT    2M [  3.15M -   39.10M]  1K-1K 512-2K 2K-512 4K-256
FFT 2304K [  3.54M -   43.85M]  64-2K-9 256-512-9 512-256-9
FFT 2560K [  3.93M -   48.59M]  64-2K-10 256-512-10 512-256-10
FFT 2816K [  4.33M -   53.32M]  64-2K-11 256-512-11 512-256-11
FFT    3M [  4.72M -   58.04M]  1K-256-6 64-2K-12 256-512-12 256-1K-6 512-256-12 512-512-6
FFT 3584K [  5.51M -   67.44M]  1K-256-7 256-1K-7 512-512-7
FFT    4M [  6.29M -   76.81M]  1K-2K 2K-1K 4K-512
FFT 4608K [  7.08M -   86.15M]  1K-256-9 256-1K-9 512-512-9
FFT    5M [  7.86M -   95.46M]  1K-256-10 256-1K-10 512-512-10
FFT 5632K [  8.65M -  104.74M]  1K-256-11 256-1K-11 512-512-11
FFT    6M [  9.44M -  114.00M]  1K-256-12 1K-512-6 256-1K-12 256-2K-6 512-512-12 512-1K-6 2K-256-6
FFT    7M [ 11.01M -  132.46M]  1K-512-7 256-2K-7 512-1K-7 2K-256-7
FFT    8M [ 12.58M -  150.85M]  2K-2K 4K-1K
FFT    9M [ 14.16M -  169.18M]  1K-512-9 256-2K-9 512-1K-9 2K-256-9
FFT   10M [ 15.73M -  187.45M]  1K-512-10 256-2K-10 512-1K-10 2K-256-10
FFT   11M [ 17.30M -  205.67M]  1K-512-11 256-2K-11 512-1K-11 2K-256-11
FFT   12M [ 18.87M -  223.85M]  1K-512-12 1K-1K-6 256-2K-12 512-1K-12 512-2K-6 2K-256-12 2K-512-6 4K-256-6
FFT   14M [ 22.02M -  260.08M]  1K-1K-7 512-2K-7 2K-512-7 4K-256-7
FFT   16M [ 25.17M -  296.17M]  4K-2K
FFT   18M [ 28.31M -  332.13M]  1K-1K-9 512-2K-9 2K-512-9 4K-256-9
FFT   20M [ 31.46M -  367.98M]  1K-1K-10 512-2K-10 2K-512-10 4K-256-10
FFT   22M [ 34.60M -  403.74M]  1K-1K-11 512-2K-11 2K-512-11 4K-256-11
FFT   24M [ 37.75M -  439.40M]  1K-1K-12 1K-2K-6 512-2K-12 2K-512-12 2K-1K-6 4K-256-12 4K-512-6
FFT   28M [ 44.04M -  510.47M]  1K-2K-7 2K-1K-7 4K-512-7
FFT   36M [ 56.62M -  651.81M]  1K-2K-9 2K-1K-9 4K-512-9
FFT   40M [ 62.91M -  722.13M]  1K-2K-10 2K-1K-10 4K-512-10
FFT   44M [ 69.21M -  792.25M]  1K-2K-11 2K-1K-11 4K-512-11
FFT   48M [ 75.50M -  862.18M]  1K-2K-12 2K-1K-12 2K-2K-6 4K-512-12 4K-1K-6
FFT   56M [ 88.08M - 1001.57M]  2K-2K-7 4K-1K-7
FFT   72M [113.25M - 1278.70M]  2K-2K-9 4K-1K-9
FFT   80M [125.83M - 1416.57M]  2K-2K-10 4K-1K-10
FFT   88M [138.41M - 1554.04M]  2K-2K-11 4K-1K-11
FFT   96M [150.99M - 1691.15M]  2K-2K-12 4K-1K-12 4K-2K-6
FFT  112M [176.16M - 1964.39M]  4K-2K-7
FFT  144M [226.49M - 2507.57M]  4K-2K-9
FFT  160M [251.66M - 2777.78M]  4K-2K-10
FFT  176M [276.82M - 3047.18M]  4K-2K-11
FFT  192M [301.99M - 3315.86M]  4K-2K-12
2019-12-11 17:34:38 Exiting because "help"
2019-12-11 17:34:38 Bye
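The FFT table in this help output can be used to estimate which FFT size an exponent needs. A minimal Python sketch, assuming the bracketed numbers are the minimum and maximum exponent (in millions) each size accepts, and that a variant written W-H-M stands for 2·W·H·M words (an inference from the rows, e.g. 64-64 is 8K and 1K-256-10 is 5M, not something taken from gpuowl's source). Only four rows are copied in. Bits per word is the exponent divided by the FFT length in words; for exponent 89796247 in the tune test below, it reproduces the 17.13 bits/word that gpuowl logs.

```python
from typing import List, Tuple

# Four rows of the table above: (FFT length in words, min and max exponent in millions).
FFT_ROWS: List[Tuple[int, float, float]] = [
    (4608 * 1024, 7.08, 86.15),
    (5120 * 1024, 7.86, 95.46),   # "5M"
    (5632 * 1024, 8.65, 104.74),
    (6144 * 1024, 9.44, 114.00),  # "6M"
]

UNITS = {"K": 1024, "M": 1024 * 1024}


def parse_size(token: str) -> int:
    """'1K' -> 1024, '256' -> 256."""
    if token[-1] in UNITS:
        return int(token[:-1]) * UNITS[token[-1]]
    return int(token)


def variant_words(variant: str) -> int:
    """'1K-256-10' -> 2*1024*256*10 words (rule inferred from the table)."""
    parts = [parse_size(p) for p in variant.split("-")]
    width, height = parts[0], parts[1]
    middle = parts[2] if len(parts) == 3 else 1
    return 2 * width * height * middle


def smallest_fft(exponent: int) -> int:
    """Smallest listed FFT length whose exponent range covers `exponent`."""
    for words, lo_m, hi_m in FFT_ROWS:  # rows are in increasing size
        if lo_m * 1e6 <= exponent <= hi_m * 1e6:
            return words
    raise ValueError("exponent outside the rows copied here")


def bits_per_word(exponent: int, words: int) -> float:
    return exponent / words


if __name__ == "__main__":
    words = smallest_fft(89796247)
    print(f"{words // 1024}K")  # 5120K
    print(f"{bits_per_word(89796247, words):.2f} bits/word")  # 17.13 bits/word
```

As a cross-check, the log line "FFT 5120K: Width 256x4, Height 64x4, Middle 10" below corresponds to the 1K-256-10 variant of the 5M row (256×4 = 1K, 64×4 = 256). Whether gpuowl's default choice always matches "smallest size that covers the exponent" is an assumption.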
Tune test on NVIDIA GTX 1080 Ti
Code:
Gpuowl version and commit   
GPU model    NVIDIA GTX 1080 Ti
GPU clock    free running ~1860 MHz
Host OS        Win7 Pro x64
Notes    

Exponent timed     89796247
Computation type (PRP, P-1 stage 1, P-1 stage 2):    PRP
FFT length    FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.13 bits/word 
config file entries    -time -iters 10000 -device 0 -user kriesel -cpu dodo/gtx1080ti

varying tuning -use options, in chronological order
3696    NO_ASM us/sq warmup, end user interaction, stabilize
3706    NO_ASM baseline

In benchmarking (fastest time marked with <-)
3596    NO_ASM,MERGED_MIDDLE,WORKINGIN
3593    NO_ASM,MERGED_MIDDLE,WORKINGIN (repeatability)
3592    NO_ASM,MERGED_MIDDLE,WORKINGIN1
3593    NO_ASM,MERGED_MIDDLE,WORKINGIN1A
3600    NO_ASM,MERGED_MIDDLE,WORKINGIN2
3534    NO_ASM,MERGED_MIDDLE,WORKINGIN3
3515    NO_ASM,MERGED_MIDDLE,WORKINGIN4   <- fastest
3529    NO_ASM,MERGED_MIDDLE,WORKINGIN5

Out benchmarking (fastest time marked with <-)
3567    NO_ASM,MERGED_MIDDLE,WORKINGOUT
3584    NO_ASM,MERGED_MIDDLE,WORKINGOUT0
3587    NO_ASM,MERGED_MIDDLE,WORKINGOUT1
3599    NO_ASM,MERGED_MIDDLE,WORKINGOUT1A
3577    NO_ASM,MERGED_MIDDLE,WORKINGOUT2
3529    NO_ASM,MERGED_MIDDLE,WORKINGOUT3
3509    NO_ASM,MERGED_MIDDLE,WORKINGOUT4   <- fastest
3531    NO_ASM,MERGED_MIDDLE,WORKINGOUT5

Fastest WORKINGIN, Fastest WORKINGOUT combination:
3490    NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4

repeatability +-1.5/3594.5 = +-0.042%
best     3490
base    3706
 ratio    1.062
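The tuning procedure above (time each WORKINGIN variant and each WORKINGOUT variant separately, then combine the fastest of each) can be summarized in a few lines of Python. The numbers are copied from the table; the helper names are invented for this sketch. Combining per-group winners is a greedy heuristic that assumes the two choices are roughly independent, which is why the combination still has to be timed on its own (3490 us here).

```python
from typing import Dict

# us/iteration on the GTX 1080 Ti at 5120K, every run with NO_ASM,MERGED_MIDDLE.
working_in: Dict[str, int] = {
    "WORKINGIN": 3596, "WORKINGIN1": 3592, "WORKINGIN1A": 3593, "WORKINGIN2": 3600,
    "WORKINGIN3": 3534, "WORKINGIN4": 3515, "WORKINGIN5": 3529,
}
working_out: Dict[str, int] = {
    "WORKINGOUT": 3567, "WORKINGOUT0": 3584, "WORKINGOUT1": 3587, "WORKINGOUT1A": 3599,
    "WORKINGOUT2": 3577, "WORKINGOUT3": 3529, "WORKINGOUT4": 3509, "WORKINGOUT5": 3531,
}


def fastest(timings: Dict[str, int]) -> str:
    """Name of the variant with the lowest time per iteration."""
    return min(timings, key=lambda name: timings[name])


def repeatability(a: float, b: float) -> float:
    """Half the spread of two repeated runs, relative to their mean."""
    return abs(a - b) / 2 / ((a + b) / 2)


if __name__ == "__main__":
    combo = ",".join(["NO_ASM", "MERGED_MIDDLE", fastest(working_in), fastest(working_out)])
    print(combo)  # NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4
    print(f"ratio {3706 / 3490:.3f}")  # base / best: ratio 1.062
    print(f"repeatability +-{repeatability(3596, 3593):.3%}")  # +-0.042%
```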
It's unclear which commit is required for the T2 options George introduced recently (https://www.mersenneforum.org/showpo...postcount=1577).
Do the shuffle shuffle:
Code:
3677    NO_ASM 
3485    NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4
3490    NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_WIDTH
3482    NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_MIDDLE
3480    NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT
3480    NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_REVERSELINE 
3504    NO_ASM,MERGED_MIDDLE,WORKINGIN4,T2_SHUFFLE_WIDTH,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE 

3676    NO_ASM
3482    NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
3487    NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_MIDDLE

best    3480
base    3677
ratio    1.057
Attached file: gpuowl-v6.11-83-ge270393.7z (431.9 KB)

Last fiddled with by kriesel on 2019-12-12 at 07:54

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.