#1574 |
|
Einyen
Dec 2003
Denmark
110001010110₂ Posts |
No, only if you have autologon enabled for some user on the system.
You can create a task in "Task Scheduler" with the trigger "At startup" and the option "Run whether user is logged on or not" checked. But you need some form of admin privileges on the system to create such a task.
Last fiddled with by ATH on 2019-12-11 at 21:24
|
|
|
|
|
#1575 | |
|
If I May
"Chris Halsall"
Sep 2002
Barbados
2×67×73 Posts |
Quote:
Code:
@reboot ~/prime/mprime -d </dev/null >>~/prime/mprime.log 2>/dev/null &
Sorry; couldn't resist...
|
|
|
|
|
|
|
#1576 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1010100111101₂ Posts |
Quote:
Programmatically spin through all the possibilities for a given FFT length or range, and create lists in files of which options to use for which FFT length on a given GPU. Program, benchmark and tune thyself. The price of that is whatever Mihai would otherwise be doing, such as increasing performance or adding features, instead of programming benchmarking. And that benchmarking code is a moving target, as George or Mihai come up with additional -use options and underlying code path changes or additions.
Meanwhile, we can use batch files or shell scripts with the right options, assuming of course that we know what the right options and combinations are. That is generally not the case for the latest commit, or the latest several. For example, how does T2_SHUFFLE combine with the others that were applicable to 6.11-79?
Last fiddled with by kriesel on 2019-12-11 at 22:25
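A minimal sketch of the "spin through all the possibilities" idea, in Python. It only builds candidate command lines for every subset of a list of -use defines; nothing is executed or timed. The binary path `./gpuowl` and the helper names `use_combinations` and `command_lines` are assumptions for illustration. The `-time`, `-iters` and `-use` options are the ones gpuowl itself documents, and whether a given commit accepts every define is not checked here.

```python
from itertools import combinations

# Defines named in this thread; support depends on the gpuowl commit in use.
SHUFFLE_OPTIONS = [
    "T2_SHUFFLE_WIDTH",
    "T2_SHUFFLE_MIDDLE",
    "T2_SHUFFLE_HEIGHT",
    "T2_SHUFFLE_REVERSELINE",
]


def use_combinations(base, extras):
    """Yield comma-separated -use lists: base alone, then base plus each subset of extras."""
    for size in range(len(extras) + 1):
        for subset in combinations(extras, size):
            yield ",".join(list(base) + list(subset))


def command_lines(binary, base, extras, iters=10000):
    """Build one benchmark command line per combination (hypothetical helper, runs nothing)."""
    return [
        f"{binary} -time -iters {iters} -use {uses}"
        for uses in use_combinations(base, extras)
    ]


if __name__ == "__main__":
    for line in command_lines("./gpuowl", ["MERGED_MIDDLE"], SHUFFLE_OPTIONS):
        print(line)
```

With four optional defines this yields 16 command lines, which could be written to a batch file or shell script and run one after another.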
|
|
|
|
|
|
#1577 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
19·397 Posts |
Four new options to try (using gpuowl.cl from the git fork in gwoltman2/gpuowl): T2_SHUFFLE_WIDTH, T2_SHUFFLE_MIDDLE, T2_SHUFFLE_HEIGHT, T2_SHUFFLE_REVERSELINE.
I'll ask preda to include this change soon. For me, all but T2_SHUFFLE_HEIGHT result in better performance. I've been fighting the ROCm optimizer trying to figure out why this one case is slower. |
|
|
|
|
|
#1578 | |
|
Sep 2002
Database er0rr
111010110011₂ Posts |
Quote:
Code:
1033us with ./gpuowl
936us with ./gpuowl -use MERGED_MIDDLE
875us with ./gpuowl -use MERGED_MIDDLE -use T2_SHUFFLE_WIDTH
866us with ./gpuowl -use MERGED_MIDDLE -use T2_SHUFFLE_WIDTH -use T2_SHUFFLE_REVERSELINE -use T2_SHUFFLE_MIDDLE
Another giant leap
Last fiddled with by paulunderwood on 2019-12-11 at 23:16
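For anyone wanting to put numbers on those steps, here is a small Python sketch of the arithmetic. The four timings are the ones posted above; the variable names and the short labels for each run are only illustrative.

```python
# Microseconds per iteration as posted above, paired with a short label for each run.
timings_us = [
    (1033, "default"),
    (936, "MERGED_MIDDLE"),
    (875, "MERGED_MIDDLE + T2_SHUFFLE_WIDTH"),
    (866, "MERGED_MIDDLE + T2_SHUFFLE_WIDTH/REVERSELINE/MIDDLE"),
]

baseline = timings_us[0][0]

# Speedup of each run relative to the default run (higher is better).
speedups = [round(baseline / t, 3) for t, _ in timings_us]

# Percentage of time saved by each step relative to the previous step.
step_savings = [
    round(100 * (prev - cur) / prev, 1)
    for (prev, _), (cur, _) in zip(timings_us, timings_us[1:])
]

if __name__ == "__main__":
    for (t, label), s in zip(timings_us, speedups):
        print(f"{t:5d} us  x{s:.3f}  {label}")
    print("step savings (%):", step_savings)
```

The last step is the smallest of the three, so most of the gain in these figures comes from MERGED_MIDDLE and T2_SHUFFLE_WIDTH.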
|
|
|
|
|
|
#1579 | |
|
"Mr. Meeseeks"
Jan 2012
California, USA
23×271 Posts |
Quote:
Code:
2019-12-11 22:49:23 Exception gpu_error: OUT_OF_RESOURCES tailFused at clwrap.cpp:312 run
Last fiddled with by kracker on 2019-12-11 at 22:53
|
|
|
|
|
|
#1580 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
19×397 Posts |
Try using just T2_SHUFFLE_WIDTH and T2_SHUFFLE_MIDDLE. The other 2 options will double the amount of local memory required by tailFused.
|
|
|
|
|
|
#1581 |
|
"Eric"
Jan 2018
USA
3248 Posts |
|
|
|
|
|
|
#1582 |
|
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts |
|
|
|
|
|
|
#1583 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
19×397 Posts |
|
|
|
|
|
|
#1584 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
Building gpuowl v6.11-83 for Windows, with msys2/mingw64, git, and make, emits quite a few warnings, but builds successfully:
Code:
$ make gpuowl-win.exe
cat head.txt gpuowl.cl tail.txt > gpuowl-wrap.cpp
echo \"`git describe --long --dirty --always`\" > version.new
diff -q -N version.new version.inc >/dev/null || mv version.new version.inc
echo Version: `cat version.inc`
Version: "v6.11-83-ge270393"
g++ -MT Pm1Plan.o -MMD -MP -MF .d/Pm1Plan.Td -Wall -O2 -std=c++17 -c -o Pm1Plan.o Pm1Plan.cpp
g++ -MT GmpUtil.o -MMD -MP -MF .d/GmpUtil.Td -Wall -O2 -std=c++17 -c -o GmpUtil.o GmpUtil.cpp
g++ -MT Worktodo.o -MMD -MP -MF .d/Worktodo.Td -Wall -O2 -std=c++17 -c -o Worktodo.o Worktodo.cpp
In file included from Worktodo.cpp:6:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~
g++ -MT common.o -MMD -MP -MF .d/common.Td -Wall -O2 -std=c++17 -c -o common.o common.cpp
In file included from common.cpp:4:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~
g++ -MT main.o -MMD -MP -MF .d/main.Td -Wall -O2 -std=c++17 -c -o main.o main.cpp
In file included from main.cpp:8:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~
g++ -MT Gpu.o -MMD -MP -MF .d/Gpu.Td -Wall -O2 -std=c++17 -c -o Gpu.o Gpu.cpp
In file included from ProofSet.h:6,
from Gpu.cpp:4:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~
g++ -MT clwrap.o -MMD -MP -MF .d/clwrap.Td -Wall -O2 -std=c++17 -c -o clwrap.o clwrap.cpp
In file included from clwrap.cpp:4:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~
g++ -MT Task.o -MMD -MP -MF .d/Task.Td -Wall -O2 -std=c++17 -c -o Task.o Task.cpp
In file included from Task.cpp:7:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~
g++ -MT checkpoint.o -MMD -MP -MF .d/checkpoint.Td -Wall -O2 -std=c++17 -c -o checkpoint.o checkpoint.cpp
In file included from checkpoint.h:5,
from checkpoint.cpp:3:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~
g++ -MT timeutil.o -MMD -MP -MF .d/timeutil.Td -Wall -O2 -std=c++17 -c -o timeutil.o timeutil.cpp
g++ -MT Args.o -MMD -MP -MF .d/Args.Td -Wall -O2 -std=c++17 -c -o Args.o Args.cpp
In file included from Args.cpp:4:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~
g++ -MT state.o -MMD -MP -MF .d/state.Td -Wall -O2 -std=c++17 -c -o state.o state.cpp
g++ -MT Signal.o -MMD -MP -MF .d/Signal.Td -Wall -O2 -std=c++17 -c -o Signal.o Signal.cpp
g++ -MT FFTConfig.o -MMD -MP -MF .d/FFTConfig.Td -Wall -O2 -std=c++17 -c -o FFTConfig.o FFTConfig.cpp
g++ -MT AllocTrac.o -MMD -MP -MF .d/AllocTrac.Td -Wall -O2 -std=c++17 -c -o AllocTrac.o AllocTrac.cpp
g++ -MT gpuowl-wrap.o -MMD -MP -MF .d/gpuowl-wrap.Td -Wall -O2 -std=c++17 -c -o gpuowl-wrap.o gpuowl-wrap.cpp
g++ -o gpuowl-win.exe Pm1Plan.o GmpUtil.o Worktodo.o common.o main.o Gpu.o clwrap.o Task.o checkpoint.o timeutil.o Args.o state.o Signal.o FFTConfig.o AllocTrac.o gpuowl-wrap.o -lstdc++fs -lOpenCL -lgmp -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L. -static
strip gpuowl-win.exe
Code:
$ ./gpuowl-win.exe -h
2019-12-11 17:34:31 gpuowl v6.11-83-ge270393
Command line options:
-dir <folder> : specify local work directory (containing worktodo.txt, results.txt, config.txt, gpuowl.log)
-pool <dir> : specify a directory with the shared (pooled) worktodo.txt and results.txt
Multiple GpuOwl instances, each in its own directory, can share a pool of assignments and report
the results back to the common pool.
-user <name> : specify the user name.
-cpu <name> : specify the hardware name.
-time : display kernel profiling information.
-fft <size> : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value> : PRP GEC block size. Default 400. Smaller block is slower but detects errors sooner.
-log <step> : log every <step> iterations, default 200000. Multiple of 10000.
-carry long|short : force carry type. Short carry may be faster, but requires high bits/word.
-B1 : P-1 B1 bound, default 500000
-B2 : P-1 B2 bound, default B1 * 30
-rB2 : ratio of B2 to B1. Default 30, used only if B2 is not explicitly set
-cleanup : delete save files at end of run
-prp <exponent> : run a single PRP test and exit, ignoring worktodo.txt
-pm1 <exponent> : run a single P-1 test and exit, ignoring worktodo.txt
-results <file> : name of results file, default 'results.txt'
-iters <N> : run next PRP test for <N> iterations and exit. Multiple of 10000.
-maxAlloc : limit GPU memory usage to this value in MB (needed on non-AMD GPUs)
-yield : enable work-around for CUDA busy wait taking up one CPU core
-nospin : disable progress spinner
-use NEW_FFT8,OLD_FFT5,NEW_FFT10: comma separated list of defines, see the #if tests in gpuowl.cl (used for perf tuning)
-device <N> : select a specific device:
0 : Ellesmere-Radeon (TM) RX 480 Graphics AMD
1 : gfx804-Radeon 550 Series AMD
FFT Configurations:
FFT 8K [ 0.01M - 0.17M] 64-64
FFT 32K [ 0.05M - 0.68M] 64-256 256-64
FFT 64K [ 0.10M - 1.33M] 64-512 512-64
FFT 128K [ 0.20M - 2.62M] 1K-64 64-1K 256-256
FFT 192K [ 0.29M - 3.89M] 64-256-6
FFT 224K [ 0.34M - 4.52M] 64-256-7
FFT 256K [ 0.39M - 5.15M] 64-2K 256-512 512-256 2K-64
FFT 288K [ 0.44M - 5.77M] 64-256-9
FFT 320K [ 0.49M - 6.40M] 64-256-10
FFT 352K [ 0.54M - 7.02M] 64-256-11
FFT 384K [ 0.59M - 7.64M] 64-256-12 64-512-6
FFT 448K [ 0.69M - 8.88M] 64-512-7
FFT 512K [ 0.79M - 10.12M] 1K-256 256-1K 512-512 4K-64
FFT 576K [ 0.88M - 11.35M] 64-512-9
FFT 640K [ 0.98M - 12.58M] 64-512-10
FFT 704K [ 1.08M - 13.81M] 64-512-11
FFT 768K [ 1.18M - 15.03M] 64-512-12 64-1K-6 256-256-6
FFT 896K [ 1.38M - 17.47M] 64-1K-7 256-256-7
FFT 1M [ 1.57M - 19.89M] 1K-512 256-2K 512-1K 2K-256
FFT 1152K [ 1.77M - 22.32M] 64-1K-9 256-256-9
FFT 1280K [ 1.97M - 24.73M] 64-1K-10 256-256-10
FFT 1408K [ 2.16M - 27.14M] 64-1K-11 256-256-11
FFT 1536K [ 2.36M - 29.54M] 64-1K-12 64-2K-6 256-256-12 256-512-6 512-256-6
FFT 1792K [ 2.75M - 34.33M] 64-2K-7 256-512-7 512-256-7
FFT 2M [ 3.15M - 39.10M] 1K-1K 512-2K 2K-512 4K-256
FFT 2304K [ 3.54M - 43.85M] 64-2K-9 256-512-9 512-256-9
FFT 2560K [ 3.93M - 48.59M] 64-2K-10 256-512-10 512-256-10
FFT 2816K [ 4.33M - 53.32M] 64-2K-11 256-512-11 512-256-11
FFT 3M [ 4.72M - 58.04M] 1K-256-6 64-2K-12 256-512-12 256-1K-6 512-256-12 512-512-6
FFT 3584K [ 5.51M - 67.44M] 1K-256-7 256-1K-7 512-512-7
FFT 4M [ 6.29M - 76.81M] 1K-2K 2K-1K 4K-512
FFT 4608K [ 7.08M - 86.15M] 1K-256-9 256-1K-9 512-512-9
FFT 5M [ 7.86M - 95.46M] 1K-256-10 256-1K-10 512-512-10
FFT 5632K [ 8.65M - 104.74M] 1K-256-11 256-1K-11 512-512-11
FFT 6M [ 9.44M - 114.00M] 1K-256-12 1K-512-6 256-1K-12 256-2K-6 512-512-12 512-1K-6 2K-256-6
FFT 7M [ 11.01M - 132.46M] 1K-512-7 256-2K-7 512-1K-7 2K-256-7
FFT 8M [ 12.58M - 150.85M] 2K-2K 4K-1K
FFT 9M [ 14.16M - 169.18M] 1K-512-9 256-2K-9 512-1K-9 2K-256-9
FFT 10M [ 15.73M - 187.45M] 1K-512-10 256-2K-10 512-1K-10 2K-256-10
FFT 11M [ 17.30M - 205.67M] 1K-512-11 256-2K-11 512-1K-11 2K-256-11
FFT 12M [ 18.87M - 223.85M] 1K-512-12 1K-1K-6 256-2K-12 512-1K-12 512-2K-6 2K-256-12 2K-512-6 4K-256-6
FFT 14M [ 22.02M - 260.08M] 1K-1K-7 512-2K-7 2K-512-7 4K-256-7
FFT 16M [ 25.17M - 296.17M] 4K-2K
FFT 18M [ 28.31M - 332.13M] 1K-1K-9 512-2K-9 2K-512-9 4K-256-9
FFT 20M [ 31.46M - 367.98M] 1K-1K-10 512-2K-10 2K-512-10 4K-256-10
FFT 22M [ 34.60M - 403.74M] 1K-1K-11 512-2K-11 2K-512-11 4K-256-11
FFT 24M [ 37.75M - 439.40M] 1K-1K-12 1K-2K-6 512-2K-12 2K-512-12 2K-1K-6 4K-256-12 4K-512-6
FFT 28M [ 44.04M - 510.47M] 1K-2K-7 2K-1K-7 4K-512-7
FFT 36M [ 56.62M - 651.81M] 1K-2K-9 2K-1K-9 4K-512-9
FFT 40M [ 62.91M - 722.13M] 1K-2K-10 2K-1K-10 4K-512-10
FFT 44M [ 69.21M - 792.25M] 1K-2K-11 2K-1K-11 4K-512-11
FFT 48M [ 75.50M - 862.18M] 1K-2K-12 2K-1K-12 2K-2K-6 4K-512-12 4K-1K-6
FFT 56M [ 88.08M - 1001.57M] 2K-2K-7 4K-1K-7
FFT 72M [113.25M - 1278.70M] 2K-2K-9 4K-1K-9
FFT 80M [125.83M - 1416.57M] 2K-2K-10 4K-1K-10
FFT 88M [138.41M - 1554.04M] 2K-2K-11 4K-1K-11
FFT 96M [150.99M - 1691.15M] 2K-2K-12 4K-1K-12 4K-2K-6
FFT 112M [176.16M - 1964.39M] 4K-2K-7
FFT 144M [226.49M - 2507.57M] 4K-2K-9
FFT 160M [251.66M - 2777.78M] 4K-2K-10
FFT 176M [276.82M - 3047.18M] 4K-2K-11
FFT 192M [301.99M - 3315.86M] 4K-2K-12
2019-12-11 17:34:38 Exiting because "help"
2019-12-11 17:34:38 Bye
Code:
Gpuowl version and commit
GPU model NVIDIA GTX 1080 Ti
GPU clock free running ~1860 MHz
Host OS Win7 Pro x64
Notes
Exponent timed 89796247
Computation type (PRP, P-1 stage 1, P-1 stage 2): PRP
FFT length FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.13 bits/word
config file entries -time -iters 10000 -device 0 -user kriesel -cpu dodo/gtx1080ti
varying tuning -use options, in chronological order
3696 NO_ASM us/sq warmup, end user interaction, stabilize
3706 NO_ASM baseline
In benchmarking (highlight fastest time in bold)
3596 NO_ASM,MERGED_MIDDLE,WORKINGIN
3593 NO_ASM,MERGED_MIDDLE,WORKINGIN (repeatability)
3592 NO_ASM,MERGED_MIDDLE,WORKINGIN1
3593 NO_ASM,MERGED_MIDDLE,WORKINGIN1A
3600 NO_ASM,MERGED_MIDDLE,WORKINGIN2
3534 NO_ASM,MERGED_MIDDLE,WORKINGIN3
3515 NO_ASM,MERGED_MIDDLE,WORKINGIN4
3529 NO_ASM,MERGED_MIDDLE,WORKINGIN5
Out benchmarking (highlight fastest time in bold)
3567 NO_ASM,MERGED_MIDDLE,WORKINGOUT
3584 NO_ASM,MERGED_MIDDLE,WORKINGOUT0
3587 NO_ASM,MERGED_MIDDLE,WORKINGOUT1
3599 NO_ASM,MERGED_MIDDLE,WORKINGOUT1A
3577 NO_ASM,MERGED_MIDDLE,WORKINGOUT2
3529 NO_ASM,MERGED_MIDDLE,WORKINGOUT3
3509 NO_ASM,MERGED_MIDDLE,WORKINGOUT4
3531 NO_ASM,MERGED_MIDDLE,WORKINGOUT5
Fastest WORKINGIN, Fastest WORKINGOUT combination:
3490 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4
repeatability +-1.5/3594.5 = +-0.042%
best 3490 base 3706 ratio 1.062
Do the shuffle shuffle:
Code:
3677 NO_ASM
3485 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4
3490 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_WIDTH
3482 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_MIDDLE
3480 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT
3480 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_REVERSELINE
3504 NO_ASM,MERGED_MIDDLE,WORKINGIN4,T2_SHUFFLE_WIDTH,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
3676 NO_ASM
3482 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
3487 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_MIDDLE
best 3480 base 3677 ratio 1.057
Last fiddled with by kriesel on 2019-12-12 at 07:54
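The "best / base / ratio" summary line can be derived mechanically from rows like those above. Below is a minimal Python sketch, assuming each row has the form "<us per squaring> <comma-separated -use options>". The function names `parse_timings` and `summarize` are hypothetical, and the sample is a subset of the shuffle rows from this post.

```python
SAMPLE = """\
3677 NO_ASM
3485 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4
3490 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_WIDTH
3482 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_MIDDLE
3480 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT
3480 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_REVERSELINE
"""


def parse_timings(text):
    """Return (microseconds, options) pairs; lines not starting with a number are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2 and parts[0].isdigit():
            rows.append((int(parts[0]), parts[1].strip()))
    return rows


def summarize(rows, baseline_options="NO_ASM"):
    """Return (best time, its options, baseline time / best time)."""
    # The first row whose options match the baseline exactly is taken as the base.
    base = next(t for t, opts in rows if opts == baseline_options)
    # On a tie, min() keeps the first row with the lowest time.
    best_time, best_opts = min(rows, key=lambda row: row[0])
    return best_time, best_opts, base / best_time


if __name__ == "__main__":
    best, opts, ratio = summarize(parse_timings(SAMPLE))
    print(f"best {best} ratio {ratio:.3f} with {opts}")
```

Ties, such as the two 3480 rows here, are within the roughly +-0.04% repeatability reported above, so the choice between them is not significant.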