[QUOTE=kriesel;532661]So, the second is also not at system startup.[/QUOTE]
No, only if you have autologon enabled for some user on the system. You can create a task in "Task Scheduler" with trigger "At startup" and checkmark "Run whether user is logged on or not". But you need some form of admin privileges on the system to create such a task. |
[QUOTE=ATH;532665]You can create a task in "Task Scheduler" with trigger "At startup" and checkmark "Run whether user is logged on or not". But you need some form of admin privileges on the system to create such a task.[/QUOTE]
[CODE]@reboot ~/prime/mprime -d </dev/null >>~/prime/mprime.log 2>/dev/null &[/CODE] ...under an unprivileged account. Sorry; couldn't resist... :wink: |
[QUOTE=kracker;532662]What we really need is the equivalent of --perftest from mfakto to gpuowl - while I don't mind manually doing it takes a decent amount of time and others may not be inclined to do something like this... Plus, I'm sure retests will be necessary as things are changed or added in time.[/QUOTE]
Or more like cufftbench and threadbench in CUDALucas: programmatically spin through all the possibilities for a given FFT length or range, and write files listing what to use for each FFT length on a given GPU. Program, benchmark and tune thyself. The price is whatever Mihai would otherwise be doing, such as increasing performance or adding features, instead of programming benchmarking. And that benchmarking code is a moving target, as George or Mihai come up with additional -use options and underlying code path changes or additions. Meanwhile, we can use batch files or shell scripts with the right options, assuming of course that we know what the right options and combinations are, which is generally not the case for the latest commit or several. For example, how does T2_SHUFFLE combine with the others that were applicable to 6.11-79? |
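A minimal sketch of the "spin through all the possibilities" idea, in Python. It only enumerates candidate -use combinations and prints the command lines to run, using the -prp, -iters and -use switches listed in the gpuowl help. The option grouping, the helper name candidate_commands and the executable path are assumptions for illustration, and whether a given combination is valid depends on the commit. Parsing the timing output is deliberately left out.

```python
from itertools import combinations

# Hypothetical grouping for illustration; adjust to what your commit supports.
BASE = ["NO_ASM", "MERGED_MIDDLE"]
TOGGLES = ["T2_SHUFFLE_WIDTH", "T2_SHUFFLE_MIDDLE",
           "T2_SHUFFLE_HEIGHT", "T2_SHUFFLE_REVERSELINE"]


def candidate_commands(exe, exponent, iters=10000, base=BASE, toggles=TOGGLES):
    """Yield one gpuowl command line per subset of the toggle options."""
    for k in range(len(toggles) + 1):
        for subset in combinations(toggles, k):
            # -use takes a comma separated list of defines
            use = ",".join(list(base) + list(subset))
            yield f"{exe} -prp {exponent} -iters {iters} -use {use}"


if __name__ == "__main__":
    for cmd in candidate_commands("./gpuowl", 89796247):
        print(cmd)
```

With four toggles this yields 16 command lines, which can be pasted into a batch file or shell script and timed one after another.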
Four new options to try (using gpuowl.cl from git fork in gwoltman2/gpuowl). T2_SHUFFLE_WIDTH,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
I'll ask preda to include this change soon. For me, all but T2_SHUFFLE_HEIGHT result in better performance. I've been fighting the rocm optimizer trying to figure out why this one case is slower. |
[QUOTE=Prime95;532677]Four new options to try (using gpuowl.cl from git fork in gwoltman2/gpuowl). T2_SHUFFLE_WIDTH,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
I'll ask preda to include this change soon. For me, all but T2_SHUFFLE_HEIGHT result in better performance. I've been fighting the rocm optimizer trying to figure out why this one case is slower.[/QUOTE] For mine at ~99 million bits: [CODE]
1033us with ./gpuowl
936us with ./gpuowl -use MERGED_MIDDLE
875us with ./gpuowl -use MERGED_MIDDLE -use T2_SHUFFLE_WIDTH
866us with ./gpuowl -use MERGED_MIDDLE -use T2_SHUFFLE_WIDTH -use T2_SHUFFLE_REVERSELINE -use T2_SHUFFLE_MIDDLE
[/CODE] "sensors" shows a move from 195w to 215w (setsclk 4) between the second and fourth commands. Another giant leap :tu: |
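For a rough feel of what those numbers mean, here is a small Python sketch that turns the four timings into throughput gains over the default run, and multiplies the two "sensors" readings by the matching iteration times to get energy per iteration. It takes the wattages at face value; the dictionary labels and the helper name speedup_percent are made up for illustration.

```python
# Timings in microseconds per iteration, as quoted in the post above.
timings_us = {
    "default": 1033,
    "MERGED_MIDDLE": 936,
    "MERGED_MIDDLE + T2_SHUFFLE_WIDTH": 875,
    "MERGED_MIDDLE + WIDTH + REVERSELINE + MIDDLE": 866,
}


def speedup_percent(base_us, new_us):
    """Throughput gain of new_us over base_us, in percent."""
    return (base_us / new_us - 1.0) * 100.0


base = timings_us["default"]
for name, us in timings_us.items():
    print(f"{us:5d} us  {speedup_percent(base, us):5.1f}%  {name}")

# Energy per iteration = power * time, for the second and fourth commands.
joules_second = 195 * 936e-6
joules_fourth = 215 * 866e-6
print(f"{joules_second:.4f} J vs {joules_fourth:.4f} J per iteration")
```

On these figures the fourth command is about 19% faster than the default, while energy per iteration rises by roughly 2% compared with the second command.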
[QUOTE=Prime95;532677]Four new options to try (using gpuowl.cl from git fork in gwoltman2/gpuowl). T2_SHUFFLE_WIDTH,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
I'll ask preda to include this change soon. For me, all but T2_SHUFFLE_HEIGHT result in better performance. I've been fighting the rocm optimizer trying to figure out why this one case is slower.[/QUOTE] With just NO_ASM and MERGED_MIDDLE, I'm getting this: [code]2019-12-11 22:49:23 Exception gpu_error: OUT_OF_RESOURCES tailFused at clwrap.cpp:312 run[/code] EDIT: P100/Colab |
[QUOTE=kracker;532681]With just NO_ASM and MERGED_MIDDLE, I'm getting this:
[code]2019-12-11 22:49:23 Exception gpu_error: OUT_OF_RESOURCES tailFused at clwrap.cpp:312 run[/code] EDIT: P100/Colab[/QUOTE] Try using just T2_SHUFFLE_WIDTH and T2_SHUFFLE_MIDDLE. The other 2 options will double the amount of local memory required by tailFused. |
[QUOTE=kracker;532681]With just NO_ASM and MERGED_MIDDLE, I'm getting this:
[code]2019-12-11 22:49:23 Exception gpu_error: OUT_OF_RESOURCES tailFused at clwrap.cpp:312 run[/code] EDIT: P100/Colab[/QUOTE] Getting same issue. It seems to be only attributed to Nvidia GPUs. |
[QUOTE=xx005fs;532690]Getting same issue. It seems to be only attributed to Nvidia GPUs.[/QUOTE]
Yup, same here on RTX2080, I now get that error even with just NO_ASM. |
[QUOTE=xx005fs;532690]Getting same issue. It seems to be only attributed to Nvidia GPUs.[/QUOTE]
On the off chance it is an OpenCL compile issue, go to tailFused and change the declaration of lds to size SMALL_HEIGHT*2 rather than SMALL_HEIGHT*complicated_expression. |
V6.11-83-ge270393
Building gpuowl v6.11-83 for Windows, with msys2/mingw64, git, and make, emits quite a few warnings, but builds successfully:
[CODE]$ make gpuowl-win.exe
cat head.txt gpuowl.cl tail.txt > gpuowl-wrap.cpp
echo \"`git describe --long --dirty --always`\" > version.new
diff -q -N version.new version.inc >/dev/null || mv version.new version.inc
echo Version: `cat version.inc`
Version: "v6.11-83-ge270393"
g++ -MT Pm1Plan.o -MMD -MP -MF .d/Pm1Plan.Td -Wall -O2 -std=c++17 -c -o Pm1Plan.o Pm1Plan.cpp
g++ -MT GmpUtil.o -MMD -MP -MF .d/GmpUtil.Td -Wall -O2 -std=c++17 -c -o GmpUtil.o GmpUtil.cpp
g++ -MT Worktodo.o -MMD -MP -MF .d/Worktodo.Td -Wall -O2 -std=c++17 -c -o Worktodo.o Worktodo.cpp
In file included from Worktodo.cpp:6:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
       log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~
g++ -MT common.o -MMD -MP -MF .d/common.Td -Wall -O2 -std=c++17 -c -o common.o common.cpp
In file included from common.cpp:4:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
       log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~
g++ -MT main.o -MMD -MP -MF .d/main.Td -Wall -O2 -std=c++17 -c -o main.o main.cpp
In file included from main.cpp:8:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
       log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~
g++ -MT Gpu.o -MMD -MP -MF .d/Gpu.Td -Wall -O2 -std=c++17 -c -o Gpu.o Gpu.cpp
In file included from ProofSet.h:6,
                 from Gpu.cpp:4:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
       log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~
g++ -MT clwrap.o -MMD -MP -MF .d/clwrap.Td -Wall -O2 -std=c++17 -c -o clwrap.o clwrap.cpp
In file included from clwrap.cpp:4:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
       log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~
g++ -MT Task.o -MMD -MP -MF .d/Task.Td -Wall -O2 -std=c++17 -c -o Task.o Task.cpp
In file included from Task.cpp:7:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
       log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~
g++ -MT checkpoint.o -MMD -MP -MF .d/checkpoint.Td -Wall -O2 -std=c++17 -c -o checkpoint.o checkpoint.cpp
In file included from checkpoint.h:5,
                 from checkpoint.cpp:3:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
       log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~
g++ -MT timeutil.o -MMD -MP -MF .d/timeutil.Td -Wall -O2 -std=c++17 -c -o timeutil.o timeutil.cpp
g++ -MT Args.o -MMD -MP -MF .d/Args.Td -Wall -O2 -std=c++17 -c -o Args.o Args.cpp
In file included from Args.cpp:4:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
       log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~
g++ -MT state.o -MMD -MP -MF .d/state.Td -Wall -O2 -std=c++17 -c -o state.o state.cpp
g++ -MT Signal.o -MMD -MP -MF .d/Signal.Td -Wall -O2 -std=c++17 -c -o Signal.o Signal.cpp
g++ -MT FFTConfig.o -MMD -MP -MF .d/FFTConfig.Td -Wall -O2 -std=c++17 -c -o FFTConfig.o FFTConfig.cpp
g++ -MT AllocTrac.o -MMD -MP -MF .d/AllocTrac.Td -Wall -O2 -std=c++17 -c -o AllocTrac.o AllocTrac.cpp
g++ -MT gpuowl-wrap.o -MMD -MP -MF .d/gpuowl-wrap.Td -Wall -O2 -std=c++17 -c -o gpuowl-wrap.o gpuowl-wrap.cpp
g++ -o gpuowl-win.exe Pm1Plan.o GmpUtil.o Worktodo.o common.o main.o Gpu.o clwrap.o Task.o checkpoint.o timeutil.o Args.o state.o Signal.o FFTConfig.o AllocTrac.o gpuowl-wrap.o -lstdc++fs -lOpenCL -lgmp -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L. -static
strip gpuowl-win.exe[/CODE]
Run the help:
[CODE]$ ./gpuowl-win.exe -h
2019-12-11 17:34:31 gpuowl v6.11-83-ge270393

Command line options:

-dir <folder> : specify local work directory (containing worktodo.txt, results.txt, config.txt, gpuowl.log)
-pool <dir> : specify a directory with the shared (pooled) worktodo.txt and results.txt
    Multiple GpuOwl instances, each in its own directory, can share a pool of assignments and report
    the results back to the common pool.
-user <name> : specify the user name.
-cpu <name> : specify the hardware name.
-time : display kernel profiling information.
-fft <size> : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value> : PRP GEC block size. Default 400. Smaller block is slower but detects errors sooner.
-log <step> : log every <step> iterations, default 200000. Multiple of 10000.
-carry long|short : force carry type. Short carry may be faster, but requires high bits/word.
-B1 : P-1 B1 bound, default 500000
-B2 : P-1 B2 bound, default B1 * 30
-rB2 : ratio of B2 to B1. Default 30, used only if B2 is not explicitly set
-cleanup : delete save files at end of run
-prp <exponent> : run a single PRP test and exit, ignoring worktodo.txt
-pm1 <exponent> : run a single P-1 test and exit, ignoring worktodo.txt
-results <file> : name of results file, default 'results.txt'
-iters <N> : run next PRP test for <N> iterations and exit. Multiple of 10000.
-maxAlloc : limit GPU memory usage to this value in MB (needed on non-AMD GPUs)
-yield : enable work-around for CUDA busy wait taking up one CPU core
-nospin : disable progress spinner
-use NEW_FFT8,OLD_FFT5,NEW_FFT10: comma separated list of defines, see the #if tests in gpuowl.cl (used for perf tuning)
-device <N> : select a specific device:
 0 : Ellesmere-Radeon (TM) RX 480 Graphics AMD
 1 : gfx804-Radeon 550 Series AMD

FFT Configurations:
FFT 8K [ 0.01M - 0.17M] 64-64
FFT 32K [ 0.05M - 0.68M] 64-256 256-64
FFT 64K [ 0.10M - 1.33M] 64-512 512-64
FFT 128K [ 0.20M - 2.62M] 1K-64 64-1K 256-256
FFT 192K [ 0.29M - 3.89M] 64-256-6
FFT 224K [ 0.34M - 4.52M] 64-256-7
FFT 256K [ 0.39M - 5.15M] 64-2K 256-512 512-256 2K-64
FFT 288K [ 0.44M - 5.77M] 64-256-9
FFT 320K [ 0.49M - 6.40M] 64-256-10
FFT 352K [ 0.54M - 7.02M] 64-256-11
FFT 384K [ 0.59M - 7.64M] 64-256-12 64-512-6
FFT 448K [ 0.69M - 8.88M] 64-512-7
FFT 512K [ 0.79M - 10.12M] 1K-256 256-1K 512-512 4K-64
FFT 576K [ 0.88M - 11.35M] 64-512-9
FFT 640K [ 0.98M - 12.58M] 64-512-10
FFT 704K [ 1.08M - 13.81M] 64-512-11
FFT 768K [ 1.18M - 15.03M] 64-512-12 64-1K-6 256-256-6
FFT 896K [ 1.38M - 17.47M] 64-1K-7 256-256-7
FFT 1M [ 1.57M - 19.89M] 1K-512 256-2K 512-1K 2K-256
FFT 1152K [ 1.77M - 22.32M] 64-1K-9 256-256-9
FFT 1280K [ 1.97M - 24.73M] 64-1K-10 256-256-10
FFT 1408K [ 2.16M - 27.14M] 64-1K-11 256-256-11
FFT 1536K [ 2.36M - 29.54M] 64-1K-12 64-2K-6 256-256-12 256-512-6 512-256-6
FFT 1792K [ 2.75M - 34.33M] 64-2K-7 256-512-7 512-256-7
FFT 2M [ 3.15M - 39.10M] 1K-1K 512-2K 2K-512 4K-256
FFT 2304K [ 3.54M - 43.85M] 64-2K-9 256-512-9 512-256-9
FFT 2560K [ 3.93M - 48.59M] 64-2K-10 256-512-10 512-256-10
FFT 2816K [ 4.33M - 53.32M] 64-2K-11 256-512-11 512-256-11
FFT 3M [ 4.72M - 58.04M] 1K-256-6 64-2K-12 256-512-12 256-1K-6 512-256-12 512-512-6
FFT 3584K [ 5.51M - 67.44M] 1K-256-7 256-1K-7 512-512-7
FFT 4M [ 6.29M - 76.81M] 1K-2K 2K-1K 4K-512
FFT 4608K [ 7.08M - 86.15M] 1K-256-9 256-1K-9 512-512-9
FFT 5M [ 7.86M - 95.46M] 1K-256-10 256-1K-10 512-512-10
FFT 5632K [ 8.65M - 104.74M] 1K-256-11 256-1K-11 512-512-11
FFT 6M [ 9.44M - 114.00M] 1K-256-12 1K-512-6 256-1K-12 256-2K-6 512-512-12 512-1K-6 2K-256-6
FFT 7M [ 11.01M - 132.46M] 1K-512-7 256-2K-7 512-1K-7 2K-256-7
FFT 8M [ 12.58M - 150.85M] 2K-2K 4K-1K
FFT 9M [ 14.16M - 169.18M] 1K-512-9 256-2K-9 512-1K-9 2K-256-9
FFT 10M [ 15.73M - 187.45M] 1K-512-10 256-2K-10 512-1K-10 2K-256-10
FFT 11M [ 17.30M - 205.67M] 1K-512-11 256-2K-11 512-1K-11 2K-256-11
FFT 12M [ 18.87M - 223.85M] 1K-512-12 1K-1K-6 256-2K-12 512-1K-12 512-2K-6 2K-256-12 2K-512-6 4K-256-6
FFT 14M [ 22.02M - 260.08M] 1K-1K-7 512-2K-7 2K-512-7 4K-256-7
FFT 16M [ 25.17M - 296.17M] 4K-2K
FFT 18M [ 28.31M - 332.13M] 1K-1K-9 512-2K-9 2K-512-9 4K-256-9
FFT 20M [ 31.46M - 367.98M] 1K-1K-10 512-2K-10 2K-512-10 4K-256-10
FFT 22M [ 34.60M - 403.74M] 1K-1K-11 512-2K-11 2K-512-11 4K-256-11
FFT 24M [ 37.75M - 439.40M] 1K-1K-12 1K-2K-6 512-2K-12 2K-512-12 2K-1K-6 4K-256-12 4K-512-6
FFT 28M [ 44.04M - 510.47M] 1K-2K-7 2K-1K-7 4K-512-7
FFT 36M [ 56.62M - 651.81M] 1K-2K-9 2K-1K-9 4K-512-9
FFT 40M [ 62.91M - 722.13M] 1K-2K-10 2K-1K-10 4K-512-10
FFT 44M [ 69.21M - 792.25M] 1K-2K-11 2K-1K-11 4K-512-11
FFT 48M [ 75.50M - 862.18M] 1K-2K-12 2K-1K-12 2K-2K-6 4K-512-12 4K-1K-6
FFT 56M [ 88.08M - 1001.57M] 2K-2K-7 4K-1K-7
FFT 72M [113.25M - 1278.70M] 2K-2K-9 4K-1K-9
FFT 80M [125.83M - 1416.57M] 2K-2K-10 4K-1K-10
FFT 88M [138.41M - 1554.04M] 2K-2K-11 4K-1K-11
FFT 96M [150.99M - 1691.15M] 2K-2K-12 4K-1K-12 4K-2K-6
FFT 112M [176.16M - 1964.39M] 4K-2K-7
FFT 144M [226.49M - 2507.57M] 4K-2K-9
FFT 160M [251.66M - 2777.78M] 4K-2K-10
FFT 176M [276.82M - 3047.18M] 4K-2K-11
FFT 192M [301.99M - 3315.86M] 4K-2K-12
2019-12-11 17:34:38 Exiting because "help"
2019-12-11 17:34:38 Bye[/CODE]
Tune test on NVIDIA GTX 1080 Ti
[CODE]Gpuowl version and commit
GPU model NVIDIA GTX 1080 Ti
GPU clock free running ~1860 Mhz
Host OS Win7 Pro x64
Notes
Exponent timed 89796247
Computation type (PRP, P-1 stage 1, P-1 stage 2): PRP
FFT length FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.13 bits/word
config file entries -time -iters 10000 -device 0 -user kriesel -cpu dodo/gtx1080ti
varying tuning -use options, in chronological order
3696 NO_ASM us/sq warmup, end user interaction, stabilize
3706 NO_ASM baseline

In benchmarking (highlight fastest time in bold)
3596 NO_ASM,MERGED_MIDDLE,WORKINGIN
3593 NO_ASM,MERGED_MIDDLE,WORKINGIN (repeatability)
3592 NO_ASM,MERGED_MIDDLE,WORKINGIN1
3593 NO_ASM,MERGED_MIDDLE,WORKINGIN1A
3600 NO_ASM,MERGED_MIDDLE,WORKINGIN2
3534 NO_ASM,MERGED_MIDDLE,WORKINGIN3
[B]3515[/B] NO_ASM,MERGED_MIDDLE,WORKINGIN4
3529 NO_ASM,MERGED_MIDDLE,WORKINGIN5

Out benchmarking (highlight fastest time in bold)
3567 NO_ASM,MERGED_MIDDLE,WORKINGOUT
3584 NO_ASM,MERGED_MIDDLE,WORKINGOUT0
3587 NO_ASM,MERGED_MIDDLE,WORKINGOUT1
3599 NO_ASM,MERGED_MIDDLE,WORKINGOUT1A
3577 NO_ASM,MERGED_MIDDLE,WORKINGOUT2
3529 NO_ASM,MERGED_MIDDLE,WORKINGOUT3
[B]3509[/B] NO_ASM,MERGED_MIDDLE,WORKINGOUT4
3531 NO_ASM,MERGED_MIDDLE,WORKINGOUT5

Fastest WORKINGIN, Fastest WORKINGOUT combination:
3490 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4

repeatability +-1.5/3594.5 = +-0.042%
best 3490 base 3706 ratio 1.062[/CODE]
It's unclear which commit is required for the T2 options George has introduced recently. ([URL]https://www.mersenneforum.org/showpost.php?p=532677&postcount=1577[/URL])
Do the shuffle shuffle:
[CODE]3677 NO_ASM
3485 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4
3490 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_WIDTH
3482 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_MIDDLE
3480 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT
3480 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_REVERSELINE
3504 NO_ASM,MERGED_MIDDLE,WORKINGIN4,T2_SHUFFLE_WIDTH,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
3676 NO_ASM
3482 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
3487 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_MIDDLE
best 3480 base 3677 ratio 1.057[/CODE] |
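The selection step in the tune test above (pick the fastest WORKINGIN, the fastest WORKINGOUT, then compare the best time to the baseline) can be sketched in a few lines of Python. The times are copied from the GTX 1080 Ti tables; the helper names fastest and ratio are hypothetical, and combining the two individual winners is simply the approach used in the post, not a guarantee of the overall optimum.

```python
# us/iteration from the GTX 1080 Ti tune test (all runs with NO_ASM,MERGED_MIDDLE).
working_in = {
    "WORKINGIN": 3596, "WORKINGIN1": 3592, "WORKINGIN1A": 3593,
    "WORKINGIN2": 3600, "WORKINGIN3": 3534, "WORKINGIN4": 3515,
    "WORKINGIN5": 3529,
}
working_out = {
    "WORKINGOUT": 3567, "WORKINGOUT0": 3584, "WORKINGOUT1": 3587,
    "WORKINGOUT1A": 3599, "WORKINGOUT2": 3577, "WORKINGOUT3": 3529,
    "WORKINGOUT4": 3509, "WORKINGOUT5": 3531,
}


def fastest(results):
    """Return the option name with the lowest time."""
    return min(results, key=results.get)


def ratio(base_us, best_us):
    """Baseline time divided by best time, rounded to three decimals."""
    return round(base_us / best_us, 3)


print(fastest(working_in), fastest(working_out))
print(ratio(3706, 3490))  # first table: baseline vs WORKINGIN4,WORKINGOUT4
print(ratio(3677, 3480))  # shuffle table: baseline vs best shuffle run
```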