#1783 | |
"6800 descendent"
Feb 2005
Colorado
2·3²·41 Posts |
Quote:
I have gpuowl running on one tty, and I check temperatures, voltages, maintain files, etc from a second tty. In this case I had stopped gpuowl, and the shell was sitting at the prompt with the working directory being gpuowl. In the other tty, I renamed the gpuowl directory, created a new one, and built a new gpuowl. I put all the relevant files and folders in that new gpuowl folder, went back to the other tty, and started gpuowl. The problem is, that shell's working directory didn't exist anymore. It had gotten renamed, but the shell didn't throw any errors. The prompt remained the same too, so I really thought I was working in the new gpuowl directory. The result was data loss. Anyway, the moral of the story is to make sure you leave the gpuowl working directory and re-enter it if you are fooling around with it inside two different tty sessions at the same time.
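For anyone who wants to guard against the situation described in the quote, here is a minimal Python sketch, assuming a POSIX-like system. It compares the directory the process is really in with the directory that a given path currently names. The function name `cwd_is_stale` is made up for illustration and is not part of gpuowl.

```python
import os


def cwd_is_stale(logical_path):
    """Return True if the real working directory is no longer the
    directory that `logical_path` currently names.

    `logical_path` is whatever the shell believes the working directory
    is (for example the value of $PWD). Hypothetical helper, not gpuowl code.
    """
    try:
        # samefile compares device and inode numbers, so a directory that
        # was renamed and then re-created under the old name is detected.
        return not os.path.samefile(logical_path, os.curdir)
    except OSError:
        # The logical path (or the real working directory) no longer exists.
        return True


if __name__ == "__main__":
    pwd = os.environ.get("PWD")
    if pwd is not None and cwd_is_stale(pwd):
        print("Working directory was renamed or replaced; cd out and back in.")
```

Running such a check before starting gpuowl would have flagged the renamed directory, because the old name then points at a different, newly created directory.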
#1784 |
"William Garnett III"
Oct 2002
Langhorne, PA
2·43 Posts |
preda, kriesel, Prime95,
I am running only gpuOwl (no Prime95 on the CPU). Since I am new to this, what flags should I add on the command line, beyond plain gpuowl-win.exe, to see whether my per-iteration time improves? Code:
2020-01-16 01:17:59 Note: no config.txt file found
2020-01-16 01:17:59 device 0, unique id ''
2020-01-16 01:18:00 GeForce GTX 1050-0 81943843 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 17.37 bits/word
2020-01-16 01:18:01 GeForce GTX 1050-0 OpenCL args "-DEXP=81943843u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -DWEIGHT_STEP=0xc.69d9ee158d5b8p-3 -DIWEIGHT_STEP=0xa.4fb5ef629afb8p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-01-16 01:18:02 GeForce GTX 1050-0
2020-01-16 01:18:02 GeForce GTX 1050-0 OpenCL compilation in 0.38 s
2020-01-16 01:18:10 GeForce GTX 1050-0 81943843 OK 17130000 loaded: blockSize 400, de145902b2059f4b
2020-01-16 01:18:31 GeForce GTX 1050-0 81943843 OK 17130800 20.91%; 17605 us/it; ETA 13d 04:57; ebd9d81bce345290 (check 7.17s)
2020-01-16 01:39:20 GeForce GTX 1050-0 81943843 OK 17200000 20.99%; 17942 us/it; ETA 13d 10:40; 3d45c4478e50aeb6 (check 7.33s)
2020-01-16 02:39:38 GeForce GTX 1050-0 81943843 OK 17400000 21.23%; 18050 us/it; ETA 13d 11:37; f639fefb9039b2ab (check 7.34s)
2020-01-16 03:39:55 GeForce GTX 1050-0 81943843 OK 17600000 21.48%; 18051 us/it; ETA 13d 10:38; a78776fdd7f2ede3 (check 7.32s)
2020-01-16 04:40:13 GeForce GTX 1050-0 81943843 OK 17800000 21.72%; 18051 us/it; ETA 13d 09:37; 9fc9b0886bf2dc88 (check 7.33s)
2020-01-16 05:40:30 GeForce GTX 1050-0 81943843 OK 18000000 21.97%; 18051 us/it; ETA 13d 08:37; 7a4566d01385c94e (check 7.32s)
2020-01-16 06:40:47 GeForce GTX 1050-0 81943843 OK 18200000 22.21%; 18050 us/it; ETA 13d 07:36; 7f5c47985833c542 (check 7.33s)
2020-01-16 07:41:05 GeForce GTX 1050-0 81943843 OK 18400000 22.45%; 18050 us/it; ETA 13d 06:36; 24bf061871068b89 (check 7.34s)
2020-01-16 08:41:22 GeForce GTX 1050-0 81943843 OK 18600000 22.70%; 18050 us/it; ETA 13d 05:36; 5ffa6f774116574f (check 7.32s)
2020-01-16 09:41:40 GeForce GTX 1050-0 81943843 OK 18800000 22.94%; 18051 us/it; ETA 13d 04:37; 9c909adec676d76d (check 7.32s)
2020-01-16 10:41:57 GeForce GTX 1050-0 81943843 OK 19000000 23.19%; 18050 us/it; ETA 13d 03:35; bedb43a9ebaa0317 (check 7.33s)
2020-01-16 11:42:14 GeForce GTX 1050-0 81943843 OK 19200000 23.43%; 18049 us/it; ETA 13d 02:34; 869f10128493c2a3 (check 7.32s)
2020-01-16 12:31:20 GeForce GTX 1050-0 Stopping, please wait..
2020-01-16 12:31:35 GeForce GTX 1050-0 81943843 OK 19363600 23.63%; 18052 us/it; ETA 13d 01:48; dec7c8f5d6498df8 (check 7.33s)
2020-01-16 12:31:35 GeForce GTX 1050-0 Exiting because "stop requested"
2020-01-16 12:31:35 GeForce GTX 1050-0 Bye
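To compare per-iteration times before and after changing flags, it can help to pull the "us/it" figures out of gpuowl.log rather than reading them by eye. The following is a rough Python sketch that assumes progress lines look like the ones in the log above ("OK <iteration> <percent>%; <N> us/it; ..."). The function names are made up for illustration, and other gpuowl versions may format the log differently.

```python
import re
import statistics

# Matches progress lines such as:
#   ... 81943843 OK 17200000 20.99%; 17942 us/it; ETA 13d 10:40; ...
# The "OK ... loaded:" line has no percentage, so it is skipped.
PROGRESS_RE = re.compile(r"\bOK\s+(\d+)\s+[\d.]+%;\s+(\d+)\s+us/it")


def iteration_times(log_text):
    """Return a list of (iteration, microseconds_per_iteration) pairs."""
    return [(int(m.group(1)), int(m.group(2)))
            for m in PROGRESS_RE.finditer(log_text)]


def mean_us_per_it(log_text):
    """Average us/it over all progress lines; raises if there are none."""
    times = [us for _, us in iteration_times(log_text)]
    if not times:
        raise ValueError("no progress lines found")
    return statistics.mean(times)
```

Feeding it the text of a log from a run with one set of flags, and again for another set, gives two averages that can be compared directly.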
#1785 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1110011000110₂ Posts |
I typically run something like the following to test different options, then put the best in config.txt or a .bat file. What is best varies by GPU, and possibly by FFT length as well. Note that this is somewhat old and does not cover the latest -use options (CARRY32 vs. CARRY64 and those that followed), which do not seem well documented to me yet. Reading the source code is suggested for the full list. And take any recommendations from Preda or Prime95 very seriously.
Code:
:gwtime.bat for Windows in a command prompt box. Assumes cd to the gpuowl directory is already done.
:iter count is required to be multiple of 10000; 10000 is enough for repeatable results up to gtx1080 or so
set iters=10000
:get gpu warmed up and stable, get baseline
:first one is there just to ensure the gpu is warmed up and clock-stable somewhat, ignore its timing, use the second
gpuowl-win -time -iters %iters% -use NO_ASM
gpuowl-win -time -iters %iters% -use NO_ASM
:uncomment as needed below to run the pass you want
:goto passtwo
:goto passthree
:passone
:get the workingin and workingout optimals in pass one
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN
:repeated, let's see reproducibility once; then onward through the list
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN1
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN1A
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN2
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN3
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN4
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN5
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT0
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT1
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT1A
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT2
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT3
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT4
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT5
goto chain
:passtwo
:edit the following before running pass two, to the best workingin and workingout choices determined in pass one
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_WIDTH
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_MIDDLE
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_REVERSELINE
goto chain
:passthree
:edit the following if needed
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_MIDDLE
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_WIDTH,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
start wordpad gpuowl.log
goto chain
:add passes if needed for CARRY32, CARRY64, etc here?
:chain to continuing production work; edit as needed for your environment. (In my case mf.bat runs mfaktc.)
cd C:\Users\Ken\Documents\tf-gtx1050ti
mf

From gpuowl-wrap.cpp, gpuowl-v6.11-132-gfd01ee5, it's a considerable list:
Code:
/* List of user-serviceable -use flags and their effects

FMA    : use OpenCL fma(x, y, z) instead of x * y + z in MAD(x, y, z)
NO_ASM : request to not use any inline __asm()
NO_OMOD: do not use GCN output modifiers in __asm()

NO_MERGED_MIDDLE
WORKINGOUTs <AMD default is WORKINGOUT3> <nVidia default is WORKINGOUT4>
WORKINGINs <AMD default is WORKINGIN5> <nVidia default is WORKINGIN4>
PREFER_LESS_FMA

ORIG_X2
INLINE_X2
FMA_X2

UNROLL_ALL <nVidia default>
UNROLL_NONE
UNROLL_WIDTH
UNROLL_HEIGHT <AMD default>
UNROLL_MIDDLEMUL1 <AMD default>
UNROLL_MIDDLEMUL2 <AMD default>

T2_SHUFFLE <nVidia default>
NO_T2_SHUFFLE
T2_SHUFFLE_WIDTH
T2_SHUFFLE_MIDDLE
T2_SHUFFLE_HEIGHT
T2_SHUFFLE_REVERSELINE <AMD default>

OLD_FFT8 <default>
NEWEST_FFT8
NEW_FFT8

OLD_FFT5
NEW_FFT5 <default>
NEWEST_FFT5

NEW_FFT10 <default>
OLD_FFT10

CARRY32 <AMD default>    // This is potentially dangerous option for large FFTs. Carry may not fit in 31 bits.
CARRY64 <nVidia default>

FANCY_MIDDLEMUL1 <nVidia default>  // Only implemented for MIDDLE=10 and MIDDLE=11
MORE_SQUARES_MIDDLEMUL1            // Replaces some complex muls with complex squares but uses more registers
CHEBYSHEV_METHOD                   // Uses fewer floating point ops than original MiddleMul1 implementation (worse accuracy?)
CHEBYSHEV_METHOD_FMA               // Uses fewest floating point ops of any of the MiddleMul1 implementations (worse accuracy?)
ORIGINAL_METHOD                    // The original straightforward MiddleMul1 implementation
ORIGINAL_TWEAKED <AMD default>     // The original MiddleMul1 implementation tweaked to save two multiplies

ORIG_MIDDLEMUL2 <default>          // The original straightforward MiddleMul2 implementation
CHEBYSHEV_MIDDLEMUL2               // Uses fewer floating point ops than original MiddleMul2 implementation (worse accuracy?)

ORIG_SLOWTRIG                      // Use the compiler's implementation of sin/cos functions
NEW_SLOWTRIG <default>             // Our own sin/cos implementation

MORE_ACCURATE <AMD default>        // Our own sin/cos implementation with extra accuracy (should be needlessly slower, but isn't)
LESS_ACCURATE <nVidia default>     // Opposite of MORE_ACCURATE
*/
Last fiddled with by kriesel on 2020-01-16 at 19:12
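Typing out every timing line by hand gets tedious as the flag list grows. As a sketch only, the pass-one lines of the batch file above could be generated in Python like this. The command shape (`-time -iters N -use A,B,C`) and the flag names are copied from the batch file; the function name and defaults are made up for illustration, and nothing here checks that a given gpuowl build actually accepts these flags.

```python
def build_timing_commands(variants, base_flags=("NO_ASM", "MERGED_MIDDLE"),
                          exe="gpuowl-win", iters=10000):
    """Build one timing command line per variant flag.

    Only produces strings; running them is left to the caller.
    """
    # The batch file above notes that the iteration count must be a
    # multiple of 10000.
    if iters <= 0 or iters % 10000 != 0:
        raise ValueError("iters must be a positive multiple of 10000")
    commands = []
    for variant in variants:
        flags = ",".join((*base_flags, variant))
        commands.append(f"{exe} -time -iters {iters} -use {flags}")
    return commands


if __name__ == "__main__":
    # WORKINGIN variants as they appear in pass one of the batch file.
    workingin = ["WORKINGIN", "WORKINGIN1", "WORKINGIN1A", "WORKINGIN2",
                 "WORKINGIN3", "WORKINGIN4", "WORKINGIN5"]
    for line in build_timing_commands(workingin):
        print(line)
```

The printed lines can be pasted into a .bat file, which keeps the actual runs under manual control.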
#1786 |
"Eric"
Jan 2018
USA
2²×5×11 Posts |
With a 1050 you shouldn't expect a significant speedup, since it is bottlenecked by the GPU's double-precision throughput rather than by memory bandwidth, and memory bandwidth is what most of the recent code optimization addresses. All the necessary flags should already be enabled if you are using the newest version. What I recommend is to use MSI Afterburner and push the core clock as high as possible (and maybe the memory clock a bit too, though I don't think that will make a significant difference).
#1787 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7366₁₀ Posts |
Here it is. The usual shower of warnings reappeared during the build. Untested so far except for help output.
#1788 | ||
Sep 2002
Database er0rr
2³×3×11×17 Posts |
Quote:
Quote:
#1789 | ||
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1110011000110₂ Posts |
Quote:
Do we know whether the performance effects of the numerous options are independent, e.g. that the optimal WORKINGIN and WORKINGOUT choices don't change when other options are changed? Quote:
Last fiddled with by kriesel on 2020-01-16 at 21:18
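One rough way to probe the independence question empirically is to time each candidate under several settings of the other options and see whether the same candidate wins every time. The Python sketch below assumes you have collected such timings yourself; the data layout (a dict keyed by `(choice, context)`) and the function names are made up for illustration.

```python
def best_choice_per_context(timings):
    """Map each context to the choice with the lowest us/it.

    `timings` maps (choice, context) -> measured us/it, where `choice`
    is e.g. a WORKINGIN variant and `context` is the rest of the -use string.
    """
    best = {}
    for (choice, context), us_per_it in timings.items():
        current = best.get(context)
        if current is None or us_per_it < timings[(current, context)]:
            best[context] = choice
    return best


def looks_independent(timings):
    """True if the same choice is fastest in every context measured."""
    winners = set(best_choice_per_context(timings).values())
    return len(winners) <= 1
```

A True result only says that no interaction showed up in the contexts that were actually timed, and differences smaller than the run-to-run noise should not be read as a change of winner.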
#1790 | |
P90 years forever!
Aug 2002
Yeehaw, FL
8158₁₀ Posts |
Quote:
You could also use "-use FANCY_MIDDLEMUL1,ORIGINAL_TWEAKED" to get the fancy MiddleMul1 for MIDDLE=10 and 11, and the original tweaked MiddleMul1 otherwise.
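The rule in that suggestion can be written down as a tiny Python model. This is only a restatement of what is said in this thread (FANCY_MIDDLEMUL1 exists for MIDDLE=10 and 11 only, with ORIGINAL_TWEAKED as the fallback); it is not gpuowl's actual selection code, and the function name is made up for illustration.

```python
# Per the flag list quoted earlier in the thread, the fancy variant is
# only implemented for these MIDDLE values.
FANCY_MIDDLES = {10, 11}


def middlemul1_variant(middle, use_flags):
    """Model which MiddleMul1 variant the described rule would pick.

    Returns None when the thread does not say what happens.
    """
    flags = {f.strip() for f in use_flags.split(",") if f.strip()}
    if "FANCY_MIDDLEMUL1" in flags and middle in FANCY_MIDDLES:
        return "FANCY_MIDDLEMUL1"
    if "ORIGINAL_TWEAKED" in flags:
        return "ORIGINAL_TWEAKED"
    return None
```

Under this model, the GTX 1050 run logged earlier (Middle 9) would end up with ORIGINAL_TWEAKED when both flags are given.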
#1791 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×29×127 Posts |
Same Colab-style run: 3.42 days of computing time logged for the two stages combined. Stage 2 took 88.8% as long as stage 1. FFT length 57344K, 19 buffers.
https://www.mersenne.org/report_expo...exp_hi=&full=1 Quote:
Last fiddled with by kriesel on 2020-01-17 at 14:28 |
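For readers who want the per-stage times implied by those two figures, here is a small Python sketch of the arithmetic. It assumes the 3.42 days is exactly stage 1 plus stage 2 and that 88.8% is the ratio of stage 2 time to stage 1 time; the function name is made up for illustration.

```python
def split_stages(total_days, stage2_over_stage1):
    """Split a combined time into (stage1, stage2).

    total = s1 + s2 and s2 = r * s1, so s1 = total / (1 + r).
    """
    stage1 = total_days / (1.0 + stage2_over_stage1)
    stage2 = total_days - stage1
    return stage1, stage2


if __name__ == "__main__":
    s1, s2 = split_stages(3.42, 0.888)
    print(f"stage 1: {s1:.2f} days, stage 2: {s2:.2f} days")
```

With the numbers from the post this works out to roughly 1.81 days for stage 1 and 1.61 days for stage 2.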
#1792 |
"Mihai Preda"
Apr 2015
2²×19² Posts |
I just committed a tiny change that should significantly speed up the second stage of P-1. (I tested with ROCm 2.10.)
https://github.com/preda/gpuowl/comm...cbdc2e2d814d33
The ROCm optimizer bug is described here: https://github.com/RadeonOpenCompute/ROCm/issues/1002
Last fiddled with by preda on 2020-01-18 at 18:22
#1793 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×29×127 Posts |
Quote: