[QUOTE=kriesel;532394]Probably best to let Preda and Prime95 get back into sync first.
But in general, for relatively recent gpuowl versions, on Windows, do steps 1 through 4 of kracker's instructions at [URL]https://www.mersenneforum.org/showpost.php?p=483209&postcount=356[/URL]
(The AMD APP SDK 3.0 link has gone dead. See for example [URL]https://github.com/fireice-uk/xmr-stak/issues/1511[/URL] or [URL]https://en.wikipedia.org/wiki/AMD_APP_SDK[/URL])
Install git on msys2
This may not be the whole story for setting up for compiles.
In an msys2 cmd prompt box from here on:
# to refresh a git working folder:
git pull [URL]https://github.com/preda/gpuowl[/URL]
#or to new folder that has not been a git folder before:
git clone [URL]https://github.com/preda/gpuowl[/URL]
cd gpuowl
make gpuowl-win.exe
To use the executable, switch to an NT command prompt box. It won't run in the msys2 context. Msys2 is a linux like environment. The executable is a Windows executable. It's a sort of cross-compile.
I usually run gpuowl-win.exe -h immediately, both to save it, and to verify the newly compiled program is working well enough to identify gpus on the build box.
Since it's OpenCL based, it's the same build whether used on AMD or NVIDIA gpus.[/QUOTE]
Thank you so much! I was wondering what step I was missing that was causing a bunch of nasty OpenCL link errors, it's because I have never copied the libraries from the APP SDK folders into MSYS2. |
Tried on a P100 in colab with 4608K FFT/PRP... I'm getting 766 us/it compared to 1064 us/it without the switch.(!!)
|
[QUOTE=kracker;532397]Tried on a P100 in colab with 4608K FFT/PRP... I'm getting 766 us/it compared to 1064 us/it without the switch.(!!)[/QUOTE]Could you get some comparative wattage readings from nvidia-smi?
|
[QUOTE=xx005fs;532396]Thank you so much! I was wondering what step I was missing that was causing a bunch of nasty OpenCL link errors, it's because I have never copied the libraries from the APP SDK folders into MSYS2.[/QUOTE]You're welcome, been there myself, so I try not to break it once it's working. See also [URL]https://www.mersenneforum.org/showthread.php?t=24938&highlight=msys2&page=4[/URL] including the caution about an unannounced system shutdown
Have fun! |
[QUOTE=kriesel;532399]Could you get some comparative wattage readings from nvidia-smi?[/QUOTE]
The readings seem to change a lot... power usage as shown in nvidia-smi has been slowly climbing over the past several minutes... EDIT: looks like it's semi stabilized... ~180W without, ~190W with. |
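The power and timing figures can be combined into energy per iteration. This is only a rough sketch: it assumes the ~180 W and ~190 W readings pair with the 1064 us/it and 766 us/it timings reported for the same P100, and the function name is made up for illustration.

```python
# Energy per iteration = average power (W) x time per iteration (s).
# Inputs are the approximate P100 figures quoted in the posts above;
# pairing each wattage with each timing is an assumption.

def joules_per_iteration(watts: float, us_per_it: float) -> float:
    """Return the energy spent on one iteration, in joules."""
    return watts * us_per_it * 1e-6

without_switch = joules_per_iteration(180, 1064)
with_switch = joules_per_iteration(190, 766)
saving = 1 - with_switch / without_switch

print(f"without switch: {without_switch:.4f} J/it")
print(f"with switch:    {with_switch:.4f} J/it")
print(f"energy saved per iteration: {saving:.1%}")
```

Under those assumptions the card draws slightly more power with the switch but spends noticeably less energy on each iteration, because the iteration finishes sooner.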
[QUOTE=kracker;532397]Tried on a P100 in colab with 4608K FFT/PRP... I'm getting 766 us/it compared to 1064 us/it without the switch.(!!)[/QUOTE]
I also tested the K80 with 5120K FFT; it went down from ~4350 us/it to around 3300 us/it, depending on the instance. Pretty impressive speedup.
More updates: the updated source code by Preda works on Windows now, and I'm seeing almost exactly a 33% speedup on my Titan V, and much less on a regular Vega.
One thing I found very strange: I don't know whether the graphic that changes from . to o to 0 and then to * is intentional, but it seems to slow down my Colab console and leaves a symbol in front of every line in the log. Is there an option to disable it? |
[QUOTE=kracker;532397]Tried on a P100 in colab with 4608K FFT/PRP... I'm getting 766 us/it compared to 1064 us/it without the switch.(!!)[/QUOTE]
With MERGED_MIDDLE,WORKINGOUT,WORKINGIN4, it dropped further to 754 us/it... a very impressive 41% speed boost from the beginning! |
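For anyone checking the percentages: the 41% figure is a throughput ratio (old time divided by new time), which is not the same as the drop in time per iteration. A minimal sketch using the timings quoted in the posts above; the helper names are illustrative only, and the K80 values are the approximate ones reported.

```python
# Two ways to express the same improvement from per-iteration timings.

def speedup(before_us: float, after_us: float) -> float:
    """Throughput ratio: iterations per second after vs. before."""
    return before_us / after_us

def time_saved(before_us: float, after_us: float) -> float:
    """Fraction of the per-iteration time that was removed."""
    return 1 - after_us / before_us

# (before, after) in us/it, as quoted in the thread.
cases = {
    "P100 4608K, with the switch": (1064, 766),
    "P100 4608K, MERGED_MIDDLE,WORKINGOUT,WORKINGIN4": (1064, 754),
    "K80 5120K (approximate)": (4350, 3300),
}

for label, (before, after) in cases.items():
    print(f"{label}: {speedup(before, after) - 1:.1%} more throughput, "
          f"{time_saved(before, after):.1%} less time per iteration")
```

So 1064 to 754 us/it is about 41% more iterations per second, or about 29% less time per iteration.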
I tried this on one of my Radeon VII cards, one that has not yet given me any errors in the last 4-5 PRP tests (while the other returned too many lol). This card sits in the second slot with no display attached to it.
[CODE]2019-12-08 23:47:19 config.txt: -user dcheuk/gpu01 -use ORIG_X2 -device 1 -log 100000 -use MERGED_MIDDLE
2019-12-08 23:47:19 config.txt:
2019-12-08 23:47:19 gfx906-0 94607437 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 18.04 bits/word
2019-12-08 23:47:20 gfx906-0 OpenCL args "-DEXP=94607437u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xf.8262bb7326f28p-3 -DIWEIGHT_STEP=0x8.40cb53a4a1fd8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DORIG_X2=1 -DMERGED_MIDDLE=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-12-08 23:47:21 gfx906-0 OpenCL compilation in 1.31 s
2019-12-08 23:47:22 gfx906-0 94607437 OK 2071500 loaded: blockSize 500, 132c5e1692604fd6
2019-12-08 23:47:23 gfx906-0 94607437 OK 2072500 2.19%; 891 us/it (min 885 885); ETA 0d 22:54; 8d4ac7f8617372d8 (check 0.53s)
2019-12-08 23:47:48 gfx906-0 94607437 OK 2100000 2.22%; 887 us/it (min 884 884); ETA 0d 22:48; f8d6a63b03cfa32a (check 0.53s)
2019-12-08 23:49:17 gfx906-0 94607437 OK 2200000 2.33%; 887 us/it (min 884 884); ETA 0d 22:47; 42044b1ea9fb8b01 (check 0.53s)
2019-12-08 23:50:47 gfx906-0 94607437 OK 2300000 2.43%; 887 us/it (min 884 884); ETA 0d 22:45; fcd02bb8420d5ba7 (check 0.53s)
2019-12-08 23:52:17 gfx906-0 94607437 OK 2400000 2.54%; 887 us/it (min 884 884); ETA 0d 22:43; d784ed68cfa19bd7 (check 0.53s)
2019-12-08 23:53:46 gfx906-0 94607437 OK 2500000 2.64%; 887 us/it (min 884 884); ETA 0d 22:42; 79d614fc892e7a5a (check 0.53s)
[/CODE]
And tuned at 1449MHz, 867mV, 1200MHz memory. Fan about 75% at temperature hovering 64-66C, junction 78-81C. Ambient temperature 20C. Wattage 140-143W.
Very impressive, I'm amazed at what you guys can do. Good work. :smile: |
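The ETA in those progress lines can be cross-checked from the us/it figure. The sketch below assumes the run needs roughly as many iterations as the exponent and rounds to the nearest minute; both are guesses about how the number is produced, not something taken from the gpuowl source. The regular expression is written only against the log format shown in this post.

```python
import re

# Last progress line from the log in the post above.
LOG_LINE = (
    "2019-12-08 23:53:46 gfx906-0 94607437 OK 2500000 2.64%; "
    "887 us/it (min 884 884); ETA 0d 22:42; 79d614fc892e7a5a (check 0.53s)"
)

# Matches "<exponent> OK <iteration> <pct>%; <n> us/it ... ETA <d>d <hh>:<mm>".
PATTERN = re.compile(
    r"(?P<exponent>\d+) OK\s+(?P<iteration>\d+)\s+[\d.]+%; "
    r"(?P<us_per_it>\d+) us/it .*?ETA (?P<days>\d+)d (?P<hours>\d+):(?P<minutes>\d+)"
)

def estimate_eta(line: str) -> str:
    """Recompute the ETA from the remaining iterations and the us/it rate."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a progress line in the expected format")
    # Assumption: total iterations are approximately equal to the exponent.
    remaining = int(m["exponent"]) - int(m["iteration"])
    seconds = remaining * int(m["us_per_it"]) / 1_000_000
    total_minutes = round(seconds / 60)
    days, rest = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rest, 60)
    return f"{days}d {hours:02d}:{minutes:02d}"

print("recomputed ETA:", estimate_eta(LOG_LINE))
```

For the last line of the log this reproduces the logged `0d 22:42`, so the displayed ETA is consistent with remaining iterations times the current us/it.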
[QUOTE=kracker;532423]With MERGED_MIDDLE,WORKINGOUT,WORKINGIN4, it dropped further to 754 us/it... a very impressive 41% speed boost from the beginning![/QUOTE]
Preliminary results from Ken suggested WORKINGOUT4 is better than WORKINGOUT. Of course, that was from a huge sample size of 1 nVidia card. |
[QUOTE=Prime95;532431]Preliminary results from Ken suggested WORKINGOUT4 is better than WORKINGOUT. Of course, that was from a huge sample size of 1 nVidia card.[/QUOTE]
p=89796247, fft 5M, gtx1080, Win7 pro, typ timing iters 9200 obtained with -time -iters 10000
[CODE]us/it -use options
5124 no_asm
5120 no_asm
4868 no_asm,merged_middle,workingin
4873 no_asm,merged_middle,workingin
4873 no_asm,merged_middle,workingin1
4951 no_asm,merged_middle,workingin1a
4876 no_asm,merged_middle,workingin2
4874 no_asm,merged_middle,workingin3
[B]4865[/B] no_asm,merged_middle,workingin5
4878 no_asm,merged_middle,workingout
4911 no_asm,merged_middle,workingout0
4872 no_asm,merged_middle,workingout1
4950 no_asm,merged_middle,workingout1a
4881 no_asm,merged_middle,workingout2
4875 no_asm,merged_middle,workingout3
[B]4836[/B] no_asm,merged_middle,workingout4
4876 no_asm,merged_middle,workingout5[/CODE]
repeatability ~+/-0.05%
5122/4836 = 1.059
obtained with this batch file derived from a list of cases George requested:
[CODE]:iter count is required to be multiple of 10000
set iters=10000
:first one was there just to ensure the gpu is warmed up and clock-stable somewhat, ignore its timing, use the second, but maybe the first 800 iters block does that
gpuowl-win -time -iters %iters% -use NO_ASM
gpuowl-win -time -iters %iters% -use NO_ASM
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN
:repeated, let's see reproducibility once; then onward through the list
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN1
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN1A
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN2
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN3
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN5
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT0
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT1
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT1A
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT2
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT3
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT4
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT5[/CODE] |
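To make the ranking easier to read, here is a small sketch that sorts the timings from the table above and reproduces the 5122/4836 ratio. The numbers are transcribed from that post; the variable names are illustrative, and the baseline is taken as the mean of the two no_asm runs, which is how the 5122 figure appears to have been obtained.

```python
from statistics import mean

# (per-iteration timing, -use options) pairs transcribed from the table above.
TIMINGS = [
    (5124, "no_asm"),
    (5120, "no_asm"),
    (4868, "no_asm,merged_middle,workingin"),
    (4873, "no_asm,merged_middle,workingin"),
    (4873, "no_asm,merged_middle,workingin1"),
    (4951, "no_asm,merged_middle,workingin1a"),
    (4876, "no_asm,merged_middle,workingin2"),
    (4874, "no_asm,merged_middle,workingin3"),
    (4865, "no_asm,merged_middle,workingin5"),
    (4878, "no_asm,merged_middle,workingout"),
    (4911, "no_asm,merged_middle,workingout0"),
    (4872, "no_asm,merged_middle,workingout1"),
    (4950, "no_asm,merged_middle,workingout1a"),
    (4881, "no_asm,merged_middle,workingout2"),
    (4875, "no_asm,merged_middle,workingout3"),
    (4836, "no_asm,merged_middle,workingout4"),
    (4876, "no_asm,merged_middle,workingout5"),
]

# Baseline: mean of the plain no_asm runs.
baseline = mean(t for t, opts in TIMINGS if opts == "no_asm")
best_time, best_opts = min(TIMINGS)

print(f"baseline {baseline}, best {best_time} ({best_opts})")
print(f"ratio baseline/best = {baseline / best_time:.3f}")

# Three fastest option sets, fastest first.
for t, opts in sorted(TIMINGS)[:3]:
    print(f"{t}  {opts}  ({baseline / t - 1:.1%} faster than baseline)")
```

On this one card the best case is workingout4, followed by workingin5, with everything else within about 1% of those two except the 1a variants.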
gtx1080 again
usec/iter; -use case
5055 no_asm
5104 no_asm
4848 NO_ASM,MERGED_MIDDLE,WORKINGIN
4863 NO_ASM,MERGED_MIDDLE,WORKINGIN
4851 NO_ASM,MERGED_MIDDLE,WORKINGIN4
4859 NO_ASM,MERGED_MIDDLE,WORKINGIN5
4873 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT5
retest with minimal user interaction:
5058 no_asm
5091 no_asm
4837 NO_ASM,MERGED_MIDDLE,WORKINGIN
4836 NO_ASM,MERGED_MIDDLE,WORKINGIN
4836 NO_ASM,MERGED_MIDDLE,WORKINGIN4
4833 NO_ASM,MERGED_MIDDLE,WORKINGIN5
4835 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT5
5091/4833 =~ 1.053 |
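One way to read the retest numbers is to compare the spread inside each group with the gap between the groups. This is only a sketch on the retest figures above; "relative spread" here is just (max - min) / mean, chosen for illustration rather than as a proper statistical test.

```python
from statistics import mean

# Retest timings (usec/iter) from the post above, grouped by whether
# MERGED_MIDDLE was in the -use list.
retest = {
    "no_asm": [5058, 5091],
    "merged_middle variants": [4837, 4836, 4836, 4833, 4835],
}

def relative_spread(values):
    """(max - min) / mean: a crude measure of scatter within a group."""
    return (max(values) - min(values)) / mean(values)

for name, values in retest.items():
    print(f"{name}: mean {mean(values):.1f}, spread {relative_spread(values):.2%}")

gain = mean(retest["no_asm"]) / mean(retest["merged_middle variants"]) - 1
print(f"merged_middle gain over no_asm (group means): {gain:.1%}")
```

On these numbers the gain from MERGED_MIDDLE (roughly 5%) is well above the scatter, while the WORKINGIN/WORKINGOUT variants differ from each other by less than the two no_asm runs differ from one another, so this retest alone does not separate them.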