mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

xx005fs 2019-12-09 00:32

[QUOTE=kriesel;532394]Probably best to let Preda and Prime95 get back into sync first.

But in general, for relatively recent gpuowl versions, on Windows,
do steps 1 through 4 of kracker's instructions at [URL]https://www.mersenneforum.org/showpost.php?p=483209&postcount=356[/URL]
(The AMD APP SDK 3.0 link has gone dead. See for example [URL]https://github.com/fireice-uk/xmr-stak/issues/1511[/URL] or [URL]https://en.wikipedia.org/wiki/AMD_APP_SDK[/URL])

Install git in msys2.
This may not be everything needed to set up for compiles.

In an msys2 cmd prompt box from here on:
# to refresh an existing git working folder (run this inside the gpuowl folder):
git pull [URL]https://github.com/preda/gpuowl[/URL]

# or, to clone into a new folder that has not been a git folder before:
git clone [URL]https://github.com/preda/gpuowl[/URL]
cd gpuowl

# then build:
make gpuowl-win.exe

To use the executable, switch to an NT command prompt box; it won't run in the msys2 context.
Msys2 is a Linux-like environment, while the executable is a Windows executable, so it's a sort of cross-compile.

I usually run gpuowl-win.exe -h immediately, both to save it and to verify that the newly compiled program works well enough to identify the GPUs on the build box. Since it's OpenCL-based, it's the same build whether used on AMD or NVIDIA GPUs.[/QUOTE]

Thank you so much! I was wondering what step I was missing that was causing a bunch of nasty OpenCL link errors; it turns out I had never copied the libraries from the APP SDK folders into MSYS2.

kracker 2019-12-09 00:33

Tried on a P100 in colab with 4608K FFT/PRP... I'm getting 766 us/it compared to 1064 us/it without the switch.(!!)
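For reference, the speedup implied by per-iteration timings is just the ratio of old to new us/it, since iterations per second scale as 1/(us/it). A minimal Python sketch with the P100 figures from this post; the function name is illustrative and not part of gpuowl:

```python
def speedup(old_us_per_it: float, new_us_per_it: float) -> float:
    """Throughput ratio: how many times more iterations per second the new setting gives."""
    return old_us_per_it / new_us_per_it

# P100, 4608K FFT, PRP: 1064 us/it without the switch, 766 us/it with it
ratio = speedup(1064, 766)
print(f"{ratio:.3f}x, i.e. {100 * (ratio - 1):.1f}% more iterations per second")
```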

kriesel 2019-12-09 00:44

[QUOTE=kracker;532397]Tried on a P100 in colab with 4608K FFT/PRP... I'm getting 766 us/it compared to 1064 us/it without the switch.(!!)[/QUOTE]Could you get some comparative wattage readings from nvidia-smi?

kriesel 2019-12-09 00:48

[QUOTE=xx005fs;532396]Thank you so much! I was wondering what step I was missing that was causing a bunch of nasty OpenCL link errors; it turns out I had never copied the libraries from the APP SDK folders into MSYS2.[/QUOTE]You're welcome. I've been there myself, so I try not to break it once it's working. See also [URL]https://www.mersenneforum.org/showthread.php?t=24938&highlight=msys2&page=4[/URL], including the caution about an unannounced system shutdown.
Have fun!

kracker 2019-12-09 01:06

[QUOTE=kriesel;532399]Could you get some comparative wattage readings from nvidia-smi?[/QUOTE]

The readings seem to change a lot... power usage as shown in nvidia-smi has been slowly climbing over the past several minutes...

EDIT: it looks like it has semi-stabilized: ~180W without, ~190W with.
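Since the question was about power, the two readings can be folded into energy per iteration (watts times seconds per iteration). A rough sketch that assumes ~180 W corresponds to the 1064 us/it run and ~190 W to the 766 us/it run from the quoted post; the readings were still drifting, so the result is only indicative:

```python
def joules_per_iteration(watts: float, us_per_it: float) -> float:
    # energy = power (W = J/s) * time per iteration (us converted to s)
    return watts * us_per_it * 1e-6

without = joules_per_iteration(180, 1064)      # assumed: ~180 W at 1064 us/it
with_switch = joules_per_iteration(190, 766)   # assumed: ~190 W at 766 us/it
print(f"without: {without:.3f} J/it, with: {with_switch:.3f} J/it")
print(f"energy per iteration reduced by {100 * (1 - with_switch / without):.0f}%")
```

With these numbers the faster setting uses roughly 24% less energy per iteration, despite the higher power draw.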

xx005fs 2019-12-09 01:20

[QUOTE=kracker;532397]Tried on a P100 in colab with 4608K FFT/PRP... I'm getting 766 us/it compared to 1064 us/it without the switch.(!!)[/QUOTE]

I also tested a K80 with 5120K FFT: it went down from ~4350 us/it before to around 3300 us/it, depending on the instance. Pretty impressive speedup.

More updates: Preda's updated source code works on Windows now, and I'm seeing almost exactly a 33% speedup on my Titan V, though much less on a regular Vega. One thing I found strange: I don't know whether the graphic that changes from . to o to 0 and then to * is intentional, but it seems to slow down my Colab console and leaves a symbol in front of every line in the log. Is there an option to disable it?

kracker 2019-12-09 04:54

[QUOTE=kracker;532397]Tried on a P100 in colab with 4608K FFT/PRP... I'm getting 766 us/it compared to 1064 us/it without the switch.(!!)[/QUOTE]

With MERGED_MIDDLE,WORKINGOUT,WORKINGIN4, it dropped further to 754 us/it... a very impressive 41% speed boost from the beginning!
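Percent figures for speedups can be read two ways, which is worth keeping in mind when comparing reports in this thread: going from 1064 to 754 us/it is about 41% more iterations per second, but about a 29% reduction in time per iteration. A small Python sketch computing both from the same pair of timings (function names are illustrative):

```python
def throughput_gain_pct(old_us: float, new_us: float) -> float:
    # extra iterations per second, relative to the old setting
    return 100 * (old_us / new_us - 1)

def time_reduction_pct(old_us: float, new_us: float) -> float:
    # share of per-iteration time saved, relative to the old setting
    return 100 * (1 - new_us / old_us)

old, new = 1064, 754  # P100, 4608K FFT: baseline vs MERGED_MIDDLE,WORKINGOUT,WORKINGIN4
print(f"throughput gain: {throughput_gain_pct(old, new):.1f}%")
print(f"time per iteration reduced by: {time_reduction_pct(old, new):.1f}%")
```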

dcheuk 2019-12-09 05:55

I tried this on one of my Radeon VII cards, the one that has not given me any errors in its last 4-5 PRP tests (while the other returned too many, lol). This card sits in the second slot with no display attached to it.

[CODE]2019-12-08 23:47:19 config.txt: -user dcheuk/gpu01 -use ORIG_X2 -device 1 -log 100000 -use MERGED_MIDDLE
2019-12-08 23:47:19 config.txt:
2019-12-08 23:47:19 gfx906-0 94607437 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 18.04 bits/word
2019-12-08 23:47:20 gfx906-0 OpenCL args "-DEXP=94607437u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xf.8262bb7326f28p-3 -DIWEIGHT_STEP=0x8.40cb53a4a1fd8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DORIG_X2=1 -DMERGED_MIDDLE=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-12-08 23:47:21 gfx906-0 OpenCL compilation in 1.31 s
2019-12-08 23:47:22 gfx906-0 94607437 OK 2071500 loaded: blockSize 500, 132c5e1692604fd6
2019-12-08 23:47:23 gfx906-0 94607437 OK 2072500 2.19%; 891 us/it (min 885 885); ETA 0d 22:54; 8d4ac7f8617372d8 (check 0.53s)
2019-12-08 23:47:48 gfx906-0 94607437 OK 2100000 2.22%; 887 us/it (min 884 884); ETA 0d 22:48; f8d6a63b03cfa32a (check 0.53s)
2019-12-08 23:49:17 gfx906-0 94607437 OK 2200000 2.33%; 887 us/it (min 884 884); ETA 0d 22:47; 42044b1ea9fb8b01 (check 0.53s)
2019-12-08 23:50:47 gfx906-0 94607437 OK 2300000 2.43%; 887 us/it (min 884 884); ETA 0d 22:45; fcd02bb8420d5ba7 (check 0.53s)
2019-12-08 23:52:17 gfx906-0 94607437 OK 2400000 2.54%; 887 us/it (min 884 884); ETA 0d 22:43; d784ed68cfa19bd7 (check 0.53s)
2019-12-08 23:53:46 gfx906-0 94607437 OK 2500000 2.64%; 887 us/it (min 884 884); ETA 0d 22:42; 79d614fc892e7a5a (check 0.53s)
[/CODE]
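The progress lines in this log carry the exponent, the iteration count and the per-iteration time, so the ETA can be cross-checked by hand. A minimal Python sketch; the regular expression is written only against the line format shown above and may not match other gpuowl versions, and it assumes a PRP test needs about as many iterations as the exponent:

```python
import re

# Matches progress lines like the ones above, e.g.
# "... gfx906-0 94607437 OK 2200000 2.33%; 887 us/it (min 884 884); ETA 0d 22:47; ..."
PROGRESS = re.compile(r"\s(\d+) OK (\d+) [\d.]+%; (\d+) us/it")

def estimate_hours_left(line: str) -> float:
    m = PROGRESS.search(line)
    if m is None:
        raise ValueError("not a progress line")
    exponent, done, us_per_it = (int(g) for g in m.groups())
    remaining = exponent - done  # assumption: total PRP iterations ~= exponent
    return remaining * us_per_it * 1e-6 / 3600

line = ("2019-12-08 23:49:17 gfx906-0 94607437 OK 2200000 2.33%; 887 us/it "
        "(min 884 884); ETA 0d 22:47; 42044b1ea9fb8b01 (check 0.53s)")
hours = estimate_hours_left(line)
print(f"about {int(hours)}h {round((hours % 1) * 60)}m left")
```

For the 2200000 line this gives roughly 22.8 hours, in line with the logged ETA of 22:47.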

The card is tuned at 1449MHz, 867mV, 1200MHz memory. Fan at about 75%, with temperature hovering at 64-66C, junction 78-81C. Ambient temperature 20C. Wattage 140-143W. Very impressive; I'm amazed at what you guys can do. Good work. :smile:

Prime95 2019-12-09 06:13

[QUOTE=kracker;532423]With MERGED_MIDDLE,WORKINGOUT,WORKINGIN4, it dropped further to 754 us/it... a very impressive 41% speed boost from the beginning![/QUOTE]

Preliminary results from Ken suggested WORKINGOUT4 is better than WORKINGOUT. Of course, that was from a huge sample size of 1 nVidia card.

kriesel 2019-12-09 09:16

[QUOTE=Prime95;532431]Preliminary results from Ken suggested WORKINGOUT4 is better than WORKINGOUT. Of course, that was from a huge sample size of 1 nVidia card.[/QUOTE]
p=89796247, 5M FFT, GTX 1080, Win7 Pro; timings typically cover 9200 iterations,
obtained with -time -iters 10000

[CODE]ms/it -use options
5124 no_asm
5120 no_asm
4868 no_asm,merged_middle,workingin
4873 no_asm,merged_middle,workingin
4873 no_asm,merged_middle,workingin1
4951 no_asm,merged_middle,workingin1a
4876 no_asm,merged_middle,workingin2
4874 no_asm,merged_middle,workingin3
[B]4865[/B] no_asm,merged_middle,workingin5
4878 no_asm,merged_middle,workingout
4911 no_asm,merged_middle,workingout0
4872 no_asm,merged_middle,workingout1
4950 no_asm,merged_middle,workingout1a
4881 no_asm,merged_middle,workingout2
4875 no_asm,merged_middle,workingout3
[B]4836[/B] no_asm,merged_middle,workingout4
4876 no_asm,merged_middle,workingout5[/CODE]Repeatability ~+/-0.05%
5122 (mean of the two no_asm runs) / 4836 = 1.059
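To pick the fastest case out of a sweep like this one, it's enough to take the minimum us/it and divide the averaged no_asm baseline by it. A Python sketch with the table transcribed by hand (the list literal is just a copy of the posted numbers, not parsed gpuowl output):

```python
from statistics import mean

# (us/it, -use options) transcribed from the GTX 1080 table above
results = [
    (5124, "no_asm"),
    (5120, "no_asm"),
    (4868, "no_asm,merged_middle,workingin"),
    (4873, "no_asm,merged_middle,workingin"),
    (4873, "no_asm,merged_middle,workingin1"),
    (4951, "no_asm,merged_middle,workingin1a"),
    (4876, "no_asm,merged_middle,workingin2"),
    (4874, "no_asm,merged_middle,workingin3"),
    (4865, "no_asm,merged_middle,workingin5"),
    (4878, "no_asm,merged_middle,workingout"),
    (4911, "no_asm,merged_middle,workingout0"),
    (4872, "no_asm,merged_middle,workingout1"),
    (4950, "no_asm,merged_middle,workingout1a"),
    (4881, "no_asm,merged_middle,workingout2"),
    (4875, "no_asm,merged_middle,workingout3"),
    (4836, "no_asm,merged_middle,workingout4"),
    (4876, "no_asm,merged_middle,workingout5"),
]

# average the plain no_asm runs as the baseline
baseline = mean(t for t, opts in results if opts == "no_asm")
# tuples compare by their first element, so min() picks the lowest us/it
best_time, best_opts = min(results)
print(f"baseline {baseline:.0f} us/it, best {best_time} us/it with {best_opts}")
print(f"speedup {baseline / best_time:.3f}")
```

This reproduces the 5122/4836 = 1.059 figure, with workingout4 as the fastest entry in this batch.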

The timings in the table above were obtained with this batch file, derived from a list of cases George requested:
[CODE]:iter count is required to be multiple of 10000
set iters=10000
:first one was there just to ensure the gpu is warmed up and clock-stable somewhat, ignore its timing, use the second, but maybe the first 800 iters block does that
gpuowl-win -time -iters %iters% -use NO_ASM
gpuowl-win -time -iters %iters% -use NO_ASM
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN
:the line above is a repeat, to check reproducibility once; then onward through the list
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN1
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN1A
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN2
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN3
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN5
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT0
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT1
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT1A
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT2
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT3
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT4
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT5[/CODE]
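The same sweep could also be driven from Python, which makes it easier to extend or to collect results later. A sketch that builds the same 17 command lines as the batch file above; it uses only the -time, -iters and -use flags shown there, assumes gpuowl-win is on the PATH, and does not try to parse gpuowl's output:

```python
import shutil
import subprocess

ITERS = 10000  # the batch file notes the iteration count must be a multiple of 10000

# -use option sets in the same order as the batch file (the first NO_ASM run is a warm-up)
CASES = ["NO_ASM", "NO_ASM"] + [
    f"NO_ASM,MERGED_MIDDLE,{opt}"
    for opt in ["WORKINGIN", "WORKINGIN", "WORKINGIN1", "WORKINGIN1A", "WORKINGIN2",
                "WORKINGIN3", "WORKINGIN5", "WORKINGOUT", "WORKINGOUT0", "WORKINGOUT1",
                "WORKINGOUT1A", "WORKINGOUT2", "WORKINGOUT3", "WORKINGOUT4", "WORKINGOUT5"]
]

def commands(exe: str = "gpuowl-win") -> list[list[str]]:
    # one argument list per benchmark case, mirroring the batch file lines
    return [[exe, "-time", "-iters", str(ITERS), "-use", case] for case in CASES]

if __name__ == "__main__":
    if shutil.which("gpuowl-win") is None:
        print("gpuowl-win not found on PATH; printing the commands instead")
        for cmd in commands():
            print(" ".join(cmd))
    else:
        for cmd in commands():
            subprocess.run(cmd, check=True)  # stop the sweep if any run fails
```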

kriesel 2019-12-09 10:17

gtx1080 again

usec/iter; -use case
5055 no_asm
5104 no_asm
4848 NO_ASM,MERGED_MIDDLE,WORKINGIN
4863 NO_ASM,MERGED_MIDDLE,WORKINGIN
4851 NO_ASM,MERGED_MIDDLE,WORKINGIN4
4859 NO_ASM,MERGED_MIDDLE,WORKINGIN5
4873 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT5

retest with minimal user interaction:
5058 no_asm
5091 no_asm
4837 NO_ASM,MERGED_MIDDLE,WORKINGIN
4836 NO_ASM,MERGED_MIDDLE,WORKINGIN
4836 NO_ASM,MERGED_MIDDLE,WORKINGIN4
4833 NO_ASM,MERGED_MIDDLE,WORKINGIN5
4835 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT5

5091/4833 =~ 1.053
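Because some cases were run twice, the run-to-run spread can be set beside the differences between options. A small Python sketch on the retest numbers above (the helper name is illustrative): the two no_asm runs differ by about 0.65%, the repeated WORKINGIN runs by about 0.02%, and all the MERGED_MIDDLE variants fall within about 0.08% of each other.

```python
def spread_pct(a: float, b: float) -> float:
    # difference between two timings, relative to their mean
    return 100 * abs(a - b) / ((a + b) / 2)

# repeated runs of the same -use case from the retest above (us/it)
pairs = {
    "no_asm": (5058, 5091),
    "NO_ASM,MERGED_MIDDLE,WORKINGIN": (4837, 4836),
}
for case, (a, b) in pairs.items():
    print(f"{case}: {spread_pct(a, b):.2f}% between runs")

# all MERGED_MIDDLE variants in the retest
variants = [4837, 4836, 4836, 4833, 4835]
print(f"MERGED_MIDDLE variants span {spread_pct(min(variants), max(variants)):.2f}%")
```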


All times are UTC.