mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

SELROC 2018-08-25 07:50

[QUOTE=preda;494660]Yes, I saw the same thing. IMO it looks like a problem in the GPU driver area. It may be that the transition from some CPU sleep state to active is slower when the CPU is not busy.[/QUOTE]


It may be a transition to or from a low-power CPU state (the C-states C3/C7/C8).


I turn them off in the BIOS when GPUs are present (BIOS option "C3 States": Disabled).
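
For anyone on Linux who wants to see which C-states the kernel exposes before changing anything in the BIOS, here is a minimal sketch. It assumes a Linux host with the cpuidle driver loaded, so that the standard sysfs directory `/sys/devices/system/cpu/cpu0/cpuidle` exists; the function name `list_cstates` is just an illustrative helper, and on Windows or without cpuidle it simply returns an empty list.

```python
from pathlib import Path


def list_cstates(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return (name, exit_latency_us, disabled) for each idle state of one CPU.

    Reads the Linux cpuidle sysfs files 'name', 'latency' and 'disable'.
    Returns an empty list if the directory does not exist.
    """
    root = Path(cpuidle_dir)
    if not root.is_dir():
        return []
    states = []
    # Directories are named state0, state1, ...; sort them numerically.
    for state in sorted(root.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (state / "name").read_text().strip()
        latency_us = int((state / "latency").read_text())
        disable_file = state / "disable"
        disabled = disable_file.exists() and disable_file.read_text().strip() == "1"
        states.append((name, latency_us, disabled))
    return states


if __name__ == "__main__":
    for name, latency_us, disabled in list_cstates():
        flag = "disabled" if disabled else "enabled"
        print(f"{name:10s} exit latency {latency_us:6d} us  ({flag})")
```

The exit latency column is the part relevant here: deeper states take longer to wake from, which would fit the theory of a slow sleep-to-active transition when the CPU is otherwise idle.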

xx005fs 2018-08-25 20:21

[QUOTE=preda;494660]Yes, I saw the same thing. IMO it looks like a problem in the GPU driver area. It may be that the transition from some CPU sleep state to active is slower when the CPU is not busy.[/QUOTE]

Ah, I see. I thought this problem was exclusive to Vega because of the HBCC. If it can actually leverage the system RAM and just do PRP at a higher bandwidth, that would be amazing too.

xx005fs 2018-08-26 01:02

[QUOTE=SELROC;494663]It may be a transition to or from a low-power CPU state (the C-states C3/C7/C8).


I turn them off in the BIOS when GPUs are present (BIOS option "C3 States": Disabled).[/QUOTE]

Hmm... That's weird, because my CPU is manually overclocked and it never drops below the overclock I set. So doesn't that mean it won't ever enter the C3 state?

SELROC 2018-08-26 01:22

[QUOTE=xx005fs;494706]Hmm... That's weird because my CPU is manually overclocked and it never drops down under the overclock I give it. So doesn't that mean it won't ever enter the C3 state?[/QUOTE]


Exactly. I checked in the BIOS: the option is "C-States" and it has a few suboptions such as C3 and C8.

kriesel 2018-08-27 16:03

gpuOwL V3.8-91c52fa build for Win64
 
1 Attachment(s)
Done in msys2/mingw64.
[QUOTE=kriesel;494261]That seems to have done it; finished the 48.5M (again) and progressed to the next work this time.

[CODE]g++ -DREV=\"91c52fa\" -O2 -c gpuowl.cpp -o gpuowl.o
g++ -O2 -c OpenGpu.cpp -o OpenGpu.o
g++ -O2 -c common.cpp -o common.o
g++ -O2 -c clwrap.cpp -o clwrap.o
g++ -O2 -c OpenTF.cpp -o OpenTF.o
g++ -o openowl-V38-91c52fa-W64.exe OpenGpu.o common.o gpuowl.o clwrap.o OpenTF.o -lOpenCL -static[/CODE] Win x64 executable attached.
[/QUOTE]
Better late than never? Finally posting the attachment mentioned in [url]http://www.mersenneforum.org/showthread.php?t=22204&page=57[/url] post 626

kriesel 2018-08-27 16:27

gpuOwL V3.9-da61ebd build for Win64
 
1 Attachment(s)
Built in msys2/mingw64 with:
[CODE]# build openowl in msys2/mingw64 on Win7 x64
g++ -DREV=\"da61ebd\" -O2 -c gpuowl.cpp -o gpuowl.o
g++ -O2 -c OpenGpu.cpp -o OpenGpu.o
g++ -O2 -c common.cpp -o common.o
g++ -O2 -c clwrap.cpp -o clwrap.o
g++ -O2 -c OpenTF.cpp -o OpenTF.o
g++ -o openowl-V39-da61ebd-W64.exe OpenGpu.o common.o gpuowl.o clwrap.o OpenTF.o -lOpenCL -static
strip openowl-V39-da61ebd-W64.exe
[/CODE]

xx005fs 2018-08-28 01:01

Thanks for the build. However, I noticed a rather significant slowdown on my Vega GPU on the same exponent. Version 3.8 ran at about 2.02 ms/it, while the new 3.9 runs at about 2.18 ms/it. For the exponent I am currently testing, 86478767, that means roughly 4 extra hours to complete. Both versions use exactly the same FFT size, by the way.
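
The "extra 4 hours" figure can be checked with a few lines of arithmetic. This is a rough sketch that assumes a PRP test takes about one iteration per bit of the exponent and ignores checking and save overhead; the variable names are illustrative only.

```python
# Figures from the post above.
exponent = 86_478_767   # exponent under test; assumed ~= number of PRP iterations
ms_per_it_v38 = 2.02    # gpuOwL 3.8
ms_per_it_v39 = 2.18    # gpuOwL 3.9

MS_PER_HOUR = 3_600_000


def total_hours(iterations, ms_per_it):
    """Wall-clock hours for a run at a constant per-iteration time."""
    return iterations * ms_per_it / MS_PER_HOUR


hours_v38 = total_hours(exponent, ms_per_it_v38)
hours_v39 = total_hours(exponent, ms_per_it_v39)
extra_hours = hours_v39 - hours_v38

print(f"3.8: {hours_v38:.1f} h, 3.9: {hours_v39:.1f} h, extra: {extra_hours:.2f} h")
```

Under that assumption the difference works out to a bit under 4 hours on a run of roughly two days, which matches the estimate above.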

preda 2018-08-28 05:30

[QUOTE=xx005fs;494773]Thanks for the build. However, I noticed a rather significant slowdown on my Vega GPU on the same exponent. Version 3.8 ran at about 2.02 ms/it, while the new 3.9 runs at about 2.18 ms/it. For the exponent I am currently testing, 86478767, that means roughly 4 extra hours to complete. Both versions use exactly the same FFT size, by the way.[/QUOTE]

I'll look into this slowdown over the next few days (it's caused by my changes).

SELROC 2018-08-28 07:33

I'm seeing something weird: a 300M exponent with 1 error recorded runs slower than a 332M exponent with 0 errors recorded. GpuOwl version 3.5.

preda 2018-08-28 09:13

[QUOTE=SELROC;494783]I'm seeing something weird: a 300M exponent with 1 error recorded runs slower than a 332M exponent with 0 errors recorded. GpuOwl version 3.5.[/QUOTE]

I don't know. Same FFT size? Same long/short carry? Does it persist on restart?
If you feel like investigating, what happens if you switch the tasks and savefiles across GPUs (i.e. is the slowdown a property of the exponent or of the GPU)?

SELROC 2018-08-28 09:26

[QUOTE=preda;494786]I don't know. Same FFT size? Same long/short carry? Does it persist on restart?
If you feel like investigating, what happens if you switch the tasks and savefiles across GPUs (i.e. is the slowdown a property of the exponent or of the GPU)?[/QUOTE]

Yes. I did this: as a double check I switched the tasks between GPUs, and the issue persists. The timings are:

300M exponent with 1 error recorded and 200 block size: 16.80 ms/it

332M exponent with 0 errors and 400 block size: 16.50 ms/it

The difference is small, but persistent.

FFT size 18M
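
To put "small" into numbers, here is a quick sketch using the two timings above. The projected run lengths assume the nominal sizes 300M and 332M as iteration counts, since only rounded exponents are given here; the variable names are illustrative.

```python
# Per-iteration timings reported above, both at FFT size 18M.
ms_300m = 16.80   # 300M exponent, 1 error recorded, block size 200
ms_332m = 16.50   # 332M exponent, 0 errors, block size 400

# Relative slowdown of the 300M run versus the 332M run.
rel_diff = (ms_300m - ms_332m) / ms_332m

MS_PER_DAY = 86_400_000

# Assumption: iteration count ~= nominal exponent size.
days_300m = 300e6 * ms_300m / MS_PER_DAY
days_332m = 332e6 * ms_332m / MS_PER_DAY

print(f"relative difference: {rel_diff:.1%}")
print(f"projected run time: 300M ~ {days_300m:.1f} days, 332M ~ {days_332m:.1f} days")
```

That is a per-iteration difference of just under 2%. With the same FFT size one would expect the two to match, so the block size and the recorded error are the visible differences between the runs.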

