[QUOTE=preda;494660]Yes, I saw the same thing. IMO it looks like a problem in the GPU driver area. It may be that the transition from some CPU sleep state to active is slower when the CPU is not busy.[/QUOTE]
It may be the transition to/from low-power CPU states, the so-called C3/C7/C8 states. I turn them off in the BIOS when GPUs are present (BIOS option "C3 States": Disabled) |
[QUOTE=preda;494660]Yes, I saw the same thing. IMO it looks like a problem in the GPU driver area. It may be that the transition from some CPU sleep state to active is slower when the CPU is not busy.[/QUOTE]
Ah, I see. I thought this problem was exclusive to Vega because of the HBCC controller. If it could actually leverage system RAM and just do PRP at higher bandwidth, that would be amazing too. |
[QUOTE=SELROC;494663]It may be a transition from/to low power cpu state, called C3/C7/C8 states.
I turn off in the bios when gpus are present (bios option "C3 States": Disabled)[/QUOTE]
Hmm... That's weird, because my CPU is manually overclocked and it never drops below the clock speed I set. So doesn't that mean it will never enter the C3 state? |
[QUOTE=xx005fs;494706]Hmm... That's weird because my CPU is manually overclocked and it never drops down under the overclock I give it. So doesn't that mean it won't ever enter the C3 state?[/QUOTE]
Exactly. I checked in the BIOS: the option is "C-States", and it has a few sub-options such as C3 and C8. |
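On Linux, whether the cores actually enter the deeper idle states can be read back from the kernel rather than inferred from the clock speed. The sketch below assumes a kernel that exposes the standard cpuidle sysfs tree under /sys/devices/system/cpu/cpuN/cpuidle/; list_cstates is a hypothetical helper name, and the state names it prints depend on the CPU and the idle driver.

```shell
#!/bin/sh
# Hypothetical helper: print each cpuidle state of one CPU (default CPU 0)
# with its entry count ("usage") and total residency ("time", microseconds),
# read from the Linux cpuidle sysfs interface. Prints nothing if it is absent.
list_cstates() {
    cpu="${1:-0}"
    for d in /sys/devices/system/cpu/cpu"$cpu"/cpuidle/state*; do
        [ -d "$d" ] || continue   # glob did not match: no cpuidle interface here
        printf '%s\t%s\tusage=%s\ttime_us=%s\n' \
            "$(basename "$d")" "$(cat "$d/name")" \
            "$(cat "$d/usage")" "$(cat "$d/time")"
    done
}

list_cstates 0
```

If the usage counter of a deep state (C3 and beyond) keeps rising while the GPU job runs, the cores are entering that state. On kernels that provide it, each stateN directory also has a disable attribute; writing 1 to it as root stops the kernel from selecting that state on that CPU, which is a runtime alternative to the BIOS switch (check your kernel's cpuidle documentation before relying on it).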
gpuOwL V3.8-91c52fa build for Win64
Done in msys2/mingw64.
[QUOTE=kriesel;494261]That seems to have done it; finished the 48.5M (again) and progressed to the next work this time.
[CODE]g++ -DREV=\"91c52fa\" -O2 -c gpuowl.cpp -o gpuowl.o
g++ -O2 -c OpenGpu.cpp -o OpenGpu.o
g++ -O2 -c common.cpp -o common.o
g++ -O2 -c clwrap.cpp -o clwrap.o
g++ -O2 -c OpenTF.cpp -o OpenTF.o
g++ -o openowl-V38-91c52fa-W64.exe OpenGpu.o common.o gpuowl.o clwrap.o OpenTF.o -lOpenCL -static[/CODE]
Win x64 executable attached.[/QUOTE]
Better late than never? Finally posting the attachment mentioned in [url]http://www.mersenneforum.org/showthread.php?t=22204&page=57[/url], post 626. |
gpuOwL V3.9-da61ebd build for Win64
Built in msys2/mingw64 with:
[CODE]# build openowl in msys2/mingw64 on Win7 x64
g++ -DREV=\"da61ebd\" -O2 -c gpuowl.cpp -o gpuowl.o
g++ -O2 -c OpenGpu.cpp -o OpenGpu.o
g++ -O2 -c common.cpp -o common.o
g++ -O2 -c clwrap.cpp -o clwrap.o
g++ -O2 -c OpenTF.cpp -o OpenTF.o
g++ -o openowl-V39-da61ebd-W64.exe OpenGpu.o common.o gpuowl.o clwrap.o OpenTF.o -lOpenCL -static
strip openowl-V39-da61ebd-W64.exe
[/CODE] |
Thanks for the build. However, I noticed a rather significant slowdown on my Vega GPU on the same exponent: version 3.8 did about 2.02 ms/it, while the new 3.9 gets around 2.18 ms/it. For the exponent I am currently testing, 86478767, that means an extra 4 hours to completion compared to before. Both versions are using exactly the same FFT size, by the way.
|
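The "extra 4 hours" follows from the per-iteration difference: a PRP test of exponent p takes roughly p iterations, so the added time is about p x (2.18 - 2.02) ms. A minimal awk-based sketch of that arithmetic (extra_hours is a hypothetical helper name):

```shell
#!/bin/sh
# extra_hours P OLD_MS NEW_MS
# Extra wall-clock hours for a PRP test of exponent P (assumed ~P iterations)
# when the per-iteration time rises from OLD_MS to NEW_MS milliseconds.
extra_hours() {
    awk -v p="$1" -v old="$2" -v new="$3" \
        'BEGIN { printf "%.2f\n", p * (new - old) / 1000 / 3600 }'
}

extra_hours 86478767 2.02 2.18   # 3.8 vs 3.9 timings on the Vega
```

This prints 3.84, consistent with the roughly 4 hours quoted.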
[QUOTE=xx005fs;494773]Thanks for the build. However, I noticed a rather significant slowdown on my vega gpu while doing the same exponent. The 3.8 version would do about 2.02 ms/it while the new 3.9 version would get something like 2.18 ms/it. This means that the current exponent I am testing which is 86478767 would take an extra 4 hours to complete compared to before. Both versions are using the exact same FFT size by the way.[/QUOTE]
I'll look into this slowdown over the next few days (it's caused by my changes). |
I'm looking at something weird: a 300M exponent with 1 error recorded is running slower than a 332M exponent with 0 errors recorded. GpuOwl version 3.5.
|
[QUOTE=SELROC;494783]I'm looking at something weird. A 300M exponent with 1 error recorded is being slower than a 332M exponent with 0 errors recorded. GpuOwl version 3.5[/QUOTE]
I don't know. Same FFT size? Same long/short carry? Does it persist on restart? If you feel like investigating, what happens if you swap the tasks & savefiles across GPUs? (i.e., is the slowdown a property of the exponent or of the GPU?) |
[QUOTE=preda;494786]I don't know. Same FFT size? same long/short carry? Persists on restart?
If you feel like investigating, what happens if you switch the tasks&savefiles across GPUs? (i.e. is the slowdown a property of the exponent or of the GPU)[/QUOTE]
Yes, I did: as a double check I swapped them between GPUs, and the issue persists. The numbers:
300M exponent, 1 error recorded, block size 200: 16.80 ms/it
332M exponent, 0 errors recorded, block size 400: 16.50 ms/it
The difference is small, but persistent. FFT size is 18M. |
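To put the gap in proportion: at the same 18M FFT the two per-iteration times differ by under 2%, yet over a whole test that still adds up. A sketch of the arithmetic, assuming roughly p iterations per PRP test and using 300000000 and 332000000 as stand-ins for the rounded "300M" and "332M" exponents (the exact exponents are not given); pct_slower and total_days are hypothetical helper names.

```shell
#!/bin/sh
# pct_slower A_MS B_MS: how much slower timing A is than timing B, in percent
pct_slower() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}

# total_days P MS_PER_IT: projected wall-clock days for ~P iterations
total_days() {
    awk -v p="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", p * ms / 1000 / 86400 }'
}

pct_slower 16.80 16.50         # 300M run vs 332M run
total_days 300000000 16.80     # stand-in for the "300M" exponent
total_days 332000000 16.50     # stand-in for the "332M" exponent
```

This prints 1.8, 58.3 and 63.4: the 300M test is still the shorter job overall, so what stands out is only the per-iteration rate at the same FFT size.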