mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GpuOwl (https://www.mersenneforum.org/forumdisplay.php?f=171)
-   -   gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

kriesel 2018-08-21 21:53

Latest pass
 
[QUOTE=preda;494351]I started my CUDA GPU and tried it. I don't repro, but I attempted a fix; could you retry with a fresh checkout?[/QUOTE]Sorry about the delayed response; my power was out due to utility-line damage from a local record rainstorm, and the UPSes drained. It took a while to get everything running again afterward.
[CODE]C:\users\ken\documents\gpuowl-cuda-build>nvcc -O2 -DREV=\"1fa5f09\" -o cudaowl-V38-1fa5f09-W64.exe common.cpp gpuowl.cpp CudaGpu.cu NoTF.cpp -lcufft
common.cpp
gpuowl.cpp
CudaGpu.cu
NoTF.cpp
Creating library cudaowl-V38-1fa5f09-W64.lib and object cudaowl-V38-1fa5f09-W64.exp[/CODE]Closer, but not there yet. It compiled and linked OK, and launches, but then errors out, with either -carry short or -carry long:[CODE]C:\users\ken\documents\gpuowl-cuda-build>cudaowl-v38-1fa5f09-w64.exe -user kriesel -cpu condor-gtx1060 -time -carry short -device 0
2018-08-21 15:57:57 condor-gtx1060 gpuowl-CUDA 3.8-1fa5f09
2018-08-21 15:57:57 condor-gtx1060 Exponent 431000021 using FFT 23328K (2^15 * 3^6 * 5^0 * 7^0)
Assertion failed: (cudaMalloc((void **)&bufA, N * sizeof(double))) == cudaSuccess, file c:\users\ken\documents\gpuowl-cuda-build\CudaGpu.h, line 310

C:\users\ken\documents\gpuowl-cuda-build>cudaowl-v38-1fa5f09-w64.exe -user kriesel -cpu condor-gtx1060 -time -carry long -device 0
2018-08-21 15:58:20 condor-gtx1060 gpuowl-CUDA 3.8-1fa5f09
2018-08-21 15:58:20 condor-gtx1060 Exponent 431000021 using FFT 23328K (2^15 * 3^6 * 5^0 * 7^0)
Assertion failed: (cudaMalloc((void **)&bufA, N * sizeof(double))) == cudaSuccess, file c:\users\ken\documents\gpuowl-cuda-build\CudaGpu.h, line 310[/CODE]

preda 2018-08-21 23:31

[QUOTE=kriesel;494402]Sorry about the delayed response; my power was out due to utility-line damage from a local record rainstorm, and the UPSes drained. It took a while to get everything running again afterward.[/QUOTE]
Oops, sorry to hear that.

[QUOTE]
Closer, but not there yet. It compiled and linked OK, and launches, but then errors out, with either -carry short or -carry long:[CODE]C:\users\ken\documents\gpuowl-cuda-build>cudaowl-v38-1fa5f09-w64.exe -user kriesel -cpu condor-gtx1060 -time -carry short -device 0
2018-08-21 15:57:57 condor-gtx1060 gpuowl-CUDA 3.8-1fa5f09
2018-08-21 15:57:57 condor-gtx1060 Exponent 431000021 using FFT 23328K (2^15 * 3^6 * 5^0 * 7^0)
Assertion failed: (cudaMalloc((void **)&bufA, N * sizeof(double))) == cudaSuccess, file c:\users\ken\documents\gpuowl-cuda-build\CudaGpu.h, line 310

C:\users\ken\documents\gpuowl-cuda-build>cudaowl-v38-1fa5f09-w64.exe -user kriesel -cpu condor-gtx1060 -time -carry long -device 0
2018-08-21 15:58:20 condor-gtx1060 gpuowl-CUDA 3.8-1fa5f09
2018-08-21 15:58:20 condor-gtx1060 Exponent 431000021 using FFT 23328K (2^15 * 3^6 * 5^0 * 7^0)
Assertion failed: (cudaMalloc((void **)&bufA, N * sizeof(double))) == cudaSuccess, file c:\users\ken\documents\gpuowl-cuda-build\CudaGpu.h, line 310[/CODE][/QUOTE]

Could it be an out-of-memory condition? Each such allocation is about 200 MB of GPU RAM, and it seems one of them failed. What happens with a lower exponent?

kriesel 2018-08-22 00:21

[QUOTE=preda;494408]
Could it be an out-of-memory condition? Each such allocation is about 200 MB of GPU RAM, and it seems one of them failed. What happens with a lower exponent?[/QUOTE]
It's on a 3GB GTX1060, under Windows 7 X64.
[CODE]C:\users\ken\documents\gpuowl-cuda-build>cudaowl-v38-1fa5f09-w64.exe -user kriesel -cpu condor-gtx1060 -time -carry long -device 0
2018-08-21 18:48:37 condor-gtx1060 gpuowl-CUDA 3.8-1fa5f09
2018-08-21 18:48:37 condor-gtx1060 Exponent 77231809 using FFT 4200K (2^13 * 3^1 * 5^2 * 7^1)
Assertion failed: (cudaMalloc((void **)&bufA, N * sizeof(double))) == cudaSuccess, file c:\users\ken\documents\gpuowl-cuda-build\CudaGpu.h, line 310[/CODE][CODE]C:\users\ken\documents\gpuowl-cuda-build>cudaowl-v38-1fa5f09-w64.exe -user kriesel -cpu condor-gtx1060 -time -carry long -device 0
2018-08-21 19:18:39 condor-gtx1060 gpuowl-CUDA 3.8-1fa5f09
2018-08-21 19:18:39 condor-gtx1060 Exponent 1257787 using FFT 72K (2^13 * 3^2 * 5^0 * 7^0)
Assertion failed: (cudaMalloc((void **)&bufA, N * sizeof(double))) == cudaSuccess, file c:\users\ken\documents\gpuowl-cuda-build\CudaGpu.h, line 310
[/CODE]It halts immediately; GPU-Z doesn't have time to register any nonzero GPU RAM usage.

preda 2018-08-22 00:39

[QUOTE=kriesel;494411]It's on a 3GB GTX1060, under Windows 7 X64.
[/QUOTE]

Yes, that's not a legitimate OOM. That is the very first cudaMalloc(), and it should succeed. Sorry, but I don't know why it's failing; I looked and didn't find an obvious error on my side. It also seemed to work for me the last time I tried.

It might help a bit to log the error code returned by cudaMalloc(), to try to understand what it doesn't like.

preda 2018-08-22 00:47

[QUOTE=Fredrik;494389]For what it's worth, I for one like the cudaowl project. Before cudaowl, I actually managed to use the OpenCL version on a GTX-960, this was version v1.10-cd3c8ed (we discussed this on GitHub too, issue #3).

Back then I had to use the -longTail option, so it sounds like the tailFused kernel was giving trouble even then.

That OpenCL version was a bit faster than the current cudaowl, I got 11.64 ms/it on OpenCL, and 14.34 ms/it on the current cudaowl, for the exponent 75000001, 4096K fft. Your 30 ms/iteration on that much more powerful GTX-1080 is surprisingly bad. Maybe you could test the 1.10 version on it?

Would it be possible to bring back something like the -longTail option?[/QUOTE]

Thanks for the nice words.

No, I wasn't really getting 30 ms/it. I was getting 30 ms for the first run of some kernel, probably because it was compiling PTX to ISA during that time. Presumably the subsequent runs are much faster.

But maybe that compilation was tripping the command queue, and that's where it ends. It looks like Nvidia is sending the message: "OpenCL not working? Tough luck, you should use CUDA anyway."

preda 2018-08-22 00:55

CUDA developer/maintainer for cudaOwl wanted
 
If somebody with CUDA experience is interested in taking over porting/maintaining GpuOwl to CUDA, I'm game.

I think this could be set up as a separate, independent project (on GitHub, for example) that would start from what is now in cudaOwl and move on from there to improve performance, fix issues, add features, etc.

preda 2018-08-22 08:05

I sold my Nvidia GPU (a GTX 1080). I created a branch named "cuda" [url]https://github.com/preda/gpuowl/tree/cuda[/url] where the legacy CUDA code lives for now. I can't compile or test the CUDA implementation myself now.

Sorry to those who were looking for PRP on Nvidia; I don't have the bandwidth or motivation to take that on myself.

kriesel 2018-08-22 19:04

[QUOTE=Fredrik;494389]For what it's worth, I for one like the cudaowl project. Before cudaowl, I actually managed to use the OpenCL version on a GTX-960, this was version v1.10-cd3c8ed (we discussed this on GitHub too, issue #3).

Back then I had to use the -longTail option, so it sounds like the tailFused kernel was giving trouble even then.

That OpenCL version was a bit faster than the current cudaowl, I got 11.64 ms/it on OpenCL, and 14.34 ms/it on the current cudaowl, for the exponent 75000001, 4096K fft. Your 30 ms/iteration on that much more powerful GTX-1080 is surprisingly bad. Maybe you could test the 1.10 version on it?

Would it be possible to bring back something like the -longTail option?[/QUOTE]
You built v1.10 on Linux? I got a build error on Win7 x64 (msys2/mingw64) for the last available commit of v1.10 (auto, -fconcepts, etc.).

Fredrik 2018-08-22 20:08

[QUOTE=kriesel;494472]You built v1.10 on Linux? I got a build error on Win7 x64 (msys2/mingw64) for the last available commit of v1.10 (auto, -fconcepts, etc.).[/QUOTE]
Yes, on Linux. I remember I also had trouble with -fconcepts; Preda added it to the makefile. GitHub issue [URL="https://github.com/preda/gpuowl/issues/8"]here[/URL].


[QUOTE=preda;494430]
Sorry to those who were looking for PRP on Nvidia; I don't have the bandwidth or motivation to take that on myself.[/QUOTE]
No need to apologize; thank you for writing these programs!

tServo 2018-08-23 00:14

Appx run time
 
Sorry for the noob question.
I tried searching for it but ran into the "TMI" problem.

Approximately how long (wall time) does it take to do a gpuOwl PRP test on an exponent near the current LL wavefront, using a decent board such as a Vega?
TIA

preda 2018-08-23 10:06

[QUOTE=tServo;494494]Sorry for the noob question.
I tried searching for it but ran into the "TMI" problem.

Approximately how long (wall time) does it take to do a gpuOwl PRP test on an exponent near the current LL wavefront, using a decent board such as a Vega?
TIA[/QUOTE]

About 2 days. Depending on cooling/frequency, I get between 2.05 and 2.15 ms/it.


All times are UTC.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.