I'm no CUDA expert, but as far as I remember I haven't downloaded anything other than the standard drivers from [url]www.nvidia.com[/url], and just whatever CUDA drivers are bundled with that. I assume you've downloaded & installed the full driver package (e.g. "260.99_desktop_win7_winvista_64bit_english_whql.exe") ?
|
[QUOTE=Prime95;250529]Beats me. I didn't download any CUDA-specific stuff. I just figured a binary would work without any further downloads.[/QUOTE]
Apply the latest Nvidia drivers? :ermm: |
[QUOTE=axn;250534]Apply the latest Nvidia drivers? :ermm:[/QUOTE]
This is possibly the problem, as the binaries are compiled with CUDA 3.2, which requires an Nvidia driver that is compatible with CUDA 3.2. The developer drivers for CUDA 3.2 can be downloaded from [url]http://developer.download.nvidia.com/compute/cuda/3_2_prod/drivers/devdriver_3.2_winvista-win7_64_263.06_general.exe[/url] Note that I am using the latest production driver, but the developer driver should work. |
[QUOTE=James Heinrich;250533]I'm no CUDA expert, but as far as I remember I haven't downloaded anything other than the standard drivers from [url]www.nvidia.com[/url], and just whatever CUDA drivers are bundled with that. I assume you've downloaded & installed the full driver package (e.g. "260.99_desktop_win7_winvista_64bit_english_whql.exe") ?[/QUOTE]
I had only installed the drivers that were on the CD accompanying the card. I installed the latest drivers and it now passes the self-test. Thanks. Tomorrow, it's an Ubuntu install and an attempt to get msft's LL tester to compile. |
Did anyone measure the speed loss for 26x.xx drivers?
Since this app uses cudart, it should not be significant (maybe up to 5%). The last proper drivers were 258.96 for Windows and 256.53 for Linux. By experimenting, I found out that it's the drivers that cause the speed loss, not the toolkits. The same code was recompiled using 258.96+3.1tk, 263.06+3.2tk, and 263.06+3.1tk. The speed loss was the same in the last two cases. |
[QUOTE=Prime95;250559]I had only installed the drivers that were on the CD accompanying the card. I installed the latest drivers and it now passes the self-test. Thanks.
Tomorrow, it's an Ubuntu install and an attempt to get msft's LL tester to compile.[/QUOTE]
In the case where mfaktc failed to run with "cudaStreamCreate() failed", does the CUDA version info indicate any problems? e.g.
[CODE]CUDA version info
binary compiled for CUDA 3.20
CUDA driver version 3.20
CUDA runtime version 3.20
[/CODE]
Looks like a driver version mismatch is a common issue. :sad: I'll try to figure out whether it is at least possible to generate a more detailed error message explaining why stream creation failed. The failure isn't related to stream creation itself; stream creation is just the first thing in my code that touches the GPU.
[QUOTE=Karl M Johnson;250571]Did anyone measure the speed loss for 26x.xx drivers? Since this app uses cudart, it should not be significant (maybe up to 5%). The last proper drivers were 258.96 for Windows and 256.53 for Linux. By experimenting, I found out that it's the drivers that cause the speed loss, not the toolkits. The same code was recompiled using 258.96+3.1tk, 263.06+3.2tk, and 263.06+3.1tk. The speed loss was the same in the last two cases.[/QUOTE]
Karl, do you have more details on this one? I've noticed a small performance impact (~1%) when moving from the CUDA 3.1 toolkit to the CUDA 3.2 toolkit while keeping the same driver version on Linux.
Oliver |
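For anyone wanting to reason about those three version lines: as far as I know, the CUDA runtime reports versions as a single integer, 1000*major + 10*minor (so 3.2 is 3020), and a binary generally needs a driver at least as new as the runtime it was built against. Below is a minimal sketch in plain C of that comparison. The helper names are hypothetical and this is not mfaktc's actual code; in a real program the two integers would come from `cudaDriverGetVersion()` and `cudaRuntimeGetVersion()`.

```c
#include <stdio.h>
#include <stddef.h>

/* Assumption: versions use the CUDA encoding 1000*major + 10*minor,
 * e.g. 3020 for CUDA 3.2 and 3010 for CUDA 3.1. */

/* Hypothetical helper: render an encoded version as "3.20", the style
 * used in the version info block quoted above. */
void format_cuda_version(int encoded, char *buf, size_t len)
{
    snprintf(buf, len, "%d.%02d", encoded / 1000, encoded % 100);
}

/* Hypothetical helper: returns 1 if the driver is new enough for the
 * runtime the binary uses, 0 if the driver is older (a likely mismatch). */
int driver_supports_runtime(int driver_version, int runtime_version)
{
    return driver_version >= runtime_version;
}
```

With a driver reporting 3010 and a runtime of 3020, `driver_supports_runtime` returns 0, which would be a hint to update the driver before looking for other causes.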
Well, I've seen up to 12.5% speed loss on Cudart apps, which is probably the worst case scenario.
I can't figure out whether CUDA Driver API apps have more or less of a perf drop than cudart apps. OpenCL suffered most from 26x.xx. I've seen a terrible loss in performance: 25% of the speed was lost. It was exactly the same code, recompiled under 260.19. If a rollback to 256.53 was made, the code was "back to normal". The toolkit was 3.1. Switching to the 3.2 tk didn't change anything, since the app itself was working with integers, not floating point. So, 3.1 tk + 256.53/258.96 is the last proper combination. NV did seriously mess something up in the drivers, and didn't even document it. Bastards:smile: |
A lot of other people reported big speed hits in the Nvidia forum when the 3.2 toolkit came out.
|
[QUOTE=TheJudger;250639]In the case where mfaktc failed to run with "cudaStreamCreate() failed" does the CUDA version info indicate any problems?
e.g. [CODE]CUDA version info
binary compiled for CUDA 3.20
CUDA driver version 3.20
CUDA runtime version 3.20
[/CODE] [/QUOTE]
No, I was getting 3.10 reported across the board. |
[QUOTE=Prime95;250559]I had only installed the drivers that were on the CD accompanying the card. I installed the latest drivers and it now passes the self-test. Thanks.
Tomorrow, it's an Ubuntu install and an attempt to get msft's LL tester to compile.[/QUOTE] I just posted a Win x64 port of CudaLucas over in msft's thread, if that's the code you're talking about. Not that you shouldn't install Ubuntu anyway, but it's one step closer to being able to be bundled into Prime95 v27 ... :whistle: |
Hello!
[QUOTE=TheJudger;250639]In the case where mfaktc failed to run with "cudaStreamCreate() failed", does the CUDA version info indicate any problems? e.g.
[CODE]CUDA version info
binary compiled for CUDA 3.20
CUDA driver version 3.20
CUDA runtime version 3.20
[/CODE]
Looks like a driver version mismatch is a common issue. :sad: I'll try to figure out whether it is at least possible to generate a more detailed error message explaining why stream creation failed. The failure isn't related to stream creation itself; stream creation is just the first thing in my code that touches the GPU. [/QUOTE]
In the (unreleased) mfaktc 0.15 I've added the following code after the CUDA calls (stream creation, memory allocation, ...):
[CODE]cudaError = cudaGetLastError();
if(cudaError != cudaSuccess)
{
  printf(" cudaGetLastError() returned %d: %s\n", cudaError, cudaGetErrorString(cudaError));
}
[/CODE]
Let's see if this creates helpful error messages. I was able to produce some "out of memory" error messages by increasing the number of streams to ~300 on my GTX 470 (1.25 GiB memory). Of course this number of streams doesn't make sense, but it triggers a failure during memory allocation.
Oliver |