What's your GPU? This is my guess: if the card itself is of sm_13 or newer capability, it may run CUDALucas compiled for any architecture. But I may be very wrong.
[QUOTE=Karl M Johnson;266520]What's your GPU? This is my guess: if the card itself is of sm_13 or newer capability, it may run CUDALucas compiled for any architecture. But I may be very wrong.[/QUOTE]
It's 2.0. But if you compile for sm_10, the compiler demotes doubles to floats: [CODE] ptxas tmpxft_00001188_00000000-4_CUDALucas.ptx, line 93; warning : Double is not supported. Demoting to float [/CODE] I tried the smallest exponent possible, so maybe the rounding errors are just not apparent. If anyone knows the specifics of the algorithm, they may be able to point out how to trigger the errors caused by the lower precision. I'll try on a 1.0 board too when I get a chance.
[QUOTE=apsen;266525]I'll try on a 1.0 board too when I get a chance.[/QUOTE]
You were right - it does not work on 1.0. So much for specifying the architecture, if the toolkit does not honor it.
[QUOTE=apsen;266557]You were right - it does not work on 1.0. So much for specifying the architecture, if the toolkit does not honor it.[/QUOTE]
Glad to hear everything's working as it should. Only GPUs with compute capability 1.3, 2.0 and 2.1 support double precision, so it's only logical to compile for those archs. This is just a guess, but CUDALucas compiled with -arch=sm_21 and executed on an sm_21 GPU may actually be faster than when compiled with any other arch. As for other GPUs: sm_13 cards should use sm_13, and sm_20 Fermis should use sm_13. :smile:
[QUOTE=Karl M Johnson;266576]Glad to hear everything's working as it should.
[/QUOTE] I wouldn't exactly say that: it should fail at compile time, not at run time with some cryptic error message. [QUOTE=Karl M Johnson;266576] This is just a guess, but CUDALucas, compiled with -arch=sm_21, and executed on a sm_21 GPU may actually be faster than compiled with any other arch. [/QUOTE] I don't have a 2.1 board, but I could try to compile for one...
Yeah, I suspect a lot of users have sm_21 GPUs, since they are in the low-to-mid price segment.
[QUOTE=Karl M Johnson;266360]
I don't know what Nvidia changed, but I've seen an overall slowdown of many CUDA apps for the 3.1 -> 3.2 change (this one was driver related, but 3.2 could not be used on old drivers) and for 3.2 -> 4.0 (this time it's purely the toolkit which causes the slowdown).[/QUOTE] BTW, I could compile with 3.1 x64 or even 2.2 x64 if anyone would like to test them.
1 Attachment(s)
[QUOTE=Karl M Johnson;266600]Yeah, I suspect a lot of users have sm_21 GPUs, since they are in the low-mid price segment.[/QUOTE]
:whistle:
[QUOTE=Karl M Johnson;266360]I don't know what Nvidia changed, but I've seen an overall slowdown of many CUDA apps for the 3.1 -> 3.2 change (this one was driver related, but 3.2 could not be used on old drivers)[/QUOTE]
[B]AFAIK[/B] one reason is that starting with 3.2 it is no longer possible to call 32-bit device code from 64-bit host code, so with 64-bit host code you're now forced to run 64-bit device code. Those 64-bit pointers need two registers each instead of one, and 64-bit integer arithmetic is slower than 32-bit integer arithmetic on current GPUs. (Host code = stuff which runs on the [B]C[/B]PU; device code = stuff which runs on the [B]G[/B]PU.) [QUOTE=Karl M Johnson;266576]This is just a guess, but CUDALucas, compiled with -arch=sm_21, and executed on a sm_21 GPU may actually be faster than compiled with any other arch.[/QUOTE] I would assume that CUDALucas spends most of its time in the CUFFT library, which is precompiled. So I assume the speed difference is very small, if there is any. Oliver P.S. For both cases I'm not 100% sure!
[QUOTE=apsen;266636]:whistle:[/QUOTE]
Is this CUDA 3.2 or 4.0, and CUDALucas 1.2? Has anyone already timed this build? I downgraded to 1.0b again because of the performance of the previous Win64 build on CUDA 3.2.
8.2285 ms/iter for that sm_21 binary. Sure sucks for an sm_20 GPU.