@ltd:
Try this VS wizard to create a CUDA project: [URL]http://sourceforge.net/projects/cudavswizard/[/URL] |
@zyzhan: Thanks for the link. I will give it a try.
@pschoefer: With my machine (i7 920, W7 64Bit, GTX260 V267.24) I observed the following. With seven threads doing other DC projects (4 Threads different BOINC projects and 3 threads normal CPU bound LLR) each of these threads shows 13% load in the task manager. After the starting phase the llrcuda drops to around 2% and GPU-Z shows that the GPU runs between 94-96%. I will make some tests to see what happens if I add another CPU bound LLR thread. What priority are the other tasks/threads running on? To see if it makes a difference you could try to increase the priority of llrcuda from within the taskmanager to "aboveNormal". |
[QUOTE=ltd;255634]@pschoefer: With my machine (i7 920, W7 64Bit, GTX260 V267.24)
I observed the following. With seven threads doing other DC projects (4 Threads different BOINC projects and 3 threads normal CPU bound LLR) each of these threads shows 13% load in the task manager. After the starting phase the llrcuda drops to around 2% and GPU-Z shows that the GPU runs between 94-96%. I will make some tests to see what happens if I add another CPU bound LLR thread. What priority are the other tasks/threads running on? To see if it makes a difference you could try to increase the priority of llrcuda from within the taskmanager to "aboveNormal".[/QUOTE]
With 8 threads on a 4-core machine, you may be seeing cache contention (the two threads on each core share the L1/L2 cache). Your overall throughput may improve if you reduce the number of active threads, especially if you can assign each thread's affinity to a particular core. (I looked at this on my i7-840QM using Prime95 v26.5; it turns out 4 workers with 1 thread each gave the highest throughput, followed by 1 worker with 4 threads. My little parametric study: [url]http://www.mersenneforum.org/showpost.php?p=253230&postcount=83[/url]) I'm not sure whether the applications you are running are limited more by memory/cache throughput or by the CPU, but you may see a similar trend. |
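For anyone who wants to try the affinity suggestion from the command line rather than from the task manager, here is a minimal sketch of how the hex mask for the Windows `start /affinity` switch can be computed. The helper names `affinity_mask` and `start_command` are made up for illustration, and the assumption that hyperthread siblings are numbered in adjacent pairs (0-1, 2-3, ...) may not hold on every machine, so check your own topology first.

```python
def affinity_mask(logical_cpus):
    """Return the hex bitmask (without 0x) that selects the given logical CPUs."""
    mask = 0
    for cpu in logical_cpus:
        if cpu < 0:
            raise ValueError("logical CPU index must be non-negative")
        mask |= 1 << cpu  # bit i set means logical CPU i is allowed
    return format(mask, "X")


def start_command(exe, logical_cpus):
    """Build a cmd.exe line that launches exe restricted to logical_cpus."""
    return f"start /affinity {affinity_mask(logical_cpus)} {exe}"


# Assumption: logical CPUs 0-1, 2-3, 4-5, 6-7 are the hyperthread pairs,
# so picking the even ones gives one thread per physical core.
one_per_core = [0, 2, 4, 6]
print(start_command("llrcuda.exe", one_per_core))
```

Whether pinning actually helps will depend on the mix of applications, so it is worth timing a few iterations with and without it.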
[QUOTE=pschoefer;255628]CPU load stayed that high.
After update to driver 267.24, CPU load went down to 1/8 of one core. Unfortunately, GPU load is only ~40%, if all CPU cores are under load. With one core idle, GPU load is at 98%.[/QUOTE]
Where are these latest drivers available (link, please)? |
[QUOTE=ltd;255634]What priority are the other tasks/threads running on?
To see if it makes a difference you could try to increase the priority of llrcuda from within the taskmanager to "aboveNormal".[/QUOTE]
The CPU was running BOINC (PG-PPS LLR) at the lowest priority, according to the task manager. Increasing the priority of llrcuda didn't help.
[QUOTE=S34960zz;255666]Where are these latest drivers available (link please) ?[/QUOTE]
[url]http://www.nvidia.com/object/win7-winvista-64bit-267.24-beta-driver.html[/url] It's still a beta. |
@zyzhan: Thanks again for the information. My build now runs as well. As I suspected, the problem was a wrong CUDA build configuration.
|
Using zyzhan's exe, I tested the first and last primes in this thread. I got the following:
C:\Users\mathew\Desktop\llrcuda.0.60.win64>llrcuda.exe -d -q"46157*2^698207+1"
Starting Proth prime test of 46157*2^698207+1
Using complex irrational base DWT, FFT length = 131072, a = 3
[B]46157*2^698207+1 is prime![/B] Time : 686.588 sec.. Time per bit: 0.888 ms.
Same as ltd.
C:\Users\mathew\Desktop\llrcuda.0.60.win64>llrcuda.exe -d -q"5*2^23473+1"
[B]too small Exponent...[/B]
Not the same as msft.
Edit: but the same as [URL="http://www.mersenneforum.org/showpost.php?p=251822&postcount=122"]ltd's post[/URL].
What is the minimum exponent size that can be tested?
Thanks, everyone. |
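For readers wondering what the "Proth prime test ... a = 3" line refers to, here is a minimal sketch of Proth's criterion on small numbers. It assumes the `a = 3` in the output is the base of that criterion. It says nothing about how llrcuda actually does the arithmetic (that uses the DWT/FFT on the GPU), and the function name `proth_witness` is made up for illustration.

```python
def proth_witness(k, n, a=3):
    """Check Proth's criterion for N = k*2^n + 1 with base a.

    True means N is proven prime. False is inconclusive for this base:
    N may be composite, or a may simply be a quadratic residue mod N.
    """
    if k <= 0 or n <= 0 or k % 2 == 0 or k >= 2 ** n:
        raise ValueError("need odd k with 0 < k < 2^n")
    N = k * 2 ** n + 1
    # Proth's theorem: if a^((N-1)/2) == -1 (mod N), then N is prime.
    return pow(a, (N - 1) // 2, N) == N - 1


print(proth_witness(5, 3))       # N = 41
print(proth_witness(3, 5))       # N = 97, base 3 is inconclusive here
print(proth_witness(3, 5, a=5))  # N = 97 again, with a different base
print(proth_witness(9, 4))       # N = 145 = 5 * 29
```

A real tester would choose a base whose Jacobi symbol with respect to N is -1, so that a False result reliably means composite; this sketch leaves that step out.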
Does llrcuda support GPU affinity?
Looks like not yet... |
GPU affinity
[QUOTE=x3mEn;256076]Does llrcuda support GPU affinity?
Looks like not yet...[/QUOTE]
See here:
[QUOTE=msft;253592]Support affinity.[/QUOTE]
This means yes..!? |
[QUOTE=Brain;256080]See here:
This means yes..!?[/QUOTE] Hm... GeneferCUDA really supports GPU affinity, but llrcuda.0.60 doesn't... any idea? |
This is a question to msft:
Is it possible to implement testing of multiple candidates at the same time on one GPU? I think this would greatly improve throughput, just as on a quad-core CPU you get about 3x the throughput by testing 4 candidates on 4 cores rather than 1 candidate on 4 cores. |