mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   llrCUDA (https://www.mersenneforum.org/showthread.php?t=14608)

zyzhan 2011-03-18 09:13

@ltd:
Try this VS wizard to create a CUDA project:
[URL]http://sourceforge.net/projects/cudavswizard/[/URL]

ltd 2011-03-18 10:36

@zyzhan: Thanks for the link. I will give it a try.

@pschoefer: With my machine (i7 920, W7 64-bit, GTX260, driver 267.24)
I observed the following. With seven threads doing other DC projects (4 threads for different BOINC projects and 3 threads of normal CPU-bound LLR),
each of these threads shows 13% load in the Task Manager. After the starting phase, llrcuda drops to around 2% and GPU-Z shows that the GPU runs at 94-96%. I will run some tests to see what happens if I add another CPU-bound LLR thread.

What priority are the other tasks/threads running on?
To see if it makes a difference, you could try increasing the priority of llrcuda to "Above Normal" in the Task Manager.

S34960zz 2011-03-18 14:06

[QUOTE=ltd;255634]@pschoefer: With my machine (i7 920, W7 64Bit, GTX260 V267.24)
I observed the following. With seven threads doing other DC projects (4 Threads different BOINC projects and 3 threads normal CPU bound LLR)
each of these threads shows 13% load in the task manager. After the starting phase the llrcuda drops to around 2% and GPU-Z shows that the GPU runs between 94-96%. I will make some tests to see what happens if I add another CPU bound LLR thread.

What priority are the other tasks/threads running on?
To see if it makes a difference you could try to increase the priority of llrcuda from within the taskmanager to "aboveNormal".[/QUOTE]

With 8 threads on a 4-core machine, you may be seeing cache contention (the two threads on each core share the L1/L2 cache). Your overall throughput may improve if you reduce the number of active threads, especially if you can assign each thread's affinity to a particular core. (I looked at this on my i7-840QM using Prime95 v26.5; it turns out 4 workers with 1 thread each gave the highest throughput, followed by 1 worker with 4 threads. My little parametric study: [url]http://www.mersenneforum.org/showpost.php?p=253230&postcount=83[/url])

I'm not sure how memory/cache-bound vs. CPU-bound the applications you are running are, but you may see a similar trend.
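As a rough illustration of the affinity idea above, here is a minimal Python sketch that builds CPU affinity masks selecting one logical CPU per physical core. It assumes that logical CPUs 2i and 2i+1 are the two Hyper-Threading siblings of core i; that numbering is an assumption, not something confirmed in this thread, so check your own machine's topology first. The function names are made up for this example.

```python
def one_thread_per_core_mask(physical_cores, threads_per_core=2):
    """Bitmask with the first logical CPU of every physical core set.

    ASSUMPTION: logical CPUs are numbered so that siblings of the same
    core are adjacent (0 and 1 on core 0, 2 and 3 on core 1, ...).
    """
    mask = 0
    for core in range(physical_cores):
        mask |= 1 << (core * threads_per_core)
    return mask


def per_worker_masks(physical_cores, threads_per_core=2):
    """One single-bit mask per worker, each pinned to a different core."""
    return [1 << (core * threads_per_core) for core in range(physical_cores)]


if __name__ == "__main__":
    # 4 cores with 2 threads each: logical CPUs 0, 2, 4 and 6 are selected.
    print(format(one_thread_per_core_mask(4), "X"))
    print([format(m, "X") for m in per_worker_masks(4)])
```

On Windows, a hexadecimal mask like this can be passed to `start /affinity <hexmask> <program>`, or the same cores can be ticked by hand under "Set affinity" in the Task Manager.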

S34960zz 2011-03-18 14:10

[QUOTE=pschoefer;255628]CPU load stayed that high.

After update to driver 267.24, CPU load went down to 1/8 of one core. Unfortunately, GPU load is only ~40%, if all CPU cores are under load. With one core idle, GPU load is at 98%.[/QUOTE]

Where are these latest drivers available (link please)?

pschoefer 2011-03-18 14:41

[QUOTE=ltd;255634]What priority are the other tasks/threads running on?
To see if it makes a difference you could try to increase the priority of llrcuda from within the taskmanager to "aboveNormal".[/QUOTE]
The CPU was running BOINC (PG-PPS LLR) at the lowest priority, according to the Task Manager. Increasing the priority of llrcuda didn't help.

[QUOTE=S34960zz;255666]Where are these latest drivers available (link please) ?[/QUOTE]
[url]http://www.nvidia.com/object/win7-winvista-64bit-267.24-beta-driver.html[/url]. It's still beta.

ltd 2011-03-18 15:55

@zyzhan: Thanks again for the information. My build now runs as well. As I thought, it was a wrong CUDA build configuration.

Mathew 2011-03-18 23:17

Using zyzhan's exe, I tested the first and last primes in this thread and got the following:
C:\Users\mathew\Desktop\llrcuda.0.60.win64>llrcuda.exe -d -q"46157*2^698207+1"
Starting Proth prime test of 46157*2^698207+1
Using complex irrational base DWT, FFT length = 131072, a = 3

[B]46157*2^698207+1 is prime![/B] Time : 686.588 sec.. Time per bit: 0.888 ms. same as ltd

C:\Users\mathew\Desktop\llrcuda.0.60.win64>llrcuda.exe -d -q"5*2^23473+1"
[B]too small Exponent...[/B] not same as msft

Edit: but the same as [URL="http://www.mersenneforum.org/showpost.php?p=251822&postcount=122"]ltd's post [/URL]

What is the minimum exponent size that can be tested?

Thanks everyone
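
For readers wondering what a "Proth prime test" with a base like "a = 3" checks, here is a minimal Python sketch of Proth's theorem: for N = k*2^n+1 with k odd and k < 2^n, if a has Jacobi symbol -1 modulo N, then N is prime exactly when a^((N-1)/2) ≡ -1 (mod N). This is only an illustration with Python's built-in big integers. It is not llrcuda's code, which uses FFT-based (DWT) multiplication on the GPU, and the way the base is chosen here is an assumption. The function names are made up for this example.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd positive n."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def proth_is_prime(k, n):
    """Test N = k*2^n + 1 with Proth's theorem (requires k odd, k < 2^n)."""
    if k % 2 == 0 or k >= 2 ** n:
        raise ValueError("Proth's theorem needs odd k with k < 2^n")
    N = k * 2 ** n + 1
    a = 2
    while True:
        j = jacobi(a, N)
        if j == 0:
            # a shares a factor with N, so N is composite.
            return False
        if j == -1:
            # Single modular exponentiation decides primality.
            return pow(a, (N - 1) // 2, N) == N - 1
        a += 1


if __name__ == "__main__":
    print(proth_is_prime(5, 7))   # N = 641
    print(proth_is_prime(9, 4))   # N = 145 = 5 * 29
```

The sketch is only practical for small exponents; numbers the size of 46157*2^698207+1 need the fast multiplication that LLR-type programs implement.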

x3mEn 2011-03-19 08:43

Does llrcuda support GPU affinity?
It looks like it doesn't yet...

Brain 2011-03-19 10:04

GPU affinity
 
[QUOTE=x3mEn;256076]Does llrcuda supports gpuaffinity?
Looks like not yet...[/QUOTE]
See here:
[QUOTE=msft;253592]Support affinity.[/QUOTE]
This means yes..!?

x3mEn 2011-03-19 10:41

[QUOTE=Brain;256080]See here:

This means yes..!?[/QUOTE]

Hm... GeneferCUDA really does support GPU affinity,
but llrcuda.0.60 doesn't. Any ideas?
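
One general workaround for programs without a device-selection option is the CUDA runtime's `CUDA_VISIBLE_DEVICES` environment variable, which limits which GPUs a process can see. The Python sketch below only builds such an environment for a child process. Whether this behaves as hoped with llrcuda.0.60 and a given driver has not been tested in this thread, so treat it as an assumption. The helper name `env_for_gpu` is made up for this example.

```python
import os


def env_for_gpu(device_index, base_env=None):
    """Return a copy of the environment that exposes only one GPU.

    ASSUMPTION: the launched program uses the CUDA runtime, which
    honors CUDA_VISIBLE_DEVICES; the visible GPU then appears to the
    program as device 0.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(device_index)
    return env


if __name__ == "__main__":
    env = env_for_gpu(1)
    print(env["CUDA_VISIBLE_DEVICES"])
    # Hypothetical launch of one test on the second GPU:
    # import subprocess
    # subprocess.Popen(["llrcuda.exe", "-d", "-q46157*2^698207+1"], env=env)
```

Starting one process per GPU, each with a different index, would give a crude form of GPU affinity without any change to the program itself.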

nuggetprime 2011-03-19 13:57

This is a question to msft:
Is it possible to implement testing multiple candidates at the same time on one GPU? I think this would greatly improve throughput, just as on a quad-core CPU you get about 3x more throughput by testing 4 candidates on 4 cores than by testing 1 candidate on 4 cores.

