
mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   llrCUDA (https://www.mersenneforum.org/showthread.php?t=14608)

x3mEn 2011-03-19 14:11

[B]nuggetprime[/B],
run 4 apps simultaneously and you'll be testing 4 candidates at the same time on one GPU; what's the problem?

I was asking about something else. I have 2 GPUs. When I use GeneferCUDA, the 2 jobs each run on their own GPU. But if I change to the PRPNet port that calls llr, both llrcuda apps run on the 1st GPU. Could anybody help me? I can show inis, logs or screenshots if needed.
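(Editor's aside: one common way to pin each CUDA process to its own card, independent of any application setting, is the CUDA_VISIBLE_DEVICES environment variable, which makes each process see only "its" card as device 0. The directory and binary names below are placeholders, not taken from the thread.)

```shell
# Hypothetical launch sketch: one llrcuda instance per GPU
( cd instance0 && CUDA_VISIBLE_DEVICES=0 ./llrcuda ) &
( cd instance1 && CUDA_VISIBLE_DEVICES=1 ./llrcuda ) &
wait
```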

msft 2011-03-19 14:19

[QUOTE=x3mEn;256083]Hm... GeneferCUDA really supports GPU affinity,
but llrcuda.0.60 doesn't... any idea?[/QUOTE]
[CODE]
CPU_AFFINITY = (unsigned int) IniGetInt (INI_FILE, "Affinity", 99);
[/CODE]
You need to use the llr.ini file.
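(Editor's aside, for illustration only: going by the IniGetInt call above, a per-instance llr.ini could contain a line like the following. The key name comes from the code; the value shown, and reading the 99 default as "no affinity", are assumptions.)

```ini
; hypothetical llr.ini fragment: bind this instance to CPU core 0
Affinity=0
```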

msft 2011-03-19 14:21

[QUOTE=nuggetprime;256090]This is a question to msft:
Is it possible to implement testing multiple candidates at the same time on one GPU? I think this would greatly improve throughput. Just like on a quad-core CPU you get about 3x more throughput if you test 4 candidates on 4 cores than 1 candidate on 4 cores.[/QUOTE]
Sorry, I cannot test that now.

x3mEn 2011-03-19 14:41

[QUOTE=msft;256094][CODE]
CPU_AFFINITY = (unsigned int) IniGetInt (INI_FILE, "Affinity", 99);
[/CODE]
You need to use the llr.ini file.[/QUOTE]
msft, you are right; llr.ini helped.

nuggetprime 2011-03-19 14:48

[QUOTE=x3mEn;256093][B]nuggetprime[/B],
run 4 apps simultaneously and you'll be testing 4 candidates at the same time on one GPU; what's the problem?

I was asking about something else. I have 2 GPUs. When I use GeneferCUDA, the 2 jobs each run on their own GPU. But if I change to the PRPNet port that calls llr, both llrcuda apps run on the 1st GPU. Could anybody help me? I can show inis, logs or screenshots if needed.[/QUOTE]
Have you got a GPU where you can test how much slower it is with 4 instances than with 1?
From what I read in the previous posts, the speed at the moment is about that of 1-1.5 cores of a cheap quad-core (Athlon II X4 640), at about twice the price and power consumption.
msft, do you think that in the next 1-2 years the code will gain so much speed that a, say, 100-dollar GPU outperforms a 100-dollar CPU in throughput?
Is it useful to invest in a fast GPU (GTX 560 Ti) today, or should I wait for something better to show up?

msft 2011-03-19 15:07

[QUOTE=nuggetprime;256098]Have you got a GPU where you can test how much slower it is with 4 instances than with 1?
From what I read in the previous posts, the speed at the moment is about that of 1-1.5 cores of a cheap quad-core (Athlon II X4 640), at about twice the price and power consumption.
msft, do you think that in the next 1-2 years the code will gain so much speed that a, say, 100-dollar GPU outperforms a 100-dollar CPU in throughput?
Is it useful to invest in a fast GPU (GTX 560 Ti) today, or should I wait for something better to show up?[/QUOTE]
I understand.
If someone gives me the FFT source code, I can get a 10% speedup.
Anyway, CUDALucas's speed depends on memory bandwidth.

x3mEn 2011-03-19 16:24

[QUOTE=nuggetprime;256098]Have you got a GPU where you can test how much slower it is with 4 instances than with 1?
[/QUOTE]
The thing is that even if all 4 threads have equal priority (for example 3 [Middle]), the active thread takes the lion's share of GPU resources.
Between 2 jobs, the second runs 3 times slower than the active one. So I don't know how to correctly test what you are asking. [B]msft[/B] can probably advise something...

msft 2011-03-19 22:29

Alice in Wonderland

In the computer world,
the measure is not linear:
a 3x quicker machine needs 9x the cost & power.
A time machine is very expensive. :lol:

em99010pepe 2011-03-19 22:47

[QUOTE=nuggetprime;256098]
Is it useful to invest in a fast GPU (GTX 560 Ti) today, or should I wait for something better to show up?[/QUOTE]

For the moment, use it to sieve instead; run LLR on CPUs.

Ralf Recker 2011-03-20 08:41

Has anyone already tried to reduce the number of threads and increase the workload per thread (Better(?) latency hiding/ILP as described by Volkov et al. in various papers and presentations) for example in the transpose functions?
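(Editor's sketch of the thread-coarsening idea described above, applied to a tiled transpose: a TILE×TILE tile is moved by only TILE×(TILE/ROWS_PER_THREAD) threads, so each thread issues several independent loads and stores, increasing ILP. This follows the generic pattern from NVIDIA's transpose examples; it is not code from llrCUDA.)

```cuda
// Launch with blockDim = (TILE, TILE / ROWS_PER_THREAD).
#define TILE 32
#define ROWS_PER_THREAD 4

__global__ void transpose_coarsened(double *out, const double *in,
                                    int width, int height)
{
    __shared__ double tile[TILE][TILE + 1];   // +1 pad avoids bank conflicts

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;

    // Each thread loads ROWS_PER_THREAD rows of the tile (coalesced in x).
    for (int j = 0; j < TILE; j += TILE / ROWS_PER_THREAD)
        if (x < width && y + j < height)
            tile[threadIdx.y + j][threadIdx.x] = in[(y + j) * width + x];

    __syncthreads();

    x = blockIdx.y * TILE + threadIdx.x;      // transposed block origin
    y = blockIdx.x * TILE + threadIdx.y;

    // Each thread stores ROWS_PER_THREAD rows of the transposed tile.
    for (int j = 0; j < TILE; j += TILE / ROWS_PER_THREAD)
        if (x < height && y + j < width)
            out[(y + j) * height + x] = tile[threadIdx.x][threadIdx.y + j];
}
```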

msft 2011-03-20 09:33

[QUOTE=Ralf Recker;256157]Has anyone already tried to reduce the number of threads and increase the workload per thread (Better(?) latency hiding/ILP as described by Volkov et al. in various papers and presentations) for example in the transpose functions?[/QUOTE]
Yes, I tried tuning for my GTX 460,
targeting FFT lengths over 2048K.

