mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

Manpowre 2013-05-12 22:01

HyperQ works on Titan
 
So, I got my development environment up and running and compiled CUDALucas without problems. I'm impressed by the Titans.

I tested the simpleHyperQ sample app too, and I can confirm that HyperQ works on the Titan.

So now I'm thinking of enabling HyperQ for CUDALucas, but I'm not familiar with the code itself. I've been cleaning up the code today and found the kernel calls, which appear in four places: in the RDFT function and the normalize function.

I guess the main call for CUDALucas is this line of code?
rftfsub_kernel <<< n / 512, 128 >>> (n, g_x);

It's the most advanced CUDA kernel in the code, so I assume this is the primary call for enabling HyperQ (if that's even possible). I'll give it a try, though.
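For reference, a kernel launch can name a stream as an optional fourth configuration parameter, which is the hook HyperQ needs. The sketch below is illustrative only: the kernel body is a placeholder, not the real rftfsub_kernel, and the stream plumbing is an assumption about how the launch might be adapted, not CUDALucas's actual code.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for CUDALucas's rftfsub_kernel (the real one
// does the pointwise multiplication between the FFTs).
__global__ void rftfsub_kernel(int n, double *g_x) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) g_x[i] *= 2.0;  // placeholder work
}

int main(void) {
    const int n = 1 << 16;
    double *g_x;
    cudaMalloc(&g_x, n * sizeof(double));

    // A non-default stream: the optional 4th launch parameter.
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // <<<grid, block, sharedMemBytes, stream>>> -- same grid/block shape
    // as the launch quoted above, now bound to an explicit stream.
    rftfsub_kernel<<<n / 512, 128, 0, stream>>>(n, g_x);

    cudaStreamSynchronize(stream);  // wait for the kernel to finish
    cudaStreamDestroy(stream);
    cudaFree(g_x);
    return 0;
}
```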

owftheevil 2013-05-12 22:57

That's the pointwise multiplication that comes between the FFTs.

NBtarheel_33 2013-05-13 03:43

I've completed an mprime run on 82090207, so that would be a candidate for a CuLu DC if anyone's interested. :smile:

Karl M Johnson 2013-05-13 05:36

[QUOTE=Manpowre;340179]I tested the simpleHyperQ sample app too, and I can confirm that HyperQ works on the Titan.[/QUOTE]
Really?!
Lovely news.
I am really glad I was wrong.
The big question is, which of the GIMPS apps could HyperQ benefit?
CuLu and CPm1 will not benefit much, right?

Manpowre 2013-05-13 08:12

[QUOTE=Karl M Johnson;340208]Really?!
Lovely news.
I am really glad I was wrong.
The big question is, which of the GIMPS apps could HyperQ benefit?
CuLu and CPm1 will not benefit much, right?[/QUOTE]

CuLu, CPm1 (if it can be confirmed to work), and MfaktC.

HyperQ is similar to Intel's Hyper-Threading; it's amazing. With an A and a B kernel, I'm working on changing the CuLu code to use HyperQ now. It's going to take a few days, maybe weeks, as CUDA is new to me (programming isn't, though), and I see potential for a lot of improvements.

Also, the new functionality from NVIDIA for utilizing more GPUs in one system with the CUDA library is something that will benefit a lot from CUDA 5.0. Imagine a 60-hour process dropping to 45 hours because you can use two GPUs within the same code.

Manpowre 2013-05-13 08:15

HyperQ in test code
 
I spent three hours during the night adding the HyperQ technique to CudaLucas; all five GPU calls got stream calls, and the HyperQ streams are initialized and freed.
The first section, the rtsp part, runs just fine, but the normalize calls crash the app, so I have some debugging to do, and I also still need to set up the A and B kernels. Progressing slowly, though.
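The A/B-kernel idea can be sketched as two independent kernels issued into separate streams; on GK110 (Titan), HyperQ backs each stream with its own hardware work queue, so the kernels can run concurrently as long as neither fills the GPU by itself. kernelA and kernelB below are hypothetical placeholders, not CUDALucas kernels.

```cuda
#include <cuda_runtime.h>

// Two independent kernels, one per stream.
__global__ void kernelA(float *x) { x[threadIdx.x] += 1.0f; }
__global__ void kernelB(float *y) { y[threadIdx.x] *= 2.0f; }

int main(void) {
    float *x, *y;
    cudaMalloc(&x, 256 * sizeof(float));
    cudaMalloc(&y, 256 * sizeof(float));

    cudaStream_t a, b;
    cudaStreamCreate(&a);
    cudaStreamCreate(&b);

    kernelA<<<1, 256, 0, a>>>(x);  // queue A
    kernelB<<<1, 256, 0, b>>>(y);  // queue B -- may overlap with A

    cudaDeviceSynchronize();       // wait for both streams
    cudaStreamDestroy(a);
    cudaStreamDestroy(b);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```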

Karl M Johnson 2013-05-13 08:22

Oh, sweet, CUDALucas can actually be further improved:smile:
As for using several GPUs, that's multi-GPU support; it's been around since the first CUDA GPUs, and there are different models for it (one thread controls all GPUs, or multiple threads control multiple GPUs). It's a hard undertaking, roughly comparable to converting a single-threaded application into a multi-threaded one. Good to know NV makes it simpler for devs to implement.

Manpowre 2013-05-13 08:51

[QUOTE=Karl M Johnson;340220]Oh, sweet, CUDALucas can actually be further improved:smile:
As for using several GPUs, that's multi-GPU support; it's been around since the first CUDA GPUs, and there are different models for it (one thread controls all GPUs, or multiple threads control multiple GPUs). It's a hard undertaking, roughly comparable to converting a single-threaded application into a multi-threaded one. Good to know NV makes it simpler for devs to implement.[/QUOTE]

Yes, with the NVIDIA library one thread can be spawned across multiple GPUs. I checked out the samples, though, and it seems it has to be done the "old" way, with a for loop over the devices.
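That "old way" from the samples looks roughly like the sketch below: a single host thread loops over devices with cudaSetDevice and issues asynchronous work to each. The kernel and sizes are placeholders, not CUDALucas code.

```cuda
#include <cuda_runtime.h>

// Placeholder kernel; the real work would be CUDALucas's FFT steps.
__global__ void work_kernel(double *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0;
}

int main(void) {
    const int n = 1 << 20;
    int ndev = 0;
    cudaGetDeviceCount(&ndev);

    double *d_x[16];  // assume at most 16 GPUs for this sketch
    for (int d = 0; d < ndev && d < 16; d++) {
        cudaSetDevice(d);  // subsequent calls target GPU d
        cudaMalloc(&d_x[d], n * sizeof(double));
        work_kernel<<<(n + 127) / 128, 128>>>(d_x[d], n);  // async launch
    }
    for (int d = 0; d < ndev && d < 16; d++) {
        cudaSetDevice(d);
        cudaDeviceSynchronize();  // wait for each GPU in turn
        cudaFree(d_x[d]);
    }
    return 0;
}
```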

It should have been something like initializing multiple GPUs, doing one memcpy to both, and then spawning threads into one device that has multiple GPUs attached to it, but NVIDIA isn't there yet. I guess they will do it very soon.

Still, having just looked at the samples, I find HyperQ to be the most beneficial at the moment.

Karl M Johnson 2013-05-13 08:53

[QUOTE=Manpowre;340222]It should have been something like initializing multiple GPUs, doing one memcpy to both, and then spawning threads into one device that has multiple GPUs attached to it, but NVIDIA isn't there yet. I guess they will do it very soon.[/QUOTE]
I think you need to use the driver api for that, not cudart.

Manpowre 2013-05-13 10:45

[QUOTE=Karl M Johnson;340223]I think you need to use the driver api for that, not cudart.[/QUOTE]

I'll check it out more tonight. I've realized what I did wrong with HyperQ yesterday when it failed to run, and I'll adjust tonight. It's all part of the process of understanding what's going on, right? Hehe. Interesting stuff, though.

TheJudger 2013-05-13 19:35

Hi Carl,

[QUOTE=owftheevil;340103]New and improved version of the memory test. I had to give up the ability to distinguish read and write errors to more closely mimic CuLu and CPm1's memory use patterns. My bad card gave 1555 errors in a 45 minute test, the good card again is without errors for the same test.[/QUOTE]

Could you add fflush(NULL); or similar after your printf()s? Without fflush() I have some trouble with output redirection (bash, using &> memtest.out): e.g., when I start memtest the output isn't updated very often, and CUDA error messages (sometimes?) appear at the beginning of the file.

Oliver

