mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   GPU LL Testing FAQ (https://www.mersenneforum.org/showthread.php?t=16142)

bcp19 2011-11-14 04:37

[QUOTE=LaurV;278210]Thanks for the small tutorial. [URL="http://www.mersenneforum.org/showpost.php?p=277670&postcount=706"]My babe[/URL] arrived last Saturday and I spent the weekend installing stuff on it. I will definitely run CudaLucas on DC exponents for a while, until I am convinced that all residues match, then I will switch to some other jobs, like the LL front or the so-much-debated TF front. That is my choice for now, so I don't want to hear any argument.

So, CudaLucas is installed and running. So far so good. I use the 64-bit version, on Win7. Just as a small observation, the -c[xxx] switch does not work: no matter what I put there, it still outputs to the screen every 10k iterations (has anyone tried a value other than the default?). This is a minor problem, and it is just FYI; of course I can live with it.

[B]My biggest problem is that I don't know how to convince CudaLucas (or a second/third, etc. copy of it) to run on the second GPU. Can anyone help?[/B] I have carefully read all 36 pages of the GPU thread on the forum (and related ones) but did not find much.

If I start one copy of CudaLucas, about 75-80% of the first GPU is busy, and I get about 3.5 ms per iteration (~25-30M range). If I start a second copy, the same first GPU goes to 99%, and the time per iteration rises to about 4.5 ms for each CL process. Still reasonable. If I continue to launch copies of CL, they all fight for the same GPU (and the time per iteration rises accordingly). The other GPU sits completely idle.

I also tried CL 64 with 4.0, same result. Also, the -t switch does not seem to work for any of them. CUDA capability is 2.0. Is there any switch I am missing for CL?[/QUOTE]
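To put the quoted timings in perspective, here is a small sketch of the combined throughput for one copy at 3.5 ms per iteration versus two copies at 4.5 ms each on the same GPU. The function name is made up for illustration, and the figures are just the approximate ones quoted above, so treat the result as a rough estimate.

```python
def throughput_iters_per_sec(copies: int, ms_per_iter: float) -> float:
    """Combined iterations per second when `copies` processes each
    need `ms_per_iter` milliseconds per iteration."""
    return copies * 1000.0 / ms_per_iter


# Figures quoted in the post (approximate measurements).
one_copy = throughput_iters_per_sec(1, 3.5)
two_copies = throughput_iters_per_sec(2, 4.5)
gain = two_copies / one_copy

if __name__ == "__main__":
    print(f"one copy:   {one_copy:.1f} iterations/s")
    print(f"two copies: {two_copies:.1f} iterations/s combined")
    print(f"gain:       {gain:.2f}x")
```

On those numbers, two copies on one GPU give roughly half again as much total work per second, even though each individual test runs slower.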

The -c switch is for how many iterations between outputs to the checkpoint file, not the screen output.

The -t switch only seems to work in the 1.2b version.

Open a command prompt, change to your cudalucas directory and type cudalucas /? to get a list of switches.

Unfortunately, I have no clue on getting cudalucas to work on the 2nd GPU.

Dubslow 2011-11-14 06:30

[QUOTE=LaurV;278210] [URL="http://www.mersenneforum.org/showpost.php?p=277670&postcount=706"]My babe[/URL][/QUOTE]

Wait, 1.2 TFLOPS? What the hell is on that thing?

LaurV 2011-11-14 06:38

[QUOTE=bcp19;278212]The -c switch is for how many iterations between outputs to the checkpoint file, not the screen output.

The -t switch only seems to work in the 1.2b version.

Open a command prompt, change to your cudalucas directory and type cudalucas /? to get a list of switches.

Unfortunately, I have no clue on getting cudalucas to work on the 2nd GPU.[/QUOTE]

Thanks. The /? I figured out at the very beginning; that is the first thing one does with a new toy: type "toy /?" at the command prompt :D
I found out about -t on the forum, just before reading your post. Eager to go home in the evening to try it. About -c, I did not know; thanks for telling me. Somehow I think the "printf" used there is about as slow as a disk write, especially when you have an SSD, and I wonder why -c does not apply to the screen too. I mean, if I redirect to a file, that is writing to disk anyway. So -c should affect both the screen output and the checkpoint file. Screen output every 100k iterations, or at an even larger interval for a bigger exponent, would be OK. Whatever...

It seems I still can't find out how to run CudaLucas on both GPUs, and so far the only productive way to keep the second GPU from sleeping is to run one CudaLucas and one mfaktc. I am aware of mfaktc's -d switch, which selects the GPU, but I have not tried mfaktc yet; I would still prefer to run more CudaLucas instances, as that would leave the CPUs free to do P-1. I am also aware that SLI should be disabled for that to work, as someone said in another thread here. For CudaLucas I have tried both with and without SLI, and I cannot trick it into running different copies on different GPUs.

Conspiracy theory: I am sure someone knows the answer, but they refuse to tell me, to make me run mfaktc (and therefore TF, see the big debate around) :P:P:smile:

LaurV 2011-11-14 07:35

[QUOTE=Dubslow;278220]Wait, 1.2 TFLOPS? What the hell is on that thing?[/QUOTE]
You are right! The hell is in that thing! And it is (theoretically) 1.3, not 1.2. I will post a photo when I get home, if you tell me how to run cudalucas on both GPUs. :smile:

Dubslow 2011-11-14 07:42

Erm... sorry, no se.
What's the hardware?

frmky 2011-11-14 08:30

A quick look at the source indicates that the unadvertised -D switch selects the GPU. GPU numbering starts at 0, so with two GPU's use -D0 and -D1.
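For anyone wanting to script this, here is a minimal sketch that builds one command line per GPU using the -D switch with numbering from 0. The executable name "cudalucas" and passing the exponent as a plain trailing argument are assumptions on my part (check cudalucas /? for your version), and the helper names are made up for illustration.

```python
import subprocess
from typing import List


def build_commands(exe: str, exponents: List[int]) -> List[List[str]]:
    """Return one argument list per exponent, pinning the i-th exponent
    to GPU i with -D<i> (GPU numbering starts at 0).

    Assumption: the exponent is passed as a trailing positional argument.
    """
    return [[exe, f"-D{gpu}", str(p)] for gpu, p in enumerate(exponents)]


def launch_all(commands: List[List[str]]) -> List[subprocess.Popen]:
    """Start every command as its own process and return the handles."""
    return [subprocess.Popen(cmd) for cmd in commands]


if __name__ == "__main__":
    # Example exponents only; substitute your own assignments.
    cmds = build_commands("cudalucas", [27777653, 27863639])
    for cmd in cmds:
        print(" ".join(cmd))  # print only; call launch_all(cmds) to actually start them
```

The sketch only prints the command lines, so it is safe to run on a machine without the program installed.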

ET_ 2011-11-14 13:23

[QUOTE=frmky;278232]A quick look at the source indicates that the unadvertised -D switch selects the GPU. GPU numbering starts at 0, so with two GPU's use -D0 and -D1.[/QUOTE]

Now we all will eagerly wait for LaurV and his photos... :smile:

Luigi

LaurV 2011-11-14 14:06

[QUOTE=frmky;278232]A quick look at the source indicates that the unadvertised -D switch selects the GPU. GPU numbering starts at 0, so with two GPU's use -D0 and -D1.[/QUOTE]

Wow! Amazing! That works! And there is no need to disable SLI. I used uppercase D (I did not try lowercase d).

Iteration 9650000/27777653, ETA 18 hours, and (the one started later) Iteration 2630000/27863639, ETA 24 hours.

Thanks a billion! If we meet in RL, you have a beer from me!

(edit: this is in parallel with 4 P-1 on P95, another 8 waiting in the queue, and splitting the terms of aliquot 585000 with 4 threads of yafu! There is no noticeable delay, nothing at all except a lot of heat coming from under the desk...)
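As a rough sanity check on those ETAs, the time per iteration can be backed out from the remaining iteration count. This is only a sketch: the function name is made up, and since the ETAs are reported in whole hours the result is approximate.

```python
def implied_ms_per_iter(done: int, total: int, eta_hours: float) -> float:
    """Milliseconds per iteration implied by an ETA for the remaining work."""
    remaining = total - done
    return eta_hours * 3600.0 * 1000.0 / remaining


# Progress lines quoted in the post above.
first = implied_ms_per_iter(9650000, 27777653, 18)
second = implied_ms_per_iter(2630000, 27863639, 24)

if __name__ == "__main__":
    print(f"first run:  {first:.2f} ms/iteration")
    print(f"second run: {second:.2f} ms/iteration")
```

Both come out at about 3.4 to 3.6 ms per iteration, in line with the roughly 3.5 ms seen for a single copy on one GPU, which suggests each instance now has a GPU to itself.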

bcp19 2011-11-14 20:13

No pictures?

Dubslow 2011-11-14 22:07

At least 8 threads... but good performance, so not a Bulldozer?

LaurV 2011-11-15 03:32

[QUOTE=Dubslow;278352] so not a Bulldozer?[/QUOTE]
Definitely not. I have read bad things about AMD, right here on this forum :D
About the photos, I really tried, but the forum's 240 kB limit pissed me off: I have to either lower the resolution or use heavy JPEG compression, and in either case you can't see anything clearly... I will try again tonight.


All times are UTC.
