GPU LL Testing FAQ
Q. Can I use my GPU for LL testing?
A. Yes, with the program called CUDALucas, which runs on supported NVIDIA GPUs.

Q. Where can I get the software?
A. See the attached PDF guide.

Q. How do I get work for CUDALucas?
A. You will need to get work manually from PrimeNet, as with mfaktc; however, CUDALucas currently doesn't use a worktodo file. You test an exponent by passing it as a command line argument, or, to test multiple exponents, use a batch file.

Q. Where are the results stored?
A. mersarch.txt

PDF Guide: [URL="http://www.mersenneforum.org/attachments/pdfs/GIMPS_GPU_Computing_Cheat_Sheet.pdf"]GIMPS GPU Computing Cheat Sheet (pdf)[/URL] |
Q: Do you have examples of command line entries to run CUDALucas?
I know I can type cudalucas 5412xxx and it'll start running, but how do I get information like mfaktc gives (i.e. time spent, time left), as well as this checkpoint file it says it cannot find when I start it? |
The brief batch file example is actually in [URL="http://www.mersenneforum.org/showthread.php?t=15545"]Best 4XX series GPU[/URL].
But I'll reprise here, with spaces between lines added for readability.

First, create a new text file with whatever name you like, but change the extension from .txt to .bat. Windows will warn about changing extensions; click Yes or OK, or whatever the affirmative answer is. EXAMPLE: CUDALucas.bat (CUDALucas referred to as "CL" hereafter.) Once you name the file with .bat, you must right-click on it and choose Edit from the menu; if you double-click, it will try to run the batch file.

Then type the following lines into the opened (R-click > Edit) file:

e:
(Only if you need to be on a different drive from where the prompt starts. E: happens to be where my CL is. Not needed if you're already on the right drive.)

cd \CUDA\CUDALucas.1.2b
(Changes to the CL directory.)

CUDALucas_cuda3-2_sm_13_WIN64.exe -c10000 5318xxxx

pause

The third line has [program name], then a space, then -c[#######] (which sets the number of iterations between screen outputs; I used 10000 because it was in the sample command line in Brain's FAQ), then a space, then [put exponent here]. Don't put any of the square brackets in, just the numbers or names. The third line may be repeated as many times as you desire, just with different exponents; this will keep feeding CL non-stop for long runs. There are quite a few other command line switches for CL which can be added to the third line after the program name, but I'm not entirely sure what many of them do. They are always preceded by a "-" (dash).

The last line, "pause", is optional. It should hold the prompt window open so you can see the final output when the program runs out of work and stops. I have yet to get there with CL (no completed runs), so I don't know what that final screen would look like. It will end with "Press any key to continue....", and when you press one, the prompt will close. Note that the prompt window will have the name of the batch file in the title bar.

Similar batch files can be used for mfaktc.
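Putting it all together, a complete batch file along the above lines might look like this (a sketch based on this post; the drive letter, folder, executable name and the masked exponents are examples to be replaced with your own assignments):

[CODE]rem switch to the drive and folder where CL lives
e:
cd \CUDA\CUDALucas.1.2b

rem one exponent per line; each run starts when the previous one finishes
CUDALucas_cuda3-2_sm_13_WIN64.exe -c10000 5318xxxx
CUDALucas_cuda3-2_sm_13_WIN64.exe -c10000 5319xxxx

rem hold the window open after the last run
pause[/CODE] |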
[QUOTE=bcp19;274771]Q: Do you have examples of command line entries to run CUDALucas?
I know I can type cudalucas 5412xxx and it'll start running, but how do I get information like mfaktc gives (i.e. time spent, time left), as well as this checkpoint file it says it cannot find when I start it?[/QUOTE] See the previous post on batch files. But to address your questions:
1) I don't know how to get the kind of info that mfaktc puts out. Sorry. :no:
2) The checkpoint file message only occurs the first time you start the program with an exponent. If it does not find a checkpoint file, it starts the exponent from the beginning. After that, it will have created one in the program folder, so it can pick up processing from where it left off. |
[QUOTE=bcp19;274771]Q: Do you have examples of command line entries to run CUDALucas?
I know I can type cudalucas 5412xxx and it'll start running, but how do I get information like mfaktc gives (i.e. time spent, time left), as well as this checkpoint file it says it cannot find when I start it?[/QUOTE] cudalucas.exe -c10000 54123893

That will run CUDALucas so that it outputs the time between each 10000 iterations, as well as what FFT size it's using, and current progress. (It doesn't give an ETA, but that's easy to calculate given the time per x iterations.)
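For example (a rough sketch with made-up timings): if each block of 10000 iterations takes about 20 seconds, that is 2 ms/iter; an LL test of M54123893 needs about 54.1M iterations in total, so the whole test takes roughly 54,123,893 x 0.002 s, which is about 108,000 s or 30 hours, and the ETA is simply that total minus the time already spent. |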
[QUOTE=Dubslow;274780]cudalucas.exe -c10000 54123893
That will run CUDALucas so that it outputs the time between each 10000 iterations, as well as what FFT size it's using, and current progress. (It doesn't give an ETA, but that's easy to calculate given the time per x iterations.)[/QUOTE] Are you running the 32 or 64 bit version? I'm running the 32 bit version, and all I see every 10k iterations is:

Iteration 10000 M( xxxxx )C, 0x90e3daf558958134, n = 4194304, CUDALucas v1.2

Unless I sit and time how long it takes from one output to the next (I'm not that interested), I have no time reference to use. |
You need to add the -t:
CUDALucas.cuda3.2.sm_13.WIN64.exe -t -c10000 54123893

Then you get output like this:

Iteration 10000 M( xxxxxxxx )C, 0x1aa69a25e1ccb38b, n = 4194304, CUDALucas v1.2b (7:36 real, 18.2564 ms/iter, ETA 201:42:26) |
[QUOTE=ATH;274852]You need to add the -t:
CUDALucas.cuda3.2.sm_13.WIN64.exe -t -c10000 54123893
then you get output like this:
Iteration 10000 M( xxxxxxxx )C, 0x1aa69a25e1ccb38b, n = 4194304, CUDALucas v1.2b (7:36 real, 18.2564 ms/iter, ETA 201:42:26)[/QUOTE] That's a bit of useful information! :smile:

Luigi |
[QUOTE=ATH;274852]You need to add the -t:
CUDALucas.cuda3.2.sm_13.WIN64.exe -t -c10000 54123893[/QUOTE] Thanks for that, ATH! |
[QUOTE=ATH;274852]You need to add the -t:
CUDALucas.cuda3.2.sm_13.WIN64.exe -t -c10000 54123893
then you get output like this:
Iteration 10000 M( xxxxxxxx )C, 0x1aa69a25e1ccb38b, n = 4194304, CUDALucas v1.2b (7:36 real, 18.2564 ms/iter, ETA 201:42:26)[/QUOTE] Thanks! Just what I was looking for. |
Sorry for the double post, it won't let me edit my last. It must be something in the 32 bit version; even using the -t parameter I am not getting any timing indications. :(
|
I haven't checked in 32 bit, I don't think. It worked in 64 bit. I'll try it the next time I boot into XP-32.
I am running CUDALucas.cuda4.0.sm_13.WIN64 and CUDALucas.1.2.Win32 |
Again, weird. When I run it just with the -c flag, I get outputs of the following form:
Iteration 10000 2:52 real M( 53---xxx )C, 0xxxx----xxxx----, n = 4194304, CUDALucas v1.2
indicating there were 2 minutes 52 seconds from the previous output. Running the precompiled Win7 64 bit version. |
I get no timing without the -t flag but I'm using version 1.2b while you are using 1.2. Links to 1.2b and the 2 dll-files are in the pdf guide.
|
[QUOTE=ATH;274944]I get no timing without the -t flag but I'm using version 1.2b while you are using 1.2. Links to 1.2b and the 2 dll-files are in the pdf guide.[/QUOTE]
Doesn't the 1.2b version need to be compiled? The 1.2 I downloaded was precompiled as I am not that computer savvy. Also, the 1.2b link on the PDF has a 64 bit executable in it, which won't run on a 32 bit machine. |
[QUOTE=ET_;274858]That's a bit of useful information! :smile:
Luigi[/QUOTE] Too bad that CUDALucas v1.2 doesn't work on Linux 64 bit...

Luigi |
How does it not work? For me, after numerous issues getting it to compile at all, or even to start, it now throws some error about device count.
|
Hey, Garo, while we are on the subject, CUDALucas does require compute capability 2.0 or so. That means I can't put my low-end GT220 to work with CUDALucas... not that I'm upset, but it's worth noting that minimum requirement.
|
[QUOTE=Christenson;275144]Hey, Garo, while we are on the subject, CUDALucas does require compute capability 2.0 or so. That means I can't put my low-end GT220 to work with CUDALucas... not that I'm upset, but it's worth noting that minimum requirement.[/QUOTE]
Compute capability 1.3. I'm running CUDALucas with a GTX 275.

Luigi |
So, my GTS 250 with compute capability 1.1 cannot be used with CUDALucas at all?
|
[QUOTE=Wizzard;277366]So, my GTS 250 with compute capability 1.1 cannot be used with CUDALucas at all?[/QUOTE]
Correct!

Oliver |
[QUOTE=kladner;274778]The brief batch file example is actually in [URL="http://www.mersenneforum.org/showthread.php?t=15545"]Best 4XX series GPU[/URL].
But I'll reprise here. <snip> EXAMPLE: cd \CUDA\CUDALucas.1.2b (Changes to the CL directory.) CUDALucas_cuda3-2_sm_13_WIN64.exe -c10000 5318xxxx pause The third line has [program name], then a space, then -c[#######] (which sets the number of iterations between screen outputs; I used 10000 because it was in the sample command line in Brain's FAQ), then a space, then [/QUOTE] Thanks for the small tutorial. [URL="http://www.mersenneforum.org/showpost.php?p=277670&postcount=706"]My babe[/URL] arrived last Saturday and I spent the weekend installing stuff on it. I will definitely go for CudaLucas on DC exponents for a while, until I am convinced that all residues match; then I will switch to some other jobs, like LL-front or the so-much-debated TF-front. That is my choice for now, so I don't want to hear any argument.

So, CudaLucas is installed and running. So far so good. I use the 64 bit version, on Win7. Just as a small observation, the -c[xxx] switch does not work; no matter what I put there, it will still output every 10k iterations on screen (has anyone tried a value other than the default one?). This is a minor problem, and it is just FYI; of course I can live with it.

[B]My biggest problem is that I don't know how to convince CudaLucas (or a second/third, etc. copy of it) to run on the second GPU. Can anyone help?[/B] I have carefully read all 36 pages of the GPU thread on the forum (and related ones) but did not find much. If I start one copy of CudaLucas, about 75-80% of the first GPU is busy, and I get about 3.5 ms per iteration (~25-30M range). If I start a second copy, the same first GPU goes to 99%, and each CL process slows to about 4.5 ms per iteration. Still reasonable. If I continue to launch copies of CL, they will all fight for the same GPU (with the time per iteration getting worse accordingly). The other GPU stays completely idle. I also tried the 64 bit CL for CUDA 4.0, same result. Also, the -t switch does not seem to work for any of them. CUDA capability is 2.0. Is there any switch I am missing for CL? |
[QUOTE=LaurV;278210]Thanks for the small tutorial. <snip>
[B]My biggest problem is that I don't know how to convince CudaLucas (or a second/third, etc. copy of it) to run on the second GPU. Can anyone help?[/B] <snip> Also, the -t switch does not seem to work for any of them. CUDA capability is 2.0. Is there any switch I am missing for CL?[/QUOTE] The -c switch is for how many iterations between outputs to the checkpoint file, not the screen output.

The -t switch only seems to be working on the 1.2b version.

Open a command prompt, change to your CUDALucas directory and type cudalucas /? to get a list of switches.

Unfortunately, I have no clue on getting CUDALucas to work on the 2nd GPU. |
[QUOTE=LaurV;278210] [URL="http://www.mersenneforum.org/showpost.php?p=277670&postcount=706"]My babe[/URL][/QUOTE]
Wait, 1.2 TFLOPS? What the hell is on that thing? |
[QUOTE=bcp19;278212]The -c switch is for how many iterations between outputs to the checkpoint file, not the screen output.
<snip>
Unfortunately, I have no clue on getting CUDALucas to work on the 2nd GPU.[/QUOTE] Thanks. The /? I figured out at the very beginning; it is the first thing one does when he gets a new toy, he writes "toy /?" at the command prompt :D About -t, I figured it out on the forum just before reading your post. Eager to go home in the evening to try. About -c, I did not know; thanks for telling me. Somehow I think that the "printf" used there is just as slow as a disk write, especially when you have an SSD, and I wonder why -c does not affect the screen output too. I mean, if I use a redirection to a file, that is writing to disk anyhow. So -c should affect both the screen output and the checkpoint file. Output to screen every 100k, or even less often for a bigger exponent, would be OK. Whatever...

It seems I still can't find how to run CudaLucas on both GPUs, and up to now the only profitable solution, so as not to let the second GPU sleep, is to run one CudaLucas and one mfaktc. (I am aware of the -d switch of mfaktc, which selects the GPU. I have not tried mfaktc yet; I would still prefer to run more CudaLucas instances, as that would leave the CPUs free to do P-1. I am also aware that SLI should be disabled for that to work, as someone said in another thread here. For CudaLucas I have tried both SLI and no-SLI, and I cannot cheat it into running different copies on different GPUs.)

Conspiracy theory: I am sure someone knows the answer, but they refuse to tell me, to make me run mfaktc (and therefore TF, see the big debate around) :P:P:smile: |
[QUOTE=Dubslow;278220]Wait, 1.2 TFLOPS? What the hell is on that thing?[/QUOTE]
You are right! The hell is in that thing! And it is (theoretical) 1.3, not 1.2. I will put a photo when I get home, if you tell me how to run cudalucas on both gpu's. :smile: |
Erm... sorry, I don't know.
What's the hardware? |
A quick look at the source indicates that the unadvertised -D switch selects the GPU. GPU numbering starts at 0, so with two GPU's use -D0 and -D1.
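
For example (an untested sketch, reusing the 1.2b executable name and masked exponents from earlier posts), two instances could be launched side by side from one batch file:

[CODE]rem first instance on GPU 0
start CUDALucas_cuda3-2_sm_13_WIN64.exe -t -c10000 -D0 2777xxxx
rem second instance on GPU 1
start CUDALucas_cuda3-2_sm_13_WIN64.exe -t -c10000 -D1 2786xxxx[/CODE]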
|
[QUOTE=frmky;278232]A quick look at the source indicates that the unadvertised -D switch selects the GPU. GPU numbering starts at 0, so with two GPU's use -D0 and -D1.[/QUOTE]
Now we all will eagerly wait for LaurV and his photos... :smile: Luigi |
[QUOTE=frmky;278232]A quick look at the source indicates that the unadvertised -D switch selects the GPU. GPU numbering starts at 0, so with two GPU's use -D0 and -D1.[/QUOTE]
Wow! Amazing! That works! And there is no need to disable SLI. I used uppercase D (did not try lowercase d). Iteration 9650000/27777653, ETA 18 hours, and (the one started later) Iteration 2630000/27863639, ETA 24 hours. Thanks a billion! If we meet in RL, you get a beer from me! (Edit: this is in parallel with 4 P-1 runs in P95, another 8 waiting in the queue, and splitting the terms of aliquot 585000 with 4 threads of yafu! It feels no delay; there is nothing except a lot of heat coming from under the desk...) |
No pictures?
|
At least 8 threads... but good performance, so not a Bulldozer?
|
[QUOTE=Dubslow;278352] so not a Bulldozer?[/QUOTE]
Definitely not. I have read bad things about AMD, right here on this forum :D

About the photos, I really tried, but the 240kB limitation of the forum pissed me off. I have to make them either low resolution or use heavy JPG compression, and in either case you can't see anything clearly... I will try again tonight. |
Just upload them to filesmelt.com and provide links... that's what I do when I can't get attachments here to work.
Also... way to avoid the question :smile: |
[QUOTE=Dubslow;278352]At least 8 threads... but good performance, so not a Bulldozer?[/QUOTE]
Hehe, and as it [URL="http://www.overclock.net/t/1141188/asus-crosshair-v-formula-board-may-have-hampered-bulldozer"]turns out[/URL], this was a very "[URL="http://www.xbitlabs.com/news/cpu/display/20111013232215_Ex_AMD_Engineer_Explains_Bulldozer_Fiasco.html"]smart[/URL]" move by AMD. They forced all Bulldozer reviewers to use a mainboard that really sucked with Bulldozer. When using a board that not just runs Bulldozer, but really [B]supports[/B] it, performance returns to usable figures. Still not the performance that would live up to the CPU's name and flatten everything. |
[QUOTE=LaurV;278407]Definitely not. I have read bad things about AMD, right here on this forum :D
<snip>[/QUOTE] Hmm... we still haven't found out. Can you please tell us? |
After I've run CudaLucas and have some results, where do I post them?
Do I send them somewhere, or what? |
Send the results to the PrimeNet server using the Manual Testing pages (be sure to be logged in to get the credit for them).
|
To expand on what lycorn said, click the (+) next to Manual Testing and then select Results
|
Thank you :)
|
100M digits
CudaLucas is running pretty well for 55.xxx.xxx exponents. :cool:
But it doesn't work for 332.xxx.xxx. What can I do? |
Hi, f11ksx
[QUOTE=f11ksx;283514]But it doesn't work for 332.xxx.xxx. What can I do?[/QUOTE] This is a memory size issue. My GTX 550 Ti (1 GB) is not enough for M332220523. Would somebody like to give it a try? |
Hi msft,
I have 1536 MB with the GTX 580. Do you mean it is not enough, and there is no solution? :cry: |
[QUOTE=f11ksx;283566]
I have 1536 MB with the GTX 580. Do you mean it is not enough, and there is no solution? :cry:[/QUOTE]Correct. For each exponent range, there is a minimum FFT size needed, and no way to reduce that requirement. There are tables of FFT size vs. exponent range around here somewhere. |
[QUOTE=f11ksx;283566]I have 1536 MB with the GTX 580.
Do you mean it is not enough, and there is no solution? :cry:[/QUOTE] It is not enough with CUDALucas 1.3, but it may be enough with CUDALucas 1.4, I guess.
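For a rough idea of why the memory runs out (a back-of-the-envelope estimate of my own, not from any documentation): earlier posts show n = 4194304, a 4M-point FFT, being used for ~54M exponents, i.e. about 13 bits per FFT word. At that density, M332220523 needs an FFT of roughly 26M points, which a power-of-2-only version has to round up to 32M. A single 32M-point double-precision array is already 32M x 8 bytes = 256 MB, and with the several working buffers plus FFT scratch space that are alive at once, the total can easily exceed the 1-1.5 GB on these cards. A version with non-power-of-2 FFT lengths can pick a size closer to 26M points and might just fit. |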
Thank you for the answers :smile:
|
Thoughts...
So I finally got around to installing CUDALucas 1.2b to use with my GTX 580s. I've been TFing 8 instances with two HD 5870s and two 580s. The 580s are faster.

I dropped one instance of mfaktc for CUDALucas and I can't believe how fast it is for LL testing. I haven't run a full 4 cores on one LL in a while, but even with 3 instances of mfaktc running, the LL is only going to take ~60 hours. I set 3 cores to run mfaktc and one core for CUDA. On my QX9650, if I don't set it up that way it makes the system slow, because TFing drives the cores to 100% on the nVidia cards.

So, the reason for my post is that I kinda feel like I'm wasting time using CPUs to LL or TF anymore. I have several systems that are running LLs that might be better off doing something else. I know it's better to have them do something rather than just sit, but in the time it takes them to do one LL I could finish all my current assignments with CUDA (and that's just using the one 580). Once 580s and 590s (and whatever else is coming) drop in price, we're going to be able to make a huge dent in LLing and TFing. And the other systems can work on P-1 or easier DC checks. I can't wait to pick up some more cards that can run CUDA. Hopefully Windows 8 will fix the Bulldozer problem so I can use some of that system for LL or TF also.

Just curious what everyone's thoughts are on this?

[QUOTE=LaurV;278227]You are right! The hell is in that thing! And it is (theoretical) 1.3, not 1.2. I will put a photo when I get home, if you tell me how to run cudalucas on both gpu's. :smile:[/QUOTE] BTW - LaurV, still curious as to what you ordered... I've seen some of the [URL="http://www.throughwave.co.th/resources/products/supermicro/gpu_4page.pdf"]SuperMicro[/URL] GPU supercomputing server solutions. Is it something like that? Pictures?? :bow: |
[QUOTE=flashjh;283909]
I dropped one instance of mfaktc for CUDALucas and I can't believe how fast it is for LL testing. I haven't run a full 4 cores on one LL in a while, but even with 3 instances of mfaktc running, the LL is only going to take ~60 hours.[/QUOTE] That's some serious performance! I'm a bit curious about your setup:
1) What's the size of your exponent? Are we talking LL or DC here?
2) How long does your CPU take for the same exponent?
3) Did you consider that your CPU, if it's a 4 core, could do 4 tests in parallel?
4) While using CUDALucas, what is the performance of your mfaktc instance? Exponent size, bit level factored and SievePrimes depth?

Also, when using CUDALucas the core in the CPU basically does nothing; you can run an LL test on it with little to no impact on performance.

Thanks, |
Some info
[QUOTE=diamonddave;283926]That's some serious performance! I'm a bit curious about your setup:[/QUOTE]
This is a QX9650 with 8GB DDR2-1066, 2 MSI GTX 580s, on a GA-EP45-UD3P. Boot overclock is 9.0 multiplier, 450 FSB, memory set to 2.40B. Then I downclock with EasyTune6 to 290 FSB - I haven't figured out why I get [U]much[/U] better performance with that, and it stays a lot cooler. I have the 3 mfaktc instances all using cores 1-3, not individually assigned, and CUDA assigned to core 4. All 3 mfaktc instances use GPU 1 and CUDA uses GPU 2.

[QUOTE]1) What's the size of your exponent? Are we talking LL or DC here?[/QUOTE] The TFs vary; right now I'm running 69-72 or 70-72 with no stages on 49XXXXXX to 52XXXXXX. The LL is a first time test, 4524XXXX. I haven't tested anything higher; I asked GPU to 72 for Lucas-Lehmer assignments.

[QUOTE]2) How long does your CPU take for the same exponent?[/QUOTE] I haven't run an LL with this setup, but I'll get Prime95 installed and test it to see when it would finish the same exponent.

[QUOTE]3) Did you consider that your CPU, if it's a 4 core, could do 4 tests in parallel?[/QUOTE] Do you mean stop the mfaktc instances and run 4 LLs?

[QUOTE]4) While using CUDALucas, what is the performance of your mfaktc instance? Exponent size, bit level factored and SievePrimes depth?[/QUOTE] All three of these are running 70-72 on a 4915XXXX exponent.

mfaktc1: [CODE]class | candidates | time | ETA | avg. rate | SievePrimes | CPU wait
657/4620 | 1.91G | 10.812s | 2h28m | 176.80M/s | 6153 | 2.40%[/CODE] mfaktc2: [CODE]class | candidates | time | ETA | avg. rate | SievePrimes | CPU wait
2316/4620 | 1.89G | 11.658s | 1h33m | 161.81M/s | 7033 | 2.11%[/CODE] mfaktc3: [CODE]class | candidates | time | ETA | avg. rate | SievePrimes | CPU wait
2280/4620 | 1.89G | 12.222s | 1h38m | 154.34M/s | 7033 | 2.19%[/CODE] CUDA: [CODE]Iteration 13990000 M( 4524XXXX )C, 0x0fc83c04f4e74388, n = 4194304, CUDALucas v1.2b (0:52 real, 5.1693 ms/iter, ETA 44:52:20)[/CODE]

[QUOTE]Also, when using CUDALucas the core in the CPU basically does nothing; you can run an LL test on it with little to no impact on performance. Thanks,[/QUOTE] I hadn't thought of that. When I test throughput for the LL, I'll see what effect the CPU LL has on the system. Maybe I can run that too - which leads me back to the original post of what to do with all the extra CPUs. TF on GPU kinda makes a person impatient with LL on CPU. I guess I need to set it and forget it. |
On a 2600, I can get one of those LLs done in slightly less than a month, so three per month with one core left for mfaktc. That fourth core of yours is doing literally nothing at the moment; task manager should be reporting 1 or 2% usage. If you run an LL on that core, memory restrictions will reduce mfaktc throughput by 1 or 2%, which is minor compared to the LL work you'd be doing. It may or may not affect CUDALucas, and if it does, the effect will be even smaller than for mfaktc.

Some notes: CUDALucas I believe is up to version 1.4. Also, CUDALucas, in general, gets around 1/5th-1/4th of the throughput of mfakt*, measured in PrimeNet's GHz-days metric. This is because the LL test is only sort of parallelizable, whereas TF is so-called 'embarrassingly parallel'. Thus most people run mfakt* on the GPUs, and keep the LL on the CPU simply because that's what it's most efficient at. Some people use CUDALucas anyway because they don't care about PrimeNet GHz-days, and there's also the fact that P-1 factoring currently has no GPU equivalent and PrimeNet always has need there. If you can't wait for LL on CPU, then do P-1 factoring with that extra core. (Or TF-LMH, but P-1 would be more useful, I think.) (Edit: You could also run DCs.)
|
For information: I run LL tests with CudaLucas in 5 days for exponents around 50.xxx.xxx, on a GTX 580 card.
|
[QUOTE=flashjh;283909]So, the reason for my post is that I kinda feel like I'm wasting time using CPUs to LL or TF anymore.
[/QUOTE] That is what everybody (including me) has been saying for ages around here. See all the discussion in the GPU272 thread, too. TF-ing on CPU has not made any sense for years; the very first GPUs already ran circles around CPUs. The new Fermis are faster for LL/DC too. Usually a DC test takes below 24 hours on the hardware you got (how high are the 580s clocked?), and a first-time LL in the 48M range takes below 65 hours (like the one you gave as an example). But be aware that CL uses power-of-two FFT sizes, which is why the time does not increase in the same fashion as for P95. One ~55M exponent will take double the time of a 48M exponent, as it will need to use a double-size FFT. So you will get about 130 hours for a 55M exponent, and the time is then almost constant (increasing very little, as higher exponents need more iterations, but the time per iteration is almost constant) up to 80M or so, where it doubles again (next FFT step).

Currently I am doing 130 hours per LL in the upper 50M area, and 24 hours per DC in the 28M-32M area, per GPU, with a single copy of CL running on each GPU, and that will almost max out the GPU. Unfortunately mfaktc does not seem to take full advantage of the Fermis: the internal memory is not used at all, and it relies on the CPU for sieving. I need to put all 4/8 cores into 4 or 6 copies of mfaktc to be able to max out the two GPUs with them, and in this case the computer can't do anything else without decreasing the GPU occupation percentage. To keep the GPUs at max, I need to keep the computer otherwise "idle". That is why I would prefer to use CL for DC on one GPU, and two or three copies of mfaktc to TF at the LL front on the second GPU. This is the optimum performance.

At the DC front you can clear one exponent per day per GPU. This is the fastest-ever method to clear exponents. With trial factoring at the DC front you will NOT find a factor each day. Some days you can test 50 exponents for 2-3 bit levels, or combinations of these (100-300 GHz-days/day) and find 1, 2, 3 factors, but over the next 5, 7, 15, etc. days you may find none. TF is a "lucky draw"; DC is "sure". With DC at the DC front, you will clear one exponent per day, per GPU, no question! And (AND!) this will leave your CPU free, so you can still do some P-1 testing on it. Or another DC, if you like, using P95: for a 3 GHz processor you will get about 15-20 ms per iteration using one core, so you can get one DC out every week or two. That is, with a Fermi and one (ONE!) CPU core, you can clear 35 exponents per month, at least, if you decide to work at the DC front. If you decide for the LL front, the things are a bit different, and I have explained them (not only once) in the GPU-2-72 topic.
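To make the stair-step arithmetic concrete (illustrative numbers of my own, not measurements): at a fixed FFT size the time per iteration is roughly constant, and an LL test needs about as many iterations as the exponent. So a 48M exponent that still fits the 4M-point FFT at, say, 4.5 ms/iter takes about 48,000,000 x 0.0045 s = 216,000 s, or roughly 60 hours, while a 55M exponent pushed onto the 8M-point FFT at about double that, 8.5 ms/iter, takes about 55,000,000 x 0.0085 s = 467,500 s, or roughly 130 hours. |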
For me at least, I can (almost) max out my one GPU (a 460) with one of my four CPU cores, so mfaktc/TF makes more sense. I think it varies more with hardware setup than with actual stats and total throughput etc.. (Do you type a .. ?)
Note to flash: For reference, PrimeNet reports expected 5 days for 25M, and 19 days for 45M. |
[QUOTE=LaurV;284025]That is what everybody (including me) has been saying for ages around here. See all the discussion in the GPU272 thread, too. <snip>
That is, with a Fermi and one (ONE!) CPU core, you can clear 35 exponents per month, at least, if you decide to work at the DC front. If you decide for the LL front, the things are a bit different, and I have explained them (not only once) in the GPU-2-72 topic.[/QUOTE] Thanks for the breakdown. Once my LLs finish up, I'll look into using that GPU for DC. That will leave 7 instances still running TF. |
GPU Computing Guide Update to v 0.07
Hi,
here is an updated version of the GPU Computing Guide. Changes:
- New versions of mfaktc, mfakto and CUDALucas. Links to all binaries...
- Missing CUDA 3.2/4.0 libs for CUDALucas can be downloaded, see page 2
Please check for major bugs. If it is valid, maybe an admin could update the stickies...

Happy new year,
Brain

[URL="http://www.mersenneforum.org/attachments/pdfs/GIMPS_GPU_Computing_Cheat_Sheet.pdf"]GIMPS GPU Computing Cheat Sheet (pdf)[/URL] |
1 Attachment(s)
[QUOTE=Dubslow;284026]For me at least, I can (almost) max out my one GPU (a 460) with one of my four CPU cores, so mfaktc/TF makes more sense. I think it varies more with hardware setup than with actual stats and total throughput etc.. (Do you type a .. ?)
Note to flash: For reference, PrimeNet reports expected 5 days for 25M, and 19 days for 45M.[/QUOTE] I won't exactly call a 460 a "Fermi"; it contains the first version of the Fermi chip, for which a double multiply took 4 times as long as a single multiply operation. For that reason, TF is more profitable there. On a "real" Fermi (5x0, Tesla, Quadro, newer stuff with fast double precision inside), TF becomes "just a bit" faster (due to the clock increase), but CudaLucas becomes MUCH faster.

[B]At the DC front[/B], anyhow, it makes no sense to TF anything beyond 68-69 bits, regardless of what GPU you have. Look at the GPU-2-72 status: people found a DC factor every 1.5-2.2 days on average, and "low level" bits (65-68 bits) are at end of life. For 69-70-etc. bits, it will take even more time per factor. So why should I (here "I" means "any owner of a Fermi GPU card") waste double the time to TF at the DC front, when I can directly LL-DC the exponents (that is, LL at the DC front), get rid of one exponent EVERY day, and more, while having a CPU core free for P95 DC or P-1, or whatever?

[B]At the LL front[/B] the things are different, because [B]a factor found by TF[/B] (every 3-5 days with an average GPU, as it seems now, or say every 2-3 days with a high-end GPU) [B]will save TWO tests[/B] (LLs) [B]AND some P-1 testing on the CPU[/B]. That is, every factor found saves about 10 days of LL work with the BEST GPU around, or two months of work with the best CPU around (one core). As long as we are still finding factors faster (more often than 10 days per factor) by TF, we should "raise" the bit level and do TF on GPUs. But we should do LL tests with CudaLucas for all "optimum FFT lengths", regardless of whether they are on the LL front or the DC front.

People don't really get how CudaLucas works, and why the time per test is almost constant for a very long range of exponents, then instantly doubles for the next exponent. CL uses FFTs whose lengths are powers of 2, contrary to P95, which has a finer "granulation" of FFT sizes. Put as a graph, it would look like the attached picture. That is, CL is "not optimum" in the purple areas; it could use a smaller FFT there and get the test done faster. Unfortunately, the LL front is right now exactly in such a "purple" area. (I did not put any numbers on the graph; in fact I deleted the numbers. This was done on purpose, as the numbers vary depending on the hardware.) The times for P95 also form stairs, but with finer granulation, as P95 "adapts" the FFT size to the exponent size much better than CL does. But CL does the multiplication in parallel on the GPU, getting a better time per iteration than P95. For the same FFT size, the time per iteration is (theoretically) the same regardless of the exponent. The total time increases a little as the exponent increases, because more iterations are necessary for a bigger exponent; that is why the stairs are not horizontal. They become "optimum" at their "ends" (marked green on the CL graph), both for P95 and CL. |
[QUOTE=LaurV;284040]
[B]At the DC front[/B], anyhow, it makes no sense to TF anything beyond 68-69 bits, regardless of what GPU you have.[/QUOTE] I agree here. My only point is that it makes more sense for me to run mfaktc than CUDALucas, regardless of what assignments people are doing or should be doing etc. I can get (almost) full GPU utilization with only one of four cores with mfaktc. Therefore I run mfaktc. This decision has nothing to do with GIMPS/PrimeNet assignments/status. |
[QUOTE=LaurV;284040]People don't really get how CudaLucas works, and why the time per test is almost constant for a very long range of exponents, then instantly doubles for the next exponent. CL uses FFTs whose lengths are powers of 2, contrary to P95, which has a finer "granulation" of FFT sizes.[/QUOTE]
This does not apply to CUDALucas >= 1.4 any more; msft implemented non-power-of-2 FFTs. But that version is only a few days old and still being tested. |
[QUOTE=Brain;284038]Hi,
here is an updated version of the GPU Computing Guide. Changes: - New versions of mfaktc, mfakto and CUDALucas. Links to all binaries... - Missing CUDA 3.2/4.0 libs for CUDALucas can be downloaded, see page 2. Please check for major bugs. If it is valid, maybe an admin could update the stickies... Happy new year, Brain[/QUOTE] The most recent mfakto readme says something to the effect of "With 11.07+ in Win, you do not need the SDK. Same for 11.11+ in Linux." I would double check to be sure, though.

What about CUDALucas 1.3? Is that of use?

Also, I would consider removing "MOST NEEDED GIMPS WORK TYPE" from CUDALucas. Because all TF <60M has been moved to GPUs only, one could make a decent argument that we're short on TF. GPU272 is barely keeping up with the 45M-55M work, much less the current wavefront. (Obviously what I say is not final, but I think it's worth consideration.)

Suggestion: Move the link for LESS_CLASSES mfaktc to the remarks section, next to where you talk about efficiency. (Maybe specifically mention LMH?) |
V1.41
CudaLucas v1.41 is running pretty well! :tu:
9.3 ms/iter for a 54M exponent on a GTX 580 card. Thanks a lot. |
[QUOTE=f11ksx;284207]CudaLucas v1.41 is running pretty well! :tu:
9.3 ms/iter for a 54M exponent on a GTX 580 card. Thanks a lot.[/QUOTE] Maybe you could/should join the discussion in the [URL="http://www.mersenneforum.org/showthread.php?t=12576"]CUDALucas thread[/URL]. Could you test and report there whether CUDALucas >= 1.4 now uses CPU resources if the -c flag is set?

[QUOTE=Dubslow;284102] What about CUDALucas 1.3? Is that of use?[/QUOTE] There are two 1.3 versions: one by Ethan (EO), which is older (a tuned 1.2b) but laggy for me, and another 1.3 version by msft, which has additional timing output. As there is a 1.4 (by msft), I'd like to skip 1.3 to avoid confusion... |
GPU Computing Guide Update to v 0.07a
Changes:
- CUDALucas 1.4.2
- mfakto requirements

[URL="http://www.mersenneforum.org/attachments/pdfs/GIMPS_GPU_Computing_Cheat_Sheet.pdf"]GIMPS GPU Computing Cheat Sheet (pdf)[/URL] |
GPU Computing Guide Update to v 0.08
Changes:
- CUDALucas 1.48
- mfaktc for CUDA 4.1

Now, we should really update the sticky post #1 attachments. Otherwise, I'd prefer no such file to having outdated files...

[URL="http://www.mersenneforum.org/attachments/pdfs/GIMPS_GPU_Computing_Cheat_Sheet.pdf"]GIMPS GPU Computing Cheat Sheet (pdf)[/URL] |
[QUOTE=Brain;287640]Changes:
- CUDALucas 1.48
- mfaktc for CUDA 4.1
Now, we should really update the sticky post #1 attachments. Otherwise, I'd prefer no such file to having outdated files...[/QUOTE] Thank you, Brain.

I was wondering if the "Restrictions" on FFT size (CUDALucas 1.48) still hold, as it now supports non-power-of-2 FFT sizes.

Another question to the forum readers: when you say "[COLOR="Red"]Compilable with CUDA Toolkit 3.1[/COLOR]", do you mean "the source code compiles, but won't work with CUDA Toolkit < 3.1"?

Luigi |
[QUOTE=ET_;287651]Thank you, Brain.
I was wondering if the "Restrictions" on FFT size (CUDALucas 1.48) still hold, as it now supports non-power-of-2 FFT sizes.[/QUOTE] I have no reliable values, as I never tested 8M FFTs and above. My GTX 560 Ti 1GB will probably be memory limited. Understand the FFT borders as guidelines; I took them from one of msft's posts.

[QUOTE=ET_;287651] Another question to the forum readers: when you say "[COLOR=Red]Compilable with CUDA Toolkit 3.1[/COLOR]", do you mean "the source code compiles, but won't work with CUDA Toolkit < 3.1"?

Luigi[/QUOTE] Oh, it was just my reaction to today's msft response about the CUDA compatibility mode, which works only in CUDA 3.1. I only compiled it for CUDA 4.0 and 4.1. |
[QUOTE=Brain;287654]I have no reliable values, as I never tested 8M FFTs and above. My GTX 560 Ti 1GB will probably be memory limited. Understand the FFT borders as guidelines; I took them from one of msft's posts.
Oh, it was just my reaction to today's msft response about the CUDA compatibility mode, which works only in CUDA 3.1. I only compiled it for CUDA 4.0 and 4.1.[/QUOTE] Shoichiro just answered my question. CUDALucas v1.48 is worthless with my CUDA Toolkit 3.0 and CC 1.3, so I will stick with v1.3 for now.

Luigi |
Brain: For mfaktc you can add/modify this:
"Compilable with: CUDA Toolkit 4.0, 4.1": you can add CUDA Toolkit 3.x to this list, too. I've provided executables for CUDA 4.0 and 4.1, but the source should work with 3.x, too.
"GIMPS score estim.: roughly 75 GHz days/day on GTX 560 Ti & 1 CPU core": you could add that this is limited by the CPU, and the GPU is sitting idle half of the time.

Oliver |
[QUOTE=TheJudger;287660]Brain: For mfaktc you can add/modify this:
<snip>[/QUOTE] Oliver, as the GTX 600 series will come out soon, with a 3x speedup, you'd better think about a way to use some spare GPU processors for sieving... :smile:

Luigi |
[QUOTE=ET_;287661]Oliver, as the GTX 600 series will come out soon, with a 3x speedup, you'd better think about a way to use some spare GPU processors for sieving... :smile:
Luigi[/QUOTE] Or keep it as it is and force GTX 6xx users to run LL on GPU :smile: Oliver |
GPU Computing Guide Update to v 0.08a
[QUOTE=TheJudger;287660]
"GIMPS score estim.: roughly 75 GHz days/day on GTX 560 Ti & 1 CPU core": you could add that this is limited by the CPU, and the GPU is sitting idle half of the time.

Oliver[/QUOTE] My setting: I saturate my CPU with Prime95 and the GPU with CUDALucas. Additionally, I run mfaktc on top. If I understand you correctly, given no other GPU load, you generally recommend running 2 instances? CPU-GPU tick-tock? I know that a single instance does not saturate a modern GPU. I attached the CUDA 3.x info:

[URL="http://www.mersenneforum.org/attachments/pdfs/GIMPS_GPU_Computing_Cheat_Sheet.pdf"]GIMPS GPU Computing Cheat Sheet (pdf)[/URL] |
[QUOTE=TheJudger;287660]Brain: For mfaktc you can add/modify this:
<snip>
"GIMPS score estim.: roughly 75 GHz days/day on GTX 560 Ti & 1 CPU core": you could add that this is limited by the CPU, and the GPU is sitting idle half of the time.

Oliver[/QUOTE] That depends on the speed of the CPU. I only use 1 core with my 560, which is powered by a 4.3 GHz 2500K, and it gets around 140-150 GHz-days/day at 92-95% GPU load. |
Great pdf doc with everything in one place, one correction though:
CUDALucas does not support multi-GPU in CUDA terminology. You may be able to run N different instances on N different GPUs, but you may not run one instance on N GPUs - that's exactly what multi-GPU means. |
GPU Computing Guide Update to v0.09
[QUOTE=Karl M Johnson;288067]Great pdf doc with everything in one place, one correction though:
CUDALucas does not support multi-GPU in CUDA terminology. You may be able to run N different instances on N different GPUs, but you may not run one instance on N GPUs - that's exactly what multi-GPU means.[/QUOTE] Integrated. Changes:
- CUDALucas 1.48 max exponent
- Multi-GPU hint
- Minor structure and text changes

[URL="http://www.mersenneforum.org/attachments/pdfs/GIMPS_GPU_Computing_Cheat_Sheet.pdf"]GIMPS GPU Computing Cheat Sheet (pdf)[/URL] |
GPU Computing Guide Update to v0.10
CUDALucas 1.64 integrated
[URL="http://www.mersenneforum.org/attachments/pdfs/GIMPS_GPU_Computing_Cheat_Sheet.pdf"]GIMPS GPU Computing Cheat Sheet (pdf)[/URL] |
Temps
I live in a fairly cool part of the country and I started GPU TF and LL in the winter. I never had a cooling issue, until today.
Temps got to 61F outside and my system shut down twice. It's not the CPU causing all the heat, it's the dual 580s. I have the cover off and a big fan cooling it now, but that isn't going to cut it in the summer.

My question is, what do you all do to keep your systems cool? I'm not even overclocking...

Thanks |
I'd say either your cards are in a case with absolutely no airflow, or their heatsinks/fans are really, really bad. The 460 I have has good enough cooling (stock cooling from the manufacturer) that even at full load in 80F rooms, it rarely gets above 65C.
I can imagine they give off a lot of heat (their power draw is massive), but either the fans are broken, or after the fans dump the heat it goes nowhere. What temperatures did you see them get up to? |
[QUOTE=flashjh;292057]I have the cover off and a big fan cooling it now, but that isn't going to cut it in the summer.
My question is, what do you all do to keep your systems cool? I'm not even overclocking...

Thanks[/QUOTE] First, I'm not dealing with that kind of firepower. But the first question is: what and where are the intake and exhaust fans on your case? If you can get a look at them, get the brand and model and find out their rated airflow. It is common for there to be an intake fan at the bottom front of the case (assuming a tower). Since adapter cards are in the lower part of the case, they may be in a pretty direct line with that fan. One hopes that it is at least a 120mm.

In general, cases tend to have a front-to-back airflow. If you can ramp that up, it can help a good bit. One part of that is to increase the intake CFM. Assuming a 120mm fan, see if a 38mm-thick fan will fit there, and get the highest RPM you can live with on a noise basis. Here is an extreme example (not in stock): [url]http://www.coolerguys.com/840556090687.html[/url] and another that is in stock: [url]http://www.coolerguys.com/840556021698.html[/url] Note: I could not live with these monsters, 60+ dB, but they really move air.

Then you need to think about getting air out, preferably pulling it out from the immediate vicinity of the video cards. Hence, slot exhaust coolers like this: [url]http://www.coolerguys.com/840556090755.html[/url] The question is, can you fit something like this in when you already have two 2-slot (or 3-slot?!?) video cards in there? If not, side cover off with a 12" high-velocity floor fan blasting right in between the cards may actually be your best bet. Or you can cut a hole in the side panel and mount a 120mm or 140mm fan there. This approach might need some experimentation to find whether blowing in or out works better.

At some point, having two major heat producers down in a corner of the case just points out the limitations of standard case ventilation. Visualize airflow, and be aware of dead-air heat pockets. Xyzzy has referred to the Very Loud boxes he runs; I'm sure he can offer some suggestions. |
One last thing about case airflow: if your motherboard is a standard, not-fancy one, then the PCI slots are such that your cards are almost touching, separated by a few millimeters tops; then one card is dumping the entirety of its heat into the other. You'll have to ask nucleon/Xyzzy how they solve that problem.
|
Thanks for all your replies. Though I have been building systems for many years, I have never had two fusion generators in my case before ;)
As for all the suggestions, I have decent airflow and the cards are not too close together, but I think the stock coolers are just really bad.

[QUOTE=Dubslow;292059]at full load in 80F rooms, it rarely gets above 65C.
What temperatures did you see them get up to?[/QUOTE] How can the cards be lower than room temperature? Mine were around 70C when I checked, but the system didn't shut down while I was around to see it.

If the fan suggestions work, I should be OK. Since the system does nothing but factoring, P-1 and CL, I could also take it all out of the case for better airflow. |
Last I saw, nucleon was doing it with open cases, or no case at all, and half a dozen household fans carefully adjusted to blow on the hot parts.
Xyzzy showed pictures which seemed to depict closed cases, but he did refer to them as very loud. This argues for something like the horrific Delta fan I linked to above. If you have to share a room with such a machine, the side-panel-off-with-external-fan approach might be more tolerable. A 120mm at 4800 RPM makes a nasty, snarly sound; the loudest thing I have is a 92mm that runs flat-out just under 4000, and I keep it throttled back. Also, be aware that the current draw of that Delta is too much to plug into a motherboard's fan headers: either you run it full tilt, or you get a fan controller that can handle the current. Maybe parking the open case where a window-unit A/C can blow into it is the answer for keeping things quieter.

EDIT: What brand and model are these 580s? I am curious about their cooler arrangements. They pretty much have to be heat-pipe coolers. It is possible that they are not getting the best heat transfer from the GPUs; short of going water-cooled, it might be possible to improve things by redoing the heatsink compound. There are GPU water-cooling rigs out there. They solve the problem of heat accumulation by putting the heat exchanger outside the case. |
[QUOTE=flashjh;292062]
How can the cards be lower than room temperature? [/QUOTE] Theoretically they can, but that is not the case here. Maybe you did not notice the "F" after the 80. |
[QUOTE=LaurV;292066]Theoretically they can, but that is not the case here. Maybe you did not notice the "F" after the 80.[/QUOTE]
Yes, the card is closer to 150F. Conventionally, all computer hardware temperatures are reported in Celsius, for at least a few reasons I can see: depending on the particular hardware, 90-100 is the hard cutoff for overheating, and it also has fewer digits than it would in Fahrenheit. (Around 85 is where I might push my 2600, and AMDs are a bit less tolerant of heat; nVidias should have similar tolerances to Intel.)
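(For the unit conversion: F = C x 9/5 + 32, so 65C is 65 x 9/5 + 32 = 149F, i.e. just about 150F, and an 80F room is about 27C.) |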
[QUOTE=LaurV;292066]Theoretically they can, but that is not the case here. Maybe you did not notice the "F" after the 80.[/QUOTE]
Yup, missed that ;) |
Also, don't forget the obvious. Not sure how dusty your computing environment is, but hit the heat sinks & fans with some compressed air and see what comes out.
|
[QUOTE=Dubslow;292068]Yes, the card is closer to 150F. Conventionally, all computer hardware temperatures are reported in Celsius, for at least a few reasons I can see: depending on the particular hardware, 90-100 is the hard cutoff for overheating, and it also has fewer digits than it would in Fahrenheit.[/QUOTE]
I have read that 55C is the "heat wall" for Thuban-based Phenom IIs. Around this point they are said to get unstable, at least, and you are flirting with damage. I don't let mine get over 51. |
[QUOTE=Dubslow;292061]One last thing about case airflow: if your motherboard is a standard, not-fancy one, then the PCI slots are such that your cards are almost touching, separated by a few millimeters tops; then one card is dumping the entirety of its heat into the other. You'll have to ask nucleon/Xyzzy how they solve that problem.[/QUOTE]
I started using one of these to move the middle card of my 3-GPU system for better airflow: [URL]http://www.amazon.com/gp/product/B0058UVVX2/[/URL]
I might use another, to make it a 4-GPU system (it's a 4-slot mobo, in a too-small tower case). |
[QUOTE=kladner;292333]I have read that 55C is the "heat wall" for Thuban-based Phenom IIs. Around this point they are said to get unstable, at least, and you are flirting with damage. I don't let mine get over 51.[/QUOTE]
When I was using a 1055T (hex-core 2.8 GHz) it ran up to 65C on the stock cooler, and I never had stability issues.
Also note, regarding flash, that it was in fact a bad power supply causing his issues, which are now gone. |
[QUOTE=Dubslow;292350]When I was using a 1055T (hex-core 2.8 GHz) it ran up to 65C on the stock cooler, and I never had stability issues.
Also note, regarding flash, that it was in fact a bad power supply causing his issues, which are now gone.[/QUOTE] Thanks for all the help. It turns out my power supply was giving up; the extra heat caused the fans to go faster and the system was shutting down. I replaced the PS, but I'm also upgrading the CPU cooler. I would really like to get the 580s cooled off, but without moving to liquid, I don't think there's much I can do. |
[QUOTE=aaronhaviland;292335]I started using one of these to move the middle card of my 3-GPU system for better airflow: [URL]http://www.amazon.com/gp/product/B0058UVVX2/[/URL]
I might use another, to make it a 4-GPU system (it's a 4-slot mobo, in a too-small tower case).[/QUOTE] Can you post a picture so I can see how it works with the final setup? What PS do you use on that system? |
Whoa dang, the PDF is way out of date; we're up to CuLu 2.0 now, not 1.4.2 ;)
(Also, GMP-ECM has been at least sort of ported to the GPU, so that should be removed.) |
GPU Computing Guide Update
I will probably do this soon. It's kind of useless to update the pdf when CL changes versions as quickly as it has in the last few weeks.
I monitor the CL thread and will release the update when I think that CL2 is ready for production work. I guess I will be convinced in the next few days. Additionally, I dislike old versions wandering around in the stickies. Two options: stop publishing the guide in the sticky and use a new thread, or get mod rights. I prefer the former.

I am not convinced that CUDA-ECM should be integrated yet, as it is at an expert and development level at the moment. Thoughts? Is it useful at all? |
I wasn't saying it should be integrated, just remove the sentence saying it needs to be done.
|
[QUOTE=Brain;295693]I am not convinced that CUDA-ECM should be integrated yet, as it is at an expert and development level at the moment.
Thoughts? Is it useful at all?[/QUOTE]IMO, it is very useful. I've already found three non-trivial factors with it and another day-long run is about 80% completed at the moment. Paul |
Yeah, I've found some small factors as well, and I see that the ECMNET records page ( [url]http://www.loria.fr/~zimmerma/records/ecmnet.html[/url] ) shows several factors whose stage 1 was performed by GPU-ECM (A = ... instead of sigma, and Cyril Bouvier being the discoverer).
My low-end GT 540M running the GPU-ECM beta beats the associated Core i7-2670QM running GMP-ECM by a wide margin for some numbers, and other people's tests show that the ratio is even more damning for CPUs when high-end GT(X) 4xx and 5xx cards are used. |
[QUOTE=debrouxl;295793]Yeah, I've found some small factors as well, and I see that the ECMNET records page ( [url]http://www.loria.fr/~zimmerma/records/ecmnet.html[/url] ) shows several factors whose stage 1 was performed by GPU-ECM (A = ... instead of sigma, and Cyril Bouvier being the discoverer)[/QUOTE]That might be the case, but I'm far from certain. Try running gmp-ecm with the -batch parameter to see what I mean.
Paul |
[QUOTE=xilman;295790]IMO, it is very useful. I've already found three non-trivial factors with it and another day-long run is about 80% completed at the moment.
Paul[/QUOTE] My fault: the "Is it useful" was referring to the pdf guide. I will mention the GMP-ECM thread in the next version. Just waiting for more news from mfakto; a new version is likely to come soon. |
This is probably a stupid question, but would running multiple LL tests on one GPU yield a higher throughput?
For example, the GeForce GTX 570 has 480 CUDA cores, and I imagine that a lot of threads would get in each other's way. In this case, would it be better to run, say, ten parallel tests using 48 cores each, instead of doing one test with all 480 cores? I don't think memory bandwidth would be an issue as GPUs normally have much higher bandwidths than CPUs. |