mersenneforum.org  

2011-10-11, 00:27   #12
Christenson

Dec 2010
Monticello

5·359 Posts

Quote:
Originally Posted by kjaget
This might be telling you that there's a benefit to running a 3rd instance of mfaktc to keep the card fed with data?
Indeed, Chuck could do this. Remember, I said the GPU was distinctly outrunning the CPU. He would extract a bit more throughput from the GPU, but it would be at the expense of whatever the third core would otherwise do, and it would probably raise the temperature some.

A better balance might be to run a copy of CUDALucas instead, which uses very little CPU, or to have the #3 core do P-1. Remember that only 10-20% of exponents will be knocked out by the new capabilities of TF, leaving 80-90% still needing those boring old LL tests.

But the numbers Chuck is throwing out are no slouch, even if they are not at the exact peak of optimum GIMPS TF performance.

2011-10-11, 01:42   #13
ckdo

Dec 2007
Cleves, Germany

2×5×53 Posts

I'm quite happy with his throughput ...

2011-10-11, 02:07   #14
Chuck

May 2011
Orange Park, FL

928₁₀ Posts

Quote:
Originally Posted by Christenson
Indeed, Chuck could do this. Remember, I said the GPU was distinctly outrunning the CPU. He would extract a bit more throughput from the GPU, but it would be at the expense of whatever the third core would otherwise do, and it would probably raise the temperature some.

A better balance might be to run a copy of CUDALucas instead, which uses very little CPU, or to have the #3 core do P-1. Remember that only 10-20% of exponents will be knocked out by the new capabilities of TF, leaving 80-90% still needing those boring old LL tests.

But the numbers Chuck is throwing out are no slouch, even if they are not at the exact peak of optimum GIMPS TF performance.
On the GPU I am doing TFs in the 26M range from 67 to 68 bits for ckdo, and in the 600M range from 64 to 67 bits. The lower range takes about 15 min per check, and the upper one about 30 sec. I am using the version with MORE_CLASSES disabled for the higher range, which has helped a lot, cutting the time from 45 sec to 30 sec BUT increasing the GPU load and temperature. It runs 24 hours/day, so that's all the load I am going to put on it.

I'm not the smartest one in the room here; I just have money to spend on a fast computer, and I am retired so I have unlimited time. I never could figure out how to work CUDALucas. As I remember, it doesn't take its work from a worktodo file, and I wouldn't know how to set up a batch file. mfaktc was as complicated a thing as I could manage...

The CPU is a Core i7-970 overclocked to 3675 MHz, six cores with hyperthreading turned off. Two cores do mfaktc with the GPU, two cores do LL, one core does DC and one core does P-1.
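
For a sense of scale, the per-check timings quoted in this post can be turned into checks per day. This is only arithmetic on the figures above (15 min, 45 sec and 30 sec per check); the function name is made up for illustration, and it assumes a single instance running back to back with no gaps between assignments.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def checks_per_day(seconds_per_check: float) -> float:
    """Number of checks one instance completes in 24 hours of continuous running."""
    return SECONDS_PER_DAY / seconds_per_check


if __name__ == "__main__":
    # Timings taken from the post; the labels are only descriptive.
    timings = [
        ("26M range, 67 to 68 bits", 15 * 60),
        ("600M range, 64 to 67 bits, before", 45),
        ("600M range, 64 to 67 bits, MORE_CLASSES disabled", 30),
    ]
    for label, seconds in timings:
        print(f"{label}: {checks_per_day(seconds):.0f} checks/day")
    # Going from 45 sec to 30 sec per check is a 1.5x throughput increase.
    print(f"speedup: {45 / 30:.2f}x")
```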

2011-10-11, 03:10   #15
Christenson

Dec 2010
Monticello

5×359 Posts

I can make you smarter, just keep reading what I tell you:

http://mersenneforum.org/showthread.php?t=15545, towards the end of the thread, gives a quick intro to batch files and a sample command line for CUDALucas. On Windows, there's not much to them, and you don't need much either -- it got pointed out to me that the simplest batch file (just one line) is a command line associated with one of those "shortcuts" on the desktop. WordPad is a fine editor for Windows batch files, and so is Notepad.

Then get yourself a copy of CUDALucas. There may be some nice pointers in the "putting it all together" thread. If not, you'll need to wade backwards through the CUDALucas thread, but this is admittedly painful.

Run said copy, and ask more questions here or somewhere, as we are WAAAYYY off-topic.

Oh, and don't run Microsoft Internet Explorer unless you are absolutely required to. That's the first rule of anti-virus.

2011-10-11, 06:03   #16
NBtarheel_33

"Nathan"
Jul 2008
Maryland, USA

5·223 Posts

Quote:
Originally Posted by Dubslow
What is avg. rate then if it doesn't really matter? I notice the same thing, where the speed up is in avg. rate. (the auto adjust doesn't up it from 5000 if that matters)
Right. I was getting about 107M/sec with SievePrimes=5000, and only 70-80M/sec with SievePrimes=25000.
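
A rough way to compare these two settings is to look at time rather than rate. The sketch below is a simplified model, not mfaktc's actual code: it assumes SievePrimes is the number of small odd primes used for sieving, that sieving with a prime p removes a fraction 1/p of the remaining candidates, and that the time per class is proportional to the surviving candidates divided by the avg. rate. The rates are the ones quoted above, with 75M/sec standing in for "70-80M/sec"; all function names are made up for illustration.

```python
from math import isqrt, prod


def first_odd_primes(count: int) -> list[int]:
    """Return the first `count` odd primes, using a sieve of Eratosthenes."""
    limit = 1000
    while True:
        sieve = bytearray([1]) * (limit + 1)
        sieve[0:2] = b"\x00\x00"
        for n in range(2, isqrt(limit) + 1):
            if sieve[n]:
                # Clear every multiple of n starting at n*n.
                sieve[n * n :: n] = bytearray(len(range(n * n, limit + 1, n)))
        primes = [n for n in range(3, limit + 1) if sieve[n]]
        if len(primes) >= count:
            return primes[:count]
        limit *= 2  # not enough primes yet, sieve a larger interval


def survival_fraction(sieve_primes: int) -> float:
    """Fraction of candidates left after sieving with the first `sieve_primes` odd primes."""
    return prod(1.0 - 1.0 / p for p in first_odd_primes(sieve_primes))


def relative_time(sieve_primes: int, rate_m_per_sec: float) -> float:
    """Time per class in arbitrary units: surviving candidates divided by the test rate."""
    return survival_fraction(sieve_primes) / rate_m_per_sec


if __name__ == "__main__":
    ratio = survival_fraction(25000) / survival_fraction(5000)
    print(f"candidates left at 25000 relative to 5000: {ratio:.3f}")
    print(f"relative time, SievePrimes=5000  @ 107M/sec: {relative_time(5000, 107.0):.6f}")
    print(f"relative time, SievePrimes=25000 @  75M/sec: {relative_time(25000, 75.0):.6f}")
```

Under these assumptions the larger SievePrimes value removes only on the order of 15% more candidates, so it pays off only if the avg. rate drops by less than that. With the rates quoted here the drop is much larger, which fits the picture of the CPU not keeping up with the sieving.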

2011-10-11, 06:05   #17
NBtarheel_33

"Nathan"
Jul 2008
Maryland, USA

5×223 Posts

Quote:
Originally Posted by Chuck
No, it's running at 85% GPU load.

Chuck
Where can I get a hold of this figure? Does it show up in Task Manager on Windows, or top on Linux?

2011-10-11, 06:11   #18
NBtarheel_33

"Nathan"
Jul 2008
Maryland, USA

1115₁₀ Posts

Quote:
Originally Posted by TheJudger
Hi,

Unless there is an undiscovered bug, the value of SievePrimes does not cause missing factors.
Significant speedup... how much faster is it? Keep in mind that the avg. rate doesn't really matter. What matters is the time per class/assignment.

Perhaps you want to dedicate another core to mfaktc. SievePrimes=5000 usually tells you that not enough CPU resources are available. Just start another copy of mfaktc in a separate directory working on different exponents.

Oliver
How much CPU should typically be allocated to each instance of mfaktc? I was working with an 8-core Nehalem Xeon system @ 2.66 GHz with 2 Fermi GPUs. At first, I had Prime95 TFing on all 8 cores, as well as 2 instances of mfaktc going on the GPUs. I noticed that three of the CPU cores were slower than the others, so I stopped their workers, and ran 5 TF CPU cores, and two instances of mfaktc. Things picked up nicely at this point. I also experimented with a third instance of mfaktc; it slowed the GPUs down into the 65-70M/sec range.

If we have X GPUs, should we run exactly X copies of mfaktc, or does it make more efficient use of the GPUs to run more?

2011-10-11, 13:22   #19
Christenson

Dec 2010
Monticello

5·359 Posts

Quote:
Originally Posted by NBtarheel_33
How much CPU should typically be allocated to each instance of mfaktc? I was working with an 8-core Nehalem Xeon system @ 2.66 GHz with 2 Fermi GPUs. At first, I had Prime95 TFing on all 8 cores, as well as 2 instances of mfaktc going on the GPUs. I noticed that three of the CPU cores were slower than the others, so I stopped their workers, and ran 5 TF CPU cores, and two instances of mfaktc. Things picked up nicely at this point. I also experimented with a third instance of mfaktc; it slowed the GPUs down into the 65-70M/sec range.

If we have X GPUs, should we run exactly X copies of mfaktc, or does it make more efficient use of the GPUs to run more?
Nice system, but I suspect your GPU cards are running circles around your CPU cores at TF. You will need to dedicate *at least* one CPU core to mfaktc to keep each GPU busy with sieved output, and maybe two or three; see Chuck's discussion of two cores not quite keeping up with the sieving for his GPU.
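
Running several copies of mfaktc means each copy needs its own directory and its own share of the assignments. The sketch below shows one way to deal the lines of a single work file out to N instance directories, round-robin. The file name worktodo.txt, the directory names mfaktc-1, mfaktc-2, ... and both function names are assumptions for illustration; the script does not interpret the assignment lines, it only copies them.

```python
from pathlib import Path


def split_worktodo(lines: list[str], instances: int) -> list[list[str]]:
    """Deal non-empty assignment lines round-robin into `instances` buckets."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    buckets: list[list[str]] = [[] for _ in range(instances)]
    work = [line.strip() for line in lines if line.strip()]
    for index, line in enumerate(work):
        buckets[index % instances].append(line)
    return buckets


def write_instance_dirs(source: str, base_dir: str, instances: int) -> None:
    """Write one worktodo.txt per instance directory (hypothetical layout)."""
    lines = Path(source).read_text().splitlines()
    for number, bucket in enumerate(split_worktodo(lines, instances), start=1):
        instance_dir = Path(base_dir) / f"mfaktc-{number}"
        instance_dir.mkdir(parents=True, exist_ok=True)
        (instance_dir / "worktodo.txt").write_text("\n".join(bucket) + "\n")


if __name__ == "__main__":
    # Placeholder strings stand in for real assignment lines.
    demo = ["assignment A", "assignment B", "", "assignment C"]
    print(split_worktodo(demo, 2))
```

Round-robin keeps the instances roughly balanced when the assignments are of similar size; if they differ a lot (say 26M and 600M exponents in one file), it may be better to sort them by hand.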

2011-10-11, 13:25   #20
Chuck

May 2011
Orange Park, FL

2⁵×29 Posts

Quote:
Originally Posted by NBtarheel_33
Where can I get a hold of this figure? Does it show up in Task Manager on Windows, or top on Linux?
I am using the EVGA Precision utility, which allows for overclocking but also displays GPU usage statistics. You could also use MSI Afterburner, which has a different skin but performs the same functions.

Chuck

2011-10-11, 14:34   #21
kjaget

Jun 2005

129₁₀ Posts

Quote:
Originally Posted by Chuck
No, it's running at 85% GPU load.

Chuck
Which tells me that adding a 3rd instance might bring that up to 100% load. Right now 2 CPU cores can't keep up, so the GPU is idle for 15% of the time. You'd just have to balance whether 100% of one CPU core is worth trading for an extra 15% of the GPU. And as you mentioned, that includes not only performance trade-offs, but heat, noise and so on.

You might also consider doing a higher bit-depth for the 600M range. It's possible that there's so much overhead in running such a small bit level on a fast card that adding 1 more bit might not slow down the process that much.

I have no idea what the answers to any of these questions are, but it might be worth experimenting.
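
To put a number on that trade, here is the best case worked out. It assumes GPU throughput scales linearly with the reported load and that a third instance really would reach 100%, neither of which is guaranteed; the function name is made up for illustration.

```python
def max_throughput_gain(gpu_load: float) -> float:
    """Upper bound on the relative throughput gain from raising GPU load to 100%."""
    if not 0.0 < gpu_load <= 1.0:
        raise ValueError("gpu_load must be in the interval (0, 1]")
    return 1.0 / gpu_load - 1.0


if __name__ == "__main__":
    gain = max_throughput_gain(0.85)
    print(f"best-case gain from 85% to 100% load: {gain:.1%}")
```

So the idle 15% is worth at most a bit under 18% more TF output relative to what the card does now, to be weighed against a full CPU core of LL, DC or P-1 work.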

2011-10-12, 00:18   #22
Chuck

May 2011
Orange Park, FL

2⁵×29 Posts

Quote:
Originally Posted by kjaget
Which tells me that adding a 3rd instance might bring that up to 100% load. Right now 2 CPU cores can't keep up, so the GPU is idle for 15% of the time. You'd just have to balance whether 100% of one CPU core is worth trading for an extra 15% of the GPU. And as you mentioned, that includes not only performance trade-offs, but heat, noise and so on.

You might also consider doing a higher bit-depth for the 600M range. It's possible that there's so much overhead in running such a small bit level on a fast card that adding 1 more bit might not slow down the process that much.

I have no idea what the answers to any of these questions are, but it might be worth experimenting.
This is good enough for me. Eventually I will get to higher bit levels in the 600M range.
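
As a rough guide to what higher bit levels cost: for a fixed exponent, the number of candidate factors between 2^b and 2^(b+1) doubles with each bit, so each level is about twice the work of the one before. The sketch below counts work in units of the 64-to-65 level and deliberately ignores fixed per-class overhead, which is exactly the part kjaget suspects matters on a fast card. The function names are made up for illustration.

```python
def level_work(bit: int, base_bit: int) -> int:
    """Work for the level from 2^bit to 2^(bit+1), in units of the base_bit level."""
    return 2 ** (bit - base_bit)


def total_work(bit_from: int, bit_to: int, base_bit: int) -> int:
    """Total work for trial factoring from 2^bit_from to 2^bit_to, same units."""
    return sum(level_work(bit, base_bit) for bit in range(bit_from, bit_to))


if __name__ == "__main__":
    done = total_work(64, 67, base_bit=64)   # levels 64, 65 and 66
    extra = total_work(67, 68, base_bit=64)  # the single level 67
    print(f"64 to 67 bits: {done} units, 67 to 68 bits: {extra} units")
```

In this idealized model the single step from 67 to 68 bits costs slightly more than everything from 64 to 67 combined. If the measured time for the extra bit comes out well under that, overhead is eating a good share of the current 30 sec.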

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
