mersenneforum.org > Great Internet Mersenne Prime Search > PrimeNet > GPU to 72
2014-12-17, 18:36   #1
Mark Rose

Making things work right?

Quote:
Originally Posted by Xyzzy View Post
FWIW, we only submit results on Sundays.

I've noticed your results tend to come in chunks. Is there a reason why you only submit once a week rather than using a tool that submits more frequently?
2014-12-18, 04:20   #2
Xyzzy

We moved your question to a new thread because we have a weird problem to solve and your question is directly related to it.

Perhaps the community here can help?

We have two boxes that run trial factoring on GPU cards. We run nothing else on them because we are trying to maximize the ratio of work done to energy (electricity) consumed: first, because we want to be as "green" as possible, and second, because we are trying to keep our electric bill reasonable. (It costs us to run the cards, and in the summer it costs us again to cool the house.) Even though the CPUs in each box are fairly modern, no CPU worktype delivers anything close to the "value" we get from the GPU cards.

Between the two boxes there are three GTX980 cards.

OOTB, the default (BIOS?) GTX 980 power/cooling profile is heavily weighted towards keeping fan speeds low, presumably for noise reasons. If we run the cards as-is, each card settles around 81°C, the fans barely kick on, and we get ~560 GHz-d/d per card. The cards never draw more than 100% TDP; in fact, they draw around 65-75% TDP each.

We really dislike managing Windows, but Windows has tools that let us do three interesting things: first, command the card fans to run at maximum speed (the boxes are in a vacant room, so the noise is not bad); second, command the cards to use up to 125% TDP; and third, command the cards to never let the temperature exceed 80°C. We do all of this with EVGA's Precision X program.

As a result of these three tweaks we get ~625 GHz-d/d per card, which we consider a significant improvement, although it means running the cards full tilt. Note that we are not "overclocking" the cards; we are just letting them run as fast as possible by tweaking parameters that are "normal". In fact, the cards are "pre-overclocked" from the factory. We are not fans of overclocking in general, but these cards have been through a binning process that weeds out the "weaker" chips. It takes a lot more energy per card to get this bump in speed. Across three cards, the bump adds up to ~195 GHz-d/d overall.

To be safe, we have tested the boxes by putting a garbage bag over them, to ensure the GPUs will throttle back and stay at a safe temperature if "something bad" happens. In every case, whether with the default profile or our customized one, the cards respond to the added thermal load and throttle significantly.
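For reference, the overall gain works out from the per-card figures quoted above:

```shell
# Per-card gain from the three tweaks, using the ~560 and ~625 GHz-d/d figures above
per_card_gain=$((625 - 560))       # GHz-d/d gained per card
total_gain=$((3 * per_card_gain))  # across all three GTX 980s
echo "${total_gain} GHz-d/d"       # prints 195 GHz-d/d
```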

Now, the weird thing is we are using an older copy of Windows 7 in "evaluation" mode, which, with a few registry tweaks, we can extend from the default 30 days to 360 days. We are a little torn about how ethical this is; right now we are just ignoring the issue, which is probably not a healthy or proper attitude. We think we could connect the two boxes to the Internet without causing any trouble, but then we would have to manage updates, drivers and all sorts of stuff like that. We have an install routine that takes about an hour and installs all of the needed drivers and programs without Internet access, including Windows itself! The downside is we have to sneakernet our results and assignments via USB key, which we have been doing every Sunday. From a reporting point of view this is not optimal, but it takes only a few minutes, so it isn't a chore or anything.

We are very familiar and comfortable with Linux.

To switch to a Linux solution we see one big obstacle: we know of no way to adjust fan speeds, power targets and temperature targets in Linux. As a result, our overall daily output would drop from ~1875 GHz-d/d to ~1680 GHz-d/d.

If we can accept this ~10% drop in productivity, we are ready to implement a Linux solution.
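For what it's worth, the drop works out to just over 10%:

```shell
# Relative throughput drop going from the tweaked setup to stock settings
drop=$(awk 'BEGIN { printf "%.1f", (1875 - 1680) / 1875 * 100 }')
echo "${drop}%"   # prints 10.4%
```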

So, to begin this discussion, having read all of that stuff above, is switching to Linux worth the loss in productivity?

Future questions, if the decision is made to run Linux:

- Can we run the boxes from the onboard GPU and use the GTX cards solely as CUDA devices? (Whatever we install will be just a text-based console installation.)
- How will we exchange results and get new assignments from PrimeNet via the GPUto72 interface?

2014-12-18, 04:51   #3
axn

Quote:
Originally Posted by Xyzzy View Post
To switch to a Linux solution we see a big obstacle, which is we know of no way to adjust fan speeds, power targets and temperature targets in Linux. As a result, our overall daily output would drop from ~1875GHz-d/d to 1680GHz-d/d.

If we can accept this 10% drop in productivity, we would be ready to implement a Linux solution.

So, to begin this discussion, having read all of that stuff above, is switching to Linux worth the loss in productivity?
What are the power draw numbers for the two scenarios? I would think that the 10% drop would be acceptable if there is a significant difference in power draw.
2014-12-18, 16:38   #4
chris2be8

Quote:
Originally Posted by Xyzzy View Post
- Can we run the boxes from the onboard GPU and use the GTX cards solely as CUDA devices? (Whatever we install will be just a text-based console installation.)
That should be possible, I'm doing it with my box that has a GTX 560 Ti in it. I had to update BIOS settings to make it always enable the onboard graphics and use them as the primary display device.

You will probably need to blacklist the nouveau graphics driver to make Linux use the Nvidia drivers. See the thread where I described what I had to do to make it work (http://mersenneforum.org/showthread....ouveau&page=22 post 234). I'm using it for msieve polynomial selection and ECM stage 1.
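A minimal sketch of the blacklist step, assuming a Debian/Ubuntu-style system (the file name is arbitrary; any file under /etc/modprobe.d/ works, and other distributions rebuild the initramfs differently):

```shell
# Prevent the open-source nouveau driver from claiming the card at boot.
sudo tee /etc/modprobe.d/blacklist-nouveau.conf > /dev/null <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF

# Rebuild the initramfs so the blacklist applies at boot, then reboot.
sudo update-initramfs -u
```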

Chris
2014-12-18, 18:11   #5
Xyzzy

Quote:
Originally Posted by axn View Post
What are the power draw numbers for the two scenarios? I would think that the 10% drop would be acceptable if there is a significant difference in power draw.
We will get power numbers today.

We are also looking into whether there is a way to flash a fan-speed setting into the card BIOS.
2014-12-18, 19:19   #6
owftheevil

The Linux Nvidia drivers have some limited capability to adjust fan speed and clocks, but if that is not enough, overclock.net has a wealth of information on BIOS modifications.
2014-12-18, 19:58   #7
Xyzzy

Quote:
Originally Posted by Xyzzy View Post
We will get power numbers today.
We have attached the results.

Attached: tf.png (14.4 KB)
2014-12-18, 20:03   #8
Xyzzy

Quote:
Originally Posted by owftheevil View Post
The Linux Nvidia drivers have some limited capability to adjust fan speed and clocks, but if that is not enough, overclock.net has a wealth of information on bios modifications.
We seem to remember being able to adjust the fan speed via a GUI tool, but the setting was not persistent across reboots. If we did load up Linux, it would be in CLI mode so that we could remotely manage the boxes via ssh. (We are not interested in VNC or anything like that.)

We will look into the BIOS-flashing idea. We have not yet analyzed the power numbers we attached earlier, but at a glance it looks like more speed comes at a greater cost per GHz-d. We also assume flashing the BIOS would void any warranty claims.
2014-12-18, 20:43   #9
Xyzzy

Quote:
Originally Posted by Xyzzy View Post
We have not yet analyzed the power numbers we attached earlier, but at a glance it looks like more speed comes at a greater cost per GHz-d.
We have attached the results.

Attached: tf.png (9.8 KB)
2014-12-18, 20:46   #10
Mark Rose

I don't have time to write an in-depth response at the moment. I'm assuming you're using Ubuntu/Kubuntu with the updated nvidia-331 drivers; you'll probably need an even newer version to work with the GTX 970/980s.

1. You can use the on-board video as the primary display device. I'm doing that on the box I'm typing this on. You have to prevent the nvidia driver from loading on boot; the easiest way is to install the bumblebee package. Then, to start mfaktc, I use this script for two cards:

mf-start
Code:
#!/bin/bash

mf-stop

if [ "$(lsmod | egrep -c '^nvidia')" = "0" ] ; then
        sudo modprobe nvidia_331
        # also need a variant of the following with later drivers
        sudo modprobe nvidia_331-uvm
fi

num=$(lspci | grep NVIDIA | grep VGA | wc -l)

num=$(expr $num - 1)

for i in $(seq 0 $num) ; do
        if [ ! -e /dev/nvidia$i ] ; then
                sudo mknod -m 666 /dev/nvidia$i c 195 $i;
        fi
done

if [ ! -e /dev/nvidiactl ] ; then
        sudo mknod -m 666 /dev/nvidiactl c 195 255
fi

cd ~/mfaktc0 && screen -d -m -S mf0 ./mfaktc.exe -d 0
cd ~/mfaktc1 && screen -d -m -S mf1 ./mfaktc.exe -d 1

mf-stop
Code:
#!/bin/bash

killall mfaktc.exe 2> /dev/null

while pgrep -c mfaktc.exe > /dev/null ; do sleep 0.5 ; done
2. You can control fan speed using the nvidia-settings utility; however, it only works if you're using the Nvidia drivers directly, so the setup above actually disables that. There is a workaround I found here, but it doesn't work with recent Linux and drivers. I've hacked on it but haven't got it working 100%. I need to put it up on GitHub. So the answer is: maybe soon. I don't know when I'll have time to work on it.

3. Later Nvidia drivers allow overclocking via the nvidia-settings utility.



If you just use the Nvidia card as the primary display device and run the latest drivers, the nvidia-settings utility should do everything you want. The start scripts I provided above should still work. Using the on-board video is problematic at the moment.
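A sketch of the fan-control commands, assuming the proprietary driver is running an X screen and manual fan control has been unlocked via Coolbits (older drivers name the speed attribute GPUCurrentFanSpeed rather than GPUTargetFanSpeed, so check your driver's documentation):

```shell
# One-time: Coolbits value 4 unlocks manual fan control; this rewrites
# xorg.conf, so X has to be restarted afterwards.
sudo nvidia-xconfig --cool-bits=4

# Per boot: take manual control of GPU 0's fan and pin it at 100%.
nvidia-settings -a '[gpu:0]/GPUFanControlState=1' \
                -a '[fan:0]/GPUTargetFanSpeed=100'
```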
2014-12-19, 03:18   #11
axn

Quote:
Originally Posted by Xyzzy View Post
We have attached the results.

Quote:
Originally Posted by Xyzzy View Post
We have attached the results.

Looking at the results, OOTB is the way to go.

On the AMD box:
* an incremental 33 W is consumed for a measly gain of 40 GHz-d/d going from OOTB to 100% TDP
* an incremental 34 W is consumed for only 19 GHz-d/d going from 100% to 125% TDP

The Intel box shows similar results (×2, of course).

In fact, if you could somehow move the 980 from the AMD box to the Intel box and run all three at OOTB, you would get 1670 GHz-d/d @ 495 W, compared to what the Intel box is doing with two GPUs @ 125% (1251 GHz-d/d @ 492 W).

In conclusion: OOTB is the way to go.
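The marginal efficiency at each step makes the point starkly (computed from the AMD-box figures above):

```shell
# GHz-d/d gained per extra watt at each step up in power target (AMD box)
m1=$(awk 'BEGIN { printf "%.2f", 40 / 33 }')  # OOTB -> 100% TDP
m2=$(awk 'BEGIN { printf "%.2f", 19 / 34 }')  # 100% -> 125% TDP
echo "OOTB->100%: ${m1} GHz-d/d per W"   # 1.21
echo "100%->125%: ${m2} GHz-d/d per W"   # 0.56
```

Each further step up in power target buys less than half the throughput per watt of the step before it.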