mersenneforum.org  

Old 2013-01-24, 02:50   #2168
TObject
 
Feb 2012

34·5 Posts

Quote:
Originally Posted by LaurV View Post
P.S. why the ".ro" link? some special reason?
No reason. A picture found on Google.
Old 2013-01-24, 04:33   #2169
rjbelans
 
Dec 2012

7 Posts

Quote:
Originally Posted by rjbelans View Post
I guess I'll give these 590s I've got a whirl and see what they can do. Maybe then I can go see what 4 285 classifieds and 3 580SCs will give. I'm doing other DC projects too, so no quotes on how long before I will get to doing all of this.
Just a quick update. I started running all four of the GPUs on my two 590s, and they are averaging about 140 GHz-d/day each, for 560 GHz-d/day total. These cards are watercooled and running 720/1440/1728 clocks. The CPU is a 980X @ 4.0GHz running 1 worker of Prime95 on 10 threads.


FYI - I noticed a post earlier in this thread talking about how someone was disappointed about a 590 not equalling 2 x 580. That was never expected to happen with these cards because of the lower clocks that were needed to get the two GPUs on a single card and meet all of nVidia's power, heat, etc. requirements. Even with these reduced clocks, I've never had any complaints with these cards.
Old 2013-01-24, 05:34   #2170
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

160658 Posts

Hmm... what program are you running?

With mfaktc 0.20, a single 580 should be north of 400 Eq. GHz, and a 590 should be between 300 and 350 Eq. GHz per GPU.
Old 2013-01-24, 06:03   #2171
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

32×29×37 Posts

I'm running two GTX 580s clocked at 781 MHz (factory) with mfaktc 0.20, TF-ing the 332M3 range to 72, 73 and 74 bits. I tweaked the mfaktc parameters "down" to keep the cards only about 96% busy (with the default parameters and an occupancy of 98%+, the computer was not very responsive), and I get a stable 392 GHz-days/day per card. When I go TF-ing to 75 bits (same parameters, same range), I get a stable 396 GHz-days/day per card.

You should NOT get lower than this! (Scale it for your clock only; the same goes for the 590: just scale my figures for the 590's lower clock.)

OTOH, keeping the CPU extremely busy will decrease the mfaktc output. Of course, 0.20 sieves on the GPU and does not need as much CPU power as 0.19, but it still needs the CPU, which coordinates things. The GPU does not run by itself. For example, when I start 8 workers (HT enabled on my CPU with 4 physical cores), the output of each GTX goes down by a few (5 to 10) GHz-days/day, and it is not stable anymore (it oscillates between 380 and 390 or so). If I remember right, the 980X is a CPU with 6 physical cores, so running 10 workers on it may be "overcrowding" it a little... Try pausing P95 for a few minutes; if the output of the cards does not improve, then you may be doing something wrong. Also, proper cooling affects the speed: those new thingies have the bad habit of "throttling" when they get hot.
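
As a rough sketch of the clock scaling suggested above: the snippet below assumes throughput scales linearly with core clock and that one 590 GPU otherwise behaves like a 580. Neither assumption comes from mfaktc itself, and the function name is made up for illustration. The 392 GHz-days/day at 781 MHz figure is from this post, and 720 MHz is the core clock rjbelans reported.

```python
def scale_by_clock(ref_ghzd_per_day, ref_clock_mhz, target_clock_mhz):
    """Scale a reference throughput linearly by core clock.

    Linear scaling is an assumption for a back-of-the-envelope estimate,
    not a formula taken from mfaktc.
    """
    return ref_ghzd_per_day * target_clock_mhz / ref_clock_mhz


# Reference point from the post: 392 GHz-days/day per GTX 580 at 781 MHz.
# Target: 720 MHz, the core clock reported for the watercooled 590s.
estimate_590 = scale_by_clock(392.0, 781.0, 720.0)
print(f"Estimated output per 590 GPU at 720 MHz: {estimate_590:.0f} GHz-days/day")
```

Under those assumptions the estimate comes out at roughly 360 GHz-days/day per GPU, so a measured 140 would point to a configuration or CPU-load problem rather than to the lower clock.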
Old 2013-01-24, 12:27   #2172
rjbelans
 
Dec 2012

78 Posts

Quote:
Originally Posted by LaurV View Post
I'm running two GTX 580s clocked at 781 MHz (factory) with mfaktc 0.20, TF-ing the 332M3 range to 72, 73 and 74 bits. I tweaked the mfaktc parameters "down" to keep the cards only about 96% busy (with the default parameters and an occupancy of 98%+, the computer was not very responsive), and I get a stable 392 GHz-days/day per card. When I go TF-ing to 75 bits (same parameters, same range), I get a stable 396 GHz-days/day per card.

You should NOT get lower than this! (Scale it for your clock only; the same goes for the 590: just scale my figures for the 590's lower clock.)

OTOH, keeping the CPU extremely busy will decrease the mfaktc output. Of course, 0.20 sieves on the GPU and does not need as much CPU power as 0.19, but it still needs the CPU, which coordinates things. The GPU does not run by itself. For example, when I start 8 workers (HT enabled on my CPU with 4 physical cores), the output of each GTX goes down by a few (5 to 10) GHz-days/day, and it is not stable anymore (it oscillates between 380 and 390 or so). If I remember right, the 980X is a CPU with 6 physical cores, so running 10 workers on it may be "overcrowding" it a little... Try pausing P95 for a few minutes; if the output of the cards does not improve, then you may be doing something wrong. Also, proper cooling affects the speed: those new thingies have the bad habit of "throttling" when they get hot.
Quote:
Originally Posted by Dubslow View Post

I'm running 0.20, but I did play with some settings in the .ini file, and my CPU is at a constant 90%+ usage because of the other things running. Once the current units are completed, after I get home from work tonight, I will try running with no other programs and put the settings back to their defaults.
Old 2013-01-24, 12:56   #2173
swl551
 
Aug 2012
New Hampshire

23·101 Posts
Stages=0 vs Stages=1

What are the pros/cons of factoring with Stages=0 vs Stages=1 for wide bit ranges like 79957723,70,74?

Beyond a reduction in result rows, I'm not seeing anything obvious related to performance or reliability with a GTX 570 and 0.20.

I know that mfaktc would/could switch kernels for factoring different ranges when Stages is on (0.19). I don't see any difference with 0.20. Did 0.20 make Stages obsolete?

thx
Old 2013-01-24, 14:26   #2174
Andi_HB
 
Mar 2007
Germany

23×3×11 Posts
GTX560 with 268 GHz-days/day

The GTX 560's performance is listed as 205 GHz-days/day, but that is only with the default settings.

I decreased GPUSieveProcessSize to 8
and increased GPUSieveSize to 128.

This increased my output from 205 to 268 GHz-days/day on the GTX 560 with mfaktc 0.20.

:D

(Win 7, 64bit)
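
For anyone who wants to script the same change, here is a minimal sketch that rewrites KEY=value lines in the text of an mfaktc.ini-style file. The helper name `set_ini_values` is hypothetical, the sample text is a made-up excerpt rather than the real default file, and the second key is written as GPUSieveSize on the assumption that this is its spelling in mfaktc.ini.

```python
import re


def set_ini_values(ini_text, updates):
    """Return ini_text with KEY=value lines replaced for every key in updates.

    Comment lines and unrelated keys are passed through untouched; keys that
    are missing from the text are appended at the end.
    """
    out_lines = []
    seen = set()
    for line in ini_text.splitlines():
        match = re.match(r"\s*([A-Za-z0-9_]+)\s*=", line)
        if match and match.group(1) in updates:
            key = match.group(1)
            out_lines.append(f"{key}={updates[key]}")
            seen.add(key)
        else:
            out_lines.append(line)
    for key, value in updates.items():
        if key not in seen:
            out_lines.append(f"{key}={value}")
    return "\n".join(out_lines) + "\n"


# Hypothetical excerpt; the starting values 16 and 64 are placeholders.
sample = "# sieve settings\nGPUSieveProcessSize=16\nGPUSieveSize=64\n"
tuned = set_ini_values(sample, {"GPUSieveProcessSize": 8, "GPUSieveSize": 128})
print(tuned)
```

Keep a backup of the original file, since the best values seem to differ from card to card.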
Old 2013-01-24, 16:15   #2175
TheJudger
 
"Oliver"
Mar 2005
Germany

11×101 Posts

Quote:
Originally Posted by swl551 View Post
What are the pros/cons of factoring with Stages=0 vs Stages=1 for wide bit ranges like 79957723,70,74?

Beyond a reduction in result rows, I'm not seeing anything obvious related to performance or reliability with a GTX 570 and 0.20.

I know that mfaktc would/could switch kernels for factoring different ranges when Stages is on (0.19). I don't see any difference with 0.20. Did 0.20 make Stages obsolete?

thx
Stages=1 is faster than Stages=0 (in terms of exponents cleared per unit time, not GHz-d/day...).
With Stages=1 in your example, there is a ~1.4% chance that there is a factor between 2^70 and 2^71; in that case 14/15 of the work is saved. There is another ~1.4% chance of a factor between 2^71 and 2^72, which saves 12/15 of the work, and another ~1.4% chance of a factor between 2^72 and 2^73, which saves 8/15 of the work. Of course this depends on "StopAfterFactor", too.
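
The arithmetic above can be sketched as an expected-value calculation. This is only an illustration under stated assumptions: each bit level costs twice the previous one (T, 2T, 4T, 8T), each level has a flat ~1.4% chance of containing a factor, the run stops at the end of the level where a factor is found, and the function name is invented here.

```python
def expected_fraction_saved(start_bit, end_bit, p_level=0.014):
    """Expected fraction of the total TF work skipped when Stages=1.

    Assumes level i costs 2**i units of work, a flat p_level chance of a
    factor in each level, and a stop at the end of the first level that
    yields a factor.
    """
    levels = end_bit - start_bit
    work = [2 ** i for i in range(levels)]  # T, 2T, 4T, 8T for 70..74
    total = sum(work)                       # 15T for 70..74
    saved = 0.0
    p_reach = 1.0                           # chance that no earlier level had a factor
    for i in range(levels):
        remaining = sum(work[i + 1:])       # work skipped if this level has a factor
        saved += p_reach * p_level * remaining
        p_reach *= 1.0 - p_level
    return saved / total


fraction = expected_fraction_saved(70, 74)
print(f"Expected work saved for 70 to 74 bits: {fraction:.1%}")
```

With these assumptions the average saving is about 3% of the total work. That is small per exponent, but it adds up over many exponents.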

The different kernels are still there in mfaktc 0.20. Actually there are 3 new kernels in 0.20.

Oliver
Old 2013-01-24, 16:30   #2176
swl551
 
Aug 2012
New Hampshire

23·101 Posts

Quote:
Originally Posted by TheJudger View Post
Stages=1 is faster than Stages=0 (in terms of exponents cleared per unit time, not GHz-d/day...).
With Stages=1 in your example, there is a ~1.4% chance that there is a factor between 2^70 and 2^71; in that case 14/15 of the work is saved. There is another ~1.4% chance of a factor between 2^71 and 2^72, which saves 12/15 of the work, and another ~1.4% chance of a factor between 2^72 and 2^73, which saves 8/15 of the work. Of course this depends on "StopAfterFactor", too.

The different kernels are still there in mfaktc 0.20. Actually there are 3 new kernels in 0.20.

Oliver
Thanks!
Old 2013-01-24, 16:52   #2177
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

11×311 Posts

Quote:
Originally Posted by TheJudger View Post
Of course this depends on "StopAfterFactor", too.
To clarify, if StopAfterFactor=2 (stop after current class when factor is found) then there's almost no difference in terms of time, right? Except of course each class takes a bit longer if Stages=0, but the difference should be only a matter of seconds or minutes, not hours like it would be for StopAfterFactor=1.
Old 2013-01-24, 17:04   #2178
TheJudger
 
"Oliver"
Mar 2005
Germany

11·101 Posts

Well, it's not that simple, but my feeling is that Stages=0 is slower anyway!
Take the same example with MORE_CLASSES, and let T be the time for a single class from 2^70 to 2^71.

First class of 2^70 to 2^74: 15T (T + 2T + 4T + 8T), chance for a factor: (1/71 + 1/72 + 1/73 + 1/74) / 960 = 5.75e-5.
In the same time you can do 15 classes from 2^70 to 2^71: 15T, chance for a factor: 1/71 * 15 / 960 = 2.20e-4.

Feel free to do the math to the end, but I'm pretty sure that Stages=1 is faster on average.
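
The two probabilities above can be reproduced with a few lines. This is just the arithmetic from this post, using its 1/71 to 1/74 per-level estimates and 960 classes; the function and variable names are only for illustration.

```python
CLASSES = 960  # number of classes used in the post's example (MORE_CLASSES)


def p_factor_in_one_class(denominators):
    """Chance of a factor in a single class spanning the given bit levels.

    Each level contributes 1/d, as in the post, spread evenly over all classes.
    """
    return sum(1.0 / d for d in denominators) / CLASSES


# One class run across 2^70..2^74 costs T + 2T + 4T + 8T = 15T.
p_wide = p_factor_in_one_class([71, 72, 73, 74])

# In the same 15T, Stages=1 gets through 15 classes of 2^70..2^71.
p_narrow = 15 * p_factor_in_one_class([71])

print(f"Stages=0, first class, 15T:    {p_wide:.2e}")
print(f"Stages=1, first 15 classes:    {p_narrow:.2e}")
print(f"Ratio after the first 15T:     {p_narrow / p_wide:.1f}x")
```

So after the first 15T of work, the staged run is nearly four times as likely to have already found a factor.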

Oliver

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.