mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

TObject 2013-01-24 02:50

[QUOTE=LaurV;325627]
P.S. why the ".ro" link? some special reason?
[/QUOTE]

No reason. A picture found on Google.

rjbelans 2013-01-24 04:33

[QUOTE=rjbelans;325628]I guess I'll give these 590s I've got a whirl and see what they can do. Maybe then I can go see what 4 285 classifieds and 3 580SCs will give. I'm doing other DC projects too, so no quotes on how long before I will get to doing all of this.[/QUOTE]

Just a quick update. I started running all four of the GPUs on my 2 590s and they average about 140 GHz-d/day each, for 560 GHz-d/day total. These cards are watercooled and running 720/1440/1728 clocks. The CPU is a 980X @ 4.0GHz running 1 worker of Prime95 on 10 threads.


FYI - I noticed a post earlier in this thread talking about how someone was disappointed about a 590 not equalling 2 x 580. That was never expected to happen with these cards because of the lower clocks that were needed to get the two GPUs on a single card and meet all of nVidia's power, heat, etc. requirements. Even with these reduced clocks, I've never had any complaints with these cards.:tu:

Dubslow 2013-01-24 05:34

Hmm... what program are you running?

With mfaktc 0.20, [URL="http://www.mersenne.ca/mfaktc.php?sort=ghdpd&noA=1"]a single 580 should be north of 400 Eq. GHz, and a 590 should be between 300 and 350 Eq. GHz per GPU.[/URL]

LaurV 2013-01-24 06:03

I run two GTX 580s, clocked at 781 MHz (factory), with mfaktc 0.20, TF-ing the 332M3 range to 72, 73, and 74 bits. I tweaked the mfaktc parameters "down" (i.e. to keep the card only 96% busy; with the default parameters and an occupancy of 98%+, the computer was not very responsive), and I get a [B]stable[/B] 392 GHz-days/day per card. When I go TF-ing to 75 bits (same parameters, same range), I get a stable 396 GHz-days/day per card.

You should NOT get lower than this! (Scale it for your clock only, and the same goes for the 590: just scale my figures for the 590's lower clock.)

OTOH, keeping the CPU [B]extremely[/B] busy will decrease the mfaktc output. Of course, 0.20 sieves on the GPU and does not need as much CPU power as 0.19, but it [B]still[/B] needs the CPU, which coordinates things. The GPU does not run by itself. For example, when I start 8 workers (HT enabled on my CPU with 4 physical cores), the output of each GTX goes down by a few (5 to 10) GHz-days/day and is no longer stable (it oscillates between 380 and 390 or so). If I remember right, the 980X is a CPU with 6 physical cores, so running 10 workers on it may be "overcrowding" it a little... Try pausing P95 for a few minutes; if the output of the cards does not improve, then you may be doing something wrong. Also, proper cooling affects the speed: those new thingies have the bad habit of "throttling" when they get hot.
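
A rough sketch of the clock scaling suggested above, in case it helps. It assumes throughput is simply proportional to core clock and that each 590 GPU behaves like a 580 at a lower clock; both are simplifying assumptions, not measurements. The function name `scaled_throughput` is made up for this illustration, and the 720 MHz figure is the clock reported earlier in the thread.

```python
# Reference figures quoted in this thread: GTX 580 at 781 MHz, about 392 GHz-days/day.
REF_CLOCK_MHZ = 781.0
REF_GHZD_PER_DAY = 392.0


def scaled_throughput(clock_mhz, ref_clock_mhz=REF_CLOCK_MHZ, ref_rate=REF_GHZD_PER_DAY):
    """Estimate per-GPU output, ASSUMING it scales linearly with core clock."""
    return ref_rate * clock_mhz / ref_clock_mhz


if __name__ == "__main__":
    # 720 MHz is the core clock reported for the watercooled 590s above.
    estimate = scaled_throughput(720.0)
    print(f"720 MHz -> about {estimate:.0f} GHz-days/day per GPU (linear estimate)")
```

If a card lands far below an estimate like this, CPU load, throttling, or the .ini settings are the first things to check.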

rjbelans 2013-01-24 12:27

[QUOTE=LaurV;325645]I run two GTX 580s, clocked at 781 MHz (factory), with mfaktc 0.20, TF-ing the 332M3 range to 72, 73, and 74 bits. I tweaked the mfaktc parameters "down" (i.e. to keep the card only 96% busy; with the default parameters and an occupancy of 98%+, the computer was not very responsive), and I get a [B]stable[/B] 392 GHz-days/day per card. When I go TF-ing to 75 bits (same parameters, same range), I get a stable 396 GHz-days/day per card.

You should NOT get lower than this! (Scale it for your clock only, and the same goes for the 590: just scale my figures for the 590's lower clock.)

OTOH, keeping the CPU [B]extremely[/B] busy will decrease the mfaktc output. Of course, 0.20 sieves on the GPU and does not need as much CPU power as 0.19, but it [B]still[/B] needs the CPU, which coordinates things. The GPU does not run by itself. For example, when I start 8 workers (HT enabled on my CPU with 4 physical cores), the output of each GTX goes down by a few (5 to 10) GHz-days/day and is no longer stable (it oscillates between 380 and 390 or so). If I remember right, the 980X is a CPU with 6 physical cores, so running 10 workers on it may be "overcrowding" it a little... Try pausing P95 for a few minutes; if the output of the cards does not improve, then you may be doing something wrong. Also, proper cooling affects the speed: those new thingies have the bad habit of "throttling" when they get hot.[/QUOTE]

[QUOTE=Dubslow;325642]Hmm... what program are you running?

With mfaktc 0.20, [URL="http://www.mersenne.ca/mfaktc.php?sort=ghdpd&noA=1"]a single 580 should be north of 400 Eq. GHz, and a 590 should be between 300 and 350 Eq. GHz per GPU.[/URL][/QUOTE]


I'm running 0.20, but I did play with some settings in the .ini file, and my CPU is at a constant 90%+ usage because of the other things running. Once the current units are completed, after I get home from work tonight, I will try running with no other programs and I'll put the settings back to defaults.

swl551 2013-01-24 12:56

Stages=0 vs Stages=1

What are the pros/cons of factoring with Stages=0 vs Stages=1 with wide bit ranges like 79957723,70,74?

Beyond a reduction in Result rows, I'm not seeing anything obvious related to performance or reliability with a GTX-570 and 0.20.

I know that mfaktc would/could switch kernels for factoring different ranges when stages is on (0.19). I don't see any difference with 0.20. Did 0.20 make Stages obsolete?

thx

Andi_HB 2013-01-24 14:26

GTX 560 with 268 GHz-days/day

The GTX 560's performance is listed as 205 GHz-days/day, but that is only with the default settings.

I have decreased GPUSieveProcessSize to 8
and increased GPUSieveSize to 128.

This increased my output from 205 to 268 GHz-days/day on the GTX 560 with mfaktc 0.20.

:D

(Win 7, 64bit)

TheJudger 2013-01-24 16:15

[QUOTE=swl551;325662]What are the pros/cons of factoring with Stages=0 vs Stages=1 with wide bit ranges like 79957723,70,74?

Beyond a reduction in Result rows, I'm not seeing anything obvious related to performance or reliability with a GTX-570 and 0.20.

I know that mfaktc would/could switch kernels for factoring different ranges when stages is on (0.19). I don't see any difference with 0.20. Did 0.20 make Stages obsolete?

thx[/QUOTE]

Stages=1 is faster than Stages=0 (thinking about cleared exponents per unit of time, not GHz-d/day...)
With Stages=1 in your example there is a ~1.4% chance of a factor between 2[SUP]70[/SUP] and 2[SUP]71[/SUP], in which case 14/15 of the work is saved. There is another ~1.4% chance of a factor between 2[SUP]71[/SUP] and 2[SUP]72[/SUP], which saves 12/15 of the work, and another ~1.4% chance of a factor between 2[SUP]72[/SUP] and 2[SUP]73[/SUP], which saves 8/15 of the work. (Each bit level takes twice as long as the previous one, so the four levels cost 1 + 2 + 4 + 8 = 15 units.) Of course this depends on "StopAfterFactor", too.
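
For anyone who wants to play with the numbers, here is a minimal sketch of that argument. It assumes the chance of a factor between 2^b and 2^(b+1) is about 1/(b+1), that each bit level costs twice the previous one, and that the run stops at the end of the first bit level that yields a factor. The function name `expected_saved_fraction` is hypothetical, and this is only a model of the reasoning, not how mfaktc itself accounts for work.

```python
def expected_saved_fraction(start_bit, end_bit):
    """Expected fraction of the total TF work skipped when the run stops
    after the first bit level in which a factor turns up (Stages=1)."""
    levels = list(range(start_bit, end_bit))        # e.g. 70, 71, 72, 73
    work = [2 ** (b - start_bit) for b in levels]   # e.g. 1, 2, 4, 8 units
    total = sum(work)

    saved = 0.0
    p_no_factor_so_far = 1.0  # chance that all earlier levels found nothing
    done = 0
    for b, w in zip(levels, work):
        done += w
        p_factor = 1.0 / (b + 1)  # ASSUMED heuristic, about 1.4% near 70 bits
        saved += p_no_factor_so_far * p_factor * (total - done) / total
        p_no_factor_so_far *= 1.0 - p_factor
    return saved


if __name__ == "__main__":
    frac = expected_saved_fraction(70, 74)
    print(f"expected work saved for 70 -> 74 bits: {frac:.1%}")
```

Under these assumptions the average saving for a 70 to 74 bit assignment comes out at a few percent of the total work.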

The different kernels are still there in mfaktc 0.20. Actually there are 3 new kernels in 0.20.

Oliver

swl551 2013-01-24 16:30

[QUOTE=TheJudger;325675]Stages=1 is faster than Stages=0 (thinking about cleared exponents per unit of time, not GHz-d/day...)
With Stages=1 in your example there is a ~1.4% chance of a factor between 2[SUP]70[/SUP] and 2[SUP]71[/SUP], in which case 14/15 of the work is saved. There is another ~1.4% chance of a factor between 2[SUP]71[/SUP] and 2[SUP]72[/SUP], which saves 12/15 of the work, and another ~1.4% chance of a factor between 2[SUP]72[/SUP] and 2[SUP]73[/SUP], which saves 8/15 of the work. (Each bit level takes twice as long as the previous one, so the four levels cost 1 + 2 + 4 + 8 = 15 units.) Of course this depends on "StopAfterFactor", too.

The different kernels are still there in mfaktc 0.20. Actually there are 3 new kernels in 0.20.

Oliver[/QUOTE]

Thanks!

James Heinrich 2013-01-24 16:52

[QUOTE=TheJudger;325675]Of course this depends on "StopAfterFactor", too.[/QUOTE]To clarify, if StopAfterFactor=2 (stop after current class when factor is found) then there's almost no difference in terms of time, right? Except of course each class takes a bit longer if Stages=0, but the difference should be only a matter of seconds or minutes, not hours like it would be for StopAfterFactor=1.

TheJudger 2013-01-24 17:04

Well, it is not that simple, but my feeling tells me that Stages=0 is slower anyway!
Using the same example with MORE_CLASSES (960 classes to test), let T be the time for a single class from 2[SUP]70[/SUP] to 2[SUP]71[/SUP].

First class of 2[SUP]70[/SUP] to 2[SUP]74[/SUP]: 15T (T + 2T + 4T + 8T), chance for a factor: (1/71 + 1/72 + 1/73 + 1/74) / 960 = 5.75e-5
In the same time you can do 15 classes from 2[SUP]70[/SUP] to 2[SUP]71[/SUP]: 15T, chance for a factor: 1/71 * 15 / 960 = 2.20e-4.

Feel free to do the math till the end, but I'm pretty sure that Stages=1 is faster on average.
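
The two figures above can be reproduced with a short sketch. It uses the same assumptions as the post: 960 classes with MORE_CLASSES, a chance of about 1/(b+1) for a factor between 2^b and 2^(b+1) spread evenly over the classes, and a per-class time that doubles with each bit level. The function names are made up for this illustration.

```python
CLASSES = 960  # classes left to test with MORE_CLASSES, as in the post


def time_one_class_all_levels(start_bit, end_bit):
    """Time (in units of T) for one class over all bit levels: T + 2T + 4T + ..."""
    return sum(2 ** (b - start_bit) for b in range(start_bit, end_bit))


def chance_one_class_all_levels(start_bit, end_bit):
    """Chance of a factor in one class covering every bit level (Stages=0 style)."""
    return sum(1.0 / (b + 1) for b in range(start_bit, end_bit)) / CLASSES


def chance_first_level_same_time(start_bit, end_bit):
    """Chance of a factor when the same time is spent on first-level classes only."""
    n_classes = time_one_class_all_levels(start_bit, end_bit)
    return n_classes * (1.0 / (start_bit + 1)) / CLASSES


if __name__ == "__main__":
    print(f"time for one full class: {time_one_class_all_levels(70, 74)}T")
    print(f"one class, 70 -> 74 bits: {chance_one_class_all_levels(70, 74):.2e}")
    print(f"same time on 70 -> 71 only: {chance_first_level_same_time(70, 74):.2e}")
```

This only covers the first 15T of the run, as in the post; carrying the comparison through all 960 classes is the "math till the end" left open above.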

Oliver


All times are UTC.