mersenneforum.org

mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

kladner 2012-08-07 14:10

[QUOTE=nucleon;307218]Noooooooooo....

Who's going to keep me company at the top. :)
.....
-- Craig[/QUOTE]

Looks like you'll have to get used to solitary splendor in the TF charts. :showoff:Nobody else comes close to your graph slope.:no:

TheJudger 2012-08-07 16:36

Hi all,

here is a small teaser for mfaktc 0.19 [B]RAW GPU performance[/B]

CUDA 4.2, stock GTX 470 (1215MHz):

mfaktc 0.18:
[CODE]kernel | M66362159 above 2^64 | M3321932839 above 2^64
-------+----------------------+-----------------------
71bit  | 106.0M/s             | 81.6M/s
75bit  | 200.0M/s             | 156.2M/s
95bit  | 160.2M/s             | 124.8M/s
76bit  | n.a.                 | n.a.
79bit  | 335.4M/s             | 262.1M/s
92bit  | 267.7M/s             | 211.2M/s
[/CODE]

mfaktc 0.19-pre11:
[CODE]kernel | M66362159 above 2^64 | M3321932839 above 2^64
-------+----------------------+-----------------------
71bit  | 106.0M/s             | 81.5M/s
75bit  | 214.7M/s             | 168.1M/s
95bit  | 169.5M/s             | 132.2M/s
76bit  | 424.7M/s             | 334.5M/s
79bit  | 343.5M/s             | 268.1M/s
92bit  | 276.4M/s             | 217.8M/s[/CODE]

71bit: unchanged
most of the 75bit, 95bit, 79bit and 92bit improvement is related to the optimizations of the squaring function (thank you, George!).
I guess that older GPUs (CC 1.x) don't see any improvement.
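
To put numbers on the per-kernel gains for M66362159, here is a minimal Python sketch that compares the two tables above. The rates are copied from the tables; the dictionary and function names are only illustrative and are not part of mfaktc.

```python
# Rates in M/s for M66362159 (above 2^64), copied from the two tables above.
RATES_018 = {"71bit": 106.0, "75bit": 200.0, "95bit": 160.2,
             "79bit": 335.4, "92bit": 267.7}
RATES_019 = {"71bit": 106.0, "75bit": 214.7, "95bit": 169.5,
             "76bit": 424.7, "79bit": 343.5, "92bit": 276.4}


def speedup_percent(old, new):
    """Relative gain of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


def kernel_gains(old_rates, new_rates):
    """Per-kernel gain for the kernels that exist in both versions."""
    return {k: speedup_percent(old_rates[k], new_rates[k])
            for k in old_rates if k in new_rates}


if __name__ == "__main__":
    for kernel, gain in kernel_gains(RATES_018, RATES_019).items():
        print(f"{kernel}: {gain:+.1f}%")
    # The 76bit kernel is new in 0.19, so compare it with the fastest
    # 0.18 kernel (79bit) instead.
    best_old = RATES_018["79bit"]
    print(f"76bit vs 0.18 79bit: {speedup_percent(best_old, RATES_019['76bit']):+.1f}%")
```

On these figures the existing kernels gain a few percent each, while the new 76bit kernel is roughly a quarter faster than the best 0.18 kernel for factors it can handle.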

The new 76bit barrett kernel is nice: take the 79bit barrett kernel, (re-)move some lines of code and you're mostly done.

For the future it might be possible to add more kernels:
[LIST]
[*]77 bit barrett kernel (same as 76bit kernel but with more accuracy in preprocessing)
[*]78 bit barrett kernel (same as 77bit kernel but with correction step from 79bit kernel for each set bit in the exponent)
[/LIST]
Release plan:
[LIST]
[*]run some tests on my GTX 275 (this weekend?)
[*]build a release candidate and give it to a few people for testing
[*]release one week later
[/LIST]
So [B]if[/B] everything is fine I guess it will take 10-14 days from now for mfaktc 0.19.

Oliver

NormanRKN 2012-08-07 21:39

cool !:cool:

kladner 2012-08-07 22:21

[QUOTE=TheJudger;307238]Hi all,
.................
So [B]if[/B] everything is fine I guess it will take 10-14 days from now for mfaktc 0.19.

Oliver[/QUOTE]

Thanks for the update and the work behind it. Looking forward to running 0.19.

Xyzzy 2012-08-10 17:30

[QUOTE]Due to a recent hike in our electricity rate, too much current draw, inadequate branch circuits and an inadequate central air cooling system, we are going to drop off of trial factoring with our GPUs.[/QUOTE]The GPUs sold within a day or so on eBay.

If we take the purchase price for the four GPUs ($1320) and subtract what we sold them for ($800) we have a net cost of $520 or so.

We think (?) we used them for about 400,000 GHz-days of work, so our cost per GHz-day, not counting the host computers, which are still happily churning away, is 0.13¢ per GHz-day. (Our math might be wrong.)
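
The arithmetic can be checked with a few lines of Python. This is only a sketch using the figures quoted in this post; the function name is made up for illustration.

```python
def cents_per_ghz_day(purchase_usd, resale_usd, ghz_days):
    """Net hardware cost per GHz-day of work, in US cents."""
    net_cost_usd = purchase_usd - resale_usd
    return net_cost_usd / ghz_days * 100.0


if __name__ == "__main__":
    # Figures from the post: $1320 paid, $800 recovered, ~400,000 GHz-days done.
    print(f"{cents_per_ghz_day(1320, 800, 400_000):.2f} cents per GHz-day")
```

With those inputs the result agrees with the 0.13¢ figure, so the math holds as long as the 400,000 GHz-days estimate does.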

That seems like a reasonable ROI.

:tank:

TheJudger 2012-08-18 22:25

mfaktc 0.19
 
Hello,

mfaktc 0.19 is now available!

Source code: [url]http://www.mersenneforum.org/mfaktc/mfaktc-0.19.tar.gz[/url]
Windows executables (CUDA 4.2): [url]http://www.mersenneforum.org/mfaktc/mfaktc-0.19.win.cuda42.zip[/url]
Linux executable (CUDA 4.2): [url]http://www.mersenneforum.org/mfaktc/mfaktc-0.19.linux64.cuda42.tar.gz[/url]

Highlights:
[LIST]
[*][B]faster[/B] than mfaktc 0.18, up to 25% for CC >= 2.0, up to 9% for CC 1.x (new kernel, faster squaring code: thank you George!)
[*]merged user-configurable status line from mfakto
[/LIST]
As usual: finish your current assignment and upgrade to mfaktc 0.19 after that. The upgrade is recommended to everyone because a speed improvement is possible on all GPUs.

Oliver

flashjh 2012-08-18 22:38

It works really well! (Make sure your cooling is working properly) :smile:

Thanks for the update!

NormanRKN 2012-08-18 22:49

very good work Oliver. the improvements are top in 0.19 :fusion:!

Norman


p.s.
(Great job, Olli, and the performance gain is really great. barrett76 is chugging along right now with remarkable performance! Off to grab another beer :beer: )

kladner 2012-08-18 22:52

My thanks, too.

Now I absolutely have to resolve my driver issues and get up to date so I can run this.:redface:

chalsall 2012-08-18 23:33

[QUOTE=TheJudger;308481]The upgrade is recommended to everyone because a speed improvement is possible on all GPUs.[/QUOTE]

Sweet!!! :smile:

On my measly little FX 1800 (CC 1.1), I'm getting about an 8.8% speed increase.

Thanks Oliver and George!!!

Chuck 2012-08-19 00:13

Super, I am running two instances on a GTX 580 and each has increased from 168 M/s to 202 M/s with sieveprimes 2000.

TObject 2012-08-19 00:41

Thank you.

Are the checkpoint files compatible with the previous version?

I have a few long running assignments. And I wonder if I can upgrade right away, or should wait until those are finished.

kladner 2012-08-19 07:31

1 Attachment(s)
Oliver, Sir,

This update is beyond amazing. I am attaching a screenshot of 0.18 running 4 instances. I will post again with a screenshot of 0.19 running the same 4 exponents. It is hard to believe that the difference is real.:shock::w00t::max:

kladner 2012-08-19 07:33

continued
 
1 Attachment(s)
Part 2. mfaktc x4, same exponents, same system load, CPU @ stock, GPUs @ factory OC. Runs are minutes apart.

EDIT: Times drop from 10s of seconds to 2.5 or less. Wow. Four instances/cores (of a Phenom II x6 1090T) can't come close to saturating a GTX 570. (.....with the default .ini. I'm trying a minimum of 3000 now.)

Another edit: Let me climb down from the walls. I just realized that the default ini has Stages=1.:redface: Time to reevaluate!

ET_ 2012-08-19 08:55

I have CC 1.3 and CUDA 4.1: should I update to 4.2 to get that 9% increase of speed?

I guess I could figure it out myself just by trying, but maybe I'm not the only one in this situation... :smile:

Luigi

aketilander 2012-08-19 13:14

[QUOTE=TObject;308501]Thank you.

Are the checkpoint files compatible with the previous version?

I have a few long running assignments. And I wonder if I can upgrade right away, or should wait until those are finished.[/QUOTE]

Well, I tried and installed the new version over the old. No, the old checkpoint file does not seem to be compatible with the new version of the program, since the TF of the exponent started over from the beginning. Since I am doing billion-digit exponents, I would lose about 2 weeks of work by keeping the new version. So if you want to keep your work, it seems you need to wait until the old exponents have finished before you upgrade.

ET_ 2012-08-19 14:34

[QUOTE=ET_;308536]I have CC 1.3 and CUDA 4.1: should I update to 4.2 to get that 9% increase of speed?

I guess I could figure it out myself just by trying, but maybe I'm not the only one in this situation... :smile:

Luigi[/QUOTE]

Yup, it seems so... :smile:

flashjh 2012-08-19 14:55

[QUOTE=aketilander;308549]Well I tried and installed the new version over the old. No, the old checkpoint file does not seem to be compatible with the new version of the program<snip>[/QUOTE]

This is no surprise :smile::
[QUOTE=TheJudger;308481]As usual: [U]finish your current assignment and upgrade to mfaktc 0.19 after that[/U]. The upgrade is recommended to everyone because a speed improvement is possible on all GPUs.

Oliver[/QUOTE]

flashjh 2012-08-19 14:58

[QUOTE=kladner;308531]...I'm trying a minimum of 3000 now.)![/QUOTE]
Do you have the resources to run another instance? With SP dropping so low, your GPU can handle more throughput if your CPU can handle it.

kladner 2012-08-19 16:10

[QUOTE=flashjh;308557]Do you have the resources to run another instance? With SP dropping so low, your GPU can handle more throughput if your CPU can handle it.[/QUOTE]

I'm afraid I really don't, unless I cut P-1 down to one worker. On the bright side, my combined average rate for four instances has gone from a low of about 420 M/s (NumStreams=3, factory OC, and Priority=Low, CPU stock) in 0.18, to around 480 M/s in 0.19. If I put it to NumStreams=5 and run at Normal priority it has hit 540-550 M/s.

It will take a few more completions to see what throughput works up to.

Interestingly, the driver version I got working (devdriver_4.2_winvista-win7_64_301.32_general.exe) seems to only allow the GTX 460 and 570 to run at their Factory OC clocks, 715MHz and 780MHz respectively.

Chuck 2012-08-19 18:55

[QUOTE=Chuck;308499]Super, I am running two instances on a GTX 580 and each has increased from 168 M/s to 202 M/s with sieveprimes 2000.[/QUOTE]

I see I should be looking at the time per class instead of this rate; however, the time per class is still lowest with sieveprimes 2000 (12.021s for M57xxxxxx)
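
Time per class is the more useful figure because it converts directly into run time per assignment. A minimal sketch, assuming the default build works through 960 classes per assignment (the figure Oliver gives later in this thread); the function name is hypothetical.

```python
def assignment_hours(seconds_per_class, classes=960):
    """Estimated wall-clock hours for one assignment.

    `classes` defaults to 960, the number of classes the default build
    is said to process; the LessClasses build would use 96 instead.
    """
    return seconds_per_class * classes / 3600.0


if __name__ == "__main__":
    print(f"{assignment_hours(12.021):.2f} hours per assignment")
```

At 12.021s per class that works out to a little over three hours per assignment per instance.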

Chuck 2012-08-19 19:04

1 Attachment(s)
[QUOTE=Chuck;308571]I see I should be looking at the time per class instead of this rate; however, the time per class is still lowest with sieveprimes 2000 (12.021s for M57xxxxxx)[/QUOTE]

Here are the worker windows.

ckdo 2012-08-19 22:58

Would anyone be so kind to provide an mfaktc-0.19 binary for a glibc-2.11.1 linux64 system, or a version of libc-2.14.so which actually works under Ubuntu 10.04?

I've had no luck building my own and getting mfaktc-0.18 to work again was hard enough...

Dubslow 2012-08-19 23:01

1 Attachment(s)
[QUOTE=ckdo;308598]Would anyone be so kind to provide an mfaktc-0.19 binary for a glibc-2.11.1 linux64 system, or a version of libc-2.14.so which actually works under Ubuntu 10.04?

I've had no luck building my own and getting mfaktc-0.18 to work again was hard enough...[/QUOTE]

Will this do?
[code]bill@Gravemind:~$ /lib/x86_64-linux-gnu/libc-2.13.so
GNU C Library (Ubuntu EGLIBC 2.13-0ubuntu13.1) stable release version 2.13, by Roland McGrath et al.
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.5.2.
Compiled on a Linux 2.6.38 system on 2012-03-06.
Available extensions:
crypt add-on version 2.1 by Michael Glad and others
GNU Libidn by Simon Josefsson
Native POSIX Threads Library by Ulrich Drepper et al
BIND-8.2.3-T5B
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<http://www.debian.org/Bugs/>.
[/code]

Edit: Compilation attached. Warning: CUDA toolkit is version 4.1, so this code won't run with Kepler GPUs. (I can recompile with 4.2 if necessary.) Edit2: Reading your post in more detail, I'm not actually sure this will work... though I can post the libc-2.13.so if you want.

ckdo 2012-08-20 06:40

Dubslow's binary actually works once supplied with a v4.1 libcudart, even with the v2.11.1 glibc. :big grin:

Thanks a bunch, Bill.

TObject 2012-08-21 19:32

I switched my GTX 580 to mfaktc-0.19. I can now load the GPU to 97% with three mfaktc threads where I needed four in version 0.18.

Thank you very much.

Edit: the M/s on each of the three threads is about 157.

James Heinrich 2012-08-21 19:48

[QUOTE=TObject;308812]the M/s on each of the three threads is about 157.[/QUOTE]If you care to fiddle with mfaktc.ini::ProgressFormat a bit, GHz-days/day (%g) is probably more useful to look at than M/s (%r). This is the number you really want to maximize.

edit: Oh, wait, the new default config line [I]does[/I] have %g in there.

TObject 2012-08-21 20:19

[QUOTE=James Heinrich;308813]If you care to fiddle with mfaktc.ini::ProgressFormat a bit, GHz-days/day (%g) is probably more useful to look at than M/s (%r). This is the number you really want to maximize.

[/QUOTE]

Sure. Thanks.

Three threads:
16-minute assignment - 94 GHz-days/day
8-hour assignment - 97 GHz-days/day
8-day assignment - 96 GHz-days/day

GPU load: 97%
GPU clock: 797 MHz
GPU Memory Clock: 2025 MHz

GTX 580

SievePrimes has auto-adjusted down to 2000 on all three threads. That probably means there is more performance to milk out of the GPU, but still we are somewhat close to maximum with only three threads.

That is with me using the computer (web browsing and email at the moment).

kladner 2012-08-21 21:21

Thanks for pointing out these options, James. Showing g-d/d is a welcome addition.

lalera 2012-08-28 13:28

i have a question
 
is it possible to make a lessclasses version of mfaktc v 0.19
for win7 64bit ?
especially because of the new and fast 76bit kernel

TheJudger 2012-08-30 16:34

I have an answer
 
[QUOTE=lalera;309519]is it possible to make a lessclasses version of mfaktc v 0.19
for win7 64bit ?
especially because of the new and fast 76bit kernel[/QUOTE]

Yes, it is possible!
Windows executables with "Less Classes": [url]http://www.mersenneforum.org/mfaktc/mfaktc-0.19.win.cuda42.LessClasses.zip[/url]

Oliver

NormanRKN 2012-08-30 17:25

hi!
what is the difference between the lessclasses and the stock version ?

TheJudger 2012-08-30 17:53

Hi Norman,

the "LessClasses" version splits the job into 96 of 420 classes instead of 960 of 4620 classes.
This version is good for short running jobs (below 1s per class with the default version) because of less overhead for class switches, at the cost of a slightly lower sieve efficiency (because multiples of 11 are not avoided with the residue classes).
This option is a compiletime option in src/params.h.
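
The class counts can be reproduced with a short Python sketch. It relies on two standard facts: a factor of 2^p - 1 has the form q = 2kp+1, and q must be +1 or -1 (mod 8). A class of k survives if q passes the mod 8 test and is not divisible by any of the small primes built into the modulus. The function below is an illustration only and is not taken from the mfaktc source.

```python
def surviving_classes(p, small_primes):
    """Count residue classes of k, modulo 4 * product(small_primes), for
    which q = 2*k*p + 1 can still be a factor of 2^p - 1.

    Returns (number of surviving classes, modulus).
    """
    modulus = 4
    for r in small_primes:
        modulus *= r
    survivors = 0
    for k in range(modulus):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):        # factors must be +1 or -1 (mod 8)
            continue
        if any(q % r == 0 for r in small_primes):
            continue                   # q divisible by a small prime
        survivors += 1
    return survivors, modulus


if __name__ == "__main__":
    p = 66362159  # exponent used in the benchmark tables earlier in the thread
    print(surviving_classes(p, [3, 5, 7]))      # LessClasses build
    print(surviving_classes(p, [3, 5, 7, 11]))  # default build
```

The first call gives 96 of 420 and the second 960 of 4620. That is about 22.9% of all k against 20.8%, so the smaller modulus keeps the multiples of 11 that the default build discards for free, which is presumably the efficiency cost mentioned above.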

Oliver

NormanRKN 2012-08-30 18:00

thx Olli,

but I can use it for long running jobs too without any trouble ?

Norman

TheJudger 2012-08-30 18:04

Hi Norman,

yes, for sure. Worst case it runs a little bit slower and/or puts a higher demand on the CPU.

Oliver

Dubslow 2012-08-30 18:04

[QUOTE=NormanRKN;309760]thx Olli,

but I can use it for long running jobs too without any trouble ?

Norman[/QUOTE]

Yes, but it will be slower than the default version. It's faster for very short jobs, slower for the work that GIMPS usually does.

NormanRKN 2012-08-30 18:16

aaah ok!!
is there a range where "short" starts or ends to know what versions I should use or is it dependent on hardware (speed) ?

Norman

Dubslow 2012-08-30 18:22

[QUOTE=NormanRKN;309765]is there a range where "short" starts or ends to know what versions I should use or is it dependent on hardware (speed) ?[/QUOTE]

[QUOTE=TheJudger;309758]This version is good for short running jobs (below 1s per class with the default version) [/QUOTE]
^^^

kladner 2012-08-30 18:25

Doesn't shorter run time correspond to higher exponents?

Dubslow 2012-08-30 18:28

[QUOTE=kladner;309769]Doesn't shorter run time correspond to higher exponents?[/QUOTE]

Typically, yes, although doing lower bit levels decreases the run-time as well. It helps that the higher exponents are at a lower bit level anyways, so you get "double the savings" so to speak.
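
A rough way to see the "double the savings" effect: candidates have the form q = 2kp+1, so the number of k values inside one bit level is about 2^bits / (2p). The sketch below only counts raw candidates and ignores class selection and sieving, which shrink both counts by a similar fraction; the function name is made up.

```python
def raw_candidates(p, bit_lo, bit_hi):
    """Approximate number of k with 2^bit_lo <= 2*k*p + 1 < 2^bit_hi."""
    return (2**bit_hi - 2**bit_lo) // (2 * p)


if __name__ == "__main__":
    # The two exponents from the benchmark tables earlier in the thread.
    for p in (66362159, 3321932839):
        for bits in (70, 71):
            n = raw_candidates(p, bits, bits + 1)
            print(f"M{p} from 2^{bits} to 2^{bits + 1}: about {n:.3e} candidates")
```

Doubling the exponent halves the work for a given bit level, and dropping one bit level halves it again.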

NormanRKN 2012-08-30 18:57

OK, I understand.
thank you guys !

Norman

LaurV 2012-08-31 04:04

[QUOTE=TheJudger;309750]Yes, it is possible!
[/QUOTE]
How about a faster 67.13 bit kernel (whatever, but not more than 67 bits in factors, and faster :razz:) that runs on small expos (like the one distributed before to bcp, me, and a few others)? If you ever put it on your todo list, don't forget to PM me a link to it.

[edit: and to be on topic: less classes versus normal: The bottom line is that, if you are a nitpicker/pettifogger like me, you have to test and compare both versions on your particular system. The "less classes" version would be better for more ranges on a system with a low-end GPU and a top-class CPU (as Oliver said, it is more CPU intensive for sieving). When I did the 332M-333M ranges to 70 bits for Uncwilly, I compared the two. My system is heavily strangled by CPU power: the i7-2600k can't keep up with all the GPUs I have in it (usually 2 gtx580, occasionally a third or a tesla), and to max out the GPUs when I run mfaktc, I must run ONLY mfaktc; as soon as I start anything else, like P95, aliqueit, etc., GPU occupancy goes down. So on my system the "normal" version still performs much better for those expo ranges, over 65 bits [edit2: and 0.19 with its lower SievePrimes performs even BETTER]. In fact, the "less classes" version is faster under 65 bits, but it makes no sense to use it, as mfaktc will do (for this range) the 0-68 bits as an ALL-IN-ONE chunk, then another two chunks for 69 and 70 bits.

tl;dr: if you plan to do mfaktc intensively, do a bit of tuning first. You may be surprised by what your system can do :smile:
end of edit]

James Heinrich 2012-08-31 11:59

[QUOTE=LaurV;309823]If you ever put it on your todo list, don't forget to PM me a link to it.[/QUOTE]Me too! I'm also one of those who like to wade through small factors.

lycorn 2012-08-31 23:19

[QUOTE=James Heinrich;309844]small factors.[/QUOTE]

You mean small exponents, don't you?
I think you and LaurV are referring to the version that allows testing expos <1M, and yes, I would also like to get such a version.

James Heinrich 2012-09-01 00:31

[QUOTE=lycorn;309886]You mean small exponents, don't you?
I think you and LaurV are referring to the version that allows testing expos <1M, and yes, I would also like to get such a version.[/QUOTE]No, (at least for myself) I'm referring to looking for small [i]factors[/i] on large exponents. For example I've cleared most of the [URL="http://mersenne.info/trial_factored_tabular_data/2/800000000/"]801M range[/URL] to 64-bit, 65-bit, etc; now working up to 70-bit. Or, even more pronounced, pre-factoring in the [URL="http://www.mersenneforum.org/forumdisplay.php?f=50"]OBD[/URL] range (~3322M) to clear out the super-tiny (<60-bit) factors.

LaurV 2012-09-01 07:43

I was indeed talking about small exponents (see my post), the same version of the software lycorn mentioned. That software was distributed to a "trusted" lot of crunchers (and I am proud that Oliver put me on that lot), and it was used to look for factors of Mersenne numbers with exponents between 2K and 1M, from 60 to 65 bits. I personally took a series of exponents which had not had much ECM done on them from 60 to 63 bits. I still occasionally use the version of that code I have, based on mfaktc 0.18, when some corner of the GPU is free. It is very slow, first of all because 63 bits means a lot for those small expos (the same amount of work as a 70-76 bit assignment for an LL-front exponent); and since we are not targeting bit levels higher than, say, 65, a lot of improvement could be made there too.

Unfortunately the biggest problem of that range is not the bit level but the sieving process: you have to be careful how high you sieve the classes to avoid eliminating factors (never sieve with primes higher than 2*p; in fact the program never sieves with primes higher than p, which could be improved too, by selecting 2*p if p is 3 (mod 4) or 6*p if p is 1 (mod 4)). This leaves behind a lot of candidates for exponentiation, and if the programmer is not careful, mfaktc can run into memory troubles. Handling those things is difficult and makes the program slower. Oliver did a lot of work to be able to lower the exponents so much (to 2k, instead of 1M as in the default mfaktc). Using this version for "normal work" is much slower; it is only meant for guys who wanna waste their time looking for factors of Mersenne numbers with small prime exponents (as I already said on the forum, this is somehow "wasting time": due to the amount of ECM done on that range, there should be no factors below 2^100 or so remaining undiscovered. Our fun lies in raising the "how far factored" level and eventually finding an "ECM miss"... well, this has never happened up to now, but it would be a nice headline!).
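
The 2*p and 6*p limits come from the shape of the factors: any factor q of 2^p - 1 is of the form 2kp+1 and is +1 or -1 (mod 8), so the smallest value that can possibly be a factor is 2p+1 when p is 3 (mod 4) and 6p+1 when p is 1 (mod 4). A sieve prime at or above that value could be a real factor and would then cross itself out. A minimal Python sketch of that bound follows; the function name is illustrative and nothing here is taken from the mfaktc source.

```python
def smallest_candidate(p):
    """Smallest q = 2*k*p + 1 (k >= 1) with q = +1 or -1 (mod 8).

    For an odd prime p no factor of 2^p - 1 can be smaller than this,
    so sieving with primes below it can never remove a real factor.
    """
    k = 1
    while True:
        q = 2 * k * p + 1
        if q % 8 in (1, 7):
            return q
        k += 1


if __name__ == "__main__":
    for p in (11, 37):
        q = smallest_candidate(p)
        # pow(2, p, q) == 1 means q divides 2^p - 1.
        print(f"p={p} (p mod 4 = {p % 4}): smallest candidate {q}, "
              f"divides M{p}: {pow(2, p, q) == 1}")
```

For p = 11 the smallest candidate is 23 = 2*11+1 and for p = 37 it is 223 = 6*37+1; both happen to be real factors, which is exactly the case where sieving too high would lose one.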

What James is talking about is a different story. He wants to find all [B]small factors[/B] very fast, for high exponents. I already sent him a list with all factors from 0 to 37 bits and expos from 0 to 10G (it took me almost one day to upload it on his server!), and I am on the way to add more bits to it, but the uploading process is very slow, and we are talking about many gigabytes of data.

BTW, James, if you are only interested in exponents below 2^32 (4G29), then the current version (normal build) of mfaktc 0.19 can do this and is very fast.

James Heinrich 2012-09-01 14:34

[QUOTE=LaurV;309914]BTW, James, if you are only interested in exponents below 2^32 (4G29), then the current version (normal build) of mfaktc 0.19 can do this and is very fast.[/QUOTE]Too fast. Even running 6 instances of mfaktc, and taking exponents up to "only" 2^64 I can't get above about 80% GPU usage, and throughput estimates are wildly all over the place (from 10 to 80 GHz-days/day per instance, jumping like mad): a sure sign of inefficiency (lack of buffer or I don't know what, but certainly not optimal).

LaurV 2012-09-01 16:14

@Oliver: small cosmetic for 0.19 less classes version (I only tested win64): it still displays 4620 classes (like 0/4620, 1/4620 .... 419/4620). better check that compiler option against hard coded screen messages :razz: Otherwise, it seems to work wonderfully well.

lycorn 2012-09-01 17:03

[QUOTE=James Heinrich;309891]No, (at least for myself) I'm referring to looking for small [I]factors[/I] on large exponents.[/QUOTE]
OK, got it.

@LaurV: Would you be so kind as to send me a Win7 64 bit exe, in case you have one? (Provided Oliver doesn't object to it.) I am using a GTX560Ti (CC 2.1), and CUDA version 4.2. I would like to give it a go from time to time, just for kicks.
If it's OK with you both, I'll PM you an email address.
Thx

TheJudger 2012-09-01 18:11

[QUOTE=LaurV;309945]@Oliver: small cosmetic for 0.19 less classes version (I only tested win64): it still displays 4620 classes (like 0/4620, 1/4620 .... 419/4620). better check that compiler option against hard coded screen messages :razz: Otherwise, it seems to work wonderfully well.[/QUOTE]

Well, you can adjust this in mfaktc.ini, it is a runtime option. But you're right, I should have adjusted this in the default config.

Oliver

TheJudger 2012-09-04 12:22

Are there [B]real world situations[/B] where a single core of a CPU can't feed a single CC 1.x GPU (running mfaktc, of course)?

Oliver

P.S. I'm not talking about a theoretical rig with a crappy Netburst P4 on low clocks driving a Hyperclocked GTX 285 running very low exponents.

James Heinrich 2012-09-04 13:00

[QUOTE=TheJudger;310289]Are there [B]real world situations[/B] where a single core of a CPU can't feed a single CC 1.x GPU (running mfaktc, of course)?[/QUOTE]I guess I live where real world meets theory: I'm playing with small TF on large exponents (2[sup]64[/sup] on >1000M) and mfaktc thrashes madly to try and keep up.

On 8800GT (CC 1.1) with i7-920 @3.2GHz, two mfaktc instances get 98% GPU usage, but at SP=2000 and "wait" value of about 55.

On GTX 570 (CC 2.0) on i7-3930K @ 4.2GHz it's a similar story in this range, it takes 3 instances at SP=2000 to get up to 96% GPU usage, but overall throughput is better with 6 instances.

Both examples running v0.19-lessclasses; Grid=0 on the 8800GT and Grid=3 on the GTX 570.

And yes, I know this doesn't really qualify as "real world" usage, but since you asked... :smile:

lalera 2012-09-04 18:02

the performance of my machine
sbe 3930k at 3.5ghz with a gtx580 at 810mhz
mfaktc v0.19 lessclasses
default-mfaktc.ini
exponents about 865400000
bit 65 to 66
6 instances
about 13 seconds per candidate
gpu at 98%

lalera 2012-09-04 20:45

the performance of
q6600 with gtx260
one instance
--
0.19
Factor=N/A,370000123,65,66
45m/sec
1:20min
gpu 64%
--
0.19 lessclasses
Factor=N/A,370000123,65,66
59m/sec
45 sec
gpu 88%
--

Prime95 2012-09-04 21:01

I think the time may have come to no longer worry much about optimizing mfaktc for CC 1.X GPUs.

frmky 2012-09-09 20:11

[QUOTE=Prime95;310325]I think the time may have come to no longer worry much about optimizing mfaktc for CC 1.X GPUs.[/QUOTE]

My poor S1070. :cry:

ixfd64 2012-09-09 20:52

[QUOTE=Prime95;310325]I think the time may have come to no longer worry much about optimizing mfaktc for CC 1.X GPUs.[/QUOTE]

I'm guessing it won't be long now until the code is integrated into Prime95?

Prime95 2012-09-09 23:02

[QUOTE=ixfd64;310905]I'm guessing it won't be long now until the code is integrated into Prime95?[/QUOTE]

I have no intention of doing that. I had thought about defining a plug-in interface for programs like mfaktc, but I've been sidetracked doing other work.

ixfd64 2012-09-10 03:51

So does this mean Prime95 won't have built-in GPU code in the short term?

Dubslow 2012-09-10 05:36

[QUOTE=ixfd64;310937]So does this mean Prime95 won't have built-in GPU code in the short term?[/QUOTE]

The problem really is independent development. It's impractical to release a new Prime95 every time TheJudger or Bdot release a new mfakt* version. That would be the idea of a plug-in interface, but such a thing is decidedly non-trivial.

TheJudger 2012-09-11 16:15

Hi,

[QUOTE=frmky;310901]My poor S1070. :cry:[/QUOTE]

no, the situation is not that bad. The current development (0.20) does not run very well (slow performance) on CC 1.x GPUs [B]iff[/B] sieving is done on [B]G[/B]PU.
There will be new kernels in mfaktc 0.20 and depending on the factor size (above 2[SUP]76[/SUP]) you'll notice a performance improvement even on CC 1.x GPUs.
So the question is more about how much effort should be spent on CC 1.x GPUs. Currently I don't plan to drop support for CC 1.x GPUs, but I won't spend much time on optimizations for CC 1.x. Got the idea?
I was thinking about disabling GPU sieving on CC 1.x GPUs (just because it runs slow), this will save some time during testing.

Oliver
P.S. thank you George and rcv for GPU sieving, thank you George for the new kernels :smile:

ET_ 2012-09-12 07:56

[QUOTE=TheJudger;311146]
P.S. thank you George and rcv for GPU sieving, thank you George for the new kernels :smile:[/QUOTE]

Thank you Oliver for not dropping cc 1.3 :bow:

Luigi

aketilander 2012-09-14 17:27

Resume from checkpoint file with mfaktc 0.19
 
[COLOR=black][FONT=Verdana][I]I have a question, which may not be a big issue, but anyway:[/I][/FONT][/COLOR]
[COLOR=black][FONT=Verdana][I]A HDD was smoked on one of my systems, but happily I was able to rescue the intermediate results from mfaktc 0.19 from the old hard drive. I was happy because I am doing some work for Operation Billion Digits high up between 2^85 and 2^86.[/I][/FONT][/COLOR]
[COLOR=black][FONT=Verdana][I]Now I have installed a new disk and a fresh copy of Windows on that disk.[/I][/FONT][/COLOR]
[COLOR=black][FONT=Verdana][I]When I tried to resume from the last checkpoint file, mfaktc 0.19 refused to resume and insisted that I begin trial factoring from the beginning. This would mean that I would lose a couple of weeks of work, so I wonder if there is a way to resume?[/I][/FONT][/COLOR]

[B]Sorry, I just looked at the content of the checkpoint files and there are only blank spaces, no real content. So the old hard disk was in a worse state than I thought! Nothing to do then, sorry for taking your time.[/B]

c10ck3r 2012-10-06 19:58

My Pentium D 940 CPU can't feed my GTX 460 fully. I can get 24 GHz-days/day running 24/7 instead of however many it should get (~120).

TheJudger 2012-10-06 21:50

c10ck3r: I guess the poor performance is related to the relatively small L1 cache size. You'll like mfaktc 0.20 once it is finished.

Oliver

Btw: CPU launch date Q1'06 vs. Q3'10 for the GTX 460, this is a difference of four and a half years...

c10ck3r 2012-10-07 02:13

It is technically my mother's desktop, with my GPU and PSU... This summer I hope to add a desktop of my own...

bcp19 2012-10-10 01:56

I had a similar problem with a Core2Quad and a 480, could not max out the GPU even with all 4 cores running mfaktc. A single core of my 2500 outproduced the entire quad on the 480.

ixfd64 2012-11-04 00:40

Could we please have Prime95-style timestamps accompany the results? That would be quite useful!

Edit: Never mind, I found out that I can configure it in mfaktc.ini.

flashjh 2012-11-04 01:17

I would also like to see the option to do sieve on the GPU only like mmff. How hard is that?

James Heinrich 2012-11-04 02:26

[QUOTE=flashjh;316901]I would also like to see the option to do sieve on the GPU only like mmff. How hard is that?[/QUOTE][url=http://www.mersenneforum.org/showpost.php?p=311146&postcount=1904]Oliver's post[/url] from 2 months ago seems to indicate it's planned for inclusion in mfaktc v0.20, and presumably working (at least in alpha form) at that time.

flashjh 2012-11-04 02:31

[QUOTE=James Heinrich;316905][URL="http://www.mersenneforum.org/showpost.php?p=311146&postcount=1904"]Oliver's post[/URL] from 2 months ago seems to indicate it's planned for inclusion in mfaktc v0.20, and presumably working (at least in alpha form) at that time.[/QUOTE]
That's right, I forgot. I know it won't be as fast, but it will be nice to use the CPU for something else.

dbaugh 2012-11-04 10:35

The last mention I see of this problem was 10 months ago. It is starting to bug me that when I find big factors with mfaktc I get tiny ECM credit. Is there some trick I need to know to submit factors and get the appropriate credit?

submitted
M980261 no factor from 2^62 to 2^63, We4: 096B068C
M864917 no factor from 2^62 to 2^63, We4: 086205CA
M2831687 has a factor: 419541074737331003927 [TF:68:69:mfaktc 0.18 barrett79_mul32]
found 1 factor for M2831687 from 2^68 to 2^69 [mfaktc 0.18 barrett79_mul32]

response
No factor lines found: 2
Processing result: M980261 no factor from 2^62 to 2^63, We4: 096B068C
CPU credit is 1.0014 GHz-days.
Processing result: M864917 no factor from 2^62 to 2^63, We4: 086205CA
CPU credit is 1.1350 GHz-days.
Mfaktc no factor lines found: 0
Mfakto no factor lines found: 0
Factors found: 1
Processing result: M2831687 has a factor: 419541074737331003927
Insufficient information for accurate CPU credit.
For stats purposes, assuming factor was found using ECM with B1 = 50000.
CPU credit is 0.1150 GHz-days.
P-1 lines found: 0
LL lines found: 0
Mlucas lines found: 0
Glucas (G29) lines found: 0
Glucas lines found: 0
MacLucasFFTW lines found: 0
CUDALucas lines found: 0
ECM lines found: 0
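
Independently of how the server assigns credit, the factor line itself is easy to check. The sketch below uses Python's built-in three-argument pow for modular exponentiation; the function and dictionary key names are made up for illustration and have nothing to do with the PrimeNet parser.

```python
def check_mersenne_factor(p, q):
    """Basic sanity checks for a claimed factor q of 2^p - 1."""
    return {
        "divides": pow(2, p, q) == 1,             # q | 2^p - 1
        "form_2kp_plus_1": (q - 1) % (2 * p) == 0,
        "plus_minus_1_mod_8": q % 8 in (1, 7),
        "bits": q.bit_length(),                   # 69 means 2^68 <= q < 2^69
    }


if __name__ == "__main__":
    # Exponent and factor copied from the submitted result line above.
    result = check_mersenne_factor(2831687, 419541074737331003927)
    for name, value in result.items():
        print(f"{name}: {value}")
```

A bit length of 69 matches the TF:68:69 range in the result line, so the factor really does belong to the 2^68 to 2^69 trial factoring level rather than to a small ECM run.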

LaurV 2012-11-04 10:57

Try adding a "no factor" line produced by mfaktc BEFORE the line showing the factor. Make a generic one, even for a different exponent, an existing result, etc., if you don't have a "real" one. You may not get credit for it, but it will tell the parser that the following result (the "factor found" line) was done by mfaktc, so you get the proper credit.

OTOH, are you wasting CPU time to do TF ?!? If you want to TF lower expos (under 1M) then ask Oliver for the version of mfaktc that does lower expos, and you'll be 10 times faster at doing that (assuming you have a GPU; otherwise, better use the CPU ticks to do P-1).

axn 2012-11-04 11:05

[QUOTE=LaurV;316934]OTOH, are you wasting CPU time to do TF ?!? If you want to TF lower expos (under 1M) then ask Oliver for the version of mfaktc that does lower expos, and you'll be 10 times faster at doing that (assuming you have a GPU; otherwise, better use the CPU ticks to do P-1).[/QUOTE]
Doing TF on low expos is a waste of time. PERIOD. Just do ECM, if you must.

dbaugh 2012-11-04 11:20

Sent him a PM with that very request yesterday. I enjoy working on small exponents. It may be nostalgia for the early days. I get perverse pleasure from finding a factor for an exponent that I first LL'ed back in 1997.

I have a 560Ti, 580 and two 7970's, plus 16 other boxes without GPUs.

My pre-V4 ID did not get moved over due to a lapse in activity during the critical time. Hence, the 2005 join date.

flashjh 2012-11-04 14:30

[QUOTE=dbaugh;316937]Sent him a PM with that very request yesterday. I enjoy working on small exponents. It may be nostalgia for the early days. I get perverse pleasure from finding a factor for an exponent that I first LL'ed back in 1997.

I have a 560Ti, 580 and two 7970's, plus 16 other boxes without GPUs.

My pre-V4 ID did not get moved over due to a lapse in activity during the critical time. Hence, the 2005 join date.[/QUOTE]
How do your 7970s perform for GPU TF? What processor do you drive them with?

kracker 2012-11-04 14:48

[QUOTE=flashjh;316951]How do your 7970s perform for GPU TF? What processor do you drive them with?[/QUOTE]

I believe the 7970 does about 330 GHz-days/day, the GHz Edition about 370.

Just FYI, a 7970 is about $400, the GHz Edition about $450.
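For anyone weighing the two cards, the figures above work out to roughly the same throughput per dollar. A quick sketch of the arithmetic, using only the approximate numbers from this post (the dictionary layout and function name are just for illustration):

```python
# Approximate figures quoted in this post, not measured benchmarks.
cards = {
    "7970": {"ghz_days_per_day": 330, "price_usd": 400},
    "7970 GHz Edition": {"ghz_days_per_day": 370, "price_usd": 450},
}


def throughput_per_dollar(card):
    """GHz-days/day of TF output per US dollar of purchase price."""
    return card["ghz_days_per_day"] / card["price_usd"]


for name, card in cards.items():
    print(f"{name}: {throughput_per_dollar(card):.3f} GHz-days/day per USD")
```

This ignores electricity and the CPU needed to feed the card, both of which matter for a real purchase decision.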

flashjh 2012-11-04 14:57

[QUOTE=kracker;316954]I believe the 7970 does about 330 GHz-days/day, the GHz Edition about 370.

Just FYI, a 7970 is about $400, the GHz Edition about $450.[/QUOTE]
I ask because on [URL="http://www.mersenne.ca/mfaktc.php"]James' site[/URL] the 7970 is 1 above the 580 on the list. I'm wondering what CPU is driving the 7970 and what real performance output is achieved.

James' list shows a 580 @ 287.5GHz-days/day. I actually get ~400GHz-days/day.

James Heinrich 2012-11-04 15:00

[QUOTE=flashjh;316955]I ask because on [URL="http://www.mersenne.ca/mfaktc.php"]James' site[/URL] the 7970 is 1 above the 580 on the list.
James' list shows a 580 @ 287.5GHz-days/day. I actually get ~400GHz-days/day.[/QUOTE]Please submit your benchmarks (submission form at the top of that page).

flashjh 2012-11-04 15:10

[QUOTE=James Heinrich;316956]Please submit your benchmarks (submission form at the top of that page).[/QUOTE]
It's been a while... do you have a standard test you want run? Just one instance with nothing else running, correct?

kracker 2012-11-04 15:48

[QUOTE=flashjh;316958]It's been a while... do you have a standard test you want run? Just one instance with nothing else running, correct?[/QUOTE]

Looking [URL="http://mersenne.ca/mfaktc.php#"]here[/URL], it seems there is a submission form.

James Heinrich 2012-11-04 16:18

[QUOTE=flashjh;316958]It's been a while... do you have a standard test you want run? Just one instance with nothing else running, correct?[/QUOTE]The details are on the [url=http://www.mersenne.ca/mfaktc.php#benchmark]submission form[/url].
Single instance of mfakt_, SievePrimes set to 5000.

dbaugh 2012-11-04 19:52

The GPUs are in fast boxes. 3770, 3960x, etc. Not the best bang for the buck, but I figure I have a finite amount of wall time to wait on results. I could use a net and fish for herring to feed the village or have a blast sportfishing. I run as many instances as it takes to saturate the GPU. I use minimal sieve primes. It makes the GPU work harder, but that is sometimes what it takes to get to 98%. I run two instances of prime95 with the extra CPU cycles at the same time. Just looking at counts in results files over time, for similar exponent sizes and bit depths the 580 is 2 to 3 times faster than the 7970, at least where I am currently looking. I need to calculate a GHz-days/day per box metric. My next dream machine will have a 590. I retire older boxes when the cost of the electricity to run them would be better spent on a faster box. Only you folks would believe the heat issues with this much hardware.

flashjh 2012-11-04 20:04

[QUOTE=dbaugh;316977]The GPUs are in fast boxes. 3770, 3960x, etc. Not the best bang for the buck, but I figure I have a finite amount of wall time to wait on results. I could use a net and fish for herring to feed the village or have a blast sportfishing. I run as many instances as it takes to saturate the GPU. I use minimal sieve primes. It makes the GPU work harder, but that is sometimes what it takes to get to 98%. I run two instances of prime95 with the extra CPU cycles at the same time.[/QUOTE]
Same here. I have two boxes with 2700Ks and two with 3770Ks. I set SP min to 2000 and let the system autoadjust. It takes 6 instances plus P95 running to balance everything. I could stop P95 and run one or two more instances, but I like to run P-1 also. I also have one AMD six-core 1055T with a 580. It was my first 580 system and I went cheaper to save some $$, but that AMD CPU can hardly max out the 580. I can't run P95 on that system or it's unusable and GHz-days/day drops too much. My next purchase is another 3770K and a MB to match so I can max that system too.

[QUOTE]Just looking at counts in results files over time, for similar exponent sizes and bit depths the 580 is 2 to 3 times faster than the 7970, at least where I am currently looking. I need to calculate a GHz-days/day per box metric.[/QUOTE]
I'm going to run some benchmarks over the next few days to update James' site for 580s. In reality, I don't think any AMD cards will outpace the 580 because mfakto doesn't run as efficiently as mfaktc. It takes a lot more CPU power to drive AMD cards, which is why I was curious what your actual GHz-days/day output was on the 7970. I would like to get a 7990 to run some comparisons, but $1000 is a bit steep for testing.

[QUOTE]My next dream machine will have a 590. I retire older boxes when the cost of the electricity to run them would be better spent on a faster box. Only you folks would believe the heat issues with this much hardware.[/QUOTE]

I have one 590 and I'm not really impressed. It's running mmff-gfn right now. It's two 580s in one, but is not anywhere near what you'd get from two 580s. Granted, in your 3960x you might get better results as I only have a 3770K @ 4.5GHz to drive it. I think it has a lot to do with the fact that it's only PCIe 2.0. Also, 590s are clocked down due to power and thermal constraints and have limited overclocking capability. If you want to swap two 580s for a 590, I'm game.

ixfd64 2012-11-27 06:22

I've set up my CUDA environment, but I get the following errors when I try to compile mfaktc 0.19: [see attachment]

Anyone know what I'm doing wrong?

Edit: I've changed the item type to CUDA C/C++ and the platform to VC90, and I've also installed Visual C++ 2008. However, it's still complaining of an issue with the "atomicInc" function. Anyone know how to resolve this?

TheJudger 2012-12-02 00:56

I guess the issue with the "atomicInc" comes from the GPU code, right?
Change the target code to at least sm_11 (sm_10 doesn't support atomicInc).

In the mfaktc makefiles for Windows it is this:
--generate-code arch=compute_11,code=sm_11 --generate-code arch=compute_20,code=sm_20 --generate-code arch=compute_30,code=sm_30
which is passed to nvcc.

Oliver
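If you build from an IDE instead of the makefiles, it can help to generate that flag string rather than type it. The following is a small sketch, not part of mfaktc: the function name and the idea of rejecting targets below sm_11 up front are my own, based only on the atomicInc requirement stated above.

```python
# Lowest compute capability that supports atomicInc, per the post above.
MIN_ATOMICINC_CC = (1, 1)


def gencode_flags(compute_capabilities):
    """Build nvcc --generate-code flags for (major, minor) capability pairs.

    Hypothetical helper: raises ValueError for targets that cannot
    compile code using atomicInc (anything below sm_11).
    """
    flags = []
    for major, minor in compute_capabilities:
        if (major, minor) < MIN_ATOMICINC_CC:
            raise ValueError(f"sm_{major}{minor} does not support atomicInc")
        tag = f"{major}{minor}"
        flags.append(f"--generate-code arch=compute_{tag},code=sm_{tag}")
    return " ".join(flags)


print(gencode_flags([(1, 1), (2, 0), (3, 0)]))
```

With the three targets shown, the output matches the flag line from the Windows makefiles; passing `(1, 0)` fails early instead of producing the atomicInc compile error.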

TheJudger 2012-12-06 22:46

Suggestions for a new default status line in mfaktc 0.20, please!

Using the default status line from 0.19 as baseline:[LIST][*][B]keep line width below 80 chars[/B][*]remove "candidates"[*]remove "avg. rate"[*]add "percent complete"[*]remove "class ID" (unsure)[*]add "GHz-days/day (GHz)" (unsure)[/LIST]
Oliver

James Heinrich 2012-12-06 22:52

[QUOTE=TheJudger;320798]Suggestions for a new default status line in mfaktc 0.20, please![/QUOTE]This is what I've been using since it became configurable:[code]ProgressHeader= Date-Time Pct ETA | Exponent Bits | GHz-d/day Sieve Wait
ProgressFormat=%d %T %p %e | %M %l-%u | %g %s %W[/code]
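To try out a candidate status line against the 80-character limit without running mfaktc, something like the sketch below may help. It is not mfaktc's own formatter: the token meanings are guessed from the column headers above, the sample values are invented, and `expand` is a hypothetical helper.

```python
import re


def expand(fmt, values):
    """Replace each %<letter> token in fmt with values[letter]; %% gives a literal %."""
    def repl(match):
        key = match.group(1)
        if key == "%":
            return "%"
        return str(values[key])
    return re.sub(r"%([A-Za-z%])", repl, fmt)


FORMAT = "%d %T %p %e | %M %l-%u | %g %s %W"

# Invented sample values; the meaning of each token is an assumption
# read off the ProgressHeader columns.
sample = {
    "d": "Dec 06", "T": "22:52", "p": "12.5%", "e": "1h23m",
    "M": "66362159", "l": "70", "u": "71",
    "g": "287.50", "s": "5000", "W": "0.5%",
}

line = expand(FORMAT, sample)
print(line)
print(len(line), "characters")
```

Using worst-case widths for each sample value (longest exponent, largest rate) gives a better check of whether the line stays below 80 characters.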

ixfd64 2012-12-06 23:21

I think the current default output is good as is, except maybe "SievePrimes" is a little long. Just "Sieve" is adequate.

Dubslow 2012-12-07 00:05

[QUOTE=ixfd64;320804]I think the current default output is good as is, except maybe "SievePrimes" is a little long. Just "Sieve" is adequate.[/QUOTE]

Not very descriptive. Perhaps "SieveSize"? It's shorter than SievePrimes, though not by much.

kladner 2012-12-07 01:24

[QUOTE=James Heinrich;320799]This is what I've been using since it became configurable:[code]ProgressHeader= Date-Time Pct ETA | Exponent Bits | GHz-d/day Sieve Wait
ProgressFormat=%d %T %p %e | %M %l-%u | %g %s %W[/code][/QUOTE]

Here's mine, not that I'm stuck on it. I did tweak it to get the columns aligned-

[CODE]ProgressHeader= class | candidates | time | ETA | GHz-d/d | SievePrimes | CPU wait
ProgressFormat=%C/4620 | %n | %ts | %e |%g-d/d| %s | %W%%
[/CODE]
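A note on why "%C/4620" is not the same thing as percent complete: only some of the 4620 classes can contain factors at all. A factor q = 2kp+1 of a Mersenne number must be 1 or 7 mod 8, and candidates divisible by 3, 5, 7 or 11 cannot be prime. The sketch below counts the surviving classes for M66362159 (the exponent from the benchmark earlier in the thread). That mfaktc filters its classes exactly this way is my assumption, and both function names are made up.

```python
def classes_to_test(p, num_classes=4620):
    """Return the class IDs k (mod num_classes) that can still hold factors
    q = 2*k*p + 1 of the Mersenne number with exponent p."""
    small_primes = (3, 5, 7, 11)
    keep = []
    for k in range(num_classes):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):
            continue  # Mersenne factors are always +-1 mod 8
        if any(q % s == 0 for s in small_primes):
            continue  # every q in this class is divisible by a small prime
        keep.append(k)
    return keep


def percent_complete(class_id, surviving):
    """Percent of surviving classes finished once class_id is done."""
    done = sum(1 for k in surviving if k <= class_id)
    return 100.0 * done / len(surviving)


p = 66362159
surviving = classes_to_test(p)
print(len(surviving), "of 4620 classes need testing")
print(f"{percent_complete(2310, surviving):.1f}% done after class 2310")
```

So a percent-complete column has to count finished classes against the surviving total, not against 4620.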

LaurV 2012-12-07 01:56

Add sieving with the GPU :razz: (I think this is the main point in going to 0.20 anyhow, but I just want to state it clearly, so you understand what I need, hehe, so I can free my CPU cores). :wink: Eagerly waiting for the beta version.

kladner 2012-12-07 02:03

:goodposting::w00t: Right On!

RichD 2012-12-07 06:12

[QUOTE=TheJudger;311146]The current development (0.20) does not run very well (slow performance) on CC 1.x GPUs [B]iff[/B] sieving is done on [B]G[/B]PU.
There will be new kernels in mfaktc 0.20 and depending on the factor size (above 2[SUP]76[/SUP]) you'll notice a performance improvement even on CC 1.x GPUs.
...
P.S. thank you George and rcv for GPU sieving, thank you George for the new kernels :smile:[/QUOTE]

How does this affect one's decision on which nvidia card to acquire moving forward?
CC vs. raw speed??

Dubslow 2012-12-07 07:38

[QUOTE=RichD;320836]How does this affect one's decision on which nvidia card to acquire moving forward?
CC vs. raw speed??[/QUOTE]

For maximum speed, get CC 2.0 (not 2.1). That means a 570 or a 580 (most of the lower cards in 5xx are CC 2.1). The most recent series, 6xx, are all CC 3.0, and all suck at CUDA, and are more expensive to boot. For example, a 560 Ti (with extra cores) [URL="http://www.mersenne.ca/mfaktc.php?sort=ghdpd&noA=1"]can keep pace[/URL] with a 680 (despite being CC 2.1).

Btw, is it really true that a 570 gets almost the same performance as a 580 for a fraction of the cost?

LaurV 2012-12-07 08:10

Sure. I was talking about this long before [URL="http://www.mersenne.ca/cudalucas.php"]this[/URL] and [URL="http://www.mersenne.ca/mfaktc.php?sort=jvr"]this[/URL] appeared (see the sorting of the table, for the last link). Batalov and others said so too. At that time I said that instead of buying 2x580, the best compromise of money/performance/electricity was to take 4 pieces of (Asus) "560 Ti Top" (which is in fact a 570 with some features cut, and overclocked to 1G or 950M; you get the same performance as a 570 due to the overclock, for about the same consumption, and a lower price) and fill a Rampage or whatever 4-PCIe-slot board with them. Then with the money left over you can still pay for the electricity, therefore running them "for free" for a few weeks.

Xyzzy 2012-12-07 12:35

[QUOTE]Btw, is it really true that a 570 gets almost the same performance as a 580 for a fraction of the cost?[/QUOTE]We heard of some weird guy who ran just four 570 cards for a while and that was enough to take him to #2 lifetime overall (at the time) for TF.

:mike:

