mersenneforum.org

mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

TheJudger 2011-12-10 00:13

[QUOTE=TheJudger;279285]Factory overclocked GTX 560Ti (1701MHz), barrett79 kernel, raw GPU speed (without sieving), M66362159 from 2[SUP]69[/SUP] to 2[SUP]70[/SUP][CODE]
                 | CUDA 3.2  | CUDA 4.1-RC1
mfaktc 0.17      | 260.94M/s | 261.93M/s
mfaktc 0.18-pre9 | 260.80M/s | 258.97M/s[/CODE][/QUOTE]

Factory overclocked GTX 560Ti (1701MHz), barrett79 kernel, raw GPU speed (without sieving), M66362159 from 2[SUP]69[/SUP] to 2[SUP]70[/SUP][CODE]
                  | CUDA 3.2  | CUDA 4.1-RC2
mfaktc 0.17       | 260.94M/s | 261.93M/s
mfaktc 0.18-pre10 | 260.80M/s | 265.39M/s[/CODE]

A little bit better than before :smile: but there are no changes in the code of the barrett79 kernel from -pre9 to -pre10...

Factory overclocked GTX 560Ti (1701MHz), barrett92 kernel, raw GPU speed (without sieving), M3321932839 from 2[SUP]79[/SUP] to 2[SUP]80[/SUP][CODE]
                  | CUDA 4.1-RC2
mfaktc 0.17       | 170.62M/s
mfaktc 0.18-pre10 | 173.32M/s[/CODE]

A little bit faster, too. But the difference between compute capability 2.0 and 2.1 increases further... :sad:

Oliver

TheJudger 2011-12-19 23:26

mfaktc 0.18
 
Hello!

[url]http://www.mersenneforum.org/mfaktc/mfaktc-0.18.tar.gz[/url]
[url]http://www.mersenneforum.org/mfaktc/mfaktc-0.18.win.zip[/url]
[url]http://www.mersenneforum.org/mfaktc/mfaktc-0.18.linux64.tar.gz[/url]

The executables need at least a [B]CUDA 4.0[/B] capable driver (270 series driver or newer). The Windows zip archive contains both the 32-bit and the 64-bit version. I'll upload new executables once [B]CUDA 4.1[/B] is publicly available. The sources should compile with older CUDA versions, too, but the resulting binaries might be slower. CUDA 4.1 will give another performance improvement for the barrett based kernels on compute capability 2.x GPUs (especially on 2.0).

Compared to mfaktc 0.17 there are more minor changes than usual. Highlights from the Changelog.txt:[LIST][*]autoadjustment of SievePrimes is now less dependent on the grid size and
absolute speed. Instead of measuring the absolute (average) time waited
per processing block (grid size), the relative time spent waiting
for the GPU is now calculated. In the per-class output "avg. wait" is replaced
by "CPU wait".[*]new command-line option: "-v" (verbosity) lets the user decide how much
information is printed
(suggested by aspen on [url]www.mersenneforum.org[/url])[*]"has a factor" result lines now contain more information (program name,
versions, bitlevel, ...). James Heinrich is working on this on the server
side. This should give more accurate credits for "has a factor" results
from the primenet server once this is fully implemented.[*]mfaktc no longer refuses to load a checkpoint file from a Linux version
with a Windows version of mfaktc and vice versa. Of course mfaktc still
refuses to load checkpoint files from other versions than itself
(identical version string!)[*]added a (simple) signal handler (captures SIGINT and SIGTERM).
1st ^C: mfaktc will exit after the currently processed class is finished.
2nd ^C: mfaktc will stop immediately[*]added a minimum delay between two checkpoint file writes. The user can set
the delay in mfaktc.ini (CheckpointDelay).[*]added a new code path to the barrett79_mul32 and barrett92_mul32 kernels: CUDA
>= 4.1 features multiply-add with carry for compute capability >= 2.0.
On my GTX 470 (compute capability 2.0) this yields up to 15% extra throughput for
barrett92_mul32 and up to 7% for barrett79_mul32.[/LIST]
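
To illustrate the SievePrimes item above, here is a minimal sketch of adjusting on the relative wait time rather than the absolute one. It is not the mfaktc code: the function names, the thresholds (`low`, `high`), the step size and the two bounds are assumptions picked for illustration. The idea is that a large share of time spent waiting means the CPU has spare capacity to sieve deeper, while almost no waiting means the CPU is the bottleneck.

```python
# Hypothetical sketch, not mfaktc's implementation.
# All constants and names below are illustrative assumptions.

SIEVE_PRIMES_MIN = 5_000     # assumed lower bound
SIEVE_PRIMES_MAX = 200_000   # assumed upper bound


def cpu_wait_fraction(wait_seconds: float, total_seconds: float) -> float:
    """Share of the wall time for one class that the CPU spent waiting for the GPU."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return wait_seconds / total_seconds


def adjust_sieve_primes(sieve_primes: int, wait_fraction: float,
                        low: float = 0.02, high: float = 0.10,
                        step: float = 0.125) -> int:
    """Return a new SievePrimes value based on the relative CPU wait."""
    if wait_fraction > high:
        # CPU idles a lot: let it sieve with more primes.
        sieve_primes = int(sieve_primes * (1 + step))
    elif wait_fraction < low:
        # CPU barely waits: it is the bottleneck, so sieve less.
        sieve_primes = int(sieve_primes * (1 - step))
    # Clamp to the assumed bounds.
    return max(SIEVE_PRIMES_MIN, min(SIEVE_PRIMES_MAX, sieve_primes))
```

Because the decision uses a ratio, the same thresholds behave the same way for a small or a large grid size, which is the point of the change described above.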
As usual: finish your current assignments with your current version and update afterwards; mfaktc 0.18 will refuse foreign checkpoint files.
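
For the two-stage ^C behaviour from the changelog, a rough sketch of the pattern (in Python for brevity, not the actual mfaktc handler) could look like this. `StopState`, `install` and `run_classes` are hypothetical names: the first SIGINT/SIGTERM only sets a flag that is checked after each class, the second one exits right away.

```python
import signal


class StopState:
    """Counts termination requests (e.g. Ctrl+C presses). Hypothetical helper."""

    def __init__(self):
        self.requests = 0

    def handle(self, signum, frame):
        self.requests += 1
        if self.requests >= 2:
            # Second request: stop immediately.
            raise SystemExit(1)

    @property
    def stop_after_class(self):
        # First request: finish the current class, then stop.
        return self.requests >= 1


def install(state):
    """Register the handler for SIGINT and SIGTERM (call from the main thread)."""
    signal.signal(signal.SIGINT, state.handle)
    signal.signal(signal.SIGTERM, state.handle)


def run_classes(classes, process, state):
    """Process classes in order; stop after the current one once a stop was requested."""
    done = []
    for c in classes:
        process(c)
        done.append(c)
        if state.stop_after_class:
            break
    return done
```

Typical use would be `state = StopState(); install(state); run_classes(classes, process, state)`, writing a checkpoint after the loop returns.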

Oliver

kladner 2011-12-20 00:43

Kudos!
 
Many thanks, sir! I am impatient for my current assignments to finish so that I can put this version into service.

Dubslow 2011-12-20 01:34

Would you mind posting the .dll/.so files on the mfaktc mirror? I'd rather not have to download the whole CUDA environment...

LaurV 2011-12-20 03:47

[QUOTE=TheJudger;282838]...mfaktc 0.18...[/QUOTE]
Output file (results.txt) customizable from the ini file? (including the path, for collecting all the results from all running processes of mfaktc in a single file).

diamonddave 2011-12-20 04:04

[QUOTE=TheJudger;282838][*]"has a factor" result lines now contain more information (program name,
versions, bitlevel, ...) James Heinrich is working on this on the server
side. This should give more accurate credits for "has a factor" results
from the primenet server once this is fully implemented.
[/QUOTE]

Many thanks! Can't wait to test this feature with a new exponent!

kladner 2011-12-20 05:05

The new version seems to be working well. At least, there have been no problems reported.

TheJudger 2011-12-20 11:13

[QUOTE=Dubslow;282851]Would you mind posting the .dll/.so files on the mfaktc mirror? I'd rather not have to download the whole CUDA environment...[/QUOTE]

They are included in the archives for the executables, aren't they?

[QUOTE=LaurV;282864]Output file (results.txt) customizable from the ini file? (including the path, for collecting all the results from all running processes of mfaktc in a single file).[/QUOTE]

Well, I'm still unsure about this feature. Personally I don't like it but it seems that you and some others want it. Bdot (mfakto) tries to convince me, too.

So I guess I'll add this for 0.19?

Oliver

James Heinrich 2011-12-20 13:55

[QUOTE=TheJudger;282896]Well, I'm still unsure about this feature. Personally I don't like it but it seems that you and some others want it. Bdot (mfakto) tries to convince me, too.[/QUOTE]I also think it would be good to have as a configurable option. Naturally you'll need to lock the file for writing for the split second it takes to write the result line so two instances don't try and write at the same time.
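
A minimal sketch of that locking idea, assuming a POSIX system (Python's `fcntl` module is not available on Windows, where a different locking call would be needed). `append_result_line` is a hypothetical helper, not an mfaktc function, and the result lines are placeholders:

```python
import fcntl
import os


def append_result_line(path, line):
    """Append one line to a shared results file while holding an exclusive lock."""
    with open(path, "a", encoding="utf-8") as f:
        # Blocks until no other process holds the lock on this file.
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        try:
            f.write(line.rstrip("\n") + "\n")
            f.flush()
            os.fsync(f.fileno())   # make sure the line is on disk before unlocking
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)
```

The lock is advisory, so it only helps if every instance writing to the file goes through the same routine.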

Along the same lines, a unified worktodo.txt would also be nice, perhaps split into [Worker #1], [Worker #2], etc sections. This is of course a little more work than a configurable results.txt, but lets us just deal with one in and one out for each machine, in a format that's already familiar to us from Prime95.
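
To make the suggestion concrete, here is a sketch of how such a sectioned file might be read. Everything here is an assumption for illustration: mfaktc has no such feature, `parse_worktodo` is a hypothetical name, the `Factor=exponent,from,to` lines follow the Prime95 style using the exponents benchmarked earlier in this thread, and treating `#` as a comment marker is my own choice.

```python
SAMPLE = """\
[Worker #1]
Factor=66362159,69,70

[Worker #2]
Factor=3321932839,79,80
"""


def parse_worktodo(text):
    """Group work lines by their [Worker #N] section header."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and (assumed) comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]          # e.g. "Worker #1"
            sections.setdefault(current, [])
        elif current is None:
            raise ValueError("work line before any [Worker #N] header: " + line)
        else:
            sections[current].append(line)
    return sections
```

Each instance would then pick only the lines under its own section.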

Even better would be to optimize/thread the sieving such that we'd only ever need to run a single mfaktc instance (sieving would spread across as many CPU cores as needed to feed the GPU(s)). But that's a whole other set of complications for a much later release. :smile:

Chuck 2011-12-20 14:47

Great! Thanks for the update. I've got two instances running now.

kladner 2011-12-20 15:07

.17 vs .18
 
1 Attachment(s)
This was a rather quick test showing the difference between mfaktc .17 and .18. V.18 did eventually drop to SievePrimes 5000, though the time didn't really change that much.

EDIT: These were run with the same exponent in single instances.

