[QUOTE=TheJudger;279285]Factory overclocked GTX 560Ti (1701MHz), barrett79 kernel, raw GPU speed (without sieving), M66362159 from 2[SUP]69[/SUP] to 2[SUP]70[/SUP][CODE]
                 | CUDA 3.2  | CUDA 4.1-RC1
mfaktc 0.17      | 260.94M/s | 261.93M/s
mfaktc 0.18-pre9 | 260.80M/s | 258.97M/s[/CODE][/QUOTE]
Factory overclocked GTX 560Ti (1701MHz), barrett79 kernel, raw GPU speed (without sieving), M66362159 from 2[SUP]69[/SUP] to 2[SUP]70[/SUP][CODE]
                  | CUDA 3.2  | CUDA 4.1-RC2
mfaktc 0.17       | 260.94M/s | 261.93M/s
mfaktc 0.18-pre10 | 260.80M/s | 265.39M/s[/CODE]
A little bit better than before :smile: but there are no changes in the code of the barrett79 kernel from -pre9 to -pre10...

Factory overclocked GTX 560Ti (1701MHz), barrett92 kernel, raw GPU speed (without sieving), M3321932839 from 2[SUP]79[/SUP] to 2[SUP]80[/SUP][CODE]
                  | CUDA 4.1-RC2
mfaktc 0.17       | 170.62M/s
mfaktc 0.18-pre10 | 173.32M/s[/CODE]
A little bit faster, too. But the difference between compute capability 2.0 and 2.1 increases further... :sad:

Oliver |
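For anyone who wants the "a little bit better" in numbers, here is a small Python sketch that turns the posted M/s figures into relative changes. The helper name `relative_change` is just something made up for this illustration. Note that the barrett79 comparison mixes two changes at once (pre9 to pre10 and CUDA 4.1-RC1 to RC2), so it does not isolate either one.

```python
def relative_change(old, new):
    """Percentage change in throughput going from old to new (both in M/s)."""
    return (new - old) / old * 100.0


# Throughput figures in M/s, copied from the benchmark tables in the post.
barrett79_pre9_rc1 = 258.97    # mfaktc 0.18-pre9,  CUDA 4.1-RC1
barrett79_pre10_rc2 = 265.39   # mfaktc 0.18-pre10, CUDA 4.1-RC2
barrett92_017_rc2 = 170.62     # mfaktc 0.17,       CUDA 4.1-RC2
barrett92_pre10_rc2 = 173.32   # mfaktc 0.18-pre10, CUDA 4.1-RC2

print(f"barrett79: {relative_change(barrett79_pre9_rc1, barrett79_pre10_rc2):+.2f}%")
print(f"barrett92: {relative_change(barrett92_017_rc2, barrett92_pre10_rc2):+.2f}%")
```

Both differences come out at a few percent at most, which fits the "a little bit" wording.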
mfaktc 0.18
Hello!
[url]http://www.mersenneforum.org/mfaktc/mfaktc-0.18.tar.gz[/url]
[url]http://www.mersenneforum.org/mfaktc/mfaktc-0.18.win.zip[/url]
[url]http://www.mersenneforum.org/mfaktc/mfaktc-0.18.linux64.tar.gz[/url]

The executables need at least a [B]CUDA 4.0[/B] capable driver (270 series driver or newer). The Windows zip archive contains both the 32-bit and the 64-bit version. I'll upload new executables once [B]CUDA 4.1[/B] is publicly available. The sources should compile with older CUDA versions, too, but they might be slower. CUDA 4.1 will give another performance improvement for the barrett based kernels on compute capability 2.x GPUs (especially on 2.0).

Compared to mfaktc 0.17 there are "more than usual" minor changes. Highlights from the Changelog.txt:[LIST]
[*]autoadjustment of SievePrimes is now less dependent on the gridsize and absolute speed. Instead of measuring the absolute (average) time waited per processing block (grid size), the relative time spent on waiting for the GPU is now calculated. In the per-class output "avg. wait" is replaced by "CPU wait".
[*]new commandline option: "-v" (verbosity) lets the user decide how much information is printed (suggested by aspen on [url]www.mersenneforum.org[/url])
[*]"has a factor" result lines now contain more information (program name, versions, bitlevel, ...). James Heinrich is working on this on the server side. This should give more accurate credits for "has a factor" results from the primenet server once this is fully implemented.
[*]mfaktc no longer refuses to load a checkpoint file from a Linux version with a Windows version of mfaktc and vice versa. Of course mfaktc still refuses to load checkpoint files from other versions than itself (identical version string!)
[*]added a (simple) signal handler (captures SIGINT and SIGTERM). 1st ^C: mfaktc will exit after the currently processed class is finished. 2nd ^C: mfaktc will stop immediately
[*]added a minimum delay between two checkpoint file writes. The user can set the delay in mfaktc.ini (CheckpointDelay).
[*]added a new code path to the barrett79_mul32 and barrett92_mul32 kernels: CUDA >= 4.1 features multiply-add with carry for compute capability >= 2.0. On my GTX 470 (compute capability 2.0) this yields up to 15% extra throughput for barrett92_mul32 and up to 7% for barrett79_mul32.[/LIST]
As usual: finish your current assignments with your current version and update afterwards; mfaktc 0.18 will refuse foreign checkpoint files.

Oliver |
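To picture how the two-stage ^C handling and the minimum checkpoint delay from the changelog could fit together, here is a rough Python sketch. It is not mfaktc's actual C code: the names `StopController`, `CheckpointThrottle`, `run_classes` and `install`, and the layout of the class dictionaries, are all invented for this illustration. Only the described behaviour (1st signal: finish the current class, 2nd signal: stop right away, plus a minimum gap between checkpoint writes) comes from the changelog.

```python
import signal
import time


class StopController:
    """Counts stop requests (for example Ctrl-C presses)."""

    def __init__(self):
        self.requests = 0

    def handle(self, signum, frame):
        # Signature required by signal.signal(); the handler only counts.
        self.requests += 1

    @property
    def stop_after_class(self):
        return self.requests >= 1

    @property
    def stop_now(self):
        return self.requests >= 2


class CheckpointThrottle:
    """Permits a checkpoint write only if min_delay seconds have passed."""

    def __init__(self, min_delay, clock=time.monotonic):
        self.min_delay = min_delay
        self.clock = clock
        self.last_write = None

    def should_write(self):
        now = self.clock()
        if self.last_write is None or now - self.last_write >= self.min_delay:
            self.last_write = now
            return True
        return False


def run_classes(classes, stop, throttle, process_block, write_checkpoint):
    """Process classes in order and return the ids of fully finished classes."""
    finished = []
    for cls in classes:
        for block in range(cls["blocks"]):
            if stop.stop_now:
                return finished      # second request: abandon the current class
            process_block(cls, block)
        finished.append(cls["id"])
        if throttle.should_write():
            write_checkpoint(cls["id"])
        if stop.stop_after_class:
            break                    # first request: leave once the class is done
    return finished


def install(stop):
    """Route SIGINT and SIGTERM to the same request counter."""
    signal.signal(signal.SIGINT, stop.handle)
    signal.signal(signal.SIGTERM, stop.handle)
```

In this sketch the very first checkpoint is always written, and the throttle only limits the ones after it. Whether mfaktc does the same is an assumption.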
Kudos!
Many thanks, sir! I am impatient for my current assignments to finish so that I can put this version into service.
|
Would you mind posting the .dll/.so files on the mfaktc mirror? I'd rather not have to download the whole CUDA environment...
|
[QUOTE=TheJudger;282838]...mfaktc 0.18...[/QUOTE]
Output file (results.txt) customizable from the ini file? (including the path, for collecting all the results from all running processes of mfaktc in a single file). |
[QUOTE=TheJudger;282838]"has a factor" result lines now contain information (program name, versions, bitlevel, ...). James Heinrich is working on this on the server side. This should give more accurate credits for "has a factor" results from the primenet server once this is fully implemented.[/QUOTE]
Many thanks! Can't wait to test this feature with a new exponent! |
The new version seems to be working well. At least, there have been no problems reported.
|
[QUOTE=Dubslow;282851]Would you mind posting the .dll/.so files on the mfaktc mirror? I'd rather not have to download the whole CUDA environment...[/QUOTE]
They are included in the archives for the executables, aren't they? [QUOTE=LaurV;282864]Output file (results.txt) customizable from the ini file? (including the path, for collecting all the results from all running processes of mfaktc in a single file).[/QUOTE] Well, I'm still unsure about this feature. Personally I don't like it but it seems that you and some others want it. Bdot (mfakto) tries to convince me, too. So I guess I'll add this for 0.19? Oliver |
[QUOTE=TheJudger;282896]Well, I'm still unsure about this feature. Personally I don't like it but it seems that you and some others want it. Bdot (mfakto) tries to convince me, too.[/QUOTE]I also think it would be good to have as a configurable option. Naturally you'll need to lock the file for writing for the split second it takes to write the result line so two instances don't try and write at the same time.
Along the same lines, a unified worktodo.txt would also be nice, perhaps split into [Worker #1], [Worker #2], etc. sections. This is of course a little more work than a configurable results.txt, but it lets us deal with just one input file and one output file per machine, in a format that's already familiar to us from Prime95. Even better would be to optimize/thread the sieving such that we'd only ever need to run a single mfaktc instance (sieving would spread across as many CPU cores as needed to feed the GPU(s)). But that's a whole other set of complications for a much later release. :smile: |
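The locking idea for a shared results.txt could look roughly like the following Python sketch. It assumes a POSIX system (`fcntl.flock` is not available on Windows), the function name `append_result` is made up for the example, and it says nothing about how mfaktc itself would implement this in C. The lock is advisory, so it only helps if every writing instance goes through the same routine.

```python
import fcntl
import os


def append_result(path, line):
    """Append one result line to a shared file while holding an exclusive lock."""
    if not line.endswith("\n"):
        line += "\n"
    # O_APPEND makes each write land at the current end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # blocks while another process holds the lock
        os.write(fd, line.encode("utf-8"))
        os.fsync(fd)                     # push the line to disk before unlocking
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

Each instance would call `append_result()` with the same path, so the lock is held only for the split second a single line takes to write.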
Great! Thanks for the update. I've got two instances running now.
|
.17 vs .18
1 Attachment(s)
This was rather a quick test, showing the difference between mfaktc .17 and .18. V.18 did eventually drop to SievePrimes 5000, though the time didn't really change that much.
EDIT: These were run with the same exponent in single instances. |
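As a rough illustration of why SievePrimes can drift downwards, here is a Python sketch of an auto-adjustment driven by the relative "CPU wait" described in the 0.18 changelog. Every number in it is a placeholder chosen for this example: the thresholds `low` and `high`, the step size, and the `maximum` are invented, and the `minimum` of 5000 simply mirrors the value mentioned in the post. The actual rule inside mfaktc may differ.

```python
def cpu_wait_fraction(wait_seconds, total_seconds):
    """Share of a class's wall time the CPU spent waiting for the GPU."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return wait_seconds / total_seconds


def adjust_sieve_primes(current, wait_fraction, low=0.02, high=0.10,
                        step=0.125, minimum=5000, maximum=200000):
    """Return a new SievePrimes value based on the relative CPU wait.

    All default parameters are hypothetical placeholders.
    """
    if wait_fraction > high:
        # CPU idles a lot while the GPU works: it can afford to sieve deeper.
        proposed = int(current * (1 + step))
    elif wait_fraction < low:
        # CPU barely waits, so it is the bottleneck: sieve less.
        proposed = int(current * (1 - step))
    else:
        proposed = current
    # Keep the result inside the allowed range.
    return max(minimum, min(maximum, proposed))
```

With a CPU that can never keep the GPU busy, repeated calls keep lowering the value until it sits at the minimum, which would match a run that ends up at SievePrimes 5000.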