
mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

aketilander 2012-12-07 15:54

[QUOTE=Xyzzy;320851]We heard of some weird guy who ran just four 570 cards for a while and that was enough to take him to #2 lifetime overall (at the time) for TF.

:mike:[/QUOTE]

Yeah, I have heard of him :smile: I think his electricity supplier came with a bunch of flowers on his birthday?! :no:

RichD 2012-12-07 16:15

So when sieving is moved to the GPU, CC still rules.

(I wouldn't think much arithmetic is needed in sieving; that's the reason for the question.) :smile:

ixfd64 2012-12-07 18:50

With regards to the output, another idea is to split some of the headings into two lines. For example, "candidates" could be changed to "Block\nsize" and "SievePrimes" could be changed to "Sieve\nsize" (or something similar). Just a thought.

TheJudger 2012-12-07 20:32

[QUOTE=RichD;320836]How does this affect one's decision on which nvidia card to acquire moving forward?
CC vs. raw speed??[/QUOTE]

For mfaktc, CC 2.0 still has the highest efficiency (performance per (core * clock)), but Geforce GTX 670/680/690 (CC 3.0) are not that bad.
GTX 580 still has the highest performance per GPU but GTX 670/680/690 have higher performance per watt.

[CODE]Starting trial factoring M60xxxxxx from 2^72 to 2^73 (15.88 GHz-days)
[...]
no factor for M60xxxxxx from 2^72 to 2^73 [mfaktc 0.20-pre5 barrett76_mul32_gs]
tf(): total time spent: 1h 44m 46.991s
[/CODE]
[url=http://www.nvidia.com/content/PDF/kepler/Tesla-K10-Board-Specification-BD-06280-001-v06.pdf]Tesla K10[/url] (just a lower-clocked GTX 690) using GPU sieving (CPU usage is less than 1%) reports only 69W (Teslas can report power consumption). This is for one GPU (the Tesla K10 has 2x GK104).
Tesla C2075 (lower-clocked GF110, like GTX 570/580) reports more than 150W. They are faster, but not twice as fast as the K10...
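A quick sanity check on the figures in the log above (nothing here beyond the quoted numbers; the throughput follows directly from them):

```python
# Throughput implied by the log above: 15.88 GHz-days of TF credit
# completed in 1h 44m 46.991s on one GK104 of the Tesla K10.
ghz_days = 15.88
elapsed_s = 1 * 3600 + 44 * 60 + 46.991   # 6286.991 seconds
rate = ghz_days / (elapsed_s / 86400.0)   # GHz-days per day
print(f"{rate:.1f} GHz-d/day")            # roughly 218 GHz-d/day
```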

[QUOTE=ixfd64;320804]I think the current default output is good as is, except maybe "SievePrimes" is a little long. Just "Sieve" is adequate.[/QUOTE]

[QUOTE=ixfd64;320879]With regards to the output, another idea is to split some of the headings into two lines. For example, "candidates" could be changed to "Block\nsize" and "SievePrimes" could be changed to "Sieve\nsize" (or something similar). Just a thought.[/QUOTE]

Candidates and avg. rate are even harder to understand in mfaktc 0.20 because they have different meanings for CPU and GPU sieving. That's why I want to remove them.

Oliver

TheJudger 2012-12-15 17:17

Hi,

[QUOTE=James Heinrich;320799]This is what I've been using since it became configurable:[code]ProgressHeader= Date-Time Pct ETA | Exponent Bits | GHz-d/day Sieve Wait
ProgressFormat=%d %T %p %e | %M %l-%u | %g %s %W[/code][/QUOTE]

Based on James' configuration, what do you think about this:[CODE]
ProgressHeader=Date Time Pct ETA | Exponent Bits | GHz-d/day Sieve Wait
ProgressFormat=%d %T %p %e | %M %l-%u | %g %s %W%%[/CODE]

And this is how it looks[CODE]
Date Time Pct ETA | Exponent Bits | GHz-d/day Sieve Wait
Dec 15 18:15 5.7 16m34s | 66362159 70-71 | 295.36 82485 n.a.%
Dec 15 18:15 5.8 16m33s | 66362159 70-71 | 295.36 82485 n.a.%[/CODE]

If you don't like it, it is still user-configurable.:grin:

Oliver

RichD 2012-12-15 22:03

My two cents: move anything that does not change to the title or header. No use outputting the same constants over and over. That might free up some space on the status line.

James Heinrich 2012-12-17 18:36

My configuration is based on PrintMode=1 where there is only a single header and single current-status line.

TheJudger 2012-12-25 23:38

mfaktc 0.20 decision
 
Hi,

please help me make another decision! :smile:
As you may already know, mfaktc 0.20 will allow sieving to be done on the GPU (for GPUs with CC 2.0 or newer). How should we handle this?[LIST=1][*]disabled by default, enable with commandline switch "[I]-gs[/I]"[*]disabled by default, controlled by variable "[I]SieveOnGPU[/I]" in mfaktc.ini[*]enabled by default, controlled by variable "[I]SieveOnGPU[/I]" in mfaktc.ini[*]other[/LIST]My vote is #3.
Option #1 is how George implemented it when he wrote the GPU sieve code for mfaktc; options #2/#3 are in sync with mfakt[B][COLOR="Red"]o[/COLOR][/B].

Oliver

Bdot 2012-12-25 23:48

[QUOTE=TheJudger;322618]Hi,

please help me make another decision! :smile:
As you may already know, mfaktc 0.20 will allow sieving to be done on the GPU (for GPUs with CC 2.0 or newer). How should we handle this?[LIST=1][*]disabled by default, enable with commandline switch "[I]-gs[/I]"[*]disabled by default, controlled by variable "[I]SieveOnGPU[/I]" in mfaktc.ini[*]enabled by default, controlled by variable "[I]SieveOnGPU[/I]" in mfaktc.ini[*]other[/LIST]My vote is #3.
Option #1 is how George implemented it when he wrote the GPU sieve code for mfaktc; options #2/#3 are in sync with mfakt[B][COLOR="Red"]o[/COLOR][/B].

Oliver[/QUOTE]
mfakto does not have a working GPU sieve yet; I just prepared the variable. I will adjust mfakto to match mfaktc once I get these kernels to work. Anyway, I'd prefer #3 as well ...

kracker 2012-12-26 00:10

[QUOTE=TheJudger;322618]Hi,

please help me make another decision! :smile:
As you may already know, mfaktc 0.20 will allow sieving to be done on the GPU (for GPUs with CC 2.0 or newer). How should we handle this?[LIST=1][*]disabled by default, enable with commandline switch "[I]-gs[/I]"[*]disabled by default, controlled by variable "[I]SieveOnGPU[/I]" in mfaktc.ini[*]enabled by default, controlled by variable "[I]SieveOnGPU[/I]" in mfaktc.ini[*]other[/LIST]My vote is #3.
Option #1 is how George implemented it when he wrote the GPU sieve code for mfaktc; options #2/#3 are in sync with mfakt[B][COLOR=Red]o[/COLOR][/B].

Oliver[/QUOTE]

#3 as well.

swl551 2012-12-26 00:12

[QUOTE=TheJudger;322618]Hi,

please help me make another decision! :smile:

As you may already know, mfaktc 0.20 will allow sieving to be done on the GPU (for GPUs with CC 2.0 or newer). How should we handle this?[LIST=1][*]disabled by default, enable with commandline switch "[I]-gs[/I]"[*]disabled by default, controlled by variable "[I]SieveOnGPU[/I]" in mfaktc.ini[*]enabled by default, controlled by variable "[I]SieveOnGPU[/I]" in mfaktc.ini[*]other[/LIST]My vote is #3.
Option #1 is how George implemented it when he wrote the GPU sieve code for mfaktc; options #2/#3 are in sync with mfakt[B][COLOR=red]o[/COLOR][/B].

Oliver[/QUOTE]


#3 as well.

VictordeHolland 2012-12-26 00:49

My vote is also on #3
I see a pattern emerging :razz:.

Chuck 2012-12-26 01:11

I like #3.

Chuck

James Heinrich 2012-12-26 02:54

#3, as with everyone else. Specifically, always use GPU sieving when possible (CC 2.0+, >2[sup]64[/sup]) unless explicitly disabled with "SieveOnGPU=0" in mfaktc.ini

axn 2012-12-26 03:22

[QUOTE=TheJudger;322618]
4. other
[/QUOTE]
Intelligent/Smart option -- Do a selftest and pick the one with the highest throughput. Can be overridden by <whatever> mechanism.

LaurV 2012-12-26 04:42

[QUOTE=axn;322637]Intelligent/Smart option -- Do a selftest and pick the one with the highest throughput. Can be overridden by <whatever> mechanism.[/QUOTE]
Too complicated, and not necessarily useful... CPU status can change later (like from busy to free, from free to busy, from partially busy to fully busy, etc.), rendering the "selftest" futile and variously influencing the OTHER tasks/workers using the CPU. Most people prefer to have the CPU free to do other things (P-1??). I'd go with #3 too.

kladner 2012-12-26 05:28

I agree on the #3 choice. I am eagerly looking forward to experimenting with the new version. I also look forward to having multiple CPU cores to use for other things without impinging on mfaktc performance. Those 'other things' would likely include P-1, but it would be cool to try multi-thread LL in P95. While I know that GPUs can generally do DC or TC faster, there are the times that only a P95 result will do.

flashjh 2012-12-26 05:41

[QUOTE=James Heinrich;322635]#3, as with everyone else. Specifically, always use GPU sieving when possible (CC 2.0+, >2[sup]64[/sup]) unless explicitly disabled with "SieveOnGPU=0" in mfaktc.ini[/QUOTE]
+1

firejuggler 2012-12-26 09:05

#3 as well, even if I'm not really active on the GPU front at this time

ET_ 2012-12-26 10:47

#3 for me, even if I used mmff 0.26 with no -gs switch, and it appeared to sieve on GPU.

Luigi

James Heinrich 2012-12-26 11:57

[QUOTE=axn;322637]Intelligent/Smart option -- Do a selftest and pick the one with highest thruput.[/QUOTE]I'm not sure if that would differ from #3 -- I'm not sure if there are cases where GPU sieving would be [i]slower[/i] than CPU sieving. Possibly on a very slow GPU (GT 620 or similar) with a very fast CPU -- I'm not sure if Oliver/George have looked into the efficiency cutoff points.

It has been determined that CC 1.x GPUs have poor throughput for GPU sieving, to the point where it makes no sense, so it's never available; but for CC 2.0+ the benefit is significant.

The value for GPUSievePrimes [i]might[/i] be a viable target for auto-adjustment, but in brief testing I found very little difference in throughput using different values.

VictordeHolland 2012-12-26 13:09

Can somebody give a rough indication of when the 0.20 'production' client will be released? I know it is all done in your spare time, but I (and probably others) am eagerly waiting for this big improvement.

TheJudger 2012-12-26 16:35

Hello,

[QUOTE=James Heinrich;322680]I'm not sure if that would differ from #3 -- I'm not sure if there are cases where GPU sieving would be [i]slower[/i] than CPU sieving. Possibly on a very slow GPU (GT 620 or similar) with a very fast CPU -- I'm not sure if Oliver/George have looked into the efficiency cutoff points.
[/QUOTE]

OK, we have a winner: option #3. :smile:
No, no automatic switching between CPU and GPU sieving, I like simple solutions.

Oliver

TheJudger 2012-12-26 16:40

[QUOTE=VictordeHolland;322687]Can somebody give a rough indication of when the 0.20 'production' client will be released? I know it is all done in your spare time, but I (and probably others) am eagerly waiting for this big improvement.[/QUOTE]

When it's done! :yucky:

Oliver

P.S. I [B]want[/B] to finish v0.20 this year. Add a week or two for testing where I give it to a few people and if no problems are found I'll release it.

xtreme2k 2012-12-27 10:02

1 Attachment(s)
Hey guys,

Got a new GTX 670 (upgrade from a GTX 460). I am now cracking along at approx. 195-200M/s (vs 150M/s), which is pretty poor given the upgrade. NV really made the new chip a pure gaming chip with crappy compute performance, and it's really showing.

The only advantage I know of is that this GPU uses similar power to the GTX 460 (if not only slightly more).

Does mfaktc work on dual-GPU systems with separate GPUs? How does one assign each GPU (if this is possible)? I was thinking of plugging in the GTX 460 purely for compute use. What do you guys think?

LaurV 2012-12-27 10:04

Yes, you can plug both of them in and, depending on your CPU, run an instance of mfaktc for each.
Use the "-d x" switch to say which GPU should be used by which instance; substitute x with the GPU number. It's in the docs, and in this very thread too, a few pages back.
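As a sketch of the above (the binary name and directory layout here are hypothetical; only the "-d" switch itself comes from mfaktc), launching one instance per GPU could look like this:

```python
# Sketch: one mfaktc instance per GPU, selected with "-d <gpu>".
# Each instance should run in its own directory with its own
# worktodo file so the checkpoint files don't collide.
gpu_ids = [0, 1]  # e.g. the GTX 670 and the GTX 460

commands = [["./mfaktc.exe", "-d", str(gpu)] for gpu in gpu_ids]
for cmd in commands:
    print(" ".join(cmd))
# Actually launching would be something like:
#   subprocess.Popen(cmd, cwd=f"instance{gpu}")
```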

Sutton Shin 2012-12-27 10:07

Stupid Question
 
Where is the program for windows 7 x64?

I KNOW that it is a stupid question. I am too lazy to go through 86 pages of posts.

Dubslow 2012-12-27 10:33

[QUOTE=Sutton Shin;322807]Where is the program for windows 7 x64?

I KNOW that it is a stupid question. I am too lazy to go through 86 pages of posts.[/QUOTE]

Very stupid. Even if you're too lazy to go through posts, [URL="http://lmgtfy.com/?q=mfaktc"]Google is your friend[/URL]. (The first or third link is perfectly useful.)

:google:

[QUOTE=LaurV;322806]Yes, you can plug both of them in and, depending on your CPU, run an instance of mfaktc for each.
Use the "-d x" switch to say which GPU should be used by which instance; substitute x with the GPU number. It's in the docs, and in this very thread too, a few pages back.[/QUOTE]

In other words, RTFM.


____________________________________________________________________________
:razz:

TheJudger 2012-12-27 11:55

[QUOTE=xtreme2k;322805]Hey guys,

Got a new GTX 670 (upgrade from a GTX 460). I am now cracking along at approx. 195-200M/s (vs 150M/s), which is pretty poor given the upgrade. NV really made the new chip a pure gaming chip with crappy compute performance, and it's really showing.[/QUOTE]

Well, you're limited by your CPU; a single core can't feed your GTX 670. You can either start a second instance using a second CPU core or wait for mfaktc 0.20 and use GPU sieving. Depending on the exponent and bit level, a GTX 670 should yield > 300M/s in the best case (barrett76 kernel).
And you're right: for applications which make heavy use of integer instructions (like mfaktc), the newer chips are not so good. Anyway, the energy efficiency (performance per watt while running mfaktc) is very good.

Oliver

James Heinrich 2012-12-27 11:56

[QUOTE=Sutton Shin;322807]Where is the program for windows 7 x64?[/QUOTE]mfaktc releases are archived here:
[url]http://www.mersenneforum.org/mfaktc/[/url]
You're looking for [b]mfaktc-0.19.win.cuda42.zip[/b]

LaurV 2012-12-28 06:05

I am going to edit the first post of the thread to put the link to /mfaktc into it, for whoever comes next. Just FYI.

Aillas 2012-12-31 16:08

mfaktc win 0.19 cuda4.0
 
Hi,

I have an old CUDA GPU (Quadro FX 880M) with CUDA 4.0. Only mfaktc 0.18 is compiled for CUDA 4.0/4.1/4.2.

I saw that the new algorithm in 0.19 is almost 25% faster compared to v0.18.

Could someone compile mfaktc 0.19 for cuda 4.0 (on win) please?

Thanks,

Ludovic

James Heinrich 2012-12-31 16:38

[QUOTE=Aillas;323191]Could someone compile mfaktc 0.19 for cuda 4.0 (on win) please?[/QUOTE]v0.20 is getting very close to release (a few days, I believe), so you might even (soon) be able to get a CUDA 4.0 version of 0.20.
Be aware, however, that your GPU is CC v1.2 so all the benefits gained from GPU sieving in v0.20 won't be available to you, unfortunately.

Aillas 2012-12-31 17:30

Thanks for the info. Waiting for the 0.20 :)

Bdot 2012-12-31 17:48

[QUOTE=James Heinrich;323193]v0.20 is getting very close to release (a few days, I believe), so you might even (soon) be able to get a CUDA 4.0 version of 0.20.
Be aware, however, that your GPU is CC v1.2 so all the benefits gained from GPU sieving in v0.20 won't be available to you, unfortunately.[/QUOTE]

I'm running the prerelease version of 0.20 on the same FX 880M. I'm getting about 13.5 GHz-days/day (14.5 M/s, which is 0.5% more than v0.19 and about 20% more than v0.18).

ixfd64 2013-01-04 05:12

Dumb question: will the GPU sieving in version 0.20 make multiple instances of mfaktc (for single GPUs) unnecessary?

James Heinrich 2013-01-04 13:57

[QUOTE=ixfd64;323577]Dumb question: will the GPU sieving in version 0.20 make multiple instances of mfaktc (for single GPUs) unnecessary?[/QUOTE]Yes, at least in my experience. A single GPU-sieving instance happily chews up an entire GTX 570. And gives about 400Ghd/d doing so, which is [I]considerably[/I] above my [URL="http://www.mersenne.ca/mfaktc.php"]benchmark charts[/URL] (which suggest around 255GHd/d) -- once v0.20 is released I'm going to request new benchmarks from everyone (but please wait until I've decided on test parameters).

Multiple GPUs (e.g. SLI) will still require one instance per GPU.

You can of course still run multiple instances on a single GPU if you like, and I've observed some interesting (to me) behavior in that it seemed to load-balance to give more GPU time to the harder assignments, if that makes any sense. For example, if you ran two instances on two 60M range exponents, one from 71-72 and one from 72-73, the latter takes twice the GHz-days to complete, but with GPU-sieving it seems that 72-73 gets double the GPU-time, so both assignments would complete in about the same real-time. I assume this is just an unintentional side-effect of the GPU sieving.
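The "twice the GHz-days" figure falls straight out of the candidate counts: each bit level spans twice as many factor candidates as the one below it (a quick sketch of the arithmetic; nothing here beyond powers of two):

```python
# Factor candidates in 2^71..2^72 vs. 2^72..2^73 (before sieving):
lo = 2**72 - 2**71   # width of the 71-72 bit level
hi = 2**73 - 2**72   # width of the 72-73 bit level
print(hi // lo)      # 2: each bit level is twice the work of the last
```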

sonjohan 2013-01-04 15:56

[QUOTE=James Heinrich;323607]Yes, at least in my experience. A single GPU-sieving instance happily chews up an entire GTX 570. And gives about 400Ghd/d doing so, which is [I]considerably[/I] above my [URL="http://www.mersenne.ca/mfaktc.php"]benchmark charts[/URL] (which suggest around 255GHd/d) -- once v0.20 is released I'm going to request new benchmarks from everyone (but please wait until I've decided on test parameters).

Multiple GPUs (e.g. SLI) will still require one instance per GPU.

You can of course still run multiple instances on a single GPU if you like, and I've observed some interesting (to me) behavior in that it seemed to load-balance to give more GPU time to the harder assignments, if that makes any sense. For example, if you ran two instances on two 60M range exponents, one from 71-72 and one from 72-73, the latter takes twice the GHz-days to complete, but with GPU-sieving it seems that 72-73 gets double the GPU-time, so both assignments would complete in about the same real-time. I assume this is just an unintentional side-effect of the GPU sieving.[/QUOTE]
I currently use 2 instances on my GTX 570M. I'll be happy to do some benchmarks for you if needed. I'll even deactivate prime95 during the benchmark (not the case right now).

TheJudger 2013-01-04 21:40

[QUOTE=ixfd64;323577]Dumb question: will the GPU sieving in version 0.20 make multiple instances of mfaktc (for single GPUs) unnecessary?[/QUOTE]

Yes, as James already said, one instance per GPU is enough. And 32-bit mfaktc is a little bit faster than 64-bit mfaktc in the case of GPU sieving. For CPU sieving the 64-bit version is still recommended.

Oliver

James Heinrich 2013-01-04 21:51

[QUOTE=TheJudger;323674]And 32-bit mfaktc is a little bit faster than 64-bit mfaktc in the case of GPU sieving. For CPU sieving the 64-bit version is still recommended.[/QUOTE]That's interesting. Why the reversal?

TheJudger 2013-01-04 22:47

Hi James,

the simplified answer:
In 64-bit mode memory addresses are 64 bits in size, while in 32-bit mode they are 32 bits. So in 64-bit mode each load or store of a memory address needs twice the bandwidth, and when you need to modify addresses (e.g. increment, decrement) you have to deal with 64-bit numbers. So 64-bit is slower than 32-bit. [B]But[/B] for x86 vs. x86_64 CPUs you have additional registers available in 64-bit mode; registers are the fastest memory in a CPU, much faster than caches or main memory. For the sieve in mfaktc those extra registers make it faster.
There is a project for Linux where you can use the additional registers introduced in x86_64 while using 32-bit addresses ([url]http://en.wikipedia.org/wiki/X32_ABI[/url]), but to be honest: I didn't try it.

Nvidia GPUs have the same number of registers accessible in 32- and 64-bit mode, so you pay the penalty for bigger addresses in 64-bit mode with no benefit, except that you can use more memory (which is not needed for mfaktc).
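The address-size cost described above can be illustrated with a small sketch (`struct.calcsize('P')` reports the native pointer size of the interpreter: 8 bytes on a 64-bit build, 4 on a 32-bit one):

```python
import struct

ptr_bytes = struct.calcsize("P")  # native pointer size in bytes
print(f"pointer size: {ptr_bytes} bytes")

# Memory traffic for loading/storing one million addresses,
# native width vs. 32-bit width:
n = 1_000_000
print(f"{n * ptr_bytes / 2**20:.1f} MiB vs {n * 4 / 2**20:.1f} MiB")
```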

Oliver

James Heinrich 2013-01-04 22:54

OK, I suspected something like that, but always much better to get the detailed answer from the author rather than users guessing -- thanks! :smile:

swl551 2013-01-04 23:17

[QUOTE]Nvidia GPUs have the same number of registers accessible in 32- and 64-bit mode, so you pay the penalty for bigger addresses in 64-bit mode with no benefit, except that you can use more memory (which is not needed for mfaktc).[/QUOTE]

Oliver, what is the real-world effect on performance for 32 vs 64 bit? Is it great enough to motivate a move back to 32-bit for hardcore workers?

James Heinrich 2013-01-04 23:26

[QUOTE=swl551;323680]Oliver, what is the real-world effect on performance for 32 vs 64 bit? Is it great enough to motivate a move back to 32-bit for hardcore workers?[/QUOTE]I would assume that the recommendation would be:

64-bit = CPU sieving (CC 1.x or TF below 2[sup]64[/sup])
32-bit = everyone else

TheJudger 2013-01-04 23:31

[QUOTE=swl551;323680]Oliver, what is the real-world effect on performance for 32 vs 64 bit? Is it great enough to motivate a move back to 32-bit for hardcore workers?[/QUOTE]

For Windows: as usual, I'll include both 32- and 64-bit in one download.
For Linux: I haven't decided yet; perhaps just a 64-bit binary (how many people are using my Linux binaries?)

On my GTX 470 the improvement is a little bit below 1%.

Oliver

James Heinrich 2013-01-04 23:41

[QUOTE=TheJudger;323683]On my GTX 470 the improvement is a little bit below 1%.[/QUOTE]Same on my GTX 570 (32-bit = [COLOR="Green"]420.34[/COLOR] GHd/d, 64-bit = [COLOR="Indigo"]416.68[/COLOR] GHd/d) on [FONT="Courier New"]Factor=50000017,72,73[/FONT]
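Working out the percentage from the two readings above (plain arithmetic on the quoted numbers, matching the "a little bit below 1%" figure on the GTX 470):

```python
# 32-bit vs 64-bit mfaktc throughput on the GTX 570 run above:
rate32, rate64 = 420.34, 416.68   # GHz-d/day
gain = (rate32 - rate64) / rate64 * 100
print(f"{gain:.2f}% faster in 32-bit")  # about 0.88%
```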

swl551 2013-01-04 23:42

[QUOTE=James Heinrich;323685]Same on my GTX 570 (32-bit = [COLOR=Green]420.34[/COLOR] GHd/d, 64-bit = [COLOR=Indigo]416.68[/COLOR] GHd/d) on [FONT=Courier New]Factor=50000017,72,73[/FONT][/QUOTE]
Thank you.

ckdo 2013-01-05 00:35

[QUOTE=TheJudger;323683]On my GTX 470 the improvement is a little bit below 1%.[/QUOTE]

Ain't gonna make me downgrade to a 32 bit distro.

TObject 2013-01-05 02:36

Will the save file format be compatible with 0.19?

I imagine not. Is it possible for somebody to post instructions on how to continue a run mid-level? It would be nice to have a set of instructions, instead of every one of us winging it and potentially compromising results in the process.

Otherwise, a recommendation to finish current started assignments before upgrading is called for.

Enjoying your product a great deal. Thank you.

Dubslow 2013-01-05 02:49

[QUOTE=TObject;323708]Will the save file format be compatible with 0.19?
[/QUOTE]

Actually, I doubt the GPU Sieve alone would cause a save file format change. The data stored in the save files is between-class data, so no sieving data is saved anywhere. Thus, a change in the sieving process wouldn't necessitate a change in save file format.

Of course, there could be other features/changes that could change the format, so who knows? (Well, TheJudger does. :smile:)

LaurV 2013-01-05 09:56

[QUOTE=Dubslow;323712]Actually, I doubt the GPU Sieve alone would cause a save file format change. The data stored in the save files is between-class data, so no sieving data is saved anywhere. Thus, a change in the sieving process wouldn't necessitate a change in save file format.

Of course, there could be other features/changes that could change the format, so who knows? (Well, TheJudger does. :smile:)[/QUOTE]
Checkpoint files include the version number, and therefore have a different checksum. You may be able to trick the new version into accepting the old checkpoints if you modify the file (which is not difficult to do by hand). But generally assignments are very fast; finish them with the old version, start new ones with the new one.

Dubslow 2013-01-05 11:37

[QUOTE=LaurV;323738]Checkpoint files include the version number, and therefore have a different checksum. You may be able to trick the new version into accepting the old checkpoints if you modify the file (which is not difficult to do by hand). But generally assignments are very fast; finish them with the old version, start new ones with the new one.[/QUOTE]

Ahh... clearly I've spoiled myself with CUDALucas. (Btw, there's an awesome surprise showing up on that front very soon.)

TheJudger 2013-01-05 13:40

[QUOTE=ckdo;323690]Ain't gonna make me downgrade to a 32 bit distro.[/QUOTE]

You don't need to downgrade to a 32-bit OS; you can run 32-bit applications on 64-bit Windows/Linux...

ckdo 2013-01-05 14:03

It always appeared to me as if 32-bit apps in a 64-bit OS ran under some sort of compatibility layer, and somewhat slower than in the 32-bit version of the same OS, if they ran at all. This may have changed, however.

James Heinrich 2013-01-05 14:37

[QUOTE=ckdo;323751]It always appeared to me as if 32 bit apps in a 64 bit OS ran under some sort of compatibility layer, and somewhat slower than in the 32 bit version of the same OS, if they ran at all. This may have changed, however.[/QUOTE]I can't speak for *nix, but on Win64 I believe 32-bit apps can run just fine, albeit under [url=http://en.wikipedia.org/wiki/WoW64]WoW64[/url] -- the compatibility layer you speak of. It may "slow things down" very slightly, but in this case the benefits of the lower execution overhead of mfaktc32 vs mfaktc64 still show 32-bit as the winner for GPU-sieving.
16-bit apps, on the other hand, won't run on Win64 (they will on Win32). You can, however, usually run them inside DOSbox or similar (16-bit app in a 32-bit wrapper in a 64-bit wrapper).

In any case, on Win64 you can see from my post above that 32-bit mfaktc 0.20 is 1% faster than the 64-bit version (for the reasons Oliver explained above).

ixfd64 2013-01-06 04:21

I have a suggestion for the documentation.

If the user switches to Stages=0 in the middle of an assignment that spans multiple bit ranges, then any work done in the current range will be lost. This should be noted in mfaktc.ini.

c10ck3r 2013-01-06 16:46

Every time I see that this thread has been posted in, I think "Yay, 0.20 is out of beta! GPU sieving for all!" and in all my excitement I accidentally the whole thing.

kracker 2013-01-06 17:01

[QUOTE=c10ck3r;323831]Every time I see that this thread has been posted in, I think "Yay, 0.20 is out of beta! GPU sieving for all!" and in all my excitement I accidentally the whole thing.[/QUOTE]

Uhh what? You don't like betas? :confused:

kladner 2013-01-06 17:07

[U][B]I[/B][/U] like BETAs just fine if they are made available. However, I have long had to cultivate patience when it comes to software/drivers/patches/updates, etc.

I do share c10ck3r's eagerness when I see a post in this thread, but these things happen when they happen.

c10ck3r 2013-01-06 17:26

[QUOTE=kracker;323833]Uhh what? You don't like betas? :confused:[/QUOTE]
Worst fish ever. They taste like plastic :razz:
No, I have no problem with beta testing; I'm just looking forward to the public release, hopefully w/o any errors.

TheJudger 2013-01-06 18:42

mfaktc 0.20
 
Hello,

mfaktc 0.20 is finally available!
[LIST][*]Source code: [url]http://www.mersenneforum.org/mfaktc/mfaktc-0.20/mfaktc-0.20.tar.gz[/url][*]Windows executables (CUDA 4.2): [url]http://www.mersenneforum.org/mfaktc/mfaktc-0.20/mfaktc-0.20.win.cuda42.zip[/url][*]Windows executables (CUDA 4.2), [I]LessClasses[/I] for short-running jobs: [url]http://www.mersenneforum.org/mfaktc/mfaktc-0.20/mfaktc-0.20.win.cuda42.LessClasses.zip[/url][*]Linux executable (CUDA 4.2): [url]http://www.mersenneforum.org/mfaktc/mfaktc-0.20/mfaktc-0.20.linux64.cuda42.tar.gz[/url][/LIST]
Highlights:[LIST=1][*][B]GPU sieving[/B] for CC 2.0 or higher GPUs, [B]one[/B] instance per GPU, [B]very low[/B] CPU usage, recommended for all users when running on a supported GPU! Thank you very much George![*]three new kernels: barrett77, barrett87 and barrett88. They are slower than barrett76 but faster than barrett79, so there is a performance boost for TF above 2[SUP]76[/SUP] and below 2[SUP]88[/SUP] compared to mfaktc 0.19. Again, thank you very much George![/LIST]
There are some other changes, too, see the Changelog.txt for the full list.

As usual: finish your current assignment and upgrade to mfaktc 0.20 after that. The upgrade is recommended for everyone because a speed improvement is possible on all GPUs.


Happy factoring!
Oliver

ixfd64 2013-01-06 18:44

Awesome! :banana:

kracker 2013-01-06 19:11

Awesome! :victor:

Chuck 2013-01-06 19:13

OK, I've got it going. If I wanted to reduce the load on the GPU a little bit to cut down on the temperature, are there any settings in mfaktc.ini which would accomplish this?

Great job on this big improvement.

Chuck

kladner 2013-01-06 20:05

Many thanks to all who helped make this possible, either through coding or testing. Most especially, thank you Oliver and George! This is a happy day!

Aramis Wyler 2013-01-06 20:20

Very exciting!
 
repost, eh.

Chuck 2013-01-06 20:22

1 Attachment(s)
Here's my first completed result.

kracker 2013-01-06 20:25

[QUOTE=Chuck;323850]Here's my first completed result.[/QUOTE]

What gpu? 580?

Chuck 2013-01-06 20:35

[QUOTE=kracker;323851]What gpu? 580?[/QUOTE]

Yes; GTX580 @ 797 MHz

Aramis Wyler 2013-01-06 20:38

Exciting day indeed! The processing time for the numbers I'm working on went from ~50 minutes (each, of 4 instances) to ~11 minutes on my GTX 480, so total throughput went up just a bit, I only have to maintain one instance/worktodo file, and the CPU is completely unbound now. Starting prime95 on 3 (of 4) CPU cores didn't seem to affect the throughput of the new mfaktc.

I'm curious about the sieve, though, as in the CPU-bound versions it would roll from ~32k down to the minimum (5k), and now it sits solid at 82485.

swl551 2013-01-06 21:26

[QUOTE=Chuck;323843]OK I've got it going. If I wanted to reduce the load on the GPU a little bit to cut down on the temperature, are there any settings in mfaktc.ini which would accomplish this?

Great job on this big improvement.

Chuck[/QUOTE]

Try reducing your core clock MHz and maybe your voltage (if you lower the clocks enough). Fewer MHz = less heat.

firejuggler 2013-01-06 22:08

It's working, thanks .

Chuck 2013-01-06 22:26

[QUOTE=swl551;323860]Try reducing your core clock MHz and maybe your voltage (if you lower the clocks enough). Fewer MHz = less heat.[/QUOTE]

OK, that was helpful. I had never fooled with the voltage adjustment, and I did some checking on the net to see how low other people went with the card remaining stable.

Roy_Sirl 2013-01-06 22:41

Excellent - a red letter day indeed! Many thanks to all those involved.

swl551 2013-01-06 22:49

Outstanding
 
1 Attachment(s)
Faster throughput on i5 (bottlenecked with 4 instances using 0.19)
CPU doesn't even throttle up (power draw 6.4 watts)
LOWER overall system power consumption by 100 watts!



OUTSTANDING RESULTS!

Oh and nvidia 310.90 just came out.

firejuggler 2013-01-06 22:57

1 Attachment(s)
Geforce 560, options by default, 200 GHz-d per day

kladner 2013-01-06 23:01

Still running -st2 on both GPUs. GTX 570 at about 55%, 460 about 75%. Affinity not set. CPU at about 20-25%. The 570 finished. With the 460 still running it's pulling 10-12% CPU.

EDIT: Now running an assignment on the 570, and there is negligible CPU usage for that instance. Just -st2 is using more CPU.

TheJudger 2013-01-06 23:05

[QUOTE=kladner;323872]Still running -st2 on both GPUs. GTX 570 at about 55%, 460 about 75%. Affinity not set. CPU at about 20-25%. The 570 finished. With the 460 still running it's pulling 10-12% CPU.[/QUOTE]

-st/-st2 tests both, CPU-sieve and GPU-sieve kernels.

Oliver

Chuck 2013-01-07 00:02

[QUOTE=swl551;323869]Faster throughput on i5 (bottlenecked with 4 instances using 0.19)
CPU doesn't even throttle up (power draw 6.4 watts)
LOWER overall system power consumption by 100 watts!



OUTSTANDING RESULTS!

Oh and nvidia 310.90 just came out.[/QUOTE]

I upgraded to 310.70 two weeks ago, and it gave me slightly poorer performance (with the non-GPU sieving mfaktc) than the earlier 306.97 version. I reverted back to the earlier version. I guess I could change drivers again with this new mfaktc and see what the results are.

Chuck

swl551 2013-01-07 00:07

[QUOTE=Chuck;323878]I upgraded to 310.70 two weeks ago, and it gave me slightly poorer performance (with the non-GPU sieving mfaktc) than the earlier 306.97 version. I reverted back to the earlier version. I guess I could change drivers again with this new mfaktc and see what the results are.

Chuck[/QUOTE]

310.70 had a short life, suggesting it did have some problems.

kladner 2013-01-07 00:21

[QUOTE=swl551;323869]Oh and nvidia 310.90 just came out.

310.70 had a short life, suggesting it did have some problems. [/QUOTE]

That's very good to know. Thanks! I'm running 310.70, but will upgrade.

@ Oliver- Thanks for the explanation. I'm now running assignments on both cards:

[CODE]64.8M 69-73 assignments -both cards
GTX 570 6.004s 415 GHz-D/D GPU 95% 823 MHz 71 C
GTX 460 12.135s 205 GHz-D/D GPU 98% 830 MHz 70 C
[/CODE]

CPU is essentially idle.

The temperatures above are about 2-3 C higher than with 4 instances of mfaktc 0.19 on the 570, and 2 instances on the 460. Considering that the CPU is idle, they seem to be working harder. On the other hand, a cool CPU means that the case fans throttled back.

This looks like a big jump in throughput, and I haven't even gotten P95 running yet. I now have up to six cores to run P-1, or something. :smile:

swl551 2013-01-07 00:59

Comparison:

0.19 running 4 instances on a GTX 570 (988mV @ 850MHz) with an i7-2600K @ 4.2GHz
yielded [B]118[/B] GHz-d/day per instance = [B]472[/B] GHz-d/day total.


0.20 running 1 instance on the same GTX 570 (988mV @ 850MHz) with the i7-2600K @ 4.2GHz
yields [B]427[/B] GHz-d/day, [U]down [B]45[/B] GHz-d/day[/U].


However, on my i5-2500K, 0.20 increased throughput by [B]25[/B] GHz-d/day compared to 0.19 (there the CPU was bottlenecked).

kracker 2013-01-07 01:03

[QUOTE=swl551;323882]
0.20 running 1 instance on the same GTX 570 (988mV @ 850MHz) with the i7-2600K @ 4.2GHz
yields [B]427[/B] GHz-d/day, [U]down [B]45[/B] GHz-d/day[/U].
[/QUOTE]

But now maybe you can run something on the cpu! :max:

EDIT: Er, that is using the part of da cpu that 0.19 *used* to take.

kladner 2013-01-07 01:50

With 6x 0.19 on the 570 and 2x on the 460, I was getting a combined 530-535 GHz-D/D (per mfaktc's readings). With 0.20 (32-bit, slightly better than 64-bit), those readings total about 625 GHz-D/D with one instance per GPU. I now have 5 workers running P-1 in P95; this last item did not seem to make any difference in the mfaktc 0.20 performance.

EDIT: [OT]PrimeNet still keeps sticking in 1 DC assignment (Worker #3) to 4 P-1s in P95. All the settings I can find, both online and in P95, are set to P-1. I have worked around this by moving all the assignments from the other 4 workers to Worker #3. This stops #3 from getting more assignments for now, and the other 4 then fill in with P-1s.[/OT]

EDIT2: Sieve is running 82,485 on both GPUs, with SievePrimesAdjust=1. Otherwise the settings are default except for CheckpointDelay=300, Stages=0, and StopAfterFactor=2.

Chuck 2013-01-07 02:08

[QUOTE=kladner;323884]EDIT: [OT]PrimeNet still keeps sticking in 1 DC assignment (Worker #3) to 4 P-1s in P95. All the settings I can find, both online and in P95, are set to P-1. I have worked around this by moving all the assignments from the other 4 workers to Worker #3. This stops #3 from getting more assignments for now, and the other 4 then fill in with P-1s.[/OT][/QUOTE]

That's funny, it did that to me too on a new machine I started up a couple of days ago. The P-1 worker on initial startup was given 4 P-1 assignments, then after finishing one it was assigned a DC.

kladner 2013-01-07 02:12

[QUOTE=Chuck;323887]That's funny, it did that to me too on a new machine I started up a couple of days ago. The P-1 worker on initial startup was given 4 P-1 assignments, then after finishing one it was assigned a DC.[/QUOTE]

It's been an ongoing thing for me if I run more than two workers, though that's in another thread.

EDIT: Ongoing work on the GTX 460 with 64 and 32 bit exe's. First result on the 570 with 64 and 32 bit exe's:

[CODE]64 bit exe

CUDA device info
name GeForce GTX 460
compute capability 2.1
maximum threads per block 1024
number of multiprocessors 7 (336 shader cores)
clock rate 1660MHz

Automatic parameters
threads per grid 917504

running a simple selftest...
Selftest statistics
number of tests 92
successfull tests 92

selftest PASSED!

got assignment: exp=64801397 bit_min=69 bit_max=73 (27.68 GHz-days)
Starting trial factoring M64801397 from 2^69 to 2^73 (27.68 GHz-days)
k_min = 4554653426400
k_max = 72874454895928
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jan 06 17:26 | 0 0.1% | 12.093 3h13m | 205.98 82485 n.a.%
Jan 06 17:26 | 3 0.2% | 12.130 3h13m | 205.35 82485 n.a.%
Jan 06 17:26 | 4 0.3% | 12.137 3h13m | 205.23 82485 n.a.%
--------------------------------
Jan 06 18:21 | 1312 28.5% | 12.095 2h18m | 205.94 82485 n.a.%
Jan 06 18:21 | 1315 28.6% | 12.129 2h18m | 205.36 82485 n.a.%
Jan 06 18:21 | 1320 28.8% | 12.140 2h18m | 205.18 82485 n.a.%
Jan 06 18:21 | 1323 28.9% | 12.127 2h18m | 205.40 82485 n.a.%
Jan 06 18:22 | 1327 29.0% | 12.140 2h17m | 205.18 82485 n.a.%
Jan 06 18:22 | 1332 29.1% | 12.128 2h17m | 205.38 82485 n.a.%
received signal "SIGINT"

Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jan 06 18:51 | 1339 29.3% | 12.079 2h16m | 206.21 82485 n.a.%
Jan 06 18:52 | 1344 29.4% | 12.128 2h17m | 205.38 82485 n.a.%
Jan 06 18:52 | 1348 29.5% | 12.121 2h16m | 205.50 82485 n.a.%
Jan 06 18:52 | 1360 29.6% | 12.135 2h16m | 205.26 82485 n.a.%
Jan 06 18:52 | 1363 29.7% | 12.149 2h16m | 205.03 82485 n.a.%
Jan 06 18:52 | 1368 29.8% | 12.142 2h16m | 205.14 82485 n.a.%
Jan 06 18:53 | 1372 29.9% | 12.145 2h16m | 205.09 82485 n.a.%
Jan 06 18:53 | 1375 30.0% | 12.146 2h16m | 205.08 82485 n.a.%
received signal "SIGINT"

32 bit exe
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jan 06 18:55 | 1384 30.2% | 12.122 2h15m | 205.48 82485 n.a.%
Jan 06 18:55 | 1392 30.3% | 12.123 2h15m | 205.47 82485 n.a.%
Jan 06 18:55 | 1395 30.4% | 12.123 2h14m | 205.47 82485 n.a.%
Jan 06 18:55 | 1399 30.5% | 12.119 2h14m | 205.53 82485 n.a.%
Jan 06 18:56 | 1404 30.6% | 12.123 2h14m | 205.47 82485 n.a.%
Jan 06 18:56 | 1407 30.7% | 12.112 2h14m | 205.65 82485 n.a.%
Jan 06 18:56 | 1419 30.8% | 12.122 2h14m | 205.48 82485 n.a.%
Jan 06 18:56 | 1420 30.9% | 12.122 2h13m | 205.48 82485 n.a.%
Jan 06 18:56 | 1423 31.0% | 12.121 2h13m | 205.50 82485 n.a.%
Jan 06 18:57 | 1428 31.1% | 12.121 2h13m | 205.50 82485 n.a.%
Jan 06 18:57 | 1432 31.3% | 12.119 2h13m | 205.53 82485 n.a.%


GTX 570
64 bit exe

Jan 06 18:21 | 3760 81.6% | 5.954 17m34s | 418.34 82485 n.a.%
Jan 06 18:21 | 3772 81.7% | 5.940 17m25s | 419.33 82485 n.a.%
Jan 06 18:21 | 3777 81.8% | 5.983 17m27s | 416.31 82485 n.a.%
Jan 06 18:21 | 3780 81.9% | 5.987 17m22s | 416.04 82485 n.a.%
Jan 06 18:21 | 3781 82.0% | 6.002 17m18s | 415.00 82485 n.a.%
Jan 06 18:21 | 3784 82.1% | 6.003 17m13s | 414.93 82485 n.a.%
Jan 06 18:21 | 3789 82.2% | 6.005 17m07s | 414.79 82485 n.a.%
Jan 06 18:21 | 3792 82.3% | 6.003 17m01s | 414.93 82485 n.a.%
Jan 06 18:21 | 3796 82.4% | 5.982 16m51s | 416.38 82485 n.a.%
Jan 06 18:22 | 3801 82.5% | 5.997 16m47s | 415.34 82485 n.a.%
Jan 06 18:22 | 3805 82.6% | 6.000 16m42s | 415.13 82485 n.a.%
Jan 06 18:22 | 3816 82.7% | 5.999 16m36s | 415.20 82485 n.a.%
Jan 06 18:22 | 3817 82.8% | 5.973 16m26s | 417.01 82485 n.a.%
Jan 06 18:22 | 3829 82.9% | 5.936 16m14s | 419.61 82485 n.a.%
received signal "SIGINT"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jan 06 18:51 | 3892 84.5% | 6.003 14m54s | 414.93 82485 n.a.%
Jan 06 18:51 | 3900 84.6% | 5.998 14m48s | 415.27 82485 n.a.%
Jan 06 18:51 | 3901 84.7% | 5.903 14m28s | 421.96 82485 n.a.%
Jan 06 18:51 | 3904 84.8% | 5.914 14m23s | 421.17 82485 n.a.%
Jan 06 18:51 | 3912 84.9% | 6.000 14m30s | 415.13 82485 n.a.%
Jan 06 18:51 | 3921 85.0% | 5.980 14m21s | 416.52 82485 n.a.%
Jan 06 18:52 | 3924 85.1% | 5.985 14m16s | 416.17 82485 n.a.%
Jan 06 18:52 | 3925 85.2% | 5.987 14m10s | 416.04 82485 n.a.%
Jan 06 18:52 | 3936 85.3% | 5.984 14m04s | 416.24 82485 n.a.%


32 bit exe
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jan 06 18:54 | 3945 85.5% | 5.968 13m50s | 417.36 82485 n.a.%
Jan 06 18:54 | 3949 85.6% | 5.980 13m45s | 416.52 82485 n.a.%
Jan 06 18:54 | 3957 85.7% | 5.920 13m31s | 420.74 82485 n.a.%
Jan 06 18:55 | 3960 85.8% | 5.910 13m24s | 421.46 82485 n.a.%
Jan 06 18:55 | 3961 85.9% | 5.973 13m26s | 417.01 82485 n.a.%
Jan 06 18:55 | 3964 86.0% | 5.972 13m20s | 417.08 82485 n.a.%
Jan 06 18:55 | 3969 86.1% | 5.972 13m14s | 417.08 82485 n.a.%
Jan 06 18:55 | 3976 86.3% | 5.972 13m08s | 417.08 82485 n.a.%
Jan 06 18:55 | 3981 86.4% | 5.971 13m02s | 417.15 82485 n.a.%
Jan 06 18:55 | 3984 86.5% | 5.973 12m56s | 417.01 82485 n.a.%
Jan 06 18:55 | 3997 86.6% | 5.973 12m51s | 417.01 82485 n.a.%
Jan 06 18:55 | 4005 86.7% | 5.966 12m44s | 417.50 82485 n.a.%
Jan 06 18:55 | 4009 86.8% | 5.972 12m38s | 417.08 82485 n.a.%
------------------
Jan 06 19:08 | 4596 99.6% | 5.968 0m24s | 417.36 82485 n.a.%
Jan 06 19:08 | 4597 99.7% | 5.959 0m18s | 417.99 82485 n.a.%
Jan 06 19:08 | 4600 99.8% | 5.953 0m12s | 418.41 82485 n.a.%
Jan 06 19:08 | 4605 99.9% | 5.973 0m06s | 417.01 82485 n.a.%
Jan 06 19:08 | 4617 100.0% | 5.972 0m00s | 417.08 82485 n.a.%
no factor for M64802879 from 2^69 to 2^73 [mfaktc 0.20 barrett76_mul32_gs]
tf(): time spent since restart: 0h 13m 59.077s
estimated total time spent: 1h 35m 53.670s
[/CODE]

EDIT2: Second 570 result. Damn! This is fast!
[CODE]Jan 06 20:43 | 4569 99.1% | 5.912 0m53s | 421.51 82485 n.a.%
Jan 06 20:43 | 4577 99.2% | 5.914 0m47s | 421.36 82485 n.a.%
Jan 06 20:43 | 4580 99.3% | 5.913 0m41s | 421.43 82485 n.a.%
Jan 06 20:43 | 4584 99.4% | 5.915 0m35s | 421.29 82485 n.a.%
Jan 06 20:43 | 4589 99.5% | 5.914 0m30s | 421.36 82485 n.a.%
Jan 06 20:43 | 4592 99.6% | 5.914 0m24s | 421.36 82485 n.a.%
Jan 06 20:43 | 4604 99.7% | 5.914 0m18s | 421.36 82485 n.a.%
Jan 06 20:43 | 4605 99.8% | 5.915 0m12s | 421.29 82485 n.a.%
Jan 06 20:43 | 4613 99.9% | 5.914 0m06s | 421.36 82485 n.a.%
Jan 06 20:44 | 4617 100.0% | 5.914 0m00s | 421.36 82485 n.a.%
no factor for M64773187 from 2^69 to 2^73 [mfaktc 0.20 barrett76_mul32_gs]
tf(): total time spent: 1h 35m 24.141s[/CODE]

owftheevil 2013-01-07 02:49

I've always avoided TF in the past because of the high CPU load. Now it looks like I'll be sharing GPU time between TF and DC. Thank you to everyone involved in making this new release. The code looks pretty, too. Maybe I can learn something from it.

LaurV 2013-01-07 04:12

@Oliver: Sir, you made my day! Outstanding. For you, for George, and all the people involved in developing and testing, [URL="http://www.youtube.com/watch?v=e5WygWnzj3w"]here you go[/URL]!

ixfd64 2013-01-07 06:07

I've had the chance to complete some assignments using version 0.20, and I can say it's about three times as fast as 0.19 on my GTX 555. Granted, I didn't let mfaktc reach its full potential because I run Prime95 on all of my cores, but it's nonetheless a remarkable improvement.

The new version does cause my computer to lag a little, but that's something I can stand. :smile:

James Heinrich 2013-01-07 14:29

[QUOTE=kladner;323884]Sieve is running 82,485 on both GPUs, with SievePrimesAdjust=1[/QUOTE]The ini settings "SievePrimes" and "SievePrimesAdjust" only apply to CPU sieving. GPU sieving is controlled by "SieveOnGPU", "GPUSievePrimes", "GPUSieveSize" and "GPUSieveProcessSize". There is no mechanism for auto-adjusting GPUSievePrimes. With CPU sieving you can balance the load by checking whether the GPU is waiting for the CPU or vice-versa, but with GPU sieving both stages run on the GPU, so there is never any idle time -- it's just a question of how much effort is spent on the sieving portion. In my brief tests, (at least small) changes to GPUSievePrimes didn't make much difference in overall throughput, so I'm content to leave it at the default 82485.
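For readers hunting for these options, they live in mfaktc.ini. A minimal fragment for reference -- SieveOnGPU and the default GPUSievePrimes=82485 come from this thread; the other values shown are illustrative placeholders, not tuning advice:

[CODE]# GPU sieving options in mfaktc.ini (mfaktc 0.20)
SieveOnGPU=1            # sieve on the GPU; CPU-side SievePrimes* settings are then ignored
GPUSievePrimes=82485    # default mentioned in this thread; no auto-adjust mechanism exists
GPUSieveSize=64         # illustrative value
GPUSieveProcessSize=16  # illustrative value
[/CODE]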

James Heinrich 2013-01-07 15:01

Now that everyone has access to v0.20, I'd like to ask for a new round of benchmarks from everyone so I can update my [url=http://www.mersenne.ca/mfaktc.php#benchmark]GPU-TF benchmark page[/url].

Please submit the results using the form on the benchmark page:
[url]http://www.mersenne.ca/mfaktc.php#benchmark[/url]

Please keep these requests in mind:[list][*]mfaktc v0.20[*]32-bit mfaktc is preferred, please mention 32/64 when submitting[*]GPU sieving enabled, GPUSievePrimes=82485 (default)[*]assignment something around 60-70M, to 2[sup]73[/sup] (whatever you're working on currently is probably fine, as long as it takes at least 30 minutes per assignment, preferably an hour or longer).[/list]

TheJudger 2013-01-07 15:54

Hi James,

why not use a fixed exponent for the benchmark? The GHz-days rating (at least in the formulas I've seen so far) doesn't depend much on the exponent. Those formulas account for the number of factor candidates (FCs) for the exponent, but there are other (minor) effects, too.
Take a look here: [url]http://mersenne.org/various/math.php[/url][LIST][*]if the exponent has many 1s in its binary representation, there are a lot of additional "multiply by 2" steps. They are relatively cheap, but the cost is measurable.[*]bigger exponents need more iterations than smaller exponents. Again, for current exponents the effect is not that big... but it is there.[/LIST]
Oliver

P.S. My personal benchmark exponent is 66362159 ;)
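The two per-exponent effects listed above can be seen in a few lines of left-to-right binary exponentiation. The following is an illustrative Python model, not mfaktc's actual CUDA kernels, and the value of k is an arbitrary example:

```python
# Trial factoring tests whether a candidate f = 2*k*p + 1 divides the
# Mersenne number M(p) = 2^p - 1, i.e. whether 2^p mod f == 1.
# Left-to-right binary exponentiation costs one modular squaring per bit
# of p, plus one extra "multiply by 2" per *set* bit of p -- the minor
# exponent-dependent effects described above.

def pow2_mod(p, f):
    """Compute 2^p mod f, counting squarings and extra doublings."""
    result = 1
    squarings = doublings = 0
    for bit in bin(p)[2:]:            # scan bits of p, most significant first
        result = (result * result) % f
        squarings += 1
        if bit == "1":
            result = (result + result) % f   # the cheap "multiply by 2"
            doublings += 1
    return result, squarings, doublings

p = 66362159                          # the benchmark exponent mentioned above
k = 12345                             # arbitrary illustrative k (almost surely not a factor)
f = 2 * k * p + 1
r, sq, db = pow2_mod(p, f)
print(sq, db)                         # sq = bit length of p, db = number of set bits in p
```

So two exponents of similar size can differ slightly in cost when one has many more set bits, even though the GHz-days formulas rate them almost identically.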

kladner 2013-01-07 15:58

[QUOTE=James Heinrich;323923]The ini settings "SievePrimes" and "SievePrimesAdjust" only apply to CPU sieving. GPU sieving is controlled by "SieveOnGPU", "GPUSievePrimes", "GPUSieveSize" and "GPUSieveProcessSize". There is no mechanism for auto-adjusting GPUSievePrimes. [...] In my brief tests, (at least small) changes to GPUSievePrimes didn't make much difference in overall throughput, so I'm content to leave it at the default 82485.[/QUOTE]

Oh. Of course. :blush: I read through mfaktc.ini, but the new settings obviously did not stick with me. Thanks.

LaurV 2013-01-07 16:33

Time to make Uncwilly happy...

kracker 2013-01-07 16:36

[QUOTE=LaurV;323933]Time to make Uncwilly happy...[/QUOTE]
:razz:

James Heinrich 2013-01-07 16:39

[QUOTE=TheJudger;323927]why not use a fixed exponent for the benchmark?[/QUOTE]Mostly to avoid wasting 5-10 GHz-days of work -- users can just submit info about whatever they're currently working on with minimal extra effort. It also helps me get more benchmark data. Bigger exponents take more iterations but are also worth more credit, so that's a non-issue.

chalsall 2013-01-07 17:59

[QUOTE=TheJudger;323840]mfaktc 0.20 is finally available![/QUOTE]

Nice work as always Oliver. (And, of course, George).

For those who suddenly find themselves with spare CPU capacity available (and have some memory available), please consider doing some P-1 work.

The TF wavefront is currently holding steady to the LL wavefront (we're about 47 days ahead), but sadly many LL assignments are being made without P-1 having already been done properly.

