mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

apsen 2011-07-06 02:20

[QUOTE=Christenson;265360]If you wade far enough back in this thread (it's 700 posts long!), you will find the early versions of mfaktc. [/QUOTE]

I've gone through the whole thread, and the latest version that worked for me was 0.8. The later versions seem to be compiled for cc1.1 or higher. But when I tried to compile 0.8 for Win64 on my own, I got "cudaStreamCreate() failed". What might I be doing wrong?

Christenson 2011-07-06 04:25

[QUOTE=apsen;265613]I've gone through the whole thread, and the latest version that worked for me was 0.8. The later versions seem to be compiled for cc1.1 or higher. But when I tried to compile 0.8 for Win64 on my own, I got "cudaStreamCreate() failed". What might I be doing wrong?[/QUOTE]

Hopefully you posted the locations of those versions in Rodrigo's thread... otherwise someone else in your shoes will end up walking the same long mile.

At a guess, you need a copy of cudart.dll in the same directory as your executable and/or current working directory. See Rodrigo's thread for the pointers.

At a second guess, your card can only support one stream at a time, so try telling it to use only one stream.

To really know, you will need to get the error code from cudaStreamCreate, which will require you to program a bit. You will then have to go look it up in the Nvidia documentation. I can modify the code, but can't compile for Win32. Let me know if I need to do that.

apsen 2011-07-06 13:34

[QUOTE=Christenson;265628]Hopefully you posted the locations of those versions in Rodrigo's thread... otherwise someone else in your shoes will end up walking the same long mile.
[/QUOTE]

No, I don't really have an answer yet. So far all I can tell is that version 0.8 seems to be the best bet.

[QUOTE=Christenson;265628]
At a guess, you need a copy of cudart.dll in the same directory as your executable and/or current working directory. See Rodrigo's thread for the pointers.

At a second guess, your card can only support one stream at a time, so try telling it to use only one stream.

To really know, you will need to get the error code from cudaStreamCreate, which will require you to program a bit. You will then have to go look it up in the Nvidia documentation. I can modify the code, but can't compile for Win32. Let me know if I need to do that.[/QUOTE]

There's a problem with my compile: the downloaded 0.8 works, mine doesn't. The return code from cudaStreamCreate is 10200, which is outside the range of codes defined in cuda.h.

To address your specific points:
Yes, I have cudart.dll; without it the program will not even start.
The downloaded 0.8 works with 3 streams.

It just occurred to me that I may need to use an even older CUDA toolkit...

Maybe kjaget could chime in...

BTW 0.8 is posted in post #280.

Christenson 2011-07-06 16:13

10200 = 0x27D8... are you sure you have the right return type declared for cudaStreamCreate?
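Looking at the number in hex is a cheap first step; a second one is to check whether it matches a known offset. My understanding (an assumption to verify against the toolkit's driver_types.h) is that CUDA 3.x/4.x-era runtimes defined cudaErrorApiFailureBase = 10000 and reported an unhandled driver error as 10000 plus the driver code. If so, 10200 would be driver error 200, which I believe is CUDA_ERROR_INVALID_IMAGE in cuda.h of that era, a plausible result for a kernel built for the wrong compute capability. The sketch below only does that arithmetic; describe_status is a hypothetical helper and the constant and names are assumptions, not values read from any header.

```python
# ASSUMPTION: runtime reports unhandled driver errors as this base + driver code
# (cudaErrorApiFailureBase in CUDA 3.x/4.x-era driver_types.h, to my understanding).
API_FAILURE_BASE = 10000

# ASSUMPTION: small excerpt of CUresult codes; check against your toolkit's cuda.h.
DRIVER_ERRORS = {
    200: "CUDA_ERROR_INVALID_IMAGE",
    209: "CUDA_ERROR_NO_BINARY_FOR_GPU",
}


def describe_status(code: int) -> str:
    """Hypothetical helper: show a status code in decimal and hex, and decode
    it as base + driver error if it is at or above the assumed base."""
    text = f"{code} = 0x{code:X}"
    if code >= API_FAILURE_BASE:
        drv = code - API_FAILURE_BASE
        name = DRIVER_ERRORS.get(drv, "unknown driver error")
        text += f" -> driver error {drv} ({name})"
    return text


print(describe_status(10200))
# 10200 = 0x27D8 -> driver error 200 (CUDA_ERROR_INVALID_IMAGE)
```

If the decode holds, rebuilding with an architecture flag that matches the card would be the thing to try; if the toolkit in use has no such base constant, the misdeclared-return-type explanation remains the better guess.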

If you are just trying to run mfaktc, I'd be inclined to ignore the "I can't build it" problem. What do you hope to do with the modification?

TheJudger 2011-07-09 13:23

[QUOTE=Prime95;263851]I think that's unlucky but not suspicious. Xyzzy's was worrisome because it involved 9000 tests.[/QUOTE]

Some data from my latest two runs (regular TF runs as assigned by the PrimeNet server, exponents M58.xxx.xxx to M60.4xx.xxx):

1st batch[LIST][*]1956 assignments from 2^69 to 2^70[*]1932 no factor results[*]24 factor results ([B]25 factors[/B], one exponent has 2 factors between 2^69 and 2^70)[/LIST]Expected number of factors: 1956/69 = [B]~28.35[/B]

2nd batch[LIST][*]2089 assignments from 2^69 to 2^70[*]2051 no factor results[*]38 factor results ([B]38 factors[/B])[/LIST]Expected number of factors: 2089/69 = [B]~30.28[/B]
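The expected counts follow the usual GIMPS rule of thumb that an exponent has a factor between 2^k and 2^(k+1) with probability of about 1/k, so with k = 69 the expectation is simply assignments/69. A quick check of that arithmetic; the Poisson standard deviation sqrt(expected) is added here for scale and is not part of the original post, and expected_factors is a hypothetical helper name.

```python
import math


def expected_factors(assignments: int, bit_level: int) -> float:
    """Rule of thumb: chance of a factor between 2^k and 2^(k+1) is about 1/k."""
    return assignments / bit_level


# (assignments, exponents with a factor found) for the two batches, 2^69 to 2^70
batches = {"1st": (1956, 24), "2nd": (2089, 38)}
for name, (n, found) in batches.items():
    mean = expected_factors(n, 69)
    sd = math.sqrt(mean)  # Poisson standard deviation, added for scale
    print(f"{name} batch: expected {mean:.2f} +/- {sd:.1f}, found {found}")
```

This prints 28.35 +/- 5.3 against 24 found for the first batch and 30.28 +/- 5.5 against 38 found for the second, each within about 1.5 standard deviations.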

I feel comfortable with these results. :smile:

These runs included 300+ "no factor" results in a row as well as "5 factors from ~50 assignments".

Oliver

davieddy 2011-07-10 03:11

Putting some flesh on the bones
 
[QUOTE=TheJudger;265946]Some data from my latest two runs (regular TF runs as assigned by the PrimeNet server, exponents M58.xxx.xxx to M60.4xx.xxx):

1st batch[LIST][*]1956 assignments from 2^69 to 2^70[*]1932 no factor results[*]24 factor results ([B]25 factors[/B], one exponent has 2 factors between 2^69 and 2^70)[/LIST]Expected number of factors: 1956/69 = [B]~28.35[/B]

2nd batch[LIST][*]2089 assignments from 2^69 to 2^70[*]2051 no factor results[*]38 factor results ([B]38 factors[/B])[/LIST]Expected number of factors: 2089/69 = [B]~30.28[/B]

I feel comfortable with these results. :smile:

These runs included 300+ "no factor" results in a row as well as "5 factors from ~50 assignments".

Oliver[/QUOTE]

I feel comfortable that your results in no way make one doubt
the hypothesis that you conducted 4045 independent trials,
the probability of a "success" being 1/69. (See the Kamasutra).

2 factors in one trial? 1/69[SUP]2[/SUP] = 1/4761. Found one. Tick.

Expected ~59 successes (4045/69 = 58.6) +/- 8. Found 62. Tick.

Expected number of "gaps">300 = (68/69)[SUP]300[/SUP] *60 = 0.75
Probability of no such gaps: e^-0.75 = 0.47. Found one. Tick.

Probability of a gap <28 ~1/3. For 4 such gaps in succession,
expected total gap ~50.
Probability of 4 or more such gaps in a row =1/81
Found one in 60. Tick.
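The figures above can be re-derived in a few lines, treating each assignment as an independent trial with success probability 1/69 and the gap to the next factor as geometric. This is only a check of the arithmetic under those assumptions; gaps = 60 is David's round number, not a count taken from the data.

```python
import math

p = 1 / 69        # chance of a factor per assignment (1/k rule, k = 69)
trials = 4045     # 1956 + 2089 assignments
gaps = 60         # David's round figure for the number of gaps between factors

mean = trials * p
print(f"expected successes: {mean:.1f} +/- {math.sqrt(mean):.1f}")  # 58.6 +/- 7.7
print(f"P(2 factors in one trial), first estimate: 1/{69 ** 2}")     # 1/4761

long_gaps = gaps * (1 - p) ** 300   # expected number of gaps longer than 300
print(f"expected gaps > 300: {long_gaps:.2f}")                      # 0.75
print(f"P(no gap > 300): {math.exp(-long_gaps):.2f}")               # 0.47

p_short = 1 - (1 - p) ** 27         # next factor within 27 assignments (gap < 28)
print(f"P(short gap): {p_short:.3f}")                               # 0.326
print(f"P(4 short gaps in a row), using 1/3: 1/{3 ** 4}")           # 1/81
```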

Thoughtful comments on this analysis welcome.

David

davieddy 2011-07-10 04:17

[QUOTE=davieddy;265982]
2 factors in one trial? 1/69[SUP]2[/SUP] [/QUOTE]

From experience, this might be off by a factor of 2 either way.
Enough thinking for now!

davieddy 2011-07-10 07:56

[QUOTE=davieddy;265982]
Probability of a gap <28 ~1/3. For 4 such gaps in succession,
expected total gap ~50.
Probability of 4 or more such gaps in a row =1/81
[/QUOTE]
Expect 60*2/3=40 "long" gaps (>= 28).
Each of them has a 1/81 chance of being followed by 4+ "short" gaps.
So expected runs of 4+ short gaps is 0.5

This might seem a strange way to approach the question of
finding 5 factors in ~50 tests, but the Poisson distribution tells us
that if we expect 50/69 factors in a randomly selected 50 tests,
the probability of 5+ factors is 0.0009. 4045/50 ~ 81, so this way we
would expect 0.0729 occurrences of 5 in 50. It is easy to see why this
underestimates the likelihood of finding [B]some[/B] run of 5 factors in 50 tests
(a cluster can straddle the boundary between two of the 81 fixed blocks),
but [B]very hard to see how to adjust it.[/B]
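The Poisson figure can be reproduced directly. For the adjustment David calls very hard, one workaround (a swapped-in technique, not something proposed in the thread) is to simulate: generate 4045 independent tests with p = 1/69 many times and count how often any sliding window of 50 consecutive tests contains 5 or more factors. The function names, seed and repetition count are illustrative assumptions.

```python
import math
import random


def poisson_tail(lam: float, k: int) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1 - sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))


def p_some_window(trials=4045, window=50, need=5, p=1 / 69, reps=1000, seed=1):
    """Monte Carlo estimate of the chance that SOME run of `window`
    consecutive tests (sliding, not just the 81 fixed blocks) holds
    at least `need` factors, for independent tests with probability p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        seq = [rng.random() < p for _ in range(trials)]
        count = sum(seq[:window])          # factors in the first window
        found = count >= need
        for i in range(window, trials):
            if found:
                break
            count += seq[i] - seq[i - window]  # slide the window by one test
            found = count >= need
        hits += found
    return hits / reps


tail = poisson_tail(50 / 69, 5)
print(f"P(5+ factors in a given 50 tests): {tail:.4f}")   # 0.0009
print(f"naive count over 81 fixed blocks: {81 * tail:.3f}")
print(f"simulated P(some window of 50 has 5+): {p_some_window():.3f}")
```

The simulated value is an estimate whose precision depends on reps; raising it tightens the estimate at the cost of run time.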

Note that this problem is the same as judging how lucky GIMPS
has been to find 7 "short" gaps in a row between Mprimes. In this case,
50% of gaps are short and 50% are long (the boundary being an
exponent ratio of ~1.3).
My conclusion is that we expect 1 such run in 256 Mprimes
(a long gap, probability 1/2, followed by 7 short ones, (1/2)[SUP]7[/SUP] = 1/128):
lucky yes, outrageous no.
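Both run estimates use the same counting: a run is counted at the long gap that starts it, so the expected number of runs is (number of long gaps) x P(short)^(run length). A short sketch; expected_runs is a hypothetical helper and the inputs are David's rounded figures.

```python
def expected_runs(long_gaps: float, p_short: float, run_len: int) -> float:
    """Expected number of runs of `run_len` or more short gaps, counting each
    run at the long gap that starts it: long_gaps * p_short ** run_len."""
    return long_gaps * p_short ** run_len


# Factors: ~40 long gaps (60 * 2/3), P(short) ~ 1/3, runs of 4+ short gaps.
print(expected_runs(40, 1 / 3, 4))         # about 0.49
# Mersenne primes: per gap, P(long) = 1/2, then 7 short gaps at 1/2 each.
print(1 / expected_runs(0.5, 0.5, 7))      # 256.0 Mprimes per expected run
```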

[QUOTE=davieddy;265984]From experience, 1/69[SUP]2[/SUP] might be off by a factor of 2[/QUOTE]

Pretty sure it should be 1/2! * 1/69[SUP]2[/SUP] (Poisson again)
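The corrected estimate is the Poisson probability of exactly two events with mean lam = 1/69, that is e^(-lam) lam^2 / 2!, which is close to 1/(2 x 69^2) = 1/9522 because e^(-lam) is nearly 1. A quick check, assuming as above 4045 independent tests:

```python
import math

lam = 1 / 69                                             # mean factors per assignment
p_two = math.exp(-lam) * lam ** 2 / math.factorial(2)    # Poisson P(X = 2)
print(f"P(exactly 2 factors) ~ 1/{1 / p_two:.0f} (vs 1/{2 * 69 ** 2} from lam^2/2! alone)")
print(f"expected such exponents in 4045 tests: {4045 * p_two:.2f}")
```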

David

apsen 2011-07-13 15:26

:confused:

On my 4-core system, mfaktc 0.17 performance suffers if anything is running on the other cores. I do not see that effect on my 2-core system with mfaktc 0.8.

For example:

If I run only mfaktc 0.17 (on core #4), I get about 93M/s.
If I start one Prime95 worker on another core (say #1), it drops to about 75M/s.
If I start two Prime95 workers on other cores (say #1 and #2), it drops to about 69M/s.
If I start three Prime95 workers on other cores (say #1, #2 and #3), it drops to about 35M/s.

A fourth worker has almost no effect, as mfaktc runs at higher priority.

Win 7 x64, Q6600, GTX 465, mfaktc 0.17.

The other computer (Win 7 x64, E5200, 8800 GTS, mfaktc 0.8) has a consistent output of about 26M/s whether or not Prime95 is running.

:confused:

James Heinrich 2011-07-13 16:02

[QUOTE=apsen;266287]Q6600[/QUOTE]That's your problem. I also have a Q6600 and the multi-core performance is horrible. If you check your Prime95 performance on 1/2/3/4 cores, you'll notice it also scales badly: as you load more cores, the throughput of each drops.

apsen 2011-07-13 17:07

[QUOTE=James Heinrich;266289]I also have a Q6600 and the multi-core performance is horrible. If you check your Prime95 performance on 1/2/3/4 cores, you'll notice it also scales badly: as you load more cores, the throughput of each drops.[/QUOTE]

What gives the best overall throughput? Use 3 cores and leave one idle?


All times are UTC.
