
mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

bcp19 2012-03-30 13:45

I find it interesting that kjaget and I have basically said the same thing but in different terms, yet what I said is not understood. Maybe instead of saying at 26M I get:

2500/480 - 2 cores mfaktc - 149.26 GHzD/DC 'lost'
2400/560Ti - 3 cores mfaktc - 159.53
X4 645/460 - 3 cores mfaktc - 172.78
Q6600/560 - 3 cores mfaktc - 204.85
Q8200/550Ti - 3 cores mfaktc - 228.15

I should say: At 26M, these systems can either perform 1 DC or X ^69 TFs:
2500/480 - 2 cores mfaktc - 1 DC or 64.9 ^69 TFs
2400/560Ti - 3 cores mfaktc - 1 DC or 69.4 ^69 TFs
X4 645/460 - 3 cores mfaktc - 1 DC or 75.1 ^69 TFs
Q6600/560 - 3 cores mfaktc - 1 DC or 89 ^69 TFs
Q8200/550Ti - 3 cores mfaktc - 1 DC or 99.2 ^69 TFs

At the 45M level, the above systems could perform 168.2, 180, 195.1, 231.3 and 257.6 TF to ^71 or 2 LLs.
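The arithmetic behind these tallies can be sketched in a few lines of Python. Note the ~2.30 GHzD credit per ^69 TF assignment at 26M used below is back-derived from the figures above (149.26 / 64.9), not an official PrimeNet value:

```python
# Convert the DC credit 'lost' to feeding mfaktc into an equivalent
# number of ^69 TF assignments. The per-TF credit is back-derived
# from the first row of the table above, not taken from PrimeNet.
CREDIT_PER_TF_69 = 149.26 / 64.9  # ~2.30 GHzD per 26M TF to 2^69

lost_ghzd = {
    "2500/480":    149.26,
    "2400/560Ti":  159.53,
    "X4 645/460":  172.78,
    "Q6600/560":   204.85,
    "Q8200/550Ti": 228.15,
}

for name, ghzd in lost_ghzd.items():
    print(f"{name}: 1 DC or {ghzd / CREDIT_PER_TF_69:.1f} ^69 TFs")
```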

chalsall 2012-03-30 14:16

[QUOTE=bcp19;294832]At the 45M level, the above systems could perform 168.2, 180, 195.1, 231.3 and 257.6 TF to ^71 or 2 LLs.[/QUOTE]

Interesting...

Could you redo this analysis for 30M TF to 70 vs. 1 LL(DC), 52M TF to 72 vs. 2 LLs, and 58.52M TF to 73 vs. 2 LLs?

I chose these numbers because the first two are where we're currently working, and the last is Prime95's current transition point to 73.

I, like George et al, feel the transition to 73 at 58.52M should be lower, but I don't think that should happen until we've (mostly) cleared out the wave.

kjaget 2012-03-30 14:28

I think the confusion is that GHz-days/day are generated at different rates on GPUs and CPUs (and for different assignment types on the same hardware). So adding that in, rather than just measuring raw times, confuses the issue. Even if you're only using it to convert to and from time temporarily, it adds an extra layer of complexity - and an additional assumption - that isn't needed.

bcp19 2012-03-30 14:54

1 Attachment(s)
[QUOTE=chalsall;294833]Interesting...

Could you redo this analysis for 30M TF to 70 vs. 1 LL(DC), 52M TF to 72 vs. 2 LLs, and 58.52M TF to 73 vs. 2 LLs?

I chose these numbers because the first two are where we're currently working, and the last is Prime95's current transition point to 73.

I, like George et al, feel the transition to 73 at 58.52M should be lower, but I don't think that should happen until we've (mostly) cleared out the wave.[/QUOTE]

This isn't up to 70, but I had this worked out when I saw your post:

chalsall 2012-03-30 15:11

[QUOTE=bcp19;294841]This isn't up to 70, but I had this worked out when I saw your post:[/QUOTE]

Sweet!!! Thanks.

So, this clearly shows that we're going to 70 bits too early in the DC range. But you've said you still want to do that. Do you still? You're the main producer remaining in that range, so I'll defer to you on that.

In the LL range it shows that what we're doing now is "economical", and that we can go to 73 a little lower than the Prime95 transition point once we've finished everything below 58.52M to 72.

It also clearly shows that the CPU/GPU combinations have a huge influence on the cross-over points.

bcp19 2012-03-30 15:19

[QUOTE=kjaget;294836]I think the confusion is that GHz-days/day are generated at different rates on GPUs and CPUs (and for different assignment types on the same hardware). So adding that in, rather than just measuring raw times, confuses the issue. Even if you're only using it to convert to and from time temporarily, it adds an extra layer of complexity - and an additional assumption - that isn't needed.[/QUOTE]

Actually, I have less complexity than you. The givens I use are %DC(2LL)/day/CPU core, # of CPU cores used, %DC(2LL)/day/GPU, and GHzD/day output by the GPU. Let's call those a, b, c, d. My formula is then d/(a*b+c). b and d are static, while a and c vary with the exponent tested. I have no need to take timings, as I can check James' site to see them and convert. To get #TF/DC(2LL), I add one more variable, making it (d/(a*b+c))/e, where e is the GHzD credit for the exponent at the target bit level.
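As a sketch, the formula reads like this in code; the numeric values in the commented call are illustrative placeholders only, not measurements from James' site:

```python
def tfs_per_dc(a, b, c, d, e):
    """How many TF assignments a CPU+GPU pair trades for one DC(2LL).

    a: fraction of a DC(2LL) one CPU core completes per day
    b: number of CPU cores used
    c: fraction of a DC(2LL) per day attributable to the GPU
    d: GHzD/day of TF credit output by the GPU
    e: GHzD credit for one TF of the exponent at the target bit level
    """
    ghzd_lost_per_dc = d / (a * b + c)  # d/(a*b+c) from the post
    return ghzd_lost_per_dc / e         # (d/(a*b+c))/e

# Illustrative placeholder values only:
# tfs_per_dc(a=0.37, b=2, c=0.0, d=150.0, e=2.3)
```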

bcp19 2012-03-30 15:25

[QUOTE=chalsall;294844]Sweet!!! Thanks.

So, this clearly shows that we're going to 70 bits too early in the DC range. But you've said you still want to do that. Do you still? You're the main producer remaining in that range, so I'll defer to you on that.

In the LL range it shows that what we're doing now is "economical", and that we can go to 73 a little lower than the Prime95 transition point once we've finished everything below 58.52M to 72.

It also clearly shows that the CPU/GPU combinations have a huge influence on the cross-over points.[/QUOTE]

I have a feeling that my graph is off where theory is concerned, as I used a flat 1% factor-found chance per bit level. Some people say the chance of finding a factor is 1/(bit level), but P-1 has been done on the DCs, which alters the equation. This is an EXTREMELY rough graph, and without P-1 on the LL candidates, they probably have greater than a 1% chance per bit level.

Edit: I also just found an error... the timings I was using for the 2400 were from v26.6, which means the 2400 is actually worse than the 2500 as it loses 13% with the timing change.

chalsall 2012-03-30 15:43

[QUOTE=bcp19;294847]This is an EXTREMELY rough graph, and without P-1 on the LL candidates, they probably have greater than a 1% chance per bit level.[/QUOTE]

Agreed. Probably about 1.125%.

And so everyone knows, the empirical data on the [URL="http://www.gpu72.com/reports/factor_percentage/"]Factor Found Percentage[/URL] report is undercounting a bit in the "to 71" and "to 72" columns for reasons I won't go into now, but this will hopefully correct itself shortly. (Hint, hint to the person responsible... :wink:)

But this doesn't change the fact that by both your and James' analysis, we're going to 70 bits too early in the DC range.

KyleAskine 2012-03-30 15:49

[QUOTE=chalsall;294849]Agreed. Probably about 1.125%.

And so everyone knows, the empirical data on the [URL="http://www.gpu72.com/reports/factor_percentage/"]Factor Found Percentage[/URL] report is undercounting a bit in the "to 71" and "to 72" columns for reasons I won't go into now, but this will hopefully correct itself shortly. (Hint, hint to the person responsible... :wink:)

But this doesn't change the fact that by both your and James' analysis, we're going to 70 bits too early in the DC range.[/QUOTE]

I noticed this person's tactics when I was ALMOST 2nd place in days saved a while ago.

Then I looked the next day and I was around 15,000 GHz-days saved behind :yucky:

bcp19 2012-03-30 15:53

[QUOTE=chalsall;294849]Agreed. Probably about 1.125%.

And so everyone knows, the empirical data on the [URL="http://www.gpu72.com/reports/factor_percentage/"]Factor Found Percentage[/URL] report is undercounting a bit in the "to 71" and "to 72" columns for reasons I won't go into now, but this will hopefully correct itself shortly. (Hint, hint to the person responsible... :wink:)

But this doesn't change the fact that by both your and James' analysis, we're going to 70 bits too early in the DC range.[/QUOTE]

If you use my machines as a baseline, and your 1.125%, the worst case is 37M and the best case is 32M. I'd say to change it to 32M for now, since I am the main producer, and I will switch the dogs over to the LL range.

Also, using the 1.125%, the changeover to ^72 becomes 42-47M on the dogs and ^73 becomes 56-57M. ^71 is the weird area... for DC it is 40-43M but for LL it is 35-37M (which we don't have)

bcp19 2012-03-30 17:25

I just realized all my graphs have been done on nVidia cards. While I only have one data set for AMD, you may find this surprising:


I was running a 5770 (which cannot currently do DC/LL) in my 2500, with P95 sharing the core. A full core takes ~8.7 ms/iter; the shared core took ~21.25 ms/iter. So 41% of the core was being used by P95, with 59% lost to mfakto. A full core can do 37% of a 26M DC/day, so 59% of that 37% gives a 'loss' of 21.83% of a DC/day. The 5770 was outputting ~64 GD/day, which means it would produce 293 GD in the time the lost portion of the core could do 1 DC. This equates to 127.5 TFs for that lost DC. This changes the breakeven point for ^70 to 29M, ^71 to 37M, and ^72 to 47M. For 2LL, this means ^72 would be at 37M and ^73 at 47M, and this is on my [B]2nd 'worst'[/B] system.
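Stepping through that arithmetic (the ~2.3 GHzD-per-TF credit at the end is an assumed figure implied by the earlier tables, not an official value):

```python
# Reproduce the 5770/2500 calculation from the post above.
full_ms, shared_ms = 8.7, 21.25  # P95 ms/iter: full core vs core shared with mfakto
dc_per_day_full = 0.37           # a full core does 37% of a 26M DC per day
gpu_ghzd_per_day = 64.0          # TF credit output of the 5770

frac_left_to_p95 = full_ms / shared_ms         # ~0.41 of the core
frac_lost = 1 - frac_left_to_p95               # ~0.59 lost to mfakto
dc_lost_per_day = frac_lost * dc_per_day_full  # ~0.218 DC/day
days_per_lost_dc = 1 / dc_lost_per_day         # ~4.58 days
ghzd_per_lost_dc = gpu_ghzd_per_day * days_per_lost_dc  # ~293 GD

# Assumed: ~2.3 GHzD credit per 26M TF at the target bit level
tfs_per_lost_dc = ghzd_per_lost_dc / 2.3  # ~127 TFs per lost DC
```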

I need a new PSU for my Core2Duo before I can run tests on it with the 5770, but if we guesstimate and say it takes both cores of the Duo to get only the same 64 GD, you end up with 418.9 GD/lost DC, or 182 TFs at 26M. That makes 26M the ^70 mark, 33M the ^71 mark, and 41M the ^72 mark for DC, with 41M the ^73 mark, 51M the ^74 mark, and 59M the ^75 mark for 2LL.

