mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing

2012-03-30, 13:45   #1739
bcp19
I find it interesting that kjaget and I have basically said the same thing in different terms, yet what I said is not understood. Maybe instead of saying that at 26M I get:

2500/480 - 2 cores mfaktc - 149.26 GHzD/DC 'lost'
2400/560Ti - 3 cores mfaktc - 159.53
X4 645/460 - 3 cores mfaktc - 172.78
Q6600/560 - 3 cores mfaktc - 204.85
Q8200/550Ti - 3 cores mfaktc - 228.15

I should say: At 26M, these systems can either perform 1 DC or X ^69 TFs:
2500/480 - 2 cores mfaktc - 1 DC or 64.9 ^69 TFs
2400/560Ti - 3 cores mfaktc - 1 DC or 69.4 ^69 TFs
X4 645/460 - 3 cores mfaktc - 1 DC or 75.1 ^69 TFs
Q6600/560 - 3 cores mfaktc - 1 DC or 89 ^69 TFs
Q8200/550Ti - 3 cores mfaktc - 1 DC or 99.2 ^69 TFs

At the 45M level, the above systems could perform 168.2, 180, 195.1, 231.3 and 257.6 TF to ^71 or 2 LLs.
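For anyone wanting to reproduce the conversion: the second list follows from the first by dividing the 'lost' GHz-days by the credit for a single TF assignment. A quick sketch; the ~2.3 GHzD-per-TF figure is inferred from the ratios of the two lists above and is not an official credit value:

```python
# Convert 'lost' GHz-days per DC into an equivalent number of TF assignments.
# credit_per_tf is inferred from the ratios of the two lists (assumption).
ghzd_lost = [149.26, 159.53, 172.78, 204.85, 228.15]
credit_per_tf = 2.3  # approx. GHzD for one 26M TF assignment to 2^69
tfs = [round(g / credit_per_tf, 1) for g in ghzd_lost]
print(tfs)
```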
2012-03-30, 14:16   #1740
chalsall
Quote: Originally Posted by bcp19
At the 45M level, the above systems could perform 168.2, 180, 195.1, 231.3 and 257.6 TF to ^71 or 2 LLs.
Interesting...

Could you redo this analysis for 30M TF to 70 vs. 1 LL (DC), 52M TF to 72 vs. 2 LLs, and 58.52M TF to 73 vs. 2 LLs?

I chose these numbers because the first two are where we're currently working, and the last is the point where Prime95 currently transitions to 73.

I, like George et al., feel the transition to 73 at 58.52M should be lower, but I don't think that should happen until we've (mostly) cleared out the wave.
2012-03-30, 14:28   #1741
kjaget

I think the confusion is that GHz-days/day is generated at different rates on GPUs and CPUs (and for different assignment types on the same hardware). So adding that in, rather than just measuring raw times, confuses the issue. Even if you're only using it to convert to and from time temporarily, it adds an extra layer of complexity (and an additional assumption) that isn't needed.
2012-03-30, 14:54   #1742
bcp19

Quote: Originally Posted by chalsall
Interesting...

Could you redo this analysis for 30M TF to 70 vs. 1 LL (DC), 52M TF to 72 vs. 2 LLs, and 58.52M TF to 73 vs. 2 LLs?

I chose these numbers because the first two are where we're currently working, and the last is the point where Prime95 currently transitions to 73.

I, like George et al., feel the transition to 73 at 58.52M should be lower, but I don't think that should happen until we've (mostly) cleared out the wave.
This isn't up to 70, but I had this worked out when I saw your post:
[Attachment: graph1.jpg]
2012-03-30, 15:11   #1743
chalsall

Quote: Originally Posted by bcp19
This isn't up to 70, but I had this worked out when I saw your post:
Sweet!!! Thanks.

So, this clearly shows that we're going to 70 bits too early in the DC range. But you've said you still want to do that. Do you still? You're the main producer remaining in that range, so I'll defer to you on that.

In the LL range it shows that what we're doing now is "economical", and that we can go to 73 a little lower than the Prime95 transition point once we've finished everything below 58.52M to 72.

It also clearly shows that the CPU/GPU combinations have a huge influence on the cross-over points.
2012-03-30, 15:19   #1744
bcp19

Quote: Originally Posted by kjaget
I think the confusion is that GHz-days/day is generated at different rates on GPUs and CPUs (and for different assignment types on the same hardware). So adding that in, rather than just measuring raw times, confuses the issue. Even if you're only using it to convert to and from time temporarily, it adds an extra layer of complexity (and an additional assumption) that isn't needed.
Actually, I have less complexity than you. The givens I use are %DC(2LL)/day/CPU core, # of CPU cores used, %DC(2LL)/day/GPU, and GHzD/day output by the GPU. Let's call those a, b, c, d. My formula then is d/(a*b+c). b and d are static, while a and c vary with the exponent tested. I have no need to take timings, as I can check James' site to see them and convert. To get #TF/DC(2LL) I add one more variable, making it (d/(a*b+c))/e, where e is the GHzD credit for the exponent at the target bit level.
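In code form the formula is tiny. The a..e names are the ones from the post; the numeric inputs below are made-up placeholders for illustration, not timings from any of the machines discussed:

```python
def tf_per_dc(a, b, c, d, e):
    """Number of TF assignments the GPU can do in the time the
    CPU+GPU combo gives up one DC.

    a: %DC(2LL)/day per CPU core (varies with exponent)
    b: number of CPU cores used (static)
    c: %DC(2LL)/day per GPU (varies with exponent)
    d: GHzD/day output by the GPU (static)
    e: GHzD credit for one TF at the target bit level
    """
    return (d / (a * b + c)) / e

# Made-up placeholder values, for illustration only:
result = tf_per_dc(a=0.37, b=2, c=0.0, d=110.0, e=2.3)
print(result)
```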
2012-03-30, 15:25   #1745
bcp19

Quote: Originally Posted by chalsall
Sweet!!! Thanks.

So, this clearly shows that we're going to 70 bits too early in the DC range. But you've said you still want to do that. Do you still? You're the main producer remaining in that range, so I'll defer to you on that.

In the LL range it shows that what we're doing now is "economical", and that we can go to 73 a little lower than the Prime95 transition point once we've finished everything below 58.52M to 72.

It also clearly shows that the CPU/GPU combinations have a huge influence on the cross-over points.
I have a feeling that my graph is off where theory is concerned, as I used a flat 1% factor-found rate per bit level. Some people say the chance to find a factor is 1/bit level, but P-1 has been done on the DCs, so that alters the equation. This is an EXTREMELY rough graph, and without P-1 on the LL candidates, they probably have greater than a 1% chance per bit level.

Edit: I also just found an error... the timings I was using for the 2400 were from v26.6, which means the 2400 is actually worse than the 2500 as it loses 13% with the timing change.
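For reference, the commonly quoted heuristic is that TF of one bit level finds a factor with probability roughly 1/bit; a quick sketch of what that gives across the levels being discussed (prior P-1 work lowers these numbers in practice):

```python
# Heuristic: chance that TF of the range [2^(b-1), 2^b] finds a factor
# is roughly 1/b.  Prior P-1 removes some smooth factors, lowering this.
for b in range(69, 74):
    print(f"to 2^{b}: ~{100.0 / b:.2f}%")
```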

Last fiddled with by bcp19 on 2012-03-30 at 15:37
2012-03-30, 15:43   #1746
chalsall

Quote: Originally Posted by bcp19
This is an EXTREMELY rough graph, and without P-1 on the LL candidates, they probably have greater than a 1% chance per bit level.
Agreed. Probably about 1.125%.

And so everyone knows, the empirical data on the Factor Found Percentage report is undercounting a bit in the "to 71" and "to 72" columns, for reasons I won't go into now, but this will hopefully correct itself shortly. (Hint, hint to the person responsible... )

But this doesn't change the fact that by both your and James' analysis, we're going to 70 bits too early in the DC range.
2012-03-30, 15:49   #1747
KyleAskine

Quote: Originally Posted by chalsall
Agreed. Probably about 1.125%.

And so everyone knows, the empirical data on the Factor Found Percentage report is undercounting a bit in the "to 71" and "to 72" columns, for reasons I won't go into now, but this will hopefully correct itself shortly. (Hint, hint to the person responsible... )

But this doesn't change the fact that by both your and James' analysis, we're going to 70 bits too early in the DC range.
I noticed this person's tactics when I was ALMOST in 2nd place in days saved a while ago.

Then I looked the next day and I was around 15,000 GHz-days saved behind.
2012-03-30, 15:53   #1748
bcp19

Quote: Originally Posted by chalsall
Agreed. Probably about 1.125%.

And so everyone knows, the empirical data on the Factor Found Percentage report is undercounting a bit in the "to 71" and "to 72" columns, for reasons I won't go into now, but this will hopefully correct itself shortly. (Hint, hint to the person responsible... )

But this doesn't change the fact that by both your and James' analysis, we're going to 70 bits too early in the DC range.
If you use my machines as a baseline, and your 1.125%, the worst case is 37M and the best case is 32M. I'd say change it to 32M for now; since I am the main producer, I will switch the dogs over to the LL range.

Also, using the 1.125%, the changeover to ^72 becomes 42-47M on the dogs and ^73 becomes 56-57M. ^71 is the weird area... for DC it is 40-43M, but for LL it is 35-37M (which we don't have).

Last fiddled with by bcp19 on 2012-03-30 at 16:01
2012-03-30, 17:25   #1749
bcp19

I just realized all my graphs have been done on nVidia cards. While I only have one data set for AMD, you may find this surprising:

I was running a 5770 (which cannot currently do DC/LL) in my 2500, with P95 sharing the core. A full core takes ~8.7 ms/iter; the shared core took ~21.25 ms/iter. So 41% of the core was being used by P95, with 59% lost to mfakto. A full core can do 37% of a 26M DC/day, so 59% of that 37% gives a 'loss' of 21.83% of a DC/day. The 5770 was outputting ~64 GHzD/day, which means it would produce 293 GHzD in the time the lost portion of the core could do 1 DC. This equates to 127.5 TFs for that lost DC. This changes the breakeven point for ^70 to 29M, ^71 to 37M, and ^72 to 47M. For 2LL, this means ^72 would be at 37M and ^73 would be at 47M, and this is on my 2nd 'worst' system.

I need a new PSU for my Core2Duo before I can run tests on it with the 5770, but if we guesstimate and say it takes both cores of the Duo to get the same 64 GHzD, you end up with 418.9 GHzD/lost DC, or 182 TFs at 26M. That makes 26M the ^70 mark, 33M the ^71 mark and 41M the ^72 mark for DC, with 41M the ^73 mark, 51M the ^74 mark and 59M the ^75 mark for 2LL.
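The shared-core arithmetic above can be sketched as a function. The inputs are just the numbers quoted in this post, so treat it as a back-of-the-envelope tool rather than anything exact:

```python
def ghzd_per_lost_dc(full_ms, shared_ms, dc_frac_per_day, gpu_ghzd_per_day):
    """GHz-days the GPU produces in the time the CPU cycles lost to
    feeding it could have completed one DC."""
    cpu_share = full_ms / shared_ms        # fraction of the core P95 keeps
    lost_share = 1.0 - cpu_share           # fraction lost to mfakto
    lost_dc_per_day = lost_share * dc_frac_per_day
    return gpu_ghzd_per_day / lost_dc_per_day

# 5770 in the 2500: 8.7 ms/iter full core, 21.25 ms/iter shared,
# 37% of a 26M DC/day per core, ~64 GHzD/day from the GPU -> ~293 GHzD
print(round(ghzd_per_lost_dc(8.7, 21.25, 0.37, 64.0)))
```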

Last fiddled with by bcp19 on 2012-03-30 at 17:32