mersenneforum.org  

Old 2012-08-03, 08:36   #1838
TheJudger
"Oliver"
Mar 2005
Germany

11×101 Posts

Quote:
Originally Posted by Prime95
Oliver,

I propose creating a barrett77_mul32. This is the same as barrett79_mul32 but with the mod_simple_96 moved out of the loop. As long as f does not exceed 77 bits, a will not exceed 80 bits (above 80 bits and square_96_160 will fail).

I tested this out and it passes the self tests up through 77 bits. Raw speed went from 205M/sec to 250M/sec.

Crude source is attached.
This kernel does not work up to 77 bits. When the factor candidates are above ~2^76.8 there is a relatively high chance of an integer overflow (interim results >= 2^80).
This seems to occur when the exponent has consecutive 1 bits in its binary representation (which causes the "optional multiply by 2"). I'm not sure whether this is the only cause or not.
The kernel is absolutely safe for FCs up to 2^76, so there is a very high chance that mfaktc 0.19 will feature a new kernel: barrett76_mul32. I need to check on my CC 1.3 GPU, too. I guess this will be the fastest kernel for those old GPUs, too.
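
For readers wondering where the "optional multiply by 2" comes from, here is a minimal Python sketch of the left-to-right square-and-multiply test that trial factoring relies on: f divides 2^p - 1 exactly when 2^p mod f equals 1. This is not mfaktc's kernel. It reduces fully with Python's `%` after every step, whereas the kernel discussed above defers part of the reduction, which is why interim results there can approach 2^80. The function names are made up for illustration.

```python
def pow2_mod(p, f):
    """Return 2**p mod f using left-to-right binary exponentiation."""
    a = 1
    for bit in bin(p)[2:]:      # exponent bits, most significant first
        a = (a * a) % f         # always square
        if bit == "1":
            a = (a * 2) % f     # the "optional multiply by 2" for a set bit
    return a


def is_factor_of_mersenne(p, f):
    """True if f divides 2**p - 1."""
    return pow2_mod(p, f) == 1


print(is_factor_of_mersenne(11, 23))  # True: 2**11 - 1 = 2047 = 23 * 89
```

Every set bit in the exponent adds one doubling, so a run of consecutive 1 bits means several doublings in a row on values that, in a kernel with deferred reduction, may not be fully reduced.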

George: I want to test the current code on my GTX 275 this evening; after that I'll send you the new code (which features some debugging code, too).

Oliver

Old 2012-08-03, 19:11   #1839
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

2·3^2·419 Posts

Quote:
Originally Posted by TheJudger
This kernel does not work up to 77 bits. When the factor candidates are above ~2^76.8 there is a relatively high chance of an integer overflow.
OK, I finally did the error analysis (rather than relying on the comment in the code that implied the Barrett operation resulted in a value off by at most a factor of 3).

We are multiplying a 3 word value (floor(2^160/f)) by a 5 word value and ignoring the 5 bottom words of the result. By my reckoning the big multiply is ignoring 6 partial results in the 4th word which could generate 5 carries. Also, the error introduced by the floor function adds another possible carry.

Thus, the quotient can be off by up to 6.

The doubling gives us off by 12, which means we need 4 pad bits -- just as Oliver observed.
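
The last step of that argument can be checked with a few lines of Python. This is only a sketch under one reading of the post: it takes the stated bounds (quotient off by up to 6, doubled to 12) as given rather than deriving them, and it assumes the 80-bit input limit quoted earlier for square_96_160. It also shows that a Barrett quotient computed from the full product is off by at most 1, which suggests the rest of the error comes from the dropped partial products. The helper names `barrett_quotient` and `pad_bits` are hypothetical and do not come from mfaktc.

```python
import random


def barrett_quotient(a, f, shift=160):
    """Estimate a // f with one multiply by the precomputed floor(2**shift / f)."""
    mu = (1 << shift) // f
    return (a * mu) >> shift


def pad_bits(max_error):
    """Smallest k such that 2**k exceeds max_error."""
    return max_error.bit_length()


# With the full (untruncated) product the estimate is never too large and
# falls short of the true quotient by at most 1 for a < 2**160.
random.seed(1)
worst = 0
for _ in range(10_000):
    f = random.randrange(1 << 70, 1 << 76) | 1
    a = random.randrange(1 << 160)
    worst = max(worst, a // f - barrett_quotient(a, f))
print("worst error with the full product:", worst)

QUOTIENT_ERROR = 6                    # bound stated in the post
AFTER_DOUBLING = 2 * QUOTIENT_ERROR   # "off by 12"
SQUARE_INPUT_LIMIT_BITS = 80          # limit quoted for square_96_160

bits = pad_bits(AFTER_DOUBLING)
print(bits)                             # 4
print(SQUARE_INPUT_LIMIT_BITS - bits)   # 76
```

Four pad bits below the 80-bit limit leaves 76 bits for the factor candidate, which matches the 2^76 bound Oliver found by testing.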

Old 2012-08-05, 16:23   #1840
Xyzzy
"Mike"
Aug 2002

20065₈ Posts

Due to a recent hike in our electricity rate, too much current draw, inadequate branch circuits and an inadequate central air cooling system, we are going to drop off of trial factoring with our GPUs.

This decision was made two months ago but we apparently were in denial (Not the river in Egypt!) that the bills we were receiving were anomalies.

We plan to remove them today and put the boxes on P-1, using the resources at gpu72.com, of course. This will drop our electrical usage by about half.

So, we have four GPUs to sell, cheap.

http://www.newegg.com/Product/Produc...82E16814121432

$200 each, plus shipping and insurance. They do not have a warranty but they have run 24×7 for a long time with no issues.

Be aware that each GPU takes up three slots!

If you are interested in one or more of these, please PM us.

Old 2012-08-05, 23:21   #1841
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

10010100011001₂ Posts

These are good ones! :tempted:
Quote:
Why do you keep calling me JéSUS?! Do I look Puerto-Rican to you? ...My name is Zeus!

Old 2012-08-05, 23:38   #1842
kladner
"Kieren"
Jul 2011
In My Own Galaxy!

27AE₁₆ Posts

I would have jumped on one of these in an instant had I not just gotten a Gigabyte 570 off eBay for about the same price.

Old 2012-08-07, 11:08   #1843
nucleon
Mar 2003
Melbourne

5×103 Posts

Noooooooooo....

Who's going to keep me company at the top. :)

BTW I might be moving in Sept. There may be some downtime for me coming up.

-- Craig


Quote:
Originally Posted by Xyzzy
Due to a recent hike in our electricity rate, too much current draw, inadequate branch circuits and an inadequate central air cooling system, we are going to drop off of trial factoring with our GPUs.

This decision was made two months ago but we apparently were in denial (Not the river in Egypt!) that the bills we were receiving were anomalies.

We plan to remove them today and put the boxes on P-1, using the resources at gpu72.com, of course. This will drop our electrical usage by about half.

So, we have four GPUs to sell, cheap.

http://www.newegg.com/Product/Produc...82E16814121432

$200 each, plus shipping and insurance. They do not have a warranty but they have run 24×7 for a long time with no issues.

Be aware that each GPU takes up three slots!

If you are interested in one or more of these, please PM us.

Old 2012-08-07, 14:10   #1844
kladner
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts

Quote:
Originally Posted by nucleon
Noooooooooo....

Who's going to keep me company at the top. :)
.....
-- Craig
Looks like you'll have to get used to solitary splendor in the TF charts. Nobody else comes close to your graph slope.

Old 2012-08-07, 16:36   #1845
TheJudger
"Oliver"
Mar 2005
Germany

11·101 Posts

Hi all,

here is a small teaser of the raw GPU performance of mfaktc 0.19:

CUDA 4.2, stock GTX 470 (1215MHz):

mfaktc 0.18:
Code:
kernel | M66362159 above 2^64 | M3321932839 above 2^64
-------+----------------------+-----------------------
71bit  | 106.0M/s             |  81.6M/s
75bit  | 200.0M/s             | 156.2M/s
95bit  | 160.2M/s             | 124.8M/s
76bit  |     n.a.             |     n.a.
79bit  | 335.4M/s             | 262.1M/s
92bit  | 267.7M/s             | 211.2M/s
mfaktc 0.19-pre11:
Code:
kernel | M66362159 above 2^64 | M3321932839 above 2^64
-------+----------------------+-----------------------
71bit  | 106.0M/s             |  81.5M/s
75bit  | 214.7M/s             | 168.1M/s
95bit  | 169.5M/s             | 132.2M/s
76bit  | 424.7M/s             | 334.5M/s
79bit  | 343.5M/s             | 268.1M/s
92bit  | 276.4M/s             | 217.8M/s

71bit: unchanged.
Most of the 75bit, 95bit, 79bit and 92bit improvement comes from the optimizations of the squaring function (thank you, George!).
I guess that older GPUs (CC 1.x) won't see any improvement.
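
To put numbers on the two tables above, the relative speedups can be computed directly from the M66362159 column. The dictionaries below simply restate the posted figures; the variable and function names are only for illustration.

```python
# Raw rates in M/s for M66362159, copied from the two tables above.
v018 = {"71bit": 106.0, "75bit": 200.0, "95bit": 160.2,
        "79bit": 335.4, "92bit": 267.7}
v019 = {"71bit": 106.0, "75bit": 214.7, "95bit": 169.5,
        "76bit": 424.7, "79bit": 343.5, "92bit": 276.4}


def speedup(old, new):
    """Ratio of the new rate to the old rate."""
    return new / old


# Change per kernel from 0.18 to 0.19-pre11.
for kernel, old in v018.items():
    print(f"{kernel}: {100 * (speedup(old, v019[kernel]) - 1):+.1f}%")

# New 76bit kernel compared with the 79bit kernel in the same build.
gain = speedup(v019["79bit"], v019["76bit"])
print(f"76bit vs 79bit: {100 * (gain - 1):+.1f}%")
```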

The new 76bit barrett kernel is nice: take the 79bit barrett kernel, (re-)move some lines of code and you're mostly done.

For the future it might be possible to add more kernels:
  • 77 bit barrett kernel (same as 76bit kernel but with more accuracy in preprocessing)
  • 78 bit barrett kernel (same as 77bit kernel but with correction step from 79bit kernel for each set bit in the exponent)

Release plan:
  • run some tests on my GTX 275 (this weekend?)
  • build a release candidate and give it to a few people for testing
  • release one week later
So if everything is fine I guess it will take 10-14 days from now for mfaktc 0.19.

Oliver

Last fiddled with by TheJudger on 2012-08-07 at 16:36

Old 2012-08-07, 21:39   #1846
NormanRKN
Jul 2012
Saarland / Germany

2^2·17 Posts

cool !

Old 2012-08-07, 22:21   #1847
kladner
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts

Quote:
Originally Posted by TheJudger
Hi all,
.................
So if everything is fine I guess it will take 10-14 days from now for mfaktc 0.19.

Oliver
Thanks for the update and the work behind it. Looking forward to running 0.19.

Old 2012-08-10, 17:30   #1848
Xyzzy
"Mike"
Aug 2002

8245₁₀ Posts

Quote:
Due to a recent hike in our electricity rate, too much current draw, inadequate branch circuits and an inadequate central air cooling system, we are going to drop off of trial factoring with our GPUs.
The GPUs sold within a day or so on eBay.

If we take the purchase price for the four GPUs ($1320) and subtract what we sold them for ($800) we have a net cost of $520 or so.

We think (?) we used them for about 400,000 GHz/days of work so our cost per GHz/day, not counting the host computers, which are still happily churning away, is 0.13¢ per GHz/day. (Our math might be wrong.)
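
Since the post wonders whether the math is right, here is the arithmetic spelled out in Python, using only the figures given above (the 400,000 GHz/days is the poster's own rough estimate; the variable names are just for illustration).

```python
purchase_usd = 1320
resale_usd = 800
work_ghz_days = 400_000   # the poster's rough estimate

net_cost_usd = purchase_usd - resale_usd
cents_per_ghz_day = 100 * net_cost_usd / work_ghz_days

print(net_cost_usd)                  # 520
print(round(cents_per_ghz_day, 2))   # 0.13
```

So the 0.13¢ per GHz/day figure checks out, given those inputs.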

That seems like a reasonable ROI.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.