mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

TheJudger 2012-08-03 08:36

[QUOTE=Prime95;306572]Oliver,

I propose creating a barrett77_mul32. This is the same as barrett79_mul32 but with the mod_simple_96 moved out of the loop. As long as f does not exceed 77 bits, a will not exceed 80 bits (above 80 bits and square_96_160 will fail).

I tested this out and it passes the self tests up through 77 bits. Raw speed went from 205M/sec to 250M/sec.

Crude source is attached.[/QUOTE]

This kernel does [B]not[/B] work up to 77 bits. When the factor candidates are above ~2[SUP]76.8[/SUP] there is a relatively high chance of an integer overflow (interim results >= 2[SUP]80[/SUP]).
This seems to occur when the exponent has consecutive 1 bits in its binary representation (which trigger the "optional multiply by 2"). I'm not sure whether this is the only cause or not.
The kernel is absolutely safe for FCs up to 2[SUP]76[/SUP], so there is a very high chance that mfaktc 0.19 will feature a new kernel: barrett76_mul32. I need to check on my CC 1.3 GPU, too. I guess this will be the fastest kernel for those old GPUs, too. :smile:
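
For readers following along, here is a minimal Python sketch of the left-to-right powering loop that the "optional multiply by 2" refers to. It is not the mfaktc kernel: the function name `divides_mersenne` is hypothetical, and it reduces fully after every step with Python big integers, whereas the kernel discussed here delays part of the reduction, which is where the headroom below 2^80 matters.

```python
def divides_mersenne(p: int, f: int) -> bool:
    """Return True if f divides 2**p - 1, i.e. if 2**p mod f == 1.

    Hypothetical helper for illustration only; it reduces fully after
    every step, unlike a kernel that keeps unreduced interim values.
    """
    a = 1
    # Walk the exponent bits from the most significant to the least.
    for bit in bin(p)[2:]:
        a = (a * a) % f          # squaring step, done for every bit
        if bit == "1":
            a = (2 * a) % f      # the "optional multiply by 2" for a set bit
    return a == 1


if __name__ == "__main__":
    # M11 = 2047 = 23 * 89, so both 23 and 89 pass and 47 does not.
    for candidate in (23, 47, 89):
        print(candidate, divides_mersenne(11, candidate))
```

A run of consecutive set bits means a doubling after every squaring, so an interim value that is not fully reduced has less chance to shrink between steps.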

George: I want to test the current code on my GTX 275 this evening; after that I'll send you the new code (which features some debugging code, too).

Oliver

Prime95 2012-08-03 19:11

[QUOTE=TheJudger;306808]This kernel does [B]not[/B] work up to 77 bits. When the factor candidates are above ~2[SUP]76.8[/SUP] there is a relatively high chance of an integer overflow.[/QUOTE]

OK, I finally did the error analysis (rather than relying on the comment in the code that implied the Barrett operation resulted in a value off by at most a factor of 3).

We are multiplying a 3 word value (floor (2^160/f)) by a 5 word value and ignoring the 5 bottom words of the result. By my reckoning the big multiply is ignoring 6 partial results in the 4th word, which could generate 5 carries. Also, the error introduced by the floor function adds another possible carry.

Thus, the quotient can be off by up to 6.

The doubling gives us off by 12, which means we need 4 pad bits -- just as Oliver observed.
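
The tally above can be reproduced with a short Python sketch. Two things here are assumptions rather than anything taken from the mfaktc source: the helper names `barrett_quotient` and `pad_bits` are hypothetical, and the multiply is exact, so the sketch only exhibits the single unit of error from the floor function, not the carries lost by the truncated multiply in the kernel.

```python
WORD_BITS = 32
SHIFT = 5 * WORD_BITS  # 160: the five bottom words that are discarded


def barrett_quotient(a: int, f: int) -> int:
    """Estimate a // f with a precomputed u = floor(2^160 / f).

    Exact big-integer multiply (an assumption of this sketch); with
    a < 2^160 the estimate is never too large and is low by at most 1.
    """
    u = (1 << SHIFT) // f
    return (a * u) >> SHIFT


def pad_bits(max_quotient_error: int) -> int:
    """Bits needed to hold an error count of max_quotient_error."""
    return max_quotient_error.bit_length()


if __name__ == "__main__":
    f = (1 << 76) - 15  # arbitrary 76-bit value, for illustration only
    a = f * f - 1
    print("floor error:", a // f - barrett_quotient(a, f))

    carries_from_truncation = 5  # figure from the post
    carry_from_floor = 1         # figure from the post
    off_by = carries_from_truncation + carry_from_floor
    doubled = 2 * off_by
    print("off by", off_by, "-> doubled", doubled,
          "-> pad bits", pad_bits(doubled),
          "-> max factor bits", 80 - pad_bits(doubled))
```

With the figures from the post this gives an error of 12 after doubling, 4 pad bits, and 80 - 4 = 76 usable bits.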

Xyzzy 2012-08-05 16:23

Due to a recent hike in our electricity rate, too much current draw, inadequate branch circuits and an inadequate central air cooling system, we are going to drop off of trial factoring with our GPUs.

This decision was made two months ago but we apparently were in denial (Not the river in Egypt!) that the bills we were receiving were anomalies.

We plan to remove them today and put the boxes on P-1, using the resources at gpu72.com, of course. This will drop our electrical usage by about half.

So, we have four GPUs to sell, cheap.

[URL]http://www.newegg.com/Product/Product.aspx?Item=N82E16814121432[/URL]

$200 each, plus shipping and insurance. They do not have a warranty but they have run 24×7 for a long time with no issues.

Be aware that each GPU takes up three slots!

If you are interested in one or more of these, please PM us.

Batalov 2012-08-05 23:21

These are good ones! :tempted:
[QUOTE]Why do you keep calling me JéSUS?! Do I look Puerto-Rican to you? ...My name is Zeus![/QUOTE]

kladner 2012-08-05 23:38

I would have jumped on one of these in an instant had I not just gotten a Gigabyte 570 off eBay for about the same price.

nucleon 2012-08-07 11:08

Noooooooooo....

Who's going to keep me company at the top? :)

BTW I might be moving in Sept. There may be some downtime for me coming up.

-- Craig


[QUOTE=Xyzzy;307021]Due to a recent hike in our electricity rate, too much current draw, inadequate branch circuits and an inadequate central air cooling system, we are going to drop off of trial factoring with our GPUs.

This decision was made two months ago but we apparently were in denial (Not the river in Egypt!) that the bills we were receiving were anomalies.

We plan to remove them today and put the boxes on P-1, using the resources at gpu72.com, of course. This will drop our electrical usage by about half.

So, we have four GPUs to sell, cheap.

[URL]http://www.newegg.com/Product/Product.aspx?Item=N82E16814121432[/URL]

$200 each, plus shipping and insurance. They do not have a warranty but they have run 24×7 for a long time with no issues.

Be aware that each GPU takes up three slots!

If you are interested in one or more of these, please PM us.[/QUOTE]

kladner 2012-08-07 14:10

[QUOTE=nucleon;307218]Noooooooooo....

Who's going to keep me company at the top? :)
.....
-- Craig[/QUOTE]

Looks like you'll have to get used to solitary splendor in the TF charts. :showoff: Nobody else comes close to your graph slope. :no:

TheJudger 2012-08-07 16:36

Hi all,

here is a small teaser for mfaktc 0.19: [B]RAW GPU performance[/B]

CUDA 4.2, stock GTX 470 (1215MHz):

mfaktc 0.18:
[CODE]kernel | M66362159 above 2^64 | M3321932839 above 2^64
-------+----------------------+-----------------------
71bit  | 106.0M/s             | 81.6M/s
75bit  | 200.0M/s             | 156.2M/s
95bit  | 160.2M/s             | 124.8M/s
76bit  | n.a.                 | n.a.
79bit  | 335.4M/s             | 262.1M/s
92bit  | 267.7M/s             | 211.2M/s
[/CODE]

mfaktc 0.19-pre11:
[CODE]kernel | M66362159 above 2^64 | M3321932839 above 2^64
-------+----------------------+-----------------------
71bit  | 106.0M/s             | 81.5M/s
75bit  | 214.7M/s             | 168.1M/s
95bit  | 169.5M/s             | 132.2M/s
76bit  | 424.7M/s             | 334.5M/s
79bit  | 343.5M/s             | 268.1M/s
92bit  | 276.4M/s             | 217.8M/s
[/CODE]

71bit: unchanged.
Most of the 75bit, 95bit, 79bit and 92bit improvement comes from the optimizations of the squaring function (thank you, George!).
I guess that older GPUs (CC 1.x) won't see any improvement.
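
To put the two tables side by side, here is a small Python sketch that computes the relative gains for the M66362159 column. The figures are copied from the tables; the dictionary and function names are made up for this illustration.

```python
# Raw throughput in M/s for M66362159 above 2^64, copied from the tables.
V018 = {"71bit": 106.0, "75bit": 200.0, "95bit": 160.2,
        "79bit": 335.4, "92bit": 267.7}
V019_PRE11 = {"71bit": 106.0, "75bit": 214.7, "95bit": 169.5,
              "76bit": 424.7, "79bit": 343.5, "92bit": 276.4}


def percent_gain(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new / old - 1.0) * 100.0


if __name__ == "__main__":
    # Kernels present in both versions.
    for kernel, old in V018.items():
        gain = percent_gain(old, V019_PRE11[kernel])
        print(f"{kernel}: {gain:+.1f}%")
    # The new 76bit kernel compared with the 79bit kernel in 0.19-pre11.
    new_vs_79 = percent_gain(V019_PRE11["79bit"], V019_PRE11["76bit"])
    print(f"76bit vs 79bit: {new_vs_79:+.1f}%")
```

On these numbers the 76bit kernel comes out about 24% faster than the 79bit kernel in the same build.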

The new 76bit barrett kernel is nice: take the 79bit barrett kernel, (re-)move some lines of code and you're mostly done.

For the future it might be possible to add more kernels:
[LIST]
[*]77 bit barrett kernel (same as 76bit kernel but with more accuracy in preprocessing)
[*]78 bit barrett kernel (same as 77bit kernel but with correction step from 79bit kernel for each set bit in the exponent)
[/LIST]
Release plan:
[LIST]
[*]run some tests on my GTX 275 (this weekend?)
[*]build a release candidate and give it to a few people for testing
[*]release one week later
[/LIST]
So [B]if[/B] everything is fine I guess it will take 10-14 days from now for mfaktc 0.19.

Oliver

NormanRKN 2012-08-07 21:39

cool !:cool:

kladner 2012-08-07 22:21

[QUOTE=TheJudger;307238]Hi all,
.................
So [B]if[/B] everything is fine I guess it will take 10-14 days from now for mfaktc 0.19.

Oliver[/QUOTE]

Thanks for the update and the work behind it. Looking forward to running 0.19.

Xyzzy 2012-08-10 17:30

[QUOTE]Due to a recent hike in our electricity rate, too much current draw, inadequate branch circuits and an inadequate central air cooling system, we are going to drop off of trial factoring with our GPUs.[/QUOTE]

The GPUs sold within a day or so on eBay.

If we take the purchase price for the four GPUs ($1320) and subtract what we sold them for ($800) we have a net cost of $520 or so.

We think (?) we used them for about 400,000 GHz/days of work so our cost per GHz/day, not counting the host computers, which are still happily churning away, is 0.13¢ per GHz/day. (Our math might be wrong.)
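
The arithmetic checks out; a quick Python sketch using only the figures quoted in this post (the variable names are ours):

```python
purchase_usd = 1320.00   # four GPUs, purchase price from the post
resale_usd = 800.00      # what they sold for
work_ghz_days = 400_000  # estimated work done

net_cost_usd = purchase_usd - resale_usd
cents_per_ghz_day = net_cost_usd / work_ghz_days * 100

print(f"net cost: ${net_cost_usd:.2f}")
print(f"cost: {cents_per_ghz_day:.2f} cents per GHz-day")
```

That is $520 over 400,000 GHz-days, or 0.13¢ per GHz-day, matching the figure above.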

That seems like a reasonable ROI.

:tank:


All times are UTC.