mersenneforum.org  

Old 2012-08-01, 02:16   #1827
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

2³·271 Posts

Quote:
Originally Posted by Xyzzy View Post
More "extra credit":

Code:
Processing result: M56505451 has a factor: 86553876518403762963169
CPU credit is 323.9309 GHz-days.
Processing result: M56488651 has a factor: 35566445275259107720993
CPU credit is 129.5622 GHz-days.
Processing result: M56491177 has a factor: 23502006329787341695151
CPU credit is 89.0731 GHz-days.
Old 2012-08-01, 03:09   #1828
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

7²·197 Posts

Quote:
Originally Posted by Xyzzy View Post
More "extra credit":
Yeah, you looked like you need some credit, so that's why
------------------------------------------------------------
@prime95, related to barrett77:
"Enter George Woltman, an excellent programmer and organizer..."
(from the Encyclopedia Galactica, the History of Mersenne Primes section)
(we need a smiley that takes off its hat!)
(edit, ok, this will substitute:)

When can we get mfaktc binaries for win64? (Eventually for both the "classic" version and the one for TF of small exponents; there a 20% improvement would look great. In fact, we would be even happier with a "barrett67" and a 50% improvement.)

Last fiddled with by LaurV on 2012-08-01 at 03:18
Old 2012-08-01, 10:17   #1829
NormanRKN
 
Jul 2012
Saarland / Germany

44₁₆ Posts

wow, that is a performance boost
Old 2012-08-01, 22:24   #1830
Bdot
 
Nov 2010
Germany

3×199 Posts

Quote:
Originally Posted by Prime95 View Post
Oliver,

I propose creating a barrett77_mul32. This is the same as barrett79_mul32 but with the mod_simple_96 moved out of the loop. As long as f does not exceed 77 bits, a will not exceed 80 bits (above 80 bits and square_96_160 will fail).

I tested this out and it passes the self tests up through 77 bits. Raw speed went from 205M/sec to 250M/sec.

Crude source is attached.
Very nice! In mfakto, this new 77-bit kernel is even 5% faster than the 70-bit, and 10% faster than the 73-bit kernels I have, making it the fastest again for VLIW5. The newer architectures benefit less from this kernel.

Too bad this trick only works for the 79-bit kernel with its fixed 2^81/f inverse. The other barretts, with the 2^(bit_max+1)/f inverse, cannot deal with the larger square in my kernels (the inverse does not seem to have enough significant digits).
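
For readers following along, the idea behind barrett77 (skip the per-iteration mod_simple_96 correction and fix up the remainder once at the end) can be sketched in Python. This is only an illustration, not the mfaktc or mfakto kernel code: the function name powmod2_lazy, the 160-bit scaling of the inverse, and the use of Python big integers instead of 32-bit words are all assumptions.

```python
def powmod2_lazy(p, f, k=160):
    """Return (2**p mod f, widest intermediate value in bits).

    Hypothetical sketch: the inverse u = floor(2**k / f) and k = 160 are
    illustrative choices, not the scaling used by the real kernels.
    """
    u = (1 << k) // f          # precomputed once per factor candidate
    a = 1
    widest = 0
    for bit in bin(p)[2:]:     # exponent bits, most significant first
        sq = a * a
        q = (sq * u) >> k      # quotient estimate, never above sq // f
        a = sq - q * f         # sloppy remainder: congruent mod f, maybe >= f
        if bit == "1":
            a <<= 1            # optional multiplication by 2
        widest = max(widest, a.bit_length())
    return a % f, widest       # single final correction, outside the loop


# Exponent and factor taken from the results quoted earlier in the thread.
p, f = 56505451, 86553876518403762963169
remainder, widest = powmod2_lazy(p, f)
print(f.bit_length(), remainder == pow(2, p, f), widest)
```

In this simplified version the intermediate a stays below 4·f, so it is at most two bits wider than f. The posts here quote 3 extra bits (77 to 80) for the real kernel, presumably because it computes the quotient with truncated products.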
Old 2012-08-02, 01:18   #1831
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1110101110001₂ Posts

Quote:
Originally Posted by Bdot View Post
Very nice! In mfakto, this new 77-bit kernel is even 5% faster than the 70-bit, and 10% faster than the 73-bit kernels
Glad it helped and is passing your tests too.

You can also create a 78-bit kernel that only adjusts the result when there is a multiplication by 2.
Old 2012-08-02, 09:42   #1832
TheJudger
 
"Oliver"
Mar 2005
Germany

1111₁₀ Posts

Hi George,

Quote:
Originally Posted by Prime95 View Post
Oliver,

I propose creating a barrett77_mul32. This is the same as barrett79_mul32 but with the mod_simple_96 moved out of the loop. As long as f does not exceed 77 bits, a will not exceed 80 bits (above 80 bits and square_96_160 will fail).

I tested this out and it passes the self tests up through 77 bits. Raw speed went from 205M/sec to 250M/sec.

Crude source is attached.
cool, I'll test this (again!). Some time ago I tried something similar but failed somehow. Did you run the tests with CHECKS_MODBASECASE (src/params.h) enabled?

@others: please be careful, there are some other changes and testing needed before this is safe for daily usage. With this modification alone it will choose this kernel for TF up to 2^79, and it will fail there.

I guess I'll reschedule my release plan for 0.19 and add this.

Oliver
Old 2012-08-02, 09:47   #1833
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

25B5₁₆ Posts

Quote:
Originally Posted by TheJudger View Post
I guess I'll reschedule my release plan for 0.19 and add this.
We fully agree with this! Eagerly waiting!
Old 2012-08-02, 14:33   #1834
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

7,537 Posts

Quote:
Originally Posted by Bdot View Post
Too bad this trick only works for the 79-bit kernel with its fixed 2^81/f inverse. The other barretts, with the 2^(bit_max+1)/f inverse, cannot deal with the larger square in my kernels (the inverse does not seem to have enough significant digits).
I haven't tried it, but this should also work for the barrett 96-bit kernel for factors up to 90 bits. That is, a 90-bit factor will generate a 90-bit remainder + 3 bits because we're pretty sloppy calculating the remainder. When we square the 93-bit result we get a 186-bit value. We then apply 1/f to get a 96-bit quotient - which just fits in our 3 registers.
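
The bit counting above can be checked with a few lines of Python. This is a sketch under the assumption that "3 bits of slop" means the sloppy remainder a stays below 8·f; worst_case_quotient_bits is a made-up helper name, and Python big integers stand in for the three 32-bit registers.

```python
def worst_case_quotient_bits(f, slop_bits=3):
    """Bits needed for floor(a*a / f) when a is the largest value below 2**slop_bits * f."""
    a_max = (f << slop_bits) - 1       # assumed bound on the sloppy remainder
    return (a_max * a_max // f).bit_length()


REGISTER_BITS = 3 * 32                 # three 32-bit words

largest_90_bit = (1 << 90) - 1
largest_91_bit = (1 << 91) - 1

print(worst_case_quotient_bits(largest_90_bit) <= REGISTER_BITS)
print(worst_case_quotient_bits(largest_91_bit) <= REGISTER_BITS)
```

Under that assumption the quotient for a 90-bit factor needs at most 96 bits, while the largest 91-bit factor would need 97, which is why 90 bits is the limit in this counting argument. It says nothing about whether the kernel's shifting can actually deliver that quotient.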
Old 2012-08-02, 14:57   #1835
Bdot
 
Nov 2010
Germany

255₁₆ Posts

Quote:
Originally Posted by Prime95 View Post
I haven't tried it, but this should also work for the barrett 96-bit kernel for factors up to 90 bits. That is, a 90-bit factor will generate a 90-bit remainder + 3 bits because we're pretty sloppy calculating the remainder. When we square the 93-bit result we get a 186-bit value. We then apply 1/f to get a 96-bit quotient - which just fits in our 3 registers.
I tried (with my 75-bit kernel*, and a 68-bit factor), and the 1/f step left a remainder that increased with each loop until I had an overflow. I'll check the details later.

* 5 words with 15 bits each, to avoid the expensive 32-bit multiplications and use mul24 instead.
Old 2012-08-02, 15:56   #1836
TheJudger
 
"Oliver"
Mar 2005
Germany

10001010111₂ Posts

Quote:
Originally Posted by TheJudger View Post
Some data from tf_barrett96.cu: mod_simple_96():
Code:
 qi = 0
 q =   00000007 3C3F1F20 C454D397
 nn =  00000000 00000000 00000000
 res = 00000007 3C3F1F1F C454D397
res = q - nn;

So for now it looks like CUDA 5.0.7 fails when a subtract with carry is used and the subtrahend is 0, i.e. it looks like a bug in CUDA 5.0.7.

Oliver
Quote:
Originally Posted by TheJudger View Post
Nvidia confirmed the bug so I would say: not my fault/problem!

Oliver
btw.: Nvidia told me they have fixed the bug with a driver update. Unfortunately this driver is not yet available to me.
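
For reference, this is what a three-word subtraction with borrow should produce for the data quoted above. It is a plain Python sketch, not the mod_simple_96() code from tf_barrett96.cu; sub96 is a hypothetical helper and the words are ordered most significant first, as in the dump.

```python
MASK32 = 0xFFFFFFFF


def sub96(q, nn):
    """Subtract nn from q, both given as three 32-bit words, most significant first."""
    result = []
    borrow = 0
    # Walk from the least significant word upwards, carrying the borrow along.
    for q_word, nn_word in zip(reversed(q), reversed(nn)):
        diff = q_word - nn_word - borrow
        borrow = 1 if diff < 0 else 0
        result.append(diff & MASK32)
    return tuple(reversed(result))


q = (0x00000007, 0x3C3F1F20, 0xC454D397)
nn = (0x00000000, 0x00000000, 0x00000000)
print(" ".join(f"{word:08X}" for word in sub96(q, nn)))
```

With nn = 0 the result must equal q. The res in the quoted dump is one lower in the middle word (3C3F1F1F instead of 3C3F1F20), which would be consistent with a borrow being applied although the low word did not underflow.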
Old 2012-08-02, 20:06   #1837
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1D71₁₆ Posts

Quote:
Originally Posted by Prime95 View Post
I haven't tried it, but this should also work for the barrett 96-bit kernel for factors up to 90 bits.
I forgot about all the nasty bit-shifting that kernel performs. It may not be possible to retrieve a 96-bit quotient -- needs further research.

Last fiddled with by Prime95 on 2012-08-02 at 20:06

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.