#1827
"Mr. Meeseeks"
Jan 2012
California, USA
2^3·271 Posts
Quote:
#1828
Romulan Interpreter
Jun 2011
Thailand
7^2·197 Posts
Yeah, you looked like you needed some credit, so that's why.
------------------------------------------------------------
@prime95, related to barrett77: "Enter George Woltman, an excellent programmer and organizer..." (from the Encyclopedia Galactica, the History of Mersenne Primes section). (We need a smiley that takes off its hat!) (Edit: OK, this will substitute.)
When can we get mfaktc binaries for win64? (Eventually for both the "classic" version and the one for TF of small exponents. A 20% improvement here will look great; in fact, we would be happier with a "barrett67" and a 50% improvement.)
Last fiddled with by LaurV on 2012-08-01 at 03:18
#1829
Jul 2012
Saarland / Germany
44₁₆ Posts
Wow, that is a performance boost.
#1830
Nov 2010
Germany
3×199 Posts |
Quote:
Too bad this trick only works for the 79-bit kernel with its fixed 2^81/f inverse. The other Barrett kernels, with the 2^(bit_max+1)/f inverse, cannot deal with the larger square in my kernels (the inverse does not seem to have enough significant digits).
#1831
P90 years forever!
Aug 2002
Yeehaw, FL
1110101110001₂ Posts
#1832
"Oliver"
Mar 2005
Germany
1111₁₀ Posts
Hi George,
Quote:
@others: please be careful. Some other changes and more testing are needed before this is safe for daily use. With this modification alone, it will choose this kernel for TF up to 2^79, and it will fail there. I guess I'll reschedule my release plan for 0.19 and add this.
Oliver
#1833
Romulan Interpreter
Jun 2011
Thailand
25B5₁₆ Posts
#1834
P90 years forever!
Aug 2002
Yeehaw, FL
7,537 Posts |
I haven't tried it, but this should also work for the barrett 96-bit kernel for factors up to 90 bits. That is, a 90-bit factor will generate a 90-bit remainder + 3 bits because we're pretty sloppy calculating the remainder. When we square the 93-bit result we get a 186-bit value. We then apply 1/f to get a 96-bit quotient - which just fits in our 3 registers.
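The bit counting in this argument can be checked with plain integer arithmetic. The sketch below uses Python big integers rather than kernel code. It assumes that "sloppy" means the remainder stays below 8·f (3 extra bits), and the function name `worst_case_bits` is hypothetical. It only confirms the bit budget, not that the kernel's shifting can actually deliver such a quotient.

```python
def worst_case_bits(f: int, slack_bits: int = 3):
    """Return bit lengths of (remainder, square, quotient) in the worst case.

    Assumption (not from the kernel source): the sloppy remainder r
    satisfies r < 2**slack_bits * f.
    """
    r = (f << slack_bits) - 1   # largest remainder allowed by the assumption
    square = r * r              # value that has to be reduced again
    quotient = square // f      # exact floor(r^2 / f), standing in for "apply 1/f"
    return r.bit_length(), square.bit_length(), quotient.bit_length()


if __name__ == "__main__":
    # Largest 90-bit value, used only for its size (not a real factor).
    print(worst_case_bits((1 << 90) - 1))
    # Largest 91-bit value, to see where the 96-bit budget is exceeded.
    print(worst_case_bits((1 << 91) - 1))
```

Under that assumption, the largest 90-bit value gives a 93-bit remainder, a 186-bit square and a 96-bit quotient, which is exactly three 32-bit registers. The largest 91-bit value needs 97 bits for the quotient.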
#1835
Nov 2010
Germany
255₁₆ Posts
Quote:
* 5 words with 15 bits each, to avoid the expensive 32-bit multiplications and use mul24 instead. |
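A minimal sketch of the 5 × 15-bit layout, written in Python as a stand-in for kernel arithmetic. The helper names (`to_limbs`, `from_limbs`, `mul_limbs`) are hypothetical and not taken from mfakto. The point it illustrates is that every partial product of two 15-bit words is at most 30 bits wide, so a 24-bit multiply such as `mul24` can produce it without loss.

```python
LIMB_BITS = 15
LIMBS = 5
MASK = (1 << LIMB_BITS) - 1


def to_limbs(x: int) -> list:
    """Split x (< 2**75) into 5 little-endian 15-bit words."""
    assert 0 <= x < 1 << (LIMB_BITS * LIMBS)
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(LIMBS)]


def from_limbs(limbs: list) -> int:
    """Reassemble an integer from little-endian 15-bit words."""
    return sum(w << (LIMB_BITS * i) for i, w in enumerate(limbs))


def mul_limbs(a: list, b: list) -> list:
    """Schoolbook product of two 5-word numbers as 10 little-endian 15-bit words."""
    cols = [0] * (2 * LIMBS)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            p = ai * bj          # 15 x 15 bits: at most 30 bits, within a 24-bit multiply
            assert p < 1 << 30
            cols[i + j] += p     # Python ints do not overflow; a kernel must watch this sum
    out, carry = [], 0
    for c in cols:               # carry propagation back to 15-bit words
        c += carry
        out.append(c & MASK)
        carry = c >> LIMB_BITS
    assert carry == 0            # the product of two 75-bit numbers fits in 150 bits
    return out


if __name__ == "__main__":
    x, y = (1 << 75) - 1, 0x1234567890ABCDEF
    assert from_limbs(mul_limbs(to_limbs(x), to_limbs(y))) == x * y
```

The column sums are accumulated in unbounded Python integers here. In a kernel with 32-bit registers, the carries would have to be handled before a column sum overflows.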
#1836
"Oliver"
Mar 2005
Germany
10001010111₂ Posts
Quote:
#1837
P90 years forever!
Aug 2002
Yeehaw, FL
1D71₁₆ Posts
I forgot about all the nasty bit-shifting that kernel performs. It may not be possible to retrieve a 96-bit quotient -- needs further research.
Last fiddled with by Prime95 on 2012-08-02 at 20:06 |