#1827 | |
|
"Mr. Meeseeks"
Jan 2012
California, USA
23·271 Posts |
Quote:
|
|
|
|
|
|
|
#1828 |
|
Romulan Interpreter
Jun 2011
Thailand
3×3,221 Posts |
Yeah, you looked like you need some credit, so that's why
------------------------------------------------------------
@prime95, related to barrett77: "Enter George Woltman, an excellent programmer and organizer..." (from the Encyclopedia Galactica, the History of Mersenne Primes section) (we need a smiley which takes off his hat!) (edit: ok, this will substitute: )
When can we get mfaktc binaries for win64? (Eventually for both the "classic" version and the one for TF of small exponents; here a 20% improvement will look great. In fact, we would be happier with a "barrett67" and a 50% improvement.)
Last fiddled with by LaurV on 2012-08-01 at 03:18 |
|
|
|
|
|
#1829 |
|
Jul 2012
Saarland / Germany
4416 Posts |
Wow, that is a performance boost.
|
|
|
|
|
|
#1830 | |
|
Nov 2010
Germany
1001010101₂ Posts |
Quote:
Too bad this trick only works for the 79-bit kernel with its fixed 2^81/f inverse. The other barretts, with the 2^(bit_max+1)/f inverse, cannot deal with the larger square in my kernels (the inverse does not seem to have enough significant digits). |
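The remark about the inverse running out of significant digits can be illustrated with a generic Barrett quotient estimate. This is only a sketch in Python with arbitrary-precision integers, not the mfaktc or mfakto kernel code; the function names and the scaling exponent `k` are hypothetical. The estimate is off by at most 1 while the value being reduced stays below 2^k, and the error can grow once the square is wider than that.

```python
def barrett_quotient(x: int, f: int, k: int) -> int:
    """Estimate x // f using only the precomputed inverse floor(2**k / f)."""
    inv = (1 << k) // f        # precomputed once per factor candidate
    return (x * inv) >> k      # multiply and shift, no division by f


def quotient_error(x: int, f: int, k: int) -> int:
    """How far the estimate falls below the true quotient (never negative)."""
    return x // f - barrett_quotient(x, f, k)


# Tiny example (assumed values): f = 7, inverse scaled by 2**4.
print(quotient_error(15, 7, 4))   # x < 2**4: the estimate is off by 1
print(quotient_error(63, 7, 4))   # x is 2 bits wider: the estimate is off by 2
```

A real kernel works on fixed-width words and has to absorb this error with extra correction steps, so a wider square either needs a more precise inverse or more corrections.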
|
|
|
|
|
|
#1831 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
2·3²·419 Posts |
|
|
|
|
|
|
#1832 | |
|
"Oliver"
Mar 2005
Germany
11×101 Posts |
Hi George,
Quote:
@others: please be careful, there are some other changes and testing needed before this is safe for daily usage. With this modification alone it will choose this kernel for TF up to 2^79 and it will fail there. I guess I'll reschedule my release plan for 0.19 and add this. Oliver |
|
|
|
|
|
|
#1833 |
|
Romulan Interpreter
Jun 2011
Thailand
9663₁₀ Posts |
|
|
|
|
|
|
#1834 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
2×3²×419 Posts |
I haven't tried it, but this should also work for the barrett 96-bit kernel for factors up to 90 bits. That is, a 90-bit factor will generate a 90-bit remainder + 3 bits because we're pretty sloppy calculating the remainder. When we square the 93-bit result we get a 186-bit value. We then apply 1/f to get a 96-bit quotient - which just fits in our 3 registers.
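The width accounting above can be checked with a few lines of Python. This is a sketch under one assumption that is not spelled out in the post: "+ 3 bits" is read here as the sloppy remainder r staying below 8·f. The helper names are hypothetical, and the sketch ignores how the kernel actually shifts and stores its words.

```python
def quotient_bits(factor_bits: int, slack_bits: int = 3) -> int:
    """Worst-case bit length of (r * r) // f when r < 2**slack_bits * f."""
    f = (1 << factor_bits) - 1      # largest factor of that size
    r = (f << slack_bits) - 1       # largest sloppy remainder below 2**slack_bits * f
    return ((r * r) // f).bit_length()


def fits_in_words(bits: int, words: int = 3, word_bits: int = 32) -> bool:
    """True if a value of that bit length fits in `words` registers."""
    return bits <= words * word_bits


for bits in (90, 91):
    q = quotient_bits(bits)
    print(bits, q, fits_in_words(q))
```

Under that assumption a 90-bit factor gives a quotient of at most 96 bits, which fits in three 32-bit registers, while a 91-bit factor already needs 97 bits.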
|
|
|
|
|
|
#1835 | |
|
Nov 2010
Germany
3×199 Posts |
Quote:
* 5 words with 15 bits each, to avoid the expensive 32-bit multiplications and use mul24 instead. |
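To show why 15-bit words suit a 24-bit multiply, here is a rough Python sketch of a schoolbook multiplication on 5 limbs of 15 bits each. It is not the kernel code; `to_limbs`, `from_limbs` and `mul_limbs` are hypothetical names. The point is that every partial product of two 15-bit limbs fits in 30 bits, so a mul24-style instruction that returns the low 32 bits of the product loses nothing.

```python
LIMB_BITS = 15
LIMBS = 5
MASK = (1 << LIMB_BITS) - 1


def to_limbs(x: int) -> list[int]:
    """Split a value below 2**75 into 5 limbs of 15 bits, least significant first."""
    if x < 0 or x >> (LIMB_BITS * LIMBS):
        raise ValueError("value does not fit in 5 x 15 bits")
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(LIMBS)]


def from_limbs(limbs: list[int]) -> int:
    """Reassemble an integer from 15-bit limbs, least significant first."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def mul_limbs(a: list[int], b: list[int]) -> list[int]:
    """Schoolbook product of two 5-limb numbers, returned as 10 limbs."""
    cols = [0] * (2 * LIMBS)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            p = ai * bj
            assert p < (1 << 30)    # 15 x 15 bits never exceeds 30 bits
            cols[i + j] += p
    out, carry = [], 0
    for c in cols:                  # propagate carries between columns
        c += carry
        out.append(c & MASK)
        carry = c >> LIMB_BITS
    assert carry == 0               # the product is below 2**150
    return out


a, b = (1 << 75) - 1, 0x123456789ABCDEF0123
print(from_limbs(mul_limbs(to_limbs(a), to_limbs(b))) == a * b)
```

Note that a column can accumulate up to five 30-bit products, which is more than 32 bits, so a fixed-width implementation has to carry between columns rather than sum a whole column in one 32-bit register.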
|
|
|
|
|
|
#1836 | |
|
"Oliver"
Mar 2005
Germany
457₁₆ Posts |
Quote:
|
|
|
|
|
|
|
#1837 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
2×3²×419 Posts |
I forgot about all the nasty bit-shifting that kernel performs. It may not be possible to retrieve a 96-bit quotient -- needs further research.
Last fiddled with by Prime95 on 2012-08-02 at 20:06 |
|
|
|