mersenneforum.org

mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

kracker 2012-08-01 02:16

[QUOTE=Xyzzy;306576]More "extra credit":

[CODE]Processing result: M56505451 has a factor: 86553876518403762963169
CPU credit is 323.9309 GHz-days.
Processing result: M56488651 has a factor: 35566445275259107720993
CPU credit is 129.5622 GHz-days.
Processing result: M56491177 has a factor: 23502006329787341695151
CPU credit is 89.0731 GHz-days.[/CODE][/QUOTE]
:omg:

LaurV 2012-08-01 03:09

[QUOTE=Xyzzy;306576]More "extra credit":[/QUOTE]
Yeah, you looked like you needed some credit, so that's why :razz:
------------------------------------------------------------
@prime95, related to barrett77:
"Enter George Woltman, an excellent programmer and organizer..."
(from the Encyclopedia Galactica:smile:, the [URL="http://primes.utm.edu/mersenne/"]History of Mersenne Primes[/URL] section)
(we need a smiley that takes off its hat!)
(edit: OK, this will do as a substitute: :bow:)

When can we get mfaktc binaries for Win64? (If possible, for both the "classic" version and the one for TF of small exponents. A 20% improvement there would look great; in fact, we would be even happier with a "barrett67" and a 50% improvement :razz:)

NormanRKN 2012-08-01 10:17

wow, that is a performance boost :w00t:

Bdot 2012-08-01 22:24

[QUOTE=Prime95;306572]Oliver,

I propose creating a barrett77_mul32. This is the same as barrett79_mul32 but with the mod_simple_96 moved out of the loop. As long as f does not exceed 77 bits, a will not exceed 80 bits (above 80 bits, square_96_160 will fail).

I tested this out and it passes the self tests up through 77 bits. Raw speed went from 205M/sec to 250M/sec.

Crude source is attached.[/QUOTE]

Very nice! In mfakto, this new 77-bit kernel is even 5% faster than my 70-bit kernel and 10% faster than my 73-bit kernel, making it the fastest again for VLIW5. The newer architectures benefit less from this kernel.

Too bad this trick only works for the 79-bit kernel with its fixed 2[SUP]81[/SUP]/f inverse. The other Barrett kernels, with their 2[SUP]bit_max+1[/SUP]/f inverse, cannot deal with the larger square in my kernels (the inverse does not seem to have enough significant digits).
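The following is a generic Python sketch that illustrates the "mod_simple_96 moved out of the loop" idea. It is not the mfaktc or mfakto kernel. The function name barrett_pow2_mod, the inverse floor(2^k/f) with k = 2*bits(f) + 4, and the invariant a < 4f are assumptions of this sketch. Python's big integers also stand in for the 96/160-bit word arithmetic.

```python
def barrett_pow2_mod(p: int, f: int) -> int:
    """Return 2**p mod f (f > 1) via left-to-right binary exponentiation.

    Hypothetical sketch: each square is reduced with one Barrett step that
    leaves the remainder only partially reduced (0 <= a < 2f, or < 4f after
    a doubling). The exact correction is done once, after the loop.
    """
    n = f.bit_length()
    k = 2 * n + 4                 # a < 4f  =>  a*a < 16*f*f < 2**k
    u = (1 << k) // f             # precomputed inverse floor(2**k / f)

    a = 1
    for bit in bin(p)[2:]:        # exponent bits, most significant first
        x = a * a                 # x < 2**k by the loop invariant
        q = (x * u) >> k          # quotient estimate: exact or one too small
        a = x - q * f             # partially reduced: 0 <= a < 2f
        if bit == "1":
            a <<= 1               # multiply by 2: now a < 4f
    while a >= f:                 # final exact reduction, out of the loop
        a -= f
    return a
```

A candidate f divides M_p = 2^p - 1 exactly when the result is 1. For example, barrett_pow2_mod(11, 23) returns 1 because 2047 = 23 * 89. In this sketch the partially reduced a is never more than two bits wider than f. That is the same kind of headroom argument Prime95 makes for barrett77, although the real kernel's bounds depend on its own fixed-point inverse.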

Prime95 2012-08-02 01:18

[QUOTE=Bdot;306663]Very nice! In mfakto, this new 77-bit kernel is even 5% faster than the 70-bit, and 10% faster than the 73-bit kernels [/QUOTE]

Glad it helped and is passing your tests too.

You can also create a 78-bit kernel that only adjusts the result when there is a multiplication by 2.
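Below is one possible reading of the 78-bit suggestion, written as a hypothetical Python sketch. It is not Prime95's code; the function name is made up, and Python integers replace GPU words. The idea is to keep the remainder below 2f and to make the only in-loop correction a conditional subtraction of 2f right after a doubling. The function also reports the widest remainder carried into the next squaring, which in this sketch never exceeds bits(f) + 1.

```python
def pow2_mod_adjust_on_double(p: int, f: int) -> tuple[int, int]:
    """Return (2**p mod f, widest carried remainder in bits), for f > 1.

    Hypothetical variant: the remainder stays below 2*f, and it is only
    adjusted (by subtracting 2*f, which is 0 mod f) after a doubling.
    """
    n = f.bit_length()
    k = 2 * n + 2                     # a < 2f  =>  a*a < 4*f*f < 2**k
    u = (1 << k) // f                 # precomputed inverse floor(2**k / f)
    a, widest = 1, 1
    for bit in bin(p)[2:]:            # most significant bit first
        x = a * a
        a = x - ((x * u) >> k) * f    # Barrett step: 0 <= a < 2f
        if bit == "1":
            a <<= 1                   # now a < 4f
            if a >= 2 * f:
                a -= 2 * f            # the only adjustment: back below 2f
        widest = max(widest, a.bit_length())
    while a >= f:                     # final exact reduction
        a -= f
    return a, widest
```

With a 78-bit f, the remainder entering each squaring stays at or below 79 bits in this sketch.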

TheJudger 2012-08-02 09:42

Hi George,

[QUOTE=Prime95;306572]Oliver,

I propose creating a barrett77_mul32. This is the same as barrett79_mul32 but with the mod_simple_96 moved out of the loop. As long as f does not exceed 77 bits, a will not exceed 80 bits (above 80 bits, square_96_160 will fail).

I tested this out and it passes the self tests up through 77 bits. Raw speed went from 205M/sec to 250M/sec.

Crude source is attached.[/QUOTE]

Cool, I'll test this (again!). Some time ago I tried something similar, but it failed somehow. Did you run the tests with CHECKS_MODBASECASE (src/params.h) enabled?

@others: please be careful. Some other changes and more testing are needed before this is safe for daily use. With this modification alone, mfaktc [B]will[/B] choose this kernel for TF up to 2[SUP]79[/SUP], and it [B]will[/B] fail there.
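Oliver's point is that kernel selection has to respect each kernel's real limit. Otherwise barrett77 would be picked for bit levels it cannot handle. Here is a minimal Python sketch of such a guard. The kernel names come from this thread, but the table, its ordering (assumed fastest first) and choose_kernel are hypothetical, not mfaktc's actual selection code.

```python
# Hypothetical kernel table: (name, largest supported bit level).
# Assumed to be ordered from fastest to slowest.
KERNELS = [
    ("barrett77_mul32", 77),
    ("barrett79_mul32", 79),
]

def choose_kernel(bit_max: int) -> str:
    """Pick the first listed kernel whose limit covers TF up to 2**bit_max.

    If barrett77 were registered with a 79-bit limit, it would be chosen
    for bit levels 78 and 79 and produce wrong results there.
    """
    for name, limit in KERNELS:
        if bit_max <= limit:
            return name
    raise ValueError(f"no kernel supports factors up to 2**{bit_max}")
```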

I guess I'll reschedule my release plan for 0.19 and add this.

Oliver

LaurV 2012-08-02 09:47

[QUOTE=TheJudger;306707]I guess I'll reschedule my release plan for 0.19 and add this.[/QUOTE]
We fully agree with this! Eagerly waiting!

Prime95 2012-08-02 14:33

[QUOTE=Bdot;306663]
Too bad this trick only works for the 79-bit kernel with its fixed 2[SUP]81[/SUP]/f inverse. The other Barrett kernels, with their 2[SUP]bit_max+1[/SUP]/f inverse, cannot deal with the larger square in my kernels (the inverse does not seem to have enough significant digits).[/QUOTE]

I haven't tried it, but this should also work for the barrett 96-bit kernel for factors up to 90 bits. That is, a 90-bit factor will generate a 90-bit remainder plus up to 3 extra bits, because we're pretty sloppy calculating the remainder. When we square the 93-bit result, we get a 186-bit value. We then apply 1/f to get a 96-bit quotient, which just fits in our three registers.
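The bit count above can be checked with a short bound. This reads "a 90-bit remainder plus 3 bits" as a < 2^3 f, which is an interpretation and not stated in the post:

```latex
2^{89} \le f < 2^{90}, \qquad 0 \le a < 2^{3} f
\;\Longrightarrow\; a^{2} < 2^{6} f^{2} < 2^{186},
\qquad
q = \left\lfloor \frac{a^{2}}{f} \right\rfloor < 2^{6} f < 2^{96}.
```

Under that reading, the quotient fits in three 32-bit words. The weaker bound a < 2^93 alone would only give a^2/f < 2^186/2^89 = 2^97, which is one bit too many. The argument therefore relies on the remainder being within a factor of 8 of f, not merely under 93 bits.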

Bdot 2012-08-02 14:57

[QUOTE=Prime95;306723]I haven't tried it, but this should also work for the barrett 96-bit kernel for factors up to 90 bits. That is, a 90-bit factor will generate a 90-bit remainder + 3 bits because we're pretty sloppy calculating the remainder. When we square the 93-bit result we get a 186-bit value. We then apply 1/f to get a 96-bit quotient - which just fits in our 3 registers.[/QUOTE]
I tried it (with my 75-bit kernel[SUP]*[/SUP] and a 68-bit factor), and the 1/f step left a remainder that grew with each loop iteration until it overflowed. I'll check the details later.

[SUP]*[/SUP] 5 words with 15 bits each, to avoid the expensive 32-bit multiplications and use mul24 instead.
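The footnote's limb layout can be sketched in Python: a 75-bit value is held as five 15-bit words, so every limb product is below 2^30 and fits in the 32-bit result of a 24-bit multiply (OpenCL's mul24). The helper names are hypothetical, and this is not mfakto's code. Column sums are deliberately left unpropagated to show where carries are still needed.

```python
LIMB_BITS = 15
NUM_LIMBS = 5                     # 5 * 15 = 75 bits
MASK = (1 << LIMB_BITS) - 1

def to_limbs(x: int) -> list[int]:
    """Split x (< 2**75) into five 15-bit limbs, least significant first."""
    if not 0 <= x < (1 << (LIMB_BITS * NUM_LIMBS)):
        raise ValueError("value does not fit in 75 bits")
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(NUM_LIMBS)]

def from_limbs(limbs: list[int]) -> int:
    """Recombine limbs (or unpropagated column sums) into an integer."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))

def mul_limbs(a: list[int], b: list[int]) -> list[int]:
    """Schoolbook product as 9 column sums, without carry propagation.

    Each single limb product is < 2**30, so it fits in the 32-bit result
    of a 24x24-bit multiply. A column can hold up to five such products,
    which can exceed 32 bits, so a GPU version must split off carries.
    """
    cols = [0] * (2 * NUM_LIMBS - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod = x * y
            assert prod < (1 << 30)   # fits in mul24's 32-bit result
            cols[i + j] += prod
    return cols
```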

TheJudger 2012-08-02 15:56

[QUOTE=TheJudger;304737]Some data from tf_barrett96.cu: mod_simple_96():
[CODE]
qi = 0
q = 00000007 3C3F1F[COLOR="Red"]20[/COLOR] C454D397
nn = 00000000 00000000 00000000
res = 00000007 3C3F1F[COLOR="Red"]1F[/COLOR] C454D397
[/CODE]
res = q - nn;

So for now it looks like CUDA 5.0.7 fails when sub with carry (borrow) is used and the subtrahend is 0: the middle word drops from 20 to 1F even though nothing was subtracted. So it looks like a bug in CUDA 5.0.7.

Oliver[/QUOTE]

[QUOTE=TheJudger;305015]Nvidia confirmed the bug so I would say: not my fault/problem! :smile:

Oliver[/QUOTE]

BTW: Nvidia told me they have fixed the bug in a driver update. Unfortunately, this driver is not yet available to me.
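A plain reference subtraction shows what res should have been. Subtracting 0 must leave q unchanged, whereas the dump shows a spurious borrow into the middle word. This is a hypothetical Python helper, sub_96, which assumes the dump prints the most significant 32-bit word first. The borrow ripples upward the way a PTX sub.cc / subc.cc / subc chain is meant to.

```python
MASK32 = 0xFFFFFFFF

def sub_96(q: list[int], nn: list[int]) -> list[int]:
    """Reference res = q - nn (mod 2**96) on three 32-bit words.

    Words are most significant first, as assumed for the dump above.
    The borrow propagates from the least significant word upward.
    """
    res = [0, 0, 0]
    borrow = 0
    for i in (2, 1, 0):               # least significant word first
        diff = q[i] - nn[i] - borrow
        borrow = 1 if diff < 0 else 0
        res[i] = diff & MASK32
    return res
```

For q = 00000007 3C3F1F20 C454D397 and nn = 0, this returns q itself. It does not return the ...1F... value the buggy build produced.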

Prime95 2012-08-02 20:06

[QUOTE=Prime95;306723]I haven't tried it, but this should also work for the barrett 96-bit kernel for factors up to 90 bits. [/QUOTE]

I forgot about all the nasty bit-shifting that kernel performs. It may not be possible to retrieve a 96-bit quotient; this needs further research.


All times are UTC.