mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Hardware (https://www.mersenneforum.org/forumdisplay.php?f=9)
-   -   /. Video processor compiler (https://www.mersenneforum.org/showthread.php?t=1762)

tha 2003-12-21 18:35

/. Video processor compiler
 
Slashdot is running an article on a compiler for processors of video cards

[URL=http://developers.slashdot.org/developers/03/12/21/169200.shtml?tid=152&tid=185]Slashdot article[/URL]

PrimeCruncher 2003-12-21 19:55

Wow. Only works on a few vid cards apparently (GeForceFX and above and ATI 97/9800). Also, briefly reading through the documentation, there appears to be a stumbling block:

From [URL=http://graphics.stanford.edu/projects/brookgpu/lang.html]this page[/URL]:

[QUOTE]Examples of legal reductions are sum, product, min/max, OR, AND, and XOR bit operations. Examples of invalid reductions include subtraction and dividision.[/QUOTE]

I might be wrong about this since I don't have much experience in C. Perhaps a more experienced programmer could comment on this?

GP2 2003-12-21 19:55

Unfortunately, all my farm boxes use VGA cards that I bought for literally $1 each on eBay.

:razz:

GP2 2003-12-21 20:06

Once upon a time, Nick Craig-Wood wrote an integer Lucas-Lehmer tester for the ARM chip.

[url]http://www.craig-wood.com/nick/armprime/[/url]

The ARM chip had no divide instruction, but he found an ingenious way to calculate x mod p without division.

Perhaps this code could be revived for use on graphics cards. Unfortunately it doesn't seem like he's worked on it since 1999.

dsouza123 2003-12-23 17:39

If division isn't available then subtraction of multiples x from p until the amount left is less then x.

If divistion and subtraction aren't available then add multiples of x while less than p.

Shifts can be used for multiplying and dividing by 2.

I think integers in 2's complement will add negatives.

Maybeso 2003-12-24 06:25

For the LL test everything is done mod Mp, so there are other tricks to replace division and subtraction.

-- Tho I'm not sure adding (Mp - 2) instead of subtracting 2 is reasonable. :wink:

ColdFury 2003-12-24 06:52

[URL=http://www.cs.unm.edu/~kmorel/documents/fftgpu/]http://www.cs.unm.edu/~kmorel/documents/fftgpu/[/URL]

FFT's can be done on a GPU. However, as usual, the problem is precision. Anyone know any useful tricks to use two singles as a double?

Dresdenboy 2003-12-24 22:20

There is an older thread about using GPUs for calculations:

[url]http://mersenneforum.org/showthread.php?s=&threadid=432[/url]

Even something simple like game of life is hard to port to a GPU by using an efficient algorithm (like calculating one life cell per bit)

ewmayer 2003-12-26 16:49

[QUOTE][i]Originally posted by GP2 [/i]
[B]Once upon a time, Nick Craig-Wood wrote an integer Lucas-Lehmer tester for the ARM chip.

[url]http://www.craig-wood.com/nick/armprime/[/url]

The ARM chip had no divide instruction, but he found an ingenious way to calculate x mod p without division.[/B][/QUOTE]

The LL test itself requires no divide, since the modulus has a convenient binary form, and because (since 1994) the Crandall/Fagin DWT algorithm allows us to further effect the mod automatically while doing the trasform-based multiply.

In Nick's case, any mods that would have been needed would have been with respect to the prime he used for his all-integer transform, p = 2^64 - 2^32 + 1. That prime has a nice binary form that allows efficient modding without division.

For general primes of no special form, we can use the well-known Montgomery division-less mod, which effects the mod via a clever precomputation of the inverse of p modulo a power of two, typically one chosen to coincide with the natural wordsize boundary of the hardware. This inverse can be efficiently computed using a Newton-style iteration of the same kind used to quickly find floating-point inverses without dividing, and using the fact that the standard hardware integer multiply of two (say) 64-bit ints is really mod-2^64 multiply. Once one has p^(-1) mod 2^n, a few more muls (including several which require a full double-wide integer product) suffice to return x*y*2^(-n) mod p, which can easily be converted to the desired x*y mod p.

My sieve-based factoring code uses the Montgomery mod. I believe George's uses a completely different (but similarly fast, especially on hardware which doesn't support double-wide integer muls) way to get x*y mod p.

optim 2003-12-27 10:16

BrookGPU
 
General Programming of GPUs with C:


[url]http://developers.slashdot.org/arti...052&tid=185[/url]



[url]http://www.xbitlabs.com/news/video/...1222075229.html[/url]



[url]http://graphics.stanford.edu/projects/brookgpu/[/url]



[url]http://sourceforge.net/projects/brook[/url]


maybe we could use this to speed up Prime95/MPrime ??????

PrimeCruncher 2003-12-27 16:43

Re: BrookGPU
 
[QUOTE][i]Originally posted by optim [/i]
[B]maybe we could use this to speed up Prime95/MPrime ?????? [/B][/QUOTE]

Maybe if we can't use the GPU for [I]every[/I] instruction, maybe we can use it for [I]some[/I] and speed things up. George?


All times are UTC. The time now is 08:01.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.