mersenneforum.org /. Video processor compiler
 Register FAQ Search Today's Posts Mark Forums Read

 2003-12-21, 18:35 #1 tha     Dec 2002 22·3·71 Posts /. Video processor compiler Slashdot is running an article on a compiler for processors of video cards Slashdot article
2003-12-21, 19:55   #2
PrimeCruncher

Sep 2003

2BE16 Posts

Wow. Only works on a few vid cards apparently (GeForceFX and above and ATI 97/9800). Also, briefly reading through the documentation, there appears to be a stumbling block:

Quote:
 Examples of legal reductions are sum, product, min/max, OR, AND, and XOR bit operations. Examples of invalid reductions include subtraction and dividision.
I might be wrong about this since I don't have much experience in C. Perhaps a more experienced programmer could comment on this?

 2003-12-21, 19:55 #3 GP2     Sep 2003 13·199 Posts Unfortunately, all my farm boxes use VGA cards that I bought for literally \$1 each on eBay.
 2003-12-21, 20:06 #4 GP2     Sep 2003 13×199 Posts Once upon a time, Nick Craig-Wood wrote an integer Lucas-Lehmer tester for the ARM chip. http://www.craig-wood.com/nick/armprime/ The ARM chip had no divide instruction, but he found an ingenious way to calculate x mod p without division. Perhaps this code could be revived for use on graphics cards. Unfortunately it doesn't seem like he's worked on it since 1999.
 2003-12-23, 17:39 #5 dsouza123     Sep 2002 2·331 Posts If division isn't available then subtraction of multiples x from p until the amount left is less then x. If divistion and subtraction aren't available then add multiples of x while less than p. Shifts can be used for multiplying and dividing by 2. I think integers in 2's complement will add negatives.
 2003-12-24, 06:25 #6 Maybeso     Aug 2002 Portland, OR USA 2×137 Posts For the LL test everything is done mod Mp, so there are other tricks to replace division and subtraction. -- Tho I'm not sure adding (Mp - 2) instead of subtracting 2 is reasonable.
 2003-12-24, 06:52 #7 ColdFury     Aug 2002 1010000002 Posts http://www.cs.unm.edu/~kmorel/documents/fftgpu/ FFT's can be done on a GPU. However, as usual, the problem is precision. Anyone know any useful tricks to use two singles as a double?
 2003-12-24, 22:20 #8 Dresdenboy     Apr 2003 Berlin, Germany 5518 Posts There is an older thread about using GPUs for calculations: http://mersenneforum.org/showthread.php?s=&threadid=432 Even something simple like game of life is hard to port to a GPU by using an efficient algorithm (like calculating one life cell per bit)
2003-12-26, 16:49   #9
ewmayer
2ω=0

Sep 2002
República de California

3×7×13×43 Posts

Quote:
 Originally posted by GP2 Once upon a time, Nick Craig-Wood wrote an integer Lucas-Lehmer tester for the ARM chip. http://www.craig-wood.com/nick/armprime/ The ARM chip had no divide instruction, but he found an ingenious way to calculate x mod p without division.
The LL test itself requires no divide, since the modulus has a convenient binary form, and because (since 1994) the Crandall/Fagin DWT algorithm allows us to further effect the mod automatically while doing the trasform-based multiply.

In Nick's case, any mods that would have been needed would have been with respect to the prime he used for his all-integer transform, p = 2^64 - 2^32 + 1. That prime has a nice binary form that allows efficient modding without division.

For general primes of no special form, we can use the well-known Montgomery division-less mod, which effects the mod via a clever precomputation of the inverse of p modulo a power of two, typically one chosen to coincide with the natural wordsize boundary of the hardware. This inverse can be efficiently computed using a Newton-style iteration of the same kind used to quickly find floating-point inverses without dividing, and using the fact that the standard hardware integer multiply of two (say) 64-bit ints is really mod-2^64 multiply. Once one has p^(-1) mod 2^n, a few more muls (including several which require a full double-wide integer product) suffice to return x*y*2^(-n) mod p, which can easily be converted to the desired x*y mod p.

My sieve-based factoring code uses the Montgomery mod. I believe George's uses a completely different (but similarly fast, especially on hardware which doesn't support double-wide integer muls) way to get x*y mod p.

 2003-12-27, 10:16 #10 optim     Nov 2003 European Union 23·13 Posts BrookGPU General Programming of GPUs with C: http://developers.slashdot.org/arti...052&tid=185 http://www.xbitlabs.com/news/video/...1222075229.html http://graphics.stanford.edu/projects/brookgpu/ http://sourceforge.net/projects/brook maybe we could use this to speed up Prime95/MPrime ??????
2003-12-27, 16:43   #11
PrimeCruncher

Sep 2003

2×33×13 Posts
Re: BrookGPU

Quote:
 Originally posted by optim maybe we could use this to speed up Prime95/MPrime ??????
Maybe if we can't use the GPU for every instruction, maybe we can use it for some and speed things up. George?

 Similar Threads Thread Thread Starter Forum Replies Last Post Explorer09 Software 1 2017-01-23 02:50 Dubslow Programming 2 2016-02-27 06:55 ixfd64 Software 7 2011-02-25 20:05 R.D. Silverman Cunningham Tables 30 2010-10-02 22:12 geoff Programming 3 2007-09-26 03:09

All times are UTC. The time now is 12:54.

Wed Aug 17 12:54:48 UTC 2022 up 41 days, 7:42, 1 user, load averages: 1.30, 1.43, 1.45