/. Video processor compiler

Wow. Apparently it only works on a few video cards (GeForce FX and above, and ATI 9700/9800). Also, from briefly reading through the documentation, there appears to be a stumbling block:
Unfortunately, all my farm boxes use VGA cards that I bought for literally $1 each on eBay.

Once upon a time, Nick Craig-Wood wrote an integer Lucas-Lehmer tester for the ARM chip.
http://www.craigwood.com/nick/armprime/ The ARM chip had no divide instruction, but he found an ingenious way to calculate x mod p without division. Perhaps this code could be revived for use on graphics cards. Unfortunately, it doesn't seem like he's worked on it since 1999.
If division isn't available, then to get x mod p, subtract multiples of p from x until the amount left is less than p.
If division and subtraction aren't available, then add up multiples of p while the total stays no greater than x. Shifts can be used for multiplying and dividing by 2. I think adding the 2's complement negative of an integer will work as a subtraction.
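A minimal sketch of the shift-and-subtract idea, where x is the number being reduced and p is the modulus. The function names are made up for illustration, and this is not Nick's ARM method, just the textbook approach of subtracting shifted copies of p. The second helper shows subtraction done as an add of the two's complement negative, assuming a 32-bit word.

```python
def mod_no_divide(x, p):
    """Return x mod p for x >= 0 and p > 0 using only shifts, compares and subtracts."""
    if p <= 0 or x < 0:
        raise ValueError("expects x >= 0 and p > 0")
    # Find the largest shifted copy of p (that is, p * 2^k) not exceeding x.
    shifted = p
    while (shifted << 1) <= x:
        shifted <<= 1
    # Walk back down, subtracting each shifted copy whenever it still fits.
    while shifted >= p:
        if x >= shifted:
            x -= shifted
        shifted >>= 1
    return x


def sub_via_add(a, b, bits=32):
    """Compute (a - b) mod 2^bits by adding the two's complement negative of b."""
    mask = (1 << bits) - 1
    neg_b = (~b + 1) & mask      # invert the bits and add one
    return (a + neg_b) & mask    # the carry out of the top bit is discarded


print(mod_no_divide(100, 7))     # same value as 100 % 7
print(sub_via_add(10, 3))        # same value as 10 - 3
```

The number of loop passes grows with the bit length of x rather than with x/p, which is the point of using shifts instead of subtracting p one copy at a time.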
For the LL test everything is done mod Mp, so there are other tricks to replace division and subtraction.
Tho I'm not sure adding (Mp - 2) instead of subtracting 2 is reasonable.
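As a rough illustration of those tricks: since 2^p is congruent to 1 mod Mp = 2^p - 1, the high bits of a number can simply be folded onto the low bits with a shift, a mask and an add. And because Mp - 2 is congruent to -2 mod Mp, adding it gives the same residue as subtracting 2. The function names below are hypothetical, and a real tester would do the squaring with an FFT rather than Python big integers.

```python
def mod_mersenne(x, p):
    """Reduce x >= 0 modulo Mp = 2^p - 1 using only shifts, masks and adds."""
    m = (1 << p) - 1
    while x > m:
        # 2^p == 1 (mod Mp), so the high part can be added onto the low part.
        x = (x & m) + (x >> p)
    return 0 if x == m else x


def lucas_lehmer(p):
    """Return True if 2^p - 1 is prime, for an odd prime exponent p."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        # Add (Mp - 2) instead of subtracting 2; both give the same residue mod Mp.
        s = mod_mersenne(s * s + (m - 2), p)
    return s == 0


print([p for p in (3, 5, 7, 11, 13, 17, 19, 23) if lucas_lehmer(p)])
```

The constant Mp - 2 is computed once up front, so the inner loop itself needs no subtraction and no division.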
http://www.cs.unm.edu/~kmorel/documents/fftgpu/
FFTs can be done on a GPU. However, as usual, the problem is precision. Anyone know any useful tricks to use two singles as a double?
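One known trick is "double-single" arithmetic: keep each value as an unevaluated sum hi + lo of two singles and use error-free transformations (Knuth's two-sum) to carry the rounding error along. Below is a sketch of the addition only. Python floats are doubles, so the helper f32 (a made-up name) simulates single-precision rounding through the struct module. The sketch assumes the hardware rounds each single-precision add correctly, which a given GPU may not guarantee.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_sum(a, b):
    """Knuth's two-sum: return (s, e) with s = fl(a + b) and s + e equal to a + b."""
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e


def ds_from_float(v):
    """Split a double into a (hi, lo) pair of singles."""
    hi = f32(v)
    return hi, f32(v - hi)


def ds_add(x, y):
    """Add two (hi, lo) pairs, returning a renormalized (hi, lo) pair."""
    s, e = two_sum(x[0], y[0])          # exact sum of the high parts
    e = f32(e + f32(x[1] + y[1]))       # fold in the low parts
    hi = f32(s + e)
    lo = f32(e - f32(hi - s))           # quick two-sum, valid because |s| >= |e|
    return hi, lo


hi, lo = ds_add(ds_from_float(1.0), ds_from_float(1e-10))
print(f32(1.0 + 1e-10))   # a plain single loses the small term entirely
print(hi + lo)            # the pair keeps it
```

A pair of singles carries roughly 48 significand bits, so this is still short of a true double's 53, and multiplication needs a similar but more expensive splitting step that is not shown here.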
There is an older thread about using GPUs for calculations:
http://mersenneforum.org/showthread.php?s=&threadid=432 Even something as simple as the Game of Life is hard to port to a GPU with an efficient algorithm (like calculating one life cell per bit).
In Nick's case, any mods that would have been needed would have been with respect to the prime he used for his all-integer transform, p = 2^64 - 2^32 + 1. That prime has a nice binary form that allows efficient modding without division. For general primes of no special form, we can use the well-known Montgomery divisionless mod, which effects the mod via a clever precomputation of the inverse of p modulo a power of two, typically one chosen to coincide with the natural wordsize boundary of the hardware. This inverse can be efficiently computed using a Newton-style iteration of the same kind used to quickly find floating-point inverses without dividing, and using the fact that the standard hardware integer multiply of two (say) 64-bit ints is really a mod-2^64 multiply. Once one has p^(-1) mod 2^n, a few more muls (including several which require a full double-wide integer product) suffice to return x*y*2^(-n) mod p, which can easily be converted to the desired x*y mod p. My sieve-based factoring code uses the Montgomery mod. I believe George's uses a completely different (but similarly fast, especially on hardware which doesn't support double-wide integer muls) way to get x*y mod p.
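A sketch of the Montgomery steps described above, assuming n = 64 and an odd modulus p < 2^64. The names are invented for illustration and this is not taken from anyone's actual factoring code. Python integers are unbounded, so the `& MASK` operations stand in for the hardware's mod-2^64 multiply.

```python
N_BITS = 64
R = 1 << N_BITS
MASK = R - 1


def inv_mod_2n(p):
    """Inverse of odd p modulo 2^64 via Newton iteration (no division)."""
    if p & 1 == 0:
        raise ValueError("p must be odd")
    inv = p                      # p*p == 1 (mod 8), so this is correct to 3 bits
    for _ in range(5):           # correct bits: 3 -> 6 -> 12 -> 24 -> 48 -> 96
        inv = (inv * (2 - p * inv)) & MASK
    return inv


def mont_mul(a, b, p, p_inv):
    """Return a*b*2^(-64) mod p for 0 <= a, b < p, without dividing by p."""
    t = a * b                            # full double-wide product
    m = ((t & MASK) * p_inv) & MASK      # low-half multiply, i.e. mod 2^64
    u = (t - m * p) >> N_BITS            # exact: t - m*p is a multiple of 2^64
    return u + p if u < 0 else u


def mul_mod(x, y, p):
    """Return x*y mod p by way of two Montgomery multiplies."""
    p_inv = inv_mod_2n(p)
    # One-time setup constant; ordinary division is used here only for brevity.
    r2 = (R * R) % p
    xy_scaled = mont_mul(x, y, p, p_inv)       # x*y*2^(-64) mod p
    return mont_mul(xy_scaled, r2, p, p_inv)   # multiplying by 2^128 undoes the scaling


p = 2**64 - 2**32 + 1
print(mul_mod(123456789, 987654321, p) == (123456789 * 987654321) % p)
```

In a long chain of multiplies one would normally convert the operands into scaled form once and stay there, so that the conversion costs nothing per multiply.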

BrookGPU
General Programming of GPUs with C:
http://developers.slashdot.org/arti...052&tid=185
http://www.xbitlabs.com/news/video/...1222075229.html
http://graphics.stanford.edu/projects/brookgpu/
http://sourceforge.net/projects/brook
Maybe we could use this to speed up Prime95/MPrime?
Re: BrookGPU
