Re the B thing: Ah yes, that makes much more sense. Please fix it, Brain.

Re CUDALucas: flash did some programming, but none I couldn't have done myself. (It would have taken me quite a bit more time than him, though.) Unfortunately, until he returns, you're the only one (AFAIK) who is capable of compiling CUDALucas for Windows, while I can theoretically take care of the programming myself. (We might need to go through some extensive PM conversations as you copy/paste warnings/errors etc., but I should be able to do it with little more than the copy/pasting from you.)

Re P-1: Since P-1 is also a bunch of multiplication mod Mp, like the LL test, I think our best bet is to modify CUDALucas. First step would be to learn (and I mean *learn*) about this thingy. That would be something that would require some serious tutoring from the smart people, e.g. msft, ewmayer, Prime95, etc.
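To make the "bunch of multiplication mod Mp" point concrete, here is a minimal sketch in plain Python (big integers, no FFT, nothing to do with how CUDALucas actually lays out its data). The function names, the base 3, and the way the stage 1 exponent is built are my own assumptions for illustration; the only point is that both loops boil down to squarings and multiplications mod 2^p - 1, so the same multiplication code could in principle serve both.

```python
from math import gcd

def lucas_lehmer(p):
    """LL test: True if M_p = 2**p - 1 is prime (p an odd prime)."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m          # one squaring mod M_p per iteration
    return s == 0

def primes_up_to(n):
    """Tiny trial-division prime list, good enough for a sketch."""
    return [q for q in range(2, n + 1)
            if all(q % d for d in range(2, int(q ** 0.5) + 1))]

def pminus1_stage1(p, b1, base=3):
    """P-1 stage 1 on M_p with bound b1 (hypothetical helper).

    Factors of M_p have the form 2*k*p + 1, so 2*p is put into the
    exponent up front; the rest is every prime power <= b1.
    Returns gcd(base**E - 1, M_p): 1 or M_p means no luck.
    """
    m = (1 << p) - 1
    x = pow(base, 2 * p, m)          # pow() is just repeated mul mod M_p
    for q in primes_up_to(b1):
        qk = q
        while qk * q <= b1:          # largest power of q not exceeding b1
            qk *= q
        x = pow(x, qk, m)
    return gcd(x - 1, m)
```

For example, `pminus1_stage1(11, 2)` pulls the factor 23 out of M_11 = 2047 = 23 * 89, because 23 - 1 = 2 * 11 is already covered by the 2*p part of the exponent.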

This might be a good place to start, being the genesis of all modern LL programs, CUDALucas and Prime95 included (though perhaps excluding Mlucas, you'll have to ask ewmayer about that), and being written by Richard Crandall, one of the guys who came up with the IBDWT. (If you look closely, some of the comments and functions in CUDALucas.cu are actually verbatim (or close to it) leftovers from that link.)

PS: I have been considering starting such a tutorial thread, but between fixing YAFU's minrels, BOINCifying a modified Msieve (not to mention Prime95) and restarting university in less than a month, I figured it was too much.

PPS: Wiki says this: "If we perform carrying on the negacyclic convolution, the result is equivalent to the product of the inputs mod B^n + 1." together with "If we perform carrying on the cyclic convolution, the result is equivalent to the product of the inputs mod B^n − 1." But then it says: "In this algorithm, it will be more useful to compute the negacyclic convolution". It seems to me that using the cyclic convolution would make more sense, since our test should be mod 2^p-1? Or is the negacyclic thingy still faster, and we just need to be sure to catch values that are 2^p and 2^p-1 (which should reduce to 1 and 0 mod 2^p-1)?

PPPS: How do you represent a bignum as an array of doubles? How many bits of the num does each double represent? (Do we assume a double has 64 bits of memory? Do you assume IEEE 754 format? Can you use the exponent bits, e.g. via shifts?)
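To check my own reading of the cyclic convolution quote, here is a rough sketch, not how CUDALucas or Prime95 actually do it. It assumes fixed-size words of `bits` bits each, stored as Python floats (IEEE 754 doubles, 53-bit significand, so small integers and their products are exact), and uses a plain O(n^2) loop where a real program would use an FFT. All function names are made up for illustration. With fixed-size words it only handles moduli 2^(bits*n) - 1, i.e. exponents that are a multiple of the word size; handling a prime p with variable-size words is exactly what the IBDWT is for, and that part is not shown here.

```python
def to_words(x, bits, n):
    """Split x into n little-endian words of `bits` bits, stored as floats."""
    mask = (1 << bits) - 1
    return [float((x >> (bits * i)) & mask) for i in range(n)]

def cyclic_convolve(a, b):
    """Length-n cyclic convolution (schoolbook stand-in for the FFT)."""
    n = len(a)
    c = [0.0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] += a[i] * b[j]   # index wraps since B**n == 1 mod B**n - 1
    return c

def carry_wrap(c, bits):
    """Carry propagation; the carry out of the top word re-enters word 0."""
    base = 1 << bits
    words = [int(round(v)) for v in c]
    c_out = 0
    while True:
        for i in range(len(words)):
            t = words[i] + c_out
            words[i] = t % base
            c_out = t // base
        if c_out == 0:
            return words

def from_words(words, bits):
    """Reassemble little-endian words into one integer."""
    return sum(w << (bits * i) for i, w in enumerate(words))

def mulmod_cyclic(x, y, bits, n):
    """x*y reduced mod 2**(bits*n) - 1, except that 0 may come out as all ones."""
    c = cyclic_convolve(to_words(x, bits, n), to_words(y, bits, n))
    return from_words(carry_wrap(c, bits), bits)

# Usage: 4 words of 4 bits each, so the modulus is 2**16 - 1.
m = (1 << 16) - 1
r = mulmod_cyclic(12345, 54321, bits=4, n=4)
assert r % m == (12345 * 54321) % m
```

Note the final `% m` in the check: the wrapped carry never produces 2^p itself, but it can leave the all-ones value 2^p - 1, which is just another way of writing 0. The words here hold only 4 bits each so the largest convolution entry (4 * 15 * 15 = 900) is nowhere near the 53 bits a double can hold exactly; how many bits per word a real FFT can get away with depends on rounding error, which this sketch does not model.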