fix integer overflow with deg 4 in sieve_lattice()
In stage1_sieve.c:
Code:
/* structures for storing arithmetic progressions.
   Rational leading coeffs of NFS polynomials are assumed
   to be the product of two groups of factors p, each of
   size at most 32 bits (32 bits is enough for 512bit
   factorizations), and candidates must satisfy a
   condition modulo p^2 */

#define MAX_P ((uint64)(1))

But, for degree==4, max p is still 32 bits.
Code:
	if (degree == 4) {
		CUDA_TRY(cuModuleGetGlobal(&L.gpu_p_array,
				NULL, gpu_module64, "pbatch"))
		done = sieve_lattice_gpu_deg4_64(obj, &L,
				&sieve_small, &sieve_large,
				(uint32)small_p_min, (uint32)small_p_max,
				(uint32)large_p_min, (uint32)large_p_max,
				gpu_info, gpu_kernel64);
	}

The attached patch does a simple fix for this.
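To illustrate why the (uint32) casts in the degree 4 call matter: if a bound computed against a 64-bit limit ends up larger than 32 bits, the cast presumably keeps only the low 32 bits without any warning. The following is only a rough sketch of that effect. It uses the standard <stdint.h> types rather than msieve's own uint32/uint64 typedefs, and the helper names fits_in_u32 and clamp_to_u32 are hypothetical; they are not in msieve or in the attached patch.

```c
#include <stdint.h>

/* Hypothetical helper (not part of msieve): returns 1 if a 64-bit
   bound survives a cast to 32 bits unchanged, 0 otherwise. */
static int fits_in_u32(uint64_t bound)
{
	return bound == (uint64_t)(uint32_t)bound;
}

/* Hypothetical helper: clamp a 64-bit bound to the largest 32-bit
   value instead of letting the cast discard the high bits. */
static uint32_t clamp_to_u32(uint64_t bound)
{
	return bound > UINT32_MAX ? UINT32_MAX : (uint32_t)bound;
}
```

A check like fits_in_u32(small_p_max) before the degree 4 call would catch a bound that no longer fits, and clamp_to_u32 is one possible way to cap it instead of truncating.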
Forgot to mention, this is about msieveGPU only.

Maybe it would be better to define a different MAX_P for each degree instead: (uint32)(1) for degree==4 and (uint64)(1) for larger degrees.
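A rough sketch of what a per-degree limit could look like, in case it helps the discussion. This is not the attached patch. The function name max_p_for_degree is hypothetical, the types come from <stdint.h> rather than msieve's typedefs, and the two limits (largest 32-bit and largest 64-bit value) are picked only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from msieve or the attached patch): pick
   the upper limit on p from the polynomial degree. The limits are the
   largest 32-bit and 64-bit values, chosen purely for illustration. */
static uint64_t max_p_for_degree(uint32_t degree)
{
	assert(degree >= 4);	/* assumption: only degree 4 and up is handled */
	return (degree == 4) ? (uint64_t)UINT32_MAX : UINT64_MAX;
}
```

With something along these lines, the degree 4 path would never see a p bound wider than 32 bits, so the (uint32) casts in the kernel call would not lose anything.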

I like this patch better than the earlier one. See attached.

