20100410, 23:14  #1 
Aug 2002
Buenos Aires, Argentina
31×43 Posts 
SQUFOF implementation
Since there are not many places on the Internet where you can find a SQUFOF implementation, I've written mine in Java based on the algorithm explained in the Gower and Wagstaff paper. This will be used in my factorization applet.
Code:
/* Implementation of algorithm explained in Gower and Wagstaff paper */ static int SQUFOF(long N, int queue[]) { double sqrt; int B, Q, Q1, P, P1, L, S; int i, r, s, t, q, u; int queueHead, queueTail, queueIndex; /* Step 1: Initialize */ if ((N & 3) == 1) { N <<= 1; } sqrt = Math.sqrt(N); S = (int)sqrt; if ((long)(S+1)*(long)(S+1)<=N) { S++; } if ((long)S*(long)S > N) { S; } if ((long)S*(long)S == N) { return S; } Q1 = 1; P = S; Q = (int)N  P*P; L = (int)(2*Math.sqrt(2*sqrt)); B = L << 1; queueHead = 0; queueTail = 0; /* Step 2: Cycle forward to find a proper square form */ for (i=0; i<=B; i++) { q = (S+P)/Q; P1 = q*QP; if (Q <= L) { if ((Q & 1) == 0) { queue[queueHead++] = Q >> 1; queue[queueHead++] = P % (Q >> 1); if (queueHead == 100) { queueHead = 0; } } else if (Q+Q<=L) { queue[queueHead++] = Q; queue[queueHead++] = P % Q; if (queueHead == 100) { queueHead = 0; } } } t = Q1+q*(PP1); Q1 = Q; Q = t; P = P1; if ((i & 1) == 0 && ((Q & 7) < 2  (Q & 7) == 4)) { r = (int)Math.sqrt(Q); if (r*r == Q) { queueIndex = queueTail; for (;;) { if (queueIndex == queueHead) { /* Step 3: Compute inverse square root of the square form */ Q1 = r; u = (SP)%r; u += (u >> 31) & r; P = Su; Q = (int)((N(long)P*(long)P)/Q1); /* Step 4: Cycle in the reverse direction to find a factor of N */ for (;;) { q = (S+P)/Q; P1 = q*QP; if (P == P1) { break; } t = Q1 +q*(PP1); Q1 = Q; Q = t; P = P1; } /* Step 5: Get the factor of N */ if ((Q & 1) == 0) { return Q >> 1; } return Q; } s = queue[queueIndex++]; t = queue[queueIndex++]; if (queueIndex == 100) { queueIndex = 0; } if ((Pt)%s == 0) { break; } } if (r > 1) { queueTail = queueIndex; } if (r == 1) { queueIndex = queueTail; for (;;) { if (queueIndex == queueHead) { break; } if (queue[queueIndex] == 1) { return 0; } queueIndex += 2; if (queueIndex == 100) { queueIndex = 0; } } } } } } return 0; } Notice that in Step 2 a square root must be computed every two cycles of the loop. This can probably be optimized. The paper says that step 4 can be optimized but I will need to investigate it. Maybe some forum participants who programmed SQUFOF before (Bob and others) can give me some hints in order to optimize it. Last fiddled with by alpertron on 20100410 at 23:16 
20100411, 01:45  #2 
"Ben"
Feb 2007
13·257 Posts 
Are you using multipliers? I didn't see that you were, but maybe I missed where they were implemented or maybe they are already in N when this routine is called. Racing several small multipliers (by keeping a small amount of state memory for each multipler and running smallish batches of iterations on each) is a good optimization. Msieve and yafu both have implementations which do this.
The step 4 optimization I believe is called fast return. Search for papers by Steven McMath on this subject  he researched it extensively and also published some code, I think... yes, it's called "Parallel integer factorization using quadratic forms". I never got around to implementing fast return, so if you are successful I'd be very interested in seeing it. 
20100411, 02:01  #3 
Aug 2002
Buenos Aires, Argentina
31×43 Posts 
I have not implemented multipliers in the code. First I wanted to be sure that the code worked. There were two errors in the algorithm (section 3.3 of the Gower and Wagstaff paper) that I corrected in the implementation.
From what I read in http://hal.inria.fr/docs/00/45/16/04...draft_02_1.pdf it appears that the "fast return" is really faster when factoring numbers greater than 20 digits, but I think that in this case SIQS will be faster. 
20100411, 11:57  #4 
Tribal Bullet
Oct 2004
110110100001_{2} Posts 
What corrections did you have to make? It's been a long time, but the SQUFOF code in msieve was taken directly from Gower's description, and I didn't understand enough of the underlying math to make corrections.
Also, it was quite helpful in C to handle a special case of q=1 before doing the divisions at the beginning of the even and odd cycles. It probably wouldn't help in Java though. Last fiddled with by jasonp on 20100411 at 11:59 
20100411, 12:16  #5  
Nov 2003
2^{6}×113 Posts 
Quote:
Code:
squfof(n,fac1,fac2) int n[]; int *fac1, *fac2; { /* start of squfof: factor n as fac1*fac2 faster in FP?????*/ int temp[MPDIM], temp1[MPDIM]; register int iq,ll,l2,p,pnext,q,qlast,r,s,t,i; int jter,iter; if (SQUFOF) squfof_calls++; #define DEBUG_QSC 0 if (DEBUG_QSC) { (void) printf("trying qs on: "); write_num(n,numstr); fflush(stdout); } jter = small_qs(n,fac1,fac2); if (DEBUG_QSC) { (void) printf("qs returns %d\n",jter); (void) printf("tried qs on "); write_num(n, numstr); (void) printf("fac1, fac2 = %d %d, squfof returning %d\n", *fac1, *fac2, jter); fflush(stdout); } /* 1 should be very rare; QS found no nontrivial GCD's */ if (jter != 1) return(jter); /* Notes: could replace calls to the MP routines [on Pentium] by using*/ /* __int64 for temp,temp1, convert n to __int64 etc. But the MP stuff */ /* is not inside a loop. It is not worth doing */ qlast = 1; mpsqrt(n,temp); s = LAST(temp); p = s; mult(temp,temp,temp1); subt(n,temp1,temp); /* temp = n  floor(sqrt(n))^2 */ if (SIZE(temp) <= SINGLE && LAST(temp) == 0) { /* Here n is a square */ *fac1 = s; *fac2 = s; return(1); } q = LAST(temp); /* q = excess of n over next smaller square */ ll = 1 + 2*(int)sqrt((double)(p+p)); l2 = ll/2; qpoint = 0; /* In the loop below, we need to check if q is a square right before */ /* the end of the loop. Is there a faster way? The current way is */ /* EXPENSIVE! (many branches and double prec sqrt) */ for (jter=0; jter < 800000; jter++) /* I see no way to speed this */ { /* main loop */ iq = (s + p)/q; pnext = iq*q  p; if (q <= ll) { if ((q & 1) == 0) enqu(q/2,&jter); else if (q <= l2) enqu(q,&jter); if (jter < 0) { /* queue overflow: try pollard rho */ if (Q_OVERFLOW) overflow_count++; /* should be rare */ return(pollardrho2(n,fac1,fac2)); /* also try squfof(3N)? */ } } t = qlast + iq*(p  pnext); qlast = q; q = t; p = pnext; /* check for square; even iter */ if (jter & 1) continue; /* jter is odd:omit square test */ // if (t = (q & 3) > 1) continue; /* skip t = 2,3 mod 4 */ // if (q & 7 == 5) continue; /* skip q = 5 mod 8 */ // if (t == 0 && (q & 15) > 5) continue; /* skip 8, 12 mod 16 */ // if (s3465[q % 3465] == 'n') continue; /* skip QNR's mod 3465 */ r = (int)sqrt((double)q); /* r = floor(sqrt(q)) */ if (q != r*r) continue; if (qpoint == 0) goto gotit; for (i=0; i<qpoint1; i+=2) /* treat queue as list for simplicity*/ { if (r == qqueue[i]) goto contin; if (r == qqueue[i+1]) goto contin; } if (r == qqueue[qpoint1]) continue; goto gotit; contin:; } /* end of main loop */ void sqfprep() { int i; for (i=0; i<3465; i++) s3465[i] = 'n'; for (i=0; i<1733; i++) s3465[(i*i) % 3465] = 'r'; } void enqu(q,iter) int q; int *iter; { qqueue[qpoint] = q; if (++qpoint > 50) *iter = 1; } 

20100411, 13:40  #6  
Aug 2002
Buenos Aires, Argentina
31·43 Posts 
Quote:
In Step 4a the paper says Set but it must be The last one should be easy to catch because even the example provided does not work using the step 4a as written on the paper. Last fiddled with by alpertron on 20100411 at 14:39 

20100411, 14:37  #7 
Aug 2002
Buenos Aires, Argentina
31·43 Posts 
Acording to the paper, the implementation of the queue as a list as written by Bob above is slower than the queue because some proper square forms could be skipped.

20100411, 16:49  #8 
"Ben"
Feb 2007
13×257 Posts 
comparisons
I went ahead and did some comparisons. Bob, I had to port your code so that it works outside your lattice siever, in my test routine. If anything it should be faster, since it uses native 64 bit integers rather than your multiple precision routines, when necessary. I also got rid of the calls to pollard rho, and your miniqs, so that the comparison is between squfof routines only. Here is your modified code:
Code:
int qqueue[100]; int qpoint; void enqu(int q,int *iter) { qqueue[qpoint] = q; if (++qpoint > 100) *iter = 1; } int squfof_rds(int64 n,int *fac1,int *fac2) { /* start of squfof: factor n as fac1*fac2 faster in FP?????*/ int64 temp, temp1; register int iq,ll,l2,p,pnext,q,qlast,r,s,t,i; int jter,iter; qlast = 1; s = (int)sqrt(n); p = s; temp1 = s * s; temp = n  temp1; /* temp = n  floor(sqrt(n))^2 */ if (temp == 0) { /* Here n is a square */ *fac1 = s; *fac2 = s; return(1); } q = (int)temp; /* q = excess of n over next smaller square */ ll = 1 + 2*(int)sqrt((double)(p+p)); l2 = ll/2; qpoint = 0; /* In the loop below, we need to check if q is a square right before */ /* the end of the loop. Is there a faster way? The current way is */ /* EXPENSIVE! (many branches and double prec sqrt) */ for (jter=0; jter < 800000; jter++) /* I see no way to speed this */ { /* main loop */ iq = (s + p)/q; pnext = iq*q  p; if (q <= ll) { if ((q & 1) == 0) enqu(q/2,&jter); else if (q <= l2) enqu(q,&jter); if (jter < 0) { return 0; } } t = qlast + iq*(p  pnext); qlast = q; q = t; p = pnext; /* check for square; even iter */ if (jter & 1) continue; /* jter is odd:omit square test */ r = (int)sqrt((double)q); /* r = floor(sqrt(q)) */ if (q != r*r) continue; if (qpoint == 0) goto gotit; for (i=0; i<qpoint1; i+=2) /* treat queue as list for simplicity*/ { if (r == qqueue[i]) goto contin; if (r == qqueue[i+1]) goto contin; } if (r == qqueue[qpoint1]) continue; goto gotit; contin:; } /* end of main loop */ gotit: ; qlast = r; p = p + r*((s  p)/r); temp = (int64)p * (int64)p; temp = n  temp; temp1 = temp / qlast; q = (int)temp1; /* q = (n  p*p)/qlast (div is exact)*/ for (iter=0; iter<40000; iter++) { /* begin second main loop */ iq = (s + p)/q; /* unroll it, of course */ pnext = iq*q  p; if (p == pnext) goto gotfac; t = qlast + iq*(p  pnext); qlast = q; q = t; p = pnext; iq = (s + p)/q; pnext = iq*q  p; if (p == pnext) goto gotfac; t = qlast + iq*(p  pnext); qlast = q; q = t; p = pnext; iq = (s + p)/q; pnext = iq*q  p; if (p == pnext) goto gotfac; t = qlast + iq*(p  pnext); qlast = q; q = t; p = pnext; iq = (s + p)/q; pnext = iq*q  p; if (p == pnext) goto gotfac; t = qlast + iq*(p  pnext); qlast = q; q = t; p = pnext; } return(0); /* this shouldn't happen */ gotfac: ; if ((q & 1) == 0) q/=2; /* q was factor or 2*factor */ *fac1 = q; temp = n / q; *fac2 = (int)temp; return(1); } Code:
avg time (ms) / % correct avg. input size yafu msieve dario RDS 41 bits 0.03/100 0.05/100 0.04/100 0.04/100 51 bits 0.16/100 0.16/100 0.22/95.8 0.22/99.98 55 bits 0.30/100 0.31/99.99 0.43/95.85 0.44/99.99 59 bits 0.64/97.9 0.68/98.7  0.87/99.82 
20100411, 17:12  #9 
Tribal Bullet
Oct 2004
3·1,163 Posts 
Actually the pseudocode in Gower's thesis did not look familiar, I probably used McMath's papers as pointed to in the SQUFOF wiki page.

20100412, 02:14  #10 
Aug 2002
Buenos Aires, Argentina
31×43 Posts 
By unrolling both loops so there are two copies for each, the code is now 15% faster, according to my measurements.

20100412, 15:22  #11  
Nov 2003
1110001000000_{2} Posts 
Quote:
(1) NFS spends so little time inside this routine, that optimizing it will not have much effect. (2) MPQS is a lot faster for numbers of this size. 

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
An idea for implementation of FFT multiplication.  Maximus  Software  9  20120130 22:41 
ECM/FPGA Implementation  rdotson  Hardware  12  20060326 22:58 
need C implementation of divb(r,k)  Leith  Miscellaneous Math  4  20050118 23:14 
RSA implementation  flava  Programming  12  20041026 03:51 
SQUFOF  Chris Card  Factoring  9  20040924 14:19 