20170131, 13:17  #45 
"Tilman Neumann"
Jan 2016
Germany
2×227 Posts 
Version 02
Hi all,
I am happy to be able to inform you that version 02 of the PSIQS package has been released. It is about factor 3 faster than version 01 for larger numbers (300 bit and more). The memory consumption has been reduced dramatically, too: While before 12 GB were not enough to factor a 340 bit number, this would require "only" about 3.4 GB now. 12 GB should be about what is needed to factor a 380 bit number. (but of course 'ld recommend to use Yafu or msieve for that). As usual you'll find the release, together with more information, on http://www.tilmanneumann.de/ The change log contains some informations about the details that led to the improvements. Big thanks to Dario Alpern for his permission to use his BlockLanczos solver and to Graeme Willoughby for several helpful comments on Sqrt, n.th root and PurePowerTest algorithms. 
20170205, 17:07  #46 
"Tilman Neumann"
Jan 2016
Germany
2·227 Posts 
New minor release 2.1
Shortly after the 2.0 release I found another (P)SIQS performance improvement:
The ainvp computation is much faster now, exploiting that (1/a) mod p = (1/(a%p) mod p Other improvements: * The test for pure powers has been sped up by another factor around 10 > in total the speedup since release 01 was about factor 200 ! More thanks to Graeme Willoughby here. * Threadhandling in PSIQS has been improved for small N. The parallel version is stable for N>56 bit now. 
20170206, 11:58  #47 
"Tilman Neumann"
Jan 2016
Germany
2·227 Posts 
Performance comparison
Hi all,
for those who like to see some numbers, here is a new comparison between PSIQS 2.1, Dario Alperns Siqs, and Yafu 1.34.5, 64bit Windows version. All programs were run with 6 threads. The test numbers were random generated such that they have two factors, the smaller factor having a uniformly distributed bitlength between ld(N)/3 and ld(N)/2. The following test numbers turned up: Code:
250 bit (c75): 966983290915691193309978723256242679920691599725908954700676674631843021151, factor 2166660942804222727904664493239497749 (121 bits) 260 bit (c79): 1405841556925214879486442303149405905001665390832024207523131521166180858938813, factor 67442061758209470276693046261 (96 bits) 270 bit (c82): 1862250708148641942690960669479655453558455211114669315970990094883431140413691063, factor 10131012562771734141422710036441 (103 bits) 280 bit (c85): 1124255513875297636309545520595967278216022444499041658628707408527781223819691431297, factor 102215939439709564867375590146800830467 (127 bits) 290 bit (c88): 1714638254587752572163813097700639375461962331568828069462681800410360936140704797782927, factor 118213006591252560634534512340187 (106 bits) 300 bit (c91): 1420645858748273907814607735289301772937523982503611315908309844444825252076415096683379639, factor 37001601928978936147228977127981 (105 bits) 310 bit (c94): 1504728267636063453737937497689480949465080318766121587410405654071410646640268764306438447777, factor 78826750148916873060910019453093 (106 bits) 320 bit (c97): 1520485323412100005395027120737880529234221500427246782721628759918241045629822884369979557002797, factor 26708589799276429676372119945486621441898621 (144 bits) 330 bit (c100): 1480043784220046663418677695827289686591546373143422996772942819804186784116179599537527918653095289, factor 62125133246163719560488825370539230413837849 (146 bits) Code:
Bits Alpern (factor) PSIQS2.1 (factor) Yafu 1.34.5 (factor) 250 126.8 3.57 59.0 1.66 35.5 1.0 260 187.1 3.74 111.6 2.23 49.9 1.0 270 402.4 4.31 227.3 2.43 93.3 1.0 280 735.6 5.16 424.1 2.97 142.5 1.0 290 2048.1 6.21 986.9 2.99 329.6 1.0 300 3763.5 6.34 1882.9 3.17 593.1 1.0 310 7525.7 6.46 3571.4 3.06 1163.9 1.0 320 12943.7 6.75 6028.7 3.14 1916.4 1.0 330   12460.7 3.17 3918.9 1.0 The 330 bit run of Dario Alperns Siqs was canceled because I didn't want to wait another 67 hours. Though I am already quite happy about the performance of PSIQS 2.1, it is interesting to see that Yafu scales better than both Java programs. I am not quite sure why this is so. My guess would be that the more data is produced, the more the Java programs suffer from cache misses. (in Java, segmented sieves do not work well because the JVM running as the toplevel task uses most of the L1 and L2 cache itself) 
20170206, 14:58  #48  
"Ben"
Feb 2007
2^{2}×3×293 Posts 
Quote:
Do you use a bucket sieve for very large primes or is that also not an effective technique in Java? 

20170206, 15:28  #49 
"Tilman Neumann"
Jan 2016
Germany
111000110_{2} Posts 
I didn't try a bucket sieve, because I believe that it is inefficient in Java. The reason is that the bucket sieve is also designed to optimize cache access, in combination with a segmented sieve for smaller primes.
Instead, having a monolithic sieve, I do the following: I just allocate the sieve array with size pMax+1, and for large primes, one value is always written. This means that we often write "into the blue" (into locations x > sieve size), but then no bound checks are required at all. That was one key ingredient to speed up the sieve. 
20170924, 13:20  #50 
"Tilman Neumann"
Jan 2016
Germany
111000110_{2} Posts 
New PSIQS release
Today I released PSIQS 3.0, which you find here:
http://www.tilmanneumann.de/psiqs.html In terms of asymptotic SIQS speed there was no big improvement, just 5% or so. But I added a few new algorithms, improved another few, and tried to raise the quality of many parts of the code. Most notable improvements were: * Trial division (in fact I mean resieving) performance improved almost by factor 2: Since we have many primes greater than the sieve array, it is that much faster to compute int xModP = xAbs<p ? x : x % p; instead of int xModP = x % p; where xAbs is the absolute value of the resieved smallest xsolution x. * Implementation of BPSW probable prime test: always faster than Java's builtin probable prime test, quite a lot for small arguments (<64 bit), only slightly better asymptotically. * Faster and less memoryconsuming sieve of Eratosthenes, replaced old sequential prime generator. * Better adjustment of the quadratic sieve for small N; the transition between SquFoF and SIQS dropped from 67/68 bits to 60/61 bits. * Better adjustment of KnuthSchroeppel algorithm for QS for small N, implementation of KnuthSchroeppel multipliers for CFrac * SqrtExact: faster for N<=106 bits. 
20180304, 18:20  #51 
"Tilman Neumann"
Jan 2016
Germany
454_{10} Posts 
PSIQS 4.0 release
Hi all,
today I released PSIQS 4.0, which can be found here: http://www.tilmanneumann.de/psiqs.html It is now licensed under GPL3, and my dearest thanks go once again to Dario alpern to permit me to republish his BlockLanczos solver under that license. There was considerable progress. I cleaned up many parts of the code, and obtained a notable speed improvement as well. Here is an update of the table I showed a few posts before... Test numbers: Code:
250 bit (c75): 966983290915691193309978723256242679920691599725908954700676674631843021151, factor 2166660942804222727904664493239497749 (121 bits) 260 bit (c79): 1405841556925214879486442303149405905001665390832024207523131521166180858938813, factor 67442061758209470276693046261 (96 bits) 270 bit (c82): 1862250708148641942690960669479655453558455211114669315970990094883431140413691063, factor 10131012562771734141422710036441 (103 bits) 280 bit (c85): 1124255513875297636309545520595967278216022444499041658628707408527781223819691431297, factor 102215939439709564867375590146800830467 (127 bits) 290 bit (c88): 1714638254587752572163813097700639375461962331568828069462681800410360936140704797782927, factor 118213006591252560634534512340187 (106 bits) 300 bit (c91): 1420645858748273907814607735289301772937523982503611315908309844444825252076415096683379639, factor 37001601928978936147228977127981 (105 bits) 310 bit (c94): 1504728267636063453737937497689480949465080318766121587410405654071410646640268764306438447777, factor 78826750148916873060910019453093 (106 bits) 320 bit (c97): 1520485323412100005395027120737880529234221500427246782721628759918241045629822884369979557002797, factor 26708589799276429676372119945486621441898621 (144 bits) 330 bit (c100): 1480043784220046663418677695827289686591546373143422996772942819804186784116179599537527918653095289, factor 62125133246163719560488825370539230413837849 (146 bits) 339 bit (c103): 1049318608354150719312497493537522807450843398013344125038634265072726507929273547596187217203526415417, factor 2254125207506194947465713574724871004145473711091193254616100967 (211 bits) Results: Code:
Bits Alpern (factor) PSIQS2.1 (factor) PSIQS4.0 (factor) Yafu 1.34.5 (factor) 250 126.8 3.57 59.0 1.66 35.9 1.01 35.5 1.0 260 187.1 3.74 111.6 2.23 68.5 1.37 49.9 1.0 270 402.4 4.31 227.3 2.43 141.8 1.51 93.3 1.0 280 735.6 5.16 424.1 2.97 256.4 1.80 142.5 1.0 290 2048.1 6.21 986.9 2.99 590.9 1.79 329.6 1.0 300 3763.5 6.34 1882.9 3.17 1199.1 2.02 593.1 1.0 310 7525.7 6.46 3571.4 3.06 2164.1 1.86 1163.9 1.0 320 12943.7 6.75 6028.7 3.14 3968.4 2.07 1916.4 1.0 330   12460.7 3.17 8669.0 2.21 3918.9 1.0 339     16669.1 2.48 6714.5 1.0 1. Relax the condition for "smooth candidates" passed to the trial division phase for bigger N. This resembles Silverman's comment on his Tparameter. I have a slightly different parametrization though, based on N instead of pMax (biggest prime in the prime base). 2. Use of polynomial Q2(x) = (2*a*x+b)^2  kN for kN == 1 (mod 8) and corresponding adjustment of the KnuthSchroeppel algorithm. 3. Use native memory for the sieve array using sun.misc.Unsafe. This allows array type conversion and Contini's "0x80808080" collect trick in Java. The full change log can be found here: http://www.tilmanneumann.de/docs/ChangeLog.txt Last fiddled with by Till on 20180304 at 18:36 Reason: fixed enumeration 
20180304, 20:29  #52 
"Tilman Neumann"
Jan 2016
Germany
1C6_{16} Posts 
Performance comparison
All methods were tested with 6 threads.
Sorry I forgot to repeat that point. 
20180305, 16:38  #53 
Aug 2006
3×1,993 Posts 
Wow, this is fantastic! Where do you go from here?

20180305, 16:46  #54 
"Ben"
Feb 2007
3516_{10} Posts 
Wow, impressive speedup. How much of that came from the Q2(x) polynomials? I haven't had the chance to experiment with that idea since it first came up, but it is on my todo list.
Nice work! 
20180305, 17:25  #55 
"Tilman Neumann"
Jan 2016
Germany
2×227 Posts 
Thanks! I'ld like to attack RDS' lattice QS idea, http://www.mersenneforum.org/showthr...628#post481628. Just posted a question there, maybe you can help?

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
being able to use a latex package  wildrabbitt  Software  7  20180112 19:46 
Debian package of mprime  Matt  Linux  1  20070222 22:36 
Understandable QS java Implementation  ThiloHarich  Factoring  41  20051128 21:18 
Improvements for the CWI NFS software package  Istari  Factoring  30  20050712 20:20 
Free C++ package and looking for a bit of help.  Uncwilly  Programming  9  20050304 13:37 