mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > Factoring

Reply
 
Thread Tools
Old 2017-01-31, 13:17   #45
Till
 
Till's Avatar
 
"Tilman Neumann"
Jan 2016
Germany

2×227 Posts
Default Version 02

Hi all,
I am happy to be able to inform you that version 02 of the PSIQS package has been released.

It is about factor 3 faster than version 01 for larger numbers (300 bit and more). The memory consumption has been reduced dramatically, too: While before 12 GB were not enough to factor a 340 bit number, this would require "only" about 3.4 GB now. 12 GB should be about what is needed to factor a 380 bit number. (but of course 'ld recommend to use Yafu or msieve for that).

As usual you'll find the release, together with more information, on http://www.tilman-neumann.de/

The change log contains some informations about the details that led to the improvements.

Big thanks to Dario Alpern for his permission to use his Block-Lanczos solver and to Graeme Willoughby for several helpful comments on Sqrt, n.th root and PurePowerTest algorithms.
Till is offline   Reply With Quote
Old 2017-02-05, 17:07   #46
Till
 
Till's Avatar
 
"Tilman Neumann"
Jan 2016
Germany

2·227 Posts
Default New minor release 2.1

Shortly after the 2.0 release I found another (P)SIQS performance improvement:
The ainvp computation is much faster now, exploiting that (1/a) mod p = (1/(a%p) mod p

Other improvements:
* The test for pure powers has been sped up by another factor around 10 --> in total the speedup since release 01 was about factor 200 ! More thanks to Graeme Willoughby here.

* Thread-handling in PSIQS has been improved for small N. The parallel version is stable for N>56 bit now.
Till is offline   Reply With Quote
Old 2017-02-06, 11:58   #47
Till
 
Till's Avatar
 
"Tilman Neumann"
Jan 2016
Germany

2·227 Posts
Default Performance comparison

Hi all,
for those who like to see some numbers, here is a new comparison between PSIQS 2.1, Dario Alperns Siqs, and Yafu 1.34.5, 64bit Windows version.
All programs were run with 6 threads.

The test numbers were random generated such that they have two factors, the smaller factor having a uniformly distributed bit-length between ld(N)/3 and ld(N)/2.
The following test numbers turned up:
Code:
250 bit (c75):  966983290915691193309978723256242679920691599725908954700676674631843021151, factor 2166660942804222727904664493239497749 (121 bits)
260 bit (c79):  1405841556925214879486442303149405905001665390832024207523131521166180858938813, factor 67442061758209470276693046261 (96 bits)
270 bit (c82):  1862250708148641942690960669479655453558455211114669315970990094883431140413691063, factor 10131012562771734141422710036441 (103 bits)
280 bit (c85):  1124255513875297636309545520595967278216022444499041658628707408527781223819691431297, factor 102215939439709564867375590146800830467 (127 bits)
290 bit (c88):  1714638254587752572163813097700639375461962331568828069462681800410360936140704797782927, factor 118213006591252560634534512340187 (106 bits)
300 bit (c91):  1420645858748273907814607735289301772937523982503611315908309844444825252076415096683379639, factor 37001601928978936147228977127981 (105 bits)
310 bit (c94):  1504728267636063453737937497689480949465080318766121587410405654071410646640268764306438447777, factor 78826750148916873060910019453093 (106 bits)
320 bit (c97):  1520485323412100005395027120737880529234221500427246782721628759918241045629822884369979557002797, factor 26708589799276429676372119945486621441898621 (144 bits)
330 bit (c100): 1480043784220046663418677695827289686591546373143422996772942819804186784116179599537527918653095289, factor 62125133246163719560488825370539230413837849 (146 bits)
Here are the results I obtained on my Sandybridge Notebook (it's not the fastest one):
Code:
Bits    Alpern        (factor)    PSIQS2.1    (factor)    Yafu 1.34.5    (factor)

250      126.8        3.57           59.0        1.66          35.5        1.0
260      187.1        3.74          111.6        2.23          49.9        1.0
270      402.4        4.31          227.3        2.43          93.3        1.0
280      735.6        5.16          424.1        2.97         142.5        1.0
290     2048.1        6.21          986.9        2.99         329.6        1.0
300     3763.5        6.34         1882.9        3.17         593.1        1.0
310     7525.7        6.46         3571.4        3.06        1163.9        1.0
320    12943.7        6.75         6028.7        3.14        1916.4        1.0
330       -             -         12460.7        3.17        3918.9        1.0
All measures are in seconds, and the factor columns are relative to Yafu.
The 330 bit run of Dario Alperns Siqs was canceled because I didn't want to wait another 6-7 hours.

Though I am already quite happy about the performance of PSIQS 2.1, it is interesting to see that Yafu scales better than both Java programs.
I am not quite sure why this is so. My guess would be that the more data is produced, the more the Java programs suffer from cache misses.
(in Java, segmented sieves do not work well because the JVM running as the toplevel task uses most of the L1 and L2 cache itself)
Till is offline   Reply With Quote
Old 2017-02-06, 14:58   #48
bsquared
 
bsquared's Avatar
 
"Ben"
Feb 2007

22×3×293 Posts
Default

Quote:
Originally Posted by Till View Post
Though I am already quite happy about the performance of PSIQS 2.1, it is interesting to see that Yafu scales better than both Java programs.
I am not quite sure why this is so. My guess would be that the more data is produced, the more the Java programs suffer from cache misses.
(in Java, segmented sieves do not work well because the JVM running as the toplevel task uses most of the L1 and L2 cache itself)
Well, scales better in the leading-constant-term-of-big-O-analysis sense, maybe. Thousands of lines of assembler code have been written to optimize the data flow for large numbers in yafu, so that could indeed be the cause if Java hampers that very thing.

Do you use a bucket sieve for very large primes or is that also not an effective technique in Java?
bsquared is offline   Reply With Quote
Old 2017-02-06, 15:28   #49
Till
 
Till's Avatar
 
"Tilman Neumann"
Jan 2016
Germany

1110001102 Posts
Default

I didn't try a bucket sieve, because I believe that it is inefficient in Java. The reason is that the bucket sieve is also designed to optimize cache access, in combination with a segmented sieve for smaller primes.

Instead, having a monolithic sieve, I do the following: I just allocate the sieve array with size pMax+1, and for large primes, one value is always written. This means that we often write "into the blue" (into locations x > sieve size), but then no bound checks are required at all. That was one key ingredient to speed up the sieve.
Till is offline   Reply With Quote
Old 2017-09-24, 13:20   #50
Till
 
Till's Avatar
 
"Tilman Neumann"
Jan 2016
Germany

1110001102 Posts
Default New PSIQS release

Today I released PSIQS 3.0, which you find here:
http://www.tilman-neumann.de/psiqs.html

In terms of asymptotic SIQS speed there was no big improvement, just 5% or so. But I added a few new algorithms, improved another few, and tried to raise the quality of many parts of the code.

Most notable improvements were:

* Trial division (in fact I mean re-sieving) performance improved almost by factor 2:
Since we have many primes greater than the sieve array, it is that much faster to compute
int xModP = xAbs<p ? x : x % p;
instead of
int xModP = x % p;
where xAbs is the absolute value of the resieved smallest x-solution x.

* Implementation of BPSW probable prime test: always faster than Java's built-in probable prime test, quite a lot for small arguments (<64 bit), only slightly better asymptotically.

* Faster and less memory-consuming sieve of Eratosthenes, replaced old sequential prime generator.

* Better adjustment of the quadratic sieve for small N; the transition between SquFoF and SIQS dropped from 67/68 bits to 60/61 bits.

* Better adjustment of Knuth-Schroeppel algorithm for QS for small N, implementation of Knuth-Schroeppel multipliers for CFrac

* SqrtExact: faster for N<=106 bits.
Till is offline   Reply With Quote
Old 2018-03-04, 18:20   #51
Till
 
Till's Avatar
 
"Tilman Neumann"
Jan 2016
Germany

45410 Posts
Default PSIQS 4.0 release

Hi all,
today I released PSIQS 4.0, which can be found here: http://www.tilman-neumann.de/psiqs.html

It is now licensed under GPL3, and my dearest thanks go once again to Dario alpern to permit me to republish his Block-Lanczos solver under that license.

There was considerable progress. I cleaned up many parts of the code, and obtained a notable speed improvement as well. Here is an update of the table I showed a few posts before...

Test numbers:
Code:
250 bit (c75):  966983290915691193309978723256242679920691599725908954700676674631843021151, factor 2166660942804222727904664493239497749 (121 bits)
260 bit (c79):  1405841556925214879486442303149405905001665390832024207523131521166180858938813, factor 67442061758209470276693046261 (96 bits)
270 bit (c82):  1862250708148641942690960669479655453558455211114669315970990094883431140413691063, factor 10131012562771734141422710036441 (103 bits)
280 bit (c85):  1124255513875297636309545520595967278216022444499041658628707408527781223819691431297, factor 102215939439709564867375590146800830467 (127 bits)
290 bit (c88):  1714638254587752572163813097700639375461962331568828069462681800410360936140704797782927, factor 118213006591252560634534512340187 (106 bits)
300 bit (c91):  1420645858748273907814607735289301772937523982503611315908309844444825252076415096683379639, factor 37001601928978936147228977127981 (105 bits)
310 bit (c94):  1504728267636063453737937497689480949465080318766121587410405654071410646640268764306438447777, factor 78826750148916873060910019453093 (106 bits)
320 bit (c97):  1520485323412100005395027120737880529234221500427246782721628759918241045629822884369979557002797, factor 26708589799276429676372119945486621441898621 (144 bits)
330 bit (c100): 1480043784220046663418677695827289686591546373143422996772942819804186784116179599537527918653095289, factor 62125133246163719560488825370539230413837849 (146 bits)
339 bit (c103): 1049318608354150719312497493537522807450843398013344125038634265072726507929273547596187217203526415417, factor 2254125207506194947465713574724871004145473711091193254616100967 (211 bits)
(The last number is new and was supposed to be 340 bit. But sometimes my test number generator falls one bit short)

Results:
Code:
Bits    Alpern  (factor)    PSIQS2.1  (factor)    PSIQS4.0  (factor)   Yafu 1.34.5 (factor)

250      126.8    3.57          59.0    1.66          35.9    1.01          35.5     1.0
260      187.1    3.74         111.6    2.23          68.5    1.37          49.9     1.0
270      402.4    4.31         227.3    2.43         141.8    1.51          93.3     1.0
280      735.6    5.16         424.1    2.97         256.4    1.80         142.5     1.0
290     2048.1    6.21         986.9    2.99         590.9    1.79         329.6     1.0
300     3763.5    6.34        1882.9    3.17        1199.1    2.02         593.1     1.0
310     7525.7    6.46        3571.4    3.06        2164.1    1.86        1163.9     1.0
320    12943.7    6.75        6028.7    3.14        3968.4    2.07        1916.4     1.0
330       -         -        12460.7    3.17        8669.0    2.21        3918.9     1.0
339       -         -           -         -        16669.1    2.48        6714.5     1.0
The performance improvements mostly stem from
1. Relax the condition for "smooth candidates" passed to the trial division phase for bigger N. This resembles Silverman's comment on his T-parameter. I have a slightly different parametrization though, based on N instead of pMax (biggest prime in the prime base).
2. Use of polynomial Q2(x) = (2*a*x+b)^2 - kN for kN == 1 (mod 8) and corresponding adjustment of the Knuth-Schroeppel algorithm.
3. Use native memory for the sieve array using sun.misc.Unsafe. This allows array type conversion and Contini's "0x80808080" collect trick in Java.

The full change log can be found here: http://www.tilman-neumann.de/docs/ChangeLog.txt

Last fiddled with by Till on 2018-03-04 at 18:36 Reason: fixed enumeration
Till is offline   Reply With Quote
Old 2018-03-04, 20:29   #52
Till
 
Till's Avatar
 
"Tilman Neumann"
Jan 2016
Germany

1C616 Posts
Default Performance comparison

All methods were tested with 6 threads.
Sorry I forgot to repeat that point.
Till is offline   Reply With Quote
Old 2018-03-05, 16:38   #53
CRGreathouse
 
CRGreathouse's Avatar
 
Aug 2006

3×1,993 Posts
Default

Wow, this is fantastic! Where do you go from here?
CRGreathouse is offline   Reply With Quote
Old 2018-03-05, 16:46   #54
bsquared
 
bsquared's Avatar
 
"Ben"
Feb 2007

351610 Posts
Default

Wow, impressive speedup. How much of that came from the Q2(x) polynomials? I haven't had the chance to experiment with that idea since it first came up, but it is on my todo list.

Nice work!
bsquared is offline   Reply With Quote
Old 2018-03-05, 17:25   #55
Till
 
Till's Avatar
 
"Tilman Neumann"
Jan 2016
Germany

2×227 Posts
Default

Quote:
Originally Posted by CRGreathouse View Post
Wow, this is fantastic! Where do you go from here?
Thanks! I'ld like to attack RDS' lattice QS idea, http://www.mersenneforum.org/showthr...628#post481628. Just posted a question there, maybe you can help?
Till is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
being able to use a latex package wildrabbitt Software 7 2018-01-12 19:46
Debian package of mprime Matt Linux 1 2007-02-22 22:36
Understandable QS java Implementation ThiloHarich Factoring 41 2005-11-28 21:18
Improvements for the CWI NFS software package Istari Factoring 30 2005-07-12 20:20
Free C++ package and looking for a bit of help. Uncwilly Programming 9 2005-03-04 13:37

All times are UTC. The time now is 19:31.


Tue Jul 27 19:31:24 UTC 2021 up 4 days, 14 hrs, 0 users, load averages: 1.37, 1.64, 1.74

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.