QS/NFS crossover points
1 Attachment(s)
I finally got cado-nfs-1.0 built for x86_64 linux, and decided to run some experiments to determine the crossover points for switching between QS and NFS using different software packages. Posting the results here in case anyone else is interested.
QS packages tested:
* msieve-1.48
* yafu-1.23

NFS packages tested:
* factMsieve.pl (msieve-1.48 poly/postproc, ggnfs sieving)
* yafu-nfs (msieve-1.48 poly/postproc, ggnfs sieving)
* cado-nfs-1.0

build/test environment:
[CODE]
% uname -r
2.6.18-128.4.1.el5
% gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)
% cat /proc/cpuinfo
<snip>
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU X5460 @ 3.16GHz
stepping : 6
cpu MHz : 3166.000
<snip>
[/CODE]

Results: |
Nice work Ben, this is great data. Two things stand out to me:
1) YAFU's QS is really, really fast at the low end -- more than 2x faster than any other. Wow.
2) YAFU-NFS is slightly but consistently slower than factMsieve. I haven't looked at your new NFS code yet but I would have expected it to be doing basically the same thing as factMsieve -- any idea what causes that difference? |
[QUOTE=bchaffin;249205]Nice work Ben, this is great data. Two things stand out to me:
1) YAFU's QS is really, really fast at the low end -- more than 2x faster than any other. Wow.
2) YAFU-NFS is slightly but consistently slower than factMsieve. I haven't looked at your new NFS code yet but I would have expected it to be doing basically the same thing as factMsieve -- any idea what causes that difference?[/QUOTE]

Thanks!

The major difference between factMsieve and yafu-nfs is that factMsieve implements a minimum relations threshold whereas yafu-nfs doesn't. factMsieve waits until approximately enough relations have been collected before first calling msieve for postprocessing, so filtering is run considerably fewer times than with yafu-nfs. That's one source of the difference that I can easily fix.

There might also be differences in the job parameters used. I haven't compared those yet. |
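To illustrate why a minimum relations threshold matters, here is a minimal sketch that only counts how often filtering would be attempted with and without one. It is not code from factMsieve or yafu; the function name, the batch size, and the relation counts are all hypothetical and chosen purely for illustration.

```python
def count_filter_attempts(rels_per_batch, rels_needed, min_rels=0):
    """Count filtering attempts for a sieve-then-filter loop.

    Hypothetical model: after every sieving batch, filtering is attempted
    only if the running relation total has reached min_rels. Filtering is
    assumed to succeed once the total reaches rels_needed.
    """
    if rels_per_batch <= 0:
        raise ValueError("rels_per_batch must be positive")
    total = 0
    attempts = 0
    while True:
        total += rels_per_batch      # one more sieving batch finished
        if total < min_rels:
            continue                 # below the threshold: skip filtering
        attempts += 1                # filtering is attempted
        if total >= rels_needed:
            return attempts          # enough relations: filtering succeeds


if __name__ == "__main__":
    # Made-up numbers: 100k relations per batch, 1M needed in total.
    print(count_filter_attempts(100_000, 1_000_000))                    # no threshold
    print(count_filter_attempts(100_000, 1_000_000, min_rels=900_000))  # with threshold
```

In this toy model every skipped attempt is a failed filtering run avoided, which is the kind of saving described in the post above. How much wall-clock time that is in practice depends on the job size.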
1 Attachment(s)
Perhaps also of interest is the polynomial score achieved with the two different NFS packages. Unfortunately I didn't keep the time needed in polyselect for the cado jobs. Cado polyselect definitely didn't take as long.
|
crossover data
1 Attachment(s)
This topic came up recently and I've been meaning to update this comparison from several years ago anyway.
QS packages tested:
* msieve-1.53
* yafu-1.34.5

NFS packages tested:
* factmsieve.py (msieve-1.53 poly/postproc, ggnfs sieving)
* yafu-nfs (msieve-1.53 poly/postproc, ggnfs sieving)
* cado-nfs-2.1.1

build/test environment:
[CODE]
% uname -r
2.6.32-358.6.2.el6.x86_64
% gcc -v
gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC)
% cat /proc/cpuinfo
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz
stepping : 7
cpu MHz : 3099.837
cache size : 20480 KB
[/CODE]

When I get time I will try out the git version of Cado-NFS. Paul Z. has indicated it might have improved parameter selection for these smaller inputs. (Note: don't extrapolate performance here to performance at large digit sizes.)

Also, I need to update yafu to use the superblocksize parameter of msieve for LA. It is much faster on this CPU (factmsieve uses it, yafu currently does not).

Looks like the QS/NFS crossover ranges from 86 digits to 100+ digits, depending on your choice of tools. Using yafu for both, it is about 93 digits on this CPU. |
Random observations:
* CADO-NFS Murphy-E and msieve Murphy-E are not comparable (IIRC). Have you normalized the score by computing the Murphy-E using the same tool for all polys?
* Time spent in poly selection is just way too much in this range, even for CADO-NFS, which is the most frugal of the lot. Honestly, just search only the leading coefficients 6-120, and you should be good to go.
* There is a _big_ difference in LA time between yafu-nfs and factmsieve. Since both are using msieve, that indicates that yafu's job parameters are on the high side, generating many more relations?
* Did you use the linux64 sievers (or, if on Windows, did you use the ones which are as fast as the linux ones?)
* If everything is carefully tuned (including poly select), the msieve+GGNFS route probably will have crossover at or below 90 digits. |
[QUOTE=axn;402185] Random observations:
* CADO-NFS Murphy-E and msieve Murphy-E are not comparable (IIRC). Have you normalized the score by computing the Murphy-E using the same tool for all polys?[/QUOTE]

No, forgot that fact. And I can't seem to find a log file of any kind, so the polys CADO used are lost as far as I can see. If I test the git version I will try to capture them and re-compute their score with msieve.

[QUOTE=axn;402185] * Time spent in poly selection is just way too much in this range, even for CADO-NFS, which is the most frugal of the lot. Honestly, just search only the leading coefficients 6-120, and you should be good to go. [/QUOTE]

Just capturing the current behavior of the tools. For the yafu and factmsieve tools, the best leading coefficients were found at:
[CODE]
digit size    best leading coefficient
85            60
90            1980
95            180
100           1680
[/CODE]
I don't know how much these "best" ones improved on the best one found in your proposed range.

[QUOTE=axn;402185] * There is a _big_ difference in LA time between yafu-nfs and factmsieve. Since both are using msieve, that indicates that yafu's job parameters are on the high side, generating many more relations? [/QUOTE]

No, the parameters were the same. The difference is due to the use of superblocksizes. factmsieve uses them, yafu doesn't (it requires a command line option to specify the superblocksize). Apparently the 20 MB L3 cache in this CPU helps a bunch.

[QUOTE=axn;402185] * Did you use the linux64 sievers (or, if on Windows, did you use the ones which are as fast as the linux ones?) [/QUOTE]

Yes, 64 bit linux sievers.

[QUOTE=axn;402185] * If everything is carefully tuned (including poly select), the msieve+GGNFS route probably will have crossover at or below 90 digits.[/QUOTE]

Maybe... but seems unlikely.
Here is the raw data:
[CODE]
total time
                       digit size
tool              85     90     95      100
yafu-qs          386   1473   4082     8809
msieve-qs        873   2454   7184    17236
yafu-nfs        1054   1973   4215     6787
cado-nfs        1271   2283   5659  11066.2
factmsieve.py   1080   2160   3600     6929
[/CODE]

Subtracting a few hundred seconds from poly time (likely increasing sieving time some amount) would not bring NFS times down that far. But it is kinda splitting hairs - use your personal preference anywhere between 90-95 digits and you will be pretty close to optimal. |
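For anyone who wants to poke at the raw data above, here is a small sketch that picks the fastest tool at each digit size. The numbers are copied from the table; the dictionary layout and the function name are just illustrative choices, not part of any of the tools.

```python
# Total times copied from the raw data table above, one entry per digit size.
DIGITS = [85, 90, 95, 100]
TOTAL_TIME = {
    "yafu-qs":       [386, 1473, 4082, 8809],
    "msieve-qs":     [873, 2454, 7184, 17236],
    "yafu-nfs":      [1054, 1973, 4215, 6787],
    "cado-nfs":      [1271, 2283, 5659, 11066.2],
    "factmsieve.py": [1080, 2160, 3600, 6929],
}


def fastest_per_size(digits, times):
    """Return {digit size: name of the tool with the smallest total time}."""
    winners = {}
    for i, d in enumerate(digits):
        winners[d] = min(times, key=lambda tool: times[tool][i])
    return winners


if __name__ == "__main__":
    for d, tool in fastest_per_size(DIGITS, TOTAL_TIME).items():
        print(f"c{d}: {tool}")
```

With this data the winner switches from a QS tool to an NFS tool somewhere between 90 and 95 digits, which matches the rule of thumb given above.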
I've found a crossover between yafu's QS and gnfs at about 90 digits when using my GPU for msieve polynomial selection and the 64 bit Linux sievers.
Chris |
1 Attachment(s)
Time for another update: Paul Z. asked me to check out CADO-2.2.0
QS packages tested:
* yafu (1.34.5)

NFS packages tested:
* yafu-nfs (yafu-1.34.5/msieve-svn-991/ggnfs-asm64)
* cado-nfs-2.2.0

Run on a Xeon Haswell system (linux); all tests single threaded.

Here is the total time data:
[CODE]
input digits   yafu-1.34.5 qs   yafu-1.34.5/msieve-svn-991/ggnfs-asm64   cado-nfs-2.2.0
60             2.2              -                                         44.42
65             6.6              -                                         64.24
70             13.8             -                                         112.65
75             50.8             -                                         256.5
80             104.3            -                                         470
85             276              747.7                                     738
90             1077             1488                                      1457
95             3069             3390                                      3306
100            6689             5377                                      6431
105            22155            10743                                     12986
110            -                21773                                     27662
[/CODE]

Generally, CADO poly select is faster while yafu/msieve/ggnfs has faster sieving/LA. CADO has good parameter selection down to at least c60 (it does feel weird running NFS on a c60). Between the NFS implementations, CADO wins up to 100 digits.

The QS/NFS crossover is about 96-97 digits in both cases. |
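One way to sanity-check a crossover estimate from the c95 and c100 rows is to interpolate where the QS-minus-NFS time difference changes sign. This is only a sketch: straight-line interpolation between two measured sizes is an assumption on my part (the original estimate was presumably read off the attached plot), and the function name is hypothetical.

```python
def crossover(x0, x1, qs, nfs):
    """Linearly interpolate the digit size where QS and NFS times are equal.

    qs and nfs are (time at x0, time at x1) pairs. Assumes the time
    difference varies linearly between x0 and x1, which is a simplification.
    """
    d0 = qs[0] - nfs[0]   # negative while QS is still faster
    d1 = qs[1] - nfs[1]   # positive once NFS is faster
    if d0 * d1 > 0:
        raise ValueError("no sign change between x0 and x1")
    if d0 == d1:          # both differences are zero
        return x0
    return x0 + (x1 - x0) * d0 / (d0 - d1)


if __name__ == "__main__":
    # Times taken from the c95 and c100 rows of the table above.
    print(crossover(95, 100, qs=(3069, 6689), nfs=(3390, 5377)))  # vs yafu-nfs
    print(crossover(95, 100, qs=(3069, 6689), nfs=(3306, 6431)))  # vs cado-nfs
```

Both results land between 95 and 100 digits, in the same neighbourhood as the 96-97 figure quoted above. With only one number run per size, the second decimal place means nothing.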
Are you sure the 22k for SIQS on the C105 is the right number? I would have expected something like 12k-15k, or 18k at most. Some systems have crossovers above 105 digits.
If 22k is indeed correct, then why? What could the explanation be? Did you test a single composite (or just a few), or did you run a batch of them at each size? |
That's the correct time, yes. It fits well on the trendline and is consistent with the previous few times (roughly a factor of 3 increase for every 5 digits). I just ran one number... running several c105 would provide a better estimate of performance, sure, but I'm lazy.
Single-threaded c105 by QS in 6 hours doesn't seem too bad to me... |
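The "roughly factor 3 for each 5 digits" rule of thumb can be checked against the yafu QS column posted earlier in the thread. The sketch below takes the geometric mean of the step-to-step growth; the function name is hypothetical and the times are copied from the c60 to c105 rows.

```python
import math

# yafu-1.34.5 QS total times for c60, c65, ..., c105, copied from the table.
QS_TIMES = [2.2, 6.6, 13.8, 50.8, 104.3, 276, 1077, 3069, 6689, 22155]


def mean_growth_factor(times):
    """Geometric mean of the ratio between consecutive entries."""
    steps = len(times) - 1
    return math.exp(math.log(times[-1] / times[0]) / steps)


if __name__ == "__main__":
    ratios = [b / a for a, b in zip(QS_TIMES, QS_TIMES[1:])]
    print([round(r, 2) for r in ratios])           # per-step growth
    print(round(mean_growth_factor(QS_TIMES), 2))  # average growth per 5 digits
```

The individual steps bounce around quite a bit, as expected with one number per size, but the average comes out a little under 3 per 5 digits.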