mersenneforum.org  

2009-11-04, 06:33   #331
jrk
May 2008
10001000111₂ Posts

Quote:
Originally Posted by Batalov
In SVN revision 377, I've added some oomph to the 64-bit asm-optimized sievers in the experimental branch. (up to 20% speedup for 15e, 10% for 14e, 16e, a bit less for others, while all retrogression tests hold.)

I compared the experimental 14e siever from revisions 353 and 377.

353:
Code:
$ ~/ggnfs-353/bin/gnfs-lasieve4I14e -a 4788.2448.poly -f 20000000 -c 2000
Warning:  lowering FB_bound to 19999999.
total yield: 2800, q=20002007 (0.11330 sec/rel)
377:
Code:
$ ~/ggnfs-377/bin/gnfs-lasieve4I14e -a 4788.2448.poly -f 20000000 -c 2000 
Warning:  lowering FB_bound to 19999999.
total yield: 2800, q=20002007 (0.13144 sec/rel)


Core 2 Duo (65nm) @ 3.4 GHz. Linked with MPIR 1.2.1. Polynomial is the one from this thread: http://www.mersenneforum.org/showthread.php?t=12583
2009-11-04, 06:42   #332
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
10010100010000₂ Posts

Try L1_BITS 15 in piii/siever-config.h. Probably the cache in C2D is smaller than I am used to, and that's why your test is very valuable. [If the cache is smaller than the block we try to fit in it, then we get a setback rather than a speedup.]

If still slow, try L1_BITS 14 (which is the old default). There should be no change from old versions (except for various patches), but if there is, then the builds are different. Did you use Jeff's builds, or are both builds yours?

Thx!

P.S. The largest changes should be for 15e and 16e; the other binaries already have fairly unrolled loops. Try old and new 15e and 16e. I tried on M941 and will try on this poly as well.

Last fiddled with by Batalov on 2009-11-04 at 07:02
2009-11-04, 06:52   #333
axn
Jun 2003
2²×3³×47 Posts

Quote:
Originally Posted by Batalov
Did you use Jeff's, or both builds are yours?

Jeff doesn't do builds for *nix
2009-11-04, 07:13   #334
jrk
May 2008
2107₈ Posts

Quote:
Originally Posted by Batalov
Try L1_BITS 15 in piii/siever-config.h. Probably the cache in C2D is smaller than I am used to, and that's why your test is very valuable. [If the cache is smaller than we try to fit in it, then we get a setback, rather than acceleration.]

Do you mean athlon64/siever-config.h?

In that file, I changed L1_BITS from 16 to 15, and now this happens:

Code:
$ ./gnfs-lasieve4I14e -a 4788.2448.poly -f 20000000 -c 2000 
Warning:  lowering FB_bound to 19999999.
SCHED_PATHOLOGY q0=20000003 k=11 excess=70                      
SCHED_PATHOLOGY q0=20000023 k=1 excess=134                      
SCHED_PATHOLOGY q0=20000023 k=10 excess=0                      
SCHED_PATHOLOGY q0=20000023 k=1 excess=244                      
SCHED_PATHOLOGY q0=20000059 k=1 excess=184                      
SCHED_PATHOLOGY q0=20000059 k=1 excess=388                      
SCHED_PATHOLOGY q0=20000059 k=2 excess=394                      
SCHED_PATHOLOGY q0=20000059 k=1 excess=120                      
SCHED_PATHOLOGY q0=20000059 k=1 excess=166                      
SCHED_PATHOLOGY q0=20000063 k=2 excess=14                      
SCHED_PATHOLOGY q0=20000081 k=1 excess=478                      
SCHED_PATHOLOGY q0=20000093 k=1 excess=450                      
SCHED_PATHOLOGY q0=20000159 k=2 excess=92                      
SCHED_PATHOLOGY q0=20000159 k=2 excess=454                      
SCHED_PATHOLOGY q0=20000171 k=1 excess=92                      
SCHED_PATHOLOGY q0=20000171 k=1 excess=68                      
SCHED_PATHOLOGY q0=20000213 k=12 excess=328                      
SCHED_PATHOLOGY q0=20000221 k=1 excess=116                      
SCHED_PATHOLOGY q0=20000243 k=1 excess=80                      
SCHED_PATHOLOGY q0=20000243 k=5 excess=332                      
SCHED_PATHOLOGY q0=20000269 k=2 excess=128                      
SCHED_PATHOLOGY q0=20000287 k=1 excess=412                      
SCHED_PATHOLOGY q0=20000297 k=3 excess=142                      
SCHED_PATHOLOGY q0=20000329 k=8 excess=4                      
SCHED_PATHOLOGY q0=20000353 k=1 excess=120                      
SCHED_PATHOLOGY q0=20000353 k=1 excess=70                      
SCHED_PATHOLOGY q0=20000353 k=1 excess=106                      
SCHED_PATHOLOGY q0=20000389 k=1 excess=268                      
SCHED_PATHOLOGY q0=20000429 k=14 excess=152                      
SCHED_PATHOLOGY q0=20000443 k=3 excess=36                      
SCHED_PATHOLOGY q0=20000471 k=7 excess=32                      
SCHED_PATHOLOGY q0=20000471 k=1 excess=354                      
SCHED_PATHOLOGY q0=20000531 k=2 excess=118                      
SCHED_PATHOLOGY q0=20000531 k=3 excess=166                      
SCHED_PATHOLOGY q0=20000531 k=1 excess=200                      
SCHED_PATHOLOGY q0=20000567 k=1 excess=158                      
SCHED_PATHOLOGY q0=20000569 k=1 excess=110                      
SCHED_PATHOLOGY q0=20000573 k=2 excess=502                      
SCHED_PATHOLOGY q0=20000573 k=14 excess=124                      
SCHED_PATHOLOGY q0=20000599 k=3 excess=72                      
SCHED_PATHOLOGY q0=20000623 k=3 excess=98                      
SCHED_PATHOLOGY q0=20000689 k=1 excess=242                      
SCHED_PATHOLOGY q0=20000693 k=8 excess=62                      
SCHED_PATHOLOGY q0=20000713 k=7 excess=268                      
SCHED_PATHOLOGY q0=20000723 k=1 excess=186                      
SCHED_PATHOLOGY q0=20000723 k=2 excess=108                      
SCHED_PATHOLOGY q0=20000753 k=1 excess=404                      
SCHED_PATHOLOGY q0=20000753 k=2 excess=324                      
SCHED_PATHOLOGY q0=20000779 k=2 excess=96                      
SCHED_PATHOLOGY q0=20000791 k=15 excess=130                      
SCHED_PATHOLOGY q0=20000801 k=2 excess=576                      
SCHED_PATHOLOGY q0=20000821 k=5 excess=10                      
SCHED_PATHOLOGY q0=20000821 k=10 excess=110                      
SCHED_PATHOLOGY q0=20000837 k=6 excess=22                      
SCHED_PATHOLOGY q0=20000839 k=1 excess=224                      
SCHED_PATHOLOGY q0=20000839 k=1 excess=296                      
SCHED_PATHOLOGY q0=20000839 k=3 excess=120                      
SCHED_PATHOLOGY q0=20000843 k=1 excess=270                      
SCHED_PATHOLOGY q0=20000843 k=1 excess=88                      
SCHED_PATHOLOGY q0=20000861 k=1 excess=8                      
SCHED_PATHOLOGY q0=20000861 k=1 excess=278                      
SCHED_PATHOLOGY q0=20000867 k=1 excess=500                      
SCHED_PATHOLOGY q0=20000867 k=13 excess=82                      
SCHED_PATHOLOGY q0=20000867 k=1 excess=440                      
SCHED_PATHOLOGY q0=20000873 k=1 excess=392                      
SCHED_PATHOLOGY q0=20000909 k=1 excess=552                      
SCHED_PATHOLOGY q0=20000917 k=2 excess=354                      
SCHED_PATHOLOGY q0=20000951 k=1 excess=216                      
SCHED_PATHOLOGY q0=20000969 k=1 excess=326                      
SCHED_PATHOLOGY q0=20000971 k=15 excess=86                      
SCHED_PATHOLOGY q0=20000971 k=3 excess=120                      
SCHED_PATHOLOGY q0=20000971 k=1 excess=278                      
SCHED_PATHOLOGY q0=20000971 k=2 excess=266                      
SCHED_PATHOLOGY q0=20000971 k=1 excess=94                      
SCHED_PATHOLOGY q0=20001001 k=14 excess=222                      
SCHED_PATHOLOGY q0=20001001 k=1 excess=116                      
SCHED_PATHOLOGY q0=20001019 k=9 excess=36                      
SCHED_PATHOLOGY q0=20001067 k=2 excess=396                      
SCHED_PATHOLOGY q0=20001073 k=1 excess=626                      
SCHED_PATHOLOGY q0=20001073 k=10 excess=66                      
SCHED_PATHOLOGY q0=20001083 k=3 excess=156                      
SCHED_PATHOLOGY q0=20001083 k=1 excess=534                      
SCHED_PATHOLOGY q0=20001083 k=2 excess=168                      
SCHED_PATHOLOGY q0=20001151 k=6 excess=322                      
SCHED_PATHOLOGY q0=20001161 k=1 excess=38                      
SCHED_PATHOLOGY q0=20001181 k=12 excess=88                      
SCHED_PATHOLOGY q0=20001181 k=1 excess=126                      
SCHED_PATHOLOGY q0=20001203 k=1 excess=192                      
SCHED_PATHOLOGY q0=20001227 k=1 excess=170                      
SCHED_PATHOLOGY q0=20001227 k=2 excess=38                      
SCHED_PATHOLOGY q0=20001239 k=8 excess=58                      
SCHED_PATHOLOGY q0=20001239 k=4 excess=136                      
SCHED_PATHOLOGY q0=20001259 k=4 excess=530                      
SCHED_PATHOLOGY q0=20001259 k=1 excess=314                      
SCHED_PATHOLOGY q0=20001259 k=1 excess=102                      
SCHED_PATHOLOGY q0=20001263 k=1 excess=208                      
SCHED_PATHOLOGY q0=20001269 k=1 excess=46                      
SCHED_PATHOLOGY q0=20001341 k=1 excess=84                      
SCHED_PATHOLOGY q0=20001341 k=1 excess=190                      
SCHED_PATHOLOGY q0=20001439 k=1 excess=124                      
SCHED_PATHOLOGY q0=20001439 k=1 excess=308                      
SCHED_PATHOLOGY q0=20001491 k=4 excess=62                      
SCHED_PATHOLOGY q0=20001551 k=3 excess=100                      
SCHED_PATHOLOGY q0=20001551 k=2 excess=590                      
SCHED_PATHOLOGY q0=20001551 k=1 excess=272                      
SCHED_PATHOLOGY q0=20001557 k=3 excess=162                      
SCHED_PATHOLOGY q0=20001613 k=4 excess=168                      
SCHED_PATHOLOGY q0=20001613 k=1 excess=256                      
SCHED_PATHOLOGY q0=20001659 k=4 excess=72                      
SCHED_PATHOLOGY q0=20001679 k=3 excess=56                      
SCHED_PATHOLOGY q0=20001679 k=1 excess=214                      
SCHED_PATHOLOGY q0=20001763 k=12 excess=116                      
SCHED_PATHOLOGY q0=20001769 k=2 excess=102                      
SCHED_PATHOLOGY q0=20001799 k=1 excess=236                      
SCHED_PATHOLOGY q0=20001811 k=3 excess=48                      
SCHED_PATHOLOGY q0=20001833 k=7 excess=126                      
SCHED_PATHOLOGY q0=20001833 k=2 excess=62                      
SCHED_PATHOLOGY q0=20001833 k=3 excess=496                      
SCHED_PATHOLOGY q0=20001847 k=1 excess=292                      
SCHED_PATHOLOGY q0=20001853 k=1 excess=372                      
SCHED_PATHOLOGY q0=20001899 k=1 excess=190                      
SCHED_PATHOLOGY q0=20001959 k=2 excess=66                      
SCHED_PATHOLOGY q0=20001959 k=2 excess=416                      
SCHED_PATHOLOGY q0=20001977 k=1 excess=136                      
SCHED_PATHOLOGY q0=20001977 k=1 excess=404                      
total yield: 0, q=20002007 (inf sec/rel)
By the way, the shared L2 cache on this C2D is 4 MB.

Quote:
Originally Posted by Batalov
Did you use Jeff's, or both builds are yours?

They were both mine.
2009-11-04, 07:21   #335
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
2510₁₆ Posts

In this thread I only wanted to discuss Windows builds, because I have no access to them - this is Jeff's and Brian's domain.

The 64-bit asm builds are tricky: if you change L1_BITS, don't forget to also change l1_bits in ls-defs.asm, and of course clean up all .o and .a files and rebuild everything as described in the INSTALL file. Otherwise you will surely get a broken build.
2009-11-04, 07:27   #336
jrk
May 2008
3·5·73 Posts

Quote:
Originally Posted by Batalov
In this thread I only wanted to discuss Windows builds, because I have no access to them - this is Jeff's and Brian's domain.

Where shall we discuss this then?

Quote:
Originally Posted by Batalov
The asm64-bit builds are tricky -- if you change L1_BITS, don't forget to change l1_bits in ls-defs.asm and of course clean up all .o and .a, and build all as listed in INSTALL file. Otherwise, you will get a broken build, surely.

Yep, I was starting from a clean directory each time.

I will change l1_bits as you suggested next. Right now I'm running siever 15e without any changes, will report the numbers for it in a bit.
2009-11-04, 07:39   #337
jrk
May 2008
10001000111₂ Posts

Quote:
Originally Posted by jrk
I'm running siever 15e without any changes, will report the numbers for it in a bit.

353:
Code:
$ ~/ggnfs-353/bin/gnfs-lasieve4I15e -a 4788.2448.poly -f 20000000 -c 1000
Warning:  lowering FB_bound to 19999999.
total yield: 3479, q=20001001 (0.14711 sec/rel)
377:
Code:
$ ~/ggnfs-377/bin/gnfs-lasieve4I15e -a 4788.2448.poly -f 20000000 -c 1000 
Warning:  lowering FB_bound to 19999999.
total yield: 3479, q=20001001 (0.18397 sec/rel)
2009-11-04, 07:54   #338
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
2510₁₆ Posts

Apparently, for your CPU, L1_BITS 15 is better!
This is important for Greg and NFS@HOME binaries.

On a Phenom 940, timings for this poly in several regions (20M, 45M, 200M) are a few percent better with both the new 14e and 15e than with the old ones.
Timings for M941 are better by 10%+ (M941 was tested with 15e and 16e, on both sides). The output files are 100% consistent (to compare them properly, it is best to strip the factors with sed 's,:.*,,', leaving only a,b).
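The same consistency check can be done in a few lines of Python. This sketch assumes each relation line has the form `a,b:<factors>:<factors>`, so, as with the sed one-liner, everything from the first colon onward is dropped. It also compares the a,b pairs as a multiset rather than line by line, a choice made here in case two builds emit relations in a different order.

```python
from collections import Counter


def ab_pairs(lines):
    """Mimic sed 's,:.*,,': keep only the text before the first ':' (the a,b pair)."""
    return [line.split(":", 1)[0].strip() for line in lines if line.strip()]


def same_relations(old_lines, new_lines):
    """True if both outputs contain the same a,b pairs with the same multiplicities."""
    return Counter(ab_pairs(old_lines)) == Counter(ab_pairs(new_lines))


def same_relation_files(old_path, new_path):
    """Read two siever output files and compare them on a,b only."""
    with open(old_path) as old, open(new_path) as new:
        return same_relations(old, new)
```

Because factor lists are ignored, a build that prints the same primes in a different order still counts as consistent, which matches the intent of the sed trick.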
_________

P.S. With a bit of a rewrite, a 'thick' binary could be built that contains all the optimized variants and includes a benchmark, which would in turn prepare a config file or even train itself for a specific project. The current practice is to try every variant on one's own CPU and keep the fastest binary.
Same for ECM, right? I still keep two ecm binaries around (-enable/-disable-redc). In an ideal world there would be one.
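The selection step of such a self-benchmarking binary boils down to timing each variant on a short run and keeping the fastest. A minimal Python sketch follows. Wrapping each variant as a zero-argument callable and the variant names are hypothetical; a real tool might have each callable launch a differently built siever on a small q range.

```python
import time


def time_once(run):
    """Wall-clock seconds for one call of `run`."""
    start = time.perf_counter()
    run()
    return time.perf_counter() - start


def pick_fastest(variants, repeats=3):
    """Return the name of the variant with the best (minimum) time over `repeats` runs.

    `variants` maps a name (e.g. a hypothetical "L1_BITS=15") to a zero-argument callable.
    """
    if not variants:
        raise ValueError("no variants to benchmark")
    best = {name: min(time_once(run) for _ in range(repeats))
            for name, run in variants.items()}
    return min(best, key=best.get)
```

A callable could be something like `lambda: subprocess.run(["./gnfs-lasieve4I14e", "-a", "4788.2448.poly", "-f", "20000000", "-c", "200"], check=True)`, reusing the command line from this thread with a shorter range. Taking the minimum of several runs reduces noise from other processes. Writing the winner to a config file is left out.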

Last fiddled with by Batalov on 2009-11-04 at 08:08
2009-11-04, 08:28   #339
jrk
May 2008
3×5×73 Posts

Quote:
Originally Posted by jrk
I will change l1_bits as you suggested next.

Rev 377 with L1_BITS changed to 15, testing both 14e and 15e again:

Code:
$ ./gnfs-lasieve4I14e -a 4788.2448.poly -f 20000000 -c 2000 
Warning:  lowering FB_bound to 19999999.
total yield: 2800, q=20002007 (0.11304 sec/rel) 
$ ./gnfs-lasieve4I15e -a 4788.2448.poly -f 20000000 -c 1000 
Warning:  lowering FB_bound to 19999999.
total yield: 3479, q=20001001 (0.14816 sec/rel)
Now virtually the same as 353 on this c157.
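Putting the thread's 14e and 15e timings side by side makes the effect of L1_BITS easier to see. Every pair of runs produced the same yield, so the sec/rel figures compare directly. The numbers are copied from the posts above. Labeling the unmodified r377 build as L1_BITS=16 follows the note that the athlon64 config used 16 before the change.

```python
def relative_change(old, new):
    """Percentage change from `old` to `new`; positive means slower (more sec/rel)."""
    return (new - old) / old * 100.0


# sec/rel on the Core 2 Duo @ 3.4 GHz, as reported in this thread
TIMINGS = {
    "14e": {"r353": 0.11330, "r377 L1_BITS=16": 0.13144, "r377 L1_BITS=15": 0.11304},
    "15e": {"r353": 0.14711, "r377 L1_BITS=16": 0.18397, "r377 L1_BITS=15": 0.14816},
}

for siever, runs in TIMINGS.items():
    base = runs["r353"]
    for label, sec_per_rel in runs.items():
        if label != "r353":
            print(f"{siever} {label}: {relative_change(base, sec_per_rel):+.1f}% vs r353")

# prints:
# 14e r377 L1_BITS=16: +16.0% vs r353
# 14e r377 L1_BITS=15: -0.2% vs r353
# 15e r377 L1_BITS=16: +25.1% vs r353
# 15e r377 L1_BITS=15: +0.7% vs r353
```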
2009-11-04, 08:31   #340
jrk
May 2008
10001000111₂ Posts

Again, that was with the athlon64 asm code.
2009-11-04, 09:35   #341
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
2⁴·593 Posts

Ok, I think I got it now. ("I learned something today", as they say in South Park.)

In terms of L1 data cache size, all Core 2s (Duos, Quads) and even Nehalem have 32 KB per core (= 2^15 bytes), while Phenoms and Opterons have 64 KB per core (= 2^16 bytes).
So for Intel chips, keep L1_BITS at 15, but for AMD chips 16 gives a bit of an edge. L2 cache is slower (about a dozen cycles of extra latency), and that showed in your tests; its size doesn't matter.
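The rule of thumb here, L1_BITS = log2 of the per-core L1 data cache size in bytes, can be turned into a small helper. This sketch assumes a Linux system that exposes cache details under `/sys/devices/system/cpu/cpu0/cache/index*/` with `level`, `type` and `size` files, where sizes look like `32K`. When that directory is missing it returns None, and you fall back to the Intel/AMD guidance above.

```python
from pathlib import Path
from typing import Optional

SYSFS_CACHE_DIR = "/sys/devices/system/cpu/cpu0/cache"  # Linux-specific location


def parse_cache_size(text: str) -> int:
    """Convert a sysfs-style size such as '32K' or '4096K' to bytes."""
    text = text.strip().upper()
    multipliers = {"K": 1024, "M": 1024 ** 2}
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)


def l1_bits_for(l1d_bytes: int) -> int:
    """Largest L1_BITS such that a 2**L1_BITS-byte block fits in the L1 data cache."""
    if l1d_bytes <= 0:
        raise ValueError("cache size must be positive")
    return l1d_bytes.bit_length() - 1


def read_l1d_bytes(cache_dir: str = SYSFS_CACHE_DIR) -> Optional[int]:
    """Return cpu0's level-1 data cache size in bytes, or None if it cannot be read."""
    root = Path(cache_dir)
    if not root.is_dir():
        return None
    for index in sorted(root.glob("index*")):
        try:
            level = (index / "level").read_text().strip()
            kind = (index / "type").read_text().strip()
            if level == "1" and kind == "Data":
                return parse_cache_size((index / "size").read_text())
        except (OSError, ValueError):
            continue  # unreadable or unexpected entry; try the next index
    return None


size = read_l1d_bytes()
if size is not None:
    print("suggested L1_BITS:", l1_bits_for(size))
```

A 32 KB L1d (Core 2, Nehalem) maps to 15 and a 64 KB L1d (Phenom, Opteron) maps to 16, matching the recommendation in the post. The helper looks at cpu0 only, which assumes all cores are identical.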

Thanks, Jayson!

P.S. i7 has a relatively fast L2 cache; it would be interesting to test.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.