#331
May 2008
10001000111₂ Posts
Quote:
353:

Code:
$ ~/ggnfs-353/bin/gnfs-lasieve4I14e -a 4788.2448.poly -f 20000000 -c 2000
Warning: lowering FB_bound to 19999999.
total yield: 2800, q=20002007 (0.11330 sec/rel)

377:

Code:
$ ~/ggnfs-377/bin/gnfs-lasieve4I14e -a 4788.2448.poly -f 20000000 -c 2000
Warning: lowering FB_bound to 19999999.
total yield: 2800, q=20002007 (0.13144 sec/rel)

Core 2 Duo (65nm) @ 3.4 GHz. Linked with MPIR 1.2.1. Polynomial is the one from this thread: http://www.mersenneforum.org/showthread.php?t=12583
#332
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
10010100010000₂ Posts
Try L1_BITS 15 in piii/siever-config.h. Probably the cache in the C2D is smaller than what I am used to, which is why your test is very valuable. [If the cache is smaller than what we try to fit into it, then we get a setback rather than an acceleration.]

If it is still slow, try L1_BITS 14 (which is the old default). That should give no change from the old versions (except for the various patches), but if there is a change, then the builds are different. Did you use Jeff's builds, or are both builds yours? Thx!

P.S. The largest changes should be for 15e and 16e; the other binaries already have fairly unrolled loops. Try the old and new 15e and 16e. I tried on M941 and will try on this poly as well.

Last fiddled with by Batalov on 2009-11-04 at 07:02
#333
Jun 2003
2²×3³×47 Posts
#334
May 2008
2107₈ Posts
Quote:
In that file, I changed L1_BITS from 16 to 15, and now this happens:

Code:
$ ./gnfs-lasieve4I14e -a 4788.2448.poly -f 20000000 -c 2000
Warning: lowering FB_bound to 19999999.
SCHED_PATHOLOGY q0=20000003 k=11 excess=70
SCHED_PATHOLOGY q0=20000023 k=1 excess=134
SCHED_PATHOLOGY q0=20000023 k=10 excess=0
SCHED_PATHOLOGY q0=20000023 k=1 excess=244
SCHED_PATHOLOGY q0=20000059 k=1 excess=184
SCHED_PATHOLOGY q0=20000059 k=1 excess=388
[... about 120 further SCHED_PATHOLOGY lines snipped ...]
SCHED_PATHOLOGY q0=20001959 k=2 excess=416
SCHED_PATHOLOGY q0=20001977 k=1 excess=136
SCHED_PATHOLOGY q0=20001977 k=1 excess=404
total yield: 0, q=20002007 (inf sec/rel)

They were both mine.
#335
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
25101₆ Posts
In this thread I only wanted to discuss the Windows builds, because I have no access to them - that is Jeff's and Brian's domain.

The 64-bit asm builds are tricky: if you change L1_BITS, don't forget to also change l1_bits in ls-defs.asm, and of course clean up all the .o and .a files and build everything as listed in the INSTALL file. Otherwise you will surely get a broken build.
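As a concrete sketch of keeping the two constants in sync, here is a self-contained shell session. The file names come from the post above, but the file contents here are dummies, and the exact asm definition syntax is an assumption:

```shell
# Dummy stand-ins for the two real files mentioned above.
mkdir -p demo/piii
printf '#define L1_BITS 16\n' > demo/piii/siever-config.h
printf 'l1_bits equ 16\n' > demo/ls-defs.asm

# Change BOTH definitions together; a mismatch gives a broken build.
sed -i 's/L1_BITS 16/L1_BITS 15/' demo/piii/siever-config.h
sed -i 's/equ 16/equ 15/' demo/ls-defs.asm
grep . demo/piii/siever-config.h demo/ls-defs.asm

# Then remove every stale object and rebuild as listed in the INSTALL file:
# find . \( -name '*.o' -o -name '*.a' \) -delete && make
```

The point of the `find`/`make` step is that make will not notice the header change in already-built .o and .a files, so partial rebuilds can silently mix the two values.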
#336
May 2008
3·5·73 Posts
Quote:
Quote:
I will change l1_bits as you suggested next. Right now I'm running the 15e siever without any changes; I will report the numbers for it in a bit.
#337
May 2008
10001000111₂ Posts
Quote:
Code:
$ ~/ggnfs-353/bin/gnfs-lasieve4I15e -a 4788.2448.poly -f 20000000 -c 1000
Warning: lowering FB_bound to 19999999.
total yield: 3479, q=20001001 (0.14711 sec/rel)

Code:
$ ~/ggnfs-377/bin/gnfs-lasieve4I15e -a 4788.2448.poly -f 20000000 -c 1000
Warning: lowering FB_bound to 19999999.
total yield: 3479, q=20001001 (0.18397 sec/rel)
#338
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
25101₆ Posts
Apparently, for your CPU, L1_BITS 15 is better! This is important for Greg and the NFS@Home binaries.

On a Phenom 940, timings for this poly over several regions (20M, 45M, 200M) are better by a few percent with both the new 14e and 15e over the old ones. Timings for M941 are better by 10%+ (M941 was tested with 15e and 16e, and on both sides). The output files are 100% consistent (to truly compare them, it is best to sed 's,:.*,,' - i.e. cut off all the factors and leave only the a,b pairs).

P.S. With a bit of a rewrite, a 'thick' binary could be built which would have all the optimized variants inside, plus a benchmark that would in turn prepare a config file, or even train itself for a specific project. The current kitchen is to try everything for one's own CPU and save the best binary. Same for ECM, right? I still keep two ecm binaries around (--enable/--disable-redc). Should be one in an ideal world.

Last fiddled with by Batalov on 2009-11-04 at 08:08
#339
May 2008
3×5×73 Posts
Rev 377 with L1_BITS changed to 15, testing both 14e and 15e again:

Code:
$ ./gnfs-lasieve4I14e -a 4788.2448.poly -f 20000000 -c 2000
Warning: lowering FB_bound to 19999999.
total yield: 2800, q=20002007 (0.11304 sec/rel)
$ ./gnfs-lasieve4I15e -a 4788.2448.poly -f 20000000 -c 1000
Warning: lowering FB_bound to 19999999.
total yield: 3479, q=20001001 (0.14816 sec/rel)
#340
May 2008
10001000111₂ Posts
Again, that was with the athlon64 asm code.
#341
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
2⁴·593 Posts
Ok, I think I got it now. ("I learned something today," like they say in South Park.)

In terms of L1 data cache size, all Core 2s (Duos, Quads) and even Nehalem have 32 KB per core (= 2^15 bytes). Phenoms and Opterons have 64 KB per core (= 2^16). So for Intel chips keep L1_BITS at 15, but for AMD chips 16 gives a bit of an edge. The L2 cache is slower (a penalty of a dozen cycles), and that showed in your tests; its size doesn't matter. Thanks, Jayson!

P.S. The i7 has a relatively fast L2 cache; it will be interesting to test.
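Following this logic, a suitable L1_BITS is just log2 of the L1 data cache byte count. A small sketch (on Linux the size could come from `getconf LEVEL1_DCACHE_SIZE`; that source, and the hard-coded 32768 below, are assumptions for illustration):

```shell
l1=32768   # bytes; e.g. from: getconf LEVEL1_DCACHE_SIZE (32 KB on Core 2)

# Integer log2 by repeated halving: 32768 -> 15, 65536 -> 16.
bits=0; n=$l1
while [ "$n" -gt 1 ]; do n=$((n / 2)); bits=$((bits + 1)); done
echo "L1 data cache: $l1 bytes -> suggested L1_BITS $bits"
```

With 32768 this prints a suggested L1_BITS of 15 (Core 2 / Nehalem); with 65536 it would print 16 (Phenom / Opteron), matching the conclusion above.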