Thanks Jeff (and others who have helped in testing) - it's nice to see that it is faster as well.
Brian |
Nice indeed. On commodity hard drives (not RAIDed or NAS), it was expected to be the case (smaller files, faster read time + reasonably small CPU overhead vs bzip2 or even more aggressive compression libraries). On RAID, it could be par for the course. But the .dat file being almost half the size is the invariable benefit. All other files (.mat, .cyc) won't compress well - they are nearly random data.
I'll check Linux now, but wouldn't expect any surprises. (Except conditionals being crossed over.) |
Got this:
[code]Tue Nov 15 10:21:51 2011 commencing square root phase
Tue Nov 15 10:21:51 2011 reading relations for dependency 1
Tue Nov 15 10:21:54 2011 read 5529557 cycles
Tue Nov 15 10:22:01 2011 cycles contain 15362500 unique relations
Tue Nov 15 10:23:18 2011 error: relation 53261938 [B]corrupt[/B][/code]Now what should I do? |
[QUOTE=em99010pepe;278460]Got this:
[code]Tue Nov 15 10:21:51 2011 commencing square root phase
Tue Nov 15 10:21:51 2011 reading relations for dependency 1
Tue Nov 15 10:21:54 2011 read 5529557 cycles
Tue Nov 15 10:22:01 2011 cycles contain 15362500 unique relations
Tue Nov 15 10:23:18 2011 error: relation 53261938 [B]corrupt[/B][/code]Now what should I do?[/QUOTE] You can try running -nc3 on specific dependencies. See if that helps. |
I did, but got the same error. I changed the msieve version from 1.50 to 1.49. It's running.
|
I just completed a merge with extensive changes to GPU poly selection courtesy of jrk. This should make the GPU code uniformly faster than the CPU code, and the modifications let Nvidia's driver figure out how to give work to the GPU and still keep your graphics responsive. Give it a try if you can; this will probably be the last set of changes before releasing v1.50.
|
I don't know if this is important, but....
Concerning the "error: corrupt state, please restart from checkpoint" error, I found that it often happens while I'm pushing hard on my fiber-optic home internet connection when the msieve checkpoint is written. I'm using Vista 32-bit on a 5400 rpm hard disk (laptop). |
That error is reported when a number-theoretic checksum on the working state of the linear algebra fails; the check is performed before a checkpoint write and every so often as the LA runs. Technically it's independent of the checkpoint writing process.
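For illustration only, the kind of integrity check described above can be sketched like this (this is NOT msieve's actual code; `state_checksum` and the layout of the state vector are assumptions): fold the working vector into a sum modulo a large prime, store the value, and recompute it periodically - any silent corruption of the state almost certainly changes the result.

```c
/* Illustrative sketch, NOT msieve's actual check: accumulate the LA
 * working vector modulo a large 64-bit prime. Recomputing this before
 * a checkpoint write (and periodically during the run) catches silent
 * corruption of the in-memory state. */
#include <stdint.h>
#include <stddef.h>

static uint64_t state_checksum(const uint64_t *v, size_t n)
{
    const uint64_t p = 0xFFFFFFFFFFFFFFC5ULL; /* 2^64 - 59, a prime */
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        uint64_t a = v[i] % p;
        /* sum = (sum + a) mod p, written to avoid 64-bit overflow */
        sum = (sum >= p - a) ? sum - (p - a) : sum + a;
    }
    return sum;
}
```
A mismatch between the stored and recomputed value is what would trigger the "corrupt state" message in a scheme like this.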
|
Here's an example of post-processing of 431_79_minus1 from RSALS where the LA won't start. For the last 4 hours I have been trying to run it, and the dos box always shuts down after the line "commencing Lanczos iteration". So what's my issue here?
[code]Fri Nov 25 20:23:53 2011 Msieve v. 1.49
Fri Nov 25 20:23:53 2011 random seeds: 4cb43a9c f1f344d7
Fri Nov 25 20:23:53 2011 factoring 10040872850412202767988311200343904010253296383666204179607169584356237497461378013263541420378568142212694536585132764491915183713758243179684748057870915974533442611405962355156581893 (185 digits)
Fri Nov 25 20:24:00 2011 searching for 15-digit factors
Fri Nov 25 20:24:04 2011 commencing number field sieve (185-digit input)
Fri Nov 25 20:24:04 2011 R0: -17709427104057277060049775698679791
Fri Nov 25 20:24:04 2011 R1: 1
Fri Nov 25 20:24:04 2011 A0: -1
Fri Nov 25 20:24:04 2011 A1: 0
Fri Nov 25 20:24:04 2011 A2: 0
Fri Nov 25 20:24:04 2011 A3: 0
Fri Nov 25 20:24:04 2011 A4: 0
Fri Nov 25 20:24:04 2011 A5: 0
Fri Nov 25 20:24:04 2011 A6: 431
Fri Nov 25 20:24:04 2011 skew 0.36, size 1.913e-010, alpha 1.673, combined = 5.273e-012 rroots = 2
Fri Nov 25 20:24:04 2011 
Fri Nov 25 20:24:04 2011 commencing linear algebra
Fri Nov 25 20:24:08 2011 read 3340596 cycles
Fri Nov 25 20:24:28 2011 cycles contain 10103703 unique relations
Fri Nov 25 20:30:55 2011 read 10103703 relations
Fri Nov 25 20:31:51 2011 using 20 quadratic characters above 268435338
Fri Nov 25 20:35:03 2011 building initial matrix
Fri Nov 25 20:42:45 2011 memory use: 1161.1 MB
Fri Nov 25 20:43:27 2011 read 3340596 cycles
Fri Nov 25 20:43:31 2011 matrix is 3340420 x 3340596 (952.6 MB) with weight 293679469 (87.91/col)
Fri Nov 25 20:43:31 2011 sparse part has weight 226327671 (67.75/col)
Fri Nov 25 20:44:50 2011 filtering completed in 1 passes
Fri Nov 25 20:44:54 2011 matrix is 3340420 x 3340596 (952.6 MB) with weight 293679469 (87.91/col)
Fri Nov 25 20:44:54 2011 sparse part has weight 226327671 (67.75/col)
Fri Nov 25 20:46:21 2011 matrix starts at (0, 0)
Fri Nov 25 20:46:25 2011 matrix is 3340420 x 3340596 (952.6 MB) with weight 293679469 (87.91/col)
Fri Nov 25 20:46:25 2011 sparse part has weight 226327671 (67.75/col)
Fri Nov 25 20:46:25 2011 saving the first 48 matrix rows for later
Fri Nov 25 20:46:30 2011 matrix includes 64 packed rows
Fri Nov 25 20:46:33 2011 matrix is 3340372 x 3340596 (904.2 MB) with weight 235052547 (70.36/col)
Fri Nov 25 20:46:33 2011 sparse part has weight 216990521 (64.96/col)
Fri Nov 25 20:46:33 2011 using block size 65536 for processor cache size 2048 kB
Fri Nov 25 20:47:34 2011 commencing Lanczos iteration (2 threads)
Fri Nov 25 21:55:18 2011 
Fri Nov 25 21:55:18 2011 
Fri Nov 25 21:55:18 2011 Msieve v. 1.49
Fri Nov 25 21:55:18 2011 random seeds: 6d8b4be4 27bd7c88
Fri Nov 25 21:55:18 2011 factoring 10040872850412202767988311200343904010253296383666204179607169584356237497461378013263541420378568142212694536585132764491915183713758243179684748057870915974533442611405962355156581893 (185 digits)
Fri Nov 25 21:55:26 2011 searching for 15-digit factors
Fri Nov 25 21:55:31 2011 commencing number field sieve (185-digit input)
Fri Nov 25 21:55:32 2011 R0: -17709427104057277060049775698679791
Fri Nov 25 21:55:32 2011 R1: 1
Fri Nov 25 21:55:32 2011 A0: -1
Fri Nov 25 21:55:32 2011 A1: 0
Fri Nov 25 21:55:32 2011 A2: 0
Fri Nov 25 21:55:32 2011 A3: 0
Fri Nov 25 21:55:32 2011 A4: 0
Fri Nov 25 21:55:32 2011 A5: 0
Fri Nov 25 21:55:32 2011 A6: 431
Fri Nov 25 21:55:32 2011 skew 0.36, size 1.913e-010, alpha 1.673, combined = 5.273e-012 rroots = 2
Fri Nov 25 21:55:32 2011 
Fri Nov 25 21:55:32 2011 commencing linear algebra
Fri Nov 25 21:55:40 2011 read 3340596 cycles
Fri Nov 25 21:56:08 2011 cycles contain 10103703 unique relations
Fri Nov 25 22:07:15 2011 read 10103703 relations
Fri Nov 25 22:09:22 2011 using 20 quadratic characters above 268435338
Fri Nov 25 22:13:28 2011 building initial matrix
Fri Nov 25 22:22:38 2011 memory use: 1161.1 MB
Fri Nov 25 22:23:32 2011 read 3340596 cycles
Fri Nov 25 22:23:35 2011 matrix is 3340420 x 3340596 (952.6 MB) with weight 293679469 (87.91/col)
Fri Nov 25 22:23:36 2011 sparse part has weight 226327671 (67.75/col)
Fri Nov 25 22:25:00 2011 filtering completed in 1 passes
Fri Nov 25 22:25:03 2011 matrix is 3340420 x 3340596 (952.6 MB) with weight 293679469 (87.91/col)
Fri Nov 25 22:25:03 2011 sparse part has weight 226327671 (67.75/col)
Fri Nov 25 22:28:48 2011 matrix starts at (0, 0)
Fri Nov 25 22:28:52 2011 matrix is 3340420 x 3340596 (952.6 MB) with weight 293679469 (87.91/col)
Fri Nov 25 22:28:52 2011 sparse part has weight 226327671 (67.75/col)
Fri Nov 25 22:28:52 2011 saving the first 48 matrix rows for later
Fri Nov 25 22:28:58 2011 matrix includes 64 packed rows
Fri Nov 25 22:29:00 2011 matrix is 3340372 x 3340596 (904.2 MB) with weight 235052547 (70.36/col)
Fri Nov 25 22:29:00 2011 sparse part has weight 216990521 (64.96/col)
Fri Nov 25 22:29:00 2011 using block size 65536 for processor cache size 2048 kB
Fri Nov 25 22:30:06 2011 commencing Lanczos iteration (2 threads)
[/code] |
how much RAM does your PC have? Maybe LA crashes due to an out of memory issue?
|
[QUOTE=Andi47;279940]how much RAM does your PC have? Maybe LA crashes due to an out of memory issue?[/QUOTE]
2GB |
[QUOTE=pinhodecarlos;279879]Here's an example of post-processing of 431_79_minus1 from RSALS that LA won't start. For the last 4 hours that I am trying to run it and dos box always shutdowns after line "commencing Lanczos iteration". So what's my issue here?[/QUOTE]How are you starting this? If you use a shortcut to start it, try starting from an already open dos box. If there's an error message printed, the dos box started from the shortcut will exit immediately after the message; an already open box will drop back to the command prompt after printing the error message.[QUOTE=Andi47;279940]how much RAM does your PC have? Maybe LA crashes due to an out of memory issue?[/QUOTE]
[QUOTE=pinhodecarlos;279951]2GB[/QUOTE]I don't know if msieve has to do a reallocation, but there have been reports (from Andi47, I believe) of a realloc failing when there is plenty of RAM available (the problem is not too little RAM, it's too little [I]contiguous[/I] RAM...) |
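The failure mode described above can be sketched as follows (a hypothetical helper, not msieve's code): `realloc` needs one contiguous region of the new size, so in a fragmented 32-bit address space it can fail even with plenty of free RAM, while a fresh `malloc` plus a copy sometimes still succeeds.

```c
/* Hypothetical helper (NOT from msieve): grow a buffer, falling back
 * to malloc+memcpy when realloc cannot extend or relocate in place.
 * On fragmented address spaces the fallback can find a usable region;
 * if both fail, the process really is out of contiguous memory. */
#include <stdlib.h>
#include <string.h>

void *grow_buffer(void *buf, size_t old_size, size_t new_size)
{
    void *p = realloc(buf, new_size);
    if (p != NULL)
        return p;

    p = malloc(new_size);          /* try a different region */
    if (p == NULL)
        return NULL;               /* genuinely out of memory */

    memcpy(p, buf, old_size);      /* preserve the existing data */
    free(buf);
    return p;
}
```
Whether such a fallback helps in practice depends on the allocator; the point is only that "realloc failed" does not always mean "no RAM left".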
Frank,
I'll reply later in the evening, I have to go now. Carlos |
I start msieve from a batch file. Right now I've started another RSALS task, so I can only test what you say later tonight.
I suspect another issue related to the gz relations source, but I am still waiting for Lionel's reply. Carlos |
You definitely cannot mix zipped and unzipped relation files in the same msieve.dat, but that should not have been an issue if you also performed the filtering. Likewise, there is some memory allocation at the point where you crashed, but if the allocation had failed there would be a message to that effect.
|
I am kinda curious. In long threaded runs, the very first child process is a bit lazier than the others:
[CODE]Mem:  16077M total, 15568M used,   509M free,   10M buffers
Swap:  8187M total,  2218M used,  5969M free, 1909M cached

 PID USER  PR NI VIRT  RES SHR S %CPU %MEM   TIME+  P nFLT nDRT COMMAND
3975 serge 20  0 15.4g 13g 480 R   96 86.1 5742:29  5 8743    0 msieveS
3987 serge 20  0 15.4g 13g 480 R   [COLOR=darkred]91[/COLOR] 86.1 [COLOR=darkred]5226:11[/COLOR]  3  131    0 msieveS
3988 serge 20  0 15.4g 13g 480 R   94 86.1 5611:49  1  232    0 msieveS
3989 serge 20  0 15.4g 13g 480 R   95 86.1 5706:39  2  124    0 msieveS
3992 serge 20  0 15.4g 13g 480 R   92 86.1 5680:39  4   36    0 msieveS
3997 serge 20  0 15.4g 13g 480 R   92 86.1 5671:16  0    7    0 msieveS
[/CODE] It probably has less work than the others; maybe it can be given a 10% larger slice and the efficiency will increase (even if by a few percent)? J- Let me know if this makes sense, I can implement and report the results. |
P.S. I think even though the pid of the child is "1st", it actually must be the one that gets the last chunk. I've patched the bit that creates the slight asymmetry and committed (SVN 693).
|
Does that patch actually help? Even in your case with lots of threads, each thread gets millions of columns; shifting one back or one forward shouldn't noticeably affect the time.
The thread with the lowest pid is actually the master, and its lower CPU utilization may just reflect it sleeping a little bit to wait for each thread in turn to finish, so that another partial vector can be XOR-ed into the final answer. |
I meant the 1st child (its PID is the second lowest). The master spends the most time of all the threads.
The patch seems to help on small cases. I want to restart the big matrix (it's the 16M[SUP]2[/SUP] for 7,326+) when I get home - I want to do it from the master shell (not from ssh). I'll report in a day, when the threads have accumulated enough runtime, whether the ETA changes. [COLOR=green]P.S. The 694th patch is just beautification. Look at the 693 vs 639 diff to see the real change: it gets rid of a 1000 fudge factor which, with many threads, adds up to shortchange the last worker.[/COLOR] |
Ok, I didn't notice the commit before. Another possibility for why it might help is that the density of the matrix varies between the ends; the code sorts the columns in order of increasing weight, then alternates the lightest and heaviest columns. The end result is that threads assigned columns close to the left edge of the matrix get fewer columns, and threads assigned to the right-hand columns get more of them. It's possible that the bias in the original code, intended to avoid a thread with almost no columns, caused some threads to gobble up too many columns.
It might be a better idea to do a [num_threads]-way scatter of the columns, like the MPI code does with the rows; but that means you have to have the number of threads in mind when you build the matrix, which means an efficiency loss when the number of threads changes. Perhaps a [max_threads]-way scatter would be best. |
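The idea under discussion - dividing columns so that each thread gets an equal share of sum(weights) rather than an equal count of columns - can be sketched as follows. This is illustrative only; the contiguous-chunk assumption and the names are mine, not msieve's.

```c
/* Illustrative sketch: split ncols matrix columns into num_threads
 * contiguous chunks so that each chunk carries roughly the same total
 * column weight. bounds[t]..bounds[t+1] is the range for thread t. */
#include <stdint.h>
#include <stddef.h>

void split_by_weight(const uint32_t *col_weight, size_t ncols,
                     int num_threads, size_t *bounds)
{
    uint64_t total = 0, running = 0;
    size_t i;
    int t = 1;

    for (i = 0; i < ncols; i++)
        total += col_weight[i];

    bounds[0] = 0;
    for (i = 0; i < ncols && t < num_threads; i++) {
        running += col_weight[i];
        /* cut when this chunk has reached its fair share of weight */
        if (running * (uint64_t)num_threads >= total * (uint64_t)t)
            bounds[t++] = i + 1;
    }
    while (t < num_threads)
        bounds[t++] = ncols;
    bounds[num_threads] = ncols;
}
```
With the alternating light/heavy column order described above, equal-weight chunks end up with unequal column counts, which is exactly the effect being debated.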
Now that the chunks are [I]fairly[/I] divided between threads (in terms of sum(weights)), [...drum roll...] the spread of running times actually increased. :-(
I am thinking about trying a spread with "[URL="http://en.wikipedia.org/wiki/Gamma_correction"]gamma correction[/URL]", which would attempt to mimic and counteract the function of running time vs. the given slot (the alternation makes it almost linear, but not entirely; maybe a gamma of ~0.9 will take care of that; it's a cheap trick compared to your suggestions). The original code, with its overshoot of 1000, takes off higher than the diagonal line - similar to a gamma slightly less than 1. |
No, wait, I haven't actually tested anything as it turns out. :rolleyes:
(SVN skipped the patch on the matmul0.c file, as it had other edits; so I was actually "testing" yet another build of unmodified code.) |
[QUOTE=Batalov;282261]No, wait, I haven't actually tested anything as it turns out. :rolleyes:
(SVN skipped the patch on the matmul0.c file, as it had other edits; so I was actually "testing" yet another build of unmodified code.)[/QUOTE] So doing nothing increased your running time? I guess you might want to look at a different way to benchmark this if re-running the same code has that much variation. Jeff. |
It always has significant variation: this is LA. I am not re-running exactly the same iterations (and even if I did, I would only have overtrained the 'model' to an effect that may be characteristic of a specific stage of a specific LA run); instead, I am tweaking and then continuing the LA as it goes. There are still 16 days to go.
Anyway, I ran half a day with an ad hoc gamma correction (of 0.97) and half a day with simply giving the first child an 8% larger slice, and the latter seems to even out the per-thread run times better than all other attempts (the spread is smaller than it was). The ultimate goal would be to have them all run for the same time (on average); then the time when five threads are waiting for the sixth to finish would be minimized. Same with one thread systematically working less than the others. |
grblbl ..
[code]gcc -D_FILE_OFFSET_BITS=64 -O3 -fomit-frame-pointer -march=k8 -DNDEBUG -D_LARGEFILE64_SOURCE -Wall -W -DMSIEVE_SVN_VERSION="\"unknown\"" -I. -Iinclude -Ignfs -Ignfs/poly -Ignfs/poly/stage1 -DNO_ZLIB -c -o common/polyroot.o common/polyroot.c
In file included from include/polyroot.h:19:0,
                 from common/polyroot.c:15:
include/dd.h: In function 'dd_set_precision_ieee':
include/dd.h:46:2: warning: implicit declaration of function '_control87' [-Wimplicit-function-declaration]
include/dd.h:47:13: error: '_PC_53' undeclared (first use in this function)
include/dd.h:47:13: note: each undeclared identifier is reported only once for each function it appears in
include/dd.h:47:21: error: '_MCW_PC' undeclared (first use in this function)
include/dd.h: In function 'dd_precision_is_ieee':
include/dd.h:73:19: error: '_MCW_PC' undeclared (first use in this function)
include/dd.h:73:31: error: '_PC_53' undeclared (first use in this function)
make: *** [common/polyroot.o] Error 1
[/code] |
Hmm, what compiler and OS was this on?
|
What environment/compiler are you using? It looks like you have WIN32 or _WIN64 defined, but your float.h must lack the bits that msieve expects it to have in that case.
|
Jason, is it expected that Lanczos iterations towards the end are slightly slower than at the beginning? I wonder if there is a simple qualitative explanation.
It is hard to observe on small runs, and even on larger runs it is visible not in the ETA but in the growing intervals between .chk file writes; in my case these intervals grew from 67 minutes (early in the run) to 70, and slowly to 75 minutes (day 18 out of 20). The memory is not being swapped - I checked. |
The Lanczos runtime should not vary over time, although if the iteration genuinely slowed down by 10% over the course of the run then the ETA would be affected. The ETA is recalculated on every iteration using the unix seconds counter. Could yours be drifting? I've also noticed in earlier versions that the ETA at the beginning of a run is often about 5% high, compared to the actual elapsed time measured at the end of the run.
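The per-iteration recalculation described is essentially linear extrapolation from wall-clock time; a hedged sketch (the real code may compute this differently):

```c
/* Sketch of a per-iteration ETA from the unix seconds counter:
 * extrapolate remaining time from the average time per iteration so
 * far. Any systematic slowdown (throttling, swapping, a rattling fan)
 * shows up as a creeping ETA. Not msieve's actual code. */
#include <time.h>
#include <stdint.h>

double eta_seconds(time_t start, time_t now,
                   uint64_t iters_done, uint64_t iters_total)
{
    double elapsed = difftime(now, start);
    if (iters_done == 0)
        return 0.0;                  /* no data yet */
    return elapsed * (double)(iters_total - iters_done)
                   / (double)iters_done;
}
```
Under this scheme the ETA is only as accurate as the average iteration time so far, which explains both the early-run drift and the sensitivity to any genuine slowdown.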
|
Thank you; that makes sense. I thought so too, but then I thought I'd overlooked something: the matrix is a constant object, the split between threads is done once and is constant, and the state vector is essentially random, so the rate of churning should be as you said - constant over time.
The ETA surely drifts at the very beginning, but that's different. This is already after several days of uninterrupted running, when the ETA is usually very stable. So I've stopped and restarted, and the .chk-writing interval is back to 70 minutes. What I now think is that this could be something boring like throttling - I've noticed that the Scythe CPU fan is rattling slightly, so I will have to replace it. The temp is fine right now, but maybe it comes and goes. I'll try to finish the matrix as is and then run some tests on the hardware. We are still on schedule to finish the c258 this year. |
Out of curiosity (I'm not proposing this as a task for msieve 1.50 :smile:): would it be difficult to modify msieve so as to optionally speed up re-filtering at a different target density?
(at the space penalty of keeping intermediate files, I'm sure, but I'm OK with that) For instance, in the run reproduced below (a 30-bit LPs task on the easy side, I could have made RSALS produce only ~110M relations...), I might have wanted to raise the target density further. However, I didn't, because re-filtering everything would have subtracted an hour from the potential speed increase yielded by a smaller, denser matrix. If re-filtering had taken half of that time (the last 23 minutes or so, plus maybe some time to read additional data?), I probably would have re-filtered :smile: [code]Tue Dec 13 20:38:39 2011 Msieve v. 1.50 (SVN 688M)
Tue Dec 13 20:38:39 2011 random seeds: dc3e000b aea72940
Tue Dec 13 20:38:39 2011 factoring 9000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001 (226 digits)
Tue Dec 13 20:38:41 2011 searching for 15-digit factors
Tue Dec 13 20:38:41 2011 commencing number field sieve (226-digit input)
Tue Dec 13 20:38:41 2011 R0: -10000000000000000000000000000000000000
Tue Dec 13 20:38:41 2011 R1: 1
Tue Dec 13 20:38:41 2011 A0: 1
Tue Dec 13 20:38:41 2011 A1: 0
Tue Dec 13 20:38:41 2011 A2: 0
Tue Dec 13 20:38:41 2011 A3: 0
Tue Dec 13 20:38:41 2011 A4: 0
Tue Dec 13 20:38:41 2011 A5: 0
Tue Dec 13 20:38:41 2011 A6: 9000
Tue Dec 13 20:38:41 2011 skew 0.22, size 2.067e-11, alpha 0.996, combined = 1.105e-12 rroots = 0
Tue Dec 13 20:38:41 2011 
Tue Dec 13 20:38:41 2011 commencing relation filtering with target density 85.00
Tue Dec 13 20:38:41 2011 estimated available RAM is 3072.0 MB
Tue Dec 13 20:38:41 2011 commencing duplicate removal, pass 1
<snip errors reading relations>
Tue Dec 13 20:50:48 2011 skipped 131 relations with b > 2^32
Tue Dec 13 20:50:48 2011 found 22220107 hash collisions in 118474319 relations
Tue Dec 13 20:51:03 2011 added 1 free relations
Tue Dec 13 20:51:03 2011 commencing duplicate removal, pass 2
Tue Dec 13 20:54:19 2011 found 21254598 duplicates and 97219722 unique relations
Tue Dec 13 20:54:19 2011 memory use: 660.8 MB
Tue Dec 13 20:54:20 2011 reading ideals above 720000
Tue Dec 13 20:54:20 2011 commencing singleton removal, initial pass
Tue Dec 13 21:10:58 2011 memory use: 2756.0 MB
Tue Dec 13 21:10:59 2011 removing singletons from LP file
Tue Dec 13 21:10:59 2011 start with 97219722 relations and 90570125 ideals
Tue Dec 13 21:12:56 2011 pass 1: found 26769593 singletons
Tue Dec 13 21:13:51 2011 pass 2: found 6437137 singletons
Tue Dec 13 21:14:43 2011 pass 3: found 1637660 singletons
Tue Dec 13 21:15:34 2011 pass 4: found 415448 singletons
Tue Dec 13 21:17:20 2011 pruned dataset has 61959884 relations and 51582595 large ideals
Tue Dec 13 21:17:21 2011 reading all ideals from disk
Tue Dec 13 21:18:01 2011 memory use: 2336.6 MB
Tue Dec 13 21:18:14 2011 keeping 51375329 ideals with weight <= 200, target excess is 323344
Tue Dec 13 21:18:26 2011 commencing in-memory singleton removal
Tue Dec 13 21:18:37 2011 begin with 61959884 relations and 51375329 unique ideals
Tue Dec 13 21:20:47 2011 reduce to 61818281 relations and 51233646 ideals in 11 passes
Tue Dec 13 21:20:47 2011 max relations containing the same ideal: 200
Tue Dec 13 21:21:37 2011 removing 9702740 relations and 7702741 ideals in 2000000 cliques
Tue Dec 13 21:21:40 2011 commencing in-memory singleton removal
Tue Dec 13 21:21:50 2011 begin with 52115541 relations and 51233646 unique ideals
Tue Dec 13 21:23:17 2011 reduce to 51192976 relations and 42574151 ideals in 9 passes
Tue Dec 13 21:23:17 2011 max relations containing the same ideal: 186
Tue Dec 13 21:23:57 2011 removing 7426975 relations and 5426975 ideals in 2000000 cliques
Tue Dec 13 21:24:01 2011 commencing in-memory singleton removal
Tue Dec 13 21:24:08 2011 begin with 43766001 relations and 42574151 unique ideals
Tue Dec 13 21:25:10 2011 reduce to 43052619 relations and 36407109 ideals in 8 passes
Tue Dec 13 21:25:10 2011 max relations containing the same ideal: 167
Tue Dec 13 21:25:45 2011 removing 6751847 relations and 4751847 ideals in 2000000 cliques
Tue Dec 13 21:25:48 2011 commencing in-memory singleton removal
Tue Dec 13 21:25:54 2011 begin with 36300772 relations and 36407109 unique ideals
Tue Dec 13 21:26:44 2011 reduce to 35602587 relations and 30927673 ideals in 8 passes
Tue Dec 13 21:26:44 2011 max relations containing the same ideal: 144
Tue Dec 13 21:27:13 2011 removing 6427785 relations and 4427785 ideals in 2000000 cliques
Tue Dec 13 21:27:16 2011 commencing in-memory singleton removal
Tue Dec 13 21:27:21 2011 begin with 29174802 relations and 30927673 unique ideals
Tue Dec 13 21:28:00 2011 reduce to 28429780 relations and 25717778 ideals in 8 passes
Tue Dec 13 21:28:00 2011 max relations containing the same ideal: 123
Tue Dec 13 21:28:23 2011 removing 6113219 relations and 4113219 ideals in 2000000 cliques
Tue Dec 13 21:28:26 2011 commencing in-memory singleton removal
Tue Dec 13 21:28:29 2011 begin with 22316561 relations and 25717778 unique ideals
Tue Dec 13 21:29:06 2011 reduce to 21407079 relations and 20637037 ideals in 10 passes
Tue Dec 13 21:29:06 2011 max relations containing the same ideal: 102
Tue Dec 13 21:29:23 2011 removing 1759028 relations and 1364066 ideals in 394962 cliques
Tue Dec 13 21:29:25 2011 commencing in-memory singleton removal
Tue Dec 13 21:29:28 2011 begin with 19648051 relations and 20637037 unique ideals
Tue Dec 13 21:29:51 2011 reduce to 19558422 relations and 19181783 ideals in 7 passes
Tue Dec 13 21:29:51 2011 max relations containing the same ideal: 96
Tue Dec 13 21:30:12 2011 relations with 0 large ideals: 11658
Tue Dec 13 21:30:12 2011 relations with 1 large ideals: 2349
Tue Dec 13 21:30:12 2011 relations with 2 large ideals: 34734
Tue Dec 13 21:30:12 2011 relations with 3 large ideals: 266600
Tue Dec 13 21:30:12 2011 relations with 4 large ideals: 1134118
Tue Dec 13 21:30:12 2011 relations with 5 large ideals: 2922338
Tue Dec 13 21:30:12 2011 relations with 6 large ideals: 4772368
Tue Dec 13 21:30:12 2011 relations with 7+ large ideals: 10414257
Tue Dec 13 21:30:12 2011 commencing 2-way merge
Tue Dec 13 21:30:33 2011 reduce to 13528344 relation sets and 13151705 unique ideals
Tue Dec 13 21:30:33 2011 commencing full merge
Tue Dec 13 21:37:47 2011 memory use: 1708.7 MB
Tue Dec 13 21:37:49 2011 found 6716964 cycles, need 6669905
Tue Dec 13 21:37:52 2011 weight of 6669905 cycles is about 567177283 (85.04/cycle)
Tue Dec 13 21:37:52 2011 distribution of cycle lengths:
Tue Dec 13 21:37:52 2011 1 relations: 504298
Tue Dec 13 21:37:52 2011 2 relations: 636713
Tue Dec 13 21:37:52 2011 3 relations: 698069
Tue Dec 13 21:37:52 2011 4 relations: 683235
Tue Dec 13 21:37:52 2011 5 relations: 658665
Tue Dec 13 21:37:52 2011 6 relations: 602260
Tue Dec 13 21:37:52 2011 7 relations: 537367
Tue Dec 13 21:37:52 2011 8 relations: 465253
Tue Dec 13 21:37:52 2011 9 relations: 396310
Tue Dec 13 21:37:52 2011 10+ relations: 1487735
Tue Dec 13 21:37:52 2011 heaviest cycle: 25 relations
Tue Dec 13 21:37:55 2011 commencing cycle optimization
Tue Dec 13 21:38:13 2011 start with 43760349 relations
Tue Dec 13 21:40:17 2011 pruned 1604684 relations
Tue Dec 13 21:40:18 2011 memory use: 1267.6 MB
Tue Dec 13 21:40:18 2011 distribution of cycle lengths:
Tue Dec 13 21:40:18 2011 1 relations: 504298
Tue Dec 13 21:40:18 2011 2 relations: 651686
Tue Dec 13 21:40:18 2011 3 relations: 726090
Tue Dec 13 21:40:18 2011 4 relations: 708704
Tue Dec 13 21:40:18 2011 5 relations: 686733
Tue Dec 13 21:40:18 2011 6 relations: 623267
Tue Dec 13 21:40:18 2011 7 relations: 554164
Tue Dec 13 21:40:18 2011 8 relations: 473805
Tue Dec 13 21:40:18 2011 9 relations: 398676
Tue Dec 13 21:40:18 2011 10+ relations: 1342482
Tue Dec 13 21:40:18 2011 heaviest cycle: 25 relations
Tue Dec 13 21:40:39 2011 RelProcTime: 3718[/code] Two notes on that run:
* only 1 free relation was found, because the free relations had already been found by a previous run, which put the 4 GB machine running a 64-bit OS and msieve into thrashing. So I killed msieve and restarted using -mb 3072, which coerced it into performing singleton removal on disk.
* the "commencing relation filtering with target density 85.00" trace is the result of the patch I posted in comment #57 of this thread. I'm using it on most of my copies of msieve SVN, hence the "688M" revision number. |
Jasonp, did you see Lionel's question? I'm also interested in your answer.
Meanwhile, will the next version of msieve support AVX where applicable? |
I saw Lionel's question but have been very busy lately. To restart the merge phase, you would have to remember the current list of relations for each column of the matrix as well as the list of ideals that column is tracking. It's straightforward to do so, but maybe it would be better to improve the merge phase itself so that you don't have to guess at the best target density in the first place. I'll think about it.
Carlos, I don't see how AVX would be able to improve NFS postprocessing. Nothing in the linear algebra can make effective use of longer vectors. I've tried several times and making the LA use longer vectors just slows it down because you run out of cache too quickly. On the other hand, akruppa's work on Block Lanczos definitely benefits from extremely long vectors, so it's entirely possible I'm missing something. |
[QUOTE=jasonp;286012]I saw Lionel's question but have been very busy lately. To restart the merge phase, you would have to remember the current list of relations for each column of the matrix as well as the list of ideals that column is tracking. It's straightforward to do so, but maybe it would be better to improve the merge phase itself so that you don't have to guess at the best target density in the first place. I'll think about it.
Carlos, I don't see how AVX would be able to improve NFS postprocessing. Nothing in the linear algebra can make effective use of longer vectors. I've tried several times and making the LA use longer vectors just slows it down because you run out of cache too quickly. On the other hand, akruppa's work on Block Lanczos definitely benefits from extremely long vectors, so it's entirely possible I'm missing something.[/QUOTE] What about QS? Would AVX help that? |
[QUOTE=jasonp;286012]
Carlos, I don't see how AVX would be able to improve NFS postprocessing. Nothing in the linear algebra can make effective use of longer vectors. I've tried several times and making the LA use longer vectors just slows it down because you run out of cache too quickly. On the other hand, akruppa's work on Block Lanczos definitely benefits from extremely long vectors, so it's entirely possible I'm missing something.[/QUOTE] But did you try your code on a Sandy Bridge processor? Maybe now you don't run out of cache. Please take a look at [URL]http://software.intel.com/en-us/avx/[/URL] and [URL="http://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors/"]this article[/URL]. |
My main development machine dates from 2007, so no :)
The linear algebra uses a block size of 64, and the modifications I've tried (at least twice now, including once with MPI) increase that to 128, 192 or 256 depending on compile options. I also changed the use of MMX to SSE2, since that's a natural fit at the 128 or 256 block size. Scaling up the block size means reducing the number of matrix multiplies by a similar factor, with each multiply dealing with a scaled-up amount of data. The big question is how much longer the multiply takes.

On all the machines that both I and Greg have tried the modified code on, the best we could ever get was an approximately equal runtime between size 64 and size 128. Greg also tried the larger block size code on some Teragrid nodes last year, with the same result. I still have the old code lying around, and it should be a drop-in replacement for the Lanczos code in use now (the changes are really messy but limited to the LA source only), so I can dig it out if anyone wants. New performance measurements will be a bit tricky, since the mainline has gotten better since those changes were written.

Edit: Henry, bsquared is now the authority on high-performance QS. I don't think AVX would make QS appreciably faster. |
[QUOTE=jasonp;286054]I don't think AVX would make QS appreciably faster[/QUOTE]
I don't readily see it either. The most obvious place it *could* help is in the large prime bucket sieve, which is 90% assembly code at this point. In there, I compute the updates to 4 roots at a time (snippet below):
[CODE]#define COMPUTE_4_PROOTS(j) \
    ASM_G ( \
        "movdqa (%%rax), %%xmm3 \n\t"  /* xmm3 = next 4 values of rootupdates */ \
        "movdqa (%%rcx), %%xmm1 \n\t"  /* xmm1 = next 4 values of root1 */ \
        "psubd %%xmm3, %%xmm1 \n\t"    /* root1 -= ptr */ \
        "movdqa (%%rdx), %%xmm2 \n\t"  /* xmm2 = next 4 values of root2 */ \
        "psubd %%xmm3, %%xmm2 \n\t"    /* root2 -= ptr */ \
        "pxor %%xmm4, %%xmm4 \n\t"     /* zero xmm4 */ \
        "pxor %%xmm5, %%xmm5 \n\t"     /* zero xmm5 */ \
        "movdqa (%%rbx), %%xmm0 \n\t"  /* xmm0 = next 4 primes */ \
        "pcmpgtd %%xmm1, %%xmm4 \n\t"  /* signed comparison: 0 > root1? */ \
        "pcmpgtd %%xmm2, %%xmm5 \n\t"  /* signed comparison: 0 > root2? */ \
        "pand %%xmm0, %%xmm4 \n\t"     /* copy prime to overflow locations */ \
        "pand %%xmm0, %%xmm5 \n\t"     /* copy prime to overflow locations */ \
        "paddd %%xmm4, %%xmm1 \n\t"    /* selectively add back prime (modular subtract) */ \
        "movdqa %%xmm1, (%%rcx) \n\t"  /* save new
root1 values */[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][FONT=Consolas][SIZE=2] \[/SIZE][/FONT] [/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515] "paddd %%xmm5, %%xmm2 \n\t"[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000]/* selectively add back prime (modular subtract) */[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][FONT=Consolas][SIZE=2] \[/SIZE][/FONT] [/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515] "movdqa %%xmm2, (%%rdx) \n\t"[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000]/* save new root2 values */[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][FONT=Consolas][SIZE=2] \[/SIZE][/FONT] [/SIZE][/FONT][/CODE] Where AVX *could* provide direct benefit is in doing 8 roots at a time instead of 4, using the same instructions. But for whatever reason, AVX doesn't extend the padd/pxor/pand instructions to 256 bits. They get 3 operand variants, but not 256 bit variants. So it's not immediately clear what benefit they will be. Even if there were the correct instructions, there simply isn't any way to speed up the writes to random memory locations more than I already have (and others before me, by inventing things like the bucket sieve), at least as far I can see. So at best AVX could only provide a small percentage increase in speed, rather than the 2x increases that some other code (with different requirements) may be able realize. |
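[Editor's note] For readers who don't speak SSE2, here is a plain-C sketch of what the macro above computes, four lanes at a time. The function name and the scalar loop are mine, not msieve's: each root has the update subtracted, and the prime is added back, branch-free, only in the lanes where the subtraction went negative — the job of the pcmpgtd/pand/paddd triple.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar sketch of the COMPUTE_4_PROOTS macro (names are mine, not
 * msieve's).  For each of 4 (prime, root) pairs, compute
 * root = (root - upd) mod prime without branching: the comparison
 * yields an all-ones mask where the subtraction went negative
 * (pcmpgtd), the mask selects the prime (pand), and the prime is
 * added back only in those lanes (paddd). */
static void compute_4_proots_scalar(int32_t root1[4], int32_t root2[4],
                                    const int32_t upd[4],
                                    const int32_t prime[4])
{
    for (int i = 0; i < 4; i++) {
        int32_t r1 = root1[i] - upd[i];      /* psubd */
        int32_t r2 = root2[i] - upd[i];
        int32_t m1 = -(int32_t)(r1 < 0);     /* pcmpgtd: 0 > r1 ? all-ones : 0 */
        int32_t m2 = -(int32_t)(r2 < 0);
        root1[i] = r1 + (prime[i] & m1);     /* pand + paddd */
        root2[i] = r2 + (prime[i] & m2);
    }
}
```

This assumes 0 <= root < prime and 0 <= upd < prime on entry, so a single conditional add is enough to renormalize.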
I need to perform polynomial selection for the remaining C151 cofactor of 439^89-1 (the full-sized number is 236 digits)... so I fired up msieve trunk r706 on a GT 540M.
I ran -np1 and -np2 in two different processes. I killed msieve -np1 after ~23 hours of wall clock time; it had produced ~48e6 bytes of hits for the 12-5120 range of a5 (!). So far, msieve -np2 has taken ~25 hours of CPU time on one hyperthread of a Core i7-2670QM @ 2.2 GHz, and has boiled down the hits for 12-2000, producing several polynomials with e ~4.9e-12 and an outlier at e ~5.25e-12. Perhaps more to come.

This means that for a C151:
* the GPU polynomial selection stage 1 is so fast, even on a GPU significantly less powerful than the top of the line, that the CPU stage 2 cannot digest the stage 1 output in real time (on a single thread) :smile:
* the best e values are somewhat higher than the estimate printed by msieve at the beginning, which was, IIRC, ~4.5e-12.

For fun, I think I should try running 24h of GPU-based polynomial selection on one of the TI-Z80/TI-68k 512-bit RSA signing keys, and compare the output with the polynomial we used for sieving on RSALS... |
I can't find Batalov's comment about this, but I know I read it somewhere. It's about how to resume msieve jobs with fewer disk transfers. For RSALS post-processing I usually have two batch files: one that starts the job and another that resumes it in case something happens and the machine is rebooted.
The first batch file is [code] start /low /min msieve.exe -i 383_89_minus1.ini -s 383_89_minus1.dat -nf 383_89_minus1.fb -v -nc -t 4 [/code]and the second (resume work) is [code] start /low /min msieve.exe -i 383_89_minus1.ini -s 383_89_minus1.dat -nf 383_89_minus1.fb -nc3 -ncr -v -t 4 [/code] Are the flags correct for the second batch? I always keep all the files in the folder except the gzipped one.

Thank you in advance,

Carlos |
[QUOTE=Batalov;277194]Clarification: I did link to 1.2.3 and observed Lionel's effect, and I also linked to 1.2.5 as well as used NO_ZLIB=1. Ran a not-so-small filtering job 5 times for each scenario. (If you are using a NAS drive, do that too, to remove variability.) Ran valgrind (callgrind) for a while only to see that gzread was called an inordinate number of times (see the quoted [URL="http://stackoverflow.com/questions/2832485/zlib-gzgets-extremely-slow"]answer[/URL]; it is indeed one gzread call per byte! He's right!)

Result: 1.2.3 is bad (about as bad as Lionel described - almost 2 times slower), but 1.2.5 has the same speed as NO_ZLIB=1 for plain files and negligible overhead on a gzipped .dat file (which on some systems may actually be a negative overhead, not to mention the smaller disk footprint). Use zlib 1.2.5.[/QUOTE]

[URL="http://zlib.net/"]zlib[/URL] 1.2.6 is out, with many changes and improvements over 1.2.5. |
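[Editor's note] To see why zlib 1.2.3's gzgets was so slow, here is a minimal sketch of the difference between fetching one byte per call into a decompressor-like source and fetching a block at a time (roughly the buffering that later zlib versions added internally). All names here (`source_t`, `source_read`, etc.) are hypothetical stand-ins, not the real zlib API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A stand-in for a decompressor: each call has fixed overhead, so we
 * count calls.  zlib 1.2.3's gzgets issued one such call per byte;
 * later versions fill an internal block buffer first. */
typedef struct {
    const char *data;   /* pretend-compressed source */
    size_t pos, len;    /* read cursor and total length */
    size_t calls;       /* number of "expensive" reads issued */
} source_t;

static size_t source_read(source_t *s, char *dst, size_t n)
{
    size_t avail = s->len - s->pos;
    if (n > avail)
        n = avail;
    memcpy(dst, s->data + s->pos, n);
    s->pos += n;
    s->calls++;
    return n;
}

/* Consume the whole source one byte per call (the 1.2.3 pattern). */
static size_t drain_per_byte(source_t *s)
{
    char c;
    size_t total = 0;
    while (source_read(s, &c, 1) == 1)
        total++;
    return total;
}

/* Consume it through a 4 KB block buffer (the buffered pattern). */
static size_t drain_buffered(source_t *s)
{
    char buf[4096];
    size_t got, total = 0;
    while ((got = source_read(s, buf, sizeof buf)) > 0)
        total += got;   /* bytes would be handed out from buf here */
    return total;
}
```

zlib's real fix is internal and more involved; this only illustrates the call-count effect visible in callgrind.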
Yup. And this time, unlike what occurred for zlib 1.2.5, zlib 1.2.6 immediately made its way to Debian unstable, from where it was pushed into testing :smile:
|
Hello,
I've been factoring small SNFS composites from factordb with my old P4 (all it can usefully cope with) and found that a few had had very little ECM done on them. This can cause msieve problems, as in the log below:

[code]
Fri Mar 23 18:05:57 2012  Msieve v. 1.49 (SVN 18sep2011)
Fri Mar 23 18:05:57 2012  random seeds: ef4ea53f 50c26a13
Fri Mar 23 18:05:57 2012  factoring 392897621444950534760946692014491292757040653381539546673010579841680740738641371664069 (87 digits)
Fri Mar 23 18:05:58 2012  searching for 15-digit factors
Fri Mar 23 18:05:59 2012  commencing number field sieve (81-digit input)
Fri Mar 23 18:05:59 2012  warning: NFS input not found in factor base file
Fri Mar 23 18:05:59 2012  p7 factor: 1453411
Fri Mar 23 18:05:59 2012  c81 factor: 270327953651754758124815824301929249714664780562098089716543069951776022569418679
Fri Mar 23 18:05:59 2012  elapsed time 00:00:02
[/code]
Apart from "don't do that then", is there a way to make msieve work in this case?

Chris |
Not as written; small factors are always pulled out of any input, and I don't know what would happen if you forced NFS to happen anyway. Fortunately even using QS you should be done with a c81 in ~10 minutes (even faster with YAFU's QS).
|
The thing that's bitten me a few times is that factMsieve.pl didn't notice the messages, so it kept sieving relations and calling msieve, expecting it to create the cols file. That wasted several hours overnight since I wasn't watching the screen.

I'll probably update the script that calls factMsieve.pl to run ECM to 20 digits first; that should stop it. I've also had a few cases where ECM didn't find the factor, and I've had to re-run the square root stage for a few numbers until the square root worked. And I've had cases where, after removing all the algebraic and small factors from the composite provided by factordb, I was left with a prime number. But that's not msieve's fault.

Chris |
Linear algebra loop: I've just come home and found the following screen output:
[code]
=>nice -n 19 "/home/chris/ggnfs/bin/msieve" -s d1532_34_2.dat -l ggnfs.log -i d1532_34_2.ini -v -nf d1532_34_2.fb -t 2 -nc2
Msieve v. 1.49 (SVN 18sep2011)
Tue May  1 12:48:30 2012
random seeds: a5a730ef e813bf12
factoring 75535181078074845018437161937882151367685849773415089372524470237498845846659369667108907186263191136699 (104 digits)
searching for 15-digit factors
commencing number field sieve (104-digit input)
R0: -30343810840966799284043776
R1: 1
A0: 1
A1: 0
A2: 766
A3: 0
A4: 1173512
skew 0.03, size 5.861e-12, alpha 1.196, combined = 8.786e-08
rroots = 0
commencing linear algebra
read 61231 cycles
cycles contain 186631 unique relations
read 186631 relations
using 20 quadratic characters above 33539634
building initial matrix
memory use: 21.7 MB
read 61231 cycles
matrix is 61053 x 61231 (18.6 MB) with weight 5560889 (90.82/col)
sparse part has weight 4193873 (68.49/col)
filtering completed in 2 passes
matrix is 61050 x 61228 (18.6 MB) with weight 5560797 (90.82/col)
sparse part has weight 4193853 (68.50/col)
matrix starts at (0, 0)
matrix is 61050 x 61228 (18.6 MB) with weight 5560797 (90.82/col)
sparse part has weight 4193853 (68.50/col)
saving the first 48 matrix rows for later
matrix includes 64 packed rows
matrix is 61002 x 61228 (17.3 MB) with weight 4389705 (71.69/col)
sparse part has weight 3921569 (64.05/col)
using block size 24400 for processor cache size 1024 kB
commencing Lanczos iteration
memory use: 12.7 MB
linear algebra at 78.7%, ETA 0h 0m61228 dimensions (78.7%, ETA 0h 0m)
linear algebra completed 21442763 of 61228 dimensions (35021.2%, ETA 925h 2m)
[/code]
Notice that the number of completed dimensions is much larger than the total. I've killed it since it was going nowhere, but saved all the files it created. Any idea what was wrong?

Chris

PS. The "SVN 18sep2011" is there because I don't have SVN set up on this system; I tweaked the makefile to make it say 18sep2011 because that's when I downloaded msieve. |
Looks like the LA is not converging.
It's a small enough dataset that I can download it if you post it somewhere, but offhand I'm wondering how stable the machine is. A small matrix like this does not have error checking enabled, so if it's a transient glitch you can try rerunning. |
I tried a rerun and it worked. So it looks like a transient glitch in my system.
It's the first such on this system that I know of. But there's always a first time for everything. Chris |
My last post was tempting fate. It locked up hard so I had to power cycle it to recover. So don't spend too much time looking for a bug in msieve.
Chris |
This is the spring effect. Summer is coming, and ambient temperatures will only be climbing. Lower all frequencies preventively; it's also a good time to blow the dust out of the system. (For example, every five [~monthly] blowing cycles I go for a deep cleaning: taking apart the Tuniq Tower, cleaning all the cages in the Antec 900... Now that the GPU is added, I'll have a look at how deep I'd want to go with its cooler. *Notes to myself.)
|
[QUOTE=Batalov;298193]This is the spring effect.[/QUOTE]
You hit a spring with a hammer, the hammer jumps on your head. |
| All times are UTC. The time now is 04:52. |
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.