#45
Jun 2003
Ottawa, Canada
1173₁₀ Posts
Quote:
I rebuilt msieve using a local build of zlib 1.2.5, copying zlib.h and zconfig.h into msieve-1.46/include and linking directly with libz.a. Re-running it, it still stops very quickly. Do I need to try this with the latest trunk, or should it also work with v1.46? Jeff.
#46
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
3⁶·13 Posts
Very peculiar. Is it the same position in the file?
Code:
Thu Aug 12 12:14:43 2010 found 5716 hash collisions in 514614 relations
Could you check if there's something going on around the 514615-th line of the .dat file?
Code:
tail +514614 *.dat |head |od -c
EDIT: actually, this may be the 13Gb size mod(4Gb). Could you test
ls -l *.dat ;
then
dc <its_size> 2 32 ^ % p
then
head -514615 *.dat | wc
==> do these numbers match? If so, then libz itself is still miscompiled.
It is possible that libz's own ./configure guesses your settings incorrectly. Could you try this: in the zlib-1.2.5 folder, run make clean; ./configure; (possibly ./configure --64 ); check what's going on with the flags -D_LARGEFILE64_SOURCE=1 -D_FILE_OFFSET_BITS=64 (are they there?); make check... hmmm. Dunno what else. Use NO_ZLIB, I guess.
Last fiddled with by Batalov on 2010-08-13 at 21:13
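For anyone who would rather script the size check than chain ls, dc and wc by hand, here is a minimal Python sketch of the same comparison. It assumes the relations sit in an uncompressed text file; the function names are made up for illustration and are not part of msieve or zlib.

```python
import os


def size_mod_4gib(path):
    """Return the file size in bytes reduced modulo 2**32."""
    return os.path.getsize(path) % (1 << 32)


def bytes_in_first_lines(path, n_lines):
    """Count the bytes occupied by the first n_lines lines of the file."""
    total = 0
    with open(path, "rb") as f:
        for i, line in enumerate(f):
            if i >= n_lines:
                break
            total += len(line)  # includes the trailing newline byte
    return total


def looks_like_32bit_wrap(path, n_lines):
    """True if the bytes read before stopping equal the size mod 2**32."""
    return size_mod_4gib(path) == bytes_in_first_lines(path, n_lines)
```

If `looks_like_32bit_wrap("some.dat", 514615)` came back true, that would point at a 32-bit file offset somewhere in the reading path; a false result only rules out that particular hunch.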
#47
Sep 2004
2·5·283 Posts
#48
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
22405₈ Posts
Even though I've done this before, I've just now concluded a test with my local archival set for 5,391- (which was a gnfs-161 with a 10.7Gb relation file, stored as a 5.3Gb gzipped file): first as is (with the gzipped test.dat.gz), then with the ungzipped plain file. I used the latest SVN 374 (sans gzdirect, i.e. with a hand-written alternative). Both tests passed here: 100122031 unique relations were read from disk and filtering started.
Just a hunch, kind of a last straw (I'm out of ideas): is that CentOS 32-bit? I cannot debug a similar situation (I don't have such a system), but it should still be debuggable and it should work. It's not a problem; it just may need care in configuring.
#49
Jun 2003
Ottawa, Canada
2225₈ Posts
Quote:
tail +514614 599_83_minus1.dat |head |od -c
tail: cannot open `+514614' for reading: No such file or directory
If you want to see the line at 514614 and a little beyond, I tried:
head -n 514617 599_83_minus1.dat | tail -n 5
Code:
-26867375,13302087:ab58255,154AB,2ED6D5,5083D5,133BFB3,67,593,815,CC7,2:ef192d5,2191d6f,35BF,4D039,ABFB9,D,E9,409,2,109E473
6350755,11604967:2069,46BDF,631ED,301571,53EDF3,1CF9,3,3,3,3,3,3,3,17,125,1BB,A97,2,2:1559b675,4781ED,841D11,9DCD27,B94C4D,17,2,109E473
-47745497,17042437:12a773dd,37a2213,4B829,4F51AB,1C98391,9D9,3,3,7,7,3B,71,2:15af169,330dfdf5,2ABB,55C37,3D6EF7,B,B,11,83,2,109E473
-30873894,1144465:29C7B,7E523,11CE4F,263273,67F8BF,9C11BD,2E5A181:16b0c2b,15a76897,74AB,12143,38ED99,B,B,D,1BB,109E473
315644,773821:ee40ef3,200E1,EBC91,1475A5,19FEEF5,AE5,851,C7,1B1:B98D,1DF81,30EBF,D5689,D,3B,6B,BF,1CD,109E473
head -n 514615 599_83_minus1.dat | tail -n 1 | od -c
Code:
0000000 - 4 7 7 4 5 4 9 7 , 1 7 0 4 2 4
0000020 3 7 : 1 2 a 7 7 3 d d , 3 7 a 2
0000040 2 1 3 , 4 B 8 2 9 , 4 F 5 1 A B
0000060 , 1 C 9 8 3 9 1 , 9 D 9 , 3 , 3
0000100 , 7 , 7 , 3 B , 7 1 , 2 : 1 5 a
0000120 f 1 6 9 , 3 3 0 d f d f 5 , 2 A
0000140 B B , 5 5 C 3 7 , 3 D 6 E F 7 ,
0000160 B , B , 1 1 , 8 3 , 2 , 1 0 9 E
0000200 4 7 3 \n
0000204
Quote:
13968122545 % 2^32 = 1,083,220,657
head -514615 599_83_minus1.dat | wc
514615 514615 62416924
#50
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
3⁶×13 Posts
Thanks. Nothing matched the hunches, though. It's not a mod 2^32 casting and it doesn't look like a defective line. So it reads 62Mb of data and miraculously stops reading.
I have another (rhetorical) question: why did one version complain about these relations (just two before it stopped reading):
Code:
Thu Aug 12 12:14:33 2010 error -15 reading relation 14690
Thu Aug 12 12:14:39 2010 error -9 reading relation 290519
Thu Aug 12 12:14:43 2010 found 5716 hash collisions in 514614 relations
Could you please email the ggnfs.log files (for all runs; were there at least three?) to me? I'll try to look for a pattern. Thx. --Serge
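Looking for a pattern across several logs is easier if the "error N reading relation M" lines are pulled out first. A minimal Python sketch, assuming the log lines look exactly like the ones quoted above; the helper names are hypothetical and the sketch makes no claim about what the individual error codes mean.

```python
import re
from collections import Counter

# Matches e.g. "error -15 reading relation 14690"
_ERR = re.compile(r"error (-?\d+) reading relation (\d+)")


def relation_errors(log_text):
    """Return (error_code, relation_index) pairs in order of appearance."""
    return [(int(code), int(rel)) for code, rel in _ERR.findall(log_text)]


def errors_by_code(log_text):
    """Count how many times each error code appears in the log."""
    return Counter(code for code, _ in relation_errors(log_text))
```

Running `relation_errors` over each run's log and comparing the relation indices would show whether the same relations are rejected every time or whether the failures move around between runs.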
#51
Tribal Bullet
Oct 2004
3,541 Posts
If the problem only needs <1M relations to manifest, that would only be a 50-100MB download...
#52
May 2008
3·5·73 Posts
Quote:
#53
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
10010100000101₂ Posts
That's a possibility. But it would have created a ton of errors if it were not from this number. Maybe it's a partial file from some collaborator.
Jeff?
___
I'll put in a warning if both files exist (or should it be a complete stop, leaving the decision to the operator? The program cannot decide which of the files is better; larger is not necessarily better).
Last fiddled with by Batalov on 2010-08-14 at 03:02
#54
Jun 2003
Ottawa, Canada
3·17·23 Posts
Quote:
Good idea to put the warning in. Thanks for everyone's suggestions. Sorry for all the trouble. Jeff.
#55
Jun 2003
Ottawa, Canada
1173₁₀ Posts
Now that I built the matrix, I'm running into a problem with the MPI version (which may be the same one em99010pepe is seeing, where it just stops after a while).
Using 36 CPUs with -nc2 6,6 I get:
Code:
linear algebra at 0.0%, ETA 36h41m
checkpointing every 196637 dimensions
linear algebra completed 586841 of 7417185 dimensions (7.9%, ETA 34h37m)
error: corrupt state, please restart from checkpoint
received signal 15; shutting down
*UPDATE*: Crap, when I try to restart it gives me the corrupt message immediately:
Code:
restarting at iteration 9329 (dim = 589938)
error: corrupt state, please restart from checkpoint
Last fiddled with by Jeff Gilchrist on 2010-08-14 at 10:59