mersenneforum.org  

2010-08-13, 19:31   #45
Jeff Gilchrist
Jun 2003
Ottawa, Canada
1173₁₀ Posts

Quote:
Originally Posted by Batalov View Post
Yeah, but what's the platform and the zlib version?

Also, there may be a mismatch between zlib.h and libz.so ...? Could you attempt to build zlib , copy zlib.h to msieve/trunk/include and libz.a to msieve/trunk/ and link as ./libz.a ? That's what I would do (and did, for test purposes).
It is a CentOS 5.1 system with zlib 1.2.3.

I rebuilt msieve using a local build of zlib 1.2.5, copying zlib.h and zconfig.h into msieve-1.46/include and linking directly with libz.a.

Re-running it still stops very quickly. Do I need to try this with the latest trunk or should it also work with v1.46?

Jeff.

2010-08-13, 20:52   #46
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
3⁶·13 Posts

Very peculiar. Is it the same position in the file?
Code:
Thu Aug 12 12:14:43 2010  found 5716 hash collisions in 514614 relations
This is way under 4Gb or even 2Gb.
Could you check if there's something going on around the 514615-th line of the .dat file?
Code:
tail +514614 *.dat |head |od -c
__________

EDIT: actually, this may be the 13Gb size mod(4Gb). Could you test ls -l *.dat ; then
dc
<its_size> 2 32 ^ % p

then head -514615 *.dat | wc ==> do these numbers match?


If so, then libz itself is still miscompiled. It is possible that libz's own ./configure guesses your settings incorrectly. Could you try this: in zlib-1.2.5 folder make clean; ./configure; (possibly ./configure --64 ); check what's going on with flags -D_LARGEFILE64_SOURCE=1 -D_FILE_OFFSET_BITS=64 (are they there?); make check... hmmm. dunno what else. Use NO_ZLIB, I guess.

Last fiddled with by Batalov on 2010-08-13 at 21:13

2010-08-13, 21:26   #47
em99010pepe
Sep 2004
2·5·283 Posts

Quote:
Originally Posted by em99010pepe View Post
Meanwhile I started a new LA with 1.46 windows 64-bit version, let's see what happens. So far no problems.
Finished that task, no problem this time.

2010-08-13, 23:57   #48
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
22405₈ Posts

Even though I've done this before, I've just now concluded a test with my local archival set for 5,391- (which was a gnfs-161 with 10.7Gb relation file, stored as a 5.3Gb gzipped file): first as is (with the gzipped test.dat.gz), then with ungzipped plain file. Used the latest SVN 374 (sans gzdirect, i.e. with a hand-written alternative). All passed here in both tests: 100122031 unique relations were read from disk and started to filter.

Just a hunch, kind of a last straw (out of ideas): Is that CentOS 32-bit? I cannot debug a similar situation (I don't have such a system), but it should still be debuggable and it should work. It's not a problem; it just may need care in configuring.

2010-08-14, 00:45   #49
Jeff Gilchrist
Jun 2003
Ottawa, Canada
2225₈ Posts

Quote:
Originally Posted by Batalov View Post
Very peculiar. Is it the same position in the file?
Code:
Thu Aug 12 12:14:43 2010  found 5716 hash collisions in 514614 relations
This is way under 4Gb or even 2Gb.
Could you check if there's something going on around the 514615-th line of the .dat file?

tail +514614 *.dat |head |od -c
I'm not sure if you have a special version of tail but I get an error:

tail +514614 599_83_minus1.dat |head |od -c
tail: cannot open `+514614' for reading: No such file or directory

If you want to see the line at 514614 and a little beyond I tried:
head -n 514617 599_83_minus1.dat | tail -n 5

Code:
-26867375,13302087:ab58255,154AB,2ED6D5,5083D5,133BFB3,67,593,815,CC7,2:ef192d5,2191d6f,35BF,4D039,ABFB9,D,E9,409,2,109E473
6350755,11604967:2069,46BDF,631ED,301571,53EDF3,1CF9,3,3,3,3,3,3,3,17,125,1BB,A97,2,2:1559b675,4781ED,841D11,9DCD27,B94C4D,17,2,109E473
-47745497,17042437:12a773dd,37a2213,4B829,4F51AB,1C98391,9D9,3,3,7,7,3B,71,2:15af169,330dfdf5,2ABB,55C37,3D6EF7,B,B,11,83,2,109E473
-30873894,1144465:29C7B,7E523,11CE4F,263273,67F8BF,9C11BD,2E5A181:16b0c2b,15a76897,74AB,12143,38ED99,B,B,D,1BB,109E473
315644,773821:ee40ef3,200E1,EBC91,1475A5,19FEEF5,AE5,851,C7,1B1:B98D,1DF81,30EBF,D5689,D,3B,6B,BF,1CD,109E473
The line itself in question:
head -n 514615 599_83_minus1.dat | tail -n 1 | od -c

Code:
0000000   -   4   7   7   4   5   4   9   7   ,   1   7   0   4   2   4
0000020   3   7   :   1   2   a   7   7   3   d   d   ,   3   7   a   2
0000040   2   1   3   ,   4   B   8   2   9   ,   4   F   5   1   A   B
0000060   ,   1   C   9   8   3   9   1   ,   9   D   9   ,   3   ,   3
0000100   ,   7   ,   7   ,   3   B   ,   7   1   ,   2   :   1   5   a
0000120   f   1   6   9   ,   3   3   0   d   f   d   f   5   ,   2   A
0000140   B   B   ,   5   5   C   3   7   ,   3   D   6   E   F   7   ,
0000160   B   ,   B   ,   1   1   ,   8   3   ,   2   ,   1   0   9   E
0000200   4   7   3  \n
0000204
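
For anyone whose tail rejects the old `+N` form (the POSIX spelling is `tail -n +N`), here is a minimal Python sketch of the same inspection. The helper name `show_lines` is an illustrative assumption, not part of msieve or GGNFS; it reads in binary mode so that stray bytes stay visible, roughly as `od -c` would show them.

```python
from itertools import islice


def show_lines(path, first, count=1):
    """Return `count` raw lines starting at 1-based line number `first`.

    Hypothetical helper for inspecting a relation file; the lines are
    returned as bytes so NULs, CRs and other odd characters are not hidden.
    """
    with open(path, "rb") as fh:
        # islice skips the first (first - 1) lines without loading the file
        return list(islice(fh, first - 1, first - 1 + count))


# Example call (file name and line number taken from the post above):
#   for raw in show_lines("599_83_minus1.dat", 514615):
#       print(repr(raw))
```

Printing `repr(raw)` makes an unexpected `\r`, `\x00` or missing `\n` obvious at a glance.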
Quote:
Originally Posted by Batalov View Post
EDIT: actually, this may be the 13Gb size mod(4Gb). Could you test ls -l *.dat ; then
dc
<its_size> 2 32 ^ % p

then head -514615 *.dat | wc ==> do these numbers match?
13968122545 Aug 13 17:47 599_83_minus1.dat

13968122545 % 2^32 = 1,083,220,657

head -514615 599_83_minus1.dat | wc
514615 514615 62416924
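
The same check can be written as a short Python sketch. The two constants are the numbers reported just above; the function name `wrapped` is an illustrative assumption. It asks whether the number of bytes read before the stop equals the file size after a 32-bit (or 31-bit) counter overflow.

```python
FILE_SIZE = 13968122545   # bytes, from `ls -l` above
BYTES_READ = 62416924     # bytes in the first 514615 lines, from `wc` above


def wrapped(size, bits=32):
    """Value an unsigned counter of `bits` bits would hold after overflowing."""
    return size % (1 << bits)


for bits in (31, 32):
    remainder = wrapped(FILE_SIZE, bits)
    # A match would point at a truncated file offset somewhere in the read path
    print(bits, remainder, remainder == BYTES_READ)
```

Neither remainder equals the byte count, so a wrapped offset does not explain the early stop.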

2010-08-14, 01:14   #50
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
3⁶×13 Posts

Thanks. Nothing matched the hunches though. It's not a mod 2^32 casting and it doesn't look like a defective line. So, it reads 62Mb of data and miraculously stops reading.

I have another (rhetorical) question: why did one version complain about these relations (just two before stopping reading):
Code:
Thu Aug 12 12:14:33 2010  error -15 reading relation 14690
Thu Aug 12 12:14:39 2010  error -9 reading relation 290519
Thu Aug 12 12:14:43 2010  found 5716 hash collisions in 514614 relations
while the other didn't! The errors are either in the file or they aren't. Or are they in different places all the time?

Could you please email me the ggnfs.log files for all runs (were there at least three?). I'll try to look for a pattern. Thx. --Serge

2010-08-14, 02:16   #51
jasonp
Tribal Bullet
Oct 2004
3,541 Posts

If the problem only needs <1M relations to manifest, that would only be a 50-100MB download...

2010-08-14, 02:25   #52
jrk
May 2008
3·5·73 Posts

Quote:
Originally Posted by Batalov View Post
I have another (rhetorical) question: why did one version complain about these relations (just two before stopping reading):
Code:
Thu Aug 12 12:14:33 2010  error -15 reading relation 14690
Thu Aug 12 12:14:39 2010  error -9 reading relation 290519
Thu Aug 12 12:14:43 2010  found 5716 hash collisions in 514614 relations
while the other didn't! The errors are either in the file or they aren't. Or are they in different places all the time?
Serge, doesn't the program check for a msieve.dat.gz file and read from it if it exists (instead of the normal msieve.dat)? Could Jeff happen to have an old .gz dat file lying around with 514614 relations in it?
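
One quick way to test this theory is to count the lines in each candidate file, sketched below in Python. The helper name `count_lines` is an illustrative assumption, and it counts every line (one relation per line, plus any other lines the file contains), so the result may differ slightly from the relation count msieve logs.

```python
import gzip


def count_lines(path):
    """Count the lines in a plain or gzip-compressed text file.

    Hypothetical helper: the choice between open() and gzip.open() is
    made from the file name alone, which is an assumption of this sketch.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as fh:
        # Stream the file so a multi-gigabyte .dat never sits in memory
        return sum(1 for _ in fh)


# Example call, assuming both files sit in the working directory:
#   print(count_lines("msieve.dat.gz"), count_lines("msieve.dat"))
```

A count near 514614 for the .gz file would be consistent with msieve having read that file rather than the 13GB one.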

2010-08-14, 02:56   #53
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
10010100000101₂ Posts

That's a possibility. But it would have created a ton of errors if it were not from this number. Maybe it is a partial file from some collaborator.

Jeff?
___

I'll put a warning if both files exist (or should it be a complete stop for the operator's decision? the program cannot decide which of the files is better; larger is not necessarily better).
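
As a rough illustration of the "warn" option, here is a minimal Python sketch. It is not msieve's code (msieve is written in C), and the function name `pick_relation_file` is hypothetical. It assumes the behaviour described in this thread, where the .gz file is read whenever it exists.

```python
import os
import warnings


def pick_relation_file(base="msieve.dat"):
    """Return the relation file to read, warning when both variants exist.

    Sketch only: prefers `base + ".gz"` over `base`, as described in
    this thread, and leaves the final decision to the operator.
    """
    plain, gz = base, base + ".gz"
    have_plain = os.path.exists(plain)
    have_gz = os.path.exists(gz)
    if have_plain and have_gz:
        # Size is no guide to which file is the right one, so only warn
        warnings.warn(f"both {plain} and {gz} exist; reading {gz}")
    if have_gz:
        return gz
    if have_plain:
        return plain
    raise FileNotFoundError(base)
```

Turning the warning into a hard stop would only need a `raise` in place of the `warnings.warn` call.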

Last fiddled with by Batalov on 2010-08-14 at 03:02

2010-08-14, 10:16   #54
Jeff Gilchrist
Jun 2003
Ottawa, Canada
3·17·23 Posts

Quote:
Originally Posted by Batalov View Post
I'll put a warning if both files exist (or should it be a complete stop for the operator's decision? the program cannot decide which of the files is better; larger is not necessarily better).
Good idea jrk! That was the problem. I had my big 13GB msieve.dat file, but there was also an msieve.dat.gz file that was just a partial contribution, so msieve was picking up only msieve.dat.gz and ignoring msieve.dat.

Good idea to put the warning in. Thanks for everyone's suggestions. Sorry for all the trouble.

Jeff.

2010-08-14, 10:22   #55
Jeff Gilchrist
Jun 2003
Ottawa, Canada
1173₁₀ Posts

Now that I built the matrix, I'm running into a problem with the MPI version (which may be the same as em99010pepe is seeing where it just stops after a while).

Using 36 CPUs with -nc2 6,6 I get:

Code:
linear algebra at 0.0%, ETA 36h41m
checkpointing every 196637 dimensions
linear algebra completed 586841 of 7417185 dimensions (7.9%, ETA 34h37m)    
error: corrupt state, please restart from checkpoint
received signal 15; shutting down
I can't see any sign of corruption in the msieve.log.mpi* log files; only the above message appears in stdout.

*UPDATE*: Crap, when I try to restart it gives me the corrupt message immediately:
Code:
restarting at iteration 9329 (dim = 589938)

error: corrupt state, please restart from checkpoint

Last fiddled with by Jeff Gilchrist on 2010-08-14 at 10:59

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.