mersenneforum.org  

Old 2009-11-17, 03:20   #23
Batalov
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

9477₁₀ Posts

The last crash is not very interesting.
All roots fail with these notes:
Code:
Mon Nov 16 07:01:53 2009 reading relations for dependency 1
Mon Nov 16 07:01:55 2009 read 2745464 cycles
Mon Nov 16 07:02:00 2009 cycles contain 8607833 unique relations
Mon Nov 16 07:03:45 2009 read 8607833 relations
Mon Nov 16 07:03:45 2009 number of relations is not even
Mon Nov 16 07:03:46 2009 reading relations for dependency 2
Mon Nov 16 07:03:48 2009 read 2747401 cycles
Mon Nov 16 07:03:53 2009 cycles contain 8610828 unique relations
Mon Nov 16 07:05:36 2009 read 8610828 relations
Mon Nov 16 07:05:56 2009 algebraic side is not a square!
Mon Nov 16 07:05:56 2009 reading relations for dependency 3
1. What if you tried an older binary?

2. Also check the .dat file for strange lines (but do not delete them, or the .dep will become incompatible).
awk -F: 'NF!=3 && !(NF==2 && $2=="")' your...dat
grep -v "^[-0-9A-Fa-f:,]" your...dat

(Of course the first line should pop out of these tests. It is "N ..........".)

3. Did you sieve with the latest sievers?
I wonder if this is because I "fixed" the "mpqs failed" bug. Symptoms: there will be relations that have the same primes in A or R sections.
Try on your...dat file:
Code:
 
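#!/usr/bin/perl -w

# Print every line in which a lowercase-hex value right after a ':' is
# followed by ',' and the same digits again.
while(<>) {
  print $_ if /:([0-9a-f]+),\1/;
}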
PM me if you find any lines like these.
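
For anyone without awk at hand, the field-count test from point 2 can be sketched in C. This is only an illustration of the same rule, not code from msieve; the function name `relation_line_is_suspicious` is made up here, and the line is assumed to have had its end-of-line characters removed already.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if `line` would be printed by the awk test, 0 otherwise.
 * A line is accepted when it has exactly 3 ':'-separated fields, or
 * exactly 2 fields with an empty second field. Everything else is flagged.
 * `line` must not contain the trailing newline. */
int relation_line_is_suspicious(const char *line)
{
    size_t num_fields = 1;          /* a line with no ':' is a single field */
    const char *last_colon = NULL;
    const char *p;

    assert(line != NULL);

    /* awk sets NF to 0 for an empty line, so it is flagged as well */
    if (*line == '\0')
        return 1;

    for (p = line; *p != '\0'; p++) {
        if (*p == ':') {
            num_fields++;
            last_colon = p;
        }
    }

    if (num_fields == 3)
        return 0;
    /* two fields are fine only if nothing follows the single ':' */
    if (num_fields == 2 && last_colon[1] == '\0')
        return 0;
    return 1;
}
```

As with the awk line, the first line of the file ("N ..........") is expected to be flagged.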

Last fiddled with by Batalov on 2009-11-17 at 03:26 Reason: while I was writing... doubled Greg's
Old 2009-11-17, 12:13   #24
Jeff Gilchrist
 
Jun 2003
Ottawa, Canada

3·17·23 Posts

I will try an older binary and see what happens.

Your grep line produced nothing. Your awk line produced this:

Code:
6271457,6292155:8bd91d1,1a50db01,2F45,9F7D,28AB6
-1284119,1398665:a4e6b6d,a3d37e9,2363,C9AF,1489C3,1933,5B3,5,7,7,47:1-50885,41286:1862ac23,7d0ee79,20FB,43403,14578D,1475395,11,2:d13ff5,ad18453,AD,5,5,5,43EA0D
28125995,8631287:2a9ec29,863b825,8551,CB29,FFF1,3FD1F,3,11,1F,101,35F:DE83,236EF3,314731222,798705:372ec05,73b52b5,FBD6B,1B83A1,3B63FB,1613,295,5,25:6e2d4a7,5CBF,72EF,D567,883,11,2B,2,2,2,2,5E4AB3
2655041,3207944:2288af7,44cee77,249B,28CC9,2D0BBF,1807,94D,3,3,7,47,
2144397,1
47612059,8753686:7aec3b7,8acbe3f,20309,342865,3E55FB,7CC073,3,43,2:395bc0d1410755,59527:d5510e3,1046cbdf,12FF5,130D27,273ECB,A75,BF,1AF:196b083,19AFE9,1C99,8BF,5,5,5,7,84885D
-6528060,3363431:20cc017,10430c03,11DA1,32
-13580215,5018876:1c749a5b,165b8415,268
-20853679,5295320:b62d6fb,1fdde25b,5D79,34781,4E8819,3,5,13,1F,1B1,259,2,
17273208,17023583:10dd9799,531c2f3,15745,19775,1B2DB,1C57DBD,1EBB
-14630629,2439270:8c957b1,ce8216d,ED65,1E0911,3BFFAD,9A204D,5,17,2:a609a156631,180874:826dfe9,d3c2973,2C93,118C7,C8207,1659F1,821,3,2:1c3c1b07,94872ad,6791,3,3,3,6FE47F
41901279,2127844:1896d3e7,c08cf51,61403,35C1AF,12149E1,59,8B,38F,
Your perl script produced about 36K of output. Do you want me to PM you that file?

I didn't do the sieving; it was done by RSALS, so I'm not sure which version of the siever was used.

Jeff.

Old 2009-11-17, 13:27   #25
jasonp
Tribal Bullet
 
Oct 2004

3,541 Posts

How would one get relations glued together like that? Appending a windows output file to a unix output file with different EOL characters? Appending to siever output that was interrupted so no final CR in the file? Lots of factorizations have a few lines like this; it doesn't explain the line with the dash though. That machine was probably super overclocked :)

Last fiddled with by jasonp on 2009-11-17 at 13:28
Old 2009-11-17, 13:57   #26
Jeff Gilchrist
 
Jun 2003
Ottawa, Canada

3×17×23 Posts

Quote:
Originally Posted by jasonp View Post
How would one get relations glued together like that? Appending a windows output file to a unix output file with different EOL characters? Appending to siever output that was interrupted so no final CR in the file? Lots of factorizations have a few lines like this; it doesn't explain the line with the dash though. That machine was probably super overclocked :)
The sieving was done on RSALS via BOINC so there were probably unix and Windows binaries producing output files and they may have just been concatenated together without any EOL processing. I'm just guessing since I only received the final .dat file to process.

An update: all the Windows binaries, including the one that Jason produced on sourceforge (done with cygwin?), crash, but the 64-bit Linux binary that I built with the 1.43 code did not crash and just completed, showing prp58 and prp100 factors. So the Linux code seems to handle this situation fine.

I ran the 64-bit Windows binary built with debug code; this is what I get.

Variables:
Code:
+		obj	0x0000000000649a50 {input=0x000000000012fbf0 "4119920635610832117027022146854969787051502053758103615653887536974906120680391680621246965735169915172641486888857683983247749384769531624818097656906464449" factors=0x0000000000000000 flags=1027 ...}	msieve_obj *
+		fb	0x000000000012d100 {rfb={...} afb={...} }	factor_base_t *
+		num_cycles_out	0x0000000000000000	unsigned int *
+		cycle_list_out	0x0000000000000000	la_col_t * *
+		num_relations_out	0x000000000012e234	unsigned int *
+		rlist_out	0x000000000012e278	relation_t * *
		compress	0	unsigned int
		dependency	36	unsigned int
+		cycle_list	0x0000000000000000 {data=??? weight=??? cycle={...} }	la_col_t *
+		rlist	0xcccccccccccccccc {a=??? b=??? rel_index=??? ...}	relation_t *
		num_cycles	0	unsigned int
		num_relations	3435973836	unsigned int
Call Stack:

Code:
>	msieve.exe!nfs_read_cycles(msieve_obj * obj=0x0000000000649a50, factor_base_t * fb=0x000000000012d100, unsigned int * num_cycles_out=0x0000000000000000, la_col_t * * cycle_list_out=0x0000000000000000, unsigned int * num_relations_out=0x000000000012e234, relation_t * * rlist_out=0x000000000012e278, unsigned int compress=0, unsigned int dependency=36)  Line 622 + 0x8 bytes	C
 	msieve.exe!nfs_find_factors(msieve_obj * obj=0x0000000000649a50, mp_t * n=0x000000000012fa00, factor_list_t * factor_list=0x000000000012f1b0)  Line 356	C
 	msieve.exe!factor_gnfs(msieve_obj * obj=0x0000000000649a50, mp_t * n=0x000000000012fa00, factor_list_t * factor_list=0x000000000012f1b0)  Line 160 + 0x1d bytes	C
 	msieve.exe!msieve_run_core(msieve_obj * obj=0x0000000000649a50, mp_t * n=0x000000000012fa00, factor_list_t * factor_list=0x000000000012f1b0)  Line 149 + 0x1d bytes	C
 	msieve.exe!msieve_run(msieve_obj * obj=0x0000000000649a50)  Line 241 + 0x1d bytes	C
 	msieve.exe!factor_integer(char * buf=0x000000000012fbf0, unsigned int flags=1027, char * savefile_name=0x0000000000645d61, char * logfile_name=0x0000000000645d4e, char * nfs_fbfile_name=0x0000000000645d29, unsigned int * seed1=0x000000000012fd94, unsigned int * seed2=0x000000000012fdb4, unsigned int max_relations=0, unsigned __int64 nfs_lower=0, unsigned __int64 nfs_upper=0, cpu_type cpu=cpu_core, unsigned int cache_size1=32768, unsigned int cache_size2=6291456, unsigned int num_threads=1, unsigned int mem_mb=0)  Line 183	C
 	msieve.exe!main(int argc=13, char * * argv=0x0000000000645c80)  Line 515	C
 	msieve.exe!__tmainCRTStartup()  Line 266 + 0x19 bytes	C
 	msieve.exe!mainCRTStartup()  Line 182	C
 	kernel32.dll!000000007794f56d() 	
 	[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]	
 	ntdll.dll!0000000077a83281()
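
A side note on the variable dump above: num_relations = 3435973836 is 0xCCCCCCCC, and rlist is 0xcccccccccccccccc. MSVC debug builds fill uninitialized stack memory with the byte 0xCC, so both values look like locals that were never assigned, which would fit num_cycles being 0. A minimal sketch of that check follows; the helper name `value_is_msvc_fill` is invented for illustration and is not part of msieve.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the low `width_bytes` bytes of `value` are all 0xCC,
 * the pattern MSVC debug builds write into uninitialized stack memory. */
int value_is_msvc_fill(uint64_t value, size_t width_bytes)
{
    size_t i;

    assert(width_bytes >= 1 && width_bytes <= 8);

    for (i = 0; i < width_bytes; i++) {
        if (((value >> (8 * i)) & 0xFF) != 0xCC)
            return 0;
    }
    return 1;
}
```

For example, `value_is_msvc_fill(3435973836u, 4)` is true, while the dependency number 36 is an ordinary value.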
If I run dos2unix or unix2dos on the .dat file, will that break the dependencies and force me to redo the matrix and linear algebra, or would it be a good test of whether that lets the Windows binary work?

I'm not sure if this makes a difference, but I think with this number I ran -nc1 on Windows, then -nc2 on Unix, and then -nc3 on Windows (crash); when I tried -nc3 on Unix it worked. Would that produce any incompatible files in the process?

Jeff.

Last fiddled with by Jeff Gilchrist on 2009-11-17 at 14:01
Old 2009-11-17, 14:09   #27
jasonp
Tribal Bullet
 
Oct 2004

3,541 Posts

The sourceforge binary was built using MinGW with gcc 4.x.

As long as the endianness between machines is the same, files can be generated on one OS and moved to another without issue. The only ascii file is the relation file, and the relation-reading code explicitly supports DOS and unix EOL characters simultaneously.
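
To illustrate what accepting both EOL conventions at once can look like, here is a small sketch. It is not the actual msieve relation reader; `length_without_eol` is a hypothetical helper that treats any run of trailing CR and LF characters as the line terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the length of `line` with any trailing '\r' and '\n' characters
 * excluded, so "a:b:c\n", "a:b:c\r\n" and "a:b:c" all give the same
 * length and can be parsed identically. */
size_t length_without_eol(const char *line)
{
    size_t len;

    assert(line != NULL);

    len = strlen(line);
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        len--;
    return len;
}
```

A parser that only looks at the first `length_without_eol(line)` characters never sees the difference between a DOS and a unix file.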

Does the linux binary also have the 'odd parity' errors on dependencies? If not, that's somewhat worrying and indicates a 32-bit issue that's more serious than not cleaning up on a null pointer.

See if changing the if() block at gnfs/relation.c:620 to
Code:
    if (num_cycles == 0) {
        free(cycle_list);
        if (num_cycles_out != NULL)
            *num_cycles_out = 0;

        if (cycle_list_out != NULL)
            *cycle_list_out = NULL;

        if (num_relations_out != NULL)
            *num_relations_out = 0;

        if (rlist_out != NULL)
            *rlist_out = NULL;
        return;
    }
prevents the crashes; code that needs relation lists and cycle lists often needs only one of those, so those arguments are allowed to be optional.
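
From the caller's side, the optional-output convention looks roughly like the sketch below. The functions `summarize` and `count_positive` are invented for illustration and have nothing to do with the real nfs_read_cycles signature; the point is only that every output pointer is tested against NULL before it is written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reader: reports how many values are positive and their sum.
 * Either output pointer may be NULL if the caller does not need it. */
void summarize(const int *values, size_t n, size_t *count_out, long *sum_out)
{
    size_t i;
    size_t count = 0;
    long sum = 0;

    assert(values != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (values[i] > 0) {
            count++;
            sum += values[i];
        }
    }

    /* write each result only if the caller asked for it */
    if (count_out != NULL)
        *count_out = count;
    if (sum_out != NULL)
        *sum_out = sum;
}

/* A caller that only wants the count passes NULL for the sum. */
size_t count_positive(const int *values, size_t n)
{
    size_t count = 0;

    summarize(values, n, &count, NULL);
    return count;
}
```

The crash being discussed is what happens when one early-exit path forgets the NULL test that the other paths already have.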

Last fiddled with by jasonp on 2009-11-17 at 14:18
Old 2009-11-17, 14:34   #28
Jeff Gilchrist
 
Jun 2003
Ottawa, Canada

3·17·23 Posts

Quote:
Originally Posted by jasonp View Post
Does the linux binary also have the 'odd parity' errors on dependencies? If not, that's somewhat worrying and indicates a 32-bit issue that's more serious than not cleaning up on a null pointer.
I will try your updated code. This is what is in the output file from the Linux binary; I'm assuming these messages would be put in the log as well, not just on stdout?

Code:
Tue Nov 17 07:15:46 2009  commencing square root phase
Tue Nov 17 07:15:46 2009  reading relations for dependency 1
Tue Nov 17 07:15:47 2009  read 2745464 cycles
Tue Nov 17 07:15:57 2009  cycles contain 8607572 unique relations
Tue Nov 17 07:17:17 2009  read 8607572 relations
Tue Nov 17 07:18:24 2009  multiplying 8607572 relations
Tue Nov 17 07:27:09 2009  multiply complete, coefficients have about 209.62 million bits
Tue Nov 17 07:27:11 2009  initial square root is modulo 33327901
Tue Nov 17 07:44:47 2009  reading relations for dependency 2
Tue Nov 17 07:44:48 2009  read 2747401 cycles
Tue Nov 17 07:44:58 2009  cycles contain 8610710 unique relations
Tue Nov 17 07:46:18 2009  read 8610710 relations
Tue Nov 17 07:47:25 2009  multiplying 8610710 relations
Tue Nov 17 07:56:10 2009  multiply complete, coefficients have about 209.70 million bits
Tue Nov 17 07:56:12 2009  initial square root is modulo 33526301
Tue Nov 17 08:13:49 2009  sqrtTime: 3483
Tue Nov 17 08:13:49 2009  prp58 factor: X...
Tue Nov 17 08:13:49 2009  prp100 factor: Y...
Tue Nov 17 08:13:49 2009  elapsed time 00:58:05
Old 2009-11-17, 14:49   #29
jasonp
Tribal Bullet
 
Oct 2004

110111010101₂ Posts

Well, the code actually proceeded to the square root so odd parity didn't happen. Grrr.

Last fiddled with by jasonp on 2009-11-17 at 14:50
Old 2009-11-17, 17:00   #30
Jeff Gilchrist
 
Jun 2003
Ottawa, Canada

495₁₆ Posts

Quote:
Originally Posted by jasonp View Post
Well, the code actually proceeded to the square root so odd parity didn't happen. Grrr.
Your code change prevents msieve from crashing, so that would be a good thing to submit to SVN. I now get this output:

Code:
Tue Nov 17 11:49:33 2009  reading relations for dependency 35
Tue Nov 17 11:49:35 2009  read 2745153 cycles
Tue Nov 17 11:49:46 2009  cycles contain 8604690 unique relations
Tue Nov 17 11:52:37 2009  read 8604690 relations
Tue Nov 17 11:53:15 2009  algebraic side is not a square!
Tue Nov 17 11:53:18 2009  reading relations for dependency 36
Tue Nov 17 11:53:20 2009  read 0 cycles
Tue Nov 17 11:53:20 2009  reading relations for dependency 37
Tue Nov 17 11:53:22 2009  read 0 cycles
Tue Nov 17 11:53:22 2009  reading relations for dependency 38
Tue Nov 17 11:53:24 2009  read 0 cycles
Tue Nov 17 11:53:24 2009  reading relations for dependency 39
Tue Nov 17 11:53:27 2009  read 0 cycles
Tue Nov 17 11:53:27 2009  reading relations for dependency 40
Tue Nov 17 11:53:29 2009  read 0 cycles
Tue Nov 17 11:53:29 2009  reading relations for dependency 41
Tue Nov 17 11:53:31 2009  read 0 cycles
Tue Nov 17 11:53:31 2009  reading relations for dependency 42
Tue Nov 17 11:53:33 2009  read 0 cycles
Tue Nov 17 11:53:33 2009  reading relations for dependency 43
Tue Nov 17 11:53:36 2009  read 0 cycles
Tue Nov 17 11:53:36 2009  reading relations for dependency 44
Tue Nov 17 11:53:38 2009  read 0 cycles
Tue Nov 17 11:53:38 2009  reading relations for dependency 45
Tue Nov 17 11:53:40 2009  read 0 cycles
Tue Nov 17 11:53:40 2009  reading relations for dependency 46
Tue Nov 17 11:53:42 2009  read 0 cycles
Tue Nov 17 11:53:43 2009  reading relations for dependency 47
Tue Nov 17 11:53:45 2009  read 0 cycles
Tue Nov 17 11:53:45 2009  reading relations for dependency 48
Tue Nov 17 11:53:47 2009  read 0 cycles
Tue Nov 17 11:53:47 2009  reading relations for dependency 49
Tue Nov 17 11:53:50 2009  read 0 cycles
Tue Nov 17 11:53:50 2009  reading relations for dependency 50
Tue Nov 17 11:53:52 2009  read 0 cycles
Tue Nov 17 11:53:52 2009  reading relations for dependency 51
Tue Nov 17 11:53:54 2009  read 0 cycles
Tue Nov 17 11:53:54 2009  reading relations for dependency 52
Tue Nov 17 11:53:57 2009  read 0 cycles
Tue Nov 17 11:53:57 2009  reading relations for dependency 53
Tue Nov 17 11:53:59 2009  read 0 cycles
Tue Nov 17 11:53:59 2009  reading relations for dependency 54
Tue Nov 17 11:54:01 2009  read 0 cycles
Tue Nov 17 11:54:01 2009  reading relations for dependency 55
Tue Nov 17 11:54:03 2009  read 0 cycles
Tue Nov 17 11:54:04 2009  reading relations for dependency 56
Tue Nov 17 11:54:06 2009  read 0 cycles
Tue Nov 17 11:54:06 2009  reading relations for dependency 57
Tue Nov 17 11:54:08 2009  read 0 cycles
Tue Nov 17 11:54:08 2009  reading relations for dependency 58
Tue Nov 17 11:54:10 2009  read 0 cycles
Tue Nov 17 11:54:10 2009  reading relations for dependency 59
Tue Nov 17 11:54:13 2009  read 0 cycles
Tue Nov 17 11:54:13 2009  reading relations for dependency 60
Tue Nov 17 11:54:15 2009  read 0 cycles
Tue Nov 17 11:54:15 2009  reading relations for dependency 61
Tue Nov 17 11:54:17 2009  read 0 cycles
Tue Nov 17 11:54:17 2009  reading relations for dependency 62
Tue Nov 17 11:54:19 2009  read 0 cycles
Tue Nov 17 11:54:19 2009  reading relations for dependency 63
Tue Nov 17 11:54:22 2009  read 0 cycles
Tue Nov 17 11:54:22 2009  reading relations for dependency 64
Tue Nov 17 11:54:24 2009  read 0 cycles
Tue Nov 17 11:54:24 2009  sqrtTime: 7636
Tue Nov 17 11:54:24 2009  elapsed time 02:07:26
For some reason it goes up to 64 dependencies. I'm going to double check that all my files on the Windows and Linux systems are identical in case something got messed up somewhere.

Last fiddled with by Jeff Gilchrist on 2009-11-17 at 17:01
Old 2009-11-17, 17:33   #31
Batalov
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

3⁶·13 Posts

Quote:
Originally Posted by Jeff Gilchrist View Post
Your awk line produced this:

Code:
6271457,6292155:8bd91d1,1a50db01,2F45,9F7D,28AB6
-1284119,1398665:a4e6b6d,a3d37e9,2363,C9AF,1489C3,1933,5B3,5,7,7,47:1-50885,41286:1862ac23,7d0ee79,20FB,43403,14578D,1475395,11,2:d13ff5,ad18453,AD,5,5,5,43EA0D
28125995,8631287:2a9ec29,863b825,8551,CB29,FFF1,3FD1F,3,11,1F,101,35F:DE83,236EF3,314731222,798705:372ec05,73b52b5,FBD6B,1B83A1,3B63FB,1613,295,5,25:6e2d4a7,5CBF,72EF,D567,883,11,2B,2,2,2,2,5E4AB3
2655041,3207944:2288af7,44cee77,249B,28CC9,2D0BBF,1807,94D,3,3,7,47,
2144397,1
47612059,8753686:7aec3b7,8acbe3f,20309,342865,3E55FB,7CC073,3,43,2:395bc0d1410755,59527:d5510e3,1046cbdf,12FF5,130D27,273ECB,A75,BF,1AF:196b083,19AFE9,1C99,8BF,5,5,5,7,84885D
-6528060,3363431:20cc017,10430c03,11DA1,32
-13580215,5018876:1c749a5b,165b8415,268
If relations don't have 3 ":"-separated fields (except free rels, which have 2), then you had some decompression or transmission errors, in which case the last part is unnecessary. Usually, msieve cleans these up pretty well, though (you must have seen a lot of -5 and -1 errors). If there are too many of these, something weird may have passed the filters and crept into the matrix. So I withdraw my question for now.
Old 2009-11-17, 19:31   #32
frmky
 
Jul 2003
So Cal

2·3⁴·13 Posts

Quote:
Originally Posted by jasonp View Post
How would one get relations glued together like that? Appending a windows output file to a unix output file with different EOL characters? Appending to siever output that was interrupted so no final CR in the file? Lots of factorizations have a few lines like this; it doesn't explain the line with the dash though. That machine was probably super overclocked :)
The dash is the beginning of a new relation with negative a. I don't know why this happens, but I've seen it very frequently using only Windows sievers. I suspect that the siever crashes without having flushed the output buffers, and the last output stopped mid-line. msieve has always discarded these without problems, but the duplicate checker I wrote and Serge modified specifically looks for and discards these lines. Does RSALS still use a quorum of 2? If so, it would seem that these would be very unlikely in their output.
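
The gluing itself is easy to reproduce: if a file stops mid-line and the next file is appended directly, the first relation of the new file lands on the unfinished line. A tool that concatenates siever output could guard against that with a check like the one below. This is a sketch only; `needs_newline_before_append` is a made-up name, and it does not describe what RSALS or any existing script actually does.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if `data` (the current contents of the output file, `len` bytes)
 * ends in an unfinished line, i.e. appending more relations directly would
 * glue the next relation onto the last one. Returns 0 for an empty file or
 * one that already ends with '\n'. */
int needs_newline_before_append(const char *data, size_t len)
{
    assert(data != NULL || len == 0);

    if (len == 0)
        return 0;
    return data[len - 1] != '\n';
}
```

Writing a single newline first does not repair the truncated relation, but it keeps the damage to one short line that the filtering will reject on its own.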
Old 2009-11-17, 20:25   #33
Jeff Gilchrist
 
Jun 2003
Ottawa, Canada

2225₈ Posts

Quote:
Originally Posted by Batalov View Post
If relations don't have 3 ":"-separated fields (except free rels, which have 2), then you had some decompression or transmission errors, in which case the last part is unnecessary. Usually, msieve cleans these up pretty well, though (you must have seen a lot of -5 and -1 errors). If there are too many of these, something weird may have passed the filters and crept into the matrix. So I withdraw my question for now.
Bingo! I checked each of the .dat, .dat.cyc, .dat.mat and .dat.dep files, and sure enough the .cyc and .mat files had different md5 hashes. As Batalov suggested, it was probably a transmission or decompression error somewhere along the line. Once I re-sent the files and all the hashes matched, it worked fine in Windows without any of the previous errors ("number of relations is not even" or "algebraic side is not a square!").
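
For completeness, the kind of check that caught this can be sketched in plain C. The sketch below does not compute MD5 (that needs an external library); it swaps in the much simpler 64-bit FNV-1a hash, which is enough to tell whether two copies of a file differ but is not a cryptographic hash. The function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Standard 64-bit FNV-1a parameters */
#define FNV1A64_OFFSET UINT64_C(14695981039346656037)
#define FNV1A64_PRIME  UINT64_C(1099511628211)

/* Folds `len` bytes into a running FNV-1a hash and returns the new hash.
 * Start with FNV1A64_OFFSET and feed the data in as many chunks as needed. */
uint64_t fnv1a64_update(uint64_t hash, const void *data, size_t len)
{
    const unsigned char *bytes = (const unsigned char *)data;
    size_t i;

    assert(data != NULL || len == 0);

    for (i = 0; i < len; i++) {
        hash ^= bytes[i];
        hash *= FNV1A64_PRIME;
    }
    return hash;
}

/* Hashes a whole file, opened in binary mode so that no EOL translation
 * happens on Windows. Returns 0 on success and -1 on an I/O error. */
int fnv1a64_file(const char *path, uint64_t *hash_out)
{
    unsigned char buf[65536];
    uint64_t hash = FNV1A64_OFFSET;
    size_t got;
    int failed;
    FILE *fp;

    assert(path != NULL && hash_out != NULL);

    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    while ((got = fread(buf, 1, sizeof(buf), fp)) > 0)
        hash = fnv1a64_update(hash, buf, got);
    failed = ferror(fp);
    fclose(fp);
    if (failed)
        return -1;

    *hash_out = hash;
    return 0;
}
```

Running `fnv1a64_file` on the same file on both machines and comparing the two numbers plays the same role as comparing md5 sums: any mismatch means the copy is not byte-identical.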

Sorry about that, but at least we tracked down a condition that msieve crashes on with corrupted data files that is now fixed.

Jeff.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.