mersenneforum.org Error running GGNFS+msieve+factmsieve.py

2011-06-10, 11:11   #1
D. B. Staple

Nov 2007
Halifax, Nova Scotia

2^3×7 Posts
Error running GGNFS+msieve+factmsieve.py

Hello,

I have been getting crashes during lattice sieving while running factmsieve.py on Linux with 4, 8, or 16 cores on numbers in the ~110 digit range. A typical crash is preceded by a drop in the "Total yield:" line in stderr, followed by this Python error:

ValueError: invalid literal for int() with base 10: ''

On stdout I only get the message "siever terminated". The .log gives no indication at all that an error was encountered.

In some cases simply restarting the script allows the calculation to continue to completion. Sometimes the script has to be restarted several times, and sometimes restarting does not seem to help. The crashes are reproducible in the sense that rerunning the same job with the same polynomial seems to crash at a similar spot. However, they are not "exactly the same": not all of the output is identical from run to run. I haven't checked the differences carefully yet.

One more note: the numbers in question could be factored using SNFS, which would be faster. I am using GNFS anyway, and I want to sort out this problem I'm having with it.

I've attached .log, .n, .poly, stderr, and stdout files for a typical crash. Has anyone encountered this error before? Any indication what is going on here?

Best,
Doug
2011-06-10, 11:23   #2
D. B. Staple

Nov 2007
Halifax, Nova Scotia

2^3·7 Posts
(Files)

(Here are the aforementioned files)
Attached Files
 to_mersenneforum.zip (27.5 KB, 280 views)

2011-06-12, 09:57   #3
D. B. Staple

Nov 2007
Halifax, Nova Scotia

2^3×7 Posts
Bug fix

Hello everyone,

It looks like I found a bug in factmsieve.py. I don't know why it only shows up on my system; presumably something unusual there is stressing the Python script and exposing the bug. Anyway, here's the fix. I suggest replacing this code:

Code:
    def read_spq(fact_p):
        for j in range(SV_THREADS):
            ql, qp, qh = fact_p['q_dq'][j]
            try:
                with open('.last_spq' + str(100 * PNUM + j), 'r') as in_f:
                    tmp = remove_ws(chomp(in_f.readline()))
                    if tmp:
                        t = int(chomp(tmp))
                        if t > qp:
                            fact_p['q_dq'][j] = (ql, t, qh)
            except IOError:
                pass

with this modified version:

Code:
    def read_spq(fact_p):
        for j in range(SV_THREADS):
            ql, qp, qh = fact_p['q_dq'][j]
            try:
                with open('.last_spq' + str(100 * PNUM + j), 'r') as in_f:
                    tmp = in_f.readline()
                    try:
                        # int() ignores surrounding whitespace by itself
                        t = int(tmp)
                    except ValueError:
                        # unparsable line: keep the current q for this thread
                        pass
                    else:
                        if t > qp:
                            fact_p['q_dq'][j] = (ql, t, qh)
            except IOError:
                pass

To be honest, I don't completely understand the crash in the original code. Ultimately, chomp(tmp) sometimes returns something that makes int() raise a ValueError, and the error message suggests an empty string. This doesn't make sense to me, because if tmp is empty it should not get past the line containing 'if tmp:'. Perhaps tmp is sometimes composed entirely of line breaks, so that tmp is nonempty but chomp(tmp) is empty. But that doesn't make sense either, because remove_ws should ensure that tmp contains no line breaks. So I don't yet understand how the original crash occurs; perhaps there is a bug in chomp or remove_ws that I haven't found. My impression is that the problem gets worse with a larger number of threads, so perhaps it's related to locking .last_spq or something of that nature.

In any case, it seems to me that these few lines are not robustly written and can be improved.

Firstly, why do we call chomp() twice when remove_ws() is also called? remove_ws will also remove line breaks and carriage returns.

Secondly, Python strings have built-in methods for removing whitespace (strip, lstrip, rstrip, etc.), so one should probably not write one's own. Try this at a Python prompt:

Code:
    temp='\n\n\r\r\t\t 14 \n\n\r'
    temp.strip()

Thirdly, int() already ignores leading and trailing whitespace (see http://docs.python.org/library/functions.html#int), so removing whitespace before calling int() is redundant.

Finally, why not simply wrap int() in try/except? That way the value is only used if the string parses to a sensible result.

In the end I suggest modifying the code as described above, which solves the issue and allows me to run crash-free. However, I'm not an expert Python programmer, so maybe it's not perfect. If anyone is interested, perhaps we can try to nail down the error further.
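The third point is easy to check in isolation. The snippet below is only a small sketch, not part of factmsieve.py; the helper name `parses` is made up for the illustration. It shows that int() accepts surrounding whitespace, while an empty or whitespace-only string still raises ValueError, which is why the try/except is still needed.

```python
# int() tolerates leading and trailing whitespace, including line endings.
samples = ['14', ' 14 ', '14\n', '\t14\r\n']
values = [int(s) for s in samples]
print(values)


def parses(s):
    """Hypothetical helper: report whether int() accepts the string s."""
    try:
        int(s)
        return True
    except ValueError:
        return False


# Empty and whitespace-only strings are rejected with ValueError.
print(parses(''), parses(' \n'))
```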
2011-06-12, 11:54   #4
Brian Gladman

May 2008
Worcester, United Kingdom

1033_8 Posts

The reason why I don't use Python string functions is that much of this code is a relatively unintelligent translation from the original Perl code, and this made it convenient to just provide Python equivalents of Perl functions. Since Perl and Python have radically different philosophies, the translation has resulted in pretty poor Python in many places. This also accounts for some of the redundancy.

So there are, I am afraid, a lot of places where the code could be greatly improved. In the area you are looking at, I might be inclined to use:

Code:
    def read_spq(fact_p):
        for j in range(SV_THREADS):
            ql, qp, qh = fact_p['q_dq'][j]
            try:
                with open('.last_spq' + str(100 * PNUM + j), 'r') as in_f:
                    try:
                        t = int(remove_ws(in_f.readline()))
                    except ValueError:
                        # no valid integer in this file, so t is not set:
                        # skip the comparison and leave q_dq[j] unchanged
                        continue
                    if t > qp:
                        fact_p['q_dq'][j] = (ql, t, qh)
            except IOError:
                pass
2011-06-12, 21:18   #5
D. B. Staple

Nov 2007
Halifax, Nova Scotia

2^3×7 Posts

Brian,

Thanks for taking a look at it. I'd be happy if my comment led to a change in a future version of your script. That way I could say with a straight face that I contributed something.

Best,
Doug
2011-06-12, 21:32   #6
D. B. Staple

Nov 2007
Halifax, Nova Scotia

2^3·7 Posts

P.S. I think I solved the mystery -- the result of the readline is a bunch of null characters. These pass through both the if statement and the whitespace removal, and cause int() to raise a ValueError. I used this test code:

Code:
    def read_spq(fact_p):
        for j in range(SV_THREADS):
            ql, qp, qh = fact_p['q_dq'][j]
            try:
                with open('.last_spq' + str(100 * PNUM + j), 'r') as in_f:
                    raw_line = in_f.readline()
                    tmp = remove_ws(chomp(raw_line))
                    if tmp:
                        try:
                            t = int(chomp(tmp))
                        except ValueError:
                            # dump every intermediate value, then re-raise
                            print('raw_line: ', raw_line)
                            print('tmp: ', tmp)
                            print('chomp(tmp): ', chomp(tmp))
                            raise
                        else:
                            if t > qp:
                                fact_p['q_dq'][j] = (ql, t, qh)
            except IOError:
                pass

which produced this output immediately prior to the crash:

Code:
    raw_line: ^@^@^@^@^@^@^@^@
    tmp: ^@^@^@^@^@^@^@^@
    chomp(tmp): ^@^@^@^@^@^@^@^@
    siever terminated
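The behaviour described in this post can be reproduced without the siever. What follows is a minimal sketch under stated assumptions: `parse_spq_line` is a hypothetical helper, not a function from factmsieve.py, and the built-in `str.strip()` stands in for `remove_ws`/`chomp`, whose implementations are not shown in this thread. It illustrates that a string of NUL characters is non-empty, survives whitespace stripping, and is rejected by int().

```python
def parse_spq_line(raw_line):
    """Hypothetical helper: return the integer in raw_line, or None if unparsable."""
    try:
        return int(raw_line)
    except ValueError:
        return None


# Eight NUL characters, as seen in the debug output (^@ is how NUL is displayed).
nul_line = '\x00' * 8

print(bool(nul_line))                 # is the string truthy, i.e. passes `if tmp:`?
print(nul_line.strip() == nul_line)   # does strip() leave the NULs in place?
print(parse_spq_line(nul_line))       # result for the NUL-filled line
print(parse_spq_line('1234567\n'))    # result for a normal line with a newline
```

Returning None for an unparsable line lets the caller keep the previous q value for that thread instead of crashing, which is the same design choice as the try/except versions posted above.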
2011-06-12, 22:23   #7
Brian Gladman

May 2008
Worcester, United Kingdom

7^2×11 Posts

Thanks for the debugging - this is behaviour that is worth knowing about as it may explain other crashes that we see infrequently.

In looking at this I have found some other errors, so I will definitely be able to improve the code a bit as a result of your efforts. I will issue an update within the next few days.
