mersenneforum.org

D. B. Staple 2011-06-10 11:11

Error running GGNFS+msieve+factmsieve.py
 
Hello,

I have been getting crashes during lattice sieving while running factmsieve.py on Linux with 4, 8, or 16 cores on numbers in the ~110 digit range.

A typical crash is preceded by a drop in the "Total yield:" line in stderr, and the following Python error:
ValueError: invalid literal for int() with base 10: ''
On stdout I only get the message "siever terminated".
In the .log I don't get any indication at all that an error was encountered.

In some cases simply restarting the script allows the calculation to continue to completion. Sometimes the script has to be restarted several times, and sometimes it seems that restarting the script does not help.

The crashes are reproducible in the sense that rerunning the same job with the same polynomial seems to crash at a similar spot. However, the crashes are not "exactly the same" in the sense that not all of the output appears identical. I haven't checked the differences carefully yet.

One more note: the numbers in question could be factored faster using SNFS, but I am using GNFS anyway, and I want to sort out this problem I'm having with GNFS.

I've attached .log, .n, .poly, stderr, and stdout files for a typical crash. Has anyone encountered this error before? Any indication what is going on here?

Best,

Doug

D. B. Staple 2011-06-10 11:23

(Files)
 
1 Attachment(s)
(Here are the aforementioned files)

D. B. Staple 2011-06-12 09:57

Bug fix
 
Hello everyone,

It looks like I found a bug in factmsieve.py. I don't know why this bug only shows up on my system; presumably something unusual in my setup exercises the Python script in a way that exposes it. Anyway, here's the fix:

I suggest replacing this code:
[CODE]
def read_spq(fact_p):
    for j in range(SV_THREADS):
        ql, qp, qh = fact_p['q_dq'][j]
        try:
            with open('.last_spq' + str(100 * PNUM + j), 'r') as in_f:
                tmp = remove_ws(chomp(in_f.readline()))
                if tmp:
                    t = int(chomp(tmp))
                    if t > qp:
                        fact_p['q_dq'][j] = (ql, t, qh)
        except IOError:
            pass
[/CODE]
With this modified version:
[CODE]
def read_spq(fact_p):
    for j in range(SV_THREADS):
        ql, qp, qh = fact_p['q_dq'][j]
        try:
            with open('.last_spq' + str(100 * PNUM + j), 'r') as in_f:
                tmp = in_f.readline()
                try:
                    t = int(tmp)
                except ValueError:
                    pass
                else:
                    if t > qp:
                        fact_p['q_dq'][j] = (ql, t, qh)
        except IOError:
            pass
[/CODE]
To be honest, I don't completely understand the crash in the original code. Ultimately, chomp(tmp) sometimes returns an empty string, causing int() to throw a ValueError. This doesn't make sense to me, because if tmp is empty then it should not get past the 'if tmp:' line. Perhaps tmp is sometimes composed entirely of line breaks, such that tmp is nonempty but chomp(tmp) is an empty string. However, that doesn't make sense either, because remove_ws should ensure that tmp contains no line breaks. So I don't yet understand how the original crash occurs; perhaps there is a bug in chomp or remove_ws that I haven't found. My impression is that the problem gets worse with a larger number of threads, so perhaps it's related to locking .last_spq or something of that nature.

In any case, it seems to me that these few lines are not robustly written and can be improved.
Firstly, why do we call chomp() twice when remove_ws() is also called? remove_ws() already removes line breaks and carriage returns.
Secondly, Python has built-in string methods for removing whitespace (strip, lstrip, rstrip, etc.), so one should probably not write one's own. Try this at a Python prompt:
[CODE]
temp = '\n\n\r\r\t\t 14 \n\n\r'
temp.strip()
[/CODE]
Thirdly, int() already ignores surrounding whitespace (see [URL]http://docs.python.org/library/functions.html#int[/URL]), so removing whitespace before calling int() is redundant.
Finally, why not simply wrap int() in try/except? That way the value is only used if the string parses to a sensible result.
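
A minimal sketch of the last three points. The helper name `parse_q` is hypothetical and is not part of factmsieve.py; it only shows that `str.strip()` and `int()` both cope with surrounding whitespace, so a single try/except around `int()` is enough to reject a line that is not a number.

```python
def parse_q(line):
    """Return the integer held in `line`, or None if it cannot be parsed.

    Hypothetical helper for illustration only; not part of factmsieve.py.
    """
    try:
        # int() ignores leading and trailing whitespace by itself.
        return int(line)
    except ValueError:
        return None


temp = '\n\n\r\r\t\t 14 \n\n\r'
stripped = temp.strip()      # built-in whitespace removal
value = parse_q(temp)        # no stripping needed before int()
empty = parse_q('')          # an empty line is rejected, not a crash
print(repr(stripped), value, empty)
```

With this shape the caller only has to check for `None` before comparing against the current q value.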

In the end I suggest modifying the code as described above, which solves the issue and allows me to run crash-free. However, I'm not an expert Python programmer, so maybe it's not perfect. If anyone is interested then perhaps we can try to nail down the error further.

Brian Gladman 2011-06-12 11:54

The reason why I don't use Python string functions is that much of this code is a relatively unintelligent translation from the original Perl code, and this made it convenient to just provide Python equivalents of Perl functions.

Since Perl and Python have radically different philosophies, the translation has resulted in pretty poor Python in many places. This also accounts for some of the redundancy. So there are, I am afraid, a lot of places where the code could be greatly improved.

In the area you are looking at, I might be inclined to use:

[CODE]
def read_spq(fact_p):
    for j in range(SV_THREADS):
        ql, qp, qh = fact_p['q_dq'][j]
        try:
            with open('.last_spq' + str(100 * PNUM + j), 'r') as in_f:
                try:
                    t = int(remove_ws(in_f.readline()))
                except ValueError:
                    # skip this file; t would otherwise be unset below
                    continue
                if t > qp:
                    fact_p['q_dq'][j] = (ql, t, qh)
        except IOError:
            pass
[/CODE]
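
A stand-alone sketch of the same approach, so that it can be tried outside the script. The function name `read_last_q` and the temporary file names are hypothetical, and the script's globals (SV_THREADS, PNUM, fact_p) are replaced by plain parameters. It assumes that a missing file and unparsable contents should both leave the current q value unchanged.

```python
import os
import tempfile


def read_last_q(path, current_q):
    """Return the q stored in `path` if it is larger than `current_q`.

    Hypothetical helper: a missing file or unparsable contents leave
    `current_q` unchanged instead of raising.
    """
    try:
        with open(path, 'r') as in_f:
            t = int(in_f.readline())
    except (IOError, ValueError):
        return current_q
    return t if t > current_q else current_q


with tempfile.TemporaryDirectory() as d:
    good = os.path.join(d, 'good')
    bad = os.path.join(d, 'bad')
    with open(good, 'w') as f:
        f.write('1500000\n')          # a normal saved q value
    with open(bad, 'w') as f:
        f.write('\x00' * 8)           # contents that int() cannot parse
    from_good = read_last_q(good, 1000000)
    from_bad = read_last_q(bad, 1000000)
    from_missing = read_last_q(os.path.join(d, 'missing'), 1000000)

print(from_good, from_bad, from_missing)
```

Catching both exceptions in one place keeps the comparison against the current q on the success path only.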

D. B. Staple 2011-06-12 21:18

Brian,

Thanks for taking a look at it. I'd be happy if my comment led to a change in a future version of your script. That way I could say with a straight face that I contributed something.

Best,

Doug

D. B. Staple 2011-06-12 21:32

P.S.
I think I solved the mystery -- the result of readline() is a string of null characters. These pass both the if statement and the whitespace removal, and cause int() to throw a ValueError.

I used this test code:
[CODE]
def read_spq(fact_p):
    for j in range(SV_THREADS):
        ql, qp, qh = fact_p['q_dq'][j]
        try:
            with open('.last_spq' + str(100 * PNUM + j), 'r') as in_f:
                raw_line = in_f.readline()
                tmp = remove_ws(chomp(raw_line))
                if tmp:
                    try:
                        t = int(chomp(tmp))
                    except ValueError:
                        print('raw_line: ', raw_line)
                        print('tmp: ', tmp)
                        print('chomp(tmp): ', chomp(tmp))
                        raise
                    else:
                        if t > qp:
                            fact_p['q_dq'][j] = (ql, t, qh)
        except IOError:
            pass
[/CODE]
Which produced this output immediately prior to the crash:
[CODE]
raw_line: ^@^@^@^@^@^@^@^@
tmp: ^@^@^@^@^@^@^@^@
chomp(tmp): ^@^@^@^@^@^@^@^@
siever terminated
[/CODE]
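
The same behaviour can be reproduced without the siever. This is a minimal sketch, assuming the line consisted of eight NUL characters as the output above suggests; `raw_line` is a hand-made stand-in, and `str.strip()` stands in for remove_ws/chomp, whose source is not shown in this thread.

```python
raw_line = '\x00' * 8  # stand-in for what readline() returned

# A string of NUL characters is non-empty, so `if tmp:` lets it through.
is_truthy = bool(raw_line)

# NUL is not whitespace, so stripping leaves the string unchanged.
survives_strip = raw_line.strip() == raw_line

# int() cannot parse it and raises ValueError.
try:
    int(raw_line)
    raised = False
except ValueError:
    raised = True

print(is_truthy, survives_strip, raised)
```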

Brian Gladman 2011-06-12 22:23

Thanks for the debugging - this is behaviour that is worth knowing about as it may explain other crashes that we see infrequently. In looking at this I have found some other errors so I will definitely be able to improve the code a bit as a result of your efforts. I will issue an update within the next few days.

