mersenneforum.org  

2010-12-27, 06:55   #1
tal
Nov 2010

msieve segfaulting on the very last mile

I searched for a polynomial, sieved, and ran the -nc1 and -nc2 steps, and now I'm running the square root step (-nc3):

./msieve -v -s pairs.dat -i ./nasa.n -nf ./msievejob_nasa_125_201012192006.fb -t 4 -nc3 -v

and it segfaults. I tried three times, then captured the fourth in gdb:

Code:
Program received signal SIGBUS, Bus error.
[Switching to Thread 0xb73d06c0 (LWP 16486)]
0xb76dac7c in __gmpn_mul () from /usr/lib/libgmp.so.3
(gdb) bt
#0  0xb76dac7c in __gmpn_mul () from /usr/lib/libgmp.so.3
#1  0xb76e9c48 in __gmpn_toom22_mul () from /usr/lib/libgmp.so.3
#2  0xb76daca7 in __gmpn_mul () from /usr/lib/libgmp.so.3
#3  0xb76e9c48 in __gmpn_toom22_mul () from /usr/lib/libgmp.so.3
#4  0xb76daca7 in __gmpn_mul () from /usr/lib/libgmp.so.3
#5  0xb76e9c48 in __gmpn_toom22_mul () from /usr/lib/libgmp.so.3
#6  0xb76daca7 in __gmpn_mul () from /usr/lib/libgmp.so.3
#7  0xb76e9c48 in __gmpn_toom22_mul () from /usr/lib/libgmp.so.3
#8  0xb76daca7 in __gmpn_mul () from /usr/lib/libgmp.so.3
#9  0xb76e9c48 in __gmpn_toom22_mul () from /usr/lib/libgmp.so.3
#10 0xb76daca7 in __gmpn_mul () from /usr/lib/libgmp.so.3
#11 0xb76cf7d9 in __gmpz_mul () from /usr/lib/libgmp.so.3
#12 0xb77611c2 in gmp_poly_mul (p1=<value optimized out>, p2=<value optimized out>, mod=0xbf8fa01c, free_p2=1) at gnfs/sqrt/sqrt_a.c:170
#13 0xb7761873 in multiply_relations (prodinfo=<value optimized out>, index1=<value optimized out>, index2=2616, prod=0xbf8f8488) at gnfs/sqrt/sqrt_a.c:368
#14 0xb776183c in multiply_relations (prodinfo=0xbf8f9384, index1=594, index2=5232, prod=0xbf8f85d8) at gnfs/sqrt/sqrt_a.c:363
#15 0xb776183c in multiply_relations (prodinfo=0xbf8f9384, index1=594, index2=10464, prod=0xbf8f8728) at gnfs/sqrt/sqrt_a.c:363
#16 0xb776183c in multiply_relations (prodinfo=0xbf8f9384, index1=594, index2=20929, prod=0xbf8f8878) at gnfs/sqrt/sqrt_a.c:363
#17 0xb776183c in multiply_relations (prodinfo=0xbf8f9384, index1=594, index2=41858, prod=0xbf8f89c8) at gnfs/sqrt/sqrt_a.c:363
#18 0xb776183c in multiply_relations (prodinfo=0xbf8f9384, index1=594, index2=83717, prod=0xbf8f8b18) at gnfs/sqrt/sqrt_a.c:363
#19 0xb776183c in multiply_relations (prodinfo=0xbf8f9384, index1=594, index2=167434, prod=0xbf8f8c68) at gnfs/sqrt/sqrt_a.c:363
#20 0xb776183c in multiply_relations (prodinfo=0xbf8f9384, index1=594, index2=334869, prod=0xbf8f8db8) at gnfs/sqrt/sqrt_a.c:363
#21 0xb776183c in multiply_relations (prodinfo=0xbf8f9384, index1=594, index2=669739, prod=0xbf8f8f08) at gnfs/sqrt/sqrt_a.c:363
#22 0xb776183c in multiply_relations (prodinfo=0xbf8f9384, index1=594, index2=1339479, prod=0xbf8f9058) at gnfs/sqrt/sqrt_a.c:363
#23 0xb776183c in multiply_relations (prodinfo=0xbf8f9384, index1=594, index2=2678958, prod=0xbf8f91a8) at gnfs/sqrt/sqrt_a.c:363
#24 0xb776183c in multiply_relations (prodinfo=0xbf8f9384, index1=594, index2=5357917, prod=0xbf8f9f54) at gnfs/sqrt/sqrt_a.c:363
#25 0xb7761eb0 in alg_square_root (obj=0xb92e2690, mp_alg_poly=0xbf8fb020, n=0xbf8fcac8, c=0xbf8fb808, m1=0xbf8fa2c4, m0=0xbf8fa244, rlist=0xa76ad008,
    num_relations=5357918, check_q=2147483713, sqrt_a=0xbf8fb884) at gnfs/sqrt/sqrt_a.c:692
#26 0xb7760aa3 in nfs_find_factors (obj=0xb92e2690, n=0xbf8fcac8, factor_list=0xbf8fc5cc) at gnfs/sqrt/sqrt.c:407
#27 0xb774ce1d in factor_gnfs (obj=0xb92e2690, n=0xbf8fcac8, factor_list=0xbf8fc5cc) at gnfs/gnfs.c:168
#28 0xb7733cd3 in msieve_run (obj=0xb92e2690) at common/driver.c:161
#29 0xb77313b7 in factor_integer (
    buf=0xbf8fce1c "8854464257519654019872571841021770907798370137229851767374182638539633345536731466751878779412095181258499956010238408230984422841802533298593262502566823", flags=<value optimized out>, savefile_name=0xbf8fe0be "pairs.dat", logfile_name=0x0,
    nfs_fbfile_name=0xbf8fe0d8 "./msievejob_nasa_125_201012192006.fb", seed1=0xbf8fce18, seed2=0xbf8fce14, max_relations=0, nfs_lower=0, nfs_upper=0,
    cpu=cpu_opteron, cache_size1=65536, cache_size2=2097152, num_threads=4, mem_mb=0, which_gpu=0) at demo.c:223
#30 0xb773240d in main (argc=12, argv=0xbf8fd074) at demo.c:685

(gdb) frame 12
#12 0xb77611c2 in gmp_poly_mul (p1=<value optimized out>, p2=<value optimized out>, mod=0xbf8fa01c, free_p2=1) at gnfs/sqrt/sqrt_a.c:170
170                     mpz_mul(tmp[i], p1->coeff[i], p2->coeff[d2]);
(gdb) print p1
$1 = <value optimized out>
(gdb) print p2
$2 = <value optimized out>
(gdb) print mod
$3 = (gmp_poly_t *) 0xbf8fa01c
(gdb) print i
$4 = 1
(gdb) print d1
$5 = 4
(gdb) print p1->coeff
Cannot access memory at address 0x4
(gdb) print p2->coeff
Cannot access memory at address 0x4
(gdb) print tmp
$6 = {{{_mp_alloc = 4242, _mp_size = 0, _mp_d = 0x964bea50}}, {{_mp_alloc = 1, _mp_size = 0, _mp_d = 0x964ba928}}, {{_mp_alloc = 1, _mp_size = 0,
      _mp_d = 0x964b6728}}, {{_mp_alloc = 1, _mp_size = 0, _mp_d = 0x96496e40}}, {{_mp_alloc = 1, _mp_size = 0, _mp_d = 0x9648e190}}, {{_mp_alloc = 1,
      _mp_size = 0, _mp_d = 0x9648f228}}, {{_mp_alloc = 1, _mp_size = 0, _mp_d = 0x9648c098}}, {{_mp_alloc = 1, _mp_size = 0, _mp_d = 0x96494520}}, {{
      _mp_alloc = 1, _mp_size = 0, _mp_d = 0xbc4c37b8}}}
(gdb) frame 13
#13 0xb7761873 in multiply_relations (prodinfo=<value optimized out>, index1=<value optimized out>, index2=2616, prod=0xbf8f8488) at gnfs/sqrt/sqrt_a.c:368
368             gmp_poly_mul(&prod1, &prod2, prodinfo->monic_poly, 1);
(gdb) print prod1
$7 = {degree = 4, coeff = {{{_mp_alloc = 2125, _mp_size = 2124, _mp_d = 0x964997a8}}, {{_mp_alloc = 2124, _mp_size = 2123, _mp_d = 0x9649d9e8}}, {{
        _mp_alloc = 2123, _mp_size = -2122, _mp_d = 0x964a5e68}}, {{_mp_alloc = 2122, _mp_size = 2121, _mp_d = 0x964a3d38}}, {{_mp_alloc = 2121,
        _mp_size = 2119, _mp_d = 0x964b6738}}, {{_mp_alloc = 1, _mp_size = 0, _mp_d = 0xbc4c2fd0}}, {{_mp_alloc = 1, _mp_size = 0, _mp_d = 0xbc4c2fe0}}, {{
        _mp_alloc = 1, _mp_size = 0, _mp_d = 0xbc4c3048}}}}
(gdb) print prod2
$8 = {degree = 4, coeff = {{{_mp_alloc = 2124, _mp_size = 2123, _mp_d = 0x964ac1e0}}, {{_mp_alloc = 2123, _mp_size = 2122, _mp_d = 0x9648f238}}, {{
        _mp_alloc = 2122, _mp_size = -2121, _mp_d = 0x964aa0b0}}, {{_mp_alloc = 2121, _mp_size = 2119, _mp_d = 0x964b2500}}, {{_mp_alloc = 2120,
        _mp_size = 2118, _mp_d = 0x964bc928}}, {{_mp_alloc = 1, _mp_size = 0, _mp_d = 0xbc4c33f8}}, {{_mp_alloc = 1, _mp_size = 0, _mp_d = 0xbc4c3408}}, {{
        _mp_alloc = 1, _mp_size = 0, _mp_d = 0xbc4c3418}}}}
(gdb) print prod1->coeff
$9 = {{{_mp_alloc = 2125, _mp_size = 2124, _mp_d = 0x964997a8}}, {{_mp_alloc = 2124, _mp_size = 2123, _mp_d = 0x9649d9e8}}, {{_mp_alloc = 2123,
      _mp_size = -2122, _mp_d = 0x964a5e68}}, {{_mp_alloc = 2122, _mp_size = 2121, _mp_d = 0x964a3d38}}, {{_mp_alloc = 2121, _mp_size = 2119,
      _mp_d = 0x964b6738}}, {{_mp_alloc = 1, _mp_size = 0, _mp_d = 0xbc4c2fd0}}, {{_mp_alloc = 1, _mp_size = 0, _mp_d = 0xbc4c2fe0}}, {{_mp_alloc = 1,
      _mp_size = 0, _mp_d = 0xbc4c3048}}}
(gdb) print prod2->coeff
$10 = {{{_mp_alloc = 2124, _mp_size = 2123, _mp_d = 0x964ac1e0}}, {{_mp_alloc = 2123, _mp_size = 2122, _mp_d = 0x9648f238}}, {{_mp_alloc = 2122,
      _mp_size = -2121, _mp_d = 0x964aa0b0}}, {{_mp_alloc = 2121, _mp_size = 2119, _mp_d = 0x964b2500}}, {{_mp_alloc = 2120, _mp_size = 2118,
      _mp_d = 0x964bc928}}, {{_mp_alloc = 1, _mp_size = 0, _mp_d = 0xbc4c33f8}}, {{_mp_alloc = 1, _mp_size = 0, _mp_d = 0xbc4c3408}}, {{_mp_alloc = 1,
      _mp_size = 0, _mp_d = 0xbc4c3418}}}
(gdb) print prod1->coeff[1]
$11 = {{_mp_alloc = 2124, _mp_size = 2123, _mp_d = 0x9649d9e8}}
(gdb) print prod2->coeff[1]
$12 = {{_mp_alloc = 2123, _mp_size = 2122, _mp_d = 0x9648f238}}
I'm so upset - I've begged, borrowed, and bartered enough CPU time to do the sieving in a day and a half, and I was hoping to show my family the fruits of my labor before going home - but now it seems all for naught.

It seems highly unlikely that I'd be unlucky enough to hit a bug on my very first run-through of a large number, but I don't know what I did wrong. I have the output of the first two stages saved, and of course the polynomial, sieve data, .chk, .cyc, .dep, .mat, and .mat.idx files.

Can anyone suggest what to try, besides weeping softly (which doesn't seem to be helping)?
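Reading the backtrace: the index2 values in frames #24 down to #13 (5357917, 2678958, 1339479, and so on down to 2616) are consistent with multiply_relations splitting an inclusive range of relation indices at its midpoint, recursing into each half, and combining the two partial products with gmp_poly_mul. The C sketch below only mirrors that shape under that assumption; left_spine and tree_depth are hypothetical helpers, not msieve functions, and no polynomial arithmetic is done. It assumes the range starts at index 0, which reproduces the index2 values exactly (the index1=594 gdb prints in those frames does not fit this pattern, possibly because of optimization).

```c
#include <assert.h>

/* Hypothetical mirror of the recursion shape seen in the backtrace:
   an inclusive range [lo, hi] of relation indices is split at
   mid = lo + (hi - lo) / 2, each half is handled recursively, and the two
   partial products are multiplied together. No arithmetic is done here. */

/* Record the upper index of each range on the leftmost path from the root
   down to a single relation (out[0] is the root). Returns entries stored. */
static int left_spine(long lo, long hi, long *out, int max_levels)
{
    int n = 0;
    assert(lo <= hi && max_levels > 0);
    while (n < max_levels) {
        out[n++] = hi;
        if (lo == hi)
            break;
        hi = lo + (hi - lo) / 2;   /* descend into the left half [lo, mid] */
    }
    return n;
}

/* Depth of the tree for [lo, hi]; a single relation has depth 0. The left
   half is never smaller than the right, so following it gives the maximum. */
static int tree_depth(long lo, long hi)
{
    assert(lo <= hi);
    int depth = 0;
    while (lo < hi) {
        hi = lo + (hi - lo) / 2;
        depth++;
    }
    return depth;
}
```

With 5,357,918 relations this gives a tree 23 levels deep. In frame #13 the operands prod1 and prod2 already have coefficients of about 2,120 limbs (roughly 68,000 bits with 32-bit limbs) after combining only about 2,600 relations, so if coefficient size grows roughly linearly with the number of relations, the two operands at the root would be on the order of a thousand times larger. That is the size range where gmp relies on its subquadratic (Karatsuba/Toom and, for the largest operands, FFT) multiplication, which fits the __gmpn_toom22_mul frames in the trace.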

2010-12-27, 08:17   #2
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England

Turn off the limit on stack space (I think ulimit -s unlimited is the command) and try again. Some older versions of gmp allocate unusually large temporary objects on the stack during the big multiplications in stage three, and most Linux distributions set rather small default stack limits.
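A minimal sketch of checking the stack limit from inside a program, using the POSIX getrlimit/setrlimit calls with RLIMIT_STACK; it prints values in KiB, the unit ulimit -s uses. Raising the soft limit in an already-running process may let the main thread's stack grow further on Linux, but the address-space layout is chosen at exec time, so running ulimit -s unlimited in the shell before starting msieve is the more reliable route. check_stack_limit and print_limit are illustrative names, not part of msieve or gmp.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Print an rlimit value in KiB, or "unlimited", as `ulimit -s` does. */
static void print_limit(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("%s: unlimited\n", label);
    else
        printf("%s: %llu KiB\n", label, (unsigned long long)(value / 1024));
}

/* Report the stack limits; if raise_soft is nonzero, lift the soft limit to
   the hard limit (allowed for unprivileged processes; the hard limit itself
   cannot be exceeded). Returns 0 on success, -1 on failure. */
static int check_stack_limit(int raise_soft)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    print_limit("soft stack limit", rl.rlim_cur);
    print_limit("hard stack limit", rl.rlim_max);

    if (raise_soft && rl.rlim_cur != rl.rlim_max) {
        rl.rlim_cur = rl.rlim_max;
        if (setrlimit(RLIMIT_STACK, &rl) != 0) {
            perror("setrlimit");
            return -1;
        }
        /* Re-read to confirm the kernel accepted the new soft limit. */
        struct rlimit check;
        if (getrlimit(RLIMIT_STACK, &check) != 0)
            return -1;
        assert(check.rlim_cur == rl.rlim_max);
        print_limit("new soft stack limit", check.rlim_cur);
    }
    return 0;
}
```

check_stack_limit(0) only prints the current limits; check_stack_limit(1) also raises the soft limit as far as the hard limit allows.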

If that doesn't work, download and compile gmp yourself, then rebuild msieve linked against the freshly compiled gmp.
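After rebuilding, it is worth confirming which libgmp the msieve process actually loads: the traces in this thread show /usr/lib/libgmp.so.3 on some runs and /usr/local/lib/libgmp.so.10 on others. ldd on the binary answers this from the shell; the sketch below does the same check from inside a running program with dl_iterate_phdr, which glibc declares in <link.h> when _GNU_SOURCE is defined. count_loaded and struct lib_query are hypothetical names.

```c
#define _GNU_SOURCE          /* glibc exposes dl_iterate_phdr under _GNU_SOURCE */
#include <assert.h>
#include <link.h>
#include <stdio.h>
#include <string.h>

struct lib_query {
    const char *needle;   /* substring to look for, e.g. "libgmp" */
    int matches;          /* loaded objects whose path contains it */
};

/* Called by dl_iterate_phdr once per loaded ELF object. */
static int match_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct lib_query *q = data;
    (void)size;
    if (info->dlpi_name != NULL && strstr(info->dlpi_name, q->needle) != NULL) {
        printf("loaded: %s\n", info->dlpi_name);
        q->matches++;
    }
    return 0;  /* returning nonzero would stop the iteration */
}

/* Print and count loaded shared objects whose path contains `needle`. */
static int count_loaded(const char *needle)
{
    assert(needle != NULL && needle[0] != '\0');  /* "" would match everything */
    struct lib_query q = { needle, 0 };
    dl_iterate_phdr(match_object, &q);
    return q.matches;
}
```

Calling count_loaded("libgmp") early in main would print the path of every gmp library mapped into the process and return how many there are.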

2010-12-27, 14:11   #3
jasonp
Tribal Bullet
Oct 2004

Another thing to check is the gcc version you are using: compiling with gcc 4.1.x occasionally produced segfaults in the square root, which went away after recompiling with gcc 4.2.x.
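One way to make sure a build never silently uses the gcc series mentioned above is a preprocessor guard plus a version query, sketched below. It relies only on gcc's predefined macros __GNUC__, __GNUC_MINOR__ and __GNUC_PATCHLEVEL__. clang also defines __GNUC__, so it is excluded from the warning. Where such a guard would go in msieve's sources is left open; gnuc_version and gnuc_at_least are hypothetical helpers.

```c
#include <assert.h>

/* Build-time guard: warn when compiling with gcc 4.1.x, the series reported
   in this thread to cause occasional square-root segfaults.
   clang also defines __GNUC__, so it is excluded explicitly. */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ == 4 && __GNUC_MINOR__ == 1
#warning "gcc 4.1.x has been reported to cause square-root segfaults; consider gcc 4.2 or later"
#endif

/* GNU C version of the compiler that built this file, encoded as
   major*10000 + minor*100 + patchlevel, or 0 if __GNUC__ is not defined.
   Under clang this is clang's GNU compatibility version, not its own. */
static long gnuc_version(void)
{
#ifdef __GNUC__
    return __GNUC__ * 10000L + __GNUC_MINOR__ * 100L + __GNUC_PATCHLEVEL__;
#else
    return 0;
#endif
}

/* 1 if the building compiler reports at least GNU C major.minor, else 0. */
static int gnuc_at_least(int major, int minor)
{
    assert(major >= 0 && minor >= 0 && minor < 100);  /* keep encoding unambiguous */
    return gnuc_version() >= major * 10000L + minor * 100L;
}
```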

Also, calm down :)

2010-12-28, 04:00   #4
tal
Nov 2010

No dice.
  • ulimit already reported unlimited
  • I was previously using gmp 4.3.2; I updated to 5.0.1, but that didn't fix the problem (more below)
  • Everything has been compiled with gcc 4.4.4

After changing the gmp version, I get bus errors:

Code:
multiplying 5357918 relations

Program received signal SIGBUS, Bus error.
[Switching to Thread 0xb73cb6c0 (LWP 4701)]
0xb76d729a in __gmpn_lshift () from /usr/local/lib/libgmp.so.10
(gdb) bt
#0  0xb76d729a in __gmpn_lshift () from /usr/local/lib/libgmp.so.10
#1  0xb7711abc in ?? () from /usr/local/lib/libgmp.so.10
#2  0xbf8d0f34 in ?? ()
#3  0xbf8d0f6c in ?? ()
#4  0xb76db0a5 in mpn_fft_mul_2exp_modF () from /usr/local/lib/libgmp.so.10
#5  0xbf8df7f0 in ?? ()
#6  0x00000032 in ?? ()
#7  0x00000010 in ?? ()
#8  0x991e8c2b in ?? ()
#9  0xbf8df7f0 in ?? ()
#10 0x0000000e in ?? ()
#11 0x00000032 in ?? ()
#12 0x00000010 in ?? ()
#13 0x00000000 in ?? ()
(gdb) quit
The program is running.  Exit anyway? (y or n) y



Program received signal SIGBUS, Bus error.
[Switching to Thread 0xb73456c0 (LWP 12979)]
0xb7650910 in __gmpn_add_n () from /usr/local/lib/libgmp.so.10
(gdb) bt
#0  0xb7650910 in __gmpn_add_n () from /usr/local/lib/libgmp.so.10
#1  0xbf93ab80 in ?? ()
#2  0xb768babc in ?? () from /usr/local/lib/libgmp.so.10
#3  0x0000010e in ?? ()
#4  0x0000010e in ?? ()
#5  0xb7678ab0 in __gmpn_mulmod_bnm1 () from /usr/local/lib/libgmp.so.10
#6  0x00000000 in ?? ()
The second trace is from a debug build of gmp (which didn't seem to help), configured as:

./configure --disable-shared --enable-assert --enable-alloca=debug --host=none CFLAGS=-g --enable-cxx CXXFLAGS=-g

I thought maybe I had a memory problem. The machine has 2 GB, and I ran memtester (a userspace tester, runnable without shutting the machine down) twice over 1.5 GB of RAM with no errors. top seemed to show msieve using only about 20% of available memory.
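For context on what a userspace tester like memtester can show: it can only exercise memory it manages to allocate, so with 2 GB installed and 1.5 GB tested, memory held by the kernel and other processes goes unchecked; boot-time testers such as memtest86+ avoid that limitation. The toy below only illustrates the write-then-verify idea with a few patterns. It is not memtester's algorithm, and check_pattern and toy_memtest are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Write seed ^ index into every 32-bit word, then read everything back.
   volatile keeps the compiler from skipping the memory round trip.
   Returns the number of words that did not read back correctly. */
static size_t check_pattern(volatile uint32_t *buf, size_t words, uint32_t seed)
{
    size_t errors = 0;
    for (size_t i = 0; i < words; i++)
        buf[i] = seed ^ (uint32_t)i;
    for (size_t i = 0; i < words; i++)
        if (buf[i] != (seed ^ (uint32_t)i))
            errors++;
    return errors;
}

/* Toy write-then-verify test over a freshly allocated buffer of `bytes`.
   Returns the total mismatch count, or SIZE_MAX if allocation fails. */
static size_t toy_memtest(size_t bytes)
{
    size_t words = bytes / sizeof(uint32_t);
    assert(words > 0);
    uint32_t *buf = malloc(words * sizeof *buf);
    if (buf == NULL)
        return SIZE_MAX;

    /* Seeds 0 and all-ones: every bit of every word is written as 0 and as 1. */
    size_t errors = check_pattern(buf, words, 0x00000000u)
                  + check_pattern(buf, words, 0xFFFFFFFFu);
    /* Single-bit seeds vary which bits flip together from word to word. */
    for (int bit = 0; bit < 32; bit++)
        errors += check_pattern(buf, words, UINT32_C(1) << bit);

    free(buf);
    return errors;
}
```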

2010-12-31, 03:14   #5
tal
Nov 2010

Good news! I got it to work! Bad news! I don't know how!

I moved the files to another PC and it ran successfully 2 out of 3 times. It failed once with the following:

Code:
*** glibc detected *** ./msieve: free(): invalid next size (normal): 0xbd0bb038 ***
======= Backtrace: =========
/lib/libc.so.6(+0x6b901)[0xb7629901]
/lib/libc.so.6(+0x6d168)[0xb762b168]
/lib/libc.so.6(+0x70d7a)[0xb762ed7a]
/lib/libc.so.6(realloc+0xdd)[0xb762f34d]
/usr/lib/libgmp.so.3(__gmp_default_reallocate+0x29)[0xb774d7a9]
/usr/lib/libgmp.so.3(__gmpz_realloc2+0x4f)[0xb77614ff]
./msieve(+0x3366c)[0xb77f566c]
======= Memory map: ========
8b9c2000-8ca90000 rw-p 00000000 00:00 0
98486000-a6c86000 rw-p 00000000 00:00 0
a75c4000-a8692000 rw-p 00000000 00:00 0
ab8fa000-abefa000 rw-p 00000000 00:00 0
ad9ed000-aeabb000 rw-p 00000000 00:00 0
afb88000-b0c56000 rw-p 00000000 00:00 0
b5600000-b5621000 rw-p 00000000 00:00 0
b5621000-b5700000 ---p 00000000 00:00 0
b578e000-b57a8000 r-xp 00000000 08:23 4197034    /usr/lib/gcc/i686-pc-linux-gnu/4.4.4/libgcc_s.so.1
b57a8000-b57a9000 r--p 0001a000 08:23 4197034    /usr/lib/gcc/i686-pc-linux-gnu/4.4.4/libgcc_s.so.1
b57a9000-b57aa000 rw-p 0001b000 08:23 4197034    /usr/lib/gcc/i686-pc-linux-gnu/4.4.4/libgcc_s.so.1
b57bc000-b75be000 rw-p 00000000 00:00 0
b75be000-b76ff000 r-xp 00000000 08:23 4630462    /lib/libc-2.11.2.so
b76ff000-b7701000 r--p 00141000 08:23 4630462    /lib/libc-2.11.2.so
b7701000-b7702000 rw-p 00143000 08:23 4630462    /lib/libc-2.11.2.so
b7702000-b7705000 rw-p 00000000 00:00 0
b7705000-b771a000 r-xp 00000000 08:23 4630539    /lib/libpthread-2.11.2.so
b771a000-b771b000 r--p 00014000 08:23 4630539    /lib/libpthread-2.11.2.so
b771b000-b771c000 rw-p 00015000 08:23 4630539    /lib/libpthread-2.11.2.so
b771c000-b771e000 rw-p 00000000 00:00 0
b771e000-b7742000 r-xp 00000000 08:23 4630479    /lib/libm-2.11.2.so
b7742000-b7743000 r--p 00023000 08:23 4630479    /lib/libm-2.11.2.so
b7743000-b7744000 rw-p 00024000 08:23 4630479    /lib/libm-2.11.2.so
b7744000-b778e000 r-xp 00000000 08:23 4196167    /usr/lib/libgmp.so.3.5.2
b778e000-b778f000 r--p 00049000 08:23 4196167    /usr/lib/libgmp.so.3.5.2
b778f000-b7790000 rw-p 0004a000 08:23 4196167    /usr/lib/libgmp.so.3.5.2
b77a0000-b77a3000 rw-p 00000000 00:00 0
b77a3000-b77a4000 r-xp 00000000 00:00 0          [vdso]
b77a4000-b77c0000 r-xp 00000000 08:23 4630439    /lib/ld-2.11.2.so
b77c0000-b77c1000 r--p 0001b000 08:23 4630439    /lib/ld-2.11.2.so
b77c1000-b77c2000 rw-p 0001c000 08:23 4630439    /lib/ld-2.11.2.so
b77c2000-b7845000 r-xp 00000000 08:23 5219987    /home/tom/working/factor/lastmile/msieve
b7845000-b7846000 r--p 00082000 08:23 5219987    /home/tom/working/factor/lastmile/msieve
b7846000-b7847000 rw-p 00083000 08:23 5219987    /home/tom/working/factor/lastmile/msieve
b7a8e000-bfbb5000 rw-p 00000000 00:00 0          [heap]
bfbb5000-bfbef000 rw-p 00000000 00:00 0          [stack]
Aborted

real    58m21.829s
user    0m58.662s
sys     57m7.099s
From here, I'm going to try to figure out whether it's the hardware failing (perhaps due to heat) or something in the optimization settings I was using.
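"free(): invalid next size" means glibc found the header of the next heap chunk damaged, which usually points to something writing past the end of an allocation. GMP lets a program install its own allocators with mp_set_memory_functions, and a guard-byte allocator with the matching signatures can report an overrun at the next realloc or free of the damaged block and name that block. The sketch below is hypothetical (guard_alloc, guard_realloc and guard_free are illustrative names) and is not registered with GMP here, so it builds without GMP. It only detects overruns that land in the trailing guard bytes; faulty RAM can corrupt memory in ways no allocator check will reliably catch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTES 16    /* trailing canary length (arbitrary choice) */
#define GUARD_FILL  0xA5  /* canary byte value (arbitrary choice) */

/* Header stored just before each user block; the union keeps user data
   aligned for any type, as malloc's result would be. */
typedef union {
    size_t size;
    max_align_t align;
} guard_header;

/* Return 1 if the guard bytes after the user block are untouched, else 0. */
static int guard_intact(const void *user)
{
    assert(user != NULL);
    const guard_header *h = (const guard_header *)user - 1;
    const unsigned char *tail = (const unsigned char *)user + h->size;
    for (size_t i = 0; i < GUARD_BYTES; i++)
        if (tail[i] != GUARD_FILL)
            return 0;
    return 1;
}

/* Abort with a message naming the damaged block. */
static void guard_check_or_die(const void *user, const char *where)
{
    if (!guard_intact(user)) {
        fprintf(stderr, "%s: overrun past block %p\n", where, user);
        abort();
    }
}

/* Finish the layout [header | n user bytes | guard bytes]. */
static void *guard_finish(guard_header *h, size_t n)
{
    if (h == NULL) {
        fputs("guard allocator: out of memory\n", stderr);
        abort();
    }
    h->size = n;
    memset((unsigned char *)(h + 1) + n, GUARD_FILL, GUARD_BYTES);
    return h + 1;
}

/* Same shape as GMP's allocate function: void *(*)(size_t). */
static void *guard_alloc(size_t n)
{
    return guard_finish(malloc(sizeof(guard_header) + n + GUARD_BYTES), n);
}

/* Same shape as GMP's reallocate function; this sketch trusts its own
   stored size rather than the old_size the caller passes. */
static void *guard_realloc(void *user, size_t old_size, size_t new_size)
{
    (void)old_size;
    guard_check_or_die(user, "guard_realloc");
    guard_header *h = (guard_header *)user - 1;
    return guard_finish(realloc(h, sizeof(guard_header) + new_size + GUARD_BYTES),
                        new_size);
}

/* Same shape as GMP's free function. */
static void guard_free(void *user, size_t size)
{
    (void)size;
    guard_check_or_die(user, "guard_free");
    free((guard_header *)user - 1);
}
```

To try it with msieve, one would call mp_set_memory_functions(guard_alloc, guard_realloc, guard_free) at the start of main, before any gmp object is created.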



All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.