-   -   GGNFS 64-bit sieve crash (https://www.mersenneforum.org/showthread.php?t=12887)

Raman 2009-12-19 09:54

GGNFS 64-bit sieve crash
 
While sieving for 7,320+
[code]n: 154758218728434566356200475118402575823277636002771024732774611297531505856368662705022790238791621423554625142974233023501037614707598549788891266955273526560342732155180961801351291833302132131894401
m: 1219760487635835700138573862562971820755615294131238401
c4: 1
c3: -1
c2: 1
c1: -1
c0: 1
skew: 1
type: snfs
rlim: 125000000
alim: 25000000
lpbr: 31
lpba: 29
mfbr: 62
mfba: 58
rlambda: 2.6
alambda: 2.6[/code]
on the rational side, at q = 28677949, I am getting a segmentation fault in GGNFS. Why?
The job file above is included for reference.

By the way, I am sieving another 10 million range of special-q on the algebraic side for 2,1778L.

Will the matrix be smaller for a given number if 30-bit large primes are used instead of 31-bit, or 29-bit instead of 30-bit? I ask because the matrix would have to fit within 2 GB of RAM if the 4 GB system is not ready soon.

Batalov 2009-12-20 21:03

[quote=Raman;199348]gnfs-lasieve4I15e for 7,320+. 64 bit Linux binary, Core 2 Duo.[/quote]
Hmm. Still doesn't crash here. (This is like downforeveryoneorjustme.com - if a crash cannot be reproduced, it cannot be debugged. Even though I am interested to debug it.)

Following-up:
Can you reproduce it or was it just then?
What was the rlim for that particular chunk* (if you were using the perl script)?
In other words, what was the particular command line?

It would be most helpful if you could produce the exact command line and probably the pointer to the exact binary.

____
*if you don't know it, have a look at any consecutive command lines that are running now or before; using qintsize value it is possible to step back to the defective region. rlim is adjusted/lowered to the start_q once and then doesn't grow during the run of the program.

Raman 2009-12-21 05:00

[quote=Batalov;199427]
Following-up:
Can you reproduce it or was it just then?
What was the rlim for that particular chunk* (if you were using the perl script)?
In other words, what was the particular command line?

It would be most helpful if you could produce the exact command line and probably the pointer to the exact binary.
[/quote]

rlim was exactly the same value, 125000000, and alim was 25000000. I simply added
q0: 28677949
qintsize: 322051
#q1: 29000000
to the job file.
Yes, it is true that the gnfs binary lowers the factor base bound to just below the starting special-q, setting it to 28677948.
Why is that so? Does it have any effect
on the yield, cutting out some of the relations that would otherwise have been produced?
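
The range arithmetic in these job-file lines can be checked with a minimal sketch. It assumes (the thread does not state it) that the sieved interval is q0 up to, but not including, q0 + qintsize; the helper name `special_q_range` is hypothetical and not part of GGNFS.

```python
def special_q_range(q0, qintsize):
    """Return (start, end) of the special-q interval; end is assumed exclusive."""
    return q0, q0 + qintsize

start, end = special_q_range(28677949, 322051)
print(start, end)      # end matches the commented-out q1 value, 29000000

# The lowered value mentioned above is exactly one below the starting special-q.
lowered_bound = start - 1
print(lowered_bound)   # 28677948
```

This only confirms that the three numbers are consistent with one another; it says nothing about why the binary lowers the bound.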

It is a 64-bit Linux binary. I downloaded the source code from Jeff Gilchrist's website and compiled it myself.

Command line is as follows:
nohup ~/64bit/gnfs-lasieve4I15e -k -o spairs2.out -v -n0 -r 7_320P.job
I did not use the perl script at all. I simply sieve a different range on each computer, using only this command line.

Anyway, it is no problem that I skipped this special-q value.

HAPPY WINTER SOLSTICE DAY!
(NORTHERN HEMISPHERE - WINTER SOLSTICE)
(SOUTHERN HEMISPHERE - SUMMER SOLSTICE)
Remember that today is December 21 again.

Batalov 2009-12-21 05:13

You did the right thing by skipping.
Then, there's nothing left to do for now, because it is not reproducible.

Happy Solstice Day!
(I do know quite a few people who celebrate it.)

jrk 2009-12-21 05:36

[QUOTE=Batalov;199469]Then, there's nothing left to do for now, because it is not reproducible.[/QUOTE]
FYI: I can reproduce it.

[code]
$ ~/ggnfs/bin/gnfs-lasieve4I15e -k -o spairs2.out -v -n0 -r 7_320P.job
Warning: lowering FB_bound to 28677948.
FBsize 1565317+0 (deg 4), 1780990+0 (deg 1)
Segmentation fault
[/code]

This is the experimental 64bit (athlon64) lasieve4I15e from SVN 353.

Batalov 2009-12-21 05:41

Ok, can you do it under gdb and do "bt" when it crashes?
Thanks.

I have this:
[code]$ ~/KF/gnfs_lasieve_source/gnfs-lasieve4I15e -r t.poly -f 28677949 -c 20
Warning: lowering FB_bound to 28677948.
total yield: 56, q=28677989 (0.35679 sec/rel)
[/code]

Aha. Found some old binary -- there's something; that's the unlikely old "mpqs failed" thing:
[CODE]
$ ../old_lasieve4_64/gnfs-lasieve4I15e -r t.poly -f 28677949 -c 22
Warning: lowering FB_bound to 28677948.
mpqs failed for 2926569634690573369(a,b): 10245291 49019062
total yield: 55, q=28677989 (0.35600 sec/rel)

[/CODE]
It still doesn't crash, but
I'll have a look at this one single different relation -- it has a prime square:
2926569634690573369 = 1710721963[SUP]2[/SUP]
Something is incompatible with your GMP library. Hmm. Will think.
Maybe it crashes in printf, actually.
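
The square identity quoted above can be verified directly. The sketch below also prints the bit lengths, which happen to line up with `mfbr: 62` and `lpbr: 31` in the job file. It does not test whether the root is prime, and whether this input is what trips mpqs is not established here.

```python
import math

cofactor = 2926569634690573369   # value from the "mpqs failed" message
root = math.isqrt(cofactor)      # exact integer square root (Python 3.8+)

print(root, root * root == cofactor)              # 1710721963 True
print(cofactor.bit_length(), root.bit_length())   # 62 31
```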

Thanks for the case.

jrk 2009-12-21 05:44

[QUOTE=jrk;199474]This is the experimental 64bit (athlon64) lasieve4I15e from SVN 353.[/QUOTE]

Also I should probably mention, it is dynamically linked with mpir 1.2.1 instead of gmp.

jrk 2009-12-21 05:46

[QUOTE=Batalov;199476]Ok, can you do it under gdb and do "bt" when it crashes?
Thanks.[/QUOTE]

That is one of the next things I was preparing to do. First I wanted to try the latest SVN (I needed an excuse to update anyway).

If that still fails, I'll get a trace for you.

jrk 2009-12-21 06:48

[QUOTE=jrk;199479]That is one of the next things I was preparing to do. First I wanted to try the latest SVN (I needed an excuse to update anyway).

If that still fails, I'll get a trace for you.[/QUOTE]

Yes, the latest SVN still crashes. Here is a backtrace from gdb.

[code]#0 0x0000000000418a9a in mpqs_decompose () at mpqs.c:1334
#1 0x000000000041ac20 in mpqs_factor0 (N=0x721130, max_bits=31,
factors=0x7fff26c45b18, retry=1) at mpqs.c:1911
#2 0x000000000041ad3a in mpqs_factor (N=0x721130, max_bits=31,
factors=0x7fff26c45b18) at mpqs.c:1958
#3 0x000000000040eda0 in output_tdsurvivor (fbp_buf0=0x28d2a28,
fbp_buf0_ub=0x28d2a4c, fbp_buf1=0x28d2a4c, fbp_buf1_ub=0x28d2a58,
lf0=0x28ba4e0, lf1=0x28ba4f0) at gnfs-lasieve4e.c:3944
#4 0x000000000040eb8b in output_all_tdsurvivors () at gnfs-lasieve4e.c:3895
#5 0x000000000040acdb in main (argc=8, argv=0x7fff26c46d38)
at gnfs-lasieve4e.c:2686
[/code]

It only crashes if it is dynamically linked (default is to build static). Also it did not crash from within gdb so I had to load a core file into gdb instead.

I will post the valgrind output next.

jrk 2009-12-21 06:50

[code]$ valgrind ./gnfs-lasieve4I15e -k -o spairs2.out -v -n0 -r 7_320P.job
==15362== Memcheck, a memory error detector.
==15362== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==15362== Using LibVEX rev 1804, a library for dynamic binary translation.
==15362== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==15362== Using valgrind-3.3.0, a dynamic binary instrumentation framework.
==15362== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==15362== For more details, rerun with: -v
==15362==
Warning: lowering FB_bound to 28677948.
FBsize 1565317+0 (deg 4), 1780990+0 (deg 1)
==15362== Warning: set address range perms: large range 139230208 (undefined)
==15362== Conditional jump or move depends on uninitialised value(s)
==15362== at 0x40A5DB: main (gnfs-lasieve4e.c:2405)
==15362==
==15362== Invalid read of size 8
==15362== at 0x41D531: (within /home/prime/tmp/gnfs-lasieve4I15e)
==15362== by 0x40AC4F: main (gnfs-lasieve4e.c:2651)
==15362== Address 0x4eec5e3 is not stack'd, malloc'd or (recently) free'd
==15362==
==15362== Invalid read of size 1
==15362== at 0x41D7EF: (within /home/prime/tmp/gnfs-lasieve4I15e)
==15362== by 0x40AC4F: main (gnfs-lasieve4e.c:2651)
==15362== Address 0x4eec66c is not stack'd, malloc'd or (recently) free'd
==15362==
==15362== Conditional jump or move depends on uninitialised value(s)
==15362== at 0x41D7F3: (within /home/prime/tmp/gnfs-lasieve4I15e)
==15362== by 0x40AC4F: main (gnfs-lasieve4e.c:2651)
==15362==
==15362== Conditional jump or move depends on uninitialised value(s)
==15362== at 0x41D7F7: (within /home/prime/tmp/gnfs-lasieve4I15e)
==15362== by 0x40AC4F: main (gnfs-lasieve4e.c:2651)
==15362==
==15362== Invalid read of size 8
==15362== at 0x41D4FE: (within /home/prime/tmp/gnfs-lasieve4I15e)
==15362== by 0x40AC4F: main (gnfs-lasieve4e.c:2651)
==15362== Address 0x4eebff9 is 65,529 bytes inside a block of size 65,536 alloc'd
==15362== at 0x4A04FC0: memalign (vg_replace_malloc.c:460)
==15362== by 0x40F3CF: xvalloc (if.w:103)
==15362== by 0x404EA8: main (gnfs-lasieve4e.c:999)
==15362==
==15362== Invalid read of size 4
==15362== at 0x41C2AA: (within /home/prime/tmp/gnfs-lasieve4I15e)
==15362== by 0xA90A423: ???
==15362== by 0x7FEFFF43F: ???
==15362== Address 0xa90a428 is 0 bytes after a block of size 7,123,960 alloc'd
==15362== at 0x4A0739E: malloc (vg_replace_malloc.c:207)
==15362== by 0x40F381: xmalloc (if.w:93)
==15362== by 0x405326: main (gnfs-lasieve4e.c:1080)
==15362==
==15362== Invalid read of size 4
==15362== at 0x41C2AA: (within /home/prime/tmp/gnfs-lasieve4I15e)
==15362== by 0x964CA3F: ???
==15362== by 0x7FEFFF43F: ???
==15362== Address 0x964ca44 is 0 bytes after a block of size 6,261,268 alloc'd
==15362== at 0x4A0739E: malloc (vg_replace_malloc.c:207)
==15362== by 0x40F381: xmalloc (if.w:93)
==15362== by 0x405326: main (gnfs-lasieve4e.c:1080)
==15362==
==15362== Use of uninitialised value of size 8
==15362== at 0x418A9A: mpqs_decompose (mpqs.c:1334)
==15362== by 0x41AC1F: mpqs_factor0 (mpqs.c:1911)
==15362== by 0x41AD39: mpqs_factor (mpqs.c:1958)
==15362== by 0x40ED9F: output_tdsurvivor (gnfs-lasieve4e.c:3944)
==15362== by 0x40EB8A: output_all_tdsurvivors (gnfs-lasieve4e.c:3895)
==15362== by 0x40ACDA: main (gnfs-lasieve4e.c:2686)
==15362==
==15362== Invalid read of size 2
==15362== at 0x418A9A: mpqs_decompose (mpqs.c:1334)
==15362== by 0x41AC1F: mpqs_factor0 (mpqs.c:1911)
==15362== by 0x41AD39: mpqs_factor (mpqs.c:1958)
==15362== by 0x40ED9F: output_tdsurvivor (gnfs-lasieve4e.c:3944)
==15362== by 0x40EB8A: output_all_tdsurvivors (gnfs-lasieve4e.c:3895)
==15362== by 0x40ACDA: main (gnfs-lasieve4e.c:2686)
==15362== Address 0x7348b8 is not stack'd, malloc'd or (recently) free'd
==15362==
==15362== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==15362== Access not within mapped region at address 0x7348B8
==15362== at 0x418A9A: mpqs_decompose (mpqs.c:1334)
==15362== by 0x41AC1F: mpqs_factor0 (mpqs.c:1911)
==15362== by 0x41AD39: mpqs_factor (mpqs.c:1958)
==15362== by 0x40ED9F: output_tdsurvivor (gnfs-lasieve4e.c:3944)
==15362== by 0x40EB8A: output_all_tdsurvivors (gnfs-lasieve4e.c:3895)
==15362== by 0x40ACDA: main (gnfs-lasieve4e.c:2686)
==15362==
==15362== ERROR SUMMARY: 421663 errors from 10 contexts (suppressed: 4 from 1)
==15362== malloc/free: in use at exit: 215,797,575 bytes in 68,357 blocks.
==15362== malloc/free: 68,809 allocs, 452 frees, 310,122,029 bytes allocated.
==15362== For counts of detected errors, rerun with: -v
==15362== searching for pointers to 68,357 not-freed blocks.
==15362== checked 94,228,568 bytes.
==15362==
==15362== LEAK SUMMARY:
==15362== definitely lost: 53,264 bytes in 12 blocks.
==15362== possibly lost: 0 bytes in 0 blocks.
==15362== still reachable: 215,744,311 bytes in 68,345 blocks.
==15362== suppressed: 0 bytes in 0 blocks.
==15362== Rerun with --leak-check=full to see details of leaked memory.
Segmentation fault
[/code]
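
A few numbers in the valgrind log can be cross-checked against each other. The sketch below assumes 4 bytes per factor base entry, which is a guess that the log does not confirm. Under that assumption the two malloc'd blocks match the two FBsize counts exactly, so the "0 bytes after a block" reads would be one element past the end of those arrays. This is an observation, not a diagnosis.

```python
# Numbers copied from the run above.
fb_sizes = {"deg 4": 1565317, "deg 1": 1780990}   # from the "FBsize" line
block_sizes = [6261268, 7123960]                   # malloc'd blocks valgrind reports

ENTRY_BYTES = 4   # assumption: one 4-byte entry per factor base prime
for (side, count), size in zip(fb_sizes.items(), block_sizes):
    print(side, count * ENTRY_BYTES == size)       # True for both sides

# "Invalid read of size 8 ... 65,529 bytes inside a block of size 65,536"
offset, read_size, block = 65529, 8, 65536
overrun = offset + read_size - block
print(overrun)                                     # 1 byte past the end of the block
```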

Batalov 2009-12-21 06:54

[COLOR=green]I'll need to find a Core2, then. Static linking will help against linking to a wrong lib.[/COLOR]

Earlier I wrote:
I've looked up the version 353 - that's fairly old (almost the original source), surely it did have problems. Crashed on me too (and it had zero yield in some ranges).

The new version however will need care at compilation time.
Re-read the INSTALL file --
========
on Core2 replace in athlon64/ls-defs.asm
- define(l1_bits,16)dnl
+ define(l1_bits,[B]15[/B])dnl
========
The C source part will use L1_BITS 15, but the asm part is not under control of any scripts, so simply edit that manually for Intel CPUs, all of them. If it is miscompiled, it will be easy to see though - it will complain all over the place (at run-time).

Athlons, Phenoms: will get a boost and nothing will need to be changed for them. (They will use L1_BITS 16 and define(l1_bits,16)dnl.)
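
As a rough illustration of the setting, assume `l1_bits` is the base-2 logarithm of the L1 sieve region size in bytes (the INSTALL excerpt does not spell this out). Then 15 and 16 correspond to 32 KiB and 64 KiB, which matches the usual L1 data cache sizes of Core 2 and Athlon 64 respectively. The helper `set_l1_bits` below is hypothetical. It only rewrites the define line inside a string and is not part of the GGNFS build.

```python
import re

def set_l1_bits(asm_text, bits):
    """Rewrite the define(l1_bits,N)dnl line in m4/asm source text (hypothetical helper)."""
    return re.sub(r"define\(l1_bits,\d+\)dnl", f"define(l1_bits,{bits})dnl", asm_text)

original = "define(l1_bits,16)dnl\n"
patched = set_l1_bits(original, 15)
print(patched.strip())                     # define(l1_bits,15)dnl

for bits in (15, 16):
    print(bits, (1 << bits) // 1024, "KiB")   # 15 -> 32 KiB, 16 -> 64 KiB
```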

Good luck. --S

