mersenneforum.org  

Old 2017-01-30, 20:17   #1
rudi_m
 
Jul 2005

B616 Posts
mprime 29.1.8, losing work

Hi,

I'm using the 29.1.8 pre-release for TF (trial factoring) and noticed an issue when it writes the job state to disk.

[Work thread Jan 30 19:34] Trial factoring M748000307 to 2^76 is 62.28% complete. Time: 1364.301 sec.
[Work thread Jan 30 19:57] Trial factoring M748000307 to 2^76 is 70.06% complete. Time: 1363.816 sec.
[Work thread Jan 30 20:19] Trial factoring M748000307 to 2^76 is 77.85% complete. Time: 1364.220 sec.
[Work thread Jan 30 20:42] Trial factoring M748000307 to 2^76 is 85.64% complete. Time: 1364.398 sec.
[Main thread Jan 30 21:04] Stopping all worker threads.
[Work thread Jan 30 21:04] Error writing intermediate file: f748000307
[Work thread Jan 30 21:04] Worker stopped.
[Main thread Jan 30 21:04] Execution halted.
[Main thread Jan 30 21:04] Choose Test/Continue to restart.

### here I killed mprime with SIGTERM and started again

[Main thread Jan 30 21:08] Mersenne number primality test program version 29.1
[Main thread Jan 30 21:08] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 6 MB
[Main thread Jan 30 21:08] Starting worker.
[Work thread Jan 30 21:08] Worker starting
[Work thread Jan 30 21:08] Setting affinity to run worker on CPU core #1
[Work thread Jan 30 21:08] Setting affinity to run helper thread 2 on CPU core #3
[Work thread Jan 30 21:08] Setting affinity to run helper thread 1 on CPU core #2
[Work thread Jan 30 21:08] Resuming trial factoring of M748000307 to 2^77
[Work thread Jan 30 21:08] Trial factoring M748000307 to 2^76 is 70.50% complete.


So it lost about one hour of work. In the working directory there is now a ".write" file (I've never seen one before):
Code:
-rw-r--r-- 1 rudi users     80 2017-01-30 19:58 f748000307
-rw-r--r-- 1 rudi users     80 2017-01-30 17:58 f748000307.bu
-rw-r--r-- 1 rudi users      0 2017-01-30 21:04 f748000307.write

Last fiddled with by rudi_m on 2017-01-30 at 20:18
Old 2017-01-30, 22:21   #2
Batalov
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

2·47·101 Posts

The .write files are always created; you just don't normally see them, because the steps that follow happen very quickly. In sequence: .bu2 is deleted, .bu is renamed to .bu2, the "p*" (or "f*", or "m*") file is renamed to .bu, and finally .write is renamed to the "p*" (or "f*", or "m*") file.

Was the disk full, or did the anti-virus act up, or something like that?
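
The rotation described above can be sketched roughly as follows. This is only an illustration in Python, not mprime's actual code (mprime is written in C); the function names `save_with_backups` and `demo` are made up for the example, and the `fsync` call is an assumption about what a careful implementation might do.

```python
import os
import tempfile


def save_with_backups(path: str, data: bytes) -> None:
    """Write data to path, keeping the two previous versions as .bu and .bu2."""
    tmp = path + ".write"
    # Step 0: write the new state to a temporary ".write" file first.
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # assumption: force the data to disk before renaming
    # Step 1: delete the oldest backup, if there is one.
    if os.path.exists(path + ".bu2"):
        os.remove(path + ".bu2")
    # Step 2: .bu becomes .bu2.
    if os.path.exists(path + ".bu"):
        os.rename(path + ".bu", path + ".bu2")
    # Step 3: the current save file becomes .bu.
    if os.path.exists(path):
        os.rename(path, path + ".bu")
    # Step 4: the freshly written ".write" file becomes the current save file.
    os.rename(tmp, path)


def demo() -> dict:
    """Save three times into a scratch directory and return {file name: contents}."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "f748000307")
        for state in (b"62%", b"70%", b"77%"):
            save_with_backups(path, state)
        result = {}
        for name in sorted(os.listdir(d)):
            with open(os.path.join(d, name), "rb") as f:
                result[name] = f.read()
        return result
```

In this sketch, a zero-length .write file is left behind only if the process stops after the temporary file has been opened but before any data has been written to it. Whether that is what happens inside mprime is a guess.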
Old 2017-01-31, 14:01   #3
rudi_m
 
Jul 2005

B616 Posts

Quote:
Originally Posted by Batalov
Was the disk full, or did the anti-virus act up, or something like that?

The disk is fine, and I can reproduce it on all machines. The normal periodic write (DiskWriteTime) works. It only breaks when mprime is killed. Maybe the signal handler runs in the wrong thread. AFAIR I saw such a potential issue the last time I looked into the source code.
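
To illustrate the kind of design that avoids a "handler in the wrong thread" problem, here is a minimal sketch in Python. It is not mprime's code and makes no claim about how mprime handles signals; all names (`stop_requested`, `handle_sigterm`, `worker`, `run_until_sigterm`) are hypothetical. The idea is that the handler only sets a flag, and the work thread, which owns the state, writes the final save file itself. Note that Python always runs signal handlers in the main thread, whereas in a multithreaded C program a signal sent to the process may be handled by any thread that has not blocked it.

```python
import signal
import threading
import time

# Shared flag: set by the signal handler, polled by the work thread.
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    # Keep the handler trivial: only record that a stop was requested.
    stop_requested.set()


def worker(save_path: str, progress: list) -> None:
    """Simulated work thread: it owns its state and writes the save file itself."""
    done = 0
    while not stop_requested.is_set():
        done += 1          # stand-in for one chunk of real work
        time.sleep(0.01)
    # The final checkpoint is written by the thread that owns the data.
    with open(save_path, "w") as f:
        f.write(str(done))
    progress.append(done)


def run_until_sigterm(save_path: str) -> int:
    """Install the handler, start the worker, and wait until it has saved."""
    # signal.signal() has to be called from the main thread.
    signal.signal(signal.SIGTERM, handle_sigterm)
    progress = []
    t = threading.Thread(target=worker, args=(save_path, progress))
    t.start()
    while t.is_alive():
        t.join(timeout=0.1)  # wake up periodically while waiting for the worker
    return progress[0]
```

Calling `run_until_sigterm("save.txt")` from the main thread and then sending the process a SIGTERM should let the worker finish its current chunk, write the file, and return the number of chunks completed.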

Moreover, I've noticed another issue. The log below shows a crash that happened right after a TF job finished and mprime continued with an LL job:

Code:
[...]
[Work thread Jan 30 21:45] Trial factoring M776000053 to 2^77 is 81.06% complete.  Time: 1471.301 sec.
[Comm thread Jan 30 22:09] Updating computer information on the server
[Comm thread Jan 30 22:09] Sending expected completion date for M776000053: Jan 31 2017
[Comm thread Jan 30 22:09] Sending expected completion date for M39320129: Feb  2 2017
[Comm thread Jan 30 22:09] Sending expected completion date for M906000041: Feb  2 2017
[Comm thread Jan 30 22:09] Sending expected completion date for M907000063: Feb  3 2017
[Comm thread Jan 30 22:09] Sending expected completion date for M908000053: Feb  3 2017
[Comm thread Jan 30 22:09] Sending expected completion date for M909000061: Feb  4 2017
[Comm thread Jan 30 22:09] Done communicating with server.
[Work thread Jan 30 22:10] Trial factoring M776000053 to 2^77 is 85.10% complete.  Time: 1476.194 sec.
[Work thread Jan 30 22:34] Trial factoring M776000053 to 2^77 is 89.14% complete.  Time: 1468.951 sec.
[Work thread Jan 30 22:59] Trial factoring M776000053 to 2^77 is 93.18% complete.  Time: 1469.562 sec.
[Work thread Jan 30 23:24] Trial factoring M776000053 to 2^77 is 97.22% complete.  Time: 1513.716 sec.
[Work thread Jan 30 23:41] M776000053 no factor from 2^76 to 2^77, Wg8: 17ED2BFC
[Comm thread Jan 30 23:41] Sending result to server: UID: rudimeier/lakshmi, M776000053 no factor from 2^76 to 2^77, Wg8: 17ED2BFC, AID: FDFC9C4E4763840E571DB046AC77D795
[Comm thread Jan 30 23:41]
[Work thread Jan 30 23:41] Setting affinity to run helper thread 2 on CPU core #3
[Work thread Jan 30 23:41] Setting affinity to run helper thread 1 on CPU core #2
[Work thread Jan 30 23:41] Starting primality test of M39320129 using FMA3 FFT length 2M, Pass1=512, Pass2=4K, 3 threads
[Comm thread Jan 30 23:41] PrimeNet success code with additional info:
[Comm thread Jan 30 23:41] CPU credit is 19.7219 GHz-days.
[Comm thread Jan 30 23:41] Done communicating with server.
[Work thread Jan 30 23:43] Iteration: 36117/39320129, Possible error: round off (7.220114937e+22) > 0.40625
[Work thread Jan 30 23:43] Continuing from last save file.
*** Error in `./mprime': free(): invalid pointer: 0x00007f6e5c0123f0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7364f)[0x7f6e653ed64f]
/lib64/libc.so.6(+0x78eae)[0x7f6e653f2eae]
/lib64/libc.so.6(+0x79b87)[0x7f6e653f3b87]
./mprime[0x44dc49]
./mprime[0x44dd3e]
./mprime[0x436182]
./mprime[0x439155]
./mprime[0x439375]
./mprime[0x4394e8]
./mprime[0x463efa]
/lib64/libpthread.so.0(+0x80db)[0x7f6e65e440db]
/lib64/libc.so.6(clone+0x6d)[0x7f6e65460e3d]
======= Memory map: ========
00400000-0263a000 r-xp 00000000 00:26 19924973                           /home/rudi/MPrime/mprime-29.1.8/mprime
0283a000-0284c000 rwxp 0223a000 00:26 19924973                           /home/rudi/MPrime/mprime-29.1.8/mprime
0284c000-02871000 rwxp 00000000 00:00 0
02af7000-02b18000 rwxp 00000000 00:00 0                                  [heap]
7f6e4afc3000-7f6e4c000000 rwxp 00000000 00:00 0
7f6e4c000000-7f6e4c021000 rwxp 00000000 00:00 0
7f6e4c021000-7f6e50000000 ---p 00000000 00:00 0
7f6e50000000-7f6e50086000 rwxp 00000000 00:00 0
7f6e50086000-7f6e54000000 ---p 00000000 00:00 0
7f6e54000000-7f6e54021000 rwxp 00000000 00:00 0
7f6e54021000-7f6e58000000 ---p 00000000 00:00 0
7f6e58000000-7f6e58024000 rwxp 00000000 00:00 0
7f6e58024000-7f6e5c000000 ---p 00000000 00:00 0
7f6e5c000000-7f6e5c05f000 rwxp 00000000 00:00 0
7f6e5c05f000-7f6e60000000 ---p 00000000 00:00 0
7f6e606dd000-7f6e61b34000 rwxp 00000000 00:00 0
7f6e61b34000-7f6e61b48000 r-xp 00000000 fe:02 1452658                    /lib64/libresolv-2.18.so
7f6e61b48000-7f6e61d47000 ---p 00014000 fe:02 1452658                    /lib64/libresolv-2.18.so
7f6e61d47000-7f6e61d48000 r-xp 00013000 fe:02 1452658                    /lib64/libresolv-2.18.so
7f6e61d48000-7f6e61d49000 rwxp 00014000 fe:02 1452658                    /lib64/libresolv-2.18.so
7f6e61d49000-7f6e61d4b000 rwxp 00000000 00:00 0
7f6e61d4b000-7f6e61d50000 r-xp 00000000 fe:02 1441945                    /lib64/libnss_dns-2.18.so
7f6e61d50000-7f6e61f4f000 ---p 00005000 fe:02 1441945                    /lib64/libnss_dns-2.18.so
7f6e61f4f000-7f6e61f50000 r-xp 00004000 fe:02 1441945                    /lib64/libnss_dns-2.18.so
7f6e61f50000-7f6e61f51000 rwxp 00005000 fe:02 1441945                    /lib64/libnss_dns-2.18.so
7f6e61f51000-7f6e61f5c000 r-xp 00000000 fe:02 1452653                    /lib64/libnss_files-2.18.so
7f6e61f5c000-7f6e6215b000 ---p 0000b000 fe:02 1452653                    /lib64/libnss_files-2.18.so
7f6e6215b000-7f6e6215c000 r-xp 0000a000 fe:02 1452653                    /lib64/libnss_files-2.18.so
7f6e6215c000-7f6e6215d000 rwxp 0000b000 fe:02 1452653                    /lib64/libnss_files-2.18.so
7f6e6215d000-7f6e6215e000 ---p 00000000 00:00 0
7f6e6215e000-7f6e6295e000 rwxp 00000000 00:00 0
7f6e6295e000-7f6e6295f000 ---p 00000000 00:00 0
7f6e6295f000-7f6e6315f000 rwxp 00000000 00:00 0
7f6e6315f000-7f6e63160000 ---p 00000000 00:00 0
7f6e63160000-7f6e63960000 rwxp 00000000 00:00 0
7f6e63960000-7f6e63961000 ---p 00000000 00:00 0
7f6e63961000-7f6e64161000 rwxp 00000000 00:00 0
7f6e64161000-7f6e64162000 ---p 00000000 00:00 0
7f6e64162000-7f6e64962000 rwxp 00000000 00:00 0
7f6e64962000-7f6e64963000 ---p 00000000 00:00 0
7f6e64963000-7f6e65163000 rwxp 00000000 00:00 0
7f6e65163000-7f6e65179000 r-xp 00000000 fe:02 1441914                    /lib64/libgcc_s.so.1
7f6e65179000-7f6e65378000 ---p 00016000 fe:02 1441914                    /lib64/libgcc_s.so.1
7f6e65378000-7f6e65379000 r-xp 00015000 fe:02 1441914                    /lib64/libgcc_s.so.1
7f6e65379000-7f6e6537a000 rwxp 00016000 fe:02 1441914                    /lib64/libgcc_s.so.1
7f6e6537a000-7f6e6551e000 r-xp 00000000 fe:02 1441863                    /lib64/libc-2.18.so
7f6e6551e000-7f6e6571e000 ---p 001a4000 fe:02 1441863                    /lib64/libc-2.18.so
7f6e6571e000-7f6e65722000 r-xp 001a4000 fe:02 1441863                    /lib64/libc-2.18.so
7f6e65722000-7f6e65724000 rwxp 001a8000 fe:02 1441863                    /lib64/libc-2.18.so
7f6e65724000-7f6e65728000 rwxp 00000000 00:00 0
7f6e65728000-7f6e6572b000 r-xp 00000000 fe:02 1441967                    /lib64/libdl-2.18.so
7f6e6572b000-7f6e6592a000 ---p 00003000 fe:02 1441967                    /lib64/libdl-2.18.so
7f6e6592a000-7f6e6592b000 r-xp 00002000 fe:02 1441967                    /lib64/libdl-2.18.so
7f6e6592b000-7f6e6592c000 rwxp 00003000 fe:02 1441967                    /lib64/libdl-2.18.so
7f6e6592c000-7f6e65a16000 r-xp 00000000 fe:02 22008                      /usr/lib64/libstdc++.so.6.0.18
7f6e65a16000-7f6e65c15000 ---p 000ea000 fe:02 22008                      /usr/lib64/libstdc++.so.6.0.18
7f6e65c15000-7f6e65c1d000 r-xp 000e9000 fe:02 22008                      /usr/lib64/libstdc++.so.6.0.18
7f6e65c1d000-7f6e65c1f000 rwxp 000f1000 fe:02 22008                      /usr/lib64/libstdc++.so.6.0.18
7f6e65c1f000-7f6e65c34000 rwxp 00000000 00:00 0
7f6e65c34000-7f6e65c3b000 r-xp 00000000 fe:02 1452659                    /lib64/librt-2.18.so
7f6e65c3b000-7f6e65e3a000 ---p 00007000 fe:02 1452659                    /lib64/librt-2.18.so
7f6e65e3a000-7f6e65e3b000 r-xp 00006000 fe:02 1452659                    /lib64/librt-2.18.so
7f6e65e3b000-7f6e65e3c000 rwxp 00007000 fe:02 1452659                    /lib64/librt-2.18.so
7f6e65e3c000-7f6e65e54000 r-xp 00000000 fe:02 1452649                    /lib64/libpthread-2.18.so
7f6e65e54000-7f6e66054000 ---p 00018000 fe:02 1452649                    /lib64/libpthread-2.18.so
7f6e66054000-7f6e66055000 r-xp 00018000 fe:02 1452649                    /lib64/libpthread-2.18.so
7f6e66055000-7f6e66056000 rwxp 00019000 fe:02 1452649                    /lib64/libpthread-2.18.so
7f6e66056000-7f6e6605a000 rwxp 00000000 00:00 0
7f6e6605a000-7f6e6615c000 r-xp 00000000 fe:02 1441968                    /lib64/libm-2.18.so
7f6e6615c000-7f6e6635b000 ---p 00102000 fe:02 1441968                    /lib64/libm-2.18.so
7f6e6635b000-7f6e6635c000 r-xp 00101000 fe:02 1441968                    /lib64/libm-2.18.so
7f6e6635c000-7f6e6635d000 rwxp 00102000 fe:02 1441968                    /lib64/libm-2.18.so
7f6e6635d000-7f6e6637d000 r-xp 00000000 fe:02 1452665                    /lib64/ld-2.18.so
7f6e6653e000-7f6e66545000 rwxp 00000000 00:00 0
7f6e6657a000-7f6e6657c000 rwxp 00000000 00:00 0
7f6e6657c000-7f6e6657d000 r-xp 0001f000 fe:02 1452665                    /lib64/ld-2.18.so
7f6e6657d000-7f6e6657e000 rwxp 00020000 fe:02 1452665                    /lib64/ld-2.18.so
7f6e6657e000-7f6e6657f000 rwxp 00000000 00:00 0
7ffe9750d000-7ffe9752e000 rwxp 00000000 00:00 0                          [stack]
7ffe975a5000-7ffe975a8000 r--p 00000000 00:00 0                          [vvar]
7ffe975a8000-7ffe975aa000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

And gdb backtrace:

Code:
$ gdb ./mprime core-mprime.1000.lakshmi.1485816220.16652.dump
[...]
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f6e653af4c9 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f6e653af4c9 in raise () from /lib64/libc.so.6
#1  0x00007f6e653b0958 in abort () from /lib64/libc.so.6
#2  0x00007f6e653ed654 in __libc_message () from /lib64/libc.so.6
#3  0x00007f6e653f2eae in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f6e653f3b87 in _int_free () from /lib64/libc.so.6
#5  0x000000000044dc49 in multithread_term ()
#6  0x000000000044dd3e in gwdone ()
#7  0x0000000000436182 in prime ()
#8  0x0000000000439155 in primeContinue ()
#9  0x0000000000439375 in LauncherDispatch ()
#10 0x00000000004394e8 in Launcher ()
#11 0x0000000000463efa in ThreadStarter ()
#12 0x00007f6e65e440db in start_thread () from /lib64/libpthread.so.0
#13 0x00007f6e65460e3d in clone () from /lib64/libc.so.6
(gdb)
All times are UTC.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.