mersenneforum.org > Factoring Projects > Msieve
2013-08-11, 23:14   #1
fivemack (Feb 2006, Cambridge, England)

msieve SVN940 problem
Doing the linear algebra for a medium-sized SNFS job on my Haswell, with a freshly compiled SVN940 and running with -t4, I get a segfault just as the Lanczos iteration starts:

Code:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe9f38700 (LWP 615)]
0x000000000043e9c9 in mul_packed_small_core ()
(gdb) bt
#0  0x000000000043e9c9 in mul_packed_small_core ()
#1  0x000000000044a5b6 in worker_thr_routine ()
#2  0x00007ffff7235f8e in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007ffff6f5fe1d in clone () from /lib/x86_64-linux-gnu/libc.so.6
which looks as if mul_packed_small_core has been called with task->matrix == NULL.

Rebuilding with -g and re-running shows that the whole matrix->tasks[] array contains only zeroes at this point.

2013-09-11, 16:43   #2
Batalov ("Serge", Mar 2008)

Seconded.

My test was on r944 and under MPI. Typical dumps from workers are:

Code:
[node-4-5:24546] [ 0] /lib64/libpthread.so.0 [0x368700eca0]
[node-4-5:24546] [ 1] ./ms(mul_packed_core+0x15) [0x43d6d5]
[node-4-5:24546] [ 2] ./ms [0x43c77e]
[node-4-5:24546] [ 3] ./ms(mul_sym_NxN_Nx64+0x6c) [0x43cc7c]
[node-4-5:24546] [ 4] ./ms [0x437eb7]
[node-4-5:24546] [ 5] ./ms(block_lanczos+0x33b) [0x43acfb]
[node-4-5:24546] [ 6] ./ms(nfs_solve_linear_system+0x438) [0x42f418]
[node-4-5:24546] [ 7] ./ms(factor_gnfs+0x966) [0x4178c6]
[node-4-5:24546] [ 8] ./ms(msieve_run+0x519) [0x408b39]
[node-4-5:24546] [ 9] ./ms(factor_integer+0x140) [0x406ed0]
[node-4-5:24546] [10] ./ms(main+0x7c9) [0x4078b9]
[node-4-5:24546] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x368641d9c4]
[node-4-5:24546] [12] ./ms [0x406c39]
[node-4-5:24546] *** End of error message ***
[node-4-5:24542] [ 0] /lib64/libpthread.so.0 [0x368700eca0]
[node-4-5:24542] [ 1] ./ms(mul_packed_core+0x15) [0x43d6d5]
[node-4-5:24542] [ 2] ./ms [0x43c77e]
[node-4-5:24542] [ 3] ./ms(mul_sym_NxN_Nx64+0x6c) [0x43cc7c]
[node-4-5:24542] [ 4] ./ms [0x437eb7]
[node-4-5:24542] [ 5] ./ms(block_lanczos+0x33b) [0x43acfb]
[node-4-5:24542] [ 6] ./ms(nfs_solve_linear_system+0x438) [0x42f418]
[node-4-5:24542] [ 7] ./ms(factor_gnfs+0x966) [0x4178c6]
[node-4-5:24542] [ 8] ./ms(msieve_run+0x519) [0x408b39]
[node-4-5:24542] [ 9] ./ms(factor_integer+0x140) [0x406ed0]
[node-4-5:24542] [10] ./ms(main+0x7c9) [0x4078b9]
[node-4-5:24542] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x368641d9c4]
[node-4-5:24542] [12] ./ms [0x406c39]
[node-4-5:24542] *** End of error message ***
Note: the matrix build itself completed repeatedly without error (that stage is fine in r940+), but the crash happens immediately when the LA iterations start.

Following Greg's sage advice, I downgraded to r923, rebuilt, and the computation then went on without problems in any of the combinations tried (-nc2 4,8; -nc2 8,4; -nc2 8,4 -t 4).

The bug was introduced somewhere between r923 and r940.
2013-09-11, 16:59   #3
jasonp (Tribal Bullet, Oct 2004)

The current best guess is that there's a buffer overflow at the beginning of the LA that only became apparent in SVN 937, when initialization data that could not afford to be overwritten was moved next to the overflowing buffer. I don't have the time to deal with it.
2013-09-11, 17:22   #4
Batalov ("Serge", Mar 2008)

Understood.

This is just to leave a hint, for anyone in the same boat, that r923 is fine for the time being.
2013-09-20, 15:07   #5
bdodson (Jun 2005, lehigh.edu)

Quote:
Originally Posted by Batalov
Understood.

This is just to leave a hint for anyone in the same shoes that for the time being r923 is fine.
Uhm, that was
Code:
svn checkout -r 932 svn://svn.code.sf.net/p/msieve/code/trunk msieve-code
in case there are any other svn newbies, desperately seeking 932.
Matrix for 6p448 C289 is
Code:
Thu Sep 19 23:02:19 2013  weight of 37122087 cycles is about 4826664707 (130.02/cycle)
up from
Code:
Fri Mar  8 09:25:30 2013  weight of 30701948 cycles is about 3991401291 (130.00/cycle)
for 3m613 C285. -Bruce
2013-09-21, 21:16   #6
fivemack (Feb 2006, Cambridge, England)

OK, I think I've got it.

I found an example which crashed repeatably with r944, and which was small enough to run under valgrind.

valgrind tells me that a memset operation to t->tmp_b was running off the end of the array.

When I annotated the code that allocated t->tmp_b and that did the write to it, I found that it was allocating size 64 and doing a write of size 3000; this is obviously not going to work.

But the allocation is of something at least as large as p->first_block_size, and the memset is of size p->first_block_size; so what's going wrong?

Answer: t->tmp_b is allocated (in matrix_thread_init around line 397 of lanczos_matmul0.c) as

Code:
        t->tmp_b = (uint64 *)xmalloc(MAX(64, p->first_block_size) *
					sizeof(uint64));
matrix_thread_init is called from packed_matrix_init, around line 465 of common/lanczos/lanczos_matmul0.c

But p->first_block_size isn't filled in until line 519 of the same function!

If I move the block from '/* start the thread pool */' to 'matrix_thread_init(p, num_threads - 1)' inclusive, to after the pack_matrix_core(p, A) call at the end of packed_matrix_init, it stops blowing up for several test cases of mine.

2013-09-21, 23:40   #7
fivemack (Feb 2006, Cambridge, England)

Sorry, that broke the case where the matrix is too small to need packing, because an early-out path fired before matrix_thread_init got called.

Instead of

Code:
        if (max_nrows <= MIN_NROWS_TO_PACK)
              return;
you need to set p->first_block_size unconditionally, do the packing only if max_nrows > MIN_NROWS_TO_PACK, and then start the thread pool unconditionally.

2013-09-22, 07:46   #8
fivemack (Feb 2006, Cambridge, England)

With the second version of my patch, I've run a hundred or so small factorisations overnight without issue.

Would someone with SVN privileges mind casting another eye over it and committing it? It looks a bit large, but it's three trivial changes; it's just that one of them changed the indentation of quite a long block.

Code:
===================================================================
--- common/lanczos/lanczos_matmul0.c	(revision 944)
+++ common/lanczos/lanczos_matmul0.c	(working copy)
@@ -446,18 +446,6 @@
 
 	p->num_threads = num_threads = MIN(num_threads, MAX_THREADS);
 
-	/* start the thread pool */
-
-	control.init = matrix_thread_init;
-	control.shutdown = matrix_thread_free;
-	control.data = p;
-
-	if (num_threads > 1) {
-		p->threadpool = threadpool_init(num_threads - 1, 
-						200, &control);
-	}
-	matrix_thread_init(p, num_threads - 1);
-
 	/* pre-generate the structures to drive the thread pool;
 	   do this even for single-threaded runs */
 
@@ -469,63 +457,78 @@
 		p->tasks[i].task_num = i;
 	}
 
-	if (max_nrows <= MIN_NROWS_TO_PACK)
-		return;
+	p->first_block_size = first_block_size;
 
-	/* determine the block sizes. We assume that the largest
-	   cache in the system is unified and shared across all
-	   threads. When performing matrix multiplies 'A*x=b', 
-	   we choose the block size small enough so that one block
-	   of b fits in L1 cache, and choose the superblock size
-	   to be small enough so that a superblock's worth of x
-	   or b takes up 3/4 of the largest cache in the system.
-	   
-	   Making the block size too small increases memory use
-	   and puts more pressure on the larger caches, while
-	   making the superblock too small reduces the effectiveness
-	   of L1 cache and increases the synchronization overhead
-	   in multithreaded runs */
+	if (max_nrows > MIN_NROWS_TO_PACK)
+	  {
 
-	block_size = 8192;
-	superblock_size = 3 * obj->cache_size2 / (4 * sizeof(uint64));
+	    /* determine the block sizes. We assume that the largest
+	       cache in the system is unified and shared across all
+	       threads. When performing matrix multiplies 'A*x=b', 
+	       we choose the block size small enough so that one block
+	       of b fits in L1 cache, and choose the superblock size
+	       to be small enough so that a superblock's worth of x
+	       or b takes up 3/4 of the largest cache in the system.
+	       
+	       Making the block size too small increases memory use
+	       and puts more pressure on the larger caches, while
+	       making the superblock too small reduces the effectiveness
+	       of L1 cache and increases the synchronization overhead
+	       in multithreaded runs */
+	    
+	    block_size = 8192;
+	    superblock_size = 3 * obj->cache_size2 / (4 * sizeof(uint64));
 
-	/* possibly override from the command line */
+	    /* possibly override from the command line */
 
-	if (obj->nfs_args != NULL) {
+	    if (obj->nfs_args != NULL) {
+	      
+	      const char *tmp;
+	      
+	      tmp = strstr(obj->nfs_args, "la_block=");
+	      if (tmp != NULL)
+		block_size = atoi(tmp + 9);
+	      
+	      tmp = strstr(obj->nfs_args, "la_superblock=");
+	      if (tmp != NULL)
+		superblock_size = atoi(tmp + 14);
+	    }
 
-		const char *tmp;
+	    logprintf(obj, "using block size %u and superblock size %u for "
+		      "processor cache size %u kB\n", 
+		      block_size, superblock_size,
+		      obj->cache_size2 / 1024);
+	    
+	    p->unpacked_cols = NULL;
+	    
+	    p->block_size = block_size;
+	    p->num_block_cols = (ncols + block_size - 1) / block_size;
+	    p->num_block_rows = 1 + (nrows - first_block_size + 
+				     block_size - 1) / block_size;
+	    
+	    p->superblock_size = (superblock_size + block_size - 1) / block_size;
+	    p->num_superblock_cols = (p->num_block_cols + p->superblock_size - 1) / 
+	      p->superblock_size;
+	    p->num_superblock_rows = (p->num_block_rows - 1 + p->superblock_size - 1) / 
+	      p->superblock_size;
+	    
+	    /* do the core work of packing the matrix */
+	    
+	    pack_matrix_core(p, A);
+	  }
 
-		tmp = strstr(obj->nfs_args, "la_block=");
-		if (tmp != NULL)
-			block_size = atoi(tmp + 9);
+	/* start the thread pool */
 
-		tmp = strstr(obj->nfs_args, "la_superblock=");
-		if (tmp != NULL)
-			superblock_size = atoi(tmp + 14);
+	control.init = matrix_thread_init;
+	control.shutdown = matrix_thread_free;
+	control.data = p;
+
+	if (num_threads > 1) {
+		p->threadpool = threadpool_init(num_threads - 1, 
+						200, &control);
 	}
+	matrix_thread_init(p, num_threads - 1);
 
-	logprintf(obj, "using block size %u and superblock size %u for "
-			"processor cache size %u kB\n", 
-				block_size, superblock_size,
-				obj->cache_size2 / 1024);
-
-	p->unpacked_cols = NULL;
-	p->first_block_size = first_block_size;
-
-	p->block_size = block_size;
-	p->num_block_cols = (ncols + block_size - 1) / block_size;
-	p->num_block_rows = 1 + (nrows - first_block_size + 
-				block_size - 1) / block_size;
-
-	p->superblock_size = (superblock_size + block_size - 1) / block_size;
-	p->num_superblock_cols = (p->num_block_cols + p->superblock_size - 1) / 
-					p->superblock_size;
-	p->num_superblock_rows = (p->num_block_rows - 1 + p->superblock_size - 1) / 
-					p->superblock_size;
-
-	/* do the core work of packing the matrix */
-
-	pack_matrix_core(p, A);
 }
2013-09-22, 09:40   #9
debrouxl (Sep 2009)

AFAICS, your patch fixes the problem reported at http://www.mersenneforum.org/showthread.php?t=18481, for which I had compiled msieve with AddressSanitizer, which showed a double free.
2013-09-22, 15:16   #10
jasonp (Tribal Bullet, Oct 2004)

Makes perfect sense, sorry for not paying attention.

Would it also work to move the setting of first_block_size into the unconditional initialization at the top of packed_matrix_init and leave everything else alone? I.e., the patch would be:
Code:
--- common/lanczos/lanczos_matmul0.c	(revision 944)
+++ common/lanczos/lanczos_matmul0.c	(working copy)
@@ -428,6 +428,7 @@
 	p->start_col = start_col;
 	p->num_dense_rows = num_dense_rows;
 	p->num_threads = 1;
+	p->first_block_size = first_block_size; /* needed for thread pool init */
 #ifdef HAVE_MPI
 	p->mpi_size = obj->mpi_size;
 	p->mpi_nrows = obj->mpi_nrows;
@@ -510,7 +511,6 @@
 				obj->cache_size2 / 1024);
 
 	p->unpacked_cols = NULL;
-	p->first_block_size = first_block_size;
 
 	p->block_size = block_size;
 	p->num_block_cols = (ncols + block_size - 1) / block_size;
2013-09-22, 18:51   #11
fivemack (Feb 2006, Cambridge, England)

Yes, that one-line change looks equivalent.