#1
(loop (#_fork))
Feb 2006
Cambridge, England
7²×131 Posts
Doing the linear algebra for a medium-sized SNFS job on my Haswell, with a freshly-compiled SVN940 and running -t4, I get a segfault just as the Lanczos iteration starts:
Code:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe9f38700 (LWP 615)]
0x000000000043e9c9 in mul_packed_small_core ()
(gdb) bt
#0  0x000000000043e9c9 in mul_packed_small_core ()
#1  0x000000000044a5b6 in worker_thr_routine ()
#2  0x00007ffff7235f8e in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007ffff6f5fe1d in clone () from /lib/x86_64-linux-gnu/libc.so.6
Rebuilding with -g and re-running shows that the whole of matrix->tasks[] contains only zeroes at this point.

Last fiddled with by fivemack on 2013-08-11 at 23:38
#2
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
3⁶×13 Posts
Seconded.

My test was on r944 and under MPI. Typical dumps from the workers are:
Code:
[node-4-5:24546] [ 0] /lib64/libpthread.so.0 [0x368700eca0]
[node-4-5:24546] [ 1] ./ms(mul_packed_core+0x15) [0x43d6d5]
[node-4-5:24546] [ 2] ./ms [0x43c77e]
[node-4-5:24546] [ 3] ./ms(mul_sym_NxN_Nx64+0x6c) [0x43cc7c]
[node-4-5:24546] [ 4] ./ms [0x437eb7]
[node-4-5:24546] [ 5] ./ms(block_lanczos+0x33b) [0x43acfb]
[node-4-5:24546] [ 6] ./ms(nfs_solve_linear_system+0x438) [0x42f418]
[node-4-5:24546] [ 7] ./ms(factor_gnfs+0x966) [0x4178c6]
[node-4-5:24546] [ 8] ./ms(msieve_run+0x519) [0x408b39]
[node-4-5:24546] [ 9] ./ms(factor_integer+0x140) [0x406ed0]
[node-4-5:24546] [10] ./ms(main+0x7c9) [0x4078b9]
[node-4-5:24546] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x368641d9c4]
[node-4-5:24546] [12] ./ms [0x406c39]
[node-4-5:24546] *** End of error message ***
[node-4-5:24542] [ 0] /lib64/libpthread.so.0 [0x368700eca0]
[node-4-5:24542] [ 1] ./ms(mul_packed_core+0x15) [0x43d6d5]
[node-4-5:24542] [ 2] ./ms [0x43c77e]
[node-4-5:24542] [ 3] ./ms(mul_sym_NxN_Nx64+0x6c) [0x43cc7c]
[node-4-5:24542] [ 4] ./ms [0x437eb7]
[node-4-5:24542] [ 5] ./ms(block_lanczos+0x33b) [0x43acfb]
[node-4-5:24542] [ 6] ./ms(nfs_solve_linear_system+0x438) [0x42f418]
[node-4-5:24542] [ 7] ./ms(factor_gnfs+0x966) [0x4178c6]
[node-4-5:24542] [ 8] ./ms(msieve_run+0x519) [0x408b39]
[node-4-5:24542] [ 9] ./ms(factor_integer+0x140) [0x406ed0]
[node-4-5:24542] [10] ./ms(main+0x7c9) [0x4078b9]
[node-4-5:24542] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x368641d9c4]
[node-4-5:24542] [12] ./ms [0x406c39]
[node-4-5:24542] *** End of error message ***
Following Greg's sage advice, I downgraded to r923, rebuilt, and the computation then went on without problems in all of the combinations tried (-nc2 4,8; -nc2 8,4; -nc2 8,4 -t 4). The bug was introduced somewhere between r923 and r940+.
#3
Tribal Bullet
Oct 2004
6725₈ Posts
The current best guess is that there is a buffer overflow at the beginning of the LA which only became apparent in SVN 937, when initialization that could not afford to be overwritten was moved next to it. I don't have the time to deal with it.
#4
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
2505₁₆ Posts
Understood.

This is just to leave a hint for anyone in the same shoes: for the time being, r923 is fine.
#5
Jun 2005
lehigh.edu
1024₁₀ Posts
Quote:
Code:
svn checkout -r 932 svn://svn.code.sf.net/p/msieve/code/trunk msieve-code
Matrix for 6p448 C289 is
Code:
Thu Sep 19 23:02:19 2013  weight of 37122087 cycles is about 4826664707 (130.02/cycle)
Code:
Fri Mar 8 09:25:30 2013  weight of 30701948 cycles is about 3991401291 (130.00/cycle)
#6
(loop (#_fork))
Feb 2006
Cambridge, England
7²·131 Posts
OK, I think I've got it.

I found an example which crashed repeatably with r944 and which was small enough to run under valgrind. valgrind tells me that a memset to t->tmp_b was running off the end of the array. When I annotated the code that allocates t->tmp_b and the code that writes to it, I found that it was allocating size 64 and doing a write of size 3000; that is obviously not going to work. But the allocation is of something at least as large as p->first_block_size, and the memset is of size p->first_block_size, so what's going wrong?

Answer: t->tmp_b is allocated (in matrix_thread_init, around line 397 of lanczos_matmul0.c) as
Code:
t->tmp_b = (uint64 *)xmalloc(MAX(64, p->first_block_size) * sizeof(uint64));
But p->first_block_size isn't filled in until line 519 of the same file!

If I move the block from '/* start the thread pool */' to 'matrix_thread_init(p, num_threads-1);' inclusive to after the pack_matrix_core(p, A); call at the end of packed_matrix_init, it stops blowing up for several test cases of mine.

Last fiddled with by fivemack on 2013-09-21 at 21:38
#7
(loop (#_fork))
Feb 2006
Cambridge, England
7²·131 Posts
Sorry, that broke the case where the matrix is too small to need packing, because an early-out path fired before matrix_thread_init got called. The early-out in question is
Code:
if (max_nrows <= MIN_NROWS_TO_PACK)
        return;
so in the second version of the patch the packing code is wrapped in the inverse test instead, and the thread-pool startup runs on both paths.
Last fiddled with by fivemack on 2013-09-21 at 23:44 |
#8
(loop (#_fork))
Feb 2006
Cambridge, England
7²×131 Posts
With the second version of my patch, I've run a hundred or so small factorisations overnight without issue.

Would someone with SVN privileges mind casting another eye over it and committing it? It looks a bit large, but it is three trivial changes; it's just that one of them changed the indentation of quite a long block.
Code:
===================================================================
--- common/lanczos/lanczos_matmul0.c (revision 944)
+++ common/lanczos/lanczos_matmul0.c (working copy)
@@ -446,18 +446,6 @@
p->num_threads = num_threads = MIN(num_threads, MAX_THREADS);
- /* start the thread pool */
-
- control.init = matrix_thread_init;
- control.shutdown = matrix_thread_free;
- control.data = p;
-
- if (num_threads > 1) {
- p->threadpool = threadpool_init(num_threads - 1,
- 200, &control);
- }
- matrix_thread_init(p, num_threads - 1);
-
/* pre-generate the structures to drive the thread pool;
do this even for single-threaded runs */
@@ -469,63 +457,78 @@
p->tasks[i].task_num = i;
}
- if (max_nrows <= MIN_NROWS_TO_PACK)
- return;
+ p->first_block_size = first_block_size;
- /* determine the block sizes. We assume that the largest
- cache in the system is unified and shared across all
- threads. When performing matrix multiplies 'A*x=b',
- we choose the block size small enough so that one block
- of b fits in L1 cache, and choose the superblock size
- to be small enough so that a superblock's worth of x
- or b takes up 3/4 of the largest cache in the system.
-
- Making the block size too small increases memory use
- and puts more pressure on the larger caches, while
- making the superblock too small reduces the effectiveness
- of L1 cache and increases the synchronization overhead
- in multithreaded runs */
+ if (max_nrows > MIN_NROWS_TO_PACK)
+ {
- block_size = 8192;
- superblock_size = 3 * obj->cache_size2 / (4 * sizeof(uint64));
+ /* determine the block sizes. We assume that the largest
+ cache in the system is unified and shared across all
+ threads. When performing matrix multiplies 'A*x=b',
+ we choose the block size small enough so that one block
+ of b fits in L1 cache, and choose the superblock size
+ to be small enough so that a superblock's worth of x
+ or b takes up 3/4 of the largest cache in the system.
+
+ Making the block size too small increases memory use
+ and puts more pressure on the larger caches, while
+ making the superblock too small reduces the effectiveness
+ of L1 cache and increases the synchronization overhead
+ in multithreaded runs */
+
+ block_size = 8192;
+ superblock_size = 3 * obj->cache_size2 / (4 * sizeof(uint64));
- /* possibly override from the command line */
+ /* possibly override from the command line */
- if (obj->nfs_args != NULL) {
+ if (obj->nfs_args != NULL) {
+
+ const char *tmp;
+
+ tmp = strstr(obj->nfs_args, "la_block=");
+ if (tmp != NULL)
+ block_size = atoi(tmp + 9);
+
+ tmp = strstr(obj->nfs_args, "la_superblock=");
+ if (tmp != NULL)
+ superblock_size = atoi(tmp + 14);
+ }
- const char *tmp;
+ logprintf(obj, "using block size %u and superblock size %u for "
+ "processor cache size %u kB\n",
+ block_size, superblock_size,
+ obj->cache_size2 / 1024);
+
+ p->unpacked_cols = NULL;
+
+ p->block_size = block_size;
+ p->num_block_cols = (ncols + block_size - 1) / block_size;
+ p->num_block_rows = 1 + (nrows - first_block_size +
+ block_size - 1) / block_size;
+
+ p->superblock_size = (superblock_size + block_size - 1) / block_size;
+ p->num_superblock_cols = (p->num_block_cols + p->superblock_size - 1) /
+ p->superblock_size;
+ p->num_superblock_rows = (p->num_block_rows - 1 + p->superblock_size - 1) /
+ p->superblock_size;
+
+ /* do the core work of packing the matrix */
+
+ pack_matrix_core(p, A);
+ }
- tmp = strstr(obj->nfs_args, "la_block=");
- if (tmp != NULL)
- block_size = atoi(tmp + 9);
+ /* start the thread pool */
- tmp = strstr(obj->nfs_args, "la_superblock=");
- if (tmp != NULL)
- superblock_size = atoi(tmp + 14);
+ control.init = matrix_thread_init;
+ control.shutdown = matrix_thread_free;
+ control.data = p;
+
+ if (num_threads > 1) {
+ p->threadpool = threadpool_init(num_threads - 1,
+ 200, &control);
}
+ matrix_thread_init(p, num_threads - 1);
- logprintf(obj, "using block size %u and superblock size %u for "
- "processor cache size %u kB\n",
- block_size, superblock_size,
- obj->cache_size2 / 1024);
-
- p->unpacked_cols = NULL;
- p->first_block_size = first_block_size;
-
- p->block_size = block_size;
- p->num_block_cols = (ncols + block_size - 1) / block_size;
- p->num_block_rows = 1 + (nrows - first_block_size +
- block_size - 1) / block_size;
-
- p->superblock_size = (superblock_size + block_size - 1) / block_size;
- p->num_superblock_cols = (p->num_block_cols + p->superblock_size - 1) /
- p->superblock_size;
- p->num_superblock_rows = (p->num_block_rows - 1 + p->superblock_size - 1) /
- p->superblock_size;
-
- /* do the core work of packing the matrix */
-
- pack_matrix_core(p, A);
}
#9
Sep 2009
3D11₁₆ Posts
AFAICS, your patch fixes the problem reported at http://www.mersenneforum.org/showthread.php?t=18481 , for which I compiled msieve with AddressSanitizer and got a report of a double free.
#10
Tribal Bullet
Oct 2004
3,541 Posts
Makes perfect sense; sorry for not paying attention.

Would it also work to move the setting of first_block_size up to the unconditional initialization at the top of packed_matrix_init and leave everything else alone? i.e. the patch would be
Code:
--- common/lanczos/lanczos_matmul0.c	(revision 944)
+++ common/lanczos/lanczos_matmul0.c	(working copy)
@@ -428,6 +428,7 @@
 	p->start_col = start_col;
 	p->num_dense_rows = num_dense_rows;
 	p->num_threads = 1;
+	p->first_block_size = first_block_size; /* needed for thread pool init */
 #ifdef HAVE_MPI
 	p->mpi_size = obj->mpi_size;
 	p->mpi_nrows = obj->mpi_nrows;
@@ -510,7 +511,6 @@
 		obj->cache_size2 / 1024);
 
 	p->unpacked_cols = NULL;
-	p->first_block_size = first_block_size;
 
 	p->block_size = block_size;
 	p->num_block_cols = (ncols + block_size - 1) / block_size;
#11
(loop (#_fork))
Feb 2006
Cambridge, England
1100100010011₂ Posts
Yes, that one-line change looks equivalent.
Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
| Problem in running msieve with CUDA | mohamed | Msieve | 20 | 2013-08-01 08:27 |
| Problem with Msieve 1.50 | pgreenw | Msieve | 1 | 2012-05-21 03:42 |
| Msieve: problem linking with other C library | srawlins | Msieve | 6 | 2010-04-02 02:05 |
| Problem compiling msieve with gcc 4.4.2 | chris2be8 | Msieve | 5 | 2010-02-22 18:35 |
| MSieve - problem with 106-digit effort | schickel | Msieve | 5 | 2006-08-31 03:19 |