mersenneforum.org  

Old 2015-02-26, 16:32   #34
xilman
Bamboozled!
 
May 2003
Down not across

10,753 Posts
CUDA datedness

The latest CUDA release, on Linux at least, no longer supports the compute_10 architecture, and the compute_1[1-3] architectures are deprecated. The symptoms are as follows:
Code:
"/opt/cuda/bin/nvcc" -arch sm_11 -ptx -o stage1_core_sm11.ptx gnfs/poly/stage1/stage1_core_gpu/stage1_core.cu
nvcc warning : The 'compute_11', 'compute_12', 'compute_13', 'sm_11', 'sm_12', and 'sm_13' architectures are deprecated, and may be removed in a future release.
"/opt/cuda/bin/nvcc" -arch sm_13 -ptx -o stage1_core_sm13.ptx gnfs/poly/stage1/stage1_core_gpu/stage1_core.cu
nvcc warning : The 'compute_11', 'compute_12', 'compute_13', 'sm_11', 'sm_12', and 'sm_13' architectures are deprecated, and may be removed in a future release.
"/opt/cuda/bin/nvcc" -arch sm_20 -ptx -o stage1_core_sm20.ptx gnfs/poly/stage1/stage1_core_gpu/stage1_core.cu
cd b40c && make WIN=0 && cd ..
make[1]: Entering directory '/home/pcl/nums/msieve-code/trunk/b40c'
"/opt/cuda/bin/nvcc" -gencode=arch=compute_10,code=\"sm_10,compute_10\"  -o sort_engine_sm10.so sort_engine.cu -Xptxas -v -Xcudafe -# -shared -Xptxas -abi=no -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -I"/opt/cuda/bin/..//include" -I.  -O3  
nvcc fatal   : Unsupported gpu architecture 'compute_10'
Makefile:42: recipe for target 'sort_engine_sm10.so' failed
make[1]: *** [sort_engine_sm10.so] Error 1
make[1]: Leaving directory '/home/pcl/nums/msieve-code/trunk/b40c'
Makefile:314: recipe for target 'b40c/built' failed
This patch seems to do the job.
Code:
*** b40c/Makefile~	2015-02-26 16:27:50.849912933 +0000
--- b40c/Makefile	2015-02-26 16:25:16.261044966 +0000
***************
*** 32,45 ****
  
  LIBNAME = sort_engine
  
! all: $(LIBNAME)_sm10.$(EXT) $(LIBNAME)_sm13.$(EXT) $(LIBNAME)_sm20.$(EXT)
  	touch built
  
  clean :
  	rm -f  *.$(EXT) *.lib *.exp *.dll built
  
! $(LIBNAME)_sm10.$(EXT) : $(DEPS)
! 	$(NVCC) $(GEN_SM10) -o $@ sort_engine.cu $(NVCCFLAGS) $(INC) -O3  
  
  $(LIBNAME)_sm13.$(EXT) : $(DEPS)
  	$(NVCC) $(GEN_SM13) -o $@ sort_engine.cu $(NVCCFLAGS) $(INC) -O3  
--- 32,46 ----
  
  LIBNAME = sort_engine
  
! #all: $(LIBNAME)_sm10.$(EXT) $(LIBNAME)_sm13.$(EXT) $(LIBNAME)_sm20.$(EXT)
! all:  $(LIBNAME)_sm13.$(EXT) $(LIBNAME)_sm20.$(EXT)
  	touch built
  
  clean :
  	rm -f  *.$(EXT) *.lib *.exp *.dll built
  
! # $(LIBNAME)_sm10.$(EXT) : $(DEPS)
! #	$(NVCC) $(GEN_SM10) -o $@ sort_engine.cu $(NVCCFLAGS) $(INC) -O3  
  
  $(LIBNAME)_sm13.$(EXT) : $(DEPS)
  	$(NVCC) $(GEN_SM13) -o $@ sort_engine.cu $(NVCCFLAGS) $(INC) -O3
I don't have write access to the svn repository so have been unable to commit the changes.
Old 2015-02-27, 14:06   #35
BfoX
 
Feb 2015

3² Posts

Use compute/sm30 or compute/sm35. The file produced for 35 is the same as the one for compute/sm50.
Old 2015-03-09, 15:09   #36
BfoX
 
Feb 2015

9₁₆ Posts

You also need to correct some files in the \b40c\util\ folder so that the device properties cover the new SM architecture versions, for example the 'cuda_properties.cuh' file.

Last fiddled with by BfoX on 2015-03-09 at 15:11
Old 2015-03-19, 08:08   #37
poily
 
Nov 2010

2×5² Posts

Recently I found a nasty bug in the latest msieve LA code. The bug affects large jobs with an unusual number of dense rows. The effect differs from platform to platform: it may cause an immediate crash, a non-invertible submatrix or corrupt state on some iteration, or, even worse, the LA may finish with dependencies that are unusable in the SQRT stage.

The patch below explains the bug and my solution. One could use a second MAX instead of the 1+ to save a little memory.
Code:
Index: common/lanczos/lanczos_matmul0.c
===================================================================
--- common/lanczos/lanczos_matmul0.c    (revision 980)
+++ common/lanczos/lanczos_matmul0.c    (working copy)
@@ -391,7 +391,7 @@
           and vector-vector operations; it has to be large enough
           to support both */
 
-       t->tmp_b = (uint64 *)xmalloc(MAX(64, p->first_block_size) *
+       t->tmp_b = (uint64 *)xmalloc(MAX(64*(1+(p->num_dense_rows + 63) / 64), p->first_block_size) *
                                        sizeof(uint64));
 }
 
Index: common/lanczos/lanczos_matmul1.c
===================================================================
--- common/lanczos/lanczos_matmul1.c    (revision 980)
+++ common/lanczos/lanczos_matmul1.c    (working copy)
@@ -361,7 +361,7 @@
        packed_block_t *curr_block = p->blocks + block_off;
        uint32 i;
 
-       memset(b, 0, p->first_block_size * sizeof(uint64));
+       memset(b, 0, MAX(64*(1+(p->num_dense_rows + 63) / 64),p->first_block_size) * sizeof(uint64));
 
        if (p->num_threads == 1) {
                vsize = p->ncols;
Maybe it's also time we fixed the msieve line siever and removed the artificial limitations on polynomial degrees of the form (d,1)? That might be helpful for DLOG sieving experiments.
Old 2015-03-19, 16:02   #38
jasonp
Tribal Bullet
 
 
Oct 2004

DD5₁₆ Posts

Thanks. So you're working with problems that have more than 2000 dense rows?!

The line sieve crash was recently fixed by another user.
Old 2015-03-19, 16:24   #39
poily
 
Nov 2010

2·5² Posts

Not exactly, but I had a situation where (p->num_dense_rows + 63) / 64 = 2 and p->first_block_size < 128.
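
The failing case can be checked with a few lines of C. This is only a sketch under stated assumptions, not msieve's code: the helper names are hypothetical, the requirement that the buffer hold 64 words per group of 64 dense rows is inferred from the patches in this thread, and `nested_max_words` is just one reading of the "MAX instead of 1+" remark.

```c
#include <assert.h>
#include <stdint.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Number of 64-row groups needed for the dense rows, rounded up. */
static uint32_t dense_groups(uint32_t num_dense_rows)
{
	return (num_dense_rows + 63) / 64;
}

/* tmp_b size in 64-bit words before the patch. */
static uint32_t old_words(uint32_t first_block_size)
{
	return MAX(64u, first_block_size);
}

/* tmp_b size in 64-bit words after the patch (the "1+" form). */
static uint32_t patched_words(uint32_t num_dense_rows,
			      uint32_t first_block_size)
{
	return MAX(64u * (1 + dense_groups(num_dense_rows)), first_block_size);
}

/* Assumed alternative: a second MAX instead of the extra "1+" group. */
static uint32_t nested_max_words(uint32_t num_dense_rows,
				 uint32_t first_block_size)
{
	return MAX(MAX(64u, 64u * dense_groups(num_dense_rows)),
		   first_block_size);
}

/* Usage: 100 dense rows give 2 groups; a first_block_size of 100 is < 128. */
static void self_check(void)
{
	uint32_t need = 64u * dense_groups(100);  /* assumed requirement */

	assert(old_words(100) < need);            /* old buffer is too small */
	assert(patched_words(100, 100) >= need);
	assert(nested_max_words(100, 100) >= need);
}
```

With these numbers the old expression yields 100 words, the patched one 192, and the nested-MAX reading 128, which is where the small memory saving would come from.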
Old 2015-03-19, 16:28   #40
jasonp
Tribal Bullet
 
 
Oct 2004

110111010101₂ Posts

I forgot that first_block_size is split across MPI rows. Ouch. Committing now.

Last fiddled with by jasonp on 2015-03-19 at 16:29 Reason: columns -> rows
Old 2015-03-23, 16:15   #41
poily
 
Nov 2010

2·5² Posts

Looks like the solution I gave was not complete: I forgot about the tmp_b accumulation at the end of mul_packed. This needs to be fixed too.
Old 2015-03-30, 10:32   #42
poily
 
Nov 2010

2·5² Posts

Here's the fix to the problem I mentioned before in case anyone is interested:

Code:
--- common/lanczos/lanczos_matmul0.c	(revision 984)
+++ common/lanczos/lanczos_matmul0.c	(working copy)
@@ -135,12 +135,12 @@
 	/* xor the small vectors from each thread */
 
 	memcpy(b, matrix->thread_data[0].tmp_b, 
-			matrix->first_block_size *
+			MAX(matrix->first_block_size,64 * ((matrix->num_dense_rows + 63) / 64)) *
 			sizeof(uint64));
 
 	for (i = 1; i < matrix->num_threads; i++) {
 		accum_xor(b, matrix->thread_data[i].tmp_b, 
-				matrix->first_block_size);
+				MAX(matrix->first_block_size,64 * ((matrix->num_dense_rows + 63) / 64)));
 	}
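
The point of this second fix, that the per-thread buffers must be combined over their full length, can be illustrated in isolation. The following is a minimal sketch, not msieve's implementation: `xor_words` is a hypothetical stand-in for what `accum_xor` presumably does, and `combine` and its parameters are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for msieve's accum_xor: dst[i] ^= src[i] for n words. */
static void xor_words(uint64_t *dst, const uint64_t *src, uint32_t n)
{
	uint32_t i;

	for (i = 0; i < n; i++)
		dst[i] ^= src[i];
}

/* Combine per-thread buffers into b: copy thread 0, then xor in the rest.
   n_words must be the full buffer length, not just first_block_size. */
static void combine(uint64_t *b, uint64_t **tmp_b,
		    uint32_t num_threads, uint32_t n_words)
{
	uint32_t i;

	memcpy(b, tmp_b[0], n_words * sizeof(uint64_t));
	for (i = 1; i < num_threads; i++)
		xor_words(b, tmp_b[i], n_words);
}

/* Usage: two threads, 4-word buffers; a 2-word combine loses the tail. */
static void self_check(void)
{
	uint64_t t0[4] = {1, 2, 4, 8};
	uint64_t t1[4] = {1, 0, 3, 8};
	uint64_t *bufs[2] = {t0, t1};
	uint64_t full[4] = {0}, cut[4] = {0};

	combine(full, bufs, 2, 4);
	combine(cut, bufs, 2, 2);
	assert(full[2] == 7 && full[3] == 0);
	assert(cut[2] == 0 && cut[3] == 0);  /* words 2..3 never combined */
}
```

In the truncated call, word 2 stays at 0 although the correct combined value is 7, which is the kind of silent corruption a too-short length would produce.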
Old 2015-05-05, 18:15   #43
xilman
Bamboozled!
 
May 2003
Down not across

10,753 Posts

Just received this from Msieve v. 1.53 (SVN 984) at the end of the filtering. It's running on a 64-bit Fedora 15 machine with 32G RAM and a dual "Quad-Core AMD Opteron(tm) Processor 2380" according to /proc/cpuinfo. Tom Womack may recognize this system ...
Code:
relations with 7+ large ideals: 11064057
commencing 2-way merge
reduce to 14427305 relation sets and 13976685 unique ideals
ignored 8 oversize relation sets
commencing full merge
Return value 132. Terminating...
I vaguely remember a report of an error condition like this some time ago but can't find the details.


Any ideas?
Old 2015-05-06, 07:07   #44
xilman
Bamboozled!
 
May 2003
Down not across

10753₁₀ Posts

Found it. Buried deep in the Makefile is:
Code:
# gcc with basic optimization (-march flag could
# get overridden by architecture-specific builds)
CC = gcc
WARN_FLAGS = -Wall -W
OPT_FLAGS = -O3 -fomit-frame-pointer -march=core2 \
	    -D_FILE_OFFSET_BITS=64 -DNDEBUG -D_LARGEFILE64_SOURCE

# use := instead of = so we only run the following once
SVN_VERSION := $(shell svnversion .)
ifeq ($(SVN_VERSION),)
	SVN_VERSION := unknown
endif

CFLAGS = $(OPT_FLAGS) $(MACHINE_FLAGS) $(WARN_FLAGS) \
	 	-DMSIEVE_SVN_VERSION="\"$(SVN_VERSION)\"" \
		-I. -Iaprcl -Iinclude -Ignfs -Ignfs/poly -Ignfs/poly/stage1
and, of course, this is not a core2 machine and there are no subsequent definitions of $(MACHINE_FLAGS).

When round tuits are back in season, could this be addressed please? Possibly by adding targets which set that variable appropriately. Who knows, I may even be able to do so myself ...

I knew I'd seen this problem before --- probably when msieve was built on another of my AMD systems.
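
A guess at what the 132 means: shells conventionally report a process killed by signal N as exit status 128 + N, and 132 - 128 = 4 is SIGILL (illegal instruction) on Linux, which would fit code built with -march=core2 running on a CPU that is not a core2. The sketch below decodes such a status. It assumes that whatever printed "Return value 132" follows the shell convention, and the helper names are hypothetical.

```c
#include <assert.h>
#include <signal.h>

/* Signal number encoded in a shell-style exit status, or 0 if none. */
static int signal_from_status(int status)
{
	return (status > 128 && status < 256) ? status - 128 : 0;
}

/* True if the status looks like death by illegal instruction. */
static int is_illegal_instruction(int status)
{
	return signal_from_status(status) == SIGILL;
}

/* Usage: the status reported at the end of the filtering run. */
static void self_check(void)
{
	assert(signal_from_status(132) == 4);
	assert(is_illegal_instruction(132));
	assert(!is_illegal_instruction(0));
}
```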

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.