#34
Bamboozled!
"πΊππ·π·π"
May 2003
Down not across
10753₁₀ Posts
The latest CUDA release, on Linux at least, no longer supports compute_10, and the compute_11 through compute_13 architectures are deprecated. Symptoms are as follows:
Code:
"/opt/cuda/bin/nvcc" -arch sm_11 -ptx -o stage1_core_sm11.ptx gnfs/poly/stage1/stage1_core_gpu/stage1_core.cu
nvcc warning : The 'compute_11', 'compute_12', 'compute_13', 'sm_11', 'sm_12', and 'sm_13' architectures are deprecated, and may be removed in a future release.
"/opt/cuda/bin/nvcc" -arch sm_13 -ptx -o stage1_core_sm13.ptx gnfs/poly/stage1/stage1_core_gpu/stage1_core.cu
nvcc warning : The 'compute_11', 'compute_12', 'compute_13', 'sm_11', 'sm_12', and 'sm_13' architectures are deprecated, and may be removed in a future release.
"/opt/cuda/bin/nvcc" -arch sm_20 -ptx -o stage1_core_sm20.ptx gnfs/poly/stage1/stage1_core_gpu/stage1_core.cu
cd b40c && make WIN=0 && cd ..
make[1]: Entering directory '/home/pcl/nums/msieve-code/trunk/b40c'
"/opt/cuda/bin/nvcc" -gencode=arch=compute_10,code=\"sm_10,compute_10\" -o sort_engine_sm10.so sort_engine.cu -Xptxas -v -Xcudafe -# -shared -Xptxas -abi=no -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -I"/opt/cuda/bin/..//include" -I. -O3
nvcc fatal : Unsupported gpu architecture 'compute_10'
Makefile:42: recipe for target 'sort_engine_sm10.so' failed
make[1]: *** [sort_engine_sm10.so] Error 1
make[1]: Leaving directory '/home/pcl/nums/msieve-code/trunk/b40c'
Makefile:314: recipe for target 'b40c/built' failed
Code:
*** b40c/Makefile~ 2015-02-26 16:27:50.849912933 +0000
--- b40c/Makefile 2015-02-26 16:25:16.261044966 +0000
***************
*** 32,45 ****
  LIBNAME = sort_engine
! all: $(LIBNAME)_sm10.$(EXT) $(LIBNAME)_sm13.$(EXT) $(LIBNAME)_sm20.$(EXT)
    touch built
  clean :
    rm -f *.$(EXT) *.lib *.exp *.dll built
! $(LIBNAME)_sm10.$(EXT) : $(DEPS)
!   $(NVCC) $(GEN_SM10) -o $@ sort_engine.cu $(NVCCFLAGS) $(INC) -O3
  $(LIBNAME)_sm13.$(EXT) : $(DEPS)
    $(NVCC) $(GEN_SM13) -o $@ sort_engine.cu $(NVCCFLAGS) $(INC) -O3
--- 32,46 ----
  LIBNAME = sort_engine
! #all: $(LIBNAME)_sm10.$(EXT) $(LIBNAME)_sm13.$(EXT) $(LIBNAME)_sm20.$(EXT)
! all: $(LIBNAME)_sm13.$(EXT) $(LIBNAME)_sm20.$(EXT)
    touch built
  clean :
    rm -f *.$(EXT) *.lib *.exp *.dll built
! # $(LIBNAME)_sm10.$(EXT) : $(DEPS)
! # $(NVCC) $(GEN_SM10) -o $@ sort_engine.cu $(NVCCFLAGS) $(INC) -O3
  $(LIBNAME)_sm13.$(EXT) : $(DEPS)
    $(NVCC) $(GEN_SM13) -o $@ sort_engine.cu $(NVCCFLAGS) $(INC) -O3

#35
Feb 2015
10012 Posts
Use compute_30/sm_30 or compute_35/sm_35. The file produced for 35 is the same as the one for compute_50/sm_50.

#36
Feb 2015
32 Posts |
You also need to correct some files in the \b40c\util\ folder so that the device properties cover the new SM architecture versions, for example the 'cuda_properties.cuh' file.
Last fiddled with by BfoX on 2015-03-09 at 15:11

#37
Nov 2010
110010₂ Posts
Recently I found a nasty bug in the latest msieve LA code. The bug affects large jobs with an unusual number of dense rows. The effect differs from platform to platform: it may cause an immediate crash, a non-invertible submatrix, or a corrupt state at some iteration, or, even worse, the LA may finish with dependencies that are unusable in the SQRT stage.
The patch below shows the bug and my solution. One could use a double MAX instead of the 1+ to save a little memory.
Code:
Index: common/lanczos/lanczos_matmul0.c
===================================================================
--- common/lanczos/lanczos_matmul0.c (revision 980)
+++ common/lanczos/lanczos_matmul0.c (working copy)
@@ -391,7 +391,7 @@
and vector-vector operations; it has to be large enough
to support both */
- t->tmp_b = (uint64 *)xmalloc(MAX(64, p->first_block_size) *
+ t->tmp_b = (uint64 *)xmalloc(MAX(64*(1+(p->num_dense_rows + 63) / 64), p->first_block_size) *
sizeof(uint64));
}
Index: common/lanczos/lanczos_matmul1.c
===================================================================
--- common/lanczos/lanczos_matmul1.c (revision 980)
+++ common/lanczos/lanczos_matmul1.c (working copy)
@@ -361,7 +361,7 @@
packed_block_t *curr_block = p->blocks + block_off;
uint32 i;
- memset(b, 0, p->first_block_size * sizeof(uint64));
+ memset(b, 0, MAX(64*(1+(p->num_dense_rows + 63) / 64),p->first_block_size) * sizeof(uint64));
if (p->num_threads == 1) {
vsize = p->ncols;

#38
Tribal Bullet
Oct 2004
3,541 Posts
Thanks. So you're working with problems that have more than 2000 dense rows?!
The line sieve crash was recently fixed by another user.

#39
Nov 2010
50₁₀ Posts
Not exactly. But I had a situation where (p->num_dense_rows + 63) / 64 = 2 and p->first_block_size < 128, so MAX(64, p->first_block_size) was smaller than the 128 entries the dense rows need.

#40
Tribal Bullet
Oct 2004
3,541 Posts
I forgot that first_block_size is split across MPI rows. Ouch. Committing now.
Last fiddled with by jasonp on 2015-03-19 at 16:29 Reason: columns -> rows

#41
Nov 2010
50₁₀ Posts
Looks like the solution I gave was not complete: I forgot about the tmp_b accumulation at the end of mul_packed. This needs to be fixed too.

#42
Nov 2010
2×5² Posts
Here's the fix to the problem I mentioned before in case anyone is interested:
Code:
--- common/lanczos/lanczos_matmul0.c (revision 984)
+++ common/lanczos/lanczos_matmul0.c (working copy)
@@ -135,12 +135,12 @@
/* xor the small vectors from each thread */
memcpy(b, matrix->thread_data[0].tmp_b,
- matrix->first_block_size *
+ MAX(matrix->first_block_size,64 * ((matrix->num_dense_rows + 63) / 64)) *
sizeof(uint64));
for (i = 1; i < matrix->num_threads; i++) {
accum_xor(b, matrix->thread_data[i].tmp_b,
- matrix->first_block_size);
+ MAX(matrix->first_block_size,64 * ((matrix->num_dense_rows + 63) / 64)));
}

#43
Bamboozled!
"πΊππ·π·π"
May 2003
Down not across
10,753 Posts
Just received this from Msieve v1.53 (SVN 984) at the end of filtering. It's running on a 64-bit Fedora 15 machine with 32 GB of RAM and two "Quad-Core AMD Opteron(tm) Processor 2380" CPUs, according to /proc/cpuinfo. Tom Womack may recognize this system...
Code:
relations with 7+ large ideals: 11064057
commencing 2-way merge
reduce to 14427305 relation sets and 13976685 unique ideals
ignored 8 oversize relation sets
commencing full merge
Return value 132. Terminating...
Any ideas?

#44
Bamboozled!
"πΊππ·π·π"
May 2003
Down not across
10,753 Posts
Found it. Buried deep in the Makefile is:
Code:
# gcc with basic optimization (-march flag could
# get overridden by architecture-specific builds)
CC = gcc
WARN_FLAGS = -Wall -W
OPT_FLAGS = -O3 -fomit-frame-pointer -march=core2 \
	-D_FILE_OFFSET_BITS=64 -DNDEBUG -D_LARGEFILE64_SOURCE
# use := instead of = so we only run the following once
SVN_VERSION := $(shell svnversion .)
ifeq ($(SVN_VERSION),)
	SVN_VERSION := unknown
endif
CFLAGS = $(OPT_FLAGS) $(MACHINE_FLAGS) $(WARN_FLAGS) \
	-DMSIEVE_SVN_VERSION="\"$(SVN_VERSION)\"" \
	-I. -Iaprcl -Iinclude -Ignfs -Ignfs/poly -Ignfs/poly/stage1
When round tuits are back in season, could this be addressed, please? Possibly by adding targets which set that variable appropriately. Who knows, I may even be able to do so myself... I knew I'd seen this problem before, probably when msieve was built on another of my AMD systems.

Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Msieve 1.50 feedback | firejuggler | Msieve | 99 | 2013-02-17 11:53 |
| Msieve v1.48 feedback | Jeff Gilchrist | Msieve | 48 | 2011-06-10 18:18 |
| Msieve 1.43 feedback | Jeff Gilchrist | Msieve | 47 | 2009-11-24 15:53 |
| Msieve 1.42 feedback | Andi47 | Msieve | 167 | 2009-10-18 19:37 |
| Msieve 1.41 Feedback | Batalov | Msieve | 130 | 2009-06-09 16:01 |