#3598
Mar 2017
Germany, Wolfsburg
22 Posts
The main difference is that the library file "libcudart.so.12" included in the zip is now loaded from the local directory. The Makefile for this is (LD = clang -Wl,-rpath,.):
# where is the CUDA Toolkit installed?
CUDA_DIR = /usr/local/cuda
CUDA_INCLUDE = -I$(CUDA_DIR)/include/
CUDA_LIB = -L$(CUDA_DIR)/lib64/

# Compiler settings for .c files (CPU)
CC = clang -static-libgcc -static-libstdc++
CFLAGS = -Wall -Wextra -O2 $(CUDA_INCLUDE) -malign-double
CFLAGS_EXTRA_SIEVE = -funroll-all-loops

# Compiler settings for .cu files (CPU/GPU)
NVCC = nvcc
NVCCFLAGS = $(CUDA_INCLUDE) --ptxas-options=-v

# generate code for various compute capabilities
#NVCCFLAGS += --generate-code arch=compute_11,code=sm_11 # CC 1.1, 1.2 and 1.3 GPUs will use this code (1.0 is not possible for mfaktc)
#NVCCFLAGS += --generate-code arch=compute_20,code=sm_20 # CC 2.x GPUs will use this code, one code fits all!
#NVCCFLAGS += --generate-code arch=compute_30,code=sm_30 # all CC 3.x GPUs _COULD_ use this code
#NVCCFLAGS += --generate-code arch=compute_35,code=sm_35 # but CC 3.5 (3.2?) _CAN_ use funnel shift which is useful for mfaktc
#NVCCFLAGS += --generate-code arch=compute_50,code=sm_50 # CC 5.x GPUs will use this code
NVCCFLAGS += --generate-code arch=compute_61,code=sm_61
NVCCFLAGS += --generate-code arch=compute_75,code=sm_75
NVCCFLAGS += --generate-code arch=compute_86,code=sm_86 # CC 8.6 GPUs will use this code
NVCCFLAGS += --generate-code arch=compute_89,code=sm_89
NVCCFLAGS += --generate-code arch=compute_90,code=sm_90

# pass some options to the C host compiler (e.g. gcc on Linux)
NVCCFLAGS += --compiler-options=-Wall

# Linker
LD = clang -Wl,-rpath,.
LDFLAGS = -fPIC $(CUDA_LIB) -lcudart -lm -lstdc++

##############################################################################

CSRC = sieve.c timer.c parse.c read_config.c mfaktc.c checkpoint.c \
  signal_handler.c output.c
CUSRC = tf_72bit.cu tf_96bit.cu tf_barrett96.cu tf_barrett96_gs.cu gpusieve.cu

COBJS = $(CSRC:.c=.o)
CUOBJS = $(CUSRC:.cu=.o) tf_75bit.o

##############################################################################

all: ../mfaktc.exe

../mfaktc.exe : $(COBJS) $(CUOBJS)
	$(LD) $^ -o $@ $(LDFLAGS)

clean :
	rm -f *.o *~

sieve.o : sieve.c
	$(CC) $(CFLAGS) $(CFLAGS_EXTRA_SIEVE) -c $< -o $@

tf_75bit.o : tf_96bit.cu
	$(NVCC) $(NVCCFLAGS) -c $< -o $@ -DSHORTCUT_75BIT

%.o : %.cu
	$(NVCC) $(NVCCFLAGS) -c $< -o $@

%.o : %.c
	$(CC) $(CFLAGS) -c $< -o $@

##############################################################################

# dependencies generated by cpp -MM
checkpoint.o: checkpoint.c params.h

# manually add selftest-data-mersenne.c or selftest-data-wagstaff.c
mfaktc.o: mfaktc.c params.h my_types.h compatibility.h sieve.h \
  read_config.h parse.h timer.h tf_72bit.h tf_96bit.h tf_barrett96.h \
  checkpoint.h signal_handler.h output.h gpusieve.h \
  selftest-data-mersenne.c selftest-data-wagstaff.c

output.o: output.c params.h my_types.h output.h compatibility.h

parse.o: parse.c compatibility.h parse.h params.h

read_config.o: read_config.c params.h my_types.h

sieve.o: sieve.c params.h compatibility.h

signal_handler.o: signal_handler.c params.h my_types.h compatibility.h

timer.o: timer.c timer.h compatibility.h

tf_72bit.o: tf_72bit.cu params.h my_types.h compatibility.h \
  my_intrinsics.h sieve.h timer.h output.h tf_debug.h tf_common.cu

tf_96bit.o: tf_96bit.cu params.h my_types.h compatibility.h \
  my_intrinsics.h sieve.h timer.h output.h tf_debug.h \
  tf_96bit_base_math.cu tf_96bit_helper.cu gpusieve_helper.cu tf_common.cu \
  tf_common_gs.cu gpusieve.h

tf_barrett96.o: tf_barrett96.cu params.h my_types.h compatibility.h \
  my_intrinsics.h sieve.h timer.h output.h tf_debug.h \
  tf_96bit_base_math.cu tf_96bit_helper.cu tf_barrett96_div.cu \
  tf_barrett96_core.cu tf_common.cu

tf_barrett96_gs.o: tf_barrett96_gs.cu params.h my_types.h compatibility.h \
  my_intrinsics.h sieve.h timer.h output.h tf_debug.h \
  tf_96bit_base_math.cu tf_96bit_helper.cu tf_barrett96_div.cu \
  tf_barrett96_core.cu gpusieve_helper.cu tf_common_gs.cu gpusieve.h

gpusieve.o: gpusieve.cu params.h my_types.h compatibility.h \
  my_intrinsics.h gpusieve.h

# manually generated dependency
tf_75bit.o: tf_96bit.cu params.h my_types.h compatibility.h \
  my_intrinsics.h sieve.h timer.h output.h tf_debug.h \
  tf_96bit_base_math.cu tf_96bit_helper.cu gpusieve_helper.cu tf_common.cu \
  tf_common_gs.cu gpusieve.h

Last fiddled with by DeleteNull on 2023-01-02 at 20:15

#3599
"/X\(‘-‘)/X\"
Jan 2013
https://pedan.tech/
2⁴×199 Posts
I did a little more experimenting with my 1070s and 332M exponents: it turns out that increasing GPUSievePrimes to 150,000 was a benefit, as was increasing GPUSieveSize to 1024. I got about 10% more performance from those changes.
I'll have to try increasing GPUSievePrimes on the 3070s later.
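A rough sense of what GPUSieveSize=1024 means in memory terms comes from simple arithmetic. The C sketch below assumes, as the mfaktc -d0 output in post #3601 prints them (GPUSieveSize 64Mi bits, GPUSieveProcessSize 16Ki bits), that GPUSieveSize is counted in Mi bits and GPUSieveProcessSize in Ki bits, and that the sieve is held as a plain bit array. sieve_bytes() and process_chunks() are hypothetical helpers, not mfaktc functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes occupied by a plain bit array of
 * GPUSieveSize Mi bits (assumption: 1 Mi bit = 2^20 bits). */
static uint64_t sieve_bytes(uint32_t gpu_sieve_size_mibits)
{
    uint64_t bits = (uint64_t)gpu_sieve_size_mibits << 20; /* Mi bits -> bits */
    return bits / 8;                                       /* bits -> bytes */
}

/* Hypothetical helper: how many GPUSieveProcessSize chunks
 * (assumption: given in Ki bits, 1 Ki bit = 2^10 bits) fit in one sieve. */
static uint64_t process_chunks(uint32_t gpu_sieve_size_mibits,
                               uint32_t process_size_kibits)
{
    assert(process_size_kibits > 0);
    uint64_t sieve_bits = (uint64_t)gpu_sieve_size_mibits << 20;
    uint64_t chunk_bits = (uint64_t)process_size_kibits << 10;
    return sieve_bits / chunk_bits;
}
```

Under those assumptions, going from 64 to 1024 grows the bit array from 8 MiB to 128 MiB, and the number of 16 Ki-bit chunks per sieve from 4,096 to 65,536.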

#3600
"/X\(‘-‘)/X\"
Jan 2013
https://pedan.tech/
2⁴·199 Posts
Quote:
With Turing, Ampere, and Lovelace, half the hardware goes unused. For Turing, the trick would be writing an algorithm that can do some of the math efficiently in FP32. For Ampere and Lovelace, an entirely FP32 algorithm may be simpler. Turing also introduces Tensor Cores, which are good for INT8. There may be an advantage to using these as well.
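The FP32 idea in the quote is constrained by the format itself: an FP32 significand has 24 bits, so integers are exact only up to 2^24, and every partial product and running sum must stay below that. The C sketch below is a generic illustration of limb splitting under that constraint, not code from mfaktc or any existing kernel, and mul32_via_fp32() is a hypothetical name. It multiplies two 32-bit integers using float partial products on 11-bit limbs and recombines the column sums in 64-bit integers.

```c
#include <assert.h>
#include <stdint.h>

#define LIMB_BITS 11
#define LIMB_MASK ((1u << LIMB_BITS) - 1u)

/* Multiply two 32-bit unsigned integers, doing all partial products in FP32.
 * Each operand is split into three 11-bit limbs (the top limb holds the
 * remaining 10 bits). A limb product is at most 2047^2 < 2^22, and a column
 * sums at most three such products, which stays below 2^24, so every float
 * value here is an exactly representable integer. */
static uint64_t mul32_via_fp32(uint32_t a, uint32_t b)
{
    float fa[3], fb[3];
    for (int i = 0; i < 3; i++) {
        fa[i] = (float)((a >> (LIMB_BITS * i)) & LIMB_MASK);
        fb[i] = (float)((b >> (LIMB_BITS * i)) & LIMB_MASK);
    }

    /* Schoolbook column sums: column k collects fa[i] * fb[j] with i + j == k. */
    float col[5] = {0.0f, 0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            col[i + j] += fa[i] * fb[j];

    /* Recombine the exact column sums with 64-bit integer shifts and adds. */
    uint64_t result = 0;
    for (int k = 0; k < 5; k++) {
        assert(col[k] < 16777216.0f); /* below 2^24, so still exact */
        result += (uint64_t)col[k] << (LIMB_BITS * k);
    }
    return result;
}
```

The 11-bit limb size is the design choice that keeps it exact: with 12-bit limbs a single product still fits (4095² = 16,769,025 < 2^24), but a column summing three such products does not.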

#3601
Jun 2010
Pennsylvania
947 Posts
Happy New Year, everyone.
Ref. post https://www.mersenneforum.org/showpo...postcount=2744: Yesterday I upgraded that Kubuntu box from 20.04 LTS to 22.04 LTS, and now mfaktc isn't working anymore. (The previous transition, from 18.04 LTS to 20.04 LTS, didn't seem to have this nasty effect.) Here's the output of mfaktc.exe -d0:
Code:
Compiletime options
  THREADS_PER_BLOCK         256
  SIEVE_SIZE_LIMIT          32kiB
  SIEVE_SIZE                193154bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled

Runtime options
  SievePrimes               25000
  SievePrimesAdjust         1
  SievePrimesMin            5000
  SievePrimesMax            100000
  NumStreams                3
  CPUStreams                3
  GridSize                  3
  GPU Sieving               enabled
  GPUSievePrimes            82486
  GPUSieveSize              64Mi bits
  GPUSieveProcessSize       16Ki bits
  Checkpoints               enabled
  CheckpointDelay           30s
  WorkFileAddDelay          600s
  Stages                    enabled
  StopAfterFactor           bitlevel
  PrintMode                 full
  V5UserID                  (none)
  ComputerID                (none)
  AllowSleep                no
  TimeStampInResults        no

CUDA version info
  binary compiled for CUDA  8.0
  CUDA runtime version      32.31
  CUDA driver version       9.10
ERROR: CUDA runtime version must match the CUDA toolkit version used during compile!

Some more information:
GPU: GeForce GTX 1050
NVIDIA Driver Version: 390.157
I'll be happy to provide additional details that might help to hone in on a solution.
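The last lines of that output compare version numbers. The CUDA runtime API functions cudaRuntimeGetVersion() and cudaDriverGetVersion() report versions as 1000 × major + 10 × minor (9010 for CUDA 9.1, 12000 for CUDA 12.0). The plain-C sketch below decodes that encoding and performs a check in the spirit of mfaktc's error message. decode_cuda_version() and cuda_versions_ok() are hypothetical, not mfaktc's actual code, and the driver test is simplified because it ignores the minor-version compatibility NVIDIA introduced with CUDA 11.

```c
#include <assert.h>
#include <stdbool.h>

/* A decoded CUDA version, e.g. {9, 1} for CUDA 9.1. */
typedef struct {
    int major;
    int minor;
} cuda_version;

/* CUDA encodes versions as 1000 * major + 10 * minor. */
static cuda_version decode_cuda_version(int v)
{
    assert(v >= 0);
    cuda_version r = { v / 1000, (v % 1000) / 10 };
    return r;
}

/* Hypothetical startup check: the runtime library found at run time must be
 * the version the binary was compiled against, and (simplified, ignoring
 * CUDA 11+ minor-version compatibility) the driver must be at least as new. */
static bool cuda_versions_ok(int compiled, int runtime, int driver)
{
    if (runtime != compiled)
        return false;          /* a different libcudart was loaded */
    return driver >= runtime;  /* driver too old for this runtime */
}
```

For a binary built against CUDA 8.0 (8000), the check passes only if the loaded runtime also reports 8000 and the driver reports at least that; any other runtime value fails, which is the situation the ERROR line describes.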

#3602
"/X\(‘-‘)/X\"
Jan 2013
https://pedan.tech/
2⁴·199 Posts
Run sudo apt-get install nvidia-driver-510 nvidia-cuda-dev nvidia-cuda-toolkit, reboot, then recompile mfaktc.
|
#3603
"Florian"
Oct 2021
Germany
2×103 Posts
I've compiled a Windows version with CUDA 12.0 for CC 8.9, with more classes and GPU sieve size 2047.
Instead of cudart.lib, I linked cudart_static.lib, and the build generated the executable plus two additional files. Can anyone check whether it runs without any additional CUDA DLL? I can't, since I have the full toolkit installed.

#3604
Jun 2010
Pennsylvania
947 Posts
Quote:
Is there a comprehensive guide here to compiling mfaktc for Kubuntu (Ubuntu)?

#3605
Sep 2011
Germany
7025₈ Posts
Quote:

#3606
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts
+2. And PTX as well, all in one executable, plus documentation for setting up and compiling. The whole project could get by with few executables if they included support for the full feasible list of CC values for a given SDK version; SDK 12 and SDK 8 together could cover CC 2.0 to 9.0, i.e. GPUs released over a ~12-year span. That would be two executables per OS.
Last fiddled with by kriesel on 2023-01-04 at 15:04
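The two-SDK coverage argument can be checked mechanically. The C sketch below encodes CC as 10 × major + minor and uses the target ranges of CUDA 8.0 (CC 2.0 to 6.2) and CUDA 12.0 (CC 5.0 to 9.0). The build table and pick_build() are hypothetical, showing how a launcher or download page might choose between two executables.

```c
#include <assert.h>
#include <stddef.h>

/* One hypothetical build: the CUDA SDK major version it was compiled with
 * and the CC range (10 * major + minor) that SDK can target. */
typedef struct {
    int sdk;
    int min_cc;
    int max_cc;
} sdk_range;

/* Newest first, so overlapping CCs (5.0 to 6.2) prefer the newer SDK.
 * CUDA 12.0: CC 5.0 to 9.0; CUDA 8.0: CC 2.0 to 6.2. */
static const sdk_range builds[] = {
    { 12, 50, 90 },
    { 8,  20, 62 },
};

/* Return the SDK major version of the build to use for a GPU with the
 * given CC, or 0 if neither build covers it. These are ranges, not lists
 * of real CCs, so nonexistent values inside a range also match. */
static int pick_build(int cc)
{
    assert(cc >= 0);
    for (size_t i = 0; i < sizeof builds / sizeof builds[0]; i++) {
        if (cc >= builds[i].min_cc && cc <= builds[i].max_cc)
            return builds[i].sdk;
    }
    return 0;
}
```

For example, a CC 2.1 (Fermi) card maps to the SDK 8 build, a CC 7.5 (Turing) card to the SDK 12 build, and a CC 1.3 card to neither.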

#3607
Jul 2003
1010000000₂ Posts
Hi,
to Mr. James Heinrich: why is the file from post #3572 no longer included in your downloads? It works and includes the source and the Makefile. You need to have CUDA Toolkit 12 installed; it is not made for BOINC.

#3608
"James Heinrich"
May 2004
ex-Northern Ontario
7×13×47 Posts
Quote: