Does the GPU branch allow for multi-threading stage 2? I can't seem to find anything in the docs.
My Colab sessions only get two Xeon cores, but using both would "double" the throughput for stage 2. |
[QUOTE=EdH;533283]Does the GPU branch allow for multi-threading stage 2? I can't seem to find anything in the docs.
My Colab sessions only get two Xeon cores, but using both would "double" the throughput for stage 2.[/QUOTE] I don't think so. Stage 2 is run on the CPU, not the GPU, so I don't think anything about stage 2 gets changed when the program is compiled with the --enable-gpu option. This is in the readme.gpu file: [quote]It will compute step 1 on the GPU, and then perform step 2 on the CPU (not in parallel).[/quote] |
[QUOTE=PhilF;533286]I don't think so. Stage 2 is run on the CPU, not the GPU, so I don't think anything about stage 2 gets changed when the program is compiled with the --enable-gpu option.
This is in the readme.gpu file:[/QUOTE] Yeah, I saw that, but it seemed there was somewhere I read that the latest version had multi-threading. Even if I can't invoke both CPU cores in conjunction with a GPU run, if I can run stage 1 with B2=0 and save the residues, I could rerun ECM with both threads to process the residues. I may have to explore ecm.py and see if there is a way I can both run the GPU branch and keep the CPU filled for stage 2. . . |
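A minimal sketch of that idea, for concreteness (the filenames and the composite below are placeholders; it only uses the [C]-gpu[/C], [C]-save[/C] and [C]-resume[/C] switches that appear elsewhere in this thread):
[code]
# Sketch only: stage 1 on the GPU with B2=0, saving residues, then stage 2
# resumed on two CPU cores. Filenames and the composite are hypothetical.
N="2^997-1"                                # placeholder composite
echo "$N" | ./ecm -gpu -save residues.txt 11e7 0
split -n l/2 residues.txt part.            # one piece per CPU core
for f in part.*; do ./ecm -resume "$f" 11e7 & done
wait
[/code]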
[QUOTE=EdH;533283]Does the GPU branch allow for multi-threading stage 2? I can't seem to find anything in the docs.
[/QUOTE] You need to script it. Basically run stage 1 on the GPU, saving parms to a file, split the file into as many bits as you have CPUs, then run an ecm task for each part. My latest script (not yet fully tested) is:
[code]
#!/bin/bash
# Script to run ecm on 2 or more cores against the number in $NAME.poly or $NAME.n aided by the gpu doing stage 1.
# It's intended to be called from factMsieve.factordb.pl which searches the logs for factors.
# The GPU can do stage 1 in about 1/2 the time the CPU takes to do stage 2 on one core.
# It expects 5 parms, the filename prefix, log suffix, the B1 to resume, B1 for GPU to use and the number of cores to use.
# The .ini file should have already been created by the caller
#set -x
NAME=$1
LEN=$2
OLDB1=$3
NEWB1=$4
CORES=$5
INI=$NAME.ini
if [[ ! -f $INI ]]; then echo "Can't find .ini file"; exit;fi
if [[ -z $LEN ]]; then echo "Can't tell what to call the log";exit;fi
if [[ -z $OLDB1 ]]; then echo "Can't tell previous B1 to use";exit;fi
if [[ -z $NEWB1 ]]; then echo "Can't tell what B1 to use";exit;fi
if [[ -z $CORES ]]; then echo "Can't tell how many cores to use";exit;fi
SAVE=$NAME.save
if [[ ! -f $SAVE ]]; then echo "Can't find save file from last run"; exit;fi
LOG=$NAME.ecm$LEN.log
# First split the save file from the previous run and start running them, followed by standard ecm until the GPU has finished.
# /home/chris/ecm-6.4.4/ecm was compiled with -enable-shellcmd to make it accept -idlecmd.
date "+ %c ecm to $LEN digits starts now" >> $LOG
rm save.*
split -nr/$CORES $NAME.save save.
rm $NAME.save
for FILE in save.*
do
  date "+ %c ecm stage 2 with B1=$OLDB1 starts now" >> $NAME.ecm$LEN.$FILE.log
  (nice -n 19 /home/chris/ecm-gpu/trunk/ecm -resume $FILE $OLDB1;nice -n 19 /home/chris/ecm-6.4.4/ecm -c 999 -idlecmd 'ps -ef | grep -q [-]save' -n $NEWB1 <$INI ) | tee -a $NAME.ecm$LEN.$FILE.log | grep actor &
done
# Now start running stage 1 on the gpu
/home/chris/ecm.2741/trunk/ecm -gpu -save $NAME.save $NEWB1 1 <$INI | tee -a $LOG | grep actor
date "+ %c ecm to $LEN digits stage 1 ended" >> $LOG
wait # for previous ecm's to finish
date "+ %c Finished" | tee -a $NAME.ecm$LEN.save.* >> $LOG
grep -q 'Factor found' $LOG $NAME.ecm$LEN.save.* # Check if we found a factor
exit $? # And pass RC back to caller
[/code]
But I've never used colab so don't know how to run things on it. Chris |
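For anyone adapting it, a hypothetical invocation might look like this (the script name and argument values below are placeholders, matching the five parameters described in the script's comments):
[code]
# Hypothetical example only: prefix c155, log suffix 55, resume the saved
# B1=43e6 residues on 4 cores while the GPU starts new curves at B1=11e7.
./gpu_stage2.sh c155 55 43e6 11e7 4
[/code]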
[QUOTE=chris2be8;533323]You need to script it. Basically run stage 1 on the GPU, saving parms to a file, split the file into as many bits as you have CPUs, then run an ecm task for each part.
My latest script (not yet fully tested) is: [code]<script snipped -- see the previous post>[/code]But I've never used colab so don't know how to run things on it. Chris[/QUOTE]Thanks! I'm looking it over to see how I can incorporate some of the calls. I'm bouncing around between an awful lot of things ATM, which is probably causing some of my difficulties. |
GPU-ECM for CC2.0
Is anyone who has the necessary Windows development toolkits set up interested in compiling a Windows binary of a relatively new version (for example, 7.0.4-dev, 7.0.4, or 7.0.5-dev) of GPU-ECM for a CC 2.0 card? It would be good to run it on old notebooks, thanks.:smile:
|
Revisions >3076 of Dev No Longer Work With Cuda 10.x Due To "unnamed structs/unions" in cuda.h
It was recently reported to me that my GMP-ECM-GPU branch instructions for a Colab session no longer work. In verifying the trouble, I, too, received the following during compilation:
[code]
configure: Using cuda.h from /usr/local/cuda-10.0/targets/x86_64-linux/include
checking cuda.h usability... no
checking cuda.h presence... yes
configure: WARNING: cuda.h: present but cannot be compiled
configure: WARNING: cuda.h: check for missing prerequisite headers?
configure: WARNING: cuda.h: see the Autoconf documentation
configure: WARNING: cuda.h: section "Present But Cannot Be Compiled"
configure: WARNING: cuda.h: proceeding with the compiler's result
configure: WARNING: ## ------------------------------------------------ ##
configure: WARNING: ## Report this to ecm-discuss@lists.gforge.inria.fr ##
configure: WARNING: ## ------------------------------------------------ ##
checking for cuda.h... no
configure: error: required header file missing
Makefile:807: recipe for target 'config.status' failed
make: *** [config.status] Error 1
[/code]Further research per ECM Team request showed the following from config.log:
[code]
In file included from conftest.c:127:0:
/usr/local/cuda-10.0/targets/x86_64-linux/include/cuda.h:432:10: warning: ISO C99 doesn't support unnamed structs/unions [-Wpedantic]
 };
  ^
/usr/local/cuda-10.0/targets/x86_64-linux/include/cuda.h:442:10: warning: ISO C99 doesn't support unnamed structs/unions [-Wpedantic]
 };
  ^
configure:15232: $? = 0
configure: failed program was:
| /* confdefs.h */
[/code] |
On the off-chance I could solve this simply by adding a name to the unions referenced above, I tried:
[code]
union noname {
[/code]But alas, no joy:
[code]
| #include <cuda.h>
configure:15308: result: no
configure:15308: checking cuda.h presence
configure:15308: x86_64-linux-gnu-gcc -E -I/usr/local/cuda-10.0/targets/x86_64-linux/include -I/usr/local//include -I/usr/local//include conftest.c
configure:15308: $? = 0
configure:15308: result: yes
configure:15308: WARNING: cuda.h: present but cannot be compiled
configure:15308: WARNING: cuda.h: check for missing prerequisite headers?
configure:15308: WARNING: cuda.h: see the Autoconf documentation
configure:15308: WARNING: cuda.h: section "Present But Cannot Be Compiled"
configure:15308: WARNING: cuda.h: proceeding with the compiler's result
configure:15308: checking for cuda.h
configure:15308: result: no
configure:15315: error: required header file missing
[/code] |
[QUOTE=EdH;542518]It was recently reported to me that my GMP-ECM-GPU branch instructions for a Colab session no longer work. . . [/QUOTE]GMP-ECM has been updated to revision 3081 and this is now working in my Colab instances.
"Thanks!" go out to the GMP-ECM Team. |
I have had mixed results using ECM-GPU on CoLab. Not that CoLab is the problem; it may be the way I am using it. I run sets of 1024 curves at 11e7 on the GPU. Then I transfer the results file to my local system to run step 2. I've noticed the sigmas are generated consecutively. Is that enough variety, or should I break it down and run twice as many sets at 512 curves each?
Running three sets of 1024 curves at 11e7 failed to find a p43. Another run of two sets of 1024 at 11e7 failed to find a p46. Lastly, the first set of 1024 curves at 11e7 found a p53. On the GPU I perform: [CODE]echo <number> | ecm -v -save Cxxx.txt -gpu -gpucurves 1024 11e7[/CODE] After transferring the 1024-line result file, I break it into four pieces using “head” and “tail”. Then each 256-line file is run by: [CODE]ecm -resume Cxxx[B]y[/B].txt -one 11e7[/CODE] where [B]y[/B] is a suffix from a to d representing the four smaller files. The p53 may be a lucky hit, but the p43 & p46 are a big-time miss. Should I run more of the smaller sets to get a better “spread” of sigma? |
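For reference, the splitting step can also be done in one go with [C]split[/C] rather than head/tail; a sketch using the same file names and bounds as the post above (the loop is just one way to launch the four runs):
[code]
# Split the 1024-curve residue file into four 256-line pieces and resume
# stage 2 on each piece; equivalent to the head/tail approach described above.
split -l 256 Cxxx.txt Cxxx_
for f in Cxxx_*; do ecm -resume "$f" -one 11e7 & done
wait
[/code]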
Hello,
Consecutive sigmas should not be a problem. I've been running ECM-GPU on my GPUs for several years and it seems to make no difference. The only thing I can think of is to check that the ranges don't overlap (e.g. if Colab always seeds the RNG with the same values). If it's not that, then just put it down to fate. I've had enough lucky hits and unlucky misses to balance out over the long run. Chris |
GMP-ECM Won't Compile for my System - edit: Card is arch 2.0
[B]Edit: [/B]I'm leaving this post up in case someone else has this issue, but adding a note that the card I have is apparently rather old and is only arch 2.0. Therefore, I have given up attempting to make it useful.
[strike]GMP-ECM compiles fine without GPU. But, with "--enable-gpu" I get a "memcpy" error:[/strike]
[code]
. . .
ptxas info : Function properties for _Z15Cuda_Ell_DblAddPA32_VjS1_S1_S1_j
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 12288 bytes smem, 356 bytes cmem[0]
ptxas info : Compiling entry function '_Z16Cuda_Init_Devicev' for 'sm_50'
ptxas info : Function properties for _Z16Cuda_Init_Devicev
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 2 registers, 320 bytes cmem[0]
ptxas info : 384 bytes gmem, 4 bytes cmem[3]
ptxas info : Compiling entry function '_Z15Cuda_Ell_DblAddPA32_VjS1_S1_S1_j' for 'sm_52'
ptxas info : Function properties for _Z15Cuda_Ell_DblAddPA32_VjS1_S1_S1_j
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 12288 bytes smem, 356 bytes cmem[0]
ptxas info : Compiling entry function '_Z16Cuda_Init_Devicev' for 'sm_52'
ptxas info : Function properties for _Z16Cuda_Init_Devicev
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 2 registers, 320 bytes cmem[0]
ptxas info : 384 bytes gmem, 4 bytes cmem[3]
ptxas info : Compiling entry function '_Z15Cuda_Ell_DblAddPA32_VjS1_S1_S1_j' for 'sm_53'
ptxas info : Function properties for _Z15Cuda_Ell_DblAddPA32_VjS1_S1_S1_j
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 12288 bytes smem, 356 bytes cmem[0]
ptxas info : Compiling entry function '_Z16Cuda_Init_Devicev' for 'sm_53'
ptxas info : Function properties for _Z16Cuda_Init_Devicev
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 2 registers, 320 bytes cmem[0]
/usr/include/string.h: In function 'void* __mempcpy_inline(void*, const void*, size_t)':
/usr/include/string.h:652:42: error: 'memcpy' was not declared in this scope
   return (char *) memcpy (__dest, __src, __n) + __n;
                                          ^
Makefile:2558: recipe for target 'cudakernel.lo' failed
make[2]: *** [cudakernel.lo] Error 1
make[2]: Leaving directory '/home/math99/Math/ecmgpu'
Makefile:1897: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/home/math99/Math/ecmgpu'
Makefile:777: recipe for target 'all' failed
make: *** [all] Error 2
[/code]Here's the "./configure" output, in case that is of help: [code] $ ./configure --enable-gpu --with-gmp=/usr/local/ checking for a BSD-compatible install... /usr/bin/install -c checking whether build environment is sane... yes checking for a thread-safe mkdir -p... /bin/mkdir -p checking for gawk... no checking for mawk... mawk checking whether make sets $(MAKE)... yes checking whether make supports nested variables... yes checking build system type... x86_64-pc-linux-gnu checking host system type... x86_64-pc-linux-gnu checking for grep that handles long lines and -e... /bin/grep checking for egrep... /bin/grep -E checking for a sed that does not truncate output... /bin/sed checking for CC and CFLAGS in gmp.h... yes CC=gcc CFLAGS=-O2 -pedantic -fomit-frame-pointer -m64 -mtune=sandybridge -march=sandybridge checking whether CC=gcc and CFLAGS=-O2 -pedantic -fomit-frame-pointer -m64 -mtune=sandybridge -march=sandybridge works... yes checking for style of include used by make... GNU checking for gcc... gcc checking whether the C compiler works... yes checking for C compiler default output file name... a.out checking for suffix of executables... 
checking whether we are cross compiling... no checking for suffix of object files... o checking whether we are using the GNU C compiler... yes checking whether gcc accepts -g... yes checking for gcc option to accept ISO C89... none needed checking whether gcc understands -c and -o together... yes checking dependency style of gcc... gcc3 checking for gcc option to accept ISO C99... none needed checking dependency style of gcc... gcc3 checking how to print strings... printf checking for a sed that does not truncate output... (cached) /bin/sed checking for fgrep... /bin/grep -F checking for ld used by gcc... /usr/bin/ld checking if the linker (/usr/bin/ld) is GNU ld... yes checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B checking the name lister (/usr/bin/nm -B) interface... BSD nm checking whether ln -s works... yes checking the maximum length of command line arguments... 1572864 checking how to convert x86_64-pc-linux-gnu file names to x86_64-pc-linux-gnu format... func_convert_file_noop checking how to convert x86_64-pc-linux-gnu file names to toolchain format... func_convert_file_noop checking for /usr/bin/ld option to reload object files... -r checking for objdump... objdump checking how to recognize dependent libraries... pass_all checking for dlltool... no checking how to associate runtime and link libraries... printf %s\n checking for ar... ar checking for archiver @FILE support... @ checking for strip... strip checking for ranlib... ranlib checking command to parse /usr/bin/nm -B output from gcc object... ok checking for sysroot... no checking for a working dd... /bin/dd checking how to truncate binary pipes... /bin/dd bs=4096 count=1 checking for mt... mt checking if mt is a manifest tool... no checking how to run the C preprocessor... gcc -E checking for ANSI C header files... yes checking for sys/types.h... yes checking for sys/stat.h... yes checking for stdlib.h... yes checking for string.h... yes checking for memory.h... yes checking for strings.h... yes checking for inttypes.h... yes checking for stdint.h... yes checking for unistd.h... yes checking for dlfcn.h... yes checking for objdir... .libs checking if gcc supports -fno-rtti -fno-exceptions... no checking for gcc option to produce PIC... -fPIC -DPIC checking if gcc PIC flag -fPIC -DPIC works... yes checking if gcc static flag -static works... yes checking if gcc supports -c -o file.o... yes checking if gcc supports -c -o file.o... (cached) yes checking whether the gcc linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes checking dynamic linker characteristics... GNU/Linux ld.so checking how to hardcode library paths into programs... immediate checking whether stripping libraries is possible... yes checking if libtool supports shared libraries... yes checking whether to build shared libraries... no checking whether to build static libraries... yes checking for int64_t... yes checking for uint64_t... yes checking for unsigned long long int... yes checking for long long int... yes checking for an ANSI C-conforming const... yes checking for inline... inline checking whether time.h and sys/time.h may both be included... yes checking for size_t... yes checking if assembly code is ELF... yes checking for suitable m4... m4 checking how to switch to text section... .text checking how to export a symbol... .globl checking what assembly label suffix to use... : checking if globals are prefixed by underscore... no checking how to switch to text section... 
(cached) .text checking how to export a symbol... (cached) .globl checking for assembler .type directive... .type $1,@$2 checking for working alloca.h... yes checking for alloca... yes checking for ANSI C header files... (cached) yes checking math.h usability... yes checking math.h presence... yes checking for math.h... yes checking limits.h usability... yes checking limits.h presence... yes checking for limits.h... yes checking malloc.h usability... yes checking malloc.h presence... yes checking for malloc.h... yes checking for strings.h... (cached) yes checking sys/time.h usability... yes checking sys/time.h presence... yes checking for sys/time.h... yes checking for unistd.h... (cached) yes checking io.h usability... no checking io.h presence... no checking for io.h... no checking signal.h usability... yes checking signal.h presence... yes checking for signal.h... yes checking fcntl.h usability... yes checking fcntl.h presence... yes checking for fcntl.h... yes checking for windows.h... no checking for psapi.h... no checking ctype.h usability... yes checking ctype.h presence... yes checking for ctype.h... yes checking for sys/types.h... (cached) yes checking sys/resource.h usability... yes checking sys/resource.h presence... yes checking for sys/resource.h... yes checking aio.h usability... yes checking aio.h presence... yes checking for aio.h... yes checking for working strtod... yes checking for pow in -lm... yes checking for floor in -lm... yes checking for sqrt in -lm... yes checking for fmod in -lm... yes checking for cos in -lm... yes checking for aio_read in -lrt... yes checking for GetProcessMemoryInfo in -lpsapi... no checking for isascii... yes checking for memset... yes checking for strchr... yes checking for strlen... yes checking for strncasecmp... yes checking for strstr... yes checking for access... yes checking for unlink... yes checking for isspace... yes checking for isdigit... yes checking for isxdigit... yes checking for time... yes checking for ctime... yes checking for gethostname... yes checking for gettimeofday... yes checking for getrusage... yes checking for memmove... yes checking for signal... yes checking for fcntl... yes checking for fileno... yes checking for setvbuf... yes checking for fallocate... yes checking for aio_read... yes checking for aio_init... yes checking for _fseeki64... no checking for _ftelli64... no checking for malloc_usable_size... yes checking gmp.h usability... yes checking gmp.h presence... yes checking for gmp.h... yes checking for recent GMP... yes checking if GMP is MPIR... no checking whether we can link against GMP... yes checking if gmp.h version and libgmp version are the same... (6.2.0/6.2.0) yes checking for __gmpn_add_nc... yes checking for __gmpn_mod_34lsub1... yes checking for __gmpn_redc_1... yes checking for __gmpn_redc_2... yes checking for __gmpn_mullo_n... yes checking for __gmpn_redc_n... yes checking for __gmpn_preinv_mod_1... yes checking for __gmpn_mod_1s_4p_cps... yes checking for __gmpn_mod_1s_4p... yes checking for __gmpn_mul_fft... yes checking for __gmpn_fft_next_size... yes checking for __gmpn_fft_best_k... yes checking for __gmpn_mulmod_bnm1... yes checking for __gmpn_mulmod_bnm1_next_size... yes checking whether assembler supports --noexecstack option... yes checking whether compiler knows __attribute__((hot))... yes checking for xsltproc... no checking whether self tests are run under valgrind... no checking cuda.h usability... yes checking cuda.h presence... yes checking for cuda.h... 
yes checking that CUDA Toolkit version is at least 3.0... (7.5) yes checking for cuInit in -lcuda... yes checking that CUDA Toolkit version and runtime version are the same... (7.5/7.5) yes checking for nvcc... /usr/bin/nvcc checking for compatibility between gcc and nvcc... yes checking that nvcc know compute capability 20... yes checking that nvcc know compute capability 21... yes checking that nvcc know compute capability 30... yes checking that nvcc know compute capability 32... yes checking that nvcc know compute capability 35... yes checking that nvcc know compute capability 37... yes checking that nvcc know compute capability 50... yes checking that nvcc know compute capability 52... yes checking that nvcc know compute capability 53... yes checking that nvcc know compute capability 60... no checking that nvcc know compute capability 61... no checking that nvcc know compute capability 62... no checking that nvcc know compute capability 70... no checking that nvcc know compute capability 72... no checking that nvcc know compute capability 75... no checking if nvcc know ptx instruction madc... yes creating config.m4 checking that generated files are newer than configure... done configure: creating ./config.status config.status: creating Makefile config.status: creating athlon/Makefile config.status: creating pentium4/Makefile config.status: creating x86_64/Makefile config.status: creating powerpc64/Makefile config.status: creating aprtcle/Makefile config.status: creating build.vc12/Makefile config.status: creating build.vc12/assembler/Makefile config.status: creating build.vc12/ecm/Makefile config.status: creating build.vc12/ecm_gpu/Makefile config.status: creating build.vc12/libecm/Makefile config.status: creating build.vc12/libecm_gpu/Makefile config.status: creating build.vc12/tune/Makefile config.status: creating build.vc12/bench_mulredc/Makefile config.status: creating config.h config.status: creating ecm.h config.status: ecm.h is unchanged config.status: executing depfiles commands config.status: executing libtool commands configure: Configuration: configure: Build for host type x86_64-pc-linux-gnu configure: CC=gcc, CFLAGS=-W -Wall -Wundef -O2 -pedantic -fomit-frame-pointer -m64 -mtune=sandybridge -march=sandybridge -DWITH_GPU -DECM_GPU_CURVES_BY_BLOCK=16 configure: Linking GMP with /usr/local//lib/libgmp.a configure: Using asm redc code from directory x86_64 configure: Not using SSE2 instructions in NTT code configure: Using APRCL to prove factors prime/composite configure: Assertions enabled configure: Shell command execution disabled configure: OpenMP disabled configure: Memory debugging disabled [/code][strike]Any assistance would be appreciated.[/strike] Edit: Assistance no longer requested. |
I have two different variations of [I]GMP-ECM[/I]. Below are the titles each displays when they are first started:
[QUOTE]GMP-ECM 7.0.4-dev [configured with MPIR 2.7.2, --enable-gpu, --enable-openmp] [ECM]
GMP-ECM 7.0.5-dev [configured with GMP 6.1.2, --enable-asm-redc] [ECM][/QUOTE] The compiled binaries are vastly different in size. The top is 761K bytes. The bottom is 4,396K bytes. I have two machines with the same OS, Windows 7 Pro. One is an i5 and the other is a Xeon. The top runs on both. The bottom will not run on the Xeon. Both will run on my i7 with Windows 10 Pro. I stick with the one which runs on everything. I do not bother with the GPU functions in the top one. Only going to 2^1014 would not help anything that may go beyond. It appears that development on [I]GMP-ECM[/I] has stalled. If this is true, then it is very sad. :two cents: |
[QUOTE=storm5510;559439]It appears that development on [I]GMP-ECM[/I] has stalled. If this is true, then it is very sad.[/QUOTE]
There were seven commits in the last month: [URL="http://gforge.inria.fr/activity/?group_id=135"]http://gforge.inria.fr/activity/?group_id=135[/URL]. |
[QUOTE=kruoli;559727]There were seven commits in the last month: [URL]http://gforge.inria.fr/activity/?group_id=135[/URL].[/QUOTE]
At a glance, those appear to me to be maintenance, not development. |
[QUOTE=kriesel;559752]At a glance, those appear to me to be maintenance, not development.[/QUOTE]
It would be great to get an official comment from the primary developers about what future development, if any, is planned. |
The repository was moved to [url]https://gitlab.inria.fr/zimmerma/ecm[/url]
as gforge.inria.fr is being deprecated. I believe it is still being actively developed. |
I assume if anything major comes along, it will be posted here? :question:
|
I want to run GPU-ECM on Windows, but I haven't found any precompiled binaries. Anyone willing to give me compiling instructions for Windows, preferably using the Linux subsystem, or know where I can find existing binaries? I don't have Visual Studio (not even the Community edition, as I don't want a MS account connected to my PC, mostly for privacy reasons).
|
It's built-in to any copy of GMP-ECM 7. Check the "news" file within an ECM download.
Do you mean there are no binaries available for GMP-ECM 7.0.1 or newer? |
Oops. I didn't pay any attention to that. I guess I should RTM before asking any questions.
|
Just to brag a bit: I actually had a successful factoring session with Colab using the GPU branch of GMP-ECM for stage 1 and a local machine for stage 2.
It was only a 146 digit number and it took quite a while, but still, it worked! Colab connected me to a T4 which gave me 2560 cores on which I ran stage 1, with the -save option. The local machine "watched" for the residue file, using the tunneling setup by chalsall, described elsewhere. The local machine used ecm.py by WraithX to run stage 2. A minor session, but it proved the concept.:smile: |
GMP-ECM has the option [C]-one[/C] to tell ECM to stop after the first factor is found. But, when running a GPU, stage 2 is performed on all the residues from stage 1, instead of stopping when a factor is found. Since GMP-ECM still seems to be single-threaded*, on a machine with lots of cores it takes a lot longer than it needs to. I can use separate external programs, such as ecm.py, but my scripts would be even more complicated.
Any help? *I had thought, at some point, that GMP-ECM had introduced multi-threading, but I can't find anything about it. Memory fluctuations? |
For P-1 stage 2, GMP-ECM can be configured to use OpenMP. Everything else is single threaded.
|
[QUOTE=kruoli;601259]For P-1 stage 2, GMP-ECM can be configured to use OpenMP. Everything else is single threaded.[/QUOTE]
Thanks Oliver! Maybe that's what I had seen and my memory is a bit foggy. |
There's also some code under [URL="https://gitlab.inria.fr/zimmerma/ecm/-/blob/master/multiecm.c"]multiecm.c[/URL] in gmp-ecm. I've never used it (I prefer WraithX's ecm.py); the header is
[CODE] /* multiecm.c - ECM with many curves with many torsion and/or in parallel Author: F. Moraino */[/CODE] But it doesn't look like it's been worked on in 9 years. |
[QUOTE=SethTro;601284]There's also some code under [URL="https://gitlab.inria.fr/zimmerma/ecm/-/blob/master/multiecm.c"]multiecm.c[/URL] in gmp-ecm. I've never used it (I prefer Wraith's ecm.py), the header is
[CODE] /* multiecm.c - ECM with many curves with many torsion and/or in parallel Author: F. Moraino */[/CODE]But it doesn't look like it's been worked on in 9 years.[/QUOTE]Thanks! I also use ecm.py for stage 2 external work, but it gets pretty complicated in my scripts. |
[QUOTE=EdH;601201]
*I had thought at some point, that GMP-ECM introduced multi-threading, but I can't find anything about it. Memory fluctuations?[/QUOTE] From NEWS: [quote] Changes between GMP-ECM 6.4.4 and GMP-ECM 7.0: * GMP-ECM is now thread-safe. In particular the "ecmfactor" binary can be called with say -t 17 to use 17 threads. [/quote] I think I looked at it once but didn't find it useful. I don't think it can be used for doing stage 2 after a GPU has done stage 1. |
That seems familiar! I'm sure that's what I was thinking of. Thanks for finding it!
My next issue is what you reference. I'm currently sending residues to a second machine while tasking the GPU machine with the next level of B1. But, if stage 2 is successful on the second machine, I still need to wait for the GPU to finish its current B1. I've tried [C]pkill ecm[/C], but it doesn't seem to do anything at the call. |
[QUOTE=EdH;601331]I've tried [C]pkill ecm[/C], but it doesn't seem to do anything at the call.[/QUOTE]
Try [C]pkill -1 ecm[/C]. |
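If the goal is to stop the GPU run automatically once the other machine reports a factor, a small watcher along these lines would do it (a sketch only; the log file name is a placeholder, and the "Factor found" string is the one grepped for elsewhere in this thread):
[code]
# Sketch: poll the stage 2 log and send SIGHUP (what pkill -1 sends) to any
# running ecm process once a factor is reported. Log name is hypothetical.
while ! grep -q 'Factor found' stage2.log; do sleep 30; done
pkill -1 ecm
[/code]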
[QUOTE=Gimarel;601334]Try [C]pkill -1 ecm[/C].[/QUOTE]I believe that is working. Thanks!
|
[QUOTE=EdH;601201]GMP-ECM has the option [C]-one[/C] to tell ECM to stop after the first factor is found. But, when running a GPU, stage 2 is performed on all the residues from stage 1, instead of stopping when a factor is found. Since GMP-ECM seems to still be single* threaded, with lots of cores, it takes a lot longer than it needs to. I can use external separate programs, such as ecm.py, but my scripts would be even more complicated.
Any help? *I had thought at some point, that GMP-ECM introduced multi-threading, but I can't find anything about it. Memory fluctuations?[/QUOTE] I'm happy to look at this as a bug. I vaguely remember that I wasn't sure if I should always stop or only if the cofactor is composite. |
[QUOTE=SethTro;601925]I'm happy to look at this as a bug. I vaguely remember that I wasn't sure if I should always stop or only if the cofactor is composite.[/QUOTE]Thanks, but I've moved to using a separate machine with ecm.py and it is working as needed.
I'd leave as is unless others are interested. For me it is no longer an issue. Again, thank you for considering it. |
There seems to be an oddity about using a GPU in GMP-ECM.
The [C]--help[/C] switch shows the maximum size of a B1 value is 2^1018. Using Windows Calculator to calculate log(2) * 1018 says this is a 308 digit value. If I run this: [CODE]echo "2^14447-1" | gpuecm -gpu -maxmem 2048 5e6 10e6[/CODE]I get this message: [QUOTE]GMP-ECM 7.0.4-dev [configured with MPIR 2.7.2, --enable-gpu, --enable-openmp] [ECM] Input number is 2^14447-1 (4349 digits) GPU: Error, input number should be strictly lower than 2^1018 please report internal errors at <ecm-discuss@lists.gforge.inria.fr>. [/QUOTE]B1 in my example is 5e6 or 5,000,000. This is obviously smaller than a 308 digit number. I am missing something here and I cannot determine what it is. |
[QUOTE=storm5510;606417]There seems to be an oddity about using a GPU in GMP-ECM.
The [C]--help[/C] switch shows the maximum size of a B1 value is 2^1018. Using Windows Calculator to calculate log(2) * 1018 says this is a 308 digit value. If I run this: [CODE]echo "2^14447-1" | gpuecm -gpu -maxmem 2048 5e6 10e6[/CODE]I get this message: B1 in my example is 5e6 or 5,000,000. This is obviously smaller than a 308 digit number. I am missing something here and I cannot determine what it is.[/QUOTE] You are missing something. The inputted number is too big. You specified 2^14447-1. The max is 2^1018. :ermm: |
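(As a quick sanity check: the 2^1018 limit is on the size of the input number, not on B1. Since log10(2^1018) is about 306.5, an acceptable input has at most 307 decimal digits, while 2^14447-1 has 4349 digits, hence the error.)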
If you are willing to compile gmp-ecm from source you can test larger input numbers (it's also faster).
See [url]https://www.mersenneforum.org/showthread.php?t=27103[/url] |
[QUOTE=paulunderwood;606419]You are missing something. The inputted number is too big. You specified 2^14447-1. The max is 2^1018. :ermm:[/QUOTE]
Understood. IMHO, it is not good for much of anything at this size. |
Is there any known progress in implementing stage 1 in (at least somewhat optimised) OpenCL up to 512 or maybe 1024 bits?
|
[QUOTE=kruoli;625295]Is there any known progress in implementing stage 1 in (at least somewhat optimised) OpenCL up to 512 or maybe 1024 bits?[/QUOTE]
I don't know of anyone working on this. I did some quick google searches and didn't find any OpenCl arbitrary precision libraries. The old ECM gpu code was fairly straightforward grade school multiplication algorithm (IIRC) that could be adapted quickly for OpenCl, but if you/someone wanted competitive speeds from OpenCl more code would be needed for montgomery multiplication, various size kernels, etc. |
[QUOTE=SethTro;625329]I don't know of anyone working on this. I did some quick google searches and didn't find any OpenCl arbitrary precision libraries. The old ECM gpu code was fairly straightforward grade school multiplication algorithm (IIRC) that could be adapted quickly for OpenCl, but if you/someone wanted competitive speeds from OpenCl more code would be needed for montgomery multiplication, various size kernels, etc.[/QUOTE]
AFAIK, nothing meaningful has been done with [I]GMP-ECM[/I] for a long time. I stopped using it several years ago. It was not worth the time required to run anything. [I]OpenCL[/I] appears not to work a GPU very hard. I have used [I]gpuOwl[/I] a lot, so I know. It does a good job, but there is a limit to its GPU utilization. Around 60% seems to be its limit. |
[QUOTE=storm5510;625373][I]OpenCL[/I] appears not to work a GPU very hard. I have used [I]gpuOwl[/I] a lot, so I know. It does a good job, but there is a limit to its GPU utilization. Around [B]60%[/B] seems to be its limit.[/QUOTE]That statement is not consistent with my experience. I routinely see 99-100% GPU load indicated in GPU-Z for Gpuowl running on Radeon VII, RX480, or RX550.
|
[QUOTE=kriesel;625374]That statement is not consistent with my experience. I routinely see 99-100% GPU load indicated in GPU-Z for Gpuowl running on Radeon VII, RX480, or RX550.[/QUOTE]
I have no experience with [I]AMD[/I] GPU's. All of mine, past and present, are [I]Nvidia[/I] based. Something tells me that [I]AMD[/I] variations are much better suited to run OpenCL than [I]Nvidia[/I] models are. CUDA is much better for [I]Nvidia[/I], it seems. [I]mfaktc[/I] cranks out over 3,000 Ghz-d/day on my ever-older RTX-2080. I am satisfied with that. |
NVIDIA quadro k620, gpuowl v6.11-382, gpu-Z v2.52.0, indicated gpu load [B]100%[/B].
Most recent NVIDIA consumer GPUs have poor DP performance (PRP, LL, P-1) relative to SP performance (TF), while Teslas and AMD typically have decent DP performance ratios. I almost never use RTX2080 or GTX 1650 for anything other than TF because of the 32:1 SP/DP ratio. |
[QUOTE=kriesel;625383]NVIDIA quadro k620, gpuowl v6.11-382, gpu-Z v2.52.0, indicated gpu load [B]100%[/B].
Most recent NVIDIA consumer GPUs have poor DP performance (PRP, LL, P-1) relative to SP performance (TF), while Teslas and AMD typically have decent DP performance ratios. I almost never use RTX2080 or GTX 1650 for anything other than TF because of the 32:1 SP/DP ratio.[/QUOTE] Well, a lot of what is above sails over my head. I will take your word for it. I go by what I see here... |
A math-liking friend just picked up a 4090. If I were to talk him into GPU-ECM runs, how many curves will it test per run?
|
Per [url]https://www.mersenneforum.org/showpost.php?p=601139&postcount=109[/url]
Number of curves is strongly related to number of shader units; 1x, 2x, or 1/2x is generally optimal (you can measure with ./gpu_throughput_test.sh). My 1080ti (3584 shaders) prefers 1792 curves (1 curve / 2 shaders). My 970 (1664 shaders) prefers 832 curves. I would expect the 4090 with 16384 shaders to be best with 8192 or 16384 curves at the same time. |
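If you would rather measure by hand than run the bundled throughput script, a quick sketch (the test composite and B1 are placeholders; compare the reported stage 1 times to see which batch size gives the best curves per second):
[code]
# Sketch: run a small stage-1-only job at a few -gpucurves values and compare
# the reported timings. The composite and B1 here are placeholders.
for c in 8192 16384 32768; do
  echo "2^997-1" | ./ecm -gpu -gpucurves $c 11e4 0
done
[/code]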