mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GMP-ECM (https://www.mersenneforum.org/forumdisplay.php?f=55)
-   -   ECM for CUDA GPUs in latest GMP-ECM ? (https://www.mersenneforum.org/showthread.php?t=16480)

EdH 2019-12-20 16:46

Does the GPU branch allow for multi-threading stage 2? I can't seem to find anything in the docs.

My Colab sessions only get two Xeon cores, but using both would "double" the throughput for stage 2.

PhilF 2019-12-20 17:02

[QUOTE=EdH;533283]Does the GPU branch allow for multi-threading stage 2? I can't seem to find anything in the docs.

My Colab sessions only get two Xeon cores, but using both would "double" the throughput for stage 2.[/QUOTE]

I don't think so. Stage 2 is run on the CPU, not the GPU, so I don't think anything about stage 2 gets changed when the program is compiled with the --enable-gpu option.

This is in the readme.gpu file:

[quote]It will compute step 1 on the GPU, and then perform step 2 on the CPU (not in parallel).[/quote]

EdH 2019-12-20 17:38

[QUOTE=PhilF;533286]I don't think so. Stage 2 is run on the CPU, not the GPU, so I don't think anything about stage 2 gets changed when the program is compiled with the --enable-gpu option.

This is in the readme.gpu file:[/QUOTE]
Yeah, I saw that, but I seem to recall reading somewhere that the latest version had multi-threading. Even if I can't invoke both CPUs in conjunction with a GPU run, if I can run stage 1 with B2=0 and save the residues, I could rerun ECM with both threads on the residues.

I may have to explore ecm.py and see whether there is a way I can both run the GPU branch and keep the CPUs filled for stage 2. . .
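Something like the following is what I have in mind (a rough sketch; the filenames are made up, B1 is only an example, and it assumes GNU split is available on the Colab VM):
[code]
# Stage 1 only on the GPU (B2=1, i.e. below B1, so stage 2 is skipped),
# saving the residues to a file:
echo "$N" | ecm -gpu -save stage1.save 11e7 1

# Split the residue file across the two Xeon cores and resume stage 2:
split -n l/2 stage1.save part.
for F in part.*; do
    ecm -resume "$F" 11e7 &
done
wait
[/code]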

chris2be8 2019-12-21 16:32

[QUOTE=EdH;533283]Does the GPU branch allow for multi-threading stage 2? I can't seem to find anything in the docs.
[/QUOTE]

You need to script it. Basically run stage 1 on the GPU, saving parms to a file, split the file into as many bits as you have CPUs, then run an ecm task for each part.

My latest script (not yet fully tested) is:
[code]
#!/bin/bash

# Script to run ecm on 2 or more cores against the number in $NAME.poly or $NAME.n aided by the gpu doing stage 1.
# It's intended to be called from factMsieve.factordb.pl which searches the logs for factors.

# The GPU can do stage 1 in about 1/2 the time the CPU takes to do stage 2 on one core.

# It expects 5 parms, the filename prefix, log suffix, the B1 to resume, B1 for GPU to use and the number of cores to use.
# The .ini file should have already been created by the caller

#set -x

NAME=$1
LEN=$2
OLDB1=$3
NEWB1=$4
CORES=$5

INI=$NAME.ini
if [[ ! -f $INI ]]; then echo "Can't find .ini file"; exit;fi
if [[ -z $LEN ]]; then echo "Can't tell what to call the log";exit;fi
if [[ -z $OLDB1 ]]; then echo "Can't tell previous B1 to use";exit;fi
if [[ -z $NEWB1 ]]; then echo "Can't tell what B1 to use";exit;fi
if [[ -z $CORES ]]; then echo "Can't tell how many cores to use";exit;fi

SAVE=$NAME.save
if [[ ! -f $SAVE ]]; then echo "Can't find save file from last run"; exit;fi

LOG=$NAME.ecm$LEN.log

# First split the save file from the previous run and start running them, followed by standard ecm until the GPU has finished.
# /home/chris/ecm-6.4.4/ecm was compiled with --enable-shellcmd to make it accept -idlecmd.
date "+ %c ecm to $LEN digits starts now" >> $LOG

rm save.*
split -nr/$CORES $NAME.save save.
rm $NAME.save
for FILE in save.*
do
date "+ %c ecm stage 2 with B1=$OLDB1 starts now" >> $NAME.ecm$LEN.$FILE.log
(nice -n 19 /home/chris/ecm-gpu/trunk/ecm -resume $FILE $OLDB1;nice -n 19 /home/chris/ecm-6.4.4/ecm -c 999 -idlecmd 'ps -ef | grep -q [-]save' -n $NEWB1 <$INI ) | tee -a $NAME.ecm$LEN.$FILE.log | grep actor &
done

# Now start running stage 1 on the gpu
/home/chris/ecm.2741/trunk/ecm -gpu -save $NAME.save $NEWB1 1 <$INI | tee -a $LOG | grep actor
date "+ %c ecm to $LEN digits stage 1 ended" >> $LOG
wait # for previous ecm's to finish

date "+ %c Finished" | tee -a $NAME.ecm$LEN.save.* >> $LOG

grep -q 'Factor found' $LOG $NAME.ecm$LEN.save.* # Check if we found a factor
exit $? # And pass RC back to caller
[/code]
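A hypothetical invocation, going by the parameter comments at the top (the script name is made up; the arguments are prefix, log suffix, B1 to resume, B1 for the GPU, and number of cores):
[code]
./gpu_ecm_split.sh C150 55 11e7 26e7 4
[/code]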

But I've never used colab so don't know how to run things on it.

Chris

EdH 2019-12-22 18:50

[QUOTE=chris2be8;533323]You need to script it. Basically run stage 1 on the GPU, saving parms to a file, split the file into as many bits as you have CPUs, then run an ecm task for each part.

My latest script (not yet fully tested) is: [I]<snip — script quoted in full in the post above>[/I]But I've never used colab so don't know how to run things on it.

Chris[/QUOTE]Thanks! I'm looking it over to see how I can incorporate some of the calls. I'm bouncing around between an awful lot of things ATM, which is probably causing some of my difficulties.

Fan Ming 2020-01-04 11:55

GPU-ECM for CC2.0
 
Does anyone who has enough Windows development toolkits set up have an interest in compiling a Windows binary of a relatively new version (for example, 7.0.4-dev, 7.0.4, or 7.0.5-dev) of GPU-ECM for a CC2.0 card? It would be good to run it on old notebooks, thanks.:smile:

EdH 2020-04-13 12:28

Revisions >3076 of Dev No Longer Work With Cuda 10.x Due To "unnamed structs/unions" in cuda.h
 
It was recently reported to me that my GMP-ECM-GPU branch instructions for a Colab session no longer work. In verifying the trouble, I, too, received the following during compilation:
[code]
configure: Using cuda.h from /usr/local/cuda-10.0/targets/x86_64-linux/include
checking cuda.h usability... no
checking cuda.h presence... yes
configure: WARNING: cuda.h: present but cannot be compiled
configure: WARNING: cuda.h: check for missing prerequisite headers?
configure: WARNING: cuda.h: see the Autoconf documentation
configure: WARNING: cuda.h: section "Present But Cannot Be Compiled"
configure: WARNING: cuda.h: proceeding with the compiler's result
configure: WARNING: ## ------------------------------------------------ ##
configure: WARNING: ## Report this to ecm-discuss@lists.gforge.inria.fr ##
configure: WARNING: ## ------------------------------------------------ ##
checking for cuda.h... no
configure: error: required header file missing
Makefile:807: recipe for target 'config.status' failed
make: *** [config.status] Error 1
[/code]Further research per ECM Team request showed the following from config.log:
[code]
In file included from conftest.c:127:0:
/usr/local/cuda-10.0/targets/x86_64-linux/include/cuda.h:432:10: warning: ISO C99 doesn't support unnamed structs/unions [-Wpedantic]
};
^
/usr/local/cuda-10.0/targets/x86_64-linux/include/cuda.h:442:10: warning: ISO C99 doesn't support unnamed structs/unions [-Wpedantic]
};
^
configure:15232: $? = 0
configure: failed program was:
| /* confdefs.h */
[/code]

EdH 2020-04-13 16:35

On the off-chance I could solve this simply by adding a name to the unions referenced above, I tried:
[code]
union noname {
[/code]But alas, no joy:
[code]
| #include <cuda.h>
configure:15308: result: no
configure:15308: checking cuda.h presence
configure:15308: x86_64-linux-gnu-gcc -E -I/usr/local/cuda-10.0/targets/x86_64-linux/include -I/usr/local//include -I/usr/local//include conftest.c
configure:15308: $? = 0
configure:15308: result: yes
configure:15308: WARNING: cuda.h: present but cannot be compiled
configure:15308: WARNING: cuda.h: check for missing prerequisite headers?
configure:15308: WARNING: cuda.h: see the Autoconf documentation
configure:15308: WARNING: cuda.h: section "Present But Cannot Be Compiled"
configure:15308: WARNING: cuda.h: proceeding with the compiler's result
configure:15308: checking for cuda.h
configure:15308: result: no
configure:15315: error: required header file missing
[/code]

EdH 2020-04-19 13:59

[QUOTE=EdH;542518]It was recently reported to me that my GMP-ECM-GPU branch instructions for a Colab session no longer work. . . [/QUOTE]GMP-ECM has been updated to revision 3081 and this is now working in my Colab instances.

"Thanks!" go out to the GMP-ECM Team.

RichD 2020-08-09 13:11

I have had mixed results using ECM-GPU on CoLab. Not that CoLab is the problem, it may be the way I am using it. I run sets of 1024 curves at 11e7 on the GPU. Then I transfer the results file to my local system to run step 2. I've noticed the sigmas are generated consecutively. Is that enough variety, or should I break it down and run twice as many sets at 512 curves each?

Running three sets of 1024 curves at 11e7 failed to find a p43. Another run of two sets of 1024 at 11e7 failed to find a p46. Lastly, the first set of 1024 curves at 11e7 found a p53.

On the GPU I perform:
[CODE]echo <number> | ecm -v -save Cxxx.txt -gpu -gpucurves 1024 11e7[/CODE]

After transferring the 1024-line result file I break it into four pieces using “head” and “tail”. Then each 256-line file is run by:
[CODE]ecm -resume Cxxx[B]y[/B].txt -one 11e7[/CODE]
where y is a suffix from a to d representing the four smaller files.
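In script form, that split-and-resume step looks roughly like this (file names as above; a sketch, not the exact commands):
[CODE]# Break the 1024-line residue file into four 256-line pieces:
head -n 256 Cxxx.txt               > Cxxxa.txt
head -n 512 Cxxx.txt | tail -n 256 > Cxxxb.txt
head -n 768 Cxxx.txt | tail -n 256 > Cxxxc.txt
tail -n 256 Cxxx.txt               > Cxxxd.txt

# Run stage 2 on each piece, stopping at the first factor found:
for y in a b c d; do
  ecm -resume Cxxx$y.txt -one 11e7
done[/CODE]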

The p53 may be a lucky hit but the p43 & p46 are a big time miss. Should I run more of the smaller sets to get a better “spread” of sigma?

chris2be8 2020-08-09 16:28

Hello,

Consecutive sigmas should not be a problem. I've been running ECM-GPU on my GPUs for several years and it seems to make no difference.

The only thing I can think of is to check that the ranges don't overlap (e.g. if Colab always seeds the RNG with the same values).

If it's not that, then just put it down to fate. I've had enough lucky hits and unlucky misses to balance out over the long run.

Chris

EdH 2020-09-27 19:47

GMP-ECM Won't Compile for my System - edit: Card is arch 2.0
 
[B]Edit: [/B]I'm leaving this post up in case someone else has this issue, but adding a note that the card I have is apparently rather old and is only arch 2.0. Therefore, I have given up attempting to make it useful.

[strike]GMP-ECM compiles fine without GPU. But, with "--enable-gpu" I get a "memcpy" error:[/strike]
[code]
. . .
ptxas info : Function properties for _Z15Cuda_Ell_DblAddPA32_VjS1_S1_S1_j
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 12288 bytes smem, 356 bytes cmem[0]
ptxas info : Compiling entry function '_Z16Cuda_Init_Devicev' for 'sm_50'
ptxas info : Function properties for _Z16Cuda_Init_Devicev
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 2 registers, 320 bytes cmem[0]
ptxas info : 384 bytes gmem, 4 bytes cmem[3]
ptxas info : Compiling entry function '_Z15Cuda_Ell_DblAddPA32_VjS1_S1_S1_j' for 'sm_52'
ptxas info : Function properties for _Z15Cuda_Ell_DblAddPA32_VjS1_S1_S1_j
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 12288 bytes smem, 356 bytes cmem[0]
ptxas info : Compiling entry function '_Z16Cuda_Init_Devicev' for 'sm_52'
ptxas info : Function properties for _Z16Cuda_Init_Devicev
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 2 registers, 320 bytes cmem[0]
ptxas info : 384 bytes gmem, 4 bytes cmem[3]
ptxas info : Compiling entry function '_Z15Cuda_Ell_DblAddPA32_VjS1_S1_S1_j' for 'sm_53'
ptxas info : Function properties for _Z15Cuda_Ell_DblAddPA32_VjS1_S1_S1_j
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 12288 bytes smem, 356 bytes cmem[0]
ptxas info : Compiling entry function '_Z16Cuda_Init_Devicev' for 'sm_53'
ptxas info : Function properties for _Z16Cuda_Init_Devicev
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 2 registers, 320 bytes cmem[0]
[B]/usr/include/string.h:[/B] In function [B]'void* __mempcpy_inline(void*, const void*, size_t)'[/B]:
[B]/usr/include/string.h:652:42: [COLOR=Red]error: '[/COLOR]memcpy'[/B] was not declared in this scope
return (char *) memcpy (__dest, __src, __n) + __n;
[COLOR=Lime]^[/COLOR]
Makefile:2558: recipe for target 'cudakernel.lo' failed
make[2]: *** [cudakernel.lo] Error 1
make[2]: Leaving directory '/home/math99/Math/ecmgpu'
Makefile:1897: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/home/math99/Math/ecmgpu'
Makefile:777: recipe for target 'all' failed
make: *** [all] Error 2
[/code]Here's the "./configure" output, in case that is of help:
[code]
$ ./configure --enable-gpu --with-gmp=/usr/local/
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for a sed that does not truncate output... /bin/sed
checking for CC and CFLAGS in gmp.h... yes CC=gcc CFLAGS=-O2 -pedantic -fomit-frame-pointer -m64 -mtune=sandybridge -march=sandybridge
checking whether CC=gcc and CFLAGS=-O2 -pedantic -fomit-frame-pointer -m64 -mtune=sandybridge -march=sandybridge works... yes
checking for style of include used by make... GNU
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking dependency style of gcc... gcc3
checking for gcc option to accept ISO C99... none needed
checking dependency style of gcc... gcc3
checking how to print strings... printf
checking for a sed that does not truncate output... (cached) /bin/sed
checking for fgrep... /bin/grep -F
checking for ld used by gcc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works... yes
checking the maximum length of command line arguments... 1572864
checking how to convert x86_64-pc-linux-gnu file names to x86_64-pc-linux-gnu format... func_convert_file_noop
checking how to convert x86_64-pc-linux-gnu file names to toolchain format... func_convert_file_noop
checking for /usr/bin/ld option to reload object files... -r
checking for objdump... objdump
checking how to recognize dependent libraries... pass_all
checking for dlltool... no
checking how to associate runtime and link libraries... printf %s\n
checking for ar... ar
checking for archiver @FILE support... @
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for sysroot... no
checking for a working dd... /bin/dd
checking how to truncate binary pipes... /bin/dd bs=4096 count=1
checking for mt... mt
checking if mt is a manifest tool... no
checking how to run the C preprocessor... gcc -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC -DPIC
checking if gcc PIC flag -fPIC -DPIC works... yes
checking if gcc static flag -static works... yes
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... no
checking whether to build static libraries... yes
checking for int64_t... yes
checking for uint64_t... yes
checking for unsigned long long int... yes
checking for long long int... yes
checking for an ANSI C-conforming const... yes
checking for inline... inline
checking whether time.h and sys/time.h may both be included... yes
checking for size_t... yes
checking if assembly code is ELF... yes
checking for suitable m4... m4
checking how to switch to text section... .text
checking how to export a symbol... .globl
checking what assembly label suffix to use... :
checking if globals are prefixed by underscore... no
checking how to switch to text section... (cached) .text
checking how to export a symbol... (cached) .globl
checking for assembler .type directive... .type $1,@$2
checking for working alloca.h... yes
checking for alloca... yes
checking for ANSI C header files... (cached) yes
checking math.h usability... yes
checking math.h presence... yes
checking for math.h... yes
checking limits.h usability... yes
checking limits.h presence... yes
checking for limits.h... yes
checking malloc.h usability... yes
checking malloc.h presence... yes
checking for malloc.h... yes
checking for strings.h... (cached) yes
checking sys/time.h usability... yes
checking sys/time.h presence... yes
checking for sys/time.h... yes
checking for unistd.h... (cached) yes
checking io.h usability... no
checking io.h presence... no
checking for io.h... no
checking signal.h usability... yes
checking signal.h presence... yes
checking for signal.h... yes
checking fcntl.h usability... yes
checking fcntl.h presence... yes
checking for fcntl.h... yes
checking for windows.h... no
checking for psapi.h... no
checking ctype.h usability... yes
checking ctype.h presence... yes
checking for ctype.h... yes
checking for sys/types.h... (cached) yes
checking sys/resource.h usability... yes
checking sys/resource.h presence... yes
checking for sys/resource.h... yes
checking aio.h usability... yes
checking aio.h presence... yes
checking for aio.h... yes
checking for working strtod... yes
checking for pow in -lm... yes
checking for floor in -lm... yes
checking for sqrt in -lm... yes
checking for fmod in -lm... yes
checking for cos in -lm... yes
checking for aio_read in -lrt... yes
checking for GetProcessMemoryInfo in -lpsapi... no
checking for isascii... yes
checking for memset... yes
checking for strchr... yes
checking for strlen... yes
checking for strncasecmp... yes
checking for strstr... yes
checking for access... yes
checking for unlink... yes
checking for isspace... yes
checking for isdigit... yes
checking for isxdigit... yes
checking for time... yes
checking for ctime... yes
checking for gethostname... yes
checking for gettimeofday... yes
checking for getrusage... yes
checking for memmove... yes
checking for signal... yes
checking for fcntl... yes
checking for fileno... yes
checking for setvbuf... yes
checking for fallocate... yes
checking for aio_read... yes
checking for aio_init... yes
checking for _fseeki64... no
checking for _ftelli64... no
checking for malloc_usable_size... yes
checking gmp.h usability... yes
checking gmp.h presence... yes
checking for gmp.h... yes
checking for recent GMP... yes
checking if GMP is MPIR... no
checking whether we can link against GMP... yes
checking if gmp.h version and libgmp version are the same... (6.2.0/6.2.0) yes
checking for __gmpn_add_nc... yes
checking for __gmpn_mod_34lsub1... yes
checking for __gmpn_redc_1... yes
checking for __gmpn_redc_2... yes
checking for __gmpn_mullo_n... yes
checking for __gmpn_redc_n... yes
checking for __gmpn_preinv_mod_1... yes
checking for __gmpn_mod_1s_4p_cps... yes
checking for __gmpn_mod_1s_4p... yes
checking for __gmpn_mul_fft... yes
checking for __gmpn_fft_next_size... yes
checking for __gmpn_fft_best_k... yes
checking for __gmpn_mulmod_bnm1... yes
checking for __gmpn_mulmod_bnm1_next_size... yes
checking whether assembler supports --noexecstack option... yes
checking whether compiler knows __attribute__((hot))... yes
checking for xsltproc... no
checking whether self tests are run under valgrind... no
checking cuda.h usability... yes
checking cuda.h presence... yes
checking for cuda.h... yes
checking that CUDA Toolkit version is at least 3.0... (7.5) yes
checking for cuInit in -lcuda... yes
checking that CUDA Toolkit version and runtime version are the same... (7.5/7.5) yes
checking for nvcc... /usr/bin/nvcc
checking for compatibility between gcc and nvcc... yes
checking that nvcc know compute capability 20... yes
checking that nvcc know compute capability 21... yes
checking that nvcc know compute capability 30... yes
checking that nvcc know compute capability 32... yes
checking that nvcc know compute capability 35... yes
checking that nvcc know compute capability 37... yes
checking that nvcc know compute capability 50... yes
checking that nvcc know compute capability 52... yes
checking that nvcc know compute capability 53... yes
checking that nvcc know compute capability 60... no
checking that nvcc know compute capability 61... no
checking that nvcc know compute capability 62... no
checking that nvcc know compute capability 70... no
checking that nvcc know compute capability 72... no
checking that nvcc know compute capability 75... no
checking if nvcc know ptx instruction madc... yes
creating config.m4
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating athlon/Makefile
config.status: creating pentium4/Makefile
config.status: creating x86_64/Makefile
config.status: creating powerpc64/Makefile
config.status: creating aprtcle/Makefile
config.status: creating build.vc12/Makefile
config.status: creating build.vc12/assembler/Makefile
config.status: creating build.vc12/ecm/Makefile
config.status: creating build.vc12/ecm_gpu/Makefile
config.status: creating build.vc12/libecm/Makefile
config.status: creating build.vc12/libecm_gpu/Makefile
config.status: creating build.vc12/tune/Makefile
config.status: creating build.vc12/bench_mulredc/Makefile
config.status: creating config.h
config.status: creating ecm.h
config.status: ecm.h is unchanged
config.status: executing depfiles commands
config.status: executing libtool commands
configure: Configuration:
configure: Build for host type x86_64-pc-linux-gnu
configure: CC=gcc, CFLAGS=-W -Wall -Wundef -O2 -pedantic -fomit-frame-pointer -m64 -mtune=sandybridge -march=sandybridge -DWITH_GPU -DECM_GPU_CURVES_BY_BLOCK=16
configure: Linking GMP with /usr/local//lib/libgmp.a
configure: Using asm redc code from directory x86_64
configure: Not using SSE2 instructions in NTT code
configure: Using APRCL to prove factors prime/composite
configure: Assertions enabled
configure: Shell command execution disabled
configure: OpenMP disabled
configure: Memory debugging disabled
[/code][strike]Any assistance would be appreciated.[/strike]
Edit: Assistance no longer requested.

storm5510 2020-10-10 15:32

I have two different variations of [I]GMP-ECM[/I]. Below are the titles each displays when they are first started:

[QUOTE]GMP-ECM 7.0.4-dev [configured with MPIR 2.7.2, --enable-gpu, --enable-openmp] [ECM]

GMP-ECM 7.0.5-dev [configured with GMP 6.1.2, --enable-asm-redc] [ECM][/QUOTE] The compiled binaries are vastly different in size. The top is 761K bytes. The bottom is 4,396K bytes. I have two machines with the same OS, Windows 7 Pro. One is an i5 and the other is a Xeon. The top runs on both. The bottom will not run on the Xeon. Both will run on my i7 with Windows 10 Pro. I stick with the one which runs on everything.

The GPU functions in the top one I do not bother with. Only going up to 2^1014, it would not help with anything beyond that.

It appears that development on [I]GMP-ECM[/I] has stalled. If this is true, then it is very sad.

:two cents:

kruoli 2020-10-13 09:36

[QUOTE=storm5510;559439]It appears that development on [I]GMP-ECM[/I] has stalled. If this is true, then it is very sad.[/QUOTE]

There were seven commits in the last month: [URL="http://gforge.inria.fr/activity/?group_id=135"]http://gforge.inria.fr/activity/?group_id=135[/URL].

kriesel 2020-10-13 15:16

[QUOTE=kruoli;559727]There were seven commits in the last month: [URL]http://gforge.inria.fr/activity/?group_id=135[/URL].[/QUOTE]
At a glance, those appear to me to be maintenance, not development.

mathwiz 2020-10-13 19:27

[QUOTE=kriesel;559752]At a glance, those appear to me to be maintenance, not development.[/QUOTE]

It would be great to get an official comment from the primary developers about what future development, if any, is planned.

SethTro 2020-10-14 00:48

The repository was moved to [url]https://gitlab.inria.fr/zimmerma/ecm[/url], as gforge.inria.fr is being deprecated.

I believe it is still being actively developed.

storm5510 2020-10-14 14:21

I assume if anything major comes along, it will be posted here? :question:

Stargate38 2020-12-25 01:15

I want to run GPU-ECM on Windows, but I haven't found any precompiled binaries. Anyone willing to give me compiling instructions for Windows, preferably using the Linux subsystem, or know where I can find existing binaries? I don't have Visual Studio (not even the Community edition, as I don't want a MS account connected to my PC, mostly for privacy reasons).

VBCurtis 2020-12-25 02:09

It's built into any copy of GMP-ECM 7. Check the "news" file within an ECM download.
Do you mean there are no binaries available for GMP-ECM 7.0.1 or newer?

Stargate38 2020-12-26 21:39

Oops. I didn't pay any attention to that. I guess I should RTM before asking any questions.

EdH 2020-12-29 16:45

Just to brag a bit: I actually had a successful factoring session with Colab using the GPU branch of GMP-ECM for stage 1 and a local machine for stage 2.

It was only a 146 digit number and it took quite a while, but still, it worked!

Colab connected me to a T4 which gave me 2560 cores on which I ran stage 1, with the -save option. The local machine "watched" for the residue file, using the tunneling setup by chalsall, described elsewhere. The local machine used ecm.py by WraithX to run stage 2.
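The local "watching" side was roughly along these lines (a sketch with a placeholder filename and B1; in practice the residue file went to ecm.py so both local cores got used):
[code]
#!/bin/bash
# Poll for the residue file arriving through the tunnel, then start stage 2.
SAVE=stage1.save
B1=11e7
while [[ ! -f "$SAVE" ]]; do sleep 60; done
ecm -resume "$SAVE" "$B1"
[/code]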

A minor session, but it proved the concept.:smile:

EdH 2022-03-06 14:52

GMP-ECM has the option [C]-one[/C] to tell ECM to stop after the first factor is found. But, when running a GPU, stage 2 is performed on all the residues from stage 1, instead of stopping when a factor is found. Since GMP-ECM seems to still be single* threaded, with lots of cores, it takes a lot longer than it needs to. I can use external separate programs, such as ecm.py, but my scripts would be even more complicated.

Any help?

*I had thought at some point, that GMP-ECM introduced multi-threading, but I can't find anything about it. Memory fluctuations?

kruoli 2022-03-07 12:23

For P-1 stage 2, GMP-ECM can be configured to use OpenMP. Everything else is single threaded.
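For instance (a sketch; the thread count comes from the standard OpenMP environment variable, and the bounds are just examples):
[code]
./configure --enable-openmp
make
# P-1 with stage 2 spread over 4 threads via OpenMP:
echo "$N" | OMP_NUM_THREADS=4 ./ecm -pm1 1e6 1e9
[/code]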

EdH 2022-03-07 14:10

[QUOTE=kruoli;601259]For P-1 stage 2, GMP-ECM can be configured to use OpenMP. Everything else is single threaded.[/QUOTE]
Thanks Oliver! Maybe that's what I had seen and my memory is a bit foggy.

SethTro 2022-03-07 18:01

There's also some code under [URL="https://gitlab.inria.fr/zimmerma/ecm/-/blob/master/multiecm.c"]multiecm.c[/URL] in gmp-ecm. I've never used it (I prefer Wraith's ecm.py), the header is

[CODE]
/* multiecm.c - ECM with many curves with many torsion and/or in parallel
Author: F. Moraino
*/[/CODE]

But it doesn't look like it's been worked on in 9 years.

EdH 2022-03-07 18:40

[QUOTE=SethTro;601284]There's also some code under [URL="https://gitlab.inria.fr/zimmerma/ecm/-/blob/master/multiecm.c"]multiecm.c[/URL] in gmp-ecm. I've never used it (I prefer Wraith's ecm.py), the header is

[CODE]
/* multiecm.c - ECM with many curves with many torsion and/or in parallel
Author: F. Moraino
*/[/CODE]But it doesn't look like it's been worked on in 9 years.[/QUOTE]Thanks! I also use ecm.py for stage 2 external work, but it gets pretty complicated in my scripts.

chris2be8 2022-03-08 16:47

[QUOTE=EdH;601201]
*I had thought at some point, that GMP-ECM introduced multi-threading, but I can't find anything about it. Memory fluctuations?[/QUOTE]

From NEWS:
[quote]
Changes between GMP-ECM 6.4.4 and GMP-ECM 7.0:
* GMP-ECM is now thread-safe. In particular the "ecmfactor" binary can be
called with say -t 17 to use 17 threads.
[/quote]

I think I looked at it once but didn't find it useful. I don't think it can be used for doing stage 2 after a GPU has done stage 1.

EdH 2022-03-08 17:14

That seems familiar! I'm sure that's what I was thinking of. Thanks for finding it!

My next issue is what you reference. I'm currently sending residues to a second machine while tasking the GPU machine with the next level of B1. But, if stage 2 is successful on the second machine, I still need to wait for the GPU to finish its current B1. I've tried [C]pkill ecm[/C], but it doesn't seem to do anything at the call.

Gimarel 2022-03-08 17:33

[QUOTE=EdH;601331]I've tried [C]pkill ecm[/C], but it doesn't seem to do anything at the call.[/QUOTE]
Try [C]pkill -1 ecm[/C].

EdH 2022-03-08 18:31

[QUOTE=Gimarel;601334]Try [C]pkill -1 ecm[/C].[/QUOTE]I believe that is working. Thanks!

SethTro 2022-03-17 08:42

[QUOTE=EdH;601201]GMP-ECM has the option [C]-one[/C] to tell ECM to stop after the first factor is found. But, when running a GPU, stage 2 is performed on all the residues from stage 1, instead of stopping when a factor is found. Since GMP-ECM seems to still be single* threaded, with lots of cores, it takes a lot longer than it needs to. I can use external separate programs, such as ecm.py, but my scripts would be even more complicated.

Any help?

*I had thought at some point, that GMP-ECM introduced multi-threading, but I can't find anything about it. Memory fluctuations?[/QUOTE]

I'm happy to look at this as a bug. I vaguely remember that I wasn't sure if I should always stop or only if the cofactor is composite.

EdH 2022-03-17 13:15

[QUOTE=SethTro;601925]I'm happy to look at this as a bug. I vaguely remember that I wasn't sure if I should always stop or only if the cofactor is composite.[/QUOTE]Thanks, but I've moved to using a separate machine with ecm.py and it is working as needed.

I'd leave as is unless others are interested. For me it is no longer an issue.

Again, thank you for considering it.

storm5510 2022-05-24 17:34

There seems to be an oddity about using a GPU in GMP-ECM.

The [C]--help[/C] switch shows the maximum size of a B1 value is 2^1018. Using Windows Calculator to calculate log(2) * 1018 says this is a 308 digit value.

If I run this:

[CODE]echo "2^14447-1" | gpuecm -gpu -maxmem 2048 5e6 10e6[/CODE]I get this message:

[QUOTE]GMP-ECM 7.0.4-dev [configured with MPIR 2.7.2, --enable-gpu, --enable-openmp] [ECM]
Input number is 2^14447-1 (4349 digits)
GPU: Error, input number should be strictly lower than 2^1018
please report internal errors at <ecm-discuss@lists.gforge.inria.fr>.
[/QUOTE]B1 in my example is 5e6 or 5,000,000. This is obviously smaller than a 308 digit number.

I am missing something here and I cannot determine what it is.

paulunderwood 2022-05-24 17:43

[QUOTE=storm5510;606417]There seems to be an oddity about using a GPU in GMP-ECM.

The [C]--help[/C] switch shows the maximum size of a B1 value is 2^1018. Using Windows Calculator to calculate log(2) * 1018 says this is a 308 digit value.

If I run this:

[CODE]echo "2^14447-1" | gpuecm -gpu -maxmem 2048 5e6 10e6[/CODE]I get this message:

B1 in my example is 5e6 or 5,000,000. This is obviously smaller than a 308 digit number.

I am missing something here and I cannot determine what it is.[/QUOTE]

You are missing something. The inputted number is too big. You specified 2^14447-1. The max is 2^1018. :ermm:
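For illustration, the same command with an input under that limit should go through (the exponent is chosen only for its size):

[CODE]echo "2^997-1" | gpuecm -gpu -maxmem 2048 5e6 10e6[/CODE]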

SethTro 2022-05-24 17:52

If you are willing to compile gmp-ecm from source you can test larger input numbers (it's also faster).

See [url]https://www.mersenneforum.org/showthread.php?t=27103[/url]

storm5510 2022-05-25 13:52

[QUOTE=paulunderwood;606419]You are missing something. The inputted number is too big. You specified 2^14447-1. The max is 2^1018. :ermm:[/QUOTE]

Understood. IMHO, it is not good for much of anything at this size.

kruoli 2023-02-20 18:54

Is there any known progress in implementing stage 1 in (at least somewhat optimised) OpenCL up to 512 or maybe 1024 bits?

SethTro 2023-02-21 06:49

[QUOTE=kruoli;625295]Is there any known progress in implementing stage 1 in (at least somewhat optimised) OpenCL up to 512 or maybe 1024 bits?[/QUOTE]

I don't know of anyone working on this. I did some quick Google searches and didn't find any OpenCL arbitrary-precision libraries. The old ECM GPU code was a fairly straightforward grade-school multiplication algorithm (IIRC) that could be adapted quickly for OpenCL, but if you/someone wanted competitive speeds from OpenCL, more code would be needed for Montgomery multiplication, various size kernels, etc.

storm5510 2023-02-21 20:07

[QUOTE=SethTro;625329]I don't know of anyone working on this. I did some quick Google searches and didn't find any OpenCL arbitrary-precision libraries. The old ECM GPU code was a fairly straightforward grade-school multiplication algorithm (IIRC) that could be adapted quickly for OpenCL, but if you/someone wanted competitive speeds from OpenCL, more code would be needed for Montgomery multiplication, various size kernels, etc.[/QUOTE]

AFAIK, nothing meaningful has been done with [I]GMP-ECM[/I] for a long time. I stopped using it several years ago. It was not worth the time required to run anything.

[I]OpenCL[/I] appears not to work a GPU very hard. I have used [I]gpuOwl[/I] a lot, so I know. It does a good job, but there is a limit to its GPU utilization. Around 60% seems to be its limit.

kriesel 2023-02-21 20:26

[QUOTE=storm5510;625373][I]OpenCL[/I] appears not to work a GPU very hard. I have used [I]gpuOwl[/I] a lot, so I know. It does a good job, but there is a limit to its GPU utilization. Around [B]60%[/B] seems to be its limit.[/QUOTE]That statement is not consistent with my experience. I routinely see 99-100% GPU load indicated in GPU-Z for Gpuowl running on Radeon VII, RX480, or RX550.

storm5510 2023-02-22 01:12

[QUOTE=kriesel;625374]That statement is not consistent with my experience. I routinely see 99-100% GPU load indicated in GPU-Z for Gpuowl running on Radeon VII, RX480, or RX550.[/QUOTE]

I have no experience with [I]AMD[/I] GPUs. All of mine, past and present, are [I]Nvidia[/I] based. Something tells me that [I]AMD[/I] variations are much better suited to run OpenCL than [I]Nvidia[/I] models are. CUDA is much better for [I]Nvidia[/I], it seems. [I]mfaktc[/I] cranks out over 3,000 GHz-d/day on my ever-older RTX-2080. I am satisfied with that.

kriesel 2023-02-22 01:27

NVIDIA quadro k620, gpuowl v6.11-382, gpu-Z v2.52.0, indicated gpu load [B]100%[/B].
Most recent NVIDIA consumer GPUs have poor DP performance (PRP, LL, P-1) relative to SP performance (TF), while Teslas and AMD typically have decent DP performance ratios. I almost never use RTX2080 or GTX 1650 for anything other than TF because of the 32:1 SP/DP ratio.

storm5510 2023-02-22 01:39

[QUOTE=kriesel;625383]NVIDIA quadro k620, gpuowl v6.11-382, gpu-Z v2.52.0, indicated gpu load [B]100%[/B].
Most recent NVIDIA consumer GPUs have poor DP performance (PRP, LL, P-1) relative to SP performance (TF), while Teslas and AMD typically have decent DP performance ratios. I almost never use RTX2080 or GTX 1650 for anything other than TF because of the 32:1 SP/DP ratio.[/QUOTE]

Well, a lot of what is above sails over my head. I will take your word for it. I go by what I see here...

VBCurtis 2023-02-26 03:23

A math-liking friend just picked up a 4090. If I were to talk him into GPU-ECM runs, how many curves will it test per run?

SethTro 2023-02-26 07:53

Per [url]https://www.mersenneforum.org/showpost.php?p=601139&postcount=109[/url]

Number of curves is strongly related to number of shader units; 1x, 2x, or 1/2x is generally optimal (you can measure with ./gpu_throughput_test.sh). My 1080ti (3584 shaders) prefers 1792 curves (1 curve / 2 shaders). My 970 (1664 shaders) prefers 832 curves.

I would expect the 4090 with 16384 shaders to be best with 8192 or 16384 curves at the same time.
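Something like this would compare the two candidate counts (the input number and bounds are placeholders; B2=1 keeps it to GPU stage 1 only, so the per-curve throughput can be compared directly):
[CODE]echo "$N" | ecm -gpu -gpucurves 8192  11e7 1
echo "$N" | ecm -gpu -gpucurves 16384 11e7 1[/CODE]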

