Old 2021-09-06, 16:02   #82
chris2be8
 

Benchmark results:
Code:
chris@4core:~/ecm-cgbn/gmp-ecm> date;echo "(2^499-1)/20959" | ./ecm -gpu -gpucurves 3584 -sigma 3:1000 20000 0;date
Sun  5 Sep 19:42:42 BST 2021
GMP-ECM 7.0.5-dev [configured with GMP 5.1.3, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is (2^499-1)/20959 (146 digits)
Using B1=20000, B2=0, sigma=3:1000-3:4583 (3584 curves)
GPU: Using device code targeted for architecture compile_52
GPU: Ptx version is 52
GPU: maxThreadsPerBlock = 1024
GPU: numRegsPerThread = 31 sharedMemPerBlock = 24576 bytes
GPU: Block: 32x32x1 Grid: 112x1x1 (3584 parallel curves)
Computing 3584 Step 1 took 190ms of CPU time / 20427ms of GPU time
Sun  5 Sep 19:43:03 BST 2021

chris@4core:~/ecm-cgbn/gmp-ecm> date;echo "(2^499-1)/20959" | ./ecm -gpu -cgbn -gpucurves 3584 -sigma 3:1000 20000 0;date
Sun  5 Sep 19:43:29 BST 2021
GMP-ECM 7.0.5-dev [configured with GMP 5.1.3, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is (2^499-1)/20959 (146 digits)
Using B1=20000, B2=0, sigma=3:1000-3:4583 (3584 curves)
GPU: Using device code targeted for architecture compile_52
GPU: Ptx version is 52
GPU: maxThreadsPerBlock = 640
GPU: numRegsPerThread = 93 sharedMemPerBlock = 0 bytes
Computing 3584 Step 1 took 30ms of CPU time / 3644ms of GPU time
Sun  5 Sep 19:43:33 BST 2021

chris@4core:~/ecm-cgbn/gmp-ecm> date;echo "(2^997-1)" | ./ecm -gpu -sigma 3:1000 20000 0;date
Sun  5 Sep 19:44:25 BST 2021
GMP-ECM 7.0.5-dev [configured with GMP 5.1.3, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is (2^997-1) (301 digits)
Using B1=20000, B2=0, sigma=3:1000-3:1831 (832 curves)
GPU: Using device code targeted for architecture compile_52
GPU: Ptx version is 52
GPU: maxThreadsPerBlock = 1024
GPU: numRegsPerThread = 31 sharedMemPerBlock = 24576 bytes
GPU: Block: 32x32x1 Grid: 26x1x1 (832 parallel curves)
Computing 832 Step 1 took 188ms of CPU time / 4552ms of GPU time
Sun  5 Sep 19:44:30 BST 2021

chris@4core:~/ecm-cgbn/gmp-ecm> date;echo "(2^997-1)" | ./ecm -gpu -cgbn -sigma 3:1000 20000 0;date
Sun  5 Sep 19:44:41 BST 2021
GMP-ECM 7.0.5-dev [configured with GMP 5.1.3, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is (2^997-1) (301 digits)
Using B1=20000, B2=0, sigma=3:1000-3:1831 (832 curves)
GPU: Using device code targeted for architecture compile_52
GPU: Ptx version is 52
GPU: maxThreadsPerBlock = 640
GPU: numRegsPerThread = 93 sharedMemPerBlock = 0 bytes
Computing 832 Step 1 took 8ms of CPU time / 1995ms of GPU time
Sun  5 Sep 19:44:44 BST 2021
So by GPU time the CGBN code is about 5.6 times faster for (2^499-1)/20959 (20427ms vs 3644ms) and about 2.3 times as fast for 2^997-1 (4552ms vs 1995ms). But these are all small cases.
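For anyone who wants to check the ratios, here is a minimal sketch that divides the GPU times reported in the runs above. The speedup function name is just something I made up for illustration, it is not part of GMP-ECM.

```shell
# Hypothetical helper: ratio of two GPU times in ms, to two decimal places.
speedup() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", before / after }'
}

speedup 20427 3644   # (2^499-1)/20959, 3584 curves, without vs with -cgbn
speedup 4552 1995    # 2^997-1, 832 curves, without vs with -cgbn
```

This only compares stage 1 GPU time, not wall clock time or stage 2.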

My overall throughput won't increase much, though, because my CPU now can't do stage 2 as fast as the GPU can do stage 1. That's not your fault, and any speedup is nice. Thanks.


Other lessons learnt:
autoreconf -si creates symlinks to missing files, while autoreconf -i copies them. Using -si saves space, but if you upgrade to a new version of automake you can be left with dangling symlinks:
Code:
lrwxrwxrwx 1 chris users 32 Nov 12  2015 INSTALL -> /usr/share/automake-1.13/INSTALL
lrwxrwxrwx 1 chris users 35 Nov 12  2015 ltmain.sh -> /usr/share/libtool/config/ltmain.sh
They needed updating to:
Code:
lrwxrwxrwx 1 chris users 32 Sep  4 19:20 INSTALL -> /usr/share/automake-1.15/INSTALL
lrwxrwxrwx 1 chris users 38 Sep  4 19:20 ltmain.sh -> /usr/share/libtool/build-aux/ltmain.sh
Not a common issue though.
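A rough sketch of how to spot such symlinks in a source directory, assuming a shell with readlink available. The function name list_dangling_symlinks is my own invention. Once found, rerunning autoreconf with -f (force) as well as -si should recreate them, though I have only checked that on my own system.

```shell
# Hypothetical helper: list symlinks in a directory whose targets no longer exist.
list_dangling_symlinks() {
    dir=${1:-.}
    for f in "$dir"/*; do
        # -L: the name is a symlink; ! -e: its target does not resolve
        if [ -L "$f" ] && [ ! -e "$f" ]; then
            printf '%s -> %s\n' "$f" "$(readlink "$f")"
        fi
    done
}

list_dangling_symlinks .
```

Hidden files are not checked, since the glob only matches names that don't start with a dot.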


And suggestions for the install process:
INSTALL-ecm should tell users to run autoreconf -i (or -si) before running ./configure (which is created by autoreconf -i).
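Something like the following is what I have in mind for the instructions. It is only a sketch: the bootstrap_if_needed name is made up, and the configure option shown in the comment is simply the one from my own build.

```shell
# Hypothetical helper: generate ./configure with autoreconf only if it is missing.
bootstrap_if_needed() {
    if [ ! -x ./configure ]; then
        autoreconf -i    # or -si to symlink the auxiliary files instead of copying
    fi
}

# Typical use from the top of the source tree (not run here):
#   bootstrap_if_needed && ./configure --enable-gpu && make
```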

./configure compiles several small programs and runs them to check things. If a compile fails, it should print a message saying that the compile failed, not one saying it found mismatched versions of the run-time library etc. If the compile normally produces no output, then letting any output it does produce go to the screen would be informative (e.g. when it can't find -lstdc++).
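In the meantime, the compiler and linker output from those test programs is recorded in config.log, so a failed link can be dug out by hand. A minimal sketch, assuming the linker is GNU ld and words the error as "cannot find -l..." (other linkers will phrase it differently). The show_missing_libs name is made up.

```shell
# Hypothetical helper: print linker "cannot find -l..." errors from a configure log.
# Assumes GNU ld wording for the error message.
show_missing_libs() {
    log=${1:-config.log}
    grep -n 'cannot find -l' "$log"
}

# Typical use after a failed ./configure (not run here):
#   show_missing_libs config.log
```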

Chris