mersenneforum.org

Faster GPU-ECM with CGBN (https://www.mersenneforum.org/showthread.php?t=27103), in the Factoring subforum

SethTro 2021-09-05 07:01

[QUOTE=chris2be8;587296]That fails:
[code]...[/code]
And 'git pull' does nothing:
[code]
chris@4core:~/CGBN> git pull
Already up to date.
[/code]Unless I'm not using it correctly.[/QUOTE]


Ignore this, but for completeness' sake you can probably clone my copy of CGBN with `git clone -b cgbn_swap https://github.com/sethtroisi/CGBN.git`.

The top entry from `git log` should be:

[CODE]
commit 1595e543801bcbffd2c36cbf978baff843c09876 (HEAD -> gpu_integration, origin/gpu_integration)
Author: Seth Troisi <sethtroisi@google.com>
Date: Sat Sep 4 20:26:30 2021 -0700

reverted the cgbn_swap change till that is accepted

[/CODE]
If so, you should be able to build. If not, try `git fetch` and then `git pull origin gpu_integration`.
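To check the top `git log` entry without eyeballing a 40-character hash, a small Python sketch can compare the first commit line against the expected one. The helper names `top_commit` and `top_commit_matches` are hypothetical; they only parse text and do not call git themselves.

```python
import re
from typing import Optional

# Expected top commit, copied from the git log entry quoted above.
EXPECTED_COMMIT = "1595e543801bcbffd2c36cbf978baff843c09876"

# A git log entry starts with "commit <40 hex chars>", optionally followed by refs.
COMMIT_RE = re.compile(r"^commit ([0-9a-f]{40})\b")


def top_commit(log_text: str) -> Optional[str]:
    """Return the SHA from the first 'commit <sha>' line of `git log` output."""
    for line in log_text.splitlines():
        m = COMMIT_RE.match(line.strip())
        if m:
            return m.group(1)
    return None


def top_commit_matches(log_text: str, expected: str = EXPECTED_COMMIT) -> bool:
    """True if the newest commit in the log text is the expected one."""
    return top_commit(log_text) == expected
```

Feed it the output of `git log -1` from the checkout; if it returns False, the `git fetch` / `git pull origin gpu_integration` step applies.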

chris2be8 2021-09-05 15:44

I'm still stuck. I re-downloaded everything from scratch and re-ran autoreconf -si, ./configure and make, but make still fails:
[code]
...
libtool: link: ( cd ".libs" && rm -f "libecm.la" && ln -s "../libecm.la" "libecm.la" )
/bin/sh ./libtool --tag=CC --mode=link gcc-9 -g -I/usr/local/cuda/include -g -O2 -DWITH_GPU -R /usr/local/cuda/lib64 -o ecm ecm-auxi.o ecm-b1_ainc.o ecm-candi.o ecm-eval.o ecm-main.o ecm-resume.o ecm-addlaws.o ecm-torsions.o ecm-getprime_r.o aprtcle/ecm-mpz_aprcl.o ecm-memusage.o libecm.la -lgmp -lrt -lm -lm -lm -lm -lm
libtool: link: gcc-9 -g -I/usr/local/cuda/include -g -O2 -DWITH_GPU -o ecm ecm-auxi.o ecm-b1_ainc.o ecm-candi.o ecm-eval.o ecm-main.o ecm-resume.o ecm-addlaws.o ecm-torsions.o ecm-getprime_r.o aprtcle/ecm-mpz_aprcl.o ecm-memusage.o ./.libs/libecm.a -L/usr/local/cuda/lib64 -lcudart -lgmp -lrt -lm -Wl,-rpath -Wl,/usr/local/cuda/lib64
/usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: ./.libs/libecm.a(cgbn_stage1.o): in function `cgbn_ecm_stage1':
tmpxft_00007e39_00000000-6_cgbn_stage1.cudafe1.cpp:(.text+0x8b3): undefined reference to `operator delete(void*)'
/usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: tmpxft_00007e39_00000000-6_cgbn_stage1.cudafe1.cpp:(.text+0x196e): undefined reference to `operator delete(void*)'
/usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: ./.libs/libecm.a(cgbn_stage1.o): in function `void std::vector<unsigned int, std::allocator<unsigned int> >::_M_realloc_insert<unsigned int>(__gnu_cxx::__normal_iterator<unsigned int*, std::vector<unsigned int, std::allocator<unsigned int> > >, unsigned int&&)':
tmpxft_00007e39_00000000-6_cgbn_stage1.cudafe1.cpp:(.text._ZNSt6vectorIjSaIjEE17_M_realloc_insertIJjEEEvN9__gnu_cxx17__normal_iteratorIPjS1_EEDpOT_[_ZNSt6vectorIjSaIjEE17_M_realloc_insertIJjEEEvN9__gnu_cxx17__normal_iteratorIPjS1_EEDpOT_]+0x50): undefined reference to `operator new(unsigned long)'
/usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: tmpxft_00007e39_00000000-6_cgbn_stage1.cudafe1.cpp:(.text._ZNSt6vectorIjSaIjEE17_M_realloc_insertIJjEEEvN9__gnu_cxx17__normal_iteratorIPjS1_EEDpOT_[_ZNSt6vectorIjSaIjEE17_M_realloc_insertIJjEEEvN9__gnu_cxx17__normal_iteratorIPjS1_EEDpOT_]+0xc8): undefined reference to `operator delete(void*)'
/usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: ./.libs/libecm.a(cgbn_stage1.o):(.data.rel.local.DW.ref.__gxx_personality_v0[DW.ref.__gxx_personality_v0]+0x0): undefined reference to `__gxx_personality_v0'
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:973: ecm] Error 1
make[2]: Leaving directory '/home/chris/ecm-cgbn/gmp-ecm'
make[1]: *** [Makefile:1903: all-recursive] Error 1
make[1]: Leaving directory '/home/chris/ecm-cgbn/gmp-ecm'
make: *** [Makefile:783: all] Error 2
[/code]

Any ideas?

paulunderwood 2021-09-05 15:59

Did you install the libstdc++ dev package with YaST?

chris2be8 2021-09-05 18:38

Success!

The vital clue came from searching DuckDuckGo for "__gxx_personality_v0", which told me the symbol is provided by libstdc++, the g++ runtime library. After installing gcc9-g++ and its runtime development package libstdc++6-devel-gcc9, everything works.

This has been an educational experience. The next step is to benchmark CGBN on my GPU.
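All of the undefined symbols in the failing link (`operator new`, `operator delete`, `__gxx_personality_v0`) belong to the C++ runtime, libstdc++, and the failing link command calls gcc-9 without `-lstdc++` (gcc, unlike g++, does not add it by default). The Python sketch below scans linker output for that pattern; the function names and the hint list are illustrative assumptions, not part of GMP-ECM's build system.

```python
import re
from typing import List

# Symbol prefixes whose absence points at a missing C++ runtime (libstdc++).
# This list is an illustrative assumption, not an exhaustive one.
CXX_RUNTIME_HINTS = (
    "operator new",
    "operator delete",
    "__gxx_personality_v0",
    "__cxa_",
)

# GNU ld reports: undefined reference to `symbol'
UNDEFINED_RE = re.compile(r"undefined reference to `([^']+)'")


def undefined_symbols(linker_output: str) -> List[str]:
    """Collect the distinct symbols named in 'undefined reference' errors, in order."""
    seen: List[str] = []
    for sym in UNDEFINED_RE.findall(linker_output):
        if sym not in seen:
            seen.append(sym)
    return seen


def needs_libstdcxx(linker_output: str) -> bool:
    """True if any undefined symbol looks like it comes from libstdc++."""
    return any(
        sym.startswith(hint)
        for sym in undefined_symbols(linker_output)
        for hint in CXX_RUNTIME_HINTS
    )
```

In this thread the fix on openSUSE was installing gcc9-g++ and libstdc++6-devel-gcc9.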

chris2be8 2021-09-06 16:02

Benchmark results:
[code]
chris@4core:~/ecm-cgbn/gmp-ecm> date;echo "(2^499-1)/20959" | ./ecm -gpu -gpucurves 3584 -sigma 3:1000 20000 0;date
Sun 5 Sep 19:42:42 BST 2021
GMP-ECM 7.0.5-dev [configured with GMP 5.1.3, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is (2^499-1)/20959 (146 digits)
Using B1=20000, B2=0, sigma=3:1000-3:4583 (3584 curves)
GPU: Using device code targeted for architecture compile_52
GPU: Ptx version is 52
GPU: maxThreadsPerBlock = 1024
GPU: numRegsPerThread = 31 sharedMemPerBlock = 24576 bytes
GPU: Block: 32x32x1 Grid: 112x1x1 (3584 parallel curves)
Computing 3584 Step 1 took 190ms of CPU time / 20427ms of GPU time
Sun 5 Sep 19:43:03 BST 2021

chris@4core:~/ecm-cgbn/gmp-ecm> date;echo "(2^499-1)/20959" | ./ecm -gpu -cgbn -gpucurves 3584 -sigma 3:1000 20000 0;date
Sun 5 Sep 19:43:29 BST 2021
GMP-ECM 7.0.5-dev [configured with GMP 5.1.3, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is (2^499-1)/20959 (146 digits)
Using B1=20000, B2=0, sigma=3:1000-3:4583 (3584 curves)
GPU: Using device code targeted for architecture compile_52
GPU: Ptx version is 52
GPU: maxThreadsPerBlock = 640
GPU: numRegsPerThread = 93 sharedMemPerBlock = 0 bytes
Computing 3584 Step 1 took 30ms of CPU time / 3644ms of GPU time
Sun 5 Sep 19:43:33 BST 2021

chris@4core:~/ecm-cgbn/gmp-ecm> date;echo "(2^997-1)" | ./ecm -gpu -sigma 3:1000 20000 0;date
Sun 5 Sep 19:44:25 BST 2021
GMP-ECM 7.0.5-dev [configured with GMP 5.1.3, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is (2^997-1) (301 digits)
Using B1=20000, B2=0, sigma=3:1000-3:1831 (832 curves)
GPU: Using device code targeted for architecture compile_52
GPU: Ptx version is 52
GPU: maxThreadsPerBlock = 1024
GPU: numRegsPerThread = 31 sharedMemPerBlock = 24576 bytes
GPU: Block: 32x32x1 Grid: 26x1x1 (832 parallel curves)
Computing 832 Step 1 took 188ms of CPU time / 4552ms of GPU time
Sun 5 Sep 19:44:30 BST 2021

chris@4core:~/ecm-cgbn/gmp-ecm> date;echo "(2^997-1)" | ./ecm -gpu -cgbn -sigma 3:1000 20000 0;date
Sun 5 Sep 19:44:41 BST 2021
GMP-ECM 7.0.5-dev [configured with GMP 5.1.3, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is (2^997-1) (301 digits)
Using B1=20000, B2=0, sigma=3:1000-3:1831 (832 curves)
GPU: Using device code targeted for architecture compile_52
GPU: Ptx version is 52
GPU: maxThreadsPerBlock = 640
GPU: numRegsPerThread = 93 sharedMemPerBlock = 0 bytes
Computing 832 Step 1 took 8ms of CPU time / 1995ms of GPU time
Sun 5 Sep 19:44:44 BST 2021
[/code]

So CGBN is about 5.6 times faster for (2^499-1)/20959 (20427 ms vs 3644 ms of GPU time) and about 2.3 times faster for 2^997-1 (4552 ms vs 1995 ms). But these are all small cases.
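The speedups can be read straight off the "Computing N Step 1 took ... ms of GPU time" lines. A small Python sketch that parses those lines and compares per-curve GPU time; the regular expression is based only on the log format shown above, not on any documented GMP-ECM output specification.

```python
import re

STEP1_RE = re.compile(
    r"Computing (\d+) Step 1 took (\d+)ms of CPU time / (\d+)ms of GPU time"
)


def step1_gpu(line):
    """Return (curves, gpu_ms) from a GMP-ECM 'Computing ... Step 1' line."""
    m = STEP1_RE.search(line)
    if m is None:
        raise ValueError("not a Step 1 timing line: " + line)
    return int(m.group(1)), int(m.group(3))


def speedup(old_line, new_line):
    """Ratio of per-curve GPU time: old GPU kernel vs CGBN kernel."""
    old_curves, old_ms = step1_gpu(old_line)
    new_curves, new_ms = step1_gpu(new_line)
    return (old_ms / old_curves) / (new_ms / new_curves)


# Timing lines copied from the benchmark above.
c146_old = "Computing 3584 Step 1 took 190ms of CPU time / 20427ms of GPU time"
c146_new = "Computing 3584 Step 1 took 30ms of CPU time / 3644ms of GPU time"
c301_old = "Computing 832 Step 1 took 188ms of CPU time / 4552ms of GPU time"
c301_new = "Computing 832 Step 1 took 8ms of CPU time / 1995ms of GPU time"

print(f"(2^499-1)/20959: {speedup(c146_old, c146_new):.1f}x")
print(f"2^997-1:         {speedup(c301_old, c301_new):.1f}x")
```

With these lines it prints 5.6x and 2.3x. Comparing per-curve time rather than raw totals keeps the ratio meaningful if the two runs use different curve counts.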

My overall throughput won't increase much, though, because my CPU can't do stage 2 as fast as the GPU now does stage 1. That's not your fault, and any speedup is nice. Thanks.
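Whether stage 2 keeps up is simple arithmetic: the GPU finishes a batch of stage 1 curves in some wall time, and the CPU cores must run stage 2 on all of those curves in the same time. A minimal Python sketch, assuming stage 2 parallelizes perfectly across cores; the stage 2 cost per curve is a hypothetical input you would have to measure, since the thread gives no stage 2 timings.

```python
import math


def cores_to_keep_up(gpu_batch_ms: float, curves: int, stage2_ms_per_curve: float) -> int:
    """Minimum CPU cores needed so stage 2 on one batch finishes no later than
    the GPU finishes stage 1 on the next batch.

    Assumes stage 2 scales perfectly across cores (one curve per core at a time).
    """
    total_stage2_ms = curves * stage2_ms_per_curve
    return math.ceil(total_stage2_ms / gpu_batch_ms)


# Measured above: CGBN ran stage 1 on 3584 curves in 3644 ms of GPU time.
# The 50 ms stage 2 cost per curve is a made-up placeholder, not a measurement.
print(cores_to_keep_up(3644, 3584, 50.0))
```

With the placeholder value it prints 50, which shows how quickly a fast GPU stage 1 can outrun a 4-core machine.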


Other lessons learnt:
autoreconf -si creates symlinks to missing files, while autoreconf -i copies them. Using -si saves space, but if you upgrade to a newer version of automake (or libtool) you can be left with dangling symlinks:
[code]
lrwxrwxrwx 1 chris users 32 Nov 12 2015 INSTALL -> /usr/share/automake-1.13/INSTALL
lrwxrwxrwx 1 chris users 35 Nov 12 2015 ltmain.sh -> /usr/share/libtool/config/ltmain.sh
[/code]
They needed updating to:
[code]
lrwxrwxrwx 1 chris users 32 Sep 4 19:20 INSTALL -> /usr/share/automake-1.15/INSTALL
lrwxrwxrwx 1 chris users 38 Sep 4 19:20 ltmain.sh -> /usr/share/libtool/build-aux/ltmain.sh
[/code]
Not a common issue, though.
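Dangling links like these can be found before they cause a confusing build failure. A minimal Python sketch using only the standard library; the function name is hypothetical, and it checks a single directory rather than recursing.

```python
import os
from typing import List


def dangling_symlinks(directory: str) -> List[str]:
    """Return sorted names of symlinks in `directory` whose targets are missing."""
    broken = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # islink() inspects the link itself; exists() follows it and returns
        # False when the target (e.g. an old automake's INSTALL) is gone.
        if os.path.islink(path) and not os.path.exists(path):
            broken.append(name)
    return sorted(broken)
```

Run from the gmp-ecm source directory as `dangling_symlinks(".")`; any names it returns are links that need updating or re-creating, as with INSTALL and ltmain.sh here.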


Some suggestions for the install process:
INSTALL-ecm should tell users to run autoreconf -i (or -si) before running ./configure, since ./configure is created by autoreconf -i.

./configure compiles and runs several small test programs. If one of those compiles fails, it should print a message saying the compile failed, not one saying it found different versions of the runtime library etc. If such a compile normally produces no output, letting any output it does produce go to the screen would be informative (e.g. when it can't find -lstdc++).

Chris

SethTro 2021-09-07 06:25

[QUOTE=chris2be8;587337]Success![/QUOTE]

I'm glad we finally got here!

A 2.2x speedup for the 1024-bit case is almost exactly what everyone else is seeing (except bsquared, maybe because of a newer card?).

You can often improve overall throughput by adjusting to 1.2*B1 and 1/2*B2 (and checking that the expected number of curves stays roughly the same). This especially helps if Stage 1 time < Stage 2 time / cores.
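The rule of thumb can be written down directly. This Python sketch assumes the times are per curve (GPU stage 1 vs CPU stage 2) and uses integer arithmetic for 1.2*B1; it does not compute the expected number of curves, which still has to be checked separately before adopting the new bounds. The function name is hypothetical.

```python
from typing import Tuple


def adjust_bounds(b1: int, b2: int, stage1_ms: float, stage2_ms: float,
                  cores: int) -> Tuple[int, int]:
    """Return (1.2*B1, B2/2) when stage 1 per curve is cheaper than stage 2
    per curve divided by the number of CPU cores; otherwise return the
    bounds unchanged. Times are per curve, in milliseconds (assumption)."""
    if stage1_ms < stage2_ms / cores:
        # 6/5 == 1.2, kept in integers to avoid float rounding of B1.
        return b1 * 6 // 5, b2 // 2
    return b1, b2
```

Shifting work from stage 2 to stage 1 only pays off while the GPU stage is the cheaper one, which is why the condition guards the adjustment.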

I'll reflect on your notes and see if I can improve the documentation / configure script.

chris2be8 2021-09-07 15:41

[QUOTE=SethTro;587429]
I'll reflect on your notes and see if I can improve the documentation / configure script.[/QUOTE]

How about updating INSTALL-ecm like this:
[code]
diff -u INSTALL-ecm INSTALL-ecm.new
--- INSTALL-ecm 2021-09-05 12:13:55.613439408 +0100
+++ INSTALL-ecm.new 2021-09-07 16:37:42.903291304 +0100
@@ -19,6 +19,7 @@

1) check your configuration with:

+ $ autoreconf -i
$ ./configure

The configure script accepts several options (see ./configure --help).
[/code]

That's a minimum change to get new users started.

WraithX 2021-09-07 18:08

[QUOTE=chris2be8;587449]How about updating INSTALL-ecm like this:
[code]
diff -u INSTALL-ecm INSTALL-ecm.new
--- INSTALL-ecm 2021-09-05 12:13:55.613439408 +0100
+++ INSTALL-ecm.new 2021-09-07 16:37:42.903291304 +0100
@@ -19,6 +19,7 @@

1) check your configuration with:

+ $ autoreconf -i
$ ./configure

The configure script accepts several options (see ./configure --help).
[/code]

That's a minimum change to get new users started.[/QUOTE]

That document describes what users should do when they have downloaded an official release. When building an official release, you do not need to run [C]autoreconf -i[/C]. You only need to run [C]autoreconf -i[/C] when you download a development version with git or svn. I don't think adding [C]autoreconf -i[/C] to this document is a good idea.

Looking at the various documents, I see that [C]README.dev[/C] has the advice of running [C]autoreconf -i[/C].

chris2be8 2021-09-08 15:30

How about having INSTALL-ecm tell users to run [c]autoreconf -i[/c] if they don't have a ./configure in the directory?

And if people get an official release, would the files that autoreconf -i would have created be correct for their OS etc.?
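The rule discussed here (official release tarballs ship ./configure; git or svn checkouts need autoreconf -i first) is easy to express as a check. A hypothetical Python sketch that only decides which commands to suggest; it runs nothing itself.

```python
import os
from typing import List


def bootstrap_steps(src_dir: str) -> List[str]:
    """Suggest build commands for an autotools source tree.

    Release tarballs already contain ./configure; git/svn checkouts do not
    and need `autoreconf -i` to generate it first.
    """
    steps = []
    if not os.path.isfile(os.path.join(src_dir, "configure")):
        steps.append("autoreconf -i")
    steps += ["./configure", "make"]
    return steps
```

This mirrors the wording proposed for INSTALL-ecm: the extra step appears only when ./configure is missing.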

EdH 2021-09-08 15:41

@Chris: Did you get your sm_30 card working or just the higher arch one?

chris2be8 2021-09-09 15:40

Just the higher arch one (sm_52). Sorry.

PS. Does CGBN increase the maximum size of number that can be handled? I'd try it myself, but I'm tied up catching up on ECM work I put off while getting ecm-cgbn working.


All times are UTC.