mersenneforum.org Faster GPU-ECM with CGBN

2021-09-01, 18:17   #45
chris2be8

Sep 2009

4200_8 Posts

gcc --version returns:
Code:
gcc (SUSE Linux) 4.8.5
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
zypper search gcc shows it as gcc48 and says gcc5 and gcc6 could also be installed.

I've installed clang as well:
Code:
clang --version
clang version 3.8.0 (tags/RELEASE_380/final 262553)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
but that gets a different error:
Code:
./configure --enable-gpu=50 --with-cuda=/usr/local/cuda --with-cuda-compiler=clang CC=clang
...
configure: Using nvcc compiler from /usr/local/cuda/bin
checking for compatibility between gcc and nvcc... no
configure: error: gcc version is not compatible with nvcc
I don't think my problems are due to openSUSE. So if someone who has ecm with CGBN working on any Linux distro could say what version of CUDA and what compiler version they have, I could probably get it working.

Last fiddled with by chris2be8 on 2021-09-01 at 18:19 Reason: Specify Linux
2021-09-01, 18:54   #46
SethTro

"Seth"
Apr 2019

554_8 Posts

Quote:
 Originally Posted by bsquared
I re-cloned the gpu_integration branch to capture the latest changes and went through the build process, with the following caveats:
- Specifying --with-gmp together with --with-cgbn-include doesn't work; I had to use the system default gmp (6.0.0).
- With compute 70 I still have to replace __any with __any_sync(__activemask() on line 10 of cuda_kernel_default.cu.
- Building with gcc I get this error in cgbn_stage1.cu:
Code:
cgbn_stage1.cu(654): error: initialization with "{...}" is not allowed for object of type "const std::vector>"
I suppose I need to build with g++ instead?
I rebased the branch to clean up the git history, so everyone will likely need to git fetch and git reset --hard origin/gpu_integration. Sorry for the churn, but we're still in development, and the history is much nicer to review now.

I fixed the vector initialization issue and have included your "__any_sync(__activemask()" fix in the repo (I forgot to credit you in the commit, but I'll try to do that the next time I rebase).

I'm not sure why --with-gmp doesn't work with --with-cgbn-include; if you have some sense of why, I'm happy to try to fix it.
If it's failing on "checking if CGBN is present..." maybe try adding more flags at acinclude.m4:617 [-I$cgbn_include$GMPLIB], e.g. "-I$with_gmp_include" and/or "-L$with_gmp_lib".
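To make that concrete, the edited check might look something like the sketch below. The variable names ($with_gmp_include, $with_gmp_lib) are assumptions based on configure's usual --with-gmp handling, not verified against the actual acinclude.m4:

```m4
dnl Hypothetical sketch of the CGBN presence check (near acinclude.m4:617),
dnl with GMP include/lib paths added so a non-system GMP can be found.
CPPFLAGS="$CPPFLAGS -I$cgbn_include -I$with_gmp_include"
LDFLAGS="$LDFLAGS -L$with_gmp_lib"
AC_MSG_CHECKING([if CGBN is present])
```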

2021-09-01, 19:00   #47
SethTro

"Seth"
Apr 2019

2^2·7·13 Posts

Quote:
 Originally Posted by EdH I've passed the frustration point with my systems. I was getting the same with my Ubuntu 20.04 with all the 10.x and 11.x CUDA versions (my card isn't supported by CUDA 11.x, anyway). I installed and made default several older gcc versions (8, 9, 10).* I gave up for now. * I'm curious about the gcc version number difference between yours and mine. The default Ubuntu 20.04 gcc is 9.3.0, my Debian Buster is 8.3.0, and the default for my Fedora 33 is 10.3.1. Is your version actually that old compared to mine?
I know that feeling and I really empathize. I'm building on the pile of kludge that is CUDA, and I wish I could make this easier.

Did you try with CC=gcc-9? I can also add some debug output to the configure log to show which CC it's using.

I personally use this to configure
Code:
./configure --enable-gpu=61 --with-cuda=/usr/local/cuda CC=gcc-9 --with-cgbn-include=/home/five/Projects/CGBN/include/cgbn
and my gcc / nvcc versions
Code:
$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0
If you tell me what compute / sm_arch your card is I can try building and sending you a binary.
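If you're not sure what to answer, something along these lines should print it (the compute_cap query field needs a reasonably recent driver; cap_to_sm is just a hypothetical helper that strips the dot, e.g. 6.1 becomes the 61 you'd pass to --enable-gpu):

```shell
#!/bin/sh
# Print the card's name and compute capability, if the driver is new
# enough for nvidia-smi to support the compute_cap query field.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
fi

# Hypothetical helper: turn a capability like "6.1" into the number
# configure wants for --enable-gpu (61).
cap_to_sm() {
    echo "$1" | tr -d '.'
}

cap_to_sm 6.1   # prints 61
```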

Last fiddled with by SethTro on 2021-09-01 at 19:03

2021-09-01, 19:25   #48
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

2^5×5^3 Posts

In my case, everything except ECM and Msieve seemed to be working, but I've uninstalled everything now, and I thought from a few posts ago that my arch 3.0 was perhaps too ancient, 3.5 being necessary. ATM, updates, etc. are also giving me errors, so I was going to step back for a bit. I've uninstalled all the CUDA, NVIDIA, etc. from the system.

In its latest iteration, although I had installed CUDA 10.2, nvcc and nvidia-smi claimed to be running CUDA 11, which does not support architecture 3.0. I'll try another installation some time soon and then see where it stalls. If I can't get ECM to build for GPU with my card, there is no point trying to add in cgbn, is there? Thanks!
2021-09-01, 19:26   #49
xilman
Bamboozled!

"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

10950_10 Posts

Quote:
 Originally Posted by EdH I've passed the frustration point with my systems. I was getting the same with my Ubuntu 20.04 with all the 10.x and 11.x CUDA versions (my card isn't supported by CUDA 11.x, anyway). I installed and made default several older gcc versions (8, 9, 10).* I gave up for now. * I'm curious about the gcc version number difference between yours and mine. The default Ubuntu 20.04 gcc is 9.3.0, my Debian Buster is 8.3.0, and the default for my Fedora 33 is 10.3.1. Is your version actually that old compared to mine?
Which is why I would love for someone to make a fully static Linux executable for a relatively low SM value.

OK, it would not be as fast as the latest and greatest, but at least it would be much faster than a CPU-only version.

I'd do it myself but haven't been able to compile with CUDA for far too long now.

2021-09-01, 20:22   #50
chris2be8

Sep 2009

100010000000_2 Posts

And I've been having "fun" with msieve's CUDA support. The version I had been running failed saying:
Code:
[sort_engine.cu, 95] sort engine: (CUDA error 78: a PTX JIT compilation failed)
(probably because it was compiled with an old version of CUDA). So I decided to install the latest version of msieve, revision 1043. Which also failed, with a message saying "file not found" but of course not saying *which* file it could not find.

After a lot of puzzling I found revision 1043 notes the card is compute architecture 5.2 and tries to load stage1_core_sm52.ptx. But the Makefile as shipped is only set up to build ptx files for sm20, sm30, sm35 and sm50. So you are out of luck with any other architecture. I hacked the Makefile, first to remove sm20, which CUDA 9.0 doesn't support, then to add sm52 once I realised that was missing. The Makefile probably should build ptx files for all of this list:
Code:
~/msieve.1043/trunk> strings msieve | grep ptx
stage1_core_sm20.ptx
stage1_core_sm30.ptx
stage1_core_sm35.ptx
stage1_core_sm50.ptx
stage1_core_sm52.ptx
stage1_core_sm61.ptx
stage1_core_sm70.ptx
stage1_core_sm75.ptx
stage1_core_sm86.ptx
If I hadn't known of the strings command I would have been stuck.
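A sketch of what that Makefile change could look like; the variable and rule names here are guesses for illustration, not msieve's actual Makefile:

```make
# Hypothetical sketch: build a PTX file for every architecture the msieve
# binary can request, instead of the four it ships with.
SM_VERSIONS := 30 35 50 52 61 70 75 86
PTX_FILES := $(foreach sm,$(SM_VERSIONS),stage1_core_sm$(sm).ptx)

# Pattern rule: compile the stage 1 kernel to PTX for one architecture.
stage1_core_sm%.ptx: stage1_core.cu
	nvcc -arch=sm_$* --ptx -o $@ $<
```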
2021-09-01, 20:26   #51
SethTro

"Seth"
Apr 2019

2^2×7×13 Posts

Quote:
 Originally Posted by bsquared
Anyway I can get past all of that and get a working binary, and the cpu usage is now much lower. But now the gpu portion appears to be about 15% slower?

Before:
Code:
Input number is 2^997-1 (301 digits)
Computing 5120 Step 1 took 75571ms of CPU time / 129206ms of GPU time
Throughput: 39.627 curves per second (on average 25.24ms per Step 1)
New clone:
Code:
Input number is 2^997-1 (301 digits)
Computing 5120 Step 1 took 643ms of CPU time / 149713ms of GPU time
Throughput: 34.199 curves per second (on average 29.24ms per Step 1)
Anyone else seeing this?
Can you try running with -v --gpucurves 1280 and --gpucurves 2560 (if you are having fun you can also try 640 and 1792)?
The new code should give you approximate timings quite quickly so no need to complete a full run.

I have seen 2x and 4x slowdowns when gpucurves is large. I may need to put in some code that searches for optimal throughput.
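Until that search code exists, here's a crude, hypothetical way to compare runs by hand: jot down each --gpucurves value with its ms/curve (from the -v output) as "curves ms" lines, then pick the minimum. The input format is my own convention, not anything ecm prints directly:

```shell
#!/bin/sh
# Pick the --gpucurves value with the lowest ms/curve from stdin lines of
# "<gpucurves> <ms_per_curve>", collected by hand from ecm -v runs.
best_curves() {
    sort -k2,2 -g | head -n 1 | cut -d' ' -f1
}

# Example with made-up timings:
printf '1280 31\n2560 21\n640 63\n1792 36\n' | best_curves   # prints 2560
```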

2021-09-01, 20:30   #52
SethTro

"Seth"
Apr 2019

2^2×7×13 Posts

Quote:
 Originally Posted by xilman Which is why I would love for someone to make a fully static Linux executable for a relatively low SM value. OK, it would not be as fast as the latest and greatest but at least it would be much faster than a purely cpu version. I'd do it myself but haven't been able to compile with CUDA for far too long now.
I don't know how static linking works, especially with respect to CUDA, but I compiled ecm with all supported SMs (including sm35 and sm70) using CUDA 11.2. Feel free to try it, but I wouldn't be too hopeful. It doesn't run in Colab and gives an error:
Code:
./ecm_cgbn_cuda11_2: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by ./ecm_cgbn_cuda11_2)

https://static.cloudygo.com/static/ecm_cgbn_cuda11_2

^ I pinky-promise this isn't a virus
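On the static-linking question: as far as I know a fully static CUDA binary isn't possible (the driver library has to stay dynamic), but nvcc already links the CUDA runtime statically by default (--cudart=static), and statically linking libgcc/libstdc++ removes some host-library version sensitivity. An untested build-configuration sketch:

```sh
# Untested sketch: statically link the GNU support libraries.
# glibc itself stays dynamic, so errors like `GLIBC_2.29' not found
# are best avoided by building on the oldest distro you can.
./configure --enable-gpu=35 --with-cuda=/usr/local/cuda \
    LDFLAGS="-static-libgcc -static-libstdc++"
```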

Last fiddled with by SethTro on 2021-09-01 at 20:35

2021-09-01, 21:10   #53
frmky

Jul 2003
So Cal

2^2·547 Posts

cudacommon.h is missing from the git repository.
2021-09-01, 21:11   #54
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT/BST)

3^4·7^3 Posts

Quote:
 Originally Posted by chris2be8
gcc --version returns:
Code:
gcc (SUSE Linux) 4.8.5
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
zypper search gcc shows it as gcc48 and says gcc5 and gcc6 could also be installed.

My guess is that your gcc version is too old. I would try the most recent version you can get your hands on; the easiest way may be to update your OS to a release that isn't end-of-life.

Last fiddled with by henryzz on 2021-09-01 at 21:11

2021-09-01, 21:20   #55
bsquared

"Ben"
Feb 2007

110111111010_2 Posts

Quote:
 Originally Posted by SethTro Can you try running with -v --gpucurves 1280 and --gpucurves 2560 (if you are having fun you can also try 640 and 1792)? The new code should give you approximate timings quite quickly so no need to complete a full run. I have seen 2x and 4x slowdowns when gpucurves is large. I may need to put in some code that searches for optimal throughput.
1280: ~31 ms/curve
2560: ~21 ms/curve
640: ~63 ms/curve
1792: ~36 ms/curve

So we have a winner! --gpucurves 2560 beats all the others, and anything the old build could do as well (the best on the old build was 5120 at ~25 ms/curve).

With the smaller kernel (running (2^499-1) / 20959), --gpucurves 5120 is fastest at about 6 ms/curve on both new and old builds.

