mersenneforum.org  

2021-09-04, 17:08   #67
chris2be8
Sep 2009
2×7×157 Posts

Thanks, but I've already tried that:
Code:
4core:/etc/modprobe.d # cat 60-blacklist.nouveau.conf
blacklist nouveau
options nouveau modeset=0
And it is in the current initramfs:
Code:
4core:/etc/modprobe.d # lsinitrd -f /etc/modprobe.d/60-blacklist.nouveau.conf 
blacklist nouveau
options nouveau modeset=0
lsmod doesn't show any nvidia kernel modules:
Code:
4core:/etc/modprobe.d # lsmod | grep -i nvidia
4core:/etc/modprobe.d #
On my system where CUDA (but not cgbn) works:
Code:
root@sirius:~# lsmod | grep nvidia
nvidia_uvm            876544  0
nvidia_drm             49152  5
nvidia_modeset       1122304  14 nvidia_drm
nvidia              19517440  682 nvidia_uvm,nvidia_modeset
drm_kms_helper        180224  1 nvidia_drm
drm                   483328  8 drm_kms_helper,nvidia_drm
ipmi_msghandler       102400  2 ipmi_devintf,nvidia

2021-09-04, 17:20   #68
paulunderwood
Sep 2002
Database er0rr
5×787 Posts

Did you install through Yast or a direct download from nVidia?

2021-09-04, 17:26   #69
chris2be8
Sep 2009
2×7×157 Posts

I've already tried that:
Code:
4core:/etc/modprobe.d # cat 60-blacklist.nouveau.conf
blacklist nouveau
options nouveau modeset=0
And it is in initrd:
Code:
4core:/etc/modprobe.d # lsinitrd -f /etc/modprobe.d/60-blacklist.nouveau.conf 
blacklist nouveau
options nouveau modeset=0
Digging a bit further I don't think the nvidia kernel modules are correctly installed:
Code:
4core:/lib/modules # find . -name 'nvidia*'
./4.12.14-lp150.12.82-default/updates/nvidia-uvm.ko
./4.12.14-lp150.12.82-default/updates/nvidia-modeset.ko
./4.12.14-lp150.12.82-default/updates/nvidia.ko
./4.12.14-lp150.12.82-default/updates/nvidia-drm.ko
./5.3.18-57-default/weak-updates/updates/nvidia-uvm.ko
./5.3.18-57-default/weak-updates/updates/nvidia-modeset.ko
./5.3.18-57-default/weak-updates/updates/nvidia.ko
./5.3.18-57-default/weak-updates/updates/nvidia-drm.ko
./5.3.18-57-default/kernel/drivers/net/ethernet/nvidia
./5.3.18-57-preempt/kernel/drivers/net/ethernet/nvidia
./5.3.18-59.19-preempt/kernel/drivers/net/ethernet/nvidia
./5.3.18-59.19-default/weak-updates/updates/nvidia-uvm.ko
./5.3.18-59.19-default/weak-updates/updates/nvidia-modeset.ko
./5.3.18-59.19-default/weak-updates/updates/nvidia.ko
./5.3.18-59.19-default/weak-updates/updates/nvidia-drm.ko
./5.3.18-59.19-default/kernel/drivers/net/ethernet/nvidia

4core:/lib/modules # uname -r
5.3.18-59.19-preempt
So the kernel I'm running won't find them: it looks under 5.3.18-59.19-preempt, but they are installed only under 5.3.18-59.19-default (next question: how to fix this cleanly). But at least I think I know where I'm going now.
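A minimal sketch of that check, assuming the modules live under /lib/modules/<release> as in the listing above. The helper name has_nvidia_module is made up for illustration. It only reports whether an nvidia.ko file exists for a given kernel release; it does not repair anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): succeed if an nvidia.ko
# file exists anywhere under <root>/<kernel release>.
has_nvidia_module() {
    release=$1
    root=${2:-/lib/modules}
    # The trailing * also matches compressed modules such as nvidia.ko.xz.
    find "$root/$release" -name 'nvidia.ko*' 2>/dev/null | grep -q .
}

if has_nvidia_module "$(uname -r)"; then
    echo "nvidia.ko found for $(uname -r)"
else
    echo "no nvidia.ko for $(uname -r)"
fi
```

Run against the listing above, this would report a module for 5.3.18-59.19-default and none for 5.3.18-59.19-preempt.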

2021-09-04, 17:28   #70
chris2be8
Sep 2009
4226₈ Posts

Quote:
Originally Posted by paulunderwood
Did you install through Yast or a direct download from nVidia?
Via zypper on the command line, following the instructions on Nvidia's web site: https://developer.nvidia.com/cuda-downloads

2021-09-04, 18:04   #71
EdH
"Ed Hall"
Dec 2009
Adirondack Mtns
2⁴·257 Posts

Some of the instructions I saw in the past had a separate step, almost hidden, that was required to install the driver. Is it possible there is a driver install step missing in your procedure?

For my Ubuntu repository install of 10.2, it automatically installs the 470 driver, no matter what I have beforehand.

Is there an equivalent to this Ubuntu command?
Code:
sudo ubuntu-drivers devices
WARNING:root:_pkg_get_support nvidia-driver-390: package has invalid Support Legacyheader, cannot determine support level
== /sys/devices/pci0000:00/0000:00:03.0/0000:01:00.0 ==
modalias : pci:v000010DEd00000FFDsv0000103Csd00000967bc03sc00i00
vendor   : NVIDIA Corporation
model    : GK107 [NVS 510]
driver   : nvidia-driver-450-server - distro non-free
driver   : nvidia-driver-450 - third-party non-free
driver   : nvidia-driver-460-server - distro non-free
driver   : nvidia-driver-455 - third-party non-free
driver   : nvidia-driver-418-server - distro non-free
driver   : nvidia-340 - distro non-free
driver   : nvidia-driver-465 - third-party non-free
driver   : nvidia-driver-390 - distro non-free
driver   : nvidia-driver-470 - third-party non-free recommended
driver   : nvidia-driver-418 - third-party non-free
driver   : nvidia-driver-410 - third-party non-free
driver   : nvidia-driver-470-server - distro non-free
driver   : nvidia-driver-440 - third-party non-free
driver   : nvidia-driver-460 - third-party non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
Would such be of any help?

2021-09-04, 20:10   #72
chris2be8
Sep 2009
100010010110₂ Posts

After rebooting using the 5.3.18-59.19-default kernel the nvidia drivers are picked up:
Code:
4core:~ # lspci -v -s 01:00
01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: eVga.com. Corp. Device 3978
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Memory at f0000000 (64-bit, prefetchable) [size=32M]
	I/O ports at e000 [size=128]
	[virtual] Expansion ROM at f7000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Legacy Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [258] L1 PM Substates
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] #19
	Kernel driver in use: nvidia
	Kernel modules: nouveau, nvidia_drm, nvidia
I'll need to fix that but it can wait for now.

Then I started testing things ...

msieve works OK:
Code:
Sat Sep  4 19:10:51 2021  Msieve v. 1.54 (SVN 1043)
Sat Sep  4 19:10:51 2021  random seeds: 6e515738 cae1a347
Sat Sep  4 19:10:51 2021  factoring 1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139 (100 digits)
Sat Sep  4 19:10:51 2021  no P-1/P+1/ECM available, skipping
Sat Sep  4 19:10:51 2021  commencing number field sieve (100-digit input)
Sat Sep  4 19:10:51 2021  commencing number field sieve polynomial selection
Sat Sep  4 19:10:51 2021  polynomial degree: 4
Sat Sep  4 19:10:51 2021  max stage 1 norm: 1.16e+17
Sat Sep  4 19:10:51 2021  max stage 2 norm: 8.33e+14
Sat Sep  4 19:10:51 2021  min E-value: 9.89e-09
Sat Sep  4 19:10:51 2021  poly select deadline: 54
Sat Sep  4 19:10:51 2021  time limit set to 0.01 CPU-hours
Sat Sep  4 19:10:51 2021  expecting poly E from 1.49e-08 to > 1.71e-08
Sat Sep  4 19:10:51 2021  searching leading coefficients from 10000 to 1000000
Sat Sep  4 19:10:52 2021  using GPU 0 (NVIDIA GeForce GTX 970)
Sat Sep  4 19:10:52 2021  selected card has CUDA arch 5.2
Sat Sep  4 19:11:19 2021  polynomial selection complete
Sat Sep  4 19:11:19 2021  elapsed time 00:00:28
But I've been having fun with ecm.

The problem with conftest turned out to be:
Code:
chris@4core:~> gcc-9 -o conftest -I/usr/local/cuda/include -g -O2 -I/usr/local/cuda/include  -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64   conftest.c -lcudart -lstdc++ -lcuda -lrt -lm -lm -lm -lm -lm
/usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
So changing ./configure line 15498 from CUDALIB="-lcudart -lstdc++" to CUDALIB="-lcudart" made it work OK.
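A sketch of the same edit done by matching the text instead of the line number, since the line number can differ between gmp-ecm revisions. The function name drop_stdcxx_from_cudalib is hypothetical, and this is only a workaround: if the libstdc++ development files for gcc-9 were installed, the original line would probably link as is.

```shell
#!/bin/sh
# Hypothetical helper: drop -lstdc++ from the CUDALIB assignment in a
# configure script, matching the text rather than a line number.
drop_stdcxx_from_cudalib() {
    file=$1
    # Keep a copy of the untouched script next to it.
    cp -p "$file" "$file.orig" || return 1
    # Redirecting into the existing file keeps its executable bit.
    sed 's/CUDALIB="-lcudart -lstdc++"/CUDALIB="-lcudart"/' "$file.orig" > "$file"
}

# Usage (run in the gmp-ecm source directory):
#   drop_stdcxx_from_cudalib ./configure
```

The original script is kept as configure.orig so the change is easy to undo.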

I then got a lot of errors like this:
Code:
Instruction 'vote' without '.sync' is not supported on .target sm_70 and higher from PTX ISA version 6.4
So I edited the Makefile to build only for sm_52, since that's all I need.
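For reference, the msieve log above reports CUDA arch 5.2 for the GTX 970, and the nvcc command line in the make output below uses arch=compute_52,code=sm_52. A small sketch of that mapping, with helper names (arch_to_sm, gencode_flag) invented here for illustration:

```shell
#!/bin/sh
# Hypothetical helpers: turn a compute capability such as "5.2" into the
# names nvcc expects.
arch_to_sm() {
    printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

gencode_flag() {
    cc=$(printf '%s' "$1" | tr -d '.')
    printf '%s\n' "--generate-code arch=compute_${cc},code=sm_${cc}"
}

arch_to_sm 5.2      # prints sm_52
gencode_flag 5.2    # prints --generate-code arch=compute_52,code=sm_52
```

Only the string conversion is shown; where the flag goes in the Makefile depends on the gmp-ecm revision.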

But trying to build CGBN support I get:
Code:
chris@4core:~/ecm-cgbn/gmp-ecm> make
make  all-recursive
make[1]: Entering directory '/home/chris/ecm-cgbn/gmp-ecm'
Making all in x86_64
make[2]: Entering directory '/home/chris/ecm-cgbn/gmp-ecm/x86_64'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/chris/ecm-cgbn/gmp-ecm/x86_64'
make[2]: Entering directory '/home/chris/ecm-cgbn/gmp-ecm'
/bin/sh ./libtool --tag=CC --mode=compile /usr/local/cuda/bin/nvcc --compile -I/home/chris/CGBN/include/cgbn -lgmp -I/usr/local/cuda/include  -DECM_GPU_CURVES_BY_BLOCK=32  --generate-code arch=compute_52,code=sm_52 --ptxas-options=-v --compiler-options -fno-strict-aliasing -O2 --compiler-options -fPIC -I/usr/local/cuda/include  -DWITH_GPU -o cgbn_stage1.lo cgbn_stage1.cu -static
libtool: compile:  /usr/local/cuda/bin/nvcc --compile -I/home/chris/CGBN/include/cgbn -lgmp -I/usr/local/cuda/include -DECM_GPU_CURVES_BY_BLOCK=32 --generate-code arch=compute_52,code=sm_52 --ptxas-options=-v --compiler-options -fno-strict-aliasing -O2 --compiler-options -fPIC -I/usr/local/cuda/include -DWITH_GPU cgbn_stage1.cu -o cgbn_stage1.o
cgbn_stage1.cu(435): error: identifier "cgbn_swap" is undefined
          detected during instantiation of "void kernel_double_add<params>(cgbn_error_report_t *, uint32_t, uint32_t, uint32_t, char *, uint32_t *, uint32_t, uint32_t, uint32_t) [with params=cgbn_params_t<4U, 512U>]" 
(757): here

cgbn_stage1.cu(442): error: identifier "cgbn_swap" is undefined
          detected during instantiation of "void kernel_double_add<params>(cgbn_error_report_t *, uint32_t, uint32_t, uint32_t, char *, uint32_t *, uint32_t, uint32_t, uint32_t) [with params=cgbn_params_t<4U, 512U>]" 
(757): here

cgbn_stage1.cu(435): error: identifier "cgbn_swap" is undefined
          detected during instantiation of "void kernel_double_add<params>(cgbn_error_report_t *, uint32_t, uint32_t, uint32_t, char *, uint32_t *, uint32_t, uint32_t, uint32_t) [with params=cgbn_params_t<8U, 1024U>]" 
(760): here

cgbn_stage1.cu(442): error: identifier "cgbn_swap" is undefined
          detected during instantiation of "void kernel_double_add<params>(cgbn_error_report_t *, uint32_t, uint32_t, uint32_t, char *, uint32_t *, uint32_t, uint32_t, uint32_t) [with params=cgbn_params_t<8U, 1024U>]" 
(760): here

4 errors detected in the compilation of "cgbn_stage1.cu".
make[2]: *** [Makefile:2571: cgbn_stage1.lo] Error 1
make[2]: Leaving directory '/home/chris/ecm-cgbn/gmp-ecm'
make[1]: *** [Makefile:1903: all-recursive] Error 1
make[1]: Leaving directory '/home/chris/ecm-cgbn/gmp-ecm'
make: *** [Makefile:783: all] Error 2
This is after several attempts to run make, so hopefully only the relevant messages.

But I've got an older version of ecm working on the GPU (at last!), so I'll leave it for now.

2021-09-04, 22:01   #73
frmky
Jul 2003
So Cal
2⁴×139 Posts

Quote:
Originally Posted by SethTro
I halved compile time by adding cgbn_swap and avoiding inlining double_add_v2 twice.
Does it affect the runtime? I don't care much about the compile time. Just compile a few small kernels for testing, and once it's stable include a good coverage of kernels and just let it compile overnight if necessary. In my current build I included all of
Code:
  typedef cgbn_params_t<4, 256>   cgbn_params_256;
  typedef cgbn_params_t<4, 512>   cgbn_params_512;
  typedef cgbn_params_t<8, 768>   cgbn_params_768;
  typedef cgbn_params_t<8, 1024>  cgbn_params_1024;
  typedef cgbn_params_t<8, 1536>  cgbn_params_1536;
  typedef cgbn_params_t<8, 2048>  cgbn_params_2048;
  typedef cgbn_params_t<16, 3072> cgbn_params_3072;
  typedef cgbn_params_t<16, 4096> cgbn_params_4096;
  typedef cgbn_params_t<16, 5120> cgbn_params_5120;
  typedef cgbn_params_t<16, 6144> cgbn_params_6144;
  typedef cgbn_params_t<16, 7168> cgbn_params_7168;
  typedef cgbn_params_t<16, 8192> cgbn_params_8192;
  typedef cgbn_params_t<32, 10240> cgbn_params_10240;
  typedef cgbn_params_t<32, 12288> cgbn_params_12288;
  typedef cgbn_params_t<32, 14336> cgbn_params_14336;
  typedef cgbn_params_t<32, 16384> cgbn_params_16384;
  typedef cgbn_params_t<32, 18432> cgbn_params_18432;
  typedef cgbn_params_t<32, 20480> cgbn_params_20480;
  typedef cgbn_params_t<32, 22528> cgbn_params_22528;
  typedef cgbn_params_t<32, 24576> cgbn_params_24576;
  typedef cgbn_params_t<32, 28672> cgbn_params_28672;
  typedef cgbn_params_t<32, 32768> cgbn_params_32768;
and it took a little over an hour to compile for sm_70.

2021-09-05, 02:24   #74
paulunderwood
Sep 2002
Database er0rr
5·787 Posts

Quote:
Originally Posted by chris2be8
So changing ./configure line 15498 from CUDALIB="-lcudart -lstdc++" to CUDALIB="-lcudart" made it work OK.
Use YaST to search for the libstdc++ development package and install it (and its dependencies), then link with -lstdc++.

Last fiddled with by paulunderwood on 2021-09-05 at 02:26

2021-09-05, 03:28   #75
SethTro
"Seth"
Apr 2019
3²×43 Posts

Quote:
Originally Posted by chris2be8
This is after several attempts to run make, so hopefully only the relevant messages.

But I've got an older version of ecm working on the GPU (at last!), so I'll leave it for now.
This is an easy fix, you are on the home stretch!

I committed a change that depends on https://github.com/NVlabs/CGBN/pull/17 being accepted. I've committed another change reverting that to 3 cgbn_set's for now. After you `git pull` everything should build!

Alternatively, you can replace your CGBN directory with this one: `git clone -b cgbn_swap git@github.com:sethtroisi/CGBN.git`
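A quick way to tell which CGBN checkout a build will see is to look for cgbn_swap in its headers. This is only a sketch: the function name has_cgbn_swap and the CGBN_INCLUDE variable are made up here, and the default path is taken from the -I option in the make output quoted earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed if any file under the given include
# directory mentions cgbn_swap.
has_cgbn_swap() {
    grep -rq 'cgbn_swap' "$1" 2>/dev/null
}

# CGBN_INCLUDE is an assumed variable name; point it at your checkout.
dir=${CGBN_INCLUDE:-$HOME/CGBN/include}
if has_cgbn_swap "$dir"; then
    echo "cgbn_swap is available in $dir"
else
    echo "cgbn_swap not found in $dir"
fi
```

If it reports "not found", the `cgbn_swap is undefined` errors are expected until either the gmp-ecm revert is pulled or the CGBN directory is replaced.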

2021-09-05, 03:40   #76
SethTro
"Seth"
Apr 2019
3²×43 Posts

Quote:
Originally Posted by frmky
Does it affect the runtime? I don't care much about the compile time. Just compile a few small kernels for testing, and once it's stable include a good coverage of kernels and just let it compile overnight if necessary. In my current build I included all of
Code:
  typedef cgbn_params_t<4, 256>   cgbn_params_256;
  typedef cgbn_params_t<4, 512>   cgbn_params_512;
  typedef cgbn_params_t<8, 768>   cgbn_params_768;
  typedef cgbn_params_t<8, 1024>  cgbn_params_1024;
.........
  typedef cgbn_params_t<32, 32768> cgbn_params_32768;
and it took a little over an hour to compile for sm_70.
It doesn't reduce runtime, but it does make it faster for me to test things, and it slightly reduces register pressure.

2021-09-05, 05:35   #77
chris2be8
Sep 2009
2·7·157 Posts

Quote:
Originally Posted by SethTro
Alternatively, you can replace your CGBN directory with this one: `git clone -b cgbn_swap git@github.com:sethtroisi/CGBN.git`
That fails:
Code:
chris@4core:~> git clone -b cgbn_swap git@github.com:sethtroisi/CGBN.git
Cloning into 'CGBN'...
The authenticity of host 'github.com (140.82.121.4)' can't be established.
RSA key fingerprint is SHA256:nThbg6kXUpJWGl7E1IGOCspRomTxdCARLviKw6E5SY8.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'github.com,140.82.121.4' (RSA) to the list of known hosts.
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

And 'git pull' does nothing:
Code:
chris@4core:~/CGBN> git pull
Already up to date.
Unless I'm not using it correctly.
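The "Permission denied (publickey)" message comes from the SSH form of the URL (git@github.com:...), which needs an SSH key registered with GitHub. A public repository can also be cloned over HTTPS without a key. The sketch below rewrites the URL; the function name ssh_to_https is invented for illustration. As for `git pull` reporting "Already up to date" in ~/CGBN: presumably the reverting commit mentioned above is in the gmp-ecm checkout rather than in CGBN, but that is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: rewrite an scp-style git URL (git@host:path) as
# the equivalent HTTPS URL; other URLs pass through unchanged.
ssh_to_https() {
    printf '%s\n' "$1" | sed 's#^git@\([^:]*\):#https://\1/#'
}

ssh_to_https git@github.com:sethtroisi/CGBN.git
# Usage:
#   git clone -b cgbn_swap "$(ssh_to_https git@github.com:sethtroisi/CGBN.git)"
```

The first call prints https://github.com/sethtroisi/CGBN.git, which can be passed to `git clone -b cgbn_swap` directly.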

Last fiddled with by chris2be8 on 2021-09-05 at 05:40 Reason: Add note about git pull

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.