Linux install of CUDA toolkit for GT 430 card, maybe…
Our dear forum owner Xyzzy is kindly loaning me his old low-end (but still with extant DP-float support) GT 430 card, which should allow me to do most of the GPU code-dev I intend to do in the coming 6-12 months. He has warned me, however, that the tools-install is a real adventure, and to use the local-expertise around here as a resource. Here are some nuggets from our e-mail exchange:
[quote]Me: How much setup will getting my system to talk to the card need, in your estimation?

[i]Xyzzy: you will need help from the forum it should be as simple as "apt-get install nvidia-cuda-toolkit" but it isn't for a ton of reasons [url]https://packages.debian.org/wheezy/nvidia-cuda-toolkit[/url][/i]

Me: Will probably RTFM and do the physical install, followed by the "in an ideal world" build-tools install cmd you mentioned, then call to tell you what error messages I got.

[i]Xyzzy: well, for starters, the software is not in your *current* repository. because we have that set up for (i think) only open source software so you will need to add the repository OR use the binary "blob" installer i would ask on the forum about which to use (give them the stats of your environment) also the card only has dvi outputs (i think) so (unless you have dvi cables for your screen) you will need to do a "headless" install[/i]

Me: Does the card come with a linux-compatible tools-install CD?

[i]Xyzzy: of course not (you wouldn't want that anyways) i will be upfront with you the cuda toolkit install is a bitch i have done it several times and it is one of the hardest software installs i have ever done you might consider booting from a live usb stick and using a persistent file with that to test installs[/i][/quote]

Host system hardware is an ATX-cased barebones Haswell (MSI Z87 mobo) running Debian. (Mike, what other system details should I post here?) To the experts: So, just how much fun am I in for? Would downloading the cuda toolkit from the above debian.org link be a good thing to do while I await USPS delivery of the card? Re. the DVI connectors: I have a small LCD monitor I can unbox if needed, but not sure if it has such outputs. (But can easily check, if that helps.)
My normal way of talking with the Haswell box is via direct ethernet cable to my macbook - the minimalist in me would prefer to continue to do that with the GPU-augmented ATX-cased system, if possible. |
I run two GT430s, on Ubuntu 14.04. I simply enable the third-party repository and install the nvidia-331, nvidia-cuda-dev and nvidia-cuda-toolkit packages. That should give you CUDA 6 in the driver and CUDA 5.5 for development. Compiling mfaktc then works:
[code]sudo apt-get install build-essential -y
cd ~
wget http://www.mersenneforum.org/mfaktc/mfaktc-0.20/mfaktc-0.20.tar.gz
tar xf mfaktc-0.20.tar.gz && cd mfaktc-0.20/src && make -j[/code] You'll then have the mfaktc.exe binary in the mfaktc-0.20 directory. For automatic fetching of work from GPU72, I recommend teknohog's mfloop.py script (which I've also contributed to) at [URL]https://github.com/teknohog/primetools[/URL]. Installing that is as simple as [code]sudo apt-get install git python -y
git clone https://github.com/teknohog/primetools[/code] Run the script with the --help switch to look at the options. It can do almost everything. I'm working on more features at [URL]https://github.com/MarkRose/primetools[/URL]. I use a crontab like this: [Code]@reboot cd $HOME/mfaktc-0.20 && screen -S mfaktc -d -m $HOME/mfaktc-0.20/mfaktc.exe -d 0
22 * * * * $HOME/primetools/mfloop.py -u lolomg -p drowssap -U lolomg -P drowssap -t 0 -w $HOME/mfaktc-0.20 -e 74 -g 60 -o let_gpu72_decide[/Code] Note that running mfaktc is going to make X feel sluggish. There is no workaround to my knowledge, other than using a different card for driving the display, which involves work. If you want to stick with Debian, I've found in the past that the NVidia packages don't add the CUDA library directory to the shared-object search path. Assuming CUDA 6, run something like: [code]echo /usr/local/cuda-6.0/lib64 | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig[/code] Then you'll need to edit mfaktc-0.20/src/Makefile, changing line 12 from NVCC = nvcc to NVCC = /usr/local/cuda-6.0/bin/nvcc, before compiling. |
FWIW, getting CUDA to run on my Gentoo systems was very nearly trivial. The downside, of course, is that Gentoo doesn't install very much by default and so you need to know what you want. For instance, Ernst's accunt has been re-vived on my system; he discovered gdb was missing so an "emerge gdb" is chuntering away as I type.
|
[QUOTE=xilman;377102] [snip] For instance, Ernst's accunt [snip][/QUOTE]
You really should be more careful of Ernst's feelings. :max: |
Look at the "ECM for CUDA GPUs in latest GMP-ECM?" thread [url]http://mersenneforum.org/showthread.php?t=16480&page=22[/url], posts 209-234, for details of the fun I had installing it. The system I installed it on runs Linux Mint 15, which is descended from Debian.
Chris |
I used CUDA on OpenSUSE for a couple of years. Initial installation was very easy. However, after an OpenSUSE update to the next major version, all hell broke loose and X wouldn't start, with some rather idiotic message (well, it actually would start, with a black screen and a tiny, meaningless error dialog in the corner). It took many hours to fix (partly because there is almost no guide anywhere on how to perform even the simplest tasks in a graphics-less system, from a tty). It was something very specific to SUSE; I don't remember now the full details and I really hope that you won't have any of these troubles.
[SIZE="1"]In fact, when after the next OS upgrade/NVIDIA driver update the whole story repeated, for quite a while I didn't use X; just six tty's. It was all the same to me at the time because I was running a few LA jobs (a few weeks each) that even benefited from having more available memory (because there was no X).[/SIZE] |
[QUOTE=kladner;377114]You really should be more careful of Ernst's feelings. :max:[/QUOTE]Oh weel. Spieling mistakes appen.
|
[QUOTE=Batalov;377131][snip] It was something very specific to SUSE, I don't remember now the full details and I really hope that you won't have any of these troubles. [snip][/QUOTE]Been there, done that. The solution after each kernel upgrade was to rip out the old driver module and reinstall it from the nvidia distkit. Worrying the first couple of times, tedious thereafter. Eventually NVidia solved the problem properly and I've not had any significant trouble with CUDA since. To be fair, I tend to stick with the RedHat family (CentOS, Fedora, etc.) or Gentoo, so there may be a lesson there for SuSE and/or Debian and/or Ubuntu fans. |
Yes, later there was some 1-click-update service at OpenSUSE (that tries to do all magic spells for you and actually does them well). But the first time (when OpenSUSE was still 12.1-pre, I think, upping from 11.x) was awful. [URL="http://en.opensuse.org/SDB:NVIDIA_the_hard_way"]The hard way[/URL].
|
Thanks for all the replies, now I need to spend some time triaging the various suggestions into some kind of "try this, this and this first" priority order.
[QUOTE=Mark Rose;377096]sudo apt-get install build-essential -y[/QUOTE] Is that for the CUDA toolkit or mfaktc? If for CUDA, will that work across my hardcoded LAN setup? [My Haswell only talks to my macbook - no wireless, as I said this is a barebones system]. [QUOTE=kladner;377114]You really should be more careful of Ernst's feelings. :max:[/QUOTE] Yes, one must vagilently guard against such embarrassing faux pas at all times. |
I'm on Linux Mint Debian Edition, with a GT430 at the moment. I never had much trouble installing CUDA. It's similar to installing the binary Nvidia driver, which you should do first, except that you don't have to shut down the window manager to install CUDA. For me I think it was as simple as running the BLOB, downloaded from Nvidia's website, with sudo.
If you need to run apt-get and don't have direct Internet access, maybe you need to [url=http://askubuntu.com/questions/7470/how-to-run-sudo-apt-get-update-through-proxy-in-commandline]set up a proxy[/url]? [url=http://squidman.net/squidman/]This appears to be a SQUID proxy[/url] for Mac OS X. |
[QUOTE=ewmayer;377141]Yes, one must vagilently guard against such embarrassing faux pas at all times.[/QUOTE]I've been unreliably informed that Ernst hasn't been feeling himself recently.
Just as well, really. Disgusting habit IMAO. |
[QUOTE=xilman;377135]Oh weel. Spieling mistakes appen.[/QUOTE]
I strongly assumed that a typo was in play. However, the word play took precedence. :smile: |
[QUOTE=kladner;377151]I strongly assumed that a typo was in play. However, the word play took precedence. :smile:[/QUOTE]
As ever when hanging out with a bunch of cunning linguists. |
And master debaters...
|
[QUOTE=ewmayer;377141]Is that for the CUDA toolkit or mfaktc? If for CUDA, will that work across my hardcoded LAN setup? [My Haswell only talks to my macbook - no wireless, as I said this is a barebones system].[/QUOTE]
You'll actually need it in either case. I'm sure Mac OS X can be made to act as a router. A proxy won't be enough because you still need to do things like resolve DNS. |
[QUOTE=Mark Rose;377211]A proxy won't be enough because you still need to do things like resolve DNS.[/QUOTE]Why? If everything's sent over HTTP, won't an HTTP proxy cause DNS resolving to happen on the proxy server?
|
[QUOTE=Ken_g6;377212]Why? If everything's sent over HTTP, won't an HTTP proxy cause DNS resolving to happen on the proxy server?[/QUOTE]
Depends on the type of proxy server. With an HTTP proxy server, you're correct. With a SOCKS proxy server, it depends. |
[QUOTE=Mark Rose;377211]You'll actually need it in either case. I'm sure Mac OS X can do something with acting as a router. A proxy won't be enough because you still need to do things like resolve DNS.[/QUOTE]
I am currently attempting to use the OS X network sharing facility (along with reconfiguring the Haswell system network config from static IP to DHCP) to allow the macbook (which uses WiFi to connect to my flatmate's router) to act as an internet portal for the Haswell. If I can get that working it seems the best option. First the internet, only then will I switch focus to the CUDA tools install/setup. But now, it's dinner time. More tomorrow. |
OK, here's the latest: I replaced the previous static-ip/netmask/gateway/broadcast entries for 'eth0' in my Debian system's /etc/network/interfaces file with
[code]auto eth0
allow-hotplug eth0
iface eth0 inet dhcp[/code] followed by '/etc/init.d/networking restart', which gave a bunch of expected-looking network-restart info messages, and as expected resulted in my being frozen out of the terminal I was using to ssh to the Haswell from my macbook. Next I dug out the flat-panel monitor and keyboard I had last used when setting up the then-new Haswell system and am logged in now via that; at the same time I have the LAN cable plugged into the macbook, which has WiFi (and the sharing-enabled setup to the LAN) enabled. And ... looks good! 'ping [url]www.google.com[/url]' shows 0% packet loss. Next up ... CUDA toolkit (and any associated sw dependencies) install - may do some reading of nVidia 'how to' docs first, over the long July 4th weekend. |
[QUOTE=ewmayer;377338]
And ... looks good! 'ping [url]www.google.com[/url]' shows 0% packet loss. [/quote] Excellent! [quote] Next up ... CUDA toolkit (and any associated sw dependencies) install - may do some reading of nVidia 'how to' docs first, over the long July 4th weekend.[/QUOTE] The only part you really need to pay attention to is blacklisting the nouveau and nv kernel modules, if the installer doesn't do that for you automatically; they'll prevent the nvidia module from loading. Otherwise it should basically work out of the box. I might also suggest looking into dkms, which will automatically rebuild the nvidia modules when you install a new kernel. |
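For future readers: the blacklist described above is just a small modprobe config file. A minimal sketch (the filename is conventional, not mandated; DEST defaults to a scratch directory here so the sketch can be dry-run without root, while the real target is /etc/modprobe.d/):

```shell
# Sketch: blacklist the nouveau/nv kernel modules so the nvidia module
# can load. Real target directory is /etc/modprobe.d/ (needs root);
# DEST defaults to a temp dir so this can be dry-run safely.
DEST="${DEST:-$(mktemp -d)}"
cat > "$DEST/blacklist-nouveau.conf" <<'EOF'
blacklist nouveau
blacklist nv
options nouveau modeset=0
EOF
cat "$DEST/blacklist-nouveau.conf"
# After installing the file as root, rebuild the initramfs so the
# blacklist takes effect at boot:
#   sudo update-initramfs -u
```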
OK, with a BIOS tweak to force use of the onboard video on the mobo now holding the 430, the system recognizes the device, and the real fun (Mike's rough characterization of his past experiences with the CUDA tools install) can begin.
Since this is an older card, we agreed that the 5.5 version of the tools made sense - I grabbed the .run file for that [url=https://developer.nvidia.com/cuda-toolkit-55-archive]here[/url], specifically [url=http://developer.download.nvidia.com/compute/cuda/5_5/rel/installers/cuda_5.5.22_linux_64.run]this file[/url], which is shared across all Linux distros. Here is what happened when I executed the .run file as root last night: [i] root@derek:/home/ewmayer# ./cuda_5.5.22_linux_64.run -extract=/root/ Logging to /tmp/cuda_install_6441.log Extracting individual Driver, Toolkit and Samples installers to /root ... [/i] That finished 5-10 seconds later - suspiciously fast, given the 810 MB size of the .run file. The tmp-logfile contains the last "Extracting" line verbatim, nothing else. According to the post-install notes in the nVidia "getting started" PDF, there should now be an nvidia subdir in /dev, but I don't see that, nor anything that remotely looks cuda-related. Here is what ends up in /root afterward (leftmost file is older): [i] 70-persistent-net.rules cuda-linux64-rel-5.5.22-16488124.run cuda-samples-linux-5.5.22-16488124.run NVIDIA-Linux-x86_64-319.37.run [/i] Mike adds: [i] one way you will know that the install is working is there will be a disclaimer you have to accept if you didn't see that and accept it then something is wrong at some point it will ask you what driver to use (the one in the runfile is offered) i think you want that driver not sure what is going on [/i] Any clues/diagnostics-to-try from our local experts welcomed. |
p.s.: Maybe this just means I now need to run the .run files deposited in /root?
Yours truly, - cudaN00b |
Yes, or just invoking the .run file without the -extract option.
|
[QUOTE=ewmayer;378370][i]cuda-linux64-rel-5.5.22-16488124.run
cuda-samples-linux-5.5.22-16488124.run NVIDIA-Linux-x86_64-319.37.run[/i][/QUOTE] Did the first 2 of those; that is more along the lines of what I expected. Do I need the 3rd one for my GUI-less, command-line-only setup? |
[QUOTE=ewmayer;378380]Did the first 2 of those, that is more along the lines of what I expected. Do I need the 3rd one for my GUI-less command-line-only setup?[/QUOTE]
You don't need the samples. You do need the other two. |
[QUOTE=Mark Rose;378383]You don't need the samples. You do need the other two.[/QUOTE]
Thanks - but when I execute NVIDIA-Linux-x86_64-319.37.run, unlike the other 2, where all was strictly b&w text-only / command-line-only, now - this is via ssh from the Mac, mind you - I get a wild-colored "GUI-like" EULA with 2 red accept/decline boxes at top. But double-clicking "Accept" only highlights the text in the box - i.e. I need a way of forcing things back into cmd-line mode. |
[QUOTE=ewmayer;378384]Thanks- but when I execute NVIDIA-Linux-x86_64-319.37.run,
unlike the other 2 where all was strictly b&w text-only / command-line-only, now - this is via ssh from the Mac, mind you - I get a wild-colored "GUI like" EULA with 2 red accept / decline boxes at top. But double-clicking "Accept" only highlights the text in the box - i.e. I need a way of forcing things back into cmd-line mode.[/QUOTE] Thanks, Mike - Tab to toggle between options, Enter to confirm works - but now I get [i] The CC version check failed: The compiler used to compile the kernel (gcc 4.3) does not exactly match the current compiler (gcc 4.4). The Linux 2.6 kernel module loader rejects kernel modules built with a version of gcc that does not exactly match that of the compiler used to build the running kernel. If you know what you are doing and want to ignore the gcc version check, select "No" to continue installation. Otherwise, select "Yes" to abort installation, set the CC environment variable to the name of the compiler used to compile your kernel, and restart installation. Abort now? [/i] Safe to force continue? |
[QUOTE]Thanks, Mike - Tab to toggle between optionss, Enter to confirm works…[/QUOTE]We're just adding this in case this thread is ever useful in the future: The GUI-looking screen is just colored ASCII art and because you are using SSH the mouse works, but only to copy and paste stuff. The remote computer that you are logging into knows nothing about that mouse.
|
Also, try the -h option on the .run file.
IIRC, there was an option to accept the license agreement from the command line. |
Can anyone comment on the gcc 4.3 vs 4.4 issue?
Note that my Mlucas-builds-for-Haswell are pretty insensitive to the gcc version, so I'd be fine to downgrade to 4.3 if the CUDA 5.5 install really does have an issue with 4.4. |
I've never dealt with this problem, so this is just from what I've read:
It's not CUDA 5.5 that has problems with gcc 4.4; it's that the kernel module loader dislikes modules and kernels built with different gcc versions. Edit: It's not a problem to have multiple gcc versions installed. |
This shows that 4.4 is what the system "expects": [url]https://packages.debian.org/wheezy/gcc[/url]
|
Alright, I said 'no' to 'abort install?' (i.e. force-continue) ... that gave
[i] ERROR: Unable to find the kernel source tree for the currently running kernel. Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed. If you know the correct kernel source files are installed, you may specify the kernel source path with the '--kernel-source-path' command line option. [/i] which had just 'ok' as an option, which leads to [i] ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at [url]www.nvidia.com[/url].[/i] |
So you apparently don't have the kernel headers installed. Install what they suggest and try again.
|
Possibly, as root:
[FONT="Courier New"]apt-get install linux-headers-`uname -r`[/FONT] |
[QUOTE=Xyzzy;378398]Possibly, as root:
[FONT="Courier New"]apt-get install linux-headers-`uname -r`[/FONT][/QUOTE] The first thing I did [following the recipe described [url=http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/graham/cuda5-debian-wheezy/]here[/url]] was [i] apt-get install linux-headers-'uname -r' [/i] That gave "E: Unable to locate package linux-headers-uname -r" |
[QUOTE=ewmayer;378400]The first thing I did [following the recipe described [url=http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/graham/cuda5-debian-wheezy/]here[/url]] was
[i] apt-get install linux-headers-'uname -r' [/i] That gave "E: Unable to locate package linux-headers-uname -r"[/QUOTE] Aha - Mike's version has upper-left-to-lower-right-slanting `, whereas mine had standard single-quote ' - with the former it works. Should I retry the last .run execution now? |
Those are back quotes. It's been a long time since I've worked on a Debian-based system with the apt-get package manager, but the default
[CODE]apt-get install linux-headers[/CODE]will probably fetch the correct ones. Edit: Ah, I see you already figured it out. Yeah, go ahead. My guess is that it will build, but not load. But maybe I'm wrong. |
[QUOTE=owftheevil;378402]Yeah, go ahead. My guess is that it willl build, but not load. But maybe I'm wrong.[/QUOTE]
Kernel modules now build ... and I get this: [i] Install NVIDIA's 32-bit compatibility OpenGL libraries? [/i] Yes or no? |
I don't use them, but I'm not really sure what they are for.
|
[QUOTE=owftheevil;378405]I don't use them, but I'm not really sure what they are for.[/QUOTE]
I'll take that as a "sure, why not?" Next screen is [i] Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up. [/i] Is that relevant to my cmd-line-only setup? |
No, it's not relevant.
|
[QUOTE=owftheevil;378407]No its not relevant[/QUOTE]
Alright, we have success - at least of the final .run-file-based setup step: [i] Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version: 319.37) is now complete. Please update your XF86Config or xorg.conf file as appropriate; see the file /usr/share/doc/NVIDIA_GLX-1.0/README.txt for details. [/i] Offline 'til tomorrow - thanks for all the help, guys! |
[QUOTE]Aha - Mike's version has upper-left-to-lower-right-slanting `, whereas mine had standard single-quote ' - with the former it works.[/QUOTE][url]http://www.tldp.org/LDP/abs/html/commandsub.html[/url]
:max: |
OK, all 3 .run files mentioned above have now been run successfully; after reboot, 'nvcc -V' gives
[code]nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Wed_Jul_17_18:36:13_PDT_2013
Cuda compilation tools, release 5.5, V5.5.0[/code] An attempted nvcc compile of a couple of basic sourcefiles indicates all is not well, however: [code]ewmayer@derek:~/Mlucas/SRC$ nvcc -c -O3 -DUSE_GPU util.cu gpu_iface.cu
In file included from imul_macro.h:29,
                 from mi64.h:30,
                 from util.h:32,
                 from util.cu:23:
imul_macro0.h:347:4: error: #error unknown compiler for AMD64.
util.cu:1080:5: error: #error unsupported compiler type for x87![/code] Those preprocessor errors are due to the expected compiler-macro __CUDACC__ not being defined. So I tried using the list-predefines method described in post #4 in [url=http://www.mersenneforum.org/showthread.php?t=18668]this thread[/url]; that doesn't work for me, though: [code]ewmayer@derek:~/Mlucas/SRC$ strings nvcc | grep [-]D
strings: 'nvcc': No such file[/code] |
Looks like the CUDA headers aren't in the compiler's default header path.
Try running nvcc with -L/usr/local/cuda-5.5/lib64 |
[QUOTE=Mark Rose;378814]Looks like the CUDA headers aren't in the compiler's default header path.
Try running nvcc with -L/usr/local/cuda-5.5/lib64[/QUOTE] No joy - I tried putting the above path before and after the sourcefile names, and also tried just '.../lib32' on the thought that maybe nvcc was defaulting to 32-bit mode. The lib32 and lib64 dirs are definitely there: [code] ewmayer@derek:~/Mlucas/SRC$ l /usr/local/cuda-5.5/lib64 libcublas_device.a libcudart.so.5.5 libcufftw.so libcurand.so libnppc.so libnpps.so libcublas.so libcudart.so.5.5.22 libcufftw.so.5.5 libcurand.so.5.5 libnppc.so.5.5 libnpps.so.5.5 libcublas.so.5.5 libcudart_static.a libcufftw.so.5.5.22 libcurand.so.5.5.22 libnppc.so.5.5.22 libnpps.so.5.5.22 libcublas.so.5.5.22 libcufft.so libcuinj64.so libcusparse.so libnppi.so libnvToolsExt.so libcudadevrt.a libcufft.so.5.5 libcuinj64.so.5.5 libcusparse.so.5.5 libnppi.so.5.5 libnvToolsExt.so.1 libcudart.so libcufft.so.5.5.22 libcuinj64.so.5.5.22 libcusparse.so.5.5.22 libnppi.so.5.5.22 libnvToolsExt.so.1.0.0 [/code] But these are all link-time libs ... you say we need a path-to-headers, shouldn't that appear via -I[path] in the compile command, and point to a dir full of .h files? |
p.s.: Mike just e-mailed to ask if "nvidia-smi" returns anything yet.
Yes - see below. Also the GPU fan kicked in from that, is still going a few mins later - shouldn't that die down after a few seconds, since there is no process running? [code]ewmayer@derek:~/Mlucas/SRC$ nvidia-smi Tue Jul 22 11:16:35 2014 +------------------------------------------------------+ | NVIDIA-SMI 5.319.37 Driver Version: 319.37 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===============================+======================+======================| | 0 GeForce GT 430 Off | 0000:01:00.0 N/A | N/A | | 65% 32C N/A N/A / N/A | 3MB / 1023MB | N/A Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Compute processes: GPU Memory | | GPU PID Process name Usage | |=============================================================================| | 0 Not Supported | +-----------------------------------------------------------------------------+[/code] |
Updates:
o I couldn't find any way using nvidia-smi to shut the GPU fan back down after I did the above hardware diagnostics, so I simply waited for my ongoing Haswell Fermat-number run to hit its next savefile checkpoint and rebooted. Not a pretty solution, but it will serve for now (unless someone finds a better one). Perhaps I've been spoiled by the relative quietness of the Haswell fan - the case sits with side access panel removed (allowing easy access to the guts) on my desk, i.e. the CPU fan is literally 2 feet away from my head - but with all 4 cores blasting away, and the case fans nearest the CPU also running it's a quite tolerable noise level - nothing that interferes with me using a cellphone on the ear closest to the CPU. The GPU fan - and this is only an unloaded GT430! - was significantly louder than all 3 of the above fans together. But, on to config/compile issues. o Looks like the "__CUDACC__ undefined" issue was due to some incorrect (in the context of GPU builds) nesting of #if-spaghetti in one of the headers used by my util.c ('ln -s'-aliased to util.cu) file. Next issue: nvcc is giving me errors about basic C typedefs, e.g. [i] gpu_iface.cu:30:3: warning: #warning using nvcc [-Wcpp] gpu_iface.cu:32:4: warning: #warning device code trajectory [-Wcpp] gpu_iface.cu:34:5: warning: #warning compiling with double precision [-Wcpp] [b]types.h(354): error: expected a ";"[/b] 1 error detected in the compilation of "/tmp/tmpxft_00002ffa_00000000-6_gpu_iface.cpp1.ii". 
[/i] The error is for the first typedef in this snip, where I left-annotate with line numbers in the header file to make it easier to compare with the above nvcc error message: [code]352 #if __CUDA_ARCH__ > 120 353 #warning CUDA: compiling with double precision 354 typedef real double; 355 typedef double vec_dbl; 356 #else 357 ...[/code] The "typedef real to double" idea came to me via a CUDA forum earlier today, where it was done via #define - and indeed, changing [i] typedef real double; [/i] to [i] #define real double [/i] fixes that error - but now I get same kind of error for the next typedef on the next line. I don't want to change every typedef in my code to a #define -- this is after all standard C syntax! |
[QUOTE=ewmayer;379006][snip] I don't want to change every typedef in my code to a #define -- this is after all standard C syntax![/QUOTE] Idea based on the IOCCC: #define typedef #define Do so after any #include line ... May not work but worth a try :wink: |
[QUOTE=ewmayer;379006]
354 typedef real double; 355 typedef double vec_dbl; [/quote] I think you mean 'typedef double real;' for the first one (i.e. "create a type called real which is another name for double"). |
[QUOTE=xilman;379023]#define typedef #define[/QUOTE]
Same thought occurred to me, but no go, because typedefs are ;-terminated. [QUOTE=fivemack;379026]I think you mean 'typedef double real;' for the first one (IE 'create a type called 'real' which is another name for double')[/QUOTE] I was simply flailing around yesterday afternoon; the forum where I saw that had '#define real double', which briefly (and incorrectly) led me to surmise that maybe nvcc used an internal 'master' floating-point type called 'real'. Again, makes no sense except in a "fog of war" kind of way. So could someone please give me a straight answer to the question: Does nvcc support C-standard typedefs, or not? If so, why would it squawk about something as simple as my example above? If not, I will have no choice but to monkey with a bunch of header files and carve out a "special nvcc preprocessor section", but will be distinctly un-pleased by the need to do so. [b]Update:[/b] As most of the stuff in the above types.h file is unneeded for cuda compiles, I tried simply commenting out the include of that header in my gpu_iface.h file. The resulting compile gave just one missing typedef: [i] gpu_iface.h(74): error: identifier "int32" is undefined [/i] That is here in that header: [i] typedef struct { int32 num_gpu; gpu_info_t gpu_info[MAX_GPU]; } gpu_config_t; [/i] When I preceded that with the needed (i.e. copied over from types.h) [i] typedef int int32; [/i] it works just fine. So maybe the real issue is that nvcc - perhaps some aspect of its 2-step compilation? - doesn't like the kind of headers-including-headers nesting I use? |
If this is cribbed from cuda_xface.h in the Msieve source, the 'int32' is a typedef that Msieve makes up, it is in no way standard. You may be thinking of int32_t, which you get by including stdint.h, and which IIRC nvcc supports happily.
Incidentally, the CUDA runtime API now has a device query function that gives you absolutely all config information, something that didn't exist when I wrote my own crappy device query code. |
[QUOTE=jasonp;379094]If this is cribbed from cuda_xface.h in the Msieve source, the 'int32' is a typedef that Msieve makes up, it is in no way standard. You may be thinking of int32_t, which you get by including stdint.h, and which IIRC nvcc supports happily.[/QUOTE]
The gpu_iface code is indeed cribbed from yours, but I have long had a full set of unambiguous-bitlength integer typedefs in my types.h header. [QUOTE]Incidentally, the CUDA runtime API now has a device query function that gives you absolutely all config information, something that didn't exist when I wrote my own crappy device query code.[/QUOTE] Thanks - will likely use that at some point, but right now am using the custom iface code as a means of testing my cuda tools install. So, looking more closely at the sequence of typedefs in my types.h file: int32 is typedef'd right near the top, and nvcc had no problem with it, nor with the others - except when it gets to the 'typedef double vec_dbl'. Is vec_dbl perhaps an nvcc reserved word? [Note: I don't really need a vector-double type for cuda work, since that will use the C scalar-double code I have in place - but in order to get the SIMD and scalar-double code paths to play nice together, I find it useful to use a shared typedef (e.g. for allocs) which defaults to "length-1 vector" in the scalar case.]
Mystery solved - in the end I wound up doing a kind of binary search for the preprocessor directive(s) which were causing nvcc to issue the 'missing ;' error - delete all of types.h below a certain line, try building the simple test code. If it builds, restore the deleted lines, then delete roughly half (top or bottom, your choice) and retry. That finally narrowed things down to the real culprit, which was in a block of #defines for "compilers other than the following short list for which there are custom macros defined elsewhere", i.e. preprocessor stuff that hadn't been exercised until now - indeed there was a missing ';' (the last of the struct-defs in the block was missing a terminating ';'), but it was several hundred lines above the line number nvcc listed in its original error message.
So, now - with the terminating ';' added - the test code builds with my existing headers, and gives the expected diagnostics (at least based on first examination): [i]
Detected 1 CUDA-enabled GPU devices.
GPU #0: GeForce GT 430 v2.1
	clock_speed = 1400 MHz
	num_compute_units = 2
	constant_mem_size = 65536
	shared_mem_size = 49152
	global_mem_size = 1073414144
	registers_per_block = 32768
	max_threads_per_block = 1024
	can_overlap = 1
	warp_size = 32
	max_thread_dim[3] = [1024,1024,64]
	max_grid_size[3] = [65535,65535,65535]
[/i] ...and I can get down to some real work.
[QUOTE=Mark Rose;377096]I run two GT430s. I run them on Ubuntu 14.04. I simply enable the third-party repository and install the nvidia-331, nvidia-cuda-dev, and nvidia-cuda-toolkit packages. That should give you CUDA 6 in the driver and CUDA 5.5 for development. Compiling mfaktc then works:
sudo apt-get install build-essential -y
cd ~
wget [url]http://www.mersenneforum.org/mfaktc/mfaktc-0.20/mfaktc-0.20.tar.gz[/url]
tar xf mfaktc-0.20.tar.gz && cd mfaktc-0.20/src && make -j[/QUOTE]We are having some difficulties installing everything. We are using Ubuntu 14.04 and it is fully updated. We started with the onboard Intel graphics and installed our new video card. The cable for our LCD was attached to the motherboard. We installed the latest .deb option from [URL="https://developer.nvidia.com/cuda-downloads"]here[/URL]. We then installed "cuda", which gave us a broken primary display. After much wailing and gnashing of teeth we realized that the new video card had taken control of the primary display, so we moved the cable to the new video card and we got our display back. (This was really weird, because the onboard graphics display still worked using "CTRL+ALT+F1" for text mode.) At this point we had to install "nvidia-cuda-toolkit" to get "nvcc" working. We then went to "/usr/local/cuda-6.5/samples" and, with "sudo", we built the sample programs. We then ran "/usr/local/cuda-6.5/samples/bin/x86_64/linux/release/deviceQuery" and got this output: [CODE]./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 750 Ti"
  CUDA Driver Version / Runtime Version          6.5 / 6.5
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 2048 MBytes (2147287040 bytes)
  ( 5) Multiprocessors, (128) CUDA Cores/MP:     640 CUDA Cores
  GPU Clock rate:                                1254 MHz (1.25 GHz)
  Memory Clock rate:                             2700 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):     (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs = 1, Device0 = GeForce GTX 750 Ti
Result = PASS[/CODE]This indicates to us that the card is working.
We also ran "nvidia-smi" to check things: [CODE]Sat Oct 11 17:10:12 2014
+------------------------------------------------------+
| NVIDIA-SMI 340.29             Driver Version: 340.29 |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750 Ti   Off | 0000:01:00.0     N/A |                  N/A |
| 40%   32C    P8    N/A /  N/A |      8MiB /  2047MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+[/CODE]We are not sure if that message is good or not. We then downloaded and compiled mfaktc.
When we execute the binary, we get this error: [CODE]mfaktc v0.20 (64bit built)

Compiletime options
  THREADS_PER_BLOCK         256
  SIEVE_SIZE_LIMIT          32kiB
  SIEVE_SIZE                193154bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled

Runtime options
  SievePrimes               25000
  SievePrimesAdjust         1
  SievePrimesMin            5000
  SievePrimesMax            100000
  NumStreams                3
  CPUStreams                3
  GridSize                  3
  GPU Sieving               enabled
  GPUSievePrimes            82486
  GPUSieveSize              64Mi bits
  GPUSieveProcessSize       16Ki bits
  WorkFile                  worktodo.txt
  Checkpoints               enabled
  CheckpointDelay           30s
  Stages                    enabled
  StopAfterFactor           bitlevel
  PrintMode                 full
  V5UserID                  (none)
  ComputerID                (none)
  AllowSleep                no
  TimeStampInResults        no

CUDA version info
  binary compiled for CUDA  4.20
  CUDA runtime version      4.20
  CUDA driver version       6.50

CUDA device info
  name                      GeForce GTX 750 Ti
  compute capability        5.0
  maximum threads per block 1024
  number of mutliprocessors 5 (unknown number of shader cores)
  clock rate                1254MHz

Automatic parameters
  threads per grid          655360

running a simple selftest...
ERROR: cudaGetLastError() returned 8: invalid device function[/CODE]We originally wanted to use the onboard graphics for our display and the video card exclusively for mfaktc, but we can accept having to use the video card for both. We do not know where we went wrong, but at least we have a display, which is a lot better than our previous attempts, which ended horribly. :help:
Is there a 'nix equivalent of Nvidia Control Panel? It gets installed with the driver under Win. It allows setting up different displays in different ways: cloned, extended, and some others.
Try removing the add-in card's driver, the card itself, and configuring the onboard display adapter. Then add it back in. See if it appears in the BIOS. You may be able to designate the first adapter to try when the system boots. I'm just generalizing. It's a long time since I messed with any Linux flavor. EDIT2: You run an Asus board, right? I bet you just have to flip the right BIOS switches to get things under control.....and if this is the way to go, it [strike]is [/strike] should be OS agnostic. |
[CODE]CUDA version info
binary compiled for CUDA 4.20 CUDA runtime version 4.20 CUDA driver version 6.50 [/CODE] This looks funky. Check the paths in the top of the makefile, make sure CUDA_DIR is pointing to wherever cuda-6.5 resides. |
[QUOTE]Check the paths in the top of the makefile, make sure CUDA_DIR is pointing to wherever cuda-6.5 resides.[/QUOTE]New output:
[CODE]mfaktc v0.20 (64bit built)

Compiletime options
  THREADS_PER_BLOCK         256
  SIEVE_SIZE_LIMIT          32kiB
  SIEVE_SIZE                193154bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled

Runtime options
  SievePrimes               25000
  SievePrimesAdjust         1
  SievePrimesMin            5000
  SievePrimesMax            100000
  NumStreams                3
  CPUStreams                3
  GridSize                  3
  GPU Sieving               enabled
  GPUSievePrimes            82486
  GPUSieveSize              64Mi bits
  GPUSieveProcessSize       16Ki bits
  WorkFile                  worktodo.txt
  Checkpoints               enabled
  CheckpointDelay           30s
  Stages                    enabled
  StopAfterFactor           bitlevel
  PrintMode                 full
  V5UserID                  (none)
  ComputerID                (none)
  AllowSleep                no
  TimeStampInResults        no

CUDA version info
  binary compiled for CUDA  6.50
  CUDA runtime version      6.50
  CUDA driver version       6.50

CUDA device info
  name                      GeForce GTX 750 Ti
  compute capability        5.0
  maximum threads per block 1024
  number of mutliprocessors 5 (unknown number of shader cores)
  clock rate                1254MHz

Automatic parameters
  threads per grid          655360

running a simple selftest...
ERROR: cudaGetLastError() returned 8: invalid device function[/CODE]
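FWIW, cudaError 8 ("invalid device function") usually means the binary contains no device code matching the card's compute capability. mfaktc 0.20 predates Maxwell, so its makefile's nvcc flags may simply lack a compute-5.0 target; a hypothetical fix (the exact variable name in mfaktc's makefile may differ - check which variable carries the existing -gencode entries) would be something like:

```make
# hypothetical addition to mfaktc-0.20/src/Makefile: have nvcc also emit
# sm_50 device code for the GTX 750 Ti alongside the existing targets
NVCCFLAGS += -gencode arch=compute_50,code=sm_50
```

This requires CUDA 6.5 or later, which is the first toolkit release to support sm_50; rebuild from clean after editing the makefile.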
[QUOTE]You may be able to designate the first adapter to try when the system boots.[/QUOTE]We have the BIOS set to use (only) the onboard graphics but the OS chooses differently, even in "failsafe" mode.
[QUOTE=Xyzzy;384994]We are having some difficulties installing everything.
We are using Ubuntu 14.04 and it is fully updated. We started with the onboard Intel graphics and installed our new video card. The cable for our LCD was attached to the motherboard. We installed the latest .deb option from [URL="https://developer.nvidia.com/cuda-downloads"]here[/URL]. We then installed "cuda" which gave us a broken primary display. After much wailing and gnashing of teeth we realized that the new video card had taken control of the primary display, so we moved the cable to the new video card and we got our display back. (This was really weird because the onboard graphics display worked using "CTRL+ALT+F1" for text mode.) [/quote] If you install the bumblebee package it will blacklist the nvidia modules from loading at boot, making the Intel video the primary. Alternatively, you can put a file at /etc/modprobe.d/blacklist-nvidia.conf with [code]
blacklist nvidia
[/code] inside. That should make things revert to the Intel graphics on reboot. [quote] At this point we had to install "nvidia-cuda-toolkit" to get "nvcc" working. We then went to "/usr/local/cuda-6.5/samples" and, with "sudo" we built the sample programs. We then ran "/usr/local/cuda-6.5/samples/bin/x86_64/linux/release/deviceQuery" and got this output: [CODE]
CUDA version info
  binary compiled for CUDA  4.20
  CUDA runtime version      4.20
  CUDA driver version       6.50
[/code] [/quote] There seems to be another CUDA install on that system. I would look for any old CUDA packages and remove them. The CUDA install from Nvidia also doesn't put its libraries into the default library path. You can do so by running: [code]
echo /usr/local/cuda-6.5/lib | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig
[/code] and try compiling and running mfaktc again. I may be wrong on the location of the lib folder inside /usr/local/cuda-6.5. The correct directory will have a bunch of .so files. Note that not loading the nvidia driver at boot means some device modules are missing.
I use the following script to start two mfaktc.exe's from two directories in two "screen" sessions for my machine with the two GTX 580's. I use only the Intel graphics for the desktop: [code]
#!/bin/bash
# mf-start
mf-stop
if [ "$(lsmod | egrep -c '^nvidia')" = "0" ] ; then
    sudo modprobe nvidia
fi
num=$(lspci | grep NVIDIA | grep VGA | wc -l)
num=$(expr $num - 1)
for i in $(seq 0 $num) ; do
    if [ ! -e /dev/nvidia$i ] ; then
        sudo mknod -m 666 /dev/nvidia$i c 195 $i;
    fi
done
if [ ! -e /dev/nvidiactl ] ; then
    sudo mknod -m 666 /dev/nvidiactl c 195 255
fi
cd /home/xyzzy/mfaktc0 && screen -d -m -S mf0 ./mfaktc.exe -d 0
cd /home/xyzzy/mfaktc1 && screen -d -m -S mf1 ./mfaktc.exe -d 1
[/code] That script calls mf-stop, which gracefully tells any running mfaktc to quit and waits until they've all quit: [code]
#!/bin/bash
# mf-stop
killall mfaktc.exe 2> /dev/null
while pgrep -c mfaktc.exe > /dev/null ; do sleep 0.5 ; done
[/code]
:goodposting:
I don't know what most of those commands mean, but you seem to understand how to Nvidia. :smile: |
v4.2 seems to be what stock RedHat systems default to. Are you sure the cuda-6.5 lib directory is in your library path? What about setting LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64 in the environment before compiling and running? Also try ldd to find out which libraries your binary is being directed to.
[QUOTE=kladner;385000]Is there a 'nix equivalent of Nvidia Control Panel? It gets installed with the driver under Win. It allows setting up different displays in different ways: cloned, extended, and some others...[/QUOTE]
nvidia-settings It may not be installed automatically. I don't recall. |
[QUOTE=Dubslow;385012]:goodposting:
I don't know what most of those commands mean, but you seem to understand how to Nvidia. :smile:[/QUOTE] It's only because I had to figure it out. I have the same setup. |
So I have a first version of a set of GPU-based TF kernels bolted onto my longstanding TF siever code running on my humble nVidia GT 430 card. (This is a far more manageable way of getting hands-on coding/optimization experience on GPU than, say, a bignum-FFT code.) Here is a high-level overview of the current simple early-trials setup. Notationally, we are trying many factor candidates q = 2.k.p+1 to see if they divide 2^p-1. All the usual tricks to cost-effectively winnow the q's ahead of the modexp step are done.
[1] In GPU mode, the siever accumulates batches of 2^20 k's (each k stored in a 64-bit array slot - the alternative of a 64-bit 'base k' and 2^20 32-bit offsets halves the GPU <-> CPU memory transfer, but runs more slowly for me), then sends each such batch to the GPU.

[2] There are currently 2 distinct GPU modular-exponentiation kernels available, depending on the size of the q's being tested: a pure-integer one for q < 2^64, and a floating-double-based one which allows q's up to 2^78. Both are based on the Hensel-Montgomery modmul, and use the inverse-power-of-2 variant I describe in [url=http://arxiv.org/abs/1303.0328]section 7 of this manuscript[/url], and which I also used (in 96-bit-modulus mode) to discover the 4th known factor of the double-Mersenne MM31 roughly a decade ago. Interestingly, the 78-bit algo runs less than 2x slower than the 64-bit one - I had expected a much more severe penalty, based on previous comparative timings for the 2 algorithms on Intel CPUs in 32-bit-build mode (i.e. where the integer math is 64-bit-emulated-via-32-bit, as on the GPU).

[3] There is currently no exploitation of concurrency - the CPU does a bunch of sieving, feeds the resulting batches of k's to the GPU, then checks the returned results to see if a factor was turned up. Lather, rinse, repeat. Ideally one would want the CPU to be able to continue sieving in preparation for the next batch while the GPU is doing parallel modexp on the current batch. Thoughts on the simplest/most-effective ways of achieving the desired concurrency are welcome.

[4] I am quite certain I am underutilizing the GPU by up to an order of magnitude. I have not yet begun playing with the nVidia profiling tools, but (a) an mfaktc build runs 4-5x faster on the same hardware, and (b) if I run 2 separate TF processes on the Haswell-quad system in which the GPU is an add-on - these can sieve on separate CPU cores but must share the GPU - the runtime for each increases by only a few % over a single-job trial.
So, "I'm looking for a couple of good 2x speedups". Thoughts:

o Is running multiple kernels from within the same overall job a good way of boosting GPU usage? If so, would it be best to launch each kernel from within a separate sieving thread (using e.g. pthreads)? What are the good options here?

o Does the [i]num_compute_units[/i] output I showed in my 28 July post (which corresponds to what nVidia calls the multiProcessorCount) indicate how many kernels can run concurrently on the GPU?

o Are streams a better way of achieving concurrency than multiple kernels?

o It seems number-of-registers-used is a key performance correlate on the GPU, as it directly bears on the achievable degree of parallelism. How do I get NVCC to report that? When I added the '-ptxas-options=-v' flag to the compile flags following the CUDA C guide, I did not see the extra register-usage reporting it is supposed to trigger.

o Does it make sense to 'onshore' the sieving onto the GPU? At the moment I don't see a good way to do this - sure, one can envision 'parallel subsievers', each of which eliminates multiples of some subset of the small-primes base from the current sieving-range bitmap, but how could one concurrently bit-clear the shared-memory bitmap for these? Using multiple copies of the initial bitmap, clearing a subrange of primes from each and then ANDing them all together, seems like it would use way too large a memory footprint.

Thoughts from the old hands appreciated!
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.