mersenneforum.org CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW)

2018-04-04, 18:40   #2663
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×3²×5×7² Posts

Quote:
 Originally Posted by Lexicographer Hello! I'm not sure this is the correct place to ask, but I'm running into a problem while trying to compile the latest CUDALucas under Linux. The problem is the same if I try different versions of compute/sm. I have CUDA Toolkit 9.1 installed. Any suggestions, please?
The existing linux executables won't do for your purposes? (Existing linux executables from 2015 are CUDA versions 4.2, 5.0, 5.5, 6.0, 6.5; all 64-bit.)

Look at NVIDIA developer support or download sites for nvml.h, and the linux equivalent of the nvml.dll required on Windows. A note from flashjh at https://sourceforge.net/p/cudalucas/code/102/ says: "Updates for GPU UUID from NVML. nvml.dll is required and included with the .7z file. Have not compiled or tested linux yet. Also included code to exclude 1.x GPUs for CUDA >7.0 and future code to exclude 2.x when CUDA does not support."

https://sourceforge.net/projects/cudalucas/files/ has a tgz file of precompiled V2.05.1 for linux, which is from before the added code requiring nvml.

Or see https://stackoverflow.com/questions/...r-file-missing from some years back, about the effects of a missing nvml.h.

Good luck!
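For anyone hunting for the header and library on a Linux box, a minimal sketch; the locations searched are typical Ubuntu/RHEL guesses, not guarantees, and exact paths vary by distro and driver version:

```shell
# Look for the NVML header and library in common locations.
# The paths below are assumptions; adjust for your distro.
find /usr/include /usr/local/cuda* -name 'nvml.h' 2>/dev/null || true
find /usr/lib /usr/lib64 /usr/local/cuda* -name 'libnvidia-ml.so*' 2>/dev/null || true
# The dynamic linker cache may also know about the library:
ldconfig -p 2>/dev/null | grep nvidia-ml || true
```

If nothing turns up, the header ships with the CUDA toolkit and the GPU Deployment Kit, and the library with the NVIDIA driver package.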
Attached Thumbnails

Last fiddled with by kriesel on 2018-04-04 at 18:46

2018-04-05, 17:57   #2664
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

4410₁₀ Posts

Quote:
 Originally Posted by LaurV The other way, displaying the ETA as "number of remaining iterations" multiplied by "iteration time", will give you an immediate result when you move it to a faster toy, but the ETA will be very-VERY jumpy, because the iteration time varies a lot with how busy your computer is. Some of us use our computers for other activities too. So it is not "reliable". Some kind of averaging over past values (either SMA or EMA) needs to be done to avoid the jumpy ETA, and you will still see "no effect" when you move it, until the main period of the MA (moving average) passes. Of course, it would be nice to have an option in the ini file, for example, to choose an averaging period: something like 255 would be the actual method (just an example), and something like 0 would be "no averaging" (jumpy). But I feel we request too much already.
I appreciate your input. I've read enough of the thread posts to know you've been around here a while.

It appears to me that different apps handle the jumpy-versus-averaged ETA differently. Prime95 appears to take both forks: the worker window ETA fluctuates with the current time per iteration and user behavior, while the Status display appears to take the long-term approach, averaging over the history of the exponent's run. In my experience this often results in the actual progress beating the Status prediction to completion by days or sometimes weeks. I find the more responsive indication more useful. Others may not.

As far as requesting too much, I prefer to think of the lists I've made as representing abundant opportunity. Any would be code author or maintainer can choose how to spend his time, what his personal priorities are, and what to tackle. Any progress that's made and shared benefits all the users of a software title and the project.
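The EMA idea LaurV describes is tiny to implement. A hedged sketch in awk, smoothing a jumpy per-iteration time series; the smoothing factor alpha=0.3 and the ms-per-iteration samples are made up for illustration:

```shell
# EMA-smoothed iteration time: ema = alpha*sample + (1-alpha)*ema.
# alpha and the sample values below are illustrative, not from any app.
printf '5.0\n5.2\n9.8\n5.1\n' | awk -v alpha=0.3 '
NR == 1 { ema = $1; next }                 # seed the average with the first sample
        { ema = alpha*$1 + (1-alpha)*ema } # blend each new sample in
END     { printf "%.4f\n", ema }'          # remaining iterations * ema would give an ETA
```

Note how the 9.8 outlier moves the estimate far less than it would move a raw last-iteration ETA, while a larger alpha would make the estimate respond faster when the work moves to a different card.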

2018-04-06, 06:00   #2665
Dubslow
Basketry That Evening!

"Bunslow the Bold"
Jun 2011
2018-04-06, 06:07   #2666
SELROC

9,733 Posts

Quote:
 Originally Posted by kriesel I appreciate your input. I've read enough of the thread posts to know you've been around here a while. It appears to me that different apps handle the jumpy versus averaging ETA differently. Prime95 appears to take both forks; worker window ETA fluctuates with current time per iteration and user behavior, while Status appears to be the long term averaging over the history of the exponent run approach. In my experience this often results in the actual progress beating the Status prediction to completion by days or sometimes weeks. I find the more responsive indication more useful. Others may not. As far as requesting too much, I prefer to think of the lists I've made as representing abundant opportunity. Any would be code author or maintainer can choose how to spend his time, what his personal priorities are, and what to tackle. Any progress that's made and shared benefits all the users of a software title and the project.

With mprime I have noted that even on a dedicated system, with no user input, the ETA fluctuates with system load, obviously because of the system daemons that periodically occupy the CPU. Even with maximum priority set for mprime, the ETA should in theory still fluctuate.

2018-04-07, 03:21   #2667
Lexicographer

Mar 2018
Shenzhen, China

2×3² Posts
Finally compiled on Kubuntu

Quote:
 Originally Posted by kriesel V2.05.1 for linux, which is from before the added code requiring nvml.

I downloaded file CUDALucas.cu of version 2.05.1 (r94) and the source finally compiled. I already confirmed the executable works well.

As for the precompiled executables, they just won't run on my machine:

Quote:
 $ ./CUDALucas-2.05.1-CUDA6.5-linux-x86_64
 bash: ./CUDALucas-2.05.1-CUDA6.5-linux-x86_64: No such file or directory

2018-04-07, 18:13   #2668
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×3²×5×7² Posts

Quote:
 Originally Posted by Lexicographer Thanks for your help! I downloaded file CUDALucas.cu of version 2.05.1 (r94) and the source finally compiled. I already confirmed the executable works well. As for the precompiled executables, they just won't run on my machine:

You're welcome. Reviewing my offline notes, I see there was a stretch of posts 2642-2648 about the same error message you encountered.

Note that success on one fft length, or on one run, while a good sign, does not ensure success on another. Running the available residue checks on the fft lengths you're likely to use is recommended (see the -r option). Running a separate gpu memory test program, or getting thorough coverage with CUDALucas's -memtest option, is also a useful precaution.

There are certain known-bad residues that sometimes appear, indicating a problem. Continuing an exponent run after any of them appear is a waste of processing time; restarting from an interim save file from before they appear is recommended. For CUDALucas, the known problem residues are:

0x0000000000000000, 0x0000000000000002, 0xfffffffffffffffd

0x0 is ok for a final residue if it does not occur before then; appearing only as the final residue is the sign of a rare prime result. Appearing earlier is a sign of something gone wrong, perhaps a failed host-to-gpu or gpu-to-host memory copy. Residues that repeat from one iteration to the next would also be a symptom of a problem.

As I recall, the April 18 2017 2.06beta did not include the nvml requirement, but did include at least some checks for those known-bad residues. I suggest you try compiling that version for more likely valid results. Please consider making your compiled executable(s) available to James Heinrich for posting on his software mirror:
http://download.mersenne.ca/CUDALucas/

For performance tuning, if you haven't yet, you may want to run -fftbench and -threadbench for the range of fft lengths you're likely to use. I would use at least 3 passes on a fast card like yours, and expand the range outward to powers of two bracketing the intended usage. For example, if intending to use 4608k-7168k, benchmark fft and threads over 4096k-8192k.

You may find skimming through the bug and wish list useful; it was last posted at http://www.mersenneforum.org/showpos...postcount=2658 Please note any other bugs you might find, or features that would be useful, and I'll add them to the list. A run-time scaling guide is also available at post 2659.

Last fiddled with by kriesel on 2018-04-07 at 18:32

2018-04-08, 15:34   #2669
chris2be8

Sep 2009

762₁₆ Posts

Quote:
 Originally Posted by Lexicographer As for the precompiled executables, they just won't run on my machine:
 $ ./CUDALucas-2.05.1-CUDA6.5-linux-x86_64
 bash: ./CUDALucas-2.05.1-CUDA6.5-linux-x86_64: No such file or directory

Please post output from:
Code:
ls -l CUDALucas-2.05.1-CUDA6.5-linux-x86_64
file CUDALucas-2.05.1-CUDA6.5-linux-x86_64
ldd CUDALucas-2.05.1-CUDA6.5-linux-x86_64
uname -a
That might give a clue as to what's wrong.

Chris

2018-04-08, 18:52   #2670
janfrode

Apr 2018

1 Posts
Built latest CUDALucas from source

Just a tiny FYI on what was needed to build the latest CUDALucas from source on RHEL7.4 with CUDA 9.1. I had to add "#include <nvml.h>" to CUDALucas.cu, and it also needed to be linked with /usr/local/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so, so the final build command was:

Code:
gcc CUDALucas.o parse.o -O1 -Wall -fPIC -L/usr/local/cuda/lib64 -L/usr/local/cuda/targets/x86_64-linux/lib/stubs/ -lcufft -lcudart -lnvidia-ml -lm -o CUDALucas
and for ppc64le it was approximately the same command:

Code:
gcc CUDALucas.o parse.o -O1 -Wall -fPIC -L/usr/local/cuda/lib64 -L/usr/local/cuda/targets/x86_64-linux/lib/stubs/ -lcufft -lcudart -lnvidia-ml -lm -o CUDALucas
Oh, and in the Makefile I also specified "--generate-code arch=compute_50,code=sm_50", which is the compute capability of my 940MX, and compute_60,code=sm_60 for the Tesla P100.

On the ppc64le I am seeing some strange behavior (a crash on -threadbench, or if I don't specify "-threads"), but it seems to be working fine for LL.
2018-04-09, 03:51   #2671
Lexicographer

Mar 2018
Shenzhen, China

2×3² Posts

Quote:
 Originally Posted by chris2be8 Please post output from:
 Code:
 ls -l CUDALucas-2.05.1-CUDA6.5-linux-x86_64
 file CUDALucas-2.05.1-CUDA6.5-linux-x86_64
 ldd CUDALucas-2.05.1-CUDA6.5-linux-x86_64
 uname -a
Results:

Quote:
 $ ls -l CUDALucas-2.05.1-CUDA6.5-linux-x86_64
 -rwxr-xr-x 1 andriy andriy 753136 Feb 12 2015 CUDALucas-2.05.1-CUDA6.5-linux-x86_64
 $ file CUDALucas-2.05.1-CUDA6.5-linux-x86_64
 CUDALucas-2.05.1-CUDA6.5-linux-x86_64: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=a8b4728865a4f5a480dd218c33fd85728a4914c3, not stripped
 $ ldd CUDALucas-2.05.1-CUDA6.5-linux-x86_64
 linux-vdso.so.1 => (0x00007ffce48d6000)
 /lib/$LIB/liblsp.so => /lib/lib/x86_64-linux-gnu/liblsp.so (0x00007f1f9f438000)
 libcufft.so.6.5 => not found
 libcudart.so.6.5 => not found
 libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1f9f0e2000)
 libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1f9ed02000)
 libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1f9eafe000)
 /lib/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007f1f9f63d000)
 $ uname -a
 Linux DeepDragon 4.13.0-38-generic #43-Ubuntu SMP Wed Mar 14 15:20:44 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
If I run locate libcufft, some of the results are:
Quote:
 /usr/lib/x86_64-linux-gnu/libcufft.so.8.0
 /usr/local/cuda-9.1/lib64/libcufft.so
 /usr/local/cuda-9.1/lib64/libcufft.so.9.1
 /usr/local/cuda-9.1/lib64/libcufft.so.9.1.85
And it's similar for libcudart:
Quote:
 /usr/lib/x86_64-linux-gnu/libcudart.so.8.0
 /usr/lib/x86_64-linux-gnu/libcudart.so.8.0.61
 /usr/local/cuda-9.1/lib64/libcudart.so
 /usr/local/cuda-9.1/lib64/libcudart.so.9.1
 /usr/local/cuda-9.1/lib64/libcudart.so.9.1.85
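The "not found" lines in the ldd output are the crux: the precompiled binary wants libcufft.so.6.5 and libcudart.so.6.5, while only the 8.0 and 9.1 versions are installed. A quick way to list just the unresolved libraries of any dynamically linked binary is to filter ldd; /bin/sh is used here as a stand-in target, since it should resolve cleanly on most systems:

```shell
# Print only the shared libraries a binary cannot resolve.
# /bin/sh is a placeholder; substitute the CUDALucas executable.
if command -v ldd >/dev/null 2>&1; then
    ldd /bin/sh | awk '/not found/ { print $1 }'
fi
```

An empty result means all dependencies resolved; each printed name is a library the loader cannot find, which for the 2.05.1 binaries would mean either installing the matching CUDA 6.5 runtime libraries or rebuilding from source against the installed toolkit.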

2018-04-09, 04:45   #2672
Lexicographer

Mar 2018
Shenzhen, China

12₁₆ Posts
Also compiled for 1080 Ti with CUDA 9.1

Quote:
 Originally Posted by janfrode I had to add "#include <nvml.h>" to CUDALucas.cu and also it needed to be linked with libnvidia-ml.so
Thanks for your help! I tried that and it worked with the latest code (2.06beta, r102).

Since the NVIDIA Management Library (nvidia-ml, nvml) was in a different location on my machine, my build commands were:
Quote:
 /usr/local/cuda/bin/nvcc -O1 --generate-code arch=compute_61,code=compute_61 --compiler-options=-Wall -I/usr/local/cuda/include -c CUDALucas.cu
 gcc CUDALucas.o parse.o -O1 -Wall -fPIC -L/usr/local/cuda/lib64 -L/usr/lib/nvidia-390/ -lcufft -lcudart -lm -lnvidia-ml -o CUDALucas
So I linked against /usr/lib/nvidia-390/ instead of /usr/local/cuda/targets/x86_64-linux/lib/stubs/, and used compute_61 for my GTX 1080 Ti.

Last fiddled with by Lexicographer on 2018-04-09 at 04:55

2018-05-27, 16:58   #2673
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×3²×5×7² Posts
Improved recovery from Windows TDRs on old gpus

A recovery method tested over the last few days with CUDAPm1 and a GTX480 may also apply here. See the detailed writeup at http://www.mersenneforum.org/showpos...8&postcount=37

