mersenneforum.org  

Old 2012-05-31, 06:00   #1321
Dubslow
Basketry That Evening!
 
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

16065₈ Posts

Quote:
Originally Posted by flashjh View Post
I downloaded the file @ 17:23 MDT

Edit: Can you compile/run Windows versions?
I have Windows, but I'd rather not have to shut down and reboot; I'd also need to get the CUDA toolkit and MSVC installed.

I'm half considering going to all the bother; see my edit of the post you quoted.

Last fiddled with by Dubslow on 2012-05-31 at 06:06
Old 2012-05-31, 06:13   #1322
flashjh
 
 
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts

Quote:
Originally Posted by Dubslow View Post
Heehee, I've been modifying the source with trifling differences for much of the evening, and that DL was symlinked to my files. I'll see how late it was
I've completed one expo with 2.02 myself, and another one due in a few hours. I'll come back and edit this post with a source archive that compiles "out of the box" on both Windows and Linux.

Edit: Sheesh, where did you get that makefile? In all 2.x versions I've seen from msft, the makefiles he's included all had arch=sm_13 and -O2; this win makefile has a whole bunch of stuff that is (AFAICT) unnecessary. How did you even get it to compile with arch not sm_13? The change msft made in 2.01 requires sm_13 (at least nvcc threw me an error when I tried later arches), and we've figured out it's the fastest.
Code:
CUFLAGS = -m64 --ptxas-options=-v -ccbin=$(CCLOC) -D$(BIT)  -Xcompiler /EHsc,/W3,/nologo,/Ox,/Oy,/GL -arch=$(CUDA_ARCH) -DMERS_PACKAGE -DBIT_SIEVE -DTESTING_SMALL_EXPONENTS -DSIEVE_SIZE_IN_BYTES=32 -DNUM_SMALL_PRIMES=32768 -DDO_NOT_USE_LONG_DOUBLE  "-I$(CUDA)/include"  -D__x86_64__ -O3
The include certainly isn't necessary; the only thing that is included is in the source archives. I'm pretty sure if you did a Ctrl+F on the defines in the source, you wouldn't find anything. Here's my CUFLAGS:
Code:
CUFLAGS = -O2 -arch=sm_13 --compiler-options=-Wall
The first two are from msft's makefile, and the last one is the one I added which produced warnings.

Is this win makefile a relic of MacLucas?
I had to modify a .win MAKEFILE some time ago. I think I used the include because I didn't have all the paths in my PATH; I'd have to check on that one later. As for sm_13: with CUDA 3.2, sm_13 is faster, but it wouldn't compile with this version. It has something to do with extern, I think. Also, I never tried building all versions with sm_13, since that wasn't what everyone wanted (or at least so I thought).

Quote:
How did you even get it to compile with arch not sm_13? The change msft made in 2.01 requires sm_13 (at least nvcc threw me an error when I tried later arches), and we've figured out it's the fastest.
It has always worked fine for me with sm_13, sm_20, and sm_21, as selected in the MAKEFILE. I saw your post about it some time ago, but it wasn't an issue for me, and I don't know why.

Code:
CUFLAGS = -m64 --ptxas-options=-v -ccbin=$(CCLOC) -D$(BIT) -Xcompiler /EHsc,/W3,/nologo,/Ox,/Oy,/GL -arch=$(CUDA_ARCH) -DMERS_PACKAGE -DBIT_SIEVE -DTESTING_SMALL_EXPONENTS -DSIEVE_SIZE_IN_BYTES=32 -DNUM_SMALL_PRIMES=32768 -DDO_NOT_USE_LONG_DOUBLE "-I$(CUDA)/include" -D__x86_64__ -O3
I remember looking up all the CUFLAGS in nVidia's CUDA whitepaper. It all made sense. I was able to modify the MAKEFILE to work, but I'm still learning some of the terminology in there. I know I used Brain's MAKEFILE as an example (among others).

Last fiddled with by flashjh on 2012-05-31 at 06:15
Old 2012-05-31, 07:18   #1323
Dubslow
Basketry That Evening!
 
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts

Quote:
Originally Posted by Dubslow View Post
I'll come back and edit this post with a source archive that compiles "out of the box" on both Windows and Linux.
Okay, here it is. I've made some minor changes to the Windows makefile:
Code:
CUFLAGS = -m64 --ptxas-options=-v -ccbin=$(CCLOC) -D$(BIT)  -Xcompiler /EHsc,/W3,/nologo,/Ox,/Oy,/GL -arch=$(CUDA_ARCH) -DDO_NOT_USE_LONG_DOUBLE  "-I$(CUDA)/include"  -D__x86_64__ -O3
Code:
$(CC) $(CFLAGS) /c $< /Fo$@
According to http://msdn.microsoft.com/en-us/library/032xwy55.aspx, /Tp makes it think it's a C++ file, which certainly isn't the case.

I also added various preprocessor statements to parse.c and copied the (float) casts into CUDALucas.cu. There should be a lot less manual work required on your part, flash; I suddenly see why compiling was such a pain. One last thing, flash: I have no clue if this would work or not, but since "C" was not required on those function defines at the top of CUDALucas.cu, does it compile if you delete the "extern"? If so, try deleting those statements altogether; "parse.h" has been modified to include the definitions, so that any "extern" statements are only needed on Linux.

msft, do you have any comments about arch, compiler flags, etc.? (PS, Could you please explain how choose_fft_length() works, and/or why the array np[] isn't just [1,2,3,4,5,6,7]?)

(Note: I tried making it so that you could enable -t on the fly; it turns out there's a reason msft didn't have that. If you use flash's compilation before this post, don't press "t" when -k is enabled. That's reverted in this attachment.)
Attached Files
File Type: bz2 CUDALucas-2.02.tar.bz2 (19.3 KB, 68 views)

Last fiddled with by Dubslow on 2012-05-31 at 07:56
Old 2012-05-31, 09:37   #1324
Dubslow
Basketry That Evening!
 
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts

Quote:
Originally Posted by Dubslow View Post
Heehee, I've been modifying the source with trifling differences for much of the evening
Heh, it seems I never enumerated those changes. They were of a mostly janitorial nature not affecting how the program behaves; a key exception is that I added another "PrintDeviceInfo" ini file option. This means you can see the device info even for device 0, and alternately you can turn off the printing for devices other than 0; previously those two were not possible with the -d switch. There have also been some cosmetic changes, mostly to what PrintDeviceInfo actually prints.

And, since my previous post, I've now made it so that running "-r" (self test) prints slightly more information, as well as automatically enabling the -t extra error checking. It's attached.
Attached Files
File Type: bz2 CUDALucas-2.02.tar.bz2 (19.6 KB, 73 views)
Old 2012-06-01, 03:53   #1325
Dubslow
Basketry That Evening!
 
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts
Running -cufftbench bug

Whenever I run any sort of -cufftbench, CUDALucas takes up a whole processor core (as opposed to ~nil when doing "production" work). This happens even in versions 2.00 and 2.01, so I know it's not anything I did to the code. Has anybody else seen this happen?

Edit: My screen also becomes very unresponsive (though not unusable). Again, this doesn't normally happen.

Last fiddled with by Dubslow on 2012-06-01 at 04:02
Old 2012-06-01, 06:05   #1326
flashjh
 
 
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts

Quote:
Originally Posted by Dubslow View Post
Okay, here it is. I've made some minor changes to the Windows makefile:
Code:
CUFLAGS = -m64 --ptxas-options=-v -ccbin=$(CCLOC) -D$(BIT)  -Xcompiler /EHsc,/W3,/nologo,/Ox,/Oy,/GL -arch=$(CUDA_ARCH) -DDO_NOT_USE_LONG_DOUBLE  "-I$(CUDA)/include"  -D__x86_64__ -O3
Code:
$(CC) $(CFLAGS) /c $< /Fo$@
According to http://msdn.microsoft.com/en-us/library/032xwy55.aspx, /Tp makes it think it's a C++ file, which certainly isn't the case.

I also added various preprocessor statements to parse.c and copied the (float) casts into CUDALucas.cu. There should be a lot less manual work required on your part, flash; I suddenly see why compiling was such a pain. One last thing, flash: I have no clue if this would work or not, but since "C" was not required on those function defines at the top of CUDALucas.cu, does it compile if you delete the "extern"? If so, try deleting those statements altogether; "parse.h" has been modified to include the definitions, so that any "extern" statements are only needed on Linux.

msft, do you have any comments about arch, compiler flags, etc.? (PS, Could you please explain how choose_fft_length() works, and/or why the array np[] isn't just [1,2,3,4,5,6,7]?)

(Note: I tried making it so that you could enable -t on the fly; it turns out there's a reason msft didn't have that. If you use flash's compilation before this post, don't press "t" when -k is enabled. That's reverted in this attachment.)
Quote:
Originally Posted by Dubslow View Post
Heh, it seems I never enumerated those changes. They were of a mostly janitorial nature not affecting how the program behaves; a key exception is that I added another "PrintDeviceInfo" ini file option. This means you can see the device info even for device 0, and alternately you can turn off the printing for devices other than 0; previously those two were not possible with the -d switch. There have also been some cosmetic changes, mostly to what PrintDeviceInfo actually prints.

And, since my previous post, I've now made it so that running "-r" (self test) prints slightly more information, as well as automatically enabling the -t extra error checking. It's attached.
Ok, everything compiled -- still have not tested. Here is a breakdown of changes I had to make for compile:

1) The MAKEFILE.WIN you created worked except I can't remove '/Tp'. Without that the compiler didn't understand 'timezone':

Code:
 
timeval.c(31) : warning C4115: 'timezone' : named type definition in parentheses
timeval.c(32) : error C2055: expected formal parameter list, not a type list
make: *** [timeval.x64.obj] Error 2
2) I was able to remove all the requested extern statements from CUDALucas.cu

3) CUDALucas.cu line 352 needed (float)

4) Parse.c line 386 needed '}' to close the if statement (this one took me a while to find)

5) I was able to remove the "-I$(CUDA)/include" line in the MAKEFILE.WIN

6) I was able to compile 3.2 | sm_13 ( I was using the wrong environment for 3.2 before... that's what happens with no )

I didn't change any of your linux definitions and the statements seem good for compiling 'out of the box' for Win and Linux. You can take the included files and see if you can compile. If not, pass everything back to me for another run.

Have you tested everything on Linux? Since you made the changes, can you pass along a sample worktodo.txt file with good and bad entries so I can ensure Windows is capable of everything your Linux can do ;)

(Source here, Compiled versions in next post)
Attached Files
File Type: zip CUDALucas2.02.src.zip (22.2 KB, 75 views)

Last fiddled with by flashjh on 2012-06-01 at 06:12
Old 2012-06-01, 06:08   #1327
flashjh
 
 
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts

CUDALucas 2.02 x64 (updated) - Untested

Includes:

CUDA 3.2 | sm_13
CUDA 4.0 | sm_20
CUDA 4.0 | sm_21
Attached Files
File Type: zip CUDALucas2.02.x64.zip (241.2 KB, 79 views)
Old 2012-06-01, 13:08   #1328
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

7×1,373 Posts

grrrr...

The command line should take priority when there is no ini file, and my attempt to switch the politeness shows that it does, but in this case the messages are gibberish...

Code:
>cl2024020x64 -d 0 -c 100000 -s backup0 -t -polite 0 -k 26070883

Warning: Couldn't parse ini file option CheckpointIterations; using default: 10000
Warning: Couldn't parse ini file option Threads; using default: 256
Warning: Couldn't parse ini file option SaveAllCheckpoints; using default: off
Warning: Couldn't parse ini file option CheckRoundoffAllIterations; using default: off
Warning: Couldn't parse ini file option Polite; using default: 1
Warning: Couldn't parse ini file option Interactive; using default: off
Warning: Couldn't parse ini file option DeviceNumber; using default: 0
Warning: Couldn't parse ini file option PrintDeviceInfo; using default: off
Warning: Couldn't parse ini file option WorkFile; using default: "worktodo.txt"
Warning: Couldn't parse ini file option FFTLength; using autoselect.
------- DEVICE 0 -------
name                GeForce GTX 580
<...snip...>
multiProcessorCount 16

mkdir: cannot create directory `backup0': File exists
Continuing work from a partial result of M26070883 fft length = 1572864 iteration = 9082202
Iteration 9100000 M( 26070883 )C, 0x243869296cca0a3a, n = 1572864, CUDALucas v2.02 err = 0.02246 (0:51 real, 0.5022 ms/iter, ETA 2:21:28)
Iteration 9200000 M( 26070883 )C, 0xd32a5f55d157802f, n = 1572864, CUDALucas v2.02 err = 0.02344 (4:32 real, 2.7260 ms/iter, ETA 12:43:17)
p
   -polite 1
p
   -polite 0
Iteration 9300000 M( 26070883 )C, 0x23e3b9dd22444ef9, n = 1572864, CUDALucas v2.02 err = 0.02344 (4:34 real, 2.7415 ms/iter, ETA 12:43:03)
The second instance (the other card) really scared me when it said "devicenumber not found using default 0", but I calmed myself down when I saw it was running on device 1, as it was supposed to do.

Last fiddled with by LaurV on 2012-06-01 at 13:11
Old 2012-06-01, 19:37   #1329
Dubslow
Basketry That Evening!
 
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts

Quote:
Originally Posted by LaurV View Post
grrrr...

The command line should take priority when there is no ini file, and my attempt to switch the politeness shows that it does, but in this case the messages are gibberish...
I assumed that everyone would get a copy of the inifile with the exes, even if they didn't use them.
Code:
  /*! These warnings are uncommented because the "shipped" ini file also contains these defaults,
      and therefore none of these should be set off anyways. */
  if( !IniGetInt(INIFILE, "CheckpointIterations", &checkpoint_iter, 10000) )
    fprintf(stderr, "Warning: Couldn't parse ini file option CheckpointIterations; using default: 10000\n");
  if( !IniGetInt(INIFILE, "Threads", &threads, 256) )
    fprintf(stderr, "Warning: Couldn't parse ini file option Threads; using default: 256\n");
  if( !IniGetInt(INIFILE, "SaveAllCheckpoints", &s_f, 0) )
    fprintf(stderr, "Warning: Couldn't parse ini file option SaveAllCheckpoints; using default: off\n");
  if( s_f && !IniGetStr(INIFILE, "SaveFolder", folder, "savefiles") )
    fprintf(stderr, "Warning: Couldn't parse ini file option SaveFolder; using default: \"savefiles\"\n");
  if( !IniGetInt(INIFILE, "CheckRoundoffAllIterations", &t_f, 0) )
    fprintf(stderr, "Warning: Couldn't parse ini file option CheckRoundoffAllIterations; using default: off\n");
  if( !IniGetInt(INIFILE, "Polite", &polite, 1) )
    fprintf(stderr, "Warning: Couldn't parse ini file option Polite; using default: 1\n");
  if( !IniGetInt(INIFILE, "Interactive", &k_f, 0) )
    fprintf(stderr, "Warning: Couldn't parse ini file option Interactive; using default: off\n");
  if( !IniGetInt(INIFILE, "DeviceNumber", &device_number, 0) )
    fprintf(stderr, "Warning: Couldn't parse ini file option DeviceNumber; using default: 0\n");
  if( !IniGetInt(INIFILE, "PrintDeviceInfo", &d_f, 0) )
    fprintf(stderr, "Warning: Couldn't parse ini file option PrintDeviceInfo; using default: off\n");
  if( !IniGetStr(INIFILE, "WorkFile", input_filename, "worktodo.txt") )
    fprintf(stderr, "Warning: Couldn't parse ini file option WorkFile; using default: \"worktodo.txt\"\n");
  if( !IniGetInt(INIFILE, "FFTLength", &fftlen, 0) )
    fprintf(stderr, "Warning: Couldn't parse ini file option FFTLength; using autoselect.\n");
Also, in the "shipped" ini file I explain that any of the options are overridden by the command line.
Quote:
Originally Posted by LaurV View Post
Code:
>cl2024020x64 -d 0 -c 100000 -s backup0 -t -polite 0 -k 26070883

Warning: Couldn't parse ini file option CheckpointIterations; using default: 10000
Warning: Couldn't parse ini file option Threads; using default: 256
Warning: Couldn't parse ini file option SaveAllCheckpoints; using default: off
Warning: Couldn't parse ini file option CheckRoundoffAllIterations; using default: off
Warning: Couldn't parse ini file option Polite; using default: 1
Warning: Couldn't parse ini file option Interactive; using default: off
Warning: Couldn't parse ini file option DeviceNumber; using default: 0
Warning: Couldn't parse ini file option PrintDeviceInfo; using default: off
Warning: Couldn't parse ini file option WorkFile; using default: "worktodo.txt"
Warning: Couldn't parse ini file option FFTLength; using autoselect.
------- DEVICE 0 -------
name                GeForce GTX 580
<...snip...>
multiProcessorCount 16

mkdir: cannot create directory `backup0': File exists
Continuing work from a partial result of M26070883 fft length = 1572864 iteration = 9082202
Iteration 9100000 M( 26070883 )C, 0x243869296cca0a3a, n = 1572864, CUDALucas v2.02 err = 0.02246 (0:51 real, 0.5022 ms/iter, ETA 2:21:28)
Iteration 9200000 M( 26070883 )C, 0xd32a5f55d157802f, n = 1572864, CUDALucas v2.02 err = 0.02344 (4:32 real, 2.7260 ms/iter, ETA 12:43:17)
p
   -polite 1
p
   -polite 0
Iteration 9300000 M( 26070883 )C, 0x23e3b9dd22444ef9, n = 1572864, CUDALucas v2.02 err = 0.02344 (4:34 real, 2.7415 ms/iter, ETA 12:43:03)
The second instance (the other card) really scared me when it said "devicenumber not found using default 0", but I calmed myself down when I saw it was running on device 1, as it was supposed to do.
Like we both remarked, the default was overridden by the command line switch.

Well, now that I look at the zip, flash didn't include the ini file with the binaries. I've added it to the attached zip. Anyone downloading the Windows executables should get this one instead. (The .exes are identical; I literally used the same ones, unmodified, from flash's zip.)

The ini file also has some basic 'documentation' which describes the options in more detail than the (now -h) help message.

-------------------------------------------------------------

In the future, how should the code handle not being able to read the ini file but getting options through the command line? I could make it so that warnings are only printed for options not passed as args, but that would be somewhat difficult...
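For what it's worth, the "only warn for options not passed as args" idea might be less painful than it sounds. Here is a minimal sketch, assuming the argument parser records which options it saw in a bitmask before the ini warnings are printed. The names warn_if_unset, cmdline_opts, and the OPT_* flags are hypothetical and not taken from the CUDALucas source.

```c
#include <stdio.h>

/* Hypothetical bit flags, one per option that can come from either source. */
enum {
  OPT_POLITE     = 1u << 0,
  OPT_DEVICE     = 1u << 1,
  OPT_CHECKPOINT = 1u << 2
};

/* Print the ini warning only when the ini lookup failed AND the command line
   did not supply the option. Returns 1 if a warning was printed, else 0. */
int warn_if_unset(int ini_ok, unsigned cmdline_opts, unsigned opt,
                  const char *name, const char *fallback)
{
  if (ini_ok || (cmdline_opts & opt))
    return 0;
  fprintf(stderr,
          "Warning: Couldn't parse ini file option %s; using default: %s\n",
          name, fallback);
  return 1;
}
```

Each existing if/fprintf pair would then collapse to a single call, passing the result of the ini lookup as ini_ok. This only works if the set of command-line options is known at the point where the warnings are issued, which is an assumption about the order of parsing.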
Attached Files
File Type: zip CUDALucas2.02x64-with-ini.zip (243.0 KB, 79 views)
Old 2012-06-01, 19:53   #1330
flashjh
 
 
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts

Quote:
Originally Posted by Dubslow View Post
I assumed that everyone would get a copy of the inifile with the exes, even if they didn't use them.
Oops, sorry.


Quote:
In the future, how should the code handle not being able to read the ini but getting options through the command line? I could make it so that warnings are only printed for options not passed as args, but that would be somewhat difficult...
Make it check for .ini first, if not there then just print '.ini not found' and handle like before.

Edit: A lot more work, maybe, but you could also ask if the user wants to create an .ini file and use any supplied switches as default. It's a lot easier if the compiler just includes it the first time.

Last fiddled with by flashjh on 2012-06-01 at 20:00
Old 2012-06-01, 21:01   #1331
Dubslow
Basketry That Evening!
 
Dubslow's Avatar
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

7221₁₀ Posts

Quote:
Originally Posted by flashjh View Post
Make it check for .ini first, if not there then just print '.ini not found' and handle like before.
Good idea. I'm not sure how to check if a file exists in C, but a quick Google ought to solve that.
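One portable way to do that check in plain C is to try opening the file for reading and close it again on success. This is only a sketch: the helper name file_exists is hypothetical, and it treats "missing" and "unreadable" as the same thing, which is probably acceptable when deciding whether to parse an ini file.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if `path` can be opened for reading, else 0.
   A file that exists but is not readable also yields 0. */
int file_exists(const char *path)
{
  FILE *fp;

  if (path == NULL)
    return 0;

  fp = fopen(path, "r");
  if (fp == NULL)
    return 0;

  fclose(fp);
  return 1;
}
```

With something like this, the ini parsing could be skipped after a single ".ini not found" message when file_exists(INIFILE) returns 0, instead of printing one warning per option.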
Quote:
Originally Posted by flashjh View Post
Edit: A lot more work, maybe, but you could also ask if the user wants to create an .ini file and use any supplied switches as default. It's a lot easier if the compiler just includes it the first time.
Yich... part of the inifile is that, like I said, it includes a lot of ancillary information besides configuration. I'll have to think about it. (And it'd put improving FFT selection on hold.)

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.