mersenneforum.org  

Old 2013-11-18, 02:47   #1992
kladner
 
 
"Kieren"
Jul 2011
In My Own Galaxy!

2·3·1,693 Posts

Quote:
Originally Posted by flashjh View Post
My fft.txt file says that Threads=512 256 256 is the best setting for me to use for the current FFT range I'm in, so I leave it there and that works for the -r test too.

Thanks, Jerry. That seems to complete the puzzle. "Threads=512 256 256" seems to have let me complete 'CUDALucas -cufftbench 1 8192 1' twice, when mostly it would not complete with 1024 1024 1024, or 256 256 256. I found the threads file which CUDAPm1 generated for my 580, and it agrees with your numbers.

The card is still throttled way back. I'm going to see if it will run with at least the core clocked up a bit.

EDIT: Partial correction is in order. Before the last few runs I also switched the display from the 580 to the GTX 570. This was at Antonio's suggestion, and also seemed to play a role in stabilizing the 580.

EDIT2: I declared victory prematurely. The latest attempt with -r yielded:
Code:
E:\CUDA\2.05-BETA>CUDALucas -r

------- DEVICE 0 -------
name                GeForce GTX 580
Compatibility       2.0
clockRate (MHz)     1564
memClockRate (MHz)  1600
totalGlobalMem      1610612736
totalConstMem       65536
l2CacheSize         786432
sharedMemPerBlock   49152
regsPerBlock        32768
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsPerMP     1536
multiProcessorCount 16
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
textureAlignment    512
deviceOverlap       1

Starting self test M86243 fft length = 4K
Running careful round off test for 1000 iterations.
If average error > 0.25, or maximum error > 0.35,
the test will restart with a longer FFT.
Iteration  100, average error = 0.15317, max error = 0.23438
Iteration  200, average error = 0.16318, max error = 0.24521
Iteration  300, average error = 0.16738, max error = 0.23047
Iteration  400, average error = 0.17024, max error = 0.25000
Iteration  500, average error = 0.17168, max error = 0.25000
Iteration  600, average error = 0.17195, max error = 0.23438
Iteration  700, average error = 0.17195, max error = 0.20947
Iteration  800, average error = 0.17240, max error = 0.21875
Iteration  900, average error = 0.17285, max error = 0.25000
Iteration 1000, average error = 0.17264 <= 0.25 (max error = 0.25000), continuing test.
Iteration 10000 M( 86243 )C, 0x23992ccd735a03d9, n = 4K, CUDALucas v2.05 Beta err = 0.28125 (0:01 real, 0.0825 ms/iter)
This residue is correct.

The fft length 16K is too large for the exponent 132049. Restart with smaller fft.

The above happens regardless of which card the monitor is connected to.
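
The acceptance rule printed in the log above can be sketched as follows. The two thresholds (0.25 average, 0.35 maximum) are taken from the log message itself; the function name and the way samples are passed in are assumptions for illustration, not CUDALucas's actual code.

```python
def careful_roundoff_verdict(samples, avg_limit=0.25, max_limit=0.35):
    """Decide whether a careful round-off test passes.

    samples: list of (average_error, max_error) pairs, one per report line.
    Returns "continue" or "restart with longer FFT".
    """
    final_avg = samples[-1][0]                 # running average at the last report
    worst_max = max(m for _, m in samples)     # largest max error seen in any report
    if final_avg > avg_limit or worst_max > max_limit:
        return "restart with longer FFT"
    return "continue"


# Values copied from the 4K self-test log above (iterations 100 to 1000).
log_4k = [
    (0.15317, 0.23438), (0.16318, 0.24521), (0.16738, 0.23047),
    (0.17024, 0.25000), (0.17168, 0.25000), (0.17195, 0.23438),
    (0.17195, 0.20947), (0.17240, 0.21875), (0.17285, 0.25000),
    (0.17264, 0.25000),
]
print(careful_roundoff_verdict(log_4k))
```

With these numbers the sketch agrees with the log's "continuing test" message, so the 4K self-test itself is not what fails; the failure comes afterwards, when the 16K length is rejected for exponent 132049.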

Last fiddled with by kladner on 2013-11-18 at 03:35
Old 2013-11-18, 15:15   #1993
owftheevil
 
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

3²·5·7 Posts

Quote:
Originally Posted by Antonio View Post
The -2GiB is reported by 2.05-Beta-x64, downloaded today.
The card has +2GiB of memory installed.

My mistake. I fixed the code, but only put it into CUDAPm1. I put it into CUDALucas this weekend. That version of the code will be up soon.
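
One plausible explanation for a negative size, offered here as an assumption because the thread does not state the cause, is that a byte count of 2 GiB or more was held in a signed 32-bit integer. The sketch below shows that reinterpretation using Python's `ctypes`; the helper name is made up for illustration.

```python
import ctypes

GIB = 1 << 30


def as_signed_32(byte_count):
    """Reinterpret the low 32 bits of byte_count as a signed 32-bit integer."""
    return ctypes.c_int32(byte_count & 0xFFFFFFFF).value


installed = 2 * GIB                  # a card with exactly 2 GiB
wrapped = as_signed_32(installed)    # what a 32-bit signed field would hold
print(wrapped / GIB)                 # -2.0
```

The GTX 580 earlier in the thread reports 1610612736 bytes, which still fits in a signed 32-bit value, so under this assumption only cards with 2 GiB or more would show the problem.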
Old 2013-11-18, 15:23   #1994
owftheevil
 
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

3²×5×7 Posts

Quote:
Originally Posted by kladner View Post
I am having a pretty rough time with 2.05 on the GTX 580.
'CUDALucas -cufftbench 1 8192 1' crashes on GTX 580, brings down graphic driver 327.23 (which restarts). 782 MHz core, 1600 VRAM

This is just the latest test of many. Occasionally, the test completes.

Rolling back the driver from 331.65 to 327.23 made no difference.

2.04-beta successfully completes
'CUDALucas -cufftbench 32768 3276800 32768' and has turned in good DCs at 830 MHz core, 1600 VRAM.

I haven't yet tried running a DC on 2.05-beta.

I have the card throttled back from where it normally runs mfaktc to stock: from 844 MHz to 782 MHz. The RAM is 400 MHz below stock.

Any suggestions would be appreciated.

EDIT: Tried running an exponent, 30651xxx on 2.05, 830 MHz core, 1600 VRAM. Started with 1728K, instead of stepping up to it from 1600K, as 2.04 did. Crashed a bit after the 40,000th iteration.

Code:
Iteration 10000 M( 30651671 )C, 0x6b79bd6d5adfb7de, n = 1728K, CUDALucas v2.05 Beta err = 0.05396 (0:26 real, 2.8857 ms/iter, ETA 24:33:42)
Iteration 20000 M( 30651671 )C, 0x53064732900985e9, n = 1728K, CUDALucas v2.05 Beta err = 0.06055 (0:26 real, 2.6133 ms/iter, ETA 22:14:09)
Iteration 30000 M( 30651671 )C, 0xe85abecfe0f40dce, n = 1728K, CUDALucas v2.05 Beta err = 0.05469 (0:26 real, 2.6123 ms/iter, ETA 22:13:12)
Iteration 40000 M( 30651671 )C, 0xa4208cf27dd73713, n = 1728K, CUDALucas v2.05 Beta err = 0.06250 (0:26 real, 2.6123 ms/iter, ETA 22:12:48)
CUDALucas.cu(310) : cudaSafeCall() Runtime API error 30: unknown error.

This is an Nvidia driver error. I used to think it only occurred when the card was also driving the display. Recently I got this error on a 570 which was not driving the display (although it didn't report an error, it just hung indefinitely). It seems to have been introduced with the 300+ drivers.
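
As a sanity check on the quoted log, the ETA column can be approximately reproduced from the per-iteration time. This sketch assumes that a Lucas-Lehmer test of M(p) needs p - 2 iterations and that the ETA is simply the remaining iterations multiplied by ms/iter; how CUDALucas actually computes the figure is not stated in the thread, and the function names are hypothetical.

```python
def eta_seconds(exponent, iterations_done, ms_per_iter):
    """Estimate the remaining run time in whole seconds."""
    remaining = (exponent - 2) - iterations_done
    return round(remaining * ms_per_iter / 1000.0)


def format_hms(seconds):
    """Format seconds as H:MM:SS, the layout used in the log."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Figures from the last progress line before the crash.
print(format_hms(eta_seconds(30651671, 40000, 2.6123)))
```

The estimate lands within a couple of seconds of the 22:12:48 shown at iteration 40000; the small gap is expected because ms/iter is printed rounded to four decimals.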
Old 2013-11-18, 15:29   #1995
owftheevil
 
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

3²·5·7 Posts

Quote:
Originally Posted by flashjh View Post
I just completed a DC on M57885161 with CUDALucas 2.05-Beta-x64. It completed without error and I even switched FFT sizes a few times. Since I have the full run of residues from the first time I ran it, I was able to check progress along the way.

The only issue I found so far was keyboard input. If Interactive=n is set to 1 in the .ini file then anytime I pressed a key the program would stop progress. GPU usage dropped to about 50% but ^c still stopped the run. I could restart with no problems. Anyone else seen this in Windows or Linux? Can some others test this to see if it's working or not in Windows and Linux?

I haven't run all the FFT benchmarks yet, I'll do that now. Anyone else having a problem with the amount of memory reported by CUDALucas?

I have seen this keyboard input problem. If I run cmd.exe (I think that's its name), keyboard input doesn't work. The other console program, whatever it's called, does work with keyboard input.
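
The cross-check flashjh describes in the quoted post, comparing interim residues against a saved earlier run, can be sketched as below. The regular expression targets the `Iteration N M( p )C, 0x...` progress lines seen in the logs in this thread; the function names and the idea of passing both logs in as lists of lines are assumptions for illustration.

```python
import re

# Matches progress lines such as:
# Iteration 10000 M( 30651671 )C, 0x6b79bd6d5adfb7de, n = 1728K, ...
LINE_RE = re.compile(r"Iteration\s+(\d+)\s+M\(\s*(\d+)\s*\)C,\s+0x([0-9a-fA-F]{16})")


def interim_residues(log_lines):
    """Map iteration number -> 64-bit residue (hex string) for each progress line."""
    residues = {}
    for line in log_lines:
        m = LINE_RE.search(line)
        if m:
            residues[int(m.group(1))] = m.group(3).lower()
    return residues


def first_mismatch(reference_lines, new_lines):
    """Return the earliest shared iteration where the runs disagree, or None."""
    ref = interim_residues(reference_lines)
    new = interim_residues(new_lines)
    for iteration in sorted(ref.keys() & new.keys()):
        if ref[iteration] != new[iteration]:
            return iteration
    return None


reference = [
    "Iteration 10000 M( 30651671 )C, 0x6b79bd6d5adfb7de, n = 1728K, err = 0.05396",
    "Iteration 20000 M( 30651671 )C, 0x53064732900985e9, n = 1728K, err = 0.06055",
]
print(first_mismatch(reference, reference))   # None: identical runs agree
```

Only iterations present in both logs are compared, so the two runs do not need the same reporting interval as long as some checkpoints coincide.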
Old 2013-11-18, 15:31   #1996
owftheevil
 
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

3²×5×7 Posts

Quote:
Originally Posted by kladner View Post
Another observation/question- should the savefiles of v 2.04beta be more than three times as large as those of v 2.05beta? .....EDIT: for the same exponent?

Yes, they should be.
Old 2013-11-18, 15:45   #1997
owftheevil
 
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

3²·5·7 Posts

Quote:
Originally Posted by Manpowre View Post
.
.
.
Starting self test M43112609 fft length = 2304K
Running careful round off test for 1000 iterations.
If average error > 0.25, or maximum error > 0.35,
the test will restart with a longer FFT.
Iteration 100, average error = 0.17969, max error = 0.28125
Iteration 200, average error = 0.20398, max error = 0.26563
Iteration 300, average error = 0.21162, max error = 0.27344
Iteration 400, average error = 0.21489, max error = 0.28125
Iteration 500, average error = 0.21730, max error = 0.28125
Iteration 600, average error = 0.21847, max error = 0.26563
Iteration 700, average error = 0.21941, max error = 0.25781
Iteration 800, average error = 0.22026, max error = 0.25879
Iteration 900, average error = 0.22068, max error = 0.26172
Iteration 1000, average error = 0.22089 <= 0.25 (max error = 0.28125), continuing test.
Iteration 10000 M( 43112609 )C, 0x62871c7027ff12c8, n = 2304K, CUDALucas v2.05 Beta err = 0.50000 (0:21 real, 2.0989 ms/iter)
Expected residue [e86891ebf6cd70c4] does not match actual residue [62871c7027ff12c8]
.
.
.
(tested on titan)

Notice the round-off error. Which driver are you using?
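
The `err = 0.50000` in the quoted progress line is the telling figure: 0.5 is the largest possible distance from the nearest integer, so at that point the rounded result can no longer be trusted. A small sketch for spotting such lines in a log follows. The 0.35 default is borrowed from the self-test message, and using it as a general alarm threshold is an assumption, as is the function name.

```python
import re

# Matches the "err = 0.50000" field of a progress line, but not
# "average error = ..." or "max error = ..." in the careful-test lines.
ERR_RE = re.compile(r"\berr\s*=\s*([0-9]*\.[0-9]+)")


def suspicious_lines(log_lines, limit=0.35):
    """Return (error, line) pairs for progress lines whose err exceeds limit."""
    flagged = []
    for line in log_lines:
        m = ERR_RE.search(line)
        if m and float(m.group(1)) > limit:
            flagged.append((float(m.group(1)), line.strip()))
    return flagged


titan = "Iteration 10000 M( 43112609 )C, 0x62871c7027ff12c8, n = 2304K, CUDALucas v2.05 Beta err = 0.50000 (0:21 real, 2.0989 ms/iter)"
gtx580 = "Iteration 10000 M( 86243 )C, 0x23992ccd735a03d9, n = 4K, CUDALucas v2.05 Beta err = 0.28125 (0:01 real, 0.0825 ms/iter)"
for err, line in suspicious_lines([titan, gtx580]):
    print(err, line[:40])
```

Only the Titan line is flagged; the GTX 580 line from earlier in the thread, at 0.28125, stays under the limit.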
Old 2013-11-18, 15:55   #1998
owftheevil
 
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

100111011₂ Posts

Quote:
Originally Posted by flashjh View Post
An issue:

When running CUDALucas -r with a GeForce GTX --- fft.txt

you may get the error:
Code:
The fft length 32K is too large for the exponent 216091. Restart with smaller fft.
Removing the file, as noted above, fixes the error. So when -cufftbench is run and the .txt file is generated I presume the FFTs are tuned correctly. However, the new 'less tolerant' code won't accept those values for use in the self test.

Also, can someone explain the updated threads in 2.05? Is it necessary to have .ini file threads anymore? Why three values instead of one? What is the interaction with the new .txt file?

Thanks

Or, instead of deleting the threads.txt file, insert a line with 16 as its only entry before the line with 32 on it. This is a lack of foresight on my part. Even though 32k ffts are faster than 16k or other smaller ffts big enough to handle 216091, some of those smaller ffts are still needed. I'll think about how to fix this.
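
The workaround above amounts to a one-line edit of the file. A sketch follows, assuming each line of the file begins with an FFT length in K as its first whitespace-separated field. The real file layout is not shown in this thread, the sample contents are made up, and the helper name is hypothetical.

```python
def insert_fft_entry(lines, new_length=16, before_length=32):
    """Return a copy of lines with new_length added ahead of before_length.

    lines: file contents as a list of strings without trailing newlines.
    The list is returned unchanged if new_length is already listed or
    if no line starts with before_length.
    """
    def first_field(line):
        fields = line.split()
        return fields[0] if fields else None

    if any(first_field(line) == str(new_length) for line in lines):
        return list(lines)                      # entry already present
    result = []
    inserted = False
    for line in lines:
        if not inserted and first_field(line) == str(before_length):
            result.append(str(new_length))      # a line with 16 as its only entry
            inserted = True
        result.append(line)
    return result


example = ["32 0.1234", "48 0.1500"]            # made-up file contents
print(insert_fft_entry(example))
```

Working on a list of lines rather than the file itself keeps the edit easy to inspect before anything is written back to disk.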

As for threads in the ini file, there are three kernels whose performance depends on the number of threads they are invoked with. 2.04 and earlier fixed the threads on two of them at 128, which is a good compromise. Those values should be the defaults in the ini file. I don't know how 1024 snuck its way in as the default. The values in threads.txt override the ini values.
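
The precedence described above, where threads.txt overrides the ini file, can be sketched as a simple lookup. Everything concrete here is an assumption: the thread does not show either file's format, so the per-FFT-length dictionary and the function name are hypothetical. The triples 1024 1024 1024 and 512 256 256 are values mentioned earlier in the thread.

```python
def resolve_threads(fft_length_k, ini_threads, tuned):
    """Pick the thread triple (one value per kernel) for one FFT length.

    ini_threads: triple from the Threads= line of the ini file.
    tuned: dict mapping FFT length in K -> triple, standing in for threads.txt.
    """
    # A tuned entry wins when present; otherwise fall back to the ini value.
    return tuned.get(fft_length_k, ini_threads)


ini = (1024, 1024, 1024)
tuned = {1728: (512, 256, 256)}            # hypothetical tuned entry
print(resolve_threads(1728, ini, tuned))   # tuned value wins
print(resolve_threads(2304, ini, tuned))   # falls back to the ini value
```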
Old 2013-11-18, 18:54   #1999
Manpowre
 
"Svein Johansen"
May 2013
Norway

3×67 Posts

Quote:
Originally Posted by owftheevil View Post
Notice the round-off error. Which driver are you using?

Latest Nvidia driver, 331.65.

CUDALucas 2.03 doesn't give me a wrong residue.

I took the GPU clock down by 200 MHz, and it ran fine again with 2.05.
Old 2013-11-18, 19:21   #2000
owftheevil
 
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

3²×5×7 Posts

Quote:
Originally Posted by Manpowre View Post
Latest Nvidia driver, 331.65.

CUDALucas 2.03 doesn't give me a wrong residue.

I took the GPU clock down by 200 MHz, and it ran fine again with 2.05.

Good to hear. What clocks are you running at now?
Old 2013-11-18, 19:38   #2001
Manpowre
 
"Svein Johansen"
May 2013
Norway

3×67 Posts

Quote:
Originally Posted by owftheevil View Post
Good to hear. What clocks are you running at now?

780 MHz, stock memory clock. I started the residue test again an hour ago for 20 repeats. Once that completes, I will take the card back up to its stock clock of 880 MHz and rerun the 20 residue tests.
Old 2013-11-18, 21:55   #2002
flashjh
 
"Jerry"
Nov 2011
Vancouver, WA

1123₁₀ Posts

The new code is compiled and the Windows binaries (release/debug) are posted on SourceForge.

@owftheevil: -memtest works, but something isn't right with the iterations. For example, 56 1000 1 on my 580 says ETA 12181:18:07.

I posted a working memtest.zip to SourceForge.

EDIT: Please only use 2.05 Beta .exe files for testing the code. It is not ready for production use yet. Thanks!

Last fiddled with by flashjh on 2013-11-18 at 21:56

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.