mersenneforum.org  

Old 2012-08-29, 10:42   #1552
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

3×3,221 Posts

Quote:
Originally Posted by TheJudger View Post
There is only one point where I'm unsure: I don't know whether you can run both apps (with different CUDA versions) concurrently or not.
You can't. I use different sm's for CL and mfaktc, and they run perfectly as long as I don't mix them on the same card. I can run them on the same computer at the same time if they target different cards and the cards are not in SLI. Keeping several versions on the same computer is easy: you only put the right DLLs in each folder, since both mfaktc and CL look in their own folder for the DLL if it is not already loaded. But you can't RUN two versions on the same card at the same time.
Old 2012-08-29, 11:09   #1553
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

3·3,221 Posts

Quote:
Originally Posted by Dubslow View Post
Okay, slight change of plans: I recall LaurV somewhere saying that a larger FFT length was faster than some smaller ones in CUDALucas' table, but I wasn't able to relocate that post. In addition, I will also add the signal-handling fix discussed before to r39.

In the meantime, all Windows users should test flash's latest compile for the filelocking bug; note, however, that compared to earlier beta releases, some FFT lengths might not appear. If the bug is confirmed killed, then the final release (non-beta) of 2.04 will reincorporate the changes from the old binary lost in the new ones (i.e., it will be r39). r39 will be committed when LaurV responds.
Yes, I have the bad habit of tuning the FFT size manually for every range, and sometimes this has saved me hours of CL-ing. The numbers below are from memory (IIRC); if they are wrong, I will post exact confirmation tomorrow. I have had no internet at home since Friday morning, and I still have to go home this evening, in half an hour at most, to check that I am not mistaking the numbers. Here I have internet, but no GPU. So, for example, 2304k (smooth, as 2^18*3^2) is much faster than the smaller ones (about 6% faster than the default one), and 2592k (smooth, 2^15*3^4) is about 14% faster (NO JOKE!) than the default one (I can't remember which, maybe 2400k, maybe 2568k, or the next higher one). I have an Excel table somewhere; if the net problem can't be solved soon, I will bring it here (the table is not complete, it covers only the ranges where I had expos to test, i.e. 25M to 46M, but it is very detailed).

The cards are GTX 580, GTX 570 and Tesla C2050, with no difference between them. An FFT with smaller granulation (a smoother number) is faster than a smaller FFT size with bigger granulation (not so smooth), with very few exceptions. 1440k is one such exception: it is 5-smooth but still very fast! For default FFT sizes above 1440k, the default can almost always be tuned to a better one. I can't say for sure whether this is card/OS/whatever dependent. Someone should try FFT 2592k against the smaller defaults on a GTX 580 on Linux. I consistently get (besides smaller, safer rounding errors) a speed improvement of 13-14% on win64/GTX 580 (which is my main setup). This translates into 46-49 hours for a 4xM expo, instead of 52-55 hours.

edit: I am going home now, but you can search the forum for "2592k"; I am 100% sure of this number (it seems to be a multiple of only 2 and 3 too :D), and you should find my former posts. Trust the numbers in those posts over the numbers in this post.
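
For anyone who wants to check the smoothness figures quoted above, here is a minimal C sketch that strips the factors 2, 3, 5 and 7 out of an FFT length. The names SmallFactors, factor_small and print_factored_k are made up for this illustration and are not taken from CUDALucas. The sketch only shows how smooth a length is; it says nothing about how fast cuFFT will actually run it.

```c
#include <stdio.h>

/* Exponents of 2, 3, 5 and 7 in n, plus whatever cofactor is left over. */
typedef struct {
    int  exp[4];  /* exponents of 2, 3, 5, 7, in that order */
    long rest;    /* remaining cofactor; 1 means n is 7-smooth */
} SmallFactors;

/* Divide out 2, 3, 5 and 7 as often as possible (hypothetical helper). */
SmallFactors factor_small(long n) {
    static const int primes[4] = {2, 3, 5, 7};
    SmallFactors f = {{0, 0, 0, 0}, n};
    for (int i = 0; i < 4; i++) {
        while (f.rest > 1 && f.rest % primes[i] == 0) {
            f.rest /= primes[i];
            f.exp[i]++;
        }
    }
    return f;
}

/* Print a length given in K (units of 1024 elements) in factored form. */
void print_factored_k(int k) {
    SmallFactors f = factor_small(1024L * k);
    printf("%dk = 2^%d * 3^%d * 5^%d * 7^%d",
           k, f.exp[0], f.exp[1], f.exp[2], f.exp[3]);
    if (f.rest != 1) printf(" * %ld", f.rest);  /* non-smooth leftover */
    printf("\n");
}
```

Calling print_factored_k(2592) prints "2592k = 2^15 * 3^4 * 5^0 * 7^0", and factor_small(1024L * 2304) gives 2^18 * 3^2, matching the figures in the post. 1440k comes out as 2^15 * 3^2 * 5, i.e. 5-smooth.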

Last fiddled with by LaurV on 2012-08-29 at 11:22
Old 2012-08-29, 20:23   #1554
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

1110000110101₂ Posts

Quote:
Originally Posted by LaurV View Post
Yes, I have the bad habit of tuning the FFT size manually for every range, and sometimes this has saved me hours of CL-ing. The numbers below are from memory (IIRC); if they are wrong, I will post exact confirmation tomorrow. I have had no internet at home since Friday morning, and I still have to go home this evening, in half an hour at most, to check that I am not mistaking the numbers. Here I have internet, but no GPU. So, for example, 2304k (smooth, as 2^18*3^2) is much faster than the smaller ones (about 6% faster than the default one), and 2592k (smooth, 2^15*3^4) is about 14% faster (NO JOKE!) than the default one (I can't remember which, maybe 2400k, maybe 2568k, or the next higher one). I have an Excel table somewhere; if the net problem can't be solved soon, I will bring it here (the table is not complete, it covers only the ranges where I had expos to test, i.e. 25M to 46M, but it is very detailed).

The cards are GTX 580, GTX 570 and Tesla C2050, with no difference between them. An FFT with smaller granulation (a smoother number) is faster than a smaller FFT size with bigger granulation (not so smooth), with very few exceptions. 1440k is one such exception: it is 5-smooth but still very fast! For default FFT sizes above 1440k, the default can almost always be tuned to a better one. I can't say for sure whether this is card/OS/whatever dependent. Someone should try FFT 2592k against the smaller defaults on a GTX 580 on Linux. I consistently get (besides smaller, safer rounding errors) a speed improvement of 13-14% on win64/GTX 580 (which is my main setup). This translates into 46-49 hours for a 4xM expo, instead of 52-55 hours.

edit: I am going home now, but you can search the forum for "2592k"; I am 100% sure of this number (it seems to be a multiple of only 2 and 3 too :D), and you should find my former posts. Trust the numbers in those posts over the numbers in this post.
I would love to see the spreadsheet. For what it's worth, here's all [strike]four[/strike] five lines of how CUDALucas chooses a length:

Code:
  #define COUNT 119
  int multipliers[COUNT] = {  6,     8,    12,    16,    18,    24,    32,    
                             40,    48,    64,    72,    80,    96,   120,   
                            128,   144,   160,   192,   224,   240,   256,   
                            288,   320,   336,   384,   448,   480,   512,   
                            576,   640,   672,   768,   800,   864,   896,   
                            960,  1024,  1120,  1152,  1200,  1280,  1344,
                           1440,  1536,  1600,  1680,  1728,  1792,  1920, 
                           2048,  2240,  2304,  2400,  2560,  2688,  2880,  
                           3072,  3200,  3360,  3456,  3584,  3840,  4000,  
                           4096,  4480,  4608,  4800,  5120,  5376,  5600,  
                           5760,  6144,  6400,  6720,  6912,  7168,  7680,  
                           8000,  8192,  8960,  9216,  9600, 10240, 10752, 
                          11200, 11520, 12288, 12800, 13440, 13824, 14366, 
                          15360, 16000, 16128, 16384, 17920, 18432, 19200, 
                          20480, 21504, 22400, 23040, 24576, 25600, 26880, 
                          29672, 30720, 32000, 32768, 34992, 36864, 38400, 
                          40960, 46080, 49152, 51200, 55296, 61440, 65536   };
  // Largely copied from Prime95's jump tables, up to 32M
  // Support up to 64M, the maximum length with threads == 1024
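...
    // q is the exponent under test; q/20 is a rough estimate of the length needed
    int len, i, estimate = q/20;
    for(i = 0; i < COUNT; i++) {
      len = 1024*multipliers[i];   // table entries are in units of K (1024)
      if( len >= estimate ) 
      {
        return len;                // first (smallest) table length that is large enough
      }
    }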
If you say larger lengths are faster, it should just be a matter of removing the slower lengths from the table.
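
A rough sketch of what "removing the slower lengths" could look like without editing the table itself: keep a parallel skip[] array and let the search ignore flagged entries. The names is_7_smooth, choose_length_pruned and skip are assumptions made for this illustration; only the q/20 estimate is copied from the loop above. Which entries are really slower has to come from benchmarks such as LaurV's spreadsheet. The smoothness check is just a sanity test on the table.

```c
#include <stddef.h>

/* Returns 1 if n has no prime factor larger than 7, else 0. */
int is_7_smooth(int n) {
    static const int primes[4] = {2, 3, 5, 7};
    if (n < 1) return 0;
    for (int i = 0; i < 4; i++) {
        while (n % primes[i] == 0) n /= primes[i];
    }
    return n == 1;
}

/* Same search as the posted loop, except that entries flagged in skip[]
   are ignored. mult[] is sorted ascending and holds lengths in K (1024).
   Returns the chosen length in elements, or 0 if no entry is large enough. */
int choose_length_pruned(const int *mult, const unsigned char *skip,
                         size_t count, int q) {
    int estimate = q / 20;            /* same estimate as the original code */
    for (size_t i = 0; i < count; i++) {
        int len = 1024 * mult[i];
        if (skip[i]) continue;        /* length marked as slow: pass over it */
        if (len >= estimate) return len;
    }
    return 0;
}
```

With q = 45,000,000 the estimate is 2,250,000, so an unflagged search over {2048, 2240, 2304, ...} returns 2240k; flagging 2240 in skip[] makes it return 2304k instead. Run over the table as posted, is_7_smooth() rejects 14366 and 29672 (14366 = 2*11*653), which makes them look like typos, possibly for 14336 and 28672 (both a power of 2 times 7). That is a guess and has not been checked against the CUDALucas source.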

Last fiddled with by Dubslow on 2012-08-29 at 20:25 Reason: [strike]
Old 2012-08-30, 04:58   #1555
flashjh
 
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts

We need a switch like -fft that does more than just take q/20 and then step up through the table until the length is >= that estimate. When enabled, it could test several FFT lengths, log the time and error for each, and then select the best one for that particular exponent. If a worktodo file is used, it would run the FFT test when each exponent is started. Once an FFT is selected, the program would need to be able to write it into the worktodo file for that exponent. The main problem is how many FFTs to test before it becomes a waste of time. (If LaurV's suggestion can be vetted, it may be possible to narrow the FFTs down to a small enough number to test them all each time.) Once enough test data is collected and reviewed, it may be possible to have the program select a particular set of FFTs to test based on the exponent and the GPU chipset.
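
The selection step of such a switch could be as small as the following C sketch. Everything in it is hypothetical: FftTrial, pick_best_trial and the err_limit parameter are names invented for the example, and the timing and error figures would have to come from real trial runs, which the sketch does not perform. It simply picks the fastest candidate whose worst rounding error stays below the limit.

```c
#include <stddef.h>

/* Result of one short trial run with a candidate FFT length (hypothetical). */
typedef struct {
    int    len_k;        /* FFT length in K (units of 1024) */
    double ms_per_iter;  /* measured time per iteration */
    double max_err;      /* largest rounding error seen during the trial */
} FftTrial;

/* Index of the fastest trial whose max_err is below err_limit,
   or -1 if no trial qualifies. */
int pick_best_trial(const FftTrial *t, size_t n, double err_limit) {
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (t[i].max_err >= err_limit) continue;   /* too risky, skip it */
        if (best < 0 || t[i].ms_per_iter < t[best].ms_per_iter) {
            best = (int)i;
        }
    }
    return best;
}
```

Ties go to the earlier candidate, so listing candidates smallest-first prefers the smaller length when two timings are equal. The chosen len_k is what would then be written back to the worktodo entry.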

One thing I noticed: when the .ini file contains a particular FFT length and the program needs to change FFT sizes, it always goes up. However, I was testing smaller exponents that needed smaller FFTs (it took me a while to figure out why the program was failing; then I remembered the FFT size in the .ini file). The test mentioned above could also be used to select correct FFTs for all exponents when the default FFT is too big for the exponent (which caused serious rounding errors). (I guess if the -fft switch can be implemented, there will be no reason to specify FFTs in the .ini file. One could still put an incorrect FFT in the worktodo, though.)


Thoughts?

------
So far, testing of the new 2.04 beta is going well, for me. I was able to place many smaller exponents in the worktodo file and they all continued just fine. My DC still has a while left though...

How is the testing going for everyone else?
Old 2012-08-30, 13:21   #1556
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts

Quote:
Originally Posted by flashjh View Post
------
So far, testing of the new 2.04 beta is going well, for me. I was able to place many smaller exponents in the worktodo file and they all continued just fine. My DC still has a while left though...

How is the testing going for everyone else?
I have successfully completed 13 DC's and 2 LL's with 2.04-Beta-3.2-sm_13-x64. I think there were two times when I saw the Corrupt Save File cause a restart. I spotted these pretty quickly and was able to resume from a very recent good Save File with little lost work time.
Old 2012-08-30, 15:23   #1557
flashjh
 
"Jerry"
Nov 2011
Vancouver, WA

1123₁₀ Posts

Quote:
Originally Posted by kladner View Post
I have successfully completed 13 DC's and 2 LL's with 2.04-Beta-3.2-sm_13-x64. I think there were two times when I saw the Corrupt Save File cause a restart. I spotted these pretty quickly and was able to resume from a very recent good Save File with little lost work time.
Have you switched to the updated 2.04 beta? Have you had any file locking problems with the new one?
Old 2012-08-30, 16:29   #1558
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts

Would this creation date be the latest?
Friday, August 03, 2012, 9:21:17 AM
I just downloaded it to be sure, but the one I was running has the same date. So....I guess I probably have been running the latest version.

I confess that I do not entirely understand the file locking issue.

I think most or all of the savefile corruption episodes were associated with unrelated (I think) BSODs. I have not seen CL restart (corrupt savefile) in the last 5-6 runs.

Please ask if there's other data you want.

Thanks to flash and dubslow (EDIT: and LaurV!) for all their work on this project. Bravo, Guys!

Last fiddled with by kladner on 2012-08-30 at 16:30
Old 2012-08-30, 18:01   #1559
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

16065₈ Posts

Quote:
Originally Posted by kladner View Post
Thanks to flash and dubslow (EDIT: and LaurV!) for all their work on this project. Bravo, Guys!
Don't forget msft! He does all the mathy stuff
Old 2012-08-30, 18:23   #1560
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

10011110101110₂ Posts

Quote:
Originally Posted by Dubslow View Post
Don't forget msft! He does all the mathy stuff
That's always the hazard of giving credit: leaving someone out.

Thanks msft! Sorry for the omission.
Old 2012-08-30, 18:24   #1561
flashjh
 
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts

Quote:
Originally Posted by kladner View Post
Would this creation date be the latest?
Friday, August 03, 2012, 9:21:17 AM
Go here. The latest build is from 28 Aug 2012.
Quote:
Originally Posted by Dubslow View Post
Don't forget msft! He does all the mathy stuff
Agreed, and many others! I just make it compile on Windows.

Last fiddled with by flashjh on 2012-08-30 at 18:24
Old 2012-08-30, 18:29   #1562
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts

Thanks Jerry. Done!

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.