mersenneforum.org  

Old 2012-06-21, 18:29   #34
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

2³·271 Posts
Default

Quote:
Originally Posted by firejuggler View Post
I know. Just said the link here http://gpu72.com/account/getassignments/ to P-1 doesn't work. (atm)
Old 2012-06-21, 18:30   #35
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

2602₁₆ Posts
Default

Quote:
Originally Posted by firejuggler View Post
Yup. But I moved the LL P-1 page from .../p-1/ to .../llp-1/. I have added a 301 code for those who might have had the original form bookmarked.
Old 2012-06-21, 18:32   #36
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

23002₈ Posts
Default

Quote:
Originally Posted by kracker View Post
I know. Just said the link here http://gpu72.com/account/getassignments/ to P-1 doesn't work. (atm)
You guys are just too fast. Channeling your inner Dubslow???
Old 2012-06-21, 18:35   #37
Brain
 
Dec 2009
Peine, Germany

331 Posts
Default New worktype(s) - TF/P-1 for CUDALucas

Hello everybody,
I suggest a minor relevant worktype: TF/P-1 for CUDALucas.
There are ranges where CL is more efficient than others. For example the 4M FFT length region at about M(72M) is very suitable for CL.
Today, I do the necessary TF and P-1 on my own.
There may be other efficient FFT lengths but 4M is my favorite.
I wouldn't be sad if this is not so interesting for the rest of the world.
Greetz, Brain
Old 2012-06-21, 18:37   #38
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

878₁₆ Posts
Default

Quote:
Originally Posted by chalsall View Post
You guys are just too fast. Channeling your inner Dubslow???
No. Yes. No. Uh, what? Ugh. Don't make me spam more!
Old 2012-06-21, 18:49   #39
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

2×5×7×139 Posts
Default

Quote:
Originally Posted by Brain View Post
I suggest a minor relevant worktype: TF/P-1 for CUDALucas.
There are ranges where CL is more efficient than others. For example the 4M FFT length region at about M(72M) is very suitable for CL. Today, I do the necessary TF and P-1 on my own. There may be other efficient FFT lengths but 4M is my favorite. I wouldn't be sad if this is not so interesting for the rest of the world.
With the framework we've built, we can offer just about any type of work desired. I personally prefer to offer work which is immediately (or, at least, soon to be) of value to GIMPS.

I'm reading that you are suggesting we do some TF/P-1ing in the 72M range? If so, exactly what is the optimal starting point? And can you (or anyone else) suggest other ranges which are particularly "sweet" for CUDALucas?
Old 2012-06-21, 20:50   #40
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts
Default

Quote:
Originally Posted by Brain View Post
I suggest a minor relevant worktype: TF/P-1 for CUDALucas. There are ranges where CL is more efficient than others. For example the 4M FFT length region at about M(72M) is very suitable for CL.
In the forthcoming Beta, the FFT selection has been modified slightly to only use lengths that Prime95 uses (which are the smoothest and thus most suitable).
Code:
int
choose_fft_length (int q, int* index)
{  
/* In order to increase length if an exponent has a round off issue, we use an
extra parameter that we can adjust on the fly. In check(), index starts as -1,
the default. In that case, choose from the table. If index >= 0, we must assume
it's an override index and return the corresponding length. If index > table-count,
then we assume it's a manual fftlen and return the proper index. */
  #define COUNT 89
  int multipliers[COUNT] = {  6,     8,    12,    16,    18,    24, 
                             32,    40,    48,    64,    72,    80, 
                             96,   120,   128,   144,   160,   192, 
                            240,   256,   288,   320,   384,   480, 
                            512,   576,   640,   768,   864,   960, 
                           1024,  1152,  1280,  1440,  1536,  1600, 
                           1728,  1920,  2048,  2304,  2400,  2560,
                           2880,  3072,  3200,  3456,  3840,  4000, 
                           4096,  4608,  4800,  5120,  5760,  6144, 
                           6400,  6912,  7680,  8000,  8192,  9216,
                           9600, 10240, 11520, 12288, 12800, 13824, 
                          15360, 16000, 16384, 18432, 19200, 20480, 
                          23040, 24576, 25600, 27648, 30720, 32000, 
                          32768, 34992, 36864, 38400, 40960, 46080,
                          49152, 51200, 55296, 61440, 65536         };
  // Largely copied from Prime95's jump tables, up to 32M
  // Support up to 64M, the maximum length with threads == 1024
  int len, i, estimate = q/20;
  for (i = 0; i < COUNT; i++) {
    len = 1024*multipliers[i];
    if (len >= estimate) {
      *index = i;
      return len;
    }
  }
  return 0;
}
(The old code also used the q/20 estimate, but the method of choosing such a length greater than the estimate wasn't very good.)
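
The block comment in the listing describes an override index, while the posted loop only covers the default case (index == -1). A minimal sketch of how the default and override cases could fit together is below. It assumes a shortened illustrative table and a hypothetical function name, demo_choose_fft_length; neither is taken from CUDALucas, and the "manual fftlen" case from the comment is left out.

```c
/* Shortened, illustrative table (multiples of 1024); NOT the full list. */
static const int demo_multipliers[] = { 2048, 2304, 2400, 2560, 2880, 3072,
                                        3200, 3456, 3840, 4000, 4096, 4608 };
#define DEMO_COUNT ((int)(sizeof demo_multipliers / sizeof demo_multipliers[0]))

/* Hypothetical helper. Returns the FFT length in words, or 0 if none fits.
   *index == -1 : pick the first table entry >= q/20 and store its index.
   *index >= 0  : treat *index as an override and return that table entry. */
int demo_choose_fft_length(int q, int *index)
{
    int estimate = q / 20;

    if (*index >= 0) {
        if (*index >= DEMO_COUNT)
            return 0;                 /* out of table; manual lengths not handled */
        return 1024 * demo_multipliers[*index];
    }
    for (int i = 0; i < DEMO_COUNT; i++) {
        int len = 1024 * demo_multipliers[i];
        if (len >= estimate) {
            *index = i;
            return len;
        }
    }
    return 0;
}
```

With this table, an exponent of 72,000,000 gives an estimate of 3,600,000 and lands on the 3840K entry. A caller that then hits a round-off problem could increment the stored index and call again to get the next length up (4000K, then 4096K).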

Last fiddled with by Dubslow on 2012-06-21 at 20:52
Old 2012-06-22, 02:14   #41
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

2×5×31² Posts
Default

Quote:
Originally Posted by firejuggler View Post
M3321935017 takes 91 630.041 Ghz/day to LL once
That is one digit too many for the exponent. We are not dreaming so high yet. 100M digits TF work would be enough.
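
For reference, the number of decimal digits of M(p) = 2^p - 1 is floor(p·log10(2)) + 1, so a ten-digit exponent like the quoted one gives roughly a billion digits, while 100M digits corresponds to an exponent of roughly 332.2 million. A small sketch of that arithmetic follows; the function name mersenne_digits is made up for illustration.

```c
/* log10(2), written as a literal so no math library is needed. */
static const double LOG10_2 = 0.30102999566398120;

/* Decimal digits of 2^p - 1, i.e. floor(p * log10(2)) + 1.
   The exponent type is 64-bit because 3321935017 does not fit
   in a 32-bit signed int. */
unsigned long long mersenne_digits(unsigned long long p)
{
    return (unsigned long long)((double)p * LOG10_2) + 1ULL;
}
```

Calling mersenne_digits(3321935017ULL) returns a value just above 1,000,000,000, and mersenne_digits(332192831ULL) returns a value just above 100,000,000.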
Old 2012-06-22, 09:32   #42
Brain
 
Dec 2009
Peine, Germany

331 Posts
Default

Quote:
Originally Posted by chalsall View Post
I'm reading that you are suggesting we do some TF/P-1ing in the 72M range? If so, exactly what is the optimal starting point? And can you (or anyone else) suggest other ranges which are particularly "sweet" for CUDALucas?
My reference is the FFT efficiency table. I'm at work and can evaluate the start and end points this weekend. Will see.
Old 2012-06-23, 20:59   #43
Brain
 
Dec 2009
Peine, Germany

331 Posts
Default CUDALucas 4M FFT length

Quote:
Originally Posted by Brain View Post
My reference is the FFT efficiency table. I'm at work and can evaluate the start and end points this weekend. Will see.
The Prime95 benchmark page lists the 4M FFT length as follows: 68.13M to 77.91M (4096K)

My M(72M) from my first post seems to be very low for the 4M FFT:

With -f 4194304, CL shows the following round-off errors and stalls at 77M (max err=0.35).
Code:
M75.0 err=0.20
M75.5 err=0.27
M76.0 err=0.31
M76.5 err=0.34
M77.0 err=0.39 XXX
I cannot yet quantify the efficiency differences in comparison, but powers of 2 are best.
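
One way to read the rising errors above, assuming the usual rule of thumb that round-off error grows with the average number of bits packed into each FFT word, is to compute exponent / FFT length for each test point. The sketch below does that and also checks an exponent against the 68.13M to 77.91M range quoted from the Prime95 benchmark page. Both function names are made up for illustration.

```c
/* Average number of bits per FFT word when testing M(q) at length fft_len. */
double bits_per_word(long q, long fft_len)
{
    return (double)q / (double)fft_len;
}

/* 1 if q lies in the range quoted above for Prime95's 4096K FFT, else 0. */
int in_prime95_4m_range(long q)
{
    return q >= 68130000L && q <= 77910000L;
}
```

For the 4M length (4194304 words), 75.0M works out to about 17.88 bits per word and 77.0M to about 18.36, so the exponent that stalls is also the one packing the most bits into each word.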

I assume that there are too few interested CL users?
Old 2012-06-23, 21:17   #44
Dubslow
Basketry That Evening!
 
Dubslow's Avatar
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts
Default

Actually, the table I posted above is now slightly out of date; I realized that I'd been using the wrong table in Prime95's code. (The newer, more extensive version is posted below.)

The updated table, which copies the lengths Prime95 uses for Mersenne numbers, should match pretty well with the efficiency table that AH posted, though I haven't actually compared them myself. George also used the 7-smooth criterion when selecting lengths for Prime95. (Note about comparison: CL's table is in multiples of 1024, so you'll either have to multiply this table by 1024 or divide AH's table by 1024 to compare them.)

The end result is that CL should now be almost as efficient as Prime95 for a random Mersenne exponent.
Code:
  #define COUNT 119
  int multipliers[COUNT] = {  6,     8,    12,    16,    18,    24,    32,    
                             40,    48,    64,    72,    80,    96,   120,   
                            128,   144,   160,   192,   224,   240,   256,   
                            288,   320,   336,   384,   448,   480,   512,   
                            576,   640,   672,   768,   800,   864,   896,   
                            960,  1024,  1120,  1152,  1200,  1280,  1344,
                           1440,  1536,  1600,  1680,  1728,  1792,  1920, 
                           2048,  2240,  2304,  2400,  2560,  2688,  2880,  
                           3072,  3200,  3360,  3456,  3584,  3840,  4000,  
                           4096,  4480,  4608,  4800,  5120,  5376,  5600,  
                           5760,  6144,  6400,  6720,  6912,  7168,  7680,  
                           8000,  8192,  8960,  9216,  9600, 10240, 10752, 
                          11200, 11520, 12288, 12800, 13440, 13824, 14336, 
                          15360, 16000, 16128, 16384, 17920, 18432, 19200, 
                          20480, 21504, 22400, 23040, 24576, 25600, 26880, 
                          28672, 30720, 32000, 32768, 34992, 36864, 38400, 
                          40960, 46080, 49152, 51200, 55296, 61440, 65536   };
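
The 7-smooth criterion mentioned above means a length has no prime factor larger than 7. A small sketch of a checker that could be used to spot-check table entries is below; the name is_7_smooth is an assumption and is not part of CUDALucas or Prime95.

```c
/* Returns 1 if n has no prime factor larger than 7, else 0. */
int is_7_smooth(int n)
{
    static const int primes[] = { 2, 3, 5, 7 };

    if (n < 1)
        return 0;
    for (int i = 0; i < 4; i++) {
        /* Strip every factor of this prime. */
        while (n % primes[i] == 0)
            n /= primes[i];
    }
    /* Anything left over is a prime factor greater than 7. */
    return n == 1;
}
```

Since 1024 is a power of 2, checking the multiplier is equivalent to checking the full length. For example, 4096 = 2^12, 6720 = 2^6·3·5·7 and 34992 = 2^4·3^7 all pass.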
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.