mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Conjectures 'R Us (https://www.mersenneforum.org/forumdisplay.php?f=81)
-   -   Prime finding rate, Sierp vs. Riesel? (https://www.mersenneforum.org/showthread.php?t=17132)

CGKIII 2012-09-01 18:19

Prime finding rate, Sierp vs. Riesel?
 
Has anyone done analysis of whether the prime finding rate is higher for Sierp or Riesel?

The most appropriate comparison, I think, would be to look at bases with the same conjecture, and then look at the distribution of k's remaining at a given n level. Though same bases with different conjectures might be fine as well (especially if you only consider k's < the min conjecture). Different bases, different conjectures might be worthwhile, but that seems like the least appropriate of comparisons.

If there's data someone can send me, I'd be happy to investigate myself.

The question I'm trying to answer is: "Does it appear to be easier to find primes for S or R conjectures?" If it does look like there's a (reasonably) clear winner, it will affect the type of work I choose going forward.

MyDogBuster 2012-09-01 20:06

[QUOTE]Has anyone done analysis of whether the prime finding rate is higher for Sierp or Riesel?[/QUOTE]

I have no stats, but my observation over the years is that they are about the same, with maybe a slight edge to Riesel, though not by much. Any difference you see in the stats can be attributed to Riesel having been tested more than Sierp.

Testing times are similar for a given n, though test times at the same n do increase for higher bases. I'm assuming you are running on Windows. Do you know if your machines are Sandy Bridge architecture?

henryzz 2012-09-01 21:10

Which of Riesel or Sierp has more algebraic factors? I imagine algebraic factors would skew the stats a little.

rogue 2012-09-01 23:20

[QUOTE=henryzz;309974]Which of Riesel or Sierp has more algebraic factors? I imagine algebraic factors would skew the stats a little.[/QUOTE]

Riesel because even powers have algebraic factors.

CGKIII 2012-09-02 17:03

[QUOTE=MyDogBuster;309971]I'm assuming you are running on Windows. Do you know if your machines are Sandy Bridge architecture?[/QUOTE]

Yes, I'm on Windows. I've got an AMD Phenom II quadcore desktop, an i5 quadcore laptop, and then an AMD Athlon II dualcore laptop. Unsure how those correspond to Sandy Bridge.

MyDogBuster 2012-09-02 17:19

[QUOTE]Yes, I'm on Windows. I've got an AMD Phenom II quadcore desktop, an i5 quadcore laptop, and then an AMD Athlon II dualcore laptop. Unsure how those correspond to Sandy Bridge.[/QUOTE]They are not Sandy Bridge because they are AMD. Sandy Bridge is an Intel CPU architecture whose instruction set (AVX) does advanced vector operations. A few people ported LLR to use this new instruction set; it's about 30-40% faster than pfgw or the standard cllr. Just checking.

henryzz 2012-09-02 19:35

[QUOTE=MyDogBuster;310060]They are not Sandy Bridge because they are AMD. Sandy Bridge is an Intel CPU architecture whose instruction set (AVX) does advanced vector operations. A few people ported LLR to use this new instruction set; it's about 30-40% faster than pfgw or the standard cllr. Just checking.[/QUOTE]
The i5 could be Sandy Bridge. If you use the latest official version of LLR (3.8.9) then it will use AVX if detected (and not otherwise). The latest pfgw is the same. AMD Bulldozer supports AVX but runs slower when using the AVX code, so it is disabled for those chips by default.

@CGKIII Do you mean an i7 quadcore laptop or an i5 quadcore desktop? All i5 laptops are dualcore, not quad. :) Not that this matters.

CGKIII 2012-09-02 23:21

Henryzz, the laptop says "Intel(R) Core(TM) i5 CPU M 560 @ 2.67 GHz," 2.66 GHz, 3.42 GB of RAM. When I pop open the Task Manager and go to the Performance tab, it shows four slots for CPU usage. I get ~25% usage per active client. Not sure if there's a better place for me to be looking. It's a Dell Latitude E6410.

I have llrexe=cllr.exe, using cllr.exe from the crus_pack in one of the intro threads.

Dubslow 2012-09-03 00:54

[QUOTE=CGKIII;310093]Henryzz, the laptop says "Intel(R) Core(TM) i5 CPU M 560 @ 2.67 GHz," 2.66 GHz, 3.42 GB of RAM. When I pop open the Task Manager and go to the Performance tab, it shows four slots for CPU usage. I get ~25% usage per active client. Not sure if there's a better place for me to be looking. It's a Dell Latitude E6410.

I have llrexe=cllr.exe, using cllr.exe from the crus_pack in one of the intro threads.[/QUOTE]

The Sandy Bridge CPUs are all designated 2xxx, while yours is 560, so it's not Sandy Bridge. (Ivy Bridge are 3xxx.)

henryzz 2012-09-03 08:55

[QUOTE=CGKIII;310093]Henryzz, the laptop says "Intel(R) Core(TM) i5 CPU M 560 @ 2.67 GHz," 2.66 GHz, 3.42 GB of RAM. When I pop open the Task Manager and go to the Performance tab, it shows four slots for CPU usage. I get ~25% usage per active client. Not sure if there's a better place for me to be looking. It's a Dell Latitude E6410.

I have llrexe=cllr.exe, using cllr.exe from the crus_pack in one of the intro threads.[/QUOTE]
The thing that is probably confusing you is that your CPU is dual core with hyperthreading, which makes it appear that you have 4 cores. With some workloads hyperthreading can provide a speedup, but LLR is so optimised that it doesn't provide anything worth mentioning. If you find a way of turning it off (hard, if not impossible, on laptops) then it should use a little less power and produce less heat.

LaurV 2012-09-03 14:25

With P95 you don't need to turn off anything, just launch only 2 workers. The program is clever enough to use the right affinity. To make sure, you can use the Options/Benchmark function and see if there is any improvement with 2 workers versus 4; most probably not, and 4 workers may even be slower for higher exponents, due to the memory bandwidth limit.

CGKIII 2012-09-05 02:47

Thanks everyone for the clarifications. After a lot of investigating, it seems like the next/first PC I build will be "pretty badass" and won't happen until maybe Christmastime. The i7-3770K seems really quite neat.

Is there any way to do CRUS work on a GPU (and if not, are there any plans in the next 1-2 years to enable that functionality)? This is quickly turning into my favorite project, and I would like to be able to invest in gear that will be helpful.

Newish question: What's the probability for finding a prime on a given test? I assume it's some function of b, k, n, form(S or R), and the sieve depth.

For the past few days I've been spending a lot of time trying to estimate the remaining amount of work left. I'm currently taking S391 from n=2,500 to n=25,000 starting with 938 k's. I'm at n ~ 9,900 and have eliminated 393 k's so far.

I understand that time to test scales with n^2 (driven by # of digits, right? So a higher base would have higher times). So I can easily get a sense of the time left to compute all remaining tests. However, finding primes cuts things down quite a bit, and I haven't been able to quite figure out how to get the probability of finding a prime. A rough guess has been that "for the same b, for the same form, for the same sieve depth, for the same n [B][I]range[/I][/B], roughly the same [B][I]percent[/I][/B] of k's will be eliminated." I'm not entirely sure why that would be the case, but it seemed to hold somewhat reasonably for my results thus far. Does anyone have anything better? This is using pfgw (and sometimes pfgw64, if that matters).

mdettweiler 2012-09-05 04:07

[QUOTE=CGKIII;310342]Is there any way to do CRUS work on a GPU (and if not, are there any plans in the next 1-2 years to enable that functionality)? This is quickly turning into my favorite project, and I would like to be able to invest in gear that will be helpful.[/QUOTE]
There is--sort of. An llrCUDA (for nVidia GPUs only--ATI is not as easy to program this kind of application on) has been developed, but it's not entirely ready for "prime time" yet; it produces good results and is fairly fast on numbers of the right size (more on this later), but can be a bit tricky to set up and use. At present, it's distributed as source code only--you need to install the CUDA development libraries and compile it yourself. It can be done, but it's definitely a bit of an adventure.

You can check it out more [url=http://www.mersenneforum.org/showthread.php?t=14608&page=6]here[/url]--the thread meanders through the process of the program's development from the beginning, so it's a bit long and uninteresting to follow, but if you skip to the last page (around post 250 or so, in case your # of posts per page setting is different than mine) you should be able to get to the latest code pretty quickly. One thing to look out for: some versions only support k*2^n+-1 (base 2), while others support the full gamut of bases that CPU LLR does. The main developer doesn't speak native English, so it's not entirely clear to me at first glance which versions support other bases and which don't (I haven't been following the project as closely of late); I'd try the last version posted in that thread that doesn't say "only k*2^n+-1" and see if it works, and if it gives an error about the tests not being base 2, try working your way backward until you get one that does work.

You might also try asking around at the PrimeGrid ([url]http://www.primegrid.com[/url]) forums to see if they've been doing any work on llrCUDA--they have a more sizable community of GPU-endowed testers to work with, so it's possible that development has continued at a more robust pace there even though things are a bit scattered on the mersenneforum side of things. They might, for instance, have precompiled binaries available so you don't need to build the program from source.

The thing that makes llrCUDA rather tricky to fit into a CRUS effort is that the nature of GPU computing makes for algorithms that are much more effective on large numbers than small ones. As an example, a 100,000 digit number (~n=300K base 2) will take less than 5 minutes to LLR on a fast CPU, but significantly longer on a GPU; but a 10,000,000 digit number, which takes 3-4 weeks to test on a fast CPU, can be done in a few days on a GPU. This is because the GPU's advantage is that it has, basically, a lot of little processors that aren't much by themselves but can compute quite a bit working together. The LLR algorithm is iterative, and thus isn't very easily parallelizable; it can still be done to an extent, but it bottlenecks on latency of communication between computation threads, so the GPU is only using a fraction of its potential, as it were. (Sieving, on the other hand, is very parallelizable, and can be sped up hundreds-fold on a GPU.) Thus, the GPU can be a great help if you want to test really big numbers (like at GIMPS, where they have a well-developed CUDALucas program for LL tests which are very similar to LLRs, and at PrimeGrid's Generalized Fermat subproject, where they are testing similarly large numbers); but its usefulness is limited at CRUS, where our focus is to tackle broader ranges of smaller numbers instead of taking a specific, narrower swath of numbers to high search depths.

Long story short: GPUs can have some limited use at CRUS, but often they're quite a bit more trouble than they're worth. A GPU is generally much better put to use at something like PrimeGrid's sieving projects or their Generalized Fermat search; for CRUS, a good fast CPU is going to do far more for you.

henryzz 2012-09-05 08:44

[QUOTE=CGKIII;310342]Newish question: What's the probability for finding a prime on a given test? I assume it's some function of b, k, n, form(S or R), and the sieve depth.
[/quote]
[URL]http://mersenneforum.org/showpost.php?p=295249&postcount=50[/URL]

[quote]
For the past few days I've been spending a lot of time trying to estimate the remaining amount of work left. I'm currently taking S391 from n=2,500 to n=25,000 starting with 938 k's. I'm at n ~ 9,900 and have eliminated 393 k's so far.

I understand that time to test scales with n^2 (driven by # of digits, right? So a higher base would have higher times). So I can easily get a sense of the time left to compute all remaining tests. However, finding primes cuts things down quite a bit, and I haven't been able to quite figure out how to get the probability of finding a prime. A rough guess has been that "for the same b, for the same form, for the same sieve depth, for the same n [B][I]range[/I][/B], roughly the same [B][I]percent[/I][/B] of k's will be eliminated." I'm not entirely sure why that would be the case, but it seemed to hold somewhat reasonably for my results thus far. Does anyone have anything better? This is using pfgw (and sometimes pfgw64, if that matters).[/QUOTE]

When optimising sieving we test a candidate 60% of the way through the range of n (and of k, if there is a wide range of k; the FFT length depends on k as well). We take that as the average test time, so we multiply it by the number of candidates remaining to get the total testing time. This doesn't account for k's being removed, but it can give a reasonable estimate sometimes.
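As a rough sketch of that rule of thumb (my own illustration -- the function names are made up, and the ~n^2 growth in test time for a fixed base is only approximate):

```python
def scaled_test_time(t_known, n_known, n_target):
    # Test time for a fixed base grows roughly as n^2: the digit count
    # (and hence FFT length) grows linearly in n, and each test runs
    # ~n iterations over that FFT.
    return t_known * (n_target / n_known) ** 2

def estimate_total_time(t_known, n_known, n_lo, n_hi, candidates_left):
    # Time one test at 60% of the way through the n-range, treat that as
    # the average test time, and multiply by the candidate count. This
    # ignores k's removed by primes, so it is really a worst-case figure.
    n60 = n_lo + 0.6 * (n_hi - n_lo)
    return scaled_test_time(t_known, n_known, n60) * candidates_left
```

For example, if a test at n=10,000 takes 30 seconds, the 60% point of a 2,500-25,000 range is n=16,000, and the estimate is 30*(1.6)^2 = 76.8 seconds per remaining candidate.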

Puzzle-Peter 2012-09-05 18:17

[QUOTE=henryzz;310356]
When optimising sieving we test a candidate 60% of the way through the range of n (and of k, if there is a wide range of k; the FFT length depends on k as well). We take that as the average test time, so we multiply it by the number of candidates remaining to get the total testing time. This doesn't account for k's being removed, but it can give a reasonable estimate sometimes.[/QUOTE]

That works quite well if you have a large number of k values left. With a single k left, each test might be the last one, and with very few k's left a prime found will reduce the remaining work by a significant percentage, so any estimate becomes rather useless in those cases. You will still get a worst-case guess, which can be nice for deciding whether to tackle a range or not.

CGKIII 2012-09-05 20:09

Thanks. In C10, you have a "magic number" of 1.781. Can you explain that to me?

Additionally, B13 says the form is k*2^n +/- 1, but C4 is an input for a different base. Should B13 read: k*b^n...? Or are there other adjustments necessary for different bases?

Dubslow 2012-09-05 20:13

[QUOTE=mdettweiler;310347]
The thing that makes llrCUDA rather tricky to fit into a CRUS effort is that the nature of GPU computing makes for algorithms that are much more effective on large numbers than small ones. As an example, a 100,000 digit number (~n=300K base 2) will take less than 5 minutes to LLR on a fast CPU, but significantly longer on a GPU; but a 10,000,000 digit number, which takes 3-4 weeks to test on a fast CPU, can be done in a few days on a GPU. This is because the GPU's advantage is that it has, basically, a lot of little processors that aren't much by themselves but can compute quite a bit working together. The LLR algorithm is iterative, and thus isn't very easily parallelizable; it can still be done to an extent, but it bottlenecks on latency of communication between computation threads, so the GPU is only using a fraction of its potential, as it were. (Sieving, on the other hand, is very parallelizable, and can be sped up hundreds-fold on a GPU.) Thus, the GPU can be a great help if you want to test really big numbers (like at GIMPS, where they have a well-developed CUDALucas program for LL tests which are very similar to LLRs, and at PrimeGrid's Generalized Fermat subproject, where they are testing similarly large numbers); but its usefulness is limited at CRUS, where our focus is to tackle broader ranges of smaller numbers instead of taking a specific, narrower swath of numbers to high search depths.[/QUOTE]
I don't know anything about the LLR test or llrCUDA, but I do know that CUDALucas scales almost as well as Prime95 with exponent (which, for Mersenne numbers without a multiplier, scales directly to the size of the number).

mdettweiler 2012-09-05 22:16

[QUOTE=Dubslow;310428]I don't know anything about the LLR test or llrCUDA, but I do know that CUDALucas scales almost as well as Prime95 with exponent (which, for Mersenne numbers without a multiplier, scales directly to the size of the number).[/QUOTE]
Hmm...does that hold for really small (n<2M or even <1M) numbers though? That's typically the range we're dealing with here at CRUS. When testing llrCUDA (on a GTX 460) I found that it was slower than a CPU up to about n=1.2M or so, and increasingly eclipsed the CPU as n (exponent) got bigger. I didn't test much higher than n=9M or so, so I don't know if it goes linear at some point in there. (The only k*b^n+-c project that's anywhere near GIMPS levels is Seventeen or Bust, which last I recall had a leading edge around 18-19M; nothing else is higher than 7M-8M or so, and most are far below that).

I'm not super knowledgeable about exactly how the GPU algorithm differs from the CPU, but AFAIU the trick with GPUs is to break a problem up into tons of relatively-small chunks for processing on the individual GPU "cores". I remember being told that for tpsieve/ppsieve (k*2^n+-1 sieving on GPUs), it needs a range of at least k=10,000 or so to reach full efficiency--indeed, you can expand to a range of that size nearly "for free" compared to a smaller (say k=1,000 or less) range. So if LL/LLR applications on the GPU are remotely like that (as much so as can be achieved for an iterative method), I would imagine there's a similar "tipping point" below which the GPU-testing-time graph becomes rather discontinuous.

Dubslow 2012-09-05 22:37

[QUOTE=mdettweiler;310448]Hmm...does that hold for really small (n<2M or even <1M) numbers though? That's typically the range we're dealing with here at CRUS. When testing llrCUDA (on a GTX 460) I found that it was slower than a CPU up to about n=1.2M or so, and increasingly eclipsed the CPU as n (exponent) got bigger. I didn't test much higher than n=9M or so, so I don't know if it goes linear at some point in there. (The only k*b^n+-c project that's anywhere near GIMPS levels is Seventeen or Bust, which last I recall had a leading edge around 18-19M; nothing else is higher than 7M-8M or so, and most are far below that).[/QUOTE]
[code]~/CUDALucas∰∂ CUDALucas -r

Starting M86243 fft length = 6K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.00002, max error = 0.00002
Iteration 200, average error = 0.00002, max error = 0.00002
Iteration 300, average error = 0.00002, max error = 0.00002
Iteration 400, average error = 0.00002, max error = 0.00002
Iteration 500, average error = 0.00002, max error = 0.00002
Iteration 600, average error = 0.00002, max error = 0.00002
Iteration 700, average error = 0.00002, max error = 0.00002
Iteration 800, average error = 0.00002, max error = 0.00002
Iteration 900, average error = 0.00002, max error = 0.00002
Iteration 1000, average error = 0.00002 < 0.25 (max error = 0.00002), continuing test.
Iteration 10000 M( 86243 )C, 0x23992ccd735a03d9, n = 6K, CUDALucas v2.04 Beta err = 0.0000 (0:01 real, 0.0720 ms/iter, ETA 0:05)
This residue is correct.

Starting M132049 fft length = 8K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.00029, max error = 0.00040
Iteration 200, average error = 0.00031, max error = 0.00037
Iteration 300, average error = 0.00033, max error = 0.00043
Iteration 400, average error = 0.00033, max error = 0.00040
Iteration 500, average error = 0.00034, max error = 0.00041
Iteration 600, average error = 0.00034, max error = 0.00040
Iteration 700, average error = 0.00034, max error = 0.00040
Iteration 800, average error = 0.00034, max error = 0.00043
Iteration 900, average error = 0.00034, max error = 0.00041
Iteration 1000, average error = 0.00034 < 0.25 (max error = 0.00037), continuing test.
Iteration 10000 M( 132049 )C, 0x4c52a92b54635f9e, n = 8K, CUDALucas v2.04 Beta err = 0.0005 (0:01 real, 0.0900 ms/iter, ETA 0:10)
This residue is correct.

Starting M216091 fft length = 12K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.00310, max error = 0.00415
Iteration 200, average error = 0.00341, max error = 0.00427
Iteration 300, average error = 0.00349, max error = 0.00415
Iteration 400, average error = 0.00351, max error = 0.00415
Iteration 500, average error = 0.00355, max error = 0.00430
Iteration 600, average error = 0.00355, max error = 0.00415
Iteration 700, average error = 0.00356, max error = 0.00421
Iteration 800, average error = 0.00358, max error = 0.00439
Iteration 900, average error = 0.00361, max error = 0.00452
Iteration 1000, average error = 0.00361 < 0.25 (max error = 0.00427), continuing test.
Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 12K, CUDALucas v2.04 Beta err = 0.0049 (0:01 real, 0.1077 ms/iter, ETA 0:21)
This residue is correct.

Starting M756839 fft length = 40K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.02011, max error = 0.02734
Iteration 200, average error = 0.02237, max error = 0.02637
Iteration 300, average error = 0.02292, max error = 0.02539
Iteration 400, average error = 0.02360, max error = 0.02734
Iteration 500, average error = 0.02393, max error = 0.03125
Iteration 600, average error = 0.02415, max error = 0.03125
Iteration 700, average error = 0.02428, max error = 0.02734
Iteration 800, average error = 0.02434, max error = 0.02591
Iteration 900, average error = 0.02448, max error = 0.03125
Iteration 1000, average error = 0.02455 < 0.25 (max error = 0.02832), continuing test.
Iteration 10000 M( 756839 )C, 0x5d2cbe7cb24a109a, n = 40K, CUDALucas v2.04 Beta err = 0.0352 (0:02 real, 0.2004 ms/iter, ETA 2:28)
This residue is correct.

Starting M859433 fft length = 48K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.00617, max error = 0.00830
Iteration 200, average error = 0.00695, max error = 0.00830
Iteration 300, average error = 0.00714, max error = 0.00830
Iteration 400, average error = 0.00722, max error = 0.00806
Iteration 500, average error = 0.00726, max error = 0.00806
Iteration 600, average error = 0.00732, max error = 0.00879
Iteration 700, average error = 0.00737, max error = 0.00928
Iteration 800, average error = 0.00742, max error = 0.00940
Iteration 900, average error = 0.00742, max error = 0.00842
Iteration 1000, average error = 0.00743 < 0.25 (max error = 0.00928), continuing test.
Iteration 10000 M( 859433 )C, 0x3c4ad525c2d0aed0, n = 48K, CUDALucas v2.04 Beta err = 0.0103 (0:02 real, 0.2172 ms/iter, ETA 3:02)
This residue is correct.

Starting M1257787 fft length = 64K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.07321, max error = 0.10156
Iteration 200, average error = 0.08017, max error = 0.09375
Iteration 300, average error = 0.08297, max error = 0.09863
Iteration 400, average error = 0.08498, max error = 0.10156
Iteration 500, average error = 0.08607, max error = 0.10156
Iteration 600, average error = 0.08709, max error = 0.10156
Iteration 700, average error = 0.08713, max error = 0.09570
Iteration 800, average error = 0.08771, max error = 0.10864
Iteration 900, average error = 0.08777, max error = 0.09961
Iteration 1000, average error = 0.08766 < 0.25 (max error = 0.09766), continuing test.
Iteration 10000 M( 1257787 )C, 0x3f45bf9bea7213ea, n = 64K, CUDALucas v2.04 Beta err = 0.1172 (0:03 real, 0.2629 ms/iter, ETA 5:25)
This residue is correct.

Starting M1398269 fft length = 72K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.05718, max error = 0.07617
Iteration 200, average error = 0.06297, max error = 0.07812
Iteration 300, average error = 0.06571, max error = 0.07812
Iteration 400, average error = 0.06654, max error = 0.08594
Iteration 500, average error = 0.06703, max error = 0.07812
Iteration 600, average error = 0.06736, max error = 0.07812
Iteration 700, average error = 0.06753, max error = 0.07422
Iteration 800, average error = 0.06797, max error = 0.08203
Iteration 900, average error = 0.06859, max error = 0.08203
Iteration 1000, average error = 0.06862 < 0.25 (max error = 0.07812), continuing test.
Iteration 10000 M( 1398269 )C, 0xa4a6d2f0e34629db, n = 72K, CUDALucas v2.04 Beta err = 0.0957 (0:03 real, 0.2993 ms/iter, ETA 6:53)
This residue is correct.

Starting M2976221 fft length = 160K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.03141, max error = 0.04297
Iteration 200, average error = 0.03608, max error = 0.04883
Iteration 300, average error = 0.03706, max error = 0.04211
Iteration 400, average error = 0.03796, max error = 0.04590
Iteration 500, average error = 0.03818, max error = 0.04297
Iteration 600, average error = 0.03842, max error = 0.04395
Iteration 700, average error = 0.03865, max error = 0.04883
Iteration 800, average error = 0.03869, max error = 0.04297
Iteration 900, average error = 0.03885, max error = 0.04492
Iteration 1000, average error = 0.03883 < 0.25 (max error = 0.04297), continuing test.
Iteration 10000 M( 2976221 )C, 0x2a7111b7f70fea2f, n = 160K, CUDALucas v2.04 Beta err = 0.0508 (0:06 real, 0.6127 ms/iter, ETA 30:13)
This residue is correct.

Starting M3021377 fft length = 160K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.04585, max error = 0.06250
Iteration 200, average error = 0.05122, max error = 0.06152
Iteration 300, average error = 0.05345, max error = 0.06055
Iteration 400, average error = 0.05395, max error = 0.06445
Iteration 500, average error = 0.05475, max error = 0.06543
Iteration 600, average error = 0.05517, max error = 0.06348
Iteration 700, average error = 0.05539, max error = 0.06445
Iteration 800, average error = 0.05569, max error = 0.06323
Iteration 900, average error = 0.05577, max error = 0.06250
Iteration 1000, average error = 0.05579 < 0.25 (max error = 0.06250), continuing test.
Iteration 10000 M( 3021377 )C, 0x6387a70a85d46baf, n = 160K, CUDALucas v2.04 Beta err = 0.0725 (0:06 real, 0.6098 ms/iter, ETA 30:35)
This residue is correct.[/code]
The test runs the first 10,000 iterations of all the Mersenne primes, though I cut it short because of the character limit.

This is also a GTX 460, though I didn't bother testing the same exponents with Prime95. Even so, you can see from the ETAs how well it scales. (The ETAs might be a few percent longer than they should be, especially for these smaller exponents -- the first 1,000 "careful" iterations take longer than average (depending on the user's settings).) The FFT takes the most time, and that's abstracted out to the cuFFT library from nVidia. 86243 is the (somewhat arbitrary) lower limit -- you'd have to ask msft what makes that the limit, and how much lower the code can actually go.
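To put a number on that scaling: an LL test of M(p) runs p-2 squaring iterations, so the total time is just (p-2) times the ms/iter figure. A quick sanity check against the benchmark output above (my own arithmetic, not part of CUDALucas):

```python
def ll_eta_minutes(p, ms_per_iter):
    # A Lucas-Lehmer test of M(p) = 2^p - 1 takes p - 2 squaring
    # iterations, so total runtime is (p - 2) * ms/iter.
    return (p - 2) * ms_per_iter / 1000.0 / 60.0

# From the benchmark above: M3021377 at 0.6098 ms/iter
print(round(ll_eta_minutes(3021377, 0.6098), 1))  # 30.7 -- matching the reported ETA of 30:35
```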

mdettweiler 2012-09-05 23:20

Now that's interesting. It seems that the latest CUDALucas scales quite a bit better to lower n's than llrCUDA does. I haven't had the opportunity to try some of the more recent llrCUDAs (the GTX 460 I was testing on bit the dust), but I do remember it scaled much less favorably at the low end. I just tried testing M216091 using LLR on a 2.2 GHz Phenom II and it finished in 45 seconds, versus the ~20 second estimate you got on your 460. Clearly, the GPU does quite well even at that small size; but when I tried testing a number of similar size before with llrCUDA, it took about 20-30 [I]minutes[/I].

Part of this may be due to the fact that CUDALucas is a much more mature program and has had more time for polishing and tweaking--llrCUDA's development stagnated rather early in the process due to lack of significant interest (since the speed boost wouldn't have been as pronounced on typical LLR-sized numbers even if it matched CUDALucas's speed). That, and--as far as I understand it--it seemed llrCUDA scaled particularly badly as k increased, which is not so much the case on the CPU, where n's effect is far more pronounced.

That said, if someone with the appropriate knowledge picks up the development of llrCUDA at some point in the future, they may yet be able to get it up to more typical performance levels like those of CUDALucas or GeneferCUDA. I suspect that its ultimate potential hasn't been tapped nearly as fully as that of either of the latter programs.

henryzz 2012-09-06 10:36

[QUOTE=CGKIII;310426]Thanks. In C10, you have a "magic number" of 1.781. Can you explain that to me?

Additionally, B13 says the form is k*2^n +/- 1, but C4 is an input for a different base. Should B13 read: k*b^n...? Or are there other adjustments necessary for different bases?[/QUOTE]
It certainly should say k*b^n... I don't have a clue how that has gone unnoticed for so long.

1.781 is an approximation of exp(γ), where γ ≈ 0.5772 is the Euler-Mascheroni constant: [URL]http://www.wolframalpha.com/input/?i=exp%28euler%27s+constant%29[/URL]
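In other words (a quick Python sketch; this is the standard Mertens-based heuristic, and the function name is my own): a candidate N that survives sieving to depth P is prime with probability roughly exp(γ)·ln(P)/ln(N), which is where the 1.781 enters the spreadsheet.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler-Mascheroni constant
MAGIC = math.exp(EULER_GAMMA)      # ~1.7810724, the "1.781" in cell C10

def prime_chance(k, b, n, sieve_depth):
    # Heuristic chance that a sieved candidate k*b^n +/- 1 is prime:
    # ~1/ln(N) for a random number near N, boosted by a factor of
    # exp(gamma) * ln(P) once trial factoring to depth P has removed
    # the candidates with small factors (Mertens' theorem).
    ln_N = math.log(k) + n * math.log(b)
    return MAGIC * math.log(sieve_depth) / ln_N

print(round(MAGIC, 4))  # 1.7811
```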

henryzz 2012-09-06 13:31

1 Attachment(s)
I made a new version of the odds-of-prime spreadsheet. I have included CRUS support, so now you can get estimates of how many primes will be found, accounting for k removal. The only problem with this is that each k is treated as having the same weight as the others.

Please post any suggestions for improvements. If anyone fancies making it colourful I would be grateful. It is beginning to look a little cluttered.

CGKIII 2012-09-12 19:08

What sort of project-level data are available? The stats pages are fantastic, but is there any way to get at the underlying data? (Receiving a snapshot of the current database would be sufficient, real-time updates aren't necessary).

My main interest is basically...
Does the following exist?
For each base b, for each k < (conjectured k), the min(n) for which k*b^n (+/-) 1 is prime?

Heck, even graphing b vs. conjectured k would be interesting, but I'd rather not have to manually type all the numbers into a spreadsheet.

I think that would be really interesting to look at / toy around with. Additionally, it could lend itself to a (thinking very long term) double-check effort, since it seems like on multiple occasions, manual efforts (even pretty well-run ones like this) can have some errors, due to all sorts of things.

If this doesn't exist, is there any way to start tracking / maintaining it, and consolidating what we can piece together?
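To make concrete the kind of table I mean, here's a toy sketch (my own illustration -- tiny limits and trial-division primality only, nothing like what a real tabulation would use):

```python
def is_prime(m):
    # Plain trial division: fine for these toy numbers only.
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def min_prime_n(k, b, c, n_max):
    # Smallest n with k*b^n + c prime, or None if none found up to n_max.
    for n in range(1, n_max + 1):
        if is_prime(k * b**n + c):
            return n
    return None

# e.g. Riesel side (c = -1), base 5, a few small k's:
table = {k: min_prime_n(k, 5, -1, 50) for k in range(2, 8)}
# Odd k's give k*5^n - 1 even, so they can never yield a prime here.
```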

rogue 2012-09-12 19:44

1 Attachment(s)
The attached is freely available, but I don't recall where. Only a couple of people were responsible for collecting the list, and they used a program written by one of them (source available in the forum somewhere). It is possible that some of the conjectures with conjectured k > 2^32 actually have a lower conjectured k, due to potential overflows in the program that computed them.

There are two possible double-checks that I can think of. The first is to modify the program mentioned above to avoid overflows (or write something different). The second is to verify that all reported primes for a conjecture are truly prime.
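The second double-check is easy to script, at least for the smaller reported primes. A sketch (my own illustration -- a real double-check would use PFGW or LLR, and the function names here are made up):

```python
def is_probable_prime(m, bases=(2, 3, 5, 7)):
    # Miller-Rabin with fixed bases {2,3,5,7}: deterministic for
    # m < 3,215,031,751, and a strong probable-prime test beyond that.
    if m < 2:
        return False
    for p in (2, 3, 5, 7):
        if m % p == 0:
            return m == p
    d, s = m - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, m)
        if x in (1, m - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, m)
            if x == m - 1:
                break
        else:
            return False
    return True

def check_reported(k, b, n, c):
    # c = +1 for the Sierpinski side, -1 for the Riesel side.
    return is_probable_prime(k * b**n + c)

print(check_reported(3, 2, 6, -1))  # 3*2^6 - 1 = 191 is prime: True
```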

henryzz 2012-09-12 20:33

[QUOTE=rogue;311314]The attached is freely available, but I don't recall where. Only a couple of people were responsible for collecting the list and they used a program written by one of them (source available in the forum somewhere). It is possible that some of the conjectures with conjectured k > 2^32 have lower conjectured k due to the potential of overflows in the program that computed the conjectured k.

There are two possible double-checks that I can think of. The first is to modify the program mentioned above to avoid overflows (or write something different). The second is to verify that all reported primes for a conjecture are truly prime.[/QUOTE]

At some point it would be nice to create a database that can hold all the data required to prove each base.

rogue 2012-09-12 21:18

[QUOTE=henryzz;311320]At some point it would be nice to create a database that can hold all the data required to prove each base.[/QUOTE]

That is many terabytes of data, possibly as much as a petabyte.

henryzz 2012-09-12 21:39

[QUOTE=rogue;311323]That is many terabytes of data, possibly as much as a petabyte.[/QUOTE]

Eventually. We have yet to generate most of it. By the time we do, storing it will be possible.

rogue 2012-09-12 23:16

[QUOTE=henryzz;311333]Eventually. We have yet to generate a lot of it yet. By the time we do storing it will be possible.[/QUOTE]

I see little benefit in storing most of it, as much of it can be easily regenerated.

