mersenneforum.org  

Go Back   mersenneforum.org > New To GIMPS? Start Here! > Information & Answers

Reply
 
Thread Tools
Old 2020-06-21, 14:00   #12
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3·23·61 Posts
Default

Here's my take on what may be required. I'd be very interested in what George, Ernst, Preda, and the rest of the approximately 18 programmers who created the commonly used several GIMPS cpu and gpu applications have to say about how they did it. This should not be taken as discouragement for new entrants, but as recognition of what the authors of existing code have done.

Realistically consider whether you have what it takes and are willing to pay the price it takes. Commitment, great skills, best algorithms, best implementations, combined.

Become a world-class programmer. Sort of like an Olympian athlete. Train ruthlessly and constantly. Innate ability required. Program a lot. Learn from the best work of others. Read the source code.

Be open, even eager, for constructive criticism. Remove ego and emotion from decisions. Use excellent rational judgment.

Identify the best available mathematical algorithms for the task you're tackling. Learn them.
For Woltman this probably involved library research in physical books, identifying experts in mathematics and number theory, by library, email, conferences, etc, asking questions, being humble about having blind spots and getting help identifying them. He did this in the WWW's infancy, by 1995. There was nothing like this forum then. Perhaps mailing lists.

Learn continually. Stay current in the relevant areas. Learn well the programming languages and other tools.
Know a high level language well, how to combine it with assembler, how to structure a program well, how to determine what parts are taking how much time (profiler), and especially when is too soon to optimize.

Be ruthless about performance. Write the same functionality many ways and measure their actual performance. Start with what the processor documentation indicates would be fast, and try many different ways too. Know and use optimization techniques. Use different code branches optimized for different processor models, different operand inputs, etc.

Be more ruthless about correctness of the program's operation. Include multiple error checks in the software. Include complementary checks in the project that uses the software. Produce outputs that can be compared to known-good outputs of other programs. Compare them. Frequently. Search the world for independently produced data for what's correct output and allies that can help ensure correctness.

Give the software away for free, including source code, create a community that uses it, accept and welcome any useful coding, testing, etc help that appears.

Sustained commitment. Efficient use of your own time. Spend large amounts of time well focused and well directed, over a quarter century in the case of prime95.
kriesel is offline   Reply With Quote
Old 2020-06-21, 14:08   #13
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3·23·61 Posts
Default

Quote:
Originally Posted by a1call View Post
Develop a module that can perform this operation in a greatly expedited manner (which I believe is possible) and then you don't need to worry about any of the rest of hardware/software used. You will outperform any general-purpose highly efficient supercomputer.
Bottomline is that addressing the single most time consuming operation is more effective than all the other software/hardware optimization (since it consumes weeks of delay at its best for current number ranges).
Just my 2 cents worth.
Thanks for your comments.
Do you know why current chips are small?
As a thought experiment long ago, there was a discussion on a thread about what it would take to implement something like the PRP test algorithm in hardware, such that it could clock exponent values in one port and primality test results out another every clock cycle or two. As I recall the conclusion was tiling the area of a small US state, including distributed solar cells to power it. How to reconfigure continuously as exponents increased and longer fft length was required was left as an exercise to the reader.
There has been some preliminary discussion of FPGAs or ASICs. My guess is they're more suitable for TF than P-1 or primality testing. The development time and cost is considerable.

Last fiddled with by kriesel on 2020-06-21 at 16:17
kriesel is offline   Reply With Quote
Old 2020-06-21, 16:47   #14
S485122
 
S485122's Avatar
 
Sep 2006
Brussels, Belgium

2·52·31 Posts
Default

There is a document on the the GIMPS site, I think by George, that shows some light on what is involved in a software optimisation process : P4 Notes.

Jacob

Last fiddled with by S485122 on 2020-06-21 at 16:47 Reason: the "4" was gone (or more probably I did'nt type it)
S485122 is offline   Reply With Quote
Old 2020-06-21, 19:07   #15
kruoli
 
kruoli's Avatar
 
"Oliver"
Sep 2017
Porta Westfalica, DE

22·61 Posts
Default

Quote:
Originally Posted by xilman View Post
0) First get it right, then get it fast.
[...]
1) Make sure you are using fast algorithms.
[...]
2) Instrument your code to find out where it is spending all of its time.


It really boills down to that (yes, other comments are helpful as well, of course). When I started to "understand" what Prime95 does, I first had to learn that nearly every step can have a substantial optimization. But when I tried to code something basic myself, my basic code was unstable. So I'll really recommend this procedure and you'll can always look up on our pages for optimizations.
kruoli is online now   Reply With Quote
Old 2020-06-21, 19:27   #16
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

155458 Posts
Default

Quote:
Originally Posted by M344587487 View Post
you will end up with less hair than you started with and probably a drinking problem.
LOL, guilty.

Quote:
Originally Posted by makis View Post
I also wanted to ask. Let's say that I won't use any assembly. How much am I leaving on the table in terms of performance? Is that even quantifiable?
Mlucas FFTs are written in C, Prime95 in assembly. Try looking in the Mlucas subforum as Ernst has often benchmarked Mlucas against Prime95. Prime95 is not twice as fast as Mlucas.
Prime95 is offline   Reply With Quote
Old 2020-06-21, 21:00   #17
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

420910 Posts
Default

Quote:
Originally Posted by S485122 View Post
There is a document on the the GIMPS site, I think by George, that shows some light on what is involved in a software optimisation process : P4 Notes.

Jacob
Dating back to the Pentium 4 and some ancient dialect of MS Word. When George decides to write his memoirs instead of more number theory code (perish the thought), maybe he could start with an update of this through AVX512 or whatever's been reached by then.

As to hair, that or rather its departing happens regardless of coding or not, depending mainly on maternal line. (Although pulling it out during programming does not help matters.)

Last fiddled with by kriesel on 2020-06-21 at 21:01 Reason: accursed excess line feeds
kriesel is offline   Reply With Quote
Old 2020-06-22, 05:52   #18
carpetpool
 
carpetpool's Avatar
 
"Sam"
Nov 2016

13·23 Posts
Post

Apart from what others have said, there's nothing stopping you from creating your own algorithm, especially for multiplication. You will want to look at running times of existing algorithms such as FFT, which currently runs in O( ln(n) ) time is my understanding. So reducing multiplication to O ( ln(n)^(1/2) ) or even O ( ln(ln(n)) ) time would definitely speed computations up. I don't think that will happen anytime soon, however.

Also out of curiosity, suppose you had to write your own libraries from scratch, will writing everything in C give best performance (starting from FFT algorithms, to a PRP test involving binary exponentiation)? The advantage of writing such libraries is you could easily* write code using CUDA compiler or equivalent, so that the program takes advantage of a GPU, which are faster than CPUs (most of the time anyways).
carpetpool is offline   Reply With Quote
Old 2020-06-22, 16:04   #19
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3·23·61 Posts
Default

Quote:
Originally Posted by carpetpool View Post
reducing multiplication to O ( ln(n)^(1/2) ) or even O ( ln(ln(n)) ) time would definitely speed computations up. I don't think that will happen anytime soon, however.
Agreed as to timing. Schonhage-Strassen multiplication by fft is O(n ln n ln ln n). So is Crandall's IBDWT improvement on it, used in all the state of the art GIMPS primality test programs.

All known multiplication algorithms are O (n) or higher, where n is number of bits or words required. Knuth's O(n) is a special case that requires hardware capable of processing n bits wide. Hardware operating 108 to 232 bits wide would be quite large and probably would not clock very fast. IIRC it's been conjectured the lower bound is O(n log n). Various recent papers have closely approached that for very large n, much larger than 232. Since primality tests are performed by ~n iterations of n bits width, they are O(n2 ln n ln ln n). That works out to be around O(n2.1) for n of current interest to finding new Mersenne primes.

See https://www.mersenneforum.org/showpost.php?p=510721&postcount=7 for a more detailed writeup and links to the various publications.

Last fiddled with by kriesel on 2020-06-22 at 16:10
kriesel is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
GGNFS not being developed? jux Factoring 4 2015-12-27 03:05

All times are UTC. The time now is 13:06.

Fri Aug 7 13:06:57 UTC 2020 up 21 days, 8:53, 1 user, load averages: 2.96, 2.67, 2.43

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.