mersenneforum.org  

Old 2016-04-26, 19:54   #12
pepi37
 
Dec 2011
After milion nines:)

1,451 Posts

Quote:
Originally Posted by Jean Penné
Yes, this is a new feature; it was requested by users searching k*b^n+c candidates for the first n giving a prime, across a set of several k values. These users run with the "StopOnPrimedk=1" option, so the .ini file may become large and need a backup...

Regards,
Jean
Thanks, Jean.
I also use the StopOnPrimedk=1 option, which is why I noticed this behavior :)
Old 2017-01-29, 19:15   #13
Batalov
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

2·47·101 Posts
Multi-threaded LLR is now a reality

A proposal for the future LLR release 3.8.18:

I implemented and tested a proof-of-concept build of LLR with added parallelism, and the test was successful.

The only change needed is that the line
gwset_num_threads (gwdata, IniGetInt(INI_FILE, "ThreadsPerTest", 1));
must be inserted a few lines below each gwinit (gwdata); but before the call to gwsetup (gwdata, ...).
There are probably a dozen such places across the separate prime-form setups. At the moment I was only interested in speeding up the pminus1 test for the special ABCDN case*, so that is the only place where I inserted this line in my copy of the code.
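As a rough illustration of the ordering described above, here is a minimal sketch in C. It does not link against gwnum: mock_gwhandle, mock_gwinit, mock_gwset_num_threads, mock_gwsetup and threads_per_test are hypothetical stand-ins invented for this example, and the rule that a thread count is rejected after setup is an assumption of the mock, not documented gwnum behavior. Only the call order (init, set thread count, setup) is taken from the post.

```c
#include <assert.h>
#include <string.h>

/* Mock stand-ins, NOT the real gwnum API: they only record state so the
   ordering constraint can be checked without linking against gwnum. */
typedef struct {
    int initialized;           /* set by mock_gwinit */
    int setup_done;            /* set by mock_gwsetup */
    unsigned long num_threads; /* thread count in effect at setup time */
} mock_gwhandle;

void mock_gwinit(mock_gwhandle *h) {
    memset(h, 0, sizeof *h);
    h->initialized = 1;
    h->num_threads = 1;        /* single-threaded unless told otherwise */
}

/* Returns 0 on success, -1 if called before init, after setup, or with 0. */
int mock_gwset_num_threads(mock_gwhandle *h, unsigned long n) {
    if (!h->initialized || h->setup_done || n == 0) return -1;
    h->num_threads = n;
    return 0;
}

/* Returns 0 on success, -1 if the handle was never initialized. */
int mock_gwsetup(mock_gwhandle *h) {
    if (!h->initialized) return -1;
    h->setup_done = 1;
    return 0;
}

/* Hypothetical stand-in for IniGetInt(INI_FILE, "ThreadsPerTest", 1):
   returns `configured` when positive, otherwise the default of 1. */
unsigned long threads_per_test(long configured) {
    return configured > 0 ? (unsigned long)configured : 1UL;
}

/* The call order described in the post: init, set threads, then setup. */
unsigned long init_with_threads(mock_gwhandle *h, long configured) {
    int rc;
    mock_gwinit(h);
    rc = mock_gwset_num_threads(h, threads_per_test(configured));
    assert(rc == 0);
    rc = mock_gwsetup(h);
    assert(rc == 0);
    (void)rc;                  /* keeps NDEBUG builds free of unused warnings */
    return h->num_threads;
}
```

In LLR itself the value would come from the IniGetInt call quoted above rather than from a function argument, and the same three-step order would be repeated at each setup site.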

It would probably be useful to be able to set ThreadsPerTest not only from llr.ini (or with the inline definition -oThreadsPerTest=4) but also with a simple shortcut, -t 4
_________________
*ABCDN is the internal designation for the N-1 provable form $a^$b-$a^$c+1, which apparently includes a^(2*N)-a^N+1. In this case N must be 3-smooth for a candidate to have a chance to be prime.
For this particular form, there is also a more extensive patch that would allow an all-complex, right-sized FFT to replace the generic FFT call. I will release this patch later.
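The 3-smooth condition in the footnote is easy to pre-filter. If a prime p > 3 divides N, then with x = a^(N/p) the value x^(2p)-x^p+1 is divisible by x^2-x+1, so the candidate is composite. A small sketch in C, where is_3_smooth is a name made up for this example:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if n has no prime factors other than 2 and 3, else 0. */
int is_3_smooth(uint64_t n) {
    assert(n > 0);
    while (n % 2 == 0) n /= 2;   /* strip all factors of 2 */
    while (n % 3 == 0) n /= 3;   /* strip all factors of 3 */
    return n == 1;               /* anything left is a prime factor > 3 */
}
```

Exponents that fail this check can be skipped before any test is run.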
Attached Files
File Type: zip patch-parallel-llr.zip (1.4 KB, 83 views)

Last fiddled with by Batalov on 2017-01-29 at 21:25 Reason: (attached the patch along the described lines)
Old 2017-01-31, 15:32   #14
Jean Penné
 
May 2004
FRANCE

24416 Posts
Multithreading in LLR

Quote:
Originally Posted by Batalov
A proposal for the future LLR release 3.8.18:

I implemented and tested a proof-of-concept build of LLR with added parallelism, and the test was successful.

The only change needed is that the line
gwset_num_threads (gwdata, IniGetInt(INI_FILE, "ThreadsPerTest", 1));
must be inserted a few lines below each gwinit (gwdata); but before the call to gwsetup (gwdata, ...).
There are probably a dozen such places across the separate prime-form setups. At the moment I was only interested in speeding up the pminus1 test for the special ABCDN case*, so that is the only place where I inserted this line in my copy of the code.

It would probably be useful to be able to set ThreadsPerTest not only from llr.ini (or with the inline definition -oThreadsPerTest=4) but also with a simple shortcut, -t 4
_________________
*ABCDN is the internal designation for the N-1 provable form $a^$b-$a^$c+1, which apparently includes a^(2*N)-a^N+1. In this case N must be 3-smooth for a candidate to have a chance to be prime.
For this particular form, there is also a more extensive patch that would allow an all-complex, right-sized FFT to replace the generic FFT call. I will release this patch later.

Hi Serge,

Many congratulations on this nice work, and also on your amazing successes with Generalized Uniques!
I downloaded your patch and will immediately begin work on Version 3.8.18!
I also need to know what extra code you are using with LLR as a proving program for GUs, so that I can include it in this next version.
Best Regards,
Jean
Old 2017-01-31, 16:03   #15
Batalov
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

9,494 Posts

I used to have a patch to LLR that runs the PRP test for N = a^(2m) ∓ a^m + 1 using the modulus M = a^(3m) ± 1 (because N | M) for the exponentiation, with only one reduction modulo N at the end. For a year now this has been part of the official Prime95 source (enabled under PhiExtensions=1); it uses a fast all-complex FFT of size exactly 3m. Alone or with Ryan, we run this official P95 binary and then validate hits with the N-1 test in LLR.

I will put together the N-1 patch for LLR that does the same, maybe next weekend. It will run these proofs faster still, but the current state of the test is already acceptably fast in parallel mode as is, and serves as a tangibly different test because a generic FFT of a different size (>3m) is used.
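The modulus trick works because a^(3m)+1 = (a^m+1)(a^(2m)-a^m+1), so N divides M and any residue computed modulo M is still correct after a final reduction modulo N. Here is a toy sketch in C on 64-bit integers, covering only the N = a^(2m)-a^m+1 sign choice. The function names are invented for this example, and the real code works on very large numbers with FFT multiplication, not with the % operator as here.

```c
#include <assert.h>
#include <stdint.h>

/* (x * y) mod m, restricted to m < 2^32 so the product fits in 64 bits. */
uint64_t mulmod(uint64_t x, uint64_t y, uint64_t m) {
    assert(m > 0 && m < (1ULL << 32));
    return ((x % m) * (y % m)) % m;
}

/* base^exp mod m by binary exponentiation. */
uint64_t powmod(uint64_t base, uint64_t exp, uint64_t m) {
    uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = mulmod(result, base, m);
        base = mulmod(base, base, m);
        exp >>= 1;
    }
    return result;
}

/* a^e as an exact integer; the caller must keep the result below 2^64. */
uint64_t ipow(uint64_t a, unsigned e) {
    uint64_t r = 1;
    while (e--) r *= a;
    return r;
}

/* x^(N-1) mod N for N = a^(2m) - a^m + 1, computed by exponentiating
   modulo M = a^(3m) + 1 and reducing modulo N only once at the end. */
uint64_t residue_via_M(uint64_t x, uint64_t a, unsigned m) {
    uint64_t N = ipow(a, 2 * m) - ipow(a, m) + 1;
    uint64_t M = ipow(a, 3 * m) + 1;
    assert(M % N == 0);   /* N | M, so mod-M results stay valid mod N */
    return powmod(x, N - 1, M) % N;
}

/* The same residue computed directly modulo N, for comparison. */
uint64_t residue_direct(uint64_t x, uint64_t a, unsigned m) {
    uint64_t N = ipow(a, 2 * m) - ipow(a, m) + 1;
    return powmod(x, N - 1, N);
}
```

For a = 2, m = 2 this gives N = 13 and M = 65, and both routes return a residue of 1 for base 3. The benefit in the real setting is that arithmetic modulo a^(3m) ± 1 has a special form, while N itself needs a generic reduction.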
Old 2017-01-31, 21:53   #16
pepi37
 
Dec 2011
After milion nines:)

1,451 Posts

Do you know what the gain is when you use 2, 3, or 4 cores instead of one? I hope it is near the theoretical limit. Prime95 handles multi-core calculation very well.
Old 2017-01-31, 22:19   #17
VBCurtis
 
"Curtis"
Feb 2005
Riverside, CA

2^2·1,217 Posts

The gain will be very similar to Prime95 per-core, as it's using the same code from gwnum. However, the gain is a function of the size of the input; prime95 does not gain nearly as much on a triple-check of something near 1M digits, and LLR won't either.
Old 2017-02-01, 09:25   #18
pepi37
 
Dec 2011
After milion nines:)

1,451 Posts

Quote:
Originally Posted by VBCurtis
The gain will be very similar to Prime95 per-core, as it's using the same code from gwnum. However, the gain is a function of the size of the input; prime95 does not gain nearly as much on a triple-check of something near 1M digits, and LLR won't either.
If it is similar to, or the same as, Prime95, I will be happy.
Old 2017-02-01, 12:49   #19
mackerel
 
Feb 2016
UK

3×5×29 Posts

Based on my previous observations on Prime95 benchmark results and FFT size vs. CPU L3 cache size, there will be a sweet spot of benefit from multiple threads, and outside that there will be no gain or even a degradation in performance.

For a single task with multiple workers, the optimum seems to be when the task being worked on fills but does not exceed the L3 cache. So for a typical i7 with 8MB of L3, that's around a 1024k FFT size. Above that, performance drops toward the RAM bandwidth limit. Below that, it holds up quite well before dropping. Small FFT tasks can be much slower, presumably due to overhead.

So when this is available, it will complicate matters a bit. Multiple small FFT tasks that together fit in the processor's L3 cache can continue to be run separately for maximum performance. For FFTs of perhaps 512k to 1024k (for an 8MB cache CPU; scale down for CPUs with less cache), running multi-threaded would give a nice throughput boost over running separate tasks. How much benefit depends on how much you are otherwise held back by RAM bandwidth. Above that range, there may be a small benefit, and I think I'd prefer to finish one task 4x as fast rather than 4 tasks separately.
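To make the rule of thumb above concrete, here is one possible reading of it in C. It assumes 8 bytes per FFT element (so a 1024k FFT is about 8 MB, matching the figure above) and ignores any auxiliary tables; fft_bytes, suggest_mode and the run_mode names are hypothetical, and the thresholds are a guess at the heuristic, not measurements.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    RUN_SEPARATE_TASKS,           /* one single-threaded task per core */
    RUN_MULTITHREADED,            /* one task spread over all cores */
    RUN_MULTITHREADED_SMALL_GAIN  /* still one task, but RAM-bound */
} run_mode;

/* Approximate working set of one test, ASSUMING 8 bytes (one double)
   per FFT element and ignoring gwnum's auxiliary tables. */
size_t fft_bytes(size_t fft_len_k) {
    return fft_len_k * 1024u * 8u;
}

/* Rule of thumb from the post, not a measured model:
   - all concurrent tasks fit in L3 together: keep separate tasks;
   - one task fits in L3 but the whole set does not: multithread it;
   - one task exceeds L3: multithreading may still help a little. */
run_mode suggest_mode(size_t fft_len_k, size_t l3_bytes, unsigned cores) {
    assert(cores > 0);
    size_t one = fft_bytes(fft_len_k);
    if (one * cores <= l3_bytes) return RUN_SEPARATE_TASKS;
    if (one <= l3_bytes) return RUN_MULTITHREADED;
    return RUN_MULTITHREADED_SMALL_GAIN;
}
```

With 4 cores and 8 MB of L3, this suggests separate tasks at 128k, a single multi-threaded task at 1024k, and only a small gain at 2048k. Actual crossover points would need benchmarking on the CPU in question.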

Obviously, once it has been implemented it will need testing, and I probably won't be the only one doing that. I mostly run tasks at PrimeGrid, and there are many subprojects there that could fit into the sweet spot.
All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.