#12 | |
|
Dec 2011
After milion nines:)
1,451 Posts |
Quote:
I also use the option StopOnPrimedk=1, which is why I noticed this behavior :)
#13 |
|
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
2·47·101 Posts |
A proposal for the future LLR release 3.8.18:
I implemented and tested a proof-of-concept build of LLR with added parallelism, and the test was successful. The only change needed is that

gwset_num_threads (gwdata, IniGetInt(INI_FILE, "ThreadsPerTest", 1));

be inserted a few lines below each gwinit (gwdata); but before the call to gwsetup (gwdata, ...). There are probably a dozen such places across the separate prime-form setups. At the moment I was only interested in speeding up the pminus1 test for the special ABCDN case*, so that is the only place where I inserted this line in my copy of the code.

There would probably be demand for setting ThreadsPerTest not only from llr.ini (or from the inline definition -oThreadsPerTest=4) but also with a simple shortcut, -t 4.
_________________
*ABCDN is the internal designation for the N-1 provable form $a^$b-$a^$c+1, which apparently includes a^(2*N)-a^N+1. In this case N must be 3-smooth for a candidate to have a chance to be prime. For this particular form, there is also a more extensive patch that would allow an all-complex, rightly-sized FFT to replace the generic FFT call. I will release this patch later.

Last fiddled with by Batalov on 2017-01-29 at 21:25. Reason: (attached the patch along the described lines)
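To make the three ways of setting the thread count concrete, here is a minimal Python sketch of how they could be resolved into one value. It is not LLR code: resolve_threads, the dict standing in for llr.ini, and the rule that command-line options override the ini file (last one given wins) are all assumptions made for illustration. Only the key name ThreadsPerTest, the default of 1, and the spellings -oThreadsPerTest=4 and -t 4 come from the post above.

```python
def resolve_threads(argv, ini):
    """Hypothetical helper: pick the thread count for one test.

    Assumed rule (not confirmed by LLR): start from the llr.ini value
    (default 1, as in the IniGetInt call), then let command-line options
    override it, with the last option given winning.
    """
    threads = int(ini.get("ThreadsPerTest", 1))
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg.startswith("-oThreadsPerTest="):
            # Inline ini definition, e.g. -oThreadsPerTest=4
            threads = int(arg.split("=", 1)[1])
        elif arg == "-t" and i + 1 < len(argv):
            # Proposed shortcut, e.g. -t 4
            threads = int(argv[i + 1])
            i += 1  # skip the value that belongs to -t
        i += 1
    return max(1, threads)  # never return fewer than one thread


if __name__ == "__main__":
    print(resolve_threads(["-t", "4"], {"ThreadsPerTest": "2"}))
```

The resolved value would then play the role of the second argument to gwset_num_threads in the line quoted above.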
#14 | |
|
May 2004
FRANCE
244₁₆ Posts |
Quote:
Many congrats for this nice work, and also for your amazing successes on Generalized uniques!

I downloaded your patch and will immediately begin work on version 3.8.18! I also need to know what extra code you are using with LLR as a proving program for GUs, so that I can insert it in this next version.

Best regards,
Jean
#15 |
|
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
9,494 Posts |
I used to have a patch to LLR that runs the PRP test for N = a2m
I will put together the N-1 patch for LLR that does the same, maybe next weekend. It will run these proofs faster still, but the current state of the test is already acceptably fast in parallel mode as is, and serves as a tangibly different test because a generic FFT of a different size (>3m) is used. |
#16 |
|
Dec 2011
After milion nines:)
1,451 Posts |
Do you know what the gain is when you use 2, 3 or 4 cores instead of one? I hope it is near the theoretical limit. Prime95 does multi-core calculation very well.
|
#17 |
|
"Curtis"
Feb 2005
Riverside, CA
2²·1,217 Posts |
The gain will be very similar to Prime95 per-core, as it's using the same code from gwnum. However, the gain is a function of the size of the input; prime95 does not gain nearly as much on a triple-check of something near 1M digits, and LLR won't either.
|
#18 | |
|
Dec 2011
After milion nines:)
1,451 Posts |
Quote:
#19 |
|
Feb 2016
UK
3×5×29 Posts |
Based on my previous observations on Prime95 benchmark results and FFT size vs. CPU L3 cache size, there will be a sweet spot of benefit from multiple threads, and outside that there will be no gain or even a degradation in performance.
For a single task with multiple workers, the optimum point seems to be when the task being worked on fills but does not exceed the L3 cache. So for a typical i7 with 8MB, that's around 1024k FFT size. Above that, performance drops down towards the RAM bandwidth limitation. Below that, it holds up quite well before dropping. Small FFT tasks can be much slower, presumably due to overheads.

So when this is available, it will complicate matters a bit. Multiple small FFT tasks which in total fit in the processor's L3 cache can continue to be run as such for maximum performance. For FFTs from perhaps 512k to 1024k (for an 8MB cache CPU; scale down for CPUs with less cache), running multi-threaded would give a nice throughput boost over running separate tasks. How much benefit depends on how much you are otherwise held back by RAM bandwidth. Above that, there may be a small benefit, and I think I'd prefer to finish one task 4x as fast rather than 4 tasks separately.

Obviously once it has been implemented it would need testing, and I probably won't be the only one doing that. I mostly do tasks at PrimeGrid, and there are many subprojects there that could fit into the sweet spot.
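The cache arithmetic behind that rule of thumb can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each FFT element is taken to be one 8-byte double, auxiliary tables are ignored, and the function names and the three categories are hypothetical, not anything from Prime95 or LLR. The boundaries are a simplification of the observations above, not measured results.

```python
BYTES_PER_FFT_WORD = 8  # assumption: one double-precision value per FFT element


def fft_working_set_bytes(fft_len_k):
    """Approximate data size of one FFT of length fft_len_k * 1024 elements."""
    return fft_len_k * 1024 * BYTES_PER_FFT_WORD


def suggest_mode(fft_len_k, l3_bytes, cores=4):
    """Hypothetical classifier following the rule of thumb in the post."""
    size = fft_working_set_bytes(fft_len_k)
    if size * cores <= l3_bytes:
        # One task per core, and all of them together still fit in L3.
        return "separate tasks"
    if size <= l3_bytes:
        # A single task fits in L3, but one task per core would not.
        return "multi-threaded (sweet spot)"
    # Even a single task spills out of L3, so RAM bandwidth dominates.
    return "multi-threaded (RAM-bound, small gain)"


if __name__ == "__main__":
    l3 = 8 * 1024 * 1024  # 8MB L3, as in the i7 example
    for k in (128, 512, 1024, 2048):
        print(k, fft_working_set_bytes(k), suggest_mode(k, l3))
```

Under these assumptions a 1024k FFT occupies exactly 8 MiB, which is where the "1024k for an 8MB cache" figure comes from.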