[QUOTE=Batalov;451656]
Perhaps it's time to hack the multithreaded llr binary. :rolleyes: [/QUOTE] Go for it -- you have 4-5 days to achieve it. :tu: |
[QUOTE=paulunderwood;451657]Any better sleuths out there? :hello:[/QUOTE]
Yes. It will be @ #12 (see the TOP12th bit?), so #digits > 4053946 |
[QUOTE=Batalov;451656]P.S. [I]Cyclo[/I] is not written for this form: 393216 is not a power of 2, but 3 times a power of 2[/QUOTE]
Have you checked with Yves whether it is theoretically possible to modify Cyclo to efficiently support these extensions? |
Yes, but progress has been slow. He wrote this about his mainstream research:
[QUOTE]For the time being, I’m working on optimizing my [GFN OCL] transform on GTX 1080.[/QUOTE]
As for extending to Phi(2^u*3^v,b), he wrote much earlier:
[QUOTE]first I would like to build a basic (theoretical) program able to test any 2^u.3^v, i.e. generic radix 2 and 3 stages. If a radix 3 stage is available, any 2^u.3^v can be tested. But optimized versions can certainly not be generic and I may not write all of them. [/QUOTE] |
[QUOTE=paulunderwood;451658]Go for it -- you have 4-5 days to achieve it. :tu:[/QUOTE]
I went for it... and now the ETA is only 30 hrs. :whee: The initial patch has been sent to Jean. He can make consistent code changes all over llr and provide API hooks for ThreadsPerTest, such as extra options on the command line (e.g. [B]llr -t 4[/B]) and in llr.ini (e.g. [B]ThreadsPerTest=4[/B]) |
Your patch comes as my LLR efforts nearly all exceed 1M digits, and I wish for two-threaded LLR frequently. Thank you very very much, sir!
|
Do you think that a multi-threaded LLR will improve throughput, considering the impact on cache of tests on large numbers? Would multi-socket boxes need to use utilities such as [c]taskset[/c]?
|
For some very large candidates, 2- or 4- threaded LLR may become handy even in the search process (like prime95), not just validation runs. Once this change is deployed, a bit of benchmarking and timing will be useful. Note that LLR has many modes in addition to Riesel and Proth; many of them may have different scaling behaviors.
[U]A trivial disclaimer[/U] (so that no one expects miracles): The scaling for a very large number of threads (e.g. 16 or 32) should not be expected to be very impressive. The scaling factor still goes up but almost levels off at about 5x, maybe 6x. However, for 2 or 4 threads, this is likely going to be a contender to a conventional run. |
A small tweak about affinities:
I rented one of my trusted instances (c4.8xlarge) because, though it has fewer cores than m4.16xlarge, they are faster, and because scaling is almost flat between 18 and 32 cores. One experiment that I ran in real time: while observing the effective speed of the test (ms / iter), I played two scenarios against each other:
1. assign affinities 0-17 (all physical cores used once, with h/t functionality left idle)
2. assign affinities 0-8, 18-26 (physical and hyper cores on only one of the two 9-core chips; there are two 9-core Xeon(R) CPU E5-2666 v3 @ 2.90GHz in this instance)
Scenario 2 wins by about 12%. This is specific to a 2-chip board like this one and signals that memory transport between chips is less effective than keeping the data in the cache of one CPU. More testing will be needed when the new LLR binary is released. |
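For anyone wanting to repeat the affinity experiment, here is a minimal sketch of how the two CPU sets could be built. It assumes the numbering implied by the post (logical CPUs 0-17 are the physical cores, 18-35 their hyper-thread siblings, 9 cores per chip); other machines may number CPUs differently, so check before copying. The helper names are made up for illustration, and [c]os.sched_setaffinity[/c] is Linux-only.

```python
import os

CORES_PER_SOCKET = 9                    # two 9-core chips, per the post
SOCKETS = 2
PHYSICAL = CORES_PER_SOCKET * SOCKETS   # assumption: CPUs 0-17 are physical cores

def socket_cpus(socket, with_ht=True):
    """Logical CPUs on one chip, optionally with hyper-thread siblings."""
    first = socket * CORES_PER_SOCKET
    cpus = list(range(first, first + CORES_PER_SOCKET))
    if with_ht:
        # assumption: sibling of physical core c is logical CPU c + 18
        cpus += [c + PHYSICAL for c in cpus]
    return cpus

def to_cpu_list(cpus):
    """Compress CPU ids into the list format taskset -c accepts."""
    cpus = sorted(cpus)
    ranges = []
    start = prev = cpus[0]
    for c in cpus[1:]:
        if c != prev + 1:
            ranges.append((start, prev))
            start = c
        prev = c
    ranges.append((start, prev))
    return ",".join(f"{a}-{b}" if a != b else str(a) for a, b in ranges)

def pin_current_process(cpus):
    """Restrict the calling process to the given CPUs (Linux only)."""
    os.sched_setaffinity(0, set(cpus))

scenario_1 = list(range(PHYSICAL))   # all physical cores, h/t idle
scenario_2 = socket_cpus(0)          # one chip, physical + hyper cores

print(to_cpu_list(scenario_1))
print(to_cpu_list(scenario_2))
```

The printed strings can be passed to [c]taskset -c[/c] in front of whatever command is being launched; [c]socket_cpus(1)[/c] gives the matching set for the second chip.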
[QUOTE=axn;451662]Yes. It will be @ #12 (see the TOP12th bit?), so #digits > 4053946[/QUOTE]
The cat is out of the bag: It is 4,055,114 digits. [URL="http://primes.utm.edu/primes/page.php?id=122812"]Phi(3, - 143332^393216) [/URL] Congrats to Ryan and Serge. And well done for the multi-threaded proof, Serge. :banana::banana::banana::banana: |
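The digit count can be checked without building the number itself. A minimal sketch, assuming only that Phi(3, -b) = b^2 - b + 1 (the third cyclotomic polynomial x^2 + x + 1 evaluated at -b) with b = 143332^393216; the function name is illustrative:

```python
import math

def phi3_neg_digits(base, exp):
    """Decimal digits of Phi(3, -b) = b^2 - b + 1 for b = base**exp."""
    # b^2 - b + 1 sits just below b^2, so it has as many digits as b^2
    # unless b^2 is extremely close to a power of 10 (not the case here).
    return math.floor(2 * exp * math.log10(base)) + 1

digits = phi3_neg_digits(143332, 393216)
print(digits)
print(digits > 4053946)   # the lower bound guessed earlier in the thread
```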
I ended up running two 18-threaded N-1 tests: one that started from a=3, then continued with a=5... and the other started from scratch with a=11 (in another folder and with affinities set to the 18 ht cores of the second 9-core chip). Both ran with [B][I]7.6 ms / iter[/I][/B] speed.
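As a rough sanity check on that speed, here is a back-of-the-envelope estimate of the wall-clock time for one pass. It assumes about one modular squaring per bit of N, which is only an approximation of what LLR actually does; the digit count and ms / iter figure are the ones quoted in this thread.

```python
import math

DIGITS = 4_055_114    # size of N, from the thread
MS_PER_ITER = 7.6     # measured speed quoted above

# Assumption: roughly one iteration (modular squaring) per bit of N.
iterations = DIGITS * math.log2(10)
hours = iterations * MS_PER_ITER / 1000 / 3600

print(f"{iterations / 1e6:.2f}M iterations, about {hours:.1f} hours")
```

Under that assumption the result lands a little under the 30 hr ETA mentioned earlier.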
The a=11 run hit the jackpot in one go. The a=5 run again didn't get the proof (one of the three a^((N-1)/f)-1 was not coprime to N) and continued into a=7, at which point I killed it and released the instance. |
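Why one base can fail while another succeeds is easier to see on a toy case. The sketch below is a Pocklington-style N-1 step on small numbers, not LLR's actual code, and the function names are hypothetical. It uses N = 13 = Phi(3, -4) as a tiny analogue; for the real N the primes f would presumably be the three primes of 143332 = 2^2 * 7 * 5119.

```python
from math import gcd

def n_minus_1_step(n, primes, a):
    """One base of a Pocklington-style N-1 test.

    `primes` are the distinct primes of the factored part F of n - 1
    (F must exceed sqrt(n) for 'proved' to be a full proof).
    """
    if pow(a, n - 1, n) != 1:
        return "composite"            # fails Fermat, so n is not prime
    for f in primes:
        g = gcd(pow(a, (n - 1) // f, n) - 1, n)
        if g == n:
            return "inconclusive"     # this base settles nothing, try another
        if g != 1:
            return "composite"        # g is a nontrivial factor of n
    return "proved"

def first_proving_base(n, primes, bases):
    """Try bases in order, as in the a=3, a=5, a=7, ... sequence."""
    for a in bases:
        verdict = n_minus_1_step(n, primes, a)
        if verdict == "proved":
            return a
        if verdict == "composite":
            return None
    return None

# N = 13 = 4^2 - 4 + 1, factored part F = 4 = 2^2, so primes = [2]
print(n_minus_1_step(13, [2], 3))
print(n_minus_1_step(13, [2], 2))
print(first_proving_base(13, [2], [3, 5, 7, 11]))
```

An inconclusive base costs a full run at this size, which is why starting a second run from a different base in parallel paid off.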