#1 |
"Sam"
Nov 2016
2×163 Posts |
I know there's a way to run different LLR instances and have them assigned to different designated CPUs, making it run significantly faster than if only one instance were used.
I am using a 4-core, 8-thread CPU. In the attachment I sent, one instance of LLR is running with only one thread, and the time per bit is 0.576 ms. The CPU affinity is set to 0. After terminating the program, I copy the LLR executable to another directory and run a test on a number of similar size to the first run (one thread). The CPU affinity is set to 1. I check on the first run and notice the time has increased to 1.172 ms, almost twice that of running only one LLR application! No speedup whatsoever. My goal is to run 4 instances of LLR, each about as fast as one single-threaded instance running alone (4 instances each running at close to 0.576 ms per bit, so that testing is 4x faster). Does anyone know what I am doing wrong here? I am aware that running a single instance with 8 threads is less productive than running 4 single-threaded instances, but for some reason I never figured out how to achieve the latter. Thanks for the help! |
#2 |
Sep 2002
Database er0rr
2×5×359 Posts |
Running only one instance has all the cache to itself and will run quicker than running two instances, where there will be contention for cache. On a 4c/8t box I run one instance with the -t4 option. I think this approach is cache friendlier.
#3 |
"Curtis"
Feb 2005
Riverside, CA
2⁶·73 Posts |
In Windows, are cores 0 and 1 hyperthreads of the same physical core? That would explain your timing exactly doubling.
What happens when you assign the second LLR copy to core 2 rather than 1? Have you tried not assigning affinity? I've had decent luck just letting Windows utilize the cores; manually assigning affinity does help sometimes, but for this use case I'm not sure it matters for you. |
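If cores 0 and 1 really are two hyperthreads of one physical core, the logical CPUs to pin to are every second one. Here is a minimal Python sketch of that arithmetic. It assumes sibling hyperthreads are numbered consecutively (0 and 1 on the first core, 2 and 3 on the second, and so on), which is a common layout but not guaranteed, so check the real topology on your machine first. The function name is made up for illustration.

```python
def one_logical_cpu_per_core(physical_cores, threads_per_core=2):
    """Return (logical CPU index, affinity bitmask) for the first thread of each core.

    ASSUMPTION: sibling hyperthreads are numbered consecutively, so with two
    threads per core, logical CPUs 0 and 1 share the first physical core,
    2 and 3 share the second, and so on. Verify this on the actual machine.
    """
    picks = []
    for core in range(physical_cores):
        cpu = core * threads_per_core   # first logical CPU of this physical core
        picks.append((cpu, 1 << cpu))   # bitmask with only that CPU's bit set
    return picks


if __name__ == "__main__":
    # 4 cores / 8 threads, as in the original question.
    for cpu, mask in one_logical_cpu_per_core(4):
        print(f"logical CPU {cpu}: affinity mask 0x{mask:X}")
```

Under that assumption a 4c/8t chip gives logical CPUs 0, 2, 4 and 6. The hex masks are in the form that affinity tools such as the Windows `start /affinity` command expect.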
#4 |
"Sam"
Nov 2016
326₁₀ Posts |
Thanks for the suggestions! I ran 2 instances of LLR at the same time, assigning affinity to CPUs 0 and 2.
The time increased by about 0.120 ms, which I guess makes sense given that more active cores means a slower clock speed. I then loaded up 4 instances running on CPUs 0, 2, 4, 6 and the time per bit almost doubled (a 0.380 ms increase). I think Paul is right: running four threads on one instance seems to be faster than running 4 single-threaded instances. I would think that with a larger number of cores, say 12 or 16, the latter might become slower? |
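To compare these runs on equal footing, it may help to look at aggregate throughput (instances divided by time per bit) instead of the per-instance time. The sketch below is plain arithmetic on the figures quoted in this thread, under three loudly stated assumptions: 1.172 ms in post #1 is the per-instance time per bit, the first test in post #4 was two instances on CPUs 0 and 2, and the "increases" in post #4 are added to the 0.576 ms baseline. All names in the code are made up for illustration.

```python
BASELINE_MS = 0.576  # one instance, one thread, time per bit (post #1)


def aggregate_throughput(instances, ms_per_bit):
    """Total bits processed per millisecond across all running instances."""
    return instances / ms_per_bit


# (label, number of instances, ms per bit for each instance)
runs = [
    ("1 instance", 1, BASELINE_MS),
    # ASSUMPTION: 1.172 ms is the per-instance time per bit seen in post #1.
    ("2 instances, CPUs 0 and 1", 2, 1.172),
    # ASSUMPTION: post #4 increases are added to the 0.576 ms baseline,
    # and the first test there was two instances.
    ("2 instances, CPUs 0 and 2", 2, BASELINE_MS + 0.120),
    ("4 instances, CPUs 0, 2, 4, 6", 4, BASELINE_MS + 0.380),
]


def speedups(runs):
    """Aggregate throughput of each run relative to the single-instance baseline."""
    base = aggregate_throughput(1, BASELINE_MS)
    return {label: aggregate_throughput(n, ms) / base for label, n, ms in runs}


if __name__ == "__main__":
    for label, factor in speedups(runs).items():
        print(f"{label}: {factor:.2f}x the single-instance throughput")
```

With those readings, the CPUs 0 and 1 pairing comes out at roughly 0.98x (no gain), CPUs 0 and 2 at roughly 1.66x, and four instances at roughly 2.41x. The thread gives no timing for a single -t4 instance, so this says nothing about how that option compares.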
#5 |
Sep 2002
Database er0rr
E06₁₆ Posts |
I don't know about 12 core chips running LLR, but generally it makes sense to run 1 instance per chip or chiplet.
#6 | |
"Curtis"
Feb 2005
Riverside, CA
2⁶×73 Posts |
Quote:
Once the FFT size reaches 256K, 2-threaded runs work pretty well. OP: I've run LLR on this size of number on prebuilt machines with slow 2-channel memory, and running 3 instances was just about as fast as 4 but generated quite a bit less heat. That is, 3 is enough to saturate the memory on some quad-core machines. It takes some experimenting with threads-per-process and number of processes to find the sweet spot! |
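One way to organise that experimenting is to list every combination of instances and threads per instance that fits on the physical cores, time each one, and compare aggregate throughput. The Python sketch below is only a bookkeeping aid with hypothetical function names. It does not launch LLR, and the timings passed to `best_config` have to come from your own runs.

```python
def candidate_configs(physical_cores):
    """List (instances, threads_per_instance) pairs that fit on the physical cores."""
    configs = []
    for instances in range(1, physical_cores + 1):
        # Keep instances * threads within the number of physical cores.
        for threads in range(1, physical_cores // instances + 1):
            configs.append((instances, threads))
    return configs


def best_config(measurements):
    """Pick the config with the highest aggregate throughput.

    measurements maps (instances, threads_per_instance) to the measured
    time per bit (ms) of each instance; throughput is instances / ms_per_bit.
    """
    return max(measurements, key=lambda cfg: cfg[0] / measurements[cfg])


if __name__ == "__main__":
    # Combinations worth timing on a quad-core machine.
    for instances, threads in candidate_configs(4):
        print(f"{instances} instance(s) x {threads} thread(s)")
```

Throughput is the only thing `best_config` looks at. If a 3-instance setup lands within a few percent of the 4-instance one, the lower heat mentioned above may be a good reason to prefer it anyway.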
#7 | |
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2×2,909 Posts |
Quote: