Number of workers vs. number of CPUs
I'm a little confused about how many workers I should spawn in Prime95. Should I spawn one worker per CPU, per core, or per hyperthreaded "core"?
Prime95 offered to run 6 workers on my Win7 virtual machine, to which I've assigned 32GB RAM and 20 vCPUs. Is that a good ratio? The VM appears to be running at 100% CPU. Would it be better to run 1 worker and dedicate all 20 CPUs to it?
Boundaries that seem to apply across all CPU families:
More than one worker per physical core is not optimal (hyperthreaded logical cores should not count toward the worker total). Assigning one worker more threads than a single physical socket holds is inefficient; each socket should get its own worker, at minimum. Within those two bounds, optimal production is determined by experimentation; the benchmark tools mostly automate this, but virtual machines are hard to pin down because thread assignments may land on HT cores sometimes but not others.
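The two bounds above can be sketched in a few lines of Python. This is a hypothetical helper (the name `candidate_layouts` and the even-split assumption are mine, not Prime95's): it assumes each socket is divided evenly, so the worker count per socket must divide that socket's core count.

```python
# Hypothetical sketch of the two bounds above. Assumes every socket is
# split evenly, so workers per socket must divide the socket's core count.
def candidate_layouts(sockets: int, cores_per_socket: int):
    """Return (total_workers, cores_per_worker) pairs such that no worker
    spans sockets, each socket gets at least one worker, and no physical
    core hosts more than one worker. Hyperthreads are ignored entirely."""
    layouts = []
    for per_socket in range(1, cores_per_socket + 1):
        if cores_per_socket % per_socket == 0:
            layouts.append((per_socket * sockets, cores_per_socket // per_socket))
    return layouts

# Example: a dual-socket machine with 12 cores per socket.
for workers, cores in candidate_layouts(2, 12):
    print(f"{workers:2d} workers x {cores:2d} cores each")
```

Which of the admissible layouts is fastest is then an empirical question for the benchmark, as noted above.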
[QUOTE=daxmick;472676]I'm a little confused about how many workers I should spawn in Prime95. Should I spawn one worker per CPU, per core, or per hyperthreaded "core"?
Prime95 offered to run 6 workers on my Win7 virtual machine, to which I've assigned 32GB RAM and 20 vCPUs. Is that a good ratio? The VM appears to be running at 100% CPU. Would it be better to run 1 worker and dedicate all 20 CPUs to it?[/QUOTE] Welcome. First, 100% CPU is always expected; Prime95 is very efficient. NEVER allocate more workers than physical cores. (There is the very odd exception to this rule, but not enough to consider.) I'm guessing Prime95 thinks you have 6 physical cores. If you do indeed have 6 cores, the general rule is to run 6 workers with 1 core each. Sometimes it is slightly more efficient to run fewer workers with more cores each: for example, 3 workers with 2 cores each, or 2 workers with 3 cores each. If you want to complete a very large assignment quickly, allocate all 6 cores to 1 worker; however, overall throughput will be up to 25% less than with 6 workers of 1 core each. NOTE: a very large assignment is something like an LL test on an exponent over 100 million. If you have more or fewer physical cores, adjust appropriately.
[QUOTE=petrw1;472682]
I'm guessing Prime95 thinks you have 6 physical cores. [/QUOTE] First, thank you for the quick reply! The odd thing is that I have 2 physical CPUs (sockets), each with 12 cores. So, if my math is correct, I have 24 cores (48 with hyperthreading). So, if I want to maximize the number of "things" I'm working on, I "could" have 12 workers, or if I wanted to maximize speed on completing a single "thing", I could have 1 worker. Is that how I should look at this?
[QUOTE=daxmick;472692]First, thank you for the quick reply!
The odd thing is that I have 2 physical CPUs (sockets), each with 12 cores. So, if my math is correct, I have 24 cores (48 with hyperthreading). So, if I want to maximize the number of "things" I'm working on I "could" have 12 workers, or if I wanted to maximize speed on completing a single "thing" I could have 1 worker. Is that how I should look at this?[/QUOTE] If you really have 24 cores then you should have 24 workers. Your limiting factor may be RAM: with 32GB and 24 workers, definitely do NOT run P-1 tests. Again, unless you are doing a REALLY big assignment, you would lose a reasonable amount of overall throughput putting all 24 cores on 1 assignment. As VBCurtis said, your best bet is to run the Benchmark tool. In version 28.x on Windows it is Options... Benchmark; in version 29.x there are a few more options. I believe you want a "Throughput" benchmark (maybe someone can correct me). In the end it should direct you to the best worker/core mix, and further indicate the number of physical cores.
Options/Benchmark is your friend. Prime95 arbitrarily guessed 4 cores/worker would be pretty good.
Do a throughput benchmark using all 24 cores, a 4M FFT size, and 2, 4, 6, 8, 12 workers. Let us know what was best -- we are a curious bunch.
[QUOTE=petrw1;472693]With 32GB and 24 workers definitely do NOT run P-1 tests.[/QUOTE]
So, RAM is included in the calculation? That adds to the question then... how much RAM per core should I account for? Or is it RAM per worker? I have up to 128GB of RAM available. [QUOTE=Prime95;472694]Options/Benchmark is your friend. Prime95 arbitrarily guessed 4 cores/worker would be pretty good. Do a throughput benchmark using all 24 cores, a 4M FFT size, and 2,4,6,8,12 workers. Let us know what was best -- we are a curious bunch.[/QUOTE] I wasn't able to adjust the workers for the Throughput benchmark. It was 1,2,6,20 cores (currently with 32GB RAM). Unfortunately I don't know how this program works well enough to really read the results (unless it is just the max iter/sec value). Maybe someone can decode/explain it? Here are the results: <snip> [Wed Nov 29 14:40:46 2017] Compare your results to other computers at [URL]http://www.mersenne.org/report_benchmarks[/URL] Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz CPU speed: 1371.03 MHz, 20 cores CPU features: Prefetch, SSE, SSE2, SSE4, AVX L1 cache size: 32 KB L2 cache size: 256 KB, L3 cache size: 15 MB L1 cache line size: 64 bytes L2 cache line size: 64 bytes TLBS: 64 Machine topology as determined by hwloc library: Machine#0 (total=31082972KB, Backend=Windows, hwlocVersion=1.11.6, ProcessName=prime95.exe) NUMANode#0 (local=15302680KB, total=15302680KB) Package#0 (CPUVendor=GenuineIntel, CPUFamilyNumber=6, CPUModelNumber=45, CPUModel="Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz", CPUStepping=7) L3 (size=15360KB, linesize=64, ways=20, Inclusive=1) L2 (size=256KB, linesize=64, ways=8, Inclusive=0) L1d (size=32KB, linesize=64, ways=8, Inclusive=0) Core (cpuset: 0x00000001) PU#0 (cpuset: 0x00000001) Core (cpuset: 0x00000002) PU#1 (cpuset: 0x00000002) Core (cpuset: 0x00000004) PU#2 (cpuset: 0x00000004) Core (cpuset: 0x00000008) PU#3 (cpuset: 0x00000008) Core (cpuset: 0x00000010) PU#4 (cpuset: 0x00000010) Core (cpuset: 0x00000020) PU#5 (cpuset: 0x00000020) Core (cpuset: 0x00000040) PU#6 (cpuset: 0x00000040) Core (cpuset: 
0x00000080) PU#7 (cpuset: 0x00000080) Core (cpuset: 0x00000100) PU#8 (cpuset: 0x00000100) Core (cpuset: 0x00000200) PU#9 (cpuset: 0x00000200) NUMANode#1 (local=15780292KB, total=15780292KB) Package#1 (CPUVendor=GenuineIntel, CPUFamilyNumber=6, CPUModelNumber=45, CPUModel="Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz", CPUStepping=7) L3 (size=15360KB, linesize=64, ways=20, Inclusive=1) L2 (size=256KB, linesize=64, ways=8, Inclusive=0) L1d (size=32KB, linesize=64, ways=8, Inclusive=0) Core (cpuset: 0x00000400) PU#10 (cpuset: 0x00000400) Core (cpuset: 0x00000800) PU#11 (cpuset: 0x00000800) Core (cpuset: 0x00001000) PU#12 (cpuset: 0x00001000) Core (cpuset: 0x00002000) PU#13 (cpuset: 0x00002000) Core (cpuset: 0x00004000) PU#14 (cpuset: 0x00004000) Core (cpuset: 0x00008000) PU#15 (cpuset: 0x00008000) Core (cpuset: 0x00010000) PU#16 (cpuset: 0x00010000) Core (cpuset: 0x00020000) PU#17 (cpuset: 0x00020000) Core (cpuset: 0x00040000) PU#18 (cpuset: 0x00040000) Core (cpuset: 0x00080000) PU#19 (cpuset: 0x00080000) Prime95 64-bit version 29.4, RdtscTiming=1 Timings for 2048K FFT length (20 cores, 1 worker): 2.40 ms. Throughput: 417.35 iter/sec. Timings for 2048K FFT length (20 cores, 2 workers): 3.50, 3.51 ms. Throughput: 570.46 iter/sec. Timings for 2048K FFT length (20 cores, 6 workers): 10.70, 10.69, 8.42, 9.42, 12.00, 8.57 ms. Throughput: 611.89 iter/sec. Timings for 2048K FFT length (20 cores, 20 workers): 38.21, 38.33, 38.31, 38.31, 17.91, 35.94, 36.37, 18.15, 38.42, 38.39, 38.36, 38.32, 17.43, 38.42, 38.28, 38.19, 38.04, 38.25, 17.36, 38.39 ms. Throughput: 646.77 iter/sec. Timings for 2304K FFT length (20 cores, 1 worker): 4.15 ms. Throughput: 241.13 iter/sec. Timings for 2304K FFT length (20 cores, 2 workers): 3.85, 3.88 ms. Throughput: 517.34 iter/sec. Timings for 2304K FFT length (20 cores, 6 workers): 11.53, 10.27, 9.75, 10.23, 10.69, 10.34 ms. Throughput: 574.78 iter/sec. 
Timings for 2304K FFT length (20 cores, 20 workers): 40.52, 23.15, 35.27, 40.88, 40.39, 21.67, 40.95, 39.49, 31.57, 38.20, 39.93, 40.20, 19.38, 40.52, 40.27, 40.41, 19.36, 40.17, 40.23, 40.44 ms. Throughput: 601.11 iter/sec. Timings for 2400K FFT length (20 cores, 1 worker): 3.42 ms. Throughput: 292.46 iter/sec. Timings for 2400K FFT length (20 cores, 2 workers): 3.77, 3.77 ms. Throughput: 530.75 iter/sec. Timings for 2400K FFT length (20 cores, 6 workers): 12.33, 12.89, 7.86, 11.96, 9.72, 9.84 ms. Throughput: 574.05 iter/sec. Timings for 2400K FFT length (20 cores, 20 workers): 39.98, 40.55, 28.61, 40.29, 40.02, 39.59, 37.99, 24.52, 22.99, 40.43, 32.32, 40.09, 40.28, 40.18, 20.40, 40.19, 40.07, 40.01, 40.43, 23.68 ms. Throughput: 591.48 iter/sec. Timings for 2560K FFT length (20 cores, 1 worker): 3.14 ms. Throughput: 318.65 iter/sec. Timings for 2560K FFT length (20 cores, 2 workers): 4.35, 4.30 ms. Throughput: 462.82 iter/sec. Timings for 2560K FFT length (20 cores, 6 workers): 11.99, 13.23, 11.09, 11.03, 14.05, 11.33 ms. Throughput: 499.19 iter/sec. Timings for 2560K FFT length (20 cores, 20 workers): 39.06, 37.04, 47.32, 46.93, 22.23, 49.12, 51.32, 27.56, 48.94, 51.36, 48.66, 48.86, 48.21, 34.94, 49.27, 49.53, 49.32, 49.49, 25.27, 21.49 ms. Throughput: 513.50 iter/sec. Timings for 2688K FFT length (20 cores, 1 worker): 3.28 ms. Throughput: 304.46 iter/sec. Timings for 2688K FFT length (20 cores, 2 workers): 4.36, 4.35 ms. Throughput: 459.37 iter/sec. [Wed Nov 29 14:45:54 2017] Timings for 2688K FFT length (20 cores, 6 workers): 13.08, 13.23, 10.58, 12.76, 12.83, 11.04 ms. Throughput: 493.39 iter/sec. Timings for 2688K FFT length (20 cores, 20 workers): 24.45, 47.69, 48.00, 47.75, 28.96, 38.76, 37.22, 48.61, 48.32, 48.57, 47.70, 47.90, 23.82, 23.24, 44.10, 47.90, 48.27, 48.27, 48.01, 47.68 ms. Throughput: 506.34 iter/sec. Timings for 2880K FFT length (20 cores, 1 worker): 4.37 ms. Throughput: 228.88 iter/sec. 
Timings for 2880K FFT length (20 cores, 2 workers): 4.73, 4.85 ms. Throughput: 417.48 iter/sec. Timings for 2880K FFT length (20 cores, 6 workers): 16.77, 13.26, 10.49, 12.55, 14.41, 12.31 ms. Throughput: 460.71 iter/sec. Timings for 2880K FFT length (20 cores, 20 workers): 37.25, 40.22, 48.85, 33.47, 48.82, 24.89, 48.70, 49.19, 48.90, 49.24, 25.41, 46.63, 45.70, 48.11, 48.66, 48.40, 25.95, 48.13, 49.43, 49.20 ms. Throughput: 488.89 iter/sec. Timings for 3072K FFT length (20 cores, 1 worker): 3.55 ms. Throughput: 281.84 iter/sec. Timings for 3072K FFT length (20 cores, 2 workers): 5.41, 5.41 ms. Throughput: 369.54 iter/sec. Timings for 3072K FFT length (20 cores, 6 workers): 18.44, 17.31, 11.57, 11.79, 19.59, 16.38 ms. Throughput: 395.31 iter/sec. Timings for 3072K FFT length (20 cores, 20 workers): 62.78, 26.01, 63.60, 56.58, 67.10, 68.04, 68.02, 26.69, 66.81, 68.02, 68.14, 49.76, 36.64, 62.25, 31.23, 46.76, 52.42, 46.79, 61.31, 68.11 ms. Throughput: 402.18 iter/sec. Timings for 3200K FFT length (20 cores, 1 worker): 5.88 ms. Throughput: 169.94 iter/sec. Timings for 3200K FFT length (20 cores, 2 workers): 5.41, 6.12 ms. Throughput: 348.38 iter/sec. Timings for 3200K FFT length (20 cores, 6 workers): 14.67, 15.56, 13.00, 16.57, 14.94, 11.97 ms. Throughput: 420.22 iter/sec. Timings for 3200K FFT length (20 cores, 20 workers): 30.51, 46.17, 54.46, 39.68, 38.83, 54.05, 54.58, 54.56, 54.89, 46.56, 55.20, 54.06, 55.14, 55.12, 52.44, 54.26, 53.96, 54.72, 27.76, 28.53 ms. Throughput: 436.89 iter/sec. Timings for 3360K FFT length (20 cores, 1 worker): 3.65 ms. Throughput: 273.63 iter/sec. Timings for 3360K FFT length (20 cores, 2 workers): 5.39, 5.39 ms. Throughput: 370.87 iter/sec. Timings for 3360K FFT length (20 cores, 6 workers): 16.19, 15.84, 13.75, 14.42, 17.07, 14.15 ms. Throughput: 396.18 iter/sec. 
Timings for 3360K FFT length (20 cores, 20 workers): 31.55, 58.57, 58.84, 55.16, 58.61, 59.27, 58.73, 29.12, 54.68, 59.17, 58.91, 47.68, 58.45, 58.94, 34.64, 53.75, 51.15, 58.31, 31.31, 59.16 ms. Throughput: 409.43 iter/sec. Timings for 3456K FFT length (20 cores, 1 worker): 4.10 ms. Throughput: 244.11 iter/sec. [Wed Nov 29 14:51:09 2017] Timings for 3456K FFT length (20 cores, 2 workers): 5.92, 5.87 ms. Throughput: 339.44 iter/sec. Timings for 3456K FFT length (20 cores, 6 workers): 19.57, 17.84, 13.56, 16.80, 18.63, 14.50 ms. Throughput: 363.07 iter/sec. Timings for 3456K FFT length (20 cores, 20 workers): 65.46, 61.70, 63.60, 65.22, 66.48, 33.04, 52.40, 66.38, 34.28, 66.15, 66.54, 39.14, 64.78, 66.58, 64.80, 62.08, 64.60, 46.17, 65.94, 31.01 ms. Throughput: 373.41 iter/sec. Timings for 3584K FFT length (20 cores, 1 worker): 4.16 ms. Throughput: 240.15 iter/sec. Timings for 3584K FFT length (20 cores, 2 workers): 6.72, 6.71 ms. Throughput: 297.75 iter/sec. Timings for 3584K FFT length (20 cores, 6 workers): 20.76, 22.58, 14.98, 16.59, 24.33, 16.51 ms. Throughput: 321.14 iter/sec. Timings for 3584K FFT length (20 cores, 20 workers): 76.80, 75.28, 75.80, 80.42, 72.56, 81.19, 72.34, 33.61, 81.32, 32.69, 81.03, 80.52, 73.16, 77.12, 76.79, 33.11, 64.69, 79.31, 38.12, 75.66 ms. Throughput: 326.63 iter/sec. Timings for 3840K FFT length (20 cores, 1 worker): 5.41 ms. Throughput: 185.01 iter/sec. Timings for 3840K FFT length (20 cores, 2 workers): 6.59, 6.56 ms. Throughput: 304.39 iter/sec. Timings for 3840K FFT length (20 cores, 6 workers): 17.96, 20.20, 16.72, 15.19, 23.88, 17.07 ms. Throughput: 331.30 iter/sec. Timings for 3840K FFT length (20 cores, 20 workers): 71.68, 39.16, 53.96, 71.43, 70.50, 53.34, 71.57, 49.44, 71.82, 56.43, 68.67, 68.87, 40.21, 70.96, 71.39, 71.04, 40.10, 44.88, 71.68, 71.31 ms. Throughput: 342.12 iter/sec. Timings for 4032K FFT length (20 cores, 1 worker): 4.73 ms. Throughput: 211.54 iter/sec. 
Timings for 4032K FFT length (20 cores, 2 workers): 6.99, 6.98 ms. Throughput: 286.28 iter/sec. Timings for 4032K FFT length (20 cores, 6 workers): 16.60, 25.97, 18.07, 19.64, 19.20, 19.79 ms. Throughput: 307.60 iter/sec. Timings for 4032K FFT length (20 cores, 20 workers): 76.88, 79.40, 79.36, 36.92, 76.59, 60.98, 63.29, 47.68, 78.14, 78.19, 48.42, 57.99, 78.47, 61.37, 78.25, 62.23, 44.51, 79.94, 77.13, 78.79 ms. Throughput: 313.53 iter/sec. Timings for 4096K FFT length (20 cores, 1 worker): 5.18 ms. Throughput: 193.18 iter/sec. Timings for 4096K FFT length (20 cores, 2 workers): 7.31, 7.29 ms. Throughput: 274.03 iter/sec. Timings for 4096K FFT length (20 cores, 6 workers): 22.95, 20.14, 18.22, 22.82, 22.11, 16.91 ms. Throughput: 296.26 iter/sec. [Wed Nov 29 14:56:14 2017] Timings for 4096K FFT length (20 cores, 20 workers): 79.73, 79.14, 77.49, 78.53, 79.39, 39.13, 39.36, 79.21, 79.83, 79.68, 71.10, 66.34, 59.75, 70.98, 79.29, 55.76, 79.08, 40.79, 79.47, 78.83 ms. Throughput: 305.01 iter/sec. Timings for 4480K FFT length (20 cores, 1 worker): 5.38 ms. Throughput: 185.84 iter/sec. Timings for 4480K FFT length (20 cores, 2 workers): 7.49, 7.47 ms. Throughput: 267.44 iter/sec. Timings for 4480K FFT length (20 cores, 6 workers): 25.92, 18.35, 20.65, 20.54, 24.16, 19.28 ms. Throughput: 283.43 iter/sec. Timings for 4480K FFT length (20 cores, 20 workers): 41.13, 83.66, 41.92, 82.51, 83.57, 81.01, 83.30, 83.00, 83.33, 83.33, 48.19, 83.46, 83.62, 82.28, 82.34, 83.66, 83.56, 69.19, 74.60, 40.70 ms. Throughput: 289.94 iter/sec. Timings for 4608K FFT length (20 cores, 1 worker): 5.61 ms. Throughput: 178.17 iter/sec. Timings for 4608K FFT length (20 cores, 2 workers): 7.90, 7.89 ms. Throughput: 253.43 iter/sec. Timings for 4608K FFT length (20 cores, 6 workers): 25.14, 22.88, 19.34, 23.64, 25.61, 18.41 ms. Throughput: 270.86 iter/sec. 
Timings for 4608K FFT length (20 cores, 20 workers): 86.80, 86.17, 87.27, 86.26, 88.85, 42.12, 86.63, 79.47, 44.13, 88.85, 88.15, 87.34, 87.18, 42.15, 86.41, 88.15, 41.88, 86.70, 86.29, 86.48 ms. Throughput: 278.69 iter/sec. Timings for 4800K FFT length (20 cores, 1 worker): 5.66 ms. Throughput: 176.62 iter/sec. Timings for 4800K FFT length (20 cores, 2 workers): 8.23, 8.19 ms. Throughput: 243.52 iter/sec. Timings for 4800K FFT length (20 cores, 6 workers): 25.57, 26.91, 18.84, 23.79, 23.07, 22.78 ms. Throughput: 258.64 iter/sec. Timings for 4800K FFT length (20 cores, 20 workers): 93.11, 90.29, 91.82, 47.44, 59.41, 94.25, 50.78, 92.35, 92.56, 85.95, 59.33, 90.55, 94.93, 42.52, 90.34, 92.09, 91.99, 94.98, 90.94, 54.14 ms. Throughput: 268.94 iter/sec. Timings for 5120K FFT length (20 cores, 1 worker): 6.47 ms. Throughput: 154.51 iter/sec. Timings for 5120K FFT length (20 cores, 2 workers): 9.21, 9.16 ms. Throughput: 217.75 iter/sec. Timings for 5120K FFT length (20 cores, 6 workers): 30.00, 30.28, 20.17, 34.15, 22.34, 23.50 ms. Throughput: 232.52 iter/sec. Timings for 5120K FFT length (20 cores, 20 workers): 49.59, 101.60, 101.01, 100.88, 101.62, 99.63, 98.00, 100.16, 48.96, 101.52, 99.51, 91.20, 49.11, 101.38, 96.45, 101.02, 100.16, 100.72, 50.93, 101.14 ms. Throughput: 241.10 iter/sec. Timings for 5376K FFT length (20 cores, 1 worker): 6.53 ms. Throughput: 153.05 iter/sec. [Wed Nov 29 15:01:24 2017] Timings for 5376K FFT length (20 cores, 2 workers): 9.34, 9.31 ms. Throughput: 214.41 iter/sec. Timings for 5376K FFT length (20 cores, 6 workers): 25.42, 30.07, 24.28, 21.27, 34.93, 25.79 ms. Throughput: 228.21 iter/sec. Timings for 5376K FFT length (20 cores, 20 workers): 69.55, 77.28, 96.74, 104.01, 102.45, 58.99, 74.29, 102.78, 103.01, 94.24, 102.66, 101.69, 103.75, 71.43, 88.18, 57.02, 63.89, 103.12, 90.30, 103.89 ms. Throughput: 235.63 iter/sec. Timings for 5760K FFT length (20 cores, 1 worker): 7.09 ms. Throughput: 141.02 iter/sec. 
Timings for 5760K FFT length (20 cores, 2 workers): 9.66, 9.61 ms. Throughput: 207.63 iter/sec. Timings for 5760K FFT length (20 cores, 6 workers): 31.77, 30.04, 21.98, 32.62, 23.01, 27.46 ms. Throughput: 220.78 iter/sec. Timings for 5760K FFT length (20 cores, 20 workers): 63.67, 107.85, 107.65, 61.85, 106.82, 65.99, 100.42, 108.34, 108.39, 105.96, 106.88, 107.29, 107.29, 107.88, 86.25, 109.04, 107.83, 53.38, 109.05, 56.44 ms. Throughput: 225.73 iter/sec. Timings for 6144K FFT length (20 cores, 1 worker): 7.70 ms. Throughput: 129.90 iter/sec. Timings for 6144K FFT length (20 cores, 2 workers): 11.12, 11.12 ms. Throughput: 179.86 iter/sec. Timings for 6144K FFT length (20 cores, 6 workers): 28.95, 40.99, 26.71, 41.50, 35.84, 22.80 ms. Throughput: 192.23 iter/sec. Timings for 6144K FFT length (20 cores, 20 workers): 66.36, 125.75, 124.54, 83.76, 123.24, 123.94, 74.74, 124.56, 125.17, 100.65, 130.60, 125.03, 105.78, 123.31, 122.69, 65.99, 124.44, 130.60, 59.97, 110.87 ms. Throughput: 196.41 iter/sec. Timings for 6400K FFT length (20 cores, 1 worker): 7.74 ms. Throughput: 129.23 iter/sec. Timings for 6400K FFT length (20 cores, 2 workers): 11.50, 11.42 ms. Throughput: 174.47 iter/sec. Timings for 6400K FFT length (20 cores, 6 workers): 43.84, 33.20, 25.16, 34.96, 33.42, 28.68 ms. Throughput: 186.05 iter/sec. Timings for 6400K FFT length (20 cores, 20 workers): 67.83, 59.96, 129.29, 122.52, 133.75, 132.97, 126.52, 126.06, 133.96, 100.43, 127.68, 58.84, 135.45, 133.65, 130.27, 133.77, 135.44, 129.95, 128.17, 58.70 ms. Throughput: 190.33 iter/sec. Timings for 6720K FFT length (20 cores, 1 worker): 8.31 ms. Throughput: 120.40 iter/sec. Timings for 6720K FFT length (20 cores, 2 workers): 11.53, 11.37 ms. Throughput: 174.72 iter/sec. [Wed Nov 29 15:06:26 2017] Timings for 6720K FFT length (20 cores, 6 workers): 27.68, 43.63, 30.57, 42.24, 32.78, 25.91 ms. Throughput: 184.54 iter/sec. 
Timings for 6720K FFT length (20 cores, 20 workers): 129.37, 129.04, 115.29, 130.64, 68.38, 129.43, 119.89, 62.04, 128.76, 127.72, 129.64, 126.78, 127.53, 128.82, 76.40, 61.43, 128.18, 129.49, 101.97, 111.66 ms. Throughput: 189.07 iter/sec. Timings for 6912K FFT length (20 cores, 1 worker): 8.58 ms. Throughput: 116.49 iter/sec. Timings for 6912K FFT length (20 cores, 2 workers): 13.05, 12.98 ms. Throughput: 153.71 iter/sec. Timings for 6912K FFT length (20 cores, 6 workers): 35.86, 37.57, 37.19, 45.06, 46.26, 26.01 ms. Throughput: 163.65 iter/sec. Timings for 6912K FFT length (20 cores, 20 workers): 155.31, 65.91, 158.20, 158.95, 160.21, 158.13, 160.28, 158.85, 155.31, 66.22, 73.09, 152.40, 150.07, 152.12, 155.50, 123.06, 150.83, 75.56, 116.95, 157.62 ms. Throughput: 163.66 iter/sec. Timings for 7168K FFT length (20 cores, 1 worker): 8.95 ms. Throughput: 111.73 iter/sec. Timings for 7168K FFT length (20 cores, 2 workers): 13.20, 13.23 ms. Throughput: 151.34 iter/sec. Timings for 7168K FFT length (20 cores, 6 workers): 50.22, 34.11, 32.42, 37.41, 37.99, 36.57 ms. Throughput: 160.47 iter/sec. Timings for 7168K FFT length (20 cores, 20 workers): 69.52, 151.76, 151.72, 151.10, 153.70, 152.53, 151.18, 69.46, 153.64, 152.80, 149.58, 147.73, 144.40, 148.19, 149.92, 149.94, 68.87, 72.03, 140.11, 148.58 ms. Throughput: 164.05 iter/sec. Timings for 7680K FFT length (20 cores, 1 worker): 9.23 ms. Throughput: 108.37 iter/sec. Timings for 7680K FFT length (20 cores, 2 workers): 14.34, 14.34 ms. Throughput: 139.47 iter/sec. Timings for 7680K FFT length (20 cores, 6 workers): 50.29, 31.75, 43.11, 54.94, 33.58, 39.34 ms. Throughput: 147.98 iter/sec. Timings for 7680K FFT length (20 cores, 20 workers): 167.94, 71.75, 93.29, 109.12, 179.30, 179.36, 176.81, 171.65, 166.21, 176.98, 91.27, 102.03, 179.94, 163.99, 167.54, 168.75, 168.41, 163.89, 152.53, 81.45 ms. Throughput: 149.26 iter/sec. Timings for 8000K FFT length (20 cores, 1 worker): 9.82 ms. Throughput: 101.84 iter/sec. 
Timings for 8000K FFT length (20 cores, 2 workers): 14.04, 13.93 ms. Throughput: 143.02 iter/sec. Timings for 8000K FFT length (20 cores, 6 workers): 50.13, 49.67, 27.40, 49.86, 37.47, 34.30 ms. Throughput: 152.48 iter/sec. [Wed Nov 29 15:11:36 2017] Timings for 8000K FFT length (20 cores, 20 workers): 154.19, 72.59, 158.16, 163.38, 72.88, 155.24, 151.27, 148.87, 163.62, 159.39, 99.64, 155.69, 159.07, 98.74, 158.52, 163.94, 164.93, 164.83, 89.34, 113.56 ms. Throughput: 155.99 iter/sec. Timings for 8192K FFT length (20 cores, 1 worker): 10.50 ms. Throughput: 95.27 iter/sec. Timings for 8192K FFT length (20 cores, 2 workers): 15.86, 15.82 ms. Throughput: 126.27 iter/sec. Timings for 8192K FFT length (20 cores, 6 workers): 46.48, 43.33, 46.01, 60.91, 42.41, 37.49 ms. Throughput: 133.00 iter/sec. Timings for 8192K FFT length (20 cores, 20 workers): 184.80, 93.81, 182.43, 187.74, 145.93, 113.38, 189.01, 148.27, 187.70, 139.96, 187.41, 182.88, 189.76, 183.01, 187.30, 87.91, 166.80, 135.81, 96.36, 189.81 ms. Throughput: 134.32 iter/sec. </snip> |
[QUOTE=daxmick;472710]So, RAM is included in the calculation? That adds to the question then... how much RAM per core should I account for? Or is it RAM per worker? I have up to 128GB of RAM available.
I wasn't able to adjust the workers for the Throughput benchmark. It was 1, 2, 6, 20 cores (currently with 32GB RAM). Unfortunately, I don't know how this program works well enough to really read the results (unless it is just the max iter/sec value). Maybe someone can decode/explain it?[/QUOTE] RAM is irrelevant; Prime95 will use on the order of 50MB per worker. Yes, it simply is a case of maximizing the throughput (iter/sec) value, which in your case seems heavily skewed toward one core per worker. I'd try benching the 5, 10, and 20 worker cases just to be sure (I previously suggested 6 and 12 because I thought you had a 24-core case). Assuming the 20-worker benchmark maintains the best throughput, the only question remaining is: "do you have the patience to wait for 20 workers to plod along at a slow pace before getting any results?" GIMPS is better off with 4 completed results after a week's time than 20 abandoned, partially completed results in a week's time.
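For anyone else trying to decode a dump like the one above: each "Timings for ... Throughput: N iter/sec" line is one configuration, and you just want the highest iter/sec per FFT length. A small sketch (my own helper, not part of Prime95; adjust the regex if your version's output format differs):

```python
import re

# Pull (fft_K, workers, throughput) out of Prime95 benchmark output and
# keep the best worker count per FFT length. Matches lines like:
# "Timings for 4096K FFT length (20 cores, 6 workers): ... Throughput: 296.26 iter/sec."
LINE = re.compile(
    r"Timings for (\d+)K FFT length \(\d+ cores(?: hyperthreaded)?, (\d+) workers?\):"
    r".*?Throughput: ([\d.]+) iter/sec"
)

def best_per_fft(text: str):
    best = {}  # fft_K -> (workers, iter/sec)
    for fft, workers, thr in LINE.findall(text):
        fft, workers, thr = int(fft), int(workers), float(thr)
        if fft not in best or thr > best[fft][1]:
            best[fft] = (workers, thr)
    return best

# Three of the 4096K lines from the results above:
sample = (
    "Timings for 4096K FFT length (20 cores, 1 worker): 5.18 ms. Throughput: 193.18 iter/sec. "
    "Timings for 4096K FFT length (20 cores, 6 workers): ... Throughput: 296.26 iter/sec. "
    "Timings for 4096K FFT length (20 cores, 20 workers): ... Throughput: 305.01 iter/sec."
)
print(best_per_fft(sample))  # {4096: (20, 305.01)}
```

Run over the full dump, this shows 20 workers winning on throughput at every FFT length tested.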
[QUOTE=Prime95;472715]Assuming the 20 worker benchmark maintains the best throughput, the only question remaining is "do you have the patience to wait for 20 workers to plod along at a slow pace before getting any results?". GIMPS is better off with 4 completed results after a week's time rather than 20 abandoned partially completed results in a week's time.[/QUOTE]
I'm not worried about how many or how long; I'm looking for the best overall performance in the long term. In other words, if it is more efficient to "plod through" 20 concurrent tests over several weeks vs. 4 concurrent tests in just a few days, then I'd do the 20. BUT if I can do multiple rounds of "4 concurrent tests" (in this case more than 5 in series) faster than the 20 concurrent, then I should choose 4 workers, yes? Just trying to figure out how to read the results output and decide which is best to do.
[QUOTE=daxmick;472717]I'm not worried about how many or how long. I'm looking for best overall performance in the long term. In other words, if it is more efficient to "plod through" 20 concurrent tests over several weeks vs. 4 concurrent tests in just a few days then I'd do the 20. BUT if I can do multiple "4 concurrent tests" (in this case more than 5 in series) faster than the 20 concurrent then I should choose to use 4 workers, Yes?
Just trying to figure out how to read the results output and decide which is best to do.[/QUOTE] If your only goal is to maximise throughput over the [b]long term[/b] then you only need to look at the "iter/sec" value and aim to maximise it.
[QUOTE=retina;472718]If your only goal is to maximise throughput over the [B]long term[/B] then you only need to look at the value "iter/sec" and aim to maximise it.[/QUOTE]
Which, from the above results output, appears to be 20 cores and 6 workers, yes? (Which happens to be the suggested number of workers when I first started the program.) :smile: |
[QUOTE=daxmick;472719]Which, from the above results output, appears to be 20 cores and 6 workers, yes? (Which happens to be the suggested number of workers when I first started the program.) :smile:[/QUOTE] Nope. It looks to me as though 20 cores / 20 workers is better for you.
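The throughput-vs-patience trade-off raised earlier can be put in rough numbers using the 4096K results above (5.18 ms/iter for 1 worker, roughly 79 ms/iter per worker for 20 workers). The ~80 million iteration count is my round figure for an LL test at this FFT size, used only for illustration:

```python
# Rough latency-vs-throughput comparison from the 4096K benchmark above.
ITERATIONS = 80_000_000  # assumed iteration count of one LL test (illustrative)

def days_per_test(ms_per_iter: float) -> float:
    """Wall-clock days for one test at the given per-iteration time."""
    return ITERATIONS * ms_per_iter / 1000 / 86400

# 1 worker on 20 cores: 5.18 ms/iter -> first result in roughly 5 days.
# 20 workers: ~79 ms/iter each -> first results after roughly 73 days,
# but total throughput (305.01 vs 193.18 iter/sec) is about 58% higher.
print(days_per_test(5.18), days_per_test(79.0))
```

So 20 workers finishes more tests per year, at the cost of a much longer wait for the first completed result, which is exactly the patience question posed above.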
With apologies for jumping onto this thread: I just ran the test and got the following results for my i7-8700K (6 cores). This indicates I should use the setting of 1 worker, yes? I have hyperthreading checked for trial factoring, but NOT for LL, P-1, or ECM. For what it's worth, that's my default setting.
Timings for 2048K FFT length (6 cores, 1 worker): 1.74 ms. Throughput: 573.89 iter/sec. Timings for 2048K FFT length (6 cores, 6 workers): 13.94, 14.11, 13.81, 13.88, 14.04, 14.19 ms. Throughput: 428.75 iter/sec. Timings for 2048K FFT length (6 cores hyperthreaded, 1 worker): 1.81 ms. Throughput: 552.21 iter/sec. Timings for 2048K FFT length (6 cores hyperthreaded, 6 workers): 14.76, 14.51, 14.41, 14.52, 14.96, 14.42 ms. Throughput: 411.06 iter/sec. Timings for 2304K FFT length (6 cores, 1 worker): 2.16 ms. Throughput: 462.90 iter/sec. Timings for 2304K FFT length (6 cores, 6 workers): 15.98, 15.58, 15.51, 15.32, 15.50, 15.83 ms. Throughput: 384.19 iter/sec. Timings for 2304K FFT length (6 cores hyperthreaded, 1 worker): 2.30 ms. Throughput: 435.24 iter/sec. Timings for 2304K FFT length (6 cores hyperthreaded, 6 workers): 16.72, 16.57, 16.55, 16.47, 17.16, 17.06 ms. Throughput: 358.20 iter/sec. Timings for 2400K FFT length (6 cores, 1 worker): 2.18 ms. Throughput: 457.92 iter/sec. Timings for 2400K FFT length (6 cores, 6 workers): 16.72, 16.50, 16.25, 16.09, 16.45, 16.61 ms. Throughput: 365.12 iter/sec. Timings for 2400K FFT length (6 cores hyperthreaded, 1 worker): 2.58 ms. Throughput: 386.97 iter/sec. Timings for 2400K FFT length (6 cores hyperthreaded, 6 workers): 18.28, 18.26, 18.98, 18.05, 18.80, 18.41 ms. Throughput: 325.06 iter/sec. Timings for 2560K FFT length (6 cores, 1 worker): 2.44 ms. Throughput: 410.15 iter/sec. Timings for 2560K FFT length (6 cores, 6 workers): 17.64, 17.54, 17.61, 17.53, 17.58, 17.63 ms. Throughput: 341.13 iter/sec. Timings for 2560K FFT length (6 cores hyperthreaded, 1 worker): 2.58 ms. Throughput: 387.11 iter/sec. Timings for 2560K FFT length (6 cores hyperthreaded, 6 workers): 18.41, 18.21, 17.93, 17.76, 18.89, 18.67 ms. Throughput: 327.82 iter/sec. Timings for 2688K FFT length (6 cores, 1 worker): 2.58 ms. Throughput: 386.93 iter/sec. |
[QUOTE=ctteg;480660]With apologies for jumping onto this thread. I just ran the test and got the following results for my i7 8700K, 6-cores. This indicates I should use the setting of 1 worker, yes? [/QUOTE]
Yes.
"With apologies....."
Please, no need. Your post is entirely on topic. :smile:
Thank you both. Much appreciated.