#1
Feb 2014
5416 Posts |
I'm a little confused about how many workers I should spawn in Prime95. Should I spawn one worker per CPU, per core, or per hyperthread "core"?
Prime95 offered to run 6 workers on my Win7 virtual machine, which I've assigned 32GB RAM and 20 vCPUs. Is that a good ratio? The VM appears to be running at 100% CPU. Would it be better to run 1 worker and dedicate all 20 CPUs to it?
#2
"Curtis"
Feb 2005
Riverside, CA
4604₁₀ Posts |
Boundaries that seem to apply across all CPU families:
- More than one worker per physical core is not optimal (hyperthreaded "cores" should not be counted when choosing a worker count).
- Assigning one worker more threads than a single physical socket has is inefficient; each socket should get its own worker, at minimum.

Within those two bounds, optimal production is determined by experimentation; the benchmark tools mostly automate this, but virtual machines are hard to pin down because thread assignments may land on HT cores sometimes and not others.
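If you want to see what the OS (or the VM) actually exposes before settling on a worker count, here is a minimal sketch of that check, assuming Python 3 with the third-party psutil package installed; the socket count is passed in by hand because a VM may not report the real topology:
Code:
import psutil  # third-party; pip install psutil

def worker_bounds(num_sockets):
    """Bounds implied by the rules above: at least one worker per socket,
    at most one worker per physical core (hyperthreads don't count)."""
    physical = psutil.cpu_count(logical=False)  # physical cores (may be None inside a VM)
    logical = psutil.cpu_count(logical=True)    # logical CPUs, i.e. including HT
    print(f"physical cores: {physical}, logical CPUs: {logical}")
    return num_sockets, physical

if __name__ == "__main__":
    lo, hi = worker_bounds(num_sockets=2)  # socket count supplied manually
    print(f"benchmark worker counts between {lo} and {hi}")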
#3
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
2·3·5·151 Posts |
Quote:
First, 100% CPU is always expected, as Prime95 is very efficient.

NEVER allocate more workers than physical cores. (There is the very odd exception to this rule, but not enough to matter here.) I'm guessing Prime95 thinks you have 6 physical cores. If you do indeed have 6 cores, the general rule is to run 6 workers with 1 core each. Sometimes it is slightly more efficient to run fewer workers with more cores each: for example, 3 workers with 2 cores each, or 2 workers with 3 cores each.

If you want to complete a very large assignment quickly, allocate all 6 cores to 1 worker. However, the overall throughput will be up to 25% less than 6 workers with 1 core each. NOTE: a very large assignment is something like an LL test on an exponent over 100 million. If you have more or fewer physical cores, adjust accordingly.
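To put rough numbers on that latency/throughput tradeoff: an LL test of exponent p takes about p squaring iterations, so wall-clock time is roughly p divided by the iterations per second of the worker running it. A small sketch (Python; the iter/sec figures are placeholders to be replaced by your own benchmark numbers):
Code:
# Rough LL-test latency estimate. An LL test of exponent p needs about p
# squaring iterations, so days-to-finish ~= p / (iter/sec) / 86400.
def days_to_finish(exponent, iters_per_sec):
    return exponent / iters_per_sec / 86400.0

if __name__ == "__main__":
    p = 100_000_000          # "very large assignment" from the post above
    one_big_worker = 130.0   # placeholder iter/sec: all cores on one worker
    one_core_worker = 8.0    # placeholder iter/sec: a single small worker
    print(f"1 worker on all cores: {days_to_finish(p, one_big_worker):.0f} days")
    print(f"one of many workers  : {days_to_finish(p, one_core_worker):.0f} days")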
#4
Feb 2014
2²·3·7 Posts |
First, thank you for the quick reply!
The odd thing is that I have 2 physical CPUs (sockets), each with 12 cores. So, if my math is correct, I have 24 cores (48 with hyperthreading). If I want to maximize the number of "things" I'm working on, I could have 24 workers; if I wanted to maximize the speed of completing a single "thing", I could have 1 worker. Is that how I should look at this?
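For reference, the worker/core splits between those two extremes look like this. A sketch assuming 2 sockets of 12 physical cores each, and keeping every worker on a single socket as advised in post #2 (which is why a lone worker spanning both sockets is not listed):
Code:
# Enumerate worker/core layouts for 2 sockets x 12 physical cores,
# with each worker confined to one socket (hyperthreads ignored).
CORES_PER_SOCKET = 12
SOCKETS = 2

def layouts():
    for per_socket in range(1, CORES_PER_SOCKET + 1):
        if CORES_PER_SOCKET % per_socket == 0:
            yield per_socket * SOCKETS, CORES_PER_SOCKET // per_socket

if __name__ == "__main__":
    for workers, cores in layouts():
        print(f"{workers:2d} workers x {cores:2d} cores each")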
#5
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
2·3·5·151 Posts |
Quote:
Your limiting factor may be RAM. With 32GB and 24 workers, definitely do NOT run P-1 tests.

Again, unless you are doing a REALLY big assignment, you would lose a reasonable amount of overall throughput by putting all 24 cores on 1 assignment.

As VBCurtis said, your best bet is to run the benchmark tool. In version 28.x on Windows it is Options... Benchmark. In version 29.x there are a few more options; I believe you want a "Throughput" benchmark. Maybe someone can correct me. In the end it should point you to the best worker/core mix, and also indicate the number of physical cores.
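The RAM concern is easiest to see as per-worker arithmetic; stage 2 of P-1 is the work type that wants lots of memory. A back-of-envelope sketch (Python; the totals and the amount held back for the OS are made-up placeholders, not Prime95 settings):
Code:
# Back-of-envelope RAM share per worker (all figures are placeholders).
def ram_per_worker(total_ram_gb, reserved_for_os_gb, workers):
    return (total_ram_gb - reserved_for_os_gb) / workers

if __name__ == "__main__":
    for workers in (1, 2, 6, 12, 20, 24):
        share = ram_per_worker(total_ram_gb=32, reserved_for_os_gb=4, workers=workers)
        print(f"{workers:2d} workers -> about {share:4.1f} GB each")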
#6
P90 years forever!
Aug 2002
Yeehaw, FL
1C7E₁₆ Posts |
Options/Benchmark is your friend. Prime95 arbitrarily guessed 4 cores/worker would be pretty good.
Do a throughput benchmark using all 24 cores, a 4M FFT size, and 2, 4, 6, 8, 12 workers. Let us know what was best -- we are a curious bunch.
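For reading the results: each "Timings" line lists per-worker milliseconds per iteration, and the reported throughput appears to be just the per-worker rates (1000 divided by each time) added together. A small sketch of that arithmetic (Python; the example list is taken from the 2048K FFT, 6-worker line in the results posted in the next reply):
Code:
# Combined throughput = sum over workers of (1000 ms/s) / (ms per iteration).
def throughput(ms_per_iter):
    return sum(1000.0 / t for t in ms_per_iter)

if __name__ == "__main__":
    # 2048K FFT, 6 workers, from the benchmark posted below:
    print(f"{throughput([10.70, 10.69, 8.42, 9.42, 12.00, 8.57]):.2f} iter/sec")
    # prints about 611.94, close to the reported 611.89 iter/sec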
#7
Feb 2014
2²×3×7 Posts |
So, RAM is included in the calculation? That adds to the question then... how much RAM per core should I account for? Or is it RAM per worker? I have up to 128GB of RAM available.
Quote:
<snip>
[Wed Nov 29 14:40:46 2017]
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
CPU speed: 1371.03 MHz, 20 cores
CPU features: Prefetch, SSE, SSE2, SSE4, AVX
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 15 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Machine topology as determined by hwloc library:
Machine#0 (total=31082972KB, Backend=Windows, hwlocVersion=1.11.6, ProcessName=prime95.exe)
NUMANode#0 (local=15302680KB, total=15302680KB)
Package#0 (CPUVendor=GenuineIntel, CPUFamilyNumber=6, CPUModelNumber=45, CPUModel="Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz", CPUStepping=7)
L3 (size=15360KB, linesize=64, ways=20, Inclusive=1)
L2 (size=256KB, linesize=64, ways=8, Inclusive=0)
L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
Core (cpuset: 0x00000001)
PU#0 (cpuset: 0x00000001)
Core (cpuset: 0x00000002)
PU#1 (cpuset: 0x00000002)
Core (cpuset: 0x00000004)
PU#2 (cpuset: 0x00000004)
Core (cpuset: 0x00000008)
PU#3 (cpuset: 0x00000008)
Core (cpuset: 0x00000010)
PU#4 (cpuset: 0x00000010)
Core (cpuset: 0x00000020)
PU#5 (cpuset: 0x00000020)
Core (cpuset: 0x00000040)
PU#6 (cpuset: 0x00000040)
Core (cpuset: 0x00000080)
PU#7 (cpuset: 0x00000080)
Core (cpuset: 0x00000100)
PU#8 (cpuset: 0x00000100)
Core (cpuset: 0x00000200)
PU#9 (cpuset: 0x00000200)
NUMANode#1 (local=15780292KB, total=15780292KB)
Package#1 (CPUVendor=GenuineIntel, CPUFamilyNumber=6, CPUModelNumber=45, CPUModel="Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz", CPUStepping=7)
L3 (size=15360KB, linesize=64, ways=20, Inclusive=1)
L2 (size=256KB, linesize=64, ways=8, Inclusive=0)
L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
Core (cpuset: 0x00000400)
PU#10 (cpuset: 0x00000400)
Core (cpuset: 0x00000800)
PU#11 (cpuset: 0x00000800)
Core (cpuset: 0x00001000)
PU#12 (cpuset: 0x00001000)
Core (cpuset: 0x00002000)
PU#13 (cpuset: 0x00002000)
Core (cpuset: 0x00004000)
PU#14 (cpuset: 0x00004000)
Core (cpuset: 0x00008000)
PU#15 (cpuset: 0x00008000)
Core (cpuset: 0x00010000)
PU#16 (cpuset: 0x00010000)
Core (cpuset: 0x00020000)
PU#17 (cpuset: 0x00020000)
Core (cpuset: 0x00040000)
PU#18 (cpuset: 0x00040000)
Core (cpuset: 0x00080000)
PU#19 (cpuset: 0x00080000)
Prime95 64-bit version 29.4, RdtscTiming=1
Timings for 2048K FFT length (20 cores, 1 worker): 2.40 ms. Throughput: 417.35 iter/sec.
Timings for 2048K FFT length (20 cores, 2 workers): 3.50, 3.51 ms. Throughput: 570.46 iter/sec.
Timings for 2048K FFT length (20 cores, 6 workers): 10.70, 10.69, 8.42, 9.42, 12.00, 8.57 ms. Throughput: 611.89 iter/sec.
Timings for 2048K FFT length (20 cores, 20 workers): 38.21, 38.33, 38.31, 38.31, 17.91, 35.94, 36.37, 18.15, 38.42, 38.39, 38.36, 38.32, 17.43, 38.42, 38.28, 38.19, 38.04, 38.25, 17.36, 38.39 ms. Throughput: 646.77 iter/sec.
Timings for 2304K FFT length (20 cores, 1 worker): 4.15 ms. Throughput: 241.13 iter/sec.
Timings for 2304K FFT length (20 cores, 2 workers): 3.85, 3.88 ms. Throughput: 517.34 iter/sec.
Timings for 2304K FFT length (20 cores, 6 workers): 11.53, 10.27, 9.75, 10.23, 10.69, 10.34 ms. Throughput: 574.78 iter/sec.
Timings for 2304K FFT length (20 cores, 20 workers): 40.52, 23.15, 35.27, 40.88, 40.39, 21.67, 40.95, 39.49, 31.57, 38.20, 39.93, 40.20, 19.38, 40.52, 40.27, 40.41, 19.36, 40.17, 40.23, 40.44 ms. Throughput: 601.11 iter/sec.
Timings for 2400K FFT length (20 cores, 1 worker): 3.42 ms. Throughput: 292.46 iter/sec.
Timings for 2400K FFT length (20 cores, 2 workers): 3.77, 3.77 ms. Throughput: 530.75 iter/sec.
Timings for 2400K FFT length (20 cores, 6 workers): 12.33, 12.89, 7.86, 11.96, 9.72, 9.84 ms. Throughput: 574.05 iter/sec.
Timings for 2400K FFT length (20 cores, 20 workers): 39.98, 40.55, 28.61, 40.29, 40.02, 39.59, 37.99, 24.52, 22.99, 40.43, 32.32, 40.09, 40.28, 40.18, 20.40, 40.19, 40.07, 40.01, 40.43, 23.68 ms. Throughput: 591.48 iter/sec.
Timings for 2560K FFT length (20 cores, 1 worker): 3.14 ms. Throughput: 318.65 iter/sec.
Timings for 2560K FFT length (20 cores, 2 workers): 4.35, 4.30 ms. Throughput: 462.82 iter/sec.
Timings for 2560K FFT length (20 cores, 6 workers): 11.99, 13.23, 11.09, 11.03, 14.05, 11.33 ms. Throughput: 499.19 iter/sec.
Timings for 2560K FFT length (20 cores, 20 workers): 39.06, 37.04, 47.32, 46.93, 22.23, 49.12, 51.32, 27.56, 48.94, 51.36, 48.66, 48.86, 48.21, 34.94, 49.27, 49.53, 49.32, 49.49, 25.27, 21.49 ms. Throughput: 513.50 iter/sec.
Timings for 2688K FFT length (20 cores, 1 worker): 3.28 ms. Throughput: 304.46 iter/sec.
Timings for 2688K FFT length (20 cores, 2 workers): 4.36, 4.35 ms. Throughput: 459.37 iter/sec.
[Wed Nov 29 14:45:54 2017]
Timings for 2688K FFT length (20 cores, 6 workers): 13.08, 13.23, 10.58, 12.76, 12.83, 11.04 ms. Throughput: 493.39 iter/sec.
Timings for 2688K FFT length (20 cores, 20 workers): 24.45, 47.69, 48.00, 47.75, 28.96, 38.76, 37.22, 48.61, 48.32, 48.57, 47.70, 47.90, 23.82, 23.24, 44.10, 47.90, 48.27, 48.27, 48.01, 47.68 ms. Throughput: 506.34 iter/sec.
Timings for 2880K FFT length (20 cores, 1 worker): 4.37 ms. Throughput: 228.88 iter/sec.
Timings for 2880K FFT length (20 cores, 2 workers): 4.73, 4.85 ms. Throughput: 417.48 iter/sec.
Timings for 2880K FFT length (20 cores, 6 workers): 16.77, 13.26, 10.49, 12.55, 14.41, 12.31 ms. Throughput: 460.71 iter/sec.
Timings for 2880K FFT length (20 cores, 20 workers): 37.25, 40.22, 48.85, 33.47, 48.82, 24.89, 48.70, 49.19, 48.90, 49.24, 25.41, 46.63, 45.70, 48.11, 48.66, 48.40, 25.95, 48.13, 49.43, 49.20 ms. Throughput: 488.89 iter/sec.
Timings for 3072K FFT length (20 cores, 1 worker): 3.55 ms. Throughput: 281.84 iter/sec.
Timings for 3072K FFT length (20 cores, 2 workers): 5.41, 5.41 ms. Throughput: 369.54 iter/sec.
Timings for 3072K FFT length (20 cores, 6 workers): 18.44, 17.31, 11.57, 11.79, 19.59, 16.38 ms. Throughput: 395.31 iter/sec.
Timings for 3072K FFT length (20 cores, 20 workers): 62.78, 26.01, 63.60, 56.58, 67.10, 68.04, 68.02, 26.69, 66.81, 68.02, 68.14, 49.76, 36.64, 62.25, 31.23, 46.76, 52.42, 46.79, 61.31, 68.11 ms. Throughput: 402.18 iter/sec.
Timings for 3200K FFT length (20 cores, 1 worker): 5.88 ms. Throughput: 169.94 iter/sec.
Timings for 3200K FFT length (20 cores, 2 workers): 5.41, 6.12 ms. Throughput: 348.38 iter/sec.
Timings for 3200K FFT length (20 cores, 6 workers): 14.67, 15.56, 13.00, 16.57, 14.94, 11.97 ms. Throughput: 420.22 iter/sec.
Timings for 3200K FFT length (20 cores, 20 workers): 30.51, 46.17, 54.46, 39.68, 38.83, 54.05, 54.58, 54.56, 54.89, 46.56, 55.20, 54.06, 55.14, 55.12, 52.44, 54.26, 53.96, 54.72, 27.76, 28.53 ms. Throughput: 436.89 iter/sec.
Timings for 3360K FFT length (20 cores, 1 worker): 3.65 ms. Throughput: 273.63 iter/sec.
Timings for 3360K FFT length (20 cores, 2 workers): 5.39, 5.39 ms. Throughput: 370.87 iter/sec.
Timings for 3360K FFT length (20 cores, 6 workers): 16.19, 15.84, 13.75, 14.42, 17.07, 14.15 ms. Throughput: 396.18 iter/sec.
Timings for 3360K FFT length (20 cores, 20 workers): 31.55, 58.57, 58.84, 55.16, 58.61, 59.27, 58.73, 29.12, 54.68, 59.17, 58.91, 47.68, 58.45, 58.94, 34.64, 53.75, 51.15, 58.31, 31.31, 59.16 ms. Throughput: 409.43 iter/sec.
Timings for 3456K FFT length (20 cores, 1 worker): 4.10 ms. Throughput: 244.11 iter/sec.
[Wed Nov 29 14:51:09 2017]
Timings for 3456K FFT length (20 cores, 2 workers): 5.92, 5.87 ms. Throughput: 339.44 iter/sec.
Timings for 3456K FFT length (20 cores, 6 workers): 19.57, 17.84, 13.56, 16.80, 18.63, 14.50 ms. Throughput: 363.07 iter/sec.
Timings for 3456K FFT length (20 cores, 20 workers): 65.46, 61.70, 63.60, 65.22, 66.48, 33.04, 52.40, 66.38, 34.28, 66.15, 66.54, 39.14, 64.78, 66.58, 64.80, 62.08, 64.60, 46.17, 65.94, 31.01 ms. Throughput: 373.41 iter/sec.
Timings for 3584K FFT length (20 cores, 1 worker): 4.16 ms. Throughput: 240.15 iter/sec.
Timings for 3584K FFT length (20 cores, 2 workers): 6.72, 6.71 ms. Throughput: 297.75 iter/sec.
Timings for 3584K FFT length (20 cores, 6 workers): 20.76, 22.58, 14.98, 16.59, 24.33, 16.51 ms. Throughput: 321.14 iter/sec.
Timings for 3584K FFT length (20 cores, 20 workers): 76.80, 75.28, 75.80, 80.42, 72.56, 81.19, 72.34, 33.61, 81.32, 32.69, 81.03, 80.52, 73.16, 77.12, 76.79, 33.11, 64.69, 79.31, 38.12, 75.66 ms. Throughput: 326.63 iter/sec.
Timings for 3840K FFT length (20 cores, 1 worker): 5.41 ms. Throughput: 185.01 iter/sec.
Timings for 3840K FFT length (20 cores, 2 workers): 6.59, 6.56 ms. Throughput: 304.39 iter/sec.
Timings for 3840K FFT length (20 cores, 6 workers): 17.96, 20.20, 16.72, 15.19, 23.88, 17.07 ms. Throughput: 331.30 iter/sec.
Timings for 3840K FFT length (20 cores, 20 workers): 71.68, 39.16, 53.96, 71.43, 70.50, 53.34, 71.57, 49.44, 71.82, 56.43, 68.67, 68.87, 40.21, 70.96, 71.39, 71.04, 40.10, 44.88, 71.68, 71.31 ms. Throughput: 342.12 iter/sec.
Timings for 4032K FFT length (20 cores, 1 worker): 4.73 ms. Throughput: 211.54 iter/sec.
Timings for 4032K FFT length (20 cores, 2 workers): 6.99, 6.98 ms. Throughput: 286.28 iter/sec.
Timings for 4032K FFT length (20 cores, 6 workers): 16.60, 25.97, 18.07, 19.64, 19.20, 19.79 ms. Throughput: 307.60 iter/sec.
Timings for 4032K FFT length (20 cores, 20 workers): 76.88, 79.40, 79.36, 36.92, 76.59, 60.98, 63.29, 47.68, 78.14, 78.19, 48.42, 57.99, 78.47, 61.37, 78.25, 62.23, 44.51, 79.94, 77.13, 78.79 ms. Throughput: 313.53 iter/sec.
Timings for 4096K FFT length (20 cores, 1 worker): 5.18 ms. Throughput: 193.18 iter/sec.
Timings for 4096K FFT length (20 cores, 2 workers): 7.31, 7.29 ms. Throughput: 274.03 iter/sec.
Timings for 4096K FFT length (20 cores, 6 workers): 22.95, 20.14, 18.22, 22.82, 22.11, 16.91 ms. Throughput: 296.26 iter/sec.
[Wed Nov 29 14:56:14 2017]
Timings for 4096K FFT length (20 cores, 20 workers): 79.73, 79.14, 77.49, 78.53, 79.39, 39.13, 39.36, 79.21, 79.83, 79.68, 71.10, 66.34, 59.75, 70.98, 79.29, 55.76, 79.08, 40.79, 79.47, 78.83 ms. Throughput: 305.01 iter/sec.
Timings for 4480K FFT length (20 cores, 1 worker): 5.38 ms. Throughput: 185.84 iter/sec.
Timings for 4480K FFT length (20 cores, 2 workers): 7.49, 7.47 ms. Throughput: 267.44 iter/sec.
Timings for 4480K FFT length (20 cores, 6 workers): 25.92, 18.35, 20.65, 20.54, 24.16, 19.28 ms. Throughput: 283.43 iter/sec.
Timings for 4480K FFT length (20 cores, 20 workers): 41.13, 83.66, 41.92, 82.51, 83.57, 81.01, 83.30, 83.00, 83.33, 83.33, 48.19, 83.46, 83.62, 82.28, 82.34, 83.66, 83.56, 69.19, 74.60, 40.70 ms. Throughput: 289.94 iter/sec.
Timings for 4608K FFT length (20 cores, 1 worker): 5.61 ms. Throughput: 178.17 iter/sec.
Timings for 4608K FFT length (20 cores, 2 workers): 7.90, 7.89 ms. Throughput: 253.43 iter/sec.
Timings for 4608K FFT length (20 cores, 6 workers): 25.14, 22.88, 19.34, 23.64, 25.61, 18.41 ms. Throughput: 270.86 iter/sec.
Timings for 4608K FFT length (20 cores, 20 workers): 86.80, 86.17, 87.27, 86.26, 88.85, 42.12, 86.63, 79.47, 44.13, 88.85, 88.15, 87.34, 87.18, 42.15, 86.41, 88.15, 41.88, 86.70, 86.29, 86.48 ms. Throughput: 278.69 iter/sec.
Timings for 4800K FFT length (20 cores, 1 worker): 5.66 ms. Throughput: 176.62 iter/sec.
Timings for 4800K FFT length (20 cores, 2 workers): 8.23, 8.19 ms. Throughput: 243.52 iter/sec.
Timings for 4800K FFT length (20 cores, 6 workers): 25.57, 26.91, 18.84, 23.79, 23.07, 22.78 ms. Throughput: 258.64 iter/sec.
Timings for 4800K FFT length (20 cores, 20 workers): 93.11, 90.29, 91.82, 47.44, 59.41, 94.25, 50.78, 92.35, 92.56, 85.95, 59.33, 90.55, 94.93, 42.52, 90.34, 92.09, 91.99, 94.98, 90.94, 54.14 ms. Throughput: 268.94 iter/sec.
Timings for 5120K FFT length (20 cores, 1 worker): 6.47 ms. Throughput: 154.51 iter/sec.
Timings for 5120K FFT length (20 cores, 2 workers): 9.21, 9.16 ms. Throughput: 217.75 iter/sec.
Timings for 5120K FFT length (20 cores, 6 workers): 30.00, 30.28, 20.17, 34.15, 22.34, 23.50 ms. Throughput: 232.52 iter/sec.
Timings for 5120K FFT length (20 cores, 20 workers): 49.59, 101.60, 101.01, 100.88, 101.62, 99.63, 98.00, 100.16, 48.96, 101.52, 99.51, 91.20, 49.11, 101.38, 96.45, 101.02, 100.16, 100.72, 50.93, 101.14 ms. Throughput: 241.10 iter/sec.
Timings for 5376K FFT length (20 cores, 1 worker): 6.53 ms. Throughput: 153.05 iter/sec.
[Wed Nov 29 15:01:24 2017]
Timings for 5376K FFT length (20 cores, 2 workers): 9.34, 9.31 ms. Throughput: 214.41 iter/sec.
Timings for 5376K FFT length (20 cores, 6 workers): 25.42, 30.07, 24.28, 21.27, 34.93, 25.79 ms. Throughput: 228.21 iter/sec.
Timings for 5376K FFT length (20 cores, 20 workers): 69.55, 77.28, 96.74, 104.01, 102.45, 58.99, 74.29, 102.78, 103.01, 94.24, 102.66, 101.69, 103.75, 71.43, 88.18, 57.02, 63.89, 103.12, 90.30, 103.89 ms. Throughput: 235.63 iter/sec.
Timings for 5760K FFT length (20 cores, 1 worker): 7.09 ms. Throughput: 141.02 iter/sec.
Timings for 5760K FFT length (20 cores, 2 workers): 9.66, 9.61 ms. Throughput: 207.63 iter/sec.
Timings for 5760K FFT length (20 cores, 6 workers): 31.77, 30.04, 21.98, 32.62, 23.01, 27.46 ms. Throughput: 220.78 iter/sec.
Timings for 5760K FFT length (20 cores, 20 workers): 63.67, 107.85, 107.65, 61.85, 106.82, 65.99, 100.42, 108.34, 108.39, 105.96, 106.88, 107.29, 107.29, 107.88, 86.25, 109.04, 107.83, 53.38, 109.05, 56.44 ms. Throughput: 225.73 iter/sec.
Timings for 6144K FFT length (20 cores, 1 worker): 7.70 ms. Throughput: 129.90 iter/sec.
Timings for 6144K FFT length (20 cores, 2 workers): 11.12, 11.12 ms. Throughput: 179.86 iter/sec.
Timings for 6144K FFT length (20 cores, 6 workers): 28.95, 40.99, 26.71, 41.50, 35.84, 22.80 ms. Throughput: 192.23 iter/sec.
Timings for 6144K FFT length (20 cores, 20 workers): 66.36, 125.75, 124.54, 83.76, 123.24, 123.94, 74.74, 124.56, 125.17, 100.65, 130.60, 125.03, 105.78, 123.31, 122.69, 65.99, 124.44, 130.60, 59.97, 110.87 ms. Throughput: 196.41 iter/sec.
Timings for 6400K FFT length (20 cores, 1 worker): 7.74 ms. Throughput: 129.23 iter/sec.
Timings for 6400K FFT length (20 cores, 2 workers): 11.50, 11.42 ms. Throughput: 174.47 iter/sec.
Timings for 6400K FFT length (20 cores, 6 workers): 43.84, 33.20, 25.16, 34.96, 33.42, 28.68 ms. Throughput: 186.05 iter/sec.
Timings for 6400K FFT length (20 cores, 20 workers): 67.83, 59.96, 129.29, 122.52, 133.75, 132.97, 126.52, 126.06, 133.96, 100.43, 127.68, 58.84, 135.45, 133.65, 130.27, 133.77, 135.44, 129.95, 128.17, 58.70 ms. Throughput: 190.33 iter/sec.
Timings for 6720K FFT length (20 cores, 1 worker): 8.31 ms. Throughput: 120.40 iter/sec.
Timings for 6720K FFT length (20 cores, 2 workers): 11.53, 11.37 ms. Throughput: 174.72 iter/sec.
[Wed Nov 29 15:06:26 2017]
Timings for 6720K FFT length (20 cores, 6 workers): 27.68, 43.63, 30.57, 42.24, 32.78, 25.91 ms. Throughput: 184.54 iter/sec.
Timings for 6720K FFT length (20 cores, 20 workers): 129.37, 129.04, 115.29, 130.64, 68.38, 129.43, 119.89, 62.04, 128.76, 127.72, 129.64, 126.78, 127.53, 128.82, 76.40, 61.43, 128.18, 129.49, 101.97, 111.66 ms. Throughput: 189.07 iter/sec.
Timings for 6912K FFT length (20 cores, 1 worker): 8.58 ms. Throughput: 116.49 iter/sec.
Timings for 6912K FFT length (20 cores, 2 workers): 13.05, 12.98 ms. Throughput: 153.71 iter/sec.
Timings for 6912K FFT length (20 cores, 6 workers): 35.86, 37.57, 37.19, 45.06, 46.26, 26.01 ms. Throughput: 163.65 iter/sec.
Timings for 6912K FFT length (20 cores, 20 workers): 155.31, 65.91, 158.20, 158.95, 160.21, 158.13, 160.28, 158.85, 155.31, 66.22, 73.09, 152.40, 150.07, 152.12, 155.50, 123.06, 150.83, 75.56, 116.95, 157.62 ms. Throughput: 163.66 iter/sec.
Timings for 7168K FFT length (20 cores, 1 worker): 8.95 ms. Throughput: 111.73 iter/sec.
Timings for 7168K FFT length (20 cores, 2 workers): 13.20, 13.23 ms. Throughput: 151.34 iter/sec.
Timings for 7168K FFT length (20 cores, 6 workers): 50.22, 34.11, 32.42, 37.41, 37.99, 36.57 ms. Throughput: 160.47 iter/sec.
Timings for 7168K FFT length (20 cores, 20 workers): 69.52, 151.76, 151.72, 151.10, 153.70, 152.53, 151.18, 69.46, 153.64, 152.80, 149.58, 147.73, 144.40, 148.19, 149.92, 149.94, 68.87, 72.03, 140.11, 148.58 ms. Throughput: 164.05 iter/sec.
Timings for 7680K FFT length (20 cores, 1 worker): 9.23 ms. Throughput: 108.37 iter/sec.
Timings for 7680K FFT length (20 cores, 2 workers): 14.34, 14.34 ms. Throughput: 139.47 iter/sec.
Timings for 7680K FFT length (20 cores, 6 workers): 50.29, 31.75, 43.11, 54.94, 33.58, 39.34 ms. Throughput: 147.98 iter/sec.
Timings for 7680K FFT length (20 cores, 20 workers): 167.94, 71.75, 93.29, 109.12, 179.30, 179.36, 176.81, 171.65, 166.21, 176.98, 91.27, 102.03, 179.94, 163.99, 167.54, 168.75, 168.41, 163.89, 152.53, 81.45 ms. Throughput: 149.26 iter/sec.
Timings for 8000K FFT length (20 cores, 1 worker): 9.82 ms. Throughput: 101.84 iter/sec.
Timings for 8000K FFT length (20 cores, 2 workers): 14.04, 13.93 ms. Throughput: 143.02 iter/sec.
Timings for 8000K FFT length (20 cores, 6 workers): 50.13, 49.67, 27.40, 49.86, 37.47, 34.30 ms. Throughput: 152.48 iter/sec.
[Wed Nov 29 15:11:36 2017]
Timings for 8000K FFT length (20 cores, 20 workers): 154.19, 72.59, 158.16, 163.38, 72.88, 155.24, 151.27, 148.87, 163.62, 159.39, 99.64, 155.69, 159.07, 98.74, 158.52, 163.94, 164.93, 164.83, 89.34, 113.56 ms. Throughput: 155.99 iter/sec.
Timings for 8192K FFT length (20 cores, 1 worker): 10.50 ms. Throughput: 95.27 iter/sec.
Timings for 8192K FFT length (20 cores, 2 workers): 15.86, 15.82 ms. Throughput: 126.27 iter/sec.
Timings for 8192K FFT length (20 cores, 6 workers): 46.48, 43.33, 46.01, 60.91, 42.41, 37.49 ms. Throughput: 133.00 iter/sec.
Timings for 8192K FFT length (20 cores, 20 workers): 184.80, 93.81, 182.43, 187.74, 145.93, 113.38, 189.01, 148.27, 187.70, 139.96, 187.41, 182.88, 189.76, 183.01, 187.30, 87.91, 166.80, 135.81, 96.36, 189.81 ms. Throughput: 134.32 iter/sec.
</snip>
#8
P90 years forever!
Aug 2002
Yeehaw, FL
16176₈ Posts |
Quote:
Yes, it simply is a case of maximizing the throughput (iter/sec) value, which in your case seems heavily skewed toward one core per worker. I'd try benching the 5, 10, and 20 worker cases just to be sure (I previously suggested 6 and 12 because I thought you had a 24-core case). Assuming the 20-worker benchmark maintains the best throughput, the only question remaining is: do you have the patience to wait for 20 workers to plod along at a slow pace before getting any results? GIMPS is better off with 4 completed results after a week's time than with 20 abandoned, partially completed results in a week's time.
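To make "maximize the iter/sec value" concrete, here is a sketch that pulls the worker count and throughput out of every "Timings" line and reports the best worker count per FFT length (Python; "results.txt" is an arbitrary name for a file holding the pasted benchmark output):
Code:
import re
from collections import defaultdict

# Matches e.g. "Timings for 4096K FFT length (20 cores, 6 workers): ... Throughput: 296.26 iter/sec."
LINE = re.compile(r"Timings for (\d+K) FFT length \(\d+ cores?, (\d+) workers?\):"
                  r".*?Throughput: ([\d.]+) iter/sec")

def best_per_fft(text):
    best = defaultdict(lambda: (0.0, 0))   # fft -> (best throughput, workers)
    for fft, workers, thr in LINE.findall(text):
        if float(thr) > best[fft][0]:
            best[fft] = (float(thr), int(workers))
    return best

if __name__ == "__main__":
    with open("results.txt") as f:          # the saved benchmark text
        for fft, (thr, workers) in best_per_fft(f.read()).items():
            print(f"{fft:>6}: {workers:2d} workers is best ({thr:.2f} iter/sec)")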
#9
Feb 2014
2²·3·7 Posts |
Quote:
I'm just trying to figure out how to read the results output and decide which setup is best.
#10
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2·3⁴·37 Posts |
Quote:
Last fiddled with by retina on 2017-11-30 at 00:54 |
#11
Feb 2014
2²·3·7 Posts |
Quote: