mersenneforum.org  

2018-12-11, 18:16   #1
simon389
 
29.5 build 5 beta with AVX512 optimizations shows a 15% speed increase

Great results with the AVX-512 optimizations in the build from this link: it shows a 15% improvement over stock benchmarks. The system is a Windows 10 machine with an i7-9800X, 3600 MHz DDR4 (19-20-20-40, XMP), and an EVGA X299 Micro motherboard.

Throughput:

Code:
Intel(R) Core(TM) i7-9800X CPU @ 3.80GHz
CPU speed: 3796.93 MHz, 8 hyperthreaded cores
CPU features: Prefetchw, SSE, SSE2, SSE4, AVX, AVX2, FMA, AVX512F
L1 cache size: 32 KB
L2 cache size: 1 MB, L3 cache size: 16896 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Machine topology as determined by hwloc library:
 Machine#0 (total=63709080KB, Backend=Windows, hwlocVersion=2.0.1, ProcessName=prime95.exe)
  Package (total=63709080KB, CPUVendor=GenuineIntel, CPUFamilyNumber=6, CPUModelNumber=85, CPUModel="Intel(R) Core(TM) i7-9800X CPU @ 3.80GHz", CPUStepping=4)
    L3 (size=16896KB, linesize=64, ways=11, Inclusive=0)
      L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00000003)
            PU#0 (cpuset: 0x00000001)
            PU#1 (cpuset: 0x00000002)
      L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x0000000c)
            PU#2 (cpuset: 0x00000004)
            PU#3 (cpuset: 0x00000008)
      L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00000030)
            PU#4 (cpuset: 0x00000010)
            PU#5 (cpuset: 0x00000020)
      L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x000000c0)
            PU#6 (cpuset: 0x00000040)
            PU#7 (cpuset: 0x00000080)
      L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00000300)
            PU#8 (cpuset: 0x00000100)
            PU#9 (cpuset: 0x00000200)
      L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00000c00)
            PU#10 (cpuset: 0x00000400)
            PU#11 (cpuset: 0x00000800)
      L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00003000)
            PU#12 (cpuset: 0x00001000)
            PU#13 (cpuset: 0x00002000)
      L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x0000c000)
            PU#14 (cpuset: 0x00004000)
            PU#15 (cpuset: 0x00008000)
Prime95 64-bit version 29.5, RdtscTiming=1
Timings for 2048K FFT length (8 cores, 1 worker):  0.73 ms.  Throughput: 1368.71 iter/sec.
Timings for 2048K FFT length (8 cores, 8 workers):  7.82,  7.76,  7.73,  7.16,  7.73,  7.77,  7.77,  7.24 ms.  Throughput: 1050.64 iter/sec.
Timings for 2100K FFT length (8 cores, 1 worker):  0.86 ms.  Throughput: 1163.83 iter/sec.
Timings for 2100K FFT length (8 cores, 8 workers):  8.01,  8.01,  8.08,  7.99,  8.01,  8.09,  8.01,  8.16 ms.  Throughput: 994.68 iter/sec.
Timings for 2160K FFT length (8 cores, 1 worker):  0.93 ms.  Throughput: 1079.86 iter/sec.
Timings for 2160K FFT length (8 cores, 8 workers):  8.31,  8.34,  8.32,  8.39,  8.54,  8.32,  8.50,  8.33 ms.  Throughput: 954.54 iter/sec.
Timings for 2240K FFT length (8 cores, 1 worker):  0.84 ms.  Throughput: 1193.63 iter/sec.
Timings for 2240K FFT length (8 cores, 8 workers):  8.61,  8.61,  8.55,  7.98,  8.55,  8.65,  8.52,  8.08 ms.  Throughput: 948.24 iter/sec.
Timings for 2304K FFT length (8 cores, 1 worker):  0.87 ms.  Throughput: 1143.86 iter/sec.
Timings for 2304K FFT length (8 cores, 8 workers):  9.00,  8.36,  8.96,  8.95,  8.96,  8.95,  8.95,  8.32 ms.  Throughput: 909.37 iter/sec.
Timings for 2400K FFT length (8 cores, 1 worker):  1.11 ms.  Throughput: 898.90 iter/sec.
Timings for 2400K FFT length (8 cores, 8 workers):  9.49,  9.36,  9.29,  9.28,  9.41,  9.40,  9.29,  9.58 ms.  Throughput: 852.29 iter/sec.
Timings for 2520K FFT length (8 cores, 1 worker):  1.05 ms.  Throughput: 952.29 iter/sec.
Timings for 2520K FFT length (8 cores, 8 workers):  9.83,  9.83,  9.84,  9.94,  9.83,  9.93,  9.82,  9.84 ms.  Throughput: 811.58 iter/sec.
Timings for 2560K FFT length (8 cores, 1 worker):  1.09 ms.  Throughput: 920.54 iter/sec.
Timings for 2560K FFT length (8 cores, 8 workers):  9.99,  9.88,  9.89,  9.97,  9.91,  9.88,  9.82,  9.90 ms.  Throughput: 807.69 iter/sec.
Timings for 2592K FFT length (8 cores, 1 worker):  1.08 ms.  Throughput: 925.90 iter/sec.
[Tue Dec 11 09:02:23 2018]
Timings for 2592K FFT length (8 cores, 8 workers): 10.06, 10.06, 10.06, 10.23, 10.05,  9.98, 10.21, 10.12 ms.  Throughput: 792.41 iter/sec.
Timings for 2688K FFT length (8 cores, 1 worker):  1.12 ms.  Throughput: 892.16 iter/sec.
Timings for 2688K FFT length (8 cores, 8 workers): 10.46, 10.46, 10.39, 10.39, 10.41, 10.31, 10.45, 10.53 ms.  Throughput: 767.46 iter/sec.
Timings for 2744K FFT length (8 cores, 1 worker):  1.08 ms.  Throughput: 922.34 iter/sec.
Timings for 2744K FFT length (8 cores, 8 workers): 10.68, 10.56, 10.54, 10.46, 10.58, 10.62, 10.44, 10.70 ms.  Throughput: 756.86 iter/sec.
Timings for 2800K FFT length (8 cores, 1 worker):  1.20 ms.  Throughput: 830.12 iter/sec.
Timings for 2800K FFT length (8 cores, 8 workers): 11.31, 11.11, 11.51, 11.17, 11.23, 11.19, 11.43, 11.11 ms.  Throughput: 710.67 iter/sec.
Timings for 2880K FFT length (8 cores, 1 worker):  1.06 ms.  Throughput: 946.90 iter/sec.
Timings for 2880K FFT length (8 cores, 8 workers): 11.12, 10.67, 11.13, 10.69, 11.14, 11.05, 11.06, 11.14 ms.  Throughput: 727.43 iter/sec.
Timings for 2940K FFT length (8 cores, 1 worker):  1.23 ms.  Throughput: 810.87 iter/sec.
Timings for 2940K FFT length (8 cores, 8 workers): 11.69, 11.99, 11.64, 11.78, 11.76, 11.87, 11.71, 11.74 ms.  Throughput: 679.54 iter/sec.
Timings for 3000K FFT length (8 cores, 1 worker):  1.38 ms.  Throughput: 724.27 iter/sec.
Timings for 3000K FFT length (8 cores, 8 workers): 12.04, 11.77, 11.73, 11.91, 11.87, 11.74, 11.77, 11.92 ms.  Throughput: 675.47 iter/sec.
Timings for 3072K FFT length (8 cores, 1 worker):  1.08 ms.  Throughput: 929.90 iter/sec.
Timings for 3072K FFT length (8 cores, 8 workers): 11.97, 11.92, 11.96, 11.90, 11.43, 11.90, 11.48, 11.96 ms.  Throughput: 677.41 iter/sec.
Timings for 3136K FFT length (8 cores, 1 worker):  1.22 ms.  Throughput: 821.07 iter/sec.
Timings for 3136K FFT length (8 cores, 8 workers): 12.89, 13.00, 12.14, 13.17, 13.10, 13.12, 13.10, 13.39 ms.  Throughput: 616.33 iter/sec.
Timings for 3200K FFT length (8 cores, 1 worker):  1.55 ms.  Throughput: 644.37 iter/sec.
Timings for 3200K FFT length (8 cores, 8 workers): 12.94, 12.90, 12.85, 12.89, 12.90, 12.80, 12.87, 12.84 ms.  Throughput: 621.46 iter/sec.
Timings for 3360K FFT length (8 cores, 1 worker):  1.25 ms.  Throughput: 797.97 iter/sec.
[Tue Dec 11 09:07:33 2018]
Timings for 3360K FFT length (8 cores, 8 workers): 13.38, 13.22, 13.31, 13.22, 12.70, 13.22, 13.38, 12.86 ms.  Throughput: 608.01 iter/sec.
Timings for 3456K FFT length (8 cores, 1 worker):  1.32 ms.  Throughput: 757.13 iter/sec.
Timings for 3456K FFT length (8 cores, 8 workers): 13.81, 13.26, 13.37, 13.76, 13.77, 13.81, 13.76, 13.91 ms.  Throughput: 584.89 iter/sec.
Timings for 3528K FFT length (8 cores, 1 worker):  1.48 ms.  Throughput: 673.91 iter/sec.
Timings for 3528K FFT length (8 cores, 8 workers): 14.41, 14.04, 14.25, 14.15, 14.19, 14.28, 14.05, 14.10 ms.  Throughput: 564.06 iter/sec.
Timings for 3600K FFT length (8 cores, 1 worker):  1.72 ms.  Throughput: 580.65 iter/sec.
Timings for 3600K FFT length (8 cores, 8 workers): 14.58, 14.97, 14.51, 14.69, 14.44, 14.54, 14.48, 14.69 ms.  Throughput: 547.51 iter/sec.
Timings for 3840K FFT length (8 cores, 1 worker):  1.83 ms.  Throughput: 547.43 iter/sec.
Timings for 3840K FFT length (8 cores, 8 workers): 15.45, 15.45, 15.46, 15.38, 15.37, 15.44, 15.46, 15.65 ms.  Throughput: 517.52 iter/sec.
Timings for 4032K FFT length (8 cores, 1 worker):  1.61 ms.  Throughput: 620.34 iter/sec.
Timings for 4032K FFT length (8 cores, 8 workers): 16.66, 16.42, 15.97, 15.99, 16.53, 16.47, 16.42, 16.66 ms.  Throughput: 488.19 iter/sec.
Timings for 4200K FFT length (8 cores, 1 worker):  1.89 ms.  Throughput: 529.11 iter/sec.
Timings for 4200K FFT length (8 cores, 8 workers): 16.99, 16.57, 16.60, 16.81, 16.75, 16.71, 16.80, 16.90 ms.  Throughput: 477.18 iter/sec.
Timings for 4320K FFT length (8 cores, 1 worker):  2.07 ms.  Throughput: 483.48 iter/sec.
Timings for 4320K FFT length (8 cores, 8 workers): 17.68, 17.42, 18.06, 17.43, 17.54, 17.57, 17.52, 17.52 ms.  Throughput: 454.74 iter/sec.
Timings for 4480K FFT length (8 cores, 1 worker):  2.11 ms.  Throughput: 473.51 iter/sec.
Timings for 4480K FFT length (8 cores, 8 workers): 18.14, 18.16, 18.20, 18.01, 18.10, 18.16, 18.02, 18.25 ms.  Throughput: 441.25 iter/sec.
Timings for 4608K FFT length (8 cores, 1 worker):  1.96 ms.  Throughput: 510.11 iter/sec.
Timings for 4608K FFT length (8 cores, 8 workers): 18.88, 19.71, 19.93, 19.71, 19.71, 19.92, 19.96, 19.31 ms.  Throughput: 407.45 iter/sec.
Timings for 4704K FFT length (8 cores, 1 worker):  1.99 ms.  Throughput: 501.69 iter/sec.
[Tue Dec 11 09:12:47 2018]
Timings for 4704K FFT length (8 cores, 8 workers): 19.88, 20.07, 19.45, 20.10, 20.23, 20.07, 20.10, 20.37 ms.  Throughput: 399.39 iter/sec.
Timings for 4800K FFT length (8 cores, 1 worker):  2.54 ms.  Throughput: 393.58 iter/sec.
Timings for 4800K FFT length (8 cores, 8 workers): 21.34, 21.35, 21.40, 21.20, 21.33, 21.03, 21.30, 21.28 ms.  Throughput: 375.96 iter/sec.
Timings for 5040K FFT length (8 cores, 1 worker):  2.12 ms.  Throughput: 471.25 iter/sec.
Timings for 5040K FFT length (8 cores, 8 workers): 19.58, 19.65, 19.39, 19.50, 19.66, 19.41, 19.51, 19.77 ms.  Throughput: 409.02 iter/sec.
Timings for 5120K FFT length (8 cores, 1 worker):  2.26 ms.  Throughput: 442.98 iter/sec.
Timings for 5120K FFT length (8 cores, 8 workers): 20.07, 20.10, 20.03, 20.16, 20.23, 20.12, 20.09, 20.37 ms.  Throughput: 397.09 iter/sec.
Timings for 5184K FFT length (8 cores, 1 worker):  2.46 ms.  Throughput: 407.15 iter/sec.
Timings for 5184K FFT length (8 cores, 8 workers): 21.61, 21.23, 21.05, 21.20, 21.22, 21.21, 21.31, 21.14 ms.  Throughput: 376.53 iter/sec.
Timings for 5376K FFT length (8 cores, 1 worker):  2.34 ms.  Throughput: 426.51 iter/sec.
Timings for 5376K FFT length (8 cores, 8 workers): 21.35, 21.18, 21.12, 21.16, 21.03, 21.16, 21.14, 21.18 ms.  Throughput: 378.02 iter/sec.
Timings for 5760K FFT length (8 cores, 1 worker):  3.04 ms.  Throughput: 328.89 iter/sec.
Timings for 5760K FFT length (8 cores, 8 workers): 25.89, 25.89, 25.33, 26.24, 25.57, 25.65, 25.34, 25.45 ms.  Throughput: 311.65 iter/sec.
Timings for 6048K FFT length (8 cores, 1 worker):  2.70 ms.  Throughput: 370.33 iter/sec.
Timings for 6048K FFT length (8 cores, 8 workers): 24.07, 23.91, 23.74, 23.80, 24.10, 23.89, 23.79, 23.75 ms.  Throughput: 334.99 iter/sec.
Timings for 6144K FFT length (8 cores, 1 worker):  2.78 ms.  Throughput: 360.05 iter/sec.
Timings for 6144K FFT length (8 cores, 8 workers): 24.60, 24.37, 24.10, 24.35, 24.51, 24.10, 24.34, 24.65 ms.  Throughput: 328.19 iter/sec.
Timings for 6272K FFT length (8 cores, 1 worker):  2.81 ms.  Throughput: 356.19 iter/sec.
Timings for 6272K FFT length (8 cores, 8 workers): 25.14, 24.75, 24.55, 24.79, 24.79, 24.82, 24.63, 24.63 ms.  Throughput: 323.07 iter/sec.
Timings for 6400K FFT length (8 cores, 1 worker):  2.94 ms.  Throughput: 339.68 iter/sec.
[Tue Dec 11 09:18:02 2018]
Timings for 6400K FFT length (8 cores, 8 workers): 26.10, 25.50, 25.40, 25.67, 25.86, 25.60, 25.62, 25.84 ms.  Throughput: 311.33 iter/sec.
Timings for 6720K FFT length (8 cores, 1 worker):  3.10 ms.  Throughput: 322.12 iter/sec.
Timings for 6720K FFT length (8 cores, 8 workers): 27.45, 27.08, 26.78, 26.99, 27.03, 27.03, 26.87, 27.23 ms.  Throughput: 295.68 iter/sec.
Timings for 7056K FFT length (8 cores, 1 worker):  3.25 ms.  Throughput: 307.78 iter/sec.
Timings for 7056K FFT length (8 cores, 8 workers): 28.96, 28.43, 28.47, 28.30, 28.48, 28.09, 28.09, 28.51 ms.  Throughput: 281.54 iter/sec.
Timings for 7168K FFT length (8 cores, 1 worker):  3.30 ms.  Throughput: 303.35 iter/sec.
Timings for 7168K FFT length (8 cores, 8 workers): 28.69, 28.57, 28.55, 28.27, 28.56, 28.51, 28.23, 28.67 ms.  Throughput: 280.66 iter/sec.
Timings for 7200K FFT length (8 cores, 1 worker):  3.26 ms.  Throughput: 307.10 iter/sec.
Timings for 7200K FFT length (8 cores, 8 workers): 28.65, 28.16, 28.30, 27.88, 27.88, 28.27, 28.30, 28.49 ms.  Throughput: 283.28 iter/sec.
Timings for 7680K FFT length (8 cores, 1 worker):  3.52 ms.  Throughput: 283.95 iter/sec.
Timings for 7680K FFT length (8 cores, 8 workers): 30.58, 30.44, 30.51, 30.48, 29.96, 29.90, 30.47, 30.76 ms.  Throughput: 263.29 iter/sec.
Timings for 8064K FFT length (8 cores, 1 worker):  3.91 ms.  Throughput: 256.04 iter/sec.
Timings for 8064K FFT length (8 cores, 8 workers): 33.84, 33.26, 33.58, 33.39, 33.27, 33.16, 33.07, 33.49 ms.  Throughput: 239.65 iter/sec.
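
For anyone who wants to compare these numbers against their own runs, here is a rough Python sketch that parses the "Timings for ..." lines and recomputes the throughput figure. It assumes the throughput is simply the sum of 1000 / (ms per iteration) over all workers, which matches the numbers above to within rounding but is my reading of the output, not something taken from the Prime95 source. The names `parse_line` and `throughput_from_times` are made up for this example.

```python
import re

# Two lines copied verbatim from the benchmark output above.
LOG = """\
Timings for 2048K FFT length (8 cores, 1 worker):  0.73 ms.  Throughput: 1368.71 iter/sec.
Timings for 2048K FFT length (8 cores, 8 workers):  7.82,  7.76,  7.73,  7.16,  7.73,  7.77,  7.77,  7.24 ms.  Throughput: 1050.64 iter/sec.
"""

# Pattern for one throughput line: FFT size, core count, worker count,
# the comma-separated per-worker timings, and the reported throughput.
LINE_RE = re.compile(
    r"Timings for (\d+)K FFT length \((\d+) cores?, (\d+) workers?\):"
    r"\s*([\d.,\s]+?) ms\.\s+Throughput: ([\d.]+) iter/sec\."
)


def parse_line(line):
    """Return a dict for a 'Timings for ...' line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "fft_k": int(m.group(1)),
        "cores": int(m.group(2)),
        "workers": int(m.group(3)),
        "times_ms": [float(t) for t in m.group(4).split(",")],
        "reported": float(m.group(5)),
    }


def throughput_from_times(times_ms):
    """Assumed formula: each worker contributes 1000 / ms iterations per second."""
    return sum(1000.0 / t for t in times_ms)


if __name__ == "__main__":
    for line in LOG.splitlines():
        rec = parse_line(line)
        if rec is None:
            continue  # skip timestamps and other non-timing lines
        recomputed = throughput_from_times(rec["times_ms"])
        print(
            f"{rec['fft_k']}K, {rec['workers']} worker(s): "
            f"reported {rec['reported']:.2f}, recomputed {recomputed:.2f} iter/sec"
        )
```

The recomputed values differ slightly from the reported ones because the log rounds each timing to two decimals. To get a percentage difference between two builds, run the same parser over both logs and compare the throughput at matching FFT lengths and worker counts.
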
FFT trials:

Code:
Intel(R) Core(TM) i7-9800X CPU @ 3.80GHz
CPU speed: 3799.90 MHz, 8 hyperthreaded cores
CPU features: Prefetchw, SSE, SSE2, SSE4, AVX, AVX2, FMA, AVX512F
L1 cache size: 32 KB
L2 cache size: 1 MB, L3 cache size: 16896 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Machine topology as determined by hwloc library:
 Machine#0 (total=63709080KB, Backend=Windows, hwlocVersion=2.0.1, ProcessName=prime95.exe)
  Package (total=63709080KB, CPUVendor=GenuineIntel, CPUFamilyNumber=6, CPUModelNumber=85, CPUModel="Intel(R) Core(TM) i7-9800X CPU @ 3.80GHz", CPUStepping=4)
    L3 (size=16896KB, linesize=64, ways=11, Inclusive=0)
      L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00000003)
            PU#0 (cpuset: 0x00000001)
            PU#1 (cpuset: 0x00000002)
      L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x0000000c)
            PU#2 (cpuset: 0x00000004)
            PU#3 (cpuset: 0x00000008)
      L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00000030)
            PU#4 (cpuset: 0x00000010)
            PU#5 (cpuset: 0x00000020)
      L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x000000c0)
            PU#6 (cpuset: 0x00000040)
            PU#7 (cpuset: 0x00000080)
      L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00000300)
            PU#8 (cpuset: 0x00000100)
            PU#9 (cpuset: 0x00000200)
      L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00000c00)
            PU#10 (cpuset: 0x00000400)
            PU#11 (cpuset: 0x00000800)
      L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00003000)
            PU#12 (cpuset: 0x00001000)
            PU#13 (cpuset: 0x00002000)
      L2 (size=1024KB, linesize=64, ways=16, Inclusive=0)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x0000c000)
            PU#14 (cpuset: 0x00004000)
            PU#15 (cpuset: 0x00008000)
Prime95 64-bit version 29.5, RdtscTiming=1
Best time for 2048K FFT length: 4.252 ms., avg: 4.281 ms.
Best time for 2100K FFT length: 4.423 ms., avg: 4.453 ms.
Best time for 2160K FFT length: 4.623 ms., avg: 4.645 ms.
Best time for 2240K FFT length: 5.027 ms., avg: 5.054 ms.
Best time for 2304K FFT length: 5.247 ms., avg: 5.289 ms.
Best time for 2400K FFT length: 5.275 ms., avg: 5.325 ms.
Best time for 2520K FFT length: 5.611 ms., avg: 5.638 ms.
Best time for 2560K FFT length: 5.732 ms., avg: 5.761 ms.
Best time for 2592K FFT length: 5.781 ms., avg: 5.831 ms.
Best time for 2688K FFT length: 6.017 ms., avg: 6.062 ms.
Best time for 2744K FFT length: 6.208 ms., avg: 6.238 ms.
Best time for 2800K FFT length: 6.491 ms., avg: 6.546 ms.
Best time for 2880K FFT length: 6.922 ms., avg: 6.959 ms.
Best time for 2940K FFT length: 6.780 ms., avg: 6.811 ms.
Best time for 3000K FFT length: 6.935 ms., avg: 6.998 ms.
Best time for 3072K FFT length: 6.926 ms., avg: 6.972 ms.
Best time for 3136K FFT length: 7.364 ms., avg: 7.426 ms.
Best time for 3200K FFT length: 7.607 ms., avg: 7.656 ms.
Best time for 3360K FFT length: 8.217 ms., avg: 8.268 ms.
Best time for 3456K FFT length: 8.631 ms., avg: 8.686 ms.
Best time for 3528K FFT length: 8.316 ms., avg: 8.394 ms.
Best time for 3600K FFT length: 8.676 ms., avg: 8.722 ms.
Best time for 3840K FFT length: 9.489 ms., avg: 9.536 ms.
Best time for 4032K FFT length: 10.175 ms., avg: 10.259 ms.
Best time for 4200K FFT length: 10.633 ms., avg: 10.710 ms.
Best time for 4320K FFT length: 10.815 ms., avg: 10.916 ms.
Best time for 4480K FFT length: 11.428 ms., avg: 11.520 ms.
Best time for 4608K FFT length: 11.827 ms., avg: 11.898 ms.
Best time for 4704K FFT length: 12.055 ms., avg: 12.115 ms.
Best time for 4800K FFT length: 13.173 ms., avg: 13.282 ms.
Best time for 5040K FFT length: 12.565 ms., avg: 12.626 ms.
Best time for 5120K FFT length: 12.856 ms., avg: 12.937 ms.
Best time for 5184K FFT length: 13.680 ms., avg: 13.743 ms.
Best time for 5376K FFT length: 13.495 ms., avg: 13.544 ms.
Best time for 5760K FFT length: 16.107 ms., avg: 16.170 ms.
Best time for 6048K FFT length: 15.438 ms., avg: 15.514 ms.
Best time for 6144K FFT length: 15.451 ms., avg: 15.547 ms.
Best time for 6272K FFT length: 16.114 ms., avg: 16.167 ms.
Best time for 6400K FFT length: 16.478 ms., avg: 16.524 ms.
Best time for 6720K FFT length: 17.346 ms., avg: 17.428 ms.
Best time for 7056K FFT length: 18.173 ms., avg: 18.240 ms.
Best time for 7168K FFT length: 18.389 ms., avg: 18.466 ms.
Best time for 7200K FFT length: 18.565 ms., avg: 18.661 ms.
Best time for 7680K FFT length: 19.820 ms., avg: 19.919 ms.
Best time for 8064K FFT length: 21.247 ms., avg: 21.336 ms.
Timing FFTs using 2 threads on 2 cores.
Best time for 2048K FFT length: 2.319 ms., avg: 2.342 ms.
Best time for 2100K FFT length: 2.573 ms., avg: 2.590 ms.
Best time for 2160K FFT length: 2.605 ms., avg: 2.631 ms.
Best time for 2240K FFT length: 2.758 ms., avg: 2.782 ms.
Best time for 2304K FFT length: 2.879 ms., avg: 2.908 ms.
Best time for 2400K FFT length: 3.049 ms., avg: 3.075 ms.
Best time for 2520K FFT length: 3.169 ms., avg: 3.203 ms.
Best time for 2560K FFT length: 3.165 ms., avg: 3.181 ms.
Best time for 2592K FFT length: 3.221 ms., avg: 3.242 ms.
Best time for 2688K FFT length: 3.326 ms., avg: 3.350 ms.
Best time for 2744K FFT length: 3.394 ms., avg: 3.412 ms.
Best time for 2800K FFT length: 3.575 ms., avg: 3.603 ms.
Best time for 2880K FFT length: 3.685 ms., avg: 3.708 ms.
Best time for 2940K FFT length: 3.756 ms., avg: 3.781 ms.
Best time for 3000K FFT length: 3.942 ms., avg: 3.971 ms.
Best time for 3072K FFT length: 3.657 ms., avg: 3.680 ms.
Best time for 3136K FFT length: 4.034 ms., avg: 4.059 ms.
Best time for 3200K FFT length: 4.314 ms., avg: 4.343 ms.
Best time for 3360K FFT length: 4.360 ms., avg: 4.391 ms.
Best time for 3456K FFT length: 4.542 ms., avg: 4.569 ms.
Best time for 3528K FFT length: 4.553 ms., avg: 4.576 ms.
Best time for 3600K FFT length: 4.898 ms., avg: 4.935 ms.
Best time for 3840K FFT length: 5.330 ms., avg: 5.372 ms.
Best time for 4032K FFT length: 5.384 ms., avg: 5.403 ms.
Best time for 4200K FFT length: 5.798 ms., avg: 5.840 ms.
Best time for 4320K FFT length: 5.966 ms., avg: 6.023 ms.
Best time for 4480K FFT length: 6.322 ms., avg: 6.367 ms.
Best time for 4608K FFT length: 6.233 ms., avg: 6.265 ms.
Best time for 4704K FFT length: 6.374 ms., avg: 6.411 ms.
Best time for 4800K FFT length: 7.349 ms., avg: 7.403 ms.
Best time for 5040K FFT length: 6.634 ms., avg: 6.662 ms.
Best time for 5120K FFT length: 6.821 ms., avg: 6.871 ms.
Best time for 5184K FFT length: 7.449 ms., avg: 7.491 ms.
Best time for 5376K FFT length: 7.113 ms., avg: 7.151 ms.
Best time for 5760K FFT length: 8.820 ms., avg: 8.887 ms.
Best time for 6048K FFT length: 8.085 ms., avg: 8.132 ms.
Best time for 6144K FFT length: 8.135 ms., avg: 8.179 ms.
Best time for 6272K FFT length: 8.417 ms., avg: 8.457 ms.
Best time for 6400K FFT length: 8.595 ms., avg: 8.641 ms.
Best time for 6720K FFT length: 9.136 ms., avg: 9.177 ms.
Best time for 7056K FFT length: 9.523 ms., avg: 9.583 ms.
Best time for 7168K FFT length: 9.657 ms., avg: 9.693 ms.
Best time for 7200K FFT length: 9.711 ms., avg: 9.768 ms.
Best time for 7680K FFT length: 10.254 ms., avg: 10.321 ms.
Best time for 8064K FFT length: 11.154 ms., avg: 11.222 ms.
Timing FFTs using 3 threads on 3 cores.
Best time for 2048K FFT length: 1.599 ms., avg: 1.625 ms.
Best time for 2100K FFT length: 1.833 ms., avg: 1.845 ms.
Best time for 2160K FFT length: 1.853 ms., avg: 1.875 ms.
Best time for 2240K FFT length: 1.912 ms., avg: 1.927 ms.
Best time for 2304K FFT length: 1.983 ms., avg: 2.001 ms.
Best time for 2400K FFT length: 2.181 ms., avg: 2.208 ms.
Best time for 2520K FFT length: 2.246 ms., avg: 2.273 ms.
Best time for 2560K FFT length: 2.215 ms., avg: 2.232 ms.
Best time for 2592K FFT length: 2.255 ms., avg: 2.277 ms.
Best time for 2688K FFT length: 2.312 ms., avg: 2.330 ms.
Best time for 2744K FFT length: 2.424 ms., avg: 2.447 ms.
Best time for 2800K FFT length: 2.502 ms., avg: 2.522 ms.
Best time for 2880K FFT length: 2.533 ms., avg: 2.559 ms.
Best time for 2940K FFT length: 2.622 ms., avg: 2.641 ms.
Best time for 3000K FFT length: 2.789 ms., avg: 2.807 ms.
Best time for 3072K FFT length: 2.471 ms., avg: 2.493 ms.
Best time for 3136K FFT length: 2.767 ms., avg: 2.783 ms.
Best time for 3200K FFT length: 3.026 ms., avg: 3.059 ms.
Best time for 3360K FFT length: 2.939 ms., avg: 2.968 ms.
Best time for 3456K FFT length: 3.080 ms., avg: 3.098 ms.
Best time for 3528K FFT length: 3.148 ms., avg: 3.174 ms.
Best time for 3600K FFT length: 3.465 ms., avg: 3.507 ms.
Best time for 3840K FFT length: 3.706 ms., avg: 3.761 ms.
Best time for 4032K FFT length: 3.677 ms., avg: 3.708 ms.
Best time for 4200K FFT length: 4.141 ms., avg: 4.173 ms.
Best time for 4320K FFT length: 4.210 ms., avg: 4.244 ms.
Best time for 4480K FFT length: 4.381 ms., avg: 4.414 ms.
Best time for 4608K FFT length: 4.234 ms., avg: 4.264 ms.
Best time for 4704K FFT length: 4.361 ms., avg: 4.405 ms.
Best time for 4800K FFT length: 5.262 ms., avg: 5.306 ms.
Best time for 5040K FFT length: 4.633 ms., avg: 4.666 ms.
Best time for 5120K FFT length: 4.667 ms., avg: 4.686 ms.
Best time for 5184K FFT length: 5.189 ms., avg: 5.225 ms.
Best time for 5376K FFT length: 4.856 ms., avg: 4.892 ms.
Best time for 5760K FFT length: 6.458 ms., avg: 6.498 ms.
Best time for 6048K FFT length: 5.539 ms., avg: 5.579 ms.
Best time for 6144K FFT length: 5.548 ms., avg: 5.586 ms.
Best time for 6272K FFT length: 5.937 ms., avg: 5.994 ms.
Best time for 6400K FFT length: 5.806 ms., avg: 5.850 ms.
Best time for 6720K FFT length: 6.239 ms., avg: 6.275 ms.
Best time for 7056K FFT length: 6.545 ms., avg: 6.584 ms.
Best time for 7168K FFT length: 6.557 ms., avg: 6.605 ms.
Best time for 7200K FFT length: 6.671 ms., avg: 6.725 ms.
Best time for 7680K FFT length: 6.965 ms., avg: 6.997 ms.
Best time for 8064K FFT length: 7.655 ms., avg: 7.688 ms.
Timing FFTs using 4 threads on 4 cores.
Best time for 2048K FFT length: 1.211 ms., avg: 1.236 ms.
Best time for 2100K FFT length: 1.492 ms., avg: 1.509 ms.
Best time for 2160K FFT length: 1.504 ms., avg: 1.536 ms.
Best time for 2240K FFT length: 1.448 ms., avg: 1.463 ms.
Best time for 2304K FFT length: 1.515 ms., avg: 1.545 ms.
Best time for 2400K FFT length: 1.681 ms., avg: 1.717 ms.
Best time for 2520K FFT length: 1.756 ms., avg: 1.774 ms.
Best time for 2560K FFT length: 1.776 ms., avg: 1.792 ms.
Best time for 2592K FFT length: 1.805 ms., avg: 1.833 ms.
Best time for 2688K FFT length: 1.853 ms., avg: 1.872 ms.
Best time for 2744K FFT length: 1.875 ms., avg: 1.902 ms.
Best time for 2800K FFT length: 2.014 ms., avg: 2.037 ms.
Best time for 2880K FFT length: 1.946 ms., avg: 1.963 ms.
Best time for 2940K FFT length: 2.118 ms., avg: 2.133 ms.
Best time for 3000K FFT length: 2.162 ms., avg: 2.184 ms.
Best time for 3072K FFT length: 1.889 ms., avg: 1.901 ms.
Best time for 3136K FFT length: 2.120 ms., avg: 2.150 ms.
Best time for 3200K FFT length: 2.380 ms., avg: 2.411 ms.
Best time for 3360K FFT length: 2.292 ms., avg: 2.306 ms.
Best time for 3456K FFT length: 2.347 ms., avg: 2.366 ms.
Best time for 3528K FFT length: 2.531 ms., avg: 2.559 ms.
Best time for 3600K FFT length: 2.682 ms., avg: 2.702 ms.
Best time for 3840K FFT length: 2.971 ms., avg: 2.999 ms.
Best time for 4032K FFT length: 2.806 ms., avg: 2.843 ms.
Best time for 4200K FFT length: 3.154 ms., avg: 3.189 ms.
Best time for 4320K FFT length: 3.337 ms., avg: 3.361 ms.
Best time for 4480K FFT length: 3.511 ms., avg: 3.560 ms.
Best time for 4608K FFT length: 3.259 ms., avg: 3.279 ms.
Best time for 4704K FFT length: 3.388 ms., avg: 3.424 ms.
Best time for 4800K FFT length: 4.158 ms., avg: 4.210 ms.
Best time for 5040K FFT length: 3.525 ms., avg: 3.554 ms.
Best time for 5120K FFT length: 3.677 ms., avg: 3.704 ms.
Best time for 5184K FFT length: 4.155 ms., avg: 4.172 ms.
Best time for 5376K FFT length: 3.850 ms., avg: 3.874 ms.
Best time for 5760K FFT length: 4.986 ms., avg: 5.024 ms.
Best time for 6048K FFT length: 4.316 ms., avg: 4.345 ms.
Best time for 6144K FFT length: 4.389 ms., avg: 4.429 ms.
Best time for 6272K FFT length: 4.531 ms., avg: 4.559 ms.
Best time for 6400K FFT length: 4.556 ms., avg: 4.590 ms.
Best time for 6720K FFT length: 4.996 ms., avg: 5.029 ms.
Best time for 7056K FFT length: 5.162 ms., avg: 5.195 ms.
Best time for 7168K FFT length: 5.213 ms., avg: 5.254 ms.
Best time for 7200K FFT length: 5.181 ms., avg: 5.212 ms.
Best time for 7680K FFT length: 5.473 ms., avg: 5.501 ms.
Best time for 8064K FFT length: 6.076 ms., avg: 6.148 ms.
Timing FFTs using 5 threads on 5 cores.
Best time for 2048K FFT length: 1.022 ms., avg: 1.035 ms.
Best time for 2100K FFT length: 1.248 ms., avg: 1.272 ms.
Best time for 2160K FFT length: 1.272 ms., avg: 1.314 ms.
Best time for 2240K FFT length: 1.183 ms., avg: 1.210 ms.
Best time for 2304K FFT length: 1.267 ms., avg: 1.279 ms.
Best time for 2400K FFT length: 1.410 ms., avg: 1.434 ms.
Best time for 2520K FFT length: 1.468 ms., avg: 1.487 ms.
Best time for 2560K FFT length: 1.512 ms., avg: 1.539 ms.
Best time for 2592K FFT length: 1.502 ms., avg: 1.513 ms.
Best time for 2688K FFT length: 1.573 ms., avg: 1.584 ms.
Best time for 2744K FFT length: 1.547 ms., avg: 1.563 ms.
Best time for 2800K FFT length: 1.714 ms., avg: 1.735 ms.
Best time for 2880K FFT length: 1.604 ms., avg: 1.624 ms.
Best time for 2940K FFT length: 1.758 ms., avg: 1.781 ms.
Best time for 3000K FFT length: 1.860 ms., avg: 1.895 ms.
Best time for 3072K FFT length: 1.538 ms., avg: 1.558 ms.
Best time for 3136K FFT length: 1.742 ms., avg: 1.771 ms.
Best time for 3200K FFT length: 2.016 ms., avg: 2.050 ms.
Best time for 3360K FFT length: 1.877 ms., avg: 1.904 ms.
Best time for 3456K FFT length: 1.932 ms., avg: 1.952 ms.
Best time for 3528K FFT length: 2.091 ms., avg: 2.115 ms.
Best time for 3600K FFT length: 2.246 ms., avg: 2.259 ms.
Best time for 3840K FFT length: 2.537 ms., avg: 2.574 ms.
Best time for 4032K FFT length: 2.334 ms., avg: 2.367 ms.
Best time for 4200K FFT length: 2.621 ms., avg: 2.652 ms.
Best time for 4320K FFT length: 2.825 ms., avg: 2.848 ms.
Best time for 4480K FFT length: 2.945 ms., avg: 2.986 ms.
Best time for 4608K FFT length: 2.703 ms., avg: 2.729 ms.
Best time for 4704K FFT length: 2.786 ms., avg: 2.821 ms.
Best time for 4800K FFT length: 3.507 ms., avg: 3.551 ms.
Best time for 5040K FFT length: 2.918 ms., avg: 2.943 ms.
Best time for 5120K FFT length: 3.090 ms., avg: 3.112 ms.
Best time for 5184K FFT length: 3.463 ms., avg: 3.485 ms.
Best time for 5376K FFT length: 3.223 ms., avg: 3.254 ms.
Best time for 5760K FFT length: 4.236 ms., avg: 4.275 ms.
Best time for 6048K FFT length: 3.617 ms., avg: 3.655 ms.
Best time for 6144K FFT length: 3.683 ms., avg: 3.719 ms.
Best time for 6272K FFT length: 3.676 ms., avg: 3.711 ms.
Best time for 6400K FFT length: 3.765 ms., avg: 3.780 ms.
Best time for 6720K FFT length: 4.195 ms., avg: 4.234 ms.
Best time for 7056K FFT length: 4.275 ms., avg: 4.303 ms.
Best time for 7168K FFT length: 4.333 ms., avg: 4.376 ms.
Best time for 7200K FFT length: 4.288 ms., avg: 4.322 ms.
Best time for 7680K FFT length: 4.585 ms., avg: 4.620 ms.
Best time for 8064K FFT length: 5.016 ms., avg: 5.057 ms.
Timing FFTs using 6 threads on 6 cores.
Best time for 2048K FFT length: 0.886 ms., avg: 0.894 ms.
Best time for 2100K FFT length: 1.099 ms., avg: 1.123 ms.
Best time for 2160K FFT length: 1.113 ms., avg: 1.130 ms.
Best time for 2240K FFT length: 1.040 ms., avg: 1.057 ms.
Best time for 2304K FFT length: 1.086 ms., avg: 1.098 ms.
Best time for 2400K FFT length: 1.216 ms., avg: 1.243 ms.
Best time for 2520K FFT length: 1.242 ms., avg: 1.265 ms.
Best time for 2560K FFT length: 1.291 ms., avg: 1.304 ms.
Best time for 2592K FFT length: 1.312 ms., avg: 1.340 ms.
Best time for 2688K FFT length: 1.344 ms., avg: 1.383 ms.
Best time for 2744K FFT length: 1.343 ms., avg: 1.358 ms.
Best time for 2800K FFT length: 1.482 ms., avg: 1.492 ms.
Best time for 2880K FFT length: 1.340 ms., avg: 1.362 ms.
Best time for 2940K FFT length: 1.514 ms., avg: 1.549 ms.
Best time for 3000K FFT length: 1.637 ms., avg: 1.669 ms.
Best time for 3072K FFT length: 1.300 ms., avg: 1.324 ms.
Best time for 3136K FFT length: 1.499 ms., avg: 1.525 ms.
Best time for 3200K FFT length: 1.738 ms., avg: 1.761 ms.
Best time for 3360K FFT length: 1.576 ms., avg: 1.600 ms.
Best time for 3456K FFT length: 1.631 ms., avg: 1.656 ms.
Best time for 3528K FFT length: 1.813 ms., avg: 1.838 ms.
Best time for 3600K FFT length: 1.931 ms., avg: 1.946 ms.
Best time for 3840K FFT length: 2.201 ms., avg: 2.231 ms.
Best time for 4032K FFT length: 1.999 ms., avg: 2.010 ms.
Best time for 4200K FFT length: 2.220 ms., avg: 2.244 ms.
Best time for 4320K FFT length: 2.417 ms., avg: 2.432 ms.
Best time for 4480K FFT length: 2.545 ms., avg: 2.566 ms.
Best time for 4608K FFT length: 2.339 ms., avg: 2.355 ms.
Best time for 4704K FFT length: 2.409 ms., avg: 2.431 ms.
Best time for 4800K FFT length: 3.044 ms., avg: 3.087 ms.
Best time for 5040K FFT length: 2.455 ms., avg: 2.491 ms.
Best time for 5120K FFT length: 2.667 ms., avg: 2.745 ms.
Best time for 5184K FFT length: 2.951 ms., avg: 2.977 ms.
Best time for 5376K FFT length: 2.775 ms., avg: 2.809 ms.
Best time for 5760K FFT length: 3.605 ms., avg: 3.641 ms.
Best time for 6048K FFT length: 3.079 ms., avg: 3.110 ms.
Best time for 6144K FFT length: 3.217 ms., avg: 3.249 ms.
Best time for 6272K FFT length: 3.202 ms., avg: 3.223 ms.
Best time for 6400K FFT length: 3.300 ms., avg: 3.321 ms.
Best time for 6720K FFT length: 3.555 ms., avg: 3.582 ms.
Best time for 7056K FFT length: 3.674 ms., avg: 3.712 ms.
Best time for 7168K FFT length: 3.819 ms., avg: 3.850 ms.
Best time for 7200K FFT length: 3.745 ms., avg: 3.764 ms.
Best time for 7680K FFT length: 4.067 ms., avg: 4.093 ms.
Best time for 8064K FFT length: 4.417 ms., avg: 4.465 ms.
Timing FFTs using 7 threads on 7 cores.
Best time for 2048K FFT length: 0.791 ms., avg: 0.799 ms.
Best time for 2100K FFT length: 0.997 ms., avg: 1.020 ms.
Best time for 2160K FFT length: 1.000 ms., avg: 1.024 ms.
Best time for 2240K FFT length: 0.916 ms., avg: 0.942 ms.
Best time for 2304K FFT length: 0.976 ms., avg: 0.988 ms.
Best time for 2400K FFT length: 1.097 ms., avg: 1.131 ms.
Best time for 2520K FFT length: 1.112 ms., avg: 1.135 ms.
Best time for 2560K FFT length: 1.151 ms., avg: 1.161 ms.
Best time for 2592K FFT length: 1.148 ms., avg: 1.163 ms.
Best time for 2688K FFT length: 1.194 ms., avg: 1.207 ms.
Best time for 2744K FFT length: 1.205 ms., avg: 1.216 ms.
Best time for 2800K FFT length: 1.302 ms., avg: 1.320 ms.
Best time for 2880K FFT length: 1.185 ms., avg: 1.209 ms.
Best time for 2940K FFT length: 1.350 ms., avg: 1.377 ms.
Best time for 3000K FFT length: 1.433 ms., avg: 1.473 ms.
Best time for 3072K FFT length: 1.140 ms., avg: 1.170 ms.
Best time for 3136K FFT length: 1.347 ms., avg: 1.370 ms.
Best time for 3200K FFT length: 1.588 ms., avg: 1.609 ms.
Best time for 3360K FFT length: 1.380 ms., avg: 1.406 ms.
Best time for 3456K FFT length: 1.454 ms., avg: 1.467 ms.
Best time for 3528K FFT length: 1.581 ms., avg: 1.612 ms.
Best time for 3600K FFT length: 1.744 ms., avg: 1.759 ms.
Best time for 3840K FFT length: 1.928 ms., avg: 1.946 ms.
Best time for 4032K FFT length: 1.774 ms., avg: 1.790 ms.
Best time for 4200K FFT length: 1.962 ms., avg: 2.009 ms.
Best time for 4320K FFT length: 2.191 ms., avg: 2.210 ms.
Best time for 4480K FFT length: 2.239 ms., avg: 2.273 ms.
Best time for 4608K FFT length: 2.079 ms., avg: 2.099 ms.
Best time for 4704K FFT length: 2.151 ms., avg: 2.164 ms.
Best time for 4800K FFT length: 2.751 ms., avg: 2.797 ms.
Best time for 5040K FFT length: 2.235 ms., avg: 2.259 ms.
Best time for 5120K FFT length: 2.358 ms., avg: 2.388 ms.
Best time for 5184K FFT length: 2.667 ms., avg: 2.694 ms.
Best time for 5376K FFT length: 2.495 ms., avg: 2.515 ms.
Best time for 5760K FFT length: 3.306 ms., avg: 3.351 ms.
Best time for 6048K FFT length: 2.829 ms., avg: 2.852 ms.
Best time for 6144K FFT length: 2.917 ms., avg: 2.982 ms.
Best time for 6272K FFT length: 2.928 ms., avg: 2.958 ms.
Best time for 6400K FFT length: 3.056 ms., avg: 3.077 ms.
Best time for 6720K FFT length: 3.290 ms., avg: 3.309 ms.
Best time for 7056K FFT length: 3.410 ms., avg: 3.459 ms.
Best time for 7168K FFT length: 3.487 ms., avg: 3.542 ms.
Best time for 7200K FFT length: 3.422 ms., avg: 3.447 ms.
Best time for 7680K FFT length: 3.765 ms., avg: 3.793 ms.
Best time for 8064K FFT length: 4.125 ms., avg: 4.153 ms.
Timing FFTs using 8 threads on 8 cores.
Best time for 2048K FFT length: 0.702 ms., avg: 0.720 ms.
Best time for 2100K FFT length: 0.898 ms., avg: 0.911 ms.
Best time for 2160K FFT length: 0.914 ms., avg: 0.939 ms.
Best time for 2240K FFT length: 0.828 ms., avg: 0.848 ms.
Best time for 2304K FFT length: 0.866 ms., avg: 0.876 ms.
Best time for 2400K FFT length: 1.016 ms., avg: 1.039 ms.
Best time for 2520K FFT length: 1.000 ms., avg: 1.022 ms.
Best time for 2560K FFT length: 1.024 ms., avg: 1.031 ms.
Best time for 2592K FFT length: 1.046 ms., avg: 1.064 ms.
Best time for 2688K FFT length: 1.076 ms., avg: 1.090 ms.
Best time for 2744K FFT length: 1.063 ms., avg: 1.074 ms.
Best time for 2800K FFT length: 1.191 ms., avg: 1.224 ms.
Best time for 2880K FFT length: 1.049 ms., avg: 1.065 ms.
Best time for 2940K FFT length: 1.223 ms., avg: 1.250 ms.
Best time for 3000K FFT length: 1.328 ms., avg: 1.349 ms.
Best time for 3072K FFT length: 1.018 ms., avg: 1.043 ms.
Best time for 3136K FFT length: 1.204 ms., avg: 1.230 ms.
Best time for 3200K FFT length: 1.442 ms., avg: 1.472 ms.
Best time for 3360K FFT length: 1.248 ms., avg: 1.265 ms.
Best time for 3456K FFT length: 1.300 ms., avg: 1.315 ms.
Best time for 3528K FFT length: 1.469 ms., avg: 1.480 ms.
Best time for 3600K FFT length: 1.597 ms., avg: 1.617 ms.
Best time for 3840K FFT length: 1.773 ms., avg: 1.814 ms.
Best time for 4032K FFT length: 1.615 ms., avg: 1.637 ms.
Best time for 4200K FFT length: 1.827 ms., avg: 1.851 ms.
Best time for 4320K FFT length: 2.001 ms., avg: 2.047 ms.
Best time for 4480K FFT length: 2.057 ms., avg: 2.095 ms.
Best time for 4608K FFT length: 1.928 ms., avg: 1.950 ms.
Best time for 4704K FFT length: 1.983 ms., avg: 2.009 ms.
Best time for 4800K FFT length: 2.551 ms., avg: 2.580 ms.
Best time for 5040K FFT length: 2.085 ms., avg: 2.118 ms.
Best time for 5120K FFT length: 2.197 ms., avg: 2.228 ms.
Best time for 5184K FFT length: 2.468 ms., avg: 2.484 ms.
Best time for 5376K FFT length: 2.338 ms., avg: 2.367 ms.
[Tue Dec 11 08:57:22 2018]
Best time for 5760K FFT length: 3.016 ms., avg: 3.119 ms.
Best time for 6048K FFT length: 2.675 ms., avg: 2.699 ms.
Best time for 6144K FFT length: 2.748 ms., avg: 2.768 ms.
Best time for 6272K FFT length: 2.760 ms., avg: 2.792 ms.
Best time for 6400K FFT length: 2.886 ms., avg: 2.907 ms.
Best time for 6720K FFT length: 3.117 ms., avg: 3.150 ms.
Best time for 7056K FFT length: 3.253 ms., avg: 3.290 ms.
Best time for 7168K FFT length: 3.305 ms., avg: 3.395 ms.
Best time for 7200K FFT length: 3.268 ms., avg: 3.304 ms.
Best time for 7680K FFT length: 3.568 ms., avg: 3.590 ms.
Best time for 8064K FFT length: 3.905 ms., avg: 3.941 ms.
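For anyone who wants to compare thread counts without eyeballing the dump above, here is a minimal sketch that parses lines in this format and reports the speedup from 7 to 8 threads. The function name `parse_benchmark` and the regexes are my own assumptions based purely on the output shown in this post, not anything that ships with Prime95. The sample uses four lines copied from the results above.

```python
import re

# Patterns inferred from the benchmark output in this post (an assumption,
# not an official Prime95 format specification).
HEADER_RE = re.compile(r"Timing FFTs using (\d+) threads? on (\d+) cores?\.")
LINE_RE = re.compile(
    r"Best time for (\d+)K FFT length: ([\d.]+) ms\., avg: ([\d.]+) ms\."
)


def parse_benchmark(text):
    """Return {threads: {fft_length_k: (best_ms, avg_ms)}}."""
    results = {}
    threads = None
    for line in text.splitlines():
        header = HEADER_RE.search(line)
        if header:
            threads = int(header.group(1))
            results.setdefault(threads, {})
            continue
        match = LINE_RE.search(line)
        if match and threads is not None:
            fft_k = int(match.group(1))
            results[threads][fft_k] = (float(match.group(2)), float(match.group(3)))
    return results


sample = """\
Timing FFTs using 7 threads on 7 cores.
Best time for 2048K FFT length: 0.791 ms., avg: 0.799 ms.
Best time for 4608K FFT length: 2.079 ms., avg: 2.099 ms.
Timing FFTs using 8 threads on 8 cores.
Best time for 2048K FFT length: 0.702 ms., avg: 0.720 ms.
Best time for 4608K FFT length: 1.928 ms., avg: 1.950 ms.
"""

data = parse_benchmark(sample)
for fft_k in sorted(data[7]):
    # Ratio of best times: values above 1.0 mean 8 threads is faster.
    speedup = data[7][fft_k][0] / data[8][fft_k][0]
    print(f"{fft_k}K: 8 threads vs 7 threads speedup = {speedup:.3f}x")
```

Lines that match neither pattern (such as the timestamp in the middle of the 8-thread run) are simply skipped.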
Double-checking an exponent in the 51-million range ran at 1.08 ms/iteration, which is pretty amazing. Unfortunately it reported a 0.4 rounding error, so I deleted the progress and restarted the double-check on 29.4 build 8. It might simply be that my 80mm Noctua air cooler is insufficient for this ridiculously hot CPU (93C at load), so I'm getting a bigger cooler (140mm) and will try again to see whether the fault is on my end.
Old 2018-12-11, 19:33   #2
petrw1
1976 Toyota Corona years forever!
 
 
"Wayne"
Nov 2006
Saskatchewan, Canada

4551₁₀ Posts
Default

Do I need build 5 for this improvement?
I have build 3 (of 29.5, of course).
Old 2018-12-11, 20:19   #3
ATH
Einyen
 
 
Dec 2003
Denmark

2²·757 Posts
Default

Most of that speed increase probably comes from the AVX512 support in version 29.5 itself, but you should still upgrade to build 5. Follow this thread:

https://mersenneforum.org/showthread.php?t=23723



29.5 build 4.

1) Slightly faster small FFTs (under 128K)
2) Reduced round off error for FFT lengths divisible by 7.
3) Proper FFT crossover points.

29.5 build 5.

1) Fixed AVX FFTs where FFT length was divisible by 7.
2) Fixed several zero padded FFT bugs.
3) Wider CPU dialog box.
4) Correct reporting of Skylake-X L2 cache size.

Last fiddled with by ATH on 2018-12-11 at 20:20
Old 2018-12-11, 20:21   #4
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

16263₈ Posts
Default

Quote:
Originally Posted by simon389
Unfortunately it said there was a 0.4 rounding error, so I deleted the progress and started the doublecheck over at 29.4 build 8.
Don't fret roundoff errors, they are normal occurrences.
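For readers unsure what the roundoff figure measures: the multiplication is done with floating-point FFTs, so each result that should be an exact integer comes back slightly off, and the roundoff error is the largest distance from the nearest integer. The toy sketch below shows the idea on a tiny squaring. It is only an illustration under my own assumptions (a plain recursive FFT and the hypothetical helper `square_with_roundoff`); Prime95's real code is far more involved and heavily optimized.

```python
import cmath


def fft(values, invert=False):
    """Plain recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def square_with_roundoff(digits):
    """Square a number given as little-endian digits.

    Returns (convolution coefficients before carry propagation,
    largest distance of any coefficient from the nearest integer).
    """
    n = 1
    while n < 2 * len(digits):
        n *= 2
    padded = [complex(d) for d in digits] + [0j] * (n - len(digits))
    spectrum = fft(padded)
    raw = fft([x * x for x in spectrum], invert=True)
    coefficients = []
    max_error = 0.0
    for value in raw:
        real = value.real / n  # the inverse transform needs the 1/n scaling
        nearest = round(real)
        max_error = max(max_error, abs(real - nearest))
        coefficients.append(nearest)
    return coefficients, max_error


# 123 as little-endian base-10 digits.
coeffs, err = square_with_roundoff([3, 2, 1])
value = sum(c * 10**i for i, c in enumerate(coeffs))
print(value, err)
```

With inputs this small the error is tiny. As the digits and the transform get larger the error grows, and once it approaches 0.5 the nearest integer is no longer certain, which is why the program reports it.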
Old 2018-12-12, 01:03   #5
xx005fs
 
"Eric"
Jan 2018
USA

11010100₂ Posts
Default 29.5/future version wishlists...

Awesome!! I am just hoping that all the GPU computations will be integrated into Prime95 in the future, both as a synthetic stress test for GPU memory and core and for getting work and reporting results directly through the software (possibly with a more optimized algorithm?), which would make it very convenient.
Old 2018-12-12, 06:52   #6
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

22057₈ Posts
Default

I hope they never will be. Separate programs and tools come from different people with different competencies. All these programs and tools have different requirements for optimization, maintenance, etc., and they work much better as separate tools, which can be upgraded separately, debugged separately, etc.

Otherwise it will be hell. Even though everything looks wonderful on paper, putting all the current tools into the same program is as utopian as communism was... hehe... Both would work in an ideal society, but not in practice.

I have a post here somewhere about buying a Swiss Army knife with scissors, corkscrew, nail clipper, a lot of other tools, and a small screwdriver in a corner, when what you actually need is just one big, robust screwdriver.

Last fiddled with by LaurV on 2018-12-12 at 06:56
Old 2018-12-12, 12:21   #7
henryzz
Just call me Henry
 
 
"David"
Sep 2007
Cambridge (GMT/BST)

13272₈ Posts
Default

Quote:
Originally Posted by LaurV
I hope they never will be. Separate programs and tools come from different people with different competencies. All these programs and tools have different requirements for optimization, maintenance, etc., and they work much better as separate tools, which can be upgraded separately, debugged separately, etc.

Otherwise it will be hell. Even though everything looks wonderful on paper, putting all the current tools into the same program is as utopian as communism was... hehe... Both would work in an ideal society, but not in practice.

I have a post here somewhere about buying a Swiss Army knife with scissors, corkscrew, nail clipper, a lot of other tools, and a small screwdriver in a corner, when what you actually need is just one big, robust screwdriver.
Having Prime95 call a separate executable for GPUs might be a suitable compromise.
Old 2018-12-12, 16:45   #8
simon389
 
Aug 2013

3·29 Posts
Talking

Update: I installed a huge 6-pipe 140mm Noctua cooler and temps dropped to 75C at load. Currently testing an exponent around 88,000,000 with 29.5 build 5 at 1.95 ms/iter, which completes it (from start to finish) in just under two days.
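The "just under two days" figure can be checked with a quick back-of-the-envelope calculation, assuming roughly one squaring iteration per unit of the exponent (which is how these tests scale). The helper name `estimated_days` is made up for this sketch.

```python
def estimated_days(exponent, ms_per_iter):
    """Rough run time in days, assuming about `exponent` iterations in total."""
    iterations = exponent  # approximation: one iteration per unit of the exponent
    seconds = iterations * ms_per_iter / 1000.0
    return seconds / 86400.0  # 86400 seconds per day


# Figures quoted in this thread.
print(f"88M exponent at 1.95 ms/iter: {estimated_days(88_000_000, 1.95):.2f} days")
print(f"51M exponent at 1.08 ms/iter: {estimated_days(51_000_000, 1.08):.2f} days")
```

This ignores any overhead from saving checkpoints or error checking, so treat it as a lower bound rather than a precise estimate.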

Last fiddled with by simon389 on 2018-12-12 at 16:46
Old 2018-12-13, 08:35   #9
kladner
 
 
"Kieren"
Jul 2011
In My Own Galaxy!

27AC₁₆ Posts
Default

Good to hear of the improvement. For me 93 C is scary territory, even for GPUs. The long-term effects can't be good, especially for CPUs.

That temp is not surprising if the cooler was driven by one or two 80mm fans. Are you running one or two fans on the new cooler? Either way, it is probably quieter than the old cooler.
Old 2018-12-13, 08:45   #10
retina
Undefined
 
 
"The unspeakable one"
Jun 2006
My evil lair

3⁵×5² Posts
Default

Quote:
Originally Posted by kladner
For me 93 C is scary territory, even for GPUs. The long-term effects can't be good, especially for CPUs.
I'd say the scariness factor is not warranted. I guess it depends upon what you mean by "long" in "long term". If you are worried about 20+ years in the future then perhaps such temperatures might be a concern. For shorter values of "long" I think anything less than 100C is perfectly fine for silicon devices.
Old 2018-12-13, 08:55   #11
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×31×79 Posts
Default

Quote:
Originally Posted by retina
I'd say the scariness factor is not warranted. I guess it depends upon what you mean by "long" in "long term". If you are worried about 20+ years in the future then perhaps such temperatures might be a concern. For shorter values of "long" I think anything less than 100C is perfectly fine for silicon devices.
On the NVIDIA side, the maximum allowable GPU temperature specification has been declining. For older models, 100+C is common; newer ones specify 97, 94, or 91C. See the attachment at https://www.mersenneforum.org/showpo...11&postcount=2 for some examples with source links.
This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.