mersenneforum.org > Great Internet Mersenne Prime Search > Software
2017-04-28, 14:22   #12   Prime95
Quote:
Originally Posted by Prime95
I'll create a Windows build tonight
Here it is. I can't test it, but it ought to run the benchmarks just fine: https://www.dropbox.com/s/828cfj80u8...ime95.exe?dl=0
2017-04-28, 14:26   #13   Mark Rose
I got a segmentation fault on the i7-4770k. Happened right after the 64K FFT.

Code:
[Work thread Apr 28 08:46] Timing 64K FFT, 4 cpus, 1 worker.  Average times:  0.12 ms.  Total throughput: 8690.98 iter/sec.
[Work thread Apr 28 08:46] Timing 64K FFT, 4 cpus, 2 workers.  Average times:  0.16,  0.17 ms.  Total throughput: 11897.02 iter/sec.
[Work thread Apr 28 08:46] Timing 64K FFT, 4 cpus, 3 workers.  Average times:  0.18,  0.23,  0.22 ms.  Total throughput: 14364.27 iter/sec.
[Work thread Apr 28 08:46] Timing 64K FFT, 4 cpus, 4 workers.  Average times:  0.25,  0.26,  0.25,  0.23 ms.  Total throughput: 16405.66 iter/sec.
[Work thread Apr 28 08:46] Timing 64K FFT, 4 cpus hyperthreaded, 1 worker.  Average times:  0.12 ms.  Total throughput: 8365.94 iter/sec.
[Work thread Apr 28 08:46] Timing 64K FFT, 4 cpus hyperthreaded, 2 workers.  Average times:  0.14,  0.19 ms.  Total throughput: 12638.12 iter/sec.
[Work thread Apr 28 08:47] Timing 64K FFT, 4 cpus hyperthreaded, 3 workers.  Segmentation fault (core dumped)
I've never seen mprime do that before. I've now restarted the benchmark without hyperthreading.
2017-04-28, 23:21   #14   Prime95
Quote:
Originally Posted by Mark Rose
I've never seen mprime do that before. I've now restarted the benchmark without hyperthreading.
Interesting. Let me know if it happens again.

Remember, I don't need the 2- and 3-worker cases. The 1-worker and maximum-worker (which I assume is 4) cases are sufficient.
2017-04-29, 00:29   #15   Mark Rose
Quote:
Originally Posted by Prime95
Interesting. Let me know if it happens again.

Remember, I don't need the 2- and 3-worker cases. The 1-worker and maximum-worker (which I assume is 4) cases are sufficient.
It hasn't happened again since I stopped testing hyperthreading. When this test run is done, I'll try again with hyperthreading to see if I can reproduce the segfault.

I'm already at 2000K with the 2/3/4 worker cases on the 4770k. It's probably faster to let it complete at this point. I can let it chew all weekend if necessary.

I've often seen the 4th core provide little help when memory bandwidth is limited. It may be advantageous to benchmark n-1 cores for the automatic tuning, and to default to running on n-1 cores when the last core doesn't provide much benefit, which would also reduce the impact on system responsiveness. I can also test on a 4770 (non-K) with DDR3-1600, which exhibits this, if it's of interest to you.
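The diminishing returns from each added core are already visible in the 64K FFT totals from the mprime log earlier in the thread. A quick sketch of the marginal gain per added worker (Python here is just for the arithmetic; mprime itself is a C program):

```python
# Marginal throughput gain per added worker, using the 64K FFT totals
# from the i7-4770K log above (total iter/sec per worker count).
throughput = {1: 8690.98, 2: 11897.02, 3: 14364.27, 4: 16405.66}

for n in range(2, 5):
    gain = throughput[n] - throughput[n - 1]
    pct = 100.0 * gain / throughput[n - 1]
    print(f"worker {n}: +{gain:.0f} iter/sec ({pct:.0f}% over {n - 1} workers)")
```

The 2nd worker adds about 37%, the 3rd about 21%, and the 4th only about 14% — consistent with the memory-bandwidth ceiling described above.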
2017-04-29, 00:33   #16   GP2
Quote:
Originally Posted by Prime95
Remember, I don't need the 2 & 3 worker case. The 1 and maximum (which I assume is 4) worker cases are sufficient.
But what if the topology has two sockets? With two CPUs, running two workers will usually have greater throughput than running one worker.

I'm also running a benchmark on an AWS c4.8xlarge instance with every number of workers, just out of curiosity. I can already see that the curve of throughput vs. FFT size looks very different for the 1-worker case than for the 2-worker case. In particular, they don't go non-monotonic in the same places, and the curves actually cross in one brief interval within the 3000K to 4000K range, where one worker gives greater throughput than two workers (this corresponds roughly to the 70M exponents).

The bottom line is, you just can't interpolate between 1 worker and maximum workers in order to estimate the 2-worker case, or other numbers of workers, at least for this particular virtual machine.
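To make the point concrete, a naive estimate would interpolate linearly between the two measured endpoints, something like the sketch below. The throughput numbers are purely hypothetical, not from the c4.8xlarge run:

```python
# Naive linear interpolation of throughput between the measured 1-worker
# and max-worker cases. The numbers are hypothetical, only to illustrate
# why this estimate can be badly wrong.
def interpolate_throughput(k, t_one, t_max, max_workers):
    """Estimate total throughput for k workers by linear interpolation."""
    frac = (k - 1) / (max_workers - 1)
    return t_one + frac * (t_max - t_one)

# Hypothetical measurements at one FFT size on an 18-core machine:
t_one, t_max = 50.0, 140.0  # iter/sec for 1 and 18 workers
est_two = interpolate_throughput(2, t_one, t_max, 18)
print(f"interpolated 2-worker estimate: {est_two:.1f} iter/sec")
```

Because the 1-worker and 2-worker curves are non-monotonic in different places and even cross, a real 2-worker run can land far from any such interpolated value, so that case has to be benchmarked directly.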

I wonder if this effect is genuine or spurious... maybe 8 seconds isn't long enough for reliable timing? Or perhaps it's an artifact due to the c4.8xlarge not quite being bare metal. It uses the entire physical machine, so there are no other AWS customers there to mess up the timings, but still. There are 9 cores available on each CPU, for a total of 18; a Xeon E5-2666 would be expected to have 10, as others have pointed out, so maybe the hypervisor kicks in from time to time and messes up the cache in random ways.

The other AWS instances, with 1, 2, 4, or 8 cores, all run within a single CPU, so for those perhaps the throughput numbers will be more sensible.

Last fiddled with by GP2 on 2017-04-29 at 00:54
2017-04-29, 00:48   #17   Mark Rose
Did you change the P-States on that instance?
2017-04-29, 00:57   #18   GP2
Quote:
Originally Posted by Mark Rose
Did you change the P-States on that instance?
No.

Running 18 instances of c4.large gives considerably better throughput than one instance of c4.8xlarge, and is almost always a lot cheaper on a per-core basis too. So in practice I mostly stick to c4.large and don't fiddle around with the heftier instances, except to run benchmarks.

The next larger size above c4.large is c4.xlarge, with two cores. The per-core cost is comparable (two cores at roughly twice the spot price), but it incurs about a 5% throughput penalty. The c4.2xlarge and c4.4xlarge, on the other hand, incur considerably bigger throughput penalties, and the spot price is usually higher on a per-core basis.

In an ideal world, an 18-core instance would have exactly 18 times the throughput of a 1-core instance and cost exactly 18 times as much. In reality, it's both considerably lower in throughput (for running mprime) and considerably higher in price. The 8-core and 4-core instances are no bargains either. Even the 2-core instance is slightly less cost-effective than the 1-core instance.
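The per-core comparison above can be reduced to a single figure of merit. In this sketch the prices and throughputs are illustrative placeholders, not real AWS quotes; only the roughly 5% c4.xlarge penalty comes from the post itself:

```python
# Per-core cost-effectiveness for spot instances: throughput per core
# per dollar-hour (higher is better). Prices/throughputs are made up
# for illustration; only the ~5% per-core penalty is from the thread.
def per_core_value(throughput_per_core, price_per_hour, cores):
    """Iterations per second per core, per dollar-hour spent per core."""
    return throughput_per_core / (price_per_hour / cores)

base = per_core_value(100.0, 0.05, 1)   # hypothetical c4.large: 1 core
xl   = per_core_value(95.0, 0.10, 2)    # c4.xlarge: 2 cores, ~5% slower per core
print(f"c4.xlarge is {100 * xl / base:.0f}% as cost-effective per core")
```

With twice the price for twice the cores, the ratio collapses to the throughput penalty alone, which is why many small instances beat one large one here.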

Last fiddled with by GP2 on 2017-04-29 at 01:18
2017-04-29, 03:09   #19   Prime95
Quote:
Originally Posted by GP2
The bottom line is, you just can't interpolate between 1 worker and maximum workers in order to estimate the 2-worker case, or other numbers of workers, at least for this particular virtual machine.
Agreed. What I'm trying to determine is a reasonable subset of the FFT implementations to include in a final executable.

In real-world use, if you've set up prime95 to run 2 workers, then prime95 will need to benchmark the 2-worker, 9-cores-each case to determine the best FFT implementation for that machine.

If you'd like to contribute a benchmark for me to work with, there are some weird undoc.txt options for benchmarking that can force prime95 to bench 2 workers using 9 cores each. That probably reflects the most realistic real-world configuration.
2017-04-29, 04:05   #20   GP2
Quote:
Originally Posted by Prime95
If you'd like to contribute a benchmark for me to work with, there are some weird undoc.txt options for benchmarking that can force prime95 to bench 2 workers using 9 cores each. That probably reflects the most realistic real-world configuration.
Since hwloc now automatically determines the topology, the program knows that there are two CPUs present. So perhaps the benchmark code could automatically offer to test
  • 1 worker
  • <number of CPUs> workers (only if different from 1)
  • max workers
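The proposed selection logic is small enough to sketch. hwloc itself is a C library, so this Python fragment only mimics the choice of worker counts, with the package and core counts standing in for what hwloc would report:

```python
# Candidate worker counts for a standard benchmark prompt, per the
# proposal above: 1 worker, one per package (socket) if that differs
# from 1, and one per core. In the real program num_packages and
# num_cores would come from the hwloc topology.
def benchmark_worker_counts(num_packages, num_cores):
    counts = {1, num_cores}
    if num_packages != 1:
        counts.add(num_packages)
    return sorted(counts)

# Dual-socket, 9 cores per socket (the c4.8xlarge discussed above):
print(benchmark_worker_counts(2, 18))   # [1, 2, 18]
# Single-socket quad core (e.g. the i7-4770K):
print(benchmark_worker_counts(1, 4))    # [1, 4]
```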

I think this would be needed often enough that it really ought to be offered as a standard benchmarking prompt, without resorting to undoc.txt options.

Also, if I recall correctly, the hyperthreading benchmark prompt has Y as the default. If hyperthreading is more or less deprecated now, perhaps the default could be changed to N.
2017-04-29, 04:27   #21   kladner
FFT timings
i7 6700K limited to 4 GHz
RAM 3200 MHz, 17-18-18-36 timings
v29.2 Build 1
Code:
[Apr 28 23:21] Worker starting
[Apr 28 23:21] Your timings will be written to the results.txt file.
[Apr 28 23:21] Compare your results to other computers at http://www.mersenne.org/report_benchmarks
[Apr 28 23:21] Timing 25 iterations of 2048K FFT length.  Best time: 7.492 ms., avg time: 7.578 ms.
[Apr 28 23:21] Timing 25 iterations of 2240K FFT length.  Best time: 7.835 ms., avg time: 7.882 ms.
[Apr 28 23:21] Timing 25 iterations of 2304K FFT length.  Best time: 8.220 ms., avg time: 8.356 ms.
[Apr 28 23:21] Timing 25 iterations of 2400K FFT length.  Best time: 8.253 ms., avg time: 8.321 ms.
[Apr 28 23:21] Timing 25 iterations of 2560K FFT length.  Best time: 9.807 ms., avg time: 9.925 ms.
[Apr 28 23:21] Timing 25 iterations of 2688K FFT length.  Best time: 8.914 ms., avg time: 8.983 ms.
[Apr 28 23:21] Timing 25 iterations of 2800K FFT length.  Best time: 10.523 ms., avg time: 10.625 ms.
[Apr 28 23:21] Timing 25 iterations of 2880K FFT length.  Best time: 10.328 ms., avg time: 10.876 ms.
[Apr 28 23:21] Timing 25 iterations of 3072K FFT length.  Best time: 11.018 ms., avg time: 11.110 ms.
[Apr 28 23:21] Timing 25 iterations of 3200K FFT length.  Best time: 12.489 ms., avg time: 12.608 ms.
[Apr 28 23:21] Timing 25 iterations of 3360K FFT length.  Best time: 11.982 ms., avg time: 12.190 ms.
[Apr 28 23:21] Timing 25 iterations of 3456K FFT length.  Best time: 12.473 ms., avg time: 12.575 ms.
[Apr 28 23:21] Timing 25 iterations of 3584K FFT length.  Best time: 12.180 ms., avg time: 12.281 ms.
[Apr 28 23:21] Timing 25 iterations of 3840K FFT length.  Best time: 14.493 ms., avg time: 14.653 ms.
[Apr 28 23:21] Timing 25 iterations of 4000K FFT length.  Best time: 14.522 ms., avg time: 14.621 ms.
[Apr 28 23:21] Timing 25 iterations of 4032K FFT length.  Best time: 15.030 ms., avg time: 15.143 ms.
[Apr 28 23:21] Timing 25 iterations of 4096K FFT length.  Best time: 15.246 ms., avg time: 15.472 ms.
[Apr 28 23:21] Timing 25 iterations of 4480K FFT length.  Best time: 16.058 ms., avg time: 16.167 ms.
[Apr 28 23:21] Timing 25 iterations of 4608K FFT length.  Best time: 16.634 ms., avg time: 17.668 ms.
[Apr 28 23:21] Timing 25 iterations of 4800K FFT length.  Best time: 17.940 ms., avg time: 18.738 ms.
[Apr 28 23:21] Timing 25 iterations of 5120K FFT length.  Best time: 19.837 ms., avg time: 20.019 ms.
[Apr 28 23:21] Timing 25 iterations of 5376K FFT length.  Best time: 20.223 ms., avg time: 20.588 ms.
[Apr 28 23:21] Timing 25 iterations of 5600K FFT length.  Best time: 21.039 ms., avg time: 21.216 ms.
[Apr 28 23:21] Timing 25 iterations of 5760K FFT length.  Best time: 21.592 ms., avg time: 21.830 ms.
[Apr 28 23:21] Timing 25 iterations of 6144K FFT length.  Best time: 22.859 ms., avg time: 23.287 ms.
[Apr 28 23:21] Timing 25 iterations of 6400K FFT length.  Best time: 24.999 ms., avg time: 25.182 ms.
[Apr 28 23:21] Timing 25 iterations of 6720K FFT length.  Best time: 25.914 ms., avg time: 26.151 ms.
[Apr 28 23:21] Timing 25 iterations of 6912K FFT length.  Best time: 27.318 ms., avg time: 27.496 ms.
[Apr 28 23:21] Timing 25 iterations of 7168K FFT length.  Best time: 27.612 ms., avg time: 27.704 ms.
[Apr 28 23:22] Timing 25 iterations of 7680K FFT length.  Best time: 29.898 ms., avg time: 30.183 ms.
[Apr 28 23:22] Timing 25 iterations of 8000K FFT length.  Best time: 31.303 ms., avg time: 31.541 ms.
[Apr 28 23:22] Timing 25 iterations of 8064K FFT length.  Best time: 32.713 ms., avg time: 33.000 ms.
[Apr 28 23:22] Timing 25 iterations of 8192K FFT length.  Best time: 31.968 ms., avg time: 32.342 ms.
[Apr 28 23:22] Timing FFTs using 4 threads on 4 cores.
[Apr 28 23:22] Timing 25 iterations of 2048K FFT length.  Best time: 2.270 ms., avg time: 2.389 ms.
[Apr 28 23:22] Timing 25 iterations of 2240K FFT length.  Best time: 2.241 ms., avg time: 2.284 ms.
[Apr 28 23:22] Timing 25 iterations of 2304K FFT length.  Best time: 2.354 ms., avg time: 2.392 ms.
[Apr 28 23:22] Timing 25 iterations of 2400K FFT length.  Best time: 2.405 ms., avg time: 2.542 ms.
[Apr 28 23:22] Timing 25 iterations of 2560K FFT length.  Best time: 3.043 ms., avg time: 3.077 ms.
[Apr 28 23:22] Timing 25 iterations of 2688K FFT length.  Best time: 2.605 ms., avg time: 2.896 ms.
[Apr 28 23:22] Timing 25 iterations of 2800K FFT length.  Best time: 3.032 ms., avg time: 3.069 ms.
[Apr 28 23:22] Timing 25 iterations of 2880K FFT length.  Best time: 3.714 ms., avg time: 3.796 ms.
[Apr 28 23:22] Timing 25 iterations of 3072K FFT length.  Best time: 3.318 ms., avg time: 3.369 ms.
[Apr 28 23:22] Timing 25 iterations of 3200K FFT length.  Best time: 4.026 ms., avg time: 4.344 ms.
[Apr 28 23:22] Timing 25 iterations of 3360K FFT length.  Best time: 3.467 ms., avg time: 3.615 ms.
[Apr 28 23:22] Timing 25 iterations of 3456K FFT length.  Best time: 3.669 ms., avg time: 7.299 ms.
[Apr 28 23:22] Timing 25 iterations of 3584K FFT length.  Best time: 3.620 ms., avg time: 3.704 ms.
[Apr 28 23:22] Timing 25 iterations of 3840K FFT length.  Best time: 4.353 ms., avg time: 4.609 ms.
[Apr 28 23:22] Timing 25 iterations of 4000K FFT length.  Best time: 4.323 ms., avg time: 4.366 ms.
[Apr 28 23:22] Timing 25 iterations of 4032K FFT length.  Best time: 4.328 ms., avg time: 4.451 ms.
[Apr 28 23:22] Timing 25 iterations of 4096K FFT length.  Best time: 4.585 ms., avg time: 4.891 ms.
[Apr 28 23:22] Timing 25 iterations of 4480K FFT length.  Best time: 4.624 ms., avg time: 4.812 ms.
[Apr 28 23:22] Timing 25 iterations of 4608K FFT length.  Best time: 5.021 ms., avg time: 6.677 ms.
[Apr 28 23:22] Timing 25 iterations of 4800K FFT length.  Best time: 5.475 ms., avg time: 7.231 ms.
[Apr 28 23:22] Timing 25 iterations of 5120K FFT length.  Best time: 6.085 ms., avg time: 6.183 ms.
[Apr 28 23:22] Timing 25 iterations of 5376K FFT length.  Best time: 5.882 ms., avg time: 6.055 ms.
[Apr 28 23:22] Timing 25 iterations of 5600K FFT length.  Best time: 6.071 ms., avg time: 6.160 ms.
[Apr 28 23:22] Timing 25 iterations of 5760K FFT length.  Best time: 6.540 ms., avg time: 6.633 ms.
[Apr 28 23:22] Timing 25 iterations of 6144K FFT length.  Best time: 6.980 ms., avg time: 7.053 ms.
[Apr 28 23:22] Timing 25 iterations of 6400K FFT length.  Best time: 8.103 ms., avg time: 8.502 ms.
[Apr 28 23:22] Timing 25 iterations of 6720K FFT length.  Best time: 7.733 ms., avg time: 7.834 ms.
[Apr 28 23:22] Timing 25 iterations of 6912K FFT length.  Best time: 7.644 ms., avg time: 8.270 ms.
[Apr 28 23:22] Timing 25 iterations of 7168K FFT length.  Best time: 8.253 ms., avg time: 8.442 ms.
[Apr 28 23:22] Timing 25 iterations of 7680K FFT length.  Best time: 9.267 ms., avg time: 9.393 ms.
[Apr 28 23:22] Timing 25 iterations of 8000K FFT length.  Best time: 10.203 ms., avg time: 10.282 ms.
[Apr 28 23:22] Timing 25 iterations of 8064K FFT length.  Best time: 9.066 ms., avg time: 9.172 ms.
[Apr 28 23:23] Timing 25 iterations of 8192K FFT length.  Best time: 9.558 ms., avg time: 9.667 ms.
[Apr 28 23:23] FFT timings benchmark complete.
[Apr 28 23:23] Worker stopped.
Code:
Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
CPU speed: 4008.19 MHz, 4 hyperthreaded cores
CPU features: Prefetchw, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 8 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Machine topology as determined by hwloc library:
 Machine#0 (total=14299576KB, Backend=Windows, hwlocVersion=1.11.6, ProcessName=prime95.exe)
  NUMANode#0 (local=14299576KB, total=14299576KB)
    Package#0 (CPUVendor=GenuineIntel, CPUFamilyNumber=6, CPUModelNumber=94, CPUModel="Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz", CPUStepping=3)
      L3 (size=8192KB, linesize=64, ways=16, Inclusive=1)
        L2 (size=256KB, linesize=64, ways=4, Inclusive=0)
          L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
            Core (cpuset: 0x00000003)
              PU#0 (cpuset: 0x00000001)
              PU#1 (cpuset: 0x00000002)
        L2 (size=256KB, linesize=64, ways=4, Inclusive=0)
          L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
            Core (cpuset: 0x0000000c)
              PU#2 (cpuset: 0x00000004)
              PU#3 (cpuset: 0x00000008)
        L2 (size=256KB, linesize=64, ways=4, Inclusive=0)
          L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
            Core (cpuset: 0x00000030)
              PU#4 (cpuset: 0x00000010)
              PU#5 (cpuset: 0x00000020)
        L2 (size=256KB, linesize=64, ways=4, Inclusive=0)
          L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
            Core (cpuset: 0x000000c0)
              PU#6 (cpuset: 0x00000040)
              PU#7 (cpuset: 0x00000080)
Prime95 64-bit version 29.2, RdtscTiming=1
Timings for 2048K FFT length (1 cpu, 1 worker):  7.58 ms.  Throughput: 131.90 iter/sec.
Timings for 2048K FFT length (4 cpus, 1 worker):  2.34 ms.  Throughput: 426.63 iter/sec.
Timings for 2240K FFT length (1 cpu, 1 worker):  7.86 ms.  Throughput: 127.22 iter/sec.
Timings for 2240K FFT length (4 cpus, 1 worker):  2.33 ms.  Throughput: 428.78 iter/sec.
Timings for 2304K FFT length (1 cpu, 1 worker):  8.28 ms.  Throughput: 120.81 iter/sec.
Timings for 2304K FFT length (4 cpus, 1 worker):  2.43 ms.  Throughput: 410.95 iter/sec.
Timings for 2400K FFT length (1 cpu, 1 worker):  8.33 ms.  Throughput: 120.11 iter/sec.
Timings for 2400K FFT length (4 cpus, 1 worker):  2.47 ms.  Throughput: 405.26 iter/sec.
Timings for 2560K FFT length (1 cpu, 1 worker):  9.99 ms.  Throughput: 100.09 iter/sec.
Timings for 2560K FFT length (4 cpus, 1 worker):  3.15 ms.  Throughput: 317.78 iter/sec.
Timings for 2688K FFT length (1 cpu, 1 worker):  9.11 ms.  Throughput: 109.82 iter/sec.
Timings for 2688K FFT length (4 cpus, 1 worker):  2.71 ms.  Throughput: 368.54 iter/sec.
Timings for 2800K FFT length (1 cpu, 1 worker): 10.63 ms.  Throughput: 94.08 iter/sec.
Timings for 2800K FFT length (4 cpus, 1 worker):  3.08 ms.  Throughput: 325.05 iter/sec.
Timings for 2880K FFT length (1 cpu, 1 worker): 10.43 ms.  Throughput: 95.89 iter/sec.
Timings for 2880K FFT length (4 cpus, 1 worker):  3.15 ms.  Throughput: 317.87 iter/sec.
Timings for 3072K FFT length (1 cpu, 1 worker): 11.16 ms.  Throughput: 89.58 iter/sec.
Timings for 3072K FFT length (4 cpus, 1 worker):  3.42 ms.  Throughput: 292.76 iter/sec.
Timings for 3200K FFT length (1 cpu, 1 worker): 12.50 ms.  Throughput: 80.01 iter/sec.
Timings for 3200K FFT length (4 cpus, 1 worker):  4.22 ms.  Throughput: 237.21 iter/sec.
Timings for 3360K FFT length (1 cpu, 1 worker): 12.04 ms.  Throughput: 83.05 iter/sec.
Timings for 3360K FFT length (4 cpus, 1 worker):  3.60 ms.  Throughput: 277.76 iter/sec.
Timings for 3456K FFT length (1 cpu, 1 worker): 12.53 ms.  Throughput: 79.83 iter/sec.
Timings for 3456K FFT length (4 cpus, 1 worker):  3.83 ms.  Throughput: 260.77 iter/sec.
Timings for 3584K FFT length (1 cpu, 1 worker): 12.35 ms.  Throughput: 80.99 iter/sec.
Timings for 3584K FFT length (4 cpus, 1 worker):  3.77 ms.  Throughput: 265.59 iter/sec.
Timings for 3840K FFT length (1 cpu, 1 worker): 14.60 ms.  Throughput: 68.47 iter/sec.
[Fri Apr 28 23:34:03 2017]
Timings for 3840K FFT length (4 cpus, 1 worker):  4.45 ms.  Throughput: 224.65 iter/sec.
Timings for 4000K FFT length (1 cpu, 1 worker): 14.57 ms.  Throughput: 68.62 iter/sec.
Timings for 4000K FFT length (4 cpus, 1 worker):  4.43 ms.  Throughput: 225.88 iter/sec.
Timings for 4032K FFT length (1 cpu, 1 worker): 15.07 ms.  Throughput: 66.37 iter/sec.
Timings for 4032K FFT length (4 cpus, 1 worker):  4.46 ms.  Throughput: 224.39 iter/sec.
Timings for 4096K FFT length (1 cpu, 1 worker): 15.28 ms.  Throughput: 65.46 iter/sec.
Timings for 4096K FFT length (4 cpus, 1 worker):  4.67 ms.  Throughput: 213.92 iter/sec.
Timings for 4480K FFT length (1 cpu, 1 worker): 16.08 ms.  Throughput: 62.20 iter/sec.
Timings for 4480K FFT length (4 cpus, 1 worker):  4.77 ms.  Throughput: 209.45 iter/sec.
Timings for 4608K FFT length (1 cpu, 1 worker): 16.70 ms.  Throughput: 59.87 iter/sec.
Timings for 4608K FFT length (4 cpus, 1 worker):  5.14 ms.  Throughput: 194.71 iter/sec.
Timings for 4800K FFT length (1 cpu, 1 worker): 17.94 ms.  Throughput: 55.74 iter/sec.
Timings for 4800K FFT length (4 cpus, 1 worker):  5.63 ms.  Throughput: 177.49 iter/sec.
Timings for 5120K FFT length (1 cpu, 1 worker): 19.97 ms.  Throughput: 50.07 iter/sec.
Timings for 5120K FFT length (4 cpus, 1 worker):  6.32 ms.  Throughput: 158.27 iter/sec.
Timings for 5376K FFT length (1 cpu, 1 worker): 20.26 ms.  Throughput: 49.35 iter/sec.
Timings for 5376K FFT length (4 cpus, 1 worker):  6.03 ms.  Throughput: 165.94 iter/sec.
Timings for 5600K FFT length (1 cpu, 1 worker): 21.11 ms.  Throughput: 47.38 iter/sec.
Timings for 5600K FFT length (4 cpus, 1 worker):  6.28 ms.  Throughput: 159.12 iter/sec.
Timings for 5760K FFT length (1 cpu, 1 worker): 21.76 ms.  Throughput: 45.96 iter/sec.
Timings for 5760K FFT length (4 cpus, 1 worker):  6.73 ms.  Throughput: 148.50 iter/sec.
Timings for 6144K FFT length (1 cpu, 1 worker): 22.97 ms.  Throughput: 43.53 iter/sec.
Timings for 6144K FFT length (4 cpus, 1 worker):  7.15 ms.  Throughput: 139.81 iter/sec.
Timings for 6400K FFT length (1 cpu, 1 worker): 25.43 ms.  Throughput: 39.33 iter/sec.
Timings for 6400K FFT length (4 cpus, 1 worker):  8.29 ms.  Throughput: 120.67 iter/sec.
Timings for 6720K FFT length (1 cpu, 1 worker): 25.94 ms.  Throughput: 38.55 iter/sec.
[Fri Apr 28 23:39:06 2017]
Timings for 6720K FFT length (4 cpus, 1 worker):  7.89 ms.  Throughput: 126.77 iter/sec.
Timings for 6912K FFT length (1 cpu, 1 worker): 27.46 ms.  Throughput: 36.42 iter/sec.
Timings for 6912K FFT length (4 cpus, 1 worker):  7.86 ms.  Throughput: 127.21 iter/sec.
Timings for 7168K FFT length (1 cpu, 1 worker): 27.80 ms.  Throughput: 35.98 iter/sec.
Timings for 7168K FFT length (4 cpus, 1 worker):  8.45 ms.  Throughput: 118.40 iter/sec.
Timings for 7680K FFT length (1 cpu, 1 worker): 30.19 ms.  Throughput: 33.13 iter/sec.
Timings for 7680K FFT length (4 cpus, 1 worker):  9.43 ms.  Throughput: 106.00 iter/sec.
Timings for 8000K FFT length (1 cpu, 1 worker): 31.67 ms.  Throughput: 31.58 iter/sec.
Timings for 8000K FFT length (4 cpus, 1 worker): 10.43 ms.  Throughput: 95.89 iter/sec.
Timings for 8064K FFT length (1 cpu, 1 worker): 32.99 ms.  Throughput: 30.31 iter/sec.
Timings for 8064K FFT length (4 cpus, 1 worker):  9.35 ms.  Throughput: 106.92 iter/sec.
Timings for 8192K FFT length (1 cpu, 1 worker): 31.91 ms.  Throughput: 31.34 iter/sec.
Timings for 8192K FFT length (4 cpus, 1 worker):  9.87 ms.  Throughput: 101.31 iter/sec.
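Across the table above, the four-core speedup on this 6700K hovers around 3.2x. A quick check using three of the listed timings (Python just for the arithmetic):

```python
# Parallel speedup and efficiency from the i7-6700K benchmark above:
# single worker, 1 cpu vs. 4 cpus, times in ms, taken from the table.
timings = {          # fft: (t_1cpu, t_4cpus)
    "2048K": (7.58, 2.34),
    "4096K": (15.28, 4.67),
    "8192K": (31.91, 9.87),
}

for fft, (t1, t4) in timings.items():
    speedup = t1 / t4
    print(f"{fft}: speedup {speedup:.2f}x, efficiency {100 * speedup / 4:.0f}%")
```

Roughly 81-82% efficiency at every size, which again points at memory bandwidth rather than core count as the limiting factor.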

Last fiddled with by kladner on 2017-04-29 at 05:04
2017-04-29, 06:15   #22   ATH
Quote:
Originally Posted by GP2
Since hwloc now automatically determines the topology, the program knows that there are two CPUs present.
I'm almost done with the benchmark on a c4.8xlarge, if you want to save yourself the trouble.