mersenneforum.org  

Old 2016-06-11, 11:40   #1
bgbeuning
 
Dec 2014

3·5·17 Posts
Mini ITX with LGA 2011 (4 memory channels)

The ASRock EPC612D4I is a Mini ITX board with an LGA 2011 socket, so it has 4 memory channels.
(Everyone knows prime95 loves memory channels.)

I initially thought I would use an i7-5820K (6 core / 12 thread) but decided against it.
The approved memory for the EPC uses ECC, the i7 does not support ECC, and
the i7 is not on the approved CPU list. So I used a Xeon E5-1620 V3 (3.5 GHz).
(The E5-1xxx do not support dual CPU so can be clocked higher.)

I think 6 core and 4 memory channels is a good fit, but the 6 core Xeon was
a budget buster.

Then I used a memory part from the approved list:
Kingston KVR21SE15D8/8HA DDR4 2133 ECC (SODIMM).

Here are the benchmark results:

Quote:
Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz
CPU speed: 3498.40 MHz, 4 hyperthreaded cores
CPU features: Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 10 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 28.7, RdtscTiming=1

Best time for 4096K FFT length: 19.368 ms., avg: 19.408 ms.
Timing FFTs using 2 threads on 1 physical CPU.
Best time for 4096K FFT length: 20.469 ms., avg: 21.333 ms.
Timing FFTs using 2 threads on 2 physical CPUs.
Best time for 4096K FFT length: 10.171 ms., avg: 10.782 ms.
Timing FFTs using 3 threads on 3 physical CPUs.
Best time for 4096K FFT length: 7.256 ms., avg: 7.791 ms.
Timing FFTs using 4 threads on 4 physical CPUs.
Best time for 4096K FFT length: 5.786 ms., avg: 6.665 ms.
Timing FFTs using 8 threads on 4 physical CPUs.
Best time for 4096K FFT length: 6.527 ms., avg: 7.695 ms.

Timings for 4096K FFT length (1 cpu, 1 worker): 19.32 ms. Throughput: 51.75 iter/sec.
Timings for 4096K FFT length (2 cpus, 2 workers): 19.75, 19.79 ms. Throughput: 101.14 iter/sec.
Timings for 4096K FFT length (3 cpus, 3 workers): 20.27, 20.14, 20.14 ms. Throughput: 148.62 iter/sec.
Timings for 4096K FFT length (4 cpus, 4 workers): 20.55, 20.63, 20.52, 20.57 ms. Throughput: 194.48 iter/sec.
Timings for 4096K FFT length (1 cpu hyperthreaded, 1 worker): 21.13 ms. Throughput: 47.33 iter/sec.
Timings for 4096K FFT length (2 cpus hyperthreaded, 2 workers): 21.53, 21.61 ms. Throughput: 92.70 iter/sec.
Timings for 4096K FFT length (3 cpus hyperthreaded, 3 workers): 22.20, 21.90, 22.22 ms. Throughput: 135.71 iter/sec.
Timings for 4096K FFT length (4 cpus hyperthreaded, 4 workers): 22.64, 22.37, 22.74, 22.67 ms. Throughput: 176.97 iter/sec.

It was obvious from the part prices that this build would have a hard time competing
with the H110 and i5-6500 combinations. The C612 chipset and Xeon combination
gets about the same performance, so the extra cost is not worth it.
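
For anyone comparing these numbers against other boards: the throughput column appears to be the sum of each worker's iterations per second, i.e. 1000 divided by that worker's time in ms. Here is a minimal Python sketch that reproduces the 4-worker figure from the quote above. The function name is my own and is not part of Prime95.

```python
def throughput(timings_ms):
    """Combined iterations per second for workers running at the same time.

    Assumption: each worker contributes 1000 / (its time in ms) iter/sec.
    """
    return sum(1000.0 / t for t in timings_ms)


# Per-worker timings from the "4 cpus, 4 workers" line of the benchmark.
four_workers = [20.55, 20.63, 20.52, 20.57]
print(f"{throughput(four_workers):.2f} iter/sec")
```

The result agrees with the 194.48 iter/sec that Prime95 printed, so the same helper can be used to compare runs where only the per-worker times were saved.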

I was also wondering if SODIMM memory had the same bandwidth as regular memory.
It seems about the same.


The next board I am looking at is the ASUS H110T/CSM. It costs a little more but
does not need the Pico PSU. (Some of these boards need 19V power, but this one
says 12V or 19V.) Also, being low profile means better air flow in a server case,
and the boards can be rotated 90 degrees relative to my other boards.
Old 2016-06-11, 12:07   #2
mackerel
 
 
Feb 2016
UK

13×31 Posts

In your situation you have a moderate amount of CPU power with a ton of memory bandwidth. So overall, your CPU is running pretty close to its max potential, but that potential isn't as high as that of other CPUs.

I recently got what was sold as an E5-2683 v3. I wouldn't buy one new, but there are some really cheap ones on eBay, cheaper than a 6600K.

Genuine Intel(R) CPU @ 2.00GHz
CPU speed: 1861.88 MHz, 14 cores
CPU features: Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 35 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 28.7, RdtscTiming=1

Timings for 4096K FFT length (1 cpu, 1 worker): 26.05 ms. Throughput: 38.38 iter/sec.
Timings for 4096K FFT length (2 cpus, 2 workers): 26.99, 26.94 ms. Throughput: 74.18 iter/sec.
Timings for 4096K FFT length (3 cpus, 3 workers): 29.02, 28.05, 28.09 ms. Throughput: 105.71 iter/sec.
Timings for 4096K FFT length (4 cpus, 4 workers): 29.58, 29.41, 29.21, 29.18 ms. Throughput: 136.32 iter/sec.
Timings for 4096K FFT length (5 cpus, 5 workers): 30.57, 30.17, 30.04, 30.05, 29.81 ms. Throughput: 165.96 iter/sec.
Timings for 4096K FFT length (6 cpus, 6 workers): 31.30, 31.84, 31.02, 30.95, 30.77, 30.76 ms. Throughput: 192.92 iter/sec.
Timings for 4096K FFT length (7 cpus, 7 workers): 32.53, 32.44, 32.44, 32.44, 32.24, 32.34, 32.29 ms. Throughput: 216.12 iter/sec.
Timings for 4096K FFT length (8 cpus, 8 workers): 34.91, 33.97, 33.91, 33.89, 33.75, 33.84, 33.86, 33.57 ms. Throughput: 235.57 iter/sec.
Timings for 4096K FFT length (9 cpus, 9 workers): 36.00, 35.81, 35.78, 35.73, 35.71, 35.70, 35.76, 35.64, 35.53 ms. Throughput: 251.82 iter/sec.
Timings for 4096K FFT length (10 cpus, 10 workers): 38.31, 37.83, 37.87, 37.76, 37.66, 37.77, 37.77, 37.75, 37.66, 37.51 ms. Throughput: 264.64 iter/sec.
Timings for 4096K FFT length (11 cpus, 11 workers): 41.20, 40.45, 40.54, 40.43, 40.28, 40.40, 40.54, 40.40, 40.41, 40.20, 40.32 ms. Throughput: 271.82 iter/sec.
Timings for 4096K FFT length (12 cpus, 12 workers): 43.45, 43.37, 43.20, 43.15, 42.98, 43.17, 43.25, 43.07, 43.12, 42.96, 43.32, 42.88 ms. Throughput: 278.03 iter/sec.
Timings for 4096K FFT length (13 cpus, 13 workers): 47.06, 46.63, 46.46, 46.44, 46.11, 46.20, 46.29, 46.31, 46.12, 46.05, 46.38, 46.48, 46.00 ms. Throughput: 280.49 iter/sec.
Timings for 4096K FFT length (14 cpus, 14 workers): 50.55, 50.42, 50.01, 51.65, 49.51, 49.57, 49.87, 49.93, 49.65, 49.37, 49.67, 49.74, 49.73, 50.00 ms. Throughput: 280.16 iter/sec.

Above is a quick partial copy and paste from an earlier run. Might be more interesting to try with fewer workers...

All cores were capped to 2.3 GHz, HT disabled. I'm running non-ECC, single-rank, quad-channel RAM at 2133 (no RAM OC possible). I've heard elsewhere that ECC may come with some performance cost, but I don't have data either way on that. Even if so, it would not significantly affect things here. I'm not aware of any differences between SODIMMs and regular-sized DIMMs of similar specifications, other than the physical aspects.

I think if anyone is considering building a farm around prime finding activities, the cheap but higher end v3 Xeons on eBay could be a consideration. Specifically aim for the v3 models, as the FMA instructions help a lot over the even cheaper E5-2670, for example. There were also some cheaper 12 core models. They're all low clock, but the number of cores makes up for it, and for multi-threaded tasks they offer nice throughput. If these had been around at the time, I would have skipped the 6600K and 6700K boxes I built earlier, as the Xeons would give ballpark 50% more performance for a similar overall cost.
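
To make the flattening in the table above easier to see, here is a rough Python sketch that computes how much throughput each extra worker adds. The numbers are copied from the benchmark lines in this post; the function and variable names are only illustrative.

```python
# Reported throughput (iter/sec) by number of workers, from the run above.
reported = {
    1: 38.38, 2: 74.18, 3: 105.71, 4: 136.32, 5: 165.96, 6: 192.92,
    7: 216.12, 8: 235.57, 9: 251.82, 10: 264.64, 11: 271.82,
    12: 278.03, 13: 280.49, 14: 280.16,
}


def marginal_gains(tp):
    """Throughput added by each additional worker, keyed by worker count."""
    counts = sorted(tp)
    return {n: round(tp[n] - tp[prev], 2) for prev, n in zip(counts, counts[1:])}


gains = marginal_gains(reported)
for n, gain in gains.items():
    print(f"worker {n:2d}: {gain:+.2f} iter/sec")
```

The second worker adds about 35.8 iter/sec, while the 14th adds slightly less than nothing, which is what you would expect if the workers are running out of memory bandwidth rather than CPU.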
Old 2016-06-12, 04:51   #3
axn
 
 
Jun 2003

12F8₁₆ Posts

With 35 MB of L3 cache, a single 4M FFT would fit completely, so 1 worker with 14 threads (1w14t) might give even higher throughput.
Old 2016-06-13, 18:36   #4
Madpoo
Serpentine Vermin Jar
 
 
Jul 2014

2·5·7·47 Posts

Quote:
Originally Posted by axn
With 35 MB of L3 cache, a single 4M FFT would fit completely, so 1 worker with 14 threads (1w14t) might give even higher throughput.

At the 4M FFT size, you will definitely have degradation trying to run 14 workers at once.

Stick with 1 or 2 workers... I use 1 worker per CPU using all the cores and that works well for me. Run the benchmark again with the option for multiple cores per worker enabled, and you should see that throughput is best with 1 worker and 14 threads.

I'm not really sure why the benchmark values for "14 CPUs/14 workers" are so far removed from what I see in a real world test, where memory contention between all those workers drags things to a crawl, but that's definitely been my experience.
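
A back-of-the-envelope check on the cache argument, as a Python sketch. It assumes one 8-byte double per FFT element and ignores whatever auxiliary tables Prime95 keeps alongside the data, so the real working set is likely somewhat larger. The names are illustrative only.

```python
MIB = 1024 * 1024


def fft_data_bytes(fft_len_k, bytes_per_element=8):
    """Bytes of FFT data, assuming one 8-byte double per element."""
    return fft_len_k * 1024 * bytes_per_element


one_fft = fft_data_bytes(4096)   # a single 4096K FFT
l3_2683 = 35 * MIB               # E5-2683 v3, L3 size as reported above
l3_1620 = 10 * MIB               # E5-1620 v3, L3 size as reported above

print(f"one 4096K FFT:     {one_fft / MIB:.0f} MiB")
print(f"14 separate FFTs:  {14 * one_fft / MIB:.0f} MiB")
print(f"fits in 35 MB L3:  {one_fft <= l3_2683}")
print(f"fits in 10 MB L3:  {one_fft <= l3_1620}")
```

Under those assumptions a single 4096K FFT is 32 MiB, which just fits in the 35 MB L3, while 14 independent workers would need 448 MiB and have to go to main memory.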
Old 2016-06-14, 12:38   #5
xtreme2k
 
 
Aug 2002

2·3·29 Posts

Please post the 4096K FFT benchmark results with 1 worker and 4 threads (E5-1620) or 14 threads (E5-2683).

Use the following in prime.txt to cut to the chase:
Code:
MinBenchFFT=4096
MaxBenchFFT=4096
BenchTime=30
BenchMultithreads=1
BenchHyperthreads=0
OnlyBenchThroughput=1
OnlyBenchMaxCPUs=1
OnlyBench5678=0
BenchAllComplex=0
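
If you would rather not edit the file by hand, here is a small Python sketch that merges those lines into an existing prime.txt. It assumes the file is a flat list of KEY=VALUE lines, and the helper names and the path you pass in are hypothetical, not anything shipped with Prime95.

```python
from pathlib import Path

# The benchmark settings listed above.
BENCH_SETTINGS = {
    "MinBenchFFT": "4096",
    "MaxBenchFFT": "4096",
    "BenchTime": "30",
    "BenchMultithreads": "1",
    "BenchHyperthreads": "0",
    "OnlyBenchThroughput": "1",
    "OnlyBenchMaxCPUs": "1",
    "OnlyBench5678": "0",
    "BenchAllComplex": "0",
}


def merge_settings(text, settings):
    """Return text with every KEY=VALUE from settings applied.

    Existing keys are replaced in place; missing keys are appended.
    """
    remaining = dict(settings)
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if "=" in line and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


def update_prime_txt(path, settings=BENCH_SETTINGS):
    """Rewrite the file at path (hypothetical location) with settings merged in."""
    p = Path(path)
    old = p.read_text() if p.exists() else ""
    p.write_text(merge_settings(old, settings))
```

Calling `update_prime_txt("prime.txt")` from the Prime95 directory would apply the settings; keep a backup copy of the file first.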

Last fiddled with by xtreme2k on 2016-06-14 at 12:44
Old 2016-06-14, 19:55   #6
bgbeuning
 
Dec 2014

3×5×17 Posts
Xeon 1620 results

This is the output with those lines added to prime.txt

Quote:
Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz
CPU speed: 3500.11 MHz, 4 hyperthreaded cores
CPU features: Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 10 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 28.9, RdtscTiming=1

Timings for 4096K FFT length (4 cpus, 1 worker): 5.41 ms. Throughput: 184.95 iter/sec.
Timings for 4096K FFT length (4 cpus, 2 workers): 10.87, 10.78 ms. Throughput: 184.77 iter/sec.
Timings for 4096K FFT length (4 cpus, 4 workers): 21.33, 21.52, 21.05, 21.21 ms. Throughput: 188.02 iter/sec.
Old 2016-06-15, 11:23   #7
xtreme2k
 
 
Aug 2002

2×3×29 Posts

My understanding is that 188 it/s is a good speed for a Haswell-E quad core.
Looking at some reviews now, the board is amazing. I must admit DDR4 ECC SODIMM is not easy to find.

There are plenty of upgrade opportunities for the S2011-3 platform as well.

If you can source some cheap 2683v3/2675v3/2673v3 or v4s, the sky is the limit!
Old 2016-06-18, 10:32   #8
mackerel
 
 
Feb 2016
UK

110010011₂ Posts

Just had a chance to do the testing also.

Code:
[Sat Jun 18 10:46:08 2016]
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Genuine Intel(R) CPU @ 2.00GHz
CPU speed: 1861.06 MHz, 14 cores
CPU features: Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 35 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 28.9, RdtscTiming=1

Timings for 4096K FFT length (14 cpus, 1 worker):  3.06 ms.  Throughput: 327.04 iter/sec.
Timings for 4096K FFT length (14 cpus, 2 workers):  7.05,  6.92 ms.  Throughput: 286.37 iter/sec.
Timings for 4096K FFT length (14 cpus, 7 workers): 24.36, 24.41, 24.23, 24.29, 24.26, 24.28, 24.32 ms.  Throughput: 288.00 iter/sec.
Timings for 4096K FFT length (14 cpus, 14 workers): 49.28, 49.22, 49.10, 49.18, 48.86, 48.91, 49.32, 48.79, 49.16, 48.94, 49.06, 48.97, 48.87, 48.79 ms.  Throughput: 285.53 iter/sec.

So a single worker on all 14 cores is better than running multiple workers, but it is still way below the peak you would get by scaling the single-core result up linearly.
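
To put a number on that gap, here is a rough Python sketch. It takes the 1 cpu, 1 worker figure from my earlier 28.7 run (38.38 iter/sec) as the single-core baseline, so it mixes program versions and should be read as an estimate only. The function name is just for illustration.

```python
def scaling_efficiency(single_core_ips, cores, measured_ips):
    """Measured throughput as a fraction of perfect linear scaling."""
    ideal = single_core_ips * cores
    return measured_ips / ideal


# E5-2683 v3: 38.38 iter/sec on one core, 327.04 iter/sec with 1 worker on 14 cores.
xeon_14c = scaling_efficiency(38.38, 14, 327.04)

# E5-1620 v3 from earlier in the thread: 51.75 on one core, 184.95 with 1 worker on 4 cores.
xeon_4c = scaling_efficiency(51.75, 4, 184.95)

print(f"14-core: {xeon_14c:.1%} of linear scaling")
print(f" 4-core: {xeon_4c:.1%} of linear scaling")
```

That works out to roughly 61% of linear scaling for the 14 core chip versus roughly 89% for the quad core, which fits the idea that the smaller CPU is running closer to its potential.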

Code:
Timings for 4096K FFT length (14 cpus, 1 worker):  5.97 ms.  Throughput: 167.46 iter/sec.
Timings for 4096K FFT length (14 cpus, 2 workers):  6.39,  6.40 ms.  Throughput: 312.71 iter/sec.
Timings for 4096K FFT length (14 cpus, 7 workers): 24.80, 24.84, 24.81, 24.78, 24.77, 24.70, 24.69 ms.  Throughput: 282.58 iter/sec.
Timings for 4096K FFT length (14 cpus, 14 workers): 49.90, 49.91, 49.96, 50.04, 49.69, 49.78, 50.25, 49.76, 49.97, 49.72, 49.92, 49.85, 49.78, 49.67 ms.  Throughput: 280.72 iter/sec.

It is interesting to note that the single-worker run under 28.7 is below expectations.
All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.