|
|
#12 |
|
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
3·23·89 Posts |
Xeon would have advantages. It would allow you to get a head start on developing for AVX512.
You could potentially use 1 motherboard for the same number of cores. 1 power supply. 1 OS. Simpler case, etc. You have looked at an 8x2 arrangement. What about 4x4? Are 4-socket systems really expensive? |
|
|
|
|
|
#13 |
|
(loop (#_fork))
Feb 2006
Cambridge, England
2·7·461 Posts |
Four-socket Xeon systems are amazingly expensive, and the processors available for them are quite slow (e.g. 12 2.1GHz HSW cores for $3800).
The Skylake Xeons that you'd need for AVX512 development are unlikely to be available before Spring 2017. I have an account somewhere containing some money intended to turn into a dual 12-core SKL Xeon machine in mid-2017, but will have to save fairly intently to get there by then. |
|
|
|
|
|
#14 |
|
"Curtis"
Feb 2005
Riverside, CA
5854₁₀ Posts |
|
|
|
|
|
|
#15 |
|
Einyen
Dec 2003
Denmark
2²×863 Posts |
Did you consider Haswell-E like the 5820K, or the equivalent Xeon LGA 2011-v3 processors? The quad-channel DDR4 is really good for LL testing, as my dual vs. quad channel tests showed.
On my 5960X it is most efficient to use all 8 cores on a single exponent rather than running several at once. A 62M exponent at 3360K FFT took 38.5 hours.
Last fiddled with by ATH on 2015-12-30 at 19:54 |
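To compare that run time with per-iteration benchmark figures, it can be converted to an average time per iteration. A Lucas-Lehmer test of M_p takes p - 2 squarings. The sketch below assumes an exponent of exactly 62,000,000 purely for illustration (the post only says "62M"), so the result is approximate, and the function name is made up for this example.

```python
def ms_per_iteration(exponent, hours):
    """Average milliseconds per LL iteration for a full test of 2^exponent - 1."""
    iterations = exponent - 2          # a Lucas-Lehmer test needs p - 2 squarings
    return hours * 3600.0 * 1000.0 / iterations

if __name__ == "__main__":
    # Assumed exponent: the post gives "62M", not an exact value.
    avg = ms_per_iteration(62_000_000, 38.5)
    print(f"about {avg:.2f} ms per iteration")
```

That works out to a little over 2.2 ms per iteration with all 8 cores on one exponent.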
|
|
|
|
|
#16 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
17×487 Posts |
For your consideration, here is some raw data: 11 different CPU speeds vs. 2 different RAM speeds.
|
|
|
|
|
|
#17 | |
|
Serpentine Vermin Jar
Jul 2014
2×13×131 Posts |
Quote:
That equates to exponents in the 36M-ish range, which is in the middle of that "sweet spot" that I've found for running multiple workers with minimal memory thrashing. I'm not sure what exponent range the 4M FFT size is used for... I guess 70-72M? That's well beyond where I start to see serious degradation when multiple workers are using that size (for me it was generally anything above 58M, whatever FFT size that would be). In other words, what works well for optimal throughput at one FFT size won't be as good for other FFT sizes. You may get better throughput using multiple cores on one worker so that you're stressing the CPU more and the memory won't start bottlenecking. |
|
|
|
|
|
|
#18 |
|
"Jeff"
Feb 2012
St. Louis, Missouri, USA
10010000101₂ Posts |
Have you thought about harnessing the power of the Dark Side?
http://robot6.comicbookresources.com...s-a-gaming-pc/ |
|
|
|
|
|
#19 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
17·487 Posts |
Quote:
4M FFTs are used for 73M to 78M. This is near where my itx box would be doing a lot of first time LL testing. |
|
|
|
|
|
|
#20 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
17·487 Posts |
Quote:
The good news is that I believe I can extrapolate from the data the most cost effective build at today's hardware prices and my estimated electric rates. Details to follow. |
|
|
|
|
|
|
#21 | |
|
Dec 2014
FF₁₆ Posts |
Quote:
[Thu Dec 10 17:40:00 2015]
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz
CPU speed: 2464.28 MHz, 4 cores
CPU features: Prefetch, SSE, SSE2, SSE4, AVX
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 6 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 28.7, RdtscTiming=
Timings for 4096K FFT length (1 cpu, 1 worker): 24.18 ms. Throughput: 41.36 iter/sec.
Timings for 4096K FFT length (2 cpus, 2 workers): 24.70, 24.51 ms. Throughput: 81.28 iter/sec.
Timings for 4096K FFT length (3 cpus, 3 workers): 26.83, 26.82, 26.83 ms. Throughput: 111.84 iter/sec.
Timings for 4096K FFT length (4 cpus, 4 workers): 30.77, 30.77, 30.88, 30.87 ms. Throughput: 129.77 iter/sec. |
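As a rough cross-check of the benchmark above, the reported throughput appears to be the sum over workers of 1000 / (ms per iteration). The sketch below recomputes it from the posted timings and adds a scaling-efficiency figure. The function names are illustrative and not part of Prime95, and the throughput definition is an assumption inferred from the numbers, which it matches to within rounding.

```python
# Timings copied from the 4096K FFT benchmark above:
# number of workers -> ms per iteration for each worker.
TIMINGS_MS = {
    1: [24.18],
    2: [24.70, 24.51],
    3: [26.83, 26.82, 26.83],
    4: [30.77, 30.77, 30.88, 30.87],
}

def throughput(timings_ms):
    """Iterations per second summed over all workers (assumed definition)."""
    return sum(1000.0 / t for t in timings_ms)

def scaling_efficiency(timings_by_workers):
    """Throughput with n workers relative to n times the 1-worker throughput."""
    base = throughput(timings_by_workers[1])
    return {n: throughput(t) / (n * base) for n, t in timings_by_workers.items()}

if __name__ == "__main__":
    eff = scaling_efficiency(TIMINGS_MS)
    for n, t in TIMINGS_MS.items():
        print(f"{n} worker(s): {throughput(t):7.2f} iter/sec, efficiency {eff[n]:.0%}")
```

With four workers the total is only about 78% of perfect scaling, which fits the memory bottleneck discussed in this thread.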
|
|
|
|
|
|
#22 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
10000001010111₂ Posts |
This is how I went about deciding my optimal dream build. Let's start with a baseline 5 CPU system using overclocked memory:

5 ASRock Z170M-ITX/ac motherboards @130 = 650
5 2x4GB DDR4-3200 @60 = 300
5 I5-6600 CPUs (3.3GHz, 65W) @230 = 1150
1 Samsung 850 EVO SSD @90 = 90
4 PicoPSU picoPSU-120 @40 = 160
1 Case, power supply, network switch -- approximate value $100

Each of the 5 units will consume 65W CPU, 4W memory, 15W(?) mobo, or about 425W total. Add in 15% power supply inefficiency for a total of 500W at the wall.

Total cost of 3 year ownership = 2450 parts + 3 * 500 = 3950
Total cost of 4 year ownership = 2450 parts + 4 * 500 = 4450

Now let's guess the throughput of this system using the Haswell data posted earlier. A 2.2GHz Haswell with DDR3-2133 gets 131.8 throughput. In this system, each CPU will run 50% faster (3.3GHz vs. 2.2GHz) with 50% faster memory (DDR4-3200 vs. DDR3-2133). Thus 131.8 + 50% = 197.7. Actually it should be better than that, since a Skylake CPU is slightly more efficient than a Haswell CPU. But we'll leave the expected throughput number at 197.7.

Now let's define a metric to optimize -- expected throughput per dollar (TPD).

3 year TPD = 5 CPUs * 197.7 * 3 years / 3950 = 0.7508
4 year TPD = 5 CPUs * 197.7 * 4 years / 4450 = 0.8885

Let's compare that to a second system built with cheaper motherboards that do not allow overclocking. We will save $60 for each motherboard and $20 for each RAM pair, for a total of $400. Expected throughput for each CPU is 165.4 (that is what a 3.4GHz Haswell gets using DDR3-2133). Now let's look at our TPD metric:

3 year TPD = 5 CPUs * 165.4 * 3 years / 3550 = 0.6989
4 year TPD = 5 CPUs * 165.4 * 4 years / 4050 = 0.8168

Not nearly as good as the previous system.

Now we'll try a cheaper 3.2GHz CPU in the base system. This saves 25 dollars per CPU. Expected throughput won't go down much, probably to 193 or 194.

3 year TPD = 5 CPUs * 193.5 * 3 years / 3825 = 0.7588
4 year TPD = 5 CPUs * 193.5 * 4 years / 4325 = 0.8948

That's better than the first system.

How about overclocking? Each K-series CPU will cost $50 more. I assume power draw is proportional to frequency and the square of the voltage. As an example, let's target a 200MHz frequency increase. I'll assume a frequency increase requires a tiny voltage bump. Thus, CPU power goes from 65W to 65 * 3.5/3.3 * (1.17/1.15)^2, an increase of about 7.5W when taking power supply inefficiency into account. This is less than the 91W TDP listed for K-series CPUs. A 5% increase in throughput is a fairly generous assumption -- 197.7 * 1.05 is 207.6.

3 year TPD = 5 CPUs * 207.6 * 3 years / (3950+5*50+5*7.5*3) = 0.7221
4 year TPD = 5 CPUs * 207.6 * 4 years / (4450+5*50+5*7.5*4) = 0.8561

Not worth the money.

Conclusion: It is best to create an overclocked memory system using the cheaper I5-6500 locked processor. |
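The comparison above can be reproduced with a short script. This is only a sketch of the arithmetic in the post. The function and variable names are made up for illustration, and the electricity cost of $1 per watt per year is an assumption read off the "3 * 500" term (500W at the wall for 3 years counted as $1500).

```python
PSU_OVERHEAD = 1.15           # 15% power supply inefficiency, as in the post
DOLLARS_PER_WATT_YEAR = 1.0   # assumption implied by "2450 parts + 3 * 500"

def tpd(cpus, throughput_per_cpu, years, parts_cost, wall_watts):
    """Expected throughput per dollar over the ownership period."""
    total_cost = parts_cost + years * wall_watts * DOLLARS_PER_WATT_YEAR
    return cpus * throughput_per_cpu * years / total_cost

def overclock_extra_wall_watts(base_w, f_old, f_new, v_old, v_new):
    """Extra wall power per CPU, assuming power ~ frequency * voltage^2."""
    new_w = base_w * (f_new / f_old) * (v_new / v_old) ** 2
    return (new_w - base_w) * PSU_OVERHEAD

# build name -> (throughput per CPU, parts cost in dollars, watts at the wall)
BUILDS = {
    "I5-6600, DDR4-3200":           (197.7, 2450, 500.0),
    "I5-6600, no memory overclock": (165.4, 2050, 500.0),
    "I5-6500, DDR4-3200":           (193.5, 2325, 500.0),
    "K-series, +200MHz":            (207.6, 2700, 500.0 + 5 * 7.5),
}

if __name__ == "__main__":
    for name, (thr, parts, watts) in BUILDS.items():
        three = tpd(5, thr, 3, parts, watts)
        four = tpd(5, thr, 4, parts, watts)
        print(f"{name:30s} 3 year TPD {three:.4f}   4 year TPD {four:.4f}")
    extra = overclock_extra_wall_watts(65, 3.3, 3.5, 1.15, 1.17)
    print(f"extra wall power per overclocked CPU: {extra:.1f} W")
```

The power model gives a bit over 7W of extra wall power per overclocked CPU, which the post rounds up to 7.5W. Keeping the builds in one table makes it easy to plug in other prices or electric rates.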
|
|
|