#166 | |
|
Serpentine Vermin Jar
Jul 2014
3313₁₀ Posts |
Quote:
I did a quick Google and found this: http://colfaxresearch.com/knl-avx512/ There's a spot on there where it lists the output of "cat /proc/cpuinfo" and "vmx" is listed (that link itself might prove useful since it talks about the 512-bit AVX). On the other hand, I went to the official Intel spec page, and you're right, it says no virtualization support. Boo! http://ark.intel.com/products/94033/...30-GHz-64-core That's disappointing since it means I wouldn't be able to use any of these for work (virtual hosting). Bummer. |
|
|
|
|
|
|
#167 |
|
Sep 2003
2585₁₀ Posts |
So it sounds like this chip is specifically for the HPC niche market, and some other processor will be intended for servers. Few of us will actually use one, in home computing farms or cloud computing.
|
|
|
|
|
|
#168 | |
|
∂2ω=0
Sep 2002
República de California
11647₁₀ Posts |
Quote:
As the funding page states, this system is aimed at the HPC-for-number-theory code-development folks. There are probably fewer than a dozen such who regularly post here, but that is the inherent nature of DC projects like GIMPS. |
|
|
|
|
|
|
#169 |
|
"David"
Jul 2015
Ohio
11·47 Posts |
Thanks to an incredibly attentive development team and a very small additional patch, I was able to get the latest version of HJWASM to assemble gwnum.a for linux successfully.
I am still working out some nuances in the rest of the packaging/build process for prime95 on my linux dev box but things do seem to be moving forward. There is now a way to build prime95 with support for AVX-512 instructions. |
|
|
|
|
|
#170 | |
|
Serpentine Vermin Jar
Jul 2014
3,313 Posts |
Quote:
I've done triple-checks on all exponents below 2M (that didn't already have 3), and it's not a bad way to compare results of a new build to some tried & true residues from the past. I can generate a list of suitable worktodo lines for you if you'd like. Note that a custom build won't have the checksum or security code or whatever it is that an official "George" build would have, so it wouldn't be accepted by the server. |
|
|
|
|
|
|
#171 | ||
|
"/X\(‘-‘)/X\"
Jan 2013
5562₈ Posts |
Quote:
Quote:
|
||
|
|
|
|
|
#172 | |
|
"David"
Jul 2015
Ohio
11·47 Posts |
Quote:
I was aware of the security code issue. I suppose if I was malicious I could just make that work... I don't expect to submit anything to primenet unless it is promoted up to a working build that has been through all the paces. Right now my goal was just to make things buildable on a KNL system so that I, or George, or whoever could poke at AVX-512. |
|
|
|
|
|
|
#173 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
19×397 Posts |
Intel says the 16GB HBM memory has 4x the bandwidth of the 3-channel DDR4 RAM. FFT data will easily fit in 16GB, so the good news is we should be running out of HBM memory at all times. Compare KNL to a 4-core Skylake with 2-channel DDR4 RAM: the KNL system will have 4x (HBM vs DDR4) times 1.5x (3-channel vs 2-channel), or 6x the memory bandwidth. Unfortunately, we have 6x the memory bandwidth feeding 16x the number of cores!
A Skylake system is hurting on memory bandwidth; the KNL is going to be downright starving. We're looking at roughly 33% FPU utilization. I do not have any good ideas on reducing memory bandwidth requirements any further. The only option may be to run 64 cores of TF hyperthreaded alongside a 64 core FFT. |
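For anyone who wants to replay the arithmetic in this post, here is a minimal Python sketch. It only reproduces the 6x-bandwidth versus 16x-cores comparison using the figures quoted above (including the 3-channel assumption, which is questioned in the next post); it does not try to derive the FPU utilization estimate, and the function and variable names are made up for illustration.

```python
# Figures as quoted in the post above; they are estimates, not measurements.
HBM_VS_DDR4 = 4.0        # HBM bandwidth relative to the KNL's own DDR4
KNL_CHANNELS = 3         # channel count as stated in the post
SKYLAKE_CHANNELS = 2
KNL_CORES = 64
SKYLAKE_CORES = 4


def relative_bandwidth(hbm_factor, knl_channels, skylake_channels):
    """KNL memory bandwidth as a multiple of the Skylake system's bandwidth."""
    return hbm_factor * (knl_channels / skylake_channels)


bandwidth_ratio = relative_bandwidth(HBM_VS_DDR4, KNL_CHANNELS, SKYLAKE_CHANNELS)
core_ratio = KNL_CORES / SKYLAKE_CORES
per_core_ratio = bandwidth_ratio / core_ratio

print(f"bandwidth ratio: {bandwidth_ratio:g}x")
print(f"core ratio: {core_ratio:g}x")
print(f"bandwidth per core relative to Skylake: {per_core_ratio:g}")
```

Under these assumptions each KNL core gets well under half the memory bandwidth that a Skylake core gets, which is the concern raised in the post.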
|
|
|
|
|
#174 |
|
"/X\(‘-‘)/X\"
Jan 2013
2·5·293 Posts |
I believe KNL is 6 channel, not 3 channel.
The cores are based off Atom (Silvermont) and are clocked slower, but have 4 hyperthreads each. The chip we're getting probably runs at 1.3 GHz. The CPU we're getting probably only has 64 cores enabled.
Edit: Each core will have double the FP.
Apparently the onboard 16 GB can deliver over 400 GB/s versus the 42 GB/s a Skylake with 2 channel DDR4-3200 gets. If we consider a 4 GHz Skylake, we have roughly 10.4 times the CPU, but we'll have about 10 times the bandwidth. So it might not be so bad.
Edit: Updated to reflect comments from ldesnogu
Last fiddled with by Mark Rose on 2016-09-06 at 20:30 |
|
|
|
|
|
#175 |
|
Jan 2008
France
1000100110₂ Posts |
Skylake can do 16 DP FLOPs/cycle. So 4 cores at 4 GHz will give 256 GFLOPs/s.
KNL is 32 DP FLOPs/cycle. So 64 cores at 1.3 GHz will give 2662 GFLOPs/s. That's more than 10 times a 4-core Skylake, not 5. |
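The peak figures above are just FLOPs per cycle times cores times clock. A small Python sketch of that calculation follows. The bandwidth numbers (400 GB/s and 42 GB/s) are the ones quoted in post #174, taken as given rather than verified, and the helper name is hypothetical.

```python
def peak_gflops(flops_per_cycle, cores, clock_ghz):
    """Theoretical peak double-precision GFLOPS for a chip."""
    return flops_per_cycle * cores * clock_ghz


# Per-core throughput, core counts and clocks as stated in this thread.
skylake_gflops = peak_gflops(16, 4, 4.0)
knl_gflops = peak_gflops(32, 64, 1.3)
flops_ratio = knl_gflops / skylake_gflops

# Bandwidth figures quoted in post #174 (assumed, not measured here).
KNL_GBPS = 400.0
SKYLAKE_GBPS = 42.0

# Bytes of memory bandwidth available per peak FLOP on each system.
knl_bytes_per_flop = KNL_GBPS / knl_gflops
skylake_bytes_per_flop = SKYLAKE_GBPS / skylake_gflops

print(f"Skylake peak: {skylake_gflops:.1f} GFLOPS")
print(f"KNL peak: {knl_gflops:.1f} GFLOPS")
print(f"KNL / Skylake: {flops_ratio:.1f}x")
print(f"bytes per FLOP: KNL {knl_bytes_per_flop:.3f}, Skylake {skylake_bytes_per_flop:.3f}")
```

With those inputs the two systems end up with a similar ratio of bandwidth to peak compute, which matches the "might not be so bad" reading in the previous post. These are theoretical peaks only; real FFT throughput depends on much more than this.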
|
|
|
|
|
#176 | |
|
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
16FE₁₆ Posts |
Quote:
Didn't you mention a while back that it may be possible to reduce memory bandwidth by using integer FFTs rather than floating point? I may be misremembering. |
|
|
|
|