[QUOTE=airsquirrels;441532]I believe I read that KNL does not support virtualization instructions.[/QUOTE]
Hmm... that would be troubling... I did a quick Google and found this: [URL="http://colfaxresearch.com/knl-avx512/"]http://colfaxresearch.com/knl-avx512/[/URL] There's a spot on that page where it lists the output of "cat /proc/cpuinfo", and "vmx" is listed. (That link itself might prove useful, since it talks about the 512-bit AVX.) On the other hand though, I went to the official Intel spec page, and you're right, it says no virtualization support. Boo! [URL="http://ark.intel.com/products/94033/Intel-Xeon-Phi-Processor-7210-16GB-1_30-GHz-64-core"]http://ark.intel.com/products/94033/Intel-Xeon-Phi-Processor-7210-16GB-1_30-GHz-64-core[/URL] That's disappointing, since it means I wouldn't be able to use any of these for work (virtual hosting). Bummer. |
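For anyone who wants to repeat the "cat /proc/cpuinfo" check on their own Linux box, here is a minimal sketch. The helper names `cpu_flags` and `has_flag` are made up for illustration; "vmx" and "avx512f" are the flag names Linux reports for VT-x and AVX-512 Foundation. A flag showing up here only says what the kernel sees the CPU advertising, so it does not settle the disagreement with the Intel spec page.

```python
def cpu_flags(cpuinfo_text):
    """Collect the feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Each logical CPU has its own "flags" line; merge them all.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_flag(cpuinfo_text, flag):
    """True if the given flag appears on any 'flags' line."""
    return flag in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    for flag in ("vmx", "avx512f"):
        print(flag, has_flag(text, flag))
```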
[QUOTE=Madpoo;441542]On the other hand though, I went to the official Intel spec page, and you're right, it says no virtualization support.[/QUOTE]
So it sounds like this chip is specifically for the HPC niche market, and some other processor will be intended for servers. Few of us will actually use one, in home computing farms or cloud computing. |
[QUOTE=GP2;441546]So it sounds like this chip is specifically for the HPC niche market, and some other processor will be intended for servers. Few of us will actually use one, in home computing farms or cloud computing.[/QUOTE]
Which is why it was an absolute must, for justifying raising money for early adoption of such a system, that the instruction set will also carry forward to future wider release, including the PC market, unlike previous Intel 'Xeon Phi specials', which proved evolutionary dead ends in this regard. As the funding page states, this system is aimed at the HPC-for-number-theory code-development folks. There are probably fewer than a dozen such who regularly post here, but that is the inherent nature of DC projects like GIMPS. |
Thanks to an incredibly attentive development team and a very small additional patch, I was able to get the latest version of HJWASM to assemble gwnum.a for linux successfully.
I am still working out some nuances in the rest of the packaging/build process for prime95 on my linux dev box but things do seem to be moving forward. There is now a way to build prime95 with support for AVX-512 instructions. |
[QUOTE=airsquirrels;441562]Thanks to an incredibly attentive development team and a very small additional patch, I was able to get the latest version of HJWASM to assemble gwnum.a for linux successfully.
I am still working out some nuances in the rest of the packaging/build process for prime95 on my linux dev box but things do seem to be moving forward. There is now a way to build prime95 with support for AVX-512 instructions.[/QUOTE] If you get to a point where you want to test that build to make sure it's spitting out the same results as mprime would (sans any AVX-512 stuff of course...just to make sure it works equally as well) I'd suggest trying it out by doing triple checks of some small exponents... why not. :smile: I've done triple-checks on all exponents below 2M (that didn't already have 3), and it's not a bad way to compare results of a new build to some tried & true residues from the past. I can generate a list of suitable worktodo lines for you if you'd like. Note that a custom build won't have the checksum or security code or whatever it is that an official "George" build would have, so it wouldn't be accepted by the server. |
[QUOTE=Madpoo;441664]If you get to a point where you want to test that build to make sure it's spitting out the same results as mprime would (sans any AVX-512 stuff of course...just to make sure it works equally as well) I'd suggest trying it out by doing triple checks of some small exponents... why not. :smile:[/quote]
It causes spikes in the [url=http://www.mersenne.org/primenet/graphs.php]graphs[/url]. [quote]I've done triple-checks on all exponents below 2M (that didn't already have 3), and it's not a bad way to compare results of a new build to some tried & true residues from the past. I can generate a list of suitable worktodo lines for you if you'd like.[/QUOTE] They're done up to 2.06M, actually. I usually run a few hundred to test new hardware. |
[QUOTE=Madpoo;441664]If you get to a point where you want to test that build to make sure it's spitting out the same results as mprime would (sans any AVX-512 stuff of course...just to make sure it works equally as well) I'd suggest trying it out by doing triple checks of some small exponents... why not. :smile:
I've done triple-checks on all exponents below 2M (that didn't already have 3), and it's not a bad way to compare results of a new build to some tried & true residues from the past. I can generate a list of suitable worktodo lines for you if you'd like. Note that a custom build won't have the checksum or security code or whatever it is that an official "George" build would have, so it wouldn't be accepted by the server.[/QUOTE] That's not a bad thought, easy enough to validate things. I was aware of the security code issue. I suppose if I was malicious I could just make that work... I don't expect to submit anything to primenet unless it is promoted up to a working build that has been through all the paces. Right now my goal was just to make things buildable on a KNL system so that I, or George, or whoever could poke at AVX-512. |
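The validation idea above (run small exponents on the new build and compare against tried & true residues) could be sketched roughly like this. Everything here is an assumption for illustration: `compare_residues` is a hypothetical helper, and the exponents and residues in the example are placeholders, not real GIMPS results. How you pull the residues out of the build's results file is left open.

```python
def compare_residues(known, candidate):
    """Return {exponent: (expected, got)} for every known exponent whose
    candidate residue is missing or differs (hex compared case-insensitively)."""
    mismatches = {}
    for exponent, expected in known.items():
        got = candidate.get(exponent)
        if got is None or got.upper() != expected.upper():
            mismatches[exponent] = (expected, got)
    return mismatches


if __name__ == "__main__":
    # Placeholder data only: NOT real residues.
    known = {1000003: "0123456789ABCDEF", 1000033: "FEDCBA9876543210"}
    new_build = {1000003: "0123456789abcdef", 1000033: "FEDCBA9876543211"}
    for exponent, (expected, got) in compare_residues(known, new_build).items():
        print(f"M{exponent}: expected {expected}, got {got}")
```

An empty result means every known exponent matched; anything else is worth a rerun before blaming the new code.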
preliminary KNL analysis
Intel says the 16GB HBM memory has 4x the bandwidth of the 3-channel DDR4 ram. FFT data will easily fit in 16GB, so the good news is we should be running out of HBM memory at all times. Compare KNL to a 4-core Skylake with 2-channel DDR4 ram, the KNL system will have 4x (HBM vs DDR4) times 1.5x (3-channel vs 2-channel), or 6x the memory bandwidth. Unfortunately, we have 6x memory bandwidth feeding 16x number of cores!
A Skylake system is hurting on memory bandwidth, the KNL is going to be downright starving. We're looking at roughly 33% FPU utilization. I do not have any good ideas on reducing memory bandwidth requirements any further. The only option may be to run 64 cores of TF hyperthreaded alongside a 64 core FFT. |
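The bandwidth-per-core arithmetic in the post above can be reproduced in a few lines. This is only a sketch of the ratios as stated in the post (4x HBM vs DDR4, 3-channel vs 2-channel, 16x the cores); the variable and function names are made up, and the 64-core count is an assumption taken from "16x" a 4-core Skylake. It does not derive the 33% FPU utilization estimate, it just shows how far per-core bandwidth falls relative to Skylake.

```python
def relative_bandwidth_per_core(bandwidth_ratio, core_ratio):
    """Per-core memory bandwidth relative to the baseline system."""
    return bandwidth_ratio / core_ratio


# Figures as given in the post (not independently verified).
hbm_vs_ddr4 = 4.0        # HBM bandwidth relative to the KNL's DDR4
channel_ratio = 3 / 2    # 3-channel KNL DDR4 vs 2-channel Skylake DDR4
bandwidth_ratio = hbm_vs_ddr4 * channel_ratio   # the post's 6x
core_ratio = 64 / 4      # assumed 64 KNL cores vs a 4-core Skylake

per_core = relative_bandwidth_per_core(bandwidth_ratio, core_ratio)
print(f"total bandwidth: {bandwidth_ratio:.1f}x Skylake")
print(f"cores:           {core_ratio:.0f}x Skylake")
print(f"bandwidth/core:  {per_core:.3f}x Skylake")
```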
I believe KNL is 6 channel, not 3 channel.
The cores are based off Atom (Silvermont) and are clocked slower, but have 4 hyperthreads each. The chip we're getting probably runs at 1.3 GHz. The CPU we're getting probably only has 64 cores enabled. Edit: Each core will have double the FP. Apparently the onboard 16 GB can deliver over 400 GB/s versus the 42 GB/s a Skylake with 2 channel DDR4-3200 gets. If we consider a 4 GHz Skylake, we have roughly 10.4 times the CPU, but we'll have about 10 times the bandwidth. So it might not be so bad. Edit: Updated to reflect comments from ldesnogu |
Skylake can do 16 DP FLOPs/cycle. So 4 cores at 4 GHz will give 256 GFLOP/s.
KNL is 32 DP FLOPs/cycle. So 64 cores at 1.3 GHz will give 2662 GFLOP/s. That's more than 10 times a 4-core Skylake, not 5. |
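For reference, the peak-throughput numbers above are just FLOPs/cycle times cores times clock. Here is a small sketch that also divides in the bandwidth figures quoted a post earlier (over 400 GB/s for the onboard 16 GB, 42 GB/s for the Skylake) to get bytes per FLOP. All inputs are the thread's own figures, not measurements, and the function names are made up for illustration.

```python
def peak_gflops(flops_per_cycle, cores, ghz):
    """Theoretical peak double-precision GFLOP/s."""
    return flops_per_cycle * cores * ghz


def bytes_per_flop(bandwidth_gb_s, gflops):
    """Memory bandwidth available per peak FLOP."""
    return bandwidth_gb_s / gflops


skylake = peak_gflops(16, 4, 4.0)    # 4 cores at 4 GHz
knl = peak_gflops(32, 64, 1.3)       # 64 cores at 1.3 GHz
compute_ratio = knl / skylake

# Bandwidth figures as quoted earlier in the thread.
skylake_bpf = bytes_per_flop(42.0, skylake)
knl_bpf = bytes_per_flop(400.0, knl)

print(f"Skylake peak: {skylake:.0f} GFLOP/s, {skylake_bpf:.3f} bytes/FLOP")
print(f"KNL peak:     {knl:.0f} GFLOP/s, {knl_bpf:.3f} bytes/FLOP")
print(f"KNL / Skylake compute: {compute_ratio:.1f}x")
```

On these numbers the two chips come out close in bytes per FLOP, which is the "might not be so bad" point made earlier, but both are theoretical peaks.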
[QUOTE=Prime95;441759]Intel says the 16GB HBM memory has 4x the bandwidth of the 3-channel DDR4 ram. FFT data will easily fit in 16GB, so the good news is we should be running out of HBM memory at all times. Compare KNL to a 4-core Skylake with 2-channel DDR4 ram, the KNL system will have 4x (HBM vs DDR4) times 1.5x (3-channel vs 2-channel), or 6x the memory bandwidth. Unfortunately, we have 6x memory bandwidth feeding 16x number of cores!
A Skylake system is hurting on memory bandwidth, the KNL is going to be downright starving. We're looking at roughly 33% FPU utilization. I do not have any good ideas on reducing memory bandwidth requirements any further. The only option may be to run 64 cores of TF hyperthreaded alongside a 64 core FFT.[/QUOTE] That's 16x the cores, but each will be a fair bit slower per core, at least before AVX-512. Is there any point in developing AVX-512 code while the current systems are so memory bound? Can we actually expect any improvement? Didn't you mention a while back that it may be possible to reduce memory bandwidth by using integer FFTs rather than floating point? I may be misremembering. |