mersenneforum.org  

2016-09-04, 16:49   #166
Madpoo
Serpentine Vermin Jar
Jul 2014
3,313 Posts

Quote:
Originally Posted by airsquirrels
I believe I read that KNL does not support virtualization instructions.

Hmm... that would be troubling...

I did a quick Google and found this:
http://colfaxresearch.com/knl-avx512/

That page shows the output of "cat /proc/cpuinfo", and "vmx" is listed among the flags. (The link itself might prove useful, since it talks about the 512-bit AVX.)
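For anyone who wants to repeat that check on their own box, here is a minimal sketch in Python. It assumes a Linux system and the usual "flags" line in /proc/cpuinfo; the helper names `cpu_flags` and `has_flag` are made up for illustration. Keep in mind a flag in cpuinfo only tells you what the kernel reports, which is exactly why it can disagree with a spec page.

```python
def cpu_flags(cpuinfo_text):
    """Collect the feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Each logical CPU has a line of the form "flags : fpu vme ..."
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_flag(cpuinfo_text, flag):
    """True if the given flag (e.g. "vmx") appears for any logical CPU."""
    return flag in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or the file is unreadable
    # "vmx" is Intel VT-x; "avx512f" is the AVX-512 Foundation flag.
    for name in ("vmx", "avx512f"):
        print(name, has_flag(text, name))
```

Parsing the text in a separate function makes it easy to run the same check on a pasted cpuinfo dump, such as the one on the linked page.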

On the other hand though, I went to the official Intel spec page, and you're right, it says no virtualization support. Boo!
http://ark.intel.com/products/94033/...30-GHz-64-core

That's disappointing since it means I wouldn't be able to use any of these for work (virtual hosting). Bummer.

2016-09-04, 18:23   #167
GP2
Sep 2003
2,585 Posts

Quote:
Originally Posted by Madpoo
On the other hand though, I went to the official Intel spec page, and you're right, it says no virtualization support.

So it sounds like this chip is specifically for the HPC niche market, and some other processor will be intended for servers. Few of us will actually use one, in home computing farms or cloud computing.

2016-09-04, 20:39   #168
ewmayer
2ω=0
Sep 2002
República de California
11,647 Posts

Quote:
Originally Posted by GP2
So it sounds like this chip is specifically for the HPC niche market, and some other processor will be intended for servers. Few of us will actually use one, in home computing farms or cloud computing.

Which is why it was an absolute must, in justifying raising money for early adoption of such a system, that the instruction set will also carry forward to future wider release, including the PC market. Previous Intel 'Xeon Phi specials' proved evolutionary dead ends in this regard.

As the funding page states, this system is aimed at the HPC-for-number-theory code-development folks. There are probably fewer than a dozen such who regularly post here, but that is the inherent nature of DC projects like GIMPS.

2016-09-04, 20:54   #169
airsquirrels
"David"
Jul 2015
Ohio
11·47 Posts

Thanks to an incredibly attentive development team and a very small additional patch, I was able to get the latest version of HJWASM to assemble gwnum.a for linux successfully.

I am still working out some nuances in the rest of the packaging/build process for prime95 on my linux dev box but things do seem to be moving forward. There is now a way to build prime95 with support for AVX-512 instructions.

2016-09-05, 20:18   #170
Madpoo
Serpentine Vermin Jar
Jul 2014
3,313 Posts

Quote:
Originally Posted by airsquirrels
Thanks to an incredibly attentive development team and a very small additional patch, I was able to get the latest version of HJWASM to assemble gwnum.a for linux successfully.

I am still working out some nuances in the rest of the packaging/build process for prime95 on my linux dev box but things do seem to be moving forward. There is now a way to build prime95 with support for AVX-512 instructions.

If you get to a point where you want to test that build to make sure it's spitting out the same results as mprime would (sans any AVX-512 stuff of course...just to make sure it works equally as well) I'd suggest trying it out by doing triple checks of some small exponents... why not.

I've done triple-checks on all exponents below 2M (that didn't already have 3), and it's not a bad way to compare results of a new build to some tried & true residues from the past.

I can generate a list of suitable worktodo lines for you if you'd like.

Note that a custom build won't have the checksum or security code or whatever it is that an official "George" build would have, so it wouldn't be accepted by the server.
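The comparison against known residues suggested above could be scripted along these lines. This is only a sketch under stated assumptions: each run is summarized as a mapping from exponent to a hex residue string, and `compare_residues` is a hypothetical helper, not part of prime95 or mprime. How the residues get extracted from the results file is left out.

```python
def compare_residues(reference, candidate):
    """Compare residues from a new build against trusted ones.

    Both arguments map exponent -> hex residue string.
    Returns (matches, mismatches, missing) as sorted lists of exponents.
    """
    matches, mismatches, missing = [], [], []
    for exponent in sorted(reference):
        expected = reference[exponent].upper()
        got = candidate.get(exponent)
        if got is None:
            missing.append(exponent)      # new build has no result yet
        elif got.upper() == expected:
            matches.append(exponent)      # residues agree (case-insensitive)
        else:
            mismatches.append(exponent)   # the new build disagrees
    return matches, mismatches, missing


if __name__ == "__main__":
    # Placeholder values only; these are NOT real exponents or residues.
    reference = {11: "00000000000000AA", 13: "00000000000000BB"}
    candidate = {11: "00000000000000aa", 13: "00000000000000FF"}
    print(compare_residues(reference, candidate))
```

Any exponent in the mismatch list is worth rerunning on a stock build before blaming the new code.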

2016-09-05, 20:29   #171
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
5562₈ Posts

Quote:
Originally Posted by Madpoo
If you get to a point where you want to test that build to make sure it's spitting out the same results as mprime would (sans any AVX-512 stuff of course...just to make sure it works equally as well) I'd suggest trying it out by doing triple checks of some small exponents... why not.

It causes spikes in the graphs.

Quote:
I've done triple-checks on all exponents below 2M (that didn't already have 3), and it's not a bad way to compare results of a new build to some tried & true residues from the past.

I can generate a list of suitable worktodo lines for you if you'd like.

They're done up to 2.06M, actually. I usually run a few hundred to test new hardware.

2016-09-05, 20:42   #172
airsquirrels
"David"
Jul 2015
Ohio
11·47 Posts

Quote:
Originally Posted by Madpoo
If you get to a point where you want to test that build to make sure it's spitting out the same results as mprime would (sans any AVX-512 stuff of course...just to make sure it works equally as well) I'd suggest trying it out by doing triple checks of some small exponents... why not.

I've done triple-checks on all exponents below 2M (that didn't already have 3), and it's not a bad way to compare results of a new build to some tried & true residues from the past.

I can generate a list of suitable worktodo lines for you if you'd like.

Note that a custom build won't have the checksum or security code or whatever it is that an official "George" build would have, so it wouldn't be accepted by the server.

That's not a bad thought, easy enough to validate things.

I was aware of the security code issue. I suppose if I was malicious I could just make that work...

I don't expect to submit anything to primenet unless it is promoted up to a working build that has been through all the paces. Right now my goal was just to make things buildable on a KNL system so that I, or George, or whoever could poke at AVX-512.

2016-09-06, 19:12   #173
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
19×397 Posts
preliminary KNL analysis

Intel says the 16GB HBM memory has 4x the bandwidth of the 3-channel DDR4 RAM. FFT data will easily fit in 16GB, so the good news is that we should be able to run entirely from HBM at all times. Compared to a 4-core Skylake with 2-channel DDR4 RAM, the KNL system will have 4x (HBM vs DDR4) times 1.5x (3-channel vs 2-channel), or 6x the memory bandwidth. Unfortunately, we have 6x the memory bandwidth feeding 16x the number of cores!

A Skylake system is already hurting for memory bandwidth; the KNL is going to be downright starving. We're looking at roughly 33% FPU utilization.

I do not have any good ideas for reducing memory bandwidth requirements any further. The only option may be to run trial factoring (TF) hyperthreaded on the 64 cores alongside a 64-core FFT.
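The bandwidth arithmetic above can be written out as a small sketch. It takes the figures exactly as stated in this post (4x HBM vs DDR4, 3 vs 2 channels, 64 vs 4 cores); none of them are measurements, and the function name is made up for illustration. It computes only the bandwidth-per-core ratio and does not try to reproduce the 33% utilization estimate.

```python
def relative_bandwidth(hbm_vs_ddr4, channel_ratio, core_ratio):
    """Return (total bandwidth ratio, bandwidth-per-core ratio), KNL vs Skylake."""
    total = hbm_vs_ddr4 * channel_ratio
    return total, total / core_ratio


# Figures as stated in the post; all are assumptions, not measurements.
total, per_core = relative_bandwidth(
    hbm_vs_ddr4=4.0,      # HBM vs DDR4 on the same system
    channel_ratio=3 / 2,  # 3-channel KNL vs 2-channel Skylake
    core_ratio=64 / 4,    # 64 KNL cores vs 4 Skylake cores
)
print(f"total bandwidth: {total:.1f}x, per core: {per_core:.3f}x")
```

With these inputs each KNL core gets well under half the bandwidth a Skylake core has, which is the point of the post.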

2016-09-06, 19:59   #174
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
2·5·293 Posts

I believe KNL is 6 channel, not 3 channel.

The cores are based off Atom (Silvermont) and are clocked slower, but have 4 hyperthreads each. The chip we're getting probably runs at 1.3 GHz and probably has only 64 cores enabled. Edit: Each core will have double the FP throughput per cycle.

Apparently the onboard 16 GB can deliver over 400 GB/s versus the 42 GB/s a Skylake with 2 channel DDR4-3200 gets.

If we consider a 4 GHz Skylake, we have roughly 10.4 times the CPU throughput, but we'll also have about 10 times the bandwidth.

So it might not be so bad.
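As a rough sanity check on that conclusion, here is a sketch using only the numbers quoted in this post (400 GB/s, 42 GB/s, and the 10.4x compute figure). The variable names are invented for illustration, and "over 400 GB/s" is treated as exactly 400, so the result is on the conservative side.

```python
# All inputs are the figures quoted in the post, not measurements.
knl_bandwidth_gbs = 400.0      # onboard 16 GB memory, "over 400 GB/s"
skylake_bandwidth_gbs = 42.0   # 2-channel DDR4-3200
compute_ratio = 10.4           # KNL vs a 4 GHz Skylake, as stated above

bandwidth_ratio = knl_bandwidth_gbs / skylake_bandwidth_gbs
# Bandwidth available per unit of compute, relative to Skylake.
# A value near 1.0 means KNL is about as memory-starved as Skylake, no worse.
balance = bandwidth_ratio / compute_ratio

print(f"bandwidth ratio: {bandwidth_ratio:.2f}x")
print(f"bandwidth per unit of compute vs Skylake: {balance:.2f}")
```

On these figures the balance comes out slightly below 1, so KNL would be a little more bandwidth-limited than Skylake per FLOP, but nowhere near the 6x-vs-16x picture.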

Edit: Updated to reflect comments from ldesnogu

Last fiddled with by Mark Rose on 2016-09-06 at 20:30

2016-09-06, 20:10   #175
ldesnogu
Jan 2008
France
1000100110₂ Posts

Skylake can do 16 DP FLOPs/cycle. So 4 cores at 4 GHz will give 256 GFLOPs/s.

KNL is 32 DP FLOPs/cycle. So 64 cores at 1.3 GHz will give 2662 GFLOPs/s. That's more than 10 times a 4-core Skylake, not 5.
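The peak figures above are just cores times clock times FLOPs per cycle. A minimal sketch, with `peak_gflops` as a made-up helper name and the inputs taken from this post (the 1.3 GHz clock and 64-core count are the guesses from earlier in the thread):

```python
def peak_gflops(cores, ghz, dp_flops_per_cycle):
    """Theoretical peak double-precision GFLOPs/s: cores x GHz x FLOPs/cycle."""
    return cores * ghz * dp_flops_per_cycle


skylake = peak_gflops(cores=4, ghz=4.0, dp_flops_per_cycle=16)
knl = peak_gflops(cores=64, ghz=1.3, dp_flops_per_cycle=32)

print(f"Skylake: {skylake:.1f} GFLOPs/s")
print(f"KNL:     {knl:.1f} GFLOPs/s")
print(f"ratio:   {knl / skylake:.1f}x")
```

These are theoretical peaks only; sustained throughput on a memory-bound FFT will be lower on both chips.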

2016-09-06, 20:20   #176
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
16FE₁₆ Posts

Quote:
Originally Posted by Prime95
Intel says the 16GB HBM memory has 4x the bandwidth of the 3-channel DDR4 RAM. FFT data will easily fit in 16GB, so the good news is that we should be able to run entirely from HBM at all times. Compared to a 4-core Skylake with 2-channel DDR4 RAM, the KNL system will have 4x (HBM vs DDR4) times 1.5x (3-channel vs 2-channel), or 6x the memory bandwidth. Unfortunately, we have 6x the memory bandwidth feeding 16x the number of cores!

A Skylake system is already hurting for memory bandwidth; the KNL is going to be downright starving. We're looking at roughly 33% FPU utilization.

I do not have any good ideas for reducing memory bandwidth requirements any further. The only option may be to run trial factoring (TF) hyperthreaded on the 64 cores alongside a 64-core FFT.

That's 16x the cores, but each will be a fair bit slower, at least before AVX-512. Is there any point in developing AVX-512 code while the current systems are so memory bound? Can we actually expect any improvement?

Didn't you mention a while back that it may be possible to reduce memory bandwidth by using integer FFTs rather than floating point? I may be misremembering.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.