#177
Sep 2016
2×5×37 Posts
Quote:
IOW, I doubt you can just throw old code at KNL with some tweaks and expect it to perform. I imagine most of the stuff would need to be redesigned from the bottom up. But of course the latter option is likely prohibitive in development costs.

Last fiddled with by Mysticial on 2016-09-19 at 15:45
#178
Sep 2016
19 Posts |
Quote:
Almost every other supercomputer center is going to shy away from KNL, mainly because of the overhead cost of optimizing code to work on a very narrow range of specific hardware. To answer your optimization question: my co-worker with a PhD in computer science, who has been optimizing and installing code on our systems for decades, spent several weeks on a couple of applications and was quite disappointed with the results.
#179
Jan 2008
France
2²×149 Posts
Quote:
FWIW, I strongly believe it's a dev/support hell to extract a lot of perf from KNL. But I also believe that a single project (such as mprime or mlucas) can get the most out of it (at a large cost in dev).
#180 |
"Ben"
Feb 2007
7×13×41 Posts |
https://xkcd.com/1205/
Edit: Is it not also a dev/support hell to extract a lot of perf from CUDA? I would argue the performance per unit of effort is higher with KNL... but maybe I'm biased.

Last fiddled with by bsquared on 2016-09-19 at 17:15
#181
"David"
Jul 2015
Ohio
11·47 Posts
Quote:
Would I buy a facility full of just KNL chips? Not for a generic workload. If I owned a shipping company, I also wouldn't buy an entire fleet of car-haulers and expect to take contracts for various types of shipping. But if my business is shipping cars and I rarely use a flatbed or box trailer, I just might.
#182
Sep 2016
370₁₀ Posts
Quote:
#183
Jan 2008
France
2²×149 Posts
Quote:
I just never bought the Intel marketing that wants you to believe that since it's x86 and comes with excellent dev tools, it will be a piece of cake.

@airsquirrels: yes and no. Data arrangement, preloads, decode limitations, etc. heavily depend on the particular micro-architecture and core/caches/RAM topology, and this tuning usually requires a lot more dev time than using new instructions, and is not easily applicable to a radically different CPU (such as the upcoming Xeon). But I agree with you, the work being done here is very interesting and will open the door for future AVX-512 implementations. I am just not (yet?) convinced this will prove good value for money/power.

I only wish I had more time to play with that beast, it looks so sexy...
#184
Undefined
"The unspeakable one"
Jun 2006
My evil lair
1101000011000₂ Posts
Quote:
#185
Sep 2016
19 Posts |
Quote:
Titan's entire compute is five-million-something CUDA cores. nVidia's aim is clear with things like NVLink. I honestly wouldn't be surprised if KNL goes the way of Itanium.
#186 |
∂2ω=0
Sep 2002
República de California
26753₈ Posts
Re. the "which was harder: CUDA or KNL-specific optimizations?" question, I refer the interested reader to Oliver Weihe's many-thousands-of-posts mfaktc dev thread ... yeah, cuda-dev was really easy! /sarc
In my case, after a few months of work to ||ize and many-thread-capabilize (new word!) my TF code using broadly supported and standardized pthreading primitives, my very first TF build on KNL yields a result that is quite good compared to current best-of-breed code/gpu combos, and the prophets of doom are already crowing loudly. Heck, they may prove correct, but it is wildly premature to make such blanket prophecies, IMO.

=================

Re. the 250,000x performance hit I reported yesterday for first-look TF-using-AVX-vector-float-math, I think I know what's going on ... the added clue lay in going back and carefully examining the onscreen output of my dismally slow TF run of MM31 to 50 bits. That run should have caught the smallest factor, with k = 68745, but didn't.

Further comparative examination of the int64 and float-based modpow routines shows the former properly having multithread support - in terms of the one-time-init code section of same alloc'ing nthreads times as much local memory as needed by a single thread and then pointing each user thread to its own chunk of that as it comes in - but the float-based routines lack this. I.e. in || mode we have multiple threads reading from and writing to the same block of local memory. The reads are perhaps not so bad, but the write collisions would definitely seem to account for the symptoms in question, that is, massive slowdown and corrupted results. Why didn't the same run fail in the self-tests which get done prior to the TFing? Because those are done 1-threaded.

So when I ||ized my factoring code last year, it seems I only added || support for the int64-based modpow routines and left the floating-point ones for later. Later being now.

Last fiddled with by ewmayer on 2016-09-20 at 00:31
#187
Serpentine Vermin Jar
Jul 2014
5·677 Posts
Quote:
Even though a faster 4+ socket system in a 2U form factor could run circles around a single 1U KNL box, the price of a single such unit is far in excess of what any of us common folks could absorb. For supercomputing needs, having a 4-socket system in a small space is great for power density, and when you're building an HPC system you often start out with what your needs are and then work backwards to the price. Sweet work if you can get it.