[QUOTE=Madpoo;441142]That's correct... that's due to the x86 compatibility baked into the Atom cores and they have the full complement of AVX, SSE2, etc. support.
Like I mentioned though, running that way it's basically no different than a bunch of slow (1.3 GHz, in this case) cores. [/QUOTE] The benchmarks on the pre-production Xeon Phi 7290 indicate that with a scalar workload (i.e. regular x86 software benchmarks, multi-threaded but not vectorized), it's about 3 times faster than the Intel Xeon E5-2697 v4. While most of the folks in this thread seem intent on taking advantage of AVX-512 and vectorized code (which would indeed yield the max throughput from Xeon Phi), my personal interest is simply making use of all Phi's 256 cores in a straightforward way, with conventional multi-threaded Java. |
[QUOTE=airsquirrels;441174]Well just as soon as the other few participants chip in we can find out![/QUOTE]
David, How many participants do you reckon is needed? |
[QUOTE=tServo;441330]David,
How many participants do you reckon is needed?[/QUOTE] At last tally, we had the following people in for $500 each:
airsquirrels (david)
ewmayer (ernst)
ATH (andreas)
Madpoo (aaron)
If we could get another 4 ($4000 total) I can swing the difference. I also could throw up a KickStarter/IndieGogo, or whichever platform people prefer and we could see if a broader group wants to help advance the state of prime95, Mersenne location, etc. KNL is niche now, but getting a headstart on AVX512 ultimately will speed the whole project. |
[QUOTE=airsquirrels;441336]I also could throw up a KickStarter/IndieGogo, or whichever platform people prefer and we could see if a broader group wants to help advance the state of prime95, Mersenne location, etc. KNL is niche now, but getting a headstart on AVX512 ultimately will speed the whole project.[/QUOTE]
While I have not used it, I think GoFundMe would be a better fit than KickStarter/IndieGogo. Kickstarter and IndieGogo tend to be for things like crowdfunding large-scale projects and products, for instance innovative tech gear, indie films, music albums, charitable causes, cultural projects, etc. There are often hundreds of contributors and the overwhelming majority do not know the project runners personally, or know about them beforehand.

On the other hand GoFundMe is for personal causes, usually financed by friends and family and acquaintances, and only occasionally some sympathetic stranger. For instance, funding a school trip, helping a bereaved family, paying for medical treatment.

One small drawback is that sites like this collect a fee, usually 5%. Ernst mentioned PayPal or check, but not everyone trusts PayPal anymore, and not everyone has a supply of paper checks anymore, not to mention this isn't an option for anyone out of the US (cashing checks from other countries is difficult and costly and mostly impractical).

PS, Right now the contributors include a small circle of developers who have their own projects that they want to try out, so they are motivated to move forward right away independently of Prime95. But many of us are basically solely interested in Prime95, and it does seem premature at least until the assembly language issues are determined to have been fully resolved and intentions have been clarified. If fundraising mentions Prime95, it creates expectations that development on it is ready to move forward at the present time, and it's just not clear that that's the case. |
Are there any alternative solutions to consider?
Maybe get access to the Intel tools and use their emulation software for development until Purley comes along in 2017? Or we might only need the emulation software if gcc / HJWASM generates AVX-512 code. |
[QUOTE=Prime95;441351]Are there any alternative solutions to consider?
Maybe get access to the Intel tools and use their emulation software for development until Purley comes along in 2017? Or we might only need the emulation software if gcc / HJWASM generates AVX-512 code.[/QUOTE] It is true that just the AVX-512 work could be roughed in with a viable emulation tool and assembler; however, I personally avoid developing any performance software (Android, iOS, etc.) on emulators or simulators if at all possible. I agree it may be misleading to mention Prime95 development. What will actually happen, at least initially, is much more likely to be Prime95 benchmarking, performance tuning, and other research into the effects of HBM, high core counts, etc. I am sure some of us also just want a chance to play with the bleeding edge of Intel's tech. It is worth saying that while the bulk of the Prime95 work comes from the army of run-of-the-mill machines, there seem to be quite a few high-throughput users such as Madpoo, myself, etc. that are using fairly recent server/enterprise grade hardware and could reasonably get access to v5 Skylake Xeons yet this year. |
Ok, I set this up. If anyone has comments or wants anything changed let me know. I did mention mersenne.org/prime95 although hopefully not in a way that is misleading. I will also post this in a new thread if the folks here approve.
[url]https://www.gofundme.com/KNL4NumberTheory[/url] I also circulated this to a few good-willed people who may donate just to help. As to credentials - if anyone here does not know or trust me to handle this for some reason, PM me and I'll try to set your mind at ease. Otherwise I'm happy to let someone else orchestrate. Finally - if anyone here donating wants to arrange another, less fee-filled way to fund this, let me know. |
[QUOTE=airsquirrels;441364]As to credentials - if anyone here does not know or trust me to handle this for some reason PM me and I'll try to set your mind at ease.[/QUOTE]
Somewhat strangely, sometimes trusted people are attacked in their actions. I would like to give you a +1 as being the leader in this action. |
[QUOTE=airsquirrels;441364]
As to credentials - if anyone here does not know or trust me to handle this for some reason PM me and I'll try to set your mind at ease. Otherwise I'm happy to let someone else orchestrate. .[/QUOTE] I applaud your efforts and am considering my level of commitment (that has nothing to do with your trust, but just with my schedule, etc). 2 questions:

Have you considered the water cooled system? I know it's more expensive, but considering that most folks' goal is to peg this thing to the max for hours, it may be worth it. I have slowly come around to LaurV's way of thinking about system cooling. Since I moved to central Illinois years ago, it seems the climate has changed from "midwestern corn belt" to "tropical rain forest." I simply can't run many machines during the summer anymore.

I know this is early, but what would be the logistics for actually using this system wrt distributing the available time? It would probably have to be single-user-threaded since the Phi cannot be shared reasonably. I'm not trying to put anyone on the spot here but I'm just curious and don't want my expectations to exceed reality. I think a dialog on this topic would be healthy. |
[QUOTE=Prime95;441351]Are there any alternative solutions to consider?
Maybe get access to the Intel tools and use their emulation software for development until Purley comes along in 2017? Or we might only need the emulation software if gcc / HJWASM generates AVX-512 code.[/QUOTE] I've been following this thread for quite a while now and I see that it's looking more interesting.

Intel's emulation tools are actually quite good for correctness testing. I've been using them to test a lot of AVX512 intrinsic code that I've accumulated over the past 3 years since Intel announced AVX512. The catch of course is the overhead of the emulation: about 100 - 1000x. So while you're not doing any performance testing through the emulator, it's sufficient to run all your unit tests.

If you assume standard desktop CPU models for Skylake Purley, I'm confident it's possible to write code that will be fairly close to optimal without actually having the hardware. Then when the hardware does come out, you can fine-tune it.

But I can't say the same about Knights Landing. Based on the recently released literature, KNL's execution core is so drastically different from the usual desktop core that it will be difficult to write optimal code for it without the hardware.

For one, KNL's OOE reorder window is significantly smaller than that of the desktop chips. So the old trick of relying on the CPU's OOE to parallelize across loop iterations with long dependency chains probably won't work that well. Not to mention that the FMA latency is 6 cycles as opposed to only 5/4 on Haswell/Skylake. This is a problem I run into even on Haswell. I have loops where the dependency chain is too long even for Haswell to sufficiently parallelize across iterations, but 16 registers is not enough to unroll it so it doesn't need to reorder as much. And that's where HT bails me out. So for KNL, expect to really work all 32 of those registers and the 4-wide HT.

Secondly, KNL has two VPUs for 2 FMAs/cycle throughput. But instruction decoding and dispatch is also only 2-wide. 
So if I'm interpreting the literature correctly, there will not be any "free" issue slots that can be used for loop counters and prefetching. So we might be entering the world of massive amounts of loop-unrolling. Back in 2014 when I was analyzing ICC's code generation for KNL, I noticed that it really liked to do redundant loads. For example, an untwiddled radix 2 butterfly might look like this in desktop code: [CODE]
vmovapd zmm0, ZMMWORD PTR [rax]
vmovapd zmm1, ZMMWORD PTR [rbx]
vaddpd zmm2, zmm0, zmm1
vsubpd zmm1, zmm0, zmm1
[/CODE]But ICC likes to generate this for KNL instead: [CODE]
vmovapd zmm0, ZMMWORD PTR [rax]
vaddpd zmm2, zmm0, ZMMWORD PTR [rbx]
vsubpd zmm1, zmm0, ZMMWORD PTR [rbx]
[/CODE]I've been wondering for a couple years why it would do that. Now it's almost obvious: It's one less instruction, and there are no "free" issue slots on KNL. It almost makes me wonder if it's worth using a gather-prefetch to simultaneously fetch 16 cache lines in one instruction as opposed to 16 normal prefetches. |
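The arithmetic behind the post above can be roughed out in a few lines of Python. This is only a back-of-the-envelope sketch under a deliberately simplified model: the function names are made up for illustration, the latency and issue-width figures are the ones quoted in the post, and the 2 FMAs/cycle figure for Haswell and Skylake is an assumption not stated there. It ignores memory stalls, the reorder window, and everything else a real core does.

```python
import math


def chains_needed(fma_latency_cycles, fmas_per_cycle):
    """Independent FMA dependency chains needed to keep every FMA unit busy.

    Each chain can start a new FMA only once the previous one finishes,
    so latency * throughput chains must be in flight at the same time.
    """
    return fma_latency_cycles * fmas_per_cycle


def chains_per_thread(total_chains, hardware_threads):
    """Chains each hardware thread must supply if the load is split evenly."""
    return math.ceil(total_chains / hardware_threads)


def issue_cycles(instructions, issue_width):
    """Average cycles needed just to issue a block of instructions."""
    return instructions / issue_width


# Latencies quoted in the post; 2 FMAs/cycle on Haswell/Skylake is assumed.
for name, latency in [("KNL", 6), ("Haswell", 5), ("Skylake", 4)]:
    print(name, "needs", chains_needed(latency, 2), "independent FMA chains")

# On KNL the chains can be spread over the 4 hardware threads per core.
print("per KNL thread:", chains_per_thread(chains_needed(6, 2), 4))

# Radix-2 butterfly on a 2-wide issue core: 4 instructions with separate
# loads versus 3 instructions with the load folded into add/sub.
print("separate loads:", issue_cycles(4, 2), "cycles per butterfly")
print("folded loads:  ", issue_cycles(3, 2), "cycles per butterfly")
```

Under this model a single Haswell thread needs 10 chains in flight with only 16 registers to hold accumulators and operands, which matches the register pressure described in the post, while the folded-load butterfly saves half an issue cycle on a 2-wide core.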
[QUOTE=airsquirrels;441364]
[url]https://www.gofundme.com/KNL4NumberTheory[/url][/QUOTE] One other advantage of GoFundMe is that you receive all donations even if the goal in dollars was not reached, whereas KickStarter/Indiegogo cancel and refund in that case. [QUOTE=airsquirrels;441364]As to credentials - if anyone here does not know or trust me to handle this for some reason[/QUOTE] Not an issue. [QUOTE=airsquirrels;441364]I will also post this in a new thread if the folks here approve.[/QUOTE] Since you've gone ahead with it, might as well make a sticky thread. |