#144 | |
|
Sep 2016
110 Posts |
Quote:
While most of the folks in this thread seem to have the intent of taking advantage of AVX-512 and vectorized code (which would indeed yield the max throughput from Xeon Phi), my personal interest is simply making use of all of the Phi's 256 logical cores in a straightforward way, with conventional multi-threaded Java. |
|
|
|
|
|
|
#145 |
|
"Marv"
May 2009
near the Tannhäuser Gate
2·7·47 Posts |
|
|
|
|
|
|
#146 |
|
"David"
Jul 2015
Ohio
11×47 Posts |
At last tally, we had the following people in for $500:
airsquirrels (david)
ewmayer (ernst)
ATH (andreas)
Madpoo (aaron)

If we could get another 4 ($4000 total) I can swing the difference. I also could put up a Kickstarter/Indiegogo campaign, or use whichever platform people prefer, and we could see if a broader group wants to help advance the state of Prime95, Mersenne location, etc. KNL is niche now, but getting a head start on AVX-512 ultimately will speed up the whole project.
Last fiddled with by airsquirrels on 2016-09-01 at 17:36 |
|
|
|
|
|
#147 | |
|
Sep 2003
2585₁₀ Posts |
Quote:
Kickstarter and IndieGogo tend to be for things like crowdfunding large-scale projects and products, for instance innovative tech gear, indie films, music albums, charitable causes, cultural projects, etc. There are often hundreds of contributors and the overwhelming majority do not know the project runners personally, or know about them beforehand.

On the other hand GoFundMe is for personal causes, usually financed by friends and family and acquaintances, and only occasionally some sympathetic stranger. For instance, funding a school trip, helping a bereaved family, paying for medical treatment.

One small drawback is that sites like this collect a fee, usually 5%. Ernst mentioned PayPal or check, but not everyone trusts PayPal anymore, and not everyone has a supply of paper checks anymore, not to mention this isn't an option for anyone out of the US (cashing checks from other countries is difficult and costly and mostly impractical).

PS, Right now the contributors include a small circle of developers who have their own projects that they want to try out, so they are motivated to move forward right away independently of Prime95. But many of us are basically solely interested in Prime95, and it does seem premature at least until the assembly language issues are determined to have been fully resolved and intentions have been clarified. If fundraising mentions Prime95, it creates expectations that development on it is ready to move forward at the present time, and it's just not clear that that's the case. |
|
|
|
|
|
|
#148 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
7543₁₀ Posts |
Are there any alternative solutions to consider?
Maybe get access to the Intel tools and use their emulation software for development until Purley comes along in 2017? Or we might only need the emulation software if gcc / HJWASM generates AVX-512 code. |
|
|
|
|
|
#149 | |
|
"David"
Jul 2015
Ohio
1005₈ Posts |
Quote:
I agree it may be misleading to mention Prime95 development. What will actually happen, at least initially, is much more likely to be Prime95 benchmarking, performance tuning, and other research into the effects of HBM, high core counts, etc. I am sure some of us also just want a chance to play with the bleeding edge of Intel's tech.

It is worth saying that while the bulk of the Prime95 work comes from the army of run-of-the-mill machines, there seem to be quite a few high-throughput users such as Madpoo, myself, etc. who are using fairly recent server/enterprise grade hardware and could reasonably get access to v5 Skylake Xeons yet this year. |
|
|
|
|
|
|
#150 |
|
"David"
Jul 2015
Ohio
205₁₆ Posts |
Ok, I set this up. If anyone has comments or wants anything changed let me know. I did mention mersenne.org/prime95 although hopefully not in a way that is misleading. I will also post this in a new thread if the folks here approve.
https://www.gofundme.com/KNL4NumberTheory
I also circulated this to a few good-willed people who may donate just to help.
As to credentials - if anyone here does not know or trust me to handle this for some reason, PM me and I'll try to set your mind at ease. Otherwise I'm happy to let someone else orchestrate.
Finally - if anyone here donating wants to arrange another, less fee-filled way to fund this, let me know.
Last fiddled with by airsquirrels on 2016-09-01 at 22:31 |
|
|
|
|
|
#151 | |
|
If I May
"Chris Halsall"
Sep 2002
Barbados
2·67·73 Posts |
Quote:
I would like to give you a +1 as being the leader in this action. |
|
|
|
|
|
|
#152 | |
|
"Marv"
May 2009
near the Tannhäuser Gate
2·7·47 Posts |
Quote:
2 questions:
Have you considered the water cooled system? I know it's more expensive, but considering that most folks' goal is to peg this thing to the max for hours, it may be worth it. I have slowly come around to LaurV's way of thinking about system cooling. Since I moved to central Illinois years ago, it seems the climate has changed from "midwestern corn belt" to "tropical rain forest." I simply can't run many machines during the summer anymore.
I know this is early, but what would be the logistics for actually using this system wrt distributing the available time? It would probably have to be single-user-threaded since the Phi cannot be shared reasonably. I'm not trying to put anyone on the spot here, but I'm just curious and don't want my expectations to exceed reality. I think a dialog on this topic would be healthy.
Last fiddled with by tServo on 2016-09-02 at 00:02 |
|
|
|
|
|
|
#153 | |
|
Sep 2016
14E₁₆ Posts |
Quote:
I've been following this thread for quite a while now and I see that it's looking more interesting.

Intel's emulation tools are actually quite good for correctness testing. I've been using them to test a lot of AVX512 intrinsic code that I've accumulated over the past 3 years since Intel announced AVX512. The catch of course is the overhead of the emulation: about 100 - 1000x. So while you're not doing any performance testing through the emulator, it's sufficient to run all your unit tests.

If you assume standard desktop CPU models for Skylake Purley, I'm confident it's possible to write code that will be fairly close to optimal without actually having the hardware. Then when the hardware does come out, you can fine-tune it.

But I can't say the same about Knights Landing. Based on the recently released literature, KNL's execution core is so drastically different from the usual desktop core that it will be difficult to write optimal code for it without the hardware.

For one, KNL's OOE reorder window is significantly smaller than that of the desktop chips. So the old trick of relying on the CPU's OOE to parallelize across loop iterations with long dependency chains probably won't work that well. Not to mention that the FMA latency is 6 cycles as opposed to only 5/4 on Haswell/Skylake.

This is a problem I run into even on Haswell. I have loops where the dependency chain is too long even for Haswell to sufficiently parallelize across iterations, but 16 registers is not enough to unroll it so it doesn't need to reorder as much. And that's where HT bails me out. So for KNL, expect to really work all 32 of those registers and the 4-wide HT.

Secondly, KNL has two VPUs for 2 FMAs/cycle throughput. But instruction decoding and dispatch is also only 2-wide. So if I'm interpreting the literature correctly, there will not be any "free" issue slots that can be used for loop counters and prefetching. So we might be entering the world of massive amounts of loop-unrolling.
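To put rough numbers on the latency point above, here is a minimal back-of-the-envelope sketch. It assumes the usual latency × throughput rule: to keep every FMA unit busy each cycle, a core needs that many independent FMA chains in flight. The latencies and KNL's 2 FMAs/cycle are taken from the post; the 2 FMAs/cycle for Haswell/Skylake, the register counts, and the threads-per-core figures are assumptions added for illustration, and the function names are made up here.

```python
import math

# name -> (FMA latency in cycles, FMAs issued per cycle,
#          architectural vector registers, hardware threads per core)
# Latencies and KNL's issue rate come from the post; the remaining
# columns are assumptions for this illustration only.
CORES = {
    "Haswell": (5, 2, 16, 2),
    "Skylake": (4, 2, 16, 2),
    "KNL": (6, 2, 32, 4),
}


def chains_needed(latency, fmas_per_cycle):
    """Independent FMA chains that must be in flight to fill every issue slot."""
    return latency * fmas_per_cycle


def chains_per_thread(latency, fmas_per_cycle, threads):
    """Chains each hardware thread must supply if all threads share the work."""
    return math.ceil(chains_needed(latency, fmas_per_cycle) / threads)


for name, (latency, rate, regs, threads) in CORES.items():
    total = chains_needed(latency, rate)
    shared = chains_per_thread(latency, rate, threads)
    print(f"{name}: {total} chains on one thread ({regs} registers), "
          f"or {shared} per thread with {threads} threads")
```

Under these assumptions a single thread has to juggle about a dozen independent accumulators on KNL, which is one way to read the advice to lean on both the 32 registers and the 4-wide HT. It is only an estimate and says nothing about the reorder window or memory behaviour.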
Back in 2014 when I was analyzing ICC's code generation for KNL, I noticed that it really liked to do redundant loads. For example, an untwiddled radix 2 butterfly might look like this in desktop code:
Code:
vmovapd zmm0, ZMMWORD PTR [rax]
vmovapd zmm1, ZMMWORD PTR [rbx]
vaddpd zmm2, zmm0, zmm1
vsubpd zmm1, zmm0, zmm1
Code:
vmovapd zmm0, ZMMWORD PTR [rax]
vaddpd zmm2, zmm0, ZMMWORD PTR [rbx]
vsubpd zmm1, zmm0, ZMMWORD PTR [rbx]
Last fiddled with by Mysticial on 2016-09-02 at 01:20 |
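As a reading aid for the two listings in the post above, here is a toy sketch that computes the same untwiddled radix-2 butterfly (sum and difference of two vectors) and tallies what each listing costs. The instruction and memory-read counts are read directly off the listings. The issue-cycle figure assumes nothing matters except the 2-wide decode mentioned earlier in the post, which is a simplification, and all names here are invented for the example.

```python
def radix2_butterfly(a, b):
    """Untwiddled radix-2 butterfly: element-wise (a + b, a - b)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    sums = [x + y for x, y in zip(a, b)]
    diffs = [x - y for x, y in zip(a, b)]
    return sums, diffs


# Each entry is (mnemonic, memory reads), transcribed from the listings.
DESKTOP = [("vmovapd", 1), ("vmovapd", 1), ("vaddpd", 0), ("vsubpd", 0)]
REDUNDANT_LOAD = [("vmovapd", 1), ("vaddpd", 1), ("vsubpd", 1)]


def summarize(listing, decode_width=2):
    """Return (instructions, memory reads, average issue cycles).

    Issue cycles assume decode width is the only limit (toy model).
    """
    instructions = len(listing)
    memory_reads = sum(reads for _, reads in listing)
    return instructions, memory_reads, instructions / decode_width


for label, listing in (("desktop", DESKTOP), ("redundant-load", REDUNDANT_LOAD)):
    n, reads, cycles = summarize(listing)
    print(f"{label}: {n} instructions, {reads} memory reads, "
          f"{cycles} issue cycles per butterfly")
```

In this model the second listing trades one extra memory read for one fewer instruction, which would matter if decode and dispatch are the bottleneck. Whether that is the compiler's actual reasoning is not stated in the surviving text.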
|
|
|
|
|
|
#154 | ||
|
Sep 2003
101000011001₂ Posts |
Quote:
Quote:
Since you've gone ahead with it, might as well make a sticky thread. |
||
|
|
|