mersenneforum.org  

Old 2016-09-01, 03:27   #144
Phi7210
 
Sep 2016

1₁₀ Posts

Quote:
Originally Posted by Madpoo View Post
That's correct... that's due to the x86 compatibility baked into the Atom cores and they have the full complement of AVX, SSE2, etc. support.

Like I mentioned though, running that way it's basically no different than a bunch of slow (1.3 GHz, in this case) cores.
The benchmarks on the pre-production Xeon Phi 7290 indicate that with a scalar workload (i.e. regular x86 software benchmarks, multi-threaded but not vectorized), it's about 3 times faster than an Intel Xeon E5-2697 v4.

While most of the folks in this thread seem intent on taking advantage of AVX-512 and vectorized code (which would indeed yield the max throughput from Xeon Phi), my personal interest is simply making use of all 256 of the Phi's hardware threads in a straightforward way, with conventional multi-threaded Java.
Old 2016-09-01, 16:04   #145
tServo
 
"Marv"
May 2009
near the Tannhäuser Gate

2·7·47 Posts

Quote:
Originally Posted by airsquirrels View Post
Well just as soon as the other few participants chip in we can find out!
David,
How many participants do you reckon are needed?
Old 2016-09-01, 17:32   #146
airsquirrels
 
"David"
Jul 2015
Ohio

11×47 Posts

Quote:
Originally Posted by tServo View Post
David,
How many participants do you reckon are needed?
At last tally, we had the following people in for $500 each:

airsquirrels (david)
ewmayer (ernst)
ATH (andreas)
Madpoo (aaron)

If we could get another 4 ($4000 total) I can swing the difference.

I could also throw up a KickStarter/IndieGogo, or whichever platform people prefer, and we could see if a broader group wants to help advance the state of prime95, Mersenne location, etc. KNL is niche now, but getting a head start on AVX-512 will ultimately speed up the whole project.

Last fiddled with by airsquirrels on 2016-09-01 at 17:36
Old 2016-09-01, 20:18   #147
GP2
 
Sep 2003

2585₁₀ Posts

Quote:
Originally Posted by airsquirrels View Post
I could also throw up a KickStarter/IndieGogo, or whichever platform people prefer, and we could see if a broader group wants to help advance the state of prime95, Mersenne location, etc. KNL is niche now, but getting a head start on AVX-512 will ultimately speed up the whole project.
While I have not used it, I think GoFundMe would be a better fit than KickStarter/IndieGogo.

Kickstarter and IndieGogo tend to be for things like crowdfunding large-scale projects and products, for instance innovative tech gear, indie films, music albums, charitable causes, cultural projects, etc. There are often hundreds of contributors and the overwhelming majority do not know the project runners personally, or know about them beforehand.

On the other hand GoFundMe is for personal causes, usually financed by friends and family and acquaintances, and only occasionally some sympathetic stranger. For instance, funding a school trip, helping a bereaved family, paying for medical treatment.

One small drawback is that sites like this collect a fee, usually 5%.

Ernst mentioned PayPal or check, but not everyone trusts PayPal anymore, and not everyone has a supply of paper checks anymore, not to mention this isn't an option for anyone out of the US (cashing checks from other countries is difficult and costly and mostly impractical).

PS,
Right now the contributors include a small circle of developers who have their own projects that they want to try out, so they are motivated to move forward right away independently of Prime95. But many of us are basically solely interested in Prime95, and it does seem premature at least until the assembly language issues are determined to have been fully resolved and intentions have been clarified. If fundraising mentions Prime95, it creates expectations that development on it is ready to move forward at the present time, and it's just not clear that that's the case.
Old 2016-09-01, 20:29   #148
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

7543₁₀ Posts

Are there any alternative solutions to consider?

Maybe get access to the Intel tools and use their emulation software for development until Purley comes along in 2017? Or we might only need the emulation software if gcc / HJWASM generates AVX-512 code.
Old 2016-09-01, 22:04   #149
airsquirrels
 
"David"
Jul 2015
Ohio

1005₈ Posts

Quote:
Originally Posted by Prime95 View Post
Are there any alternative solutions to consider?

Maybe get access to the Intel tools and use their emulation software for development until Purley comes along in 2017? Or we might only need the emulation software if gcc / HJWASM generates AVX-512 code.
It is true that the AVX-512 work alone could be roughed in with a viable emulation tool and assembler; however, I personally avoid developing any performance software (Android, iOS, etc.) on emulators or simulators if at all possible.

I agree it may be misleading to mention Prime95 development. What will actually happen at least initially is much more likely to be Prime95 benchmarking, performance tuning, and other research of the effects of HBM, high core counts, etc. I am sure some of us also just want a chance to play with the bleeding edge of Intel's tech.

It is worth saying that while the bulk of the Prime95 work comes from the army of run-of-the-mill machines, there seem to be quite a few high-throughput users such as Madpoo, myself, etc. who are using fairly recent server/enterprise grade hardware and could reasonably get access to v5 Skylake Xeons yet this year.
Old 2016-09-01, 22:31   #150
airsquirrels
 
"David"
Jul 2015
Ohio

205₁₆ Posts

Ok, I set this up. If anyone has comments or wants anything changed let me know. I did mention mersenne.org/prime95 although hopefully not in a way that is misleading. I will also post this in a new thread if the folks here approve.

https://www.gofundme.com/KNL4NumberTheory

I also circulated this to a few good-willed people who may donate just to help.

As to credentials - if anyone here does not know or trust me to handle this for some reason PM me and I'll try to set your mind at ease. Otherwise I'm happy to let someone else orchestrate.

Finally - if anyone here donating wants to arrange another, less fee-filled way to fund this let me know.

Last fiddled with by airsquirrels on 2016-09-01 at 22:31
Old 2016-09-01, 23:14   #151
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

2·67·73 Posts

Quote:
Originally Posted by airsquirrels View Post
As to credentials - if anyone here does not know or trust me to handle this for some reason PM me and I'll try to set your mind at ease.
Somewhat strangely, sometimes trusted people are attacked in their actions.

I would like to give you a +1 as being the leader in this action.
Old 2016-09-02, 00:00   #152
tServo
 
"Marv"
May 2009
near the Tannhäuser Gate

2·7·47 Posts

Quote:
Originally Posted by airsquirrels View Post
As to credentials - if anyone here does not know or trust me to handle this for some reason PM me and I'll try to set your mind at ease. Otherwise I'm happy to let someone else orchestrate.

.
I applaud your efforts and am considering my level of commitment (that has nothing to do with trusting you, just with my schedule, etc.).

2 questions:

Have you considered the water-cooled system? I know it's more expensive, but considering that most folks' goal is to peg this thing to the max for hours, it may be worth it. I have slowly come around to LaurV's way of thinking about system cooling. Since I moved to central Illinois years ago, it seems the climate has changed from "midwestern corn belt" to "tropical rain forest." I simply can't run many machines during the summer anymore.

I know this is early, but what would be the logistics for actually using this system wrt distributing the available time? It would probably have to be single-user-threaded since the Phi cannot be shared reasonably. I'm not trying to put anyone on the spot here, but I'm just curious and don't want my expectations to exceed reality. I think a dialog on this topic would be healthy.

Last fiddled with by tServo on 2016-09-02 at 00:02
Old 2016-09-02, 00:24   #153
Mysticial
 
Sep 2016

14E₁₆ Posts

Quote:
Originally Posted by Prime95 View Post
Are there any alternative solutions to consider?

Maybe get access to the Intel tools and use their emulation software for development until Purley comes along in 2017? Or we might only need the emulation software if gcc / HJWASM generates AVX-512 code.

I've been following this thread for quite a while now and I see that it's looking more interesting.

Intel's emulation tools are actually quite good for correctness testing. I've been using them to test a lot of AVX512 intrinsic code that I've accumulated over the past 3 years since Intel announced AVX512. The catch of course is overhead of the emulation: about 100 - 1000x. So while you're not doing any performance testing through the emulator, it's sufficient to run all your unit tests.

If you assume standard desktop CPU models for Skylake Purley, I'm confident it's possible to write code that will be fairly close to optimal without actually having the hardware. Then when the hardware does come out, you can fine-tune it. But I can't say the same about Knights Landing. Based on the recently released literature, KNL's execution core is so drastically different from the usual desktop core that it will be difficult to write optimal code for it without the hardware.

For one, KNL's OOE reorder window is significantly smaller than the desktop chips. So the old trick of relying on the CPU's OOE to parallelize across loop iterations with long dependency chains probably won't work that well. Not to mention that the FMA latency is 6 cycles as opposed to only 5/4 on Haswell/Skylake. This is a problem I run into even on Haswell. I have loops where the dependency chain is too long even for Haswell to sufficiently parallelize across iterations, but 16 registers is not enough to unroll it so it doesn't need to reorder as much. And that's where HT bails me out. So for KNL, expect to really work all 32 of those registers and the 4-wide HT.
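To make the register-pressure argument concrete, here is a minimal portable C sketch. It assumes the common rule of thumb that the number of independent dependency chains needed to hide latency is roughly latency times issue rate, so a 6-cycle FMA issued at 2 per cycle wants about 12 chains. The names `chains_needed` and `dot4` are made up for illustration, and the scalar accumulators only stand in for vector registers; real code would hold one ZMM register per chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: independent FMA chains needed to keep the
   pipeline busy, using the rule of thumb latency * issue rate. */
unsigned chains_needed(unsigned fma_latency_cycles, unsigned fmas_per_cycle)
{
    return fma_latency_cycles * fmas_per_cycle;
}

/* Dot product unrolled with four independent accumulators, so that
   consecutive multiply-adds do not all wait on the same running sum.
   Four is only an illustration; the count is a tuning parameter. */
double dot4(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);

    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    /* Main loop: four separate dependency chains per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }

    /* Tail: fewer than four elements remain. */
    for (; i < n; ++i) {
        s0 += x[i] * y[i];
    }

    /* Combine the partial sums once, at the end. */
    return (s0 + s1) + (s2 + s3);
}
```

Under this model, 12 accumulators plus their operands would not fit in 16 vector registers, but they do fit in 32. Note that splitting the sum changes the order of floating-point additions, so results can differ in the last bits from a single-accumulator loop.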

Secondly, KNL has two VPUs for 2 FMAs/cycle throughput. But instruction decoding and dispatch is also only 2-wide. So if I'm interpreting the literature correctly, there will not be any "free" issue slots that can be used for loop counters and prefetching. So we might be entering the world of massive amounts of loop-unrolling.
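The unrolling pressure can be estimated with a deliberately crude model. The sketch below assumes every instruction costs exactly one issue slot and ignores everything else (memory stalls, fusion, the second hardware thread); `fma_slot_fraction` is a hypothetical name, not something from Intel's documentation.

```c
#include <assert.h>

/* Fraction of issue slots spent on FMAs when a loop body contains
   `fmas` FMA instructions plus `overhead` other instructions
   (loop counter, branch, prefetch, ...).
   Assumption: each instruction takes exactly one issue slot. */
double fma_slot_fraction(unsigned fmas, unsigned overhead)
{
    assert(fmas > 0);
    return (double)fmas / (double)(fmas + overhead);
}
```

With two overhead instructions per iteration, this model gives 50% of slots on FMAs for a body of 2 FMAs, 75% for 6 and 93.75% for 30, which is why heavy unrolling looks attractive when the front end is no wider than the FMA units.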

Back in 2014 when I was analyzing ICC's code generation for KNL, I noticed that it really liked to do redundant loads. For example, an untwiddled radix 2 butterfly might look like this in desktop code:

Code:
vmovapd zmm0, ZMMWORD PTR [rax]
vmovapd zmm1, ZMMWORD PTR [rbx]
vaddpd  zmm2, zmm0, zmm1
vsubpd  zmm1, zmm0, zmm1
But ICC likes to generate this for KNL instead:

Code:
vmovapd zmm0, ZMMWORD PTR [rax]
vaddpd  zmm2, zmm0, ZMMWORD PTR [rbx]
vsubpd  zmm1, zmm0, ZMMWORD PTR [rbx]
I've been wondering for a couple of years why it would do that. Now it's almost obvious: it's one less instruction, and there are no "free" issue slots on KNL. It almost makes me wonder if it's worth using a gather-prefetch to simultaneously fetch 16 cache lines in one instruction as opposed to 16 normal prefetches.
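For readers who don't read assembly, the operation both listings implement can be written as a plain C loop. This is only a scalar sketch with an invented name (`butterfly_radix2`); whether a compiler turns it into the two-load or the redundant-load form depends on the compiler and target flags, and is not something this code controls.

```c
#include <assert.h>
#include <stddef.h>

/* Untwiddled radix-2 butterfly over two arrays:
     sum[i]  = a[i] + b[i]
     diff[i] = a[i] - b[i]
   Inputs are read into locals before any store, so the outputs may
   alias the inputs (sum == a and diff == b gives an in-place pass). */
void butterfly_radix2(const double *a, const double *b,
                      double *sum, double *diff, size_t n)
{
    assert(a != NULL && b != NULL && sum != NULL && diff != NULL);

    for (size_t i = 0; i < n; ++i) {
        double x = a[i];   /* corresponds to the load from [rax] */
        double y = b[i];   /* corresponds to the load from [rbx] */
        sum[i]  = x + y;   /* vaddpd */
        diff[i] = x - y;   /* vsubpd */
    }
}
```

In the listings above, each ZMM register holds 8 doubles, so one pass of the assembly covers 8 iterations of this loop.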

Last fiddled with by Mysticial on 2016-09-02 at 01:20
Old 2016-09-02, 00:30   #154
GP2
 
Sep 2003

101000011001₂ Posts

Quote:
Originally Posted by airsquirrels View Post
One other advantage of GoFundMe is that you receive all donations even if the goal in dollars was not reached, whereas KickStarter/Indiegogo cancel and refund in that case.

Quote:
Originally Posted by airsquirrels View Post
As to credentials - if anyone here does not know or trust me to handle this for some reason
Not an issue.

Quote:
Originally Posted by airsquirrels View Post
I will also post this in a new thread if the folks here approve.
Since you've gone ahead with it, might as well make a sticky thread.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.