#122 |
"Curtis"
Feb 2005
Riverside, CA
562210 Posts |
Post deleted, mlucas author answered better than I did.
Last fiddled with by VBCurtis on 2016-09-16 at 04:36 |
#123 | |
Sep 2016
19 Posts |
![]() Quote:
As far as key issues go, use ssh -i ~/.ssh/id_rsa (or whatever key you created). The contents of your id_rsa.pub key need to be in the ~/.ssh/authorized_keys file of your login user on the remote machine. If you want to run 'ssh -vvvvv' and PM the output to me, I can take a look at it for you.

100 iterations of M77597293 with FFT length 4194304 = 4096 K
Res64: 8CC30E314BF3E556. AvgMaxErr = 0.293024554. MaxErr = 0.328125000. Program: E14.1
Res mod 2^36 = 5569242454
Res mod 2^35 - 1 = 22305398329
Res mod 2^36 - 1 = 64001568053
Clocks = 00:00:02.610
/ **************************************************************************** /
Done ...

Edit: Missed a 0

1000 iterations of M77597293 with FFT length 4194304 = 4096 K
Res64: 5F87421FA9DD8F1F. AvgMaxErr = 0.292703043. MaxErr = 0.343750000. Program: E14.1
Res mod 2^36 = 67274379039
Res mod 2^35 - 1 = 26302807323
Res mod 2^36 - 1 = 54919604018
Clocks = 00:00:23.097

Last fiddled with by xathor on 2016-09-16 at 05:13 |
|
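A quick sanity check on the output above: in both runs the "Res mod 2^36" line equals the low 36 bits of the Res64 value, and the Clocks line gives a per-iteration time once divided by the iteration count. The following is a minimal Python sketch of that arithmetic; the helper names are illustrative only and are not part of Mlucas.

```python
def low_bits(res64_hex: str, nbits: int) -> int:
    """Return the low `nbits` bits of a hexadecimal Res64 string."""
    return int(res64_hex, 16) & ((1 << nbits) - 1)


def clocks_to_seconds(clocks: str) -> float:
    """Convert an 'HH:MM:SS.mmm' Clocks value to seconds."""
    hours, minutes, seconds = clocks.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# (iterations, Res64, Res mod 2^36, Clocks) copied from the two runs posted above.
runs = [
    (100, "8CC30E314BF3E556", 5569242454, "00:00:02.610"),
    (1000, "5F87421FA9DD8F1F", 67274379039, "00:00:23.097"),
]

for iters, res64, res_mod_2_36, clocks in runs:
    consistent = low_bits(res64, 36) == res_mod_2_36
    ms_per_iter = 1000.0 * clocks_to_seconds(clocks) / iters
    print(f"{iters} iters: residues consistent={consistent}, {ms_per_iter:.2f} ms/iter")
```

Both runs pass the residue check, and the timings work out to roughly 26.1 ms/iter for the 100-iteration run and 23.1 ms/iter for the 1000-iteration run.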
#124 |
∂2ω=0
Sep 2002
República de California
1175510 Posts |
@xathor:
Thanks - still waiting to hear from the sysadmin before trying other ssh stuff. Good, you got a build - you def. want to use 1000 (or even 10000) iterations with multiple threads. You'll probably need the enhanced affinity-selection stuff I just added to my dev-branch code for decent || scaling, but it'll be interesting to see what the dumb-affinity mode of the current release does on KNL, anyway. |
#125 | |
Aug 2010
Republic of Belarus
2628 Posts |
![]() Quote:
Could you please do Trial Factoring? It would be very interesting to see the result. As far as I know, mprime uses AVX for TF jobs?! |
|
#126 |
"David"
Jul 2015
Ohio
11·47 Posts |
Sorry guys, I had some obligations yesterday evening that prevented me from debugging the SSH access. I should have that resolved today and the rest of the accounts set up. It sounds like we have another system we can test with now as well, which is great.
To clarify, I have not seen the thread flip flop issue manifest itself on KNL, at least with mprime workload. My summary of performance is that it is realistically a nice fast system, but not quite as game changing as we thought it might be. At least not without software work. |
#127 | |
Sep 2016
19 Posts |
![]() Quote:
Here is a Haswell (dual E5-2670v3 24c AVX2) for comparison:

1000 iterations of M77597293 with FFT length 4194304 = 4096 K
Res64: 5F87421FA9DD8F1F. AvgMaxErr = 0.292735259. MaxErr = 0.343750000. Program: E14.1
Res mod 2^36 = 67274379039
Res mod 2^35 - 1 = 26302807323
Res mod 2^36 - 1 = 54919604018
Clocks = 00:00:09.605
/ **************************************************************************** /
Done ...

Here is an Ivy Bridge (dual E5-2670v2 20c AVX) for comparison:

1000 iterations of M77597293 with FFT length 4194304 = 4096 K
Res64: 5F87421FA9DD8F1F. AvgMaxErr = 0.249028471. MaxErr = 0.312500000. Program: E14.1
Res mod 2^36 = 67274379039
Res mod 2^35 - 1 = 26302807323
Res mod 2^36 - 1 = 54919604018
Clocks = 00:00:08.712
/ **************************************************************************** /
Done ...

Last fiddled with by xathor on 2016-09-16 at 14:23 |
|
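To put the three 1000-iteration timings posted in this thread side by side, here is a small Python sketch that converts each Clocks value to milliseconds per iteration and to a ratio against the fastest run. The labels and function names are mine. The posts do not state how many threads each run used, so this is a comparison of raw wall-clock figures only, not a per-core measurement.

```python
ITERATIONS = 1000

# Wall-clock seconds for 1000 iterations of M77597293 at 4096K FFT,
# copied from the Clocks lines in posts #123 and #127.
timings = {
    "run from post #123": 23.097,
    "Haswell (dual E5-2670v3)": 9.605,
    "Ivy Bridge (dual E5-2670v2)": 8.712,
}


def ms_per_iteration(seconds: float, iterations: int = ITERATIONS) -> float:
    """Average time per iteration in milliseconds."""
    return 1000.0 * seconds / iterations


def relative_to_fastest(times: dict) -> dict:
    """Return each timing divided by the smallest timing in `times`."""
    fastest = min(times.values())
    return {name: t / fastest for name, t in times.items()}


ratios = relative_to_fastest(timings)
for name, seconds in timings.items():
    print(f"{name}: {ms_per_iteration(seconds):.3f} ms/iter, "
          f"{ratios[name]:.2f}x the fastest run")
```

On these figures the run from post #123 takes about 2.65 times as long as the Ivy Bridge run, and the Haswell run about 1.10 times as long.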
#128 | |
"Forget I exist"
Jul 2009
Dartmouth NS
2×3×23×61 Posts |
![]() Quote:
|
|
#129 | |
Serpentine Vermin Jar
Jul 2014
338510 Posts |
![]() Quote:
With Xeon Phi x200 you have the physical cores (64 on the lower-end models we're talking about). What you really have are 32 "tiles", each with 2 of those Atom cores, and each tile has its own L1/L2 cache that the 2 cores share. Meanwhile, each of those 64 cores has a pair of vector processing units (VPUs), which I think is where we're getting the "128 cores" notion.

In my mind I see that as analogous to the hyperthread integer pipeline in old Xeons... it's just more sophisticated now and can do floating point/AVX/SSE good things too. In theory (and hopefully in practice) it shouldn't matter which VPU on the core is handling the work, just as it doesn't currently matter which physical or HT core you're affining to on other chips, because it's really the same thing, just a different pipeline.

What *would* matter is how the CPUs map to the operating system. It might be better to have the 2 cores on the same tile, and the 4 VPUs on that same tile, working in the same thread (if it's a multithreaded worker). There's probably very little data the 4 threads would need to share with each other, but if, as you say, the CPU itself makes its own decisions on which VPU will handle things, at least if they're sharing an L2 cache then it shouldn't matter. For that, it would be best to turn off the all-to-all L2 consistency, since we'd be deliberately using affinity to keep core consistency during any operation.

FYI, each core can handle 4 threads, so it may appear to the OS as 256 cores, but there are really only 128 VPUs, which is what matters most for Mersenne prime hunting. I don't know what the capabilities of the 4 threads per core are... I'm guessing those are mainly integer, just like current hyperthreading. Useful for some things, not for others.

Making sure Prime95 worker threads are mapping properly to cores with their own VPU, and keeping them affined to it, seems crucial. In your testing, were you able to do that successfully? It seems like that would be very dependent on the program itself... if the OS is handling CPU loads or letting the CPU itself assign cores to requests, it's probably not doing as good a job as if you micromanaged that aspect. |
|
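The counts in the post above (32 tiles, 2 cores per tile, 2 VPUs and 4 hardware threads per core) can be worked through in a short Python sketch. The arithmetic comes straight from those figures. The mapping from logical CPU number to core and tile is a hypothetical numbering chosen purely for illustration; the real numbering depends on the OS and firmware and should be read from the system (for example with lscpu on Linux) before pinning anything.

```python
TILES = 32            # per the post above
CORES_PER_TILE = 2
VPUS_PER_CORE = 2
THREADS_PER_CORE = 4

CORES = TILES * CORES_PER_TILE


def totals() -> dict:
    """Totals implied by the per-tile and per-core counts."""
    return {
        "cores": CORES,
        "vpus": CORES * VPUS_PER_CORE,
        "logical_cpus": CORES * THREADS_PER_CORE,
    }


def locate(logical_cpu: int) -> tuple:
    """Map a logical CPU to (tile, core, hw_thread).

    ASSUMED numbering (hypothetical): logical CPU n is hardware thread
    n // CORES of core n % CORES, and cores 2k and 2k+1 share tile k.
    """
    core = logical_cpu % CORES
    hw_thread = logical_cpu // CORES
    return core // CORES_PER_TILE, core, hw_thread


def tile_cpus(tile: int) -> list:
    """All logical CPUs on one tile under the assumed numbering."""
    cores = range(tile * CORES_PER_TILE, (tile + 1) * CORES_PER_TILE)
    return sorted(c + t * CORES for c in cores for t in range(THREADS_PER_CORE))


print(totals())
print(tile_cpus(0))
```

Under that assumed numbering, a multithreaded worker meant to stay on one tile would be restricted to the list returned by tile_cpus; on Linux such a list could be passed to os.sched_setaffinity, though whether that helps is exactly the open question in the post.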
#130 |
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2×3²×7×53 Posts |
Not quite. It is just another CPU state: an extra set of RAX-R15, RIP, RFLAGS, etc. So all the caches and computing resources are completely shared. It is just the register set, flags and current state that are duplicated. Not too dissimilar from standard software threading, just done at the hardware level instead.
|
#131 | |
Sep 2016
19 Posts |
![]() Quote:
|
|
#132 |
Jan 2008
France
2²×149 Posts |
Knights Corner was round robin for sure. Are you sure that is still the case? In a way it would make sense, since the Silvermont cores used in KNL were not designed with HT in mind.
|