Old 2019-08-30, 21:02   #12
mackerel

Quote:
Originally Posted by Mysticial
Was Superfetch enabled? I've seen weird things with the memory compression that it does. But not like this.
I'll have to check after a visit to Google to find out what it is...

Quote:
48 GB is probably prohibitive. Is there any way you can produce just a mini-dump instead of the full dump? This is Windows, right?
When I right-clicked on the process it only gave one option, and I'm not aware of any other setting for it.

Win10 1903, just like the other system.

I've started cloning a Win7 Pro install from my only system that still has it. It would still be interesting to see regardless. Edit: Win7 won't boot, which isn't entirely surprising. So that's out.

Last fiddled with by mackerel on 2019-08-30 at 21:32
Old 2019-08-30, 22:22   #13
mackerel

Superfetch doesn't seem to exist in the version of Win10 I'm using; at least, the service isn't listed.

I tried running 10b again, and to my surprise this time it seemed to go without any weirdness. I ran it again afterwards and got the same time to within 0.01 seconds. Maybe Windows is doing something hidden in the background which isn't listed in any of the process reports. Another day I'll try to repeat that with the Xeon I had trouble with last time. The "fix" might be to leave the system on but idle for some time.

With an overclock, I think I'm in with a good chance of beating your 14 core time on hwbot :)
Old 2019-09-02, 07:34   #14
mackerel

I went back to the Xeon where I saw the same problem originally. I left it running for some hours on a different task before doing a 10b run, and got what looks like a clean time. It also near enough matched one of the runs from my previous session, so I didn't get any improvement.

Also...

7. 5min 24sec 197ms Mysticial 3604 MHz Intel Core i9 7940X AIO
8. 5min 50sec 295ms mackerel 3700 MHz Intel Core i9 7920X Air

I'm just under thermal throttling at this setting. A couple of cores get much hotter than others, and it looks like they drop their clocks a bit when they hit 109C. I didn't want to run in that state for any length of time, so I didn't leave it to continue at higher settings. Would y-cruncher be, let's say, significantly sub-optimal if some cores ran at different speeds?

At some point I'll get chilled water on it, and haven't decided if I should do a delid yet.

I've also wondered if there is software (preferably under Windows) that can work out core-to-core latency, as well as core-to-RAM latency on a particular IMC - individual cores, not aggregate results. I'm thinking that could be used to deduce the logical, and therefore physical, topology of the CPU, and it would be interesting to see whether the hot cores are hot because they're surrounded by other hot cores, or whether they indicate an uneven thermal interface. I think I saw software for Linux that measured core-to-core latency, but I forget its name.
Old 2019-09-02, 14:52   #15
Stargate38

Nowadays, it's called SysMain.
Old 2019-09-09, 07:03   #16
Mysticial

Quote:
Originally Posted by mackerel
Would y-cruncher be, let's say, significantly sub-optimal if some cores ran at different speeds?
In the power-of-two core-count case, the workload has near perfect static load-balancing. Thus, any form of "negative" imbalance may have a disproportionate effect on performance due to stragglers. "Negative" here would be something like a core that's slower than the rest, or other forms of jitter. "Positive" imbalance is the opposite and probably won't be felt at all.

In the non-power-of-two case, static load-balancing isn't possible, so y-cruncher will double up the task decomposition to spam the CPU with lots of smaller work units - thus relying on dynamic load-balancing. While it decreases efficiency, it also makes it more resistant to jitter and imbalanced hardware such as what you describe.
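In code, the contrast looks roughly like this (an untested sketch, not y-cruncher's actual scheduler - the over-decomposition factor of 4 and the atomic-counter dispatch are just illustrative):

Code:
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Static: one contiguous range per thread. A single slow core stalls the
// whole pass, since everyone else finishes and then waits on the straggler.
void run_static(std::size_t units, unsigned threads, void (*work)(std::size_t)) {
    std::vector<std::thread> pool;
    std::size_t per_thread = units / threads;          // assume it divides evenly
    for (unsigned t = 0; t < threads; t++)
        pool.emplace_back([=] {
            for (std::size_t i = t * per_thread; i < (t + 1) * per_thread; i++)
                work(i);
        });
    for (auto& th : pool) th.join();
}

// Dynamic: over-decompose into many small chunks and hand them out on
// demand. Faster cores simply grab more chunks, so jitter and slow cores
// hurt much less, at the cost of some scheduling overhead.
void run_dynamic(std::size_t units, unsigned threads, void (*work)(std::size_t)) {
    const std::size_t chunks = threads * 4;            // illustrative factor
    const std::size_t chunk_size = units / chunks;     // assume it divides evenly
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; t++)
        pool.emplace_back([&] {
            for (std::size_t c; (c = next.fetch_add(1)) < chunks; )
                for (std::size_t i = c * chunk_size; i < (c + 1) * chunk_size; i++)
                    work(i);
        });
    for (auto& th : pool) th.join();
}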


Quote:
I've also wondered if there exists software (preferably under Windows) that can work out core to core latency, as well as core to ram on a particular IMC. Individual cores, not aggregate results. I'm thinking that could be used to deduce the logical, and therefore physical topology of the CPU and it would be interesting to see if the hot cores are because they're surrounded by other hot cores, or if they're indicative of uneven thermal interface. I think I saw software for linux that measured core to core latency but forget its name.
I've tried this on Linux. Core-to-core latency is very noisy and is highly sensitive to where the synchronization variable is located in the L3. While there are measurable differences in latency between different pairs of cores, they aren't precise enough to deduce where on the die the core is. Likewise temperatures are even more noisy. By stressing cores individually, you can kind of tell which other cores the heat will spill to. But as in the case of the latencies, temperature data isn't precise enough.
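For reference, the kind of ping-pong test I'm describing is roughly this (untested sketch, Linux + GNU toolchain assumed; the core numbers are placeholders, and nothing here controls where the sync line lands in the L3, which is exactly the source of the noise):

Code:
#include <atomic>
#include <chrono>
#include <cstdio>
#include <pthread.h>
#include <sched.h>
#include <thread>

// Pin the calling thread to one core (Linux-specific).
static void pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main() {
    constexpr int kIters = 100000;
    const int core_a = 0, core_b = 1;   // placeholder pair of cores to test
    std::atomic<int> flag{0};

    // Responder: wait for 1, answer with 0.
    std::thread responder([&] {
        pin_to_core(core_b);
        for (int i = 0; i < kIters; i++) {
            while (flag.load(std::memory_order_acquire) != 1) { }
            flag.store(0, std::memory_order_release);
        }
    });

    // Initiator: send 1, wait for 0. Each iteration is one round trip,
    // so the total is divided by 2 to get a one-way hop.
    pin_to_core(core_a);
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < kIters; i++) {
        flag.store(1, std::memory_order_release);
        while (flag.load(std::memory_order_acquire) != 0) { }
    }
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                  std::chrono::steady_clock::now() - start).count();
    responder.join();

    std::printf("core %d <-> core %d: ~%.1f ns per one-way hop\n",
                core_a, core_b, ns / (2.0 * kIters));
}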

A method that does work (credit to a colleague of mine) is to examine the hardware counters that can be used to determine which direction coherency traffic travels. I don't remember the details, but by testing every pair of cores and examining the direction of the coherency traffic on the mesh, he was able to determine the physical location of every core. And by elimination, it also reveals where the dead cores are.

Last fiddled with by Mysticial on 2019-09-09 at 07:11
Old 2019-09-09, 11:49   #17
mackerel

Quote:
Originally Posted by Mysticial
I've tried this on Linux. Core-to-core latency is very noisy and is highly sensitive to where the synchronization variable is located in the L3. While there are measurable differences in latency between different pairs of cores, they aren't precise enough to deduce where on the die the core is. Likewise temperatures are even more noisy. By stressing cores individually, you can kind of tell which other cores the heat will spill to. But as in the case of the latencies, temperature data isn't precise enough.
https://pcper.com/2017/06/the-intel-...ssor-review/3/
I wonder how testing like the above was carried out. There seems to be some observable difference, which I was hoping would be sufficient for this exercise. Even if it's noisy, could it be worked around in some way? For example, when doing P95 benching, I run multiple times and take the best result - not an average. My thinking is that anything unwanted getting in the way will only make things slower; there shouldn't be any mechanism that makes it faster (which would be more interesting if it existed!). So with multiple runs, you should converge towards the best case.
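Something like this is what I have in mind for keeping only the best run (untested sketch; the loop is just a stand-in workload, replace it with the real benchmark):

Code:
#include <algorithm>
#include <chrono>
#include <cstdio>

// Time a benchmark `runs` times and keep only the fastest result, on the
// assumption that interference can only ever slow a run down.
template <class F>
double best_of(int runs, F&& benchmark) {
    double best = 1e300;
    for (int i = 0; i < runs; i++) {
        auto t0 = std::chrono::steady_clock::now();
        benchmark();
        std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
        best = std::min(best, dt.count());
    }
    return best;
}

int main() {
    volatile double sink = 0;   // stand-in workload; replace with the real thing
    double secs = best_of(10, [&] {
        for (int i = 0; i < 10000000; i++) sink = sink + i;
    });
    std::printf("best of 10 runs: %.6f s\n", secs);
}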

Edit: on a further look at the 3rd chart in the link, the differences don't appear constant. So while there is variation, it is probably more random than I was thinking.

Last fiddled with by mackerel on 2019-09-09 at 11:55
Old 2019-09-09, 16:31   #18
Mysticial

Quote:
Originally Posted by mackerel
https://pcper.com/2017/06/the-intel-...ssor-review/3/
Edit: on a further look at the 3rd chart in the link, the differences don't appear constant. So while there is variation, it is probably more random than I was thinking.
From what I've seen (and maybe it's just a symptom of the test), the variation is huge. Individual latencies have a standard deviation comparable to the mean. And the location of the sync variable in the L3 can affect latencies by as much as 2x.

The L3 is NUMA: it's split into slices, and cache lines are hashed across all of them. That means that if you're ping-ponging data between two cores, the coherency traffic will need to travel to the portion of the L3 where the data sits - IOW a 3rd core. So for every pair of cores, if you test random addresses and histogram the latencies, you will get up to N spikes, where N is the number of cores. If the test is correct and you get fewer than N spikes, it's usually because some of the spikes are overlapping. Conversely, if the die still has the cache slices of disabled cores enabled, you may get more than N spikes.

So what you're testing isn't just 2 cores, but 2 cores + the L3 of an unknown 3rd core. If you manage to get the sync variable in the L3 of one of the cores you're testing, there may be some asymmetry depending on which core it's on.

That said, if you completely isolate all this data, you'll have much more consistent results that may be enough to infer physical locations. But I haven't tried. And it's probably non-trivial to say the least.
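Extending the earlier ping-pong sketch, the random-address histogram would look roughly like this (untested; core numbers, line count, and the 5 ns bucket size are placeholders):

Code:
#include <atomic>
#include <chrono>
#include <cstdio>
#include <map>
#include <pthread.h>
#include <sched.h>
#include <thread>
#include <vector>

// Pin the calling thread to one core (Linux-specific).
static void pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

// One sync flag per 64-byte cache line, so each element can hash to a
// different L3 slice.
struct alignas(64) Line { std::atomic<int> flag{0}; };

// Short ping-pong between core_a and core_b using this particular line;
// returns approximate ns per one-way hop.
static double measure(int core_a, int core_b, std::atomic<int>& f) {
    constexpr int kIters = 2000;
    std::thread responder([&] {
        pin_to_core(core_b);
        for (int i = 0; i < kIters; i++) {
            while (f.load(std::memory_order_acquire) != 1) { }
            f.store(0, std::memory_order_release);
        }
    });
    pin_to_core(core_a);
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < kIters; i++) {
        f.store(1, std::memory_order_release);
        while (f.load(std::memory_order_acquire) != 0) { }
    }
    std::chrono::duration<double, std::nano> ns =
        std::chrono::steady_clock::now() - t0;
    responder.join();
    return ns.count() / (2.0 * kIters);
}

int main() {
    const int core_a = 0, core_b = 1;     // placeholder pair of cores
    std::vector<Line> lines(1024);        // 1024 distinct cache-line addresses
    std::map<int, int> histogram;         // 5 ns bucket -> sample count
    for (auto& line : lines)
        histogram[static_cast<int>(measure(core_a, core_b, line.flag)) / 5 * 5]++;
    for (const auto& [bucket, count] : histogram)
        std::printf("%4d ns: %d samples\n", bucket, count);
}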

Last fiddled with by Mysticial on 2019-09-09 at 16:34