mersenneforum.org Well that was a bit odd

2015-11-14, 08:03   #1
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

2·7·461 Posts

Well that was a bit odd

A few days ago I accidentally allocated seventy gigabytes from a process running on my 64GB 48-core Opteron machine. The process died and I felt nothing of it. Except that I noticed that other processes which had been running at the time were now running bizarrely slowly: I was getting 300ms rather than 65ms iteration time from mprime, and some gnfs-lasieve4I14e jobs were suggesting they would take a week to run whilst ones on adjacent ranges were expecting less than 24 hours.

So far so odd; muttering something about NUMA, I killed the mprime process and restarted it. And it continued to have 300ms iteration times. Everything was fixed by a reboot, but I don't understand how forcing the machine into swap would have these persistent bad consequences.
2015-11-14, 08:17   #2
Dubslow
Basketry That Evening!

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

[...] /proc/sys/vm/drop_caches
bill@Gravemind⌚0214 ~/bin ∰∂ man swapoff
# http://linux-mm.org/Drop_Caches
2015-11-14, 19:52   #3
Serpentine Vermin Jar

Jul 2014

3·11·101 Posts

Quote:
 Originally Posted by fivemack A few days ago I accidentally allocated seventy gigabytes from a process running on my 64GB 48-core Opteron machine. The process died and I felt nothing of it. Except that I noticed that other processes which had been running at the time were now running bizarrely slowly - I was getting 300ms rather than 65ms iteration time from mprime, some gnfs-lasieve4I14e jobs were suggesting they would take a week to run whilst ones on adjacent ranges were expecting less than 24 hours. So far so odd; muttering something about NUMA I killed the mprime process and restarted. And it continued to have 300ms iteration times. Everything was fixed by a reboot, but I don't understand how forcing the machine into swap would have these persistent bad consequences.
Maybe the massive disk swapping was creating a lot of interrupts that basically nuked any CPU-intensive tasks?

I know this is anecdotal, but I've seen instances where a full AV scan (during the middle of the workday... who would schedule that by a policy?) made even simple things like scrolling a text window go horribly slow. Strange interactions, my friend... strange interactions.
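One way to test the swap-storm theory after the fact is to check whether the machine is still swapping at all. A minimal Python sketch, assuming a Linux kernel that exposes the cumulative `pswpin`/`pswpout` counters (pages swapped in and out since boot) in `/proc/vmstat`; the function names are made up for illustration:

```python
import time


def parse_vmstat(text):
    """Parse the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, seconds):
    """Pages swapped in/out per second between two vmstat snapshots."""
    return {
        key: (after.get(key, 0) - before.get(key, 0)) / seconds
        for key in ("pswpin", "pswpout")
    }


def sample_swap_rates(seconds=5.0, path="/proc/vmstat"):
    """Take two snapshots `seconds` apart and return the swap rates."""
    with open(path) as f:
        first = parse_vmstat(f.read())
    time.sleep(seconds)
    with open(path) as f:
        second = parse_vmstat(f.read())
    return swap_rates(first, second, seconds)
```

If `sample_swap_rates()` on the slow machine showed both rates near zero, ongoing swap I/O (and the interrupts it generates) would be an unlikely explanation for a slowdown that persists after the memory pressure is gone.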

2015-11-14, 23:44   #4
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

2·7·461 Posts

Quote:
 Originally Posted by Madpoo Maybe the massive disk swapping was creating a lot of interrupts that basically nuked any CPU intensive tasks? I know this is anecdotal, but I've seen instances where a full AV scan (during the middle of the workday... who would schedule that by a policy?) made even simple things like scrolling a text window go horribly slow. Strange interactions, my friend... strange interactions.
I can entirely understand massive swapping making everything very slow while it's happening.

What I don't understand is why the jobs would still be slow after the memory pressure had gone away: surely once their working set is swapped back in, it's all going to be fine. I guess it's possible that something in the OS is insufficiently NUMA-aware, and the pages in the working set were reloaded into physical memory which wasn't directly attached to the core running the process, but that's so essential a thing for an OS to do that I really wouldn't expect contemporary Linux not to do it.
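The NUMA hypothesis can be checked per process. On Linux kernels built with NUMA support, `/proc/<pid>/numa_maps` lists each mapping of a process with fields such as `N0=24 N1=2` (pages resident on each node) and `kernelpagesize_kB=4`. A minimal sketch that totals those fields into kilobytes per node; the function names are hypothetical, and the 4 kB fallback is an assumption for lines that lack the page-size field:

```python
import re
from collections import Counter

# Matches per-node page counts such as "N0=24" in a numa_maps line.
NODE_FIELD = re.compile(r"N(\d+)=(\d+)")


def kb_per_node(numa_maps_text):
    """Sum resident memory (kB) per NUMA node from /proc/<pid>/numa_maps text."""
    totals = Counter()
    for line in numa_maps_text.splitlines():
        fields = line.split()
        page_kb = 4  # assumption: default to 4 kB pages if the field is absent
        for field in fields:
            if field.startswith("kernelpagesize_kB="):
                page_kb = int(field.split("=", 1)[1])
        for field in fields:
            match = NODE_FIELD.fullmatch(field)
            if match:
                totals[int(match.group(1))] += int(match.group(2)) * page_kb
    return dict(totals)


def numa_report(pid):
    """Read numa_maps for a live process and summarise it per node."""
    with open(f"/proc/{pid}/numa_maps") as f:
        return kb_per_node(f.read())
```

Comparing the result with the node that owns the cores the process runs on (as listed by `numactl --hardware` or under `/sys/devices/system/node/`) would show whether the reloaded working set ended up on remote memory.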

2015-11-14, 23:59   #5
Xyzzy

Aug 2002

3·2,837 Posts

The SysRq key might have been useful. Even if it wasn't, it would be interesting!
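The SysRq functions do not strictly need the physical key: on Linux, root can write the same command letters to `/proc/sysrq-trigger`, and the output lands in the kernel log (readable with `dmesg`). A minimal sketch, assuming root and a kernel built with `CONFIG_MAGIC_SYSRQ`; the helper name and the allow-list of read-only letters are illustrative choices, deliberately excluding destructive commands such as `b` (reboot):

```python
# Read-only SysRq commands this helper is willing to send.
SAFE_SYSRQ = {
    "m": "dump memory info",
    "t": "dump task list",
    "w": "dump blocked tasks",
}


def sysrq(letter, trigger_path="/proc/sysrq-trigger"):
    """Send one read-only SysRq command; its output goes to the kernel log."""
    if letter not in SAFE_SYSRQ:
        raise ValueError(f"refusing SysRq {letter!r}: not in the read-only allow-list")
    with open(trigger_path, "w") as f:
        f.write(letter)
    return SAFE_SYSRQ[letter]
```

Run over SSH as `sysrq("m")`, this would capture a memory dump without a trip to the outbuilding. The `/proc/sys/kernel/sysrq` setting only governs the keyboard route; writes to `/proc/sysrq-trigger` are allowed for a privileged user regardless.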
2015-11-15, 00:20   #6
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

14466₈ Posts

http://blog.jcole.us/2010/09/28/mysq...-architecture/ is the post I remember reading about this class of mystery. I clearly should have poked around in the bits of /proc which let root see something like the virtual-to-physical mapping for a given process.

The problem with sysrq is my very limited desire to go out to the outbuilding in the rain to press a physical key on a physical keyboard (a physical keyboard which is covered with sticky aphid ooze, because a plant grew into the outbuilding and aphids started to eat it).

Last fiddled with by fivemack on 2015-11-15 at 00:23
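For the virtual-to-physical mapping, Linux provides `/proc/<pid>/pagemap`: one 64-bit entry per virtual page, where bit 63 means "present in RAM" (bits 0-54 then hold the page frame number) and bit 62 means "swapped" (bits 0-4 hold the swap type, bits 5-54 the swap offset). A minimal decoding sketch; the function names are illustrative, and since Linux 4.0 only a reader with `CAP_SYS_ADMIN` gets real frame numbers (newer kernels report the PFN as zero otherwise):

```python
import os
import struct

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")
PFN_MASK = (1 << 55) - 1  # bits 0-54


def decode_pagemap_entry(entry):
    """Decode one 64-bit pagemap entry into a dict."""
    present = bool((entry >> 63) & 1)
    swapped = bool((entry >> 62) & 1)
    info = {"present": present, "swapped": swapped}
    if present:
        info["pfn"] = entry & PFN_MASK  # physical frame number (0 if unprivileged)
    elif swapped:
        info["swap_type"] = entry & 0x1F  # bits 0-4
        info["swap_offset"] = (entry & PFN_MASK) >> 5  # bits 5-54
    return info


def lookup(pid, vaddr):
    """Decode the pagemap entry covering virtual address `vaddr` in `pid`."""
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // PAGE_SIZE) * 8)  # one 8-byte entry per page
        (entry,) = struct.unpack("=Q", f.read(8))
    return decode_pagemap_entry(entry)
```

Turning a PFN into a NUMA node needs a further lookup against the physical ranges the kernel assigns to each node, which this sketch leaves out.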
2015-11-15, 00:50   #7
Xyzzy

Aug 2002

3×2,837 Posts

A few weeks ago we lost the wireless connection to our (headless) NUC. We decided to plug our desktop's keyboard into the NUC and do the SysRq reboot blind. The NUC sits on top of the desktop. Both are running "long-term" jobs. The keyboard and mouse plugs are identical in appearance, and there is very little room behind our desktop.

Of course, we unplugged the mouse connector from the desktop and plugged it into the NUC. When we did the magic reboot sequence, our desktop rebooted. Thankfully, there are savefiles for the "long-term" jobs and our browser remembers tabs. But we lost our 50+ day uptime.
2015-11-15, 00:53   #8
Xyzzy

Aug 2002

3·2,837 Posts

We don't run a swap file or partition. Occasionally we overallocate memory and the OOM thingie zaps some stuff. With as much memory as you have, do you need swap?
2015-11-15, 07:16   #9
Dubslow

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

7221₁₀ Posts

Quote:
 Originally Posted by fivemack http://blog.jcole.us/2010/09/28/mysq...-architecture/ is the post I remember reading about this class of mystery. I clearly should have poked around in the bits of /proc which let root see something like the virtual-to-physical mapping for a given process. The problem with sysrq is my very limited desire to go out to the outbuilding in the rain to press a physical key on a physical keyboard (a physical keyboard which is covered with sticky aphid ooze, because a plant grew into the outbuilding and aphids started to eat it)
In my experience, the kernel doesn't necessarily immediately deswap what it can when the pressure's gone, hence my little shell script above to force it to clear the swap.
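A common way to force swapped-out pages back into RAM is to cycle swap: `swapoff -a` makes the kernel read everything in active swap back into memory, and `swapon -a` re-enables the swap areas listed in `/etc/fstab` (a swap file added by hand with `swapon` would need re-adding by hand). A hedged Python sketch, assuming root and the Linux `/proc/meminfo` fields `SwapTotal`, `SwapFree` and `MemAvailable`; it also reads `VmSwap` from `/proc/<pid>/status` to show which processes still have pages out. The 1 GiB margin and all function names are arbitrary choices for illustration, and it defaults to a dry run:

```python
import subprocess


def parse_kb_fields(text):
    """Parse 'Key:  value [kB]' lines (meminfo, status) into ints; skip non-numeric ones."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def vmswap_kb(status_text):
    """VmSwap (kB) from /proc/<pid>/status text; 0 if the field is absent."""
    return parse_kb_fields(status_text).get("VmSwap", 0)


def can_drain_swap(meminfo, margin_kb=1 << 20):
    """True if what is in swap fits into MemAvailable with `margin_kb` to spare."""
    in_swap = meminfo["SwapTotal"] - meminfo["SwapFree"]
    return in_swap + margin_kb <= meminfo["MemAvailable"]


def drain_swap(dry_run=True):
    """Cycle swap off and on so swapped pages are read back into RAM (needs root)."""
    with open("/proc/meminfo") as f:
        info = parse_kb_fields(f.read())
    if not can_drain_swap(info):
        raise RuntimeError("not enough available RAM to absorb what is in swap")
    for cmd in (["swapoff", "-a"], ["swapon", "-a"]):
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

The free-memory check matters because `swapoff` on a machine that cannot hold the swapped data would simply recreate the memory pressure that caused the trouble.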

2015-11-15, 09:15   #10
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

2×7×461 Posts

Quote:
 Originally Posted by Xyzzy We don't run a swap file or partition. Occasionally we overallocate memory and the OOM thingie zaps some stuff. With as much memory as you have, do you need swap?
I have occasionally stuck on a 50GB swap partition ('sudo swapon /scratch/50GB') so that I could run a 60GB job by just 'kill -STOP'-ing the rest of the activity on the machine, letting it swap out, running the big job, and then 'kill -CONT'-ing everything and letting it swap back in.
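The stop, swap out, run, continue routine can be scripted. A minimal sketch using Python's `os.kill` with `signal.SIGSTOP` and `signal.SIGCONT`; the helper names are hypothetical, and it assumes you have permission to signal the target processes. The `finally` clause makes sure every paused job gets `SIGCONT` even if the big job fails:

```python
import os
import signal
import subprocess
from contextlib import contextmanager


@contextmanager
def paused(pids):
    """SIGSTOP every pid on entry and SIGCONT it on exit, even on error."""
    stopped = []
    try:
        for pid in pids:
            os.kill(pid, signal.SIGSTOP)
            stopped.append(pid)
        yield stopped
    finally:
        for pid in stopped:
            try:
                os.kill(pid, signal.SIGCONT)
            except ProcessLookupError:
                pass  # the process exited while it was stopped


def run_while_paused(pids, command):
    """Run `command` to completion with `pids` stopped; return its exit code."""
    with paused(pids):
        return subprocess.run(command).returncode
```

Stopped processes are not killed, so their memory stays allocated; the kernel is free to swap it out while the big job runs, and it is faulted back in after `SIGCONT`.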

But mostly I have swap because installers provide swap by default and I'm never quite confident with the LVM runes required to get rid of it.

The OOM killer used to be rather indiscriminate but does now seem to pick the obvious target and zap it.
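Which process the OOM killer would treat as the obvious target can be previewed: Linux exposes its current badness score as `/proc/<pid>/oom_score` (higher means killed first), and each process's `oom_score_adj` (range -1000 to 1000) shifts that score. A minimal sketch listing the top candidates; the function name is hypothetical:

```python
import os


def oom_candidates(proc_root="/proc", top=5):
    """Return (oom_score, pid, command) for the processes the OOM killer would pick first."""
    scores = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "oom_score")) as f:
                score = int(f.read())
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:
            continue  # the process exited, or is unreadable, while we were scanning
        scores.append((score, int(entry), name))
    return sorted(scores, reverse=True)[:top]
```

On a swapless machine like Xyzzy's, a runaway allocation will usually top this list by itself. Raising a less important job's `oom_score_adj` is a way to nominate it explicitly, while lowering a value normally requires root.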

All times are UTC.