mersenneforum.org  

Old 2015-11-14, 08:03   #1
fivemack
(loop (#_fork))
 
 
Feb 2006
Cambridge, England

2·7·461 Posts
Well that was a bit odd

A few days ago I accidentally allocated seventy gigabytes from a process running on my 64GB 48-core Opteron machine. The process died and I thought nothing of it.

Except that I noticed that other processes which had been running at the time were now running bizarrely slowly - I was getting 300ms rather than 65ms iteration time from mprime, and some gnfs-lasieve4I14e jobs were suggesting they would take a week to run whilst ones on adjacent ranges were expecting less than 24 hours.

So far so odd; muttering something about NUMA I killed the mprime process and restarted. And it continued to have 300ms iteration times.

Everything was fixed by a reboot, but I don't understand how forcing the machine into swap would have these persistent bad consequences.
Old 2015-11-14, 08:17   #2
Dubslow
Basketry That Evening!
 
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

16065₈ Posts

I can't speak to the cause, but I have this little shell script lying around:


Code:
bill@Gravemind⌚0214 ~/bin ∰∂ which swap
/home/bill/bin/swap
bill@Gravemind⌚0214 ~/bin ∰∂ cat swap
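#!/bin/bash
# Drop the page cache first (via the cache script shown below) so more RAM is free.
sudo cache
# swapoff pulls every swapped-out page back into RAM; swapon then re-enables swap.
sudo swapoff -a
sudo swapon -a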
bill@Gravemind⌚0214 ~/bin ∰∂ which cache
/usr/bin/cache
bill@Gravemind⌚0214 ~/bin ∰∂ cat `!!`
cat `which cache`
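#! /bin/bash
# Value written to drop_caches: 1 = page cache, 2 = dentries and inodes, 3 = both.
# Defaults to 3; quoting "$1" makes the empty-argument test explicit.
if [ -z "$1" ]; then
	derp=3
else
	derp=$1
fi
# Flush dirty pages to disk first; drop_caches only discards clean pages.
sync
echo "$derp" > /proc/sys/vm/drop_caches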
bill@Gravemind⌚0214 ~/bin ∰∂ man swapoff
# http://linux-mm.org/Drop_Caches
Old 2015-11-14, 19:52   #3
Madpoo
Serpentine Vermin Jar
 
 
Jul 2014

3·11·101 Posts

Quote:
Originally Posted by fivemack
A few days ago I accidentally allocated seventy gigabytes from a process running on my 64GB 48-core Opteron machine. The process died and I thought nothing of it.

Except that I noticed that other processes which had been running at the time were now running bizarrely slowly - I was getting 300ms rather than 65ms iteration time from mprime, and some gnfs-lasieve4I14e jobs were suggesting they would take a week to run whilst ones on adjacent ranges were expecting less than 24 hours.

So far so odd; muttering something about NUMA I killed the mprime process and restarted. And it continued to have 300ms iteration times.

Everything was fixed by a reboot, but I don't understand how forcing the machine into swap would have these persistent bad consequences.
Maybe the massive disk swapping was creating a lot of interrupts that basically nuked any CPU intensive tasks?

I know this is anecdotal, but I've seen instances where a full AV scan (during the middle of the workday... who would schedule that by a policy?) made even simple things like scrolling a text window go horribly slow. Strange interactions, my friend... strange interactions.
Old 2015-11-14, 23:44   #4
fivemack
(loop (#_fork))
 
 
Feb 2006
Cambridge, England

2·7·461 Posts

Quote:
Originally Posted by Madpoo
Maybe the massive disk swapping was creating a lot of interrupts that basically nuked any CPU intensive tasks?

I know this is anecdotal, but I've seen instances where a full AV scan (during the middle of the workday... who would schedule that by a policy?) made even simple things like scrolling a text window go horribly slow. Strange interactions, my friend... strange interactions.
I can entirely understand massive swapping making everything very slow while it's happening.

What I don't understand is why the jobs would still be slow after the memory pressure had gone away - surely once their working set is swapped back in it's all going to be fine. I guess it's possible that something in the OS is insufficiently NUMA-aware, and the pages in the working set were reloaded into physical memory which wasn't directly attached to the core running the process - but that's so essential a thing for an OS to do that I really wouldn't expect contemporary Linux not to do it.
Old 2015-11-14, 23:59   #5
Xyzzy
 
 
Aug 2002

3·2,837 Posts

The SysRq key might have been useful.

Even if it wasn't, it would be interesting!

Old 2015-11-15, 00:20   #6
fivemack
(loop (#_fork))
 
 
Feb 2006
Cambridge, England

14466₈ Posts

http://blog.jcole.us/2010/09/28/mysq...-architecture/ is the post I remember reading about this class of mystery. I clearly should have poked around in the bits of /proc which let root see something like the virtual-to-physical mapping for a given process.
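For anyone wanting to do that poking, here is a minimal sketch (not from the original post). On Linux, /proc/<pid>/numa_maps lists each mapping of a process with tokens such as N0=1234, giving the number of pages resident on each NUMA node. The helper name numa_pages_per_node is made up for illustration; it simply sums those tokens per node, so a process whose pages have mostly landed on a remote node should stand out.

```shell
#!/bin/bash
# Sum the per-node page counts in numa_maps-formatted text read from stdin.
# Each mapping line carries tokens such as N0=1234 (pages resident on node 0).
# numa_pages_per_node is a hypothetical helper name, not an existing tool.
numa_pages_per_node() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^N[0-9]+=[0-9]+$/) {
                split($i, kv, "=")
                pages[kv[1]] += kv[2]
            }
    }
    END { for (node in pages) print node, pages[node] }' | sort
}

# Usage (root may be needed for processes you do not own); mprime is
# just the example process from this thread:
#   numa_pages_per_node < "/proc/$(pidof -s mprime)/numa_maps"
```

Comparing the output against the node whose cores the process actually runs on is left to the reader; this sketch only shows where the pages are.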

The problem with sysrq is my very limited desire to go out to the outbuilding in the rain to press a physical key on a physical keyboard (a physical keyboard which is covered with sticky aphid ooze, because a plant grew into the outbuilding and aphids started to eat it).

Last fiddled with by fivemack on 2015-11-15 at 00:23
Old 2015-11-15, 00:50   #7
Xyzzy
 
 
Aug 2002

3×2,837 Posts

A few weeks ago we lost the wireless connection to our (headless) NUC. We decided to plug our desktop's keyboard into the NUC and do the SysRq reboot blind. The NUC sits on top of the desktop. Both are running "long-term" jobs. The keyboard and mouse plugs are identical in appearance and there is very little room behind our desktop. Of course, we unplugged the mouse connector from the desktop and plugged it into the NUC. When we did the magic reboot sequence our desktop rebooted. Thankfully, there are savefiles for the "long-term" jobs and our browser remembers tabs. But we lost our 50+ day uptime.

Old 2015-11-15, 00:53   #8
Xyzzy
 
 
Aug 2002

3·2,837 Posts

We don't run a swap file or partition. Occasionally we overallocate memory and the OOM thingie zaps some stuff.

With as much memory as you have, do you need swap?
Old 2015-11-15, 07:16   #9
Dubslow
Basketry That Evening!
 
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

7221₁₀ Posts

Quote:
Originally Posted by fivemack
http://blog.jcole.us/2010/09/28/mysq...-architecture/ is the post I remember reading about this class of mystery. I clearly should have poked around in the bits of /proc which let root see something like the virtual-to-physical mapping for a given process.

The problem with sysrq is my very limited desire to go out to the outbuilding in the rain to press a physical key on a physical keyboard (a physical keyboard which is covered with sticky aphid ooze, because a plant grew into the outbuilding and aphids started to eat it).
In my experience, the kernel doesn't necessarily immediately deswap what it can when the pressure's gone, hence my little shell script above to force it to clear the swap.
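One way to check whether a process still has pages sitting in swap, as a rough sketch rather than anything from the scripts above: Linux reports a VmSwap line (in kB) in /proc/<pid>/status. The function names vmswap_kb and swap_users are invented for this example.

```shell
#!/bin/bash
# Print the VmSwap value in kB from /proc/<pid>/status-formatted text on stdin.
# Kernel threads have no VmSwap line, so fall back to 0.
# (vmswap_kb and swap_users are hypothetical helper names.)
vmswap_kb() {
    awk '$1 == "VmSwap:" { print $2; found = 1 } END { if (!found) print 0 }'
}

# List "kB pid name" for every process that still has pages in swap, largest first.
swap_users() {
    local status kb pid
    for status in /proc/[0-9]*/status; do
        # cat fails quietly if the process exited in the meantime; awk then prints 0.
        kb=$(cat "$status" 2>/dev/null | vmswap_kb)
        pid=${status#/proc/}
        pid=${pid%/status}
        [ "${kb:-0}" -gt 0 ] && printf '%s %s %s\n' "$kb" "$pid" "$(cat "/proc/$pid/comm" 2>/dev/null)"
    done | sort -rn
}
```

Running swap_users before and after the swapoff/swapon cycle would show whether anything was in fact left behind in swap.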
Old 2015-11-15, 09:15   #10
fivemack
(loop (#_fork))
 
 
Feb 2006
Cambridge, England

2×7×461 Posts

Quote:
Originally Posted by Xyzzy
We don't run a swap file or partition. Occasionally we overallocate memory and the OOM thingie zaps some stuff.

With as much memory as you have, do you need swap?
I have occasionally stuck on a 50GB swap partition ('sudo swapon /scratch/50GB') so that I could run a 60GB job: 'kill -STOP' the rest of the activity on the machine, let it swap out, run the big job, and then 'kill -CONT' everything and let it swap back in.
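The stop/run/continue dance could be wrapped up roughly like this. It is only a sketch: run_with_paused is a made-up name, the PIDs and the job command are whatever you supply, and it assumes swap has already been enabled as described.

```shell
#!/bin/bash
# Pause the given processes, run a command, then resume them.
# First argument: a space-separated list of PIDs; the rest: the command to run.
# run_with_paused is a hypothetical helper, not an existing utility.
run_with_paused() {
    local pids="$1" status=0
    shift
    # $pids is deliberately unquoted so the list splits into one PID per word.
    kill -STOP $pids
    # Remember the job's exit status, but always fall through to the resume step.
    "$@" || status=$?
    kill -CONT $pids
    return "$status"
}

# Usage (the job name is a placeholder):
#   run_with_paused "$(pidof mprime)" ./big_60GB_job
```

The stopped processes only get swapped out as the big job applies memory pressure, and they swap back in lazily after SIGCONT, so expect them to be slow for a while after resuming.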

But mostly I have swap because installers provide swap by default and I'm never quite confident with the LVM runes required to get rid of it.

The OOM killer used to be rather indiscriminate but does now seem to pick the obvious target and zap it.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
