mersenneforum.org  

Old 2018-10-07, 07:45   #1
Frailie60
 
Oct 2018

1 Posts
Well that was a bit odd

A few days ago I accidentally allocated seventy gigabytes from a process running on my 64GB 48-core Opteron machine. The process died and I thought nothing of it.

Except that I noticed that other processes which had been running at the time were now running bizarrely slowly - I was getting 300ms rather than 65ms iteration time from mprime, some gnfs-lasieve4I14e jobs were suggesting they would take a week to run whilst ones on adjacent ranges were expecting less than 24 hours.

So far so odd; muttering something about NUMA I killed the mprime process and restarted. And it continued to have 300ms iteration times.

Everything was fixed by a reboot, but I don't understand how forcing the machine into swap would have these persistent bad consequences.
Old 2018-10-07, 08:25   #2
retina
Undefined
 
"The unspeakable one"
Jun 2006
My evil lair

13367₈ Posts

Quote:
Originally Posted by Frailie60
A few days ago I accidentally allocated seventy gigabytes from a process running on my 64GB 48-core Opteron machine. The process died and I thought nothing of it.

Except that I noticed that other processes which had been running at the time were now running bizarrely slowly - I was getting 300ms rather than 65ms iteration time from mprime, some gnfs-lasieve4I14e jobs were suggesting they would take a week to run whilst ones on adjacent ranges were expecting less than 24 hours.

So far so odd; muttering something about NUMA I killed the mprime process and restarted. And it continued to have 300ms iteration times.

Everything was fixed by a reboot, but I don't understand how forcing the machine into swap would have these persistent bad consequences.

IME Linux is stupid when it comes to memory allocations for anything other than a basic system with one CPU.
Old 2018-10-08, 11:41   #3
jnml
 
Feb 2012
Prague, Czech Republ

2²×41 Posts

Quote:
Originally Posted by retina
IME Linux is stupid when it comes to memory allocations for anything other than a basic system with one CPU.

AFAICT it's quite the opposite. The kernel assumes the most recent spike in memory pressure is likely to repeat soon, and shrinks the swap-in-use size only once that has not happened for some time. Meanwhile several effects kick in that make things slower.

It's a tradeoff that works better under high overall resource demand than releasing things as soon as they are no longer needed.
Old 2018-10-08, 12:46   #4
retina
Undefined
 
"The unspeakable one"
Jun 2006
My evil lair

1011011110111₂ Posts

Quote:
Originally Posted by jnml
... shrinks the swap-in-use size only after that does not happen for some time.

What is the "some time"? Is it configurable?
Old 2018-10-08, 12:50   #5
jnml
 
Feb 2012
Prague, Czech Republ

244₈ Posts

Quote:
Originally Posted by retina
What is the "some time"? Is it configurable?

IDK, my machine seems to take a few minutes to get back to nearly normal after a big spike in memory usage (using almost all swap space).

But the swap usage does not go completely back to zero, only down to ~30%. Maybe some time later it shrinks even more.
Old 2018-10-08, 14:36   #6
Happy5214
 
"Alexander"
Nov 2008
The Alamo City

2²×103 Posts

Based on my experience with Linux, the kernel doesn't make much of an effort to move memory back to RAM until the program swapped out actually needs to be used. On my computer (8 GB RAM, 24 GB swap), I had a huge spike in memory usage yesterday which sent my swap usage to a little over 2 GB. My RAM usage has gone back down to 4.8 GB, but it's still using 1.8 GB of swap space.
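
For anyone who wants to check the same figures on their own machine, here is a minimal Python sketch that works out swap in use from `/proc/meminfo`-style text. The `SwapTotal` and `SwapFree` fields are what Linux reports there; the function names and the sample numbers are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are
    plain counts. Only the leading number on each line is kept.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_kb(meminfo):
    """Swap currently in use, in kB, from a parsed meminfo dict."""
    return meminfo["SwapTotal"] - meminfo["SwapFree"]


def read_swap_used_kb(path="/proc/meminfo"):
    """Convenience wrapper that reads the live file (Linux only)."""
    with open(path) as f:
        return swap_used_kb(parse_meminfo(f.read()))


# Hypothetical sample, roughly a 24 GB swap with about 1.8 GB in use.
SAMPLE = (
    "MemTotal:        8000000 kB\n"
    "SwapTotal:      25165820 kB\n"
    "SwapFree:       23278332 kB\n"
    "HugePages_Total:       0\n"
)
print(swap_used_kb(parse_meminfo(SAMPLE)) / 1024 / 1024, "GiB of swap in use")
```

On a real system, `read_swap_used_kb()` gives the live figure. A non-zero value long after a spike is expected if nothing has touched the swapped-out pages since.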
Old 2018-10-08, 17:37   #7
xilman
Bamboozled!
 
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

2·5,099 Posts

Quote:
Originally Posted by Happy5214
Based on my experience with Linux, the kernel doesn't make much of an effort to move memory back to RAM until the program swapped out actually needs to be used.

Isn't that exactly what it should do?

Swapping in something which isn't going to be run just uses up memory which could be used by something which is running.

Anyway, swapping behavior can be configured with sysctl(8). Details are left as an exercise in reading TFM. If I had to find out for myself, I don't see why you shouldn't educate yourself.
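
As a starting point, here is a minimal Python sketch of reading one such setting, `vm.swappiness`, which Linux exposes as the file `/proc/sys/vm/swappiness`. The function names are hypothetical, and whether this particular knob has anything to do with the slowdown in the first post is an assumption, not something established in this thread.

```python
from pathlib import Path

# Linux exposes the vm.swappiness sysctl as this file.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def parse_swappiness(text):
    """Turn the file's contents (e.g. "60\n") into an int."""
    return int(text.strip())


def read_swappiness(path=SWAPPINESS_PATH):
    """Read the current value from the running kernel (Linux only)."""
    return parse_swappiness(Path(path).read_text())


# 60 is the usual default on mainstream distributions.
print(parse_swappiness("60\n"))
```

Changing the value needs root, for example with `sysctl vm.swappiness=10`. The 10 is only an example value, not a recommendation.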
Old 2018-10-09, 13:39   #8
Xyzzy
 
"Mike"
Aug 2002

2·5·11·71 Posts

https://en.wikipedia.org/wiki/Swappiness

Old 2018-10-09, 15:52   #9
chris2be8
 
Sep 2009

2·7·139 Posts

Quote:
Originally Posted by Frailie60
A few days ago I accidentally allocated seventy gigabytes from a process running on my 64GB 48-core Opteron machine. The process died and I thought nothing of it.

Except that I noticed that other processes which had been running at the time were now running bizarrely slowly - I was getting 300ms rather than 65ms iteration time from mprime, some gnfs-lasieve4I14e jobs were suggesting they would take a week to run whilst ones on adjacent ranges were expecting less than 24 hours.

So far so odd; muttering something about NUMA I killed the mprime process and restarted. And it continued to have 300ms iteration times.

Everything was fixed by a reboot, but I don't understand how forcing the machine into swap would have these persistent bad consequences.

In a NUMA system each CPU has fast access to memory attached to it and slower access to memory attached to other CPUs. So you get better performance if a process is using memory attached to the CPU it's running on. If it gets swapped out, then brought back into memory on other CPUs it will run slower. And Linux probably isn't smart enough to migrate processes back into memory on the CPUs they run on once they have been pushed out by something else.

I don't know the details but that's probably why things ran slower until you rebooted.
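
One way to test this guess is to look at where a process's pages actually live. The Python sketch below totals the `N<node>=<pages>` fields that Linux reports in `/proc/<pid>/numa_maps` on kernels built with NUMA support. The function names and the sample lines are made up for illustration, so treat it as a rough diagnostic rather than a definitive tool.

```python
import re
from collections import Counter

# Each mapping line in numa_maps carries fields such as "N0=1000",
# meaning 1000 pages of that mapping sit on NUMA node 0.
NODE_FIELD = re.compile(r"^N(\d+)=(\d+)$")


def pages_per_node(numa_maps_text):
    """Sum the per-node page counts over every mapping in the text."""
    totals = Counter()
    for line in numa_maps_text.splitlines():
        for field in line.split():
            match = NODE_FIELD.match(field)
            if match:
                totals[int(match.group(1))] += int(match.group(2))
    return dict(totals)


def read_pages_per_node(pid="self"):
    """Read the live file for a given pid (Linux with NUMA support only)."""
    with open(f"/proc/{pid}/numa_maps") as f:
        return pages_per_node(f.read())


# Hypothetical sample: a process whose anonymous memory is mostly on node 1.
SAMPLE = (
    "00400000 default file=/usr/bin/mprime mapped=12 N0=12 kernelpagesize_kB=4\n"
    "7f0000000000 default anon=3000 dirty=3000 N0=1000 N1=2000 kernelpagesize_kB=4\n"
)
print(pages_per_node(SAMPLE))
```

If a process pinned to the cores of one node shows most of its pages on another node, that would be consistent with the explanation above.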

Chris
Old 2018-10-09, 16:00   #10
jnml
 
Feb 2012
Prague, Czech Republ

2²·41 Posts

Quote:
Originally Posted by chris2be8
And Linux probably isn't smart enough to migrate processes back into memory on the CPUs they run on once they have been pushed out by something else.
Chris

Both NUMA and HPC have been around long enough that, given Linux's dominance in supercomputing, the quoted claim is probably not true.
Old 2018-10-09, 16:46   #11
Mark Rose
 
"/X\(‘-‘)/X\"
Jan 2013

2·31·47 Posts

Quote:
Originally Posted by chris2be8
In a NUMA system each CPU has fast access to memory attached to it and slower access to memory attached to other CPUs. So you get better performance if a process is using memory attached to the CPU it's running on. If it gets swapped out, then brought back into memory on other CPUs it will run slower. And Linux probably isn't smart enough to migrate processes back into memory on the CPUs they run on once they have been pushed out by something else.

I don't know the details but that's probably why things ran slower until you rebooted.

Chris

That's probably what is happening. It's a case of the program and/or the system administrator not tuning NUMA behaviour.

https://sitano.github.io/2014/08/20/numa-swap/
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.