I am logged in on pts/1 and using the Bash shell. As shown below, associated with my Bash shell process are three pseudo-files in procfs whose names start with oom. This post discusses the purpose of these files.
    # ps
      PID TTY          TIME CMD
     1688 pts/1    00:00:00 ps
    10290 pts/1    00:00:00 sudo
    10291 pts/1    00:00:00 su
    10294 pts/1    00:00:00 bash
    # ls -l /proc/10294/oom*
    -rw-r--r--. 1 root root 0 Dec 26 17:13 /proc/10294/oom_adj
    -r--r--r--. 1 root root 0 Dec 26 17:13 /proc/10294/oom_score
    -rw-r--r--. 1 root root 0 Dec 26 17:13 /proc/10294/oom_score_adj
    # cat /proc/10294/oom_score
    0
It turns out that these three files have to do with Linux out-of-memory (OOM) management. Linux can be configured to overcommit memory by changing the value of the overcommit_memory kernel variable (the vm.overcommit_memory sysctl), which the kernel documentation describes as follows:
overcommit_memory: This value contains a flag that enables memory overcommitment. When this flag is 0, the kernel attempts to estimate the amount of free memory left when userspace requests more memory. When this flag is 1, the kernel pretends there is always enough memory until it actually runs out. When this flag is 2, the kernel uses a "never overcommit" policy that attempts to prevent any overcommit of memory. Note that user_reserve_kbytes affects this policy. This feature can be very useful because there are a lot of programs that malloc() huge amounts of memory "just-in-case" and don't use much of it. The default value is 0.
Overcommitment allows memory allocation functions such as malloc() to allocate virtual memory with no guarantee that physical storage for it exists.
Memory overcommitment is useful. Without it, a system may fail to fully utilize its memory. Overcommitting memory allows a system to use virtual memory in a more efficient way but with the risk of running out of physical memory. This is fine until the kernel cannot find sufficient physical memory to back a virtual memory page when needed.
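To check which overcommit policy a system is using, the flag can be read from procfs and matched against the three values described in the kernel documentation. The following is a minimal Python sketch; the function names and the short descriptions are my own, and the path /proc/sys/vm/overcommit_memory is the usual procfs location of this sysctl on Linux.

```python
from pathlib import Path

# Short paraphrases of the three documented overcommit_memory values.
OVERCOMMIT_MODES = {
    0: "heuristic: kernel estimates free memory for each request",
    1: "always: kernel pretends there is always enough memory",
    2: "never: kernel tries to prevent any overcommit",
}


def describe_overcommit(raw: str) -> str:
    """Turn the text read from overcommit_memory into a description."""
    mode = int(raw.strip())
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")


def current_overcommit(path: str = "/proc/sys/vm/overcommit_memory") -> str:
    """Read the live setting; only meaningful on a Linux system with procfs."""
    return describe_overcommit(Path(path).read_text())
```

On a Linux host, print(current_overcommit()) reports the active policy. With the default setting the result starts with "heuristic".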
The purpose of the kernel OOM killer routine is to free up memory for the system when all other memory-freeing techniques fail. It does this by killing selected processes until sufficient memory is freed to stabilize the system. OOM Killer has several configuration options that enable some choice in the behaviour of the system when it is faced with an out-of-memory condition.
OOM Killer attempts to select the “best” processes to kill to achieve system stability, i.e. the smallest number of processes that will free up the maximum amount of memory upon termination and that are also the least important processes as far as the system is concerned. It will also kill any process sharing the same mm_struct as the selected process, since such processes share the memory being reclaimed.
To facilitate process selection, the kernel maintains an oom_score for each process. The higher the value, the more likely a process and its children are to be killed by OOM Killer in an out-of-memory situation.
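To see which processes currently rank highest, the oom_score file of every process can be read and sorted. The sketch below is one way to do that in Python. The function name rank_by_oom_score and its parameters are hypothetical, and it assumes each numeric directory under /proc contains oom_score and comm files, as on reasonably recent Linux kernels.

```python
from pathlib import Path


def rank_by_oom_score(proc_root="/proc", limit=5):
    """Return up to `limit` (score, pid, name) tuples, highest oom_score first."""
    rows = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # only numeric directories represent processes
        try:
            score = int((entry / "oom_score").read_text())
            name = (entry / "comm").read_text().strip()
        except (OSError, ValueError):
            continue  # process exited meanwhile, or file unreadable
        rows.append((score, int(entry.name), name))
    # Highest score first; ties are broken by lowest PID.
    rows.sort(key=lambda row: (-row[0], row[1]))
    return rows[:limit]
```

Calling rank_by_oom_score() on a live system returns the most likely OOM Killer candidates at that moment. The proc_root parameter exists only so the function can be pointed at a test directory.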
The per-process oom_score_adj file exists to give a user some control over the OOM Killer's process selection. The deprecated oom_adj file provides similar functionality. The kernel documentation describes both:
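Adjusting a process is a matter of writing an integer to its oom_score_adj file. Here is a small Python sketch of that. The helper names are hypothetical, and the limits of -1000 and +1000 are the OOM_SCORE_ADJ_MIN and OOM_SCORE_ADJ_MAX values given in the kernel documentation.

```python
import os

OOM_SCORE_ADJ_MIN = -1000  # documented lower limit
OOM_SCORE_ADJ_MAX = 1000   # documented upper limit


def validate_oom_score_adj(value: int) -> int:
    """Reject values outside the documented range before touching procfs."""
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError(f"oom_score_adj must be in [-1000, 1000], got {value}")
    return value


def set_oom_score_adj(value, pid=None, proc_root="/proc"):
    """Write `value` to <proc_root>/<pid>/oom_score_adj and return the path."""
    pid = os.getpid() if pid is None else pid
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path, "w") as f:
        f.write(str(validate_oom_score_adj(value)))
    return path
```

For example, set_oom_score_adj(500) makes the calling process a more attractive target. Lowering the value generally needs extra privilege, so an unprivileged call with a negative number will most likely fail with PermissionError.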
/proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj - Adjust the oom-killer score
--------------------------------------------------------------------------------
These files can be used to adjust the badness heuristic used to select which process gets killed in out of memory conditions.
The badness heuristic assigns a value to each candidate task ranging from 0 (never kill) to 1000 (always kill) to determine which process is targeted. The units are roughly a proportion along that range of allowed memory the process may allocate from based on an estimation of its current memory and swap use. For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500.
There is an additional factor included in the badness score: the current memory and swap usage is discounted by 3% for root processes.
The amount of "allowed" memory depends on the context in which the oom killer was called. If it is due to the memory assigned to the allocating task's cpuset being exhausted, the allowed memory represents the set of mems assigned to that cpuset. If it is due to a mempolicy's node(s) being exhausted, the allowed memory represents the set of mempolicy nodes. If it is due to a memory limit (or swap limit) being reached, the allowed memory is that configured limit. Finally, if it is due to the entire system being out of memory, the allowed memory represents all allocatable resources.
The value of /proc/<pid>/oom_score_adj is added to the badness score before it is used to determine which task to kill. Acceptable values range from -1000 (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX). This allows userspace to polarize the preference for oom killing either by always preferring a certain task or completely disabling it. The lowest possible value, -1000, is equivalent to disabling oom killing entirely for that task since it will always report a badness score of 0.
Consequently, it is very simple for userspace to define the amount of memory to consider for each task. Setting a /proc/<pid>/oom_score_adj value of +500, for example, is roughly equivalent to allowing the remainder of tasks sharing the same system, cpuset, mempolicy, or memory controller resources to use at least 50% more memory. A value of -500, on the other hand, would be roughly equivalent to discounting 50% of the task's allowed memory from being considered as scoring against the task.
For backwards compatibility with previous kernels, /proc/<pid>/oom_adj may also be used to tune the badness score. Its acceptable values range from -16 (OOM_ADJUST_MIN) to +15 (OOM_ADJUST_MAX) and a special value of -17 (OOM_DISABLE) to disable oom killing entirely for that task. Its value is scaled linearly with /proc/<pid>/oom_score_adj.
The value of /proc/<pid>/oom_score_adj may be reduced no lower than the last value set by a CAP_SYS_RESOURCE process. To reduce the value any lower requires CAP_SYS_RESOURCE.
Caveat: when a parent task is selected, the oom killer will sacrifice any first generation children with separate address spaces instead, if possible. This avoids servers and important system daemons from being killed and loses the minimal amount of work.
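The documentation only says that oom_adj is "scaled linearly" with oom_score_adj. The Python sketch below shows one plausible reading of that scaling: multiply by 1000, divide by 17 with truncation toward zero, and map the legacy maximum of +15 straight to +1000. That special case and the truncation are my understanding of the kernel's behaviour rather than something stated in the quoted text, and the function name is hypothetical.

```python
OOM_DISABLE = -17         # legacy "never kill" value
OOM_ADJUST_MAX = 15       # legacy upper limit
OOM_SCORE_ADJ_MAX = 1000  # upper limit of the newer interface


def oom_adj_to_score_adj(oom_adj: int) -> int:
    """Convert a legacy oom_adj value to an approximate oom_score_adj value."""
    if not OOM_DISABLE <= oom_adj <= OOM_ADJUST_MAX:
        raise ValueError(f"oom_adj must be in [-17, 15], got {oom_adj}")
    if oom_adj == OOM_ADJUST_MAX:
        # Assumption: the top of the old range maps to the top of the new one.
        return OOM_SCORE_ADJ_MAX
    # Linear scaling by 1000/17, truncated toward zero like C integer division.
    magnitude = abs(oom_adj) * OOM_SCORE_ADJ_MAX // -OOM_DISABLE
    return magnitude if oom_adj >= 0 else -magnitude
```

Under these assumptions the special value -17 becomes -1000, which matches the documented "disable" behaviour of both files, and 0 stays 0.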
As stated earlier, processes to be killed are selected based on their badness score, which is visible to a user as /proc/<PID>/oom_score. See this article in LWN (Linux Weekly News) for more information about how badness is calculated. The process with the highest badness score, along with any children, is killed first.
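The arithmetic in the quoted documentation can be modelled in a few lines. What follows is a simplified Python sketch of that description, not the kernel's actual code. The function name and parameters are hypothetical, clamping the result to 0..1000 is my assumption based on the stated range, and real kernels (especially newer ones) may compute the score differently.

```python
def badness(usage_pages, allowed_pages, oom_score_adj=0, is_root=False):
    """Approximate badness score on the documented 0..1000 scale."""
    if oom_score_adj == -1000:
        return 0  # documented: -1000 always reports a badness score of 0
    if is_root:
        # Documented: memory and swap usage is discounted by 3% for root.
        usage_pages -= usage_pages * 3 // 100
    # Proportion of allowed memory in use, scaled to 1000, plus the adjustment.
    score = usage_pages * 1000 // allowed_pages + oom_score_adj
    return max(0, min(1000, score))  # assumed clamp to the documented range
```

This reproduces the documented examples: a task using all of its allowed memory scores 1000, and one using half scores 500. With the same half usage, an oom_score_adj of +500 pushes the score to 1000 and -500 pulls it down to 0.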
There is lots more to OOM Killer than I have time to cover in this post. Just do an Internet search and you will find plenty of additional information.