Operating Systems: Three Easy Pieces. Part 2: Abstraction: Process (translation)

Introduction to Operating Systems

Hey Habr! I would like to bring to your attention a series of translated articles based on a book that I find interesting: OSTEP. The material goes fairly deep into how Unix-like operating systems work: processes, various schedulers, memory, and the other components that make up a modern OS. You can see the original of all materials here. Please note that this is an amateur, fairly loose translation, but I hope I have preserved the general meaning.

Let's take a look at the most fundamental abstraction that an OS exposes to users: the process. The definition of a process is quite simple: it is a running program. The program itself is a lifeless thing sitting on disk: a set of instructions, and possibly some static data, waiting to be launched. It is the OS that takes those bytes and gets them running, transforming the program into something useful.
Users usually want to run more than one program at the same time; for example, you might have a browser, a game, a media player, and a text editor all running on your laptop. In fact, a typical system may appear to run dozens or even hundreds of processes at once. This makes the system easier to use: you never have to worry about whether a CPU is free, you just run programs.

This raises the problem: how do we provide the illusion of many CPUs? How can an OS create the illusion of a nearly endless supply of CPUs when only one physical CPU (or a few) is available?

The OS creates this illusion by virtualizing the CPU. By running one process, then stopping it and running another, and so on, the OS can maintain the illusion that many virtual CPUs exist when in fact there are only one or a few physical processors. This technique is called time sharing of the CPU. It lets users run as many concurrent processes as they wish. The cost is performance: if the CPU is shared among several processes, each one will run more slowly.
To implement CPU virtualization, and especially to do it well, an OS needs both low-level and high-level support. The low-level support is called mechanisms: low-level methods or protocols that implement a needed piece of functionality. One example is the context switch, which gives the OS the ability to stop running one program and start running another on a given CPU. This time-sharing mechanism is used by all modern operating systems.
On top of these mechanisms sits some of the intelligence of the OS, in the form of policies. A policy is an algorithm for making some kind of decision within the OS. For example, given several programs that could run on a CPU, which one should the OS run? A scheduling policy (the scheduler) makes this decision, probably guided by information such as history (which program has run the most over the last minute), workload knowledge (what types of programs are being run), performance metrics (is the system optimizing for interactive performance or for throughput), and so on.

Abstraction: process

The abstraction of a running program that the OS provides is what we call a process. As mentioned earlier, a process is simply a running program. At any moment in time, we can summarize a process by taking an inventory of the different parts of the system that it accesses or affects during its execution.
To understand what makes up a process, we need to understand its machine state: what a program can read or update while it is running. At any given time, which parts of the machine are important to the execution of this program?
One of the obvious elements of the system state that a process includes is memory. Instructions are stored in memory. The data that the program reads or writes also resides in memory. Thus, the memory that a process can address (called an address space) is part of the process.
Registers are also part of the machine state. Many instructions explicitly read or update registers, so registers are clearly important to the execution of the process.
Note that some special registers also form part of this state. For example, the instruction pointer (IP), also called the program counter, tells us which instruction of the program will execute next. Similarly, a stack pointer and the associated frame pointer are used to manage the stack for function parameters, local variables, and return addresses.
Finally, programs often access persistent storage devices too. Such I/O (input/output) information might include a list of the files the process currently has open.

Process API

In order to better understand how a process works, let's look at examples of system calls that should be included in any operating system interface. These APIs are available in one form or another on any OS.

Create: The OS must include some method for creating new processes. When you type a command into the terminal or double-click an application icon, the OS is asked to create a new process to run the program you specified.
Destroy: Since there is an interface for creating processes, the OS should also provide an interface for destroying them forcefully. Of course, most programs run and then exit by themselves when they are done. When they don't, the user may want to kill them, so an interface to halt a runaway process is quite useful.
Wait: Sometimes it is useful to wait for a process to stop running, so some kind of waiting interface is often provided.
Miscellaneous control: Besides killing or waiting for a process, other controls are sometimes possible. For example, most operating systems provide a way to suspend a process (stop it from running for a while) and then resume it (continue running it).
Status: There are usually interfaces for getting status information about a process as well, such as how long it has run or what state it is in.

Process Creation: Details

One interesting question is how exactly programs are transformed into processes: how does the OS get a program up and running, and how does process creation actually work?
First of all, the OS must load the program's code and any static data into memory, into the address space of the process. Programs initially reside on disk (or on a solid-state drive) in some kind of executable format. Thus, loading the code and static data into memory requires the OS to read those bytes from disk and place them somewhere in memory.

In early operating systems, loading was done eagerly, that is, all at once before the program was run. Modern operating systems do this lazily, loading pieces of code or data only as they are needed during program execution.

Once the code and static data are loaded into memory, there are a few more things the OS needs to do before running the process. Some memory must be allocated for the program's stack. Programs use the stack for local variables, function parameters, and return addresses. The OS allocates this memory and gives it to the process. The OS will also likely initialize the stack with arguments; specifically, it fills in the parameters to the main() function, namely argc and the argv array.

The OS may also allocate some memory for the program's heap. The heap is used for explicitly requested, dynamically allocated data. Programs request such space by calling malloc() and release it explicitly by calling free(). The heap is needed for data structures such as linked lists, hash tables, trees, and others. The heap is small at first; as the program runs and requests more memory through the malloc() library API, the OS may get involved and allocate more memory to the process to help satisfy such calls.

The OS will also perform some other initialization tasks, particularly those related to I/O. For example, on UNIX systems each process has three file descriptors open by default: standard input, standard output, and standard error. These descriptors let programs easily read input from the terminal and print output to the screen.

Thus, by loading the code and static data into memory, creating and initializing a stack, and doing other work related to I/O setup, the OS sets the stage for program execution. One last task remains: to start the program running at its entry point, namely main(). By jumping to the main() routine, the OS transfers control of the CPU to the newly created process, and the program begins its execution.

Process state

Now that we have some idea of what a process is and how it is created, let's list the states a process can be in. In a simplified view, a process is in one of three states:
Running. In the running state, the process is running on a processor. This means it is executing instructions.
Ready. In the ready state, the process is ready to run, but for some reason the OS has chosen not to run it at this moment.
Blocked. In the blocked state, the process has performed some kind of operation that makes it not ready to run until some other event takes place. A common example: when a process initiates an I/O request, it becomes blocked, and some other process can use the processor.

These states can be pictured as a graph. A process can be moved between the READY and RUNNING states at the discretion of the OS. Moving from READY to RUNNING means the process has been scheduled; moving from RUNNING to READY means it has been descheduled. Once a process becomes BLOCKED, for example by initiating an I/O operation, the OS keeps it in that state until some event occurs, such as the completion of the I/O. At that point the process moves to the READY state again, and possibly to RUNNING right away, if the OS so decides.
Let's look at an example of how two processes move through these states. First, imagine two processes running, each of which uses only the CPU and does no I/O. With a single CPU, only one of them can be RUNNING at any moment, while the other waits in the READY state until it gets its turn.

In the next example, the first process issues an I/O request after running for some time and becomes BLOCKED, giving the other process a chance to run (Fig. 1.4). The OS sees that process 0 is not using the CPU and starts running process 1. While process 1 is running, the I/O completes and process 0 moves back to READY. Finally, process 1 finishes, and process 0 runs and then finishes as well.

Data structure

The OS is itself a program, and like any program it has some key data structures that track various relevant pieces of information. To track the state of each process, the OS will likely keep some kind of process list for all processes that are ready, plus some additional information to track which process is currently running. The OS must also track blocked processes in some way: when an I/O event completes, the OS should make sure to wake the correct process and make it ready to run again.

For example, the OS must save the contents of the processor's registers for a stopped process. When a process is stopped, its registers are saved to a memory location that the OS keeps for that process (its register context); when the process is resumed, the values are restored into the physical registers and execution continues where it left off.

In addition to running, ready, and blocked, a process can be in some other states. Sometimes the system has an INIT (initial) state that a process is in while it is being created. A process can also be placed in a FINAL state when it has exited but has not yet been cleaned up; on UNIX systems this is called the zombie state. This state is useful because it lets another process, usually the parent that created it, examine the return code of the child and see whether it ran successfully. By convention, UNIX programs return 0 on success and a non-zero value otherwise, and programmers can use different non-zero codes to signal different problems. When it is finished, the parent makes one final call, such as wait(), to wait for the child to complete and to tell the OS that it can clean up any data structures that referred to the terminated process.

Key points of the lecture:

The process is the main OS abstraction of a running program. At any point in time, a process can be described by its state: the contents of memory in its address space, the contents of the CPU registers (including the instruction pointer and the stack pointer), and I/O information, such as the open files that can be read or written.
The process API consists of calls that programs can make related to processes. Typically, these include creation, destruction, and other useful calls.
A process exists in one of many process states, including running, ready, and blocked. Different events, such as getting scheduled or descheduled, or waiting for an I/O to complete, move a process from one of these states to another.
A process list contains information about all processes in the system. Each entry is sometimes called a process control block (PCB), which is really just a structure that contains information about a specific process.

Source: habr.com
