Operating Systems: Three Easy Pieces. Part 4: Introduction to the Scheduler (translation)

Introduction to Operating Systems

Hey Habr! I would like to bring to your attention a series of translated articles from a book I find genuinely interesting: OSTEP. This material takes a fairly deep look at how unix-like operating systems work, namely at processes, the various schedulers, memory, and the other components that make up a modern OS. You can see the original of all the materials here. Please note that the translation was made unprofessionally (quite freely), but I hope I have retained the general meaning.

Lab work on this subject can be found here:

Other parts:

You can also check out my channel on Telegram =)

Introduction to the Scheduler

The crux of the problem: how to design a scheduling policy
How should we develop a basic framework for thinking about scheduling policies? What should the key assumptions be? What metrics are important? What basic approaches were used in the earliest computing systems?

Workload Assumptions

Before discussing possible policies, let's first make a few simplifying assumptions about the processes running in the system, which are collectively referred to as the workload. Determining the workload is a critical part of building policies: the more you know about the workload, the better the policy you can write.

Let's make the following assumptions about the processes running in the system, sometimes also called jobs (tasks). Almost all of these assumptions are unrealistic, but they are necessary for developing the ideas; a code sketch of this workload model follows the list.

  1. Each task runs for the same amount of time,
  2. All tasks arrive at the same time,
  3. Once started, a task runs until it completes,
  4. All tasks use only the CPU (no I/O),
  5. The run time of each task is known in advance.
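
To make the workload model concrete, here is a minimal sketch in Python (the Job class and its field names are my own, not from the book):

    from dataclasses import dataclass

    @dataclass
    class Job:
        name: str        # task label, e.g. "A"
        arrival: float   # when the task enters the system (assumption 2: all zero)
        runtime: float   # how much CPU time it needs (assumption 5: known in advance)

    # Under assumptions 1-5, a workload is just a list of identical jobs:
    workload = [Job("A", 0, 10), Job("B", 0, 10), Job("C", 0, 10)]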

Scheduler Metrics

In addition to the workload assumptions, we need one more tool for comparing different scheduling policies: scheduler metrics. A metric is just a measure of something, and there are a number of metrics that can be used to compare schedulers.

For now, we will use a single metric called turnaround time. The turnaround time of a task is defined as the difference between the time the task completes and the time it arrived in the system:

T_turnaround = T_completion − T_arrival

Since we assumed that all tasks arrive at the same time, T_arrival = 0 and thus T_turnaround = T_completion. This will naturally change as we relax the assumptions above.
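
As a sketch, the metric can be computed directly from the definition (a hypothetical helper of my own, continuing the Job model above):

    def avg_turnaround(completions, arrivals):
        """T_turnaround = T_completion - T_arrival, averaged over all jobs."""
        return sum(c - a for c, a in zip(completions, arrivals)) / len(completions)

    # One job finishes at t=15 (arrived at 0), another at t=25 (arrived at 5):
    print(avg_turnaround([15, 25], [0, 5]))  # 17.5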

Another metric is fairness. Performance and fairness are often opposing goals in scheduling: a scheduler can optimize performance, for example, but at the cost of making other tasks wait to run, thereby reducing fairness.

First In First Out (FIFO)

The most basic algorithm we can implement is called FIFO: First In, First Out, also known as First Come, First Served (FCFS). This algorithm has several advantages: it is very easy to implement, and under all of our assumptions it does the job pretty well.

Let's consider a simple example. Suppose 3 tasks arrive in the system at essentially the same time, but task A came a tiny bit earlier than the others, so it lands first in the execution queue, just as B does relative to C. Assume that each of them runs for 10 seconds. What is the average turnaround time of these tasks in this case?

[Figure: FIFO — tasks A, B, and C run for 10 seconds each]

Summing the completion times 10 + 20 + 30 and dividing by 3, we get an average turnaround time of 20 seconds.

Now let's try to change our assumptions, in particular assumption 1: we will no longer assume that every task runs for the same amount of time. How will FIFO fare this time?

As it turns out, differing task run times have an extremely negative impact on the performance of the FIFO algorithm. Suppose task A runs for 100 seconds, while B and C still run for 10 seconds each.

[Figure: FIFO — A runs for 100 seconds before B and C get the CPU]

As you can see from the figure, the average turnaround time for the system is (100 + 110 + 120) / 3 = 110. This is called the convoy effect: short-term consumers of a resource get queued up behind a heavyweight consumer. It's like standing in line at the grocery store behind someone with a full cart; the best you can do is switch to another checkout line, or relax and breathe deeply.
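
The convoy effect is easy to reproduce with a minimal FIFO sketch (my own code, not from the book):

    def fifo_avg_turnaround(runtimes):
        """Run jobs in arrival order; all jobs arrive at t=0."""
        t, total = 0, 0
        for rt in runtimes:
            t += rt      # this job completes at time t
            total += t   # turnaround = completion - arrival(0)
        return total / len(runtimes)

    print(fifo_avg_turnaround([10, 10, 10]))   # 20.0
    print(fifo_avg_turnaround([100, 10, 10]))  # 110.0, the convoy effect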

Shortest Job First

Can we somehow handle such heavyweight processes better? Certainly. Another scheduling discipline is called Shortest Job First (SJF). Its algorithm is also quite primitive: as the name implies, the shortest tasks are launched first, one after another.

[Figure: SJF — B and C run before A]

In this example, running the same processes yields a much improved average turnaround time: 50 instead of 110, which is more than 2 times better.
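
SJF is the same loop over the jobs sorted by run time; a sketch under the same all-arrive-at-zero assumption:

    def sjf_avg_turnaround(runtimes):
        """Shortest Job First: run jobs in order of increasing run time."""
        t, total = 0, 0
        for rt in sorted(runtimes):
            t += rt
            total += t
        return total / len(runtimes)

    print(sjf_avg_turnaround([100, 10, 10]))  # 50.0, vs. 110.0 under FIFO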

Thus, given the assumption that all tasks arrive at the same time, SJF seems to be an optimal algorithm. However, our assumptions still do not look realistic. This time let's relax assumption 2 and imagine that tasks can arrive at any time, rather than all at once. What problems can this lead to?

[Figure: SJF with late arrivals — B and C arrive at t = 10]

Imagine that task A (100 s) arrives first and starts running. At time t = 10, tasks B and C arrive, each needing 10 seconds. Since A runs to completion (assumption 3), they must wait, so the average turnaround time is (100 + (110 − 10) + (120 − 10)) / 3 = 103.33. What could the scheduler do to improve things?

Shortest Time-to-Completion First (STCF)

To improve the situation, we drop assumption 3, that a program, once started, runs to completion. In addition, we will need hardware support: as you might guess, a timer to interrupt a running task, plus context switching. The scheduler can then do something the moment tasks B and C arrive: stop the execution of task A, run tasks B and C to completion, and only then resume process A. Such a scheduler is called STCF (Shortest Time-to-Completion First) or Preemptive Shortest Job First (PSJF).

[Figure: STCF — A is preempted at t = 10 while B and C run]

This scheduler produces the following result: ((120 − 0) + (20 − 10) + (30 − 10)) / 3 = 50. Thus, such a scheduler is even better suited to our workload.
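
Here is a tick-by-tick sketch of STCF (hypothetical code of mine, assuming integer arrival and run times): at every time step, run the arrived job with the least remaining work.

    def stcf_avg_turnaround(jobs):
        """jobs: list of (name, arrival, runtime) tuples."""
        arrival = {n: a for n, a, _ in jobs}
        remaining = {n: r for n, _, r in jobs}
        t, total, done = 0, 0, 0
        while done < len(jobs):
            ready = [n for n in remaining if arrival[n] <= t and remaining[n] > 0]
            if not ready:        # CPU idles until the next arrival
                t += 1
                continue
            n = min(ready, key=lambda j: remaining[j])  # least time to completion
            remaining[n] -= 1
            t += 1
            if remaining[n] == 0:
                total += t - arrival[n]  # this job's turnaround
                done += 1
        return total / len(jobs)

    print(stcf_avg_turnaround([("A", 0, 100), ("B", 10, 10), ("C", 10, 10)]))  # 50.0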

A New Metric: Response Time

Thus, if we know the run times of the tasks and the tasks only use the CPU, STCF is the best solution. And indeed, in the early days of computing these algorithms worked quite well. Nowadays, however, the user spends most of the time at a terminal and expects responsive, interactive behavior from it. So a new metric was born: response time.

The response time is calculated as follows:

T_response = T_firstrun − T_arrival

Thus, for the previous example, the response times would be: A = 0, B = 0, C = 10 (average = 3.33).
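
A quick check of those numbers from the definition (the first-run times are read off the STCF timeline above: A starts at 0, B at 10, C at 20):

    # T_response = T_firstrun - T_arrival
    arrivals  = {"A": 0, "B": 10, "C": 10}
    first_run = {"A": 0, "B": 10, "C": 20}
    resp = {n: first_run[n] - arrivals[n] for n in arrivals}
    print(resp, sum(resp.values()) / len(resp))  # {'A': 0, 'B': 0, 'C': 10} 3.33...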

And it turns out that the STCF algorithm is not so good in a situation where three tasks arrive at the same time: the last one has to wait until both tasks before it run to completion before it even starts. So the algorithm is good for the turnaround time metric, but bad for the interactivity metric. Imagine sitting at a terminal trying to type characters into an editor and having to wait more than 10 seconds because some other task occupies the CPU. That is not very pleasant.

Thus, we are faced with another problem: how can we build a scheduler that is sensitive to response time?

Round Robin

To solve this problem, the Round Robin (RR) algorithm was developed. The basic idea is quite simple: instead of running tasks until they are completely finished, we run each task for a certain period of time (called a time slice) and then switch to the next task in the queue. The algorithm repeats until all tasks are finished. Note that the length of the time slice must be a multiple of the timer-interrupt period: if the timer interrupts the process every 10 ms, the time slice can be 10 ms, 20 ms, or any other multiple of 10 ms.

Consider an example: tasks A, B, and C arrive in the system at the same time, and each of them wants to run for 5 seconds. The SJF algorithm runs each task to completion before starting the next. In contrast, the RR algorithm with a 1-second time slice cycles through the tasks as follows (Fig. 4.3):

[Figure: SJF Again (Bad for Response Time)]

[Figure: Round Robin (Good for Response Time)]

The average response time for the RR algorithm is (0 + 1 + 2) / 3 = 1, while for SJF it is (0 + 5 + 10) / 3 = 5.

It is logical to assume that the time slice is a very important parameter for RR: the shorter it is, the better the response time. You cannot make it too small, however, because the context-switch time will start to dominate overall performance. The length of the time slice is therefore chosen by the OS architect and depends on the tasks expected to run. Saving and restoring registers is not the only switching cost, either: a running program builds up a lot of other state, for example in hardware caches, and every switch throws that environment away and must rebuild it, which can also be time-consuming.
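
The trade-off is visible with simple arithmetic (a sketch; the 1 ms switch cost below is an assumed number for illustration, not a measurement):

    def switch_overhead(time_slice_ms, switch_cost_ms):
        """Fraction of CPU time lost to context switching."""
        return switch_cost_ms / (time_slice_ms + switch_cost_ms)

    print(switch_overhead(10, 1))   # ~0.09: about 9% of the CPU is wasted
    print(switch_overhead(100, 1))  # ~0.01: amortized down to about 1%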

RR would be a great scheduler if response time were our only metric. But how does the turnaround time metric behave under this algorithm? Consider the example above, where A, B, and C each run for 5 seconds and arrive at the same time. Task A finishes at 13 s, B at 14 s, and C at 15 s, for an average turnaround time of 14 s. Thus, RR is nearly the worst possible algorithm for the turnaround metric.
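
A round-robin sketch with a 1-second slice confirms both numbers for this example: response times 0, 1, 2 (average 1) and completions at 13, 14, 15 (average turnaround 14). The queue discipline below is my own minimal rendering of the idea:

    from collections import deque

    def round_robin(runtimes, slice_=1):
        """All jobs arrive at t=0; returns (avg response, avg turnaround)."""
        queue = deque(enumerate(runtimes))     # (job id, time left)
        first_run, turnaround, t = {}, {}, 0
        while queue:
            i, left = queue.popleft()
            first_run.setdefault(i, t)         # first time this job gets the CPU
            run = min(slice_, left)
            t += run
            if left > run:
                queue.append((i, left - run))  # back of the line
            else:
                turnaround[i] = t              # completion - arrival(0)
        n = len(runtimes)
        return sum(first_run.values()) / n, sum(turnaround.values()) / n

    print(round_robin([5, 5, 5]))  # (1.0, 14.0)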

More generally, any fair policy like RR, one that divides CPU time equally among all processes, will perform poorly on turnaround time. These two metrics are inherently in conflict with each other.

So we have two families of algorithms with opposite strengths, and there are still two assumptions left to relax: that the run time of each task is known, and that tasks only use the CPU.

Incorporating I/O

First of all, let's remove assumption 4, that a process only uses the CPU. Naturally this is not the case: processes also access other hardware.

The moment a process requests an I/O operation, it enters the blocked state, waiting for the I/O to complete. If the I/O goes to a hard disk, the operation can take several ms or longer, and the processor would otherwise sit idle. During this time the scheduler can occupy the processor with some other process. The scheduler also has a decision to make when the I/O completes: an interrupt fires, and the OS moves the process that issued the I/O back into the ready state.
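
The bookkeeping behind this can be sketched as a tiny state-transition table (hypothetical names, just to make the paragraph concrete):

    from enum import Enum

    class State(Enum):
        READY = 1    # can be put on the CPU by the scheduler
        RUNNING = 2  # currently on the CPU
        BLOCKED = 3  # waiting for an I/O to complete

    transitions = {
        (State.RUNNING, "io_request"): State.BLOCKED,  # CPU freed for another job
        (State.BLOCKED, "io_done"):    State.READY,    # driven by the interrupt
        (State.READY,   "dispatch"):   State.RUNNING,  # the scheduler's choice
    }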

Let's consider an example with two tasks, each needing 50 ms of CPU time. Task A, however, issues an I/O request every 10 ms (and each I/O also takes 10 ms to complete), while process B simply uses the CPU for 50 ms with no I/O.

[Figure: A runs to completion, including its I/O waits, before B starts]

In this example, we will use the STCF scheduler. How does the scheduler behave with a process like A on it? Naively, it does the following: it runs process A to completion, then process B.

[Figure: overlap — B runs on the CPU while A waits for I/O]

The traditional approach to this problem is to treat each 10-ms CPU burst of process A as a separate task. Then, under the STCF algorithm, the choice between a 50-ms task and a 10-ms task is obvious: the 10-ms sub-job of A runs first. When that sub-job completes, process B is started and A's I/O is submitted. When the I/O completes, the next 10-ms sub-job of A preempts B. This achieves overlap: the CPU serves another process while the first one waits for I/O. As a result, the system is better utilized: while interactive processes wait for I/O, other processes can run on the CPU.
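
The gain from overlap can be checked with simple arithmetic (a sketch assuming, as in the figure, four 10 ms I/O waits between A's five 10 ms CPU bursts):

    a_bursts, burst_ms, io_ms, b_cpu_ms = 5, 10, 10, 50

    # Naive: run A to completion (CPU bursts plus idle I/O waits), then B.
    naive = a_bursts * burst_ms + (a_bursts - 1) * io_ms + b_cpu_ms
    # Overlap: B runs during A's I/O waits, so the CPU never idles.
    overlapped = a_bursts * burst_ms + b_cpu_ms

    print(naive, overlapped)  # 140 100: everything finishes 40 ms sooner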

The oracle is no more

Now let's try to get rid of the assumption that task run times are known. This is the worst and most unrealistic assumption on the whole list: in a typical general-purpose OS, the system usually knows very little about how long each task will run. So how do we build a scheduler without knowing how long a task will execute? Perhaps we can use some ideas from RR to solve this problem?

Conclusion

We covered the basic ideas of task scheduling and looked at two families of schedulers. The first launches the shortest tasks first, thereby optimizing turnaround time; the second alternates between all tasks equally, optimizing response time. Each family is bad where the other is good. We also looked at how overlapping CPU and I/O use can improve utilization, but this did not solve the problem of the OS's lack of clairvoyance. In the next lesson, we'll look at a scheduler that examines the immediate past and tries to predict the future. It's called the multi-level feedback queue.
