Well-Fed Philosophers, or Concurrent Programming in .NET
Let's see how concurrent and parallel programming works in .NET, using the Dining Philosophers Problem as an example. The plan is to go from the synchronization of threads and processes all the way to the actor model (in the following parts). The article may be useful as a first introduction or as a refresher.
Why bother at all? Transistors are reaching their minimum size, Moore's law is running into the speed-of-light limit, and so growth now shows up in quantity: more transistors can be made, not faster ones. At the same time, the amount of data keeps growing, and users expect an immediate response from their systems. In this situation "normal" programming, with a single thread of execution, is no longer effective. The problem of simultaneous or concurrent execution has to be solved somehow. Moreover, this problem exists at different levels: at the level of threads, at the level of processes, and at the level of machines on a network (distributed systems). .NET has high-quality, time-tested technologies for solving such problems quickly and efficiently.
Task
Edsger Dijkstra posed this problem to his students back in 1965. The established formulation is as follows. There is some number (usually five) of philosophers and the same number of forks. They sit at a round table with forks between them. The philosophers can eat from their plates of endless food, think, or wait. To eat, a philosopher needs to pick up two forks (the last one shares a fork with the first). Picking up and putting down a fork are two separate actions. All philosophers are silent. The task is to find an algorithm such that all of them keep thinking and stay well fed, even 54 years later.
First, let's try to solve this problem using shared space. The forks lie on the common table, and the philosophers simply take them when they are free and put them back. Here we run into synchronization problems: when exactly should a fork be taken? What if there is no free fork? And so on. But first, let's start the philosophers.
To start the threads, we use the thread pool via the Task.Run method:
var cancelTokenSource = new CancellationTokenSource();
Action<int> create = (i) => RunPhilosopher(i, cancelTokenSource.Token);
for (int i = 0; i < philosophersAmount; i++)
{
int icopy = i; // capture a copy: the loop variable changes under the lambda
// Queue the task to the thread pool. The method does not start immediately,
// but waits for a free pool thread: an asynchronous start.
philosophers[i] = Task.Run(() => create(icopy), cancelTokenSource.Token);
}
The thread pool is designed to optimize the creation and deletion of threads. It has a queue of tasks, and the CLR creates or removes threads depending on the number of those tasks. There is one pool for all AppDomains. This pool should be used almost always, because there is no need to bother with creating and deleting threads, their queues, and so on. It is possible to go without the pool, but then you have to use Thread directly; this is useful when you need to change a thread's priority, when you have a long-running operation, when you need a Foreground thread, etc.
In other words, the System.Threading.Tasks.Task class is the same Thread, but with all sorts of conveniences: the ability to run a task after a block of other tasks, return tasks from functions, conveniently interrupt them, and so on. They exist to support the async / await constructs (the Task-based Asynchronous Pattern, syntactic sugar for waiting on IO operations). We'll talk about this later.
A CancellationTokenSource is needed here so that a thread can terminate itself on a signal from the calling thread.
Sync Issues
Blocked Philosophers
Okay, we know how to create threads, let's try to have lunch:
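A minimal sketch of such a dinner, assuming each fork is a cell in an int array claimed with an atomic compare-and-swap (the array and method names are assumptions based on the description that follows):

```csharp
using System.Threading;

// 0 means the fork is free; otherwise it holds the taker's number + 1.
int[] forks = new int[5];

// Atomically take a fork if it is free; true on success.
bool TakeFork(int fork, int philosopher) =>
    Interlocked.CompareExchange(ref forks[fork], philosopher + 1, 0) == 0;

void RunDeadlock(int i, CancellationToken token)
{
    while (!token.IsCancellationRequested)
    {
        var spinWait = new SpinWait();
        // Spin until the left fork is ours, then spin for the right one.
        while (!TakeFork(i, i)) spinWait.SpinOnce();
        while (!TakeFork((i + 1) % forks.Length, i)) spinWait.SpinOnce();

        // Eat, then put both forks back; a plain write is fine, we own them.
        forks[i] = 0;
        forks[(i + 1) % forks.Length] = 0;
    }
}
```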
Here we first try to take the left fork, then the right one, and if it works out, we eat and put them back. Taking one fork is atomic, i.e. two threads cannot take the same fork at once (otherwise it would go wrong: the first thread reads that the fork is free, the second does too, the first takes it, and the second takes it as well). For this we use Interlocked.CompareExchange, which is implemented with a processor instruction (TSL, XCHG) that locks a piece of memory for an atomic sequential read and write. And SpinWait is equivalent to a while(true) loop, only with a little "magic": the thread occupies the processor (Thread.SpinWait) but sometimes yields control to another thread (Thread.Yield) or goes to sleep (Thread.Sleep).
But this solution doesn't work, because the threads are soon (within a second, for me) blocked: all the philosophers take their left fork, but not the right one. The forks array then holds the values: 1 2 3 4 5.
The figure shows the blocked threads (a deadlock). Green is execution, red is synchronization, gray means the thread is sleeping. The diamonds mark the start times of the Tasks.
The Hunger of the Philosophers
Although thinking doesn't require much food, hunger will make anyone give up philosophy. Let's try to simulate thread starvation in our problem. Starvation is when a thread runs but does no significant work; in other words, it is the same deadlock, except that now the thread is not sleeping but actively looking for something to eat, and there is no food. To avoid frequent blocking, we will put the fork back if we could not take the other one.
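A sketch of this "polite" variant, under the same assumptions about the forks array (a "bad" philosopher simply omits the line that puts the left fork back):

```csharp
using System.Threading;

int[] forks = new int[5]; // 0 = free, otherwise the holder's number + 1

bool TakeFork(int fork, int philosopher) =>
    Interlocked.CompareExchange(ref forks[fork], philosopher + 1, 0) == 0;

void PutFork(int fork) => forks[fork] = 0;

void RunStarvation(int i, CancellationToken token)
{
    int left = i, right = (i + 1) % forks.Length;
    while (!token.IsCancellationRequested)
    {
        var spinWait = new SpinWait();
        bool hasForks = false;
        while (!hasForks)
        {
            if (TakeFork(left, i))
            {
                if (TakeFork(right, i)) hasForks = true;
                else PutFork(left); // a "bad" philosopher omits this line
            }
            if (!hasForks) spinWait.SpinOnce();
        }
        // eat, then release both forks
        PutFork(left);
        PutFork(right);
    }
}
```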
The important thing about this code is that two of the four philosophers forget to put down their left fork. As a result they eat more food while the others begin to starve, even though the threads have the same priority. The good ones are not starving completely here, because the bad philosophers do put their forks back sometimes. It turns out the good ones eat about 5 times less than the bad ones, so a small error in the code leads to a drop in performance. It is also worth noting that a rare situation is possible in which all the philosophers take their left fork, find no right one, put the left one back, wait, take the left one again, and so on. This situation is also starvation, closer to a livelock. I failed to reproduce it. Below is a picture of a situation in which two bad philosophers have taken both forks and two good ones are starving.
Here you can see the threads waking up occasionally and trying to get the resource. Two of the four cores are doing nothing (the green graph above).
Death of a Philosopher
Well, another problem that can interrupt the philosophers' glorious dinner is if one of them suddenly dies with forks in his hands (and they bury him like that). Then his neighbors are left without lunch. You can come up with the example code for this case yourself; for instance, a NullReferenceException is thrown after the philosopher takes his forks. By the way, the exception will not be handled, and the calling code will not simply catch it (that is what AppDomain.CurrentDomain.UnhandledException etc. are for). Therefore, error handlers with graceful termination are needed inside the threads themselves.
Waiter
Okay, how do we solve these problems of deadlock, starvation, and death? We will allow only one philosopher at a time to reach for the forks, adding mutual exclusion of threads at this spot. How? Suppose a waiter stands next to the philosophers and gives any one philosopher permission to take the forks. How to implement this waiter, and how the philosophers will ask him, are the interesting questions.
The simplest way is for the philosophers to simply keep asking the waiter for access to the forks. I.e. now the philosophers will not wait for a nearby fork, but will wait for or ask the waiter. At first we use only user space for this; in user space we do not use interrupts to call any procedures from the kernel (more on those below).
Solutions in user space
Here we do the same thing we used to do with one fork and two philosophers: we spin in a loop and wait. But now it applies to all the philosophers and, as it were, only one fork; i.e. you could say that only the philosopher who took this "golden fork" from the waiter gets to eat. For this we use SpinLock.
SpinLock is a locking construct with, roughly speaking, the same while(true) { if (!lock) break; }, but with even more "magic" than in SpinWait (which is used inside it). It knows how to count waiters, put them to sleep a little, and so on; in general, it does everything it can to optimize. But we must remember that it is still the same active spin loop, which eats processor resources and occupies the thread, and which can lead to starvation if one of the philosophers gets a higher priority than the others but has no golden fork (the Priority Inversion problem). Therefore, we use it only for very, very short changes to shared memory, with no third-party calls, nested locks, or other surprises.
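A sketch of the waiter built on SpinLock (the fork-taking and eating details are elided; the important part is the Enter/Exit pattern with try/finally, and keeping the critical section short):

```csharp
using System.Threading;

var spinLock = new SpinLock(); // the waiter's "golden fork"

void RunSpinLock(int i, CancellationToken token)
{
    while (!token.IsCancellationRequested)
    {
        bool hasLock = false;
        try
        {
            // Only the philosopher holding the golden fork proceeds;
            // everyone else spins inside Enter.
            spinLock.Enter(ref hasLock);
            // take the forks and eat: this section must stay very short
        }
        finally
        {
            if (hasLock) spinLock.Exit();
        }
    }
}
```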
A drawing for SpinLock. The threads are constantly "fighting" over the golden fork. There are failures (the highlighted area in the figure). The cores are not fully utilized: only about 2/3 by these four threads.
Another solution here would be to use only Interlocked.CompareExchange with the same active waiting as shown in the code above (with the starving philosophers), but, as already said, this could theoretically lead to blocking.
Regarding Interlocked, it should be noted that it has not only CompareExchange but also other methods for atomic read-and-write. By retrying the change whenever another thread managed to make its own change in between (the order read 1, read 2, write 2, write 1 is bad), CompareExchange can be used for arbitrarily complex modifications of a single value (the Interlocked Anything pattern).
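A sketch of the Interlocked Anything pattern; the helper name is an assumption, but the retry loop is the standard shape:

```csharp
using System;
using System.Threading;

// Apply an arbitrary pure function to a shared int atomically: read the old
// value, compute the new one, and publish it only if nobody got in between;
// otherwise retry.
int InterlockedModify(ref int value, Func<int, int> change)
{
    int oldValue, newValue;
    do
    {
        oldValue = value;
        newValue = change(oldValue);
    } while (Interlocked.CompareExchange(ref value, newValue, oldValue) != oldValue);
    return newValue;
}
```

For example, InterlockedModify(ref counter, x => x * 3) multiplies a shared counter atomically, something Interlocked has no dedicated method for.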
Kernel Mode Solutions
To avoid wasting resources in a loop, let's see how to block a thread instead. In other words, continuing our example, let's see how the waiter puts a philosopher to sleep and wakes him only when needed. First, let's look at doing this through the kernel mode of the operating system. All the constructs there are often slower than the user-space ones, several times slower: for example, an AutoResetEvent can be about 53 times slower than a SpinLock [Richter]. But with their help you can synchronize processes across the whole system, managed or not.
The basic construct here is the semaphore, proposed by Dijkstra more than half a century ago. A semaphore is, simply put, a non-negative integer managed by the system, with two operations on it: increment and decrement. If the decrement fails because the value is zero, the calling thread blocks. When the number is incremented by some other active thread or process, waiting threads are let through, and the semaphore is again decremented by the number that passed. You can picture trains at a bottleneck controlled by a semaphore. .NET offers several constructs with similar functionality: AutoResetEvent, ManualResetEvent, Mutex, and Semaphore itself. We will use AutoResetEvent, the simplest of these constructs: it has only two values, 0 and 1 (false, true). Its WaitOne() method blocks the calling thread if the value is 0, and if it is 1, lowers it to 0 and lets the thread through. The Set() method raises the value to 1 and lets one waiter through, which again lowers it to 0. It acts like a subway turnstile.
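The turnstile behavior in a few lines (a toy demo, not part of the philosophers' code):

```csharp
using System.Threading;

var turnstile = new AutoResetEvent(true); // created open (value 1)

turnstile.WaitOne();  // passes and closes the turnstile (1 -> 0)
// A second WaitOne() here would block this thread until someone calls Set().
turnstile.Set();      // opens it again (0 -> 1), letting exactly one waiter through
```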
Let's complicate the solution and use a lock per philosopher rather than one for everyone at once. I.e. now several philosophers can eat at the same time, not just one. But we again block access to the table in order to take the forks correctly, avoiding race conditions.
To understand what happens here, consider the case when a philosopher fails to take the forks; his actions are then as follows. He waits for access to the table. Having received it, he tries to take the forks. It doesn't work out. He gives up access to the table (mutual exclusion) and passes through his "turnstile" (an AutoResetEvent; they are initially open). He goes around the loop again, because he has no forks. He tries to take them and stops at his turnstile. Some luckier neighbor on the right or left, having finished eating, unblocks our philosopher by "opening his turnstile". Our philosopher passes through it a second time (and it closes behind him). He tries to take the forks a third time. Success. And he passes through his turnstile to dine.
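The steps above can be sketched roughly like this (the fork helpers and the neighbor arithmetic are assumptions; the original code may differ in detail):

```csharp
using System;
using System.Linq;
using System.Threading;

int[] forks = new int[5];                 // 0 = free, otherwise holder + 1
var tableLock = new AutoResetEvent(true); // exclusive access to the table
// One turnstile per philosopher; Select gives each his own object
// (Enumerable.Repeat would share a single AutoResetEvent between everyone).
var turnstiles = Enumerable.Range(0, 5)
    .Select(_ => new AutoResetEvent(true)).ToArray();

bool TakeFork(int fork, int philosopher) =>
    Interlocked.CompareExchange(ref forks[fork], philosopher + 1, 0) == 0;

void PutFork(int fork) => forks[fork] = 0;

void RunWaiter(int i, CancellationToken token)
{
    int left = i, right = (i + 1) % forks.Length;
    while (!token.IsCancellationRequested)
    {
        bool hasForks = false;
        while (!hasForks)
        {
            tableLock.WaitOne();             // wait for access to the table
            if (TakeFork(left, i))
            {
                if (TakeFork(right, i)) hasForks = true;
                else PutFork(left);          // no right fork: put the left back
            }
            tableLock.Set();                 // give the table back
            if (!hasForks)
                turnstiles[i].WaitOne();     // sleep until a neighbor opens it
        }

        // eat, then return the forks and wake both neighbors
        tableLock.WaitOne();
        PutFork(left);
        PutFork(right);
        turnstiles[(i + forks.Length - 1) % forks.Length].Set();
        turnstiles[(i + 1) % forks.Length].Set();
        tableLock.Set();
    }
}
```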
When there are accidental errors in such code (and there always are), for example a neighbor specified incorrectly, or the same AutoResetEvent object created for everyone (Enumerable.Repeat), the philosophers end up waiting for the developers, because finding errors in such code is quite a difficult task. Another problem with this solution is that it doesn't guarantee that some philosopher won't go hungry.
Hybrid Solutions
We've looked at two approaches to synchronization: staying in user mode and spinning in a loop, and blocking the thread through the kernel. The first method is good for short locks, the second for long ones. Often you first want to wait briefly in a loop for a variable to change, and then block the thread when the wait drags on. This approach is implemented in the so-called hybrid constructs. These are the same constructs as in kernel mode, but now with a user-mode loop: SemaphoreSlim, ManualResetEventSlim, etc. The most popular construct here is Monitor, because C# has the well-known lock syntax for it. Monitor is the same semaphore with a maximum value of 1 (a mutex), but with support for waiting in a loop, recursion, the Condition Variable pattern (more on that below), and so on. Let's look at a solution using it.
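A sketch of such a solution (the names are assumptions; the key parts are Monitor.Wait, which releases the lock while sleeping, and Monitor.PulseAll, which wakes all waiters at once):

```csharp
using System.Threading;

object syncObject = new object();   // a hidden private lock object
bool[] forkTaken = new bool[5];     // guarded by syncObject

void RunMonitor(int i, CancellationToken token)
{
    int left = i, right = (i + 1) % forkTaken.Length;
    while (!token.IsCancellationRequested)
    {
        Monitor.Enter(syncObject);
        try
        {
            // Wait until both forks are free; Wait releases the lock while
            // sleeping and reacquires it when pulsed.
            while (forkTaken[left] || forkTaken[right])
                Monitor.Wait(syncObject);
            forkTaken[left] = forkTaken[right] = true;
        }
        finally { Monitor.Exit(syncObject); }

        // eat

        Monitor.Enter(syncObject);
        try
        {
            forkTaken[left] = forkTaken[right] = false;
            Monitor.PulseAll(syncObject); // wake everyone waiting for forks
        }
        finally { Monitor.Exit(syncObject); }
    }
}
```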
Here we again block the entire table for access to the forks, but now, when someone finishes eating, we unblock all the threads at once rather than just the neighbors. I.e. first someone eats and blocks his neighbors, and when he finishes but immediately wants to eat again, he goes into blocking and wakes his neighbors, because his waiting time is shorter.
This is how we avoid deadlocks and the starvation of any philosopher. We use a loop for a short wait and block the thread for a long one. Unblocking everyone at once is slower than unblocking only a neighbor, as in the AutoResetEvent solution, but the difference should not be large, because the threads stay in user mode first.
The lock syntax has nasty surprises; [Richter] and [Eric Lippert] recommend using Monitor directly. One of them is that lock always exits the Monitor, even when there was an exception, in which case another thread could observe a changed shared-memory state. In such cases it is often better to go into deadlock or somehow terminate the program safely. Another surprise is that Monitor uses synchronization blocks (SyncBlock), which are present in all objects. Therefore, if an inappropriate object is chosen, you can easily get a deadlock (for example, if you lock on an interned string). For this reason we always use a hidden private object.
The Condition Variable pattern allows you to express waiting for some complex condition more concisely. In .NET it is incomplete, in my opinion, because in theory there should be several queues on several variables (as in POSIX Threads), not one queue per lock. Then one could make a queue for every philosopher. But even in this form it allows you to shrink the code.
Many Philosophers or async / await
Okay, now we can block threads efficiently. But what if we have a lot of philosophers? 100? 10,000? Say we received 100,000 requests to a web server. Creating a thread per request would be overhead, because that many threads will not run in parallel; only as many as there are logical cores will run (I have 4), and all the rest will just take resources away. One solution to this problem is the async / await pattern. Its idea is that a function does not hold a thread while it is waiting for something in order to continue. And when that something happens, the function resumes its execution (but not necessarily on the same thread!). In our case, we will wait for a fork.
SemaphoreSlim has a WaitAsync() method for this. Here is an implementation using this pattern.
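A sketch with SemaphoreSlim.WaitAsync (the method body is an assumption; the point is that the await releases the thread instead of blocking it):

```csharp
using System.Threading;
using System.Threading.Tasks;

var tableSemaphore = new SemaphoreSlim(1, 1); // the waiter: one diner at a time

async Task RunDinnerAsync(int i, CancellationToken token)
{
    while (!token.IsCancellationRequested)
    {
        // The thread is returned to the pool here; the state machine resumes
        // the method (possibly on another thread) when the semaphore is free.
        await tableSemaphore.WaitAsync(token);
        try
        {
            // take the forks and eat
        }
        finally
        {
            tableSemaphore.Release();
        }
    }
}
```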
A method with async / await is translated into a tricky state machine that immediately returns its internal Task. Through it, you can wait for the method to complete, cancel it, and do everything else you can do with a Task. Inside the method, the state machine controls execution. The bottom line is that if there is no delay, execution is synchronous, and if there is, the thread is released. For a better understanding, it is worth looking at this state machine. You can build chains of these async / await methods.
Let's test it: 100 philosophers working on a machine with 4 logical cores, for 8 seconds. The previous solution with Monitor ran only the first 4 threads; the rest did not run at all. Each of those 4 threads was idle for about 2 ms. The async / await solution ran all 100, with an average wait of 6.8 seconds each. Of course, in real systems being idle for 6 seconds is unacceptable, and it is better not to process that many requests this way. The Monitor solution turned out not to be scalable at all.
Conclusion
As these small examples show, .NET supports many synchronization constructs, but it is not always obvious how to use them. I hope this article was helpful. This is the end for now, but plenty of interesting things remain: thread-safe collections, TPL Dataflow, reactive programming, the Software Transactional Memory model, and more.