Concurrency and parallelism - from threads, thread pools to tasks

Concurrency and Parallelism

  Usually we say that concurrency refers to multiple events occurring at the same time, and parallelism refers to multiple tasks being executed together. I prefer another description: an operation is concurrent when it starts and ends independently of the control flow that invoked it, so that its work proceeds alongside the current control flow. Parallelism refers to decomposing a problem into smaller parts and initiating the processing of each part asynchronously, so that the parts can be processed concurrently.

  In addition, concurrency does not refer only to multithreaded programming; multithreading is just one form of concurrent programming. In C#, asynchronous programming, parallel programming, TPL Dataflow, reactive programming, and so on are all concurrent programming techniques.

  Concurrent programming is not exclusive to large servers. In many situations our programs need to respond promptly to user operations, especially in scenarios such as reading and writing data and communicating with services. This is one of the purposes of concurrent programming.

Thread

  Threads are the most fundamental means of parallelizing an application and distributing asynchronous operations; they are the lowest level of abstraction available to user-mode programs. Threads provide little support for structure and control, and programming against them directly is the older approach.

  In C#, we usually create a thread by passing a method delegate to the constructor of the Thread class, then call Start() to begin execution. The Start method can also pass an argument to the thread's delegate (via the ParameterizedThreadStart overload), and we can likewise use the closure mechanism of a lambda expression to pass parameters.

static void Print()
{
    Console.WriteLine("Hello World");
}

Thread t = new Thread(Print);
t.Start();
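Both parameter-passing styles mentioned above can be sketched as follows (the names Record, messages, and captured are illustrative, not from the original text):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Collect output in a thread-safe queue so the result is observable.
var messages = new ConcurrentQueue<string>();

// Option 1: a ParameterizedThreadStart target receives its argument
// from Start(object), at the cost of an untyped parameter.
void Record(object message) => messages.Enqueue((string)message);

Thread t1 = new Thread(Record);
t1.Start("from Start()");

// Option 2: a lambda closure captures local variables directly (type-safe).
string captured = "from a closure";
Thread t2 = new Thread(() => messages.Enqueue(captured));
t2.Start();

t1.Join();
t2.Join();
Console.WriteLine(messages.Count); // 2 (both messages arrived, in either order)
```

The closure form is generally preferred because it avoids the cast from object.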

Some other methods of thread objects:

  • Sleep(): tells the operating system not to allocate a time slice to the thread for the specified duration; the actual sleep time is usually inexact. It is often used to simulate high-latency operations and to spare the processor from meaningless busy-waiting. A value of 0 means the thread gives up the remainder of its current time slice. In task-based asynchronous programming, the usual way to introduce a delay is await Task.Delay().
  • Join(): makes the current thread wait for this thread to finish before continuing. An optional parameter specifies the maximum time to wait.
  • Abort(): attempts to destroy the thread by injecting a ThreadAbortException into it; even if the exception is caught and swallowed, it is automatically re-thrown to ensure the thread dies. (Note that Thread.Abort is obsolete and throws PlatformNotSupportedException on .NET Core and .NET 5+.) Using Abort() is not recommended, for the following reasons:
  1. If the thread's control point is inside a finally block, the runtime will not try to raise the exception there, because the thread may be performing critical cleanup that should not be interrupted. The same applies while executing unmanaged code, where an abort could corrupt the CLR itself. The CLR waits until the control point leaves these regions before attempting to raise the exception, but even then the effect is not guaranteed.
  2. The control point may be inside a lock block. The lock cannot prevent the exception from occurring, so the synchronization it protects can be left in a broken state, compromising thread safety and producing unpredictable results.
  3. It may corrupt data structures of the process or of the base class library, causing errors elsewhere.
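The Sleep() and Join() behaviors described above can be sketched briefly (the 200 ms delay and variable names are illustrative):

```csharp
using System;
using System.Threading;

Thread worker = new Thread(() =>
{
    // Simulate a slow operation; Sleep yields the processor
    // instead of burning cycles in a busy wait.
    Thread.Sleep(200);
});
worker.Start();

// Join with a timeout returns false if the thread is still running.
bool finishedQuickly = worker.Join(10);
Console.WriteLine(finishedQuickly); // False (the worker sleeps far longer than the timeout)

// Join without a timeout blocks until the thread has exited.
worker.Join();
Console.WriteLine(worker.IsAlive);  // False
```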

  The inconveniences of working with raw threads go far beyond these. For example, suppose we need to find all prime numbers between 1 and 1,000,000; the problem is easy to parallelize. If our computer has 8 cores, we might create 8 threads, let each thread test 125,000 numbers, and merge the primes each one finds. But is it really that simple?

  1. Should an 8-core computer always use 8 threads? Would that still make sense if we only needed the primes below 20?
  2. How do we ensure the threads carry equal workloads? Is the work needed to test 1-125,000 for primality really equal to the work needed for 875,001-1,000,000?
  3. What if a thread hits an error and terminates unexpectedly?
  4. How do we synchronize the aggregation of each thread's results?

  Such a program has no scalability at all, and solving these problems by hand is not easy. The Task Parallel Library (TPL) is a framework that does exactly this: it manages the number of threads and their load, produces reliable results and error reports, and cooperates easily with other work.

  To move away from manual thread management, the most natural first step is to turn to a thread pool.

Thread pool - ThreadPool

  A thread pool is a component that manages a set of threads for executing work items. Instead of creating a new thread for each task, it queues the task and lets an idle pool thread execute it. Distributing work this way reduces the cost of thread creation and destruction and avoids an excess of threads.

  For example, in the problem of finding the primes within 1-1,000,000, we can divide the data source into smaller chunks and queue each chunk as a work item on the thread pool. This gives the program better scalability and improved performance.

static bool IsPrime(int number)
{
    if (number < 2) return false;
    for (int i = 2; i * i <= number; i++)
        if (number % i == 0) return false;
    return true;
}

static IEnumerable<int> PrimesInRange(int start, int end)
{
    List<int> primes = new List<int>();
    const int chunkSize = 100;
    int complete = 0;
    ManualResetEvent allDone = new ManualResetEvent(false);
    // Round up so a final partial chunk is not dropped.
    int chunks = (end - start + chunkSize - 1) / chunkSize;
    for (int i = 0; i < chunks; i++)
    {
        int chunkStart = start + i * chunkSize;
        int chunkEnd = Math.Min(chunkStart + chunkSize, end);
        ThreadPool.QueueUserWorkItem(_ =>
        {
            for (int number = chunkStart; number < chunkEnd; number++)
            {
                if (IsPrime(number))
                {
                    // List<T> is not thread-safe; guard concurrent Add calls.
                    lock (primes)
                    {
                        primes.Add(number);
                    }
                }
            }
            // Signal once the last chunk finishes.
            if (Interlocked.Increment(ref complete) == chunks)
            {
                allDone.Set();
            }
        });
    }
    allDone.WaitOne();
    return primes; // note: chunk ordering is nondeterministic
}

Disadvantages of the thread pool

  The thread pool assumes that work items are short-running, so it can keep the processor focused on finishing tasks instead of inefficiently interleaving many long tasks through time slices. Short work items also let the pool reclaim threads promptly.

  Unlike Thread and Task, the thread pool does not hand back a reference to the thread performing a given job, so we cannot manage that thread or synchronize with it. Microsoft's C# team believes that what developers really should work with are higher-level abstractions that treat threads and thread pools as implementation details. That abstraction is the Task Parallel Library.

Task Parallel Library (TPL)

What is a task?

  TPL provides an object that represents asynchronous work: a block of code that needs to be executed asynchronously, abstracted as a Task. It gives us a convenient API for interacting with that work. This looks very similar to a delegate; isn't a delegate also an object that encapsulates a piece of code? In essence, a task turns the synchronous execution of a delegate into an asynchronous one.

Creating a Task object is very simple:

  1. Using the Task constructor, which looks similar to Thread but whose meaning and underlying implementation are completely different:
    Task t1 = new Task(Print);
    t1.Start();
  2. Using the Task.Factory.StartNew factory method:
    Task t2 = Task.Factory.StartNew(Print);
  3. Using the Task.Run method, a shortcut for Task.Factory.StartNew:
    Task t3 = Task.Run(Print);

Wait for all tasks to finish:

    Task.WaitAll(t1, t2, t3);
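Tasks can also produce values. A minimal sketch of blocking on a result with Task<TResult> (the summation here is just an illustrative workload):

```csharp
using System;
using System.Threading.Tasks;

// Task<TResult> represents asynchronous work that yields a value.
Task<int> sum = Task.Run(() =>
{
    int total = 0;
    for (int i = 1; i <= 100; i++) total += i;
    return total;
});

// Reading Result blocks the caller until the task has completed.
Console.WriteLine(sum.Result); // 5050
```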

  The Task Parallel Library can be seen as yet another abstraction over the thread pool: it hides the low-level code that interacts with the pool and exposes a more convenient, fine-grained API. A Task offers the following capabilities:

  1. A task scheduler decides which thread executes a given task. By default tasks are queued to the CLR thread pool, but they can also be sent to a specific thread, such as the UI thread.
  2. We can wait for a task to complete and obtain its result.
  3. Continuations, often called callbacks, can be attached. A continuation need not run only when the antecedent task completes normally; we can attach continuations for success or for failure, and specify whether the continuation runs on the antecedent's thread or on another one.
  4. Exceptions can be handled, including those of a single task, of hierarchies of parent and child tasks, and of other related tasks.
  5. Tasks can be cancelled.
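Points 2 and 3 above can be sketched with ContinueWith (the tasks and names here are illustrative):

```csharp
using System;
using System.Threading.Tasks;

Task<int[]> compute = Task.Run(() => new[] { 2, 3, 5, 7 });

// A continuation that runs only if the antecedent task succeeded.
Task<int> report = compute.ContinueWith(
    antecedent => antecedent.Result.Length,
    TaskContinuationOptions.OnlyOnRanToCompletion);

// A separate continuation that runs only if the antecedent faulted.
compute.ContinueWith(
    antecedent => Console.WriteLine(antecedent.Exception),
    TaskContinuationOptions.OnlyOnFaulted);

// Waiting on the success continuation also waits on the antecedent.
Console.WriteLine(report.Result); // 4
```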

  In addition, there is a very useful helper method, Parallel.Invoke, which executes a set of delegates and waits for all of them to return.

    Parallel.Invoke(
        () => Print(),
        () => Print()
    );


Origin blog.csdn.net/qq_40404477/article/details/102165017