1. Detailed explanation of JMM for concurrent programming

1. Modern computer theoretical model and working principle

The modern computer model is based on the von Neumann model. When the computer runs, it first fetches an instruction from memory; after the controller decodes it, the machine fetches the operands the instruction requires from memory, performs the specified arithmetic or logical operation, and writes the result back to memory at the given address. It then fetches the next instruction and carries it out under the controller's direction, and so on, until a halt instruction is encountered.
Programs are stored in the same way as data: instructions are fetched one after another in program order, and the operations they specify are carried out automatically. This is the most basic working model of a computer. The principle was first proposed by the Hungarian-American mathematician John von Neumann in 1945, which is why it is called the von Neumann computer model.

The five core components of a computer:

  • Controller (Control): The central nervous system of the computer. It interprets the control information specified by the program, schedules programs, data, and addresses accordingly, and coordinates the work of the computer's components and their access to memory and peripherals.

  • Arithmetic unit (Datapath): Performs arithmetic and logical operations on data — that is, it processes the data.

  • Memory (Memory): Stores programs, data, and various signals and commands, and supplies this information when needed.

  • Input (Input system): Input devices, together with output devices, are collectively called peripherals. An input device transfers programs, raw data, text, characters, control commands, or field-collected data into the computer. Common input devices include keyboards, mice, optical scanners, tape drives, disk drives, and optical disc drives.

  • Output (Output system): Output devices, like input devices, are an essential part of the computer. They deliver the computer's intermediate or final results — data, symbols, text, or control signals — to the outside world. Common output devices include display terminals (CRTs), printers, laser printers, plotters, tape drives, and optical disc drives.

    [Figure: Von Neumann computer model]
    [Figure: modern computer hardware structure]
    The internal structure of the CPU is divided into:

  • Control unit

  • Arithmetic unit

  • Storage unit
    [Figure: internal structure of the CPU]
    1. Control unit
    The control unit is the command and control center of the entire CPU. It consists of the instruction register (IR), the instruction decoder (ID), and the operation controller (OC), and it is essential for coordinating the orderly work of the whole computer. Following the user's program, it fetches each instruction from memory in turn into the instruction register, determines what operation should be performed by decoding (analyzing) the instruction, and then has the operation controller issue micro-operation control signals to the relevant components at the determined times. The operation controller mainly contains control logic such as the beat pulse generator, control matrix, clock pulse generator, reset circuit, and start-stop circuit.
    2. Arithmetic unit
    The arithmetic unit is the core of the processor's computation. It performs arithmetic operations (basic operations such as addition, subtraction, multiplication, and division) and logical operations (such as shifts, logical tests, and comparisons of two values). In contrast to the control unit, the arithmetic unit acts on command: every operation it performs is directed by control signals from the control unit, so it is an execution unit.
    3. Storage unit
    The storage unit comprises the CPU's on-chip cache and register set; it is where data is held temporarily inside the CPU, storing data waiting to be processed or results already produced, with very short access times. Registers are internal CPU components with very high read/write speeds, so transfers between registers are extremely fast. Using registers reduces the number of memory accesses the CPU must make, and thus raises its working speed. The register set is divided into special-purpose registers, whose roles are fixed and which each hold specific data, and general-purpose registers, which have broad uses and can be designated by the programmer.
    Computer hardware multi-CPU architecture:
    [Figure: multi-CPU architecture]
    Multi-CPU
    A modern computer may contain two or more CPUs. Running many programs (processes) on a single CPU means frequent process context switching, because even a multi-core CPU is still one chip whose cores are shared among all processes, and those switches are very expensive.
    CPU multi-core
    Besides its processor cores, a modern CPU contains registers, storage such as the L1/L2/L3 caches, floating-point and integer arithmetic units and other auxiliary computing devices, and an internal bus. A multi-core CPU has multiple processor cores on one chip. What is the benefit? Suppose we run a multi-threaded program: because the threads belong to one process, they need to share some variables. If the machine instead used multiple single-core CPUs, the program's threads would have to communicate frequently over the external bus between chips and deal with inconsistency between the different CPUs' caches. In this scenario a multi-core, single-CPU architecture has a great advantage: communication stays on the internal bus and the cores share the same cache.
    CPU registers
    Every CPU contains a set of registers, the innermost memory of the CPU. The CPU can perform operations on registers much faster than on main memory, because register access time is far shorter than main-memory access time.
    CPU cache
    The CPU cache (high-speed cache memory) is a small but fast memory that sits between the CPU and main memory. Because the CPU is much faster than main memory, fetching data directly from memory forces the CPU to wait. The cache therefore keeps data the CPU has just used or frequently reuses; when the CPU needs that data again it can be fetched directly from the cache, reducing the CPU's wait time and improving system efficiency. The cache is organized in levels:
    Level 1 Cache (L1 Cache) Level 2 Cache (L2 Cache) Level 3 Cache (L3 Cache)
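The effect of the cache can be felt from plain Java code. The sketch below (hypothetical class name `CacheTraversal`) traverses the same 2D array in row-major and column-major order: both loops compute the same sum, but row-major order walks consecutive addresses (cache-friendly) while column-major order strides a full row between accesses (cache-hostile), which on large arrays is typically noticeably slower.

```java
// Sketch: cache locality shown via traversal order. Same result, different
// memory-access pattern; timing differences are not asserted here, only hinted.
public class CacheTraversal {
    static long sumRowMajor(int[][] m) {
        long sum = 0;
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[i].length; j++)
                sum += m[i][j];        // consecutive addresses: good locality
        return sum;
    }

    static long sumColumnMajor(int[][] m) {
        long sum = 0;
        for (int j = 0; j < m[0].length; j++)
            for (int i = 0; i < m.length; i++)
                sum += m[i][j];        // strided addresses: poor locality
        return sum;
    }

    public static void main(String[] args) {
        int n = 1024;
        int[][] m = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                m[i][j] = 1;
        // Both traversals agree on the result; only the cache behavior differs.
        System.out.println(sumRowMajor(m) == sumColumnMajor(m));
    }
}
```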
    Main memory
    A computer also contains main memory, which all CPUs can access. Main memory is usually much larger than the caches inside the CPU.
    The process by which the CPU reads data
    Reading the value of a register takes a single step: a direct read.
    Reading a value from the L1 cache takes one to three steps (or more): lock the cache line, read the data, and unlock; if the line cannot be locked, the access is delayed.
    To read a value from the L2 cache, the CPU first tries the L1 cache; on an L1 miss, the L2 line is locked, the data is copied from L2 into L1, the L1 read procedure above is performed, and then the line is unlocked. Access to the L3 cache works the same way, except the data is first copied from L3 to L2, then from L2 to L1, and from L1 into the CPU.
    Fetching from main memory is the most complex: notify the memory controller to occupy bus bandwidth, lock the memory, issue the read request, wait for the response, save the returned data into L3 (or L2 if there is no L3), copy it from L3/L2 into L1 and then from L1 into the CPU, and finally unlock the bus.
    Problems in a multi-threaded environment
    Cache coherence. In a multiprocessor system, each processor has its own cache while all of them share the same main memory (MainMemory). Interposing caches resolves the speed mismatch between processor and memory, but it introduces a new problem: cache coherence (Cache Coherence). When the computations of several processors involve the same region of main memory, their cached copies of that data may become inconsistent — and when the copies are synchronized back to main memory, whose data should win? To keep the caches consistent, each processor must follow a protocol when reading and writing its cache. Such protocols include MSI, MESI (the Illinois protocol), MOSI, Synapse, Firefly, and the Dragon protocol.

[Figure]
Instruction reordering problem
To make full use of its internal execution units, a processor may execute the input code out of order (Out-Of-Order Execution). After the computation it reorganizes the out-of-order results so that the final result matches what sequential execution would produce, but it does not guarantee that individual statements are computed in the order they appear in the code. Therefore, if one computation depends on the intermediate result of another, that dependency cannot be guaranteed by code order alone. Similarly to the processor's out-of-order optimization, the Java virtual machine's just-in-time compiler performs an analogous instruction reordering (Instruction Reorder) optimization.
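A minimal Java sketch (hypothetical class name `ReorderExample`) of why reordering matters: without a happens-before edge, the two writes in `writer()` may be reordered by the compiler or CPU, so a reader that sees `ready == true` is still not guaranteed to see `data == 42`. Declaring `ready` volatile forbids that reordering — the write to `data` cannot be moved after the volatile write, and the read of `data` cannot be moved before the volatile read.

```java
public class ReorderExample {
    int data = 0;
    volatile boolean ready = false;   // remove "volatile" and the guarantee is gone

    void writer() {
        data = 42;                    // (1) ordinary write
        ready = true;                 // (2) volatile write: (1) cannot be moved after it
    }

    int reader() {
        while (!ready) { /* spin until the volatile write becomes visible */ }
        return data;                  // guaranteed to observe 42
    }

    public static void main(String[] args) throws InterruptedException {
        ReorderExample ex = new ReorderExample();
        Thread w = new Thread(ex::writer);
        Thread r = new Thread(() -> System.out.println(ex.reader()));
        r.start();
        w.start();
        w.join();
        r.join();
    }
}
```

Without `volatile`, the JIT and the hardware are free to publish `ready = true` before `data = 42` is visible, which is exactly the reordering hazard described above.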

2. What is a thread

When a modern operating system runs a program, it creates a process for it; for example, starting a Java program creates a Java process. The smallest unit of CPU scheduling in a modern operating system is the thread, also called a lightweight process (Light Weight Process). A process can contain multiple threads, each with its own attributes such as a program counter, stack, and local variables, and all able to access shared memory variables. The processor switches among these threads at high speed, giving the user the impression that they execute simultaneously.
Thread implementation can be divided into two categories:
1. User-Level Thread
2. Kernel-Level Thread
Before looking at thread classification, we need to understand the concepts of user space and kernel space, taking a 4 GB address space as an example:
[Figure: user space and kernel space in a 4 GB address space]
Linux reserves several page frames for kernel code and data structures; these pages are never swapped out to disk. Linear addresses from 0x00000000 to 0xC0000000 (PAGE_OFFSET) can be referenced by both user code and kernel code (i.e., user space). Linear addresses from 0xC0000000 (PAGE_OFFSET) to 0xFFFFFFFF can be accessed only by kernel code (i.e., kernel space). The kernel code and its data structures must all reside in this 1 GB address space, though the bigger consumer of that space is the virtual mapping of physical addresses.

This means that of the 4 GB address space, only 3 GB is available to user applications. A process runs in either user mode or kernel mode: user programs run in user mode, while system calls run in kernel mode. The two modes use different stacks — the ordinary user stack in user mode, and a fixed-size stack (usually one memory page) in kernel mode.

Each process has its own 3 GB of user space, and all processes share the 1 GB of kernel space. When a process enters kernel space from user space, it is no longer running in its own private address space. This is why thread context switching is often said to involve a switch from user mode to kernel mode.

User threads: threads implemented entirely in the user program, without kernel support; they do not depend on the operating system kernel. The application process uses a thread library that provides thread creation, synchronization, scheduling, and management. Because user threads are created and managed by the application via the thread library, no user-mode/kernel-mode switch is required, so they are fast. The kernel is unaware that multiple threads exist, so if one thread blocks, the entire process (including all of its threads) blocks. And since processor time slices are allocated per process, the execution time each thread receives is correspondingly reduced.

Kernel threads: all thread management is done by the operating system kernel, which saves each thread's state and context information. When a thread blocks in a system call, the kernel can schedule another thread of the same process to run. On a multiprocessor system, the kernel can assign several threads of one process to run on multiple processors, increasing the parallelism of the process. Because thread creation, scheduling, and management are done by the kernel, these operations are much slower than for user-level threads, though still faster than creating and managing processes. Most mainstream operating systems, such as Windows and Linux, support kernel-level threads.
The difference in principle is shown below:
[Figure: user-level vs. kernel-level threads]
The relationship between Java threads and system kernel threads:
[Figure: Java threads mapped to kernel threads]
Java threads
There are two ways to create threads in the JVM.

  1. new java.lang.Thread().start()
  2. Use JNI to attach a native thread to the JVM.
    For new java.lang.Thread().start(), the thread is actually created in the JVM only when start() is called. The main lifecycle steps are:
  1. Create the corresponding JavaThread instance
  2. Create the corresponding OSThread instance
  3. Create the actual native thread of the underlying operating system
  4. Prepare the corresponding JVM state, such as allocating ThreadLocal storage
  5. The underlying native thread starts running and calls the run() method of the object created from java.lang.Thread
  6. When that run() method returns, or terminates by throwing an exception, the native thread terminates
  7. Release the thread resources related to the JVM and clear the corresponding JavaThread and OSThread
    For attaching a native thread to the JVM via JNI, the main steps are:
  1. The native thread connects to the running JVM instance through the JNI call AttachCurrentThread
  2. The JVM creates the corresponding JavaThread and OSThread objects
  3. The corresponding java.lang.Thread object is created
  4. Once the java.lang.Thread object exists, JNI can call Java code
  5. After the thread calls DetachCurrentThread through JNI, it disconnects from the JVM instance
  6. The JVM clears the corresponding JavaThread, OSThread, and java.lang.Thread objects
    Java thread life cycle:
    [Figure: Java thread life cycle]
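The lifecycle steps above can be observed from Java through `Thread.getState()` (the sketch below uses a hypothetical class name `ThreadLifecycle`): a thread is NEW before `start()` — no JavaThread/OSThread/native thread exists yet — and TERMINATED once `run()` has returned and the native thread has been torn down.

```java
public class ThreadLifecycle {
    public static Thread.State[] observe() throws InterruptedException {
        Thread t = new Thread(() -> { /* run() body finishes immediately */ });
        Thread.State before = t.getState();   // NEW: no native thread exists yet
        t.start();                            // JVM creates JavaThread, OSThread, native thread
        t.join();                             // wait until run() returns
        Thread.State after = t.getState();    // TERMINATED: thread resources released
        return new Thread.State[] { before, after };
    }

    public static void main(String[] args) throws InterruptedException {
        Thread.State[] s = observe();
        System.out.println(s[0] + " -> " + s[1]);   // NEW -> TERMINATED
    }
}
```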

3. Why use concurrency? What problems will concurrency cause?

1. Why use concurrent programming?
Concurrent programming is essentially the use of multithreading. Against the background of modern multi-core CPUs, it has become the norm: through concurrent programming, the computing power of a multi-core CPU can be used to the fullest and performance improves. In addition, when facing complex business models, a parallel program often fits the business better than a serial one, and concurrent programming matches this way of splitting the business.
Even a single-core processor supports multi-threaded execution of code, and the CPU implements this mechanism by assigning CPU time slices to each thread. The time slice is the time allocated by the CPU to each thread. Because the time slice is very short, the CPU keeps switching threads for execution, making us feel that multiple threads are executing at the same time. The time slice is generally tens of milliseconds (ms).
Concurrency is not parallelism: concurrency means multiple tasks execute alternately, while parallelism means truly simultaneous execution. On a system with only one CPU, multithreaded tasks cannot actually run in parallel; they can only alternate via time-slice switching, which is concurrent execution. True parallelism exists only on systems with multiple CPUs.
Advantages of concurrency:

  1. Make full use of the computing power of multi-core CPU;
  2. Facilitate business splitting and improve application performance;

Problems caused by concurrency:
1. In high-concurrency scenarios, frequent context switching is expensive.
2. Thread-safety issues in critical sections, which can easily lead to deadlock; a deadlock renders the affected system functionality unusable.
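A minimal sketch of the deadlock hazard just mentioned (hypothetical class name `DeadlockAvoidance`): if thread 1 locks A then B while thread 2 locks B then A, each can end up holding one lock and waiting forever for the other. The standard fix, shown here, is to make every thread acquire the locks in the same global order, so the wait cycle cannot form.

```java
public class DeadlockAvoidance {
    private static final Object LOCK_A = new Object();
    private static final Object LOCK_B = new Object();

    static int counter = 0;

    // Both threads take LOCK_A before LOCK_B: consistent ordering, no deadlock.
    // Nesting the locks in opposite orders in the two threads would risk deadlock.
    static void safeIncrement() {
        synchronized (LOCK_A) {
            synchronized (LOCK_B) {
                counter++;
            }
        }
    }

    public static int run() throws InterruptedException {
        counter = 0;
        Runnable task = () -> { for (int i = 0; i < 1000; i++) safeIncrement(); };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join();  t2.join();
        return counter;   // always 2000: the increments are serialized by the locks
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```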

The CPU executes tasks cyclically through the time slice allocation algorithm. After the current task executes a time slice, it will switch to the next task. However, the state of the previous task will be saved before switching, so that the state of this task can be loaded again when switching back to this task next time. So the process from saving to reloading a task is a context switch.

Thread context switching process:
[Figure: thread context switch]

4. What is the JMM model

The Java Memory Model (JMM) is an abstract concept — it does not physically exist. It describes a set of rules or specifications that define how each variable in the program (instance fields, static fields, and the elements of arrays) may be accessed. The entity that runs a program in the JVM is the thread, and when each thread is created the JVM creates a working memory for it (in some places called stack space) to hold thread-private data. The Java memory model stipulates that all variables are stored in main memory, a shared region accessible to all threads; but a thread's operations on a variable (reads, assignments, and so on) must take place in its working memory. The thread first copies the variable from main memory into its own working memory, operates on the copy there, and writes the variable back to main memory when the operation completes — it cannot manipulate the variable in main memory directly. The working memory holds copies of main-memory variables, and because working memory is each thread's private data area, threads cannot access one another's working memory; communication between threads (passing values) must go through main memory.
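A minimal sketch of the working-memory rule just described (hypothetical class name `VisibilityDemo`): a plain field gives no guarantee about when one thread's private copy is written back to, or re-read from, main memory. Marking the flag `volatile` forces every read and write to go through main memory, so the spinning thread below is guaranteed to observe the update and terminate.

```java
public class VisibilityDemo {
    volatile boolean stop = false;   // without volatile the loop may spin forever

    public static boolean run() throws InterruptedException {
        VisibilityDemo d = new VisibilityDemo();
        Thread worker = new Thread(() -> {
            while (!d.stop) { /* volatile forces a fresh main-memory read each pass */ }
        });
        worker.start();
        Thread.sleep(50);    // let the worker enter its loop
        d.stop = true;       // volatile write: flushed to main memory immediately
        worker.join(1000);   // returns promptly because the worker sees the write
        return !worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```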

JMM versus the JVM memory-area model
The JMM and the JVM's division of memory areas are different conceptual levels. It is more accurate to say that the JMM describes a set of rules that control how each variable in the program accesses the shared data area and the private data area; the JMM revolves around atomicity, ordering, and visibility. The only similarity between the JMM and the Java memory areas is that both have a shared data area and a private data area: in the JMM, main memory is the shared area and roughly corresponds to the heap and the method area, while working memory is the thread-private area and roughly corresponds to the program counter, the virtual machine stack, and the native method stack.

Interaction between threads, working memory, and main memory (per the JMM specification):
[Figure: thread, working memory, and main memory interaction]
Main memory
Main memory chiefly stores Java instance objects: every instance object a thread creates is stored in main memory, regardless of whether it is referenced by a member variable or by a local variable of a method. It also holds shared class information, constants, and static variables. Because it is a shared data area, multiple threads accessing the same variable there may run into thread-safety problems.

Working memory
Working memory chiefly stores the current method's local variables (and copies of the main-memory variables the thread operates on). Each thread can access only its own working memory, so local variables in one thread are invisible to other threads — even if two threads execute the same piece of code, each creates its own local variables in its own working memory. Working memory also includes the bytecode line-number indicator and related native method information. Note that because working memory is private to each thread, threads cannot access one another's working memory, and the data stored there has no thread-safety issues.

According to the JVM specification's storage types and access rules for main memory and working memory: for a member method of an instance object, local variables of primitive types (boolean, byte, short, char, int, long, float, double) are stored directly in the stack frame within working memory; if a local variable is a reference type, the reference itself is stored in the working memory's stack frame, while the object instance lives in main memory (the shared data area — the heap). Member variables of an instance object, however — whether of a primitive type, a wrapper type (Integer, Double, etc.), or a reference type — are stored with the object in the heap, and static variables and the class's own information are likewise in main memory. Note that an instance object in main memory can be shared by multiple threads: if two threads call the same method of the same object at the same time, each copies the data it operates on into its own working memory, and the result is flushed back to main memory only after the operation completes.
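The storage rules above can be sketched in code (hypothetical class name `StorageDemo`): the local variable `local` lives in each thread's own stack frame and cannot race, while the member variable `shared` lives on the heap, is visible to every thread, and therefore needs synchronization — here an `AtomicInteger` stands in for explicit locking.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StorageDemo {
    private final AtomicInteger shared = new AtomicInteger(0);  // heap: shared by all threads

    int work() {
        int local = 0;                 // stack frame: private to the calling thread
        for (int i = 0; i < 1000; i++) {
            local++;                   // never contended
            shared.incrementAndGet();  // contended: needs an atomic update
        }
        return local;                  // always 1000, however many threads run work()
    }

    public static int[] run() throws InterruptedException {
        StorageDemo d = new StorageDemo();
        int[] locals = new int[2];
        Thread t1 = new Thread(() -> locals[0] = d.work());
        Thread t2 = new Thread(() -> locals[1] = d.work());
        t1.start(); t2.start();
        t1.join();  t2.join();
        return new int[] { locals[0], locals[1], d.shared.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);   // 1000 1000 2000
    }
}
```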


[Figure]
The relationship between the Java memory model and the hardware memory architecture is shown below.
From the preceding discussion of the hardware memory architecture, the Java memory model, and how Java threads are implemented, it should be clear that multithreaded execution is ultimately mapped onto hardware processors. The Java memory model, however, does not correspond exactly to the hardware memory architecture. Hardware knows only registers, cache memory, and main memory; it has no notion of working memory (thread-private data area) versus main memory (heap memory). The JMM's division of memory has no effect on the hardware, because the JMM is only an abstract concept, a set of rules: as far as the hardware is concerned, data from both working memory and JMM main memory ends up in the computer's physical main memory (and possibly in CPU caches or registers). In general, then, the Java memory model and the hardware memory architecture are in a cross-cutting relationship — an abstract conceptual division laid over real physical hardware. (The same is true of the JVM's memory-area division.)
[Figure: Java memory model vs. hardware memory architecture]
The necessity of JMM
Having seen how the Java memory areas, the hardware memory architecture, Java's threading implementation, and the Java memory model relate to one another, let us discuss why the memory model needs to exist. The entity that runs a program in the JVM is the thread, and each thread gets a working memory (sometimes called stack space) for its private data when it is created; a thread's operations on variables in main memory must go through that working memory indirectly. The main process is: copy the variable from main memory into the thread's working memory, operate on the copy, and write the variable back to main memory when the operation completes. If two threads operate on a variable of an instance object in main memory at the same time, thread-safety problems can arise.
Suppose main memory holds a shared variable x with value 1, and two threads A and B each have a copy of x in their working memory. Now thread A wants to set x to 2 while thread B wants to read x. Does B read the value 2 written by A, or the value 1 from before the update? The answer is: it is not determined — B may read either. Because working memory is private to each thread, A first copies x from main memory into its working memory, modifies the copy, and then writes x back to main memory after the operation completes; B proceeds similarly. This can create a data-consistency problem between main memory and the working memories: if B copies x=1 into its own working memory while A is still writing its modification back, B reads x=1; if B starts reading only after A has written x=2 back to main memory, B reads x=2. Which situation happens first is not determined.
As shown in the example diagram below:
[Figure: threads A and B racing on shared variable x]
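The A/B scenario above can be sketched in Java (hypothetical class name `SharedCounter`): both threads copy x into their working memory, update the copy, and write it back, so concurrent unsynchronized read-modify-write updates can be lost. Guarding the update with one shared lock makes the outcome deterministic again.

```java
public class SharedCounter {
    private int x = 0;
    private final Object lock = new Object();

    void increment() {
        synchronized (lock) {   // lock: load fresh copy, update, flush back atomically
            x = x + 1;          // the read-modify-write is now one indivisible step
        }
    }

    public static int run() throws InterruptedException {
        SharedCounter c = new SharedCounter();
        Runnable task = () -> { for (int i = 0; i < 5000; i++) c.increment(); };
        Thread a = new Thread(task, "A");
        Thread b = new Thread(task, "B");
        a.start(); b.start();
        a.join();  b.join();
        return c.x;   // always 10000; without synchronized it could be less
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```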
To define the precise interaction protocol between main memory and working memory — that is, the implementation details of how a variable is copied from main memory into working memory and how it is synchronized from working memory back to main memory — the Java memory model specifies the following eight operations.

JMM — the eight synchronization operations
(1) lock: acts on a main-memory variable; marks the variable as exclusively owned by one thread.
(2) unlock: acts on a main-memory variable; releases a variable that is in the locked state so that other threads can lock it.
(3) read: acts on a main-memory variable; transfers the variable's value from main memory into the thread's working memory for use by the subsequent load.
(4) load: acts on a working-memory variable; puts the value obtained by read from main memory into the working-memory copy of the variable.
(5) use: acts on a working-memory variable; passes the variable's value in working memory to the execution engine.
(6) assign: acts on a working-memory variable; assigns a value received from the execution engine to the working-memory variable.
(7) store: acts on a working-memory variable; transfers the variable's value in working memory to main memory for use by the subsequent write.
(8) write: acts on a main-memory variable; puts the value obtained by store from working memory into the main-memory variable.

To copy a variable from main memory into working memory, read and load must be executed in that order; to synchronize a variable from working memory back to main memory, store and write must be executed in that order. The Java memory model only requires that these pairs be executed in order — it does not require that they be executed consecutively, so other operations may be interleaved between them.
[Figure: the eight memory interaction operations]
Analysis of the synchronization rules
1) A thread is not allowed to synchronize data from working memory back to main memory for no reason (i.e., without any assign having occurred).
2) A new variable can only be born in main memory; working memory may not directly use an uninitialized variable — before use or store may be applied to a variable, an assign or load must have been performed on it.
3) A variable may be locked by only one thread at a time, but the same thread may perform lock on it repeatedly; after multiple locks, the variable is released only after the same number of unlock operations. lock and unlock must appear in pairs.
4) Performing lock on a variable clears that variable's value in working memory; before the execution engine uses the variable, a load or assign must re-initialize its value.
5) A variable that has not been locked by a lock operation may not be unlocked, and a thread may not unlock a variable locked by another thread.
6) Before performing unlock on a variable, the variable must first be synchronized back to main memory (store and write must be executed).
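These rules are what make `synchronized` work in Java; the sketch below (hypothetical class name `LockVisibility`) ties them to code. Rule 6 (store/write before unlock) means a synchronized block flushes its writes to main memory when the lock is released, and rule 4 means the next lock of the same monitor discards stale working-memory copies — so a value written inside one synchronized block is guaranteed visible to a later synchronized block on the same lock.

```java
public class LockVisibility {
    private int value = 0;
    private final Object monitor = new Object();

    void write(int v) {
        synchronized (monitor) {   // lock: clears the stale working-memory copy (rule 4)
            value = v;
        }                          // unlock: store + write back to main memory (rule 6)
    }

    int read() {
        synchronized (monitor) {   // lock on the same monitor: forces a fresh load
            return value;
        }
    }

    public static int run() throws InterruptedException {
        LockVisibility lv = new LockVisibility();
        Thread writer = new Thread(() -> lv.write(7));
        writer.start();
        writer.join();        // the writer's unlock happens-before this join returns
        return lv.read();     // guaranteed to see 7
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```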

Reference: "The Art of Java Concurrent Programming"


Origin blog.csdn.net/qq_39513430/article/details/109400036