Chapter 1 of "CLR Via C#"

Table of contents

1. Classification of books and key points of the whole book (simplified as much as possible)

  1. Book classification

  2. Key points of the book

2. Chapter 1: The CLR's execution model

  1. The focus of this chapter (simplify as much as possible)

  2. Connections to other chapters

    2.1 Chapters 1 and 2

    2.2 Chapters 1 and 3

3. Detailed explanation of this chapter

  1. Compile the source code into a managed module

  2. Merge managed modules into assemblies

  3. Loading the Common Language Runtime

  4. Execute the code of the assembly

  5. IL and Verification

  6. Unsafe code

  7. Native code generator: NGen.exe

  8. Framework class library

  9. Common Type System

  10. Common language specification

  11. Interoperability with unmanaged code

4. Questions

  1. Why is there a CLR?

  2. What is the use of different languages?

  3. What exactly is an assembly, and what will become an assembly in the project.

  4. What is the role of the logical representation and physical representation of the assembly. How should these areas be used in the project.

  5. What is the role of the self-description of the assembly.

  6. What are executable files and DLL files.

  7. What is the process?

  8. What exactly are robustness and stability?

5. Mind map


1. Classification of books and key points of the whole book (simplified as much as possible)

  1. Book classification

        Theory: the book mainly covers the foundations of the CLR, C#, and .NET.

  2. Key points of the book

        Working from the basics to the advanced material, the book first shows what the CLR and .NET are, then digs into their mechanisms to explain why they work the way they do.

2. Chapter 1: The CLR's execution model

  1. The focus of this chapter (simplify as much as possible)

        The chapter gives an overview of how source code is compiled, how the CLR executes an application, and how IL, metadata, and assemblies relate to one another.

  2. Connections to other chapters

    2.1 Chapters 1 and 2

        Chapter 1 introduces metadata and assemblies briefly; Chapter 2 covers them in more detail.

    2.2 Chapters 1 and 3

        Chapter 1 introduces the basics of assemblies; Chapter 3 covers strongly named assemblies in detail.

3. Detailed explanation of this chapter

        These notes use diagrams and metaphors where possible and are written as if teaching the material to someone else.

  1. Compile the source code into a managed module

        The Common Language Runtime (CLR) is a runtime that can be used by multiple programming languages. Its core features are available to every language, as long as that language's compiler targets the CLR.

        Compiling source code: whatever the language, the result of compilation is a managed module, which is a standard 32-bit (PE32) or 64-bit (PE32+) Windows portable executable file that requires the CLR to execute.

Figure: Source code compilation process

        A managed module consists of a PE32 or PE32+ header, a CLR header, metadata, and IL (intermediate language) code.

Figure: Managed module composition

        Native code compilers generate code for a specific CPU architecture (x86, x64, or ARM). In contrast, every compiler that targets the CLR generates IL code, which is sometimes called managed code because the CLR manages its execution.

        Compilers targeting the CLR also emit complete metadata in every managed module, describing the types and members the module defines and those it references. Metadata is a superset of older technologies such as COM's type libraries and Interface Definition Language (IDL) files. Because the compiler produces the metadata and the code at the same time and embeds both in the resulting managed module, the metadata and the IL it describes are never out of sync.

        The C# compiler always generates modules that contain managed code (IL) and managed data (garbage-collected data types). To run such modules, end users must have the CLR installed on their machines.
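
        The metadata that the compiler embeds in a managed module can be read back at run time through reflection. The following is a minimal sketch, not taken from the book; the Greeter type and the MetadataDemo helper are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical type, defined only so that there is some metadata to inspect.
public class Greeter
{
    public string Name;
    public string Greet() { return "Hello, " + Name; }
}

public static class MetadataDemo
{
    // Returns the names of the public instance methods declared directly on the type.
    public static string[] DeclaredMethodNames(Type type)
    {
        return type
            .GetMethods(BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly)
            .Select(m => m.Name)
            .ToArray();
    }

    public static void Main()
    {
        Type t = typeof(Greeter);
        Console.WriteLine("Type:   " + t.FullName);
        Console.WriteLine("Module: " + t.Module.Name);
        foreach (string name in DeclaredMethodNames(t))
            Console.WriteLine("Method: " + name);
    }
}
```

        Nothing here parses source code: the type name, the module name, and the method list all come from the metadata tables stored next to the IL.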

  2. Merge managed modules into assemblies

        The CLR does not actually work with modules; it works with assemblies. An assembly is a logical grouping of one or more modules or resource files, and it is the smallest unit of reuse, security, and versioning. An assembly can consist of a single file or of multiple files.

        Generating an assembly: a set of managed modules and resource files is processed by a tool (a compiler or the assembly linker, AL.exe) that produces a single PE32(+) file representing the logical grouping of files. This PE32(+) file contains a block of data called the manifest, which is simply another set of metadata tables. These tables describe the files that make up the assembly, the publicly exported types implemented by those files, and the resource or data files associated with the assembly.

Figure: Assembly generation process

        By default, compilers turn the managed module they emit into an assembly. An assembly separates its logical representation from its physical representation. The modules of an assembly also record the assemblies they reference, which makes the assembly self-describing: the CLR can determine the assembly's immediate dependencies without any additional information in the registry.
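
        The self-describing nature of an assembly can be observed from code. The sketch below asks the running assembly for the assemblies it references; ManifestDemo is a hypothetical name, and the exact list printed depends on the runtime and on what the compiler actually referenced.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class ManifestDemo
{
    // Returns the simple names of the assemblies that the given assembly references.
    public static string[] ReferencedNames(Assembly assembly)
    {
        return assembly.GetReferencedAssemblies()
                       .Select(reference => reference.Name)
                       .ToArray();
    }

    public static void Main()
    {
        Assembly self = typeof(ManifestDemo).Assembly;
        Console.WriteLine("Assembly: " + self.FullName);
        foreach (string name in ReferencedNames(self))
            Console.WriteLine("  references " + name);
    }
}
```

        The information comes from the assembly's own metadata, which is why no registry lookup is involved.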

  3. Loading the Common Language Runtime

        Each assembly you build is either an executable application or a DLL containing a set of types for use by an executable. In both cases the CLR is responsible for managing the execution of the code in the assembly, which means the .NET Framework must be installed on the target machine.

  4. Execute the code of the assembly

        A managed assembly contains both metadata and IL. IL is a CPU-independent machine language and can be thought of as an object-oriented machine language. Note: a high-level language usually exposes only a subset of the CLR's features, whereas IL assembly language gives developers access to all of them.

        Converting a method's IL into native CPU instructions is the responsibility of the CLR's JIT (just-in-time) compiler.

        How a method is executed (using a Main method that calls Console.WriteLine twice as the example):

  • Before Main executes, the CLR detects all the types referenced by Main's code (here, the Console class).
  • For each referenced type, the CLR allocates an internal data structure that is used to manage access to that type (for Console, the structure holds an entry for each of its methods).
  • In this structure, every method has an entry that holds the address of the method's implementation. When the structure is initialized, each entry is set to point to an internal, undocumented function inside the CLR, referred to here as JITCompiler.
  • When Main calls WriteLine for the first time, JITCompiler is called instead. It verifies the method's IL and compiles it into native CPU instructions.
  • JITCompiler saves the native CPU instructions in a dynamically allocated block of memory, then goes back to the type's data structure and replaces the entry that pointed to JITCompiler with the address of that memory block. Finally, it jumps to the code in the memory block.
  • The second call to WriteLine runs the code in the memory block directly, because WriteLine was already verified and compiled on the first call.

Figure: Method call flow, first call to WriteLine

Figure: Method call flow, second call to WriteLine

        The JIT compiler stores the native code in dynamic memory, so the compiled code is discarded when the application terminates. When the application runs again, the JIT compiler has to compile the IL into native code again.

        The CLR's JIT compiler also optimizes the native code it produces.
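
        The stub-and-patch flow in the method execution steps above can be imitated in ordinary C#. This is only a toy model under stated assumptions: MethodTable, Register, Call, and CompileCount are hypothetical names, delegates stand in for native code, and the CLR's real data structures are not exposed like this.

```csharp
using System;
using System.Collections.Generic;

// A toy model of the per-type method table; not the CLR's real data structure.
public class MethodTable
{
    private readonly Dictionary<string, Action> slots = new Dictionary<string, Action>();

    // Counts how many times the "JITCompiler" stub has run.
    public int CompileCount { get; private set; }

    // Registers a method whose slot initially points at the "JITCompiler" stub.
    public void Register(string name, Action body)
    {
        slots[name] = () =>
        {
            CompileCount++;      // stands in for verifying and compiling the IL
            slots[name] = body;  // patch the slot so it points at the "native code"
            body();              // then jump to the freshly compiled code
        };
    }

    public void Call(string name) { slots[name](); }
}

public static class Program
{
    public static void Main()
    {
        var console = new MethodTable();
        console.Register("WriteLine", () => Console.WriteLine("Hello"));
        console.Call("WriteLine"); // first call goes through the stub
        console.Call("WriteLine"); // second call runs the patched slot directly
        Console.WriteLine("Compilations: " + console.CompileCount); // prints "Compilations: 1"
    }
}
```

        The cost of "compilation" is paid once per method; every later call goes straight to the patched entry.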

  5. IL and Verification

        IL is stack-based: its instructions push operands onto an evaluation stack and pop results off the stack. The biggest benefit of IL is the robustness and security it gives applications. While IL is being compiled into native CPU instructions, the CLR performs a process called verification, which examines the high-level IL code to make sure everything it does is safe (for example, that every method is called with the correct number of parameters and that each parameter is of the correct type).
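
        To make the stack-based nature of IL concrete, the sketch below emits four IL instructions at run time with System.Reflection.Emit and invokes the result. It is an illustration rather than something the chapter itself does; StackIlDemo and BuildAdd are hypothetical names.

```csharp
using System;
using System.Reflection.Emit;

public static class StackIlDemo
{
    // Builds a method at run time whose body is four IL instructions.
    public static Func<int, int, int> BuildAdd()
    {
        var method = new DynamicMethod("Add", typeof(int), new[] { typeof(int), typeof(int) });
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0); // push the first argument onto the evaluation stack
        il.Emit(OpCodes.Ldarg_1); // push the second argument
        il.Emit(OpCodes.Add);     // pop both values, push their sum
        il.Emit(OpCodes.Ret);     // return the value left on the stack
        return (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
    }

    public static void Main()
    {
        Func<int, int, int> add = BuildAdd();
        Console.WriteLine(add(2, 3)); // prints 5
    }
}
```

        Note that no instruction names a register: every operand is taken from, and every result is left on, the evaluation stack.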

        Each Windows process has its own virtual address space. Separate address spaces provide robustness and stability because one process cannot adversely affect another (application code cannot simply be trusted; what if one of the processes turns out to be a werewolf in disguise?).

        The CLR makes it possible to run multiple managed applications in a single operating system process, each in its own AppDomain. By default, every managed EXE file runs in its own separate address space that has just one AppDomain.

  6. Unsafe code

        The Microsoft C# compiler produces safe code by default. Unsafe code is allowed to work directly with memory addresses, so methods that contain it must be marked with the unsafe keyword and compiled with the /unsafe compiler switch. When the JIT compiler attempts to compile an unsafe method, it checks whether the assembly containing the method has been granted System.Security.Permissions.SecurityPermission with the SkipVerification flag of System.Security.Permissions.SecurityPermissionFlag set. If the flag is set, the JIT compiler compiles the unsafe code; otherwise it throws System.InvalidProgramException or System.Security.VerificationException.

        Microsoft supplies a utility, PEVerify.exe, that examines all of an assembly's methods and reports any that contain unsafe code.

  7. Native code generator: NGen.exe

        NGen.exe, a tool that ships with the .NET Framework, can compile an application's IL code into native code when the application is installed on the user's machine. Because the code is compiled at install time, the JIT compiler does not have to compile it at run time, which can improve startup time and performance. When the CLR loads an assembly, it first checks whether a corresponding native file generated by NGen exists; if none is found, it JIT-compiles the IL as usual.

        Advantages of NGen.exe:

  1. Improves application startup time.
  2. Reduces the application's working set (the part of a process's memory that is currently mapped to physical memory). NGen.exe compiles the IL into native code and saves it in a separate file; that file can be memory-mapped into several processes at once, so the code is shared rather than copied into each process.

        Disadvantages:

  1. No intellectual property protection. It is not possible to ship only the NGen-generated files without the IL, because at run time the CLR needs the assembly's metadata for features such as reflection and serialization, so the assembly containing the IL and metadata has to be shipped anyway. In addition, if the CLR cannot use the NGen-generated file for some reason, it falls back to JIT-compiling the IL, which therefore must be available.
  2. NGen-generated files can get out of sync. When the CLR loads an NGen-generated file, it compares a number of characteristics of the precompiled code with the current execution environment. If any of them do not match, the file cannot be used and normal JIT compilation is used instead.
  3. Inferior execution-time performance. NGen cannot make as many assumptions about the execution environment as the JIT compiler can, so it produces less optimized code; for example, it cannot take advantage of specific CPU instructions.

        Use NGen with caution.

  8. Framework class library

        The .NET Framework includes the Framework Class Library (FCL), a set of DLL assemblies that contain a very large number of type definitions. With these types, developers can build, for example:

  1. Web services
  2. HTML-based Web Forms/MVC applications (websites)
  3. Rich Windows GUI applications
  4. Windows console applications
  5. Windows services
  6. Database stored procedures
  7. Component libraries

        Because the FCL contains so many types, related types are grouped into namespaces. For example, the System namespace contains Object, the integer, character, and string types, the exception-handling types, and so on.

        To use any feature of the Framework, you need to know which type provides it and which namespace that type belongs to.
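
        As a small illustration of types living in namespaces, the sketch below asks a few well-known FCL types which namespace they belong to. NamespaceDemo and NamespaceOf are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class NamespaceDemo
{
    // Returns the namespace recorded in the type's metadata.
    public static string NamespaceOf(Type type)
    {
        return type.Namespace;
    }

    public static void Main()
    {
        Console.WriteLine(NamespaceOf(typeof(string)));        // prints System
        Console.WriteLine(NamespaceOf(typeof(Exception)));     // prints System
        Console.WriteLine(NamespaceOf(typeof(StringBuilder))); // prints System.Text
        Console.WriteLine(NamespaceOf(typeof(List<int>)));     // prints System.Collections.Generic
    }
}
```

        The using directives at the top are what allow StringBuilder and List<int> to be written without their full names.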

  9. Common Type System

        Everything in the CLR revolves around types. Types are the mechanism by which code written in one programming language can talk to code written in another. Microsoft therefore created a formal specification, the Common Type System (CTS), that describes how types are defined and how they behave. According to the CTS, a type can contain zero or more members:

  1. Field: a data variable that is part of the object's state.
  2. Method: a function that performs an operation on the object.
  3. Property: allows the type to validate input parameters and object state before a value is accessed, or to compute a value only when necessary.
  4. Event: a notification mechanism between an object and other interested objects.

        The CTS also specifies the rules for type visibility and for access to type members (public, private, protected, internal, and so on).
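
        The four kinds of members and the access rules can be seen together in one small C# type. This is a sketch; the Thermometer class and its members are hypothetical and exist only to illustrate the list above.

```csharp
using System;

public class Thermometer
{
    // Field: private state, invisible outside the type.
    private double celsius;

    // Event: lets other objects subscribe to a notification.
    public event EventHandler Overheated;

    // Property: validates the input before the field is changed.
    public double Celsius
    {
        get { return celsius; }
        set
        {
            if (value < -273.15) throw new ArgumentOutOfRangeException("value");
            celsius = value;
            if (celsius > 100 && Overheated != null) Overheated(this, EventArgs.Empty);
        }
    }

    // Method: performs an operation using the object's state.
    public double ToFahrenheit() { return celsius * 9 / 5 + 32; }
}

public static class Program
{
    public static void Main()
    {
        var t = new Thermometer();
        t.Overheated += (sender, e) => Console.WriteLine("Overheated!");
        t.Celsius = 25;
        Console.WriteLine(t.ToFahrenheit()); // prints 77
        t.Celsius = 120;                     // prints Overheated!
    }
}
```

        Code in another CLR language sees the same field, property, method, and event, because these concepts are defined by the CTS rather than by C#.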

        There is no need to study the CTS rules in detail, because each language exposes them through its own syntax, which you pick up as you learn the language.

  10. Common language specification

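        COM allows objects created in different languages to communicate with one another; the CLR goes further and lets objects created in one language be used as equals by code written in a completely different language. To make this practical, the Common Language Specification (CLS) defines the minimum set of features that every CLR-targeting language must support.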

  11. Interoperability with unmanaged code

        Skipped for now; come back to this section when it is needed in practice.

4. Questions

  1. Why is there a CLR?

Answer:

        Simply put, a runtime is a complete set of specifications covering everything needed to build and run a program. The goal of the CLR is to make programming easy.

        Programs inevitably have to interact with hardware, for example to receive user input or to communicate over a network, and they have to be compiled before the hardware can execute them. A C++ program, for instance, is compiled for one particular hardware architecture (such as x86) and operating system (such as Windows or macOS). The resulting executable contains only the information needed to run that program; it does not describe the libraries whose functions (printf, for example) it relies on. C++ programs typically use the standard library (msvcrt.dll on Windows), which contains most of the commonly used functions, but the library file alone is not enough: to use it, a programmer also needs the matching header file (such as stdio.h). In other words, existing executable file formats cannot do both of the following at once: 1. satisfy the requirements for running the program; 2. carry the additional information or binaries needed to make the program complete.

        The CLR solves these problems by defining a very complete set of specifications that covers every detail of a program's life cycle, from building and binding through deployment and execution. It also supports interaction between languages. What makes cross-language interaction hard is that languages can otherwise interact only through the basic facilities the operating system provides, and the operating system's level of abstraction is too low (for example, the operating system has no notion of a garbage-collected heap). By providing a common runtime, the CLR lets languages interact through higher-level constructs (such as garbage-collected objects), which greatly reduces the complexity of that interaction.

  2. What is the use of different languages?

        Answer: Different languages suit different needs; choosing the right language for a task saves development time.

  3. What exactly is an assembly, and what will become an assembly in the project.

        Answer: In the Visual Studio development environment, a solution can contain multiple projects, and each project is compiled into one assembly.

        An application is organized into application domains (AppDomain), assemblies (Assembly), modules (Module), types (Type), and members (EventInfo, FieldInfo, MethodInfo, PropertyInfo).

        These form a containment hierarchy: an AppDomain can contain N assemblies, an assembly N modules, a module N types, and a type N members. AppDomain and Type are defined in the System namespace; the other classes are in System.Reflection.

        The CLR manages the application domain, which includes loading each assembly into the appropriate application domain and controlling the memory layout of the type hierarchy within each assembly.

        All the code we write is compiled into assembly files (.exe or .dll). At run time an assembly is loaded into memory and represented by an Assembly object, each type (class, interface, and so on) is represented by a Type object, and each member of a type (method, field, property, event, constructor) has a corresponding object as well.
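
        The containment hierarchy can be walked with reflection. The sketch below starts from the current AppDomain and descends through the assembly, its modules, their types, and their members; Sample, HierarchyDemo, and CountTypes are hypothetical names, and the printed names depend on how the program is built.

```csharp
using System;
using System.Reflection;

// Hypothetical type so that the walk has something recognizable to print.
public class Sample
{
    public int Value;
    public void Run() { }
}

public static class HierarchyDemo
{
    // Counts the types defined in all modules of the given assembly.
    public static int CountTypes(Assembly assembly)
    {
        int count = 0;
        foreach (Module module in assembly.GetModules())
            count += module.GetTypes().Length;
        return count;
    }

    public static void Main()
    {
        const BindingFlags declared = BindingFlags.Public | BindingFlags.Instance |
                                      BindingFlags.Static | BindingFlags.DeclaredOnly;

        Assembly assembly = typeof(Sample).Assembly;
        Console.WriteLine("AppDomain: " + AppDomain.CurrentDomain.FriendlyName);
        Console.WriteLine("Assembly:  " + assembly.GetName().Name);
        foreach (Module module in assembly.GetModules())
        {
            Console.WriteLine("  Module: " + module.Name);
            foreach (Type type in module.GetTypes())
            {
                Console.WriteLine("    Type: " + type.FullName);
                foreach (MemberInfo member in type.GetMembers(declared))
                    Console.WriteLine("      " + member.MemberType + ": " + member.Name);
            }
        }
    }
}
```

        Each level of the output corresponds to one level of the hierarchy described in this answer.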

  4. What is the role of the logical representation and physical representation of the assembly. How should these areas be used in the project.

        Answer:

        Logical structure: the logical relationships among data elements, independent of how they are stored. It is an abstract model.

        Physical structure: the way the data elements are actually stored in the computer, that is, the mapping of the logical structure onto storage.

        Because an assembly's logical representation is separate from its physical representation, rarely used types and resources can be placed in separate files that are still part of the assembly and are downloaded only if they are needed at run time. This saves disk space and shortens installation time. The files of an assembly can be deployed in different places while the assembly is still treated as a single unit.

  5. What is the role of the self-description of the assembly.

        Answer: It lets the CLR determine the assembly's immediate dependencies and run its code without any extra information being registered in the registry.

  6. What are executable files and DLL files.

        Answer: An executable file (.exe) can be run directly, whereas a class library file (.dll) contains code that executables call.

  7. What is the process?

        Answer: Narrow definition: a process is a running instance of a program.

        Broad definition: a process is one execution of a program, with some independent function, over a particular set of data. It is the basic unit of dynamic execution in the operating system. In traditional operating systems, the process is both the basic unit of resource allocation and the basic unit of execution.

  8. What exactly are robustness and stability?

        Answer: Robustness describes how insensitive a system is to variations in its parameters. Stability means that, given fixed parameters, the system produces stable, predictable output. Reliability describes the correctness of a system. For example, suppose you pay a worker to renovate your house. If he runs off with the money, he is unreliable; if he does the renovation, he is reliable. But if he stays in your house after the job is done and you never make him leave (that is, you never release the memory he occupies), the program is not robust.

5. Mind map

Origin blog.csdn.net/weixin_51374560/article/details/128746575