The underlying operating mechanism and principles of PHP

1 The design philosophy and characteristics of PHP

  • Multi-process model: Because PHP uses a multi-process model, requests do not interfere with one another, so a failing request does not affect the service as a whole. Over time, PHP has also gained support for a multi-threaded model.

  • Weakly typed language: Unlike C/C++, Java, C# and similar languages, PHP is weakly typed. A variable's type is not fixed when it is declared; it is determined at runtime, and implicit or explicit type conversions may occur. This flexibility is convenient and efficient in web development.

  • The engine (Zend) + component (ext) mode reduces internal coupling.

  • The middle layer (sapi) isolates the web server and PHP.

  • The syntax is simple and flexible, and the language imposes few restrictions.


2 The core architecture of PHP

The PHP core architecture is shown in the figure below. It can be roughly divided into four layers, from bottom to top:
[Figure: PHP core architecture]

  1. Zend engine: Implemented in pure C, this is the core of PHP. It translates PHP code into executable opcodes (through lexical analysis, syntax analysis and the other compilation stages) and implements the handlers that execute them. It also implements the basic data structures (such as the hash table and the object model), memory allocation and management, and the API that the rest of PHP calls. Zend is the core of everything; all peripheral functionality is built around it.

  2. Extensions: Built around the Zend engine, extensions provide basic services as components. The common built-in functions (such as the array functions) and the standard library are all implemented as extensions.

  3. Sapi: The full name is Server Application Programming Interface. Sapi uses a set of hook functions to let PHP interact with its surroundings. This is a very elegant and successful part of PHP's design: sapi decouples and isolates PHP itself from the upper-level applications. PHP no longer needs to consider how to be compatible with different applications, and each application can implement its own processing based on its own characteristics.
    Some common sapis are:
    • apache2handler: used when Apache is the web server and PHP runs in mod_php mode. It is also the most widely used.
    • cgi: another way for the web server and PHP to interact directly, best known through the FastCGI protocol. FastCGI + PHP has been adopted more and more widely in recent years, and it is the only mode supported by asynchronous web servers.
    • cli: the mode used when PHP is invoked from the command line.

  4. Upper application: This is the PHP program we usually write. Different sapis give us different application modes, such as serving web applications through a web server or running scripts on the command line.


3 PHP execution process

PHP follows the typical execution process of a dynamic language: the source code goes through lexical analysis, syntax analysis and the other compilation stages and is translated into instructions (opcodes), and the Zend virtual machine then executes these instructions in sequence. PHP itself is implemented in C, so everything it ultimately calls is a C function. In effect, PHP can be regarded as a piece of software written in C.

The core of PHP's execution is the translated instructions, that is, the opcodes.

The opcode is the most basic unit of PHP program execution. An opcode consists of two operands (op1, op2), a return value and a handler function. A PHP program is ultimately translated into a sequence of opcode handlers that are executed in order.

Several common handler functions:

  • ZEND_ASSIGN_SPEC_CV_CV_HANDLER: variable assignment (a = b)

  • ZEND_DO_FCALL_BY_NAME_SPEC_HANDLER: function call

  • ZEND_CONCAT_SPEC_CV_CV_HANDLER: string concatenation (a . b)

  • ZEND_ADD_SPEC_CV_CONST_HANDLER: addition (a + 2)

  • ZEND_IS_EQUAL_SPEC_CV_CONST: equality comparison (a == 1)

  • ZEND_IS_IDENTICAL_SPEC_CV_CONST: identity comparison (a === 1)
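The opcode layout described above (two operands, a result and a handler, executed in sequence) can be sketched in C as a toy interpreter. This is only an illustration: every name starting with toy_ is hypothetical, operands are plain indexes into an array of long slots, and none of it is the actual Zend definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of an opcode: two operands, a result slot,
 * and a pointer to the C function that carries out the instruction. */
typedef struct toy_op toy_op;
typedef void (*toy_handler)(const toy_op *op, long *slots);

struct toy_op {
    toy_handler handler; /* processing function */
    int op1;             /* index of the first operand slot */
    int op2;             /* index of the second operand slot */
    int result;          /* index of the slot that receives the result */
};

/* Handler for assignment: slots[result] = slots[op1]. */
static void toy_assign(const toy_op *op, long *slots) {
    slots[op->result] = slots[op->op1];
}

/* Handler for addition: slots[result] = slots[op1] + slots[op2]. */
static void toy_add(const toy_op *op, long *slots) {
    slots[op->result] = slots[op->op1] + slots[op->op2];
}

/* Handler for equality: slots[result] = (slots[op1] == slots[op2]). */
static void toy_is_equal(const toy_op *op, long *slots) {
    slots[op->result] = (slots[op->op1] == slots[op->op2]);
}

/* Execute the instructions in sequence by calling each handler in turn. */
static void toy_execute(const toy_op *ops, size_t count, long *slots) {
    assert(ops != NULL && slots != NULL);
    for (size_t i = 0; i < count; i++) {
        ops[i].handler(&ops[i], slots);
    }
}

/* Run the equivalent of: b = a; c = a + b; d = (c == e);
 * with a = 2 and e = 4, and return the value stored for d. */
static long toy_demo(void) {
    long slots[5] = {2, 0, 0, 0, 4}; /* a, b, c, d, e */
    const toy_op program[] = {
        {toy_assign,   0, 0, 1},
        {toy_add,      0, 1, 2},
        {toy_is_equal, 2, 4, 3},
    };
    toy_execute(program, sizeof program / sizeof program[0], slots);
    return slots[3];
}
```

Calling toy_demo() runs the three instructions in order and returns 1, because c ends up equal to e. The point of the sketch is the dispatch loop: the "program" is nothing more than an array of handler pointers with their operands.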


4 Introduction to Zend Engine

As the core of PHP, the Zend engine contains many classic design mechanisms, chiefly the following:

4.1 Implementation of HashTable data organization:

HashTable is the core data structure of Zend, and almost every common feature of PHP is built on it. The PHP array is its most typical application, and inside Zend the function symbol table, global variables and similar structures are also based on hash tables.

The Zend hash table implements a typical hash structure and, by attaching a doubly linked list, also supports traversing an array forwards and backwards. Its structure is shown in the following figure:
[Figure: Zend HashTable structure]
The hash table therefore holds both a key->value hash structure and a doubly linked list, which makes fast lookup and linear traversal equally convenient.

Hash structure: Zend's hash structure is a typical hash table model that resolves collisions through linked lists. Note that Zend's hash table grows automatically: when the table is full, it doubles in size and repositions its elements. The initial size is 8. In addition, Zend optimizes key->value lookup by trading space for time. For example, each element stores the length of its key in a field nKeyLength so that mismatches can be ruled out quickly.

Doubly linked list: The Zend hash table traverses its elements linearly through a linked list. In theory a singly linked list would be enough for traversal; the main reason for using a doubly linked list is that an element can be deleted quickly, without walking the list to find its predecessor. The Zend hash table is a composite structure: when used as an array it supports ordinary associative keys, sequential integer indexes, and even a mixture of the two.

PHP associative array: An associative array is a typical hash table application. A lookup goes through the following steps (as the code shows, this is an ordinary hash lookup with some quick checks added to speed up the search):

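    h = getKeyHashValue(key);              // hash of the key
    index = h & nTableMask;                // slot in the bucket array
    Bucket *p = arBuckets[index];
    while (p) {
        // quick checks first: cached hash and key length
        if ((p->h == h) && (p->nKeyLength == nKeyLength)) {
            // only then compare the key bytes themselves
            if (memcmp(p->arKey, key, nKeyLength) == 0) {
                RETURN p->data;
            }
        }
        p = p->next;                       // next bucket in the collision chain
    }
    RETURN FAILURE;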
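To make the combination of hash lookup and ordered traversal concrete, here is a minimal C sketch of a chained hash table whose buckets are also threaded onto a doubly linked list in insertion order. It is a simplification under stated assumptions: all toy_ names are hypothetical, the table has a fixed size of 8 and never grows, values are plain longs, and the times-33 string hash is chosen for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_TABLE_SIZE 8u /* power of two, so (size - 1) works as a mask */

typedef struct toy_bucket {
    unsigned long h;              /* cached hash of the key */
    size_t key_len;               /* cached key length for the quick check */
    char *key;                    /* owned copy of the key bytes */
    long value;
    struct toy_bucket *next;      /* next bucket in the same slot (collision chain) */
    struct toy_bucket *list_prev; /* previous element in insertion order */
    struct toy_bucket *list_next; /* next element in insertion order */
} toy_bucket;

typedef struct {
    toy_bucket *slots[TOY_TABLE_SIZE];
    toy_bucket *head; /* first inserted element */
    toy_bucket *tail; /* last inserted element */
    size_t count;
} toy_table;

static unsigned long toy_hash(const char *key, size_t len) {
    unsigned long h = 5381;
    for (size_t i = 0; i < len; i++) {
        h = h * 33 + (unsigned char)key[i];
    }
    return h;
}

/* Lookup: mask the hash to pick a slot, then walk the collision chain. */
static toy_bucket *toy_find(const toy_table *t, const char *key, size_t len) {
    unsigned long h = toy_hash(key, len);
    toy_bucket *p = t->slots[h & (TOY_TABLE_SIZE - 1)];
    while (p) {
        if (p->h == h && p->key_len == len && memcmp(p->key, key, len) == 0) {
            return p;
        }
        p = p->next;
    }
    return NULL;
}

/* Insert or update; returns 0 on success, -1 if allocation fails. */
static int toy_set(toy_table *t, const char *key, long value) {
    assert(t != NULL && key != NULL);
    size_t len = strlen(key);
    toy_bucket *p = toy_find(t, key, len);
    if (p) { p->value = value; return 0; }
    p = malloc(sizeof *p);
    if (!p) return -1;
    p->key = malloc(len + 1);
    if (!p->key) { free(p); return -1; }
    memcpy(p->key, key, len + 1);
    p->h = toy_hash(key, len);
    p->key_len = len;
    p->value = value;
    size_t idx = p->h & (TOY_TABLE_SIZE - 1);
    p->next = t->slots[idx];      /* push onto the front of the chain */
    t->slots[idx] = p;
    p->list_prev = t->tail;       /* append to the insertion-order list */
    p->list_next = NULL;
    if (t->tail) t->tail->list_next = p; else t->head = p;
    t->tail = p;
    t->count++;
    return 0;
}

/* Free every bucket by walking the insertion-order list. */
static void toy_free(toy_table *t) {
    toy_bucket *p = t->head;
    while (p) {
        toy_bucket *n = p->list_next;
        free(p->key);
        free(p);
        p = n;
    }
    *t = (toy_table){0};
}
```

A table is created with toy_table t = {0}; and filled with toy_set. Walking from t.head through list_next then visits the elements in the order they were inserted, independently of which slot each key hashed to.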

PHP index array: The index array is the ordinary array, accessed by subscript, for example arr[0]. Zend HashTable normalizes this internally: an index key is also given a hash value and an nKeyLength (0). The internal member nNextFreeElement tracks the next free integer id (one more than the largest id allocated so far) and is incremented by one after each push. It is this normalization that lets PHP mix associative and non-associative keys in one array. Because of the way push works, the order of the index keys in a PHP array is determined not by the size of the subscript but by the order of insertion. For example, if arr[2] = 3; is executed before arr[1] = 2; then traversal visits key 2 first. A key of type double is also treated by Zend HashTable as an index key.

4.2 Principles of PHP Variable Implementation:

PHP is a weakly typed language and does not strictly distinguish between variable types. No type has to be specified when a variable is declared, and PHP may convert a variable's type implicitly while the program runs. As in strongly typed languages, explicit type conversion is also available. PHP variables can be divided into simple types (int, string, bool), collection types (array, resource, object) and constants (const). Underneath, all of them share the same structure, the zval.

Zval is another very important data structure in Zend, used to represent and implement PHP variables. Its data structure is as follows:
[Figure: zval structure]
The zval structure consists mainly of three parts:

  • type: the type of the variable (integer, string, array, etc.)

  • refcount & is_ref: used to implement reference counting (described later)

  • value: the core part, which stores the actual data of the variable

Zvalue stores the actual data of a variable. Because several types have to be stored, zvalue is a union, and this is how weak typing is realized.

The correspondence between PHP variable types and their actual storage is as follows:

    IS_LONG     -> lval
    IS_DOUBLE   -> dval
    IS_ARRAY    -> ht
    IS_STRING   -> str
    IS_RESOURCE -> lval
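The idea of a type tag sitting next to a union can be sketched in C as follows. This is a simplified illustration, not the real zval definition: the toy_ names and the three-type enum are assumptions made for the example, and the string member is assumed to be NUL-terminated so that strtod can read it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type tags, standing in for IS_LONG, IS_DOUBLE and IS_STRING. */
typedef enum { TOY_IS_LONG, TOY_IS_DOUBLE, TOY_IS_STRING } toy_type;

/* One storage area shared by every type: only the member named by the tag is valid. */
typedef union {
    long lval;
    double dval;
    struct {
        const char *val; /* NUL-terminated in this sketch */
        size_t len;
    } str;
} toy_value;

typedef struct {
    toy_value value;       /* the actual data */
    unsigned int refcount; /* reference-counting fields, not exercised here */
    unsigned char is_ref;
    toy_type type;         /* says which union member is valid */
} toy_zval;

static toy_zval toy_from_long(long v) {
    toy_zval z = {.refcount = 1, .type = TOY_IS_LONG};
    z.value.lval = v;
    return z;
}

static toy_zval toy_from_double(double v) {
    toy_zval z = {.refcount = 1, .type = TOY_IS_DOUBLE};
    z.value.dval = v;
    return z;
}

static toy_zval toy_from_string(const char *s) {
    toy_zval z = {.refcount = 1, .type = TOY_IS_STRING};
    z.value.str.val = s;
    z.value.str.len = strlen(s);
    return z;
}

/* Read any value as a double: an implicit conversion driven by the type tag. */
static double toy_to_double(const toy_zval *z) {
    assert(z != NULL);
    switch (z->type) {
    case TOY_IS_LONG:   return (double)z->value.lval;
    case TOY_IS_DOUBLE: return z->value.dval;
    case TOY_IS_STRING: return strtod(z->value.str.val, NULL);
    }
    return 0.0;
}
```

Because every variable has the same shape, code such as toy_to_double can accept any of them and decide what to do by inspecting the tag, which is the essence of implicit conversion in a weakly typed language.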

4.2.1 Integer and floating-point variables

Integers and floating-point numbers are basic types in PHP and are simple variables. Their values are stored directly in the zvalue, as a long and a double respectively.

As the zvalue structure shows, PHP, unlike strongly typed languages such as C, does not distinguish between int, unsigned int, long, long long and so on. It has only one integer type, which is long. It follows that the range of a PHP integer is not fixed but depends on the width of long on the platform PHP was compiled for.

Floating-point numbers are handled in the same way: PHP does not distinguish between float and double and has only the double type.

In PHP, what happens if an integer goes out of range? It is automatically converted to the double type. Be careful with this: many subtle pitfalls are caused by it.
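The promotion from integer to double can be illustrated with a small C sketch that checks for overflow before adding two longs. The names toy_number and toy_add_long are hypothetical, and the check shown is one portable way to detect overflow, not necessarily the one the engine uses.

```c
#include <assert.h>
#include <limits.h>

/* Result of an addition: either still an integer, or promoted to double. */
typedef struct {
    int is_double; /* 0: result is in lval, 1: result is in dval */
    long lval;
    double dval;
} toy_number;

/* Add two longs; if the sum does not fit in a long, fall back to double. */
static toy_number toy_add_long(long a, long b) {
    toy_number r = {0, 0, 0.0};
    /* Test for overflow without performing the overflowing addition,
     * which would be undefined behaviour for signed integers in C. */
    if ((b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b)) {
        r.is_double = 1;
        r.dval = (double)a + (double)b;
    } else {
        r.lval = a + b;
    }
    return r;
}
```

For ordinary operands the result stays a long. For toy_add_long(LONG_MAX, 1) the result is flagged as a double, and from that point on it carries the limited precision of a double, which is where the pitfalls mentioned above tend to come from.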

4.2.2 String variables

Like integers, strings are basic, simple variables in PHP. As the zvalue structure shows, a PHP string consists of a pointer to the actual data plus a length, which is similar to the string in C++. Because the length is stored explicitly, a PHP string, unlike a C string, can hold binary data (including \0). For the same reason, getting the string length with strlen is an O(1) operation in PHP.

When a string is added to, modified or appended to, PHP reallocates memory to produce the new string. Finally, for safety, PHP still adds a \0 at the end when it creates a string.
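A pointer-plus-length string of this kind can be sketched in C as below. The toy_string type and its functions are hypothetical names for illustration; the sketch keeps the trailing \0 described above and grows the buffer with realloc when appending.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *val;  /* the bytes, always followed by one extra '\0' */
    size_t len; /* length in bytes, not counting the trailing '\0' */
} toy_string;

/* Build a string from len raw bytes; returns 0 on success, -1 on failure. */
static int toy_string_init(toy_string *s, const char *bytes, size_t len) {
    assert(s != NULL && bytes != NULL);
    s->val = malloc(len + 1);
    if (!s->val) return -1;
    memcpy(s->val, bytes, len);
    s->val[len] = '\0'; /* safety terminator, not part of the data */
    s->len = len;
    return 0;
}

/* O(1): the length is read from the struct, not found by scanning for '\0'. */
static size_t toy_strlen(const toy_string *s) {
    return s->len;
}

/* Append src to dst by growing dst's buffer; dst and src must be distinct. */
static int toy_string_append(toy_string *dst, const toy_string *src) {
    assert(dst != src);
    char *grown = realloc(dst->val, dst->len + src->len + 1);
    if (!grown) return -1; /* dst is left unchanged on failure */
    memcpy(grown + dst->len, src->val, src->len);
    dst->len += src->len;
    grown[dst->len] = '\0';
    dst->val = grown;
    return 0;
}
```

A string initialised from the four bytes '1', '2', '\0', '3' reports a length of 4 through toy_strlen, whereas the C library's strlen stops at the embedded \0 and reports 2. That difference is what makes the counted representation safe for binary data.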

Common string concatenation methods and their relative speed:

Suppose there are the following four variables: strA = '123'; strB = '456'; intA = 123; intB = 456;
The following concatenation methods are compared and explained:
1 res = strA.strB and res = "strAstrB"
In this case, Zend mallocs a new block of memory and does the corresponding processing. The speed is average.
2 strA = strA.strB
This is the fastest: Zend reallocs directly on the current strA, which avoids repeated copying.
3 res = intA.intB
This is slower, because an implicit type conversion is needed. It is best avoided when writing real programs.
4 strA = sprintf("%s%s", strA, strB);
This is the slowest, because sprintf is not a language construct in PHP: recognizing and processing the format takes considerable time, and the mechanism itself also mallocs. However, sprintf is the most readable, so choose according to the actual situation.

4.2.3 Array variables

PHP arrays are implemented directly on top of the Zend HashTable.

How is foreach implemented? Iterating over an array with foreach walks the doubly linked list in the hash table. For indexed arrays, traversing with foreach is much more efficient than with a for loop, because it avoids a key->value lookup on every step. The count operation reads HashTable->nNumOfElements directly, so it is O(1). For a string key such as '123', Zend converts it to its integer form, so arr['123'] and arr[123] are equivalent.

4.2.4 Resource variables

The resource type is the most complicated kind of variable in PHP, and it is also a composite structure.

PHP's zval can represent a wide range of data types, but it cannot fully describe custom data types. Because there is no effective way to describe these composite structures, the traditional operators cannot be applied to them either. The solution is to refer to the underlying pointer through an essentially arbitrary label, and this label is called a resource.

In a zval, a resource uses lval as the handle that identifies the underlying resource data. A resource can be any composite structure. The familiar mysqli, fsock, memcached and so on are all resources.

How resources are used:

1 Registration: To use a custom data type as a resource, it must be registered first. Zend assigns it a globally unique identifier.
2 Fetching a resource variable: Zend maintains a hash_table that maps id -> actual data. The zval of a resource records only its id; when the resource is fetched, the concrete value is looked up in the hash_table by id and returned.
3 Resource destruction: Resource data types are diverse, and Zend has no way of destroying them by itself. The user therefore has to supply a destruction function when registering the resource. When the resource is unset, Zend calls that function to carry out the destruction and also deletes the resource from the global resource table.

Resources can live for a long time: not only after every variable that references them has gone out of scope, but even after a request has ended and a new one has begun. Such resources are called persistent resources, because they exist for the whole life cycle of the SAPI unless they are explicitly destroyed. In many cases persistent resources improve performance to some extent; mysql_pconnect is a common example. Persistent resources allocate their memory through pemalloc, so that it is not released when the request ends. To Zend itself, there is no distinction between the two kinds.
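The register, fetch and destroy steps can be sketched in C with a small table that maps an integer id to an opaque pointer and its destructor. The toy_ names are hypothetical, and a fixed-size array stands in for the id -> data hash table described above to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MAX_RESOURCES 16

typedef void (*toy_dtor)(void *ptr);

typedef struct {
    void *ptr;     /* the underlying data, opaque to the table */
    toy_dtor dtor; /* destruction function supplied at registration */
    int in_use;
} toy_resource;

/* Global resource table; a flat array replaces the hash table here. */
static toy_resource toy_resources[TOY_MAX_RESOURCES];

/* Registration: store the pointer and its destructor, hand back an id.
 * Returns -1 if the table is full. */
static int toy_resource_register(void *ptr, toy_dtor dtor) {
    assert(ptr != NULL); /* NULL would be indistinguishable from a failed fetch */
    for (int id = 0; id < TOY_MAX_RESOURCES; id++) {
        if (!toy_resources[id].in_use) {
            toy_resources[id] = (toy_resource){ptr, dtor, 1};
            return id;
        }
    }
    return -1;
}

/* Fetch: translate an id back into the underlying data, or NULL if unknown. */
static void *toy_resource_fetch(int id) {
    if (id < 0 || id >= TOY_MAX_RESOURCES || !toy_resources[id].in_use) {
        return NULL;
    }
    return toy_resources[id].ptr;
}

/* Destruction: call the user-supplied destructor, then drop the table entry.
 * Returns 0 on success, -1 if the id is not registered. */
static int toy_resource_destroy(int id) {
    if (id < 0 || id >= TOY_MAX_RESOURCES || !toy_resources[id].in_use) {
        return -1;
    }
    if (toy_resources[id].dtor) {
        toy_resources[id].dtor(toy_resources[id].ptr);
    }
    toy_resources[id] = (toy_resource){NULL, NULL, 0};
    return 0;
}
```

A caller might register a malloc'ed buffer with free as its destructor, pass only the returned id around, and call toy_resource_destroy(id) when finished. The table never needs to know what the pointer refers to, which is why the destructor has to come from the code that registered it.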

4.3 PHP variable management (reference counting and copy-on-write):

Reference counting is widely used in memory reclamation and string manipulation. A zval's reference counting is implemented through the members is_ref and refcount. With reference counting, several variables can share the same data, which avoids the heavy cost of frequent copying. On assignment, Zend points the variable at the same zval and increments refcount. On unset, the corresponding refcount is decremented. Only when refcount drops to 0 is the value actually destroyed. If the assignment is by reference, Zend sets is_ref to 1.

PHP variables share data through reference counting, so what happens when one of them is modified? When a variable is about to be written, if Zend finds that the zval it points to is shared by several variables, it makes a copy of the zval with a refcount of 1 and decrements the refcount of the original zval. This process is called "zval separation". Because Zend copies only when a write actually happens, the mechanism is also called copy-on-write.

For reference variables the requirement is the opposite of that for non-reference variables. Variables assigned by reference must stay bound together: modifying one of them modifies all the bound variables.
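Sharing on assignment and separation on write can be sketched in C as follows. The toy_shared and toy_var names are hypothetical, the value is a plain long, and reference assignment (is_ref) is left out so that only the ordinary copy-on-write path is shown.

```c
#include <assert.h>
#include <stdlib.h>

/* A value that may be shared by several variables. */
typedef struct {
    long value;
    unsigned int refcount; /* number of variables pointing at this value */
} toy_shared;

/* A variable is just a pointer to a (possibly shared) value. */
typedef struct {
    toy_shared *zv;
} toy_var;

/* Create a fresh value owned by one variable; returns 0 or -1. */
static int toy_var_init(toy_var *v, long value) {
    v->zv = malloc(sizeof *v->zv);
    if (!v->zv) return -1;
    v->zv->value = value;
    v->zv->refcount = 1;
    return 0;
}

/* Release one reference; destroy the value only when the count reaches 0. */
static void toy_var_unset(toy_var *v) {
    if (v->zv && --v->zv->refcount == 0) {
        free(v->zv);
    }
    v->zv = NULL;
}

/* Plain assignment: dst (assumed unset) shares src's value, nothing is copied. */
static void toy_var_assign(toy_var *dst, const toy_var *src) {
    assert(src->zv != NULL);
    dst->zv = src->zv;
    dst->zv->refcount++;
}

/* Write: if the value is shared, separate first, then modify the private copy. */
static int toy_var_write(toy_var *v, long value) {
    assert(v->zv != NULL);
    if (v->zv->refcount > 1) {
        toy_shared *copy = malloc(sizeof *copy);
        if (!copy) return -1;
        copy->value = v->zv->value;
        copy->refcount = 1;
        v->zv->refcount--; /* the original loses this variable's reference */
        v->zv = copy;
    }
    v->zv->value = value;
    return 0;
}
```

After toy_var_assign(&b, &a) both variables point at one value with a refcount of 2. A subsequent toy_var_write(&b, 5) gives b its own copy, leaves a untouched, and brings both refcounts back to 1.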

4.4 The implementation of PHP local variables and global variables:

How are local and global variables implemented in PHP? Within a request, PHP can see two symbol tables at any time (symbol_table and active_symbol_table). The former maintains the global variables. The latter is a pointer to the symbol table that is currently active. When the program enters a function, Zend allocates a symbol table x for it and points active_symbol_table to x. This is how global and local variables are kept apart.

Obtaining variable values: PHP's symbol table is implemented as a hash_table. Each variable is assigned a unique identifier, and a lookup uses that identifier to find the corresponding zval in the table and return it.

Using global variables in functions: Inside a function, a global variable can be used by declaring it explicitly with global. This creates, in active_symbol_table, a reference to the variable of the same name in symbol_table (because it is a reference, an update through either name is seen by both). If symbol_table has no variable with that name, it is created first.


5 PHP memory management

See reference address 2.

Origin blog.csdn.net/zy17822307856/article/details/112691167