Understanding the JIT in PHP 8


The JIT (Just In Time) compiler in PHP 8 is implemented as part of the Opcache extension. Its purpose is to compile certain Opcodes directly into CPU instructions at runtime.

This means that with JIT enabled, the Zend VM no longer needs to interpret those Opcodes; they are executed directly as CPU-level instructions.

JIT for PHP 8

The impact of the PHP 8 Just In Time (JIT) compiler is unquestionable. But until recently, I found that I knew very little about what the JIT is actually supposed to do.

After a lot of research and several false starts, I decided to read the PHP source code myself. Combining my limited knowledge of C with all the scattered information I had collected so far, I wrote this article. I hope it helps you understand the PHP JIT better.

To put it simply: When JIT works as expected, your code will not be executed by Zend VM, but directly executed as a set of CPU-level instructions.

This is the whole idea.

But to understand it better, we need to look at how PHP works internally. It is not very complicated, but it does need some introduction.

How is the PHP code executed?

It is well known that PHP is an interpreted language, but what does this sentence mean?

Every time PHP code (command line script or WEB application) is executed, it must go through a PHP interpreter. The most commonly used are PHP-FPM and CLI interpreters.

The job of the interpreter is simple: it receives PHP code, interprets it, and returns the result.

Most interpreted languages follow this general process. Some languages may skip a few steps, but the overall idea is the same. In PHP, the process is as follows:

  1. Read the PHP code and turn it into a sequence of keywords and symbols called Tokens. This lets the interpreter know which pieces of code appear in which part of the program. This step is called Lexing or Tokenizing.
  2. With the collection of Tokens in hand, the PHP interpreter tries to make sense of them. A process called Parsing generates an abstract syntax tree (AST). The AST is a set of nodes indicating which operations to perform. For example, "echo 1 + 1" actually means "print the result of 1 + 1" or, more specifically, "print an operation, and this operation is 1 + 1".
  3. With the AST, it is easier to understand operations and their precedence. Converting the abstract syntax tree into something the CPU can execute requires an intermediate representation (IR), which in PHP we call Opcodes. The process of converting the AST into Opcodes is called compilation.
  4. With Opcodes, here comes the interesting part: executing code! PHP has an engine called the Zend VM, which receives a list of Opcodes and executes them. After executing all Opcodes, the Zend VM terminates the program.
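Step 1 is easy to observe from PHP itself. The following is a minimal sketch that uses the built-in tokenizer functions `token_get_all()` and `token_name()`; the helper name `token_names()` is my own and not part of PHP.

```php
<?php
// Return the name of every token the lexer produces for a snippet.
// token_names() is a hypothetical helper written for this example.
function token_names(string $code): array
{
    $names = [];
    foreach (token_get_all($code) as $token) {
        // Most tokens are arrays of [id, text, line];
        // single characters such as "+" and ";" are plain strings.
        $names[] = is_array($token) ? token_name($token[0]) : $token;
    }
    return $names;
}

print_r(token_names('<?php echo 1 + 1;'));
```

The list should contain entries such as T_ECHO and T_LNUMBER, which is the raw material the parser turns into an AST in step 2.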

[Figure: a simplified overview of the PHP interpretation process.]

As you may have noticed, this raises a question: does PHP really have to go through this whole process on every execution, even when the code has not changed?

In the end, what we care about is the Opcodes, right? Right! This is why the Opcache extension exists.

Opcache extension

The Opcache extension ships with PHP, and there is usually no reason to disable it. If you use PHP, you should have Opcache enabled.

It adds a shared in-memory cache layer for Opcodes. Its job is to take the Opcodes freshly generated from the AST and cache them, so that later executions can skip the Lexing/Tokenizing and Parsing steps.

[Figure: PHP's interpretation flow with the Opcache extension. If a file has already been parsed, PHP fetches its cached Opcodes instead of parsing it again.]

This neatly skips the Lexing/Tokenizing, Parsing and Compiling steps.
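To see whether the cache is actually being used, you can ask Opcache for its statistics. This is a small sketch built on `opcache_get_status()`; the function name `opcache_summary()` is hypothetical, and the sketch assumes nothing about your setup beyond the extension possibly being absent or disabled.

```php
<?php
// Summarize Opcache's cache statistics, or return null when unavailable.
// opcache_summary() is a hypothetical helper written for this example.
function opcache_summary(): ?array
{
    if (!function_exists('opcache_get_status')) {
        return null; // the Opcache extension is not loaded
    }
    $status = @opcache_get_status(false); // false: skip per-script details
    if (!is_array($status)) {
        return null; // loaded, but disabled or restricted for this process
    }
    $stats = $status['opcache_statistics'] ?? [];
    return [
        'enabled'        => $status['opcache_enabled'] ?? false,
        'cached_scripts' => $stats['num_cached_scripts'] ?? 0,
        'hits'           => $stats['hits'] ?? 0,
        'misses'         => $stats['misses'] ?? 0,
    ];
}

var_export(opcache_summary());
echo PHP_EOL;
```

On the command line this will often print NULL, because Opcache is only active for the CLI when `opcache.enable_cli` is turned on.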

Side note: there is also the awesome preloading feature, added in PHP 7.4! It allows you to tell PHP-FPM to parse your code base, convert it to Opcodes and cache them before anything is executed.
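A preload script is just a PHP file that the `opcache.preload` ini setting points to. The sketch below assumes a project with a `src` directory next to the script; that layout, the path, and the helper `php_files_in()` are my assumptions for illustration. `opcache_compile_file()` is the real Opcache function that compiles and caches a file without running it.

```php
<?php
// Hypothetical preload script, referenced from php.ini via:
//   opcache.preload=/path/to/preload.php

// Collect every .php file below a directory (helper written for this example).
function php_files_in(string $dir): array
{
    $files = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile() && $file->getExtension() === 'php') {
            $files[] = $file->getPathname();
        }
    }
    sort($files);
    return $files;
}

$sourceDir = __DIR__ . '/src'; // assumed project layout
if (is_dir($sourceDir) && function_exists('opcache_compile_file')) {
    foreach (php_files_in($sourceDir) as $path) {
        opcache_compile_file($path); // compile and cache without executing
    }
}
```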

Are you wondering where the JIT fits into this interpretation process? That is what the rest of this article explains.

What is the effect of Just In Time compilation?

After listening to Zeev's explanation, it became clear to me what the JIT actually does.

If the Opcache extension lets PHP obtain Opcodes faster and hand them straight to the Zend VM, then the JIT lets them run without using the Zend VM at all.

The Zend VM is a program written in C that acts as a layer between Opcodes and the CPU. The JIT generates compiled code at runtime, so PHP can skip the Zend VM and have that code executed directly by the CPU. In theory, performance should be better.
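To make the idea of "a layer between Opcodes and the CPU" concrete, here is a toy interpreter loop written in PHP. It is only a model: the instruction format and the names `run_toy_vm`, `ADD`, `ECHO` and `T0` are invented for this sketch and do not reflect how the Zend VM stores or dispatches real Opcodes.

```php
<?php
// Toy model of a VM loop: every instruction goes through a dispatch step
// before any real work happens. A JIT removes this step for compiled code.
function run_toy_vm(array $opcodes): string
{
    $temps = [];   // temporary results, keyed by name
    $output = '';
    foreach ($opcodes as [$op, $a, $b, $result]) {
        switch ($op) {
            case 'ADD':
                $temps[$result] = $a + $b;
                break;
            case 'ECHO':
                $output .= $temps[$a];
                break;
            default:
                throw new InvalidArgumentException("Unknown opcode: $op");
        }
    }
    return $output;
}

// A hand-written program for "echo 1 + 1".
echo run_toy_vm([
    ['ADD', 1, 1, 'T0'],
    ['ECHO', 'T0', null, null],
]), PHP_EOL;
```

The `switch` is the cost the interpreter pays on every instruction; compiled machine code for the same program would perform the addition and the output without it.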

This sounds strange, because compiling to machine code means writing a specific implementation for each CPU architecture. But in fact it is quite reasonable.

PHP's JIT uses a library called DynASM (Dynamic Assembler), which maps a set of CPU instructions written in one specific format to assembly code for many different CPU types. The compiler therefore only needs to use DynASM to convert Opcodes into machine code for a specific architecture.

However, there is a problem that has bothered me for a long time.

If preloading can parse PHP code into Opcodes before execution, and DynASM can compile Opcodes into machine code (Just In Time compilation), why don't we simply compile PHP right away using Ahead of Time compilation?

One of the reasons, which I learned from listening to Zeev, is that PHP is a weakly typed language. This means that PHP usually does not know the type of a variable until the Zend VM tries to execute a given Opcode.

You can look at the zend_value union type to see that it holds many pointers to different kinds of values. Whenever the Zend VM tries to fetch a value from a zend_value, it uses a macro such as ZSTR_VAL to get at the string pointer stored in the union.

For example, consider the Zend VM handler for "less than or equal to" (<=) expressions. Look at how many if/else branches it contains, just for type inference.
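The effect of those branches is visible from userland. In this small sketch (the function name `less_or_equal()` is mine), the same `<=` expression performs a different kind of comparison depending on the types it receives at runtime, which is exactly what a compiler cannot know ahead of time.

```php
<?php
// One expression, several runtime behaviours depending on operand types.
function less_or_equal(mixed $a, mixed $b): bool
{
    return $a <= $b;
}

$pairs = [
    [1, 2],          // int and int: integer comparison
    [1.5, 2],        // float and int: numeric comparison
    ['10', '9'],     // two numeric strings: compared as numbers
    ['abc', 'abd'],  // non-numeric strings: compared as strings
];

foreach ($pairs as [$a, $b]) {
    printf(
        "%s (%s) <= %s (%s): %s\n",
        var_export($a, true),
        get_debug_type($a),
        var_export($b, true),
        get_debug_type($b),
        var_export(less_or_equal($a, $b), true)
    );
}
```

Note that '10' <= '9' is false because both strings are numeric, while 'abc' <= 'abd' is true under string comparison.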

Reproducing this type inference logic in machine code is not feasible and could end up even slower.

Compiling only after the types have been evaluated is not a great option either, because compiling to machine code is a CPU-intensive task. So compiling everything at runtime is also a bad idea.

So how does Just In Time compilation work?

Now we know that we cannot infer types well enough to compile ahead of time. We also know that compiling at runtime is computationally expensive. So how can the JIT benefit PHP?

To strike a balance, PHP's JIT tries to compile only the Opcodes that are worth it. To do this, the JIT analyzes the Opcodes the Zend VM is about to execute and checks which ones are worth compiling (depending on your configuration).
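The configuration in question lives in the Opcache ini settings. As a rough sketch, the following reads a few JIT-related directives with `ini_get()`; the helper `jit_settings()` is hypothetical, and I deliberately do not assume any particular values, since they depend on your PHP build and php.ini.

```php
<?php
// Read a few JIT-related ini directives; null means "not registered",
// for example because the Opcache extension is not loaded.
function jit_settings(): array
{
    $names = [
        'opcache.jit',             // JIT mode, e.g. "tracing" or "function"
        'opcache.jit_buffer_size', // memory reserved for compiled code
        'opcache.jit_hot_func',    // call count before a function counts as hot
        'opcache.jit_hot_loop',    // iteration count before a loop counts as hot
    ];
    $settings = [];
    foreach ($names as $name) {
        $value = ini_get($name);
        $settings[$name] = $value === false ? null : $value;
    }
    return $settings;
}

foreach (jit_settings() as $name => $value) {
    echo $name, ' = ', var_export($value, true), PHP_EOL;
}
```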

When a certain Opcode is compiled, execution is handed over to the compiled code instead of the Zend VM.

[Figure: PHP's interpretation flow with JIT. Opcodes that have been compiled are not executed by the Zend VM.]

So the Opcache extension contains a couple of instructions that detect whether a given Opcode should be compiled. If so, the compiler uses DynASM to convert this Opcode into machine code and then executes that machine code.

Interestingly, since the current implementation limits the amount of compiled code to a number of megabytes (which is also configurable), code execution must be able to switch seamlessly between JIT-compiled and interpreted code.
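You can check how much of that buffer is in use at runtime. This sketch reads the `jit` section that `opcache_get_status()` reports in PHP 8; the wrapper `jit_summary()` is my own name, and it returns null whenever Opcache or the JIT information is not available.

```php
<?php
// Report whether the JIT is active and how much of its buffer is free.
// jit_summary() is a hypothetical helper written for this example.
function jit_summary(): ?array
{
    if (!function_exists('opcache_get_status')) {
        return null; // the Opcache extension is not loaded
    }
    $status = @opcache_get_status(false);
    if (!is_array($status) || !isset($status['jit'])) {
        return null; // Opcache disabled, or no JIT information reported
    }
    $jit = $status['jit'];
    return [
        'enabled'     => $jit['enabled'] ?? false,
        'on'          => $jit['on'] ?? false,
        'buffer_size' => $jit['buffer_size'] ?? 0, // bytes reserved
        'buffer_free' => $jit['buffer_free'] ?? 0, // bytes still unused
    ];
}

var_export(jit_summary());
echo PHP_EOL;
```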

By the way, the talk by Benoit Jacquemont on PHP's JIT helped me understand the whole thing.

I am still not sure exactly when the compilation step takes place, but I don't think I really want to know for now.

So your performance gains may not be great

I hope it is now clear why most PHP applications will not see large performance gains from the just-in-time compiler. This is also why Zeev recommends profiling your application and experimenting with different JIT configurations as the best approach.

If you are using PHP-FPM, the compiled Opcodes will usually be shared between multiple requests, but this still is not a game changer.

This is because the JIT optimizes computationally intensive operations, and most PHP applications today are I/O bound more than anything else. If you have to access the disk or the network anyway, it hardly matters whether the processing code is compiled or not. The timings will be very similar.

unless…

You are doing something that is not I/O bound, like image processing or machine learning. Anything that does not touch I/O will benefit from the JIT compiler.
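If you want to try this yourself, a purely CPU-bound script is the easiest thing to measure. The following is a minimal sketch, not a rigorous benchmark: the workload (a naive recursive Fibonacci) and the input size are arbitrary choices of mine.

```php
<?php
// Naive recursive Fibonacci: pure computation, no disk or network access.
function fib(int $n): int
{
    return $n < 2 ? $n : fib($n - 1) + fib($n - 2);
}

$start = hrtime(true);                       // monotonic clock, nanoseconds
$result = fib(27);
$elapsedMs = (hrtime(true) - $start) / 1e6;  // convert to milliseconds

printf("fib(27) = %d in %.1f ms\n", $result, $elapsedMs);
```

Assuming the script is saved as bench.php (a hypothetical name) and the Opcache extension is loaded, you could compare `php -d opcache.enable_cli=1 -d opcache.jit_buffer_size=0 bench.php` against the same command with `-d opcache.jit_buffer_size=64M -d opcache.jit=tracing`. How large the difference is will depend on your machine and PHP build.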

This is also why people now say we are getting closer to writing native functions in PHP instead of C. If such functions end up being compiled anyway, the overhead becomes negligible.


Origin blog.csdn.net/qq_15915293/article/details/114029992