Tencent and China Literature Group Collaborate: Tars Microservice Framework Adds PHP Support

Liang Chen (Ted) works in the Technology Center of China Literature Group, where he is responsible for web backend development of the Qidian Chinese website. He previously developed the marketing QQ web backend and the QQ official account web backend in Tencent's Shanghai Enterprise Product Department, and has hands-on experience with and insight into the technical architecture of large-scale websites. He is a developer of the TSF 2.0 framework, one of Tencent's open source projects, the developer of Tencent's open source component Tars-PHP, and the developer and maintainer of several of Tencent's PHP extensions.

Introduction

TARS is the RPC framework and service deployment, operation and maintenance solution open sourced by Tencent, and China Literature Group has put it into production. China Literature Group has also extended TARS with PHP language support, broadening what TARS can do. The TARS-PHP solution is simple and efficient: interfaces are easy to maintain and extend, code is generated automatically, and addressing, service discovery, monitoring, and reporting are built in. It has been proven in China Literature Group's online business.

Project address: https://github.com/Tencent/Tars/tree/master/php

"PHP is the best language in the world"

As we all know, PHP was created for building web sites. For a long time, though, it has been unable to shake off its reputation for weak typing and scripting-language performance. As the Internet industry, user needs, and infrastructure keep changing, the PHP language has kept evolving too. Both the emergence of Swoole and the performance gains of PHP 7 have broadened where PHP can be applied.

Anyone who has worked with PHP in the middle tier of a web stack will have run into its pain points. Receiving HTTP requests from the front end and then calling various backend and storage services often becomes the performance bottleneck of a site. The redundancy of the HTTP protocol and the overhead added by upper-layer encapsulation are among the most prominent problems.

Developers have to deal not only with the throughput drop caused by synchronous HTTP client libraries, but also with the inefficiency of HTTP itself and of JSON and XML as wire formats. The answer is to use a compact binary protocol directly on top of TCP. Only then can the business carry more content in less bandwidth, improving throughput and the serving capacity of the web tier.

At the level of day-to-day development, the communication protocol between the PHP logic layer and the backend services is costly to maintain. When a backend service adds or modifies interface fields, the caller often has to change in step. In many cases full interface compatibility cannot be guaranteed, which leads to problems in production. The binary protocol should therefore make interfaces both easy to maintain and easy to extend.

From the perspective of development efficiency, the old way of working also involves a lot of unavoidable repetition. Each new protocol means new code that is hard to reuse, and JSON and XML give you no way to share part of a data definition. A very real problem, too, is that the providers of different HTTP interfaces tend to define them according to their own moods and habits.

A common example is the return code: some call it ret, some call it code, some call it r, and so on without end. This repetitive, uninteresting work places a real burden on the developers on the calling side. What is needed is a solution in which both server and client generate their calling code automatically from the protocol and interface definitions, so that joint debugging goes smoothly.

Service discovery and the reporting and monitoring of calls are another common problem. How backend services are discovered, and how their interfaces are discovered, is what the caller really wants to know. The caller also needs to report the outcome of each backend call to a central server, which can then adjust the load on backend services dynamically from the collected data to keep them highly available. Meeting these requirements takes a solution that integrates monitoring, addressing through the master, reporting channels, and load balancing.

Tars, Tencent's RPC framework and service deployment, operation and maintenance solution, meets all of the above requirements. First, with the complete Tars-PHP solution, developers can use the binary Tars protocol, which greatly reduces the traffic of service requests. Second, the PHP extension that parses the Tars protocol speeds up packing and unpacking, raising the task-processing capacity of a single process. Third, the tools that generate code automatically improve developer efficiency.

Tars-PHP solution

The open source solution of Tars-PHP starts with the binary protocol:

Binary protocol

HTTP is probably the most widely used application-layer protocol; the versions mainly in use are 1.0 and 1.1. It is a very concise application-layer encapsulation on top of TCP, and its plain-text content and its separation of header and body make it easy to use and understand. But ease of use and readability inevitably mean redundant information: transmitting a small amount of content often takes a large amount of traffic.

The other two well-known formats are JSON and XML, both widely used in API interaction. Their advantages are that they are highly readable, easy to understand, supported by client libraries in many languages, and very expressive. Let's first look at how much data each needs to carry the same piece of information.

Suppose there is a school with one student. Expressed in JSON:

{
    "school":
    {
        "student":{
            "name":"ted",
            "age":18,
            "degree":"master"
        }
    }
}

This very simple structure takes 64 characters to describe, even with all whitespace stripped. Switching to XML:

<school>
    <student>
        <name>ted</name>
        <age>18</age>
        <degree>master</degree>
    </student>
</school>

This takes 88 characters with whitespace stripped. From an information-theory point of view, the information density is clearly too low. To get higher communication performance with less bandwidth, a binary protocol is therefore imperative.
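As a rough check of the character counts discussed above, the sketch below measures both text forms with all whitespace stripped. The exact figures depend on how whitespace is counted, and the variable names are chosen only for illustration.

```php
<?php
// The same record as above, as a PHP array.
$record = ['school' => ['student' => ['name' => 'ted', 'age' => 18, 'degree' => 'master']]];

// Compact JSON: json_encode() emits no insignificant whitespace by default.
$json = json_encode($record);

// Compact XML, written by hand with the same element names and no whitespace.
$xml = '<school><student><name>ted</name><age>18</age><degree>master</degree></student></school>';

// The values alone, with no field names or markup at all.
$payload = 'ted' . '18' . 'master';

printf("JSON: %d bytes\n", strlen($json));           // 64
printf("XML: %d bytes\n", strlen($xml));             // 88
printf("values only: %d bytes\n", strlen($payload)); // 11
```

Only 11 of those bytes are the values themselves; everything else is field names and punctuation, which is the overhead a tagged binary encoding sets out to remove.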

As a binary protocol, Tars has clear advantages over both formats. The flexibility of JSON and XML shown above comes from not specifying the types of fields, but that flexibility inevitably costs a great deal of performance. Tars therefore defines eight basic data types and optimizes the encoding of each:

bool, byte, short, int, long, float, double, string

To meet business needs, three complex types are added that can nest data and enrich the expressiveness of the protocol: struct (containing arbitrary fields), vector (array), and map (key-value structure).

With these types, the example above can be described with structs. First, the student structure:

struct student {
    0 required string name; // tag 0, type string, value "ted": 1 head byte + 1 length byte + 3 data bytes = 5 bytes
    1 required byte age; // tag 1, type byte, value 18: 1 head byte + 1 data byte = 2 bytes
    2 required string degree; // tag 2, type string, value "master": 1 head byte + 1 length byte + 6 data bytes = 8 bytes
}

As the comments show, the three fields take 15 bytes; the markers for the start and end of the structure add 2 more, for 17 bytes in total. That is roughly 1/4 of the JSON form and 1/5 of the XML form of the same information, and the difference is clear at a glance. Trading the readability of the transmitted data for a schema that both sides are required to agree on is the trick that gives a binary protocol its high information density.
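The byte counts can be checked by hand-encoding the student record. The following is a minimal sketch, not the Tars-PHP extension: the type codes and the head-byte layout (tag in the high four bits, type in the low four) reflect my reading of the published Tars encoding and should be treated as assumptions, and every function name is hypothetical.

```php
<?php
// Type codes assumed from the Tars encoding description; verify before relying on them.
const TYPE_INT1 = 0;
const TYPE_STRING1 = 6;
const TYPE_STRUCT_BEGIN = 10;
const TYPE_STRUCT_END = 11;

// Head byte for tags below 15: high nibble is the tag, low nibble the type.
function tarsHead(int $tag, int $type): string
{
    return chr(($tag << 4) | $type);
}

// A byte field: head + one byte of data.
function tarsByte(int $tag, int $value): string
{
    return tarsHead($tag, TYPE_INT1) . chr($value & 0xFF);
}

// A short string (under 256 bytes): head + one-byte length + data.
function tarsString(int $tag, string $value): string
{
    return tarsHead($tag, TYPE_STRING1) . chr(strlen($value)) . $value;
}

// Wraps already-encoded fields in struct begin and end markers.
function tarsStruct(int $tag, string $fields): string
{
    return tarsHead($tag, TYPE_STRUCT_BEGIN) . $fields . tarsHead(0, TYPE_STRUCT_END);
}

$name = tarsString(0, 'ted');
$age = tarsByte(1, 18);
$degree = tarsString(2, 'master');
$student = tarsStruct(0, $name . $age . $degree);

printf("name=%d age=%d degree=%d total=%d\n",
    strlen($name), strlen($age), strlen($degree), strlen($student));
echo bin2hex($student), "\n";
```

Under these assumptions the record encodes to 17 bytes. No field name appears in the output; each field is identified only by its one-byte head.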

In order for PHP to be fully integrated with Tars, it must have the ability to act as a client and as a server.

Tars-PHP client

The client must support rapid development, fit in with common existing PHP usage, and come with examples of remote calls. Based on these requirements, the client solution implements the following:

  • a PHP extension, with test cases, for packing, unpacking, encoding and decoding with the TUP protocol;
  • the tars2php tool, which generates the corresponding PHP class files from Tars protocol files;
  • a wrapper around the network library, together with code examples of remote calls.

The core step of the client implementation is supporting the TUP protocol. TUP is a layer on top of the Tars protocol: it uses a fixed data structure to wrap the information needed to send and receive packets, such as return values, input and output parameters, the status of the packet itself, and the packet count. It is the protocol that clients which are not native Tars clients use to communicate with Tars servers. To support TUP, Tars-PHP chose to implement it as a PHP extension.

PHP is most often criticized for being inefficient at CPU-intensive work. Because of the not very efficient Zend virtual machine, loose data structures, and weak typing, CPU-intensive tasks such as packing and unpacking run slowly. This is where PHP extensions come in: by bringing in high-performance C/C++ libraries and native C/C++ implementations, they let PHP catch up on performance. That is the reason for implementing the main packing and unpacking logic as an extension.

Let's first take a look at the architecture of PHP 5.x:

[Figure: architecture of PHP 5.x]

At the bottom, the Server API (SAPI) is how PHP communicates with the web server, most commonly Apache. Above it on the left, the PHP core layer provides the most basic file and network operations. Above it on the right, Zend is the engine that compiles PHP scripts into opcodes and executes them. The top layer is the extension layer. An extension makes full use of the Zend API and the PHP core to provide native code that Zend can call directly, with no PHP script to compile and interpret for that work, which greatly improves execution performance.

Designing this extension means expressing the Tars data structures above in C and building an encoder and decoder on top of them. The extension must also be as simple to use as possible at the PHP level, which makes the design demanding: performance has to be taken into account on the one hand, and on the other a Struct in the Tars protocol has to be expressed as a Class in PHP:

[Figure: the Tars struct SimpleStruct expressed as a PHP class]

As the figure shows, the structure SimpleStruct is broken down into three parts:

  • the TAG section
  • the member variable section
  • the Fields section, which describes each variable

The TAG section is crucial: it holds the TAG value of each element in the Struct, and this value is what ends up in the binary packet when TUP encoding and decoding are actually performed. Why use a TAG? Because a TAG takes less space than the textual field name used in JSON.

The second part is the member variables of the class, which correspond one-to-one to the variables in the Tars Struct. They carry the actual values, so that real data can be packed and unpacked.

The third part, the Fields section, builds the bridge between TAGs and variables. It is a map from each TAG to the attributes of its variable: the variable's name, whether it is required, and its type. This information drives both the binary encoding of the Tars protocol and the mapping back during decoding, killing two birds with one stone.
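The three parts can be sketched as a plain PHP class. This is an illustration only, not the output of tars2php: the field names id, count and page come from the client example in this article, while the type names, the required flags, and the two helper functions are assumptions made so the sketch runs without the extension.

```php
<?php
// Hypothetical stand-in for a generated class; a real one would build on a base
// class supplied by the extension, which is left out so this runs on its own.
class SimpleStruct
{
    // Part 1: TAG values, one constant per field.
    const ID = 0;
    const COUNT = 1;
    const PAGE = 2;

    // Part 2: member variables carrying the actual values.
    public $id;
    public $count;
    public $page;

    // Part 3: Fields map from TAG to name, required flag and type.
    // The type names are plain strings chosen for illustration.
    public static $fields = [
        self::ID    => ['name' => 'id',    'required' => true,  'type' => 'long'],
        self::COUNT => ['name' => 'count', 'required' => true,  'type' => 'int'],
        self::PAGE  => ['name' => 'page',  'required' => false, 'type' => 'short'],
    ];
}

// Encoding direction: walk the Fields map in TAG order and pair each TAG
// with the type and current value of its member variable.
function fieldsByTag(SimpleStruct $s): array
{
    $out = [];
    $fields = SimpleStruct::$fields;
    ksort($fields);
    foreach ($fields as $tag => $meta) {
        $value = $s->{$meta['name']};
        if ($value === null && $meta['required']) {
            throw new InvalidArgumentException("missing required field {$meta['name']}");
        }
        $out[$tag] = [$meta['type'], $value];
    }
    return $out;
}

// Decoding direction: given TAG => value pairs, fill the matching members.
function fillFromTags(array $decoded): SimpleStruct
{
    $s = new SimpleStruct();
    foreach ($decoded as $tag => $value) {
        if (isset(SimpleStruct::$fields[$tag])) { // unknown TAGs are skipped
            $s->{SimpleStruct::$fields[$tag]['name']} = $value;
        }
    }
    return $s;
}

$ss = new SimpleStruct();
$ss->id = 1;
$ss->count = 2;
$ss->page = 3;
$byTag = fieldsByTag($ss);
$copy = fillFromTags([0 => 1, 1 => 2, 2 => 3, 9 => 'ignored']);
var_dump($copy == $ss);
```

Because lookups go through the TAG rather than the field name, a receiver in this sketch can ignore a TAG it does not know, which is one way a tagged format stays tolerant of added fields.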

With the extension designed and implemented, its packing and unpacking performance needs to be compared with that of a native PHP implementation. The following table shows clearly that the extension has a decisive performance advantage:

[Table: packing and unpacking performance, PHP extension versus native PHP]

The table shows that for both simple and complex Tars structures, packing and unpacking with the extension is more than ten times faster than native PHP. When complex business logic has to call many backend services over the Tars protocol, this improvement can raise the throughput of the service by an order of magnitude.

Once the extension is compiled and installed, developers can use the TUP protocol to pack and unpack, encode and decode:

// Packing and unpacking of basic types; put* returns a binary buffer.
// The * stands for the type name.
$buf = \TUPAPI::put*($name, $value);
$value = \TUPAPI::get*($name, $buf);

// Struct: pass the object in; the result comes back as an array whose
// elements correspond one-to-one to the member variables of the class
$buf = \TUPAPI::putStruct($name, $clazz);
$result = \TUPAPI::getStruct($name, $clazz, $buf);

// Vector: pass in a Vector that has been filled with pushBack
$buf = \TUPAPI::putVector($name, TARS_Vector $clazz);
$value = \TUPAPI::getVector($name, TARS_Vector $clazz, $buf);

// Map: pass in a Map that has been filled with pushBack
$buf = \TUPAPI::putMap($name, TARS_Map $clazz);
$value = \TUPAPI::getMap($name, TARS_Map $clazz, $buf);

// Collect the packed buffers above so they can be encoded together
$inbuf_arr[$name] = $buf;
// Encode with the TUP protocol; the result can be transmitted or persisted
$reqBuffer = \TUPAPI::encode(
                         $iVersion=3,
                         $iRequestId,
                         $servantName,
                         $funcName,
                         $cPacketType=0,
                         $iMessageType=0,
                         $iTimeout,
                         $context=[],
                         $statuses=[],
                         $inbuf_arr);
// Decode with the TUP protocol
$ret = \TUPAPI::decode($respBuffer);
$code = $ret['code'];
$msg = $ret['msg'];
$buf = $ret['sBuffer'];

Because an IDE cannot find the functions and parameters defined inside an extension, tars-ide-helper is also provided to enable auto-completion.

Taking PhpStorm as an example, you only need to add it to the corresponding include path to get auto-completion.

Besides packing and unpacking, Tars-PHP also provides network send and receive. The network layer mainly consists of the following:

  • the TarsAssistant.php file, loaded through Composer, with a built-in native socket network layer for sending and receiving packets;
  • PHP classes generated automatically from the Interface, integrating seamlessly with TarsAssistant;
  • fault handling through exceptions.

Once the code has been generated, a remote Tars service can be called with code like the following:

    require_once "./vendor/autoload.php";

    $ip = "";   // IP of the Tars service
    $port = 0;  // port of the Tars service
    $servant = new App\Server\Servant\servant($ip, $port);

    $in1 = "test";
    $ss1 = new SimpleStruct();
    $ss1->id = 1;
    $ss1->count = 2;
    $ss1->page = 3;

    try {
        // $out1 is the output parameter of the call
        $intVal = $servant->singleParam($in1, $ss1, $out1);
    }
    catch (phptars\TarsException $e) {
        // error handling
    }

Tars-PHP server

Besides the client, Tars-PHP also needs to work as a server. To cover different business scenarios, Tars-PHP focuses on two types of service on the server side.

The first type is the HTTP service. It uses Swoole 2.0 for network send and receive to build a high-performance, easy-to-use framework for web services. The framework will support common web framework features such as basic routing, middleware, and MVC architecture. It will also integrate common clients such as Redis, MySQL, HTTP, Multicall, and Tars, so that web services can call backend services easily. More importantly, connecting to the Tars platform lets services be monitored and restarted, with all the one-stop convenience of the Tars operation and maintenance platform. The first version of the framework has been implemented and is in use at China Literature Group; it will be open sourced once it has matured through testing.

The second type is the TCP service. It also builds on Swoole 2.0, but the protocol changes from HTTP to TUP and Tars. The framework will be consistent with the Java and C++ server sides, with network handling built into the bottom layer, so users only need to care about the service name, the interface parameters, and their own business logic. This service, too, must be integrated with the Tars operation and maintenance platform. The first version of the framework, with TUP support, is complete; once underlying support for the Tars protocol is finished, it will be used and verified in production.

Business practice

China Literature Group used the Tars-PHP solution in the course of reorganizing and transforming its backend services. On the one hand, all interfaces between the web backend and the backend services were switched from HTTP to TCP transport based on the Tars protocol. Automatic code generation in Tars-PHP greatly improved development efficiency and kept the project launch on schedule. The PHP extension also keeps code execution efficient, and the processing time of a single request is significantly shorter than with the original HTTP interface calls.

On the other hand, the web backend services are implemented on Swoole and stay resident in memory, so publishing, starting, and monitoring them works differently from the traditional Apache and PHP-FPM setups. As mentioned above, connecting to the Tars platform and using its monitoring, keep-alive, and logging features therefore makes the services much easier to operate, maintain, and scale. Today more than ten HTTP services have been cut over and run stably on the Tars platform in production. Their release, scaling, and operation and maintenance rely entirely on the Tars platform, which is very convenient.

Besides using the Tars platform for operation and maintenance, China Literature's web backend also has a solution for service discovery.

For managing remote service addresses, the worst solution is to write them to a local file. It cannot cope with rapid capacity expansion or with servers going offline, and it creates a lot of follow-up operation and maintenance work.

A slightly better solution is to store a virtual IP locally, so that changing what the virtual IP maps to is enough to change the service address. But this means storing the virtual IP, host information, and interface information for every backend service to be called, and the maintenance cost is still high.

A more general solution is a unified configuration center. Whenever a backend service needs to be called, its latest address is pulled from the configuration center by its unique identifier. This makes scaling in and out invisible to the business, and it lets the configuration center hand each client the most suitable server address based on where the request comes from, for example the same machine room or the nearest SET. The local side only needs two capabilities. The first is to call the addressing service periodically and keep the result in local storage, so that addressing stays fast. The second is to accept commands from the configuration center to update the address of a specific service. With these two in place, addressing is both efficient and reliable.
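The two local capabilities can be sketched as a small cache. This is a hypothetical illustration, not Tars-PHP code: the class name, the $fetch callable standing in for the call to the configuration center, the service name, and the addresses are all invented, and the current time is passed in explicitly so the behavior is easy to follow.

```php
<?php
// Hypothetical local address cache; $fetch stands in for the request to the
// configuration center and is not a real Tars API.
class AddressCache
{
    private $fetch;
    private $ttl;
    private $entries = []; // service id => ['addrs' => [...], 'at' => timestamp]

    public function __construct(callable $fetch, int $ttl = 60)
    {
        $this->fetch = $fetch;
        $this->ttl = $ttl;
    }

    // Capability 1: serve from local storage, pulling again once the entry is stale.
    public function lookup(string $service, int $now): array
    {
        $entry = $this->entries[$service] ?? null;
        if ($entry === null || $now - $entry['at'] >= $this->ttl) {
            $entry = ['addrs' => ($this->fetch)($service), 'at' => $now];
            $this->entries[$service] = $entry;
        }
        return $entry['addrs'];
    }

    // Capability 2: accept a pushed update for one service from the center.
    public function onPush(string $service, array $addrs, int $now): void
    {
        $this->entries[$service] = ['addrs' => $addrs, 'at' => $now];
    }
}

$pulls = 0;
$cache = new AddressCache(function (string $service) use (&$pulls): array {
    $pulls++;
    return ['10.0.0.1:10000', '10.0.0.2:10000']; // made-up addresses
});

$first = $cache->lookup('App.Server.Obj', 0);    // pulls from the center
$second = $cache->lookup('App.Server.Obj', 30);  // served from the local copy
$cache->onPush('App.Server.Obj', ['10.0.0.3:10000'], 40);
$third = $cache->lookup('App.Server.Obj', 50);   // sees the pushed address
$fourth = $cache->lookup('App.Server.Obj', 100); // stale again, pulls once more
printf("pulled %d times for 4 lookups\n", $pulls);
```

The cache here lives in a PHP object for simplicity; a long-running deployment would more likely keep it in shared memory so that all worker processes can read it.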

In actual use, adapted to the business, the approach is twofold. First, the address list of each service is requested from the master every minute, an available address is chosen by polling, and the result is kept in fast local shared memory so that it can be read repeatedly within that minute. Second, the latency and success rate of every service call are reported automatically at the bottom layer. With both in place, calls to remote services are no longer as hard to maintain, develop, and monitor as they used to be, but are managed clearly and efficiently.
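A minimal sketch of the second half of this approach follows, assuming that "polling" means round-robin selection over the cached addresses. The class names, addresses, and the simulated failure are all hypothetical, and the counters are kept in memory rather than sent to a real reporting channel.

```php
<?php
// Hypothetical helpers, not part of Tars-PHP: a round-robin picker over cached
// addresses and a wrapper that records latency and outcome for every call.
class RoundRobin
{
    private $addrs;
    private $next = 0;

    public function __construct(array $addrs)
    {
        $this->addrs = array_values($addrs);
    }

    public function pick(): string
    {
        $addr = $this->addrs[$this->next % count($this->addrs)];
        $this->next++;
        return $addr;
    }
}

class CallStats
{
    public $total = 0;
    public $failed = 0;
    public $elapsedMs = 0.0;

    // Runs $call against $addr, timing it and counting failures; the exception
    // is rethrown so the caller's own error handling still applies.
    public function track(string $addr, callable $call)
    {
        $start = microtime(true);
        $this->total++;
        try {
            return $call($addr);
        } catch (Exception $e) {
            $this->failed++;
            throw $e;
        } finally {
            $this->elapsedMs += (microtime(true) - $start) * 1000;
        }
    }

    public function successRate(): float
    {
        return $this->total === 0 ? 1.0 : ($this->total - $this->failed) / $this->total;
    }
}

$picker = new RoundRobin(['10.0.0.1:10000', '10.0.0.2:10000']); // made-up addresses
$stats = new CallStats();
$seen = [];
for ($i = 0; $i < 4; $i++) {
    try {
        $seen[] = $stats->track($picker->pick(), function (string $addr) use ($i) {
            if ($i === 3) {
                throw new RuntimeException('simulated timeout');
            }
            return $addr; // a real call would send the request to $addr
        });
    } catch (RuntimeException $e) {
        // A real client would retry or surface the error here.
    }
}
printf("success rate %.2f over %d calls\n", $stats->successRate(), $stats->total);
```

Putting the bookkeeping in one wrapper means business code never has to remember to report a call, which matches the idea of integrating reporting at the bottom layer.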

Epilogue

In terms of development efficiency, Tars-PHP removes redundant business code and automates more of the development work through code generation.

In terms of performance, the Tars-PHP solution gains a great deal from its extension, so that performance is no longer PHP's Achilles' heel.

In terms of ease of use, the TarsAssistant network component means packet sending and receiving does not have to be implemented separately. Later, the higher-performance Swoole will be introduced for socket send and receive to further improve network performance.

The server side of Tars-PHP will also be open sourced as soon as possible, to provide a complete solution covering both client and server. This complete Tars-PHP development system for web backends aims at high performance, high efficiency, and high availability. China Literature Group will continue to work with Tencent on Tars-PHP. Developers are welcome to try it out!
