C++ Conference | Hang Seng's Zhang Yan: Exploring the Architecture and Key Technologies of Ultra-Fast High-Frequency Trading Systems in the Financial Industry

http://rdcqii.hundsun.com/portal/article/UFT-614.html

On October 28, 2016, the C++ and System Software Technology Conference was held in Shanghai. Bjarne Stroustrup, the father of C++, and many technology leaders and front-line practitioners in software engineering from China and abroad were invited to take part.

Zhang Yan, chief architect at Hang Seng Electronics and an expert at the Hang Seng Research Institute, gave a talk titled "Exploring the Architecture and Key Technologies of Ultra-Fast High-Frequency Trading Systems in the Financial Industry".


[Why Do We Need Ultra-Fast Trading?]

A general-purpose trading system is enough for the ordinary needs of retail investors, but professional investors place much higher demands on speed and service: for example, for index arbitrage, for their trading strategies, and for market-data services.

As professional investors' requirements for speed and service kept rising, traditional database-backed systems could no longer keep up. A revolutionary change was needed to meet the needs of a new era, and so ultra-fast trading came into being.

Professional investors and institutional clients use in-memory trading systems, in which market data and fills (trade executions) are pushed to them; ordinary and retail investors use database-backed trading systems, in which market data and fills are obtained by polling.
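The difference between the two delivery modes can be sketched in C++. This is a minimal illustration, not Hang Seng's API: the `Fill`, `PushFeed`, and `PollFeed` names are hypothetical. In the push model the subscriber runs inside `publish()`, so there is no waiting; in the poll model a fill sits in a buffer until the client next asks, so its latency depends on the polling interval.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical fill (execution report); fields are illustrative only.
struct Fill {
    std::string account;
    long qty;
    double price;
};

// Push model (in-memory systems): subscribers are invoked as soon as a fill is published.
class PushFeed {
public:
    using Handler = std::function<void(const Fill&)>;
    void subscribe(Handler h) { handlers_.push_back(std::move(h)); }
    void publish(const Fill& f) {
        for (auto& h : handlers_) h(f);  // delivery happens inside publish()
    }
private:
    std::vector<Handler> handlers_;
};

// Poll model (database-backed systems): fills wait until the client asks for them.
class PollFeed {
public:
    void publish(const Fill& f) { pending_.push_back(f); }
    // Returns everything accumulated since the last poll.
    std::vector<Fill> poll() {
        std::vector<Fill> out(pending_.begin(), pending_.end());
        pending_.clear();
        return out;
    }
private:
    std::deque<Fill> pending_;
};
```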


[Joy] Newborn: First-Generation UFT
* Note: UFT (Ultra Fast Trading) refers to Hang Seng's ultra-fast trading system.

In 2008, the first wave of demand came from the futures market. At that time we had only a general idea and no well-developed architectural design, but we tried to move away from traditional database practices. We found that the data related to any one shareholder account is limited in size, so it can be organized in advance for lookup, with no need to re-sort it at query time.
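The idea of organizing each account's data in advance can be sketched as follows, assuming all positions of one account are kept together in memory and keyed by account ID. The names (`Account`, `Position`, `AccountStore`, `freeze_for_sell`) are hypothetical. A pre-trade check becomes two hash lookups instead of a database query that has to search and sort.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical per-account record: all data for one shareholder account lives together.
struct Position {
    long available = 0;  // quantity that can be sold
    long frozen = 0;     // quantity reserved by pending orders
};

struct Account {
    double cash = 0.0;
    std::unordered_map<std::string, Position> positions;  // keyed by instrument code
};

class AccountStore {
public:
    Account& open(const std::string& id) { return accounts_[id]; }

    // Sell-side check: one hash lookup finds the account, a second finds the position.
    // No scan or re-sort is needed because the data was organized per account in advance.
    bool freeze_for_sell(const std::string& id, const std::string& instrument, long qty) {
        auto acc = accounts_.find(id);
        if (acc == accounts_.end()) return false;
        auto pos = acc->second.positions.find(instrument);
        if (pos == acc->second.positions.end() || pos->second.available < qty) return false;
        pos->second.available -= qty;
        pos->second.frozen += qty;
        return true;
    }

private:
    std::unordered_map<std::string, Account> accounts_;
};
```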

With the first-generation UFT, the latency of futures order placement dropped from 20 ms to 100 µs, a 200-fold reduction (more than two orders of magnitude), which can fairly be called a revolutionary change.

At the same time, it created some new problems:
● Everything was hand-written, which placed high demands on developers: they needed to know both C/C++ and the business domain;
● Development efficiency was low, and new business features took a long time to build;
● Business logic and data were tightly coupled, which hurt system stability and made troubleshooting difficult;
● Scalability was poor: supporting the basic needs of other business scenarios meant starting from scratch.


[Pleasure] Growth: Second-Generation UFT

In 2012, stock index futures appeared, along with various quantitative approaches and program trading. Beyond futures companies, a large number of fund companies and securities firms began doing quantitative investment and needed quantitative trading systems. As application scenarios multiplied, the original approach ran into new problems.

This placed high demands on developers' technical and business skills, so we designed the whole system more carefully, encapsulating what should be encapsulated and layering what should be layered. Following Hang Seng's tradition, we built our own development tool to reduce development effort. As is well known, C++ has long relied on the compiler to deliver ease of use; since we cannot change the compiler, we improve ease of use through development tools instead. The UFT development tools offer advantages in the following areas.

Hang Seng has always followed the rigor and regulatory requirements of the financial industry. The ultra-fast trading system complies strictly with them in its organizational structure, and the data model was designed with them in mind.

The second-generation UFT grew quickly and solved many problems:
◕ An encapsulated in-memory database with transaction support, indexing, and persistence, applicable to a wide range of business scenarios;
◕ Trading units: data is pre-associated and organized as a tree, which narrows the search scope, and in some cases no lookup is needed at all;
◕ Development tools: higher development efficiency and fewer errors.
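One common way for an in-memory database to support transactions is an undo log: before each write, record the old value (or its absence), and on rollback restore those values in reverse order. The sketch below assumes that technique; it is not UFT's actual implementation, the `MemTable` name is hypothetical, and indexing and persistence are left out.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical in-memory table (key -> value) with single-transaction undo logging.
class MemTable {
public:
    void begin() { undo_.clear(); in_txn_ = true; }

    void put(std::int64_t key, std::int64_t value) {
        if (in_txn_) {
            auto it = rows_.find(key);
            // Save the prior state (or its absence) so rollback can restore it.
            undo_.emplace_back(key, it == rows_.end()
                                        ? std::nullopt
                                        : std::optional<std::int64_t>(it->second));
        }
        rows_[key] = value;
    }

    std::optional<std::int64_t> get(std::int64_t key) const {
        auto it = rows_.find(key);
        if (it == rows_.end()) return std::nullopt;
        return it->second;
    }

    void commit() { undo_.clear(); in_txn_ = false; }

    void rollback() {
        // Undo in reverse order so that, for a key written several times,
        // the value saved before the first write is the one left in place.
        for (auto it = undo_.rbegin(); it != undo_.rend(); ++it) {
            if (it->second) rows_[it->first] = *it->second;
            else rows_.erase(it->first);
        }
        undo_.clear();
        in_txn_ = false;
    }

private:
    std::unordered_map<std::int64_t, std::int64_t> rows_;
    std::vector<std::pair<std::int64_t, std::optional<std::int64_t>>> undo_;
    bool in_txn_ = false;
};
```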


[Third Generation] UFT: New Challenges

In 2013, Hang Seng took on requirements from the New Third Board and from HKEx. Exchange requirements are comparatively strict: for example, failover must be faster, and no data may be lost. We therefore adopted an architecture of multicast-based active-active redundancy with a single sequencing process. When an order arrives, it is sent to two or more servers, which process it at the same time, and the output of whichever server finishes first is sent out; this brings a certain performance advantage. Some businesses also require strict ordering, because processing the same orders in a different sequence produces different results.

This raises a problem: sequencing. Everything is lined up in a single sequence, handed to each node for processing, and the results are then sent out. With this single-threaded model, when something goes wrong in the system, it is easier to find the cause and a fix.
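The sequencing scheme can be sketched in a single process, assuming three parts: a sequencer that stamps every order with a gap-free global sequence number, replicas that apply orders strictly in that order on one thread (so all replicas compute identical results), and a gate that forwards only the first result seen for each sequence number, i.e. the fastest replica's. All names are hypothetical, and multicast transport and failover are omitted. The gate uses a high-water mark, which assumes each replica emits its results in sequence order.

```cpp
#include <cstdint>
#include <vector>

struct Order { std::uint64_t seq; long qty; };
struct Result { std::uint64_t seq; long balance_after; };

// Stamps every incoming order with a gap-free global sequence number.
class Sequencer {
public:
    Order stamp(long qty) { return {next_++, qty}; }
private:
    std::uint64_t next_ = 1;
};

// A replica is a deterministic state machine: same input order -> same outputs.
class Replica {
public:
    Result apply(const Order& o) {
        balance_ += o.qty;  // stand-in for real business logic
        return {o.seq, balance_};
    }
private:
    long balance_ = 0;
};

// Forwards the first result for each sequence number and drops later duplicates.
// Assumes each replica emits results in sequence order, so a high-water mark suffices.
class FirstWins {
public:
    bool accept(const Result& r) {
        if (r.seq <= last_) return false;
        last_ = r.seq;
        return true;
    }
private:
    std::uint64_t last_ = 0;
};

// Feeds results from several replicas (in arrival order) through the gate.
std::vector<Result> forward(const std::vector<Result>& arrivals) {
    FirstWins gate;
    std::vector<Result> out;
    for (const Result& r : arrivals)
        if (gate.accept(r)) out.push_back(r);
    return out;
}
```

Because every replica sees the same sequence, it does not matter which one answers first; that determinism is also what makes the single-threaded model easy to debug.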

Exchanges emphasize the principles of openness, fairness, and justice, so this strictly ordered model is comparatively easy to apply to exchange systems. To improve reliability, we use two network cards. Because processing is single-threaded, the time spent on each message must be as short as possible to raise overall performance. At that time, the minimum requirement for matching was 40,000 orders per second; without doing any optimization, not even network optimization, we reached 70,000.
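Because every message passes through one thread, throughput $R$ and the per-message processing time are reciprocal. Assuming the figures above are orders per second, and ignoring batching and parallel groups, they translate into these per-order time budgets:

```latex
t_{\text{msg}} \le \frac{1}{R}, \qquad
\frac{1}{40\,000\ \text{s}^{-1}} = 25\ \mu\text{s}, \qquad
\frac{1}{70\,000\ \text{s}^{-1}} \approx 14.3\ \mu\text{s}
```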

Of course, the most important part is the queuing mechanism: messages are lined up in sequence and then multicast. The advantage is that the core can be divided into several groups, each with its own responsibility: for example, one group does only matching, while another group takes the matching output and publishes market data externally. Later, our R&D team extended this architecture from the rather specific exchange scenario to many more business scenarios, and began using batching to speed up data processing.
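A downstream group and batching can be sketched together. Assumptions: the matching group's output is modeled as a queue of `Trade` records, and `MarketDataGroup` and `pump` are hypothetical names. `pump` drains up to `max_batch` trades per call, so fixed per-wakeup costs (for example a lock acquisition or a system call in a real transport) are paid once per batch rather than once per message.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

struct Trade { long price; long qty; };

// Hypothetical market-data group: consumes matching output, tracks last price and volume.
class MarketDataGroup {
public:
    void on_trades(const std::vector<Trade>& batch) {
        for (const Trade& t : batch) {
            last_price_ = t.price;
            volume_ += t.qty;
        }
    }
    long last_price() const { return last_price_; }
    long volume() const { return volume_; }
private:
    long last_price_ = 0;
    long volume_ = 0;
};

// Drains up to max_batch trades from the matching group's output queue and hands
// them to the market-data group in one call. Returns how many were processed.
std::size_t pump(std::deque<Trade>& queue, MarketDataGroup& md, std::size_t max_batch) {
    std::vector<Trade> batch;
    while (!queue.empty() && batch.size() < max_batch) {
        batch.push_back(queue.front());
        queue.pop_front();
    }
    if (!batch.empty()) md.on_trades(batch);
    return batch.size();
}
```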


[Seeking] More Breakthroughs

Performance is an eternal theme. In ultra-fast trading, Hang Seng has already achieved order-of-magnitude improvements, but we keep asking where further breakthroughs are possible. We believe that larger gains require, first, improving CPU memory-access performance; second, reducing network latency; and third, using dedicated FPGA hardware.


01 | Improving CPU memory-access performance

>> Reducing cache latency

Reducing cache latency depends on the machine architecture. Server architectures fall into three kinds: SMP (symmetric multiprocessing), MPP (massively parallel processing), and NUMA (non-uniform memory access). The simplest is SMP, in which multiple CPUs share memory. On NUMA systems, each CPU should access its own memory exclusively as far as possible, because sharing leads to contention.
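On Linux, one common way to keep a thread's data in its local NUMA memory is to pin the thread to a CPU and then let that thread allocate and first touch its buffers, because the default policy places each page on the node of the CPU that first touches it. This sketch assumes Linux with glibc (`sched_getaffinity`, `pthread_setaffinity_np`); the function names are hypothetical, and libnuma, which offers explicit per-node allocation, is not used.

```cpp
#include <pthread.h>
#include <sched.h>
#include <cstddef>
#include <cstring>
#include <memory>

// Pins the calling thread to the first CPU it is currently allowed to run on.
// Returns that CPU number, or -1 on failure. Linux/glibc only.
int pin_to_first_allowed_cpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (!CPU_ISSET(cpu, &allowed)) continue;
        cpu_set_t one;
        CPU_ZERO(&one);
        CPU_SET(cpu, &one);
        if (pthread_setaffinity_np(pthread_self(), sizeof(one), &one) != 0) return -1;
        return cpu;
    }
    return -1;
}

// Allocates a buffer and touches every byte from the (already pinned) calling thread.
// Under Linux's default first-touch policy, the pages land on this CPU's NUMA node.
std::unique_ptr<unsigned char[]> local_buffer(std::size_t bytes) {
    std::unique_ptr<unsigned char[]> buf(new unsigned char[bytes]);
    std::memset(buf.get(), 0, bytes);
    return buf;
}
```

Calling `pin_to_first_allowed_cpu()` at thread start and only then `local_buffer(...)` keeps both the thread and its working set on one node.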
Second, the CPU has internal caches. The first-level and second-level caches differ in three ways: speed; whether they are private or shared; and capacity, with the L1 cache relatively small and the L2 cache relatively large. This bears on how data is laid out for the cache: designing for a higher hit rate improves performance. Cache size seriously affects performance and, in turn, shapes how we design our data structures.
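Data layout is one of the simplest levers on hit rate. If a hot loop reads only one field, storing that field contiguously (structure of arrays) fills every cache line with useful data, while an array of structures drags the cold fields along. In this sketch the `OrderAoS` fields are hypothetical; on a typical LP64 platform `sizeof(OrderAoS)` is 64 bytes, so with 64-byte lines each `qty` read in the first loop costs a whole line for 8 useful bytes.

```cpp
#include <cstddef>
#include <vector>

// Array of structures: each order's fields are adjacent, so scanning one field
// also pulls the other fields of every order into the cache.
struct OrderAoS {
    long id;
    long price;
    long qty;
    char account[40];  // cold data, rarely read in the hot loop
};

// Structure of arrays: each field is contiguous, so a scan over qty touches only qty.
struct OrdersSoA {
    std::vector<long> id, price, qty;
};

long total_qty(const std::vector<OrderAoS>& orders) {
    long sum = 0;
    for (const OrderAoS& o : orders) sum += o.qty;  // stride = sizeof(OrderAoS)
    return sum;
}

long total_qty(const OrdersSoA& orders) {
    long sum = 0;
    for (long q : orders.qty) sum += q;  // stride = sizeof(long): dense cache lines
    return sum;
}
```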

>> Cache coherency protocol
Cache coherency protocols are similar in spirit to the technique used in Oracle RAC. Each CPU core has its own cache, and keeping the data in these caches consistent requires a complex cache coherency protocol.
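A practical consequence of coherency is false sharing: if two cores repeatedly write different variables that sit in the same cache line, the protocol bounces that line between them. Giving each per-thread counter its own line avoids this. The sketch assumes a 64-byte line (`kCacheLine` is a hard-coded assumption; C++17's `std::hardware_destructive_interference_size` is not available on every toolchain), and the names are hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache-line size (typical for x86-64).
constexpr std::size_t kCacheLine = 64;

// Each counter occupies its own cache line, so writes from different cores
// do not invalidate each other's lines.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};

// Runs one thread per counter; each thread increments only its own counter.
long count_in_parallel(std::size_t threads, long per_thread) {
    std::vector<PaddedCounter> counters(threads);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threads; ++t) {
        workers.emplace_back([&counters, t, per_thread] {
            for (long i = 0; i < per_thread; ++i)
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    long total = 0;
    for (const auto& c : counters) total += c.value.load();
    return total;
}
```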

>> Main memory access

Beyond the L1 and L2 caches there is main memory. Programs access memory through virtual addresses (virtual memory), which are mapped onto physical memory. These mappings are cached in the TLB; if this is handled poorly, TLB misses cause performance jitter.
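One common way to cut TLB misses for large, hot in-memory tables is to back them with huge pages: on x86-64 a 2 MiB page lets one TLB entry cover 512 times as much memory as a 4 KiB page. This sketch assumes Linux and asks for transparent huge pages with `madvise(MADV_HUGEPAGE)`, which is only a hint; the function names are hypothetical. Only 2 MiB-aligned subranges can be huge-page backed, so a production version would over-allocate and align the start address.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstring>

// Typical x86-64 huge-page size (an assumption; other platforms differ).
constexpr std::size_t kHugePage = 2u << 20;

// Maps an anonymous region rounded up to whole huge pages and asks the kernel
// to back it with transparent huge pages. If THP is disabled, the region still
// works with ordinary 4 KiB pages.
void* alloc_hot_region(std::size_t bytes, std::size_t* mapped) {
    std::size_t len = (bytes + kHugePage - 1) / kHugePage * kHugePage;
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return nullptr;
    (void)madvise(p, len, MADV_HUGEPAGE);  // advisory only: failure is ignored
    std::memset(p, 0, len);                // pre-fault now, not on the trading path
    *mapped = len;
    return p;
}

void free_hot_region(void* p, std::size_t mapped) { munmap(p, mapped); }
```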


02 | Reducing network latency

Network latency is an even more serious problem than memory-access latency, easily reaching tens of microseconds. As is well known, switching between user mode and kernel mode is very time-consuming. With the best network cards available today, a round trip takes only about two microseconds. The key to how they achieve this is eliminating switches between user mode and kernel mode, while also reducing memory-copy and thread-switching overhead, which lowers network access latency.
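Kernel-bypass stacks avoid system calls by letting a user-space thread busy-poll rings shared with the network card instead of sleeping in the kernel. Vendor APIs are not shown here; this sketch only illustrates the same polling pattern with a lock-free single-producer/single-consumer ring between two threads. `SpscRing`, `consume_spinning`, and `run_demo` are hypothetical names, and the 64-byte alignment is an assumed cache-line size.

```cpp
#include <atomic>
#include <cstddef>
#include <optional>
#include <thread>

// Single-producer/single-consumer ring. The consumer busy-polls instead of
// blocking, so handing over a message involves no system call and no sleep.
template <typename T, std::size_t N>
class SpscRing {
    static_assert((N & (N - 1)) == 0, "N must be a power of two");
public:
    bool try_push(const T& v) {
        std::size_t head = head_.load(std::memory_order_relaxed);
        if (head - tail_.load(std::memory_order_acquire) == N) return false;  // full
        slots_[head & (N - 1)] = v;
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }
    std::optional<T> try_pop() {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T v = slots_[tail & (N - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // free the slot
        return v;
    }
private:
    T slots_[N]{};
    alignas(64) std::atomic<std::size_t> head_{0};  // written by the producer only
    alignas(64) std::atomic<std::size_t> tail_{0};  // written by the consumer only
};

// Busy-polls until `count` messages have been received and returns their sum.
template <std::size_t N>
long consume_spinning(SpscRing<long, N>& ring, std::size_t count) {
    long sum = 0;
    for (std::size_t got = 0; got < count;) {
        if (auto v = ring.try_pop()) { sum += *v; ++got; }
        // otherwise spin; a production loop might add a CPU pause hint here
    }
    return sum;
}

// Demo: one producer thread pushes 1..n while the calling thread spins on the ring.
long run_demo(long n) {
    SpscRing<long, 1024> ring;
    std::thread producer([&ring, n] {
        for (long i = 1; i <= n; ++i)
            while (!ring.try_push(i)) {}  // ring full: spin until the consumer catches up
    });
    long sum = consume_spinning(ring, static_cast<std::size_t>(n));
    producer.join();
    return sum;
}
```

The acquire/release pairs guarantee that the consumer sees a slot's contents only after the producer has written them, and that the producer never overwrites a slot the consumer is still reading.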


03 | Dedicated FPGA hardware

FPGAs improve performance through hardware parallelism, but they are considerably more complex to implement, so it is impossible to put all business logic on the FPGA. An FPGA interacts with the host through the PCI interface. Building on its overall architecture, Hang Seng has also done some research on FPGAs: businesses with stricter low-latency requirements are candidates for the FPGA, while the rest will still be implemented in software.
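The hybrid split can be sketched as a router that sends only the business types marked latency-critical to an FPGA path and keeps everything else in software. This is an illustration under assumptions: `ExecutionPath`, `FpgaPath`, `SoftwarePath`, and `Router` are hypothetical, and `FpgaPath` is a stub standing in for whatever driver would talk to the card over PCI.

```cpp
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical execution path; a real FPGA path would wrap a vendor PCI driver.
struct ExecutionPath {
    virtual ~ExecutionPath() = default;
    virtual std::string name() const = 0;
};
struct SoftwarePath : ExecutionPath {
    std::string name() const override { return "software"; }
};
struct FpgaPath : ExecutionPath {
    std::string name() const override { return "fpga"; }
};

// Routes each business type: only types listed as latency-critical are offloaded.
class Router {
public:
    explicit Router(std::unordered_set<std::string> fpga_types)
        : fpga_types_(std::move(fpga_types)) {}
    const ExecutionPath& route(const std::string& business_type) const {
        if (fpga_types_.count(business_type)) return fpga_;
        return software_;
    }
private:
    std::unordered_set<std::string> fpga_types_;
    SoftwarePath software_;
    FpgaPath fpga_;
};
```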

Origin www.cnblogs.com/dhcn/p/12105812.html