How does Xianyu's real-time reach system handle cross-end development?

  Earlier articles in this series on Omega, Xianyu's real-time reach system, described its three subsystems in detail: the behavior collection center, the CEP rule center, and the user reach center. Xianyu defines its own DSL (domain-specific language), which turns complex code development into concise SQL-like expressions. Under the hood, each expression is translated into one of the high-level languages used on the device side, the front end, and the cloud, such as Python, C++, JavaScript, and Java. Writing rules as DSL expressions both lowers the technical threshold and improves development efficiency. This leads to the question this article addresses: how is a custom DSL translated into several different underlying high-level languages?

Problems encountered in DSL language translation

  Omega implements a complex event processing (CEP) engine not only in the cloud but also on the device side and in the front end. Compared with the cloud, which can compute over cross-user behavior, the device-side and front-end CEP engines focus on a single user's behavior, and they are more real-time and more secure. Because the CEP engines are implemented differently on each end, developers have been confined to their own end. Cross-end development carries a high technical threshold and an unpredictable time cost. We therefore propose a custom DSL that hides the technical differences between ends. Ideally, developers should focus only on business logic and spend no effort on other technical details, as shown in the red part of the figure below.

  The CEP engines on each end differ mainly in four areas: input data, CEP calculation API, execution container, and result output. Input data: the device side, front end, and cloud process different data. For example, the device side and front end can see that a user has stayed on a page for 5 s, but the cloud cannot, so the custom DSL must accommodate per-end differences at the data input level. CEP calculation API: the basic APIs designed on each end can differ, much as i=i+1 and i++ express the same operation in different ways. Without a unified protocol specification, the APIs tend to grow in an uncontrolled way, which makes unified translation harder later. Execution container: the device side and front end run on Walle, Alibaba's on-device computing container, while the cloud runs on Blink, Alibaba's stream computing container. Result output: because every end feeds the same user reach center, the result protocol is essentially the same on each end. With these differences clarified, the core work items are:

Input data protocol: accommodate the differences in input data across ends;

CEP calculation API: define one basic protocol, so that translation can be unified and the API surface stays under control;

Translation framework selection and translation: support translation into the high-level language of each end, so that capabilities can be upgraded and iterated consistently across ends;

Shielding execution container differences: resolve the mapping between execution containers and the CEP engines on each end;

Design and Implementation of DSL Language Translation

Unification of input data

  For differences in input data across ends, a common industry practice is to build a shared data template layer that hides the differences, with each end registering its own input data instances as needed. This gives the custom DSL a single, unified input. During translation to each end's language, the input is converted according to the instances that end has registered, which must conform to the template. We take the same approach. The input data protocol template we defined is as follows:

{
    "eventAlias": "event alias",
    "eventCode": "PUBLISH_ITEM",
    "eventDesc": "the seller's detail page was viewed",
    "eventTime": "event time",
    "updateTime": "event update time",
    "partitionId": "partition id",
    "userId": "user id",
    "extraInfo": {
        "itemId": "item id",
        "buyerId": "buyer id",
        "sellerId": "seller id",
        "itemType": "item type",
        "itemStatus": "item status",
        "categoryId": "category id",
        "latitude": "latitude",
        "longitude": "longitude",
        ...:...
    },
    "scene": "scene",
    "fromScene": "previous scene",
    "toScene": "next scene",
    "isFirstEnter": "whether this is the first entry",
    "bizId": "unique id",
    "sessionId": "session id",
    "actionType": "action type",
    "actionName": "action identifier",
    "ownerName": "骆彬"
}
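  The registration step described above could look roughly like the following. This is a minimal sketch, not Omega's implementation: the class EventTemplateRegistry, its methods, the choice of required fields, and the PAGE_STAY event code are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a shared template names the fields every event must carry;
// each end registers its own event instances, which are checked against the template.
class EventTemplateRegistry {
    // Assumed subset of required top-level fields from the template above.
    static final Set<String> REQUIRED = Set.of("eventCode", "eventTime", "userId", "scene");

    // End name (for example "device" or "cloud") -> event codes registered by that end.
    private final Map<String, List<String>> registered = new LinkedHashMap<>();

    // Returns the required fields missing from the instance, sorted by name.
    static List<String> missingFields(Map<String, Object> event) {
        List<String> missing = new ArrayList<>();
        for (String field : REQUIRED) {
            if (!event.containsKey(field)) {
                missing.add(field);
            }
        }
        Collections.sort(missing);
        return missing;
    }

    // Registers the instance for the given end only if it conforms to the template.
    boolean register(String end, Map<String, Object> event) {
        if (!missingFields(event).isEmpty()) {
            return false;
        }
        registered.computeIfAbsent(end, k -> new ArrayList<>())
                  .add(String.valueOf(event.get("eventCode")));
        return true;
    }

    // Event codes a given end has registered; empty if the end registered nothing.
    List<String> eventsFor(String end) {
        return registered.getOrDefault(end, Collections.emptyList());
    }
}
```

  With a registry like this, a page-stay event would be registered only under the device side, so a translator targeting the cloud could reject a rule that refers to it.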

Unification of CEP calculation API

  For unifying the CEP calculation APIs across ends, a relatively mature protocol specification in the industry is that of Flink CEP. Its split of basic operations is reasonable, and each end accepts it readily. Based on the Flink CEP specification, we therefore defined a calculation API protocol for the Xianyu CEP engine, and each end implements the concrete APIs according to it. The protocol is as follows:

public static <X> Pattern<X, X> begin(final String name);
public Pattern<T, F> where(IterativeCondition<F> condition);
public Pattern<T, F> or(IterativeCondition<F> condition);
public Pattern<T, F> until(IterativeCondition<F> untilCondition);
public Pattern<T, F> within(Time windowTime);
public Pattern<T, T> next(final String name);
public Pattern<T, T> notNext(final String name);
public Pattern<T, T> followedBy(final String name);
public Pattern<T, T> notFollowedBy(final String name);
public Pattern<T, T> followedByAny(final String name);
public Pattern<T, F> times(int times);
public Pattern<T, F> allowCombinations();
public Pattern<T, F> consecutive();
public static <T, F extends T> GroupPattern<T, F> begin(Pattern<T, F> group);
public GroupPattern<T, F> next(Pattern<T, F> group);
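  To show how a rule reads when written against such a protocol, here is a heavily simplified, self-contained stand-in. It is only a sketch: MiniPattern is hypothetical, covers just begin, where, followedBy and within, uses plain predicates and a millisecond window instead of IterativeCondition and Time, and assumes events arrive in time order. ITEM_VIEWED is an invented event code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical, simplified stand-in for the protocol above (not Flink CEP itself).
class MiniPattern {
    private final List<String> names = new ArrayList<>();
    private final List<Predicate<Map<String, Object>>> conditions = new ArrayList<>();
    private long windowMillis = Long.MAX_VALUE;

    static MiniPattern begin(String name) {
        MiniPattern p = new MiniPattern();
        p.addStage(name);
        return p;
    }

    private void addStage(String name) {
        names.add(name);
        conditions.add(e -> true); // accepts any event until where() narrows it
    }

    // AND-combines the condition into the most recently added stage.
    MiniPattern where(Predicate<Map<String, Object>> condition) {
        int last = conditions.size() - 1;
        conditions.set(last, conditions.get(last).and(condition));
        return this;
    }

    // Relaxed contiguity: unrelated events may sit between two stages.
    MiniPattern followedBy(String name) {
        addStage(name);
        return this;
    }

    MiniPattern within(long millis) {
        this.windowMillis = millis;
        return this;
    }

    // True if all stages match in order and the first and last matched events
    // are no more than windowMillis apart. Events must be sorted by eventTime.
    boolean matches(List<Map<String, Object>> events) {
        for (int start = 0; start < events.size(); start++) {
            if (!conditions.get(0).test(events.get(start))) {
                continue;
            }
            int stage = 1;
            int lastIdx = start;
            // Taking the earliest match for each later stage minimises the span.
            for (int i = start + 1; i < events.size() && stage < names.size(); i++) {
                if (conditions.get(stage).test(events.get(i))) {
                    stage++;
                    lastIdx = i;
                }
            }
            if (stage == names.size()
                    && time(events.get(lastIdx)) - time(events.get(start)) <= windowMillis) {
                return true;
            }
        }
        return false;
    }

    private static long time(Map<String, Object> event) {
        return ((Number) event.get("eventTime")).longValue();
    }

    // Convenience factory for a template-shaped event with only two fields set.
    static Map<String, Object> event(String code, long timeMillis) {
        Map<String, Object> e = new HashMap<>();
        e.put("eventCode", code);
        e.put("eventTime", timeMillis);
        return e;
    }

    public static void main(String[] args) {
        MiniPattern rule = MiniPattern.begin("publish")
                .where(e -> "PUBLISH_ITEM".equals(e.get("eventCode")))
                .followedBy("view")
                .where(e -> "ITEM_VIEWED".equals(e.get("eventCode")))
                .within(600_000L); // 10 minutes
        List<Map<String, Object>> events = new ArrayList<>();
        events.add(event("PUBLISH_ITEM", 0L));
        events.add(event("ITEM_VIEWED", 60_000L));
        System.out.println(rule.matches(events));
    }
}
```

  The fluent shape is the point here: because every end exposes the same method names, a translator only has to emit one call chain per rule, whatever the target language.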

Translation framework and implementation

  With the input data and the CEP calculation API unified, we can design the translation from the custom DSL to the unified CEP calculation API. Because the CEP engine has an implementation on each end, the translation framework must support multiple target languages. Translation frameworks commonly used in the industry include ANTLR 4, parboiled, and Apache Calcite. Their characteristics are compared in the following table:

  Weighing these characteristics, ANTLR 4 supports our multi-language translation needs well and is convenient to develop with, so we chose it. From the custom DSL grammar and the unified CEP calculation API, we design a set of grammar files. ANTLR 4 then generates the DSL parser and the abstract syntax tree (AST). Finally, taking each end's characteristics into account, we translate the AST nodes into the target high-level language. The general process is shown in the figure below:
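  The last step, walking one AST and emitting code for several targets, can be sketched without any parser generator. Everything below is an assumption for illustration: the AST is built by hand rather than by an ANTLR-generated parser, RuleStep and the two emitters are hypothetical, and the emitted text only mimics the shape of the unified API (the Python-style method names in particular are invented).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical AST node: one step of a rule, with a single equality
// condition on an event field.
class RuleStep {
    final String name;
    final String field;
    final String value;

    RuleStep(String name, String field, String value) {
        this.name = name;
        this.field = field;
        this.value = value;
    }
}

// One emitter per target language; all of them walk the same AST.
interface TargetEmitter {
    String emit(List<RuleStep> steps);
}

// Emits a Java-flavoured call chain (illustrative text only).
class JavaStyleEmitter implements TargetEmitter {
    @Override
    public String emit(List<RuleStep> steps) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < steps.size(); i++) {
            RuleStep s = steps.get(i);
            sb.append(i == 0 ? "Pattern.begin(\"" : ".followedBy(\"")
              .append(s.name).append("\")");
            sb.append(".where(e -> \"").append(s.value)
              .append("\".equals(e.get(\"").append(s.field).append("\")))");
        }
        return sb.toString();
    }
}

// Emits a Python-flavoured call chain for the same AST (names are assumed).
class PythonStyleEmitter implements TargetEmitter {
    @Override
    public String emit(List<RuleStep> steps) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < steps.size(); i++) {
            RuleStep s = steps.get(i);
            sb.append(i == 0 ? "Pattern.begin(\"" : ".followed_by(\"")
              .append(s.name).append("\")");
            sb.append(".where(lambda e: e[\"").append(s.field)
              .append("\"] == \"").append(s.value).append("\")");
        }
        return sb.toString();
    }
}

class DslTranslationSketch {
    public static void main(String[] args) {
        // Hand-built AST standing in for the parser's output.
        List<RuleStep> ast = Arrays.asList(
                new RuleStep("publish", "eventCode", "PUBLISH_ITEM"),
                new RuleStep("view", "eventCode", "ITEM_VIEWED"));
        System.out.println(new JavaStyleEmitter().emit(ast));
        System.out.println(new PythonStyleEmitter().emit(ast));
    }
}
```

  Keeping the AST independent of any target means that adding another end comes down to adding another emitter, while the grammar and parser stay unchanged.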

Shielding the execution container

  The execution containers and the CEP engines on each end are in a one-to-many relationship. To handle it, we introduced the concept of a DSL rule type: each rule type is associated with its execution container, so developers need not be aware of the underlying container. We also built a DSL editor with auxiliary functions such as grammar and event hints, a review workflow, resource management, and result queries, to give developers a friendly experience.
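  The association between rule type and container can be as small as a lookup table. A minimal sketch follows; the enum and class names are hypothetical, and only the Walle and Blink assignments come from the description earlier in this article.

```java
import java.util.EnumMap;
import java.util.Map;

// Containers named in the article.
enum ExecutionContainer { WALLE, BLINK }

// Hypothetical DSL rule types, one per end.
enum RuleType { DEVICE, FRONT_END, CLOUD }

class ContainerResolver {
    private static final Map<RuleType, ExecutionContainer> BY_TYPE =
            new EnumMap<>(RuleType.class);

    static {
        // One container can serve several engines, hence the one-to-many mapping.
        BY_TYPE.put(RuleType.DEVICE, ExecutionContainer.WALLE);
        BY_TYPE.put(RuleType.FRONT_END, ExecutionContainer.WALLE);
        BY_TYPE.put(RuleType.CLOUD, ExecutionContainer.BLINK);
    }

    // The developer picks a rule type; the container follows from it.
    static ExecutionContainer resolve(RuleType type) {
        return BY_TYPE.get(type);
    }
}
```

  A rule author would then only ever state the rule type, and deployment code would call resolve to decide where the translated rule runs.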

Actual application effect

  Omega's translation scheme has been tested in production during Double Eleven, and it has clearly lowered the technical threshold and improved development efficiency. Because the custom DSL hides the per-end implementation differences, anyone with some SQL experience can start developing quickly and focus on business logic. In our Double Eleven experience, business rule development that previously took about a week now takes 1-2 hours with the custom DSL, and a single DSL business rule takes about 10 minutes on average, a several-fold gain in development efficiency.

Follow-up development plan

  Omega's development ecosystem has reached a certain scale. It has produced a series of core protocol standards and provides a simple, efficient, integrated environment for development and operations. The device side and the cloud are already connected, and the front end will be connected next, extending Omega to more scenarios. The technical details of translation for each end will be covered in later articles. In addition, building on the translator's core protocol standards, we will keep strengthening the Xianyu DSL, and we plan to release the protocol standards and a mature translator product externally.

