Front-end reporting | A full-link solution for data analysis and efficiency improvement

Introduction: Practical takeaways from a Front-End Early Chat live session

Author: Xianyu Technology - Yunting (Cloud Listening)

Background introduction

Xianyu was established in 2014 and has since grown from zero to 10 million DAU. As the business has grown rapidly, the way business decisions are made has changed too, from purely experience-driven to more rigorous, data-driven decision-making. Being data-driven requires a great deal of data analysis and data report development. Across the data analysis pipeline, there are several R&D pain points:

  • BI resources are scarce and slow to respond
  • SQL queries are slow, with long waiting times
  • Joint debugging between the front end and the server side is costly
  • The data is highly complex, which makes it hard to spot valuable information visually

[Figure: slide 002, "How to quickly calculate and analyze business big data visualization"]

Business status

[Figure: slide 003, "How to quickly calculate and analyze business big data visualization"]


The current data analysis process is divided into three parts:

  • SQL development
  • Application development
  • Data visualization: the front end renders the analysis report

Completing one full development cycle takes 5 days or more on average. Let's go through the problems at each R&D stage and see whether they can be optimized or solved.

 

SQL development

SQL development is handled by BI engineers. Because BI work lacks engineering abstraction, every new data request means writing SQL from scratch, even though the underlying logic of most data analysis requests is similar. Nothing is reused, so overall efficiency is very low. Can engineering abstraction be brought to SQL? I abstracted the SQL into individual atomic SQL fragments. The front end only needs to specify the rules for assembling atomic SQL. An assembly layer combines the fragments into the final SQL query string, which is then executed to obtain the desired results. In this way, recurring SQL is captured once and reused, which greatly reduces the time spent on repeated development.

In addition, because the queries run over more than 100 million records, each request takes tens of minutes or more. The user experience is very poor, and problem analysis slows down. Our goal is to cut that wait from tens of minutes to seconds. Alibaba Cloud offers an analytical database product that can return query results within seconds in big data computing scenarios, and we can use it directly.
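
The atomic SQL idea can be sketched roughly as follows. This is a minimal illustration, not the article's actual implementation: the `AtomicSql` and `AssemblyRule` shapes, the `user_visit` table, the `ds` partition column, and the fragment names are all assumptions made up for the example.

```typescript
// Hypothetical shapes: the article does not show its real atomic SQL format.
interface AtomicSql {
  table: string;  // table the fragment reads from
  select: string; // one aggregate expression with an alias
}

interface AssemblyRule {
  atoms: string[];   // names of the atomic SQL fragments to combine
  partition: string; // date partition, assumed to be a `ds` column in yyyymmdd form
  groupBy?: string;  // optional grouping column
}

// Illustrative registry; table and column names are made up.
const atomRegistry: Record<string, AtomicSql> = {
  dau: { table: "user_visit", select: "COUNT(DISTINCT user_id) AS dau" },
  pv: { table: "user_visit", select: "COUNT(*) AS pv" },
};

// Combine the fragments named in a rule into one SQL string.
// This sketch only handles fragments that read the same table.
function assembleSql(rule: AssemblyRule, registry: Record<string, AtomicSql>): string {
  const atoms = rule.atoms.map((name) => {
    const known = Object.prototype.hasOwnProperty.call(registry, name);
    const atom = known ? registry[name] : undefined;
    if (!atom) throw new Error("unknown atomic SQL: " + name);
    return atom;
  });
  const first = atoms[0];
  if (!first) throw new Error("rule must name at least one atom");
  if (atoms.some((a) => a.table !== first.table)) {
    throw new Error("atoms must share one table in this sketch");
  }
  // Validate values that end up inside the SQL text, since they come from the client.
  if (!/^\d{8}$/.test(rule.partition)) throw new Error("invalid partition");
  if (rule.groupBy !== undefined && !/^[A-Za-z_]\w*$/.test(rule.groupBy)) {
    throw new Error("invalid groupBy column");
  }
  const columns = atoms.map((a) => a.select);
  if (rule.groupBy !== undefined) columns.unshift(rule.groupBy);
  let sql = "SELECT " + columns.join(", ") + " FROM " + first.table +
    " WHERE ds = '" + rule.partition + "'";
  if (rule.groupBy !== undefined) sql += " GROUP BY " + rule.groupBy;
  return sql;
}

const exampleSql = assembleSql(
  { atoms: ["dau", "pv"], partition: "20201020", groupBy: "city" },
  atomRegistry,
);
```

With this shape, a new report only needs a new rule. The fragments themselves are written once and reused.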

 

Application development

Application development is handled by server-side engineers who build applications in Java. The main work is API wrapping and data assembly. The front-end role has broadened from "page slicer" (someone who only turns designs into pages) into a very wide field, and the boundaries of what we can do keep expanding. As the saying known as Atwood's Law goes: "Any application that can be written in JavaScript will eventually be written in JavaScript." So why not build this application layer as Serverless directly and implement it with FaaS?

 

Data visualization

Front-end engineers are responsible for the final data visualization. The server side agrees with the front end on the format of the returned data, and by default the front end displays everything the server returns. This causes a problem, especially in Xianyu's business scenarios. Take one case: a trend chart showing fluctuations in data indicators may involve dozens or even hundreds of indicators. If the front end renders all of those curves at once, the user sees a dense tangle of colors and can hardly pick out any valuable fluctuation in it. Can the front end do something here to dig into the details? Can it extract the more significant information in the data, the part most likely to help the business, and show that to the user directly?

 

Technical solutions

[Figure: slide 004, "How to quickly calculate and analyze business big data visualization"]


Based on the thinking above, I designed a technical solution with 3 layers:

  • Data layer
  • Service layer (FaaS)
  • Presentation layer

The overall logic is as follows. The presentation layer sends a DSL (a structured description) to the FaaS layer. The FaaS layer runs it through SQL assembly and generation logic and submits the final SQL to the data layer. The data layer returns the query results to the FaaS layer, which processes the data and passes it back to the front end for visual display.
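
The request flow across the three layers might look roughly like this. It is a sketch under stated assumptions: the `ReportDsl` shape, the `Layers` interface, and the stub data are hypothetical, and the calls are synchronous only for brevity (a real FaaS function querying a database would be asynchronous).

```typescript
// Hypothetical DSL shape; the article only calls it "a structured description".
interface ReportDsl {
  metrics: string[]; // indicators to compute
  dimension: string; // column to break the indicators down by
}

type Row = Record<string, string | number>;

// Each step is injected, so a layer can be swapped or stubbed independently.
interface Layers {
  buildSql: (dsl: ReportDsl) => string; // FaaS layer: SQL assembly and generation
  runQuery: (sql: string) => Row[];     // data layer: analytical database query
  processRows: (rows: Row[]) => Row[];  // FaaS layer: data processing
}

// DSL in, chart-ready rows out. Synchronous for brevity only.
function handleReport(dsl: ReportDsl, layers: Layers): Row[] {
  const sql = layers.buildSql(dsl);
  const rows = layers.runQuery(sql);
  return layers.processRows(rows);
}

// Stub layers with made-up data, standing in for the real services.
const stubLayers: Layers = {
  buildSql: (dsl) =>
    "SELECT " + [dsl.dimension].concat(dsl.metrics).join(", ") + " FROM demo_table",
  runQuery: () => [{ ds: "20201020", dau: 100 }],
  processRows: (rows) => rows,
};

const reportRows = handleReport({ metrics: ["dau"], dimension: "ds" }, stubLayers);
```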


Looking at the details, the data layer's capabilities revolve around two goals: providing business data that meets analysis needs, and making all underlying data easier to manage. This part depends on collaboration with BI engineers. The data is stored in two places. The first is the offline data tables: the raw data is computed offline, scheduled tasks clean it, and the results are stored in offline data tables. The second is the analytical database: the offline data tables are synchronized into it through data integration.


The FaaS layer provides basic monitoring capabilities, including access control and log management, defines the standard query API parameters, and handles data processing. The reusable parts of the application logic are abstracted into an SDK. The SDK implementation has 4 key points:

  • A data pipeline mode. Why this design? The whole data analysis process breaks down into three major parts: data acquisition, data processing, and data visualization. The whole link resembles data flowing through a pipeline. To make the data flow more flexible and pluggable, a pipeline-style data processing mode was designed.
  • The knowledge base. When the application initializes, it automatically identifies all underlying data sets, classifies their data types, and generates the corresponding semantic information. The knowledge base is the cornerstone of the whole solution and supports the data processing and semantic transformation described later.
  • The SQL generator, mentioned above. I abstracted the SQL written by BI engineers into atomic-granularity fragments and exposed a DSL to the front end for describing the assembly rules. After assembly, a layer of dynamic compilation and AST syntax analysis produces the final SQL string.
  • Data processing. The input data of a front-end visualization chart can be abstracted. Here we convert the SQL query results into the standard data source format of the front-end charts. The conversion is abstracted away and can be treated as a black box, so the front end does not need to handle a variety of data formats.
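
A pluggable pipeline of the kind described in the first point could be sketched like this. The `DataPipeline` class and its `use`, `remove`, and `execute` methods are hypothetical names for illustration, not the SDK's real API.

```typescript
// A stage takes the data produced so far and returns the data for the next stage.
type Stage<T> = (input: T) => T;

class DataPipeline<T> {
  private stages: Array<{ name: string; run: Stage<T> }> = [];

  // Plug a stage in at the end of the pipeline; returns `this` for chaining.
  use(name: string, run: Stage<T>): this {
    this.stages.push({ name: name, run: run });
    return this;
  }

  // Unplug a stage by name, leaving the other stages untouched.
  remove(name: string): this {
    this.stages = this.stages.filter((stage) => stage.name !== name);
    return this;
  }

  // Run every stage in order, feeding each output into the next stage.
  execute(input: T): T {
    return this.stages.reduce((data, stage) => stage.run(data), input);
  }
}

// Example: two small processing stages over a list of numbers.
const pipeline = new DataPipeline<number[]>()
  .use("dropNegatives", (xs) => xs.filter((x) => x >= 0))
  .use("sortAscending", (xs) => xs.slice().sort((a, b) => a - b));

const cleaned = pipeline.execute([3, -1, 2]);
```

Because stages are registered by name, a step such as semantic transformation can be added or removed without touching the others.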


The capability currently provided by the presentation layer is a data dashboard, built on smart chart rendering at the bottom layer. Smart chart rendering has two parts:

  • Based on Xianyu's business scenarios, define a series of visualization charts that highlight fluctuations in business information and render valuable business data, and add them to our abstracted chart library
  • Introduce algorithm capabilities, such as intelligent data extraction and saliency extraction, which make valuable information stand out for front-end users and de-emphasize information of little business value during rendering

Service layer FaaS

[Figure: slide 006, "How to quickly calculate and analyze business big data visualization"]


The service layer is built on Midway FaaS, the Serverless development framework provided by the group. A series of BaaS basic services are abstracted at the bottom layer, including database query, permission verification, log reporting, and the knowledge base.

After the front-end input is parsed, the DSL is split into several partial DSLs. Each partial DSL describes specific SQL assembly rules, and dynamic compilation logic runs on each one. Dynamic compilation produces intermediate SQL strings. AST syntax tree analysis is then performed on these intermediate products to determine which atomic SQL statements query the same table and the same partition but different fields. Those statements are merged, and the result is the final SQL string. The SQL string is passed to the database query service to obtain the query results.
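
The merge step can be illustrated with a simplified sketch. Note the substitution: instead of parsing intermediate SQL into an AST as the article describes, this example starts from already-structured descriptors (the hypothetical `AtomicQuery` shape) and only shows the merging rule, same table and same partition with the field lists unioned. The `ds` partition column is also an assumption.

```typescript
// Hypothetical descriptor for one atomic SQL after analysis.
interface AtomicQuery {
  table: string;
  partition: string;
  fields: string[];
}

// Merge queries that read the same table and partition into one query whose
// field list is the de-duplicated union, keeping first-seen order.
function mergeAtomicQueries(queries: AtomicQuery[]): AtomicQuery[] {
  const byKey: Record<string, AtomicQuery> = {};
  const result: AtomicQuery[] = [];
  queries.forEach((q) => {
    const key = q.table + "|" + q.partition;
    let merged = byKey[key];
    if (!merged) {
      merged = { table: q.table, partition: q.partition, fields: [] };
      byKey[key] = merged;
      result.push(merged);
    }
    const target = merged;
    q.fields.forEach((field) => {
      if (target.fields.indexOf(field) === -1) target.fields.push(field);
    });
  });
  return result;
}

// Render one merged descriptor as SQL (assumes a `ds` partition column).
function renderSql(q: AtomicQuery): string {
  return "SELECT " + q.fields.join(", ") + " FROM " + q.table +
    " WHERE ds = '" + q.partition + "'";
}

const mergedQueries = mergeAtomicQueries([
  { table: "user_visit", partition: "20201020", fields: ["dau"] },
  { table: "user_visit", partition: "20201020", fields: ["pv", "dau"] },
  { table: "orders", partition: "20201020", fields: ["gmv"] },
]);
```

Merging this way means the database scans each table partition once, rather than once per indicator.
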

In Xianyu's business scenarios, the most commonly used charts are X-axis/Y-axis business charts, such as distribution charts and trend charts, and Excel-style detail tables backed by two-dimensional arrays. Data processing is responsible for converting the SQL query results into the data formats these charts expect.
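
As an illustration of this conversion, the sketch below turns flat query rows into the two shapes mentioned: per-indicator series for X/Y charts and a two-dimensional array for a detail table. The `Series` shape and the function names are assumptions; the article does not specify the standard data source format of its chart library.

```typescript
type Row = Record<string, string | number>;

// Hypothetical chart input: one named series of (x, y) points per indicator.
interface Series {
  name: string;
  points: Array<{ x: string; y: number }>;
}

// Convert flat SQL rows into one series per indicator column for an X/Y chart.
function toSeries(rows: Row[], xField: string, yFields: string[]): Series[] {
  return yFields.map((yField) => ({
    name: yField,
    points: rows.map((row) => ({ x: String(row[xField]), y: Number(row[yField]) })),
  }));
}

// Convert the same rows into a two-dimensional array, header row first,
// for an Excel-style detail table. Missing cells become empty strings.
function toTable(rows: Row[], columns: string[]): Array<Array<string | number>> {
  const header: Array<string | number> = columns.slice();
  const body = rows.map((row) =>
    columns.map((column) => {
      const value = row[column];
      return value === undefined ? "" : value;
    }),
  );
  return [header].concat(body);
}

// Made-up query result for demonstration.
const sampleRows: Row[] = [
  { ds: "2020-10-19", dau: 100, pv: 250 },
  { ds: "2020-10-20", dau: 120, pv: 300 },
];
const trendSeries = toSeries(sampleRows, "ds", ["dau", "pv"]);
const detailTable = toTable(sampleRows, ["ds", "dau"]);
```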

After the data conversion is complete, semantic processing can follow. To reduce storage cost, stored tables are kept as small as possible, so some String-type values are stored as Number. For example, a yes/no field is stored as 0 and 1. However, if 0 and 1 are exposed directly in the visualization, operations and product staff are left confused and have to spend effort translating the values in their heads. So I inserted a semantic processing step into the data processing logic. Based on the automatically generated knowledge base, the SQL query results first go through data processing and then through a layer of semantic transformation that converts 0 and 1 to "No" and "Yes". Finally, the data is returned to the front end through the interface.
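
A minimal sketch of this semantic transformation follows. The `KnowledgeBase` shape (column name to code-to-label map) and the `is_member` column are hypothetical; in the article the knowledge base is generated automatically at initialization, whereas here it is written by hand.

```typescript
type Row = Record<string, string | number>;

// Hypothetical knowledge base: for each column, a map from stored code to label.
type KnowledgeBase = Record<string, Record<string, string>>;

// Hand-written for the example; the column name is made up.
const knowledgeBase: KnowledgeBase = {
  is_member: { "0": "No", "1": "Yes" },
};

// Safe own-property lookup, so names like "constructor" never match by accident.
function lookup<T>(map: Record<string, T>, key: string): T | undefined {
  return Object.prototype.hasOwnProperty.call(map, key) ? map[key] : undefined;
}

// Replace stored codes with human-readable labels wherever the knowledge base
// has a mapping; every other value passes through unchanged.
function applySemantics(rows: Row[], kb: KnowledgeBase): Row[] {
  return rows.map((row) => {
    const out: Row = {};
    Object.keys(row).forEach((column) => {
      const value = row[column];
      if (value === undefined) return;
      const labels = lookup(kb, column);
      const label = labels ? lookup(labels, String(value)) : undefined;
      out[column] = label === undefined ? value : label;
    });
    return out;
  });
}

const readableRows = applySemantics(
  [
    { is_member: 1, city: "Hangzhou" },
    { is_member: 0, city: "Beijing" },
  ],
  knowledgeBase,
);
```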

Front-end smart chart rendering

[Figure: slide 009, "How to quickly calculate and analyze business big data visualization"]


Here is a more intuitive example. The red line is shown enlarged in the lower part of the figure. Looking only at the upper chart, you cannot see any valuable information: because the Y-axis range is so large, local changes in an indicator are flattened, and at a glance the indicators appear not to fluctuate at all. But fluctuations and sudden changes are exactly what the business needs to watch. If that information is flattened during visualization, the window through which users discover problems is effectively closed. Smart rendering uses algorithmic processing to extract saliency: data with obvious fluctuations and reference value for the business is displayed first, and the rest is selectively de-emphasized. On the right of the figure is the implementation code snippet. The overall logic is to traverse all X-axis and Y-axis values recursively, checking the intersections and the range of the Y-axis fluctuation interval.
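
The article's own snippet exists only in the figure and is not reproduced here. As a stand-in, the sketch below uses a much simpler heuristic that is not the author's algorithm: each series is scored by its relative fluctuation, (max - min) divided by the largest absolute value, and the top-scoring series are flagged for emphasis. All names are hypothetical.

```typescript
interface Series {
  name: string;
  values: number[];
}

interface RankedSeries extends Series {
  score: number;       // relative fluctuation, between 0 and 2
  emphasized: boolean; // whether the renderer should highlight this series
}

// Relative fluctuation: (max - min) / largest absolute value.
// Scale-free, so a small series that triples outranks a huge series that barely moves.
function fluctuationScore(values: number[]): number {
  if (values.length === 0) return 0;
  const max = values.reduce((a, b) => Math.max(a, b), -Infinity);
  const min = values.reduce((a, b) => Math.min(a, b), Infinity);
  const maxAbs = values.reduce((m, v) => Math.max(m, Math.abs(v)), 0);
  return maxAbs === 0 ? 0 : (max - min) / maxAbs;
}

// Flag the `topN` most fluctuating series and keep the original order.
// Series tied with the cutoff score are all flagged; flat series never are.
function markSalient(series: Series[], topN: number): RankedSeries[] {
  const scored = series.map((s) => ({
    name: s.name,
    values: s.values,
    score: fluctuationScore(s.values),
  }));
  const topScores = scored
    .map((s) => s.score)
    .sort((a, b) => b - a)
    .slice(0, Math.max(0, topN));
  const threshold = topScores.reduce((a, b) => Math.min(a, b), Infinity);
  return scored.map((s) => ({
    name: s.name,
    values: s.values,
    score: s.score,
    emphasized: s.score > 0 && s.score >= threshold,
  }));
}

// Made-up indicators: one huge but stable, one small but volatile, one flat.
const ranked = markSalient(
  [
    { name: "big_stable", values: [1000000, 1000100, 1000050] },
    { name: "small_volatile", values: [10, 30, 12] },
    { name: "flat", values: [5, 5, 5] },
  ],
  1,
);
```

A renderer could then draw the emphasized series in full color and fade the others, which is the effect described above.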

Results

[Figure: slide 008, "How to quickly calculate and analyze business big data visualization"]
For developers, it is enough to describe the DSL clearly, and the entire dashboard is rendered directly.
On the front-end display side, part of the SQL assembly capability is exposed. Users can make custom changes interactively and get feedback within seconds (see the left side of the figure above). For further analysis, users can define the user group they want to view and filter by it, drilling down layer by layer through user groups, dimensions, and tags. For users, both the efficiency and the experience of finding valuable information have improved greatly.
Overall, the results are as follows:

  • Flexible SQL assembly and intelligent chart rendering are supported
  • R&D cost that used to be 5 man-days was reduced to 0.5 man-days
  • Real-time computation with second-level response (1062 milliseconds for a query over 170 million records)

Future outlook

In the future, we will upgrade further so that the whole data analysis process becomes Low-Code or even No-Code. Through simple drag and drop, users will be able to fetch data from the source, complete data processing and data visualization, and generate data dashboards. We will also explore front-end intelligence to help the business locate problems faster, uncover potential problems, assist business decision-making, and support rapid business growth.

 

This article is the original content of Alibaba Cloud and may not be reproduced without permission.

Origin blog.csdn.net/yunqiinsight/article/details/109199954