[Dry Goods Sharing] Chen Chao: Best Practice of Pandora, Qiniu Cloud Machine Data Analysis Platform

Editor's note:

On the evening of September 10th, the "Cloud Plus Data, Smart Drive Future" data science forum hosted by Qiniu Cloud was held as scheduled. During the live broadcast, Chen Chao, Vice President of Product and R&D at Qiniu Cloud, gave a talk titled "Qiniu Cloud Machine Data Analysis Platform Pandora: Best Practices". The following is a transcript of the talk.

Guest profile

Chen Chao, Vice President of Product and R&D at Qiniu Cloud, is responsible for Qiniu Cloud's product planning and R&D. In recent years he has focused on machine vision, distributed computing, and machine learning. He has extensive experience designing and implementing distributed computing systems and large-scale machine learning systems, and has researched distributed databases in depth.

The topic of my talk today is "Qiniu Cloud Machine Data Analysis Platform Pandora: Best Practices". Before introducing Pandora, let's look at Qiniu Cloud's current overall architecture, shown in the figure below. The bottom layer consists of Qiniu Cloud's cloud services, including the live-streaming cloud, the real-time audio and video cloud, and the camera surveillance cloud. All of their data converges into a heterogeneous data lake built on object storage. Above the data lake sit a visual data analysis platform and a machine data analysis platform; Pandora, today's topic, is the machine data analysis platform.

Within Qiniu Cloud's overall landscape, Pandora belongs to the machine data intelligence module. Machine data comes from many sources: IoT data and data from all kinds of devices, for example, count as machine data.

What is machine data

We use a simple definition: machine data is data produced by any machine or system, for example servers, sensors, network devices, and all kinds of applications. A defining feature of machine data is that it consists largely of unstructured time-series data. The machine data we handle has no predefined schema, and its formats are extremely varied and hard to anticipate: you can neither predict what format will arrive nor define a format in advance.

Features of Pandora and collection process

Pandora is positioned as a real-time analysis platform for machine data. What are its characteristics?
The first is that Pandora natively supports schema-free data: you can add and delete fields dynamically at any time. Pandora also natively supports schema on read, so data enters Pandora exactly as it was generated, without any preprocessing. In addition, Pandora supports model acceleration, iteratively optimizing the data model through SPL layered persistence, columnar storage, CodeGen, vectorized computation, and other techniques.
The second feature is a cloud-native architecture. Pandora avoids the difficulty of upfront modeling that ETL preprocessing imposes. Computing (dynamic resources) and storage (static resources) are separated, which reduces costs and makes computation more elastic, and complete hot-warm-cold data lifecycle management greatly reduces storage costs.
The third feature, and our most distinctive one, is the powerful analytical and expressive capability of SPL. It supports a rich set of machine learning commands that cover a large number of machine data analysis and AI scenarios; it also supports real-time SPL computation whose results are exported to downstream systems, closing the business loop.
The fourth feature is strong system extensibility. You can think of Pandora as an OS on which you use Pandora's native capabilities to build your own app ecosystem. The SDK supports pluggable extensions to the platform's visualization system and business organization. Beyond SQL, SPL's computing capabilities can be extended through Python, Go+, and other languages, and charts produced on the platform can be embedded in business systems to deliver the value of the data.
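
The talk does not describe how Pandora stores schema-free data internally, so the following is only a minimal sketch, assuming a simple column-per-field layout, of how "add and delete fields at any time" can coexist with columnar storage. The class `SchemaFreeColumnStore` and its methods are hypothetical illustrations, not Pandora APIs: a field seen for the first time becomes a new column, earlier rows read it as missing, and deleting a field just drops its column.

```python
from typing import Any, Dict, List, Optional


class SchemaFreeColumnStore:
    """Hypothetical toy store: one Python list per field (column)."""

    def __init__(self) -> None:
        self.columns: Dict[str, List[Optional[Any]]] = {}
        self.row_count = 0

    def append(self, record: Dict[str, Any]) -> None:
        # A field never seen before becomes a new column, back-filled with None.
        for field in record:
            if field not in self.columns:
                self.columns[field] = [None] * self.row_count
        # Every column grows by one cell; fields absent from this record store None.
        for field, values in self.columns.items():
            values.append(record.get(field))
        self.row_count += 1

    def drop_field(self, field: str) -> None:
        # Deleting a field only removes its column; other columns are untouched.
        self.columns.pop(field, None)

    def column(self, field: str) -> List[Optional[Any]]:
        # Scanning one field reads a single contiguous list (the columnar benefit).
        return self.columns.get(field, [None] * self.row_count)


store = SchemaFreeColumnStore()
store.append({"host": "web-1", "status": 200})
store.append({"host": "web-2", "status": 500, "latency_ms": 87})
print(store.column("latency_ms"))  # [None, 87]
```

No schema declaration or migration step is needed: the second record introduces `latency_ms` on the fly, which is the behavior the schema-free feature describes.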

Below is a panoramic view of Pandora. In big data analysis, collecting data first, then processing and cleaning it, and finally analyzing and applying it is the common practice of all big data vendors. What sets Pandora apart is that it indexes data in its raw format in real time, so besides analysis it can also serve retrieval. Retrieval and analysis are unified in the SPL analysis engine, which serves both at the same time: users need not distinguish between retrieval and analysis, and both are handled in one place.

Pandora's data collection management process is shown in the figure below. With this process, any data that is generated can be ingested conveniently.

Schema On Read

Once the data is collected, Pandora's highlight comes into play: Schema On Read. The figure shows how Pandora differs: raw data is uploaded directly, and structure is applied dynamically at analysis time. In other words, a single copy of the raw data is enough for multiple data models to answer the questions of different users. This lets you build many kinds of models over the same data, and when a data format changes, a small adjustment in Pandora keeps it fully compatible. A pure log system cannot do this.
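
To illustrate the idea, here is a minimal sketch of schema on read in general, not of Pandora's engine: the raw log lines are stored once, untouched, and each "data model" is just a parsing rule applied at query time. The sample log format, the regexes `ACCESS_MODEL` and `LATENCY_MODEL`, and the helper `read_with_schema` are all hypothetical.

```python
import re
from typing import Dict, List

# One copy of the raw data, stored exactly as it was generated (no ETL).
RAW_LOGS: List[str] = [
    "2020-09-10T20:01:02 web-1 GET /api/video 200 35ms",
    "2020-09-10T20:01:03 web-2 POST /api/upload 500 120ms",
    "2020-09-10T20:01:04 web-1 GET /api/video 200 41ms",
]

# Two hypothetical data models; each is only a regex applied at read time.
ACCESS_MODEL = re.compile(
    r"(?P<ts>\S+) (?P<host>\S+) (?P<method>\S+) (?P<path>\S+) (?P<status>\d{3})"
)
LATENCY_MODEL = re.compile(r"(?P<host>web-\d+) .* (?P<latency>\d+)ms$")


def read_with_schema(lines: List[str], model: re.Pattern) -> List[Dict[str, str]]:
    """Apply a schema at query time; lines the model does not match are skipped."""
    rows = []
    for line in lines:
        match = model.search(line)
        if match:
            rows.append(match.groupdict())
    return rows


# Question 1 (an operator): which requests failed?
errors = [r for r in read_with_schema(RAW_LOGS, ACCESS_MODEL) if r["status"].startswith("5")]
# Question 2 (a performance engineer): what is the worst latency?
latencies = [int(r["latency"]) for r in read_with_schema(RAW_LOGS, LATENCY_MODEL)]
print(errors[0]["path"])  # /api/upload
print(max(latencies))     # 120
```

If the log format changes, only the regex has to change; the stored raw data stays as it is, which mirrors the compatibility argument above.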

SPL: The standard language for machine data analysis

SPL is the standard language for our machine data analysis. With a single line of SPL, you can retrieve, analyze, visualize, and raise alerts. It processes raw data directly and is optimized for time-series data. As our earlier definition of machine data showed, machine data consists largely of time-series data, which leaves a lot of room for optimization.

In addition, we have built multiple storage engines into the data lake, and SPL can connect to various storage engines. In short, SPL = SQL + Unix Pipeline: you can search with SPL and even express parsing logic, without the trouble of writing code.
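
The formula SPL = SQL + Unix Pipeline can be illustrated in ordinary Python. This is a sketch of the composition idea only; it does not reproduce SPL syntax, and the stage functions `search`, `stats_count_by`, `sort_by`, and `pipeline` are hypothetical. Each stage takes rows and returns rows, so SQL-style operations (filter, group-by count, order-by) chain left to right like commands joined with `|` in a shell.

```python
from collections import Counter
from functools import reduce
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Stage = Callable[[Iterable[Row]], Iterable[Row]]


def search(predicate: Callable[[Row], bool]) -> Stage:
    # Filtering stage, like the search command at the head of a pipe.
    return lambda rows: (r for r in rows if predicate(r))


def stats_count_by(field: str) -> Stage:
    # SQL-style GROUP BY field with COUNT(*), emitted as new rows.
    def stage(rows: Iterable[Row]) -> Iterable[Row]:
        counts = Counter(r[field] for r in rows)
        return [{field: key, "count": n} for key, n in counts.items()]
    return stage


def sort_by(field: str, descending: bool = False) -> Stage:
    # SQL-style ORDER BY on a single field.
    return lambda rows: sorted(rows, key=lambda r: r[field], reverse=descending)


def pipeline(*stages: Stage) -> Callable[[Iterable[Row]], List[Row]]:
    # Feed each stage's output into the next, like `a | b | c` in a Unix shell.
    return lambda rows: list(reduce(lambda acc, stage: stage(acc), stages, rows))


events: List[Row] = [
    {"host": "web-1", "status": 200},
    {"host": "web-2", "status": 500},
    {"host": "web-2", "status": 503},
    {"host": "web-3", "status": 502},
]

query = pipeline(
    search(lambda r: r["status"] >= 500),
    stats_count_by("host"),
    sort_by("count", descending=True),
)
print(query(events))  # [{'host': 'web-2', 'count': 2}, {'host': 'web-3', 'count': 1}]
```

Because every stage has the same row-in, row-out shape, new commands can be appended without rewriting earlier ones, which is the practical appeal of a pipeline language over a single nested SQL statement.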

SPL provides a high-level language for machine data analysis with complex search, aggregation, and correlation analysis, which makes its processing more powerful. It supports mathematical operations, correlation analysis, transaction analysis, predictive analysis, and more. Transaction analysis treats a series of continuous, interrelated events as one unit, which is why Pandora is particularly well suited to security scenarios.
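
The talk does not specify how SPL defines a transaction, so the following is a minimal sketch assuming one common interpretation: events sharing a key (here, a source IP) belong to the same transaction as long as consecutive events are no more than `max_gap` seconds apart. The function `group_transactions`, the event tuple layout, and the "failures followed by a success" rule are hypothetical, chosen to show why this helps in security scenarios.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, str, str]  # (timestamp in seconds, source IP, action)


def group_transactions(events: List[Event], max_gap: int) -> List[List[Event]]:
    """Group events by source IP; a gap larger than max_gap starts a new transaction."""
    open_tx: Dict[str, List[Event]] = {}
    done: List[List[Event]] = []
    for event in sorted(events):  # process in time order
        ts, ip, _ = event
        tx = open_tx.get(ip)
        if tx is not None and ts - tx[-1][0] <= max_gap:
            tx.append(event)  # still part of the same run of related events
        else:
            if tx is not None:
                done.append(tx)  # the gap was too long: close the old transaction
            open_tx[ip] = [event]
    done.extend(open_tx.values())  # flush transactions still open at the end
    return done


events = [
    (0, "10.0.0.5", "login_failed"),
    (5, "10.0.0.5", "login_failed"),
    (9, "10.0.0.5", "login_ok"),
    (12, "10.0.0.9", "login_ok"),
    (400, "10.0.0.5", "download"),
]

for tx in group_transactions(events, max_gap=60):
    actions = [action for _, _, action in tx]
    # Hypothetical rule: repeated failures followed by a success look like guessing.
    if "login_failed" in actions and actions[-1] == "login_ok":
        print(tx[0][1], actions)  # 10.0.0.5 ['login_failed', 'login_failed', 'login_ok']
```

Looking at single events, each login attempt is unremarkable; only the grouped sequence reveals the pattern, which is the point of analyzing interrelated events as a unit.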

We have now integrated SPL's capabilities into the platform. Users can easily connect their data, and feature engineering, algorithm modeling, result visualization, and value delivery can all be completed within it.

Pandora extension application

With SPL, Pandora lets users accumulate their knowledge on the platform. Our philosophy is that Pandora is not just a platform; we want it to deliver value. In other words, we hope knowledge is deposited in Pandora's App Store in the form of apps. The App Store, an indispensable part of the complete product architecture, forms a complete chain from data access to data display.

Storage architecture: full data lifecycle management

Facing large data volumes and an emphasis on real-time processing, we built full lifecycle management of data into Pandora's storage architecture.

Balancing cost and performance, we fully decouple computing from storage, so resources can scale as needed. Historical data can be retained long term, and the accumulated historical data can support future machine learning, AIOps, and other scenarios.
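
The talk gives no concrete retention thresholds, so here is a minimal sketch of hot-warm-cold tiering under an assumed age-based policy. The 7-day and 90-day cutoffs, the `Segment` type, the functions `tier_for` and `plan_moves`, and the storage media named in the comments are all hypothetical; a real system would also weigh query frequency and cost.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Segment:
    name: str
    newest_event: datetime  # timestamp of the most recent record in the segment


# Hypothetical policy: hot for 7 days, warm until 90 days, cold afterwards.
HOT_DAYS = 7
WARM_DAYS = 90


def tier_for(segment: Segment, now: datetime) -> str:
    age = now - segment.newest_event
    if age <= timedelta(days=HOT_DAYS):
        return "hot"   # e.g. fast local disks, fully indexed, fastest queries
    if age <= timedelta(days=WARM_DAYS):
        return "warm"  # e.g. cheaper disks, still directly queryable
    return "cold"      # e.g. object storage, kept long term for ML/AIOps


def plan_moves(segments: List[Segment], now: datetime) -> Dict[str, List[str]]:
    # Decide a target tier for every segment; the actual data movement is out of scope.
    plan: Dict[str, List[str]] = {"hot": [], "warm": [], "cold": []}
    for seg in segments:
        plan[tier_for(seg, now)].append(seg.name)
    return plan


now = datetime(2020, 9, 10)
segments = [
    Segment("logs-2020-09-09", datetime(2020, 9, 9)),
    Segment("logs-2020-08-01", datetime(2020, 8, 1)),
    Segment("logs-2019-12-31", datetime(2019, 12, 31)),
]
print(plan_moves(segments, now))
# {'hot': ['logs-2020-09-09'], 'warm': ['logs-2020-08-01'], 'cold': ['logs-2019-12-31']}
```

Because compute is decoupled from storage, moving a segment to a colder tier changes only where its bytes live, not which engine can query it.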

At the same time, Pandora is working to integrate with Qiniu Cloud storage, so that everyone gets the best possible performance when analyzing data directly in Qiniu Cloud storage.

Technical highlights

We have implemented both forward and inverted indexes, as well as hybrid row-column storage, and we also support tiered storage, schema on read, CodeGen, and vectorization. Time-series optimizations are built into the computing engine, giving users faster performance and more stable service.
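
Pandora's index formats are not described in the talk, so this is a textbook sketch of the general technique, assuming whitespace-and-punctuation tokenization of raw log lines. The class `LogIndex` and its methods are hypothetical. The inverted index maps each token to the IDs of the lines containing it, which makes keyword retrieval fast; the forward mapping goes from line ID back to the stored raw line so hits can be returned as-is.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


class LogIndex:
    """Toy forward + inverted index over raw log lines (hypothetical names)."""

    def __init__(self) -> None:
        self.forward: Dict[int, str] = {}                       # doc id -> raw line
        self.inverted: Dict[str, Set[int]] = defaultdict(set)  # token -> doc ids

    def add(self, line: str) -> int:
        doc_id = len(self.forward)
        self.forward[doc_id] = line
        # Tokens are runs of word characters, dots, hyphens, and slashes, lowercased.
        for token in re.findall(r"[\w.\-/]+", line.lower()):
            self.inverted[token].add(doc_id)
        return doc_id

    def search(self, *terms: str) -> List[str]:
        # AND semantics: intersect the posting sets of every term.
        postings = [self.inverted.get(t.lower(), set()) for t in terms]
        if not postings:
            return []
        hits = set.intersection(*postings)
        # The forward mapping turns doc ids back into raw lines, in ingestion order.
        return [self.forward[i] for i in sorted(hits)]


idx = LogIndex()
idx.add("ERROR timeout calling /api/upload host=web-2")
idx.add("INFO request served host=web-1")
idx.add("ERROR disk full host=web-1")
print(idx.search("error", "web-1"))  # ['ERROR disk full host=web-1']
```

Since the index is built from the raw lines themselves, retrieval works without any predefined schema, which is what allows retrieval and analysis to share one copy of the data.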

Pandora case

01 A large insurance company

Here Pandora is mainly used for full-lifecycle intelligent management of the information system's logs, delivering unified collection, merging, centralized storage, correlation analysis, and intelligent management of logs. It addresses three major scenarios in operations and development:
IT operations, security auditing, and business operations analysis. Services include online monitoring, operational data support, problem diagnosis, fault warning, resource monitoring, user behavior auditing, rule extraction, filing, *** traceability, and business trend analysis.

02 A top mobile phone manufacturer

The second case is monitoring, diagnosis, and root cause analysis for a top mobile phone manufacturer. Every phone goes through testing before it leaves the factory, and each test generates a large amount of data reflecting the phone's faults. Without a system that captures manufacturing knowledge, employees had to inspect this data manually, which is very complicated. With Pandora, the manufacturer can remotely monitor production quality in its workshops and quickly locate the causes of failures.

03 A leading semiconductor company

The next example is a semiconductor company. The semiconductor industry chain is very long; at its source is a device called a single crystal furnace, which produces monocrystalline silicon. Pandora helps monitor the health of the single crystal furnace: it detects furnace failures promptly, raises an alarm when a failure occurs, avoids wasting raw materials, and stops losses in time. As the figure shows, Pandora analyzes multiple dimensions, collecting data from sensors in the furnace to monitor production and enable predictive maintenance of the equipment.

04 Analysis of intelligent connected vehicles

The last case is intelligent connected vehicles. Pandora can analyze a car's data, such as when the steering wheel was turned or when the brake was pressed, and all of it can be seen at a glance in Pandora.

As these cases show, Pandora targets highly irregular, timestamped data generated by machines. That is why we have found compelling applications in finance, manufacturing, and the Internet of Vehicles. Pandora hopes to empower more industries and drive industrial upgrading through big data and AI.

Source: blog.51cto.com/7741292/2534856