The Great Way Is Simple: Yanhuang Data Aims to Build Extremely Easy-to-Use Domestic Big Data Analysis Software | Aianalysis Research

Since the birth of big data technology in the early 2000s, in order to cope with increasingly rich application scenarios, increasingly complex data types, and gradually expanding data scale, the big data industry has gradually developed a variety of technical routes.

Today, big data products and technologies are in full bloom. In recent years, a number of big data vendors have emerged in the domestic market, offering different products and solutions for the data processing needs of various application scenarios, such as large-scale offline data processing, real-time data analysis, and heterogeneous data analysis.

Yanhuang Data is one of the emerging vendors of real-time analysis platforms for heterogeneous data.

In the three years since its founding, Yanhuang Data has focused on real-time analysis of heterogeneous data, kept to a route of domestic in-house R&D and productization, carved out a distinctive path in the highly competitive domestic big data market, and gained market recognition.

What is the market demand and development prospect of real-time analysis of heterogeneous data? Why did Yanhuang Data choose to enter this market? What are the advantages of Yanhuang Data's team and products? With these questions in mind, Aianalysis conducted an in-depth interview with He Ning, chairman of Yanhuang Data, and Wang Guodong, CTO.

1. Heterogeneous data real-time analysis platform: giving users the ability to freely explore unknown data

With the massive growth of heterogeneous data, how to quickly obtain insights from the data has become a challenge

At the application level of traditional big data analysis, enterprises usually focus on fixed reports, interactive query analysis and other scenarios to analyze a large amount of structured data. With the deepening of the Internet and digitalization of enterprise business, the following two structural changes have taken place in the data characteristics of enterprises:

1) Enterprise data sources are more extensive. Data may come from various business systems, applications, databases, Internet of Things devices, and so on, and many scenarios require combining data from multiple sources for correlation analysis.

2) A large amount of semi-structured data (such as CSV, JSON, XML, etc.) and unstructured data (such as documents, audio, video, etc.) are gradually generated in the enterprise system. According to IDC's prediction, the total amount of global data will reach 175ZB by 2025, and more than 80% of the data will be semi-structured and unstructured data that is difficult to process.

Figure 1: Global data volume and composition in 2025

However, traditional data processing and analysis relies on write-time modeling: the data schema is defined in advance, and the data is then processed by ETL and loaded into a data warehouse to support query and analysis in specified scenarios. Faced with multi-source heterogeneous data, this approach struggles to mine the value of the data effectively:

First of all, the traditional data processing mode requires close cooperation between departments. The data department schedules data processing and modeling in advance according to the needs put forward by the data user department, and only after this work is complete can the user department get analysis results.

Secondly, in most analysis scenarios for heterogeneous data, such as analyzing log data, the angles from which users need to analyze the data and the dimensions they will use are uncertain, so it is difficult to analyze the data with pre-planned methods.

Heterogeneous data real-time analysis platform, designed for query analysis of multi-source heterogeneous data

To meet enterprises' need for efficient query and analysis of heterogeneous data, the market segment for heterogeneous data real-time analysis platforms has gradually taken shape in the domestic market in recent years. Take Yanhuang Data's heterogeneous data real-time analysis platform as an example. It can integrate structured and unstructured data from multiple data sources and adopts read-time modeling, so that when an enterprise needs to query and analyze data, it can quickly query and analyze the original data through custom rules. This supports the analysis requirements of scenarios such as intelligent operation and maintenance, security compliance, and a large number of innovative business analyses.

Figure 2: Yanhuang data heterogeneous data real-time analysis platform

To support real-time analysis of heterogeneous data, read-time modeling is the most critical technology. It allows users to define rules when data is read, automatically extracts the fields required for analysis from the original data according to algorithms, and lets users dynamically adjust data query rules according to business needs. This avoids heavy traditional ETL work and improves the flexibility of heterogeneous data processing.
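The idea of read-time modeling can be illustrated with a minimal Python sketch. It is not Yanhuang Data's implementation: the sample events, the `extract_fields` rule, and the `query` helper are all hypothetical names chosen for illustration. Raw events are kept exactly as they arrived, and fields are extracted only when a query runs, so changing the analysis means changing the rule or the predicate rather than re-running ETL.

```python
import json
import re

# Raw events are stored exactly as they arrived; no schema is defined at write time.
# (Sample data invented for this sketch.)
RAW_EVENTS = [
    '2023-08-01T10:00:01 level=ERROR service=pay latency_ms=950 msg="timeout"',
    '{"ts": "2023-08-01T10:00:02", "level": "INFO", "service": "cart", "latency_ms": 40}',
    '2023-08-01T10:00:03 level=ERROR service=pay latency_ms=1200 msg="timeout"',
]

# Extraction rule for key=value text; values may be quoted.
KV_PATTERN = re.compile(r'(\w+)=("[^"]*"|\S+)')


def extract_fields(raw):
    """Apply extraction rules at read time and return a dict of fields."""
    if raw.lstrip().startswith("{"):
        return json.loads(raw)  # JSON events already carry named fields
    fields = {"ts": raw.split(" ", 1)[0]}  # leading token is the timestamp
    for key, value in KV_PATTERN.findall(raw):
        fields[key] = value.strip('"')
    return fields


def query(raw_events, predicate, columns):
    """Scan raw events, extract fields on the fly, then filter and project."""
    rows = []
    for raw in raw_events:
        fields = extract_fields(raw)
        if predicate(fields):
            rows.append({name: fields.get(name) for name in columns})
    return rows


# The "model" (which fields matter, how they are typed) lives in the query itself.
slow_errors = query(
    RAW_EVENTS,
    predicate=lambda f: f.get("level") == "ERROR" and int(f.get("latency_ms", 0)) > 900,
    columns=["ts", "service", "latency_ms"],
)
print(slow_errors)
```

Because extraction happens on every scan, each query repeats the parsing work, which is the computational cost that read-time modeling trades for flexibility.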

Wang Guodong, CTO of Yanhuang Data, believes that the flexibility of read-time modeling allows users to iterate on the data model quickly and at relatively low cost as analysis requirements change, shortening the "time to value" in complex heterogeneous data analysis.

Therefore, as the demand for heterogeneous data analysis continues to grow, the value of a real-time analysis platform for heterogeneous data lies in helping users reduce their dependence on data collaboration processes and on data development and management teams. By providing efficient data analysis tools, it gives users the ability to explore data and mine its value freely and efficiently.

2. "Things first, people first", Yanhuang Data is the most suitable team

If you want to select the best teams in the domestic heterogeneous data analysis field, Yanhuang Data must be one of them. This big data company was established in July 2020. Its core team members all come from the former China R&D center of Splunk, the well-known American big data company, and the team has accumulated deep technology and experience in heterogeneous data analysis.

At the end of 2019, as the United States gradually imposed a technical blockade on China and the Chinese market environment became more complicated, Splunk announced that it would move its R&D center out of China, which also provided an opportunity for the establishment of Yanhuang Data. He Ning, formerly global vice president of Splunk and general manager of its China R&D Center and now chairman of Yanhuang Data, said: "At that point in time, my founding team and I, out of the ideals of technical people, hoped to use our own expertise to provide the industry with a flexible and easy-to-use analysis tool and, at the same time, to break the technological monopoly of foreign companies by creating independent and controllable domestic solutions in the field of read-time modeling and heterogeneous data analysis."

With such an original intention and vision, He Ning founded Yanhuang Data with Ye Xiaolu, Wang Guodong, Ni Yue, who were core R&D members of Splunk China R&D Center, and several former Splunk senior engineers.

During the interview, as the discussion deepened, it became increasingly clear that this team has many advantages. These have allowed Yanhuang Data to bring its products to relative maturity in only three years since its establishment and to win a number of top customers in the industry, such as Zhongan Insurance, Shanghai Electric Power, and Knowledge Planet, thus successfully opening up the domestic market.

Figure 3: The founding team of Yanhuang Data (from left to right: Ni Yue, Ye Xiaolu, He Ning, Wang Guodong)

At present, the main R&D personnel of Yanhuang Data come from Splunk.

First of all, this allowed Yanhuang Data to quickly build a mature team with complete R&D capability. The team had already worked together extensively during its time at Splunk, which ensures efficient and smooth cooperation throughout the product R&D process.

Secondly, at a world-class big data company like Splunk, the team was deeply involved in the research and development of heterogeneous data analysis products and technology, which gave it a deep understanding of customer needs, the market environment, and the key technologies in this field.

Finally, the experience of serving many top foreign software product companies has made Yanhuang Data's team very familiar with the development process, architecture design, technology selection, etc. of big data products. These experiences will help the company continue to develop a good software product.

Of course, with the development of the company, Yanhuang Data has gradually absorbed a group of outstanding people with expertise in marketing and business, making the entire team more capable and able to cope with market competition and environmental changes.

3. Keep abreast of changes in market demand and continue to create a useful data platform product

Relying on the team's accumulated experience in read-time modeling technology and heterogeneous data analysis platform development, Yanhuang Data quickly developed and polished, after its establishment, a heterogeneous data real-time analysis platform product covering the major functions.

However, due to the continuous deepening of enterprise digitalization in recent years and the particularity of the needs of Chinese enterprise users, the complete reproduction of Splunk's product technology can no longer meet the needs of today's Chinese market.

Therefore, Yanhuang Data has continuously observed user needs over the past two years and continuously upgraded its products, hoping to provide enterprise users with useful and easy-to-use data platform products. In summary, Yanhuang Data has optimized and innovated on its heterogeneous data real-time analysis platform mainly at the following four levels.

  • Adhere to the productization route

For many enterprise users, processing and analyzing heterogeneous data is very difficult. If they have to assemble and integrate a big data technology stack by themselves, the results are often poor and the effort consumes a lot of extra energy.

Therefore, Yanhuang Data provides a one-stop data platform product with end-to-end capabilities covering data import, data integration, data modeling, data storage, data analysis, data services, and data visualization, so that users can use it out of the box and skip the complicated work of building and configuring infrastructure.

In addition to standardized products, Yanhuang Data has also fully considered the potential customization needs of many large domestic enterprises.

Specifically, the strategy adopted by Yanhuang Data is to design the 80% of functions that are common needs of users of a complete data platform as standardized products. For the remaining 20% that are the individual needs of different customers, Yanhuang Data encapsulates the relevant capabilities and builds a partner ecosystem. These partners use their own expertise to fully understand the special needs of customers and complete the corresponding functional development. This not only meets the needs of customers, but also allows Yanhuang Data to focus on continuously polishing and upgrading its products.

  • Continuously optimize the underlying computing engine

Although read-time modeling is an effective solution for real-time analysis of heterogeneous data, it has an obvious shortcoming: its computational overhead is relatively large. When the data scale is large and the engine has not been deeply optimized, query performance suffers significantly.

In response, Yanhuang Data has continued to optimize its self-developed big data computing engine. This work covers vectorized computing, just-in-time (JIT) compilation, data compression based on columnar storage, concurrent task scheduling, and more, with careful design and optimization aimed at speeding up computation, improving data throughput, and reducing computing power consumption. As a result, near-instant analysis results are achieved in most scenarios.
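To make one of these techniques concrete, the following is a toy Python sketch of compression based on columnar storage. It says nothing about how Yanhuang Data's engine is actually built: the sample rows and the `to_columns`, `rle_encode`, and `rle_decode` helpers are assumptions made for illustration, and run-length encoding is only one of many column encodings. Vectorized execution and JIT compilation are not shown.

```python
from itertools import groupby

# Row-oriented records, as they might arrive from a log source (invented data).
ROWS = [
    {"service": "pay", "status": 200, "latency_ms": 35},
    {"service": "pay", "status": 200, "latency_ms": 40},
    {"service": "pay", "status": 200, "latency_ms": 38},
    {"service": "cart", "status": 200, "latency_ms": 12},
    {"service": "cart", "status": 200, "latency_ms": 15},
    {"service": "pay", "status": 500, "latency_ms": 900},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


columns = to_columns(ROWS)

# Low-cardinality columns collapse into a few runs once stored contiguously.
encoded_service = rle_encode(columns["service"])

# An aggregate over one field touches only that column, not whole rows.
total_latency = sum(columns["latency_ms"])

print(encoded_service, total_latency)
```

Storing each field contiguously is what makes both steps cheap: repeated values sit next to each other and compress well, and a query that needs one field does not have to read the others.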

  • Adopt a new infrastructure

Compared with the Splunk era, today's enterprises analyze data in a very different computing environment and at a very different data scale. On the one hand, many business systems and data platforms of some enterprises are deployed on the cloud and place high demands on the flexibility of resource usage. On the other hand, as the scale of enterprise data continues to grow, how to handle high-throughput data analysis has also become an urgent problem for enterprises.

Yanhuang Data is also keenly aware of these changes, so it started planning and upgrading the platform infrastructure very early, introducing cloud-native and distributed architecture and technologies.

In terms of cloud native, the Yanhuang data platform is designed on a brand-new cloud-native architecture. Whether it is deployed in the cloud or on private infrastructure, it separates storage from computing so that users can scale storage and computing resources elastically, cope effectively with peak query demand, and greatly reduce the cost of operation, maintenance, implementation, and deployment.

In terms of distributed architecture, Yanhuang Data released a new version of its data platform product in June this year. It adopts a distributed architecture and improves processing performance for large-scale, high-concurrency workloads.

  • Focus on platform ease of use

The ease of use of the data platform is often a factor that enterprise users will consider when choosing a product. By reducing cumbersome operations, users can focus on data analysis and gain insights. Therefore, Yanhuang Data has always focused on improving the usability of the data platform.

The first is the query language. Similar heterogeneous data analysis platforms in the industry usually use custom search languages, such as SPL or Elastic's Query DSL. In contrast, the Yanhuang data platform lets users query data with standard SQL. In addition to basic SQL query capabilities such as filtering, mapping, deduplication, aggregation, sorting, and association, it provides a large number of extensions to standard functions and table functions and also supports user-defined functions, which greatly lowers the learning threshold for users.
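The following sketch shows what these capabilities look like in plain SQL. It uses Python's built-in `sqlite3` module purely as a stand-in, since the article does not show the Yanhuang platform's own SQL dialect or function library; the tables, the sample rows, and the `status_class` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE requests (service TEXT, path TEXT, status INTEGER, latency_ms INTEGER);
CREATE TABLE owners (service TEXT, team TEXT);
INSERT INTO requests VALUES
  ('pay',  '/charge', 500, 950),
  ('pay',  '/charge', 200, 40),
  ('pay',  '/refund', 500, 1200),
  ('cart', '/add',    200, 15),
  ('cart', '/add',    200, 15),
  ('cart', '/health', 200, 1);
INSERT INTO owners VALUES ('pay', 'payments-team'), ('cart', 'storefront-team');
""")


def status_class(status):
    """User-defined function: map an HTTP status code to its class, e.g. 500 -> '5xx'."""
    return f"{status // 100}xx"


# Register the Python function so it can be called from SQL.
conn.create_function("status_class", 1, status_class)

SUMMARY_SQL = """
SELECT o.team,
       status_class(r.status)  AS status_group,      -- mapping via a UDF
       COUNT(*)                AS requests,          -- aggregation
       COUNT(DISTINCT r.path)  AS distinct_paths,    -- deduplication
       MAX(r.latency_ms)       AS worst_latency_ms
FROM requests AS r
JOIN owners AS o ON o.service = r.service            -- association
WHERE r.path <> '/health'                            -- filtering
GROUP BY o.team, status_group
ORDER BY worst_latency_ms DESC                       -- sorting
"""

summary = conn.execute(SUMMARY_SQL).fetchall()
for row in summary:
    print(row)
```

The point of the comparison in the text is that an analyst who already knows SQL can write a query like this without first learning a platform-specific search language.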

Secondly, the Yanhuang data platform provides rich dashboard functions. Based on the platform's integrated visualization library, ECharts, users can apply a variety of common visualizations and save their analysis methods and results in dashboards, which accelerates the sharing and delivery of data value within the enterprise.

4. In-depth application scenarios, using data to improve customer business efficiency

In the end, the data platform needs to provide insight into the user's business in specific application scenarios and improve business efficiency. In general, Yanhuang Data's products can be used as a data platform and performance tool for the digital transformation of enterprises, providing users with the ability to quickly analyze and solve business problems in various real scenarios such as operation and maintenance, security, and manufacturing.

Figure 4: Product Positioning of Yanhuang Data Platform

  • Cross-domain data analysis to improve enterprise IT governance level

In the field of IT operation and maintenance, Yanhuang Data's heterogeneous data real-time analysis platform can provide the ability to correlate multi-source log data, helping users efficiently understand and locate problems in scenarios such as IT basic platform operation and maintenance and business system quality analysis.

Operation and maintenance of the enterprise IT basic platform: the Yanhuang data platform can help enterprise users integrate the logs of the various products that make up the basic platform. Through a one-stop combination of tools, it provides enterprise-level inspection, monitoring, statistics, reporting, unified views, usage perspectives, and planning assistance, so that the operating status of the enterprise's IT platform can be seen at a glance.

Business system quality measurement and analysis: the Yanhuang data platform can conveniently ingest data from various sources and quickly correlate and analyze the log data of multiple application systems. It provides full-process business quality measurement and accurately locates system and business failures by drawing application call chains.
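A small Python sketch can show what correlating logs from several application systems into a call chain involves. It is illustrative only and does not describe the Yanhuang platform's internals: the three log sources, their formats, the `trace_id` field, and all function names are assumptions made for this example.

```python
import json
from collections import defaultdict

# Three application systems log in different formats (invented sample data).
GATEWAY_LOG = [
    '{"ts": 1, "trace_id": "t1", "service": "gateway", "status": "ok"}',
    '{"ts": 5, "trace_id": "t2", "service": "gateway", "status": "ok"}',
]
ORDER_LOG = [
    "ts=2 trace_id=t1 service=order status=ok",
    "ts=6 trace_id=t2 service=order status=ok",
]
PAYMENT_LOG = [
    "ts=3 trace_id=t1 service=payment status=error",
    "ts=7 trace_id=t2 service=payment status=ok",
]


def parse_json_line(line):
    """Parse a JSON-formatted log line into a record."""
    return json.loads(line)


def parse_kv_line(line):
    """Parse a space-separated key=value log line into a record."""
    record = dict(pair.split("=", 1) for pair in line.split())
    record["ts"] = int(record["ts"])  # normalize the timestamp type across sources
    return record


def build_call_chains(*sources):
    """Group records from (lines, parser) sources by trace_id, ordered by time."""
    chains = defaultdict(list)
    for lines, parser in sources:
        for line in lines:
            record = parser(line)
            chains[record["trace_id"]].append(record)
    for records in chains.values():
        records.sort(key=lambda r: r["ts"])
    return dict(chains)


def first_failure(chain):
    """Return the first service in the chain that did not report 'ok', if any."""
    for record in chain:
        if record["status"] != "ok":
            return record["service"]
    return None


chains = build_call_chains(
    (GATEWAY_LOG, parse_json_line),
    (ORDER_LOG, parse_kv_line),
    (PAYMENT_LOG, parse_kv_line),
)
call_path_t1 = " -> ".join(r["service"] for r in chains["t1"])
failed_service_t1 = first_failure(chains["t1"])
print(call_path_t1, "| first failure:", failed_service_t1)
```

The sketch assumes every system records a shared request identifier; without such a key, correlation would have to fall back on timestamps or other fields.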

  • Balance efficiency and flexibility to protect network security

With the deepening of enterprise digitization, large and medium-sized enterprises face greater network security threats. The Yanhuang data platform's instant analysis capability for heterogeneous data can help enterprises establish a security operation center (SOC). Working with the enterprise's security equipment, the platform can continuously analyze threat data, clearly trace the attack chain, and accurately locate the source of an attack. The platform can also perform cross-domain correlation analysis across multiple security systems, provide regular statistics and reports, and deliver efficient security alerts.

  • Extensively connect heterogeneous data to support enterprise production

In the manufacturing industry, a large number of sensors on the equipment of leading production-oriented enterprises continuously collect various types of IoT data. The data collected by different sensors may have inconsistent formats and standards, which makes correlation analysis of multi-sensor data relatively difficult. The Yanhuang data platform can easily correlate and comprehensively analyze sensor data from various types of production lines, provide accurate insights for scenarios such as product design and production line optimization, and improve the production efficiency of manufacturing enterprises.

5. The potential of heterogeneous data is huge, and the value needs to be further explored

Compared with the huge scale and potential value of heterogeneous data, domestic enterprises' current mining and use of it is still at a relatively elementary level. As heterogeneous data real-time analysis platforms, represented by Yanhuang Data, continue to improve in functionality, performance, and ease of use, and as enterprises become more aware of the value of heterogeneous data, domestic enterprises are expected to strengthen their exploration and analysis of heterogeneous data across a range of business scenarios.

To speed up this process, help enterprises in the industry make good use of analysis tools, and broaden application scenarios, Yanhuang Data recently launched Honghu, a free community edition of its one-stop heterogeneous data analysis platform, along with a corresponding communication platform. It aims to help developers flexibly manage massive multi-source heterogeneous data and quickly analyze data characteristics, and to help users easily realize data-driven business.

Up to now, the Honghu community has brought together many R&D personnel, data analysts, data scientists, and other practitioners from Bytedance, China Financial Services Institute, Ping An, Ali, Mobile, Ministry of Public Security Research Institute, Telecom, and other well-known enterprises and institutions, and has produced multiple innovative application cases.

For example, in the "Network Security Situational Awareness System" application case of the Third Research Institute of the Ministry of Public Security, the user proposed a network security situational awareness system based on the Honghu data platform. Drawing on the platform's efficient and flexible capabilities for storing, analyzing, and processing very large volumes of data, the system uses security big data to improve the ability to discover, identify, understand, and analyze security threats and to respond to and handle them from a global perspective, ultimately putting network security situational awareness into practice.

In Shanghai Yaocheng Technology's "Observability Platform for Microservice Applications" application case, Honghu's independently developed read-time modeling engine lets users quickly import and store heterogeneous data, dynamically adjust data models and analysis parameters, and solidify the model and analysis process. When the business analysis scenario changes, users only need to adjust the SQL analysis statement to respond quickly, which meets the requirements for building observability scenarios. As a result, when a request spans multiple microservices, each microservice remains transparent and observable, so that engineers can observe and diagnose problems in a timely and accurate manner.

It can be seen that heterogeneous data analysis is continuously creating important value in multiple business fields. In the near future, real-time analysis platforms for heterogeneous data may well become essential infrastructure for enterprises.

Origin blog.csdn.net/weixin_45942451/article/details/132111049