What Is a Big Data System Framework?

Big data application development remains too low-level: it is hard to learn and spans a wide range of technologies, which restricts the adoption of big data. What is needed is a technology that encapsulates the common, reusable code and algorithms of big data development into class libraries, lowering the learning threshold, reducing development difficulty, and improving the efficiency of big data projects.

Big data serves three kinds of work: business applications, such as user profiling and risk control; decision support, the field of data science, which requires statistics and algorithms and is the domain of data scientists; and engineering, which asks how to implement a solution and which business problems to solve, the job of data engineers.

The characteristics of the data source determine the technology choices for data collection and data storage. I divide data sources into four categories:

The first, by origin: internal data and external data;

The second, by structure: unstructured data and structured data;

The third, by mutability: immutable, append-only data versus data that can be modified and deleted;

The fourth, by scale: large volumes of data and small volumes of data.
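The four classification axes above can be sketched as a simple data model. This is only an illustration of the taxonomy; the type and field names are invented here, not part of any DKH library:

```python
from dataclasses import dataclass
from enum import Enum

class Origin(Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"

class Structure(Enum):
    STRUCTURED = "structured"
    UNSTRUCTURED = "unstructured"

class Mutability(Enum):
    APPEND_ONLY = "append_only"  # immutable data that can only be added to
    MUTABLE = "mutable"          # data that can be modified and deleted

class Scale(Enum):
    LARGE = "large"
    SMALL = "small"

@dataclass
class DataSource:
    """One data source, classified along the four axes."""
    name: str
    origin: Origin
    structure: Structure
    mutability: Mutability
    scale: Scale

# Example: server access logs are internal, unstructured,
# append-only, and large in volume.
logs = DataSource("access_logs", Origin.INTERNAL, Structure.UNSTRUCTURED,
                  Mutability.APPEND_ONLY, Scale.LARGE)
```

Classifying each source this way makes the downstream choice explicit: an append-only, large, unstructured source points toward log collection into a distributed file system, while a small, mutable, structured source may stay in a relational database.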

The first element of a big data platform is the data source, which usually lives in the business systems. During data analysis we rarely process the business data source directly; instead, the data first passes through collection and storage, and only then through analysis and processing.
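The stage order described above can be sketched as a minimal pipeline. The function names are illustrative placeholders for the collect/store/analyze/process stages, not APIs of any real platform:

```python
def collect(source):
    """Pull raw records out of the business system without mutating it."""
    return [record.strip() for record in source]

def store(records):
    """Persist collected records; a plain list stands in for real storage."""
    return list(records)

def analyze(stored):
    """Derive a simple metric from the stored data."""
    return {"count": len(stored)}

def process(stats):
    """Turn the analysis result into output for downstream consumers."""
    return f"records processed: {stats['count']}"

# Business data flows through the stages in order:
raw = [" a ", " b ", " c "]
result = process(analyze(store(collect(raw))))
print(result)  # records processed: 3
```

The point of the indirection is isolation: analysis reads from the stored copy, so heavy queries never run against the live business system.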

Looking at the whole ecosystem, data engineering requires substantial resources: large volumes of data require clusters; controlling and coordinating those resources requires monitoring and scheduling; deploying at scale must be made convenient and easy; and the platform also involves logging, security, and possibly integration with the cloud. These sit at the edge of the big data circle but are equally important.

Da Kuai Big Data Platform (DKH) is a one-stop, search-engine-grade, general-purpose big data computing platform designed by Da Kuai to bridge the big data ecosystem and traditional non-big-data companies. With DKH, a traditional company can cross the technical gap of big data and achieve search-engine-level platform performance.

DKH integrates all the components of the Hadoop ecosystem, deeply optimized and recompiled into a complete, higher-performance general-purpose computing platform in which the components coordinate organically. As a result, DKH claims a computing performance improvement of up to 5 times over the open source big data platform.

Through Da Kuai's middleware technology, DKH reduces complex big data cluster configuration to three node roles (master node, management node, compute node), greatly simplifying cluster management and operations and enhancing the cluster's availability, maintainability, and stability.
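A cluster layout along the lines of the three roles described might look like the following inventory. This is a hypothetical sketch for illustration only; it is not DKH's actual configuration format, and the hostnames are invented:

```yaml
# Hypothetical three-role cluster inventory (not DKH's real config syntax).
cluster:
  master:           # coordinates the cluster
    host: node-01
  management:       # monitoring and resource scheduling
    host: node-02
  compute:          # runs the actual computation; scales out horizontally
    hosts:
      - node-03
      - node-04
```

Collapsing the many daemons of a stock Hadoop deployment into three roles is what makes the operational surface small enough for non-specialist teams.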

Although DKH is highly integrated, it retains all the advantages of open source systems and is 100% compatible with them. Big data applications developed on open source platforms run efficiently on DKH without any changes, with a performance improvement of up to 5 times.

DKH also bundles Da Kuai's integrated big data development framework, FreeRCH. The FreeRCH framework provides more than 20 categories of components commonly used in big data, search, natural language processing, and artificial intelligence development, and in this way claims a 10-fold improvement in development efficiency.

The SQL version of DKH also integrates distributed MySQL, so a traditional information system can seamlessly make the leap to big data and distributed architecture.

DKH standard platform technology framework
