A brief introduction to the Hadoop framework structure

In recent years, with the development of the Internet, and of the mobile Internet in particular, data has grown at an explosive pace; Google's crawler alone was downloading more than 100 million web pages a day as early as 2000. This explosive growth directly drove the development of technology for processing massive data. Google solved the problem with three technical frameworks: the large table (BigTable), the distributed file system, and distributed computing, and it shared these design ideas in three epoch-making papers. Soon afterward, open-source frameworks based on Google's designs appeared: the now very popular Hadoop, MapReduce, and many NoSQL systems. These technologies are the core foundation of big data technology as a whole.

At present, there are many commercial Hadoop distributions on the Chinese market. Most of them come from foreign vendors; purely domestic distributions, such as DKhadoop, are rare. Let's take Dakuai Search's DKhadoop as an example to introduce the Hadoop framework structure.

The Hadoop framework

Figure: DKhadoop technical architecture diagram

The core of the Hadoop framework structure:

The core of Hadoop's design is HDFS and MapReduce: HDFS provides storage for massive data, and MapReduce provides computation over it.
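To make the division of labor concrete, here is a minimal word-count sketch against the standard Hadoop MapReduce Java API. It illustrates the generic programming model, not anything DKhadoop-specific; the HDFS input and output paths are placeholders for your own cluster.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every token in the input split stored on HDFS.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts collected for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each mapper
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Placeholder HDFS paths; adjust to your cluster.
        FileInputFormat.addInputPath(job, new Path("/user/demo/input"));
        FileOutputFormat.setOutputPath(job, new Path("/user/demo/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

HDFS holds the input and output files; MapReduce schedules the map and reduce tasks across the cluster, close to where the data blocks live.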

Big data integrated development framework:

Big data application development sits at a low level, involves a wide range of technologies, and is therefore much harder to learn; it is especially difficult for beginners to get started. DKhadoop is Dakuai Search's repackaging of this series of underlying technical frameworks: common, reusable code and algorithms in big data development are encapsulated into class libraries, which lowers the learning threshold and reduces the difficulty of development.
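As an illustration of what such encapsulation might look like (a hypothetical helper written for this article, not DKhadoop's actual class library), the sketch below hides the boilerplate of reading a text file from HDFS behind one static method, using only the standard Hadoop FileSystem API:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical utility class illustrating the "wrap reusable boilerplate
// into a library" idea; the name HdfsTextFiles is invented for this sketch.
public final class HdfsTextFiles {

    private HdfsTextFiles() {}

    // Read an HDFS text file into a list of lines, hiding the
    // FileSystem and stream setup/teardown from the caller.
    public static List<String> readLines(String hdfsPath) throws IOException {
        Configuration conf = new Configuration(); // picks up core-site.xml etc.
        List<String> lines = new ArrayList<>();
        try (FileSystem fs = FileSystem.get(conf);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(fs.open(new Path(hdfsPath)),
                             StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

A caller then needs a single line, such as `List<String> lines = HdfsTextFiles.readLines("/user/demo/input/data.txt");`, instead of repeating the configuration and stream-handling code in every program.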

The modules that make up the DKhadoop framework structure:

Let's take the DKhadoop distribution as an example:

1. The framework consists of six parts: the data source and SQL engine, the data acquisition (custom crawler) module, the data processing module, the machine learning algorithms, the natural language processing module, and the search engine module.

2. Dakuai's general-purpose big data computing platform (DKH) integrates all components of the development framework under a single version number. If the Dakuai development framework is instead deployed on an open-source big data stack, the platform's components need the following support:

(1) Data source and SQL engine: DK.Hadoop, Spark, Hive, Sqoop, Flume, Kafka (a minimal Kafka sketch follows this list)

(2) Data collection: DK.Hadoop

(3) Data processing module: DK.Hadoop, Spark, Storm, Hive

(4) Machine learning and AI: DK.Hadoop, Spark

(5) NLP module: supported directly by uploading the server-side JAR package

(6) Search engine module: not released independently
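For a taste of the data-source side of the list, here is a minimal sketch that publishes a record to Kafka with the standard Java client. The broker address and the topic name "crawl-pages" are placeholders invented for this example, standing in for whatever the platform is actually configured with.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class IngestProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address; point this at your Kafka cluster.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // "crawl-pages" is a hypothetical topic for crawled page data.
            producer.send(new ProducerRecord<>("crawl-pages",
                    "page-001", "<html>...</html>"));
        }
    }
}
```

In a setup like the one described above, a crawler would feed raw pages into such a topic, and downstream processing (Spark, Storm, Hive) would consume them from there.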

DK.Hadoop is a deeply integrated, recompiled Hadoop distribution that can be released on its own, and it is a required component when the FreeRCH (Dakuai big data integrated development framework) is deployed independently. DK.Hadoop integrates a NoSQL database, which simplifies programming across the file system and the non-relational database, and it improves the cluster synchronization system, making Hadoop's data processing more efficient.
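The paragraph above does not name the NoSQL database, so purely as an illustration the sketch below uses HBase, the NoSQL store most commonly paired with Hadoop, to write and read back one row through the standard client API. The "pages" table, its "doc" column family, and the row key are all invented for this example.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseRoundTrip {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf);
             // Hypothetical "pages" table with a "doc" column family.
             Table table = conn.getTable(TableName.valueOf("pages"))) {

            // Write one row.
            Put put = new Put(Bytes.toBytes("row-001"));
            put.addColumn(Bytes.toBytes("doc"), Bytes.toBytes("body"),
                    Bytes.toBytes("<html>...</html>"));
            table.put(put);

            // Read it back.
            Result result = table.get(new Get(Bytes.toBytes("row-001")));
            byte[] body = result.getValue(Bytes.toBytes("doc"), Bytes.toBytes("body"));
            System.out.println(Bytes.toString(body));
        }
    }
}
```

The point of bundling such a store with the distribution is that the same cluster configuration serves both HDFS file access and row-level reads and writes, so application code does not juggle two separately managed systems.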

That is a brief introduction to the Hadoop framework structure for now. Interested readers can try DKhadoop from Dakuai Search.
