Big Data Architecture Technology

The following figure summarizes the big data technologies I have recently learned and used. Writing this summary was also a chance to study them further and understand them better.

  

 

 

 

The above are some of the big data technologies I have personally worked with. The following chapters focus on these technologies, introducing each one from two angles: how it is applied and how it works.

 

1. Big data real-time streaming architecture

(1) Message queue

Message queues transfer data between different applications. Commonly used ones include Kafka, Redis queues, RabbitMQ, ZeroMQ, and ActiveMQ.
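As a rough illustration of the idea (not of any particular product such as Kafka or RabbitMQ), the sketch below uses Python's standard-library queue.Queue as an in-process stand-in for a message queue. The function names producer, consumer, and run_demo, and the use of None as an end-of-stream marker, are assumptions made for this example only.

```python
import queue
import threading


def producer(q, messages):
    """Publish each message, then a None sentinel to signal the end."""
    for msg in messages:
        q.put(msg)
    q.put(None)


def consumer(q, received):
    """Read messages from the queue until the sentinel arrives."""
    while True:
        msg = q.get()
        if msg is None:
            break
        received.append(msg)


def run_demo(messages):
    """Run one producer and one consumer that only share the queue."""
    q = queue.Queue()
    received = []
    t_cons = threading.Thread(target=consumer, args=(q, received))
    t_prod = threading.Thread(target=producer, args=(q, messages))
    t_cons.start()
    t_prod.start()
    t_prod.join()
    t_cons.join()
    return received
```

The point of the sketch is that the producer and the consumer never call each other directly; they only agree on the queue. A real message queue adds what this stand-in lacks, such as delivery across processes and machines.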

 

(2) Stream Processing Framework

A stream processing framework provides the computing framework for processing messages in real time; the concrete processing logic is implemented by the business side. Common choices are Storm, Spark Streaming, and Flink.
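The split between framework and business logic can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not the API of Storm, Spark Streaming, or Flink: run_stream plays the role of the framework, and count_by_page is a hypothetical handler written by the business side.

```python
from collections import defaultdict


def run_stream(source, handler, state):
    """Hypothetical mini 'framework': pass each message to the
    user-supplied handler as soon as it arrives."""
    for message in source:
        handler(message, state)
    return state


def count_by_page(message, state):
    """Business logic supplied by the user: count views per page."""
    state[message["page"]] += 1


# A finite list stands in for an unbounded message stream.
events = [{"page": "/home"}, {"page": "/cart"}, {"page": "/home"}]
counts = run_stream(iter(events), count_by_page, defaultdict(int))
```

Here the framework part knows nothing about pages or counts; swapping in a different handler changes the computation without touching run_stream.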

 

(3) Storage

Conventional relational databases (such as MySQL and Oracle) can also be used for data storage, but for big data, NoSQL databases such as Redis, MongoDB, and HBase are the most commonly used. Redis in particular is wrapped by many large companies for their own business scenarios, adding functions such as monitoring, alerting, and automatic scaling.

2. Big data offline processing architecture

(1) Data storage

At present, the most commonly used storage options for big data are HDFS and Hive (which is built on HDFS).

(2) Data analysis and processing tools

Current data analysis and processing tools fall mainly into two camps: MapReduce and Spark.

MapReduce: Commonly used tools are native Hadoop MapReduce, Hive SQL, and Pig. Hadoop MapReduce is the most basic of these; Hive SQL and Pig are higher-level abstractions over it, and their queries and scripts are ultimately translated into MapReduce jobs. The relationship resembles that between assembly language and a higher-level language such as C: the higher-level code is more convenient to write, but in the end it is converted into the lower-level form.

Spark: Currently the most popular data processing tool, bar none. Compared with Hadoop MapReduce, it makes much greater use of memory.
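To make the MapReduce model mentioned above more concrete, here is a minimal single-machine sketch of a word count in plain Python. It only imitates the map, shuffle, and reduce steps; it is not Hadoop code, and the function names map_phase, shuffle, reduce_phase, and word_count are chosen for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Chain the three steps over a list of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

In a tool like Hive SQL, the same computation would be written declaratively as a COUNT with a GROUP BY, leaving the planning of the map and reduce steps to the tool.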

 

 

 

 

 
