Architectural Thinking Growth Series Tutorials (11) - Big Data Platform Architecture Design

Background

In recent years, big data has drawn growing attention and is used more and more widely. Software products now record all kinds of data so that it can be studied and analyzed later.

Content

Big Data Platform Technical Architecture

In e-commerce enterprises, the operating data recorded by the system grows by hundreds of GB every day. To keep all of this data stored centrally and accessible at any time, more and more enterprises have moved away from offline data solutions and the like and turned fully to the open Hadoop ecosystem, which strikes a balance between cost and scalability. As a result, Internet companies with sufficient technical strength have built their own big data platforms one after another.
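To make the scale concrete, a rough back-of-the-envelope estimate can help. The sketch below assumes 300 GB of new data per day (an illustrative value within the "hundreds of GB" range) and a replication factor of 3, which is the HDFS default. Both the workload and the function name `raw_storage_tb` are assumptions for illustration, not figures from a real deployment.

```python
def raw_storage_tb(daily_gb: float, days: int, replication: int = 3) -> float:
    """Raw disk capacity (in TB) needed to hold `days` of daily increments.

    Assumes uncompressed data and 1 TB = 1024 GB.
    """
    return daily_gb * days * replication / 1024


# Hypothetical workload: 300 GB/day kept for one year with 3 replicas.
yearly_tb = raw_storage_tb(daily_gb=300, days=365)
print(f"{yearly_tb:.1f} TB of raw capacity per year")
```

Under these assumptions a single year of increments already needs a few hundred TB of raw disk, which is why cost and horizontal scalability dominate the choice of storage system.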

The figure below shows the technical architecture of a typical big data platform:

[Figure: Big Data Platform Technical Architecture]

The big data platform consists of data storage, data synchronization and distribution, monitoring, offline computing, platform security, resource application, and other parts.

Data storage

Data storage is the foundation of the entire big data platform. This layer is built on frameworks such as HDFS, HBase, Hive, MapReduce, and Storm. The main frameworks are introduced briefly below; more detail can be found through a search engine.

  • HDFS, a distributed file system and the core storage component of Hadoop.
  • MapReduce, a distributed data processing framework and one of the core components of Hadoop.
  • HBase, a distributed, column-oriented database that uses HDFS as its underlying storage and supports both MapReduce batch computation and point queries.
  • ZooKeeper, a distributed, highly available coordination service that provides basic services such as distributed locks for building distributed applications.
  • Hive, a distributed data warehouse that manages data stored in HDFS and provides a SQL-based query language for querying it.
  • Hama, a distributed parallel computing framework built on Hadoop that implements the Map/Reduce and Bulk Synchronous Parallel (BSP) models. Its runtime environment depends on ZooKeeper, HBase, and HDFS.
  • Mahout, a MapReduce-based machine learning algorithm library that runs on Hadoop clusters.
  • Cassandra, a distributed non-relational database that combines a data model similar to Google's BigTable with a Dynamo-style distributed architecture.

The above are some of the open source frameworks used in the data storage layer.

Other components
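Since several of the frameworks above build on the MapReduce model, a small in-memory illustration may help. The sketch below is plain Python, not Hadoop code: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and everything runs in a single process, whereas a real cluster would spread the map and reduce steps across many nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Sum all counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Chain map, shuffle, and reduce over all input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data platform", "big data storage"])
print(counts)
```

The map and reduce functions never share state, which is what allows a framework to run them in parallel on different machines.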

  • Data synchronization and distribution

This component manages data synchronization and distribution in a unified manner and supports asynchronous, distributed synchronization and distribution of data.
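One possible shape for asynchronous distribution is a fan-out over per-target queues. The following is a minimal sketch under stated assumptions: the `distribute` function and the sink names are hypothetical, and in-memory lists stand in for real targets such as HDFS or HBase.

```python
import asyncio


async def distribute(records: list[dict], sinks: dict[str, list]) -> None:
    """Fan every record out to every sink through per-sink queues."""
    queues = {name: asyncio.Queue() for name in sinks}

    async def consumer(name: str) -> None:
        # Each sink drains its own queue independently of the others.
        while True:
            record = await queues[name].get()
            if record is None:  # sentinel: no more data
                break
            sinks[name].append(record)

    workers = [asyncio.create_task(consumer(name)) for name in sinks]
    for record in records:
        for queue in queues.values():
            await queue.put(record)
    for queue in queues.values():
        await queue.put(None)  # tell every consumer to stop
    await asyncio.gather(*workers)


# Hypothetical sinks; plain lists stand in for real storage systems.
sinks = {"hdfs": [], "hbase": []}
asyncio.run(distribute([{"order_id": 1}, {"order_id": 2}], sinks))
```

Giving each target its own queue means a slow target only delays its own consumer, not the others.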

  • Monitoring

This covers monitoring and early warning for the platform's services and resources, including the availability, performance, and system load of data storage and the response time of resource requests.
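Early warning usually comes down to comparing sampled metrics with thresholds. The sketch below illustrates that idea only; the `Threshold` class, the `check_metrics` function, the metric names, and the limits are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str
    limit: float


def check_metrics(metrics: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return one warning message per metric that exceeds its limit."""
    warnings = []
    for t in thresholds:
        value = metrics.get(t.metric)
        # Metrics that were not sampled are skipped rather than treated as alerts.
        if value is not None and value > t.limit:
            warnings.append(f"{t.metric}={value} exceeds limit {t.limit}")
    return warnings


# Hypothetical thresholds and a single sampled reading.
thresholds = [Threshold("system_load", 8.0), Threshold("response_time_ms", 500.0)]
alerts = check_metrics({"system_load": 9.5, "response_time_ms": 120.0}, thresholds)
print(alerts)
```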

  • Offline computing

This module processes offline computing tasks. It includes a task container, a task scheduling timer, exception capture, and other submodules, which together ensure that offline tasks run as planned whenever resources allow.
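The interplay of resource checks and exception capture can be sketched as follows. This is a simplified illustration: `run_offline_tasks`, the slot model, and the task names are hypothetical, tasks run one at a time so slots are released after each task, and the scheduling timer is left out.

```python
from typing import Callable


def run_offline_tasks(
    tasks: list[tuple[str, int, Callable[[], object]]],
    free_slots: int,
) -> dict[str, str]:
    """Run queued tasks in order while resources allow.

    Each task is (name, slots_needed, callable). A task that needs more
    slots than are free is deferred; a task that raises is marked failed
    without stopping the rest of the batch.
    """
    status: dict[str, str] = {}
    for name, slots_needed, func in tasks:
        if slots_needed > free_slots:
            status[name] = "deferred"
            continue
        try:
            func()
            status[name] = "done"
        except Exception as exc:  # exception capture
            status[name] = f"failed: {exc}"
    return status


def broken_report() -> None:
    raise ValueError("missing partition")


status = run_offline_tasks(
    [
        ("daily_sales", 2, lambda: None),
        ("full_reindex", 16, lambda: None),
        ("report", 1, broken_report),
    ],
    free_slots=4,
)
print(status)
```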

  • Platform security

This mainly covers the management of data access rights. Data is divided into security levels that are managed separately. Accessing data at a high security level triggers an approval process, and access is granted only after a supervisor approves it.
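A level-based check with an approval step might look like the sketch below. The three level names, the rule that approval is required from `RESTRICTED` upward, and the `can_access` function are assumptions chosen for illustration; a real platform would define its own levels and policy.

```python
from enum import IntEnum


class Level(IntEnum):
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3


# Assumed policy: data at this level or above needs supervisor approval.
APPROVAL_REQUIRED_FROM = Level.RESTRICTED


def can_access(user_level: Level, data_level: Level,
               approved_by_supervisor: bool = False) -> bool:
    """Decide whether a read is allowed under the assumed policy."""
    if data_level > user_level:
        return False  # the user's access rights do not cover this level
    if data_level >= APPROVAL_REQUIRED_FROM:
        return approved_by_supervisor  # high-level data also needs approval
    return True


print(can_access(Level.INTERNAL, Level.PUBLIC))
print(can_access(Level.RESTRICTED, Level.RESTRICTED))
print(can_access(Level.RESTRICTED, Level.RESTRICTED, approved_by_supervisor=True))
```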

  • Resource application

This refers to submitting a request to use the computing or storage resources of the big data platform. Every data operation and access is recorded for future audit.
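The request-plus-audit pattern can be sketched as a small resource pool that logs every application, granted or not. The `ResourcePool` class, its fields, and the user names are hypothetical and only model storage capacity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ResourcePool:
    storage_gb: int
    audit_log: list[dict] = field(default_factory=list)

    def apply(self, user: str, requested_gb: int) -> bool:
        """Grant the request if capacity remains; always record the attempt."""
        granted = 0 < requested_gb <= self.storage_gb
        if granted:
            self.storage_gb -= requested_gb
        # Rejected requests are logged too, so the audit trail is complete.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "requested_gb": requested_gb,
            "granted": granted,
        })
        return granted


pool = ResourcePool(storage_gb=100)
first = pool.apply("alice", 80)
second = pool.apply("bob", 50)
print(first, second, pool.storage_gb)
```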

 



Origin blog.csdn.net/hemin1003/article/details/114928714