Background
In recent years, big data has attracted growing attention and is being used more and more widely. Software and IT products now record all kinds of data so that it can be studied and analyzed.
Content
Big Data Platform Technical Architecture
In e-commerce enterprises, the operational data recorded by the system grows by hundreds of GB every day. To store all of this data centrally and keep it accessible at any time, more and more enterprises have moved away from offline data storage and similar solutions and adopted the open Hadoop ecosystem across the board, which strikes a balance between cost and scalability. As a result, Internet companies with sufficient technical strength have built their own big data platforms one after another.
As shown in the figure, a typical big data platform has the following technical architecture:
It consists of data storage, data synchronization and distribution, monitoring, offline computing, platform security, resource requests, and other parts.
Data storage
Data storage is the foundation of the entire big data platform and involves frameworks such as HDFS, HBase, Hive, MapReduce, and Storm. The main frameworks are introduced briefly below; more detailed information is easy to find through a search engine.
- HDFS (Hadoop Distributed File System): a distributed file system and one of the core components of Hadoop.
- MapReduce: a distributed data processing framework and also one of Hadoop's core components.
- HBase: a distributed, column-oriented database that uses HDFS as its underlying storage and supports both MapReduce batch computation and point queries.
- ZooKeeper: a distributed, highly available coordination service that provides basic services such as distributed locks for building distributed applications.
- Hive: a distributed data warehouse that manages data stored in HDFS and provides a SQL-like query language (HiveQL) for querying it.
- Hama: a distributed parallel computing framework on top of Hadoop, built around Map/Reduce and the Bulk Synchronous Parallel (BSP) model. Its runtime environment is tied to ZooKeeper, HBase, and HDFS.
- Mahout: a machine learning algorithm library built on MapReduce that runs on Hadoop clusters.
- Cassandra: a hybrid non-relational database with a data model similar to Google's BigTable.
These are some of the open-source frameworks used in the data storage layer.
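To make the MapReduce model from the list above concrete, here is a minimal single-process sketch of its three phases (map, shuffle, reduce) applied to word counting. It does not use Hadoop's API; the functions `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical names that only mimic what the framework does across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a real cluster, each input split would be mapped on a different node.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    logs = ["order paid", "order shipped", "order paid"]
    print(word_count(logs))  # {'order': 3, 'paid': 2, 'shipped': 1}
```

In Hadoop the map and reduce functions are what you write; the shuffle, partitioning, and distribution across nodes are handled by the framework.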
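ZooKeeper's distributed locks are commonly built with its standard lock recipe: each client creates an ephemeral sequential znode under a lock path, the client owning the lowest sequence number holds the lock, and every other client watches only the node just ahead of it. The sketch below simulates that ordering in memory. `FakeLockDir` is a hypothetical stand-in for a ZooKeeper path, not the ZooKeeper client API; in Java, a client library such as Apache Curator's `InterProcessMutex` packages this recipe.

```python
import itertools
from typing import Dict, Optional


class FakeLockDir:
    """Hypothetical in-memory stand-in for a ZooKeeper lock path such as /locks/order-job."""

    def __init__(self) -> None:
        self._seq = itertools.count()
        self.nodes: Dict[str, str] = {}  # znode name -> owning client id

    def create_sequential(self, client_id: str) -> str:
        # ZooKeeper appends a monotonically increasing, zero-padded 10-digit counter.
        name = f"lock-{next(self._seq):010d}"
        self.nodes[name] = client_id
        return name

    def delete(self, name: str) -> None:
        # Happens on explicit release, or automatically when an ephemeral node's session ends.
        self.nodes.pop(name, None)

    def holder(self) -> Optional[str]:
        """The client owning the lowest-numbered node holds the lock."""
        if not self.nodes:
            return None
        return self.nodes[min(self.nodes)]

    def predecessor(self, name: str) -> Optional[str]:
        """The node a waiting client watches: the one immediately before its own."""
        earlier = [n for n in sorted(self.nodes) if n < name]
        return earlier[-1] if earlier else None


if __name__ == "__main__":
    lock = FakeLockDir()
    a = lock.create_sequential("client-a")
    b = lock.create_sequential("client-b")
    print(lock.holder(), lock.predecessor(b))  # client-a lock-0000000000
    lock.delete(a)  # client-a releases (or its session expires)
    print(lock.holder())  # client-b
```

Watching only the predecessor, rather than the whole lock path, means a release wakes up just one waiting client instead of all of them.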
Other components
- Data synchronization and distribution
This component manages data synchronization and distribution centrally and supports asynchronous, distributed synchronization and distribution.
- Monitoring
Monitors the services and resources of the big data platform and raises early warnings, covering the availability, performance, and system load of data storage as well as the response time of resource requests.
- Offline computing
Processes offline computing tasks through modules such as task containers, scheduling timers, and exception capture, ensuring that offline tasks run as planned whenever resources allow.
- Platform security
Mainly covers the management of data access permissions. Data is divided into security levels and managed accordingly; accessing data with a high security level triggers an approval process, and access is granted only after a supervisor approves the request.
- Resource requests
Covers requests to use the platform's computing or storage resources; every data access operation is recorded for later auditing.
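The asynchronous, distributed synchronization and distribution described above can be pictured as a fan-out: records from a source are queued and copied to several targets, and the producer never waits for any single target. Below is a minimal sketch with Python's `asyncio`, assuming hypothetical in-memory targets (`hdfs_archive`, `search_index`) in place of real sinks.

```python
import asyncio
from typing import Dict, List, Optional

STOP: Optional[str] = None  # sentinel that tells workers the stream has ended


async def distribute(source: asyncio.Queue, targets: Dict[str, asyncio.Queue]) -> None:
    """Read records from the source and copy each one to every target queue."""
    while True:
        record = await source.get()
        for queue in targets.values():
            await queue.put(record)
        if record is STOP:
            return


async def sink_worker(queue: asyncio.Queue, store: List[str]) -> None:
    """Apply records to one target independently of the other targets."""
    while True:
        record = await queue.get()
        if record is STOP:
            return
        store.append(record)  # a real sink would write to HDFS, HBase, an index, etc.


async def sync(records: List[str], target_names: List[str]) -> Dict[str, List[str]]:
    source: asyncio.Queue = asyncio.Queue()
    targets = {name: asyncio.Queue() for name in target_names}
    stores: Dict[str, List[str]] = {name: [] for name in target_names}
    workers = [asyncio.create_task(sink_worker(targets[n], stores[n])) for n in target_names]
    dist = asyncio.create_task(distribute(source, targets))
    for record in records:
        await source.put(record)  # the producer only enqueues; it does not wait for targets
    await source.put(STOP)
    await asyncio.gather(dist, *workers)
    return stores


if __name__ == "__main__":
    # Hypothetical target names, used only for illustration.
    print(asyncio.run(sync(["order-1", "order-2"], ["hdfs_archive", "search_index"])))
```

Each target has its own queue, so a slow target delays only its own worker rather than the producer or the other targets.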
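Monitoring and early warning largely come down to comparing collected metrics against thresholds. The sketch below covers the availability, load, and response-time checks listed above; the metric names and threshold values are hypothetical and illustrative, not recommendations.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Rule:
    metric: str
    threshold: float
    higher_is_worse: bool = True  # load is bad when high; availability is bad when low


# Hypothetical rules; real thresholds depend on the platform's SLAs.
RULES = (
    Rule("hdfs_availability_pct", 99.9, higher_is_worse=False),
    Rule("datanode_load_avg", 8.0),
    Rule("resource_request_p99_ms", 500.0),
)


def check(metrics: Dict[str, float], rules: Sequence[Rule] = RULES) -> List[str]:
    """Return one alert message for every rule the current metrics violate."""
    alerts = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            # Missing data is itself worth an alert: the collector may be down.
            alerts.append(f"{rule.metric}: no data")
            continue
        if rule.higher_is_worse:
            breached = value > rule.threshold
        else:
            breached = value < rule.threshold
        if breached:
            alerts.append(f"{rule.metric}={value} breaches threshold {rule.threshold}")
    return alerts


if __name__ == "__main__":
    sample = {"hdfs_availability_pct": 99.5, "datanode_load_avg": 3.2, "resource_request_p99_ms": 820.0}
    for alert in check(sample):
        print(alert)
```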
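The offline computing module combines a task container, a scheduling timer, and exception capture. Here is a minimal single-process sketch under stated assumptions: tasks are plain Python callables, "resources allow" is modeled as a hypothetical number of free slots per tick, and a failing task is caught, logged, and retried a fixed number of times instead of crashing the scheduler. `OfflineScheduler` and its methods are hypothetical names.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(order=True)
class Task:
    run_at: float  # scheduled time on an illustrative clock
    seq: int  # tie-breaker so tasks due at the same time keep submission order
    name: str = field(compare=False)
    fn: Callable[[], None] = field(compare=False)
    retries_left: int = field(default=2, compare=False)


class OfflineScheduler:
    def __init__(self, slots: int) -> None:
        self.slots = slots  # hypothetical count of free compute slots per tick
        self.queue: List[Task] = []  # the task container, ordered by run time
        self.log: List[str] = []
        self._seq = itertools.count()

    def submit(self, name: str, fn: Callable[[], None], run_at: float) -> None:
        heapq.heappush(self.queue, Task(run_at, next(self._seq), name, fn))

    def tick(self, now: float) -> None:
        """Timer callback: run due tasks, at most `slots` of them, capturing failures."""
        started = 0
        while self.queue and self.queue[0].run_at <= now and started < self.slots:
            task = heapq.heappop(self.queue)
            started += 1
            try:
                task.fn()
                self.log.append(f"{task.name}: ok")
            except Exception as exc:  # exception capture keeps the scheduler running
                if task.retries_left > 0:
                    task.retries_left -= 1
                    task.run_at = now + 1  # retry on the next tick
                    heapq.heappush(self.queue, task)
                    self.log.append(f"{task.name}: failed ({exc}), retrying")
                else:
                    self.log.append(f"{task.name}: failed permanently ({exc})")


if __name__ == "__main__":
    sched = OfflineScheduler(slots=1)
    sched.submit("daily_gmv_report", lambda: None, run_at=0)
    sched.submit("user_tagging", lambda: None, run_at=0)
    sched.tick(now=0)  # only one slot is free, so the second task waits
    sched.tick(now=1)
    print(sched.log)
```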
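Level-based access control with an approval gate can be sketched as follows. The three levels, the rule that level 3 and above needs approval, and the in-memory `pending`/`approved` sets standing in for a real approval workflow are all hypothetical assumptions for illustration.

```python
from enum import IntEnum
from typing import Dict, Set, Tuple


class Level(IntEnum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3


# Hypothetical policy: this level and above require a supervisor's approval.
APPROVAL_REQUIRED_FROM = Level.CONFIDENTIAL


class AccessControl:
    def __init__(self, dataset_levels: Dict[str, Level]) -> None:
        self.dataset_levels = dataset_levels
        self.pending: Set[Tuple[str, str]] = set()
        self.approved: Set[Tuple[str, str]] = set()

    def request(self, user: str, dataset: str) -> str:
        """Return 'granted', or 'pending' after triggering the approval process."""
        level = self.dataset_levels[dataset]
        if level < APPROVAL_REQUIRED_FROM or (user, dataset) in self.approved:
            return "granted"
        self.pending.add((user, dataset))  # a real system would notify the supervisor here
        return "pending"

    def approve(self, user: str, dataset: str) -> None:
        """Called when the supervisor approves a pending request."""
        self.pending.discard((user, dataset))
        self.approved.add((user, dataset))


if __name__ == "__main__":
    acl = AccessControl({"order_stats": Level.INTERNAL, "user_phone": Level.CONFIDENTIAL})
    print(acl.request("alice", "order_stats"))  # granted
    print(acl.request("alice", "user_phone"))  # pending
    acl.approve("alice", "user_phone")
    print(acl.request("alice", "user_phone"))  # granted
```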
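Recording every operation for later audit can be as simple as an append-only log wrapped around each call. The sketch below uses a hypothetical decorator, `audited`, that writes one JSON line per call (timestamp, user, operation, arguments, outcome). The field names, the in-memory `AUDIT_LOG` list standing in for durable audit storage, and the `request_compute` quota check are all assumptions for illustration.

```python
import functools
import json
import time
from typing import Any, Callable, List

AUDIT_LOG: List[str] = []  # stand-in for an append-only audit store


def audited(operation: str) -> Callable:
    """Record who ran which operation, and whether it succeeded, on every call."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(user: str, *args: Any, **kwargs: Any) -> Any:
            entry = {"ts": time.time(), "user": user, "op": operation, "args": [str(a) for a in args]}
            try:
                result = fn(user, *args, **kwargs)
                entry["outcome"] = "ok"
                return result
            except Exception as exc:
                entry["outcome"] = f"error: {exc}"
                raise
            finally:
                AUDIT_LOG.append(json.dumps(entry))  # written even when the call fails
        return wrapper
    return decorator


@audited("request_compute")
def request_compute(user: str, vcores: int, memory_gb: int) -> str:
    # Hypothetical quota check standing in for the platform's real resource manager.
    if vcores > 64:
        raise ValueError("vcore quota exceeded")
    return f"granted {vcores} vcores / {memory_gb} GB to {user}"


if __name__ == "__main__":
    print(request_compute("alice", 8, 32))
    try:
        request_compute("bob", 128, 256)
    except ValueError:
        pass
    for line in AUDIT_LOG:
        print(line)
```

Logging in `finally` means rejected requests are audited too, which is usually exactly what an auditor wants to see.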
Previous tutorial in this series: Architectural Thinking Growth Series Tutorials (10) - E-commerce Search Engine Architecture Design
Full series: Architectural Thinking Growth Series Tutorials