The most complete mind map of components for getting started with big data

foreword

I admit the title is clickbait: I added the words "most complete", and my face felt a little hot as I typed them. Then I remembered Teacher Ma's video: you need a thick skin in life. If you are too embarrassed to do this and too embarrassed to do that, how are you going to get by?

overview

I plan to write a series of articles that will double as lecture notes for internal training. They will mainly cover how to build an enterprise-level big data platform step by step, from 0 to 1. The preliminary outline is as follows:

  • Mind Map of Big Data Platform Components
  • Big data platform framework and architecture
  • Big data platform component selection methods and ideas
  • Big data platform scale evaluation and hardware configuration
  • Big data platform deployment and implementation
  • Big data platform storage and HDFS
  • Big data platform data warehouse and Hive/HBase
  • Big data platform resource management and YARN
  • Big data platform batch processing and Spark
  • Big data platform real-time query and Impala, Trino/Presto
  • Big data platform real-time computing and Spark Structured Streaming/Flink, Materialize
  • Big data platform message pipeline and Kafka
  • Big data platform query engine and Phoenix/Presto/Dremio
  • Big data platform collection and DataX/Canal/StreamSets/Debezium
  • Big data platform search and Elasticsearch
  • Big data platform multi-dimensional real-time query and SnappyData/ClickHouse
  • Big data platform display and Kibana/Davinci, DataEase

mind map

I spent a lunch break sorting more than 100 common and less common big data platform components into a mind map. To keep every first-level category name to a two-character word (in the original Chinese), some names were forcibly abbreviated and others were bluntly reworded. The mind map classifies open source big data components, plus a few commercial ones, along dimensions such as acquisition, storage, calculation, query, search, infrastructure, monitoring, operation and maintenance, security, testing, governance, display, and BI. Some components belong to more than one category, and some category names are imprecise because they were chosen for tidiness, so the placement of some components is inevitably a stretch. This will be improved in future updates.
Mind map address:
https://www.processon.com/view/5d54aa6be4b04399f5a52d23#map
ProcessOn continuously maintained version:
The most complete big data getting-started components (continuously updated)
PDF version on Baidu Netdisk: https://pan.baidu.com/s/19wq_I7tzt29ropSC-UF7Eg (extraction code: hy8d)

update plan

Remove obscure components with small audiences, keeping only Apache projects, components from large enterprises, and components with more than 800 stars.
Compile selection analyses and stress-test comparisons of similar components.
Document the process and approach for building an enterprise-level big data platform, focusing on production-level deployment, tuning, monitoring, operation and maintenance, architecture principles, and real-world cases for commonly used big data components.
After review, once no major errors remain, upload to GitHub for maintenance.
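
The pruning rule in the first item of the plan can be sketched as a small filter. This is only an illustration under assumptions: it reads the rule as "keep a component if it meets any one of the three criteria", and the `Component` fields and the sample entries are hypothetical placeholders, not real components or real star counts.

```python
from dataclasses import dataclass

# Threshold taken from the update plan: "more than 800 Stars".
STAR_THRESHOLD = 800


@dataclass(frozen=True)
class Component:
    """Hypothetical record for one mind map entry (all fields are assumptions)."""
    name: str
    is_apache: bool              # is it an Apache project?
    from_large_enterprise: bool  # is it maintained by a large enterprise?
    stars: int                   # star count of its repository


def should_keep(component: Component) -> bool:
    """Keep a component if it satisfies at least one criterion (assumed reading)."""
    return (
        component.is_apache
        or component.from_large_enterprise
        or component.stars > STAR_THRESHOLD
    )


def prune(components: list[Component]) -> list[Component]:
    """Return only the components that pass the pruning rule, in input order."""
    return [c for c in components if should_keep(c)]


# Placeholder data for illustration only.
candidates = [
    Component("apache-example", True, False, 120),
    Component("bigco-example", False, True, 300),
    Component("popular-example", False, False, 801),
    Component("niche-example", False, False, 800),
]

kept_names = [c.name for c in prune(candidates)]
print(kept_names)
```

With this reading, a component with exactly 800 stars and no Apache or large-enterprise backing is dropped, because the plan says "more than 800".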

original address

If you have any questions about this article, feel free to add me on WeChat to discuss: DawSongZhao

Origin blog.csdn.net/zdsx1104/article/details/124418861