A Comprehensive Knowledge Map of Core Big Data Frameworks (2021 Edition, with Interview Questions)

Preface


I hope to share some methods for learning big data, introduce its basic frameworks, and offer some personal experience and suggestions. This post will be updated continuously.

Hadoop framework

Hadoop is a foundational framework for big data development. Its core consists of HDFS and MapReduce: HDFS provides storage for massive datasets, and MapReduce provides computation over them, so both deserve focused study. Beyond that, you also need to learn how to build and manage Hadoop clusters, and to understand YARN and advanced Hadoop administration.
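To make the map, shuffle, and reduce stages concrete, here is a minimal single-process Python sketch of the classic word-count job. It is not Hadoop's Java `Mapper`/`Reducer` API; the function names are hypothetical, and real Hadoop additionally splits input across HDFS blocks, partitions and spills map output, and runs tasks on many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Single-process simulation of MapReduce word count (hypothetical names, not Hadoop's API).


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every value emitted for the same key, then sort by key.

    Real Hadoop also partitions map output and spills it to disk here;
    this sketch keeps only the grouping and key sorting.
    """
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["Hello Hadoop", "hello HDFS"]))  # {'hadoop': 1, 'hdfs': 1, 'hello': 2}
```

The shuffle is the step that moves data between machines in a real cluster, which is why it is usually the most expensive part of a MapReduce job.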

No. | Topic | Link
1 | Hadoop framework explained in detail | https://blog.csdn.net/qq_43674360/article/details/105317651
2 | Building a distributed Hadoop cluster (key topic) | https://blog.csdn.net/qq_43674360/article/details/112411356
3 | Building a Hadoop cluster on Windows 10 | https://blog.csdn.net/qq_43674360/article/details/105317651
4 | HDFS commands | https://blog.csdn.net/qq_43674360/article/details/109056244
5 | Hadoop: the MapReduce shuffle process in detail | https://blog.csdn.net/qq_43674360/article/details/109449024
6 | Hadoop case study | https://blog.csdn.net/qq_43674360/article/details/112413016

Hive data warehouse

Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and provides simple SQL query capabilities, converting SQL statements into MapReduce jobs for execution. This makes it well suited to statistical analysis in a data warehouse. For Hive, you need to master its installation, everyday use, and advanced operations.
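A rough Python sketch of the two ideas in this paragraph, under stated assumptions: a plain tab-delimited text file only becomes a "table" when a schema is applied at read time, and a `GROUP BY` aggregate decomposes into map, shuffle, and reduce steps. The `emp` table, its columns, and the helper names are hypothetical; Hive itself keeps schemas in its metastore and generates the jobs for you.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Schema = List[Tuple[str, Callable[[str], object]]]

# Hypothetical table definition, roughly equivalent to:
#   CREATE TABLE emp (name STRING, dept STRING, salary INT)
#   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
EMP_SCHEMA: Schema = [("name", str), ("dept", str), ("salary", int)]


def read_table(raw_text: str, schema: Schema, delimiter: str = "\t") -> List[Dict[str, object]]:
    """Apply the schema to a plain-text file only when it is read."""
    rows = []
    for line in raw_text.strip().splitlines():
        fields = line.split(delimiter)
        rows.append({name: cast(value) for (name, cast), value in zip(schema, fields)})
    return rows


def group_by_sum(rows: List[Dict[str, object]], key_col: str, value_col: str) -> Dict[object, int]:
    """Roughly: SELECT key_col, SUM(value_col) FROM t GROUP BY key_col."""
    groups: Dict[object, List[int]] = defaultdict(list)
    for row in rows:                                      # map: emit (key, value)
        groups[row[key_col]].append(row[value_col])       # shuffle: group by key
    return {k: sum(v) for k, v in sorted(groups.items())}  # reduce: sum each group


if __name__ == "__main__":
    data = "alice\tsales\t100\nbob\tsales\t150\ncarol\tdev\t300"
    print(group_by_sum(read_table(data, EMP_SCHEMA), "dept", "salary"))  # {'dev': 300, 'sales': 250}
```

Because the schema is only applied when the file is read, the same HDFS file could be exposed through different table definitions without rewriting the data.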

No. | Topic | Link
1 | Hive learning roadmap (mind map) | https://blog.csdn.net/qq_43674360/article/details/109802256
2 | Chapter 1 Basic Concepts of Hive | https://blog.csdn.net/qq_43674360/article/details/109803190
3 | Chapter 2 Hive Installation | https://blog.csdn.net/qq_43674360/article/details/109803527
4 | Chapter 3 Hive Data Types | https://blog.csdn.net/qq_43674360/article/details/109997630
5 | Chapter 4 DDL: Data Definition | https://blog.csdn.net/qq_43674360/article/details/109998795
6 | Chapter 5 DML: Data Manipulation (Hive notes) | https://blog.csdn.net/qq_43674360/article/details/110002170
7 | Chapter 6 Basic Hive queries (WHERE, GROUP BY, JOIN, ORDER BY, etc.) | https://blog.csdn.net/qq_43674360/article/details/110850375
8 | Chapter 7 Hive functions (including custom functions) | https://blog.csdn.net/qq_43674360/article/details/110877499
9 | Chapter 8 Hive compression and storage | https://blog.csdn.net/qq_43674360/article/details/110877608
10 | Chapter 9 Hive enterprise-level tuning | https://blog.csdn.net/qq_43674360/article/details/110878084
11 | Chapter 10 Hive hands-on project: YouTube videos (simplified version) | https://blog.csdn.net/qq_43674360/article/details/110880957
12 | Chapter 11 Common Hive errors and solutions | https://blog.csdn.net/qq_43674360/article/details/110881164

ZooKeeper coordination service system

ZooKeeper is an important component for Hadoop and HBase. It provides consistency services for distributed applications, including configuration maintenance, naming services, distributed synchronization, and group services. For big data development, you must master ZooKeeper's commonly used commands and how its core functions are implemented.
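The server online/offline monitoring pattern (ephemeral nodes plus child watches) can be illustrated with a small in-memory toy. This is an assumption-laden sketch, not the ZooKeeper or Kazoo client API: `ToyZooKeeper`, the explicit `session` argument, and `watch_servers` are hypothetical, and real ZooKeeper adds replication, node versions, sequential nodes, and much more. It does mirror two real behaviors: ephemeral nodes vanish when their session ends, and a standard watch fires once and must be re-registered.

```python
from typing import Callable, Dict, List, Optional

Watcher = Callable[[str], None]


class ToyZooKeeper:
    """In-memory toy of a few ZooKeeper ideas; NOT the real client API."""

    def __init__(self) -> None:
        self.nodes: Dict[str, bytes] = {"/": b""}
        self.ephemeral_owner: Dict[str, int] = {}        # path -> owning session id
        self.child_watches: Dict[str, List[Watcher]] = {}

    @staticmethod
    def _parent(path: str) -> str:
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path: str, data: bytes = b"", session: Optional[int] = None) -> None:
        parent = self._parent(path)
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = data
        if session is not None:                          # hypothetical: marks an ephemeral node
            self.ephemeral_owner[path] = session
        self._fire_child_watches(parent)

    def get_children(self, path: str, watch: Optional[Watcher] = None) -> List[str]:
        prefix = path.rstrip("/") + "/"
        children = sorted(
            p[len(prefix):] for p in self.nodes
            if p.startswith(prefix) and len(p) > len(prefix) and "/" not in p[len(prefix):]
        )
        if watch is not None:
            # Like a standard ZooKeeper watch: fires once, then must be re-registered.
            self.child_watches.setdefault(path, []).append(watch)
        return children

    def close_session(self, session: int) -> None:
        """Session end: every ephemeral node owned by the session disappears."""
        for path in [p for p, s in self.ephemeral_owner.items() if s == session]:
            del self.nodes[path]
            del self.ephemeral_owner[path]
            self._fire_child_watches(self._parent(path))

    def _fire_child_watches(self, parent: str) -> None:
        for callback in self.child_watches.pop(parent, []):
            callback(parent)


def watch_servers(zk: ToyZooKeeper, history: List[List[str]]) -> None:
    """Client side (hypothetical helper): record the server list now and after every change."""
    def on_change(_path: str) -> None:
        history.append(zk.get_children("/servers", watch=on_change))  # re-register the watch
    history.append(zk.get_children("/servers", watch=on_change))


if __name__ == "__main__":
    zk = ToyZooKeeper()
    zk.create("/servers")                                # persistent parent node
    online: List[List[str]] = []
    watch_servers(zk, online)
    zk.create("/servers/host1", b"host1:8080", session=1)
    zk.create("/servers/host2", b"host2:8080", session=2)
    zk.close_session(1)                                  # host1 goes offline
    print(online)  # [[], ['host1'], ['host1', 'host2'], ['host2']]
```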

No. | Topic | Link
1 | Getting started with ZooKeeper | https://blog.csdn.net/qq_43674360/article/details/110948110
2 | ZooKeeper internals (leader election, a key interview topic) | https://blog.csdn.net/qq_43674360/article/details/110948760
3 | Distributed installation of ZooKeeper in practice | https://blog.csdn.net/qq_43674360/article/details/110948976
4 | ZooKeeper client command reference (detailed, with step-by-step screenshots) | https://blog.csdn.net/qq_43674360/article/details/111039779
5 | One-click start and stop scripts for a ZooKeeper cluster (with screenshots) | https://blog.csdn.net/qq_43674360/article/details/111047891
6 | ZooKeeper API basics | https://blog.csdn.net/qq_43674360/article/details/111195413
7 | Using ZooKeeper to monitor server nodes going online and offline dynamically (ready-to-copy code) | https://blog.csdn.net/qq_43674360/article/details/111251034

HBase

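HBase is a distributed, column-oriented, open-source database. Unlike a typical relational database, it is better suited to storing unstructured data. It is a highly reliable, high-performance, column-oriented, scalable distributed storage system. Big data developers need to master HBase fundamentals, applications, architecture, and advanced usage.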
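To illustrate HBase's logical data model, here is a toy Python model: rows sorted by row key, cells addressed by `family:qualifier`, several timestamped versions per cell, and sparse rows that store only the columns actually written. `ToyHTable` and its `max_versions` default are hypothetical; this is not the HBase client API, and it ignores regions, the MemStore, and HFiles.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class ToyHTable:
    """Toy model of HBase's logical data model; NOT the HBase client API."""

    def __init__(self, families: List[str], max_versions: int = 3) -> None:
        self.families = set(families)          # column families are declared up front
        self.max_versions = max_versions       # hypothetical default
        self.row_keys: List[str] = []          # kept sorted, like HBase row keys
        # row -> "family:qualifier" -> [(timestamp, value), ...], newest first
        self.rows: Dict[str, Dict[str, List[Tuple[int, str]]]] = {}

    def put(self, row: str, column: str, value: str, ts: int) -> None:
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise ValueError(f"unknown column family {family!r}")
        if row not in self.rows:
            bisect.insort(self.row_keys, row)
            self.rows[row] = {}
        versions = self.rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(reverse=True)            # newest timestamp first
        del versions[self.max_versions:]       # keep at most max_versions cells

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the newest version of one cell, or None if it was never written."""
        versions = self.rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start: str, stop: str) -> List[Tuple[str, Dict[str, str]]]:
        """Rows in [start, stop): a contiguous slice because row keys are sorted."""
        lo = bisect.bisect_left(self.row_keys, start)
        hi = bisect.bisect_left(self.row_keys, stop)
        return [(r, {c: v[0][1] for c, v in self.rows[r].items()})
                for r in self.row_keys[lo:hi]]


if __name__ == "__main__":
    t = ToyHTable(["info"])
    t.put("user001", "info:name", "alice", ts=1)
    t.put("user001", "info:name", "alicia", ts=2)   # newer version of the same cell
    t.put("user002", "info:city", "Paris", ts=1)    # sparse: this row has no info:name
    t.put("user100", "info:name", "carol", ts=1)
    print(t.get("user001", "info:name"))            # alicia
    print(t.scan("user000", "user100"))
```

Sorted row keys are why row-key design matters so much in HBase: a well-chosen key turns common queries into short range scans.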

Phoenix

Redis

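Redis is a key-value storage system that largely makes up for the shortcomings of key/value stores such as memcached, and in some scenarios it serves as a good complement to relational databases. It provides clients for Java, C/C++, C#, PHP, JavaScript, Perl, Objective-C, Python, Ruby, Erlang, and other languages, which makes it convenient to use. Big data developers need to master Redis installation, configuration, and usage.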
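One common way Redis complements a relational database is the cache-aside pattern: read from the cache first, fall back to the database on a miss, then populate the cache with an expiry. The sketch below assumes a plain dictionary as a stand-in for Redis (mirroring `GET` and `SET key value EX seconds`) and an in-memory SQLite table as the relational database; the `users` table, the key format, and the 60-second TTL are hypothetical.

```python
import sqlite3
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Dict-based stand-in for Redis GET and SET key value EX seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:        # expired keys behave as missing
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: str, ex: float) -> None:
        self._data[key] = (value, self._clock() + ex)


def get_user_name(user_id: int, cache: TTLCache, db: sqlite3.Connection,
                  stats: Dict[str, int]) -> Optional[str]:
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"user:{user_id}:name"                # hypothetical key naming scheme
    cached = cache.get(key)
    if cached is not None:
        stats["hits"] += 1
        return cached
    stats["misses"] += 1
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        return None
    cache.set(key, row[0], ex=60)              # later reads within 60 s hit the cache
    return row[0]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("INSERT INTO users VALUES (1, 'alice')")
    cache, stats = TTLCache(), {"hits": 0, "misses": 0}
    get_user_name(1, cache, db, stats)          # miss: read from SQLite, fill cache
    get_user_name(1, cache, db, stats)          # hit: served from the cache
    print(stats)                                # {'hits': 1, 'misses': 1}
```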

Flume

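Flume is a highly available, highly reliable, distributed system for collecting, aggregating, and transporting massive volumes of log data. Flume supports customizing various data senders in a logging system to collect data; it can also perform simple processing on that data and write it to various (customizable) data receivers. Big data developers need to master its installation, configuration, and usage.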
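A Flume agent is built from a source, a channel, and a sink. The toy Python pipeline below sketches that flow, with a bounded queue as the channel and a simple interceptor doing light processing. It is not Flume configuration or its Java API; the interceptor, the `Event` layout, and the channel capacity are hypothetical, and real Flume channels add transactions and durability options.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Event:
    """Simplified event: Flume events carry headers plus a byte body."""
    body: str
    headers: Dict[str, str] = field(default_factory=dict)


Interceptor = Callable[[Event], Optional[Event]]
_STOP = object()  # sentinel marking the end of the input


def drop_debug_and_tag_level(event: Event) -> Optional[Event]:
    """Hypothetical interceptor: drop DEBUG lines, tag the log level in a header."""
    if event.body.startswith("DEBUG"):
        return None
    event.headers["level"] = event.body.split(" ", 1)[0]
    return event


def run_agent(lines: Iterable[str], interceptor: Interceptor, capacity: int = 2) -> List[Event]:
    """Source -> channel -> sink, with the channel as a bounded buffer between threads."""
    channel: "queue.Queue[object]" = queue.Queue(maxsize=capacity)
    delivered: List[Event] = []

    def source() -> None:
        for line in lines:
            event = interceptor(Event(body=line))
            if event is not None:
                channel.put(event)        # blocks while the channel is full
        channel.put(_STOP)

    def sink() -> None:
        while True:
            item = channel.get()
            if item is _STOP:
                return
            delivered.append(item)        # stand-in for writing to HDFS, Kafka, etc.

    threads = [threading.Thread(target=source), threading.Thread(target=sink)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return delivered


if __name__ == "__main__":
    logs = ["INFO start", "DEBUG x=1", "WARN disk 90%", "ERROR failed"]
    for e in run_agent(logs, drop_debug_and_tag_level):
        print(e.headers["level"], "|", e.body)
```

The bounded channel is what decouples a fast source from a slow sink: when the channel fills, the source waits instead of losing events.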

SSM

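The SSM framework integrates three open-source frameworks, Spring, Spring MVC, and MyBatis, and is commonly used for web projects with relatively simple data sources. Big data developers need to learn Spring, Spring MVC, and MyBatis individually, and then practice integrating them as SSM.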

Kafka

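Kafka is a high-throughput distributed publish-subscribe messaging system. In big data development, its purpose is to unify online and offline message processing through Hadoop's parallel loading mechanism, and to provide real-time messages through a cluster. Big data developers need to master Kafka's architecture, the role and usage of each component, and how to implement related features.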
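Kafka's publish-subscribe model rests on topics split into partitions, each an append-only log, with consumer groups tracking their own read offsets. The toy below sketches those ideas in memory. `ToyTopic` is hypothetical and not the Kafka client API; it uses `zlib.crc32` where Kafka's default partitioner uses a murmur2 hash, spreads keyless records round robin, and auto-commits offsets on every poll for simplicity.

```python
import zlib
from typing import Dict, List, Optional, Tuple

Record = Tuple[Optional[str], str]  # (key, value)


class ToyTopic:
    """In-memory toy of a Kafka topic; NOT the Kafka client API."""

    def __init__(self, name: str, partitions: int) -> None:
        self.name = name
        self.logs: List[List[Record]] = [[] for _ in range(partitions)]  # append-only logs
        self.committed: Dict[Tuple[str, int], int] = {}  # (group, partition) -> next offset
        self._next_keyless = 0

    def partition_for(self, key: Optional[str]) -> int:
        if key is None:
            # Keyless records: simple round robin in this toy.
            p = self._next_keyless % len(self.logs)
            self._next_keyless += 1
            return p
        # Same key -> same partition, so per-key ordering is preserved.
        # crc32 stands in for the murmur2 hash used by Kafka's default partitioner.
        return zlib.crc32(key.encode("utf-8")) % len(self.logs)

    def produce(self, key: Optional[str], value: str) -> Tuple[int, int]:
        p = self.partition_for(key)
        self.logs[p].append((key, value))
        return p, len(self.logs[p]) - 1          # (partition, offset)

    def poll(self, group: str, partition: int, max_records: int = 10) -> List[Record]:
        start = self.committed.get((group, partition), 0)
        records = self.logs[partition][start:start + max_records]
        self.committed[(group, partition)] = start + len(records)  # auto-commit
        return records


if __name__ == "__main__":
    topic = ToyTopic("clicks", partitions=3)
    for i in range(3):
        p, offset = topic.produce("user-42", f"click-{i}")
    print(offset)                                 # 2: third record in that partition
    print(topic.poll("analytics", p))             # all three clicks, in order
    print(topic.poll("analytics", p))             # []: this group already read them
    print(len(topic.poll("billing", p)))          # 3: each group keeps its own offsets
```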

Scala

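Scala is a multi-paradigm programming language. Spark, an important framework for big data development, is written in Scala, so a solid Scala foundation is essential for learning Spark well. Big data developers therefore need to master the basics of Scala programming.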

Spark

Spark is a fast, general-purpose computing engine designed for large-scale data processing. It provides a comprehensive, unified framework for the big data processing needs of diverse datasets and data sources. Big data developers need to master Spark basics, Spark jobs, Spark RDDs, job deployment and resource allocation, Spark shuffle, Spark memory management, broadcast variables, Spark SQL, Spark Streaming, Spark ML, and related topics.
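Two RDD ideas from this list, lazy transformations and a shuffle before a by-key reduction, can be shown with a small single-process toy. `ToyRDD` and its snake_case methods are hypothetical stand-ins loosely modeled on Spark's `map`, `flatMap`, `reduceByKey`, and `collect`; this is not PySpark, and it omits map-side combining, caching, and distributed execution.

```python
from collections import defaultdict
from functools import reduce
from typing import Any, Callable, Iterable, List

Partitions = List[List[Any]]


class ToyRDD:
    """Single-process toy of a few Spark RDD ideas; NOT PySpark's API."""

    def __init__(self, compute: Callable[[], Partitions]) -> None:
        self._compute = compute   # recomputes all partitions from the lineage

    @staticmethod
    def parallelize(data: Iterable[Any], num_partitions: int = 2) -> "ToyRDD":
        items = list(data)
        parts = [items[i::num_partitions] for i in range(num_partitions)]
        return ToyRDD(lambda: parts)

    # Transformations are lazy: they only wrap the parent's compute function.
    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(lambda: [[f(x) for x in part] for part in self._compute()])

    def flat_map(self, f: Callable[[Any], Iterable[Any]]) -> "ToyRDD":
        return ToyRDD(lambda: [[y for x in part for y in f(x)] for part in self._compute()])

    def reduce_by_key(self, f: Callable[[Any, Any], Any], num_partitions: int = 2) -> "ToyRDD":
        def compute() -> Partitions:
            # Shuffle: send every (key, value) pair to the partition chosen by its key's hash.
            buckets = [defaultdict(list) for _ in range(num_partitions)]
            for part in self._compute():
                for key, value in part:
                    buckets[hash(key) % num_partitions][key].append(value)
            # Reduce each key's values inside its new partition.
            return [[(k, reduce(f, vs)) for k, vs in b.items()] for b in buckets]
        return ToyRDD(compute)

    # Actions run the whole lineage and return results to the caller.
    def collect(self) -> List[Any]:
        return [x for part in self._compute() for x in part]


if __name__ == "__main__":
    lines = ["to be or", "not to be"]
    counts = (ToyRDD.parallelize(lines)
              .flat_map(str.split)
              .map(lambda w: (w, 1))
              .reduce_by_key(lambda a, b: a + b))   # nothing has run yet
    print(sorted(counts.collect()))  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

Keeping each RDD as a recipe (its lineage) rather than materialized data is what lets Spark delay work until an action and recompute lost partitions.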

Azkaban

Azkaban is a batch workflow job scheduler that runs a set of jobs and processes in a specific order within a workflow, and it can be used to schedule big data tasks. Big data developers need to master Azkaban's configuration and syntax rules.
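Running jobs in a specific order comes down to ordering a dependency graph. The sketch below parses Azkaban-style `.job` properties (in the classic format, a job names its upstream jobs in `dependencies`) and derives a valid execution order with Python's `graphlib` (Python 3.9+). The four-job flow is hypothetical, and real Azkaban also runs independent jobs in parallel, handles retries, and schedules whole flows.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Hypothetical flow written as Azkaban-style .job property files.
JOB_FILES: Dict[str, str] = {
    "extract": "type=command\ncommand=echo extract",
    "clean": "type=command\ncommand=echo clean\ndependencies=extract",
    "load_hive": "type=command\ncommand=echo load\ndependencies=clean",
    "report": "type=command\ncommand=echo report\ndependencies=load_hive,clean",
}


def parse_job(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


def execution_order(job_files: Dict[str, str]) -> List[str]:
    """Return jobs so that every job comes after all of its dependencies."""
    graph: Dict[str, Set[str]] = {}
    for name, text in job_files.items():
        deps = parse_job(text).get("dependencies", "")
        graph[name] = {d.strip() for d in deps.split(",") if d.strip()}
    # static_order() raises graphlib.CycleError if the flow contains a cycle.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(JOB_FILES))  # ['extract', 'clean', 'load_hive', 'report']
```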

Common tools

IntelliJ IDEA
LICEcap
Git
Sourcetree
Navicat
MobaXterm
VMware


Original post: blog.csdn.net/qq_43674360/article/details/112396719