Article Directory
Preface
This article collects methods for learning big data, introduces the basic frameworks involved, and shares some personal experience and suggestions. It will be updated continuously.
Hadoop framework
Hadoop is a foundational framework for big data development. Its core components are HDFS and MapReduce: HDFS provides storage for massive data, and MapReduce provides computation over it, so both deserve focused study. Beyond the core, you should also master Hadoop clusters, Hadoop cluster management, YARN, and advanced Hadoop administration, along with the related operations.
No. | Content | Link |
---|---|---|
1 | Hadoop framework explained in detail | https://blog.csdn.net/qq_43674360/article/details/105317651 |
2 | Hadoop distributed cluster setup (key focus) | https://blog.csdn.net/qq_43674360/article/details/112411356 |
3 | Windows 10 Hadoop cluster setup | https://blog.csdn.net/qq_43674360/article/details/105317651 |
4 | HDFS command operations | https://blog.csdn.net/qq_43674360/article/details/109056244 |
5 | Hadoop: the MapReduce shuffle process explained | https://blog.csdn.net/qq_43674360/article/details/109449024 |
6 | Hadoop case study | https://blog.csdn.net/qq_43674360/article/details/112413016 |
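The HDFS-plus-MapReduce split above is easiest to grasp through the classic word-count flow. Below is a minimal local sketch in plain Java (the class and method names are invented for illustration; this is not the real Hadoop API): the map step emits (word, 1) pairs, the shuffle groups them by key, and the reduce step sums each group.

```java
import java.util.*;
import java.util.stream.*;

// Minimal local sketch of the MapReduce word-count flow (plain Java,
// not the Hadoop API): map emits (word, 1) pairs, shuffle groups the
// pairs by key, and reduce sums the ones for each word.
public class WordCountSketch {

    public static Map<String, Integer> wordCount(List<String> lines) {
        // "map" + "shuffle": split lines into words, group the emitted 1s by word
        Map<String, List<Integer>> grouped = lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(w -> w,
                        Collectors.mapping(w -> 1, Collectors.toList())));

        // "reduce": sum each word's list of 1s
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((word, ones) ->
                counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("hello hadoop", "hello hdfs")));
        // prints {hadoop=1, hdfs=1, hello=2}
    }
}
```

In real Hadoop the map and reduce steps run on different machines and the shuffle moves data across the network, which is why the shuffle process (article 5 above) deserves special attention.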
Hive data warehouse
Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and provides a simple SQL query capability; the SQL statements are converted into MapReduce jobs for execution, which makes it well suited to statistical analysis over a data warehouse. For Hive, you need to master its installation, use, and advanced operations.
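The key idea is that a declarative query such as `SELECT dept, AVG(salary) ... GROUP BY dept` is compiled into a shuffle-and-aggregate job. The sketch below shows that logical plan in plain Java (class and method names are invented; this is not Hive itself): the group-by column becomes the shuffle key, and the aggregate runs per group.

```java
import java.util.*;
import java.util.stream.*;

// Sketch of what a "GROUP BY dept" with AVG(salary) logically compiles to
// (plain Java, not Hive): rows are grouped by the shuffle key (dept),
// then each group is aggregated.
public class GroupBySketch {

    // rows are (dept, salary) pairs, like fields parsed from a structured file
    public static Map<String, Double> avgByDept(List<String[]> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                r -> r[0],                       // shuffle key: dept
                TreeMap::new,
                Collectors.averagingDouble(r -> Double.parseDouble(r[1]))));
    }
}
```

Hive's value is that you write only the SQL; the grouping, shuffling, and aggregation are generated for you as MapReduce tasks.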
ZooKeeper coordination service system
ZooKeeper is an important companion to Hadoop and HBase. It is software that provides consistency services for distributed applications, including configuration maintenance, naming services, distributed synchronization, and group services. For big data development you must master ZooKeeper's common commands and how to implement its core features.
No. | Content | Link |
---|---|---|
1 | Getting started with ZooKeeper | https://blog.csdn.net/qq_43674360/article/details/110948110 |
2 | ZooKeeper internals (election mechanism, an interview focus) | https://blog.csdn.net/qq_43674360/article/details/110948760 |
3 | ZooKeeper distributed installation in practice | https://blog.csdn.net/qq_43674360/article/details/110948976 |
4 | ZooKeeper client command list (detailed, with step-by-step screenshots) | https://blog.csdn.net/qq_43674360/article/details/111039779 |
5 | One-click start/stop scripts for a ZooKeeper cluster (illustrated) | https://blog.csdn.net/qq_43674360/article/details/111047891 |
6 | ZooKeeper API basics | https://blog.csdn.net/qq_43674360/article/details/111195413 |
7 | Using ZooKeeper to monitor server nodes going online and offline (copy-and-run code) | https://blog.csdn.net/qq_43674360/article/details/111251034 |
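The online/offline monitoring in article 7 relies on ZooKeeper's watch mechanism. The toy in-memory sketch below illustrates only the concept (the class is invented and this is not the real `org.apache.zookeeper` API): a client registers a one-shot watcher on a "znode" path and is notified once when its data changes.

```java
import java.util.*;
import java.util.function.Consumer;

// Toy in-memory illustration of ZooKeeper's watch idea (NOT the real
// ZooKeeper client API): reading a path can register a one-shot watcher,
// and the next write to that path fires all registered watchers once.
public class WatchSketch {
    private final Map<String, String> znodes = new HashMap<>();
    private final Map<String, List<Consumer<String>>> watchers = new HashMap<>();

    // read data and register a watcher for the next change
    public String getData(String path, Consumer<String> watcher) {
        watchers.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
        return znodes.get(path);
    }

    // write data; fire and discard watchers (watches are one-shot)
    public void setData(String path, String data) {
        znodes.put(path, data);
        List<Consumer<String>> toFire = watchers.remove(path);
        if (toFire != null) toFire.forEach(w -> w.accept(data));
    }
}
```

Because real watches are one-shot, production code re-registers the watch inside the callback — which is exactly the pattern used to keep monitoring servers as they come and go.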
HBase
HBase is a distributed, column-oriented open-source database. Unlike a typical relational database, it is better suited to storing unstructured data. It is a highly reliable, high-performance, column-oriented, scalable distributed storage system. Big data development requires mastering HBase fundamentals, applications, architecture, and advanced usage.
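"Column-oriented" here refers to HBase's data model: rows are kept sorted by row key, and each cell is addressed by a "family:qualifier" column name, which makes row-key range scans cheap. The sketch below models only that idea in plain Java (the class is invented; it is not the HBase client API).

```java
import java.util.*;

// Toy sketch of HBase's data model (not the real HBase client API):
// rows are kept sorted by row key, and each row maps "family:qualifier"
// column names to cell values.
public class HBaseModelSketch {
    private final NavigableMap<String, Map<String, String>> table = new TreeMap<>();

    public void put(String rowKey, String column, String value) {
        table.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    public String get(String rowKey, String column) {
        Map<String, String> row = table.get(rowKey);
        return row == null ? null : row.get(column);
    }

    // range scan over the sorted row keys, like an HBase Scan [startRow, stopRow)
    public SortedMap<String, Map<String, String>> scan(String startRow, String stopRow) {
        return table.subMap(startRow, stopRow);
    }
}
```

Because row-key order determines physical layout, row-key design is one of the most important "advanced usage" topics in real HBase work.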
Phoenix
Redis
Redis is a key-value storage system. It largely makes up for the shortcomings of key/value stores such as memcached, and in some scenarios it complements relational databases well. It provides clients for Java, C/C++, C#, PHP, JavaScript, Perl, Objective-C, Python, Ruby, Erlang, and more, making it very convenient to use. Big data development requires mastering Redis installation, configuration, and usage.
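Two ideas worth internalizing early are the key-value model itself and key expiry (TTL). The sketch below illustrates both in plain Java (the class is invented; it is not Redis or the Jedis client): a key stores a value plus an optional deadline, and reads treat expired keys as absent, similar in spirit to Redis's lazy expiration.

```java
import java.util.*;

// Toy illustration of key-value storage with expiry (not the Redis or
// Jedis API): SET stores a value with an optional TTL, and GET treats
// expired keys as missing, checking expiry lazily on read.
public class KvSketch {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;
        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();

    // ttlMillis <= 0 means "no expiry" in this sketch
    public void set(String key, String value, long ttlMillis) {
        long deadline = ttlMillis <= 0 ? Long.MAX_VALUE
                                       : System.currentTimeMillis() + ttlMillis;
        store.put(key, new Entry(value, deadline));
    }

    public String get(String key) {
        Entry e = store.get(key);
        if (e == null) return null;
        if (System.currentTimeMillis() > e.expiresAtMillis) {
            store.remove(key);   // lazily drop the expired key
            return null;
        }
        return e.value;
    }
}
```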
Flume
Flume is a highly available, highly reliable, distributed system for collecting, aggregating, and transporting massive volumes of log data. It supports customizing various data senders within a logging system to collect data, and it can perform simple processing on the data and write it to various (customizable) data receivers. Big data development requires mastering its installation, configuration, and usage.
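Flume's architecture is a source → channel → sink pipeline: the source accepts events into a buffering channel, and the sink drains them in batches toward the destination. The sketch below models that shape in plain Java (class and method names are invented; it is not the Flume API), including the backpressure you get when the channel fills up.

```java
import java.util.*;

// Toy sketch of a source -> channel -> sink pipeline, the shape of a
// Flume agent (not the real Flume API): the source puts events into a
// bounded channel, and the sink drains them in batches.
public class PipelineSketch {
    private final Deque<String> channel = new ArrayDeque<>();
    private final List<String> destination = new ArrayList<>();
    private final int capacity;

    public PipelineSketch(int capacity) { this.capacity = capacity; }

    // source side: reject events once the channel is full (backpressure)
    public boolean sourcePut(String event) {
        if (channel.size() >= capacity) return false;
        channel.addLast(event);
        return true;
    }

    // sink side: drain up to batchSize events to the destination, in order
    public int sinkDrain(int batchSize) {
        int drained = 0;
        while (drained < batchSize && !channel.isEmpty()) {
            destination.add(channel.pollFirst());
            drained++;
        }
        return drained;
    }

    public List<String> destination() { return destination; }
}
```

In a real Flume configuration, choosing the channel type (memory vs. file) is the trade-off between throughput and durability when the sink falls behind.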
SSM
The SSM framework is an integration of three open-source frameworks: Spring, Spring MVC, and MyBatis. It is commonly used for web projects with relatively simple data sources. Big data development requires mastering Spring, Spring MVC, and MyBatis individually, and then using SSM to integrate them.
Kafka
Kafka is a high-throughput distributed publish/subscribe messaging system. In big data development it is used to unify online and offline message processing through Hadoop's parallel loading mechanism, and to provide real-time messaging across a cluster. Big data development requires mastering Kafka's architecture, the role and usage of each component, and how to implement the related features.
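The core of Kafka's model is the partitioned, append-only log: records with the same key land in the same partition, and each consumer tracks its own read offset, so independent consumers can replay the same data. The sketch below illustrates just that model in plain Java (the class is invented; it is not the real kafka-clients API).

```java
import java.util.*;

// Toy sketch of Kafka's log-based publish/subscribe model (not the real
// kafka-clients API): a topic is one append-only list per partition,
// records are routed to a partition by key hash, and each consumer reads
// from its own offset.
public class TopicSketch {
    private final List<List<String>> partitions;

    public TopicSketch(int numPartitions) {
        partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) partitions.add(new ArrayList<>());
    }

    // producer side: the same key always lands in the same partition,
    // which preserves per-key ordering
    public int send(String key, String value) {
        int p = Math.floorMod(key.hashCode(), partitions.size());
        partitions.get(p).add(value);
        return p;
    }

    // consumer side: read everything from the caller's offset onward;
    // the log itself is never modified by reads
    public List<String> poll(int partition, int offset) {
        List<String> log = partitions.get(partition);
        return new ArrayList<>(log.subList(Math.min(offset, log.size()), log.size()));
    }
}
```

Because reads do not consume the log, an offline batch job and a real-time stream job can both read the same topic — which is the "unify online and offline processing" role described above.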
Scala
Scala is a multi-paradigm programming language. Spark, a key framework in big data development, is written in Scala, so a solid Scala foundation is essential for learning Spark well. Big data development therefore requires mastering the basics of Scala programming.
Spark
Spark is a fast, general-purpose computing engine designed for large-scale data processing. It provides a comprehensive, unified framework for handling big data workloads across diverse datasets and data sources. Big data development requires mastering Spark fundamentals, Spark jobs, Spark RDDs, job deployment and resource allocation, Spark shuffle, Spark memory management, Spark broadcast variables, Spark SQL, Spark Streaming, Spark ML, and related topics.
Azkaban
Azkaban is a batch workflow job scheduler that runs a set of jobs and processes in a specific order within a workflow, and it can be used to handle task scheduling for big data pipelines. Big data development requires mastering Azkaban's configuration and syntax rules.
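The "specific order" comes from job dependencies: a flow is a directed acyclic graph, and each job may run only after the jobs it depends on have finished. The sketch below shows that core idea as a dependency-first ordering in plain Java (names are invented; it is not the Azkaban API, which instead reads dependencies from flow configuration files).

```java
import java.util.*;

// Toy sketch of dependency-ordered scheduling, the core idea behind an
// Azkaban flow (not the Azkaban API): given "job -> jobs it depends on",
// produce an execution order where every job runs after its dependencies.
public class FlowOrderSketch {

    public static List<String> executionOrder(Map<String, List<String>> dependsOn) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String job : dependsOn.keySet()) {
            visit(job, dependsOn, visited, order);
        }
        return order;
    }

    // depth-first: schedule all dependencies of a job before the job itself
    private static void visit(String job, Map<String, List<String>> dependsOn,
                              Set<String> visited, List<String> order) {
        if (!visited.add(job)) return;   // already scheduled
        for (String dep : dependsOn.getOrDefault(job, List.of())) {
            visit(dep, dependsOn, visited, order);
        }
        order.add(job);
    }
}
```

This assumes the dependency graph is acyclic, which Azkaban also requires of its flows; a production scheduler would additionally detect cycles and report them as configuration errors.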
Common tools
IntelliJ IDEA
LICEcap
Git
SourceTree
Navicat
MobaXterm
VMware