Java Big Data Learning Route

A practical analysis of a three-month big data development learning plan

http://blog.csdn.net/GitChat/article/details/78341484



Stage One (foundations)

1) Linux (start with "Bird Brother's Linux Private Kitchen") – 20 hours

Introduction to and installation of the Linux operating system.
Common Linux commands.
Installing common software on Linux.
Linux networking.
Firewalls.
Shell programming, etc.
Official website: https://www.centos.org/download/
Chinese community: http://www.linuxidc.com/Linux/2017-09/146919.htm

2) Advanced Java ("In-depth Understanding of the Java Virtual Machine", "Java High Concurrency in Practice") – 30 hours

Master multithreading.
Master the queues in the java.util.concurrent package.
Learn about JMS.
Master JVM technology.
Master reflection and dynamic proxies.
Official website: https://www.java.com/zh_CN/
Chinese community: http://www.java-cn.com/index.html
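The queues in java.util.concurrent are the backbone of most producer-consumer code. Here is a minimal, self-contained sketch using ArrayBlockingQueue (the class and method names below are illustrative, not from any of the recommended books):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerDemo {
    // Producer puts 1..10 plus a poison pill; consumer sums until the pill.
    static int produceAndConsume() {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(4);
        int[] sum = {0};

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= 10; i++) {
                    queue.put(i);          // blocks while the queue is full
                }
                queue.put(-1);             // poison pill: stop signal
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    int v = queue.take();  // blocks while the queue is empty
                    if (v == -1) break;
                    sum[0] += v;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println("sum = " + produceAndConsume()); // sum = 55
    }
}
```

The bounded queue gives you back-pressure for free: a fast producer simply blocks instead of exhausting memory.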

Recommended books:
"Write Your Own Java Virtual Machine"
"Core Java, Volume II: Advanced Features (10th Edition)"
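Dynamic proxies come up constantly in big data frameworks (RPC stubs, client wrappers). A small sketch with the JDK's java.lang.reflect.Proxy, using a made-up Greeter interface purely for illustration:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class ProxyDemo {
    interface Greeter {
        String greet(String name);
    }

    static class SimpleGreeter implements Greeter {
        public String greet(String name) { return "Hello, " + name; }
    }

    // Wrap any Greeter so every call is logged before being delegated.
    static Greeter logged(Greeter target) {
        InvocationHandler handler = (proxy, method, args) -> {
            System.out.println("calling " + method.getName());
            return method.invoke(target, args); // delegate to the real object
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                handler);
    }

    public static void main(String[] args) {
        Greeter g = logged(new SimpleGreeter());
        System.out.println(g.greet("Hadoop")); // calling greet / Hello, Hadoop
    }
}
```

The same pattern underlies how frameworks like Hadoop's RPC layer generate client-side stubs at runtime.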


3) Zookeeper (you can refer to this blog: http://www.cnblogs.com/wuxl360/p/5817471.html)

Introduction to Zookeeper distributed coordination service.
Installation and deployment of Zookeeper cluster.
Zookeeper data structures, commands.
How Zookeeper works and its leader-election mechanism.
Official website: http://zookeeper.apache.org/
Chinese community: http://www.aboutyun.com/forum-149-1.html
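In Zookeeper's standard leader-election recipe, each candidate creates an ephemeral sequential znode (e.g. /election/n_0000000003), and whoever holds the lowest sequence number is the leader. The toy sketch below models only that selection rule in plain Java; it is not the real Zookeeper client API, and the znode names are invented:

```java
import java.util.Comparator;
import java.util.List;

public class ElectionDemo {
    // Given the children of an election znode, the candidate whose
    // ephemeral sequential node has the smallest sequence number leads.
    static String electLeader(List<String> znodes) {
        return znodes.stream()
                .min(Comparator.comparingInt(ElectionDemo::sequenceOf))
                .orElseThrow();
    }

    // Parse the numeric suffix, e.g. "n_0000000003" -> 3.
    static int sequenceOf(String znode) {
        int idx = znode.lastIndexOf('_');
        return Integer.parseInt(znode.substring(idx + 1));
    }

    public static void main(String[] args) {
        List<String> candidates =
                List.of("n_0000000007", "n_0000000003", "n_0000000012");
        System.out.println("leader = " + electLeader(candidates)); // n_0000000003
    }
}
```

Ephemeral nodes make failover automatic: when the leader's session dies, its znode vanishes and the next-lowest sequence number takes over.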

Stage Two (core technologies)

4) Hadoop ("Hadoop: The Definitive Guide") – 80 hours

HDFS

Concepts and Features of HDFS.
Shell operations for HDFS.
How HDFS works.
Java application development for HDFS.
MapReduce

Run the WordCount sample program.
Understand the operation mechanism inside MapReduce.
MapReduce program running process analysis.
How the number of concurrent map tasks is determined.
Combiner component application in MapReduce.
Serialization framework and application in MapReduce.
Sorting in MapReduce.
Custom partitioning implementation in MapReduce.
The shuffle mechanism of MapReduce.
MapReduce utilizes data compression for optimization.
The relationship between MapReduce programs and YARN.
MapReduce parameter optimization.
Java application development for MapReduce.

Official website: http://hadoop.apache.org/
Chinese document: http://hadoop.apache.org/docs/r1.0.4/cn/
Chinese community: http://www.aboutyun.com/forum-143-1.html
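To tie the WordCount, partitioning, and shuffle topics together, here is the map → partition → reduce flow condensed into plain Java. This is a conceptual sketch, not the Hadoop API, though the partition formula matches Hadoop's default HashPartitioner:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MiniWordCount {
    // Same formula as Hadoop's default HashPartitioner:
    // mask off the sign bit, then mod by the number of reducers.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    static Map<String, Integer> wordCount(List<String> lines, int numReducers) {
        // "Map" phase: emit (word, 1) pairs, routed to a partition,
        // with the counting standing in for the combine/reduce step.
        List<Map<String, Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) partitions.add(new HashMap<>());
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (word.isEmpty()) continue;
                partitions.get(partition(word, numReducers))
                          .merge(word, 1, Integer::sum);
            }
        }
        // Merge all partitions into one final view (each reducer's output
        // would normally stay in its own part-r-NNNNN file).
        Map<String, Integer> result = new HashMap<>();
        partitions.forEach(result::putAll);
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("hello hadoop", "hello spark");
        System.out.println(wordCount(lines, 2));
    }
}
```

Because the partition function depends only on the key, all occurrences of the same word land on the same reducer — the core guarantee the shuffle provides.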

5) Hive ("Hive Development Guide") – 20 hours

Hive basic concepts

Hive application scenarios.
The relationship between Hive and Hadoop.
Hive vs. traditional databases.
Hive's data storage mechanism.
Hive Basic Operations

DDL operations in Hive.
How to implement efficient JOIN query in Hive.
Hive's built-in function application.
Advanced usage of the Hive shell.
Hive common parameter configuration.
Tips for using Hive custom functions and Transform.
Hive UDF/UDAF development example.
Hive execution process analysis and optimization strategy

Official website: https://hive.apache.org/
Chinese introductory document: http://www.aboutyun.com/thread-11873-1-1.html
Chinese community: http://www.aboutyun.com/thread-7598-1-1.html
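The heart of a classic Hive UDF is a Java evaluate method (in the real API the class extends org.apache.hadoop.hive.ql.exec.UDF and works with Hadoop Text objects). The sketch below keeps the same shape with plain String so it stays self-contained; the function itself (string reversal) is just an invented example:

```java
public class ReverseUdfDemo {
    // In a real Hive UDF this class would extend
    // org.apache.hadoop.hive.ql.exec.UDF; plain String stands in for
    // org.apache.hadoop.io.Text here to avoid the Hive dependency.
    public String evaluate(String input) {
        if (input == null) return null; // UDFs must tolerate SQL NULLs
        return new StringBuilder(input).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(new ReverseUdfDemo().evaluate("hive")); // evih
    }
}
```

After packaging the class into a jar, you would register it in the Hive shell with ADD JAR and CREATE TEMPORARY FUNCTION before calling it in a query.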

6) HBase ("HBase: The Definitive Guide") – 20 hours

Introduction to HBase.
HBase installation.
The HBase data model.
HBase shell commands.
HBase application development.
How HBase works.
Official website: http://hbase.apache.org/
Chinese document: http://abloz.com/hbase/book.html
Chinese community: http://www.aboutyun.com/forum-142-1.html
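A useful mental model for HBase's data model is a sorted, sparse map of maps: row key → ("family:qualifier" → value), with cells further versioned by timestamp. The toy class below models just that structure in plain Java (timestamps/versions omitted); it is an illustration, not HBase client code:

```java
import java.util.Map;
import java.util.TreeMap;

public class HBaseModelDemo {
    // Rows are kept sorted by row key, as HBase stores them; each row
    // maps "family:qualifier" column names to values.
    private final TreeMap<String, Map<String, String>> table = new TreeMap<>();

    void put(String rowKey, String column, String value) {
        table.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    String get(String rowKey, String column) {
        Map<String, String> row = table.get(rowKey);
        return row == null ? null : row.get(column);
    }

    String firstRowKey() {
        // Sorted row keys are what make HBase range scans efficient.
        return table.firstKey();
    }

    public static void main(String[] args) {
        HBaseModelDemo t = new HBaseModelDemo();
        t.put("user#002", "info:name", "Li");
        t.put("user#001", "info:name", "Zhang");
        System.out.println(t.firstRowKey());                // user#001
        System.out.println(t.get("user#002", "info:name")); // Li
    }
}
```

Because rows are stored in key order, row-key design (e.g. the "user#NNN" prefix pattern above) largely determines scan performance in real HBase tables.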

7) Scala ("Scala for the Impatient") – 20 hours

Overview of Scala.
Installing the Scala compiler.
Scala basics.
Arrays, maps, tuples, collections.
Classes, objects, inheritance, traits.
Pattern matching and case classes.
Learn about concurrent programming with Scala Actors.
Understand Akka.
Understand Scala higher-order functions.
Understand Scala implicit conversions.
Official website: http://www.scala-lang.org/
Beginner Chinese Tutorial: http://www.runoob.com/scala/scala-tutorial.html

8) Spark ("Spark: The Definitive Guide") – 60 hours


Spark core

Spark overview.
Spark cluster installation.
Execute the first Spark case program (seeking PI).
RDD


RDD overview.
Create RDDs.
RDD programming API (Transformation and Action Operations).
RDD Dependencies
RDD Cache
DAG (Directed Acyclic Graph)
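A helpful analogy for the RDD API: Transformations (map, filter, flatMap) are lazy and only describe the DAG, while Actions (count, collect) trigger actual execution — much like intermediate vs. terminal operations on java.util.stream. The sketch below is that analogy in plain Java, not Spark code:

```java
import java.util.Arrays;
import java.util.List;

public class RddAnalogyDemo {
    // Lazy stream stages stand in for RDD Transformations; the terminal
    // count() stands in for an Action that triggers execution.
    static long countWord(List<String> lines, String target) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .filter(word -> word.equals(target))
                .count();
    }

    public static void main(String[] args) {
        List<String> lines = List.of("spark core", "spark sql");
        System.out.println(countWord(lines, "spark")); // 2
    }
}
```

The analogy breaks down at caching and fault tolerance — an RDD can be recomputed from its lineage on node failure, which a Stream cannot — but it captures the lazy-pipeline programming model well.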
Spark SQL and DataFrame/DataSet


Spark SQL overview.
DataFrames.
DataFrame common operations.
Write a Spark SQL query program.
Spark Streaming


Spark Streaming overview.
Understand DStreams.
DStream related operations (Transformations and Output Operations).
Structured Streaming

Others (MLlib and GraphX)

In everyday work, machine learning is generally not needed unless you are doing data mining, so you can defer these topics until you actually need them.

Official website: http://spark.apache.org
Chinese document (but the version is a bit old): https://www.gitbook.com/book/aiyanbo/spark-programming-guide-zh-cn/details
Chinese community: http://www.aboutyun.com/forum-146-1.html

9) Python (Liao Xuefeng's tutorial is recommended) – 30 hours

10) Build a cluster with virtual machines, install all the tools, and develop a small demo – 30 hours

You can use VMware to create four virtual machines and install the software above to build a small cluster. (Personally tested: an i7, 64-bit machine with 16 GB of RAM can run it all. Attached below is the operations document from the virtual-machine cluster I built while studying.)










