Three-Month Big Data R&D Learning Plan: A Practical Analysis
http://blog.csdn.net/GitChat/article/details/78341484
Stage 1 (Fundamentals)
1) Linux (simply follow "Brother Bird's Linux Private Kitchen") - 20 hours
Introduction to and installation of the Linux operating system.
Common Linux commands.
Installing common software on Linux.
Linux networking.
Firewalls.
Shell programming, etc.
Official website: https://www.centos.org/download/
Chinese community: http://www.linuxidc.com/Linux/2017-09/146919.htm
2) Advanced Java ("In-Depth Understanding of the Java Virtual Machine", "Java High-Concurrency Programming in Practice") - 30 hours
Master multithreading.
Master the queues in the concurrent package (java.util.concurrent).
Learn about JMS.
Master JVM internals.
Master reflection and dynamic proxies.
Official website: https://www.java.com/zh_CN/
Chinese community: http://www.java-cn.com/index.html
Recommended books:
"Write Your Own Java Virtual Machine"
"Core Java, Volume II: Advanced Features (10th Edition)"
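"Mastering the queues in the concurrent package" mostly means understanding blocking queues such as ArrayBlockingQueue, whose put() waits while the queue is full and take() waits while it is empty. As a minimal sketch of the producer-consumer pattern they exist for, here it is in Python (the language used for all examples added to this plan), with queue.Queue(maxsize=...) standing in for a bounded blocking queue. The function name produce_consume and the None sentinel used to stop the consumer are my own illustrative choices, not part of any Java API:

```python
import queue
import threading

SENTINEL = None  # illustrative "no more items" marker used to stop the consumer


def produce_consume(items, capacity=2):
    """Pass items from a producer thread to a consumer thread through a bounded queue."""
    q = queue.Queue(maxsize=capacity)  # put() blocks when full, like ArrayBlockingQueue.put
    consumed = []

    def producer():
        for item in items:
            q.put(item)      # waits while the queue already holds `capacity` items
        q.put(SENTINEL)      # signal the consumer to finish

    def consumer():
        while True:
            item = q.get()   # waits while the queue is empty, like BlockingQueue.take
            if item is SENTINEL:
                break
            consumed.append(item * 10)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed


print(produce_consume([1, 2, 3, 4]))  # [10, 20, 30, 40]
```

With one producer and one consumer the FIFO order is preserved; the small capacity forces the producer to block regularly, which is exactly the back-pressure behavior worth observing when you study the Java versions.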
3) ZooKeeper (you can use this blog as a learning reference: http://www.cnblogs.com/wuxl360/p/5817471.html)
Introduction to the ZooKeeper distributed coordination service.
Installing and deploying a ZooKeeper cluster.
ZooKeeper data structures and commands.
How ZooKeeper works and its leader election mechanism.
Official website: http://zookeeper.apache.org/
Chinese community: http://www.aboutyun.com/forum-149-1.html
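A simplified model of the comparison rule behind ZooKeeper's fast leader election: votes are compared by epoch first, then by zxid (the most recent transaction a server has seen), then by server id (myid), and a leader can only be established when a majority of the ensemble participates. This sketch skips the rounds of vote exchange and applies only the final comparison; Server and elect_leader are hypothetical names, not ZooKeeper classes:

```python
from collections import namedtuple

# Simplified election state of one server (field names are illustrative).
Server = namedtuple("Server", ["sid", "epoch", "zxid"])


def elect_leader(alive, ensemble_size):
    """Return the sid of the elected leader, or None if no quorum is alive."""
    if len(alive) <= ensemble_size // 2:  # need a strict majority of the ensemble
        return None
    # Prefer higher epoch, then the most up-to-date data (zxid), then the larger sid.
    best = max(alive, key=lambda s: (s.epoch, s.zxid, s.sid))
    return best.sid


servers = [Server(1, 1, 100), Server(2, 1, 105), Server(3, 1, 105)]
print(elect_leader(servers, ensemble_size=5))  # 3
```

Servers 2 and 3 tie on epoch and zxid, so the larger myid wins; with only two of five servers alive there is no quorum and nobody is elected.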
Stage 2 (Tackling the Core Frameworks)
4) Hadoop ("Hadoop: The Definitive Guide") - 80 hours
HDFS
HDFS concepts and features.
HDFS shell operations.
How HDFS works.
HDFS application development in Java.
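One detail of "how HDFS works" that is easy to get wrong is how a file maps onto blocks. Assuming the Hadoop 2.x/3.x defaults (dfs.blocksize of 128 MB, dfs.replication of 3), a file is cut into fixed-size blocks, the last block only holds the leftover bytes, and every block is stored on several DataNodes. The helper names below are hypothetical; this is arithmetic only, not the HDFS client API:

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed default dfs.blocksize (Hadoop 2.x and later)
REPLICATION = 3        # assumed default dfs.replication


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of `file_size` bytes occupies."""
    full, rest = divmod(file_size, block_size)
    blocks = [block_size] * full
    if rest:
        blocks.append(rest)  # the last block is only as large as the remaining data
    return blocks


def raw_storage(file_size, replication=REPLICATION):
    """Total bytes stored across the cluster once every block is replicated."""
    return file_size * replication


print([b // MB for b in split_into_blocks(300 * MB)])  # [128, 128, 44]
print(raw_storage(300 * MB) // MB)                     # 900
```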
MapReduce
Running the WordCount sample program.
Understanding the internal operating mechanism of MapReduce.
Analyzing how a MapReduce program runs.
How the number of concurrent MapTasks is determined.
Using the Combiner component in MapReduce.
The serialization framework and its use in MapReduce.
Sorting in MapReduce.
Implementing custom partitioning in MapReduce.
The MapReduce shuffle mechanism.
Optimizing MapReduce with data compression.
The relationship between MapReduce programs and YARN.
Tuning MapReduce parameters.
MapReduce application development in Java.
Official website: http://hadoop.apache.org/
Chinese document: http://hadoop.apache.org/docs/r1.0.4/cn/
Chinese community: http://www.aboutyun.com/forum-143-1.html
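To tie several of the MapReduce topics together (WordCount, the Combiner, partitioning, shuffle and sort), here is a single-process simulation of the data flow. It is a teaching sketch, not Hadoop code: each input line plays the role of one split, and zlib.crc32 stands in for the key.hashCode() that Hadoop's HashPartitioner uses, so that partition assignment is deterministic here. All function names are hypothetical:

```python
import zlib
from collections import Counter, defaultdict


def map_phase(line):
    # Mapper: emit (word, 1) for every word in the line
    return [(word, 1) for word in line.split()]


def combine(pairs):
    # Combiner: pre-aggregate one mapper's output locally to shrink shuffle traffic
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())


def partition(word, num_reducers):
    # Stand-in for HashPartitioner: crc32 replaces Java's hashCode()
    return zlib.crc32(word.encode("utf-8")) % num_reducers


def word_count(lines, num_reducers=2):
    # Shuffle: route each combined pair to its reducer and group values by key
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:  # each line acts as one input split / map task
        for word, n in combine(map_phase(line)):
            buckets[partition(word, num_reducers)][word].append(n)
    result = {}
    for bucket in buckets:
        for word in sorted(bucket):  # each reducer receives its keys in sorted order
            result[word] = sum(bucket[word])  # Reducer: add up the partial counts
    return result


print(sorted(word_count(["hello world", "hello hadoop hello"]).items()))
# [('hadoop', 1), ('hello', 3), ('world', 1)]
```

Note how the combiner turns the second line's two ("hello", 1) pairs into a single ("hello", 2) before the shuffle; that reduction in transferred records is the whole point of the Combiner.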
5) Hive ("Hive Development Guide") - 20 hours
Hive basic concepts
Hive application scenarios.
The relationship between Hive and Hadoop.
Hive vs. traditional databases.
Hive's data storage mechanism.
Hive Basic Operations
DDL operations in Hive.
How to write efficient JOIN queries in Hive.
Hive's built-in function application.
Advanced usage of the Hive shell.
Hive common parameter configuration.
Tips for using Hive custom functions and Transform.
Hive UDF/UDAF development example.
Hive execution process analysis and optimization strategy
Official website: https://hive.apache.org/
Chinese introductory document: http://www.aboutyun.com/thread-11873-1-1.html
Chinese community: http://www.aboutyun.com/thread-7598-1-1.html
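Hive's TRANSFORM clause streams rows through an external script: each row arrives on the script's stdin as one tab-separated line, and every line the script writes to stdout is read back as an output row. The sketch below is a Python 3 take on the MovieLens "weekday mapper" idea from the Hive Getting Started guide, assuming a table with columns userid, movieid, rating and unixtime. It uses UTC so the result is reproducible, and the function names are my own. A query would call it roughly as ADD FILE weekday_mapper.py; followed by SELECT TRANSFORM (userid, movieid, rating, unixtime) USING 'python weekday_mapper.py' AS (userid, movieid, rating, weekday) FROM u_data;

```python
import sys
from datetime import datetime, timezone


def map_row(line):
    """'userid\tmovieid\trating\tunixtime' -> same row with ISO weekday (1=Mon .. 7=Sun)."""
    userid, movieid, rating, unixtime = line.rstrip("\n").split("\t")
    weekday = datetime.fromtimestamp(float(unixtime), tz=timezone.utc).isoweekday()
    return "\t".join([userid, movieid, rating, str(weekday)])


def run(stdin=None, stdout=None):
    """Read tab-separated rows from Hive on stdin and write transformed rows to stdout."""
    stdin = stdin if stdin is not None else sys.stdin
    stdout = stdout if stdout is not None else sys.stdout
    for line in stdin:
        if line.strip():  # skip blank lines
            stdout.write(map_row(line) + "\n")


print(map_row("196\t242\t3\t0"))  # 1970-01-01 was a Thursday, so the last field is 4
```

Saved as a standalone script, the file would end with a call to run() instead of the demo print, so Hive's rows are processed as they stream in.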
6) HBase ("HBase: The Definitive Guide") - 20 hours
Introduction to HBase.
Installing HBase.
The HBase data model.
HBase commands.
HBase development.
How HBase works.
Official website: http://hbase.apache.org/
Chinese document: http://abloz.com/hbase/book.html
Chinese community: http://www.aboutyun.com/forum-142-1.html
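The HBase data model is usually described as a sparse, sorted map: row key, then "family:qualifier", then timestamp, then value. A toy in-memory model makes the pieces concrete. ToyHTable and its methods are hypothetical names, not the HBase client API, and max_versions is just a parameter here (HBase's own default number of versions has changed between releases):

```python
from collections import defaultdict


class ToyHTable:
    """Toy model of an HBase table: row -> 'family:qualifier' -> {timestamp: value}."""

    def __init__(self, families, max_versions=1):
        self.families = set(families)  # column families are fixed when the table is created
        self.max_versions = max_versions
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        cell = self.rows[row][column]
        cell[ts] = value
        for old in sorted(cell)[:-self.max_versions]:  # drop versions beyond the limit
            del cell[old]

    def get(self, row, column):
        cell = self.rows.get(row, {}).get(column)
        return cell[max(cell)] if cell else None  # newest timestamp wins

    def scan(self, start_row, stop_row):
        # Rows are sorted by key, so a scan is a key range [start_row, stop_row)
        return [r for r in sorted(self.rows) if start_row <= r < stop_row]


t = ToyHTable(["info"], max_versions=2)
t.put("user1", "info:name", "Tom", ts=1)
t.put("user1", "info:name", "Tommy", ts=2)
print(t.get("user1", "info:name"))  # Tommy
```

Columns (qualifiers) can be added freely at write time, while families cannot; that asymmetry is worth keeping in mind when designing a schema.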
7) Scala ("Scala for the Impatient") - 20 hours
Scala overview.
Installing the Scala compiler.
Scala basics.
Arrays, maps, tuples, collections.
Classes, objects, inheritance, traits.
Pattern matching and case classes.
Learn about concurrent programming with Scala Actors.
Understand Akka.
Understand Scala higher-order functions.
Understand Scala implicit conversions.
Official website: http://www.scala-lang.org/
Beginner Chinese Tutorial: http://www.runoob.com/scala/scala-tutorial.html
8) Spark ("Spark Authoritative Guide") - 60 hours
Spark core
Spark overview.
Spark cluster installation.
Run the first Spark example program (estimating Pi).
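The Pi example estimates Pi by Monte Carlo sampling: throw random points into the square [-1, 1] x [-1, 1] and count how many land inside the unit circle; that fraction approaches Pi/4. Below is a single-machine sketch of the idea (estimate_pi is a hypothetical name). Spark's own example distributes the same sampling with parallelize and sums the hits with map and reduce:

```python
import random


def estimate_pi(num_samples, seed=42):
    """Monte Carlo estimate of Pi from points sampled uniformly in [-1, 1]^2."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    inside = 0
    for _ in range(num_samples):
        x = rng.random() * 2 - 1
        y = rng.random() * 2 - 1
        if x * x + y * y <= 1:  # point lies inside the unit circle
            inside += 1
    return 4.0 * inside / num_samples


print(estimate_pi(100000))  # close to 3.14; accuracy improves with more samples
```

Each sample is independent, which is why this job parallelizes so cleanly: every partition can count its own hits and only the partial counts need to be combined.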
RDD
RDD overview.
Create RDDs.
RDD programming API (Transformation and Action Operations).
RDD Dependencies
RDD Cache
DAG (Directed Acyclic Graph)
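The key RDD ideas in this list (transformations versus actions, dependencies and caching) can be seen in a toy model: transformations such as map and filter only record a step in the lineage, nothing runs until an action such as collect or count, and an uncached RDD recomputes its whole lineage on every action. ToyRDD is a hypothetical class for illustration only, not the Spark API, and it ignores partitions and the DAG scheduler:

```python
class ToyRDD:
    """Toy model of RDD laziness, lineage and caching (not the real Spark API)."""

    def __init__(self, source=None, parent=None, op=None):
        self.source = source  # only set on the root, like sc.parallelize(data)
        self.parent = parent  # lineage: the RDD this one depends on
        self.op = op          # turns the parent's elements into this RDD's elements
        self.should_cache = False
        self.cached = None

    def map(self, f):
        # Transformation: records the step, computes nothing yet
        return ToyRDD(parent=self, op=lambda items: [f(x) for x in items])

    def filter(self, pred):
        return ToyRDD(parent=self, op=lambda items: [x for x in items if pred(x)])

    def cache(self):
        self.should_cache = True  # keep the result after the first action
        return self

    def _compute(self):
        if self.cached is not None:
            return self.cached
        if self.parent is None:
            data = list(self.source)
        else:
            data = self.op(self.parent._compute())  # walk the lineage back to the source
        if self.should_cache:
            self.cached = data
        return data

    def collect(self):
        # Action: triggers evaluation of the whole lineage
        return list(self._compute())

    def count(self):
        return len(self._compute())


rdd = ToyRDD([1, 2, 3, 4]).map(lambda x: x * x).filter(lambda v: v % 2 == 0)
print(rdd.collect())  # [4, 16]
```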
Spark SQL and DataFrame/DataSet
Spark SQL overview.
DataFrames.
DataFrame common operations.
Write a Spark SQL query program.
Spark Streaming
Spark Streaming overview.
Understand DStreams.
DStream related operations (Transformations and Output Operations).
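A DStream is a sequence of RDDs, one per batch interval, and window operations such as reduceByKeyAndWindow combine the most recent batches every time the window slides. The plain-Python sketch below mimics a windowed word count, measuring the window and slide in batches rather than in time. windowed_word_counts is a hypothetical helper, not a Spark Streaming function:

```python
from collections import Counter, deque


def windowed_word_counts(batches, window_batches=3, slide_batches=1):
    """Word counts over the last `window_batches` micro-batches, emitted every `slide_batches`."""
    window = deque(maxlen=window_batches)  # the oldest batch falls out automatically
    results = []
    for i, batch in enumerate(batches, start=1):
        # Per-batch word count, like reduceByKey applied to one batch's RDD
        window.append(Counter(word for line in batch for word in line.split()))
        if i % slide_batches == 0:  # the window slides: emit an aggregate
            total = Counter()
            for counts in window:
                total.update(counts)
            results.append(dict(total))
    return results


batches = [["a b"], ["a"], ["b b"], ["c"]]
print(windowed_word_counts(batches, window_batches=2))
```

With a two-batch window, the third result only covers batches 2 and 3, so the "a" from batch 1 has already dropped out of the count.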
Structured Streaming
Others (MLlib and GraphX)
In day-to-day work, machine learning is generally not needed unless you are doing data mining, so you can wait until you actually need these libraries before studying them in depth.
Official website: http://spark.apache.org
Chinese document (but the version is a bit old): https://www.gitbook.com/book/aiyanbo/spark-programming-guide-zh-cn/details
Chinese community: http://www.aboutyun.com/forum-146-1.html
9) Python (Liao Xuefeng's blog is recommended) - 30 hours
10) Build a cluster with virtual machines, install all the tools, and develop a small demo - 30 hours
You can use VMware to create 4 virtual machines and install the software above on them to build a small cluster. (I tested this myself on a 64-bit i7 machine with 16 GB of RAM, and it runs without problems. The following is the document I used to build a virtual machine cluster when I was studying.)