Course Outline
Stage 1: Introduction to Big Data and Spark
Lesson 1: Big Data Overview
Knowledge Point 1: The history of big data technology
Knowledge Point 2: Big data applications and future outlook
Knowledge Point 3: Introduction to the Hadoop ecosystem
Knowledge Point 4: Evolution and development of the Hadoop framework
Knowledge Point 5: Principles of HDFS, the big data storage system
Knowledge Point 6: Principles of MapReduce
Knowledge Point 7: Principles of YARN, the distributed resource manager
Hands-on project: Developing a MapReduce job on YARN
Lesson 2: Overview of the Spark technology stack and its development
Knowledge Point 1: Spark past and present
Knowledge Point 2: Spark 1.x technology stack overview
Knowledge Point 3: Spark 2.4 technology stack overview
Knowledge Point 4: Spark 3.0 and future outlook
Knowledge Point 5: Spark applications at large companies
Hands-on project: Running a Spark program
Lesson 3: Introduction to Spark APIs and application development
Knowledge Point 1: Spark core concepts explained
Knowledge Point 2: RDD partitions and dependencies
Knowledge Point 3: RDD transformation APIs explained
Knowledge Point 4: RDD action APIs explained
Hands-on project: Analyzing log data with Spark RDDs
Stage 2: Spark internals and application tuning
Lesson 4: Spark principles and run modes
Knowledge Point 1: Spark run modes
Knowledge Point 2: Spark execution flow explained
Knowledge Point 3: Spark RDD internals explained
Knowledge Point 4: Spark broadcast variables and accumulators explained
Hands-on project: Using broadcast variables to encode user information for a recommendation system
Lesson 5: Spark cluster applications and tuning analysis
Knowledge Point 1: Spark Web UI explained
Knowledge Point 2: Spark application monitoring and analysis
Knowledge Point 3: Spark History Server principles
Knowledge Point 4: Spark metrics monitoring
Hands-on project: Building and deploying the Spark History Server
Hands-on project: Troubleshooting and optimization, starting from monitoring and logs
Lesson 6: Spark Core explained
Knowledge Point 1: The three Spark shuffle modes in detail
Knowledge Point 2: Spark memory management analysis
Knowledge Point 3: Spark application resource management
Knowledge Point 4: Spark RDD storage management
Hands-on project: Refactoring and optimizing an existing Spark application
Lesson 7: Spark performance tuning
Knowledge Point 1: Spark development tuning
Knowledge Point 2: Spark resource tuning
Knowledge Point 3: Spark data skew tuning
Knowledge Point 4: Spark memory management tuning
Hands-on project: Spark shuffle tuning code case study
Stage 3: Spark ad hoc queries and stream computing
Lesson 8: Spark SQL explained
Knowledge Point 1: History of Spark SQL development
Knowledge Point 2: Spark SQL 1.x and 2.x
Knowledge Point 3: How Spark SQL works
Knowledge Point 4: Spark SQL logical plan principles explained
Knowledge Point 5: Spark SQL physical plan principles explained
Knowledge Point 6: Dataset and DataFrame explained
Knowledge Point 7: Developing and registering custom UDFs in Spark SQL
Knowledge Point 8: Spark Thrift Server explained
Hands-on project: King of Glory hero analysis based on Spark SQL 2.4.0
Lesson 9: Introduction to stream computing and Spark Streaming
Knowledge Point 1: Comprehensive comparison of Spark Streaming, Storm, Flink, and Structured Streaming
Knowledge Point 2: Message queues in practice: Kafka and RocketMQ
Knowledge Point 3: How Spark Streaming works
Knowledge Point 4: DStream, the high-level abstraction of Spark Streaming
Knowledge Point 5: Introduction to how Structured Streaming works
Hands-on project: Reading log data in real time and computing statistics
Lesson 10: Real-time computing platforms (design and practice)
Knowledge Point 1: Introduction to real-time big data architecture (Kudu, Druid, Couchbase)
Knowledge Point 2: Real-time computing platform architecture design and technology selection
Knowledge Point 3: Real-time computing in practice: analysis of difficulties, performance bottlenecks, and high QPS
Hands-on project: Real-time log statistics platform
Stage 4: Spark graph computing and advanced machine learning applications
Lesson 11: Spark graph computing and MLlib explained
Knowledge Point 1: Introduction to property graphs
Knowledge Point 2: Introduction to edges, vertices, and triplets, and how to create them
Knowledge Point 3: Property graph operations
Knowledge Point 4: Introduction to graph algorithms
Knowledge Point 5: Introduction to Spark MLlib
Hands-on project: Graph tuning
Lesson 12: Recommendation systems in practice
Knowledge Point 1: Recommendation system scenarios and why recommendation systems are needed
Knowledge Point 2: Recommendation system workflow
Knowledge Point 3: Collaborative filtering recommendation algorithms
Knowledge Point 4: Introduction to YouTube recommendations
Hands-on project: Collaborative filtering recommendations with Spark MLlib
Reproduced from: https://www.jianshu.com/p/a54d32cf2d90