Comprehensive and in-depth analysis of Spark2--knowledge points, source code, tuning, JVM, graph computing, projects

Course learning address: http://www.xuetuwuyou.com/course/220
The course comes from xuetuwuyou.com: http://www.xuetuwuyou.com

The course has 14 chapters and 316 lessons. It works through Spark's main technical areas and closes with two hands-on projects, a user interactive behavior analysis system and a DMP user portrait system, that apply Spark end to end. As the course puts it, with this one set in hand you can take on the world!


Chapter 1: Scala
Task 1: Comparison of Java and Scala
Task 2: Why Learn Scala
Task 3: Scala Compiler Installation
Task 4: Writing the First Scala Program
Task 5: Scala Tools Installation
Task 6: Programming with IDEA
Task 7: idea JAR Package
Task 8: Variable Declaration
Task 9: Scala Data Types
Task 10: If Expressions
Task 11: Code Blocks
Task 12: Loop-while
Task 13: Loop-for
Task 14: Scala Operators
Task 15: Method Definitions
Task 16: Function Definitions
Task 17: Decorate the design
Task 18: Explaining Functional Programming with Java
Task 19: Knowledge review
Task 20: Fixed-Length and Variable-Length Arrays
Task 21: Array conversion and traversal
Task 22: Commonly used algorithms for arrays
Task 23: Map collections
Task 24: Tuple operations
Task 25: List collection operations
Task 26: Scala implements word counts
Task 27: Set Collection Operation
Task 28: Lazy Feature
Task 29: Scala Course Description
Task 30: Class Definition
Task 31: View a given class file
Task 32: Primary and Auxiliary Constructors
Task 33: Morning Review
Task 34: Objects
Task 35: The apply method
Task 36: Trait
Task 37: Extending the application
Task 38: Inheritance
Task 39: Abstract classes
Task 40: Pattern matching
Task 41: Scala string printing
Task 42: Case Classes
Task 43: Option (Some, None)
Task 44: Partial Functions
Task 45: Closures
Task 46: Currying
Task 47: Implicit Parameters
Task 48: Implicit Conversions
Task 49: Implicit Conversion Timing: 2 Case Demonstrations
Task 50: Implicit Conversion Case 1
Task 51: Implicit Conversion Case 2
Task 52: Upper and Lower Bounds
Task 53: Upper Bounds
Task 54: Lower Bound Cases
Task 55: View Bounds
Task 56: Covariance
Task 57: Contravariance
Task 58: Knowledge Summary
Task 59: Socket Job
Task 60: Job Requirement Analysis
Task 61: Job Code Implementation
Task 62: Description of Actor Knowledge
Task 63: Actor Basic Concept Explanation
Task 64: Actor Case Demonstration
Task 65: Case 2 Requirements Analysis
Task 66: Case Code Demonstration (Top)
Task 67: Case Code Demonstration (Part 2)
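Task 26's word count can be sketched with plain Scala collections; the input lines below are illustrative, not from the course materials:

```scala
object WordCount {
  // Split lines into words, group identical words, count each group.
  def countWords(lines: List[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))
      .groupBy(identity)
      .map { case (w, ws) => (w, ws.size) }

  def main(args: Array[String]): Unit =
    println(countWords(List("spark scala spark", "scala kafka")))
}
```

The same pipeline later reappears on RDDs, where `flatMap` and the grouping run distributed instead of in memory.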

Chapter 2: SparkCore
Task 68: How to Learn Open Source Technology
Task 69: What is Spark
Task 70: Four Features of Spark
Task 71: Spark Quick Start (Part 1)
Task 72: Spark Quick Start (Part 2)
task 73: What is RDD
task 74: Demonstrate what is RDD
task 75: Spark task running process
Task 76: Hadoop Cluster Setup
Task 77: Build Spark Cluster
Task 78: Build a Spark HA Cluster
Task 84: How to create an RDD
Task 85: Instructions on Spark scripts
Task 86: Transformation and action principles
Task 87: Broadcast variables
Task 88: Accumulator variables
Task 89: Demonstrate the use of shared variables
Task 90: Persistence
Task 91: Checkpoint
Task 92: Supplementary notes on persistence
Task 93: Standalone operation mode
Task 94: Spark-on-YARN
Task 95: Spark-on-YARN principle description
Task 96: HistoryServer service configuration
Task 97: map-flatMap-filter
Task 98: sortByKey-reduceByKey
Task 99: join-union-cogroup
Task 100: intersection-distinct-cartesian
Task 101: mapPartitions-repartition-coalesce
Task 102: Complement the difference between coalesce and repartition
Task 103: aggregateByKey-mapPartitionsWithIndex
Task 104: Description of the Action operator
Task 105: Description of the collect operator
Task 106: Spark secondary sorting
Task 107: Narrow and wide dependencies
Task 108: Example analysis of narrow and wide dependencies
Task 109: Glossary
Task 110: Stage division algorithm
Task 111: Scheduling of Spark tasks
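As a preview of Tasks 86 and 97-100, the per-key merge that `reduceByKey` performs can be imitated with plain Scala collections (Scala 2.13+); the RDD call shown in the comment uses the standard Spark API and assumes a SparkContext named `sc`:

```scala
object ReduceByKeySketch {
  // Plain-collections analogue of rdd.reduceByKey(_ + _): group by key,
  // then reduce each key's values with the given function.
  def reduceByKey(pairs: List[(String, Int)]): Map[String, Int] =
    pairs.groupMapReduce(_._1)(_._2)(_ + _)

  def main(args: Array[String]): Unit = {
    // On an RDD the equivalent would be:
    //   sc.parallelize(pairs).reduceByKey(_ + _).collect()
    // where reduceByKey is a lazy transformation and collect is the action
    // that actually triggers the job (Task 86).
    println(reduceByKey(List(("a", 1), ("b", 1), ("a", 1))))
  }
}
```

The collections version is only an analogy for the semantics; on an RDD the merge also happens map-side before the shuffle.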

Chapter 3: Spark tuning
Task 112: Avoid creating duplicate RDD
Task 113: Reuse the same RDD as much as possible
Task 114: Persist RDDs that are used multiple times
Task 115: Try to avoid using shuffle operators
Task 116: Use map-side pre-aggregation shuffle operations
Task 117: Use high-performance operators
Task 118: Broadcasting Large Variables
Task 119: Optimizing Serialization Performance Using Kryo
Task 120: Optimizing Data Structures
Task 121: Data Localization
Task 122: Principles of Data Skew and How to Locate Data Skew
Task 123: Preprocessing Data Using Hive ETL
Task 124: Filter a few keys that cause skew
Task 125: Improve the parallelism of shuffle operations
Task 126: Two-stage aggregation (local aggregation + global aggregation)
Task 127: Convert reduce join to map join
Task 128: Sampling skew keys and split join operation
Task 129: Join using random prefix and expanding RDD
Task 130: Comprehensive application of various solutions
Task 131: Various shuffle versions
Task 132: Shuffle tuning
Task 133: Spark resource tuning
Task 134: Spark 1.5 Memory Model
Task 135: Spark 2 Memory Model
Task 136: Whole-Stage Code Generation
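Two-stage aggregation (Task 126) salts the skewed key, aggregates locally, then strips the salt and aggregates globally. A plain-collections sketch of the idea (Scala 2.13+, illustrative data, not the course's code):

```scala
import scala.util.Random

object SaltedAggregation {
  // Stage 1 (local): prepend a random prefix so one hot key is spread over
  // several groups, then aggregate per salted key.
  // Stage 2 (global): strip the prefix and aggregate the partial sums.
  def saltedSum(data: List[(String, Int)], buckets: Int = 3): Map[String, Int] = {
    val salted  = data.map { case (k, v) => (s"${Random.nextInt(buckets)}_$k", v) }
    val partial = salted.groupMapReduce(_._1)(_._2)(_ + _)
    partial.toList
      .map { case (k, v) => (k.split("_", 2)(1), v) }
      .groupMapReduce(_._1)(_._2)(_ + _)
  }

  def main(args: Array[String]): Unit = {
    // One skewed key "hot" dominates the data set.
    val data = List.fill(6)(("hot", 1)) ++ List(("cold", 1))
    // The final sums are the same whatever salts were drawn.
    println(saltedSum(data))
  }
}
```

On an RDD the two `groupMapReduce` calls would be two `reduceByKey` passes, with a shuffle between them; the salt is what keeps the first shuffle balanced.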

Chapter 4: JVM Tuning
Task 137: JVM Architecture
Task 138: How the Three Regions Work Together
Task 139: Heap Structure
Task 140: jdk 8 Memory Model
Task 141: Heap Memory Overflow Case Demo
Task 142: Brief Introduction to MA Tool
Task 143: GC Log Format Description
Task 144: Heap Memory Configuration Demo
Task 145: Stack Parameter Configuration
Task 146: Introduction to Garbage Collection Algorithm
Task 147: stop-the-world
Task 148: Garbage Collection Algorithms
Task 149: Introduction to the Garbage Collector
Task 150: Demonstration of Common Collector Configurations
Task 151: CMS Garbage Collector
Task 152: Hadoop JVM Tuning Demonstration
Task 153: Introduction to the Garbage Collector
Task 154: Introduction to Performance Monitoring Tools
Task 155: Large objects directly enter the old age
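The heap and GC settings this chapter demonstrates are passed to the JVM as command-line flags. An illustrative launch with placeholder values (not recommendations, and `app.jar` is a stand-in for your application):

```shell
# -Xms / -Xmx              initial / maximum heap size (Task 144)
# -Xmn                     young generation size
# -XX:+UseConcMarkSweepGC  CMS collector (Task 151)
# -XX:+PrintGCDetails      verbose GC logging (Task 143)
java -Xms4g -Xmx4g -Xmn1g -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -jar app.jar
```

Setting `-Xms` equal to `-Xmx` avoids heap resizing pauses, which is the usual starting point before the collector-specific tuning the tasks above cover.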

Chapter 5: SparkCore source code analysis
Task 156: How to find source code
Task 157: How to associate source code
Task 158: Master startup process
Task 159: Master and Worker startup process
Task 160: spark-submit Submission Process
Task 161: SparkContext Initialization
Task 162: Create TaskScheduler
Task 163: DAGScheduler Initialization
Task 164: TaskSchedulerImpl Start
Task 165: Master Resource Scheduling Algorithm
Task 166: TaskSchedulerImpl UML Diagram
Task 167: Executor Registration
Task 168: Executor Startup UML Diagram
Task 169: Spark Task Submission
Task 170: Task Execution
Task 171: Detailed process of Spark task submission
Task 172: Drawing summary of Spark task submission process
Task 173: In-depth analysis of BlockManager
Task 174: In-depth analysis of CacheManager

Chapter 6: SparkSQL
Task 175: Description of the default number of partitions
Task 176: SparkCore official case demonstration
Task 177: Spark's past and present
Task 178: Spark's release notes
Task 179: What is DataFrame
Task 180: First experience with DataFrame
Task 181: RDD to DataFrame method 1
Task 182: RDD to DataFrame method 2
Task 183: RDD VS DataFrame
Task 184: SparkSQL data source - load
Task 185: SparkSQL data source - save
Task 186: SparkSQL data sources - JSON and Parquet
Task 187: SparkSQL data source - JDBC
Task 188: Spark data source - Hive
Task 189: ThriftServer
Task 190: SparkSQL case demonstration
Task 191: SparkSQL and Hive integration
Task 192: SparkSQL UDF
Task 193: SparkSQL UDAF
Task 194: SparkSQL Window Functions
Task 195: GroupBy and agg
Task 196: Knowledge Summary
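A SparkSQL UDF (Task 192) wraps an ordinary Scala function. The function below and its name are illustrative, not from the course; the registration shown in the comments uses the standard `spark.udf.register` API and assumes a SparkSession named `spark`:

```scala
object UdfSketch {
  // The plain Scala function a UDF would wrap: map a score to a grade band.
  // (Name and band thresholds are made up for illustration.)
  val band: Int => String = score =>
    if (score >= 90) "A" else if (score >= 60) "B" else "C"

  def main(args: Array[String]): Unit = {
    println(band(95)) // prints "A"
    // With a SparkSession in scope it would be registered and used as:
    //   spark.udf.register("band", band)
    //   spark.sql("SELECT name, band(score) FROM scores")
  }
}
```

Keeping the logic in a plain function like this also makes it unit-testable without starting Spark.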

Chapter 7: kafka
Task 197: Why Kafka Appears
Task 198: Core Concepts of Kafka
Task 199: Core Concepts of Kafka, Revisited
Task 200: Introduction to various languages
Task 201: The benefits of a message system
Task 202: Classification of message systems and the difference between pull and push
Task 203: The architecture of the kafka cluster
Task 204: The construction of the kafka cluster
Task 205: Cluster Test Demonstration
Task 206: HA of kafka data
Task 207: Design of kafka
Task 208: Kafka code test
Task 209: Job
Task 210: Kafka offset
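A minimal consumer configuration of the kind the Kafka chapter's code tests use; broker host names and the group id are placeholders, the keys are the standard Kafka consumer settings:

```properties
bootstrap.servers=broker1:9092,broker2:9092
group.id=demo-group
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
auto.offset.reset=earliest
# Commit offsets manually rather than automatically (Task 210).
enable.auto.commit=false
```

Disabling auto-commit is what makes the manual offset control in Chapter 9 (Task 227) possible.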

Chapter 8: SparkStreaming
Task 211: Briefly talk about the future of SparkStreaming
Task 212: SparkStreaming Running Process
Task 213: Detailed DStream Drawing
Task 214: Flow Computing Process
Task 215: SocketStreaming Case Demonstration
Task 216: HDFS DStream case demonstration
Task 217: UpdateStateBykey case demonstration
Task 218: Transform blacklist filtering demonstration
Task 219: Window operation case demonstration
Task 220: Transform blacklist filtering demonstration (supplement)
Task 221: ForeachRDD case demonstration
Task 222: Kafka-SparkStreaming Integration Demonstration
Task 223: Kafka consumes data in multiple threads
Task 224: Kafka uses thread pools to consume data in parallel
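The heart of an `updateStateByKey` job (Task 217) is the update function applied per key on every batch. Its behavior can be simulated over in-memory batches with no streaming context; the batch data below is illustrative:

```scala
object StateUpdate {
  // The function updateStateByKey applies per key each batch:
  // fold the batch's new values into the running state.
  def updateCount(newValues: Seq[Int], state: Option[Int]): Option[Int] =
    Some(newValues.sum + state.getOrElse(0))

  def main(args: Array[String]): Unit = {
    // Simulate three micro-batches for one key: two events, none, then one.
    val batches = List(Seq(1, 1), Seq.empty[Int], Seq(3))
    val finalState = batches.foldLeft(Option.empty[Int])((st, b) => updateCount(b, st))
    println(finalState) // Some(5)
  }
}
```

In a real job this function is passed to `stream.updateStateByKey(updateCount _)`, and checkpointing must be enabled so the state survives restarts.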

Chapter 9: Streaming Tuning
Task 225: SparkStreaming Fault Tolerance
Task 226: SparkStreaming VS Storm
Task 227: SparkStreaming and Kafka Integration (Manual Offset Control)
Task 228: SparkStreaming tuning: parallelism
Task 229: SparkStreaming tuning: memory
Task 230: SparkStreaming tuning: serialization
Task 231: SparkStreaming tuning: JVM & GC
Task 232: SparkStreaming tuning: individual tasks run slowly
Task 233: SparkStreaming tuning: resource instability
Task 234: SparkStreaming tuning: data volume surges

Chapter 10: Streaming source code
Task 235: SparkStreaming source code introduction (preface)
Task 236: SparkStreaming operating principle
Task 237: SparkStreaming communication model principle
Task 238: StreamingContext initialization
Task 239: Receiver startup process introduction
Task 240: Receiver startup process UML summary
Task 241: Block Generation Principle Analysis
Task 242: Block Generation and Storage Principle Analysis
Task 243: Responsibility Chain Mode
Task 244: BlockRDD Generation and Job Task Submission
Task 245: BlockRDD Generation and Job Task Submission Summary

Chapter 11: sparkgraphx
Task 246: Graph Computing Introduction
Task 247: Graph Computing Case Demonstration
Task 248: Basic Composition of Graphs
Task 249: Graph Storage
Task 250: Finding Friends Case Demonstration
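The "finding friends" case (Task 250) boils down to intersecting neighbour sets. A plain-Scala sketch with illustrative data; GraphX itself would hold the graph as vertex and edge RDDs:

```scala
object CommonFriends {
  // Adjacency sets for a tiny friendship graph (made-up data).
  val friends: Map[String, Set[String]] = Map(
    "alice" -> Set("bob", "carol", "dave"),
    "bob"   -> Set("alice", "carol"),
    "carol" -> Set("alice", "bob")
  )

  // Common friends of two vertices: the intersection of their neighbour sets.
  def common(a: String, b: String): Set[String] =
    friends.getOrElse(a, Set.empty) intersect friends.getOrElse(b, Set.empty)

  def main(args: Array[String]): Unit =
    println(common("alice", "bob")) // Set(carol)
}
```

The distributed version computes the same intersections per edge, which is why it parallelises well.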

Chapter 12: Spark2 vs Spark1
Task 251: New Features of Spark
Task 252: RDD&DataFrame&DataSet
Task 253: RDD&DataFrame&DataSet
Task 254: SparkSession Access Hive Supplementary Instructions
Task 255: DataFrame and DataSet API Merge

Chapter 13: Comprehensive Project: User Interactive Behavior Analysis System
Task 256: Project Process Introduction
Task 257: Project Overall Overview
Task 258: Data Sources for Big Data Projects
Task 259: Project Background
Task 260: Common Concepts
Task 261: Project Requirements
Task 262: Project Organizing Process
Task 263: Thinking From Table Design
Task 264: Get Task Parameters
Task 265: Requirements-Data Information
Task 266: Requirements-Filter Sessions Based on Conditions
Task 267: Requirements-Example Description
Task 268: Requirement 1: Click, Order, and Payment Category TopN (Part 1)
Task 269: Requirement 1: Click, Order, and Payment Category TopN (Part 2)
Task 270: Requirement 2: Requirement Analysis
Task 271: Requirement 2: Data Information
Task 272: Requirement 2: Acquire User Behavior Data
Task 273: Requirement 2: Join the User Table and the Information Table
Task 274: Requirement 2: Further Analysis
Task 275: Requirement 2: Custom UDF Function
Task 276: Requirement 2: Custom UDAF Function
Task 277: Requirement 2: Product Click Statistics per Region
Task 278: Requirement 2: Join the City Information Table and the Commodity Information Table
Task 279: Requirement 2: Statistics on Popular Commodities in Each Region
Task 280: Requirement 2: Persist the Results to the Database
Task 281: Requirement 2: Summary
Task 282: Requirement 3: Requirement Analysis
Task 283: Requirement 3: Data Information
Task 284: Requirement 3: Approach
Task 285: Requirement 3: Obtain Data from Kafka
Task 286: Requirement 3: Blacklist Filtering of the Data
Task 287: Requirement 3: Dynamically Generate a Blacklist (Part 1)
Task 288: Requirement 3: Dynamically Generate a Blacklist (Part 2)
Task 289: Requirement 3: Real-Time Daily Statistics on Ad Clicks per Province and City
Task 290: Requirement 3: Real-Time Statistics on Click Traffic per Province
Task 291: Requirement 3: Real-Time Statistics on Ad Click Trends
Task 292: Requirement 3: Summary
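The TopN at the core of Requirement 1 (Tasks 268-269) is a sort-and-take over (category, count) pairs; a minimal sketch on illustrative data:

```scala
object CategoryTopN {
  // Top-N categories by click count: sort descending on the count, take n.
  def topN(clicks: List[(String, Int)], n: Int): List[(String, Int)] =
    clicks.sortBy(-_._2).take(n)

  def main(args: Array[String]): Unit = {
    val clicks = List(("phone", 120), ("book", 45), ("shoes", 300), ("food", 80))
    println(topN(clicks, 2)) // List((shoes,300), (phone,120))
  }
}
```

At project scale the same idea runs as an aggregation followed by `sortBy`/`take` on an RDD, so only the top N rows ever reach the driver.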

Chapter 14: DMP User Portrait System
Task 293: Project Background
Task 294: DSP Process
Task 295: Project process description
Task 296: Utils tool development
Task 297: Requirement 1 function development
Task 298: Package and submit the code to the cluster for operation
Task 299: Requirement 2 description
Task 300: Report requirement description
Task 301: Statistics on Quantity Distribution by Province and City
Task 302: Define the word table statistical function
Task 303: Province and city report statistics
Task 304: App Report Statistics
Task 305: User Portrait Requirements
Task 306: Labeling
Task 307: Merge Context Tags
Task 308: Context Tag Test Run
Task 309: Why Do We Need Graph Computing
Task 310: Basic Concepts of Graphs
Task 311: Simple Case Demo
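Merging context tags (Task 307) amounts to summing the weights of tags that two tag maps share; a minimal sketch with made-up tags and weights (Scala 2.13+):

```scala
object TagMerge {
  // Merge two tag -> weight maps for the same user, summing shared tags.
  def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    (a.toList ++ b.toList).groupMapReduce(_._1)(_._2)(_ + _)

  def main(args: Array[String]): Unit = {
    val merged = merge(Map("sports" -> 2, "news" -> 1),
                       Map("sports" -> 1, "travel" -> 3))
    println(merged)
  }
}
```

Because `merge` is associative, the same function can combine tag maps pairwise across partitions in a distributed aggregation.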

