Big data technologies to learn: what you must master for Spark

With the growing popularity of big data technologies such as Spark, demand for Spark professionals keeps rising. Big data training centers also offer corresponding Spark courses, but learning Spark is a gradual, step-by-step process; generally speaking, it goes through the following stages:

Stage 1: Master the Scala language

1. The Spark framework is written in Scala, and the code is concise and elegant. To become a Spark expert you have to read the Spark source code, so you must master Scala;

2. Although Spark now supports application development in multiple languages such as Java and Python, the fastest and best-supported development API is still, and will remain, the Scala API, so you must be able to use Scala to write complex, high-performance distributed Spark applications;

3. In particular, master Scala traits, apply methods, functional programming, generics, covariance and contravariance, and so on;
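
As a minimal sketch of the Scala features listed above, the example below shows a trait, a companion-object apply method, higher-order functions, and generic classes with covariance and contravariance annotations; all of the names (Greeter, Person, Box, Printer) are made up for illustration:

```scala
// A trait is Scala's unit of behaviour composition (an interface that can carry concrete methods).
trait Greeter {
  def name: String
  def greet(): String = s"Hello, $name"          // concrete method mixed in by the trait
}

// A companion object's apply method lets callers write Person("Ada") instead of new Person("Ada").
class Person(val name: String) extends Greeter
object Person {
  def apply(name: String): Person = new Person(name)
}

// Covariance (+A): a Box[Person] can be used where a Box[Greeter] is expected.
class Box[+A](val value: A)

// Contravariance (-A): a Printer[Greeter] can be used where a Printer[Person] is expected.
class Printer[-A] {
  def print(a: A): Unit = println(a)
}

object ScalaFeaturesDemo {
  // Functional programming: a higher-order function taking another function as a parameter.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  def main(args: Array[String]): Unit = {
    val ada = Person("Ada")                      // calls Person.apply
    println(ada.greet())                         // Hello, Ada

    val people: Box[Person] = new Box(ada)
    val greeters: Box[Greeter] = people          // allowed because Box is covariant

    val greeterPrinter = new Printer[Greeter]
    val personPrinter: Printer[Person] = greeterPrinter  // allowed because Printer is contravariant
    personPrinter.print(ada)

    println(applyTwice(_ + 1, 10))               // 12
    println(List(1, 2, 3).map(_ * 2).filter(_ > 2))  // List(4, 6)
  }
}
```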

Stage 2: Master the APIs that the Spark platform itself provides to developers

1. Master Spark's RDD development model, and master the use of the various transformation and action functions (a sketch follows this list);

2. Master Spark's wide and narrow dependencies and its lineage mechanism;

3. Master the RDD computation process, such as Stage division, the basic process by which a Spark application is submitted, and the fundamentals of Worker nodes in the cluster, etc.
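
Below is a small, self-contained sketch of the RDD model described above: transformations are lazy and only record lineage, actions trigger a job, and the shuffle in reduceByKey introduces a wide dependency and therefore a Stage boundary. The local[*] master and the sample data are assumptions for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddBasicsDemo {
  def main(args: Array[String]): Unit = {
    // Local mode just for illustration; a real cluster would use a master URL such as spark://...
    val conf = new SparkConf().setAppName("RddBasicsDemo").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    val lines = sc.parallelize(Seq("spark scala", "spark rdd", "scala"))

    // Transformations are lazy and only record lineage:
    // flatMap and map are narrow dependencies (each parent partition feeds one child partition).
    val pairs  = lines.flatMap(_.split(" ")).map(word => (word, 1))

    // reduceByKey needs a shuffle, i.e. a wide dependency, which marks a Stage boundary.
    val counts = pairs.reduceByKey(_ + _)

    // Actions trigger the actual computation (a job is submitted and split into Stages).
    counts.collect().foreach(println)      // e.g. (spark,2), (scala,2), (rdd,1)
    println(counts.toDebugString)          // prints the lineage / dependency chain

    sc.stop()
  }
}
```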

Stage 3: Go deep into the Spark core

This stage means studying the core part of the Spark framework in depth through its source code:

1. Master Spark's job submission process through the source code;

2. Master task scheduling in a Spark cluster through the source code;

3. In particular, be proficient in every step and detail of the work done inside the DAGScheduler, the TaskScheduler, and the Worker nodes (a rough trace follows this list);
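
As a rough orientation for this kind of source reading, the sketch below shows a single action call and describes in comments, only at a high level, the path it takes through the SparkContext, the DAGScheduler, the TaskScheduler, and the executors on Worker nodes; the comments summarize the flow rather than quoting exact internal method signatures:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object JobSubmissionTrace {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("JobSubmissionTrace").setMaster("local[*]"))

    val rdd = sc.parallelize(1 to 1000).map(_ * 2)

    // Calling an action such as count():
    // 1. count() calls SparkContext.runJob on the final RDD.
    // 2. The DAGScheduler walks the RDD lineage backwards, splits it into Stages at
    //    shuffle (wide) dependencies, and builds a set of tasks for each Stage.
    // 3. The TaskScheduler launches those tasks on executors running on Worker nodes,
    //    retrying failed tasks and reporting results back.
    // 4. When all Stages finish, the action returns its result to the driver.
    val n = rdd.count()
    println(s"count = $n")

    sc.stop()
  }
}
```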

Stage 4: Master the key frameworks built on the Spark core

Spark, as a unifying platform of the cloud computing and big data era, has significant advantages in real-time stream processing, graph computation, machine learning, and NoSQL queries. Most of the time when we use Spark, we are using the frameworks built on top of it, such as Shark, Spark Streaming, and so on:

1. Spark Streaming is an excellent real-time stream processing framework; master its DStream, transformations, checkpointing, and so on (sketches of Streaming and Spark SQL follow this list);

2. For offline statistical analysis with Spark, version 1.0.0 introduced Spark SQL on the basis of Shark, significantly improving the efficiency of offline statistical analysis; it is important to master it;

3. For Spark's machine learning and GraphX components, master their principles and usage;
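
Here is a minimal sketch of the Spark Streaming points above (DStream, transformations, checkpointing); the socket source on localhost:9999 and the checkpoint directory are assumptions for illustration:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    // local[2]: one thread for the receiver, one for processing (illustrative only).
    val conf = new SparkConf().setAppName("StreamingWordCount").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5))   // 5-second micro-batches

    // Checkpointing is required for stateful operations and for driver recovery.
    ssc.checkpoint("/tmp/streaming-checkpoint")

    // A DStream is a sequence of RDDs, one per batch interval.
    val lines = ssc.socketTextStream("localhost", 9999)

    // DStream transformations mirror RDD transformations.
    val counts = lines.flatMap(_.split(" "))
                      .map(word => (word, 1))
                      .reduceByKey(_ + _)

    counts.print()          // output operation, triggers computation each batch

    ssc.start()
    ssc.awaitTermination()
  }
}
```

And a minimal Spark SQL sketch of the offline statistical analysis mentioned above; it uses the SparkSession entry point of Spark 2.x and later (in the 1.x era described here the entry point was SQLContext), and the sales data is made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SparkSqlDemo")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A small in-memory DataFrame standing in for an offline dataset.
    val sales = Seq(("2019-01", 120.0), ("2019-01", 80.0), ("2019-02", 200.0))
      .toDF("month", "amount")

    sales.createOrReplaceTempView("sales")

    // Offline statistical analysis expressed as SQL.
    spark.sql("SELECT month, SUM(amount) AS total FROM sales GROUP BY month ORDER BY month")
         .show()

    spark.stop()
  }
}
```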

Stage 5: Complete a commercial-grade Spark project

Work through a complete, representative Spark project that touches every aspect of Spark, including the project's architectural design, analysis of the technologies to be used, development and implementation, and operation and maintenance. Fully mastering each stage and its details will let you face the vast majority of Spark projects with confidence.

Stage 6: Provide Spark solutions

1. Thoroughly master every detail of the Spark framework's source code;

2. Provide appropriate Spark solutions according to the needs of different business scenarios;

3. According to actual needs, carry out secondary development on the basis of the Spark framework and build your own framework on top of Spark.



Source: www.cnblogs.com/wuxiaoxia888/p/10990664.html