How to Become a Spark Big Data Master?

Spark originated at the University of California, Berkeley's AMPLab as a cluster computing platform. Based on in-memory computing, it can outperform Hadoop by up to a hundred times on some workloads, and it spans multiple computing paradigms, from iterative batch processing and data warehousing to stream processing and graph computation: it is a rare all-rounder. Spark adopts a unified technology stack to solve the core problems of cloud-era big data, such as stream processing, graph computation, machine learning, and NoSQL query, and it has a sound ecosystem, which directly underpins its dominant position in the unified cloud big data field. As Spark adoption spreads, the demand for professionals keeps growing; Spark professionals will remain sought after, and a skilled one can command a very high salary. To become a Spark master, however, you need to build up your internal strength step by step, generally through the following stages:



Stage 1: Master the Scala language

1. The Spark framework is written in Scala, and its code is refined and elegant. To become a Spark master you will have to read the Spark source code, and for that you must master Scala;

2. Although Spark applications can now be developed in several languages, including Java and Python, the fastest and best-supported development API is still, and will likely remain, the Scala API, so you must master Scala to write complex, high-performance distributed Spark applications;

3. In particular, master Scala traits, apply methods, functional programming, generics, and covariance and contravariance;
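The Scala features listed above can be seen together in a few lines. The sketch below is purely illustrative (the names Box, Printer, and Describable are made up for this example, not taken from Spark):

```scala
// A compact tour of the Scala features mentioned above:
// traits, companion-object apply, functional programming, generics, and variance.

// A trait mixing an abstract member with a concrete method (mixin-style reuse).
trait Describable {
  def name: String
  def describe: String = s"[$name]"
}

// A generic, covariant container (+A): a Box[String] can be used where
// a Box[Any] is expected.
class Box[+A](val value: A) extends Describable {
  def name = "box"
}

// Companion object with apply: lets callers write Box("x") instead of new Box("x").
object Box {
  def apply[A](value: A): Box[A] = new Box(value)
}

// Contravariance (-A): a Printer[Any] can stand in for a Printer[String].
trait Printer[-A] { def print(a: A): String }

object ScalaTour {
  def main(args: Array[String]): Unit = {
    val b: Box[Any] = Box("hello")                 // covariance in action
    val p: Printer[String] = new Printer[Any] {    // contravariance in action
      def print(a: Any): String = a.toString
    }
    // Functional style: higher-order functions over immutable collections.
    val squares = List(1, 2, 3).map(n => n * n)
    println(b.describe)                            // prints [box]
    println(p.print("spark"))                      // prints spark
    println(squares.mkString(","))                 // prints 1,4,9
  }
}
```

Each of these features shows up constantly in the Spark codebase, so being fluent in them pays off directly when reading the source.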


Stage 2: Master the APIs that the Spark platform provides to developers

1. Master Spark's RDD development model, and learn to use the various transformation and action functions;

2. Master Spark's wide and narrow dependencies and its lineage mechanism;

3. Master the RDD computation process, such as how Stages are divided, the basic flow by which a Spark application is submitted to the cluster, and the fundamentals of Worker nodes;
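The key contract behind point 1 is that transformations are lazy (they only record lineage) while actions trigger actual computation. Running real Spark needs a cluster or local runtime on the classpath, so here is a plain-Scala conceptual sketch of that contract; MiniRDD is an invented name for illustration, not Spark's actual implementation:

```scala
// Conceptual sketch (NOT Spark's real implementation): transformations build a
// lazy lineage of deferred computations; only an action forces evaluation.
class MiniRDD[A](compute: () => Seq[A]) {
  // Transformations: return a new MiniRDD; nothing is evaluated yet.
  def map[B](f: A => B): MiniRDD[B] = new MiniRDD(() => compute().map(f))
  def filter(p: A => Boolean): MiniRDD[A] = new MiniRDD(() => compute().filter(p))
  // Actions: force evaluation of the whole lineage.
  def collect(): Seq[A] = compute()
  def reduce(op: (A, A) => A): A = compute().reduce(op)
}

object MiniRDD {
  def parallelize[A](data: Seq[A]): MiniRDD[A] = new MiniRDD(() => data)
}

object RDDDemo {
  def main(args: Array[String]): Unit = {
    val rdd = MiniRDD.parallelize(1 to 10)
      .map(_ * 2)         // transformation: lazy, just extends the lineage
      .filter(_ % 3 == 0) // transformation: still lazy
    // Only the actions below actually run the computation:
    println(rdd.collect().mkString(",")) // prints 6,12,18
    println(rdd.reduce(_ + _))           // prints 36
  }
}
```

The real RDD API works the same way at the surface, with the crucial difference that lineage also records partitioning and dependencies so lost partitions can be recomputed.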

Stage 3: Go deep into the Spark core

At this stage you study the core of Spark in depth through its source code:

1. Master the job submission process by reading the Spark source;

2. Master task scheduling on a Spark cluster through the source code;

3. In particular, become proficient in every step of the work done inside the DAGScheduler, the TaskScheduler, and the Worker nodes;

Stage 4: Master the core frameworks built on top of Spark

Spark, as a leading big data platform of the cloud computing era, has significant advantages in real-time stream processing, graph computation, machine learning, NoSQL query, and so on. When we use Spark, most of the time we are using the frameworks built on top of it, such as Shark, Spark Streaming, and so on:

1. Spark Streaming is an excellent real-time stream processing framework; master its DStream abstraction, transformations, checkpointing, and so on;

2. For offline statistical analysis with Spark, the Spark 1.0.0 release introduced Spark SQL as the successor to Shark, which significantly improved the efficiency of offline statistical analysis; it is important to master it;

3. For Spark's machine learning (MLlib) and GraphX libraries, master their principles and usage;
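The core idea behind the DStream abstraction in point 1 is the micro-batch model: a stream is treated as a sequence of small batches, each processed like an RDD, with state optionally carried across batches (as updateStateByKey does in real Spark Streaming). Since a real StreamingContext needs a Spark runtime, the following is a plain-Scala conceptual sketch of that model; all names here are illustrative, not Spark's API:

```scala
// Conceptual sketch of the micro-batch streaming model (NOT the Spark API):
// each batch is counted independently, and a running state is merged across
// batches, mimicking what updateStateByKey does in a real DStream job.
object MicroBatchWordCount {
  // Count words within a single micro-batch of input lines.
  def countBatch(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(_.split("\\s+"))
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  // Merge one batch's counts into the running totals (the "state update").
  def updateState(state: Map[String, Int], batch: Map[String, Int]): Map[String, Int] =
    batch.foldLeft(state) { case (acc, (word, n)) =>
      acc.updated(word, acc.getOrElse(word, 0) + n)
    }

  def main(args: Array[String]): Unit = {
    // Two micro-batches arriving over time.
    val batches = Seq(
      Seq("spark streaming", "hello spark"),
      Seq("hello world")
    )
    val finalState =
      batches.map(countBatch).foldLeft(Map.empty[String, Int])(updateState)
    // Sorted for deterministic output:
    println(finalState.toSeq.sortBy(_._1).mkString(", "))
    // prints (hello,2), (spark,2), (streaming,1), (world,1)
  }
}
```

In real Spark Streaming the same logic would be a few lines over a DStream, with checkpointing used to make the carried state recoverable after a failure.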

Stage 5: Complete a commercial-grade Spark project

Work through a complete, representative Spark project from end to end, covering its architectural design, the analysis of the technologies used, the development and implementation, and operations and maintenance. Fully grasping every stage and every detail will let you face the vast majority of Spark projects with confidence.

Stage 6: Provide Spark solutions

1. Thoroughly master every detail of the Spark framework's source code;

2. Provide Spark solutions for different scenarios according to the needs of different businesses;

3. According to actual needs, carry out secondary development on top of the Spark framework and build your own frameworks based on Spark;

Of the six stages described above, the first and second can be worked through step by step on your own; the later stages are best pursued under the guidance of an expert or a master; and the final stage is essentially the period where "no technique defeats technique", relying largely on your own insight to complete.



Origin blog.51cto.com/14217196/2405197