How to Become a Spark Master?

How can large-scale data processing be made faster? The answer is Spark: because it computes in memory, it greatly reduces how often data must be written to disk. Spark can outperform Hadoop MapReduce by up to a hundred times on some workloads, and it spans multiple computing paradigms, from iterative batch processing and data warehousing to stream processing and graph computation, a rare all-rounder.

Spark uses a unified technology stack to solve the core problems of big data in the cloud, such as stream processing, graph computation, machine learning, and NoSQL queries. Together with its mature ecosystem, this directly underpins its dominant position in the field of unified cloud big data.

As Spark technology spreads, the demand for professionals keeps growing. Spark professionals will remain sought after and can command a generous salary. To become a Spark master, you must build up your skills one step at a time, generally passing through the following stages:


I: Master the Scala language

The Spark framework is written in Scala, refined and elegant. To become a Spark master, you have to read the Spark source code, and for that you must master Scala.

Although Spark applications can now be developed in multiple languages such as Java and Python, the fastest and best-supported development API is, and will remain, the Scala API. So you must master Scala to write sophisticated, high-performance distributed Spark applications.

In particular, master Scala traits, apply methods, functional programming, generics, and covariance and contravariance, among other features.
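
These features sink in fastest with a little code in front of you. Below is a minimal, self-contained Scala sketch (all class and object names are made up purely for illustration) that touches each of them:

```scala
// A trait mixes reusable behaviour into classes.
trait Greeter {
  def name: String                          // abstract member
  def greet(): String = s"Hello, $name"     // concrete method in a trait
}

// Covariance (+A): a Box[Student] is usable wherever a Box[Person] is expected.
class Box[+A](val value: A)

// Contravariance (-A): a Printer[Person] is usable wherever a Printer[Student] is expected.
class Printer[-A] {
  def print(a: A): Unit = println(a)
}

class Person(val name: String) extends Greeter

// A companion object's apply method lets callers write Person("Ann") without `new`.
object Person {
  def apply(name: String): Person = new Person(name)
}

object ScalaFeaturesDemo extends App {
  val p = Person("Ann")                     // apply in action
  println(p.greet())

  // Functional programming with generics: map is a higher-order function.
  val doubled: List[Int] = List(1, 2, 3).map(_ * 2)
  println(doubled)
}
```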


II: Master the APIs the Spark platform provides to developers

Master the RDD programming model in Spark and the use of its various transformation and action functions; master Spark's wide and narrow dependencies and its lineage mechanism;
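
As a concrete reference point, here is a minimal word-count sketch (assuming a local master and a hypothetical input.txt) that labels which calls are transformations, which are actions, and where the narrow and wide dependencies fall:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddBasics extends App {
  val conf = new SparkConf().setAppName("rdd-basics").setMaster("local[*]")
  val sc   = new SparkContext(conf)

  val lines = sc.textFile("input.txt")       // hypothetical input file

  // Narrow dependencies: each output partition depends on a single parent partition.
  val words = lines.flatMap(_.split(" "))    // transformation (lazy)
  val pairs = words.map(word => (word, 1))   // transformation (lazy)

  // Wide (shuffle) dependency: reduceByKey repartitions the data by key.
  val counts = pairs.reduceByKey(_ + _)      // transformation (lazy)

  // Nothing has executed so far; an action triggers the whole lineage.
  counts.take(10).foreach(println)           // action

  sc.stop()
}
```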

Master the RDD computation process: stage division, the basic flow of submitting a Spark application to a cluster, how Worker nodes do their work, and so on.
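
Stage division can be observed directly. Continuing the sketch above, toDebugString prints the RDD's lineage, and the shuffle boundaries in its output are where stages get cut; the submission command is shown as a placeholder only:

```scala
// Continuing the sketch above: toDebugString prints the lineage graph.
// Indented blocks separated by shuffle dependencies correspond to the
// stages the DAGScheduler creates when an action runs.
println(counts.toDebugString)

// A typical cluster submission (class name, master URL and jar are placeholders):
//   spark-submit --class RddBasics --master spark://host:7077 app.jar
```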


III: Go deep into the Spark core

At this stage, study the core parts of the Spark framework in depth through its source code:

Master the task submission process through the Spark source code; master cluster task scheduling through the Spark source code; in particular, master every step of the internal workings of the DAGScheduler, the TaskScheduler, and the Worker nodes;
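
As a rough map before diving into the source, the chain of calls that a single action sets off looks roughly like this (simplified; naming follows the Spark code base, but details vary by version, so verify against the source tree you are reading):

```scala
// Simplified trace of what a single action sets off on the driver:
//
//   rdd.collect()                       // action called in user code
//     -> SparkContext.runJob(...)       // entry point for every job
//       -> DAGScheduler.submitJob       // builds the DAG, cuts it into stages
//                                       //   at shuffle boundaries, retries failed stages
//         -> TaskScheduler.submitTasks  // ships each stage's TaskSet to executors
//                                       //   via the cluster manager
//           -> Executor.launchTask      // on Worker nodes: run the task,
//                                       //   report status back up the chain
```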


IV: Master the core frameworks built on top of Spark

As the leading big data engine of the cloud computing era, Spark has significant advantages in real-time stream processing, graph computation, machine learning, and NoSQL queries. Most of the time, we use Spark through the frameworks built on top of it, such as Shark, Spark Streaming, and so on.

Spark Streaming is an excellent real-time stream processing framework; master its DStream abstraction, its transformations, checkpointing, and so on;
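
A minimal DStream sketch, assuming a local text stream fed by netcat on port 9999 and a placeholder checkpoint directory:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount extends App {
  val conf = new SparkConf().setAppName("streaming-wc").setMaster("local[2]")
  val ssc  = new StreamingContext(conf, Seconds(5))   // 5-second batch interval
  ssc.checkpoint("/tmp/spark-checkpoint")             // placeholder checkpoint directory

  // A DStream is a sequence of RDDs, one per batch interval.
  val lines = ssc.socketTextStream("localhost", 9999) // assumes e.g. `nc -lk 9999`

  // The familiar RDD-style transformations, applied batch by batch.
  val counts = lines.flatMap(_.split(" "))
                    .map(word => (word, 1))
                    .reduceByKey(_ + _)

  counts.print()          // output operation: triggers execution each batch

  ssc.start()             // start receiving and processing data
  ssc.awaitTermination()
}
```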

For offline statistical analysis, Spark 1.0.0 introduced Spark SQL, built on the foundation of Shark, which significantly improved the efficiency of offline statistical analysis; it is important to master it;
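
A small Spark SQL sketch using the Spark 2.x SparkSession entry point (Spark 1.x used SQLContext) and a hypothetical people.json input:

```scala
import org.apache.spark.sql.SparkSession

object SqlDemo extends App {
  val spark = SparkSession.builder()
    .appName("sql-demo")
    .master("local[*]")
    .getOrCreate()

  // Hypothetical input: a JSON file with `name` and `age` fields.
  val people = spark.read.json("people.json")

  // Register a temporary view so plain SQL can query the DataFrame.
  people.createOrReplaceTempView("people")
  spark.sql("SELECT name, age FROM people WHERE age > 21").show()

  spark.stop()
}
```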

For Spark's machine learning library and GraphX, master both their principles and their usage;
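
As a taste of GraphX (MLlib follows the same spirit with its RDD- and DataFrame-based APIs), here is a hedged PageRank sketch on a three-vertex toy graph:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object GraphXDemo extends App {
  val conf = new SparkConf().setAppName("graphx-demo").setMaster("local[*]")
  val sc   = new SparkContext(conf)

  // A tiny property graph: vertex attributes are names, edge attributes are weights.
  val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
  val edges    = sc.parallelize(Seq(Edge(1L, 2L, 1.0), Edge(2L, 3L, 1.0), Edge(3L, 1L, 1.0)))
  val graph    = Graph(vertices, edges)

  // Run PageRank until the ranks change by less than the given tolerance.
  val ranks = graph.pageRank(0.001).vertices
  ranks.join(vertices).collect().foreach { case (_, (rank, name)) =>
    println(f"$name%-6s $rank%.4f")
  }

  sc.stop()
}
```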


V: Do a production-level Spark project

Work through a complete, representative Spark project that covers every aspect of Spark, from the project's architectural design and the analysis of the technologies used, to development, implementation, operations, and maintenance. Fully grasp each stage and its details, and you will be able to face the vast majority of Spark projects with confidence.


VI: Provide Spark solutions

Thoroughly master every detail of the Spark framework's source code; provide Spark solutions for different scenarios according to the needs of different businesses; and, where actual needs require, do secondary development on top of Spark and build your own frameworks;

Of the six stages to Spark mastery described above, the first and second can be completed step by step through self-study; the later stages are best carried out step by step under the guidance of an expert; and the final stage is essentially the period of "winning without fixed moves", which takes dedicated insight to complete.


Origin: blog.csdn.net/qq_41753040/article/details/90345715