Big Data Development Engineer [Complete]


The technology stack in this big data course covers the three mainstream ecosystems of today: Hadoop, Spark, and Flink, including the technical components most commonly used in enterprises, so it can meet the demands of a job in industry.

Q: How long will this course take? What level can I reach after studying?
The time needed to finish this big data course depends on each person's background, learning ability, and schedule. Generally speaking, if you can attend one hour of class per day and practice for at least two hours, you can finish it in three to four months. It is recommended to study continuously for the best results, to preview with the mind maps that accompany the videos, and to use the e-book to consolidate the video content. After finishing, you can reach the level of an intermediate big data engineer and meet the big data job requirements of most companies.

Q: Is this big data course enough for work?
Yes. The technology stack in this course covers the three mainstream ecosystems of Hadoop, Spark, and Flink, including the technical components most commonly used in enterprises, so it can meet your needs at work.

Q: I am currently a Java programmer with zero background in big data. Can I learn it well?
Yes. Java programmers have a natural advantage in learning big data: most big data frameworks are developed in Java, so they are easy to pick up. In addition, this course comes with a complete e-book, which makes it convenient to fill in any gaps in time, and the video tutorials have accompanying subtitles, which makes learning easier.

I. WordCount

(WordCount programming is the focus here and should be practiced hands-on; if later examples repeat any of these steps, they will simply be skipped.)

1. Open IDEA and create a Scala project

Here, the JDK and Scala SDK fields are the paths to your Java and Scala installations respectively.

2. Create two subdirectories under the src folder: one named cluster, used for running on Spark, and another named local, used for debugging in IDEA. (The out directory and META-INF are generated automatically after the JAR is built; they are not there at the beginning.) Then create a Scala class in each of the two folders, as sketched below.
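A rough sketch of the resulting layout (the file names are illustrative):

src/
├── cluster/
│   └── WordCount.scala   (version submitted to the Spark cluster)
└── local/
    └── WordCount.scala   (version used for debugging in IDEA)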

3. Next, to do Spark programming we need to import the Spark-related packages:

File → Project Structure → Libraries → "+" → Java → select the jars folder in the Spark directory

PS: Our program does not actually use all the packages in this directory for now. We could import only the packages we need, but finding them takes time; we could also import them all, though the whole project becomes bloated. Then click OK, and OK again; back at the main interface, the related packages have been imported.

4. The next step is the actual programming. Let's start with the WordCount code:

// This file lives in the cluster directory
package cluster

// Import Spark's SparkConf and SparkContext classes
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {

  def main(args: Array[String]) {

    if (args.length < 1) {
      System.err.println("Usage: <file>")
      System.exit(1)
    }

    // Instantiate the configuration, used to set the task's related information;
    // the appName set here can also be overridden when submitting from the console
    val conf = new SparkConf().setAppName("MySparkApp")

    // sc is the SparkContext, the "context" (the environment in which we operate);
    // conf is passed in as a parameter
    val sc = new SparkContext(conf)

    // Read a text file (on HDFS) through sc; args(0) is the argument passed in on the console.
    // When running locally, a local text path is passed in instead
    val line = sc.textFile(args(0))

    // The actual WordCount logic
    line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect().foreach(println)

    sc.stop()
  }
}

This is the WordCount code for running on the Spark platform. If you want to test it in IDEA, change args(0) to the actual path of a text file. For example, create a data folder in the project directory and put test.txt into it; then args(0) can be replaced with "data/test.txt". Also change the SparkContext to

val sc = new SparkContext("local", "SparkPI")

In this local form, the program can be run directly in IDEA.
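As a minimal sketch of that local variant (assuming the test file data/test.txt mentioned above; the object name LocalWordCount is illustrative), the whole program would look roughly like this:

package local

import org.apache.spark.SparkContext

object LocalWordCount {
  def main(args: Array[String]) {
    // "local" master runs the job inside IDEA without a cluster
    val sc = new SparkContext("local", "SparkPI")
    // read a local test file instead of args(0)
    val line = sc.textFile("data/test.txt")
    line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect().foreach(println)
    sc.stop()
  }
}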

5. Package it into a JAR file

File → Project Structure → Artifacts → "+" → JAR → From modules with dependencies... (this option bundles all the external packages we imported; choosing Empty instead means the imported Spark packages are not included)

Then choose our cluster class as the Main Class; the local one is only for local testing. Remember the name of the Main Class, since it will be needed later on Spark. Click OK to create the related files. If you have created an artifact before, you need to delete the previous related information, that is, the META-INF folder under the project, before the new one can be created successfully.

Back in the main interface, go to Build → Build Artifacts to generate the JAR package.

6. IDEA puts the generated JAR package into the out folder under the project. Find it and, for convenience, move the JAR into the Spark directory. Then open it, enter the META-INF folder, and delete the files with the .DSA, .SF, and .RSA extensions.

According to the material I looked up, signature problems in some packages can prevent the main class from being found at runtime, and this does indeed happen.
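If you prefer the command line over opening the JAR by hand, one way to strip those signature files is the zip tool (the JAR name WordCount.jar below is just an example; use your own artifact's name):

zip -d WordCount.jar 'META-INF/*.SF' 'META-INF/*.DSA' 'META-INF/*.RSA'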

7. Enter the Spark directory and submit the task through spark-submit under the bin folder. Run ./bin/spark-submit -h to see the help text; here are a few commonly used forms:

8. Basic format:

Usage: spark-submit [options] <app jar | python file> [app arguments]

Usage: spark-submit --kill [submission ID] --master [spark://...]

Usage: spark-submit --status [submission ID] --master [spark://...]

Usage: spark-submit run-example [options] example-class [example args]
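For example, submitting the WordCount JAR built above could look roughly like this (the JAR name, master URL, and HDFS input path are placeholders that depend on your environment):

./bin/spark-submit --class cluster.WordCount --master spark://master:7077 WordCount.jar hdfs://master:9000/input/test.txt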


Origin blog.51cto.com/15079112/2592048