Semester Summary (Spark)


After studying this course for a semester, we all know that Scala is short for "scalable language". It is a multi-paradigm programming language. Martin Odersky began designing it in 2001 at EPFL (the Swiss Federal Institute of Technology in Lausanne), building on earlier work on Funnel. From the start, the goal was to integrate features of object-oriented programming and functional programming.

1. Overview of Scala for Spark

Scala is a high-level language that combines object-oriented and functional programming. It aims to express common programming patterns in a concise, elegant, and type-safe way. Scala is powerful enough to write not only simple scripts but also large systems.

Scala runs on the Java platform. The Scala compiler (scalac) compiles a Scala program into .class bytecode files, and the JVM then executes them on the operating system. Its runtime performance is usually comparable to that of Java programs. Scala code can call Java methods, extend Java classes, implement Java interfaces, and so on, and Scala programs commonly make heavy use of Java class libraries.

Scala interoperates closely with Java. In effect, Scala adds a coding "shell" on top of the Java platform that lets programmers develop in a functional style. Because Scala is ultimately compiled into .class files that run on the JVM, Java APIs can be called directly from Scala. The benefits are obvious. Java programmers can switch to Scala more easily. Existing Java APIs remain usable from Scala. A company's existing Java platform can also adopt Scala without being replaced.
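The following minimal sketch makes the interoperability point concrete using only the Java standard library. One Scala class implements the Java interface java.util.Comparator, and another extends the Java class java.util.ArrayList. The code then calls a Java static method, java.util.Collections.sort, on them. The class names ByLength and NamedList are made up for illustration.

```scala
import java.util.{ArrayList, Collections, Comparator}

// A Scala class implementing a Java interface
class ByLength extends Comparator[String] {
  override def compare(a: String, b: String): Int = Integer.compare(a.length, b.length)
}

// A Scala class extending a Java class (hypothetical example type)
class NamedList(val name: String) extends ArrayList[String]

object JavaInteropDemo {
  def main(args: Array[String]): Unit = {
    val words = new NamedList("words")
    words.add("spark")
    words.add("jvm")
    words.add("scala")
    // Call a Java static method with a Scala-implemented Comparator.
    // The sort is stable, so "spark" stays ahead of "scala" (both have length 5).
    Collections.sort(words, new ByLength)
    println(s"${words.name}: $words") // prints words: [jvm, spark, scala]
  }
}
```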

2. Functional programming

·Functional programming: break the solution of a complex problem into several functions. Each function solves one part of the problem. By applying these functions in combination, the whole problem is finally solved.

Compared with object-oriented programming, functional programming is more abstract. One advantage is that code can be very concise. Another is that problems are solved with immutable values (constants) rather than mutable variables. This brings an additional benefit for concurrency: because shared data is never modified, thread-safety problems are reduced or even eliminated. As a result, functional programming is especially well suited to high-concurrency and distributed applications. Functional programming also supports higher-order functions, and functions are first-class citizens, so code can be written more flexibly.

·Functional programming is not an evolution of object-oriented programming but another way to solve problems. Neither is absolutely better, and each has its own advantages and disadvantages depending on the scenario.
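The following small sketch illustrates the ideas above; it is not taken from the course. The problem "sum the squares of the even numbers" is split into small functions that are then combined. The data is held in an immutable val, and a higher-order function receives another function as an argument. All names are hypothetical.

```scala
object FunctionalDemo {
  // Small, single-purpose functions, each solving one part of the problem
  def isEven(n: Int): Boolean = n % 2 == 0
  def square(n: Int): Int = n * n

  // A higher-order function: it takes another function as a parameter
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  def main(args: Array[String]): Unit = {
    // An immutable value: it is never reassigned, so it is safe to share between threads
    val numbers = List(1, 2, 3, 4, 5, 6)

    // Combine the small functions: keep the even numbers, square them, add them up
    val sumOfEvenSquares = numbers.filter(isEven).map(square).sum
    println(sumOfEvenSquares) // 56 (4 + 16 + 36)

    // Functions are first-class values: they can be stored in a val and passed around
    val addOne: Int => Int = _ + 1
    println(applyTwice(addOne, 10)) // 12
  }
}
```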

Introduction and installation of Spark

First, open an online Scala tool in the browser:


Select the Scala version: choose Scala 2.11.12.

3. Install Scala on Windows

Go to the official Scala website and download the Scala installer to the local machine.

At first the download path was wrong, so the installation could not complete normally. I then downloaded the installer from the official website again and installed it locally. Next, configure the Scala environment variables. Then test whether Scala was installed successfully by checking the Scala version, starting Scala, and executing a statement. The same installation is then carried out on Linux.
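On the command line, scala -version prints the installed version. As a complement, the sketch below is a statement-level check that can be pasted into the Scala REPL or compiled. It prints the version of the Scala library actually in use, looks up the SCALA_HOME environment variable, and evaluates a simple expression. SCALA_HOME is an assumption: it is the variable name commonly used for this setup, but the post does not say which variables were configured.

```scala
object CheckInstall {
  def main(args: Array[String]): Unit = {
    // Version of the Scala library in use, e.g. "version 2.11.12"
    println(scala.util.Properties.versionString)

    // Assumption: the installation defined SCALA_HOME; adjust the name to your own setup
    println(sys.env.getOrElse("SCALA_HOME", "SCALA_HOME is not set"))

    // A simple statement to confirm that code actually executes
    val sum = (1 to 10).sum
    println(s"1 + 2 + ... + 10 = $sum") // 55
  }
}
```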

4. Log in to the ied virtual machine

1. On the Win7 virtual machine, use FinalShell to log in to the ied virtual machine. Upload the Scala installation package to the ied virtual machine, then unzip it to the specified directory.

2. tar -zxvf scala-2.11.12.tgz -C /usr/local

3. Configure the Scala environment variables. After saving and exiting, execute the command: source /etc/profile

4. Test whether Scala is installed successfully

5. Scala can be used in two modes: interactive mode and compilation mode.
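Here is a sketch of the two modes. In interactive mode, running scala with no arguments opens the REPL, where each statement (for example println("Hello, Scala!")) is evaluated as soon as it is entered. In compilation mode, code is written to a source file, compiled with scalac, and run with scala, as in this minimal example. The file and object name HelloScala are hypothetical.

```scala
// HelloScala.scala (hypothetical file name)
// Compilation mode:
//   scalac HelloScala.scala   compiles the source into .class files
//   scala HelloScala          runs the compiled object
object HelloScala {
  // Build the greeting in a separate function so it can be reused or tested
  def greeting(name: String): String = s"Hello, $name!"

  def main(args: Array[String]): Unit = {
    println(greeting("Scala"))
  }
}
```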

Next, we learned Scala variables and data types.
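The following is a minimal sketch of Scala variables and basic data types that needs only the standard library. A val declares an immutable value and a var declares a mutable variable. Types can be written explicitly or inferred by the compiler. The names are illustrative only.

```scala
object VariablesDemo {
  // val: immutable, so reassigning it is a compile error
  val language: String = "Scala"
  // var: mutable, so it can be reassigned
  var counter: Int = 0

  // Some of Scala's basic value types, with explicit type annotations
  val byteValue: Byte = 127
  val shortValue: Short = 32767
  val bigNumber: Long = 10000000000L // too large for Int, so the L suffix is needed
  val price: Double = 3.14
  val ratio: Float = 2.5f
  val initial: Char = 'S'
  val enabled: Boolean = true

  // Type inference: the compiler infers Int without an annotation
  val inferred = 42

  def main(args: Array[String]): Unit = {
    counter += 1 // allowed, because counter is a var
    // language = "Java" // would not compile: reassignment to val
    println(s"$language $counter $bigNumber $price $ratio $initial $enabled $inferred")
  }
}
```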

6. I built the Scala development environment in IntelliJ IDEA and installed the Scalafmt plugin. When creating a project and choosing its path, I kept getting it wrong: either no new file was created or the wrong option was selected. Sometimes an error was also displayed after the project was created. I also implemented a factorial function and printed a right triangle. Creating RDDs: in the cluster, RDDs can be created with the parallelize() method or with the makeRDD() method. Errors kept occurring when I executed these commands. Later, with the help of classmates, the errors were fixed.
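The post mentions implementing a factorial function and printing a right triangle but does not include the code. The following is one possible version, offered as a sketch rather than the course's own solution. The object name Exercises is hypothetical, and BigInt is used so that larger factorials do not overflow Int.

```scala
object Exercises {
  // Recursive factorial; BigInt avoids overflow for larger n
  def factorial(n: Int): BigInt = {
    require(n >= 0, "n must be non-negative")
    if (n <= 1) BigInt(1) else BigInt(n) * factorial(n - 1)
  }

  // Rows of a right triangle made of '*': row i contains i stars
  def rightTriangle(rows: Int): Seq[String] = (1 to rows).map(i => "*" * i)

  def main(args: Array[String]): Unit = {
    println(s"5! = ${factorial(5)}") // 5! = 120
    rightTriangle(4).foreach(println)
  }
}
```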
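For the RDD step, here is a minimal sketch of both creation methods. It assumes the spark-core dependency (a Spark build for Scala 2.11, matching the Scala 2.11.12 used above) is on the classpath, and it runs Spark in local mode. In spark-shell on the cluster, a SparkContext named sc already exists, so only the parallelize and makeRDD lines are needed. The data and the object name CreateRddDemo are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CreateRddDemo {
  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM; on a cluster the master is usually supplied by spark-submit
    val conf = new SparkConf().setAppName("CreateRddDemo").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // parallelize(): distribute a local collection as an RDD, here split into 2 partitions
    val rdd1 = sc.parallelize(List(1, 2, 3, 4, 5), 2)

    // makeRDD(): for a plain local collection it does the same job as parallelize()
    val rdd2 = sc.makeRDD(List("spark", "scala", "hive"))

    println(rdd1.map(_ * 2).collect().mkString(", ")) // 2, 4, 6, 8, 10
    println(rdd1.getNumPartitions) // 2
    println(rdd2.count()) // 3

    sc.stop()
  }
}
```

For an ordinary local collection, makeRDD behaves like parallelize. It additionally has an overload that accepts preferred locations for each element.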

7. SparkSQL case analysis: create a Maven project, add the dependencies and build plugins, and rename the source directory from java to scala.
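The post does not describe the specific SparkSQL case, so the following is only a generic sketch of what the Scala code in such a Maven project might look like. It assumes the spark-sql dependency (for Scala 2.11) has been added to pom.xml and that the code lives under src/main/scala. The Student case class and its sample rows are made up.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type for the example
case class Student(name: String, age: Int)

object SparkSqlDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SparkSqlDemo")
      .master("local[*]") // local mode for testing; drop this when using spark-submit on a cluster
      .getOrCreate()
    import spark.implicits._ // enables .toDF() on local collections

    // Build a DataFrame from made-up sample rows
    val df = Seq(Student("Tom", 20), Student("Amy", 19), Student("Bob", 22)).toDF()

    // Register it as a temporary view so it can be queried with SQL
    df.createOrReplaceTempView("student")
    spark.sql("SELECT name, age FROM student WHERE age >= 20 ORDER BY age").show()

    spark.stop()
  }
}
```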

Learning SparkSQL data source - Hive table

8. SparkSQL also supports reading and writing data stored in Apache Hive. Hive has a large number of dependencies, so they are not included in the default Spark distribution. If the Hive dependencies can be found on the classpath, Spark loads them automatically. Note that these Hive dependencies must also be present on all Worker nodes, because the workers need access to Hive's serialization and deserialization libraries (SerDes) to read data stored in Hive.

9. Copy the Hive configuration file hive-site.xml to the Spark configuration directory by executing: cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf. Enter the Spark configuration directory and edit hive-site.xml. Then start Hive's metastore and start the Spark Shell. Import SparkSession by executing the command: import org.apache.spark.sql.SparkSession. Finally, view the generated Hive table on the Hive client.
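The following sketch shows what the final steps might look like from Scala once hive-site.xml is in $SPARK_HOME/conf and the metastore is running. It assumes the spark-hive module is available, which is typically the case in Spark distributions built with Hive support. It also assumes the program is launched through spark-submit, which supplies the master. In spark-shell a Hive-enabled session named spark usually already exists, so only the spark.sql(...) lines are needed. The table name student_hive_demo is hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object HiveTableDemo {
  // Hypothetical table name used only for this example
  val tableName = "student_hive_demo"

  def main(args: Array[String]): Unit = {
    // enableHiveSupport() connects Spark SQL to the Hive metastore configured in hive-site.xml.
    // It requires the spark-hive module on the classpath and a running metastore.
    val spark = SparkSession.builder()
      .appName("HiveTableDemo")
      .enableHiveSupport()
      .getOrCreate()

    // With Hive support enabled, CREATE TABLE without USING creates a Hive table
    spark.sql(s"CREATE TABLE IF NOT EXISTS $tableName (id INT, name STRING)")
    spark.sql(s"INSERT INTO $tableName VALUES (1, 'Tom'), (2, 'Amy')")
    spark.sql(s"SELECT * FROM $tableName").show()

    spark.stop()
  }
}
```

Afterwards, the table should be visible from the Hive client, for example with SHOW TABLES.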

END

During this semester, I was very interested in the course from the beginning and followed the teacher's lectures step by step. Sometimes I ran into many problems and errors, but in the end I solved them. I also recorded the errors in writing, which will be useful for later work. In the future, both in study and in life, I will keep a positive mindset and a problem-solving attitude.


Origin blog.csdn.net/py20010218/article/details/125264624