How to use PySpark with different run modes in Jupyter

This article assumes that your environment already has the following installed; detailed installation steps are beyond its scope. For the setup itself, you can refer to "Three minutes to get jupyter and pyspark integration". A quick check that everything is in place is sketched after the list.

  1. anaconda
  2. findspark
  3. pyspark
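
Before trying any of the run modes, it can help to confirm that the pieces above are actually visible from Jupyter. The cell below is a minimal sketch, assuming Spark is installed locally and SPARK_HOME is set (or Spark sits somewhere findspark can detect):

import findspark
findspark.init()              # locate the local Spark installation (uses SPARK_HOME if set)
print(findspark.find())       # print the detected Spark home directory
import pyspark
print(pyspark.__version__)    # confirm that pyspark itself is importable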

How to run PySpark in different modes

As we all know, Spark supports local, standalone, yarn-client, yarn-cluster, and other run modes. Since you want to use Jupyter, you naturally want to work interactively, so how do you do that in each mode?

The author summarizes them as follows:

1. local mode

import findspark
findspark.init()
from pyspark import SparkContext
sc = SparkContext("local", "First App")
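
Once the context is created, a short computation confirms that local mode works end to end. The cell below is a minimal sketch; the numbers and the partition count are arbitrary examples:

rdd = sc.parallelize(range(10), 2)   # a tiny RDD with 2 partitions, arbitrary values
print(rdd.sum())                     # should print 45
sc.stop()                            # only one SparkContext can be active, so stop it before trying another mode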

2. standalone mode
The master address and port of the standalone cluster must be passed in.

import findspark
findspark.init()
from pyspark import SparkContext
sc = SparkContext("spark://192.168.5.129:7077", "First App")
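
If you want to set more than the master URL, you can build a SparkConf and hand it to SparkContext instead. The sketch below assumes the same standalone master as above; the memory and core values are illustrative, not required:

import findspark
findspark.init()
from pyspark import SparkConf, SparkContext
conf = SparkConf()
conf.setMaster("spark://192.168.5.129:7077")   # standalone master address and port
conf.setAppName("First App")
conf.set("spark.executor.memory", "1g")        # illustrative executor memory
conf.set("spark.cores.max", "2")               # illustrative core limit
sc = SparkContext(conf=conf)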

3. yarn-client mode

import findspark
findspark.init()
from pyspark import SparkContext
sc = SparkContext("yarn-client", "First App")
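
For yarn-client mode to work from a notebook, Spark has to find your Hadoop/YARN configuration. If HADOOP_CONF_DIR was not already exported in the environment that launched Jupyter, you can set it from Python before creating the context; the path below is only a placeholder for your own configuration directory:

import os
import findspark
findspark.init()
# Placeholder path: point this at your actual Hadoop/YARN configuration directory
os.environ.setdefault("HADOOP_CONF_DIR", "/etc/hadoop/conf")
from pyspark import SparkContext
# Note: newer Spark versions deprecate "yarn-client" in favour of master "yarn" with client deploy mode
sc = SparkContext("yarn-client", "First App")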

4. yarn-cluster mode
Cluster mode is generally used to run a finished application once development is complete, so it is not suited to interactive use. The author has not tried it and does not cover it here.

About SparkContext

In fact, the parameters accepted by the SparkContext class correspond, position by position, to the options of the spark-submit command line. Once you notice this, you can check the documentation to see which values each parameter accepts; for details, see the official Spark documentation.
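
To make that correspondence concrete, here is a minimal sketch; the SparkConf settings match the spark-submit options shown in the comments, and the specific values are illustrative only:

# Roughly equivalent command line:
#   spark-submit --master "local[4]" --name "First App" --conf spark.executor.memory=1g your_script.py
import findspark
findspark.init()
from pyspark import SparkConf, SparkContext
conf = (
    SparkConf()
    .setMaster("local[4]")               # same role as --master
    .setAppName("First App")             # same role as --name
    .set("spark.executor.memory", "1g")  # same role as --conf spark.executor.memory=1g
)
sc = SparkContext(conf=conf)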
