Flink Stream Batch Integrated Computing (9): Flink Python

Table of contents

Use Python dependencies

Use a custom Python virtual environment

Method 1: Create a Python virtual environment on a node in the cluster

Method 2: Create a Python virtual environment on the local development machine

Use JAR packages

Use data files


Use Python dependencies

The following scenarios show how to use Python dependencies in a PyFlink job:

  • Use a custom Python virtual environment
  • Use third-party Python packages
  • Use JAR packages
  • Use data files

Use a custom Python virtual environment

Method 1: Create a Python virtual environment on a node in the cluster

set -e

# Create a Python virtual environment.
python3.6 -m venv venv

# Activate the Python virtual environment.
source venv/bin/activate

# Upgrade pip inside the virtual environment.
pip install --upgrade pip

# Install the PyFlink dependency.
pip install "apache-flink==1.13.0"

# Exit the Python virtual environment.
deactivate

Running the script creates a directory named venv that contains a Python 3.6 virtual environment. You can modify the script to create a virtual environment for a different Python version.

To use the virtual environment, either distribute it to every node in the cluster or specify it when you submit the PyFlink job.

The following commands show different PyFlink job submission use cases:

  • Run a PyFlink job:
$ ./bin/flink run --python examples/python/table/batch/word_count.py
  • Run a PyFlink job with additional source files given in --pyFiles and the main entry module given in --pyModule:
$ ./bin/flink run \
--pyModule batch.word_count \
--pyFiles examples/python/table/batch
  • Submit a PyFlink job to a JobManager running on host <jobmanagerHost> (adjust the command accordingly):
$ ./bin/flink run \
--jobmanager <jobmanagerHost>:8081 \
--python examples/python/table/batch/word_count.py
  • Run a PyFlink job in per-job mode on a YARN cluster:
$ ./bin/flink run \
--target yarn-per-job \
--python examples/python/table/batch/word_count.py
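
The commands above do not name a virtual environment. If the venv directory has been copied to the same path on every node, the interpreter can be selected with the --pyExecutable option of flink run. The Python sketch below only assembles such a command line and does not run it; the helper name flink_run_args and the path /opt/venv are hypothetical and chosen purely for illustration.

```python
import shlex


def flink_run_args(script, python_executable=None, flink_bin="./bin/flink"):
    """Build the argument list for `flink run` (hypothetical helper)."""
    args = [flink_bin, "run", "--python", script]
    if python_executable is not None:
        # --pyExecutable names the interpreter that runs the Python UDF workers.
        args += ["--pyExecutable", python_executable]
    return args


# Assumption: the venv directory was copied to /opt/venv on every node.
args = flink_run_args(
    "examples/python/table/batch/word_count.py",
    python_executable="/opt/venv/bin/python",
)

# Quote each argument so the line can be pasted into a shell.
command_line = " ".join(shlex.quote(a) for a in args)
print(command_line)
```

The resulting list could be handed to subprocess.run on a machine where Flink is installed.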

Method 2: Create a Python virtual environment on the local development machine

set -e

# Download the Python 3.7 Miniconda installer script.
wget "https://repo.continuum.io/miniconda/Miniconda3-py37_4.9.2-Linux-x86_64.sh" -O "miniconda.sh"

# Make the installer script executable.
chmod +x miniconda.sh

# Install Miniconda into the venv directory, which serves as the Python virtual environment.
./miniconda.sh -b -p venv

# Activate the Conda Python virtual environment.
source venv/bin/activate ""

# Install the PyFlink dependency.
pip install "apache-flink==1.13.0"

# Exit the Conda Python virtual environment.
conda deactivate

# Delete cached packages.
rm -rf venv/pkgs

# Package the prepared Conda Python virtual environment.
zip -r venv.zip venv

Running the script creates a file named venv.zip that contains a Python 3.7 virtual environment. You can modify the script to install a different Python version, or to install the third-party Python packages you need into the virtual environment.
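
One way to make a job use the packaged environment is to ship venv.zip as a Python archive and point the Python executable at the interpreter inside it. The sketch below assumes it is given a PyFlink TableEnvironment, which offers add_python_archive and get_config().set_python_executable; the helper name use_packaged_venv is hypothetical. On the command line, the corresponding flink run options are --pyArchives and --pyExecutable.

```python
import os


def use_packaged_venv(table_env, archive_path="venv.zip",
                      interpreter="venv/bin/python"):
    """Ship a zipped virtual environment with the job (hypothetical helper).

    `table_env` is expected to be a PyFlink TableEnvironment.
    """
    # The archive is extracted on the workers into a directory named after
    # the archive file, so the interpreter is addressed relative to that name.
    table_env.add_python_archive(archive_path)
    executable = os.path.basename(archive_path) + "/" + interpreter
    table_env.get_config().set_python_executable(executable)
    return executable
```

In a real job, table_env would be the object returned by TableEnvironment.create(...), and the helper would be called before the job is executed. The default interpreter path matches the layout produced by zip -r venv.zip venv, where the archive contains a top-level venv directory.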

Use JAR packages

If your Flink Python job uses Java classes, for example a connector or a function defined in Java, you need to specify the JAR package that contains them.

For PyFlink jobs that reference Java UDFs or external connectors, the JAR file specified in --jarfile is uploaded to the cluster:

$ ./bin/flink run \
--python examples/python/table/batch/word_count.py \
--jarfile <jarFile>
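
JAR files can also be declared from inside the Python program through the pipeline.jars configuration option, which takes a semicolon-separated list of URLs. The following is a minimal sketch that assumes a PyFlink TableEnvironment is passed in; the helper name add_jars is hypothetical.

```python
def add_jars(table_env, jar_urls):
    """Register JAR files on a PyFlink TableEnvironment (hypothetical helper)."""
    # pipeline.jars expects URLs such as file:///path/to/connector.jar,
    # joined with semicolons.
    value = ";".join(jar_urls)
    table_env.get_config().get_configuration().set_string("pipeline.jars", value)
    return value
```

A call such as add_jars(table_env, ["file:///path/to/connector.jar", "file:///path/to/udf.jar"]) would register two JARs; both paths are placeholders.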

Use data files

If your Flink Python job needs to access data files, such as model files, you can make them available through Python Archives.

  • Run a PyFlink job with additional source and resource files. Files specified in --pyFiles are added to PYTHONPATH and are therefore available in the Python code:
$ ./bin/flink run \
--python examples/python/table/batch/word_count.py \
--pyFiles file:///user.txt,hdfs:///$namenode_address/username.txt
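
For the Python Archives route mentioned above, files are shipped as a zip archive (for example with the --pyArchives option) and extracted into the working directory of the Python workers, in a directory named after the archive unless a target directory is given. Code running in a UDF can then open them by relative path. The sketch below is plain Python; the archive name data.zip, the file data/model.txt used in the usage note, and the helper read_archived_file are all hypothetical.

```python
import os


def read_archived_file(relative_path, archive_dir="data.zip"):
    """Read a text file from an extracted Python archive (hypothetical helper).

    A relative `archive_dir` is resolved against the current working
    directory, which is where archives are extracted for the Python worker.
    """
    with open(os.path.join(archive_dir, relative_path), encoding="utf-8") as f:
        return f.read()
```

Inside a UDF, read_archived_file("data/model.txt") would then return the contents of that file from the shipped data.zip archive.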
 


Origin blog.csdn.net/victory0508/article/details/131452205