Deploying Spark on the Kubernetes bundled with Docker Desktop on Windows 10

Prerequisites:

  • Docker Desktop for Windows 10 is installed, with its bundled Kubernetes enabled
  • An ingress controller has been installed on the Kubernetes cluster

The following steps are adapted from:
https://testdriven.io/blog/deploying-spark-on-kubernetes/

  1. Build the Spark Docker image.
    Dockerfile:
FROM java:openjdk-8-jdk

# define spark and hadoop versions
ENV SPARK_VERSION=3.0.0
ENV HADOOP_VERSION=3.3.0

# download and install hadoop
RUN mkdir -p /opt && \
    cd /opt && \
    curl http://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz | \
        tar -zx hadoop-${HADOOP_VERSION}/lib/native && \
    ln -s hadoop-${HADOOP_VERSION} hadoop && \
    echo Hadoop ${HADOOP_VERSION} native libraries installed in /opt/hadoop/lib/native

# download and install spark
RUN mkdir -p /opt && \
    cd /opt && \
    curl http://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz | \
        tar -zx && \
    ln -s spark-${SPARK_VERSION}-bin-hadoop2.7 spark && \
    echo Spark ${SPARK_VERSION} installed in /opt

# add scripts and update spark default config
ADD common.sh spark-master spark-worker /
ADD spark-defaults.conf /opt/spark/conf/spark-defaults.conf
ENV PATH $PATH:/opt/spark/bin

You can find the corresponding Dockerfile and scripts in this GitHub repository.
Repository address: https://github.com/testdrivenio/spark-kubernetes

Build the image:

docker build -f docker/Dockerfile -t spark-hadoop:3.0.0 ./docker

If you don’t want to build the image locally, you can use the prebuilt image directly. It is published on Docker Hub as mjhea0/spark-hadoop:3.0.0: pull it with docker pull, then rename it to spark-hadoop:3.0.0 with the docker tag command.

docker image ls spark-hadoop

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
spark-hadoop        3.0.0               8f3ccdadd795        11 minutes ago      911MB

Spark Master

spark-master-deployment.yaml:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: spark-master
spec:
  replicas: 1
  selector:
    matchLabels:
      component: spark-master
  template:
    metadata:
      labels:
        component: spark-master
    spec:
      containers:
        - name: spark-master
          image: spark-hadoop:3.0.0
          command: ["/spark-master"]
          ports:
            - containerPort: 7077
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m

spark-master-service.yaml:

kind: Service
apiVersion: v1
metadata:
  name: spark-master
spec:
  ports:
    - name: webui
      port: 8080
      targetPort: 8080
    - name: spark
      port: 7077
      targetPort: 7077
  selector:
    component: spark-master

Deploy the Spark master and start its service:

kubectl create -f ./kubernetes/spark-master-deployment.yaml
kubectl create -f ./kubernetes/spark-master-service.yaml

Verification:

$ kubectl get deployments

NAME           READY   UP-TO-DATE   AVAILABLE   AGE
spark-master   1/1     1            1           12s


$ kubectl get pods

NAME                            READY   STATUS    RESTARTS   AGE
spark-master-6c4469fdb6-rs642   1/1     Running   0          6s

Spark Workers

spark-worker-deployment.yaml:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: spark-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      component: spark-worker
  template:
    metadata:
      labels:
        component: spark-worker
    spec:
      containers:
        - name: spark-worker
          image: spark-hadoop:3.0.0
          command: ["/spark-worker"]
          ports:
            - containerPort: 8081
          resources:
            requests:
              cpu: 100m

Deploy:

$ kubectl create -f ./kubernetes/spark-worker-deployment.yaml

Verification:

$ kubectl get deployments

NAME           READY   UP-TO-DATE   AVAILABLE   AGE
spark-master   1/1     1            1           92s
spark-worker   2/2     2            2           6s


$ kubectl get pods

NAME                            READY   STATUS    RESTARTS   AGE
spark-master-6c4469fdb6-rs642   1/1     Running   0          114s
spark-worker-5d4bdd44db-p2q8v   1/1     Running   0          28s
spark-worker-5d4bdd44db-v4d84   1/1     Running   0          28s
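If you would rather script this readiness check than eyeball the output, the plain-text pod listing can be parsed with a few lines of Python. This is a hypothetical helper, not part of the referenced repository; the sample text is the kubectl output shown above:

```python
# Parse the plain-text output of `kubectl get pods` into a
# {pod_name: status} dict so readiness can be asserted in a script.
def parse_kubectl_pods(text):
    rows = [line.split() for line in text.strip().splitlines()[1:]]
    return {cols[0]: cols[2] for cols in rows}

sample = """\
NAME                            READY   STATUS    RESTARTS   AGE
spark-master-6c4469fdb6-rs642   1/1     Running   0          114s
spark-worker-5d4bdd44db-p2q8v   1/1     Running   0          28s
spark-worker-5d4bdd44db-v4d84   1/1     Running   0          28s
"""

statuses = parse_kubectl_pods(sample)
assert all(status == "Running" for status in statuses.values())
```

In practice you would feed it the stdout of subprocess.run(["kubectl", "get", "pods"], capture_output=True, text=True) instead of the sample string.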

Ingress

Ingress is used to expose the Spark master web UI. (The manifest in the GitHub repository targets minikube; the setup here differs, and the ingress API version needs to be updated slightly.)
minikube-ingress.yaml (original version):

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: minikube-ingress
  annotations:
spec:
  rules:
  - host: spark-kubernetes
    http:
      paths:
      - path: /
        backend:
          serviceName: spark-master
          servicePort: 8080

minikube-ingress.yaml (modified version, for Kubernetes 1.19 and later; an ingress controller must be installed in advance):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minimal-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: spark-master
                port:
                  number: 8080

Create the ingress object:

$ kubectl apply -f ./kubernetes/minikube-ingress.yaml
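After applying the manifest, you can also probe the master web UI programmatically instead of opening a browser. This is a minimal sketch using only the Python standard library; the URL assumes Docker Desktop publishes the ingress on localhost, as in this walkthrough:

```python
# Return True if the given URL answers with HTTP 200, False otherwise.
import urllib.request

def ui_reachable(url, timeout=5):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, non-2xx, etc.
        return False

# ui_reachable("http://127.0.0.1/")  # True once the ingress is serving
```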

Visit http://127.0.0.1 to see the Spark master web UI.

Test

Start pyspark in the spark-master pod:

$ kubectl get pods

NAME                            READY   STATUS    RESTARTS   AGE
spark-master-6c4469fdb6-rs642   1/1     Running   0          10m
spark-worker-5d4bdd44db-p2q8v   1/1     Running   0          8m42s
spark-worker-5d4bdd44db-v4d84   1/1     Running   0          8m42s

$ kubectl exec spark-master-6c4469fdb6-rs642 -it -- pyspark
words = 'the quick brown fox jumps over the \
lazy dog the quick brown fox jumps over the lazy dog'
seq = words.split()
data = spark.sparkContext.parallelize(seq)
counts = data.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b).collect()
dict(counts)

You should then see the following result:

{'brown': 2, 'lazy': 2, 'over': 2, 'fox': 2, 'dog': 2, 'quick': 2, 'the': 4, 'jumps': 2}
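As a quick sanity check, the same count can be reproduced with the Python standard library alone, no Spark required (a plain-Python sketch of what the map plus reduceByKey pipeline computes here):

```python
# collections.Counter performs the same per-word aggregation that
# map + reduceByKey performs across the cluster.
from collections import Counter

words = ('the quick brown fox jumps over the '
         'lazy dog the quick brown fox jumps over the lazy dog')
counts = Counter(words.split())
assert counts['the'] == 4 and counts['fox'] == 2
```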

Origin blog.csdn.net/yao_zhuang/article/details/113920860