Fixing ValueError: Cannot run multiple SparkContexts at once; existing SparkContext

1. Problem Description

When creating a SparkContext and a SparkSession to connect to a Spark cluster, the following error is raised: ValueError: Cannot run multiple SparkContexts at once; existing SparkContext

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import min, max
from pyspark.sql.functions import monotonically_increasing_id, lit, col, struct
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from pyspark.sql.functions import udf
from pyspark import SparkContext, SparkConf
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, ArrayType, DoubleType, FloatType
from pyspark.ml.feature import Word2Vec
import os
import pandas as pd
import numpy as np
from tqdm import tqdm 
import heapq

# Create the SparkContext
# sc.stop()
# local[*] runs with the maximum number of local threads
conf = SparkConf().setAppName("data_process_first").setMaster("local[*]")
sc = SparkContext(conf=conf)

# Create the SparkSession to connect to the Spark cluster
spark = SparkSession.builder.appName('mypyspark_test1') \
        .master("local")\
        .config("spark.driver.memory","30G")\
        .config("spark.executor.memory","30G")\
        .getOrCreate()

2. Solution

A SparkContext has already been started and cannot be started a second time. The fix is to run sc.stop() on the line just before SparkConf(), which shuts down the existing SparkContext (the connection to the Spark cluster) so that a new one can be created.
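A minimal sketch of that fix, assuming the code runs in a notebook or REPL session where a previous run may have left a SparkContext named sc behind:

from pyspark import SparkContext, SparkConf

# Stop the SparkContext left over from a previous run (if any) before
# creating a new one; otherwise PySpark raises
# "ValueError: Cannot run multiple SparkContexts at once".
try:
    sc.stop()      # 'sc' may still exist from an earlier cell / run
except NameError:
    pass           # no SparkContext has been created in this session yet

conf = SparkConf().setAppName("data_process_first").setMaster("local[*]")
sc = SparkContext(conf=conf)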
A SparkContext represents a connection to a Spark cluster; on that connection you can create RDDs and broadcast variables. When creating it, at least two parameters should be provided: master (in the example above, local[*] runs with all available local threads) and appName. A small usage sketch follows.
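For illustration, assuming the sc created above, creating an RDD and a broadcast variable on that connection looks like this:

# Assuming 'sc' is the SparkContext created above
rdd = sc.parallelize([1, 2, 3, 4])          # distribute a local list as an RDD
lookup = sc.broadcast({"a": 1, "b": 2})     # ship a read-only dict to all executors

print(rdd.map(lambda x: x * 2).collect())   # [2, 4, 6, 8]
print(lookup.value["a"])                    # 1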

