Common PySpark errors, problems, and solutions [continuously updated]

1. Error: Py4JError: An error occurred while calling o46.fit


Environment: CentOS 7, Python 3.7, Spark 2.4.6, Java 1.8.0_211, Scala 2.11.12
Code that triggers the error:

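from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import HashingTF, Tokenizer

# Reuse the active session (e.g. in the pyspark shell) or create one.
spark = SparkSession.builder.getOrCreate()

# Training documents as (id, text, label) tuples.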
training = spark.createDataFrame([
    (0, "a b c d e spark", 1.0),
    (1, "b d", 0.0),
    (2, "spark f g h", 1.0),
    (3, "hadoop mapreduce", 0.0)
], ["id", "text", "label"])
training.show()

# Configure an ML pipeline with three stages: tokenizer, hashingTF, and lr.
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
lr = LogisticRegression(maxIter=10, regParam=0.001)
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])

model = pipeline.fit(training)

The error output looks roughly like this (it is long, so pick out the key lines yourself):

Exception happened during processing of request from ('127.0.0.1', 48756)
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/root/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:44278)
py4j.protocol.Py4JError: An error occurred while calling o46.fit
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:
py4j.protocol.Py4JError: An error occurred while calling o46.fit

During handling of the above exception, another exception occurred:

Py4JError: An error occurred while calling o46.fit

Analysis:
The error is raised when this line runs:

model = pipeline.fit(training)

The error occurred on an Alibaba Cloud ECS student-tier instance running CentOS. I tried many of the causes and fixes suggested online, and none of them worked. I then ran the same code on a local virtual machine, and it succeeded. The software environment of the local virtual machine is exactly the same as that of the Alibaba Cloud student instance; only the hardware configuration differs. The most likely cause is therefore that the instance's resources are too low. This fits the first message in the traceback: "Answer from Java side is empty" means the Python side lost its connection to the JVM, which is what you would see if the JVM process died partway through fit(), for example because it ran out of memory.
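One way to check the low-resource hypothesis before calling fit() is to see how much memory the machine actually has free. The following is a minimal sketch, not part of the original troubleshooting: it assumes a Linux host that exposes /proc/meminfo, it assumes memory is the limiting resource (the article only says the configuration is too low), and the helper names and the MIN_FREE_MIB threshold are hypothetical values chosen for illustration.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: KiB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_mib(meminfo):
    """Return MemAvailable in MiB, falling back to MemFree if it is absent."""
    kib = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    return kib // 1024


# Hypothetical threshold for illustration only; tune it for your workload.
MIN_FREE_MIB = 2048

if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        free = available_mib(parse_meminfo(f.read()))
    if free < MIN_FREE_MIB:
        print("Only %d MiB available; the Spark JVM may not survive fit()" % free)
    else:
        print("%d MiB available" % free)
```

If the reported value is small on the cloud instance and comfortable on the local virtual machine, that supports the conclusion above; if both are similar, the cause probably lies elsewhere.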

2. Error: NameError: name 'long' is not defined


Environment: CentOS 7, Python 3.7, Spark 2.4.6, Java 1.8.0_211, Scala 2.11.12

Cause: Python 3.x has no long type, only int. Python 2.x has both long and int, so code written for Python 2 that calls long fails under Python 3.

Fix: change long to int.
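Replacing long with int in the source is the cleanest fix. Where the offending code cannot be edited easily, a compatibility alias is a possible stopgap. The snippet below is a small sketch of that idea, assuming the code runs under Python 3; the variable name big is only for illustration.

```python
import sys

# Python 3 has no separate long type; int already has arbitrary precision.
if sys.version_info[0] >= 3:
    long = int  # compatibility alias for code that still calls long()

big = long("12345678901234567890")
print(type(big).__name__, big + 1)
```

Because Python 3's int is not limited to a fixed width, nothing is lost by the substitution.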

Origin blog.csdn.net/qq_42658739/article/details/107784679