1. Error: Py4JError: An error occurred while calling o46.fit
Environment: CentOS 7, Python 3.7, Spark 2.4.6, Java 1.8.0_211, Scala 2.11.12
Code that triggers the error:
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import HashingTF, Tokenizer

# Reuse the active session (the pyspark shell already provides `spark`).
spark = SparkSession.builder.getOrCreate()

# Training data: (id, text, label)
training = spark.createDataFrame([
    (0, "a b c d e spark", 1.0),
    (1, "b d", 0.0),
    (2, "spark f g h", 1.0),
    (3, "hadoop mapreduce", 0.0)
], ["id", "text", "label"])
training.show()

# Configure the three pipeline stages: tokenizer, hashingTF, and lr.
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
lr = LogisticRegression(maxIter=10, regParam=0.001)
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])

# Fit the pipeline to the training data (this is the call that fails).
model = pipeline.fit(training)
The error output is roughly as follows (it is long, so skim for the key lines):
Exception happened during processing of request from ('127.0.0.1', 48756)
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/root/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:44278)
py4j.protocol.Py4JError: An error occurred while calling o46.fit
IndexError: pop from an empty deque
During handling of the above exception, another exception occurred:
py4j.protocol.Py4JError: An error occurred while calling o46.fit
During handling of the above exception, another exception occurred:
Py4JError: An error occurred while calling o46.fit
Analysis:
The error is raised when this line runs:
model = pipeline.fit(training)
The error occurred on an Alibaba Cloud ECS instance (student tier) running CentOS. None of the causes and fixes I found online solved it. I then ran the same code on a local virtual machine, and it succeeded. The software environment of the local virtual machine is identical to that of the Alibaba Cloud student instance; only the hardware configuration differs. The most likely cause is therefore that the cloud instance's configuration is too low. This fits the traceback: "Answer from Java side is empty" means the Python side lost its connection to the JVM, which typically happens when the JVM process dies, for example because it ran out of memory.
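One way to test the low-configuration explanation is to compare the available memory on both machines before calling pipeline.fit. The following is a minimal sketch, assuming a Linux host that exposes /proc/meminfo. The helper names and the MIN_FREE_MIB threshold are hypothetical and chosen only for illustration; they are not part of Spark or Py4J.

```python
# Sketch: check available memory on a Linux host before launching Spark.
# MIN_FREE_MIB is a hypothetical threshold, not a documented Spark requirement.
MIN_FREE_MIB = 1024


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   12345 kB" style lines.
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_mib(text):
    """Return MemAvailable in MiB, falling back to MemFree if it is absent."""
    info = parse_meminfo(text)
    kb = info.get("MemAvailable", info.get("MemFree", 0))
    return kb // 1024


def read_available_mib(path="/proc/meminfo"):
    """Read the current value from the running Linux machine."""
    with open(path) as f:
        return available_mib(f.read())
```

Calling read_available_mib() on the cloud instance and on the local virtual machine, and comparing both results against a threshold such as MIN_FREE_MIB, would show whether memory is really where the two machines differ.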
2. Error: NameError: name 'long' is not defined
Environment: CentOS 7, Python 3.7, Spark 2.4.6, Java 1.8.0_211, Scala 2.11.12
Cause: Python 3.x has no long type, only int. Python 2.x has both long and int.
Fix: change long to int.
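If the code that uses long cannot be edited directly, or must keep running on Python 2 as well, a small compatibility alias is a common workaround. This is a minimal sketch; the variable name big is only for illustration.

```python
# Python 3 has no `long`; its `int` already has arbitrary precision.
try:
    long  # this name exists only on Python 2
except NameError:
    long = int  # compatibility alias for Python 3

# A value larger than a 64-bit integer still works with int on Python 3.
big = long("12345678901234567890")
```

On Python 3 the alias simply points at int, so existing calls such as long(x) keep working without further changes.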