[Error record] PySpark runtime error ( Did not find winutils.exe | HADOOP_HOME and hadoop.home.dir are unset )





1. Error message



Core error message:

  • WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException:
  • java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.

Running a PySpark computation task from PyCharm reports the following error:

D:\001_Develop\022_Python\Python39\python.exe D:/002_Project/011_Python/HelloPython/Client.py
23/08/01 11:25:24 WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/08/01 11:25:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
PySpark version :  3.4.1
File contents :  ['Tom Jerry', 'Tom Jerry Tom', 'Jack Jerry']
Flattened file contents :  ['Tom', 'Jerry', 'Tom', 'Jerry', 'Tom', 'Jack', 'Jerry']
Converted to 2-tuples :  [('Tom', 1), ('Jerry', 1), ('Tom', 1), ('Jerry', 1), ('Tom', 1), ('Jack', 1), ('Jerry', 1)]
D:\001_Develop\022_Python\Python39\Lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\shuffle.py:65: UserWarning: Please install psutil to have better support with spilling
D:\001_Develop\022_Python\Python39\Lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\shuffle.py:65: UserWarning: Please install psutil to have better support with spilling
D:\001_Develop\022_Python\Python39\Lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\shuffle.py:65: UserWarning: Please install psutil to have better support with spilling
D:\001_Develop\022_Python\Python39\Lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\shuffle.py:65: UserWarning: Please install psutil to have better support with spilling
Final word count :  [('Tom', 3), ('Jack', 1), ('Jerry', 3)]

Process finished with exit code 0
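The Client.py script itself is not shown, but the log above is clearly a classic word count. The plain-Python sketch below mirrors the three PySpark stages behind that output (flatMap, map, reduceByKey) on the same data, so each intermediate result can be compared with the log lines; it is an illustration of the data flow, not the original script:

```python
# Plain-Python sketch of the word-count pipeline whose intermediate
# results appear in the log above. In PySpark these stages would be
# flatMap, map, and reduceByKey on an RDD.

lines = ["Tom Jerry", "Tom Jerry Tom", "Jack Jerry"]

# flatMap: split each line into words and flatten the result
words = [word for line in lines for word in line.split(" ")]

# map: turn each word into a (word, 1) pair
pairs = [(word, 1) for word in words]

# reduceByKey: sum the counts per word
counts = {}
for word, n in pairs:
    counts[word] = counts.get(word, 0) + n

print(words)
print(pairs)
print(list(counts.items()))  # ordering may differ from Spark's partitioned output
```

Note that Spark's final ordering depends on partitioning, which is why the log shows `[('Tom', 3), ('Jack', 1), ('Jerry', 3)]` rather than insertion order.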






2. Solution (install Hadoop operating environment)



Core error message:

  • WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException:
  • java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.

PySpark normally relies on a Hadoop runtime environment; if one is not installed on Windows, the error above is reported.

Hadoop releases are available for download at https://hadoop.apache.org/releases.html;

The latest version is 3.3.6. Click the "binary (checksum signature)" link under Binary download to open the Hadoop 3.3.6 download page.

The download address is:

https://dlcdn.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz

Downloads from the official site can be very slow;


This article instead uses a Hadoop 3.3.4 + winutils bundle, originally offered as a free download on CSDN.

After downloading, unzip Hadoop; the installation path used here is D:\001_Develop\052_Hadoop\hadoop-3.3.4\hadoop-3.3.4;


In the system environment variables, set:

HADOOP_HOME = D:\001_Develop\052_Hadoop\hadoop-3.3.4\hadoop-3.3.4


Add the following entries to the Path environment variable:

%HADOOP_HOME%\bin
%HADOOP_HOME%\sbin
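If changing the system-wide environment variables is not an option (for example on a shared machine), the same variables can be set from inside the Python script before the SparkContext is created. This is a common workaround rather than part of the original steps, and the path below is the install location used in this article:

```python
import os

# Point PySpark at the Hadoop install from inside the script.
# This must run BEFORE the SparkContext / SparkSession is created,
# because the JVM reads HADOOP_HOME at startup.
os.environ["HADOOP_HOME"] = r"D:\001_Develop\052_Hadoop\hadoop-3.3.4\hadoop-3.3.4"

# Prepend Hadoop's bin directory to PATH so winutils.exe is found.
os.environ["PATH"] = (
    os.path.join(os.environ["HADOOP_HOME"], "bin") + os.pathsep + os.environ["PATH"]
)
```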


In the D:\001_Develop\052_Hadoop\hadoop-3.3.4\hadoop-3.3.4\etc\hadoop\hadoop-env.cmd script, set JAVA_HOME to the real JDK path. Change

set JAVA_HOME=%JAVA_HOME%

to

set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_91

(If the JDK path contains spaces, as "Program Files" does, hadoop-env.cmd may fail to parse it; a common fix is the 8.3 short form, e.g. C:\PROGRA~1\Java\jdk1.8.0_91.)


Copy hadoop.dll and winutils.exe from winutils-master\hadoop-3.3.0\bin into the C:\Windows\System32 directory;
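This copy step can also be scripted. The helper below is an illustrative sketch (the function name is mine, not the article's); on a real machine it would be called with the winutils-master\hadoop-3.3.0\bin folder and C:\Windows\System32, from an elevated prompt:

```python
import os
import shutil

def copy_native_libs(src_bin, dest_dir, names=("hadoop.dll", "winutils.exe")):
    """Copy Hadoop's Windows native binaries from src_bin into dest_dir."""
    for name in names:
        shutil.copy2(os.path.join(src_bin, name), os.path.join(dest_dir, name))

# Example (requires administrator rights on Windows):
# copy_native_libs(r"winutils-master\hadoop-3.3.0\bin", r"C:\Windows\System32")
```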


Restart the computer (a full restart is required so the new environment variables take effect everywhere);

Then, in a command prompt, run

hadoop version

to verify that Hadoop is installed (note: the subcommand is version, with no dash).
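The same check can be done programmatically before launching a PySpark job. The helper below is a sketch (its name is an assumption, not an official API): it reports which of the two conditions from the original error message, HADOOP_HOME set and winutils.exe present, are still unmet:

```python
import os

def hadoop_setup_problems(hadoop_home=None):
    """Return a list of problems with the local Hadoop setup (empty if OK)."""
    home = hadoop_home or os.environ.get("HADOOP_HOME")
    if not home:
        # Matches the "HADOOP_HOME and hadoop.home.dir are unset" error
        return ["HADOOP_HOME is not set"]
    problems = []
    if not os.path.isdir(home):
        problems.append("HADOOP_HOME does not point to a directory: " + home)
    winutils = os.path.join(home, "bin", "winutils.exe")
    if not os.path.isfile(winutils):
        # Matches the "Did not find winutils.exe" warning
        problems.append("winutils.exe not found at " + winutils)
    return problems
```

An empty return list means both conditions from the warning are satisfied.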


Origin blog.csdn.net/han1202012/article/details/132042385