Summary of common problems running Spark locally under Windows operating system


Preface

This post is part of my Spark Structured Streaming + Kafka + HBase (Scala) tutorial series; see the series index for the overall entry point.

Main text

There are many pitfalls when running Spark locally under Windows. Below is a summary of the main problems I have encountered; I will keep updating this post as I run into new ones.

1. winutils.exe

Exception:
Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
Solution:
This exception is harmless by itself and does not stop the program from running, so at first I just ignored it.
The usual advice online is to download winutils.exe, place it under C:\Windows\System32, and set the HADOOP_HOME environment variable. I tried everything suggested on the Internet, but none of it worked for me. ╮(╯▽╰)╭
Still, I did eventually solve this problem; see problem 3 for the specific fix.
Here are two download addresses that should be genuine:
1. HADOOP environment download address:
http://archive.apache.org/dist/hadoop/core/
After entering, find your corresponding version. The Hadoop version my cluster uses is 3.1.1, so I downloaded 3.1.1 from the following address:
http://archive.apache.org/dist/hadoop/core/hadoop-3.1.1/hadoop-3.1.1.tar.gz
2. winutils.exe download address. I am not sure whether this repository is official, but it really is complete:
https://github.com/cdarlint/winutils
3. I copied all the files from the bin directory of the winutils repository in step 2 into the bin directory of the hadoop-3.1.1 download from step 1, so my local environment should now be as complete as possible. I also copied winutils.exe and hadoop.dll into C:\Windows\System32, as suggested online.
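To sanity-check that layout before launching Spark, a small sketch like the following can be used. This is illustrative only: the `WinutilsCheck` object is my own, and the `C:\hadoop-3.1.1` path is just an example to adjust to wherever you unpacked Hadoop.

```scala
import java.io.File

object WinutilsCheck {
  // Returns true if <hadoopHome>\bin\winutils.exe exists as a regular file.
  def winutilsPresent(hadoopHome: String): Boolean =
    new File(new File(hadoopHome, "bin"), "winutils.exe").isFile

  def main(args: Array[String]): Unit = {
    val home = "C:\\hadoop-3.1.1" // example path: adjust to your Hadoop download
    if (winutilsPresent(home))
      println(s"winutils.exe found under $home\\bin")
    else
      println(s"winutils.exe NOT found under $home\\bin - Spark file operations may fail")
  }
}
```

Running this before starting the Spark job makes it obvious whether the "Could not locate executable" error is a missing-file problem or a configuration problem.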

2. null chmod 0644

Exception:
(null) entry in command string: null chmod 0644
Solution:
Download the hadoop.dll file and copy it to the C:\Windows\System32 directory. The download address from problem 1 contains both hadoop.dll and winutils.exe:
https://github.com/cdarlint/winutils

3. HADOOP_HOME

Exception:
Caused by: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
Solution:
The common advice online is to set the Windows environment variable HADOOP_HOME to point to the Hadoop environment from problem 1, but that never worked for me. Note that this problem only appeared after I introduced the HBase dependency; there was no such error before the following configuration was added to pom.xml:

		<dependency>
			<groupId>org.apache.hbase</groupId>
			<artifactId>hbase-client</artifactId>
			<version>2.2.3</version>
		</dependency>

With no other way around it, I also went back and fixed the winutils.exe setup from problem 1, which I had said I did not want to do. ╮(╯▽╰)╭ Unfortunately, nothing suggested on the Internet worked. So here is a little trick I worked out: change things directly according to the exception message. Since the exception says hadoop.home.dir is unset, I simply set it in the first line of my code.

object CommonTaskLocal {

  def main(args: Array[String]): Unit = {

    // This single line does the trick; download the Hadoop environment
    // from the address given in problem 1.
    System.setProperty("hadoop.home.dir", "C://hadoop-3.1.1")

    // ... rest of the program ...
  }
}
Before adding this line, the program failed with the exception above. After adding it, the results are printed directly, and the exception from problem 1, Could not locate executable null\bin\winutils.exe, is no longer raised either.
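A slightly more defensive variant of that one-line trick is to set hadoop.home.dir only when neither it nor the HADOOP_HOME environment variable is already defined. This is just a sketch under that assumption; the `HadoopHomeFix` object and the fallback path are my own illustrative names:

```scala
object HadoopHomeFix {
  // Sets hadoop.home.dir from HADOOP_HOME (or a fallback path) if it is
  // not already set, and returns the value that ends up in effect.
  def ensureHadoopHome(fallback: String): String = {
    val current = System.getProperty("hadoop.home.dir")
    if (current != null) current
    else {
      val fromEnv = sys.env.getOrElse("HADOOP_HOME", fallback)
      System.setProperty("hadoop.home.dir", fromEnv)
      fromEnv
    }
  }
}
```

Calling `HadoopHomeFix.ensureHadoopHome("C://hadoop-3.1.1")` as the first line of main keeps the hard-coded path out of the way on machines where HADOOP_HOME is configured properly.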

4. log4j.properties

By default, Structured Streaming picks up log4j.properties, not logback.xml. Why is this item on the list? Because when I run locally, Spark prints a huge amount of output I do not want to see, which makes it hard to follow what the program is actually doing, so the logging needs to be tamed. Below is my log4j.properties; after editing it, just put it in the resources folder.

# Set root logger level to ERROR and register the appenders.
log4j.rootLogger=ERROR, stdout, file, errorfile

# Other settings can basically be found online. Replace the package after
# log4j.logger. with your own program's package path - mine is under com.lwb.
log4j.logger.com.lwb=DEBUG


# stdout appender is set to be a ConsoleAppender.
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Threshold=DEBUG
# for debug trace
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
#log4j.appender.stdout.layout.ConversionPattern=%-4r [%t] %-5p %c{1} %x - %m%n
log4j.appender.stdout.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p [%t] %c{1} %M %L %x - %m%n
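To double-check that the file parses cleanly as a properties file and that the root logger level is what you expect, a quick sketch like this can inspect it. The `Log4jConfigCheck` object is a hypothetical helper of my own; for the real file you would pass the text read from `/log4j.properties` on the classpath instead of the inline sample:

```scala
import java.io.StringReader
import java.util.Properties

object Log4jConfigCheck {
  // Parses log4j-style properties text and returns the root logger setting.
  def rootLogger(configText: String): String = {
    val props = new Properties()
    props.load(new StringReader(configText))
    props.getProperty("log4j.rootLogger", "")
  }

  def main(args: Array[String]): Unit = {
    val sample = "log4j.rootLogger=ERROR, stdout\nlog4j.logger.com.lwb=DEBUG\n"
    println(rootLogger(sample)) // prints "ERROR, stdout"
  }
}
```

If `rootLogger` comes back empty, log4j is almost certainly not finding your file, and the flood of INFO output will remain.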

Origin: blog.csdn.net/lwb314/article/details/113941994