Problems I encountered with Spark, and their solutions (beginner-friendly)

Before today is over:

Here's wishing everyone a happy New Year!

Below are the problems I ran into while learning Spark recently, along with their solutions (beginner-friendly).

1. How do I start the Hadoop cluster?

Answer:

You can start everything at once with a single command: start-all.sh

You can also start it piece by piece, for example starting the YARN cluster (responsible for resource management) with the command: start-yarn.sh

and starting the HDFS cluster (the distributed storage system) with the command: start-dfs.sh
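For reference, a minimal sketch of the startup sequence (assuming the Hadoop sbin directory is on your PATH; jps is just one way to verify the daemons are up):

```bash
# Start HDFS (NameNode, DataNodes, SecondaryNameNode)
start-dfs.sh

# Start YARN (ResourceManager, NodeManagers)
start-yarn.sh

# Or start both at once
# start-all.sh

# List the Java daemons running on this node to verify
jps
```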

2. How do I start PySpark in local mode (in this case it runs on a single machine, not on a cluster)?

Answer:

Switch to /export/server/spark/bin (the path varies from person to person)

and in that directory run the command: ./pyspark

You can open node1:4040 in a browser to view the running status of the program.

Running the command with no extra options starts single-machine (local) mode by default.

Note that before launching PySpark, you must first start the Hadoop cluster.
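The whole sequence as commands, a minimal sketch (assuming the Spark installation lives at /export/server/spark, as above):

```bash
# Start the Hadoop cluster first (see question 1)
start-all.sh

# Launch the PySpark shell in local mode
cd /export/server/spark/bin
./pyspark

# While the shell is running, the web UI is available at http://node1:4040
```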

3. How do I start PySpark so that it runs on the Spark cluster?

Answer:

To run PySpark on the cluster, you need to pass in the cluster's address. In my setup, after starting the cluster I open the web page node1:8080, copy the master address shown there, and enter this command in the terminal: ./pyspark --master spark://node1:7077
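The same steps as commands, a sketch that assumes the Spark standalone master runs on node1 at the default port 7077:

```bash
# Start the Spark standalone cluster (master and workers), from the Spark directory
cd /export/server/spark
sbin/start-all.sh

# The master address is shown on the web UI at http://node1:8080

# Launch the PySpark shell against the cluster
bin/pyspark --master spark://node1:7077
```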

4. How do I start the YARN history server?

Answer:

Enter the command: mr-jobhistory-daemon.sh start historyserver

If you also want to start the Spark history server, enter this command (from the Spark directory): sbin/start-history-server.sh
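The two commands side by side (assuming both are run on the node that hosts the history servers):

```bash
# Start the MapReduce/YARN job history server
mr-jobhistory-daemon.sh start historyserver

# Start the Spark history server, from the Spark installation directory
cd /export/server/spark
sbin/start-history-server.sh
```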

5. How do I configure PyCharm to run Python code remotely with an interpreter on the Linux cluster over SSH?

Answer:

First connect to the corresponding node as the right user, then enter that user's password.

Then fill in the path of the Python interpreter on the Linux machine and apply; once it is confirmed, the setup has succeeded. The prerequisite is that you need PyCharm Professional (a cracked version also works).
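Before filling in the PyCharm dialog, it can help to verify the connection and find the interpreter path from a terminal. A sketch, where the user root and host node1 are placeholders for your own setup:

```bash
# Check that SSH login works for the node and user you plan to use
ssh root@node1 "hostname"

# Print the path of the remote Python interpreter to enter in PyCharm
ssh root@node1 "which python3"
```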

6. How do I submit a Spark application?

Answer:

Upload the program code to the server and submit it with the spark-submit client tool (see the sketch after these notes).

Notes:

  1. Do not set the master inside your code; if you do, the master passed to the spark-submit tool will have no effect, because the setting in code takes precedence.

  2. When submitting a program to run on the cluster, any file it reads must be at an address every machine can access, such as a path on HDFS. If the file sits on the local Linux filesystem, every machine must have a copy of it.
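A minimal end-to-end sketch of both notes in action; the script path, HDFS address, and master URL are placeholders for your own environment:

```bash
# A tiny PySpark word-count script; note that no master is set in the code (note 1)
cat > /tmp/wordcount.py <<'EOF'
from pyspark import SparkContext

sc = SparkContext(appName="wordcount")
counts = (sc.textFile("hdfs://node1:8020/input/words.txt")  # a path every machine can reach (note 2)
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))
print(counts.collect())
sc.stop()
EOF

# Submit it to the cluster; the master is chosen here, on the command line
spark-submit --master spark://node1:7077 /tmp/wordcount.py
```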

Finally:

I wish you all a happy Chinese New Year in 2023, the Year of the Rabbit! A new year begins, and we set out on the new journey together. May the new year bring you grand plans, a future bright as brocade, high spirits, good health, freedom from worry, and sudden good fortune, with good things hopping your way.

Well, that's all for today's sharing. If anything is unclear, or if I got something wrong, please let me know!

Private messages and comments are welcome!
