Common commands of Hadoop and YARN

1. Hadoop
official website: https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/FileSystemShell.html
The commands under hadoop fs are the most commonly used; see the official website above for the full list.
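For example, a few everyday hadoop fs commands (all documented at the link above; the paths here are hypothetical placeholders):
hadoop fs -ls /some/dir               # list a directory
hadoop fs -put local.txt /some/dir/   # upload a local file to HDFS
hadoop fs -get /some/dir/file.txt .   # download a file from HDFS
hadoop fs -rm /some/dir/file.txt      # delete a file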

Viewing .gz file content:

Without decompressing the whole file: hadoop fs -cat /hdfs_location/part-00000.gz | zcat | head -n 20
  or hadoop fs -cat /hdfs_location/part-00000.gz | zmore
Decompressing the whole file: hadoop fs -text /myfolder/part-r-00024.gz | tail

See: https://stackoverflow.com/questions/31968384/view-gzipped-file-content-in-hadoop

Viewing .bz2 file content:
Similar to viewing .gz; just replace zcat with bzcat (or zmore with bzmore).
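For example, reusing the hypothetical HDFS path from the .gz commands above:
hadoop fs -cat /hdfs_location/part-00000.bz2 | bzcat | head -n 20   # same idea, bzcat instead of zcat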

2. YARN
official website: https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnCommands.html
  • Kill an application: yarn application -kill application_1491058351375_633399
  • View logs: yarn logs -applicationId application_1491058351375_633399 | less
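Two more subcommands documented on the same official page (the application ID is just the example ID reused from above):
  • List running applications: yarn application -list
  • Check an application's status: yarn application -status application_1491058351375_633399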


3. Spark startup command
see: https://spark.apache.org/docs/1.6.1/running-on-yarn.html
Note one parameter: spark.yarn.executor.memoryOverhead, the off-heap memory allocated per executor for the JVM's own overheads (VM overheads, interned strings, etc.).
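For example, a sketch of a submission that raises the overhead to 1024 MB (the SparkPi class and examples jar ship with the Spark 1.6 distribution; substitute your own job):
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  lib/spark-examples*.jar 10   # jar path and arguments are placeholders; adjust for your job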

4. Spark local debugging
1. In the pom.xml of the Maven project, set the scope of all Spark packages to compile instead of provided, so that Spark is packaged into the jar (this also lets you run the jar from the command line; see the sketch after this list).
2. In IntelliJ IDEA, open Run -> Edit Configurations and set the JVM parameters to:
-Dspark.master=local[2] -Dspark.driver.memory=2g -Dspark.app.name=SparkPi
For Spark configuration properties, see: https://spark.apache.org/docs/latest/configuration.html#application-properties

3. Make sure the locally installed Scala version matches the version your Spark release requires:
  For Spark 1.6, install Scala 2.10.x.
  For Spark 2.x, install Scala 2.11.x.
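With the Spark dependencies compile-scoped (step 1), the same -D flags also work outside the IDE, since SparkConf picks up system properties starting with spark.; a minimal sketch, assuming a hypothetical fat jar and main class built with the Maven assembly plugin:
java -Dspark.master=local[2] -Dspark.driver.memory=2g -Dspark.app.name=SparkPi \
  -cp target/myapp-1.0-jar-with-dependencies.jar com.example.SparkPi   # jar and class names are placeholders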

5. Spark local debugging - method 2
1. Go to https://spark.apache.org/downloads.html and download spark-2.2.1-bin-hadoop2.7.tgz (or another pre-built version).
2. Unpack it to any folder and create a new Scala project in IDEA.
3. Add the jars folder under the unpacked path to File -> Project Structure -> Modules -> Dependencies in IDEA (it already contains the Hadoop, Spark, etc. jars).
After the above 3 steps you can run Spark programs locally.
4. (Optional) To fix the error that winutils.exe cannot be found on Windows: download winutils.exe from tree/master/hadoop-2.7.1/bin, put it in the spark_home/jars/bin/ folder, and set the HADOOP_HOME environment variable to point to the spark_home/jars folder (sketched below).
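A minimal sketch of step 4 in a Windows command prompt, assuming Spark was unpacked to the hypothetical path D:\spark-2.2.1-bin-hadoop2.7:
mkdir D:\spark-2.2.1-bin-hadoop2.7\jars\bin
copy winutils.exe D:\spark-2.2.1-bin-hadoop2.7\jars\bin\
setx HADOOP_HOME D:\spark-2.2.1-bin-hadoop2.7\jars
rem setx persists the variable for new processes; restart IDEA so it picks up HADOOP_HOME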
