Study Notes -- Spark Experiment 3 (Basics)

HDFS common operations:

1. Start Hadoop and create a user directory "/user/hadoop" in HDFS;

$cd /usr/local/hadoop
$./sbin/start-dfs.sh                      # start HDFS
$./bin/hdfs dfs -mkdir -p /user/hadoop    # create the user directory /user/hadoop in HDFS
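The same directory can also be created programmatically through the Hadoop FileSystem API. A minimal Scala sketch follows (the object name MkdirExample is illustrative, and fs.defaultFS is assumed to be hdfs://localhost:9000, matching the spark-shell example later in these notes):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object MkdirExample {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://localhost:9000")   // assumed NameNode address
    val fs = FileSystem.get(conf)
    fs.mkdirs(new Path("/user/hadoop"))                  // same effect as hdfs dfs -mkdir -p /user/hadoop
    fs.close()
  }
}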

 

 2. Create a text file named test.txt in the "/home/hadoop" directory on the local Linux file system, type some content into it, then upload it to the "/user/hadoop" directory in HDFS;

$cd /home/hadoop
$vim test.txt
# Enter some content in test.txt, then save and exit the vim editor
$cd /usr/local/hadoop
$./bin/hdfs dfs -put /home/hadoop/test.txt /user/hadoop
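If the upload needs to be done from code rather than the shell, FileSystem.copyFromLocalFile covers the same ground as hdfs dfs -put. A rough sketch (the object name PutExample is illustrative, NameNode address assumed as above):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object PutExample {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://localhost:9000")   // assumed NameNode address
    val fs = FileSystem.get(conf)
    // copy the local file into the HDFS user directory
    fs.copyFromLocalFile(new Path("/home/hadoop/test.txt"), new Path("/user/hadoop/test.txt"))
    fs.close()
  }
}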

 

 3. Download the test.txt file from the "/user/hadoop" directory in HDFS to the "/home/hadoop/download" directory on the local Linux file system;

$ cd /usr/local/hadoop
$./bin/hdfs dfs -get /user/hadoop/test.txt /home/hadoop/download
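The download can likewise be scripted with FileSystem.copyToLocalFile; a sketch is below (GetExample is an illustrative name, and the local download directory is assumed to be writable):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object GetExample {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://localhost:9000")   // assumed NameNode address
    val fs = FileSystem.get(conf)
    // copy the HDFS file down to the local download directory
    fs.copyToLocalFile(new Path("/user/hadoop/test.txt"), new Path("/home/hadoop/download/test.txt"))
    fs.close()
  }
}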

 

 4. Output the contents of the test.txt file under the "/user/hadoop" directory in HDFS to the terminal;

$ cd /usr/local/hadoop
$./bin/hdfs dfs -cat /user/hadoop/test.txt
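Programmatically, the file can be opened as a stream and copied to standard output. A minimal sketch (CatExample is an illustrative name):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.IOUtils

object CatExample {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://localhost:9000")   // assumed NameNode address
    val fs = FileSystem.get(conf)
    val in = fs.open(new Path("/user/hadoop/test.txt"))
    IOUtils.copyBytes(in, System.out, 4096, false)      // print the file contents to the terminal
    in.close()
    fs.close()
  }
}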

 

 5. Create a subdirectory named input under the "/user/hadoop" directory in HDFS, then copy the test.txt file from the "/user/hadoop" directory in HDFS to the "/user/hadoop/input" directory;

$ cd /usr/local/hadoop
$./bin/hdfs dfs -mkdir /user/hadoop/input
$./bin/hdfs dfs -cp /user/hadoop/test.txt /user/hadoop/input
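The same pair of operations can be expressed with mkdirs plus FileUtil.copy (CopyExample is an illustrative name; the final false keeps the source file in place, mirroring hdfs dfs -cp):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

object CopyExample {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://localhost:9000")   // assumed NameNode address
    val fs = FileSystem.get(conf)
    fs.mkdirs(new Path("/user/hadoop/input"))
    // copy within HDFS without deleting the source file
    FileUtil.copy(fs, new Path("/user/hadoop/test.txt"),
                  fs, new Path("/user/hadoop/input/test.txt"),
                  false, conf)
    fs.close()
  }
}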

 

 6. Delete the test.txt file under the "/user/hadoop" directory in HDFS, then delete the input subdirectory under "/user/hadoop" along with everything inside it.

$ cd /usr/local/hadoop
$./bin/hdfs dfs -rm /user/hadoop/test.txt
$./bin/hdfs dfs -rm -r /user/hadoop/input
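For completeness, both deletions map onto FileSystem.delete, where the boolean flag controls recursion (DeleteExample is an illustrative name):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object DeleteExample {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://localhost:9000")   // assumed NameNode address
    val fs = FileSystem.get(conf)
    fs.delete(new Path("/user/hadoop/test.txt"), false) // delete a single file
    fs.delete(new Path("/user/hadoop/input"), true)     // delete the directory and all of its contents
    fs.close()
  }
}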

 

 Reading file system data in Spark

1. In spark-shell, read the file "/home/hadoop/test.txt" from the local Linux file system, then count the number of lines in the file;

$ cd /usr/local/spark
$./bin/spark-shell
scala>val textFile=sc.textFile("file:///home/hadoop/test.txt")
scala>textFile.count()
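The same line count can also be packaged as a standalone Spark application rather than run interactively. A rough sketch in local mode (the object name LineCount and the master setting are illustrative, not a required part of the experiment):

import org.apache.spark.{SparkConf, SparkContext}

object LineCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LineCount").setMaster("local")
    val sc = new SparkContext(conf)
    val textFile = sc.textFile("file:///home/hadoop/test.txt")
    println("Line count: " + textFile.count())   // same result as the spark-shell session above
    sc.stop()
  }
}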

 

 2. In spark-shell, read the HDFS file "/user/hadoop/test.txt" (if the file does not exist, create it first), then count the number of lines in the file;

scala>val textFile=sc.textFile("hdfs://localhost:9000/user/hadoop/test.txt")
scala>textFile.count()
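As a side note, if spark-shell picks up the Hadoop configuration (so fs.defaultFS already points at hdfs://localhost:9000), the scheme and host portion of the URI can usually be omitted; whether the shortened form resolves depends on that configuration being in place:

scala>val textFile=sc.textFile("/user/hadoop/test.txt")
scala>textFile.count()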

 
