Common HDFS operations:
1. Start Hadoop and create a user directory "/user/hadoop" in HDFS;
$ cd /usr/local/hadoop
$ ./sbin/start-dfs.sh                     # start HDFS
$ ./bin/hdfs dfs -mkdir -p /user/hadoop   # create the user directory /user/hadoop in HDFS
2. On the Linux local file system, create a new text file test.txt under the "/home/hadoop" directory, type some content into it, and then upload it to the "/user/hadoop" directory in HDFS;
$ cd /home/hadoop
$ vim test.txt        # enter some content in test.txt, then save and exit vim
$ cd /usr/local/hadoop
$ ./bin/hdfs dfs -put /home/hadoop/test.txt /user/hadoop
3. Download the test.txt file from the "/user/hadoop" directory in HDFS to the "/home/hadoop/download" directory on the Linux local file system;
$ cd /usr/local/hadoop
$ ./bin/hdfs dfs -get /user/hadoop/test.txt /home/hadoop/download
4. Output the contents of the test.txt file under the "/user/hadoop" directory in HDFS to the terminal;
$ cd /usr/local/hadoop
$./bin/hdfs dfs -cat /user/hadoop/test.txt
5. Under the "/user/hadoop" directory in HDFS, create a subdirectory named input, and copy the test.txt file from the "/user/hadoop" directory in HDFS to the "/user/hadoop/input" directory;
$ cd /usr/local/hadoop
$ ./bin/hdfs dfs -mkdir /user/hadoop/input
$ ./bin/hdfs dfs -cp /user/hadoop/test.txt /user/hadoop/input
6. Delete the test.txt file in the "/user/hadoop" directory of HDFS, and delete the input subdirectory under "/user/hadoop" in HDFS together with all of its contents.
$ cd /usr/local/hadoop
$ ./bin/hdfs dfs -rm /user/hadoop/test.txt
$ ./bin/hdfs dfs -rm -r /user/hadoop/input
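The hdfs dfs subcommands used above (-mkdir, -put/-get, -cat, -cp, -rm) mirror ordinary POSIX file operations. As a rough sketch only, the whole sequence can be rehearsed against a temporary local directory standing in for the HDFS namespace; the temporary paths and the sample file content below are illustrative assumptions, not part of the exercise:

```shell
set -e
FS=$(mktemp -d)                                  # local stand-in for the HDFS root
mkdir -p "$FS/user/hadoop"                       # ~ hdfs dfs -mkdir -p /user/hadoop
printf 'hello hdfs\n' > "$FS/user/hadoop/test.txt"          # ~ hdfs dfs -put
mkdir -p "$FS/home/hadoop/download"
cp "$FS/user/hadoop/test.txt" "$FS/home/hadoop/download/"   # ~ hdfs dfs -get
CONTENT=$(cat "$FS/user/hadoop/test.txt")        # ~ hdfs dfs -cat
echo "$CONTENT"
mkdir "$FS/user/hadoop/input"                    # ~ hdfs dfs -mkdir .../input
cp "$FS/user/hadoop/test.txt" "$FS/user/hadoop/input/"      # ~ hdfs dfs -cp
rm "$FS/user/hadoop/test.txt"                    # ~ hdfs dfs -rm
rm -r "$FS/user/hadoop/input"                    # ~ hdfs dfs -rm -r
REMAINING=$(ls "$FS/user/hadoop")                # directory is now empty
rm -rf "$FS"
```

This only rehearses the directory layout; the real commands go through the HDFS client and require the NameNode started by start-dfs.sh.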
Reading file system data in Spark:
1. In spark-shell, read the file "/home/hadoop/test.txt" from the Linux local file system, then count the number of lines in the file;
$ cd /usr/local/spark
$ ./bin/spark-shell
scala> val textFile = sc.textFile("file:///home/hadoop/test.txt")
scala> textFile.count()
2. In spark-shell, read the HDFS file "/user/hadoop/test.txt" (if the file does not exist, create it first), then count the number of lines in the file;
scala> val textFile = sc.textFile("hdfs://localhost:9000/user/hadoop/test.txt")
scala> textFile.count()
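In both steps, textFile.count() returns the number of lines in the file. Outside spark-shell, the same count can be cross-checked with wc -l; a minimal sketch using a throwaway local file (the path /tmp/test_count.txt and its contents are assumptions for illustration, not the exercise's file):

```shell
# Create a three-line sample file and count its lines,
# mirroring what textFile.count() reports in spark-shell.
printf 'line one\nline two\nline three\n' > /tmp/test_count.txt
LINES=$(wc -l < /tmp/test_count.txt)
echo "$LINES"                 # prints the line count (3)
rm /tmp/test_count.txt
```

Note that wc -l counts newline characters, so a file whose last line lacks a trailing newline is still counted as a line by Spark but not by wc -l.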