Shell in HDFS


Shell: software that provides an interface through which users interact with the system.
Shells are divided into:

  • GUI Shell
  • Command-line Shell

Commands in the shell

  • -ls: list the contents of the specified path
  • -du: show the size of each file in the directory
  • -mv: move files
  • -cp: copy files
  • -rm: delete files (use -rm -r for directories, or -rmdir for empty directories)
  • -cat: print the contents of a file
  • -text: output the source file in text format
  • -mkdir: create an empty directory
  • -put: upload local files to HDFS
  • -help: show command help
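
The commands above can be tried in sequence with a small script like the following. This is only a sketch: the /demo paths and the access.log file name are hypothetical, and the hdfs_dfs wrapper with its DRY_RUN switch is a helper introduced here so that the script prints the commands instead of running them when no Hadoop client is available.

```shell
#!/bin/bash
# Walk through the commands listed above. All paths and file names are
# hypothetical. By default each command is only printed; set DRY_RUN=0 on a
# machine with a configured Hadoop client to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
hdfs_dfs() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "hdfs dfs $*"
    else
        hdfs dfs "$@"
    fi
}

hdfs_dfs -mkdir -p /demo/input                        # create a directory and its parents
hdfs_dfs -put ./access.log /demo/input                # upload a local file
hdfs_dfs -ls /demo/input                              # list the directory
hdfs_dfs -du -h /demo/input                           # show sizes in human-readable units
hdfs_dfs -cat /demo/input/access.log                  # print the file contents
hdfs_dfs -text /demo/input/access.log                 # print the file in text format
hdfs_dfs -cp /demo/input/access.log /demo/copy.log    # copy within HDFS
hdfs_dfs -mv /demo/copy.log /demo/moved.log           # move (rename) within HDFS
hdfs_dfs -rm /demo/moved.log                          # delete a file
hdfs_dfs -rm -r /demo                                 # delete a directory recursively
hdfs_dfs -help ls                                     # show help for one command
```

In dry-run mode the script simply echoes each `hdfs dfs ...` line, which makes it easy to review the sequence before running it against a real cluster.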

Case: a shell script periodically collects data into HDFS

Steps:

  • Configure environment variables: set the Java and Hadoop environment variables inside the shell script
    (this makes the script more reliable, since it can still run on a machine where these environment variables have not been configured)
  • Prepare the log storage directory and the to-be-uploaded directory: define both in the shell script
  • Set the upload path for the log files
  • Implement file upload: first move the files into the to-be-uploaded directory, then upload them from there to HDFS (use a Linux crontab expression to schedule the task; the five fields * * * * * are minute, hour, day of month, month, and day of week)
  • Run the program and display the results
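
The steps above can be sketched as one script. Everything concrete in it is an assumption made for illustration: the installation paths, the directory names, the access.log.* file pattern, the per-day HDFS directory, and the HDFS_CMD override, which exists only so the logic can be dry-run without a cluster.

```shell
#!/bin/bash
# Sketch of a periodic log collector. All paths and names are assumptions.

# 1. Environment variables, set here so the script also works when started
#    by cron, which does not load the user's login profile.
export JAVA_HOME="${JAVA_HOME:-/export/servers/jdk}"
export HADOOP_HOME="${HADOOP_HOME:-/export/servers/hadoop-2.7.4}"
export PATH="$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH"

# 2. Local directories: where logs are written, and where they wait for upload.
LOG_DIR="${LOG_DIR:-/export/data/logs/log}"
STAGING_DIR="${STAGING_DIR:-/export/data/logs/toupload}"

# 3. Upload path in HDFS; one sub-directory per day is created below it.
HDFS_ROOT="${HDFS_ROOT:-/data/clickLog}"

# Command used to talk to HDFS; override with HDFS_CMD=echo for a dry run.
HDFS_CMD="${HDFS_CMD:-hdfs}"

# 4a. Move rolled-over logs (access.log.1, access.log.2, ...) to staging,
#     appending a timestamp so names never collide.
stage_logs() {
    local stamp f
    mkdir -p "$STAGING_DIR"
    stamp=$(date +%Y_%m_%d_%H_%M_%S)
    for f in "$LOG_DIR"/access.log.*; do
        [ -f "$f" ] || continue          # the glob matched nothing
        mv "$f" "$STAGING_DIR/$(basename "$f").$stamp"
    done
}

# 4b. Upload every staged file; delete the local copy only on success.
# 5.  Print one result line per file.
upload_staged() {
    local target f
    target="$HDFS_ROOT/$(date +%Y_%m_%d)"
    "$HDFS_CMD" dfs -mkdir -p "$target" || return 1
    for f in "$STAGING_DIR"/*; do
        [ -f "$f" ] || continue
        if "$HDFS_CMD" dfs -put "$f" "$target/"; then
            echo "uploaded $(basename "$f") to $target"
            rm -f "$f"
        else
            echo "failed to upload $f" >&2
        fi
    done
}
```

To make the script do its work, add `stage_logs && upload_staged` as its last line. Saved under a hypothetical path such as /export/data/logs/upload2HDFS.sh, it could then be scheduled with a crontab entry like `0 0 * * * /export/data/logs/upload2HDFS.sh`, which runs it every day at midnight. Staging the files first means a log that is still being written is never uploaded half-finished.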
1. Build the project environment (new Maven project)

If Maven is not installed, an error is reported at this point (you can click OK and ignore it for now).

Install Maven

Click Download Maven to download it, unzip it into a folder, and configure the environment variables, adding Maven to the Path. After configuration, open cmd and view the version information (for example with mvn -v).
Under the Maven installation directory, create a new directory to serve as the local repository, then modify the settings file to change the repository path to it.
Open Eclipse, apply the Maven configuration, and restart Eclipse.
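
As an illustration of the repository-path change, the sketch below writes a minimal settings file containing only that setting. It assumes a Bash-compatible shell (on Windows, for example, Git Bash); REPO_DIR, SETTINGS_FILE, and the write_settings helper are hypothetical names, and in practice you would edit the existing settings file of your Maven installation rather than generate a new one.

```shell
#!/bin/bash
# Write a minimal Maven settings file that relocates the local repository.
# REPO_DIR and SETTINGS_FILE are hypothetical; adjust them to your machine.
REPO_DIR="${REPO_DIR:-$HOME/maven-repository}"
SETTINGS_FILE="${SETTINGS_FILE:-./settings-example.xml}"

write_settings() {
    cat > "$SETTINGS_FILE" <<EOF
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <!-- Directory where Maven stores downloaded artifacts -->
  <localRepository>$REPO_DIR</localRepository>
</settings>
EOF
}

write_settings
echo "wrote $SETTINGS_FILE"
```

Maven can be pointed at such a file explicitly with its -s option, for example `mvn -s ./settings-example.xml -v`.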

Connecting to the Hadoop cluster nodes

After the new Maven project is created, it may show an error; this is expected until the dependencies are configured.
Configure the pom.xml file:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.itcast</groupId>
  <artifactId>HadoopDemo</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  
  <dependencies>
  	<dependency>
  		<groupId>org.apache.hadoop</groupId>
  		<artifactId>hadoop-common</artifactId>
  		<version>2.7.4</version>
  	</dependency>
  	<dependency>
  		<groupId>org.apache.hadoop</groupId>
  		<artifactId>hadoop-hdfs</artifactId>
  		<version>2.7.4</version>
  	</dependency>
  	<dependency>
  		<groupId>org.apache.hadoop</groupId>
  		<artifactId>hadoop-client</artifactId>
  		<version>2.7.4</version>
  	</dependency>
  	<dependency>
  		<groupId>junit</groupId>
  		<artifactId>junit</artifactId>
  		<version>4.12</version>
  	</dependency>
  	<!-- tools.jar ships only with JDK 8 and earlier, so JAVA_HOME must point to such a JDK -->
  	<dependency>
  		<groupId>jdk.tools</groupId>
  		<artifactId>jdk.tools</artifactId>
  		<version>1.8</version>
  		<scope>system</scope>
  		<systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
  	</dependency>
  </dependencies>
  
</project>

After the configuration is complete, Maven downloads the required JAR packages automatically and the project error disappears.

2. Prepare the log storage directory and the files to be uploaded
3. Set the log file upload path
4. Implement file upload
5. Run the program and display the results

Origin blog.csdn.net/id__39/article/details/104905028