Shell: the software layer through which users interact with the system. Shells fall into two
categories:
- GUI Shell
- Command-line Shell
Common commands in the HDFS shell
- -ls: list the contents of the specified path
- -du: show the size of each file in a directory
- -mv: move files
- -cp: copy files
- -rm: delete files or empty folders
- -cat: print the contents of a file
- -text: output a source file in text format
- -mkdir: create an empty folder
- -put: upload local files to HDFS
- -help: show help for a command
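These commands are invoked through hadoop fs. A few illustrative invocations, roughly in the order you would use them when staging data; all paths here are examples, not part of the original text:

```shell
hadoop fs -mkdir -p /data/logs          # create the target directory (and parents)
hadoop fs -put access.log /data/logs    # upload a local file into HDFS
hadoop fs -ls /data/logs                # list the directory contents
hadoop fs -du -h /data                  # show file sizes, human-readable
hadoop fs -cat /data/logs/access.log    # print the file's contents
hadoop fs -cp /data/logs/access.log /backup/   # copy within HDFS
hadoop fs -mv /backup/access.log /archive/     # move/rename within HDFS
hadoop fs -rm /archive/access.log       # delete a file
hadoop fs -help put                     # help for a single command
```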
Case study: using a shell script to collect data into HDFS on a schedule
Steps:
- Configure environment variables: set the Java and Hadoop environment variables inside the shell script itself (this improves reliability: a machine whose global environment variables are not configured can still run the script)
- Prepare directories: define a log storage directory and a to-upload staging directory in the script
- Set the full paths of the log files
- Implement the upload: first move each file into the to-upload directory, then upload it from there to HDFS (schedule the script with a Linux crontab expression; the five fields * * * * * mean minute, hour, day of month, month, and day of week)
- Run the script and check the results
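The steps above can be sketched as a single script. Every path, directory name, and version number below is an assumption for illustration — adjust them to your own layout:

```shell
#!/bin/bash
# Sketch of a scheduled log-collection script (all paths are assumptions).
# Environment variables are set here so that cron's bare environment can
# still run the script.
export JAVA_HOME=${JAVA_HOME:-/usr/local/jdk1.8.0_161}
export HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop-2.7.4}
export PATH="$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH"

LOG_DIR=/tmp/app-logs                     # directory the application logs into
TOUPLOAD_DIR=/tmp/toupload                # staging directory for files to upload
HDFS_DIR="/data/logs/$(date +%Y_%m_%d)"   # one HDFS directory per day

mkdir -p "$LOG_DIR" "$TOUPLOAD_DIR"

# 1. Move rotated log files (access.log.1, access.log.2, ...) into staging;
#    the live access.log is left alone so writes are not disturbed.
for f in "$LOG_DIR"/access.log.*; do
  [ -e "$f" ] && mv "$f" "$TOUPLOAD_DIR/"
done

# 2. Upload each staged file to HDFS, then delete the local copy.
for f in "$TOUPLOAD_DIR"/*; do
  [ -e "$f" ] || continue
  hadoop fs -mkdir -p "$HDFS_DIR"
  hadoop fs -put "$f" "$HDFS_DIR" && rm -f "$f"
done
```

A crontab entry such as * * * * * /path/to/upload2hdfs.sh (the script path is a placeholder) runs it every minute; change the five time fields for a different schedule.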
1. Build the project environment (new Maven project)
Eclipse may report an error because Maven is not installed (you can also click OK first and ignore it).
Install Maven
- Click Download Maven, download it, and unzip it into a folder, then configure the environment variables.
- After adding Maven to the Path configuration, open cmd and check the version information.
- Under the Maven installation directory, create a new repository directory and edit the settings file to change the local repository path to it.
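The settings file in question is settings.xml under Maven's conf directory. A minimal sketch — the repository path shown is only an example:

```xml
<!-- conf/settings.xml: point the local repository at a custom directory -->
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <localRepository>D:\maven\repository</localRepository>
</settings>
```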
- Open the Maven configuration in Eclipse, point it at this installation, and restart Eclipse.
Connect to the Hadoop cluster nodes.
After the new Maven project is created, you may still see errors; don't worry.
Configure the pom.xml file:
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.itcast</groupId>
  <artifactId>HadoopDemo</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.7.4</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>2.7.4</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.7.4</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>RELEASE</version>
    </dependency>
    <dependency>
      <groupId>jdk.tools</groupId>
      <artifactId>jdk.tools</artifactId>
      <version>1.8</version>
      <scope>system</scope>
      <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
    </dependency>
  </dependencies>
</project>
Once the configuration is complete, the required jar packages are downloaded automatically and the errors disappear.