Foreword:
Main application scenarios of Hadoop:
Data analysis platforms;
Recommendation systems;
Underlying storage for business systems;
Business monitoring systems.
Practical applications: e-commerce, energy mining, energy saving, online travel, fraud detection, image processing, IT security, etc.
My school offers a course on building and using Hadoop, with a textbook printed in March 2020. Since the ecosystem keeps developing, the build process and its applications will continue to change, so I am writing the learning process down here as a record.
Tools and environment used in this section:
VMware virtual machine, Ubuntu system, JDK package (jdk-8u171), Hadoop package (hadoop-2.7.3)
Auxiliary tools: Xshell 6 and Xftp (for how to use these two tools, please see my other blog posts)
Preliminary environment preparation:
CPU: enable CPU virtualization for the virtual machine
Network: use NAT mode
Host network: configure the host's network adapter to match NAT mode
The following is an offline installation tutorial.
Enable CPU virtualization:
Enable the network adapter:
Configure the network adapter:
Configure the NAT network in the virtual machine:
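Once the settings above are applied, the CPU part can be sanity-checked from inside the Ubuntu guest (my own quick check, not from the textbook):

```shell
# Count the CPU virtualization flags visible to the guest
# (vmx = Intel VT-x, svm = AMD-V); a value above 0 means the
# virtualization setting reached the guest, 0 means it did not.
grep -Ec '(vmx|svm)' /proc/cpuinfo || true
```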
Connect to the server with Xshell and Xftp (convenient for the later operations).
Use the tools to create folders on the server and upload the packages (the HBase archive shown here is stored temporarily in the /home directory; it will be explained later in the HBase configuration section).
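The folder layout assumed by the rest of this post can be created first (a sketch; the filenames in the comments match the archives used in later steps):

```shell
# One directory per package under /home, as used in the steps below
mkdir -p /home/jdk /home/hadoop
# Then upload the archives into them with Xftp, e.g.
#   /home/jdk/jdk-8u171-linux-x64.tar.gz
#   /home/hadoop/hadoop-2.7.3.tar.gz
```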
Building Hadoop first requires configuring the JDK environment:
Decompress the JDK archive just uploaded, keeping the default extraction directory:
root@user01:/home# cd jdk/
root@user01:/home/jdk# ls
jdk-8u171-linux-x64.tar.gz
root@user01:/home/jdk# tar -xzvf jdk-8u171-linux-x64.tar.gz
Configure jdk environment variables:
export JAVA_HOME=/home/jdk/jdk1.8.0_171
export CLASSPATH=$JAVA_HOME/lib/
export PATH=$JAVA_HOME/bin:$PATH
export PATH JAVA_HOME CLASSPATH
root@user01:/home/jdk# vi /root/.bashrc
(Add the above variables at the end of the file)
Make environment variables take effect:
root@user01:/home/jdk# source /root/.bashrc
Check whether the jdk is installed properly:
java -version          # check the Java version
echo $JAVA_HOME        # verify the variable's value
That completes the preliminary preparation; Hadoop can now be installed.
root@user01:/home# cd hadoop/
root@user01:/home/hadoop# ls
hadoop-2.7.3.tar.gz
root@user01:/home/hadoop# tar -xzvf hadoop-2.7.3.tar.gz
Configure the Hadoop environment variables (the Hadoop search path):
root@user01:/home# vi /root/.bashrc
export HADOOP_HOME=/home/hadoop/hadoop-2.7.3
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib/
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Make the variables take effect:
root@user01:/home# source /root/.bashrc
Check whether the installation is normal:
root@user01:/home# hadoop version
The output is roughly:
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /home/hadoop/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar
Pseudo-distributed mode configuration:
Modify Hadoop configuration file
1. Modify /home/hadoop/hadoop-2.7.3/etc/hadoop/hadoop-env.sh
vi hadoop-env.sh
export JAVA_HOME=/home/jdk/jdk1.8.0_171   # the line to add
Delete the highlighted default JAVA_HOME path in the file:
Paste the JDK path in its place:
2. Configure /home/hadoop/hadoop-2.7.3/etc/hadoop/core-site.xml, add the following content
vi core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
Fill in the code above at the position shown in the picture.
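For reference, Hadoop's *-site.xml files require all properties to sit inside a single <configuration> element; a complete minimal core-site.xml would look like this (the same wrapper applies to hdfs-site.xml below):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/tmp</value>
    <description>Abase for other temporary directories.</description>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
```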
3. Configure /home/hadoop/hadoop-2.7.3/etc/hadoop/hdfs-site.xml
vi hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/tmp/dfs/data</value>
</property>
4. Format the namenode
hadoop namenode -format
Check whether it is successful:
"Successfully formatted" together with "Exiting with status 0" indicates success; "Exiting with status 1" indicates a formatting error.
5. Manually start namenode, datanode, secondarynamenode
start-dfs.sh
root@user01:/home/hadoop/hadoop-2.7.3/etc/hadoop# start-dfs.sh
Starting namenodes on [localhost]
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:No+kRFk4mIW/DdRFxPw7Y1ylSLKji1k3lzWBcklqDmA.
Are you sure you want to continue connecting (yes/no)? ys
Please type 'yes' or 'no': yes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
root@localhost's password:
localhost: starting namenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-root-namenode-user01.out
root@localhost's password:
localhost: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-root-datanode-user01.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:No+kRFk4mIW/DdRFxPw7Y1ylSLKji1k3lzWBcklqDmA.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
[email protected]'s password:
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-user01.out
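The repeated password prompts above appear because start-dfs.sh logs in to localhost over SSH once per daemon. They can be avoided with key-based login (an optional extra step, not in the original textbook):

```shell
# Create a key pair without a passphrase (skipped if one already exists)
mkdir -p ~/.ssh && chmod 700 ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# Authorize the key for logins to this machine
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Afterwards, "ssh localhost" should not ask for a password
```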
6. Manually start YARN (ResourceManager and NodeManager)
start-yarn.sh
7. Use jps to view the started processes
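If everything came up, jps should show one Java process per daemon (a typical listing for this pseudo-distributed setup; PIDs and order will differ):

```shell
jps
# Typical entries:
#   NameNode
#   DataNode
#   SecondaryNameNode
#   ResourceManager
#   NodeManager
#   Jps
```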
8. Enter the URL http://localhost:50070 (the NameNode web UI) in a browser to test
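Finally, a quick HDFS smoke test can confirm the filesystem is usable (my own check, not from the textbook; assumes the daemons above are running, and the target path is my choice):

```shell
# Create a home directory in HDFS, copy a config file in, and list it
hdfs dfs -mkdir -p /user/root
hdfs dfs -put $HADOOP_HOME/etc/hadoop/core-site.xml /user/root
hdfs dfs -ls /user/root
```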
Summary:
The overall installation process may seem cumbersome. After the teacher demonstrated it once, I tried building it myself. When I first learned Hadoop, I did not know how to make sense of its structure and principles; when I asked the teacher what Hadoop can actually do, the teacher said I would find out later. Maybe that is because this is not a course in my major, haha.
Searching Baidu and reading blogs by more experienced people, I found one that likened Hadoop to chopping vegetables and cooking, a metaphor that made the abstract ideas much easier to understand.
Be a rookie and learn with an open mind.