Hadoop Big Data Platform Construction and Application (1): Hadoop Installation

Foreword:

Main application scenarios of Hadoop:

Data analysis platforms;

Recommendation systems;

Underlying storage for business systems;

Business monitoring systems.

Practical applications: e-commerce, energy extraction, energy saving, online travel, fraud detection, image processing, IT security, etc.

My school offered a course on building and applying Hadoop, with a textbook printed in March 2020. Since the ecosystem keeps being updated and developed, the setup process and applications will continue to change, so I am writing down my learning process here as a record.

Tools and environment used in this section:

VMware virtual machine, Ubuntu, JDK package (jdk-8u171-linux-x64.tar.gz), Hadoop package (hadoop-2.7.3.tar.gz)

Auxiliary tools: Xshell 6 and Xftp (for how to use these two tools, please see my other blog posts)

Preliminary environment preparation:

CPU: enable CPU virtualization for the virtual machine

Network: enable NAT mode

Host network: configure the host network adapter to match NAT mode

The following walkthrough uses an offline installation.

Enable CPU virtualization:

Enable the network adapter:

Configure the network adapter:

Configure the NAT network in the virtual machine:

Connect Xshell and Xftp to the server (this makes our later operations more convenient).

Use the tools to create the directories on the server and upload the packages (the HBase tarball shown here is temporarily stored in the /home directory; the HBase configuration process will be explained later).
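If you prefer a terminal to Xftp, the same upload can be done with mkdir and scp. This is only a convenience sketch: the address 192.168.1.100 is a placeholder for your VM's NAT IP, and the tarballs are assumed to sit in the current directory on the real host.

root@user01:~# mkdir -p /home/jdk /home/hadoop   # on the server: create the target directories

scp jdk-8u171-linux-x64.tar.gz [email protected]:/home/jdk/   # run on the real host

scp hadoop-2.7.3.tar.gz [email protected]:/home/hadoop/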

Before building Hadoop, the JDK environment must be configured:

Extract the JDK package that was just uploaded, without changing the extraction directory:

root@user01:/home# cd jdk/

root@user01:/home/jdk# ls

jdk-8u171-linux-x64.tar.gz

root@user01:/home/jdk# tar -xzvf jdk-8u171-linux-x64.tar.gz

Configure the JDK environment variables:

export JAVA_HOME=/home/jdk/jdk1.8.0_171

export CLASSPATH=$JAVA_HOME/lib/

export PATH=$JAVA_HOME/bin:$PATH

export PATH JAVA_HOME CLASSPATH

root@user01:/home/jdk# vi /root/.bashrc

(Add the above variables at the end of the file)
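Equivalently, if you would rather not open vi, the same lines can be appended non-interactively with a heredoc (just a convenience sketch; quoting 'EOF' keeps the $ variables literal in the file):

root@user01:/home/jdk# cat >> /root/.bashrc <<'EOF'

export JAVA_HOME=/home/jdk/jdk1.8.0_171

export CLASSPATH=$JAVA_HOME/lib/

export PATH=$JAVA_HOME/bin:$PATH

EOF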

Make environment variables take effect:

root@user01:/home/jdk# source /root/.bashrc

Check whether the JDK is installed properly:

java -version   # check the Java version

echo $JAVA_HOME   # verify the variable value
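On this setup the output should look roughly like the following (the build numbers may differ for other 8u releases):

java version "1.8.0_171"

Java(TM) SE Runtime Environment (build 1.8.0_171-b11)

Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)

/home/jdk/jdk1.8.0_171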

The preliminary preparations are now complete, and we are ready to install Hadoop:


root@user01:/home# cd hadoop/

root@user01:/home/hadoop# ls

hadoop-2.7.3.tar.gz

root@user01:/home/hadoop# tar -xzvf hadoop-2.7.3.tar.gz

Configure the Hadoop environment variables:

root@user01:/home# vi /root/.bashrc

Append the Hadoop search paths at the end of the file:

export HADOOP_HOME=/home/hadoop/hadoop-2.7.3

export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib/

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Make the variables take effect:

root@user01:/home# source /root/.bashrc

Check whether the installation works:

root@user01:/home# hadoop version

The output should look roughly like this:

Hadoop 2.7.3

Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff

Compiled by root on 2016-08-18T01:41Z

Compiled with protoc 2.5.0

From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4

This command was run using /home/hadoop/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar

Pseudo-distributed mode configuration:

Modify the Hadoop configuration files

1. Modify /home/hadoop/hadoop-2.7.3/etc/hadoop/hadoop-env.sh

vi hadoop-env.sh

export JAVA_HOME=/home/jdk/jdk1.8.0_171   # the variable line to set

In hadoop-env.sh, locate the existing export JAVA_HOME line, delete its default value, and paste the JDK path above in its place.


2. Configure /home/hadoop/hadoop-2.7.3/etc/hadoop/core-site.xml and add the following properties:

vi core-site.xml

<property>

  <name>fs.defaultFS</name>

  <value>hdfs://localhost:9000</value>

</property>

<property>

  <name>hadoop.tmp.dir</name>

  <value>file:/home/hadoop/tmp</value>

  <description>Abase for other temporary directories.</description>

</property>

<property>

  <name>dfs.permissions</name>

  <value>false</value>

</property>

These properties go between the <configuration> and </configuration> tags that already exist in the file.

3. Configure /home/hadoop/hadoop-2.7.3/etc/hadoop/hdfs-site.xml, likewise adding the properties inside its <configuration> element:

vi hdfs-site.xml

<property>

  <name>dfs.replication</name>

  <value>1</value>

</property>

<property>

  <name>dfs.namenode.name.dir</name>

  <value>file:/home/hadoop/tmp/dfs/name</value>

</property>

<property>

  <name>dfs.datanode.data.dir</name>

  <value>file:/home/hadoop/tmp/dfs/data</value>

</property>
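A common pitfall with these two files is pasting the properties outside the <configuration>...</configuration> element. If xmllint is available (on Ubuntu it comes from the libxml2-utils package, not from Hadoop), a quick sanity check confirms the files are still well-formed XML:

root@user01:/home/hadoop/hadoop-2.7.3/etc/hadoop# xmllint --noout core-site.xml hdfs-site.xml && echo XML OK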

4. Format the NameNode

hadoop namenode -format

Check whether it succeeded:

"successfully formatted" together with "Exiting with status 0" indicates success; "Exiting with status 1" indicates a formatting error.
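Because the format command prints a long log, a quick filter makes the decisive lines easier to spot. A convenience sketch for a first-time format only (re-formatting asks for interactive confirmation, which does not mix well with a pipe):

root@user01:~# hadoop namenode -format 2>&1 | grep -E "successfully formatted|Exiting with status"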

5. Manually start the NameNode, DataNode, and SecondaryNameNode

start-dfs.sh

root@user01:/home/hadoop/hadoop-2.7.3/etc/hadoop# start-dfs.sh

Starting namenodes on [localhost]

The authenticity of host 'localhost (127.0.0.1)' can't be established.

ECDSA key fingerprint is SHA256:No+kRFk4mIW/DdRFxPw7Y1ylSLKji1k3lzWBcklqDmA.

Are you sure you want to continue connecting (yes/no)? ys

Please type 'yes' or 'no': yes

localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.

root@localhost's password:

localhost: starting namenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-root-namenode-user01.out

root@localhost's password:

localhost: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-root-datanode-user01.out

Starting secondary namenodes [0.0.0.0]

The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.

ECDSA key fingerprint is SHA256:No+kRFk4mIW/DdRFxPw7Y1ylSLKji1k3lzWBcklqDmA.

Are you sure you want to continue connecting (yes/no)? yes

0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.

[email protected]'s password:

0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-user01.out
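Notice that start-dfs.sh above asked for the root password once per daemon. An optional step, not part of the original textbook walkthrough, is to set up passwordless SSH to localhost so the daemons start without prompts:

root@user01:~# mkdir -p ~/.ssh   # in case the directory does not exist yet

root@user01:~# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa   # key pair with an empty passphrase

root@user01:~# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

root@user01:~# chmod 600 ~/.ssh/authorized_keys

root@user01:~# ssh localhost   # should now log in without a password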

6. Manually start the YARN daemons (ResourceManager and NodeManager)

start-yarn.sh

7. Use jps to view the started processes
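On a healthy pseudo-distributed node, jps should list roughly the following processes (the PIDs here are illustrative and will differ on your machine):

root@user01:~# jps

2817 NameNode

2958 DataNode

3133 SecondaryNameNode

3287 ResourceManager

3410 NodeManager

3742 Jps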

8. Enter the URL http://localhost:50070 in a browser to test the NameNode web UI
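The port can also be checked from inside the VM without a browser, assuming curl is installed (wget -qO- works the same way); from the real host, replace localhost with the VM's NAT IP:

root@user01:~# curl -s http://localhost:50070 | head -n 5   # should print the start of the web UI's HTML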

To sum up:

The overall installation process may feel cumbersome; I tried to build it myself after the teacher demonstrated it once. When I first started learning Hadoop, I had no idea how to understand its structure and principles, and when I asked the teacher what Hadoop could actually do, the teacher said, "you will know in the future." Maybe that is because it is not a course in my major, hahaha.

I searched Baidu and read blogs written by more experienced people, and found one that compared Hadoop to the steps of preparing and cooking a meal; that metaphor made the abstraction particularly easy to understand.

Be a rookie and learn with an open mind.


Origin blog.csdn.net/qq_43575090/article/details/108798742