Hadoop pseudo-distributed cluster construction


1. Preparation

1. Turn off the firewall

service iptables start

    Turns the firewall on immediately, but the change does not persist after a reboot.

service iptables stop

    Turns the firewall off immediately, but the change does not persist after a reboot.

    The following commands are permanent: they take effect on every reboot.

chkconfig iptables on

    Enables the firewall at boot; takes effect after restarting.

chkconfig iptables off

    Disables the firewall at boot; takes effect after restarting.
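    For this setup the firewall is usually disabled both immediately and permanently. A minimal sketch, assuming the CentOS 6 style iptables service used above:

service iptables stop     # stop the firewall for the current session
chkconfig iptables off    # keep it from starting again on boot
service iptables status   # confirm it is no longer running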

2. Configure the hostname

    Note: the hostnames of the machines where Hadoop is installed must not contain underscores; otherwise the hosts cannot be resolved and the cluster will fail to start.

    Configure the hostname:

vim /etc/sysconfig/network
source /etc/sysconfig/network

    E.g.:

NETWORKING=yes
HOSTNAME=hadoop01

    After the configuration is completed, the hostname shown at the command prompt does not change immediately; you need to restart the machine for the change to take effect.

    You can also use the following command to temporarily modify the hostname.

hostname hadoop01
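    Running hostname with no arguments prints the current hostname, so you can confirm the change took effect:

hostname   # should print hadoop01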

3. Configure Hosts

    This configuration is done for decoupling. If a specific IP address is written into every service's configuration and that server's IP changes, each reference has to be updated by hand; if the services refer to a host name instead, then when a server's IP changes you only need to update the corresponding mapping in the /etc/hosts file.

vim /etc/hosts

    Fill in the content format as follows:

127.0.0.1 localhost
::1 localhost
192.168.75.150 hadoop01
<IP of each additional host> <its hostname>
...

    This configuration is the same as configuring the hosts file under Windows.
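    After saving, a quick way to verify the mapping (assuming the hadoop01 entry above) is to ping the host name:

ping -c 3 hadoop01   # should resolve to 192.168.75.150 and get replies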

4. Configure password-free communication

    Generate your own public and private keys, and the generated public and private keys will be automatically stored in the /root/.ssh directory.

    The command is as follows:

ssh-keygen

    Copy the generated public key to the remote machines that need to communicate with each other.

    The command is as follows:

ssh-copy-id root@hadoop01

    At this point, the public key is saved in the /root/.ssh/authorized_keys file of the remote host, and the known host information is saved in known_hosts, so there is no need to enter a password when accessing again.

    You also need to set up password-free login to this machine itself, so that Hadoop can start without prompting for a password; otherwise you will be asked for the password every time HDFS or YARN starts.

ssh hadoop01

    Use the above command to connect remotely and verify that you can connect without a password.
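    Putting the steps together, a minimal sketch of the whole exchange (assuming the root user and the hadoop01 host name planned above):

ssh-keygen -t rsa           # press Enter at every prompt to accept the defaults
ssh-copy-id root@hadoop01   # appends the public key to /root/.ssh/authorized_keys
ssh hadoop01                # should log in without asking for a password
exit                        # return to the original shell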

5. Install JDK

1. Download and unzip

    Upload the JDK installation package to your own management directory via an FTP/SFTP tool (e.g. FileZilla).

    Unzip the installation package

tar -zxvf [path to the JDK installation package]

2. Configure environment variables

    Modify /etc/profile

    /etc/profile sets system-wide environment variables and is executed every time a user logs in; it also collects shell settings from the configuration files in the /etc/profile.d directory.

vim /etc/profile

    Add the following configuration at the end of the file line, save and exit.

export JAVA_HOME=/home/app/jdk1.7.0_45/
export PATH=$PATH:$JAVA_HOME/bin

    Note: set JAVA_HOME to your own JDK installation directory; do not blindly copy the path above.

3. Reload the configuration file

    Reload the profile to make the configuration take effect, the command is as follows:

source /etc/profile

    After the environment variables are configured, verify that they take effect: if the following commands print the JDK path and the Java version information, the configuration is correct.

echo $JAVA_HOME
java -version

2. Pseudo-distributed configuration

1. Download and install hadoop

    Upload the Hadoop installation package to your own management directory on Linux via an FTP/SFTP tool (e.g. FileZilla), and decompress it.

tar -zxvf [path to the Hadoop installation package]

2. Configure hadoop

    The following configuration files are all in the hadoop-2.7.1/etc/hadoop/ directory.

1. Modify hadoop-env.sh

    Open hadoop-env.sh via vim.

vim [hadoop]/etc/hadoop/hadoop-env.sh

    The main thing to modify is the JAVA_HOME path.

    On line 27 of hadoop-env.sh, change export JAVA_HOME=${JAVA_HOME} so that it points to the same path as the JAVA_HOME environment variable.
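    For example, assuming the JDK path used when configuring the environment variables above, the modified line would read:

export JAVA_HOME=/home/app/jdk1.7.0_45/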

    Reload for changes to take effect. The command is as follows:

source hadoop-env.sh

2. Modify core-site.xml

    This file is the core configuration file, which mainly manages the configuration of the namenode and the configuration of the file storage location.

    Open the core-site.xml file through vim.

vim [hadoop]/etc/hadoop/core-site.xml

    The configuration in this file is initially empty; you need to add the namenode address and the file storage location inside the <configuration> tag. The configuration is as follows:

<configuration>
    <property>
        <!-- Address of the HDFS master (the NameNode) -->
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop01:9000</value>
    </property>
    <property>
        <!-- Directory where files generated at Hadoop runtime are stored -->
        <name>hadoop.tmp.dir</name>
        <value>/home/park/work/hadoop-2.7.1/tmp</value>
    </property>
</configuration>

    In the first <value> tag, fill in the host name you planned.

    In the second <value> tag, fill in the storage location you planned for the tmp directory. Do not use the Linux system /tmp directory: Linux has its own clearing mechanism for that directory, which would cause data loss.
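    It can help to create the planned tmp directory in advance. A small sketch, assuming the path from the template above:

mkdir -p /home/park/work/hadoop-2.7.1/tmp   # runtime data directory referenced by hadoop.tmp.dir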

3. Modify hdfs-site.xml

    This file is the configuration file of HDFS, which mainly configures the number of storage copies of HDFS.

    Open hdfs-site.xml via vim:

vim [hadoop]/etc/hadoop/hdfs-site.xml

    The configuration in this file is also initially empty. Add the settings inside the <configuration> tag. The configuration template is as follows:

<configuration>
    <property>
        <!-- Number of data replicas HDFS keeps (including the original); default is 3 -->
        <!-- In pseudo-distributed mode this value must be 1 -->
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

    Again, the part to modify is the content of the <value> tag. For a pseudo-distributed installation, set the value to 1: there is only one server, so replicas cannot be distributed across multiple nodes.
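    Once the Hadoop environment variables are configured (see section 3 below), you can confirm the value Hadoop actually picks up; a quick check:

hdfs getconf -confKey dfs.replication   # should print 1 for this pseudo-distributed setup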

4. Modify mapred-site.xml

    This file mainly configures the framework on which MapReduce runs.

    In the /etc/hadoop directory there is only a mapred-site.xml.template file, which is a configuration template. Copy it and rename the copy so that the .template suffix is removed. The command is as follows:

cp mapred-site.xml.template mapred-site.xml

    Open the mapred-site.xml file via vim. The command is as follows:

vim [hadoop]/etc/hadoop/mapred-site.xml

    The configuration in this file is initially empty; just fill in the following template. This configuration makes MapReduce run on YARN.

<configuration>
    <property>
        <!-- Run MapReduce on YARN -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

5. Modify yarn-site.xml

    This file is the core configuration file of YARN; it mainly configures the ResourceManager and how the NodeManager obtains data.

    Open the yarn-site.xml file via vim. The command is as follows:

vim [hadoop]/etc/hadoop/yarn-site.xml

    The configuration in this file is also initially empty. The configuration template is as follows:

<configuration>
    <property>
        <!-- Address of the YARN master (the ResourceManager) -->
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop01</value>
    </property>
    <property>
        <!-- How the NodeManager obtains data -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

    The value of the first <value> tag needs to be changed to the host name you planned, and the others do not need to be modified.

6. Modify slaves

    This file lists the slave (worker) nodes managed by this node.

    Open this file with vim; the command is as follows:

vim slaves

    Add your own server's host name to this file (replace any existing default content). Because this is a pseudo-distributed cluster, there is only one host, so you only need to enter the host name you planned. E.g.:

hadoop01

    After the input is complete, exit and save.

3. Configure environment variables

    As with the JDK, Hadoop's environment variables need to be configured by adding the Hadoop environment variable information to the /etc/profile file.

1. Edit /etc/profile

    Open the /etc/profile file with the following command:

vim /etc/profile

    Add the following lines at the end of the file:

export HADOOP_HOME=/home/park/work/hadoop-2.7.1/
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

    Note: The value of HADOOP_HOME is the Hadoop installation directory, do not blindly copy it.

    PATH needs two entries, one for the bin directory and one for the sbin directory, because Hadoop has two command directories and both must be added to the environment variables.

2. Reload the configuration file

    Use the following command to reload the configuration file:

source /etc/profile

    After the environment variables are configured, test whether they take effect.

echo $HADOOP_HOME

    If the installation path information of Hadoop appears, the configuration is correct.
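    You can also ask Hadoop itself, which additionally confirms that the bin directory is on the PATH:

hadoop version   # should print the release, e.g. Hadoop 2.7.1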

4. Restart Linux

    Under normal circumstances, no restart is needed after configuring Hadoop. If some Hadoop configuration changes do not seem to take effect, restarting Linux can resolve the problem; the exact reason why a restart is sometimes needed is unclear.

    The restart command is as follows; either of them can be used:

reboot
init 6

3. Start Hadoop

1. Format the namenode

    Before starting Hadoop, a formatting operation is required to ensure that the namenode can store data normally.

    Enter the hadoop/bin directory and run the command to format the namenode. The command is as follows:

hdfs namenode -format

    The older, now deprecated form of the same command is:

hadoop namenode -format

    During formatting, output like the following proves that the format succeeded (the storage directory shown will be the one under the hadoop.tmp.dir you configured):

Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted

 

2. Start hadoop

    The hadoop-2.7.1/sbin directory contains the cluster control scripts. Enter this directory and execute the following command to start all of the Hadoop cluster components (in Hadoop 2.x this script is deprecated in favour of start-dfs.sh and start-yarn.sh, which it simply calls in turn):

./start-all.sh

    After startup, enter the following command:

jps

    You should see five Hadoop-related processes: NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager (in addition to Jps itself). This proves that the startup succeeded.
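    A typical listing looks like the following; the process IDs are only examples and will differ on your machine:

3472 NameNode
3601 DataNode
3785 SecondaryNameNode
3937 ResourceManager
4043 NodeManager
4320 Jps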

3. Close Hadoop

    The same sbin directory also contains the script that stops the services. Enter the following command in that directory to shut down all of the cluster components and stop the service.

./stop-all.sh

 

4. HDFS commands

hadoop fs -mkdir /user/trunk

hadoop fs -ls /user

hadoop fs -lsr /user   (recursive listing)

hadoop fs -put test.txt /user/trunk

hadoop fs -put test.txt .  (copies to the current HDFS directory, which must be created first)

hadoop fs -get /user/trunk/test.txt . (copies to the current local directory)

hadoop fs -cat /user/trunk/test.txt

hadoop fs -tail /user/trunk/test.txt  (shows the last kilobyte of the file)

hadoop fs -rm /user/trunk/test.txt

hadoop fs -rmdir /user/trunk

hadoop fs -help ls (shows the help documentation for the ls command)
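    As a quick walk-through of these commands (the file and directory names are just examples):

echo "hello hdfs" > test.txt          # create a small local file
hadoop fs -mkdir -p /user/trunk       # create the target directory in HDFS
hadoop fs -put test.txt /user/trunk   # upload the file
hadoop fs -cat /user/trunk/test.txt   # print its contents from HDFS
hadoop fs -ls /user/trunk             # list the directory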

 

5. Accessing the Hadoop Web management page

    Once the Hadoop service is configured and started normally, you can open the Hadoop Web management page from a browser on another PC by entering an address in the following format:

    http://[server_ip]:50070

    If you cannot access it, the server's port 50070 is probably blocked by the firewall. Open port 50070 as follows:

service iptables status #query firewall status

service iptables start #start the firewall

iptables -I INPUT -p tcp --dport 50070 -j ACCEPT #open a specific port

iptables -I INPUT -p tcp --dport 50070 -j DROP #close a specific port

service iptables save #save the configuration

service iptables restart #restart the firewall

 
