The most detailed Hadoop + HBase + Hive fully distributed environment setup tutorial (1)

1. Preparation

1. Prepare the installation packages

I put all the packages I need on the Baidu disk:
link: https://pan.baidu.com/s/1NHxweoK7zYf5hqP1aLIHAw Extraction code: ip4c
hadoop-2.8.5.tar.gz, hbase-2.1.1-bin.tar.gz, apache-hive-2.3.4-bin.tar.gz, jdk-8u102-linux-x64.tar.gz, mysql-community-*.rpm, Xshell, Xftp, CentOS-7-x86_64-Minimal-1804.iso, mysql-connector-java-8.0.13.jar

Note that MySQL is not strictly necessary: Hive now ships with Derby, so MySQL can be skipped, and it is actually easier to configure that way. Xshell and Xftp are used to connect to the virtual machines, which is more convenient. For the virtual machines I use VMware; its installer is not uploaded here, so sort that out yourself.

2. Install the virtual machine

Install three CentOS (Minimal) virtual machines, then install Xshell and Xftp on Windows. You can easily search for how to install them; it is very simple.

Static address setting

After installing the virtual machines, give each one a static IP address; otherwise the addresses will change from time to time and cause unnecessary trouble. To set a static address, you first need to know the virtual network's gateway.
Click [Edit], then [Virtual Network Editor], select NAT mode, and open [NAT Settings].

That dialog shows the subnet IP, subnet mask, and gateway IP; use them to configure a static IP. As the root user, do the following:

vi /etc/sysconfig/network-scripts/ifcfg-ens33

Then modify it as follows:

TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="none" # this is the line that was changed
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
UUID="006888a8-6385-4dda-b8e8-9d6f89b07a4f"
DEVICE="ens33"
ONBOOT="yes"
# The four lines below are newly added. IPADDR is the static IP we assign to this VM; each host
# needs a different static IP, and only the last octet should change, so use 129, 130, and 131.
IPADDR=192.168.208.129 
GATEWAY=192.168.208.2
NETMASK=255.255.255.0
DNS1=192.168.208.2

Then restart the network service

service network restart

You can use the ip addr command to view the local IP, as follows:

[fay@master ~]$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:96:b6:0a brd ff:ff:ff:ff:ff:ff
    inet 192.168.208.129/24 brd 192.168.208.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::5914:336a:4dde:d580/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

You can see 192.168.208.129, the static IP we just configured.
Then use Xshell to connect to the three hosts and use Xftp to upload the downloaded installation packages to one of the virtual machines. Pick that one as the master and make the other two the slaves, slave1 and slave2. Configure the /etc/hosts file on each host as follows:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.208.129 master
192.168.208.130 slave1
192.168.208.131 slave2

Then create a user fay on each host, give it ownership of the /opt directory with chown fay -R /opt, and switch to that user. The user can also be created while installing the virtual machine (if you use VMware) by just following the wizard; don't work directly as root. Everything from here on is installed under /opt.
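
If you did not create the user during the OS install, here is a minimal sketch of those commands, run as root on each node (fay is the user name used throughout this tutorial):

useradd fay
passwd fay              # set a password when prompted
chown -R fay /opt       # give fay ownership of /opt
su - fay                # switch to the new user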

Turn off firewall

systemctl stop firewalld
systemctl disable firewalld
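
You can optionally confirm that the firewall is really off:

systemctl status firewalld    # should show inactive (dead)
firewall-cmd --state          # prints "not running" once firewalld is stopped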

Time synchronization

The clocks of the three virtual machines may drift apart, which will affect HBase, so it is recommended to keep the three machines' time synchronized. There are two methods, and they can be used together.
Method 1: configure the virtual machine
In VMware, open [Options] ==> [VMware Tools] ==> check [Synchronize guest time with host] ==> [OK]



Method 2: install the NTP tools on the virtual machines

yum install -y ntpdate
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
ntpdate -u ntp.api.bz

Method 2 is recommended here: if you often suspend the virtual machines, Method 1 does not seem to help much, while Method 2 lets you bring the clocks back in sync afterwards.
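
If you do not want to remember to rerun ntpdate by hand, one optional approach is a root cron entry; this is just a sketch using the same NTP server as above:

# crontab -e as root: resync the clock once an hour
0 * * * * /usr/sbin/ntpdate -u ntp.api.bz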

The operations from here on are performed as the fay user.

Install Java

Java is required, so unpack the JDK:

su fay
tar -zxvf jdk-8u102-linux-x64.tar.gz -C /opt

Configure environment variables

vi ~/.bashrc
export JAVA_HOME=/opt/jdk1.8.0_102
export PATH=$PATH:$JAVA_HOME/bin

# exit the editor, then reload the file
source ~/.bashrc 
# test whether Java was installed successfully
java -version

2. Hadoop

Passwordless SSH login

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys

Then try ssh localhost to check that passwordless login works; if it does, copy your public key to the two slaves:

ssh-copy-id fay@slave1
ssh-copy-id fay@slave2

Do the same on the other two slaves, copying each machine's public key to the other two. After that, the three virtual machines can ssh into one another without passwords.
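
A quick optional check: each of the following should print the remote hostname without asking for a password:

ssh fay@slave1 hostname
ssh fay@slave2 hostname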

Then unpack Hadoop and go into the etc/hadoop directory to edit the configuration files:

tar -zxvf hadoop-2.8.5.tar.gz -C /opt
cd /opt/hadoop-2.8.5/etc/hadoop/

Modify core-site.xml. Some of these XML files already exist, while others only ship as .template or .default files; in that case copy or rename them to the file names used here.
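
For example, in Hadoop 2.x mapred-site.xml typically ships only as mapred-site.xml.template, so (assuming the default layout) copy it first:

cp mapred-site.xml.template mapred-site.xml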

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/fay/tmp</value>
    </property>
</configuration>
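
The hadoop.tmp.dir above points to /home/fay/tmp. Hadoop usually creates it on its own, but creating it yourself on every node is a harmless extra step that avoids permission surprises:

mkdir -p /home/fay/tmp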

Modify hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.datanode.max.xcievers</name>
        <value>4096</value>
        <description>A DataNode has an upper limit on the number of files it handles at the same time; it should be at least 4096</description>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:9001</value>
    </property>
    <property>  <!-- set to true so HDFS can be viewed in a browser at IP+port -->
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
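
Once the environment variables set further below are in place, you can optionally confirm that a setting was picked up; a small sanity check (assuming hdfs is already on your PATH):

hdfs getconf -confKey dfs.replication    # should print 2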

Modify mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
 <property>
    <name>mapreduce.jobhistory.address</name>
    <!-- configure the actual hostname and port -->
    <value>master:10020</value>
  </property>
 
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
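
One thing to note: the JobHistory addresses above only matter if the history server is actually running, and it is not started by start-dfs.sh or start-yarn.sh. Assuming the standard Hadoop 2.x scripts, it can be started later (once the cluster is up) with:

mr-jobhistory-daemon.sh start historyserver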

Modify yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property> <!-- log retention time; by default logs are kept for 3-7 days -->
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>
    <property> <!-- address the ResourceManager exposes to clients -->
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property> <!-- address the ResourceManager exposes to ApplicationMasters -->
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property> <!-- address the ResourceManager exposes to NodeManagers -->
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property> <!-- address the ResourceManager exposes to administrators -->
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property> <!-- ResourceManager web UI address, viewable in a browser -->
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
</configuration>
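
Because log aggregation is enabled above, the logs of finished applications can later be fetched from the command line. The application ID below is only a placeholder; substitute a real one from the ResourceManager web UI:

yarn logs -applicationId application_1543400000000_0001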

Modify yarn-env.sh and hadoop-env.sh

# put this line where JAVA_HOME appears; sometimes Hadoop simply does not pick up the JAVA_HOME variable you configured
export JAVA_HOME=/opt/jdk1.8.0_102

Modify the slaves file

# delete localhost
slave1
slave2

Add hadoop to the environment variable, modify ~/.bashrc

export JAVA_HOME=/opt/jdk1.8.0_102
export HADOOP_HOME=/opt/hadoop-2.8.5
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Then run source ~/.bashrc.
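
As an optional sanity check (not part of the original steps), confirm that the variables took effect:

hadoop version    # should report Hadoop 2.8.5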

The other two machines are simple: just copy everything over directly.

scp -r /opt fay@slave1:/
scp -r /opt fay@slave2:/

In theory this step should work, but if you did not give fay the /opt permissions earlier you will run into insufficient-permission errors, so fix the permissions first.
Then copy the environment variable file over directly:

scp ~/.bashrc fay@slave1:/home/fay/
scp ~/.bashrc fay@slave2:/home/fay/

Run source ~/.bashrc on both machines as well.
Then, on the master node, format the NameNode:

hdfs namenode -format

Then start hadoop

start-dfs.sh
start-yarn.sh

Run jps on the master node and you should see something like the following:

[fay@master hadoop-2.8.5]$ jps
35184 SecondaryNameNode
34962 NameNode
35371 ResourceManager
35707 Jps

Two slave nodes:

[fay@slave1 ~]$ jps
16289 Jps
16035 DataNode
16152 NodeManager
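
Beyond jps, you can also ask HDFS which DataNodes have registered; an optional check (it may take a few seconds after startup for both to show up):

hdfs dfsadmin -report    # should list two live datanodes, slave1 and slave2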

At this point Hadoop is basically installed. Of course, it may not go this smoothly for you and errors may appear; search online for solutions based on the specific error. If it works on the first try, that only means this guide is well written and you followed it carefully.
Test Hadoop
Open 192.168.208.129:8088 in a browser on Windows; if the YARN web UI appears, YARN should be fine.
Of course, you should still run the MapReduce example that ships with Hadoop.

$ cd /opt/hadoop-2.8.5
$ hdfs dfs -mkdir /user
$ hdfs dfs -mkdir /user/fay
$ hdfs dfs -put etc/hadoop input
$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar grep input output 'dfs[a-z.]+'
# if no Java errors are reported, that's fine; check the results in the output directory:
$ hdfs dfs -cat output/*
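
If you prefer to inspect the results on the local filesystem, an optional extra step is to copy the output directory out of HDFS first:

$ hdfs dfs -get output output
$ cat output/*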

Okay, if everything above went well, Hadoop itself should be fine. The installation of HBase and Hive will be covered in a later post.

For the installation and configuration of HBase and Hive, please refer to the most detailed Hadoop+Hbase+Hive fully distributed environment setup tutorial (2)
