1. Configure the environment version
The installation packages are on Baidu Cloud; download them yourself: https://pan.baidu.com/s/1evVp5Zk0_X7VdjKlHGkYCw (extraction code: ypti)
(The Apache release of Hadoop 2.6.4 was installed first, but Hive reported an error on startup, so everything was switched to the CDH distribution.)
2. Configuration work before installation
2.1 Install jdk
(1) Download jdk
(2) Unzip, and configure environment variables in the /etc/profile file
export JAVA_HOME=/home/jdk1.8.0_131
export PATH=${JAVA_HOME}/bin:${PATH}
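After adding the variables, a quick sanity check (the /home/jdk1.8.0_131 path matches the export lines above):

```shell
# Reload the profile so the new variables take effect in this shell
source /etc/profile

# Both should point at the JDK configured above
echo $JAVA_HOME    # expect /home/jdk1.8.0_131
java -version      # expect a 1.8.0_131 build string
```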
2.2 SSH password-free login
ssh-keygen
Adjust for your own file paths (note the spelling: authorized_keys):
cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
Test with the command:
ssh localhost
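The steps above can be combined into one sketch; the chmod calls are an extra precaution I've added, since sshd refuses keys whose files have loose permissions:

```shell
# Generate a key pair without a passphrase (accept defaults if run interactively)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Authorize the key for login to this host
cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys

# Should now log in without a password prompt
ssh localhost
```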
2.3 Install MySQL (needed for the Hive environment)
You can refer to the Runoob tutorial: https://www.runoob.com/linux/mysql-install-setup.html
My database is on a remote host, so MySQL must be configured to accept remote connections.
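A minimal sketch of enabling remote access, assuming the root account and MySQL 5.x GRANT syntax; the user, password, and host pattern here are placeholders, not values from the text:

```shell
mysql -u root -p <<'SQL'
-- Allow root to connect from any host; narrow '%' to a specific IP in production
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'your_password';
FLUSH PRIVILEGES;
SQL
```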
2.4 Configure IP
Edit /etc/hosts; both servers need the same entries. I have two machines, one master and one data. The text in parentheses is only an annotation, not part of the file:
IP address hostname (master)
IP address hostname (data)
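For example, with made-up addresses (substitute your real ones), /etc/hosts would contain:

```shell
# /etc/hosts -- identical on both master and data
192.168.1.10  master
192.168.1.11  data
```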
3. Install Hadoop
(1) Download files
(2) Extract it on each server and set the environment variables
Environment variable configuration:
export HADOOP_HOME=/home/hadoop-2.6.0-cdh5.15.1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
#export YARN_CONF_DIR=/home/hadoop-2.6.4/etc/hadoop
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
Remember to run source /etc/profile for the changes to take effect!
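After sourcing the profile, the Hadoop setup can be checked before going any further:

```shell
source /etc/profile

# Should print the CDH build configured above (2.6.0-cdh5.15.1)
hadoop version

# Should point at the etc/hadoop config directory
echo $HADOOP_CONF_DIR
```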
(3) Configuration file
- Configure the master server
Edit etc/hadoop/core-site.xml under the Hadoop installation directory:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
Edit etc/hadoop/hdfs-site.xml under the Hadoop installation directory:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!--<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop-2.6.0-cdh5.15.1/hadoop_data/hdfs/namenode</value>
</property> -->
</configuration>
Edit etc/hadoop/mapred-site.xml under the Hadoop installation directory:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
</property>
</configuration>
Edit etc/hadoop/yarn-site.xml under the Hadoop installation directory:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8050</value>
</property>
</configuration>
Create a masters file in etc/hadoop/ under the Hadoop installation directory and write master in it.
Create a slaves file in etc/hadoop/ and write data in it (if there are multiple data servers, write one per line, e.g. data1, data2, data3).
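The two files can be created in one go; the path assumes the install location used above:

```shell
cd /home/hadoop-2.6.0-cdh5.15.1/etc/hadoop

echo master > masters
echo data   > slaves   # for several nodes: printf 'data1\ndata2\ndata3\n' > slaves
```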
- Configure the data server
Edit etc/hadoop/core-site.xml under the Hadoop installation directory:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
Edit etc/hadoop/hdfs-site.xml under the Hadoop installation directory:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop-2.6.0-cdh5.15.1/hadoop_data/hdfs/datanode</value>
</property>
</configuration>
Edit etc/hadoop/mapred-site.xml under the Hadoop installation directory:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
</property>
</configuration>
Edit etc/hadoop/yarn-site.xml under the Hadoop installation directory:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8050</value>
</property>
</configuration>
(4) Start
Go to sbin/ under the Hadoop installation directory and run start-all.sh, or run start-dfs.sh and start-yarn.sh separately.
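One detail the text skips: before the very first start, the HDFS namenode must be formatted once (formatting again later wipes the filesystem metadata). A sketch of the first-start sequence:

```shell
# First start only! Re-running this destroys existing HDFS metadata.
hdfs namenode -format

cd /home/hadoop-2.6.0-cdh5.15.1/sbin
./start-dfs.sh
./start-yarn.sh    # or ./start-all.sh to launch both at once
```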
(5) View
- On the master server, the NameNode process should be running.
- On the data server, the DataNode process should be running.
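The running daemons can be listed with jps on each machine; the exact process list depends on the configuration, so treat this as a rough expectation:

```shell
jps
# On master, expect roughly: NameNode, SecondaryNameNode, ResourceManager
# On data, expect roughly: DataNode, NodeManager
```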
4. Install Hbase
(1) Download and extract HBase
(2) Configure environment variables
export HBASE_HOME=/home/hbase-1.2.0-cdh5.15.1
export PATH=$PATH:$HBASE_HOME/bin
(3) Configuration file
Edit conf/hbase-env.sh under the HBase installation directory and make the required changes.
Edit conf/hbase-site.xml under the HBase installation directory:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:/home/hbase-1.2.0-cdh5.15.1/hbase_data</value>
</property>
</configuration>
(4) Start
Run start-hbase.sh, then type hbase shell to open the shell.
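Once the shell is available, a quick smoke test can confirm HBase is working; the table name test and column family cf are arbitrary choices, not from the text:

```shell
hbase shell <<'EOF'
status
create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
scan 'test'
disable 'test'
drop 'test'
EOF
```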
5. Install Hive
(1) Download and extract Hive
(2) Configure environment variables
export HIVE_HOME=/home/hive-1.1.0-cdh5.15.1
export PATH=:$JAVA_HOME/bin:$MAVEN_HOME/bin:$FINDBUGS_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SQOOP_HOME/bin:$HIVE_HOME/bin:$PATH
(3) Configuration file
Edit conf/hive-env.sh under the Hive installation directory, adding:
export HADOOP_HOME=/home/hadoop-2.6.0-cdh5.15.1/
export HBASE_HOME=/home/hbase-1.2.0-cdh5.15.1
Edit conf/hive-site.xml under the Hive installation directory:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://IP_address:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=utf8&amp;useSSL=false</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
</property>
</configuration>
This sets up the remote connection to the MySQL database. The metastore database name hive is fixed by this configuration and cannot be changed; it needs to be created in MySQL in advance.
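Creating the metastore database by hand, matching the database name in the JDBC URL above:

```shell
# Create the metastore database before the first hive start
mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS hive;"
```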
(4) Start
Enter hive to start
If the terminal reports a JLine error, the jline jar under share/hadoop/yarn/lib/ in the Hadoop directory must be the same version as the jline jar under lib/ in the Hive installation directory!
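One way to align the versions, assuming (as is common with these releases) that Hive ships the newer jline; the version numbers in the jar names below are hypothetical, so check with ls first:

```shell
# Inspect both copies to see the actual version numbers
ls $HADOOP_HOME/share/hadoop/yarn/lib/jline-*.jar
ls $HIVE_HOME/lib/jline-*.jar

# Set aside Hadoop's older jar and copy in the one Hive uses
# (jline-0.9.94 / jline-2.12 are hypothetical example versions)
mv $HADOOP_HOME/share/hadoop/yarn/lib/jline-0.9.94.jar{,.bak}
cp $HIVE_HOME/lib/jline-2.12.jar $HADOOP_HOME/share/hadoop/yarn/lib/
```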
The installation pauses here for now; the rest will follow!