Ubuntu + Hadoop + MySQL + Hive + Sqoop

Environment configuration instructions:

Software versions:
VirtualBox 6.1
Ubuntu 16.04
Hadoop 2.7.7
MySQL 5.7.29
MySQL JDBC driver (Connector/J) 5.1.46
Hive 2.3.6
Sqoop 1.4.7

One, Hadoop cluster configuration and learning the HDFS commands

Reference:
"Python + + spark2.0 hadoop machine learning and Big Data combat" the first 2-6 chapters
(the book in a search online pdf version can be found to fool teaching, followed by a step by step installation.)

1. Notes where my installation and configuration differ from the book:

(1) VirtualBox 6.1

This is the latest version. My computer is 64-bit, so I downloaded 6.1; if your computer is 32-bit, download version 5.2, which is the version that still provides a 32-bit build.

(2) Ubuntu 16.04

The latest version is 18.04, but I was not sure how stable it is, so to be safe I installed 16.04. I created a user named zdx; wherever zdx appears in the commands below, it is my user name, so do not copy it blindly and replace it with your own user name.

(3) Hadoop 2.7.7

I originally downloaded the latest Hadoop 3.x and ran into all kinds of errors during the subsequent configuration, such as Hadoop failing to start; the more I changed, the worse it got, and it cost a lot of time. In the end, as a last resort, I switched to the lower version and had no further problems.

(4) Hadoop cluster

Virtual machine name    IP address
master                  192.168.56.114
data1                   192.168.56.111
data2                   192.168.56.112
data3                   192.168.56.113
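
For reference, the corresponding /etc/hosts entries (an assumption based on the book's cluster setup; the same entries go on all four virtual machines, and you should adjust the names and addresses to your own) would look roughly like this:

192.168.56.114   master
192.168.56.111   data1
192.168.56.112   data2
192.168.56.113   data3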

2. Problems encountered during installation and their solutions:

Note that the biggest mistake I ran into first was with passwordless SSH login.
After setting up SSH as the book describes, remember to run the command ssh localhost and check whether it works. If you are still asked for a password, the setup has failed. On my machine the passwordless SSH setup failed, and the follow-up problem was this: it does not stop Hadoop from starting in pseudo-distributed mode, but it directly prevents the Hadoop cluster from starting, because the datanodes will not come up. I only fixed it after the cluster was already built, which led to even more errors, because all four virtual machines had to be changed and the changes on the master and on the data nodes are not the same. It is a lot of trouble, and the more trouble, the more mistakes; the more mistakes, the more trouble. So make sure from the very beginning that passwordless SSH login works.
This is the problem I encountered and its solution.
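
For reference, a minimal sketch of setting up and checking passwordless SSH on one machine (the key type and paths are my assumption; the book's own steps are authoritative):

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa          # generate a key pair with an empty passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   # authorize the key on this machine
ssh localhost                                     # should log in WITHOUT asking for a password

For the cluster, the master's public key also has to be authorized on each data node, for example with ssh-copy-id zdx@data1 (replace zdx and data1 with your own user and host names).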

Two, MySQL 5.7.29 installation

I followed this author's guide, which is written in great detail:
ubuntu install mysql
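
In outline, the installation on Ubuntu 16.04 boils down to something like the following (a minimal sketch; the linked guide covers the details and the security settings):

sudo apt-get update
sudo apt-get install mysql-server     # the installer asks you to set the root password
sudo service mysql start
mysql -u root -p                      # log in to verify the installation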

Three, Hive 2.3.6 installation

(Hive is installed only on master. At first, only the master virtual machine needs to be running; the later steps require the whole hadoop cluster to be started. All commands are entered on master.)
(1) At this stage, start only master.
To avoid Hive failing partway through, first raise the stack size limit:

ulimit -s 102400

(2) Download Hive

wget  https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.3.6/apache-hive-2.3.6-bin.tar.gz

(3) Unpack the archive

sudo tar -zxvf   apache-hive-2.3.6-bin.tar.gz

(4) Move apache-hive-2.3.6-bin to /usr/local/hive

sudo mv apache-hive-2.3.6-bin   /usr/local/hive

(5) Change ownership of the hive directory

sudo  chown  -R  zdx:zdx  /usr/local/hive

(6) Create a tmp directory

mkdir  -p  /usr/local/hive/tmp

(7) Add the environment variables

sudo   gedit   ~/.bashrc

In the file, add:

export  HIVE_HOME=/usr/local/hive 
export  PATH=$PATH:$HIVE_HOME/bin

(8) Apply the configuration

source   ~/.bashrc

(9) Change directory (you must switch into it; steps 10 and 11 are carried out in this directory)

cd  /usr/local/hive/conf

(10) Rename hive-env.sh.template to hive-env.sh

sudo  mv  hive-env.sh.template  hive-env.sh

(11) In hive-env.sh, find the following three lines (search for the capitalized variable names), delete the leading # and change the value after the = sign:

sudo gedit  hive-env.sh

After editing, the three entries should read:

# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/usr/local/hadoop
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/usr/local/hive/conf
# Folder containing extra ibraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/usr/local/hive/lib

(12) Now the hadoop cluster needs to be started.
Go back to the home directory by entering the command: cd
Then boot the data1, data2 and data3 virtual machines,
and finally enter start-all.sh on the command line of the master virtual machine, as sketched below.
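
A minimal sketch of this step, run on master (this assumes Hadoop's sbin directory is already on your PATH, as configured when following the book; jps is only an optional sanity check):

cd                 # back to the home directory
start-all.sh       # start HDFS and YARN across the cluster
jps                # optional: master should list NameNode, SecondaryNameNode and ResourceManager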
(13) Create the warehouse directories on HDFS and set their permissions:

hadoop  fs  -mkdir  -p  /user/hive/warehouse 
hadoop  fs  -mkdir  -p  /user/hive/tmp 
hadoop  fs  -mkdir  -p  /user/hive/log 
hadoop  fs  -chmod  -R  777 /user/hive/warehouse
hadoop  fs  -chmod  -R  777 /user/hive/tmp
hadoop  fs  -chmod  -R  777 /user/hive/log

(14) Modify the main contents of hive-site.xml

cp  hive-default.xml.template  hive-site.xml
sudo  vi  hive-site.xml

Content that needs to be modified or added in the file:

<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
</property>

<property>  
    <name>hive.exec.scratchdir</name>  
    <value>/user/hive/tmp</value>  
</property>  

<property>
    <name>hive.querylog.location</name>
    <value>/user/hive/log/hadoop</value>
    <description>Location of Hive run time structured log file</description>
</property> 

<property> 
    <name>javax.jdo.option.ConnectionURL</name> 
    <value>jdbc:mysql://192.168.56.114:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value> 
</property>

<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
</property>

<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
</property>

<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>zdx123</value>
    <description>password to use against metastore database</description>
</property>

<property>
    <name>system:java.io.tmpdir</name>
    <value>/usr/local/hive/tmp</value>
</property>

<property>
    <name>system:user.name</name>
    <value>root</value>
</property>

(If a property above is not present in the file, add it manually at the end; in total there are 9 places to modify or add, with the values shown above.)
(However, under javax.jdo.option.ConnectionURL replace my "192.168.56.114" with the address of your own MySQL server;
under javax.jdo.option.ConnectionUserName replace my login name "root" with your own MySQL login name;
under javax.jdo.option.ConnectionPassword replace my password "zdx123" with your own MySQL password.)
(A note on vi, using hive.metastore.warehouse.dir as an example; a complete vi command reference is easy to find online:
in normal mode type "/" followed by hive.metastore.warehouse.dir and press Enter to search; if there are several matches in the file, press n to step through them one by one until you find the one you want to modify; press x to delete a character, press a to start editing, type :w to save and :q to quit.)
(15) Set up the MySQL JDBC driver (version 5.1.46 is used here); download it from the command line:

sudo  wget  https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.46.tar.gz

(16) Unpack:

sudo  tar  -zxvf   mysql-connector-java-5.1.46.tar.gz

(17) Copy the driver jar into hive's lib directory

sudo  cp  mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar  /usr/local/hive/lib

(18) Initialize the metastore

cd   /usr/local/hive/bin
schematool    -dbType    mysql    -initSchema 

(If "schemaTool completed" is displayed on screen, the initialization succeeded.)
(19) Start hive under /usr/local/hive/bin

hive

(20) For step five and everything after it, refer to this author's write-up to test whether Hive is installed correctly (a short smoke test is also sketched below):
start the hive and testing
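
As a minimal smoke test of Hive itself (the table name test_tb is only a made-up example), a few statements can also be run non-interactively with hive -e:

hive -e "show databases;"
hive -e "create table test_tb (id int, name string);"
hive -e "show tables;"            # test_tb should appear in the list
hive -e "drop table test_tb;"     # clean up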

Four, Sqoop 1.4.7 installation

(Start the hadoop cluster first; after that, operate only on master.)
(1) To avoid errors partway through, as with Hive, first raise the stack size limit:

ulimit   -s   102400

(2) Download Sqoop

wget  http://archive.apache.org/dist/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

(3) Unpack the archive

sudo  tar  -zxvf  sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

(4) Move it into place

sudo  mv  sqoop-1.4.7.bin__hadoop-2.6.0  /usr/local/sqoop

(5) Change ownership of the sqoop directory

sudo  chown  -R  zdx:zdx  /usr/local/sqoop

(6) Add the environment variables

sudo  gedit  ~/.bashrc

Add the following content to the file:

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/opt/hadoop-2.7.3

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/opt/hadoop-2.7.3

#set the path to where bin/hbase is available
#export HBASE_HOME=

#Set the path to where bin/hive is available
export HIVE_HOME=/opt/hive

#Set the path for where zookeper config dir is
#export ZOOCFGDIR=

(The content above must be adapted to your actual situation; see the sketch below. In addition, if you have also installed HBase and ZooKeeper, add their paths after the corresponding equals signs.)
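
For the installation locations used in this guide (Hadoop in /usr/local/hadoop and Hive in /usr/local/hive), the adjusted lines would look roughly like this; remember to run source ~/.bashrc again afterwards:

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/usr/local/hadoop

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/usr/local/hadoop

#Set the path to where bin/hive is available
export HIVE_HOME=/usr/local/hive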

(7) Copy the dependency jars into $SQOOP_HOME/lib; the two main files to copy are the following.

cp  mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar  /usr/local/sqoop/lib 
cp  /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.7.jar  /usr/local/sqoop/lib 

(8) Modify /usr/local/sqoop/bin/configure-sqoop

sudo  gedit  /usr/local/sqoop/bin/configure-sqoop

Comment out the HCatalog and Accumulo checks, unless you intend to use these components on top of Hadoop. The commented-out section should look like this:

## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi

#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi

# Add HCatalog to dependency list
#if [ -e "${HCAT_HOME}/bin/hcat" ]; then
#  TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`${HCAT_HOME}/bin/hcat -classpath`
#  if [ -z "${HIVE_CONF_DIR}" ]; then
#    TMP_SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}:${HIVE_CONF_DIR}
#  fi
#  SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}
#fi

# Add Accumulo to dependency list
#if [ -e "$ACCUMULO_HOME/bin/accumulo" ]; then
#  for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*accumulo.*jar | cut -d':' -f2`; do
#    SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
#  done
#  for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*zookeeper.*jar | cut -d':' -f2`; do
#    SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
#  done
#fi

(9) Test the connection to MySQL.
First make sure that MySQL is running:

sudo  service  mysql  start

Then test connectivity:

sqoop list-databases --connect jdbc:mysql://192.168.56.114:3306/ --username root -P

(192.168.56.114 is the IP address of my master, which is also my MySQL address; replace it with your own MySQL address.
Note that the final -P is capitalized.)
Enter the password; if the databases on your MySQL server are listed, the connection is working.

At this point, the configuration is complete!

Finally, special thanks to the authors of these posts:
https://www.cnblogs.com/opsprobe/p/9126864.html
https://segmentfault.com/a/1190000011303459
https://www.cnblogs.com/standingby/p/10039974.html
https://www.cnblogs.com/harrymore/p/9056863.html


Origin blog.csdn.net/weixin_43931044/article/details/104903746