Summary of the big data comprehensive experiment (Lin Ziyu)

Here are the sources of the experiment:
Comprehensive Experiment One
Comprehensive Experiment Two

Experimental environment description

The experimental environment I use is: Hadoop 2.7.7 + Hive 3.1.2 + ZooKeeper 3.6.1 + HBase 1.4.13 + Sqoop 1.4.6

I had previously been following teacher Lin Ziyu's blog, which installs Hadoop 3.1.3 and HBase 2.2.2. Those versions cannot meet the demands of this experiment because of Sqoop 1.4.6: its data interchange does not support HBase 2.x, so you need to replace HBase with a 1.x release and replace Hadoop with a version compatible with it. The environment above has been personally tested and works.
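Once the environment below is set up, a quick way to confirm which versions are actually active (assuming each tool's bin directory is on your PATH) is:

$ hadoop version     # should report 2.7.7
$ hbase version      # should report 1.4.13
$ sqoop version      # should report 1.4.6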

Change hadoop and hbase version and install ZooKeeper

The previous hadoop and hbase versions do not need to be deleted.
For the installation of Hive and Sqoop, please refer to teacher Lin Ziyu’s blog:
Hive installation
Sqoop installation

  1. Install hadoop 2.7.7

We choose to install Hadoop into /usr/local/:

$ sudo tar -zxf ~/下载/hadoop-2.7.7.tar.gz -C /usr/local    # unzip into /usr/local
$ cd /usr/local/
$ sudo mv ./hadoop-2.7.7/ ./hadoop2.7            # rename the folder to hadoop2.7
$ sudo chown -R hadoop ./hadoop2.7       # change ownership to user hadoop

Pseudo-distributed configuration:

$ sudo vi /usr/local/hadoop2.7/etc/hadoop/core-site.xml 
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop2.7/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
$ sudo vi /usr/local/hadoop2.7/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop2.7/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop2.7/tmp/dfs/data</value>
    </property>
</configuration>

After the configuration is complete, format the NameNode:

$ cd /usr/local/hadoop2.7
$ ./bin/hdfs namenode -format

At this point the configuration is complete. You must enter the hadoop2.7 directory to start HDFS:

$ cd /usr/local/hadoop2.7
$ ./sbin/start-dfs.sh
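If the startup succeeded, jps should list the HDFS daemons (a quick sanity check; the process ids will differ on your machine):

$ jps
# expect to see NameNode, DataNode and SecondaryNameNode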

You can open the web UI at localhost:50070 and see that the version is now 2.7.7. If you want to use 3.1.3 again in the future, just go to the 3.1.3 directory and start it from there.

  2. Install hbase 1.4.13

Unzip the installation package hbase-1.4.13-bin.tar.gz to the path /usr/local:
$ cd ~
$ sudo tar -zxf ~/下载/hbase-1.4.13-bin.tar.gz -C /usr/local
$ cd /usr/local
$ sudo mv ./hbase-1.4.13 ./hbase1.4
$ sudo chown -R hadoop ./hbase1.4

Configure environment variables

$ vim ~/.bashrc

Add the following line:

export PATH=$PATH:/usr/local/hbase1.4/bin

Save, exit, and make it take effect immediately:

$ source ~/.bashrc

Pseudo-distributed configuration:

$ sudo vi /usr/local/hbase1.4/conf/hbase-env.sh

In the file hbase-env.sh, find the three entries JAVA_HOME, HBASE_CLASSPATH and HBASE_MANAGES_ZK, remove the leading "#", and change them to the values below. I want to use an external zookeeper, so I set HBASE_MANAGES_ZK to false; if you use the zookeeper built into hbase, change it to true. JAVA_HOME is where java is installed; the jdk I use is jdk1.8.0_241, so modify it according to your own jdk version.

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_241
export HBASE_CLASSPATH=/usr/local/hbase1.4/conf 
export HBASE_MANAGES_ZK=false
$ sudo vi /usr/local/hbase1.4/conf/hbase-site.xml
<configuration>
        <property>
                <name>hbase.rootdir</name>
                <value>hdfs://localhost:9000/hbase</value>
        </property>
        <property>
                <name>hbase.cluster.distributed</name>
                <value>true</value>
        </property>
        <property>
                <name>hbase.unsafe.stream.capability.enforce</name>
                <value>false</value>
        </property>
</configuration>

The configuration is complete, but hbase cannot be started yet: since I use an external zookeeper, zookeeper has to be started before hbase.

  3. Install zookeeper 3.6.1

Note: I think using an external zookeeper makes hbase more stable, and hbase problems can also be troubleshot through the external zookeeper. If this feels like too much trouble, you do not have to install it; using the zookeeper built into hbase also works.

Unzip the installation package zookeeper3.6.1 to the path /usr/local

$ cd ~
$ sudo tar -zxf ~/下载/apache-zookeeper-3.6.1-bin.tar.gz -C /usr/local
$ cd /usr/local
$ sudo mv ./apache-zookeeper-3.6.1-bin ./zookeeper
$ sudo chown -R hadoop ./zookeeper

Enter the zookeeper directory and create a directory tmp

$ cd /usr/local/zookeeper
$ sudo mkdir tmp
$ sudo chown -R hadoop ./tmp

Enter conf, copy zoo_sample.cfg to zoo.cfg, and modify zoo.cfg:

$ cd conf
$ sudo cp zoo_sample.cfg  zoo.cfg
$ sudo vi zoo.cfg
# comment out the original dataDir and change it to this:
dataDir=/usr/local/zookeeper/tmp

Save and exit.

Modify environment variables

$ vim ~/.bashrc
# add the following lines
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=${ZOOKEEPER_HOME}/bin:$PATH
# save and exit

# make the configuration take effect
$ source ~/.bashrc

Zookeeper installation is complete.
Start zookeeper first with zkServer.sh start, and then start hbase:

$ cd /usr/local/hbase1.4
$ ./bin/start-hbase.sh
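With an external zookeeper, jps should now show the zookeeper process alongside the hbase daemons (a quick sanity check under my setup):

$ jps
# expect QuorumPeerMain (zookeeper), HMaster and HRegionServer (hbase),
# plus the HDFS daemons started earlier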

We can also open the hbase web UI at localhost:16010 and see that it is version 1.4.13.

Note: startup sequence: hadoop -> zookeeper -> hbase
shutdown sequence: hbase -> zookeeper -> hadoop
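To avoid getting the order wrong, a small pair of wrapper scripts is convenient (a sketch, assuming the install paths used above):

#!/bin/bash
# start-bigdata.sh - start services in the required order: hadoop -> zookeeper -> hbase
/usr/local/hadoop2.7/sbin/start-dfs.sh
/usr/local/zookeeper/bin/zkServer.sh start
/usr/local/hbase1.4/bin/start-hbase.sh

#!/bin/bash
# stop-bigdata.sh - stop services in the reverse order: hbase -> zookeeper -> hadoop
/usr/local/hbase1.4/bin/stop-hbase.sh
/usr/local/zookeeper/bin/zkServer.sh stop
/usr/local/hadoop2.7/sbin/stop-dfs.sh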

  4. Modify the Sqoop configuration

Modify the following settings in sqoop-env.sh:
$ cd /usr/local/sqoop/conf
$ sudo vi sqoop-env.sh
export HADOOP_COMMON_HOME=/usr/local/hadoop2.7
export HADOOP_MAPRED_HOME=/usr/local/hadoop2.7
export HBASE_HOME=/usr/local/hbase1.4
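After saving, you can confirm that Sqoop can reach MySQL before running any jobs (a sketch; it assumes the MySQL connector jar is already in sqoop/lib, MySQL runs locally, and root is just an example user):

$ cd /usr/local/sqoop
$ ./bin/sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root -P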

Finally, here are my own environment variable settings for reference; modify them as appropriate (they may be rather untidy, please bear with me, hee hee).

Summary of pitfalls encountered in the experiment

  1. No data was imported into hive's bigdata_user table.
    It can be imported with this statement:
hive>load data local inpath '/usr/local/bigdatacase/dataset/user_table.txt' overwrite into table bigdata_user;
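To verify that the load worked, a quick count (sketch):

hive> select count(*) from bigdata_user;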

  2. Using Sqoop to import data from Hive into MySQL failed. The error: Sqoop: Import failed: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
    The solution is:
    (1) After each failed run there are actually many temporary folders under /tmp/sqoop-<your username>/compile (or /tmp/sqoop/compile; you can find it). The folder names are long strings of characters, and each folder contains the .jar, .java and .class files generated for the corresponding table name. Copy these three files into the lib folder of your sqoop installation directory.
    (2) Copy hive-common-3.1.2.jar from hive/lib to sqoop/lib.
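For reference, a sketch of the kind of export command this pitfall refers to (the database name dblab, table user_table, field separator and warehouse path are assumptions from my setup; adjust them to yours):

$ cd /usr/local/sqoop
$ ./bin/sqoop export --connect "jdbc:mysql://localhost:3306/dblab?useSSL=false" --username root -P --table user_table --export-dir '/user/hive/warehouse/dblab.db/user_table' --fields-terminated-by ','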
  3. Using Sqoop to import data from MySQL into HBase.
    If HBase is version 1.4 there will be no error, and importing data into hbase through java api programming also works without error on version 1.4.
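A sketch of the MySQL-to-HBase import that works on 1.4 (the database dblab, table user_table, column family f1 and row key column id are assumptions; adjust to your schema):

$ ./bin/sqoop import --connect "jdbc:mysql://localhost:3306/dblab?useSSL=false" --username root -P --table user_table --hbase-table user_table --column-family f1 --hbase-row-key id --hbase-create-table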
  4. Installing the R language
    must be done in exactly this order:
    (1) Add the public key:
$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 51716619E084DAB9

(2) Use vim to open the /etc/apt/sources.list file

$ sudo vim /etc/apt/sources.list 

(3) Add the Tsinghua University mirror source on the last line of the file (mine is Ubuntu 18.04, whose corresponding R repository is bionic-cran35; different Ubuntu versions need different R repositories):

deb http://mirrors.tuna.tsinghua.edu.cn/CRAN/bin/linux/ubuntu/ bionic-cran35/
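If you are unsure of your Ubuntu codename, it can be printed with:

$ lsb_release -cs    # prints "bionic" on Ubuntu 18.04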

(4) Exit vim and update the software source list:

$ sudo apt-get update 

(5) Install the R language:

$ sudo apt-get install r-base

After the installation succeeds, enter R and install some dependent libraries, including RMySQL, ggplot2, devtools and recharts:

install.packages('RMySQL')
install.packages('ggplot2')
install.packages('devtools')
devtools::install_github('taiyun/recharts')

Errors occurred when installing these packages. For example, one error message said to install libcurl4-openssl-dev, so we exit R and install it directly:

$ sudo apt-get install libcurl4-openssl-dev

Other errors can be solved in the same way.
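Based on the errors I hit, the system libraries these R packages commonly ask for can also be installed up front (package names assumed for Ubuntu 18.04; skip any you do not need):

$ sudo apt-get install libcurl4-openssl-dev libssl-dev libxml2-dev default-libmysqlclient-dev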

The hbase exception I encountered and the solution

The abnormal phenomenon is: starting and stopping the hbase processes works normally, and after entering the hbase shell the list and status commands are normal, but the create 't1','t1' command (create table) throws an exception:

ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing

Checking the hbase-master-service-slave.log file shows a lot of WARN information like the following:

WARN [master/hadoop72:16000:becomeActiveMaster] master.HMaster: hbase:meta,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1560231893015, server=hadoop75,16020,1560231583387}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.
Solution:
1. Time not synchronized: run date -R on each node machine and check whether the times differ by more than 30 seconds.
2. Check the value of the parameter hbase.rootdir in hbase-site.xml: it must live in hdfs, and its host name and port number must stay consistent with the value of the parameter fs.defaultFS in hadoop's core-site.xml.
3. Fix the host name mapping on each host, i.e. the /etc/hosts file.
4. Pay attention to version matching between hbase and hadoop. Replacing the hadoop-xxxx.jar files under hbase/lib with the hadoop-xxxx.jar from your hadoop --- this is nonsense; after the replacement the hbase processes could not start up completely. In fact it is enough to check the version compatibility table on the hbase official website; there is no need to manually replace jar packages.
5. Copy core-site.xml and hdfs-site.xml from hadoop to the hbase/conf directory.
6. Clear the files under the hbase.rootdir and hbase.tmp.dir directories, and delete the version-2 folder in the dataDir directory set in zookeeper/conf/zoo.cfg. I did so, but the exception still remained.
7. Finally, after stop-hbase.sh, use zookeeper-3.4.14/bin/zkCli.sh to enter the zookeeper client, run ls / to view the zk registration information, run rmr /hbase to delete the existing hbase registration information, and then run start-hbase.sh to restart the hbase processes. This solved the problem (the commands are collected below).
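Collected as commands, step 7 looks like this (a sketch for my single-node paths; note that on zookeeper 3.6.x the old rmr command has been replaced by deleteall):

$ /usr/local/hbase1.4/bin/stop-hbase.sh
$ /usr/local/zookeeper/bin/zkCli.sh -server localhost:2181
[zk] ls /
[zk] deleteall /hbase     # on zookeeper 3.4.x this was: rmr /hbase
[zk] quit
$ /usr/local/hbase1.4/bin/start-hbase.sh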

The solution above is reprinted from https://blog.csdn.net/dream_bin/article/details/88343000

That covers all the pits I encountered.
If there are errors, please point them out in the comments. Any questions can also be posted in the comment area. Let us learn and make progress together.

Origin blog.csdn.net/lendsomething/article/details/106804146