Hadoop cluster setup + Hive installation

1. System environment

VMware-workstation:VMware-workstation-full-16.2.3

ubuntu: ubuntu-21.10

hadoop:hadoop2.7.2

mysql:mysql-connector-java-8.0.19

jdk: jdk-8u91-linux-x64.tar (make sure it is the Linux version, since the virtual machine runs a Linux system)

hive:hive1.2.1

Tips:

Right click to paste

2. Create a virtual machine

1. Just select typical

2. Import the ubuntu image file:

3. Remember the user name; it determines the file paths used below (hadoop is recommended)

4. For the disk size, the default 20GB is fine; use no less than 15GB, otherwise there may be problems when connecting remotely

5. Clone and generate child nodes slave1, slave2...

Choose "Create a full clone"

The password of the cloned virtual machine is the same as that of the original virtual machine

3. Modify the virtual machine

Modify hostname and hostname resolution files

Steps

1. sudo vi /etc/hostname

Just modify it to the name of your own virtual machine: master, slave1, etc.

(requires reboot to take effect)

The first time you use sudo, you need to enter a password, which is the password when creating a virtual machine

2. sudo vi /etc/hosts

Edit the host name resolution file, fill in the corresponding node addresses and corresponding host names

Paste text in the format below into the file (every node needs this)

192.168.40.133 master

192.168.40.132 slave1

First run ifconfig to check the current IP address

If the ifconfig command cannot be found, install it as the error prompt suggests (typically sudo apt install net-tools)

Problems

1. vim cannot input

Reason: Ubuntu installs the vim-tiny version by default and ships no old-style vi editor; vim-tiny is a minimized vim that contains only a small number of features

You can check whether only vim-tiny is installed

If so, just reinstall vim

Reinstall commands:

sudo apt-get remove vim-common (uninstall the old version)

sudo apt-get install vim (install the new version)

4. Set up SSH

1. ps -e | grep ssh: check whether the ssh service is installed (it is not installed initially)

If it is installed:

An sshd entry indicates that the ssh server is running

2. sudo apt-get install ssh: install ssh

3. The node generates a public key and private key pair:

ssh-keygen -t rsa: generate the key pair

cat .ssh/id_rsa.pub >> .ssh/authorized_keys: import the public key

cd .ssh && cat id_rsa.pub: view the public key

The .ssh directory should now contain id_rsa, id_rsa.pub and authorized_keys

This step needs to be performed by each node

4. The child node transmits the key to the master node

scp .ssh/id_rsa.pub hadoop@master:/home/hadoop/id_rsa1.pub (hadoop here is the user name chosen when creating the virtual machine)

5. The master node sets password-free login

cat id_rsa1.pub >> .ssh/authorized_keys (this is the key passed over from the child node)

6. The master node returns the child nodes:

scp .ssh/authorized_keys hadoop@slave1:/home/hadoop/.ssh/authorized_keys (use the name of each child node)

7. Verify ssh password-free login:

Execute ssh <hostname> (e.g. ssh master)

Verify that you can log in directly without entering a password

5. Configure the cluster

1. Create the following folders

/home/hadoop/data/namenode
/home/hadoop/data/datanode
/home/hadoop/temp

(They can also be created in the file manager with right click → New, or from the terminal as shown below.)
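
For example (the same paths are referenced by the configuration files below):

# create the namenode, datanode and temp directories
mkdir -p /home/hadoop/data/namenode /home/hadoop/data/datanode /home/hadoop/temp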

2. Place and decompress hadoop-2.7.2 and jdk1.8 under /home/hadoop on the master

If the archive cannot be dragged into the folder directly, it can be uploaded to the virtual machine from Windows with the following command:

First press Win+R on Windows and open the command line

Then:

scp E:\python+hive\hadoop-2.7.2.tar.gz hadoop@192.168.40.133:/home/hadoop

(format: Windows file path, then user name @ host IP : destination path)

Take hadoop as an example, other compressed packages are similar

After the jdk is passed over, it needs to be renamed to jdk1.8

3. Unzip the gz file:

tar zxvf hadoop-2.7.2.tar.gz

(Take Hadoop as an example)
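
The JDK is handled the same way; a sketch, assuming the archive is a .tar.gz and extracts to a jdk1.8.0_91 directory (the exact names may differ for your download):

# extract the JDK and rename the extracted directory to jdk1.8
tar zxvf jdk-8u91-linux-x64.tar.gz
mv jdk1.8.0_91 jdk1.8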

4. Modify some configuration files of the master:

  • Modify the hadoop-env.sh file:

export HADOOP_PREFIX=/home/hadoop/hadoop-2.7.2 (this is new, not in the original file)

export JAVA_HOME=/home/hadoop/jdk1.8

  • Modify the yarn-env.sh file:

export JAVA_HOME=/home/hadoop/jdk1.8

  • The following is the configuration of the xml files
Modify core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property> 
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value> 
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/temp</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>


Modify hdfs-site.xml:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/data/datanode</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>slave1:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>

Modify mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>

Modify yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>

Just copy and paste the above

Note: only replace the <configuration> section of the original files, nothing more. If a file exists only in .template form, just rename it by removing the .template suffix

6. Modify environment variables:

sudo vi /etc/profile: Modify environment variables

export JAVA_HOME=/home/hadoop/jdk1.8
export JRE_HOME=$JAVA_HOME/jre
export HADOOP_HOME=/home/hadoop/hadoop-2.7.2
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib:$JRE_HOME/lib:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.2.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.2.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

All nodes must be configured

Just copy and paste the block above (JAVA_HOME and JRE_HOME are defined first so the later variables expand correctly)

7. Copy hadoop:

scp -r hadoop-2.7.2/ hadoop@slave1:/home/hadoop: Copy from the master node to other nodes

8. Check the version:

source /etc/profile: The configuration of environment variables can take effect immediately (otherwise, the virtual machine needs to be restarted)

Then:

java -version and hadoop version

Check the JDK and Hadoop versions

If the corresponding version information is printed, the environment variables are configured correctly.

9. Start the cluster

hdfs namenode -format: format the namenode (this only needs to be done once)

start-all.sh: start hadoop cluster

If the output reports no errors, the startup succeeded

Cluster startup ends here

jps: view the running processes

If you can see the Hadoop processes (everything except RunJar, which only appears once Hive services are running), the cluster started successfully
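
As a rough guide (a sketch only; which node runs which daemon depends on your configuration, e.g. the SecondaryNameNode was placed on slave1 in hdfs-site.xml above):

# run on each node
jps
# expect to find, spread across the nodes: NameNode, DataNode,
# SecondaryNameNode, ResourceManager and NodeManager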

6. Install hive

Mysql

1. Install MySQL (in master)

sudo apt-get install mysql-server: install the server

sudo apt-get install mysql-client and sudo apt-get install libmysqlclient-dev: install the client

The default installation is MySQL8, which will automatically generate a password

sudo cat /etc/mysql/debian.cnf: View the initial password:

It is recommended to copy and save

2. Modify the configuration file:

sudo vim /etc/mysql/mysql.conf.d/mysqld.cnf

Comment out bind-address = 127.0.0.1

3. Login to the database:

mysql -u debian-sys-maint -p: log in to the database; when prompted, enter the default password found above

4. Create a user

create user 'hive'@'%' IDENTIFIED BY '123456';

grant all privileges on *.* to 'hive'@'%';

flush privileges;

Execute these in order; the created database user name is hive and the password is 123456

Hive

5. Upload apache-hive-1.2.1-bin.tar.gz, decompress and rename to hive-1.2.1

It can also be transferred from Windows, as before

Extract the MySQL driver package and put mysql-connector-java-8.0.19.jar into Hive's lib directory

(your driver version may differ)
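
A possible command sequence for this step (assuming the archive and the driver jar are already in /home/hadoop):

# extract Hive, rename it, and drop the MySQL JDBC driver into its lib directory
tar zxvf apache-hive-1.2.1-bin.tar.gz
mv apache-hive-1.2.1-bin hive-1.2.1
cp mysql-connector-java-8.0.19.jar /home/hadoop/hive-1.2.1/lib/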

6. Hive environment variable configuration

sudo vi /etc/profile

export HIVE_HOME=/home/hadoop/hive-1.2.1
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib

All nodes must be configured

7. Modify the conf/hive-env.sh file

Add the following statement:

HADOOP_HOME=/home/hadoop/hadoop-2.7.2
export HIVE_CONF_DIR=/home/hadoop/hive-1.2.1/conf

If only hive-env.sh.template exists, copy it and remove the .template suffix

(the .template files are just templates)
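
One way to do that (paths from the earlier steps):

cd /home/hadoop/hive-1.2.1/conf
cp hive-env.sh.template hive-env.sh   # then add the two lines above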

8. Modify the hive configuration file:

① Execute the following commands to create the HDFS storage paths:

hdfs dfs -mkdir -p /hive/warehouse
hdfs dfs -mkdir -p /hive/logs
hdfs dfs -mkdir -p /hive/tmp
hdfs dfs -chmod 733 /hive/warehouse 
hdfs dfs -chmod 733 /hive/logs 
hdfs dfs -chmod 733 /hive/tmp

Check for success:
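
For example, list the directories and their permissions:

hdfs dfs -ls /hive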

② Create a local directory:

mkdir -p /home/hadoop/hive-1.2.1/hivedata/logs

③ Configure the .xml file:

cd /home/hadoop/hive-1.2.1/conf: change to the conf folder

Configure hive-site.xml:
cp hive-default.xml.template hive-site.xml: first make a copy named hive-site.xml
Then modify hive-site.xml:

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/stock?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false&amp;serverTimezone=GMT&amp;allowPublicKeyRetrieval=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/hive/warehouse</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/hive/tmp</value>
</property>
<!-- "stock" in the ConnectionURL is the metastore database name; choose your own. The URL is long, so copy the whole value, and note that & must be written as &amp; inside the XML. -->


Configure log4j:
First copy the two template files:
cp hive-exec-log4j.properties.template hive-exec-log4j.properties
cp hive-log4j.properties.template hive-log4j.properties

Then modify the configuration:
hive.log.dir=/home/hadoop/hive-1.2.1/logs
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
(both files, hive-exec-log4j.properties and hive-log4j.properties, must be modified)

9. Start Hive on the master node:

schematool -dbType mysql -initSchema: initialize the metastore schema in MySQL

If this fails:

You can run df -hl to check the current disk usage; if the disk is nearly full:

Try: right-click the virtual machine → Settings → Hard Disk → Expand, increase the capacity → then restart the virtual machine

If it doesn’t work, go online to search for error messages.

Note: the schema only needs to be initialized once

An error message at this point just means it cannot be initialized again; it has no effect on normal use.

hive: start the Hive CLI

If the hive> prompt appears, the start succeeded

Statements in Hive are written much like in MySQL

You can exit Hive with exit;
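
As a quick non-interactive test from the shell (hive -e runs a single statement and exits):

hive -e "show databases;"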

10. Remote connection configuration (child node slave1):

Child node configuration:

① Copy the hive-1.2.1 directory on the master to other nodes

scp -r hive-1.2.1/ hadoop@slave1:/home/hadoop

② Modify the hive-site.xml file and delete the following configuration:

• javax.jdo.option.ConnectionURL

• javax.jdo.option.ConnectionDriverName

• javax.jdo.option.ConnectionUserName

• javax.jdo.option.ConnectionPassword

Then add the following configuration:

<property>
<name>hive.metastore.uris</name>
<value>thrift://192.168.149.128:9083</value>
</property>
<!-- 192.168.149.128 is the metastore host (the master); change it to your own address. 9083 is the default port; do not change it. -->

<property>
<name>hive.server2.thrift.bind.host</name>
<value>**.**.**.**</value> <!-- host address -->
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value> <!-- the default port can be used -->
</property>

The remote connection configuration ends here

11. Make a remote connection:

The master starts the metastore: hive --service metastore &

& means background start

Running jps, you can see its RunJar process

Child node connection: just execute hive

The master starts hiveserver2: hive --service hiveserver2 &

Child node connection: just execute beeline -u jdbc:hive2://192.168.40.133:10000/stock1 -n root

Change the host address and database name to your own; do not change port 10000

Both services can run at the same time; the second (hiveserver2) is the one generally used more

Then you can perform hive operations on the child nodes

7. Example of hive operation:


Process: create a table → organize the data → load the data into Hive → connect to Hive via JDBC and retrieve the data

① Table creation statement:

create table fortest(time_ String,begin_ FLOAT,end_ FLOAT,low_ FLOAT,high_ FLOAT) row format delimited fields terminated by ',';

② Data format:

Columns are separated by "," and records are separated by newlines

③ Load data:

LOAD DATA LOCAL INPATH "/home/hadoop/test.txt" INTO TABLE fortest;

You need to copy the .txt file into a folder on the virtual machine first (a direct copy-paste works)
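
A quick way to confirm the rows were loaded (assuming the table was created in the current, default database):

hive -e "select * from fortest limit 3;"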

④ JDBC configuration

Maven coordinates to import:

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.2.1</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.2</version>
</dependency>

Configuration file:

jdbc_driver=org.apache.hive.jdbc.HiveDriver
jdbc_url=jdbc:hive2://192.168.40.133:10000/stock1
jdbc_username=hive
jdbc_password=123456
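
Before wiring this into Java code, the same jdbc_url can be sanity-checked with beeline (this assumes hiveserver2 from the previous section is running; adjust the IP and database name to yours):

beeline -u "jdbc:hive2://192.168.40.133:10000/stock1" -n root -e "show tables;"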
