1. System environment
VMware Workstation: VMware-workstation-full-16.2.3
Ubuntu: ubuntu-21.10
Hadoop: hadoop-2.7.2
MySQL driver: mysql-connector-java-8.0.19
JDK: jdk-8u91-linux-x64.tar (use the Linux build, since the virtual machines run Linux)
Hive: hive-1.2.1
Tip:
In the VM terminal you can paste with a right click.
2. Create a virtual machine
1. Select the Typical installation.
2. Import the Ubuntu image file.
3. Remember the user name; it appears in the file paths below (hadoop is recommended).
4. For disk size the default 20 GB is fine; use at least 15 GB, or there may be problems later when connecting remotely.
5. Clone the master to generate the child nodes slave1, slave2, ...
Choose "Create a full clone".
The cloned virtual machines keep the same password as the original.
3. Modify the virtual machine
Modify the hostname and the hostname-resolution file.
Steps
1. sudo vi /etc/hostname
Change the content to the node's own name: master, slave1, etc.
(A reboot is required for the change to take effect.)
The first time you use sudo you must enter a password: the one set when creating the virtual machine.
2. sudo vi /etc/hosts
Edit the hostname-resolution file: fill in each node's address and the corresponding hostname.
Paste lines in the format below into the original file (every node needs this):
192.168.40.133 master
192.168.40.132 slave1
First run ifconfig to check the current ip address of each machine.
If the ifconfig command cannot be found, install it following the error prompt.
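As a sanity check, the lines to append to /etc/hosts can be generated by a tiny script first (a sketch; the IPs below are this guide's example addresses, so substitute the ones ifconfig reports on your machines):

```shell
# Print the /etc/hosts entries for a two-node cluster.
# Example IPs from this guide; replace with your own from ifconfig.
MASTER_IP=192.168.40.133
SLAVE1_IP=192.168.40.132
printf '%s master\n%s slave1\n' "$MASTER_IP" "$SLAVE1_IP"
```

On each node the output can then be appended with sudo tee -a /etc/hosts.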
Troubleshooting
1. vim cannot accept input
Reason: Ubuntu installs vim-tiny by default and does not ship the classic vi editor. vim-tiny is a minimized build of vim with only a small subset of features.
You can check the installed files in the folder; if they match the vim-tiny layout, just reinstall vim:
sudo apt-get remove vim-common
(uninstall the old version)
sudo apt-get install vim
(install the full version)
4. Set up SSH
1. ps -e | grep ssh
Check whether the ssh service is installed (it is not installed at first).
If it is installed, an sshd entry indicates that ssh-server is running.
2. sudo apt-get install ssh
Install ssh.
3. Each node generates a public/private key pair:
ssh-keygen -t rsa
(generate the pair)
cat .ssh/id_rsa.pub >> .ssh/authorized_keys
(import the node's own public key)
cd .ssh
cat id_rsa.pub
(view the public key)
This step must be performed on every node.
4. Each child node sends its key to the master node:
scp .ssh/id_rsa.pub hadoop@master:/home/hadoop/id_rsa1.pub
(hadoop here is the user name entered when creating the virtual machine)
5. The master node sets up password-free login:
cat id_rsa1.pub >> .ssh/authorized_keys
(id_rsa1.pub is the key sent from the child node)
6. The master node sends the merged file back to the child nodes:
scp .ssh/authorized_keys hadoop@slave1:/home/hadoop/.ssh/authorized_keys
(substitute each child node's name)
7. Verify password-free ssh login:
execute ssh <hostname> (e.g. ssh master)
and confirm that you can log in directly without entering a password.
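The whole key exchange in steps 3-6 can be summarized as a dry run that only prints the commands in order, annotated with the node each one runs on (a sketch; hadoop and slave1 are this guide's example user and node names):

```shell
# Dry run: print the passwordless-SSH setup sequence for one child node.
# Nothing here touches the cluster; run each printed command on the node indicated.
user=hadoop
slave=slave1
cat <<EOF
# on every node:
ssh-keygen -t rsa
cat .ssh/id_rsa.pub >> .ssh/authorized_keys
# on $slave:
scp .ssh/id_rsa.pub $user@master:/home/$user/id_rsa1.pub
# on master:
cat id_rsa1.pub >> .ssh/authorized_keys
scp .ssh/authorized_keys $user@$slave:/home/$user/.ssh/authorized_keys
EOF
```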
5. Configure the cluster
1. Create the following folders:
/home/hadoop/data/namenode /home/hadoop/data/datanode /home/hadoop/temp
(they can be created directly in the file manager: right click, New)
2. On the master, unpack hadoop-2.7.2 and jdk1.8 under the hadoop home directory.
If an archive cannot be dragged into the folder directly, it can be uploaded to the virtual machine from Windows with the following command:
First press Windows+R to open the command line, then:
scp E:\python+hive\hadoop-2.7.2.tar.gz hadoop@192.168.40.133:/home/hadoop
(path of the file on Windows, then user name @ host ip)
hadoop is shown as the example; the other archives are uploaded the same way.
After the jdk archive is transferred and unpacked, rename its directory to jdk1.8.
3. Unpack a .gz archive with:
tar zxvf hadoop-2.7.2.tar.gz
(hadoop shown as the example)
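The unpack-and-rename step can be rehearsed locally with a dummy archive (a sketch; jdk1.8.0_91 is a stand-in for whatever directory name the real jdk tarball unpacks to):

```shell
# Rehearse the unpack-and-rename step with a throwaway archive.
# In the real setup the archive is the jdk tarball and the target name is jdk1.8.
mkdir -p demo/jdk1.8.0_91             # stand-in for the directory inside the tarball
tar czf demo.tar.gz -C demo jdk1.8.0_91
tar zxvf demo.tar.gz                  # same flags as the guide uses
mv jdk1.8.0_91 jdk1.8                 # rename, as the guide requires
ls -d jdk1.8
```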
4. Modify the following configuration files on the master:
- Modify the hadoop-env.sh file:
export HADOOP_PREFIX=/home/hadoop/hadoop-2.7.2 (a new line; not in the original file)
export JAVA_HOME=/home/hadoop/jdk1.8
- Modify the yarn-env.sh file:
export JAVA_HOME=/home/hadoop/jdk1.8
- Then configure the xml files:
Modify core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/temp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
Modify hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/data/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>slave1:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
Modify mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
Modify yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>
Just copy and paste the above.
Note: replace only the <configuration> section of the original files, nothing beyond it. If a file only exists in .template form, rename it and delete the .template suffix.
6. Modify environment variables:
sudo vi /etc/profile
(edit the environment variables; add the lines below)
export JAVA_HOME=/home/hadoop/jdk1.8
export JRE_HOME=$JAVA_HOME/jre
export HADOOP_HOME=/home/hadoop/hadoop-2.7.2
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib:$CLASSPATH:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.2.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.2.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH
(JAVA_HOME and JRE_HOME must be defined before the lines that use them)
All nodes must be configured; a direct copy-paste is enough.
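What these additions do to PATH can be previewed in a throwaway subshell before touching /etc/profile (a sketch using the same paths as above):

```shell
# Evaluate the new variables in a subshell and show the front of the resulting PATH;
# nothing leaks into the current shell.
(
  export JAVA_HOME=/home/hadoop/jdk1.8
  export JRE_HOME=$JAVA_HOME/jre
  export HADOOP_HOME=/home/hadoop/hadoop-2.7.2
  export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH
  printf '%s\n' "$PATH" | tr ':' '\n' | head -4
)
```

The first four printed entries should be the jdk bin, jre bin, and the two hadoop directories.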
7. Copy hadoop:
scp -r hadoop-2.7.2/ hadoop@slave1:/home/hadoop
(copy from the master node to the other nodes)
8. Check the versions:
source /etc/profile
(makes the environment-variable configuration take effect immediately; otherwise the virtual machine must be restarted)
Then:
java -version
hadoop version
check the versions of the jdk and hadoop.
If the corresponding version information is printed, the environment variables are configured successfully.
9. Start the cluster
hdfs namenode -format
(format the NameNode; this only needs to be done once)
start-all.sh
(start the hadoop cluster)
If the startup messages report no errors, the start succeeded; cluster startup ends here.
jps
(view the Java processes)
If you can see all the expected processes (everything except RunJar), the cluster started successfully.
6. Install hive
MySQL
1. Install MySQL (on the master):
sudo apt-get install mysql-server
(server)
sudo apt-get install mysql-client
sudo apt-get install libmysqlclient-dev
(client)
By default MySQL 8 is installed, and an initial password is generated automatically.
sudo cat /etc/mysql/debian.cnf
(view the initial password; copying and saving it is recommended)
2. Modify the configuration file:
sudo vim /etc/mysql/mysql.conf.d/mysqld.cnf
Comment out the line bind-address = 127.0.0.1
3. Log in to the database:
mysql -u debian-sys-maint -p
(log in to the database; enter the default password from above)
4. Create a user:
create user 'hive'@'%' IDENTIFIED BY '123456';
grant all privileges on *.* to 'hive'@'%';
flush privileges;
Execute these in order; the database user created is named hive with password 123456.
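The three statements can also be kept in a small script and piped into the client instead of being typed one by one (a sketch; the password 123456 is just this guide's example):

```shell
# Emit the user-creation SQL; pipe it into mysql to execute it, e.g.:
#   sh this_script.sh | mysql -u debian-sys-maint -p
# The password 123456 is the guide's example; pick your own.
cat <<'SQL'
create user 'hive'@'%' IDENTIFIED BY '123456';
grant all privileges on *.* to 'hive'@'%';
flush privileges;
SQL
```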
Hive
5. Upload apache-hive-1.2.1-bin.tar.gz, unpack it, and rename the directory to hive-1.2.1.
(It can also be transferred from Windows, as before.)
Put the MySQL driver jar mysql-connector-java-8.0.19.jar (taken from the unpacked driver archive) into hive's lib directory.
(Your driver version may differ.)
6. Hive environment variable configuration:
sudo vi /etc/profile
export HIVE_HOME=/home/hadoop/hive-1.2.1
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib
All nodes must be configured.
7. Modify the conf/hive-env.sh file.
Add the following lines:
export HADOOP_HOME=/home/hadoop/hadoop-2.7.2
export HIVE_CONF_DIR=/home/hadoop/hive-1.2.1/conf
If a configuration file only exists as a .template, rename it (or copy it) without the .template suffix; the template is just a template.
8. Modify the hive configuration file:
① Execute the following commands to create the HDFS storage paths:
hdfs dfs -mkdir -p /hive/warehouse
hdfs dfs -mkdir -p /hive/logs
hdfs dfs -mkdir -p /hive/tmp
hdfs dfs -chmod 733 /hive/warehouse
hdfs dfs -chmod 733 /hive/logs
hdfs dfs -chmod 733 /hive/tmp
Check for success with hdfs dfs -ls /hive
② Create a local directory:
mkdir -p /home/hadoop/hive-1.2.1/hivedata/logs
③ Configure the .xml file:
cd /home/hadoop/hive-1.2.1/conf
(change to the conf folder)
Configure hive-site.xml:
cp hive-default.xml.template hive-site.xml
(first make a copy as hive-site.xml, then modify hive-site.xml)
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://master:3306/stock?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false&amp;serverTimezone=GMT&amp;allowPublicKeyRetrieval=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/hive/warehouse</value>
</property>
<property>
  <name>hive.exec.scratchdir</name>
  <value>/hive/tmp</value>
</property>
<!-- stock in the ConnectionURL is the database name; choose your own. The url value is long, so copy all of it, and note that each & in the URL must be written as &amp; inside the xml file. -->
Configure log4j: first copy the two template files:
cp hive-exec-log4j.properties.template hive-exec-log4j.properties
cp hive-log4j.properties.template hive-log4j.properties
Then modify the configuration:
hive.log.dir=/home/hadoop/hive-1.2.1/logs
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
<!-- Both files (hive-exec-log4j.properties and hive-log4j.properties) must be modified -->
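One pitfall in hive-site.xml: the ConnectionURL joins its parameters with &, and a raw & is not legal inside an xml value, so each one must be written as &amp;. A quick way to produce the escaped form (a sketch; the URL is shortened here for illustration):

```shell
# Escape the & separators in a JDBC URL so the value can be pasted into hive-site.xml.
url='jdbc:mysql://master:3306/stock?createDatabaseIfNotExist=true&characterEncoding=UTF-8&useSSL=false'
xml_value=$(printf '%s' "$url" | sed 's/&/\&amp;/g')
printf '%s\n' "$xml_value"
```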
9. Start hive on the master node:
schematool -dbType mysql -initSchema
(initialize the metastore database)
If initialization fails:
you can run df -hl to view the current disk usage; if the disk is too full:
virtual machine -> right click to open Settings -> find the hard disk -> Expand, increase the capacity -> restart the virtual machine.
If that does not work, search online for the error message.
Note: the schema can only be initialized once.
An error message on a second attempt only means it cannot be re-initialized repeatedly; it has no effect on normal use.
hive
(start the hive CLI)
If the hive> prompt appears, the start was successful.
Statements in hive are much the same as in mysql.
exit;
lets you exit hive.
10. Remote connection configuration (child node slave1):
Child node configuration:
① Copy the hive-1.2.1 directory on the master to other nodes
scp -r hive-1.2.1/ hadoop@slave1:/home/hadoop
② Modify the hive-site.xml file and delete the following configuration:
• javax.jdo.option.ConnectionURL
• javax.jdo.option.ConnectionDriverName
• javax.jdo.option.ConnectionUserName
• javax.jdo.option.ConnectionPassword
Then add the following configuration:
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://192.168.149.128:9083</value>
</property>
<!-- 192.168.149.128 is the master host's address; change it to your own. 9083 is the port; do not change it. -->
<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>**.**.**.**</value> <!-- host address -->
</property>
<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value> <!-- the default port works -->
</property>
The remote connection configuration ends here
11. Make a remote connection:
① The master starts the metastore:
hive --service metastore &
(& means start in the background)
Executing jps on the master should now show a RunJar process for it.
Child node connection: just execute hive
② The master starts hiveserver2:
hive --service hiveserver2 &
Child node connection: just execute
beeline -u jdbc:hive2://192.168.40.133:10000/stock1 -n root
Change the host address and database name to your own; do not change port 10000.
The two services can run at the same time; the second is generally used more.
After that you can perform hive operations on the child nodes.
7. Example of hive operations:
Process: create table -> organize data -> load data into hive -> retrieve the data from hive over JDBC
① Table creation statement:
create table fortest(time_ STRING, begin_ FLOAT, end_ FLOAT, low_ FLOAT, high_ FLOAT) row format delimited fields terminated by ',';
② Data format:
Columns are separated by "," and records are separated by newlines.
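A sample test.txt matching the fortest schema can be generated like this (the numbers are made-up illustration values; the real data file just needs the same shape):

```shell
# Write two sample records in the comma-delimited format the fortest table expects:
# time_, begin_, end_, low_, high_
cat > test.txt <<'EOF'
2022-01-04,16.30,16.55,16.20,16.60
2022-01-05,16.55,16.40,16.31,16.70
EOF
cat test.txt
```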
③ Load the data:
LOAD DATA LOCAL INPATH "/home/hadoop/test.txt" INTO TABLE fortest;
Copy the .txt file to the virtual machine folder first; a direct copy-paste is fine.
④JDBC configuration
Maven coordinates to import:
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>1.2.1</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.7.2</version>
</dependency>
Configuration file:
jdbc_driver=org.apache.hive.jdbc.HiveDriver
jdbc_url=jdbc:hive2://192.168.40.133:10000/stock1
jdbc_username=hive
jdbc_password=123456