[Ubuntu+Big Data] Big data development in Linux (Hadoop, MySQL, IDEA, JDK, Hive)

1. Setting up the big data development [system environment] in Ubuntu:

1. Developing inside a single Linux virtual machine (not recommended):

If you develop inside an existing Linux system, it is recommended to set up a separate user for development work (to avoid interference).

Why create new users:
Users are independent of each other, just like different cloned systems; creating a new user lets us do our development work independently.

However, this method is not recommended for the following reason:

Tip: setting up a second user inside the same virtual machine for big data development is not a good idea, because if the system crashes, every user on that machine is affected. It is better to open a new system and carry out the big data development there independently.

Although it is not recommended, the process is written down below for your reference.


(1) Create a new user from the command line (optional):
1. Create a new user (replace username with the name you want):

sudo useradd -m username -s /bin/bash

2. Set user password:

sudo passwd username

3. Add administrator rights to the user (add the user to the "sudo" group):

sudo adduser username sudo
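
If you want to double-check that the account exists and was added to the sudo group, a quick optional check (a sketch; username is the account you just created):

# the output should list the user's uid and groups, including "sudo"
id username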

(2) Create a new user in the system settings (optional):
First find [Users] in the settings, unlock it, and then click [Add] in the upper right corner to add the new user; set it as an administrator to facilitate our development.
Then restart the virtual machine, select the newly created user, and log in as that user (it feels just like a new system).


The above method is for reference only, but is not recommended. You should choose according to your actual situation.

2. Developing in an independent Linux virtual machine (recommended):

Open another Ubuntu system in the virtual machine software, and then do the development in this new system.

In this "new" system, we also need to:
(1) Update the apt tool

sudo apt-get update

(2) Use the apt tool to install the Vim editor (used for writing code):

sudo apt-get install vim

(3) Install the ssh server (the client should be installed by default)

sudo apt-get install openssh-server
# The ssh client is installed by default, so we only need to install the server

(4) After installation, use the ssh command to try to log in (see if the installation is successful):

ssh localhost

If you are asked to confirm the host fingerprint, just enter yes, and then enter your password as prompted to log in.
(5) Log out:

exit



Set up passwordless login to the ssh server:
After exiting:
(1) Use the ssh-keygen command to generate a key pair:

cd ~/.ssh/
ssh-keygen -t rsa

Then just keep pressing enter.

(2) Add the key to the authorization:

cat ./id_rsa.pub >> ./authorized_keys

If you are not sure how to do this, simply run the two commands above in order.
Done! Now you can log in to ssh without entering a password.
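
If ssh still asks for a password, one extra thing worth checking (an optional step, not part of the original write-up) is the permissions on the key files, because sshd ignores an authorized_keys file whose permissions are too open:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
# then test again; this should log you in without prompting for a password
ssh localhost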


2. [Java environment] JDK download, installation and troubleshooting:

The big data framework relies on the Java language, so a Java development environment needs to be built in the system:

Hyperlink, click me ==》
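
For quick reference, the environment variables usually added to ~/.bashrc for the JDK look roughly like the following (a sketch only; /usr/lib/jvm/jdk1.8.0_301 is an assumed install directory, adjust it to the JDK version and path you actually use):

# assumed JDK install directory, change it to match your own
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_301
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${PATH}:${JAVA_HOME}/bin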

3. [Big Data Framework Environment] Download and install the Hadoop framework

1. Search for Hadoop (on Baidu or elsewhere), open the official download page and download the binary package:

Then:
(1) Unzip and install:

Note that the -C option is uppercase. If your Downloads directory name is in Chinese (下载), use:

sudo tar -zxf ~/下载/hadoop-3.3.1.tar.gz -C /usr/local

otherwise:

sudo tar -zxf ~/Downloads/hadoop-3.3.1.tar.gz -C /usr/local

(2) Modify the file name:

cd /usr/local
sudo mv ./hadoop-3.3.1/  ./hadoop

(3) Change the ownership of the directory to your user (replace username with your login name):

sudo chown -R username ./hadoop

(4) Check the current version (and check whether the installation is successful):

./hadoop/bin/hadoop version

(5) Create a new folder (input) to facilitate management of our files:

cd /usr/local/hadoop
mkdir input

(6) Copy the configuration files into the new folder (they will serve as the test input):

cp ./etc/hadoop/*.xml  ./input

(7) Run a Grep instance to check whether the installation is successful:

./bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar grep ./input ./output 'dfs[a-z.]+'

(8) View results:

cat ./output/*
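
Note that Hadoop refuses to run the example a second time if the output directory already exists; to re-run it, delete the local output folder first (run this from /usr/local/hadoop):

rm -r ./output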

2. Hadoop pseudo-distributed installation (everything runs on a single Linux machine):

(1) Enter the Hadoop installation directory (/usr/local/hadoop).
(2) The configuration files live in the etc/hadoop directory; the two files to modify are core-site.xml and hdfs-site.xml.

(3) Configure them as follows (you can copy this directly, but pay attention to your own version name and paths). First, core-site.xml:

<configuration>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>file:/usr/local/hadoop/tmp</value>
		<description>Abase for other temporary directories</description>
	</property>
		
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://localhost:9000</value>
	</property>
</configuration>

This is the second configuration file, hdfs-site.xml:

<configuration>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
		
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>file:/usr/local/hadoop/tmp/dfs/name</value>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file:/usr/local/hadoop/tmp/dfs/data</value>
	</property>
</configuration>

(4) Save the modifications to the above two configuration files.

Then,
(5) Initialize (format) the distributed file system (HDFS):

cd /usr/local/hadoop
./bin/hdfs namenode -format

Then
(6) Start the distributed file system (HDFS).
If you are not already in the Hadoop directory, cd into it first:
cd /usr/local/hadoop

./sbin/start-dfs.sh

(7) After startup, enter jps to view the currently running Java processes; besides Jps itself you should see NameNode, DataNode and SecondaryNameNode.
[Optional] In addition, you can run a command of the following form to list the file system commands available on HDFS (see the example below):
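
For example, the -help option of the dfs subcommand prints the full list of HDFS shell commands (run from /usr/local/hadoop):

./bin/hdfs dfs -help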


Next: working on the HDFS side (the distributed file system):

1. On the HDFS side, use the hdfs command to create a directory:

./bin/hdfs dfs -mkdir -p /user/hadoop/input

2. Use the put command to upload the local configuration files to the newly created folder on the HDFS side (the distributed file system):

./bin/hdfs dfs -put ./etc/hadoop/*.xml /user/hadoop/input
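
As an optional check, you can list the directory on HDFS to confirm the upload succeeded:

./bin/hdfs dfs -ls /user/hadoop/input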

3. Run the Grep instance to test:

./bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar grep /user/hadoop/input /user/hadoop/output 'dfs[a-z.]+'

4. View the results:

./bin/hdfs dfs -cat /user/hadoop/output/*


Note: Hadoop will not run the example again if the output directory already exists, so when you need to re-run the program you must first delete the output folder previously generated on the HDFS side (delete the input folder in the same way if you also want to re-upload the files).
The command to delete a folder on the distributed file system side is as follows:

./bin/hdfs dfs -rm -r /user/hadoop/output

Stop the hdfs side from running (if needed):

/usr/local/hadoop/sbin/stop-dfs.sh

4. Install IDEA (Java IDE) under Linux

Because you will be writing Java code, this Java integrated development environment (IDE) is essential:

First of all, you can also install it from the software store; that is trivial, so I will not go into detail here.

The following only introduces the official website installation method:

(1) First, search for IDEA, go to the official website and download the Community edition (no need to say more about this).
(2) Click download and wait for it to finish.

(3) The download is a tar.gz compressed package, so the first step is to decompress it. Here we decompress it directly into /usr/local:

sudo tar -zxf ~/Downloads/ideaIC-2021.2.3.tar.gz -C /usr/local

If your Downloads directory name is in Chinese (下载), change the command to:

sudo tar -zxf ~/下载/ideaIC-2021.2.3.tar.gz -C /usr/local


(4) After decompressing, enter its directory; you will find that it ships with installation instructions.
(5) Open them and you will see that they tell you to run ./idea.sh.
That is easy: just run the script by its full path:

/usr/local/idea-IC-212.5457.46/bin/idea.sh

You will see the setup interface pop up.
Usually clicking Next through the steps is fine.
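
To avoid typing the long path every time, one convenient option (not part of the original steps; adjust the idea-IC directory name to the version you downloaded) is to create a symlink on the PATH:

sudo ln -s /usr/local/idea-IC-212.5457.46/bin/idea.sh /usr/local/bin/idea
# afterwards IDEA can be started simply with
idea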

5. [Hive environment] installation:

1. Go to the official download page: http://www.apache.org/dyn/closer.cgi/hive/
2. Enter the download page:
3. Select a version:
4. Download its bin binary version:

5. Unzip the .tar.gz compressed package to /usr/local and then rename the directory, just as we did for Hadoop:
(1) Unzip and install:

Note that the -C option is uppercase. If your Downloads directory name is in Chinese (下载), use:

sudo tar -zxf ~/下载/apache-hive-1.2.2-bin.tar.gz -C /usr/local

otherwise:

sudo tar -zxf ~/Downloads/apache-hive-1.2.2-bin.tar.gz -C /usr/local

(2) Modify the file name (the original name is too long):

cd /usr/local
sudo mv ./apache-hive-1.2.2-bin/  ./hive

(3) Change the ownership of the directory to your user (replace username with your login name):

sudo chown -R username ./hive

(4) To make it easier to use commands, set environment variables:
Open the configuration file:

vim ~/.bashrc

Enter the environment variables, save and exit:
(If you don’t know, please refer to: Link》》》 )

# define the Hive installation path
export HIVE_HOME=/usr/local/hive
# add its bin directory to the PATH environment variable
export PATH=${PATH}:${HIVE_HOME}/bin
# Hadoop also needs a path
export HADOOP_HOME=/usr/local/hadoop

Then make the configuration take effect immediately:

source ~/.bashrc

Then type the hive command to check that it is now found on the PATH.

(5) Create the configuration file hive-site.xml under /usr/local/hive/conf.
First execute the following command:

cd /usr/local/hive/conf

Rename the default configuration file by deleting the trailing .template:

mv hive-default.xml.template hive-default.xml

Of course, you can also rename it in the graphical file manager instead of using the command line.

Create a new configuration file hive-site.xml using the vim editor:

cd /usr/local/hive/conf
vim hive-site.xml

Add the following configuration information to hive-site.xml:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>

Then type :wq to save and exit.

6. [MySQL]

1. Installation:

sudo apt-get install mysql-server

2. Start mysql

service mysql start

3. Switch to the root user (if root does not yet have a password, run sudo passwd root to set one first):

su

4. Log in to the mysql shell interface

mysql -u root -p 


5. Create a new hive database.
At the mysql> prompt:

create database hive; 

# This hive database corresponds to the hive in localhost:3306/hive in hive-site.xml and is used to store Hive's metadata

6. Configure MySQL to allow the hive user to access it.
At the mysql> prompt:

grant all on *.* to hive@localhost identified by 'hive'; 

# Grant all permissions on all tables in all databases to the hive user; the trailing 'hive' is the connection password configured in hive-site.xml.
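
Note: on newer MySQL versions (8.0 and later), the GRANT ... IDENTIFIED BY syntax above is no longer accepted; in that case, create the user first and then grant the privileges, for example:

CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'localhost';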
7. Refresh the MySQL privilege tables.
At the mysql> prompt:

flush privileges; 
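
One more prerequisite before starting Hive: the com.mysql.jdbc.Driver class referenced in hive-site.xml is provided by MySQL Connector/J, so the connector jar has to be placed in Hive's lib directory. A sketch, assuming the jar (here mysql-connector-java-5.1.40-bin.jar; your file name may differ) has already been downloaded to ~/Downloads:

cp ~/Downloads/mysql-connector-java-5.1.40-bin.jar /usr/local/hive/lib/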

8. Start hive

(1) Start the hadoop cluster first:

start-all.sh

(2) Then start Hive:

hive 

Origin blog.csdn.net/zhinengxiong6/article/details/121372560