1 Download Hadoop
Official website: http://hadoop.apache.org/
2 Install Hadoop
The official document: http://hadoop.apache.org/docs/r3.1.0/hadoop-project-dist/hadoop-common/SingleCluster.html
Extract the hadoop archive into the /app folder:
cd /opt
tar -zxvf hadoop-3.1.0.tar.gz -C /app
Let's switch to the /app directory and rename the hadoop folder (this step is optional):
cd /app
mv hadoop-3.1.0 hadoop3.1
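If you want to see what the tar flags do before touching the real archive, the extraction can be rehearsed on a throwaway archive in a temporary directory. This is a self-contained sketch (the dummy archive and temp paths are illustrative, not from the guide); your real file is hadoop-3.1.0.tar.gz in /opt.

```shell
# Self-contained demo of tar -zxvf ... -C: build a dummy .tar.gz in a temp
# directory, then extract it into a separate "app" directory, exactly as the
# real command extracts hadoop-3.1.0.tar.gz into /app.
tmp=$(mktemp -d)
mkdir -p "$tmp/hadoop-3.1.0"
echo "demo" > "$tmp/hadoop-3.1.0/README.txt"
tar -czf "$tmp/hadoop.tar.gz" -C "$tmp" hadoop-3.1.0   # pack (-c) with gzip (-z)
mkdir -p "$tmp/app"
tar -zxvf "$tmp/hadoop.tar.gz" -C "$tmp/app"           # unpack (-x) into the -C target
ls "$tmp/app/hadoop-3.1.0"
```

The -z flag selects gzip compression, -x extracts, -v lists files as they are processed, -f names the archive, and -C sets the destination directory.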
2.1 Configure Hadoop environment
Next we begin to configure the Hadoop development environment.
We will build a single-node cluster with a pseudo-distributed configuration. Why not fully distributed? A distributed configuration is almost identical to the pseudo-distributed one, except that more machines are involved; nothing else differs. A pseudo-distributed setup is therefore the better choice for learning Hadoop, and we will build a real distributed environment later.
2.1.1 Set up SSH password-free login
We need to log in to the host and slave frequently when operating the cluster later, so it is necessary to set up SSH password-free login.
Enter the following code:
ssh-keygen -t rsa -P ''
This generates a passwordless key pair. When asked for the save path, just press Enter; the key pair id_rsa and id_rsa.pub is stored in the ~/.ssh directory by default.
Next, append id_rsa.pub to the authorized keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Then modify the permissions:
chmod 600 ~/.ssh/authorized_keys
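If you'd like to rehearse these steps without touching your real ~/.ssh, the whole flow works in a throwaway directory. The -f output path and temp directory below are additions for the rehearsal; the flags otherwise match the commands above.

```shell
# Rehearse the key setup in a temp dir (does NOT affect ~/.ssh):
# -t rsa = key type, -P '' = empty passphrase, -f = output path, -q = quiet
dir=$(mktemp -d)
ssh-keygen -t rsa -P '' -f "$dir/id_rsa" -q
cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"   # authorize the public key
chmod 600 "$dir/authorized_keys"                  # sshd requires strict permissions
ls "$dir"
```

The same three files (private key, public key, authorized_keys) are what the real commands create under ~/.ssh.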
Then you need to enable RSA authentication and the public/private key pairing authentication method:
vim /etc/ssh/sshd_config
If it reports insufficient permissions, prefix the command with sudo. Modify the ssh configuration as follows:
RSAAuthentication yes # enable RSA authentication
PubkeyAuthentication yes # enable public/private key pairing authentication
AuthorizedKeysFile %h/.ssh/authorized_keys # path to the public key file
Restart SSH. (You can restart it in your own local virtual machine, but you cannot, and do not need to, restart it on the platform: after a restart you would no longer be able to connect to the command line!)
service ssh restart
2.1.2 Configure Hadoop files
There are 6 files to configure in total:
hadoop-env.sh;
yarn-env.sh ;
core-site.xml;
hdfs-site.xml;
mapred-site.xml;
yarn-site.xml.
2.1.3 hadoop-env.sh configuration
The two env.sh files mainly configure the location of the JDK.
Tip: if you have forgotten where the JDK is, run echo $JAVA_HOME to find out.
First we switch to the hadoop configuration directory:
cd /app/hadoop3.1/etc/hadoop/
Edit hadoop-env.sh and insert the following code into the file:
# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/app/jdk1.8.0_171
2.1.4 yarn-env.sh configuration
Edit yarn-env.sh and insert the following code:
export JAVA_HOME=/app/jdk1.8.0_171
2.1.5 core-site.xml configuration
This is the core configuration file. In it we need to add the HDFS URI and the location of the NameNode's temporary folder (this temporary folder is created below). Add the following code into the configuration tag at the end of the file:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>HDFS URI: filesystem://namenode-id:port</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hadoop/tmp</value>
<description>Local Hadoop temporary folder on the namenode</description>
</property>
</configuration>
2.1.6 hdfs-site.xml file configuration
replication refers to the number of replicas; since we currently have a single node, it is 1.
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/usr/hadoop/hdfs/name</value>
<description>Where the namenode stores HDFS namespace metadata</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/usr/hadoop/hdfs/data</value>
<description>Physical storage location of data blocks on the datanode</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
2.1.7 mapred-site.xml file configuration
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
2.1.8 yarn-site.xml configuration
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>192.168.2.10:8099</value>
<description>Address of the MapReduce management web UI</description>
</property>
</configuration>
2.1.9 Create folder
We configured some folder paths in the configuration files above; now we have to create them. Operating as the hadoop user, create the tmp, hdfs/name, and hdfs/data directories under /usr/hadoop/ by executing the following commands:
mkdir -p /usr/hadoop/tmp
mkdir /usr/hadoop/hdfs
mkdir /usr/hadoop/hdfs/data
mkdir /usr/hadoop/hdfs/name
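The four commands above can also be collapsed into a single one, since mkdir -p creates any missing parent directories as it goes:

```shell
# One-liner equivalent: -p creates /usr/hadoop and any missing parents,
# so neither /usr/hadoop nor /usr/hadoop/hdfs needs to exist beforehand
mkdir -p /usr/hadoop/tmp /usr/hadoop/hdfs/name /usr/hadoop/hdfs/data
ls -R /usr/hadoop
```

The ls -R at the end just lists the resulting tree so you can confirm all three directories exist.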
2.2 Add Hadoop to environment variables
vim /etc/profile
Insert the following code at the end of the file:
export HADOOP_HOME=/app/hadoop3.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Finally, make the changes take effect:
source /etc/profile
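To check that the variables took effect, this sketch re-applies the two exports in the current shell and verifies that the bin directory is on PATH. (Once Hadoop is fully installed, running hadoop version is the more direct confirmation.)

```shell
# Re-apply the two exports and confirm PATH now contains $HADOOP_HOME/bin
export HADOOP_HOME=/app/hadoop3.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
echo "HADOOP_HOME=$HADOOP_HOME"
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "PATH ok" ;;
  *) echo "PATH missing hadoop bin" ;;
esac
```

The case pattern wraps PATH in colons so the check matches the whole entry rather than a substring of some other path.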
2.3 Verification
Now that the configuration work is basically done, the remaining steps are to:
- format HDFS,
- start hadoop,
- verify Hadoop.
2.4 Format
Before using Hadoop for the first time, we need to format the NameNode, which initializes HDFS's basic metadata.
Use the following command:
hadoop namenode -format
(In Hadoop 3 this form still works but is deprecated; hdfs namenode -format is the preferred equivalent.)
2.5 Start up
Next we start Hadoop:
start-dfs.sh
Enter the command; the interface shown in the figure below should appear:
This means the startup was unsuccessful, because the root user cannot start hadoop yet. Let's fix that.
Switch to the /app/hadoop3.1/sbin directory:
cd /app/hadoop3.1/sbin
Add the following parameters at the top of both start-dfs.sh and stop-dfs.sh:
#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Similarly, add the following at the top of start-yarn.sh and stop-yarn.sh:
#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
Run start-dfs.sh again, then enter the jps command to verify; the interface below indicates a successful start:
If you have a graphical interface, open the Firefox browser inside your virtual machine and visit http://localhost:9870/, or on a Windows machine enter http://&lt;virtual machine IP address&gt;:9870/ to access the hadoop management page.