Build a Hadoop environment on Ubuntu (stand-alone mode + pseudo-distributed mode)
1. Install Ubuntu virtual machine
Ubuntu download address: the official Ubuntu website
The steps below use Ubuntu installed in a VMware virtual machine. The installer is essentially point-and-click, so installation is straightforward.
2. Add hadoop user to system user
After installing Ubuntu, we can officially start configuring Hadoop. First, add a user named hadoop to the system, dedicated to Hadoop testing:
~$ sudo addgroup hadoop
~$ sudo adduser --ingroup hadoop hadoop
Then enter the following command so that the newly created hadoop account can use sudo:
~$ sudo gedit /etc/sudoers
In the opened file, add the following line under root.
hadoop ALL=(ALL:ALL) ALL
Pay attention here: if you type this line by hand, the whitespace after hadoop is a Tab character (the keyboard's Tab key, often written as \t). A malformed /etc/sudoers can break sudo for the whole system, so `sudo visudo`, which validates the file before saving, is the safer way to edit it.
Then save and exit
3. Install ssh
Next, configure the ssh service so that the system can log in remotely.
Install ssh first
~$ sudo apt-get install openssh-server
After ssh installation is complete, start the service first
~$ sudo /etc/init.d/ssh start
After startup, you can use the following command to check whether the service is started correctly
~$ ps -e | grep ssh
Next, set up passwordless login by generating a private/public key pair:
~$ ssh-keygen -t rsa -P ""
Because I already have a private key, I am prompted whether to overwrite it; on the first run you are instead prompted for a passphrase, and pressing Enter leaves it empty. Two files are then generated under /home/{username}/.ssh (i.e. ~/.ssh): id_rsa, the private key, and id_rsa.pub, the public key. Now append the public key to authorized_keys (authorized_keys holds the public keys of all clients allowed to log in over ssh as the current user):
~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
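If passwordless login still prompts for a password, overly permissive file modes are a common cause: sshd refuses to use authorized_keys unless ~/.ssh and its files are restricted to the owner. A minimal sketch of the expected modes, run against a throwaway directory so your real keys are untouched:

```shell
# sshd requires strict permissions on the key directory and files.
# Demonstrated on a temporary directory so real keys are untouched.
d=$(mktemp -d)
mkdir "$d/.ssh"
touch "$d/.ssh/authorized_keys"
chmod 700 "$d/.ssh"                   # directory: owner-only
chmod 600 "$d/.ssh/authorized_keys"   # file: owner read/write only
stat -c '%a' "$d/.ssh" "$d/.ssh/authorized_keys"   # prints 700, then 600
rm -rf "$d"
```

Applying the same `chmod 700 ~/.ssh` and `chmod 600 ~/.ssh/authorized_keys` to the real directory is harmless and often fixes the prompt.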
You can then log in over ssh to confirm that no password is required. If jps misbehaves in a later step, a likely cause is that passwordless ssh login was not set up correctly.
~$ ssh localhost
Exit:
~$ exit
Then we log in again and try to see if the passwordless login is successful
~$ ssh localhost
If it logs in as before without prompting, we will not need a password for ssh from now on.
4. Install java environment
~$ sudo apt-get install openjdk-8-jdk
~$ java -version
Here openjdk-8-jdk comes from Ubuntu's own package repositories; if the installation succeeded, `java -version` prints output like the figure below (the exact text depends on your JDK version).
5. Install hadoop
Download Hadoop from the Hadoop official website,
then unzip it and put it in the directory you want; mine is in /usr/local/hadoop
~$ sudo tar xzf hadoop-2.10.0.tar.gz
~$ sudo mv hadoop-2.10.0 /usr/local/hadoop
(This guide uses Hadoop 2.10.0; adjust the file name to the version you downloaded.)
Next, give relevant permissions to the hadoop user created earlier, otherwise the hadoop user cannot configure the files under this directory.
~$ sudo chown -R hadoop:hadoop /usr/local/hadoop
6. Configure stand-alone mode
Next, add the Hadoop environment variables to ~/.bashrc. Enter the following command (no sudo needed, since the file belongs to your user):
~$ gedit ~/.bashrc
The system will open Gedit, and append the following to the end of the file
#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END
Then use the following command to make the configuration take effect.
~$ source ~/.bashrc
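To sanity-check what the two PATH lines above do, the same expansion can be tried in a throwaway shell (HADOOP_INSTALL here is just the value set in ~/.bashrc):

```shell
# Each line appends one directory to the existing PATH.
HADOOP_INSTALL=/usr/local/hadoop
PATH=$PATH:$HADOOP_INSTALL/bin
PATH=$PATH:$HADOOP_INSTALL/sbin
# The last two PATH entries are now the Hadoop bin and sbin directories:
echo "$PATH" | tr ':' '\n' | tail -n 2
```

This is why `hadoop`, `start-all.sh`, and the other scripts become runnable from any directory after `source ~/.bashrc`.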
Then set hadoop-env.sh
~$ sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 (adjust to the Java installation path on your machine)
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:/usr/local/hadoop/bin
Make this configuration take effect with source:
~$ source /usr/local/hadoop/etc/hadoop/hadoop-env.sh
At this point, Hadoop's stand-alone mode is complete!
Run WordCount, the example that ships with Hadoop, to get a feel for the MapReduce process. Create a new input folder in the hadoop directory.
First, change to the hadoop directory
~$ cd /usr/local/hadoop
Then create the input directory
~$ mkdir input
If that fails, prefix the command with sudo for the needed permissions.
Copy the README.txt file into the input folder
~$ sudo cp README.txt input
Finally run the command
~$ sudo bin/hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.10.0-sources.jar org.apache.hadoop.examples.WordCount input output
Let's run and see the results
~$ cat output/*
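What the WordCount job computes can be sketched with ordinary shell tools: split the text into one word per line (the map step), then count identical words (the reduce step). This is only the logic in miniature, not how Hadoop actually runs it:

```shell
# Map: emit one word per line; Reduce: count identical words.
printf 'hello world\nhello hadoop\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c
```

`uniq -c` prefixes each distinct word with its count, which is exactly the shape of the `word<TAB>count` lines you see in `output/*`.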
7. Configure pseudo-distributed mode
First create a few folders in the hadoop directory
~$ sudo mkdir /usr/local/hadoop/tmp
~$ sudo mkdir /usr/local/hadoop/hdfs
~$ sudo mkdir /usr/local/hadoop/hdfs/name
~$ sudo mkdir /usr/local/hadoop/hdfs/data
Configure yarn-env.sh
~$ sudo gedit /usr/local/hadoop/etc/hadoop/yarn-env.sh
Add the following content:
#export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
Configure core-site.xml
~$ sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml
Insert the following content.
Note: be sure to delete the original empty <configuration></configuration> pair in core-site.xml first, so that the file contains exactly one <configuration>...</configuration> pair; otherwise an error will occur when formatting the namenode.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
</configuration>
Also modify the configuration file hdfs-site.xml,
inserting the following content (again, delete the original empty <configuration></configuration> pair first):
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/hdfs/data</value>
</property>
</configuration>
Similarly, configure yarn-site.xml
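The yarn-site.xml content is not shown above; a minimal pseudo-distributed setup typically needs only the shuffle auxiliary service, and on Hadoop 2.x a mapred-site.xml entry telling MapReduce to run on YARN is also commonly added. A sketch of both files (each edited the same way as core-site.xml, keeping a single <configuration> pair per file):

```xml
<!-- yarn-site.xml: enable the MapReduce shuffle auxiliary service -->
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

<!-- mapred-site.xml: run MapReduce jobs on YARN -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```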
Finally, enter the command to give the hadoop user access to the whole tree (777 is the blunt approach; `sudo chown -R hadoop:hadoop /usr/local/hadoop` is tighter):
~$ sudo chmod 777 -R /usr/local/hadoop
8. Finally, check whether Hadoop was installed successfully
To start HDFS in pseudo-distributed mode, first format the namenode. This only needs to be done once, during initial setup (reformatting later erases the data in HDFS):
~$ hdfs namenode -format
Start the HDFS and YARN daemons
~$ start-all.sh
List the running Java processes
~$ jps
If six processes appear (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and Jps itself), everything started correctly.
Enter http://localhost:50070/ in the browser, and the HDFS NameNode page shown below appears.
Enter http://localhost:8088/, and the YARN ResourceManager page shown below appears.