Build a Hadoop environment on Ubuntu (stand-alone mode + pseudo-distributed mode)


1. Install Ubuntu virtual machine

Ubuntu download address: Ubuntu official website

The following shows Ubuntu being installed in a VMware virtual machine; the installation is simple enough to just follow the prompts step by step. (Installation screenshots omitted.)

2. Add a hadoop user to the system

With Ubuntu installed, we can start configuring Hadoop. First, add a user named hadoop to the system, dedicated to Hadoop testing:

    ~$ sudo addgroup hadoop
    ~$ sudo adduser --ingroup hadoop hadoop
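
A quick sanity check (my addition, not a step from the original post): confirm the new user and group exist before continuing.

    ~$ id hadoop

The output should show the hadoop user as a member of the hadoop group.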

Then enter the following command to give the newly created hadoop account sudo privileges:

~$ sudo gedit /etc/sudoers

In the opened file, add the following line below the root entry:

hadoop ALL=(ALL:ALL) ALL
Note that the whitespace after hadoop is a Tab: if you type this line by hand, hadoop must be followed by \t, i.e. the keyboard's Tab key. If this line is entered incorrectly, it can cause serious problems for the Ubuntu system.

Then save and exit
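
One safeguard worth adding here (not in the original post): a syntax error in /etc/sudoers can lock you out of sudo entirely, so validate the file after saving.

~$ sudo visudo -c

visudo -c parses the sudoers file and reports whether it is valid.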

3. Install ssh

Next, configure the ssh service so that the machine can be logged into remotely.

Install ssh first

~$ sudo apt-get install openssh-server

After the ssh installation completes, start the service:

~$ sudo /etc/init.d/ssh start

After startup, you can use the following command to check whether the service is started correctly

~$ ps -e | grep ssh
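
On a systemd-based Ubuntu (any recent release), an equivalent check is:

~$ sudo systemctl status ssh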

Next, set up passwordless login by generating a private/public key pair:

~$ ssh-keygen -t rsa -P ""

Because I already have a key pair, I am prompted whether to overwrite the current private key. On a first run you will instead be prompted for a passphrase; just press Enter. Two files are then generated under /home/{username}/.ssh: id_rsa (the private key) and id_rsa.pub (the public key). Now append the public key to authorized_keys, the file that holds the public keys of all clients allowed to log in to this account over ssh:

~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
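
sshd is strict about the permissions on these files; if passwordless login still prompts for a password, tightening them is the usual fix (a standard sshd requirement, added here for completeness):

~$ chmod 700 ~/.ssh
~$ chmod 600 ~/.ssh/authorized_keys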

Afterwards, log in over ssh to confirm: no password should be required. If jps misbehaves later on, it usually means this passwordless login setup failed.

~$ ssh localhost

Exit:

~$ exit

Then log in again to check that passwordless login works:

~$ ssh localhost

The result is the same as above, so from now on we can log in over ssh without a password.

4. Install the Java environment

    ~$ sudo apt-get install openjdk-8-jdk
    ~$ java -version
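
To find the JDK installation path (needed for JAVA_HOME below), you can trace the java binary through its symlinks; on a default Ubuntu install of openjdk-8-jdk this ends up under /usr/lib/jvm/java-8-openjdk-amd64:

    ~$ readlink -f $(which java)

Strip the trailing /jre/bin/java (or /bin/java) from the output to get JAVA_HOME.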

Here openjdk-8-jdk is the JDK from Ubuntu's own package repositories; after installation, java -version should report the installed OpenJDK version.

5. Install hadoop

Download Hadoop from the Hadoop official website, then extract it to the directory you want; mine is /usr/local/hadoop.

    ~$ sudo tar xzf hadoop-2.10.0.tar.gz
    ~$ sudo mv hadoop-2.10.0 /usr/local/hadoop

Next, give ownership of this directory to the hadoop user created earlier; otherwise the hadoop user cannot modify the files under it.

  ~$ sudo chown -R hadoop:hadoop /usr/local/hadoop

6. Configure stand-alone mode

Next, add the Hadoop environment variables to .bashrc. Enter the following command:

	~$ sudo gedit ~/.bashrc

The system will open gedit; append the following to the end of the file:
#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END
Then use the following command to make the configuration take effect:

	~$ source ~/.bashrc
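
A quick check that the new PATH works (assuming the tarball was extracted to /usr/local/hadoop as above) is to ask Hadoop for its version:

	~$ hadoop version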

Then set hadoop-env.sh

~$ sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 (depending on your machine's Java installation path)
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:/usr/local/hadoop/bin

Make the environment variable configuration take effect with source:

~$ source /usr/local/hadoop/etc/hadoop/hadoop-env.sh

At this point, the stand-alone mode of Hadoop is complete!
Run the WordCount example that ships with Hadoop to get a feel for the MapReduce process. We need a new input folder in the hadoop directory.
First, change into the hadoop directory:

	~$ cd /usr/local/hadoop

Then create the input directory

	~$ mkdir input

If this fails with a permissions error, prefix the command with sudo.

Copy the README.txt file into the input folder:

	~$ sudo cp README.txt input

Finally run the command

	~$ sudo bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.0.jar wordcount input output

Now let's look at the results:

	~$ cat output/*
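
Note that MapReduce refuses to run if the output directory already exists; to rerun the example, delete it first:

	~$ sudo rm -r output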


7. Configure pseudo-distributed mode

First create a few folders in the hadoop directory

    ~$ sudo mkdir /usr/local/hadoop/tmp
    ~$ sudo mkdir /usr/local/hadoop/hdfs
    ~$ sudo mkdir /usr/local/hadoop/hdfs/name
    ~$ sudo mkdir /usr/local/hadoop/hdfs/data
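
Equivalently, mkdir -p creates the whole tree in one command (just a shorter form of the same step):

    ~$ sudo mkdir -p /usr/local/hadoop/tmp /usr/local/hadoop/hdfs/name /usr/local/hadoop/hdfs/data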

Configure yarn-env.sh

	~$ sudo gedit /usr/local/hadoop/etc/hadoop/yarn-env.sh

Add the following content:
#export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
Configure core-site.xml:

	~$ sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml

Insert the following content.
Note: be sure to delete the empty <configuration></configuration> element that is already in core-site.xml, otherwise formatting will fail later. In other words, the .xml file must contain exactly one <configuration>...</configuration> pair.

<configuration>
	<property>
		<name>fs.default.name</name>
		<value>hdfs://localhost:9000</value>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/usr/local/hadoop/tmp</value>
		<description>Abase for other temporary directories.</description>
	</property>
</configuration>
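
A side note (not from the original post): fs.default.name is the legacy property name; Hadoop 2.x still accepts it but logs a deprecation warning in favor of fs.defaultFS. If you prefer the current name, that property would read:

<property>
	<name>fs.defaultFS</name>
	<value>hdfs://localhost:9000</value>
</property>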

Also modify the configuration file hdfs-site.xml and insert the following content (again, delete the original empty <configuration></configuration> element first):

<configuration>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>file:/usr/local/hadoop/hdfs/name</value>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file:/usr/local/hadoop/hdfs/data</value>
	</property>
</configuration>

Similarly, configure yarn-site.xml (the original post shows this step only as a screenshot; see the sketch below).
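
Since the original content survives only as a screenshot, here is a minimal reconstruction of the usual yarn-site.xml for a pseudo-distributed Hadoop 2.x setup. The yarn.nodemanager.aux-services property is the standard setting that enables the MapReduce shuffle service; if your screenshot or version shows more properties, follow it instead.

	~$ sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml

<configuration>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
</configuration>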

Finally, enter the following command to loosen the permissions on the hadoop directory (777 is very permissive, but convenient for a local test machine):

	~$ sudo chmod 777 -R /usr/local/hadoop

8. Check whether Hadoop was installed successfully

Start HDFS in pseudo-distributed mode.
Before the first start, format the namenode (this only needs to be done once; reformatting later wipes HDFS metadata):

~$ hdfs namenode -format

Start the Hadoop daemons:

	~$ start-all.sh
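
start-all.sh still works in Hadoop 2.x but prints a deprecation warning; the equivalent it suggests is to start HDFS and YARN separately:

	~$ start-dfs.sh
	~$ start-yarn.sh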

Show the running Java processes:

	~$ jps

If jps shows six processes (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and Jps itself), everything started correctly.
Enter http://localhost:50070/ in the browser, and the HDFS NameNode page appears.
Enter http://localhost:8088/, and the YARN ResourceManager page appears.

Origin: blog.csdn.net/WANG_g_m/article/details/106299439