Hadoop: building a pseudo-distributed environment on a single cloud host

The Hadoop framework is the foundation of big data. Big data should not be discussed from the technology side alone; the business scenarios that benefit from it matter just as much (the classic example being a supermarket that recommends beer to customers buying baby diapers). As a solutions manager you cannot afford to lack the technology side either, otherwise you risk coming across as all talk. :) In my personal view the split for a solutions manager is roughly 60% technology and 40% business, because when it comes to the business the customers usually understand it better than we do, which makes the technology all the more important. Earlier we discussed the overall architecture of a big data environment; today we deploy all of the Hadoop components on a single cloud host (or your own VMware virtual machine) to dig deeper into the technical details of big data.

First, the pseudo-distributed Hadoop environment

Pseudo-distributed means that the HDFS NameNode and DataNode both run on the same host. For this test we use Ubuntu 14.04 on a cloud host with 2 CPUs, 2 GB of memory, a 40 GB disk, and an elastic IP.

Second, create a hadoop account for easier operation

1. sudo useradd -m hadoop -s /bin/bash

This creates the hadoop account with a /home/hadoop home directory and /bin/bash as its login shell.

2. sudo passwd hadoop

This sets the password for the hadoop account.

3. sudo adduser hadoop sudo

This grants the hadoop user administrator (sudo) privileges, which makes the later deployment steps easier.

Third, set up passwordless SSH login for easier operation

1. sudo apt-get update

Update the package lists in preparation for installing the Java JDK later.

2. Generate a passwordless SSH key for the local machine, in preparation for Hadoop.

ssh localhost (after logging in successfully, run exit)

cd ~/.ssh/

ssh-keygen -t rsa

cat ./id_rsa.pub >> ./authorized_keys

Run ssh localhost again; if it logs in without asking for a password, passwordless login is working.
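
If the second ssh localhost still asks for a password, the usual cause is directory permissions; as an extra precaution (not a step from the original post), tightening them normally fixes it:

chmod 700 ~/.ssh

chmod 600 ~/.ssh/authorized_keys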

Fourth, install the Java environment

1. sudo apt-get install openjdk-7-jre openjdk-7-jdk (the JDK download is about 201 MB)

2. dpkg -L openjdk-7-jdk | grep 'bin/javac' shows that the Java installation path is /usr/lib/jvm/java-7-openjdk-amd64
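
As an alternative way to find the JDK directory (my own suggestion, not from the original post), you can resolve the javac symlink directly:

readlink -f /usr/bin/javac

This prints a path ending in /bin/javac; everything before /bin/javac is the value to use for JAVA_HOME.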

3. Configure the JAVA_HOME environment variable

vim ~/.bashrc

Add the following line at the beginning of the file:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

Save and exit with :wq

4. Make the environment variable take effect

source ~/.bashrc

5. Run java -version; if it prints the Java version, the installation is working.
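
As an extra sanity check (assuming the .bashrc edit above), you can also confirm that JAVA_HOME itself resolves:

echo $JAVA_HOME

$JAVA_HOME/bin/java -version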

Fifth, install Hadoop

1. Download hadoop-2.7.6.tar.gz from http://mirrors.cnnic.cn/apache/hadoop/common

2. Upload it to the cloud host with the rz command.

3. Install Hadoop

sudo tar -zxf hadoop-2.7.6.tar.gz -C /usr/local

cd /usr/local/

sudo mv ./hadoop-2.7.6/ ./hadoop (rename the directory)

sudo chown -R hadoop ./hadoop (change the owner of the files to the hadoop user)

4. Check the Hadoop version

cd /usr/local/hadoop

./bin/hadoop version
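
Optionally (my own addition, not a step from the original post), you can also put Hadoop on the PATH in ~/.bashrc so the commands can be run without the ./bin/ prefix:

export HADOOP_HOME=/usr/local/hadoop

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

After another source ~/.bashrc, hadoop version should work from any directory.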

Sixth, configure the Hadoop pseudo-distributed environment

In standalone mode Hadoop can be used without any configuration, but standalone mode cannot use HDFS, so we configure it in pseudo-distributed mode.
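
One optional precaution before editing the XML files (my own suggestion, not from the original post): hard-code JAVA_HOME in Hadoop's own environment script, because the start scripts do not always inherit it from the login shell.

vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Find the line that reads export JAVA_HOME=${JAVA_HOME} and change it to:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64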

1. Configure core-site.xml

cd /usr/local/hadoop/etc/hadoop

vim core-site.xml

 

<configuration>

        <property>

                <name>hadoop.tmp.dir</name>

                <value>file:/usr/local/hadoop/tmp</value>

                <description>Abase for other temporary directories.</description>

        </property>

        <property>

                <name>fs.defaultFS</name>

                <value>hdfs://localhost:9000</value>

        </property>

</configuration>

 

2. Configure hdfs-site.xml

vim hdfs-site.xml

<configuration>

        <property>

                <name>dfs.replication</name>

                <value>1</value>

        </property>

        <property>

                <name>dfs.namenode.name.dir</name>

                <value>file:/usr/local/hadoop/tmp/dfs/name</value>

        </property>

        <property>

                <name>dfs.datanode.data.dir</name>

                <value>file:/usr/local/hadoop/tmp/dfs/data</value>

        </property>

</configuration>

3. Format the HDFS filesystem

cd /usr/local/hadoop

./bin/hdfs namenode -format

4. Configure MapReduce (mapred-site.xml)
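
Note that in Hadoop 2.7.x this file usually ships only as a template; if mapred-site.xml does not exist yet under etc/hadoop, copy the template first (assuming the default layout):

cd /usr/local/hadoop/etc/hadoop

cp mapred-site.xml.template mapred-site.xml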

vim mapred-site.xml

 

<configuration>

        <property>

                <name>mapreduce.framework.name</name>

                <value>yarn</value>

        </property>

</configuration>

 

5. Configure YARN (yarn-site.xml)

vim yarn-site.xml

 

<configuration>

        <property>

                <name>yarn.nodemanager.aux-services</name>

                <value>mapreduce_shuffle</value>

        </property>

</configuration>


Seventh, start all the processes

cd /usr/local/hadoop

./sbin/start-dfs.sh (start HDFS first; this also brings up a SecondaryNameNode process)

./sbin/start-yarn.sh (start YARN)

./sbin/mr-jobhistory-daemon.sh start historyserver (start the history server so that job history can be viewed in the web UI)

To shut HDFS down again later, use ./sbin/stop-dfs.sh.
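
To confirm that everything started, jps (which ships with the JDK) lists the running Java daemons; as a rough guide (exact process names can vary slightly by version), you should see something like the list below:

jps

Expected processes: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, JobHistoryServer, plus Jps itself.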

 

Eighth, view HDFS and MapReduce in the web interface

1. http://118.121.206.238:50070 (the elastic IP, port 50070): view HDFS

2. http://118.121.206.238:8088 (the elastic IP, port 8088): view MapReduce / YARN applications
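
As a quick HDFS smoke test (my own example, not part of the original post), create a home directory in HDFS, upload a file, and list it:

cd /usr/local/hadoop

./bin/hdfs dfs -mkdir -p /user/hadoop

./bin/hdfs dfs -put etc/hadoop/core-site.xml /user/hadoop

./bin/hdfs dfs -ls /user/hadoop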

I hope this article can help you.


 
