The Hadoop framework is the foundation of big data. When we talk about big data we should not only talk about the business scenarios it serves (the classic case of the supermarket that recommends beer alongside baby diapers); as a solutions manager you cannot lack the technology side either, otherwise you risk coming across as all talk. :) For a solutions manager it is technology plus business; my personal view is roughly 60% technology and 40% business, because in practice customers know their own business better than we do, so the technology is what really matters. Earlier we discussed the overall big data architecture; today we will deploy all of the practical Hadoop components on a single cloud host (or a local VMware virtual machine) to dig deeper into the technical details of big data.
First, set up a pseudo-distributed Hadoop environment
Pseudo-distributed means that the HDFS NameNode and DataNode both run on a single cloud host. We test on Ubuntu 14.04, with the host configured with 2 CPUs, 2 GB of memory, a 40 GB disk, and an elastic IP.
Second, for ease of operation, create a new hadoop account
1. sudo useradd -m hadoop -s /bin/bash
This command creates the hadoop account, creates the /home/hadoop home directory, and sets /bin/bash as its shell.
2. sudo passwd hadoop
This command sets the hadoop account's password.
3. sudo adduser hadoop sudo
This grants the hadoop user administrator privileges, which makes later deployment steps easier.
Third, for ease of operation, set up passwordless SSH login to the master
1. sudo apt-get update
Refresh the package index, in preparation for installing the Java JDK later.
2. Generate a passwordless key for the local machine, in preparation for Hadoop:
ssh localhost   # log in once (this creates ~/.ssh), then exit after it succeeds
cd ~/.ssh/
ssh-keygen -t rsa
cat ./id_rsa.pub >> ./authorized_keys
ssh localhost   # if it now connects without asking for a password, passwordless login is working
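The key setup above can be condensed into a small sketch. This version runs in a throwaway temporary directory so it will not touch your real ~/.ssh, and it adds the strict permissions (700 on the directory, 600 on authorized_keys) that sshd requires before it will accept key login:

```shell
# Sketch of the passwordless-login key setup, in a temp dir for safety.
tmp=$(mktemp -d)

ssh-keygen -t rsa -N "" -f "$tmp/id_rsa" -q    # no passphrase, no prompts
cat "$tmp/id_rsa.pub" >> "$tmp/authorized_keys"

chmod 700 "$tmp"                    # sshd rejects world-readable key dirs
chmod 600 "$tmp/authorized_keys"    # authorized_keys must be private

ls -l "$tmp"
```

For the real setup, run the same commands against ~/.ssh as in the steps above.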
Fourth, install the Java environment
1. sudo apt-get install openjdk-7-jre openjdk-7-jdk   # the JDK packages are about 201 MB
2. dpkg -L openjdk-7-jdk | grep 'bin/javac'   # this yields the Java path /usr/lib/jvm/java-7-openjdk-amd64
3. Configure the JAVA_HOME environment variable
vim ~/.bashrc
Add the following at the beginning of the file:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Save and exit with :wq
4. Make the environment variable take effect:
source ~/.bashrc
5. Run java -version; if it prints the version, the installation is working.
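As a cross-check, the Java home can also be derived from whatever javac is on the PATH, an alternative to the dpkg lookup in step 2. This is only a sketch; it guards for machines where javac is not installed:

```shell
# Derive the Java home by resolving the javac symlink chain
# (bin/javac sits two levels below the JDK root).
if command -v javac >/dev/null 2>&1; then
    detected=$(dirname "$(dirname "$(readlink -f "$(command -v javac)")")")
else
    detected="javac not found on PATH"
fi
echo "$detected"
```

On the Ubuntu 14.04 host above this should print /usr/lib/jvm/java-7-openjdk-amd64.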
Fifth, install Hadoop
1. Download hadoop-2.7.6.tar.gz from http://mirrors.cnnic.cn/apache/hadoop/common/
2. Upload the archive to the cloud host with the rz command
3. Install hadoop:
sudo tar -zxf hadoop-2.7.6.tar.gz -C /usr/local
cd /usr/local/
sudo mv ./hadoop-2.7.6/ ./hadoop   # rename the folder
sudo chown -R hadoop ./hadoop   # change the owner to the hadoop user
4. Check hadoop version
cd /usr/local/hadoop
./bin/hadoop version
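A common pitfall at this point: when the Hadoop scripts later start the daemons over SSH, ~/.bashrc is not always sourced, so the daemons may not see the JAVA_HOME set earlier and fail with "JAVA_HOME is not set". It is therefore worth setting it explicitly in etc/hadoop/hadoop-env.sh as well, a minimal fragment assuming the OpenJDK 7 path found above:

```shell
# /usr/local/hadoop/etc/hadoop/hadoop-env.sh
# Point the daemons at the JDK explicitly, so they start even when
# ~/.bashrc is not sourced (e.g. when launched via ssh localhost).
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
```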
Sixth, configure the pseudo-distributed Hadoop environment
In a standalone environment Hadoop can be used without any configuration, but a standalone environment cannot use the HDFS functionality, so we configure it as pseudo-distributed.
1. Edit the configuration file core-site.xml:
cd /usr/local/hadoop/etc/hadoop
vim core-site.xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
2. Edit the configuration file hdfs-site.xml:
vim hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
3. Format the HDFS NameNode:
cd /usr/local/hadoop
./bin/hdfs namenode -format
4. Edit the mapreduce configuration file. In Hadoop 2.x this directory ships only a template, so copy it first:
cp ./mapred-site.xml.template ./mapred-site.xml
vim mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
5. Edit the yarn configuration file:
vim yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Seventh, start all the processes
./sbin/start-dfs.sh   # start HDFS first
./sbin/stop-dfs.sh   # stop HDFS (this also stops the SecondaryNameNode process)
./sbin/start-dfs.sh   # start HDFS again
./sbin/start-yarn.sh   # start YARN
./sbin/mr-jobhistory-daemon.sh start historyserver   # start the history server so job runs can be viewed in the web UI
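Once everything above has been started, the daemons can be verified with jps, which ships with the JDK. A sketch that guards for machines without it; with the full stack running you would expect to see NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager and JobHistoryServer in the list:

```shell
# List the running JVM processes; jps comes with the JDK, so guard
# for machines where it is not installed.
if command -v jps >/dev/null 2>&1; then
    running=$(jps)
else
    running="jps not available on this machine"
fi
echo "$running"
```

If one of the daemons is missing, check its log under /usr/local/hadoop/logs.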
Eighth, view HDFS and MapReduce through the web interfaces
1. http://118.121.206.238:50070 (the elastic IP) to view HDFS
2. http://118.121.206.238:8088 (the elastic IP) to view MapReduce
I hope this article can help you.
For more timely updates, follow our public account.