hadoop (1)-installation and basic use
Article Directory
- hadoop (1)-installation and basic use
- 1. Introduction
- 2. Hadoop key configuration files
- 3. Preparation before hadoop installation
- 4. Hadoop installation
- 4.1 Download hadoop
- 4.2 Unzip to a custom installation directory
- 4.3 Enter the installation directory
- 4.4 Modify the hadoop-env.sh file
- 4.5 Modify the core-site.xml file
- 4.6 Modify hdfs-site.xml file
- 4.7 Configure mapred-site.xml and yarn-site.xml files
- 4.8 Format hdfs file system
- 4.9 Start
- 5. Hadoop page view
- 6. Basic operation
1. Introduction
1.1 Hadoop features
Hadoop is a distributed system developed by Apache, used for storing and processing large amounts of data across a cluster of machines.
1.2 Hadoop composition
Hadoop consists mainly of two parts: HDFS (Hadoop Distributed File System) and the MapReduce programming model.
- HDFS: abstracts over the underlying file systems; files are stored on multiple machines but share a single namespace.
- MapReduce: a batch data processing model that can handle very large amounts of data; it is non-real-time (response time depends on the amount of data processed).
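The map/shuffle/reduce idea behind MapReduce can be sketched with ordinary shell tools on a single machine. This is only a toy analogy of the classic word-count job, not how Hadoop actually runs: `tr` plays the map phase (emit one word per line), `sort` plays the shuffle (group identical keys), and `uniq -c` plays the reduce (aggregate per key).

```shell
# Toy word count: map (split into words), shuffle (sort), reduce (count)
printf 'hello world\nhello hadoop\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c
# prints a count per word, e.g. "2 hello"
```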
2. Hadoop key configuration files
2.1 core-site.xml
Used to configure properties of Hadoop's Common components.
2.2 hdfs-site.xml
Used to configure hdfs attributes
2.3 mapred-site.xml and yarn-site.xml
Used to configure MapReduce and YARN properties, respectively.
2.4 hadoop-env.sh
Configure the Hadoop running environment, such as configuring jdk path, etc.
3. Preparation before hadoop installation
3.1 jdk installation
First make sure that a JDK is installed; JDK 8 is used here.
3.2 Set up password-free login
Check whether you can already log in without a password using the command ssh localhost. If ssh is missing or password-free login fails, set it up as follows:
- sudo apt-get install ssh
- In the login user's home directory, run ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
- cp .ssh/id_rsa.pub .ssh/authorized_keys
- Finally, run ssh localhost again to confirm that password-free login works.
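The steps above can be sketched as a single sequence (assumes an Ubuntu-like system with apt-get; appending to authorized_keys instead of copying keeps any existing keys):

```shell
# Set up password-free ssh to localhost (sketch)
sudo apt-get install -y ssh                  # install ssh if missing
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa     # passwordless RSA key pair
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys             # sshd rejects lax permissions
ssh localhost 'echo password-free login works'
```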
4. Hadoop installation
The following uses a pseudo-distributed installation (everything installed on one machine to simulate a small cluster) as an example.
4.1 Download hadoop
Download address: http://hadoop.apache.org/releases.html. The version used here is hadoop-2.7.1, i.e. the installation package is hadoop-2.7.1.tar.gz.
4.2 Unzip to a custom installation directory
tar -zxvf hadoop-2.7.1.tar.gz
4.3 Enter the installation directory
cd hadoop-2.7.1
# then enter the configuration file directory
cd etc/hadoop
4.4 Modify the hadoop-env.sh file
Specify the java_home directory and add the configuration as follows:
export JAVA_HOME=/usr/local/java
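If you are unsure where the JDK lives, one way to discover it on Linux (assuming java is on the PATH) is to resolve the symlink chain behind the java binary and strip the trailing bin/java:

```shell
# Resolve the real JDK directory behind the `java` symlink chain
readlink -f "$(which java)" | sed 's:/bin/java$::'
```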
4.5 Modify the core-site.xml file
Modify the configuration as follows:
<configuration>
<!-- HDFS file system address -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.0.1:9000</value>
</property>
</configuration>
4.6 Modify hdfs-site.xml file
Modify the configuration as follows:
<configuration>
<!-- HDFS web UI address -->
<property>
<name>dfs.namenode.http-address</name>
<value>localhost:50070</value>
</property>
<!-- replication factor -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- HDFS metadata storage directory -->
<property>
<name>dfs.name.dir</name>
<value>/home/china/big_data_dir/hadoop/name</value>
</property>
<!-- HDFS data storage directory -->
<property>
<name>dfs.data.dir</name>
<value>/home/china/big_data_dir/hadoop/data</value>
</property>
</configuration>
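The name and data directories configured above must exist and be writable by the user running Hadoop, otherwise formatting or daemon startup can fail. It is safest to create them up front:

```shell
# Create the metadata and data directories from the config above
mkdir -p /home/china/big_data_dir/hadoop/name \
         /home/china/big_data_dir/hadoop/data
```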
4.7 Configure mapred-site.xml and yarn-site.xml files
If this file does not exist in the configuration directory, copy it from the template, i.e. cp mapred-site.xml.template mapred-site.xml.
The mapred-site.xml configuration is as follows:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
The yarn-site.xml configuration is as follows:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>work.cn</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>work.cn:8088</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>192.168.0.1:9001</value>
</property>
</configuration>
4.8 Format hdfs file system
bin/hdfs namenode -format
4.9 Start
sbin/start-dfs.sh
sbin/start-yarn.sh
At this point you can view the started processes with jps. The three HDFS daemons are as follows:
21392 NameNode
21712 SecondaryNameNode
21505 DataNode
At this point, the Hadoop installation is complete and running.
5. Hadoop page view
5.1 namenode view
Enter http://localhost:50070 in the browser to view it.
Click "Browse the file system" under the Utilities drop-down menu at the top of the page to browse the files stored in HDFS.
5.2 View cluster applications (ResourceManager)
Enter http://localhost:8088 in the browser to view it.
6. Basic operation
6.1 General commands
HDFS file operations (with a few exceptions) are similar to Linux file commands, except that they are prefixed with bin/hadoop fs. For example:
# create a directory
bin/hadoop fs -mkdir /test
# view the contents of a file
bin/hadoop fs -cat /test/t.txt
# list files
bin/hadoop fs -ls /
Two operations deserve special attention: uploading files from the local file system to HDFS, and downloading them from HDFS back to the local file system.
6.2 File upload from local to hdfs file system
For example:
bin/hadoop fs -copyFromLocal ~/hadoop_space/t.txt /test/
6.3 File download from hdfs file system to local
For example:
bin/hadoop fs -copyToLocal /test/t.txt ~/hadoop_space/t1.txt
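The two transfers can be combined into a quick round-trip sanity check (assumes the pseudo-distributed cluster from section 4 is running and the /test directory exists): upload a file, download it under a new name, and diff the two local copies; identical files mean the transfer was lossless.

```shell
# Round-trip check: local -> HDFS -> local, then compare
echo "round trip" > ~/hadoop_space/t.txt
bin/hadoop fs -copyFromLocal -f ~/hadoop_space/t.txt /test/
bin/hadoop fs -copyToLocal /test/t.txt ~/hadoop_space/t1.txt
diff ~/hadoop_space/t.txt ~/hadoop_space/t1.txt && echo "files match"
```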