I. Prepare the experimental environment
We need to prepare four Linux servers with matching configuration. Since my virtual machines were cloned from the earlier pseudo-distributed deployment, they all share the same environment, and each one starts out as a default pseudo-distributed Hadoop machine.
1> NameNode server (172.20.20.228)
2> DataNode servers (172.20.20.226-220)
II. Modify the Hadoop configuration files
Before modifying the configuration files, I made a full copy of the directory; the absolute path is /tosp/opt/hadoop. After editing the files in this directory, we point a symbolic link at it as the active hadoop directory. When pseudo-distributed or local mode is needed instead, only the symlink target has to change, which makes it easy to keep the configuration files for all three modes side by side.
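The symlink switch described above can be sketched as follows (a minimal demo run in a temporary directory; on the real servers the base path would be under /tosp/opt, and the three config-set directory names used here are hypothetical, not the article's actual names):

```shell
#!/bin/sh
# Demo of switching the active Hadoop config by repointing one symlink.
# hadoop_full / hadoop_pseudo / hadoop_local are assumed names for the
# three config copies; the temp dir stands in for the real base path.
base=$(mktemp -d)
mkdir -p "$base/hadoop_full" "$base/hadoop_pseudo" "$base/hadoop_local"
ln -sfn "$base/hadoop_full" "$base/hadoop"     # fully distributed mode
ln -sfn "$base/hadoop_pseudo" "$base/hadoop"   # switch to pseudo-distributed
basename "$(readlink "$base/hadoop")"          # -> hadoop_pseudo
```

`ln -sfn` atomically replaces the old link, so switching modes never leaves a half-configured directory in place.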
1> core-site.xml configuration file
[root@cdh14 ~]$ more /tosp/opt/hadoop/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://cdh14:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/tosp/opt/hadoop</value>
        </property>
</configuration>
<!--
Role of the core-site.xml configuration file:
        Defines system-level parameters such as the HDFS URL, the Hadoop
temporary directory, and cluster rack-awareness settings. Parameters defined
here override the defaults in core-default.xml.
Role of the fs.defaultFS parameter:
        # Declares the NameNode address, i.e. the URI of the HDFS file system.
Role of the hadoop.tmp.dir parameter:
        # Declares the Hadoop working directory.
-->
[root@cdh14 ~]$
2> hdfs-site.xml configuration file
[root@cdh14 ~]$ more /tosp/opt/hadoop/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
</configuration>
<!--
Role of the hdfs-site.xml configuration file:
        # HDFS-related settings such as the replication count of files, the
block size, and whether to enforce permissions. Parameters defined here
override the defaults in hdfs-default.xml.
Role of the dfs.replication parameter:
        # For data availability and redundancy, HDFS stores multiple replicas
of the same data block on different nodes; the default is 3. A
pseudo-distributed environment has only one node and can therefore keep only
one replica, which can be set via the dfs.replication property. This is
software-level backup.
-->
[root@cdh14 ~]$
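To double-check the value without a running cluster, the property can be read back out of the XML with plain shell. This is only a sketch (on a live node, `hdfs getconf -confKey dfs.replication` gives the same answer); the demo writes a minimal copy of the relevant fragment to a temp file rather than touching /tosp:

```shell
#!/bin/sh
# Sketch: extract dfs.replication from an hdfs-site.xml-style file with
# grep/sed. Works because <name> and <value> sit on adjacent lines.
f=$(mktemp)
cat > "$f" <<'EOF'
<configuration>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
</configuration>
EOF
grep -A1 '<name>dfs.replication</name>' "$f" |
        sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p'   # -> 2
```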
3> mapred-site.xml configuration file
[root@cdh14 ~]$ more /tosp/opt/hadoop/etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>
<!--
Role of the mapred-site.xml configuration file:
        # MapReduce-related settings such as the default number of reduce
tasks and the memory limits tasks may use. Parameters defined here override
the defaults in mapred-default.xml.
Role of the mapreduce.framework.name parameter:
        # Specifies the MapReduce computing framework. There are three
options: the first is local, the second is classic (the first-generation
Hadoop framework), and the third is yarn (the second-generation framework).
Here I configure yarn, the latest framework in the current version.
-->
[root@cdh14 ~]$
4> yarn-site.xml configuration file
[root@cdh14 ~]$ more /tosp/opt/hadoop/etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<configuration>
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>cdh14</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>
<!--
Role of the yarn-site.xml configuration file:
        # Configures scheduler-level parameters.
Role of the yarn.resourcemanager.hostname parameter:
        # Specifies the hostname of the ResourceManager.
Role of the yarn.nodemanager.aux-services parameter:
        # Tells the NodeManager to run the mapreduce_shuffle auxiliary service.
-->
[root@cdh14 ~]$
5> slaves configuration file
Role of this configuration file: it records which DataNode server nodes the NameNode needs to connect to, i.e. the destination hosts for the remote commands sent when starting or stopping services.
[root@cdh14 ~]$ more /tosp/opt/hadoop/etc/hadoop/slaves
cdh14
cdh12
cdh11
cdh10
cdh9
cdh8
cdh7
[root@cdh14 ~]$
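The start/stop scripts simply iterate over this file and ssh to each listed host. A connectivity-check sketch in the same spirit (shown against a temp copy of the host list and in echo form so it is safe to run anywhere; on the NameNode you would read /tosp/opt/hadoop/etc/hadoop/slaves and replace the echo with a real ssh):

```shell
#!/bin/sh
# Sketch: loop over the hosts in a slaves-style file, skipping blank and
# comment lines, as the Hadoop start scripts do before contacting each node.
slaves=$(mktemp)
printf '%s\n' cdh14 cdh12 cdh11 cdh10 cdh9 cdh8 cdh7 > "$slaves"
while read -r host; do
        case $host in ''|\#*) continue ;; esac
        echo "would contact: $host"    # real check: ssh "$host" hostname
done < "$slaves"
```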
III. Configure passwordless login from the NameNode to each DataNode
1> Generate a public/private key pair locally (before generating, delete the keys left over from the previous pseudo-distributed deployment)
[root@cdh14 ~]$ rm -rf ~/.ssh/*
[root@cdh14 ~]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /home/root/.ssh/id_rsa.
Your public key has been saved in /home/root/.ssh/id_rsa.pub.
The key fingerprint is:
a3:a4:ae:d8:f7:7f:a2:b6:d6:15:74:29:de:fb:14:08 root@cdh14
The key's randomart image is:
+--[ RSA 2048]----+
|   (randomart)   |
+-----------------+
[root@cdh14 ~]$
2> Use the ssh-copy-id command to distribute the public key to the NameNode server itself (172.20.20.228)
[root@cdh14 ~]$ ssh-copy-id root@cdh14
The authenticity of host 'cdh14 (172.16.30.101)' can't be established.
ECDSA key fingerprint is fa:25:bc:03:7e:99:eb:12:1e:bc:a8:c9:ce:39:ba:7b.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@cdh14's password:
Number of key(s) added: 1
Now try logging into the machine, with:   "ssh 'root@cdh14'"
and check to make sure that only the key(s) you wanted were added.
[root@cdh14 ~]$ ssh cdh14
Last login: Fri May 25 18:35:40 2018 from 172.16.30.1
[root@cdh14 ~]$ who
root    pts/0   2018-05-25 18:35 (172.16.30.1)
root    pts/1   2018-05-25 19:17 (cdh14)
[root@cdh14 ~]$ exit
logout
Connection to cdh14 closed.
[root@cdh14 ~]$ who
root    pts/0   2018-05-25 18:35 (172.16.30.1)
[root@cdh14 ~]$
3> Use the ssh-copy-id command to distribute the public key to each DataNode server (172.20.20.226-220)
[root@cdh14 ~]$ ssh-copy-id root@cdh12-cdh7
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@s102's password:
Number of key(s) added: 1
Now try logging into the machine, with:   "ssh 'root@s102'"
and check to make sure that only the key(s) you wanted were added.
[root@cdh14 ~]$ ssh s102
Last login: Fri May 25 18:35:42 2018 from 172.16.30.1
[root@s102 ~]$ who
root    pts/0   2018-05-25 18:35 (172.16.30.1)
root    pts/1   2018-05-25 19:19 (cdh14)
[root@s102 ~]$ exit
logout
Connection to s102 closed.
[root@cdh14 ~]$ who
root    pts/0   2018-05-25 18:35 (172.16.30.1)
[root@cdh14 ~]$
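Instead of running ssh-copy-id by hand once per DataNode, the distribution can be scripted. A dry-run sketch (it only echoes the commands; remove the echo on the real NameNode, and note the host list is typed out here rather than read from the slaves file):

```shell
#!/bin/sh
# Dry-run sketch: print the ssh-copy-id command for every DataNode.
# Dropping the leading `echo` would actually distribute the key.
for host in cdh12 cdh11 cdh10 cdh9 cdh8 cdh7; do
        echo ssh-copy-id "root@$host"
done
```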
Note: the above passwordless-login configuration is done consistently as the root user; make sure that root can log in to every node without a password, because later I will run the corresponding shell scripts.
V. Start the services and verify success
1> Format the file system
2> Start Hadoop
3> Verify with a custom script that the NameNode and DataNodes started normally
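For reference, the standard Hadoop 2.x commands for these three steps look like the following. This is a hedged sketch: the verification loop merely stands in for the author's unspecified custom script, and it is shown in echo form so nothing is executed against the cluster:

```shell
#!/bin/sh
# Step 1> format HDFS (run once, on the NameNode only):
#       hdfs namenode -format
# Step 2> start HDFS and YARN from the NameNode:
#       start-dfs.sh && start-yarn.sh
# Step 3> verify: run jps on every node and look for the expected daemons
# (NameNode/ResourceManager on cdh14, DataNode/NodeManager on the others).
for host in cdh14 cdh12 cdh11 cdh10 cdh9 cdh8 cdh7; do
        echo "ssh root@$host jps"    # dry run; on the cluster: ssh "root@$host" jps
done
```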