Recommendations (pitfalls I ran into):
-
If your host operating system is Linux (mine was deepin) and you install the virtual machines yourself, make sure every machine gets the same username. When the Hadoop master node starts the cluster, it logs in to the slaves using the master machine's username by default, so mismatched usernames will prevent the cluster from starting. Make sure all three machines use the same username!
-
Debugging MapReduce programs on Windows takes a little extra work: you need to install winutils.
-
Build one complete machine first (environment and configuration included), then clone it, or copy the setup to the other machines with scp.
-
If you also want ZooKeeper and Kafka, you can configure them on the template machine directly before cloning; otherwise, install once and distribute with scp, which saves some work.
-
After cloning finishes, remember to modify the hostname!
-
The whole process is a bit rough; some details I ran into were never written down and have been forgotten. Suggestions are welcome; if you find problems, please let me know and I will correct them promptly.
Virtual machine installation
-
-
Transferring files to a virtual machine: I used scp, which is quick. On Windows, Xshell's sftp is recommended.
-
Use the default NAT network for the virtual machines, with a CentOS minimal installation.
Configuration:
-
Turn off the firewall
$ systemctl status firewalld # view firewall state
$ systemctl stop firewalld # stop the firewall temporarily
$ systemctl disable firewalld # disable the firewall at boot
-
Close SELinux
Security-Enhanced Linux, SELinux for short, is a Linux kernel module and a security subsystem of Linux.
$ vim /etc/selinux/config # change SELINUX=enforcing to SELINUX=disabled
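For reference, this is what the relevant line of the file looks like after the edit. Note that editing the file only takes effect after a reboot; the `setenforce 0` command switches SELinux to permissive mode immediately for the current session:

```
# /etc/selinux/config (relevant line after the edit)
SELINUX=disabled
```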
-
Install ntp time synchronization service
$ yum install -y ntp
$ systemctl enable ntpd # enable at boot
$ systemctl start ntpd # start the service
$ systemctl status ntpd # check whether it is active
It may fail to start because of a conflict with chronyd: disable and stop chronyd (systemctl disable chronyd; systemctl stop chronyd), then start ntpd, and it will work.
-
Modify the hostname, configure a static IP
(After cloning, each machine only needs small changes: the hostname and the IP.)
Step 1: modify the /etc/sysconfig/network-scripts/ifcfg-xxx file
The main parameters to change:
BOOTPROTO="static"
ONBOOT="yes"
IPADDR="172.16.125.128"
NETMASK="255.255.255.0"
GATEWAY="172.16.125.2"
Step 2: modify the contents of the /etc/sysconfig/network file (the gateway is the same as above):
# Created by anaconda
GATEWAY=172.16.125.2
DNS=172.16.125.2
Step 3: restart the network
service network restart
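Putting the pieces together, a complete ifcfg file for this NAT setup might look like the sketch below. The interface name ens33 and the DNS1 value are assumptions that depend on your VM; keep whatever UUID and HWADDR lines your file already has:

```
# /etc/sysconfig/network-scripts/ifcfg-ens33 (ens33 is an assumed interface name)
TYPE="Ethernet"
BOOTPROTO="static"       # static address instead of DHCP
NAME="ens33"
DEVICE="ens33"
ONBOOT="yes"             # bring the interface up at boot
IPADDR="172.16.125.128"  # per-machine address; change this on each clone
NETMASK="255.255.255.0"
GATEWAY="172.16.125.2"   # NAT gateway of the VMware network
DNS1="172.16.125.2"      # assumption: the NAT gateway also serves DNS
```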
-
Configuring ssh
so that the master can log in to the slave nodes without a password
$ ssh-keygen -t rsa
Look under ~/.ssh; there are the following three files:
-rw-r--r--. 1 root root  392 Sep 26 21:05 authorized_keys  # authorized keys
-rw-------. 1 root root 1679 Sep 26 20:57 id_rsa           # private key
-rw-r--r--. 1 root root  393 Sep 26 20:57 id_rsa.pub       # public key
On the master, put the public keys of all three machines into authorized_keys. Command:
$ cat id_rsa.pub >> authorized_keys
Copy the master's authorized_keys into the ~/.ssh directory of the other Linux machines:
$ sudo scp authorized_keys [email protected]:~/.ssh
Modify the permissions of authorized_keys. Command:
$ chmod 644 authorized_keys
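The key-setup steps above can be sketched end to end on a single machine. This uses a throwaway directory under /tmp so it does not touch your real ~/.ssh; on the cluster you would work in ~/.ssh and then scp authorized_keys to the slaves:

```shell
# Work in a throwaway directory instead of the real ~/.ssh.
dir=/tmp/ssh-demo
rm -rf "$dir"
mkdir -p "$dir"
chmod 700 "$dir"   # sshd insists the key directory is private

# Generate an RSA key pair non-interactively (-N "" = empty passphrase).
ssh-keygen -t rsa -N "" -f "$dir/id_rsa" -q

# Append the public key to authorized_keys, as done for each machine on the master.
cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"

# authorized_keys must not be writable by group or other.
chmod 644 "$dir/authorized_keys"

ls -l "$dir"
```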
Test whether it worked:
Run ssh host2, enter the username and password, then exit; run ssh host2 again, and if you get straight into the system with no password, it worked.
If during ssh login you see:
The authenticity of host 'hadoop2 (192.168.238.130)' can't be established
then modify the /etc/ssh/ssh_config configuration file and add the following two lines:
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
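Disabling StrictHostKeyChecking globally weakens security for every outgoing connection. A narrower alternative (the hostnames here are assumptions matching the rest of this post) is to scope the two options to just the cluster machines in a per-user ~/.ssh/config:

```
# ~/.ssh/config - apply only to the cluster machines
Host master slave1 slave2
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
```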
-
Finally: install and configure the JDK and Hadoop
core-site.xml
<configuration>
  <property>
    <!-- default filesystem location -->
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000/</value>
  </property>
  <property>
    <!-- hadoop working directory: namenode and datanode data -->
    <name>hadoop.tmp.dir</name>
    <value>/home/whr/workbench/hadoop/data/</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <!-- number of replicas -->
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <!-- configure the secondary namenode on slave1 -->
    <name>dfs.namenode.secondary.http-address</name>
    <value>slave1:50090</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <!-- run mapreduce on the yarn cluster, to make it distributed -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
The slaves configuration file:
slave1
slave2
The masters configuration file (the CDH version I use does not have this file):
master
Clone
-
There is nothing special to say about cloning in VMware.
-
After cloning completes, modify the hostname in the /etc/hostname file!
-
Add the IP address mappings for all three machines to /etc/hosts.
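A sketch of the mapping, assuming the three hostnames used elsewhere in this post (master, slave1, slave2) and example addresses on the NAT subnet configured earlier:

```
# /etc/hosts - identical on all three machines
172.16.125.128  master
172.16.125.129  slave1   # example addresses; use your own
172.16.125.130  slave2
```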
Start the cluster
- Format the namenode (I formatted after cloning)
$ hadoop namenode -format
If at the end you see "Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted." and the final status code is 0, the format succeeded.
-
Start: go into the sbin directory (or add sbin to the environment variables first; then there is no need to switch into the directory):
start-dfs.sh
start-yarn.sh
-
jps command
After everything has started, check the results (ssh from the master into the slaves to check them):
You can see that the NameNode on the master, the two DataNodes, and one SecondaryNameNode have all started.
For yarn, the ResourceManager and the NodeManagers on the two slaves have also started.
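As a rough sketch, jps prints one line per Java process, the pid followed by the main class name. The pids below are made up, and the exact split of daemons depends on your configuration (here slave1 also runs the SecondaryNameNode, per hdfs-site.xml):

```
# on the master
2130 NameNode
2438 ResourceManager
2771 Jps

# on slave1
2011 DataNode
2139 SecondaryNameNode
2302 NodeManager
2450 Jps
```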
The HDFS web UI can then be accessed in a browser on port 50070.
You can run a sample MapReduce program:
$ hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.15.2.jar pi 5 5
Hadoop configuration file reference:
https://www.cnblogs.com/xhy-shine/p/10530729.html