1. Stand-alone mode
1. Create a new virtual machine
Install the CentOS system (the centos-12 image used here); the default user is root
2. Create a new user
useradd centos , then set its password
Grant the centos user sudo rights in the /etc/sudoers file: chmod -v u+w /etc/sudoers ; add the line centos ALL=(ALL) ALL ;
chmod -v u-w /etc/sudoers to restore the permissions; the centos user must then prefix commands with sudo to run them (the full sequence is sketched below);
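A minimal sketch of that sudoers sequence (the granted line is exactly the one quoted above):
chmod -v u+w /etc/sudoers                     # temporarily make the file writable
echo 'centos ALL=(ALL) ALL' >> /etc/sudoers   # or add the line with vi
chmod -v u-w /etc/sudoers                     # restore the original permissions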
3. Configure the network and set the hostname
4. Download Xshell and Xftp to connect to the virtual machine, and transfer the JDK and Hadoop installation packages to it
Extract with tar -xzvf ; copy the extracted files (extracting is the installation) to their target directories: JDK to /usr/java , Hadoop to /opt/ (an example follows)
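For example, assuming the uploaded archives are named jdk-7u80-linux-x64.tar.gz and hadoop-2.7.5.tar.gz (substitute your actual file names):
mkdir -p /usr/java
tar -xzvf jdk-7u80-linux-x64.tar.gz -C /usr/java
tar -xzvf hadoop-2.7.5.tar.gz -C /opt/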
5. Configure the environment /etc/profile
export JAVA_HOME=/usr/java/jdk1.7.0_80
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/opt/hadoop-2.7.5
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
Run the source command to make the configuration file take effect
6. hadoop fs -ls lists the local (native) filesystem, i.e. Hadoop is running in stand-alone mode
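A quick way to confirm the environment took effect, given the paths configured above:
source /etc/profile
java -version       # should report 1.7.0_80
hadoop version      # should report 2.7.5
hadoop fs -ls /     # with no fs.defaultFS configured, this lists the local filesystem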
2. Pseudo-distributed mode
1. Operate on the basis of stand-alone mode: under /opt/hadoop-2.7.5/etc/ , copy the configuration directory for pseudo-distributed use: cp -R hadoop hadoop_virtual
2. Edit the configuration files under hadoop_virtual. vi core-site.xml :
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost/</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hadoop_tmp</value>
</property>
vi hdfs-site.xml :
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
vi mapred-site.xml :
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
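(Note: in Hadoop 2.7.x, mapred-site.xml ships only as mapred-site.xml.template, so copy it first if it does not exist: cp mapred-site.xml.template mapred-site.xml )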
vi yarn-site.xml :
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
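Each property block above goes inside the <configuration> element of its file; core-site.xml, for example, ends up as:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost/</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hadoop_tmp</value>
</property>
</configuration>
Because the edits live in hadoop_virtual rather than the default hadoop directory, point the commands at it with --config , e.g. hdfs --config /opt/hadoop-2.7.5/etc/hadoop_virtual namenode -format (or replace the default hadoop directory with this copy).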
4. Configure password-free ssh login
ssh-keygen -t rsa : generates the public and private key pair
The key pair is written under ~/.ssh (check with ls ~/.ssh ); append the public key to authorized_keys under /home/centos/.ssh/ : cat id_rsa.pub >> authorized_keys
Set the file permissions required for password-free login: chmod 700 ~/.ssh ; chmod 600 ~/.ssh/authorized_keys (the full sequence is sketched below)
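Putting the key setup together (run as the centos user; afterwards ssh localhost should log in without prompting):
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
ssh localhost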
5. Format HDFS and restart
hdfs namenode -format
6. Start the Hadoop cluster
/opt/hadoop-2.7.5/sbin/start-all.sh , verify with the jps command (expected output listed below)
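If startup succeeded, jps should list roughly these daemons (process IDs omitted):
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps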
3. Fully distributed mode
1. Operate on the basis of pseudo-distributed mode: under /opt/hadoop-2.7.5/etc/ , copy the configuration directory for fully distributed use: cp -R hadoop_virtual hadoop_cluster
2. Clone three virtual machines (s1, s2, s3) and configure the network and hostname on each (cloning changes the MAC address, so the IP also changes; the fix is explained at the end)
3. Configure /etc/hosts on the host (s0) so the machines can reach each other by name; verify with ping
192.168.176.137 s0
192.168.176.138 s1
192.168.176.139 s2
192.168.176.140 s3
4. Copy /etc/hosts to each machine: sudo scp /etc/hosts centos@s1:/etc/ (and likewise for s2, s3; a one-line loop follows); verify with ssh s1 that the hosts can log in to one another without a password
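The copy can be done for all three clones in one loop (hostnames as in /etc/hosts above):
for h in s1 s2 s3; do sudo scp /etc/hosts centos@$h:/etc/; done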
5. Configure the files under /opt/hadoop-2.7.5/etc/hadoop_cluster on the host (similar to the pseudo-distributed setup: s0 is the master node namenode; s1 and s2 are slave node datanodes; s3 is a copy of the namenode)
vi core-site.xml :
<property>
<name>fs.defaultFS</name>
<value>hdfs://s0/</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hadoop_tmp</value>
</property>
vi hdfs-site.xml :
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
vi mapred-site.xml :
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
vi yarn-site.xml :
<property>
<name>yarn.resourcemanager.hostname</name>
<value>s0</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
6. Configure the slaves file under /opt/hadoop-2.7.5/etc/hadoop_cluster on the host as:
s1
s2
7. Remotely copy /opt/hadoop-2.7.5/etc/hadoop_cluster from the host (s0) to each machine:
sudo scp -r hadoop_cluster centos@s3:/opt/hadoop-2.7.5/etc/ (an existing folder will be overwritten); ssh into each machine to check that the files have changed
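A sketch that pushes the directory to all machines and spot-checks each copy (assuming /opt/hadoop-2.7.5/etc is writable by the centos user on every node):
for h in s1 s2 s3; do
scp -r hadoop_cluster centos@$h:/opt/hadoop-2.7.5/etc/
ssh $h ls /opt/hadoop-2.7.5/etc/hadoop_cluster
done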
8. Format HDFS and restart
hdfs namenode -format
9. Start the Hadoop cluster
/opt/hadoop-2.7.5/sbin/start-all.sh , verify with the jps command (the output differs from pseudo-distributed mode: s0 shows the namenode-related processes, while s1 and s2 show the datanode-related ones)
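Given the roles above, and assuming s3 is configured as the secondary namenode, jps on each node should show roughly:
s0: NameNode, ResourceManager
s1, s2: DataNode, NodeManager
s3: SecondaryNameNode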