How to quickly set up a Hadoop cluster: detailed steps

1. Upload the compressed package and unzip it

2. Check the compression codecs and native libraries supported by Hadoop

3. Modify the configuration file

(1)hadoop-env.sh

(2)core-site.xml

(3)hdfs-site.xml

(4)mapred-site.xml

(5)yarn-site.xml

4. Configure the environment variables of hadoop

5. Format the cluster

6. Start the cluster

7. Stop the cluster


I set up three servers together, so the steps below refer to the first machine, the second machine, and so on; please keep track of which machine each command should be run on.

It is recommended that you run through the steps on each server yourself to deepen your understanding.

1. Upload the compressed package and unzip it

 

Upload the recompiled Hadoop package (built with snappy compression support) to the first server and decompress it.

Run the following commands on the first machine:

cd /kkb/soft/
tar -xzvf hadoop-3.1.4.tar.gz -C /qinluyu/install
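
If the package is still on a local workstation, it can be copied up with scp first; a minimal sketch, where the target user and the local file location are assumptions (node01 is the first server, as configured below):

# Hypothetical upload from a local workstation; adjust user, host and paths to your environment
scp hadoop-3.1.4.tar.gz hadoop@node01:/kkb/soft/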

2. Check the compression codecs and native libraries supported by Hadoop

 

cd /qinluyu/install/hadoop-3.1.4
bin/hadoop checknative

If openssl shows as false, install openssl-devel on every machine by running the following command (the virtual machines can install it online once they are connected to the Internet):

sudo yum -y install openssl-devel
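
After installing openssl-devel, re-run the check from the Hadoop install directory to confirm that openssl and the compression codecs (including snappy) are now reported as true:

cd /qinluyu/install/hadoop-3.1.4
bin/hadoop checknative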

3. Modify the configuration file

 

(1)hadoop-env.sh

 

cd /qinluyu/install/hadoop-3.1.4/etc/hadoop
vim hadoop-env.sh
export JAVA_HOME=/qinluyu/install/jdk1.8.0_141
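
Before moving on, it can be worth confirming that the JDK path referenced above actually exists on this machine; a small sanity check using the configured path:

ls /qinluyu/install/jdk1.8.0_141/bin/java
/qinluyu/install/jdk1.8.0_141/bin/java -version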

(2)core-site.xml

 

vim core-site.xml
<configuration>
   <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node01:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/qinluyu/install/hadoop-3.1.4/hadoopDatas/tempDatas</value>
    </property>
    <!-- I/O buffer size; in practice, tune it according to server performance. Default: 4096 -->
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
    <!-- Enable the HDFS trash feature so deleted data can be recovered from the trash; the value is in minutes. Default: 0 (disabled) -->
    <property>
        <name>fs.trash.interval</name>
        <value>10080</value>
    </property>

    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>hadoop</value>
    </property>
</configuration>

(3)hdfs-site.xml

 

vim hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node01:9868</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>node01:9870</value>
    </property>
    <!-- Directory where the NameNode stores its fsimage -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///qinluyu/install/hadoop-3.1.4/hadoopDatas/namenodeDatas</value>
    </property>
    <!-- DataNode data storage directories; in practice, determine the disk mount points first, then list multiple directories separated by commas -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///qinluyu/install/hadoop-3.1.4/hadoopDatas/datanodeDatas</value>
    </property>
    <!-- Directory where the NameNode stores its edits log -->
    <property>
        <name>dfs.namenode.edits.dir</name>
        <value>file:///qinluyu/install/hadoop-3.1.4/hadoopDatas/dfs/nn/edits</value>
    </property>
    <!-- Directory where the SecondaryNameNode stores the fsimage to be merged -->
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file:///qinluyu/install/hadoop-3.1.4/hadoopDatas/dfs/snn/name</value>
    </property>
    <!-- Directory where the SecondaryNameNode stores the edits log to be merged -->
    <property>
        <name>dfs.namenode.checkpoint.edits.dir</name>
        <value>file:///qinluyu/install/hadoop-3.1.4/hadoopDatas/dfs/nn/snn/edits</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
	<property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
    </property>
</configuration>
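
The data directories referenced in core-site.xml and hdfs-site.xml do not exist yet. Hadoop creates most of them itself on format and startup, but creating them up front on each machine avoids permission surprises; a minimal sketch using the paths configured above:

mkdir -p /qinluyu/install/hadoop-3.1.4/hadoopDatas/tempDatas
mkdir -p /qinluyu/install/hadoop-3.1.4/hadoopDatas/namenodeDatas
mkdir -p /qinluyu/install/hadoop-3.1.4/hadoopDatas/datanodeDatas
mkdir -p /qinluyu/install/hadoop-3.1.4/hadoopDatas/dfs/nn/edits
mkdir -p /qinluyu/install/hadoop-3.1.4/hadoopDatas/dfs/snn/name
mkdir -p /qinluyu/install/hadoop-3.1.4/hadoopDatas/dfs/nn/snn/edits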

(4)mapred-site.xml

 

vim mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.job.ubertask.enable</name>
        <value>true</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>node01:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>node01:19888</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
</configuration>

(5)yarn-site.xml

 

vim yarn-site.xml
<configuration>
   
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node01</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

     <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://node01:19888/jobhistory/logs</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>25920000</value>
    </property>

</configuration>

4. Configure the environment variables of hadoop

 

sudo vim /etc/profile
export HADOOP_HOME=/qinluyu/install/hadoop-3.1.4
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile
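
You can then confirm that the shell picks up the new PATH and that Hadoop reads the configuration you edited:

which hadoop
hadoop version
hdfs getconf -confKey fs.defaultFS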

5. Format the cluster

 

To start the Hadoop cluster, both HDFS and YARN need to be brought up.

Note: the first time you start HDFS, you must format it. Formatting is essentially some cleanup and preparation work, since at this point HDFS does not yet physically exist. Run the format on the first machine (node01) only, and run it only once.

hdfs namenode -format
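
If the format succeeds, the NameNode metadata directory configured in hdfs-site.xml is initialized; you can peek at it as a quick sanity check (the clusterID in the VERSION file is generated at format time):

ls /qinluyu/install/hadoop-3.1.4/hadoopDatas/namenodeDatas/current
cat /qinluyu/install/hadoop-3.1.4/hadoopDatas/namenodeDatas/current/VERSION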

6. Start the cluster

 

start-dfs.sh
start-yarn.sh
# Deprecated: mr-jobhistory-daemon.sh start historyserver
mapred --daemon start historyserver
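
Once the daemons are up, it is worth checking that the expected processes are running on each node and, optionally, submitting a small test job; a sketch, assuming the examples jar that ships with the Hadoop 3.1.4 distribution is in its usual location:

# Check the running Java daemons (NameNode/DataNode, ResourceManager/NodeManager, JobHistoryServer)
jps
# Optional: run the bundled pi example to exercise MapReduce on YARN end to end
hadoop jar /qinluyu/install/hadoop-3.1.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.4.jar pi 2 10

If everything is up, the NameNode web UI configured above should respond at http://node01:9870, the job history server at http://node01:19888, and the YARN ResourceManager on its default port 8088.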

7. Stop the cluster

 

stop-dfs.sh
stop-yarn.sh 
# Deprecated: mr-jobhistory-daemon.sh stop historyserver
mapred --daemon stop historyserver

 

Origin blog.csdn.net/qinluyu111/article/details/123647335