CentOS 7: configuring Hadoop and HBase in pseudo-distributed mode

Download Java JDK 8 and configure its environment variables.
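
A minimal sketch of the Java entries in /etc/profile, assuming the JDK is unpacked to /opt/javajdk (the same path used for JAVA_HOME in the HBase step later):

#java
export JAVA_HOME=/opt/javajdk
export PATH=$PATH:$JAVA_HOME/bin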

1. Download Hadoop and unpack it

Apache Hadoop

tar -zxf <hadoop-package-name>.tar.gz
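
The environment variables configured later (HADOOP_HOME=/opt/hadoop) assume Hadoop lives at /opt/hadoop, so move the unpacked directory there (the directory name below is an example):

mv hadoop-<version> /opt/hadoop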

2. Enter the configuration directory of the unpacked Hadoop

cd /opt/hadoop/etc/hadoop

3. Edit the configuration files. The properties below go inside the <configuration></configuration> element of each file. Note: apart from the paths, "hadoop" in the values is the hostname; replace it with your own.

core-site.xml      

<!-- Default file system. Hadoop supports file, HDFS, GFS, Ali/Amazon cloud storage, etc. -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop:8020</value>
</property>
 
<!-- Local path where Hadoop stores its data -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop</value>
</property>
 
<!-- User identity for the HDFS web UI -->
<property>
    <name>hadoop.http.staticuser.user</name>
    <value>root</value>
</property>
 
<!-- Proxy-user settings for Hive integration -->
<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
 
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>
 
<!-- How long (in minutes) the filesystem trash keeps deleted files -->
<property>
    <name>fs.trash.interval</name>
    <value>1440</value>
</property>

hdfs-site.xml

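<!-- Number of replicas kept for each HDFS block -->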
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
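<!-- HTTP address of the SecondaryNameNode -->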
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop:9868</value>
</property>

mapred-site.xml

<!-- Default execution mode for MR jobs: yarn = cluster mode, local = local mode -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
 
<!-- MapReduce JobHistory server address -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>hadoop:10020</value>
</property>
 
<!-- MapReduce JobHistory server web UI address -->
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>hadoop:19888</value>
</property>
 
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
 
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
 
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/mapreduce/lib-examples/*</value>
</property>

yarn-site.xml

<!-- Site specific YARN configuration properties -->

<!-- Host that runs the YARN ResourceManager (the cluster master) -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop</value>
</property>
 
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
 
<!-- Whether to enforce physical memory limits on containers -->
<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>
 
<!-- Whether to enforce virtual memory limits on containers -->
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
 
<!-- Enable log aggregation -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
 
<!-- URL of the history server that serves aggregated logs -->
<property>
    <name>yarn.log.server.url</name>
    <value>http://hadoop:19888/jobhistory/logs</value>
</property>
 
<!-- How long aggregated logs are kept: 7 days -->
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
 
<!-- Maximum disk utilization (%) per disk before the disk is marked unhealthy -->
<property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>95.0</value>
</property>
<!-- Memory (MB) on this node available to containers -->
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>
<!-- Minimum memory (MB) allocated per container request -->
<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
</property>
<!-- Ratio of virtual memory to physical memory allowed per container -->
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
</property>

In the workers file, list the hostnames of all the hosts, e.g.:

hadoop
hadoop1
hadoop2
hadoop3

Configure the sbin scripts

cd ../../sbin

Add the following at line 2 (right after the shebang) of start-dfs.sh and stop-dfs.sh:

HDFS_NAMENODE_USER=root
HDFS_DATANODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
YARN_RESOURCEMANAGER_USER=root
YARN_NODEMANAGER_USER=root

Add the following at line 2 of start-yarn.sh and stop-yarn.sh:

YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

Configure environment variables

vim /etc/profile
#hadoop
export HADOOP_HOME=/opt/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
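
Reload the profile so the variables take effect in the current shell, and optionally sanity-check the install:

source /etc/profile
hadoop version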

4. Shut down and clone 3 virtual machines

Rename each machine

vim /etc/hostname

Edit /etc/hosts on every virtual machine, mapping each hostname to its own IP address:

vim /etc/hosts
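
A sketch of /etc/hosts; the IP addresses below are placeholders, replace them with your machines' actual addresses:

192.168.1.100 hadoop
192.168.1.101 hadoop1
192.168.1.102 hadoop2
192.168.1.103 hadoop3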

Shut down and restart all machines.
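
start-all.sh uses ssh to start the daemons on every host listed in workers, so once all machines are up, set up passwordless ssh from the main node; a minimal sketch, assuming the root user:

ssh-keygen -t rsa
ssh-copy-id root@hadoop
ssh-copy-id root@hadoop1
ssh-copy-id root@hadoop2
ssh-copy-id root@hadoop3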

If Hadoop was already formatted on the host before cloning, delete the dfs and logs folders under the Hadoop directory on each machine and re-initialize:

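A sketch of the cleanup, assuming the layout used in this guide (install and hadoop.tmp.dir both at /opt/hadoop):

rm -rf /opt/hadoop/dfs /opt/hadoop/logs

Then re-format the NameNode:
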
hdfs namenode -format

5. Test hadoop

start-all.sh

In a browser, open:

http://<hostname>:9870 (HDFS NameNode web UI)

http://<hostname>:8088 (YARN ResourceManager web UI)

Check that both pages load and show the expected number of live nodes.
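
You can also check the running daemons with jps; on the main node of this setup the output should look roughly like this (the PIDs will differ):

jps

2305 NameNode
2442 DataNode
2687 SecondaryNameNode
2890 ResourceManager
3021 NodeManager
3350 Jps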

 

 6. Download the hbase package

Apache HBase – Apache HBase Downloads

I chose version 2.5; with version 3 I ran into problems on my machine (HBase reporting missing system tables, possibly an HBase issue, possibly my setup).

tar -zxf <hbase-package-name>.tar.gz
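
The environment variables in step 8 assume HBase lives at /opt/hbase, so move the unpacked directory there (the name below is an example):

mv hbase-<version> /opt/hbase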

7. Configure hbase

cd /opt/hbase/conf

Add the following at line 2 of hbase-env.sh:

export JAVA_HOME=/opt/javajdk
# HBase ships with its own ZooKeeper: true uses the bundled one, false uses an external one
export HBASE_MANAGES_ZK=true
export HBASE_DISABLE_HADOOP_CLASSPATH_LOOKUP=true

hbase-site.xml: add the following inside <configuration></configuration>. Again, change the hostnames and paths to your own.


<!-- Enable distributed mode -->
        <!-- Path in HDFS where HBase stores its data -->
        <property>
            <name>hbase.rootdir</name>
            <value>hdfs://hadoop:8020/hbase</value>
        </property>
        <!-- HBase run mode: false = standalone, true = distributed.
        If false, HBase and ZooKeeper run in the same JVM -->
        <property>
            <name>hbase.cluster.distributed</name>
            <value>true</value>
        </property>

 <![CDATA[
        Commented-out alternative (wrapped in CDATA because it contains its own comment markers):

        <!-- ZooKeeper quorum for a multi-node setup -->
        <property>
            <name>hbase.zookeeper.quorum</name>
            <value>hadoop,hadoop1,hadoop2,hadoop3</value>
        </property>
 ]]>
       <!-- ZooKeeper quorum address (a single node here) -->
        <property>
            <name>hbase.zookeeper.quorum</name>
            <value>hadoop</value>
        </property>
        <!-- Directory where ZooKeeper stores its snapshots -->
        <property>
            <name>hbase.zookeeper.property.dataDir</name>
            <value>/opt/hbase/apache-zookeeper-3.6.0-bin/data</value>
        </property>
        <!-- Enforcement of hsync/hflush stream capabilities; in v2.1, set this to false if the HBase root dir is on a filesystem without them (true is fine on HDFS) -->
        <property>
            <name>hbase.unsafe.stream.capability.enforce</name>
            <value>true</value>
        </property>
<!-- ACL / authorization settings -->
<property>
   <name>hbase.superuser</name>
   <value>hadoop</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>    
  <value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
  <name>hbase.rpc.engine</name>
  <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
</property>
<property>
   <name>hbase.security.authorization</name>
   <value>true</value>
</property>
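
With authorization enabled as above, permissions are managed from inside the HBase shell; for example (user and table names are placeholders):

grant 'someuser', 'RW', 'sometable'
user_permission 'sometable'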

In the regionservers file, list your hostname(s), one per line (e.g. hadoop).

8. Add hbase environment variables

vim /etc/profile

 

#hbase
export HBASE_HOME=/opt/hbase
export PATH=$HBASE_HOME/bin:$PATH
# reload the environment
source /etc/profile

 

9. Test hbase   

start-all.sh
start-hbase.sh

In a browser, open http://<hostname>:16010 (the HBase Master web UI).
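
A quick smoke test from the HBase shell (the table name 'test' and column family 'cf' are just examples):

hbase shell
status
create 'test', 'cf'
list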

Origin blog.csdn.net/weixin_58503231/article/details/130032971