Hadoop Ecosystem Frameworks (1): Installing, Configuring, and Using Sqoop

Version Selection and Downloads

CDH
http://archive.cloudera.com/cdh5/

  • The CDH 5.3.x line is very stable and easy to use; with cdh-5.3.6 the components below are mutually compatible, so cross-version dependency issues are not a concern:
    • hadoop-2.5.0-cdh5.3.6.tar.gz
    • hive-0.13.1-cdh5.3.6.tar.gz
    • zookeeper-3.4.5-cdh5.3.6.tar.gz
    • sqoop-1.4.5-cdh5.3.6.tar.gz
  • Download location
    http://archive.cloudera.com/cdh5/cdh/5/

Installation and Configuration

  • Create the cdh directory and grant ownership to the hadoop user
# mkdir /opt/cdh/cdh-5.3.6
# chown -R hadoop:hadoop /opt/cdh/cdh-5.3.6/
  • Extract the tarballs
tar -zxvf hive-0.13.1-cdh5.3.6.tar.gz  -C /opt/cdh/cdh-5.3.6/
tar -zxvf hadoop-2.5.0-cdh5.3.6.tar.gz -C /opt/cdh/cdh-5.3.6/
tar -zxvf sqoop-1.4.5-cdh5.3.6.tar.gz  -C /opt/cdh/cdh-5.3.6/
  • Configure Hadoop

    • Set JAVA_HOME in hadoop-env.sh, mapred-env.sh, and yarn-env.sh
    export JAVA_HOME=/usr/java/jdk1.8.0_111/
    • Configure core-site.xml
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hadoop01:8020</value>
        </property>
    
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/cdh/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6/tmp</value>
        </property>
    
        <property>
            <name>fs.trash.interval</name>
            <value>420</value>
        </property>
    
    </configuration>
    • Configure hdfs-site.xml
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <name>dfs.permissions</name>
            <value>false</value>
        </property>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>hadoop01:50090</value>
        </property>
        <property>
            <name>dfs.namenode.http-address</name>
            <value>hadoop01:50070</value>
        </property>
    
    </configuration>
    • Configure slaves (one worker hostname per line; single node here)
    hadoop01
    • Format the NameNode (run once, before the first start)
    bin/hdfs namenode -format
    • Configure yarn-site.xml
    <configuration>
    
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hadoop01</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <!-- NodeManager resources -->
        <property>
            <name>yarn.nodemanager.resource.memory-mb</name>
            <value>4096</value>
        </property>
        <property>
            <name>yarn.nodemanager.resource.cpu-vcores</name>
            <value>4</value>
        </property>
        <!-- log aggregation -->
        <property>
            <name>yarn.log-aggregation-enable</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.log-aggregation.retain-seconds</name>
            <value>640800</value>
        </property>
    
    </configuration>
    • Configure mapred-site.xml
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>hadoop01:10020</value>
        </property>
    
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>hadoop01:19888</value>
        </property>
    
    </configuration>
    • Update environment variables (a sketch follows)
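
    A minimal sketch, assuming the exports go into the hadoop user's ~/.bashrc and reusing the paths from this install:

```bash
# Hypothetical ~/.bashrc additions for the hadoop user (paths from this install)
export JAVA_HOME=/usr/java/jdk1.8.0_111
export HADOOP_HOME=/opt/cdh/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6
export HIVE_HOME=/opt/cdh/cdh-5.3.6/hive-0.13.1-cdh5.3.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin
```

    Run source ~/.bashrc afterwards so the current shell picks up the variables.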

    • Clear any stale files under the tmp directory before reformatting
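
    For example, using the hadoop.tmp.dir configured in core-site.xml above:

```bash
# Remove stale HDFS data before reformatting, or NameNode/DataNode cluster IDs may mismatch
rm -rf /opt/cdh/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6/tmp/*
```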

    • Start the daemons (paths relative to the Hadoop home)

      • sbin/hadoop-daemon.sh start namenode

      • sbin/hadoop-daemon.sh start datanode

      • sbin/yarn-daemon.sh start resourcemanager

      • sbin/yarn-daemon.sh start nodemanager

      • sbin/mr-jobhistory-daemon.sh start historyserver
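
    A quick way to confirm everything came up is jps; on this single-node setup you should see processes named NameNode, DataNode, ResourceManager, NodeManager, and JobHistoryServer:

```bash
jps
```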

  • Configure Hive

    • Configure hive-env.sh: set HADOOP_HOME and HIVE_CONF_DIR
    HADOOP_HOME=/opt/cdh/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6/
    export HIVE_CONF_DIR=/opt/cdh/cdh-5.3.6/hive-0.13.1-cdh5.3.6/conf/
    • Configure logging in hive-log4j.properties
    hive.log.dir=/opt/cdh/cdh-5.3.6/hive-0.13.1-cdh5.3.6/logs/
    hive.log.file=hive.log
    • Configure hive-site.xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <!-- Hive Execution Parameters -->
    
        <property>
            <name>hive.metastore.warehouse.dir</name>
            <value>/user/hive/warehouse</value>
            <description>location of default database for the warehouse</description>
        </property>
    
        <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://192.168.100.10:3306/metadata?createDatabaseIfNotExist=true</value>
            <description>JDBC connect string for a JDBC metastore</description>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.jdbc.Driver</value>
            <description>Driver class name for a JDBC metastore</description>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>root</value>
            <description>username to use against metastore database</description>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>123456</value>
            <description>password to use against metastore database</description>
        </property>
    
        <property>
            <name>hive.cli.print.header</name>
            <value>true</value>
            <description>Whether to print the names of the columns in query output.</description>
        </property>
    
        <property>
            <name>hive.cli.print.current.db</name>
            <value>true</value>
            <description>Whether to include the current database in the Hive prompt.</description>
        </property>
    
          <property>
            <name>javax.jdo.option.Multithreaded</name>
            <value>true</value>
            <description>Set this to true if multiple threads access metastore through JDO concurrently.</description>
          </property>
    
    
    </configuration>
    
    
    • Copy the MySQL JDBC driver into Hive's lib directory
cp mysql-connector-java-5.1.42.jar /opt/cdh/cdh-5.3.6/hive-0.13.1-cdh5.3.6/lib/
    • Create the Hive warehouse directory on HDFS (it must match hive.metastore.warehouse.dir above)

```
bin/hdfs dfs -mkdir -p /user/hive/warehouse/
bin/hdfs dfs -chmod g+w /user/hive/warehouse/
```

    • Test with bin/hive
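
    A minimal smoke test through the CLI (the -e flag runs a statement non-interactively); if this prints the default database, the MySQL metastore connection works:

```bash
bin/hive -e 'show databases;'
```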
  • Configure Sqoop

    • Set the Hadoop homes in sqoop-env.sh
    export HADOOP_COMMON_HOME=/opt/cdh/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6/
    export HADOOP_MAPRED_HOME=/opt/cdh/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6/
    • Set HIVE_HOME in sqoop-env.sh
    export HIVE_HOME=/opt/cdh/cdh-5.3.6/hive-0.13.1-cdh5.3.6/
    • Sqoop commands
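
    A good starting point is the built-in help, which lists the available tools (import, export, list-databases, and so on):

```bash
bin/sqoop help
```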

    • Copy the dependency jars (Sqoop also needs the MySQL driver; see below)
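
    Presumably the same MySQL connector jar used for Hive; Sqoop needs it on its classpath to talk to MySQL, so a reasonable sketch is:

```bash
cp mysql-connector-java-5.1.42.jar /opt/cdh/cdh-5.3.6/sqoop-1.4.5-cdh5.3.6/lib/
```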

    • list-databases

    bin/sqoop list-databases \
    --connect jdbc:mysql://hadoop01:3306 \
    --username root \
    --password 123456
    • import (without --target-dir, output defaults to /user/<current user>/<table> on HDFS)
    bin/sqoop import \
    --connect jdbc:mysql://hadoop01:3306/sqoop \
    --username root \
    --password 123456 \
    --table my_user
    • Specify an HDFS target directory explicitly
    bin/sqoop import \
    --connect jdbc:mysql://hadoop01:3306/sqoop \
    --username root \
    --password 123456 \
    --table my_user \
    --target-dir /user/beifeng/sqoop/imp_my_user \
    --num-mappers 1
    • What an import does under the hood (see the codegen sketch after this list)

      • Sqoop translates the statement into a Java class
      • compiles it and packages it into a jar
      • then runs a MapReduce job to perform the transfer
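
    You can inspect the generated class yourself with Sqoop's codegen tool; the --outdir value here is a hypothetical choice:

```bash
bin/sqoop codegen \
--connect jdbc:mysql://hadoop01:3306/sqoop \
--username root \
--password 123456 \
--table my_user \
--outdir /tmp/sqoop-codegen
```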
    • export (the reverse direction: HDFS back into MySQL; sketch below)
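
    A minimal sketch, assuming the target MySQL table my_user already exists and reusing the directory written by the import above:

```bash
bin/sqoop export \
--connect jdbc:mysql://hadoop01:3306/sqoop \
--username root \
--password 123456 \
--table my_user \
--export-dir /user/beifeng/sqoop/imp_my_user \
--num-mappers 1
```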

Reposted from blog.csdn.net/xhwwc110/article/details/80790171