Version Selection and Download
CDH
http://archive.cloudera.com/cdh5/
- The CDH 5.3.x line is very stable and easy to work with; we use cdh-5.3.6. Within a single CDH release the component versions are already tested against each other, so you do not have to resolve dependency and compatibility issues yourself.
- hadoop-2.5.0-cdh5.3.6.tar.gz
- hive-0.13.1-cdh5.3.6.tar.gz
- zookeeper-3.4.5-cdh5.3.6.tar.gz
- sqoop-1.4.5-cdh5.3.6.tar.gz
- Download URL
http://archive.cloudera.com/cdh5/cdh/5/
Installation and Configuration
- Create the CDH directory and grant ownership
# mkdir /opt/cdh/cdh-5.3.6
# chown -R hadoop:hadoop /opt/cdh/cdh-5.3.6/
- Extract the tarballs
tar -zxvf hive-0.13.1-cdh5.3.6.tar.gz -C /opt/cdh/cdh-5.3.6/
tar -zxvf hadoop-2.5.0-cdh5.3.6.tar.gz -C /opt/cdh/cdh-5.3.6/
Configuration
- Set JAVA_HOME in hadoop-env.sh, mapred-env.sh, and yarn-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_111/
- Configure core-site.xml (fs.trash.interval is in minutes; 420 keeps trashed files for 7 hours)
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop01:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/cdh/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6/tmp</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>420</value>
  </property>
</configuration>
- Configure hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop01:50090</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop01:50070</value>
  </property>
</configuration>
- Configure slaves
hadoop01
- Format the NameNode
bin/hdfs namenode -format
- Configure yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop01</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- NodeManager resources -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
  </property>
  <!-- log aggregation: retain aggregated logs for 7 days -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>
- Configure mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop01:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop01:19888</value>
  </property>
</configuration>
Modify environment variables (see the sketch below)
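A sketch of what this step usually means, assuming a bash profile and the install paths used above; which variables to set is an assumption, since the notes do not say:
```
# append to ~/.bashrc (or /etc/profile for all users), then run: source ~/.bashrc
export HADOOP_HOME=/opt/cdh/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6
export HIVE_HOME=/opt/cdh/cdh-5.3.6/hive-0.13.1-cdh5.3.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin
```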
Clear out the tmp directory (the hadoop.tmp.dir configured above) before reformatting, so no data from a previous run is left behind
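Assuming the hadoop.tmp.dir set in core-site.xml above, the cleanup is:
```
rm -rf /opt/cdh/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6/tmp/*
```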
Startup
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode
sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh start nodemanager
sbin/mr-jobhistory-daemon.sh start historyserver
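To confirm everything came up, jps should list all five daemons (a quick sanity check added here, not part of the original steps):
```
jps
# expect: NameNode, DataNode, ResourceManager, NodeManager, JobHistoryServer
```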
Configure Hive
- Configure hive-env.sh: set HADOOP_HOME and HIVE_CONF_DIR
HADOOP_HOME=/opt/cdh/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6/
export HIVE_CONF_DIR=/opt/cdh/cdh-5.3.6/hive-0.13.1-cdh5.3.6/conf/
- Configure log4j (conf/hive-log4j.properties)
hive.log.dir=/opt/cdh/cdh-5.3.6/hive-0.13.1-cdh5.3.6/logs/
hive.log.file=hive.log
- Configure hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Hive execution parameters -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.100.10:3306/metadata?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
    <description>Whether to print the names of the columns in query output.</description>
  </property>
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
    <description>Whether to include the current database in the Hive prompt.</description>
  </property>
  <property>
    <name>javax.jdo.option.Multithreaded</name>
    <value>true</value>
    <description>Set this to true if multiple threads access metastore through JDO concurrently.</description>
  </property>
</configuration>
- Copy the MySQL JDBC driver jar into Hive's lib directory
cp mysql-connector-java-5.1.42.jar /opt/cdh/cdh-5.3.6/hive-0.13.1-cdh5.3.6/lib/
- Create the Hive warehouse directory on HDFS (table data lives here; the metadata itself is stored in MySQL)
```
bin/hdfs dfs -mkdir -p /user/hive/warehouse/
bin/hdfs dfs -chmod g+w /user/hive/warehouse/
```
- Test: run bin/hive
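A quick smoke test once the CLI starts; test_tb is a throwaway table name invented for this example, and the hive (default)> prompt reflects hive.cli.print.current.db=true from the config above:
```
hive (default)> show databases;
hive (default)> create table test_tb (id int, name string);
hive (default)> show tables;
hive (default)> drop table test_tb;
```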
Configure Sqoop
- Set HADOOP_COMMON_HOME in sqoop-env.sh
export HADOOP_COMMON_HOME=/opt/cdh/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6/
- Set HADOOP_MAPRED_HOME in sqoop-env.sh (set HIVE_HOME here as well if you plan to import into Hive)
export HADOOP_MAPRED_HOME=/opt/cdh/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6/
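A cheap way to confirm the configuration is picked up before running real jobs (this check is an addition to the original notes):
```
bin/sqoop version
```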
Sqoop Commands
Copy the dependency jars (the MySQL JDBC driver) into Sqoop's lib/ directory
list-databases
bin/sqoop list-databases \
  --connect jdbc:mysql://hadoop01:3306 \
  --username root \
  --password 123456
- import, plain and with a target directory: first into the default HDFS directory
bin/sqoop import \
  --connect jdbc:mysql://hadoop01:3306/sqoop \
  --username root \
  --password 123456 \
  --table my_user
- then with an explicit HDFS target directory
bin/sqoop import \
  --connect jdbc:mysql://hadoop01:3306/sqoop \
  --username root \
  --password 123456 \
  --table my_user \
  --target-dir /user/beifeng/sqoop/imp_my_user \
  --num-mappers 1
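To inspect the result of the targeted import (path taken from --target-dir above; with one mapper the output is a single part-m-00000 file):
```
bin/hdfs dfs -cat /user/beifeng/sqoop/imp_my_user/part-m-*
```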
How an import job starts
- Sqoop translates the import statement into a Java class
- compiles the class into a jar
- and submits the jar as a MapReduce job (see the codegen sketch below)
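You can watch the first step in isolation with Sqoop's codegen tool, which emits the generated Java class without running the import; the --outdir path here is an arbitrary example:
```
bin/sqoop codegen \
  --connect jdbc:mysql://hadoop01:3306/sqoop \
  --username root \
  --password 123456 \
  --table my_user \
  --outdir /tmp/sqoop-codegen
```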
export
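A minimal sketch of the reverse direction, pushing the HDFS files written by the import above back into MySQL; my_user_copy is a hypothetical target table that must already exist with a matching schema:
```
bin/sqoop export \
  --connect jdbc:mysql://hadoop01:3306/sqoop \
  --username root \
  --password 123456 \
  --table my_user_copy \
  --export-dir /user/beifeng/sqoop/imp_my_user \
  --num-mappers 1
```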