Installing Hadoop in Standalone (Single-Machine) Mode on Mac


I. Installing Hadoop

Configuring SSH

SSH is configured here to enable passwordless login, which makes it easier to manage Hadoop remotely and to share files across a Hadoop cluster without entering a login password.
If SSH has not been configured, running ssh localhost in a terminal prompts for your account password; once it is configured, no password is needed.

Step 1: run ssh-keygen -t rsa -P '' in a terminal,
then press Enter through the remaining prompts. If you have run this command before, you will be asked whether to overwrite the existing key; enter y to do so.

home:~ root$ ssh-keygen -t rsa -P ''
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/root/.ssh/id_rsa): 
/Users/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Your identification has been saved in /Users/root/.ssh/id_rsa.
Your public key has been saved in /Users/root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:5FXIbTsTcgSg4xApj2y5sY+n7+qEPKorhZaNhmioGDk [email protected]
The key's randomart image is:
+---[RSA 2048]----+
|    ..  .o.=o    |
|  . .. .  +.=    |
| . =. o . .+ o   |
|  * .o + .  +    |
|++++  . S    o   |
|EB+.             |
|B*.o             |
|=.o o            |
|*o+*o            |
+----[SHA256]-----+

Step 2: run cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys to authorize your public key for passwordless login on this machine.
In theory, ssh localhost should now log you in without a password.

home:~ root$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Possible error:

home:~ root$ ssh localhost
ssh: connect to host localhost port 22: Connection refused

Fix:
Open System Preferences -> Sharing -> enable Remote Login.

home:~ root$ ssh localhost
Last login: Fri Jan 25 17:31:54 2019 from ::1
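
If ssh localhost still prompts for a password even after the key has been authorized, overly loose permissions on ~/.ssh are a common cause; tightening them is an extra step not covered in the original write-up:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys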

Configuring environment variables

vim ~/.bash_profile
Add the following two lines:

export HADOOP_HOME=/Users/root/software/hadoop/hadoop3.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Apply the environment variables:

 source ~/.bash_profile
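
A quick way to confirm the variables took effect (this check is an addition of mine; it assumes the Hadoop tarball is already unpacked at $HADOOP_HOME):

echo $HADOOP_HOME
hadoop version    # prints the Hadoop version if PATH is set up correctly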

Editing the configuration files

1. The hadoop2.9/etc/hadoop/hadoop-env.sh file

Modify or replace the following lines:
export JAVA_HOME=${JAVA_HOME}
export HADOOP_HEAPSIZE=2000
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
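
On macOS, ${JAVA_HOME} is often not set in a fresh shell; one way to point it at the installed JDK (my suggestion, not from the original post) is:

export JAVA_HOME=$(/usr/libexec/java_home)    # resolves the path of the active JDK on macOS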

2. Configure the NameNode host name and port

Edit the hadoop2.9/etc/hadoop/core-site.xml file:

<configuration>
    <property>
       <name>hadoop.tmp.dir</name>
       <value>/Users/hadoop-2.9/tmp/hadoop-${user.name}</value> <!-- set this directory according to your environment -->
       <description>A base for other temporary directories.</description>
    </property>
    <property>
       <name>fs.default.name</name>
       <value>hdfs://localhost:9000</value>
    </property>
</configuration>
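
The value Hadoop actually picks up can be double-checked with hdfs getconf; this quick check is my own suggestion rather than part of the original steps:

hdfs getconf -confKey fs.default.name    # should print hdfs://localhost:9000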

3. Configure the default HDFS replication factor

Edit the hadoop2.9/etc/hadoop/hdfs-site.xml file:

<configuration>
    <property>
       <name>dfs.replication</name>
       <value>1</value>
    </property>
</configuration>

4. Configure the JobTracker host name and port

Edit the hadoop2.9/etc/hadoop/mapred-site.xml file:

<configuration>
    <property>
       <name>mapred.job.tracker</name>
       <value>hdfs://localhost:9000</value>
    </property>
    <property>
       <name>mapred.tasktracker.map.tasks.maximum</name>
       <value>2</value>
    </property>
    <property>
       <name>mapred.tasktracker.reduce.tasks.maximum</name>
       <value>2</value>
    </property>
</configuration>

Note: if mapred-site.xml does not exist, create it yourself by copying mapred-site.xml.template and renaming it (see the sketch below).
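
A minimal sketch of that copy, assuming the standard layout of the Hadoop 2.x distribution:

cd $HADOOP_HOME/etc/hadoop
cp mapred-site.xml.template mapred-site.xml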

5. Edit the hadoop2.9/etc/hadoop/yarn-site.xml file:

<configuration>
    <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
    </property>
</configuration>

6. Format the file system

From the hadoop2.9 installation directory, run the following to format the NameNode:
bin/hdfs namenode -format

Output like the following indicates success:

2019-01-25 17:58:32,570 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2019-01-25 17:58:32,570 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2019-01-25 17:58:32,570 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2019-01-25 17:58:32,573 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
2019-01-25 17:58:32,573 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2019-01-25 17:58:32,575 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2019-01-25 17:58:32,575 INFO util.GSet: VM type       = 64-bit
2019-01-25 17:58:32,576 INFO util.GSet: 0.029999999329447746% max memory 3.6 GB = 1.1 MB
2019-01-25 17:58:32,576 INFO util.GSet: capacity      = 2^17 = 131072 entries
2019-01-25 17:58:32,613 INFO namenode.FSImage: Allocated new BlockPoolId: BP-137592425-192.168.11.67-1548410312604
2019-01-25 17:58:32,629 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
2019-01-25 17:58:32,642 INFO namenode.FSImageFormatProtobuf: Saving image file /tmp/hadoop-root/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2019-01-25 17:58:32,760 INFO namenode.FSImageFormatProtobuf: Image file /tmp/hadoop-root/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 409 bytes saved in 0 seconds .
2019-01-25 17:58:32,776 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2019-01-25 17:58:32,781 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at xlc-2.local/192.168.11.67
************************************************************/

7. Start the NameNode and DataNode daemons:

sbin/start-dfs.sh

8. Start the ResourceManager and NodeManager daemons:

sbin/start-yarn.sh

Alternatively, start all services at once:
sbin/start-all.sh
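
To verify that the daemons are actually running, jps (shipped with the JDK) is handy; the process list below is what I would expect for a single-node setup, not output from the original post:

jps
# expected processes include NameNode, DataNode, SecondaryNameNode,
# ResourceManager and NodeManager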

9. Verify Hadoop

http://localhost:50070 opens the HDFS management page; if it loads, HDFS started successfully.
http://localhost:8088 opens the cluster application management page; if it loads, YARN started successfully.

Problem:
The HDFS web UI at http://localhost:50070/ cannot be reached,
while the YARN address works fine: http://localhost:8088
Fix:
Reinstalling with Hadoop 2.9 made the page reachable. (A likely explanation, not stated in the original post: in Hadoop 3.x the default NameNode web UI port changed from 50070 to 9870, so on a 3.x installation the UI would be at http://localhost:9870.)

II. Installing Hive 3.1

1. Download

MySQL is required before installing Hive; it was already installed on this machine, so that step is omitted here.
Download page: https://hive.apache.org/downloads.html
The release used here is apache-hive-3.1.1-bin.tar.gz; after extracting it, rename the directory and move it under the Hadoop installation directory.

2. Configure the system environment variables

vim ~/.bash_profile
export HIVE_HOME=/usr/hadoop/hadoop2.9/hive    (note: adjust to your own path)
export PATH=$PATH:$HIVE_HOME/bin:$HIVE_HOME/conf

3. Modify the Hive configuration files

1) Go to /usr/hadoop/hadoop2.9/hive/conf and create hive-site.xml and the other configuration files from the templates:

cp hive-env.sh.template hive-env.sh
cp hive-default.xml.template hive-default.xml
cp hive-site.xml.template hive-site.xml
cp hive-log4j.properties.template hive-log4j.properties
cp hive-exec-log4j.properties.template hive-exec-log4j.properties

2) Add the following content to hive-site.xml:

<configuration>
    <property>
        <name>hive.metastore.local</name>
        <value>true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8&amp;autoReconnect=true&amp;useSSL=false</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>111111</value>
    </property>
</configuration>
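
Before running schematool it can save time to confirm that the MySQL server in the connection URL above is reachable. This sanity check is my own addition; it assumes the mysql command-line client is installed and that the host name master resolves:

mysql -h master -P 3306 -u root -p -e "SELECT VERSION();"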

3) Modify hive-env.sh:

HADOOP_HOME=/usr/hadoop/hadoop2.9
export HIVE_CONF_DIR=/usr/hadoop/hadoop2.9/hive/conf
Move the mysql-connector-java JAR used in your Java projects into the hive/lib directory:
mysql-connector-java-5.1.46.jar
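
A minimal sketch of that copy, assuming the JAR sits in the current directory and $HIVE_HOME points at the Hive installation:

cp mysql-connector-java-5.1.46.jar $HIVE_HOME/lib/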

4. Start Hive

1) If this is the first time Hive is started, run the following schema-initialization command first:
schematool -dbType mysql -initSchema

XLC-2:bin xianglingchuan$ schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/xianglingchuan/software/hadoop/hadoop2.9/hive3.1/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/xianglingchuan/software/hadoop/hadoop2.9/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:	 jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&useUnicode=true&characterEncoding=UTF-8&autoReconnect=true&useSSL=false
Metastore Connection Driver :	 com.mysql.jdbc.Driver
Metastore connection User:	 root
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
Underlying cause: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException : Could not create connection to database server. Attempted reconnect 3 times. Giving up.
SQL Error code: 0
Use --verbose for detailed stacktrace.
*** schemaTool failed ***

Start Hive:

XLC-2:bin xianglingchuan$ bin/hive
19/01/26 23:37:57 DEBUG util.VersionInfo: version: 2.9.2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/xianglingchuan/software/hadoop/hadoop2.9/hive3.1/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/xianglingchuan/software/hadoop/hadoop2.9/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 96860f82-17bb-48ca-9475-700e2ebffc6f

Logging initialized using configuration in file:/Users/xianglingchuan/software/hadoop/hadoop2.9/hive3.1/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>

Problem 1: starting the daemons or running ssh localhost reports "Bad configuration option: usekeychain"

home:sbin root$ ./start-all.sh 
WARNING: Attempting to start all Apache Hadoop daemons as xianglingchuan in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [localhost]
localhost: /Users/root/.ssh/config: line 8: Bad configuration option: usekeychain

home:sbin root$ ssh localhost
/Users/root/.ssh/config: line 8: Bad configuration option: usekeychain
/Users/root/.ssh/config: line 21: Bad configuration option: usekeychain
/Users/root/.ssh/config: line 30: Bad configuration option: usekeychain
/Users/root/.ssh/config: terminating, 3 bad configuration options

Fix:
Simply clear out the contents of ~/.ssh/config and regenerate the key, as sketched below.
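
A minimal sketch of that fix, reusing the key-generation steps from the SSH setup at the start of this post (the exact commands are my reconstruction, not from the original write-up):

> ~/.ssh/config                                   # empty the problematic config file
ssh-keygen -t rsa -P ''                           # regenerate the key pair
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   # re-authorize the new public key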

Problem 2: error with the Hadoop 2.9 release:

home:sbin root$ sbin/start-dfs.sh
19/01/26 17:36:51 DEBUG util.Shell: setsid is not available on this machine. So not using it.
19/01/26 17:36:51 DEBUG util.Shell: setsid exited with exit code 0
19/01/26 17:36:51 ERROR conf.Configuration: error parsing conf core-site.xml
com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 start byte 0xa0 (at char #766, byte #37)

Fix:
    Remove the stray whitespace characters from the newly added property nodes in core-site.xml and verify the file's encoding. (The 0xa0 byte is typically a non-breaking space pasted in from a web page; on its own it is not valid UTF-8, which is exactly what the parser complains about.)
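
One way to locate the offending bytes (my own suggestion, assuming the standard macOS command-line tools):

iconv -f UTF-8 -t UTF-8 etc/hadoop/core-site.xml > /dev/null   # reports an error if the file contains invalid UTF-8
LC_ALL=C grep -n $'\xa0' etc/hadoop/core-site.xml              # prints the lines containing the 0xa0 byte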



# Common reference
## Toggling Hadoop debug output
   Enable:  export HADOOP_ROOT_LOGGER=DEBUG,console
   Disable: export HADOOP_ROOT_LOGGER=INFO,console

## What each configuration file does
core-site.xml      Service URIs, the cluster's temporary directory, and other core settings
hdfs-site.xml      HDFS naming, communication addresses, ports, and other HDFS settings
mapred-site.xml    MapReduce framework resource-manager name, job-history access address, etc.
yarn-site.xml      ResourceManager-related settings
fair-scheduler.xml Configuration for the Hadoop FairScheduler scheduling policy
