hadoop local install
Download the Java JDK package
curl -O http://download.oracle.com/otn-pub/java/jdk/8u171-b11/512cd62ec5174c3487ac17c61aaa89e8/jdk-8u171-linux-x64.rpm?AuthParam=1525399927_bf1d984d06bb3ec51e18b1e0cd30a4b7
Download the Hadoop package
curl -O http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz
Basic environment setup
- Disable IPv6
vi /etc/sysctl.conf
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
- Set the hostname
hostnamectl set-hostname master.hadoop
- Disable the firewall
systemctl status firewalld
systemctl stop firewalld
systemctl disable firewalld
systemctl status iptables
- SELinux
sestatus # check current status
setenforce 0 # disable SELinux temporarily
vi /etc/selinux/config # disable permanently
SELINUX=disabled
- Install the JDK
rpm -ivh jdk-8u171-linux-x64.rpm
- Set up SSH trust (passwordless login)
ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# Skip host key verification (for a single user)
vi ~/.ssh/config
host localhost
StrictHostKeyChecking no
host 0.0.0.0
StrictHostKeyChecking no
host *hadoop*
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
# For all users
vi /etc/ssh/ssh_config
StrictHostKeyChecking ask  # change to: StrictHostKeyChecking no
Deploy Hadoop
- Deploy Hadoop; here it goes under /usr/local
1. Extract the archive to /usr/local
tar xzvf hadoop-3.1.0.tar.gz -C /usr/local/
cd /usr/local
mv hadoop-3.1.0 hadoop # shorter path for convenience
2. Configure environment variables
Append the following to /etc/profile
vi /etc/profile
export JAVA_HOME=/usr/java/default
export HADOOP_HOME=/usr/local/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
source /etc/profile
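To confirm the exports took effect, a quick check like the following can help. It simply re-creates the profile additions in the current shell and verifies that the Java and Hadoop bin directories lead the PATH (paths are the ones from the snippet above):

```shell
# Re-create the profile exports and check that the new dirs lead the PATH
export JAVA_HOME=/usr/java/default
export HADOOP_HOME=/usr/local/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

# The first PATH entry should now be $JAVA_HOME/bin
case "$PATH" in
  "$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:"*) echo "PATH ok" ;;
  *) echo "PATH not set as expected" ;;
esac
```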
3. Set JAVA_HOME in the Hadoop startup script
vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/default
4. Verify the setup
[root@master ~]# hadoop
Usage: hadoop [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]
or hadoop [OPTIONS] CLASSNAME [CLASSNAME OPTIONS]
where CLASSNAME is a user-provided Java class
OPTIONS is none or any of:
--config dir Hadoop config directory
--debug turn on shell script debug mode
--help usage information
buildpaths attempt to add class files from build tree
hostnames list[,of,host,names] hosts to use in slave mode
hosts filename list of hosts to use in slave mode
loglevel level set the log4j level for this command
workers turn on worker mode
SUBCOMMAND is one of:
Admin Commands:
daemonlog get/set the log level for each daemon
Client Commands:
archive create a Hadoop archive
checknative check native Hadoop and compression libraries availability
classpath prints the class path needed to get the Hadoop jar and the required libraries
conftest validate configuration XML files
credential interact with credential providers
distch distributed metadata changer
distcp copy file or directories recursively
dtutil operations related to delegation tokens
envvars display computed Hadoop environment variables
fs run a generic filesystem user client
gridmix submit a mix of synthetic job, modeling a profiled from production load
jar <jar> run a jar file. NOTE: please use "yarn jar" to launch YARN applications, not this command.
jnipath prints the java.library.path
kdiag Diagnose Kerberos Problems
kerbname show auth_to_local principal conversion
key manage keys via the KeyProvider
rumenfolder scale a rumen input trace
rumentrace convert logs into a rumen trace
s3guard manage metadata on S3
trace view and modify Hadoop tracing settings
version print the version
Daemon Commands:
kms run KMS, the Key Management Server
SUBCOMMAND may print help when invoked w/o parameters or with -h.
Try out MapReduce
1. Create a test file test.txt
mkdir ~/input
vi ~/input/test.txt
hello world
how do you do
are you ok
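Instead of editing interactively with vi, the directory and file can also be created non-interactively with a heredoc (same content as above):

```shell
# Create the input directory and write the test file in one shot
mkdir -p ~/input
cat > ~/input/test.txt <<'EOF'
hello world
how do you do
are you ok
EOF

# Quick check: the file should contain three lines
wc -l ~/input/test.txt
```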
2. Count the words
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar wordcount input/ output
3. Check the results
[root@master output]# cat part-r-00000
are 1
do 2
hello 1
how 1
ok 1
world 1
you 2
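As a sanity check, the same counts can be reproduced with plain shell tools, no Hadoop involved. This splits the three test lines into one word per line, then counts each distinct word:

```shell
# Split the test lines into words, then count each distinct word
printf 'hello world\nhow do you do\nare you ok\n' \
  | tr -s ' ' '\n' | sort | uniq -c | awk '{print $2, $1}'
```

The output matches the seven word/count pairs in part-r-00000 above (are 1, do 2, hello 1, how 1, ok 1, world 1, you 2).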