Hadoop Setup, Hive Setup, Spark Setup, Basic Configuration


Setup Steps


Basic Environment Configuration

Change the hostname on all three machines (takes effect without a reboot; just reconnect):

hostnamectl set-hostname master
hostnamectl set-hostname slave1
hostnamectl set-hostname slave2

Disable the firewall on all three machines:

systemctl stop firewalld.service

IP mapping:

vim /etc/hosts
<private IP 1> master
<private IP 2> slave1
<private IP 3> slave2

Change the timezone on all three machines:

tzselect    # choose, in order: 5 (Asia), 9 (China), 1 (Beijing Time), 1 (Yes)

Install the NTP service on all three machines:

yum install -y ntp

Use master as the NTP server and edit its NTP configuration file (run on master):

vim /etc/ntp.conf
server 127.127.1.0
fudge 127.127.1.0	stratum 10

Restart the NTP service on master:

systemctl restart ntpd.service

Sync slave1 and slave2 against master:

ntpdate master

For the competition, the following must be added to the environment variables and made effective (all three machines):

vim /etc/profile
TZ='Asia/Shanghai'; export TZ
Save and exit
source /etc/profile
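
To confirm the timezone change took effect, a quick check such as the following can be run on each node (the expected output assumes Asia/Shanghai):

date            # should print the current time with the CST (+0800) zone
echo $TZ        # should print Asia/Shanghai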

Have the slave1 and slave2 nodes sync time with master every half hour between 10:00 and 17:00 (24-hour clock, scheduled with root's crontab):

crontab -e
Add the following:
*/30 10-17 * * * /usr/sbin/ntpdate master
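
To confirm the schedule is registered, root's crontab can be listed; /usr/sbin/ntpdate is assumed to be where the ntp/ntpdate package installs the binary on this system:

crontab -l      # should show the */30 10-17 * * * entry added above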

Passwordless SSH Configuration

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-copy-id localhost

master:

cat id_rsa.pub >> authorized_keys
(Note: run this in the .ssh/ directory. Check authorized_keys first; if the key is already in it, there is no need to append it again.)
ssh master
exit

slave1, slave2:

scp master:~/.ssh/id_rsa.pub ~/.ssh/master_rsa.pub
cat master_rsa.pub >> authorized_keys

master:

ssh slave1
ssh slave2
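
As a quick check that passwordless login works from master to every node, a non-interactive loop like the sketch below can be used (BatchMode makes ssh fail instead of prompting for a password):

for h in master slave1 slave2; do
    ssh -o BatchMode=yes $h hostname
done
# each iteration should print the remote hostname without asking for a password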

Method 2:

ssh-keygen -t rsa
All three machines copy their key to the first machine
Command:  ssh-copy-id master
Copy the first machine's authorized_keys to the other machines
scp /root/.ssh/authorized_keys   slave1:/root/.ssh
scp /root/.ssh/authorized_keys   slave2:/root/.ssh
ssh-copy-id localhost

JDK Configuration

mkdir -p /usr/java
tar -zxvf /usr/package/jdk-8u171-linux-x64.tar.gz -C /usr/java/

vim /etc/profile

## JAVA
export JAVA_HOME=/usr/java/jdk1.8.0_171
export CLASSPATH=$JAVA_HOME/lib/
export PATH=$PATH:$JAVA_HOME/bin
export PATH JAVA_HOME CLASSPATH
Save and exit
source /etc/profile

scp -r /usr/java/ slave1:/usr/
scp -r /usr/java/ slave2:/usr/

Configure the environment variables on the slave nodes and make them take effect.
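
Once the JDK has been distributed and /etc/profile sourced on the slaves, the installation can be verified on every node; the version string below is what this particular JDK should report:

java -version     # expect: java version "1.8.0_171"
echo $JAVA_HOME   # expect: /usr/java/jdk1.8.0_171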


ZooKeeper Installation and Configuration

mkdir -p /usr/zookeeper
tar -zxvf /usr/package/zookeeper-3.4.10.tar.gz -C /usr/zookeeper/

vim /etc/hosts
192.168.57.110 master master.root
192.168.57.111 slave1 slave1.root
192.168.57.112 slave2 slave2.root

Go to the configuration directory conf:

zoo.cfg:

cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
## Modify the following
dataDir=/usr/zookeeper/zookeeper-3.4.10/zkdata
## Add the following
dataLogDir=/usr/zookeeper/zookeeper-3.4.10/zkdatalog
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888

myid:

Go to the ZooKeeper installation directory (zookeeper-3.4.10)
mkdir zkdata
mkdir zkdatalog
vim zkdata/myid
1
Save and exit

scp -r /usr/zookeeper/ root@slave1:/usr/
scp -r /usr/zookeeper/ root@slave2:/usr/
On slave1, change myid to 2
On slave2, change myid to 3

Configure the environment variables on all three machines:

vim /etc/profile
## ZOOKEEPER
export ZOOKEEPER_HOME=/usr/zookeeper/zookeeper-3.4.10 
PATH=$PATH:$ZOOKEEPER_HOME/bin
source /etc/profile

Start ZooKeeper on all three machines:

zkServer.sh start
zkServer.sh status
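
A healthy ensemble reports one leader and two followers. A sketch for checking all three nodes from master, assuming the passwordless SSH configured earlier and that /etc/profile exports ZOOKEEPER_HOME:

for h in master slave1 slave2; do
    ssh $h "source /etc/profile && zkServer.sh status"
done
# expected: Mode: leader on exactly one node, Mode: follower on the other two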

Hadoop Configuration

mkdir -p /usr/hadoop
tar -zxvf /usr/package/hadoop-2.7.3.tar.gz -C /usr/hadoop/

Configure the environment variables on all three machines:

vim /etc/profile
export HADOOP_HOME=/usr/hadoop/hadoop-2.7.3
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib
export PATH=$PATH:$HADOOP_HOME/bin
source /etc/profile

Go to the configuration directory <Hadoop home>/etc/hadoop

hadoop-env.sh:

export JAVA_HOME=/usr/java/jdk1.8.0_171

core-site.xml:

<configuration>
	<property>
		<name>fs.default.name</name>
		<value>hdfs://master:9000</value>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/usr/hadoop/hadoop-2.7.3/hdfs/tmp</value>
		<description>A base for other temporary directories.</description>
	</property>
	<property>
		<name>io.file.buffer.size</name>
		<value>131072</value>
	</property>
	<property>
		<name>fs.checkpoint.period</name>
		<value>60</value>
	</property>
	<property>
		<name>fs.checkpoint.size</name>
		<value>67108864</value>
	</property>
</configuration>

yarn-site.xml:

<configuration>
	<property>
		<name>yarn.resourcemanager.address</name>
		<value>master:18040</value>
	</property>
	<property>
		<name>yarn.resourcemanager.scheduler.address</name>  
		<value>master:18030</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address</name>
		<value>master:18088</value>
	</property>
	<property>
		<name>yarn.resourcemanager.resource-tracker.address</name>
		<value>master:18025</value>
	</property>
	<property>
		<name>yarn.resourcemanager.admin.address</name>
		<value>master:18141</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
		<value>org.apache.hadoop.mapred.ShuffleHandler</value>
	</property>
</configuration>

yarn-env.sh:

export JAVA_HOME=/usr/java/jdk1.8.0_171

slaves:

vim slaves
slave1
slave2

master:

vim master
master

hdfs-site.xml:

<configuration>
	<property>
		<name>dfs.replication</name>
		<value>2</value>
	</property>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>file:/usr/hadoop/hadoop-2.7.3/hdfs/name</value>
		<final>true</final>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file:/usr/hadoop/hadoop-2.7.3/hdfs/data</value>
		<final>true</final>
	</property>
	<property>
		<name>dfs.namenode.secondary.http-address</name>
		<value>master:9001</value>
	</property>
	<property>
		<name>dfs.webhdfs.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>dfs.permissions</name>
		<value>false</value>
	</property>
</configuration>

mapred-site.xml:

<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
</configuration>


scp -r /usr/hadoop/ root@slave1:/usr/
scp -r /usr/hadoop/ root@slave2:/usr/

source /etc/profile

Verification: only format and start after all of the previous steps have been verified successfully.

hadoop namenode -format
sbin/start-all.sh
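
After start-all.sh (located in $HADOOP_HOME/sbin), the daemons can be checked with jps and an HDFS report; the process lists below are what this configuration is expected to produce:

jps                                        # master: NameNode, SecondaryNameNode, ResourceManager
ssh slave1 "source /etc/profile && jps"    # each slave: DataNode, NodeManager
hdfs dfsadmin -report                      # should report 2 live datanodes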

Hive Installation and Configuration

On slave2, start the MySQL service and reset the password:

systemctl start mysqld.service
grep "temporary password" /var/log/mysqld.log    # view the initial password
mysql -uroot -p
set global validate_password_policy=0;
set global validate_password_length=4;
alter user 'root'@'localhost' identified by '123456';
\q	
mysql -uroot -p123456
create user 'root'@'%' identified by '123456';
grant all privileges on *.* to 'root'@'%' with grant option;
flush privileges;
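
Before configuring Hive, the remote account can be tested from slave1 (this assumes a mysql client is available there):

mysql -h slave2 -uroot -p123456 -e "show databases;"
# a successful connection lists the default databases (information_schema, mysql, ...)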

Extract and install on master and slave1:

mkdir -p /usr/hive
tar -zxvf /usr/package/apache-hive-2.1.1-bin.tar.gz -C /usr/hive

On master and slave1, set the HIVE system environment variable ($HIVE_HOME) in /etc/profile:

vim /etc/profile
## HIVE
export HIVE_HOME=/usr/hive/apache-hive-2.1.1-bin
export PATH=$PATH:$HIVE_HOME/bin
source /etc/profile

On master and slave1, set up the Hive runtime environment (in the conf directory):

cp hive-env.sh.template hive-env.sh
vim hive-env.sh
export HADOOP_HOME=/usr/hadoop/hadoop-2.7.3
export HIVE_CONF_DIR=/usr/hive/apache-hive-2.1.1-bin/conf
export HIVE_AUX_JARS_PATH=/usr/hive/apache-hive-2.1.1-bin/lib
Save and exit

Resolve the jline version conflict (on master and slave1):

cp /usr/hive/apache-hive-2.1.1-bin/lib/jline-2.12.jar /usr/hadoop/hadoop-2.7.3/share/hadoop/yarn/lib/

Copy the MySQL driver on slave1 (the dependency jar is stored in /usr/package/):

cp /usr/package/mysql-connector-java-5.1.47-bin.jar /usr/hive/apache-hive-2.1.1-bin/lib/

Configure the hive-site.xml file on slave1:

<configuration>
<!-- Location of the Hive warehouse (where table data is stored) -->
<property>
	<name>hive.metastore.warehouse.dir</name>
	<value>/user/hive_remote/warehouse</value>
</property>
	<!-- JDBC URL for the metastore database connection -->
<property>
	<name>javax.jdo.option.ConnectionURL</name>
	<value>jdbc:mysql://slave2:3306/hive?createDatabaseIfNotExist=true</value>
	<!-- IP (hostname) and port of the MySQL server -->
</property>
	<!-- JDBC driver class, i.e. the MySQL driver -->
<property>
	<name>javax.jdo.option.ConnectionDriverName</name>
	<value>com.mysql.jdbc.Driver</value>
</property>
	<!-- MySQL username -->
<property>
	<name>javax.jdo.option.ConnectionUserName</name>
	<value>root</value>
</property>
	<!-- MySQL password -->
<property>
	<name>javax.jdo.option.ConnectionPassword</name>
	<value>123456</value>
</property>
<property>
	<name>hive.metastore.schema.verification</name>
	<value>false</value>
</property>
<property>
	<name>datanucleus.schema.autoCreateAll</name>
	<value>true</value>
</property>
</configuration>

Configure the hive-site.xml file on master:

<configuration>
<!-- Location of the Hive warehouse (where table data is stored) -->
<property>
	<name>hive.metastore.warehouse.dir</name>
	<value>/user/hive_remote/warehouse</value>
</property>
<!-- Whether to connect to the metastore using a local (embedded) service; defaults to true -->
<property>
	<name>hive.metastore.local</name>
	<value>false</value>
</property>
<!-- Metastore server to connect to -->
<property>
	<name>hive.metastore.uris</name>
	<value>thrift://slave1:9083</value>
	<!-- The Hive client reaches the MySQL database through the Thrift metastore service; the Thrift server here is slave1's IP (hostname) -->
</property>
</configuration>

Initialize the database, start the metastore service, and open the client environment: slave1, master

Start (on slave1):

bin/hive --service metastore

If this fails, resolve it on slave1 as follows:

Check first whether there is permission to connect to the database; grant it if not.
Check whether the hive database already exists in MySQL; drop it if it does.
Delete the metastore_db directory under the Hive directory.
Re-initialize: bin/schematool -dbType mysql -initSchema
Start the metastore on slave1 again: bin/hive --service metastore
The Hive server does not need to be initialized again; the client can be started directly with bin/hive.
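
Putting the steps together, one possible sequence is sketched below (run from $HIVE_HOME; keeping the metastore alive with nohup and a log file is just one option):

# on slave1: initialize the metastore schema once, then run the metastore in the background
bin/schematool -dbType mysql -initSchema
nohup bin/hive --service metastore > metastore.log 2>&1 &

# on master: open the client and confirm it reaches the remote metastore
bin/hive -e "show databases;"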


HQL Statements

hadoop namenode -format
sbin/start-all.sh
systemctl start mysqld.service
bin/schematool -dbType mysql -initSchema
bin/hive


create external table person(age double, workclass string, fnlwgt string, edu string, edu_num double, marital_status string, occupation string, relationship string, race string, sex string, gain string, loss string, hours double, native string, income string) row format delimited fields terminated by ',';

load data local inpath '/root/college/person.csv' into table person;
insert overwrite local directory '/root/person00' row format delimited fields terminated by ',' select count(*) from person;
insert overwrite local directory '/root/person03' row format delimited fields terminated by ',' select round(avg(age)) from person;
insert overwrite local directory '/root/person04' row format delimited fields terminated by ',' select count(*) from person where age between 35 and 40 and marital_status == 'Never-married';
insert overwrite local directory '/root/person05' row format delimited fields terminated by ',' select count(*) from person where hours between 20 and 30 and occupation == 'Tech-support';
insert overwrite local directory '/root/person06' row format delimited fields terminated by ',' select * from(select count(*) as s from person group by race) w order by w.s desc;
select * from(select count(*) as s from person group by race) w order by w.s desc;
select count(*) x from person group by race order by x desc;

create table student(Id Int, Name String, Age Int, Sex string) row format delimited fields terminated by ',';
alter table student add columns (address string);
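
A quick check that the new column was added (run on the master client; describe prints the full schema including address):

hive -e "describe student;"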

Spark Installation and Configuration

Scala installation

mkdir -p /usr/scala
tar -zxvf <scala archive> -C /usr/scala

vim /etc/profile
export SCALA_HOME=/usr/scala/scala-2.11.12
export PATH=$SCALA_HOME/bin:$PATH
Save and exit
source /etc/profile

scala -version

scp -r /usr/scala root@slave1:/usr/
scp -r /usr/scala root@slave2:/usr/
Configure the environment variables on the slave nodes as well.

mkdir -p /usr/spark
tar -zxvf <spark archive> -C /usr/spark

Go to the configuration directory (conf)

spark-env.sh:

cp spark-env.sh.template spark-env.sh
vim spark-env.sh
	export SPARK_MASTER_IP=master
	export SCALA_HOME=/usr/scala/scala-2.11.12
	export SPARK_WORKER_MEMORY=8g
	export JAVA_HOME=/usr/java/jdk1.8.0_171
	export HADOOP_HOME=/usr/hadoop/hadoop-2.7.3
	export HADOOP_CONF_DIR=/usr/hadoop/hadoop-2.7.3/etc/hadoop

slaves:

cp slaves.template slaves
vim slaves
	slave1
	slave2

vim /etc/profile
export SPARK_HOME=/usr/spark/spark-2.4.0-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
source /etc/profile

scp -r /usr/spark root@slave1:/usr/
scp -r /usr/spark root@slave2:/usr/

Configure the environment variables on the slave nodes and make them take effect.
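
To bring up the standalone cluster and verify it, something like the following can be run on master (sbin/start-all.sh here is Spark's script, not Hadoop's; port 8080 is Spark's default master UI port):

/usr/spark/spark-2.4.0-bin-hadoop2.7/sbin/start-all.sh
jps                                        # master should now also show a Master process
ssh slave1 "source /etc/profile && jps"    # each slave should show a Worker process
# the cluster UI should be reachable at http://master:8080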



MR

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;


import java.net.URI;

public class TenMain extends Configured implements Tool {

    public int run(String[] strings) throws Exception {
        // Configure the job: read text input from the dataset directory on HDFS
        Job job = Job.getInstance(super.getConf(), "mapreduce_x");
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.addInputPath(job, new Path("hdfs://mycluster/bigdatacase/dataset"));

        job.setMapperClass(TenMapper.class);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(LongWritable.class);

        job.setReducerClass(TenReduce.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(NullWritable.class);

        job.setOutputFormatClass(TextOutputFormat.class);
        Path path = new Path("hdfs://mycluster/usr/output");
        TextOutputFormat.setOutputPath(job, path);
        FileSystem fileSystem = FileSystem.get(new URI("hdfs://mycluster/usr/output"), new Configuration());
        // Check whether the output directory already exists
        boolean bl2 = fileSystem.exists(path);
        if (bl2) {
            // Delete the existing output directory so the job can write to it
            fileSystem.delete(path, true);
        }
        boolean b1 = job.waitForCompletion(true);
        return b1 ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        int run = ToolRunner.run(configuration, new TenMain(), args);
        System.exit(run);
    }
}



import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class TenMapper extends Mapper<LongWritable, Text, LongWritable, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split the tab-separated line and emit (1, 1) for records whose 6th field
        // is the date 2014-12-11 and whose 4th field equals "4"
        String[] split = value.toString().split("\t");
        if (split[5].equals("2014-12-11") && split[3].equals("4")) {
            context.write(new LongWritable(1), new LongWritable(1));
        }
    }
}


import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class TenReduce extends Reducer<LongWritable, LongWritable, LongWritable, NullWritable> {

    @Override
    protected void reduce(LongWritable key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        // Sum the 1s emitted by the mapper to get the total count of matching records
        int sum = 0;
        for (LongWritable i : values) {
            sum += Integer.parseInt(i.toString());
        }
        context.write(new LongWritable(sum), NullWritable.get());
    }
}
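
One way to run the job, assuming the three classes above are packaged into a jar named ten.jar (the jar name is illustrative; note that the hard-coded paths in TenMain use an HDFS nameservice called mycluster, so they may need to be changed to hdfs://master:9000/... for the cluster built above):

hadoop jar ten.jar TenMain
hdfs dfs -cat /usr/output/part-r-*      # prints the resulting count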


Reposted from blog.csdn.net/shuyv/article/details/120517849