Big Data - Building a big data platform with Cloudera Manager (CM) + CDH

I. Introduction to Cloudera Manager

Cloudera Manager (CM) is a tool developed by Cloudera for installing and deploying big data clusters. It provides automated cluster installation, centralized management, cluster monitoring, and alerting. With it, installing a cluster takes a few hours instead of a few days, and the operations staff needed drops from dozens of people to a handful, which greatly improves the efficiency of cluster management.

Quite a lot of preparation is needed before starting, such as configuring IP addresses, turning off the firewall, and setting up passwordless SSH login. These are routine environment settings and are not covered here; if you are unfamiliar with them, refer to the relevant part of "Big Data - Hadoop cluster environment setup".

I also recommend the article "The past and present of big data: its birth, development, and future", which I hope gives you a better understanding of big data.

1. CM technical architecture

  • Agent: installed on each host; responsible for starting and stopping processes, unpacking configurations, triggering installations, and monitoring the host;
  • Management Service: a service made up of a group of roles that perform various monitoring, alerting, and reporting functions;
  • Database: stores configuration and monitoring information. Usually several logical databases run on one or more database servers; for example, the Cloudera Manager Server and the monitoring roles use different logical databases;
  • Cloudera Repository: the repository of software distributed by Cloudera Manager;
  • Clients: the interfaces for interacting with the server;
  • Admin Console: a web-based user interface with which administrators manage clusters and Cloudera Manager;
  • API: an API with which developers create custom Cloudera Manager applications.

2. The four main functions of CM

  • Management: manage the cluster, for example adding and removing nodes;
  • Monitoring: monitor the health of the cluster and comprehensively monitor the configured metrics and the operation of the system;
  • Diagnosis: diagnose problems that arise in the cluster and suggest solutions;
  • Integration: integrate the various Hadoop components.

3. Analysis table

Background to Cloudera Manager
  Once an open-source product becomes good enough, some people want to make money from it. Cloudera, founded in 2008, was born to make money from open-source Hadoop.

Problems solved
  • Significantly shorter cluster deployment time;
  • Strong compatibility with, and upgrade support for, the various ecosystem components;
  • Support for Kerberos security authentication;
  • Easy maintenance;
  • Fixes for many Hadoop bugs;
  • Web-based management of the cluster;
  • Monitoring of the cluster's running status, with alerting.

Disadvantages
  • High system overhead;
  • Customizing the source code is very difficult;
  • Some product features require payment;
  • Vendor lock-in by design.

4. Virtual machine configuration used in this tutorial

Adjust these values to what your own PC can support:

Host   192.168.1.101 (zy1)   192.168.1.102 (zy2)   192.168.1.103 (zy3)   192.168.1.104 (zy4)
Role   Server/Agent          Agent                 Agent                 Agent
CPU    Dual-core             Dual-core             Dual-core             Dual-core
RAM    8 GB                  4 GB                  4 GB                  4 GB
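
The rest of the tutorial refers to these machines by host name (zy1 to zy4), so each node needs matching name resolution. The tutorial does not show that step; the following is a minimal sketch that assumes the mapping is kept in /etc/hosts. The helper name print_hosts_entries is made up for illustration, and the function only prints the lines.

```shell
#!/bin/sh
# Print /etc/hosts lines for the four tutorial VMs
# (192.168.1.101-104 mapped to zy1-zy4).
# Illustrative helper: it only prints and does not modify any file.
print_hosts_entries() {
    i=1
    while [ "$i" -le 4 ]; do
        printf '192.168.1.10%s zy%s\n' "$i" "$i"
        i=$((i + 1))
    done
}

print_hosts_entries
```

To use it, review the output and then append it to /etc/hosts on every node as root, for example with `print_hosts_entries >> /etc/hosts`.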

II. Cluster time synchronization

The reason for synchronizing time across the cluster should be self-evident: CM requires that the clocks of the nodes not differ too much, so that the cluster can be managed uniformly and unnecessary errors and trouble are avoided. In addition, servers in an enterprise usually cannot reach the external network, so synchronizing time within the cluster is our first step.

1. Set the time zone to China Standard Time

CST here stands for China Standard Time. If a host is not on CST, change all hosts to CST:

cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

2. Install the ntp time synchronization service

One master time server is needed, and the other nodes synchronize their time with it so that time is managed in one place. ntp provides exactly this: the other nodes synchronize at regular intervals so that all clocks stay consistent.

Every server needs ntp, installed via yum:

yum install -y ntp

3. Configure ntp

1) On all nodes, modify /etc/ntp.conf

vi /etc/ntp.conf
restrict 192.168.1.10x nomodify notrap nopeer noquery          # IP address of the current node
restrict 192.168.1.1 mask 255.255.255.0 nomodify notrap        # gateway of the cluster's subnet and its netmask (Genmask)

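2) Choose one master node, zy1, and modify its /etc/ntp.conf

vi /etc/ntp.conf

In the server section, add the following lines and comment out the existing server 0 ~ n lines:

server 127.127.1.0
fudge 127.127.1.0 stratum 10

3) On the nodes other than the master, continue modifying /etc/ntp.conf

vi /etc/ntp.conf

In the server section, add the following lines so that server points at the master node:

server 192.168.1.101
fudge 192.168.1.101 stratum 10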
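
The server section differs only by role, so it can be generated rather than typed on each node. The following is a rough sketch, not part of the original tutorial; the function name ntp_server_section is invented here. In ntp.conf, server and fudge are separate directives, each on its own line. As far as I know, fudge is meant for reference-clock drivers such as the local clock 127.127.1.0, so this sketch emits only the server line for clients; that is a simplification compared with the tutorial.

```shell
#!/bin/sh
# Print the "server" section of /etc/ntp.conf for a node.
#   ntp_server_section master              -> local clock as the time source
#   ntp_server_section client <master-ip>  -> synchronize from the master node
ntp_server_section() {
    role=$1
    master_ip=$2
    case "$role" in
        master)
            # 127.127.1.0 is the local clock driver; stratum 10 marks it
            # as a low-priority source.
            printf 'server 127.127.1.0\n'
            printf 'fudge 127.127.1.0 stratum 10\n'
            ;;
        client)
            printf 'server %s\n' "$master_ip"
            ;;
        *)
            echo "usage: ntp_server_section master|client [master-ip]" >&2
            return 1
            ;;
    esac
}

ntp_server_section master
ntp_server_section client 192.168.1.101
```

The output is meant to be pasted into /etc/ntp.conf by hand after commenting out the default server lines.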

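After these changes, the master node (zy1) and the other nodes (zy2, zy3, zy4) each have their own /etc/ntp.conf (the original screenshots of the four files are not reproduced here).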

 

4. Start the ntp service

After running the following commands, the ntp service is started and enabled at boot:

systemctl start ntpd.service
systemctl enable ntpd.service
service ntpd status
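
A running service does not yet mean the clock is synchronized. One way to check is `ntpq -p`, which marks the peer that ntpd has selected as its sync source with a leading "*". The sketch below parses that output; the helper name selected_peer and the sample output are illustrative, not taken from the tutorial's machines.

```shell
#!/bin/sh
# Read `ntpq -p` output on stdin and print the peer that ntpd has selected
# as its sync source (the line whose first character is "*").
# Returns non-zero when no peer has been selected yet.
selected_peer() {
    awk 'substr($0, 1, 1) == "*" { print substr($1, 2); found = 1 }
         END { exit (found ? 0 : 1) }'
}

# Made-up sample of what a client node might print once it is in sync.
sample='     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.168.1.101   LOCAL(0)        11 u   35   64  377    0.231    0.012   0.020'

printf '%s\n' "$sample" | selected_peer
```

On a real node the call would be `ntpq -pn | selected_peer`. It can take several minutes after ntpd starts before a peer is selected.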

III. Configure SELinux

Edit the /etc/sysconfig/selinux file, change the SELINUX line to SELINUX=disabled, and then reboot.
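
The same edit can be scripted instead of made in an editor. This is a minimal sketch, written as a stdin-to-stdout filter so it can be tried on sample text first; the function name set_selinux_disabled is made up for this example.

```shell
#!/bin/sh
# Rewrite the SELINUX= line of an SELinux config read from stdin,
# leaving every other line (including SELINUXTYPE=) untouched.
set_selinux_disabled() {
    sed 's/^SELINUX=.*/SELINUX=disabled/'
}

# Demonstration on sample content rather than the real file.
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' | set_selinux_disabled
```

To apply it for real, run as root something like `set_selinux_disabled < /etc/sysconfig/selinux > /tmp/selinux.new && cat /tmp/selinux.new > /etc/sysconfig/selinux`. The change takes effect after the reboot; `getenforce` shows the current mode.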

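IV. Install the MySQL database

MySQL can be installed anywhere, as long as the database is reachable. Here it is installed on node zy1. For details, see the blog post "Installing and uninstalling MySQL with yum on Linux".

1. Running yum -y install directly cannot install a recent MySQL version. First install the rpm package that provides the MySQL 5 community repositories: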

rpm -Uvh http://dev.mysql.com/get/mysql-community-release-el7-5.noarch.rpm

2. Check which MySQL versions are available with the following command:

yum repolist enabled | grep "mysql.*-community.*"

3. Now MySQL can be installed. In general, installing mysql-server and mysql-client is enough:

yum -y install mysql-community-server

4. After MySQL is installed, start the service:

systemctl start mysqld

Optionally, enable the MySQL service at boot:

systemctl enable mysqld

Check the MySQL status:

systemctl status mysqld

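5. Set the root password with mysqladmin; it is used for logging in to MySQL afterwards:

mysqladmin -u root password 123456aa

6. Connect to MySQL with Navicat. Before connecting, the user must be granted remote login permission, otherwise the connection fails. After logging in to MySQL locally, grant remote access with:

grant all privileges on *.* to 'root'@'%' identified by '123456aa' with grant option;

Here root is the user name and 123456aa is the password. The statement allows this user and password to access MySQL on this server from any host. Then run the following statement in MySQL:

flush privileges;

and restart the service from the shell:

systemctl restart mysqld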

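7. Connect from a local terminal:

mysql -u root -p

Then enter the password.

8. Uninstall

List the installed MySQL packages:

rpm -qa | grep -i mysql

Remove them one by one with yum remove mysql-xxx until MySQL and all of its dependencies are uninstalled.

Find the remaining MySQL-related files and directories:

find / -name mysql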

V. Install Cloudera Manager

1. Download the CM installation files and configure them

Download the file from http://archive.cloudera.com/cm5/cm/5/ into the /opt/bigdata directory on the master node:

wget http://archive.cloudera.com/cm5/cm/5/cloudera-manager-centos7-cm5.16.2_x86_64.tar.gz 

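Extract it (the target directory must exist first):

mkdir -p /opt/bigdata/cloudera
tar -xzvf cloudera-manager-centos7-cm5.16.2_x86_64.tar.gz -C /opt/bigdata/cloudera

Two subdirectories, cloudera and cm-5.16.2, appear under the extraction path. cm-5.16.2 holds the configuration, dependency libraries, startup scripts, and other files of the CM framework itself.

Edit /opt/bigdata/cloudera/cm-5.16.2/etc/cloudera-scm-agent/config.ini so that it points to the host where the server runs: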

vim /opt/bigdata/cloudera/cm-5.16.2/etc/cloudera-scm-agent/config.ini
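
The edit amounts to changing one key. A hedged sketch of doing it without an editor follows; it assumes the stock file contains a line of the form server_host=localhost, which is my recollection of the default and should be checked against your copy. The function name set_server_host is invented for this example.

```shell
#!/bin/sh
# Point the agent at the CM server by rewriting the server_host= line
# of config.ini (read from stdin, written to stdout).
# $1 is the server's host name; it must not contain "/" or "&".
set_server_host() {
    sed "s/^server_host=.*/server_host=$1/"
}

# Demonstration on sample content assumed to resemble the stock file.
printf '[General]\nserver_host=localhost\nserver_port=7182\n' | set_server_host zy1
```

For the real file, write the filtered output to a temporary file, compare it with the original, and then copy it over config.ini.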

Download the MySQL driver package and put it in the /opt/bigdata/cloudera/cm-5.16.2/share/cmf/lib directory:

cd /opt/bigdata/cloudera/cm-5.16.2/share/cmf/lib
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.47/mysql-connector-java-5.1.47.jar

Copy the CM directory to the other nodes:

scp -r /opt/bigdata/cloudera/ zy2:/opt/bigdata/
scp -r /opt/bigdata/cloudera/ zy3:/opt/bigdata/
scp -r /opt/bigdata/cloudera/ zy4:/opt/bigdata/
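
With more nodes, the three scp lines are easier to manage as a loop. A small sketch follows, assuming passwordless SSH to zy2 through zy4 is already set up as mentioned in the introduction; the DRY_RUN switch and the function name copy_cm_to_nodes are additions of this example.

```shell
#!/bin/sh
# Copy the CM directory to every non-master node.
# With DRY_RUN=1 (the default here) the commands are only printed.
NODES="zy2 zy3 zy4"
DRY_RUN=${DRY_RUN:-1}

copy_cm_to_nodes() {
    for node in $NODES; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "scp -r /opt/bigdata/cloudera/ $node:/opt/bigdata/"
        else
            # Stop at the first node that fails so the error is not missed.
            scp -r /opt/bigdata/cloudera/ "$node:/opt/bigdata/" || return 1
        fi
    done
}

copy_cm_to_nodes
```

Run it once as is to review the commands, then with `DRY_RUN=0` to perform the copy.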

On all nodes, create the Cloudera Manager user cloudera-scm:

useradd --system --home=/opt/bigdata/cloudera/cm-5.16.2/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm 

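Explanation of the useradd options:
--system  creates a system user
--home=/opt/bigdata/cloudera/cm-5.16.2/run/cloudera-scm-server  sets the path of the user's home directory
--no-create-home   does not create the home directory
--shell=/bin/false  the account cannot be used to log in
--comment "Cloudera SCM User"   descriptive comment for the account
cloudera-scm  the user name

Verify: cat /etc/passwd | grep cloudera-scm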
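
The grep above only shows that the account exists. To also confirm that it got the non-login shell, the seventh field of the passwd entry can be checked. This is an illustrative sketch; the helper name user_has_shell and the UID/GID values in the sample line are made up.

```shell
#!/bin/sh
# Read passwd-format lines on stdin and succeed only if user $1 exists
# with login shell $2 (field 7 of the entry).
user_has_shell() {
    awk -F: -v u="$1" -v s="$2" \
        '$1 == u && $7 == s { ok = 1 } END { exit (ok ? 0 : 1) }'
}

# Sample entry; the numeric IDs are invented for the demonstration.
sample='cloudera-scm:x:997:995:Cloudera SCM User:/opt/bigdata/cloudera/cm-5.16.2/run/cloudera-scm-server:/bin/false'

printf '%s\n' "$sample" | user_has_shell cloudera-scm /bin/false \
    && echo "cloudera-scm is a non-login account"
```

On a node, feed it the real data with `getent passwd | user_has_shell cloudera-scm /bin/false`.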

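2. Download the CDH files (on the master node)

First, what is CDH?

  • CDH stands for Cloudera's Distribution Including Apache Hadoop. It is an enterprise-oriented deployment of Hadoop and one of the distributions of the Hadoop ecosystem (Hadoop, Flume, HBase, and so on);
  • It is maintained by Cloudera and built on stable releases of Apache Hadoop;
  • It provides the core of Hadoop: scalable storage and distributed computing;
  • It has a web-based user interface.

Hadoop has many distributions, such as:

  • Apache Hadoop;
  • Cloudera's Distribution Including Apache Hadoop (CDH);
  • Hortonworks Data Platform (HDP);
  • MapR;
  • EMR.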

On the master node, create the parcel-repo directory and give ownership to cloudera-scm:

mkdir /opt/bigdata/cloudera/parcel-repo
chown cloudera-scm:cloudera-scm /opt/bigdata/cloudera/parcel-repo/

On all agent nodes, create the parcels directory and give ownership to cloudera-scm:

mkdir /opt/bigdata/cloudera/parcels
chown cloudera-scm:cloudera-scm /opt/bigdata/cloudera/parcels/

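Download the CDH installation files from http://archive.cloudera.com/cdh5/parcels/5.14.0/ into CM's parcel-repo directory. CDH can then be installed from the parcel, which makes both installation and upgrades more convenient.

The el7 in the file name stands for CentOS 7; for another OS version, download the matching file.

  • CM only recognizes CDH versions that are not higher than its own version;
  • Rename the file ending in .sha1 so that it ends in .sha; only then does CM recognize it.

For the CDH parcel, drop the trailing 1 from the checksum file name: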

 mv CDH-5.14.0-1.cdh5.14.0.p0.24-el7.parcel.sha1 CDH-5.14.0-1.cdh5.14.0.p0.24-el7.parcel.sha
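
Because the parcel is large and may come from a mirror, it is worth checking it against the checksum file after renaming. The sketch below assumes that the .sha file holds the hex SHA-1 digest of the parcel as its first field; the function names rename_sha1_files and verify_parcel are invented for this example.

```shell
#!/bin/sh
# Rename every *.parcel.sha1 in a directory to *.parcel.sha.
rename_sha1_files() {
    dir=$1
    for f in "$dir"/*.parcel.sha1; do
        [ -e "$f" ] || continue     # no match: the glob stays literal
        mv "$f" "${f%1}"            # strip the trailing "1"
    done
}

# Compare a parcel's SHA-1 digest with the one stored in <parcel>.sha.
# Returns 0 on a match, non-zero on a mismatch or a missing .sha file.
verify_parcel() {
    parcel=$1
    expected=$(awk 'NR == 1 { print $1 }' "$parcel.sha") || return 1
    actual=$(sha1sum "$parcel" | awk '{ print $1 }')
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

A possible use with the tutorial's paths: `rename_sha1_files /opt/bigdata/cloudera/parcel-repo`, followed by `verify_parcel /opt/bigdata/cloudera/parcel-repo/CDH-5.14.0-1.cdh5.14.0.p0.24-el7.parcel && echo OK`.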

Downloading from the official site is slow; downloading from this network drive is recommended: https://pan.baidu.com/s/1JC-vpYH7SWBwju9C8DkVPw password: 26v8.

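3. Install cloudera-scm-server on the master node

In a normal installation of cloudera-scm-server, the scm database is created automatically by the script /opt/bigdata/cloudera/cm-5.16.2/share/cmf/schema/scm_prepare_database.sh. Go to the /opt/bigdata/cloudera/cm-5.16.2/share/cmf/schema directory and run: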

./scm_prepare_database.sh mysql cm -h zy1 -uroot -p'123456aa' --scm-host zy1 scm scm scm

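The syntax is:

./scm_prepare_database.sh mysql <mysql-database> -h<mysql-host> -u<mysql-user> -p<mysql-pwd> --scm-host <scm-host> <scm-user> <scm-dbname> <scm-pwd>

  • <mysql-database>, <mysql-host>, <mysql-user>, <mysql-pwd>: the scm database to create, and the host name, user name, and login password of the MySQL server;

  • <scm-host>, <scm-user>, <scm-dbname>, <scm-pwd>: the host on which cloudera-scm-server is deployed, and the login user, database name, and login password of the scm database.

If the command fails, refer to the blog post "Solution to MySQL connection problems when initializing the CM database during CDH installation".

If the output contains "successfully", initialization succeeded. Run mysql -uroot -p to enter MySQL, then run show databases to confirm that the database exists.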

VI. Start CM & install CDH

1. On the master node, go to the /opt/bigdata/cloudera/cm-5.16.2/etc/init.d/ directory, run ./cloudera-scm-server start to start the server, and then run ./cloudera-scm-agent start to start the agent:

./cloudera-scm-server start 
./cloudera-scm-agent start 

To check the status:

./cloudera-scm-server status

2. On all agent nodes, go to the /opt/bigdata/cloudera/cm-5.16.2/etc/init.d/ directory and run ./cloudera-scm-agent start to start the agent:

./cloudera-scm-agent start 

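3. Open a browser and visit port 7180 on the master node. On first start it takes a while before the page becomes reachable, because Cloudera Manager is still initializing its database tables (some browsers may fail to open the page; Chrome worked for me).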
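
Rather than reloading the browser by hand, the wait can be scripted with a small retry loop. The helper below is a generic sketch (the name wait_until is made up); it retries any command, so the actual probe, here a curl call against port 7180, is supplied by the caller.

```shell
#!/bin/sh
# Retry a command until it succeeds.
#   wait_until <max-tries> <delay-seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_until() {
    tries=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$tries" ]; do
        if "$@"; then
            return 0
        fi
        # Sleep between attempts, but not after the last one.
        if [ "$n" -lt "$tries" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}
```

For example, `wait_until 60 5 curl -sf -o /dev/null http://zy1:7180/ && echo "CM is up"` probes every 5 seconds for up to 60 attempts. The host name zy1 follows the tutorial; the retry counts are arbitrary choices.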

 

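Once the page loads, log in with user name admin and password admin.

Accept the license and click Continue.

Choose the 60-day trial edition and click Continue.

On the next page, click Continue.

On the "Currently Managed Hosts" tab, select all hosts and click Continue.

On the next page, click More Options.

Change the two parcel path settings there to match the actual parcel paths.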

Then restart the server and the agent:

./cloudera-scm-server restart 
./cloudera-scm-agent restart 

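Wait a few minutes and reload the page.

Wait while CM automatically installs and configures CDH.

When installation finishes, click Continue; CM then inspects the hosts for correctness.

The inspection may report problems (the screenshot listing them is not reproduced here).

Cluster setup: here I choose Custom and select the services I need; anything they depend on is created automatically. Then click Continue.

Note: do not select Kafka for now. It needs to be activated first and can be installed later.

The default service configuration is usually fine; adjust it if you have special requirements, then click Continue.

Next is the database setup: enter the database name, user name, and password for each service, click Test Connection, and click Continue once the test passes.

Because the corresponding database was not created earlier, connect to MySQL and create it: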

create database manager DEFAULT CHARACTER SET utf8; 
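
If several services need their own database, the statements can be generated in one go. This is only a sketch: it reuses the utf8 default character set from the statement above, and the names hive and oozie are hypothetical examples, not databases the tutorial says it created. The function name create_db_sql is likewise made up.

```shell
#!/bin/sh
# Print a CREATE DATABASE statement for each name given, using the same
# utf8 default character set as the tutorial's `manager` database.
create_db_sql() {
    for db in "$@"; do
        printf 'CREATE DATABASE IF NOT EXISTS %s DEFAULT CHARACTER SET utf8;\n' "$db"
    done
}

# "hive" and "oozie" are hypothetical examples of further service databases.
create_db_sql manager hive oozie
```

The output can be piped into the client, for example `create_db_sql manager | mysql -uroot -p`.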

 

Next is the cluster review page; the defaults are fine, so click Continue.

When the installation completes, click Finish.



Origin www.cnblogs.com/zyly/p/11822907.html