Overview of Cloudera Manager
Cloudera Manager is Cloudera's enterprise-level big data management platform, used to monitor and manage the entire Hadoop (CDH) cluster environment.
Cloudera Manager is divided into:
- Cloudera Manager Server: provides monitoring and management for the entire cluster. It manages the cluster through the Cloudera Manager Agents deployed on the individual hosts, and needs to be deployed on one host.
- Cloudera Manager Agent: deployed on every host that is to be monitored and managed. Responsible for collecting runtime data and executing the management commands issued by the server.
- Database: a relational database that stores the cluster state data produced by Cloudera Manager's management operations.
Installation requirements
- The operating system used in this example is Centos7 x64. You need to select the corresponding version of the installation package according to your operating system.
- Install as the root user. If you install as another, non-root user, make sure that user has sudo permissions and that it owns all related files.
- Ensure that all hosts can communicate with each other without restriction; that is, the firewall and SELinux policies must be configured accordingly (both are disabled in the steps below).
- All commands to be executed in this example are shown on their own lines with a shell prompt (e.g. [root@localhost ~]#).
- Also ensure that the JDK version is 1.8 or above.
- Ensure that each device has at least 8G of memory.
- Ensure that the /var and /usr directories on each device each have more than 30G of free space. Mount data storage disks as required.
- This example was tested on a minimal installation of CentOS 7 x64.
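The hardware requirements above can be checked quickly with a small script. This is a sketch of our own, not an official Cloudera tool: the `check_min` helper and variable names are ours, and the thresholds follow the list above.

```shell
#!/bin/bash
# Pre-flight check (sketch): >= 8G RAM, > 30G free on /var and /usr.
check_min() {  # check_min LABEL ACTUAL REQUIRED -> prints OK or FAIL
  local label=$1 actual=$2 required=$3
  if [ "$actual" -ge "$required" ]; then
    echo "OK   ${label}: ${actual} (>= ${required})"
  else
    echo "FAIL ${label}: ${actual} (< ${required})"
  fi
}

mem_mb=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
var_gb=$(df -Pm /var | awk 'NR==2 {print int($4/1024)}')
usr_gb=$(df -Pm /usr | awk 'NR==2 {print int($4/1024)}')

check_min "RAM (MB)"       "$mem_mb" 8192
check_min "/var free (GB)" "$var_gb" 30
check_min "/usr free (GB)" "$usr_gb" 30
```

Run it on every host before starting; any FAIL line points at a requirement that is not met.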
Installation example introduction
Server basic parameters
| IP              | hostname | CPU | RAM  | disk | Role   | Operating system |
|-----------------|----------|-----|------|------|--------|------------------|
| 192.168.174.111 | hdfs01   | 8C  | 128G | 3T   | server | CentOS 7 x64     |
| 192.168.174.112 | hdfs02   | 8C  | 128G | 3T   | agent  | CentOS 7 x64     |
| 192.168.174.113 | hdfs03   | 8C  | 128G | 3T   | agent  | CentOS 7 x64     |
| 192.168.174.114 | hdfs04   | 8C  | 128G | 3T   | agent  | CentOS 7 x64     |
| 192.168.174.115 | hdfs05   | 8C  | 128G | 3T   | agent  | CentOS 7 x64     |
| 192.168.174.116 | hdfs06   | 8C  | 128G | 3T   | agent  | CentOS 7 x64     |
| 192.168.174.117 | hdfs07   | 8C  | 128G | 3T   | agent  | CentOS 7 x64     |
| 192.168.174.118 | hdfs08   | 8C  | 128G | 3T   | agent  | CentOS 7 x64     |
For the installation of CDH it is best to use the root user, to avoid the various problems caused by directory and file permissions. If you do want to install as a non-root user, that user must have passwordless (NOPASSWD) sudo, and the owner of all related files and folders must be set to that user, except where noted otherwise.
The official latest stable version download address of CDH software package: http://archive.cloudera.com/cdh5/parcels/latest/
The following three files need to be downloaded from this address:
CDH-5.14.0-1.cdh5.14.0.p0.24-el7.parcel
CDH-5.14.0-1.cdh5.14.0.p0.24-el7.parcel.sha1
manifest.json
(Cloudera Manager expects the checksum file to be named .parcel.sha, so rename the downloaded .sha1 file accordingly before deploying it.)
Cloudera Manager official download address: http://archive.cloudera.com/cm5/cm/5/
Only one file needs to be downloaded from this address (choose the file matching your system):
cloudera-manager-centos7-cm5.14.1_x86_64.tar.gz
Official installation reference documentation: https://www.cloudera.com/documentation/enterprise/5-14-x/topics/installation_installation.html
Operating system related configuration
Turn off the firewall
Because we are building a cluster, its nodes must communicate with each other, and every such connection would otherwise require a matching firewall rule. To keep things simple we stop the firewall entirely. The following commands close the firewall (run as the root user):
# Check the firewall status
CentOS 6: [root@localhost ~]# service iptables status
CentOS 7: [root@localhost ~]# systemctl status firewalld.service
If the status shown is not "iptables: Firewall is not running." (or "inactive" on CentOS 7), the firewall needs to be stopped.
# Stop the firewall
CentOS 6: [root@localhost ~]# service iptables stop
CentOS 7: [root@localhost ~]# systemctl stop firewalld
# Disable the firewall permanently
CentOS 6: [root@localhost ~]# chkconfig iptables off
CentOS 7: [root@localhost ~]# systemctl disable firewalld.service
# Check the firewall status again
CentOS 6: [root@localhost ~]# service iptables status
CentOS 7: [root@localhost ~]# systemctl status firewalld.service
iptables: Firewall is not running.
Disable SELinux
Because SELinux mediates access control on CentOS, it can cause permission-related failures during installation. We therefore disable it for now and re-enable and configure it later as needed. The following commands disable SELinux (run as the root user):
# Check the SELinux status
[root@localhost ~]# /usr/sbin/sestatus -v
SELinux status: enabled
If the "SELinux status" parameter is "enabled", SELinux is on and needs to be disabled as follows.
# Disable SELinux
[root@localhost ~]# vim /etc/selinux/config
Find the SELINUX entry in the file and set its value to disabled, i.e.:
SELINUX=disabled
# Disable SELinux in the running kernel
[root@localhost ~]# setenforce 0
# Check the in-memory status
[root@localhost ~]# getenforce
If the result is "Disabled" or "Permissive", the operation succeeded.
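Instead of editing /etc/selinux/config by hand, the same change can be scripted. This is a sketch: the `CONF` variable is our own convention so the path can be overridden, and a backup of the original file is kept.

```shell
# Disable SELinux in the config file non-interactively (sketch; run as root).
CONF=${CONF:-/etc/selinux/config}      # CONF is our override hook, not standard
cp "$CONF" "${CONF}.bak" 2>/dev/null   # keep a backup of the original
sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$CONF" 2>/dev/null
grep '^SELINUX=' "$CONF" 2>/dev/null   # expect: SELINUX=disabled
setenforce 0 2>/dev/null || true       # also turn it off in the running kernel
```

The sed expression rewrites whatever value SELINUX currently has, so it works whether the file says enforcing or permissive.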
Configure yum source
This step sets the operating system installation image (ISO) up as a YUM source so that additional packages can be installed. Execute on all hosts (first upload the CentOS ISO file to the /opt folder).
1. Mount the operating system ISO file to the specified directory
[root@localhost ~]# mkdir /mnt/iso
[root@localhost ~]# mount -o loop /opt/CentOS-7-x86_64-DVD-1511.iso /mnt/iso
where CentOS-7-x86_64-DVD-1511.iso is the ISO file of CentOS 7.2.
2. Set up the yum source repo file
[root@localhost ~]# cd /etc/yum.repos.d
[root@localhost ~]# mkdir /opt/repo_bak;mv *.repo /opt/repo_bak
[root@localhost ~]# vi base.repo
Add the following code to the newly created base.repo file:
[base]
name=CentOS 7
baseurl=file:///mnt/iso
gpgcheck=0
3. Refresh yum
[root@localhost ~]# yum clean all
[root@localhost ~]# yum makecache
Install related dependencies
[root@localhost ~]# yum -y install chkconfig bind-utils psmisc libxslt zlib sqlite cyrus-sasl-plain cyrus-sasl-gssapi fuse portmap fuse-libs redhat-lsb httpd httpd-tools unzip ntp
start httpd service
[root@localhost ~]# systemctl start httpd.service
[root@localhost ~]# systemctl enable httpd.service # enable at boot
Configuring NTP Clock Synchronization
Set up a unified clock-synchronization service on all devices where the CDH environment will be installed. If a dedicated clock server is available, simply configure the NTP client on every device. If not, use the server host as the clock source: configure the NTP server on it, and have all the other servers synchronize their clocks to it.
In this example the server host is configured as the NTP server and the other hosts as NTP clients. With a dedicated clock server it would be even easier: everything would be configured as an NTP client.
NTP server configuration (performed on the server host; if a dedicated clock server exists, the server host is instead configured as a client)
Modify /etc/ntp.conf
Make the following modifications to the contents of the file:
1. Comment out all configurations starting with restrict
2. Find restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap, uncomment it, and change the IP and mask to those of your real environment. This line allows NTP clients to connect.
3. Find server 0.centos.pool.ntp.org iburst and comment all server configurations
4. Add the following two lines
server 127.127.1.0
fudge 127.127.1.0 stratum 10
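Putting the four edits together, the relevant part of /etc/ntp.conf on the server host would look roughly like this (the subnet below follows the server plan above; adjust it to your environment):

```
# allow NTP clients from the cluster subnet (step 2)
restrict 192.168.174.0 mask 255.255.255.0 nomodify notrap
# use the local clock as the time source (steps 3 and 4)
server 127.127.1.0
fudge 127.127.1.0 stratum 10
```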
Start the NTP service
Execute the following command to start the ntp service
[root@localhost ~]# systemctl restart ntpd
View service status
After the service starts, use ntpq -p to check its status. Once the reach column has climbed to a reasonably large value (usually 17, i.e. several consecutive successful polls), move on to configuring the NTP clients.
NTP client configuration (configured on the agent host)
Modify /etc/ntp.conf
Make the following modifications to the file:
1. Comment out all restrict and server configurations
2. Add the following line, changing the IP to that of the NTP server (in this example, the IP of the server host):
server 192.168.187.5
Synchronize time manually
To avoid a slow first synchronization, and to test that our configuration is correct, first synchronize once manually with the following command:
[root@localhost ~]# ntpdate 192.168.187.5
Start the NTP service
[root@localhost ~]# systemctl restart ntpd
Set the ntp service of all hosts to start automatically at boot
centos6:[root@localhost ~]# chkconfig ntpd on
centos7:[root@localhost ~]# systemctl enable ntpd.service
Modify hostname
We modify the hostnames partly so they are easier to remember and manage, but that is not the main reason: more importantly, Hadoop's internals route to a host's IP via its hostname, so we must ensure that no two machines share the same hostname.
This example only shows the operation on the first server; the others are done the same way, but note again that every host's name must be different. (It is recommended to name the servers in one consistent sequence, e.g. hdfs1, hdfs2, hdfs3, ....)
The following gives the operation command to modify the host name (operate under ROOT):
CentOS 6:
[root@localhost ~]# vi /etc/sysconfig/network
Set the HOSTNAME= entry to the new hostname.
CentOS 7:
[root@localhost ~]# hostnamectl set-hostname hdfs1
[root@localhost ~]# hostname hdfs1
After executing the commands above, log out and log back in for the change to take effect.
Hostnames must not contain underscores (_).
Hostnames must not contain uppercase characters.
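These two rules can be checked before renaming a host. `valid_hostname` below is a hypothetical helper of ours, not a system command:

```shell
# Reject candidate hostnames that contain an underscore or an uppercase letter.
valid_hostname() {
  case "$1" in
    *_*|*[[:upper:]]*) return 1 ;;  # underscore or uppercase -> invalid
    *)                 return 0 ;;
  esac
}

valid_hostname hdfs1  && echo "hdfs1: ok"
valid_hostname hdfs_1 || echo "hdfs_1: rejected"
valid_hostname Hdfs1  || echo "Hdfs1: rejected"
```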
Set Host Routing (HOSTS)
There are two main reasons for modifying HOSTS:
1. Hadoop's internal mechanisms access each host by its hostname, so every hostname must resolve correctly.
2. It makes the configuration steps easier to write and clear at a glance.
Note that we do not only configure the mapping between the local IP and the local hostname: on every machine in the plan, HOSTS must contain the IP-to-hostname mappings of all machines.
How to modify HOSTS:
The following is the operation command to modify HOSTS (operate under the ROOT user):
Modify the /etc/hosts file and add the IP addresses and hostnames of all the hosts in the plan. Do this on every machine.
[root@hdfs1 ~]# vi /etc/hosts
Add entries in the following format to the file: the IPs and hostnames of all the hosts in our plan. The same content must be added to the HOSTS file of every machine. Separate the IP from the hostname with a tab.
192.168.186.101 hdfs1
192.168.186.102 hdfs2
192.168.186.103 hdfs3
……
If multiple names should resolve to the same IP, simply append them to the same line, again separated by tabs. For example:
192.168.186.101 hdfs1 master spark hadoop
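Editing every copy of /etc/hosts by hand invites typos; one approach is to generate the entries once and push the finished file out. This is a sketch: `hosts_entries` is our own helper, the names and IPs follow the example above, and the scp commands are only printed for review.

```shell
# Build the shared hosts entries once (tab-separated, as required above).
hosts_entries() {
  printf '%s\t%s\n' \
    192.168.186.101 hdfs1 \
    192.168.186.102 hdfs2 \
    192.168.186.103 hdfs3
}

hosts_entries                                  # review, then append to /etc/hosts
for h in hdfs2 hdfs3; do                       # push the finished file to the rest
  echo "scp /etc/hosts root@${h}:/etc/hosts"   # run these once the file is final
done
```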
Installation of relational database
MySQL installation
Installation packages used
MySQL-client-5.6.26-1.el6.x86_64.rpm
MySQL-devel-5.6.26-1.el6.x86_64.rpm
MySQL-server-5.6.26-1.el6.x86_64.rpm
Uninstall the MySQL and MariaDB packages that ship with CentOS
Run the following two commands to see which MySQL or MariaDB packages are preinstalled on the system:
[root@hdfs1 ~]# rpm -qa | grep MySQL
[root@hdfs1 ~]# rpm -qa | grep mariadb
Remove every installed package found by the commands above:
[root@hdfs1 ~]# rpm -e --nodeps <all packages listed by the commands above, separated by spaces>
Use the following command to query all MySQL related files
[root@hdfs1 ~]# find / -name mysql
Delete all files found by the above command
Install MySQL
Change to the directory containing the MySQL packages and run the following command (here all MySQL packages are in the root user's home directory):
[root@hdfs1 ~]# rpm -ivh MySQL*
start MySQL
centos6:[root@hdfs1 ~]# service mysql start
centos7:[root@hdfs1 ~]# systemctl start mysql.service
Initialize the password
During the installation of MySQL, the following message is printed:
A RANDOM PASSWORD HAS BEEN SET FOR THE MySQL root USER! You will find that password in '/root/.mysql_secret'.
We need to find the randomly generated root password under this file, use the following command:
[root@hdfs1 ~]# cat /root/.mysql_secret
Log in to MySQL's command console
[root@hdfs1 ~]# mysql -uroot -p<password> # the random password found in the previous step
Change the password of the MySQL root user
Execute the following SQL in the MySQL command console to reset the password of the root user to 123456
SET PASSWORD FOR 'root'@'localhost' = PASSWORD('123456');
Modify the permissions of the MySQL ROOT user
grant all on *.* to 'root'@'%' identified by '123456';
Set MySQL to start at startup
centos6:[root@hdfs1 ~]# chkconfig mysql on
centos7:[root@hdfs1 ~]# systemctl enable mysql.service
Edit the /etc/my.cnf file (back it up first) and set the following parameters:
[mysqld]
transaction-isolation = READ-COMMITTED
# Disabling symbolic-links is recommended to prevent assorted security risks;
# to do so, uncomment this line:
# symbolic-links = 0

key_buffer = 16M
key_buffer_size = 32M
max_allowed_packet = 32M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1
max_connections = 550
#expire_logs_days = 10
#max_binlog_size = 100M

# log_bin should be on a disk with enough free space. Replace
# '/var/lib/mysql/mysql_binary_log' with an appropriate path for your system
# and chown the specified folder to the mysql user.
log_bin = /var/lib/mysql/mysql_binary_log
binlog_format = mixed

read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M

# InnoDB settings
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 4G
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M

[mysqld_safe]
log-error = /var/log/mariadb/mariadb.log
pid-file = /var/run/mariadb/mariadb.pid

Be sure to leave symbolic-links commented out.
Initialize the database
[root@hdfs1 ~]# /usr/bin/mysql_secure_installation
Follow the prompts to initialize. This step initializes the root user's password; remember the password you set here.
Create the required databases and users. Run the following SQL at the MySQL command line:
create database amon DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
create database rman DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
create database metastore DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
create database sentry DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
create database nav DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
create database navms DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
create database hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database hue DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database monitor DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database oozie DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
grant all on *.* to 'root'@'%' identified by '123456' with grant option;
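The statements above can also be generated from the list of database names and piped into mysql in one shot. A sketch: `gen_create_sql` is our own helper, and it assumes the root password 123456 set earlier (it uses the DEFAULT CHARACTER SET form throughout, which is equivalent to DEFAULT CHARSET).

```shell
# Generate the CREATE DATABASE statements plus the grant from the step above.
gen_create_sql() {
  for db in amon rman metastore sentry nav navms hive hue monitor oozie; do
    echo "create database ${db} DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;"
  done
  echo "grant all on *.* to 'root'@'%' identified by '123456' with grant option;"
}

gen_create_sql                             # review the statements first
# gen_create_sql | mysql -uroot -p123456   # then apply them on the MySQL host
```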
JDK installation
CDH requires a JDK 1.8 runtime environment, so JDK 1.8 must be installed before CDH. This example installs the JDK as the root user.
1. Download and upload the JDK 1.8 package
Upload the archive to any directory; this example uses the root user's home directory (~).
2. Extract it to the installation directory
In this example the JDK is installed under /usr/local.
[root@hdfs1 ~]# tar -zxvf jdk-8u131-linux-x64.tar.gz -C /usr/local/
3. Configure the environment variables
Add the extracted JDK directory to the environment variables:
[root@hdfs1 ~]# vi /etc/profile
Append the following lines at the end of the file:
export JAVA_HOME=/usr/local/jdk1.8.0_131
export PATH=$JAVA_HOME/bin:$PATH
4. Reload the environment variables
[root@hdfs1 ~]# source /etc/profile
5. Verify the installation
Run the following command from any directory:
[root@hdfs1 ~]# java -version
If the Java version information appears, the installation succeeded; if not, check that the path configured in the environment variables is correct.
Installing Cloudera Manager Server
Since this is the server installation, perform the following steps only on the server host.
Upload the packages
The server installation requires only the following media:
Cloudera Manager package: cloudera-manager-centos7-cm5.12.1_x86_64.tar.gz
MySQL driver: mysql-connector-java-5.1.44-bin.jar
Offline big data parcel repository: CDH-5.12.1-1.cdh5.12.1.p0.3-el7.parcel
CDH-5.12.1-1.cdh5.12.1.p0.3-el7.parcel.sha
manifest.json
All of these media were listed in the [Installation example introduction] section; in this example they are uploaded to the root user's home directory.
Create the installation directory and extract the media
[root@hdfs1 ~]# mkdir /opt/cloudera-manager
[root@hdfs1 ~]# tar xzf cloudera-manager*.tar.gz -C /opt/cloudera-manager
Install the database driver and initialize the database
Install the database driver
[root@hdfs1 ~]# mkdir -p /usr/share/java
[root@hdfs1 ~]# cp mysql-connector-java-5.1.44-bin.jar /usr/share/java/mysql-connector-java.jar
Initialize the database
Create the system user cloudera-scm
[root@hdfs1 ~]# useradd --system --home=/opt/cloudera-manager/cm-5.12.1/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
Create the server storage directory
[root@hdfs1 ~]# mkdir /var/lib/cloudera-scm-server
[root@hdfs1 ~]# chown cloudera-scm:cloudera-scm /var/lib/cloudera-scm-server
Create the Hadoop offline parcel storage directory
[root@hdfs1 ~]# mkdir -p /opt/cloudera/parcels
[root@hdfs1 ~]# chown cloudera-scm:cloudera-scm /opt/cloudera/parcels
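To initialize Cloudera Manager's own database, the tarball ships a schema-preparation script. The sketch below only prints the command for review; the path, credentials, and the scm database/user names are assumptions based on this example's layout, so adjust them before running.

```shell
# Sketch: prepare Cloudera Manager's scm database against the local MySQL.
CM_HOME=/opt/cloudera-manager/cm-5.12.1
cmd="$CM_HOME/share/cmf/schema/scm_prepare_database.sh mysql -h localhost -u root -p123456 --scm-host hdfs1 scm scm scm"
echo "$cmd"    # review, then run it on the server host
# eval "$cmd"
```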
Point the agent at the server
There are two ways to modify the file; both are just shell techniques, and a seasoned shell user will know even more.
Method 1:
[root@hdfs1 ~]# vi /opt/cloudera-manager/cm-5.12.1/etc/cloudera-scm-agent/config.ini
Set server_host to the hostname of the Cloudera Manager Server; in this example, that is the server host.
Method 2:
[root@hdfs1 ~]# sed -i "s/server_host=localhost/server_host=hdfs1/" /opt/cloudera-manager/cm-5.12.1/etc/cloudera-scm-agent/config.ini
Deploy the CDH offline parcels
[root@hdfs1 ~]# mkdir -p /opt/cloudera/parcel-repo
[root@hdfs1 ~]# chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo
[root@hdfs1 ~]# mv CDH-5.12.1-1.cdh5.12.1.p0.3-el7.parcel CDH-5.12.1-1.cdh5.12.1.p0.3-el7.parcel.sha manifest.json /opt/cloudera/parcel-repo/
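Before starting the server it is worth verifying that the parcel matches its checksum file: if they disagree, Cloudera Manager will try to re-download the parcel from the network instead of using the local copy. `verify_parcel` is a hypothetical helper of ours:

```shell
# Compare the parcel's SHA1 against the value stored in the .sha file.
verify_parcel() {  # verify_parcel PARCEL SHAFILE -> OK or FAIL
  local want got
  want=$(awk '{print $1}' "$2")
  got=$(sha1sum "$1" | awk '{print $1}')
  if [ "$want" = "$got" ]; then
    echo "OK: checksum matches"
  else
    echo "FAIL: $got != $want"
  fi
}

# cd /opt/cloudera/parcel-repo
# verify_parcel CDH-5.12.1-1.cdh5.12.1.p0.3-el7.parcel CDH-5.12.1-1.cdh5.12.1.p0.3-el7.parcel.sha
```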
Start Cloudera Manager Server
[root@hdfs1 ~]# /opt/cloudera-manager/cm-5.12.1/etc/init.d/cloudera-scm-server start
Start Cloudera Manager Agent
[root@hdfs1 ~]# /opt/cloudera-manager/cm-5.12.1/etc/init.d/cloudera-scm-agent start
Installing Cloudera Manager Agent
Perform the following steps on every server except the server host to deploy the agent.
Upload the packages
The agent installation requires only the following two media:
Cloudera Manager package: cloudera-manager-centos7-cm5.12.1_x86_64.tar.gz
MySQL driver: mysql-connector-java-5.1.44-bin.jar
Install the database driver
[root@hdfs1 ~]# mkdir -p /usr/share/java
[root@hdfs1 ~]# cp mysql-connector-java-5.1.44-bin.jar /usr/share/java/mysql-connector-java.jar
Create the installation directory and extract the media
[root@hdfs1 ~]# mkdir /opt/cloudera-manager
[root@hdfs1 ~]# tar xzf cloudera-manager*.tar.gz -C /opt/cloudera-manager
Create the system user cloudera-scm
[root@hdfs1 ~]# useradd --system --home=/opt/cloudera-manager/cm-5.12.1/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
Create the Hadoop offline parcel storage directory
[root@hdfs1 ~]# mkdir -p /opt/cloudera/parcels
[root@hdfs1 ~]# chown cloudera-scm:cloudera-scm /opt/cloudera/parcels
Point the agent at the server
There are two ways to modify the file; both are just shell techniques, and a seasoned shell user will know even more.
Method 1:
[root@hdfs1 ~]# vi /opt/cloudera-manager/cm-5.12.1/etc/cloudera-scm-agent/config.ini
Set server_host to the hostname of the Cloudera Manager Server; in this example, that is the server host.
Method 2:
[root@hdfs1 ~]# sed -i "s/server_host=localhost/server_host=hdfs1/" /opt/cloudera-manager/cm-5.12.1/etc/cloudera-scm-agent/config.ini
Start Cloudera Manager Agent
[root@hdfs1 ~]# /opt/cloudera-manager/cm-5.12.1/etc/init.d/cloudera-scm-agent start
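The agent steps above are identical on every non-server host, so the rollout can be scripted from the server host. A sketch: the hostname list is ours and assumes the short names used in the sed example (hdfs1 being the server); the scp commands are only printed for review.

```shell
# Generate the commands for pushing the two agent packages to each agent host.
agents="hdfs2 hdfs3 hdfs4 hdfs5 hdfs6 hdfs7 hdfs8"
for h in $agents; do
  echo "scp cloudera-manager-centos7-cm5.12.1_x86_64.tar.gz root@${h}:~/"
  echo "scp mysql-connector-java-5.1.44-bin.jar root@${h}:~/"
done
```

After copying, the extraction, useradd, and config.ini steps still have to be run on each host (for example over ssh).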
Installing the Cloudera Management Service
Once the CDH server and agents are deployed, everything else is done through the web interface. The first thing to install there is the CDH monitoring cluster: the Cloudera Management Service monitors the running state of every host and cluster in the entire CDH deployment, so installing it is well worth it.
One caveat: it runs many processes and consumes a lot of memory, so in a production environment never install it on the same machines as the cluster. My deployment rule is: the server host carries everything related to Cloudera Manager (MySQL, Cloudera Manager Server, and all Cloudera Management Service roles), while all roles of the Hadoop cluster itself are assigned to the agents.