1. File download
Related files used for the CDH installation: https://pan.baidu.com/s/1xDQD1Sa8s47Qiu_EFYdhUA?pwd=5mrt (extraction code: 5mrt)
2. Machine basic adjustment
All of the steps below must be performed on every machine.
2.1. Preparing the machine
At least three machines, each with 2 cores and 16 GB of RAM.
2.1.1. ECS server
If you are using purchased ECS servers, you also need to:
Set manage_etc_hosts to False in the /etc/cloud/templates/hosts.redhat.tmpl file
vim /etc/cloud/templates/hosts.redhat.tmpl
Delete the manage_etc_hosts line from the /etc/cloud/cloud.cfg file
vim /etc/cloud/cloud.cfg
2.2. hosts file
101.200.233.33 hadoop01
112.126.56.59 hadoop02
39.96.39.79 hadoop03
Public IP addresses are configured here. For machines on a company intranet, configure the internal addresses instead.
Every node must have this configuration, and whenever a new node is added, the host list on every node must be updated.
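The hosts entries above can be appended with a small idempotent script, so re-running it on a node (for example after adding a new host) does not create duplicates. This is a sketch: the helper name and the HOSTS_FILE variable are assumptions for illustration, and HOSTS_FILE defaults to a demo file here; point it at /etc/hosts on real nodes.

```shell
# Sketch (illustrative helper, not from the original guide): append a cluster
# entry to the hosts file only if the hostname is not already present.
# HOSTS_FILE defaults to a demo file; set HOSTS_FILE=/etc/hosts on real nodes.
HOSTS_FILE="${HOSTS_FILE:-./hosts.demo}"
touch "$HOSTS_FILE"

add_host_entry() {
  local ip="$1" name="$2"
  # grep -w matches the whole hostname, so hadoop01 does not match hadoop011
  grep -qw "$name" "$HOSTS_FILE" || echo "$ip $name" >> "$HOSTS_FILE"
}

add_host_entry 101.200.233.33 hadoop01
add_host_entry 112.126.56.59  hadoop02
add_host_entry 39.96.39.79    hadoop03
```

Running the script a second time leaves the file unchanged, which makes it safe to include in any node-setup script.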
2.3. Firewall
systemctl stop firewalld
systemctl disable firewalld
iptables -F
systemctl status firewalld
2.4. selinux
vim /etc/selinux/config
Change SELINUX=enforcing to SELINUX=disabled. A reboot is required for the change to take effect.
To take effect immediately without restarting, run:
setenforce 0
Check the status:
getenforce
2.5. Time zone
View current time zone settings
timedatectl
Set the time zone to Asia/Shanghai
timedatectl set-timezone Asia/Shanghai
2.6. Clock synchronization
Install the ntp service on all nodes
yum install -y ntp
The configuration file that ships with recent versions of ntp already lists many usable network time servers. Even if the network is lost, the hardware clock can still provide time to the other nodes.
Start ntpd and check the status
systemctl enable ntpd
systemctl restart ntpd
systemctl status ntpd
verify whether the time is synchronized
ntpq -p
If you are using a cloud host, the following errors may occur:
localhost: timed out, nothing received
Request timed out
In that case you can disable IPv6 and then verify time synchronization again. The steps to disable IPv6 are as follows:
Permanently disable IPv6
Add the following parameters to /etc/sysctl.conf
vim /etc/sysctl.conf
# Disable IPv6 on all interfaces in the entire system. You can simply set this parameter to turn off IPv6 on all interfaces.
net.ipv6.conf.all.disable_ipv6 = 1
# Disable IPv6 on specific interfaces (for example: eth0, eth1)
net.ipv6.conf.eth0.disable_ipv6 = 1
net.ipv6.conf.eth1.disable_ipv6 = 1
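The keys above can also be appended with a small idempotent helper so repeated runs do not duplicate lines. This is a sketch: the function name and SYSCTL_FILE variable are assumptions, and SYSCTL_FILE defaults to a demo copy here; on real nodes target /etc/sysctl.conf and then apply the change with sysctl -p.

```shell
# Sketch: append each IPv6-disable key only once (idempotent).
# SYSCTL_FILE defaults to a demo file; use /etc/sysctl.conf on real nodes,
# then apply the settings with: sysctl -p
SYSCTL_FILE="${SYSCTL_FILE:-./sysctl.conf.demo}"
touch "$SYSCTL_FILE"

disable_ipv6_key() {
  # Only append the key if no line already starts with it.
  grep -q "^$1" "$SYSCTL_FILE" || echo "$1 = 1" >> "$SYSCTL_FILE"
}

disable_ipv6_key net.ipv6.conf.all.disable_ipv6
disable_ipv6_key net.ipv6.conf.eth0.disable_ipv6
disable_ipv6_key net.ipv6.conf.eth1.disable_ipv6
```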
2.7. Set up swap space
Temporary modification
sysctl vm.swappiness=0
# Check whether the modification is successful
cat /proc/sys/vm/swappiness
Permanent modification
echo 'vm.swappiness=0' >> /etc/sysctl.conf
# Execute the following command to make the modification take effect immediately
sysctl -p
Cloudera recommends setting vm.swappiness to 0; excessive swapping causes GC times to surge.
2.8. Disable transparent huge pages
Temporarily effective
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
Effective permanently
echo 'echo never > /sys/kernel/mm/transparent_hugepage/defrag' >> /etc/rc.local
echo 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' >> /etc/rc.local
Make the file executable
chmod +x /etc/rc.local
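The two rc.local lines above can be added idempotently so that re-running the setup does not duplicate them. This is a sketch: the helper name and RC_LOCAL variable are assumptions, and RC_LOCAL defaults to a demo file here; the real file is /etc/rc.local.

```shell
# Sketch: add the THP-disable commands to rc.local only if missing, so
# repeated runs do not duplicate them. RC_LOCAL defaults to a demo file;
# the real target is /etc/rc.local.
RC_LOCAL="${RC_LOCAL:-./rc.local.demo}"
touch "$RC_LOCAL"

persist_thp_disable() {
  local knob line
  for knob in defrag enabled; do
    line="echo never > /sys/kernel/mm/transparent_hugepage/${knob}"
    # -F compares the line literally, so the ">" is not treated as a pattern
    grep -qF "$line" "$RC_LOCAL" || echo "$line" >> "$RC_LOCAL"
  done
  chmod +x "$RC_LOCAL"
}

persist_thp_disable
```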
2.9. Resource limits
Adjust the maximum number of open files, the maximum number of processes, and the memory lock limit
vim /etc/security/limits.conf
* soft nofile 65535
* hard nofile 1024999
* soft nproc 65535
* hard nproc 65535
* soft memlock unlimited
* hard memlock unlimited
Check whether it takes effect
ulimit -a
Pay particular attention to values such as open files.
2.10. Machine restart
Restart the machine for all the above adjustments to take effect.
3. Basic component deployment
3.1. JDK
Reference document: https://blog.csdn.net/u012443641/article/details/126147592
3.2. CDH environment variables
vim /etc/profile
# hadoop
export HADOOP_CLASSPATH=`hadoop classpath`
# hadoop installation directory
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HIVE_HOME=$HADOOP_HOME/../hive
export HADOOP_HDFS_HOME=$HADOOP_HOME/../hadoop-hdfs
export HADOOP_MAPRED_HOME=$HADOOP_HOME/../hadoop-mapreduce
export HADOOP_YARN_HOME=$HADOOP_HOME/../hadoop-yarn
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
3.3. mysql
Reference document: https://blog.csdn.net/u012443641/article/details/126147592
Install the MySQL service on any one node.
Since this CDH version is relatively old and does not support MySQL 8, install the latest MySQL 5.7.x instead.
4. CDH deployment
4.1. Mysql database creation
Create the metadata database and users of CDH, and the database and users of the amon service.
Log in to mysql as the root user and execute the following command
-- Create the required database
create database cmf DEFAULT CHARACTER SET utf8;
create database amon DEFAULT CHARACTER SET utf8;
-- Create a user and allow remote login
create user 'cmf'@'%' IDENTIFIED WITH mysql_native_password by 'cmf';
create user 'amon'@'%' IDENTIFIED WITH mysql_native_password by 'amon';
-- Grant privileges
grant ALL PRIVILEGES ON cmf.* to 'cmf'@'%';
grant ALL PRIVILEGES ON amon.* to 'amon'@'%';
flush privileges;
4.2. mysql driver package
On the hadoop01 node (the node where MySQL is installed), place the MySQL Java driver package.
mkdir -p /usr/share/java/
Copy the MySQL Java driver package into the directory above, renaming the file to remove the version number
cp mysql-connector-java-5.1.48.jar /usr/share/java/mysql-connector-java.jar
4.3. Offline deployment of cm service
4.3.1. Upload rpm package
Create a directory to save rpm packages
mkdir /opt/cloudera-manager
Upload to the server node:
cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm
cloudera-manager-server-6.3.1-1466458.el7.x86_64.rpm
Upload to the agent nodes:
cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm
cloudera-manager-agent-6.3.1-1466458.el7.x86_64.rpm
4.3.2. Install the cm server
Choose hadoop01 as the cm server, then run the following commands:
yum -y localinstall cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm
yum -y localinstall cloudera-manager-server-6.3.1-1466458.el7.x86_64.rpm
4.3.3. Install the cm agent
The agent must be installed on every node.
Note: the daemons package was already installed on the server node above, so it does not need to be installed again there.
yum -y localinstall cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm
yum -y localinstall cloudera-manager-agent-6.3.1-1466458.el7.x86_64.rpm
4.3.4. Modify the agent configuration
On all nodes, modify the agent configuration so that it points to the server node hadoop01.
Be sure to replace hadoop01 below with the hostname of your own server node.
sed -i "s/server_host=localhost/server_host=hadoop01/g" /etc/cloudera-scm-agent/config.ini
If you missed this the first time you ran it, you can edit the configuration file directly.
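The sed edit above can be sketched against a demo copy of the config first; the CM_SERVER and CONFIG variables are assumptions for illustration, with the real file being /etc/cloudera-scm-agent/config.ini.

```shell
# Sketch: rewrite server_host in the agent config. CONFIG defaults to a demo
# file; the real file is /etc/cloudera-scm-agent/config.ini, and CM_SERVER
# should be your own server node's hostname.
CM_SERVER="${CM_SERVER:-hadoop01}"
CONFIG="${CONFIG:-./config.ini.demo}"
# Seed the demo file with the stock value if it does not exist yet.
[ -f "$CONFIG" ] || echo "server_host=localhost" > "$CONFIG"

# Anchor on the key name so the edit works even after a previous change.
sed -i "s/^server_host=.*/server_host=${CM_SERVER}/" "$CONFIG"
grep '^server_host=' "$CONFIG"
```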
4.3.5. The master node modifies the server configuration
vim /etc/cloudera-scm-server/db.properties
com.cloudera.cmf.db.type=mysql
com.cloudera.cmf.db.host=hadoop01
com.cloudera.cmf.db.name=cmf
com.cloudera.cmf.db.user=cmf
com.cloudera.cmf.db.password=cmf
com.cloudera.cmf.db.setupType=EXTERNAL
4.4. Deploy offline parcel source
Deploy the offline parcel source on the hadoop01 node.
4.4.1. Install httpd
yum install -y httpd
4.4.2. Deploy offline parcel source
mkdir -p /var/www/html/cdh6_parcel
cp CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel /var/www/html/cdh6_parcel
cp CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha1 /var/www/html/cdh6_parcel/CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha
cp manifest.json /var/www/html/cdh6_parcel
When copying the sha1 file, you need to rename it to remove the trailing 1. Otherwise, during deployment, CM will treat the parcel download as incomplete and download it again.
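The copy-and-rename steps above can be wrapped in a small helper; this is a sketch with an assumed function name, and REPO_DIR defaults to a demo directory here while the real target is /var/www/html/cdh6_parcel.

```shell
# Sketch: publish parcel files into the httpd root, renaming *.parcel.sha1
# to *.parcel.sha so CM accepts the checksum file. REPO_DIR defaults to a
# demo directory; the real target is /var/www/html/cdh6_parcel.
REPO_DIR="${REPO_DIR:-./cdh6_parcel_demo}"

publish_parcels() {
  mkdir -p "$REPO_DIR"
  local f
  for f in *.parcel *.parcel.sha1 manifest.json; do
    [ -e "$f" ] || continue              # pattern matched nothing; skip it
    case "$f" in
      *.parcel.sha1) cp "$f" "$REPO_DIR/${f%.sha1}.sha" ;;  # drop trailing "1"
      *)             cp "$f" "$REPO_DIR/$f" ;;
    esac
  done
}

publish_parcels
```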
4.4.3. Start httpd
systemctl start httpd
Then check that it can be accessed at this link: http://hadoop01/cdh6_parcel.
4.5. Start Server
Run the following command on the hadoop01 node, which is the node where the cm server service is installed.
systemctl enable cloudera-scm-server
systemctl start cloudera-scm-server
View log
cd /var/log/cloudera-scm-server
tail -f cloudera-scm-server.log
For Alibaba Cloud, the hadoop01 node's firewall (security group) needs to open port 7180 so the web interface is reachable.
Wait about a minute, then open http://hadoop01:7180 (account/password: admin/admin). Do not rush to log in; log in after all the agents have been started.
If it cannot be opened, check the server log and troubleshoot the error carefully.
4.6. Start Agent
To be executed on all nodes:
systemctl enable cloudera-scm-agent
systemctl start cloudera-scm-agent
4.7. Page operation deployment components
http://hadoop01:7180/
Account password: admin/admin
On the Welcome to Cloudera Manager page, accept the End User License Terms and Conditions.
When asked which edition to deploy, choose the free Cloudera Express edition.
Add a cluster in CM
Modify cluster name
This name will be displayed on the main page
Add hosts to be managed
If some hosts are not displayed here, you need to check whether the agent service is started successfully on the corresponding host.
Select repository
Select More Options and point the parcel repository at the httpd source deployed earlier
If the repository configuration is correct, the available parcels will appear on the page after a short wait
Check CDH-6.3.2-1.cdh6.3.2.p0.1605554 and continue the installation.
Install Parcels
Just wait on this page until everything is successful.
Then run the network and host inspections
Then view the inspector results to see the detected errors and fix them.
Fixing the errors above:
1. Psycopg2
The Psycopg2 error is related to components such as Pig and Hue that make SQL calls. If the cluster uses these components, it should be fixed. Note that this must be executed on every machine.
1. Install the services
yum install postgresql-server
pip install psycopg2==2.7.5 --ignore-installed
echo 'LC_ALL="en_US.UTF-8"' >> /etc/locale.conf
su -l postgres -c "postgresql-setup initdb"
2. Modify configuration 1
vim /var/lib/pgsql/data/pg_hba.conf
3. Modify configuration 2.
The configuration of small cluster (less than 1000 nodes) is as follows. For large cluster (more than 1000 nodes), please refer to the official website: https://docs.cloudera.com/documentation/enterprise/6/latest/topics/cm_ig_extrnl_pstgrs.html# id_inq_bgy_52b
vim /var/lib/pgsql/data/postgresql.conf
max_connections = 100
shared_buffers = 256MB
wal_buffers = 8MB
checkpoint_segments = 16
checkpoint_completion_target = 0.9
4. Restart
systemctl enable postgresql
systemctl restart postgresql
5. Create databases
Create the required databases as needed, following the official documentation.
Select service
Just select the component services that need to be installed according to your own cluster needs.
Custom role assignment
Assign roles as needed.
Database settings
The database used here is amon.
Review changes
If the machine has multiple disks, you need to change the DataNode data directory, configure multiple directories, separate them with commas, and configure the others as needed.
Pay attention to the configured directory, which requires sufficient disk space.
First run
Just wait until all startups are successful. If there are any failures, check the specific logs and troubleshoot.
Started successfully
Fix warnings etc.
Click on each warning and make modifications according to the prompts.
4.8. Add new node
All the following operations can be performed on the new node.
4.8.1. Basic adjustment
Perform all the steps in the previous "Machine Basic Adjustment" and deploy the JDK.
4.8.2. Upload rpm package
Create a directory to save rpm packages
mkdir /opt/cloudera-manager
Upload:
cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm
cloudera-manager-agent-6.3.1-1466458.el7.x86_64.rpm
4.8.3. Installation service
Install CDH's daemon and agent services.
yum -y localinstall cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm
yum -y localinstall cloudera-manager-agent-6.3.1-1466458.el7.x86_64.rpm
4.8.4. Modify configuration
Modify the agent configuration to point to the server node hadoop01
Be sure to pay attention to the hadoop01 host name below and change it to the host name of your server node. If you didn't notice it the first time you ran it, you can directly enter the corresponding configuration file to modify it.
sed -i "s/server_host=localhost/server_host=hadoop01/g" /etc/cloudera-scm-agent/config.ini
4.8.5. Start Agent
systemctl stop cloudera-scm-agent
systemctl enable cloudera-scm-agent
systemctl start cloudera-scm-agent
systemctl status cloudera-scm-agent
4.8.6. Verification
Go to the CDH interface and observe the host list until the host is detected as running normally. This takes some time; just wait.
4.8.7. Join the cluster
Add the new host to the cluster.
Then you can deploy the component service on the new host.
4.8.8. Add new component role
New component roles for all machines are added through the CDH page. The main roles to add are:
- Gateways for the various components were already covered in the add-host step above, where you can choose a pre-configured template. Host templates can also be configured in advance.
- Main startup roles: DataNode, NodeManager. Note that after adding a role, you need to switch to the specific role page to check whether the component services are started correctly to prevent some services from not starting.
4.8.8.1. Special attention
After adding the yarn role, you need to add a role group corresponding to the new node.
The configurations that need to be modified include: the memory and CPU allocated by the node to the container, and yarn's usage restrictions on the node CPU.
Release Statement
Original link: https://blog.csdn.net/u012443641/article/details/131330513