Installation of various component services of CDH6.3.2

CDH

Overview

CDH stands for Cloudera's Distribution Including Apache Hadoop, a set of big data solutions provided by Cloudera.

CDH is built on the Apache Hadoop ecosystem, which includes Hadoop core components (such as HDFS, YARN and MapReduce) and other related open source technologies (such as Hive, HBase, Spark, Impala, etc.). By integrating these components, Cloudera provides enterprises with a stable, reliable, and scalable data processing platform.

In CDH, Cloudera Manager is a key component used to manage and monitor the entire cluster. It provides an easy-to-use web interface for cluster configuration, software installation, performance monitoring, and troubleshooting.

CDH documentation

CDH download

Architecture

Hadoop core components:

Hadoop Distributed File System (HDFS): a distributed file system for storing and managing large-scale datasets.

Yet Another Resource Negotiator (YARN): allocates and manages cluster resources for running all kinds of applications.

MapReduce: a distributed data processing framework for executing large-scale data processing jobs on the cluster.

Data storage and processing components:

Hive: a Hadoop-based data warehouse infrastructure that provides a SQL-like query language for data analysis and processing.

HBase: a distributed, column-oriented NoSQL database suited to highly scalable real-time reads and writes.

Spark: a fast, general-purpose big data processing engine that supports batch processing, interactive queries, and stream processing.

Impala: a high-performance SQL query engine for real-time queries over data stored in HDFS and HBase.

Solr: an open-source, high-performance search platform for building real-time search and large-scale analytics applications.

Sqoop: a tool for transferring data between Hadoop and relational databases.

Data integration and stream processing components:

Kafka: a high-throughput, distributed streaming platform for processing real-time data streams.

Flume: a distributed system for efficiently and reliably collecting, aggregating, and moving data from many sources.

Security and management components:

Cloudera Manager: a comprehensive management platform for cluster configuration, deployment, monitoring, and administration.

Apache Sentry: provides fine-grained access control and permission management to protect sensitive data.

Apache Knox: provides a single access point and API gateway for securely accessing and managing a Hadoop cluster.

Create database

Create the databases required by each component:

CREATE DATABASE hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;

CREATE DATABASE oozie DEFAULT CHARSET utf8 COLLATE utf8_general_ci;

CREATE DATABASE hue DEFAULT CHARSET utf8 COLLATE utf8_general_ci;

CREATE DATABASE sentry DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
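
Each service also needs a database account with privileges on its own database. A minimal sketch, assuming MySQL; the user names, host pattern, and passwords below are placeholders to adapt:

CREATE USER 'hive'@'%' IDENTIFIED BY 'hive_pwd';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
CREATE USER 'oozie'@'%' IDENTIFIED BY 'oozie_pwd';
GRANT ALL PRIVILEGES ON oozie.* TO 'oozie'@'%';
CREATE USER 'hue'@'%' IDENTIFIED BY 'hue_pwd';
GRANT ALL PRIVILEGES ON hue.* TO 'hue'@'%';
CREATE USER 'sentry'@'%' IDENTIFIED BY 'sentry_pwd';
GRANT ALL PRIVILEGES ON sentry.* TO 'sentry'@'%';
FLUSH PRIVILEGES;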

Install Kafka service

In CDH, the component services are installed and configured in much the same way. Here we take the Kafka service as an example.

Add service

On the homepage, click Add Service.
In the service list, select the Kafka service.
Select three machines for the Kafka Broker role.

Configuration

On the Review Changes page, adjust the broker heap memory size and leave the other configurations at their defaults.

Wait for the installation to complete.

Kafka command usage

Create Kafka Topic

/opt/cloudera/parcels/CDH/bin/kafka-topics --bootstrap-server node03:9092,node04:9092,node05:9092  --create --replication-factor 1 --partitions 1 --topic test

kafka-topics --bootstrap-server node03:9092,node04:9092,node05:9092  --create --replication-factor 1 --partitions 1 --topic test

View Kafka Topic

/opt/cloudera/parcels/CDH/bin/kafka-topics --zookeeper node03:2181 --list

kafka-topics --zookeeper node03:2181 --list

Delete Kafka Topic

/opt/cloudera/parcels/CDH/bin/kafka-topics --delete --bootstrap-server node03:9092,node04:9092,node05:9092 --topic test

kafka-topics --delete --bootstrap-server node03:9092,node04:9092,node05:9092 --topic test
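
To check the topic end to end, you can produce and consume a few test messages with the console tools that ship alongside kafka-topics (same hosts as above):

# Type a few lines, then Ctrl+C to exit
kafka-console-producer --broker-list node03:9092,node04:9092,node05:9092 --topic test

# Read the messages back from the beginning
kafka-console-consumer --bootstrap-server node03:9092,node04:9092,node05:9092 --topic test --from-beginning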

Installation of other components

In CDH, the installation of each component service is similar. Refer to the Kafka installation above when installing the following common components.

Install Flume service

Choose to add the Flume service.
Select the services Flume depends on.
Assign the nodes where the Flume Agents will run and complete the installation; a minimal agent configuration sketch follows below.
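
After the Agents are up, each one still needs a pipeline definition, entered in the Agent's Configuration File field in CM. A minimal sketch for a smoke test, assuming CM's default agent name tier1 and an illustrative netcat source:

# Test pipeline: netcat source -> memory channel -> logger sink
tier1.sources = r1
tier1.channels = c1
tier1.sinks = k1

tier1.sources.r1.type = netcat
tier1.sources.r1.bind = 0.0.0.0
tier1.sources.r1.port = 44444
tier1.sources.r1.channels = c1

tier1.channels.c1.type = memory

tier1.sinks.k1.type = logger
tier1.sinks.k1.channel = c1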

Install Hive service

Choose to add the Hive service.
Add the Hive service to the cluster.
Configure the Hive metadata database.

An exception occurred during the metadata database connection test.
Solution: copy mysql-connector-java.jar to the /usr/share/java/ directory of each node:

[root@node01 ~]# ./sync.sh /usr/share/java/mysql-connector-java.jar
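
sync.sh is a cluster-specific helper, not part of CDH. A hypothetical sketch of what such a script might do (the host list is an assumption for this cluster):

#!/bin/bash
# Hypothetical sync.sh: copy a file to the same directory on the other nodes
for host in node02 node03 node04 node05; do
  scp "$1" "${host}:$(dirname "$1")/"
done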

Retest the connection.
Use the default configuration.
The Hive processes start automatically after installation.

Note: After installing Spark, configure Hive On Spark, and then restart Hive (a quick check is sketched below).

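In CM this corresponds to the Hive service's default execution engine setting; the engine can also be switched per session to confirm the setup. A minimal sketch, assuming HiveServer2 runs on node01 and that a test table exists (both are assumptions):

[root@node01 ~]# beeline -u "jdbc:hive2://node01:10000"
0: jdbc:hive2://node01:10000> SET hive.execution.engine=spark;
0: jdbc:hive2://node01:10000> SELECT count(*) FROM test_db.test_table;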

Install Spark service

Add the Spark service.
CDH 6.x ships with Spark 2.4 and does not need to be upgraded.

Assign nodes.
Keep the defaults for all cluster settings.
Wait for the installation to complete, then click restart.
After installing Spark, configure Hive On Spark, and then restart Hive.
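
A quick Spark-on-YARN smoke test can be run with the bundled SparkPi example; the examples jar path below is the usual CDH parcel location and may differ:

[root@node01 ~]# sudo -u hdfs spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master yarn --deploy-mode client \
    /opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-examples_*.jar 10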

Install OOZIE service

Choose to add the OOZIE service.
Allocate nodes.
Configure the Oozie metadata database.
Use the default configuration, then wait for the installation to finish and Oozie to start.
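
A quick way to confirm the Oozie server is healthy, assuming it runs on node01 with the default port 11000:

[root@node01 ~]# oozie admin -oozie http://node01:11000/oozie -status
System mode: NORMAL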

Install HUE service

Choose to add the Hue service.
Allocate nodes.
Configure the Hue metadata database.
Wait for the installation to finish; the Hue processes start automatically.

Install Flink service

Flink is also a common service in the big data field. However, CDH 6.3.2 does not include a Flink service, so Flink has to be compiled manually.

Download the relevant packages

Check the version information of each CDH component. The Flink version matching Hive 2.1.1 is flink-1.13.6.
Download the Flink installation package

wget https://archive.apache.org/dist/flink/flink-1.13.6/flink-1.13.6-bin-scala_2.11.tgz

Download the Flink source code package

wget https://archive.apache.org/dist/flink/flink-1.13.6/flink-1.13.6-src.tgz

Install Maven

Download Maven

wget https://archive.apache.org/dist/maven/maven-3/3.8.8/binaries/apache-maven-3.8.8-bin.tar.gz

Reference: Maven installation and configuration
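
A minimal setup sketch, assuming Maven is unpacked under /usr/local (the install path and profile file are conventions, not requirements):

[root@node01 ~]# tar -zxvf apache-maven-3.8.8-bin.tar.gz -C /usr/local
[root@node01 ~]# cat >> /etc/profile <<'EOF'
export MAVEN_HOME=/usr/local/apache-maven-3.8.8
export PATH=$PATH:$MAVEN_HOME/bin
EOF
[root@node01 ~]# source /etc/profile
[root@node01 ~]# mvn -version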

Flink’s CDH version compilation configuration

Unzip the Flink packages (later steps reference the binary distribution at /usr/local/flink and the source tree at flink-src)

tar -zxvf flink-1.13.6-bin-scala_2.11.tgz

tar -zxvf flink-1.13.6-src.tgz
mv flink-1.13.6 flink-src

In Flink's pom.xml, set the CDH versions of Hadoop and Hive:

<flink.hadoop.version>3.0.0-cdh6.3.2</flink.hadoop.version>

<hive.version>2.1.1-cdh6.3.2</hive.version>

Add the following content inside the <repositories> tag:

<repositories> 
	<repository> 
		<id>cloudera</id> 
		<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
	</repository> 
	<repository> 
		<id>confluent-repo</id>
		<url>https://packages.confluent.io/maven/</url>
	</repository> 
</repositories>

Modify the hive-exec dependency in /root/flink-src/flink-connectors/flink-sql-connector-hive-2.3.9/pom.xml:

<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>2.1.1-cdh6.3.2</version>

Compile Flink

mvn clean install -DskipTests -Dfast -Drat.skip=true -Dhadoop.version=3.0.0-cdh6.3.2 -Dinclude-hadoop -Dscala-2.11 -T10C

Copy the successfully compiled flink-sql-connector-hive jar to Flink's lib directory

[root@node01 ~]# cp flink-src/flink-connectors/flink-sql-connector-hive-2.2.0/target/flink-sql-connector-hive-2.2.0_2.11-1.13.6.jar /usr/local/flink/lib/

# Copy hive-exec-2.1.1-cdh6.3.2.jar and libfb303-0.9.3.jar
[root@node01 ~]# cp /opt/cloudera/parcels/CDH/jars/hive-exec-2.1.1-cdh6.3.2.jar /usr/local/flink/lib/
[root@node01 ~]# cp /opt/cloudera/parcels/CDH/jars/libfb303-0.9.3.jar /usr/local/flink/lib/

Copy the related Hadoop jars

[root@node01 ~]# cp /opt/cloudera/parcels/CDH/jars/hadoop-common-3.0.0-cdh6.3.2.jar /usr/local/flink/lib/
[root@node01 ~]# cp /opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-common-3.0.0-cdh6.3.2.jar /usr/local/flink/lib/
[root@node01 ~]# cp /opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-core-3.0.0-cdh6.3.2.jar /usr/local/flink/lib/
[root@node01 ~]# cp /opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-hs-3.0.0-cdh6.3.2.jar /usr/local/flink/lib/
[root@node01 ~]# cp /opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-jobclient-3.0.0-cdh6.3.2.jar /usr/local/flink/lib/

Build Flink's parcel package and csd file

1. Compress the complete Flink directory, whose lib now contains the copied jars.

[root@node01 ~]# cd /usr/local
[root@node01 local]# tar -zcvf flink-1.13.6-cdh6.3.2.tgz flink

2. Download the build script

[root@node01 local]# yum install git
[root@node01 local]# git clone https://github.com/YUjichang/flink-parcel.git

Mirror (faster from mainland China): git clone https://gitclone.com/github.com/YUjichang/flink-parcel.git

3. Modify the script configuration

[root@node01 local]# cd flink-parcel/
[root@node01 flink-parcel]# vim flink-parcel.properties
# Path to the Flink tarball
FLINK_URL=/usr/local/flink-1.13.6-cdh6.3.2.tgz
# Flink version
FLINK_VERSION=1.13.6
# Extension version
EXTENS_VERSION=CDH6.3.2
# OS version (CentOS in this example)
OS_VERSION=7
# Minimum supported CDH version
CDH_MIN_FULL=6.0
# Maximum supported CDH version
CDH_MAX_FULL=6.4
CDH_MIN=5
CDH_MAX=6

4. Run the build.sh script to build the parcel and csd

[root@node01 flink-parcel]# ./build.sh parcel 
[root@node01 flink-parcel]# ./build.sh csd 

5. After the build completes, the following parcel and csd files are generated:

FLINK_ON_YARN-1.13.6.jar 

FLINK-1.13.6-CDH6.3.2-el7.parcel
FLINK-1.13.6-CDH6.3.2-el7.parcel.sha 
manifest.json

6. Add the Flink parcel to CM's parcel repository

cp FLINK-1.13.6-CDH6.3.2-el7.parcel /opt/cloudera/parcel-repo/ 

cp FLINK-1.13.6-CDH6.3.2-el7.parcel.sha /opt/cloudera/parcel-repo/

Add the Flink service to CM directly (using a pre-built package)

1. Copy the pre-built package to Cloudera's parcel-repo

[root@node01 local]# tar -zxvf flink-1.13.6-cdh6.3.2_parcel.tar.gz
[root@node01 local]# cd flink-1.13.6-cdh6.3.2/
[root@node01 flink-1.13.6-cdh6.3.2]# ll
total 377276
-rwxrwxrwx 1 root root 386296010 Aug 30 11:51 FLINK-1.13.6-CDH6.3.2-el7.parcel
-rwxrwxrwx 1 root root        40 Aug 30 11:51 FLINK-1.13.6-CDH6.3.2-el7.parcel.sha
-rwxrwxrwx 1 root root     21123 Aug 30 11:51 FLINK_ON_YARN-1.13.6.jar
-rwxrwxrwx 1 root root       841 Aug 30 11:52 manifest.json
[root@node01 flink-1.13.6-cdh6.3.2]# cp FLINK-1.13.6-CDH6.3.2-el7.parcel /opt/cloudera/parcel-repo/ 
[root@node01 flink-1.13.6-cdh6.3.2]# cp FLINK-1.13.6-CDH6.3.2-el7.parcel.sha /opt/cloudera/parcel-repo/
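
Before restarting anything, it is worth confirming that the parcel's checksum matches the .sha file; the same hash goes into the manifest.json entry below:

[root@node01 flink-1.13.6-cdh6.3.2]# sha1sum FLINK-1.13.6-CDH6.3.2-el7.parcel
[root@node01 flink-1.13.6-cdh6.3.2]# cat FLINK-1.13.6-CDH6.3.2-el7.parcel.sha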

2. Add the FLINK parcel entry to the manifest.json file in Cloudera's parcel-repo (the first, empty entry below stands in for the pre-existing entries, which are left unchanged):

vim /opt/cloudera/parcel-repo/manifest.json
[
  {
    "components": [
      {
        "pkg_version": "",
        "version": "6",
        "name": "",
        "pkg_release": ""
      }
    ],
    "hash": "",
    "parcelName": "",
    "replaces": ""
  },
  {
    "components": [
      {
        "pkg_version": "flink1.13.6",
        "version": "flink1.13.6",
        "name": "flink",
        "pkg_release": "cdh6.3.2"
      }
    ],
    "hash": "4e1a65e353d2e36c7e9d12a912eb8516a7f486f5",
    "parcelName": "FLINK-1.13.6-CDH6.3.2-el7.parcel",
    "replaces": "FLINK"
  }
]

3. Copy the FLINK_ON_YARN jar to Cloudera's csd directory and restart cloudera-scm-server

[root@node01 software]# cd flink-1.13.6-cdh6.3.2/
[root@node01 flink-1.13.6-cdh6.3.2]# cp FLINK_ON_YARN-1.13.6.jar /opt/cloudera/csd/ 
[root@node01 flink-1.13.6-cdh6.3.2]# systemctl restart cloudera-scm-server 

4. After the restart, distribute and activate the Flink parcel on the CM Parcels page.

5. Add the Flink service, configure it according to the cluster plan, and restart.

Verify Flink service

1. Run the WordCount example in yarn-per-job mode to test Flink on YARN

[root@node01 ~]# chmod 777 /opt/cloudera/parcels/FLINK/bin/flink
[root@node01 ~]# sudo -u hdfs  /opt/cloudera/parcels/FLINK/bin/flink run -t yarn-per-job /opt/cloudera/parcels/FLINK/lib/flink/examples/batch/WordCount.jar
Printing result to stdout. Use --output to specify output path.
2023-08-16 10:37:33,318 WARN  org.apache.flink.yarn.configuration.YarnLogConfigUtil        [] - The configuration directory ('/etc/flink/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2023-08-16 10:37:33,589 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2023-08-16 10:37:33,747 INFO  org.apache.hadoop.conf.Configuration                         [] - resource-types.xml not found
2023-08-16 10:37:33,747 INFO  org.apache.hadoop.yarn.util.resource.ResourceUtils           [] - Unable to find 'resource-types.xml'.
2023-08-16 10:37:33,801 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cluster specification: ClusterSpecification{masterMemoryMB=2048, taskManagerMemoryMB=2048, slotsPerTaskManager=1}
2023-08-16 10:37:36,652 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Submitting application master application_1692149627327_0002
2023-08-16 10:37:36,893 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl        [] - Submitted application application_1692149627327_0002
2023-08-16 10:37:36,894 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Waiting for the cluster to be allocated
2023-08-16 10:37:36,895 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Deploying cluster, current state ACCEPTED
2023-08-16 10:37:43,446 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - YARN application has been deployed successfully.
2023-08-16 10:37:43,447 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface node05:8080 of application 'application_1692149627327_0002'.
Job has been submitted with JobID c0b4b89406f6fee0a4c3f6b95bb0ee67
Program execution finished
Job with JobID c0b4b89406f6fee0a4c3f6b95bb0ee67 has finished.
Job Runtime: 12175 ms
Accumulator Results:
- 7611dc575cfdcecc9d3528d9326c6aba (java.util.ArrayList) [170 elements]


(a,5)
(action,1)
(after,1)
(against,1)
(all,2)
(and,12)
(arms,1)
(arrows,1)
(awry,1)
(ay,1)
(bare,1)
(be,4)
(bear,3)
(bodkin,1)
(bourn,1)

2. In a browser, open http://IP:8088/cluster to view the job execution.
3. Verify Hive integration via Flink SQL

[root@node01 ~]# /opt/cloudera/parcels/FLINK/bin/flink-sql-client
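
Inside the SQL client, registering a Hive catalog and listing its tables makes a quick smoke test. A minimal sketch, assuming the Hive configuration directory is /etc/hive/conf:

CREATE CATALOG hive WITH (
  'type' = 'hive',
  'hive-conf-dir' = '/etc/hive/conf'
);
USE CATALOG hive;
SHOW TABLES;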

Origin: blog.csdn.net/qq_38628046/article/details/132308920