CDH Data Warehouse Project (3) - Kerberos Security Authentication and Sentry Permission Management

0 Description

This article builds on "CDH Data Warehouse Project (1) - Detailed CDH Installation and Deployment Process" and "CDH Data Warehouse Project (2) - Building the User Behavior and Business Data Warehouses". On top of that CDH data warehouse, this chapter introduces Kerberos authentication and Sentry permission management.

1 Kerberos security authentication

1.1 Overview of Kerberos

Kerberos is a computer network authentication protocol that allows parties to prove their identity to one another securely over an insecure network. Its design follows a client/server model and supports mutual authentication: the client and the server can each verify the other's identity. It protects against eavesdropping and replay attacks and safeguards data integrity, using a symmetric-key system for key management.

1.2 Kerberos concept

There are some Kerberos concepts that need to be understood first:
1) KDC: the Key Distribution Center, responsible for issuing tickets and recording authorizations.
2) Realm: the administrative domain managed by a Kerberos installation.
3) Principal: every user or service added to the KDC is registered as a principal, of the form primary/instance@REALM (see the examples after this list).
4) Primary: the primary can be a user name or a service name, i.e. the name of the principal that provides a network service (such as hdfs, yarn, or hive).
5) Instance: the instance can simply be understood as the host name.
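
For illustration (using the CHEN.COM realm configured below and this article's host names), principals look like this:

hdfs/chen102@CHEN.COM    # service principal: the hdfs service on host chen102
admin/admin@CHEN.COM     # user principal: user admin with instance admin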

1.3 Principles of Kerberos authentication

insert image description here

2 Kerberos installation

2.1 Server node installs kerberos related software

The server software only needs to be installed on one node; here it is installed on the chen102 node.

yum install -y krb5-server krb5-workstation krb5-libs

View installation results

rpm -qa | grep krb5

insert image description here

2.2 client node installation

The client software can be installed on multiple nodes.

yum install -y krb5-workstation krb5-libs

insert image description here

2.3 Configure Kerberos

Two files need to be configured: kdc.conf and krb5.conf. kdc.conf only needs to be configured on the server node, i.e. chen102.
1) kdc configuration

vim /var/kerberos/krb5kdc/kdc.conf

insert image description here
Explanation:

CHEN.COM: the realm name. Kerberos supports multiple realms; realm names are conventionally written in all capitals.
acl_file: the ACL file that defines admin user permissions.
admin_keytab: the keytab used by the KDC for verification.
supported_enctypes: the supported encryption types. Note that aes256-cts has been removed: using aes256-cts from Java requires installing additional JCE policy jars, so it is not used here.
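
Since the file itself appears only as a screenshot above, here is a hedged sketch of what kdc.conf looks like after these changes, reconstructed from the stock CentOS file plus the field descriptions above (your file may differ in detail):

[kdcdefaults]
 kdc_ports = 88
 kdc_tcp_ports = 88

[realms]
 CHEN.COM = {
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  supported_enctypes = aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal camellia256-cts:normal camellia128-cts:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
 }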

2) krb5 file configuration

vim /etc/krb5.conf

insert image description here

Synchronize the modified file to the other nodes in the cluster.

default_realm: the default realm; it sets the default realm for Kerberos applications and must match the name of the realm being configured.
ticket_lifetime: how long a ticket remains valid, usually 24 hours.
renew_lifetime: the maximum period over which a ticket can be renewed, usually one week. Once a ticket expires, subsequent access to Kerberos-secured services fails.
udp_preference_limit = 1: disables UDP, which avoids a known Hadoop bug.
realms: the realms in use; if there are multiple realms, just add further entries to the [realms] section.
domain_realm: the mapping between cluster domain names and Kerberos realms; with a single realm it can be left empty.
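
The screenshot is likewise not reproduced here; a hedged sketch of the resulting /etc/krb5.conf, consistent with the field descriptions above and with the copy shown in section 4.2 (default_ccache_name is the stock CentOS default and is the path-related line dropped later on Windows):

[libdefaults]
 default_realm = CHEN.COM
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
 udp_preference_limit = 1
 default_ccache_name = KEYRING:persistent:%{uid}

[realms]
 CHEN.COM = {
  kdc = chen102
  admin_server = chen102
 }

[domain_realm]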

2.4 Generate Kerberos database

Execute on the server node

 kdb5_util create -s

After the creation is complete, the corresponding files will be generated in the /var/kerberos/krb5kdc directory

ls /var/kerberos/krb5kdc/

insert image description here

2.5 Grant Kerberos administrators all privileges

vim /var/kerberos/krb5kdc/kadm5.acl
# change the content to the following:
*/admin@CHEN.COM      *

Explanation:
*/admin: all principals whose instance is admin
@CHEN.COM: the realm
*: all permissions
This line grants every principal with instance admin all permissions in the CHEN.COM realm. In other words, any Kerberos principal created with instance admin has full permissions in the CHEN.COM realm; for example, a principal user1/admin would have all permissions in the CHEN.COM realm.
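
For example, such a principal could be created with (user1 is just the illustrative name from above):

kadmin.local -q "addprinc user1/admin"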

2.6 Start the Kerberos service (executed on the server node)

start krb5kdc

systemctl start krb5kdc

start kadmin

 systemctl start kadmin

insert image description here

Set up autostart

systemctl enable krb5kdc
systemctl enable kadmin

Check if it is set to boot automatically

systemctl is-enabled krb5kdc
systemctl is-enabled kadmin

Note: When the startup fails, you can check it through /var/log/krb5kdc.log and /var/log/kadmind.log.
insert image description here

2.7 Create administrator principal/instance

kadmin.local -q "addprinc admin/admin"

insert image description here

2.8 kinit administrator verification

kinit admin/admin
klist

insert image description here
Try it from the other nodes
insert image description here

2.9 Kerberos database operation

2.9.1 Log in to the kerberos database

1) Log in locally (no authentication required)

kadmin.local

insert image description here

2) Remote login (principal authentication is required; first authenticate as the administrator principal created above)

kadmin

insert image description here

2.9.2 Create kerberos principal

kadmin.local -q "addprinc test/test"

2.9.3 Change a principal's password

kadmin.local -q "cpw test/test"

insert image description here

2.9.4 View all principals

kadmin.local -q "list_principals"

insert image description here

2.10 Kerberos Principal Authentication

Kerberos provides two authentication methods: password authentication and keytab key-file authentication. The two cannot be used at the same time, because exporting a keytab randomizes the principal's key and invalidates its password.

2.10.1 Password authentication

kinit test/test
klist

insert image description here

2.10.2 keytab key file authentication

1) Generate the keytab file for the principal test/test at the specified path /root/test.keytab

kadmin.local -q "xst -k /root/test.keytab test/[email protected]"

2) Use keytab for authentication

kinit -kt /root/test.keytab test/test

View the principal names contained in the keytab

klist -ekt /root/test.keytab

3) View the authentication certificate

klist

insert image description here
At this point, password authentication for this principal no longer works, because the keytab export randomized its key.
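
If password authentication needs to be restored, reset the password, which generates a new key (note that this in turn invalidates the previously exported keytab):

kadmin.local -q "cpw test/test"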

2.11 Destroy credentials

kdestroy
klist

3 Installing Kerberos for CDH

3.1 CDH enables Kerberos security authentication

Create an administrator principal for Cloudera Manager (CM)

kadmin.local -q "addprinc cloudera-scm/admin"

insert image description here

3.2 Enable Kerberos from the CDH page

insert image description here

3.3 Environment confirmation

insert image description here

3.4 Fill in the configuration

Kerberos encryption types: aes128-cts, des3-hmac-sha1, arcfour-hmac
insert image description here

3.5 Fill in the principal name and password

insert image description here

3.6 Wait for the KDC import

insert image description here

3.7 Wait for the cluster to restart

insert image description here

3.8 View the principals

insert image description here

4 Kerberos security environment practice

After Kerberos is enabled, system-to-system communication (e.g. Flume to Kafka) and user-to-system communication (e.g. a user accessing HDFS) must authenticate first; communication can proceed only after authentication succeeds.
Therefore, once Kerberos is enabled, the scripts used by the data warehouse need an additional security-authentication step to work normally, as in the sketch below.
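
A minimal sketch of that additional step (the keytab and the hive/hive principal are created later, in section 4.3.2):

#!/bin/bash
# authenticate before any HDFS/Hive access
kinit -kt /var/lib/hive/hive.keytab hive/hive
if [ $? -ne 0 ]; then
    echo "Kerberos authentication failed" >&2
    exit 1
fi
hadoop fs -ls /   # now runs as the authenticated principal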

4.1 User access service authentication

After Kerberos security authentication is enabled, everyday service access (such as accessing HDFS or consuming Kafka topics) requires security authentication first.
1) Create a user principal/instance in the Kerberos database

kadmin.local -q "addprinc hive/[email protected]"

2) Perform user authentication

kinit hive/hive@CHEN.COM

3) Access HDFS

hadoop fs -ls /

insert image description here

4) hive query

hive

insert image description here
5) Consume a Kafka topic
(1) Modify the Kafka configuration
① Search for "security.inter.broker.protocol" in the Kafka configuration and set it to SASL_PLAINTEXT.
② Search for "ssl.client.auth" in the Kafka configuration and set it to none.
(2) Create the jaas.conf file

vim /var/lib/hive/jaas.conf
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true;
};

(3) Create a consumer.properties file

vim /etc/kafka/conf/consumer.properties

The content of the file is as follows

security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka

(4) Declare the jaas.conf file path

export KAFKA_OPTS="-Djava.security.auth.login.config=/var/lib/hive/jaas.conf"

(5) Use kafka-console-consumer to consume Kafka topic data

kafka-console-consumer --bootstrap-server chen102:9092 --topic topic_start --from-beginning --consumer.config /etc/kafka/conf/consumer.properties

Data can now be consumed normally.
insert image description here
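
Producing test data works symmetrically. A sketch, assuming a producer.properties with the same two lines as consumer.properties (the file path is illustrative):

vim /etc/kafka/conf/producer.properties

security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka

export KAFKA_OPTS="-Djava.security.auth.login.config=/var/lib/hive/jaas.conf"
kafka-console-producer --broker-list chen102:9092 --topic topic_start --producer.config /etc/kafka/conf/producer.properties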

4.2 Windows WebUI browser authentication

After CDH is configured to support Kerberos, the situation shown below appears:
you can open the web UI on port 9870, but you cannot browse directories and files, because the local environment has not been authenticated.
Next we set up local authentication.
(1) Download Firefox
(2) Set up the browser
① Open the Firefox browser, enter: about:config in the address bar, and enter the setting page.
insert image description here
② Search for "network.negotiate-auth.trusted-uris", modify the value to your own server host name
insert image description here
③ Search for "network.auth.use-sspi", double-click to change the value to false
insert image description here
(3) Install kfw
① Install the provided kfw-4.1-amd64.msi.
② Copy the contents of the cluster's /etc/krb5.conf file into C:\ProgramData\MIT\Kerberos5\krb5.ini, and delete the path-related configuration items.

Link: https://pan.baidu.com/s/1sMmqTbVcVhNQubjQR5CrCQ
Extraction code: amo6

File content:

[logging]

[libdefaults]
  default_realm = CHEN.COM
  dns_lookup_realm = false
  dns_lookup_kdc = false
  ticket_lifetime = 24h
  renew_lifetime = 7d
  forwardable = true
  udp_preference_limit = 1

[realms]
 CHEN.COM = {
  kdc = chen102
  admin_server = chen102
 }

[domain_realm]

③ Open MIT Kerberos and enter the principal name and password:

insert image description here
④ Test
insert image description here

4.3 User Behavior Data Warehouse

4.3.1 Log collection Flume configuration

Logs are collected by Flume and sent to Kafka, so this Flume acts as a Kafka producer. It therefore needs the Kafka client security authentication described above, but no manual configuration is required here: once Kerberos is enabled, CM configures it automatically.

4.3.2 Consume Kafka Flume configuration

This Flume consumes from Kafka and writes the data to HDFS, so it acts as a Kafka consumer. It also needs the Kafka client security authentication above (no manual step required; CM configures it automatically). In addition, it needs HDFS client security authentication, which must be configured manually.
(1) Generate the keytab file for the hive user.
There are two user-authentication methods: entering a password and using a keytab key file. A keytab key file is required here, since the script must authenticate without interactive input.

kadmin.local -q "xst -k /var/lib/hive/hive.keytab hive/hive@CHEN.COM"

(2) Add read permission

chmod +r /var/lib/hive/hive.keytab

(3) Distribute the keytab file to the other nodes, as sketched below.
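
A sketch of the distribution step, assuming passwordless SSH and the host names used in this article:

for host in chen103 chen104; do
    scp /var/lib/hive/hive.keytab $host:/var/lib/hive/hive.keytab
done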
(4) Modify the Flume agent configuration file

## components
a1.sources=r1 r2
a1.channels=c1 c2
a1.sinks=k1 k2

## source1
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.batchSize = 5000
a1.sources.r1.batchDurationMillis = 2000
a1.sources.r1.kafka.bootstrap.servers = chen102:9092,chen103:9092,chen104:9092
a1.sources.r1.kafka.topics=topic_start

## source2
a1.sources.r2.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r2.batchSize = 5000
a1.sources.r2.batchDurationMillis = 2000
a1.sources.r2.kafka.bootstrap.servers = chen102:9092,chen103:9092,chen104:9092
a1.sources.r2.kafka.topics=topic_event

## channel1
a1.channels.c1.type=memory
a1.channels.c1.capacity=100000
a1.channels.c1.transactionCapacity=10000

## channel2
a1.channels.c2.type=memory
a1.channels.c2.capacity=100000
a1.channels.c2.transactionCapacity=10000

## sink1
a1.sinks.k1.type = hdfs
#a1.sinks.k1.hdfs.proxyUser=hive
a1.sinks.k1.hdfs.kerberosPrincipal=hive/hive@CHEN.COM
a1.sinks.k1.hdfs.kerberosKeytab=/var/lib/hive/hive.keytab
a1.sinks.k1.hdfs.path = /origin_data/gmall/log/topic_start/%Y-%m-%d
a1.sinks.k1.hdfs.filePrefix = logstart-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = second

##sink2
a1.sinks.k2.type = hdfs
#a1.sinks.k2.hdfs.proxyUser=hive
a1.sinks.k2.hdfs.kerberosPrincipal=hive/hive@CHEN.COM
a1.sinks.k2.hdfs.kerberosKeytab=/var/lib/hive/hive.keytab
a1.sinks.k2.hdfs.path = /origin_data/gmall/log/topic_event/%Y-%m-%d
a1.sinks.k2.hdfs.filePrefix = logevent-
a1.sinks.k2.hdfs.round = true
a1.sinks.k2.hdfs.roundValue = 10
a1.sinks.k2.hdfs.roundUnit = second

## avoid producing large numbers of small files
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0

a1.sinks.k2.hdfs.rollInterval = 10
a1.sinks.k2.hdfs.rollSize = 134217728
a1.sinks.k2.hdfs.rollCount = 0

## output file type: compressed stream
a1.sinks.k1.hdfs.fileType = CompressedStream 
a1.sinks.k2.hdfs.fileType = CompressedStream 

a1.sinks.k1.hdfs.codeC = lzop
a1.sinks.k2.hdfs.codeC = lzop

## assembly: bind sources and sinks to channels
a1.sources.r1.channels = c1
a1.sinks.k1.channel= c1

a1.sources.r2.channels = c2
a1.sinks.k2.channel= c2

insert image description here
insert image description here
After the configuration succeeds, restart Flume and run the collection script to verify that collection works normally. Checking HDFS, you should see the following page, which indicates that collection is working.
insert image description here

4.3.3 ods layer

Add the following content to ods_log.sh:

kinit -kt /var/lib/hive/hive.keytab hive/hive
beeline -u "jdbc:hive2://chen102:10000/;principal=hive/chen102@CHEN.COM" -n hive -e "$sql"

insert image description here
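
For context, a hedged sketch of what the complete ods_log.sh looks like after this change (the $sql body and table name are illustrative, following the gmall warehouse built in part 2 of this series):

#!/bin/bash
APP=gmall
# date parameter: default to yesterday
if [ -n "$1" ]; then
    do_date=$1
else
    do_date=$(date -d '-1 day' +%F)
fi
sql="
load data inpath '/origin_data/$APP/log/topic_start/$do_date'
into table $APP.ods_start_log partition(dt='$do_date');
"
# the two lines added for Kerberos: authenticate, then run the SQL via beeline
kinit -kt /var/lib/hive/hive.keytab hive/hive
beeline -u "jdbc:hive2://chen102:10000/;principal=hive/chen102@CHEN.COM" -n hive -e "$sql"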

4.3.4 dwd layer

dwd_start_log.sh

kinit -kt /var/lib/hive/hive.keytab hive/hive
beeline -u "jdbc:hive2://chen102:10000/;principal=hive/chen102@CHEN.COM" -n hive -e "$sql"

4.3.5 dws layer

dws_log.sh

kinit -kt /var/lib/hive/hive.keytab hive/hive
beeline -u "jdbc:hive2://chen102:10000/;principal=hive/chen102@CHEN.COM" -n hive -e "$sql"

4.3.6 ads layer

ads_uv_log.sh

kinit -kt /var/lib/hive/hive.keytab hive/hive
beeline -u "jdbc:hive2://chen102:10000/;principal=hive/chen102@CHEN.COM" -n hive -e "$sql"

4.4 Business Data Warehouse

Add Kerberos authentication to each of the business data warehouse scripts.

4.4.1 sqoop import

sqoop_import.sh

kinit -kt /var/lib/hive/hive.keytab hive/hive
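
A hedged sketch of how a table import inside sqoop_import.sh typically looks once the kinit line is in place (the MySQL connection string, credentials, and table name are illustrative):

sqoop import \
--connect jdbc:mysql://chen102:3306/gmall \
--username root \
--password xxxxxx \
--table order_info \
--target-dir /origin_data/gmall/db/order_info/$do_date \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by '\t'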

4.4.2 ods layer

ods_db.sh

kinit -kt /var/lib/hive/hive.keytab hive/hive
beeline -u "jdbc:hive2://chen102:10000/;principal=hive/chen102@CHEN.COM" -n hive -e "$sql"

4.4.3 dwd layer

dwd_db.sh

kinit -kt /var/lib/hive/hive.keytab hive/hive
beeline -u "jdbc:hive2://chen102:10000/;principal=hive/chen102@CHEN.COM" -n hive -e "$sql"

4.4.4 dws layer

dws_db_wide.sh

kinit -kt /var/lib/hive/hive.keytab hive/hive
beeline -u "jdbc:hive2://chen102:10000/;principal=hive/chen102@CHEN.COM" -n hive -e "$sql"

4.4.5 ads layer

ads_db_gmv.sh

kinit -kt /var/lib/hive/hive.keytab hive/hive
beeline -u "jdbc:hive2://chen102:10000/;principal=hive/chen102@CHEN.COM" -n hive -e "$sql"

4.4.6 sqoop export

kinit -kt /var/lib/hive/hive.keytab hive/hive

4.5 Test run

4.5.1 Create a new HDFS directory

hadoop fs -mkdir /user/hive/bin_kerberos

4.5.2 Upload the latest script to the hdfs directory

4.5.3 Create data

insert image description here

4.5.4 Create a new scheduling task on the Hue page

insert image description here

4.5.5 View the execution results

5 Sentry

5.1 Sentry overview

In CDH Hadoop, data security is usually handled with a Kerberos + Sentry architecture: Kerberos is responsible for authenticating platform users, while Sentry is responsible for data permission management.

5.2 What is Sentry

Apache Sentry is an open-source Hadoop component released by Cloudera that provides fine-grained, role-based authorization and a multi-tenant management model.
Sentry provides fine-grained access control and permission enforcement for authenticated users and applications on a Hadoop cluster. It can currently be used with Apache Hive, Hive Metastore/HCatalog, Apache Solr, Impala, and HDFS (limited to Hive table data).
Sentry is designed as a pluggable authorization engine for Hadoop components. It allows custom authorization rules to validate access requests from users or applications to Hadoop resources, and its highly modular design can support authorization for various data models in Hadoop.

insert image description here

5.3 Sentry installation and deployment

5.3.1 Add Sentry service

insert image description here

5.3.2 Customize the Sentry role assignment

insert image description here

5.3.3 Configure database connection

insert image description here

5.3.4 Complete the Sentry service addition

insert image description here

5.4 Sentry integrates with Hive

5.4.1 Modify configuration parameters

(1) Disable HiveServer2 user impersonation
Search for "HiveServer2 Enable Impersonation" in the Hive configuration and uncheck it.
insert image description here
(2) Ensure that the hive user can submit MR jobs
Search for "Allowed System Users" in the YARN configuration and make sure "hive" is included.
insert image description here
(3) Search for "Enable storage notifications in the database" in the Hive configuration item and check it.
insert image description here

(4) Search for "Sentry" in the Hive configuration item, and check Sentry.
insert image description here

5.5 Configure Sentry for Impala

Search for "Sentry" in the Impala configuration and select it.
insert image description here

5.6 Configure Sentry for HDFS

1) Search for "Enable Access Control Lists" in the HDFS configuration and check it.
insert image description here

2) Search for "Enable Sentry Synchronization" in the HDFS configuration item, and make the modification as shown in the figure below.
insert image description here

5.7 Configure Sentry for HUE

1) Configure HUE to support Sentry
Search for "Sentry" in the HUE configuration and check the Sentry service.
insert image description here

6 Sentry permission test

6.1 Operation via HUE

1) View the administrator groups in Sentry permission management.
Search for "Admin Groups" in the Sentry configuration; by default it contains hive and impala. Only a user belonging to one of these groups can grant permissions to other users.
insert image description here
2) Create two users, reader and writer, on all nodes of the Hive cluster to prepare for the permission test.

useradd reader
passwd reader
useradd writer
passwd writer

3) Log in to HUE as the hive user, create two user groups, reader and writer, and create a reader user and a writer user in the respective groups, to prepare for the permission test.
insert image description here

4) Sentry work interface (HUE users need to be granted permission to access Sentry)

5) Click the Roles button, then click the Add button.
insert image description here

6) Edit the roles
admin_role (first grant administrator privileges to the hive user)
insert image description here
reader_role
insert image description here

writer_role
insert image description here

7) Permission test
(1) Log in to HUE as the reader user and as the writer user respectively.
(2) Query any table in the gmall database: only the reader user can query it, while writer cannot, showing that the permission control takes effect.
The reader user has query permission.
insert image description here
The reader user does not have insert permission.
insert image description here
The writer user does not have the query permission.
insert image description here

The writer user has insert permission
insert image description here

6.2 Command-line operation

1) Create two users, reader_cmd and writer_cmd, on all nodes of the Hive cluster

useradd reader_cmd
passwd reader_cmd
useradd writer_cmd
passwd writer_cmd

2) Connect to HiveServer2 through the beeline client as the Sentry administrator user hive

kinit -kt /var/lib/hive/hive.keytab hive/hive@CHEN.COM
beeline -u "jdbc:hive2://chen102:10000/;principal=hive/chen102@CHEN.COM"

insert image description here

① Create roles (reader_role_cmd and writer_role_cmd)

create role reader_role_cmd;
create role writer_role_cmd;

insert image description here
② Grant privileges to the roles

GRANT select ON DATABASE gmall TO ROLE reader_role_cmd;
GRANT insert ON DATABASE gmall TO ROLE writer_role_cmd;

insert image description here

③ Grant the roles to the user groups

GRANT ROLE reader_role_cmd TO GROUP reader_cmd;
GRANT ROLE writer_role_cmd TO GROUP writer_cmd;

insert image description here
After executing the above commands, the newly created roles can also be viewed on the HUE page.
insert image description here

④ Check the authorization status
(1) View all roles (administrator only)

SHOW ROLES;

insert image description here
(2) View the roles of a specified user group (administrator only)

SHOW ROLE GRANT GROUP reader_cmd;

insert image description here

(3) View the roles of the currently authenticated user

SHOW CURRENT ROLES;

insert image description here

(4) View the specific permissions of a specified role (administrator only)

SHOW GRANT ROLE reader_role_cmd;

insert image description here
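
Grants can also be undone with the mirror-image REVOKE statements (a sketch; not part of the original test):

REVOKE select ON DATABASE gmall FROM ROLE reader_role_cmd;
REVOKE ROLE reader_role_cmd FROM GROUP reader_cmd;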

⑤ Permission test
(1) Create Kerberos principals for reader_cmd and writer_cmd

kadmin.local -q "addprinc reader_cmd/[email protected]"
kadmin.local -q "addprinc writer_cmd/[email protected]"

insert image description here
(2) Log in to HiveServer2 as reader_cmd and query a table in the gmall database

kinit reader_cmd/reader_cmd@CHEN.COM
beeline -u "jdbc:hive2://chen102:10000/;principal=hive/chen102@CHEN.COM"

insert image description here
Has query permission
insert image description here
but does not have insert permission
insert image description here
(3) Log in to HiveServer2 as writer_cmd and query a table in the gmall database

kinit writer_cmd/writer_cmd@CHEN.COM
beeline -u "jdbc:hive2://chen102:10000/;principal=hive/chen102@CHEN.COM"

insert image description here
writer_cmd does not have query permission
insert image description here
writer_cmd has insert permission
insert image description here
(4) Result
reader_cmd has query permission on the gmall database while writer_cmd does not, showing that the authorization is in effect.

The next article covers performance testing of the CDH data warehouse project and cleaning up the CDH cluster; for details, see
"CDH Data Warehouse Project (4) - Cluster Performance Test / Resource Management / Clearing the CDH Cluster".
