Ranger in HDP integrates Kerberos for fine-grained access control

Table of Contents

I. Ranger permissions

1. What is Ranger?

2. Summary

II. Configuration and implementation of specific components

2.1 HDFS permission control

2.2 HBase permission control

2.3 Hive permission control

2.4 YARN permission control


I. Ranger permissions

1. What is Ranger?

Ranger manages permissions inside individual components, such as read/write/execute on HDFS, read/write/update on Hive and HBase, and queue resource usage and task submission rights on YARN. At present Ranger only supports HDFS, Hive, HBase, Kafka, YARN and a few other components. It gives groups and users fine-grained control over resource access.


2. Summary

This article mainly focuses on integrating Ranger with Kerberos for fine-grained user access control, and on the Ranger Audit function:

  1. HDFS permission control (Kerberos integration)
  2. HBase permission control
  3. Hive permission control
  4. YARN task submission and resource access control

II. Configuration and implementation of specific components

Kerberos has already been installed in the cluster, and the kangll and ranger_hive users, together with their Kerberos principals, have been created.

The Kerberos client supports two authentication modes: principal + password and principal + keytab. The former is suitable for interactive use by a person, such as running hadoop fs -ls; the latter is suitable for services, such as YARN's ResourceManager and NodeManager.
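For illustration only (the keytab path below follows the usual HDP layout and is an assumption, not taken from this cluster), the two modes look like this on the command line:

# principal + password: interactive use, prompts for the password
kinit kangll

# principal + keytab: non-interactive, suitable for services
kinit -kt /etc/security/keytabs/kangll.keytab kangll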

 

2.1 HDFS permission control

1. Create a new operating system user with useradd kangll (you can write a script to create the user on every node in the cluster, as sketched below)
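A minimal sketch of such a script, assuming the cluster nodes are hdp201 and hdp202 (adjust the host list to your environment):

# create the kangll user on every node in the cluster
for host in hdp201 hdp202; do
    ssh root@$host "id kangll >/dev/null 2>&1 || useradd kangll"
done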

2. Create a Kerberos principal for the kangll user on the KDC host, copy the generated keytab file to the Hadoop cluster nodes, adjust its permissions, and run kinit to obtain a ticket for the service
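A rough sketch of this step, assuming the principal is simply kangll and the keytab is kept under /etc/security/keytabs/ (realm, paths and hosts here are placeholders, not taken from the cluster above):

# on the KDC host: create the principal and export its keytab
kadmin.local -q "addprinc kangll"
kadmin.local -q "xst -norandkey -k /etc/security/keytabs/kangll.keytab kangll"

# copy the keytab to a cluster node and restrict its permissions
scp /etc/security/keytabs/kangll.keytab hdp201:/etc/security/keytabs/
ssh hdp201 "chown kangll /etc/security/keytabs/kangll.keytab && chmod 400 /etc/security/keytabs/kangll.keytab"

# as kangll on the cluster node: obtain a ticket and check it
kinit -kt /etc/security/keytabs/kangll.keytab kangll
klist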

3. Check the permission settings on the /kangll directory. At this point only the hdfs user can read and write it
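For example:

hdfs dfs -ls / | grep kangll     # shows the owner, group and mode of /kangll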

4. Add the Kerberos configuration to the Ranger HDFS service

 

Description: (1) The hadoop.security.auth_to_local configuration here is generated in core-site.xml after Kerberos is installed. For the specific meaning, please refer to: https://www.jianshu.com/p/2ad4be7ecf39

RULE:[1:$1@$0]([email protected])s/.*/hdfs/    

RULE:[2:$1@$0]([email protected])s/.*/hdfs/

(2) The principal can be found in the Kerberos.csv file downloaded when installing Kerberos, or you can go directly into kadmin.local and use listprincs to view it
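For example, to confirm the principal exists:

kadmin.local -q "listprincs" | grep -i kangll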

5. Create the corresponding policy

At this point the /kangll folder can only be read and written by the hdfs user; no other user can operate on it

Grant the kangll user read permission on the HDFS /kangll folder

6. Authorization verification 

Verify read and write access to the /kangll folder. Below we see that, because only the read permission is enabled, writing files is not possible.
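A minimal sketch of what this verification looks like from the command line (the file names are illustrative):

# read policy enabled: reading succeeds
hdfs dfs -cat /kangll/test.txt

# write policy not yet enabled: this fails with a permission error
hdfs dfs -put local.txt /kangll/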

Next, I enable the write permission as well

 

Verify again; this time both read and write succeed

7. Ranger Audit

Note: The Ranger Audit module records the kangll user's operations; here the kangll user's write to the /kangll folder in HDFS is shown as Allowed.

2.2 HBase permission control

 1. Policy configuration

2. Before adding the create-table permission

 3. After adding the create, read, and write permissions

 4. A policy with only the read permission

With the read permission enabled, data can be read

The permission to write data is not enabled, so writing fails

After the write permission is enabled for the kangll user

  

The policy created for the kangll user controls that user's operations on HBase tables. As with Hive tables, you can grant permissions on specific tables and columns; a user can be given specific table operation permissions such as create, read, write, and admin.
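For reference, the operations tested above can be reproduced in the HBase shell roughly as follows (the table and column family names are illustrative):

# inside the HBase shell (started with: hbase shell), run:
create 'kangll_test', 'cf'                        # needs the create permission
put 'kangll_test', 'row1', 'cf:name', 'kangll'    # needs the write permission
scan 'kangll_test'                                # needs the read permission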

5. Ranger Audit

2.3 Hive permission control

1. First confirm that the ranger-hive-plugin has been enabled in Ambari

Without Kerberos authentication you cannot connect to HiveServer2 (hdp202) either. Here you only need to create a principal for the kangll user and authenticate with kinit kangll.
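A quick way to sanity-check both points from the command line (the conf path follows the HDP layout and is an assumption):

# if the ranger-hive-plugin is enabled, Ranger config files appear in the Hive conf directory
ls /etc/hive/conf/ | grep ranger

# authenticate as kangll before trying to connect to HiveServer2
kinit kangll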

2. Open Ranger's Web UI (the URL is the Ranger Admin machine's IP on port 6080) and click Add New Service

 

Note that a default service already exists after Ranger is installed, but Kerberos is not configured for it. We can simply modify this service.

3. Edit Service

  • jdbc.url*: jdbc:hive2://hdp202:2181,hdp201:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
  • jdbc.driverClassName :  org.apache.hive.jdbc.HiveDriver

It is fine if Test Connection passes. Sometimes it does not pass, but that is still okay; the policies will work when queries are executed later. Alternatively, jdbc.url can point directly at HiveServer2, i.e. <HiveServer2 IP>:10000.
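If you want to connect directly to HiveServer2 instead of going through ZooKeeper discovery, the connection looks roughly like this (port 10000 is the HDP default; the realm is a placeholder):

beeline -u "jdbc:hive2://hdp202:10000/default;principal=hive/hdp202@<REALM>"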

4. Use the hive user to create a table and insert data

Get a ticket:

[root@hdp202 keytabs]# kinit -kt hive.service.keytab hive/[email protected]

[hive@hdp202 keytabs]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://hdp202:2181,hdp201:2181/default;password=hive;principal=hive/[email protected];serviceDiscoveryMode=zooKeeper;user=hive;zooKeeperNamespace=hiveserver2
20/07/07 21:20:47 [main]: INFO jdbc.HiveConnection: Connected to hdp202:10000
Connected to: Apache Hive (version 3.1.0.3.1.4.0-315)
Driver: Hive JDBC (version 3.1.0.3.1.4.0-315)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0.3.1.4.0-315 by Apache Hive
0: jdbc:hive2://hdp202:2181,hdp201:2181/defau> show databases;
+---------------------+
|    database_name    |
+---------------------+
| default             |
| information_schema  |
| ranger_hive         |
| sys                 |
+---------------------+
4 rows selected (0.154 seconds)
0: jdbc:hive2://hdp202:2181,hdp201:2181/defau> use ranger_hive;
0: jdbc:hive2://hdp202:2181,hdp201:2181/defau> create table employee(name String,age int,address String)  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
0: jdbc:hive2://hdp202:2181,hdp201:2181/defau> load data local inpath '/hadoop/data/ranger_hive.txt' into table employee; 
0: jdbc:hive2://hdp202:2181,hdp201:2181/defau> select * from employee;

+----------------+---------------+-------------------+
| employee.name  | employee.age  | employee.address  |
+----------------+---------------+-------------------+
| kangna         | 12            | shanxi            |
| zhangsan       | 34            | Shanghai          |
| lisi           | 23            | beijing           |
| wangwu         | 21            | guangzhou         |
+----------------+---------------+-------------------+
4 rows selected (2.285 seconds)

5. View and modify the policy configuration

 

 

After granting the kangll user operation permissions on the database, we go back, connect to HiveServer2 as the kangll user again, and run show databases; this time no error is reported.

Switch to the kangll user, obtain a ticket with kinit, and connect to HiveServer2:

[kangll@hdp202 keytabs]$ kinit kangll
Password for [email protected]: 
[kangll@hdp202 keytabs]$ klist
Ticket cache: FILE:/tmp/krb5cc_1017
Default principal: [email protected]

Valid starting       Expires              Service principal
07/07/2020 21:57:04  07/08/2020 21:57:04  krbtgt/[email protected]
[kangll@hdp202 keytabs]$ beeline
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://hdp202:2181,hdp201:2181/default;principal=hive/[email protected];serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
20/07/07 21:57:27 [main]: INFO jdbc.HiveConnection: Connected to hdp202:10000
Connected to: Apache Hive (version 3.1.0.3.1.4.0-315)
Driver: Hive JDBC (version 3.1.0.3.1.4.0-315)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0.3.1.4.0-315 by Apache Hive
0: jdbc:hive2://hdp202:2181,hdp201:2181/defau> show databases;
+----------------+
| database_name  |
+----------------+
| ranger_hive    |
+----------------+
1 row selected (0.157 seconds)
0: jdbc:hive2://hdp202:2181,hdp201:2181/defau> use ranger_hive;
0: jdbc:hive2://hdp202:2181,hdp201:2181/defau> show tables;
+-----------+
| tab_name  |
+-----------+
| employee  |
+-----------+
1 row selected (0.044 seconds)
0: jdbc:hive2://hdp202:2181,hdp201:2181/defau> select * from employee;
+----------------+---------------+-------------------+
| employee.name  | employee.age  | employee.address  |
+----------------+---------------+-------------------+
| kangna         | 12            | shanxi            |
| zhangsan       | 34            | Shanghai          |
| lisi           | 23            | beijing           |
| wangwu         | 21            | guangzhou         |
+----------------+---------------+-------------------+
4 rows selected (0.271 seconds)
0: jdbc:hive2://hdp202:2181,hdp201:2181/defau> 

Without the create permission, creating a table fails

After enabling the permission to create tables, table creation succeeds

6. Ranger Audit module

Note: Operations performed by the kangll user on cluster components are recorded even before the corresponding policy is created, but with the result Denied. After the policy is created, the kangll user's query in Hive is recorded with the result Allowed.

7. Hive supplement

In addition to the Access policy type, Hive also supports the Masking and Row Level Filter policy types. No specific demonstration is given here.

Policy Type: Masking

Policy Type: Row Level Filter

2.4 YARN permission control

1. Submit a test job with the wordcount example that ships with MapReduce

(1) Create the wordcount.txt file and upload it to HDFS

[root@hdp201 tmp]# vim wordcount.txt 
[root@hdp201 tmp]# cat wordcount.txt 
world is a new world
I will do my world do this job
bye bye
[root@hdp201 tmp]# hdfs dfs -put wordcount.txt /data/input

(2) Location of the wordcount example jar in HDP

[kangll@hdp201 mapreduce]$ pwd
/usr/hdp/3.1.4.0-315/hadoop/hadoop/share/hadoop/mapreduce

 (3) Run the jar directly as the kangll user, without configuring any policy for that user

[kangll@hdp201 mapreduce]$ hadoop jar hadoop-mapreduce-examples-3.1.1.3.1.4.0-315.jar wordcount /data/input/wordcount.txt /data/output
20/07/06 18:02:36 INFO client.RMProxy: Connecting to ResourceManager at hdp201/10.168.138.188:8050
20/07/06 18:02:36 INFO client.AHSProxy: Connecting to Application History server at hdp202/10.174.96.212:10200
20/07/06 18:02:37 INFO hdfs.DFSClient: Created token for kangll: HDFS_DELEGATION_TOKEN [email protected], renewer=yarn, realUser=, issueDate=1594029757336, maxDate=1594634557336, sequenceNumber=5, masterKeyId=6 on 10.168.138.188:8020
20/07/06 18:02:37 INFO security.TokenCache: Got dt for hdfs://hdp201:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 10.168.138.188:8020, Ident: (token for kangll: HDFS_DELEGATION_TOKEN [email protected], renewer=yarn, realUser=, issueDate=1594029757336, maxDate=1594634557336, sequenceNumber=5, masterKeyId=6)
20/07/06 18:02:37 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/kangll/.staging/job_1594028454535_0001
20/07/06 18:02:37 INFO input.FileInputFormat: Total input files to process : 1
20/07/06 18:02:38 INFO mapreduce.JobSubmitter: number of splits:1
20/07/06 18:02:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1594028454535_0001
20/07/06 18:02:38 INFO mapreduce.JobSubmitter: Executing with tokens: [Kind: HDFS_DELEGATION_TOKEN, Service: 10.168.138.188:8020, Ident: (token for kangll: HDFS_DELEGATION_TOKEN [email protected], renewer=yarn, realUser=, issueDate=1594029757336, maxDate=1594634557336, sequenceNumber=5, masterKeyId=6)]
20/07/06 18:02:38 INFO conf.Configuration: found resource resource-types.xml at file:/etc/hadoop/3.1.4.0-315/0/resource-types.xml
20/07/06 18:02:38 INFO impl.TimelineClientImpl: Timeline service address: hdp202:8188
20/07/06 18:02:39 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/kangll/.staging/job_1594028454535_0001
java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.security.AccessControlException: User kangll does not have permission to submit application_1594028454535_0001 to queue default
	at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:427)
	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320)
	at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645)
	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
	at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
Caused by: org.apache.hadoop.security.AccessControlException: User kangll does not have permission to submit application_1594028454535_0001 to queue default

We can see from the job log that the kangll user does not have permission to submit the task to the default queue; let's enable it in the policy configuration below.

2. Policy configuration

3. Re-execute the job and check the results
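With the queue-submission policy in place the job finishes, and the result can be read back from HDFS. For the sample input above, the counts should come out roughly as follows (part-r-00000 is the default reducer output file name):

hdfs dfs -cat /data/output/part-r-00000
# expected for the sample input:
#   I 1, a 1, bye 2, do 2, is 1, job 1, my 1, new 1, this 1, will 1, world 3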

 


Origin: blog.csdn.net/qq_35995514/article/details/106575381