Table of Contents
1. Ranger permission overview
2. Configuration and implementation of specific components
   2.1 HDFS permission control
   2.2 HBase permission control
   2.3 Hive permission control
   2.4 YARN permission control

1. Ranger permission overview
1. What is Ranger?
Ranger controls permissions within components: for example, HDFS read/write/execute, Hive and HBase read/write/update, YARN queue resource usage and task-submission rights. Currently Ranger supports components such as HDFS, Hive, HBase, Kafka, and YARN, and provides fine-grained control over resource access for groups and users.
2. Summary
This article focuses on integrating Ranger with Kerberos for fine-grained user access control, and describes the Ranger Audit function:
- HDFS permission control (Kerberos integration)
- HBase permission control
- Hive permission control
- YARN task-submission and queue resource access control
2. Configuration and implementation of specific components
Kerberos has already been installed on the cluster, and the kangll and ranger_hive users and their Kerberos principals have been created.
The Kerberos client supports two authentication modes: principal + password, and principal + keytab. The former is suited to interactive use by people (for example, hadoop fs -ls); the latter is suited to services, such as YARN's ResourceManager and NodeManager.
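For example, the two modes look like this (the keytab path below is an illustrative assumption; the realm HADOOP.COM is the one used throughout this article):

```shell
# Interactive mode: principal + password (prompts for the password)
kinit [email protected]

# Service mode: principal + keytab (non-interactive, suitable for daemons)
kinit -kt /etc/security/keytabs/kangll.keytab [email protected]

# Inspect the ticket obtained either way
klist
```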
2.1 HDFS permission control
1. Create a new operating-system user: useradd kangll (you can script this to create the user on every node in the cluster)
2. Perform Kerberos authentication for the kangll user on the KDC host: copy the generated keytab file to the Hadoop cluster machines, fix its permissions, and run kinit to obtain a ticket for the service
3. Check the permission settings on the /kangll directory. At this point only the hdfs user can read and write
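The three steps above can be sketched as follows (the principal name and keytab path are illustrative; adjust them to your environment):

```shell
# 1. Create the OS user (in practice, loop over all cluster nodes)
useradd kangll

# 2. On the KDC host: create the principal and export its keytab
kadmin.local -q "addprinc -randkey [email protected]"
kadmin.local -q "xst -k /etc/security/keytabs/kangll.keytab [email protected]"

# Copy the keytab to the cluster nodes, fix ownership, then obtain a ticket
chown kangll:kangll /etc/security/keytabs/kangll.keytab
kinit -kt /etc/security/keytabs/kangll.keytab [email protected]

# 3. Check the current permissions on the directory
hdfs dfs -ls /
```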
4. Add the Kerberos configuration to the service
Note: (1) The hadoop.security.auth_to_local setting here is generated in core-site.xml after Kerberos is installed. For its exact meaning, see: https://www.jianshu.com/p/2ad4be7ecf39
RULE:[1:$1@$0]([email protected])s/.*/hdfs/
RULE:[2:$1@$0]([email protected])s/.*/hdfs/
(2) The principal can be found in the Kerberos.csv file downloaded when installing Kerberos, or by entering kadmin.local and running listprincs
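For reference, these mapping rules live in core-site.xml under hadoop.security.auth_to_local; each RULE extracts a component of the principal and rewrites it to a local user. The snippet below reproduces the two rules above in their XML form (the trailing DEFAULT, which maps remaining principals to their first component, is the usual ending and is an assumption here):

```xml
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0]([email protected])s/.*/hdfs/
    RULE:[2:$1@$0]([email protected])s/.*/hdfs/
    DEFAULT
  </value>
</property>
```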
5. Create the corresponding policy
At this point the /kangll folder is readable and writable only by the hdfs user; no other user can operate on it.
We now grant the kangll user read permission on the HDFS /kangll folder.
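Policies are normally created in the Ranger WebUI, but the same thing can be done through Ranger Admin's public REST API. The sketch below is illustrative only: the service name hdp_hadoop, the admin:admin credentials, and the policy name are assumptions, and only the 6080 port comes from this article; adjust everything to your install:

```shell
# Create a policy granting kangll read access to /kangll (illustrative sketch)
curl -u admin:admin -H "Content-Type: application/json" \
  -X POST http://hdp202:6080/service/public/v2/api/policy \
  -d '{
    "service": "hdp_hadoop",
    "name": "kangll_read",
    "resources": { "path": { "values": ["/kangll"], "isRecursive": true } },
    "policyItems": [{
      "users": ["kangll"],
      "accesses": [{ "type": "read", "isAllowed": true }]
    }]
  }'
```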
6. Authorization verification
Read and write the /kangll folder. Below we can see that, because only read permission is enabled, writing files fails.
Next, I enable the write permission as well.
Verify again; this time both operations succeed.
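The verification steps can be reproduced from the shell as the kangll user (the file name below is illustrative):

```shell
# With only "read" allowed: listing and reading work, writing is denied
hdfs dfs -ls /kangll
hdfs dfs -put test.txt /kangll   # fails with "Permission denied" until write is enabled

# After enabling "write" in the policy, the same put succeeds
hdfs dfs -put test.txt /kangll
hdfs dfs -cat /kangll/test.txt
```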
7. Ranger Audit
Note: the Ranger Audit module records the kangll user's operations; here the kangll user's write to the /kangll folder on HDFS is shown with the result "allowed".
2.2 HBase permission control
1. Policy configuration
2. Before the create-table permission is added
3. After the create, read, and write table permissions are added
4. The policy grants only read permission
Read permission is enabled, so data can be read
Write permission is not enabled, so writes are denied
After the kangll user's write permission is enabled, writes succeed
Create the kangll user to control its operation permissions on HBase tables. As with Hive tables, you can grant permissions on specific tables and columns; users can be granted specific table operations such as create, read, write, and admin.
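A quick way to verify the HBase policy is from the hbase shell as the kangll user (the table name below is illustrative). Before the policy is created, these operations fail with an AccessDeniedException; after create/read/write are granted, they succeed:

```shell
# Create a table (requires the "create" permission)
echo "create 'kangll_test', 'cf'" | hbase shell

# Write and read a row (require "write" and "read" respectively)
echo "put 'kangll_test', 'row1', 'cf:name', 'kangll'" | hbase shell
echo "scan 'kangll_test'" | hbase shell
```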
5. Ranger Audit
2.3 Hive permission control
1. First confirm that Ambari has enabled the ranger-hive-plugin
Without a policy, the kangll user cannot connect to HiveServer2 (hdp202) either. Here you only need principal authentication for the kangll user: kinit kangll
2. Open Ranger's WebUI (the URL is the Ranger Admin machine's IP:6080) and click Add New Service
Note that a default service already exists after Ranger is installed, but it has no Kerberos configuration; we can simply modify this service.
3. Edit Service
- jdbc.url*: jdbc:hive2://hdp202:2181,hdp201:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
- jdbc.driverClassName: org.apache.hive.jdbc.HiveDriver
If Test Connection passes, fine. Sometimes it does not pass, but that is okay; policies will still work when executed later. jdbc.url can also point at HiveServer2 directly, as IP:10000.
4. Use hive users to create tables and insert data
Get a ticket:
[root@hdp202 keytabs]# kinit -kt hive.service.keytab hive/[email protected]
[hive@hdp202 keytabs]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://hdp202:2181,hdp201:2181/default;password=hive;principal=hive/[email protected];serviceDiscoveryMode=zooKeeper;user=hive;zooKeeperNamespace=hiveserver2
20/07/07 21:20:47 [main]: INFO jdbc.HiveConnection: Connected to hdp202:10000
Connected to: Apache Hive (version 3.1.0.3.1.4.0-315)
Driver: Hive JDBC (version 3.1.0.3.1.4.0-315)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0.3.1.4.0-315 by Apache Hive
0: jdbc:hive2://hdp202:2181,hdp201:2181/defau> show databases;
+---------------------+
| database_name |
+---------------------+
| default |
| information_schema |
| ranger_hive |
| sys |
+---------------------+
4 rows selected (0.154 seconds)
0: jdbc:hive2://hdp202:2181,hdp201:2181/defau> use ranger_hive;
0: jdbc:hive2://hdp202:2181,hdp201:2181/defau> create table employee(name String,age int,address String) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
0: jdbc:hive2://hdp202:2181,hdp201:2181/defau> load data local inpath '/hadoop/data/ranger_hive.txt' into table employee;
0: jdbc:hive2://hdp202:2181,hdp201:2181/defau> select * from employee;
+----------------+---------------+-------------------+
| employee.name | employee.age | employee.address |
+----------------+---------------+-------------------+
| kangna | 12 | shanxi |
| zhangsan | 34 | Shanghai |
| lisi | 23 | beijing |
| wangwu | 21 | guangzhou |
+----------------+---------------+-------------------+
4 rows selected (2.285 seconds)
5. View and modify the policy configuration
After granting the kangll user permissions on the database, we go back and use the kangll user to connect to HiveServer2 again; now the show databases command lists the Hive database without error.
Switch to the kangll user, kinit to obtain a ticket, and connect to HiveServer2:
[kangll@hdp202 keytabs]$ kinit kangll
Password for [email protected]:
[kangll@hdp202 keytabs]$ klist
Ticket cache: FILE:/tmp/krb5cc_1017
Default principal: [email protected]

Valid starting       Expires              Service principal
07/07/2020 21:57:04  07/08/2020 21:57:04  krbtgt/[email protected]
[kangll@hdp202 keytabs]$ beeline
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://hdp202:2181,hdp201:2181/default;principal=hive/[email protected];serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
20/07/07 21:57:27 [main]: INFO jdbc.HiveConnection: Connected to hdp202:10000
Connected to: Apache Hive (version 3.1.0.3.1.4.0-315)
Driver: Hive JDBC (version 3.1.0.3.1.4.0-315)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0.3.1.4.0-315 by Apache Hive
0: jdbc:hive2://hdp202:2181,hdp201:2181/defau> show databases;
+----------------+
| database_name  |
+----------------+
| ranger_hive    |
+----------------+
1 row selected (0.157 seconds)
0: jdbc:hive2://hdp202:2181,hdp201:2181/defau> use ranger_hive;
0: jdbc:hive2://hdp202:2181,hdp201:2181/defau> show tables;
+-----------+
| tab_name  |
+-----------+
| employee  |
+-----------+
1 row selected (0.044 seconds)
0: jdbc:hive2://hdp202:2181,hdp201:2181/defau> select * from employee;
+----------------+---------------+-------------------+
| employee.name  | employee.age  | employee.address  |
+----------------+---------------+-------------------+
| kangna         | 12            | shanxi            |
| zhangsan       | 34            | Shanghai          |
| lisi           | 23            | beijing           |
| wangwu         | 21            | guangzhou         |
+----------------+---------------+-------------------+
4 rows selected (0.271 seconds)
No permission to create table
Enable permission to create tables
6. Ranger Audit function module
Note: operations performed on the cluster components by the kangll user before the policy was created are also recorded, but with the permission result "denied". After the policy was created, the kangll user's query in Hive is recorded with the result "allowed".
7. Hive supplement
Besides the Access policy type, Hive also supports the Masking and Row Level Filter policy types. No concrete demonstration is given here.
- Policy Type: Masking
- Policy Type: Row Level Filter
2.4 YARN permission control
1. Use the MapReduce built-in wordcount example to submit a test task
(1) Create the wordcount.txt file and upload it to HDFS
[root@hdp201 tmp]# vim wordcount.txt
[root@hdp201 tmp]# cat wordcount.txt
world is a new world
I will do my world do this job
bye bye
[root@hdp201 tmp]# hdfs dfs -put wordcount.txt /data/input
(2) Location of the wordcount jar in HDP
[kangll@hdp201 mapreduce]$ pwd
/usr/hdp/3.1.4.0-315/hadoop/hadoop/share/hadoop/mapreduce
(3) Execute the jar directly as the kangll user, without configuring any policy
[kangll@hdp201 mapreduce]$ hadoop jar hadoop-mapreduce-examples-3.1.1.3.1.4.0-315.jar wordcount /data/input/wordcount.txt /data/output
20/07/06 18:02:36 INFO client.RMProxy: Connecting to ResourceManager at hdp201/10.168.138.188:8050
20/07/06 18:02:36 INFO client.AHSProxy: Connecting to Application History server at hdp202/10.174.96.212:10200
20/07/06 18:02:37 INFO hdfs.DFSClient: Created token for kangll: HDFS_DELEGATION_TOKEN [email protected], renewer=yarn, realUser=, issueDate=1594029757336, maxDate=1594634557336, sequenceNumber=5, masterKeyId=6 on 10.168.138.188:8020
20/07/06 18:02:37 INFO security.TokenCache: Got dt for hdfs://hdp201:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 10.168.138.188:8020, Ident: (token for kangll: HDFS_DELEGATION_TOKEN [email protected], renewer=yarn, realUser=, issueDate=1594029757336, maxDate=1594634557336, sequenceNumber=5, masterKeyId=6)
20/07/06 18:02:37 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/kangll/.staging/job_1594028454535_0001
20/07/06 18:02:37 INFO input.FileInputFormat: Total input files to process : 1
20/07/06 18:02:38 INFO mapreduce.JobSubmitter: number of splits:1
20/07/06 18:02:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1594028454535_0001
20/07/06 18:02:38 INFO mapreduce.JobSubmitter: Executing with tokens: [Kind: HDFS_DELEGATION_TOKEN, Service: 10.168.138.188:8020, Ident: (token for kangll: HDFS_DELEGATION_TOKEN [email protected], renewer=yarn, realUser=, issueDate=1594029757336, maxDate=1594634557336, sequenceNumber=5, masterKeyId=6)]
20/07/06 18:02:38 INFO conf.Configuration: found resource resource-types.xml at file:/etc/hadoop/3.1.4.0-315/0/resource-types.xml
20/07/06 18:02:38 INFO impl.TimelineClientImpl: Timeline service address: hdp202:8188
20/07/06 18:02:39 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/kangll/.staging/job_1594028454535_0001
java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.security.AccessControlException: User kangll does not have permission to submit application_1594028454535_0001 to queue default
at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:427)
at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320)
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645)
at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
Caused by: org.apache.hadoop.security.AccessControlException: User kangll does not have permission to submit application_1594028454535_0001 to queue default
From the task execution log we can see that the user kangll does not have permission to submit tasks to the default queue; let's enable it in the policy configuration below.
2. Policy configuration
3. Re-execute the job and check the results
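Once the queue-submission policy is in place, the job should run. Its word counts can be sanity-checked locally against the same input with standard coreutils (this only mimics wordcount's counting, not the MapReduce job itself):

```shell
# Split the test input into one word per line, then count occurrences;
# e.g. "world" appears 3 times and "bye" twice in the input above
printf 'world is a new world\nI will do my world do this job\nbye bye\n' \
  | tr -s ' ' '\n' | sort | uniq -c
```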