Enable Kerberos authentication
• Install Kerberos
• Install and configure master KDC/Kerberos Server
Note: The Kerberos server can be any host on the same network as the Hadoop cluster
1. Install the packages required by the KDC: krb5-libs, krb5-server, and krb5-workstation
yum install krb5-libs krb5-server krb5-workstation
Verify the installation with: rpm -qa | grep krb5
2. After the installation completes, configure the /etc/krb5.conf file:
[libdefaults]
default_realm = EXAMPLE.COM   # change the default EXAMPLE.COM to the realm name you want
[realms]
EXAMPLE.COM = {
kdc = kerberos.example.com            # hostname of the KDC
admin_server = kerberos.example.com   # hostname of the admin server (the same host here)
}
3. Configure the /var/kerberos/krb5kdc/kdc.conf file:
[realms]
EXAMPLE.COM = {
#master_key_type = aes256-cts
acl_file = /var/kerberos/krb5kdc/kadm5.acl
dict_file = /usr/share/dict/words
admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
}
EXAMPLE.COM here must be consistent with the realm configured in /etc/krb5.conf.
4. Create a Kerberos database
This step may take a long time. When it finishes, a set of database files is created under /var/kerberos/krb5kdc/, and you are prompted to enter the database administrator's password.
kdb5_util create -r EXAMPLE.COM -s
Other operations:
To delete the Kerberos database (when rebuilding it, first remove the principal-related files under /var/kerberos/krb5kdc):
kdb5_util -r EXAMPLE.COM destroy
5. Create an administrator principal and enter its password (admin in this example). kadmin.local can be run directly on the KDC without Kerberos authentication.
/usr/sbin/kadmin.local -q "addprinc admin/admin"
Grant the database administrator ACL permissions by editing the kadm5.acl file (* stands for all permissions). View it with:
cat /var/kerberos/krb5kdc/kadm5.acl
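Carrying over the EXAMPLE.COM realm from the earlier steps, a minimal kadm5.acl might contain a single line granting all permissions to every principal with an /admin instance (this exact pattern is an assumption; adjust to your own realm):

```
*/[email protected]    *
```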
6. Set the Kerberos services to start at boot and turn off the firewall
chkconfig krb5kdc on
chkconfig kadmin on
chkconfig iptables off
7. Start the krb5kdc and kadmind processes
/usr/sbin/kadmind
/usr/sbin/krb5kdc
or
service krb5kdc start
service kadmin start
service krb5kdc status
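On systemd-based distributions the SysV-style chkconfig/service commands above may be unavailable; assuming the unit names krb5kdc and kadmin (as shipped by the krb5-server package on EL7 and later), the equivalent sketch is:

```shell
# enable at boot and start immediately
systemctl enable --now krb5kdc kadmin
# check status
systemctl status krb5kdc
```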
8. Check that Kerberos is running normally
kinit admin/admin
9. Use admin to log in to Kerberos
kinit admin/[email protected]   # obtain the initial ticket
klist                              # view the current tickets
10. Use the kadmin.local tool to create a user, and use the listprincs command to check that the user was created successfully.
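The step above can be sketched as follows (the principal name test is a placeholder; run on the KDC, where kadmin.local needs no prior ticket):

```shell
kadmin.local -q "addprinc test"    # prompts for the new user's password
kadmin.local -q "listprincs"       # verify [email protected] appears in the list
```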
11. Use the administrator to create a keytab authentication file for the user
Execute under kadmin
addprinc -randkey [email protected]
xst -k service.keytab test
The keytab is written to the current working directory by default (e.g. /tmp/ when run from there)
View the keytab file:
klist -k -t /etc/security/service.keytab
Or run directly:
ktadd -k /root/wangjy.keytab -norandkey [email protected]
This keytab file is equivalent to the user's long-term key and can be used to authenticate the account from any host.
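Once a keytab exists, the user can authenticate non-interactively from any host that can reach the KDC. A sketch, reusing the example keytab path and principal from above:

```shell
kinit -kt /root/wangjy.keytab [email protected]   # obtain a ticket using the keytab, no password prompt
klist                                               # confirm the ticket was granted
```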
12. Use administrator to delete users
Execute under kadmin
delprinc -force [email protected]
• Install LDAP client
yum install openldap-clients
• Install Kerberos Client
Install the Kerberos client on the other hosts in the cluster (krb5-server is only needed on the KDC host):
yum install krb5-libs krb5-workstation
• Enable Kerberos authentication for Hadoop environment
Note: The following steps are only for CDH5.5.X version of Hadoop
• Basic environment
1. Configure the KDC and its domains
2. Install openldap-clients on the Cloudera Manager Server host
3. Install krb5-workstation, krb5-libs on other nodes of the Hadoop cluster
4. Add the following configuration information to /var/kerberos/krb5kdc/kdc.conf on the host where the KDC is located
max_life = 1d
max_renewable_life = 7d
kdc_tcp_ports = 88
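In kdc.conf, the lifetime settings belong inside the realm stanza, while kdc_tcp_ports can go under [kdcdefaults]; a hedged sketch of where the lines above fit (existing settings elided):

```
[kdcdefaults]
 kdc_tcp_ports = 88

[realms]
 EXAMPLE.COM = {
  max_life = 1d
  max_renewable_life = 7d
  ...
 }
```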
5. If YARN-HA is turned on, you need to clear the relevant status in Zookeeper:
Stop YARN and format the State Store. This can be done through the Cloudera Manager page.
• Installation steps
1. Enable conditions:
Set up a running KDC. Cloudera Manager supports MIT KDC and Active Directory.
The KDC should be configured with a non-zero ticket lifetime and a renewable lifetime. CDH does not work properly if tickets are not renewable.
If you want to use Active Directory, the OpenLDAP client libraries should be installed on the Cloudera Manager Server host, and the Kerberos client libraries on all hosts.
After all the above conditions are confirmed, check Yes and go to the next step.
2. KDC information
3. KRB5 configuration
Whether to deploy krb5.conf to each node in the cluster
4. Import KDC Account Management Ticket
5. Configure the HDFS Datanode port
6. Successful start
• Possible errors
1. Communication failure with server while initializing kadmin interface
Reason:
The host specified for the admin server (also known as the master KDC) does not have the kadmind daemon running.
Workaround:
Make sure to specify the correct hostname for the master KDC. If the correct hostname is specified, make sure kadmind is running on the specified master KDC.
• Turn off Kerberos authentication
• Close steps
1. Modify the hdfs configuration core-site.xml
hadoop.security.authentication -> simple
hadoop.security.authorization -> false
dfs.datanode.address -> from 1004 (for Kerberos) to 50010 (default)
dfs.datanode.http.address -> from 1006 (for Kerberos) to 50075 (default)
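When not editing through Cloudera Manager, the changes above correspond roughly to the following raw property sketch (the first two go in core-site.xml, the DataNode ports in hdfs-site.xml; host bindings are assumptions):

```xml
<property>
  <name>hadoop.security.authentication</name>
  <value>simple</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>false</value>
</property>
<!-- hdfs-site.xml: revert the privileged Kerberos ports to the defaults -->
<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:50010</value>
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:50075</value>
</property>
```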
2. HBase configuration
hbase.security.authentication -> simple
hbase.security.authorization -> false
3. ZooKeeper configuration
enableSecurity-> false
4. Hue configuration
Delete Kerberos Ticket Renewer instance
• Possible errors
DataNodes fail to start
Exception information:
java.io.IOException: Failed on local exception: java.net.SocketException: Permission denied; Host Details : local host is: "xxxxx"; destination host is: (unknown)
Workaround:
Restore the datanode's dfs.datanode.address to 50010 and dfs.datanode.http.address to 50075.
The NameNode remains in standby state (seen with YARN HA enabled)
• Development machines connect to Kerberos
• Windows
1. Configure environment variables: USERDNSDOMAIN=HADOOP.COM
2. Modify mapred.properties
# Kerberos authentication configuration
hadoop.security.authentication=kerberos
#kerberos.file.path=/etc/krb5.conf
kerberos.file.path=E:/etc/security/keytab/krb5.conf
hdfs.user=hdfs
# MapReduce authenticating user
dfs.client.keytab.file=/etc/security/keytab/hebei.keytab
# Hive authenticating user
hive.dfs.client.kerberos.principal=hiveuser/[email protected]
hive.dfs.client.keytab.file=E:/etc/security/keytab/hiveuser.keytab
FAQ
• Cancel access control
The Hadoop file system (HDFS) has permission control similar to Linux permissions. In a test environment, find the dfs.permissions option and uncheck it.
Note: The production environment must use HDFS with permission control
• Modifying the number of replicas
The Hadoop file system (HDFS) keeps three copies of each block by default to ensure reliability. In a test environment, or when disk capacity is small, find the dfs.replication configuration and change the number of replicas (an integer greater than 0).
Note: The production environment must keep three or more replicas, depending on the data and disk capacity
• Adding space when the Hadoop file system (HDFS) runs low
Insufficient space in HDFS causes errors in programs such as MR and Hive, so storage must be added to each device in the Hadoop cluster. After the operating system mounts the new storage, the configuration of some Hadoop components must be modified.
On the CM management page (port 7180 of the master node), click HDFS to enter the HDFS page, and click Configure:
Search for the dfs.datanode.data.dir configuration item, click "+", and add a new storage directory /newdisk1/dfs/dn
Search for the hadoop.log.dir configuration item and change the original directory to the new storage directory /newdisk1/var/log/hadoop-mapreduce
On the CM management page (port 7180 of the main node), click YARN to enter the YARN page, click Configure, search for the following configuration items in turn, and modify them to a new storage directory:
Configuration item: yarn.nodemanager.local-dirs Storage directory: /newdisk1/yarn/nm
Configuration item: yarn.nodemanager.log-dirs Storage directory: /newdisk1/var/log/hadoop-yarn/container
Configuration item: hadoop.log.dir Storage directory: /newdisk1/var/log/hadoop-yarn
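After the new directories are configured and the affected roles restarted, the added capacity can be verified with a sketch like:

```shell
# shows configured capacity, DFS used, and remaining space per DataNode
hdfs dfsadmin -report
```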
• If an error occurs or is interrupted during the installation process, and if you want to start the installation again, you can perform the following operations:
master node: shut down server and agent
/opt/cm-5.5.0/etc/init.d/cloudera-scm-server stop
/opt/cm-5.5.0/etc/init.d/cloudera-scm-agent stop
rm -rf /opt/cloudera/parcel-cache
rm -rf /opt/cloudera/parcels
Clear the database :
drop database scm;
Rebuild the database :
CREATE DATABASE scm OWNER scm ENCODING 'UTF8';
slave node : close the agent
/opt/cm-5.5.0/etc/init.d/cloudera-scm-agent stop
rm -rf /opt/cloudera
Restart the service :
master node: open server, agent
/opt/cm-5.5.0/etc/init.d/cloudera-scm-server start
/opt/cm-5.5.0/etc/init.d/cloudera-scm-agent start
slave node: open agent
/opt/cm-5.5.0/etc/init.d/cloudera-scm-agent start