Overview of Kerberos (1)

Configure Kerberos authentication

Strong authentication that establishes user identity is the basis for secure access in Hadoop. Users need to reliably identify themselves and then have that identity propagated throughout the Hadoop cluster to access cluster resources.
Hortonworks uses Kerberos for authentication.
Kerberos is an industry standard for authenticating users and resources in a Hadoop cluster. HDP also includes Ambari, which simplifies Kerberos setup, configuration, and maintenance.
By default, Ambari requires users to authenticate with a username and password. Ambari uses this authentication mechanism whether you configure it to use an internal database or to synchronize with an external source (such as LDAP or Active Directory). Optionally, you can configure Ambari to authenticate with Kerberos tokens via SPNEGO (Simple and Protected GSSAPI Negotiation Mechanism).

Kerberos overview

A high-level overview of Kerberos authentication, including the Key Distribution Center, principals, the Authentication Server, Ticket-Granting Tickets, and the Ticket-Granting Server.
Strong authentication and established user identity are the basis for secure access in Hadoop. Users need to be able to reliably identify themselves and then have that identity propagated throughout the Hadoop cluster. Once this is done, those users can access resources (such as files or directories) or interact with the cluster (for example, by running MapReduce jobs). Beyond users, the cluster resources themselves (such as hosts and services) need to authenticate with each other, to prevent potentially malicious systems or daemons from posing as trusted components of the cluster to gain access to data.
Hadoop uses Kerberos as the basis for strong authentication and identity propagation for both users and services.
Kerberos is a third-party authentication mechanism in which users, and the services they want to access, rely on a third party (the Kerberos server) to authenticate each to the other. The Kerberos server itself is known as the Key Distribution Center, or KDC.
At a high level, it has three parts:
• A database of the users and services (known as principals) that it knows about, along with their respective Kerberos passwords
• An Authentication Server (AS) that performs the initial authentication and issues a Ticket-Granting Ticket (TGT)
• A Ticket-Granting Server (TGS) that issues subsequent service tickets based on the initial TGT

A user principal requests authentication from the AS. The AS returns a TGT that is encrypted using the user principal's Kerberos password, which is known only to the user principal and the AS. The user principal decrypts the TGT locally using its Kerberos password. From then on, until the ticket expires, the user principal can use the TGT to obtain service tickets from the TGS. Service tickets are what allow a principal to access various services.
Because cluster resources (hosts or services) cannot supply a password each time they need to decrypt a TGT, they use a special file, called a keytab, which contains the authentication credentials of the resource principal. The set of hosts, users, and services over which the Kerberos server has control is called a realm.
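
The exchange described above can be sketched as a toy model. This is only an illustration of the roles of the AS, the TGS, and the keytab, not how a real KDC is implemented: the XOR "sealing" is not secure, and the class name `ToyKDC`, the helper functions, the principal names, and the passwords are all hypothetical. Here the AS reply is sealed with the user's key and carries a TGT that only the TGS can open; the service's stored key plays the part of a keytab entry.

```python
import hashlib
import json
import time


def derive_key(password):
    """Derive a toy key from a password (real Kerberos uses dedicated string-to-key functions)."""
    return hashlib.sha256(password.encode()).digest()


def toy_seal(key, payload):
    """XOR the JSON payload with a key-derived stream. NOT secure, illustration only."""
    data = json.dumps(payload).encode()
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def toy_open(key, blob):
    """Reverse toy_seal; raises ValueError if the key does not match."""
    stream = hashlib.shake_256(key).digest(len(blob))
    return json.loads(bytes(a ^ b for a, b in zip(blob, stream)))


class ToyKDC:
    """Hypothetical KDC: a principal database plus AS and TGS roles."""

    def __init__(self, principals, tgs_secret="tgs-secret"):
        # Database of principals and the keys derived from their passwords.
        self.keys = {name: derive_key(pw) for name, pw in principals.items()}
        self.tgs_key = derive_key(tgs_secret)

    def authenticate(self, principal, lifetime=3600):
        """AS role: issue a TGT, wrapped so only the principal can unwrap it."""
        tgt = toy_seal(self.tgs_key, {"principal": principal,
                                      "expires": time.time() + lifetime})
        return toy_seal(self.keys[principal], {"tgt": tgt.hex()})

    def grant_service_ticket(self, tgt, service):
        """TGS role: exchange an unexpired TGT for a service ticket."""
        claims = toy_open(self.tgs_key, tgt)
        if claims["expires"] < time.time():
            raise PermissionError("TGT expired")
        return toy_seal(self.keys[service], {"client": claims["principal"],
                                             "service": service})


user = "alice@EXAMPLE.COM"                    # hypothetical user principal
service = "nn/host1.example.com@EXAMPLE.COM"  # hypothetical service principal
kdc = ToyKDC({user: "alice-password", service: "service-key"})

# The user decrypts the AS reply locally with a key derived from the password.
reply = kdc.authenticate(user)
tgt = bytes.fromhex(toy_open(derive_key("alice-password"), reply)["tgt"])

# The TGT is then exchanged for a service ticket, which the service opens
# with its stored key (the keytab analogue) instead of typing a password.
ticket = kdc.grant_service_ticket(tgt, service)
claims = toy_open(derive_key("service-key"), ticket)
print(claims["client"])
```

The user never sends the password to the service, and the service never needs an interactive login: each side only has to open the piece sealed for it.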

Terminology

Term | Description
Key Distribution Center, or KDC | The trusted source for authentication in a Kerberos-enabled environment.
Kerberos KDC Server | The machine, or server, that serves as the Key Distribution Center (KDC).
Kerberos Client | Any machine in the cluster that authenticates against the KDC.
Principal | The unique name of a user or service that authenticates against the KDC.
Keytab | A file that contains one or more principals and their keys.
Realm | The Kerberos network, which includes a KDC and a number of clients.
KDC Admin Account | An administrative account that Ambari uses to create principals and generate keytabs in the KDC.

Overview of Kerberos principals

Each service and sub-service in Hadoop must have its own principal. A principal name in a given realm consists of a primary name and an instance name. In this case, the instance name is the FQDN of the host that runs the service. Because services do not log in with a password to acquire their tickets, the authentication credentials of each service principal are stored in a keytab file, which is extracted from the Kerberos database and stored locally, in a secured directory, with the service principal on the service component host.


Principal and Keytab naming convention

Asset | Convention | Example
Principals | $service_component_name/[email protected] | nn/[email protected]
Keytabs | $service_component_abbreviation.service.keytab | /etc/security/keytabs/nn.service.keytab
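
As a rough illustration of this naming convention, the helpers below build and split names of the form primary/instance@REALM and derive the keytab path. The function names are hypothetical, and because the host and realm in the table are obscured, `host1.example.com` and `EXAMPLE.COM` are placeholder assumptions. Only the `/etc/security/keytabs` directory and the `.service.keytab` suffix come from the table.

```python
def format_principal(primary, instance, realm):
    """Build a service principal name: primary/instance@REALM."""
    return f"{primary}/{instance}@{realm}"


def parse_principal(name):
    """Split a principal into (primary, instance, realm).

    The instance is an empty string for names without a "/" part.
    """
    local, sep, realm = name.rpartition("@")
    if not sep or not local or not realm:
        raise ValueError(f"not a principal name: {name!r}")
    primary, _, instance = local.partition("/")
    return primary, instance, realm


def keytab_path(abbreviation, directory="/etc/security/keytabs"):
    """Keytab location following the $abbreviation.service.keytab convention."""
    return f"{directory}/{abbreviation}.service.keytab"


# Placeholder host and realm, chosen only for this sketch.
namenode = format_principal("nn", "host1.example.com", "EXAMPLE.COM")
print(namenode)                   # nn/host1.example.com@EXAMPLE.COM
print(parse_principal(namenode))  # ('nn', 'host1.example.com', 'EXAMPLE.COM')
print(keytab_path("nn"))          # /etc/security/keytabs/nn.service.keytab
```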

In the naming convention above, note the primary name of each service principal. These primary names (such as nn or hive) represent the NameNode or Hive service, respectively. Each primary name has the instance name appended to it, which is the FQDN of the host on which the service runs. This convention provides a unique principal name for services that run on multiple hosts (such as DataNodes and NodeManagers). Adding the host name distinguishes, for example, a request from DataNode A from a request from DataNode B. This is important for the following reasons:
• Compromised Kerberos credentials for one DataNode do not automatically lead to compromised Kerberos credentials for all DataNodes.
• If multiple DataNodes have exactly the same principal and connect to the NameNode at the same time, and the Kerberos authenticators they send happen to have the same timestamp, then the authentication is rejected as a replay request.
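
The replay problem in the second point can be illustrated with a deliberately simplified replay cache. This is a sketch under assumptions: a real Kerberos replay cache tracks more fields than a (principal, timestamp) pair, and the class name, principal names, and timestamp value here are hypothetical.

```python
class ReplayCache:
    """Remembers (principal, timestamp) pairs that a service has already seen."""

    def __init__(self):
        self._seen = set()

    def accept(self, principal, timestamp_us):
        """Return True for a new authenticator, False for a suspected replay."""
        key = (principal, timestamp_us)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


timestamp = 1_700_000_000_000_000  # the same microsecond timestamp on both hosts

# Two DataNodes sharing one principal: the second request looks like a replay.
shared = ReplayCache()
shared_results = [
    shared.accept("dn@EXAMPLE.COM", timestamp),
    shared.accept("dn@EXAMPLE.COM", timestamp),
]

# Per-host principals: the same timestamp no longer collides.
per_host = ReplayCache()
per_host_results = [
    per_host.accept("dn/host-a.example.com@EXAMPLE.COM", timestamp),
    per_host.accept("dn/host-b.example.com@EXAMPLE.COM", timestamp),
]

print(shared_results)    # [True, False]
print(per_host_results)  # [True, True]
```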

In addition to the Hadoop service principals, Ambari itself requires a set of Ambari principals to perform service "smoke" checks, perform alert health checks, and retrieve metrics from cluster components. Keytab files for the Ambari principals reside on each cluster host, just as keytab files for the service principals do.

Ambari principal | Description
Smoke and "Headless" service users | Used by Ambari to perform service "smoke" checks and run alert health checks.
Ambari Server user | When a cluster is enabled for Kerberos, the component REST endpoints (such as the YARN ATS component) require SPNEGO authentication. Ambari Server needs to access these APIs and requires a Kerberos principal in order to authenticate against them via SPNEGO.

Origin blog.csdn.net/m0_48187193/article/details/114871949