HDFS Permissions User Guide

Original address: http://hadoop.apache.org/docs/r1.0.4/cn/hdfs_permissions_guide.html

Overview

The Hadoop Distributed File System (HDFS) implements a permission model for files and directories that is similar to the POSIX model. Every file and directory has an owner and a group. A file or directory has separate permissions for its owner, for other users who are members of its group, and for all other users. For files, the r permission is required to read the file, and the w permission is required to write or append to it. For directories, the r permission is required to list the contents of the directory, the w permission is required to create or delete files or subdirectories in it, and the x permission is required to access a child of the directory. Unlike the POSIX model, files have no sticky, setuid, or setgid bits, because HDFS has no notion of executable files. For simplicity, directories have no sticky, setuid, or setgid bits either. Collectively, the permissions of a file or directory are its mode. HDFS follows the Unix conventions for representing and displaying modes, including the use of octal numbers. When a file or directory is created, its owner is the user identity of the client process, and its group is the group of the parent directory (the BSD rule).
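To make the octal convention concrete, here is a minimal sketch that renders a 9-bit mode in the familiar rwx form. The class name ModeStringSketch and the method toSymbolic are made up for illustration and are not part of the Hadoop API.

```java
// Illustrative only: ModeStringSketch is a hypothetical helper, not a Hadoop class.
final class ModeStringSketch {

    /** Renders the low nine bits of a mode as a string such as "rwxr-xr-x". */
    static String toSymbolic(int mode) {
        String flags = "rwxrwxrwx";
        StringBuilder sb = new StringBuilder(9);
        for (int i = 0; i < 9; i++) {
            // Bit 8 is the owner's r bit; bit 0 is the x bit for other users.
            boolean set = (mode & (1 << (8 - i))) != 0;
            sb.append(set ? flags.charAt(i) : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A leading 0 makes a Java integer literal octal.
        System.out.println(toSymbolic(0755)); // rwxr-xr-x
        System.out.println(toSymbolic(0640)); // rw-r-----
    }
}
```

Each octal digit covers one class of user (owner, group, other), which is why three digits are enough to describe a mode.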

The identity of each client process that accesses HDFS has two parts: a user name and a list of group names. Whenever a client process accesses a file or directory foo, HDFS performs a permission check:

  • If the user name matches the owner of foo, the owner permissions are tested;
  • Otherwise, if the group of foo matches any name in the group list, the group permissions are tested;
  • Otherwise, the other permissions of foo are tested.

If the permission check fails, the client operation fails.
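The three rules above can be sketched in a few lines. This is a simplified model, not the name node's actual code; the class PermissionCheckSketch, the method isAllowed, and the sample users are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: a simplified model of the check, not Hadoop's implementation.
final class PermissionCheckSketch {
    static final int READ = 4, WRITE = 2, EXECUTE = 1;

    /**
     * Returns true if the caller may perform the requested access.
     * mode is the 9-bit mode of the file; requested is a mask of READ, WRITE, EXECUTE.
     */
    static boolean isAllowed(String user, Set<String> groups,
                             String owner, String group, int mode, int requested) {
        int granted;
        if (user.equals(owner)) {
            granted = (mode >> 6) & 7;   // owner permissions
        } else if (groups.contains(group)) {
            granted = (mode >> 3) & 7;   // group permissions
        } else {
            granted = mode & 7;          // other permissions
        }
        return (granted & requested) == requested;
    }

    public static void main(String[] args) {
        Set<String> staff = new HashSet<>(Arrays.asList("staff"));
        // foo is owned by alice, group staff, mode 640 (rw-r-----).
        System.out.println(isAllowed("alice", staff, "alice", "staff", 0640, WRITE)); // true
        System.out.println(isAllowed("bob", staff, "alice", "staff", 0640, READ));    // true
        System.out.println(isAllowed("bob", staff, "alice", "staff", 0640, WRITE));   // false
    }
}
```

Note that exactly one class of permissions is consulted. In this model an owner whose owner bits deny an access is refused even if the group or other bits would allow it.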

User identity

In this version of Hadoop, the identity of a client process is whatever the host operating system says it is. For Unix-like systems:

  • The user name is the equivalent of `whoami`;
  • The group list is the equivalent of `bash -c groups`.

Other mechanisms for determining user identity (such as Kerberos or LDAP) will be added in the future. It is unrealistic to expect the mechanism described above to prevent one user from impersonating another. Even so, this identity mechanism combined with the permission model allows a cooperative community to share file system resources in an organized fashion.

In any case, the user identity mechanism is external to HDFS itself. HDFS provides no way to create user identities, establish groups, or process user credentials.

Understanding the implementation

Each file or directory operation passes the full path name to the name node, and the permission checks are applied along that path for each operation. The client framework implicitly associates the user identity with the connection to the name node, which reduces the need to change the existing client API. It has always been the case that an operation on a file may succeed once and then fail when repeated, because the file or some directory on the path no longer exists. For example, when a client first begins reading a file, it makes a first request to the name node to find the location of the first blocks of the file; a second request made to find additional blocks may fail. On the other hand, deleting a file does not revoke access by a client that already knows the blocks of the file. With permissions in place, a client's access to a file may likewise be withdrawn between requests. Again, changing permissions does not revoke the access of a client that already knows the file's blocks.

The MapReduce framework delegates the user identity by passing strings, without any other special security considerations. The owner and group of a file or directory are stored as strings; they are not converted to numeric user and group IDs as is conventional in Unix.

The permissions feature in this release requires no change to the behavior of data nodes. Blocks on the data nodes carry none of the Hadoop ownership or permission attributes.

Filesystem API changes

All methods that take a path parameter may throw an AccessControlException if the permission check fails.

New methods:

  • public FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException;
  • public boolean mkdirs(Path f, FsPermission permission) throws IOException;
  • public void setPermission(Path p, FsPermission permission) throws IOException;
  • public void setOwner(Path p, String username, String groupname) throws IOException;
  • public FileStatus getFileStatus(Path f) throws IOException; now also returns the owner, group, and mode associated with the path.

The mode of a new file or directory is restricted by the umask configuration parameter. In the expressions below, ^umask denotes the bitwise complement of umask. When the existing create(path, …) method (without the permission parameter) is used, the mode of the new file is 666 & ^umask. When the new create(path, permission, …) method (with the permission parameter P) is used, the mode of the new file is P & ^umask & 666. When a new directory is created with the existing mkdirs(path) method (without the permission parameter), the mode of the new directory is 777 & ^umask. When the new mkdirs(path, permission) method (with the permission parameter P) is used, the mode of the new directory is P & ^umask & 777.
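The umask arithmetic can be checked with a small worked example. This sketch only reproduces the bit operations described above; the class UmaskSketch and its methods are hypothetical, and Java's ~ operator plays the role of ^umask.

```java
// Illustrative only: reproduces the mode arithmetic, not Hadoop's own code.
final class UmaskSketch {

    /** Mode of a new file: P & ^umask & 666. Pass 0666 for the older create(path, ...) form. */
    static int fileMode(int requested, int umask) {
        return requested & ~umask & 0666;
    }

    /** Mode of a new directory: P & ^umask & 777. Pass 0777 for the older mkdirs(path) form. */
    static int dirMode(int requested, int umask) {
        return requested & ~umask & 0777;
    }

    public static void main(String[] args) {
        int umask = 022;
        System.out.println(Integer.toOctalString(fileMode(0666, umask))); // 644
        System.out.println(Integer.toOctalString(dirMode(0777, umask)));  // 755
        // Requesting x on a file has no effect because of the final & 666.
        System.out.println(Integer.toOctalString(fileMode(0777, umask))); // 644
        System.out.println(Integer.toOctalString(dirMode(0770, 027)));    // 750
    }
}
```

The umask can only remove permissions from the requested mode; it never adds any.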

Shell command changes

New operations:

chmod [-R] mode file …
Only the owner of a file or the superuser may change the mode of the file.
chgrp [-R] group file …
The user invoking chgrp must belong to the specified group and be the owner of the file, or be the superuser.
chown [-R] [owner][:[group]] file …
The owner of a file may only be changed by the superuser.
ls file …
lsr file …
The output format has been adjusted to show the owner, group, and mode.
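The [owner][:[group]] argument of chown can name an owner, a group, or both. As a rough illustration of how such an argument splits, here is a hypothetical parser; ChownSpecSketch is not the shell's actual implementation, and the user and group names are invented.

```java
import java.util.Arrays;

// Illustrative only: a hypothetical parser for the [owner][:[group]] syntax.
final class ChownSpecSketch {

    /** Returns {owner, group}; an element is null when that part is absent. */
    static String[] parse(String spec) {
        int colon = spec.indexOf(':');
        String owner = colon < 0 ? spec : spec.substring(0, colon);
        String group = colon < 0 ? "" : spec.substring(colon + 1);
        return new String[] {
            owner.isEmpty() ? null : owner,
            group.isEmpty() ? null : group
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("alice:staff"))); // [alice, staff]
        System.out.println(Arrays.toString(parse("alice")));       // [alice, null]
        System.out.println(Arrays.toString(parse(":staff")));      // [null, staff]
    }
}
```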

The superuser

The superuser is the user whose identity the name node process runs under. Loosely, if you started the name node, you are the superuser. The superuser can do anything, because permission checks never fail for the superuser. There is no persistent record of who the superuser was; when the name node starts, the identity of the process determines who the superuser is for now. The HDFS superuser does not have to be the superuser of the name node host, nor do all clusters need to have the same superuser. Also, an experimenter running HDFS on a personal workstation conveniently becomes the superuser of that installation without any configuration.

In addition, the administrator can use a configuration parameter to identify a distinguished group. If set, members of this group are also superusers.
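The two ways of being a superuser can be summarized in a short sketch. This is a model of the rule as described here, not the name node's code; SuperuserSketch, isSuperuser, and the sample names are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: a simplified model of the superuser rule.
final class SuperuserSketch {

    /**
     * A caller is a superuser if it is the user running the name node,
     * or if it belongs to the configured superuser group.
     */
    static boolean isSuperuser(String user, Set<String> groups,
                               String nameNodeUser, String supergroup) {
        return user.equals(nameNodeUser) || groups.contains(supergroup);
    }

    public static void main(String[] args) {
        Set<String> admins = new HashSet<>(Arrays.asList("staff", "supergroup"));
        Set<String> staff = new HashSet<>(Arrays.asList("staff"));
        System.out.println(isSuperuser("hdfs", staff, "hdfs", "supergroup"));   // true
        System.out.println(isSuperuser("alice", admins, "hdfs", "supergroup")); // true
        System.out.println(isSuperuser("bob", staff, "hdfs", "supergroup"));    // false
    }
}
```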

The web server

The identity of the web server is a configuration parameter. The name node has no notion of the identity of the real user, so the web server acts as if it had the identity (user and groups) of a user chosen by the administrator. Unless the chosen identity is the superuser, parts of the namespace may be invisible to the web server.

Online upgrade

If the cluster starts with a version 0.15 data set (fsimage), all files and directories will have owner O, group G, and mode M, where O and G are the user and group identity of the superuser, and M is a configuration parameter.

Configuration parameters

dfs.permissions = true
If true, the permission system described above is enabled. If false, permission checking is turned off, but all other behavior is unchanged. Switching from one value to the other does not change the mode, owner, or group of any file or directory.

Regardless of whether permission checking is on or off, chmod, chgrp, and chown always check permissions. These commands are only useful in the context of permissions, so there is no backward-compatibility issue. This also lets administrators reliably set owners and permissions before turning on regular permission checking.
dfs.web.ugi = webuser, webgroup
The user name used by the web server. Setting this to the name of the superuser allows any web client to see everything. Changing it to an otherwise unused identity restricts web clients to what is visible through the "other" permissions. Additional groups may be added to the comma-separated list.
dfs.permissions.supergroup = supergroup
The name of the group whose members are superusers.
dfs.upgrade.permission = 777
The initial mode assigned during an upgrade. The x permission is never set for files. In configuration files, the decimal value 511 (base 10) may be used.
dfs.umask = 022
The umask used when creating files and directories. In configuration files, the decimal value 18 (base 10) may be used.
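The decimal values mentioned for dfs.upgrade.permission and dfs.umask are simply the octal modes written in base 10. A quick sketch of the conversion follows; the class OctalDecimalSketch and its methods are hypothetical names used only for this example.

```java
// Illustrative only: shows that octal 777 is decimal 511 and octal 022 is decimal 18.
final class OctalDecimalSketch {

    /** Parses a string of octal digits and returns its integer value. */
    static int octalToDecimal(String octal) {
        return Integer.parseInt(octal, 8);
    }

    /** Formats a value as octal digits (without a leading zero). */
    static String decimalToOctal(int value) {
        return Integer.toOctalString(value);
    }

    public static void main(String[] args) {
        System.out.println(octalToDecimal("777")); // 511
        System.out.println(octalToDecimal("022")); // 18
        System.out.println(decimalToOctal(511));   // 777
        System.out.println(decimalToOctal(18));    // 22
    }
}
```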

