Big Data Permission Management: Research Process and Components (Part 1)

Table of Contents

1. Problem

2. Research process and results

3. Component introductions

Hue introduction

Sentry introduction

ACLs introduction

LDAP introduction



1. Problem

Permissions on big data platforms are a headache for many architects. On our platform, users can freely operate on Hive databases and HDFS directories, so access to the sensitive and non-sensitive areas of those Hive databases and HDFS directories is not effectively controlled.
Even when HDFS is controlled through ACLs, a program run by any user can still operate on both the sensitive and the non-sensitive Hive databases and HDFS directories.

2. Research process and results

This study looked at permission management for a big data platform based on Cloudera's Distribution Including Apache Hadoop (CDH). It mainly involved four components: Hue, Sentry, ACLs, and LDAP.
With permission management in place, the following can be achieved:
       1) Users of the big data platform are assigned to different roles (groups).
       2) Each role (group) has its own set of privileges.
       3) A role's (group's) privileges can cover Hive and Impala databases as well as HDFS paths.
       4) Within programs, different users operate with the privileges of the roles (groups) they hold.
       5) Users, roles (groups), and privileges must be managed in Linux, Hue, and LDAP; I have written shell scripts for the LDAP and Hue parts.
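The chain described in the list (user, then group, then role, then privilege) can be sketched as a small in-memory model. This is a minimal illustration under stated assumptions, not how Sentry is implemented: in Sentry, privileges are granted to roles and roles are granted to groups, with the mappings kept in Sentry's database and group membership coming from Linux or LDAP. Every user, group, role, and object name below (and the class `AuthzModel` itself) is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AuthzModel:
    """Hypothetical user -> group -> role -> privilege resolution."""

    # user -> groups (on a real cluster this comes from Linux/LDAP)
    user_groups: dict[str, set[str]] = field(default_factory=dict)
    # group -> roles granted to that group
    group_roles: dict[str, set[str]] = field(default_factory=dict)
    # role -> privileges, each an (object, action) pair
    role_privileges: dict[str, set[tuple[str, str]]] = field(default_factory=dict)

    def roles_of(self, user: str) -> set[str]:
        # A user holds every role granted to any of the user's groups.
        roles: set[str] = set()
        for group in self.user_groups.get(user, set()):
            roles |= self.group_roles.get(group, set())
        return roles

    def privileges_of(self, user: str) -> set[tuple[str, str]]:
        # Union of the privileges of all roles the user holds.
        privs: set[tuple[str, str]] = set()
        for role in self.roles_of(user):
            privs |= self.role_privileges.get(role, set())
        return privs

    def can(self, user: str, obj: str, action: str) -> bool:
        return (obj, action) in self.privileges_of(user)
```

For example, putting `alice` in an `analysts` group that holds a `read_sales` role with `("db:sales", "SELECT")` lets her query that database but not insert into it. Keeping users out of the role definitions is what makes item 5 manageable: adding a user only means adding a group membership.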

3. Component introductions

Hue introduction

       HUE = Hadoop User Experience
       Hue is an open-source web UI for Apache Hadoop. It evolved from Cloudera Desktop, which Cloudera eventually contributed to the Hadoop community of the Apache Foundation, and it is implemented with the Python web framework Django.
       Using Hue's web console in a browser, we can interact with a Hadoop cluster to analyze and process data: for example, working with data on HDFS, running MapReduce jobs, executing Hive SQL statements, and browsing HBase databases.

Hue usage documentation:
https://www.cloudera.com/documentation/enterprise/5-14-x/topics/hue.html

       Hue demo site (account: demo, password: Demo):
https://demo.gethue.com

Sentry introduction

       The Sentry service is an RPC server that stores authorization metadata in an underlying relational database and provides an RPC interface for retrieving and manipulating privileges. It supports secure access to the service using Kerberos. The service serves authorization metadata from the database; it does not perform the actual privilege checks. Hive, Impala, and Solr are clients of the service: when they are configured to use Sentry, they enforce Sentry privileges.
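Since Sentry only serves the privileges and the client (Hive, Impala) does the checking, it helps to see what that check looks like. The sketch below assumes a simplified Sentry-style object hierarchy (server, then database, then table) where a privilege granted on a container is inherited by everything beneath it, and `ALL` covers `SELECT` and `INSERT`. The action set, the tuple encoding of objects, and the function names are assumptions for illustration, not Sentry's API.

```python
from __future__ import annotations

# Simplified action set (assumption): ALL implies every other action.
IMPLIED = {
    "ALL": {"ALL", "SELECT", "INSERT"},
    "SELECT": {"SELECT"},
    "INSERT": {"INSERT"},
}


def covers(granted_path, granted_action, requested_path, requested_action):
    """True if one grant covers the requested object and action.

    Objects are tuples from the top of the hierarchy down, e.g.
    ("server1",), ("server1", "sales"), ("server1", "sales", "orders").
    A grant on a container is inherited by every object below it.
    """
    is_same_or_below = requested_path[: len(granted_path)] == granted_path
    return is_same_or_below and requested_action in IMPLIED.get(granted_action, set())


def is_authorized(grants, requested_path, requested_action):
    # grants: iterable of (object_path, action) pairs collected from a user's roles
    return any(covers(p, a, requested_path, requested_action) for p, a in grants)
```

With `SELECT` granted on `("server1", "sales")`, a query on the `orders` table in that database is allowed while an `INSERT` into it, or any access to another database, is refused.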
https://www.cloudera.com/documentation/enterprise/5-14-x/topics/sentry.html

ACLs introduction

       The Hadoop Distributed File System (HDFS) implements a permissions model for files and directories that shares much of the POSIX model. Each file and directory is associated with an owner and a group, and has separate permissions for the user that is the owner, for other users that are members of the group, and for all other users. For files, the r permission is required to read the file, and the w permission is required to write or append to it. For directories, the r permission is required to list the contents of the directory, the w permission is required to create or delete files or directories in it, and the x permission is required to access a child of the directory.

       In contrast to the POSIX model, files have no setuid or setgid bits, because there is no notion of executable files. Directories have no setuid or setgid bits either, as a simplification. The sticky bit can be set on a directory to prevent anyone except the superuser, the directory owner, or the file owner from deleting or moving files within that directory; setting the sticky bit on a file has no effect. Collectively, the permissions of a file or directory are its mode. In general, Unix conventions for representing and displaying modes are used, including octal numbers. When a file or directory is created, its owner is the user identity of the client process, and its group is the group of the parent directory (the BSD rule).
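To connect the octal modes with what a directory listing shows, here is a small sketch that renders permission bits (including the sticky bit, 0o1000) in the Unix `ls` style that `hdfs dfs -ls` also uses. The function name `format_mode` is hypothetical; the bit layout is the standard owner/group/other octal layout described above.

```python
def format_mode(mode: int, is_dir: bool) -> str:
    """Render octal permission bits, e.g. 0o1777, as ls-style text."""
    chars = ["d" if is_dir else "-"]
    for shift in (6, 3, 0):  # owner, group, other
        bits = (mode >> shift) & 0o7
        chars.append("r" if bits & 0o4 else "-")
        chars.append("w" if bits & 0o2 else "-")
        chars.append("x" if bits & 0o1 else "-")
    if mode & 0o1000:
        # The sticky bit is shown in the "other execute" slot:
        # lowercase t if other-execute is also set, uppercase T if not.
        chars[-1] = "t" if chars[-1] == "x" else "T"
    return "".join(chars)
```

A shared scratch directory such as `/tmp` is typically mode 1777, which renders as `drwxrwxrwt`: everyone may create files there, but the sticky bit stops users from deleting each other's files.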

       HDFS also provides optional support for POSIX ACLs (Access Control Lists), which augment file permissions with finer-grained rules for specific named users or named groups. ACLs are covered in detail in the HDFS Permissions Guide linked below.

       Each client process that accesses HDFS has a two-part identity consisting of the user name and the groups list. Whenever HDFS must check permissions on a file or directory foo accessed by a client process:

       If the user name matches the owner of foo, the owner permissions are tested; otherwise, if the group of foo matches any member of the groups list, the group permissions are tested; otherwise, the other permissions of foo are tested. If the permissions check fails, the client operation fails.
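The three-step check above can be written out directly. This sketch does not model the superuser or ACLs, and the function and constant names are hypothetical. Note that only one class of bits is ever consulted: if the user is the owner, the group and other bits are ignored even when they would grant more.

```python
from __future__ import annotations

READ, WRITE, EXECUTE = 0o4, 0o2, 0o1


def check_permission(user: str, groups: set[str], owner: str, group: str,
                     mode: int, access: int) -> bool:
    """Owner -> group -> other check on a file's mode bits.

    access is a combination of READ / WRITE / EXECUTE, e.g. READ | EXECUTE.
    """
    if user == owner:
        bits = (mode >> 6) & 0o7  # owner class
    elif group in groups:
        bits = (mode >> 3) & 0o7  # group class
    else:
        bits = mode & 0o7         # other class
    # Every requested bit must be present in the selected class.
    return (bits & access) == access
```

For a file owned by `alice:etl` with mode 640, `alice` can read and write, a member of `etl` can only read, and everyone else is refused. With the unusual mode 047, `alice` herself is refused even though "other" has full access, because the owner class is the only one tested for her.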
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html
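When ACLs are enabled (the `dfs.namenode.acls.enabled` property in hdfs-site.xml) and set with commands such as `hdfs dfs -setfacl -m user:alice:rwx /path`, the check gains extra steps: named-user entries, owning-group and named-group entries filtered by the ACL mask, and a rule that a user who matches some group entry without being granted access is denied rather than falling through to "other". The sketch below follows the evaluation order in the HDFS Permissions Guide; the `AclFile` class and its field names are hypothetical, and the superuser is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field

READ, WRITE, EXECUTE = 0o4, 0o2, 0o1


@dataclass
class AclFile:
    owner: str
    group: str
    owner_perm: int   # e.g. 0o7
    group_perm: int   # owning-group entry
    other_perm: int
    named_users: dict[str, int] = field(default_factory=dict)   # user -> perm
    named_groups: dict[str, int] = field(default_factory=dict)  # group -> perm
    mask: int = 0o7   # caps named entries and the owning-group entry


def grants(perm: int, access: int) -> bool:
    return (perm & access) == access


def check_acl(user: str, groups: set[str], f: AclFile, access: int) -> bool:
    if user == f.owner:  # 1. owner entry, not masked
        return grants(f.owner_perm, access)
    if user in f.named_users:  # 2. named user entry, masked
        return grants(f.named_users[user] & f.mask, access)
    matched_group = False
    if f.group in groups:  # 3. owning group, masked
        matched_group = True
        if grants(f.group_perm & f.mask, access):
            return True
    for name, perm in f.named_groups.items():  # 4. named groups, masked
        if name in groups:
            matched_group = True
            if grants(perm & f.mask, access):
                return True
    if matched_group:  # 5. matched a group entry but none granted access
        return False
    return grants(f.other_perm, access)  # 6. other entry
```

The mask is what makes ACLs safe to tighten: lowering it to `r-x` removes write access from every named user and group at once without editing each entry, and a named-group entry with no permissions can deny a group that "other" would otherwise let through.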

LDAP introduction

       LDAP stands for Lightweight Directory Access Protocol. It is based on the X.500 standard, but is much simpler and can be customized as needed. Unlike X.500, LDAP supports TCP/IP, which is necessary for Internet access. The core LDAP specifications are all defined in RFCs, and all LDAP-related RFCs can be found on the LDAPman RFC page.
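In a setup like this one, LDAP is where user and group membership lives: Hue can authenticate against LDAP and sync users and groups from it, and Hadoop's group mapping can also be backed by LDAP. Scripts that look up a user's groups build LDAP search filters, and user-supplied values must be escaped as RFC 4515 requires. The sketch below shows that escaping; the `posixGroup`/`memberUid` schema (from RFC 2307) and the function names are assumptions, so adjust them to your directory's actual schema.

```python
def escape_filter_value(value: str) -> str:
    """Escape a value for use inside an LDAP search filter (RFC 4515).

    The characters * ( ) \\ and NUL are replaced by a backslash followed
    by their two-digit hex code.
    """
    out = []
    for ch in value:
        if ch in ("*", "(", ")", "\\", "\x00"):
            out.append("\\%02x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


def group_membership_filter(uid: str) -> str:
    # Hypothetical schema: posixGroup entries list members via memberUid.
    return "(&(objectClass=posixGroup)(memberUid=%s))" % escape_filter_value(uid)
```

Without the escaping, a user name such as `*` would turn the membership lookup into a wildcard match on every group, which is exactly the kind of privilege leak the role design is meant to prevent.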


Source: blog.csdn.net/silentwolfyh/article/details/88668749