Greenplum as a Big Data Cloud Platform: Multi-Tenancy

Greenplum is a leading open source MPP (massively parallel processing) database. Over 15 years of development it has evolved from a data warehouse into a platform for the big data and cloud era.

This series of articles introduces the ways Greenplum supports the cloud. This article focuses on multi-tenancy.

1. What is multi-tenancy

Multi-tenancy means that a single system can serve multiple tenants. A tenant is usually a group of users with similar access patterns and permissions; a typical tenant is a set of users from the same organization or company.

To implement multi-tenancy, the first thing to consider is multi-tenancy at the data layer. The multi-tenant model chosen for the data layer strongly affects how the services and applications above it implement multi-tenancy. This article focuses on data-layer multi-tenancy and on how the Greenplum database supports the various multi-tenant models.

When weighing different multi-tenancy implementations, consider the following factors:

  • Scalability: the order of magnitude of the number of tenants, and how it is expected to grow
  • Security: the level of data isolation required between tenants
  • Resource sharing: multi-tenancy usually involves some form of resource sharing, so one tenant's bad SQL must be prevented from exhausting system resources and degrading response times for other tenants
  • Flexibility: different tenants may have different requirements, so the model must be extensible enough to meet tenant-specific needs
  • Cross-tenant analysis and optimization: the ability to analyze the data and behavior of all tenants, or of several tenants together
  • Operations and management: the complexity and convenience of operations and management, including monitoring, modifying the database schema, creating indexes, collecting statistics, loading data, etc.
  • Cost: total cost of ownership, including implementation costs, operations costs, etc.

2. Multi-tenant models

A multi-tenant model describes the mapping between tenants and their data. The choice of model affects the design, management, and maintenance of both databases and applications.

Three multi-tenant models are in common use. Greenplum supports all three well.

2.1 One database per tenant

The simplest way to implement multi-tenancy with Greenplum is to create a separate database cluster for each tenant. The application assigns each tenant a tenant id and configures database connection information (IP address, port, etc.) for that tenant. The application then connects to the appropriate database based on the tenant id.

In this model the data of different tenants is physically isolated, which gives a high level of security. If each tenant's Greenplum cluster runs on its own hardware, resource usage is physically isolated as well; if the tenants' clusters share one set of hardware, resources must be allocated and managed carefully so that tenants do not interfere with one another. Because each tenant uses a separate database, the model is flexible and can easily meet tenant-specific needs (such as additional fields). A failure also affects only a small scope. The disadvantages are the large number of databases, complex maintenance, and high cost of ownership. This model suits scenarios with a relatively small number of tenants.
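
The tenant-to-database mapping described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ConnInfo` type, the `TENANT_CLUSTERS` registry, and the hosts, ports, and database names in it are hypothetical and do not come from Greenplum or from the original article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnInfo:
    """Connection details for one tenant's cluster (hypothetical type)."""
    host: str
    port: int
    dbname: str


# Hypothetical registry: tenant id -> connection info for that tenant's cluster.
TENANT_CLUSTERS = {
    "tenant_a": ConnInfo(host="10.0.0.11", port=5432, dbname="tenant_a_db"),
    "tenant_b": ConnInfo(host="10.0.0.12", port=5432, dbname="tenant_b_db"),
}


def connection_for(tenant_id: str) -> ConnInfo:
    """Return the connection info assigned to a tenant, or fail loudly."""
    try:
        return TENANT_CLUSTERS[tenant_id]
    except KeyError:
        raise LookupError(f"no cluster configured for tenant {tenant_id!r}") from None


def dsn_for(tenant_id: str) -> str:
    """Build a key/value connection string from the tenant's connection info."""
    info = connection_for(tenant_id)
    return f"host={info.host} port={info.port} dbname={info.dbname}"
```

Failing on an unknown tenant id, rather than falling back to a default database, keeps a misrouted request from ever reaching another tenant's data.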

2.2 One namespace (schema) per tenant

Multiple tenants share one database, and each tenant has its own namespace (schema). The application assigns each tenant a tenant id and restricts all of that tenant's operations to its assigned namespace.

In this model the data of different tenants is logically isolated, and security control is relatively simple. Because each tenant has its own schema, tenant-specific needs are easy to meet and flexibility is high. The model places high demands on resource management to keep tenants from competing for resources. Using Greenplum's filespace and tablespace features, the data of different tenants can be stored on different disks, reducing contention for disk I/O. Operations and management are more complex, and cross-tenant analysis over a large number of tenants is difficult. This model suits scenarios with a moderate number of tenants.
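
One way an application might confine a tenant to its namespace is to derive the schema name from the tenant id and set the session's `search_path` accordingly. The sketch below assumes a hypothetical naming rule (`tenant_<id>`) and only builds SQL strings; it does not connect to a database.

```python
import re

# Assumption of this sketch: tenant ids and table names are simple
# lowercase identifiers, so they can be embedded in SQL safely.
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def _check_ident(name: str) -> str:
    """Reject anything that is not a plain lowercase identifier."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"not a safe identifier: {name!r}")
    return name


def schema_for(tenant_id: str) -> str:
    """Map a tenant id to its schema name (hypothetical naming rule)."""
    return f"tenant_{_check_ident(tenant_id)}"


def search_path_sql(tenant_id: str) -> str:
    """SQL that makes unqualified table names resolve in the tenant's schema."""
    return f'SET search_path TO "{schema_for(tenant_id)}"'


def qualified(tenant_id: str, table: str) -> str:
    """Fully qualified table name inside the tenant's schema."""
    return f'"{schema_for(tenant_id)}"."{_check_ident(table)}"'
```

Validating identifiers before embedding them matters here because schema names cannot be passed as ordinary query parameters.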

2.3 Full sharing

Tenants share the same database and the same namespace. Data from different tenants lives in the same set of tables; each row is tagged with a tenant id, and data is accessed by tenant id (every SQL statement the application issues includes the tenant id).

In this model the data of different tenants is physically stored together, which places high demands on the system's resource isolation and security isolation. Operations are relatively simple. Scalability is good, and a large number of tenants can be supported. Because tenant data is stored together, cross-tenant analysis and optimization are straightforward. Cost is low: more tenants can be supported at a lower cost.
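
The following sketch illustrates the two properties just described: every tenant-facing query filters on the tenant id, while cross-tenant analysis is a single aggregate over the shared table. It uses Python's built-in `sqlite3` module purely as a stand-in so the example runs anywhere; the `orders` table and its columns are hypothetical, and a real deployment would issue equivalent SQL against Greenplum.

```python
import sqlite3


def make_shared_table() -> sqlite3.Connection:
    """Create an in-memory table shared by all tenants (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (tenant_id TEXT NOT NULL, order_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("tenant_a", 1, 10.0), ("tenant_a", 2, 25.0), ("tenant_b", 1, 99.0)],
    )
    return conn


def orders_for(conn: sqlite3.Connection, tenant_id: str) -> list:
    """Tenant-facing access: the tenant id is always part of the query."""
    cur = conn.execute(
        "SELECT order_id, amount FROM orders WHERE tenant_id = ? ORDER BY order_id",
        (tenant_id,),
    )
    return cur.fetchall()


def total_per_tenant(conn: sqlite3.Connection) -> list:
    """Cross-tenant analysis: one aggregate over the shared table."""
    cur = conn.execute(
        "SELECT tenant_id, SUM(amount) FROM orders GROUP BY tenant_id ORDER BY tenant_id"
    )
    return cur.fetchall()
```

Passing the tenant id as a bound parameter, instead of formatting it into the SQL text, keeps the isolation check out of reach of user-supplied input.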

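In the full sharing model, many databases meet tenant-specific needs by adding a large number of custom columns to improve flexibility. This approach has many limitations: the number of columns cannot grow too large, management is complex, and so on. Starting with version 5.0, Greenplum supports more semi-structured data types, including JSON and hstore. With such semi-structured data, tenant-specific needs can be met more flexibly, efficiently, and conveniently.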
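
As a rough illustration of the semi-structured approach, tenant-specific fields can be packed into a single JSON value stored next to the shared columns, instead of adding one custom column per field. The helper names and sample fields below are hypothetical; the sketch uses only Python's standard `json` module and does not show Greenplum's own JSON operators.

```python
import json


def pack_custom_fields(fields: dict) -> str:
    """Serialize a tenant's extra fields into one JSON document."""
    # sort_keys gives a stable text form, which makes rows easier to compare.
    return json.dumps(fields, sort_keys=True)


def custom_field(attrs_json: str, name: str, default=None):
    """Read one tenant-specific field back out of the JSON document."""
    return json.loads(attrs_json).get(name, default)


def make_row(tenant_id: str, order_id: int, **custom) -> dict:
    """A shared-table row: fixed columns plus a JSON 'attrs' column."""
    return {
        "tenant_id": tenant_id,
        "order_id": order_id,
        "attrs": pack_custom_fields(custom),
    }
```

Two tenants can then carry entirely different extra fields in the same table without any schema change.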

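2.4 Hybrid model

This model is not a new implementation approach; it combines the three models described above to meet the service-level requirements of different users. For example, the few largest tenants can each get their own database while all other tenants use full sharing. Alternatively, customers that require strong resource isolation or fast service response times can use the one-database-per-tenant model, while other tenants use one namespace per tenant or full sharing.

2.5 Comparison

The table below lists the characteristics of the different models. The hybrid model combines the advantages and disadvantages of the others and is not listed separately. Different implementations can be chosen to suit different requirements.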
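
A hybrid deployment needs some rule that decides which model a given tenant gets. The sketch below shows one possible rule, assuming a hypothetical set of large tenants and a per-tenant isolation flag; the names and the policy itself are illustrative and are not taken from the original article.

```python
from enum import Enum


class Model(Enum):
    DATABASE_PER_TENANT = "database"
    SCHEMA_PER_TENANT = "schema"
    FULLY_SHARED = "shared"


# Hypothetical: the few largest tenants, each given a dedicated database.
LARGE_TENANTS = {"big_corp", "mega_retail"}


def model_for(tenant_id: str, needs_strict_isolation: bool = False) -> Model:
    """Pick a multi-tenant model for one tenant (illustrative policy only)."""
    if tenant_id in LARGE_TENANTS or needs_strict_isolation:
        return Model.DATABASE_PER_TENANT
    # Everyone else shares tables; a deployment could just as well
    # return SCHEMA_PER_TENANT here.
    return Model.FULLY_SHARED
```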

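| Characteristic | One database per tenant | One namespace per tenant | Full sharing |
|---|---|---|---|
| Scalability (number of tenants) | Small | Moderate | Large |
| Security isolation | Physical isolation | Logical isolation | Depends on database and application security controls |
| Resource isolation | High if separate clusters are used; otherwise depends on the database's resource management features | Depends on the database's resource management features | Depends on the database's resource management features |
| Flexibility | High | High | Relatively high, through semi-structured data types such as JSON |
| Cross-tenant analysis | Very hard; requires cross-database queries | Hard; requires cross-schema queries | Easy |
| Operations and management | High complexity | Medium complexity | Low complexity |
| Impact on applications | | | Relatively high |
| Cost | High | | Low |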

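3. Greenplum resource management

As noted above, every multi-tenant model (unless tenants run on separate physical clusters) involves resource management: meeting the different resource needs of different tenants while preventing any one tenant from overusing resources and affecting the others.

Greenplum 5 introduces a brand-new resource manager based on resource groups. Compared with the earlier resource queues, it allows flexible and efficient resource management.

The table below compares resource groups and resource queues: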

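| Feature | Resource Group | Resource Queue |
|---|---|---|
| Concurrency control | Transaction level | Statement level |
| Deadlock | | May occur in extreme cases |
| CPU management | Proportional, based on cgroups | Based on coarse-grained priorities |
| Use of idle CPU | Can fully use idle CPU | Partial use |
| Memory limits | Fine-grained | Coarse-grained |
| Memory sharing within a group | | |
| Dynamic modification of resource configuration | | Partial |
| Queuing | When no concurrency slot or memory quota is available | When no concurrency slot is available |
| Management of DDL and utility statements | Yes | No |
| Segment-level monitoring and management | Yes | No |
| Rule-based resource management | Yes | No |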

With resource groups, Greenplum can manage resources at a fine granularity in multi-tenant cloud scenarios. This makes it well suited to isolating tenants' resources: it prevents any one tenant from consuming excessive resources and ensures that resources are used sensibly.

For more details, refer to the official documentation: https://gpdb.docs.pivotal.io/580/admin_guide/workload_mgmt_resgroups.html
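
To make per-tenant resource isolation more concrete, the sketch below generates the statements that would create one resource group per tenant and assign the tenant's role to it. It assumes the Greenplum 5 forms `CREATE RESOURCE GROUP ... WITH (CPU_RATE_LIMIT, MEMORY_LIMIT, CONCURRENCY)` and `ALTER ROLE ... RESOURCE GROUP ...`; check the documentation linked above for the exact syntax in your version. The `TenantQuota` type, the `rg_`/`role_` naming, and the quota values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantQuota:
    """Resource share for one tenant (hypothetical type)."""
    tenant: str
    cpu_rate_limit: int  # percent of CPU
    memory_limit: int    # percent of memory
    concurrency: int     # maximum concurrent transactions


def resource_group_sql(q: TenantQuota) -> list:
    """SQL to create a tenant's resource group and attach its role to it."""
    group = f"rg_{q.tenant}"     # hypothetical naming rule
    role = f"role_{q.tenant}"    # hypothetical naming rule
    return [
        f"CREATE RESOURCE GROUP {group} WITH (CPU_RATE_LIMIT={q.cpu_rate_limit}, "
        f"MEMORY_LIMIT={q.memory_limit}, CONCURRENCY={q.concurrency});",
        f"ALTER ROLE {role} RESOURCE GROUP {group};",
    ]


def check_budget(quotas: list) -> None:
    """Sanity check for this sketch only: percentages must not exceed 100.

    It ignores any resource groups that already exist in the cluster.
    """
    cpu = sum(q.cpu_rate_limit for q in quotas)
    mem = sum(q.memory_limit for q in quotas)
    if cpu > 100 or mem > 100:
        raise ValueError(f"over-committed: cpu={cpu}%, memory={mem}%")
```

Because resource group concurrency is controlled at the transaction level, the `concurrency` value here bounds how many transactions a tenant can run at once.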

4. Summary

Multi-tenancy is a basic requirement for cloud databases. This article described four ways to implement multi-tenancy with Greenplum and compared them. It also introduced Greenplum's new resource manager based on resource groups, which enables both resource sharing and isolation among tenants.

Origin www.cnblogs.com/zhangrui153169/p/11434043.html