A detailed explanation of query_band load identification in GaussDB (DWS) and its application

Abstract: query_band is a session-level GUC parameter of string type that accepts any combination of characters.

This article is shared from the Huawei Cloud Community post "GaussDB (DWS) query_band load identification and application", author: a vine in front of the door.

query_band overview

GaussDB (DWS) implements load identification and priority scheduling based on query_band. First, it provides a more flexible way to identify load: jobs are no longer routed to resource pools only through the "user-resource pool" mapping, because a "key-value pair-resource pool" routing method is also available. Second, it implements job priority scheduling, so that queued jobs are scheduled in priority order.

Administrators can configure the resource pool and priority associated with a query_band according to business scenarios and job categories to achieve more flexible load management. If the business does not set query_band, or no load behavior is associated with the query_band that is set, the job uses the resource pool associated with the user and the default priority (Medium).

What is query_band?

query_band is a session-level GUC parameter of string type that accepts any combination of characters. When query_band is used for load identification, only strings in key-value form are supported, which keeps entries distinguishable and avoids strings whose meaning cannot be understood. query_band key-value pairs are subject to the following restrictions:

  • Only supports identifying strings in the form of key-value pairs, namely: "key=value";
  • Valid characters: numbers 0~9, uppercase letters A~Z, lowercase letters a~z and some symbols ('.', '-', '_' and '#');
  • The maximum length of a single key-value pair is 1024;
  • Multiple key-value pairs are supported, and the key-value pairs are separated by semicolons;
  • Example: SET query_band = 'JobName=abc;AppName=test;ApplicationName=jdbc';
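
The restrictions above can be sketched as a small client-side check. This is only an illustration in Python, not GaussDB (DWS) code: the function name parse_query_band is hypothetical, the 1024 limit is assumed to apply to the whole "key=value" text, and pairs that break a rule are assumed to be skipped rather than rejected.

```python
import re

# Characters the article lists as valid: digits, letters, '.', '-', '_' and '#'.
_TOKEN = r"[0-9A-Za-z._#-]+"
_PAIR_RE = re.compile(rf"({_TOKEN})=({_TOKEN})")
MAX_PAIR_LEN = 1024  # assumption: the limit covers the whole "key=value" text


def parse_query_band(query_band: str) -> list[tuple[str, str]]:
    """Split a query_band string into (key, value) pairs usable for load identification.

    Pairs that break one of the listed restrictions are skipped (an assumption),
    since the parameter itself accepts arbitrary strings.
    """
    pairs = []
    for raw in query_band.split(";"):
        if not raw or len(raw) > MAX_PAIR_LEN:
            continue
        match = _PAIR_RE.fullmatch(raw)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs
```

Calling parse_query_band on the example string above yields three pairs, with keys JobName, AppName, and ApplicationName.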

query_band load identification

The resource management function provided by GaussDB (DWS) implements resource isolation control and query scheduling from the perspective of resource pools, thereby realizing resource isolation between different businesses. A resource pool is the basic unit of resource management and query scheduling. Before running a query, it is necessary to determine which resource pool to use, and use the resource pool resources (computing resources/concurrency, etc.) during query scheduling and query running.

Queries are initiated by users, and users are generally divided by business, so it is natural to associate users with resource pools so that each user's queries run in the corresponding pool. GaussDB (DWS) provides this user-resource pool association. By default, users are associated with the default resource pool. You can create custom resource pools according to business needs and associate users with them. Each query is then routed to a resource pool according to the "user-resource pool" relationship, which puts query concurrency, memory, and CPU resources under control. In this way, resources are limited and isolated between different businesses, mixed-load requirements are met, and resource scheduling stays orderly and controllable during query execution.

The "user-resource pool" association does not fit scenarios where users and services are mixed (multiple users correspond to multiple services). In addition, jobs from different users in the same resource pool may have different priorities, so different priorities need to be configured for different users or services. A capability is therefore needed that is not limited to the "user-resource pool" association and that also supports priority scheduling within a resource pool. query_band load identification was introduced to meet this need.

query_band load identification provides two capabilities:

  • A more flexible load identification method: jobs are no longer routed to resource pools only through the "user-resource pool" mapping, and a "key-value pair-resource pool" routing method is provided;
  • Priority scheduling: different priorities can be set for different users or services, and jobs are scheduled by priority within a resource pool.

query_band function implementation

Working principle

query_band load identification works on key-value pairs. A user may set many key-value pairs, but only a few of them are actually associated with load behavior. To simplify the discussion below, key-value pairs are divided into valid and invalid ones:

Valid key-value pair: a load behavior is associated with it;

Invalid key-value pair: no load behavior is associated with it.

The query_band set in a session may contain multiple key-value pairs, and different pairs may be used for load identification in different scenarios (for example, by time period or day) to control load. When the query_band contains exactly one valid key-value pair, that pair is used for load identification. When it contains several valid pairs, one is selected according to the following rules:

  • When the valid key-value pairs have different matching orders, the pair with the smallest matching order is selected, where -1 counts as the largest value;
  • When all valid key-value pairs have the same matching order, the pair that appears first in the query_band string is selected.

Example: Assume that all key-value pairs in SET query_band='b=1;a=3;c=1' have the same matching order; then b=1 is selected for load identification. Now assume that in SET query_band='b=1;a=3;c=1' the matching order of b=1 is -1, that of a=3 is 4, and that of c=1 is 1; then c=1 is selected, because -1 is treated as the largest matching order and 1 is smaller than 4.
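
The two selection rules can be sketched in a few lines of Python. This is a minimal model, not the database's implementation: the function name pick_pair and the orders dictionary (valid pair to matching order) are hypothetical, and -1 is ranked last as the article describes for matching orders.

```python
import math
from typing import Optional


def pick_pair(query_band: str, orders: dict[str, int]) -> Optional[str]:
    """Return the key-value pair used for load identification, or None.

    `orders` maps each valid pair (one with an associated load behavior) to
    its matching order; -1 is treated as the largest order and ranks last.
    """
    best = None
    best_rank = None
    for position, pair in enumerate(query_band.split(";")):
        if pair not in orders:
            continue  # invalid pair: no load behavior is associated with it
        order = orders[pair]
        # Smaller order wins; equal orders fall back to position in the string.
        rank = (math.inf if order == -1 else order, position)
        if best_rank is None or rank < best_rank:
            best, best_rank = pair, rank
    return best
```

With the two scenarios from the example, the sketch picks b=1 when every pair has order -1, and c=1 when the orders are -1, 4, and 1.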

Load identification workflow

The administrator adjusts the resource pool and scheduling priority used by each business (different businesses correspond to different query_band key-value pairs) according to business scenarios and load changes. While the business is running, query_band load identification works as follows:

  1. Set query_band in the session, example: SET query_band='JobName=abc;UserName=elk';
  2. The load management module parses the query_band to determine whether it contains valid key-value pairs;
  3. If the query_band does not contain a valid key-value pair, use the "user-resource pool" method to route the job to the corresponding resource pool to run, and set the job priority to Medium;
  4. If the query_band contains a valid key-value pair, use the "key-value pair-resource pool" method to route the job to the corresponding resource pool to run, and set the job priority to the priority associated with the key-value pair;
  5. Jobs are queued in the corresponding resource pool according to the set priority, waiting for query scheduling.
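
Steps 2 to 4 can be modelled as a single routing function. The sketch below is illustrative only: Action and route_job are hypothetical names, and it assumes that a behavior the key-value pair does not set (resource pool or priority) keeps its default, namely the user's pool or Medium.

```python
from typing import NamedTuple, Optional


class Action(NamedTuple):
    respool: Optional[str]   # None: no resource pool associated with the pair
    priority: Optional[str]  # None: no priority associated with the pair
    order: int = -1          # matching order; -1 ranks last


def route_job(query_band: str, user_pool: str, actions: dict[str, Action]) -> tuple[str, str]:
    """Return (resource_pool, priority) for a job, following steps 2 to 4."""
    # Step 2: keep only valid pairs, i.e. pairs with an associated load behavior.
    valid = [(i, p) for i, p in enumerate(query_band.split(";")) if p in actions]
    if not valid:
        # Step 3: fall back to the "user-resource pool" mapping and Medium.
        return user_pool, "Medium"

    def rank(item):
        position, pair = item
        order = actions[pair].order
        # Explicit orders first (smallest wins), -1 last, then string position.
        return (order == -1, order, position)

    # Step 4: route by the selected key-value pair.
    _, chosen = min(valid, key=rank)
    action = actions[chosen]
    # Assumption: a behavior the pair does not set keeps its default.
    return action.respool or user_pool, action.priority or "Medium"
```

The returned pool and priority are what step 5 would then use to queue the job.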

Priority scheduling

query_band supports three priorities (High/Medium/Low) and provides Rush as a special priority (green channel). The default priority is Medium. In practice, most jobs should use Medium, low-priority jobs should use Low, and privileged jobs should use High; avoid assigning High to too many jobs. Rush is intended for emergencies in special scenarios and is not recommended for normal use.

During scheduling, high-priority jobs are scheduled first, and low-priority jobs are scheduled only after all high-priority jobs have been scheduled. GaussDB (DWS) contains multiple queues. The CN global concurrency control queue does not support priority scheduling in the dynamic load management scenario; the following queues do support it (jobs are scheduled in priority order):

  • In the static load management scenario, CN global concurrent control queue;
  • In the dynamic load management scenario, CCN global memory control queue;
  • Resource pool concurrency control and memory management queues, in both the dynamic and static scenarios.
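
Scheduling in priority order can be pictured with a small priority queue. The Python sketch below is a toy model, not how GaussDB (DWS) implements its queues: the class name PriorityJobQueue is hypothetical, Rush is assumed to simply rank above High (the real green channel may behave differently), and jobs of equal priority are assumed to leave in arrival order.

```python
import heapq
import itertools

# Assumption: Rush is modelled as simply ranking above High.
_RANK = {"Rush": 0, "High": 1, "Medium": 2, "Low": 3}


class PriorityJobQueue:
    """Queue that releases higher-priority jobs before lower-priority ones."""

    def __init__(self):
        self._heap = []
        # Monotonic counter: keeps arrival order among jobs of equal priority.
        self._arrival = itertools.count()

    def enqueue(self, job: str, priority: str = "Medium") -> None:
        heapq.heappush(self._heap, (_RANK[priority], next(self._arrival), job))

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

A Low job that was queued first still leaves the queue only after every Rush, High, and Medium job queued alongside it.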

During job running, the job priority can be queried through the pgxc_session_wlmstat/pg_session_wlmstat view. The priority in the view is displayed as an INT type. The corresponding relationship between the number and the priority is as follows:

query_band external interface

gs_wlm_set_queryband_action

The function gs_wlm_set_queryband_action(query_band cstring, action cstring, order int4) sets the load behavior of a query_band key-value pair. It returns a bool indicating whether the call succeeded and takes three input parameters:

  • query_band: query_band key-value pair
  • action: load behavior
  • order: matching order (sequence number); optional, defaults to -1

Application example: associate the query_band key-value pair "UserName=elk" with resource pool p1 and priority Rush, and set its matching order to 1.

SELECT * FROM gs_wlm_set_queryband_action('UserName=elk','respool=p1;priority=rush',1);
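
The action argument is itself a semicolon-separated list of name=value items. As a hedged illustration of that format, the Python sketch below parses such a string before it is sent to the database. The helper parse_action is hypothetical, it only knows the two behavior names that appear in this article (respool and priority), and treating priority names as case-insensitive is an assumption.

```python
VALID_PRIORITIES = {"rush", "high", "medium", "low"}


def parse_action(action: str) -> dict[str, str]:
    """Parse an action string such as 'respool=p1;priority=rush' into a dict."""
    result = {}
    for item in action.split(";"):
        if not item:
            continue
        name, sep, value = item.partition("=")
        if not sep or not name or not value:
            raise ValueError(f"malformed action item: {item!r}")
        # Only the two behavior names shown in the article are accepted here.
        if name not in ("respool", "priority"):
            raise ValueError(f"unknown load behavior: {name!r}")
        if name == "priority" and value.lower() not in VALID_PRIORITIES:
            raise ValueError(f"unknown priority: {value!r}")
        result[name] = value
    return result
```

A check like this only catches typos early on the client side; the database function remains the authority on what it accepts.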

gs_wlm_set_queryband_order

The function gs_wlm_set_queryband_order(query_band cstring, order int4) modifies the matching order of a query_band key-value pair. It returns a bool indicating whether the call succeeded and takes two input parameters:

  • query_band: query_band key-value pair
  • order: matching order (sequence number); optional, defaults to -1

Apart from -1, no two query_band key-value pairs may share the same matching order. When the matching order of a key-value pair is set and another pair already holds that order, the order of the existing pair is automatically increased by 1. This step repeats until no two pairs share the same matching order. Among matching orders, -1 is treated as the largest value and represents the lowest matching priority; the minimum value is 0, which represents the highest matching priority.

Application example: Suppose the matching order of the query_band key-value pair "UserName=elk" is 1, that of "UserName=bin" is 2, and that of "UserName=yagao" is 3. Now set the matching order of the key-value pair "UserName=on" to 1.

SELECT * FROM gs_wlm_set_queryband_order('UserName=on',1);

After the setting is complete, the matching orders follow from the shifting rule described earlier: "UserName=on" is 1, "UserName=elk" is 2, "UserName=bin" is 3, and "UserName=yagao" is 4.
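
The automatic shifting of matching orders can be traced with a short Python sketch. It models only the rule stated in this section (a pair holding the requested order moves to order + 1, repeatedly, and -1 is never shifted); the function name set_order and the dictionary representation are hypothetical.

```python
def set_order(orders: dict[str, int], pair: str, new_order: int) -> dict[str, int]:
    """Return a copy of `orders` with `pair` set to `new_order`.

    A pair already holding the requested order is pushed to order + 1, and
    the push cascades until no two pairs share an order. -1 is never shifted.
    """
    updated = {p: o for p, o in orders.items() if p != pair}
    if new_order != -1:
        # Snapshot of which pair currently occupies each explicit order.
        holder_of = {o: p for p, o in updated.items() if o != -1}
        slot = new_order
        displaced = holder_of.get(slot)
        while displaced is not None:
            next_displaced = holder_of.get(slot + 1)
            updated[displaced] = slot + 1
            slot += 1
            displaced = next_displaced
    updated[pair] = new_order
    return updated
```

Applied to the example, starting from orders 1, 2, and 3 for elk, bin, and yagao, setting "UserName=on" to 1 pushes each of the three existing pairs up by one.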

System table pg_workload_action

query_band supports multiple load behaviors, and the system table pg_workload_action stores the load behaviors associated with each query_band key-value pair. To keep the design extensible (new load behaviors need no new fields), each row stores one load behavior. When a key-value pair is associated with several load behaviors, each behavior occupies its own row. The system table contains four fields:

  • qband: key-value pair
  • class: load behavior category
  • object: load behavior name
  • action: the associated load behavior

query_band currently supports the following load behaviors, where the matching order (serial number) of query_band key-value pairs is also stored in the system table as a load behavior.

Note: Default values do not need to be stored in the system table; resource pools are stored as OIDs.

Example: Assume that the query_band key-value pair "UserName=elk" is associated with resource pool p1, the priority is Rush, and the matching order is 1; "UserName=on" is associated with resource pool p1, the priority is Medium, and the matching order is -1. The result of querying pg_workload_action is as follows:

postgres=# select * from pg_workload_action order by 1,2;
    qband     | classname | objname  | action
--------------+-----------+----------+--------
 UserName=elk | order     | respool  | 1
 UserName=elk | workload  | respool  | 16722
 UserName=elk | workload  | priority | rush
 UserName=on  | workload  | respool  | 16722
(4 rows)

pg_queryband_action view

The pg_workload_action system table stores the load behaviors of query_band key-value pairs and can be queried directly, but one row per load behavior is inconvenient to read. The pg_queryband_action view is therefore provided: it shows the load behaviors of all query_band key-value pairs, with one row holding all load behaviors of a single key-value pair.

Example: Assume that the query_band key-value pair "UserName=elk" is associated with resource pool p1, the priority is Rush, and the matching order is 1; "UserName=on" is associated with resource pool p1, the priority is Medium, and the matching order is -1. The result of querying pg_queryband_action is as follows:

postgres=# select * from pg_queryband_action;
    qband     | respool_id | respool | priority | qborder
--------------+------------+---------+----------+---------
 UserName=on  | 16722      | p1      | Medium   | -1
 UserName=elk | 16722      | p1      | rush     | 1
(2 rows)
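
The relationship between the two result sets (one row per load behavior versus one row per key-value pair) can be reproduced with a small pivot. The Python sketch below is not the view's definition: pivot_actions is a hypothetical helper, the rows are given as plain tuples in the column order of the pg_workload_action output shown in this article, and the OID-to-name mapping is passed in by hand.

```python
def pivot_actions(rows, pool_names):
    """Fold (qband, classname, objname, action) rows into one dict per qband.

    `pool_names` maps a resource pool OID to its name. Missing behaviors fall
    back to the defaults the article gives: priority Medium, order -1.
    """
    view = {}
    for qband, classname, objname, action in rows:
        entry = view.setdefault(
            qband,
            {"respool_id": None, "respool": None, "priority": "Medium", "qborder": -1},
        )
        if classname == "order":
            # The matching order is stored as its own load behavior row.
            entry["qborder"] = int(action)
        elif objname == "respool":
            entry["respool_id"] = int(action)
            entry["respool"] = pool_names.get(int(action))
        elif objname == "priority":
            entry["priority"] = action
    return view
```

Fed the four rows from the pg_workload_action example and the mapping {16722: "p1"}, the sketch yields the same two logical rows as the view output.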

query_band application

Basic application

Create resource pools respool_1 and respool_2, and create user user_1 associated with respool_1. When no query_band load behavior is set, all jobs run by user_1 are routed to respool_1 with priority Medium.

Set the load behavior of the query_band key-value pair "JobName=elk" to resource pool respool_2 with priority Medium, and set the load behavior of the key-value pair "JobName=on" to priority High. When user_1 runs jobs with different query_band settings, the associated resource pool and job priority are as shown in the following table:

Extended application (user priority scheduling)

Create resource pool respool_1, and create users user_1, user_2, and user_3 associated with respool_1. When no query_band load behavior is set, jobs run by user_1, user_2, and user_3 are all routed to respool_1 with priority Medium.

Set the priority of the query_band key-value pair "UserName=elk" to High; set the priority of the query_band key-value pair "UserName=on" to Low.

Remarks: "UserName=elk" and "UserName=on" are only used for user identification and have no special meaning. Users can configure them as needed.

Set the user default query_band as follows:

ALTER USER user_2 SET query_band='UserName=elk';
ALTER USER user_3 SET query_band='UserName=on'; 

The query_band is not set separately in the session, and user_1, user_2, and user_3 run jobs. In this case, the job priority of user_1 is Medium (the default priority), that of user_2 is High (from the key-value pair "UserName=elk"), and that of user_3 is Low (from the key-value pair "UserName=on").

In addition, users can set a query_band that contains multiple key-value pairs, so that load identification uses different key-value pairs in different scenarios (or time periods) for more flexible load control; this is not covered in detail here.

 


Origin my.oschina.net/u/4526289/blog/8657211