MySQL Management and Control Platform (05)

This article covers building an automated operations control platform for MySQL. The platform is divided into metadata management, backup management, instance management, host management, task management, log management, and routine maintenance.

Standardization

  Standardization is the foundation of large-scale automation. We define standards and use SaltStack to maintain the DB servers' base software installation and configuration files:

  1. Disks are uniformly configured as RAID 5 to maximize usable space.
  2. On DB servers, the I/O scheduler is set to deadline, and other I/O optimizations, such as SSD and RAID-card read/write policies, follow one unified strategy.
  3. Directories are laid out uniformly and distinguished by port, e.g. my3306, my3307. Under my3306 we create the corresponding data directory, log directory, run-file directory, and tmp directory.
  4. Each instance has its own configuration file. Apart from server_id, innodb_buffer_pool_size, and a few other parameters, all settings are identical.
  5. The MySQL software directory and version are consistent with the production environment.
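
  The port-based layout and the rule that only server_id and the buffer pool differ between instances can be sketched as a small template renderer. This is a minimal Python sketch, not the platform's actual SaltStack state: the `/data` base directory, the file names under `run/` and `log/`, and the values in `COMMON_SETTINGS` are hypothetical placeholders.

```python
from pathlib import PurePosixPath

# Hypothetical base directory; the post only says directories are named by port.
BASE_DIR = PurePosixPath("/data")

# Parameters shared by every instance (illustrative values, not the real standard).
COMMON_SETTINGS = {
    "character_set_server": "utf8mb4",
    "slow_query_log": "ON",
    "long_query_time": "1",
}


def instance_dirs(port: int) -> dict[str, PurePosixPath]:
    """Standard data/log/run/tmp directories for the instance listening on `port`."""
    root = BASE_DIR / f"my{port}"
    return {name: root / name for name in ("data", "log", "run", "tmp")}


def render_my_cnf(port: int, server_id: int, buffer_pool_size: str) -> str:
    """Render a my.cnf in which only port-derived paths, server_id and the buffer pool vary."""
    dirs = instance_dirs(port)
    settings = {
        "port": str(port),
        "server_id": str(server_id),
        "innodb_buffer_pool_size": buffer_pool_size,
        "datadir": str(dirs["data"]),
        "tmpdir": str(dirs["tmp"]),
        "socket": str(dirs["run"] / "mysql.sock"),
        "pid_file": str(dirs["run"] / "mysqld.pid"),
        "log_error": str(dirs["log"] / "error.log"),
        **COMMON_SETTINGS,  # everything else is identical across instances
    }
    lines = ["[mysqld]"] + [f"{key} = {value}" for key, value in settings.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_my_cnf(3306, server_id=1003306, buffer_pool_size="8G"))
```

  Keeping every per-instance difference in the function arguments makes it easy to diff two rendered files and confirm that nothing else has drifted.
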
Backup Monitoring

  Without a single place to see whether backups succeeded or failed, DBAs have no idea whether the backups of the databases they maintain are actually usable. If an unexpected problem arises and a restore turns out to be impossible, the result is fatal. Backup monitoring therefore provides:

  1. A real-time view of backup execution: the number of instances that should be backed up, the number already completed, and the number that failed.
  2. The duration of each backup.
  3. Backup statistics for the past five days, such as total count and total size.
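
  A view like this needs little more than a per-instance record of each backup run. The sketch below assumes a hypothetical `BackupRun` record (its status values and field names are illustrative) and shows how the current counts, per-backup durations, and five-day statistics could be derived from it.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Optional


@dataclass
class BackupRun:
    """One backup attempt for one instance (hypothetical record layout)."""
    instance: str                      # e.g. "db01:3306"
    status: str                        # "pending", "running", "success" or "failed"
    started_at: Optional[datetime] = None
    finished_at: Optional[datetime] = None
    size_bytes: int = 0

    @property
    def duration(self) -> Optional[timedelta]:
        """How long the backup took, or None if it has not finished."""
        if self.started_at and self.finished_at:
            return self.finished_at - self.started_at
        return None


def summarize(runs: list[BackupRun]) -> dict[str, int]:
    """Counts for the real-time view: expected, completed, failed, still open."""
    completed = sum(r.status == "success" for r in runs)
    failed = sum(r.status == "failed" for r in runs)
    return {
        "expected": len(runs),
        "completed": completed,
        "failed": failed,
        "in_progress": len(runs) - completed - failed,
    }


def daily_stats(runs: list[BackupRun], today: date, days: int = 5) -> dict[date, dict[str, int]]:
    """Successful backup count and total size per day for the last `days` days."""
    stats = {today - timedelta(days=i): {"count": 0, "bytes": 0} for i in range(days)}
    for r in runs:
        if r.status == "success" and r.finished_at:
            day = r.finished_at.date()
            if day in stats:
                stats[day]["count"] += 1
                stats[day]["bytes"] += r.size_bytes
    return stats
```
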
Task System

  Every automated management platform needs a core component: a task management system that schedules all kinds of tasks, whether triggered actively or passively. It must support multiple task types: tasks scheduled by time (minute, hour, day, week, month) as well as tasks repeated at fixed intervals.
  The system consists of the task-scheduling logic and an agent on each database server that executes the dispatched tasks. Task-scheduling metadata tables record every task's time policy and the hosts the task is associated with. With the task system we removed all crontab scripts from the DB hosts, and changing a task's execution time, its policy, or whether it runs at all became trivial.
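
  The time policies stored in the task metadata table can be evaluated with a small next-run calculator. This sketch assumes a hypothetical `Task` record holding the policy and its associated hosts; it covers fixed intervals plus daily and weekly policies (monthly policies are omitted for brevity), and `due_dispatches` yields the (task, host) pairs a scheduler would hand to the agents.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Iterator, Optional


@dataclass
class Task:
    """A row of the (hypothetical) task metadata table."""
    name: str
    hosts: list[str]          # DB hosts whose agent runs this task
    kind: str                 # "interval", "daily" or "weekly"
    every_seconds: int = 0    # used by "interval"
    hour: int = 0             # used by "daily" and "weekly"
    minute: int = 0
    weekday: int = 0          # Monday == 0, used by "weekly"
    enabled: bool = True


def next_run(task: Task, after: datetime) -> datetime:
    """First run time strictly after `after` under the task's time policy."""
    if task.kind == "interval":
        return after + timedelta(seconds=task.every_seconds)
    candidate = after.replace(hour=task.hour, minute=task.minute, second=0, microsecond=0)
    if task.kind == "daily":
        return candidate if candidate > after else candidate + timedelta(days=1)
    if task.kind == "weekly":
        # Move forward to the requested weekday (0 to 6 days ahead).
        candidate += timedelta(days=(task.weekday - candidate.weekday()) % 7)
        return candidate if candidate > after else candidate + timedelta(days=7)
    raise ValueError(f"unsupported schedule kind: {task.kind}")


def due_dispatches(tasks: Iterable[Task], now: datetime,
                   last_runs: dict[str, datetime]) -> Iterator[tuple[str, str]]:
    """Yield (task name, host) pairs that should be sent to agents at `now`."""
    for task in tasks:
        if not task.enabled:
            continue
        last: Optional[datetime] = last_runs.get(task.name)
        # A task that has never run is due immediately.
        if last is None or next_run(task, last) <= now:
            for host in task.hosts:
                yield task.name, host
```

  Because the policy lives in data rather than in a crontab line, changing a task's schedule is an update to one row.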

Backup Management

  Database backups are physical backups taken with xtrabackup. Each backup is compressed, rsynced to a backup destination machine, and periodically copied to an offsite data center.
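
  The three stages (physical backup, compression, transfer) can be expressed as a list of commands for the agent to run. This sketch uses only the `xtrabackup` options `--defaults-file`, `--backup` and `--target-dir` plus plain `tar`/`rsync` flags; the `/backup/...` and `/data/...` paths and the backup host name are hypothetical. By default the commands are printed rather than executed. An xtrabackup backup still has to be prepared (`xtrabackup --prepare`) before it can be restored.

```python
import shlex
import subprocess


def backup_commands(port: int, backup_host: str, stamp: str) -> list[list[str]]:
    """Build the backup, compress and transfer steps for the instance on `port`."""
    base = f"/backup/my{port}"                 # hypothetical local staging directory
    target = f"{base}/{stamp}"
    archive = f"{target}.tar.gz"
    return [
        # --defaults-file must be the first option passed to xtrabackup.
        ["xtrabackup", f"--defaults-file=/data/my{port}/my.cnf",
         "--backup", f"--target-dir={target}"],
        # Compress the backup directory into a single archive.
        ["tar", "-czf", archive, "-C", base, stamp],
        # Ship the archive to the backup destination machine.
        ["rsync", "-a", archive, f"{backup_host}:{base}/"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)   # raises CalledProcessError on failure


if __name__ == "__main__":
    run(backup_commands(3306, "bak01", "20200120"))
```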

  1. The underlying backup script was rewritten in Python and is executed by the agent on the DB server. A callback API sets the running state of each backup task. If the backup of any instance on a host fails, an alarm is sent to the DBA's mobile phone; the DBA can view the backup's error log directly in the backup system and retry it, without logging in to the DB host to run anything.
  2. Backups are coupled with the task system, which replaces the crontab-based scheduled backup jobs.
  3. The backup time and whether an instance is backed up can be set on the instance management page, and the backup destination machine can be changed dynamically.
  4. Backups of core databases are verified for usability every day. If verification fails, an alarm is sent via SMS, WeChat, or the platform's alarm channel, notifying a DBA to investigate and redo the backup.
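
  The callback flow in items 1 and 4 can be sketched with injected functions, so the logic reads without a real API. Here `report` stands in for the platform's backup-state callback API, `alert` for the SMS/WeChat alarm, and `verify` for the daily usability check; all three and the state names are hypothetical.

```python
from typing import Callable, Optional

# report(instance, state, log): stands in for the backup-state callback API.
Reporter = Callable[[str, str, str], None]
# alert(instance, message): stands in for the SMS / WeChat / platform alarm.
Alerter = Callable[[str, str], None]


def run_backup(instance: str, do_backup: Callable[[str], None],
               report: Reporter, alert: Alerter,
               verify: Optional[Callable[[str], bool]] = None) -> bool:
    """Run one backup, report every state change, and alert on failure."""
    report(instance, "running", "")
    try:
        do_backup(instance)
    except Exception as exc:
        # Keep the error text so the DBA can read it in the backup system and retry.
        report(instance, "failed", str(exc))
        alert(instance, f"backup failed: {exc}")
        return False
    if verify is not None and not verify(instance):
        report(instance, "verify_failed", "")
        alert(instance, "backup verification failed")
        return False
    report(instance, "success", "")
    return True
```

  A retry from the web page then amounts to calling `run_backup` again for the same instance through the task system.
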
Host Management
  • Hosts are the foundation of instance metadata. The core host metadata includes host name, IP address, data center location, memory, disk space, etc.
  • Scheduled tasks periodically pull host information from the Zabbix / Open-Falcon APIs, such as disk space and available memory.
  • Operations at the database-cluster level, such as warnings about remaining disk space.
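
  Turning the periodically collected host metrics into a remaining-space warning is a simple threshold check. In this sketch the metrics are assumed to have already been pulled from Zabbix or Open-Falcon into a hypothetical `HostMetrics` record, and the 20% free-space threshold is an illustrative default.

```python
from dataclasses import dataclass


@dataclass
class HostMetrics:
    """Latest metrics for one host, as collected from the monitoring API (hypothetical layout)."""
    hostname: str
    disk_total_gb: float
    disk_free_gb: float
    mem_available_gb: float


def space_warnings(hosts: list[HostMetrics],
                   min_free_ratio: float = 0.2) -> list[tuple[str, float]]:
    """Hosts whose free-disk ratio is below `min_free_ratio`, most critical first."""
    warnings = []
    for h in hosts:
        if h.disk_total_gb <= 0:
            continue  # skip incomplete metrics rather than divide by zero
        ratio = h.disk_free_gb / h.disk_total_gb
        if ratio < min_free_ratio:
            warnings.append((h.hostname, round(ratio, 3)))
    return sorted(warnings, key=lambda item: item[1])
```
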
Instance Management

  To make the most of host performance, we usually run multiple instances on a single host, so hosts and DB instances have a one-to-many relationship. The instance management system provides the following functions:

  1. View the current instance list, with each instance's data size, log size, master/slave status, replication delay, number of slow queries, and so on. The list can also be used to enable or disable an instance.
  2. Add a single instance, or add one or more slaves to a master. A new instance is created as follows: a standard template database (a MySQL instance generated in advance and shut down) is rsynced from a remote machine or the local backup machine, then a standard my.cnf template is rendered with server_id, buffer_pool_size, and other values to generate the my.cnf configuration file. Each step can be followed in the management interface, and the task scheduling system can retry steps that failed.
  3. Master/slave consistency checks. In MySQL master-slave replication, replication errors, master/slave switchovers, or improper use of the slave can leave master and slave with inconsistent data. We check master/slave consistency daily for all core databases so that inconsistencies are caught before they affect production.
  4. Instance splitting: multiple schemas that previously lived in the same instance are split out into different instances.
  5. Daily snapshots of instance metadata, such as slow-query counts and directory sizes, to make historical analysis of instances easier.
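
  The daily consistency check in item 3 is usually done by comparing checksums of row chunks on both sides, which is the idea behind tools such as pt-table-checksum (that tool computes the checksums inside MySQL and lets them replicate). The sketch below only illustrates the comparison in Python. It assumes rows have already been read from both servers in primary-key order with an integer primary key in the first column, and groups them by primary-key range so that one missing row does not shift every later chunk.

```python
import zlib
from typing import Iterable, Sequence


def chunk_checksums(rows: Iterable[Sequence[object]],
                    chunk_size: int = 1000) -> dict[int, tuple[int, int]]:
    """Map each primary-key range (pk // chunk_size) to (row count, CRC32 of its rows).

    Rows must be in primary-key order on both servers, because the CRC is chained.
    """
    chunks: dict[int, tuple[int, int]] = {}
    for row in rows:
        chunk_id = int(row[0]) // chunk_size
        count, crc = chunks.get(chunk_id, (0, 0))
        # Serialize the row with separator bytes that are unlikely to appear in data.
        data = "\x1f".join("NULL" if v is None else str(v) for v in row) + "\x1e"
        chunks[chunk_id] = (count + 1, zlib.crc32(data.encode("utf-8"), crc))
    return chunks


def diff_chunks(master: dict[int, tuple[int, int]],
                slave: dict[int, tuple[int, int]]) -> list[int]:
    """Chunk ids that differ or exist on only one side; these need a row-level recheck."""
    return sorted(k for k in master.keys() | slave.keys() if master.get(k) != slave.get(k))
```
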
Log Management

  Log management maintains slow_log and killed_sql (SQL killed under specific rules is written to a designated log file).

  • Shows the top-N slow queries for each business, with the corresponding slow-query analysis (pt-query-digest).
  • When an instance's slow queries exceed a threshold, an alarm message is triggered to promptly notify the DBA and the relevant developers.
  • Shows the top-N killed SQL statements.
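
  Grouping slow queries by a normalized "fingerprint" is what makes a top-N list meaningful, since the same statement with different literals should count once. pt-query-digest does this properly; the regex-based `fingerprint` below is only a rough approximation for illustration, and the per-instance threshold check assumes slow-query counts have already been collected.

```python
import re
from collections import Counter
from typing import Iterable

_STRING = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"")  # quoted literals
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")                        # numeric literals
_IN_LIST = re.compile(r"\(\s*\?(?:\s*,\s*\?)*\s*\)")              # (?, ?, ...) lists
_SPACE = re.compile(r"\s+")


def fingerprint(sql: str) -> str:
    """Replace literals with '?', collapse IN lists and whitespace, lowercase the rest."""
    s = _STRING.sub("?", sql)
    s = _NUMBER.sub("?", s)
    s = _IN_LIST.sub("(?+)", s)
    return _SPACE.sub(" ", s).strip().lower()


def top_n(slow_queries: Iterable[str], n: int = 10) -> list[tuple[str, int]]:
    """Most frequent slow-query fingerprints with their counts."""
    return Counter(fingerprint(q) for q in slow_queries).most_common(n)


def over_threshold(slow_counts: dict[str, int], threshold: int) -> list[str]:
    """Instances whose slow-query count exceeds the alarm threshold."""
    return sorted(inst for inst, count in slow_counts.items() if count > threshold)
```
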
Metadata Management

  Metadata management covers binlog metadata, primary-key overflow checks, fragmentation information, and so on.

  • Binlog metadata management mainly records the start and end time of every binlog file on each instance. When binlogs are retained for a long time, this makes it quick to locate the right log during data recovery.
  • The primary-key overflow check finds tables whose auto-increment primary key has reached a critical value in time to act, avoiding failures where inserts are rejected because the auto-increment key overflowed.
  • Given a database name, table name, or key fragment, the owning instance can be located quickly, which makes finding instances more efficient.
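
  Both the overflow check and the binlog lookup reduce to simple arithmetic on collected metadata. This sketch assumes the primary-key type and current AUTO_INCREMENT value have been gathered per table (for example from `information_schema.COLUMNS` and `information_schema.TABLES`), and that the binlog metadata table stores each file's first and last event time; the 80% threshold and all names are hypothetical.

```python
from datetime import datetime
from typing import Iterable

# Maximum values of MySQL integer column types: (signed, unsigned).
INT_MAX = {
    "tinyint": (127, 255),
    "smallint": (32767, 65535),
    "mediumint": (8388607, 16777215),
    "int": (2147483647, 4294967295),
    "bigint": (9223372036854775807, 18446744073709551615),
}


def pk_usage(data_type: str, unsigned: bool, auto_increment: int) -> float:
    """Fraction of the column type's range already consumed by AUTO_INCREMENT."""
    signed_max, unsigned_max = INT_MAX[data_type.lower()]
    return auto_increment / (unsigned_max if unsigned else signed_max)


def overflow_candidates(tables: Iterable[tuple[str, str, bool, int]],
                        threshold: float = 0.8) -> list[str]:
    """tables: (schema.table, data_type, unsigned, auto_increment) tuples."""
    return [name for name, dtype, unsigned, ai in tables
            if pk_usage(dtype, unsigned, ai) >= threshold]


def binlogs_for_window(binlogs: Iterable[tuple[str, datetime, datetime]],
                       start: datetime, end: datetime) -> list[str]:
    """Binlog files whose [first event, last event] range overlaps [start, end]."""
    return [name for name, first, last in binlogs if first <= end and last >= start]
```
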
Routine Maintenance

  Routine maintenance mainly addresses low-frequency but time-consuming manual operations: viewing certain parameters across instances in bulk, batch-modifying configuration, and emergency binlog recovery.

  • Batch SQL execution, restricted to maintenance SQL such as modifying or reading the value of a parameter; DML is not allowed.
  • Batch editing of configuration files, e.g. adjusting the slow-query time threshold.
  • Binlog parsing, based on the open-source binlog2sql: given a database name, table name, and time range, the binlog metadata is used to find the relevant binlog files, which are parsed into text files that can be viewed and downloaded from the page.
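
  Restricting batch SQL to maintenance statements is easiest with a whitelist rather than a DML blacklist. The patterns below are a hypothetical, deliberately narrow whitelist (SHOW VARIABLES/STATUS, `SELECT @@var`, `SET GLOBAL`); anything else, including all DML and multi-statement input, is rejected. Splitting on `;` is a simplification that also rejects legitimate statements containing semicolons inside string literals.

```python
import re

# Hypothetical whitelist: read or change server variables, never table data.
_ALLOWED = [
    re.compile(r"^\s*show\s+(global\s+|session\s+)?(variables|status)\b", re.I),
    re.compile(r"^\s*select\s+@@[\w.]+(\s*,\s*@@[\w.]+)*\s*$", re.I),
    re.compile(r"^\s*set\s+global\s+\w+\s*=", re.I),
]


def is_allowed(sql: str) -> bool:
    """True only for a single statement matching one of the whitelisted forms."""
    statements = [s for s in sql.split(";") if s.strip()]
    if len(statements) != 1:
        return False  # one statement per request; also blocks "SET ...; DROP ..."
    return any(pattern.match(statements[0]) for pattern in _ALLOWED)
```
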
Data Operations

  By analyzing the operational data accumulated over time (instance data size, memory size, etc.) together with the characteristics of each business scenario, we derive growth trends and cost accounting.

  • Trend analysis shows overall database disk-space and memory utilization, along with growth curves for core businesses, helping DBAs allocate machine resources.
  • Cost accounting computes each business's resource usage and share of the total cost, providing a reference for business decision-makers.
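
  A growth trend and a "days until full" estimate can come from an ordinary least-squares line through the accumulated daily samples, and a business's cost share is its usage divided by the total. This is a minimal sketch with hypothetical inputs (days since the first sample, used GB per business); on Python 3.10+ `statistics.linear_regression` computes the same fit.

```python
from typing import Optional, Sequence


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def days_until(capacity: float, days: Sequence[float], usage: Sequence[float]) -> Optional[float]:
    """Days after the last sample until the fitted trend reaches `capacity` (None if not growing)."""
    a, b = linear_fit(days, usage)
    if b <= 0:
        return None
    return (capacity - a) / b - days[-1]


def cost_shares(usage_by_business: dict[str, float]) -> dict[str, float]:
    """Each business's fraction of the total resource usage."""
    total = sum(usage_by_business.values())
    return {name: used / total for name, used in usage_by_business.items()}
```
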
HA Management

  High-availability management includes health checks, failover, automatic switchover, and status monitoring.

  • A complete RESTful API (with a separated front end and back end) manages clusters and instances.
  • During failover, missing MySQL data is filled in two ways: by parsing and applying relay logs, and based on GTID.
  • Both active switchover and failover complete within a few seconds.
  • After failover, client access can be redirected via VIP, DNS, middleware, etc.; other methods of controlling client access can also be used.
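
  Choosing which slave to promote usually starts from how far each one has applied the replication stream. Assuming GTID-based replication, the sketch below parses plain `uuid:interval[:interval...]` GTID sets (as reported by `@@global.gtid_executed`) and picks the slave with the most executed transactions; the slave names are hypothetical. A real tool should also confirm the candidate's set contains the others (MySQL's `GTID_SUBSET()`), and `GTID_SUBTRACT()` shows which transactions still have to be filled in.

```python
def parse_gtid_set(gtid_set: str) -> dict[str, list[tuple[int, int]]]:
    """Parse 'uuid:1-5:7,uuid2:1-3' into {uuid: [(1, 5), (7, 7)], uuid2: [(1, 3)]}."""
    result: dict[str, list[tuple[int, int]]] = {}
    for part in gtid_set.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        intervals = result.setdefault(uuid.lower(), [])
        for r in ranges:
            lo, _, hi = r.partition("-")
            intervals.append((int(lo), int(hi or lo)))  # "7" means the interval 7-7
    return result


def transaction_count(gtid_set: str) -> int:
    """Number of transactions covered by a GTID set."""
    return sum(hi - lo + 1
               for intervals in parse_gtid_set(gtid_set).values()
               for lo, hi in intervals)


def pick_candidate(slaves: dict[str, str]) -> str:
    """Slave whose executed GTID set covers the most transactions."""
    return max(slaves, key=lambda name: transaction_count(slaves[name]))
```
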
Other

  On the operations side, we still need to implement second-level monitoring, log inspection, instance inspection, horizontal instance splitting, and other functions. For developers, we need to improve database performance diagnostics and automatic slow-query analysis.
  In terms of user interaction, the control platform is used mainly by DBAs, but the system's ultimate target is the developers on the business side. How to increase effective use of the system, and what it brings developers in delivery and maintenance, are questions we also need to think through as these goals are put into practice.

Reference Documents

https://cloud.tencent.com/developer/article/1173882

Source: blog.csdn.net/ManWZD/article/details/104102894