MySQL High-Availability Management Tool: orchestrator

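orchestrator is a management and visualization tool for MySQL high-availability replication topologies. It provides:

1) Discovery:

orchestrator actively crawls the topology and reads basic MySQL information such as replication status and configuration.

2) Refactoring:

An unavailable server can be removed from the topology, and its replicas can be moved under another replica.

3) Recovery:

When the master becomes unavailable, a suitable replica can be chosen and promoted to master.

4) Visualization:

A web UI visualizes the topology and lets you change the replication topology directly from the interface.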

I. Environment:

OS:

        CentOS 7.2

Three hosts:

        master:192.168.89.100

        slave1:192.168.89.102

        slave2:192.168.89.103

Backend ports:

        orchestrator backend MySQL instance port: 3306

        managed MySQL instance port: 3307

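II. Installation:

Each orchestrator node requires its own backend MySQL instance (a 1:1 relationship). IPv6 must be disabled when raft is enabled, and SELinux should be turned off.

1) Install the EPEL repository: yum -y install epel-release

2) Update the packages on the system: yum -y update

3) Install dependencies: yum install -y jq oniguruma oniguruma-devel

4) Download the server and client packages: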

      wget https://github.com/github/orchestrator/releases/download/v3.0.14/orchestrator-3.0.14-1.x86_64.rpm
      wget https://github.com/github/orchestrator/releases/download/v3.0.14/orchestrator-client-3.0.14-1.x86_64.rpm

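5) Install: yum localinstall -y orchestrator*.rpm

6) After installation, the orchestrator directory is created automatically under /usr/local. It contains:

orchestrator: the executable.

*.conf.json: configuration file templates.

resources: web UI and Pseudo-GTID related files.

7) Use the bundled template as the configuration file (run in /usr/local/orchestrator): cp -a orchestrator-sample.conf.json orchestrator.conf.json

8) Configuration parameters: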

{
  "Debug": true,     #debug mode, verbose output
  "EnableSyslog": false,   #whether to also write to syslog
  "ListenAddress": ":3000",   #orchestrator HTTP listen address (web UI port)
  "MySQLTopologyUser": "failover",   #account on the managed MySQL instances; must exist on every instance (port 3307 here)
  "MySQLTopologyPassword": "123456",   #password
  "MySQLTopologyCredentialsConfigFile": "",   #credentials file; user and password can be stored there and read from it
  "MySQLTopologySSLPrivateKeyFile": "",   #SSL key file
  "MySQLTopologySSLCertFile": "",
  "MySQLTopologySSLCAFile": "",
  "MySQLTopologySSLSkipVerify": true,   #skip certificate verification
  "MySQLTopologyUseMutualTLS": false,   #whether to use mutual TLS
  "MySQLOrchestratorHost": "127.0.0.1",   #host of orchestrator's backend MySQL; the local machine's IP also works
  "MySQLOrchestratorPort": 3306,   #port of orchestrator's backend MySQL (3306 here)
  "MySQLOrchestratorDatabase": "orchestrator",   #name of the database holding orchestrator metadata
  "MySQLOrchestratorUser": "root",   #account used to manage the orchestrator database
  "MySQLOrchestratorPassword": "123456",  #password
  "MySQLOrchestratorCredentialsConfigFile": "",
  "MySQLOrchestratorSSLPrivateKeyFile": "",
  "MySQLOrchestratorSSLCertFile": "",
  "MySQLOrchestratorSSLCAFile": "",
  "MySQLOrchestratorSSLSkipVerify": true,
  "MySQLOrchestratorUseMutualTLS": false,
  "MySQLConnectTimeoutSeconds": 1,   #timeout in seconds for orchestrator connecting to MySQL
  "DefaultInstancePort": 3307,  #default port of the managed MySQL instances that serve traffic (3307 here)
  "DiscoverByShowSlaveHosts": true,   #discover replicas via SHOW SLAVE HOSTS
  "InstancePollSeconds": 5,   #interval in seconds between orchestrator polls of each MySQL instance

  "SkipMaxScaleCheck": true,    #set to true when there is no MaxScale binlog server in the topology

  "UnseenInstanceForgetHours": 240,
  "SnapshotTopologiesIntervalHours": 0,
  "InstanceBulkOperationsWaitTimeoutSeconds": 10,
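  "HostnameResolveMethod": "none",  #hostname resolution; default resolves, none disables resolution
  "MySQLHostnameResolveMethod": "@@hostname",
  "SkipBinlogServerUnresolveCheck": true,  #skip the binlog server unresolve check
  "ExpiryHostnameResolvesMinutes": 60,  #minutes after which a cached hostname resolution expires
  "RejectHostnameResolvePattern": "",   #regular expression of hostnames that are rejected
  "ReasonableReplicationLagSeconds": 10,    #replication lag above 10 seconds is treated as a problem
  "ProblemIgnoreHostnameFilters": [],  #hosts matching these regex filters get minimal problem display
  "VerifyReplicationFilters": false,  #check replication filters before refactoring the topology
  "ReasonableMaintenanceReplicationLagSeconds": 20,   #lag threshold above which move-up and move-below are blocked
  "CandidateInstanceExpireMinutes": 60,  #minutes after which a candidate-instance suggestion expires
  "AuditLogFile": "",  #audit log file
  "AuditToSyslog": false,  #whether to send the audit log to syslog
  "RemoveTextFromHostnameDisplay": ":3306",  #text stripped from hostnames shown on the cluster pages
  "ReadOnly": true,  #global read-only mode
  "AuthenticationMethod": "",  #authentication mode
  "HTTPAuthUser": "",  #HTTP auth user name
  "HTTPAuthPassword": "",  #HTTP auth password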
  "AuthUserHeader": "",
  "PowerAuthUsers": [
    "*"
  ],
  "ClusterNameToAlias": {
    "127.0.0.1": "test suite"
  },
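  "SlaveLagQuery": "",  #custom lag query; empty means lag is taken from SHOW SLAVE STATUS
  "DetectClusterAliasQuery": "SELECT SUBSTRING_INDEX(@@hostname, '.', 1)",  #query that returns the cluster alias
  "DetectClusterDomainQuery": "",  #query that returns the cluster domain
  "DetectInstanceAliasQuery": "",
  "DetectPromotionRuleQuery": "",
  "DataCenterPattern": "[.]([^.]+)[.][^.]+[.]mydomain[.]com",   #regex that extracts the data center name from the hostname
  "PhysicalEnvironmentPattern": "[.]([^.]+[.][^.]+)[.]mydomain[.]com",  #regex that extracts the instance's physical environment
  "PromotionIgnoreHostnameFilters": [],
  "DetectSemiSyncEnforcedQuery": "",  #query that determines whether semi-sync is fully enforced for master writes
  "ServeAgentsHttp": false,  #whether to start a separate HTTP interface for agents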
  "AgentsServerPort": ":3001",
  "AgentsUseSSL": false,
  "AgentsUseMutualTLS": false,
  "AgentSSLSkipVerify": false,
  "AgentSSLPrivateKeyFile": "",
  "AgentSSLCertFile": "",
  "AgentSSLCAFile": "",
  "AgentSSLValidOUs": [],
  "UseSSL": false,
  "UseMutualTLS": false,
  "SSLSkipVerify": false,
  "SSLPrivateKeyFile": "",
  "SSLCertFile": "",
  "SSLCAFile": "",
  "SSLValidOUs": [],
  "URLPrefix": "",
  "StatusEndpoint": "/api/status",
  "StatusSimpleHealth": true,
  "StatusOUVerify": false,
  "AgentPollMinutes": 60,
  "UnseenAgentForgetHours": 6,
  "StaleSeedFailMinutes": 60,
  "SeedAcceptableBytesDiff": 8192,
  "PseudoGTIDPattern": "",  #empty disables Pseudo-GTID
  "PseudoGTIDPatternIsFixedSubstring": false,
  "PseudoGTIDMonotonicHint": "asc:",
  "DetectPseudoGTIDQuery": "",
  "BinlogEventsChunkSize": 10000,  #chunk size when reading binary log events; smaller means less locking
  "SkipBinlogEventsContaining": [],
  "ReduceReplicationAnalysisCount": true,
  "FailureDetectionPeriodBlockMinutes": 60,   #a failure detected within this period is not reported again
  "RecoveryPeriodBlockSeconds": 3600,   #a failure recovered within this period does not trigger another failover
  "RecoveryIgnoreHostnameFilters": [],  #hosts that recovery ignores
  "RecoverMasterClusterFilters": [   #master recovery runs only on clusters matching these filters
    "*"
  ],
  "RecoverIntermediateMasterClusterFilters": [   #intermediate master recovery runs only on clusters matching these filters
    "*"
  ],
  "OnFailureDetectionProcesses": [   #hooks run when a failure is detected, before any failover
    "echo 'Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}  autoMasterRecovery:  {autoMasterRecovery}  losthost: {lostSlaves}  slavehost: {slaveHosts}   orchestratorHost: {orchestratorHost}' >> /tmp/recovery.log"
  ],
  "PreGracefulTakeoverProcesses": [   #hooks run before the master is switched to read-only
    "echo 'Planned takeover about to take place on {failureCluster}. Master will switch to read_only autoMasterRecovery:  {autoMasterRecovery}  losthost: {lostSlaves}  slavehost: {slaveHosts}   orchestratorHost: {orchestratorHost}' >> /tmp/recovery.log"
  ],
  "PreFailoverProcesses": [  #hooks run right before the recovery actions
    "echo 'Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
  ],
  "PostFailoverProcesses": [   #hooks run after any successful recovery
    "echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostUnsuccessfulFailoverProcesses": [],  #hooks run after any unsuccessful recovery
  "PostMasterFailoverProcesses": [  #hooks run after a successful master recovery
    "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostIntermediateMasterFailoverProcesses": [  #hooks run after a successful intermediate master recovery
    "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostGracefulTakeoverProcesses": [   #hooks run after the old master has been placed under the new master
    "echo 'Planned takeover complete' >> /tmp/recovery.log"
  ],
  "CoMasterRecoveryMustPromoteOtherCoMaster": true,  #recovery must promote the other co-master, otherwise it fails
  "DetachLostSlavesAfterMasterFailover": true,  #recovery may lose some replicas; lost replicas are detached
  "ApplyMySQLPromotionAfterMasterFailover": true,  #set read_only=0 on the promoted server
  "PreventCrossDataCenterMasterFailover": false,  #false allows failover across data centers
  "MasterFailoverDetachSlaveMasterHost": false,   #when true, makes sure the new master does not replicate from the old master
  "MasterFailoverLostInstancesDowntimeMinutes": 0,  #minutes of downtime set on servers lost in a master failover
  "PostponeSlaveRecoveryOnLagMinutes": 0,
  "OSCIgnoreHostnameFilters": [],
  "GraphiteAddr": "",
  "GraphitePath": "",
  "GraphiteConvertHostnameDotsToUnderscores": true,

  "RaftEnabled": true,   #enable raft mode
  "BackendDB": "mysql",  #backend database type
  "RaftBind": "192.168.89.103",  #bind address, this node's own IP
  "RaftDataDir": "/var/lib/orchestrator",  #data directory; created automatically if it does not exist
  "DefaultRaftPort": 10008,  #raft communication port; must be identical on all machines
  "RaftNodes": [   #raft nodes; must list every node
    "192.168.89.100",
    "192.168.89.102",
    "192.168.89.103"
    ],

  "ConsulAddress": "",
  "ConsulAclToken": ""
}
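
Standard JSON has no comment syntax, so the annotated listing above is presumably meant as documentation rather than a file that can be loaded as-is. A minimal sketch of one way to strip the trailing `#` annotations follows. The function name `strip_annotations` and the file name `annotated.conf.json` are made up for illustration, and the sketch assumes that no annotation contains a double quote.

```shell
#!/bin/sh
# Hypothetical helper: remove trailing "#..." annotations from each line.
# Assumption: an annotation never contains a double quote, so a '#' inside
# a quoted JSON value followed by a closing quote is left untouched.
strip_annotations() {
    sed -e 's/[[:space:]]*#[^"]*$//'
}
```

It could be used as `strip_annotations < annotated.conf.json > orchestrator.conf.json`, and the result checked with `jq . orchestrator.conf.json`, since jq was installed as a dependency earlier.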
 

III. Startup:

1) Create the management account on every managed MySQL instance (port 3307 on these hosts):

GRANT ALL ON *.* TO 'failover'@'127.0.0.1' IDENTIFIED BY '123456'; 

GRANT ALL ON *.* TO 'failover'@'192.168.89.%' IDENTIFIED BY '123456'; 

Create the management account on the backend MySQL of every orchestrator node:

GRANT ALL ON *.* TO 'root'@'127.0.0.1' IDENTIFIED BY '123456'; 
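
The `GRANT ... IDENTIFIED BY` form used above was removed in MySQL 8.0, where the account has to be created with `CREATE USER` first. The sketch below prints the equivalent pair of statements. The function name `grant_sql` is made up for illustration, and it keeps the broad `GRANT ALL` from this article rather than a narrower privilege set.

```shell
#!/bin/sh
# Hypothetical helper: print CREATE USER + GRANT statements that match the
# GRANT ... IDENTIFIED BY statements above, in a form MySQL 8.0 accepts.
grant_sql() {
    user=$1
    host=$2
    password=$3
    printf "CREATE USER IF NOT EXISTS '%s'@'%s' IDENTIFIED BY '%s';\n" "$user" "$host" "$password"
    printf "GRANT ALL ON *.* TO '%s'@'%s';\n" "$user" "$host"
}
```

For example, `grant_sql failover '192.168.89.%' 123456 | mysql -uroot -p -h127.0.0.1 -P3307` would apply it to a managed instance on this setup.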

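2) Set up the replication topology. The details are omitted here; it is enough to configure the three hosts as one replication structure.

3) Edit the hosts file and add the hostnames of all three machines.

4) Start orchestrator:

        nohup orchestrator http &

orchestrator automatically looks for its configuration file under /usr/local/orchestrator.

5) Open the web UI at IP:3000. orchestrator discovers the replication topology automatically and displays it in the UI. If nothing is discovered, instances can be discovered manually:

orchestrator-client -c discover -i instance:port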
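
The hosts-file step can be scripted so that it is safe to run more than once. This is a minimal sketch: the function name `add_host_entry` is made up, and the hostnames `master`, `slave1`, and `slave2` are assumptions taken from the labels in the environment section, since the article does not state the real hostnames.

```shell
#!/bin/sh
# Hypothetical helper: append "ip name" to a hosts file unless a line
# already maps that exact IP to that exact name.
add_host_entry() {
    ip=$1
    name=$2
    file=${3:-/etc/hosts}
    if ! awk -v ip="$ip" -v name="$name" \
        '$1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 } END { exit !found }' \
        "$file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}
```

Run as root on each machine, for example `add_host_entry 192.168.89.100 master`, then the same for `192.168.89.102 slave1` and `192.168.89.103 slave2`.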
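
orchestrator-client talks to the HTTP API and reads its endpoint from the `ORCHESTRATOR_API` environment variable, which may hold several space-separated URLs in a raft setup. The sketch below builds that value for the three nodes in this article. The function name `orchestrator_api_urls` is made up, and plain `http` on port 3000 is assumed from the `ListenAddress` and `UseSSL` settings shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: build a space-separated list of API URLs,
# one per orchestrator node, from a port and a list of hosts.
orchestrator_api_urls() {
    port=$1
    shift
    urls=""
    for host in "$@"; do
        urls="${urls:+$urls }http://${host}:${port}/api"
    done
    printf '%s\n' "$urls"
}

ORCHESTRATOR_API="$(orchestrator_api_urls 3000 192.168.89.100 192.168.89.102 192.168.89.103)"
export ORCHESTRATOR_API
```

With the variable exported, `orchestrator-client -c discover -i 192.168.89.100:3307` registers the master, and `orchestrator-client -c topology -i 192.168.89.100:3307` should then print the discovered tree.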

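orchestrator combines well with the ProxySQL middleware for read/write splitting and high availability. In fact, Percona also recommends this combination. mysqlfailover + ProxySQL is another good option, although mysqlfailover requires custom scripts of your own.

Our production environment currently runs orchestrator + ProxySQL.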

Reposted from blog.csdn.net/baijiu1/article/details/89395654