Technical Guide for Operations Engineers: MySQL

1. Fault-type problems

Have you run into any faults in your daily work?
Which failures have you dealt with in your daily work?
Do you usually work on live (production) systems?

Fault 01: Choosing the wrong MySQL software version

Behind every story there is an incident; every fault gets summed up as a story.

Software version: the 64-bit vs. 32-bit build was chosen incorrectly.
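The fix for Fault 01 is to match the download to the host architecture, the same information `uname -m` prints. A minimal Python sketch, assuming the machine names below (an illustrative, not exhaustive, mapping); `host_bits` is a hypothetical helper name.

```python
import platform

# Machine names as reported by platform.machine() / `uname -m`.
# This mapping is an illustrative assumption, not an exhaustive list.
BITS_BY_MACHINE = {
    "x86_64": 64, "amd64": 64, "aarch64": 64, "arm64": 64,
    "i386": 32, "i686": 32, "x86": 32,
}

def host_bits(machine=None):
    """Return 32 or 64 for the given machine type (default: this host)."""
    name = (machine or platform.machine()).lower()
    if name not in BITS_BY_MACHINE:
        raise ValueError(f"unknown machine type: {name!r}")
    return BITS_BY_MACHINE[name]
```

Calling `host_bits()` with no argument inspects the current host; passing a string lets the same check run against an inventory list.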

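Fault 02: Installation faults

1. The mysql user was not created, or directory permissions were not set

2. Initialization (mysqld --initialize / mysql_install_db) fails

The local MariaDB was not uninstalled and /etc/my.cnf was not removed, which makes initialization fail.

3. The configuration file does not match the actual setup (basedir, datadir)

An inspiring story belongs here, with some personal feeling: how hard it was to start learning MySQL, and how I kept at it.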

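4. After a reboot, the data disk was not mounted
/dev/sdb ----> /data was not mounted automatically, so mysql could not start.
Troubleshooting approach: read the error messages in the error log; at the time I was not very familiar with the logs.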

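5. MySQL upgrade fails: 5.5 ---> 8.0
At the time I didn't know about these MySQL issues and paid no attention to them. My manager asked me to test an 8.0 environment, so I downloaded version 8.0.11 and installed it on a test server. I exported part of the data from the core production tables with the MDP tool and restored it into the test environment, but when the application was tested against it, it could not connect at all. Only after reading the official "What Is New in MySQL 8.0" documentation did I understand that you cannot jump a major version.

Troubleshooting approach:

The main cause is the differences between versions: 8.0 reworked the data dictionary, and password and user management changed dramatically (for example, the default authentication plugin). Upgrade to 5.7 first, then to 8.0, and check sql_mode and data type differences.

This problem had me stuck for an hour, and we were due to go live the following week. With nobody to help me, I kept searching for material online until almost 3 a.m. before it was solved. I had only recently graduated from university and always felt far behind everyone else, so I was especially driven: whenever I didn't know something, I kept at it until I did. Looking back, I have no regrets. This industry has given me a lot, and I came to believe I could do it. After so many late nights reading documentation, the moment a problem gets solved makes me genuinely happy, and these days I'm gradually no longer intimidated when a fault occurs.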
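Two checks from items 4 and 5 can be scripted before starting or upgrading mysqld. The sketch below is a hedged illustration, not the author's tooling: `datadir_ready`, `upgrade_path`, and the default paths are hypothetical. The one-series-at-a-time path (which also passes through 5.6) is an assumption based on MySQL's guidance against skipping release series in an in-place upgrade, consistent with the "5.7 first, then 8.0" advice above.

```python
import os

# Release series in upgrade order. Stepping through every series is an
# assumption based on MySQL's guidance not to skip series in-place.
SERIES = ["5.5", "5.6", "5.7", "8.0"]

def upgrade_path(current, target):
    """Return the series to pass through, e.g. 5.5 -> 5.6 -> 5.7 -> 8.0."""
    i, j = SERIES.index(current), SERIES.index(target)
    if j < i:
        raise ValueError("downgrades are not covered by this sketch")
    return SERIES[i:j + 1]

def datadir_ready(mount_point="/data", datadir="/data/mysql"):
    """Return False when the data disk is not mounted or datadir is missing.

    If /dev/sdb was not mounted after a reboot, /data is just a directory
    on the root filesystem, so os.path.ismount() returns False.
    """
    return os.path.ismount(mount_point) and os.path.isdir(datadir)
```

A start script could refuse to launch mysqld when `datadir_ready()` is False, instead of letting it fail or initialize an empty datadir on the root disk.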

Fault 03: Cannot connect to the database

1. Network down, firewall

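1. Network unreachable:
    Broken network cable (chewed by rats, crushed by a cabinet, unplugged by someone, haha...)
    NIC, bond, switch, router, loops, network traffic at full load
    Approach: monitoring

2. Firewall: I mistyped a firewall rule; one wrong rule right at the start made the internal servers unreachable.
        Fortunately this happened in the test environment. A good habit is to try every tuning or configuration change in the test environment first.

3. mysqld not started; wrong port or IP

4. The client tool on the application side is too old
https://downloads.mysql.com/archives/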

5. Connection count (499)
    Redis cache avalanche, cache penetration
    Logs
    show processlist;
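Items 3 and 5 lend themselves to quick scripted checks: whether anything is listening on the MySQL port, and who is holding connections in `show processlist`. A minimal Python sketch; 3306 is only the default port, the function names are hypothetical, and the rows are assumed to be fetched already by some client library.

```python
import socket
from collections import Counter

def port_open(host, port=3306, timeout=2.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds.

    False points at the network, a firewall rule, mysqld not running, or the
    wrong IP/port; it says nothing about credentials or client versions.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or DNS failure
        return False

def summarize_processlist(rows):
    """Count connections per user and per Command.

    `rows` are tuples in SHOW PROCESSLIST column order:
    (Id, User, Host, db, Command, Time, State, Info).
    """
    by_user = Counter(row[1] for row in rows)
    by_command = Counter(row[4] for row in rows)
    return by_user, by_command
```

A large count of `Sleep` connections from one application user usually points at the connection pool rather than at MySQL itself.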

Problem 04: Configuration file issues

Problem 05: Multiple instances

Problem 06: sql_mode (GROUP BY, time types) during migration and upgrade

Problem 07: Data types

Problem 08: Character sets: garbled text

Problem 09: Collation issues

Problem 10: Mistaken UPDATE: binlog2sql

Problem 11: DDL hangs the live database

show processlist;
pt-osc
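pt-osc (pt-online-schema-change from Percona Toolkit) avoids a long blocking ALTER by building an altered copy of the table, keeping it in sync with triggers, copying rows in small chunks, and finally swapping the tables with a rename. Before running DDL, `show processlist;` is also where sessions stuck in `Waiting for table metadata lock` show up. The toy sketch below uses Python's built-in sqlite3 as a stand-in for MySQL to show only the shadow-table idea; it is not pt-osc's code, and `online_alter`, its `on_chunk` hook, and the `_<table>_new` naming are hypothetical.

```python
import sqlite3

def online_alter(conn, table, new_ddl, chunk_size=1000, on_chunk=None):
    """Toy shadow-table migration in the style of pt-online-schema-change.

    Assumes `table` has a positive INTEGER PRIMARY KEY `id` and that
    `new_ddl` creates `_<table>_new` with the altered schema.
    """
    new = f"_{table}_new"
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    col_list = ", ".join(cols)
    new_vals = ", ".join(f"NEW.{c}" for c in cols)

    # 1. Create the empty shadow table with the new schema.
    conn.execute(new_ddl)
    # 2. Triggers mirror concurrent writes on the original into the copy.
    conn.executescript(f"""
        CREATE TRIGGER {table}_osc_ins AFTER INSERT ON {table} BEGIN
            INSERT OR REPLACE INTO {new} ({col_list}) VALUES ({new_vals});
        END;
        CREATE TRIGGER {table}_osc_upd AFTER UPDATE ON {table} BEGIN
            DELETE FROM {new} WHERE id = OLD.id;
            INSERT OR REPLACE INTO {new} ({col_list}) VALUES ({new_vals});
        END;
        CREATE TRIGGER {table}_osc_del AFTER DELETE ON {table} BEGIN
            DELETE FROM {new} WHERE id = OLD.id;
        END;
    """)
    # 3. Copy existing rows in primary-key chunks so no single statement
    #    runs long; rows already mirrored by a trigger are newer and are
    #    kept (INSERT OR IGNORE).
    last = 0
    while True:
        ids = conn.execute(
            f"SELECT id FROM {table} WHERE id > ? ORDER BY id LIMIT ?",
            (last, chunk_size)).fetchall()
        if not ids:
            break
        high = ids[-1][0]
        conn.execute(
            f"INSERT OR IGNORE INTO {new} ({col_list}) "
            f"SELECT {col_list} FROM {table} WHERE id > ? AND id <= ?",
            (last, high))
        conn.commit()
        if on_chunk:
            on_chunk(conn)  # hook to simulate application writes mid-copy
            conn.commit()
        last = high
    # 4. Drop the triggers and swap the tables in one transaction.
    conn.executescript(f"""
        BEGIN;
        DROP TRIGGER {table}_osc_ins;
        DROP TRIGGER {table}_osc_upd;
        DROP TRIGGER {table}_osc_del;
        ALTER TABLE {table} RENAME TO _{table}_old;
        ALTER TABLE {new} RENAME TO {table};
        DROP TABLE _{table}_old;
        COMMIT;
    """)
```

The real tool adds much more (foreign-key handling, load throttling, replica lag checks), which this sketch deliberately leaves out.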

Problem 12: Slow SELECT queries

It was fine one day and slow the next.
optimize table t1;
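The fix shown here, `optimize table`, targets fragmentation left by heavy deletes and updates. A rough way to decide which tables to rebuild is to read DATA_LENGTH, INDEX_LENGTH and DATA_FREE from information_schema.TABLES and compare free space with allocated space. The Python sketch below assumes those byte counts have already been fetched; the 20% threshold and the function names are hypothetical. For InnoDB, OPTIMIZE TABLE is carried out as a table rebuild, the same effect as `alter table t1 engine innodb;`.

```python
def fragmentation_ratio(data_length, index_length, data_free):
    """Share of a table's allocated space that is free (0.0 to 1.0).

    Inputs are the DATA_LENGTH, INDEX_LENGTH and DATA_FREE columns of
    information_schema.TABLES, in bytes.
    """
    allocated = data_length + index_length + data_free
    return data_free / allocated if allocated else 0.0

def rebuild_candidates(tables, threshold=0.2):
    """Rebuild statements for tables above `threshold` (hypothetical cut-off).

    `tables` maps table name -> (data_length, index_length, data_free).
    """
    return [
        f"ALTER TABLE {name} ENGINE=InnoDB;"
        for name, sizes in sorted(tables.items())
        if fragmentation_ratio(*sizes) > threshold
    ]
```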

Problem 13: Slow statements: the same stored procedure runs dozens of times a day

The slow log showed it was a stored procedure, executed dozens of times.

Problem 14: Mistaken DELETE

binlog2sql flashback: reverse the DELETE, just as for a mistaken UPDATE
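binlog2sql's flashback mode reads ROW-format binlog events and emits SQL that undoes them: a DELETE becomes an INSERT of the deleted row image, an INSERT becomes a DELETE, and an UPDATE writes the before image back, applied newest first. The Python sketch below shows only that inversion step on already-decoded events; the event dictionaries are a hypothetical format, not binlog2sql's internal one, and full row images (binlog_row_image=FULL) are assumed.

```python
def quote(value):
    """Render a Python value as a SQL literal (simplified escaping)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"

def where_clause(row):
    """Match a full row image column by column."""
    return " AND ".join(
        f"`{col}` IS NULL" if val is None else f"`{col}`={quote(val)}"
        for col, val in row.items())

def flashback(events):
    """Turn row events (in binlog order) into undo SQL, newest event first."""
    undo = []
    for ev in reversed(events):
        table = f"`{ev['table']}`"
        if ev["type"] == "DELETE":      # re-insert the deleted row image
            cols = ", ".join(f"`{c}`" for c in ev["before"])
            vals = ", ".join(quote(v) for v in ev["before"].values())
            undo.append(f"INSERT INTO {table} ({cols}) VALUES ({vals});")
        elif ev["type"] == "INSERT":    # remove the inserted row
            undo.append(f"DELETE FROM {table} WHERE {where_clause(ev['after'])} LIMIT 1;")
        elif ev["type"] == "UPDATE":    # write the before image back
            sets = ", ".join(f"`{c}`={quote(v)}" for c, v in ev["before"].items())
            undo.append(f"UPDATE {table} SET {sets} WHERE {where_clause(ev['after'])} LIMIT 1;")
    return undo
```

Generated undo SQL should be reviewed and applied in a test copy first; rows changed again after the mistake would make the WHERE clauses miss.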

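Problem 15: Indexing problems:

Too many redundant indexes; index columns that are too long (use prefix indexes); composite indexes (covering length, column order)
slowlog ----> explain ----> index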
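Two of the index checks above can be reasoned about with small functions: which leading columns of a composite index a WHERE clause can use (the leftmost-prefix rule), and how selective a prefix is, the same ratio as `SELECT COUNT(DISTINCT LEFT(col, n)) / COUNT(*)`. A simplified Python sketch that ignores optimizer features such as index condition pushdown and skip scan; the function names are hypothetical.

```python
def usable_index_prefix(index_columns, equality_columns, range_column=None):
    """Columns of a composite index that a WHERE clause can use.

    Walk the index left to right: equality columns extend the prefix,
    the first range column is used and then ends it, and any gap ends it.
    """
    used = []
    for col in index_columns:
        if col in equality_columns:
            used.append(col)
        elif col == range_column:
            used.append(col)
            break
        else:
            break
    return used

def prefix_selectivity(values, length):
    """Distinct prefixes / rows; close to 1.0 means the prefix is nearly as selective as the full value."""
    return len({v[:length] for v in values}) / len(values)
```

For an index on (a, b, c), `WHERE a = 1 AND c = 3` can only use `a`, which is why column order in a composite index matters.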

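Problem 16: Storage engine

1. Tablespace migration
2. Full backup every Saturday, binlog increments the rest of the week.
    After an unexpected power loss, the binlog and ibdata1 were corrupted
3. Defragmentation
    alter table t1 engine innodb;
4. Lock waits
5. Phantom reads, non-repeatable reads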
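With a Saturday full backup plus binlog increments, a point-in-time restore needs the most recent full backup and every binlog from there to the target time, so a single corrupted binlog breaks the chain. A Python sketch of picking that chain; the 02:00 backup start time and the binlog file names are hypothetical.

```python
from datetime import datetime, timedelta

SATURDAY = 5  # datetime.weekday(): Monday=0 ... Sunday=6

def last_full_backup(target, hour=2):
    """Start time of the most recent Saturday full backup at or before `target`.

    Assumes the weekly full backup starts at `hour`:00 (hypothetical schedule).
    """
    days_back = (target.weekday() - SATURDAY) % 7
    start = (target - timedelta(days=days_back)).replace(
        hour=hour, minute=0, second=0, microsecond=0)
    if start > target:          # this Saturday's backup has not run yet
        start -= timedelta(days=7)
    return start

def binlogs_needed(binlogs, start, end):
    """Binlog files covering [start, end].

    `binlogs` is a list of (name, first_event_time) in file order; a file
    is needed if it begins before `end` and the next file begins after `start`.
    """
    needed = []
    for i, (name, first_event) in enumerate(binlogs):
        next_first = binlogs[i + 1][1] if i + 1 < len(binlogs) else None
        if first_event <= end and (next_first is None or next_first > start):
            needed.append(name)
    return needed
```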

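Problem 17: Log failures

1. reset master / rm -rf on the binlogs broke the replica's IO thread
   and, if the database gets corrupted, it can no longer be fully recovered

2. GTID: --skip-gtids made the data unrecoverable

3. slowlog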

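Problem 18: Backup and recovery

1. mysqldump was run with --set-gtid-purged=OFF, so building the replica failed
2. --max_allowed_packet too small: errors when backing up large tables
3. -E -R --triggers were not added
4. Merging incremental backups failed.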
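The pitfalls above can be folded into one command line. A hedged Python sketch that only builds the argument list (it does not run mysqldump); `-R` and `-E` are the short forms of `--routines` and `--events`. The user, output path and 256M packet size are hypothetical, the password is expected from a prompt or an option file, and `--set-gtid-purged=ON` assumes GTIDs are enabled on the server.

```python
import shlex

def mysqldump_command(user, result_file, for_replica=True, max_packet="256M"):
    """Build a full logical backup command covering the pitfalls above.

    Values such as the user, file name and packet size are hypothetical.
    """
    return [
        "mysqldump",
        f"--user={user}",
        "--all-databases",
        "--single-transaction",               # consistent InnoDB snapshot
        "--routines",                         # -R: procedures and functions
        "--events",                           # -E: event scheduler events
        "--triggers",
        f"--max-allowed-packet={max_packet}", # room for large rows / BLOBs
        # ON writes SET @@GLOBAL.GTID_PURGED so a GTID replica can be built
        f"--set-gtid-purged={'ON' if for_replica else 'OFF'}",
        f"--result-file={result_file}",
    ]

print(" ".join(shlex.quote(arg) for arg in mysqldump_command("backup", "/backup/full.sql")))
```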

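Problem 19: Master-slave replication

1. Replication failure: IO / SQL threads, show slave status\G
2. Replication delay: delay time, difference in log volume
3. Master-slave inconsistency: replica crashed; pt tools
4. Delayed replica for recovering from logical errors
5. Filtered replication: only some databases were replicated, not the mysql database, so queries failed to connect or lacked privileges
6. GTID replication setup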
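For item 1, the fields worth checking in `show slave status\G` are Slave_IO_Running, Slave_SQL_Running, Seconds_Behind_Master and the Last_IO_Error / Last_SQL_Error messages. A minimal Python sketch that parses the vertical output; the 60-second lag threshold and function names are hypothetical, and newer releases (MySQL 8.0.22 and later) also offer `SHOW REPLICA STATUS` with Replica_*/Source_* field names, which this sketch does not handle.

```python
def parse_vertical(text):
    """Parse `\\G` output ("Field: value" per line) into a dict."""
    status = {}
    for line in text.splitlines():
        if line.lstrip().startswith("*"):   # "*** 1. row ***" separator
            continue
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status

def check_replica(status, max_lag=60):
    """Return a list of problems; an empty list means the replica looks healthy."""
    problems = []
    for thread in ("Slave_IO_Running", "Slave_SQL_Running"):
        if status.get(thread) != "Yes":
            problems.append(f"{thread}={status.get(thread)}")
    lag = status.get("Seconds_Behind_Master")
    if lag in (None, "NULL"):               # NULL when the SQL thread is stopped
        problems.append("Seconds_Behind_Master=NULL")
    elif int(lag) > max_lag:
        problems.append(f"lag {lag}s > {max_lag}s")
    for err in ("Last_IO_Error", "Last_SQL_Error"):
        if status.get(err):
            problems.append(f"{err}: {status[err]}")
    return problems
```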

Problem 20: High availability with MHA

Only the VIP feature was in place; the binlog server and failure alerting were missing
MHA + keepalived: priority (weight) issues

Problem 21: Distributed: Mycat

1. Poorly designed sharding method and sharding strategy
2. Cross-shard JOINs: global tables, ER tables
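A modulo rule is the simplest sharding strategy, and it shows why the strategy has to be designed up front: changing the shard count moves most rows. A Python sketch of the idea, not Mycat's configuration or code; function names are hypothetical, and CRC32 is used for string keys because Python's built-in `hash()` of strings is randomized per process.

```python
import zlib

def shard_for(key, shard_count):
    """Modulo rule: map an integer sharding key to shard 0 .. shard_count - 1."""
    return key % shard_count

def shard_for_text(key, shard_count):
    """Hash rule for string keys; CRC32 is stable across processes, unlike hash()."""
    return zlib.crc32(key.encode("utf-8")) % shard_count

def moved_fraction(keys, old_count, new_count):
    """Share of keys that land on a different shard after re-sharding."""
    keys = list(keys)
    moved = sum(shard_for(k, old_count) != shard_for(k, new_count) for k in keys)
    return moved / len(keys)

# Going from 3 to 4 shards relocates 9 of the 12 keys 0..11.
print(moved_fraction(range(12), 3, 4))  # 0.75
```

ER tables apply the same routing to related tables: child rows are sharded by the parent's key (for example, order items by order ID), so a parent-child JOIN stays on one shard, while small global tables are copied to every shard.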

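Problem 22: Optimization

1. innodb_flush_log_at_trx_commit=0 (faster commits, but about the last second of transactions can be lost in a crash)
2. sync_binlog=0 (binlog syncing left to the OS: faster, less durable)
3. innodb_flush_method=fsync uses a lot of extra memory (data is cached in both the buffer pool and the OS page cache);
   with SSDs, use O_DIRECT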
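Settings 1 and 2 trade crash safety for write speed; the usual fully durable setting is 1 for both ("double 1"). A small Python sketch that spells out what each combination risks, summarizing the documented behavior of the two variables; the function name is hypothetical.

```python
def durability(flush_log_at_trx_commit, sync_binlog):
    """Summarize what a crash can lose for a pair of settings.

    Returns (fully_durable, redo_description, binlog_description).
    """
    redo = {
        1: "redo log written and flushed to disk at every commit",
        2: "redo log written to the OS at commit, flushed about once per second: "
           "an OS crash or power loss can lose about the last second",
        0: "redo log written and flushed about once per second: "
           "even a mysqld crash can lose about the last second",
    }[flush_log_at_trx_commit]
    if sync_binlog == 1:
        binlog = "binlog synced to disk at every commit"
    elif sync_binlog == 0:
        binlog = "binlog syncing left to the operating system"
    else:
        binlog = f"binlog synced after every {sync_binlog} commit groups"
    fully_durable = flush_log_at_trx_commit == 1 and sync_binlog == 1
    return fully_durable, redo, binlog
```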

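2. Architecture

1. One master + one replica, read/write splitting with ProxySQL or MaxScale (50 GB)
8 cores, 32 GB
Thresholds:
Concurrent connections: 800-1000
Concurrent queries: 50K QPS
Concurrent transactions: 300 TPS
2. One master + three replicas + one delayed replica, read/write splitting (100-200 GB)
8 cores, 32 GB
Thresholds:
Concurrent connections: 800-1000
Concurrent queries: 80K QPS
Concurrent transactions: 200 TPS
3. One master + multiple replicas + cascading replication + filtered replication (300-500 GB)
8 cores, 32 GB
Thresholds:
Concurrent connections: 800-1000
Concurrent queries: 150K-200K QPS
Concurrent transactions: 200 TPS
4. MHA + ProxySQL, one master + three replicas
One master and two replicas run MHA + ProxySQL; the third replica is for disaster recovery
16 cores, 64 GB × 3 + 8 cores, 16 GB

Thresholds:
Concurrent connections: 1500-2000
Concurrent queries: 120K QPS
Concurrent transactions: 400 TPS

This architecture suits e-commerce platforms and logistics
2 TB of data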
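5. PXC + ProxySQL (MGC + MaxScale)
16 cores, 64 GB × 3 + 8 cores, 16 GB
Thresholds:
Concurrent connections: 1500-2000
Concurrent queries: 120K QPS
Concurrent transactions: 400 TPS

This architecture suits e-commerce platforms and logistics
2 TB of data
6. Mycat + MHA (PXC) × 3: highly available distributed cluster
16 cores, 128 GB × 7
Thresholds:
Concurrent connections: 3000-5000
Concurrent queries: 200K QPS
Concurrent transactions: 800-1000 TPS

Education industry (big data platform)
9 TB of data
7. Redis Sentinel + Docker
Redis Cluster + K8s
8. MongoDB replication
Insurance company
16 cores, 256 GB memory + 40 TB × 4 servers
About 20 TB of data + insurance policies
9. MongoDB Sharding + hash
Data at PB scale: bike sharing, Baidu Maps, JD.com, 360
16 cores, 128 GB × 9 servers × 40 TB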

3. Optimization

Lock waits
Indexes + slow log
Replacing InnoDB with TokuDB (MyRocks)

4. Upgrade and migration

1. Zabbix
2. MongoDB Sharding + hash
Data at PB scale: bike sharing, Baidu Maps, JD.com, 360
3-5 years of transaction records at a bank: 30 TB
16 cores, 128 GB × 9 servers × 40 TB

Interview questions:

  1. What does your company's architecture look like? What business does it do? What are its data volume, QPS, and TPS?
  2. Which products do your company's databases serve, with which architectures, and which service does each support?
  3. How do you handle failures?
  4. What kind of employee do you think we are looking for? Do you think you are that kind of person? Why should we hire you?
  5. How well do you know SQL?
  6. Do you understand indexes? How does a B-tree lookup work? What is the difference between a clustered index and a secondary index? How do you understand index tree height?
  7. Please describe the storage engine types you know.
  8. Please describe the differences between InnoDB, MyRocks, and TokuDB.
  9. InnoDB core features: transactions, ACID, locks, isolation levels, redo, undo, MVCC, phantom reads, dirty reads, non-repeatable reads. How do you defragment a table?
  10. What binary log formats are there? What operations have you performed on the binary log?
  11. How is your company's backup strategy designed? How long does a backup take? How does MDP achieve a hot backup, and what is the XBK backup principle?
    The whole database is large and one very small table is corrupted: how would you recover it quickly?
  12. How does master-slave replication work? How do you monitor it, troubleshoot it, and troubleshoot replication lag? Do you have any good suggestions for reducing replication delay?
    How is Seconds_Behind_Master calculated?
    What is a delayed replica used for? How does semi-synchronous replication work? Enhanced semi-synchronous (lossless) replication? How does MGR group replication work? What is the Paxos principle?
    How does GTID replication differ from traditional replication?
  13. How does MHA failover work? How does PXC work?
  14. What is your understanding of distributed architectures?
  15. Redis: persistence methods, data types, how do transactions work?
  16. How does Redis Sentinel high availability work? How does Redis Cluster work?
  17. How does a MongoDB replica set work (Raft distributed consensus protocol)? How does a sharded cluster work?
  18. What optimizations have you done?
  19. Which PT tools have you used?
  20. Which stress-testing tools have you used?
  21. How well do you know Oracle (installation, recovery, network management, file management, tablespaces, backup, DG, RAC)? Do you know PostgreSQL (installation, basic administration, backup, recovery, clustering)?
  22. Do you know TiDB?
  23. Do you know about cloud databases?
  24. Do you know Docker and K8s?


Origin www.cnblogs.com/linuxcx/p/11590180.html