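Ring ring, ring ring... "The server crashed and Redis won't start. Egan, could you take a look?" came an anxious voice over the phone.

"I'm still on my way. Can we talk later?" Egan said, and hung up.

Tick tock... a good while passed.

Ring ring, ring ring... "Are you at your computer yet? This morning we suddenly couldn't get into the server. We called Alibaba Cloud support and they told us to reboot it. After the boss rebooted the server, Redis wouldn't start."

"OK, got it. Could you send me the server account and password so I can look into it?" Egan replied. "Let's talk on QQ."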

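Below is roughly the short chat log:

Them: @Egan xxx.xx.xxx.xx is the server address. Do you know where Redis is installed?

Them: Server password: xxxxxxx

Egan: OK

Them: I can see it in Docker, but it won't start.

That was the first thing I looked at; starting it just does nothing.

Them: @Egan please take a look.

So Egan started troubleshooting.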

xxx@iZwz9isa7dpx6izf3br71hZ:~# docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                           PORTS               NAMES
26a714a7ef63        ubuntu:14.04        "/bin/bash"              19 months ago       Exited (0) 19 months ago                             proc_maraidb
4c38df7be60a        redis               "docker-entrypoint..."   19 months ago       Exited (1) 18 minutes ago                            redis_proc
47b893baeba2        ubuntu:14.04        "/bin/bash"              19 months ago       Exited (255) About an hour ago                       deadlock

Egan tried starting Redis himself as well:

xxx@iZwz9isa7dpx6izf3br71hZ:~# docker start redis_proc
redis_proc

No error showed up, but when he listed the running containers again, Redis still had not started:

xxx@iZwz9isa7dpx6izf3br71hZ:~# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                           PORTS               NAMES

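Next he checked the Docker logs. Out came a pile of output, mostly routine messages about Redis persistence (AOF rewrites); see below. The important part is in the last few lines, and the log even suggests the fix.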

xxx@iZwz9isa7dpx6izf3br71hZ:~# docker logs --tail 300 4c38df7be60a
196:C 19 Nov 06:26:47.253 * AOF rewrite: 1 MB of memory used by copy-on-write
1:M 19 Nov 06:26:47.285 * Background AOF rewrite terminated with success
1:M 19 Nov 06:26:47.285 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
1:M 19 Nov 06:26:47.285 * Background AOF rewrite finished successfully
1:M 20 Nov 01:17:02.547 * Starting automatic rewriting of AOF on 4810% growth
1:M 20 Nov 01:17:02.548 * Background append only file rewriting started by pid 197
1:M 20 Nov 01:17:02.668 * AOF rewrite child asks to stop sending diffs.
197:C 20 Nov 01:17:02.668 * Parent agreed to stop sending diffs. Finalizing AOF...
197:C 20 Nov 01:17:02.668 * Concatenating 0.01 MB of AOF diff received from parent.
197:C 20 Nov 01:17:02.673 * SYNC append only file rewrite performed
197:C 20 Nov 01:17:02.674 * AOF rewrite: 1 MB of memory used by copy-on-write
1:M 20 Nov 01:17:02.748 * Background AOF rewrite terminated with success
1:M 20 Nov 01:17:02.748 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
1:M 20 Nov 01:17:02.748 * Background AOF rewrite finished successfully
1:M 20 Nov 08:01:28.498 * Starting automatic rewriting of AOF on 3252% growth
1:M 20 Nov 08:01:28.500 * Background append only file rewriting started by pid 198
1:M 20 Nov 08:01:28.552 * AOF rewrite child asks to stop sending diffs.
198:C 20 Nov 08:01:28.552 * Parent agreed to stop sending diffs. Finalizing AOF...
198:C 20 Nov 08:01:28.552 * Concatenating 0.00 MB of AOF diff received from parent.
198:C 20 Nov 08:01:28.552 * SYNC append only file rewrite performed
198:C 20 Nov 08:01:28.553 * AOF rewrite: 0 MB of memory used by copy-on-write
1:M 20 Nov 08:01:28.600 * Background AOF rewrite terminated with success
1:M 20 Nov 08:01:28.600 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
1:M 20 Nov 08:01:28.600 * Background AOF rewrite finished successfully
1:M 21 Nov 03:09:02.099 * Starting automatic rewriting of AOF on 2841% growth
1:M 21 Nov 03:09:02.100 * Background append only file rewriting started by pid 199
1:M 21 Nov 03:09:02.139 * AOF rewrite child asks to stop sending diffs.
199:C 21 Nov 03:09:02.139 * Parent agreed to stop sending diffs. Finalizing AOF...
199:C 21 Nov 03:09:02.139 * Concatenating 0.00 MB of AOF diff received from parent.
199:C 21 Nov 03:09:02.139 * SYNC append only file rewrite performed
199:C 21 Nov 03:09:02.140 * AOF rewrite: 0 MB of memory used by copy-on-write
1:M 21 Nov 03:09:02.200 * Background AOF rewrite terminated with success
1:M 21 Nov 03:09:02.200 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
1:M 21 Nov 03:09:02.200 * Background AOF rewrite finished successfully
1:M 21 Nov 09:34:00.565 * Starting automatic rewriting of AOF on 6107% growth
1:M 21 Nov 09:34:00.567 * Background append only file rewriting started by pid 200
1:M 21 Nov 09:34:00.607 * AOF rewrite child asks to stop sending diffs.
200:C 21 Nov 09:34:00.607 * Parent agreed to stop sending diffs. Finalizing AOF...
200:C 21 Nov 09:34:00.607 * Concatenating 0.00 MB of AOF diff received from parent.
200:C 21 Nov 09:34:00.607 * SYNC append only file rewrite performed
200:C 21 Nov 09:34:00.608 * AOF rewrite: 0 MB of memory used by copy-on-write
1:M 21 Nov 09:34:00.667 * Background AOF rewrite terminated with success
1:M 21 Nov 09:34:00.667 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
1:M 21 Nov 09:34:00.667 * Background AOF rewrite finished successfully
1:M 22 Nov 05:52:48.083 * Starting automatic rewriting of AOF on 4632% growth
1:M 22 Nov 05:52:48.084 * Background append only file rewriting started by pid 201
1:M 22 Nov 05:52:48.126 * AOF rewrite child asks to stop sending diffs.
201:C 22 Nov 05:52:48.126 * Parent agreed to stop sending diffs. Finalizing AOF...
201:C 22 Nov 05:52:48.126 * Concatenating 0.00 MB of AOF diff received from parent.
201:C 22 Nov 05:52:48.126 * SYNC append only file rewrite performed
201:C 22 Nov 05:52:48.127 * AOF rewrite: 0 MB of memory used by copy-on-write
1:M 22 Nov 05:52:48.184 * Background AOF rewrite terminated with success
1:M 22 Nov 05:52:48.184 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
1:M 22 Nov 05:52:48.184 * Background AOF rewrite finished successfully
1:M 22 Nov 20:10:56.034 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
1:M 22 Nov 20:32:01.494 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
1:M 23 Nov 00:16:56.059 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
1:M 23 Nov 00:29:00.010 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 3.2.8 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'
1:M 23 Nov 01:33:42.484 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 23 Nov 01:33:42.491 # Server started, Redis version 3.2.8
1:M 23 Nov 01:33:42.491 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 23 Nov 01:33:42.491 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 23 Nov 01:33:43.534 # Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>
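Only the warning-level lines matter for this diagnosis. As a minimal sketch (assuming Python 3 and the Redis 3.x log layout seen above, where the single character after the timestamp is the level: '.' debug, '-' verbose, '*' notice, '#' warning), those lines can be pulled out of the noise like this. The names LOG_LINE and warnings_only are illustrative only:

```python
import re

# Redis 3.x log line: "<pid>:<role> <day> <Mon> <HH:MM:SS.mmm> <level> <message>"
# Level characters: '.' debug, '-' verbose, '*' notice, '#' warning.
LOG_LINE = re.compile(
    r"^(?P<pid>\d+):(?P<role>[XCSM]) "
    r"(?P<ts>\d{1,2} \w{3} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[.\-*#]) (?P<msg>.*)$"
)


def warnings_only(lines):
    """Yield (timestamp, message) for every warning-level ('#') log line.

    Lines that do not match the layout (e.g. the ASCII-art banner) are skipped.
    """
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("level") == "#":
            yield match.group("ts"), match.group("msg")
```

For example, after saving the log with docker logs --tail 300 redis_proc > redis.log 2>&1, iterating over warnings_only(open("redis.log")) would keep only the '#' lines; for the excerpt above that means the three warnings, the "Server started" line and the Bad file format error.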

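There are four issues in total, and the one that actually stops Redis from starting is the last line: "Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>". In other words, Redis hit a malformed entry while reading its append-only file (AOF); the log recommends backing up the AOF file first and then repairing it with ./redis-check-aof --fix <filename>. A quick check shows that Redis's AOF file is named "appendonly.aof" by default.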

The first three are only warnings, but let's deal with them one at a time anyway.

Warning 1: WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.

Option 1, temporary (lost after a reboot): sysctl -w net.core.somaxconn=511
Option 2, permanent: edit /etc/sysctl.conf and add the line net.core.somaxconn = 511, then run the command
sysctl -p
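This warning and the next one are both fixed by adding a key to /etc/sysctl.conf. A small sketch of doing that idempotently (running it twice does not add duplicate lines) could look like the following; set_sysctl_conf is a hypothetical helper, not part of any tool, and sysctl -p still has to be run as root afterwards. One caveat I am fairly, not completely, sure about: net.core.somaxconn is kept per network namespace, so a container with its own network stack may not see the host value; for a new container, Docker's docker run --sysctl net.core.somaxconn=511 option sets it inside the container instead.

```python
import re


def set_sysctl_conf(path, key, value):
    """Make sure a sysctl.conf-style file contains exactly 'key = value'.

    An existing uncommented entry for the key is rewritten; otherwise a new
    line is appended. Returns True if the file was changed. The setting only
    takes effect after running 'sysctl -p' as root.
    """
    entry = re.compile(r"^\s*" + re.escape(key) + r"\s*=")
    new_line = f"{key} = {value}\n"
    try:
        with open(path) as f:
            lines = f.readlines()
    except FileNotFoundError:
        lines = []

    changed = found = False
    for i, line in enumerate(lines):
        if entry.match(line):  # commented-out lines ("# key = ...") do not match
            found = True
            if line != new_line:
                lines[i] = new_line
                changed = True
    if not found:
        if lines and not lines[-1].endswith("\n"):
            lines[-1] += "\n"
        lines.append(new_line)
        changed = True

    if changed:
        with open(path, "w") as f:
            f.writelines(lines)
    return changed
```

Usage, as root: set_sysctl_conf("/etc/sysctl.conf", "net.core.somaxconn", 511), followed by sysctl -p.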

Warning 2: WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.

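Fix:
Option 1, temporary: sysctl -w vm.overcommit_memory=1
Option 2, permanent: edit /etc/sysctl.conf and add the line vm.overcommit_memory = 1, then run the command
sysctl -p

Note on the overcommit_memory parameter:
It sets the kernel's memory allocation (overcommit) policy; choose a value according to your server's actual situation.
/proc/sys/vm/overcommit_memory
Possible values: 0, 1, 2.
0: heuristic overcommit (the default). The kernel checks whether enough memory is available; if so, the allocation succeeds, otherwise it fails and an error is returned to the process.
1: always overcommit. The kernel grants every allocation regardless of the current memory state.
2: never overcommit (strict accounting). Total committed memory may not exceed swap plus a configurable share of physical RAM (vm.overcommit_ratio, 50% by default).
Note: when Redis saves data in the background, it forks a child process. In principle the child can need as much memory as the parent: if the parent uses 8 GB, the kernel may have to account for another 8 GB for the child. Thanks to copy-on-write the memory actually duplicated is usually much smaller, but under policy 0 the fork (and therefore the background save) can still fail when memory looks insufficient, which may bring the Redis server down or drive IO load up and hurt performance. That is why 1 (allow allocations regardless of the current memory state) is the better policy here.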
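To check which policy a machine is using before and after the change, a sketch like this reads the same /proc file. The proc_root parameter is an assumption added only so the functions can be pointed at a test directory; on a real server the default /proc is what you want:

```python
import os

# Meaning of each /proc/sys/vm/overcommit_memory value (Linux kernel docs).
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit",
    2: "never overcommit (strict accounting)",
}


def overcommit_mode(proc_root="/proc"):
    """Return (mode, description) read from <proc_root>/sys/vm/overcommit_memory."""
    path = os.path.join(proc_root, "sys", "vm", "overcommit_memory")
    with open(path) as f:
        mode = int(f.read().strip())
    return mode, OVERCOMMIT_MODES.get(mode, "unknown")


def matches_redis_advice(proc_root="/proc"):
    """True when the policy is 1, which is what Redis's startup warning asks for."""
    return overcommit_mode(proc_root)[0] == 1
```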

Warning 3: WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.

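Fix: the log itself gives the solution. Run echo never > /sys/kernel/mm/transparent_hugepage/enabled as root. That only lasts until the next reboot, after which it has to be set again, so also add it to the boot process:
edit /etc/rc.local and add echo never > /sys/kernel/mm/transparent_hugepage/enabled (above the final exit 0 line, if the file has one). As the log says, Redis must be restarted after THP is disabled.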
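A rough way to confirm the change took effect: the kernel lists every THP mode in that file and wraps the active one in brackets, for example "always madvise [never]". Since containers share the host kernel, this host setting is also what the Redis container sees. The sketch below assumes that file format; thp_mode and disable_thp are illustrative names, and writing the real file needs root:

```python
THP_ENABLED = "/sys/kernel/mm/transparent_hugepage/enabled"


def thp_mode(path=THP_ENABLED):
    """Return the active THP mode, i.e. the token the kernel puts in brackets."""
    with open(path) as f:
        text = f.read()
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


def disable_thp(path=THP_ENABLED):
    """Same effect as 'echo never >' on the file: needs root, lost on reboot."""
    with open(path, "w") as f:
        f.write("never")
```

After disabling, thp_mode() should return "never"; Redis still has to be restarted to pick up the change.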

With that, all three warnings are dealt with.

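Now for the main problem: how do we find the AOF file (appendonly.aof) used by the Redis running inside Docker? Searching the whole disk would be slow. Fortunately, Docker records where a container's mounted volumes are stored on the host, so we only need to look that up.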

Run docker inspect [container], and out comes a long pile of output:

xxx@iZwz9isa7dpx6izf3br71hZ:~# docker inspect 4c38df7be60a
[
    {
        "Id": "4c38df7be60ac0517961903351ecda36566f7150b39b7d29cc1ccf2665d6eb7e",
        "Created": "2017-04-13T12:33:15.036327978Z",
        "Path": "docker-entrypoint.sh",
        "Args": [
            "redis-server",
            "--appendonly",
            "yes"
        ],
        "State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 1,
            "Error": "",
            "StartedAt": "2018-11-23T03:04:00.909539222Z",
            "FinishedAt": "2018-11-23T03:04:01.961967557Z"
        },
        "Image": "sha256:83d6014ac5c8193fa43dd16b161f0e524141800f10cff44d7bad3b637991cf16",
        "ResolvConfPath": "/var/lib/docker/containers/4c38df7be60ac0517961903351ecda36566f7150b39b7d29cc1ccf2665d6eb7e/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/4c38df7be60ac0517961903351ecda36566f7150b39b7d29cc1ccf2665d6eb7e/hostname",
        "HostsPath": "/var/lib/docker/containers/4c38df7be60ac0517961903351ecda36566f7150b39b7d29cc1ccf2665d6eb7e/hosts",
        "LogPath": "/var/lib/docker/containers/4c38df7be60ac0517961903351ecda36566f7150b39b7d29cc1ccf2665d6eb7e/4c38df7be60ac0517961903351ecda36566f7150b39b7d29cc1ccf2665d6eb7e-json.log",
        "Name": "/redis_proc",
        "RestartCount": 0,
        "Driver": "aufs",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": null,
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "default",
            "PortBindings": {
                "6379/tcp": [
                    {
                        "HostIp": "",
                        "HostPort": "16379"
                    }
                ]
            },
            "RestartPolicy": {
                "Name": "no",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": -1,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0
        },
        "GraphDriver": {
            "Name": "aufs",
            "Data": null
        },
        "Mounts": [
            {
                "Type": "volume",
                "Name": "94dbaf4fad236d5dc94f2f825202eb279d0d3217a47cb050ec51eba1b5ff2d51",
                "Source": "/var/lib/docker/volumes/94dbaf4fad236d5dc94f2f825202eb279d0d3217a47cb050ec51eba1b5ff2d51/_data",
                "Destination": "/data",
                "Driver": "local",
                "Mode": "",
                "RW": true,
                "Propagation": ""
            }
        ],
        "Config": {
            "Hostname": "4c38df7be60a",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "6379/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "GOSU_VERSION=1.7",
                "REDIS_VERSION=3.2.8",
                "REDIS_DOWNLOAD_URL=http://download.redis.io/releases/redis-3.2.8.tar.gz",
                "REDIS_DOWNLOAD_SHA1=6780d1abb66f33a97aad0edbe020403d0a15b67f"
            ],
            "Cmd": [
                "redis-server",
                "--appendonly",
                "yes"
            ],
            "Image": "redis",
            "Volumes": {
                "/data": {}
            },
            "WorkingDir": "/data",
            "Entrypoint": [
                "docker-entrypoint.sh"
            ],
            "OnBuild": null,
            "Labels": {}
        }
    .....
]

It is very, very long, so here is the relevant part directly.

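The directory we need is the "Source" of the entry under "Mounts" whose "Destination" is "/data" (the container's working directory). Copy it and run cd /var/lib/docker/volumes/94dbaf4fad236d5dc94f2f825202eb279d0d3217a47cb050ec51eba1b5ff2d51/_data; appendonly.aof is right there.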
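The same lookup can be scripted instead of scrolling through the JSON. A minimal sketch, assuming Python 3.7+ and the docker CLI on the PATH; data_volume_source and inspect are hypothetical helper names:

```python
import json
import subprocess


def data_volume_source(inspect_json, destination="/data"):
    """From `docker inspect` output (a JSON array), return the host path
    ("Source") of the mount whose "Destination" is `destination`, or None."""
    for container in json.loads(inspect_json):
        for mount in container.get("Mounts") or []:
            if mount.get("Destination") == destination:
                return mount["Source"]
    return None


def inspect(container):
    """Run `docker inspect <container>` and return its raw JSON output."""
    result = subprocess.run(
        ["docker", "inspect", container],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

For this server, data_volume_source(inspect("redis_proc")) should return the same Source path shown above. The docker CLI can also do it on its own with a Go template: docker inspect -f '{{range .Mounts}}{{if eq .Destination "/data"}}{{.Source}}{{end}}{{end}}' redis_proc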

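Now that we have found the AOF file, we need the repair tool (redis-check-aof) that comes with the Redis installation. Look in the same inspect output, this time at the environment settings ("Env" under "Config").

The PATH entry lists several directories. Going through them one by one paid off: the tool is in /usr/local/bin.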
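Going through the PATH directories one by one can also be automated. The sketch below takes the Env list from docker inspect and searches those directories under root, which defaults to the host's / (an assumption: pointing root at another filesystem tree would search there instead). find_tool is a hypothetical name:

```python
import os


def find_tool(env, name, root="/"):
    """Search the PATH directories from a container's Config.Env list
    (e.g. ["PATH=/usr/local/sbin:/usr/local/bin:...", ...]) for an
    executable called `name`, resolved under `root`. Returns its path or None."""
    path_value = next((e.split("=", 1)[1] for e in env if e.startswith("PATH=")), "")
    for directory in path_value.split(":"):
        if not directory:
            continue
        candidate = os.path.join(root, directory.lstrip("/"), name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```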

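Then run ./redis-check-aof --fix /var/lib/docker/containers/4c38df7be60ac0517961903351ecda36566f7150b39b7d29cc1ccf2665d6eb7e/appendonly.aof (as the log advises, back up appendonly.aof first, e.g. by copying it to appendonly.aof.bak). It asks "Continue? [y/N]:"; type y to go ahead.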

xxx@iZwz9isa7dpx6izf3br71hZ:/usr/local/bin# ./redis-check-aof --fix  /var/lib/docker/containers/4c38df7be60ac0517961903351ecda36566f7150b39b7d29cc1ccf2665d6eb7e/appendonly.aof
0x         366c32f: Expected prefix '*', got: '
AOF analyzed: size=57066334, ok_up_to=57066287, diff=47
This will shrink the AOF from 57066334 bytes, with 47 bytes, to 57066287 bytes
Continue? [y/N]: y
Successfully truncated AOF
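The summary line is simple arithmetic: diff = size - ok_up_to, here 57066334 - 57066287 = 47 bytes, so only the broken tail after the last well-formed command is cut off. To illustrate the idea (this is not redis-check-aof's actual source), here is a simplified sketch that scans the AOF as a sequence of RESP commands, each of which must start with '*' (hence "Expected prefix '*'" above), and truncates after the last complete one, making a backup first as the log recommends. As far as I know the real tool also handles a MULTI without a matching EXEC; this sketch ignores that, and it reads the whole file into memory:

```python
import os
import shutil


def _read_line(data, pos):
    """Return (line_without_crlf, next_pos), or None if no complete line."""
    end = data.find(b"\r\n", pos)
    if end == -1:
        return None
    return data[pos:end], end + 2


def valid_prefix_length(data):
    """Length of the longest prefix of `data` made of complete RESP commands:
    "*<argc>\r\n" followed by argc items of the form "$<len>\r\n<bytes>\r\n"."""
    pos = ok = 0
    while pos < len(data):
        if data[pos:pos + 1] != b"*":
            break  # same condition as "Expected prefix '*'"
        line = _read_line(data, pos + 1)
        if line is None or not line[0].isdigit():
            break
        argc, pos = int(line[0]), line[1]
        for _ in range(argc):
            if data[pos:pos + 1] != b"$":
                return ok
            line = _read_line(data, pos + 1)
            if line is None or not line[0].isdigit():
                return ok
            length, pos = int(line[0]), line[1]
            if data[pos + length:pos + length + 2] != b"\r\n":
                return ok  # argument cut short or not terminated
            pos += length + 2
        ok = pos  # a whole command was parsed, keep everything up to here
    return ok


def repair_aof(path):
    """Back up `path` to `path + '.bak'`, then truncate it to its valid prefix.
    Returns (size, ok_up_to, diff), mirroring redis-check-aof's summary."""
    with open(path, "rb") as f:
        data = f.read()
    size, ok = len(data), valid_prefix_length(data)
    if ok < size:
        shutil.copy2(path, path + ".bak")
        os.truncate(path, ok)
    return size, ok, size - ok
```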

Done! Quietly pleased, Egan quickly ran the docker command to start Redis:

xxx@iZwz9isa7dpx6izf3br71hZ:/usr/local/bin# docker start redis_proc
redis_proc
root@iZwz9isa7dpx6izf3br71hZ:/usr/local/bin# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                     NAMES
4c38df7be60a        redis               "docker-entrypoint..."   19 months ago       Up 5 seconds          0.0.0.0:16379->6379/tcp   redis_proc
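Besides docker ps, a quick way to confirm Redis is actually answering is to send it PING. With the container running, docker exec redis_proc redis-cli ping should print PONG. The Python sketch below does the same over a raw socket against the published host port 16379 from the inspect output, and assumes no password (requirepass) is configured; encode_command builds the same RESP format that the AOF file stores:

```python
import socket


def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def ping(host="127.0.0.1", port=16379, timeout=3.0):
    """Send PING and return True if the server replies +PONG."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_command("PING"))
        reply = sock.recv(64)
    return reply.startswith(b"+PONG")
```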

Redis is up and running again.

Egan: Redis is up. I'll write up how I handled this and send it to you later.
Them: OK

And that is how this article came about.

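Finally, a plug for an all-in-one Java payment development kit: an elegant, lightweight payment module for integrating and aggregating payment channels (WeChat Pay, Alipay, UnionPay, Youdian, Fuiou, and cross-border payments via PayPal and Payoneer), covering in-app payments, QR-code payments, instant-to-account payments, card-swipe and barcode payments, transfers, and service-provider mode. It supports multiple payment types and multiple payment accounts, keeps payment completely separate from business logic, and needs only a few lines of code to take a payment, so a payment module can be built quickly and embedded easily into any system: https://www.oschina.net/p/pay-java-parent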