[Reprint] Some pitfalls Meituan ran into with Redis - 2. The bgrewriteaof problem


 

   Please credit the source when reprinting: http://carlosfu.iteye.com/blog/2254154


    

I. Background

1. AOF:

    Redis's AOF mechanism is somewhat similar to MySQL's binlog. It is one of the persistence methods Redis provides (the other is RDB): every write command is appended to a log file, which is synced to disk at a configured frequency (no, always, everysec), and the log is replayed to restore the data when Redis restarts after a shutdown.

     

 

2. AOF rewrite:

     (1) As the AOF file grows, much of it ends up being repeated commands or commands that can be merged (for example, 100 INCR calls on a new key are equivalent to a single SET key 100).

     (2) Benefits of rewriting: a smaller AOF log, a lower memory footprint, and faster database recovery.
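
To make the "100 incr = set key 100" idea concrete, here is a toy sketch. It assumes a log that contains only SET and INCR commands, replays it into the final state, and emits one SET per key. The class and method names are made up for illustration, and this is not how Redis itself stores or rewrites its AOF.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of an AOF rewrite: replay the log, then emit one command per key.
// Only "SET key value" and "INCR key" are modeled (an illustrative assumption).
final class AofRewriteSketch {

    static List<String> rewrite(List<String> log) {
        // Final value of every key after replaying the whole log, in first-seen order.
        Map<String, String> state = new LinkedHashMap<>();
        for (String command : log) {
            String[] parts = command.split(" ");
            if (parts[0].equals("SET")) {
                state.put(parts[1], parts[2]);
            } else if (parts[0].equals("INCR")) {
                long current = Long.parseLong(state.getOrDefault(parts[1], "0"));
                state.put(parts[1], Long.toString(current + 1));
            } else {
                throw new IllegalArgumentException("unsupported command: " + command);
            }
        }
        // The rewritten log needs only one SET per surviving key.
        List<String> rewritten = new ArrayList<>();
        for (Map.Entry<String, String> entry : state.entrySet()) {
            rewritten.add("SET " + entry.getKey() + " " + entry.getValue());
        }
        return rewritten;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            log.add("INCR key");
        }
        System.out.println(rewrite(log)); // prints [SET key 100]
    }
}
```

The rewritten log holds one line instead of one hundred, which is where the smaller file and the faster replay on restart come from.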

    

 

 

 

II. Running multiple instances on one machine carries a hidden risk of swap and OOM:

    Because Redis uses a single-threaded model, each Redis instance in theory uses only one CPU core, which means you can deploy multiple instances on one multi-core server (and this is what is done in practice). However, an AOF rewrite is carried out by a child process that Redis forks, so experienced Redis developers and operators will tell you to keep half of the memory on a server free, to prevent swap and OOM when AOF rewrites on several instances happen at the same time.
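
The "reserve half the memory" rule can be checked with simple arithmetic. The sketch below assumes the worst case, in which every forked child ends up holding a full private copy of its parent's data. Real usage depends on how many pages are modified during the rewrite and is usually lower. The class name, method names, and the numbers in the comments are illustrative assumptions.

```java
import java.util.Arrays;

// Worst-case memory estimate for several Redis instances on one machine.
// Assumption: a rewriting instance may need up to twice its own memory (parent plus forked child).
final class RewriteHeadroomSketch {

    // Total bytes needed if the `concurrentRewrites` largest instances rewrite at the same time.
    static long worstCaseBytes(long[] instanceBytes, int concurrentRewrites) {
        long[] sorted = instanceBytes.clone();
        Arrays.sort(sorted);
        long total = 0;
        for (long bytes : sorted) {
            total += bytes;
        }
        int rewriting = Math.max(0, Math.min(concurrentRewrites, sorted.length));
        for (int i = 0; i < rewriting; i++) {
            total += sorted[sorted.length - 1 - i]; // extra copy held by the forked child
        }
        return total;
    }

    static boolean fits(long machineBytes, long[] instanceBytes, int concurrentRewrites) {
        return worstCaseBytes(instanceBytes, concurrentRewrites) <= machineBytes;
    }
}
```

With eight 4 GB instances, `worstCaseBytes` gives 64 GB if all eight rewrite together but only 36 GB if at most one rewrites at a time, which is why the number of simultaneous rewrites matters so much.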

    

 

 

 

III. Best practices

1. Meta information: to run Redis as a cloud system, you need to record data along every dimension, for example business group, machine, instance, application, and owner. I believe everyone who operates Redis should keep this data in persistent storage (such as MySQL) and expose some general operations interfaces on top of it, as the basis for automated operations.

    For example:
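
A minimal sketch of what one such metadata record might look like in code, together with a helper that groups instances by machine so that tooling can work one machine at a time. The class names and fields are hypothetical and only mirror the dimensions listed above. They are not the schema from the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical metadata row for one Redis instance; all field names are illustrative.
final class InstanceMeta {
    final String businessGroup;
    final String application;
    final String host;
    final int port;
    final String owner;

    InstanceMeta(String businessGroup, String application, String host, int port, String owner) {
        this.businessGroup = businessGroup;
        this.application = application;
        this.host = host;
        this.port = port;
        this.owner = owner;
    }
}

final class MetaIndexSketch {

    // Group instances by machine, keeping the hosts in sorted order.
    static Map<String, List<InstanceMeta>> groupByHost(List<InstanceMeta> instances) {
        Map<String, List<InstanceMeta>> byHost = new TreeMap<>();
        for (InstanceMeta meta : instances) {
            byHost.computeIfAbsent(meta.host, h -> new ArrayList<>()).add(meta);
        }
        return byHost;
    }
}
```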

 

 

    

 

2. Ways to manage AOF rewrites:

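 (1) Automatic: let each Redis instance decide for itself whether to rewrite its AOF (based on the two parameters auto-aof-rewrite-percentage and auto-aof-rewrite-min-size).

 (2) crontab: a scheduled job. Several Redis instances may still end up rewriting at the same time, so this is a compromise.

 (3) Remote, centralized:

       The end goal is that, on any one machine at any one moment, only one Redis instance is performing an AOF rewrite.

       The approach is actually quite simple: working machine by machine, poll each machine's instances in turn and run the bgrewriteaof command on any instance that meets the condition (for example, some relationship between currentSize and baseSize).

       Along the way you can monitor when each rewrite happened, how long it took, how often rewrites occur, and the file size before and after.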
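
The size condition used in (1) and (3) can be sketched as a small predicate. It assumes the usual reading of the two parameters: rewrite once the file is larger than a minimum size and has grown by at least a given percentage over its size after the last rewrite. The class and method names are made up for illustration.

```java
// Sketch of a "should this instance rewrite its AOF?" check.
// Assumption: minSizeBytes and percentage play the roles of
// auto-aof-rewrite-min-size and auto-aof-rewrite-percentage.
final class AofRewritePolicySketch {

    // Growth of the current AOF over the base size, as a whole-number percentage.
    static long growthPercent(long currentSizeBytes, long baseSizeBytes) {
        long base = baseSizeBytes > 0 ? baseSizeBytes : 1; // avoid dividing by zero
        return currentSizeBytes * 100 / base - 100;
    }

    static boolean shouldRewrite(long currentSizeBytes, long baseSizeBytes,
                                 long minSizeBytes, long percentage) {
        return currentSizeBytes > minSizeBytes
                && growthPercent(currentSizeBytes, baseSizeBytes) >= percentage;
    }
}
```

For example, with a 100 MB base size, a 64 MB minimum, and a 60% threshold, a 160 MB file qualifies for a rewrite while a 150 MB file does not.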

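Strategy: Automatic
    Advantages: no development required.
    Disadvantages:
    1. The swap and OOM problems described above can occur, and cannot be predicted.
    2. When something does go wrong, dealing with it actually takes more time.

Strategy: AOF control center (remote, centralized)
    Advantages:
    1. Prevents the swap and OOM problems described above.
    2. Collects more data (when each AOF rewrite happened, how long it took, how often rewrites occur, and the size before and after), which helps with operations and troubleshooting (for example, deciding whether the instances on some machine need to be split up).
    Disadvantages: the control center has to be developed.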
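
To illustrate the kind of data a control center could collect, here is a minimal sketch that rewrites the instances of one machine strictly one after another and records the start time, duration, and size before and after each rewrite. `RewriteTarget` is a hypothetical abstraction over a Redis connection, introduced only for this example. Every name here is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical view of one Redis instance, as needed by this sketch.
interface RewriteTarget {
    String name();           // for example "host:port"
    long aofSizeBytes();     // current AOF size
    void rewriteAndWait();   // trigger a rewrite and block until it is done
}

// One monitoring record per completed rewrite.
final class RewriteRecord {
    final String instance;
    final long startedAtMillis;
    final long durationMillis;
    final long sizeBefore;
    final long sizeAfter;

    RewriteRecord(String instance, long startedAtMillis, long durationMillis,
                  long sizeBefore, long sizeAfter) {
        this.instance = instance;
        this.startedAtMillis = startedAtMillis;
        this.durationMillis = durationMillis;
        this.sizeBefore = sizeBefore;
        this.sizeAfter = sizeAfter;
    }
}

final class MachineRewriteRunnerSketch {

    // Sequential loop: at most one instance on this machine is rewriting at any moment.
    static List<RewriteRecord> runSequentially(List<RewriteTarget> instancesOnOneMachine) {
        List<RewriteRecord> records = new ArrayList<>();
        for (RewriteTarget target : instancesOnOneMachine) {
            long sizeBefore = target.aofSizeBytes();
            long startedAt = System.currentTimeMillis();
            long startNanos = System.nanoTime();
            target.rewriteAndWait();
            long durationMillis = (System.nanoTime() - startNanos) / 1_000_000;
            records.add(new RewriteRecord(target.name(), startedAt, durationMillis,
                    sizeBefore, target.aofSizeBytes()));
        }
        return records;
    }
}
```

Storing these records over time is what would make it possible to spot machines whose instances rewrite too often or take too long.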

 

Sample code that polls the instances on one machine and runs bgRewriteAof:

package com.sohu.cache.inspect.impl;  
  
import com.sohu.cache.alert.impl.BaseAlertService;  
import com.sohu.cache.entity.InstanceInfo;  
import com.sohu.cache.inspect.InspectParamEnum;  
import com.sohu.cache.inspect.Inspector;  
import com.sohu.cache.util.IdempotentConfirmer;  
import com.sohu.cache.util.TypeUtil;  
import org.apache.commons.collections.MapUtils;  
import org.apache.commons.lang.StringUtils;  
import redis.clients.jedis.Jedis;  
  
import java.util.Collections;  
import java.util.LinkedHashMap;  
import java.util.List;  
import java.util.Map;  
import java.util.concurrent.TimeUnit;  
  
  
public class RedisIsolationPersistenceInspector extends BaseAlertService implements Inspector {  
  
    public static final int REDIS_DEFAULT_TIME = 5000;  
  
    @Override  
    public boolean inspect(Map<InspectParamEnum, Object> paramMap) {  
        // One machine and all the Redis instances on it
        final String host = MapUtils.getString(paramMap, InspectParamEnum.SPLIT_KEY);  
        List<InstanceInfo> list = (List<InstanceInfo>) paramMap.get(InspectParamEnum.INSTANCE_LIST);  
        // Iterate over all the Redis instances
        for (InstanceInfo info : list) {  
            final int port = info.getPort();  
            final int type = info.getType();  
            int status = info.getStatus();  
            // Skip nodes that are not in the normal state
            if (status != 1) {  
                continue;  
            }  
            if (TypeUtil.isRedisDataType(type)) {  
                Jedis jedis = new Jedis(host, port, REDIS_DEFAULT_TIME);  
                try {  
                    // Fetch the persistence section from Redis INFO
                    Map<String, String> persistenceMap = parseMap(jedis);  
                    if (persistenceMap.isEmpty()) {  
                        logger.error("{}:{} get persistenceMap failed", host, port);  
                        continue;  
                    }  
                    // Skip instances that do not have AOF enabled
                    if (!isAofEnabled(persistenceMap)) {  
                        continue;  
                    }  
                    // Current AOF size, and the AOF size right after the last rewrite (base size)
                    long aofCurrentSize = MapUtils.getLongValue(persistenceMap, "aof_current_size");  
                    long aofBaseSize = MapUtils.getLongValue(persistenceMap, "aof_base_size");  
                    // Threshold: the AOF has grown by at least 60% over the base size
                    long aofThresholdSize = (long) (aofBaseSize * 1.6);
                    double percentage = getPercentage(aofCurrentSize, aofBaseSize);
                    // Grown by at least 60% and larger than 64 MB
                    if (aofCurrentSize >= aofThresholdSize && aofCurrentSize > (64 * 1024 * 1024)) {  
                        // bgRewriteAof is an asynchronous operation.
                        boolean isInvoke = invokeBgRewriteAof(jedis);  
                        if (!isInvoke) {  
                            logger.error("{}:{} invokeBgRewriteAof failed", host, port);  
                            continue;  
                        } else {  
                            logger.warn("{}:{} invokeBgRewriteAof started percentage={}", host, port, percentage);  
                        }  
                        // Wait for the AOF rewrite to finish (bgRewriteAof is asynchronous)
                        while (true) {  
                            try {  
                                // before wait 1s  
                                TimeUnit.SECONDS.sleep(1);  
                                Map<String, String> loopMap = parseMap(jedis);  
                                Integer aofRewriteInProgress = MapUtils.getInteger(loopMap, "aof_rewrite_in_progress", null);  
                                if (aofRewriteInProgress == null) {  
                                    logger.error("loop watch:{}:{} return failed", host, port);  
                                    break;  
                                } else if (aofRewriteInProgress <= 0) {  
                                    // bgrewriteaof Done  
                                    logger.warn("{}:{} bgrewriteaof Done lastSize:{}Mb,currentSize:{}Mb", host, port,  
                                            getMb(aofCurrentSize),  
                                            getMb(MapUtils.getLongValue(loopMap, "aof_current_size")));  
                                    break;  
                                } else {  
                                    // wait 1s  
                                    TimeUnit.SECONDS.sleep(1);  
                                }  
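                            } catch (InterruptedException e) {
                                // Restore the interrupt flag and stop waiting instead of looping forever
                                Thread.currentThread().interrupt();
                                break;
                            } catch (Exception e) {
                                logger.error(e.getMessage(), e);
                            }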
                        }  
                    } else {  
                        if (percentage > 50D) {  
                            long currentSize = getMb(aofCurrentSize);  
                            logger.info("checked {}:{} aof increase percentage:{}% currentSize:{}Mb", host, port,  
                                    percentage, currentSize > 0 ? currentSize : "<1");  
                        }  
                    }  
                } finally {  
                    jedis.close();  
                }  
            }  
        }  
        return true;  
    }  
  
    private long getMb(long bytes) {  
        return (long) (bytes / 1024 / 1024);  
    }  
  
    private boolean isAofEnabled(Map<String, String> infoMap) {  
        Integer aofEnabled = MapUtils.getInteger(infoMap, "aof_enabled", null);  
        return aofEnabled != null && aofEnabled == 1;  
    }  
  
    private double getPercentage(long aofCurrentSize, long aofBaseSize) {  
        if (aofBaseSize == 0) {  
            return 0.0D;  
        }  
        String format = String.format("%.2f", (Double.valueOf(aofCurrentSize - aofBaseSize) * 100 / aofBaseSize));  
        return Double.parseDouble(format);  
    }  
  
    private Map<String, String> parseMap(final Jedis jedis) {  
        final StringBuilder builder = new StringBuilder();  
        boolean isInfo = new IdempotentConfirmer() {  
            @Override  
            public boolean execute() {  
                String persistenceInfo = null;  
                try {  
                    persistenceInfo = jedis.info("Persistence");  
                } catch (Exception e) {  
                    // One placeholder per argument: host, port, error message
                    logger.warn("{}:{} {}", jedis.getClient().getHost(), jedis.getClient().getPort(),
                            e.getMessage());
                }  
                boolean isOk = StringUtils.isNotBlank(persistenceInfo);  
                if (isOk) {  
                    builder.append(persistenceInfo);  
                }  
                return isOk;  
            }  
        }.run();  
        if (!isInfo) {  
            logger.error("{}:{} info Persistence failed", jedis.getClient().getHost(), jedis.getClient().getPort());  
            return Collections.emptyMap();  
        }  
        String persistenceInfo = builder.toString();  
        if (StringUtils.isBlank(persistenceInfo)) {  
            return Collections.emptyMap();  
        }  
        Map<String, String> map = new LinkedHashMap<String, String>();  
        String[] array = persistenceInfo.split("\r\n");  
        for (String line : array) {  
            String[] cells = line.split(":");  
            if (cells.length > 1) {  
                map.put(cells[0], cells[1]);  
            }  
        }  
  
        return map;  
    }  
  
    public boolean invokeBgRewriteAof(final Jedis jedis) {  
        return new IdempotentConfirmer() {  
            @Override  
            public boolean execute() {  
                try {  
                    String response = jedis.bgrewriteaof();  
                    if (response != null && response.contains("rewriting started")) {  
                        return true;  
                    }  
                } catch (Exception e) {  
                    String message = e.getMessage();  
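                    // getMessage() can be null; a rewrite that is already running counts as success
                    if (message != null && message.contains("rewriting already")) {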
                        return true;  
                    }  
                    logger.error(message, e);  
                }  
                return false;  
            }  
        }.run();  
    }  
}  

 

 

 

 

One figure attached:

 

 


Origin www.cnblogs.com/jinanxiaolaohu/p/12009713.html