Analysis and Optimization of Upstream Service Jitter Caused by Application Deployment

Author: JD Logistics Zhu Yongchang

Background

This article analyzes the problem of upstream service jitter caused by application deployment, using the Baichuan distribution system as a case study, and presents a set of practical, field-tested solutions.

As the dedicated gateway of the transaction order center, the Baichuan distribution system exposes unified standard services (order receiving, modification, cancellation, return, etc.) on behalf of the transaction order center, and internally routes traffic to the applications of different business lines based on configured rules. As more and more traffic was switched over to Baichuan, service jitter during system deployment, and the upstream call timeouts it causes, became increasingly prominent. To provide a stable transaction service and improve system availability, this problem had to be addressed.

Our investigation found two warm-up schemes already in use within the group:

(1) the warm-up feature officially provided by JSF;

(2) Xingyun deployment orchestration combined with traffic recording and playback.

Neither achieves the desired effect.

About scheme (1)

First, there is a prerequisite: JSF consumers must upgrade to JSF 1.7.6. The Baichuan distribution system has dozens of upstream callers, and pushing all of them to upgrade is difficult. Second, the JSF platform's warm-up rules are configured per interface; Baichuan exposes 46 interfaces externally, so configuration is cumbersome. Most importantly, the scheme's warm-up rule only configures an interface's warm-up weight (the proportion of calls it receives) within a fixed warm-up window (for example, 1 minute). In plain terms it is a small-traffic trial run, which means it cannot fully warm up system resources: once the warm-up window ends, full traffic still causes service jitter because resources still need to be created or initialized. For the transaction order-receiving service, such jitter makes orders fail, with a risk of stuck orders.

About scheme (2)

This scheme achieves warm-up by recording online traffic and replaying it as a pressure test. It suits read interfaces, but replaying write interfaces without special handling would affect online data. The current answer is to tag the replayed traffic as pressure-test traffic so downstream systems can recognize it, but the transaction business logic is complex, there are many downstream dependencies, and the relevant systems do not yet support such tagging; retrofitting them one by one would touch many interfaces and carry high risk.

Given the above, we took the upstream service jitter caused by deploying the Baichuan distribution system as our example, followed the observable clues, studied the JSF source code in depth, and finally identified the key factors behind the jitter, from which we built a more effective warm-up scheme. Verification shows the effect is clear: the MAX value of the caller-side method performance dropped by 90%, falling well within the timeout threshold and eliminating upstream call timeouts caused by machine deployment.

Problem

During an online deployment of the system, upstream callers of the pure-distribution order-receiving service reported that the service fluctuated and calls timed out.

Checking the service's own UMP monitoring, the method-performance MAX value peaks at 3073 ms, well under the caller-configured timeout of 10000 ms (see Figure 1).

Figure 1 Internal (provider-side) performance monitoring

Checking the service's PFinder performance monitoring, the MAX value seen by the upstream caller's application when invoking this service exceeds 10000 ms many times. (You can check the caller's UMP monitoring directly; if the caller cannot provide it, PFinder's topology view works as well, as shown in Figure 2.)

Figure 2 External (caller-side) performance monitoring

Analysis

The above shows that the provider-side interface-performance MAX value barely fluctuates during deployment, while the caller-side MAX value fluctuates sharply. The time is therefore not being spent in the provider's internal processing logic, but before (or after) the request enters it. So what happens before or after? Rather than answer right away, we will trace the existing clues step by step.

Clue 1: the machine's CPU spikes briefly during deployment (see Figure 3)

Any request routed to the machine at that moment will certainly see degraded interface performance. The first idea, then, is to publish the JSF service only after deployment finishes and the CPU has stabilized, which JSF's delayed-publish parameter supports. The configuration is as follows:

 <!-- delay="120000" publishes the JSF service 2 minutes after startup -->
 <jsf:provider id="createExpressOrderService"
               interface="cn.jdl.oms.api.CreateExpressOrderService"
               ref="createExpressOrderServiceImpl"
               register="true"
               concurrents="400"
               alias="${provider.express.oms}"
               delay="120000">
 </jsf:provider>

Practice proved that the JSF service did go online two minutes later (see Figure 4), with the CPU stable by then, yet the moment JSF went online triggered a second CPU surge, and callers still saw call timeouts.

Figure 3 CPU spikes briefly during machine deployment

Figure 4 CPU surges at the moments of deployment and JSF publication


Clue 2: the JVM thread count surges when JSF goes online (see Figure 5)

Figure 5 Thread count surges the moment JSF goes online

Viewing the thread stack with the jstack command shows that the threads with the largest increase are the JSF-BZ threads, all parked in the WAITING state:

"JSF-BZ-22000-137-T-350" #1038 daemon prio=5 os_prio=0 tid=0x00007f02bcde9000 nid=0x6fff waiting on condition [0x00007efa10284000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000640b359e8> (a java.util.concurrent.SynchronousQueue$TransferStack)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)
	at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
	at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:924)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- None

"JSF-BZ-22000-137-T-349" #1037 daemon prio=5 os_prio=0 tid=0x00007f02bcde7000 nid=0x6ffe waiting on condition [0x00007efa10305000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000640b359e8> (a java.util.concurrent.SynchronousQueue$TransferStack)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)
	at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
	at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:924)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- None

"JSF-BZ-22000-137-T-348" #1036 daemon prio=5 os_prio=0 tid=0x00007f02bcdd8000 nid=0x6ffd waiting on condition [0x00007efa10386000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000640b359e8> (a java.util.concurrent.SynchronousQueue$TransferStack)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)
	at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
	at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:924)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- None

...


Searching the JSF source code for the keyword "JSF-BZ" locates the thread-pool initialization code:

private static synchronized ThreadPoolExecutor initPool(ServerTransportConfig transportConfig) {
    final int port = transportConfig.getPort();
    final int minPoolSize;
    final int aliveTime;

    int maxPoolSize = transportConfig.getServerBusinessPoolSize();
    String poolType = transportConfig.getServerBusinessPoolType();
    if ("fixed".equals(poolType)) {
        minPoolSize = maxPoolSize;
        aliveTime = 0;
    } else if ("cached".equals(poolType)) {
        minPoolSize = 20;
        maxPoolSize = Math.max(minPoolSize, maxPoolSize);
        aliveTime = 60000;
    } else {
        throw new IllegalConfigureException(21401, "server.threadpool", poolType);
    }

    String queueType = transportConfig.getPoolQueueType();
    int queueSize = transportConfig.getPoolQueueSize();
    boolean isPriority = "priority".equals(queueType);
    BlockingQueue<Runnable> configQueue = ThreadPoolUtils.buildQueue(queueSize, isPriority);

    NamedThreadFactory threadFactory = new NamedThreadFactory("JSF-BZ-" + port, true);
    RejectedExecutionHandler handler = new RejectedExecutionHandler() {
        private int i = 1;

        public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
            if (this.i++ % 7 == 0) {
                this.i = 1;
                BusinessPool.LOGGER.warn("[JSF-23002]Task:{} has been reject for ThreadPool exhausted! pool:{}, active:{}, queue:{}, taskcnt: {}",
                    new Object[] { r, Integer.valueOf(executor.getPoolSize()), Integer.valueOf(executor.getActiveCount()),
                        Integer.valueOf(executor.getQueue().size()), Long.valueOf(executor.getTaskCount()) });
            }
            RejectedExecutionException err = new RejectedExecutionException(
                "[JSF-23003]Biz thread pool of provider has bean exhausted, the server port is " + port);
            ProviderErrorHook.getErrorHookInstance().onProcess(new ProviderErrorEvent(err));
            throw err;
        }
    };
    LOGGER.debug("Build " + poolType + " business pool for port " + port + " [min: " + minPoolSize + " max:" + maxPoolSize
        + " queueType:" + queueType + " queueSize:" + queueSize + " aliveTime:" + aliveTime + "]");

    return new ThreadPoolExecutor(minPoolSize, maxPoolSize, aliveTime, TimeUnit.MILLISECONDS, configQueue, (ThreadFactory) threadFactory, handler);
}

public static BlockingQueue<Runnable> buildQueue(int size, boolean isPriority) {
    BlockingQueue<Runnable> queue;
    if (size == 0) {
      queue = new SynchronousQueue<Runnable>();
    }
    else if (isPriority) {
      queue = (size < 0) ? new PriorityBlockingQueue<Runnable>() : new PriorityBlockingQueue<Runnable>(size);
    } else {
      queue = (size < 0) ? new LinkedBlockingQueue<Runnable>() : new LinkedBlockingQueue<Runnable>(size);
    } 
    
    return queue;
  }


In addition, the JSF official documentation describes the business thread pool's default configuration (documentation screenshot omitted here).

Combining the JSF source code with the official documentation, we can see that the JSF-BZ thread pool uses a SynchronousQueue as its blocking queue. This is a synchronous hand-off queue: every put must wait for a take, and vice versa. By default the JSF-BZ pool is the scalable, queue-less ("cached") variant with an initial thread count of 20. So when JSF goes online and a large number of concurrent requests arrive, the initial threads are nowhere near enough, and a large number of new threads get created.
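To make that mechanism concrete, here is a minimal, self-contained sketch (our own demo, not JSF code) of a cached-style pool built on a SynchronousQueue. Because the queue cannot hold tasks, every concurrent submission beyond the currently idle workers forces the pool to create a new thread, which is exactly the surge seen at publish time:

import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SynchronousQueuePoolDemo {
    public static void main(String[] args) throws InterruptedException {
        // Same shape as the "cached" JSF-BZ pool: 20 initial threads, no task queue
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                20, 500, 60_000, TimeUnit.MILLISECONDS, new SynchronousQueue<>());

        // A burst of 200 concurrent requests right after "going online"
        for (int i = 0; i < 200; i++) {
            pool.execute(() -> {
                try { Thread.sleep(1_000); } catch (InterruptedException ignored) { }
            });
        }
        // All 200 tasks are still running, so the pool had to spawn ~180 extra threads
        System.out.println("pool size after burst: " + pool.getPoolSize());
        pool.shutdown();
    }
}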

Now that we know the cause is an insufficient number of initial threads in the JSF thread pool, we can warm the pool up at application startup, that is, create enough threads in advance. From the JSF source code we found the following way to warm up the JSF-BZ thread pool:

// Obtain the JSF ServerBeans from the Spring context; there may be several
Map<String, ServerBean> serverBeanMap = applicationContext.getBeansOfType(ServerBean.class);
if (CollectionUtils.isEmpty(serverBeanMap)) {
    log.error("application preheat, jsf thread pool preheat failed, serverBeanMap is empty.");
    return;
}

// Iterate over all ServerBeans and warm each one up
serverBeanMap.forEach((serverBeanName, serverBean) -> {
    if (Objects.isNull(serverBean)) {
        log.error("application preheat, jsf thread pool preheat failed, serverBean is null, serverBeanName:{}", serverBeanName);
        return;
    }
    // Start the ServerBean; the Server only becomes available after start
    serverBean.start();
    Server server = serverBean.getServer();
    if (Objects.isNull(server)) {
        log.error("application preheat, jsf thread pool preheat failed, JSF Server is null, serverBeanName:{}", serverBeanName);
        return;
    }

    ServerTransportConfig serverTransportConfig = server.getTransportConfig();
    if (Objects.isNull(serverTransportConfig)) {
        log.error("application preheat, jsf thread pool preheat failed, serverTransportConfig is null, serverBeanName:{}", serverBeanName);
        return;
    }
    // Obtain the JSF business thread pool
    ThreadPoolExecutor businessPool = BusinessPool.getBusinessPool(serverTransportConfig);
    if (Objects.isNull(businessPool)) {
        log.error("application preheat, jsf biz pool preheat failed, businessPool is null, serverBeanName:{}", serverBeanName);
        return;
    }

    int corePoolSize = businessPool.getCorePoolSize();
    int maxCorePoolSize = Math.max(corePoolSize, 500);

    if (maxCorePoolSize > corePoolSize) {
        // Raise the JSF server core pool size
        businessPool.setCorePoolSize(maxCorePoolSize);
    }
    // Pre-start all core threads of the JSF business pool
    if (businessPool.getPoolSize() < maxCorePoolSize) {
        businessPool.prestartAllCoreThreads();
    }
});

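One way to make this snippet run at the right moment is to hook it into application startup, after the Spring context is refreshed but before traffic arrives. The wrapper below is an illustrative sketch (the class name and wiring are ours, not part of JSF or the packaged tool):

import java.util.concurrent.atomic.AtomicBoolean;

import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationListener;
import org.springframework.context.event.ContextRefreshedEvent;
import org.springframework.stereotype.Component;

@Component
public class JsfThreadPoolPreheater implements ApplicationListener<ContextRefreshedEvent> {

    // Guard against duplicate ContextRefreshedEvents from child contexts
    private final AtomicBoolean preheated = new AtomicBoolean(false);

    @Override
    public void onApplicationEvent(ContextRefreshedEvent event) {
        if (preheated.compareAndSet(false, true)) {
            preheatJsfBusinessPools(event.getApplicationContext());
        }
    }

    private void preheatJsfBusinessPools(ApplicationContext applicationContext) {
        // ... the ServerBean/BusinessPool warm-up logic shown above ...
    }
}

Combined with the delay parameter from Clue 1, this leaves the core threads already alive by the time the provider is published.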

Clue 3: after the JSF-BZ thread pool is warmed up, the JVM thread count still rises when JSF goes online

Continuing with jstack and comparing dumps, the threads that now increase are the JSF-SEV-WORKER threads:

"JSF-SEV-WORKER-139-T-129" #1295 daemon prio=5 os_prio=0 tid=0x00007ef66000b800 nid=0x7289 runnable [0x00007ef627cf8000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	- locked <0x0000000644f558b8> (a io.netty.channel.nio.SelectedSelectionKeySet)
	- locked <0x0000000641eaaca0> (a java.util.Collections$UnmodifiableSet)
	- locked <0x0000000641eaab88> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
	at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
	at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:805)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- None

"JSF-SEV-WORKER-139-T-128" #1293 daemon prio=5 os_prio=0 tid=0x00007ef60c002800 nid=0x7288 runnable [0x00007ef627b74000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	- locked <0x0000000641ea7450> (a io.netty.channel.nio.SelectedSelectionKeySet)
	- locked <0x0000000641e971e8> (a java.util.Collections$UnmodifiableSet)
	- locked <0x0000000641e970d0> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
	at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
	at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:805)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- None

"JSF-SEV-WORKER-139-T-127" #1291 daemon prio=5 os_prio=0 tid=0x00007ef608001000 nid=0x7286 runnable [0x00007ef627df9000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	- locked <0x0000000641e93998> (a io.netty.channel.nio.SelectedSelectionKeySet)
	- locked <0x0000000641e83730> (a java.util.Collections$UnmodifiableSet)
	- locked <0x0000000641e83618> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
	at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
	at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:805)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- None


So what do the JSF-SEV-WORKER threads do? Can we warm them up too? With these questions, back to the JSF source code:

private synchronized EventLoopGroup initChildEventLoopGroup() {
    int threads = (this.childNioEventThreads > 0)
            ? this.childNioEventThreads
            : Math.max(8, Constants.DEFAULT_IO_THREADS);

    NamedThreadFactory threadName = new NamedThreadFactory("JSF-SEV-WORKER", isDaemon());
    // Decompiled source, lightly tidied: choose epoll or NIO event loops
    EventLoopGroup eventLoopGroup;
    if (isUseEpoll()) {
        eventLoopGroup = new EpollEventLoopGroup(threads, (ThreadFactory) threadName);
    } else {
        eventLoopGroup = new NioEventLoopGroup(threads, (ThreadFactory) threadName);
    }
    return eventLoopGroup;
}


The JSF source code shows that JSF-SEV-WORKER threads are created by JSF through Netty to handle network communication. Studying the source carefully yields a way to warm up the JSF-SEV-WORKER threads as well:

// Obtain the NioEventLoopGroup from the serverTransportConfig
// (see the JSF-BZ warm-up code above for how to obtain serverTransportConfig)
NioEventLoopGroup eventLoopGroup = (NioEventLoopGroup) serverTransportConfig.getChildEventLoopGroup();

int threadSize = this.jsfSevWorkerThreads;
while (threadSize-- > 0) {
    new Thread(() -> {
        // Submitting a task by hand makes Netty create a JSF-SEV-WORKER thread, achieving the warm-up
        eventLoopGroup.submit(() -> log.info("submit thread to netty by hand, threadName:{}", Thread.currentThread().getName()));
    }).start();
}


The warm-up effect for the JSF-BZ and JSF-SEV-WORKER threads is shown in the following figure:

Figure 6 JSF-BZ/JSF-SEV-WORKER thread warm-up effect

Mining the source code for clues

So far, with JSF delayed publishing plus JSF internal thread-pool warm-up, the jitter timeouts callers see during deployment have eased (from the original 10000 ms-20000 ms down to 5000 ms-10000 ms). Effective, but not enough; there should still be room for optimization. It is now time to return to the question we set aside at the start: what does a call go through before (or after) entering the provider's internal processing logic? The obvious answer is the network in between, but the network can basically be ruled out, since the machines' network performance was normal throughout deployment. What other factors remain? Once again, we return to the JSF source code for clues.

Figure 7 Provider-side request processing in the JSF source code

Careful study of the JSF source code shows that JSF runs a series of encoding, decoding, serialization, and deserialization steps for each interface call, and inside these steps we made a pleasant discovery: local caches. Part of the source code follows:

DESC_CLASS_CACHE

private static final ConcurrentMap<String, Class<?>> DESC_CLASS_CACHE = new ConcurrentHashMap<String, Class<?>>();

private static Class<?> desc2class(ClassLoader cl, String desc) throws ClassNotFoundException {
  switch (desc.charAt(0)) {
    case 'V':
      return void.class;
    case 'Z': return boolean.class;
    case 'B': return byte.class;
    case 'C': return char.class;
    case 'D': return double.class;
    case 'F': return float.class;
    case 'I': return int.class;
    case 'J': return long.class;
    case 'S': return short.class;
    case 'L':
      desc = desc.substring(1, desc.length() - 1).replace('/', '.');
      break;
    case '[':
      desc = desc.replace('/', '.');
      break;
    default:
      throw new ClassNotFoundException("Class not found: " + desc);
  } 
  
  if (cl == null)
    cl = ClassLoaderUtils.getCurrentClassLoader(); 
  Class<?> clazz = DESC_CLASS_CACHE.get(desc);
  if (clazz == null) {
    clazz = Class.forName(desc, true, cl);
    DESC_CLASS_CACHE.put(desc, clazz);
  } 
  return clazz;
}


NAME_CLASS_CACHE

private static final ConcurrentMap<String, Class<?>> NAME_CLASS_CACHE = new ConcurrentHashMap<String, Class<?>>();

private static Class<?> name2class(ClassLoader cl, String name) throws ClassNotFoundException {
  int c = 0, index = name.indexOf('[');
  if (index > 0) {
    
    c = (name.length() - index) / 2;
    name = name.substring(0, index);
  } 
  if (c > 0) {
    
    StringBuilder sb = new StringBuilder();
    while (c-- > 0) {
      sb.append("[");
    }
    if ("void".equals(name)) { sb.append('V'); }
    else if ("boolean".equals(name)) { sb.append('Z'); }
    else if ("byte".equals(name)) { sb.append('B'); }
    else if ("char".equals(name)) { sb.append('C'); }
    else if ("double".equals(name)) { sb.append('D'); }
    else if ("float".equals(name)) { sb.append('F'); }
    else if ("int".equals(name)) { sb.append('I'); }
    else if ("long".equals(name)) { sb.append('J'); }
    else if ("short".equals(name)) { sb.append('S'); }
    else { sb.append('L').append(name).append(';'); }
     name = sb.toString();
  }
  else {
    
    if ("void".equals(name)) return void.class; 
    if ("boolean".equals(name)) return boolean.class; 
    if ("byte".equals(name)) return byte.class; 
    if ("char".equals(name)) return char.class; 
    if ("double".equals(name)) return double.class; 
    if ("float".equals(name)) return float.class; 
    if ("int".equals(name)) return int.class; 
    if ("long".equals(name)) return long.class; 
    if ("short".equals(name)) return short.class;
  
  } 
  if (cl == null)
    cl = ClassLoaderUtils.getCurrentClassLoader(); 
  Class<?> clazz = NAME_CLASS_CACHE.get(name);
  if (clazz == null) {
    clazz = Class.forName(name, true, cl);
    NAME_CLASS_CACHE.put(name, clazz);
  } 
  return clazz;
}


SerializerCache

private ConcurrentHashMap _cachedSerializerMap;

public Serializer getSerializer(Class<?> cl) throws HessianProtocolException {
  Serializer serializer = (Serializer)_staticSerializerMap.get(cl);
  if (serializer != null) {
    return serializer;
  }
  
  if (this._cachedSerializerMap != null) {
    serializer = (Serializer)this._cachedSerializerMap.get(cl);
    if (serializer != null) {
      return serializer;
    }
  } 
  
  int i = 0;
  for (; serializer == null && this._factories != null && i < this._factories.size(); 
    i++) {

    
    AbstractSerializerFactory factory = this._factories.get(i);
    
    serializer = factory.getSerializer(cl);
  } 
  
  if (serializer == null)
  {
    if (isZoneId(cl)) {
      serializer = ZoneIdSerializer.getInstance();
    } else if (isEnumSet(cl)) {
      serializer = EnumSetSerializer.getInstance();
    } else if (JavaSerializer.getWriteReplace(cl) != null) {
      serializer = new JavaSerializer(cl, this._loader);
    }
    else if (HessianRemoteObject.class.isAssignableFrom(cl)) {
      serializer = new RemoteSerializer();
    }
    else if (Map.class.isAssignableFrom(cl)) {
      if (this._mapSerializer == null) {
        this._mapSerializer = new MapSerializer();
      }
      serializer = this._mapSerializer;
    } else if (Collection.class.isAssignableFrom(cl)) {
      if (this._collectionSerializer == null) {
        this._collectionSerializer = new CollectionSerializer();
      }
      
      serializer = this._collectionSerializer;
    } else if (cl.isArray()) {
      serializer = new ArraySerializer();
    } else if (Throwable.class.isAssignableFrom(cl)) {
      serializer = new ThrowableSerializer(cl, getClassLoader());
    } else if (InputStream.class.isAssignableFrom(cl)) {
      serializer = new InputStreamSerializer();
    } else if (Iterator.class.isAssignableFrom(cl)) {
      serializer = IteratorSerializer.create();
    } else if (Enumeration.class.isAssignableFrom(cl)) {
      serializer = EnumerationSerializer.create();
    } else if (Calendar.class.isAssignableFrom(cl)) {
      serializer = CalendarSerializer.create();
    } else if (Locale.class.isAssignableFrom(cl)) {
      serializer = LocaleSerializer.create();
    } else if (Enum.class.isAssignableFrom(cl)) {
      serializer = new EnumSerializer(cl);
    } 
  }
  if (serializer == null) {
    serializer = getDefaultSerializer(cl);
  }
  
  if (this._cachedSerializerMap == null) {
    this._cachedSerializerMap = new ConcurrentHashMap<Object, Object>(8);
  }
  
  this._cachedSerializerMap.put(cl, serializer);
  
  return serializer;
}


DeserializerCache

private ConcurrentHashMap _cachedDeserializerMap;

public Deserializer getDeserializer(Class<?> cl) throws HessianProtocolException {
  Deserializer deserializer = (Deserializer)_staticDeserializerMap.get(cl);
  if (deserializer != null) {
    return deserializer;
  }
  if (this._cachedDeserializerMap != null) {
    deserializer = (Deserializer)this._cachedDeserializerMap.get(cl);
    if (deserializer != null) {
      return deserializer;
    }
  } 
  
  int i = 0;
  for (; deserializer == null && this._factories != null && i < this._factories.size(); 
    i++) {
    
    AbstractSerializerFactory factory = this._factories.get(i);
    
    deserializer = factory.getDeserializer(cl);
  } 
  
  if (deserializer == null)
    if (Collection.class.isAssignableFrom(cl)) {
      deserializer = new CollectionDeserializer(cl);
    }
    else if (Map.class.isAssignableFrom(cl)) {
      deserializer = new MapDeserializer(cl);
    }
    else if (cl.isInterface()) {
      deserializer = new ObjectDeserializer(cl);
    }
    else if (cl.isArray()) {
      deserializer = new ArrayDeserializer(cl.getComponentType());
    }
    else if (Enumeration.class.isAssignableFrom(cl)) {
      deserializer = EnumerationDeserializer.create();
    }
    else if (Enum.class.isAssignableFrom(cl)) {
      deserializer = new EnumDeserializer(cl);
    }
    else if (Class.class.equals(cl)) {
      deserializer = new ClassDeserializer(this._loader);
    } else {
      
      deserializer = getDefaultDeserializer(cl);
    }  
  if (this._cachedDeserializerMap == null) {
    this._cachedDeserializerMap = new ConcurrentHashMap<Object, Object>(8);
  }
  this._cachedDeserializerMap.put(cl, deserializer);
  
  return deserializer;
}


As the source code above shows, there are four local caches. Unfortunately, all four are private, so we cannot initialize them directly. Even so, the source code offers ways to initialize and warm these four caches indirectly:

DESC_CLASS_CACHE, NAME_CLASS_CACHE warm-up code

// Warm up DESC_CLASS_CACHE
ReflectUtils.desc2classArray(ReflectUtils.getDesc(Class.forName("cn.jdl.oms.express.model.CreateExpressOrderRequest")));
// Warm up NAME_CLASS_CACHE
ReflectUtils.name2class("cn.jdl.oms.express.model.CreateExpressOrderRequest");



SerializerCache, DeserializerCache warm-up code

public class JsfSerializerFactoryPreheat extends HessianSerializerFactory {

    public static void doPreheat(String className) {
        try {
            // Warm up the serializer cache
            JsfSerializerFactoryPreheat.SERIALIZER_FACTORY.getSerializer(Class.forName(className));
            // Warm up the deserializer cache
            JsfSerializerFactoryPreheat.SERIALIZER_FACTORY.getDeserializer(Class.forName(className));
        } catch (Exception e) {
            // do nothing
            log.error("JsfSerializerFactoryPreheat failed:", e);
        }
    }
}

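With 46 exported interfaces, warming these caches one hard-coded class at a time does not scale. The helper below is a hedged sketch of one way to generalize it (the class is ours, not part of JSF): walk every exported service interface by reflection and push each parameter and return type through the same warm-up calls shown above.

import java.lang.reflect.Method;
import java.util.List;

public final class JsfCacheWarmUpHelper {

    public static void warmUpInterfaces(List<Class<?>> serviceInterfaces) {
        for (Class<?> api : serviceInterfaces) {
            for (Method method : api.getMethods()) {
                for (Class<?> paramType : method.getParameterTypes()) {
                    warmUpClass(paramType);
                }
                warmUpClass(method.getReturnType());
            }
        }
    }

    private static void warmUpClass(Class<?> clazz) {
        if (clazz.isPrimitive()) {
            return; // primitives are served from static maps, nothing to warm
        }
        try {
            // The same warm-up calls as above, applied per class
            ReflectUtils.desc2classArray(ReflectUtils.getDesc(clazz));
            ReflectUtils.name2class(clazz.getName());
            JsfSerializerFactoryPreheat.doPreheat(clazz.getName());
        } catch (Exception e) {
            // warm-up is best-effort; never fail startup over it
        }
    }
}

// Usage (interface list is illustrative):
// JsfCacheWarmUpHelper.warmUpInterfaces(Collections.singletonList(CreateExpressOrderService.class));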

Looking at how JSF encodes, decodes, serializes, and deserializes interface parameters also reminded us that our application serializes its input and output parameters with Fastjson, and Fastjson must initialize its SerializeConfig on first use, which likewise costs performance (see https://www.ktanx.com/blog/p/3181). Fastjson can be initialized and warmed up with the following code:

JSON.parseObject(JSON.toJSONString(Class.forName("cn.jdl.oms.express.model.CreateExpressOrderRequest").newInstance()), Class.forName("cn.jdl.oms.express.model.CreateExpressOrderRequest"));

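The one-liner above round-trips a single class. A small generalization, sketched under the assumption that each model class has a public no-arg constructor (the FastjsonPreheat class is ours, for illustration), warms Fastjson's serializer and deserializer caches for a whole list of classes:

import com.alibaba.fastjson.JSON;

public final class FastjsonPreheat {

    public static void preheat(Class<?>... classes) {
        for (Class<?> clazz : classes) {
            try {
                // One serialize/parse round trip generates and caches the
                // serializer and deserializer for this class
                Object sample = clazz.getDeclaredConstructor().newInstance();
                JSON.parseObject(JSON.toJSONString(sample), clazz);
            } catch (Exception e) {
                // best-effort: skip classes that cannot be instantiated
            }
        }
    }
}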

So far, we have done the following at application startup to warm things up:

• JSF delayed publishing

• JSF-BZ thread-pool warm-up

• JSF-SEV-WORKER thread warm-up

• JSF encoding/decoding/serialization/deserialization cache warm-up

• Fastjson initialization warm-up

With all of the above in place, service jitter caused by application deployment improved markedly, from 10000 ms-20000 ms before the fix down to 2000 ms-3000 ms (slightly above the daily traffic jitter).

Solution

Based on the above analysis, the JSF thread-pool warm-up, local-cache warm-up, and Fastjson warm-up were integrated and packaged into a simple, ready-to-use warm-up tool. The jar has been uploaded to the internal repository; if you are interested, see the usage instructions for the application-startup warm-up tool.

Service jitter caused by application deployment is a common problem. The current options are as follows:

1. JSF official warm-up scheme (https://cf.jd.com/pages/viewpage.action?pageId=1132755015)

Principle: uses the warm-up strategy introduced in JSF 1.7.6, delivered dynamically. Through the server's load-balancing capability, the traffic weight of interfaces that need warming is adjusted at publish time so they get a small-traffic trial run.

Advantages: platform configuration suffices; low integration cost.

Disadvantages: warming by traffic weight leaves resources insufficiently warmed; service callers must upgrade to JSF 1.7.6, which is hard to push through when upstream callers are numerous.

2. Traffic recording and playback warm-up scheme

Principle: records real online traffic and replays it against newly deployed machines as a pressure test.

Advantages: combined with Xingyun deployment orchestration (take offline, deploy, warm up, bring online), pressure-test replay can warm the system quite thoroughly.

Disadvantages: the process is cumbersome; it is only friendly to read interfaces, and write interfaces must watch whether replayed data affects online state.

3. The scheme in this article

Principle: warms the system by initializing the service provider's JSF thread pools, local caches, and Fastjson.

Advantages: resources are fully warmed; easy to use and supports custom extensions.

Disadvantages: middleware other than JSF (Redis, ES, etc.) is not supported out of the box, though it can be covered through custom extensions; a possible extension hook is sketched below.
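For reference, such a custom extension point could take a shape like the following. The interface is illustrative only, not the packaged tool's actual API:

import org.springframework.context.ApplicationContext;

// Illustrative SPI: one implementation per middleware (Redis, ES, ...),
// all run once at startup, before the service is exposed to traffic.
public interface PreheatExtension {

    void preheat(ApplicationContext context);
}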

Warm-up effect

Before warming up:

After warming up:

With the warm-up tool described in this article, the before/after contrast is obvious. As the figures above show, the caller-side method-performance MAX value dropped from the original 10000 ms-20000 ms to 2000 ms-3000 ms, essentially in line with the daily MAX jitter.

Summary

Upstream service jitter caused by application deployment is a common problem. If the upstream system is sensitive to jitter, or the jitter has business impact, the issue deserves serious attention. The Baichuan distribution system covered here simply exposes JSF services, with no other middleware involved; its defining traits are many interfaces and a high call volume.

The problem was not obvious early in the system's life, when deployments went largely unnoticed upstream, but it grew prominent as call volume increased. Simply scaling out would ease it, yet at the cost of substantial wasted resources, contrary to the principle of cost reduction. Instead, starting from the available clues, we dug step by step into the JSF source code and, at system startup, fully initialized and warmed up the thread pools and local caches, effectively reducing service jitter when JSF goes online.
