[Reposted] Troubleshooting an application hang caused by improper use of Redis, and how it was fixed

Note: The way this article analyzes the problem is worth learning from, so I am keeping it as a study record. It is reposted from the original author's public account.

 

First, a description of the symptom: for about a week, the API application in our network sandbox environment kept hanging, with none of the APIs responding.

When the testers first complained that the environment was responding slowly, we restarted the application. It returned to normal, so we did not look into it further.

But the problem then occurred more and more often, and more and more colleagues complained, so we suspected the code was at fault and started troubleshooting.

First, nothing could be reproduced in the local development IDE. When the application hung, the database and Redis were both normal, and there was no unusual error log. We began to suspect the sandbox machine itself (the test environment is very fragile! _!)

So I SSHed into the server and ran the following command:

top

 

The machine looked fairly normal, so I decided to look at the JVM stack information.

First, find which threads of the application are consuming the most resources.

Run top -H -p 12798

 

The first three threads were relatively resource intensive.

Use jstack to view the thread stack:

jstack 12798 | grep 31ff    (31ff is the hexadecimal form of thread ID 12799)
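top -H shows thread IDs in decimal, while jstack prints the native thread ID in hexadecimal (the nid field), so the ID has to be converted before grepping. A minimal sketch of that conversion; the class and method names here are only illustrative:

```java
public class ThreadIdHex {
    // Convert the decimal thread ID shown by `top -H` into the lowercase
    // hexadecimal form that jstack uses for the native thread ID.
    public static String toNid(int decimalThreadId) {
        return Integer.toHexString(decimalThreadId);
    }

    public static void main(String[] args) {
        // Prints the value to grep for in the jstack output.
        System.out.println(toNid(12799));
    }
}
```

On a shell, printf '%x\n' 12799 does the same job.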

 

Nothing looked wrong there, so I also looked at the 10 lines that follow the match:

 

Some of the threads were in a lock-waiting state, but none of them showed any business-related code, so I ignored them. At this point I had no leads. After thinking it over, I decided to give up on the machine in its hung state.

To preserve the scene of the accident, I dumped all the stack and heap information of the problem process, then restarted the test environment in debug mode, intending to remote-debug the machine directly the next time the problem appeared.

The next day the problem recurred. I asked operations to remove this application from the nginx forwarding and remote-debugged Tomcat myself.

I picked an interface at random and set a breakpoint at its entry. Then the tragedy began: nothing happened! The API call just waited for a response and never reached the breakpoint.

I was a bit dumbfounded. After calming down for a while, I set a breakpoint in the AOP code that runs before the entry point and debugged again. This time the breakpoint was hit, and after pressing F8 N times I found that execution got stuck when running a Redis command.

Stepping further in, I eventually found the problem at one place in the Jedis connection code:

/**
 * Returns a Jedis instance to be used as a Redis connection. The instance can be newly created or retrieved from a
 * pool.
 * 
 * @return Jedis instance ready for wrapping into a {@link RedisConnection}.
 */
protected Jedis fetchJedisConnector() {
   try {
      if (usePool && pool != null) {
         return pool.getResource();
      }
      Jedis jedis = new Jedis(getShardInfo());
      // force initialization (see Jedis issue #82)
      jedis.connect();
      return jedis;
   } catch (Exception ex) {
      throw new RedisConnectionFailureException("Cannot get Jedis connection", ex);
   }
}

After the call to pool.getResource() above, the thread starts waiting.

public T getResource() {
  try {
    return internalPool.borrowObject();
  } catch (Exception e) {
    throw new JedisConnectionException("Could not get a resource from the pool", e);
  }
}

The line return internalPool.borrowObject(); is what borrows a connection from the pool. Stepping into it:

public T borrowObject(long borrowMaxWaitMillis) throws Exception {
    this.assertOpen();
    AbandonedConfig ac = this.abandonedConfig;
    if (ac != null && ac.getRemoveAbandonedOnBorrow() && this.getNumIdle() < 2 && this.getNumActive() > this.getMaxTotal() - 3) {
        this.removeAbandoned(ac);
    }

    PooledObject p = null;
    boolean blockWhenExhausted = this.getBlockWhenExhausted();
    long waitTime = 0L;

    while(p == null) {
        boolean create = false;
        if (blockWhenExhausted) {
            p = (PooledObject)this.idleObjects.pollFirst();
            if (p == null) {
                create = true;
                p = this.create();
            }

            if (p == null) {
                if (borrowMaxWaitMillis < 0L) {
                    p = (PooledObject)this.idleObjects.takeFirst();
                } else {
                    waitTime = System.currentTimeMillis();
                    p = (PooledObject)this.idleObjects.pollFirst(borrowMaxWaitMillis, TimeUnit.MILLISECONDS);
                    waitTime = System.currentTimeMillis() - waitTime;
                }
            }

            if (p == null) {
                throw new NoSuchElementException("Timeout waiting for idle object");
            }
            // ...

The relevant section of that code is:

if (p == null) {
    if (borrowMaxWaitMillis < 0L) {
        p = (PooledObject)this.idleObjects.takeFirst();
    } else {
        waitTime = System.currentTimeMillis();
        p = (PooledObject)this.idleObjects.pollFirst(borrowMaxWaitMillis, TimeUnit.MILLISECONDS);
        waitTime = System.currentTimeMillis() - waitTime;
    }
}

When borrowMaxWaitMillis < 0, the first branch runs and the thread waits indefinitely, so I began to suspect that this value had never been configured.

Checking the Redis pool configuration confirmed that MaxWaitMillis was not set. Setting it only makes the else branch run and end in an Exception, which does not solve the underlying problem.
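The difference between the two branches can be reproduced with the JDK's own LinkedBlockingDeque, used here only as a stand-in for the pool's idle-object queue (the real pool has its own deque class). This is a sketch under that assumption, and the class and method names are hypothetical. With a non-negative wait, the timed poll gives up and returns null, which the pool then turns into an exception. With a negative wait, takeFirst() parks the thread until another thread returns an object.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

public class BorrowWaitSketch {
    // Stand-in for the pool's idle-object queue; empty means "pool exhausted".
    private final LinkedBlockingDeque<String> idleObjects = new LinkedBlockingDeque<>();

    // Mirrors the timed branch: wait at most maxWaitMillis, then give up with null.
    public String borrowWithTimeout(long maxWaitMillis) {
        try {
            return idleObjects.pollFirst(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Put an object back so that a later borrow can succeed.
    public void giveBack(String obj) {
        idleObjects.addFirst(obj);
    }

    public static void main(String[] args) {
        BorrowWaitSketch pool = new BorrowWaitSketch();
        // Exhausted pool: the call comes back after the timeout instead of hanging.
        System.out.println(pool.borrowWithTimeout(100));
        pool.giveBack("conn-1");
        System.out.println(pool.borrowWithTimeout(100));
        // Calling idleObjects.takeFirst() on an empty deque would block here
        // until some other thread added an element.
    }
}
```

A timeout turns a silent hang into a visible failure, but it does not explain why the pool is empty in the first place.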

Continue stepping with F8:

public E takeFirst() throws InterruptedException {
    this.lock.lock();

    Object var2;
    try {
        Object x;
        while((x = this.unlinkFirst()) == null) {
            this.notEmpty.await();
        }

        var2 = x;
    } finally {
        this.lock.unlock();
    }

    return var2;
}

Seeing the word lock here, I began to suspect that all API requests were blocked.

So I SSHed into the server again and installed Arthas (Arthas is Alibaba's open-source Java diagnostic tool).

Run the thread command:

 

A large number of http-nio threads were in the WAITING state. The http-nio-8083-exec- threads are the Tomcat threads that handle HTTP requests.

Pick any one of them and view its stack:

thread 428

 

This confirms that the API hang is caused by the code that obtains the Redis connection.

Reading this stack, all of these threads are waiting for the object @53e5504e to be released. So I searched the whole jstack output for 53e5504e, but did not find the thread that holds this object.
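One likely reason the search turned up no owner: a thread parked in takeFirst() is waiting on a Condition, and a condition wait has no owning thread the way a synchronized monitor does. The sketch below reproduces that with the JDK's LinkedBlockingDeque (a stand-in for the pool's internal deque, not the Jedis code itself) and reads the blocked thread's lock name through ThreadMXBean. The class and method names are illustrative only.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.LinkedBlockingDeque;

public class WaitingOnCondition {
    // Starts a thread that blocks in takeFirst() on an empty deque, then
    // reports the name of the object that thread is waiting on, in the same
    // ClassName@hash form a thread dump uses.
    public static String lockNameOfBlockedTaker() {
        LinkedBlockingDeque<String> idle = new LinkedBlockingDeque<>();
        Thread taker = new Thread(() -> {
            try {
                idle.takeFirst();
            } catch (InterruptedException ignored) {
                // Interrupted by the finally block below to end the demo.
            }
        }, "demo-taker");
        taker.setDaemon(true);
        taker.start();
        try {
            // Wait (up to 5 s) until the taker is actually parked.
            long deadline = System.currentTimeMillis() + 5000;
            while (taker.getState() != Thread.State.WAITING
                    && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);
            }
            ThreadInfo info = ManagementFactory.getThreadMXBean()
                    .getThreadInfo(taker.getId());
            return info == null ? null : info.getLockName();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            taker.interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(lockNameOfBlockedTaker());
    }
}
```

The name printed refers to a condition object rather than a lock held by some other thread, so there is no holder to find in the dump. The threads are simply waiting for an element to be put back into the queue.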

At this point it is certain that the problem lies in obtaining the Redis connection. What prevents the connection from being obtained is still unknown.

Next, run Arthas's thread -b (thread -b finds the thread that is currently blocking other threads).

 

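No result. That was not what I expected, since it should have been able to find the blocking thread. So I read the documentation for the command and found the following note:

 

Well, we happen to be the latter: the pool blocks on a java.util.concurrent lock rather than on synchronized, and the documentation says thread -b currently only finds threads blocked by the synchronized keyword.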

Time to rethink the approach. This time I changed the Redis pool configuration to set the timeout for obtaining a connection to 2s, then waited for the problem to recur so that I could see what the application had last been doing while it was still normal.

Add the configuration:

JedisConnectionFactory jedisConnectionFactory = new JedisConnectionFactory();
.......
JedisPoolConfig config = new JedisPoolConfig();
config.setMaxWaitMillis(2000);
.......
jedisConnectionFactory.afterPropertiesSet();

Restart the service and wait...

A day later, the problem recurred.

I SSHed into the server and checked the Tomcat access log, which showed a large number of API requests returning 500:

org.springframework.data.redis.RedisConnectionFailureException: Cannot get Jedis connection; nested exception is redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
    at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.fetchJedisConnector(JedisConnectionFactory.java:140)
    at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.getConnection(JedisConnectionFactory.java:229)
    at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.getConnection(JedisConnectionFactory.java:57)
    at org.springframework.data.redis.core.RedisConnectionUtils.doGetConnection(RedisConnectionUtils.java:128)
    at org.springframework.data.redis.core.RedisConnectionUtils.getConnection(RedisConnectionUtils.java:91)
    at org.springframework.data.redis.core.RedisConnectionUtils.getConnection(RedisConnectionUtils.java:78)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:177)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:152)
    at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:85)
    at org.springframework.data.redis.core.DefaultHashOperations.get(DefaultHashOperations.java:48)

 

I traced the requests back to the place where the first 500 occurred,

and found the following code:

.......
Cursor c = stringRedisTemplate.getConnectionFactory().getConnection().scan(options);
while (c.hasNext()) {
.......
   }

Analyzing this code: stringRedisTemplate.getConnectionFactory().getConnection() obtains a RedisConnection from the pool, and nothing is done with that connection afterwards.

In other words, once the Redis connection has been borrowed from the pool, it is never released or returned to the pool. Although the business logic has finished and the RedisConnection is no longer in use, the pool still counts it as borrowed and never moves it back to the idle state.
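The effect of a borrow without a matching return can be shown with a tiny model. This is a sketch with a hypothetical pool built on a JDK Semaphore (one permit per connection), not the Jedis or commons-pool implementation. Each leaky call permanently consumes one permit, so after maxTotal such calls every later borrow times out, even for code that behaves correctly.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class LeakyBorrowSketch {
    // Hypothetical stand-in for a connection pool: one permit per connection.
    private final Semaphore permits;

    public LeakyBorrowSketch(int maxTotal) {
        this.permits = new Semaphore(maxTotal);
    }

    // Borrow with a timeout; false means the pool is exhausted.
    public boolean borrow(long maxWaitMillis) {
        try {
            return permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public void release() {
        permits.release();
    }

    public int idle() {
        return permits.availablePermits();
    }

    // Buggy pattern: borrow, use, and never give the connection back.
    public boolean leakyCall() {
        return borrow(50);
    }

    // Correct pattern: always return the connection in a finally block.
    public boolean safeCall() {
        if (!borrow(50)) {
            return false;
        }
        try {
            return true; // the real work with the connection would go here
        } finally {
            release();
        }
    }

    public static void main(String[] args) {
        LeakyBorrowSketch pool = new LeakyBorrowSketch(2);
        pool.leakyCall();
        pool.leakyCall();
        // The pool is now empty; even a well-behaved caller fails.
        System.out.println("idle=" + pool.idle() + ", safeCall=" + pool.safeCall());
    }
}
```

This matches the symptoms in the article: the application works for a while after each restart and degrades as the leaked connections accumulate.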

 

The normal usage should be:

 

Since then the problem has been found.

Summary: Spring's stringRedisTemplate wraps the routine Redis operations, but it does not wrap commands such as SCAN and SETNX, so for those you need to get hold of the underlying Jedis connection.

Using

stringRedisTemplate.getConnectionFactory().getConnection()

for this is not recommended.

Instead, we can use

stringRedisTemplate.execute(new RedisCallback<Cursor<byte[]>>() {

     @Override
     public Cursor<byte[]> doInRedis(RedisConnection connection) throws DataAccessException {

       return connection.scan(options);
     }
   });

 

Alternatively, once you have finished using the connection, call

RedisConnectionUtils.releaseConnection(conn, factory);

to release it.

Also, using the KEYS command in Redis is not recommended, and the Redis pool should be given a sensible configuration. Otherwise, when a problem occurs there is no error log and no exception, which makes it very hard to locate.

 

[Reposted] Attribution

Author: wooden -_-

Source: https://my.oschina.net/xiaomu0082/blog/2990388

Origin www.cnblogs.com/wang-meng/p/11569697.html