[HBase]记一个state异常

昨天联调的时候发现一个问题,请求发送后,页面上一直不返回数据,好像一直在waiting。用stack看了下,第一个线程在等待hbase RPC请求返回数据

"http-bio-8080-exec-5" daemon prio=10 tid=0x00002aaab3661000 nid=0x2bb6 in Object.wait() [0x0000000045258000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000d2067280> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:907)
        - locked <0x00000000d2067280> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
        at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:157)
        at $Proxy9.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.hbase.ipc.SecureRpcEngine.getProxy(SecureRpcEngine.java:198)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:335)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:312)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1293)
        - locked <0x00000000d1d791e0> (a java.lang.String)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1

 而其他线程则处于block状态

"http-bio-8080-exec-8" daemon prio=10 tid=0x0000000041d07800 nid=0x2f5b waiting for monitor entry [0x0000000046d74000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1283)
        - waiting to lock <0x00000000d1d791e0> (a java.lang.String)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1249)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1236)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:945)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:841)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:810)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:942)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:845)

因为之前看过对应的代码,请见 http://iwinit.iteye.com/blog/1815347 和 http://iwinit.iteye.com/blog/1811265

看代码是region location的时候RPC meta表所在regionserver的时候卡住了。因为系统刚起来的时候没有cache,所以请求都需要region location,.meta.只在一台rs上,所以都block住了。再看之前的RPC分析文章,第一个请求确实已经发出去了,处于wait状态,也就是等待IO线程notify。奇怪的是,jstack里并没有IO线程(IPC关键字)。也就是说第一个请求线程永远无法被唤醒了:(。Log也没有任何记录,开始各种debug,因为我们客户端是自己的client,SecureConnection,其receiveResponse方法如下

try {
        int id = in.readInt();                    // try to read an id
 
        if (LOG.isDebugEnabled())
          LOG.debug(getName() + " got value #" + id);
//这里把call删除掉了,后续IO线程会因为空闲而自动关闭退出
        Call call = calls.remove(id);
//协议中的state字段
        int state = in.readInt();     // read call status
        if (LOG.isDebugEnabled()) {
          LOG.debug("call #"+id+" state is " + state);
        }
//对state处理,注意这里没有从的else处理,也就是说如果state不是这3种的话,直接就不处理了,但是call请求确已经被删除掉了。。。
        if (state == Status.SUCCESS.state) {
          Writable value = ReflectionUtils.newInstance(valueClass, conf);
          value.readFields(in);                 // read value
          if (LOG.isDebugEnabled()) {
            LOG.debug("call #"+id+", response is:\n"+value.toString());
          }
          call.setValue(value);
        } else if (state == Status.ERROR.state) {
          call.setException(new RemoteException(WritableUtils.readString(in),
                                                WritableUtils.readString(in)));
        } else if (state == Status.FATAL.state) {
          // Close the connection
          markClosed(new RemoteException(WritableUtils.readString(in),
                                         WritableUtils.readString(in)));
        }
      } catch (IOException e) {

 debug的时候发现server端范围的state是5xxxxxx的数字。。。原因是sever段没有启用secure导致协议不一致了。。client没有容错导致。

 

 

猜你喜欢

转载自iwinit.iteye.com/blog/1828593