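HotSpot hot code compilation and on-stack replacement: a source code walkthrough

I. InvocationCounter

1. Definition

2. The init method

3. _invocation_counter and _backedge_counter

4. reinitialize

II. InterpreterGenerator::generate_counter_incr

III. InterpreterGenerator::generate_counter_overflow

IV. The for loop

1. Bytecode implementation

2. The goto instruction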

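The previous article, "HotSpot bytecode execution and top-of-stack caching: a source code walkthrough", covered InterpreterGenerator::generate_normal_entry and mentioned two helpers: InterpreterGenerator::generate_counter_incr, which increments the method invocation count, and InterpreterGenerator::generate_counter_overflow, which triggers compilation of the method once the count reaches its threshold. This article uses those two methods as the thread for studying how hot code compilation is implemented. Hot code arises in two ways: a method is called more often than a threshold, or a loop (for example a for loop) iterates more often than a threshold. Both trigger compilation of the method. In the first case a compilation task is submitted to a background compiler thread, and once compilation finishes, calls to the bytecode are redirected through an adapter to the native code. In the second case compilation is completed as soon as possible, and after the necessary frame migration the compiled native code starts executing immediately; this is on-stack replacement (OSR).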

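I. InvocationCounter

1. Definition

InvocationCounter represents an invocation counter. At initialization, a threshold and an action can be registered for each state of the counter; when the count exceeds the threshold, the registered action runs automatically. To save memory, the state and the count are encoded in a single int (4 bytes, 32 bits, the _counter field). The count occupies the upper 29 bits, carry takes 1 bit and acts as a separator, and the state occupies the lowest 2 bits.

InvocationCounter defines a State enum with the values wait_for_nothing, wait_for_compile and number_of_states.

Only the first two are real states; number_of_states is just their count.

The private fields defined by InvocationCounter are listed below. Note that all of them except the first are static.

_counter: unsigned int, holds the count, the carry bit and the state.
_init: int [number_of_states], the per-State init value registered through def (set_state uses it as the initial count)
_action: Action [number_of_states], the action run when the threshold is reached in each State
InterpreterInvocationLimit: int, the threshold for compiling a method
InterpreterBackwardBranchLimit: int, the threshold for on-stack replacement
InterpreterProfileLimit: int, the threshold for collecting interpreter profiling data

Action is the type of the registered callbacks: a function that takes a methodHandle and returns an address, matching the signatures of do_nothing and do_decay shown later.

Most of InvocationCounter's methods are simple accessors; the ones to focus on are init and reinitialize.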

2. The init method

init initializes the counter's fields. The annotated source is:

void InvocationCounter::init() {
  _counter = 0;  //所有的位都初始化成0
  reset();
}

void InvocationCounter::reset() {
  // Only reset the state and don't make the method look like it's never
  // been executed
  set_state(wait_for_compile);
}

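void InvocationCounter::set_state(State state) {
  // Validate the state
  assert(0 <= state && state < number_of_states, "illegal state");
  // Init value registered for this state through def (0 for both states, see reinitialize)
  int init = _init[state];
  // prevent from going to zero, to distinguish from never-executed methods
  // On first initialization count() is 0 and init is 0, so the count stays 0.
  // If the counter has already been used (count() > 0) and init is 0, the new count
  // becomes 1 so that the method does not look like it was never executed.
  if (init == 0 && count() > 0)  init = 1;
  int carry = (_counter & carry_mask);    // the carry bit is sticky
  // Rebuild _counter from the new count, the preserved carry bit and the new state
  _counter = (init << number_of_noncount_bits) | carry | state;
}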

// Returns the count portion of _counter
int    count() const                           { return _counter >> number_of_noncount_bits; }
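
The packing used by set_state and count() can be modelled in a few lines of Java. This is only a sketch: the class and method names are hypothetical, the bit widths (2 state bits, 1 carry bit, 29 count bits) are the ones described above, and initForState stands in for _init[state].

```java
// Hypothetical model of the _counter layout: [count:29 | carry:1 | state:2].
final class CounterLayoutSketch {
    static final int NUMBER_OF_STATE_BITS = 2;
    static final int NUMBER_OF_CARRY_BITS = 1;
    static final int NUMBER_OF_NONCOUNT_BITS = NUMBER_OF_STATE_BITS + NUMBER_OF_CARRY_BITS; // 3
    static final int STATE_MASK = (1 << NUMBER_OF_STATE_BITS) - 1; // 0b011
    static final int CARRY_MASK = 1 << NUMBER_OF_STATE_BITS;       // 0b100

    static final int WAIT_FOR_NOTHING = 0;
    static final int WAIT_FOR_COMPILE = 1;

    static int count(int counter) { return counter >>> NUMBER_OF_NONCOUNT_BITS; }
    static int state(int counter) { return counter & STATE_MASK; }
    static boolean carry(int counter) { return (counter & CARRY_MASK) != 0; }

    // Mirrors set_state: a used counter never drops back to count 0, and carry is sticky.
    static int setState(int counter, int state, int initForState) {
        int init = initForState;
        if (init == 0 && count(counter) > 0) init = 1;
        int carry = counter & CARRY_MASK;
        return (init << NUMBER_OF_NONCOUNT_BITS) | carry | state;
    }

    public static void main(String[] args) {
        int fresh = setState(0, WAIT_FOR_COMPILE, 0);
        System.out.println("fresh counter: raw=" + fresh + " count=" + count(fresh));
        int used = setState((500 << 3) | WAIT_FOR_COMPILE, WAIT_FOR_NOTHING, 0);
        System.out.println("used counter:  raw=" + used + " count=" + count(used));
    }
}
```

A fresh counter ends up with raw value 1 (count 0, state wait_for_compile), while a counter that had already counted 500 invocations is reset to count 1 rather than 0.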

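The call chain of init is as follows:

MethodCounters holds the invocation counts used for hot code tracking. MethodData holds the profiling data that the JIT compilers collect in order to optimize the code; it is initially null. Both classes define two InvocationCounter fields with the same names, _invocation_counter and _backedge_counter.

The call chain above simply initializes these InvocationCounter fields in each class.

3. _invocation_counter and _backedge_counter

_invocation_counter records how many times the method has been called, and _backedge_counter records how many backward (loop) branches it has taken. The following test case illustrates this: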

import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

public class HSDBTest {

    public static void main(String[] args) {

        forTest();

        while (true){
            try {
                System.out.println(getProcessID());
                Thread.sleep(600*1000);
            } catch (Exception e) {

            }
        }
    }

    public static void forTest(){
        int a=0;
        for (int i=0;i<10000;i++){
            a++;
        }
        System.out.println(a);
    }

    public static final int getProcessID() {
        RuntimeMXBean runtimeMXBean = ManagementFactory.getRuntimeMXBean();
        System.out.println(runtimeMXBean.getName());
        return Integer.valueOf(runtimeMXBean.getName().split("@")[0])
                .intValue();
    }

}

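Once the program is running, use HSDB to inspect the fields of the Method instance that corresponds to forTest.

HSDB reports 9 for _invocation_counter and 80001 for _backedge_counter. Both are raw _counter values; the actual count is obtained by shifting the value right by three bits:

9 >> 3 = 1 and 80001 >> 3 = 10000.

This matches the test exactly: the loop in forTest ran 10000 times and forTest itself was called once.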
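
The decoding can be checked with a small Java helper. It is an illustrative sketch (the class name is hypothetical) that splits a raw _counter value into its count, carry and state parts using the 29/1/2 bit layout described earlier.

```java
// Hypothetical helper that decodes raw _counter values as shown by HSDB.
final class HsdbCounterDecode {
    static String decode(int counter) {
        int count = counter >>> 3;       // upper 29 bits
        int carry = (counter >>> 2) & 1; // bit 2
        int state = counter & 3;         // lowest 2 bits
        return "count=" + count + " carry=" + carry + " state=" + state;
    }

    public static void main(String[] args) {
        System.out.println("_invocation_counter 9     -> " + decode(9));
        System.out.println("_backedge_counter   80001 -> " + decode(80001));
    }
}
```

Both values carry state 1 (wait_for_compile) in the low bits, which is why the raw numbers are odd.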

4. reinitialize

reinitialize initializes the static fields of InvocationCounter. Its source is:

void InvocationCounter::reinitialize(bool delay_overflow) {
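  // Make sure number_of_states does not exceed state_limit (4)
  guarantee((int)number_of_states <= (int)state_limit, "adjust number_of_state_bits");
  // Register the init value and the action for each of the two states
  def(wait_for_nothing, 0, do_nothing);
  // The callers pass DelayCompilationDuringStartup, which is true by default, so do_decay is
  // normally installed; dummy_invocation_counter_overflow must never actually run (ShouldNotReachHere)
  if (delay_overflow) {
    def(wait_for_compile, 0, do_decay);
  } else {
    def(wait_for_compile, 0, dummy_invocation_counter_overflow);
  }

  // Compute InterpreterInvocationLimit and the other thresholds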
  InterpreterInvocationLimit = CompileThreshold << number_of_noncount_bits;
  InterpreterProfileLimit = ((CompileThreshold * InterpreterProfilePercentage) / 100)<< number_of_noncount_bits;


  if (ProfileInterpreter) {
    InterpreterBackwardBranchLimit = (CompileThreshold * (OnStackReplacePercentage - InterpreterProfilePercentage)) / 100;
  } else {
    InterpreterBackwardBranchLimit = ((CompileThreshold * OnStackReplacePercentage) / 100) << number_of_noncount_bits;
  }

  // Sanity-check the computed thresholds
  assert(0 <= InterpreterBackwardBranchLimit,
         "OSR threshold should be non-negative");
  assert(0 <= InterpreterProfileLimit &&
         InterpreterProfileLimit <= InterpreterInvocationLimit,
         "profile threshold should be less than the compilation threshold "
         "and non-negative");
}

static address do_nothing(methodHandle method, TRAPS) {
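  // Fetch the target method's MethodCounters and check that it exists
  MethodCounters* mcs = method->method_counters();
  assert(mcs != NULL, "");
  // Set the carry bit and cap the count at half of CompileThreshold
  mcs->invocation_counter()->set_carry();
  // Explicitly switch the state to wait_for_nothing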
  mcs->invocation_counter()->set_state(InvocationCounter::wait_for_nothing);
  return NULL;
}

void InvocationCounter::set_carry() {
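  // set_carry_flag sets the carry bit of _counter; the count bits are not affected
  set_carry_flag();
  int old_count = count();
  // new_count is the smaller of the old count and CompileThreshold / 2
  int new_count = MIN2(old_count, (int) (CompileThreshold / 2));
  if (new_count == 0)  new_count = 1;
  // Store the new count if it changed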
  if (old_count != new_count)  set(state(), new_count);
}

inline void InvocationCounter::set(State state, int count) {
  assert(0 <= state && state < number_of_states, "illegal state");
  int carry = (_counter & carry_mask);    // the carry bit is sticky
  // Rebuild _counter from the given count, the sticky carry bit and the state
  _counter = (count << number_of_noncount_bits) | carry | state;
}

  void set_carry_flag()                          {  _counter |= carry_mask; }

static address do_decay(methodHandle method, TRAPS) {
  // Fetch the target method's MethodCounters and check that it exists
  MethodCounters* mcs = method->method_counters();
  assert(mcs != NULL, "");
  mcs->invocation_counter()->decay();
  return NULL;
}

inline void InvocationCounter::decay() {
  int c = count();
  // Shift c right by one bit, i.e. halve the count
  int new_count = c >> 1;
  // Keep a non-zero count from decaying to 0
  if (c > 0 && new_count == 0) new_count = 1;
  // Store the decayed count
  set(state(), new_count);
}

address dummy_invocation_counter_overflow(methodHandle m, TRAPS) {
  ShouldNotReachHere();
  return NULL;
}
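
To make the threshold arithmetic in reinitialize concrete, the sketch below recomputes the three limits in Java. The class and method names are hypothetical, and the flag values used in main (CompileThreshold = 10000, OnStackReplacePercentage = 140, InterpreterProfilePercentage = 33) are assumptions chosen for illustration; the values on a particular JVM can be checked with -XX:+PrintFlagsFinal.

```java
// Hypothetical re-computation of the limits set up in InvocationCounter::reinitialize.
final class ThresholdSketch {
    static final int NUMBER_OF_NONCOUNT_BITS = 3;

    static int invocationLimit(int compileThreshold) {
        return compileThreshold << NUMBER_OF_NONCOUNT_BITS;
    }

    static int profileLimit(int compileThreshold, int profilePct) {
        return ((compileThreshold * profilePct) / 100) << NUMBER_OF_NONCOUNT_BITS;
    }

    // Note the asymmetry in the source: the profiling variant is NOT shifted.
    static int backwardBranchLimit(int compileThreshold, int osrPct, int profilePct,
                                   boolean profileInterpreter) {
        if (profileInterpreter) {
            return (compileThreshold * (osrPct - profilePct)) / 100;
        }
        return ((compileThreshold * osrPct) / 100) << NUMBER_OF_NONCOUNT_BITS;
    }

    public static void main(String[] args) {
        int ct = 10000, osr = 140, prof = 33; // assumed flag values, for illustration only
        System.out.println("InterpreterInvocationLimit     = " + invocationLimit(ct));
        System.out.println("InterpreterProfileLimit        = " + profileLimit(ct, prof));
        System.out.println("InterpreterBackwardBranchLimit = "
                + backwardBranchLimit(ct, osr, prof, true) + " (ProfileInterpreter on), "
                + backwardBranchLimit(ct, osr, prof, false) + " (off)");
    }
}
```

The limits that are shifted left by three bits can be compared directly against raw _counter values, which is how the interpreter code in the following sections uses them.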

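Only InvocationCounter::def can modify the _action field. The call chains of def and reinitialize are as follows:

That is, InvocationCounter::def is called only from InvocationCounter::reinitialize. Both callers of reinitialize pass the global flag DelayCompilationDuringStartup, which defaults to true and delays compilation during startup so that the VM starts faster. One of the callers is invocationCounter_init, shown in the figure below:

As the source above shows, reinitialize does not register method compilation as an InvocationCounter Action. So when is method compilation actually triggered?

II. InterpreterGenerator::generate_counter_incr

InterpreterGenerator::generate_counter_incr increments the invocation counter in MethodData or MethodCounters and jumps to a given label when a threshold is exceeded. The jump targets are determined by InterpreterGenerator::generate_normal_entry, which calls it as shown in the figure below:

The three labels invocation_counter_overflow, profile_method and profile_method_continue are all bound later in generate_normal_entry.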


// increment invocation count & check for overflow
//
// rbx: method
void InterpreterGenerator::generate_counter_incr(
        Label* overflow,
        Label* profile_method,
        Label* profile_method_continue) {
  Label done;
  // Note: In tiered we increment either counters in Method* or in MDO depending if we're profiling or not.
  // Tiered compilation is enabled (the default in server mode)
  if (TieredCompilation) {
    // The count lives in the upper 29 bits of _counter, so one invocation adds 1 << 3 = 8, not 1
    int increment = InvocationCounter::count_increment;
    // Tier0InvokeNotifyFreqLog defaults to 7; count_shift is the number of non-count bits in _counter, i.e. 3
    int mask = ((1 << Tier0InvokeNotifyFreqLog)  - 1) << InvocationCounter::count_shift;
    Label no_mdo;
    // If interpreter profiling is enabled
    if (ProfileInterpreter) {
      // Are we profiling?
      // Load Method::_method_data; if it is NULL, jump to no_mdo
      __ movptr(rax, Address(rbx, Method::method_data_offset()));
      __ testptr(rax, rax);
      __ jccb(Assembler::zero, no_mdo);
      // Address of the _counter field of MethodData::_invocation_counter
      const Address mdo_invocation_counter(rax, in_bytes(MethodData::invocation_counter_offset()) +
                                                in_bytes(InvocationCounter::counter_offset()));
      // rcx is only a scratch register here; its previous value is irrelevant
      __ increment_mask_and_jump(mdo_invocation_counter, increment, mask, rcx, false, Assembler::zero, overflow);
      __ jmp(done);
    }
    __ bind(no_mdo);
    // Address of the _counter field of MethodCounters::_invocation_counter; rax is filled in by get_method_counters below
    const Address invocation_counter(rax,
                  MethodCounters::invocation_counter_offset() +
                  InvocationCounter::counter_offset());
    // Load the MethodCounters pointer into rax (jumps to done if it cannot be allocated)
    __ get_method_counters(rbx, rax, done);
    // Increment the counter and jump to overflow when the masked bits become zero
    __ increment_mask_and_jump(invocation_counter, increment, mask, rcx,
                               false, Assembler::zero, overflow);
    __ bind(done);
  } else {
    // Address of the _counter field of MethodCounters::_backedge_counter
    const Address backedge_counter(rax,
                  MethodCounters::backedge_counter_offset() +
                  InvocationCounter::counter_offset());
    // Address of the _counter field of MethodCounters::_invocation_counter
    const Address invocation_counter(rax,
                  MethodCounters::invocation_counter_offset() +
                  InvocationCounter::counter_offset());
    // Load the MethodCounters pointer into rax
    __ get_method_counters(rbx, rax, done);

    // If interpreter profiling is enabled
    if (ProfileInterpreter) {
      // incrementl adds 1 by default, so this bumps the interpreter invocation counter in MethodCounters by 1
      __ incrementl(Address(rax,
              MethodCounters::interpreter_invocation_counter_offset()));
    }
    // Add count_increment to the invocation counter and store it back
    __ movl(rcx, invocation_counter);
    __ incrementl(rcx, InvocationCounter::count_increment);
    __ movl(invocation_counter, rcx); // save invocation count

    __ movl(rax, backedge_counter);   // load backedge counter
    // Keep only the count bits of the backedge counter
    __ andl(rax, InvocationCounter::count_mask_value); // mask out the status bits
    // Add the backedge count in rax to the invocation count in rcx
    __ addl(rcx, rax);                // add both counters

    // profile_method is non-null only for interpreted method so
    // profile_method != NULL == !native_call

    if (ProfileInterpreter && profile_method != NULL) {
      // If rcx is below InterpreterProfileLimit, jump to profile_method_continue
      __ cmp32(rcx, ExternalAddress((address)&InvocationCounter::InterpreterProfileLimit));
      __ jcc(Assembler::less, *profile_method_continue);

      // Otherwise check whether the MethodData exists; if it does not, jump to profile_method
      __ test_method_data_pointer(rax, *profile_method);
    }
    // Compare rcx with InterpreterInvocationLimit; jump to overflow if it is greater or equal
    __ cmp32(rcx, ExternalAddress((address)&InvocationCounter::InterpreterInvocationLimit));
    __ jcc(Assembler::aboveEqual, *overflow);
    __ bind(done);
  }
}


void InterpreterMacroAssembler::increment_mask_and_jump(Address counter_addr,
                                                        int increment, int mask,
                                                        Register scratch, bool preloaded,
                                                        Condition cond, Label* where) {
  // Callers normally pass preloaded == false
  if (!preloaded) {
    // Load the current _counter value into scratch (rcx in the callers above)
    movl(scratch, counter_addr);
  }
  // Add increment to the value in scratch
  incrementl(scratch, increment);
  // Write the updated value back to _counter
  movl(counter_addr, scratch);
  // AND scratch with mask
  andl(scratch, mask);
  if (where != NULL) {
    // With cond == zero: jump to where (overflow) when the masked bits are all zero, i.e. the notification point is reached
    jcc(cond, *where);
  }
}

void InterpreterMacroAssembler::get_method_counters(Register method,
                                                    Register mcs, Label& skip) {
  Label has_counters;
  // Load Method::_method_counters
  movptr(mcs, Address(method, Method::method_counters_offset()));
  // If it is non-NULL, jump to has_counters
  testptr(mcs, mcs);
  jcc(Assembler::notZero, has_counters);
  // Otherwise call build_method_counters to create a new MethodCounters
  call_VM(noreg, CAST_FROM_FN_PTR(address,
          InterpreterRuntime::build_method_counters), method);
  // Reload the MethodCounters pointer into mcs; if it is still NULL, jump to skip
  movptr(mcs, Address(method,Method::method_counters_offset()));
  testptr(mcs, mcs);
  jcc(Assembler::zero, skip); // No MethodCounters allocated, OutOfMemory
  bind(has_counters);
}

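As the source above shows, MethodData mainly stores the profiling data gathered while the method runs in the interpreter, which is the basis for C2's optimizations; MethodCounters mainly stores the invocation counts.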
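
In the tiered path, increment_mask_and_jump does not compare against a limit; it jumps whenever the masked count bits wrap to zero. The Java sketch below models that behaviour. The class and method names are hypothetical, and the notify frequency log of 7 is the default quoted in the source comments above.

```java
// Hypothetical model of increment_mask_and_jump in the tiered path.
final class TieredNotifySketch {
    static final int COUNT_SHIFT = 3;
    static final int COUNT_INCREMENT = 1 << COUNT_SHIFT; // 8

    static int mask(int notifyFreqLog) {
        return ((1 << notifyFreqLog) - 1) << COUNT_SHIFT;
    }

    // Counts how often the jump to the overflow label would be taken over `calls` increments.
    static int overflowJumps(int startCounter, int calls, int notifyFreqLog) {
        int counter = startCounter;
        int m = mask(notifyFreqLog);
        int jumps = 0;
        for (int i = 0; i < calls; i++) {
            counter += COUNT_INCREMENT;          // incrementl(scratch, increment)
            if ((counter & m) == 0) jumps++;     // andl(scratch, mask); jcc(zero, overflow)
        }
        return jumps;
    }

    public static void main(String[] args) {
        int fresh = 1; // count 0, state wait_for_compile
        System.out.println("mask = " + mask(7));
        System.out.println("jumps in 127 calls  = " + overflowJumps(fresh, 127, 7));
        System.out.println("jumps in 128 calls  = " + overflowJumps(fresh, 128, 7));
        System.out.println("jumps in 1000 calls = " + overflowJumps(fresh, 1000, 7));
    }
}
```

With a log of 7 the overflow label is reached once every 128 invocations, so the decision about whether to compile is left to the runtime code behind that label rather than to the generated interpreter code.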

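III. InterpreterGenerator::generate_counter_overflow

InterpreterGenerator::generate_counter_overflow handles the case where the invocation count exceeds its threshold; it is the entry point that triggers method compilation. Its call site in InterpreterGenerator::generate_normal_entry is shown in the figure below:

When generate_counter_incr detects an overflow it jumps to the invocation_counter_overflow label, where the code emitted by generate_counter_overflow runs. The annotated source of generate_counter_overflow is: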

void InterpreterGenerator::generate_counter_overflow(Label* do_continue) {

  // Asm interpreter on entry
  // r14 - locals
  // r13 - bcp
  // rbx - method
  // edx - cpool --- DOES NOT APPEAR TO BE TRUE
  // rbp - interpreter frame

  // On return (i.e. jump to entry_point) [ back to invocation of interpreter ]
  // Everything as it was on entry
  // rdx is not restored. Doesn't appear to really be set.

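  // InterpreterRuntime::frequency_counter_overflow takes two arguments. The first, thread, is supplied by call_VM.
  // The second says whether the overflow happened on a loop branch: the bcp of the branch if so, NULL otherwise.
  // Here we pass 0, i.e. NULL. The function returns the compiled nmethod if one is available and NULL otherwise
  // (always NULL for a non-OSR request such as this one).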
  __ movl(c_rarg1, 0);
  __ call_VM(noreg,
             CAST_FROM_FN_PTR(address,
                              InterpreterRuntime::frequency_counter_overflow),
             c_rarg1);
  // Restore Method* into rbx; method_offset is a global constant
  __ movptr(rbx, Address(rbp, method_offset));   // restore Method*
  // Jump to the do_continue label
  __ jmp(*do_continue, relocInfo::none);
}


nmethod* InterpreterRuntime::frequency_counter_overflow(JavaThread* thread, address branch_bcp) {
  // For a non-OSR request this always returns NULL: compilation is not done immediately, a task is submitted to the background compiler thread instead
  nmethod* nm = frequency_counter_overflow_inner(thread, branch_bcp);
  assert(branch_bcp != NULL || nm == NULL, "always returns null for non OSR requests");
  if (branch_bcp != NULL && nm != NULL) {
    // This was a successful OSR request. The nm returned by frequency_counter_overflow_inner may have been unloaded in the meantime, so look it up again
    frame fr = thread->last_frame();
    Method* method =  fr.interpreter_frame_method();
    int bci = method->bci_from(fr.interpreter_frame_bcp());
    nm = method->lookup_osr_nmethod_for(bci, CompLevel_none, false);
  }
  return nm;
}

// branch_bcp is the address of the branch bytecode at which the counter overflowed; NULL for a method-entry overflow
IRT_ENTRY(nmethod*,
          InterpreterRuntime::frequency_counter_overflow_inner(JavaThread* thread, address branch_bcp))
  // use UnlockFlagSaver to clear and restore the _do_not_unlock_if_synchronized
  // flag, in case this method triggers classloading which will call into Java.
  UnlockFlagSaver fs(thread);

  frame fr = thread->last_frame();
  // The current frame must be an interpreted frame
  assert(fr.is_interpreted_frame(), "must come from interpreter");
  // Get the method currently being interpreted
  methodHandle method(thread, fr.interpreter_frame_method());
  // If branch_bcp is non-NULL, compute its offset from code_base, the start of the method's bytecode; otherwise use InvocationEntryBci, which marks a non-OSR compilation
  const int branch_bci = branch_bcp != NULL ? method->bci_from(branch_bcp) : InvocationEntryBci;
  const int bci = branch_bcp != NULL ? method->bci_from(fr.interpreter_frame_bcp()) : InvocationEntryBci;
  // No exception may be pending
  assert(!HAS_PENDING_EXCEPTION, "Should not have any exceptions pending");
  // For an OSR request this returns the matching nmethod if there is one; otherwise it returns NULL after submitting a compilation task to the background compiler thread
  nmethod* osr_nm = CompilationPolicy::policy()->event(method, method, branch_bci, bci, CompLevel_none, NULL, thread);
  assert(!HAS_PENDING_EXCEPTION, "Event handler should not throw any exceptions");

  if (osr_nm != NULL) {
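    // With biased locking, revoke the bias of every object locked by the current frame, because these locks have to be migrated during on-stack replacement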
    if (UseBiasedLocking) {
      ResourceMark rm;
      GrowableArray<Handle>* objects_to_revoke = new GrowableArray<Handle>();
      for( BasicObjectLock *kptr = fr.interpreter_frame_monitor_end();
           kptr < fr.interpreter_frame_monitor_begin();
           kptr = fr.next_monitor_in_interpreter_frame(kptr) ) {
        if( kptr->obj() != NULL ) {
          objects_to_revoke->append(Handle(THREAD, kptr->obj()));
        }
      }
      BiasedLocking::revoke(objects_to_revoke);
    }
  }
  return osr_nm;
IRT_END

int Method::bci_from(address bcp) const {
  return bcp - code_base();
}

nmethod* lookup_osr_nmethod_for(int bci, int level, bool match_level) {
    // method_holder() returns the Klass this method belongs to
    return method_holder()->lookup_osr_nmethod(this, bci, level, match_level);
  }


nmethod* InstanceKlass::lookup_osr_nmethod(const Method* m, int bci, int comp_level, bool match_level) const {
  // This is a short non-blocking critical region, so the no safepoint check is ok.
  // Acquire the lock that guards the OSR list
  OsrList_lock->lock_without_safepoint_check();
  // _osr_nmethods_head is the head of the linked list of OSR nmethods
  nmethod* osr = osr_nmethods_head();
  nmethod* best = NULL;
  while (osr != NULL) {
    // Every entry in the chain must be an OSR nmethod
    assert(osr->is_osr_method(), "wrong kind of nmethod found in chain");

    if (osr->method() == m &&
        (bci == InvocationEntryBci || osr->osr_entry_bci() == bci)) {
      // An exact comp_level match is required
      if (match_level) {
        // Check whether this osr's comp_level equals the requested one
        if (osr->comp_level() == comp_level) {
          // Found a match - return it.
          OsrList_lock->unlock();
          return osr;
        }
      } else {
        // Otherwise track the osr with the highest comp_level, returning early if it is the highest tier
        if (best == NULL || (osr->comp_level() > best->comp_level())) {
          if (osr->comp_level() == CompLevel_highest_tier) {
            // Found the best possible - return it.
            OsrList_lock->unlock();
            return osr;
          }
          best = osr;
        }
      }
    }
    // Move on to the next entry in the chain
    osr = osr->osr_link();
  }
  OsrList_lock->unlock();
  // No highest-tier osr was found: accept the best candidate if its level is at least the requested level
  if (best != NULL && best->comp_level() >= comp_level && match_level == false) {
    return best;
  }
  return NULL;
}

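As the source above shows, generate_counter_overflow handles compilation triggered by the invocation count exceeding its threshold. This compilation is not performed immediately and needs no on-stack replacement: a task is submitted to a background compiler thread, and when the compiler thread finishes, the compiled code is installed automatically.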
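
The selection logic of InstanceKlass::lookup_osr_nmethod can be sketched in Java without locking or real nmethods. Everything here is a simplified stand-in: OsrEntry is a hypothetical record of (method name, entry bci, compilation level), and the constants -1 for InvocationEntryBci and 4 for the highest tier are assumptions about the values in globalDefinitions.hpp.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, lock-free model of the OSR nmethod lookup.
final class OsrLookupSketch {
    static final int INVOCATION_ENTRY_BCI = -1;   // assumed value
    static final int COMP_LEVEL_HIGHEST_TIER = 4; // assumed value

    static final class OsrEntry {
        final String method; final int entryBci; final int compLevel;
        OsrEntry(String method, int entryBci, int compLevel) {
            this.method = method; this.entryBci = entryBci; this.compLevel = compLevel;
        }
    }

    static OsrEntry lookup(List<OsrEntry> chain, String method, int bci,
                           int compLevel, boolean matchLevel) {
        OsrEntry best = null;
        for (OsrEntry osr : chain) {
            boolean sameEntry = bci == INVOCATION_ENTRY_BCI || osr.entryBci == bci;
            if (!osr.method.equals(method) || !sameEntry) continue;
            if (matchLevel) {
                if (osr.compLevel == compLevel) return osr;         // exact level wanted
            } else if (best == null || osr.compLevel > best.compLevel) {
                if (osr.compLevel == COMP_LEVEL_HIGHEST_TIER) return osr; // cannot do better
                best = osr;
            }
        }
        // Fall back to the best candidate if it is at least as optimized as requested.
        if (best != null && best.compLevel >= compLevel && !matchLevel) return best;
        return null;
    }

    public static void main(String[] args) {
        List<OsrEntry> chain = new ArrayList<>();
        chain.add(new OsrEntry("forTest", 4, 3));
        chain.add(new OsrEntry("forTest", 4, 4));
        OsrEntry hit = lookup(chain, "forTest", 4, 0, false);
        System.out.println("best level for bci 4: " + hit.compLevel);
        System.out.println("entry for bci 10: " + lookup(chain, "forTest", 10, 0, false));
    }
}
```

This mirrors the call made in frequency_counter_overflow, which passes CompLevel_none with match_level false and therefore accepts the most optimized OSR version available for the given bci.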

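The source above involves two key enums, MethodCompilation and CompLevel, both defined in globalDefinitions.hpp, as shown in the figure below:

MethodCompilation describes the kind of method compilation, OSR or non-OSR; InvocationEntryBci denotes a non-OSR compilation.

CompLevel is the tiered compilation level. The default, CompLevel_none, means interpretation. C1, C2 and Shark, mentioned in the comments, are the compilers included in HotSpot: C1 performs only simple optimizations but compiles quickly; C2 and Shark perform more complex optimizations but take longer and need profiling enabled to collect data about how the method runs.

IV. The for loop

The analysis so far only explains how compilation is triggered when the invocation count exceeds its threshold. How is the case handled where a loop inside a running method iterates more often than the threshold?

1. Bytecode implementation

Take the forTest method above as an example. Its bytecode can be viewed with javap -v; each instruction is annotated below: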

public static void forTest();
    descriptor: ()V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=2, args_size=0
         0: iconst_0   push the constant 0, i.e. the initial value of a
         1: istore_0   store the int on top of the stack into local variable slot 0
         2: iconst_0   push the constant 0, i.e. the initial value of i
         3: istore_1   store the int on top of the stack into local variable slot 1
         4: iload_1    push the int in local variable slot 1, i.e. i
         5: sipush        10000  push the short constant 10000, which is sign-extended to int
         8: if_icmpge     20  compare the top two stack values; if i >= 10000, jump to offset 20, i.e. continue at getstatic
        11: iinc          0, 1  increment local variable slot 0 by 1, i.e. a++
        14: iinc          1, 1  increment local variable slot 1 by 1, i.e. i++
        17: goto          4   jump to offset 4, i.e. iload_1
        20: getstatic     #3   load the out field of System    // Field java/lang/System.out:Ljava/io/PrintStream;
        23: iload_0   push the int in local variable slot 0, i.e. a
        24: invokevirtual #5    call println              // Method java/io/PrintStream.println:(I)V
        27: return   return
      // line number table
      LineNumberTable:
        line 21: 0
        line 22: 2
        line 23: 11
        line 22: 14
        line 25: 20
        line 26: 27
      // local variable table
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            4      16     1     i   I
            2      26     0     a   I

As the bytecode shows, the instruction that implements the loop's backward jump is goto. The next section looks at how goto is implemented.
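
To see why the backedge count of forTest ends up at exactly 10000, the loop portion of the bytecode can be replayed in a toy interpreter. This is a hypothetical sketch, not how HotSpot's template interpreter works: it simply walks the bci values of the listing above and counts branches whose target lies before the branch instruction.

```java
// Hypothetical toy replay of the forTest loop (bci 4 to 20) that counts backward branches.
final class BackedgeCountSketch {
    // Returns {final value of a, number of backward branches taken}.
    static int[] run(int limit) {
        int a = 0, i = 0, backedges = 0;
        int bci = 4;
        while (true) {
            switch (bci) {
                case 4:  // iload_1; sipush limit; if_icmpge 20 (a forward branch, not counted)
                    bci = (i >= limit) ? 20 : 11;
                    break;
                case 11: // iinc 0, 1
                    a++; bci = 14;
                    break;
                case 14: // iinc 1, 1
                    i++; bci = 17;
                    break;
                case 17: // goto 4: the target is before the branch, so this is a backedge
                    if (4 - 17 < 0) backedges++;
                    bci = 4;
                    break;
                case 20: // loop exit
                    return new int[] { a, backedges };
                default:
                    throw new IllegalStateException("unexpected bci " + bci);
            }
        }
    }

    public static void main(String[] args) {
        int[] result = run(10000);
        System.out.println("a = " + result[0] + ", backward branches = " + result[1]);
    }
}
```

Only the goto at bci 17 has a negative offset, so it is the only instruction in the method that bumps the backedge counter.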

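2. The goto instruction

In templateTable.cpp the instruction is implemented as follows:

The branch method is used not only for goto; it is the basis of almost all branch instructions. Its call chain is as follows:

The implementation of branch is CPU-specific; we focus on the one in templateTable_x86_64.cpp. The annotated source is: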

void TemplateTable::branch(bool is_jsr, bool is_wide) {
  // Load the Method* saved in the current frame into rcx
  __ get_method(rcx); // rcx holds method
  // If profiling is enabled, update the taken-branch profile
  __ profile_taken_branch(rax, rbx); // rax holds updated MDP, rbx
                                     // holds bumped taken count

  const ByteSize be_offset = MethodCounters::backedge_counter_offset() +
                             InvocationCounter::counter_offset();
  const ByteSize inv_offset = MethodCounters::invocation_counter_offset() +
                              InvocationCounter::counter_offset();

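  // Wide form of the instruction: the operand is 4 bytes
  if (is_wide) {
    __ movl(rdx, at_bcp(1));
  } else {
    // Load the 2-byte operand that starts 1 byte after the current bcp, sign-extended, into rdx
    __ load_signed_short(rdx, at_bcp(1));
  }
  // Reverse the byte order of rdx (bytecode operands are big-endian)
  __ bswapl(rdx);

  if (!is_wide) {
    // Arithmetic shift right by 16; together with bswapl this yields the signed branch offset
    __ sarl(rdx, 16);
  }
  // Sign-extend the 32-bit offset in rdx to pointer width (64 bits)
  __ movl2ptr(rdx, rdx);

  // jsr instruction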
  if (is_jsr) {
    // Pre-load the next target bytecode into rbx
    __ load_unsigned_byte(rbx, Address(r13, rdx, Address::times_1, 0));

    // compute return address as bci in rax
    __ lea(rax, at_bcp((is_wide ? 5 : 3) -
                        in_bytes(ConstMethod::codes_offset())));
    __ subptr(rax, Address(rcx, Method::const_offset()));
    // Adjust the bcp in r13 by the displacement in rdx
    __ addptr(r13, rdx);
    // jsr returns atos that is not an oop
    __ push_i(rax);
    __ dispatch_only(vtos);
    return;
  }

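  // Normal (non-jsr) branch handling

  // Add the offset in rdx to the current bcp to get the target bcp
  __ addptr(r13, rdx);

  // On-stack replacement requires loop counters: if UseOnStackReplacement is on, UseLoopCounter must be on too (both default to true)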
  assert(UseLoopCounter || !UseOnStackReplacement,
         "on-stack-replacement requires loop counters");
  Label backedge_counter_overflow;
  Label profile_method;
  Label dispatch;
  if (UseLoopCounter) {
    // increment backedge counter for backward branches
    // rax: MDO
    // ebx: MDO bumped taken-count
    // rcx: method
    // rdx: target offset
    // r13: target bcp
    // r14: locals pointer
    // Test the sign of rdx: a non-negative offset is a forward branch (an ordinary if-style branch), so jump straight to dispatch; a negative offset is a backward branch
    __ testl(rdx, rdx);             // check if forward or backward branch
    __ jcc(Assembler::positive, dispatch); // count only if backward branch

    // Backward branch, i.e. normally a loop
    // check if MethodCounters exists
    Label has_counters;
    // Load Method::_method_counters into rax and test it for NULL
    __ movptr(rax, Address(rcx, Method::method_counters_offset()));
    __ testptr(rax, rax);
    // Non-NULL: jump to has_counters
    __ jcc(Assembler::notZero, has_counters);
    // NULL: create a new MethodCounters via InterpreterRuntime::build_method_counters
    __ push(rdx);
    __ push(rcx);
    __ call_VM(noreg, CAST_FROM_FN_PTR(address, InterpreterRuntime::build_method_counters),
               rcx);
    __ pop(rcx);
    __ pop(rdx);
    __ movptr(rax, Address(rcx, Method::method_counters_offset()));
    // If the allocation failed, jump to dispatch
    __ jcc(Assembler::zero, dispatch);
    __ bind(has_counters);

    // Tiered compilation enabled (true in server mode)
    if (TieredCompilation) {
      Label no_mdo;
      int increment = InvocationCounter::count_increment;
      int mask = ((1 << Tier0BackedgeNotifyFreqLog) - 1) << InvocationCounter::count_shift;
      // If interpreter profiling is enabled (the default in server mode)
      if (ProfileInterpreter) {
        // Load Method::_method_data into rbx; if it is NULL, jump to no_mdo
        __ movptr(rbx, Address(rcx, in_bytes(Method::method_data_offset())));
        __ testptr(rbx, rbx);
        __ jccb(Assembler::zero, no_mdo);
        // MethodData exists: bump its backedge counter and jump to backedge_counter_overflow when the notification point is reached
        const Address mdo_backedge_counter(rbx, in_bytes(MethodData::backedge_counter_offset()) +
                                           in_bytes(InvocationCounter::counter_offset()));
        __ increment_mask_and_jump(mdo_backedge_counter, increment, mask, rax, false, Assembler::zero,
                                   UseOnStackReplacement ? &backedge_counter_overflow : NULL);
        __ jmp(dispatch);
      }
      __ bind(no_mdo);
      // Increment backedge counter in MethodCounters*
      __ movptr(rcx, Address(rcx, Method::method_counters_offset()));
      // Bump the backedge counter in MethodCounters and jump to backedge_counter_overflow when the notification point is reached
      __ increment_mask_and_jump(Address(rcx, be_offset), increment, mask,
                                 rax, false, Assembler::zero,
                                 UseOnStackReplacement ? &backedge_counter_overflow : NULL);
    } else {
      // Tiered compilation disabled (e.g. client mode, where only C1 is used)
      // Increment the backedge counter in MethodCounters
      __ movptr(rcx, Address(rcx, Method::method_counters_offset()));
      __ movl(rax, Address(rcx, be_offset));        // load backedge counter
      __ incrementl(rax, InvocationCounter::count_increment); // increment counter
      __ movl(Address(rcx, be_offset), rax);        // store counter
      
      // Load the invocation counter, keep only its count bits and add the backedge counter to it
      __ movl(rax, Address(rcx, inv_offset));    // load invocation counter
      __ andl(rax, InvocationCounter::count_mask_value); // and the status bits
      __ addl(rax, Address(rcx, be_offset));        // add both counters

      // If interpreter profiling is enabled
      if (ProfileInterpreter) {
        // If rax is below InterpreterProfileLimit, jump to dispatch
        __ cmp32(rax,
                 ExternalAddress((address) &InvocationCounter::InterpreterProfileLimit));
        __ jcc(Assembler::less, dispatch);

        // Load the MethodData pointer from the frame; if it is NULL, jump to profile_method
        __ test_method_data_pointer(rax, profile_method);

        // true by default under both C1 and C2
        if (UseOnStackReplacement) {
           // rbx was set by profile_taken_branch to the bumped taken count from the MDO;
           // if it is below InterpreterBackwardBranchLimit, jump to dispatch
          __ cmp32(rbx,
                   ExternalAddress((address) &InvocationCounter::InterpreterBackwardBranchLimit));
          __ jcc(Assembler::below, dispatch);

          // When ProfileInterpreter is on, the backedge_count comes
          // from the MethodData*, which value does not get reset on
          // the call to frequency_counter_overflow().  To avoid
          // excessive calls to the overflow routine while the method is
          // being compiled, add a second test to make sure the overflow
          // function is called only once every overflow_frequency.
          // At or above InterpreterBackwardBranchLimit: jump to backedge_counter_overflow only when the low 10 bits of rbx are zero, i.e. once every 1024 taken branches
          const int overflow_frequency = 1024;
          __ andl(rbx, overflow_frequency - 1);
          __ jcc(Assembler::zero, backedge_counter_overflow);

        }
      } else {
        if (UseOnStackReplacement) {
         // rax holds the sum of the two counters in MethodCounters; if it is at or above InterpreterBackwardBranchLimit, jump to backedge_counter_overflow
          __ cmp32(rax,
                   ExternalAddress((address) &InvocationCounter::InterpreterBackwardBranchLimit));
          __ jcc(Assembler::aboveEqual, backedge_counter_overflow);

        }
      }
    }
    __ bind(dispatch);
  }

  // r13 already holds the target bcp; load the bytecode at the target into rbx
  __ load_unsigned_byte(rbx, Address(r13, 0));

  // continue with the bytecode @ target
  // eax: return bci for jsr's, unused otherwise
  // ebx: target bytecode
  // r13: target bcp
  // Dispatch to the bytecode at the target; the code below is reached only through jumps to its labels
  __ dispatch_only(vtos);

 
  if (UseLoopCounter) {
    if (ProfileInterpreter) {
      // Run profile_method, then jump to dispatch
      __ bind(profile_method);
      __ call_VM(noreg, CAST_FROM_FN_PTR(address, InterpreterRuntime::profile_method));
      __ load_unsigned_byte(rbx, Address(r13, 0));  // restore target bytecode
      __ set_method_data_pointer_for_bcp();
      __ jmp(dispatch);
    }

    if (UseOnStackReplacement) {
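      // Reached when the backedge counter overflows
      __ bind(backedge_counter_overflow);
      // Negate the offset in rdx
      __ negptr(rdx);
      // Add the target bcp in r13: rdx = target bcp - offset, i.e. the bcp of the branch instruction itself
      __ addptr(rdx, r13); // branch bcp
      // Call frequency_counter_overflow([JavaThread*], address branch_bcp); the JavaThread argument is supplied by call_VM. This is the same function as in the previous section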
      __ call_VM(noreg,
                 CAST_FROM_FN_PTR(address,
                                  InterpreterRuntime::frequency_counter_overflow),
                 rdx);
      // Reload the bytecode to execute
      __ load_unsigned_byte(rbx, Address(r13, 0));  // restore target bytecode

      // rax: osr nmethod (osr ok) or NULL (osr not possible)
      // ebx: target bytecode
      // rdx: scratch
      // r14: locals pointer
      // r13: bcp
      // If frequency_counter_overflow returned NULL, no OSR nmethod is available: jump to dispatch and keep interpreting
      __ testptr(rax, rax);                        // test result
      __ jcc(Assembler::zero, dispatch);         // no osr if null
      // nmethod may have been invalidated (VM may block upon call_VM return)
      // Non-NULL: compilation has finished; load the nmethod's _entry_bci field into rcx
      __ movl(rcx, Address(rax, nmethod::entry_bci_offset()));
      // If it equals InvalidOSREntryBci, jump to dispatch
      __ cmpl(rcx, InvalidOSREntryBci);
      __ jcc(Assembler::equal, dispatch);

      // Start the on-stack replacement
      // We have the address of an on stack replacement routine in eax
      // We need to prepare to execute the OSR method. First we must
      // migrate the locals and monitors off of the stack.
      // Save the nmethod pointer from rax in r13
      __ mov(r13, rax);                             // save the nmethod
      // Call OSR_migration_begin to migrate the frame's locals and monitors
      call_VM(noreg, CAST_FROM_FN_PTR(address, SharedRuntime::OSR_migration_begin));

      // eax is OSR buffer, move it to expected parameter location
      // Copy rax into j_rarg0
      __ mov(j_rarg0, rax);

      // We use j_rarg definitions here so that registers don't conflict as parameter
      // registers change across platforms as we are in the midst of a calling
      // sequence to the OSR nmethod and we don't want collision. These are NOT parameters.

      const Register retaddr = j_rarg2;
      const Register sender_sp = j_rarg1;

      // Pop the interpreter frame off the call stack
      __ movptr(sender_sp, Address(rbp, frame::interpreter_frame_sender_sp_offset * wordSize)); // get sender sp
      __ leave();                                // remove frame anchor
      __ pop(retaddr);                           // get return address
      __ mov(rsp, sender_sp);                   // set sp to sender sp
      // Ensure compiled code always sees stack at proper alignment
      __ andptr(rsp, -(StackAlignmentInBytes));

      // push the return address
      __ push(retaddr);

      // Jump to the OSR nmethod's entry point and start executing it
      __ jmp(Address(r13, nmethod::osr_entry_point_offset()));
    }
  }
}

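As the source above shows, when the number of backward branches in a method, i.e. the number of loop iterations, exceeds the threshold, JIT compilation of the method is triggered, and execution continues in the compiled native code instead of interpreting the bytecode. On-stack replacement means switching the execution entry point: the locals, monitors and other state of the interpreter frame are migrated to the frame used by the compiled native code.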
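
The operand decoding at the start of branch (load_signed_short, bswapl, sarl 16) and the later recovery of the branch bcp (negptr, addptr) can be replayed in Java on the goto from forTest. This is an illustrative sketch with hypothetical names; the byte values a7 ff f3 are the standard encoding of goto with offset -13, placed at bci 17 as in the javap listing.

```java
// Hypothetical replay of the branch-offset arithmetic on the bytes of "17: goto 4".
final class BranchOffsetSketch {
    // Direct decoding: the operand is a signed 16-bit big-endian value after the opcode.
    static int offsetBigEndian(byte[] code, int bci) {
        return (short) (((code[bci + 1] & 0xFF) << 8) | (code[bci + 2] & 0xFF));
    }

    // The same result following the x86 instruction sequence used by branch().
    static int offsetX86Style(byte[] code, int bci) {
        // load_signed_short: little-endian 16-bit load, sign-extended to 32 bits
        int loaded = (short) ((code[bci + 1] & 0xFF) | ((code[bci + 2] & 0xFF) << 8));
        int swapped = Integer.reverseBytes(loaded); // bswapl
        return swapped >> 16;                       // sarl rdx, 16
    }

    public static void main(String[] args) {
        byte[] code = new byte[20];
        code[17] = (byte) 0xa7; // goto
        code[18] = (byte) 0xff; // high byte of the offset
        code[19] = (byte) 0xf3; // low byte of the offset
        int branchBci = 17;
        int offset = offsetX86Style(code, branchBci);
        int targetBci = branchBci + offset;    // addptr(r13, rdx)
        int recovered = targetBci - offset;    // negptr(rdx); addptr(rdx, r13)
        System.out.println("offset=" + offset + " target=" + targetBci
                + " recovered branch bci=" + recovered);
    }
}
```

A negative offset marks the branch as backward, and subtracting the offset from the target gives back the position of the branch instruction, which is the branch_bcp argument passed to frequency_counter_overflow.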