Android expert Jake Wharton asks: divide by 2 or shift right by 1, which is better?

I have been porting the AndroidX collection library to Kotlin multiplatform to experiment with binary compatibility, performance, ease of use, and different memory models. Some of its data structures store their elements in array-backed binary trees, and in many places the Java code uses shift operations instead of multiplying or dividing by a power of two. Ported to Kotlin, these become slightly awkward infix operators that somewhat obscure the intent of the code.

I did some research into whether shifts really perform better than multiplication and division. Most people have heard that "shifts are faster", but they also doubt whether it is true. Some believe the compiler may already perform this optimization before the code reaches the CPU.

To satisfy my curiosity, and to avoid Kotlin's infix shift operators, I will answer which one is better along with some related questions. Let's go!

Who optimized the code?

Before our code is executed by the CPU, it passes through several important compilers: javac/kotlinc, D8, R8, and ART.

Each step is an opportunity for optimization, but does any of them take it?

javac

class Example {
  static int multiply(int value) {
    return value * 2;
  }
  static int divide(int value) {
    return value / 2;
  }
  static int shiftLeft(int value) {
    return value << 1;
  }
  static int shiftRight(int value) {
    return value >> 1;
  }
}

Compile the code above with JDK 14 and display the bytecode with javap.

$ javac Example.java
$ javap -c Example
Compiled from "Example.java"
class Example {
  static int multiply(int);
    Code:
       0: iload_0
       1: iconst_2
       2: imul
       3: ireturn

  static int divide(int);
    Code:
       0: iload_0
       1: iconst_2
       2: idiv
       3: ireturn

  static int shiftLeft(int);
    Code:
       0: iload_0
       1: iconst_1
       2: ishl
       3: ireturn

  static int shiftRight(int);
    Code:
       0: iload_0
       1: iconst_1
       2: ishr
       3: ireturn
}

Each method starts with iload_0, which loads the first parameter. Multiplication and division then use iconst_2 to load the literal 2, followed by imul and idiv respectively to perform int multiplication and division. The shifts use iconst_1 to load the literal 1, followed by ishl and ishr to perform the shift.

There is no optimization here, which will not surprise you if you know a bit about Java. javac is not an optimizing compiler; it leaves most of that work to the JVM's runtime compilers or to ahead-of-time compilers.

kotlinc

fun multiply(value: Int) = value * 2
fun divide(value: Int) = value / 2
fun shiftLeft(value: Int) = value shl 1
fun shiftRight(value: Int) = value shr 1

Compile the Kotlin to Java bytecode with kotlinc from Kotlin 1.4-M1, then view it with javap.

$ kotlinc Example.kt
$ javap -c ExampleKt
Compiled from "Example.kt"
public final class ExampleKt {
  public static final int multiply(int);
    Code:
       0: iload_0
       1: iconst_2
       2: imul
       3: ireturn

  public static final int divide(int);
    Code:
       0: iload_0
       1: iconst_2
       2: idiv
       3: ireturn

  public static final int shiftLeft(int);
    Code:
       0: iload_0
       1: iconst_1
       2: ishl
       3: ireturn

  public static final int shiftRight(int);
    Code:
       0: iload_0
       1: iconst_1
       2: ishr
       3: ireturn
}

The output result is exactly the same as Java.

This is using the original JVM backend of Kotlin, but using the forthcoming IR-based backend (via -Xuse-ir) also produces the same output.

D8

Use the latest D8 build to generate a DEX file from the Java bytecode produced from the Kotlin example above.

$ java -jar $R8_HOME/build/libs/d8.jar \
      --release \
      --output . \
      ExampleKt.class
$ dexdump -d classes.dex
Opened 'classes.dex', DEX version '035'
Class #0            -
  Class descriptor  : 'LExampleKt;'
  Access flags      : 0x0011 (PUBLIC FINAL)
  Superclass        : 'Ljava/lang/Object;'
  Direct methods    -
    #0              : (in LExampleKt;)
      name          : 'divide'
      type          : '(I)I'
      access        : 0x0019 (PUBLIC STATIC FINAL)
      code          -
000118:                              |[000118] ExampleKt.divide:(I)I
000128: db00 0102                    |0000: div-int/lit8 v0, v1, #int 2 // #02
00012c: 0f00                         |0002: return v0
    #1              : (in LExampleKt;)
      name          : 'multiply'
      type          : '(I)I'
      access        : 0x0019 (PUBLIC STATIC FINAL)
      code          -
000130:                              |[000130] ExampleKt.multiply:(I)I
000140: da00 0102                    |0000: mul-int/lit8 v0, v1, #int 2 // #02
000144: 0f00                         |0002: return v0
    #2              : (in LExampleKt;)
      name          : 'shiftLeft'
      type          : '(I)I'
      access        : 0x0019 (PUBLIC STATIC FINAL)
      code          -
000148:                              |[000148] ExampleKt.shiftLeft:(I)I
000158: e000 0101                    |0000: shl-int/lit8 v0, v1, #int 1 // #01
00015c: 0f00                         |0002: return v0
    #3              : (in LExampleKt;)
      name          : 'shiftRight'
      type          : '(I)I'
      access        : 0x0019 (PUBLIC STATIC FINAL)
      code          -
000160:                              |[000160] ExampleKt.shiftRight:(I)I
000170: e100 0101                    |0000: shr-int/lit8 v0, v1, #int 1 // #01
000174: 0f00                         |0002: return v0

(Output trimmed slightly.)

Dalvik bytecode is register-based, whereas Java bytecode is stack-based. As a result, each method needs only a single instruction to perform its integer operation. Each one reads the first method parameter from register v1 and takes a literal 1 or 2.

So nothing has changed. D8 is not an optimizing compiler either (although it can do method-local optimizations).

R8

To run R8 we need to configure keep rules so that our code is not removed.

-keep,allowoptimization class ExampleKt {
  <methods>;
}

The rules above are passed with the --pg-conf flag.

$ java -jar $R8_HOME/build/libs/r8.jar \
      --lib $ANDROID_HOME/platforms/android-29/android.jar \
      --release \
      --pg-conf rules.txt \
      --output . \
      ExampleKt.class
$ dexdump -d classes.dex
Opened 'classes.dex', DEX version '035'
Class #0            -
  Class descriptor  : 'LExampleKt;'
  Access flags      : 0x0011 (PUBLIC FINAL)
  Superclass        : 'Ljava/lang/Object;'
  Direct methods    -
    #0              : (in LExampleKt;)
      name          : 'divide'
      type          : '(I)I'
      access        : 0x0019 (PUBLIC STATIC FINAL)
      code          -
000118:                              |[000118] ExampleKt.divide:(I)I
000128: db00 0102                    |0000: div-int/lit8 v0, v1, #int 2 // #02
00012c: 0f00                         |0002: return v0

    #1              : (in LExampleKt;)
      name          : 'multiply'
      type          : '(I)I'
      access        : 0x0019 (PUBLIC STATIC FINAL)
      code          -
000130:                              |[000130] ExampleKt.multiply:(I)I
000140: da00 0102                    |0000: mul-int/lit8 v0, v1, #int 2 // #02
000144: 0f00                         |0002: return v0

    #2              : (in LExampleKt;)
      name          : 'shiftLeft'
      type          : '(I)I'
      access        : 0x0019 (PUBLIC STATIC FINAL)
      code          -
000148:                              |[000148] ExampleKt.shiftLeft:(I)I
000158: e000 0101                    |0000: shl-int/lit8 v0, v1, #int 1 // #01
00015c: 0f00                         |0002: return v0

    #3              : (in LExampleKt;)
      name          : 'shiftRight'
      type          : '(I)I'
      access        : 0x0019 (PUBLIC STATIC FINAL)
      code          -
000160:                              |[000160] ExampleKt.shiftRight:(I)I
000170: e100 0101                    |0000: shr-int/lit8 v0, v1, #int 1 // #01
000174: 0f00                         |0002: return v0

The output is exactly the same as D8.

ART

Feed the Dalvik bytecode that R8 produced above to ART, running on an x86 emulator with Android 10.

$ adb push classes.dex /sdcard/classes.dex
$ adb shell
generic_x86:/ $ su
generic_x86:/ # dex2oat --dex-file=/sdcard/classes.dex --oat-file=/sdcard/classes.oat
generic_x86:/ # oatdump --oat-file=/sdcard/classes.oat
OatDexFile:
0: LExampleKt; (offset=0x000003c0) (type_idx=1) (Initialized) (OatClassAllCompiled)
  0: int ExampleKt.divide(int) (dex_method_idx=0)
    CODE: (code_offset=0x00001010 size_offset=0x0000100c size=15)...
      0x00001010:     89C8      mov eax, ecx
      0x00001012:   8D5001      lea edx, [eax + 1]
      0x00001015:     85C0      test eax, eax
      0x00001017:   0F4DD0      cmovnl/ge edx, eax
      0x0000101a:     D1FA      sar edx
      0x0000101c:     89D0      mov eax, edx
      0x0000101e:       C3      ret
  1: int ExampleKt.multiply(int) (dex_method_idx=1)
    CODE: (code_offset=0x00001030 size_offset=0x0000102c size=5)...
      0x00001030:     D1E1      shl ecx
      0x00001032:     89C8      mov eax, ecx
      0x00001034:       C3      ret
  2: int ExampleKt.shiftLeft(int) (dex_method_idx=2)
    CODE: (code_offset=0x00001030 size_offset=0x0000102c size=5)...
      0x00001030:     D1E1      shl ecx
      0x00001032:     89C8      mov eax, ecx
      0x00001034:       C3      ret
  3: int ExampleKt.shiftRight(int) (dex_method_idx=3)
    CODE: (code_offset=0x00001040 size_offset=0x0000103c size=5)...
      0x00001040:     D1F9      sar ecx
      0x00001042:     89C8      mov eax, ecx
      0x00001044:       C3      ret

(Output trimmed slightly.)

The x86 assembly shows that ART has stepped in and replaced some of the arithmetic operations with shifts.

First, multiply and shiftLeft now have the same implementation: both use shl to shift left. Moreover, if you look at the file offsets (the leftmost column), you will see that they are identical. ART recognized that the two methods have the same body and de-duplicated them when compiling to x86 assembly.

Second, divide and shiftRight do not have the same implementation, although both use sar to shift right. The four extra instructions before sar in divide handle negative input by adding 1 to the value.

Running the same commands on a Pixel 4 with Android 10 shows how ART compiles the code to ARM assembly.

OatDexFile:
0: LExampleKt; (offset=0x000005a4) (type_idx=1) (Verified) (OatClassAllCompiled)
  0: int ExampleKt.divide(int) (dex_method_idx=0)
    CODE: (code_offset=0x00001009 size_offset=0x00001004 size=10)...
      0x00001008: 0fc8      lsrs r0, r1, #31
      0x0000100a: 1841      adds r1, r0, r1
      0x0000100c: 1049      asrs r1, #1
      0x0000100e: 4608      mov r0, r1
      0x00001010: 4770      bx lr
  1: int ExampleKt.multiply(int) (dex_method_idx=1)
    CODE: (code_offset=0x00001021 size_offset=0x0000101c size=4)...
      0x00001020: 0048      lsls r0, r1, #1
      0x00001022: 4770      bx lr
  2: int ExampleKt.shiftLeft(int) (dex_method_idx=2)
    CODE: (code_offset=0x00001021 size_offset=0x0000101c size=4)...
      0x00001020: 0048      lsls r0, r1, #1
      0x00001022: 4770      bx lr
  3: int ExampleKt.shiftRight(int) (dex_method_idx=3)
    CODE: (code_offset=0x00001031 size_offset=0x0000102c size=4)...
      0x00001030: 1048      asrs r0, r1, #1
      0x00001032: 4770      bx lr

Once again, multiply and shiftLeft both use lsls to shift left, and their method bodies were de-duplicated. shiftRight uses asrs to shift right, while divide uses an additional right shift, lsrs, to handle negative input.

At this point we can say with certainty that using value << 1 instead of value * 2 brings no benefit. Stop doing this for arithmetic, and reserve shifts for cases where a bitwise operation is really what you mean.
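As a quick sanity check that the two spellings are interchangeable at the language level, here is a minimal Java sketch. The class and method names are hypothetical and chosen only for illustration. It relies on Java's wrap-around int arithmetic, under which doubling and shifting left by one produce the same bits even when the result overflows.

```java
// Hypothetical names, for illustration only.
final class DoubleVersusShift {
    // True when multiplying by 2 and shifting left by 1 agree for this input.
    static boolean sameResult(int value) {
        return value * 2 == value << 1;
    }

    public static void main(String[] args) {
        // Includes values whose doubled result overflows the int range.
        int[] samples = {0, 1, -1, 12345, -12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int v : samples) {
            System.out.println(v + ": *2=" + (v * 2) + " <<1=" + (v << 1)
                    + " same=" + sameResult(v));
        }
    }
}
```

Since the results are always identical, the choice between the two is purely about readability.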

However, value / 2 and value >> 1 still produce different assembly instructions, so their performance may differ. Fortunately, value / 2 does not perform a general-purpose division; it is still based on a shift, so the difference is probably small.
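To make the extra instructions concrete, here is a minimal Java sketch that mirrors the two fix-up sequences from the x86 and ARM listings above. The class and method names are hypothetical; the arithmetic is read off the listed instructions and is not taken from ART's source.

```java
// Hypothetical names; the arithmetic mirrors the ART output shown in the listings.
final class DivideByTwoSketch {
    // x86: lea edx,[eax+1]; test eax,eax; cmovge edx,eax; sar edx
    static int likeX86(int value) {
        int adjusted = value >= 0 ? value : value + 1; // add 1 only when negative
        return adjusted >> 1;                          // arithmetic shift right
    }

    // ARM: lsrs r0,r1,#31; adds r1,r0,r1; asrs r1,#1
    static int likeArm(int value) {
        int signBit = value >>> 31;    // 1 for negative input, 0 otherwise
        return (value + signBit) >> 1; // add the sign bit, then shift
    }

    public static void main(String[] args) {
        int[] samples = {7, -7, 0, -1, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int v : samples) {
            System.out.println(v + ": /2=" + (v / 2) + " >>1=" + (v >> 1)
                    + " x86=" + likeX86(v) + " arm=" + likeArm(v));
        }
    }
}
```

For -7, the plain shift yields -4, while the division and both fix-up sequences yield -3. Division truncates toward zero and an arithmetic shift rounds toward negative infinity, which is exactly the gap the extra instructions close.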

Is shift faster than division?

To determine whether shifting or division is faster, I benchmarked both with the Jetpack benchmark library.

class DivideOrShiftTest {
  @JvmField @Rule val benchmark = BenchmarkRule()

  @Test fun divide() {
    val value = "4".toInt() // Ensure not a constant.
    var result = 0
    benchmark.measureRepeated {
      result = value / 2
    }
    println(result) // Ensure D8 keeps computation.
  }

  @Test fun shift() {
    val value = "4".toInt() // Ensure not a constant.
    var result = 0
    benchmark.measureRepeated {
      result = value shr 1
    }
    println(result) // Ensure D8 keeps computation.
  }
}

I don't have an x86 device, so I ran the tests on a Pixel 3 with Android 10. The results are as follows:

android.studio.display.benchmark=4 ns DivideOrShiftTest.divide
count=4006
mean=4
median=4
min=4
standardDeviation=0

In practice there is no difference between division and shifting; any difference would be measured in nanoseconds. Using a negative number makes no difference to the result either.

At this point we can say with certainty that using value >> 1 instead of value / 2 brings no benefit. Stop doing this for arithmetic, and reserve shifts for cases where a bitwise operation is really what you mean.

Can D8/R8 reduce APK size?

When there are two ways to express the same operation, pick the one that performs better. If the performance is the same, pick the one that yields the smaller APK.

We now know that value * 2 and value << 1 produce the same assembly on ART. So if one of them takes less space in Dalvik bytecode, we should clearly prefer it over the other. Looking at the D8 output, they produce bytecode of the same size as well:

    #1              : (in LExampleKt;)
      name          : 'multiply'
      ⋮
000140: da00 0102                    |0000: mul-int/lit8 v0, v1, #int 2 // #02
    #2              : (in LExampleKt;)
      name          : 'shiftLeft'
      ⋮
000158: e000 0101                    |0000: shl-int/lit8 v0, v1, #int 1 // #01

Multiplication may need more space to store its literal, though. Compare value * 32_768 with value << 15.

    #1              : (in LExampleKt;)
      name          : 'multiply'
      ⋮
000128: 1400 0080 0000               |0000: const v0, #float 0.000000 // #00008000
00012e: 9201 0100                    |0003: mul-int v1, v1, v0
    #2              : (in LExampleKt;)
      name          : 'shiftLeft'
      ⋮
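The size difference comes from how wide a literal a single Dalvik instruction can carry. Below is a rough Java sketch of that reasoning. The class and method names are hypothetical. The byte counts are taken from the listings (a 4-byte mul-int/lit8, and a 6-byte const followed by a 4-byte mul-int). The 16-bit tier assumes the compiler picks the mul-int/lit16 form from the Dalvik instruction set whenever the literal fits. Other encodings a compiler might choose are ignored.

```java
// Hypothetical estimator; a sketch of the reasoning, not a model of what D8 emits in every case.
final class DalvikLiteralSizeSketch {
    // Approximate bytes needed for "value * literal".
    static int multiplyBytes(int literal) {
        if (literal >= Byte.MIN_VALUE && literal <= Byte.MAX_VALUE) {
            return 4; // mul-int/lit8: literal fits in a signed 8-bit field
        }
        if (literal >= Short.MIN_VALUE && literal <= Short.MAX_VALUE) {
            return 4; // mul-int/lit16 (assumed): literal fits in a signed 16-bit field
        }
        return 6 + 4; // const (6 bytes) plus mul-int (4 bytes), as in the 32_768 listing
    }

    // Approximate bytes needed for "value << distance".
    static int shiftLeftBytes(int distance) {
        return 4; // shl-int/lit8: a shift distance of 0..31 always fits in 8 bits
    }

    public static void main(String[] args) {
        System.out.println("value * 2      -> " + multiplyBytes(2) + " bytes");
        System.out.println("value * 32_768 -> " + multiplyBytes(32_768) + " bytes");
        System.out.println("value << 15    -> " + shiftLeftBytes(15) + " bytes");
    }
}
```

Under these assumptions the multiplication only grows once the literal no longer fits in 16 signed bits, which is the case for 32,768.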

I filed an issue on D8 about this, but I strongly suspect that it essentially never occurs in practice, so it is probably not worth the effort. The D8 and R8 output also shows that, for Dalvik, value / 2 and value >> 1 cost the same.

    #0              : (in LExampleKt;)
      name          : 'divide'
      ⋮
000128: db00 0102                    |0000: div-int/lit8 v0, v1, #int 2 // #02
    #3              : (in LExampleKt;)
      name          : 'shiftRight'
      ⋮
000170: e100 0101                    |0000: shr-int/lit8 v0, v1, #int 1 // #01

These, too, change in bytecode size once the literal reaches 32,768. Because of how negative numbers behave, unconditionally replacing division by a power of two with a right shift is never safe. The substitution can only be made when the value is guaranteed to be non-negative.
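Index arithmetic in an array-backed binary heap is a typical place where this matters. The following Java sketch is a made-up example, not code from the AndroidX collection library. It compares the two ways of computing a parent index and shows where they diverge.

```java
// Hypothetical heap index helpers, for illustration only.
final class HeapIndexSketch {
    static int parentByDivision(int index) {
        return (index - 1) / 2;
    }

    static int parentByShift(int index) {
        return (index - 1) >> 1;
    }

    public static void main(String[] args) {
        // For every non-root index the operand (index - 1) is non-negative, so both agree.
        for (int i = 1; i <= 1_000; i++) {
            if (parentByDivision(i) != parentByShift(i)) {
                throw new AssertionError("mismatch at " + i);
            }
        }
        // At the root the operand is -1: division truncates toward zero, the shift rounds down.
        System.out.println(parentByDivision(0)); // prints 0
        System.out.println(parentByShift(0));    // prints -1
    }
}
```

The shift is a safe stand-in only while the operand is known to be non-negative. A compiler cannot assume that in general, so the decision has to come from the author.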

Does the division of unsigned numbers also use shifts?

Java bytecode has no unsigned numbers, but signed numbers can be used to emulate them. Java provides static helper methods that operate on signed values as if they were unsigned. Kotlin provides unsigned types such as UInt, which offer the same functionality but, unlike Java, as distinct types. It stands to reason that unsigned division by a power of two can always be rewritten as a right shift.

Use Kotlin to demonstrate the following two situations.

fun javaLike(value: Int) = Integer.divideUnsigned(value, 2)
fun kotlinLike(value: UInt) = value / 2U

Compile with kotlinc (Kotlin 1.4-M1):

$ kotlinc Example.kt
$ javap -c ExampleKt
Compiled from "Example.kt"
public final class ExampleKt {
  public static final int javaLike(int);
    Code:
       0: iload_0
       1: iconst_2
       2: invokestatic  #12       // Method java/lang/Integer.divideUnsigned:(II)I
       5: ireturn

  public static final int kotlinLike-WZ4Q5Ns(int);
    Code:
       0: iload_0
       1: istore_1
       2: iconst_2
       3: istore_2
       4: iconst_0
       5: istore_3
       6: iload_1
       7: iload_2
       8: invokestatic  #20       // Method kotlin/UnsignedKt."uintDivide-J1ME1BU":(II)I
      11: ireturn
}

Kotlin does not recognize the division by a power of two, which could have been replaced with an iushr shift. I filed this issue with JetBrains as well.
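The equivalence that the compilers are not exploiting is easy to check by hand. Here is a minimal Java sketch with hypothetical class and method names. It compares Integer.divideUnsigned against the unsigned right shift operator >>>, which is the source-level counterpart of iushr.

```java
// Hypothetical names, for illustration only.
final class UnsignedHalfSketch {
    static int byLibrary(int value) {
        return Integer.divideUnsigned(value, 2);
    }

    static int byShift(int value) {
        return value >>> 1; // unsigned (logical) shift right
    }

    public static void main(String[] args) {
        // -1 and Integer.MIN_VALUE are large positive values when read as unsigned.
        int[] samples = {0, 1, 7, Integer.MAX_VALUE, Integer.MIN_VALUE, -1};
        for (int v : samples) {
            System.out.println(Integer.toUnsignedString(v) + " / 2 = "
                    + Integer.toUnsignedString(byLibrary(v)) + " (library), "
                    + Integer.toUnsignedString(byShift(v)) + " (shift)");
        }
    }
}
```

Unlike the signed case, no fix-up is needed, because an unsigned value has no sign bit to round around.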

Using -Xuse-ir changes nothing (aside from removing some of the loads and stores). Targeting Java 8, however, does.

$ kotlinc -jvm-target 1.8 Example.kt
$ javap -c ExampleKt
Compiled from "Example.kt"
public final class ExampleKt {
  public static final int javaLike(int);
    Code:
       0: iload_0
       1: iconst_2
       2: invokestatic  #12       // Method java/lang/Integer.divideUnsigned:(II)I
       5: ireturn

  public static final int kotlinLike-WZ4Q5Ns(int);
    Code:
       0: iload_0
       1: iconst_2
       2: invokestatic  #12       // Method java/lang/Integer.divideUnsigned:(II)I
       5: ireturn
}

The Integer.divideUnsigned method is available as of Java 8, so it is used when targeting 1.8 or newer. Since this makes the two function bodies identical, let's go back to the older version for the comparison.

Next is R8. The notable differences from before are that we also pass the Kotlin standard library as input and specify a minimum API level with --min-api 24, because Integer.divideUnsigned is only available on API 24 and later.

$ java -jar $R8_HOME/build/libs/r8.jar \
      --lib $ANDROID_HOME/platforms/android-29/android.jar \
      --min-api 24 \
      --release \
      --pg-conf rules.txt \
      --output . \
      ExampleKt.class kotlin-stdlib.jar
$ dexdump -d classes.dex
Opened 'classes.dex', DEX version '039'
Class #0            -
  Class descriptor  : 'LExampleKt;'
  Access flags      : 0x0011 (PUBLIC FINAL)
  Superclass        : 'Ljava/lang/Object;'
  Direct methods    -
    #0              : (in LExampleKt;)
      name          : 'javaLike'
      type          : '(I)I'
      access        : 0x0019 (PUBLIC STATIC FINAL)
      code          -
0000f8:                              |[0000f8] ExampleKt.javaLike:(I)I
000108: 1220                         |0000: const/4 v0, #int 2 // #2
00010a: 7120 0200 0100               |0001: invoke-static {v1, v0}, Ljava/lang/Integer;.divideUnsigned:(II)I // method@0002
000110: 0a01                         |0004: move-result v1
000112: 0f01                         |0005: return v1
    #1              : (in LExampleKt;)
      name          : 'kotlinLike-WZ4Q5Ns'
      type          : '(I)I'
      access        : 0x0019 (PUBLIC STATIC FINAL)
      code          -
000114:                              |[000114] ExampleKt.kotlinLike-WZ4Q5Ns:(I)I
000124: 8160                         |0000: int-to-long v0, v6
000126: 1802 ffff ffff 0000 0000     |0001: const-wide v2, #double 0.000000 // #00000000ffffffff
000130: c020                         |0006: and-long/2addr v0, v2
000132: 1226                         |0007: const/4 v6, #int 2 // #2
000134: 8164                         |0008: int-to-long v4, v6
000136: c042                         |0009: and-long/2addr v2, v4
000138: be20                         |000a: div-long/2addr v0, v2
00013a: 8406                         |000b: long-to-int v6, v0
00013c: 0f06                         |000c: return v6

Kotlin has its own implementation of unsigned integers, and it was inlined directly into the function body. It works by converting the argument and the literal to long, performing a long division, and converting the result back to int. When this eventually runs through ART it is simply translated to the equivalent x86, so we are going to leave this function behind. The optimization opportunity has been missed here.
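The widening approach in the Dalvik listing can be expressed in a few lines of Java. This is a sketch with hypothetical names that follows the listed instructions (int-to-long, and-long with 0xffffffff, div-long, long-to-int). It is not the Kotlin standard library source.

```java
// Hypothetical names; mirrors the instruction sequence in the dexdump listing.
final class WidenedUnsignedDivideSketch {
    static int divide(int dividend, int divisor) {
        long a = dividend & 0xFFFFFFFFL; // widen and clear the upper 32 bits
        long b = divisor & 0xFFFFFFFFL;  // same treatment for the divisor
        return (int) (a / b);            // 64-bit signed division, then narrow
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 7, Integer.MAX_VALUE, Integer.MIN_VALUE, -1};
        for (int v : samples) {
            System.out.println(Integer.toUnsignedString(v) + " / 2 = "
                    + Integer.toUnsignedString(divide(v, 2)));
        }
    }
}
```

After masking, both operands are non-negative as long values, so the signed 64-bit division gives the unsigned 32-bit result. It is correct, but it costs several instructions where a single shift would do for a divisor of 2.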

For the Java version, R8 did not replace the divideUnsigned call with a shift either. I filed an issue to track this.

The last opportunity for optimization is ART.

$ adb push classes.dex /sdcard/classes.dex
$ adb shell
generic_x86:/ $ su
generic_x86:/ # dex2oat --dex-file=/sdcard/classes.dex --oat-file=/sdcard/classes.oat
generic_x86:/ # oatdump --oat-file=/sdcard/classes.oat
OatDexFile:
0: LExampleKt; (offset=0x000003c0) (type_idx=1) (Initialized) (OatClassAllCompiled)
  0: int ExampleKt.javaLike(int) (dex_method_idx=0)
    CODE: (code_offset=0x00001010 size_offset=0x0000100c size=63)...
      0x00001010:         85842400E0FFFF             test eax, [esp + -8192]
        StackMap[0] (native_pc=0x1017, dex_pc=0x0, register_mask=0x0, stack_mask=0b)
      0x00001017:                     55             push ebp
      0x00001018:                 83EC18             sub esp, 24
      0x0000101b:                 890424             mov [esp], eax
      0x0000101e:     6466833D0000000000             cmpw fs:[0x0], 0  ; state_and_flags
      0x00001027:           0F8519000000             jnz/ne +25 (0x00001046)
      0x0000102d:             E800000000             call +0 (0x00001032)
      0x00001032:                     5D             pop ebp
      0x00001033:             BA02000000             mov edx, 2
      0x00001038:           8B85CE0F0000             mov eax, [ebp + 4046]
      0x0000103e:                 FF5018             call [eax + 24]
        StackMap[1] (native_pc=0x1041, dex_pc=0x1, register_mask=0x0, stack_mask=0b)
      0x00001041:                 83C418             add esp, 24
      0x00001044:                     5D             pop ebp
      0x00001045:                     C3             ret
      0x00001046:         64FF15E0020000             call fs:[0x2e0]  ; pTestSuspend
        StackMap[2] (native_pc=0x104d, dex_pc=0x0, register_mask=0x0, stack_mask=0b)
      0x0000104d:                   EBDE             jmp -34 (0x0000102d)
  1: int ExampleKt.kotlinLike-WZ4Q5Ns(int) (dex_method_idx=1)
    CODE: (code_offset=0x00001060 size_offset=0x0000105c size=67)...
      ⋮

ART does not intrinsify the call to divideUnsigned; instead it makes a regular method call. I filed an issue to track this as well.

Conclusion

It has been a long journey, so congratulations on making it to the end (or on just scrolling to the bottom). Let's summarize.

  1. ART rewrites multiplication and division by a power of two as left and right shifts (with a few extra instructions to handle negative numbers).

  2. There is no significant performance difference between a right shift and division by a power of two.

  3. Shifts, multiplication, and division have the same Dalvik bytecode size.

  4. Nobody optimizes unsigned division (at least not yet), but you are probably not using it anyway.

With these facts, we can answer the question from the beginning of the article.

On Android, should you divide by 2 or shift right by 1?

Neither! Use shifts only when you actually need a bitwise operation, and use multiplication and division for all other arithmetic. I will start switching the bit shifts in the AndroidX collection port over to multiplication and division.

Origin blog.csdn.net/qq_39477770/article/details/108773252