I have been trying to port the AndroidX collection library to Kotlin multiplatform to experiment with binary compatibility, performance, ease of use, and the different memory models. Some data structures in the library use array-based binary trees to store elements. In many places the Java code uses bit shifts in place of multiplication and division by powers of two. When ported to Kotlin, these become slightly awkward infix operators, which somewhat obscures the intent of the code.
I did some research into whether shifts really perform better than multiplication and division. Most people have heard that "shifts are faster", but many also doubt whether it is true. Some believe the compiler will perform this optimization before the code ever reaches the CPU.
To satisfy my curiosity and to avoid Kotlin's infix shift operators, I will answer which is better, along with a few related questions. Let's go!
Who optimized the code?
Before our code is executed by the CPU, it passes through several important compilers: javac/kotlinc, D8/R8, and ART.
Each of these steps has an opportunity to optimize, but do they take it?
javac
class Example {
    static int multiply(int value) {
        return value * 2;
    }
    static int divide(int value) {
        return value / 2;
    }
    static int shiftLeft(int value) {
        return value << 1;
    }
    static int shiftRight(int value) {
        return value >> 1;
    }
}
Compile the above code with javac from JDK 14 and display the bytecode with javap.
$ javac Example.java
$ javap -c Example
Compiled from "Example.java"
class Example {
static int multiply(int);
Code:
0: iload_0
1: iconst_2
2: imul
3: ireturn
static int divide(int);
Code:
0: iload_0
1: iconst_2
2: idiv
3: ireturn
static int shiftLeft(int);
Code:
0: iload_0
1: iconst_1
2: ishl
3: ireturn
static int shiftRight(int);
Code:
0: iload_0
1: iconst_1
2: ishr
3: ireturn
}
Each method starts with iload_0, which loads the first parameter. Multiplication and division then use iconst_2 to load the literal 2, followed by imul or idiv to perform the int multiplication or division. The shift methods load the literal 1 with iconst_1, then use ishl or ishr to perform the shift.
There is no optimization here, but if you know anything about Java this is not surprising. javac is not an optimizing compiler; it leaves most of that work to the JVM's runtime compilers or to AOT compilation.
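To make the stack-based execution model concrete, here is a toy interpreter for the six opcodes that appear in the listing above. It is only a sketch for illustration: the class name TinyStackMachine and its run helper are hypothetical, and a real JVM is not implemented this way.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy interpreter for the opcodes in the javap listing (illustrative only).
class TinyStackMachine {
    // Applies one binary integer opcode to its two operands.
    static int apply(String op, int left, int right) {
        switch (op) {
            case "imul": return left * right;
            case "idiv": return left / right;
            case "ishl": return left << right;
            case "ishr": return left >> right;
            default: throw new IllegalArgumentException("not a binary opcode: " + op);
        }
    }

    // Runs a sequence of opcodes for a method whose single int parameter is arg0.
    static int run(int arg0, String... opcodes) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String op : opcodes) {
            switch (op) {
                case "iload_0": stack.push(arg0); break;   // load first parameter
                case "iconst_1": stack.push(1); break;     // push literal 1
                case "iconst_2": stack.push(2); break;     // push literal 2
                case "ireturn": return stack.pop();        // return top of stack
                case "imul": case "idiv": case "ishl": case "ishr": {
                    int right = stack.pop();               // pushed last
                    int left = stack.pop();
                    stack.push(apply(op, left, right));
                    break;
                }
                default: throw new IllegalArgumentException("unknown opcode: " + op);
            }
        }
        throw new IllegalStateException("missing ireturn");
    }

    public static void main(String[] args) {
        System.out.println(run(21, "iload_0", "iconst_2", "imul", "ireturn")); // 42
        System.out.println(run(-7, "iload_0", "iconst_2", "idiv", "ireturn")); // -3
        System.out.println(run(-7, "iload_0", "iconst_1", "ishr", "ireturn")); // -4
    }
}
```

The last two calls already hint at something that matters later: for a negative odd input, idiv and ishr do not produce the same result.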
kotlinc
fun multiply(value: Int) = value * 2
fun divide(value: Int) = value / 2
fun shiftLeft(value: Int) = value shl 1
fun shiftRight(value: Int) = value shr 1
Using Kotlin 1.4-M1, compile the Kotlin code into Java bytecode with kotlinc, then view it with javap.
$ kotlinc Example.kt
$ javap -c ExampleKt
Compiled from "Example.kt"
public final class ExampleKt {
public static final int multiply(int);
Code:
0: iload_0
1: iconst_2
2: imul
3: ireturn
public static final int divide(int);
Code:
0: iload_0
1: iconst_2
2: idiv
3: ireturn
public static final int shiftLeft(int);
Code:
0: iload_0
1: iconst_1
2: ishl
3: ireturn
public static final int shiftRight(int);
Code:
0: iload_0
1: iconst_1
2: ishr
3: ireturn
}
The output result is exactly the same as Java.
This is using the original JVM backend of Kotlin, but using the forthcoming IR-based backend (via -Xuse-ir) also produces the same output.
D8
Use the latest D8 compiler to generate a DEX file from the bytecode converted from the Kotlin code in the above example.
$ java -jar $R8_HOME/build/libs/d8.jar \
--release \
--output . \
ExampleKt.class
$ dexdump -d classes.dex
Opened 'classes.dex', DEX version '035'
Class #0 -
Class descriptor : 'LExampleKt;'
Access flags : 0x0011 (PUBLIC FINAL)
Superclass : 'Ljava/lang/Object;'
Direct methods -
#0 : (in LExampleKt;)
name : 'divide'
type : '(I)I'
access : 0x0019 (PUBLIC STATIC FINAL)
code -
000118: |[000118] ExampleKt.divide:(I)I
000128: db00 0102 |0000: div-int/lit8 v0, v1, #int 2 // #02
00012c: 0f00 |0002: return v0
#1 : (in LExampleKt;)
name : 'multiply'
type : '(I)I'
access : 0x0019 (PUBLIC STATIC FINAL)
code -
000130: |[000130] ExampleKt.multiply:(I)I
000140: da00 0102 |0000: mul-int/lit8 v0, v1, #int 2 // #02
000144: 0f00 |0002: return v0
#2 : (in LExampleKt;)
name : 'shiftLeft'
type : '(I)I'
access : 0x0019 (PUBLIC STATIC FINAL)
code -
000148: |[000148] ExampleKt.shiftLeft:(I)I
000158: e000 0101 |0000: shl-int/lit8 v0, v1, #int 1 // #01
00015c: 0f00 |0002: return v0
#3 : (in LExampleKt;)
name : 'shiftRight'
type : '(I)I'
access : 0x0019 (PUBLIC STATIC FINAL)
code -
(Output slightly trimmed)
Dalvik bytecode is register-based, whereas Java bytecode is stack-based. As a result, each method needs only a single instruction to perform its integer operation. Each one reads the first method parameter from the v1 register and takes a literal 1 or 2 as its other operand.
So nothing has changed. D8 is not an optimizing compiler (although it can do method-local optimizations).
R8
To run R8, we need to configure keep rules so that our code is not removed.
-keep,allowoptimization class ExampleKt {
<methods>;
}
The rules above are passed with the --pg-conf flag.
$ java -jar $R8_HOME/build/libs/r8.jar \
--lib $ANDROID_HOME/platforms/android-29/android.jar \
--release \
--pg-conf rules.txt \
--output . \
ExampleKt.class
$ dexdump -d classes.dex
Opened 'classes.dex', DEX version '035'
Class #0 -
Class descriptor : 'LExampleKt;'
Access flags : 0x0011 (PUBLIC FINAL)
Superclass : 'Ljava/lang/Object;'
Direct methods -
#0 : (in LExampleKt;)
name : 'divide'
type : '(I)I'
access : 0x0019 (PUBLIC STATIC FINAL)
code -
000118: |[000118] ExampleKt.divide:(I)I
000128: db00 0102 |0000: div-int/lit8 v0, v1, #int 2 // #02
00012c: 0f00 |0002: return v0
#1 : (in LExampleKt;)
name : 'multiply'
type : '(I)I'
access : 0x0019 (PUBLIC STATIC FINAL)
code -
000130: |[000130] ExampleKt.multiply:(I)I
000140: da00 0102 |0000: mul-int/lit8 v0, v1, #int 2 // #02
000144: 0f00 |0002: return v0
#2 : (in LExampleKt;)
name : 'shiftLeft'
type : '(I)I'
access : 0x0019 (PUBLIC STATIC FINAL)
code -
000148: |[000148] ExampleKt.shiftLeft:(I)I
000158: e000 0101 |0000: shl-int/lit8 v0, v1, #int 1 // #01
00015c: 0f00 |0002: return v0
#3 : (in LExampleKt;)
name : 'shiftRight'
type : '(I)I'
access : 0x0019 (PUBLIC STATIC FINAL)
code -
000160: |[000160] ExampleKt.shiftRight:(I)I
000170: e100 0101 |0000: shr-int/lit8 v0, v1, #int 1 // #01
000174: 0f00 |0002: return v0
The output is exactly the same as D8.
ART
Use the Dalvik bytecode output from R8 above as the input to ART, running on an x86 emulator with Android 10.
$ adb push classes.dex /sdcard/classes.dex
$ adb shell
generic_x86:/ $ su
generic_x86:/ # dex2oat --dex-file=/sdcard/classes.dex --oat-file=/sdcard/classes.oat
generic_x86:/ # oatdump --oat-file=/sdcard/classes.oat
OatDexFile:
0: LExampleKt; (offset=0x000003c0) (type_idx=1) (Initialized) (OatClassAllCompiled)
0: int ExampleKt.divide(int) (dex_method_idx=0)
CODE: (code_offset=0x00001010 size_offset=0x0000100c size=15)...
0x00001010: 89C8 mov eax, ecx
0x00001012: 8D5001 lea edx, [eax + 1]
0x00001015: 85C0 test eax, eax
0x00001017: 0F4DD0 cmovnl/ge edx, eax
0x0000101a: D1FA sar edx
0x0000101c: 89D0 mov eax, edx
0x0000101e: C3 ret
1: int ExampleKt.multiply(int) (dex_method_idx=1)
CODE: (code_offset=0x00001030 size_offset=0x0000102c size=5)...
0x00001030: D1E1 shl ecx
0x00001032: 89C8 mov eax, ecx
0x00001034: C3 ret
2: int ExampleKt.shiftLeft(int) (dex_method_idx=2)
CODE: (code_offset=0x00001030 size_offset=0x0000102c size=5)...
0x00001030: D1E1 shl ecx
0x00001032: 89C8 mov eax, ecx
0x00001034: C3 ret
3: int ExampleKt.shiftRight(int) (dex_method_idx=3)
CODE: (code_offset=0x00001040 size_offset=0x0000103c size=5)...
0x00001040: D1F9 sar ecx
0x00001042: 89C8 mov eax, ecx
0x00001044: C3 ret
(Output slightly trimmed)
The x86 assembly shows that ART has stepped in and replaced some of the arithmetic operations with shifts.
First of all, multiply and shiftLeft now have the same implementation: both use shl to perform a left shift. In addition, if you look at the file offsets (the leftmost column), you will find that they are exactly the same. ART recognized that the two methods have identical bodies and deduplicated them when compiling to x86 assembly.
Next, divide and shiftRight differ in their implementation, although both use sar to perform the right shift. In divide, the four extra instructions before sar handle the case where the input is negative by adding 1 to the value first.
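The reason for those extra instructions is that integer division rounds toward zero while an arithmetic right shift rounds toward negative infinity, so the two disagree for negative odd inputs. The sketch below mirrors the x86 sequence in plain Java; the class and method names are hypothetical and exist only for this illustration.

```java
// Illustrates why divide needs more work than shiftRight for negative inputs.
class SignedHalving {
    // Mirrors the x86 listing: add 1 only when the value is negative, then shift.
    static int divideLikeX86(int value) {
        int adjusted = (value >= 0) ? value : value + 1; // lea + test + cmovge
        return adjusted >> 1;                            // sar
    }

    public static void main(String[] args) {
        int[] samples = {7, 6, 0, -1, -6, -7, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int v : samples) {
            System.out.println(v + ": /2=" + (v / 2)
                    + "  >>1=" + (v >> 1)
                    + "  adjusted=" + divideLikeX86(v));
        }
    }
}
```

For -7, for example, value / 2 is -3 while value >> 1 is -4; the adjusted shift gives -3 again.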
Run the same commands on a Pixel 4 running Android 10 to see how ART compiles the code to ARM assembly.
OatDexFile:
0: LExampleKt; (offset=0x000005a4) (type_idx=1) (Verified) (OatClassAllCompiled)
0: int ExampleKt.divide(int) (dex_method_idx=0)
CODE: (code_offset=0x00001009 size_offset=0x00001004 size=10)...
0x00001008: 0fc8 lsrs r0, r1, #31
0x0000100a: 1841 adds r1, r0, r1
0x0000100c: 1049 asrs r1, #1
0x0000100e: 4608 mov r0, r1
0x00001010: 4770 bx lr
1: int ExampleKt.multiply(int) (dex_method_idx=1)
CODE: (code_offset=0x00001021 size_offset=0x0000101c size=4)...
0x00001020: 0048 lsls r0, r1, #1
0x00001022: 4770 bx lr
2: int ExampleKt.shiftLeft(int) (dex_method_idx=2)
CODE: (code_offset=0x00001021 size_offset=0x0000101c size=4)...
0x00001020: 0048 lsls r0, r1, #1
0x00001022: 4770 bx lr
3: int ExampleKt.shiftRight(int) (dex_method_idx=3)
CODE: (code_offset=0x00001031 size_offset=0x0000102c size=4)...
0x00001030: 1048 asrs r0, r1, #1
0x00001032: 4770 bx lr
Similarly, multiply and shiftLeft both use lsls to perform the left shift, and their identical method bodies are deduplicated. shiftRight uses asrs for the right shift, while divide additionally uses another right-shift instruction, lsrs, to handle negative inputs.
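The ARM version handles negative inputs without a conditional: it extracts the sign bit with a logical shift and adds it to the value before the arithmetic shift. A minimal Java sketch of that sequence, using a hypothetical SignBitBias class:

```java
// Branchless variant that follows the ARM listing instruction by instruction.
class SignBitBias {
    static int divideLikeArm(int value) {
        int signBit = value >>> 31;    // lsrs r0, r1, #31: 1 if negative, else 0
        return (value + signBit) >> 1; // adds r1, r0, r1 then asrs r1, #1
    }

    public static void main(String[] args) {
        for (int v = -5; v <= 5; v++) {
            System.out.println(v + " / 2 = " + (v / 2) + ", sketch = " + divideLikeArm(v));
        }
    }
}
```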
At this point we can say with certainty that using value << 1 instead of value * 2 brings no benefit. Stop doing this for arithmetic, and keep shifts only where bitwise operations are genuinely required.
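ART can make this substitution freely because, for Java's wrapping int arithmetic, multiplying by two and shifting left by one always give the same bits, even on overflow. A small check, with a hypothetical class name, over a few boundary values:

```java
// Checks that value * 2 and value << 1 agree, including on overflow.
class DoubleVsShift {
    static boolean sameResult(int value) {
        return value * 2 == value << 1;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 1 << 30, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int v : samples) {
            System.out.println(v + ": " + (v * 2) + " vs " + (v << 1)
                    + " same=" + sameResult(v));
        }
    }
}
```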
However, value / 2 and value >> 1 still produce different assembly instructions, so their performance may differ. Fortunately, value / 2 does not perform a general-purpose division; it is still based on a shift, so the difference in performance is probably small.
Is shift faster than division?
To determine whether shifting or dividing is faster, I tested both with the Jetpack Benchmark library.
import androidx.benchmark.junit4.BenchmarkRule
import androidx.benchmark.junit4.measureRepeated
import org.junit.Rule
import org.junit.Test

class DivideOrShiftTest {
    @JvmField @Rule val benchmark = BenchmarkRule()

    @Test fun divide() {
        val value = "4".toInt() // Ensure not a constant.
        var result = 0
        benchmark.measureRepeated {
            result = value / 2
        }
        println(result) // Ensure D8 keeps computation.
    }

    @Test fun shift() {
        val value = "4".toInt() // Ensure not a constant.
        var result = 0
        benchmark.measureRepeated {
            result = value shr 1
        }
        println(result) // Ensure D8 keeps computation.
    }
}
I don't have an x86 device, so I ran the test on a Pixel 3 running Android 10. The results are as follows:
android.studio.display.benchmark=4 ns DivideOrShiftTest.divide
count=4006
mean=4
median=4
min=4
standardDeviation=0
At nanosecond granularity there is no measurable difference between division and shifting. Using a negative input does not change the benchmark result either.
At this point we can say with certainty that using value >> 1 instead of value / 2 brings no benefit. Stop doing this for arithmetic, and keep shifts only where bitwise operations are genuinely required.
Can D8/R8 reduce APK size?
If there are two ways to express the same operation, pick the one that performs better. If they perform the same, pick the one that makes the APK smaller.
We now know that value * 2 and value << 1 produce the same assembly code on ART. So if one of them takes less space in Dalvik bytecode, we should clearly prefer it. Looking at the D8 output, both produce bytecode of the same size:
#1 : (in LExampleKt;)
name : 'multiply'
⋮
000140: da00 0102 |0000: mul-int/lit8 v0, v1, #int 2 // #02
#2 : (in LExampleKt;)
name : 'shiftLeft'
⋮
Multiplication may need more space to store its literal. Compare value * 32_768 with value << 15.
#1 : (in LExampleKt;)
name : 'multiply'
⋮
000128: 1400 0080 0000 |0000: const v0, #float 0.000000 // #00008000
00012e: 9201 0100 |0003: mul-int v1, v1, v0
#2 : (in LExampleKt;)
name : 'shiftLeft'
⋮
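The size difference comes from how Dalvik encodes literals: the lit8 instruction forms carry a signed 8-bit literal and the lit16 forms a signed 16-bit literal, so 32768 fits in neither and has to be loaded into a register with a separate const instruction, as the listing shows. The helper below is a hypothetical sketch that only classifies a literal by these ranges; it does not claim to reproduce the instruction D8 actually selects in every case.

```java
// Classifies an int literal by the smallest Dalvik literal field that can hold it.
class DalvikLiteralForm {
    // Hypothetical helper for illustration; not part of D8 or any real tool.
    static String formFor(int literal) {
        if (literal >= Byte.MIN_VALUE && literal <= Byte.MAX_VALUE) {
            return "lit8";          // signed 8-bit literal inside the instruction
        }
        if (literal >= Short.MIN_VALUE && literal <= Short.MAX_VALUE) {
            return "lit16";         // signed 16-bit literal inside the instruction
        }
        return "const + binop";     // literal needs its own const instruction
    }

    public static void main(String[] args) {
        System.out.println("2     -> " + formFor(2));
        System.out.println("15    -> " + formFor(15));
        System.out.println("32768 -> " + formFor(32_768));
    }
}
```

A shift distance such as 15 always fits in the 8-bit form, which is why value << 15 stays at a single instruction while value * 32_768 does not.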
I filed an issue against D8 about this larger encoding, but I strongly suspect that the number of times it occurs in practice is close to zero, so it is probably not worth optimizing. The output of D8 and R8 also shows that, for Dalvik, value / 2 and value >> 1 cost the same.
#0 : (in LExampleKt;)
name : 'divide'
⋮
000128: db00 0102 |0000: div-int/lit8 v0, v1, #int 2 // #02
#3 : (in LExampleKt;)
name : 'shiftRight'
⋮
For a literal of 32768, the bytecode size changes in the same way as above. Because of the behavior with negative numbers, it is not safe to unconditionally replace division by a power of two with a right shift. The replacement is only valid when the value is guaranteed to be non-negative.
Does the division of unsigned numbers also use shifts?
Java bytecode has no unsigned numbers, but you can emulate them with signed ones. Java provides static helper methods that operate on signed values as if they were unsigned. Kotlin provides an unsigned type, UInt, which offers the same functionality but, unlike Java, abstracts it into a separate data type. Since an unsigned value is never negative, it is conceivable that division by a power of two could always be rewritten as a right shift.
Use Kotlin to demonstrate the following two situations.
fun javaLike(value: Int) = Integer.divideUnsigned(value, 2)
fun kotlinLike(value: UInt) = value / 2U
Compile with kotlinc (Kotlin 1.4-M1):
$ kotlinc Example.kt
$ javap -c ExampleKt
Compiled from "Example.kt"
public final class ExampleKt {
public static final int javaLike(int);
Code:
0: iload_0
1: iconst_2
2: invokestatic #12 // Method java/lang/Integer.divideUnsigned:(II)I
5: ireturn
public static final int kotlinLike-WZ4Q5Ns(int);
Code:
0: iload_0
1: istore_1
2: iconst_2
3: istore_2
4: iconst_0
5: istore_3
6: iload_1
7: iload_2
8: invokestatic #20 // Method kotlin/UnsignedKt."uintDivide-J1ME1BU":(II)I
11: ireturn
}
Kotlin does not recognize the division by a power of two, which could have been replaced with an iushr shift. I also filed this issue with JetBrains.
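To see why an unsigned shift would be a valid replacement, the following sketch compares Integer.divideUnsigned with the >>> operator (which compiles to iushr) for a few values, including ones that are negative when read as signed. The class and method names are hypothetical.

```java
// Compares unsigned division by 2 with an unsigned right shift by 1.
class UnsignedHalving {
    static int viaLibrary(int value) {
        return Integer.divideUnsigned(value, 2);
    }

    static int viaShift(int value) {
        return value >>> 1; // compiles to iushr
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 7, -1, -7, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int v : samples) {
            System.out.println(Integer.toUnsignedString(v) + ": "
                    + viaLibrary(v) + " vs " + viaShift(v));
        }
    }
}
```

Unlike the signed case, no adjustment is needed: the two always agree.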
Using -Xuse-ir does not change anything (apart from removing some of the load/store instructions). Targeting Java 8, however, does change the output.
$ kotlinc -jvm-target 1.8 Example.kt
$ javap -c ExampleKt
Compiled from "Example.kt"
public final class ExampleKt {
public static final int javaLike(int);
Code:
0: iload_0
1: iconst_2
2: invokestatic #12 // Method java/lang/Integer.divideUnsigned:(II)I
5: ireturn
public static final int kotlinLike-WZ4Q5Ns(int);
Code:
0: iload_0
1: iconst_2
2: invokestatic #12 // Method java/lang/Integer.divideUnsigned:(II)I
5: ireturn
}
The Integer.divideUnsigned method has been available since Java 8, so Kotlin uses it when targeting 1.8 or newer. Since this makes the two function bodies identical, let's go back to the older output for the rest of the comparison.
Next is R8. The notable difference from the earlier run is that we also pass the Kotlin standard library as input and specify the minimum API level with --min-api 24, because Integer.divideUnsigned is only available on API 24 and above.
$ java -jar $R8_HOME/build/libs/r8.jar \
--lib $ANDROID_HOME/platforms/android-29/android.jar \
--min-api 24 \
--release \
--pg-conf rules.txt \
--output . \
ExampleKt.class kotlin-stdlib.jar
$ dexdump -d classes.dex
Opened 'classes.dex', DEX version '039'
Class #0 -
Class descriptor : 'LExampleKt;'
Access flags : 0x0011 (PUBLIC FINAL)
Superclass : 'Ljava/lang/Object;'
Direct methods -
#0 : (in LExampleKt;)
name : 'javaLike'
type : '(I)I'
access : 0x0019 (PUBLIC STATIC FINAL)
code -
0000f8: |[0000f8] ExampleKt.javaLike:(I)I
000108: 1220 |0000: const/4 v0, #int 2 // #2
00010a: 7120 0200 0100 |0001: invoke-static {v1, v0}, Ljava/lang/Integer;.divideUnsigned:(II)I // method@0002
000110: 0a01 |0004: move-result v1
000112: 0f01 |0005: return v1
#1 : (in LExampleKt;)
name : 'kotlinLike-WZ4Q5Ns'
type : '(I)I'
access : 0x0019 (PUBLIC STATIC FINAL)
code -
000114: |[000114] ExampleKt.kotlinLike-WZ4Q5Ns:(I)I
000124: 8160 |0000: int-to-long v0, v6
000126: 1802 ffff ffff 0000 0000 |0001: const-wide v2, #double 0.000000 // #00000000ffffffff
000130: c020 |0006: and-long/2addr v0, v2
000132: 1226 |0007: const/4 v6, #int 2 // #2
000134: 8164 |0008: int-to-long v4, v6
000136: c042 |0009: and-long/2addr v2, v4
000138: be20 |000a: div-long/2addr v0, v2
00013a: 8406 |000b: long-to-int v6, v0
00013c: 0f06 |000c: return v6
Kotlin has its own implementation of unsigned integer division, and it has been inlined directly into the function body. It converts the parameter and the literal to long, performs a long division, and converts the result back to int. When this eventually runs through ART it is simply translated to the equivalent x86 instructions, so we will leave this function behind. The opportunity for optimization has already been missed here.
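The Dalvik instructions in that method body (int-to-long, and-long with 0xffffffff, div-long, long-to-int) correspond to the well-known widening technique for unsigned division. Here is a minimal Java sketch of the same idea; it is not Kotlin's actual source, and the class name is hypothetical.

```java
// Unsigned 32-bit division implemented by widening both operands to long.
class WideningUnsignedDivide {
    static int divide(int dividend, int divisor) {
        long a = dividend & 0xFFFFFFFFL; // int-to-long + and-long: zero-extend
        long b = divisor & 0xFFFFFFFFL;
        return (int) (a / b);            // div-long + long-to-int
    }

    public static void main(String[] args) {
        System.out.println(divide(-1, 2));                   // same as 4294967295 / 2
        System.out.println(Integer.divideUnsigned(-1, 2));   // library result for comparison
    }
}
```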
For the Java version, R8 did not replace the divideUnsigned call with a shift. I have filed an issue to track this.
The final optimization opportunity is ART.
$ adb push classes.dex /sdcard/classes.dex
$ adb shell
generic_x86:/ $ su
generic_x86:/ # dex2oat --dex-file=/sdcard/classes.dex --oat-file=/sdcard/classes.oat
generic_x86:/ # oatdump --oat-file=/sdcard/classes.oat
OatDexFile:
0: LExampleKt; (offset=0x000003c0) (type_idx=1) (Initialized) (OatClassAllCompiled)
0: int ExampleKt.javaLike(int) (dex_method_idx=0)
CODE: (code_offset=0x00001010 size_offset=0x0000100c size=63)...
0x00001010: 85842400E0FFFF test eax, [esp + -8192]
StackMap[0] (native_pc=0x1017, dex_pc=0x0, register_mask=0x0, stack_mask=0b)
0x00001017: 55 push ebp
0x00001018: 83EC18 sub esp, 24
0x0000101b: 890424 mov [esp], eax
0x0000101e: 6466833D0000000000 cmpw fs:[0x0], 0 ; state_and_flags
0x00001027: 0F8519000000 jnz/ne +25 (0x00001046)
0x0000102d: E800000000 call +0 (0x00001032)
0x00001032: 5D pop ebp
0x00001033: BA02000000 mov edx, 2
0x00001038: 8B85CE0F0000 mov eax, [ebp + 4046]
0x0000103e: FF5018 call [eax + 24]
StackMap[1] (native_pc=0x1041, dex_pc=0x1, register_mask=0x0, stack_mask=0b)
0x00001041: 83C418 add esp, 24
0x00001044: 5D pop ebp
0x00001045: C3 ret
0x00001046: 64FF15E0020000 call fs:[0x2e0] ; pTestSuspend
StackMap[2] (native_pc=0x104d, dex_pc=0x0, register_mask=0x0, stack_mask=0b)
0x0000104d: EBDE jmp -34 (0x0000102d)
1: int ExampleKt.kotlinLike-WZ4Q5Ns(int) (dex_method_idx=1)
CODE: (code_offset=0x00001060 size_offset=0x0000105c size=67)...
⋮
ART does not inline the call to divideUnsigned; instead it makes a regular method call. I filed an issue to track this.
Conclusion
It has been a long journey; congratulations on making it this far (or on just scrolling to the bottom of the article). Let's summarize.
- ART rewrites multiplication and division by a power of two as left and right shifts (with extra instructions to handle negative numbers).
- There is no significant performance gap between a right shift and division by a power of two.
- The Dalvik bytecode size of shifts, multiplication, and division is the same.
- No one optimizes unsigned division (at least not yet), but you probably are not using it anyway.
With these facts, you can answer the question at the beginning of the article.
On Android, should you divide by 2 or shift right by 1?
Neither! Use shifts only when you actually need bitwise operations, and use multiplication and division for all other arithmetic. I will start switching the bit shifts in the AndroidX collection library over to multiplication and division.