Analysis of APK from the perspective of Android system (1) - Static analysis of APK files

Preface

  • For an Android phone user, installing an app is basically a three-step process in the app market: search -> download -> install. For the Android system, however, it is a major undertaking. An APK is an "outsider" to the system, which raises questions such as how to install it, how to run it under restrictions, and how to prevent it from doing harm.
  • The reason for writing this column is to enable customers to quickly install large .
  • This column analyzes the entire APK life cycle (installation, running, uninstallation) from the perspective of the Android system, covering static analysis of APKs, PackageManagerService, the pm command, PackageInstaller, and installd.
  • The code discussed is based on LineageOS 14.1 (Android 7.1.1). The column draws on a number of blogs and books, which are not listed individually. It is intended only for learning and knowledge sharing.

1 APK file structure

  Android applications are distributed and installed as APK files. An APK contains the application's code and resources, including its manifest file, and may also contain code signing files. The APK format extends the Java JAR format, which in turn extends the widely used ZIP format, so an APK can be unpacked with any ZIP tool. The following is the unpacked content of a typical APK file:

pedro@x86:$ tree -L 2
.
├── AndroidManifest.xml        app manifest (declares the app's properties)
├── classes2.dex               code compiled from Java source
├── classes.dex
├── assets                     raw resources: sounds, fonts, web pages, ...
├── lib                        native libraries called by the app
│   ├── armeabi
│   └── arm64-v8a
├── META-INF                   APK signature files (*.RSA, *.SF, *.MF)
│   ├── androidx.activity_activity.version
│   ├── ...omitted...
│   ├── androidx.viewpager2_viewpager2.version
│   ├── androidx.viewpager_viewpager.version
│   ├── CERT.RSA
│   ├── CERT.SF
│   ├── com
│   └── MANIFEST.MF
├── res                        resource directory used by the app
│   ├── anim                   animation resources
│   ├── animator
│   ├── animator-v21
│   ├── anim-v21
│   ├── color                  color resources
│   ├── color-v23
│   ├── drawable               drawable image resources
│   ├── drawable-hdpi-v4
│   ├── drawable-ldrtl-hdpi-v17
│   ├── drawable-ldrtl-mdpi-v17
│   ├── drawable-ldrtl-xhdpi-v17
│   ├── drawable-ldrtl-xxhdpi-v17
│   ├── drawable-ldrtl-xxxhdpi-v17
│   ├── drawable-mdpi-v4
│   ├── drawable-v21
│   ├── drawable-v23
│   ├── drawable-v24
│   ├── drawable-watch-v20
│   ├── drawable-xhdpi-v4
│   ├── drawable-xxhdpi-v4
│   ├── drawable-xxxhdpi-v4
│   ├── interpolator
│   ├── interpolator-v21
│   ├── layout                 page layout files
│   ├── layout-land
│   ├── layout-sw600dp-v13
│   ├── layout-v21
│   ├── layout-v26
│   ├── layout-watch-v20
│   ├── mipmap-anydpi-v26
│   ├── mipmap-hdpi-v4
│   ├── mipmap-mdpi-v4
│   ├── mipmap-xhdpi-v4
│   ├── mipmap-xxhdpi-v4
│   ├── mipmap-xxxhdpi-v4
│   └── xml                    app configuration files
└── resources.arsc             compiled resource file (e.g. strings.xml)
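
  Because an APK is a ZIP archive, its layout can also be inspected programmatically. The following is a minimal Python sketch, not part of the original column: it builds a tiny in-memory ZIP that imitates the layout above (the entry names and placeholder contents are made up for illustration) and counts the entries under each top-level name. The helper names summarize_apk and build_demo_apk are hypothetical.

```python
import io
import zipfile
from collections import Counter


def summarize_apk(apk_bytes: bytes) -> Counter:
    """Count archive entries by top-level name (e.g. 'res', 'lib', 'classes.dex')."""
    counts = Counter()
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        for name in apk.namelist():
            top_level = name.split("/", 1)[0]
            counts[top_level] += 1
    return counts


def build_demo_apk() -> bytes:
    """Build a small in-memory ZIP that imitates an APK layout (placeholder data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as apk:
        for name in (
            "AndroidManifest.xml",
            "classes.dex",
            "resources.arsc",
            "lib/arm64-v8a/libdemo.so",
            "lib/armeabi/libdemo.so",
            "META-INF/MANIFEST.MF",
            "res/layout/main.xml",
        ):
            apk.writestr(name, b"placeholder")
    return buffer.getvalue()


if __name__ == "__main__":
    for entry, count in sorted(summarize_apk(build_demo_apk()).items()):
        print(f"{entry}: {count}")
```

  To try this on a real APK, read the file's bytes and pass them to summarize_apk instead of the demo archive.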

  The AndroidManifest.xml file registers all the key information about the APK: the four component types (activities, services, broadcast receivers, content providers), the package name, meta-data entries, and so on. Analyzing it helps us understand the architecture of the whole application. The most important files in an unpacked APK are classes.dex, AndroidManifest.xml, the .so files (compiled C/C++ code, present only if the app uses JNI), and resources.arsc (the compiled resource file).
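
  As a rough illustration of what the manifest registers, the Python sketch below extracts the package name, the four component types, and the meta-data entries from a manifest in plain-text XML form. Note the assumptions: the manifest stored inside an APK is in a compiled binary XML format and has to be decoded first (for example with aapt or apktool), and the demo manifest, its package name com.example.demo, and the function parse_manifest are all made up for this example.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"
COMPONENT_TAGS = ("activity", "service", "receiver", "provider")

# Hypothetical, already-decoded manifest used only for demonstration.
DEMO_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.demo">
    <application>
        <activity android:name=".MainActivity" />
        <service android:name=".SyncService" />
        <receiver android:name=".BootReceiver" />
        <provider android:name=".DemoProvider"
            android:authorities="com.example.demo.provider" />
        <meta-data android:name="channel" android:value="demo" />
    </application>
</manifest>
"""


def parse_manifest(xml_text: str) -> dict:
    """Collect the package name, declared components, and meta-data entries."""
    root = ET.fromstring(xml_text.strip())
    name_attr = f"{{{ANDROID_NS}}}name"
    value_attr = f"{{{ANDROID_NS}}}value"
    info = {
        "package": root.get("package"),
        "components": {tag: [] for tag in COMPONENT_TAGS},
        "meta_data": {},
    }
    application = root.find("application")
    if application is None:
        return info
    for tag in COMPONENT_TAGS:
        info["components"][tag] = [el.get(name_attr) for el in application.findall(tag)]
    for el in application.findall("meta-data"):
        info["meta_data"][el.get(name_attr)] = el.get(value_attr)
    return info


if __name__ == "__main__":
    print(parse_manifest(DEMO_MANIFEST))
```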

2 Types of APPs

  Android applications are developed mainly in Java. If the NDK is also used, C/C++ code is called through JNI. Many other languages are now available for Android development as well. For example, PhoneGap supports development in HTML5 + JavaScript, and Cocos2d-x supports cross-platform development in C/C++.

At present, mainstream application development methods are roughly divided into three categories (as shown in the figure below):

  • Web App
  • Hybrid App
  • Native App
    [Figure: the three categories of app development]

2.1 Native APP

  An app developed with Google's Android SDK in Java or C/C++ is called a Native App. A native app depends closely on the operating system and offers rich interaction. It is a complete, highly extensible app that users must download and install before use.

  • Advantages: the best user experience; stable performance; fast, smooth operation; access to local resources (address book, photo album); polished animations and transitions; system-level notifications and reminders; high user retention.
  • Disadvantages: high distribution cost (each platform has its own development language and UI adaptation); high maintenance cost (for example, when an app has reached version V4 but users are still on V2 and V3, more developers are needed to maintain the old versions); slow updates, because each platform has its own submission, review, and release process.

2.2 Web APP

  A Web App is written in HTML5 and does not need to be downloaded or installed. Like so-called light applications, it lives in the browser and can be thought of as a touch-screen web application. Some web apps are nevertheless packaged as APK files with the Android SDK and published to app markets. These are developed mainly in HTML5 + JavaScript, with a WebView serving as the local interface. The architecture is shown in the figure below.

  • Advantages: low development cost; fast updates; no need to notify users of updates or have them upgrade manually; runs across multiple platforms and devices. Because development and usage costs are low, many large companies have launched their own Web App building platforms, also known as light application platforms.
  • Disadvantages: only a temporary entry point; no access to system-level notifications, reminders, or animation effects; low user retention; many design restrictions; poor user experience.
    [Figure: Web App architecture]

2.3 Hybrid APP

  A Hybrid App is half native and half web, and it must be downloaded and installed. It looks like a Native App, but it is only a thin native UI around a WebView, and the content it displays comes from the web. News apps and video apps are common examples: they generally combine a native framework with web content.
  Hybrid apps also include apps developed in other languages (C#, JavaScript). Development in such a third-party language basically works by compiling the code into a .so file and then calling its logic through JNI.

3 Android Code Signing

  Why does Android need code signing? The usual reasons are integrity and authenticity: before executing any third-party program, you want to be sure that it has not been tampered with (integrity) and that it was really created by whoever claims to have created it (authenticity). The build/ directory of AOSP contains an Android-specific signing tool called signapk.
  The Android code signing mechanism is based on the Java JAR signing mechanism, so like many code signing schemes it uses public-key cryptography and X.509 certificates. In most such schemes, code signing certificates must be issued by a CA that the platform trusts. Although many CAs issue certificates, it is still difficult to obtain a single certificate that all target devices trust. Android solves this problem very simply: it does not care what the certificate contains or who issued it. Your certificate therefore does not need to be issued by a CA. In practice, the certificates used for Android code signing are almost always self-signed.
  Signature verification confirms both that the code has not been tampered with and that the signature was produced by the expected key. What code signing cannot settle directly is whether the signer (the software publisher) can be trusted. The usual way to establish trust is to require the signer to hold a digital certificate and attach it to the signed code. The verifier can then base its trust decision on a trust model (such as PKI or a web of trust) or decide case by case.
  Code signing also says nothing about whether the signed code is actually safe.
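
  The integrity half of JAR-style signing can be sketched in a few lines of Python. In that scheme, META-INF/MANIFEST.MF records a Base64-encoded digest for every file in the archive, and the .SF and .RSA files then sign the manifest. The sketch below covers only the digest comparison, under simplifying assumptions: it uses SHA-256 entries only, ignores the 72-byte line wrapping of real manifests, and does not verify CERT.SF or CERT.RSA at all. The function names and the demo file contents are hypothetical.

```python
import base64
import hashlib


def digest_b64(data: bytes) -> str:
    """Base64-encoded SHA-256 digest, the form stored in MANIFEST.MF entries."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def build_manifest(files: dict) -> str:
    """Create a simplified MANIFEST.MF with one digest section per file."""
    lines = ["Manifest-Version: 1.0", ""]
    for name in sorted(files):
        lines += [f"Name: {name}", f"SHA-256-Digest: {digest_b64(files[name])}", ""]
    return "\r\n".join(lines) + "\r\n"


def parse_manifest_digests(manifest_text: str) -> dict:
    """Map file name -> recorded digest (ignores wrapped continuation lines)."""
    digests = {}
    for section in manifest_text.replace("\r\n", "\n").split("\n\n"):
        fields = dict(
            line.split(": ", 1) for line in section.splitlines() if ": " in line
        )
        if "Name" in fields and "SHA-256-Digest" in fields:
            digests[fields["Name"]] = fields["SHA-256-Digest"]
    return digests


def find_tampered(files: dict, manifest_text: str) -> list:
    """Return names whose current digest differs from (or is missing in) the manifest."""
    recorded = parse_manifest_digests(manifest_text)
    return sorted(
        name for name, data in files.items() if recorded.get(name) != digest_b64(data)
    )


if __name__ == "__main__":
    original = {"classes.dex": b"code", "AndroidManifest.xml": b"manifest"}
    manifest = build_manifest(original)
    modified = dict(original, **{"classes.dex": b"patched code"})
    print(find_tampered(original, manifest))
    print(find_tampered(modified, manifest))
```

  A digest match alone proves nothing about who produced the manifest. That is the job of the signature over it, which this sketch leaves out.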

4 DEX files

  DEX files are the executable files of Android. Java source is first compiled into .class bytecode for the Java virtual machine (JVM), and that bytecode is then converted into the DEX format, which Android's Dalvik/ART virtual machine executes.
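
  A first step in statically analyzing a DEX file is reading its fixed-size header. The Python sketch below follows the published DEX header layout: an 8-byte magic ("dex\n" plus a version), an Adler-32 checksum, a SHA-1 signature, then the file size, header size (0x70), and endian tag. The helper build_demo_dex is hypothetical and produces only a header-shaped blob with all other fields zeroed, not a loadable DEX file. It exists so the example can run without a real classes.dex.

```python
import hashlib
import struct
import zlib

DEX_HEADER_SIZE = 0x70
ENDIAN_CONSTANT = 0x12345678


def build_demo_dex(payload: bytes = b"") -> bytes:
    """Build a header-shaped demo blob with valid checksum and signature fields."""
    file_size = DEX_HEADER_SIZE + len(payload)
    # Everything from offset 32 onward: file_size, header_size, endian_tag, zeros.
    tail = struct.pack("<III", file_size, DEX_HEADER_SIZE, ENDIAN_CONSTANT)
    tail += bytes(DEX_HEADER_SIZE - 32 - len(tail)) + payload
    signature = hashlib.sha1(tail).digest()                 # covers offset 32..end
    checksum = zlib.adler32(signature + tail) & 0xFFFFFFFF  # covers offset 12..end
    return b"dex\n035\x00" + struct.pack("<I", checksum) + signature + tail


def parse_dex_header(data: bytes) -> dict:
    """Read the leading header fields and re-check the checksum and signature."""
    if len(data) < DEX_HEADER_SIZE or data[:4] != b"dex\n" or data[7:8] != b"\x00":
        raise ValueError("not a DEX file")
    (checksum,) = struct.unpack_from("<I", data, 8)
    signature = data[12:32]
    file_size, header_size, endian_tag = struct.unpack_from("<III", data, 32)
    return {
        "version": data[4:7].decode("ascii"),
        "file_size": file_size,
        "header_size": header_size,
        "little_endian": endian_tag == ENDIAN_CONSTANT,
        "checksum_ok": checksum == (zlib.adler32(data[12:]) & 0xFFFFFFFF),
        "signature_ok": signature == hashlib.sha1(data[32:]).digest(),
    }


if __name__ == "__main__":
    print(parse_dex_header(build_demo_dex(b"demo payload")))
```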

5 .so files

  The bottom layer of the Android platform is Linux, and Linux software is traditionally written in C/C++. Android simply runs the Dalvik/ART virtual machine on top of it so that applications can be developed in Java. The Dalvik/ART virtual machine itself supports JNI programming.
  The NDK (Native Development Kit) is a toolkit for "Java + C" development. It ensures program compatibility and makes debugging and API calls convenient when developing in C/C++. Developing with the NDK also avoids some shortcomings of Java:

  • Code protection: Java code in an APK is easy to decompile, read, and tamper with, whereas a .so library written in C/C++ is harder to reverse engineer.
  • Reuse of existing open source libraries: most existing open source libraries (OpenCV, OpenGL, etc.) are written in C/C++.
  • Better execution efficiency: performance-critical application logic can be written in C to improve the app's execution efficiency.
  • Easy porting: a .so library written in C/C++ can be reused on other embedded platforms with little effort.
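
  The .so files under lib/ are ELF shared objects, one build per ABI directory (for example lib/armeabi for 32-bit ARM and lib/arm64-v8a for AArch64). As a small static-analysis sketch in Python, the code below reads the first bytes of the ELF header to report the word size and target architecture. The helper build_demo_elf_header is hypothetical and produces only the first 20 header bytes so that the example can run without a real library.

```python
import struct

# e_machine values from the ELF specification.
ELF_MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}
ET_DYN = 3  # e_type value for shared objects


def describe_elf(data: bytes) -> dict:
    """Report word size, object type, and architecture from an ELF header."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    byte_order = "<" if ei_data == 1 else ">"  # 1 = little-endian, 2 = big-endian
    e_type, e_machine = struct.unpack_from(byte_order + "HH", data, 16)
    return {
        "bits": {1: 32, 2: 64}.get(ei_class),
        "shared_object": e_type == ET_DYN,
        "machine": ELF_MACHINES.get(e_machine, f"unknown ({e_machine})"),
    }


def build_demo_elf_header(ei_class: int, e_machine: int) -> bytes:
    """Build the first 20 bytes of a little-endian ELF shared-object header."""
    e_ident = b"\x7fELF" + bytes([ei_class, 1, 1]) + bytes(9)  # 16 bytes total
    return e_ident + struct.pack("<HH", ET_DYN, e_machine)


if __name__ == "__main__":
    print(describe_elf(build_demo_elf_header(2, 183)))  # like lib/arm64-v8a
    print(describe_elf(build_demo_elf_header(1, 40)))   # like lib/armeabi
```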

6 APK Reverse Engineering

  Reverse engineering APKs is a deep subject in its own right. My work does not involve it, so I will not cover it here.


Origin blog.csdn.net/Xiaoma_Pedro/article/details/131150343