Android Framework: Operating System Basics

Recently, I read "In-depth Understanding of Android Kernel Design Thoughts (Second Edition)", which I personally found excellent and very rich in content. Below I extract the parts of the book that I consider most important, so that I can refer back to them at any time.

Computer Architecture

Hardware is the cornerstone of software, and all software functionality is ultimately realized by hardware. As a discipline, computer architecture is an abstraction over both software and hardware.

1.1 Von Neumann Architecture

1.2 Harvard Architecture
The Harvard architecture is not the opposite of the von Neumann architecture; both are stored-program systems. The difference is that in the Harvard architecture, instructions and data are not stored in the same memory; in other words, the Harvard architecture is an improvement and refinement of the von Neumann architecture.
Because instruction fetches and data accesses cannot proceed at the same time, the von Neumann architecture has no particular advantage in execution speed. A computer using the Harvard architecture, which stores instructions and data separately, can prefetch the next instruction while executing the current operation, so its throughput can be improved to a certain extent.
The disadvantage of the Harvard architecture is that it is more complex and requires two memories, so it is usually used where speed is critical and the cost budget is relatively high. Chips on the market using the Harvard architecture include ARM's ARM9 and ARM11 and ATMEL's AVR series.
Regardless of the structure, the basic elements they contain remain unchanged, namely:
CPU (central processing unit);
internal memory;
input device;
output device.
Among them, input and output devices are collectively referred to as I/O devices (external storage actually also falls into this category). The computer structure can therefore be simplified to:
central processing unit;
internal memory;
I/O devices.

What is an operating system

Operating systems share the following characteristics:
An operating system places certain requirements on the hardware it runs on.
The same operating system can be installed on different models of machines.
The operating system provides a usable human-computer interaction interface.
The operating system allows users to write and install programs.
The operating system "shoulder" two important tasks.
1. For the lower
management hardware. The hardware here is a general concept, which includes all hardware components in the system such as CPU, memory, Flash, and various I/O devices.
2. Facing the upper layer
On the one hand, the operating system needs to provide users with an available human-computer interaction interface; on the other hand, it is also responsible for providing a convenient, reliable, and efficient API (Application Programming Interface) for the development of third-party programs. In this way, the design and implementation of the upper-layer application does not need to be directly oriented to the hardware, thereby greatly reducing the time for application development.
Definition of operating system: A computer operating system is a collection of software responsible for managing system hardware and providing stable programming interfaces and human-computer interaction interfaces for upper-level applications.

Classic implementations of inter-process communication

Each process in the operating system usually runs in an independent memory space, and strict mechanisms prevent processes from accessing each other illegally. This does not mean, however, that processes are not allowed to communicate with one another.
Broadly speaking, inter-process communication (IPC) refers to the exchange of data between threads running in different processes, whether or not those processes are on the same machine.
Two things follow from this definition:
The processes participating in IPC may run on the same machine, or they may live in their own separate environments (as in RPC). When the processes run on different machines, they are usually connected by a network, which undoubtedly makes inter-process communication harder to implement.
There are many ways to achieve this. In principle, any data exchange across processes can be called inter-process communication. Besides traditional mechanisms such as message passing and pipes, some simpler methods can also serve for process communication where performance requirements are modest. For example:
File sharing
Two processes may agree to use a certain file on disk as the medium for exchanging information. In this case, special attention must be paid to synchronization when the processes access the shared file.
Public information mechanisms provided by the operating system
The registry on Windows, for example, is accessible to all processes, so under certain circumstances it can also serve as a platform for exchanging information between processes.
Although the inter-process communication mechanisms adopted by various operating systems vary widely, the ones discussed below are used in almost all operating systems because of their efficiency and stability.

1.1 Shared Memory

Shared memory is a commonly used inter-process communication mechanism. Because the two processes can directly access the same memory region, data copy operations are reduced and its speed advantage is obvious. In general, the steps to implement shared memory are as follows.
Step 1. Create the shared memory region
Process 1 first requests a shared region from the operating system through the API it provides; in a Linux environment, for example, this can be done with the shmget function. The resulting shared memory block is bound to a specific key (the first parameter of shmget).
Step 2. Map the shared memory region
After the shared region has been created successfully, it must be mapped into the address space of process 1 before further operations. In Linux, this step is done with shmat.
Step 3. Access the shared memory region
Process 1 has created the shared region, so how does process 2 access it? With the key from the first step: process 2 simply calls shmget with the same key value, and then calls shmat to map the memory into its own address space.
Step 4. Inter-process communication
Once each process has mapped the shared memory, they can use this region to exchange information. Because shared memory itself provides no synchronization mechanism, the communicating processes must coordinate on their own.
Step 5. Unmap the shared memory region
After communication is finished, each process must undo its earlier mapping. In Linux, this step is done with shmdt.
Step 6. Delete the shared memory region
Finally, the shared region must be deleted so that the memory can be reclaimed. In a Linux environment, this is done with the shmctl function.
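Below is a minimal sketch of these six steps on Linux. The key value, buffer size, and message are arbitrary choices for illustration, and error handling is abbreviated.

#include <sys/ipc.h>
#include <sys/shm.h>
#include <cstdio>
#include <cstring>

int main() {
    key_t key = 0x1234;                               // agreed-upon key known to both processes
    int shmid = shmget(key, 4096, IPC_CREAT | 0666);  // Step 1: create/obtain the segment
    if (shmid < 0) { perror("shmget"); return 1; }

    void* addr = shmat(shmid, nullptr, 0);            // Step 2/3: map it into this process
    if (addr == reinterpret_cast<void*>(-1)) { perror("shmat"); return 1; }

    std::strcpy(static_cast<char*>(addr), "hello from process 1");  // Step 4: exchange data

    shmdt(addr);                                      // Step 5: unmap the region
    shmctl(shmid, IPC_RMID, nullptr);                 // Step 6: delete the segment
    return 0;
}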

1.2 Pipeline (Pipe)

The pipe is another common inter-process communication method; it is available on all POSIX systems as well as the Windows family of products.
The word "pipe" vividly describes the relationship between the two communicating parties, say process A and process B.
1. The two ends of the pipe separate the two parties for data transmission and communication.
2. A pipe is one-way: if a process wants both to "read" and to "write", two pipes must be created. This is much like a water pipe, in which the water can normally flow in only one direction.
3. A pipe has a "read" end and a "write" end. For example, process A writes data into the write end, and process B can then read that data from the read end.
4. A pipe has limited capacity: when the pipe is full, a write operation will block; conversely, when the pipe is empty, a read operation will block.
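Below is a minimal sketch of one-way pipe communication between a parent and a child process on a POSIX system; error handling is abbreviated.

#include <unistd.h>
#include <sys/wait.h>
#include <cstdio>
#include <cstring>

int main() {
    int fds[2];                  // fds[0] is the read end, fds[1] is the write end
    if (pipe(fds) < 0) return 1;

    if (fork() == 0) {           // child: reads from the pipe
        close(fds[1]);
        char buf[64] = {0};
        read(fds[0], buf, sizeof(buf) - 1);
        std::printf("child received: %s\n", buf);
        close(fds[0]);
        return 0;
    }
    close(fds[0]);               // parent: writes into the pipe
    const char* msg = "hello through the pipe";
    write(fds[1], msg, std::strlen(msg));
    close(fds[1]);
    wait(nullptr);
    return 0;
}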

1.3 UNIX Domain Socket

UNIX Domain Socket (UDS) was designed specifically for inter-process communication within a single machine, and is sometimes called an IPC socket.
The network socket everyone is familiar with is based on the TCP/IP protocol stack and requires operations such as packet segmentation and reassembly. Because UDS operates "safely and reliably" within one machine, its implementation does not depend on those protocols.
The most heavily used IPC mechanism in Android is Binder, followed by UDS. Available information indicates that before Android 2.2 the entire GUI architecture used Binder as the basis for inter-process communication; it was later abandoned for this purpose in favor of UDS, which shows that the latter still has certain advantages.
The typical flow of inter-process communication with UDS is the same as with a traditional socket; only the parameters differ. The example sketched below provides the following functionality:
the server listens for IPC requests;
the client initiates an IPC request;
the two parties successfully establish an IPC connection;
the client sends data to the server to prove that the IPC channel works.
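A minimal sketch of that flow on a POSIX system follows; the socket path and message are arbitrary choices for illustration, and error handling is abbreviated.

#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

static sockaddr_un make_addr(const char* path) {
    sockaddr_un addr{};
    addr.sun_family = AF_UNIX;
    std::strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    return addr;
}

// Server side: bind to the path, listen, accept, then read the client's data.
void run_server(const char* path) {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    sockaddr_un addr = make_addr(path);
    unlink(path);                                    // remove any stale socket file
    bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    listen(fd, 5);
    int client = accept(fd, nullptr, nullptr);
    char buf[64] = {0};
    read(client, buf, sizeof(buf) - 1);
    std::printf("server received: %s\n", buf);
    close(client);
    close(fd);
}

// Client side: connect to the same path and send a message.
void run_client(const char* path) {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    sockaddr_un addr = make_addr(path);
    connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    const char* msg = "hello over UDS";
    write(fd, msg, std::strlen(msg));
    close(fd);
}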

1.4 RPC (Remote Procedure Call)

The parties in RPC communication usually run on two different machines. Under the RPC mechanism, developers do not need to care much about how the intermediate transmission is actually carried out; this "transparency" greatly reduces development effort.

Generally speaking, a complete RPC call goes through the following steps:
1. The client process calls the stub interface;
2. The stub packs the parameters as the operating system requires and issues the corresponding system call;
3. The kernel completes the communication with the server, sending the client's data packet to the server-side kernel;
4. The server-side stub unpacks the data and calls the matching server procedure;
5. The server procedure performs the requested operation;
6. The result is returned to the client by reversing the steps above.

Classic implementations of synchronization mechanisms

Because the operating system supports concurrent execution of multiple processes (and threads), mutual constraints between them are inevitable. For example, two processes (threads) may need to share a single hardware device or the same memory region; or, as on a production line, the work of one process (thread) may depend on the result the other produces on a shared resource—in other words, they cooperate.
By definition, when multiple (two or more) processes must observe a timing relationship in order to work together on a task, the relationship is called synchronization; when there is no such cooperation and the relationship arises only because they share an exclusive resource, it is called mutual exclusion.
Below are several synchronization mechanisms common in operating systems.

Semaphore

The semaphore and the P/V primitive operations were invented by Dijkstra and are among the most widely used mutual-exclusion methods. The mechanism involves the following elements:
the semaphore S;
the operation P (from Dutch proberen, "to test"), sometimes written as wait();
the operation V (from Dutch verhogen, "to increment"), sometimes written as signal().
The semaphore S indicates the number of available shared resources. The P primitive decrements S, and V increments it. It follows that when a process wants to enter the shared region it must first execute the P operation, and when it wants to leave the shared region it executes the V operation. The P and V primitives are atomic operations, which means their execution cannot be interrupted.
The P operation proceeds as follows:
the semaphore S is decremented by 1;
if S is still ≥ 0, the shared resource may be accessed, so the caller returns immediately and starts operating on the shared resource; otherwise it must wait for someone else to release the resource, in which case the caller is added to the waiting queue until it is later woken up;
when someone releases the shared resource, the appropriate object (depending on the situation) in the waiting queue is woken up, and that object then has access to the resource.
The V operation proceeds as follows:
the semaphore S is incremented by 1;
if S > 0 at this point, no waiter currently wants to access the resource, so the operation returns directly;
otherwise, the V operation wakes up the relevant object in the waiting queue, corresponding to the last step of the P operation.
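A minimal sketch of P/V using POSIX unnamed semaphores is shown below: sem_wait plays the role of P and sem_post the role of V, and an initial count of 1 makes this behave like a binary semaphore protecting one shared resource.

#include <semaphore.h>
#include <thread>
#include <cstdio>

sem_t g_sem;
int g_shared = 0;

void worker(int id) {
    sem_wait(&g_sem);      // P: decrement; block if the resource is unavailable
    ++g_shared;            // critical section: operate on the shared resource
    std::printf("worker %d sees shared = %d\n", id, g_shared);
    sem_post(&g_sem);      // V: increment; wake a waiter if one exists
}

int main() {
    sem_init(&g_sem, 0, 1);            // shared between threads, initial value 1
    std::thread t1(worker, 1), t2(worker, 2);
    t1.join();
    t2.join();
    sem_destroy(&g_sem);
    return 0;
}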

Mutex

Mutex is short for mutual exclusion. So what are the differences and connections between it and the semaphore?
According to the generally accepted view in computing, a semaphore that allows multiple objects to access a resource at the same time is called a counting semaphore, while a semaphore whose value can only be 0 or 1 (i.e., locked/unlocked) is called a binary semaphore. The latter can be considered to have the same properties as a mutex. In other words, a mutex usually controls shared access to an exclusive resource: the resource is either occupied (locked) or accessible (unlocked). In many operating systems there is no essential difference between a binary semaphore and a mutex; the former is a particular use of the semaphore mechanism, while the latter is simpler to implement than a semaphore.

Monitor

The monitor is an extension and improvement of the semaphore mechanism; it is a synchronization method that is simpler to control.
Programs that use the semaphore mechanism are relatively hard to read, and the management of the semaphore is scattered among all the participating objects, which can cause a series of problems such as deadlock and process starvation. To make mutually exclusive access to resources easier to maintain, the concept of the monitor was proposed. It can be described as follows:
A monitor is an object or module that can be safely accessed by multiple processes/threads.
The methods of a monitor are protected by mutual exclusion, which means only one visitor may use them at a time. In addition, the monitor has the following attributes:
safety;
mutual exclusion;
sharing.

Linux Futex

Futex (Fast Userspace muTEXes) is a synchronization mechanism invented by Hubertus Franke and others. It first appeared in Linux 2.5.7 and became part of the mainline kernel in 2.6.x. Its core strength is already reflected in its name: "fast". The speed of futex comes mainly from the fact that it handles most synchronization scenarios in user space (it enters kernel space only when arbitration is required), which saves a great deal of system-call and context-switch time.
An important application of futex in Android is the ART virtual machine: if the ART_USE_FUTEXES macro is enabled in the Android build, the synchronization mechanisms inside the ART virtual machine are implemented with futex as their cornerstone.
When there is no contention, the futex mechanism can complete lock acquisition entirely in user mode, without entering kernel mode through a system call, which improves efficiency.
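Below is a rough, illustrative sketch of this idea on Linux (it is not the ART implementation, and it ignores several subtleties of production futex locks): the fast path is an atomic compare-and-swap entirely in user space, and only under contention does the thread enter the kernel through the futex system call.

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <atomic>

std::atomic<int> g_lock{0};   // 0 = unlocked, 1 = locked

static long futex(int* addr, int op, int val) {
    return syscall(SYS_futex, addr, op, val, nullptr, nullptr, 0);
}

void lock() {
    int expected = 0;
    // Fast path: no contention, no system call at all.
    while (!g_lock.compare_exchange_strong(expected, 1)) {
        // Slow path: sleep in the kernel until the value is no longer 1.
        futex(reinterpret_cast<int*>(&g_lock), FUTEX_WAIT, 1);
        expected = 0;
    }
}

void unlock() {
    g_lock.store(0);
    // Wake one waiter, if any.
    futex(reinterpret_cast<int*>(&g_lock), FUTEX_WAKE, 1);
}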

A synchronization example

Having looked at the principles of the synchronization mechanisms above, let us analyze one more example: the classic producer-consumer problem. This model appears in many places in the Android system source code, such as the data exchange between AudioTrack and AudioFlinger in the audio subsystem.
The producer-consumer problem is described as follows:
two processes share a buffer of size N; one process fills it with data (the producer), while the other reads data from it (the consumer).
The core of the problem has two points:
when the buffer is full, the producer must stop adding data until the consumer has read some of it;
when the buffer is empty, the consumer must wait for the producer to produce more before it can operate.
Solving this problem with semaphores requires three of them, with the following roles:
S_emptyCount: lets the producer track the available buffer space; its initial value is N.
S_fillCount: lets the consumer track the amount of available data; its initial value is 0.
S_mutex: protects operations on the buffer; its initial value is 1.
For the producer, the execution steps are as follows:
loop start;
Produce_item;
P(S_emptyCount);
P(S_mutex);
Put_item_to_buffer;
V(S_mutex);
V(S_fillCount);
continue the loop.
For the consumer, the execution steps are as follows:
loop start;
P(S_fillCount);
P(S_mutex);
Read_item_from_buffer;
V(S_mutex);
V(S_emptyCount);
Consume;
continue the loop.
At the start, S_emptyCount is N and S_fillCount is 0, so after P(S_fillCount) the consumer is waiting. The producer first produces an item, then performs P(S_emptyCount); because the count is N the resource is available, so the item can be placed into the buffer. The producer then increments the count of available items with V(S_fillCount), waking the consumer that is waiting on this semaphore. The consumer reads the data and uses V(S_emptyCount) to indicate that the buffer has one more empty slot.
Producer and consumer repeat this cycle to complete the whole job; in this way the semaphore mechanism guarantees that they execute in an orderly fashion.
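A runnable sketch of this scheme using POSIX semaphores and a fixed-size ring buffer is shown below; the buffer size, number of items, and printing are arbitrary choices for illustration.

#include <semaphore.h>
#include <thread>
#include <cstdio>

constexpr int N = 4;
int buffer[N];
int in = 0, out = 0;

sem_t s_emptyCount;   // free slots, initialized to N
sem_t s_fillCount;    // filled slots, initialized to 0
sem_t s_mutex;        // protects the buffer, initialized to 1

void producer() {
    for (int item = 0; item < 10; ++item) {      // produce_item
        sem_wait(&s_emptyCount);                 // P(S_emptyCount)
        sem_wait(&s_mutex);                      // P(S_mutex)
        buffer[in] = item;                       // put_item_to_buffer
        in = (in + 1) % N;
        sem_post(&s_mutex);                      // V(S_mutex)
        sem_post(&s_fillCount);                  // V(S_fillCount)
    }
}

void consumer() {
    for (int i = 0; i < 10; ++i) {
        sem_wait(&s_fillCount);                  // P(S_fillCount)
        sem_wait(&s_mutex);                      // P(S_mutex)
        int item = buffer[out];                  // read_item_from_buffer
        out = (out + 1) % N;
        sem_post(&s_mutex);                      // V(S_mutex)
        sem_post(&s_emptyCount);                 // V(S_emptyCount)
        std::printf("consumed %d\n", item);      // consume
    }
}

int main() {
    sem_init(&s_emptyCount, 0, N);
    sem_init(&s_fillCount, 0, 0);
    sem_init(&s_mutex, 0, 1);
    std::thread p(producer), c(consumer);
    p.join();
    c.join();
    return 0;
}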

Synchronization mechanism in Android

The synchronization classes encapsulated by Android include the following.
Mutex
Header file: frameworks/native/include/utils/Mutex.h.
The Mutex in Android is just a thin repackaging of the API provided by pthread, so the function declarations and implementations sit in the same header file, which is also convenient for callers.
In addition, Mutex contains a nested class AutoLock, an auxiliary class designed to take advantage of variable lifetime.
Condition
Header file: frameworks/native/include/utils/Condition.h.
Condition is the implementation of the "condition variable" in the Android system; it relies on Mutex to do its work.
Barrier
Header file: frameworks/native/services/surfaceflinger/Barrier.h.
Barrier is a model built on top of both Mutex and Condition.

Synchronization between processes - Mutex

class Mutex {
public:
    enum {
        PRIVATE = 0, // synchronization within the same process only
        SHARED = 1   // supports synchronization across processes
    };

This shows that Mutex can handle not only intra-process synchronization but also inter-process synchronization.
If its type is specified as SHARED when the Mutex is constructed, it is suitable for sharing across processes; in that case Mutex additionally calls the pthread_mutexattr_setpshared interface to set the PTHREAD_PROCESS_SHARED attribute on the underlying mutex.
Unlike a semaphore, a Mutex has only two states, 0 and 1, so this class provides only three important interface functions:

status_t lock();    // acquire the resource lock
void unlock();      // release the resource lock
status_t tryLock(); /* returns promptly whether or not it succeeds, instead of waiting */

When a caller wants to access a critical resource, it must first acquire the resource lock through lock(). If the resource is available, this function returns immediately; otherwise the caller blocks until someone releases the lock and wakes it up. To release the lock, the caller invokes unlock(); another object waiting on the lock is then woken up and continues with its task. In addition, Mutex provides tryLock() to meet more varied needs. This function only "tentatively" asks whether the lock is available: if so, it acquires it and returns success (return value 0), behaving just like lock(); but if the resource is temporarily unavailable, it does not wait and instead returns immediately with a non-zero value.
The implementation of these three functions is very simple, the specific source code is as follows:

inline status_t Mutex::lock() {
    return -pthread_mutex_lock(&mMutex); // the member mMutex is of type pthread_mutex_t
}
inline void Mutex::unlock() {
    pthread_mutex_unlock(&mMutex);
}
inline status_t Mutex::tryLock() {
    return -pthread_mutex_trylock(&mMutex);
}

Mutex is actually just a repackage based on the pthread interface.

Condition judgment - Condition

Condition literally means "condition". Its core idea is to check "whether the condition is satisfied": if it is, return immediately and carry on with the unfinished work; otherwise go to sleep and wait until someone wakes it up when the condition is met.
Could this be done with Mutex alone? In theory, yes. As an example, suppose two threads A and B share a global variable vari and behave as follows.
Thread A: keeps modifying vari; the value after each change is unknown.
Thread B: needs to do something when vari becomes 0.
Clearly both A and B access the shared resource vari, which is the problem domain of Mutex. But look at the details: thread A only wants access to vari, whereas thread B's real interest lies elsewhere—its true waiting condition is "vari equals 0".
If this were implemented with Mutex alone, thread B could only judge whether the condition holds by reading vari over and over, roughly like the following pseudocode:

while (1)  // loop forever until the condition is satisfied
{
    acquire_mutex_lock(vari);      // acquire the Mutex lock protecting vari
    if (0 == vari)                 // the condition is satisfied
    {
        release_mutex_lock(vari);  // release the lock
        break;
    }
    else
    {
        release_mutex_lock(vari);  // release the lock
        sleep();                   // sleep for a while
    }
}

For thread B it is unknown when the condition (vari == 0) will be met, which makes it very different from threads that merely use vari (such as thread A), so polling like this is obviously a huge waste of CPU time.
Here is an everyday example to deepen the understanding. Suppose there is a public toilet that only one person can use at a time. The people who want to use this resource fall into two categories: ordinary users (similar to thread A) and the staff member who replaces the toilet paper (similar to thread B). What happens if we use a Mutex to solve this resource-sharing problem?
For ordinary users it makes little difference: they queue up, use the toilet, and leave as usual.
But it is awkward for the staff member. Under the Mutex mechanism, the worker has to queue up like everyone else; only when his turn comes can he go in and check whether the paper has run out—replacing it if so, and otherwise leaving without doing anything, then queuing again, and so on. Suppose one pass through the queue takes 5 minutes and the probability that the paper actually needs replacing is only 1/10. The worker's efficiency is clearly very low, because most of his time is wasted waiting for the condition "the paper has run out".
So we need a different model for this special synchronization scenario. One feasible approach is that the staff member does not queue at all; instead, someone notifies him of the event "the toilet is out of paper". This shortens the queue, improves the utilization of the resource, and improves the worker's efficiency—killing two birds with one stone.

class Condition {
public:
    enum {
        // like Mutex, it supports sharing across processes
        PRIVATE = 0,
        SHARED = 1
    };
    …
    status_t wait(Mutex& mutex); // wait on a condition
    status_t waitRelative(Mutex& mutex, nsecs_t reltime); /* also waits on a
                                    condition, but adds a timeout */
    void signal();     // notify one waiter when the condition is met
    void broadcast();  // notify all waiters when the condition is met
private:
#if defined(HAVE_PTHREADS)
    pthread_cond_t mCond;
#else
    void* mState;
#endif
};
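As an illustration (this is not code from the book), here is how thread B from the example above could wait with Condition instead of polling; the variable and function names are made up for the sketch.

Mutex gLock;
Condition gCond;
int vari = 1;

// Thread A: modifies vari and signals when the condition may have become true.
void threadA_setValue(int newValue) {
    Mutex::Autolock _l(gLock);
    vari = newValue;
    if (vari == 0) {
        gCond.signal();          // wake the waiter only when the condition holds
    }
}

// Thread B: sleeps until vari reaches 0, instead of spinning in a loop.
void threadB_waitForZero() {
    Mutex::Autolock _l(gLock);
    while (vari != 0) {          // re-check after each wakeup
        gCond.wait(gLock);       // atomically releases gLock and sleeps
    }
    // vari == 0 here, and gLock is held again
}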

"Fence, Barrier" - Barrier

Condition means "condition", and Barrier means "fence" or "obstacle". The latter is an application of the former: a Barrier is a Condition with a specific, concrete condition filled in, which makes it a good example for understanding Condition:

class Barrier
{
public:
    inline Barrier() : state(CLOSED) { }
    inline ~Barrier() { }
    void open() {
        Mutex::Autolock _l(lock);
        state = OPENED;
        cv.broadcast();
    }
    void close() {
        Mutex::Autolock _l(lock);
        state = CLOSED;
    }
    void wait() const {
        Mutex::Autolock _l(lock);
        while (state == CLOSED) {
            cv.wait(lock);
        }
    }
private:
    enum { OPENED, CLOSED };
    mutable Mutex lock;
    mutable Condition cv;
    volatile int state;
};

Barrier provides three interface functions: wait(), open() and close(). Since it is said to be an instance of Condition, what is the "condition"? A quick look shows that it is the variable state being equal to OPENED; the other state is of course CLOSED—much like a traffic barrier that is either open or closed. Before a car may pass, it must first confirm that the barrier is open, so it calls wait(); if the condition is not met, the car can only stop and wait. This function first acquires the Mutex lock and then calls into the Condition object cv. Why? We know that a Mutex guarantees mutually exclusive access to a shared resource, which tells us that what wait() does next involves such a resource: the state variable. Imagine what could go wrong if access to state were not locked while wait and open/close operate on it at the same time.
Suppose the following sequence occurs:
Step 1. Thread A reads state in wait() and finds it is CLOSED.
Step 2. Thread B sets state to OPENED in open().
Step 3. open() wakes up any waiting threads. Because thread A has not yet gone to sleep, there is in fact no thread to wake.
Step 4. Thread A then goes to sleep because it saw state == CLOSED, even though the barrier has in fact already been opened—so the caller of wait() will never be woken up.
This makes it very clear that access to state must be protected by a mutex.
Next, we analyze the implementation of Condition::wait():

inline status_t Condition::wait(Mutex& mutex) {
    
    
 return -pthread_cond_wait(&mCond, &mutex.mMutex);
}

Like Mutex, the API method provided by pthread is directly called.
The logical semantics of pthread_cond_wait are as follows.
Step1. Release the lock mutex.
Step2. Enter sleep and wait.
Step3. Acquire the mutex lock after waking up.
In other words, it goes through a release-then-reacquire sequence on the lock. Why is it designed this way?
Since wait() is about to sleep, if it did not release the Mutex lock first, how could open()/close() ever access the "condition variable" state? The program would fall into a deadlock in which the two sides wait for each other. So wait() must release the lock first and then go to sleep; later, because open() releases the lock when it finishes, wait() gets the chance to reacquire the Mutex lock.
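For a concrete feel, here is a small, hypothetical usage sketch: one thread blocks in wait() until another thread calls open().

Barrier gReadyBarrier;

void workerThread() {
    gReadyBarrier.wait();        // blocks while the barrier is CLOSED
    // ... proceed once initialization has completed ...
}

void initThread() {
    // ... perform one-time initialization ...
    gReadyBarrier.open();        // sets state to OPENED and wakes all waiters
}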

Automatic locking and unlocking - Autolock

There is an Autolock class nested inside the Mutex class. Literally, its job should be to make locking and unlocking automatic; how is that achieved?
It is actually very simple; a look at the constructor and destructor of this class makes it clear:

class Autolock {
public:
    inline Autolock(Mutex& mutex) : mLock(mutex) { mLock.lock(); }
    inline Autolock(Mutex* mutex) : mLock(*mutex) { mLock.lock(); }
    inline ~Autolock() { mLock.unlock(); }
private:
    Mutex& mLock;
};

When an Autolock is constructed, it calls the lock() method of its member mLock to acquire the lock; on destruction it does the opposite, calling unlock() to release it.
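A small, hypothetical usage sketch: the lock is acquired when the local Autolock is constructed and released automatically when it goes out of scope, even on early returns.

class Counter {
public:
    void increment() {
        Mutex::Autolock _l(mLock);   // lock() runs in the constructor
        ++mValue;
    }                                // unlock() runs in the destructor, on scope exit
private:
    Mutex mLock;
    int mValue = 0;
};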

Read-write lock - ReaderWriterMutex

Let us focus on a special mutex in the ART virtual machine: ReaderWriterMutex. As the name suggests, it is a read-write lock. Compared with an ordinary mutex, it mainly provides the following additional interfaces:

void ExclusiveLock(Thread* self) ACQUIRE();
void ExclusiveUnlock(Thread* self) RELEASE();
bool ExclusiveLockWithTimeout(Thread* self, int64_t ms, int32_t ns)
 EXCLUSIVE_TRYLOCK_FUNCTION(true);
void SharedLock(Thread* self) ACQUIRE_SHARED() ALWAYS_INLINE;
void SharedUnlock(Thread* self) RELEASE_SHARED() ALWAYS_INLINE;

Here Exclusive and Shared correspond to write and read access respectively, which neatly captures the lock's characteristic: multiple objects may share the read lock, but at any moment only one object may hold the write lock. A ReaderWriterMutex can be in one of three states.
Free: not held by any object.
Exclusive: currently held by exactly one object.
Shared: held by multiple objects at once.
Roughly speaking, in the Free state both shared and exclusive requests succeed; in the Shared state further shared requests succeed while an exclusive request must wait; and in the Exclusive state all other requests must wait.
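ReaderWriterMutex itself is internal to ART, but the same reader-writer semantics can be sketched with a POSIX rwlock; the example below is an analogy, not the ART implementation.

#include <pthread.h>

pthread_rwlock_t g_rwlock = PTHREAD_RWLOCK_INITIALIZER;
int g_table_size = 0;

int readSize() {                     // "SharedLock": many readers may enter at once
    pthread_rwlock_rdlock(&g_rwlock);
    int size = g_table_size;
    pthread_rwlock_unlock(&g_rwlock);
    return size;
}

void updateSize(int newSize) {       // "ExclusiveLock": only one writer at a time
    pthread_rwlock_wrlock(&g_rwlock);
    g_table_size = newSize;
    pthread_rwlock_unlock(&g_rwlock);
}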

Basics of operating system memory management

Memory management aims to provide stable and reliable memory allocation, release, and protection for all tasks in the system.
Understanding memory requires several core concepts:
virtual memory;
memory allocation and reclamation;
memory protection.

Virtual Memory

Virtual memory makes it possible to run large programs. Its basic idea is as follows.
Part of the external storage is used as an extension of main memory, for example by setting aside 4 GB of the hard disk.
When memory runs short, the system automatically selects low-priority data blocks according to some algorithm and stores them on the hard disk.
If those data blocks are needed again later, the system raises a "page fault" and swaps them back into memory.
These operations are performed automatically by the operating system kernel and are completely transparent to upper-layer applications.

Memory allocation and recovery

For applications, the allocation and reclamation of memory is the primary concern. In other words, this is where program developers interact directly with the operating system's memory management module.
As an important part of the operating system, the memory management module also fits the definition of an operating system given earlier: it is the "superstructure" that controls and uses a piece of hardware (in this case main memory) and provides effective interfaces to it. The core issues the Linux kernel has to face include, but are not limited to, the following.
1. Ensuring hardware independence
The physical memory model, size, and even architecture of each hardware platform may differ. These differences must not show through to application programs; the operating system should be as "transparent" as possible to the layers above.
2. Dynamic memory allocation and reclamation
Many issues must be considered here, such as how to divide memory into different usage areas; the allocation granularity, i.e. the smallest unit of allocation; how to track and distinguish used from unused memory; and how to reclaim and reuse it.
3. Memory fragmentation
Like disks, memory suffers from fragmentation. Consider a simple example.
Suppose the initial state has 6 unused memory units. As programs keep requesting memory, the first 5 blocks are allocated. If a program now releases the second unit, we end up with two discontiguous unused memory units—this is fragmentation.
In the Android system, memory allocation and reclamation fall into two areas.
Native layer
Native-layer programs are basically written in C/C++, and the memory functions developers deal with directly include malloc/free, new/delete, and so on.
Java layer
Most Android applications are written in Java. Compared with C, Java puts a great deal of effort into memory management, which to some extent frees developers from many memory problems. But Java is not omnipotent: developers still need to maintain good memory usage habits and have a deeper understanding of the memory management mechanisms Android provides.

Inter-process communication - mmap

Before an upper-layer application can use the Binder driver, it must first set up the environment for it to work properly through mmap().
As the name (memory map) suggests, mmap maps a device or file into the memory space of the application process, so that accessing that memory becomes equivalent to reading and writing the device or file, without going through read() and write(). It follows that, in theory, mmap can also be used for inter-process communication: processes share memory by mapping the same piece of physical memory. Because it reduces the number of data copies, this method can improve the efficiency of inter-process communication to a certain extent.
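Below is a minimal sketch of the shared-mapping idea for related processes: an anonymous MAP_SHARED mapping created before fork() is visible to both parent and child without extra copies (mapping a device such as the Binder driver follows the same pattern, but with the device's file descriptor).

#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main() {
    char* shared = static_cast<char*>(
        mmap(nullptr, 4096, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0));
    if (shared == MAP_FAILED) return 1;

    if (fork() == 0) {                       // child writes into the mapping
        std::strcpy(shared, "written by child");
        return 0;
    }
    wait(nullptr);                           // parent sees the child's data
    std::printf("parent read: %s\n", shared);
    munmap(shared, 4096);
    return 0;
}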

Copy on Write

COW (Copy on Write) is a very key technology in Linux. Its basic idea can be summed up in one sentence, that is, multiple objects share certain resources (such as code segments and data segments) at the beginning, and do not have their own copy until an object needs to modify the resource.
When we call fork() to create a child process, the kernel does not immediately allocate separate physical memory for the newly "separated" child; instead, the child keeps sharing the parent's existing resources. This makes the "separation" very fast—in theory only a new bookkeeping entry needs to be registered. Only when the new process is "not satisfied" with the existing resources and wants to make its own modifications does the kernel need to give it its own copy to work with. In particular, if the child calls exec soon after fork() (which is very likely) to load an image quite different from the parent's, COW clearly avoids a great deal of unnecessary resource copying and thus improves speed.
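A minimal sketch of this fork-then-exec pattern is shown below; the program being executed ("ls") is an arbitrary choice for illustration.

#include <unistd.h>
#include <sys/wait.h>
#include <cstdio>

int main() {
    pid_t pid = fork();               // cheap: page tables are copied, pages are shared COW
    if (pid == 0) {
        // child: replace the shared image with a new program almost immediately
        execlp("ls", "ls", "-l", static_cast<char*>(nullptr));
        _exit(1);                     // reached only if exec fails
    }
    waitpid(pid, nullptr, 0);
    std::printf("child finished\n");
    return 0;
}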

Low Memory Killer in Android

A common feature of embedded devices is relatively limited memory. When too many programs are running, or complex computation is involved, memory can easily run short and the system may freeze. Android is no exception: it too faces a shortage of physical memory on the device. In addition, careful developers will have noticed that an Android program that has already been started once starts noticeably faster the next time. The reason is that Android does not immediately clean up programs that have "faded out of view" (for example, after calling Activity.finish to leave the UI); they remain resident in memory for some time, even though the user is no longer aware of them. The advantage is obvious: the next launch does not have to create a new process for the program. The disadvantage also exists: the probability of OOM (Out Of Memory) increases.
So how do we strike the right balance?
Developers familiar with Linux know that the kernel has its own memory monitoring mechanism, the OOM killer. Once the system's available memory falls to a critical level, the OOM handler automatically steps in to "clean up the mess". The handling differs slightly depending on the strategy, but the core idea is always the same:
kill processes in order of priority, from low to high, and reclaim their memory.
On the one hand, the priority policy must consider how much damage killing a process does to the system (core system processes usually get higher priority); on the other hand, it should free as much unneeded memory as possible. Experience suggests a reasonable policy should at least combine the following factors:
memory consumed by the process;
CPU time occupied by the process;
oom_adj (OOM weight).

Android anonymous shared memory (Anonymous Shared Memory)

Anonymous shared memory (Ashmem) is a memory-sharing mechanism unique to Android. It maps specified physical memory into the virtual address space of each process, making it convenient to share memory between processes.
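For illustration only, here is a rough native-side sketch that assumes the NDK ASharedMemory API (android/sharedmem.h, available since API level 26); the region name and size are arbitrary. The returned file descriptor can be passed to another process (for example over Binder) and mapped there with mmap so that both processes share the same physical pages.

#include <android/sharedmem.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstring>

int createAshmemRegion() {
    const size_t kSize = 4096;
    int fd = ASharedMemory_create("demo_region", kSize);   // name is illustrative
    if (fd < 0) return -1;

    void* addr = mmap(nullptr, kSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { close(fd); return -1; }

    std::strcpy(static_cast<char*>(addr), "hello via ashmem");
    munmap(addr, kSize);
    return fd;   // hand this fd to the peer process
}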

JNI

JNI (Java Native Interface) is a programming framework that allows Java programs running in the JVM to call native code, and vice versa (JNI-oriented native code is usually written in C, C++, or assembly). Native code is usually tied to the hardware or the operating system, which to some extent hurts Java's portability; nevertheless, it is sometimes necessary. In the Android system, for example, a large number of JNI methods are used to call native-layer implementation libraries.
There are usually three situations that call for JNI:
1. The application needs the support of some platform-related features, which Java cannot meet.
2. Compatible with previous code bases written in other languages. The use of JNI technology allows the Java layer code to access these old libraries to achieve a certain degree of code reuse.
3. Certain key operations of the application require higher speed. This part of the code can be written in a low-level language such as assembly, and then provide an access interface to the Java layer through JNI.
Native implementation of Java methods
The steps to create a native function that can be called from Java code are as follows:
1. Add the native declaration to the Java method that needs a native implementation;
2. Compile the Java class with the javac command;
3. Generate the .h header file with javah;
4. Implement the native method in native code;
5. Compile the native code into a dynamic link library;
6. Load this dynamic link library in the Java class;
7. Other places in the Java code can then call this native method normally.
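A minimal, hypothetical sketch of steps 1 and 4: the Java declaration is shown in the comment, and the function below is the matching native implementation (the package, class, and library names are made up).

// Hypothetical Java side:
//
//   package com.example.demo;
//   public class NativeLib {
//       static { System.loadLibrary("demo"); }       // step 6
//       public static native int add(int a, int b);  // step 1
//   }
//
// The JNI function name follows the Java_<package>_<class>_<method> convention.
#include <jni.h>

extern "C" JNIEXPORT jint JNICALL
Java_com_example_demo_NativeLib_add(JNIEnv* /*env*/, jclass /*clazz*/,
                                    jint a, jint b) {
    return a + b;   // step 4: the actual native implementation
}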

Reflection mechanism in Java

The most common way to create an object of a class in Java is:
FileOutputStream fout = new FileOutputStream(fd);
In this situation the class type can be determined at compile time, so the compiler can do a lot of optimization work around the new keyword; this scenario usually gives the best runtime efficiency and is the preferred approach.
For cases where the class cannot be determined at compile time, we want a technique that can create an object dynamically while the program is running—this is the reflection mechanism.
Reflection gives a program the ability to examine and adjust its behavior at run time. Taking the dynamic creation of a class instance as an example, let us briefly analyze how it is implemented internally:

Class<?> clazz = null;
try {
    clazz = Class.forName("android.media.MediaMetadataRetriever");
    Object instance = clazz.newInstance();
    Method method = clazz.getMethod("setDataSource", String.class);
    method.invoke(instance, filePath);
} catch (Exception e) {
    e.printStackTrace();
}

Class.forName descends to the native layer through the native function classForName. The corresponding function in the Android N source is as follows:

/* art/runtime/native/java_lang_Class.cc */
static jclass Class_classForName(JNIEnv* env, jclass, jstring javaName,
                                 jboolean initialize, jobject javaLoader) {
    ScopedFastNativeObjectAccess soa(env);
    ScopedUtfChars name(env, javaName);
    Handle<mirror::ClassLoader> class_loader(
        hs.NewHandle(soa.Decode<mirror::ClassLoader*>(javaLoader)));
    ClassLinker* class_linker = Runtime::Current()->GetClassLinker();
    Handle<mirror::Class> c(
        hs.NewHandle(class_linker->FindClass(soa.Self(), descriptor.c_str(), class_loader)));
    return soa.AddLocalReference<jclass>(c.Get());
}

As can be seen, forName ultimately locates the target class object through ClassLinker::FindClass. The other interfaces offered by the reflection mechanism work similarly: only through the unified management of the virtual machine can the program be given such flexible dynamic capabilities while still running correctly.


Origin blog.csdn.net/jifashihan/article/details/129274469