traced_perf in the Perfetto toolset

1. Overview of Perf tools

Linux offers many performance analysis tools. One of them, perf (specifically the linux-tools perf), was introduced in 2009 with Linux kernel 2.6.31. Its main function is to collect data from hardware performance counters (PMU), tracepoints, software performance counters (hrtimer), dynamic probes, and other sources. The Linux kernel abstracts these sources into the concept of events and exposes them to userspace processes through system calls (perf_event_open, etc.). As a command-line tool, perf reads these events and, for different performance analysis scenarios, provides subcommands such as stat, top, record, and report.

Android generally does not use the stock linux-tools perf. Instead, it uses perf-style tools customized for Android, which support features specific to the platform.

  • simpleperf:

simpleperf was first introduced in Android 6.0 (2015), so as of 2022 it has a history of seven years. Its main purpose is to provide the basic functionality of the Linux perf tool on Android.

  • traced_perf:

Google started developing traced_perf in 2019. It was developed as part of perfetto (where it acts as a producer, see section 3.1.1) rather than as a separate project. The goals were to:

a. Reuse perfetto's mature platform for profiling, unwinding, UI, and so on.

b. Satisfy Android's increasingly strict permission control and MAC (mandatory access control) requirements. simpleperf performs all of its functions inside a single, independent SELinux domain, which no longer meets the sandboxing requirements; strict domain isolation is needed.

This article focuses on traced_perf.

2. The structure of traced_perf

2.1. Code structure

The code of traced_perf is located in the external/perfetto/src/profiling/perf/ directory of AOSP. In other words, traced_perf lives in a subdirectory of the perfetto project.

The code in this directory is as follows:

c692aac8dad9f5280059e1be7aa41ef8.png

It can be seen that the code is divided into three categories:

  • Compilation script related: BUILD.gn

  • Unit test related: X_unittest.cc

  • main code logic

In addition to the above code directory, there is also a file in the main directory of perfetto: external/perfetto/traced_perf.rc

This file is the startup script for the traced_perf executable.

2.2 Runtime structure

According to the compilation script of external/perfetto/Android.bp, it can be seen that traced_perf will eventually be compiled into an executable file and installed to /system/bin/traced_perf. This executable file exists in the form of a daemon, and its startup and termination are controlled by its corresponding rc startup script.

2c463cf967b40c94849bc51206cc37d3.png

38649ef06ee1b8b54be7001e51c7b385.png

Its runtime life cycle can be analyzed through the traced_perf.rc file:

7f6a01d7e6dd0428bb368c3fbca7a0ae.png

88bde971d3e3a42a108c8f84e5a4bab9.png

  • Permission configuration of traced_perf

▫ The user of traced_perf is set to nobody. This keeps its permissions from affecting other users and ensures that an attacker who compromises the daemon does not gain elevated privileges.

▫ The groups of traced_perf are nobody, readproc, and readtracefs. readproc grants permission to read the /proc/PID directories, and readtracefs grants permission to read the directory where tracefs is mounted. Both are necessary for traced_perf to function properly.

▫ traced_perf is granted two capabilities, KILL and DAC_READ_SEARCH. KILL lets traced_perf send signals to other processes. DAC_READ_SEARCH bypasses file read permission checks and directory read and search permission checks, so traced_perf can open files that its user could otherwise not read or even look up. Both capabilities are required for traced_perf to function properly.

▫ task_profiles assigns traced_perf to a high-capacity cgroup profile (needed for unwinding), so that the scheduler gives it a reasonable share of resources.

  • Resources for traced_perf

traced_perf requests a unix socket named traced_perf. This socket is the communication channel between traced_perf and the process being profiled, and is covered in later sections.

  • Life cycle control of traced_perf

The life cycle of traced_perf is controlled through property triggers. When persist.traced_perf.enable is set to true, traced_perf is started automatically. It is also controlled by the sys.init.perf_lsm_hooks and traced.lazy.traced_perf properties.

3. The architecture of traced_perf

3.1 Framework of perfetto

traced_perf is an integral part of the perfetto toolset, which follows perfetto's service model. The service model of perfetto is shown in the figure below:

7f064e73b6d31678d129d614fbebed45.png

3.1.1 Producer

traced_perf is a producer of the Tracing Service. It interacts with the Tracing Service over two channels: an IPC channel and shared memory. The IPC channel is a unix socket, which is described in detail later.

The shared memory channel is established with the Tracing Service and serves two purposes:

1. Efficient inter-process data transfer. The data transferred is mainly structured sample data.

2. Isolation from the control flow, which limits the security and privacy risks if a process is compromised.

traced_perf acts as the producer side and provides Data Sources; each producer can provide multiple Data Sources. The Data Sources provided by traced_perf are linux.perf and a metatrace Data Source. They are described in detail later.

3.1.2 Tracing Service

As the core perfetto service on the device, the Tracing Service acts as the central controller. On the device it runs as the traced process. It receives configuration from the consumer and translates that configuration into control commands for the producers; it also serves as the bridge between producers and consumers. Data passed from producers to consumers goes through the trace buffer memory, which is not shared between processes, so data isolation is maintained.

cb2aa73a01b01f3030f3cbdba7a9faa3.png

3.1.3 Consumer

The consumer side refers to whatever consumes perfetto trace data, such as the Perfetto UI, shell commands, and Traceur. Consumers can also be customized: for example, a custom consumer can be added in Android to handle a custom Data Source.

The IPC channel between the consumer side and the Tracing Service is mainly connected through a unix socket.

3.2 Interaction between traced_perf and perfetto

3.2.1 Overall process

The overall process is shown in the diagram below. The interaction involves many classes inside Perfetto. Readers are advised to first understand the declarations and the roles of the C++ classes involved, rather than tracing the call flow from the start and getting lost in deeply nested logic. Once the roles and relationships of the key classes are clear, the call flow can be followed step by step through the call relationships.

626ef7fb28ecef689b14306c56dc4ca9.png

  • PerfProducer:

Call ConnectService to establish a connection process

Implement the OnConnect process

Implement OnTracingSetup, StartDataSource and other functions

  • ProducerEndPoint

Create the necessary objects to establish inter-process communication

Implement OnConnect

  • ClientImpl

Establish a Socket connection

Implement OnConnect, OnDataAvailable, etc.

It can be seen from the above process splitting that the responsibilities of each class are very clear.

3.2.2 IPC channel establishment

For traced_perf, the establishment of an IPC channel consists of the following important processes:

1. Instantiate task_runner and AndroidRemoteDescriptorGetter. task_runner is an instance of the Looper-style utility class used in traced_perf, and AndroidRemoteDescriptorGetter is a class that traced_perf uses to obtain per-process private data of the process to be profiled. Both are described in later sections.

2. Establish a connection with Tracing Service

3. Start the message loop

16bb8f7117381b8322654a6174b8b8a5.png

3.2.3 IPC Channel Framework

The framework of the IPC channel is relatively complicated. This section outlines how it works.

  • TaskRunner: A Looper-style interface. The instance used by PerfProducer is the Unix implementation of TaskRunner. This task_runner_ is passed between the various objects and is responsible for dispatching and processing messages.

  • ProducerEndPoint: The interface class of the producer side of the Tracing Service. It is instantiated through ProducerIPCClientImpl.

In order to be able to register the PerfProducer class as the producer of the Tracing Service, the following operations need to be performed:

08d4adab88d6d6f4de8c8e14e87dffd5.png

Among them, ProducerIPCClient::Connect is a static method that instantiates ProducerIPCClientImpl and returns it in the form of unique_ptr.

902590daffe1e9a6797f6c534514da7a.png

d473bd0e01154954001a8c41e28c8d1d.png

After the above process is completed, the event processing process of PerfProducer is actually established.

Focusing on ConnectService: the second argument of ProducerIPCClient::Connect is the this pointer. This hands the PerfProducer object pointer to the ProducerEndpoint object, where it arrives as the third parameter, producer, of the ProducerIPCClientImpl constructor.

3.2.3.1 Related concepts

  • DataSource

As the name suggests, a DataSource is a source of trace data. According to Perfetto's framework diagram, the consumer specifies which data sources to collect from, and the Producer provides them. perfetto defines data sources in proto form. PerfProducer wraps this definition and tracks each data source through DataSourceState.

dfc863f18ba08daaf27f48e89cb9e63e.png

5db27af76dcc189185ed7dbe5591f887.png

fdc5d6850fa18bb6da16b0df3a69efb8.png

865c4dd0b6ccc74bdb08d2a88ba83892.png

b02d2d199c895fa8eab5991c4e3e4303.png

A data structure corresponding to DataSource is the structure of DataSourceState in traced_perf. You can see that DataSourceState maintains a TraceWriter pointer, and this TraceWriter provides methods for writing Trace data.

b504b3c40239a248ea13c78676888200.png

1fce7bc5ea42a563cdb39ca11f01fddf.png

3.2.3.2 TraceWriter

The TraceWriter class lets users write trace data into perfetto's shared memory with zero copy, which makes writing trace data efficient.

  • NewTracePacket

Create a TracePacket and return a handle

  • FinishTracePacket

Complete the previously created TracePacket

  • Flush

Flush the written TracePackets to the service side

3.2.3.3 Reception of IPC messages

The ProducerEndPoint object communicates with the Tracing Service through the service_sock_name provided by the PerfProducer object. Once the connection is established, IPC begins, and the service sends commands in the protocol format defined by perfetto. The message protocol is as follows:

5e110bb624c88a99160c16d60ff2dff0.png

23479d3fb1c82b2eb77e956a20f02797.png

e9101a8f513047b81ea261f2d5fb9e23.png

The above message will be parsed by ProducerEndPoint and finally converted into a virtual function call of the Producer interface class (note that ProducerEndPoint maintains a pointer to a Producer (PerfProducer) instance).

Producer instances need to implement the following interfaces:

  • OnConnect

Called when the socket connection to the Tracing Service is established.

  • OnDisconnect

Called when the socket is disconnected from the Tracing Service. At this point the PerfProducer object can be destroyed.

  • OnStartupTracingSetup

Called before the first DataSource is created, so that initialization work can be done.

  • SetupDataSource

Called when a DataSource is set up. The parameters passed include DataSourceInstanceId and DataSourceConfig.

  • StartDataSource

Starts the DataSource.

  • StopDataSource

Stops the DataSource.

  • Flush

The Tracing Service asks the Producer to commit its pending data to shared memory.

  • ClearIncrementalState

After this call, the Producer should no longer refer to data it wrote to shared memory earlier.
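The way received commands end up as virtual calls on the Producer can be sketched as follows. This is a simplified, hypothetical model: ToyProducer, ToyEndpoint, Command, and RecordingProducer are illustrative names, not the real Perfetto classes or signatures. It only shows the dispatch pattern in which the endpoint keeps a pointer to a producer and turns each decoded command into a virtual method call.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the Producer interface. The real
// Perfetto interface has more methods and different parameter types.
class ToyProducer {
 public:
  virtual ~ToyProducer() = default;
  virtual void OnConnect() = 0;
  virtual void SetupDataSource(uint64_t instance_id,
                               const std::string& config) = 0;
  virtual void StartDataSource(uint64_t instance_id) = 0;
  virtual void StopDataSource(uint64_t instance_id) = 0;
};

// Hypothetical decoded IPC command.
enum class CommandType { kSetup, kStart, kStop };
struct Command {
  CommandType type;
  uint64_t instance_id;
  std::string config;
};

// Toy endpoint: holds a producer pointer and turns commands into virtual calls.
class ToyEndpoint {
 public:
  explicit ToyEndpoint(ToyProducer* producer) : producer_(producer) {
    assert(producer_ != nullptr);
  }
  void Connect() { producer_->OnConnect(); }
  void Dispatch(const Command& cmd) {
    switch (cmd.type) {
      case CommandType::kSetup:
        producer_->SetupDataSource(cmd.instance_id, cmd.config);
        break;
      case CommandType::kStart:
        producer_->StartDataSource(cmd.instance_id);
        break;
      case CommandType::kStop:
        producer_->StopDataSource(cmd.instance_id);
        break;
    }
  }

 private:
  ToyProducer* producer_;
};

// Example producer that records the callbacks it receives, in order.
class RecordingProducer : public ToyProducer {
 public:
  void OnConnect() override { log.push_back("connect"); }
  void SetupDataSource(uint64_t id, const std::string& config) override {
    log.push_back("setup:" + std::to_string(id) + ":" + config);
  }
  void StartDataSource(uint64_t id) override {
    log.push_back("start:" + std::to_string(id));
  }
  void StopDataSource(uint64_t id) override {
    log.push_back("stop:" + std::to_string(id));
  }
  std::vector<std::string> log;
};
```

Passing a RecordingProducer to a ToyEndpoint and dispatching setup, start, and stop commands leaves the corresponding entries in log, which mirrors how PerfProducer only sees its overridden functions being called.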

3.2.3.4 Sending of IPC messages

ProducerEndPoint provides the following interfaces:

  • Disconnect:

Disconnects from the Tracing Service. After this, no more callbacks from the Service are received.

  • RegisterDataSource

Register a DataSource

  • UpdateDataSource

Update a registered DataSource

  • RegisterTraceWriter

Register TraceWriter

  • CommitData

Notify Tracing Service that the data in shared memory has been updated.

  • CreateTraceWriter

Create TraceWriter

  • Other synchronization methods

4. Event processing of traced_perf

The previous chapter discussed how traced_perf fits into the perfetto framework. This chapter focuses on how traced_perf is implemented as a perfetto producer in order to profile per-thread counter information, obtain call stacks, and analyze performance issues.

As described in the previous chapter, traced_perf receives events from the Tracing Service through the IPC channel, and these events end up as calls to the overridden virtual functions of the Producer interface. As a producer of the Tracing Service, traced_perf has to implement these functions to complete the process.

15663d605dcbb359b15110345d14a458.png

The boxes in the figure are the event-processing states of PerfProducer, and the labels on the connecting lines are events that occurred in traced_perf or commands received through IPC.

4.1 Implementation of OnConnect

948dc7d77cc99dc291e385bc3607f284.png

The implementation of OnConnect is very simple. It first sets the connection state machine to the "kConnected" state, then instantiates two DataSourceDescriptors named "linux.perf" and "perfetto.metatrace", and finally registers them through the RegisterDataSource method of endpoint_, where endpoint_ is the pointer to the ProducerEndPoint object mentioned in the previous chapter.

4.2 Implementation of StartDataSource

StartDataSource takes two parameters, DataSourceInstanceID and DataSourceConfig. DataSourceInstanceID is a unique unsigned 64-bit id that identifies an instance of a DataSource. DataSourceConfig is a protobuf class generated from data_source_config.proto; for its definition, see: https://cs.android.com/android/platform/superproject/+/master:external/perfetto/protos/perfetto/config/data_source_config.proto;l=1;bpv=1;bpt=0

4.3 Start MetaTraceSource

ebff3693b662b08cddb190c1c434c72d.png

CreateTraceWriter is called through the endpoint_ smart pointer to create a TraceWriter object. Metatracing is then enabled, and the writer is stored in the map maintained by metatrace_writers_.

4.4 The lookup operation of the mapping between tracepoint and id

In the configuration file, a tracepoint is generally given by name, but the Linux kernel API identifies a tracepoint by its numeric id, so the name has to be mapped to an id. This id can generally be read from the events/GROUP/NAME/id file in tracefs.
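The name-to-id lookup can be sketched as below. This is a minimal illustration rather than the traced_perf implementation: the function names are hypothetical, and the tracefs mount point is passed in by the caller (a common location is /sys/kernel/tracing, which is an assumption here).

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Builds the path of the id file, e.g.
// <tracefs_root>/events/sched/sched_switch/id.
std::string TracepointIdPath(const std::string& tracefs_root,
                             const std::string& group,
                             const std::string& name) {
  return tracefs_root + "/events/" + group + "/" + name + "/id";
}

// Parses the decimal id from the contents of an id file.
// Returns false if the contents do not start with a number.
bool ParseTracepointId(const std::string& contents, uint32_t* id) {
  assert(id != nullptr);
  std::istringstream in(contents);
  uint32_t value = 0;
  if (!(in >> value)) return false;
  *id = value;
  return true;
}

// Reads and parses the id file of one tracepoint.
// Returns false if the file cannot be opened or parsed.
bool ReadTracepointId(const std::string& tracefs_root,
                      const std::string& group,
                      const std::string& name,
                      uint32_t* id) {
  std::ifstream file(TracepointIdPath(tracefs_root, group, name));
  std::string line;
  if (!file.is_open() || !std::getline(file, line)) return false;
  return ParseTracepointId(line, id);
}
```

The resulting id is what later goes into the config field of perf_event_attr.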

4.5 Open the event fd corresponding to the perf event

3f01c1bb4e3b199c039adcb24bff8165.png

First, the configuration encapsulated in protobuf format is converted into a perf_event_attr data structure, and then the system call provided by the Linux kernel is invoked to register the event with the kernel.

The Linux syscall for opening a perf event is perf_event_open. Its parameters are relatively complicated. For details, refer to the official documentation: https://man7.org/linux/man-pages/man2/perf_event_open.2.html

Here we focus on the key configuration information:

▫perf_event_attr

91cae745240487c5b0fa169d94d6fbf1.png

perf_event_attr is a relatively large structure that contains the attributes used to configure a perf event. Taking a relatively simple tracepoint event as an example, the following fields generally need to be set:

type: set to PERF_TYPE_TRACEPOINT

size: set to sizeof(perf_event_attr)

config: set to the tracepoint id obtained from the mapping in the previous step

sample_type: selects which data each sample contains

read_format: selects which data the value returned by read contains

Bit-field flags: nearly 30 on/off configuration items, such as whether to include mmap data and whether to include comm data

pid

Selects which process (pid) to collect perf events for

cpu

Selects which CPU to collect perf events for

group_fd

Events can be grouped so that multiple events are reported through the same event fd. The event that becomes the group leader is opened with group_fd set to -1, and subsequent events in the group pass the fd returned for the leader in this parameter.
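The fields and parameters listed above can be put together as in the following Linux-only sketch. The helper names MakeTracepointAttr and PerfEventOpen are hypothetical, and the chosen sample_type and sample_period values are illustrative assumptions rather than what traced_perf configures.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Fills a perf_event_attr for sampling one tracepoint. tracepoint_id is the
// value read from tracefs in the previous step.
perf_event_attr MakeTracepointAttr(uint32_t tracepoint_id) {
  perf_event_attr attr;
  std::memset(&attr, 0, sizeof(attr));
  attr.type = PERF_TYPE_TRACEPOINT;
  attr.size = sizeof(attr);
  attr.config = tracepoint_id;
  attr.sample_period = 1;  // illustrative: one sample per tracepoint hit
  attr.sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_TIME | PERF_SAMPLE_CPU;
  attr.read_format = 0;
  attr.disabled = 1;  // start disabled; the event is enabled later via ioctl
  return attr;
}

// Thin wrapper around the raw syscall; the C library provides no
// perf_event_open() function. Returns the event fd, or -1 with errno set.
int PerfEventOpen(perf_event_attr* attr, pid_t pid, int cpu, int group_fd,
                  unsigned long flags) {
  assert(attr != nullptr);
  return static_cast<int>(
      syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags));
}
```

A call such as PerfEventOpen(&attr, -1, cpu, -1, 0) would request the event for all processes on one CPU as a group leader; whether it succeeds depends on the privileges of the caller and the kernel's perf settings.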

4.6 Create TraceWriter and enable perf event

7b86c2580a45f56d81447013c8221a2b.png

4.7 Notify Unwinder to start DataSource

bed34054116859507ff07d36a2481483.png

4.8 Start the periodic reading task

fb71dc60be653c2f9c488ed6584887a0.png

The periodic reading task is mainly to obtain the data of perf event from the shared memory of the kernel. In subsequent chapters we will focus on the acquired data.

aa9eb0d7f4a01d801799bc6a6603f15a.png

In ReadAndParsePerCpuBuffer, called from the TickDataSourceRead function, the sample data read from the kernel's shared memory is pushed onto the queue of unwinding_worker. When PostProcessQueue is called, the thread of unwinding_worker is woken up and performs unwinding until all samples have been unwound.

If the DataSource has not been stopped, more samples need to be fetched, so the task posts itself again as a delayed task and task_runner continues to schedule it.
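The self-rescheduling pattern of the periodic read can be modeled as below. This is a toy, single-threaded model with a simulated clock: ToyTaskRunner, ToyDataSource, and TickRead are hypothetical names, and the "stop after max_reads reads" rule only stands in for a real stop request.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical task runner with a simulated clock (no real sleeping).
class ToyTaskRunner {
 public:
  void PostDelayedTask(std::function<void()> task, uint32_t delay_ms) {
    queue_.emplace(now_ms_ + delay_ms, std::move(task));
  }
  // Runs queued tasks in deadline order until none are left.
  void RunUntilIdle() {
    while (!queue_.empty()) {
      auto it = queue_.begin();
      now_ms_ = it->first;
      std::function<void()> task = std::move(it->second);
      queue_.erase(it);
      task();
    }
  }
  uint64_t now_ms() const { return now_ms_; }

 private:
  uint64_t now_ms_ = 0;
  std::multimap<uint64_t, std::function<void()>> queue_;
};

// Hypothetical data source state.
struct ToyDataSource {
  bool stopped;
  int reads;
  int max_reads;  // simulated stop condition
};

// One tick: read, then re-post itself unless the data source has stopped.
void TickRead(ToyTaskRunner* runner, ToyDataSource* ds, uint32_t period_ms) {
  assert(runner != nullptr && ds != nullptr);
  if (ds->stopped) return;
  ++ds->reads;  // stands in for draining the per-cpu buffers
  if (ds->reads >= ds->max_reads) ds->stopped = true;
  if (!ds->stopped) {
    runner->PostDelayedTask([=] { TickRead(runner, ds, period_ms); },
                            period_ms);
  }
}
```

With max_reads set to 3 and a period of 100 ms, the first tick runs immediately and the following two run from the task queue, after which nothing is re-posted.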

5. Sample acquisition

Samples are obtained from the ring buffer in shared memory provided by the Linux kernel. This is done in PerfProducer::ReadAndParsePerCpuBuffer. The code is relatively long, so only part of it is shown in the figure below. Its basic process is:

Loop over EventReader::ReadUntilSample to obtain parsed samples. If filters are configured in the DataSource config, uninteresting samples are filtered out. The loop ends when no more samples are available or enough samples have been obtained.

cb44f2504ea99c4c45bf11bd1a77bf8f.png

5.1 Acquisition of ring buffer data of PerfRingBuffer

Review the prototype of the perf_event_open function:

int syscall(SYS_perf_event_open, struct perf_event_attr *attr,
            pid_t pid, int cpu, int group_fd, unsigned long flags);

Among them, the perf_event_attr structure contains many configuration parameters. The parameters related to obtaining samples through the ring buffer are as follows:

  • sample_period/sample_freq: Indicates how often to get a sample.

  • sample_type: Indicates what data is included in each sample, such as the instruction pointer, TID, sample time, address information, etc.

The file descriptor returned by perf_event_open can then be passed to the mmap system call, which returns a memory region shared by the kernel and userspace. The data in this region is generally written by the kernel, and the userspace program is responsible for parsing it. The layout of the mmap'ed shared memory is as follows:

f393ec8c2940cd4bac2bf920f87fe6c2.png

The data structure corresponding to the metadata page is as follows:

d9aca5799a6abdc5950fb810c65bb81b.png

9922f8c7dbcdffafa65803ab115830d2.png

  • data_head: The position up to which the kernel has written data in the data area. The value increases continuously and never wraps by itself, so it has to be wrapped manually by the size of the data area before it is used as an offset.

  • data_tail: Written by userspace. It indicates the position up to which userspace has read, so that the kernel does not overwrite unread data.

  • data_offset: The offset at which the perf sample data area starts.

  • data_size: The size of the perf sample data area.

The perf sample provided by the Linux kernel also contains a fixed format. The data prototype of each perf sample is as follows:

8762ffaefcf39cab8ad61089ea99326a.png

06fe2aa58dfd36d047f8f7b950961374.png

Note the "if" comments on the right side of the structure above. If the corresponding option is not set in sample_type, the corresponding field is not present in the record at all, so the parser has to take sample_type into account when working out where each field is.
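The effect of sample_type on the record layout can be sketched as follows. This is a minimal, Linux-only illustration and not EventReader's code: ToySample, ReadField, and ParseToySample are hypothetical names, only four sample_type bits are supported, and any other bit is rejected so that the field order assumed here stays valid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <linux/perf_event.h>

// Hypothetical parsed result holding a small subset of the possible fields.
struct ToySample {
  uint64_t ip = 0;
  uint32_t pid = 0;
  uint32_t tid = 0;
  uint64_t time = 0;
  uint32_t cpu = 0;
};

// Copies one fixed-size field out of the payload and advances the offset.
template <typename T>
bool ReadField(const uint8_t* data, size_t size, size_t* offset, T* out) {
  if (*offset + sizeof(T) > size) return false;
  std::memcpy(out, data + *offset, sizeof(T));
  *offset += sizeof(T);
  return true;
}

// Parses the payload that follows perf_event_header in a PERF_RECORD_SAMPLE.
// Fields appear in a fixed order, and only if their sample_type bit is set.
bool ParseToySample(const uint8_t* payload, size_t size, uint64_t sample_type,
                    ToySample* out) {
  assert(out != nullptr);
  const uint64_t kSupported = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
                              PERF_SAMPLE_TIME | PERF_SAMPLE_CPU;
  if (sample_type & ~kSupported) return false;  // layout not modeled here
  size_t off = 0;
  if ((sample_type & PERF_SAMPLE_IP) &&
      !ReadField(payload, size, &off, &out->ip)) {
    return false;
  }
  if (sample_type & PERF_SAMPLE_TID) {
    if (!ReadField(payload, size, &off, &out->pid) ||
        !ReadField(payload, size, &off, &out->tid)) {
      return false;
    }
  }
  if ((sample_type & PERF_SAMPLE_TIME) &&
      !ReadField(payload, size, &off, &out->time)) {
    return false;
  }
  if (sample_type & PERF_SAMPLE_CPU) {
    uint32_t reserved = 0;  // the cpu field is followed by a reserved u32
    if (!ReadField(payload, size, &off, &out->cpu) ||
        !ReadField(payload, size, &off, &reserved)) {
      return false;
    }
  }
  return off == size;  // every payload byte must be accounted for
}
```

The same 16-byte payload parses successfully with PERF_SAMPLE_TID | PERF_SAMPLE_TIME but is rejected when PERF_SAMPLE_IP is also claimed, because the fields would no longer fit.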

perf_event_header is the header information of each sample, and its definition is as follows:

8306087d3fba32a6c47c8d83177b3a6d.png

  • size: the size of this perf sample

  • misc: contains some additional information about this sample

  • type: the record type. Only records of type PERF_RECORD_SAMPLE have the perf sample layout shown above. For example, when the type is PERF_RECORD_LOST, the record has the following layout:

1a285c548a45f2690de22d5ab02e1c31.png

5.2 Analysis of Sample of EventReader

5.2.1 Reading of perf sample

a6d0a1307966a296392149497aa1d9b9.png

Acquiring perf samples amounts to reading the ring buffer, which has a read offset and a write offset. The write offset is written by the kernel. As long as the read offset is smaller than the write offset, there is still unread data in the ring buffer.

Note that the ring buffer wraps around. If a record straddles the wrap-around point, its two parts have to be reassembled before parsing.
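The wrap-around handling can be illustrated with plain memory standing in for the mmap'ed data area. UnreadBytes and CopyFromRing are hypothetical helpers, and the data area size is assumed to be a power of two. A real reader additionally needs the appropriate memory ordering when loading data_head and storing data_tail, which this sketch leaves out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Number of bytes written by the producer but not yet consumed. Both
// positions increase monotonically and are not wrapped.
uint64_t UnreadBytes(uint64_t head, uint64_t tail) {
  return head - tail;
}

// Copies len bytes starting at the unwrapped position pos out of a ring of
// data_size bytes, reassembling the two parts if the range wraps around.
std::vector<uint8_t> CopyFromRing(const uint8_t* data, uint64_t data_size,
                                  uint64_t pos, size_t len) {
  assert(data_size != 0 && (data_size & (data_size - 1)) == 0);
  assert(len <= data_size);
  std::vector<uint8_t> out(len);
  if (len == 0) return out;
  const uint64_t start = pos & (data_size - 1);  // wrap the position
  const size_t first =
      static_cast<size_t>(std::min<uint64_t>(len, data_size - start));
  std::memcpy(out.data(), data + start, first);        // up to the end
  std::memcpy(out.data() + first, data, len - first);  // rest from the start
  return out;
}
```

After a record has been copied out and parsed, the reader advances its tail position by the record size so that the space can be reused.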

5.2.2 Analysis of Perf sample

The parsing work of perf sample itself is carried out through EventReader::ParseSampleRecord. The parsed data structure is ParsedSample, which is defined as:

1103ac1c4d9d93f4be2b0b49e4ac829c.png

13cca1d716acff05b587a09cb923c25f.png

It can be seen that the information traced_perf focuses on includes:

  • CommonSampleData: cpu_mode, cpu, pid, tid, timestamp, timebase_count and other information

  • regs: userspace register information for unwinding

  • stack: userspace stack information

  • kernel_ips: Kernel instruction pointer information

The following is the specific process of Sample analysis

6aea7ddf8b77978f267ab41052bcfc0e.png

The above function returns the parsed perf sample as a ParsedSample. After some filtering logic, the sample is pushed onto the queue provided by unwinding_worker for subsequent unwinding.

cc5449a4dbecb0aad18849cc4524d0f1.png

So far, all the perf event information required from the kernel has been collected and parsed. The next step is to convert it into readable callstack information, which is inseparable from the unwinding operation.

6. Unwinding operation

The unwinding operation occurs after parsing the perf event sample, and the call to initiate the action is:

1bbeee6fee428643a7e7d7bfafc9aa2a.png

Its main processing logic is located in the Unwinder::ConsumeAndUnwindReadySamples function.

a3317f6ec1027ed2f2a5cfcb4d2f0f94.png

a0b398ec618afde5055de0c8db7cc15c.png

When unwinding succeeds, it calls PostEmitSample in PerfProducer, and writes the data after unwinding into TraceWriter.

6.1 Analysis of kernel stack

Analyzing the kernel stack is relatively simple. The main function is Unwinder::SymbolizeKernelCallchain. It relies on the mapping between kernel addresses and symbols in "/proc/kallsyms": using this mapping, the kernel-mode instruction pointers in the sample are translated into symbol names. How the kernel-mode instruction pointers are obtained was covered in the previous chapter.

26b3dcc3d6e5789767524e2d1905b895.png
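The kallsyms-based lookup described above can be sketched as follows. This is a simplified illustration, not Perfetto's implementation: KernelSymbolMap, ParseKallsyms, and Symbolize are hypothetical names, the text is assumed to have the "address type name" line format of /proc/kallsyms, and all symbol types are kept without filtering.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Maps the start address of each symbol to its name.
using KernelSymbolMap = std::map<uint64_t, std::string>;

// Parses text in the format of /proc/kallsyms, one symbol per line:
//   <hex address> <type char> <name> [optional module]
KernelSymbolMap ParseKallsyms(const std::string& text) {
  KernelSymbolMap symbols;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    uint64_t addr = 0;
    char type = 0;
    std::string name;
    if (fields >> std::hex >> addr >> type >> name) symbols[addr] = name;
  }
  return symbols;
}

// Returns the name of the last symbol starting at or before ip, or an empty
// string if ip lies before the first known symbol.
std::string Symbolize(const KernelSymbolMap& symbols, uint64_t ip) {
  auto it = symbols.upper_bound(ip);
  if (it == symbols.begin()) return "";
  --it;
  return it->second;
}
```

Applying Symbolize to each entry of kernel_ips yields the symbolized kernel call chain. Note that the addresses in /proc/kallsyms may be shown as zeros to readers without sufficient privileges, in which case this lookup cannot work.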

6.2 Analysis of user stack

Analyzing the user stack is more complicated. It first requires several pieces of information:

  1. Userspace register information

  2. Userspace stack information

  3. /proc/PID/mem information

  4. /proc/PID/maps information

Among them, the first two pieces of information have already been obtained through the previous sample parsing, so how are the third and fourth obtained?

In earlier Android versions, a privileged process could directly access these two files for a given pid. As Android places more and more emphasis on security and privacy, sensitive information of different processes is now strongly isolated. To obtain this information, traced_perf therefore has to use a more elaborate mechanism that conforms to the Android security design.

6.2.1 How traced_perf requests the maps and mem information of the target process

The AndroidRemoteDescriptorGetter class implements obtaining /proc/PID/mem and /proc/PID/maps. The request is made by sending a signal to the target process:

7aff45e884f4b39aad1c073658b03e52.png

The response is received through the socket, that is, the socket created when the traced_perf process was started:

5a46a4d3ab4d850561b1b5c5ceb35f12.png

When data arrives on this socket, the file descriptors of the two files are received (the kernel installs the descriptors into the receiving process during the transfer, so they can be used normally in the traced_perf process).

e40df4ee196165f2aef7a01c56fa2352.png

The delegate in the above code points to the PerfProducer object, so delegate_->OnProcDescriptors hands the two file descriptors to the PerfProducer object. PerfProducer then passes these file descriptors on to the UnwinderHandle object.

6.2.2 How the target process sends maps and mem information to traced_perf

As mentioned above, traced_perf notifies the target process with a signal so that the target process sends the file descriptors. Why does the target process respond to this signal, given that target processes are very diverse (daemons, system APKs, third-party APKs, and so on)? The answer lies in the C library.

254ad5a008572b837b50fd9e6c07fd55.png

When the target process receives the signal, it connects to the traced_perf process through the unix socket, opens the two files maps and mem, and sends their descriptors with sendmsg over the unix socket. For details on sending file descriptors over a unix socket, see https://man7.org/linux/man-pages/man7/unix.7.html; this is not described in detail here.

The above code in android_profiling_dynamic.cpp is compiled as part of the C library and is therefore loaded by most processes.
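Passing a file descriptor over a unix socket with sendmsg can be sketched as follows. This is a generic SCM_RIGHTS illustration, not the code in the C library or in traced_perf: SendFd and ReceiveFd are hypothetical helpers, only one descriptor is transferred per message, and a connected stream socket is assumed.

```cpp
#include <cassert>
#include <cstring>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

// Sends one file descriptor over a connected unix socket.
bool SendFd(int sock, int fd_to_send) {
  assert(sock >= 0 && fd_to_send >= 0);
  char payload = 'x';  // at least one byte of regular data goes with the fd
  iovec iov;
  iov.iov_base = &payload;
  iov.iov_len = sizeof(payload);

  alignas(cmsghdr) char control[CMSG_SPACE(sizeof(int))];
  std::memset(control, 0, sizeof(control));

  msghdr msg;
  std::memset(&msg, 0, sizeof(msg));
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control;
  msg.msg_controllen = sizeof(control);

  // The descriptor travels in an SCM_RIGHTS control message.
  cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN(sizeof(int));
  std::memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

  return sendmsg(sock, &msg, 0) == 1;
}

// Receives one file descriptor; returns -1 if none arrived. The returned
// number is a new descriptor that is valid in the receiving process.
int ReceiveFd(int sock) {
  char payload = 0;
  iovec iov;
  iov.iov_base = &payload;
  iov.iov_len = sizeof(payload);

  alignas(cmsghdr) char control[CMSG_SPACE(sizeof(int))];
  std::memset(control, 0, sizeof(control));

  msghdr msg;
  std::memset(&msg, 0, sizeof(msg));
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control;
  msg.msg_controllen = sizeof(control);

  if (recvmsg(sock, &msg, 0) <= 0) return -1;
  for (cmsghdr* c = CMSG_FIRSTHDR(&msg); c != nullptr;
       c = CMSG_NXTHDR(&msg, c)) {
    if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS) {
      int fd = -1;
      std::memcpy(&fd, CMSG_DATA(c), sizeof(int));
      return fd;
    }
  }
  return -1;
}
```

In the scenario described here, the sender would be the target process with the maps and mem descriptors, and the receiver would be traced_perf.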

228627562da3f9d19d4ad33669c38af7.png

So far, all the required information for the user stack has been prepared.

6.2.3 Analysis of user stack

Once all the information is available, the actual unwinding of the user stack is done directly through the Unwinder::Unwind method provided by libunwindstack; the process is very straightforward.

ebddcef51aef7e52204d51af981cb207.png

7. Writing of Sample

Writing the sample data into the trace is also relatively straightforward: the information obtained in the previous steps is written through the TracePacket protobuf structure returned by the TraceWriter.

95a57880d51a324501498e6ea81b9c75.png

At this point, the entire sample acquisition process of traced_perf is essentially complete.

8. Summary

The main parts of the workflow of traced_perf include:

  • Embedding of perfetto process: traced_perf is the producer of perfetto

  • Sample acquisition

  • Unwinding operation

These three parts may seem complicated, but the overall structure is relatively clear. Perfetto provides the framework for tracing and profiling, so connecting a new tracing producer to perfetto is a matter of following the steps.

9. Reference link

TracedPerf source code: https://cs.android.com/

Traced_perf related documents: https://perfetto.dev/docs

perf history: https://en.wikipedia.org/wiki/Perf_(Linux)

Simpleperf related documents: https://android.googlesource.com/platform/prebuilts/simpleperf/+/782cdf2ea6e33f2414b53884742d59fe11f01ebe/README.md

perf_event_open: https://man7.org/linux/man-pages/man2


Origin blog.csdn.net/feelabclihu/article/details/128744823