ALSA Compress-Offload API

Overview

From its early days, the ALSA API was designed with PCM in mind, or fixed-bitrate payloads such as IEC61937. Parameters and returned values are typically expressed in frames, which makes extending the existing API to support compressed data streams difficult.

In recent years, audio digital signal processors (DSPs) have often been integrated into system-on-chip (SoC) designs, and DSPs are also embedded in audio codecs (here "codec" refers to a device that converts between analog and digital signals, not an audio compression scheme such as AAC). Processing compressed data on a DSP can significantly reduce power consumption compared with host-based processing. Linux support for this class of hardware has been poor, mainly because no generic API was available in the mainline kernel.

Rather than requiring compatibility-breaking changes to the ALSA PCM interface, a new "compressed data" API was introduced to provide a control and data-flow interface to audio DSPs.

The design of this API was inspired by two years of experience with the Intel Moorestown SoC, with many corrections required to make the API usable by others when merging it into the mainline kernel rather than the staging tree.

Requirements

The main requirements include the following:

  • Separation between byte count and time. Compressed formats may have a header per file, per frame, or none at all, and the payload size may vary from frame to frame. Therefore, the duration of an audio buffer cannot be reliably estimated when dealing with compressed data. A dedicated mechanism is required for reliable audio-video synchronization, which needs accurate reporting of the number of samples rendered at any given time.

  • Handling of multiple formats. PCM data only requires a specification of the sample rate, number of channels and bit width. In contrast, compressed data comes in a variety of formats. Audio DSPs may also embed support for a limited number of audio encoders and decoders in their firmware, or may support a wider selection through dynamically downloaded libraries.

  • Focus on the mainstream formats. This API provides support for the most popular formats used for audio and video capture and playback. New formats are likely to be added as audio compression technology advances.

  • Handling of multiple configurations. Even for a given format such as AAC, some implementations may support AAC multichannel but only HE-AAC stereo. Likewise, the WMA10 level M3 profile may require too many memory and CPU cycles. The new API needs to provide a generic way of listing these formats.

  • Rendering/grabbing only. This API does not provide any means of hardware acceleration in which PCM samples are returned to user space for further processing. It focuses on providing a compressed data stream to the DSP and assumes the decoded output is routed to a physical output or a logical back-end.

  • Complexity hiding. Existing user-space multimedia frameworks already have enums and structures for every compression format. This new API assumes the existence of a platform-specific compatibility layer, such as an Android HAL or a PulseAudio sink, that translates and makes use of the audio DSP's capabilities. By construction, regular applications are not supposed to use this API.

Design

The new API shares a number of concepts with the PCM API for flow control. The start, pause, resume, drain and stop commands have the same semantics regardless of the content.

The concept of a memory ring buffer divided into a set of fragments is borrowed from the ALSA PCM API. However, only sizes in bytes can be specified.

Scrubbing/trick modes are assumed to be handled by the host.

The notion of rewind/fast-forward is not supported. Data committed to the ring buffer cannot be invalidated, except when dropping all buffers.

The Compressed Data API makes no assumptions about how data is transmitted to the audio DSP. DMA transfers from main memory to an embedded audio cluster, or transfers over an SPI interface to an external DSP, are both possible. As with ALSA PCM, a core set of routines is exposed; each driver implementer must provide the set of mandatory routines and may use the optional ones.

The main additions are:

get_caps

This routine returns the list of supported audio formats. Querying the codecs on a capture stream will return encoders; for a playback stream, decoders will be listed.

get_codec_caps

For each codec, this routine returns a list of capabilities. The intent is to make sure all capabilities correspond to valid settings and to minimize the risk of configuration failure. For example, for a complex codec such as AAC, the number of supported channels may depend on the specific profile. If capabilities were exposed with a single descriptor, a specific combination of profile/channel count/format might well not be supported. Likewise, embedded DSPs have limited memory and CPU cycles, so some implementations may make the list of capabilities dynamic and dependent on the existing workload. Beyond the codec settings, this routine returns the minimum buffer size handled by the implementation. This information may be a function of the DMA buffer size, the number of bytes needed for synchronization, etc., and can be used by user space to define how much data needs to be written to the ring buffer before playback can start.

set_params

This routine sets the configuration chosen for a specific codec. The most important field in the parameters is the codec type; in most cases, decoders will ignore the other fields, while encoders will strictly comply with the settings.

get_params

This routine returns the actual settings used by the DSP. Changes to the settings should remain the exception.

get_timestamp

The timestamp becomes a multi-field structure. It lists the number of bytes transferred, samples processed and samples rendered/grabbed. All of these values can be used to determine the average bitrate, decide whether the ring buffer needs to be refilled, or measure the delay due to decoding/encoding/I/O on the DSP.

Note that the codec/profile/mode list was derived from the OpenMAX AL specification rather than reinvented. Modifications include:

  • Addition of FLAC and IEC formats
  • Merge of encoder/decoder capabilities
  • Profiles/modes listed as bitmasks to make the descriptors more compact
  • Addition of set_params for decoders (missing in OpenMAX AL)
  • Addition of AMR/AMR-WB encoding modes (missing in OpenMAX AL)
  • Addition of format information for WMA
  • Addition of encoding options when required (derived from OpenMAX IL)
  • Addition of rateControlSupported (missing in OpenMAX AL)

State Machine

The compressed audio stream state machine is described below:

                                      +----------+
                                      |          |
                                      |   OPEN   |
                                      |          |
                                      +----------+
                                           |
                                           |
                                           | compr_set_params()
                                           |
                                           v
       compr_free()                  +----------+
+------------------------------------|          |
|                                    |   SETUP  |
|          +-------------------------|          |<-------------------------+
|          |       compr_write()     +----------+                          |
|          |                              ^                                |
|          |                              | compr_drain_notify()           |
|          |                              |        or                      |
|          |                              |     compr_stop()               |
|          |                              |                                |
|          |                         +----------+                          |
|          |                         |          |                          |
|          |                         |   DRAIN  |                          |
|          |                         |          |                          |
|          |                         +----------+                          |
|          |                              ^                                |
|          |                              |                                |
|          |                              | compr_drain()                  |
|          |                              |                                |
|          v                              |                                |
|    +----------+                    +----------+                          |
|    |          |    compr_start()   |          |        compr_stop()      |
|    | PREPARE  |------------------->|  RUNNING |--------------------------+
|    |          |                    |          |                          |
|    +----------+                    +----------+                          |
|          |                            |    ^                             |
|          |compr_free()                |    |                             |
|          |              compr_pause() |    | compr_resume()              |
|          |                            |    |                             |
|          v                            v    |                             |
|    +----------+                   +----------+                           |
|    |          |                   |          |         compr_stop()      |
+--->|   FREE   |                   |  PAUSE   |---------------------------+
     |          |                   |          |
     +----------+                   +----------+

Gapless Playback

When playing the tracks of an album back to back, the decoder is able to skip the encoder delay and padding and move directly from one track's content to the next. The end user perceives this as gapless playback, since there is no silence when switching tracks.

In addition, encoding can introduce low-intensity noise. Perfect gapless playback is difficult to achieve for all types of compressed data, but it works well for most music content. The decoder needs to know the encoder delay and encoder padding, so these must be passed to the DSP. This metadata is extracted from the ID3/MP4 headers and is not present in the bitstream itself, hence a new interface is required to pass it to the DSP. The DSP and user space also need a way to switch from one track to another and start using the second track's data.

The main additions are:

set_metadata

This routine sets the encoder delay and encoder padding, which the decoder can use to strip the silence. It needs to be called before writing the track's data.

set_next_track

This routine tells the DSP that metadata and write operations sent after this will correspond to subsequent tracks.

partial_drain

This is called when the end of the file is reached. User space notifies the DSP that EOF has been reached, after which the DSP can start skipping the padding delay. The data written next will belong to the next track.

The sequence flow for gapless playback is:

  • Open
  • Get caps / codec caps
  • Set params
  • Set metadata of the first track
  • Fill data of the first track
  • Trigger start
  • User space finishes sending all data for the first track
  • Indicate that the data written next belongs to the next track by sending set_next_track
  • Set metadata of the next track
  • Then call partial_drain to flush most of the buffer in the DSP
  • Fill data of the next track
  • DSP switches to the second track

(Note: the order of partial_drain and writing the data for the next track can also be reversed.)

Gapless Playback State Machine

For gapless playback, we move from the RUNNING state to the PARTIAL_DRAIN state and back, while setting the metadata and signalling for the next track:

                          +----------+
  compr_drain_notify()    |          |
+------------------------>|  RUNNING |
|                         |          |
|                         +----------+
|                              |
|                              |
|                              | compr_next_track()
|                              |
|                              V
|                         +----------+
|    compr_set_params()   |          |
|             +-----------|NEXT_TRACK|
|             |           |          |
|             |           +--+-------+
|             |              | |
|             +--------------+ |
|                              |
|                              | compr_partial_drain()
|                              |
|                              V
|                         +----------+
|                         |          |
+------------------------ | PARTIAL_ |
                          |  DRAIN   |
                          +----------+

Not Supported

  • Support for VoIP/circuit-switched calls is not a goal of this API. Supporting dynamic bitrate changes would require a tight coupling between the DSP and the host stack, limiting the power savings.

  • Packet loss concealment is not supported. This would require an additional interface to let the decoder synthesize data when frames are lost during transmission. This may be added in the future.

  • This API does not handle volume control or routing. Devices exposing a compressed data interface will be treated as regular ALSA devices; volume changes and routing information will be provided via regular ALSA kcontrols.

  • Embedded sound effects. Such effects should be enabled in the same way, whether the input is PCM or compressed.

  • Multi-channel IEC encoding. It's not clear whether this is required.

  • As mentioned above, encoding/decoding acceleration is not supported. The output of a decoder can be routed to a capture stream, or even transcoded; such routing would be enabled with ALSA kcontrols.

  • Audio policy/resource management. The API does not provide any hooks to query audio DSP utilization, nor does it provide any preemption mechanism.

  • There is no notion of underrun/overrun. Since the bytes written are compressed, and the data written/read does not translate directly into rendered output in time, this API does not deal with underrun/overrun; these may be handled by a user-space library.

Credits

  • Mark Brown and Liam Girdwood for discussions on the need for this API
  • Harsha Priya for her work on the intel_sst compressed API
  • Rakesh Ughreja for valuable feedback
  • Sing Nallasellan, Sikkandar Madar and Prasanna Samaga for demonstrating and quantifying the benefits of audio offload on a real platform
