Android Audio Visualization: Exploration and Practice of Spectral Effects

Audio visualization, in a nutshell, is the conversion of sound to images.

As products compete increasingly on visual polish, users have come to expect a refined experience. Among the many popular music apps on the market, spectrum animation is a classic scenario:

(Figure: spectrum animation in a music app; image source: Migu Music)

This article uses Android as an example to describe the acquisition of audio signal data, the processing of that data, and common problems, and provides a general, feasible solution for implementing audio visualization on Android.

1. Acquisition of Spectrum Data

1. Time domain and frequency domain

To draw spectrum animations, you first need to obtain the spectrum data corresponding to the song. This requires a basic understanding of the concepts of time domain and frequency domain in signal processing:

The time domain describes how a mathematical function or physical signal varies with time; for example, a signal's time-domain waveform expresses its change over time. The frequency domain is a coordinate system that describes a signal in terms of frequency: the horizontal axis is frequency and the vertical axis is the amplitude at that frequency, commonly called a spectrogram.

For us, although the time domain is what is physically real, it expresses the signal obscurely; the frequency domain is more concise and intuitive.

Therefore, developers need to convert the time domain into the frequency domain. Here the Fourier transform is used to decompose the signal into amplitude and frequency components, relating the time-domain and frequency-domain views of the same function.

In this way we can observe the signal from the frequency domain and display the spectrum to the user, enhancing the immersive experience of listening to music.

Next, the author briefly introduces the FFT algorithm used for the time-domain-to-frequency-domain conversion (this part is important).

2. FFT-Fast Fourier Transform

The most basic method of signal analysis is the Discrete Fourier Transform (DFT), which transforms a signal from the time domain to the frequency domain so that its spectral structure and variation can be studied.

X(k) = Σ_{n=0}^{N−1} x(n) · e^{−j2πkn/N},  k = 0, 1, …, N−1

In complex scenarios, such as long finite-length sequences like the one above, the amount of computation the DFT requires becomes very large and hard to handle in real time. Therefore we introduce the Fast Fourier Transform (FFT), an algorithm that reduces the DFT's computation by several orders of magnitude.

With this concept in place, readers can understand why the FFT algorithm must always be run to obtain the data used for drawing. For convenience, we will refer to the corresponding output byte[] as fft, following the official Android API declaration:

public int getFft(byte[] fft) {
    // Fills fft with the current frequency-domain capture;
    // returns SUCCESS (0) when the capture succeeds.
    // ...
}

3. Advantages and disadvantages of the native API

The FFT algorithm is complex and error-prone, so Android officially provides a simple API class, Visualizer, that is easy for developers to call. The developer only needs to pass in an audioSessionId; the system automatically captures the audio signal the device is currently playing, performs the fft conversion, and delivers the result through a callback, which the developer can draw directly as a spectrum.

Its advantages, briefly:

  • 1. The system automatically samples the audio signal amplitude with no adaptation required, and it works with any native or third-party player;
  • 2. The system automatically performs the fft conversion; the developer receives the data through a callback and can draw it directly, so the learning cost is low.

The native Visualizer API therefore naturally became the first choice. But as implementation deepened, more and more problems were exposed:

  • 1. When the volume is 0, the callback returns no fft data;
  • 2. The class itself is just an API shell; the real implementation is in the native layer, so it is difficult to modify or extend in a targeted way;
  • 3. Some models have compatibility issues;
  • 4. Because the audio signal is captured internally from the system's current output, the microphone permission must be granted before the Visualizer API is used; otherwise nothing is returned.

In actual development, these defects were unacceptable to the product, especially items 3 and 4. With the rollout of user-privacy protection policies over the past two years, it is impossible to convince anyone that "microphone permission is required to display a UI effect". In summary, the native Visualizer API solution was decisively abandoned.

4. Reorganize thinking

Back to square one: we need to implement a custom Visualizer.

The specific idea: when audio is digitized, the signal amplitude is sampled at a fixed rate; this is pulse-code modulation (hereinafter PCM). As described above, PCM is a time-domain representation: the amplitude is quantized, but it is not intuitive. Therefore we need to transform the PCM signal, that is, apply the FFT algorithm.

All of this was done automatically by the native Visualizer API before; now we control and implement it ourselves. In the end we obtain the sound's frequency content at each moment and display these amplitudes to the user on the page.

5. Underlying implementation

How do we get the current audio's PCM data? The underlying player can provide it: third-party players with mature communities (ijkplayer, ExoPlayer) both offer corresponding support, and developers can call the API or modify the source code according to their own business.

We continuously obtain frames of PCM data, usually as a ByteBuffer. These bytes may come from multiple channels; a simplification is to take the average of all channels, and then run the Fourier transform on the result.
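As a minimal sketch of the channel-averaging step (class and method names are my own; it assumes 16-bit little-endian interleaved PCM, the common Android format):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class PcmMixdown {
    /**
     * Averages interleaved 16-bit PCM samples across channels,
     * producing one mono sample in [-1, 1] per frame.
     */
    static float[] toMono(ByteBuffer pcm, int channelCount) {
        pcm.order(ByteOrder.LITTLE_ENDIAN);
        int frames = pcm.remaining() / (2 * channelCount);
        float[] mono = new float[frames];
        for (int i = 0; i < frames; i++) {
            int sum = 0;
            for (int c = 0; c < channelCount; c++) {
                sum += pcm.getShort();           // one 16-bit sample per channel
            }
            mono[i] = (sum / (float) channelCount) / 32768f;
        }
        return mono;
    }
}
```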

Understood macroscopically, the Fourier transform maps N time-domain points to N frequency-domain points. With a sampling frequency of Fs, the frequency interval between points is Δf = Fs / N. According to the Nyquist sampling theorem, the sampling frequency must be more than twice the highest frequency in the signal, so the number of effective frequency points is N / 2.


As mentioned above, computing the Fourier transform directly has complexity O(N^2); in practice the fast algorithm FFT is used, reducing it to O(N*LogN). We don't need to implement it by hand: mature, stable third-party libraries in the community already provide this support.

In addition, the frequency-domain points produced by the Fourier transform are complex numbers, which are inconvenient to visualize. The usual treatment is to discard the phase information and keep only the (real-valued) amplitude, i.e. the magnitude of each complex value.
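To make the pipeline concrete, here is a deliberately naive O(N^2) DFT (not the FFT a real implementation would use) that shows the magnitude step: only the first N/2 bins are kept, and each bin's amplitude is sqrt(re² + im²):

```java
class SpectrumDemo {
    /** Naive DFT returning the magnitude of each of the first n/2 bins. */
    static double[] magnitudes(double[] x) {
        int n = x.length;
        double[] mag = new double[n / 2];        // bins above n/2 mirror the lower ones
        for (int k = 0; k < n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double angle = -2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im += x[t] * Math.sin(angle);
            }
            mag[k] = Math.sqrt(re * re + im * im);  // drop phase, keep amplitude
        }
        return mag;
    }

    /** Index of the loudest bin; bin k corresponds to k * Fs / N hertz. */
    static int peakBin(double[] mag) {
        int best = 0;
        for (int k = 1; k < mag.length; k++) if (mag[k] > mag[best]) best = k;
        return best;
    }
}
```

With Fs = 8000 Hz and N = 64, Δf = Fs / N = 125 Hz, so a pure 1000 Hz sine should peak at bin 8.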

6. Performance optimization

Even though the FFT greatly optimizes the DFT, the performance cost of actually running it cannot be ignored. How can this be optimized?

NetEase Cloud Music mentioned a similar idea in an article: execute the FFT algorithm in the native layer of the playback process, call back to the Java layer through JNI, then transfer the result data across processes to the main process, where the subsequent drawing is performed.


In addition, since each frame of PCM contains a lot of data, the initial single fft result is a float[] of length 4096. We could use these frequencies to immediately draw a bar chart with 4096 bars, but no such scenario exists in practice.

Therefore, before the FFT algorithm runs, we can subsample the initial data according to frequency band.

According to acoustics, the human ear can easily distinguish a 100 Hz tone from a 200 Hz tone, but can hardly distinguish 8100 Hz from 8200 Hz, even though both pairs differ by 100 Hz. In other words, the relationship between frequency and perceived pitch is not linear but logarithmic.

Therefore, our sampling focuses primarily on the low frequency band, with the high band as a supplement. For details, refer to Wikipedia.

In this way, the size of the FFT's single input data source is reduced from the initial 4096 to 64 or 128, greatly improving execution speed.
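One common concrete form of this low-frequency-first reduction (a sketch; the article's exact scheme may differ, and it may apply the reduction before rather than after the transform) groups bins into logarithmically spaced bands, so low frequencies keep fine resolution while high frequencies are averaged coarsely:

```java
class LogBands {
    /**
     * Computes band boundaries (as bin indices) logarithmically spaced
     * between bin 1 and maxBin: low-frequency bands are narrow,
     * high-frequency bands are wide.
     */
    static int[] bandEdges(int maxBin, int bandCount) {
        int[] edges = new int[bandCount + 1];
        double ratio = Math.pow(maxBin, 1.0 / bandCount);
        double edge = 1;
        for (int i = 0; i <= bandCount; i++) {
            // force strict increase so every band holds at least one bin
            edges[i] = Math.max((int) Math.round(edge), i > 0 ? edges[i - 1] + 1 : 1);
            edge *= ratio;
        }
        return edges;
    }

    /** Averages magnitudes inside each band, yielding bandCount values. */
    static double[] reduce(double[] mag, int[] edges) {
        double[] out = new double[edges.length - 1];
        for (int b = 0; b < out.length; b++) {
            double sum = 0;
            for (int k = edges[b]; k < edges[b + 1]; k++) sum += mag[k];
            out[b] = sum / (edges[b + 1] - edges[b]);
        }
        return out;
    }
}
```

With 2048 usable bins (N = 4096), 64 bands give single-bin resolution at the bottom and ever-wider averages toward the top.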

2. Data processing and drawing

After obtaining the fft data, the next step is to handle the problems encountered at each stage of processing it.

1. Average Algorithm

First of all, because the data comes from different sampling points in different frequency bands, a single continuous data interval may contain sudden spikes or dips. Here the data is simply processed with a weighted average:

(Figure: the weighted-average formula)
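A sketch of such smoothing (the weights here are illustrative, not the article's values): each point becomes a weighted average of itself and its neighbors.

```java
class Smoothing {
    /**
     * Replaces each point with a weighted average of its neighborhood.
     * At the edges, only the available neighbors contribute.
     * weights.length should be odd, centered on the current point.
     */
    static double[] weightedAverage(double[] data, double[] weights) {
        int half = weights.length / 2;
        double[] out = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            double sum = 0, wSum = 0;
            for (int j = -half; j <= half; j++) {
                int idx = i + j;
                if (idx < 0 || idx >= data.length) continue;
                double w = weights[j + half];
                sum += w * data[idx];
                wSum += w;
            }
            out[i] = sum / wSum;
        }
        return out;
    }
}
```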

2. Fitting curve

For the data of a single frame, even after the simple weighted-average processing, the display is still not good enough for the bar effect shown at the beginning of the article. How do we smooth the uneven data into a visual envelope?

Here we introduce curve fitting. As the name suggests, it is a data-processing method that uses a continuous curve to approximately describe the functional relationship implied by a set of discrete points in the plane.

The method of least squares is the most common way to solve curve-fitting problems. Its general form is: choose the parameters of the fitted function f that minimize the sum of squared residuals S = Σᵢ [yᵢ − f(xᵢ)]².

After this processing, the discrete data points form a smooth curve, significantly improving the user's visual experience.
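As an illustration of least-squares fitting (a generic polynomial fit via the normal equations; the article's actual basis functions may differ):

```java
class LeastSquares {
    /**
     * Fits y ≈ c[0] + c[1]x + ... + c[d]x^d by least squares,
     * solving the normal equations (AᵀA)c = Aᵀy with Gaussian elimination.
     */
    static double[] polyFit(double[] x, double[] y, int degree) {
        int m = degree + 1;
        double[][] ata = new double[m][m];
        double[] aty = new double[m];
        for (int i = 0; i < x.length; i++) {
            double[] pow = new double[m];
            pow[0] = 1;
            for (int j = 1; j < m; j++) pow[j] = pow[j - 1] * x[i];
            for (int r = 0; r < m; r++) {
                aty[r] += pow[r] * y[i];
                for (int c = 0; c < m; c++) ata[r][c] += pow[r] * pow[c];
            }
        }
        // Gaussian elimination with partial pivoting
        for (int col = 0; col < m; col++) {
            int pivot = col;
            for (int r = col + 1; r < m; r++)
                if (Math.abs(ata[r][col]) > Math.abs(ata[pivot][col])) pivot = r;
            double[] tmpRow = ata[col]; ata[col] = ata[pivot]; ata[pivot] = tmpRow;
            double tmp = aty[col]; aty[col] = aty[pivot]; aty[pivot] = tmp;
            for (int r = col + 1; r < m; r++) {
                double f = ata[r][col] / ata[col][col];
                aty[r] -= f * aty[col];
                for (int c = col; c < m; c++) ata[r][c] -= f * ata[col][c];
            }
        }
        double[] coef = new double[m];       // back substitution
        for (int r = m - 1; r >= 0; r--) {
            double s = aty[r];
            for (int c = r + 1; c < m; c++) s -= ata[r][c] * coef[c];
            coef[r] = s / ata[r][r];
        }
        return coef;
    }
}
```

Evaluating the fitted polynomial at dense x positions then yields the smooth envelope to draw.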

3. Define the attenuation coefficient

After the above steps, a single frame looks markedly better, but once the time dimension is added, two adjacent frames jump visibly, which looks like dropped frames to the user.

Here we introduce an attenuation coefficient. Before each frame is drawn, it is compared with the previous frame; when a single frequency point jitters strongly, the jitter is suppressed by the attenuation coefficient.
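One common form of this (a sketch; an implementation might also damp upward jumps) lets a bar rise instantly but fall no faster than a fixed factor per frame:

```java
class Decay {
    /**
     * Limits how fast each bar can fall between frames: a bar rises
     * immediately but decays by at most the given factor per frame.
     * Mutates both arrays: current becomes the values to draw,
     * previous remembers them for the next frame.
     */
    static void applyDecay(float[] previous, float[] current, float decay) {
        for (int i = 0; i < current.length; i++) {
            float floor = previous[i] * decay;   // slowest allowed fall
            if (current[i] < floor) current[i] = floor;
            previous[i] = current[i];
        }
    }
}
```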

3. Other issues

1. The sound and special effects are out of sync

This problem cannot be shown in a GIF, but users notice it immediately when listening to music: the spectrum effect jumps too early, always one step ahead of the rhythm we actually hear.

This happens because the custom player draws the processed fft before the PCM data has been passed through the Android system's AudioTrack, and the player also has its own internal buffer; the buffering delays the audible output, so the visual effect runs ahead of the audio.

Solving this is simple: buffer the PCM data symmetrically, referring to the buffer size in the player's source code, and define a corresponding ByteBuffer delay of our own.
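A minimal sketch of the idea (class and sizing are hypothetical): delay the fft frames by roughly the number of frames the audio sink buffers before drawing them, so the visuals line up with what is heard.

```java
import java.util.ArrayDeque;

class FftDelayLine {
    private final ArrayDeque<float[]> queue = new ArrayDeque<>();
    private final int delayFrames;

    /** delayFrames should roughly match the sink's buffered frame count. */
    FftDelayLine(int delayFrames) {
        this.delayFrames = delayFrames;
    }

    /**
     * Pushes the newest fft frame and returns the frame to draw now,
     * or null while the delay line is still filling up.
     */
    float[] push(float[] fftFrame) {
        queue.addLast(fftFrame);
        return queue.size() > delayFrames ? queue.removeFirst() : null;
    }
}
```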

2. Draw the scheme

For the spectrum-drawing scheme, one option is the system Canvas API with a custom View, performing the related data calculations on the CPU.

Considering that the spectrum animation occupies only part of the UI, has no direct complex interaction with the user, and has performance requirements, OpenGL was chosen in the end, with the data calculations handed to the GPU through matrix transformations for computation and rendering.

Summary

This article gives a general description of the overall process of implementing spectrum effects on Android. Since spectrum animation is a highly customized effect, readers need not dwell on the details; focus instead on the solutions to the practical problems, and apply them flexibly according to your own product's needs.



Origin: blog.csdn.net/mq2553299/article/details/131687833