Audio visualization, in a nutshell, is the conversion of sound to images.
As the industry grows ever more visual, users pay increasing attention to the polish of a product's experience. Among the many excellent music apps on the market, spectrum animation is a classic scenario:
Image source: Migu Music
Taking Android as the example platform, this article describes the acquisition of audio signal data, its processing, and common problems, and provides a general, practical solution for implementing audio visualization on Android.
1. Acquisition of Spectrum Data
1. Time domain and frequency domain
To draw spectrum animations, you first need to obtain the spectrum data corresponding to the song. This requires a basic understanding of the concepts of time domain and frequency domain in signal processing:
The time domain describes how a mathematical function or physical signal varies with time; for example, a signal's time-domain waveform expresses its change over time. The frequency domain is a coordinate system that describes a signal's characteristics in terms of frequency: the vertical axis is the amplitude of each frequency component, which is what we commonly call a spectrogram.
For us, although the time domain is what is physically real, it expresses the signal obscurely; the frequency domain is more concise and intuitive.
Therefore, developers need to convert the time domain to the frequency domain. Here we use the Fourier transform to decompose the signal into amplitude and frequency components, relating the function's time domain (red) to its frequency domain (blue):
In this way, we achieve the purpose of observing the analog signal from the frequency domain, and display the spectrum to the user, thereby enhancing the user's immersive experience when listening to music.
Next, the author briefly introduces the FFT algorithm (very important) used to convert the time domain to the frequency domain.
2. FFT-Fast Fourier Transform
The most basic Fourier analysis method for signal processing is the Discrete Fourier Transform (DFT), which transforms the signal from the time domain to the frequency domain so that its spectral structure and variation can be studied.
In some complex scenarios, such as the finite-length sequence in the figure above, the amount of computation the DFT requires is very large, making real-time processing difficult. We therefore introduce the Fast Fourier Transform (FFT) algorithm, which reduces the DFT's computation by several orders of magnitude.
With this concept introduced, readers can understand why drawing always depends on data produced by the FFT algorithm. For ease of reading, we will refer to the corresponding output byte[] as fft, following the official Android API declaration below:
public int getFft(byte[] fft) {
//...
}
3. Advantages and disadvantages of native API
The FFT algorithm is complex and error-prone, so Android officially provides a simple API class, Visualizer, that is easy for developers to call. You only need to pass in an audioSessionId; the system then automatically captures the audio signal the device is currently playing, performs the fft conversion, and calls back to the developer, who can draw the spectrum directly.
Its advantages, in brief:
- 1. The system samples the audio signal amplitude automatically, needs no adaptation, and works with any native or third-party player;
- 2. The system completes the fft conversion automatically; developers receive the data through a callback and draw it directly, so the cost of getting started is low.
The native Visualizer API therefore naturally became the first choice for implementation, but as the work deepened, more and more problems were exposed:
- 1. When the device volume is 0, the callback does not return fft data;
- 2. The class itself is just an API shell; the real implementation lives in the native layer, so targeted modification and extension are difficult;
- 3. Some particular device models have compatibility issues;
- 4. Because the audio signal is captured internally from the system, the microphone permission must be granted before the Visualizer API is used; otherwise nothing is returned.
In actual development, the above defects are unacceptable for a product, especially items 3 and 4. With the series of user-privacy protection policies implemented over the past two years, it is impossible to convince anyone that "microphone permission is required to display a UI effect." In short, the native Visualizer API solution was decisively abandoned.
4. Reorganize thinking
Back to square one: we need to implement a custom Visualizer ourselves.
The idea is as follows. When audio is digitized, the signal amplitude is sampled at a high rate and then quantized; this process is called pulse code modulation (hereinafter PCM). As described above, PCM data is a time-domain signal and therefore not intuitive, so we need to restructure it, which is where the FFT algorithm comes in.
All of this used to be done automatically by the native Visualizer API; now we control and implement it ourselves. In the end we obtain the frequency content of the sound at each moment and display these amplitudes to the user on the page.
5. Underlying implementation
How do we get the current audio's PCM data? This can be provided by the underlying player: third-party players with well-established communities (ijkplayer, ExoPlayer) all offer corresponding support, and developers can call their APIs or modify the source code according to their own business needs.
Next we obtain one frame of PCM data, usually a ByteBuffer. These bytes may come from multiple channels; to simplify processing we can take the average across all channels, then feed the resulting data to the Fourier transform.
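As a sketch of this mixdown step, assuming interleaved 16-bit little-endian PCM (the common Android format; the class and method names here are illustrative, not from any player's API):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PcmMixdown {
    /**
     * Averages interleaved 16-bit PCM samples across channels,
     * producing one mono sample per frame.
     */
    public static short[] toMono(ByteBuffer pcm, int channelCount) {
        pcm.order(ByteOrder.LITTLE_ENDIAN);
        int frames = pcm.remaining() / 2 / channelCount;
        short[] mono = new short[frames];
        for (int i = 0; i < frames; i++) {
            int sum = 0;
            for (int ch = 0; ch < channelCount; ch++) {
                sum += pcm.getShort();
            }
            mono[i] = (short) (sum / channelCount);
        }
        return mono;
    }
}
```

Averaging the channels discards stereo separation, which is acceptable here because the spectrum effect only needs overall amplitude.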
Understood macroscopically, the Fourier transform turns N time-domain points into N frequency-domain points. With sampling frequency Fs, the frequency interval is △f = Fs / N. According to the Nyquist sampling theorem, the sampling frequency must be more than twice the highest frequency in the signal, so the number of effective frequency points is N / 2.
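These relations can be written down directly. For example, with Fs = 44100 and N = 4096, △f is about 10.77hz, and only the first N / 2 = 2048 bins carry information (the class below is an illustrative helper, not part of any API):

```java
public class FftBins {
    /** Frequency resolution: deltaF = Fs / N. */
    public static double binWidth(double sampleRateHz, int fftSize) {
        return sampleRateHz / fftSize;
    }

    /** Center frequency of bin k; only bins 0..N/2-1 are meaningful (Nyquist limit). */
    public static double binFrequency(int k, double sampleRateHz, int fftSize) {
        return k * sampleRateHz / fftSize;
    }
}
```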
As mentioned above, computing the Fourier transform directly has complexity O(N^2); in practice the fast FFT algorithm is used, reducing the complexity to O(N*logN). We do not need to implement it by hand: mature, stable third-party libraries in the community already provide such support.
In addition, the frequency-domain points produced by the Fourier transform are complex numbers, which is inconvenient for visualization. The usual treatment is to ignore the phase information and keep only the (real-valued) magnitude.
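To make the magnitude step concrete, here is a deliberately naive O(N^2) DFT that returns only magnitudes. A real implementation would use an FFT library as noted above, but the final step, |X[k]| = sqrt(re^2 + im^2), is the same either way:

```java
public class DftMagnitude {
    /** Naive DFT of a real signal; returns magnitudes for bins 0..N/2-1. */
    public static double[] magnitudes(double[] x) {
        int n = x.length;
        double[] mag = new double[n / 2];
        for (int k = 0; k < n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double angle = -2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im += x[t] * Math.sin(angle);
            }
            // Phase is discarded; only the real-valued magnitude is kept.
            mag[k] = Math.sqrt(re * re + im * im);
        }
        return mag;
    }
}
```

A pure tone at bin k shows up as a single spike at mag[k], which is exactly what the spectrum bars visualize.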
6. Performance optimization
Even though the FFT algorithm optimizes the DFT considerably, the performance cost of actually running it cannot be ignored. How can this be optimized?
NetEase Cloud Music mentioned the same idea in an article: first run the FFT algorithm in the native layer of the playback process, call back to the java layer through JNI, then transfer the result data across processes to the main process, where the subsequent drawing operations are performed:
In addition, since each frame of PCM data contains many samples, the initial single fft result is a float[] of length 4096. We could use these frequencies to draw a bar chart with 4096 bars right away, but no real scenario calls for that.
Therefore, before the FFT algorithm is executed, we can subsample the initial data by frequency band.
According to acoustics, the human ear easily distinguishes a 100hz pitch from a 200hz pitch, but can hardly tell 8100hz from 8200hz apart, even though both pairs differ by 100hz. In other words, the relationship between frequency and perceived pitch is not linear but logarithmic.
Therefore, our audio sampling focuses mainly on the low frequency band, with the high frequency band as a supplement. For details, refer to Wikipedia.
In this way, when the FFT algorithm is executed, the size of a single input data source is reduced from the initial 4096 to 64 or 128, and the algorithm's execution speed improves greatly.
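The article reduces the FFT input itself; a simpler way to illustrate the same logarithmic emphasis is to group linear-frequency bins into a small number of exponentially spaced bands. The band-edge formula below is an illustrative choice, not the article's exact table:

```java
public class LogBands {
    /**
     * Groups linear-frequency magnitudes into logarithmically spaced bands,
     * averaging the bins that fall into each band. Low bands cover few bins
     * (fine resolution); high bands cover many (coarse resolution).
     */
    public static float[] group(float[] magnitudes, int bandCount) {
        float[] bands = new float[bandCount];
        int n = magnitudes.length;
        for (int b = 0; b < bandCount; b++) {
            // Band edges grow exponentially from bin 1 up to bin n.
            int lo = (int) Math.pow(n, (double) b / bandCount);
            int hi = (int) Math.pow(n, (double) (b + 1) / bandCount);
            if (hi <= lo) hi = lo + 1;
            if (hi > n) hi = n;
            float sum = 0;
            for (int i = lo; i < hi; i++) sum += magnitudes[i];
            bands[b] = sum / (hi - lo);
        }
        return bands;
    }
}
```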
2. Data processing and drawing
After getting the fft data, the next step is to deal with the problems encountered at each stage of processing it.
1. Average Algorithm
First of all, because the data comes from different sampling points in different frequency bands, a single continuous data interval will contain occasional spikes and dips. Here the data is simply smoothed with a weighted average:
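A minimal sketch of such smoothing, assuming a simple 1-2-1 neighbor kernel (the weights are an illustrative choice, not the article's):

```java
public class Smoothing {
    /** Weighted moving average with a 1-2-1 kernel; edges are clamped. */
    public static float[] weightedAverage(float[] data) {
        int n = data.length;
        float[] out = new float[n];
        for (int i = 0; i < n; i++) {
            float prev = data[Math.max(i - 1, 0)];
            float next = data[Math.min(i + 1, n - 1)];
            // Each point keeps half its own weight and borrows from neighbors.
            out[i] = 0.25f * prev + 0.5f * data[i] + 0.25f * next;
        }
        return out;
    }
}
```

An isolated spike of 4 between two zeros becomes {1, 2, 1}: still visible, no longer jarring.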
2. Curve fitting
For the data of a single frame, even after the simple weighted-average processing, the display still falls short of the bar effect shown at the beginning of the article. How do we smooth the uneven data into a visually pleasing envelope?
Here we introduce curve fitting . As the name suggests, it is a data processing method that uses continuous curves to approximately describe or compare the functional relationship between the coordinates represented by discrete point groups on the plane.
The method of least squares is the most common method for solving curve fitting problems. Its general form is:
Once applied, the discrete data points form a smooth curve, which significantly improves the user's visual experience.
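A minimal sketch of a least-squares polynomial fit via the normal equations (in production a numerical library is preferable; the class name is illustrative):

```java
public class LeastSquares {
    /**
     * Fits y ≈ c[0] + c[1]*x + ... + c[deg]*x^deg by least squares,
     * solving the normal equations (A^T A) c = A^T y.
     */
    public static double[] polyFit(double[] x, double[] y, int deg) {
        int m = deg + 1;
        double[][] a = new double[m][m + 1]; // augmented matrix [A^T A | A^T y]
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < m; j++)
                for (int k = 0; k < x.length; k++)
                    a[i][j] += Math.pow(x[k], i + j);
            for (int k = 0; k < x.length; k++)
                a[i][m] += y[k] * Math.pow(x[k], i);
        }
        // Gauss-Jordan elimination with partial pivoting.
        for (int col = 0; col < m; col++) {
            int pivot = col;
            for (int r = col + 1; r < m; r++)
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            double[] tmp = a[col]; a[col] = a[pivot]; a[pivot] = tmp;
            for (int r = 0; r < m; r++) {
                if (r == col) continue;
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= m; c++) a[r][c] -= f * a[col][c];
            }
        }
        double[] coeff = new double[m];
        for (int i = 0; i < m; i++) coeff[i] = a[i][m] / a[i][i];
        return coeff;
    }
}
```

Evaluating the fitted polynomial at dense x positions yields the smooth envelope drawn on screen.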
3. Define the attenuation coefficient
After the steps above, the effect of a single frame is much improved, but once the time dimension is added, the display jumps visibly between two adjacent frames, which looks like dropped frames to the user.
Here we introduce an attenuation coefficient. Before each frame's data is drawn, it is compared with the previous frame's data; when a single frequency point jitters sharply, the jitter is suppressed by the attenuation coefficient.
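A minimal sketch, assuming the common scheme in which a bar may rise instantly but fall no faster than a fixed fraction of its previous height per frame (the factor used in the usage note is illustrative):

```java
public class SpectrumDecay {
    /**
     * Limits per-frame downward jumps: a bar rises instantly, but never
     * falls below (previous value * decay) in a single frame.
     */
    public static float[] apply(float[] previous, float[] current, float decay) {
        float[] out = new float[current.length];
        for (int i = 0; i < current.length; i++) {
            float floor = previous[i] * decay;
            out[i] = Math.max(current[i], floor);
        }
        return out;
    }
}
```

With decay = 0.8, a bar at height 10 whose new value is 2 drops only to 8 this frame, then keeps gliding down on subsequent frames instead of snapping.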
3. Other issues
1. The sound and special effects are out of sync
This problem cannot be shown in a gif, but users notice it immediately when listening to music: the spectrum effect jumps too early, always one beat ahead of the rhythm we actually hear.
This happens because the custom player is already drawing the processed fft data before the PCM data has been handed to the system's AudioTrack; the player has a custom buffer inside, so the visual effect runs ahead of the audio, which is output with a delay.
Solving this problem is also simple: referring to how the player's source code buffers the PCM data, define a corresponding ByteBuffer that delays the spectrum data by the same amount.
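A minimal sketch of such delay compensation: hold each frame's spectrum data in a FIFO whose depth matches the player's audio buffering, and draw only what falls out the other end (the class is illustrative, and the depth must be tuned to the actual buffer size):

```java
import java.util.ArrayDeque;

public class FrameDelayBuffer {
    private final ArrayDeque<float[]> queue = new ArrayDeque<>();
    private final int delayFrames;

    public FrameDelayBuffer(int delayFrames) {
        this.delayFrames = delayFrames;
    }

    /** Pushes the newest frame; returns the frame to draw now, or null while filling. */
    public float[] offer(float[] frame) {
        queue.addLast(frame);
        return queue.size() > delayFrames ? queue.removeFirst() : null;
    }
}
```

With a depth of 2, the spectrum drawn at any moment is the one computed two frames ago, canceling the player's two-frame head start over the speaker output.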
2. Drawing scheme
For the spectrum-drawing technical scheme, one option is the system Canvas API with a custom View, executing the related data calculation and processing on the CPU.
Considering that the spectrum animation is display-only UI with no direct, complex user interaction, and for performance reasons, we finally chose OpenGL, handing the data calculations over to the GPU for computation and rendering via matrix transforms.
Summary
This article has given a general description of the overall process of implementing spectrum effects on Android. Since spectrum animation is a highly customized effect, readers need not dwell on the details; focus instead on the solutions to the practical problems, and in actual development apply them flexibly according to the needs of your own product.