Simple mute-based noise reduction for Android recording

Requirement:
Customers reported a lot of noise in recordings made with the product. The cause is that we set the recording gain of the codec to the maximum, and the board has no dedicated audio-processing chip: the MIC is wired directly to the CPU (with ground shielding). Since neither the enclosure nor the hardware can be modified, the problem has to be solved, or at least reduced, in software.

The first idea was dual-microphone noise reduction. Roughly, the principle is as follows: the main microphone picks up the call audio while the second microphone picks up ambient noise; the noise waveform is analyzed, phase-inverted, and superimposed on the samples from the main microphone so that the noise cancels out. The drawbacks are that the two microphones must not be too close to each other, and neither may be too far from the person speaking. If the speaker is too far away, the angle between the two microphones becomes very small and their signals can no longer be told apart. In addition, depending on how the product is used, either the upper or the lower microphone may end up acting as the main microphone. As a result, our experimental results were not very good.

When a human voice is present, the recording noise is either inaudible or has very little impact; the ambient noise is only obvious during silence. So I decided to sidestep the problem by muting the silent stretches.

This article covers only this simple mute-based noise reduction. The principle is as follows: after recording starts, it usually takes a while (for example, 0.5 s) before anyone speaks. The noise level (the threshold) can be estimated from that first 0.5 s and then used as the baseline for detecting where the "human voice" starts. Until the voice arrives, all audio data is set to 0, that is, muted, which is why this is called mute noise reduction. Once the voice arrives, the actual audio data (including the noise in it) is returned. The threshold is computed simply by summing the samples and averaging.
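As a rough illustration of the summing-and-averaging idea, here is a minimal standalone sketch; it is not the HAL code itself. The names `meanPositiveLevel`, `NoiseThreshold`, `kFramesNeeded` and `kMargin` are made up for this example, and the sketch assumes frames of signed 16-bit PCM that are small enough for the sum to fit in 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical tuning constants for this sketch.
constexpr int kFramesNeeded = 10;   // noise frames to see before adapting
constexpr uint32_t kMargin = 2;     // voice must exceed noise level * kMargin

// Mean of the non-negative samples in one frame of 16-bit PCM.
// Returns 0 when the frame has no non-negative sample (no division by zero).
inline uint32_t meanPositiveLevel(const int16_t* samples, size_t count) {
    assert(samples != nullptr || count == 0);
    uint32_t sum = 0;
    uint32_t used = 0;
    for (size_t i = 0; i < count; ++i) {
        if (samples[i] >= 0) {
            sum += static_cast<uint32_t>(samples[i]);
            ++used;
        }
    }
    return used ? sum / used : 0;
}

// Learns the noise level from frames that are classified as noise.
struct NoiseThreshold {
    uint32_t voice_threshold = 0x400;  // default until enough noise is seen
    uint32_t noise_level = 0;          // running average of noise frames
    int noise_frames = 0;

    // Feeds one frame level; returns true if the frame counts as voice.
    bool update(uint32_t level) {
        if (level > voice_threshold) {
            return true;
        }
        noise_level = (noise_frames == 0) ? level : (noise_level + level) / 2;
        if (++noise_frames >= kFramesNeeded) {
            noise_frames = kFramesNeeded;
            voice_threshold = noise_level * kMargin;
        }
        return false;
    }
};
```

Each captured frame would be passed through `meanPositiveLevel` and the result fed to `update`; a frame is muted for as long as `update` keeps returning false. The factor in `kMargin` is a guess that has to be tuned on the actual product.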

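The following code is implemented in hardware/alsa_sound/AudioStreamInALSA.cpp on the RK (Rockchip) platform. The first block holds the global definitions; the second block runs on each captured frame and works on the `buffer` and `bytes` of the read call.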


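#define MUTE_NOISE_REDUCTION
#ifdef MUTE_NOISE_REDUCTION
bool enable_reduction_noise = false;	// controlled by the property sys.is.audiorecord.only

int threshold_def = 0x400;	// default voice-detection threshold
int threshold = 0;	// adaptive noise threshold
int threshold_count = 0;	// noise-frame counter; once it reaches THRESHOLD_COUNT, threshold is used to detect "human voice"
#define THRESHOLD_COUNT 10

#define MUTE_DELAY_COUNT 15		// number of frames kept unmuted after voice was last detected

#define AUDIO_BUFFER_NUM 4		// number of frames to buffer
#define AUDIO_BUFFER_SIZE 1024	// size of one audio frame in bytes
char *audio_buffer[AUDIO_BUFFER_NUM];	// buffered audio frames (their allocation is not shown here)
char *audio_buffer_temp;	// scratch frame used when swapping audio data
int audio_buffer_pos=0;
#endif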

#ifdef MUTE_NOISE_REDUCTION
    {
        unsigned int value = 0;
        int is_voice = 0;
        static int is_mute_delay_count;	// frames left before muting resumes (a static starts at 0)
        //ALOGE("in_begin_swip_num:%d in_begin_narrow_num=%d",in_begin_swip_num,in_begin_narrow_num);

        // Handle at most one frame per call so the data fits the buffers.
        if(enable_reduction_noise && bytes > AUDIO_BUFFER_SIZE){
            bytes = AUDIO_BUFFER_SIZE;
        }

        if(enable_reduction_noise){
            unsigned char * buffer_temp=(unsigned char *)buffer;
            unsigned int total = 0;
            unsigned int total_count=0;
            unsigned int total_temp = 0;
            short data16;
            int j = 0;
            for(j=0; j+1<bytes; j=j+2){	// j+1<bytes: never read past an odd-sized buffer
                value = buffer_temp[j+1];	// the second byte is the high byte
                value = (value<<8)+buffer_temp[j];	// assemble one 16-bit audio sample
                data16 = value&0xFFFF;
                if( (data16 & 0x8000) == 0){	// non-negative sample
                    total +=data16;		// cannot overflow: at most 512 samples of at most 0x7FFF each
                    total_count++;		// count the samples that were summed
                }
            }

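            total_temp = (total_count != 0) ? total/total_count : 0;	// avoid dividing by zero when no sample was counted
            if(total_temp > threshold_def){
                is_voice++;		// human voice detected
            }else {	// noise
                if(threshold_count == 0){
                    threshold = total_temp;
                }else{
                    threshold = (threshold+total_temp)/2;
                }
                threshold_count++;
                if(threshold_count >= THRESHOLD_COUNT){
                    threshold_def = threshold*2;	// update the threshold; the factor 2 has to be tuned by experiment on the product
                    threshold_count = THRESHOLD_COUNT;	// keep using the adaptive threshold from now on, until recording stops
                }
            }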

            // is_mute_delay_count: if voice was passed through earlier, keep passing audio for MUTE_DELAY_COUNT more frames after the speaker stops, so the sound does not cut off abruptly.
            if( is_voice != 0 ){
                is_mute_delay_count=MUTE_DELAY_COUNT;
            }else{
                if(is_mute_delay_count != 0)
                    is_mute_delay_count--;
            }

            // audio_buffer: when voice is detected, the short stretch of audio just before it has to be returned as well; otherwise there is an audible pop when the output jumps from silence to voice.
            // audio_buffer holds AUDIO_BUFFER_NUM frames for that purpose.
            if(is_mute_delay_count == 0){	// mute in order to remove noise
                memcpy(audio_buffer[audio_buffer_pos], (char *)buffer, bytes);	// buffer the frame
                memset(buffer, 0, bytes);	// return silence
            }else {
                memcpy(audio_buffer_temp, (char *)buffer, bytes);
                memcpy((char *)buffer, audio_buffer[audio_buffer_pos], bytes);	// return the older buffered frame
                memcpy(audio_buffer[audio_buffer_pos], (char *)audio_buffer_temp, bytes); 	// store the new frame
            }
            audio_buffer_pos++;
            if(audio_buffer_pos>=AUDIO_BUFFER_NUM)
                audio_buffer_pos=0;
        }
    }
#endif
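
To look at the hold counter and the frame ring on their own, the same logic can be sketched as a small standalone class. This is only an illustration under assumptions: the class name `SilenceGate` and its parameters are made up here, the voice/noise decision is passed in by the caller, and the buffers are owned by `std::vector` rather than by raw pointers as in the HAL code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical standalone version of the mute / look-back logic.
class SilenceGate {
public:
    SilenceGate(size_t frame_bytes, size_t lookback_frames, int hold_frames)
        : frame_bytes_(frame_bytes),
          hold_frames_(hold_frames),
          ring_(lookback_frames, std::vector<uint8_t>(frame_bytes, 0)),
          scratch_(frame_bytes, 0) {
        assert(frame_bytes > 0 && lookback_frames > 0 && hold_frames >= 0);
    }

    // Processes one frame in place; `is_voice` is the caller's decision.
    void process(uint8_t* frame, bool is_voice) {
        if (is_voice) {
            hold_left_ = hold_frames_;      // (re)open the gate
        } else if (hold_left_ > 0) {
            --hold_left_;                   // gate closes once this hits 0
        }

        std::vector<uint8_t>& slot = ring_[pos_];
        if (hold_left_ == 0) {
            // Gate closed: remember the frame, hand back silence.
            std::memcpy(slot.data(), frame, frame_bytes_);
            std::memset(frame, 0, frame_bytes_);
        } else {
            // Gate open: hand back the oldest buffered frame, store the new one.
            std::memcpy(scratch_.data(), frame, frame_bytes_);
            std::memcpy(frame, slot.data(), frame_bytes_);
            std::memcpy(slot.data(), scratch_.data(), frame_bytes_);
        }
        pos_ = (pos_ + 1) % ring_.size();
    }

private:
    size_t frame_bytes_;
    int hold_frames_;
    int hold_left_ = 0;
    size_t pos_ = 0;
    std::vector<std::vector<uint8_t>> ring_;
    std::vector<uint8_t> scratch_;
};
```

While the gate is open, the output lags the input by `lookback_frames` frames, so `hold_frames` should be larger than `lookback_frames`; otherwise the gate closes before the buffered voice frames have been returned. With the values in the code above (MUTE_DELAY_COUNT 15, AUDIO_BUFFER_NUM 4) that condition holds.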

Origin blog.csdn.net/suwen8100/article/details/117068805