【Data Compression 8】MPEG Audio Coding

Experimental principle

1. The design idea of perceptual audio coding

The MPEG-1 audio encoder consists of two paths:

1. After the input is read, the PCM samples are transformed by the filter bank into 32 sub-band signals in the frequency domain, which are then quantized and coded per sub-band. This is the main path.

2. The psychoacoustic model and dynamic bit allocation determine how many quantization bits each sub-band receives, which removes perceptually redundant information.

2. Implementation of the psychoacoustic model

1. Human auditory system

Whether a person hears a sound depends on its frequency and on whether its amplitude is above the hearing threshold at that frequency.

The human auditory system behaves roughly like a bank of parallel band-pass filters with different center frequencies. The filter whose center frequency matches the signal frequency responds most strongly, while filters whose center frequencies lie far from the signal frequency produce no response. The bank consists of 25 overlapping band-pass filters covering the range from 0 Hz to 20 kHz.

2. Critical band

1. Critical band: when a pure tone is masked by continuous noise centered on the tone's frequency, the critical bandwidth is the noise bandwidth at which the power of the just-audible tone equals the noise power within that band.

2. It is generally accepted that there are 25 critical bands between 20 Hz and 16 kHz. The unit is the Bark: 1 Bark is the width of one critical band.

3. Masking value

When several maskers are present at the same time, the combined masking effect can be modeled as each masker acting independently, with the individual contributions adding linearly.

1. When two signals overlap and fall within the same critical band, their masking contributions can be added linearly.

2. The spectrum of a complex audio signal can be divided into a series of discrete segments, each one critical band wide, so that the segments do not overlap. Each segment acts as one masker, and its sound pressure level is obtained by linearly summing the short-term power spectral density over the corresponding critical band.

4. Polyphase filter banks

The polyphase filter bank splits the input signal into 32 sub-bands.

5. Psychoacoustic Model Ⅰ

Model I has low computational complexity, but it compresses the components it assumes to be inaudible too aggressively.

Procedure

1. Transform the samples into the frequency domain

The 32 equal-width sub-band signals do not accurately reflect the auditory characteristics of the human ear, so an FFT is used to compensate for the filter bank's insufficient frequency resolution.

Layer 1 has 384 samples per frame, so a 512-point window is used.

Layers 2 and 3 have 1152 samples per frame and use a 1024-point window. The analysis is run twice per frame, and for each sub-band the larger of the two signal-to-mask ratios (SMR) is kept, which corresponds to the lower of the two masking thresholds.
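
The frame sizes follow from the filter bank layout: 384 = 12 × 32 and 1152 = 3 × 12 × 32, matching `SCALE_BLOCK` and `SBLIMIT` in the encoder source. The helpers below are a small sketch of that arithmetic; their names are hypothetical and not part of the encoder.

```c
/* Samples per frame for each MPEG-1 audio layer (1, 2 or 3). */
int samples_per_frame(int layer)
{
    return (layer == 1) ? 384 : 1152;
}

/* FFT window length used for the spectral analysis of that layer. */
int fft_window_size(int layer)
{
    return (layer == 1) ? 512 : 1024;
}

/* Frame duration in milliseconds at the given sampling rate. */
double frame_duration_ms(int layer, double sample_rate_hz)
{
    return 1000.0 * samples_per_frame(layer) / sample_rate_hz;
}
```

At 48 kHz, for example, a Layer 1 frame lasts 8 ms and a Layer 2 frame lasts 24 ms.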

2. Determine the sound pressure level

3. Take the threshold in quiet into account
The standard provides tables of frequency, critical-band rate, and absolute threshold, one for each sampling rate of the input PCM signal.

4. Decompose the audio signal into tonal and non-tonal (noise-like) components,
because the two kinds of component have different masking ability.

5. Decimation of tonal and non-tonal masking components
Components below the absolute threshold given in the standard are discarded, and among components less than 0.5 Bark apart only the one with the highest power is kept.

6. Calculation of the individual masking thresholds
The individual masking thresholds of the tonal and non-tonal components are obtained with the algorithm given in the standard.
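
The decimation step can be sketched as a single pass over maskers sorted by critical-band rate. This is a simplified illustration, not the procedure from the standard: the `masker` struct and the function `decimate_maskers` are hypothetical, and the 0.5 Bark rule is applied to every component alike.

```c
#include <stddef.h>

/* One masking component: position on the Bark scale, level in dB, and the
 * absolute threshold (threshold in quiet) at that position in dB. */
typedef struct {
    double bark;
    double level_db;
    double abs_threshold_db;
} masker;

/* Decimate maskers in place. The input must be sorted by ascending bark.
 * Step 1 drops components below the absolute threshold; step 2 keeps only
 * the stronger of two survivors that lie less than 0.5 Bark apart.
 * Returns the number of maskers kept at the front of the array. */
size_t decimate_maskers(masker *m, size_t count)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        if (m[i].level_db < m[i].abs_threshold_db)
            continue;                       /* inaudible: drop */
        if (kept > 0 && m[i].bark - m[kept - 1].bark < 0.5) {
            if (m[i].level_db > m[kept - 1].level_db)
                m[kept - 1] = m[i];         /* stronger neighbour wins */
            continue;
        }
        m[kept++] = m[i];
    }
    return kept;
}
```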

7. Calculation of global masking threshold
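
In the form commonly given for ISO/IEC 11172-3 psychoacoustic model 1, the global threshold at frequency index i is the power sum of the threshold in quiet and all individual masking thresholds. The symbols below (LT_q for the threshold in quiet, LT_tm and LT_nm for the thresholds of the m tonal and n non-tonal maskers, all in dB) are notation assumed here for illustration:

```latex
LT_g(i) = 10 \log_{10}\!\left( 10^{LT_q(i)/10}
        + \sum_{j=1}^{m} 10^{LT_{tm}(j,i)/10}
        + \sum_{j=1}^{n} 10^{LT_{nm}(j,i)/10} \right)
```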

8. Masking threshold of each sub-band
The smallest global threshold within a sub-band is taken as that sub-band's masking threshold.

9. Calculate the signal-to-mask ratio (SMR) of each sub-band
SMR = signal energy / masking threshold
The SMR is passed to the coding unit.
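
Steps 8 and 9 can be sketched as follows, assuming the thresholds and signal levels are already in dB, so that the ratio above becomes a subtraction. The function names are hypothetical and do not come from the encoder source.

```c
#include <stddef.h>

/* Smallest global masking threshold (dB) among the frequency lines
 * that belong to one sub-band. `count` must be at least 1. */
double subband_min_threshold_db(const double *threshold_db, size_t count)
{
    double min_db = threshold_db[0];
    for (size_t i = 1; i < count; i++)
        if (threshold_db[i] < min_db)
            min_db = threshold_db[i];
    return min_db;
}

/* Signal-to-mask ratio in dB: a ratio of powers becomes a difference
 * of levels once both quantities are expressed in dB. */
double smr_db(double signal_level_db, double min_threshold_db)
{
    return signal_level_db - min_threshold_db;
}
```

A sub-band with a 60 dB signal level and a minimum threshold of 22.5 dB has an SMR of 37.5 dB. A larger SMR means the sub-band needs more bits to keep quantization noise below the masking threshold.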

Experimental procedure

1. Output audio sampling rate and target bit rate

Set the command-line arguments.

2. main function

int frameNum = 0;
 
int main(int argc, char** argv)
{
	typedef double SBS[2][3][SCALE_BLOCK][SBLIMIT];   // SCALE_BLOCK=12,SBLIMIT=32
	SBS* sb_sample;
	typedef double JSBS[3][SCALE_BLOCK][SBLIMIT];
	JSBS* j_sample;
	typedef double IN[2][HAN_SIZE];   //HAN_SIZE=512
	IN* win_que;
	typedef unsigned int SUB[2][3][SCALE_BLOCK][SBLIMIT];
	SUB* subband;
 
	frame_info frame;     
	frame_header header;    
	char original_file_name[MAX_NAME_SIZE];     
	char encoded_file_name[MAX_NAME_SIZE];      
	short** win_buf;
	static short buffer[2][1152];     
	static unsigned int bit_alloc[2][SBLIMIT];     
	static unsigned int scfsi[2][SBLIMIT];
	static unsigned int scalar[2][3][SBLIMIT];     
	static unsigned int j_scale[3][SBLIMIT];
	static double smr[2][SBLIMIT], lgmin[2][SBLIMIT], max_sc[2][SBLIMIT];    
	// FLOAT snr32[32];
	short sam[2][1344];		/* was [1056]; */
	int model, nch, error_protection;    
	static unsigned int crc;
	int sb, ch, adb;      
	unsigned long frameBits, sentBits = 0;
	unsigned long num_samples;
	int lg_frame;
	int i;
	FILE* result = NULL;
	result = fopen("result.txt", "wb");
 
	/* Used to keep the SNR values for the fast/quick psy models */
	static FLOAT smrdef[2][32];
 
	static int psycount = 0;
	extern int minimum;
 
	time_t start_time, end_time;
	int total_time;
 
	sb_sample = (SBS*)mem_alloc(sizeof(SBS), "sb_sample");
	j_sample = (JSBS*)mem_alloc(sizeof(JSBS), "j_sample");
	win_que = (IN*)mem_alloc(sizeof(IN), "Win_que");
	subband = (SUB*)mem_alloc(sizeof(SUB), "subband");
	win_buf = (short**)mem_alloc(sizeof(short*) * 2, "win_buf");    
 
	/* clear buffers */
	memset((char*)buffer, 0, sizeof(buffer));
	memset((char*)bit_alloc, 0, sizeof(bit_alloc));
	memset((char*)scalar, 0, sizeof(scalar));
	memset((char*)j_scale, 0, sizeof(j_scale));
	memset((char*)scfsi, 0, sizeof(scfsi));
	memset((char*)smr, 0, sizeof(smr));
	memset((char*)lgmin, 0, sizeof(lgmin));
	memset((char*)max_sc, 0, sizeof(max_sc));
	//memset ((char *) snr32, 0, sizeof (snr32));
	memset((char*)sam, 0, sizeof(sam));
 
	global_init();     
 
	header.extension = 0;
	frame.header = &header;
	frame.tab_num = -1;		/* no table loaded */
	frame.alloc = NULL;
	header.version = MPEG_AUDIO_ID;	/* Default: MPEG-1 */     //MPEG_AUDIO_ID=1
 
	total_time = 0;
 
	time(&start_time);
 
	programName = argv[0];
	if (argc == 1)		/* no command-line args */  
		short_usage();
	else
		parse_args(argc, argv, &frame, &model, &num_samples, original_file_name,
			encoded_file_name);     
	print_config(&frame, &model, original_file_name, encoded_file_name);   
 
	/* this will load the alloc tables and do some other stuff */
	hdr_to_frps(&frame);
	nch = frame.nch;
	error_protection = header.error_protection;
 
 
 
	while (get_audio(musicin, buffer, num_samples, nch, &header) > 0) {
		if (glopts.verbosity > 1)
			if (++frameNum % 10 == 0)
				fprintf(stderr, "[%4u]\r", frameNum);
		fflush(stderr);
		win_buf[0] = &buffer[0][0];
		win_buf[1] = &buffer[1][0];
 
		adb = available_bits(&header, &glopts);   // bit budget for this frame
		lg_frame = adb / 8;
		if (header.dab_extension) {
			/* in 24 kHz we always have 4 bytes */
			if (header.sampling_frequency == 1)
				header.dab_extension = 4;
			/* You must have one frame in memory if you are in DAB mode                 */
			/* in conformity of the norme ETS 300 401 http://www.etsi.org               */
				  /* see bitstream.c            */
			if (frameNum == 1)
				minimum = lg_frame + MINIMUM;
			adb -= header.dab_extension * 8 + header.dab_length * 8 + 16;
		}
 
		int totalbit = adb;
 
		{
			int gr, bl, ch;
			/* New polyphase filter
		   Combines windowing and filtering. Ricardo Feb'03 */
			for (gr = 0; gr < 3; gr++)     
				for (bl = 0; bl < 12; bl++)     
					for (ch = 0; ch < nch; ch++)
						WindowFilterSubband(&buffer[ch][gr * 12 * 32 + 32 * bl], ch,
							&(*sb_sample)[ch][gr][bl][0]);     
		}
 
		if (frameNum == 5) {
			// print a header line for this dump
			fprintf(result, "Frame %d\n", frameNum);

			// print the number of available bits
			fprintf(result, "Allocated bits: %d\n", totalbit);

			// print the scale factors
			fprintf(result, "Scale factors:\n");
			for (int i = 0; i < nch; i++) {
				fprintf(result, "Channel [%d]:\n", i + 1);

				for (int j = 0; j < frame.sblimit; j++) {
					if (j % 4 == 0 && j != 0) {
						fprintf(result, "\n");
					}

					fprintf(result, "Subband %d:\t", j);

					for (int k = 0; k < 3; k++) {
						fprintf(result, "%u\t", scalar[i][k][j]);
					}

					fprintf(result, "\t");
				}

				fprintf(result, "\n");
			}

			fprintf(result, "\n");

			// print the bit allocation result
			fprintf(result, "\nBit allocation:\n");
			for (int i = 0; i < nch; i++) {
				fprintf(result, "Channel [%d]:\n", i + 1);
				for (int j = 0; j < frame.sblimit; j++) {
					if (j % 4 == 0 && j != 0) {
						fprintf(result, "\n");
					}

					fprintf(result, "Subband %d:\t%u\t", j, bit_alloc[i][j]);
				}

				fprintf(result, "\n");
			}

			fflush(result);
		}
 
 
#ifdef REFERENCECODE
		{
			/* Old code. left here for reference */
			int gr, bl, ch;
			for (gr = 0; gr < 3; gr++)
				for (bl = 0; bl < SCALE_BLOCK; bl++)
					for (ch = 0; ch < nch; ch++) {
						window_subband(&win_buf[ch], &(*win_que)[ch][0], ch);
						filter_subband(&(*win_que)[ch][0], &(*sb_sample)[ch][gr][bl][0]);
					}
		}
#endif
 
 
#ifdef NEWENCODE
		scalefactor_calc_new(*sb_sample, scalar, nch, frame.sblimit);
		find_sf_max(scalar, &frame, max_sc);
		if (frame.actual_mode == MPG_MD_JOINT_STEREO) {
			/* this way we calculate more mono than we need */
			/* but it is cheap */
			combine_LR_new(*sb_sample, *j_sample, frame.sblimit);
			scalefactor_calc_new(j_sample, &j_scale, 1, frame.sblimit);
		}
#else
		scale_factor_calc(*sb_sample, scalar, nch, frame.sblimit);   
		pick_scale(scalar, &frame, max_sc);     
		if (frame.actual_mode == MPG_MD_JOINT_STEREO) {
			/* this way we calculate more mono than we need */
			/* but it is cheap */
			combine_LR(*sb_sample, *j_sample, frame.sblimit);
			scale_factor_calc(j_sample, &j_scale, 1, frame.sblimit);
		}
#endif
 
 
 
		if ((glopts.quickmode == TRUE) && (++psycount % glopts.quickcount != 0)) {
			/* We're using quick mode, so we're only calculating the model every
			   'quickcount' frames. Otherwise, just copy the old ones across */
			for (ch = 0; ch < nch; ch++) {   // nch is the number of channels
				for (sb = 0; sb < SBLIMIT; sb++)    //SBLIMIT=32
					smr[ch][sb] = smrdef[ch][sb];
			}
		}
		else {
			/* calculate the psymodel */
			switch (model) {
			case -1:
				psycho_n1(smr, nch);
				break;
			case 0:	/* Psy Model A */
				psycho_0(smr, nch, scalar, (FLOAT)s_freq[header.version][header.sampling_frequency] * 1000);
				break;
			case 1:
				psycho_1(buffer, max_sc, smr, &frame);
				break;
			case 2:
				for (ch = 0; ch < nch; ch++) {
					psycho_2(&buffer[ch][0], &sam[ch][0], ch, &smr[ch][0], //snr32,
						(FLOAT)s_freq[header.version][header.sampling_frequency] *
						1000, &glopts);
				}
				break;
			case 3:
				/* Modified psy model 1 */
				psycho_3(buffer, max_sc, smr, &frame, &glopts);
				break;
			case 4:
				/* Modified Psycho Model 2 */
				for (ch = 0; ch < nch; ch++) {
					psycho_4(&buffer[ch][0], &sam[ch][0], ch, &smr[ch][0], // snr32,
						(FLOAT)s_freq[header.version][header.sampling_frequency] *
						1000, &glopts);
				}
				break;
			case 5:
				/* Model 5 compares model 1 and 3 */
				psycho_1(buffer, max_sc, smr, &frame);
				fprintf(stdout, "1 ");
				smr_dump(smr, nch);
				psycho_3(buffer, max_sc, smr, &frame, &glopts);
				fprintf(stdout, "3 ");
				smr_dump(smr, nch);
				break;
			case 6:
				/* Model 6 compares model 2 and 4 */
				for (ch = 0; ch < nch; ch++)
					psycho_2(&buffer[ch][0], &sam[ch][0], ch, &smr[ch][0], //snr32,
						(FLOAT)s_freq[header.version][header.sampling_frequency] *
						1000, &glopts);
				fprintf(stdout, "2 ");
				smr_dump(smr, nch);
				for (ch = 0; ch < nch; ch++)
					psycho_4(&buffer[ch][0], &sam[ch][0], ch, &smr[ch][0], // snr32,
						(FLOAT)s_freq[header.version][header.sampling_frequency] *
						1000, &glopts);
				fprintf(stdout, "4 ");
				smr_dump(smr, nch);
				break;
			case 7:
				fprintf(stdout, "Frame: %i\n", frameNum);
				/* Dump the SMRs for all models */
				psycho_1(buffer, max_sc, smr, &frame);
				fprintf(stdout, "1");
				smr_dump(smr, nch);
				psycho_3(buffer, max_sc, smr, &frame, &glopts);
				fprintf(stdout, "3");
				smr_dump(smr, nch);
				for (ch = 0; ch < nch; ch++)
					psycho_2(&buffer[ch][0], &sam[ch][0], ch, &smr[ch][0], //snr32,
						(FLOAT)s_freq[header.version][header.sampling_frequency] *
						1000, &glopts);
				fprintf(stdout, "2");
				smr_dump(smr, nch);
				for (ch = 0; ch < nch; ch++)
					psycho_4(&buffer[ch][0], &sam[ch][0], ch, &smr[ch][0], // snr32,
						(FLOAT)s_freq[header.version][header.sampling_frequency] *
						1000, &glopts);
				fprintf(stdout, "4");
				smr_dump(smr, nch);
				break;
			case 8:
				/* Compare 0 and 4 */
				psycho_n1(smr, nch);
				fprintf(stdout, "0");
				smr_dump(smr, nch);
 
				for (ch = 0; ch < nch; ch++)
					psycho_4(&buffer[ch][0], &sam[ch][0], ch, &smr[ch][0], // snr32,
						(FLOAT)s_freq[header.version][header.sampling_frequency] *
						1000, &glopts);
				fprintf(stdout, "4");
				smr_dump(smr, nch);
				break;
			default:
				fprintf(stderr, "Invalid psy model specification: %i\n", model);
				exit(0);
			}
 
			if (glopts.quickmode == TRUE)
				/* copy the smr values and reuse them later */
				for (ch = 0; ch < nch; ch++) {
					for (sb = 0; sb < SBLIMIT; sb++)
						smrdef[ch][sb] = smr[ch][sb];
				}
 
			if (glopts.verbosity > 4)
				smr_dump(smr, nch);
 
		}
 
#ifdef NEWENCODE
		sf_transmission_pattern(scalar, scfsi, &frame);
		main_bit_allocation_new(smr, scfsi, bit_alloc, &adb, &frame, &glopts);
		//main_bit_allocation (smr, scfsi, bit_alloc, &adb, &frame, &glopts);
 
		if (error_protection)
			CRC_calc(&frame, bit_alloc, scfsi, &crc);
 
		write_header(&frame, &bs);
		//encode_info (&frame, &bs);
		if (error_protection)
			putbits(&bs, crc, 16);
		write_bit_alloc(bit_alloc, &frame, &bs);
		//encode_bit_alloc (bit_alloc, &frame, &bs);
		write_scalefactors(bit_alloc, scfsi, scalar, &frame, &bs);
		//encode_scale (bit_alloc, scfsi, scalar, &frame, &bs);
		subband_quantization_new(scalar, *sb_sample, j_scale, *j_sample, bit_alloc,
			*subband, &frame);
		//subband_quantization (scalar, *sb_sample, j_scale, *j_sample, bit_alloc,
		//	  *subband, &frame);
		write_samples_new(*subband, bit_alloc, &frame, &bs);
		//sample_encoding (*subband, bit_alloc, &frame, &bs);
#else
		transmission_pattern(scalar, scfsi, &frame);    
		main_bit_allocation(smr, scfsi, bit_alloc, &adb, &frame, &glopts);    
		if (error_protection)  
			CRC_calc(&frame, bit_alloc, scfsi, &crc);
		encode_info(&frame, &bs);   
		if (error_protection)
			encode_CRC(crc, &bs);
		encode_bit_alloc(bit_alloc, &frame, &bs);              
		encode_scale(bit_alloc, scfsi, scalar, &frame, &bs);    
		subband_quantization(scalar, *sb_sample, j_scale, *j_sample, bit_alloc,
			*subband, &frame);      
		sample_encoding(*subband, bit_alloc, &frame, &bs);
#endif


	/* If not all the bits were used, write out a stack of zeros */
		for (i = 0; i < adb; i++)    // write to the bitstream
			put1bit(&bs, 0);     // write the remaining unallocated bits as zero padding
		if (header.dab_extension) {
			/* Reserve some bytes for X-PAD in DAB mode */
			putbits(&bs, 0, header.dab_length * 8);       // write to the bitstream
 
			for (i = header.dab_extension - 1; i >= 0; i--) {
				CRC_calcDAB(&frame, bit_alloc, scfsi, scalar, &crc, i);
				/* this crc is for the previous frame in DAB mode  */
				if (bs.buf_byte_idx + lg_frame < bs.buf_size)
					bs.buf[bs.buf_byte_idx + lg_frame] = crc;
				/* reserved 2 bytes for F-PAD in DAB mode  */
				putbits(&bs, crc, 8);
			}
			putbits(&bs, 0, 16);
		}
 
		frameBits = sstell(&bs) - sentBits;
 
		if (frameBits % 8) {	/* a program failure */
			fprintf(stderr, "Sent %ld bits = %ld slots plus %ld\n", frameBits,
				frameBits / 8, frameBits % 8);
			fprintf(stderr, "If you are reading this, the program is broken\n");
			fprintf(stderr, "email [mfc at NOTplanckenerg.com] without the NOT\n");
			fprintf(stderr, "with the command line arguments and other info\n");
			exit(0);
		}
 
		sentBits += frameBits;
 
	}
 
	close_bit_stream_w(&bs);
 
	if ((glopts.verbosity > 1) && (glopts.vbr == TRUE)) {
		int i;
#ifdef NEWENCODE
		extern int vbrstats_new[15];
#else
		extern int vbrstats[15];
#endif
		fprintf(stdout, "VBR stats:\n");
		for (i = 1; i < 15; i++)
			fprintf(stdout, "%4i ", bitrate[header.version][i]);
		fprintf(stdout, "\n");
		for (i = 1; i < 15; i++)
#ifdef NEWENCODE
			fprintf(stdout, "%4i ", vbrstats_new[i]);
#else
			fprintf(stdout, "%4i ", vbrstats[i]);
#endif
		fprintf(stdout, "\n");
	}
 
	fprintf(stderr,
		"Avg slots/frame = %.3f; b/smp = %.2f; bitrate = %.3f kbps\n",
		(FLOAT)sentBits / (frameNum * 8),
		(FLOAT)sentBits / (frameNum * 1152),
		(FLOAT)sentBits / (frameNum * 1152) *
		s_freq[header.version][header.sampling_frequency]);
 
	if (fclose(musicin) != 0) {
		fprintf(stderr, "Could not close \"%s\".\n", original_file_name);
		exit(2);
	}
 
	fprintf(stderr, "\nDone\n");
	fclose(result);
 
	time(&end_time);
	total_time = end_time - start_time;
	printf("total time is %d\n", total_time);
 
	exit(0);
}
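
The bit budget that `available_bits` returns in the loop above is bounded by the frame length. As a rough cross-check, and assuming a constant bit rate with no padding, a Layer II frame of 1152 samples carries about 1152 × bit rate / sampling rate bits. The helper below is a hypothetical stand-in for that arithmetic, not the encoder's own `available_bits`.

```c
/* Approximate bits per Layer II frame for a constant bit rate stream.
 * Ignores padding slots, so the result is exact only when the division
 * has no remainder (for example at 48 kHz). */
long layer2_frame_bits(long bitrate_bps, long sample_rate_hz)
{
    return 1152L * bitrate_bps / sample_rate_hz;
}
```

At 192 kbps and 48 kHz this gives 4608 bits (576 bytes) per frame. The header, bit allocation, scale factors, and quantized samples all have to fit within that budget.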

Experimental results

[Figures: results at frame 20, including scale factors, for the music, noise, and blended test signals]

It can be observed that the higher the frequency (the larger the sub-band number), the fewer bits are allocated.

Origin blog.csdn.net/ppinecone/article/details/125792675