Using the bitmap method for duplicate checking

Introduction to bitmaps

The C++ standard library provides a corresponding container, std::bitset, which is essentially an implementation of the bitmap method.
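
As a rough illustration of how std::bitset can be used for this kind of check, here is a minimal sketch. The function name hasDuplicate and the bound kMaxValue are assumptions made for the example, and the input values are assumed to lie in [0, kMaxValue].

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Assumed upper bound on the input values (hypothetical, chosen for the example).
constexpr std::size_t kMaxValue = 100000;

// Returns true if any value in nums occurs more than once.
// Assumes every value lies in [0, kMaxValue]; test() throws std::out_of_range otherwise.
bool hasDuplicate(const std::vector<unsigned>& nums) {
    std::bitset<kMaxValue + 1> seen;  // one bit per possible value, all initially 0
    for (unsigned n : nums) {
        if (seen.test(n)) {
            return true;  // bit already set: n was seen before
        }
        seen.set(n);      // record n
    }
    return false;
}
```

The size of a std::bitset is a compile-time constant, so this form only fits when the upper bound is known when the program is compiled.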

A bitmap uses a single bit to store the state of each value. It suits large data sets in which each item has only a few possible states, and it is typically used to check whether a given value is present.

To apply the bitmap method, you need to know the maximum value in the data in advance. If the maximum value is Max=100000, the unsigned char array needs size=Max/8+1 bytes.
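
The size calculation can be sketched as follows. The helper names bitmapBytes and makeBitmap are hypothetical, and std::vector is used here in place of a raw array only so that the storage is zero-initialised and released automatically.

```cpp
#include <cstddef>
#include <vector>

// Number of bytes needed to hold one bit for every value in [0, maxValue].
std::size_t bitmapBytes(std::size_t maxValue) {
    return maxValue / 8 + 1;
}

// Allocates a bitmap with every bit cleared.
std::vector<unsigned char> makeBitmap(std::size_t maxValue) {
    return std::vector<unsigned char>(bitmapBytes(maxValue), 0);
}
```

For Max=100000 this gives 100000/8+1 = 12501 bytes. The +1 covers the bits left over from the integer division, so the value Max itself always has a bit.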

When recording a number, you need to calculate its byte position and bit position.

// Byte position of the integer, i.e. the array index
int index = num / 8;
// Bit position within that byte
int offset = num % 8;
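
To make the index and offset concrete, the two calculations can be wrapped in small helpers. The names setBit and testBit are hypothetical, and the bitmap is assumed to be large enough for num.

```cpp
#include <cstddef>
#include <vector>

// Sets the bit that represents num.
void setBit(std::vector<unsigned char>& bits, std::size_t num) {
    std::size_t index = num / 8;   // which byte
    std::size_t offset = num % 8;  // which bit inside that byte
    bits[index] |= static_cast<unsigned char>(1u << offset);
}

// Returns true if the bit that represents num is set.
bool testBit(const std::vector<unsigned char>& bits, std::size_t num) {
    return ((bits[num / 8] >> (num % 8)) & 1u) != 0;
}
```

For example, num=19 gives index=19/8=2 and offset=19%8=3, so recording 19 sets bit 3 of the third byte, which changes that byte from 0 to 8.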

To check whether a number is a duplicate, perform the following operations:

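// Byte position of the integer, i.e. the array index
int index = num / 8;
// Bit position within that byte
int offset = num % 8;

// Non-zero if the bit for this number is already set
int v = p[index] & (1 << offset);

if (0 != v) {
    cout << vec[i] << " is the first duplicate value" << endl;
    break; // to find every duplicate, remove this break
} else {
    // The value has not been seen yet: set its bit to 1 to record it
    p[index] = p[index] | (1 << offset);
}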
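
Putting the pieces together, one possible complete version of the check looks like this. It is a sketch rather than the original program: the function name firstDuplicate, the return value of -1 for "no duplicate", and the use of std::vector for the bitmap p are assumptions, and all input values are assumed to be non-negative.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the first value that repeats an earlier one, or -1 if there is none.
// Assumes every value in vec is non-negative.
int firstDuplicate(const std::vector<int>& vec) {
    if (vec.empty()) {
        return -1;
    }
    // The bitmap must cover the largest value in the data.
    int maxValue = *std::max_element(vec.begin(), vec.end());
    std::vector<unsigned char> p(static_cast<std::size_t>(maxValue) / 8 + 1, 0);

    for (int num : vec) {
        std::size_t index = static_cast<std::size_t>(num) / 8;  // byte position
        int offset = num % 8;                                   // bit position
        if ((p[index] & (1 << offset)) != 0) {
            return num;  // bit already set: num is a duplicate
        }
        p[index] = static_cast<unsigned char>(p[index] | (1 << offset));
    }
    return -1;
}
```

With the input {12, 78, 90, 78, 12, 8, 9} the function returns 78, because 78 is the first value whose bit is already set when it is read again.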

The drawback of using a bitmap to find duplicates in massive data sets is that the memory taken by the bitmap array depends on the largest value in the data, not on how many values there are. For example, if there are only 10 integers but the largest is 1 billion, the bitmap array still has to be sized for 1 billion: 1000000000/8+1 = 125000001 bytes, roughly 125 MB.

Origin blog.csdn.net/qq_44132777/article/details/114901811