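My hand-written plain C++ code is far slower than SSE code, and writing it each time is tedious, so I compare SSE directly against the OpenCV library instead. OpenCV is still well behind commercial vision libraries, but it is reasonably well optimized. Let's see how its speed compares with SSE.
Below is the SSE code for single-threshold and double-threshold binarization: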
#include <emmintrin.h>        // SSE2 intrinsics
#include <opencv2/core.hpp>

// Single threshold: pixels >= nThreshold become 255, all others become 0.
void ThresholdBinarize(cv::Mat InImg, cv::Mat& OutImg, unsigned char nThreshold)
{
    if (InImg.type() != CV_8UC1)
    {
        return;
    }
    OutImg = cv::Mat(InImg.rows, InImg.cols, CV_8UC1);
    int Width = InImg.cols;
    int Height = InImg.rows;
    const int BlockSize = 16;    // bytes per 128-bit register
    int Block = Width / BlockSize;
    // Broadcast the threshold to all 16 lanes once, outside the loops.
    const __m128i Thresh = _mm_set1_epi8((char)nThreshold);
    for (int Y = 0; Y < Height; Y++)
    {
        // ptr() honors the row stride, so non-continuous inputs (e.g. ROIs) also work.
        const unsigned char *LinePS = InImg.ptr<unsigned char>(Y);
        unsigned char *LinePD = OutImg.ptr<unsigned char>(Y);
        for (int X = 0; X < Block * BlockSize; X += BlockSize)
        {
            __m128i Src1 = _mm_loadu_si128((const __m128i *)(LinePS + X));
            // max(Src1, Thresh) == Src1 holds exactly where Src1 >= Thresh (unsigned).
            __m128i Result = _mm_cmpeq_epi8(_mm_max_epu8(Src1, Thresh), Src1);
            _mm_storeu_si128((__m128i *)(LinePD + X), Result);
        }
        // Scalar tail for the last Width % 16 pixels.
        for (int X = Block * BlockSize; X < Width; X++)
        {
            LinePD[X] = LinePS[X] >= nThreshold ? 255 : 0;
        }
    }
}
// Double threshold: pixels with nMinThreshold <= value <= nMaxThreshold become 255, all others 0.
void ThresholdBinarize(cv::Mat InImg, cv::Mat& OutImg, unsigned char nMinThreshold, unsigned char nMaxThreshold)
{
    if (InImg.type() != CV_8UC1)
    {
        return;
    }
    OutImg = cv::Mat(InImg.rows, InImg.cols, CV_8UC1);
    int Width = InImg.cols;
    int Height = InImg.rows;
    const int BlockSize = 16;    // bytes per 128-bit register
    int Block = Width / BlockSize;
    // Broadcast both thresholds once, outside the loops.
    const __m128i MinT = _mm_set1_epi8((char)nMinThreshold);
    const __m128i MaxT = _mm_set1_epi8((char)nMaxThreshold);
    for (int Y = 0; Y < Height; Y++)
    {
        // ptr() honors the row stride, so non-continuous inputs (e.g. ROIs) also work.
        const unsigned char *LinePS = InImg.ptr<unsigned char>(Y);
        unsigned char *LinePD = OutImg.ptr<unsigned char>(Y);
        for (int X = 0; X < Block * BlockSize; X += BlockSize)
        {
            __m128i Src1 = _mm_loadu_si128((const __m128i *)(LinePS + X));
            // 0xFF where Src1 >= MinT
            __m128i Result = _mm_cmpeq_epi8(_mm_max_epu8(Src1, MinT), Src1);
            // ... AND 0xFF where Src1 <= MaxT
            Result = _mm_and_si128(Result, _mm_cmpeq_epi8(_mm_min_epu8(Src1, MaxT), Src1));
            _mm_storeu_si128((__m128i *)(LinePD + X), Result);
        }
        // Scalar tail for the last Width % 16 pixels.
        for (int X = Block * BlockSize; X < Width; X++)
        {
            LinePD[X] = (LinePS[X] >= nMinThreshold && LinePS[X] <= nMaxThreshold) ? 255 : 0;
        }
    }
}
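Binarizing an image involves a per-pixel test:
if (LinePS[X] >= nThreshold)
    LinePD[X] = 255;
A branch like this is hard to express in SSE, and SSE2 has no unsigned byte comparison (_mm_cmpgt_epi8 treats the bytes as signed), so I used: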
_mm_cmpeq_epi8(_mm_max_epu8(Src1, _mm_set1_epi8(nThreshold)), Src1);
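_mm_max_epu8(Src1, _mm_set1_epi8(nThreshold)) compares each byte of Src1 with nThreshold and keeps the larger of the two: values in Src1 that are at least nThreshold stay unchanged, and smaller values are raised to nThreshold.
_mm_cmpeq_epi8 then compares that result with Src1. Where the two are equal the output byte is 255 (0xFF); where they differ it is 0. As a result, pixels in the source image that are greater than or equal to nThreshold become 255 and pixels below nThreshold become 0.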
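To see the max/compare-equal trick in isolation, here is a minimal sketch that applies it to a single 16-byte register. The helper names CmpGeEpu8 and Binarize16 are made up for this illustration and do not appear in the code above; only the intrinsics themselves come from the original.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <array>

// Hypothetical helper (not part of the original code): 0xFF in every byte lane
// where v >= t when both are read as unsigned bytes, 0x00 elsewhere.
static inline __m128i CmpGeEpu8(__m128i v, __m128i t)
{
    return _mm_cmpeq_epi8(_mm_max_epu8(v, t), v);
}

// Hypothetical wrapper: binarizes exactly 16 pixels and returns them for inspection.
std::array<unsigned char, 16> Binarize16(const std::array<unsigned char, 16>& in,
                                         unsigned char threshold)
{
    std::array<unsigned char, 16> out{};
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in.data()));
    __m128i t = _mm_set1_epi8(static_cast<char>(threshold));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), CmpGeEpu8(v, t));
    return out;
}
```

With a threshold of 50, the inputs 49, 50 and 200 should map to 0, 255 and 255. The value 200 is the interesting case: read as a signed byte it is negative, so a signed comparison would misclassify it, while the unsigned max does not.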
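With OpenCV, the call is simply threshold(srcImg, dst, 50, 255, CV_THRESH_BINARY); (the constant is spelled cv::THRESH_BINARY in newer OpenCV versions).
The results are as follows: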
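Note that the double-threshold SSE routine does not do the same thing as threshold(srcImg, dst, 50, 255, CV_THRESH_BINARY). The SSE version sets pixels whose gray value lies between the lower and upper thresholds (inclusive) to 255 and all other pixels to 0, whereas OpenCV's threshold sets every pixel greater than thresh (the third argument, 50 here) to maxval (the fourth argument, 255 here).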
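The difference in semantics can be spelled out with two scalar reference functions. This is only a sketch: both function names are hypothetical, the first mirrors the rule documented for THRESH_BINARY (strictly greater than thresh), and the second mirrors the inclusive band test of the double-threshold SSE routine.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference for cv::threshold with THRESH_BINARY:
// strictly greater than thresh -> maxval, otherwise 0.
std::vector<unsigned char> ThresholdBinaryRef(const std::vector<unsigned char>& src,
                                              unsigned char thresh, unsigned char maxval)
{
    std::vector<unsigned char> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = src[i] > thresh ? maxval : 0;
    return dst;
}

// Hypothetical reference for the double-threshold SSE routine:
// lo <= value <= hi -> 255, otherwise 0.
std::vector<unsigned char> BandBinarizeRef(const std::vector<unsigned char>& src,
                                           unsigned char lo, unsigned char hi)
{
    std::vector<unsigned char> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = (src[i] >= lo && src[i] <= hi) ? 255 : 0;
    return dst;
}
```

Two details follow from this. A pixel equal to the threshold is 0 under THRESH_BINARY but 255 in the single-threshold SSE routine, which uses >=. And for a like-for-like benchmark of the double-threshold case, cv::inRange is probably the closer OpenCV counterpart, although that is not what was measured here.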
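The SSE code does not reach its theoretical speedup, but it is still clearly faster than OpenCV. OpenCV is itself well optimized, after all, and my SSE code has not been tuned much further. Oh well, it will do for now.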
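For anyone who wants to time this without OpenCV, below is a rough harness on a raw buffer. It is a sketch under assumptions, not the setup used for the numbers in this post: BinarizeScalar, BinarizeSse and AverageMs are names invented here, and the scalar baseline may be auto-vectorized by the compiler at higher optimization levels, which narrows the gap.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical scalar baseline: dst[i] = 255 if src[i] >= t, else 0.
void BinarizeScalar(const unsigned char* src, unsigned char* dst, std::size_t n, unsigned char t)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] >= t ? 255 : 0;
}

// Hypothetical raw-buffer version of the single-threshold SSE loop.
void BinarizeSse(const unsigned char* src, unsigned char* dst, std::size_t n, unsigned char t)
{
    const __m128i vT = _mm_set1_epi8(static_cast<char>(t));
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16)
    {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        __m128i r = _mm_cmpeq_epi8(_mm_max_epu8(v, vT), v);  // 0xFF where v >= t
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), r);
    }
    for (; i < n; ++i)  // scalar tail
        dst[i] = src[i] >= t ? 255 : 0;
}

// Hypothetical timer: average wall-clock milliseconds per call over `runs` repetitions.
template <typename Fn>
double AverageMs(Fn&& fn, int runs)
{
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < runs; ++r)
        fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count() / runs;
}
```

A call such as AverageMs([&] { BinarizeSse(src.data(), dst.data(), src.size(), 50); }, 100) gives the average time per pass. Checking that both functions produce identical output before timing them is worthwhile, since a wrong result measured quickly says nothing.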