OpenCV study notes 7: measuring code performance and improving efficiency



Goal

In image processing, whether an algorithm is efficient matters a great deal. In this note we will learn to:
1. Measure the efficiency of your algorithm
2. Apply some tips to improve the performance of your algorithm
3. Use functions such as cv2.getTickCount and cv2.getTickFrequency

Python also provides the time module, which can be used to measure how long an algorithm takes, and the profile/cProfile modules, which produce a detailed report of the code, such as how much time each function spends.
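As a quick illustration of the profiling idea, here is a minimal sketch using the standard-library cProfile and pstats modules. The function name blur_many_times and its body are made up for demonstration; it simply stands in for any expensive call you want to profile:

```python
import cProfile
import io
import pstats

def blur_many_times(n):
    # Stand-in for an image-processing loop; any pure-Python work will do.
    total = 0
    for _ in range(n):
        total += sum(j * j for j in range(100))
    return total

# Profile the call and capture a per-function report.
profiler = cProfile.Profile()
profiler.enable()
blur_many_times(1000)
profiler.disable()

buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf).sort_stats("cumulative")
stats.print_stats(5)  # top 5 entries by cumulative time
report = buf.getvalue()
print(report)
```

The report lists, for each function, how many times it was called and how much time was spent in it, which is exactly the kind of breakdown that is hard to get from a single stopwatch measurement.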

1. Evaluate the time spent on a piece of code

The cv2.getTickCount function returns the number of clock cycles from a reference event (such as the moment the machine was switched on) to the moment the function is called. So if you call it before and after a piece of code, you get the number of clock cycles used to execute that code.

The cv2.getTickFrequency function returns the clock frequency, i.e. the number of clock cycles per second. To find the execution time in seconds, you can do the following:

import cv2

e1 = cv2.getTickCount()
# your code execution
e2 = cv2.getTickCount()
time = (e2 - e1) / cv2.getTickFrequency()
We will demonstrate with the following example, which applies median filtering with kernels of odd sizes from 5 to 49. (Don't worry about what the result looks like; that is not our goal.)

import cv2

img1 = cv2.imread('messi5.jpg')

e1 = cv2.getTickCount()
for i in range(5, 49, 2):   # odd kernel sizes; use xrange on Python 2
    img1 = cv2.medianBlur(img1, i)
e2 = cv2.getTickCount()
t = (e2 - e1) / cv2.getTickFrequency()
print(t)
# Result I got is 0.521107655 seconds

We can do the same thing with Python's time module, using the time.time() function.
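A minimal sketch of the time-module approach follows; the summation is just a stand-in workload, since timing it does not require OpenCV:

```python
import time

start = time.time()
total = sum(i * i for i in range(1_000_000))  # stand-in for the filtering loop
elapsed = time.time() - start

print(f"that took {elapsed:.6f} seconds")
```

Note that for short intervals, time.perf_counter() is generally preferred over time.time(), because it uses the highest-resolution clock available and is not affected by system clock adjustments.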


2. Default optimization in OpenCV

Many OpenCV functions are optimized with SSE2, AVX, and similar instruction sets, so if your system supports them (almost all modern systems do), you should take advantage of them. OpenCV enables these optimizations by default when it is compiled, so the optimized code runs by default. We can use cv2.useOptimized() to check whether optimization is enabled and cv2.setUseOptimized() to toggle it. Let's see a simple example.

# check if optimization is enabled
In [5]: cv2.useOptimized()
Out[5]: True

In [6]: %timeit res = cv2.medianBlur(img,49)
10 loops, best of 3: 34.9 ms per loop

# Disable it
In [7]: cv2.setUseOptimized(False)

In [8]: cv2.useOptimized()
Out[8]: False

In [9]: %timeit res = cv2.medianBlur(img,49)
10 loops, best of 3: 64.1 ms per loop

See, the optimized median filtering is about twice as fast as the unoptimized version. If you check the source, you can see that median filtering is SIMD-optimized. So you can call cv2.setUseOptimized(True) at the top of your code to make sure optimization is enabled (remember, it is enabled by default).


3. Using IPython to measure performance

Sometimes we need to compare the performance of two similar operations. IPython gives us a useful magic command, %timeit, which runs the code many times and reports an average running time, making it well suited to measuring the efficiency of short snippets.
For example, let's square the number 5 with plain Python and with Numpy:
x = 5; y = x**2
x = 5; y = x*x
x = np.uint8([5]); y = x*x
and compare how long each takes.

In [10]: x = 5

In [11]: %timeit y=x**2
10000000 loops, best of 3: 73 ns per loop

In [12]: %timeit y=x*x
10000000 loops, best of 3: 58.3 ns per loop

In [15]: z = np.uint8([5])

In [17]: %timeit y=z*z
1000000 loops, best of 3: 1.25 us per loop

In [19]: %timeit y=np.square(z)
1000000 loops, best of 3: 1.16 us per loop

We can see that the pure Python version is about 20 times faster than the Numpy version. If you also take the time spent creating the array into account, the difference can be more than 100 times.

Python scalar operations are faster than Numpy scalar operations, so for operations involving only one or two elements, Python scalars beat Numpy arrays. Numpy gains the advantage once the arrays get a little bigger.
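The same comparison can be reproduced outside IPython with the standard-library timeit module. This sketch assumes NumPy is installed; the iteration count is arbitrary:

```python
import timeit

# Python scalar squaring, run one million times.
t_py = timeit.timeit("y = x * x", setup="x = 5", number=1_000_000)

# One-element NumPy array squaring, same number of runs.
t_np = timeit.timeit("y = x * x",
                     setup="import numpy as np; x = np.uint8([5])",
                     number=1_000_000)

print(f"python scalar: {t_py:.3f} s, numpy one-element array: {t_np:.3f} s")
```

On a typical machine the one-element NumPy version is noticeably slower, matching the %timeit results above: each NumPy operation carries per-call overhead that only pays off on larger arrays.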

We will try one more example. This time we will compare the performance of cv2.countNonZero() and np.count_nonzero() on the same image.

In [35]: %timeit z = cv2.countNonZero(img)
100000 loops, best of 3: 15.8 us per loop

In [36]: %timeit z = np.count_nonzero(img)
1000 loops, best of 3: 370 us per loop

See, the OpenCV function is nearly 25x faster than the Numpy function.

Normally, OpenCV functions are faster than Numpy functions. So for same operation, OpenCV functions are preferred. But, there can be exceptions, especially when Numpy works with views instead of copies.


4. Other performance optimization techniques

There are many techniques for optimizing the performance of Python and Numpy code; only some relevant ones are listed here. The main approach is: first implement the algorithm in a simple way, then find its bottlenecks and optimize them.
1. Avoid loops wherever possible, especially double and triple loops; they are inherently slow.
2. Vectorize your code as much as possible, since both Numpy and OpenCV are heavily optimized for vector operations.
3. Exploit cache coherence.
4. Never copy an array unless necessary; use a view instead. Copying an array is an expensive operation.
If your code is still slow after all of the above, or if heavy loops are unavoidable, use an additional library such as Cython to make it run faster.
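A small sketch of tips 2 and 4 above, assuming NumPy is installed (the array size is arbitrary): a vectorized dot product versus an element-by-element Python loop, and a slice view versus an explicit copy:

```python
import time
import numpy as np

arr = np.arange(1_000_000, dtype=np.float64)

# Loop version: one Python-level operation per element.
start = time.perf_counter()
loop_result = 0.0
for v in arr:
    loop_result += v * v
t_loop = time.perf_counter() - start

# Vectorized version: a single NumPy call over the whole array.
start = time.perf_counter()
vec_result = float(np.dot(arr, arr))
t_vec = time.perf_counter() - start

# Views share the underlying buffer; copies allocate a new one.
view = arr[::2]          # no data copied; view.base is arr
copied = arr[::2].copy() # new buffer allocated

print(f"loop: {t_loop:.4f} s, vectorized: {t_vec:.4f} s")
```

The vectorized version computes the same sum of squares orders of magnitude faster, and the view costs essentially nothing to create, which is why tips 2 and 4 matter in practice.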


5. Other reference materials

Python Optimization Techniques
http://wiki.python.org/moin/PythonSpeed/PerformanceTips

Scipy Lecture Notes - Advanced Numpy
http://scipylectures.github.io/advanced/advanced_numpy/index.html#advanced-numpy

Timing and Profiling in IPython
http://pynash.org/2013/03/06/timing-and-profiling.html
