OpenCV study notes 7: measuring code performance and improving it
Target
In image processing, the efficiency of an algorithm matters a great deal. In this chapter we will learn to:
1. Measure the efficiency of your algorithm
2. Apply some tips to improve the performance of your code
3. Use functions such as cv2.getTickCount and cv2.getTickFrequency
Python also provides the time module, which can be used to measure the time an algorithm takes, and the profile module, which gives us a detailed report of the code, such as how much time each function spends.
1. Evaluate the time spent on a piece of code
The cv2.getTickCount function returns the number of clock cycles from a reference event (such as the moment the machine was switched on) to the moment this function is called. So if you call it before and after a piece of code, you get the number of clock cycles used to execute it.
The cv2.getTickFrequency function returns the frequency of clock cycles, that is, the number of clock cycles per second. So to find the execution time in seconds, you can do the following:
e1 = cv2.getTickCount()
# your code execution
e2 = cv2.getTickCount()
time = (e2 - e1)/ cv2.getTickFrequency()
We will demonstrate with the following example, which applies median filtering repeatedly with odd kernel sizes ranging from 5 to 49. (Don't worry about what the result will look like; that is not our goal.)
img1 = cv2.imread('messi5.jpg')
e1 = cv2.getTickCount()
for i in range(5, 49, 2):
    img1 = cv2.medianBlur(img1, i)
e2 = cv2.getTickCount()
t = (e2 - e1) / cv2.getTickFrequency()
print(t)
# Result I got is 0.521107655 seconds
We can also do this with Python's time module, using the time.time() function.
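As a minimal sketch of the time.time() approach (timing an arbitrary placeholder computation rather than an OpenCV call, so it runs anywhere):

```python
import time

start = time.time()
# placeholder workload standing in for your image-processing code
total = sum(i * i for i in range(100000))
elapsed = time.time() - start  # wall-clock seconds, like the getTickCount version
print(elapsed)
```

The result is in seconds, just like the getTickCount/getTickFrequency combination above.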
2. Default optimization in OpenCV
Many OpenCV functions are optimized with SSE2, AVX, and similar instruction sets, so if our system supports them, we should take advantage of this acceleration (almost all modern systems support these operations), and OpenCV enables it by default when compiling. So the optimized code is used by default. We can use cv2.useOptimized() to check whether optimization is enabled. Let's look at a simple example.
# check if optimization is enabled
In [5]: cv2.useOptimized()
Out[5]: True
In [6]: %timeit res = cv2.medianBlur(img,49)
10 loops, best of 3: 34.9 ms per loop
# Disable it
In [7]: cv2.setUseOptimized(False)
In [8]: cv2.useOptimized()
Out[8]: False
In [9]: %timeit res = cv2.medianBlur(img,49)
10 loops, best of 3: 64.1 ms per loop
See, the optimized median filtering is about 2x faster than the unoptimized version. If you check its source, you can see that median filtering is SIMD-optimized. So you can use cv2.setUseOptimized(True) at the top of your code to make sure optimization is enabled (remember it is enabled by default).
3. Using IPython to measure performance
Sometimes we need to compare the performance of two similar operations. IPython gives us a useful magic command, %timeit, which runs the code many times and reports an average running time, making it well suited for measuring the efficiency of short snippets.
For example, suppose we want to square a number and compare the following ways of doing it in plain Python and in Numpy:
x = 5; y = x**2
x = 5; y = x*x
x = np.uint8([5]); y = x*x
We can compare their running times with %timeit:
In [10]: x = 5
In [11]: %timeit y=x**2
10000000 loops, best of 3: 73 ns per loop
In [12]: %timeit y=x*x
10000000 loops, best of 3: 58.3 ns per loop
In [15]: z = np.uint8([5])
In [17]: %timeit y=z*z
1000000 loops, best of 3: 1.25 us per loop
In [19]: %timeit y=np.square(z)
1000000 loops, best of 3: 1.16 us per loop
We can see that the pure Python version is about 20 times faster than the Numpy version. If you also take into account the time spent creating the array, it is actually more than 100 times faster.
Python scalar operations are faster than Numpy scalar operations. So for operations involving only one or two elements, Python scalars are better than Numpy arrays. Numpy gains the advantage when the size of the array gets a bit bigger.
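Outside IPython, the standard library's timeit module can make the same kind of comparison; a minimal sketch (the exact timings will of course differ from the ones above):

```python
import timeit

# time the two pure-Python square operations from above
t_pow = timeit.timeit('y = x**2', setup='x = 5', number=1_000_000)
t_mul = timeit.timeit('y = x*x', setup='x = 5', number=1_000_000)
print(t_pow, t_mul)  # total seconds for one million runs of each snippet
```

timeit returns the total time for all runs, so divide by number to get a per-run figure comparable to the %timeit output.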
We will try one more example. This time, we will compare the performance of cv2.countNonZero() and np.count_nonzero() for same image.
In [35]: %timeit z = cv2.countNonZero(img)
100000 loops, best of 3: 15.8 us per loop
In [36]: %timeit z = np.count_nonzero(img)
1000 loops, best of 3: 370 us per loop
See, the OpenCV function is nearly 25x faster than the Numpy function.
Normally, OpenCV functions are faster than Numpy functions, so for the same operation, the OpenCV function is preferred. But there can be exceptions, especially when Numpy works with views instead of copies.
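As a small aside on views versus copies (a pure-Numpy sketch with an arbitrary array, not tied to any particular OpenCV call):

```python
import numpy as np

a = np.zeros((1000, 1000), dtype=np.uint8)

view = a[::2, ::2]          # a view: shares memory with a, created in O(1)
copy = a[::2, ::2].copy()   # a copy: a new buffer whose data must be written

print(np.shares_memory(a, view), np.shares_memory(a, copy))
```

Because the view shares the original buffer, no pixel data is moved at all, which is why view-based Numpy code can beat an equivalent operation that forces a copy.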
4. Other performance optimization techniques
There are many techniques for optimizing Python and Numpy code; only some relevant ones are listed here. The main approach is: first find a simple way to implement the algorithm, then find its bottleneck and optimize that.
1. Avoid loops, especially double or triple nested loops; they are slow.
2. Vectorize your code as much as possible, since both Numpy and OpenCV do a lot of optimization for vector operations.
3. Take advantage of cache coherence.
4. Never copy an array unless necessary; use a view instead. Copying an array is a time-consuming operation.
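The second tip can be sketched with a small Numpy comparison (a toy example; the array contents and size are arbitrary):

```python
import numpy as np

a = np.arange(100000, dtype=np.int64)

# loop version: slow, one Python-level iteration per element
total_loop = 0
for v in a:
    total_loop += int(v)

# vectorized version: a single call into optimized C code
total_vec = int(np.sum(a))

print(total_loop == total_vec)
```

Both compute the same sum, but the vectorized call avoids the per-element Python interpreter overhead, which is where the speedup comes from.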
If your code is still slow after doing all of the above, or if it simply has to use a lot of loops, use an additional library such as Cython to make it run faster.
5. Other reference materials
Python Optimization Techniques
http://wiki.python.org/moin/PythonSpeed/PerformanceTips
Scipy Lecture Notes - Advanced Numpy
http://scipylectures.github.io/advanced/advanced_numpy/index.html#advanced-numpy
Timing and Profiling in IPython
http://pynash.org/2013/03/06/timing-and-profiling.html