[OpenCV-Python] 11 Program performance testing and optimization

OpenCV-Python: core operations

Goal
In image processing you perform a large number of calculations every second, so your program must not only give correct results but also be fast. In this section we will learn:
  • How to check the efficiency of a program
  • Some techniques to improve the efficiency of a program
  • The functions cv2.getTickCount, cv2.getTickFrequency, etc.
In addition to OpenCV, Python provides a module called time, which you can use to measure the running time of a program. Another module, profile, gives you a detailed report about your program, including the time each function takes to run and the number of times each function is called. If you are using IPython, all of these features are integrated in a user-friendly way. We will cover a few important ones here. For more detail, follow the links in the additional resources.
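As a rough illustration of the kind of report described above, the sketch below profiles a small workload with cProfile, the C implementation of the standard library's profile interface. The functions slow_sum and work are hypothetical stand-ins for your own code, not part of OpenCV.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Sum squares with an explicit loop (deliberately slow, hypothetical)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def work():
    """Call slow_sum several times so the call count shows in the report."""
    return [slow_sum(20_000) for _ in range(5)]


profiler = cProfile.Profile()
profiler.enable()
results = work()
profiler.disable()

# Render the statistics, sorted by cumulative time, into a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The ncalls column of the printed report shows how often each function was called, and the tottime and cumtime columns show where the time went.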

11.1 Use OpenCV to test program efficiency

The cv2.getTickCount function returns the number of clock ticks that have elapsed between a reference point and the moment the function is called. So if you call it once before and once after a piece of code runs, the difference is the execution time of that code, measured in clock ticks.
cv2.getTickFrequency returns the clock frequency, that is, the number of clock ticks per second. So you can find how many seconds a piece of code took in the following way:

import cv2

e1 = cv2.getTickCount()
# your code execution
e2 = cv2.getTickCount()
# elapsed time in seconds (named "elapsed" to avoid shadowing the time module)
elapsed = (e2 - e1) / cv2.getTickFrequency()

The following example demonstrates this. It applies median filtering repeatedly with odd kernel sizes from 5 to 47 (5, 7, 9, ...):

import cv2

img1 = cv2.imread('messi5.jpg')

e1 = cv2.getTickCount()
# odd kernel sizes 5, 7, 9, ..., 47
for i in range(5, 49, 2):
    img1 = cv2.medianBlur(img1, i)
e2 = cv2.getTickCount()
t = (e2 - e1) / cv2.getTickFrequency()
print(t)

# Result I got is 0.521107655 seconds

Note: You can do the same thing with the time module. In that case, use time.time() instead of cv2.getTickCount, and compare the two results.
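A minimal sketch of the same before/after pattern using only the time module might look like this. The workload function is a hypothetical stand-in for the code you want to measure. time.perf_counter() is not mentioned in the text above; it is included as an assumption that you may want a clock designed for short intervals.

```python
import time


def workload():
    """Hypothetical stand-in for the code being measured."""
    return sorted(range(200_000, 0, -1))


# Wall-clock time with time.time(), as suggested in the note above.
t1 = time.time()
workload()
t2 = time.time()
elapsed_wall = t2 - t1

# time.perf_counter() is a monotonic, high-resolution clock intended
# for measuring short durations.
p1 = time.perf_counter()
workload()
p2 = time.perf_counter()
elapsed_perf = p2 - p1

print(f"time.time():         {elapsed_wall:.6f} s")
print(f"time.perf_counter(): {elapsed_perf:.6f} s")
```

Both values are in seconds, so they can be compared directly with the result of (e2 - e1) / cv2.getTickFrequency().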

11.2 Default optimization in OpenCV

Many functions in OpenCV are optimized (using SSE2, AVX, etc.), but OpenCV also contains the unoptimized code. If our system supports these features, we should take advantage of them. Optimization is enabled by default at compile time, so OpenCV runs the optimized code when it is enabled and falls back to the slower unoptimized code when it is disabled. You can use cv2.useOptimized() to check whether optimization is turned on, and cv2.setUseOptimized() to turn it on or off.
Let us look at a simple example.

# check if optimization is enabled
In [5]: cv2.useOptimized()
Out[5]: True

In [6]: %timeit res = cv2.medianBlur(img,49)
10 loops, best of 3: 34.9 ms per loop

# Disable it
In [7]: cv2.setUseOptimized(False)

In [8]: cv2.useOptimized()
Out[8]: False

In [9]: %timeit res = cv2.medianBlur(img,49)
10 loops, best of 3: 64.1 ms per loop

As you can see, the optimized median filter is almost twice as fast. If you look at the source code, you will find that the median filter is SIMD-optimized. You can therefore enable optimization at the top of your code (remember that it is already enabled by default).

11.3 Checking program efficiency in IPython

Sometimes you need to compare the efficiency of two similar operations. For this you can use the magic command %timeit provided by IPython. It runs the code several times to get a more accurate timing. It can also be used to time single lines of code.
For example, do you know which of the following lines, all performing the same mathematical operation, executes fastest?
x = 5; y = x**2
x = 5; y = x*x
x = np.uint8([5]); y = x*x
y = np.square(x)
We can find the answer with magic commands in the IPython shell.

In [10]: x = 5

In [11]: %timeit y=x**2
10000000 loops, best of 3: 73 ns per loop

In [12]: %timeit y=x*x
10000000 loops, best of 3: 58.3 ns per loop

In [15]: z = np.uint8([5])

In [17]: %timeit y=z*z
1000000 loops, best of 3: 1.25 us per loop

In [19]: %timeit y=np.square(z)
1000000 loops, best of 3: 1.16 us per loop

It turns out that y = x*x is the fastest, about 20 times faster than Numpy. If you also count the array construction, the difference can reach 100 times.
Note: Python's scalar operations are faster than Numpy's scalar operations. For operations involving only one or two elements, Python scalars are faster than Numpy arrays. But once the array is slightly larger, Numpy wins.
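Outside IPython, a similar comparison can be sketched with the standard library's timeit module. This is only an approximation of %timeit: wrapping each expression in a lambda adds a small, constant call overhead to every measurement, and the loop count of 100,000 is an arbitrary choice for the example.

```python
import timeit

import numpy as np

x = 5
z = np.uint8([5])

# Each entry maps a label to a zero-argument callable to be timed.
candidates = {
    "x**2": lambda: x ** 2,
    "x*x": lambda: x * x,
    "z*z": lambda: z * z,
    "np.square(z)": lambda: np.square(z),
}

loops = 100_000
timings = {}
for label, func in candidates.items():
    # repeat() returns one total time per run; the minimum is the least noisy.
    runs = timeit.repeat(func, number=loops, repeat=3)
    timings[label] = min(runs) / loops

for label, seconds in timings.items():
    print(f"{label:>14}: {seconds * 1e9:.1f} ns per loop")
```

The absolute numbers depend on your machine and library versions, so treat them as relative indicators only.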
Let's look at one more example and compare cv2.countNonZero() with np.count_nonzero().

In [35]: %timeit z = cv2.countNonZero(img)
100000 loops, best of 3: 15.8 us per loop

In [36]: %timeit z = np.count_nonzero(img)
1000 loops, best of 3: 370 us per loop

As you can see, the OpenCV function is nearly 25 times faster than the Numpy function.
Note: In general, OpenCV functions are faster than Numpy functions, so for the same operation it is better to use the OpenCV function. There are exceptions, especially when Numpy works on views instead of copies.

11.4 More IPython magic commands

There are several other magic commands for checking program efficiency: profiling, line profiling, memory usage measurement, and so on. They are all well documented, so only links to their documentation are provided here. Those who are interested can explore them on their own.

11.5 Efficiency optimization techniques

Some techniques and programming methods allow us to maximize the power of Python and Numpy.

We only mention the relevant ones here; you can find more detailed information through the links. The most important point is this: first implement your algorithm in a simple way (correct results matter most). Once the results are correct, use the methods described above to find the bottlenecks in the program and optimize them.

  1. Avoid loops as far as possible, especially double and triple loops, which are inherently slow in Python.
  2. Use vector operations as much as possible in your algorithm, because both Numpy and OpenCV are optimized for vector operations.
  3. Take advantage of cache coherence.
  4. Don't copy an array unless it is necessary. Use views instead. Copying arrays is expensive.
If your program is still slow after all of these optimizations, or if large loops are unavoidable, try additional packages such as Cython to speed it up.
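Points 1, 2, and 4 of the list above can be illustrated with a small Numpy sketch. The array img here is a tiny hypothetical stand-in for an image, and inverting it is just an example operation chosen for brevity.

```python
import numpy as np

# Tiny stand-in for a grayscale image (values 0..11).
img = np.arange(12, dtype=np.uint8).reshape(3, 4)

# Points 1 and 2: a double loop versus a single vectorized expression.
looped = np.empty_like(img)
for r in range(img.shape[0]):
    for c in range(img.shape[1]):
        looped[r, c] = 255 - img[r, c]
vectorized = 255 - img

print(np.array_equal(looped, vectorized))

# Point 4: basic slicing returns a view that shares memory with img,
# while .copy() allocates a new, independent array.
view = img[:, :2]
copied = img[:, :2].copy()

print(np.shares_memory(view, img))
print(np.shares_memory(copied, img))
```

Because a view shares memory with the original array, writing to view also changes img. Make a copy only when you need an independent array.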


Origin blog.csdn.net/yegeli/article/details/113405990