Computer Vision - Getting Started with Computer Vision (1): Gathering Information and Building an Understanding Before Entering the Field

I used to feel that my studies were scattered and unspecialized. Through study and work, I kept finding that I really needed to settle down and study seriously, so that I could build skills like those of the colleagues around me and earn my own foothold in society.

To be honest, it is hard to pick a direction within the vast field of computing. The more options I looked at, the more dazzled I became. But judging by my current interests, personality, learning ability, and long-term plans, computer vision is a very good choice for me right now.

Through this article, I hope that sorting out the information I have collected will also sort out my own thinking: roughly what computer vision is, what to learn and what to do after jumping in, and what its prospects (and pay) look like. (To be clear, nothing gets done without money. Money represents value. If what you research is worthless, then... smile.)

1. What is computer vision:

Computer Vision, also known as Machine Vision, is a discipline that "teaches" computers how to "see" the world. Within machine learning, computer vision, natural language processing (NLP), and speech recognition are regarded as the three hot areas. Computer vision itself has gradually shifted from combining traditional hand-crafted features, such as Histogram of Oriented Gradients (HOG) and Scale-Invariant Feature Transform (SIFT), with shallow models, to deep learning models represented by the Convolutional Neural Network (CNN). The concept of computer vision overlaps with many other fields, including artificial intelligence, digital image processing, machine learning, deep learning, pattern recognition, probabilistic graphical models, scientific computing, and a range of mathematics.
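
To make "hand-crafted feature" concrete, here is a minimal sketch of the core idea behind HOG: a histogram of gradient orientations weighted by gradient magnitude. It is a simplification under stated assumptions (a single cell, central differences, no block normalization or interpolated voting, all of which a full HOG implementation would add), and the function name orientation_histogram is made up for illustration.

```python
import math

def orientation_histogram(patch, bins=9):
    """Histogram of unsigned gradient orientations (0 to 180 degrees),
    weighted by gradient magnitude, over the interior pixels of a patch."""
    height, width = len(patch), len(patch[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the image gradient.
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Fold the direction into [0, 180) so opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

# A patch with a vertical edge: intensity changes only along x.
patch = [[0, 0, 10, 10] for _ in range(4)]
hist = orientation_histogram(patch)
```

For this patch all gradient energy falls into the first bin (orientation near 0 degrees), which is the kind of compact, human-designed description that a shallow classifier would then consume.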

                                                

2. Some application directions in computer vision (my knowledge is limited for now; I will add more as I encounter them):

(1) Object recognition and detection:

Object detection has always been a fundamental and important research direction in computer vision. Object recognition and detection, as the name suggests, means that given an input image, the algorithm automatically finds the common objects in it and outputs their categories and locations. It has also given rise to detection algorithms for specific categories, such as face detection and vehicle detection.
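
Since a detector outputs a category and a location, a common way to judge a predicted location is intersection over union (IoU) between the predicted box and a reference box. The sketch below assumes boxes are given as (x_min, y_min, x_max, y_max) tuples; the dictionaries and the "car" label are made-up illustration data.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes produce an empty intersection.
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A detection result pairs a category with a location.
detection = {"label": "car", "box": (10, 10, 50, 50)}
ground_truth = {"label": "car", "box": (30, 30, 70, 70)}
overlap = iou(detection["box"], ground_truth["box"])
```

Here the two boxes share a 20 by 20 region out of a combined area of 2800, so the overlap is 1/7. Benchmarks typically count a detection as correct only when the label matches and the IoU exceeds some chosen threshold.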

                                  


(2) Semantic segmentation:

Semantic segmentation of images, taken literally, means having the computer segment an image according to its semantics. In speech recognition, semantics refers to the meaning of the speech; in the image domain, semantics refers to the content of the image and the understanding of what the image means.

                                                               

At present, the main application fields of semantic segmentation include geographic information systems, autonomous driving, medical image analysis, and robotics. For details, see: Semantic Segmentation of Computer Vision
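
In practice, the output of semantic segmentation is usually a label map: one class index per pixel. The sketch below compares a predicted label map with a reference one using pixel accuracy and per-class IoU. The class indices (0 = background, 1 = road, 2 = car) and the tiny maps are assumptions made up for illustration.

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted class matches the reference class."""
    total = correct = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            total += 1
            correct += (p == t)
    return correct / total

def class_iou(pred, truth, cls):
    """Intersection over union of the pixels labeled cls in both maps."""
    inter = union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            inter += (p == cls and t == cls)
            union += (p == cls or t == cls)
    return inter / union if union else float("nan")

# Hypothetical class indices: 0 = background, 1 = road, 2 = car.
truth = [
    [0, 0, 0, 0],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]
pred = [
    [0, 0, 0, 0],
    [1, 1, 1, 2],  # one car pixel mislabeled as road
    [1, 1, 2, 2],
]
accuracy = pixel_accuracy(pred, truth)
car_iou = class_iou(pred, truth, cls=2)
road_iou = class_iou(pred, truth, cls=1)
```

A single wrong pixel lowers the accuracy to 11/12, the car IoU to 3/4, and the road IoU to 4/5, which shows why per-class scores are more informative than overall accuracy when classes differ in size.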


(3) Motion and tracking:

Tracking is also one of the basic problems in computer vision and has developed considerably in recent years. Methods have moved from earlier non-deep algorithms to deep learning algorithms, and accuracy keeps rising. However, the accuracy of real-time deep learning trackers has been hard to improve, while trackers with very high accuracy are very slow, so they are hard to use in practice. Visual tracking refers to detecting, extracting, identifying, and tracking moving objects in an image sequence and obtaining their motion parameters, such as position, velocity, acceleration, and trajectory, so that the behavior of the target can be understood and higher-level tasks accomplished. A tracking algorithm must find the position of the tracked object in the video and adapt to lighting changes, motion blur, and appearance changes. In fact, tracking is an ill-posed problem. Take tracking a car: if tracking starts from the rear of the car and the car's appearance changes greatly while it moves, for example a 180-degree rotation that leaves only its side visible, existing trackers are very likely to lose it. Most of their models are learned from the first frame; although they are updated during tracking, too few training samples are available, so it is hard to obtain a good tracking model and hard to adapt when the appearance of the tracked object changes greatly. So, for now, tracking is not a particularly hot research direction in computer vision, and many tracking algorithms are adapted from detection or recognition algorithms.
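
Once a tracker has produced a position per frame, the motion parameters mentioned above (velocity, acceleration) can be estimated from the trajectory. The sketch below uses plain finite differences; it assumes a constant frame interval dt and noise-free positions, whereas a real system would usually smooth the track first (for example with a Kalman filter). The track values are made up.

```python
def motion_parameters(positions, dt):
    """Estimate per-frame velocity and acceleration from tracked positions.

    positions: list of (x, y) object centers, one per frame.
    dt: time between consecutive frames, in seconds.
    """
    velocities = [
        ((x1 - x0) / dt, (y1 - y0) / dt)
        for (x0, y0), (x1, y1) in zip(positions, positions[1:])
    ]
    accelerations = [
        ((vx1 - vx0) / dt, (vy1 - vy0) / dt)
        for (vx0, vy0), (vx1, vy1) in zip(velocities, velocities[1:])
    ]
    return velocities, accelerations

# Hypothetical track: an object speeding up along the x axis.
track = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
velocities, accelerations = motion_parameters(track, dt=0.5)
```

With four positions there are three velocity estimates and two acceleration estimates; for this track the acceleration is constant along x.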

                                           


(4) Visual Q&A:

Visual question answering, abbreviated VQA (Visual Question Answering), has been a very popular direction in recent years. Generally speaking, a VQA system takes an image and a question as input and combines the two to generate a natural-language answer as output. For the machine to answer a specific question about a specific image in natural language, it needs some understanding of the content of the image, the meaning and intent of the question, and the relevant common sense. By its nature, this is a multidisciplinary research problem.

                            


(5) 3D reconstruction:

Vision-based 3D reconstruction means capturing images of the objects in a scene with cameras, analyzing and processing those images, and then applying computer vision knowledge to infer the 3D information of objects in the real environment. The key issue in 3D reconstruction is how to obtain the depth information of the target scene or object. Once the depth of the scene is known, 3D reconstruction can be achieved simply by registering and fusing the point cloud data [4]. Deeper application research built on the reconstructed 3D model can then follow directly. People who study image processing are exposed to a broader and more varied set of techniques, while people with a 3D reconstruction background concentrate on specialized algorithms, because 3D reconstruction itself contains many sub-techniques. At the postgraduate level, research directions therefore become very specific, such as 3D reconstruction of aerial terrain or 3D reconstruction of Buddha statues. Different scenes call for different capture and reconstruction techniques, and some of these techniques have little to do with one another (although the concept of 3D reconstruction itself is the same). As for future hotspots and difficulties, this is a field in which one can specialize deeply; there are many scenarios, and each has its own challenges.
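
To illustrate why depth is the key ingredient, the sketch below turns a depth map into a point cloud by back-projecting each pixel through a pinhole camera model. The intrinsics (fx, fy, cx, cy), the convention that a depth of 0 means "missing", and the tiny depth map are all assumptions for illustration; registration and fusion of several such clouds are not shown.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Map pixel (u, v) with known depth to a 3D point in the camera frame,
    using the pinhole model: u = fx * X / Z + cx, v = fy * Y / Z + cy."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def depth_map_to_points(depth_map, fx, fy, cx, cy):
    """Convert a depth map (rows of depths) into a list of 3D points."""
    points = []
    for v, row in enumerate(depth_map):
        for u, d in enumerate(row):
            if d > 0:  # assumption: 0 marks a pixel with no depth measurement
                points.append(backproject(u, v, d, fx, fy, cx, cy))
    return points

# Hypothetical 2x3 depth map in meters and made-up camera intrinsics.
depth_map = [
    [2.0, 2.0, 0.0],
    [2.0, 4.0, 2.0],
]
points = depth_map_to_points(depth_map, fx=2.0, fy=2.0, cx=1.0, cy=0.5)
```

Each valid pixel yields one 3D point, so this map produces five points. Clouds obtained from different viewpoints would then be registered into a common frame and fused, which is the step the paragraph above refers to.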



3. Classification of image processing and computer vision:

According to the currently popular classification, the field can be divided into the following three parts:

  • A. Image processing: a transformation is applied to the input image and the output is still an image; the content of the image is analyzed little or not at all. Typical examples are image transforms, image enhancement, image denoising, image compression, image restoration, and binary image processing. Threshold-based image segmentation also belongs to image processing. Usually a single image is processed.
  • B. Image analysis: the content of the image is analyzed and meaningful features are extracted for subsequent processing. A single image is still processed.
  • C. Computer vision: the features obtained by image analysis are analyzed to extract a semantic representation of the scene, giving the computer the capabilities of the human eye and brain. Here multiple images or image sequences are processed, although some single-image tasks are included as well.

There is no uniform standard for dividing image processing, image analysis, and computer vision. Image processing books generally introduce some image analysis and computer vision, for example Gonzalez's Digital Image Processing. Computer vision books basically cover image processing and image analysis, but not in great detail. In fact, all three can be placed under computer vision: image processing -> low-level vision, image analysis -> middle-level vision, computer vision -> high-level vision. This is a common way of dividing computer vision or machine vision. In this article, the field is still divided into image processing, image analysis, and computer vision in the traditional way.
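
As a small example of the "image in, image out" nature of image processing, here is a sketch of threshold-based segmentation on a grayscale image stored as nested lists. The threshold of 128 and the pixel values are made up; choosing the threshold automatically (for example with Otsu's method) is not shown.

```python
def threshold(image, t):
    """Binarize a grayscale image: pixels above t become 255, the rest 0.
    The output has the same size as the input, so it is still an image."""
    return [[255 if pixel > t else 0 for pixel in row] for row in image]

# Hypothetical 3x3 grayscale image with a bright region in the upper right.
image = [
    [12, 40, 200],
    [35, 180, 220],
    [10, 25, 90],
]
binary = threshold(image, 128)
```

Nothing here interprets what the bright region is; that interpretation is what the image analysis and computer vision levels add on top.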


4. Knowledge and related books involved in image processing and computer vision (I will not go too deep; this is only a brief introduction, and criticism and corrections are welcome):

(1) Mathematical knowledge:

What we call image processing is really digital image processing: continuous three-dimensional random signals from the real world are projected onto the two-dimensional plane of the sensor, and a two-dimensional matrix is obtained after sampling and quantization. Digital image processing is the processing of two-dimensional matrices, and recovering three-dimensional scenes from two-dimensional images is one of the main tasks of computer vision. Three important properties are involved here: continuity, the two-dimensional matrix, and randomness. The corresponding mathematics is advanced mathematics (calculus), linear algebra (matrix theory), and probability theory with random processes. These three subjects are also the three components of postgraduate entrance mathematics, and they form the most basic mathematical foundation of image processing and computer vision. If you want to go further, search online for the mathematics books recommended by Lin Dahua.
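
The sampling and quantization step can be sketched in a few lines. The code below assumes the continuous scene is available as a function scene(x, y) returning a brightness in [0, 1] for coordinates in [0, 1); it samples at pixel centers and quantizes to 256 gray levels. The ramp scene and the function names are made up for illustration.

```python
def sample_and_quantize(scene, width, height, levels=256):
    """Turn a continuous brightness function into a 2D matrix of integers.

    Sampling: evaluate the scene at the center of each pixel.
    Quantization: map brightness in [0, 1] to an integer in [0, levels - 1].
    """
    image = []
    for row in range(height):
        y = (row + 0.5) / height
        line = []
        for col in range(width):
            x = (col + 0.5) / width
            value = scene(x, y)
            line.append(min(levels - 1, int(value * levels)))
        image.append(line)
    return image

def scene(x, y):
    """Hypothetical continuous scene: a horizontal brightness ramp."""
    return x

image = sample_and_quantize(scene, width=4, height=2)
```

The result is exactly the kind of two-dimensional matrix that the rest of digital image processing operates on; finer sampling or more gray levels trade storage for fidelity.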

CV is a very broad subject. The current mainstream, learning-based vision, involves probability and statistics, various optimization methods, and graph theory; some research directions (such as those involving object motion) also involve topology, group theory, and matrix optimization; some image segmentation algorithms, such as level sets, involve differential equations; and so on. These boundaries are not absolute. A current state-of-the-art problem may touch any of these areas, depending on the problem itself. The scope is very broad, but most computer science research only makes use of these topics and does not need the rigor of a mathematics department.

(2) Signal processing

Image processing is essentially signal processing in two and three dimensions, and the signals involved have a degree of randomness. Therefore, both classical signal processing and random signal processing are necessary theoretical foundations for image processing and computer vision.
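
A concrete link between the two fields is 2D convolution, the two-dimensional counterpart of the filtering taught in classical signal processing. The sketch below implements a "valid" convolution on nested lists and applies a 3x3 box (mean) filter to suppress an isolated bright pixel. The image values are made up, and real code would normally call an optimized library routine instead.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution: the output shrinks by (kernel size - 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # Index the kernel in reverse: this flip is what makes
                    # the operation a convolution rather than a correlation.
                    acc += image[y + i][x + j] * kernel[kh - 1 - i][kw - 1 - j]
            out[y][x] = acc
    return out

# 3x3 box filter: every output pixel is the mean of its neighborhood.
box = [[1.0 / 9.0] * 3 for _ in range(3)]
noisy = [
    [10, 10, 10, 10],
    [10, 100, 10, 10],  # one isolated bright pixel
    [10, 10, 10, 10],
]
smoothed = convolve2d(noisy, box)
```

Both output pixels come out at about 20: the spike of 100 has been spread over its neighborhood, which is the low-pass behavior one would predict from the filter's frequency response.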

2.1 Classical Signal Processing

Signals and Systems (2nd Edition), by Alan V. Oppenheim et al., translated by Liu Shutang

Discrete-Time Signal Processing (2nd Edition), by A. V. Oppenheim et al., translated by Liu Shutang

Digital Signal Processing: Theory, Algorithms and Implementation, edited by Hu Guangshu

2.2 Random signal processing

Modern Signal Processing, by Zhang Xianda

Fundamentals of Statistical Signal Processing: Estimation and Detection Theory, by Steven M. Kay, translated by Luo Pengfei et al.

Adaptive Filter Theory (4th Edition), by Simon Haykin, translated by Zheng Baoyu et al.

2.3 Wavelet Transform

A Wavelet Tour of Signal Processing: The Sparse Way (3rd Edition), by Stéphane Mallat, translated by Dai Daoqing et al.

2.4 Information Theory

Elements of Information Theory (2nd Edition), by Thomas M. Cover et al., translated by Ruan Jishou et al.


(3) Pattern recognition

Pattern Recognition and Machine Learning, by Christopher M. Bishop, Springer

Pattern Recognition (English Edition) (4th Edition), by Sergios Theodoridis et al.

Pattern Classification (2nd Edition), by Richard O. Duda et al.

Statistical Pattern Recognition (3rd Edition), by Andrew R. Webb et al.

Pattern Recognition (3rd Edition), by Zhang Xuegong

(4) Recommended books on image processing and computer vision

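Image Processing, Analysis, and Machine Vision (3rd Edition), by Sonka et al., translated by Ai Haizhou et al.

(Note: This is one of the more comprehensive books on image processing and computer vision, covering almost every aspect of the field. I find the Chinese edition decent as well; it is worth reading.)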

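Digital Image Processing (3rd Edition), by Gonzalez et al.

(Note: The timeless classic of digital image processing, now in its third edition and very solid. My advisor once said that this book is beautifully written and also helps with writing papers in English, and recommended buying the English edition.)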

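Computer Vision: Algorithms and Applications, by Richard Szeliski

(Note: A recent computer vision book by Szeliski of Microsoft. It is very rich in content and, in particular, covers the author's own research interests, such as image stitching and image matting, which most books leave out. Seen from another angle, this also means the book is less general than Sonka's. However, the author has made the electronic version available, so you can read it selectively: http://szeliski.org/Book/)

Multiple View Geometry in Computer Vision (2nd Edition), by Hartley et al.

(Note: A classic cited more than ten thousand times. Electronic copies of the second edition are easy to find. The first edition once had a Chinese translation, which later went out of print. Electronic copies in both Chinese and English can also be found online.)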

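Computer Vision: A Modern Approach, by D. A. Forsyth et al.

(Note: A classic MIT textbook. Although ten years have passed, it is still worth reading. I am looking forward to the second edition.)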

Machine Vision: Theory, Algorithms, Practicalities (3rd Edition), by Davies

(Note: One of the few books written by a British author; it leans toward industrial applications.)

Digital Image Processing (4th Edition), by Pratt

(Note: Written in a distinctive style; also a very good book in image processing. Very clear electronic copies can be found online.)

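(5) Summary

After all this rambling, it really comes down to a few suggestions:
(1) Never throw away your foundational textbooks, and do not sell them cheaply to classmates or junior students. Otherwise you will end up buying them back from the bookstore one by one. Money is one issue; besides, reading a brand-new copy never feels the same as reading the textbook you actually studied from.
(2) When a relevant course comes along, take it or sit in on it without hesitation: stochastic processes, wavelet analysis, pattern recognition, machine learning, data mining, modern signal processing, even functional analysis. A broader theoretical base benefits both future research and work.
(3) If funds allow, stock up on classic books; sometimes a little scrimping is enough to buy a good one. Just do not hoard without reading, as I do.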

5. OpenCV, the tool that image processing cannot get around:

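OpenCV stands for Open Source Computer Vision Library. In other words, it is an open-source library of API functions for computer vision. This means that (1) it can be used for development in both scientific research and commercial applications; (2) the source code of every API function is public, so you can see the steps of its internal implementation; (3) you can modify the OpenCV source code and compile the specific API functions you need. As a library, however, it only provides APIs for common, classic, widely used algorithms. A typical computer vision algorithm should include the following steps: (1) data acquisition (for OpenCV, this means images); (2) preprocessing; (3) feature extraction; (4) feature selection; (5) classifier design and training; (6) classification. OpenCV provides APIs for each of these six parts separately (remember this word).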
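
The six steps can be sketched end to end as separate, swappable functions. The code below deliberately avoids OpenCV so that it runs anywhere; every function name, the synthetic 2x2 images, and the nearest-centroid classifier are assumptions chosen for illustration, and in a real project each step is where the corresponding OpenCV API (or your own code) would plug in.

```python
def acquire():
    """Step 1, data acquisition: tiny synthetic 2x2 grayscale images with labels."""
    return [
        ([[20, 30], [25, 35]], "dark"),
        ([[10, 40], [30, 20]], "dark"),
        ([[220, 230], [240, 210]], "bright"),
        ([[200, 250], [230, 220]], "bright"),
    ]

def preprocess(image):
    """Step 2, preprocessing: scale pixel values to [0, 1]."""
    return [[p / 255.0 for p in row] for row in image]

def extract_features(image):
    """Step 3, feature extraction: mean brightness and brightness range."""
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels), max(pixels) - min(pixels)]

def select_features(features, keep=(0,)):
    """Step 4, feature selection: keep only the listed feature indices."""
    return [features[i] for i in keep]

def train(samples):
    """Step 5, classifier training: compute one centroid per label."""
    sums, counts = {}, {}
    for feats, label in samples:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, f in enumerate(feats):
            acc[i] += f
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def classify(centroids, feats):
    """Step 6, classification: pick the label of the nearest centroid."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, feats))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

def featurize(image):
    """Chain steps 2 to 4 for a single image."""
    return select_features(extract_features(preprocess(image)))

centroids = train([(featurize(img), label) for img, label in acquire()])
prediction = classify(centroids, featurize([[190, 210], [205, 215]]))
```

Because each step is its own function, any one of them can be replaced without touching the others, which mirrors the point that the library offers separate APIs for separate stages.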

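You can think of it as the building blocks that kindergarten children play house with, where each OpenCV function is one block. Using all or some of the blocks, you can quickly build a concrete computer vision application (for example, character recognition, license plate recognition, or abandoned-object detection). You have probably noticed that when building a concrete application with OpenCV, the real core is the blocks themselves. If you understand how a block works, can you do without it? Absolutely! In most cases, though, there is no need, because OpenCV has already done some of the work for you (the blocks are ready-made; just use them). But for something like the feature extraction module mentioned above, OpenCV often cannot help. Then you need to consult papers published in the top conferences and journals of computer vision, pattern recognition, and machine learning, and program what you need according to the principles and methods they describe. In effect, you are building a private block of your own. In fact, this is how every API function in OpenCV came about.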

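Today, researchers from major companies and research institutions around the world jointly maintain and support the development of the open-source OpenCV library. These companies and institutions include Microsoft, IBM, Sony, Siemens, Google, Intel, Stanford, MIT, CMU, Cambridge, and others.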

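6. Conclusion:

With the massive advance of deep learning, research papers in almost every direction of artificial intelligence are now dominated by deep learning, and traditional methods have become hard to find. Sometimes a very small improvement to a deep network is enough for a decent paper. Moreover, as deep learning develops, records on existing datasets in many fields keep being broken, closing in on human performance and in some respects even surpassing human recognition ability.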

At present, computer vision research is in a very good period. Many problems that we could not solve before can now be solved fairly well, face recognition for example, although we have not truly matched the robustness of the human visual system at recognizing faces. We are still a long way from enabling computers to see and perceive the world as humans do. Deep learning may be an important stepping stone toward that goal, and we also need to bring more new methods and tools into the field to push it forward.

Human energy is limited, so we cannot do many things at once. Once you have chosen a direction, focus your energy on a problem that interests you and strive to become an expert in it. Research is a long-distance race. In many cases, if we persist in one direction a little longer than others, we have the chance to surpass them and become an expert in that area.


Reference documentation:

        https://www.zhihu.com/question/26836846

        https://blog.csdn.net/carson2005/article/details/6979806

        https://blog.csdn.net/wangss9566/article/details/54618507

        https://blog.csdn.net/qq_26499769/article/details/78989088

        http://blog.csdn.net/dcraw  

Attachment: Links to some great blogs in the field of computer vision, super powerful research institutions, etc.
