What I learned while pursuing a master's degree in computer vision and machine learning


Word count: 5,085 · Reading time: 10 minutes

Posted by Richmond Alake 

url : https://towardsdatascience.com/what-i-learnt-from-taking-a-masters-in-computer-vision-and-machine-learning-69f0c6dfe9df

I wrote this article to reflect on and summarize what I learned while studying for a master's degree in machine learning. I have to admit that some parts of what I studied have proven very useful, and some have not.

This article describes my experience and course content, but I believe the curricula at other universities do not differ much from what I studied. Readers can therefore use this article as a window into the courses and content of a master's degree in machine learning and computer vision.

In addition to the knowledge I picked up during my studies, I will also point out how the academic knowledge I acquired relates to my current job as a computer vision engineer.

 

Prerequisites

A master's degree in machine learning covers a selection of topics that reflect the directions in which the field is developing.

There is far too much content for any single machine learning course to cover. The master's program I took therefore required students to meet the following prerequisites before admission.

 

  • Good understanding of linear algebra and calculus (differentiation/optimization) 

  • A basic grounding in statistics and probability

  • Programming language background  

  • Undergraduate study in computer science, mathematics, physics, or electrical/mechanical engineering

Now I will introduce the key information I learned while studying for a master's degree in machine learning.

 

1. Computer vision

Let me start with the heavyweight module of the course.

 

Within machine learning, the areas that really interest me are computer vision and deep learning. Perhaps I was attracted to the field because the technologies it produces can have such a direct impact.

 

The media is full of praise for the progress computer vision has made over the past few decades, and the rapid rise of facial recognition systems is hard to miss. Facial recognition systems can now be found in major international airports, banks, and government organizations.

The computer vision portion of my master's program was well structured. You should not expect to jump straight into implementing and analyzing state-of-the-art techniques.

In fact, you take a few steps back. You start by learning the basic image processing techniques that were developed before the advanced computer vision techniques we see and use today were introduced.


In deep learning, we learned that the lower layers of a convolutional neural network learn lower-level patterns from input images (such as lines and edges).

However, before the introduction of convolutional neural networks (CNN) into computer vision, some heuristic-based techniques were used to detect regions of interest and extract features from images.

My computer vision studies grounded my understanding of the field by teaching me how these heuristic-based techniques work and how they are used in practical applications.

Studying computer vision gave me knowledge of the traditional machine learning techniques used to process images, extract features, and classify the descriptors obtained from images.

Here are a few key topics and terms I encountered during my computer vision studies:

Feel free to skip the definitions; I include them here for the curious.

  • SIFT (Scale-Invariant Feature Transform): a computer vision technique used to generate keypoint descriptors (feature vectors) for an image. The generated descriptors contain information about features such as edges, corners, and blobs, and can be used to detect objects across different scales and under distortion. SIFT is used in applications such as object recognition, gesture recognition, and tracking. The key property of SIFT is that the detected features are invariant to affine transformations such as scaling, translation, and rotation. (A short OpenCV sketch follows this list.)

  • HOG (Histogram of Oriented Gradients): a technique for extracting features from an image. The extracted features come from the edges and corners in the image, or more specifically, from the objects it contains. A simple description of the technique: it identifies the positions of edges (gradients), corners, and lines in an image, and also captures the orientation of each edge. The HOG descriptor is a histogram containing the distribution and orientation of the edges detected in the image. The technique is used in computer vision and image processing applications.

  • Principal component analysis (PCA): an algorithm for reducing the number of features in a multi-feature dataset. Dimensionality is reduced by projecting data points from a higher-dimensional space onto a lower-dimensional one while preserving as much information as possible, minimizing information loss.
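
To make SIFT a little more concrete, below is a minimal sketch using OpenCV. The coursework itself was done in MATLAB, so treat the library choice and the file name as illustrative assumptions, not the original code.

```python
# Minimal SIFT sketch with OpenCV (assumes opencv-python >= 4.4, where SIFT
# ships in the main package, and a local file "query.jpg": both assumptions).
import cv2

image = cv2.imread("query.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # SIFT operates on grayscale

sift = cv2.SIFT_create()
# keypoints: detected interest points; descriptors: one 128-d vector per keypoint
keypoints, descriptors = sift.detectAndCompute(gray, None)
print(len(keypoints), descriptors.shape)  # e.g. 1500 (1500, 128)
```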

Other topics worth mentioning include:

  • Linear interpolation

  • Unsupervised clustering (K-means; a brief scikit-learn sketch follows this list)

  • Bag of visual words (visual search systems)
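
Since K-means comes up in the list above, here is the brief scikit-learn sketch; the random 2-D data is purely illustrative.

```python
# Toy K-means clustering with scikit-learn on random 2-D points.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
points = rng.normal(size=(300, 2))            # 300 random 2-D points

kmeans = KMeans(n_clusters=3, n_init=10).fit(points)
print(kmeans.cluster_centers_)                # the 3 learned cluster centers
print(kmeans.labels_[:10])                    # cluster assignments of the first 10 points
```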

In the early days of my studies, I expected to start developing computer vision applications right away. Object classification is a hot and popular topic, and one for which it is relatively easy to pick up the basics and implement something.

During my studies, one of my tasks was to develop a visual search system in MATLAB.

MATLAB is a programming language designed for efficient numerical computation and matrix manipulation, and its libraries come equipped with a set of algorithms and visualization tools.

Past programming experience in JavaScript, Java, and Python helped me pick up MATLAB's syntax easily, so I could devote myself to the computer vision side of the work.


More information

The vision system I implemented was quite basic. A query image is passed into the system, and the system returns a set of result images that are similar to the query image.

I should mention that the system includes a database of stored images from which the result images are drawn (the image is queried first, then the result images are output).

This vision system does not use any fancy deep learning techniques, but uses some of the traditional machine learning techniques mentioned earlier.

You simply pass in an RGB image, convert it to grayscale, and apply a feature extractor; the extracted image descriptor is then represented as a point in an N-dimensional feature space.

In this feature space, image similarity can be computed as the Euclidean distance between two N-dimensional points.
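
As a rough illustration of that pipeline, here is a minimal Python sketch. The coursework version was written in MATLAB with richer descriptors; the grayscale-histogram descriptor and the file names below are illustrative assumptions, not the original implementation.

```python
# Minimal visual search sketch: grayscale-histogram descriptors ranked by
# Euclidean distance in feature space. Descriptor choice and file names are
# illustrative assumptions.
import cv2
import numpy as np

def describe(path, bins=64):
    """Load an image, convert to grayscale, return a normalized histogram descriptor."""
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    hist = cv2.calcHist([gray], [0], None, [bins], [0, 256]).flatten()
    return hist / (hist.sum() + 1e-8)  # normalize so image size does not dominate

database = ["img1.jpg", "img2.jpg", "img3.jpg"]  # hypothetical image database
db_descriptors = np.array([describe(p) for p in database])

query = describe("query.jpg")
distances = np.linalg.norm(db_descriptors - query, axis=1)  # Euclidean distance
for i in np.argsort(distances):  # most similar images first
    print(database[i], distances[i])
```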


Things started to get serious

Computer vision is not limited to still images; its algorithms and techniques also apply to video processing.

Remember, videos are just sequences of images, so there is nothing fundamentally new to learn about preparing and processing the input data.

If you use object detection frameworks such as YOLO or R-CNN, tracking objects across a series of images seems trivial. But studying computer vision is not about using pre-trained networks and fine-tuning them; it is about understanding how the field has progressed over the years and where it stands now, and the best way to build a solid understanding is to investigate the traditional techniques that were developed along the way.

Therefore, for the task of object tracking, the following topics were introduced (a toy Kalman filter sketch follows the list):

  • Blob tracker

  • Kalman filter

  • Particle filter

  • Markov process
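
To give a flavor of one of these techniques, below is a toy one-dimensional Kalman filter that estimates a static position from noisy measurements. It is an illustrative sketch, not the course implementation; real trackers use a state vector that also carries velocity.

```python
# Toy 1-D Kalman filter: estimate a fixed position from noisy readings.
import numpy as np

rng = np.random.default_rng(0)
true_position = 10.0
measurements = true_position + rng.normal(0.0, 2.0, size=20)  # noisy sensor

estimate, variance = 0.0, 1e6        # vague prior belief
process_var, meas_var = 1e-4, 4.0    # process and measurement noise variances

for z in measurements:
    variance += process_var                    # predict: uncertainty grows
    gain = variance / (variance + meas_var)    # Kalman gain
    estimate += gain * (z - estimate)          # update: blend prediction and data
    variance *= 1.0 - gain                     # posterior variance shrinks

print(round(estimate, 2))  # converges toward 10.0
```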

Relevance to computer vision engineers

To be honest, I haven't used any traditional machine learning classifiers, and I don't think I will use them anytime soon.

However, to give you a sense of the relevance of the techniques mentioned, it is worth pointing out that autonomous vehicles, license plate readers, and lane detectors use one or two of the methods discussed above.


2. Deep learning

Deep learning is an inevitable development of computer vision research.

Some deep learning topics have been covered in the computer vision module, while other deep learning topics are extensions or improvements to traditional computer vision technology.

The teaching of deep learning topics took an approach similar to my computer vision studies: first build a solid understanding of the fundamentals of the field, then move on to advanced topics and application development.


The study of deep learning began with an understanding of the basic building blocks of images.

You quickly learn that a digital image is a grid containing a collection of pixels.

After understanding the atomic basis of images, you will continue to learn how images are stored in system memory.

The framebuffer (frame buffer) is the name for the portion of system memory that holds an image's pixel values (many MOOCs will not teach you this).
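
To make the grid-of-pixels idea concrete, here is a small sketch; the image file and the use of OpenCV are my own assumptions for illustration.

```python
# A digital image is just a grid of pixel values stored in memory.
# Assumes a local "photo.jpg"; note OpenCV loads channels in BGR order.
import cv2

image = cv2.imread("photo.jpg")
print(image.shape)   # (height, width, 3): one value per channel per pixel
print(image.dtype)   # uint8: each channel is an integer in [0, 255]
print(image[0, 0])   # the pixel at the top-left corner of the grid
```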


 

I also learned how the camera equipment captures digital images.

I must admit that it's great to have a certain intuition about how smartphone cameras take images.

Let's continue to explore some cooler things.

You cannot study deep learning without encountering convolutional neural networks (CNNs); the two go hand in hand.

My studies covered the timeline of the introduction and development of CNNs over the past 20 years (from LeNet-5 to R-CNNs), and their role in replacing the traditional ways of carrying out typical computer vision tasks such as object recognition.

In the course of my studies, I explored the different CNN architectures proposed in the early days of deep learning. AlexNet, LeNet, and GoogLeNet served as examples for understanding the internals of convolutional neural networks and their application to tasks such as object detection, recognition, and classification.

An important skill I learned was how to read research papers.

Reading research papers is not a skill that is taught directly; if you are serious about deep learning, it is simply a good idea to go to the original sources of information and research. It is quite easy to use the pre-trained models provided by a deep learning framework. Nevertheless, advanced study still expects you to understand the internal details of the techniques and components of each proposed architecture, and those details are presented only in the research papers.
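
As an illustration of just how easy pre-trained models are to use, here is a minimal torchvision sketch. The choice of AlexNet is mine, purely for illustration.

```python
# Loading a pre-trained AlexNet from torchvision for inference.
# Note: torchvision >= 0.13 prefers the weights= argument over pretrained=True.
import torch
import torchvision

model = torchvision.models.alexnet(pretrained=True)
model.eval()  # inference mode: puts dropout layers in evaluation behavior

dummy = torch.randn(1, 3, 224, 224)  # one ImageNet-sized RGB image
with torch.no_grad():
    logits = model(dummy)
print(logits.shape)  # torch.Size([1, 1000]): one score per ImageNet class
```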

The following is a summary of the topics covered in the deep learning module:

Feel free to skip the definitions; I include them here for the curious.

 

  • Multilayer perceptron (MLP): a structure in which layers of perceptrons are stacked on top of one another. An MLP consists of an input layer, one or more layers of threshold logic units (called hidden layers), and a final layer (called the output layer). (A minimal PyTorch sketch follows this list.)

  • Neural style transfer (NST): a technique that uses deep convolutional neural networks and algorithms to extract content information from one image and style information from another, reference image. After the style and content are extracted, a combined image is generated in which the content and style come from the two different source images.

  • Recurrent neural networks (RNNs) and LSTMs: variants of the neural network architecture that can accept inputs of arbitrary length and produce outputs of variable length. The RNN architecture is used to learn temporal relationships.

  • Face detection: the term for techniques that automatically identify and locate faces in images and videos. Face detection is used in applications such as facial recognition, photography, and emotion capture.

  • Pose estimation: the process of inferring the positions of the main body joints from provided digital assets (such as images, videos, or image sequences). Pose estimation techniques appear in applications such as action recognition, human-computer interaction, virtual reality and 3D game creation, and robotics.

  • Object recognition: the process of identifying the class associated with a target object. Object recognition and object detection are techniques with similar end results and implementation approaches, although in many systems and algorithms the recognition step usually precedes the detection step.

  • Tracking: a method of identifying, detecting, and following an object of interest through a sequence of images over a period of time. Tracking is used in many surveillance cameras and traffic-monitoring systems.

  • Object detection: associated with computer vision, object detection describes systems that can identify the presence and location of target objects in an image. Note that one or more of the objects to be detected may be present.
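
Here is the minimal PyTorch MLP sketch referenced in the definition above; the layer sizes are arbitrary and chosen only for illustration.

```python
# A small multilayer perceptron: input layer, two hidden layers, output layer.
import torch
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # hidden layer 1
    nn.Linear(256, 128), nn.ReLU(),   # hidden layer 2
    nn.Linear(128, 10),               # output layer (e.g., 10 classes)
)

x = torch.randn(32, 784)   # a batch of 32 flattened 28x28 images
print(mlp(x).shape)        # torch.Size([32, 10])
```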


Other notable topics and subtopics include neural networks, backpropagation, CNN network architecture, super-resolution, gesture recognition, semantic segmentation, etc.

Relevance to computer vision engineers

This is basically my main source of income.

So far, I have integrated face detection, gesture recognition, pose estimation, and semantic segmentation models on edge devices for gaming purposes.

In my current position, I have implemented, trained, and evaluated many deep learning models. If you want to work with cutting-edge algorithms and tools at forward-looking companies, deep learning is a field that can put you at the forefront of the commercial development of AI.


3. Thesis

The purpose of a master's degree thesis is to enable you to use all the skills, knowledge and intuition acquired during the learning process to design solutions to real-life problems.

My thesis used computer vision techniques to analyze the motion of quadrupeds. Pose estimation was the core computer vision technique in my motion analysis.

This was my first introduction to deep learning frameworks. I had decided that my motion analysis solution would be deep learning-based, built on convolutional neural networks.

When choosing a framework, I went back and forth between Caffe and Keras, but I chose PyTorch because it offered ready-to-use pre-trained models relevant to the task. Python was my programming language of choice.

 

Here is a list of things I learned from the thesis work:

  • Transfer learning/fine-tuning

  • Python programming language

  • C# programming language

  • Theoretical knowledge of pose estimation

  • Knowledge about how to use Unity3D for simulation

  • Experience using Google Cloud Platform

More information about motion analysis

Motion analysis is the umbrella term for obtaining motion information and details from distinct images of movement or from ordered sequences of images.

In their most direct forms, applications of motion analysis provide motion detection and keypoint localization. More sophisticated applications use sequentially related images to track objects frame by frame.

At present, motion analysis and its various application forms provide significant benefits and a wealth of information when processing temporal data.

Different industries benefit from the results and information provided through motion analysis. Industries such as healthcare, manufacturing, machinery, and finance have a variety of use cases and methods of applying motion analysis to solve problems or create value for consumers.

The diversity of motion analysis applications across industries has indirectly given rise to various subsets of motion analysis, such as pose estimation, object detection, object tracking, and keypoint detection.


More information about the thesis

My thesis proposed a method for motion analysis using computer vision and machine learning techniques. The method uses synthetic images of quadrupeds as a dataset to retrain a pre-trained keypoint detection network.

Keypoint R-CNN is a built-in model in the PyTorch ecosystem that extends the capabilities of the original Fast R-CNN and Faster R-CNN. My method modified a Keypoint R-CNN architecture pre-trained on the COCO 2017 object detection and segmentation dataset, and retrained its final layer on the synthetic dataset.

By extending the baseline framework, which detects 17 keypoints on the human body, I proposed an extension that can predict the positions of the main joints on multiple quadrupeds with 26 joints each.
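
The thesis code itself is not included in this article, but below is a sketch of how such a head swap might look in torchvision. Treat the details, in particular how the 26-keypoint head replaces the COCO head, as my assumption about the approach rather than the original implementation.

```python
# Sketch: swap the 17-keypoint COCO head of a pre-trained Keypoint R-CNN
# for a 26-keypoint quadruped head (an assumed reconstruction, not the
# thesis code). Uses the era-appropriate pretrained=True flag; newer
# torchvision versions prefer weights=.
import torchvision
from torchvision.models.detection.keypoint_rcnn import KeypointRCNNPredictor

model = torchvision.models.detection.keypointrcnn_resnet50_fpn(pretrained=True)

# Replace the final keypoint predictor so it outputs 26 joints instead of 17;
# earlier layers keep their COCO-pretrained weights for fine-tuning.
in_channels = model.roi_heads.keypoint_predictor.kps_score_lowres.in_channels
model.roi_heads.keypoint_predictor = KeypointRCNNPredictor(in_channels, num_keypoints=26)
```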

Thesis result snippet

 

Qualitative and quantitative evaluation strategies were used to show the visual and metric performance of the modified Keypoint R-CNN architecture when predicting keypoints on synthetic quadrupeds.

 

If you have made it this far, I applaud you... let's wrap this article up.

4. Summary

The field of machine learning is changing rapidly; my course content reflected the state of research in 2018-2019. Now, in 2020, we have already seen machine learning make great contributions in other areas. So if you are taking a machine learning course and studying topics or subject areas I did not mention in this article, don't be surprised.

And don't forget that in the field of artificial intelligence, a machine learning practitioner not only needs to know how to create models, but must also keep learning to keep up with new research.
