Industry Report | AI + Manufacturing Empowerment: Machine Vision Opens a New World of Opportunity (Part 1)

Original | By BFT Robot

01

Core Points

AI + manufacturing empowerment: Meta releases SAM, helping machine vision usher in its GPT moment.


Machine vision technology enables industrial equipment to "see" its ongoing operations and make quick decisions. A complete machine vision system consists of hardware + software for imaging and image processing.

At present, emerging technologies represented by the integration of "AI + human perception" have gradually penetrated every aspect of industrial manufacturing. Machine vision, as an early and well-established application of AI in manufacturing, is already used to track production processes and inspect product quality. Artificial intelligence is the parent field of machine vision, and deep learning is its core technical foundation. Meta's recently released SAM model is expected to help machine vision usher in its GPT moment.

Strong demand in downstream sectors is feeding back into the machine vision market, and AI-powered machine vision has become a hard requirement.

AI + machine vision offers clear technical advantages, and policy support plus social demand (the receding demographic dividend) drive medium- and long-term development; there is considerable room for machine vision penetration in China. With the deepening of concepts such as Industry 4.0 and continuous R&D breakthroughs, AI + machine vision continues to empower downstream industrial applications and is expected to benefit from the prosperity of downstream sectors. By industry, semiconductors, automobiles, and new energy are expected to become the most important growth drivers, while electronics will remain the downstream sector with the widest range of applications in the medium and long term.

In terms of application depth, AI gives machine vision a high-precision advantage, making it a hard requirement in many industries. Machine vision has gradually been embedded in the production and inspection stages of semiconductors, automobiles, new energy batteries, and photovoltaics, for example improving assembly quality in automotive electronics and breaking through bottlenecks in photovoltaic defect detection to improve product yield.

Machine vision costs are concentrated upstream, and domestic substitution of core links is on the rise.


The global market is expected to reach a size of 100 billion by 2025, with China's growth rate leading the global CAGR of 15%.

Analyzing the industry chain: the upstream of machine vision comprises hardware (lenses, industrial cameras, light sources) and software; the midstream consists of equipment manufacturers and system integrators. Costs are concentrated in industrial cameras, which have high technical barriers (23% of value), and in software algorithms (35%).

In terms of competitive landscape, companies represented by Cognex (USA), Keyence (Japan), and Basler (Germany) account for more than 50% of the global machine vision market, with Cognex and Keyence as the leading players. These two giants entered the market early, have solid product technology and experience across a wide range of application scenarios, and thus hold a first-mover advantage.

The domestic upstream machine vision industry is still in its growth stage, with segments growing at broadly similar rates. Watch the development of industrial cameras and software under domestic substitution and AI iteration.

02

Machine Vision: The Eye of Intelligent Manufacturing

2.1 Machine vision is, in essence, the machine's eyes and brain

Machine vision technology enables industrial equipment to "see" what it is doing and make quick decisions.

According to the definition given by the Machine Vision Branch of the Society of Manufacturing Engineers (SME) and the Automated Vision Branch of the Robotic Industries Association (RIA): machine vision is the automatic reception and processing of images of a real object through optical devices and non-contact sensors, in order to obtain required information or to control the motion of a robot. In layman's terms, the "eye" refers to machine vision acquiring and perceiving information from light reflected by the environment and objects; the "brain" refers to machine vision intelligently processing and analyzing that information and acting on the results of the analysis.

According to Yiou Think Tank, machine vision is a rapidly developing branch of artificial intelligence that uses machines in place of human eyes to measure and judge, automatically receiving and processing images of real objects through optical devices and non-contact sensors to obtain required information or to control the motion of a robot.

The China Business Industry Research Institute holds that machine vision can replace the human eye across many scenarios. By function, it is mainly divided into four categories: inspection, measurement, positioning, and recognition.

  1. Inspection: refers to appearance inspection, which covers a wide variety of tasks, such as checking completeness after product assembly and detecting surface defects;

  2. Measurement: calibrate the acquired pixel information into standard measurement units, then accurately calculate the geometric dimensions of the target object in the image;

  3. Positioning: obtain the position of the target object, in two or three dimensions. Accuracy and speed are the main indicators of the positioning function: building on recognition of the object, the system precisely outputs its coordinates and angle, automatically determining where the object is;

  4. Recognition: screen for the target object based on features such as shape, color, and barcode.
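As a toy illustration of the measurement function above (not from the report), the pixel-to-unit calibration step can be sketched in a few lines; the scale factor and pixel coordinates are hypothetical values chosen for the example:

```python
# Sketch of the "measurement" function: calibrate pixel distances into
# physical units, then compute an object's geometric size.
# All numbers here are hypothetical illustration values.
import math

def calibrate(known_length_mm: float, known_length_px: float) -> float:
    """Return the mm-per-pixel scale from a reference object of known size."""
    return known_length_mm / known_length_px

def measure_mm(p1, p2, mm_per_px: float) -> float:
    """Euclidean distance between two pixel coordinates, in millimetres."""
    dist_px = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    return dist_px * mm_per_px

# A 10 mm calibration target spans 200 pixels -> 0.05 mm/px.
scale = calibrate(10.0, 200.0)
# Measured edge of a part: from (100, 120) to (400, 520) in the image.
length = measure_mm((100, 120), (400, 520), scale)
print(f"{length:.1f} mm")  # 500 px * 0.05 mm/px = 25.0 mm
```

In a real system the calibration would come from a camera model rather than a single reference length, but the principle is the same.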

Figure 1: Machine Vision Workflow

Source: Opto prospectus

2.2 AI technology has become the catalyst for machine vision's maturation

Artificial intelligence is the parent field of machine vision, and deep learning is its core technical foundation.

In the past decade, thanks to breakthroughs in algorithms such as deep learning, continuous improvements in computing power, and the accumulation of massive data, artificial intelligence has gradually moved from the laboratory into industrial practice. The continued pursuit of innovation along the three axes of algorithms, computing power, and data provides important technical support for machine vision to iterate and improve its application value.

Among emerging technologies in the field of artificial intelligence, deep learning (identified as a trending topic by burst-detection analysis) has received widespread attention. Deep learning is an algorithmic approach that uses artificial neural networks as a framework to perform representation learning on data. It is characterized by deeper networks and multiple transformations of features: compared with a shallow network with the same number of parameters, a deep network has better feature extraction and generalization capabilities, and continues to drive progress in image recognition.

From 2007 to 2009, Stanford professor Fei-Fei Li led the construction of ImageNet, one of the most widely used datasets for image classification, detection, and localization. From 2010 to 2017, large-scale visual recognition challenges based on ImageNet, such as ILSVRC, drove the development of neural networks and deep learning: AlexNet, for example, cut the image recognition error rate by 14%, and Google Brain built a deep neural network on a large multi-CPU cluster and applied it to image recognition with outstanding results.

Machine vision and artificial intelligence are gradually merging, leading the transition to Industry 4.0.

Machine vision is one of the foundational technologies of industrial automation; riding the tailwind of artificial intelligence, it is undergoing another iterative upgrade.

Here the tailwind is twofold: on the one hand, the integration of deep learning endows machine vision with higher accuracy and speed; on the other, advances in computing power provide the hardware basis for training tasks.

Looking back at its development, machine vision has evolved from automated machines that perform simple, repetitive tasks to autonomous machines whose visual capability is not bounded by the limits of human vision and which can "think" independently, continuously optimizing many production factors over time. AI + machine vision is expected to penetrate industrial manufacturing to a whole new level.

Figure 2: The gradual integration of machine vision and artificial intelligence

Source: Intel official website

In the future, machine vision is expected to be equipped with more advanced AI technology and expand into more differentiated industrial application scenarios.

The artificial intelligence fervor ignited by ChatGPT continues. According to the China Academy of Information and Communications Technology and the China Artificial Intelligence Industry Development Alliance, the field's focus is gradually shifting from single-point technologies to substantive application, and visual artificial intelligence has already generated a huge wave.

Machine vision equipped with AI technology can further optimize performance and adapt to more industrial application scenarios.

First, deep learning has produced multiple model architectures and corresponding performance improvements for machine vision. For example, the Generative Adversarial Network (GAN), through adversarial training of a generator and a discriminator, surpasses other methods in image generation; the attention-based ViT applies the Transformer architecture directly to a sequence of image patches for classification, greatly reducing the pre-training resources required for image processing. With the continued training of AI algorithms, image recognition error keeps shrinking, and combined with machine vision equipment this can play an excellent role in industrial manufacturing.
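As a rough sketch (not from the report) of how ViT treats an image as a sequence of patches, the "patchify" step that precedes the token projection can be written in a few lines of NumPy; the image and patch sizes below match the common ViT-Base configuration but are otherwise illustrative:

```python
import numpy as np

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened patches,
    as ViT does before projecting them into token embeddings."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (N, p*p*C)
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, patch * patch * c)

img = np.zeros((224, 224, 3))   # a dummy 224x224 RGB image
tokens = patchify(img, 16)      # 16x16 patches, as in ViT-Base
print(tokens.shape)             # (196, 768): 14*14 patches, 16*16*3 dims each
```

Each row of the result is one "word" of the image sentence that the Transformer then attends over.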

Second, AI technology can model different engineering problems and parameters, use collected high-quality data to train those models, and deeply bind them to the equipment and production status. Intelligent systems built on this basis generate production parameters that can be adjusted and optimized in real time and hand them to basic automation, realizing a comprehensive upgrade from mechanization to automation, digitization, and intelligence.
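A minimal sketch of this idea (the quantities and the linear model are hypothetical illustrations, not the report's method): fit a model relating a machine setting to a measured quality metric from logged data, then invert it to propose an optimized parameter for the automation layer.

```python
import numpy as np

# Hypothetical logged data: laser power (W) vs. measured weld depth (mm).
power = np.array([100., 120., 140., 160., 180.])
depth = np.array([1.1, 1.3, 1.5, 1.7, 1.9])

# Fit depth ~ a * power + b by least squares.
a, b = np.polyfit(power, depth, deg=1)

def power_for_depth(target_mm: float) -> float:
    """Invert the fitted model to propose a machine setting."""
    return (target_mm - b) / a

print(round(power_for_depth(1.5)))  # ~140 W for a 1.5 mm target depth
```

Real systems would use richer models and more variables, but the loop is the same: collect data, fit, invert, and feed the result back to basic automation.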

Third, AI chips continue to increase available computing power, and computational optics has become a breakthrough direction for next-generation machine vision: algorithmic upgrades push past the limits of traditional optical imaging devices, further shrink equipment size, mine diverse and complex image information, and drive the further popularization of machine vision in industrial scenarios.

Figure 3: The development direction of the integration of artificial intelligence and machine vision

Source: Changhong AI Lab; Chen Foji et al., "A Review of Generative Adversarial Networks and Their Applications in Image Generation"

2.3 Meta releases SAM, opening machine vision's GPT moment

The Segment Anything Model (SAM) project introduces a new task, model, and dataset for image segmentation. Using an efficient model in a data collection loop, Meta built the largest segmentation dataset to date, with over 1.1 billion masks on 11 million licensed, privacy-respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. Evaluated across numerous tasks, its zero-shot performance is impressive, often competitive with or even superior to prior fully supervised, fine-tuned models.

SAM performs zero-shot and few-shot learning on new datasets and tasks through prompting. Meta's researchers propose the promptable segmentation task: return a valid segmentation mask given any segmentation prompt. A prompt simply specifies what to segment in the image; for example, it can include spatial or textual information identifying an object. The requirement of a valid output mask means that even when a prompt is ambiguous and could refer to multiple objects (e.g., a point on a shirt might indicate either the shirt or the person wearing it), the output should be a plausible mask for at least one of those objects. The promptable segmentation task serves both as a pre-training objective and as the means to solve general downstream segmentation tasks through prompt engineering.
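The ambiguity handling described above can be sketched with plain NumPy (a toy stand-in, not Meta's implementation): when one prompt yields several candidate masks, SAM also predicts a quality score per mask, and a downstream application can simply keep the best-scoring one.

```python
import numpy as np

def best_mask(masks: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Given N candidate masks (N, H, W) and their predicted quality
    scores (N,), return the highest-scoring mask -- the simplest way a
    downstream user resolves an ambiguous prompt."""
    return masks[int(np.argmax(scores))]

# Toy example: a point prompt on a "shirt" yields two nested masks.
shirt  = np.zeros((4, 4), dtype=bool); shirt[1:3, 1:3] = True   # small mask
person = np.zeros((4, 4), dtype=bool); person[0:4, 1:3] = True  # larger mask
masks  = np.stack([shirt, person])
scores = np.array([0.92, 0.81])   # hypothetical predicted quality scores

chosen = best_mask(masks, scores)
print(chosen.sum())  # 4 pixels: the higher-scoring "shirt" mask wins
```

The real model returns masks and scores together from one forward pass; the selection logic on the caller's side is as simple as shown here.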

Figure 4: SAM task details

Source: Alexander Kirillov et al., Segment Anything

SAM consists of an image encoder, a prompt encoder, and a mask decoder that predicts segmentation masks.

By separating SAM into an image encoder and a lightweight prompt encoder/mask decoder, the same image embedding can be reused (and its cost amortized) across different prompts. Given an image embedding, the prompt encoder and mask decoder take about 50 ms to predict a mask from a prompt in a web browser. SAM focuses on point, box, and mask prompts, and also presents preliminary results with free-form text prompts. To make SAM ambiguity-aware, it is designed to predict multiple masks for a single prompt, letting it handle ambiguity naturally, as in the shirt-and-person example.
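This cost-sharing design can be illustrated with a toy sketch (class and method names are hypothetical; the real SAM API differs): the expensive image encoder runs once per image, and each subsequent prompt only pays for the cheap decoder.

```python
class ToyPromptableSegmenter:
    """Toy illustration of SAM's split: one heavy image-encoding pass,
    then many cheap per-prompt decodings against the cached embedding."""

    def __init__(self):
        self.encoder_calls = 0
        self._embedding = None

    def set_image(self, image):
        # Stand-in for the heavy image encoder (runs once per image).
        self.encoder_calls += 1
        self._embedding = sum(image)  # dummy "embedding"

    def predict(self, point):
        # Stand-in for the lightweight prompt encoder + mask decoder.
        assert self._embedding is not None, "call set_image first"
        return {"prompt": point, "embedding": self._embedding}

seg = ToyPromptableSegmenter()
seg.set_image([1, 2, 3])                  # encode once
for pt in [(10, 20), (30, 40), (50, 60)]:
    seg.predict(pt)                       # three prompts, no re-encoding
print(seg.encoder_calls)  # 1
```

This is why interactive use in a browser is feasible: the ~50 ms per-prompt cost excludes the one-time encoding.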

Figure 5: SAM model structure

Source: Alexander Kirillov et al., Segment Anything

SAM is expected to advance machine vision and drive technological innovation in vertical AI + manufacturing applications.

SAM has learned a general notion of what objects are, and it can generate masks for any object in any image or video, even objects and image types not encountered during training, without additional training. Meta anticipates that composable system designs based on techniques such as prompt engineering will support a wider range of applications than systems trained for a fixed set of tasks. SAM can be a powerful component in AR, VR, content creation, scientific domains, and more general AI systems. For example, SAM could identify everyday objects through AR glasses and prompt the user with information; it could also help farmers in agriculture or assist biologists in research.

Figure 6: SAM can recognize everyday objects through AR glasses; Figure 7: SAM in biological applications

Source: Digital Economy Pioneer Public Account

 

For more exciting content, please follow the official account: BFT Robot
This is an original article, and the copyright belongs to BFT Robot. To reprint, please contact us. If you have any questions about the content of this article, please contact us and we will respond promptly.

Origin blog.csdn.net/Hinyeung2021/article/details/131293455