It made me cry: just how hard is it to land an AI project in industry?

Artificial intelligence has been the hottest buzzword in tech for years; if you aren't talking about AI, you sound outdated. But once you actually enter the field, you discover that the "tiger blocking the road" you feared at first was the algorithm, and only later do you realize the real monster is landing the project. Drawing on the author's own experience and lessons, this article walks through the important details of each stage of developing an AI project and shows the long, messy road from an AI project to a shipped product.

Preface

AI has abused me a thousand times, yet I still treat AI like my first love. What makes a good AI? Answer: an AI that can actually land. An AI project grows from nothing and, with luck, eventually bears fruit; the process is nothing more than digging pits, stepping into them, and filling them back in. Starting from the author's own experience, lessons, blood, and tears, this article shares impressions from the AI project R&D process.

Cases of abuse

  • Case 1: After N rounds of iteration and optimization, the design was finally frozen. It supported impressive-sounding features such as model switching, cloud training, and manual parameter tuning. Then we discovered that the customer's requirement was 100% accuracy.

  • Case 2: The prototype's lights flashed back and forth and every motion stage was dancing. Every nook and cranny was covered; every scratch, chip, and smudge was covered. But it took 30 seconds to inspect one part. One visit to the production site revealed that manual visual inspection takes only 2 seconds.

  • Case 3: Optics, algorithm, interface: all OK. While we were enthusiastically preparing to roll out hundreds of units, the customer said: sorry, we only need one.

  • Case 4: Same as above, except this time the customer said: your product is really advanced, let me think it over. Of course there was never any news. Were we played? Or just milked for a free demo?

  • Case 5: While we were celebrating a 99% recognition accuracy, the customer put one successful recognition and one failed recognition side by side and asked: these two are obviously identical, so why did this one fail and that one succeed?

  • Case 6: I went to the production site to train the customer's staff on labeling. They were very cooperative, and my trainee was a veteran of visual inspection. After I demonstrated a few examples, I asked him to try. He simply refused, and at last I understood: he didn't know how to use a computer!

  • Case 7: Our algorithm is so good, our model so advanced; AI plus traditional methods working hand in hand, perfect. But you need to tune these 20 hyperparameters. The operator? Gone.

  • Case 8: Already in production, we finally discovered that one defect type on one product model was poorly lit and the images were nearly impossible to judge. In the end everything had to be torn up and redone.

  • Case 9: We didn't appreciate the importance of data. Every test on a handful of images looked perfect, so we rushed to go live. When we finally tested at scale, it simply didn't work.

  • Case 10: The algorithm was OK, inference was fast enough, and the labeling-training-deployment closed loop was OK too. Then the customer asked: can I skip retraining every time I switch product models?

Why is it so hard?

Industrial AI, and defect detection in particular, is a hard nut to crack. The scene is simple, the data flows continuously, the algorithms are pure, and yet the requirements are too fragmented. Often the issue is not that it can't be done, but that it isn't worth doing, because you have to face the following problems:

  • Standards that cannot be explained, or are ambiguous.

  • Standards that are hard to quantify.

  • Requirements change frequently, and adjusting post-processing parameters alone is rarely enough for a quick response.

  • Frequent product changeovers leave little time for training; the site may not even provide a usable training environment.

  • Three-dimensional parts: defects vary with lighting and viewing angle, and some are extremely faint.

  • Sample consistency is hard to guarantee, and that inconsistency is the root cause of your overfitting.

  • Can accuracy really reach 100%? Can anyone truly guarantee it?

  • Would manual inspection be faster?

  • Is there cheaper labor?

  • The system must work with complex hardware, especially motion stages. How do you keep the whole rig stable?

  • Likewise, hardware reproducibility is hard to guarantee. You can thank heaven if it works for even one product model, never mind being universal.

  • Late-stage maintenance cost: so many links are involved that it takes an "all-rounder" to keep everything running.

General process

AI needs agile development, but it needs methodology and a stable development process even more.

One thing worth stressing: in an industrial scene, AI is just one small component of the whole system, and you will certainly not make money on pure AI alone. Even so, an AI project that grows from nothing still passes through the following stages.

Requirements stage

This includes scenario analysis, problem definition, and feasibility analysis. Many projects end right at this stage, and that is a good thing: we must never be blindly confident or blindly optimistic. As the saying goes, a single leaf before the eye can hide Mount Tai; seeing only that the algorithm is easy to implement while ignoring the problems above can only end in tears. What I fear most is sinking so much cost that you want to stop but cannot bear to.

What are the needs, what are the real needs, and what hidden real needs are waiting to be discovered? Customers often cannot state a clear requirement in conversation. The simplest and most direct method is to visit their production site in depth, work alongside the operators, and learn their standards of judgment. Discover the needs for them. In particular, the following points must be nailed down in advance:

  • Which errors are absolutely intolerable, the ones that count as quality accidents the moment they occur? We need to know where the algorithm's lower bound must sit.

  • Whether product changeovers are involved, and whether the site can meet the conditions for model training, such as at least one GPU, or network access for cloud training.

  • Timing requirements: many stations that replace manual labor must be faster than a human. We need to know the physical limits of the system, such as the motion stages.

  • For gray areas that the algorithm cannot cleanly decide, will manual re-inspection be accepted? For the cases that don't work, we must have a fallback.

The other points are fairly straightforward, so let me expand only on the second. Anyone who has reproduced an algorithm knows that the training part is several levels harder than the inference part. By the same token, if on-site deployment requires the user to train their own models, the difficulty jumps again. The whole closed loop of labeling, data processing, training parameters, testing, and evaluation must be packaged together and fully automated; one broken link is enough to leave you scrambling. You may even find that the user's computer has no internet access and no GPU, and that is perfectly normal: even if you state the necessary conditions for training up front, the customer will not necessarily provide them.
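To make the GPU-and-connectivity point concrete, here is a minimal preflight sketch one could run on the customer's machine before promising on-site or cloud training. The function name and the specific checks are illustrative assumptions, not part of any real deployment kit:

```python
import shutil
import socket

def site_can_train(min_gpu: bool = True, needs_internet: bool = False) -> dict:
    """Rough preflight check for on-site training feasibility (illustrative)."""
    report = {}
    # NVIDIA drivers ship the `nvidia-smi` tool; its absence is a strong hint
    # that local GPU training will be painful or impossible on this machine.
    report["gpu"] = shutil.which("nvidia-smi") is not None
    # Cloud training needs outbound connectivity; many factory PCs have none.
    if needs_internet:
        try:
            socket.create_connection(("8.8.8.8", 53), timeout=2).close()
            report["internet"] = True
        except OSError:
            report["internet"] = False
    report["ok"] = (report["gpu"] or not min_gpu) and report.get("internet", True)
    return report
```

Running this during the site survey, rather than after signing, is the whole point: it turns "the customer will provide a GPU" from an assumption into a recorded fact.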

Each of the points above must be argued carefully, comprehensively, and repeatedly. Argumentation is not indecision, not inefficiency, not weak execution. A project rushed into launch without thorough argumentation usually has countless pits waiting for you later.
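The requirement points above can be captured in something as simple as a shared checklist object, so that everyone signs off on the same facts before development starts. A minimal sketch; every field name here is invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ProjectRequirements:
    """Checklist distilled from site visits (all names illustrative)."""
    hard_failure_modes: list = field(default_factory=list)  # errors = quality accidents
    needs_onsite_training: bool = False  # model changeovers -> training environment?
    cycle_time_s: float = 0.0            # must beat the manual station
    gray_zone_fallback: str = ""         # e.g. "manual re-inspection"

    def beats_manual(self, manual_cycle_time_s: float) -> bool:
        # The 30-second-vs-2-second trap from Case 2, made explicit.
        return self.cycle_time_s <= manual_cycle_time_s
```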

Lighting stage

This includes optical design, imaging analysis, and of course the not-so-AI structural design. As the saying goes: seven parts lighting, three parts tuning. Lighting matters because everything downstream, the algorithm included, can only be responsible for the picture. In feasibility analysis I generally use two words, "obvious" and "clear": "clear" comes from the requirements, "obvious" comes from the optics. The most intuitive test is whether a human can make an accurate judgment from the picture alone. If the image is ambiguous to a person, it will be ambiguous to the algorithm too.
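The "can a human judge from the picture" test can be partially automated with crude image-quality scores. A small sketch using only NumPy; the Laplacian-variance and RMS-contrast measures, and whatever thresholds you pair with them, are illustrative aids and no substitute for a real optics review:

```python
import numpy as np

def sharpness_score(gray: np.ndarray) -> float:
    """Variance of a 4-neighbour Laplacian; low values suggest a blurry image."""
    g = gray.astype(np.float64)
    lap = (-4 * g[1:-1, 1:-1]
           + g[:-2, 1:-1] + g[2:, 1:-1]
           + g[1:-1, :-2] + g[1:-1, 2:])
    return float(lap.var())

def contrast_score(gray: np.ndarray) -> float:
    """RMS contrast: a defect invisible in the image is invisible to the model."""
    return float(gray.astype(np.float64).std())
```

Scores like these, tracked per lighting configuration during the imaging trials, give you a number to argue with when "obvious" is being debated.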

Data stage

This includes data collection, data labeling, and data processing. The importance of data is self-evident; as the saying goes: seven parts data, three parts tricks. With the data in place, everything else is easy. Any practitioner must understand this deeply: we want data, and we want effective data. If there is no data, please use a traditional method. Remember that model generalization is not as important as you think, and the model does not actually have that strong a generalization ability: it recognizes things because it has seen them. Think of the model as a memory, not a generalizer. In the past you built a database of samples to compare against at test time; now, the model is your database.

Data labeling involves defining the standard, and it is often hard to get a clear one; in other words, the standard cannot always be quantified. Gray areas are common, so you need to understand them in advance and plan strategies for handling them, and for the customer's tolerance of them. The harder part is that the gray area itself may resist quantification: we know a sample lies in the gray area, but how gray it is, nobody knows.

In addition, establish a stable, representative data set as early as possible, especially the test set. This is crucial: it is the baseline of your data and lets you run follow-up benchmark experiments quickly. If you don't know what final result you are accountable for, you will never be able to stop.
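One simple way to keep the test set stable as new data keeps arriving is to assign each sample to a split by hashing its ID rather than by random shuffling, so a sample never migrates between train and test across re-runs. A minimal sketch; the ratio and the string-ID scheme are illustrative:

```python
import hashlib

def stable_split(sample_id: str, test_ratio: float = 0.2) -> str:
    """Deterministically assign a sample to 'train' or 'test' by hashing its ID.

    Unlike random shuffling, re-running this on a grown dataset never moves
    an old sample between splits, so the test-set baseline stays comparable.
    """
    h = int(hashlib.md5(sample_id.encode("utf-8")).hexdigest(), 16)
    return "test" if (h % 100) < test_ratio * 100 else "train"
```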

Algorithm design stage

This includes task definition, task splitting, and model selection. Especially in task splitting: if you are not sure you can fit the whole elephant into one refrigerator, then don't put all your eggs in one basket either.

Reject model-only thinking and SOTA-only thinking. What we need is to solve a specific problem in a specific scenario, and that requires unlearning some academic habits. The academic masters answer to data sets like ImageNet and COCO; I answer to my own scenario and my own data. SOTA chases the upper bound of a model; a real scenario cares about its lower bound.

Reject AI-only thinking as well. Traditional method or AI method, whatever works is a good algorithm. If a traditional method has no obvious shortcomings, choose the traditional solution. Or look at it this way: today's impressive-looking AI may not be the real AI at all; perhaps thirty years from now you will say, "first try the traditional method, YOLO v28!"

Training evaluation stage

This includes model tuning, model training, and metric evaluation: the so-called "alchemy". If the earlier steps were done well, there are generally no big problems here; if there are, go back and look. Again: establish a good baseline and improve it step by step. And remember, "premature optimization is the root of all evil": secure accuracy first, then consider speed and optimize. Of course, the accuracy you get by ensembling 58 models is beyond the scope of this discussion.
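Since the real scenario cares about the model's lower bound, it helps to evaluate in the factory's own terms instead of one accuracy number. A minimal sketch, where "miss" (a defective part passed, the quality accident) and "overkill" (a good part rejected) are the author's requirements restated as metrics; the names are illustrative:

```python
def inspection_metrics(y_true, y_pred):
    """Defect-detection metrics in factory terms. Labels: 1 = defective, 0 = good.

    miss_rate:     fraction of defective parts that were passed (the accident).
    overkill_rate: fraction of good parts that were rejected (costs yield).
    """
    miss = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    overkill = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    n_def = sum(y_true)
    n_good = len(y_true) - n_def
    return {
        "miss_rate": miss / n_def if n_def else 0.0,
        "overkill_rate": overkill / n_good if n_good else 0.0,
    }
```

A 99% accuracy can hide a disastrous miss rate when defects are rare, which is exactly why the baseline test set from the data stage must contain enough true defects.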

Deployment stage

This stage is full of pits, and they are basically all technical: the so-called "dirty work". It includes model optimization, cross-platform inference, and model encryption. At last we reach deployment and can see the dawn of landing. There are already many solutions for deploying deep learning, whether on PCs, phones, or embedded devices; running the trained network weights on each platform is where the application becomes real. But there is still plenty of work to do:

  • Cross-platform: it must run on the target hardware, including all kinds of CPU/GPU/NPU/FPGA.

  • High performance: fast inference, low memory footprint, and so on.

  • No accuracy loss: after a round of quantization, pruning, distillation, and graph optimization, the timing requirement is finally met, and then you find that accuracy on the deployment test dropped by half. WTF.

  • Encryption: you certainly don't want the fruits of your hard work used by others for free!

  • Closed-loop ecosystem: of course you can't ship once and be done forever. How do you collect samples in production and update the system? You need a practical, easy-to-use closed-loop tool chain.
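The "accuracy dropped by half after deployment" trap argues for an explicit parity check: run one fixed test batch through both the training framework and the deployed runtime, then compare the raw outputs directly instead of trusting that the optimizations were lossless. A minimal sketch, assuming both sides can dump their output logits as arrays:

```python
import numpy as np

def parity_check(ref_outputs: np.ndarray, deployed_outputs: np.ndarray) -> dict:
    """Compare per-sample logits from the reference model and the deployed
    model on the SAME fixed batch (shapes: [batch, classes]).

    max_abs_diff catches numeric drift from quantization/graph rewrites;
    label_agreement catches the drift that actually changes decisions.
    """
    diff = np.abs(ref_outputs - deployed_outputs)
    agree = (ref_outputs.argmax(-1) == deployed_outputs.argmax(-1)).mean()
    return {"max_abs_diff": float(diff.max()), "label_agreement": float(agree)}
```

Gating every deployment artifact on a check like this is cheap insurance: a parity failure points at the conversion pipeline, before anyone blames the model.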

Operation and maintenance stage

This includes operation monitoring, model updates, and so on. You thought you could finally breathe; you can't. Can the system withstand the test of real production volume over time? Watch, trembling! The core of operation and maintenance is keeping the business running safely and stably. As noted above, AI's generalization ability is still limited, so it may well fail during actual operation. The most direct remedy is to keep expanding the data, while making sure your model has enough capacity; if it doesn't, the algorithm design stage wasn't done well. Collect data and keep this going with the closed-loop tool chain mentioned in the deployment stage.
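The "keep expanding the data" loop needs a trigger. One crude trigger is a rolling-confidence alarm: when the mean prediction confidence on the line sags, start diverting those samples for collection and relabeling. A sketch; the window size and confidence floor are illustrative and would be tuned per line:

```python
from collections import deque

class ConfidenceMonitor:
    """Crude drift alarm: alert when the rolling mean prediction confidence
    over a full window drops below a floor (parameters illustrative)."""

    def __init__(self, window: int = 500, floor: float = 0.8):
        self.buf = deque(maxlen=window)
        self.floor = floor

    def update(self, confidence: float) -> bool:
        """Feed one prediction's confidence; returns True when the line
        should raise a drift alert (only after the window has filled)."""
        self.buf.append(confidence)
        mean = sum(self.buf) / len(self.buf)
        return len(self.buf) == self.buf.maxlen and mean < self.floor
```

This is far from real drift detection, but even something this simple turns "the model quietly stopped working" into an event someone gets paged about.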


Origin: blog.csdn.net/weixin_42137700/article/details/115250835