A must-take introductory course for AI product managers: case studies (1)

About the author

@毛毛

Product manager

Combines beauty and talent.

Has an in-depth understanding of AI and extensive experience.

In the previous article, I introduced "the abilities of an AI product manager and the level of understanding of data and algorithms required" and "the actual training process of machine learning". Next, I will discuss popular applications of AI products in the current environment, covering production scenarios such as speech recognition, image recognition, natural language processing (NLP), and knowledge graphs.

1 Artificial intelligence and "artificial" intelligence

What people do most every day is see, listen, speak, think, and make decisions. These are the abilities that a complex system like a human must possess. For a machine to be intelligent the way a human is, the most basic problem it must solve is image processing. With these abilities, humans can do more, and so can machines.

Machine learning is one way of realizing artificial intelligence. At its core, it uses algorithms to analyze data, learn patterns from that data, and then make decisions and predictions about real-world events. Because it depends so heavily on data, how data is processed and applied is extremely important. AI scenarios have to deal with large amounts of unstructured data, which involves a great deal of human work. At the current stage of development, I prefer to call it "artificial" intelligence.
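To make "learn patterns from data, then predict" concrete, here is a minimal sketch in Python. It fits a straight line to a few hypothetical (x, y) pairs with ordinary least squares, one of the simplest learning methods and not something the article prescribes, and then applies the learned rule to a new input.

```python
def fit_line(xs, ys):
    """Learn y = slope * x + intercept from data by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the best-fit slope.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned rule to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data that happens to follow y = 2x + 1.
model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(model)              # (2.0, 1.0)
print(predict(model, 5))  # 11.0
```

Real AI products learn far richer rules from far messier data, which is exactly why the data work described below matters so much.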

2 How to build AI products

Building an AI product goes through three core stages, which can be briefly summarized as business analysis, data preparation, and product development planning.

Business analysis

Different industries have different backgrounds. Before designing a product solution, you need to understand the business logic of your industry and the needs and pain points it faces. AI products essentially solve efficiency problems, whether by making information production more efficient or by making information transmission more efficient. So the first step is to find scenarios with efficiency problems and judge how far they can be solved.

●Determine the business process: draw a business flowchart and sort out how work flows between the different roles in the business.

●Classify the business: analyze how information is passed between different stages, and classify it by type of need.

●Evaluate resources: assess the existing data resources and whether there is enough data to support product development; if the business has not accumulated enough data, or its quality is poor, check whether other channels for collecting data or data governance methods are available.

●Determine priorities: decide which problems to solve first, ranking them by importance and urgency.
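The priority step can be sketched as a simple sort. This assumes each problem has already been judged important or not and urgent or not; the quadrant order below (important and urgent first, then important only, then urgent only, then neither) is one reasonable convention rather than a rule from this article, and the problem names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    name: str
    important: bool
    urgent: bool


# Lower rank = handle sooner. This ordering is an assumption.
QUADRANT_RANK = {
    (True, True): 0,    # important and urgent: solve first
    (True, False): 1,   # important but not urgent: plan next
    (False, True): 2,   # urgent but less important
    (False, False): 3,  # neither: defer
}


def prioritize(problems):
    """Sort problems by quadrant; sorted() is stable, so ties keep their input order."""
    return sorted(problems, key=lambda p: QUADRANT_RANK[(p.important, p.urgent)])


backlog = [
    Problem("auto-tag archived photos", important=False, urgent=False),
    Problem("manual review takes too long", important=True, urgent=True),
    Problem("search results are slow", important=False, urgent=True),
    Problem("label quality is inconsistent", important=True, urgent=False),
]
for p in prioritize(backlog):
    print(p.name)
```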

Prepare data

In the workflow of building an AI product, data preparation is the most critical step. Data quality directly determines whether the model is usable, and this step also consumes the most effort and work. Data preparation includes collecting data, governing data, and labeling data.

●Data collection: common collection methods include organizing internally accumulated business data, buying data from or partnering with data providers, crawling data that is publicly available on the web, and collecting data reported by terminal devices.

●Data governance: data collected from multiple channels usually cannot be used directly, because different channels define data differently and produce and use it in different scenarios. Before it can be used, it has to go through a series of processing steps. Data governance is a complex process that covers techniques and methods for data quality, data standards, data security, and more.

●Data labeling: labeling means attaching the appropriate labels to data. AI products need to process large amounts of unstructured data, and labeling attaches information accumulated from human experience and judgment to that data so that machines can read and understand it. The labeling process can be divided into: defining the purpose of labeling, formulating labeling standards, labeling the data, and accepting the labeling results.
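As a rough illustration of the governance step, the sketch below maps records from two channels that name the same fields differently onto one unified schema, normalizes label text, drops records missing required fields, and removes duplicates. The channel names, field names, and rules are hypothetical; real governance also covers data security and much more.

```python
# Hypothetical per-channel field mappings: source field name -> unified field name.
FIELD_MAPS = {
    "crawler": {"img_url": "image_url", "plant": "label"},
    "partner": {"url": "image_url", "species": "label"},
}
REQUIRED_FIELDS = ("image_url", "label")


def normalize(record, channel):
    """Rename fields to the unified schema; return None if a required field is missing or empty."""
    mapping = FIELD_MAPS[channel]
    unified = {target: record[source] for source, target in mapping.items() if source in record}
    if isinstance(unified.get("label"), str):
        unified["label"] = unified["label"].strip().lower()  # unify label spelling
    if any(not unified.get(name) for name in REQUIRED_FIELDS):
        return None
    unified["source"] = channel
    return unified


def govern(batches):
    """batches: list of (channel, records). Keeps the first record seen for each image URL."""
    seen_urls = set()
    cleaned = []
    for channel, records in batches:
        for record in records:
            row = normalize(record, channel)
            if row is None or row["image_url"] in seen_urls:
                continue
            seen_urls.add(row["image_url"])
            cleaned.append(row)
    return cleaned


rows = govern([
    ("crawler", [{"img_url": "a.jpg", "plant": " Rose "}, {"img_url": "b.jpg", "plant": ""}]),
    ("partner", [{"url": "a.jpg", "species": "rose"}, {"url": "c.jpg", "species": "Lotus"}]),
])
print(rows)
```

Here `b.jpg` is dropped for its empty label and the partner's copy of `a.jpg` is dropped as a duplicate.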

Design the product development plan: build and train models

The training process of machine learning was covered in the previous article, so it is not repeated here.

3 Case study: image recognition, using AI to recognize plants

Application scenarios

●Gaining knowledge: take photos to identify plants and do plant research;

●Assisted teaching: quickly identify plants and learn their basic information, making plants easier for students to learn about;

●Hobbies: scan and identify plants in pictures while traveling and sightseeing, for added fun.

Clarify the task type

Based on the specific application scenarios, sort out the core problem to solve. Take plant image recognition: the simplest scenario is to input a plant picture and return the correct plant name, which is a typical classification problem.
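In code terms, a classifier ends by turning one score per class into a single answer. The sketch below assumes the model has already produced raw scores; it converts them to probabilities with softmax (a common choice, assumed here rather than stated in the article) and returns the most likely plant name. The class names and scores are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, class_names):
    """Return the most likely class name and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return class_names[best], probs[best]


name, prob = classify([2.0, 1.0, 0.1], ["rose", "lotus", "cactus"])
print(name, round(prob, 3))
```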

Develop classification standards

To identify plant pictures accurately, the first step is to decide how many categories plants are divided into and what characteristics each category has. The figure below uses the survival mode of different plants as its classification logic and can serve as a reference. There are many ways to classify plants; the key is to settle on one unified set of standards, which makes future maintenance and expansion easier and lays the foundation for subsequent model training.

Formulating standards tests the product manager's understanding of the demand scenarios and their research into the background knowledge of each specific scenario. The scope of the standard directly determines the scope of the problems the final product can solve.
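One way to keep a classification standard easy to maintain and expand is to give every class a permanent ID and only ever append new classes. The sketch below is a hypothetical registry along those lines; the group and class names are placeholders, not the categories in the article's figure.

```python
class LabelStandard:
    """Append-only set of classes: existing IDs never change when new classes are added."""

    def __init__(self):
        self._classes = {}  # class name -> (class_id, group)

    def add(self, name, group):
        if name in self._classes:
            raise ValueError(f"class already defined: {name}")
        class_id = len(self._classes)  # next free ID
        self._classes[name] = (class_id, group)
        return class_id

    def class_id(self, name):
        return self._classes[name][0]

    def classes_in(self, group):
        return [name for name, (_, g) in self._classes.items() if g == group]


standard = LabelStandard()
standard.add("lotus", "aquatic")
standard.add("rose", "terrestrial")
standard.add("aloe", "succulent")  # expanding later leaves earlier IDs intact
print(standard.class_id("rose"), standard.classes_in("aquatic"))
```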

Data collection

Collect sample pictures for each category according to the classification standard. There is usually a dedicated data collection platform: we only need to create a task and define the scope of collection and the websites or links to visit, and the platform completes the collection automatically.
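A collection task on such a platform can be thought of as a small configuration per category. The sketch below is hypothetical (the field names, targets, and source URLs are placeholders, not any real platform's API); it records what to collect and where, and computes how many more samples each category still needs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CollectionTask:
    category: str
    target_count: int  # how many sample pictures this category needs
    sources: List[str] = field(default_factory=list)  # websites or links to visit


def remaining_quota(tasks: List[CollectionTask], collected: Dict[str, int]) -> Dict[str, int]:
    """Samples still needed per category; never negative."""
    return {t.category: max(0, t.target_count - collected.get(t.category, 0)) for t in tasks}


tasks = [
    CollectionTask("rose", 500, ["https://example.com/rose"]),
    CollectionTask("lotus", 300, ["https://example.com/lotus"]),
]
print(remaining_quota(tasks, {"rose": 520, "lotus": 120}))
```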

Data annotation

Label the collected sample pictures with the corresponding category labels. Labeling can be done manually or by machine. Companies usually build a dedicated data service platform for labeling; for example, Baidu has its own crowdsourcing platform that provides data labeling services to its various departments. There are also companies on the market that specialize in data annotation, such as Cloud Data.
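When several crowdsourced annotators label the same picture, their answers have to be merged before the results are accepted. A common approach, assumed here rather than described in the article, is majority voting with a minimum agreement ratio; pictures without enough agreement go back for review. The item IDs and labels are hypothetical.

```python
from collections import Counter


def aggregate_labels(annotations, min_agreement=2 / 3):
    """annotations maps item id -> labels from different annotators.
    Returns (accepted labels, item ids that need review)."""
    accepted, needs_review = {}, []
    for item_id, labels in annotations.items():
        label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) >= min_agreement:
            accepted[item_id] = label
        else:
            needs_review.append(item_id)
    return accepted, needs_review


accepted, needs_review = aggregate_labels({
    "img_001": ["rose", "rose", "lotus"],
    "img_002": ["rose", "lotus", "aloe"],
    "img_003": ["aloe", "aloe", "aloe"],
})
print(accepted, needs_review)
```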

Model training: CNN

The convolutional neural network (CNN) is currently the mainstream technology for image problems. It supports key tasks such as locating image content, object segmentation, object keypoint detection, and object classification, and it can extract image features quickly. Before introducing CNNs, let's first understand what a neural network is.

A neural network is a model of neurons built by simulating how biological nerve cells transmit information. It mainly consists of three parts: the input layer, the hidden layer, and the output layer.

●Input layer: feeds the basic data into the model.

●Hidden layer: also called the computation layer; it performs mathematical computations involving many parameters.

●Output layer: outputs the computed result.

The computation can be understood simply as follows: each value in the input layer is multiplied by its corresponding weight and passed to the next node, which sums all of these weighted values. The sum passes through an activation function and then becomes input to the next layer, and this repeats until the last layer produces the output. Each time training data is fed in, the weights of every node in the network are updated, and by continually adjusting the weights of each layer the error is gradually reduced until the final model is settled.
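The forward computation and the weight updates described above can be sketched in plain Python. This is a teaching sketch, not how production frameworks work: it uses a sigmoid activation, adds a bias term to each node (common practice, although the text above does not mention biases), and trains a single neuron by gradient descent on a tiny hypothetical dataset (logical OR).

```python
import math
import random


def sigmoid(z):
    """Activation function: squashes any value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Each node multiplies every input by its weight, sums them, adds a bias, then activates."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, node_weights)) + b)
        for node_weights, b in zip(weights, biases)
    ]


def network_forward(inputs, layers):
    """Each layer's output becomes the next layer's input until the last layer."""
    values = inputs
    for weights, biases in layers:
        values = layer_forward(values, weights, biases)
    return values


def mean_loss(samples, weights, bias):
    """Average cross-entropy error of a single neuron on the samples."""
    total = 0.0
    for x, y in samples:
        p = layer_forward(x, [weights], [bias])[0]
        p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(samples)


def train_neuron(samples, epochs=2000, lr=0.5, seed=0):
    """Update the weights after every training example to gradually reduce the error."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in samples[0][0]]
    bias = 0.0
    for _ in range(epochs):
        for x, y in samples:
            p = layer_forward(x, [weights], [bias])[0]
            error = p - y  # gradient of cross-entropy w.r.t. the weighted sum
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


OR_DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_neuron(OR_DATA)
print(mean_loss(OR_DATA, [0.0, 0.0], 0.0), mean_loss(OR_DATA, w, b))
print([round(layer_forward(x, [w], [b])[0]) for x, _ in OR_DATA])
```

The loss printed after training is far below the untrained value, which is the "error gradually reduced" part of the description.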

A CNN follows the same logic as an ordinary neural network: it also has an input layer, hidden layers, and an output layer. The difference is that the hidden part is split into convolutional layers, rectified linear unit (ReLU) layers, pooling layers, and fully connected layers. The convolutional layer extracts image features; the ReLU layer applies the ReLU activation function during computation; the pooling layer reduces the dimensionality of the feature data involved in the computation; and the fully connected layer combines the features to compute a score for each class, preparing for the final output. (The technical details are more complicated, and interested readers can consult further material to deepen their understanding. This part is usually handled by algorithm engineers; the product manager only needs a basic grasp of the principles.)
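The hidden-layer pipeline of a CNN can be traced on a toy example. The sketch below, in plain Python for readability, runs a hypothetical 5x5 image through one convolution with a hand-picked vertical-edge kernel, a ReLU, 2x2 max pooling (shrinking the 4x4 feature map to 2x2), and a fully connected layer that produces one score per class. Real CNNs learn their kernels and weights during training and run on optimized libraries; the kernel, weights, and two-class setup here are assumptions for illustration.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products.
    Like most CNN libraries, this is technically cross-correlation (the kernel is not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Rectified linear unit: keep positive values, set negatives to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window, shrinking the map."""
    out_h, out_w = len(feature_map) // size, len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b] for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def fully_connected(features, weights, biases):
    """One score per class: a weighted sum over all flattened features plus a bias."""
    return [sum(f * w for f, w in zip(features, row)) + b for row, b in zip(weights, biases)]


image = [[0, 0, 1, 1, 1] for _ in range(5)]  # dark left, bright right
edge_kernel = [[-1, 1], [-1, 1]]             # responds to dark-to-bright vertical edges

features = relu(conv2d(image, edge_kernel))  # 4x4 feature map
pooled = max_pool(features)                  # 2x2 after pooling
flat = [v for row in pooled for v in row]    # flatten for the fully connected layer
scores = fully_connected(flat, [[1, 0, 1, 0], [0, 1, 0, 1]], [0.0, 0.0])
print(pooled, scores)
```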

Model evaluation

The AI product manager is responsible for the model's results. A unified evaluation standard must be established for judging whether the model is usable, the evaluation process must be made clear, and conclusions must be drawn through data analysis. In plant image recognition, you need to evaluate both whether the model correctly recognizes that an image contains a plant and how accurately it classifies the plants it identifies.

●Evaluation criteria

The evaluation criteria cover preparing the test dataset, choosing the evaluation metrics, and defining how each situation is to be judged.

●Evaluation process

Every image recognition scenario involves first locating the target and then predicting what it is. The evaluation therefore first checks whether the model has correctly boxed the target object; only if the target has been boxed does it then judge whether the prediction is correct.
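The article does not say how to decide whether a box is "correct". A common convention, assumed here, is to compute the intersection over union (IoU) of the predicted and reference boxes and require it to reach 0.5. The sketch below applies the two-step check just described; boxes are (x1, y1, x2, y2) tuples, and the outcome names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2) with x1 < x2, y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def judge(pred_box, pred_label, true_box, true_label, iou_threshold=0.5):
    """Step 1: was the target boxed? Step 2: if so, is the predicted class right?"""
    if pred_box is None:
        return "missed"  # the model did not box anything
    if iou(pred_box, true_box) < iou_threshold:
        return "wrong_box"  # boxed, but not the target object
    return "correct" if pred_label == true_label else "wrong_label"


print(judge((10, 10, 50, 50), "rose", (12, 8, 52, 48), "rose"))
```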

●Evaluation data

Once the evaluation criteria and process are settled, go through the collected test data item by item. This step can be handed to the data labeling team or to interns, since some scenarios require a fairly large test set. To improve efficiency, the work is split into tasks and distributed, and most companies set up dedicated data support positions for it.

●Evaluation conclusion

Accuracy evaluates the model's predictive ability on the data: accuracy = number of YES results judged correct / (number of YES recognition boxes + number of NO recognition boxes that should have been YES).

Recall evaluates the model's ability to recognize images: recall = number of main plant bodies that were boxed / number of main plant bodies that should have been boxed.
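The two formulas can be written as small functions. This sketch follows the article's definitions literally; the parameter names are hypothetical labels for the counts described in the text, and the counts in the usage lines are made up.

```python
def accuracy(correct_yes, yes_boxes, no_boxes_should_be_yes):
    """Article's definition: correct YES judgments / (YES boxes + NO boxes that should be YES)."""
    denominator = yes_boxes + no_boxes_should_be_yes
    return correct_yes / denominator if denominator else 0.0


def recall(main_plants_boxed, main_plants_expected):
    """Article's definition: main plant bodies boxed / main plant bodies that should be boxed."""
    return main_plants_boxed / main_plants_expected if main_plants_expected else 0.0


# Hypothetical counts from an evaluation run.
print(accuracy(correct_yes=80, yes_boxes=90, no_boxes_should_be_yes=10))
print(recall(main_plants_boxed=90, main_plants_expected=100))
```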

Source: blog.51cto.com/13526224/2560488