Active learning and weakly supervised learning

Obtaining data for artificial intelligence is harder than it looks. Despite the big data wave, many companies racing to acquire data have yet to find a suitable channel for it. In many cases, obtaining high-quality AI data takes a great deal of manpower, time, and money. At the same time, creating value through "human-machine collaboration" is becoming an inevitable trend. Imagine using such data to train a supervised machine learning (ML) algorithm. ML algorithms can perform the same classification tasks as humans, only much faster, which reduces costs and inefficiencies. They work on mixed data such as images, text files, and plain numbers. If your model is good enough, it can take on a wide range of tasks. The catch is that the data itself comes at a steep price. There is a way forward, though: once you are familiar with the machine learning technique known as "active learning," the prospect of obtaining a large amount of data may no longer scare you away.

Comparison of two popular machine learning techniques

The field of machine learning (ML) has grown rapidly, but obtaining data remains a daunting task for many enterprises. Traditional machine learning algorithms require a large amount of manually labeled data. Data in the required volume is often unavailable at scale and costly, not to mention the time and effort needed to annotate it manually, and the finished data often falls short of ideal quality standards. Active learning and weakly supervised learning are two machine learning techniques that can help overcome these data challenges.

Labeling data also requires human annotators. In many cases, these annotators are subject matter experts (SMEs) who draw on their industry expertise to produce accurate annotations. However, the availability of SMEs is limited and hiring costs are high. With these challenges in mind, teams developing artificial intelligence (AI) solutions are moving from fully supervised learning (which requires complete, manually labeled datasets to train ML models) to active and weakly supervised learning. These approaches are typically faster and less labor intensive, while still being able to train the model successfully. Understanding how the different learning techniques work and what their advantages are can help teams decide whether weakly supervised learning, active learning, or a combination of the two is an appropriate solution for training machine learning models.

Active learning and weakly supervised learning: how they fit into supervised learning

First, it helps to be clear that machine learning covers several types of learning, which are commonly grouped under two broad categories: supervised learning and unsupervised learning. In supervised learning, the machine receives data points labeled by humans and uses them to make predictions. Unsupervised learning, on the other hand, uses unlabeled data; the algorithm must extract structure and patterns from the data without human guidance. Supervised learning can in turn be broken down into a range of learning types. These include active learning (a form of semi-supervised learning) and weakly supervised learning.

Active learning

Active learning is a form of semi-supervised learning. Unlike fully supervised learning, it provides the machine learning algorithm with only an initial subset of human-labeled data drawn from a larger unlabeled data set. The algorithm processes this data and produces predictions, each with a certain level of confidence. Any prediction that falls below a chosen confidence threshold indicates that more data is needed. These low-confidence data points are sent to human annotators, who label them and return them to the algorithm that requested them. The loop repeats until the algorithm is trained and reaches the expected prediction accuracy. This iterative human-machine collaboration approach rests on the fact that not all samples have the same learning value, so the algorithm selects the data it learns from. A key differentiator in active learning is the sampling method used, which strongly affects how the model performs. Data scientists can test different sampling methods and choose the one that produces the most accurate results. Overall, active learning relies less on human data annotation than fully supervised learning, because not the entire data set needs to be annotated, only the data points the machine asks for.
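The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production recipe: the `oracle` function stands in for the human annotator, the "model" is a toy one-dimensional decision boundary, the distance to that boundary serves as a confidence proxy, and the names `fit_boundary`, `min_margin`, and `batch_size` are hypothetical.

```python
def oracle(x):
    """Stand-in for a human annotator (hypothetical rule: positive from 60 up)."""
    return 1 if x >= 60 else 0


def fit_boundary(labeled):
    """Toy model: boundary halfway between the largest negative and the
    smallest positive example. Assumes both classes are already labeled."""
    largest_neg = max(x for x, y in labeled.items() if y == 0)
    smallest_pos = min(x for x, y in labeled.items() if y == 1)
    return (largest_neg + smallest_pos) / 2


def margin(x, boundary):
    """Confidence proxy: the closer to the boundary, the less confident."""
    return abs(x - boundary)


def active_learning(pool, seed_points, min_margin=5.0, batch_size=2):
    # Start from a small human-labeled subset (must contain both classes).
    labeled = {x: oracle(x) for x in seed_points}
    boundary = fit_boundary(labeled)
    while True:
        # Unlabeled points whose prediction falls below the confidence level.
        uncertain = [x for x in pool
                     if x not in labeled and margin(x, boundary) < min_margin]
        if not uncertain:
            break  # every remaining prediction is confident enough
        # Send the least confident points to the annotator first.
        uncertain.sort(key=lambda x: margin(x, boundary))
        for x in uncertain[:batch_size]:
            labeled[x] = oracle(x)
        boundary = fit_boundary(labeled)  # retrain with the new labels
    return boundary, labeled


pool = list(range(100))
boundary, labeled = active_learning(pool, seed_points=[0, 99])
predictions = [1 if x >= boundary else 0 for x in pool]
print(f"boundary={boundary}, labels requested={len(labeled)} of {len(pool)}")
```

Picking the points closest to the boundary is only one possible sampling method; swapping the `sort` key is where a different strategy would plug in.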

Weakly supervised learning

Weakly supervised learning is a learning technique that incorporates knowledge from a variety of data sources, many of which are of low quality. These data sources may include:

  • Low-cost, low-quality annotations from non-experts.
  • Higher-level supervision from SMEs, for example in the form of heuristics (rules). A heuristic might be stated as, "If a data point = x, then label it y." A heuristic or set of heuristics can annotate thousands, even millions, of data points almost instantly.
  • Older pre-trained models, which may be biased or noisy.
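To make the heuristic idea from the list above concrete, here is a small Python sketch. The spam/ham task, the three rules, and the `ABSTAIN` convention (a rule returns -1 when it does not apply) are all assumptions made for illustration; real heuristics would come from the SMEs.

```python
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_mentions_prize(text):
    """Rule: if the text mentions a prize, label it SPAM."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_many_exclamations(text):
    """Rule: three or more exclamation marks suggest SPAM."""
    return SPAM if text.count("!") >= 3 else ABSTAIN


def lf_greets_by_name(text):
    """Rule: a message opening with a greeting is probably HAM."""
    return HAM if text.lower().startswith(("hi ", "hello ")) else ABSTAIN


HEURISTICS = [lf_mentions_prize, lf_many_exclamations, lf_greets_by_name]


def apply_heuristics(texts, heuristics=HEURISTICS):
    """Return one row per text with one (possibly abstaining) vote per rule."""
    return [[h(t) for h in heuristics] for t in texts]


texts = [
    "You won a PRIZE!!! Click now",
    "Hi Anna, are we still on for lunch?",
    "Quarterly report attached",
]
label_matrix = apply_heuristics(texts)
for text, votes in zip(texts, label_matrix):
    print(votes, text)
```

Each rule is cheap to run over the whole data set, which is why heuristics scale to very large volumes, but the votes are noisy and some texts receive no vote at all.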

The data in these sources is often imprecise (the data has labels, but the labels are not as precise as needed) or inaccurate (some of the labels contain errors). The model can be set up to learn from these collected data sets using simple techniques or programmatic annotation such as pattern matching. Features and hyperparameters are then adjusted to obtain better weights until the model achieves the desired performance. Smaller fully supervised data sets can be included as needed to complete model training. Weakly supervised learning is thus a way of creating training data programmatically. Its purpose is to reduce the time required to label data manually. This approach is best suited to classification tasks where large unlabeled datasets are available, or where the application scenario explicitly allows the use of weakly labeled sources. By now you probably have a sense of how active learning can help you obtain artificial intelligence data more effectively and how large data sets can be labeled.
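As a rough sketch of how noisy labels from several weak sources might be turned into a single training label, the Python snippet below uses a plain majority vote. This is a deliberately simple stand-in chosen for illustration, not the method of any particular tool; the `ABSTAIN` value and the function name are assumptions.

```python
from collections import Counter

ABSTAIN = -1  # assumed convention: a source that has no opinion votes -1


def majority_vote(votes):
    """Combine weak labels for one data point; abstain on no votes or ties."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    (top_label, top_count), *rest = counts.most_common(2)
    if rest and rest[0][1] == top_count:
        return ABSTAIN  # tie between the two most common labels
    return top_label


# One row per data point, one column per weak source (non-expert annotator,
# heuristic, or older model). Values are illustrative only.
weak_labels = [
    [1, 1, 0],
    [ABSTAIN, 0, 0],
    [1, 0, ABSTAIN],
    [ABSTAIN, ABSTAIN, ABSTAIN],
]
training_labels = [majority_vote(row) for row in weak_labels]
print(training_labels)
```

Data points that end up with `ABSTAIN` would be left out of training or routed to a human annotator, which is one place where a small supervised data set can fill the gaps.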

Origin blog.csdn.net/Appen_China/article/details/134971506