AI data annotation enters the era of automation as a 26-year veteran taps into the global smart car market

4D data annotation has set off a new trend in the industry.

Two factors are behind it. First, perception technology represented by BEV (bird's-eye view) has moved the output space from 2D perspective images to 3D space plus a temporal dimension, and 4D annotation has emerged in response. Second, 4D annotation relies on point-cloud-level or object-level reconstruction: manually annotated data is accumulated first and then fed to large models trained in the cloud, which gradually replace manual labeling and can reportedly improve labeling efficiency by more than 80%.
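As a rough illustration of what "3D space plus a temporal dimension" means for a label, the sketch below models one annotated object as a track of 3D boxes ordered by timestamp. The class and field names (`Box3D`, `Track4D`, `yaw` and so on) are hypothetical and are not taken from any Appen or BEV toolchain.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Box3D:
    """One 3D bounding box at a single timestamp (hypothetical schema)."""
    timestamp: float  # seconds
    x: float          # box center, meters
    y: float
    z: float
    length: float
    width: float
    height: float
    yaw: float        # heading, radians


@dataclass
class Track4D:
    """A 4D label: the same object followed through 3D space over time."""
    track_id: str
    category: str
    boxes: List[Box3D] = field(default_factory=list)

    def add(self, box: Box3D) -> None:
        """Insert a keyframe and keep the track sorted by time."""
        self.boxes.append(box)
        self.boxes.sort(key=lambda b: b.timestamp)

    def interpolate(self, t: float) -> Optional[Tuple[float, float, float]]:
        """Linearly interpolate the box center at time t between keyframes."""
        for a, b in zip(self.boxes, self.boxes[1:]):
            if a.timestamp <= t <= b.timestamp:
                if b.timestamp == a.timestamp:
                    return (a.x, a.y, a.z)
                w = (t - a.timestamp) / (b.timestamp - a.timestamp)
                return (
                    a.x + w * (b.x - a.x),
                    a.y + w * (b.y - a.y),
                    a.z + w * (b.z - a.z),
                )
        return None  # t lies outside the annotated time range
```

Interpolating between keyframes is one common reason to store labels as tracks: an annotator can label a few frames and let the tool propose the ones in between.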

Domestic car companies are also benchmarking Tesla's closed-loop data solution, and autonomous driving is developing toward higher-level capabilities, so car companies have raised their requirements for data annotation. For example, annotation accuracy must exceed 99%, and service providers must be equipped with a dedicated automated annotation platform and annotation tools.

This also means that "high quality and efficiency" has become the focus of competition on the data annotation track, and the industry's technical threshold has been further raised.

On the one hand, the iteration of intelligent driving technology relies on the continuous optimization of algorithms, which in turn depends on the quality of the data that feeds them. Especially as intelligent driving perception solutions and computing platforms converge, high-quality data has become a moat that OEMs and intelligent driving solution providers focus on building.

On the other hand, traditional manual annotation can no longer meet the demand for massive model-training data sets in terms of efficiency and cost. This is driving the data annotation industry toward automated annotation, and a market shakeout has begun. Purely manual labeling companies that lack a technical competitive advantage, and players that cannot support the labeling volume of mass-produced cars, may be the first to be eliminated.

"The data annotation needs of the automotive market are constantly changing and growing, including a move from conventional scenarios to long-tail scenarios. Data complexity is also rising (from 2D and 3D to 4D requirements), and the requirement for annotation accuracy has reached 99.9%. In the long run, this will be reflected in requirements for data suppliers' long-term cooperation capabilities," said Zhang Xianxiong, sales director of Appen (China).

Appen, an AI data service provider founded in 1996, officially entered the Chinese market in 2019. It reportedly has an industry-leading AI-assisted data annotation platform and an integrated AI data and resource management platform, and provides annotation services for images, text, voice, audio, video and other data types.

Facing the opportunities and challenges of data annotation for autonomous driving scenarios, Appen has signaled strong potential through keywords such as "going overseas", "AI assistance" and "talent echelon".

Comprehensive layout of automated AI data annotation

Since 2021, autonomous driving has become a popular business scenario for domestic data annotation service providers. Continuous demand for annotating data from different sensors, different models and different special situations has turned data annotation into a blue-ocean market.

From the perspective of industry needs, data annotation is mainly based on the requirements of car companies and intelligent driving solution providers. Data objects such as voice, point cloud, images, and videos are annotated in different ways, thereby providing a large amount of training data for algorithm iteration.

However, as autonomous driving levels rise, sensors such as lidar, cameras and 4D imaging radar are deployed at scale, and application scenarios such as highways, urban expressways and parking lots keep expanding, the volume of autonomous driving data to be annotated is rising exponentially. Purely manual annotation struggles to handle 100k, 1000k or even larger-scale annotation jobs.

According to reports, Appen divides data annotation platforms into five stages by level of intelligence and automation: L0, purely manual data collection and annotation; L1, simple data preprocessing; L2, intelligent interaction; L3, semi-automatic annotation; and L4, fully automatic annotation.

Figure: The five development stages of a data annotation platform
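The five stages can be written down as a small ordered enumeration, which makes comparisons such as "at or below L2" explicit. This is only a sketch of the taxonomy described above: the identifiers and the helper `needs_human_in_loop` are hypothetical, not part of any Appen product.

```python
from enum import IntEnum


class AnnotationLevel(IntEnum):
    """Automation stages of a data annotation platform, as listed in the article."""
    L0_MANUAL = 0            # purely manual data collection and annotation
    L1_PREPROCESSING = 1     # simple data preprocessing
    L2_INTERACTIVE = 2       # intelligent interaction
    L3_SEMI_AUTOMATIC = 3    # semi-automatic annotation
    L4_FULLY_AUTOMATIC = 4   # fully automatic annotation


def needs_human_in_loop(level: AnnotationLevel) -> bool:
    """Assumption for illustration: every stage below L4 still involves annotators."""
    return level < AnnotationLevel.L4_FULLY_AUTOMATIC


# Example: list the stages that still rely on human annotators.
human_stages = [lvl.name for lvl in AnnotationLevel if needs_human_in_loop(lvl)]
```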

At present, the industry is generally in the L1 or L2 stage, that is, batch-processed pre-annotation results are used as original input or simple human-computer interaction is used to improve annotation efficiency.

However, general pre-annotation models usually solve a limited set of problems and cannot cover the customized needs of real projects. The high model-accuracy requirements of interactive intelligence at the L2 stage also make interactive models harder to develop and limit their wide use across different types of projects.

Based on the development trend of automated annotation and the industry's pain points, Appen has invested across two major areas, interactive intelligent annotation and pre-annotation with large models plus fine-tuning. It has entered the stage of semi-automatic data annotation and continues to move toward fully automated annotation.

Figure: Data loop on the Appen MatrixGo platform

For example, high requirements for data accuracy, heavy reliance on manual labor and complex tool logic all drive up annotation costs. In response to these core pain points, Appen independently developed interactive intelligent annotation for its AI-assisted data annotation platform MatrixGo. It relies on simple interactions in place of the dense contour-drawing process and can save about 50% of labeling time compared with purely manual labeling.

Take lane line annotation in 3D point cloud data. The pain points are that lane lines are hard to observe, 3D point cloud data is sparse, and lane line shape and reflectivity are distorted. With Appen's interactive lane line assisted annotation model, the annotator only needs to drag a box around the complete lane lines, and the model returns lane line predictions in real time. The annotator then completes the annotation by making simple modifications or adjustments to the predictions.
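The box-then-predict interaction can be sketched in a few lines. Appen's lane line model is a learned model whose details are not public; the example below swaps in a plain least-squares line fit over high-reflectivity points inside the box, purely to show the shape of the workflow. The point layout `(x, y, reflectivity)`, the function names and the `min_reflectivity` threshold are all assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, reflectivity), hypothetical layout


def select_in_box(
    points: List[Point],
    x_min: float, x_max: float,
    y_min: float, y_max: float,
    min_reflectivity: float = 0.5,
) -> List[Tuple[float, float]]:
    """Keep points inside the annotator's box that are bright enough to be paint."""
    return [
        (x, y)
        for x, y, r in points
        if x_min <= x <= x_max and y_min <= y <= y_max and r >= min_reflectivity
    ]


def fit_line(xy: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept (stand-in for the model)."""
    n = len(xy)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(x for x, _ in xy) / n
    mean_y = sum(y for _, y in xy) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in xy)
    if sxx == 0:
        raise ValueError("all points share the same x; cannot fit y as a function of x")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in xy)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Example: three bright lane points, one dim point, one point outside the box.
cloud = [(0.0, 1.0, 0.9), (1.0, 3.0, 0.8), (2.0, 5.0, 0.9), (1.0, 10.0, 0.1), (50.0, 0.0, 0.9)]
candidates = select_in_box(cloud, 0.0, 5.0, 0.0, 20.0)
slope, intercept = fit_line(candidates)
```

In a real tool the fitted curve would be shown to the annotator as an editable proposal rather than accepted directly.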

However, in Appen's view, AI automation is a gradual process. Fully automated labeling is currently hard to achieve in specific autonomous driving scenarios, mainly because corner cases exist and manual intervention is needed to ensure the accuracy of the annotation results.
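One common way to combine automated labeling with manual intervention is to send only low-confidence predictions to human reviewers. The article does not say that Appen does this; the sketch below is a generic illustration, and the record layout, the function name `route_predictions` and the 0.9 threshold are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical prediction record: {"id": str, "label": str, "confidence": float}
Prediction = Dict[str, object]


def route_predictions(
    predictions: List[Prediction], threshold: float = 0.9
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split model pre-annotations into auto-accepted and human-review queues."""
    auto_accepted: List[Prediction] = []
    needs_review: List[Prediction] = []
    for pred in predictions:
        if float(pred["confidence"]) >= threshold:
            auto_accepted.append(pred)
        else:
            needs_review.append(pred)  # likely corner case: keep a human in the loop
    return auto_accepted, needs_review


# Example: one confident prediction and one uncertain prediction.
preds = [
    {"id": "a", "label": "car", "confidence": 0.97},
    {"id": "b", "label": "debris", "confidence": 0.41},
]
accepted, review = route_predictions(preds)
```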

From a technical point of view, the algorithms behind an annotation tool can only improve by being trained continuously on the annotation results of specific scenes, bringing the tool ever closer to fully automated annotation.

"Corner cases are a problem that requires long-term, continuous optimization in the autonomous driving market. They involve few scenarios and are hard to screen for in the data. High-quality data service providers are needed to help customers keep processing and optimizing them with techniques such as data collection, data screening and data synthesis," Qian Cheng, senior director of product and R&D at Appen (China), told Gaogong Intelligent Automotive.

According to Qian Cheng, corner cases occur in only a few specific scenarios, and sifting them out of massive volumes of road-collected data is almost unrealistic. Simulated synthetic data, as a data augmentation technology, can fill in potential or edge-case usage scenarios, save data collection costs and meet privacy requirements, making it one of the feasible ways to address the corner case data problem.

Currently, the main ways to create synthetic data include sampling from a distribution, fitting real data to a distribution, and deep learning. Deep learning approaches include variational autoencoder (VAE) models and generative adversarial network (GAN) models. In a variational autoencoder, for example, an encoder compresses the initial data set into a compact representation and passes it to a decoder, which then generates output that resembles the initial data set.
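The simplest of these approaches, fitting real data to a distribution and then sampling from it, can be shown with the standard library alone. The sketch below assumes a single numeric feature that is roughly Gaussian; the sample values and the function names `fit_gaussian` and `synthesize` are made up for illustration, and real driving data would need far richer models such as the VAE or GAN approaches mentioned above.

```python
import random
import statistics
from typing import List, Tuple


def fit_gaussian(samples: List[float]) -> Tuple[float, float]:
    """Estimate the mean and sample standard deviation of the real data."""
    return statistics.mean(samples), statistics.stdev(samples)


def synthesize(mean: float, std: float, n: int, seed: int = 0) -> List[float]:
    """Draw n synthetic values from the fitted Gaussian (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(n)]


# Hypothetical "real" measurements, for example following distances in meters.
real = [9.0, 10.0, 11.0, 10.5, 9.5]
mu, sigma = fit_gaussian(real)
synthetic = synthesize(mu, sigma, n=1000)
```

Because the generator is seeded, the same call reproduces the same synthetic set, which helps when comparing models trained on it.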

In fact, based on the huge application potential of synthetic data in corner case scenarios, Appen has taken the lead.

As early as 2022, Appen acquired a minority stake in the artificial intelligence data platform Mindtech, and the two parties carried out in-depth cooperation to enhance the ability to provide customers with synthetic data. It is worth mentioning that Mindtech is the world's leading developer of an end-to-end synthetic data creation platform for AI vision system training, and has achieved data synthesis by creating accurate neural networks.

With the technical support of Mindtech, Appen can provide synthetic data services and analyze whether the synthetic data is suitable for the customer's various models based on the customer's application requirements to help them quickly deploy AI solutions and put them into the market.

Competition intensifies as the gold rush moves to new overseas markets

The data annotation track has become a crowded battlefield, and competition is now fierce.

Especially under the trend of software-defined cars, data collection in the automotive industry is shifting toward mass-produced cars, and quality requirements for data collection are rising. Demand for traditional outsourced data collection is declining, and some autonomous driving companies are also trying to build their own closed-loop data tool chains, so the data collection and processing industry faces more uncertainty.

On the one hand, data annotation has gradually shifted from relying on manual labor to relying on highly skilled engineers who develop automated tool platforms. Customers' annotation needs change considerably, and annotation rules are uncertain. This means that companies' R&D costs will rise and gross profit margins are at risk of decline.

On the other hand, from a long-term perspective, OEMs have set a series of qualification thresholds for data annotation service providers, such as the provider's size, the intelligence of its annotation tools and its response speed for customized annotation. Data service providers that meet OEMs' needs will have more stable supply opportunities.

After all, a "strong binding" between OEMs and data annotation service providers helps OEMs keep long-term control over their data security. It also lets them secure production capacity quickly and reduce the implementation risks of autonomous driving projects.

"Appen has transformed from a pure autonomous driving data service provider into a provider of overall data service solutions covering consulting, products and operations, forming a deeper bond with customers. Its R&D investment in recent years can greatly improve efficiency; it is aimed mainly at large-scale efficiency gains rather than single scenarios, and the effect of cost reduction and efficiency improvement is already visible," Zhang Xianxiong said.

Figure: Architecture of the Appen MatrixGo intelligent data service platform

Following the development trend of the industry, Appen has established a moat with its independent and leading R&D technology, strong delivery capabilities and resources.

Currently, Appen China is headquartered in Shanghai, the AI capital of China, with large delivery centers in Wuxi, Dalian and Chongqing. It has more than 1,600 full-time employees, 1,000+ BPO (business process outsourcing) resources and tens of thousands of high-quality local crowdsourced workers, and it continues to expand.

In addition, Appen has independently developed an industry-leading AI-assisted intelligent data annotation platform, which can flexibly configure different annotation tools, automate the annotation process, and improve the production capacity of data collection and data annotation projects to ensure the delivery of high-quality training data to customers.

For example, MatrixGo, the enterprise-level AI data annotation platform developed by the Appen China team, runs about 2,000 projects every year. The platform is technologically leading in 2D and 3D image annotation and greatly enhances the ability of local Chinese enterprises to scale their AI projects.

Figure: Example of 2D image/video continuous-frame annotation on the Appen MatrixGo platform

In addition to stepping up efforts to seize the window of opportunity in autonomous driving, Appen, already a step ahead, is also pursuing more certain opportunities in overseas markets.

According to reports, in the autonomous driving segment, overseas annotation scenarios are not as complex as domestic ones. However, the differing laws and regulations of each country and region, together with high data security requirements, are major challenges for overseas data annotation at this stage.

Appen's international roots give it strong competitiveness overseas. Appen's global headquarters is reportedly located in Australia, with offices in the United States, the United Kingdom, the Philippines and other countries and regions. Its resources include a crowd of more than 1 million professionals proficient in 235 languages, spread across more than 70,000 locations in 170 countries and regions, as well as what it describes as the industry's most advanced AI-assisted data annotation platform. Its past experience working with major global car companies further strengthens its global service capabilities.

In addition, for data quality control, security management and privacy protection, Appen is always committed to providing customers with the highest level of management standards. In addition to ISO 9001, ISO 27001 and ISO 27701 certifications, Appen has also passed data security compliance certifications in different countries and regions around the world such as GDPR, SOC 2 Type II, HIPAA, etc. to ensure that data sources and channels are formal, safe and legal.

Looking ahead, amid the certain opportunities of the data labeling track and the uncertain changes in the market, Appen will further strengthen its moat.

First, in terms of human resources, Appen will follow changes in industry talent trends and promote a shift from a "blue-collar-led" to a "white-collar-led" workforce. It will build a reserve of more professional talent and use technological dividends to release its cost advantages while being able to handle more difficult data service projects.

Second, at the product level, in the short term Appen will strengthen the model R&D capabilities of its algorithm team, support larger data volumes and faster data flow, improve flexibility, enhance cutting-edge tool capabilities for autonomous driving, and build technical barriers in the industry.

Figure: Example of 4D data annotation on the Appen MatrixGo platform

For example, with 4D model-assisted functions and the upgraded point cloud tool 2.0, Appen can achieve better interaction design, support more complex scene data, raise the intelligence level of its tools and effectively help customers build closed-loop data capabilities.

It is not hard to see that for Appen, a 26-year veteran, the global data labeling dividend period has only just begun.

Origin blog.csdn.net/GGAI_AI/article/details/132684215