The AI Data Labeling Industry Faces Five Major Development Difficulties | Man Fu Technology

A white paper released by iResearch shows that China's market for basic AI data services was worth 2.586 billion yuan in 2018 and is expected to exceed 11.3 billion yuan by 2025, a compound annual growth rate of 23.5%.
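As a quick consistency check on these figures, the sketch below computes the growth rate implied by the two market sizes. It assumes seven annual compounding periods from 2018 to 2025; the white paper's exact methodology is not given here, and the helper names are illustrative.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


def project(start: float, rate: float, years: int) -> float:
    """Value after compounding `start` at `rate` for `years` years."""
    return start * (1 + rate) ** years


# Market sizes quoted above, in billions of yuan.
size_2018 = 2.586
size_2025 = 11.3
years = 2025 - 2018  # assumption: 7 annual compounding periods

rate = cagr(size_2018, size_2025, years)
print(f"implied CAGR: {rate:.1%}")  # prints "implied CAGR: 23.5%"
grown = project(size_2018, 0.235, years)
print(f"2018 size grown at 23.5% for 7 years: {grown:.2f} billion yuan")
```

Compounding 2.586 billion yuan at 23.5% for seven years gives roughly 11.33 billion yuan, so the quoted growth rate and the "exceed 11.3 billion" projection agree.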

As a cornerstone of the AI industry, data labeling is moving from behind the scenes to the foreground, and its prospects appear limitless.

However, as in the darkness before dawn, the booming data labeling industry still faces many difficulties, and these have become the biggest obstacles to its development.

1. Prominent labor costs

Although it wears the "coat" of artificial intelligence, data labeling is still essentially a labor-intensive industry.

At present, the domestic data labeling workforce numbers in the tens of millions, about 90% of whom work as data annotators, scattered across labeling teams large and small.

Take a small, 20-person full-time labeling team as an example. With an average daily labor cost of about 100-200 yuan per person, labor alone costs 60,000-120,000 yuan a month, yet a project lasting half a month or longer may carry a total contract value of only a few tens of thousands of yuan. As a result, many labeling teams earn little profit or none at all.
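The monthly range follows from simple multiplication. The sketch below reproduces it under the assumption of a 30-day month (every calendar day counted), which is the assumption that matches the figures quoted; the constant and function names are illustrative.

```python
TEAM_SIZE = 20
DAILY_COST_RANGE = (100, 200)  # yuan per person per day, from the text
DAYS_PER_MONTH = 30            # assumption: every calendar day is counted


def monthly_labor_cost(team_size: int, daily_cost: float, days: int) -> float:
    """Total labor cost of a team for one month."""
    return team_size * daily_cost * days


low, high = (monthly_labor_cost(TEAM_SIZE, c, DAYS_PER_MONTH) for c in DAILY_COST_RANGE)
print(f"monthly labor cost: {low:,.0f} to {high:,.0f} yuan")

# A project lasting half a month ties up roughly half of that in labor alone.
print(f"half-month labor cost: {low / 2:,.0f} to {high / 2:,.0f} yuan")
```

At these rates a half-month project consumes about 30,000-60,000 yuan in wages before any other expense.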

In fact, many labeling team managers recognize the threat that labor costs pose to their teams, but very few choose layoffs to cut them; many teams even keep adding people at the risk of losses. The reason is that data labeling is labor-intensive, and only a team with enough manpower can take on large projects. The more a team lays off, the fewer high-profit large projects it can win, and a team left picking up small jobs will probably end up disbanding.

2. Low labeling efficiency

With large-scale labor costs impossible to relieve, the most feasible option for labeling teams is to improve labeling efficiency.

Recruiting more proficient annotators or adopting more efficient annotation tools can raise labeling efficiency substantially in a short time, but at the implementation stage many have found it is not so simple.

On the one hand, highly proficient, high-quality annotators remain scarce in the industry, and as AI companies demand more and more scenario-specific labeled data, this gap will widen. For example, many speech annotation projects are in English, yet annotators proficient in English are scarce.

On the other hand, the industry lacks efficient annotation tools. Many teams currently use open-source annotation tools, which meet basic labeling requirements but fall far short of what AI companies now need in efficiency and accuracy.

3. Labeling accuracy struggles to meet AI companies' needs

There is a simple but important consensus in the AI industry: the quality of the dataset directly determines the quality of the final model.

Machine learning relies on massive amounts of labeled training data, so data quality has a critical impact on whether AI can ultimately be deployed smoothly.

Many AI companies now recognize this and have raised their quality requirements for labeled data. For example, a labeling accuracy of 95% used to satisfy AI companies, but now they require 99% or even 99.99%.
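To make the gap between these accuracy targets concrete, the arithmetic below counts the expected mislabeled items in a hypothetical dataset of one million labels. The dataset size and function name are illustrative assumptions.

```python
def expected_errors(num_labels: int, accuracy: float) -> float:
    """Expected number of mislabeled items at a given labeling accuracy."""
    return num_labels * (1 - accuracy)


NUM_LABELS = 1_000_000  # hypothetical dataset size, for illustration only

for acc in (0.95, 0.99, 0.9999):
    errors = expected_errors(NUM_LABELS, acc)
    print(f"{acc:.2%} accuracy -> about {errors:,.0f} wrong labels")
```

Moving from 95% to 99.99% cuts the expected errors from about 50,000 to about 100 per million labels, a 500-fold reduction.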

However, for the reasons mentioned above, a considerable number of companies cannot meet these requirements: annotator skill is uneven, and annotation tools are of low quality.

4. Doubtful data security

Since the data labeling industry deals directly with data, security is necessarily a key concern for many businesses.

In the security field, for instance, projects involve collecting and labeling large amounts of face images and other private data, so guaranteeing data security has become a hard requirement for many clients.

From data collection and annotation through to storage, every step must ensure that data is neither leaked nor stolen, which many teams cannot do.

On the one hand, many teams have no independently developed annotation platform of their own and still use open-source tools or lightly modified versions of them. How can such an open platform guarantee data security?

In addition, to save costs, many enterprises still label and store data on public servers, which is clearly non-compliant for many companies in the security field.

5. Lack of scenario labeling capability

As AI technology begins to be deployed at scale, AI companies' requirements for scenario-specific labeled data keep growing.

Take autonomous driving as an example. Automakers' requirements for labeling scenarios according to how often they occur are becoming ever more refined, and demand is rising for complex long-tail scenarios such as vehicles running red lights, pedestrians crossing the street, and vehicles parked illegally at the roadside. Quite a few data labeling teams cannot meet such requirements from AI companies.
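The text does not say how scenario requirements are represented. As an illustration only, the sketch below attaches hypothetical scene tags to driving frames and lists the tags whose share of frames falls at or below a threshold, one simple way to surface long-tail scenarios. `Frame`, `long_tail_tags`, the tag names, and the 0.2 cutoff are all assumptions, not Man Fu Technology's schema.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One driving-video frame with hypothetical scene-level tags."""
    frame_id: str
    scene_tags: set[str] = field(default_factory=set)


def long_tail_tags(frames: list[Frame], max_share: float) -> list[str]:
    """Return tags that appear in at most `max_share` of frames, rarest first."""
    counts = Counter(tag for f in frames for tag in f.scene_tags)
    total = len(frames)
    rare = [tag for tag, n in counts.items() if n / total <= max_share]
    return sorted(rare, key=lambda tag: counts[tag])


frames = [
    Frame("f1", {"normal_traffic"}),
    Frame("f2", {"normal_traffic"}),
    Frame("f3", {"normal_traffic", "pedestrian_crossing"}),
    Frame("f4", {"normal_traffic", "illegally_parked_vehicle"}),
    Frame("f5", {"red_light_running"}),
]
rare = long_tail_tags(frames, max_share=0.2)
print(rare)
```

Here the three rare scenes each appear in one frame out of five, while routine traffic appears in four and is excluded; in practice, rare tags like these are the ones that need targeted collection and labeling.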

Autonomous-driving scene labeling (Source: Man Fu Technology data labeling platform)

On the one hand, this reflects labeling teams' lack of customized-service capability; on the other, the annotation tools they use have limited features. As AI is deployed at scale, failing to meet clients' needs means risking elimination, so improving scenario and customization capabilities is a very real need for many labeling teams.

Starting from practical realities, Man Fu Technology has made the following efforts to address these difficulties:

  1. A professional team builds a high-quality data service platform, cutting service costs by 30% or more;

  2. A self-developed data labeling SaaS platform, backed by AI pre-labeling technology, can improve labeling efficiency more than fourfold;

  3. Real-time, accurate estimation combined with AI secondary screening raises data accuracy above 99%;

  4. Private cloud deployment is supported, with real-time monitoring to strengthen security;

  5. Customized scenario setup and 7x24-hour rapid technical response.
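Items 2 and 3 mention AI pre-labeling and AI secondary screening without describing how they work. The sketch below shows one way such a workflow could look, assuming a model that returns a label and a confidence score per item. `Item`, `pre_label`, `secondary_screen`, `toy_model`, and the 0.8 threshold are hypothetical and do not describe Man Fu Technology's platform.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Item:
    item_id: str
    pre_label: Optional[str] = None    # machine suggestion
    confidence: float = 0.0            # model confidence in the suggestion
    final_label: Optional[str] = None  # label confirmed or corrected by an annotator


# Hypothetical model interface: item_id -> (label, confidence).
Model = Callable[[str], tuple[str, float]]


def pre_label(items: list[Item], model: Model) -> None:
    """Fill in machine suggestions so annotators correct rather than label from scratch."""
    for it in items:
        it.pre_label, it.confidence = model(it.item_id)


def secondary_screen(items: list[Item], min_conf: float) -> list[str]:
    """Flag items for re-review: the annotator overrode the model, or the model was unsure."""
    flagged = []
    for it in items:
        disagrees = it.final_label is not None and it.final_label != it.pre_label
        if disagrees or it.confidence < min_conf:
            flagged.append(it.item_id)
    return flagged


def toy_model(item_id: str) -> tuple[str, float]:
    # Stand-in for a real detector; returns fixed guesses.
    return {"a": ("car", 0.97), "b": ("pedestrian", 0.55), "c": ("car", 0.92)}[item_id]


items = [Item("a"), Item("b"), Item("c")]
pre_label(items, toy_model)
for it, human in zip(items, ["car", "pedestrian", "truck"]):
    it.final_label = human  # simulated annotator decisions
print(secondary_screen(items, min_conf=0.8))  # prints ['b', 'c']
```

Item "b" is flagged because the model was unsure, and item "c" because the annotator disagreed with the model. Routing only these items to a second reviewer is one common way to spend review effort where errors are most likely.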

Through these efforts, Man Fu Technology is committed to giving customers a high-quality, efficient, customized, scenario-based data labeling service experience. Man Fu Technology's one-stop data collection and labeling platform is quietly changing the world in autonomous driving, security, VR/AR, drones, new retail, AI education, industrial robotics, and related fields.

Although the current dilemmas of the data labeling industry have held back its growth, in keeping with Man Fu Technology's vision, we will rely on our own efforts and do everything possible to unleash AI through data.

Origin blog.51cto.com/14624568/2463682