How hard is anti-vulgarity technology? This time Jinri Toutiao introduces BERT for the first time

 

On July 30, Jinri Toutiao (Toutiao) announced the launch of a new version of its anti-vulgarity assistant "Spirit Dog". Besides an upgraded text recognition feature, the new tool adds image recognition for the first time. Coming half a year after the last release, this is another major upgrade for Spirit Dog.

"Spirit Dogs" is a test content health of gadgets designed to help people combat vulgar low-quality content, purify the Internet space. A new generation of "Spirit Dog" was first introduced in the field of natural language processing in recent popular  BERT  model, after the training data up to 1.2T, the spiritual content recognition accuracy dogs raised to 91%.

 

Can AI really solve content review? At Toutiao's headquarters, we spoke with Wang Changhu, director of ByteDance's Artificial Intelligence Lab. As things stand, technology can solve many of the problems, but it also has plenty of shortcomings.

 

 

The difficulty of technical content review

 

Now that the mobile Internet has gone mainstream, the amount of data technology companies need to process is growing exponentially, and many companies are building their own technical review systems. Last September, Facebook released and deployed a content review system called "Rosetta" to tackle this problem: Rosetta extracts text in real time from more than a billion images and video frames per day and can identify and review text in multiple languages.

 

In China, Zhihu launched a community-management "brain" called "Wali" last year, hoping to use algorithms to handle unfriendly behavior, off-topic answers, low-quality content, illegal content, and so on in its community. Reportedly, the system can clean up roughly 5,000 newly created pieces of low-quality content per day.

 

Although companies all use their own algorithms to deal with illegal content, in the face of the unlimited possibilities of language and images, artificial intelligence still gets things wrong fairly often. On the other hand, content review is like driverless driving: misses can have very serious consequences, and if recall is not high enough, even a good algorithm cannot be put into practice. Around last year's US Independence Day, excerpts from the Declaration of Independence were judged by Facebook's algorithm to be racially discriminatory and were removed.

 

So where do the technical difficulties in text and image processing lie? Let's start with how machines learn language.

 

Language understanding: the jewel in the crown

 

Natural language processing (NLP) has a history almost as long as that of computers and artificial intelligence. AI research began with the birth of the computer, and machine translation and natural language understanding were among the earliest fields of AI research. This does not mean that machines today understand language particularly well; in fact, we are still a long way from real intelligence.

 

Computers are very good at handling structured data such as spreadsheets and database tables. But we humans usually communicate with each other in unstructured text, which is not so convenient for a computer.

 

To make a machine understand language, we usually follow a pipeline: first, split the text into individual sentences; then split each sentence into words, or tokens; next, have the machine guess each token's part of speech: noun, verb, adjective, and so on. After lemmatization, stop-word identification, dependency parsing, and other steps, the named entity recognition (NER) stage uses statistical models and context to guess what kind of entity each word refers to.
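As a rough illustration, here is what such a pipeline looks like with the open-source library spaCy; the model name and sample sentence are just examples, not anything Toutiao uses:

```python
# A minimal NLP pipeline sketch using spaCy (illustrative only).
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")          # small English pipeline
doc = nlp("Facebook deployed Rosetta to read text in a billion images a day.")

for token in doc:
    # token text, part of speech, lemma, stop-word flag, dependency relation
    print(token.text, token.pos_, token.lemma_, token.is_stop, token.dep_)

for ent in doc.ents:
    # named entity recognition: guess what kind of thing each span refers to
    print(ent.text, ent.label_)             # e.g. "Facebook" -> ORG
```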

 

Although NLP technology lets computers understand the meaning of text to some extent, most research is based on English. Even just from the standpoint of NLP research, tasks such as part-of-speech tagging and syntactic parsing differ considerably between English and Chinese, mainly because English has prominent inflections (singular and plural forms, tenses, and so on) that Chinese lacks.

 

Let BERT learn Chinese

 

For text review, the algorithm must be able to understand word semantics through a process known as "fitting"; on the other hand, it must also be able to generalize, that is, to reason by analogy once it understands the semantics.

 

The most common text classification models include FastText, TextCNN, TextRNN, and their variants. FastText classifies text directly from averaged token embeddings; although this ignores word order, it is simple and effective. TextCNN uses convolutions to model local dependencies (local features) in the text and pooling to learn global information; a CNN can capture word-order relationships locally while reducing dimensionality, but modeling long-distance dependencies requires stacking many convolution and pooling layers and a more complex model structure. TextRNN models the text sequence with an LSTM or GRU and can model long-distance dependencies effectively.
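To make the TextCNN idea concrete, here is a minimal PyTorch sketch; the vocabulary size, filter counts, and two-class label set are placeholder assumptions, not Toutiao's settings:

```python
# Illustrative TextCNN sketch in PyTorch (not Toutiao's actual model).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=128, num_classes=2,
                 kernel_sizes=(2, 3, 4), num_filters=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One 1-D convolution per kernel size captures local n-gram features.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes]
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                 # (batch, seq_len)
        x = self.embedding(token_ids)             # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                     # (batch, embed_dim, seq_len)
        # Max-pool over time: global decision from local features.
        feats = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(feats, dim=1))   # (batch, num_classes)

logits = TextCNN()(torch.randint(1, 30000, (8, 50)))  # dummy batch of 8 texts
```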

 

The text classification model behind Toutiao's Spirit Dog has gone through three iterations. The first-generation Spirit Dog applied word vectors and a CNN (convolutional neural network) for text recognition; its training set contained 3.5 million samples, and its prediction accuracy on random samples was 79%. The second-generation Spirit Dog applied an LSTM (long short-term memory network) and an attention mechanism; its training set contained 8.4 million samples, and accuracy rose to 85%.

 

Each new version marks a significant jump over the previous one in both data and technology. The third-generation Spirit Dog now incorporates BERT.

 

"BERT" is currently the most advanced natural language processing technology in recent years, significant progress in the field of NLP synthesizer. This technique common in reading comprehension, semantic implication, questions and answers on various tasks, such as correlation had once broken the 11 best-record, but also because of the amount of parameters up to 300 million prohibitive for most developers. "BERT" presents a model of deep structure, "block" approach while taking advantage of the context of improving the accuracy and ultra-large-scale natural corpus modeled by unsupervised learning. Due to natural language has a natural coherence, through predictive power of large-scale training of language models, reached an unprecedented level.

 

The new "Spirit Dogs" while the application of "BERT" model and semi-supervised learning, and the use of a special Chinese corpus on this basis, adjusted the model structure without sacrificing effectiveness, making the computational efficiency reached a practical level.

 

According to Toutiao, compared with the previous LSTM + Attention scheme, the BERT-based content recognition scheme runs at a machine latency of 125 ms, demands 33 times more computing power, and improves accuracy by 7.04%.

 

Image recognition: strange things keep happening

 

Unlike with text, a machine recognizes an image somewhat the way a person reads Braille: each pixel is a single point of information, and the machine reaches a final judgment by finding the most reasonable combination of all those points. This approach gives machines better processing ability than humans on certain kinds of images; in identifying plant and animal species, for example, most of us are less "professional" than a computer. But in many more cases, content inspection is a challenging task.

 

The basic idea of today's common image classification approach is to take a classification model pre-trained on ImageNet (e.g. ResNet, Xception, SENet), adjust its structure and parameters, and then use the fine-tuned model to extract image features, which serve as the input to a classifier for the specific task. This convolution-based neural network approach, however, carries the risk of being "deceived".
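A minimal sketch of this pretrain-then-fine-tune recipe with torchvision follows; the two-class label set and training details are assumptions, not the Spirit Dog pipeline:

```python
# Transfer-learning sketch (assumed setup): take an ImageNet-pretrained ResNet,
# replace its head, and train the new classifier on a task-specific label set.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
for p in backbone.parameters():          # freeze the pretrained feature extractor
    p.requires_grad = False
backbone.fc = nn.Linear(backbone.fc.in_features, 2)  # new 2-class head

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

images = torch.randn(4, 3, 224, 224)     # dummy batch of RGB images
labels = torch.tensor([0, 1, 0, 1])      # assumed labels: 0 = normal, 1 = vulgar

logits = backbone(images)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```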

 

 

The animal image above, which first appeared in a German magazine in 1892, has been confusing people ever since: some see only a rabbit, others only a duck. Someone fed this picture into Google's image recognition tool, and the machine judged it 78% likely to be a bird and 68% likely to be a duck.

 

Max Woolf, a data scientist who worked at BuzzFeed, then designed a more elaborate experiment: he simply rotated the picture and watched how the machine's judgment changed. At first, with the duck's bill pointing toward 9 o'clock, Google AI thought it was a duck. As the bill rotated up toward 10 o'clock, Google AI soon decided the picture contained a rabbit, and it remained a rabbit until the bill passed the 2 o'clock direction. For a while after that, Google AI thought it was neither a duck nor a rabbit. Only when the bill reached the 7 o'clock direction did Google AI again decide it was definitely a duck.

 

Some people think this may be because humans bring prior spatial knowledge when judging an object, so a model trained on data labeled by humans unknowingly absorbs considerations of space and orientation as well. Moreover, it is not only rotation that confuses the machine; sometimes even different image sizes lead it to different judgments.

 

Optimizing deep learning models

 

For image content review, the difficulty lies in three areas: imbalanced data, large intra-class variance, and the impossibility of exhaustive enumeration. Offensive images make up only a small share of the dataset, which often leads to poor training of deep learning models. In addition, vulgar images come in rich and complex varieties, and the features that make a picture vulgar differ widely.
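One common remedy for this kind of imbalance (an illustration of the general technique, not Toutiao's documented method) is to weight the loss inversely to class frequency, for example:

```python
# Sketch of inverse-frequency loss weighting for class imbalance (assumed counts).
import torch
import torch.nn as nn

counts = torch.tensor([98_000.0, 2_000.0])       # hypothetical: normal vs. offensive
weights = counts.sum() / (len(counts) * counts)  # rare class gets a larger weight
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(16, 2, requires_grad=True)  # dummy model outputs
labels = torch.randint(0, 2, (16,))
loss = criterion(logits, labels)                 # errors on the rare class cost more
loss.backward()
```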

 

In this regard, "the spirit of the dog" solution is to optimize the use of deep learning. "We are in the data model and calculation, and so do a lot of optimization," Wang Zhanghu Road. "On the data plane, the Spirit dog has accumulated tens of millions-level training set, while at the model level, the spirit dog for a number of difficult samples made model structure tuning, try to solve the multi-size, multi-scale, complex small goals problems. on the level of computing power, the spirit of dog training using a distributed algorithm and GPU cluster training, accelerated training and commissioning model. "

 

To handle the different aspect ratios of user-uploaded pictures, Toutiao designed a "multi-bucket model" in its image recognition algorithm, so that pictures of various aspect ratios can all be recognized well. At prediction time, the algorithm finds the "bucket" closest to the incoming picture's aspect ratio and produces the corresponding prediction. Because the bucket models for different aspect ratios share parameters, prediction time stays close to that of a single model; and because each picture goes through the model that matches its aspect ratio, accuracy can be further improved.
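A toy sketch of the bucketing step (the bucket ratios and input sizes below are made-up examples): pick the predefined bucket whose aspect ratio is closest to the image's, then resize to that bucket's input size before running the shared model.

```python
# Aspect-ratio bucketing sketch (bucket values are assumptions, not Toutiao's).
BUCKETS = {0.5: (320, 640), 0.75: (360, 480), 1.0: (448, 448),
           1.33: (480, 360), 2.0: (640, 320)}   # ratio -> (width, height)

def pick_bucket(width: int, height: int):
    """Return the bucket ratio closest to the image's width/height ratio."""
    ratio = width / height
    best = min(BUCKETS, key=lambda r: abs(r - ratio))
    return best, BUCKETS[best]

print(pick_bucket(1080, 1920))  # portrait photo -> (0.5, (320, 640))
```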

 

In people-centric scenarios, to deal with the fact that the area a person occupies in a picture varies greatly, the engineers introduced a feature pyramid structure, which improves the model's ability to extract consistent features from objects at different scales. A conventional network convolves the image several times into a feature map and then passes it through fully connected layers to obtain the classification result, but this approach has a drawback: if the proportion of the picture occupied by people differs greatly between the test set and the training set, accuracy drops. The feature pyramid structure feeds high-level semantic information back into the network, fuses low-level and high-level features, and makes a prediction at each level, exploiting the high resolution of the low-level features and the strong semantics of the high-level features at the same time.
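As a minimal illustration of a feature pyramid, here is a sketch built on torchvision's FeaturePyramidNetwork block; the channel sizes and dummy feature maps are assumptions, not the Spirit Dog network:

```python
# Feature pyramid (FPN) sketch with torchvision; inputs are dummy feature maps.
from collections import OrderedDict
import torch
from torchvision.ops import FeaturePyramidNetwork

fpn = FeaturePyramidNetwork(in_channels_list=[256, 512, 1024], out_channels=256)

# Pretend these came from three stages of a CNN backbone:
# high-resolution / low-level first, low-resolution / high-level last.
feats = OrderedDict([
    ("c3", torch.randn(1, 256, 56, 56)),
    ("c4", torch.randn(1, 512, 28, 28)),
    ("c5", torch.randn(1, 1024, 14, 14)),
])

pyramid = fpn(feats)  # top-down pathway fuses high-level semantics into each level
for name, f in pyramid.items():
    print(name, tuple(f.shape))  # every level now has 256 channels
```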

 

 

To address the challenge of small regions in a picture, Toutiao also designed a segmentation-assisted classification network. The network incorporates the feature pyramid structure, and training is split into two parts: the segmentation part's predicted region map is compared with the labeled region to compute a region loss, while the classification part's feature map is fed into the classifier and compared with the class labels to compute a classification loss. At prediction time, the region map predicted by the segmentation part is superimposed on the feature map output by the feature pyramid, and the result is fed into the classifier to obtain the classification result.
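A heavily simplified sketch of that idea (the architecture details here are assumptions): a segmentation head predicts a region mask trained with its own loss, and the mask is superimposed on the features before classification.

```python
# Toy segmentation-assisted classifier (illustrative, not Toutiao's network).
import torch
import torch.nn as nn

class SegAssistedClassifier(nn.Module):
    def __init__(self, in_ch=256, num_classes=2):
        super().__init__()
        self.seg_head = nn.Conv2d(in_ch, 1, kernel_size=1)   # region-mask logits
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.cls_head = nn.Linear(in_ch, num_classes)

    def forward(self, feats):                      # feats: (B, C, H, W) from an FPN level
        mask = torch.sigmoid(self.seg_head(feats))  # predicted region map
        weighted = feats * mask                     # superimpose the mask on the features
        logits = self.cls_head(self.pool(weighted).flatten(1))
        return logits, mask

model = SegAssistedClassifier()
logits, mask = model(torch.randn(2, 256, 28, 28))
# Training would combine a region loss on `mask` (vs. region labels) with a
# classification loss on `logits` (vs. class labels).
```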

 

Even with optimized algorithms, some problems that are hard for technology at this stage still depend on human judgment. World-famous paintings often contain nude figures; left entirely to the machine, it would identify the exposed skin of the figures and conclude that the picture is pornographic or vulgar. Likewise, some photos of ballet may, from the machine's perspective, look much like upskirt shots.

 

Wang Changhu believes that, given the complexity of judging vulgarity and the limitations of the different ways of judging it, the technical models need to keep evolving on the one hand, and technology needs to be effectively combined with human judgment on the other.

 

"Our model is still evolving, in addition to hundreds of model systems vulgar anti-dog spirit, as well as pornographic, vulgar, heading the party, false information, such as low-quality," Wang Changhu expressed. "Since its establishment in 2012, headlines today has established nearly ten thousand people a professional audit team to ensure the safety of the content."

 

Artificial intelligence can help us greatly improve the efficiency and accuracy of review, but at this stage, and for a long time to come, it still cannot completely replace all human judgment, because machines find it hard to understand the meaning behind content, to switch freely between different cultural contexts, or to keep up with ever-changing standards and criteria. For now, combining machine and human review looks like the most reasonable and most common approach to content moderation.
