"Big Data Application Scenario" Lao Wang Next Door (Part 2)

Hello, friends! Our good neighbor Lao Wang is back for the second time this week! Last time I told you how Lao Wang found a suitable partner with the help of our company's crawler, and business has been booming ever since. But Lao Wang is not satisfied with the status quo. When he heard that Coca-Cola had used big data analysis to come up with cherry-flavored cola and launch it all over the world, he was overjoyed. His spicy strip (latiao) factory wants to launch new flavors too, so please help him analyze which ones.

Step 1: Open the crawler

First, the editor uses Weibo search to collect all the posts about meals, snacks, street snacks, and instant noodles. The collected text contains some irrelevant junk, such as advertisements.

Step 2: Feed the junk information into the classifier

What Lao Wang needs, however, is only what the public says about the flavors it likes in these four types of food. Junk posts do not help and only increase the load on the system, so this kind of text has to be given a junk flag. How do we decide whether a text is junk? The editor enters the characteristic words of junk text into the classifier. If a post contains them, the classifier marks it as junk.

Through the steps above, Lao Wang, with the editor's help, has used the classifier to screen out all the flavor comments about meals, snacks, street snacks, and instant noodles.
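The junk-flagging idea in this step can be sketched in a few lines of Python. This is only a minimal illustration of keyword matching, not the company's classifier: the feature words, the function name `flag_junk`, and the sample posts are all made up for the example.

```python
# Hypothetical feature words; the article does not list the ones actually used.
JUNK_FEATURE_WORDS = {"discount", "coupon", "click the link", "free shipping"}


def flag_junk(text, feature_words=JUNK_FEATURE_WORDS):
    """Return True if the post contains any junk feature word (case-insensitive)."""
    lowered = text.lower()
    return any(word in lowered for word in feature_words)


# Made-up sample posts for illustration.
posts = [
    "Braised beef instant noodles are my favorite late-night food",
    "Big DISCOUNT on snacks today, click the link to buy",
    "Pickled cabbage noodles are too sour for me",
]

# Keep only the posts that did not receive the junk flag.
valid_posts = [post for post in posts if not flag_junk(post)]
```

A plain word list like this is easy to maintain but misses junk that avoids the listed words, which is one reason a trained classifier is used in the later steps.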

Step 3: Feed the valid information into the classifier

Next, the editor starts to classify the information. That is, the Front Sniff classifier automatically labels each collected post with the category it belongs to. The categories are defined by the types we provide, and the classifiers are trained on large-scale labeled data using machine learning. For each input post, N trained classifiers each decide whether the post belongs to their own category, for example braised beef flavor or Laotan pickled cabbage flavor among instant noodles. If the post belongs to a category, it is given the corresponding label.

Through the steps above, Lao Wang, with the editor's help, has used the classifier to sort all the valid information accurately by flavor across meals, snacks, street snacks, and instant noodles.
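The "N trained classifiers, one per category" idea is usually called one-vs-rest classification. Below is a minimal sketch of it, assuming a tiny word-count Naive Bayes model as the binary classifier. The article does not say which learning algorithm is actually used, and the class names, function names, and training sentences here are invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class BinaryNaiveBayes:
    """Answers one question: does a text belong to this category or not?"""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocab = set()

    def fit(self, texts, is_member):
        for text, member in zip(texts, is_member):
            tokens = tokenize(text)
            self.word_counts[member].update(tokens)
            self.doc_counts[member] += 1
            self.vocab.update(tokens)
        return self

    def _log_score(self, tokens, member):
        # Laplace-smoothed log prior plus log likelihood of the known words.
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[member] + 1) / (total_docs + 2))
        total_words = sum(self.word_counts[member].values())
        for tok in tokens:
            if tok in self.vocab:
                count = self.word_counts[member][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


def train_one_vs_rest(texts, labels):
    """Train one binary classifier per category (N categories, N classifiers)."""
    return {
        category: BinaryNaiveBayes().fit(texts, [lab == category for lab in labels])
        for category in sorted(set(labels))
    }


def label_text(classifiers, text):
    """Return every category whose classifier accepts the text."""
    return [cat for cat, clf in classifiers.items() if clf.predict(text)]


# Made-up labeled data; a real system would need far more examples.
train_texts = [
    "braised beef noodles taste rich",
    "love the braised beef broth",
    "pickled cabbage noodles are sour and good",
    "laotan pickled cabbage flavor again",
    "spicy chicken noodles burn nicely",
    "spicy chicken flavor is hot",
]
train_labels = [
    "braised beef", "braised beef",
    "pickled cabbage", "pickled cabbage",
    "spicy chicken", "spicy chicken",
]

classifiers = train_one_vs_rest(train_texts, train_labels)
labels_for_post = label_text(classifiers, "braised beef tonight")
```

Because each category has its own yes/no classifier, a post can receive several labels or none at all, which suits posts that mention more than one flavor.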

Step 4: Judge want to eat / don't want to eat

The last and most important step is sentiment polarity analysis, also known as text orientation analysis. Here it means judging whether a Weibo comment expresses wanting to eat, not wanting to eat, or could go either way. Judging the sentiment polarity of a user's post involves two processes: training the sentiment polarity model, and then discriminating the polarity of new posts. First, posts already marked with a sentiment polarity are fed into the classifier. Three types of text are required: "want to eat" texts, "don't want to eat" texts, and "either way" texts. Once the sentiment model has been trained, the classifier can judge the polarity of a post and output its final label: want to eat, don't want to eat, or either way.
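The two processes described here, training on labeled posts and then judging new posts, can be sketched as two small functions. This is a minimal illustration that assumes a three-class Naive Bayes model over word counts. The article does not name the model it uses, and the function names, the "either way" label string, and the sample sentences are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

POLARITIES = ("want to eat", "don't want to eat", "either way")


def train_polarity_model(labeled_texts):
    """Process 1: build word statistics from (text, polarity) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, polarity in labeled_texts:
        tokens = text.lower().split()
        word_counts[polarity].update(tokens)
        doc_counts[polarity] += 1
        vocab.update(tokens)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def predict_polarity(model, text):
    """Process 2: return the polarity with the highest smoothed log score."""
    tokens = [tok for tok in text.lower().split() if tok in model["vocab"]]
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, float("-inf")
    for polarity in POLARITIES:
        counts = model["word_counts"][polarity]
        total_words = sum(counts.values())
        # Laplace smoothing keeps unseen words from zeroing out a class.
        score = math.log(
            (model["doc_counts"][polarity] + 1) / (total_docs + len(POLARITIES))
        )
        for tok in tokens:
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = polarity, score
    return best_label


# Made-up labeled posts, two per polarity.
labeled = [
    ("so hungry i want this now", "want to eat"),
    ("craving spicy noodles tonight", "want to eat"),
    ("too greasy never again", "don't want to eat"),
    ("smells awful no thanks", "don't want to eat"),
    ("it is okay i guess", "either way"),
    ("fine either way whatever", "either way"),
]

model = train_polarity_model(labeled)
polarity = predict_polarity(model, "craving this tonight")
```

With only six training sentences the model is a toy. Real Weibo posts would also need Chinese word segmentation before counting words, which this sketch leaves out.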

Step 5: Close the classifier

The results of the classifier are linked to the ForeAna data analysis engine, and a visual chart is drawn automatically.
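How ForeAna consumes the results is not described here, so the following is not its API. It is only a rough sketch of the kind of aggregation a chart of this sort rests on: counting the polarity labels per flavor and ranking flavors by the share of "want to eat" posts. The function names, label strings, and sample results are assumptions.

```python
from collections import Counter


def tally(results):
    """Count polarity labels per flavor from (flavor, polarity) pairs."""
    table = {}
    for flavor, polarity in results:
        table.setdefault(flavor, Counter())[polarity] += 1
    return table


def want_ratio(table):
    """Share of posts per flavor that were labeled 'want to eat'."""
    return {
        flavor: counts["want to eat"] / sum(counts.values())
        for flavor, counts in table.items()
    }


def text_bars(ratios, width=20):
    """Render a crude text bar chart, highest ratio first."""
    ranked = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
    return [
        f"{flavor:<16} {'#' * round(ratio * width)} {ratio:.0%}"
        for flavor, ratio in ranked
    ]


# Made-up classifier output for illustration.
results = [
    ("braised beef", "want to eat"),
    ("braised beef", "want to eat"),
    ("braised beef", "either way"),
    ("pickled cabbage", "want to eat"),
    ("pickled cabbage", "don't want to eat"),
]

ratios = want_ratio(tally(results))
for line in text_bars(ratios):
    print(line)
```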

Lao Wang ran to the factory with the results in high spirits. The workers rejected them and threatened to go on strike: if he insisted on producing spicy strips in these flavors, they would leave for the Long spicy strip factory...

