Graduation Project Journey 00

Graduation project topic: public opinion monitoring.

After slacking off for a month, I have finally started on my graduation project. I have been reading papers recently and trying to work out how to begin. Getting started is the hard part: I simply did not know where to start, so I kept putting it off.

First, a look at the main requirements: the first step is definitely the crawler. A system like this will eventually store a large volume of data, and an ordinary database's read and write speed will probably not keep up, but for now I only plan to run a small experiment on a small portion of the data to see whether the approach works at all. The crawler has to cover more than one site. For now I plan to crawl two sites, each with a different page layout, so multi-threaded processing is needed. That is all I have thought of so far: the crawled pages need denoising and data cleaning so that I end up with only the part I want. Extracting that part is the main task.
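As a rough sketch of the crawling and cleaning step described above, the Python below fetches several pages in a thread pool, strips the markup, and keeps the visible text. Everything in it is an assumption made for illustration: the names `clean`, `fetch`, and `crawl` are hypothetical, pages are assumed to be UTF-8, dropping pages with identical text is only one possible reading of "denoising", and the generic text extractor stands in for the per-site rules that two differently laid-out sites would really need.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def clean(html):
    """Strip markup and collapse runs of whitespace into single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def fetch(url):
    """Download one page (assumption: the page is UTF-8 encoded)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(urls, fetch_page=fetch, workers=4):
    """Fetch pages concurrently, clean them, and drop exact duplicate texts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = list(pool.map(fetch_page, urls))  # results keep the order of urls
    seen, results = set(), []
    for url, html in zip(urls, pages):
        text = clean(html)
        if text and text not in seen:  # assumption: identical text counts as noise
            seen.add(text)
            results.append((url, text))
    return results
```

Passing the download function in as `fetch_page` keeps the sketch testable on a handful of saved pages before pointing it at real sites, which fits the plan of experimenting on a small portion of the data first.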

Next, the data will go through word segmentation; keyword clustering and classification are put off until later. So that is the plan for this week. Because it may involve algorithms I have had little contact with before, this will not be as simple as the crawlers I have written in the past. I will try to grind through the research this week and wrap up on Sunday to see how the data turns out.
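The segmentation step could start from something as small as the sketch below, which splits cleaned text into tokens and counts the most frequent ones as candidate keywords. It is only an illustration under stated assumptions: `tokenize`, `top_keywords`, and the `STOPWORDS` set are hypothetical names, and the regular expression only handles space-delimited text. Chinese text would need a dedicated word segmenter in place of `tokenize`.

```python
import re
from collections import Counter

# Illustrative stopword list (assumption); a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens, minus stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def top_keywords(docs, n=3):
    """Count tokens over all documents and return the n most frequent ones."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts.most_common(n)
```

Raw frequency is the simplest possible keyword score; the clustering and classification planned for later would replace or build on it.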

This is the first step of my graduation project. Come on!!! (So sad!!!)

Origin www.cnblogs.com/mm20/p/11305519.html