A brief account of developing a corpus analysis tool for Shanghai International Studies University

Overview

By chance, I ended up building a small corpus analysis tool to help students in the Chinese department export data in bulk and quickly compute statistics such as the proportion of each sentence pattern. The data comes from the BCC corpus of Beijing Language and Culture University (BLCU). The corpus has since been restricted to on-campus use at BLCU and is no longer accessible from outside.

Crawler

I wrote a small crawler and deployed it on my own server, where it ran day and night and saved tens of thousands of records into the server's own database.

  • Crawler V1.0: simple page-by-page crawling, storing each batch of data in the database as soon as it was fetched. After deployment I found that hitting the BLCU corpus too quickly easily caused the site to return HTTP 500 errors, so the crawler stayed down for long stretches and re-crawled the same data.
  • Crawler V1.1: added a timer to reduce the request frequency, and added checkpointing so that a task restarts from its breakpoint after a crash. After deployment I calculated that crawling 100,000 records would take 20 hours, which was too long.
  • Crawler V2.0: ran two crawler processes, each with two threads, and sent an email reminder when the crawl finished or the crawler crashed. After deployment the crawl time dropped from 20 hours to 5 hours, so the crawl was finished by the time I woke up.
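
The three ideas behind these versions (throttled paging, resuming from a checkpoint after a crash, and splitting the pages across workers) can be sketched roughly as follows. This is a minimal illustration in Python, not the original crawler: the post does not say which language it was written in, and the names `crawl_range`, `crawl_parallel`, `fetch_page`, and `save_rows`, as well as the JSON checkpoint file, are all hypothetical. For brevity the sketch uses four threads in a single process instead of two processes with two threads each, and it leaves out the email reminder.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def crawl_range(fetch_page, save_rows, first, last, checkpoint, delay=1.0):
    """Crawl pages first..last (inclusive), resuming from the checkpoint file."""
    path = Path(checkpoint)
    # Resume at the page after the last one that was fully saved, if any.
    page = json.loads(path.read_text())["next_page"] if path.exists() else first
    while page <= last:
        rows = fetch_page(page)   # may raise, e.g. on an HTTP 500 response
        save_rows(page, rows)     # hypothetical hook that writes to the database
        page += 1
        # Record progress only after the page has been saved.
        path.write_text(json.dumps({"next_page": page}))
        time.sleep(delay)         # throttle requests to spare the remote site
    return page


def crawl_parallel(fetch_page, save_rows, total_pages,
                   workers=4, delay=1.0, checkpoint_dir="."):
    """Split pages 1..total_pages into contiguous chunks, one per thread."""
    chunk = -(-total_pages // workers)  # ceiling division
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for i in range(workers):
            first = i * chunk + 1
            last = min((i + 1) * chunk, total_pages)
            if first > last:
                continue
            # Each worker keeps its own checkpoint file.
            ckpt = Path(checkpoint_dir) / f"worker_{i}.json"
            futures.append(pool.submit(
                crawl_range, fetch_page, save_rows, first, last, ckpt, delay))
        for future in futures:
            future.result()  # re-raise any error from a worker
```

Because the checkpoint is written only after a page has been saved, restarting after a crash re-fetches at most the page that was in progress rather than starting over from page 1.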

Alibaba ICE (Flying Ice)

This was my first time using the Alibaba ICE component library, and I found it useful. Writing the code went smoothly, and building, packaging, and compiling all produced good results. The scaffolding is solid and well suited to agile development, and the UI component library also looks good.

Page screenshot

Origin www.cnblogs.com/bbman/p/12072303.html