https://github.com/codelucas/newspaper
https://github.com/joelYing/NewsSpider
https://github.com/chrislinan/cx-extractor-python
https://blog.csdn.net/qq_34202873/article/details/78452449
https://cuiqingcai.com/7436.html
https://blog.csdn.net/tiandd12/article/details/72898316
https://www.92wenzhai.com/m/view.php?aid=14387
https://www.leiphone.com/news/201810/D9pffRYO2t2sTUBX.html
https://www.yuanrenxue.com/crawler/news-crawler-content-extract.html
http://forthxu.com/blog/article/73.html
DBSCAN clustering algorithm:
https://blog.csdn.net/u014688145/article/details/53388649
https://zhuanlan.zhihu.com/p/23504573
https://www.cnblogs.com/pinard/p/6208966.html
Gensim