Python Crawl Portal Forum Comments

Python crawling Sina Weibo comments

  • Environment: Python 3 + Windows.

  • Development tools: Anaconda + Jupyter / VS Code.

  • Learning outcomes:

  1. Getting to Know the Crawler/Robots Protocol

  2. Learn about browser developer tools

  3. Handling of dynamically loaded pages

  4. Data collection of mobile client pages

The robots.txt protocol

The Robots protocol, also known as the Crawler protocol.

Through the Robots protocol, a website tells search engines which pages may be crawled and which may not. Robots is a convention, not a command. The robots.txt file is a plain-text file placed in the root directory of a website and can be created and edited with any common text editor. robots.txt is the first file a search engine checks when visiting a site; its main job is to tell the spider program which files on the server may be viewed.

The Robots protocol is a widely observed code of ethics in the international Internet community; compliance is by convention, not enforcement.
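Before crawling, it is good practice to check robots.txt programmatically. A minimal sketch using the standard library's `urllib.robotparser` (the rules here are a made-up example, supplied inline; against a real site you would call `set_url(".../robots.txt")` and `read()` instead):

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt rules (invented for illustration).
robots_txt = """\
User-agent: *
Disallow: /private/
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("*", "https://example.com/page"))       # True
print(rp.can_fetch("*", "https://example.com/private/x"))  # False
```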

Python crawling Sina Weibo comments

Python code

  • Import modules

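The original post shows the import statements only as a screenshot, so the exact list is unknown; as a hedged guess, a Weibo comment crawler of this kind typically needs something like:

```python
import json              # decode the JSON comment payloads
import re                # pull fields out of messy HTML/JS fragments
import time              # sleep between requests to stay polite
from urllib.request import Request, urlopen  # stdlib HTTP client
```

Third-party alternatives such as `requests` are also common for this step.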

  • Anti-crawling measures

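The anti-crawling code is likewise shown only as screenshots. A minimal sketch of the usual countermeasure, assuming Weibo's mobile comment endpoint (the URL and its parameters below are assumptions, not taken from the post): send a browser-like `User-Agent`, and for logged-in pages also a `Cookie` copied from your own browser session via the developer tools.

```python
from urllib.request import Request

# Browser-like headers so the server does not reject us as a bot.
headers = {
    "User-Agent": ("Mozilla/5.0 (iPhone; CPU iPhone OS 13_2 like Mac OS X) "
                   "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0 "
                   "Mobile/15E148 Safari/604.1"),
    "Referer": "https://m.weibo.cn/",
    # "Cookie": "SUB=...; copy from your browser's developer tools",
}

# Hypothetical comment-API URL; the real query parameters come from the
# Network tab of the browser developer tools.
url = "https://m.weibo.cn/comments/hotflow?id=123&mid=123&max_id_type=0"
req = Request(url, headers=headers)

print(req.get_header("User-agent")[:11])  # Mozilla/5.0
```

Passing `req` to `urlopen` would then fetch the page with those headers attached.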

Python development direction

  • Data Analysis/Data Mining

    Association analysis (the classic "beer and diapers" example), clustering, discriminant analysis, random forests.
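The "beer and diapers" idea can be sketched as a simple co-occurrence count over shopping baskets (toy data invented for illustration; a real association-rule miner such as Apriori would also compute confidence and lift):

```python
from collections import Counter
from itertools import combinations

# Toy transactions (made up for illustration).
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "cola"},
]

# Count how often each item pair appears together (the support count).
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(1))  # [(('beer', 'diapers'), 2)]
```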

  • Artificial intelligence

    Intelligent machines that respond in ways similar to human intelligence. Research in this field includes robotics, speech recognition, image recognition, natural language processing, and more, for example AlphaGo and AlphaGo Zero.

  • Python operation and maintenance

    Ops engineers who cannot write code will eventually be phased out!

  • Web development

    Building websites, such as Douban. Focus on hands-on practice!

  • Python crawler

    Collecting web data to support data analysis and big-data work, as search engines such as Google and Baidu do. Focus on hands-on practice!
