Use Crawler to crawl Tencent News and save it into the database - Code World

Others 2022-04-23 03:56:56

System environment: Windows 7

Task requirements: collect the news URLs, crawl the news content behind each URL, and store the results in a database

Crawler project page (software address): http://www.oschina.net/p/Crawler

1. First we need a list of URLs. With the list in hand, we can dig into the content of each news article

Use the cl command to collect the links to crawl:

C:\Users\ssHss\Desktop\Jar包\ImageTemp>java -jar Crawler1.0.3.jar -cl http://news.qq.com/ -cq "div[class=Q-tpWrap]"

-cl http://news.qq.com/ is the page the links are collected from

-cq "div[class=Q-tpWrap]" is the selector for the markup to match, for example <div class="Q-tpWrap" style="xxsxxs:da;dadsad;sad;"><a href="x">x</a></div>

The extraction rule is the parameter that follows -cq
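To make the selector concrete, here is a minimal sketch of what a rule like div[class=Q-tpWrap] selects: the links inside every div whose class attribute is exactly Q-tpWrap. It uses only Python's standard library and is not how the Crawler jar is implemented; the names DivLinkCollector and collect_links and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class DivLinkCollector(HTMLParser):
    """Collect href values of <a> tags nested inside <div class="..."> blocks."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0   # > 0 while inside a matching div; counts nested divs
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1            # nested div inside a match
            elif attrs.get("class") == self.css_class:
                self.depth = 1             # entering a matching div
        elif tag == "a" and self.depth > 0 and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1


def collect_links(html, css_class="Q-tpWrap"):
    """Return the hrefs found inside divs whose class equals css_class."""
    parser = DivLinkCollector(css_class)
    parser.feed(html)
    return parser.links


# Hypothetical sample markup, modeled on the snippet above.
sample = (
    '<div class="Q-tpWrap"><a href="http://news.qq.com/a/example.htm">x</a></div>'
    '<div class="other"><a href="http://example.com/">y</a></div>'
)
print(collect_links(sample))
```

Only the link inside the Q-tpWrap div is returned; the link in the other div is ignored.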

Oh my god, why does the crawl return so many different kinds of links? ヾ(｡｀Д´｡) OK, we add the -format parameter, which filters by a URL feature, haha. It is safer to write the command this way.

Looking at the crawl results, we find that news.qq.com/a/ is a URL feature shared by all the news articles

We add the feature option -format "news.qq.com/a/"
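The filtering idea can be sketched in a few lines. This assumes the feature is matched as a plain substring, which is a guess about how -format behaves; the function name filter_by_feature and the sample URLs are hypothetical, and dropping duplicates is an extra convenience rather than something the tool is known to do.

```python
def filter_by_feature(urls, feature="news.qq.com/a/"):
    """Keep URLs containing the feature string, in order, without duplicates."""
    seen = set()
    kept = []
    for url in urls:
        if feature in url and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept


# Hypothetical crawl result mixing article links with other links.
crawled = [
    "http://news.qq.com/a/example1.htm",
    "http://sports.qq.com/",
    "http://news.qq.com/a/example1.htm",
    "http://news.qq.com/a/example2.htm",
]
print(filter_by_feature(crawled))
```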

Add a file: we write the generated URL list to a local path - input localpath

That completes the first step: collecting the URLs

2. Crawl the content in depth using the ci command

Load the local URL list file and crawl each URL in it

Checking the output, I found I had made a mistake in the neirong (content) extraction rule, so I changed it to div[class=db].
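The hand-off between the two steps is just a file of URLs. A minimal sketch of writing and reloading such a file follows, assuming one URL per line in UTF-8; the format Crawler actually uses is not stated here, and write_url_list and read_url_list are hypothetical helper names.

```python
from pathlib import Path


def write_url_list(urls, path):
    """Write one URL per line to a local text file."""
    Path(path).write_text("\n".join(urls) + "\n", encoding="utf-8")


def read_url_list(path):
    """Read the file back, skipping blank lines and surrounding whitespace."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

Each URL read back this way would then be fetched and passed through the content extraction rule.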

3. Import the results into the database
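The article does not say which database is used, so the following is only a sketch of the storage step using SQLite from Python's standard library. The table name news, its columns, and the function save_articles are assumptions made for illustration.

```python
import sqlite3


def save_articles(conn, articles):
    """Store (url, content) pairs; a URL that already exists is overwritten."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news (url TEXT PRIMARY KEY, content TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO news (url, content) VALUES (?, ?)", articles
    )
    conn.commit()


# In-memory database and a hypothetical article, just to show the call.
conn = sqlite3.connect(":memory:")
save_articles(conn, [("http://news.qq.com/a/example.htm", "sample text")])
print(conn.execute("SELECT COUNT(*) FROM news").fetchone()[0])
```

Using the URL as the primary key means re-running the crawl updates existing rows instead of inserting duplicates.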

Finished.
