Winter Break Big Data Study Notes (10)

Today I started with a quick look at XPath. There are plenty of XPath introductions online, so I won't go into detail; they tend to bring in a lot of terminology such as nodes and attributes, so I will just describe how to use it in plain language. XPath lets you pick out the content you want from an HTML page by searching through its tags. For the examples below, first open the Tencent News home page, right-click and choose Inspect to open the developer tools, then press Ctrl+F to bring up the search box, which accepts XPath expressions.

XPath is used as follows:

/tag  starts the search at the root node (the very beginning of the document). On this page it can only find  /html ; nothing else can be found this way.

//tag  matches anywhere in the entire document. This is the form used most often: it finds every matching tag in the whole document.

In the screenshot, searching for  //div  finds 129 div elements in total.

[@attribute="value"]  (with class, id, and so on as the attribute) narrows the search by attribute. The brackets are written directly after the tag name.
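The three rules can be tried out in a few lines of Python. This is only a minimal sketch: the page fragment is invented for illustration, and it uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath. In that subset  //div  has to be written as  .//div , and the root step is shown by inspecting the root element directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment invented for illustration.
PAGE = """
<html>
  <body>
    <div id="header"><div class="logo">News</div></div>
    <div class="content">
      <p>First story</p>
      <div class="content">Nested block</div>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(PAGE)

# Rule 1: /html starts at the root, and the root element is the only match.
print(root.tag)

# Rule 2: //div matches anywhere in the document
# (spelled ".//div" in ElementTree's XPath subset).
all_divs = root.findall(".//div")
print(len(all_divs))

# Rule 3: //div[@class="content"] keeps only divs with that attribute value.
content_divs = root.findall(".//div[@class='content']")
print(len(content_divs))
```

The nested div is counted too, because  //  does not care how deep a tag sits in the document.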

These are the three basic rules, and with them you can write most XPath expressions. For example, suppose I want to get the links shown in the screenshot:

//a[@class="picture"]/@href  returns the corresponding links. It means: starting from anywhere in the document rather than from the root node, select every a tag whose class attribute is picture, and take its href attribute.

 

As for why there are three results: more than one link has the same tag and the same attribute value.
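Here is a small sketch of the same idea in Python. The markup and the example.com URLs are made up, not taken from the Tencent page. Because the standard library's ElementTree does not support the trailing  /@href  step, the attribute is read with .get("href") instead. In a Scrapy spider the full expression can be passed to response.xpath() as it is.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: three links share class="picture", one does not.
PAGE = """
<html><body>
  <a class="picture" href="https://example.com/a"><img src="a.jpg" /></a>
  <a class="picture" href="https://example.com/b"><img src="b.jpg" /></a>
  <a class="picture" href="https://example.com/c"><img src="c.jpg" /></a>
  <a class="text" href="https://example.com/d">Plain link</a>
</body></html>
""".strip()

root = ET.fromstring(PAGE)

# Equivalent of //a[@class="picture"]/@href:
# find the matching <a> tags anywhere, then read each href attribute.
picture_links = [a.get("href") for a in root.findall(".//a[@class='picture']")]
print(picture_links)
```

All three picture links come back in document order, and the link with class="text" is left out because the attribute value has to match exactly.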

Also, if you don't want to write the XPath yourself, hover over the tag you want in the developer tools, right-click, choose Copy, and then Copy XPath. In my opinion the XPath you get this way is not general enough to reuse, so it is better to write your own.

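I also kept studying the Scrapy framework. After reading a few simple tutorial examples I set out to write a spider of my own. My goal was to crawl the National Health Commission's epidemic data. I created the project, configured the settings, fetched the page source, and got back a big pile of JS... That's right: the page is loaded dynamically with JavaScript rather than being a static page. Unfortunately Scrapy by itself does not execute JavaScript, so it cannot get at data that a page loads dynamically, and I had to find a static page to practice on instead. I am now looking for a way to crawl dynamic pages with Scrapy. It seems that scrapy-splash can do it, and I am experimenting with it.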
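For reference, the settings.py additions that the scrapy-splash README describes look roughly like the sketch below. It assumes the scrapy-splash package is installed and that a Splash server is already running locally on port 8050; both are assumptions, and I have not verified this setup against the page I wanted to crawl.

```python
# settings.py fragment for scrapy-splash (a sketch, not a tested setup).

# Assumption: a Splash rendering server is listening on this address.
SPLASH_URL = "http://localhost:8050"

# Middlewares that route requests through Splash.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Saves space by not storing duplicate Splash arguments.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Duplicate filter that is aware of Splash request arguments.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

With these in place, the spider would yield scrapy_splash.SplashRequest objects instead of plain requests, so that the callback receives the HTML after the JavaScript has run.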


Origin www.cnblogs.com/YXSZ/p/12287347.html