Data analysis in python crawler --- detailed explanation of xpath expression

1. What is xpath

1. XPath (XML Path Language) is a language for finding information in HTML\XML documents, and can be used to traverse elements and attributes in HTML\XML documents.

2. Know xml

The difference between html and xml

insert image description here

xml tree structure

insert image description here

Three. The usage of xpath

XPath uses path expressions to select nodes or sets of nodes in an XML document. These path expressions are very similar to the expressions we see in regular computer file systems.

1 Useful expressions

insert image description here

2. In the table below, we have listed some path expressions and the results of the expressions:

insert image description here

3. Pick unknown nodes

insert image description here

4 some cases

insert image description here

Four. Summary

  1. An overview of xpath XPath (XML Path Language), a language for parsing, searching, and extracting information
  2. Node relationship of xpath: root node, child node, peer node
  3. Key syntax of xpath to get any node: //
  4. Key syntax of xpath to get nodes based on attributes: tag[@attribute='value']
  5. Get the text of the node in xpath: text()
  6. Get node attribute value of xpath: @attribute name

5. Expand knowledge

1. Escape characters

insert image description here

raw string

1 Since the backslash in the string has a special effect, when the string contains a backslash, you need to use the escape character \ to escape each "" contained in the string.

For example, we want to write a string about the Windows path G:\publish\codes\02\2.4. If we write this directly in the Python program, it will definitely not work. We need to use the \ escape character for each character in the string. "" to escape, that is, to write in the form of G:\publish\codes\02\2.4.

2. Raw string starts with "r", it will not treat backslash as special character.

Guess you like

Origin blog.csdn.net/m0_74459049/article/details/130305418