The difference between .text and get_text()
In summary:
1. Calling get_text() without arguments is equivalent to .text; there is no difference.
2. get_text() also accepts keyword arguments that change its behavior (separator, strip, types). Use get_text() when you need more control over the result.
3. When no arguments are needed, .text reads more cleanly.
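The points above can be checked with a minimal sketch. It assumes the bs4 package is installed and uses Python's built-in html.parser; the HTML snippet and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two paragraphs whose text has surrounding spaces.
html = "<div><p> first </p><p> second </p></div>"
div = BeautifulSoup(html, "html.parser").div

# .text is a property that calls get_text() with its default arguments.
plain = div.text
default = div.get_text()

# separator is placed between the individual strings.
joined = div.get_text(separator="|")

# strip=True strips whitespace from each string before joining.
stripped = div.get_text(separator="|", strip=True)

print(repr(plain))
print(repr(joined))
print(repr(stripped))
```

Here plain and default are the same string, while joined and stripped show how separator and strip change the result.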
The implementation in the official source code:
    @property
    def stripped_strings(self):
        for string in self._all_strings(True):
            yield string

    def get_text(self, separator=u"", strip=False,
                 types=(NavigableString, CData)):
        """
        Get all child strings, concatenated using the given separator.
        """
        return separator.join([s for s in self._all_strings(
            strip, types=types)])
    getText = get_text
    text = property(get_text)
Reference: https://blog.csdn.net/f156207495/article/details/78074240
The difference between .text and .string, shown by example:
1. <td>some text</td>
2. <td></td>
3. <td><p>more text</p></td>
4. <td>even <p>more text</p></td>
The result of .string is:
1. some text
2. None
3. more text
4. None
The result of .text is:
1. some text
2. (empty string)
3. more text
4. even more text
In summary:
1. Line 1: td has no child tag and contains text. Both return the same result, the text.
2. Line 2: td has no child tag and no text. .string returns None, while .text returns an empty string.
3. Line 3: td has exactly one child tag, and the text appears only inside that child tag. Both return the same result, the text inside the child tag.
4. Line 4 shows the most important difference: td has a child tag, and both td and its child p contain a piece of text. Here the two results differ.
.string returns None, because td has more than one child (the text "even " and the p tag) and .string cannot tell which one to return.
.text returns the two pieces of text concatenated.
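The four cases can be reproduced with a short sketch. It assumes bs4 is installed and parses each fragment with html.parser (other parsers may handle a td outside a table differently); the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# The four example fragments from the list above.
samples = [
    "<td>some text</td>",
    "<td></td>",
    "<td><p>more text</p></td>",
    "<td>even <p>more text</p></td>",
]

results = []
for html in samples:
    td = BeautifulSoup(html, "html.parser").td
    # .string is None unless the tag has exactly one child that leads to a single string.
    # .text always returns a str, possibly empty.
    results.append((td.string, td.text))

for number, (string_value, text_value) in enumerate(results, start=1):
    print(number, repr(string_value), repr(text_value))
```

Because .string may be None, code that calls string methods on it needs a None check, whereas .text can be used directly.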
Reposted from: https://zhuanlan.zhihu.com/p/30911642