BeautifulSoup: the difference between .text and get_text(), and between .text and .string

The difference between .text and get_text()

In summary:

1. Calling get_text() with no arguments is equivalent to .text; the two return the same string.

2. get_text() also accepts keyword arguments that change its behavior (separator, strip, types), so use it when you need more control over the result.

3. When the defaults are enough, .text reads more cleanly in code.
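The three points above can be checked with a short script. This is only a sketch: the HTML fragment and variable names are made up for illustration, and it assumes the bs4 package is installed and uses the built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Illustrative fragment; the whitespace inside the tags is deliberate.
html = "<p> Hello <b> world </b></p>"
p = BeautifulSoup(html, "html.parser").p

plain = p.text                                  # property access, no arguments possible
same = p.get_text()                             # same call with the default arguments
joined = p.get_text(separator="|")              # put a separator between the strings
stripped = p.get_text(separator="|", strip=True)  # also strip each string first

print(repr(plain))     # ' Hello  world '
print(repr(joined))    # ' Hello | world '
print(repr(stripped))  # 'Hello|world'
```

Because .text is a property, it cannot take arguments; as soon as you need separator or strip, you have to call get_text().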

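The relevant part of the official source code is shown below. The last line, text = property(get_text), is why .text and a bare get_text() call return the same result: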

    @property
    def stripped_strings(self):
        for string in self._all_strings(True):
            yield string
 
    def get_text(self, separator=u"", strip=False,
                 types=(NavigableString, CData)):
        """
        Get all child strings, concatenated using the given separator.
        """
        return separator.join([s for s in self._all_strings(
                    strip, types=types)])
    getText = get_text
    text = property(get_text)
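To see why text = property(get_text) makes the two equivalent, here is a minimal stand-alone sketch of the same pattern. The Node class is hypothetical and is not part of BeautifulSoup; it only imitates the shape of the code above.

```python
class Node:
    """Hypothetical stand-in for a parsed tag (not a BeautifulSoup class)."""

    def __init__(self, strings):
        self._strings = list(strings)

    def get_text(self, separator="", strip=False):
        # Optionally strip each string, dropping the ones that become empty.
        if strip:
            parts = [s.strip() for s in self._strings if s.strip()]
        else:
            parts = self._strings
        return separator.join(parts)

    # Reading node.text calls get_text(node) with all defaults.
    text = property(get_text)


node = Node([" even ", "more text"])
print(repr(node.text))                       # ' even more text'
print(repr(node.get_text("|", strip=True)))  # 'even|more text'
```

A property getter receives only self, so .text can never be anything other than get_text() with its default arguments.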

Reference: https://blog.csdn.net/f156207495/article/details/78074240

The difference between .text and .string

Consider these four cases:

1. <td>some text</td>
2. <td></td>
3. <td><p>more text</p></td>
4. <td>even <p>more text</p></td>

The result of .string is:

1. some text
2. None
3. more text
4. None

The result of .text is:

1. some text
2. (empty string)
3. more text
4. even more text

In summary:

1. Line 1: td has no child tags and contains text. Both return the same result, the text itself.

2. Line 2: td has no child tags and no text. .string returns None, while .text returns an empty string.

3. Line 3: td has exactly one child tag, and the text appears only inside that child tag. Both return the same result, the text inside the child tag.

4. Line 4 shows the key difference: td contains text of its own plus a child tag p that also contains text, and the two return very different results.

.string returns None, because td has more than one child and .string cannot tell which one to return.

.text returns the concatenation of both pieces of text.
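The four cases can be reproduced with the sketch below. It assumes bs4 is installed and uses the built-in "html.parser", which keeps a bare <td> fragment as written; other parsers may treat a <td> outside a table differently.

```python
from bs4 import BeautifulSoup

cases = [
    "<td>some text</td>",
    "<td></td>",
    "<td><p>more text</p></td>",
    "<td>even <p>more text</p></td>",
]

results = []
for html in cases:
    td = BeautifulSoup(html, "html.parser").td
    # .string is None unless td has exactly one child that resolves to a string;
    # .text always returns a str, joining every string found under td.
    results.append((td.string, td.text))
    print(f"{html}: .string={td.string!r}, .text={td.text!r}")
```

In practice this means code that uses .string should be ready to handle None, while .text can be used directly wherever a str is expected.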

Reposted from: https://zhuanlan.zhihu.com/p/30911642

Origin blog.csdn.net/bigcarp/article/details/128580324