Regular expression negative lookahead (negative lookaround): replacing UUIDs whose suffix is not .pdf, .png or another file extension

"Negative lookahead" and "negative lookahead assertion" name the same regular expression construct, written (?!...); "negative lookaround" is the broader term that also covers negative lookbehind, (?<!...). At the current position in the string, a negative lookahead requires that the text that follows does NOT match the pattern inside the parentheses. If it does match, the overall match fails at that position. The assertion consumes no characters of the input. Like the other lookaround forms, it is a zero-width assertion: it tests a condition on the surrounding text, but that text never becomes part of the final match. Lookaround is an advanced feature of most regular expression engines and is used to express constraints that a plain pattern cannot.
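
A minimal illustration with Python's standard re module; the strings "foobar" and "foobaz" are made up for the example. The first "foo" is rejected because "bar" follows it. The second "foo" is accepted, and the reported match is only "foo", because the lookahead inspects "baz" without consuming it.

```python
import re

# Match "foo" only when it is NOT immediately followed by "bar".
pattern = re.compile(r"foo(?!bar)")

text = "foobar foobaz"
# Collect (start index, matched text) for every match.
matches = [(m.start(), m.group()) for m in pattern.finditer(text)]
print(matches)  # [(7, 'foo')]
```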

import re

inner_div = re.sub(r'\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(?!\.(png|jpg|pdf|\w+))', '', inner_div)
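
A self-contained sketch that runs the line above on a hypothetical inner_div (the HTML fragment and the UUID value are invented for illustration):

```python
import re

# Hypothetical input: one bare UUID and one UUID used as a file name.
inner_div = (
    '<p>id 123e4567-e89b-12d3-a456-426614174000 '
    '<img src="123e4567-e89b-12d3-a456-426614174000.png"></p>'
)

# Same pattern as in the article: delete UUIDs not followed by ".extension".
inner_div = re.sub(
    r'\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(?!\.(png|jpg|pdf|\w+))',
    '',
    inner_div,
)
print(inner_div)
```

The bare UUID after "id" is deleted, while the one followed by .png survives because the negative lookahead rejects that position.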

This line uses a regular expression to find UUIDs (hexadecimal strings in the 8-4-4-4-12 layout) in inner_div and replace them with the empty string. Piece by piece:

\b is a word boundary. It makes the match start at the beginning of a "word", so the regex does not start matching in the middle of a longer run of letters, digits or underscores. There is no \b at the end, so if the last group is followed directly by more hex digits, its first 12 digits are still matched and removed with the rest of the UUID.
[0-9a-fA-F]{8} matches 8 hexadecimal digits, the first group of the UUID.
- matches a literal hyphen, the separator between the 5 groups of the UUID.
-[0-9a-fA-F]{4} (three times) matches the 2nd, 3rd and 4th groups, each a hyphen followed by 4 hexadecimal digits.
-[0-9a-fA-F]{12} matches the 5th and last group, a hyphen followed by 12 hexadecimal digits.
(?!\.(png|jpg|pdf|\w+)) is a negative lookahead: the match is rejected when the UUID is immediately followed by a dot and a file extension such as .png, .jpg or .pdf, so that file names (for example of images) are not damaged. Since \w+ already matches png, jpg and pdf, any extension made of letters, digits or underscores is excluded.
Finally, re.sub() replaces every matched UUID with an empty string, leaving inner_div without those UUIDs.
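
Because \w+ already matches png, jpg and pdf, the lookahead (?!\.(png|jpg|pdf|\w+)) behaves like the shorter (?!\.\w). The sketch below is a restatement for illustration, not the author's code: it rewrites the pattern with re.VERBOSE so each component sits on its own commented line, uses that simplified lookahead, and checks on a few invented sample strings that it gives the same result as the original pattern.

```python
import re

# The pattern exactly as written in the article.
ORIGINAL = re.compile(
    r'\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(?!\.(png|jpg|pdf|\w+))'
)

# Same pattern, one component per line. re.VERBOSE ignores the whitespace
# and "#" comments; the lookahead is reduced to (?!\.\w).
SIMPLIFIED = re.compile(
    r"""
    \b                # word boundary before the UUID
    [0-9a-fA-F]{8}    # group 1: 8 hex digits
    -[0-9a-fA-F]{4}   # group 2: hyphen + 4 hex digits
    -[0-9a-fA-F]{4}   # group 3: hyphen + 4 hex digits
    -[0-9a-fA-F]{4}   # group 4: hyphen + 4 hex digits
    -[0-9a-fA-F]{12}  # group 5: hyphen + 12 hex digits
    (?!\.\w)          # reject if followed by "." and a word character
    """,
    re.VERBOSE,
)

UID = "123e4567-e89b-12d3-a456-426614174000"
samples = [f"x {UID} y", f"{UID}.png", f"{UID}.docx", f"{UID}."]

for s in samples:
    # Both patterns should produce the same result on every sample.
    assert ORIGINAL.sub("", s) == SIMPLIFIED.sub("", s)
    print(repr(s), "->", repr(SIMPLIFIED.sub("", s)))
```

A trailing dot with no extension (the last sample) does not trigger the lookahead in either version, so that UUID is still removed.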

In other words, the code deletes from inner_div every string of the form xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx that is not followed by a file extension, where each x is one of 0-9, A-F or a-f. Strings of this form are the usual textual representation of a UUID (Universally Unique Identifier), also called a GUID (Globally Unique Identifier).
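
To see the 8-4-4-4-12 layout in practice, this sketch generates a few UUIDs with Python's standard uuid module and checks them against the UUID part of the pattern (without \b and the lookahead, and anchored with fullmatch()):

```python
import re
import uuid

# The 8-4-4-4-12 layout described above.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

# str() of a uuid.UUID gives 32 lowercase hex digits split by hyphens.
generated = [str(uuid.uuid4()) for _ in range(5)]
for u in generated:
    print(u, bool(UUID_RE.fullmatch(u)))

# Upper-case hex digits are accepted too, thanks to the A-F range.
upper_ok = bool(UUID_RE.fullmatch(generated[0].upper()))
print(upper_ok)
```

The pattern only checks the layout and the hex digits; it does not validate the UUID version or variant bits, so a string such as 00000000-0000-0000-0000-000000000000 matches as well.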

The regular expression consists of the following elements:

\b: a word boundary. Here it appears only at the start, so the character before the UUID must not be a letter, digit or underscore (or the UUID must start the string).
[0-9a-fA-F]: matches exactly one hexadecimal digit, i.e. one of 0-9, A-F or a-f.
{8}: the preceding element is repeated exactly 8 times; {4} and {12} work the same way.
-: matches a literal "-".
(?!\.(png|jpg|pdf|\w+)): a negative lookahead. The text right after the UUID must not be a dot followed by png, jpg, pdf or any other run of letters, digits and underscores.
Taken together, the regular expression finds every string in inner_div that has the UUID format above and is not followed by a dot plus a file extension (.png, .jpg, .pdf or any other extension made of letters, digits and underscores), and replaces it with an empty string, which deletes it.
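
The elements above can be checked one at a time. This sketch runs the article's pattern against a few invented inputs, each chosen to exercise one element; the expected results follow from the rules described above.

```python
import re

PATTERN = re.compile(
    r'\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(?!\.(png|jpg|pdf|\w+))'
)
UID = "123e4567-e89b-12d3-a456-426614174000"

# Hypothetical inputs mapped to whether the pattern should find a UUID.
cases = {
    f"/{UID}/": True,      # "/" on both sides: \b holds, lookahead passes
    f"x{UID}": False,      # "x" is a word character, so \b fails
    f"/{UID}.jpg": False,  # rejected by the negative lookahead
    f"/{UID}.txt": False,  # also rejected: \w+ matches any extension
    f"/{UID}ff": True,     # no \b at the end: the first 12 digits still match
}

for text, expected in cases.items():
    found = PATTERN.search(text) is not None
    print(repr(text), "match =", found)
    assert found == expected
```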

Source: blog.csdn.net/weixin_45934622/article/details/131004602