Python learning notes (2): regular expressions

Matching an IP address

Refer to http://blog.chinaunix.net/uid-108431-id-3350731.html

http://blog.csdn.net/liangyuannao/article/details/8755325

Step 1: analyze the address and write a first-pass pattern for each numeric range

  1. 0-9: matched by \d
  2. 10-99: matched by [1-9]\d
  3. 100-199: matched by 1\d\d
  4. 200-249: matched by 2[0-4]\d
  5. 250-255: matched by 25[0-5]

Step 2: combine the pieces

  1. The basic expressions combine into a single octet pattern: \d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]
  2. The first three octets, each followed by a dot, become: ((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.){3}
  3. The first three octets plus the last octet become: ((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.){3}(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])
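A quick way to check the combined pattern is to test it against whole strings. This is only a minimal sketch: the helper names is_octet and is_ipv4 are made up for illustration, and re.fullmatch() is used so that the entire string has to match.

```python
import re

# Octet alternation from the list above: 0-9, 10-99, 100-199, 200-249, 250-255.
OCTET = r"(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])"
IPV4 = r"(" + OCTET + r"\.){3}" + OCTET

def is_octet(text):
    # fullmatch() forces the whole string to match, so "256" cannot pass as "25".
    return re.fullmatch(OCTET, text) is not None

def is_ipv4(text):
    return re.fullmatch(IPV4, text) is not None

print(is_octet("255"), is_octet("256"), is_octet("07"))
print(is_ipv4("192.168.1.100"), is_ipv4("192.168.1.256"))
```

Every number from 0 to 255 passes is_octet(), while values such as 256 or 07 are rejected.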

Step 3: merge further

The alternatives can be factored further into:

r = r'((([1-9]?|1\d)\d|2([0-4]\d|5[0-5]))\.){3}(([1-9]?|1\d)\d|2([0-4]\d|5[0-5]))'
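The merged pattern can be checked the same way as the longer one. A small sketch, assuming we only test complete strings; the helper matches_whole is a hypothetical name.

```python
import re

r = r'((([1-9]?|1\d)\d|2([0-4]\d|5[0-5]))\.){3}(([1-9]?|1\d)\d|2([0-4]\d|5[0-5]))'

def matches_whole(text):
    # True only if the entire string is a dotted quad with octets in 0-255.
    return re.fullmatch(r, text) is not None

for candidate in ["0.0.0.0", "10.20.30.40", "255.255.255.255", "256.1.1.1", "1.2.3"]:
    print(candidate, matches_whole(candidate))
```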

Testing shows that this pattern gives wrong results when the IP address is surrounded by other characters. For example, it can match part of a longer run of digits. It was therefore changed to:

r='((?:(?:(?:[1-9]?|1\d)\d|2(?:[0-4]\d|5[0-5]))\.){3}(?:[2-9]|(?:[1-9]|1\d)\d|2(?:[0-4]\d|5[0-4])))\D'

re.findall(r,s)
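Here is a small run of the revised pattern. The sample string s is made up for illustration. Because the pattern contains one capturing group, re.findall() returns just the address without the trailing non-digit character.

```python
import re

r = r'((?:(?:(?:[1-9]?|1\d)\d|2(?:[0-4]\d|5[0-5]))\.){3}(?:[2-9]|(?:[1-9]|1\d)\d|2(?:[0-4]\d|5[0-4])))\D'

# Made-up sample text; every address is followed by a non-digit character.
s = "hosts: 192.168.1.100, 10.0.0.25; gateway 172.16.254.3 end"

found = re.findall(r, s)
print(found)  # ['192.168.1.100', '10.0.0.25', '172.16.254.3']

# The trailing \D needs a character after the address, so an address at the
# very end of the string is not found.
at_end = re.findall(r, "ip 192.168.1.100")
print(at_end)  # []
```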

Note:

This regular expression has two flaws:

  1. 0.0.0.0 is not matched
  2. 255.255.255.255 is not matched

More generally, the last octet only accepts 2-254, and the trailing \D requires a non-digit character after the address. These cases need to be handled with additional code.
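One possible form of that additional code is sketched below. It is not the original author's solution but a different technique: a loose regular expression picks out candidates, and plain Python checks each octet. The names CANDIDATE and find_ipv4 are assumptions made for this example.

```python
import re

# Loose candidate pattern: four dot-separated runs of 1-3 digits that are not
# glued to further digits on either side.
CANDIDATE = re.compile(r'(?<!\d)\d{1,3}(?:\.\d{1,3}){3}(?!\d)')

def find_ipv4(text):
    result = []
    for candidate in CANDIDATE.findall(text):
        parts = candidate.split('.')
        # Reject values above 255 and leading zeros such as "01".
        if all(int(p) <= 255 and str(int(p)) == p for p in parts):
            result.append(candidate)
    return result

print(find_ipv4("a 0.0.0.0 b 255.255.255.255 c 300.1.1.1 d 10.0.0.1"))
# ['0.0.0.0', '255.255.255.255', '10.0.0.1']
```

Doing the range check in Python keeps the regular expression short and handles 0.0.0.0, 255.255.255.255, and addresses at the end of the string.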

Built-in functions

abs() returns the absolute value; max() and min() return the largest and smallest values

divmod(x, y) returns the tuple (quotient, remainder)

callable(obj) tests whether an object, such as a function, can be called

isinstance(x, y) tests whether x is an instance of type y

str.capitalize() capitalizes the first letter of the string and lowercases the rest

str.replace() replaces a fixed substring; for pattern-based (fuzzy) replacement, see sub()

string.replace(s, old, new) requires importing the string module (Python 2 only; the function was removed in Python 3)

filter(function, sequence) applies function to each item of the sequence and keeps the items for which it returns a true value, so it can be used to filter sequences
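A short sketch of these built-ins, assuming Python 3, where filter() returns an iterator and is therefore wrapped in list(). All sample values are made up.

```python
quotient, remainder = divmod(17, 5)        # 17 == 5 * 3 + 2
print(abs(-7), max(3, 9, 4), min(3, 9, 4))
print(quotient, remainder)

print(callable(len), callable(42))         # a function is callable, an int is not
print(isinstance(3.5, float), isinstance("a", (int, float)))

title = "hello World".capitalize()         # first letter upper, the rest lower
swapped = "a-b-c".replace("-", "+")        # fixed-string replacement
evens = list(filter(lambda n: n % 2 == 0, range(10)))
print(title, swapped, evens)
```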

Regular expressions

r1 = r"\d{3,4}-?"

Compile the regular expression into a pattern object:

re_tel = re.compile(r1)

The functions of the re module can then be called as methods of that object:

re_tel.findall('string') returns a list of all matches
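As a small sketch, here is the compiled pattern applied to a made-up phone number. Note that r1 only describes a run of 3 or 4 digits with an optional hyphen, so a long number is returned in pieces.

```python
import re

r1 = r"\d{3,4}-?"
re_tel = re.compile(r1)

# Each match is 3 or 4 digits, optionally followed by a hyphen.
pieces = re_tel.findall("010-12345678")
print(pieces)  # ['010-', '1234', '5678']
```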

Case-insensitive matching

re_tel = re.compile(r1, re.I)

Passing the re.I flag of the re module makes the match case-insensitive.
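Digits have no case, so the flag has no visible effect on r1. A small sketch with a made-up letter pattern shows what re.I changes:

```python
import re

text = "Python PYTHON python"

strict = re.compile(r"python").findall(text)         # exact case only
relaxed = re.compile(r"python", re.I).findall(text)  # any mix of cases

print(strict)   # ['python']
print(relaxed)  # ['Python', 'PYTHON', 'python']
```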

match() tests whether the pattern matches at the beginning of the string. It returns a match object, or None if there is no match, so the usual approach is to assign the return value to a variable and check whether it is None.

search() finds the first match anywhere in the string and also returns a match object (or None).

finditer() returns an iterator of match objects, which can be stepped through with next(). To see the matched text, call group() on each object.
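The difference between the three calls can be sketched on a made-up string; the pattern \d+ is just an example.

```python
import re

digits = re.compile(r"\d+")
text = "abc 123 def 4567"

at_start = digits.match(text)    # None: the string does not start with a digit
anywhere = digits.search(text)   # match object for the first run of digits

if at_start is None:
    print("match() found nothing at the beginning")
print(anywhere.group())          # 123

it = digits.finditer(text)
first = next(it)                 # first match object from the iterator
rest = [m.group() for m in it]   # text of the remaining matches
print(first.group(), rest)       # 123 ['4567']
```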

Match objects have other methods besides group():

start() returns the position where the match starts

end() returns the position where the match ends

span() returns a tuple (start, end)
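A minimal sketch of these position methods on a made-up string. The end position is exclusive, so slicing the text with the span gives back the matched text.

```python
import re

text = "abc 123 def"
m = re.search(r"\d+", text)

print(m.start(), m.end(), m.span())  # 4 7 (4, 7)

# Slicing with the span reproduces the match.
begin, finish = m.span()
print(text[begin:finish])            # 123
```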

sub() replaces the parts of a string that match a pattern.

re.sub(pattern, replacement, string) is a pattern-based (fuzzy) replacement, unlike the fixed-string str.replace().
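To contrast the two kinds of replacement, here is a small sketch on a made-up string. The optional count argument of re.sub() limits the number of replacements.

```python
import re

text = "2024-01-15 and 2023/12/31"

fixed = text.replace("-", "/")                    # replaces one literal string
fuzzy = re.sub(r"\d{4}", "YYYY", text)            # replaces every 4-digit run
once = re.sub(r"\d{4}", "YYYY", text, count=1)    # only the first match

print(fixed)  # 2024/01/15 and 2023/12/31
print(fuzzy)  # YYYY-01-15 and YYYY/12/31
print(once)   # YYYY-01-15 and 2023/12/31
```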

re.split(pattern, string) splits a string on a regular expression, so it supports multiple delimiters

r1 = r"[\+\-\*]"

re.split(r1, s) uses the three characters +, -, and * as separators
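For instance, with a made-up arithmetic string (a minimal sketch):

```python
import re

r1 = r"[\+\-\*]"
s = "1+2-3*4"

operands = re.split(r1, s)   # split on any of +, -, *
print(operands)              # ['1', '2', '3', '4']
```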

Flags that are often used with regular expressions

re.S makes . match every character, including newlines: re.findall(r1, string, re.S)

re.I makes the match case-insensitive

re.M enables multi-line matching: ^ and $ also match at the start and end of each line. Add this flag when the text spans several lines and the pattern is anchored with ^ or $.

re.X enables verbose mode, in which a pattern can be laid out with whitespace and comments so that it is clearer and easier to understand
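The effect of each flag can be sketched on small made-up strings. The verbose phone pattern below is an assumption for illustration, not a pattern from these notes.

```python
import re

# re.S: "." normally stops at a newline.
plain_dot = re.findall(r"a.b", "a\nb")
dotall = re.findall(r"a.b", "a\nb", re.S)
print(plain_dot, dotall)          # [] ['a\nb']

# re.M: "^" normally matches only at the very start of the string.
first_only = re.findall(r"^\w+", "one\ntwo")
each_line = re.findall(r"^\w+", "one\ntwo", re.M)
print(first_only, each_line)      # ['one'] ['one', 'two']

# re.X: whitespace and "#" comments inside the pattern are ignored.
tel = re.compile(r"""
    \d{3,4}   # area code
    -?        # optional hyphen
    \d{7,8}   # local number
""", re.X)
is_tel = tel.fullmatch("010-12345678") is not None
print(is_tel)                     # True
```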

Regular expression grouping

email = r"\w{3}@\w+(\.com|\.cn)"

re.match(email, '[email protected]') returns a match object if the address matches

re.findall(email, '[email protected]') returns only the text captured by the group, for example ['.com']
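A small sketch of this behavior, using the hypothetical address abc@example.com (chosen because \w{3} needs exactly three word characters before the @). A non-capturing group (?:...) makes findall() return the whole match instead.

```python
import re

email = r"\w{3}@\w+(\.com|\.cn)"
address = "abc@example.com"   # hypothetical address for illustration

m = re.match(email, address)
print(m.group(0), m.group(1))          # abc@example.com .com

suffixes = re.findall(email, address)  # only the captured group
print(suffixes)                        # ['.com']

# With a non-capturing group, findall() returns the full match.
whole = re.findall(r"\w{3}@\w+(?:\.com|\.cn)", address)
print(whole)                           # ['abc@example.com']
```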

Suppose we have the string

s = '''sdf hello sun=westos t43
wefs sun=no t43 sdf
hello sun=ni t43 sddsf'''

r1 = r"hello sun=.+ t43"

re.findall(r1, s) returns all of the matched text: ['hello sun=westos t43', 'hello sun=ni t43']

If we add a group:

r1 = r"hello sun=(.+) t43"

re.findall(r1, s) gives priority to the group and returns only the text that the group captured: ['westos', 'ni']
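The two results can be reproduced with the sketch below. It assumes single spaces inside each line of s, since the pattern contains single spaces.

```python
import re

s = '''sdf hello sun=westos t43
wefs sun=no t43 sdf
hello sun=ni t43 sddsf'''

# Without a group, findall() returns the full text of each match.
whole = re.findall(r"hello sun=.+ t43", s)
print(whole)     # ['hello sun=westos t43', 'hello sun=ni t43']

# With a group, findall() returns only what the group captured.
captured = re.findall(r"hello sun=(.+) t43", s)
print(captured)  # ['westos', 'ni']
```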

Groups work with match() as well: the captured text is available through group(1) on the returned match object.

Web crawlers can use this technique to extract the URLs or IP addresses they need.
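As a rough sketch of the crawler use case, a group can pull link targets out of a fragment of HTML. The html string and its example.com URLs are made up, and a real crawler would more likely use an HTML parser.

```python
import re

html = '<a href="http://example.com/a">A</a> <a href="http://example.com/b">B</a>'

# The non-greedy (.+?) stops at the first closing quote after href=".
urls = re.findall(r'href="(.+?)"', html)
print(urls)  # ['http://example.com/a', 'http://example.com/b']
```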

 

 
