Python basics: strings and regular expressions

Table of Contents

1. Common operations of strings

1.1 Concatenating strings

1.2 Calculate the length of the string

1.3 Intercept string

1.4 Separate and combine strings

1.5 Retrieving a string

1.6 Case conversion of letters

1.7 Remove spaces and special characters in strings

1.8 Format string

2. String encoding conversion

2.1 Use encode() method to encode

2.2 Use decode() method to decode

3. Regular expressions

3.1 Line locator

3.2 Metacharacters

3.3 Quantifiers

3.4 Exclude characters

3.5 Select characters

3.6 Escape characters

3.7 Grouping

4. Use the re module to implement regular expression operations

4.1 Matching strings

4.2 Replace string

4.3 Use regular expressions to split strings


1. Common operations of strings

1.1 Concatenating strings

Use the "+" operator to concatenate multiple strings into a new string object.

Note: A string cannot be concatenated directly with a value of another type. Doing so raises a TypeError, so convert the other value with str() first.
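As a minimal sketch of both points (the variable names and values here are illustrative, not from the original article), the following shows a successful concatenation, the error raised when mixing types, and the explicit conversion that avoids it:

```python
# Illustrative values; the names are assumptions for this sketch.
name = "Python"
version = 3

greeting = "Hello, " + name           # str + str produces a new str
print(greeting)

try:
    label = name + version            # str + int is not allowed
except TypeError as exc:
    label = None
    print("TypeError:", exc)

# Convert the number explicitly before concatenating.
label = name + " " + str(version)
print(label)
```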

1.2 Calculate the length of the string

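The len() function returns the number of characters in a string. Each character counts as one regardless of how many bytes it occupies, so a Chinese character and an ASCII letter each add 1 to the length.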

1.3 Intercept string

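A string is also a sequence, so you can extract part of it with slicing. start is the index of the first character to include (default 0), end is the index at which to stop (exclusive, default the length of the string), and step is the stride (default 1).

# Syntax: string[start:end:step]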
str1 = "人生苦短,我用python"
length = len(str1)
print(length) #13
substr = str1[1:6:2]
print(substr) #生短我
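Beyond the basic form, slices may omit bounds or use negative values. The sketch below uses an illustrative ASCII string (an assumption, chosen so the indexes are easy to count) to show the common variations:

```python
text = "Hello, Python"            # illustrative string, 13 characters

first_five = text[:5]             # start omitted: begins at index 0
tail = text[7:]                   # end omitted: runs to the end of the string
last_two = text[-2:]              # negative index counts from the end
reversed_text = text[::-1]        # negative step walks the string backwards

print(first_five, tail, last_two, reversed_text, sep=" | ")
```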

1.4 Separate and combine strings

1) Split a string

The split() method of a string object cuts a string into a list of substrings at the specified separator. With no argument it splits on runs of whitespace; an optional second argument limits the number of splits.

str1 = "明 日 学 院 官 网 >>> www.mingrisoft.com"
list1 = str1.split()
list2 = str1.split(">>>")
list3 = str1.split(".")
list4 = str1.split(" ",4) # split at most 4 times
list5 = str1.split(">") # split at every ">"; adjacent separators produce empty strings
print(list1)
print(list2)
print(list3)
print(list4)
print(list5)
'''
['明', '日', '学', '院', '官', '网', '>>>', 'www.mingrisoft.com']
['明 日 学 院 官 网 ', ' www.mingrisoft.com']
['明 日 学 院 官 网 >>> www', 'mingrisoft', 'com']
['明', '日', '学', '院', '官 网 >>> www.mingrisoft.com']
['明 日 学 院 官 网 ', '', '', ' www.mingrisoft.com']
'''

2) Combine strings

Joining differs from concatenation: it connects multiple strings with a fixed delimiter placed between each pair.

Use the join() method of the string object. It is called on the delimiter and takes an iterable of strings: delimiter.join(iterable)
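A short sketch of join(), using made-up sample lists. Note that join() accepts only strings, so other types need converting first:

```python
# Sample data for illustration only.
words = ["www", "mingrisoft", "com"]
domain = ".".join(words)              # delimiter goes between each pair
print(domain)

# join() raises TypeError on non-string items, so convert them first.
numbers = [1, 2, 3]
csv_line = ",".join(str(n) for n in numbers)
print(csv_line)
```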

1.5 Retrieving a string

1) count() method: counts how many times the specified substring appears in the string.

It returns 0 if the substring does not occur; otherwise it returns the number of non-overlapping occurrences.

2) find() method

This method checks whether the string contains the specified substring. It returns -1 if the substring is not found; otherwise it returns the index of its first occurrence.

str.find(sub)

3) index() method

The index() method works like find(), except that it raises a ValueError when the substring is not found instead of returning -1.

4) startswith() method

The startswith() method checks whether the string starts with the specified substring. It returns True if it does, otherwise False.

5) endswith() method

The endswith() method checks whether the string ends with the specified substring. It returns True if it does, otherwise False.
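The five methods above can be compared side by side. This is a minimal sketch on an illustrative string (not taken from the original article); all of these searches are case-sensitive:

```python
text = "mr_shop and MR_SHOP"          # illustrative string

occurrences = text.count("shop")      # counts only the lowercase occurrence
first_and = text.find("and")          # index of the first match
missing = text.find("xyz")            # -1 because "xyz" is absent

try:
    text.index("xyz")                 # same search, but raises on failure
    index_failed = False
except ValueError:
    index_failed = True

has_prefix = text.startswith("mr_")
has_suffix = text.endswith("shop")    # the string ends with uppercase "SHOP"

print(occurrences, first_and, missing, index_failed, has_prefix, has_suffix)
```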

1.6 Case conversion of letters

String objects provide the lower() and upper() methods for case conversion. Both return a new string and leave the original unchanged.

str1.upper()
str1.lower()
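A small sketch of case conversion with an illustrative value for str1. A common use, shown at the end, is comparing two strings while ignoring case:

```python
str1 = "WWW.Mingrisoft.com"           # illustrative value

upper = str1.upper()
lower = str1.lower()
print(upper, lower)
print(str1)                           # the original string is unchanged

# Normalize both sides to compare without regard to case.
same = "MR_SHOP".lower() == "mr_shop".lower()
print(same)
```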

1.7 Remove spaces and special characters in strings

1) The strip() method removes whitespace (spaces, tabs, newlines) from both ends of the string. If an argument is given, it removes the characters in that argument instead.

2) The lstrip() method does the same, but only on the left side of the string

str.lstrip()
str2.lstrip("@")

3) The rstrip() method does the same, but only on the right side of the string
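The three methods can be sketched as follows, with illustrative values for str1 and str2. One detail worth noting is that the argument is treated as a set of characters to remove, not as a literal prefix or suffix:

```python
str1 = "  \t hello world \n"          # illustrative values
str2 = "@@hello@world@@"

stripped = str1.strip()               # default: remove surrounding whitespace
left = str2.lstrip("@")               # left side only
right = str2.rstrip("@")              # right side only
both = str2.strip("@")                # both sides; the inner "@" is kept

# The argument is a set of characters: any mix of "@" and "." is removed.
mixed = "@.@hello.@".strip("@.")

print(repr(stripped), left, right, both, mixed)
```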

1.8 Format string

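1) Use the % operator to format a string

The template contains placeholders such as %s (a string, converted with str()), %d (a decimal integer), and %f (a floating-point number). The values to insert follow the % operator.

2) Use the format() method of the string object

The template marks each slot with {} and the values are passed as arguments to format().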
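The two styles can be compared in a short sketch. The values below are illustrative, and the format specifications shown (`.2f`, `05d`) are just two common examples:

```python
name = "Python"                       # illustrative values
year = 1991
pi = 3.14159

old_style = "%s was released in %d" % (name, year)
new_style = "{} was released in {}".format(name, year)

rounded = "{:.2f}".format(pi)         # two digits after the decimal point
padded = "%05d" % 42                  # zero-padded to a width of 5

print(old_style)
print(new_style)
print(rounded, padded)
```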

2. String encoding conversion

In Python 3.x, str objects hold Unicode text, and UTF-8 is the default encoding for source files and for the encode() and decode() methods. This largely solves the problem of garbled Chinese characters.

2.1 Use encode() method to encode

The encode() method is a method of the str object. It converts a string into binary data (i.e. a bytes object) using the given encoding; this process is called encoding.

verse = "野渡无人舟自横"
byte = verse.encode('gbk')

2.2 Use decode() method to decode

The decode() method is a method of the bytes object. It converts binary data back into a string, reversing the result of encode(); this process is called decoding. The encoding passed to decode() must be the one that was used to encode.

print("Decoded: ",byte.decode("GBK"))
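To make the round trip concrete, here is a minimal self-contained sketch that encodes the same verse with two encodings and decodes it again. The comparison with UTF-8 is an addition for illustration; the original only uses GBK:

```python
verse = "野渡无人舟自横"

gbk_bytes = verse.encode("gbk")       # 2 bytes per character here
utf8_bytes = verse.encode("utf-8")    # 3 bytes per character here

# The character count stays the same; only the byte counts differ.
print(len(verse), len(gbk_bytes), len(utf8_bytes))

# Decoding with the matching encoding restores the original string.
restored = gbk_bytes.decode("gbk")
print(restored == verse)
```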

3. Regular expressions

3.1 Line locator

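Line locators describe the boundaries of a string: "^" matches at the beginning of a line and "$" matches at the end. For example, the first pattern below matches "tm" only at the start of a line, and the second only at the end.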

^tm
tm$
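A quick way to see the effect of the two locators is to run them against a few sample strings. The sketch below uses re.search() from Python's standard re module and made-up test strings:

```python
import re

# Sample strings for illustration only.
candidates = ["tm equal Tomorrow Moon", "Tomorrow Moon equal tm", "atmosphere"]

starts_with_tm = [s for s in candidates if re.search(r"^tm", s)]
ends_with_tm = [s for s in candidates if re.search(r"tm$", s)]
contains_tm = [s for s in candidates if re.search(r"tm", s)]   # no locator

print(starts_with_tm)
print(ends_with_tm)
print(contains_tm)
```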

3.2 Metacharacters

3.3 Quantifiers

A quantifier specifies how many times the preceding element may repeat. For example, to match an 8-digit QQ number:

^\d{8}$
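The pattern can be checked against a few made-up inputs. This sketch also shows the `?` quantifier (zero or one occurrence) as a second example, which is an addition not found in the original text:

```python
import re

qq_pattern = r"^\d{8}$"               # exactly 8 digits, nothing else

# Illustrative inputs: one valid, three invalid.
samples = ["12345678", "1234567", "123456789", "1234abcd"]
results = {s: bool(re.match(qq_pattern, s)) for s in samples}
print(results)

# "?" makes the preceding "u" optional.
spellings = ["color", "colour", "colouur"]
accepted = [w for w in spellings if re.fullmatch(r"colou?r", w)]
print(accepted)
```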

3.4 Exclude characters

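To match any character that is not in a given set, put ^ at the start of the character class (inside the square brackets). For example, the following matches one character that is not a letter: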

[^a-zA-Z]
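As a rough illustration on a made-up string, the negated class can be used either to find the non-letter characters or to strip them out:

```python
import re

text = "abc123_DEF!"                  # illustrative string

# Each match is a single character that is not an ASCII letter.
non_letters = re.findall(r"[^a-zA-Z]", text)

# Replacing those characters with "" keeps only the letters.
letters_only = re.sub(r"[^a-zA-Z]", "", text)

print(non_letters)
print(letters_only)
```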

3.5 Select characters

The selection character (|) expresses alternatives: the pattern matches if either the expression on its left or the one on its right matches.
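A minimal sketch of alternation, using an illustrative pattern and sentence:

```python
import re

pattern = r"cat|dog"                  # matches either "cat" or "dog"
text = "one cat, two dogs, three birds"

found = re.findall(pattern, text)
print(found)
```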

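3.6 Escape characters

The escape character (\) turns a metacharacter into an ordinary character. For example, "." matches any character, while "\." matches only a literal dot.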
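The difference between an escaped and an unescaped dot can be sketched on a sample IP address (an illustrative value). The standard library's re.escape() is shown as one way to escape a literal string automatically:

```python
import re

ip = "127.0.0.1"                      # illustrative value

unescaped = re.findall(r".", ip)      # "." matches every character
escaped = re.findall(r"\.", ip)       # "\." matches only the literal dots

print(len(unescaped), escaped)

# re.escape() escapes the metacharacters in a literal string for you.
literal = re.escape("a.b")
matches_literal = bool(re.fullmatch(literal, "a.b"))
matches_other = bool(re.fullmatch(literal, "axb"))
print(matches_literal, matches_other)
```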

3.7 Grouping

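Parentheses group part of a pattern into a subexpression, which changes the scope of the selection character and of quantifiers. In the example below, the alternatives are limited to "six" and "four", and "th" must follow either one.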

(six|four)th
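To see what the parentheses change, the sketch below compares the grouped pattern with an ungrouped variant (the ungrouped pattern and the word list are assumptions added for contrast):

```python
import re

grouped = r"(six|four)th"             # "six" or "four", then "th"
ungrouped = r"six|fourth"             # "six", or the whole word "fourth"

words = ["sixth", "fourth", "six"]    # illustrative inputs
grouped_hits = [w for w in words if re.fullmatch(grouped, w)]
ungrouped_hits = [w for w in words if re.fullmatch(ungrouped, w)]

print(grouped_hits)
print(ungrouped_hits)

# A group also captures the text it matched.
captured = re.match(grouped, "sixth").group(1)
print(captured)
```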

4. Use the re module to implement regular expression operations

Python provides the re module for working with regular expressions. It offers functions such as search(), match(), and findall().

4.1 Matching strings

1) Use the match() method to match

match() tries to match the pattern at the beginning of the string only. If the pattern matches at the starting position, it returns a Match object; otherwise it returns None, even if the pattern would match later in the string.

# Syntax: re.match(pattern, string, [flags])
import re
pattern = r'mr_\w+'
string = 'MR_SHOP mr_shop'
match = re.match(pattern ,string,re.I)
print(match)  #<re.Match object; span=(0, 7), match='MR_SHOP'>
string = "项目名称MR_SHOP mr_shop"
match = re.match(pattern,string,re.I)
print(match) #None

2) Use the search() method to match

import re
pattern = r'mr_\w+'
string = 'MR_SHOP mr_shop'
match = re.search(pattern ,string,re.I)
print(match)
string = "项目名称MR_SHOP mr_shop"
match = re.search(pattern,string,re.I)
print(match)
#<re.Match object; span=(0, 7), match='MR_SHOP'>
#<re.Match object; span=(4, 11), match='MR_SHOP'>

[Note]: Unlike match(), the search() method does not look only at the beginning of the string. It scans the whole string and returns the first match found at any position.

3) Use the findall() method to match

The findall() method searches the entire string for all substrings that match the regular expression and returns them as a list. If nothing matches, it returns an empty list.

import re
pattern = r'mr_\w+'
string = 'MR_SHOP mr_shop'
match = re.findall(pattern ,string,re.I)
print(match)
string = "项目名称MR_SHOP mr_shop"
match = re.findall(pattern,string,re.I)
print(match)
#['MR_SHOP', 'mr_shop']
#['MR_SHOP', 'mr_shop']

4.2 Replace string

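The sub() method implements string replacement: re.sub(pattern, repl, string) returns a new string in which every match of pattern in string is replaced by repl. The example below masks a mobile phone number.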

import re
pattern = r'1[34578]\d{9}'
string = "中奖号码为13645611111,请领奖"
res = re.sub(pattern,"1*******",string)
print(res)
# 中奖号码为1*******,请领奖

4.3 Use regular expressions to split strings

The re.split() method splits a string wherever the regular expression matches and returns the pieces as a list.

It works like the split() method of the string object, except that the separator is specified by a pattern rather than a fixed string.

result = re.split(pattern,url)
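The line above leaves `pattern` and `url` undefined. A self-contained sketch might look like the following, where the URL and the pattern are hypothetical values chosen for illustration, splitting a query string at "?" and "&":

```python
import re

# Hypothetical URL and pattern, not taken from the original article.
url = "http://www.example.com/login.jsp?username=mr&pwd=mrsoft"
pattern = r"[?&]"                     # split at either "?" or "&"

result = re.split(pattern, url)
print(result)
```

A plain str.split() call could only use one fixed separator at a time, which is why the pattern-based version is convenient here.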

 


Origin blog.csdn.net/yezonghui/article/details/113312099