Table of Contents
1. Common operations of strings
1.2 Calculate the length of the string
1.4 Separate and combine strings
1.6 Case conversion of letters
1.7 Remove spaces and special characters in strings
2.1 Use encode() method to encode
2.2 Use decode() method to decode
4. Use the re module to implement regular expression operations
4.3 Use regular expressions to split strings
1. Common operations of strings
1.1 Concatenating strings
Use the "+" operator to concatenate two or more strings into a new string object.
Note: A string cannot be concatenated directly with a value of another type (for example, an int); doing so raises a TypeError. Convert the value with str() first.
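As a rough illustration of the note above (the variable names are made up for this example), the sketch below shows the failing case and the usual fix:

```python
# Illustrative names only; they do not come from the original notes.
name = "python"
version = 3

greeting = "I use " + name        # str + str works

try:
    label = name + version        # str + int is not allowed
except TypeError as exc:
    error_message = str(exc)      # keep the message for inspection

label = name + str(version)       # convert first, then concatenate
print(greeting)   # I use python
print(label)      # python3
```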
1.2 Calculate the length of the string
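The len() function returns the number of characters in a string.
1.3 Slice a string
A string is also a sequence, so a substring can be extracted with slice syntax, where start is the first index included (default 0), end is the first index excluded (default: the length of the string), and step is the stride (default 1):
string[start:end:step]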
str1 = "人生苦短,我用python"
length = len(str1)
print(length)  # 13
substr = str1[1:6:2]  # characters at indexes 1, 3 and 5
print(substr)  # 生短我
1.4 Separate and combine strings
1) Split a string
The split() method of a string object cuts the string into a list of substrings at each occurrence of the specified separator. With no argument it splits on runs of whitespace; an optional second argument (maxsplit) limits the number of splits.
str1 = "明 日 学 院 官 网 >>> www.mingrisoft.com"
list1 = str1.split()
list2 = str1.split(">>>")
list3 = str1.split(".")
list4 = str1.split(" ",4) # split at most 4 times, giving at most 5 items
list5 = str1.split(">") # split at every ">"; adjacent separators produce empty strings
print(list1)
print(list2)
print(list3)
print(list4)
print(list5)
'''
['明', '日', '学', '院', '官', '网', '>>>', 'www.mingrisoft.com']
['明 日 学 院 官 网 ', ' www.mingrisoft.com']
['明 日 学 院 官 网 >>> www', 'mingrisoft', 'com']
['明', '日', '学', '院', '官 网 >>> www.mingrisoft.com']
['明 日 学 院 官 网 ', '', '', ' www.mingrisoft.com']
'''
2) Combine strings
Combining differs from concatenating: it joins several strings into one, placing a fixed separator between each pair of neighbours.
Use the join() method, called on the separator string and given an iterable of strings: separator.join(iterable)
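A small sketch of join(), using a hypothetical list of names. Note that the separator is only placed between items, and that every item must already be a string:

```python
# Hypothetical data for illustration.
names = ["alice", "bob", "carol"]

joined = " @".join(names)      # separator goes between items only
mentions = "@" + joined        # so the first item is prefixed by hand
print(mentions)                # @alice @bob @carol

# Non-string items must be converted before joining.
numbers = [1, 2, 3]
csv_line = ",".join(str(n) for n in numbers)
print(csv_line)                # 1,2,3
```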
1.5 Retrieving a string
1) count() method: counts how many times a specified substring appears in a string.
If the substring does not occur, 0 is returned; otherwise the number of non-overlapping occurrences is returned.
2) find() method
This method checks whether the string contains the specified substring. If it does not, -1 is returned; otherwise the index of the first occurrence is returned.
str.find(sub)
3) index() method
The index() method works like find(), except that it raises a ValueError when the substring is not found.
4) startswith() method
The startswith() method checks whether the string starts with the specified prefix; it returns True if so and False otherwise.
5) endswith() method
The endswith() method checks whether the string ends with the specified suffix; it returns True if so and False otherwise.
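The five retrieval methods can be compared side by side. This is only a sketch on a made-up sample string; all of the methods are case-sensitive:

```python
text = "mr_shop, MR_SHOP, mr_shop"    # illustrative sample string

hits = text.count("mr_")              # 2 (the upper-case "MR_" is not counted)
misses = text.count("xyz")            # 0

first_upper = text.find("MR_")        # 9, index of the first occurrence
not_found = text.find("xyz")          # -1 when the substring is absent

try:
    text.index("xyz")                 # index() raises instead of returning -1
    index_raised = False
except ValueError:
    index_raised = True

has_prefix = text.startswith("mr_")   # True
has_suffix = text.endswith("SHOP")    # False, because the string ends with "shop"

print(hits, misses, first_upper, not_found, index_raised, has_prefix, has_suffix)
```

Prefer find() when a missing substring is a normal case, and index() when it should be treated as an error.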
1.6 Case conversion of letters
String objects provide the lower() and upper() methods for case conversion. Both return a new string and leave the original unchanged.
str1.upper()
str1.lower()
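For instance (a sketch with made-up values), case conversion is often used to compare strings without regard to case:

```python
title = "Hello Python 3"              # illustrative value

upper_title = title.upper()           # 'HELLO PYTHON 3'
lower_title = title.lower()           # 'hello python 3'
print(upper_title, lower_title)
print(title)                          # the original is unchanged: Hello Python 3

# Case-insensitive comparison by normalising both sides first.
same_name = "MR_SHOP".lower() == "mr_shop".lower()
print(same_name)                      # True
```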
1.7 Remove spaces and special characters in strings
1) The strip() method removes whitespace, or the characters given as an argument, from both ends of the string
2) The lstrip() method removes whitespace, or the characters given as an argument, from the left end of the string
str.lstrip()
str2.lstrip("@")
3) The rstrip() method removes whitespace, or the characters given as an argument, from the right end of the string
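A short sketch of the three methods on a made-up string. The argument is treated as a set of characters to remove, not as a prefix or suffix, and characters in the middle of the string are never touched:

```python
raw = "  @mr_shop@.  "                 # illustrative value

both = raw.strip()                     # '@mr_shop@.'
left = raw.lstrip()                    # '@mr_shop@.  '
right = raw.rstrip()                   # '  @mr_shop@.'

# Strip any of the characters space, "@" and "." from both ends.
cleaned = raw.strip(" @.")             # 'mr_shop'

# Only leading "@" characters are removed; the inner one stays.
handle = "@@mr@shop".lstrip("@")       # 'mr@shop'

print(repr(both), repr(left), repr(right), repr(cleaned), repr(handle))
```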
1.8 Format string
1) Use the % operator to format a string
e.g. %s inserts a string (the value is converted with str())
2) Use the format() method of the string object to format a string
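The two approaches can be sketched as follows; the values are invented for the example:

```python
name = "Ming"        # illustrative values
count = 7
price = 3.14159

# 1) The % operator: %s for strings, %d for integers, %.2f for two decimals.
old_style = "%s has %d items" % (name, count)
old_price = "%.2f" % price

# 2) str.format(): {} placeholders, with an optional format spec after ":".
new_style = "{} has {} items".format(name, count)
padded_price = "{:>8.2f}".format(price)   # right-aligned in 8 columns
padded_count = "{0:05d}".format(count)    # zero-padded to 5 digits

print(old_style)       # Ming has 7 items
print(old_price)       # 3.14
print(new_style)       # Ming has 7 items
print(padded_count)    # 00007
```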
2. String encoding conversion
In Python 3.x, strings are Unicode and the default encoding is UTF-8, which largely eliminates the garbled-text problem with Chinese characters.
2.1 Use encode() method to encode
The encode() method belongs to the str object. It converts a string into binary data (that is, bytes); this process is called "encoding".
verse = "野渡无人舟自横"
byte = verse.encode('gbk')
2.2 Use decode() method to decode
The decode() method belongs to the bytes object. It converts binary data back into a string, that is, it reverses the result of encode(); this process is called "decoding". The codec passed to decode() must be the one that was used for encoding.
print("Decoded: ",byte.decode("GBK"))
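As a sketch of a full round trip, the example below reuses the verse from above and compares two codecs. The same text yields different bytes under each codec, and a codec that cannot represent the characters raises an error:

```python
verse = "野渡无人舟自横"

gbk_bytes = verse.encode("gbk")        # 2 bytes per character here
utf8_bytes = verse.encode("utf-8")     # 3 bytes per character here
print(len(verse), len(gbk_bytes), len(utf8_bytes))   # 7 14 21

# Decoding with the matching codec restores the original string.
round_trip_ok = (gbk_bytes.decode("gbk") == verse
                 and utf8_bytes.decode("utf-8") == verse)
print(round_trip_ok)                   # True

# ASCII cannot represent these characters, so encoding fails.
try:
    verse.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```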
3. Regular expressions
3.1 Line locator
Line locators describe the boundaries of a string: "^" marks the beginning of a line and "$" marks the end. For example, the first pattern below matches "tm" only at the beginning of a line, and the second only at the end:
^tm
tm$
3.2 Metacharacters
3.3 Qualifier
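A qualifier (also called a quantifier) specifies how many times the preceding element must occur. For example, to match an 8-digit QQ number: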
^\d{8}$
3.4 Exclude characters
To match any character that is not in a given character set, put ^ at the start of the set. For example, the following matches one character that is not a letter:
[^a-zA-Z]
3.5 Select characters
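The selection character (|) matches either the expression on its left or the one on its right.
3.6 Escape characters
The escape character (\) turns a special character into an ordinary one; for example, \. matches a literal dot rather than any character.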
3.7 Grouping
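Parentheses group part of a pattern, which changes the scope of a qualifier or of the selection character (|). For example, the following matches "sixth" or "fourth":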
(six|four)th
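The syntax elements of this section can be tried out with Python's re module. This is only a sketch; the sample strings are invented for the example:

```python
import re

# "^" and "$" anchor a pattern to the start and end of the text.
starts_with_tm = re.search(r"^tm", "tm equipment") is not None   # True
ends_with_tm = re.search(r"tm$", "my tm") is not None            # True
tm_at_start = re.search(r"^tm", "my tm") is not None             # False

# \d{8} between the anchors accepts exactly eight digits.
is_qq = re.search(r"^\d{8}$", "12345678") is not None            # True
too_short = re.search(r"^\d{8}$", "1234567") is not None         # False

# [^a-zA-Z] matches any single character that is not a letter.
non_letters = re.findall(r"[^a-zA-Z]", "ab1c-2")                 # ['1', '-', '2']

# "|" selects between alternatives; "\." is a literal dot.
pet = re.search(r"cat|dog", "hot dog").group()                   # 'dog'
dot_at = re.search(r"\.", "a.b").start()                         # 1

# Grouping limits the alternation to "six" or "four".
ordinal = re.search(r"(six|four)th", "the fourth day").group()   # 'fourth'

print(non_letters, pet, dot_at, ordinal)
```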
4. Use the re module to implement regular expression operations
Python provides the re module for working with regular expressions. It offers functions such as search(), match(), and findall().
4.1 Matching strings
1) Use the match() method to match
match() matches only at the beginning of the string. If the pattern matches at the starting position, a Match object is returned; otherwise None is returned.
Because matching is anchored at the start, it fails as soon as the first characters do not fit the pattern; the rest of the string is not searched.
re.match(pattern, string, flags=0)
import re
pattern = r'mr_\w+'
string = 'MR_SHOP mr_shop'
match = re.match(pattern, string, re.I)  # re.I: ignore case
print(match) #<re.Match object; span=(0, 7), match='MR_SHOP'>
string = "项目名称MR_SHOP mr_shop"
match = re.match(pattern, string, re.I)
print(match) #None
2) Use the search() method to match
import re
pattern = r'mr_\w+'
string = 'MR_SHOP mr_shop'
match = re.search(pattern, string, re.I)
print(match)
string = "项目名称MR_SHOP mr_shop"
match = re.search(pattern, string, re.I)
print(match)
#<re.Match object; span=(0, 7), match='MR_SHOP'>
#<re.Match object; span=(4, 11), match='MR_SHOP'>
[Note]: Unlike match(), the search() method scans the whole string and returns the first match found at any position, not only at the beginning.
3) Use the findall() method to match
The findall() method searches the entire string for all non-overlapping substrings that match the regular expression and returns them as a list. If nothing matches, an empty list is returned.
import re
pattern = r'mr_\w+'
string = 'MR_SHOP mr_shop'
match = re.findall(pattern, string, re.I)
print(match)
string = "项目名称MR_SHOP mr_shop"
match = re.findall(pattern, string, re.I)
print(match)
#['MR_SHOP', 'mr_shop']
#['MR_SHOP', 'mr_shop']
4.2 Replace string
The sub() method replaces every substring that matches the pattern with a replacement string: re.sub(pattern, repl, string)
import re
pattern = r'1[34578]\d{9}'
string = "中奖号码为13645611111,请领奖"
res = re.sub(pattern,"1*******",string)
print(res)
# 中奖号码为1*******,请领奖
4.3 Use regular expressions to split strings
The split() method splits a string wherever the regular expression matches and returns the pieces as a list.
It works like the split() method of string objects, except that the separator is specified by a pattern rather than a fixed string.
result = re.split(pattern,url)
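The call above leaves pattern and url undefined. A runnable sketch might look like the following; the URL and the pattern are hypothetical values chosen for illustration, not taken from the original notes:

```python
import re

# Hypothetical URL and pattern, for illustration only.
url = "http://www.example.com/login?username=mr&pwd=secret"
pattern = r"[?&]"                      # split at every "?" or "&"

result = re.split(pattern, url)
print(result)
# ['http://www.example.com/login', 'username=mr', 'pwd=secret']

# maxsplit limits the number of splits, as with str.split().
head_and_query = re.split(pattern, url, maxsplit=1)
print(head_and_query)
# ['http://www.example.com/login', 'username=mr&pwd=secret']
```

A plain str.split() could not do this in one call, because it accepts only a single fixed separator.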