String (str, bytes)

4.3 String (str, bytes)

4.3.1 String pre-knowledge

4.3.1.1 The concept of string

A string is a sequence of characters. "Character" is a general term for letters, punctuation marks, graphic symbols, digits, and other symbols from any writing system. In programming languages, a string is the data type that represents text.
A string is made up of individual elements (characters) arranged in order. In Python, data with this property is called a sequence. Sequences are introduced in detail later.

4.3.1.2 Character set

A character set is a collection of characters. There are many character sets, and each contains a different number of characters. Common character sets include ASCII, GB2312, BIG5, GB18030, and Unicode.
Simply put, a character set is a table that maps each character to a number (its code).

4.3.1.3 Character encoding and decoding

Because computers can only process numbers, text must first be converted to numbers before it can be processed. Computers were designed to use 8 bits as one byte, so the largest integer that one byte can represent is 255 (binary 11111111 = decimal 255). To represent a larger integer, more bytes are needed. For example, the largest integer that two bytes can represent is 65535, and the largest that four bytes can represent is 4294967295.
The information stored in the computer is represented by binary numbers, and the English, Chinese characters and other characters we see on the screen are the results of binary number conversion.
In layman's terms, storing characters in the computer as binary numbers according to a given rule is called encoding; conversely, parsing the binary numbers stored in the computer back into characters for display is called decoding. The relationship is similar to that between encryption and decryption in cryptography.
During the decoding process, if the wrong decoding rules are used, the target character will be parsed into other characters or garbled characters.
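
The effect of a wrong decoding rule can be sketched as follows. The sample word 'café' and the variable names are illustrative assumptions, not part of the original text:

```python
# Encode a short text as UTF-8, then decode it with the right and wrong rules.
text = 'café'
data = text.encode('utf-8')          # b'caf\xc3\xa9'

correct = data.decode('utf-8')       # right rule: the original text comes back
garbled = data.decode('latin-1')     # wrong rule: each byte becomes its own character

print(correct)   # café
print(garbled)   # cafÃ©

# Some wrong rules do not produce garbled text but fail outright.
try:
    data.decode('ascii')
except UnicodeDecodeError as exc:
    error = exc
    print('decoding failed:', exc.reason)
```

Whether a wrong rule yields garbled characters or an exception depends on whether the bytes happen to be valid in that encoding.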

4.3.1.3.1 History of coding development

Computers were first developed in the United States, where combinations of 0s and 1s were assigned one by one to English characters in a table: the ASCII table, the basis of the well-known ASCII character set.
The ASCII character set consists of control characters (carriage return, backspace, line feed, etc.) and printable characters (uppercase and lowercase English letters, Arabic numerals, and Western punctuation).
When computers were introduced into China, the 256 values of a single 8-bit byte were far too few for the thousands of Chinese characters, simplified and traditional, that people need in daily work. ASCII was therefore extended with a new character set, GB2312. GB2312 later proved insufficient as well and was extended further, eventually resulting in GB18030.
Like China, other countries and regions created encodings for their own languages, so many incompatible encodings appeared. Text cannot be decoded and displayed correctly without the matching encoding.
Because these national encodings are incompatible, exchanging text internationally was difficult. International organizations therefore created Unicode (closely tied to the Universal Multiple-Octet Coded Character Set, UCS), which assigns one number to every character. Early versions represented every character with a fixed two bytes, which is simple for computers to process; later versions outgrew that limit.
Although Unicode solved the incompatibility problem, a fixed two-byte representation doubles the storage needed for English text, which takes only one byte per character in ASCII. This led to the Unicode Transformation Formats (UTF), of which UTF-8 and UTF-16 are the most common. In UTF-8, ASCII characters take one byte, most other European letters take two bytes, and most East Asian characters, including common Chinese characters, take three bytes.

4.3.1.3.2 ASCII encoding

ASCII (American Standard Code for Information Interchange) is a character encoding based on the Latin alphabet. It is mainly used to represent modern English, while extended versions (EASCII) can represent other Western European languages. It is the most widely used single-byte encoding.
ASCII encoding is the rule that converts the ASCII character set into numbers a computer can store. It uses 7 bits per character, for a total of 128 characters. To represent more of the common European characters, ASCII was extended: the extended ASCII character sets use 8 bits per character, for a total of 256 characters.

4.3.1.3.3 GBXXXX character set and encoding

The GB2312 standard contains 6,763 Chinese characters. Each GB2312 character is represented by two bytes (in practice, ASCII characters are still stored as single bytes).
GBK contains 21,886 Chinese characters and graphic symbols. Each GBK character is also represented by two bytes.
GB18030 contains 70,244 Chinese characters. It is a variable-length encoding that uses 1, 2, or 4 bytes per character; the single-byte part is compatible with ASCII.
Note: GBK is compatible with GB2312, and GB18030 is compatible with GBK.

4.3.1.3.4 Unicode character set and encoding

Unicode is a character set, maintained by international organizations, that aims to cover all of the world's characters and symbols and assigns a unique number (code point) to each of them.
Unicode itself is a character set, not an encoding method. Two fixed-width forms are historically associated with it: UCS-2, which uses 2 bytes per character and can represent 65,536 values, and UCS-4, which uses 4 bytes per character and can represent nearly 4.3 billion values. Each code point identifies one character; how code points are stored as bytes is decided by an encoding such as UTF-8 or UTF-16.
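
The difference between a code point and its encoded bytes can be sketched with the built-in ord() and chr() functions. The three sample characters are arbitrary choices for illustration:

```python
# For each character, show its code point and how many bytes two encodings need.
samples = ['A', '中', '😀']
rows = []
for ch in samples:
    code_point = ord(ch)                       # character -> code point (an int)
    utf8_len = len(ch.encode('utf-8'))         # bytes needed in UTF-8
    utf16_len = len(ch.encode('utf-16-le'))    # bytes needed in UTF-16 (no BOM)
    rows.append((ch, code_point, utf8_len, utf16_len))
    print(f'{ch!r} U+{code_point:04X} utf-8={utf8_len} utf-16={utf16_len}')

# chr() is the inverse of ord(): code point -> character.
print(chr(0x4E2D))   # 中
```

The code point of a character is always the same number, while the byte count depends on the encoding chosen.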

4.3.1.3.5 UTF-8 encoding

UTF-8 is currently the most widely used Unicode encoding on the Internet. Its defining feature is variable length: a character is represented by 1 to 4 bytes, depending on the character.
UTF-8 encodes Unicode in units of one byte and uses different lengths for different ranges of code points. For characters in the range 0x00 to 0x7F, the UTF-8 encoding is identical to ASCII.
For a character that needs N bytes (N > 1), the first N bits of the first byte are set to 1 and bit N + 1 is set to 0; in each of the remaining N - 1 bytes, the first two bits are set to 10. All remaining bits are filled with the bits of the character's Unicode code point.
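
The bit layout described above can be inspected directly. This is a small sketch; the helper name utf8_bits and the sample characters are assumptions made for illustration:

```python
def utf8_bits(ch):
    """Return the UTF-8 bytes of ch as a list of 8-bit binary strings."""
    return [format(byte, '08b') for byte in ch.encode('utf-8')]

# Characters that need 1, 2, 3 and 4 bytes respectively.
for ch in ['A', 'é', '中', '😀']:
    print(f'U+{ord(ch):04X}', utf8_bits(ch))

# U+0041 ['01000001']
# U+00E9 ['11000011', '10101001']
# U+4E2D ['11100100', '10111000', '10101101']
# U+1F600 ['11110000', '10011111', '10011000', '10000000']
```

In each multi-byte result, the leading byte starts with N ones followed by a zero, and every continuation byte starts with 10.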

4.3.1.3.6 ANSI encoding

ANSI encoding refers to the default encoding of the local system (the term is used mainly on Windows). On Simplified Chinese systems, ANSI means GBK.

4.3.1.4 Encoding and decoding in Python

Python 3 strings (str) are Unicode internally, so encoding conversion normally goes through Unicode: first decode the bytes in the source encoding into a str, then encode the str into the target encoding.
encode converts a Unicode string into bytes in the given encoding. For example, s.encode('gbk') converts the str s into GBK-encoded bytes.

string = '离离原上草,一岁一枯荣'.encode('gbk')
print(type(string))
print(string)

<class 'bytes'>
b'\xc0\xeb\xc0\xeb\xd4\xad\xc9\xcf\xb2\xdd\xa3\xac\xd2\xbb\xcb\xea\xd2\xbb\xbf\xdd\xc8\xd9'

The encoding can be detected with chardet.detect (chardet is a third-party package):
>>> import chardet
>>> print(chardet.detect(string))

{'encoding': 'GB2312', 'confidence': 0.7407407407407407, 'language': 'Chinese'}

decode converts bytes in some encoding into a Unicode string. For example, b.decode('gbk') converts the GBK-encoded bytes b into a str.

string = '离离原上草,一岁一枯荣'.encode('gbk')
print(string)
print(string.decode('gbk'))

b'\xc0\xeb\xc0\xeb\xd4\xad\xc9\xcf\xb2\xdd\xa3\xac\xd2\xbb\xcb\xea\xd2\xbb\xbf\xdd\xc8\xd9'
离离原上草,一岁一枯荣

Summary: to convert text from another encoding to UTF-8, first decode it into Unicode (str) and then encode it as UTF-8; Unicode is the intermediate form. For example, the literal s = '中文' is stored as UTF-8 bytes if the source file is saved as UTF-8, and as GB2312 bytes if the file is saved as GB2312. To convert between encodings, use decode to obtain a str and then use encode to produce the target encoding.
When no encoding is specified, a source file is usually created in the system default encoding.
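
The decode-then-encode route can be sketched as follows. The GBK bytes are produced in the example itself to stand in for data read from a GBK file, which is an assumption for illustration:

```python
# Bytes in the source encoding (imagine they were read from a GBK file).
gbk_bytes = '离离原上草'.encode('gbk')

text = gbk_bytes.decode('gbk')        # step 1: bytes -> str (Unicode)
utf8_bytes = text.encode('utf-8')     # step 2: str -> bytes in the target encoding

print(len(gbk_bytes), len(utf8_bytes))   # 10 15
print(utf8_bytes.decode('utf-8'))        # 离离原上草
```

The same five characters occupy 10 bytes in GBK and 15 bytes in UTF-8, but the decoded text is identical.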

PS: As a memory aid, encoding can be loosely compared to encryption and decoding to decryption. The text before encoding is readable plaintext; after encoding it becomes bytes that are not directly readable. Different encodings are like different encryption methods: only the matching decoding recovers the original text.

4.3.2 Creation of strings

4.3.2.1 Creation by string identifier

In Python, you can use single quotes ('') or double quotes ("") to represent strings, and multi-line strings can use triple quotes ''' or """ to represent.

'hello,world'
"祝你好运~"
"""
多行内容:
离离原上草,一岁一枯荣。
野火烧不尽,春风吹又生。
远芳侵古道,晴翠接荒城。
又送王孙去,萋萋满别情。
"""

Note:
1. In Python, there are no single characters, only strings. A single character is also a string.
2. Single quotes and double quotes in Python are exactly the same when used alone.

4.3.2.2 String concatenation and repetition

See "Concatenation and repetition of sequences" in the Iterable Objects chapter.

4.3.2.3 Character Escaping

If there are quotation marks in the string we need to define, the direct definition will report an error:
>>> string = 'Let's go'

Traceback (most recent call last):
  File "C:\Program Files\Python3102\lib\code.py", line 63, in runsource
    code = self.compile(source, filename, symbol)
  File "C:\Program Files\Python3102\lib\codeop.py", line 185, in __call__
    return _maybe_compile(self.compiler, source, filename, symbol)
  File "C:\Program Files\Python3102\lib\codeop.py", line 102, in _maybe_compile
    raise err1
  File "C:\Program Files\Python3102\lib\codeop.py", line 91, in _maybe_compile
    code1 = compiler(source + "\n", filename, symbol)
  File "C:\Program Files\Python3102\lib\codeop.py", line 150, in __call__
    codeob = compile(source, filename, symbol, self.flags, True)
  File "", line 1
    string = 'Let's go'
                      ^
SyntaxError: unterminated string literal (detected at line 1)

This is because quotation marks must come in pairs when defining a string, and the string above contains three single quotes, which cannot be paired. In this case, we can use double quotes to define the string:
>>> string = "Let's go"
Either single or double quotes can delimit a string, and the other kind of quote can then appear freely inside it.
In general, a string defined with double quotes cannot directly contain double quotes, and a string defined with single quotes cannot directly contain single quotes.
What if the content contains both single and double quotes? There are two ways to handle this:
1. Define the string with triple quotes (either triple single quotes or triple double quotes).
2. Use character escaping.

4.3.2.3.2 Character Escaping

Character escaping uses a special symbol to change the meaning of the character that follows it. Python uses the backslash \ as the escape symbol. Common escape sequences are as follows:

Escape sequence    Description      Display effect
\\                 backslash        \
\'                 single quote     '
\"                 double quote     "
\b                 backspace
\n                 newline
\t                 tab

With escaping, the example above can be written as 'Let\'s go'. Although the content inside the single quotes contains a single quote, no error is reported, because that quote is escaped with \.
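
The two approaches (triple quotes and escaping) can be compared in a short sketch. The variable names are illustrative only:

```python
# The same text written with escaping and with a different delimiter.
escaped = 'Let\'s go'
double_quoted = "Let's go"
print(escaped == double_quoted)        # True

# Content containing both kinds of quotes: escape, or use triple quotes.
with_escape = 'He said "Let\'s go"'
with_triple = '''He said "Let's go"'''
print(with_escape == with_triple)      # True

# An escape sequence is a single character; an escaped backslash is too.
print(len('\n'), len('\\n'))           # 1 2
```

The escape character itself is not stored in the string; it only tells Python how to read the next character.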

4.3.2.4 String prefixes

In Python 3 there are two string types: str and bytes. A prefix can be placed before a string literal to change how it is interpreted. The commonly used prefixes are r, u, and b.
r: the content of the string is not escaped and is kept as written (a raw string).
u: the string is Unicode; in Python 3 all str literals are Unicode by default.
b: the literal is of type bytes.
Absolute paths on Windows contain the \ symbol. Writing such a path in Python often goes wrong, because \ and the character after it are treated as an escape sequence, which is inconvenient in daily use. The r prefix solves this:
>>> print('C:\some\name')  # here \n is a newline, which is not what we want

C:\some
ame

>>> print(r'C:\some\name')  # the content of the string is displayed as written

C:\some\name

In this way, the \n in the string is not treated as a newline but is kept as written.
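
The effect of the prefixes can also be checked by comparing lengths and types. This is a minimal sketch; the path used here is a made-up example:

```python
plain = 'C:\new\name'      # both \n sequences become newline characters
raw = r'C:\new\name'       # backslashes are kept as written
escaped = 'C:\\new\\name'  # the same text written with escaped backslashes

print(len(plain), len(raw))    # 9 11
print(raw == escaped)          # True

# The u prefix changes nothing in Python 3; the b prefix produces bytes.
print(u'abc' == 'abc')         # True
print(type(b'abc'))            # <class 'bytes'>
```

A raw string is still an ordinary str; the prefix only affects how the literal is read.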

4.3.3 String access

4.3.3.1 String Index

See "Indexing sequences" in the Iterable Objects chapter.

4.3.3.2 String Slicing

See "Slicing sequences" in the Iterable Objects chapter.

4.3.3.3 Traversing Strings

See "Traversing iterable objects" in the Iterable Objects chapter.

4.3.3.4 Membership check

See "Membership check" in the Iterable Objects chapter.

4.3.4 Modification of character strings

Python strings are immutable (and therefore hashable), so assigning to an index position in a string raises an error:
>>> string = 'Python'
>>> string[0] = 'J'

Traceback (most recent call last):
  File "C:\Program Files\Python3102\lib\code.py", line 90, in runcode
    exec(code, self.locals)
  File "", line 1, in <module>
TypeError: 'str' object does not support item assignment

To get a different string, build a new one:
>>> string = 'Python'
>>> 'J' + string[1:]
>>> string[:2] + 'py'

'Jython'
'Pypy'

4.3.5 Deletion of character strings

Since strings are immutable, a character cannot be deleted from a string in place. To remove a character, build a new string:
>>> string = 'Python'
>>> str2 = string[:2] + string[3:]
>>> str2

'Pyhon'

4.3.6 String formatting

When a string must follow a particular output format, string formatting is needed. For example, when sending a group text message, the name differs for each recipient while the rest of the content stays the same:

Hello xx! This is Shandong Xiaohonghua Group...

4.3.6.1 %Formatting

Python can format strings with % placeholders. For example:
>>> 'Hello, %s' % 'world'
>>> 'Hi, %s, you are %d years old.' % ('Yagami', 16)

'Hello, world'
'Hi, Yagami, you are 16 years old.'

stars = ("黎明", "华仔", "郭富城", "张学友")
print("四大天王:%s, %s, %s, %s" % stars)

四大天王:黎明, 华仔, 郭富城, 张学友

Inside the string, %s is replaced with a string and %d with an integer. Each %? placeholder must be matched by one variable or value, in the same order. If there is only one %?, the parentheses around the value can be omitted.
Common placeholders in % formatting:

Placeholder    Replaced with
%d             integer
%f             floating-point number
%s             string
%x             hexadecimal integer

Integers and floating-point numbers can also be zero-padded, and the number of integer and decimal digits can be specified:

print('%9d-%09d' % (3, 1))
print('%.6f' % 3.1415926)

        3-000000001
3.141593

If you are not sure which placeholder to use, %s always works; it converts any data type to a string:
>>> 'Age: %s. Gender: %s' % (25, True)

'Age: 25. Gender: True'

Use %% to escape a percent sign:
>>> 'growth rate: %d %%' % 7

'growth rate: 7 %'

Custom formats are described in detail in the custom format section later.

4.3.6.2 format

format is a string formatting method available since Python 2.6. It replaces the placeholders {0}, {1}, ... in the string with the arguments passed in, in order.
Usage: the arguments of format replace the {} fields in the string.
>>> "{} {}".format("hello", "world")  # no positions given: default order
>>> "{1} {0} {1}".format("hello", "world")  # explicit positions

'hello world'
'world hello world'

Using keyword arguments:
>>> "Site name: {name}, address: {url}".format(name="本", url="www.ben.com")

'Site name: 本, address: www.ben.com'

Using a dictionary:
>>> site = {"name": "本", "url": "www.ben.com"}
>>> "Site name: {name}, address: {url}".format(**site)

'Site name: 本, address: www.ben.com'

Using list indexes:
>>> my_list = ['本', 'www.ben.com']
>>> "Site name: {0[0]}, address: {0[1]}".format(my_list)  # the "0" is required

'Site name: 本, address: www.ben.com'

Use {{ and }} to escape braces:
>>> "{} corresponds to {{10}}".format("ben")

'ben corresponds to {10}'

You can also specify the type that the value should be converted to, or more precisely, the type it should be treated as. For example, you might provide an integer but want it treated as a decimal number. To do this, use the character f (for fixed-point numbers) in the format specification (that is, after the colon).
>>> "π is {:.2f}".format(3.1415926)

'π is 3.14'

Custom formats are described in detail in the custom format section later.

4.3.6.3 f-string

f-strings, also known as formatted string literals, are a string formatting method introduced in Python 3.6 by PEP 498 (Literal String Interpolation) to make string operations easier. In form, an f-string is a string literal prefixed with f or F (f'xxx' or F'xxx'), in which the fields to be replaced are marked with braces {}. An f-string is not really a string constant but an expression that is evaluated at runtime.
In terms of features, f-strings are no less capable than traditional %-formatting and str.format(). They are usually faster than both and more concise to use, so f-strings are recommended for string formatting in Python 3.6 and later.

Simple usage
f-strings use braces {} to mark the field to be replaced; the replacement content is written directly inside:

>>> name = 'Eric'
>>> f'Hello, my name is {name}'

'Hello, my name is Eric'

>>> number = 7
>>> f'My lucky number is {number}'

'My lucky number is 7'

>>> price = 19.99
>>> f'The price of this book is {price}'

'The price of this book is 19.99'

Expression evaluation and function calls
The braces {} of an f-string can contain expressions or function calls. Python evaluates them and inserts the results into the returned string:

>>> f'A total number of {24 * 8 + 4}'

'A total number of 196'

>>> f'Complex number {(2 + 2j) / (2 - 3j)}'

'Complex number (-0.15384615384615388+0.7692307692307692j)'

>>> name = 'ERIC'
>>> f'My name is {name.lower()}'

'My name is eric'

>>> import math
>>> f'The answer is {math.log(math.pi)}'

'The answer is 1.1447298858494002'

Quotes, braces, and backslashes
Before Python 3.12, the quotes used inside f-string braces must not conflict with the quotes that delimit the f-string; switch between ' and " as needed:

>>> f'I am {"Eric"}'

'I am Eric'

>>> f'I am {'Eric'}'

  File "<stdin>", line 1
    f'I am {'Eric'}'
                ^
SyntaxError: invalid syntax

If ' and " are not enough, ''' and """ can also be used.
Quotes outside the braces can also be escaped with \, but \ cannot be used inside the braces:

>>> f'''He\'ll say {"I'm Eric"}'''

"He'll say I'm Eric"

>>> f'''He'll say {"I\'m Eric"}'''

  File "<stdin>", line 1
SyntaxError: f-string expression part cannot include a backslash

To display literal braces in an f-string, double them, writing {{ and }}:
>>> f'5 {"{stars}"}'
>>> f'{{5}} {"stars"}'

'5 {stars}'
'{5} stars'

As noted above, \ escapes cannot be used inside f-string braces. In fact, before Python 3.12 the \ character is not allowed inside f-string braces at all. If \ is really needed, put the content containing \ in a variable first and then use the variable name inside the braces.
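
The variable workaround for backslashes can be sketched as follows. The names lines, newline, and report are illustrative assumptions:

```python
lines = ['first', 'second', 'third']

# Keep the backslash outside the braces by storing it in a variable first.
newline = '\n'
report = f'Lines:{newline}{newline.join(lines)}'
print(report)

# Lines:
# first
# second
# third
```

This form works on every Python version that supports f-strings, including those that reject a backslash inside the braces.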

4.3.6.4 Custom format

The three formatting methods above have already hinted at the custom formats available when formatting strings.
All three methods support custom formats in a similar way. The following sections describe them in detail.
They mainly cover alignment, width, sign, zero padding, precision, and base.

4.3.6.4.1 Alignment-related format specifiers

<    left-align (default for strings); in % formatting, use the - flag
>    right-align (default for numbers); in % formatting this is already the default (the + flag only forces a sign on numbers)
^    center; available in f-strings and format only

var1 = 'python'
var2 = -3.1415
print(f'|{var1:>10}|')
print('|{:^10}|'.format(var2))
print('|%+10s|' % var1)
print('|%-10s|' % var2)

|    python|
| -3.1415  |
|    python|
|-3.1415   |

4.3.6.4.2 Sign-related format specifiers

+          minus sign before negative numbers, plus sign before positive numbers
-          minus sign before negative numbers, nothing before positive numbers (default)
(space)    minus sign before negative numbers, a space before positive numbers

Note: Applies to numeric types only.

var1 = 3.14
var2 = -4.13
print(f'|{var1:+}|')
print('|{:-}|'.format(var2))
print('|{: }|'.format(var1))
print('|%+s|' % var1)  # %s converts to a string first, so + has no effect

|+3.14|
|-4.13|
| 3.14|
|3.14|

4.3.6.4.3 Width- and precision-related format specifiers

width              the integer width specifies the field width
0width             the integer width specifies the field width; the leading 0 pads the high-order positions with 0
width.precision    the integer width specifies the field width; the integer precision specifies the display precision

Note:
1. 0width cannot be used for complex numbers or non-numeric types, and width.precision cannot be used for integer types.
2. width.precision has different meanings for floating-point and complex numbers depending on the format type: with f, F, e, E and %, precision is the number of digits after the decimal point; with g and G, precision is the number of significant digits (digits before the decimal point plus digits after it).
3. Besides floating-point and complex numbers, width.precision can also be used for strings, where precision means that only the first precision characters of the string are used.

var1 = 3.1415
print(f'|{var1:8}|')
print('|{:8.3}|'.format(var1))
print('|{:.4}|'.format(var1))
print('|%10.5s|' % var1)

|  3.1415|
|    3.14|
|3.142|
|     3.141|

4.3.6.4.4 Thousands-separator format specifiers

,    use , as the thousands separator
_    use _ as the thousands separator
Note:
1. If neither , nor _ is specified, no thousands separator is used; this is the default.
2. , applies only to floating-point numbers, complex numbers, and decimal integers. For floating-point and complex numbers, , separates only the digits before the decimal point.
3. _ applies to floating-point numbers, complex numbers, and binary, octal, decimal, and hexadecimal integers. For floating-point and complex numbers, _ separates only the digits before the decimal point. For binary, octal, and hexadecimal integers, a _ is inserted every four digits, counting from the low-order end; for decimal integers, every three digits.

var1 = 31415
print(f'|{var1:_}|')
print('|{:,}|'.format(var1))
print('|%,s|' % var1)  # not supported by % formatting

|31_415|
|31,415|
Traceback (most recent call last):
  File "E:\studypy\tmp.py", line 4, in <module>
    print('|%,s|' % var1)
ValueError: unsupported format character ',' (0x2c) at index 2

4.3.6.4.5 Format-type specifiers

Specifier    Meaning    Applicable types
s    ordinary string format    strings
b    binary integer format    integers
c    character format: converts the integer to the character with that Unicode code point    integers
d    decimal integer format    integers
o    octal integer format    integers
x    hexadecimal integer format (lowercase letters)    integers
X    hexadecimal integer format (uppercase letters)    integers
e    scientific notation, using e for ×10^    floating-point numbers, complex numbers, integers (automatically converted to floating-point)
E    same as e, but uses E for ×10^    floating-point numbers, complex numbers, integers (automatically converted to floating-point)
f    fixed-point format; the default precision is 6    floating-point numbers, complex numbers, integers (automatically converted to floating-point)
F    same as f, but nan and inf are shown as NAN and INF    floating-point numbers, complex numbers, integers (automatically converted to floating-point)
g    general format: uses f for numbers of moderate size and e for very large or very small numbers    floating-point numbers, complex numbers, integers (automatically converted to floating-point)
G    same as g, but uses F and E instead of f and e    floating-point numbers, complex numbers, integers (automatically converted to floating-point)
%    percentage format: the number is multiplied by 100, formatted with f, and followed by %    floating-point numbers, integers (automatically converted to floating-point)
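
A few of the format types listed above can be tried out in a short sketch. The sample values and the dictionary name samples are arbitrary choices for illustration:

```python
n = 255
x = 1234.5678

samples = {
    'b': f'{n:b}',        # binary
    'o': f'{n:o}',        # octal
    'x': f'{n:x}',        # hexadecimal, lowercase
    'X': f'{n:X}',        # hexadecimal, uppercase
    'c': f'{65:c}',       # integer -> character
    'e': f'{x:e}',        # scientific notation
    'f': f'{x:f}',        # fixed-point, default precision 6
    'g': f'{x:g}',        # general format
    '%': f'{0.256:.1%}',  # percentage with one decimal place
}
for spec, text in samples.items():
    print(spec, text)
```

The same specifiers work after the colon in str.format(), for example '{:x}'.format(255).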

4.3.7 String methods

4.3.7.1 Encoding and decoding methods

4.3.7.1.1 encode(encoding='utf-8', errors='strict')

Description
Encodes the string with the specified encoding. The result is of type bytes.
The errors parameter selects the error handling scheme. The default is 'strict', meaning that encoding errors raise a UnicodeError. Other possible values are 'ignore', 'replace', 'xmlcharrefreplace', and 'backslashreplace'.
Example

s1 = 'ab甲乙'
s2 = s1.encode(encoding='utf-8')
print(s2)

b'ab\xe7\x94\xb2\xe4\xb9\x99'

4.3.7.1.2 decode(encoding='utf-8', errors='strict')

Description
Decodes a bytes object. The result is of type str.
The errors parameter selects the error handling scheme. The default is 'strict', meaning that decoding errors raise a UnicodeError. Other possible values include 'ignore', 'replace', and 'backslashreplace'.
Example

s1 = 'ab甲乙'
s2 = s1.encode(encoding='utf-8')
print(s2)
s3 = s2.decode(encoding='utf-8')
print(s3)

b'ab\xe7\x94\xb2\xe4\xb9\x99'
ab甲乙
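
The effect of the errors parameter can be compared side by side. This is a small sketch; encoding to ASCII is chosen here only because it cannot represent the two Chinese characters:

```python
s = 'ab甲乙'

# Encoding: characters that ASCII cannot represent are handled per scheme.
ignored = s.encode('ascii', errors='ignore')
replaced = s.encode('ascii', errors='replace')
xml_refs = s.encode('ascii', errors='xmlcharrefreplace')
escaped = s.encode('ascii', errors='backslashreplace')
print(ignored)    # b'ab'
print(replaced)   # b'ab??'
print(xml_refs)   # b'ab&#30002;&#20057;'
print(escaped)    # b'ab\\u7532\\u4e59'

# Decoding: b'\xff' is never valid in UTF-8.
bad = b'ab\xff'
print(bad.decode('utf-8', errors='ignore'))    # ab
print(bad.decode('utf-8', errors='replace'))   # ab followed by U+FFFD
```

With the default 'strict', each of these calls would raise a UnicodeError instead.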

4.3.7.2 Searching and counting methods

4.3.7.2.1 count

See "count" in the Sequences section of the Iterable Objects chapter.

4.3.7.2.2 find(str, beg=0, end=len(string))、rfind

Description
Returns the lowest index at which str occurs in the string, or -1 if it is not found. rfind searches from the right and returns the highest index.
The beg and end parameters limit the search range; by default the whole string is searched.
Example

s = ' Python tian \t mao \n taobao '
print(s.find('o'))
print(s.find('ao'))
print(s.find('io'))

5
16
-1

4.3.7.2.3 index、rindex

See "index" in the Sequences section of the Iterable Objects chapter.

4.3.7.3 Formatting methods

4.3.7.3.1 center(width, fillchar)

Description
Returns the string centered in a field of the given width. fillchar is the padding character and defaults to a space.
Example

s = ' Python tian \t mao taobao '
print(f"|{s.center(30)}|")
print(f"|{s.center(30, '*')}|")

|   Python tian 	 mao taobao   |
|** Python tian 	 mao taobao **|

4.3.7.3.2 ljust(width[, fillchar])、rjust

Description
Returns a new string in which the original string is left-aligned (ljust) or right-aligned (rjust) and padded with fillchar to a length of width. fillchar defaults to a space.
Example

s = ' Python tian \t mao taobao '
print(f"|{s.ljust(30)}|")
print(f"|{s.ljust(30, '*')}|")

| Python tian 	 mao taobao     |
| Python tian 	 mao taobao ****|

4.3.7.3.3 zfill(width)

Description
Returns a string of length width in which the original string is right-aligned and padded on the left with 0.
Example

s = ' Python tian \t mao taobao '
print(f"|{s.zfill(30)}|")

|0000 Python tian 	 mao taobao |

4.3.7.4 Joining and splitting methods

4.3.7.4.1 join(seq)

Description
Joins all elements of seq (which must be strings) into a new string, using the string on which join is called as the separator.
Example

seq = ['a', 'b', 'c']
s = '*'.join(seq)
print(s)

a*b*c

4.3.7.4.2 lstrip()、rstrip()

Description
Removes whitespace, or the specified characters, from the left (lstrip) or right (rstrip) end of the string.
Example

s = ' Python tian \t mao taobao '
print(f"|{s.lstrip()}|")
print(f"|{s.rstrip()}|")
print(f"|{s.rstrip('ao ')}|")

|Python tian 	 mao taobao |
| Python tian 	 mao taobao|
| Python tian 	 mao taob|

4.3.7.4.3 split(str="", num=string.count(str))

Description
Splits the string using str as the delimiter. By default it splits on any whitespace (spaces, newlines, tabs, etc.). If num is given, at most num splits are made, producing at most num+1 substrings.
Example

s = ' Python tian \t mao \n taobao '
print(f"|{s.split()}|")
print(f"|{s.split('o')}|")

|['Python', 'tian', 'mao', 'taobao']|
|[' Pyth', 'n tian \t ma', ' \n ta', 'ba', ' ']|

4.3.7.4.4 splitlines([keepends])

Description
Splits the string at line boundaries ('\r', '\r\n', '\n') and returns a list with each line as an element. If keepends is False (the default), the line breaks are not included; if True, they are kept.
Example

s = ' Python tian \t mao \n taobao '
print(f"|{s.splitlines()}|")
print(f"|{s.splitlines(True)}|")

|[' Python tian \t mao ', ' taobao ']|
|[' Python tian \t mao \n', ' taobao ']|

4.3.7.4.5 strip([chars])

Description
Applies both lstrip() and rstrip() to the string.
Example

s = ' Python tian \t mao taobao '
print(f"|{s.strip()}|")
print(f"|{s.strip('ao ')}|")

|Python tian 	 mao taobao|
|Python tian 	 mao taob|

4.3.7.5 Test methods

4.3.7.5.1 startswith(substr, beg=0,end=len(string))、endswith

Description
Checks whether the string starts with the substring substr (endswith checks whether it ends with it). Returns True if so, otherwise False. If beg and end are given, only that range is checked.
Example

s = ' Python tian \t mao taobao '
print(f"|{s.startswith(' ')}|")
print(f"|{s.endswith('a')}|")

|True|
|False|

4.3.7.5.2 islower()、isupper()

Description
Returns True if the string contains at least one cased character and all cased characters are lowercase (islower) or uppercase (isupper); otherwise returns False.
Example

s = ' python tian \t mao 赛车12 '
print(f"|{s.islower()}|")
print(f"|{s.isupper()}|")

|True|
|False|

4.3.7.5.3 isalnum()

Description
Returns True if the string is non-empty and contains only letters and digits, otherwise returns False.
Example

s1 = 'python 赛车12'
print(f"|{s1.isalnum()}|")
s2 = 'python 12'
print(f"|{s2.isalnum()}|")
s3 = 'python12'
print(f"|{s3.isalnum()}|")

|False|
|False|
|True|

4.3.7.5.4 isalpha()

Description
Returns True if the string is non-empty and contains only letters (including characters of scripts such as Chinese and Japanese), otherwise returns False.
Example

s1 = 'python 赛车12'
print(f"|{s1.isalpha()}|")
s2 = 'python 12'
print(f"|{s2.isalpha()}|")
s3 = 'python12'
print(f"|{s3.isalpha()}|")
s4 = 'python赛车'
print(f"|{s4.isalpha()}|")
s5 = 'こんにちは'
print(f"|{s5.isalpha()}|")

|False|
|False|
|False|
|True|
|True|

4.3.7.5.5 isdecimal()

Description
Checks whether the string contains only decimal characters. Returns True if so, otherwise False.
Example

s = 'python 赛车12'
print(f"|{s.isdecimal()}|")
s = 'p12'
print(f"|{s.isdecimal()}|")
s = '12'
print(f"|{s.isdecimal()}|")
s = '0x12'
print(f"|{s.isdecimal()}|")

|False|
|False|
|True|
|False|

4.3.7.5.6 isdigit()、isnumeric()

Description
Returns True if the string contains only digit characters (isdigit) or only numeric characters (isnumeric), otherwise returns False.
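
The three numeric checks (isdecimal, isdigit, isnumeric) differ in which characters they accept. A short comparison, with sample strings chosen for illustration and a hypothetical helper named classify:

```python
def classify(s):
    """Return the results of (isdecimal, isdigit, isnumeric) for s."""
    return (s.isdecimal(), s.isdigit(), s.isnumeric())

for sample in ['12', '²', '½', '一二三', '-1', '1.5']:
    print(repr(sample), classify(sample))

# '12'     (True, True, True)     ordinary decimal digits
# '²'      (False, True, True)    superscript two is a digit, not a decimal
# '½'      (False, False, True)   a fraction is only numeric
# '一二三'  (False, False, True)   Chinese numerals are only numeric
# '-1'     (False, False, False)  the sign is not a digit
# '1.5'    (False, False, False)  the decimal point is not a digit
```

For validating user input that will be passed to int(), isdecimal is usually the closest match, although it still does not accept a sign.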

4.3.7.5.7 isspace()

Description
Returns True if the string is non-empty and contains only whitespace characters (spaces, tabs, newlines, etc.), otherwise returns False.

4.3.7.5.8 istitle()

Description
Returns True if all words in the string are in title format (first letter uppercase, the rest lowercase), otherwise returns False.

4.3.7.6 Conversion methods

4.3.7.6.1 capitalize()

Description
Returns a copy of the string with the first character converted to uppercase and the remaining characters converted to lowercase. If the first character is not a letter, it is left unchanged.
Example

s = ' python Tian 赛车12 '
print(f"|{s.capitalize()}|")
s = 'python Tian 赛车12 '
print(f"|{s.capitalize()}|")

| python tian 赛车12 |
|Python tian 赛车12 |

4.3.7.6.2 lower()、upper()

Description
Converts all uppercase letters in the string to lowercase (lower), or all lowercase letters to uppercase (upper).
Example

s = ' pyTHon Tian 赛车12 '
print(f"|{s.lower()}|")
s = 'python Tian 赛车12 '
print(f"|{s.upper()}|")

| python tian 赛车12 |
|PYTHON TIAN 赛车12 |

4.3.7.6.3 replace(old, new [, max])

Description
Replaces old in the string with new. If max is specified, at most max replacements are made.
Example

s = ' pyTHon Tian 赛车12 '
print(f"|{s.replace('T','*')}|")
print(f"|{s.replace('T','*', 1)}|")
print(f"|{s.replace('Ti','/', 1)}|")

| py*Hon *ian 赛车12 |
| py*Hon Tian 赛车12 |
| pyTHon /an 赛车12 |

4.3.7.6.4 title()

Description
Returns a title-cased copy of the string: every word starts with an uppercase letter and the remaining letters are lowercase.
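
A brief sketch of title() and its companion istitle(). The sample sentences are made up for illustration, and string.capwords is shown only as one possible alternative when apostrophes matter:

```python
import string

titled = 'hello python world'.title()
print(titled)                      # Hello Python World
print(titled.istitle())            # True
print('Hello world'.istitle())     # False

# title() treats every run of letters as a word, so an apostrophe
# starts a new "word" and the letter after it is capitalized.
print("they're here".title())            # They'Re Here
print(string.capwords("they're here"))   # They're Here
```

string.capwords splits on whitespace only, which avoids the apostrophe effect but also collapses runs of whitespace.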

4.3.7.6.5 expandtabs(tabsize)

Description
Converts the tab characters in the string to spaces. The default tab size is 8.
Example

s = '\tTian 赛车12 '
print(f"|{s}|")
print(f"|{s.expandtabs()}|")
print(f"|{s.expandtabs(2)}|")

|	Tian 赛车12 |
|        Tian 赛车12 |
|  Tian 赛车12 |

4.3.7.6.6 swapcase()

Description
Converts uppercase letters in the string to lowercase and lowercase letters to uppercase.

4.3.7.6.7 maketrans()、translate(table)

Description
maketrans creates a translation table for character mapping. In the simplest two-argument form, the first argument is a string of the characters to be converted and the second is a string of the same length giving the corresponding targets. An optional third argument is a string of characters to delete.
translate converts the characters of the string according to the given table.
Together, these two methods make it quick and easy to replace several characters in a string at once.
Example

s = ' python Tian 赛车12 '
x = "yo"
y = "ab"
z = "车"   # characters to delete
mytable = s.maketrans(x, y, z)
print(mytable)
print(s.translate(mytable))

{121: 97, 111: 98, 36710: None}
 pathbn Tian 赛12 
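
maketrans also accepts a single dictionary, which allows a character to be replaced by a longer string. A minimal sketch, with the mapping contents chosen purely for illustration:

```python
# Dictionary form: keys are single characters, values are replacement
# strings (of any length) or None to delete the character.
table = str.maketrans({'y': 'a', 'o': 'b', '车': None})
result = ' python Tian 赛车12 '.translate(table)
print(repr(result))     # ' pathbn Tian 赛12 '

# One character can expand to several, e.g. for simple HTML escaping.
html_table = str.maketrans({'<': '&lt;', '>': '&gt;', '&': '&amp;'})
escaped = '<b>Tom & Jerry</b>'.translate(html_table)
print(escaped)          # &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;
```

Because translate processes the string in one pass, replacements are not applied again to the text they produce.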

Origin blog.csdn.net/crleep/article/details/125758769