part2-2: Python strings and their related methods, the difference between strings and byte strings, and the use of escape characters and format specifiers


A Python string must be enclosed in a pair of quotation marks: single quotes, double quotes, or triple quotes.

I. String basics

1. Strings and escape characters
A string can contain almost any characters, English or Chinese. A string is wrapped in a matching pair of quotes, so when the string itself contains a quote character, it needs special handling:
(1) Enclose the string in a different kind of quote. For example, if double quotes delimit the string, single quotes can appear inside it.
(2) Escape the quote. A backslash (\) in a string escapes special characters.

Example:
str1 = "It's a cat"  # the string contains a single quote, so double quotes are used outside
str2 = 'Python is a "programming" language'  # the string contains double quotes, so single quotes are used outside
str3 = 'Python\'s a "programming" language'  # a backslash escapes the single quote inside the string
str4 = """Python's a "programming" language"""  # a triple-quoted string can contain both single and double quotes

2. String concatenation
s1 = 'hello,' "michael"  # two string literals written next to each other are automatically spliced into one string

The real string concatenation operator in Python is the plus sign (+). For example:
s2 = 'Hello,'
s3 = 'Michael'
print(s2 + s3)  # output: Hello,Michael
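
As a minimal sketch of the difference between the two forms (the variable names here are only illustrative): splicing by adjacency happens only between string literals, whereas + also works on variables.

```python
s2 = 'Hello,'
s3 = 'Michael'

literal = 'Hello,' 'Michael'   # two literals side by side become one string
plus = s2 + s3                 # + concatenates any two str values

print(literal)
print(plus)
print(literal == plus)
```

Writing two variable names side by side (s2 s3) is a syntax error, so + is the form to use once the pieces are stored in variables.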

You can also concatenate strings with the join() method, for example:
"".join((s2, s3))

3. The str and repr functions
Python does not allow a string to be concatenated directly with a number; the number must first be converted to a string. Use str() or repr() to convert a number to a string.

Example:
s1 = "My age:"
age = 25
print(s1 + age)  # concatenating a string and a number directly raises TypeError
print(s1 + str(age))  # convert the number with str() before concatenating, output: My age:25
print(s1 + repr(age))  # convert the number with repr() before concatenating, output: My age:25

str is a built-in Python type, whereas repr() is just a function. repr() returns a value in the form of a Python expression. For example:
print(s1)  # output: My age:
print(repr(s1))  # output: 'My age:'

When repr() converts a string, the result is wrapped in quotes, which is the Python-expression form of the string.

When you enter a variable or expression in the interactive interpreter, Python automatically uses repr() to display its value.
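
A short sketch of the difference, using a sample string chosen only for illustration: str() gives the readable text, while repr() gives a literal that could be pasted back into Python source.

```python
age = 25
text = 'It\'s "quoted"\n'

print(str(age), repr(age))   # for an int both give the same text
print(str(text))             # prints the characters themselves
print(repr(text))            # prints a quoted literal with the escapes visible

# Evaluating the repr of a string gives back an equal string.
round_trip = eval(repr(text))
print(round_trip == text)
```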

4. Getting user input with input and raw_input

The input() function gets user input and accepts an optional prompt message as its argument. input() always returns what the user typed as a string, and the user can type anything. If the input is purely numeric and is needed for arithmetic, it must be converted before the calculation.

In Python 2.x, the raw_input() function does what input() does in Python 3.

In Python 2, input() requires the user's input to be a valid Python expression. For example, a number is entered as 123 and a string as 'abc'; a string must be enclosed in quotes, otherwise an error occurs.
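
The conversion step might look like the sketch below in Python 3. The helper name parse_age is hypothetical, and the text is hard-coded here so that the example runs without a keyboard; in a real program it would come from input().

```python
def parse_age(raw):
    """Convert text as returned by input() to an int (hypothetical helper)."""
    return int(raw.strip())

# In a real program: raw = input("Your age: ")
raw = " 25 "
next_year = parse_age(raw) + 1   # arithmetic works only after the conversion
print(next_year)
```

If the text is not a valid integer, int() raises ValueError, so real code would usually wrap the call in try/except.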

5. Long strings
A long string is delimited by triple quotes and can contain anything: single quotes, double quotes, line breaks, and so on. If a long string is not assigned to any variable, the interpreter effectively ignores it, so it can serve as a comment.

A backslash (\) can also escape a line break, so a string literal written across two lines is actually a single line. For example:
s2 = 'The Linux system is \
very strong!'
The literal for s2 spans two lines in the source, but the resulting string is a single line.

An expression can also be continued on the next line with a backslash (\), for example:
result = 20 * 10 / 2 + 100 - \
    20 * 5
Although the assignment to result spans two lines, it is a single expression.

6. Raw strings
Inside a string the backslash has a special role. To represent a literal backslash you must escape it as a double backslash (\\), which is tedious to write. In that case you can prefix the string with "r" to make it a raw string; a raw string does not treat the backslash as a special character. The following example is a Windows path:
r'D:\Python\python38\Scripts\math.py'

When a raw string contains a quote, the quote still has to be escaped, but the escaping backslash then becomes part of the string. For example:
s3 = r'"What\'s your name?", Michael'
print(s3)  # output: "What\'s your name?", Michael

Because a backslash in a raw string still escapes a quote, a raw string cannot end with a backslash, or the string will not terminate properly. If you must have a backslash at the end, either use a triple-quoted string that is not raw, or write the backslash as a separate literal, for example:
s4 = r'Hello World!' '\\'
print(s4)  # output: Hello World!\
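
To make the raw-string rules concrete, here is a small sketch (the path and variable names are only illustrative) comparing an escaped ordinary literal with its raw equivalent:

```python
normal = 'D:\\Python\\python38'   # every backslash doubled
raw = r'D:\Python\python38'       # raw literal: backslashes taken literally
print(normal == raw)

# A raw literal cannot end in a backslash, so the trailing one is
# written as a separate ordinary literal and spliced by adjacency.
tail = r'Hello World!' '\\'
print(tail)

print(len('\n'), len(r'\n'))      # one newline character vs. backslash + n
```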

7. Byte strings (bytes)
A string (str) consists of characters and is operated on character by character. A byte string (bytes) consists of bytes and is operated on byte by byte.

Apart from that difference, the two types support essentially the same methods, and bytes, like str, is an immutable sequence.

A bytes object simply records data as a sequence of bytes (binary); what the data represents is decided by the program. With a suitable character set, a string can be converted to a byte string, and conversely a byte string can be restored to the corresponding string.

Because bytes stores raw bytes (binary data), bytes objects can be used to transmit data over a network and to store binary files such as images and music.

There are three ways to convert a string to a bytes object:
(1) If the string contains only ASCII characters, prefix the literal with b to build a bytes value.
(2) Call the bytes() function (actually the bytes constructor) to convert a string to bytes using a specified character set. When the argument is a str, the encoding must be given; omitting it raises TypeError.
(3) Call the string's own encode() method to convert it to bytes using a specified character set; if no character set is specified, UTF-8 is used by default.


Examples of creating byte strings:
b1 = bytes()  # create an empty bytes object
b2 = b''  # create an empty bytes object
b3 = b'Python'  # the b prefix in front of the string creates a bytes value
print(b3)  # output: b'Python'
print(b3[0])  # output is the ASCII value of the first letter, P: 80
print(b3[2:5])  # a slice is still a byte string, output: b'tho'

# bytes method call to the object string into bytes, and specify the use of utf-8 character set encoding
b4 = bytes ( "Python programming may be used in many fields", encoding = 'UTF-. 8')
Print (B4) # output bytecode, Chinese characters in UTF-8 encoding, similar output: b'Python \ XE7 \ XBC ... '
Print (b4.decode (' UTF-. 8 ')) # can be used when output decode () method of decoding the character set specified, the contents of the output string at this time

# bytes method does not use direct call the method can also encode a string of the specified character set encoding
b5 = "Python programming may be used in many fields" .encode ( 'UTF-. 8')
Print (B5) similar to the output #: b'Python \ xe7 \ xbc. .. '

Byte strings are very similar to strings, except that each data unit in a byte string is one byte. A byte contains 8 bits, and each bit stores a value of 0 or 1.

Each data unit in a byte string is one byte, i.e. 8 bits. Every 4 bits can be written as one hexadecimal digit (0 to f), so each byte can be written as two hexadecimal digits. That is why the output above contains "\xe7\xbc": "\xe7" is one byte, where \x marks hexadecimal and e7 is the two-digit hexadecimal value.
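
The byte-to-hexadecimal correspondence can be checked with a small sketch. The sample character 编 is an assumption chosen because its UTF-8 encoding starts with the two bytes \xe7\xbc mentioned above.

```python
data = '编'.encode('utf-8')     # one Chinese character, three bytes in UTF-8

print(data)                     # each byte shown as \x plus two hex digits
print(len(data))                # number of bytes, not characters
print(data.hex())               # the same bytes as a plain hex string
print(list(data))               # each byte as an integer from 0 to 255
print([format(b, '08b') for b in data])  # each byte as 8 bits
```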

A bytes object can also be decoded back to a string by calling its decode() method. For example, for the byte string b5:
print(b5.decode('utf-8'))

A note on character sets:
Characters cannot be stored directly at the lowest level of a computer. The solution computer scientists adopted is to number every character: when a program saves data it actually stores the characters' numbers, and when it reads characters it actually reads the numbers and then looks them up in a "number-to-character table" (code table for short) to get the actual characters. What we usually call a character set is the collection of all these character numbers. The original ASCII code table covered only letters, digits, and special symbols, so one byte was enough (ASCII defines 128 characters; 8 bits can number up to 256). Later, other countries numbered their own scripts and developed their own character sets (such as China's GB2312 and GBK), and these character sets are incompatible with one another. To unify the national character sets, the Unicode character set was created, originally using two bytes (16 bits, 65,536 character numbers) to number the world's scripts. UTF-8 and UTF-16, which are what is used in practice, are encodings of the Unicode character set.

In addition, the same number may well represent different characters in different character sets. For example, the number 111 might stand for "中" in one character set and for "国" in another. So for the same string, encoding with different character sets produces different bytes objects.

Example:
b1 = bytes('你好，成都', encoding='utf-8')  # create a byte string, encoding with the utf-8 character set
print(b1)  # output: b'\xe4\xbd\xa0\xe5\xa5\xbd\xef\xbc\x8c\xe6\x88\x90\xe9\x83\xbd'
print(b1.decode('GBK'))  # raises an error or prints garbled text: UTF-8 bytes cannot be decoded as GBK
print(b1.decode('GB2312'))  # raises an error or prints garbled text: UTF-8 bytes cannot be decoded as GB2312
print(b1.decode())  # decodes with the default character set, utf-8
print(b1.decode('utf-8'))  # decodes with utf-8 specified explicitly
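
The following sketch, assuming the sample text 你好，成都, shows that the same string yields different bytes under different character sets, and one defensive way to handle a mismatched decode. Which outcome the try block reports depends on the bytes involved, so the code covers both cases rather than assuming one.

```python
text = '你好，成都'
utf8_bytes = text.encode('utf-8')
gbk_bytes = text.encode('gbk')
print(len(utf8_bytes), len(gbk_bytes))   # same text, different byte counts
print(utf8_bytes == gbk_bytes)

# Decoding with the wrong character set either fails or yields garbled text.
try:
    wrong = utf8_bytes.decode('gbk')
    outcome = 'garbled text'
except UnicodeDecodeError:
    outcome = 'UnicodeDecodeError'
print('decoding UTF-8 bytes as GBK gives:', outcome)

# errors='replace' never raises; undecodable bytes become U+FFFD.
lenient = utf8_bytes.decode('gbk', errors='replace')
print(lenient == text)
```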

II. Using strings

The string is one of the most commonly used types in every programming language, so it is very important.

1. Escape characters
A backslash (\) in a string performs escaping. If the string itself contains a backslash, escape it with a double backslash (\\). Besides this, Python supports the following escape characters:
\b  backspace
\n  newline
\r  carriage return
\t  tab
\"  double quote
\'  single quote
\\  backslash


Example:
s1 = 'Hello\nworld\nHello\npython!'  # newline characters (\n) added to the string
s2 = 'name\t\tprice\t\tnumber'  # tab characters added to the string
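
A brief sketch of what escapes mean in practice, with illustrative sample strings: each escape sequence is written with two characters in the source but stands for a single character in the string.

```python
s1 = 'Hello\nworld'
s2 = 'name\tprice'

print(s1)                      # printed on two lines
print(s2)                      # the columns are separated by a tab
print(repr(s1))                # repr() shows the escape instead of acting on it
print(len('\n'), len('\t'), len('\\'), len('\''))  # each is one character
```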

2. String formatting

Python uses "%" to format various types of data for output, as in the following code:
num = 100
print("num is: %s" % num)

The code above prints formatted output. The argument to print has three parts: the first is the format string (the template), in which "%s" is a placeholder that will be replaced by the value of the variable or expression in the third part; the second part is the fixed separator "%"; the third part supplies the value that replaces the placeholder in the first part.

The "%s" in the format string is a conversion specifier; it acts as a placeholder and is replaced by the value of the variable or expression that follows. "%s" specifies that the value is converted to a string with str(). When the format string contains several "%s" placeholders, the third part must supply the same number of values, placed in parentheses. For example:
name = 'Michael'
age = 25
print("%s is %s years old!" % (name, age))  # two placeholders in the first part, so the third part supplies two values

The following summarizes Python's conversion specifiers:
d, i  convert to a signed decimal integer
o  convert to a signed octal integer
x, X  convert to a signed hexadecimal integer
e, E  convert to a floating-point number in scientific notation
f, F  convert to a decimal floating-point number
g  choose automatically between f and e format
G  choose automatically between F and E format
c  convert to a single character (accepts only an integer or a single-character string)
r  convert the variable or expression to a string using repr()
s  convert the variable or expression to a string using str()
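
To see the specifiers side by side, the sketch below applies each one to a sample value (the values are arbitrary choices for illustration):

```python
samples = [
    ('%d', 42), ('%o', 8), ('%x', 255), ('%X', 255),
    ('%e', 1234.5), ('%f', 1234.5),
    ('%g', 1234.5), ('%g', 0.00001234),   # %g picks f or e style by magnitude
    ('%c', 65), ('%r', 'hi'), ('%s', 'hi'),
]

for spec, value in samples:
    print(spec, '->', spec % value)
```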


These conversion specifiers can be given a minimum width for the converted value. For example:
num = -29
print("num is: %6i" % num)  # signed decimal integer, width 6, output: num is:    -29
print("num is: %6d" % num)  # same as above
print("num is: %6o" % num)  # octal integer, width 6, output: num is:    -35
print("num is: %6x" % num)  # hexadecimal integer, width 6, output: num is:    -1d
print("num is: %6X" % num)  # hexadecimal integer with uppercase digits, width 6, output: num is:    -1D
print("num is: %6s" % num)  # converted to a string with str(), width 6, output: num is:    -29

With a minimum width of 6, when the converted value is narrower than 6 characters, spaces are added automatically to pad it. By default the output is right-aligned, with the spaces added on the left. Python also allows flags before the minimum width to change this behavior. There are three such flags, and they can be combined:
-: left-align the value.
+: always show the sign, "+" for positive values and "-" for negative values.
0: pad with zeros instead of spaces.


For example:
num = 39
print("num is: %06d" % num)  # minimum width 6, padded on the left with zeros, output: num is: 000039
print("num is: %+06d" % num)  # minimum width 6, padded on the left with zeros, sign always shown, output: num is: +00039
print("num is: %-6d" % num)  # minimum width 6, left-aligned, output: num is: 39 (followed by four spaces)

When converting floating-point numbers, you can specify the number of digits after the decimal point; when converting strings, you can specify the maximum number of characters kept. This setting is called the precision; it is written after the minimum width, separated from it by a period (.). For example:
pi = 3.1415926
print("pi is: %10.4f" % pi)  # minimum width 10, 4 decimal places, right-aligned, output: pi is:     3.1416
print("pi is: %010.4f" % pi)  # minimum width 10, 4 decimal places, padded on the left with zeros, output: pi is: 00003.1416
print("pi is: %+010.4f" % pi)  # minimum width 10, 4 decimal places, padded with zeros, sign always shown, output: pi is: +0003.1416
name = 'Michael'
print("the name is: %.4s" % name)  # keep 4 characters, output: the name is: Mich
print("the name is: %8.2s" % name)  # keep 2 characters, minimum width 8, output: the name is:       Mi

III. Sequence-related methods

A string is made up of characters, so its characters can be accessed by index: given an index, you get the character at that position.

The index is written in square brackets. String indexes start at 0: the first character has index 0, the second has index 1, and so on. Indexes can also count from the end: the last character has index -1, the second-to-last has index -2, and so on.

If the index is beyond the length of the string, an IndexError is raised. For example:
s1 = "Python is very good!"  # define a string
print(s1[3])  # the character of s1 at index 3, output: h
print(s1[-5])  # the 5th character of s1 counting from the right, output: g
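
A minimal sketch of the index rules, using the same sample string: valid indexes run from 0 to len(s1) - 1 (or from -len(s1) to -1), and anything outside that range raises IndexError.

```python
s1 = "Python is very good!"

print(s1[0], s1[-1])                 # first and last character
print(s1[-1] == s1[len(s1) - 1])     # -1 is shorthand for len(s1) - 1

try:
    s1[len(s1)]                      # one past the last valid index
    raised = False
except IndexError as exc:
    raised = True
    print('IndexError:', exc)
```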

A range (slice) in square brackets gets a run of characters from the string, usually called a substring. With a range, the left index is included and the right index is excluded. For example:
print(s1[3:6])  # substring of s1 from index 3 to index 6 (excluded), output: hon
print(s1[3:-8])  # substring of s1 from index 3 to the 8th character from the end (excluded), output: hon is ve
print(s1[-8:-4])  # substring of s1 from the 8th to the 4th character from the end (excluded), output: ry g

When slicing, the start or end index may be omitted. Omitting the start index slices from the beginning of the string; omitting the end index slices to the end of the string. For example:
print(s1[8:])  # substring of s1 from index 8 to the end, output: s very good!
print(s1[-8:])  # substring of s1 from the 8th character from the end to the end, output: ry good!
print(s1[:6])  # substring of s1 from the beginning to index 6 (excluded), i.e. the first 6 characters, output: Python
print(s1[:-6])  # substring of s1 from the beginning to the 6th character from the end (excluded), output: Python is very

Python strings support the in operator for testing whether a string contains a substring; the result is True or False. For example:
print('Python' in s1)  # output: True
print('Java' not in s1)  # not in tests for absence, output: True

The len() function returns the length of a string. The min() and max() functions return the smallest and largest characters in a string. For example:
print(len(s1))  # output: 20
print(len('Python'))  # output: 6
print(max(s1))  # the largest character in the string, output: y
print(min(s1))  # the smallest character in the string, output is a space
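
"Largest" and "smallest" here refer to the characters' code points, which ord() exposes. A small sketch with the same sample string:

```python
s1 = "Python is very good!"

# Code points of a few characters that occur in s1.
print(ord(' '), ord('!'), ord('P'), ord('y'))

print(repr(min(s1)), repr(max(s1)))
print(max(s1) == max(s1, key=ord))   # comparing characters compares code points
```

Because uppercase letters have smaller code points than lowercase ones, 'P' sorts before every lowercase letter in the string.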

IV. Case-related methods

Python strings are instances of the built-in str class, which contains many methods. Python is "self-documenting"; to make use of its built-in help, get familiar with these two functions:
dir(): lists everything a given class or module contains (functions, methods, classes, variables, and so on).
help(): shows the help documentation for a function or method.


To see everything the str class contains, enter this command in the interactive interpreter:
dir(str)
In the list of str methods, those that begin and end with "__" are special methods that by convention are not meant to be called directly from outside. To see how a method is used, call the help() function:
help(str.title)

The commonly used case-related methods of the str class are:
(1) title(): capitalizes the first letter of each word and lowercases the other letters.
(2) lower(): converts the whole string to lowercase.
(3) upper(): converts the whole string to uppercase.


For example:
s = 'hLLEO World'  # define a string
print(s.title())  # first letter of each word capitalized, the rest lowercase, output: Hlleo World
print(s.lower())  # all letters lowercase, output: hlleo world
print(s.upper())  # all letters uppercase, output: HLLEO WORLD

V. Removing whitespace
The commonly used str methods for removing whitespace are:
(1) strip(): removes whitespace at the beginning and end of the string.
(2) lstrip(): removes whitespace at the beginning (left) of the string.
(3) rstrip(): removes whitespace at the end (right) of the string.


Note that Python strings are immutable: whatever method is used to "modify" a string, the result is a new copy and the original string is unaffected.
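
Immutability can be observed directly, as in this sketch with an illustrative string: methods return new strings, and assigning to an index is rejected.

```python
s = 'hello'
upper = s.upper()
print(s, upper)        # the original is unchanged

try:
    s[0] = 'H'         # item assignment is not supported for str
    mutated = True
except TypeError:
    mutated = False
print(mutated)

s = 'H' + s[1:]        # rebinding the name to a new string is allowed
print(s)
```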

To view the help for the whitespace-removal method:
help(str.strip)
The help shows that strip() also accepts an argument; when one is passed, it removes the characters in that argument from both ends of the string. lstrip() and rstrip() accept an argument in the same way.

For example:
s = "  python is very strong!  "
print(s.strip())  # removes whitespace on both sides, output: python is very strong!
print(s.lstrip())  # removes whitespace on the left
print(s.rstrip())  # removes whitespace on the right
s2 = "python is very strong!"
print(s2.strip('pio!'))  # removes the characters p, i, o, ! from both ends, output: ython is very strong
print(s2.lstrip('pio!'))  # removes the characters p, i, o, ! from the left end
print(s2.rstrip('pio!'))  # removes the characters p, i, o, ! from the right end
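
One point that is easy to miss: the argument is treated as a set of characters, not as a prefix or suffix. The sketch below uses a made-up host name to show the effect.

```python
url = 'www.example.com'

# Every leading/trailing character found in the set is removed,
# regardless of the order in which the characters are listed.
print(url.strip('w.com'))
print(url.lstrip('w.'))
print(url.rstrip('.com'))
```

Stripping stops at the first character that is not in the set, which is why the "e" of "example" survives on both sides.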

VI. Search and replace methods

The str class provides the following methods for searching and replacing:
(1) startswith(): tests whether the string starts with the specified substring.
(2) endswith(): tests whether the string ends with the specified substring.
(3) find(): returns the position where the specified substring occurs in the string, or -1 if the substring is not found.
(4) index(): returns the position where the specified substring occurs in the string, and raises ValueError if the substring is not found.
(5) replace(): replaces the target substring with the specified substring. The optional third parameter limits the number of replacements.
(6) translate(): replaces characters in the string according to the specified translation table.


For example:
s = "python is very strong!"
print(s.startswith('py'))  # tests whether s starts with py, output: True
print(s.endswith('rong!'))  # tests whether s ends with rong!, output: True
print(s.find('on'))  # position where on occurs in s (indexes start at 0), output: 4
print(s.index('on'))  # same as above, but raises an error if on is not found, output: 4
print(s.find('on', 5))  # position where on occurs, searching from index 5, output: 18
print(s.index('on', 5))  # same search from index 5, raising an error if not found, output: 18
print(s.replace('on', 'AA', 1))  # replaces on with AA; the third argument 1 means replace once, output: pythAA is very strong!
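
The practical difference between find() and index() shows up when the substring is missing. A minimal sketch with the same sample string:

```python
s = 'python is very strong!'

print(s.find('on'), s.find('on', 5), s.find('java'))   # -1 means "not found"

try:
    s.index('java')          # index() raises instead of returning -1
    found = True
except ValueError:
    found = False
print(found)

print(s.replace('on', 'AA'))       # replaces every occurrence
print(s.replace('on', 'AA', 1))    # replaces only the first occurrence
```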

table = {111: 945, 112: 946, 116: 964}  # translation table: 111 (o) -> 945 (α), 112 (p) -> 946 (β), 116 (t) -> 964 (τ)
print(s.translate(table))  # replaces characters using the translation table, output: βyτhαn is very sτrαng!

In the example above, the translate() method of the str class replaces characters according to a translation table. That table was written by hand, and in real development writing such a table by hand is too cumbersome. The str class provides a maketrans() method that makes it easy to create a translation table.
table = str.maketrans('opt', 'αβτ')  # create a translation table with the maketrans method of str
table  # output: {111: 945, 112: 946, 116: 964}

A translation table defines the correspondence between characters, but its keys are not the characters themselves; they must be the characters' code points (integers).
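
As a hedged sketch of two further options (the sample characters are arbitrary): str.maketrans() accepts an optional third argument listing characters to delete, and a hand-written table may map a code point to a string or to None.

```python
# Third argument: characters that translate() should remove.
table = str.maketrans('abc', '123', '!')
print(table)
print('aabbcc!'.translate(table))

# A hand-written table: keys are code points, values may be
# code points, strings, or None (None deletes the character).
manual = {ord('a'): 'A', ord('b'): None}
print('abc'.translate(manual))
```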

In Python 2.x the str class has no maketrans() method; it is provided by the string module instead. For example, in Python 2.x:
import string
table = string.maketrans('abc', '123')  # the result is not human-readable, but it can be passed to translate()

VII. Splitting and joining methods

The str class provides split() for splitting and join() for joining.
(1) split(): splits the string into several strings at the specified delimiter.
(2) join(): joins several strings into a single string.


For example:
s = "python is very strong!"
print(s.split())  # split on whitespace, output: ['python', 'is', 'very', 'strong!']
print(s.split(None, 2))  # split on whitespace, at most 2 splits, output: ['python', 'is', 'very strong!']
print(s.split('is'))  # split on the specified delimiter, output: ['python ', ' very strong!']
alist = s.split()  # assign the result of the split to the variable alist
print(':'.join(alist))  # join with a colon as the connector, output: python:is:very:strong!

As the example shows, the split() and join() methods of the str class are inverse operations: split() divides a string into several substrings, and join() joins several substrings into a single string.
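
The round trip can be checked with a small sketch (sample strings are illustrative). Note one caveat: splitting without an argument collapses runs of whitespace, so the round trip is exact only when the same explicit delimiter is used for both steps.

```python
s = 'python is very strong!'
words = s.split()
print(words)
print(' '.join(words) == s)          # exact here: words are single-spaced

messy = 'a  b   c'
print(messy.split())                 # runs of whitespace collapse
print(messy.split(' '))              # explicit delimiter keeps empty strings
print(' '.join(messy.split(' ')) == messy)
```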


Origin www.cnblogs.com/Micro0623/p/11410542.html