Python - composite data types

Today I'm going to introduce Python's composite data types. The material
was not easy to organize, so I hope for your support. Readers are welcome to comment, like and bookmark.
Thank you!

knowledge points

  • Basic concepts of composite data types
  • List types: definition, index, slice
  • List type operations: list operation functions, list operation methods
  • Dictionary types: definition, index
  • Operation of dictionary type: dictionary operation function, dictionary operation method

1. The basic concept of combined data types

1.1 Composite data types

  • Python has three most commonly used kinds of composite data types: the set (collection) type, sequence types, and mapping types.
  • The set type is the name of one specific data type, while "sequence type" and "mapping type" are general terms for families of data types.
  • A set is an unordered collection of elements in which each element appears at most once.
  • A sequence is a one-dimensional vector of elements with a defined order between them. Elements are accessed by index, and the same value may appear more than once. Typical sequence types are the string type and the list type.
  • A mapping is a combination of "key-value" data items. Each element is a key-value pair, written (key, value). The typical mapping type is the dictionary type.

1.2 Overview of Collection Types

  • The set type in Python matches the concept of a set in mathematics: an unordered combination of 0 or more data items.
  • A set is written with braces ({}). It has no notion of index or position, and elements can be added or removed dynamically.
  • Elements of a set cannot repeat, and they must be of immutable (hashable) data types such as integers, floating-point numbers, strings, and tuples. Lists, dictionaries, and sets themselves are mutable and cannot appear as elements of a set.
S = {1010, "1010", 78.9}
print(type(S))
# <class 'set'>
print(len(S))
# 3
print(S)
# {78.9, 1010, '1010'}
  • Note that because set elements are unordered, the order in which a set prints may differ from the order in which it was defined. Because set elements are unique, converting data to a set filters out duplicates.
T = {1010, "1010", 12.3, 1010, 1010}
print(T)
# {1010, '1010', 12.3}
  • The set type has 4 operators: intersection (&), union (|), difference (-), and symmetric difference (^). Their logic matches the mathematical definitions.
S = {1010, "1010", 78.9}
T = {1010, "1010", 12.3, 1010, 1010}
print(S - T)
# {78.9}
print(T - S)
# {12.3}
print(S & T)
# {1010, '1010'}
print(T & S)
# {1010, '1010'}
print(S ^ T)
# {78.9, 12.3}
print(T ^ S)
# {78.9, 12.3}
print(S | T)
# {78.9, 1010, 12.3, '1010'}
print(T | S)
# {1010, 12.3, 78.9, '1010'}
  • The set type has some common operation functions and methods, such as S.add(x), S.remove(x), S.clear(), len(S), and x in S.
  • The set type is mainly used for element deduplication, and set(x) accepts any composite data type as input.
S = set('知之为知之不知为不知')
print(S)
# {'不', '为', '之', '知'}
for i in S:
        print(i, end="")
# 不为之知
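
As a rough sketch of the common set operations mentioned above, the following shows add(), discard(), and remove() together with deduplication of a list. The variable names and sample values (such as scores) are illustrative assumptions, not taken from the original article.

```python
# Common set operations (a sketch; the printed order of a set may vary)
S = {1010, "1010", 78.9}

S.add("Python")        # add one element
S.discard(78.9)        # remove if present, no error if absent
print("1010" in S)     # membership test
print(len(S))          # number of elements

try:
    S.remove("missing")  # remove() raises KeyError when the element is absent
except KeyError:
    print("remove() raised KeyError")

# Deduplicate a list and get a sorted list back
scores = [90, 85, 90, 70, 85]
unique_scores = sorted(set(scores))
print(unique_scores)
```

Using discard() instead of remove() is a reasonable choice when it is not known whether the element is present.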

1.3 Overview of sequence types

  • A sequence is a one-dimensional vector of elements with an order between them. Elements are accessed by index.
  • Because elements are ordered, a sequence may contain elements with the same value at different positions. Many Python data types are sequence types. The most important are the string type and the list type, along with the tuple type.
  • A string can be regarded as an ordered combination of single characters, so it is a sequence type. A list is a sequence type whose elements may be of several different types. All sequence types share the same indexing system: forward indexes that increase from 0, and backward indexes that decrease from -1.
  • Sequence types have some common operators and functions
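
A minimal sketch of the operators and functions shared by sequence types, applied to a string and a list. The sample values are illustrative assumptions.

```python
s = "Python"
ls = [1010, "1010", 1010]

print("th" in s)        # True: membership test
print(1010 not in ls)   # False
print(ls + ["new"])     # concatenation builds a new list
print(ls * 2)           # repetition
print(s[0], s[-1])      # forward and backward indexing: P n
print(len(ls))          # 3
print(ls.index(1010))   # 0: index of the first occurrence
print(ls.count(1010))   # 2: number of occurrences
```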

1.4 Overview of Mapping Types

  • A mapping is a combination of "key-value" data items. Each element is a key-value pair, (key, value), and elements are not accessed by position. A key-value pair is a binary relationship, derived from the mapping between an attribute and its value.

  • Mapping types can be seen as an extension of sequence types. In a sequence, values are indexed by integer positions that start at 0 and increase. In a mapping, the user defines the index, that is, the key, which is used to look up the specific value.

  • The key represents an attribute, and can also be understood as a category or item. The value is the content of that attribute. A key-value pair thus describes an attribute and its value, and gives the mapping relationship a structure for storage and expression.

2. List type

2.1 Definition of list

  • A list is an ordered sequence of 0 or more elements and belongs to the sequence types. Elements of a list can be added, deleted, replaced, and searched. A list has no length limit, its elements may be of different types, and its length does not need to be declared in advance.
  • The list type is written with square brackets ([]). A set or a string can also be converted into a list with the list(x) function.
ls = [1010, "1010", [1010, "1010"], 1010]
print(ls)
# [1010, '1010', [1010, '1010'], 1010]
print(list('列表可以由字符串生成'))
# ['列', '表', '可', '以', '由', '字', '符', '串', '生', '成']
print(list())
# []
  • The list belongs to the sequence type, so the list type supports the operations corresponding to the sequence type

2.2 List index

  • Indexing is the basic operation of lists, used to obtain an element of the list. Use square brackets as the indexing operator.
ls = [1010, "1010", [1010, "1010"], 1010]
print(ls[3])
# 1010
print(ls[-2])
# [1010, '1010']
print(ls[5])
# Traceback (most recent call last):
#   File "<pyshell#35>", line 1, in <module>
#     ls[5]
# IndexError: list index out of range
  • You can use a for loop to traverse the elements of a list. The basic usage is as follows:
for <loop variable> in <list variable>:
    <statement block>
ls = [1010, "1010", [1010, "1010"], 1010]
for i in ls:
    print(i*2)
# 2020
# 10101010
# [1010, '1010', 1010, '1010']
# 2020

2.3 Slicing of lists

  • Slicing is a basic list operation used to obtain a fragment of a list, that is, zero or more elements. The result of a slice is also a list. Slices can be written in two ways:
<list or list variable>[N: M]
<list or list variable>[N: M: K]
  • A slice takes the elements at indexes N up to M (not including M) and forms a new list. When K is given, the slice takes the elements from N up to M (not including M) with step size K.
ls = [1010, "1010", [1010, "1010"], 1010]
print(ls[1:4])
# ['1010', [1010, '1010'], 1010]
print(ls[-1:-3])
# []
print(ls[-3:-1])
# ['1010', [1010, '1010']]
print(ls[0:4:2])
# [1010, [1010, '1010']]
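
A short sketch of a few more slice forms, using the same list as above. It assumes the standard behavior that an omitted N or M means "from the start" or "to the end", and that a negative K walks backwards, which is why ls[-1:-3] with the default step of 1 is empty.

```python
ls = [1010, "1010", [1010, "1010"], 1010]

print(ls[:2])        # [1010, '1010']
print(ls[2:])        # [[1010, '1010'], 1010]
print(ls[::-1])      # reversed copy: [1010, [1010, '1010'], '1010', 1010]
print(ls[-1:-3:-1])  # walk backwards from -1 to -3: [1010, [1010, '1010']]
```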

3. List type operations

3.1 List operation functions

  • The list type inherits the characteristics of the sequence type and has some general operation functions
ls = [1010, "1010", [1010, "1010"], 1010]
print(len(ls))
# 4 
lt = ["Python", ["1010", 1010, [1010, "Python"]]]
print(len(lt))
# 2
  • min(ls) and max(ls) return the smallest or largest element of a list respectively. The premise of using these two functions is that the types of elements in the list can be compared.
ls = [1010, 10.10, 0x1010]
print(min(ls))
# 10.1
lt = ["1010", "10.10", "Python"]
print(max(lt))
# Python
ls = ls + lt
print(ls)
# [1010, 10.1, 4112, '1010', '10.10', 'Python']
print(min(ls))
# Traceback (most recent call last):
#   File "<pyshell#15>", line 1, in <module>
#     min(ls)
# TypeError: '<' not supported between instances of 'str' and 'float'
  • list(x) converts the variable x into a list, where x can be a string, a set, or a dictionary. For a dictionary, the resulting list contains the keys.
print(list("Python"))
# ['P', 'y', 't', 'h', 'o', 'n']
print(list({"小明", "小红", "小白", "小新"}))
# ['小红', '小明', '小新', '小白']
print(list({"201801":"小明", "201802":"小红", "201803":"小白"}))
# ['201801', '201802', '201803']

3.2 List operation methods

  • The list type has some operation methods. The syntax for using them is:
<list variable>.<method name>(<method arguments>)

  • ls.append(x) adds an element x to the end of the list ls.
lt = ["1010", "10.10", "Python"]
lt.append(1010)
print(lt)
# ['1010', '10.10', 'Python', 1010]
lt.append([1010, 0x1010])
print(lt)
# ['1010', '10.10', 'Python', 1010, [1010, 4112]]
  • ls.append(x) adds only one element to the list. To add several elements, use the plus sign to merge two lists.
lt = ["1010", "10.10", "Python"]
ls = [1010, [1010, 0x1010]]
lt += ls
print(lt)
# ['1010', '10.10', 'Python', 1010, [1010, 4112]]
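
To make the difference between adding one element and adding several more concrete, here is a small sketch comparing append(), the standard list method extend(), and the + operator. The extend() method is not covered in the text above and is shown here only as a related alternative; the sample values are illustrative.

```python
lt = ["1010", "10.10"]

lt.append([1, 2])      # adds ONE element: the list [1, 2] itself
lt.extend([3, 4])      # adds each item of [3, 4] separately
merged = lt + ["end"]  # + builds a new list and leaves lt unchanged

print(lt)      # ['1010', '10.10', [1, 2], 3, 4]
print(merged)  # ['1010', '10.10', [1, 2], 3, 4, 'end']
```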
  • ls.insert(i, x) inserts element x at index i of the list ls. The elements from index i onward each shift one position to the right.
lt = ["1010", "10.10", "Python"]
lt.insert(1, 1010)
print(lt)
# ['1010', 1010, '10.10', 'Python']
  • ls.clear() deletes all elements of the list ls and clears the list.
lt = ["1010", "10.10", "Python"]
lt.clear()
print(lt)
# []
  • ls.pop(i) returns the element at index i of the list ls and deletes it from the list.
lt = ["1010", "10.10", "Python"]
print(lt.pop(1))
# 10.10
print(lt)
# ['1010', 'Python']
  • ls.remove(x) will remove the first occurrence of x in the list ls.
lt = ["1010", "10.10", "Python"]
lt.remove("10.10")
print(lt)
# ["1010", "Python"]
  • In addition to the above methods, you can use the Python reserved word del to delete list elements or fragments, as follows:
del <list variable>[<index>]
del <list variable>[<start index>: <end index>]
lt = ["1010", "10.10", "Python"]
del lt[1]
print(lt)
# ['1010', 'Python']
lt = ["1010", "10.10", "Python"]
del lt[1:]
print(lt)
# ['1010']
  • ls.reverse() reverses the order of the elements of the list ls in place. It returns None, so print the list itself afterwards.
lt = ["1010", "10.10", "Python"]
lt.reverse()
print(lt)
# ['Python', '10.10', '1010']
  • ls.copy() copies all elements of ls to generate a new list.
lt = ["1010", "10.10", "Python"]
ls = lt.copy()
lt.clear()  # clear lt
print(ls)
# ['1010', '10.10', 'Python']
  • As the example shows, lt.copy() creates a copy of lt that is assigned to ls, so clearing lt does not affect ls.
  • Note that for basic data types such as integers or strings, plain assignment with the equal sign behaves like a copy. For lists it does not: the statement ls = lt does not copy the elements of lt into ls, it only creates a new reference, so ls and lt point to the same content.
lt = ["1010", "10.10", "Python"]
ls = lt  # plain assignment only
lt.clear()
print(ls)
# []
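
One caveat worth illustrating: .copy() makes a shallow copy, so nested lists inside the copy are still shared with the original. The sketch below contrasts it with copy.deepcopy() from the standard library, which is not discussed in the text above; the variable names shallow and deep are illustrative.

```python
import copy

lt = [1010, [1010, "1010"]]
shallow = lt.copy()        # new outer list, but the inner list is shared
deep = copy.deepcopy(lt)   # fully independent copy

lt[1].append("new")        # modify the nested list through lt

print(shallow)  # [1010, [1010, '1010', 'new']]
print(deep)     # [1010, [1010, '1010']]
print(shallow is lt, shallow[1] is lt[1])  # False True
```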
  • List elements can be modified using an index with the equal sign (=).
lt = ["1010", "10.10", "Python"]
lt[1] = 1010
print(lt)
# ["1010", 1010, "Python"]
  • The list is a very flexible data structure: it handles arbitrary lengths and mixed types and provides a rich set of basic operators and methods. When a program needs a composite data type to manage a batch of data, prefer the list type.

4. Dictionary type

4.1 Dictionary definition

  • "Key-value pairs" are an important way of organizing data and are widely used in Web systems. The basic idea is to associate a "value" with a "key" and then use the key to find the corresponding value. This process is called mapping. In Python, mapping is implemented by the dictionary type.
  • Dictionaries in Python are created with braces {}, and each element is a key-value pair, written as follows:
{<key 1>:<value 1>, <key 2>:<value 2>, ..., <key n>:<value n>}
  • A key and its value are joined by a colon, and different key-value pairs are separated by commas. Like a set, a dictionary cannot contain duplicate keys, and its elements are not accessed by position. (Since Python 3.7, a dictionary does remember the order in which keys were inserted.)
  • In the example below, the variable d can be regarded as a mapping from "student number" to "name".
d = {"201801":"小明", "201802":"小红", "201803":"小白"}
print(d)
# {'201801': '小明', '201802': '小红', '201803': '小白'}

4.2 Indexing of dictionaries

  • Lists are indexed by the position of each element. In a dictionary, the key of each key-value pair is the index of its value, so the key can be used directly to index the element.
  • The indexing pattern for key-value pairs in a dictionary uses square brackets, as follows:
<value> = <dictionary variable>[<key>]
d = {"201801":"小明", "201802":"小红", "201803":"小白"}
print(d["201802"])
# 小红
  • Each element in the dictionary can be modified by using index and assignment (=).
d["201802"] = '新小红'
print(d)
# {'201801': '小明', '201802': '新小红', '201803': '小白'}
  • An empty dictionary can be created with braces. Elements can then be added to it by combining indexing and assignment.
t = {}
t["201804"] = "小新"
print(t)
# {'201804': '小新'}
  • A dictionary is a data structure that stores a variable number of key-value pairs. Values can be of any data type, while keys must be immutable (hashable) types such as strings, numbers, or tuples. Values are looked up by key and can be modified by key.
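
One caveat worth checking: dictionary keys need to be hashable, so a string or a tuple works as a key while a list does not. The sketch below demonstrates this; the keys and values are illustrative.

```python
d = {}
d["201801"] = "小明"        # string key
d[(2018, 1)] = "tuple key"  # a tuple of immutable items is also a valid key

try:
    d[[2018, 1]] = "list key"   # a list is mutable, so it cannot be a key
except TypeError as err:
    print("TypeError:", err)

print(len(d))  # 2
```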

5. Dictionary type operations

5.1 Operation functions of dictionaries

  • The dictionary type has some common operation functions
  • len(d) gives the number of elements in the dictionary d, also known as the length.
d = {"201801":"小明", "201802":"小红", "201803":"小白"}
print(len(d))
# 3
  • min(d) and max(d) return the smallest and largest key in dictionary d, respectively.
d = {"201801":"小明", "201802":"小红", "201803":"小白"}
print(min(d))
# 201801
print(max(d))
# 201803
  • The dict() function is used to generate an empty dictionary, which is consistent with {}.
d = dict()
print(d)
# {}

5.2 Operation method of dictionary

  • The dictionary type has some operation methods. The syntax for using them is:
<dictionary variable>.<method name>(<method arguments>)

  • d.keys() returns all the key information in the dictionary, and the return result is an internal data type of Python, dict_keys, which is dedicated to representing the keys of the dictionary. If you want to make better use of the returned result, you can convert it to a list type.
d = {"201801":"小明", "201802":"小红", "201803":"小白"}
print(d.keys())
# dict_keys(['201801', '201802', '201803'])
print(type(d.keys()))
# <class 'dict_keys'>
print(list(d.keys()))
# ['201801', '201802', '201803']
  • d.values() returns all the values in the dictionary. The result is an internal Python data type, dict_values. To make better use of the result, you can convert it to a list.
d = {"201801":"小明", "201802":"小红", "201803":"小白"}
print(d.values())
# dict_values(['小明', '小红', '小白'])
print(type(d.values()))
# <class 'dict_values'>
print(list(d.values()))
# ['小明', '小红', '小白']
  • d.items() returns all key-value pair information in the dictionary, and the return result is an internal data type dict_items of Python.
d = {"201801":"小明", "201802":"小红", "201803":"小白"}
print(d.items())
# dict_items([('201801', '小明'), ('201802', '小红'), ('201803', '小白')])
print(type(d.items()))
# <class 'dict_items'>
print(list(d.items()))
# [('201801', '小明'), ('201802', '小红'), ('201803', '小白')]
  • d.get(key, default) looks up and returns the value for the given key. If the key exists, the corresponding value is returned; otherwise the default is returned. The second argument, default, can be omitted, in which case it is None.
d = {"201801":"小明", "201802":"小红", "201803":"小白"}
print(d.get('201802'))
# 小红
print(d.get('201804'))
# None
print(d.get('201804', '不存在'))
# 不存在
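
To show why get() is often preferred for keys that may be missing, here is a small sketch contrasting it with square-bracket indexing, which raises KeyError for a missing key. The fallback string "unknown" is an illustrative assumption.

```python
d = {"201801": "小明", "201802": "小红"}

try:
    d["201804"]                    # indexing a missing key raises KeyError
except KeyError as err:
    print("KeyError:", err)

print(d.get("201804"))             # None
print(d.get("201804", "unknown"))  # unknown
print(d)                           # unchanged: get() never adds the key
```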
  • d.pop(key, default) looks up the value for the given key and removes it from the dictionary. If the key exists, the corresponding value is returned; otherwise the default is returned. If default is omitted and the key does not exist, a KeyError is raised. Unlike d.get(), d.pop() deletes the key-value pair from the dictionary after taking out the value.
d = {"201801":"小明", "201802":"小红", "201803":"小白"}
print(d.pop('201802'))
# 小红
print(d)
# {'201801': '小明', '201803': '小白'}
print(d.pop('201804', '不存在'))
# 不存在
  • d.popitem() takes one key-value pair out of the dictionary and returns it as a tuple (key, value), removing the pair from the dictionary. Since Python 3.7 the pair removed is the most recently inserted one; in earlier versions it was arbitrary.
d = {"201801":"小明", "201802":"小红", "201803":"小白"}
print(d.popitem())
# ('201803', '小白')
print(d)
# {'201801': '小明', '201802': '小红'}
  • d.clear() deletes all key-value pairs in the dictionary.
d = {"201801":"小明", "201802":"小红", "201803":"小白"}
d.clear()
print(d)
# {}
  • In addition, if you want to delete an element in the dictionary, you can use the Python reserved word del.
d = {"201801":"小明", "201802":"小红", "201803":"小白"}
del d["201801"]
print(d)
# {'201802': '小红', '201803': '小白'}
  • The dictionary type also supports the reserved word in, which is used to determine whether a key is in the dictionary. Return True if present, False otherwise.
d = {"201801":"小明", "201802":"小红", "201803":"小白"}
print("201801" in d)
# True
print("201804" in d)
# False
  • Like other composite types, a dictionary can be traversed with a for loop. The basic syntax is as follows:
for <variable name> in <dictionary name>:
    <statement block>
  • On each iteration, the loop variable holds a key of the dictionary. To get the value for that key, use the get() method inside the statement block.
d = {"201801":"小明", "201802":"小红", "201803":"小白"}
for k in d:
    print("The key and value are: {} and {}".format(k, d.get(k)))
# The key and value are: 201801 and 小明
# The key and value are: 201802 and 小红
# The key and value are: 201803 and 小白
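
As an alternative sketch, the loop can also iterate over d.items(), which yields each key together with its value, so no lookup is needed inside the loop body. The variable names key, value, and pairs are illustrative.

```python
d = {"201801": "小明", "201802": "小红", "201803": "小白"}

# Unpack each (key, value) tuple directly in the for statement
for key, value in d.items():
    print("{}: {}".format(key, value))

pairs = list(d.items())   # the same pairs as a list of tuples
```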

6. Example Analysis: Text Word Frequency Statistics

  • You will often run into this problem: given an article, you want to count the words that appear most often in it in order to analyze its content. A solution to this problem can be used for automatic retrieval and archiving of online information.

  • In an era of information explosion, this kind of archiving or classification is very necessary. This is the "word frequency statistics" problem.
    Example: counting the frequency of English words in "Hamlet"

  • Step 1: Decompose and extract the words of the English article
    Use the txt.lower() method to convert all letters to lowercase, so that differences in capitalization in the original text do not interfere with the word counts. To make word separation uniform, use the txt.replace() method to replace special characters and punctuation marks with spaces, and then extract the words.

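  • Step 2: Count each word
    Use a dictionary counts that maps each word to its count. If the word is already in the dictionary, add 1 to its count; otherwise, add the word with a count of 1.

if word in counts:
    counts[word] = counts[word] + 1
else:
    counts[word] = 1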

Alternatively, this processing logic can be expressed more concisely as the following code:

counts[word] = counts.get(word, 0) + 1
  • Step 3: Sort the word counts from high to low
    A dictionary cannot be sorted in place by value, so convert it to a list of (word, count) records and then use the sort() method with a lambda function to sort the records by count.
items = list(counts.items())  # convert the dictionary to a list of records
items.sort(key=lambda x:x[1], reverse=True)  # sort by the second column
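
Steps 2 and 3 can be tried on a short string before running them on a whole file. The sketch below uses a sample sentence chosen for illustration (it is not read from hamlet.txt) and applies the same counting and sorting logic.

```python
text = "to be or not to be that is the question"

# Step 2: count each word with a dictionary
counts = {}
for word in text.split():
    counts[word] = counts.get(word, 0) + 1

# Step 3: convert to a list of (word, count) records and sort by count
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)

print(items[:2])  # [('to', 2), ('be', 2)]
```

Because sort() is stable, words with equal counts keep the order in which they first appeared.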
# CalHamlet.py
def getText():
    # Read the whole file; hamlet.txt must be in the working directory
    with open("hamlet.txt", "r") as f:
        txt = f.read()
    txt = txt.lower()
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_‘{|}~':
        txt = txt.replace(ch, " ")  # replace special characters with spaces
    return txt

hamletTxt = getText()
words = hamletTxt.split()
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
items = list(counts.items())
items.sort(key=lambda x:x[1], reverse=True)
for i in range(10):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))

>>>
the        1138
and         965
to          754
of          669
you         550
a           542
i           542
my          514
hamlet      462
in          436
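
As a possible alternative to the hand-written dictionary, the standard library class collections.Counter performs the same counting and offers most_common() for the sorted result. This is a sketch on a sample sentence, not the method used in CalHamlet.py above.

```python
from collections import Counter

text = "to be or not to be that is the question"

counts = Counter(text.split())          # counts every word in one step
for word, count in counts.most_common(3):
    print("{0:<10}{1:>5}".format(word, count))
```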

summary

This article is aimed mainly at readers who are beginners in programming. It covered the basic concepts of composite data types (set, sequence, and mapping types), the list type and its operation functions and methods, the dictionary type and its operation functions and methods, and a worked example of word frequency statistics on "Hamlet".

The Python drama is about to be staged, let's follow the drama together.

Origin blog.csdn.net/weixin_61587867/article/details/132239331