Common Data Structures in Python

The articles and code are archived in the GitHub repository [https://github.com/timerring/dive-into-AI]; they can also be obtained by replying "python data analysis" to the public account [AIShareLab].

Data Structures and Sequences

tuple

A tuple is a fixed-length, immutable sequence of Python objects. The easiest way to create one is with a comma-separated sequence of values:

In [1]: tup = 4, 5, 6

When defining tuples in more complicated expressions, it is best to enclose the values in parentheses, as follows:

In [3]: nested_tup = (4, 5, 6), (7, 8)
In [4]: nested_tup
Out[4]: ((4, 5, 6), (7, 8))

Any sequence or iterator can be converted to a tuple with tuple:

In [5]: tuple([4, 0, 2])
Out[5]: (4, 0, 2)

In [6]: tup = tuple('string')

In [7]: tup
Out[7]: ('s', 't', 'r', 'i', 'n', 'g')

Elements of a tuple can be accessed with square brackets []. As in C, C++, Java, and many other languages, sequences are indexed from 0:

In [8]: tup[0]
Out[8]: 's'

Although the objects stored in a tuple may themselves be mutable, once the tuple is created it is not possible to change which object is stored in each slot.
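The following is a minimal sketch of that restriction. The tuple name point and its contents are assumptions chosen for illustration, not part of the original session:

```python
# Hypothetical example tuple; the name `point` is an assumption for illustration.
point = ('foo', [1, 2], True)

try:
    point[2] = False  # tuples do not support item assignment
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The assignment raises a TypeError and the tuple is left unchanged.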

If an object inside a tuple is mutable, such as a list, you can modify it in place:

In [9]: tup = tuple(['foo', [1, 2], True])

In [11]: tup[1].append(3)

In [12]: tup
Out[12]: ('foo', [1, 2, 3], True)

Tuples can be concatenated using the + operator to produce longer tuples:

In [13]: (4, None, 'foo') + (6, 0) + ('bar',)
Out[13]: (4, None, 'foo', 6, 0, 'bar')

Multiplying a tuple by an integer, as with lists, concatenates that many copies of the tuple:

In [14]: ('foo', 'bar') * 4
Out[14]: ('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')

Note that the objects themselves are not copied, only the references to them.
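A small sketch makes the consequence visible when the repeated element is mutable. The names inner and repeated are illustrative assumptions:

```python
# `inner` and `repeated` are illustrative names, not from the original text.
inner = [1, 2]
repeated = (inner,) * 3

# All three slots refer to the same list object.
same_object = all(item is inner for item in repeated)

# Mutating the list through one slot is visible through every slot.
repeated[0].append(3)
print(repeated)  # ([1, 2, 3], [1, 2, 3], [1, 2, 3])
```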

Unpacking tuples

If you assign to a tuple-like expression of variables, Python attempts to unpack the value on the right-hand side of the equals sign:

In [15]: tup = (4, 5, 6)

In [16]: a, b, c = tup

In [17]: b
Out[17]: 5

Even tuples that contain nested tuples can be unpacked:

In [18]: tup = 4, 5, (6, 7)

In [19]: a, b, (c, d) = tup

In [20]: d
Out[20]: 7

Using this feature, you can easily swap the values of two variables, a task that in many other languages might look like this:

tmp = a
a = b
b = tmp

But in Python, the swap can be done like this:

In [21]: a, b = 1, 2

In [22]: a
Out[22]: 1

In [23]: b
Out[23]: 2

In [24]: b, a = a, b

In [25]: a
Out[25]: 2

In [26]: b
Out[26]: 1

Variable unpacking is often used to iterate over sequences of tuples or lists:

In [27]: seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

In [28]: for a, b, c in seq:
   ....:     print('a={0}, b={1}, c={2}'.format(a, b, c))
a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9

Another common use is returning multiple values from a function. This is explained in detail later.

Python 3 added more advanced tuple unpacking that lets you "pluck" a few elements from the beginning of a tuple. It uses the special syntax *rest, which captures the remaining elements as a list:

In [29]: values = 1, 2, 3, 4, 5

In [30]: a, b, *rest = values

In [31]: a, b
Out[31]: (1, 2)

In [32]: rest
Out[32]: [3, 4, 5]

The rest part is sometimes something you want to discard; there is nothing special about the name rest. As a matter of convention, many Python programmers use an underscore (_) for variables that are not needed:

In [33]: a, b, *_ = values
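As a hedged illustration of the same syntax, the starred name does not have to come last. The names first, middle, last, and others below are assumptions for the example:

```python
values = 1, 2, 3, 4, 5

# The starred name may appear anywhere in the target list, but only once.
first, *middle, last = values
print(first, middle, last)  # 1 [2, 3, 4] 5

# With no leftover elements the starred name is bound to an empty list.
x, y, *others = (10, 20)
print(others)  # []
```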

Tuple methods

Because the size and contents of a tuple cannot be modified, it has very few instance methods. A particularly useful one (also available on lists) is count, which counts the number of occurrences of a value:

In [34]: a = (1, 2, 2, 2, 3, 4, 2)

In [35]: a.count(2)
Out[35]: 4

List

In contrast with tuples, lists are variable length and their contents can be modified in place. You can define them using square brackets [] or using the list function:

In [37]: tup = ('foo', 'bar', 'baz')

In [38]: b_list = list(tup)

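In [39]: b_list
Out[39]: ['foo', 'bar', 'baz']

In [40]: b_list[1] = 'peekaboo'

In [41]: b_list
Out[41]: ['foo', 'peekaboo', 'baz']

The list function is frequently used in data processing to materialize an iterator or generator expression: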

In [42]: gen = range(10)

In [43]: gen
Out[43]: range(0, 10)

In [44]: list(gen)
Out[44]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Add and remove elements

Use the append method to add elements to the end of a list:

In [45]: b_list.append('dwarf')

In [46]: b_list
Out[46]: ['foo', 'peekaboo', 'baz', 'dwarf']

Using insert, you can insert an element at a specific position in the list:

In [47]: b_list.insert(1, 'red')

In [48]: b_list
Out[48]: ['foo', 'red', 'peekaboo', 'baz', 'dwarf']

The insertion index should be between 0 and the length of the list, inclusive; an index beyond the end does not raise an error, the element is simply added at the end.

Warning: insert is computationally expensive compared with append, because references to subsequent elements have to be shifted internally to make room for the new element. If you need to insert elements at both the beginning and the end of a sequence, you may want to use collections.deque, a double-ended queue.
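A minimal sketch of the double-ended queue mentioned in the warning, using only collections.deque from the standard library. The variable name queue and its contents are assumptions for illustration:

```python
from collections import deque

# `queue` is an illustrative name, not from the original text.
queue = deque(['red', 'baz'])

queue.appendleft('foo')   # add at the head without shifting other elements
queue.append('dwarf')     # add at the tail, like list.append
print(list(queue))        # ['foo', 'red', 'baz', 'dwarf']

head = queue.popleft()    # remove from the head
tail = queue.pop()        # remove from the tail
print(head, tail, list(queue))  # foo dwarf ['red', 'baz']
```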

The inverse operation to insert is pop, which removes and returns the element at the specified index:

In [49]: b_list.pop(2)
Out[49]: 'peekaboo'

In [50]: b_list
Out[50]: ['foo', 'red', 'baz', 'dwarf']

Elements can be removed by value with remove, which locates the first matching value and removes it from the list:

In [51]: b_list.append('foo')

In [52]: b_list
Out[52]: ['foo', 'red', 'baz', 'dwarf', 'foo']

In [53]: b_list.remove('foo')

In [54]: b_list
Out[54]: ['red', 'baz', 'dwarf', 'foo']

If performance is not a concern, append and remove let you use a Python list as a perfectly suitable "multiset" data structure.

You can check whether a list contains a value with the in keyword:

In [55]: 'dwarf' in b_list
Out[55]: True

The keyword not can be used to negate in:

In [56]: 'dwarf' not in b_list
Out[56]: False

Checking whether a list contains a value is much slower than doing so with dictionaries and sets, because Python performs a linear scan across the values of the list, whereas it can check dictionaries and sets (which are based on hash tables) in constant time.

Concatenating and combining lists

Similar to tuples, two lists can be concatenated with the + operator:

In [57]: [4, None, 'foo'] + [7, 8, (2, 3)]
Out[57]: [4, None, 'foo', 7, 8, (2, 3)]

If you already have a list defined, you can append multiple elements to it with the extend method:

In [58]: x = [4, None, 'foo']

In [59]: x.extend([7, 8, (2, 3)])

In [60]: x
Out[60]: [4, None, 'foo', 7, 8, (2, 3)]

Concatenating lists by addition is comparatively expensive, because a new list must be created and the objects copied over. Using extend to append elements to an existing list, especially when building up a large list, is usually preferable. Thus:

everything = []
for chunk in list_of_lists:
    everything.extend(chunk)

is faster than the concatenation method:

everything = []
for chunk in list_of_lists:
    everything = everything + chunk

Sorting

You can sort a list in place (without creating a new object) by calling its sort method:

In [61]: a = [7, 2, 5, 1, 3]

In [62]: a.sort()

In [63]: a
Out[63]: [1, 2, 3, 5, 7]

sort has a few options that occasionally come in handy. One is the ability to pass a secondary sort key, that is, a function that produces the value used to order the objects. For example, we can sort strings by length:

In [64]: b = ['saw', 'small', 'He', 'foxes', 'six']

In [65]: b.sort(key=len)

In [66]: b
Out[66]: ['He', 'saw', 'six', 'small', 'foxes']
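As a further sketch, the key option can be combined with reverse, the other keyword argument of list.sort. The variable name words is an assumption; the data is the same list of strings used above:

```python
words = ['saw', 'small', 'He', 'foxes', 'six']

# reverse=True sorts in descending order of the key (here, length).
# The sort is stable, so equal-length words keep their original relative order.
words.sort(key=len, reverse=True)
print(words)  # ['small', 'foxes', 'saw', 'six', 'He']

# key=str.lower gives a case-insensitive alphabetical sort.
words.sort(key=str.lower)
print(words)  # ['foxes', 'He', 'saw', 'six', 'small']
```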

Later, we'll look at the sorted function, which produces a sorted copy of a general sequence.

Binary Search and Maintaining a Sorted List

The built-in bisect module implements binary search and insertion into a sorted list.

  • bisect.bisect finds the location where an element should be inserted to keep the list sorted,

  • bisect.insort actually inserts the element at that location:

In [67]: import bisect

In [68]: c = [1, 2, 2, 2, 3, 4, 7]

In [69]: bisect.bisect(c, 2)
Out[69]: 4

In [70]: bisect.bisect(c, 5)
Out[70]: 6

In [71]: bisect.insort(c, 6)

In [72]: c
Out[72]: [1, 2, 2, 2, 3, 4, 6, 7]

Note: The bisect module functions do not check whether the list is sorted, because doing so would be computationally expensive. Using them on an unsorted list therefore succeeds without an error but may give incorrect results.
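The module also provides bisect_left and bisect_right, which differ only in how they treat values already present in the list. A minimal sketch using the same list c; the names left and right are assumptions:

```python
import bisect

c = [1, 2, 2, 2, 3, 4, 7]

# bisect_left returns the position before any existing equal values,
# bisect_right (the same function as bisect.bisect) the position after them.
left = bisect.bisect_left(c, 2)
right = bisect.bisect_right(c, 2)
print(left, right)  # 1 4

# The difference counts the occurrences of the value in the sorted list.
print(right - left)  # 3
```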

Slicing

You can select sections of most sequence types with slice notation, which in its basic form consists of start:stop passed to the indexing operator []:

In [73]: seq = [7, 2, 3, 7, 5, 6, 0, 1]

In [74]: seq[1:5]
Out[74]: [2, 3, 7, 5]

A sequence can also be assigned to a slice:

In [75]: seq[3:4] = [6, 3]

In [76]: seq
Out[76]: [7, 2, 3, 6, 3, 5, 6, 0, 1]

The element at the start index is included, while the element at the stop index is not, so the number of elements in the result is stop - start. Either start or stop can be omitted, in which case they default to the start of the sequence and the end of the sequence, respectively. Negative indices slice the sequence relative to the end.
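These rules can be sketched on the list from the session above, in the state it has after the slice assignment:

```python
seq = [7, 2, 3, 6, 3, 5, 6, 0, 1]

print(seq[:5])     # start omitted: [7, 2, 3, 6, 3]
print(seq[3:])     # stop omitted: [6, 3, 5, 6, 0, 1]
print(seq[-4:])    # last four elements: [5, 6, 0, 1]
print(seq[-6:-2])  # both indices relative to the end: [6, 3, 5, 6]
```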

Figure 3-1 illustrates slicing with positive and negative integer indices.

Figure 3-1 Python slice demo

A step can also be given after a second colon, for example to take every other element:

In [81]: seq[::2]
Out[81]: [7, 3, 3, 6, 1]

A clever use of this is to pass -1 as the step, which reverses a list or tuple:

In [82]: seq[::-1]
Out[82]: [1, 0, 6, 5, 3, 6, 3, 2, 7]

sequence function

enumerate function

When iterating over a sequence, you often want to keep track of the index of the current item. A do-it-yourself approach might look like this:

i = 0
for value in collection:
   # do something with value
   i += 1

Because this is so common, Python has a built-in function, enumerate, which returns a sequence of (i, value) tuples:

for i, value in enumerate(collection):
   # do something with value

When you are indexing data, a helpful pattern that uses enumerate is computing a dict that maps the values of a sequence (assumed to be unique) to their positions in the sequence:

In [83]: some_list = ['foo', 'bar', 'baz']

In [84]: mapping = {}

# List the index and the value together
In [85]: for i, v in enumerate(some_list):
   ....:     mapping[v] = i

In [86]: mapping
Out[86]: {'bar': 1, 'baz': 2, 'foo': 0}

sorted function

The sorted function returns a new sorted list from the elements of any sequence:

In [87]: sorted([7, 1, 2, 6, 0, 3, 2])
Out[87]: [0, 1, 2, 2, 3, 6, 7]

In [88]: sorted('horse race')
Out[88]: [' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']

The sorted function accepts the same arguments as the sort method on lists.

zip function

zip pairs up the elements of multiple lists, tuples, or other sequences to create a list of tuples:

In [89]: seq1 = ['foo', 'bar', 'baz']

In [90]: seq2 = ['one', 'two', 'three']

In [91]: zipped = zip(seq1, seq2)

In [92]: list(zipped)
Out[92]: [('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

zip can take an arbitrary number of sequences, and the number of elements it produces is determined by the shortest sequence:

In [93]: seq3 = [False, True]

In [94]: list(zip(seq1, seq2, seq3))
Out[94]: [('foo', 'one', False), ('bar', 'two', True)]

A very common use of zip is iterating over multiple sequences simultaneously, possibly combined with enumerate:

In [95]: for i, (a, b) in enumerate(zip(seq1, seq2)):
   ....:     print('{0}: {1}, {2}'.format(i, a, b))
   ....:
0: foo, one
1: bar, two
2: baz, three

Given a "zipped" sequence, zip can be applied in a clever way to "unzip" it. Another way to think about this is converting a list of rows into a list of columns. The syntax, which looks a bit magical, is:

In [96]: pitchers = [('Nolan', 'Ryan'), ('Roger', 'Clemens'),
   ....:             ('Schilling', 'Curt')]

In [97]: first_names, last_names = zip(*pitchers)

In [98]: first_names
Out[98]: ('Nolan', 'Roger', 'Schilling')

In [99]: last_names
Out[99]: ('Ryan', 'Clemens', 'Curt')

reversed function

reversed iterates over the elements of a sequence in reverse order:

In [100]: list(reversed(range(10)))
Out[100]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Keep in mind that reversed returns a lazy iterator (generators are covered later), so it does not create the reversed sequence until it is materialized (e.g., with list or a for loop).

dictionary

create dictionary

A dictionary is also known as a hash map or associative array. It is a flexibly sized collection of key-value pairs, where both keys and values are Python objects. One way to create a dictionary is to use curly braces {} with colons separating keys and values:

In [101]: empty_dict = {}

In [102]: d1 = {'a' : 'some value', 'b' : [1, 2, 3, 4]}

In [103]: d1
Out[103]: {'a': 'some value', 'b': [1, 2, 3, 4]}

access dictionary

You can access, insert, or set elements using the same syntax as for accessing elements of a list or tuple:

In [104]: d1[7] = 'an integer'

In [105]: d1
Out[105]: {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

In [106]: d1['b']
Out[106]: [1, 2, 3, 4]

You can check for a key in a dictionary the same way you check for a value in lists and tuples:

In [107]: 'b' in d1
Out[107]: True

delete

Values can be deleted with the del keyword or with the pop method (which returns the value and deletes the key at the same time):

In [108]: d1[5] = 'some value'

In [110]: d1['dummy'] = 'another value'

In [111]: d1
Out[111]: 
{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 5: 'some value',
 'dummy': 'another value'}

In [112]: del d1[5]

In [114]: ret = d1.pop('dummy')

In [115]: ret
Out[115]: 'another value'

In [116]: d1
Out[116]: {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

keys and values

The keys and values methods give you iterators over the dictionary's keys and values, respectively. Both methods produce their results in the same order (since Python 3.7, the order in which the keys were inserted), so each key lines up with its value:

In [117]: list(d1.keys())
Out[117]: ['a', 'b', 7]

In [118]: list(d1.values())
Out[118]: ['some value', [1, 2, 3, 4], 'an integer']
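When both the key and the value are needed, the items method yields them together as 2-tuples. A minimal sketch on the same dictionary; the names pairs and same_order are assumptions:

```python
d1 = {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

# items() yields (key, value) tuples, in the same order as keys() and values().
pairs = list(d1.items())
print(pairs)

# Pairing keys() with values() by position gives the same result.
same_order = pairs == list(zip(d1.keys(), d1.values()))
print(same_order)  # True

for key, value in d1.items():
    print(key, '->', value)
```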

Merging

You can merge one dictionary into another with the update method:

In [119]: d1.update({'b' : 'foo', 'c' : 12})

In [120]: d1
Out[120]: {'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}

The update method changes the dictionary in place, so any existing keys in the data passed to update will have their old values discarded.

Create a dictionary with sequences

It is common to end up with two sequences that you want to pair up element-wise in a dictionary. A first attempt might look like this:

mapping = {}
for key, value in zip(key_list, value_list):
    mapping[key] = value

Because a dictionary is essentially a collection of 2-tuples, the dict function accepts a list of 2-tuples:

In [121]: mapping = dict(zip(range(5), reversed(range(5))))

In [122]: mapping
Out[122]: {0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

Later we'll discuss dict comprehensions, another elegant way to construct dictionaries.

Defaults

The following logic is common:

if key in some_dict:
    value = some_dict[key]
else:
    value = default_value

For this reason, the dictionary methods get and pop can take a default value to return, so the if-else block above can be written simply as:

value = some_dict.get(key, default_value)
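The following sketch shows how get and pop behave with and without a default. The contents of some_dict are assumptions chosen for illustration:

```python
# `some_dict` holds illustrative data; the keys and values are assumptions.
some_dict = {'a': 1, 'b': 2}

print(some_dict.get('a', 0))    # 1: key exists, default ignored
print(some_dict.get('z', 0))    # 0: key missing, default returned
print(some_dict.get('z'))       # None: no default given

print(some_dict.pop('b', 0))    # 2: key exists, removed and returned
print(some_dict.pop('z', 0))    # 0: key missing, default returned

try:
    some_dict.pop('z')          # no default: a missing key raises KeyError
except KeyError:
    print('KeyError')
```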

If the key is not present, get returns None by default, while pop raises an exception. When setting values, a common case is for the values in a dictionary to be other collections, such as lists. For example, you can group a list of words by their first letter into a dictionary of lists:

In [123]: words = ['apple', 'bat', 'bar', 'atom', 'book']

In [124]: by_letter = {}

In [125]: for word in words:
   .....:     # Take the first letter
   .....:     letter = word[0]
   .....:     if letter not in by_letter:
   .....:         # First word with this letter: start a new list under that key
   .....:         by_letter[letter] = [word]
   .....:     else:
   .....:         # Key already exists: append to its list
   .....:         by_letter[letter].append(word)
   .....:

In [126]: by_letter
Out[126]: {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

The setdefault dictionary method exists for precisely this purpose. The preceding for loop can be rewritten as:

for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)

The built-in collections module has a useful class, defaultdict, which makes this even easier. To create one, you pass a type or function that generates the default value for each new key in the dictionary:

from collections import defaultdict
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
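As a runnable sketch of the same idea, the example below repeats the grouping and adds a second, assumed use case: counting with defaultdict(int). The name letter_counts is an assumption for illustration:

```python
from collections import defaultdict

words = ['apple', 'bat', 'bar', 'atom', 'book']

# list() produces an empty list for each first letter seen for the first time.
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
print(dict(by_letter))  # {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

# int() produces 0, which makes defaultdict(int) a simple counter.
letter_counts = defaultdict(int)
for word in words:
    letter_counts[word[0]] += 1
print(dict(letter_counts))  # {'a': 2, 'b': 3}
```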

valid key types

The values of a dictionary can be arbitrary Python objects, while the keys generally have to be immutable objects such as scalar types (int, float, string) or tuples (all the objects in the tuple need to be immutable, too). The technical term here is hashability. You can check whether an object is hashable (and can therefore be used as a dictionary key) with the hash function; the exact hash values you see for strings will differ between runs:

In [127]: hash('string')
Out[127]: 5023931463650008331

In [128]: hash((1, 2, (2, 3)))
Out[128]: 1097636502276347782

In [129]: hash((1, 2, [2, 3])) # fails because lists are mutable
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-129-800cd14ba8be> in <module>()
----> 1 hash((1, 2, [2, 3])) # fails because lists are mutable
TypeError: unhashable type: 'list'

To use a list as a key, one way is to convert the list to a tuple, which can be hashed as long as the inner elements can be hashed:

In [130]: d = {}

In [131]: d[tuple([1, 2, 3])] = 5

In [132]: d
Out[132]: {(1, 2, 3): 5}

Set

create set

A set is an unordered collection of unique elements. You can think of it as a dictionary that has only keys and no values. A set can be created in two ways: with the set function or with a set literal in curly braces:

In [133]: set([2, 2, 2, 1, 3, 3])
Out[133]: {1, 2, 3}

In [134]: {2, 2, 2, 1, 3, 3}
Out[134]: {1, 2, 3}

Sets support mathematical set operations such as union, intersection, difference, and symmetric difference. Consider these two example sets:

In [135]: a = {1, 2, 3, 4, 5}

In [136]: b = {3, 4, 5, 6, 7, 8}

Union: union or |

The union of two sets is the set of distinct elements occurring in either set. It can be computed with the union method or the | binary operator:

In [137]: a.union(b)
Out[137]: {1, 2, 3, 4, 5, 6, 7, 8}

In [138]: a | b
Out[138]: {1, 2, 3, 4, 5, 6, 7, 8}

Intersection: intersection or &

The intersection contains the elements occurring in both sets. It can be computed with the intersection method or the & binary operator:

In [139]: a.intersection(b)
Out[139]: {3, 4, 5}

In [140]: a & b
Out[140]: {3, 4, 5}

Table 3-1 lists commonly used set methods.
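Difference and symmetric difference, mentioned earlier but not demonstrated, can be sketched in the same way with the example sets a and b. The result names below are assumptions for illustration:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

# Difference: elements in a that are not in b (equal to {1, 2}).
only_in_a = a.difference(b)
same_with_operator = only_in_a == a - b

# Symmetric difference: elements in exactly one of the two sets
# (equal to {1, 2, 6, 7, 8}).
in_exactly_one = a.symmetric_difference(b)
same_with_caret = in_exactly_one == a ^ b

print(only_in_a, in_exactly_one)
```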

All of the logical set operations have in-place counterparts, which replace the contents of the set on the left side of the operation with the result. For very large sets, this may be more efficient:

In [141]: c = a.copy()

In [142]: c |= b

In [143]: c
Out[143]: {1, 2, 3, 4, 5, 6, 7, 8}

In [144]: d = a.copy()

In [145]: d &= b

In [146]: d
Out[146]: {3, 4, 5}

Like dictionary keys, set elements generally must be immutable (more precisely, hashable). To store list-like elements in a set, you must convert them to tuples:

In [147]: my_data = [1, 2, 3, 4]

In [148]: my_set = {tuple(my_data)}

In [149]: my_set
Out[149]: {(1, 2, 3, 4)}

superset and subset

You can also test whether a set is a subset or superset of another:

In [150]: a_set = {1, 2, 3, 4, 5}

In [151]: {1, 2, 3}.issubset(a_set)
Out[151]: True

In [152]: a_set.issuperset({1, 2, 3})
Out[152]: True

Sets are equal if and only if their contents are equal:

In [153]: {1, 2, 3} == {3, 2, 1}
Out[153]: True

List, set and dictionary comprehensions

List comprehensions

List comprehensions are one of Python's most beloved features. They let you concisely form a new list by filtering the elements of a collection and transforming the elements that pass the filter, all in one expression. The basic form is:

[expr for val in collection if condition]

This is equivalent to the following for loop:

result = []
for val in collection:
    if condition:
        result.append(expr)

The filter condition can be omitted, leaving only the expression. For example, given a list of strings, we can filter out strings of length 2 or less and convert the remaining ones to uppercase:

In [154]: strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

In [155]: [x.upper() for x in strings if len(x) > 2]
Out[155]: ['BAT', 'CAR', 'DOVE', 'PYTHON']

Dictionary comprehensions

In a similar way, there are set and dictionary comprehensions. A dictionary comprehension looks like this:

dict_comp = {key-expr : value-expr for value in collection if condition}

Set comprehensions

A set comprehension looks like the equivalent list comprehension, but with curly braces instead of square brackets:

set_comp = {expr for value in collection if condition}

Like list comprehensions, set and dictionary comprehensions are mostly conveniences, but they can make code easier to read and write. Consider the list of strings from before. Suppose we want a set containing just the lengths of the strings in the list; a set comprehension computes this easily:

In [156]: unique_lengths = {len(x) for x in strings}

In [157]: unique_lengths
Out[157]: {1, 2, 3, 4, 6}

The same result can be expressed more functionally with the map function:

In [158]: set(map(len, strings))
Out[158]: {1, 2, 3, 4, 6}

As a simple dictionary comprehension example, we can create a lookup map from these strings to their positions in the list:

In [159]: loc_mapping = {val : index for index, val in enumerate(strings)}

In [160]: loc_mapping
Out[160]: {'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

nested list comprehension

Suppose we have a list of lists containing some English and Spanish names:

In [161]: all_data = [['John', 'Emily', 'Michael', 'Mary', 'Steven'],
   .....:             ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']]

You might get these names from some files and want to classify them by language. Now suppose we want to have a list of all names with two or more e's in them. You can do it with a for loop:

names_of_interest = []
for names in all_data:
    enough_es = [name for name in names if name.count('e') >= 2]
    names_of_interest.extend(enough_es)

These can be written together in nested list comprehensions, like this:

In [162]: result = [name for names in all_data for name in names
   .....:           if name.count('e') >= 2]

In [163]: result
Out[163]: ['Steven']

Nested list comprehensions can look a little complicated at first. The for parts of the list comprehension are arranged according to the order of nesting, and any filter condition is still placed at the end. Here is another example, in which we flatten a list of tuples of integers into a simple list of integers:

In [164]: some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
# This pulls out every individual element
In [165]: flattened = [x for tup in some_tuples for x in tup]

In [166]: flattened
Out[166]: [1, 2, 3, 4, 5, 6, 7, 8, 9]

Keep in mind that the order of the for expressions is the same as it would be if you wrote nested for loops instead of a list comprehension:

flattened = []

for tup in some_tuples:
    for x in tup:
        flattened.append(x)

You can have as many levels of nesting as you want, but if you have more than two or three, you should probably start to question whether the code is still readable. It is also important to distinguish the syntax just shown from a list comprehension inside a list comprehension, which is also perfectly valid:

In [167]: [[x for x in tup] for tup in some_tuples]
Out[167]: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

This code produces a list of lists, rather than a flattened list of elements.
