The articles and code are archived in the GitHub repository https://github.com/timerring/dive-into-AI. They can also be obtained by replying "python data analysis" to the public account AIShareLab.
Article directory
- Data Structures and Sequences
- tuple
- unpacking tuples
- tuple methods
- list
- Add and remove elements
- Concatenating and combining lists
- sorting
- Binary Search and Maintaining a Sorted List
- slice
- sequence function
- enumerate function
- sorted function
- zip function
- reversed function
- dictionary
- Create a dictionary with sequences
- Defaults
- valid key types
- set
- List, set and dictionary comprehensions
- nested list comprehension
Data Structures and Sequences
tuple
A tuple is a fixed-length, immutable sequence of Python objects. The easiest way to create a tuple is to separate a list of values with commas:
In [1]: tup = 4, 5, 6
When defining tuples with complex expressions, it is best to enclose the values in parentheses, as follows:
In [3]: nested_tup = (4, 5, 6), (7, 8)
In [4]: nested_tup
Out[4]: ((4, 5, 6), (7, 8))
Any sequence or iterator can be converted to a tuple with the tuple function:
In [5]: tuple([4, 0, 2])
Out[5]: (4, 0, 2)
In [6]: tup = tuple('string')
In [7]: tup
Out[7]: ('s', 't', 'r', 'i', 'n', 'g')
Elements of a tuple can be accessed using square brackets. As in C, C++, Java, and many other languages, sequences are indexed starting from 0:
In [8]: tup[0]
Out[8]: 's'
The objects stored in a tuple may themselves be mutable, but once the tuple is created you cannot change which object is stored in each slot.
If an object inside a tuple is mutable, such as a list, it can be modified in place (here tup is ('foo', [1, 2], True)):
In [11]: tup[1].append(3)
In [12]: tup
Out[12]: ('foo', [1, 2, 3], True)
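The difference between rebinding a slot and mutating the object stored in it can be seen in a short sketch. The tuple contents mirror the example above; the variable slot_assignment_failed is only an illustrative name.

```python
# Mirrors the tuple used above; names are illustrative only.
tup = ('foo', [1, 2], True)

# Rebinding a slot is not allowed: tuples do not support item assignment.
try:
    tup[2] = False
    slot_assignment_failed = False
except TypeError:
    slot_assignment_failed = True

# Mutating an object stored inside the tuple is fine.
tup[1].append(3)

print(slot_assignment_failed)  # True
print(tup)                     # ('foo', [1, 2, 3], True)
```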
Tuples can be concatenated using the plus operator :
In [13]: (4, None, 'foo') + (6, 0) + ('bar',)
Out[13]: (4, None, 'foo', 6, 0, 'bar')
Multiplying a tuple by an integer, as with lists, concatenates that many copies of the tuple:
In [14]: ('foo', 'bar') * 4
Out[14]: ('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')
Note that the objects themselves are not copied, only the references to them.
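A small sketch of what "only references are copied" means in practice, assuming a tuple that holds a mutable list (the names inner and repeated are made up for illustration):

```python
inner = [1, 2]
repeated = (inner,) * 3   # three references to the same list

# All three slots refer to the same list object.
print(repeated[0] is repeated[1] is repeated[2])  # True

# Changing the list through any reference is visible in every slot.
inner.append(3)
print(repeated)  # ([1, 2, 3], [1, 2, 3], [1, 2, 3])
```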
unpacking tuples
If you assign to a tuple-like expression of variables, Python attempts to unpack the value on the right-hand side of the equals sign:
In [15]: tup = (4, 5, 6)
In [16]: a, b, c = tup
In [17]: b
Out[17]: 5
Even sequences with nested tuples can be unpacked:
In [18]: tup = 4, 5, (6, 7)
In [19]: a, b, (c, d) = tup
In [20]: d
Out[20]: 7
Using this feature, you can easily swap variable names, a task that in many other languages might look like this:
tmp = a
a = b
b = tmp
But in Python, the swap can be done like this:
In [21]: a, b = 1, 2
In [22]: a
Out[22]: 1
In [23]: b
Out[23]: 2
In [24]: b, a = a, b
In [25]: a
Out[25]: 2
In [26]: b
Out[26]: 1
Variable unpacking is often used to iterate over sequences of tuples or lists:
In [27]: seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
In [28]: for a, b, c in seq:
   ....:     print('a={0}, b={1}, c={2}'.format(a, b, c))
a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9
Another common usage is to return multiple values from a function. It will be explained in detail later.
Python also supports more advanced tuple unpacking that lets you "pluck" a few elements from the beginning of a tuple. It uses the special syntax *rest to collect the remaining elements into a list:
In [29]: values = 1, 2, 3, 4, 5
In [30]: a, b, *rest = values
In [31]: a, b
Out[31]: (1, 2)
In [32]: rest
Out[32]: [3, 4, 5]
The rest part is sometimes something you want to discard; there is nothing special about the name rest. As a matter of convention, many Python programmers use an underscore (_) for unwanted variables:
In [33]: a, b, *_ = values
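The starred name does not have to come last. The following sketch, using the same values tuple and otherwise made-up variable names, shows the starred target in the middle and at the front of the assignment:

```python
values = 1, 2, 3, 4, 5

# The starred target absorbs whatever the other names do not take.
first, *middle, last = values
*head, tail = values

print(first, middle, last)  # 1 [2, 3, 4] 5
print(head, tail)           # [1, 2, 3, 4] 5
```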
tuple methods
Because the size and contents of a tuple cannot be modified, it has very few instance methods. A particularly useful one (also available on lists) is count, which counts the number of occurrences of a value:
In [34]: a = (1, 2, 2, 2, 3, 4, 2)
In [35]: a.count(2)
Out[35]: 4
list
In contrast to tuples, lists have variable length and their contents can be modified in place. You can define them using square brackets or the list function:
In [37]: tup = ('foo', 'bar', 'baz')
In [38]: b_list = list(tup)
In [39]: b_list
Out[39]: ['foo', 'bar', 'baz']
Elements of a list can be changed by assigning to an index:
In [40]: b_list[1] = 'peekaboo'
In [41]: b_list
Out[41]: ['foo', 'peekaboo', 'baz']
The list function is frequently used in data processing to materialize an iterator or generator expression:
In [42]: gen = range(10)
In [43]: gen
Out[43]: range(0, 10)
In [44]: list(gen)
Out[44]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Add and remove elements
Use the append method to add an element to the end of a list:
In [45]: b_list.append('dwarf')
In [46]: b_list
Out[46]: ['foo', 'peekaboo', 'baz', 'dwarf']
Using insert, you can insert an element at a specific position in the list:
In [47]: b_list.insert(1, 'red')
In [48]: b_list
Out[48]: ['foo', 'red', 'peekaboo', 'baz', 'dwarf']
The insertion index should be between 0 and the length of the list, inclusive.
Warning: insert is computationally expensive compared with append, because references to the subsequent elements have to be shifted internally to make room for the new element. If you need to insert elements at both the beginning and the end of a sequence, you may want to use collections.deque, a double-ended queue.
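As a rough illustration of the deque alternative mentioned above, here is a minimal sketch. The element values are borrowed from the list examples; the variable names are assumptions made for the demo.

```python
from collections import deque

queue = deque(['foo', 'baz'])
queue.appendleft('red')   # insert at the head without shifting elements
queue.append('dwarf')     # insert at the tail
print(queue)              # deque(['red', 'foo', 'baz', 'dwarf'])

first = queue.popleft()   # remove from the head
last = queue.pop()        # remove from the tail
print(first, last, list(queue))  # red dwarf ['foo', 'baz']
```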
The inverse operation to insert is pop, which removes and returns the element at a particular index:
In [49]: b_list.pop(2)
Out[49]: 'peekaboo'
In [50]: b_list
Out[50]: ['foo', 'red', 'baz', 'dwarf']
Elements can be removed by value with remove, which locates the first matching value and removes it from the list:
In [51]: b_list.append('foo')
In [52]: b_list
Out[52]: ['foo', 'red', 'baz', 'dwarf', 'foo']
In [53]: b_list.remove('foo')
In [54]: b_list
Out[54]: ['red', 'baz', 'dwarf', 'foo']
If performance is not a concern, append and remove let you use a Python list as a perfectly suitable "multiset" data structure.
You can check whether a list contains a value using the in keyword:
In [55]: 'dwarf' in b_list
Out[55]: True
The keyword not can be used to negate in:
In [56]: 'dwarf' not in b_list
Out[56]: False
Checking whether a list contains a value is much slower than doing so with dictionaries and sets, because Python scans the list linearly, whereas dictionaries and sets (which are based on hash tables) can perform the check in constant time.
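When many membership checks are needed against the same data, one common approach is to convert the list to a set once and test against that. The sketch below only shows that the answers agree; it does not measure timing, and the names words and word_set are illustrative.

```python
words = ['foo', 'red', 'baz', 'dwarf']
word_set = set(words)   # one-time conversion, linear in the list length

# Same answers, but the set lookup does not scan every element.
print('dwarf' in words, 'dwarf' in word_set)  # True True
print('elf' in words, 'elf' in word_set)      # False False
```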
Concatenating and combining lists
Similar to tuples, two lists can be concatenated using the plus sign :
In [57]: [4, None, 'foo'] + [7, 8, (2, 3)]
Out[57]: [4, None, 'foo', 7, 8, (2, 3)]
If you already have a list defined, you can append multiple elements to it using the extend method:
In [58]: x = [4, None, 'foo']
In [59]: x.extend([7, 8, (2, 3)])
In [60]: x
Out[60]: [4, None, 'foo', 7, 8, (2, 3)]
Note that list concatenation by addition is a comparatively expensive operation, since a new list must be created and the objects copied over. Using extend to append elements to an existing list, especially when building up a large list, is usually preferable. Thus,
everything = []
for chunk in list_of_lists:
    everything.extend(chunk)
is faster than the concatenative alternative:
everything = []
for chunk in list_of_lists:
    everything = everything + chunk
sorting
You can sort a list in place (without creating a new object) by calling its sort method:
In [61]: a = [7, 2, 5, 1, 3]
In [62]: a.sort()
In [63]: a
Out[63]: [1, 2, 3, 5, 7]
sort has a few options that occasionally come in handy. One is the ability to pass a secondary sort key, that is, a function that produces a value used to order the objects. For example, we can sort a collection of strings by their lengths:
In [64]: b = ['saw', 'small', 'He', 'foxes', 'six']
In [65]: b.sort(key=len)
In [66]: b
Out[66]: ['He', 'saw', 'six', 'small', 'foxes']
Later, we'll look at the sorted function, which produces a sorted copy of a sequence.
Binary Search and Maintaining a Sorted List
The built-in bisect module implements binary search and insertion into a sorted list. bisect.bisect finds the position where a value should be inserted to keep the list sorted, while bisect.insort actually inserts the value at that position:
In [67]: import bisect
In [68]: c = [1, 2, 2, 2, 3, 4, 7]
In [69]: bisect.bisect(c, 2)
Out[69]: 4
In [70]: bisect.bisect(c, 5)
Out[70]: 6
In [71]: bisect.insort(c, 6)
In [72]: c
Out[72]: [1, 2, 2, 2, 3, 4, 6, 7]
Note: The bisect module functions do not check whether the list is sorted, because doing so would be computationally expensive. Therefore, using them on an unsorted list will not produce an error, but the result is not necessarily correct.
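One way to put these functions together is to keep a list sorted as values arrive and then use binary search for membership tests. The following is a minimal sketch under that assumption; contains_sorted is a hypothetical helper, not part of the bisect module, and it relies on bisect.bisect_left, which returns the leftmost valid insertion point.

```python
import bisect

def contains_sorted(sorted_list, value):
    """Hypothetical helper: binary-search membership test for a sorted list."""
    i = bisect.bisect_left(sorted_list, value)
    return i < len(sorted_list) and sorted_list[i] == value

scores = []
for x in [5, 1, 4, 1, 3]:
    bisect.insort(scores, x)   # the list stays sorted after every insertion

print(scores)                      # [1, 1, 3, 4, 5]
print(contains_sorted(scores, 4))  # True
print(contains_sorted(scores, 2))  # False
```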
slice
You can select sections of most sequence types with slice notation, which in its basic form consists of start:stop passed inside square brackets:
In [73]: seq = [7, 2, 3, 7, 5, 6, 0, 1]
In [74]: seq[1:5]
Out[74]: [2, 3, 7, 5]
Slices can also be assigned to sequences:
In [75]: seq[3:4] = [6, 3]
In [76]: seq
Out[76]: [7, 2, 3, 6, 3, 5, 6, 0, 1]
The element at the start index is included, but the element at the stop index is not, so the number of elements in the result is stop - start. Either start or stop can be omitted, in which case they default to the start of the sequence and the end of the sequence, respectively. Negative indices slice the sequence relative to the end.
Slices of positive and negative integers are shown.
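To make the defaults and negative indices concrete, here is a small sketch that starts from the list as it stands after the slice assignment above:

```python
seq = [7, 2, 3, 6, 3, 5, 6, 0, 1]

print(seq[:5])     # start omitted: [7, 2, 3, 6, 3]
print(seq[3:])     # stop omitted:  [6, 3, 5, 6, 0, 1]
print(seq[-4:])    # last four:     [5, 6, 0, 1]
print(seq[-6:-2])  # relative to the end: [6, 3, 5, 6]
```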
A step can also be given after a second colon to, say, take every other element:
In [81]: seq[::2]
Out[81]: [7, 3, 3, 6, 1]
A clever use of this is to pass -1 as the step, which reverses a list or tuple:
In [82]: seq[::-1]
Out[82]: [1, 0, 6, 5, 3, 6, 3, 2, 7]
sequence function
enumerate function
When iterating over a sequence, you may want to keep track of the current item's ordinal. A manual approach might be as follows:
i = 0
for value in collection:
    # do something with value
    i += 1
Because this is so common, Python has a built-in function, enumerate, which returns a sequence of (i, value) tuples:
for i, value in enumerate(collection):
    # do something with value
When you are indexing data, a helpful pattern that uses enumerate is computing a dict that maps the values of a sequence (assumed to be unique) to their positions in the sequence:
In [83]: some_list = ['foo', 'bar', 'baz']
In [84]: mapping = {}
# enumerate yields both the index and the value
In [85]: for i, v in enumerate(some_list):
   ....:     mapping[v] = i
In [86]: mapping
Out[86]: {'bar': 1, 'baz': 2, 'foo': 0}
sorted function
The sorted function returns a new sorted list from the elements of any sequence:
In [87]: sorted([7, 1, 2, 6, 0, 3, 2])
Out[87]: [0, 1, 2, 2, 3, 6, 7]
In [88]: sorted('horse race')
Out[88]: [' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']
The sorted function accepts the same arguments as the sort method on lists.
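For instance, the key and reverse arguments can be passed to sorted in the same way as to sort. This sketch reuses the word list from the sorting section; the result variable names are illustrative.

```python
words = ['saw', 'small', 'He', 'foxes', 'six']

# Longest first; ties keep their original relative order (the sort is stable).
by_length_desc = sorted(words, key=len, reverse=True)

# Compare lowercase versions so that 'He' is not ordered by its capital letter.
case_insensitive = sorted(words, key=str.lower)

print(by_length_desc)    # ['small', 'foxes', 'saw', 'six', 'He']
print(case_insensitive)  # ['foxes', 'He', 'saw', 'six', 'small']
print(words)             # the original list is unchanged
```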
zip function
zip pairs up the elements of a number of lists, tuples, or other sequences to create a list of tuples:
In [89]: seq1 = ['foo', 'bar', 'baz']
In [90]: seq2 = ['one', 'two', 'three']
In [91]: zipped = zip(seq1, seq2)
In [92]: list(zipped)
Out[92]: [('foo', 'one'), ('bar', 'two'), ('baz', 'three')]
zip can take an arbitrary number of sequences, and the number of elements it produces is determined by the shortest sequence:
In [93]: seq3 = [False, True]
In [94]: list(zip(seq1, seq2, seq3))
Out[94]: [('foo', 'one', False), ('bar', 'two', True)]
A very common use of zip is iterating over multiple sequences simultaneously, possibly combined with enumerate:
In [95]: for i, (a, b) in enumerate(zip(seq1, seq2)):
   ....:     print('{0}: {1}, {2}'.format(i, a, b))
   ....:
0: foo, one
1: bar, two
2: baz, three
Given a "zipped" sequence, zip can be applied in a clever way to "unzip" the sequence. Another way to think about this is converting a list of rows into a list of columns. The syntax, which looks a bit magical, is:
In [96]: pitchers = [('Nolan', 'Ryan'), ('Roger', 'Clemens'),
....: ('Schilling', 'Curt')]
In [97]: first_names, last_names = zip(*pitchers)
In [98]: first_names
Out[98]: ('Nolan', 'Roger', 'Schilling')
In [99]: last_names
Out[99]: ('Ryan', 'Clemens', 'Curt')
reversed function
reversed iterates over the elements of a sequence in reverse order:
In [100]: list(reversed(range(10)))
Out[100]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
Keep in mind that reversed returns a lazy iterator (iterators and generators are discussed in more detail later), so the reversed sequence is not created until it is materialized, for example with list or a for loop.
dictionary
create dictionary
Other common names for a dictionary are hash map and associative array. It is a flexibly sized collection of key-value pairs, where both keys and values are Python objects. One way to create a dictionary is to use curly braces, with colons separating keys and values:
In [101]: empty_dict = {}
In [102]: d1 = {'a': 'some value', 'b': [1, 2, 3, 4]}
In [103]: d1
Out[103]: {'a': 'some value', 'b': [1, 2, 3, 4]}
access dictionary
You can access, insert, or set elements using the same syntax as for accessing elements of a list or tuple:
In [104]: d1[7] = 'an integer'
In [105]: d1
Out[105]: {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}
In [106]: d1['b']
Out[106]: [1, 2, 3, 4]
You can check for a key in a dictionary the same way you check for a value in lists and tuples:
In [107]: 'b' in d1
Out[107]: True
delete
You can delete values using either the del keyword or the pop method (which simultaneously returns the value and deletes the key):
In [108]: d1[5] = 'some value'
In [110]: d1['dummy'] = 'another value'
In [111]: d1
Out[111]:
{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 5: 'some value',
 'dummy': 'another value'}
In [112]: del d1[5]
In [114]: ret = d1.pop('dummy')
In [115]: ret
Out[115]: 'another value'
In [116]: d1
Out[116]: {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}
keys and values
The keys and values methods return iterators over the dictionary's keys and values, respectively. Both iterate in the same order (insertion order in Python 3.7 and later), so the keys and values line up:
In [117]: list(d1.keys())
Out[117]: ['a', 'b', 7]
In [118]: list(d1.values())
Out[118]: ['some value', [1, 2, 3, 4], 'an integer']
merging
You can merge one dictionary into another using the update method:
In [119]: d1.update({'b': 'foo', 'c': 12})
In [120]: d1
Out[120]: {'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}
The update method changes the dictionary in place, so any existing keys in the data passed to update will have their old values discarded.
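If the original dictionary must be preserved, one alternative (not covered in the text above) is to build a new dictionary with unpacking, available since Python 3.5. The sketch below uses the same data as the update example; overrides and merged are illustrative names.

```python
d1 = {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}
overrides = {'b': 'foo', 'c': 12}

# Builds a new dict; for duplicate keys the later mapping wins.
merged = {**d1, **overrides}

print(merged)   # {'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}
print(d1['b'])  # [1, 2, 3, 4], so d1 itself is untouched
```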
Create a dictionary with sequences
It is common to end up with two sequences that you want to pair up element-wise in a dictionary. A first attempt might look like this:
mapping = {}
for key, value in zip(key_list, value_list):
    mapping[key] = value
Because dictionaries are essentially collections of 2-tuples, dict can accept lists of 2-tuples:
In [121]: mapping = dict(zip(range(5), reversed(range(5))))
In [122]: mapping
Out[122]: {0: 4, 1: 3, 2: 2, 3: 1, 4: 0}
Dict comprehensions, another elegant way to construct dictionaries, are discussed later.
Defaults
The following logic is common:
if key in some_dict:
    value = some_dict[key]
else:
    value = default_value
Thus, the dictionary methods get and pop can take a default value to return, so the above if-else block can be written simply as:
value = some_dict.get(key, default_value)
get returns None by default if the key is not present, while pop raises a KeyError when no default is given. When setting values, a common case is for the values in a dictionary to be other collections, such as lists. For example, you can categorize a list of words by their first letters as a dictionary of lists:
In [123]: words = ['apple', 'bat', 'bar', 'atom', 'book']
In [124]: by_letter = {}
In [125]: for word in words:
   .....:     letter = word[0]  # take the first letter
   .....:     if letter not in by_letter:
   .....:         # first word with this letter: start a new list under that key
   .....:         by_letter[letter] = [word]
   .....:     else:
   .....:         # key already present: append to its list
   .....:         by_letter[letter].append(word)
   .....:
In [126]: by_letter
Out[126]: {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
The setdefault dictionary method exists for precisely this purpose. The preceding for loop can be rewritten as:
for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)
The built-in collections module has a useful class, defaultdict, which makes this even easier. To create one, you pass a type or function that generates the default value for each slot in the dictionary:
from collections import defaultdict
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
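The default factory does not have to be list. As a further sketch, assuming we only want to count words per first letter, int can serve as the factory because int() returns 0:

```python
from collections import defaultdict

words = ['apple', 'bat', 'bar', 'atom', 'book']

counts = defaultdict(int)   # missing keys start at int() == 0
for word in words:
    counts[word[0]] += 1

print(dict(counts))  # {'a': 2, 'b': 3}
```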
valid key types
The values of a dictionary can be any Python object, while the keys generally have to be immutable objects such as scalar types (int, float, string) or tuples (all the objects in the tuple must be immutable, too). The technical term here is hashability. You can check whether an object is hashable (and so can be used as a dictionary key) with the hash function:
In [127]: hash('string')
Out[127]: 5023931463650008331
In [128]: hash((1, 2, (2, 3)))
Out[128]: 1097636502276347782
In [129]: hash((1, 2, [2, 3])) # fails because lists are mutable
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-129-800cd14ba8be> in <module>()
----> 1 hash((1, 2, [2, 3])) # fails because lists are mutable
TypeError: unhashable type: 'list'
To use a list as a key, one way is to convert the list to a tuple, which can be hashed as long as the inner elements can be hashed:
In [130]: d = {}
In [131]: d[tuple([1, 2, 3])] = 5
In [132]: d
Out[132]: {(1, 2, 3): 5}
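The hash check can be wrapped in a small function when you want a yes/no answer instead of an exception. is_hashable below is a hypothetical helper written for this illustration, not a built-in:

```python
def is_hashable(obj):
    """Hypothetical helper: report whether obj can be used as a dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((1, 2, (2, 3))))    # True
print(is_hashable((1, 2, [2, 3])))    # False, the inner list is mutable
print(is_hashable(tuple([1, 2, 3])))  # True, after converting the list
```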
set
create
A set is an unordered collection of unique elements. You can think of it as a dictionary with keys only and no values. A set can be created in two ways: with the set function or with a set literal in curly braces:
In [133]: set([2, 2, 2, 1, 3, 3])
Out[133]: {1, 2, 3}
In [134]: {2, 2, 2, 1, 3, 3}
Out[134]: {1, 2, 3}
Sets support mathematical set operations such as union, intersection, difference, and symmetric difference. Consider these two example sets:
In [135]: a = {1, 2, 3, 4, 5}
In [136]: b = {3, 4, 5, 6, 7, 8}
union: union or |
The union of two sets is the set of distinct elements occurring in either set. It can be computed with the union method or the | operator:
In [137]: a.union(b)
Out[137]: {1, 2, 3, 4, 5, 6, 7, 8}
In [138]: a | b
Out[138]: {1, 2, 3, 4, 5, 6, 7, 8}
intersection: intersection or &
The intersection contains the elements occurring in both sets. It can be computed with the intersection method or the & operator:
In [139]: a.intersection(b)
Out[139]: {3, 4, 5}
In [140]: a & b
Out[140]: {3, 4, 5}
Table 3-1 lists commonly used set methods.
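Difference and symmetric difference were mentioned earlier but not shown. A brief sketch with the same two sets follows; the results are passed through sorted only so that the printed order is predictable.

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

# Difference: elements in a that are not in b.
print(sorted(a.difference(b)), sorted(a - b))            # [1, 2] [1, 2]

# Symmetric difference: elements in exactly one of the two sets.
print(sorted(a.symmetric_difference(b)), sorted(a ^ b))  # [1, 2, 6, 7, 8] [1, 2, 6, 7, 8]
```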
All of the logical set operations have in-place counterparts, which replace the contents of the set on the left side of the operation with the result. For very large sets, this may be more efficient:
In [141]: c = a.copy()
In [142]: c |= b
In [143]: c
Out[143]: {1, 2, 3, 4, 5, 6, 7, 8}
In [144]: d = a.copy()
In [145]: d &= b
In [146]: d
Out[146]: {3, 4, 5}
Like dictionary keys, set elements generally must be immutable. To store list-like elements in a set, convert them to tuples:
In [147]: my_data = [1, 2, 3, 4]
In [148]: my_set = {tuple(my_data)}
In [149]: my_set
Out[149]: {(1, 2, 3, 4)}
superset and subset
You can also test whether a set is a subset or superset of another:
In [150]: a_set = {1, 2, 3, 4, 5}
In [151]: {1, 2, 3}.issubset(a_set)
Out[151]: True
In [152]: a_set.issuperset({1, 2, 3})
Out[152]: True
Sets are equal if their contents are the same:
In [153]: {1, 2, 3} == {3, 2, 1}
Out[153]: True
List, set and dictionary comprehensions
List comprehensions
List comprehensions are one of Python's most beloved features. They let you concisely form a new list by filtering the elements of a collection and transforming the elements that pass the filter, all in a single expression. They take the basic form:
[expr for val in collection if condition]
This is equivalent to the following for loop:
result = []
for val in collection:
    if condition:
        result.append(expr)
The filter condition can be omitted, leaving only the expression. For example, given a list of strings, we can filter out strings of length 2 or less and convert the remaining ones to uppercase:
In [154]: strings = ['a', 'as', 'bat', 'car', 'dove', 'python']
In [155]: [x.upper() for x in strings if len(x) > 2]
Out[155]: ['BAT', 'CAR', 'DOVE', 'PYTHON']
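The equivalence with the explicit loop can be checked directly. This sketch writes the same filter and transformation both ways and compares the results; loop_result and comp_result are illustrative names.

```python
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

# Explicit loop: filter first, then transform and append.
loop_result = []
for x in strings:
    if len(x) > 2:
        loop_result.append(x.upper())

# The same logic as a list comprehension.
comp_result = [x.upper() for x in strings if len(x) > 2]

print(loop_result == comp_result)  # True
```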
Dictionary comprehensions
Set and dictionary comprehensions are a natural extension, producing sets and dictionaries in a similar way. A dictionary comprehension looks like this:
dict_comp = {key-expr: value-expr for value in collection if condition}
Set comprehensions
A set comprehension looks like the equivalent list comprehension, but with curly braces instead of square brackets:
set_comp = {expr for value in collection if condition}
Like list comprehensions, set and dictionary comprehensions are mostly conveniences, but they can make code easier to both write and read. Consider the list of strings from before. Suppose we want a set containing just the lengths of the strings in the list; a set comprehension makes this easy:
In [156]: unique_lengths = {len(x) for x in strings}
In [157]: unique_lengths
Out[157]: {1, 2, 3, 4, 6}
We could also express this more functionally using the map function:
In [158]: set(map(len, strings))  # neat
Out[158]: {1, 2, 3, 4, 6}
As a simple dictionary comprehension example, we can create a lookup map from these strings to their positions in the list:
In [159]: loc_mapping = {val: index for index, val in enumerate(strings)}
In [160]: loc_mapping
Out[160]: {'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}
nested list comprehension
Suppose we have a list of lists containing some English and Spanish names:
In [161]: all_data = [['John', 'Emily', 'Michael', 'Mary', 'Steven'],
.....: ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']]
You might get these names from some files and want to classify them by language. Now suppose we want to have a list of all names with two or more e's in them. You can do it with a for loop:
names_of_interest = []
for names in all_data:
    enough_es = [name for name in names if name.count('e') >= 2]
    names_of_interest.extend(enough_es)
These can be written together in nested list comprehensions, like this:
In [162]: result = [name for names in all_data for name in names
.....: if name.count('e') >= 2]
In [163]: result
Out[163]: ['Steven']
At first, nested list comprehensions can be a bit hard to follow. The for parts of the list comprehension are arranged according to the order of nesting, and any filter condition is still placed at the end. Here is another example where we flatten a list of tuples of integers into a simple list of integers:
In [164]: some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
# this pulls out every item of every tuple
In [165]: flattened = [x for tup in some_tuples for x in tup]
In [166]: flattened
Out[166]: [1, 2, 3, 4, 5, 6, 7, 8, 9]
Keep in mind that the order of the for expressions is the same as it would be if you wrote nested for loops instead of a list comprehension:
flattened = []
for tup in some_tuples:
    for x in tup:
        flattened.append(x)
You can have arbitrarily many levels of nesting, though with more than two or three levels you should start to question whether the code is still readable. It is also important to distinguish the syntax just shown from a list comprehension inside a list comprehension, which is also perfectly valid:
In [167]: [[x for x in tup] for tup in some_tuples]
Out[167]: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
This code produces a list of lists, rather than a flattened list of elements.