Python Core Technology and Practice -- 02: Dictionaries and Sets

What is a dictionary?
A dictionary is a collection of elements, each consisting of a key and a value. In Python 3.7+, dictionaries are guaranteed to be ordered. (Note that in 3.6 this ordering was an implementation detail and only became an official language feature in 3.7, so 3.6 cannot be relied on 100% to be ordered; before 3.6, dictionaries were unordered.) A dictionary's size is variable, and its elements can be added, removed, and changed freely.
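The ordering guarantee can be illustrated with a minimal sketch, assuming Python 3.7 or later. The keys below are made up for illustration and do not come from the article.

```python
# Assumes Python 3.7+, where insertion order is part of the language.
d = {}
d['banana'] = 3
d['apple'] = 1
d['cherry'] = 2

# Iteration follows insertion order, not alphabetical order.
keys_in_order = list(d)
print(keys_in_order)  # ['banana', 'apple', 'cherry']

# Deleting a key and inserting it again moves it to the end.
del d['banana']
d['banana'] = 3
keys_after_reinsert = list(d)
print(keys_after_reinsert)  # ['apple', 'cherry', 'banana']
```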

Compared with lists and tuples, dictionaries perform better, particularly for lookup, insertion, and deletion, which a dictionary can complete in constant time complexity.

A set is basically the same as a dictionary. The only difference is that a set has no key-value pairs: it is an unordered collection of unique elements.

First, let's look at how to create dictionaries and sets. There are usually several ways:

d1={'name':'jason','age':20}

d2=dict({'name':'jason','age':20})

d3=dict([('name','jason'),('age',20)])

d4=dict(name='jason',age=20)

s1={1,2,3}

s2=set([1,2,3])
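As a quick check, the sketch below confirms that the creation styles above all produce equal objects. The variable names `all_dicts_equal` and `sets_equal` are introduced here only for illustration.

```python
d1 = {'name': 'jason', 'age': 20}
d2 = dict({'name': 'jason', 'age': 20})
d3 = dict([('name', 'jason'), ('age', 20)])
d4 = dict(name='jason', age=20)

# All four construction styles yield equal dictionaries.
all_dicts_equal = d1 == d2 == d3 == d4

s1 = {1, 2, 3}
s2 = set([1, 2, 3])

# Both construction styles yield equal sets.
sets_equal = s1 == s2

print(all_dicts_equal, sets_equal)  # True True
```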

In Python, the keys and values of a dictionary, and the elements of a set, can be of mixed types. For example, here I create a set with the elements 1, 'hello', and 5.0:

s={1,'hello',5.0}

Next, element access. A dictionary value can be accessed directly by indexing with its key; if the key does not exist, an exception (KeyError) is raised. Alternatively, get(key, default) returns a default value when the key does not exist:

d={'name':'jason','age':20}
d.get('name')
d.get('location','null')  # the key does not exist, so the default 'null' is returned
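The difference between indexing and get() can be sketched as follows. This is a small illustration using the same dictionary; the names `missing_key`, `location`, and `no_default` are introduced here.

```python
d = {'name': 'jason', 'age': 20}

# Indexing with an existing key returns its value.
name = d['name']

# Indexing with a missing key raises KeyError.
try:
    d['location']
except KeyError as exc:
    missing_key = exc.args[0]
    print('KeyError for key:', missing_key)

# get() returns the supplied default instead of raising.
location = d.get('location', 'null')

# Without an explicit default, get() returns None.
no_default = d.get('location')
```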

Sets

A set is essentially a hash table, which makes it different from a list, so it does not support indexing operations.
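A minimal sketch of what happens when a set is indexed. Converting to a sorted list, shown at the end, is one possible workaround when positional access is needed, not something the article prescribes.

```python
s = {1, 2, 3}

indexing_failed = False
try:
    s[0]  # sets have no positions, so this raises TypeError
except TypeError as exc:
    indexing_failed = True
    print('TypeError:', exc)

# If positional access is needed, build a list first (here: sorted).
smallest = sorted(s)[0]
```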

To determine whether an element is in a dictionary or a set, we can use value in dict/set. For a dictionary, this checks the keys.

s={1,2,3}
1 in s
True

10 in s
False

d = {'name':'jason','age':20}
'name' in d
True

'location' in d
False
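One detail worth spelling out: for a dictionary, `in` tests keys only. The sketch below shows the difference; checking `d.values()` is a linear scan rather than a hash lookup.

```python
d = {'name': 'jason', 'age': 20}

key_found = 'name' in d                 # 'name' is a key
value_found = 'jason' in d              # 'jason' is a value, not a key
value_in_values = 'jason' in d.values() # explicit scan over the values

print(key_found, value_found, value_in_values)  # True False True
```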

Add / delete / update operations

d={'name':'jason','age':20}
d['gender']='male'
d['dob']='1999-02-01'
d.pop('dob')

s={1,2,3}
s.add(4)
s.remove(4)

A set's pop() operation removes and returns an element from the set, but because a set is unordered you cannot know which element will be removed, so use this operation with caution.
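The operations above can be put together in one sketch. The update of 'dob' and the use of `discard()` are additions for illustration; both are standard dict and set behavior.

```python
d = {'name': 'jason', 'age': 20}
d['gender'] = 'male'       # add a new key-value pair
d['dob'] = '1999-02-01'    # add another pair
d['dob'] = '1998-01-01'    # assigning to an existing key updates its value
removed = d.pop('dob')     # delete the pair and return its value

s = {1, 2, 3}
s.add(4)                   # add an element
s.remove(4)                # remove it again (raises KeyError if absent)
s.discard(99)              # discard() silently ignores a missing element
popped = s.pop()           # removes and returns an arbitrary element

print(d)
print(removed, popped, s)
```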

In practice, we often need to sort a dictionary or a set, for example to take out the 50 largest pairs.

For a dictionary, we usually sort by key or by value, in ascending or descending order:

d={'a':1,'c':3,'d':4}
sorted_by_key = sorted(d.items(),key=lambda x:x[0])
sorted_by_value = sorted(d.items(),key=lambda x:x[1])
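A short sketch of what these calls return, extended with a descending sort and a "top N" slice. `top_2` stands in for the "largest 50 pairs" case with a smaller N, and the set `s` is a made-up example.

```python
d = {'a': 1, 'c': 3, 'd': 4}

# sorted() returns a list of (key, value) tuples, not a dictionary.
sorted_by_key = sorted(d.items(), key=lambda x: x[0])
sorted_by_value_desc = sorted(d.items(), key=lambda x: x[1], reverse=True)

# Take the N largest pairs by value (N = 2 here).
top_2 = sorted_by_value_desc[:2]

# A set can be sorted directly; the result is a list.
s = {3, 4, 2, 1}
sorted_set = sorted(s)

print(sorted_by_key)  # [('a', 1), ('c', 3), ('d', 4)]
print(top_2)          # [('d', 4), ('c', 3)]
print(sorted_set)     # [1, 2, 3, 4]
```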

Performance of dictionaries and sets

Dictionaries and sets are highly performance-optimized data structures, especially for lookup, insertion, and deletion. Next, let's look at how they perform in specific scenarios and how they compare with other data structures such as lists.

For example, the backend of an e-commerce business stores the ID, name, and price of each product. The requirement now is: given a product's ID, find its price.

If we store this data in a list, the lookup code is as follows:

def find_product_price(products,product_id):
    # Scan the list linearly until the ID matches
    for id,price in products:
        if id == product_id:
            return price
    return None

products=[(143121312,100),
          (23121312,30),
          (32421312,150)]

Suppose the list has n elements. Because the lookup has to traverse the list, the time complexity is O(n). Even if we sort the list first and then use binary search, the lookup still takes O(log n) time, not to mention the O(n log n) time needed to sort the list.
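The sort-then-binary-search alternative can be sketched with the standard `bisect` module. The function name `find_product_price_sorted` and the helper list `sorted_ids` are hypothetical and not part of the article's code.

```python
import bisect

products = [(143121312, 100),
            (23121312, 30),
            (32421312, 150)]

# Sorting by ID costs O(n log n) and must happen before any lookup.
sorted_products = sorted(products)
sorted_ids = [pid for pid, _ in sorted_products]

def find_product_price_sorted(product_id):
    """Binary search over the sorted IDs: O(log n) per lookup."""
    i = bisect.bisect_left(sorted_ids, product_id)
    if i < len(sorted_ids) and sorted_ids[i] == product_id:
        return sorted_products[i][1]
    return None

print(find_product_price_sorted(23121312))  # 30
print(find_product_price_sorted(999))       # None
```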

But if we store the data in a dictionary, lookup becomes convenient and efficient, requiring only O(1) time complexity. The reason is simple: as mentioned earlier, a dictionary is internally a hash table, so the value can be found directly through the hash of the key.
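A minimal sketch of the dictionary version, reusing the product IDs and prices from the list example. The variable names `price` and `missing` are introduced here.

```python
# Same data as the list example, keyed by product ID.
products = {
    143121312: 100,
    23121312: 30,
    32421312: 150,
}

# A single hash lookup replaces the linear scan.
price = products[23121312]

# get() mirrors the list version's "return None" for unknown IDs.
missing = products.get(999)

print('The price of product 23121312 is {}'.format(price))
```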

Similarly, suppose the requirement now becomes: find out how many distinct prices these products have. We compare the two approaches in the same way.

If we choose a list, the corresponding code is as follows, where A and B are two nested loops. Again assuming the original list has n elements, the worst case requires O(n^2) time complexity.

def find_unique_price_using_list(products):
    unique_price_list=[]
    for _,price in products:  # A
        if price not in unique_price_list:  # B
            unique_price_list.append(price)
    return len(unique_price_list)

But if we choose a set as the data structure, things change: a set is a highly optimized hash table whose elements cannot repeat, and both lookup and insertion take only O(1), so the total time complexity is only O(n).

def find_unique_price_using_set(products):
    unique_price_set=set()
    for _,price in products:
        unique_price_set.add(price)
    return len(unique_price_set)
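To get a rough feel for the gap, the two functions can be timed side by side. The test data below (5,000 products, each with a distinct price) is made up for illustration, and the measured times depend on the machine, so no specific numbers are claimed.

```python
import time

def find_unique_price_using_list(products):
    unique_price_list = []
    for _, price in products:
        if price not in unique_price_list:  # linear scan on every iteration
            unique_price_list.append(price)
    return len(unique_price_list)

def find_unique_price_using_set(products):
    unique_price_set = set()
    for _, price in products:
        unique_price_set.add(price)  # O(1) on average
    return len(unique_price_set)

# Hypothetical data: worst case for the list version (all prices distinct).
products = [(i, i) for i in range(5000)]

start = time.perf_counter()
count_list = find_unique_price_using_list(products)
list_seconds = time.perf_counter() - start

start = time.perf_counter()
count_set = find_unique_price_using_set(products)
set_seconds = time.perf_counter() - start

print('list: {} unique prices in {:.4f}s'.format(count_list, list_seconds))
print('set:  {} unique prices in {:.4f}s'.format(count_set, set_seconds))
```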

How dictionaries and sets work:

The internal data structure of both dictionaries and sets is a hash table.

For a dictionary, this table stores three elements: the hash value (hash), the key, and the value.

For a set, the difference is that the hash table has no key-value pairing, only single elements.
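Because the table is addressed by hash value, every key or set element must be hashable. The sketch below illustrates this with the built-in `hash()` function; it shows standard Python behavior rather than anything specific to this article, and the example keys are made up.

```python
# Equal objects have equal hashes, which is what lets a lookup
# find the slot that an earlier insertion used.
same_hash = hash('name') == hash('na' + 'me')

# Mutable objects such as lists are unhashable and cannot be keys.
list_is_unhashable = False
try:
    {['a', 'list']: 1}
except TypeError:
    list_is_unhashable = True

# An immutable tuple of hashable items works as a key.
tuple_key_dict = {('a', 'tuple'): 1}

print(same_hash, list_is_unhashable, tuple_key_dict[('a', 'tuple')])
```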

The hash table structure in older versions of Python:

hash value (hash)   key (key)   value (value)
---------------------------------------------
hash0               key0        value0
---------------------------------------------
hash1               key1        value1
---------------------------------------------
hash2               key2        value2
---------------------------------------------

It is not hard to imagine that as the hash table grows, it becomes more and more sparse. For example, suppose I have a dictionary like this:

{'name':'mike','dob':'1999-01-01','gender':'male'}

Then it will be stored as a form similar to the following:

entries=[
['--','--','--'],
[-230273521,'dob','1999-01-01'],
['--','--','--'],
['--','--','--'],
[1231236123,'name','mike'],
['--','--','--'],
[9371539127,'gender','male']
]

This design wastes storage space. To improve memory utilization, the hash table today keeps, in addition to the dictionary structure itself, the indices separate from the hash values, keys, and values, giving the following new structure:

Indices
-----------------------------------------------------
None | index | None | None | index | None | index ...
-----------------------------------------------------

Entries
---------------------
hash0   key0   value0
---------------------
hash1   key1   value1
---------------------
hash2   key2   value2
---------------------
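Applied to the 'mike' dictionary from earlier, the new layout can be modeled as a toy sketch in plain Python. The slot positions and hash values are the illustrative ones used in the old-layout example, not real CPython internals, and the `lookup` helper is hypothetical.

```python
# Sparse table: each slot holds None or a position in `entries`.
indices = [None, 1, None, None, 0, None, 2]

# Dense table: one row per item, stored in insertion order.
entries = [
    [1231236123, 'name', 'mike'],
    [-230273521, 'dob', '1999-01-01'],
    [9371539127, 'gender', 'male'],
]

def lookup(slot):
    """Return the (key, value) reachable from a slot, or None if empty."""
    entry_index = indices[slot]
    if entry_index is None:
        return None
    _hash, key, value = entries[entry_index]
    return key, value

# Only small integers are stored sparsely; the wide rows stay compact.
old_cells = len(indices) * 3                 # 7 rows x 3 columns
new_cells = len(indices) + len(entries) * 3  # 7 slots + 3 rows x 3 columns

print(lookup(4))             # ('name', 'mike')
print(lookup(0))             # None
print(old_cells, new_cells)  # 21 16
```

Since the entries are appended in insertion order, this layout also lines up with the ordering behavior described at the start of the article.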
