Reading speed of the Python dictionary (dict) and list (list)

 

Reading speed issues with Python dictionaries and lists

While processing genomic data recently, I needed to read a fairly large file (2.7 GB) into a dictionary and then match each line of the file being processed, read one line at a time, against the dictionary's keys. When checking whether a key is in the dictionary, the following two pieces of code differ enormously in efficiency:

First version:

if pos in fre_dist.keys():
    newValue = fre_dist[pos]

Second version:

if pos in fre_dist:
    newValue = fre_dist[pos]

When processing 30,000 records, the second version was thousands of times faster than the first.

The reason: in the first version, fre_dist.keys() builds a list of all the keys (in Python 2), and a membership test on a list is a slow linear scan; in the second version the test runs directly against the dictionary's hash table, which is much faster.
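A minimal sketch of the gap using timeit; the dictionary here is only a stand-in for the real frequency data, and list(fre_dist.keys()) is used to imitate the Python 2 behaviour where keys() returns a list (in Python 3, keys() returns a view whose membership test is as fast as the dict's):

import timeit

# Stand-in data; the real genomic dictionary is much larger.
setup = "fre_dist = {i: i for i in range(30000)}"

# Linear scan of a list vs. hash lookup in the dict.
slow = timeit.timeit("12345 in list(fre_dist.keys())", setup=setup, number=1000)
fast = timeit.timeit("12345 in fre_dist", setup=setup, number=1000)

print("in list(fre_dist.keys()): %.4f s" % slow)
print("in fre_dist:              %.6f s" % fast)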

A hard-earned lesson.

When traversing a dict, I think most people will reach for for key in dictobj, and indeed this works in most cases. But it is not completely safe, as the following example shows:

# Initialize a dict
>>> d = {'a':1, 'b':0, 'c':1, 'd':0}
# The intention: traverse the dict and delete every element whose value is 0
>>> for k in d:
...     if d[k] == 0:
...         del(d[k])
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
# An exception is thrown, and only one of the two 0-valued elements has been deleted.
>>> d
{'a': 1, 'c': 1, 'd': 0}

>>> d = {'a':1, 'b':0, 'c':1, 'd':0}
# In Python 2, d.keys() returns a separate list of the keys
>>> d.keys()
['a', 'c', 'b', 'd']
# Traversing this way causes no problem, because the loop actually iterates over the list returned by d.keys(), not over the dict itself
>>> for k in d.keys():
...     if d[k] == 0:
...         del(d[k])
... 
>>> d
{'a': 1, 'c': 1}
# This time the result is correct
>>>

This example is actually a simplification of a problem I ran into in a multi-threaded program, so my suggestion is: when traversing a dict that may be modified, get into the habit of iterating over a snapshot of its keys, i.e. for k in d.keys() in Python 2 (in Python 3, keys() returns a view, so use for k in list(d.keys()) instead).
Is this absolutely safe when multiple threads are involved, though? Not necessarily: after both threads have taken their d.keys() snapshots, if they both try to delete the same key, the first deletion succeeds and the second one raises a KeyError. That case apparently has to be guarded against by other means.
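A minimal sketch of one such safeguard, assuming the dict is shared between threads and protected by a threading.Lock; the names shared, lock, and delete_zero_values are illustrative, not from the original program:

import threading

shared = {'a': 1, 'b': 0, 'c': 1, 'd': 0}
lock = threading.Lock()

def delete_zero_values():
    # Iterate over a snapshot of the keys, then check and delete under the lock;
    # pop(k, None) stays silent if another thread already removed the key.
    for k in list(shared.keys()):
        with lock:
            if shared.get(k) == 0:
                shared.pop(k, None)

threads = [threading.Thread(target=delete_zero_values) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared)  # {'a': 1, 'c': 1}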


Another article: a performance comparison of two ways of traversing a dict

This concerns the performance difference between traversing dict.items() with and without parentheses around the loop variables:


data = {'key1': 1, 'key2': 2}   # sample data; the original snippet used a variable named dict

for (d, x) in data.items():
    print("key:" + d + ",value:" + str(x))

for d, x in data.items():
    print("key:" + d + ",value:" + str(x))

In my tests, with fewer than about 200 entries the form with parentheses performed better, while beyond roughly 200 entries the form without parentheses took less time.
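A minimal sketch for reproducing the comparison with timeit; the dictionary size of 1000 and the repeat count are arbitrary illustrative choices:

import timeit

setup = "data = {str(i): i for i in range(1000)}"

# Time the same loop body with and without parentheses around the loop variables.
with_parens = timeit.timeit("for (k, v) in data.items(): pass", setup=setup, number=10000)
without_parens = timeit.timeit("for k, v in data.items(): pass", setup=setup, number=10000)

print("with parentheses:    %.4f s" % with_parens)
print("without parentheses: %.4f s" % without_parens)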

A dictionary is written with curly braces ({}); its items come in pairs, with each key corresponding to one value. The key and the value are separated by a colon (:), and different items are separated by commas (,).

Python Shell:

n = {'username': 'zz', "password": 123}
n.keys()
dict_keys(['username', 'password'])
n.values()
dict_values(['zz', 123])

n.items()
dict_items([('username', 'zz'), ('password', 123)])

for (k, v) in n.items():
    print("this's key:%r" % k)
    print("this's value:%r" % v)

this's key:'username'
this's value:'zz'
this's key:'password'
this's value:123


zip(): takes one element from each sequence in turn and combines them into tuples.

n = [1, 2, 3]
m = ['a', 'b', 'c']
a = zip(m, n)

for i in a:
    print(i)

('a', 1)
('b', 2)
('c', 3)

n = [1, 2, 3]
m = ['a', 'b', 'c']
a = zip(m, n)

for (m, n) in a:
    print(m, n)

a 1
b 2
c 3
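Since zip() pairs keys with values, it combines naturally with dict(); a minimal sketch reusing the same two lists:

m = ['a', 'b', 'c']
n = [1, 2, 3]

# Build a dictionary directly from the two paired lists.
d = dict(zip(m, n))
print(d)  # {'a': 1, 'b': 2, 'c': 3}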
