Fall in Love with Python series - Python performance (4): list vs. set vs. dict performance comparison

I happened to see an article claiming that sets are faster than lists. Puzzled, I decided to run some experiments myself.

Experiment 1: query performance of the three containers when the data has almost no duplicates

import time
import numpy as np

## prepare the data: ten million random integers in [0, 10000000)
mylist = [int(np.random.rand() * 10000000) for i in range(10000000)]
myset = set(mylist)
mydic = {i: i for i in mylist}

# list: traverse every element
st = time.time()
for i in mylist:
    t1 = i
print(time.time() - st)

# set: traverse every element
st = time.time()
for j in myset:
    t2 = j
print(time.time() - st)

# dict: look up the value for every key
st = time.time()
for m in mydic:
    t3 = mydic[m]
print(time.time() - st)

Output:

1.7017035484313965 
1.8434038162231445 
2.733205556869507

We can see that when the data has almost no duplicate values, the dict is actually the slowest of the three.
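
Traversing every element, as above, is only one kind of access. The claim that "set is faster than list" usually refers to membership tests with the `in` operator, where the gap is dramatic: a list must scan linearly, while set and dict use a hash lookup. Here is a minimal sketch of that comparison (not part of the original experiments; the data sizes are illustrative and timings will vary by machine):

import time
import numpy as np

# smaller data so the list scans finish in reasonable time
data = [int(np.random.rand() * 1000000) for i in range(100000)]
as_set = set(data)
as_dict = {i: i for i in data}
# random probe values, some present and some absent
probes = [int(np.random.rand() * 1000000) for i in range(1000)]

st = time.time()
hits = sum(p in data for p in probes)      # O(n) linear scan per probe
print("list:", time.time() - st)

st = time.time()
hits = sum(p in as_set for p in probes)    # O(1) average per probe
print("set: ", time.time() - st)

st = time.time()
hits = sum(p in as_dict for p in probes)   # O(1) average per probe
print("dict:", time.time() - st)

On typical hardware the list is slower here by several orders of magnitude, which is likely the result the article I originally read was describing.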

Experiment 2: query performance of the three containers with a higher proportion of duplicate data

import time
import numpy as np

## prepare the data
## np.random.rand() is scaled by one factor of 10 less than in Experiment 1,
## to raise the proportion of duplicate values; Experiment 3 goes further still
mylist = [int(np.random.rand() * 1000000) for i in range(10000000)]
myset = set(mylist)
mydic = {i: i for i in mylist}

# list: traverse every element
st = time.time()
for i in mylist:
    t1 = i
print(time.time() - st)

# set: traverse every element
st = time.time()
for j in myset:
    t2 = j
print(time.time() - st)

# dict: look up the value for every key
st = time.time()
for m in mydic:
    t3 = mydic[m]
print(time.time() - st)

Output:

1.7004029750823975
0.2652003765106201
0.5343024730682373

We can see that now the list is the slowest and the set is the fastest, and the gap is fairly large.

The reason is deduplication: building the set collapses the ten million list elements down to about one million unique values, so there is far less to traverse. (Incidentally, iterating such a set often yields the numbers in what looks like sorted order; that is because CPython hashes a small non-negative integer to itself, not because sets sort their contents. Try it if you don't believe it.)

The dict is fast for the same two reasons: its keys are hash-based, and the dict comprehension deduplicates keys just as the set does, so mydic also holds only about one million entries. What this experiment really shows is that when the data contains many duplicate values, the list, which keeps every copy, becomes the slowest by a wide margin.
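
To see the deduplication concretely, check how many elements each container actually holds after the comprehensions above (a quick sketch; exact counts vary slightly from run to run):

print(len(mylist))  # 10000000: the list keeps every duplicate
print(len(myset))   # ~1000000: one copy of each distinct value
print(len(mydic))   # same as the set: dict keys are deduplicated too

The set and dict loops therefore do roughly a tenth of the work of the list loop, which matches the timings above.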

Experiment 3: query performance of the three containers with an even higher proportion of duplicates

import time
import numpy as np

## prepare the data: another factor of 10 fewer distinct values,
## so each value now repeats about 100 times on average
mylist = [int(np.random.rand() * 100000) for i in range(10000000)]
myset = set(mylist)
mydic = {i: i for i in mylist}

# list: traverse every element
st = time.time()
for i in mylist:
    t1 = i
print(time.time() - st)

# set: traverse every element
st = time.time()
for j in myset:
    t2 = j
print(time.time() - st)

# dict: look up the value for every key
st = time.time()
for m in mydic:
    t3 = mydic[m]
print(time.time() - st)

Output:

1.7057056427001953
0.031199932098388672
0.04680013656616211

The effect is even more pronounced now: after deduplication the set holds only about a hundred thousand elements, so there is very little left to traverse. Hashing shows its value here as well: the dict, whose keys are deduplicated to the same hundred thousand entries, stays close to the set, while the list, which still carries all ten million elements, remains as slow as ever.
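
As for the "sorted looking" iteration order noted after Experiment 2, a small sketch shows where it comes from: CPython hashes small non-negative integers to themselves, so a set of them tends to iterate in ascending order even though sets never sort anything (this ordering is an implementation detail, not something to rely on):

import numpy as np

print(all(hash(i) == i for i in range(1000)))       # True: small ints hash to themselves
s = set(int(np.random.rand() * 1000) for i in range(10000))
print(list(s)[:10])                                 # typically ascending, e.g. [0, 1, 2, ...]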

A separate question is the effect of element order itself. Traversing ordered data can indeed be somewhat faster, as the following comparison of a list of consecutive integers against a list of random integers shows.

import time
import numpy as np

ls1 = [i for i in range(10000000)]                                  # ordered: 0, 1, 2, ...
ls2 = [int(np.random.rand() * 10000000) for i in range(10000000)]   # unordered random values

# traverse the ordered list
st = time.time()
for i in ls1:
    t1 = i
print(time.time() - st)

# traverse the random list
st = time.time()
for j in ls2:
    t2 = j
print(time.time() - st)

Output:

1.591202974319458 
2.0755040645599365

Traversing ordered data is indeed somewhat faster, though the magnitude is modest (roughly 25% in this run) and will vary by machine; treat this result as a rough reference only.
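
Where sorted data pays off much more clearly is in point queries: on a sorted list you can replace the linear scan of the `in` operator with a binary search. A minimal sketch using the standard library's bisect module (not part of the original experiments):

import bisect

sorted_list = list(range(10000000))

def contains_sorted(lst, x):
    # binary search: O(log n) membership test on a sorted list
    idx = bisect.bisect_left(lst, x)
    return idx < len(lst) and lst[idx] == x

print(contains_sorted(sorted_list, 1234567))   # True, ~23 comparisons
print(1234567 in sorted_list)                  # True, but scans over a million elements

For unsorted data, though, converting to a set once and testing membership against it is usually the simpler and faster option.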

Origin blog.csdn.net/zhou_438/article/details/109231927