Unanticipated Python Dictionary Behavior

Harshal Parekh:

I have this piece of code:

import time

d = dict()
for i in range(200000):
    d[i] = "DUMMY"

start_time = time.time()

for i in range(200000):
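    # the condition below is always true here, so this inner loop
    # only ever touches the first key before breaking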
    for key in d:
        if len(d) > 1 or -1 not in d:
            break
    del d[i]

print("--- {} seconds ---".format(time.time() - start_time))

Why does this take ~15 seconds to run?

But if I comment out del d[i] or the inner loop, it runs in ~0.1 seconds.

Blckknght:

The issue you have is caused by iterating over even one element (e.g. next(iter(d))) of a dictionary that was once large but has been shrunk a great deal. This can be nearly as slow as iterating over all of the dictionary's items if you get unlucky with your hash values. And this code is very "unlucky" (predictably so, due to Python's hash design).
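To make that concrete, here is a small timing sketch of my own (not from the original post): fetching just the first key with next(iter(d)) from a dictionary shrunk from 200000 entries down to one is dramatically slower than doing the same on a dictionary that was small to begin with.

import time

d = {i: "DUMMY" for i in range(200000)}
for i in range(199999):    # shrink the dict down to a single key
    del d[i]

start = time.time()
for _ in range(1000):
    next(iter(d))          # has to skip ~200000 dead entries on every call
print("shrunk dict: {:.4f} s".format(time.time() - start))

small = {199999: "DUMMY"}  # the same single item in a freshly built dict
start = time.time()
for _ in range(1000):
    next(iter(small))
print("fresh dict:  {:.4f} s".format(time.time() - start))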

The reason for the issue is that Python does not rebuild a dictionary's hash table when you remove items. So the hash table for a dictionary that used to have 200000 items in it, but which now has only 1 left, still has more than 200000 slots in it (and probably quite a few more, since it was probably not entirely full even at its peak).
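A small sketch (mine, not from the answer) makes the unchanged table visible: sys.getsizeof reports the same footprint for the dictionary after 199999 deletions as when it was full, and only copying it into a fresh dict produces a right-sized table.

import sys

d = {i: "DUMMY" for i in range(200000)}
print("full:   ", len(d), "items ->", sys.getsizeof(d), "bytes")

for i in range(199999):
    del d[i]
print("shrunk: ", len(d), "item  ->", sys.getsizeof(d), "bytes")  # same size as before

rebuilt = dict(d)  # copying into a new dict builds a fresh, appropriately sized table
print("rebuilt:", len(rebuilt), "item  ->", sys.getsizeof(rebuilt), "bytes")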

When you iterate over the dictionary while it still has all its values, finding the first one is easy: it will be in one of the first few table entries. But as you empty out the table, more and more blank spaces accumulate at the start of the table, and the search for the first value that still exists takes longer and longer.
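A rough illustration of that "longer and longer" effect (again my own sketch, not from the answer): the more keys that have been deleted from the front of the table, the slower it is to find the first surviving key.

import time

d = {i: "DUMMY" for i in range(200000)}
deleted = 0

for checkpoint in (50000, 100000, 150000, 199999):
    while deleted < checkpoint:   # delete keys up to the next checkpoint
        del d[deleted]
        deleted += 1
    start = time.time()
    for _ in range(1000):
        next(iter(d))             # must skip `deleted` dead entries each time
    print("{} deleted: {:.4f} s for 1000 first-key lookups".format(
        deleted, time.time() - start))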

This is made even worse by the fact that you're using integer keys, which (mostly) hash to themselves (only -1 hashes to something else). This means that the first key in the "full" dictionary will usually be 0, the next 1, and so on. Because you delete the values in increasing order, you remove precisely the earliest keys in the table first, making the searches maximally slow.
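You can check that integer-hashing claim directly (a quick sketch, not part of the answer): in CPython, integers hash to themselves (modulo a large prime for very big values), with -1 as the lone exception because -1 is reserved as an error code in the C-level hash API.

print(hash(0), hash(1), hash(41), hash(199999))  # 0 1 41 199999
print(hash(-1))                                  # -2: the one exception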
