Comparing and Copying Python Objects

We have already come across many examples of comparing and copying Python objects, such as the following if statement, which checks whether a and b are equal:


if a == b:
    ...

As a second example, here l2 is a copy of l1:


l1 = [1, 2, 3]
l2 = list(l1)

But you may not know exactly what happens behind these statements. For example:

Is l2 a shallow copy or a deep copy of l1?
Does a == b compare the values of the two objects, or does it check whether they are the very same object?

‘==’ VS ‘is’

The equality operator (==) and is are the two most common ways to compare objects in Python. Simply put, the '==' operator checks whether the values of two objects are equal. For example, the following expression checks whether the values that variables a and b refer to are equal:


a == b

The 'is' operator checks whether two objects have the same identity, that is, whether they are the same object and refer to the same memory address.

In Python, the identity of an object can be obtained with the function id(object). The 'is' operator is therefore equivalent to checking whether the ids of two objects are equal. Let's look at the following example:


a = 10
b = 10

a == b
True

id(a)
4427562448

id(b)
4427562448

a is b
True

Here, Python first allocates a piece of memory for the value 10, and then variables a and b both refer to that memory area. In other words, a and b both refer to the same object 10, so their values are equal and their ids are equal: a == b and a is b both return True.
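The relationship between 'is' and id() can be checked directly. The following is a minimal sketch that uses lists instead of integers, so the outcome does not depend on any caching; the variable names first, second and alias are only illustrative.

```python
first = [1, 2, 3]
second = [1, 2, 3]   # a separate list with the same contents
alias = first        # another name for the same list object

# Equal values, but two distinct objects.
print(first == second)   # True
print(first is second)   # False

# 'is' gives the same answer as comparing id() values.
print((first is second) == (id(first) == id(second)))  # True
print(alias is first, id(alias) == id(first))          # True True
```

Assigning one name to another (alias = first) never copies anything; it only adds a second reference to the same object.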

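However, note that for integers, the conclusion that a is b is True only holds for values in the range -5 to 256, and this is an implementation detail of CPython rather than a language guarantee. Consider the following example, entered line by line in an interactive session (inside a single script, the compiler may reuse one object for identical literals, so the result can differ):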


a = 257
b = 257

a == b
True

id(a)
4473417552

id(b)
4473417584

a is b
False

Here we assign 257 to both a and b. We can see that a == b still returns True, because the values that a and b refer to are equal. Strangely, though, a is b returns False, and the ids of a and b are different. Why?

In fact, for performance reasons, CPython internally maintains an array of the integers from -5 to 256, which acts as a cache. Every time you create an integer in the range -5 to 256, Python returns a reference to the corresponding entry in this array instead of allocating new memory.

If the integer falls outside this range, such as the 257 in the example above, Python allocates two separate memory areas for the two 257s, so the ids of a and b are different and a is b returns False.
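The cache can be observed from a script as well, as long as the integers are built at run time so that the compiler cannot merge identical literals. The following is a minimal sketch; same_object is a hypothetical helper name, and the results describe CPython only, since other interpreters are free to behave differently.

```python
def same_object(text):
    """Build the integer twice at run time and report whether one object was reused."""
    first = int(text)
    second = int(text)
    return first is second


print(same_object("256"))        # True on CPython: 256 comes from the small-integer cache
print(same_object("257"))        # expected False on CPython builds whose cache ends at 256
print(int("257") == int("257"))  # True: the values are equal regardless of caching
```

This is one more reason to compare numbers with '==' and never with 'is': identity of integers is an interpreter detail, while equality is defined by the language.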

Generally speaking, in everyday work we use '==' far more often than 'is' when comparing variables, because we usually care about the values of two variables rather than where they are stored. However, when we compare a variable with a singleton, we usually use 'is'. A typical example is checking whether a variable is None:


if a is None:
    ...

if a is not None:
    ...

Note that the 'is' operator is usually faster than '=='. Because 'is' cannot be overloaded, Python does not need to look up whether the operator has been overloaded somewhere and then call that code. Executing 'is' only compares the ids of two variables.

The '==' operator is different. Executing a == b is roughly equivalent to executing a.__eq__(b). Most data types in Python overload the __eq__ function, and its internal processing is usually more involved. For a list, for example, __eq__ walks through the elements and checks that they are equal and in the same order.
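The difference between an overloadable '==' and a fixed 'is' can be made concrete with a small sketch. AlwaysEqual is a hypothetical class written only for this illustration; it is deliberately badly behaved to show why None checks are written with 'is'.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

print(obj == None)   # True: '==' dispatches to AlwaysEqual.__eq__
print(obj is None)   # False: 'is' only compares identities

# For lists, __eq__ compares element by element, in order.
print([1, 2, 3] == [1, 2, 3])  # True
print([1, 2, 3] == [3, 2, 1])  # False
```

Because 'is' ignores __eq__ entirely, a test such as obj is None cannot be fooled by a class that overloads equality.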

What about immutable variables: once we have compared two of them with '==' or 'is', will the result always stay the same?

Not necessarily. Let's look at the following example:


t1 = (1, 2, [3, 4])
t2 = (1, 2, [3, 4])
t1 == t2
True

t1[-1].append(5)
t1 == t2
False

We know that tuples are immutable, but tuples can be nested: their elements can be lists, and lists are mutable. If we modify a mutable element inside a tuple, the value of the tuple changes as well, so a '==' result obtained earlier may no longer hold.
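What "immutable" means for a tuple can be sketched as follows: the tuple can never be made to refer to different objects, but an object it refers to may still change in place. The names tuple_id_before and inner_id_before are only illustrative.

```python
t1 = (1, 2, [3, 4])
tuple_id_before = id(t1)
inner_id_before = id(t1[-1])

t1[-1].append(5)          # mutate the list stored inside the tuple

print(t1)                              # (1, 2, [3, 4, 5])
print(id(t1) == tuple_id_before)       # True: still the same tuple object
print(id(t1[-1]) == inner_id_before)   # True: still the same inner list

try:
    t1[0] = 99            # assigning to a slot of the tuple is not allowed
except TypeError as exc:
    print("TypeError:", exc)
```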

Shallow copy and deep copy

Next, let's take a look at the shallow copy and deep copy in Python.

Rather than throwing definitions at you and asking you to memorize them, let's start from how these two familiar operations are used and understand the difference between them through code.

Let's look at the shallow copy first. The common shallow copy method is to use the constructor of the data type itself, such as the following two examples:


l1 = [1, 2, 3]
l2 = list(l1)

l2
[1, 2, 3]

l1 == l2
True

l1 is l2
False

s1 = set([1, 2, 3])
s2 = set(s1)

s2
{1, 2, 3}

s1 == s2
True

s1 is s2
False

Here, l2 is a shallow copy of l1, and s2 is a shallow copy of s1. For mutable sequences, we can also use the slicing operator ':' to make a shallow copy, as in the following list example:


l1 = [1, 2, 3]
l2 = l1[:]

l1 == l2
True

l1 is l2
False

Python also provides a dedicated function, copy.copy(), which works for objects of almost any type:


import copy
l1 = [1, 2, 3]
l2 = copy.copy(l1)
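To illustrate that copy.copy() is not limited to lists, here is a minimal sketch applying it to a dict and to an instance of a user-defined class. Point is a hypothetical class introduced only for this example.

```python
import copy


class Point:
    """Hypothetical class used only for this illustration."""

    def __init__(self, x, tags):
        self.x = x
        self.tags = tags


d1 = {"numbers": [1, 2, 3]}
d2 = copy.copy(d1)
print(d1 == d2, d1 is d2)               # True False
print(d1["numbers"] is d2["numbers"])   # True: the inner list is shared

p1 = Point(1, ["a"])
p2 = copy.copy(p1)
print(p1 is p2)            # False: a new Point instance
print(p1.tags is p2.tags)  # True: attribute values are shared, not copied
```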

Note, however, that for tuples, tuple() and the full slice ':' do not create a shallow copy. In CPython they return a reference to the same tuple instead:


t1 = (1, 2, 3)
t2 = tuple(t1)

t1 == t2
True

t1 is t2
True

Here, the tuple (1, 2, 3) is created only once, and t1 and t2 point to this tuple at the same time.

At this point, shallow copies should be clear. A shallow copy allocates new memory and creates a new object whose elements are references to the child objects of the original. If the elements of the original object are immutable, this does not matter. If they are mutable, however, a shallow copy can have side effects that deserve special attention. Let's look at the following example:


l1 = [[1, 2], (30, 40)]
l2 = list(l1)
l1.append(100)
l1[0].append(3)

l1
[[1, 2, 3], (30, 40), 100]

l2
[[1, 2, 3], (30, 40)]

l1[1] += (50, 60)
l1
[[1, 2, 3], (30, 40, 50, 60), 100]

l2
[[1, 2, 3], (30, 40)]

In this example, we first initialize a list l1 whose elements are a list and a tuple, then make a shallow copy of l1 and assign it to l2. Because the elements of a shallow copy are references to the elements of the original object, the elements of l2 and l1 refer to the same list and tuple objects.

Next comes l1.append(100), which appends the element 100 to the list l1. This has no effect on l2, because l1 and l2 as a whole are two different objects and do not share a memory address. After the operation, l2 is unchanged and l1 becomes:


[[1, 2], (30, 40), 100]

Then comes l1[0].append(3), which appends the element 3 to the first list inside l1. Because l2 is a shallow copy of l1, the first element of l2 and the first element of l1 refer to the same list, so the first list in l2 also gains the new element 3. Both l1 and l2 change after this operation:


l1: [[1, 2, 3], (30, 40), 100]
l2: [[1, 2, 3], (30, 40)]

Last is l1[1] += (50, 60). Because tuples are immutable, this concatenates onto the second element of l1 by creating a new tuple and storing it as the second element of l1. l2 does not refer to the new tuple, so it is not affected. After the operation, l2 is unchanged and l1 becomes:


l1: [[1, 2, 3], (30, 40, 50, 60), 100]

This example clearly shows the possible side effects of a shallow copy. If we want to avoid them and copy an object completely, we have to use a deep copy.
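The sharing described in the walkthrough above can be verified with 'is'. This is a minimal sketch of the same scenario; the names original and shallow are only illustrative stand-ins for l1 and l2.

```python
original = [[1, 2], (30, 40)]
shallow = list(original)

# The outer lists are different objects...
print(original is shallow)          # False
# ...but each slot refers to the same child object.
print(original[0] is shallow[0])    # True
print(original[1] is shallow[1])    # True

original[0].append(3)               # in-place change to the shared list
print(shallow[0])                   # [1, 2, 3]

original[1] += (50, 60)             # builds a new tuple and rebinds original[1] only
print(original[1] is shallow[1])    # False
print(shallow[1])                   # (30, 40)
```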

The so-called deep copy refers to reallocating a piece of memory, creating a new object, and recursively copying the elements in the original object to the new object by creating new sub-objects. Therefore, the new object has no relation to the original object.

Python provides copy.deepcopy() to make a deep copy of an object. For example, the previous example can be rewritten as follows to use a deep copy:


import copy
l1 = [[1, 2], (30, 40)]
l2 = copy.deepcopy(l1)
l1.append(100)
l1[0].append(3)

l1
[[1, 2, 3], (30, 40), 100]

l2 
[[1, 2], (30, 40)]

We can see that no matter how l1 changes, l2 stays the same, because l1 and l2 are now completely independent of each other.
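The independence of a deep copy can also be checked level by level with 'is'. The following is a minimal sketch with a nested structure chosen for illustration; original and deep are hypothetical names.

```python
import copy

original = [[1, 2], {"key": [10]}]
deep = copy.deepcopy(original)

print(original == deep)                        # True: equal values right after copying
print(original[0] is deep[0])                  # False: the inner list was copied too
print(original[1]["key"] is deep[1]["key"])    # False: copied at every level

original[0].append(3)
original[1]["key"].append(20)
print(deep)                                    # [[1, 2], {'key': [10]}]
```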

However, deep copy is not perfect and can bring problems of its own. If the object being copied contains a reference to itself, a naive recursive copy could easily fall into an infinite loop:


import copy
x = [1]
x.append(x)

x
[1, [...]]

y = copy.deepcopy(x)
y
[1, [...]]

In the example above, the list x contains a reference to itself, so x is an infinitely nested list. Yet after deep copying x to y, the program did not hit a stack overflow. Why is this?

This is because the deepcopy function maintains a dictionary that records the objects already copied, keyed by their ids. During copying, if the object to be copied is already stored in the dictionary, the stored copy is returned directly. The corresponding source code makes this clear:


def deepcopy(x, memo=None, _nil=[]):
    """Deep copy operation on arbitrary Python objects.

    See the module's __doc__ string for more info.
    """

    if memo is None:
        memo = {}

    d = id(x)  # look up the id of the object x being copied
    y = memo.get(d, _nil)  # check whether this object is already stored in the dictionary
    if y is not _nil:
        return y  # if it has already been copied, return the stored copy directly
    ...
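The effect of the memo dictionary can be observed from the outside, and the idea can be sketched in a few lines. In the block below, deepcopy_lists is a hypothetical, heavily simplified helper that handles nested lists only; it is an illustration of the memo technique, not the implementation used by the copy module.

```python
import copy

x = [1]
x.append(x)                 # x now contains itself
y = copy.deepcopy(x)

print(y is x)               # False: y is a new list
print(y[1] is y)            # True: the self-reference points at the copy

shared = [0]
pair = [shared, shared]     # the same list referenced twice
pair_copy = copy.deepcopy(pair)
print(pair_copy[0] is pair_copy[1])   # True: copied once, then reused from the memo
print(pair_copy[0] is shared)         # False


def deepcopy_lists(obj, memo=None):
    """Simplified sketch: deep-copy nested lists only, using an id-keyed memo."""
    if memo is None:
        memo = {}
    if not isinstance(obj, list):
        return obj                      # non-list values are returned as-is in this sketch
    key = id(obj)
    if key in memo:
        return memo[key]                # already copied: reuse that copy
    result = []
    memo[key] = result                  # register before recursing, so cycles terminate
    for item in obj:
        result.append(deepcopy_lists(item, memo))
    return result


z = deepcopy_lists(x)
print(z is not x, z[1] is z)            # True True
```

The key design point is that the new object is stored in the memo before its children are copied; when the recursion reaches the self-reference, it finds the entry and stops.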