python - assignment and deep and shallow copy

Reprinted by: Eva_J

Friends who are beginners in programming will have some doubts about the usage of deep and shallow copying. Today we will talk about assignment and deep and shallow copying from the perspective of memory in combination with the characteristics of python variable storage~~~

Preliminary knowledge one - python variables and their storage

Before learning about assignment, copy and deepcopy in python in detail, we still need to spend a little time to understand the storage of variables in python memory.

  In high-level languages, a variable is an abstraction of memory and its address. For python, all variables in python are objects, and the storage of variables adopts the way of reference semantics. What is stored is only the memory address where the value of a variable is located, not the variable itself.

Reference semantics: In python, variables hold references to objects (values), which we call reference semantics. In this way, the storage space required by the variable is the same size, because the variable only holds a reference. Also known as object semantics and pointer semantics.
Value semantics: Some languages ​​do not use this method. They store the value of the variable directly in the storage area of ​​the variable. This method is called value semantics. For example, in C language, using this storage method, each variable is stored in The space occupied in the memory depends on the actual size of the variable and cannot be fixed.
The difference between value semantics and reference semantics:
Value Semantics: Dead, Silly, Simple, Concrete, Reproducible
Quote Semantics: Living, Smart, Complex, Abstract, Unreproducible

Detailed explanation of reference semantics and value semantics
Detailed explanation of reference semantics and value semantics

Let's look at a simple and easy-to-understand diagram to understand the storage of Python's reference semantics and C language value semantics in memory. The left and right diagrams show the difference between variable storage in python and variable storage in C language:

Preliminary knowledge 2 - address storage and change of each basic data type

Data types in python include: bool, int, long, float, str, set, list, tuple, dict, and more. We can roughly categorize these data types into simple data types and complex data structures.

My standard of division is that if a data type can have other data types as its own elements, I consider it a data structure. There are many classifications of data structures, but only three structures are commonly used in Python: set, sequence, and map. Corresponds to set, list (tuple, str), dict in python; commonly used data types include int, long, float, bool, str and other types. (Among them, the str type is special, because from the perspective of the C language, str is actually a collection of chars, but this has little relevance to this article, so we will not talk about this issue for the time being)
Data structure and its division basis

Since the variables in python use reference semantics, the data structure can contain basic data types, resulting in the storage of data in python as shown in the figure below. Each variable stores the address of the variable instead of the value. itself; for complex data structures, only the address of each element is stored in it. :

1. The impact of data type reinitialization on Python semantic references

Every time a variable is initialized, a new space is opened, and the address of the new content is assigned to the variable. For the following figure, we repeatedly assign values ​​to str1, but the changes in memory are as follows:

From the above figure, we can see that str1 is in the repeated initialization process because the address of the element stored in str1 is changed from the address of 'hello world' to 'new hello world'.

2. The impact of changes in internal elements of the data structure on Python semantic references

For complex data types, the effect of changing the internal value on the variable:

When performing some additions, deletions and modifications to the elements in the list, it will not affect the address of the lst1 list itself to the entire list, only the address reference of its internal elements will be changed. However, when we re-initialize (assign) a list, we re-assign an address to the lst1 variable, overwriting the address of the original list. At this time, the memory id of the lst1 list changes. The same principle applies to all complex data types.

variable assignment

After understanding the above content, and then discussing the assignment of variables, it becomes very simple.

1. assignment of str

 

We have just known that the re-initialization (assignment) of str1 will cause the memory address to change. From the results in the above figure, we can see that after modifying str1, the assigned str2 is not affected from the memory address to the value.

  Looking at the changes in memory, the initial assignment operation allows both str1 and str2 variables to store the address of 'hello world', and re-initializes str1, so that the address stored in str1 has changed, pointing to the new value, at this time The memory address stored by the str2 variable has not changed, so it is not affected.

2. Assignment in complex data structures

Just now we looked at the assignment of simple data types, and now we look at the impact of changes in complex data structures on memory .

The above figure adds and modifies the list without changing the memory address of the list. Both lst1 and lst2 have changed.

Comparing the memory map, it is not difficult to see that when a new value is added to the list, the address of a new element is stored in the list, but the address of the list itself does not change, so the ids of lst1 and lst2 have not changed and have been added. a new element. In a simple analogy, when we go out to eat, lst1 and lst2 are like two people eating at the same table. Two people share a table. As long as the table remains unchanged, the dishes on the table have changed. The two people feel the same.

first acquaintance copy

We have seen the process of variable assignment in detail. For complex data structures, assignment is equivalent to completely sharing resources, and changes to one value are completely shared by another value.

However, sometimes, we need to keep a copy of the original content of a piece of data, and then process the data. At this time, it is not wise to use assignment. Python provides the copy module for this need. Two main copy methods are provided, one is normal copy and the other is deepcopy. We call the former a shallow copy and the latter a deep copy. Deep and shallow copying has always been an important knowledge point for all programming languages. Let's analyze the difference between the two from the perspective of memory.

shallow copy

First, let's look at shallow copying. Shallow copy: No matter how complex the data structure, a shallow copy will only copy one layer. Let's take a look at a picture to understand the concept of shallow copy.

 

看上面两张图,我们加入左图表示的是一个列表sourcelist,sourcelist = ['str1','str2','str3','str4','str5',['str1','str2','str3','str4','str5']];

右图在原有的基础上多出了一个浅拷贝的copylist,copylist = ['str1','str2','str3','str4','str5',['str1','str2','str3','str4','str5']];

sourcelist和copylist表面上看起来一模一样,但是实际上在内存中已经生成了一个新列表,copy了sourceLst,获得了一个新列表,存储了5个字符串和一个列表所在内存的地址。 我们看下面分别对两个列表进行的操作,红色的框框里面是变量初始化,初始化了上面的两个列表;我们可以分别对这两个列表进行操作,例如插入一个值,我们会发现什么呢?如下所示:

 

从上面的代码我们可以看出,对于sourceLst和copyLst列表添加一个元素,这两个列表好像是独立的一样都分别发生了变化,但是当我修改lst的时候,这两个列表都发生了变化,这是为什么呢?我们就来看一张内存中的变化图:

 

我们可以知道sourceLst和copyLst列表中都存储了一坨地址,当我们修改了sourceLst1的元素时,相当于用'sourceChange'的地址替换了原来'str1'的地址,所以sourceLst的第一个元素发生了变化。而copyLst还是存储了str1的地址,所以copyLst不会发生改变。

当sourceLst列表发生变化,copyLst中存储的lst内存地址没有改变,所以当lst发生改变的时候,sourceLst和copyLst两个列表就都发生了改变。

这种情况发生在字典套字典、列表套字典、字典套列表,列表套列表,以及各种复杂数据结构的嵌套中,所以当我们的数据类型很复杂的时候,用copy去进行浅拷贝就要非常小心。。。

深拷贝

刚刚我们了解了浅拷贝的意义,但是在写程序的时候,我们就是希望复杂的数据结构之间完全copy一份并且它们之间又没有一毛钱关系,应该怎么办呢?

  我们引入一个深拷贝的概念,深拷贝——即python的copy模块提供的另一个deepcopy方法。深拷贝会完全复制原变量相关的所有数据,在内存中生成一套完全一样的内容,在这个过程中我们对这两个变量中的一个进行任意修改都不会影响其他变量。下面我们就来试验一下。

看上面的执行结果,这一次我们不管是对直接对列表进行操作还是对列表内嵌套的其他数据结构操作,都不会产生拷贝的列表受影响的情况。我们再来看看这些变量在内存中的状况:

 

看了上面的内容,我们就知道了深拷贝的原理。其实深拷贝就是在内存中重新开辟一块空间,不管数据结构多么复杂,只要遇到可能发生改变的数据类型,就重新开辟一块内存空间把内容复制下来,直到最后一层,不再有复杂的数据类型,就保持其原引用。这样,不管数据结构多么的复杂,数据之间的修改都不会相互影响。这就是深拷贝~~~

 

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=325007645&siteId=291194637