05-Advanced Reading: Quotation, Copy and View

The annual Apple mobile phone launch conference will definitely firmly occupy the hot search on the Weibo list. Although the "slots" continue, but you still have to buy what you should buy, and you are not reluctant to start. The new model iPhone Xs Maxfinally broke through the sky smoothly, reaching 12799 yuan, which is worthy of being a magical machine. We might as well imagine that this year's year-end bonus issued 100,000 bills, decided to reward ourselves and the object, each bought the same Apple mobile phone, the configuration, color, price, and even the power-on password are all set to the same, absolute couples. So the question is, are these two identical phones?

If my mobile phone is defined as a variable M, then the target mobile phone is defined as N, then how to judge whether the two are the same?

M == N

If the above-mentioned judgment is equal, then the result is definitely Truea. Because this method is based on the content of variables, since the configuration, color, function, etc. of the two phones are the same, it must be True. Of course, there are no two hardware configurations that are exactly the same in the world.

M is N

If a isjudge it? This answer seems a bit mysterious, indeed, the answer should be False. The judgment here is based on identity, that is, is my phone yours? Obviously not.

The above example a good explanation copy copyof this notion that my cell phone and cell phone each other object copy, but we are not the same!

This article will start with the quotation and copying of Python, and elaborate on the copying mechanism of Numpy in depth to help you understand the relationship between array variables and variables during data analysis and data processing.

1. Python

  • Python references

Object reference is the process of assignment. Let's give a chestnut:

a = ["a", "b", "c"]
b = a
a is b
Out:
    True

In the above chestnuts where we aagain assigned to b, so the Pythoninternals of view, this assignment process, in fact, just another new layer of references (references here, similar to the concept of pointers), from the establishment bto the real listData reference. The schematic diagram of this process is as follows:

image description

If you say that this picture is not intuitive, I will show you this picture:

image-20200815180904194

So I use it in isthe comparison aand b, when it is more common points to the same piece of memory, identity is a natural judge True. Obviously here we amake changes, bwill follow the changes, and vice versa. This is the concept of reference.

Analyzing two variables have identity, can also use id()the function, the method may obtain the memory address of the object. For example, for the above two variables, we can do the following:

# 在具体的环境中进行内存地址查询时,地址编码一般情况下会不一样
# 但是深层次的规律是一样的,即a和b的内存地址相同
id(a)
Out: 1430507484680
id(b)
Out: 1430507484680
  • Deep copy and shallow copy of Python

Before that, let me make a conclusion that the memory in numbers and strings point to the same address, so deep copy and shallow copy are meaningless to them. In other words, we study deep copy and shallow copy, both for variable objects. The most common case is a list.

  1. Deep copy

The so-called deep copy is to create an identical copied object based on the variable object being copied. In addition to the two look exactly the same, they are independent of each other and will not affect each other.

Let's take the list as an example, for a single-level conventional list chestnut:

# 对于单层常规列表,经过深拷贝操作后,拷贝后得到的对象的任何操作均无法改变原始对象
# Python的原生拷贝操作需要导入copy包,其中的deepcopy()函数表示深拷贝
import copy
m = ["Jack", "Tom", "Brown"]
n = copy.deepcopy(m)

# 判断二者是否相等,结果显而易见
m == n
Out: True

# 判断二者是否具有同一性,答案为否。也就是说,列表m和列表n是存储在不同的地址里。
m is n
Out: False

# 改变m首位的元素,发现n并无变化,说明二者互不影响
m[0] = "Helen"
m
Out: ['Helen', 'Tom', 'Brown']
n
Out: ['Jack', 'Tom', 'Brown']

The above case illustrates, we are of variable object musing a deep copy when it is complete copy mof the data structure, and assigned to the n. This is also in line with our intuitive understanding.

  1. Shallow copy

In fact, since Pythonthe copythere are deeper than the other, it is clear that there must be a shallow copy is not the same place. If it is only for a single-level regular list composed of immutable objects, there is no difference between shallow copy and deep copy. Here we simply do a test:

# 用copy库中的copy方法来表示浅拷贝
import copy
m = ["Jack", "Tom", "Brown"]
n = copy.copy(m)

# 判断浅拷贝前后是否具有同一性,答案是不具备同一性
m is n
Out: False

# 更改m的值,发现n并无任何变化。这里的规律保持和深拷贝一致
m[0] = "Black"
n
Out: ['Jack', 'Tom', 'Brown']

What if it is a nested list? There will be some differences. We create a 2-level list, the memory uses a list of length 3 to represent the student’s name, height and weight; the first of the outer list represents the class:

# students列表的长度为3,其中首位为字符串,其他位均为列表
students = ["Class 1", ["Jack", 178, 120], ["Tom", 174, 109]]
students_c = copy.copy(students)

# 查看内嵌的列表是否具备同一性
students[1] is students_c[1]
Out: True

# 尝试更改students中某位学生的信息,通过测试更改后的students和students_c
students[1][1] = 180
students
Out: ['Class 1', ['Jack', 180, 120], ['Tom', 174, 109]]
students_c
Out: ['Class 1', ['Jack', 180, 120], ['Tom', 174, 109]]
## 发现:students_c也跟着改变了,这说明对于嵌套列表里的可变元素(深层次的数据结构),浅拷贝并没有进行拷贝,只是对其进行了引用

# 我们接着尝试更改students中的班级信息
students[0] = "Class 2"
students
Out: ['Class 2', ['Jack', 180, 120], ['Tom', 174, 109]]
students_c
Out: ['Class 1', ['Jack', 180, 120], ['Tom', 174, 109]]
## 发现:students_c没有发生变化,这说明对于嵌套列表里的不可变元素,浅拷贝和深拷贝效果一样

Through the above research, we can get the following conclusions:

1) A list composed of immutable objects, shallow copy and deep copy have the same effect, and they are independent of each other before and after copying, and do not affect each other;

2) When the list contains variable elements, the shallow copy only creates a reference (pointer) from the element to the new list. When the element changes, the copied object will also change;

3) Deep copy does not consider saving memory at all, shallow copy saves memory relatively, shallow copy only copies the first level elements;

  1. Slice and shallow copy

Generally speaking, we copy the list, and slicing is a broad and convenient operation. If we change the structure of the list obtained after slicing, will it cause the source list to change?

Let's first conclude: slicing is actually a shallow copy of some elements of the source list!

# 我们沿用上面的students列表的数据,通过对students进行切片等一系列微操作
students = ["Class 1", ["Jack", 178, 120], ["Tom", 174, 109]]
students_silce = students[:2]

# 对students的前2项进行切片,并赋值给students_silce;
# 修改students_silce的第二项,修改其中身高值,并比较源列表和切片结果的变化
students_silce[-1][1] = 185
students_silce
Out: ['Class 1', ['Jack', 185, 120]]
students
Out: ['Class 1', ['Jack', 185, 120], ['Tom', 174, 109]]
## 比较发现,切片结果的变化值,也传递给了源列表。说明可变元素的数据结构只是被引用,没有被复制。

# 修改students_silce的第一项,修改班级名,并比较源列表和切片结果的变化
students_silce[0] = "Class 3"
students_silce
Out: ['Class 3', ['Jack', 185, 120]]
students
Out: ['Class 1', ['Jack', 185, 120], ['Tom', 174, 109]]
# 比较发现,切片结果的变化值,没有传递给了源列表。说明对于不可变元素,切片前后互相独立。

## 综合比较,可以发现,切片的效果其实就是浅拷贝!

The shallow copy and deep copy of Python are the basis for understanding the principles of Python. A deep understanding of the difference between the two is very helpful for subsequent advancement and understanding of multidimensional arrays.

2. Numpy

As we mentioned earlier, Numpy has optimized memory in order to adapt to the characteristics of big data. The so-called optimization is to reduce the memory overhead during the slicing process on the premise of saving memory.

For Numpy, we mainly distinguish two concepts, namely, view and copy. (Note that because multidimensional arrays can be regarded as nested lists, the concept of shallow copy of nested lists is also applicable here. In fact, the effect of multidimensional array views can be understood as shallow copies of nested lists. Copy is the same as The concept of deep copy is basically the same)

The view viewis a reference data, by this reference, you can easily access, operate the original data, but will not copy the original data. If we modify the view, it will affect the original data because their physical memory is in the same location.

A copy is a complete copy of the data ( Pythonthe concept of medium-deep copy). If we modify the copy, it will not affect the original data, and their physical memory is not in the same location.

  • view view

Create a view, we can in two ways: Numpythe slicing operation and call view()functions.

We look at Lee calls the view()case a function to create a view:

# 视图是新建了一个引用,但是更改视图的维数,并不会引起原始数组的变化
import numpy as np
arr_0 = np.arange(12).reshape(3,4)
view_0 = arr_0.view()
view_0
Out: array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11]])

# 从id看,二者并不具备同一性。
id(arr_0) is view_0
Out: False
# 更改视图的元素,则原始数据会产生联动效果
view_0[1,1] = 100
view_01
Out: array([[  0,   1,   2,   3],
           [  4, 100,   6,   7],
           [  8,   9,  10,  11]])

# 更改视图的维度:
# 视图的纬度更改并不会传递到原始数组
view_0.shape = (4,3)
print("arr_0 shape:", arr_0.shape, "view_0 shape:", view_0)
Out: arr_0 shape: (3, 4) view_0 shape: (4, 3)

Friends who use slices to create views are familiar, let's take a look at the effect of testing on a one-dimensional array:

# 对一维数组切片,并对切片后的结果进行更改,查看是否对原始数组产生影响
arr_1 = np.arange(12)
slice_1 = arr_1[:6]
slice_1[3] = 99
slice_1
Out: array([ 0,  1,  2, 99,  4,  5])
# arr_1的第四个元素发生变成了99。在数组是1维的时候,规律和列表有些不一样,这里要特别注意。
arr_1
Out: array([ 0,  1,  2, 99,  4,  5,  6,  7,  8,  9, 10, 11])
  • Copy

A copy is also a deep copy. Relatively speaking, the processing of memory is rough and easy to understand. Before and after the copy is created, the two variables are completely independent.

# Numpy建立副本的方法稍有不同。
# 方法一,是利用Numy自带的copy函数;
# 方法二,是利用deepcopy()函数。这里我们重点讲解方法一:
arr_2 = np.array([[1,2,3], [4,5,6]])
copy_2 = arr_2.copy()
copy_2[1,1] = 500
copy_2
Out: array([[  1,   2,   3],
           [  4, 500,   6]])
arr_2
Out: array([[1, 2, 3],
           [4, 5, 6]])
# 比较发现,建立副本后,二者互不影响。符合上面的结论。

3. Summary

The knowledge points explained in this chapter are somewhat theoretical, but it is a step that friends must take in the process from entry to advanced. If you can’t master it for the first time, you can bring in doubts and deepen your understanding while practicing in the follow-up practice.

In fact, to sum up, it is mainly to distinguish the three concepts of direct assignment, shallow copy and deep copy. The difficulty is to understand the concept of shallow copy.

The shallow copy is in Pythonthe native list, and it is necessary to distinguish whether it is a nested list. If it is a nested list, then the underlying list and the copied result will change as one party changes. If it is a list of immutable elements, then there is no difference between shallow copy and deep copy.

Shallow copy Numpyis simpler in. No matter the dimension of the array, when you modify the view or slice result at the element level, the effect of the operation will be reflected in the original array.

At this point, Numpythe main content is about to come to an end. The next chapter will lead you through Pandasthe process from entry to master.

Guess you like

Origin blog.csdn.net/qq_33254766/article/details/108362831