Summary of common Python interview knowledge (1): iterators, copies, threads and underlying structures

Preface: Hello everyone, my name is Dream. Today I will summarize common interview knowledge for Python and C. Everyone is welcome to discuss and learn together~

[1] What is the concept of iterator in Python?

Iterable objects are the foundation on which iterators and generators are built. Put simply, any object that can be traversed with a for loop is an iterable object, such as the common list, set, and dict types.

Let's look at an example:

from collections.abc import Iterable    # use collections.abc; importing Iterable from collections was removed in Python 3.10
print(isinstance('abcddddd', Iterable))      # is a str iterable?

print(isinstance([1,2,3,4,5,6], Iterable))   # is a list iterable?

print(isinstance(12345678, Iterable))        # is an integer iterable?

-------------Result as follows----------------
True
True
False

If you call dir() on any iterable object, you will find that it implements the __iter__ method. Calling iter(object) on it then returns an iterator.

x = [1, 2, 3]
y = iter(x)
print(type(x))

print(type(y))

------------Result as follows------------
<class 'list'>
<class 'list_iterator'>

You can see that after calling iter(), the list becomes a list_iterator object. Calling dir() on it shows that a __next__ method has been added: any object that implements both the __iter__ and __next__ methods is an iterator.

An iterator is a stateful object: it records the current iteration position so that the next iteration can return the correct element. __iter__ returns the iterator itself, __next__ returns the next value in the container, and if there are no more elements a StopIteration exception is raised.
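
To make this concrete, here is a minimal hand-written iterator (an illustrative Countdown class, not from the interview question itself) that implements both methods:

class Countdown:
    """A minimal iterator that counts down from start to 1."""
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator's __iter__ returns the iterator itself.
        return self

    def __next__(self):
        if self.current <= 0:
            # No more elements: signal the end of iteration.
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

for n in Countdown(3):
    print(n)    # prints 3, 2, 1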

x = [1, 2, 3]
y = iter(x)
print(next(y))
print(next(y))
print(next(y))
print(next(y))

----------Result as follows----------
1
2
3
Traceback (most recent call last):
  File "/Users/Desktop/test.py", line 6, in <module>
    print(next(y))
StopIteration

Determining whether an object is an iterator works just like checking whether it is iterable: simply replace Iterable with Iterator (also found in collections.abc).
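
For example (a quick sketch using collections.abc):

from collections.abc import Iterable, Iterator

x = [1, 2, 3]
y = iter(x)
print(isinstance(x, Iterator))   # False: a list is iterable but not an iterator
print(isinstance(y, Iterator))   # True
print(isinstance(y, Iterable))   # True: every iterator is also iterable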

Python's for loop is essentially implemented by repeatedly calling the next() function. For example, the following code first converts the iterable object into an iterator and then iterates over it. This saves memory, because the iterator does not actually compute the next value until next() is called.

x = [1, 2, 3]
for elem in x:
    ...
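
Roughly speaking, the loop above expands to the following sketch (hand-written here for illustration):

x = [1, 2, 3]
it = iter(x)              # first obtain an iterator from the iterable
while True:
    try:
        elem = next(it)   # compute/fetch the next value on demand
    except StopIteration:
        break             # the for statement catches StopIteration and ends quietly
    ...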

The itertools module provides many commonly used ready-made iterators.

>>> from itertools import count     # a counter
>>> counter = count(start=13)
>>> next(counter)
13
>>> next(counter)
14
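
A couple of other handy iterators from itertools, shown here as an illustrative supplement:

>>> from itertools import cycle, islice
>>> colors = cycle(['red', 'green', 'blue'])    # repeats the sequence endlessly
>>> list(islice(colors, 5))                     # take only the first five values
['red', 'green', 'blue', 'red', 'green']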

[2] Related knowledge of generators in Python

When we create a list, memory constraints mean its capacity is necessarily limited, and it is impossible to enumerate all of its elements at once. A major shortcoming of Python's ordinary list comprehension is that the whole list is generated the moment it is defined, which can be a huge waste of space and time.

If the list elements can be calculated according to a certain algorithm, then we can continuously calculate subsequent elements during the loop, so that we do not have to create a complete list, thus saving a lot of space. In Python, this mechanism of looping and calculating at the same time is called a generator: generator.

The simplest way to create a generator is to change a list comprehension's square brackets into parentheses:

a = [x * x for x in range(10)]
print(a)
b = (x * x for x in range(10))
print(b)

--------Result as follows--------------
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
<generator object <genexpr> at 0x10557da50>
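
The generator expression only produces values when asked for them; for example (an illustrative sketch):

b = (x * x for x in range(10))
print(next(b))   # 0
print(next(b))   # 1
print(sum(b))    # 284, consuming the remaining values 4 + 9 + ... + 81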

Another method is to define the generator function through def, and then use yield to support the iterator protocol, which is simpler to write than iterators.

def spam():
    yield "first"
    yield "second"
    yield "third"

for x in spam():
    print(x)

-------Result as follows---------
first
second
third

Calling the generator function returns a generator object. When next() is called, the function runs until it encounters a yield, returns that value, and records where it paused. The next call to next() resumes execution from that breakpoint.
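
Stepping through spam() by hand shows this behaviour (illustrative):

gen = spam()
print(next(gen))   # first  : runs until the first yield and pauses there
print(next(gen))   # second : resumes right after the previous yield
print(next(gen))   # third
# one more next(gen) would raise StopIteration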

We can use generators exactly like iterators, apart from how they are defined. To write an iterator class you must implement the __iter__() and __next__() methods yourself, whereas a generator only needs a single yield.

A generator also has send() and close() methods. send(value) can only deliver a value when the generator is suspended at a yield, so the generator must first be started with next() (or send(None)).

Python supports coroutines, i.e. "micro-threads", which can be implemented with generators. With generators we can control exactly where a function pauses and resumes, effectively doing the scheduling ourselves.
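
A minimal generator-based coroutine sketch (illustrative, not from the original question) showing send() in action:

def averager():
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average    # pause here; send() delivers value and resumes
        total += value
        count += 1
        average = total / count

coro = averager()
next(coro)             # "prime" the coroutine: run it to the first yield
print(coro.send(10))   # 10.0
print(coro.send(30))   # 20.0
coro.close()           # shut the coroutine down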

[3] Related knowledge of decorators in Python

A decorator allows us to add extra functionality to an existing function by passing that function to the decorator: the decorated function still does its original work, with the extra behaviour added around it.

A decorator is essentially a function that can add functionality to an existing function without changing the original code.

Next we use some examples to illustrate the role of decorators:

If we don't use a decorator, we usually add the log by editing the function itself, like this:

# original function
def foo():
    print('i am foo')

# rewritten to log before doing its work
def foo():
    print('foo is running')
    print('i am foo')

Although this meets the requirement, it modifies the original code, and if other functions also need logging, every one of them has to be rewritten in the same way, so the code cannot be reused.

We can rewrite it as follows:

import logging

def use_log(func):
    logging.warning("%s is running" % func.__name__)
    func()

def bar():
    print('i am bar')

use_log(bar)    # pass the function in as an argument

-------------Result as follows--------------
WARNING:root:bar is running
i am bar

This does let us reuse the logging code, but the drawback is that it wraps the original function explicitly: every call site has to be changed to use_log(bar). We would prefer this to happen implicitly.

We can use decorators to write:

import logging

def use_log(func):
    def wrapper(*args, **kwargs):
        logging.warning('%s is running' % func.__name__)
        return func(*args, **kwargs)

    return wrapper


def bar():
    print('I am bar')


bar = use_log(bar)
bar()

------------Result as follows------------
WARNING:root:bar is running
I am bar

Here use_log is the decorator: it wraps the function bar() that we actually want to run and returns a new function with the extra code added, so bar() appears to have been "decorated".

But this is still not implicit enough. The @ syntactic sugar does the job of bar = use_log(bar) for us.

import logging

def use_log(func):
    def wrapper(*args, **kwargs):
        logging.warning('%s is running' % func.__name__)
        return func(*args, **kwargs)

    return wrapper


@use_log
def bar():
    print('I am bar')


@use_log
def haha():
    print('I am haha')


bar()
haha()

------------Result as follows------------
WARNING:root:bar is running
I am bar
WARNING:root:haha is running
I am haha

This is concise and easy to reuse: the decorator acts as a smart, higher-level form of wrapping.
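
One practical detail worth knowing: the wrapper replaces the original function, so bar.__name__ becomes 'wrapper'. Applying functools.wraps to the wrapper preserves the original function's metadata; a minimal sketch:

import functools
import logging

def use_log(func):
    @functools.wraps(func)    # copy __name__, __doc__, etc. from func onto wrapper
    def wrapper(*args, **kwargs):
        logging.warning('%s is running' % func.__name__)
        return func(*args, **kwargs)
    return wrapper

@use_log
def bar():
    print('I am bar')

print(bar.__name__)    # prints 'bar' instead of 'wrapper'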

[4] Python’s deep copy and shallow copy?

In Python, assigning one variable to another does not copy the object; it simply attaches another "label" to the object already in memory.

>>> a = [6, 6, 6, 6]
>>> b = a
>>> print(id(a), id(b), sep = '\n')
66668888
66668888

>>> a is b
True    (as you can see, a and b actually point to the same object in memory)

Shallow copy refers to creating a new object, whose content is a reference to the elements in the original object (the new object shares sub-objects in memory with the original object).

Note: The difference between shallow copy and deep copy is only for composite objects. The so-called composite objects are objects that contain other objects, such as lists, class instances, etc. For numbers, strings, and other "atomic" types, there is no copy, and all generated are references to the original objects.

Common ways to make a shallow copy include: slicing, factory functions such as list(), an object's copy() method, and the copy() function in the copy module.

>>> a = [6, 8, 9]
>>> b = list(a)
>>> print(id(a), id(b))
4493469248 4493592128    # a and b have different addresses

>>> for x, y in zip(a, b):
...     print(id(x), id(y))
... 
4489786672 4489786672
4489786736 4489786736
4489786768 4489786768
# but their elements (sub-objects) have the same addresses

As the example shows, b is a shallow copy of a: a and b point to different list objects in memory, but their elements point to the same int objects. That is what a shallow copy means.

Deep copy refers to creating a new object and then recursively copying the sub-objects contained in the original object. The deep-copied object has no relationship with the original object.

There is only one way to deep copy: the deepcopy function in the copy module.

We next use a list of mutable objects to demonstrate exactly the difference between a shallow copy and a deep copy:

>>> import copy
>>> a = [[6, 6], [8, 8], [9, 9]]
>>> b = copy.copy(a)      # shallow copy
>>> c = copy.deepcopy(a)  # deep copy
>>> print(id(a), id(b)) # a and b have different addresses
4493780304 4494523680
>>> for x, y in zip(a, b):   # but a's and b's sub-objects have the same addresses
...     print(id(x), id(y))
... 
4493592128 4493592128
4494528592 4494528592
4493779024 4493779024
>>> print(id(a), id(c))   # a and c are also different
4493780304 4493469248
>>> for x, y in zip(a, c): # and a's and c's sub-objects have different addresses too
...     print(id(x), id(y))
... 
4493592128 4493687696
4494528592 4493686336
4493779024 4493684896

[5] Is Python an interpreted language or a compiled language?

Python is an interpreted language (more precisely, the source is first compiled to bytecode, which the interpreter then executes).

The advantages of an interpreted language are good portability; the disadvantages are that it needs an interpreter environment to run, runs more slowly than a compiled language, and uses more resources.

The advantages of a compiled language are fast execution and high efficiency, and because the program is shipped in compiled form it cannot easily be modified, which also gives better confidentiality. The disadvantages are that the code must be compiled before it can run, and the result is less portable, running only on compatible operating systems.

[6] Python’s garbage collection mechanism

In Python, garbage collection is based primarily on reference counting; a mark-and-sweep algorithm is used to solve the problem of circular references that can occur among container objects; and finally, a generational collection algorithm is used to improve garbage collection efficiency.
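
These mechanisms can be observed through the standard library; a small illustrative sketch:

import gc
import sys

a = []
print(sys.getrefcount(a))    # reference count (one extra reference comes from the call itself)

print(gc.get_threshold())    # generational collection thresholds, e.g. (700, 10, 10)

# A circular reference that reference counting alone cannot reclaim:
x = []
x.append(x)
del x
print(gc.collect())          # the cycle detector finds and frees the unreachable cycle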

[7] Are there multiple threads in Python?

Multi-threading in Python is fake multi-threading.

Because of the GIL (Global Interpreter Lock) in the design of the CPython interpreter, multiple threads cannot make use of multiple cores for Python bytecode: only one thread runs in the interpreter at any given moment.

For I/O-intensive tasks, Python's multi-threading can play a role, but for CPU-intensive tasks, Python's multi-threading has almost no advantage and may slow down due to competition for resources.

For I/O-oriented code (which calls into the operating system's built-in C routines), the GIL is released before the I/O call, allowing other threads to run while this thread waits for the I/O to complete.

If a program is purely computational with no I/O, the interpreter periodically releases the lock so that other threads get a chance to run (in Python 2 this happened every 100 bytecode instructions and could be tuned with sys.setcheckinterval; in Python 3 the switch is time-based and tuned with sys.setswitchinterval). A thread that performs few I/O operations will therefore hold the processor and the GIL for its whole time slice.

Ways to mitigate the GIL: use multiple processes, or use coroutines (coroutines run on a single CPU core, but they reduce switching overhead and can improve performance).
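
A minimal sketch of sidestepping the GIL with multiple processes for a CPU-bound task (illustrative; count is a made-up helper):

from multiprocessing import Pool

def count(n):
    # purely CPU-bound work, no I/O
    total = 0
    for i in range(n):
        total += i
    return total

if __name__ == '__main__':
    # Four worker processes, each with its own interpreter and its own GIL.
    with Pool(processes=4) as pool:
        results = pool.map(count, [10_000_000] * 4)
    print(results)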

[8] What is the difference between range and xrange in Python?

In Python 2, the xrange function and the range function are used in exactly the same way. The difference is that xrange does not build a list: it returns a lazy xrange object that produces its numbers on demand, much like a generator.

When generating a very long sequence of numbers, xrange performs much better than range because it does not need to allocate a large block of memory up front.

Python 2.7.15 | packaged by conda-forge | (default, Jul  2 2019, 00:42:22) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> xrange(10)
xrange(10)
>>> list(xrange(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

The xrange function and the range function are generally used when looping. Specific examples are as follows:

>>> for i in range(0,7):
...     print(i)
... 
0
1
2
3
4
5
6

>>> for i in xrange(0,7):
...     print(i)
... 
0
1
2
3
4
5
6

In Python 3, the xrange function has been removed and only range remains, but range now combines the behaviour of both: its return type has changed from a list (as in Python 2) to a lazy range sequence object.
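
In Python 3, range behaves as a lazy sequence object rather than a list (illustrative):

>>> r = range(10)
>>> r
range(0, 10)
>>> type(r)
<class 'range'>
>>> len(r), r[3], 5 in r     # supports len(), indexing and membership tests without building a list
(10, 3, True)
>>> list(r)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]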

[9] What is the difference between lists and tuples in Python?

  1. Lists are mutable and can be modified at will after creation.

  2. Tuples are immutable. Once a tuple is created, it cannot be changed. The tuple can be treated as a read-only version of the list.

  3. Tuples cannot really be "copied": because a tuple is immutable, tuple(t) and copy.copy(t) simply return the same object.

  4. Because a tuple's size is fixed, Python can allocate it as a single block with little overhead, whereas a list is over-allocated to make appends cheap. A tuple therefore takes less memory than a list holding the same elements, and creating and iterating over tuples can be slightly faster, as the sketch after this list shows.
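
A small sketch of points 3 and 4 (illustrative; exact sizes vary by Python version and platform):

import copy
import sys

lst = [1, 2, 3, 4, 5]
tpl = (1, 2, 3, 4, 5)
print(sys.getsizeof(lst), sys.getsizeof(tpl))   # the list is larger than the tuple

# "Copying" a tuple just returns the same object, because it is immutable.
print(copy.copy(tpl) is tpl)    # True
print(copy.copy(lst) is lst)    # False: the list copy is a new object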

[10] What is the underlying structure of dict (dictionary) in Python?

Python's dict (dictionary) uses a hash table as its underlying structure to support fast lookup; the average lookup time in a hash table is O(1). The CPython interpreter resolves hash collisions with open addressing, probing other slots according to a perturbation-based sequence.
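
A rough illustration of the idea (a simplified toy sketch, not CPython's actual implementation):

# Toy open-addressing hash table illustrating how a dict lookup works.
def toy_lookup(table, key):
    size = len(table)
    index = hash(key) % size           # initial slot derived from the key's hash
    while table[index] is not None:
        stored_key, stored_value = table[index]
        if stored_key == key:
            return stored_value        # found the key in this slot
        index = (index + 1) % size     # collision: probe another slot (CPython uses a perturbed sequence)
    raise KeyError(key)

# Build a tiny table by hand for the demonstration.
table = [None] * 8
for k, v in [('a', 1), ('b', 2)]:
    i = hash(k) % 8
    while table[i] is not None:
        i = (i + 1) % 8
    table[i] = (k, v)

print(toy_lookup(table, 'a'))    # 1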

[11] What are the commonly used deep learning frameworks and which companies developed them?

  1. PyTorch: Facebook

  2. TensorFlow: Google

  3. Keras: Google

  4. MXNet: the DMLC community

  5. Caffe: UC Berkeley

  6. PaddlePaddle: Baidu

[12] What is the difference between PyTorch dynamic graphs and TensorFlow static graphs?

PyTorch dynamic graph: the computation graph is built while the computation runs; this is more flexible and easier to debug.

TensorFlow static graph: the computation graph is built first and the computation is executed afterwards; this can be more efficient but is less flexible.
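
A tiny PyTorch example of the dynamic-graph style (illustrative; assumes torch is installed): the graph is built as the Python code runs, so ordinary control flow works directly.

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 if x > 1 else x * 3   # ordinary Python branching builds the graph on the fly
y.backward()                     # the graph that was just built is used for autograd
print(x.grad)                    # tensor(4.) since dy/dx = 2x at x = 2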

Recommendation at the end of the article

Content introduction:
"Machine Learning Platform Architecture in Action" elaborates on the basic solutions related to machine learning platform architecture, mainly including machine learning and machine learning solution architecture, machine Business use cases for learning, machine learning algorithms, data management for machine learning, open source machine learning libraries, Kubernetes container orchestration infrastructure management, open source machine learning platforms, building data science environments using AWS machine learning services, building enterprise machines using AWS machine learning services Learning architecture, advanced machine learning engineering, machine learning governance, bias, explainability, and privacy, building machine learning solutions using artificial intelligence services and machine learning platforms, and more. In addition, this book also provides corresponding examples and codes to help readers further understand the implementation process of related solutions.
Dangdang: https://product.dangdang.com/29625469.html
JD: https://item.jd.com/13855627.html

