Python Basics Interview Part 3

1. Get all file names in the current directory

import os

def get_all_files(directory):
    file_list = []
    # os.walk yields (current directory path, list of subdirectories, list of files) for each directory it visits
    for root, dirs, files in os.walk(directory):
        for file in files:
            file_list.append(os.path.join(root, file))
    return file_list

# Get all file names under the current directory
current_directory = os.getcwd()
files = get_all_files(current_directory)

# Print all file names
for file in files:
    print(file)
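
Note that os.walk recurses into every subdirectory. If only the files directly inside the current directory are needed, a minimal non-recursive sketch using os.listdir could look like this:

import os

# Non-recursive variant: list only regular files directly in the current directory
current_directory = os.getcwd()
for name in os.listdir(current_directory):
    if os.path.isfile(os.path.join(current_directory, name)):
        print(name)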

2. The difference between generators and iterators in Python

An iterator (Iterator) is an object that implements the iterator protocol: it must provide an __iter__() method and a __next__() method. Calling __iter__() returns the iterator itself, and calling __next__() returns the next element in sequence, raising a StopIteration exception once there are no more elements. Iterators are a form of lazy evaluation: they produce only one element at a time, when it is needed, which saves memory. An iterable object can be turned into an iterator with the built-in iter() function.

A generator (Generator) is a special kind of iterator that uses a more concise syntax. A generator is implemented with the yield keyword inside a function: when execution reaches a yield statement, the function pauses, returns a value, and saves its current state, then resumes from that point the next time it is called. Generator functions can accept parameters like ordinary functions and can contain loops, conditionals, and other logic. Generators are a very convenient and efficient way to implement iterators.

Here is a summary of the differences between generators and iterators:

  1. Syntax: generators are defined with the yield keyword, while iterators require implementing the __iter__() and __next__() methods.
  2. Implementation: generators can be defined with plain functions, while iterators are implemented as classes.
  3. State saving: a generator pauses at the yield statement and saves its current state, resuming from that point on the next call; an iterator keeps track of its position through its own internal state.
  4. Simplicity: generator syntax is more concise and uses ordinary function definitions and control-flow statements; an iterator must implement several special methods, so the code is comparatively verbose.
  5. Lazy computation: generators are lazily evaluated and produce one element at a time, only when needed; iterators can also be lazy, but the laziness has to be implemented by hand.

In short, a generator is a special kind of iterator with a more concise and convenient syntax. A generator implements the iteration process with yield statements inside a function, so the logic can be written like an ordinary function. An iterator is the more general concept: it can be implemented as a class, which requires explicitly defining the __iter__() and __next__() methods. Either way, both let you produce and process large amounts of data on demand, improving the efficiency and readability of the code.
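
As a quick illustration, here is a minimal sketch (the names CountDown and count_down are made up for this example) showing the same countdown sequence written once as an iterator class and once as a generator function:

class CountDown:                      # iterator: a class implementing __iter__ and __next__
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

def count_down(start):                # generator: the same logic expressed with yield
    while start > 0:
        yield start
        start -= 1

print(list(CountDown(3)))   # [3, 2, 1]
print(list(count_down(3)))  # [3, 2, 1]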

A small example:

When we need to iterate over a large data set, generators can help us generate data on demand instead of loading the entire data set into memory at once.

Here is a simple example where we use a generator to generate the first n elements of the Fibonacci sequence on demand:

def fibonacci_generator(n):
    a, b = 0, 1
    count = 0
    while count < n:
        yield a
        a, b = b, a + b
        count += 1

# Use the generator to produce the first 10 elements of the Fibonacci sequence on demand
fibonacci = fibonacci_generator(10)

# Print the generated elements one by one
for num in fibonacci:
    print(num)

In the code above, we define a generator function fibonacci_generator that uses a yield statement to produce the elements of the Fibonacci sequence. Each time the generator's __next__() method is called, it runs up to the yield statement, returns the current Fibonacci number, and pauses execution while saving its state. On the next call to __next__(), it resumes from where it paused and produces the next Fibonacci number. In this way, iterating over the generator yields the elements of the Fibonacci sequence on demand: each element is produced only when it is needed rather than the whole sequence being built at once, which saves memory, especially when the sequence is long.

To sum up, a generator can be seen as a special kind of function that produces data on demand, saves memory, and provides a concise and convenient way to implement an iterator. By using generators, we avoid loading a large amount of data into memory at once and instead generate items one by one as they are needed, improving the efficiency and scalability of the code.
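
To make the pause-and-resume behaviour concrete, here is a small sketch that drives the fibonacci_generator defined above by hand with the built-in next() function:

gen = fibonacci_generator(3)

print(next(gen))  # 0 - runs to the first yield and pauses
print(next(gen))  # 1 - resumes after the yield and computes the next value
print(next(gen))  # 1 - resumes again
# One more next(gen) would raise StopIteration, because count has reached n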

3. What is an iterable object and what is its principle?

An iterable (Iterable) is an object that can be traversed by iteration. In many programming languages, iteration means visiting the elements of a collection one by one in some order. In Python, an iterable is an object whose traversal is driven by the iterator protocol: calling iter() on it yields an iterator that produces its elements.

The iterator protocol contains two methods:

  1. __iter__() method: This method returns an iterator object. Iterator objects are used to implement specific iteration logic and must contain the __next__() method.

  2. __next__() method: This method returns the next element in the iterator. If there are no elements to return, it should raise a StopIteration exception.

When we iterate using an iterable object, it is actually done through an iterator object. The iterator object is responsible for tracking the current iteration state and providing the next element. The iterator object calls the __next__() method on each iteration and returns the next element until all elements have been traversed or a StopIteration exception is raised.

Many built-in data types and containers in Python are iterable objects, such as List, Tuple, Dictionary, Set, etc. In addition, we can also implement iterable objects through custom classes. We only need to define the __iter__() method in the class and return an iterator object in this method.

Here is an example showing how to iterate using iterable and iterator objects:

# Create an iterable object
my_list = [1, 2, 3, 4, 5]

# Get an iterator object from it
my_iter = iter(my_list)

# Iterate using the iterator object
try:
    while True:
        item = next(my_iter)
        print(item)
except StopIteration:
    pass

In the example above, we call the iter() function on my_list to obtain the iterator object my_iter, then repeatedly call the next() function to fetch the next element from the iterator and print it, until every element has been traversed and a StopIteration exception is raised. The principle behind iterable objects is the iterator protocol: the iterator object's __next__() method supplies the next element in the sequence. This mechanism lets us access and process the elements of a collection one by one through a single, unified iteration interface.

A small example of implementing an iterable object yourself:

class MyIterable:
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        self.index = 0
        return self

    def __next__(self):
        if self.index < len(self.data):
            item = self.data[self.index]
            self.index += 1
            return item
        else:
            raise StopIteration

# Create an instance of the iterable
my_iterable = MyIterable([1, 2, 3, 4, 5])

# Iterate over it with a for loop
for item in my_iterable:
    print(item)
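
One design note: because MyIterable returns itself from __iter__(), the same object serves as both iterable and iterator, so two loops over the same instance share one index. A common alternative, sketched below with the made-up names NumberIterator and Numbers, is to return a fresh iterator object from each __iter__() call:

class NumberIterator:
    def __init__(self, data):
        self.data = data
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.data):
            raise StopIteration
        item = self.data[self.index]
        self.index += 1
        return item

class Numbers:
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        # Each call creates an independent iterator, so nested or repeated loops do not interfere
        return NumberIterator(self.data)

numbers = Numbers([1, 2, 3])
for x in numbers:
    for y in numbers:
        print(x, y)   # prints all 9 pairs, which the self-returning version would not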

4. What are the main differences between Python 2 and Python 3:

Some major differences between Python 2.x and Python 3.x:

  1. Print function: in Python 2.x, print is a statement and uses syntax like print "Hello". In Python 3.x, print is a built-in function and requires parentheses, for example print("Hello").

  2. Integer division: in Python 2.x, dividing two integers truncates the result to an integer, e.g. 5 / 2 gives 2. In Python 3.x, division keeps the fractional part and returns a float, e.g. 5 / 2 gives 2.5. For truncating (floor) division, use the // operator.

  3. Unicode strings: in Python 2.x, strings are split into ordinary (byte) strings and Unicode strings (written with a u prefix), which causes considerable confusion in string handling. In Python 3.x, all strings are Unicode; raw byte sequences use the separate bytes type, written with a b prefix.

  4. range replaces xrange: in Python 2.x, the range function returns a list, so generating a large range of integers consumes a lot of memory. In Python 3.x, range behaves like Python 2.x's xrange and returns a lazy range object instead of a list, which saves memory.

  5. Exception syntax: in Python 2.x, exceptions are caught with the syntax except Exception, e, storing the exception object in the variable e. In Python 3.x, the syntax is except Exception as e.

  6. input function: in Python 2.x, input evaluates whatever the user types as Python code, which is a security risk. In Python 3.x, input always returns the user's input as a string and does not evaluate it.

Beyond these major differences, Python 3.x introduces other improvements, including refined class-definition syntax, better module management and import machinery, and more consistent exception handling and error reporting. As a result, Python 2.x code cannot run directly under Python 3.x without some modification. The short sketch below illustrates a few of these behaviours in Python 3.
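
A minimal Python 3 sketch of the points above (print as a function, true vs. floor division, str vs. bytes, lazy range):

print("Hello")          # print is a function and needs parentheses

print(5 / 2)            # 2.5 - true division returns a float
print(5 // 2)           # 2   - floor division truncates

text = "héllo"          # str is Unicode by default
data = b"hello"         # a bytes literal uses the b prefix
print(type(text), type(data))

print(type(range(10)))  # <class 'range'> - a lazy range object, not a list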

5. Multithreading and multiprocessing in Python

  1. Multithreading:

    • Multithreading means creating several threads within one process, each independently executing a different task. Threads share the process's memory space, so data can be shared between them easily.
    • In Python, the threading module is used to create and manage threads: create an instance of the Thread class, specify the function or method to run, and call start() to launch the thread (a usage sketch of both modules follows after the summary below).
    • Because of the Global Interpreter Lock (GIL), only one thread can execute Python bytecode at a time, so multithreading cannot fully utilize multiple cores and its concurrency is limited. Multithreading suits I/O-bound tasks such as network requests and file reads/writes; for CPU-bound tasks it does not improve performance.
  2. Multiprocessing:

    • Multiprocessing means creating several independent processes, each with its own memory space and system resources. Processes are independent of one another and can run different tasks in parallel. Each process has its own Python interpreter, so multiple cores can be fully utilized, improving concurrency.
    • In Python, the multiprocessing module is used to create and manage processes: create an instance of the Process class, specify the function or method to run, and call start() to launch the process.
    • Processes can share data through inter-process communication (IPC). Python provides several IPC mechanisms, such as queues (Queue), pipes (Pipe), and shared memory.

Summary:

  • Multithreading suits I/O-bound tasks and can improve a program's responsiveness and efficiency.
  • Multiprocessing suits CPU-bound tasks and can fully utilize multi-core processors to improve concurrency.
  • In Python, multithreading is constrained by the GIL, while multiprocessing bypasses the GIL and achieves true parallel execution.
  • When using threads or processes, pay attention to thread safety and process safety to avoid data races and conflicts over shared resources.
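
A minimal sketch of both APIs (the worker functions io_task and cpu_task are made-up placeholders):

import threading
import multiprocessing

def io_task(name):
    # Placeholder for an I/O-bound job (network request, file read, ...)
    print(f"thread worker {name} running")

def cpu_task(n):
    # Placeholder for a CPU-bound job
    print(f"process worker computed {sum(i * i for i in range(n))}")

if __name__ == "__main__":
    # Threads: convenient for I/O-bound work, limited by the GIL for CPU-bound work
    t = threading.Thread(target=io_task, args=("t1",))
    t.start()
    t.join()

    # Processes: each has its own interpreter and GIL, so CPU-bound work can run in parallel
    p = multiprocessing.Process(target=cpu_task, args=(100_000,))
    p.start()
    p.join()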

6. The GIL in Python:

The Global Interpreter Lock (GIL) is a feature of the CPython interpreter. It is a mechanism that guarantees only one thread executes Python bytecode at any given moment. While the GIL keeps the CPython interpreter thread-safe, it also places limits on concurrent multithreaded execution.

Here is a more detailed explanation and interpretation of the GIL:

  1. What the GIL does:

    • The GIL's main purpose is to protect CPython's internal data structures from concurrent access, keeping the interpreter thread-safe.
    • CPython uses reference counting (Reference Counting) as its primary memory-management mechanism. The GIL guarantees that updates to reference counts are atomic, avoiding race conditions and memory leaks.
    • The GIL also simplifies the implementation of the CPython interpreter, making it simpler and more efficient.
  2. Effects of the GIL:

    • Because of the GIL, only one thread can execute Python bytecode at a time while the other threads are blocked. Multithreading therefore cannot fully utilize multiple cores or achieve true parallel execution.
    • For CPU-bound tasks, the GIL means multithreading does not improve performance; in fact, because of thread-switching overhead, a multithreaded version can run more slowly than a single-threaded one.
    • The GIL's impact on I/O-bound tasks is relatively small, because a thread releases the GIL while performing I/O, allowing other threads to run. Multithreading can therefore still offer a performance benefit for I/O-heavy work.
  3. Working around the GIL:

    • Use multiple processes: each process has its own Python interpreter and its own GIL, so multiprocessing can fully utilize multiple cores and run in parallel.
    • Use extension modules: some extensions, such as NumPy and Pandas, are written in C/C++ and can release the GIL, allowing threads to run in parallel.
    • Use other concurrency libraries: modules such as multiprocessing and concurrent.futures provide alternatives that, in some situations, bypass the GIL's limitations.

Note that the GIL exists only in the CPython interpreter; other implementations (such as Jython and IronPython) may not have one. Moreover, for many kinds of applications, such as I/O-bound programs or programs that do little concurrent processing, the GIL's impact is small and multithreading remains a reasonable way to achieve concurrency. For CPU-bound tasks and applications that need to fully utilize multiple cores, however, consider multiprocessing or other solutions to work around the GIL, as sketched below.
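
As a rough sketch of sidestepping the GIL for CPU-bound work, the standard-library concurrent.futures module can run the same job with a thread pool and a process pool (the helper busy() is made up; actual timings depend on the machine):

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import time

def busy(n):
    # CPU-bound work in pure Python bytecode, so threads contend for the GIL
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    work = [2_000_000] * 4

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as ex:
        list(ex.map(busy, work))
    print("threads:  ", time.perf_counter() - start)

    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=4) as ex:
        list(ex.map(busy, work))
    print("processes:", time.perf_counter() - start)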


Source: blog.csdn.net/NHB456789/article/details/135221348