Summary of Python interview questions (updated in October 2022)

Interview question catalog (more content to be added later)

Python basics

The difference between new-style classes and old-style classes in Python

With multiple inheritance, new-style classes determine the method resolution order (MRO) with the C3 linearization algorithm, while old-style classes (which exist only in Python 2) use a depth-first search.
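The effect of C3 is easiest to see with diamond inheritance. The following is a minimal sketch for Python 3, where every class is new-style; the class names A, B, C and D are made up for illustration.

```python
# Diamond inheritance: D inherits from B and C, which both inherit from A.
class A:
    def who(self):
        return "A"


class B(A):
    pass


class C(A):
    def who(self):
        return "C"


class D(B, C):
    pass


# C3 places C before A, so D finds C.who first.
# A depth-first search (D, B, A, C) would have found A.who instead.
mro_names = [cls.__name__ for cls in D.__mro__]
print(mro_names)  # ['D', 'B', 'C', 'A', 'object']
print(D().who())  # C
```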

The difference between the __init__ and __new__ methods in Python

When an instance of a class is created, __new__ is called first to create the instance, and then __init__ is called to assign the given arguments to the new instance. If __init__ is called directly on an existing instance instead of creating a new one, only __init__ runs. __new__ returns the instance, while __init__ returns nothing.
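The call order can be observed by recording each call. This is a small sketch; the Point class and the calls list are hypothetical names used only for the demonstration.

```python
calls = []


class Point:
    def __new__(cls, *args, **kwargs):
        calls.append("__new__")
        # object.__new__ takes only the class; the arguments go to __init__.
        return super().__new__(cls)

    def __init__(self, x, y):
        calls.append("__init__")
        self.x = x
        self.y = y


p = Point(1, 2)      # __new__ creates the object, then __init__ fills it in
print(calls)         # ['__new__', '__init__']

p.__init__(3, 4)     # re-initializes the same object; __new__ is not called
print(calls)         # ['__new__', '__init__', '__init__']
print(p.x, p.y)      # 3 4
```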

What problem does the singleton pattern solve in Python? How is it implemented?

The singleton pattern solves the problem of the same class being instantiated repeatedly in a program. For example, a logging class may be instantiated in many places, leaving multiple objects of the class in the program and hurting performance. The singleton pattern avoids this overhead. There are generally two ways to implement it. The first overrides the __new__ method; the code snippet is as follows:

# Create a single-instance class, Singleton
class Singleton:
    def __new__(cls, *args, **kwargs):
        if not hasattr(cls, '_instance'):
            # object.__new__ accepts only the class, so the arguments are not forwarded
            cls._instance = super().__new__(cls)
        return cls._instance


class A(Singleton):
    pass

>>> a = A()
>>> id(a)
1302990860592
>>> b = A()
>>> id(b)
1302990860592
>>> a==b
True
>>> a is b
True

The second way is to use a decorator. A decorator adds functionality without modifying the original code, and is generally used for cross-cutting concerns such as checking function arguments and logging. The implementation code is as follows:

def singleton(cls):
    # Cache of the single instance created for each decorated class
    instances = {}

    def _singleton(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]
    return _singleton


@singleton
class MyClass3(object):
    pass

What are the scopes of Python variables? How far does each scope reach?

Name lookup can be summarized by the LEGB rule:

  • L is the local scope: variables defined inside the current function are searched first.
  • E is the enclosing scope: if the name is not found, the scopes of the enclosing (outer) functions are searched.
  • G is the global scope: if the name is still not found, the module-level scope is searched.
  • B is the built-in scope: if the name is not found in the global scope, the built-in names that come with Python are searched.
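The four lookup steps above can be sketched with one variable name defined at several levels. The function names here are invented for illustration.

```python
name = "global"              # G: module-level scope


def outer():
    name = "enclosing"       # E: scope of the enclosing function

    def inner_local():
        name = "local"       # L: found first
        return name

    def inner_enclosing():
        return name          # no local name, so the enclosing scope is used

    return inner_local(), inner_enclosing()


def use_global():
    return name              # neither local nor enclosing: falls back to global


def use_builtin():
    return len("abc")        # len is defined only in the built-in scope


print(outer())        # ('local', 'enclosing')
print(use_global())   # global
print(use_builtin())  # 3
```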

The GIL in Python

To keep the interpreter thread-safe, CPython takes a global interpreter lock (GIL) while a thread is running, so only one thread executes Python code at any moment. As a result, multithreading gives no speedup for CPU-bound work, and multiple processes or coroutines are generally used to improve processing speed. A process may contain multiple threads, and its tasks make progress by switching between them. Thread switching takes time, however, and the moment of switching is controlled by the system. Coroutines were proposed for this reason: the program itself decides when to switch. In Python, coroutines were originally built on yield.
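The idea that the code itself chooses the switch point can be sketched with plain generators and a tiny round-robin scheduler. This is only an illustration of cooperative switching, not how asyncio is implemented; task and run_round_robin are hypothetical names.

```python
from collections import deque


def task(label, steps, log):
    for i in range(steps):
        log.append(f"{label}{i}")
        yield  # hand control back to the scheduler at a point chosen by the code


def run_round_robin(tasks):
    """Resume each task in turn until all of them have finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run the task up to its next yield
            ready.append(current)  # not finished yet: queue it again
        except StopIteration:
            pass                   # the task has finished


log = []
run_round_robin([task("a", 2, log), task("b", 2, log)])
print(log)  # ['a0', 'b0', 'a1', 'b1']
```

Both tasks run in a single thread, so no lock is needed around the shared log list.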

The difference between iterators and generators in Python

Generators have been available since Python 2. The yield keyword sets a pause point during execution and returns a result at that point; next() resumes execution from the pause point until the next yield or until the function itself returns. A generator is therefore a simplified way of writing an iterator.

A generator is a special iterator that uses yield to return results, without manually implementing the __iter__ and __next__ methods. An ordinary iterator must implement __iter__ to return the iterator object itself, and __next__ to return the next value.
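The same countdown written both ways shows the difference. This is a minimal sketch; Countdown and countdown are example names.

```python
class Countdown:
    """Iterator written by hand: needs both __iter__ and __next__."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown(start):
    """Generator: yield provides __iter__ and __next__ automatically."""
    while start > 0:
        yield start
        start -= 1


print(list(Countdown(3)))  # [3, 2, 1]
print(list(countdown(3)))  # [3, 2, 1]

gen = countdown(2)
print(iter(gen) is gen)    # True: a generator is its own iterator
```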

How does Python implement thread communication and process communication?

Threads communicate through a queue (queue.Queue) using the producer-consumer pattern: the consumer watches the queue and stops once there is nothing left to process. Inter-process communication can use multiprocessing.Manager.
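A minimal producer-consumer sketch with queue.Queue follows. An empty queue does not by itself tell the consumer that the producer has finished, so this sketch assumes a sentinel value (SENTINEL, a name chosen here) to mark the end of the data.

```python
import queue
import threading

SENTINEL = None  # assumed end-of-data marker, not part of the queue API


def producer(q, items):
    for item in items:
        q.put(item)
    q.put(SENTINEL)  # tell the consumer that no more items will arrive


def consumer(q, results):
    while True:
        item = q.get()  # blocks until an item is available
        if item is SENTINEL:
            break
        results.append(item * 2)


q = queue.Queue()
results = []
threads = [
    threading.Thread(target=producer, args=(q, [1, 2, 3])),
    threading.Thread(target=consumer, args=(q, results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # [2, 4, 6]
```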

What is the observer pattern among design patterns?

When an object changes state, the other objects that depend on it are notified. The pattern addresses how to notify other objects of a state change while keeping the objects loosely coupled. Its disadvantages: when an object has many observers, notifying each one consumes resources; if observers and subjects reference each other in a cycle, notifications may loop and cause a crash; and the pattern only reports that a state changed, not why it changed. The sample code is as follows:

# Observer Pattern

# create Observer
class Observer:
    def update(self, temp, humidity, pressure):
        pass
    
    def display(self):
        pass

# create Subject
class Subject:
    def register_observer(self, observer):
        return
    
    def remove_observer(self, observer):
        return
    
    def notify_observer(self):
        return


class WeatherData(Subject):
    def __init__(self):
        # use to save observer
        self.observer = []
        self.temperature = 0.0
        self.humidity = 0.0
        self.pressure = 0.0
        return
    
    def register_observer(self, observer):
        self.observer.append(observer)
        return
    
    def remove_observer(self, observer):
        self.observer.remove(observer)
        return
    
    def get_Humidity(self):
        return self.humidity
    
    def get_temperature(self):
        return self.temperature
    
    def get_pressure(self):
        return self.pressure
    
    def measurements_changed(self):
        self.notify_observer()
        return
    
    def set_measuerment(self, temp, humidity, pressure):
        self.temperature = temp
        self.humidity = humidity
        self.pressure = pressure
        self.measurements_changed()
        return
    
    def notify_observer(self):
        for item in self.observer:
            item.update(self.temperature, self.humidity, self.pressure)
        return


class CurrentConditionDisplay(Observer):
    def __init__(self, weatherData):
        self.weather_data = weatherData
        self.temperature = 0.0
        self.humidity = 0.0
        self.pressure = 0.0
        weatherData.register_observer(self)
        return
    
    def update(self, temp, humidity, pressure):
        self.temperature = temp
        self.humidity = humidity
        self.pressure = pressure
        self.display()
        return
    
    def display(self):
        print("temperature = %f, humidity = %f" % (self.temperature, self.humidity))
        return


class StatiticDisplay(Observer):
    def __init__(self, weather_data):
        # The parameter is lowercase so it does not shadow the WeatherData class
        self.weather_data = weather_data
        self.temperature = 0.0
        self.humidity = 0.0
        self.pressure = 0.0
        weather_data.register_observer(self)
        return
    
    def update(self, temp, humidity, pressure):
        self.temperature = temp
        self.humidity = humidity
        self.pressure = pressure
        self.display()
        return
    
    def display(self):
        print("Statistic = %f, pressure = %f" % (self.temperature, self.pressure))
        return


if __name__ == '__main__':
    weather = WeatherData()
    display = CurrentConditionDisplay(weather)
    weather.set_measuerment(2.0, 3.0, 4.0)
    display = StatiticDisplay(weather)
    weather.set_measuerment(3.0, 4.0, 5.0)

What does good Python development practice look like?

The coding style complies with PEP 8. Comments accompany key statements and function definitions to explain what the code does. The program logs its output and handles exceptions, so that it can keep running after an error and preserve the state at the time of failure for later analysis. Data is saved as logs or in local files for easy follow-up investigation.

How to troubleshoot the data inconsistency between the front and back ends?

Check the log files and local cache files to determine where the inconsistency arises. For example, inspect the network response in the front-end console to see whether the data returned by the back end is wrong. If it is, troubleshoot the back end with breakpoints and logs. If the back-end data is correct and the problem is in the front end, troubleshoot the front-end scripts.

What is git-flow?

git-flow is a tool that standardizes the development workflow through scripts. It separates the production and development environments with two long-lived branches, master and develop. Once testing passes, a version is published through a release branch, and hotfix branches are provided for urgent fixes.

What is the difference between process, thread and coroutine? The underlying implementation logic?

  • A process is the unit in which a program is executed. A single CPU core can execute only one process at any instant; multiprocessing essentially means that several processes take turns according to scheduling rules. A process consists of a memory space (containing code, data, the process's address space, and open files) and one or more threads.

  • A process can contain multiple threads, and execution switches among them. A standard thread consists of a thread ID, the current instruction pointer (PC), a register set, and a stack.

    A process is the smallest unit to which the operating system allocates resources, and a thread is the smallest unit that the program itself can control. Switching threads costs much less than switching processes. Processes are independent of each other, while threads in the same process share that process's memory space.

What is the garbage collection mechanism?

Python mainly uses reference counting for garbage collection. Whenever an object gains a reference, its reference count is incremented; when the count drops to 0, the memory is reclaimed.

Reference counting cannot reclaim circular references, so a cycle-detecting collector with generational collection is added to assist. A newly created object is placed on the first linked list (the youngest generation). When the objects created by the user fill the first list, the garbage collector scans it, and the objects that survive are moved to the second list. There are 3 such lists, which together implement generational collection.

The garbage collection mechanism relies on the gc module provided by Python to eliminate unreachable objects in the program.
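The limitation of reference counting can be observed with two objects that reference each other. This sketch assumes CPython; Node and probe are names chosen for the demonstration, and automatic collection is switched off temporarily so that the explicit gc.collect() call is what reclaims the cycle.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


gc.disable()                 # keep the automatic collector out of the way

a, b = Node(), Node()
a.other, b.other = b, a      # a and b reference each other
probe = weakref.ref(a)       # lets us observe whether the object is still alive

del a, b                     # the counts never reach 0 because of the cycle
alive_after_del = probe() is not None

gc.collect()                 # the cycle collector finds the unreachable pair
alive_after_collect = probe() is not None

gc.enable()
print(alive_after_del, alive_after_collect)  # True False
print(len(gc.get_threshold()))               # 3 thresholds, one per generation
```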

How to solve the problem of circular imports?

When a package is imported for the first time, the code in its __init__.py is executed, together with the top-level code (global variables, imports, etc.) of the imported module. Circular import problems usually occur in this top-level code.

Solutions:

  • Import the module itself and call functions in the form module.function
  • Use lazy imports: import inside functions or at the bottom of the module
  • Redesign the code structure and use a single entry point to import and reference modules

What is Python's introspection/reflection?

When coding, the names of some attributes are sometimes only known at run time, for example because the user supplies them or because they are decided at instantiation. Attributes and methods can be looked up on, or added to, an object through strings. This behavior is called introspection/reflection.

Reflection operates on the functions of a class directly through strings. Normally a function is called by writing its name in the code. When the name arrives as a string, for example from user input, getattr(obj, function_name) looks it up and returns the function, which can then be called directly. Beforehand, hasattr(obj, function_name) can be used to check whether the class has a method with that name.
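A short sketch of these calls follows. Calculator and method_name are hypothetical; in practice the string would come from user input or configuration.

```python
class Calculator:
    def add(self, a, b):
        return a + b


calc = Calculator()
method_name = "add"  # e.g. supplied by the user at run time

# Check first, then fetch the bound method by its name and call it.
if hasattr(calc, method_name):
    method = getattr(calc, method_name)
    result = method(2, 3)
    print(result)  # 5

# Attributes can also be added by name.
setattr(calc, "precision", 2)
print(calc.precision)             # 2
print(hasattr(calc, "subtract"))  # False
```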

Database

What are the optimization methods of the database?

  • Use joins instead of subqueries. A join combines the rows of two tables by taking their Cartesian product and filtering on the values of specific columns. Common join types are Inner Join, Left Outer Join and Right Outer Join.

    JOIN algorithms

    Nested-Loop Join is the most primitive method. It caches the smaller table in memory and, for each row of the outer table, scans the cached table and outputs the rows that satisfy the join condition. It is the least efficient method and is generally used when the tables are small or when the join condition is not an equality.

    Hash Join is a common method. It reads all rows of the smaller table into a hash table, then traverses each row of the outer table and uses the equality condition (the JOIN KEY) to look up the 0 to N matching rows in the hash table. After building each result row, it checks the row against the remaining query conditions and outputs the result.

    Lookup Join is another equivalent JOIN algorithm, traversing smaller tables according to the number
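The nested-loop and hash join algorithms can be sketched in a few lines of Python. This is an illustration of the idea only, not how any particular database implements it; the users and orders tables and their column names are invented.

```python
from collections import defaultdict

# Hypothetical tables, represented as lists of rows.
users = [{"id": 1, "name": "tom"}, {"id": 2, "name": "jerry"}]
orders = [
    {"user_id": 1, "item": "book"},
    {"user_id": 1, "item": "pen"},
    {"user_id": 3, "item": "cup"},
]


def nested_loop_join(outer, inner, condition):
    """Compare every outer row with every inner row: O(len(outer) * len(inner))."""
    return [(o, i) for o in outer for i in inner if condition(o, i)]


def hash_join(outer, inner, outer_key, inner_key):
    """Equality join: build a hash table on the smaller table, then probe it."""
    table = defaultdict(list)
    for row in inner:                      # build phase
        table[row[inner_key]].append(row)
    return [                               # probe phase
        (o, i) for o in outer for i in table.get(o[outer_key], [])
    ]


by_loop = nested_loop_join(orders, users, lambda o, u: o["user_id"] == u["id"])
by_hash = hash_join(orders, users, "user_id", "id")
print(by_loop == by_hash)                              # True
print([(o["item"], u["name"]) for o, u in by_hash])    # [('book', 'tom'), ('pen', 'tom')]
```

The nested-loop version accepts any condition, while the hash version only works for equality on the join key, which matches the trade-off described above.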

How to view the SQL query time and specific query process?

Use EXPLAIN to inspect how a SQL query is executed. Pay particular attention to rows (the number of rows examined) and to the creation of temporary tables, and keep both as low as possible.

Field name      Purpose
table           The table that this row of the output refers to
type            An important column: the join type (from best to worst: const, eq_ref, ref, range, index, ALL)
possible_keys   Indexes that could be applied to the table; empty if no index is available
key             The index actually used
key_len         The length of the index used; the shorter the better
ref             Which column (or constant) is compared with the index
rows            The number of rows this query must examine
Extra           Additional information about how the query is executed

What are the optimization methods for join table query?

Solutions:

  • Add indexes: primary key index, ordinary index, unique index, full-text index, composite (multi-column) index
  • Avoid subqueries; use joins instead
  • Avoid NULL checks on a field in the WHERE clause; otherwise the index may be abandoned and a full table scan used
  • Avoid IN and NOT IN, which may also cause a full table scan; EXISTS can be used instead of IN
  • Prefer numeric types over character types
  • Avoid fuzzy (LIKE) queries
  • Avoid OR in the join condition; UNION ALL can be used instead

What should I do if a deadlock occurs in production and the entire database hangs?

  1. Prioritize resuming production operations by restarting services or using a standby database
  2. Check the cause of the deadlock through the log
  3. Perform breakpoint debugging on the deadlock site and check the trigger code

Advantages and disadvantages of Token and Session?

A token is one way to compensate for HTTP being stateless when handling logins. After the user logs in, the server uses a key to encrypt or sign a JSON string. The client sends this string with every request, and the server verifies it to determine the login status directly. This also solves the problem of logging in across different domain names. The advantage is that the token is stored on the client, which saves server resources.

A session also compensates for HTTP being stateless. When the user logs in, the user's information is saved on the server in the session store, and the corresponding session ID is returned. When the user accesses content, the server looks up the session ID to confirm the user's identity.

Because the token lives on the client, it may be spoofed by someone who analyzes the data stored there; a session avoids this problem.

How to implement direct login across different websites, so that a user who has logged in to A is logged in to B directly?

Use JWT. The token carries its own signature, so site B can check whether the user's login information is valid by verifying the signature, without asking site A.
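The signing and verification steps can be sketched with the standard library. This is a simplified JWT-style token signed with HMAC-SHA256, assuming both sites share the key SECRET (a made-up value). It does not check expiry or other claims; a real deployment would normally use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"shared-secret"  # hypothetical key known to both sites


def _b64(data: bytes) -> str:
    """URL-safe base64 without padding, as used by JWT."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _signature(signing_input: str, secret: bytes) -> str:
    digest = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return _b64(digest)


def sign_token(payload: dict, secret: bytes = SECRET) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (
        _b64(json.dumps(header).encode("utf-8"))
        + "."
        + _b64(json.dumps(payload).encode("utf-8"))
    )
    return signing_input + "." + _signature(signing_input, secret)


def verify_token(token: str, secret: bytes = SECRET) -> bool:
    """Recompute the signature and compare it in constant time."""
    try:
        signing_input, signature = token.rsplit(".", 1)
    except ValueError:
        return False
    expected = _signature(signing_input, secret)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


token = sign_token({"user": "tom"})           # issued by site A after login
print(verify_token(token))                    # True: site B accepts it
print(verify_token(token, b"another-key"))    # False: wrong key
```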

How to optimize tables with tens of millions of rows?

First determine what kind of data the table holds:

  1. Flow-type data, such as transaction and payment records, where the main operation is insertion. Split it by business and move it to distributed storage.
  2. Status-type data, such as balances and statuses, which is mainly queried and modified and must be accurate. Try not to split it or scale it horizontally.
  3. Configuration data, such as system configuration, paths, and permission points.

Optimize for the business scenario: split mixed business into independent business, and separate status data from historical data. Data can be split by date, partition, and so on, with the split reflected in the table names.

For read-heavy scenarios, caches and in-memory databases can reduce database pressure. For write-heavy scenarios, asynchronous submission and queued writes can reduce the write frequency. When scaling horizontally, middleware, read-write separation, and load balancing improve database availability.

Standardize the use of transactions in the code to avoid abuse, optimize SQL statements to improve query efficiency, and add indexes.

On the operations side, clean up data regularly and separate hot and cold data.

How to test database concurrency?

What are the differences and usage scenarios of HGET, GET, HSET, and SET in Redis? Read and write efficiency?

Redis provides 5 basic data types, plus several special-purpose ones:

  • String
  • Hash, usually used to store key-value pair information
  • List: stores strings, allows duplicates, holds up to 2^32-1 elements, and elements can be added at the head or the tail
  • Set: unordered and implemented with a hash table; adding, deleting, and checking membership take O(1) time
  • Sorted set: a set of string elements, each associated with a score of type double; scores may repeat, and elements are sorted by score
  • Bitmap: stores 0 or 1 at each bit offset, usually used to record status flags for statistics
  • HyperLogLog: cardinality estimation; accepts multiple elements as input and estimates the number of distinct elements

There are different operations for storing and reading these data types

Data type         Read operation            Write operation
string            GET foo                   SET foo "this is simple"
string (batch)    MGET foo foo1             MSET foo "1" foo1 "2"
hash              HGET dict:1 field         HSET dict:1 field "123"
hash (batch)      HGETALL user:1            HMSET user:1 "23" "45"
list              LRANGE user 0 1           LPUSH user tom
set               SMEMBERS user             SADD user tom
sorted set        ZRANGE user 0 10          ZADD user 0 tom
bitmap            GETBIT user:0001 10003    SETBIT user:0001 10003 1
HyperLogLog       PFCOUNT user              PFADD user tom

What is a hashable value?

A hashable value is an immutable value, such as a string or a tuple, whose hash stays the same for its whole lifetime. Hashing maps data of any size to a small fixed-size value, which makes it possible to look the data up with fixed (constant-time) complexity.
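A short Python sketch shows the difference between an immutable tuple and a mutable list; the variable names are illustrative.

```python
point = (1, 2)

# Equal immutable values hash to the same number, so they can be dict keys.
print(hash(point) == hash((1, 2)))  # True
lookup = {point: "a"}
print(lookup[(1, 2)])               # a

# A mutable list could change after insertion, so Python refuses to hash it.
try:
    hash([1, 2])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False
print(list_is_hashable)             # False
```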

What is binary security?

Strings are often stored as raw binary data. In some languages, however, the end of a string is determined by a special terminator (for example the NUL byte in C), so a string that contains that byte produces unexpected results. A binary-safe implementation applies no special processing to the input data: the length of the string is stored explicitly and is not affected by any terminator.
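The difference can be sketched in Python, whose bytes type stores its length explicitly. The helper c_style_length is hypothetical and only imitates how a NUL-terminated reader would measure the same data.

```python
data = b"ab\x00cd"  # five bytes, with a NUL byte in the middle


def c_style_length(buf: bytes) -> int:
    """Length as seen by a reader that stops at the first NUL byte."""
    end = buf.find(b"\x00")
    return len(buf) if end == -1 else end


print(len(data))             # 5: binary-safe, the length is stored explicitly
print(c_style_length(data))  # 2: everything after the NUL byte is lost
```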

What is Time Series Data

Time-series data is data indexed by time. It is written steadily and continuously, with high concurrency and high throughput, and it is written far more often than it is read. In most cases data is only appended, and only rarely is it modified manually. Hot and cold data are also sharply separated: most users care about recent data and rarely read or write early data. The shorter the monitoring interval, the greater the amount of data generated.

Commonly used time series data databases

A time-series database must support high-concurrency, high-throughput writes, interactive queries over terabytes of data or more, and storage of data at that volume. NoSQL databases built on LSM trees are generally used, such as HBase, Cassandra, and TableStore.

Common tree structures and their pros and cons?

The degree of a tree is the largest number of children that any of its nodes has, and the depth is the number of levels from the root node to the farthest leaf node. A tree whose left and right children can be exchanged is called an unordered tree; otherwise it is an ordered tree. A binary tree is a tree in which every node has a degree of at most 2.

Linear structures take a lot of time for insertion and lookup, so tree structures are generally used for storage. The mainstream dynamic search trees are the binary search tree, balanced binary tree, red-black tree, B tree, and B+ tree. The query complexity of the first three depends on the depth of the tree; the last two, which are balanced multi-way search trees, are the ones generally used.

The characteristics of binary search tree:

  • If the left subtree of the tree is not empty, the values of all nodes on the left subtree are less than its root node
  • If the right subtree of the tree is not empty, the values of all nodes on the right subtree are greater than its root node
  • The left and right subtrees of the tree are also binary search trees
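The three properties above translate directly into insertion and lookup code. This is a minimal unbalanced sketch; Node, insert, contains and in_order are names chosen for illustration.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)    # smaller keys go left
    elif key > root.key:
        root.right = insert(root.right, key)  # larger keys go right
    return root


def contains(root, key):
    """Walk down one path, so the cost depends on the depth of the tree."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False


def in_order(root):
    """Left subtree, node, right subtree: yields the keys in sorted order."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)


root = None
for key in [5, 3, 8, 1, 4]:
    root = insert(root, key)

print(in_order(root))                        # [1, 3, 4, 5, 8]
print(contains(root, 4), contains(root, 7))  # True False
```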

Naming of binary tree nodes:

  • A node without a parent is called a root node, and a binary tree has only one root node
  • A node with a parent node is called a child node, and child nodes with a common parent node are sibling nodes
  • Those without child nodes are called leaf nodes, and a binary tree can have multiple leaf nodes

Mathematical properties of a binary tree:

  • The i-th layer of a binary tree has at most 2^(i-1) nodes, and a binary tree with a depth of k has at most 2^k - 1 nodes
  • In a binary tree, if the number of leaf nodes is n0 and the number of nodes with degree 2 is n2, then n0 = n2 + 1
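Both properties can be checked empirically on randomly generated trees. The sketch below represents a tree as nested (left, right) tuples, which is an assumption made here for brevity; random_tree and count are hypothetical helpers.

```python
import random


def random_tree(depth, rng):
    """Return a random binary tree as nested (left, right) tuples, or None."""
    if depth == 0 or rng.random() < 0.3:
        return None
    return (random_tree(depth - 1, rng), random_tree(depth - 1, rng))


def count(tree):
    """Return (leaves n0, nodes with two children n2, total nodes, depth)."""
    if tree is None:
        return 0, 0, 0, 0
    left, right = tree
    l0, l2, l_total, l_depth = count(left)
    r0, r2, r_total, r_depth = count(right)
    is_leaf = left is None and right is None
    has_two = left is not None and right is not None
    return (
        l0 + r0 + is_leaf,
        l2 + r2 + has_two,
        l_total + r_total + 1,
        max(l_depth, r_depth) + 1,
    )


rng = random.Random(0)
checked = 0
for _ in range(100):
    tree = random_tree(6, rng)
    if tree is None:
        continue
    n0, n2, total, depth = count(tree)
    assert n0 == n2 + 1               # leaves = two-child nodes + 1
    assert total <= 2 ** depth - 1    # at most 2^k - 1 nodes at depth k
    checked += 1
print("properties hold for", checked, "random trees")
```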

Why is the database not recommended to use foreign keys

Foreign keys constrain and check the relationships between database tables. When data is inserted, the tables connected by the foreign key are checked so that dirty data is not inserted. When data is deleted, cascading deletion removes the rows that have become invalid. This ensures data reliability and accuracy.

These features can cause some trouble in a production environment:

  • Every time you insert data, you need to check other tables, which affects efficiency
  • Every deletion triggers further deletions in other tables, which with a large data volume may delete a huge number of rows and cause a crash
  • When inserting data, the corresponding rows in the referenced table are locked, affecting other business operations
  • The database structure is constrained, which makes it difficult to shard databases and tables

Why is it recommended to use tinyint instead of enum

tinyint represents content by numbers, while an enum value can be queried either by its value or by its index in the enumeration, for example:

enum = {'a', 'b', 'c'}
select * from tbl_name where enum = 2
select * from tbl_name where enum = 'b'

The two queries are equivalent. When inserting with insert into, if the enum values are themselves numbers, a number intended as a value may be interpreted as an index instead. In addition, if a new enumeration value is added in the middle of the list rather than at the end, existing records elsewhere become confused. Finally, enum is a MySQL-specific type that other databases do not support, which affects data import and export.

What is a prefix index

A prefix index covers only the first characters of a text or string column. It suits columns whose values differ substantially in their prefixes, and it reduces the index length.

ALTER TABLE table_name ADD KEY(column_name(prefix_length))

Practical questions

Find the student's average grade and the number of courses taken


Known: S is the student number, and the sc table is the student's record of each subject

Question 1: Write a SQL statement to find the student number and average grade of each student, and display the records with an average grade of 90 or more

select S, avg(score) 
from sc 
group by S 
having avg(score) > 90

Question 2: Write a SQL statement to find the student's student number, name, number of courses and total grades

select t1.S, t1.Sname, count(t2.C), sum(t2.score) 
from student t1 
inner join sc t2
on t1.S = t2.S
group by t1.S
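Both queries can be tried out with Python's built-in sqlite3 module. The schema and sample rows below are assumptions, since the original table picture is not available: student(S, Sname) and sc(S, C, score). The second query also groups by Sname, which stricter SQL dialects require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (S INTEGER PRIMARY KEY, Sname TEXT);
    CREATE TABLE sc (S INTEGER, C TEXT, score REAL);
    INSERT INTO student VALUES (1, 'tom'), (2, 'jerry');
    INSERT INTO sc VALUES (1, 'math', 95), (1, 'english', 91),
                          (2, 'math', 80), (2, 'english', 70);
""")

# Question 1: student number and average grade, filtered on the average.
q1 = conn.execute(
    "SELECT S, AVG(score) FROM sc GROUP BY S HAVING AVG(score) > 90"
).fetchall()
print(q1)  # [(1, 93.0)]

# Question 2: student number, name, number of courses and total grade.
q2 = conn.execute("""
    SELECT t1.S, t1.Sname, COUNT(t2.C), SUM(t2.score)
    FROM student t1
    INNER JOIN sc t2 ON t1.S = t2.S
    GROUP BY t1.S, t1.Sname
    ORDER BY t1.S
""").fetchall()
print(q2)  # [(1, 'tom', 2, 186.0), (2, 'jerry', 2, 150.0)]

conn.close()
```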

Backend related

Common ways to bypass anti-crawler measures?

Anti-crawling measure          Solution
IP source detection            Use IP proxies
CAPTCHA                        Image recognition, CAPTCHA-solving platform
Encrypted parameters           JS reverse engineering
Browser header validation      User-Agent spoofing
Source verification            Add a Referer header
Login required to view         Simulated login, cookie spoofing
Per-user visit limits          Multithreading
JS anti-debugging              Breakpoint bypass

What is the difference between putting fields in headers and cookies?

Cookies are only added to the request headers when the request goes to the domain they belong to, while a field placed in the headers is carried on requests to every domain.

What is the Same Origin Policy?

Two pages have the same origin only if they share the same protocol, the same host (domain name or IP), and the same port. Pages from a different origin are not allowed to access the resources of the current page.
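The comparison can be sketched with urllib.parse. The helper names origin and same_origin and the example URLs are made up; the sketch only handles the default ports for http and https.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (protocol, host, port) triple that defines an origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(a, b):
    return origin(a) == origin(b)


print(same_origin("https://example.com/a", "https://example.com:443/b"))  # True
print(same_origin("https://example.com", "http://example.com"))           # False: protocol
print(same_origin("https://example.com", "https://api.example.com"))      # False: host
print(same_origin("https://example.com", "https://example.com:8443"))     # False: port
```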

Is Flask multithreaded or multiprocess? How to solve the conflict between threads and coroutines in Flask?

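The built-in development web server can be configured. In older versions it runs as a single process with a single thread by default (since Flask 1.0 the development server is threaded by default); pass threaded=True to enable multithreading or processes=2 to enable multiple processes.

Coroutines running in the same thread share that thread's resources, so state that is kept per thread needs to be adapted to be kept per coroutine.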

Containerization

1. The difference between Docker and a virtual machine

A virtual machine creates a virtualization layer on the host and installs applications on a virtualized operating system. With Docker, the Docker engine runs on the host operating system and applications are installed on top of it. As a result, containers can start within seconds, use fewer resources, and can be described in a Dockerfile for automated creation and deployment.

Different containers are isolated from each other through namespaces.

Origin blog.csdn.net/qq_20728575/article/details/127526588