The most comprehensive summary of Python key knowledge you have ever seen!

This is a summary of Python fundamentals by the developer @Twenty One on SegmentFault. Because it covers so much ground it is fairly long; it was compiled over a long period, and it is strongly recommended that you bookmark it and read it slowly~

Py2 VS Py3

Differences between Py2 and Py3

  • print becomes a function; in Python 2 it is a statement

  • There is no longer a unicode object, the default str is unicode

  • In Python 3, / returns a float (use // for floor division)

  • no long type

  • xrange does not exist, range replaces xrange

  • Chinese can be used to define function names and variable names

  • Advanced unpacking and *unpacking

  • Parameters after * are keyword-only and must be passed as name=value

  • raise from

  • iteritems is removed; use items()

  • yield from chains sub-generators

  • asyncio, async/await native coroutine supports asynchronous programming

  • New modules: enum, mock, ipaddress, concurrent.futures, asyncio, urllib, selectors

    • Cannot compare between different enumeration classes

    • Only equality comparisons can be made between the same enumeration class

    • How to use enumeration classes (with auto(), member values start from 1 by default)

    • In order to avoid the occurrence of the same enumeration value in the enumeration class, you can decorate the enumeration class with @unique

# Notes on enums
from enum import Enum

class COLOR(Enum):
    YELLOW = 1
    # YELLOW = 2  # would raise an error (duplicate name)
    GREEN = 1  # no error: GREEN is an alias of YELLOW
    BLACK = 3
    RED = 4
print(COLOR.GREEN)  # prints COLOR.YELLOW -- the alias resolves to YELLOW
for i in COLOR:  # iterating over COLOR does not yield GREEN
    print(i)
# COLOR.YELLOW\nCOLOR.BLACK\nCOLOR.RED
# how to iterate over the aliases as well:
for i in COLOR.__members__.items():
    print(i)
# output:('YELLOW', <COLOR.YELLOW: 1>)\n('GREEN', <COLOR.YELLOW: 1>)\n('BLACK', <COLOR.BLACK: 3>)\n('RED', <COLOR.RED: 4>)
for i in COLOR.__members__:
    print(i)
# output:YELLOW\nGREEN\nBLACK\nRED

# Enum conversion
# Prefer storing the enum's numeric value in the database rather than the label string,
# and use the enum class in the code
a = 1
print(COLOR(a))  # output: COLOR.YELLOW
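
Following the note on @unique above, a minimal sketch of what it enforces (the STATUS enum here is made up purely for illustration):

from enum import Enum, unique

@unique
class STATUS(Enum):  # hypothetical enum, just for illustration
    OK = 1
    DONE = 2
    # FAILED = 1 would raise ValueError here: @unique forbids aliases
    # (members that share a value with an existing member)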

py2/3 conversion tool

  • six module: a compatibility module for Python 2 and Python 3

  • 2to3 tool: change code syntax version

  • __future__: use features from a later Python version in the current one

Class library related

common library

  • Must-know collections: https://segmentfault.com/a/1190000017385799

  • Python sorting operation and heapq module https://segmentfault.com/a/1190000017383322

  • Itertools module super practical method https://segmentfault.com/a/1190000017416590

Less commonly used but important libraries

  • dis (code bytecode analysis)

  • inspect(generator state)

  • cProfile (performance analysis)

  • bisect (maintain ordered list)
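
A small sketch of keeping a list sorted with bisect (the values here are arbitrary):

import bisect

scores = []                       # kept sorted at all times
for value in (30, 10, 20, 25):
    bisect.insort(scores, value)  # insert while preserving order
print(scores)                     # [10, 20, 25, 30]
print(bisect.bisect_left(scores, 20))  # 1 -- index where 20 would be inserted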

  • fnmatch

  • fnmatch.fnmatch(string, "*.txt")  # case-insensitive on Windows

  • fnmatch's case sensitivity depends on the operating system

  • fnmatchcase is fully case-sensitive

  • timeit (code execution time)

def isLen(strString):
    # a ternary expression is still the better choice -- it is faster
    return True if len(strString) > 6 else False

def isLen1(strString):
    # note the positions of False and True here
    return [False, True][len(strString) > 6]
import timeit
print(timeit.timeit('isLen1("5fsdfsdfsaf")', setup="from __main__ import isLen1"))

print(timeit.timeit('isLen("5fsdfsdfsaf")', setup="from __main__ import isLen"))
  • contextlib

    • @contextlib.contextmanager makes a generator function a context manager
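
A minimal sketch of turning a generator function into a context manager (the opened helper and the file name are made up for illustration):

import contextlib

@contextlib.contextmanager
def opened(path):
    f = open(path)   # setup: runs when the with block is entered
    try:
        yield f      # the value bound by "as"
    finally:
        f.close()    # teardown: always runs when the block exits

with opened("demo.txt") as f:  # hypothetical file, for illustration only
    print(f.read())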

  • types (contains the type objects for the types defined by the standard interpreter; types.coroutine can turn a generator function into a coroutine)

import types
types.coroutine  # roughly equivalent to implementing __await__
  • html (implement the escape of html)

import html
html.escape("<h1>I'm Jim</h1>") # output:'&lt;h1&gt;I&#x27;m Jim&lt;/h1&gt;'
html.unescape('&lt;h1&gt;I&#x27;m Jim&lt;/h1&gt;') # <h1>I'm Jim</h1>
  • mock (resolve test dependencies)

  • concurrent (create process pool and thread pool)

from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor()
task = pool.submit(func, arg)  # does not block; returns immediately
task.done()       # has the task finished?
task.result()     # blocking; get the task's return value
task.cancel()     # cancel a task that has not started yet; returns True if cancelled, else False
task.add_done_callback()  # register a callback to run when the task completes
task.running()    # is it currently running?     task is a Future object

for data in pool.map(func, arg_list):  # yields the results of completed tasks, in the order of the arguments
    print(data)   # the return value of each finished task

from concurrent.futures import as_completed
as_completed(task_list)  # yields tasks as they complete, one at a time

wait(task_list, return_when=condition)  # blocks the main thread according to the condition; four conditions are available
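
A small runnable sketch that ties these calls together (the crawl function is just a stand-in for an I/O-bound job):

from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def crawl(page):                 # stand-in for an I/O-bound task
    time.sleep(1)
    return f"page {page} done"

with ThreadPoolExecutor(max_workers=3) as pool:
    tasks = [pool.submit(crawl, p) for p in range(5)]
    for future in as_completed(tasks):   # yields futures in completion order
        print(future.result())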
  • selectors (encapsulates select/poll/epoll; used for I/O-multiplexed programming)

  • asyncio

future = asyncio.ensure_future(coroutine)   # equivalent to: future = loop.create_task(coroutine)
future.add_done_callback()   # register a callback to run when the future completes
loop.run_until_complete(future)
future.result()   # get the coroutine's return value

asyncio.wait() accepts an iterable of coroutine objects
asyncio.gather(*iterable, *iterable)   # same result, but gather can be cancelled as a batch via gather_obj.cancel()

There is only one loop per thread

When calling loop.stop() the loop must have been started with loop.run_forever(), otherwise an error is raised
loop.run_forever() can also run non-coroutine callbacks
Finally, call loop.close() in the finally block

asyncio.Task.all_tasks() returns all tasks; iterate over them and cancel each with task.cancel()

partial(func, args) wraps a function into a new callable; the pre-bound arguments must correspond to the leading parameters of the original function

loop.call_soon(func, args)
call_soon_threadsafe()   # the thread-safe variant
loop.call_later(delay, func, args)
Within the same block of code, call_soon callbacks run first, then the call_later callbacks run in ascending order of their delays

If blocking code really has to run,
wrap it with loop.run_in_executor(executor, func, args) so it runs in a thread pool, put the futures into a task list, and run them with wait(task_list)

Implementing HTTP with asyncio:
reader, writer = await asyncio.open_connection(host, port)
writer.write()   # send the request
async for data in reader:
    data = data.decode("utf-8")
    html_parts.append(data)
# afterwards html_parts holds the HTML

as_completed(tasks) yields each result as it completes and returns an iterable

Coroutine lock:
async with Lock():
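
A minimal runnable sketch of the ensure_future / run_until_complete flow described above (the fetch coroutine is a stand-in; on newer Python versions asyncio.run() is the preferred entry point):

import asyncio

async def fetch(n):              # stand-in for an I/O-bound coroutine
    await asyncio.sleep(1)
    return f"result {n}"

loop = asyncio.get_event_loop()
tasks = [asyncio.ensure_future(fetch(i)) for i in range(3)]
try:
    loop.run_until_complete(asyncio.gather(*tasks))
    print([t.result() for t in tasks])
finally:
    loop.close()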

Advanced Python

  • Inter-process communication:

    • Manager (built-in a lot of data structures, which can realize memory sharing between multiple processes)

from multiprocessing import Manager,Process
def add_data(p_dict, key, value):
    p_dict[key] = value

if __name__ == "__main__":
    progress_dict = Manager().dict()

    first_progress = Process(target=add_data, args=(progress_dict, "bobby1", 22))
    second_progress = Process(target=add_data, args=(progress_dict, "bobby2", 23))

    first_progress.start()
    second_progress.start()
    first_progress.join()
    second_progress.join()

    print(progress_dict)
  • Pipe (for two processes)

from multiprocessing import Pipe,Process
# Pipe has better performance than Queue
def producer(pipe):
    pipe.send("bobby")

def consumer(pipe):
    print(pipe.recv())

if __name__ == "__main__":
    recevie_pipe, send_pipe = Pipe()
    # Pipe only works between two processes
    my_producer= Process(target=producer, args=(send_pipe, ))
    my_consumer = Process(target=consumer, args=(recevie_pipe,))

    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()
  • Queue (cannot be used with a process pool; to communicate with a process pool, use Manager().Queue())

import time
from multiprocessing import Queue,Process
def producer(queue):
    queue.put("a")
    time.sleep(2)

def consumer(queue):
    time.sleep(2)
    data = queue.get()
    print(data)

if __name__ == "__main__":
    queue = Queue(10)
    my_producer = Process(target=producer, args=(queue,))
    my_consumer = Process(target=consumer, args=(queue,))
    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()
  • process pool

import time
from multiprocessing import Manager, Pool

def producer(queue):
    queue.put("a")
    time.sleep(2)

def consumer(queue):
    time.sleep(2)
    data = queue.get()
    print(data)

if __name__ == "__main__":
    queue = Manager().Queue(10)
    pool = Pool(2)

    pool.apply_async(producer, args=(queue,))
    pool.apply_async(consumer, args=(queue,))

    pool.close()
    pool.join()
  • Several common methods of the sys module

    • argv command line parameter list, the first is the path of the program itself

    • path returns the search path for modules

    • modules.keys() returns a list of all modules that have been imported

    • exit(0) exit the program

  • Shorthand for a in s or b in s or c in s

    • Use any(); note that any() returns False for an empty iterable, while all() returns True for an empty iterable

# Method 1
True in [i in s for i in [a,b,c]]
# Method 2
any(i in s for i in [a,b,c])
# Method 3
list(filter(lambda x: x in s, [a,b,c]))
  • use of set collection

    • {1,2}.issubset({1,2,3})  # determine whether it is a subset

    • {1,2,3}.issuperset({1,2})  # determine whether it is a superset

    • {}.isdisjoint({})  # determine whether two sets have an empty intersection; True if they do

  • Chinese matching in the code

    • [\u4E00-\u9FA5] matches the common Chinese character range (一 to 龥)

  • View the system default encoding format

import sys
sys.getdefaultencoding()    # setdefaultencoding() sets the system encoding
  • __getattr__ VS __getattribute__

class A(dict):
    def __getattr__(self, value):        # called when the attribute is not found
        return 2
    def __getattribute__(self, item):    # intercepts all attribute access
        return item
  • Class variables will not be stored in the instance __dict__, they will only exist in the __dict__ of the class

  • globals/locals (codes can be manipulated in disguise)

    • globals() holds all the variable names and values in the current module

    • locals() holds all the variable names and values in the current scope

  • Parsing mechanism for python variable names (LEGB)

    • Local scope (Local)

    • Enclosing locals in which the current scope is embedded

    • Global/module scope (Global)

    • Built-in scope (Built-in)
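
A tiny sketch of the LEGB lookup order (all names here are made up):

x = "global"              # G: module (global) scope

def outer():
    x = "enclosing"       # E: enclosing scope
    def inner():
        x = "local"       # L: local scope wins the lookup
        print(x, len(x))  # len is found in B: the built-in scope
    inner()

outer()   # prints: local 5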

  • Split the numbers 1-100 into groups of three

print([[x for x in range(1,101)][i:i+3] for i in range(0,100,3)])
  • What are metaclasses?

    • A metaclass is the class of a class. To use one, set metaclass=YourMeta when defining the class; the metaclass should inherit from type rather than object, because type is the metaclass of classes

type.__bases__    # (<class 'object'>,)
object.__bases__  # ()
type(object)      # <class 'type'>

class Yuan(type):
    def __new__(cls, name, bases, attrs, *args, **kwargs):
        return type(name, bases, attrs, *args, **kwargs)

class MyClass(metaclass=Yuan):
    pass
  • What is duck typing (ie: polymorphism)?

    • By default, Python does not check the types of arguments passed in; as long as the object supports the operations used on it, the code runs

  • deep copy and shallow copy

    • A deep copy copies the contents recursively; a shallow copy copies only the reference/outer object (incrementing reference counts) and shares the nested objects

    • The copy module implements both: copy.copy (shallow) and copy.deepcopy (deep)
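
A quick sketch of the difference:

import copy

a = [1, [2, 3]]
shallow = copy.copy(a)    # new outer list, inner list is shared
deep = copy.deepcopy(a)   # everything is copied recursively

a[1].append(4)
print(shallow[1])   # [2, 3, 4] -- shares the inner list with a
print(deep[1])      # [2, 3]    -- an independent copy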

  • unit test

    • The general test class inherits the TestCase under the module unittest

    • The pytest module makes testing quicker (test functions/methods start with test_, test files start with test_, test classes start with Test and must not define __init__)

    • coverage statistics test coverage

class MyTest(unittest.TestCase):
    def setUp(self):  # runs before every test case
        print('setting up this test')

    def tearDown(self):  # runs after every test case
        print('this test has finished')

    @classmethod
    def setUpClass(cls):  # must use the @classmethod decorator; runs once before all tests
        print('starting the test run')

    @classmethod
    def tearDownClass(cls):  # must use the @classmethod decorator; runs once after all tests
        print('test run finished')

    def test_a_run(self):
        self.assertEqual(1, 1)  # a test case
  • The GIL is released periodically based on the number of bytecodes executed / a time slice, and is released proactively when an I/O operation is encountered

  • What is a monkey patch?

    • A monkey patch replaces blocking implementations with non-blocking ones at runtime

  • What is Introspection?

    • The ability to determine the type of an object at runtime: id(), type(), isinstance()

  • Is python pass by value or by reference?

    • Neither; Python passes arguments by object reference (call by sharing). Note also that default parameter values are evaluated only once, at function definition time
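
The classic mutable-default-argument pitfall that follows from "evaluated only once" (function names are made up):

def append_item(item, bucket=[]):   # the default list is created once, at definition time
    bucket.append(item)
    return bucket

print(append_item(1))   # [1]
print(append_item(2))   # [1, 2] -- the same list is reused across calls

def append_item_safe(item, bucket=None):   # the usual fix
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket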

  • The difference between else and finally in try-except-else-finally

    • else executes when no exception occurs, finally executes regardless of whether an exception occurs

    • A single except clause can catch multiple exceptions at once, but usually, in order to handle different exceptions differently, we catch and handle them separately

  • GIL global interpreter lock

    • Only one thread can execute Python bytecode at a time; this is a characteristic of CPython, and other interpreters may not have it

    • cpu-intensive: multi-process + process pool

    • io-intensive: multithreading/coroutines

  • What is Cython

    • A tool that compiles Python(-like) code into C code

  • Generators and Iterators

    • Objects that implement the __next__ and __iter__ methods are iterators

    • Iterable objects only need to implement the __iter__ method

    • A generator is produced by a generator expression or by a function that uses yield (a generator is a special kind of iterator)
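
A short sketch contrasting an iterator class with a generator function (the names are illustrative):

class Countdown:                      # an iterator: implements __iter__ and __next__
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return self
    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

def countdown(n):                     # a generator function: yield does the bookkeeping
    while n > 0:
        yield n
        n -= 1

print(list(Countdown(3)))   # [3, 2, 1]
print(list(countdown(3)))   # [3, 2, 1]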

  • What is a coroutine

    • A lighter-weight multitasking method than threads

    • Ways to implement them:

    • yield

    • async/await

  • dict underlying structure

    • In order to support fast lookup, a hash table is used as the underlying structure

    • The average lookup time complexity of a hash table is O(1)

    • The CPython interpreter uses open addressing (probing) to resolve hash collisions

  • Hash expansion and Hash conflict resolution

    • On expansion, the table is resized and existing entries are rehashed and copied into the new space

    • Conflict resolution:

    • Chaining (linked lists)

    • Open addressing (probing): the method Python uses

from gevent import monkey
monkey.patch_all()  # patches all blocking calls in the code; the specific modules to patch can also be selected
  • Determine whether it is a generator or a coroutine

co_flags = func.__code__.co_flags

# check whether it is a coroutine
if co_flags & 0x180:
    return func

# check whether it is a generator
if co_flags & 0x20:
    return func
  • Fibonacci Solved Problems and Variations

# A frog can jump up 1 step or 2 steps at a time. How many distinct ways can it jump up n steps?
# Equivalent: in how many ways can n 2x1 rectangles tile a 2xn rectangle without overlapping?
# Method 1:
fib = lambda n: n if n <= 2 else fib(n - 1) + fib(n - 2)
# Method 2:
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return b

# A frog can jump up 1 step, 2 steps ... or even n steps at a time. How many ways can it jump up n steps?
fib = lambda n: n if n < 2 else 2 * fib(n - 1)
  • Get the environment variables set by the computer

import os
os.getenv(env_name, None)  # get an environment variable; returns None if it does not exist
  • garbage collection mechanism

    • reference counting

    • mark and sweep

    • generational collection

# check the thresholds that trigger generational collection
import gc
gc.get_threshold()  # output: (700, 10, 10)
  • True and False are equivalent to 1 and 0 in code and can be used directly in arithmetic; float('inf') represents infinity

  • C10M/C10K

    • C10M: 8 core cpu, 64G memory, maintain 10 million concurrent connections on a 10gbps network

    • C10K: 1GHz CPU, 2G memory, 10,000 clients in a 1gbps network environment to provide FTP services

  • The difference between yield from and yield:

    • yield from is followed by an iterable object, and there is no limit after yield

    • GeneratorExit is raised inside the generator when it is closed/stopped

  • Several uses of single underscore

    • When defining a variable, it is represented as a private variable

    • When unpacking, it means discarding useless data

    • Represents the last code execution result in interactive mode

    • Digit grouping in numeric literals (111_222_333)

  • A loop's else clause does not run if the loop exits via break
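
A small illustration:

for n in (2, 4, 6):
    if n % 2:
        print("found an odd number")
        break
else:
    print("no odd number found")   # runs because the loop completed without break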

  • Decimal to binary

def conver_bin(num):
    if num == 0:
        return '0'
    re = []
    while num:
        num, rem = divmod(num, 2)
        re.append(str(rem))
    return "".join(reversed(re))

conver_bin(10)  # '1010'
  • list1 = ['A', 'B', 'C', 'D']: how to create new empty lists named after the elements, i.e. A=[], B=[], C=[], D=[]

list1 = ['A', 'B', 'C', 'D']

# Method 1
for i in list1:
    globals()[i] = []   # can also be used to implement reflection in Python

# Method 2
for i in list1:
    exec(f'{i} = []')   # exec executes a statement given as a string
  • memoryview and bytearray are not commonly used; recorded here just for reference

# bytearray is mutable, bytes is immutable; slicing a memoryview does not create new objects
a = b'aaaaaa'
ma = memoryview(a)
ma.readonly  # True: a read-only memoryview over bytes
mb = ma[:2]  # does not create a new bytes object

a = bytearray(b'aaaaaa')
ma = memoryview(a)
ma.readonly  # False: a writable memoryview over a bytearray
mb = ma[:2]       # does not create a new bytearray
mb[:2] = b'bb'    # modifying mb modifies ma (and the underlying bytearray)
  • Ellipsis type

# The ... (ellipsis) literal that appears in code is the Ellipsis object
L = [1,2,3]
L.append(L)
print(L)    # output: [1, 2, 3, [...]]  (the [...] here is just the repr of a self-referencing list)
  • Lazy evaluation (a lazy property descriptor)

class lazy(object):
    def __init__(self, func):
        self.func = func

    def __get__(self, instance, cls):
        val = self.func(instance)    # equivalent to calling area(c), where c is the Circle instance below
        setattr(instance, self.func.__name__, val)
        return val

class Circle(object):
    def __init__(self, radius):
        self.radius = radius

    @lazy
    def area(self):
        print('evalute')
        return 3.14 * self.radius ** 2
  • Traverse files, pass in a folder, and print out the paths of all files in it (recursively)

import os

all_files = []
def getAllFiles(directory_path):
    for sChild in os.listdir(directory_path):
        sChildPath = os.path.join(directory_path, sChild)
        if os.path.isdir(sChildPath):
            getAllFiles(sChildPath)
        else:
            all_files.append(sChildPath)
    return all_files
  • When the file is stored, the processing of the file name

# secure_filename converts a string into a safe file name
from werkzeug.utils import secure_filename
secure_filename("My cool movie.mov")  # output: My_cool_movie.mov
secure_filename("../../../etc/passwd")  # output: etc_passwd
secure_filename(u'i contain cool \xfcml\xe4uts.txt')  # output: i_contain_cool_umlauts.txt
  • date formatting

from datetime import datetime

datetime.now().strftime("%Y-%m-%d")

import time
# only a struct_time (e.g. time.localtime()) can be formatted; a raw timestamp cannot
time.strftime("%Y-%m-%d", time.localtime())
  • Tuple uses += strange problem

# raises TypeError, but the tuple's value still changes, because the id of t[1] does not change
t = (1, [2, 3])
t[1] += [4, 5]
# using t[1].append / t[1].extend does not raise and succeeds
  • __missing__ you should know

class Mydict(dict):
    def __missing__(self, key):  # value returned when Mydict is subscripted with a missing key
        return key
  • + and +=

# + cannot concatenate a list with a tuple, but += can (it goes through __iadd__, which is implemented like extend(), so any iterable works); + always creates a new object
# Immutable objects have no __iadd__, so += falls back to __add__; that is why a tuple can still use += with another tuple
  • How to turn each element of an iterable into all keys of a dictionary?

dict.fromkeys(['jim','han'],21) # output:{'jim': 21, 'han': 21}

Network knowledge

  • What is HTTPS?

    • HTTP over TLS; https requires a CA-issued certificate, encrypts the data, and uses port 443; it is more secure, and for the same site https tends to get a higher SEO ranking

  • Common Response Status Codes

204 No Content          // request handled successfully, no body returned; often used for successful deletes
206 Partial Content     // a ranged GET request was handled successfully
303 See Other           // redirect; the client is expected to fetch the new location with GET
304 Not Modified        // the cached resource can be used
307 Temporary Redirect  // temporary redirect; POST is not converted to GET
401 Unauthorized        // authentication failed
403 Forbidden           // access to the resource is refused
400 Bad Request         // invalid request parameters
201 Created             // resource added or updated successfully
503 Service Unavailable // server under maintenance or overloaded
  • Idempotency and security of http request method

  • WSGI

# environ: a dict containing all the HTTP request information
# start_response: a function for sending the HTTP response status and headers
def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<h1>Hello, web!</h1>']  # a WSGI app returns an iterable of bytes
  • RPC

  • CDN

  • SSL (Secure Sockets Layer), and its successor, Transport Layer Security (TLS), is a security protocol that provides security and data integrity for network communications.

  • SSH (Secure Shell) is an application-layer security protocol developed by the IETF Network Working Group. It is one of the most reliable protocols for securing remote login sessions and other network services, and it effectively prevents information leakage during remote administration. SSH began as a program on UNIX systems and later spread to other platforms; clients exist for almost all UNIX platforms, including HP-UX, Linux, AIX, Solaris, Digital UNIX, Irix, and others.

  • TCP/IP

    • TCP: connection-oriented / reliable / byte-stream based

    • UDP: connectionless / unreliable / message-oriented

    • Three-way handshake (SYN / SYN+ACK / ACK)

    • Four-way teardown (FIN / ACK / FIN / ACK)

    • Why is there a three-way handshake when connecting, but a four-way teardown when closing?

    • When the server receives the client's SYN, it can send SYN+ACK in a single segment (the ACK acknowledges, the SYN synchronizes). When closing, however, the side that receives a FIN may not be able to close its socket right away, so it first replies with an ACK ("I received your FIN") and only sends its own FIN once it has finished sending its remaining data; the ACK and FIN cannot be combined, hence four steps.

    • Why must TIME_WAIT last 2MSL (maximum segment lifetime) before returning to CLOSED?

    • Logically, once all four segments have been sent we could go straight to CLOSED, but we must assume the network is unreliable and the final ACK may be lost; the TIME_WAIT state exists so that the lost ACK can be resent.

  • XSS/CSRF

  • The HttpOnly flag prevents JS from reading or manipulating the cookie, which effectively mitigates XSS

Mysql

  • Index Improvement Process

    • Linear structure->binary search->hash->binary search tree->balanced binary tree->multi-way search tree->multi-way balanced search tree (B-Tree)

  • Mysql interview summary basic articles

  • https://segmentfault.com/a/1190000018371218

  • Mysql interview summary advanced articles

    • https://segmentfault.com/a/1190000018380324

  • Simple Mysql

    • http://ningning.today/2017/02/13/database/mysql in simple terms/

  • When clearing an entire table, InnoDB deletes rows one at a time, while MyISAM drops the table and recreates it

  • The text/blob data type cannot have a default value, and there is no case conversion when querying

  • When do indexes fail

    • Try to avoid using != or <> operators in the where clause, otherwise the engine will give up using the index and perform a full table scan

    • Try to avoid using or to connect the conditions in the where clause, otherwise the engine will give up using the index and perform a full table scan, even if there is a conditional index, it will not be used, which is why you should use or as little as possible

    • If the column type is a string, be sure to quote the data in the condition, otherwise the index will not be used

    • You should try to avoid performing function operations on fields in the where clause, which will cause the engine to give up using indexes and perform full table scans

    • For multi-column indexes, if the first part is not used, the index will not be used

    • Like fuzzy query starting with %

    • Implicit type conversion occurs

    • Does not satisfy the leftmost prefix principle

    • Failure scenario:

-- For example:
select id from t where substring(name,1,3) = 'abc';   -- find names starting with 'abc'
-- should be rewritten as:
select id from t where name like 'abc%';
-- For example:
select id from t where datediff(day, createdate, '2005-11-30') = 0;
-- should be rewritten as a range condition on createdate, so the index can be used

Do not perform functions, arithmetic operations or other expression operations on the left side of "=" in the where clause, otherwise the system may not be able to use the index correctly

You should try to avoid performing expression operations on fields in the where clause, which will cause the engine to give up using indexes and perform full table scans

-- For example:
select id from t where num/2 = 100;
-- should be rewritten as:
select id from t where num = 100*2;

Indexes are not suitable for columns with few distinct values (many repeated values), e.g. set or enum columns (a set is similar to an enum but can hold at most 64 members)

If MySQL estimates that using a full table scan is faster than using an index, do not use an index

  • What is a clustered index

    • B+Tree leaf nodes store data or pointers

    • MyISAM index and data separation, using non-clustered

    • InnoDB data files are index files, and primary key indexes are clustered indexes

Summary of Redis commands

  • Why is Redis so fast?

    • Redis operates on in-memory data, so the CPU is not its bottleneck; the bottleneck is more likely machine memory or network bandwidth. Since a single-threaded design is easy to implement and the CPU is not the bottleneck, single-threading was the natural choice (multithreading would bring a lot of extra trouble)

    • Memory-based, written in C

    • Use multiple I/O multiplexing model, non-blocking IO

    • Use a single thread to reduce switching between threads

    • simple data structure

    • Built the VM mechanism by itself to reduce the time of calling system functions

  • Advantage

    • High performance – Redis can read at 110,000 times/s and write at 81,000 times/s

    • rich data types

    • Atomic – all operations of Redis are atomic, and Redis also supports atomic execution of several operations

    • Rich features – Redis also supports publish/subscribe (publish/subscribe), notification, key expiration and other features

  • What is a redis transaction?

    • A mechanism for packaging multiple requests and executing multiple commands in sequence at one time

    • Implement transaction functions through multi, exec, watch and other commands

    • Python redis-py pipeline=conn.pipeline(transaction=True)

  • Persistence

    • save (synchronization, data consistency can be guaranteed)

    • bgsave (asynchronous; used by default on shutdown when AOF is not enabled)

    • RDB (snapshot)

    • AOF (append log)

  • How to implement the queue

    • lpush

    • rpop

  • Commonly used data types (Bitmaps, HyperLogLogs, range queries, etc. are less common)

    • String: counters; stored as an integer or an SDS (Simple Dynamic String)

    • List: a user's followings / fan list; stored as a ziplist (contiguous memory blocks where each entry's header records the lengths of the previous and next nodes, emulating a doubly linked list) or as a double linked list

    • Hash: stored as a ziplist or a hashtable

    • Set: a user's followers; stored as an intset or a hashtable

    • Zset (sorted set): real-time leaderboards; stored with a skiplist

  • Differences from Memcached

    • Memcached can only store string keys

    • In Memcached you can only APPEND data to the end of an existing string and treat that string as a list; when removing elements, Memcached hides them with a blacklist rather than actually deleting them, to avoid the cost of reading, updating, and deleting elements

    • Both Redis and Memcached store data in memory, and both are memory databases. But Memcached can also be used to cache other things, such as pictures, videos, etc.

    • Virtual memory – When the physical memory is exhausted, Redis can swap some Values ​​that have not been used for a long time to disk

    • Storage data security – after Memcached hangs up, the data is gone; Redis can be saved to disk regularly (persistence)

    • The application scenarios are different: Redis can be used not only as a NoSQL database, but also as a message queue, data stack, and data cache; Memcached is suitable for caching SQL statements, data sets, user temporary data, delayed query data, and sessions, etc.

  • Redis implements distributed locks

    • Use setnx to achieve locking, and you can add a timeout through expire at the same time

    • The value of the lock can be a random uuid or a specific name

    • When releasing the lock, judge whether it is the lock by uuid, if yes, execute delete to release the lock
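
A minimal sketch of this scheme with redis-py (the key name and timeout are arbitrary, and the release step is simplified):

import uuid
import redis

conn = redis.Redis()                # assumes a local Redis server
lock_key = "lock:order"             # hypothetical lock name
token = str(uuid.uuid4())

# SET key value NX EX seconds == setnx + expire as one atomic command
acquired = conn.set(lock_key, token, nx=True, ex=10)
if acquired:
    try:
        pass                        # do the protected work here
    finally:
        # only release a lock we still own (in production this check-and-delete
        # should itself be atomic, e.g. via a Lua script)
        if conn.get(lock_key) == token.encode():
            conn.delete(lock_key)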

  • Common problems

    • Cache avalanche: a lot of cached data expires within a short period and a flood of requests hits the database

    • Cache penetration: the requested data exists neither in the cache nor in the database

    • Cache warming: when the project starts up, preload commonly used data into the cache

    • Cache update: when data expires, refresh the cached data

    • Cache downgrade: when traffic spikes, a service has problems (slow or unresponsive), or a non-core service affects the core flow, the service must stay available even in a degraded form; the system can downgrade automatically based on key metrics, or switches can be configured for manual downgrade

  • Consistent Hash Algorithm

    • Ensure data consistency when using clusters

  • Implementing a distributed lock based on redis requires a timeout parameter

    • setnx

  • Virtual Memory

  • memory thrashing

Linux

  • Five Unix I/O models

    • Blocking I/O

    • Non-blocking I/O

    • I/O multiplexing (in Python, the selectors module is used for multiplexed I/O)

    • select: suitable when concurrency is not high but the connections are very active

    • poll: not much better than select

    • epoll: suitable when the number of connections is large but few of them are active

    • Signal-driven I/O

    • Asynchronous I/O (gevent / asyncio implement asynchrony)

  • A better command manual than man

    • tldr: a manual with example commands

  • The difference between kill -9 and -15

    • -15 (SIGTERM): the program may stop immediately, stop after releasing its resources, or ignore the signal and keep running

    • -9 (SIGKILL): because the behaviour of -15 is uncertain, -9 is used to kill the process immediately

  • Paging mechanism (memory allocation management scheme with logical address and physical address separation):

    • In order to manage memory efficiently and reduce fragmentation, the operating system:

    • The logical address of the program is divided into pages of fixed size

    • The physical address is divided into frames of the same size

    • The page table maps logical addresses to physical addresses

  • segmentation mechanism

    • In order to meet some logic requirements of the code

    • Data sharing / data protection / dynamic linking

    • Continuous memory allocation within each segment, and discrete allocation between segments

  • View cpu memory usage?

    • top

    • free: view available memory and help troubleshoot memory leaks

Design Patterns

singleton pattern

# Method 1: a decorator
def Single(cls):
    instances = {}
    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]
    return get_instance
@Single
class B:
    pass
# Method 2: a module-level instance
class Single:
    def __init__(self):
        print("singleton implementation, method 2...")

single = Single()
del Single  # always use the single instance from now on
# Method 3 (the most common way): override __new__
class Single:
    def __new__(cls, *args, **kwargs):
        if not hasattr(cls, '_instance'):
            cls._instance = super().__new__(cls)
        return cls._instance

factory pattern

class Dog:
    def __init__(self):
        print("Wang Wang Wang")
class Cat:
    def __init__(self):
        print("Miao Miao Miao")


def fac(animal):
    if animal.lower() == "dog":
        return Dog()
    if animal.lower() == "cat":
        return Cat()
    print("对不起,必须是:dog,cat")

Builder pattern

class Computer:
    def __init__(self, serial_number):
        self.serial_number = serial_number
        self.memory = None
        self.hdd = None
        self.gpu = None
    def __str__(self):
        info = (f'Memory:{self.memory}GB',
                f'Hard Disk:{self.hdd}GB',
                f'Graphics Card:{self.gpu}')
        return ' '.join(info)
class ComputerBuilder:
    def __init__(self):
        self.computer = Computer('Jim1996')
    def configure_memory(self, amount):
        self.computer.memory = amount
        return self  # return self to allow chained calls
    def configure_hdd(self, amount):
        self.computer.hdd = amount
        return self
    def configure_gpu(self, gpu_model):
        self.computer.gpu = gpu_model
        return self
class HardwareEngineer:
    def __init__(self):
        self.builder = None
    def construct_computer(self, memory, hdd, gpu):
        self.builder = ComputerBuilder()
        self.builder.configure_memory(memory).configure_hdd(hdd).configure_gpu(gpu)
    @property
    def computer(self):
        return self.builder.computer

Data Structures and Algorithms

Python implements various data structures

quick sort

def quick_sort(_list):
    if len(_list) < 2:
        return _list
    pivot = _list[0]
    left_list = [i for i in _list[1:] if i < pivot]
    right_list = [i for i in _list[1:] if i >= pivot]
    return quick_sort(left_list) + [pivot] + quick_sort(right_list)

selection sort

def select_sort(seq):
    n = len(seq)
    for i in range(n - 1):
        min_idx = i
        for j in range(i + 1, n):
            if seq[j] < seq[min_idx]:
                min_idx = j
        if min_idx != i:
            seq[i], seq[min_idx] = seq[min_idx], seq[i]

insertion sort

def insertion_sort(_list):
    n = len(_list)
    for i in range(1, n):
        value = _list[i]
        pos = i
        while pos > 0 and value < _list[pos - 1]:
            _list[pos] = _list[pos - 1]
            pos -= 1
        _list[pos] = value
    return _list

merge sort

def merge_sorted_list(_list1, _list2):   # merge two sorted lists
    len_a, len_b = len(_list1), len(_list2)
    a = b = 0
    sort = []
    while len_a > a and len_b > b:
        if _list1[a] > _list2[b]:
            sort.append(_list2[b])
            b += 1
        else:
            sort.append(_list1[a])
            a += 1
    if len_a > a:
        sort.extend(_list1[a:])
    if len_b > b:
        sort.extend(_list2[b:])
    return sort

def merge_sort(_list):
    if len(_list) < 2:
        return _list
    else:
        mid = len(_list) // 2
        left = merge_sort(_list[:mid])
        right = merge_sort(_list[mid:])
        return merge_sorted_list(left, right)

Heap sort heapq module

from heapq import nsmallest
def heap_sort(_list):
    return nsmallest(len(_list),_list)

Stack

from collections import deque
class Stack:
    def __init__(self):
        self.s = deque()
    def peek(self):
        p = self.pop()
        self.push(p)
        return p
    def push(self, el):
        self.s.append(el)
    def pop(self):
        return self.s.pop()

queue

from collections import deque
class Queue:
    def __init__(self):
        self.s = deque()
    def push(self, el):
        self.s.append(el)
    def pop(self):
        return self.s.popleft()

binary search

def binary_search(_list, num):
    if len(_list) < 1:
        return False
    mid = len(_list) // 2
    if num > _list[mid]:
        return binary_search(_list[mid + 1:], num)
    elif num < _list[mid]:
        return binary_search(_list[:mid], num)
    else:
        return True

interview questions

  • About database optimization and design

    • Use a consistent hashing algorithm

    • setnx

    • setnx + expire

    • use redis

    • If the order in which data is written to an InnoDB table matches the order of the B+ tree index's leaf nodes, access is most efficient; for storage and query performance, an auto-increment id should be used as the primary key

    • InnoDB's primary (clustered) index keeps data sorted by primary key; because UUIDs are unordered, using a UUID as the physical primary key puts huge I/O pressure on InnoDB. Use an auto-increment id as the physical primary key and, if global uniqueness is needed, keep the uuid as a logical key / unique index for joining other tables or as a foreign key

    • https://segmentfault.com/a/1190000018426586

    • How to implement a queue using two stacks

    • reverse linked list

    • Merge two sorted lists

    • Delete linked list node

    • reverse binary tree

    • Design a short URL service? Base-62 encoding

    • Design a flash-sale (seckill) system / feed stream?

    • https://www.jianshu.com/p/ea0259d109f9

    • Why is it better to use an auto-incrementing integer for the primary key of the mysql database? Is it okay to use uuid? Why?

    • If it is a distributed system, how do we generate the self-incrementing id of the database?

    • Implementing a distributed lock based on redis requires a timeout parameter

    • If a single redis node goes down, how to deal with it? Are there other solutions in the industry to implement distributed lock codes?

caching algorithm

  • LRU (least-recently-used): replace the least recently used object

  • LFU (Least Frequently Used): evict the least frequently used entries; if a piece of data has rarely been used recently, it is unlikely to be used in the future
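
A compact LRU sketch on top of OrderedDict (the capacity and class name are arbitrary):

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity=128):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict the least recently used entry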

Server-side performance optimization direction

  • Use Data Structures and Algorithms

  • database

    • Enable the slow query log (slow_query_log / slow_query_log_file) and examine it

    • Troubleshoot indexing problems through explain

    • Adjust Data Modification Index

    • index optimization

    • slow query elimination

    • Batch operations to reduce io operations

    • Use NoSQL: such as Redis

  • network-io

    • batch operation

    • pipeline

  • cache

    • Redis

  • asynchronous

    • Asyncio implements asynchronous operations

    • Use Celery to reduce io blocking

  • concurrency

  • Multithreading

  • Gevent

Origin blog.csdn.net/onebound_linda/article/details/131844448