Collection of python interview questions (1)

python technical interview questions

1. Power operation in Python

In Python, exponentiation is written with two asterisks (**). For example:

>>> a = 2 ** 2
>>> a
4

We can see that 2 squared gives 4.

So what does ^ mean? Let’s demonstrate with code:

>>> 1 ^ 0
1
>>> 0 ^ 1
1
>>> 1 ^ 1
0
>>> 0 ^ 0
0
>>> 2 ^ 2
0

When used as an operator in Python, ^ means bitwise XOR (exclusive or).

For a ^ b, each bit of the result is 1 when exactly one of the corresponding bits of the two operands is 1, and 0 otherwise.

2. Given a=1 and b=2, how do you swap the values of a and b without an intermediate variable?

Answer:

>>> a=1
>>> b=2
>>> a=a^b
>>> b=b^a
>>> a=a^b
>>> a
2
>>> b
1

Note that this is a rather clever method. Of course, the answer to this question is not unique; the simplest answer is

a,b=b,a

Here, to demonstrate the XOR operator introduced above, the clever XOR-swap approach is shown:

In binary, 1 is 01 and 2 is 10, so the question is equivalent to a=01, b=10, as follows:

a=01 b=10

First, a=a^b (XOR: identical bits give 0, different bits give 1). Watch how the values of the two binary digits change:

a=01 b=10------a=11

Then b=b^a, the result is as follows:

b=10 a=11------b=01

Finally, a=a^b is performed, and the result is as follows:

a=11 b=01------a=10

The final result is a=10, b=01. Compared with the initial a=01, b=10, the values of the two variables have indeed been exchanged.

3. Interview question basics

1. What happens when code modifies immutable data? What exception is thrown?

Answer: The code will not run normally; a TypeError exception is thrown.

2. What underlying method does print call in Python?

Answer: By default, print calls the sys.stdout.write method, i.e., it writes the string to the console.

3. Briefly describe your understanding of the input() function?

Answer:

In Python 3, input() reads user input; whatever the user types, the value obtained is of type str.

In Python 2 there are raw_input() and input(). raw_input() behaves the same as Python 3's input(), while input() is a bit different: it evaluates what is typed, so the value obtained has whatever type was entered.

4.What is the difference between range and xrange in python2?

Answer:

Both are used the same way. The difference is that range returns a list, while xrange returns a lazy, generator-like xrange object. range allocates a block of memory for the whole list up front, whereas xrange produces values as the loop consumes them, allocating only when needed. So for long sequences, xrange performs better.
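
In Python 3 there is no xrange; range itself is lazy and behaves like Python 2's xrange. A quick sketch (Python 3) showing that a range object stays tiny no matter how long the sequence is:

import sys

nums_list = list(range(1000000))   # materializes a million ints up front
nums_range = range(1000000)        # lazy: values are produced on demand

print(sys.getsizeof(nums_list))    # several megabytes
print(sys.getsizeof(nums_range))   # a few dozen bytes, independent of the length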

5. How can you read 5 GB of data with only 4 GB of memory?

Answer:

Use a generator to read the file in several passes, processing a relatively small chunk each time and only then reading the next chunk.

You can also split the file into smaller files with the Linux command split and then process them; this is quite efficient, and you can split by line count or by file size.
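
A minimal generator-based sketch of the first approach (the file name and chunk size here are placeholders):

def read_in_chunks(path, chunk_size=64 * 1024 * 1024):
    """Yield the file chunk by chunk, so only one chunk is in memory at a time."""
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


for chunk in read_in_chunks('big_data.bin'):
    pass  # process each chunk here, then the next one is read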

6、dict

Question: Add a key to a dictionary: if it already exists, do nothing; if it does not exist, add it with a default value. Use a single method provided by the dictionary to do this.

Answer: The setdefault method accomplishes exactly this. For example:

mydict = {"1": "小闫", "2": "小良"}
mydict.setdefault('1', 'xx')
print(mydict)
# Result: {'1': '小闫', '2': '小良'}
mydict.setdefault('3', 'xx')
print(mydict)
# Result: {'1': '小闫', '2': '小良', '3': 'xx'}

7、str

Question: For strings, which is more efficient: the join method or the + operator?

Answer: If a large number of strings need to be concatenated, say tens of thousands, join is far more efficient than the + operator; if only one or two strings are joined, the operator is more practical.

At the C level, a string is an immutable PyStringObject (PyUnicodeObject in Python 3). This means that with the + operator, every concatenation must allocate new memory for the combined string; with a huge number of pieces, you can imagine how low the efficiency is. join is different: it operates on an iterable such as a list, computes the total length first, and only needs to allocate memory once. So the conclusion is that join is much more efficient for bulk concatenation.
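
A quick benchmark sketch of this claim (timings vary by machine and Python version, and CPython can sometimes optimize string concatenation in place, so treat the numbers as indicative only):

import timeit

parts = ['x'] * 10000

def concat_plus():
    s = ''
    for p in parts:
        s = s + p          # may allocate a new string on every iteration
    return s

def concat_join():
    return ''.join(parts)  # one allocation for the final string

print(timeit.timeit(concat_plus, number=100))
print(timeit.timeit(concat_join, number=100))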

8、MVT

Q: Tell me about your understanding of the MVT model?

Answer: First, what the three letters stand for. M is Model, the model class that interacts with the database. V is View, which receives and processes requests, interacting with models, templates, and so on. T is Template, responsible for filling data into templates to generate front-end pages. The general flow: the client sends a request; the view receives and processes it; if the database is involved, queries and saves are performed and the results returned to the view; the template is then filled in and returns an HTML page to the view; finally, the view returns the page to the client for rendering and display.

4. Some knowledge points about file reading

1. When reading a file, we must consider the position of the file pointer. If the file has not been closed, a second read starts from where the first read ended.

2. Method to control file pointer:

Method 1: Reopen the file. Each time it is opened in read-only mode, reading starts from the beginning of the file.

Method 2: Manually move the file pointer back to the beginning of the file:

file_name.seek(0, 0)

file.seek(off, whence=0): moves the file pointer (operation mark) by off bytes. A positive value moves toward the end of the file, a negative one toward the beginning. The whence parameter determines the reference point: 0 means the beginning of the file (the default), 1 the current position, and 2 the end of the file.
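
A small sketch of both methods (the file name is a placeholder):

with open('notes.txt', 'w') as f:
    f.write('hello world')

f = open('notes.txt')
print(f.read())   # 'hello world'; the pointer is now at the end
print(f.read())   # ''; nothing left to read
f.seek(0)         # move the pointer back to the beginning of the file
print(f.read())   # 'hello world' again
f.close()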

5. Linux command split

This command splits a large file into smaller files. By default, it starts a new file every 1000 lines.

Syntax:

split [--help][--version][-<lines>][-b <bytes>][-C <bytes>][-l <lines>] [file to split] [output prefix]

Parameter explanation:

-<lines>: split into a new file every given number of lines

-b <bytes>: split into a new file every given number of bytes

--help: online help

--version: display version information

-C <bytes>: similar to -b, but tries to keep each line intact when splitting

[output prefix]: sets the prefix of the resulting files; split automatically appends a numbered suffix to it

-a, --suffix-length=N: length of the suffix in output file names (default 2)

-d, --numeric-suffixes: use digits instead of letters as suffixes

-l, --lines=NUMBER: number of lines per output file

Usage:

split -l 2482 ../BLM/BLM.txt -d -a 4 BLM_

This splits the file BLM.txt into several smaller files of 2482 lines each (-l 2482), with the prefix BLM_, numeric rather than alphabetic suffixes (-d), and four-digit suffixes (-a 4).

6. Linux command wc

Count the number of lines in a file:

wc -l <file_name>

7. Linux commands > and >>

> overwrites the file's content; >> appends content to the end of the file.

8. Describe the process of visiting www.baidu.com in a browser

1. First, the DNS server resolves the IP address corresponding to baidu.com.

2. Before that, ARP is used to obtain the MAC address of the default gateway.

3. The data is assembled and sent to the default gateway (the destination IP is still the DNS server's IP, but the MAC address is the default gateway's). The default gateway can forward data, and passes it to a router.

4. The router chooses a suitable, faster path according to its routing protocol and forwards the data toward the destination gateway.

5. The destination gateway (the gateway of the network where the DNS server sits) forwards the data to the DNS server.

6. The DNS server resolves the IP address corresponding to baidu.com and returns it along the same path to the client that requested the domain name.

7. With the IP address of baidu.com, the client performs the TCP three-way handshake to establish a connection.

8. The HTTP protocol is used to send the request data to the web server.

9. After receiving the request, the web server produces the corresponding result and returns it to the browser along the same path.

10. The browser receives the data and displays the page using its own rendering engine.

11. The browser closes the TCP connection with the four-way wave, completing the whole visit.

9. Knowledge points involved in the process

9.1 The OSI model

The OSI model, short for the Open Systems Interconnection reference model, divides the computer network architecture into seven layers:

The first layer: the physical layer (bit stream transmission), which is equivalent to the first-line porter in the post office.

The second layer: Data link layer (providing media access and link management), equivalent to the packing and unpacking workers in the post office. The MAC address is on this layer.

The third layer: the network layer (addressing and routing), equivalent to the workers in the post office who sort mail items by region. IP addresses are in this layer.

Layer 4: Transport layer (providing end-to-end reliable connections), equivalent to the employees in the company who send letters to the post office.

The fifth layer: Session layer (establishing, maintaining and managing sessions), which is equivalent to the secretary in the company who is responsible for receiving and mailing letters, loading and unloading envelopes.

The sixth layer: Presentation layer (processing data format, data encryption, etc.), equivalent to the assistant in the company who writes letters for the boss.

Layer 7: Application layer (providing inter-application communication), equivalent to the big boss.

9.2. TCP/IP model

The first layer: network interface layer (corresponding to the physical layer and data link layer in the seven-layer model)

The second layer: network layer

Layer 3: Transport layer

The fourth layer: application layer (corresponding to the session layer, presentation layer and application layer)

9.3 ARP

ARP, the Address Resolution Protocol, is a TCP/IP protocol that obtains a physical (MAC) address from an IP address.

A host broadcasts an ARP request containing the target IP address to all hosts on the network, then determines the target's physical address from the reply it receives.

10. What is the command to create a project in Django?

django-admin startproject project_name

11. After Django creates the project, what are the components under the project folder (understanding of mvt)?

manage.py: the entry point for running the project; it specifies the path of the configuration file.

__init__.py: an empty file whose presence makes the directory a package; initialization code can also be placed in it.

settings.py: the project's overall configuration file.

urls.py: the project's URL configuration file.

wsgi.py: the entry point for WSGI-compatible web servers to serve the project.

Directory with the same name as the project: contains the project's configuration files, sub-applications, etc.

12. Understanding of MVC and MVT?

Answer:

Let’s talk about MVC first:

M: Model, model, and database interaction.

V: View, view, responsible for generating HTML pages.

C: Controller, controller, receives requests, processes them, interacts with M and V, and returns responses.


We can use a user-registration example to illustrate the relationship between the three:

1. After the user fills in the registration information, they click the button and the information is submitted to the site's server.

2. The Controller receives the registration information and tells the Model layer to save it to the database.

3. The Model layer receives the instruction and saves the registration information into the database.

4. The database returns the result of the save to the Model.

5. The Model returns the saved result to the Controller.

6. On receiving the result, the Controller tells the View, and the View produces an HTML page.

7. The View hands the generated HTML page content back to the Controller.

8. The Controller returns the HTML page content to the browser.

9. The browser receives the HTML page returned by the Controller, parses it, and displays it.

Now the MVT pattern:

M: Model, model, has the same function as M in MVC, and interacts with the database.

V: View, view, has the same function as C in MVC. It receives requests, processes them, interacts with M and T, and returns responses.

T: Template, template, has the same function as V in MVC, generating html pages.


We use the same registration example to walk through the MVT pattern:

1. The user clicks the register button, and the registration data is sent to the site's server.

2. The View receives the registration data and tells the Model to save it to the database.

3. The Model layer saves the registration information into the database.

4. The database returns the result of the save to the Model.

5. The Model returns the saved result to the View.

6. The View tells the Template to produce an HTML page.

7. The Template generates the HTML content and returns it to the View.

8. The View returns the HTML page content to the browser.

9. The browser receives the HTML returned by the view, parses it, and displays it.

13. How do models in Django use the ORM to run queries against MySQL (give several kinds of statements)?

1. Add:

# 1.save
people = EthanYan(name='小闫同学',age=18)
people.save()
# 2.create
EthanYan.objects.create(name='小闫同学',age=18)

2.Basic query:

# get: query for a single result; raises `ModelClass.DoesNotExist` if nothing matches
EthanYan.objects.get(id=3)
# all: query for all results
EthanYan.objects.all()
# count: number of results
EthanYan.objects.count()

3. Filter query:

# filter: filter out multiple results
# exclude: exclude whatever matches the condition and return the rest
# get: filter for a single result
# Two underscores separate the field name from the comparison operator,
# so field names must not contain double underscores.
# Syntax: field__operator=value
# exact: equality test
EthanYan.objects.filter(id__exact=1)
# contains: containment test
EthanYan.objects.filter(name__contains='闫')
# startswith/endswith: starts or ends with the given value
EthanYan.objects.filter(name__startswith='小')
EthanYan.objects.filter(name__endswith='记')
# isnull: whether the value is NULL
EthanYan.objects.filter(name__isnull=False)
# in: whether the value is within the given range
EthanYan.objects.filter(id__in=[1, 3, 5])
# gt: greater than
# gte: greater than or equal to
# lt: less than
# lte: less than or equal to
EthanYan.objects.filter(id__gt=3)
# There is no "not equal" operator; use the exclude() filter instead
EthanYan.objects.exclude(id=3)
# Date queries
# year, month, day, week_day, hour, minute, second: operate on date/time fields
from datetime import date
XiaoYanBiJi.objects.filter(bpub_date__year=1980)
XiaoYanBiJi.objects.filter(bpub_date__gt=date(1980, 1, 1))
# F objects: compare one model field against another
from django.db.models import F
# Articles in 小闫笔记 whose read count is greater than or equal to their comment count
XiaoYanBiJi.objects.filter(bread__gte=F('bcomment'))
# Q objects: logical conditions in queries
# Q objects can be combined with & (logical AND) and | (logical OR)
# Q(field__operator=value)
from django.db.models import Q
# Articles with read count greater than 20 OR id less than 3 -- only Q objects can express this
XiaoYanBiJi.objects.filter(Q(bread__gt=20) | Q(id__lt=3))
# Articles whose id is not 3
XiaoYanBiJi.objects.filter(~Q(pk=3))
# Aggregate functions
# Avg (average), Count, Max, Min, Sum
from django.db.models import Sum
XiaoYanBiJi.objects.aggregate(Sum('bread'))
# aggregate() returns a dictionary of the form {'fieldname__aggregatename': value}
# count is usually called directly rather than through aggregate()
XiaoYanBiJi.objects.count()
# Ordering
XiaoYanBiJi.objects.all().order_by('bread')
XiaoYanBiJi.objects.all().order_by('-bread')
# Related queries
# From the "one" side to the "many" side:
# one_side_instance.<many-side model name, lowercased>_set
people = EthanYan.objects.get(id=1)
people.note_set.all()
# From the "many" side to the "one" side:
note = XiaoYanBiJi.objects.get(id=1)
# many_side_instance.<related attribute>_id
note.xiaoyanbiji_id
# Querying the "one" model by conditions on the "many" model:
# related_model_name_lowercased__field__operator=value
# (without the "__operator" part, equality is implied)
# Articles whose author is 小闫同学
XiaoYanBiJi.objects.filter(ethan__name='小闫同学')
# Articles whose author description contains "闫"
XiaoYanBiJi.objects.filter(ethanyan__hcomment__contains='闫')
# Querying the "many" model by conditions on the "one" model:
# related_attribute__field__operator=value
# (without the "__operator" part, equality is implied)
# All authors of articles titled "Django"
EthanYan.objects.filter(xiaoyanbiji__btitle='Django')
# All authors of articles with a read count greater than 30
EthanYan.objects.filter(xiaoyanbiji__bread__gt=30)
# Update
# save
ethanyan = EthanYan.objects.get(ename='小闫同学')
ethanyan.save()
# update
EthanYan.objects.filter(ename='小闫同学1').update(ename='小闫同学')
# Delete
ethanyan = EthanYan.objects.get(id=3)
ethanyan.delete()
EthanYan.objects.filter(id=14).delete()

14. redis persistence

As we all know, Redis is an in-memory database. Its efficiency is high, but there is a drawback that cannot be ignored: data safety, where safety here means data loss and nothing else. All the data is stored in memory, so if Redis restarts for some unexpected reason such as a crash, everything in memory is lost. A terrible thing. So how do we solve it? The answer is data persistence.

There are two ways to persist data: RDB and AOF. The two names are hard to remember at first, but you will find that both methods are named after the suffix of the file in which the data is stored.

1.1 RDB

First, the RDB approach. RDB persists all the data in memory to the hard disk; this process is also known as taking a "snapshot". By default, Redis saves our data in the dump.rdb file in the process's current directory. Look at the file suffix, then look at the method name: notice anything? Once that clicks, what you need to focus on is how it is triggered.

When redis starts, it will automatically read the RDB snapshot file, which is why RDB can solve the persistence of data.

Method 1: SAVE and BGSAVE commands

This method lets us take snapshots manually. The difference between the two: the SAVE command blocks subsequent client requests while the snapshot runs, whereas BGSAVE takes the snapshot asynchronously. The difference matters: SAVE interrupts ongoing operations, which is a real loss when there is a lot of data, so its use must be kept rare; if it must be used, pick a period with very little traffic. BGSAVE can accept client requests while snapshotting, doing the site no damage.

Method 2: Configure snapshot conditions yourself

# The unit of 900 is seconds; 1 is the number of changed keys. This line means:
# when more than 1 key has been changed within 900 s, take a snapshot automatically.
save 900 1
# The snapshot file's name; leave this as it is
dbfilename dump.rdb
# The directory where the snapshot file is stored
dir ./

The configuration file is in the file redis.conf.

Method 3: Execute the flushall command

There is a prerequisite for the snapshot operation of this command, that is, we have configured the snapshot settings as in method 2. When we use the flushall command, Redis will clear all data in the database and perform a snapshot operation.

If save is not configured, the snapshot operation will not be performed!

1.1.1 The snapshot process

When Redis takes a snapshot, it first calls the fork function, which has a notable property: the backup contains the data exactly as it was at the moment fork executed, and writes that happen afterwards are not included. The parent process then continues handling client requests, while the child process writes the data to a temporary file on the hard disk. Only after the child has written all the data does this file replace the old RDB file. That completes one snapshot.
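
A toy sketch of this fork-based pattern on a Unix system (the in-memory dict and the file names are stand-ins for illustration, not Redis internals):

import json
import os

data = {'k1': 'v1', 'k2': 'v2'}  # stand-in for the in-memory dataset

def bgsave():
    pid = os.fork()  # the child sees the data exactly as of this instant (copy-on-write)
    if pid == 0:
        # Child: write the snapshot to a temp file, replace the old file atomically, exit
        with open('dump.tmp', 'w') as f:
            json.dump(data, f)
        os.replace('dump.tmp', 'dump.json')
        os._exit(0)
    return pid  # parent returns immediately and keeps serving requests

if __name__ == '__main__':
    child = bgsave()
    data['k3'] = 'v3'      # written after fork, so NOT part of this snapshot
    os.waitpid(child, 0)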

1.2AOF

Now the AOF approach, briefly. Unlike RDB, it does not persist all the data to the hard disk; it only saves the write commands that were executed. On the next startup, Redis literally re-executes all of those commands. Again, the key question is how to enable it.

We need to set appendonly to yes in the configuration file. Once enabled, every time Redis performs a write operation (note: write commands only), the command is recorded in the appendonly.aof file. Does that name sound familiar? Like the title of this section?

AOF has a concept called file rewriting. If we set the same key several times, only the last value remains in Redis, so there is no need to record the earlier operations and execute them again; doing so would seriously hurt efficiency. This is where AOF rewriting comes in: rewriting is the process of removing commands that no longer matter.

We can let Redis automatically perform the rewriting operation, that is, now set the following settings in the configuration file:

# Rewrite again when the current AOF file has grown beyond the size at the last rewrite
# by this percentage; if there has been no rewrite yet, the size at startup is used.
auto-aof-rewrite-percentage 100
# Only rewrite when the AOF file is larger than 64 MB: if the file is small anyway,
# a few invalid commands do no real harm.
auto-aof-rewrite-min-size 64mb

Now that everything has been introduced, of course we have to talk about how to synchronize data to the hard drive.

We can set the time to synchronize AOF files to the hard disk to reduce data loss.

# Write to disk in real time
appendfsync always
# Write once per second
appendfsync everysec
# Do not write proactively (leave it to the OS)
appendfsync no

1.3 Important

Be sure to pass the configuration file when restarting the service:

redis-server <path to redis.conf>

15、

1.HTTP/HTTPS protocol

The HTTP protocol is the Hypertext Transfer Protocol, which is the basis of web networking and one of the protocols commonly used in mobile phone networking. The HTTP protocol is an application built on the TCP protocol. It belongs to the application layer. The most significant feature of HTTP connections is that each request sent by the client requires the server to send back a response. After the request is completed, the connection will be actively released. The process from establishing a connection to closing the connection is called "a connection". HTTP is a protocol used to transmit HTML text on the Internet and is used for communication between browsers and servers. One thing to note is that it is transmitted in clear text, and the HTTPS protocol can be used for security.

The HTTPS protocol is actually HTTP wrapped with an SSL/TLS shell. This is also its biggest feature, safety. The HTTP protocol is that the application layer directly transmits data to TCP for transmission, while HTTPS changes the application layer to transmit data to SSL/TLS, encrypts the data and then transmits it to TCP.

HTTP uses port 80; HTTPS uses port 443. SSL is an encryption suite with certificates; TLS is the upgraded version of SSL.

HTTP request message format:

1. Request line: request method, resource path, HTTP protocol version

GET / HTTP/1.1\r\n

2. Request headers: there are many and they vary; the format is:

Header-Name: header value\r\n

3. Blank line and request body

HTTP response message format
Status line: protocol version, status code, status description

HTTP/1.1 200 OK\r\n

Response header:

Header-Name: header value\r\n

Blank lines and response body
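
A minimal sketch that speaks this format over a raw socket (the host is a placeholder; a real client would also handle chunked encoding, redirects, and so on):

import socket

sock = socket.create_connection(('example.com', 80))
request = (
    'GET / HTTP/1.1\r\n'        # request line: method, path, protocol version
    'Host: example.com\r\n'     # header lines in "Name: value" form
    'Connection: close\r\n'
    '\r\n'                      # the blank line ends the headers; a GET has no body
)
sock.sendall(request.encode('ascii'))

response = b''
while True:
    chunk = sock.recv(4096)
    if not chunk:
        break
    response += chunk
sock.close()

print(response.split(b'\r\n', 1)[0])  # status line, e.g. b'HTTP/1.1 200 OK'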

2.TCP/IP protocol

TCP, called Transmission Control Protocol, is a connection-oriented, reliable, byte stream-based transport layer communication protocol. The connection is established through a three-way handshake, and the communication is terminated by waving four times when the communication is completed. Its advantage is that when data is transmitted, it has a sending response mechanism, a timeout retransmission mechanism, an error checking mechanism, a blocking management mechanism, etc., which can ensure the correctness and reliability of the data. The disadvantage is that it is slower than UDP and requires more system resources.

TCP is a transport-layer protocol that mainly solves how data is transmitted over the network, while HTTP is an application-layer protocol that mainly solves how to package the data. The IP protocol sits at the network layer.

2.1 Three-way handshake


**The bookish, textbook version:** The client sets the SYN flag to 1, generates a random value seq (call it J), and sends the packet to the server. On receiving it, the server sets both the SYN and ACK flags to 1, sets ack = J+1, generates its own random value seq (call it K), and sends a packet back to confirm the connection request. The client checks that ack is J+1 and that ACK is 1; once confirmed, it sets the ACK flag to 1, sets ack = K+1, and sends the packet to the server. The server checks that ack and ACK are K+1 and 1; if so, the connection is established.

SYN: connection request. ACK: acknowledgment. FIN: close the connection. seq: message sequence number. ack: acknowledgment sequence number.

**The vivid-example version:** A: "Hey, bro, are you there?" B: "I'm here. What's up?" A: "Great, let me tell you something..." and then the chattering begins...

2.2 Wave four times


**The bookish, textbook version:** Both the client and the server can close actively. Take the client actively closing the connection as the example.

First wave: Client sends a FIN to close the data transfer from Client to Server.

The second wave: After receiving the FIN, the Server sends an ACK to the Client, and the confirmation sequence number is the received sequence number + 1.

The third wave: Server sends a FIN to close the data transfer from Server to Client.

The fourth wave: After receiving the FIN, the Client then sends an ACK to the Server, and the confirmation sequence number is the received sequence number + 1.

**The vivid-example version:** Because the connection is two-way communication, both parties must agree before it can be closed. It's like a relationship: only when both people agree to break up is the separation complete; otherwise things stay half-connected.

2.3 What is TCP's 2MSL?

The side that actively sends FIN to close the connection must wait for a while after the last of the four waves; this waiting time is called 2MSL.

2.4 Why does the client have to wait for 2MSL in the TIME-WAIT state?

It's like a breakup: the client closed the connection first, so it has to take responsibility. The client waits to make sure the server received the final wave. If that last packet was lost and the server never received the fourth wave, the server assumes the client hasn't really said goodbye and retransmits the third wave, to check whether the client changed its mind. The waiting time exists precisely to receive such retransmitted packets.

If the client disconnected immediately after sending, the server would wait for a response that never comes, retransmit, still get nothing, and be unable to close the connection. The client would then be playing the classic heartbreaker: if you break up, don't leave the other side hoping, right? But back to the point.

TCP's defining feature is that data must not be lost. If the client hastily established a new connection, an old stray packet could then corrupt the new connection too, which would be asking for trouble. So the 2MSL wait both guarantees that both sides close properly and guarantees that all old segments disappear from the network, so no stale request segment shows up in a new connection.

3.WSGI

First recall the back-end server side, which really splits into a web server and a web framework. The web server parses the request message, calls the framework program to handle the request, organizes the response message, and returns the content to the client. The web framework handles route dispatch (finding the handler function for a URL); business logic runs inside the handler.

WSGI is an interface between web servers and web frameworks defined for the Python language: a specification describing how web servers communicate with web frameworks.

In the WSGI protocol, the most important thing is an interface function:

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<h1>Hello, web!</h1>']

The first parameter is a dictionary carrying the request information (path, method, and so on); the second is a callback function used to pass back the response status and headers; the return value is the response body.

The implementation is quite clever; briefly:

1. The server calls the application function.

2. The server defines a callback function used to receive the returned response-header information; it takes two parameters, the status and the headers.

3. The server passes the request information in as a dictionary, together with the callback function.

4. The framework defines the application function; after processing the data, it calls the passed-in callback and returns the response body.

5. After receiving the returned information, the server assembles the complete response message.

The WSGI interface thus decouples the server from the framework, which makes it easy to migrate to and maintain different servers.
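
The application function above can be served directly by the reference server in the standard library; a minimal sketch:

from wsgiref.simple_server import make_server

def application(environ, start_response):
    # environ is the request dictionary; start_response is the server's callback
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<h1>Hello, web!</h1>']

# Visit http://127.0.0.1:8000/ to see the page
with make_server('', 8000, application) as server:
    server.handle_request()  # serve a single request, then exit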

4. Knowledge related to multitasking

1. Parallelism: multiple tasks executing at the same point in time on multiple CPUs.

2. Concurrency: multiple tasks executed by taking turns on CPU time slices.

3. Talk about your understanding of multi-process, multi-thread, and coroutine. Is it used in the project?

Answer: A running program is a process; code that is not running is just a program. A process is the smallest unit of system resource allocation. Each process has its own independent memory space; processes share no data with each other, which makes them expensive. Inter-process communication uses a Queue.

A thread is the smallest unit of scheduling, also called an execution path. It cannot exist on its own; it depends on a process. A process has at least one thread, the main thread, and multiple threads share memory (shared data, shared global variables), which greatly improves a program's efficiency. But CPython's multithreading is "pseudo" multithreading: because of the GIL, only one thread executes Python code at any moment, so multi-core CPUs cannot be used effectively.

Coroutines, also called micro-threads or fibers (user-level threads), complete multiple tasks without creating extra threads: multiple tasks run within a single thread, executing alternately in a certain order. As a rule of thumb, a def containing the yield keyword defines a generator, which is the basis on which coroutines are built.

4. The coroutine library gevent implements multitasking. For the gevent framework to recognize time-consuming (blocking) operations and automatically switch to another task, apply the monkey patch:

from gevent import monkey
monkey.patch_all()
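
A small runnable sketch of gevent switching between two tasks on blocking sleeps (assumes the gevent package is installed):

import gevent
from gevent import monkey

monkey.patch_all()  # patch blocking stdlib calls so gevent can switch on them

import time

def task(name):
    for i in range(3):
        print(name, i)
        time.sleep(0.5)  # patched: yields control to the other greenlet

g1 = gevent.spawn(task, 'A')
g2 = gevent.spawn(task, 'B')
gevent.joinall([g1, g2])  # the output of A and B interleaves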

5. Generators and Iterators

1. Use a custom iterator to output the first 10 items of the Fibonacci sequence.

class Fibonacci(object):
    def __init__(self, num):
        self.num = num
        self.a = 0
        self.b = 1
        self.current_index = 0

    def __next__(self):
        if self.current_index < self.num:
            result = self.a
            self.a, self.b = self.b, self.a + self.b
            self.current_index += 1
            return result
        else:
            raise StopIteration

    def __iter__(self):
        return self


if __name__ == '__main__':
    fib = Fibonacci(10)
    for value in fib:
        print(value, end=' ')

2. Use the generator to output the first 10 terms of the Fibonacci sequence.

def fibonacci(num):
    a = 0
    b = 1
    current_index = 0
    while current_index < num:
        result = a
        a, b = b, a + b
        current_index += 1
        yield result


if __name__ == '__main__':
    fib = fibonacci(10)
    for value in fib:
        print(value, end=' ')

6. QQ third-party login development process

Step 1: The browser requests the server to obtain the QQ login URL.

Step 2: The server returns the QQ login URL and parameters to the client.

Data returned to the client:


{
 "login_url": "https://graph.qq.com/oauth2.0/show?which=Login&display=pc&response_type=code&client_id=101474184&redirect_uri=http%3A%2F%2Fwww.ethan.site%3A8080%2Foauth_callback.html&state=%2F&scope=get_user_info"
}

Step 3: The client initiates a request to the QQ server based on the QQ login URL returned above.

Step 4: The QQ server returns the QQ authorized login page to the client.

Step 5: The user starts operating on the authorization page and logs in to QQ.

Step 6: After authorization succeeds, the QQ server redirects the browser to the callback URL, carrying the code and the original state parameter in the query string. These parameters are provided by the QQ server.

Parameter   Description
code        Authorization code returned by QQ; the access_token can be obtained with it.
state       The client-side state value, used by third-party applications to prevent CSRF attacks. After successful authorization it is carried back unchanged in the callback; be sure to strictly verify the binding between the user and the state parameter, then redirect to the page we specify. If the user is not yet bound, jump to the binding page; at that point the query string carries two parameters, code and state.

Step 7: The client accesses the callback URL and sends the code parameter provided by QQ to the server. Then get the openid of the QQ login user and process it.

Parameter   Description
openid      The unique identifier of the user on this site or application. The site can store it to recognize the user at the next login, or link (bind) it with the user's existing account on the site.

Step 8: The server requests the QQ server to obtain the access_token based on the code.

Parameter      Description
access_token   Returned by the QQ server when the user logs in with QQ; it is used to obtain the openid, with which the identity is bound.

Step 9: The QQ server returns the required access_token to the server.

Step 10: The server uses the access_token to request the openid from the QQ server.

Step 11: QQ server returns openid to the server.

Step 12: The server then uses the openid to judge whether a user of this site has been bound: the back-end interface queries the tb_oatu_qq table (which records the binding between openid and user_id) by openid.

Step 13: If it has been bound, directly issue the jwt token and return it to the client, and let the client save the token.

Step 14: If it has not been bound, encrypt the openid and return it to the client, then continue with the following steps:

Step 15: The front end will also make corresponding judgments. If it has been bound, it will directly return to the homepage login URL. If it has not been bound, the binding page will be displayed in the browser, requiring the user to fill in a form for binding.

Step 16: When the user fills out the above form and clicks the save button, the client initiates a request to the server to bind the QQ login user, and the server saves the form information to the database.

Step 17: The server issues the jwt token and returns it to the client.
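
A hedged sketch of steps 8 through 11 on the server side using requests (the endpoints and parameter names follow QQ's OAuth 2.0 interface as commonly documented; treat them, and all the placeholder values, as assumptions to verify):

import re
import requests

APP_ID = '101474184'                                        # client_id from the login_url above
APP_KEY = '<app secret>'                                    # placeholder
REDIRECT_URI = 'http://www.ethan.site:8080/oauth_callback.html'
code = '<code carried back in the callback>'                # placeholder

# Steps 8-9: exchange the code for an access_token (assumed endpoint)
resp = requests.get('https://graph.qq.com/oauth2.0/token', params={
    'grant_type': 'authorization_code',
    'client_id': APP_ID,
    'client_secret': APP_KEY,
    'code': code,
    'redirect_uri': REDIRECT_URI,
})
# The body looks like: access_token=...&expires_in=...&refresh_token=...
access_token = dict(p.split('=') for p in resp.text.split('&'))['access_token']

# Steps 10-11: use the access_token to fetch the openid (assumed endpoint)
resp = requests.get('https://graph.qq.com/oauth2.0/me',
                    params={'access_token': access_token})
# The body looks like: callback( {"client_id":"...","openid":"..."} );
openid = re.search(r'"openid":"(.+?)"', resp.text).group(1)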

16. Database optimization

1. Optimize indexes, SQL statements, and analyze slow queries.

2. When designing tables, design the database strictly according to the database design paradigm.

Three major paradigms:

1. Atomicity of table fields (fields cannot be split further);

2. On the basis of the first normal form, every non-key attribute fully depends on the primary key;

3. On the basis of the first and second normal forms, there are no transitive dependencies between non-primary attributes.

3. Use cache to put frequently accessed data and data that does not need to change frequently in the cache, which can save disk IO.

4. Optimize hardware; adopt SSD, use disk queue technology, etc.

5. Use MySQL's own table partitioning technology to divide data into different files, which can improve disk reading efficiency.

6. Vertical table partitioning; put some infrequently read data in one table to save disk IO.

7. Master-slave separation of reading and writing; use master-slave replication to separate the reading operations and writing operations of the database;

8. Sharding across databases and tables (when the data volume is extremely large); the core principle is data routing.

9. Select the appropriate table engine and optimize parameters.

10. Carry out architecture-level caching, staticization and distribution.

11.Do not use full-text indexing.

12. Adopt faster storage methods, such as NoSQL to store frequently accessed data.

17. SQL statement

The table we will use later for querying:

mysql> select * from t_score;
+------+--------------+-----------+--------+
| c_id | c_student_id | c_english | c_math |
+------+--------------+-----------+--------+
|    1 |            1 |      60.5 |     99 |
|    2 |            2 |      65.5 |     60 |
|    3 |            3 |      70.5 |     88 |
|    4 |            4 |      60.5 |     77 |
|    5 |            5 |      60.5 |     89 |
|    6 |            6 |        90 |     93 |
|    7 |            7 |        80 |     99 |
|    8 |            8 |        88 |     99 |
|    9 |            9 |        77 |     60 |
|   10 |           10 |        75 |     86 |
|   11 |           11 |        60 |     60 |
|   12 |           12 |        88 |     99 |
|   13 |           13 |        77 |     59 |
|   14 |           14 |      NULL |     59 |
|   15 |           15 |        60 |   NULL |
+------+--------------+-----------+--------+

1. Single-table queries

1.1 Paging queries in MySQL.

Syntax:

select * from table_name limit (page-1)*count,count;

page is the page number, and count is the number of rows shown per page.

# 3 rows per page; to query the third page: (3-1)*3 = 6.
mysql> select * from t_score limit 6,3;
+------+--------------+-----------+--------+
| c_id | c_student_id | c_english | c_math |
+------+--------------+-----------+--------+
|    7 |            7 |        80 |     99 |
|    8 |            8 |        88 |     99 |
|    9 |            9 |        77 |     60 |
+------+--------------+-----------+--------+
1.2. Sum:
# Total score in the math subject
mysql> select sum(c_math) from t_score;
+-------------+
| sum(c_math) |
+-------------+
|        1127 |
+-------------+
1.3. Average:
# Average score in the math subject
mysql> select avg(c_math) from t_score;
+-------------+
| avg(c_math) |
+-------------+
|        80.5 |
+-------------+
1.4. Maximum and minimum values:
# Find the highest math score
mysql> select max(c_math) from t_score;
+-------------+
| max(c_math) |
+-------------+
|          99 |
+-------------+
# Find the lowest math score
mysql> select min(c_math) from t_score;
+-------------+
| min(c_math) |
+-------------+
|          59 |
+-------------+
1.5. Counting records:
# Count how many students took the math exam
mysql> select count(*) from t_score;
+----------+
| count(*) |
+----------+
|       15 |
+----------+
1.6. Grouping:

The non-aggregate column after select should match the column after group by, otherwise an error is reported.

# Take the math scores from the score table and group by them
mysql> select c_math from t_score group by c_math;
+--------+
| c_math |
+--------+
|   NULL |
|     59 |
|     60 |
|     77 |
|     86 |
|     88 |
|     89 |
|     93 |
|     99 |
+--------+
1.7. Based on the grouping, use group_concat() to get the set of values of a given column within each group
# Group by math score and get the ids of the students at each score
mysql> select c_math,group_concat(c_student_id) from t_score group by c_math;
+--------+----------------------------+
| c_math | group_concat(c_student_id) |
+--------+----------------------------+
|   NULL | 15                         |
|     59 | 13,14                      |
|     60 | 2,9,11                     |
|     77 | 4                          |
|     86 | 10                         |
|     88 | 3                          |
|     89 | 5                          |
|     93 | 6                          |
|     99 | 1,7,8,12                   |
+--------+----------------------------+
1.8. Grouping with aggregate functions
# Group by gender, then find each group's maximum age, minimum age, total age, average age, and head count
mysql> select c_gender,max(c_age),min(c_age),sum(c_age),avg(c_age),count(*) from t_student group by c_gender;
+----------+------------+------------+------------+------------+----------+
| c_gender | max(c_age) | min(c_age) | sum(c_age) | avg(c_age) | count(*) |
+----------+------------+------------+------------+------------+----------+
| 男       |         99 |         15 |       1084 |    47.1304 |       26 |
| 女       |         88 |         11 |        239 |    39.8333 |        7 |
+----------+------------+------------+------------+------------+----------+
1.9. Using the having clause.
# Group the student table by gender, pick out the female group, and show all the names in it

mysql> select c_gender,group_concat(c_name) from t_student group by c_gender having c_gender='女';
+----------+-----------------------------------------------------------+
| c_gender | group_concat(c_name)                                      |
+----------+-----------------------------------------------------------+
| 女       | 小龙女,白骨精,扈三娘,孙二娘,赵敏,嫦娥,                    |
+----------+-----------------------------------------------------------+

2. Multi-table queries

# The student table stores student information and the ID of each student's class; the class table stores class information.
# Query each student's name and the corresponding class.

mysql> select t_student.c_name,t_class.c_name from t_student,t_class where t_student.c_class_id=t_class.c_id;
+-----------+-------------------------------------+
| c_name    | c_name                              |
+-----------+-------------------------------------+
| 孙德龙    | 软件工程18级一班                    |
| 塔大      | 软件工程18级二班                    |
| 宋江      | 计算机科学与技术18级一班            |
| 武松      | 计算机科学与技术18级二班            |
| 孙二娘    | 网络工程18级一班                    |
| 扈三娘    | 网络工程18级二班                    |
| 鲁智深    | 软件工程18级一班                    |
| 林冲      | 软件工程18级二班                    |
| 阮小七    | 计算机科学与技术18级一班            |
| 阮小五    | 计算机科学与技术18级二班            |
| 阮小二    | 网络工程18级一班                    |
| 白骨精    | 网络工程18级二班                    |
| 孙悟空    | 软件工程18级一班                    |
| 猪八戒    | 软件工程18级二班                    |
| 沙和尚    | 计算机科学与技术18级一班            |
| 唐三奘    | 计算机科学与技术18级二班            |
| 哪吒      | 网络工程18级一班                    |
| 嫦娥      | 网络工程18级二班                    |
| 杨过      | 软件工程18级一班                    |
| 郭靖      | 软件工程18级二班                    |
| 洪七公    | 计算机科学与技术18级一班            |
| 欧阳锋    | 计算机科学与技术18级二班            |
| 黄药师    | 网络工程18级一班                    |
| 小龙女    | 网络工程18级二班                    |
| %         | 软件工程18级一班                    |
| 张无忌    | 软件工程18级二班                    |
| 张翠山    | 计算机科学与技术18级一班            |
| 张三丰    | 计算机科学与技术18级二班            |
| 宋青书    | 网络工程18级一班                    |
| 赵敏      | 网络工程18级二班                    |
|           | 计算机科学与技术18级一班            |
| 孙子      | 计算机科学与技术18级一班            |
|           | 网络工程18级一班                    |
+-----------+-------------------------------------+

2.1. Inner join

Syntax:

select * from table1 inner join table2 on table1.column operator table2.column;

A join condition must be given with on; without one, the result is a Cartesian product.

#  Query each student's name and the corresponding class
mysql> select ts.c_name,tc.c_name from t_student as ts inner join t_class tc on ts.c_class_id=tc.c_id;
.....same result as the previous query.......

The as above gives the table an alias; it can also be omitted, separating the table name and alias with a space.

2.2. Left join

Syntax:

select * from table1 left join table2 on table1.column operator table2.column;

The result is built from the rows of the left table; when the right table has no matching record, NULL is joined in.

mysql> select ts.c_name,tc.c_name from t_student as ts left join t_class tc on ts.c_class_id=tc.c_id;
+--------------+-------------------------------------+
| c_name       | c_name                              |
+--------------+-------------------------------------+
| 孙德龙       | 软件工程18级一班                    |
| 塔大         | 软件工程18级二班                    |
| 宋江         | 计算机科学与技术18级一班            |
| 武松         | 计算机科学与技术18级二班            |
| 孙二娘       | 网络工程18级一班                    |
| 扈三娘       | 网络工程18级二班                    |
| 鲁智深       | 软件工程18级一班                    |
| 林冲         | 软件工程18级二班                    |
| 阮小七       | 计算机科学与技术18级一班            |
| 阮小五       | 计算机科学与技术18级二班            |
| 阮小二       | 网络工程18级一班                    |
| 白骨精       | 网络工程18级二班                    |
| 孙悟空       | 软件工程18级一班                    |
| 猪八戒       | 软件工程18级二班                    |
| 沙和尚       | 计算机科学与技术18级一班            |
| 唐三奘       | 计算机科学与技术18级二班            |
| 哪吒         | 网络工程18级一班                    |
| 嫦娥         | 网络工程18级二班                    |
| 杨过         | 软件工程18级一班                    |
| 郭靖         | 软件工程18级二班                    |
| 洪七公       | 计算机科学与技术18级一班            |
| 欧阳锋       | 计算机科学与技术18级二班            |
| 黄药师       | 网络工程18级一班                    |
| 小龙女       | 网络工程18级二班                    |
| %            | 软件工程18级一班                    |
| 张无忌       | 软件工程18级二班                    |
| 张翠山       | 计算机科学与技术18级一班            |
| 张三丰       | 计算机科学与技术18级二班            |
| 宋青书       | 网络工程18级一班                    |
| 赵敏         | 网络工程18级二班                    |
|              | 计算机科学与技术18级一班            |
| 孙子         | 计算机科学与技术18级一班            |
|              | 网络工程18级一班                    |
| 亦向枫       | NULL                                |
+--------------+-------------------------------------+
2.3. Subqueries

Syntax:

select * from table1 where condition operator (select ...);

A subquery is a complete SQL statement that could run on its own; it is nested inside the main query as the main query's condition or data source.

2.3.1 Scalar subquery (the subquery returns a single value: one row, one column)
# Query the students whose age is above the class average
mysql> select * from t_student where c_age > (select avg(c_age) from t_student);
# There is a lot of data, so for a cleaner display we query only some of the columns
mysql> select c_id,c_name,c_gender,c_address from t_student where c_age > (select avg(c_age) from t_student);
+------+-----------+----------+-----------------------------+
| c_id | c_name    | c_gender | c_address                   |
+------+-----------+----------+-----------------------------+
|    7 | 鲁智深    | 男       | 北京市西城区西直门          |
|   15 | 沙和尚    | 男       | 北京市西城区西直门          |
|   16 | 唐三奘    | 男       | 北京市西城区西直门          |
|   18 | 嫦娥      | 女       | 北京市昌平霍营              |
|   19 | 杨过      | 男       | 北京市西城区西直门          |
|   20 | 郭靖      | 男       | 北京市西城区西直门          |
|   21 | 洪七公    | 男       | 北京市西城区西直门          |
|   22 | 欧阳锋    | 男       | 北京市西城区西直门          |
|   25 | %         |          | 北京市西城区西直门          |
|   29 | 宋青书    | 男       | 北京市西城区西直门          |
|   30 | 赵敏      | 女       | 北京市昌平霍营              |
+------+-----------+----------+-----------------------------+
2.3.2 Column subquery (the subquery returns one column with many rows)
# main query where condition in (column subquery)

# Query the names of all the classes that students belong to

mysql> select c_name from t_class where c_id in (select c_class_id from t_student);
+-------------------------------------+
| c_name                              |
+-------------------------------------+
| 软件工程18级一班                    |
| 软件工程18级二班                    |
| 计算机科学与技术18级一班            |
| 计算机科学与技术18级二班            |
| 网络工程18级一班                    |
| 网络工程18级二班                    |
+-------------------------------------+

2.3.3 Row subquery (the subquery returns one row with several columns)

# main query where (field1, field2, ...) = (row subquery)
# Query the students with the largest age and the smallest class number
mysql> select c_id,c_name,c_gender,c_address from t_student where(c_age,c_class_id) = (select max(c_age),min(c_class_id) from t_student);
+------+-----------+----------+-----------------------------+
| c_id | c_name    | c_gender | c_address                   |
+------+-----------+----------+-----------------------------+
|    7 | 鲁智深    | 男       | 北京市西城区西直门          |
|   25 | %         |          | 北京市西城区西直门          |
+------+-----------+----------+-----------------------------+
2.3.4. Self-join query

18. Written test questions

1. What is the difference between is and == in Python?

Answer: is is the identity operator: it checks whether two objects have the same id, i.e., whether they point to the same block of memory. == is the comparison operator: it checks whether the two objects' types and values are equal.

2. What is the difference between QuerySet's get and filter methods in Django?

Answer: filter returns a list-like set of objects, which is empty if nothing matches. get returns a single concrete object, and raises an error if nothing matches.

3. List at least four HTTP status codes and explain what they mean.

  • Status codes tell the client the server's execution state, so it can decide what to do next. Common ranges:
    • 100-199: the server has received part of the request; the client should continue submitting the rest to complete processing.
    • 200-299: the server received the request and finished processing it; 200 (OK, request succeeded) is the most common.
    • 300-399: to complete the request, the client must take further action; 302 (the requested page has temporarily moved to a new URL), 304 and 307 (use cached resources).
    • 400-499: client error; commonly 404 (the server cannot find the requested page) and 403 (access forbidden, insufficient permissions).
    • 500-599: server error; commonly 500 (the request was not completed because the server hit an unexpected condition).

Only some special ones are listed below; the common ones everyone already knows.

Code   Explanation
302    Redirect; the new URL is given in the Location response header
303    The browser redirects the response to a POST using a GET
307    The browser redirects the response to a POST without changing the method
503    The server is under maintenance or overloaded and did not respond

4. Differences between multithreading and multiprocessing?

A process is the unit of resource allocation; a thread is the unit of OS scheduling. Switching processes needs the most resources and is inefficient; switching threads needs moderate resources and has moderate efficiency (ignoring the GIL). Depending on the number of CPU cores, multiprocessing and multithreading may run in parallel. Threads exist within processes.

5. Understanding and application of request hooks in Flask?

While the client and server interact, some preparation or cleanup work has to happen around each request. So that every view function does not repeat that code, Flask provides a piece of shared machinery: request hooks.

In our project, we used request hooks when completing the CSRF-token logic and when blocking ordinary users from entering the admin pages.

Request hooks are implemented as decorators. There are four:

1. before_first_request: runs before the first request is handled.

2. before_request: runs before every request; if the decorated function returns something, the view function is not executed.

3. after_request: runs after every request if no exception was raised.

a. It receives one parameter: the response produced by the view function.

b. The response value can be given one last round of processing here before being returned.

4. teardown_request: runs after every request.

a. It receives one parameter, used to receive error information.

In practice we usually only use 2 and 3. Here is the actual code from the project, to help jog your memory:

# Use a request hook to intercept every request and, for those that pass, set csrf_token in the cookie
@app.after_request
def after_request(resp):
    # Call the library function to generate a csrf_token
    csrf_token = generate_csrf()

    # Set the csrf_token in the cookie
    resp.set_cookie("csrf_token", csrf_token)

    # Return the response
    return resp

# Use a request hook to intercept requests; only the views registered on admin_blue need intercepting
# 1. Intercept visits to pages other than the login page
# 2. Intercept ordinary (non-admin) users
@admin_blue.before_request
def before_request():
    if not request.url.endswith("/admin/login"):
        if not session.get("is_admin"):
            return redirect("/")

19. The intern mechanism

The Python 3 interpreter caches small integers and strings: small integers in the range [-5, 256], and strings of up to 20 characters by default.

String caching experiment:

>>> a = 'xx' * 20
>>> b = 'xx' * 20
>>> a is b
False
>>> a = 'x' * 3
>>> b = 'x' * 3
>>> a is b
True

As you can see, when the string's length does not exceed 20 ('x' * 3), the two ids are identical: the string is cached ahead of time, the assignment is only a reference, and both names point to the same block of memory. When the length exceeds 20 ('xx' * 20 has length 40), there is no caching, new memory is allocated, and the ids differ.

Small-integer caching experiment:

>>> a = -6
>>> b = -6
>>> a is b
False
>>> a = -5
>>> b = -5
>>> a is b
True

As you can see, with -5 the two variables have the same id, because the object is cached in advance; both are just references pointing to the same memory. With -6, new memory is allocated each time.

There is one more case: if a string contains any character other than digits, letters, and underscores, the intern mechanism is not applied, and the memory addresses differ no matter how short the string is.
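
A quick REPL check of that last case (a CPython implementation detail; results can vary with the version and with whether the code runs in a script or the REPL):

>>> a = 'hello!'
>>> b = 'hello!'
>>> a is b
False
>>> a = 'hello'
>>> b = 'hello'
>>> a is b
True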

20. The gc module

I. The garbage collection mechanism

Garbage collection in Python is based mainly on reference counting, supplemented by generational collection.

1. Cases that increase the reference count by 1

An object is created, e.g. a=23
An object is referenced, e.g. b=a
An object is passed as an argument into a function, e.g. func(a)
An object is stored in a container as an element, e.g. list1=[a,a]

2. Cases that decrease the reference count by 1

An alias of the object is explicitly destroyed, e.g. del a
An alias of the object is rebound to a new object, e.g. a=24
An object goes out of scope, e.g. when the function func finishes, its local variables (but not globals)
The container holding the object is destroyed, or the object is removed from the container

3. Viewing an object's reference count

import sys
a = "hello world"
sys.getrefcount(a)

This shows the reference count of the object a, but the number is 1 larger than the true count: calling the function passes a in, which temporarily increases its reference count by 1.

II. Memory leaks caused by reference cycles

Memory leak

Memory that was allocated but never freed is wasted; over time, memory runs out.

1. Making the program leak memory
import gc


class ClassA():
    def __init__(self):
        print('object born, id:%s' % str(id(self)))


def f2():
    while True:
        c1 = ClassA()
        c2 = ClassA()
        c1.t = c2
        c2.t = c1
        del c1
        del c2


# Garbage collection is on by default in Python; it can be disabled with:
gc.disable()

f2()

Running f2() makes the process's memory usage grow steadily.

After c1 and c2 are created, each object's reference count is 1. After c1.t = c2 and c2.t = c1, both reference counts become 2.
After del c1, the count drops to 1; since it is not 0, the c1 object is not destroyed. Likewise, c2's reference count is also 1.

Python's garbage collector is enabled by default, but the program above turned it off, so the collector never reclaims these objects, and memory leaks.

III. Garbage collection

import gc


class ClassA():
    def __init__(self):
        print('object born, id:%s' % str(id(self)))


def f2():
    while True:
        c1 = ClassA()
        c2 = ClassA()
        c1.t = c2
        c2.t = c1
        del c1
        del c2
        # Invoke garbage collection manually, so the cycles are reclaimed
        # even while automatic collection is disabled
        gc.collect()


# Garbage collection is on by default in Python; it can be disabled with:
gc.disable()

f2()

There are three situations that trigger garbage collection:

1. When the counter of the gc module reaches the threshold, garbage is automatically collected.
2. Call gc.collect() to manually collect garbage.
3. When the program exits, the Python interpreter collects garbage.

4. The gc module's automatic collection trigger mechanism

Python uses generational collection. Objects are divided into three generations. A newly created object is placed in the first generation; if it survives a first-generation garbage check, it is moved to the second generation; likewise, if it survives a second-generation garbage check, it is moved to the third generation.

The gc module keeps a counter of length 3, which can be read with gc.get_count().

For example, (488, 3, 0): the 488 is the number of memory allocations minus the number of deallocations since the last first-generation garbage check. Note that this counts memory allocations, not increases in reference counts. For example:

print(gc.get_count())  # (590, 8, 0)
a = ClassA()
print(gc.get_count())  # (591, 8, 0)
del a
print(gc.get_count())  # (590, 8, 0)

The 3 is the number of first-generation garbage checks since the last second-generation check; likewise, the 0 is the number of second-generation checks since the last third-generation check.

The gc module has automatic collection thresholds: a tuple of length 3 obtained with gc.get_threshold(), for example (700, 10, 10). Whenever a counter increases, the gc module checks whether it has reached the corresponding threshold; if so, a garbage check of that generation is performed and the counter is reset.

For example, assume the threshold is (700, 10, 10):

When the counters go from (699, 3, 0) to (700, 3, 0), the gc module runs gc.collect(0), i.e. checks first-generation objects for garbage, and resets the counters to (0, 4, 0).

When the counters go from (699, 9, 0) to (700, 9, 0), the gc module runs gc.collect(1), i.e. checks first- and second-generation objects for garbage, and resets the counters to (0, 0, 1).

When the counters go from (699, 9, 9) to (700, 9, 9), the gc module runs gc.collect(2), i.e. checks first-, second-, and third-generation objects for garbage, and resets the counters to (0, 0, 0).
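
A quick look at these numbers on a live interpreter:

import gc

print(gc.get_threshold())      # defaults to (700, 10, 10)
print(gc.get_count())          # e.g. (488, 3, 0); varies from run to run

gc.set_threshold(700, 10, 10)  # the thresholds can also be tuned
print(gc.collect())            # manual collection; returns the number of unreachable objects found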

21. Talk about your understanding of load balancing in Nginx.

Answer: Simply put, load balancing distributes tasks across different servers so the work is handled more efficiently. Nginx's load-balancing strategies include round-robin (polling), which is the default and hands tasks to the back-end servers in turn; weighting, where servers with better hardware are given higher weights so they handle more traffic; and ip_hash, which assigns requests by client IP so that requests from the same client always reach the same server. These are the common load-balancing strategies.

22. Measures for database optimization, what optimizations have you done in project development?

Answer: There are many database optimization measures. Common ones include optimizing indexes and SQL statements; designing tables strictly according to the database design normal forms; using a cache to hold frequently accessed data that rarely changes, saving disk IO; upgrading hardware, e.g. using SSDs; vertical table partitioning, i.e. moving rarely read columns into a separate table to save disk IO; master-slave replication with read/write splitting; choosing the right table engine; avoiding full-text indexes; and so on.

During project development we used foreign keys as little as possible, because foreign-key constraints hurt insert and delete performance; we used caching to reduce database access; and when a page needed the database several times, we fetched the required data in one query to cut the number of round trips. In our queries we avoided full-table scans and cursors, since cursors are very inefficient, and we avoided large transactions to improve concurrency.

23. What are the underlying implementations of the five data types of redis?

There are five data types in redis: string, list, hash, set, and ordered set.

The bottom layer of Redis has data structures such as simple dynamic strings, linked lists, dictionaries, skip lists, integer sets, and compressed lists (ziplists). But instead of building key-value pairs from them directly, Redis builds an object system on top of these structures, and those objects are our five data types. With five distinct object types, Redis can check, before executing a command, whether the object supports that command, and it can give an object different underlying data structures in different scenarios, optimizing efficiency in each.

In Redis, the key is always a string object, while the value can be a string, a list, a set, and so on. So when we speak of a "string key", we mean that the value of that key is a string object; when we say a key is a set key, we mean that its value is a set object.

First, the string object, whose encoding can be int, raw, or embstr. int encoding stores integer values, raw stores long strings, and embstr stores short strings; long and short are separated at 44 bytes (a boundary introduced in Redis 3.2). As for encoding conversions, two points are worth noting: floating-point numbers are saved as strings in Redis and converted back to floats when needed; and if the saved value is not an integer, or exceeds what a long can represent (int encoding holds integers representable as a C long), it is automatically converted to raw. Because of a quirk in how its memory is allocated, embstr is effectively read-only: modifying an embstr object converts it to raw first and then modifies it.
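
A sketch of inspecting these encodings with redis-py (assumes a local Redis server and the redis package; object('encoding', key) mirrors the OBJECT ENCODING command):

import redis

r = redis.Redis(host='127.0.0.1', port=6379)

r.set('n', 12345)
print(r.object('encoding', 'n'))   # b'int'

r.set('s', 'x' * 10)
print(r.object('encoding', 's'))   # b'embstr' (short string)

r.set('t', 'x' * 100)
print(r.object('encoding', 't'))   # b'raw' (longer than 44 bytes)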

The list object encoding can be a compressed list or a doubly linked list. When the number of elements stored in the list is less than 512 and the length of each element is less than 64 bytes, compressed list encoding is used; in all other cases, double-ended linked list encoding is used.

Double-ended linked list: first, what a linked list is: each node stores, along with its data, a pointer to the next node, so storage is flexible and inserting data anywhere is cheap. A double-ended (doubly linked) list additionally keeps references to both the previous and the next node, so fetching either neighbour has O(1) time complexity.

Compressed List: A sequential data structure consisting of a series of specially encoded contiguous memory blocks. A compressed list can contain any number of nodes, and each node can store a byte array or an integer value.

Simple understanding: when you go to the cinema to buy tickets to watch a movie, the compressed list requires consecutive seats, and the double-ended linked list only needs seats, regardless of whether they are consecutive or not.

Hash object, the bottom layer is implemented by compressed list and hashtable. The hashtable-encoded hash table object uses a dictionary data structure at the bottom, and each key-value pair in the hash object uses a dictionary key-value pair. Similarly, when the number of elements stored in the list is less than 512 and the length of each element is less than 64 bytes, compressed list encoding is used; in all other cases, hashtable encoding is used.

The encoding of a set object can be intset or hashtable. An intset-encoded set uses an integer set as its underlying implementation; all the elements of the set object are stored in the integer set. A hashtable-encoded set uses a dictionary as its underlying implementation: each key of the dictionary is a string object holding one element of the set, and the dictionary values are all set to null. intset encoding is used when every element in the set is an integer and there are at most 512 of them; otherwise hashtable encoding is used.

The encoding of a sorted set can be ziplist or skiplist. A ziplist-encoded sorted set uses a compressed list as its underlying implementation; each element is stored as two adjacent ziplist nodes, the first holding the member and the second holding its score. The elements in the ziplist are ordered by ascending score, with small scores near the head of the list and large ones near the tail. A skiplist-encoded sorted set uses the zset structure, which contains both a dictionary and a skip list. The dictionary's keys hold the members and its values hold the scores; in the skip list, each node's object attribute holds the member and its score attribute holds the score. The two data structures share each element's member and score through pointers, so no members or scores are duplicated and no memory is wasted. The conditions for using the compressed list are the same as above; otherwise the skip list is used.

In fact, a sorted set could be implemented with a dictionary alone or a skip list alone, but Redis combines the two. The reason: with only a dictionary, a member's score can be found in O(1), but since a dictionary stores elements unordered, every range operation would require sorting first; with only a skip list, range operations work, but looking up a member's score goes from O(1) to O(log N). So Redis uses both data structures together to implement sorted sets.

24. What do you know about the MySQL engine and what have you used?

Answer: There are two mainstream engines, InnoDB and MyISAM. InnoDB supports transactions, foreign-key constraints, and row locks (for example, a select ... for update statement triggers a row lock; note that it is the index entry that is locked, not the physical record). MyISAM supports neither transactions nor foreign keys; it was MySQL's default engine before version 5.5, after which InnoDB became the default. MyISAM stores the table's total row count, so counting the rows just reads that saved number, whereas InnoDB does not store it and must scan the whole table. When emptying a table, InnoDB deletes row by row, while MyISAM drops and rebuilds the table. InnoDB suits applications with frequent modification and high safety requirements; MyISAM suits query-heavy applications. In our project we used InnoDB.

25. Cache penetration, cache breakdown, cache avalanche?

answer:

Cache penetration means requests for data that exists neither in the cache nor in the database, yet users keep issuing them (such as requests with id = -1, or an impossibly large id that no record has), putting needless pressure on the database. In this case, consider whether you are under attack. The solution is to add validation at the interface layer: check the id and filter out illegal requests. If the other side insists on brute-forcing the same id, we can write that key into the cache with a null value and set a short expiry time.

Cache breakdown refers to data that is absent from the cache but present in the database (typically because its cache entry has expired) being requested by so many concurrent users that they all miss the cache and hit the database at the same time, spiking its load. One solution is to never expire hot data; another is to sacrifice a little user experience to protect the database by adding a mutex lock around the cache rebuild.

A cache avalanche is when a large portion of cached data expires at once, and the resulting flood of queries overwhelms the database. You may be thinking: isn't this just cache breakdown? No. Breakdown is many users querying the same piece of data, while an avalanche is many users querying different data. Solutions: randomize cache expiry times so that entries do not all expire together; with a distributed cache deployment, spread hot data across the cache nodes so the load is shared evenly; hot data can also be set to never expire.

26. In addition to celery, what else is involved in asynchronous tasks? Why choose Celery?

Asynchronous tasks can be implemented with the threading module (multi-threading for multi-tasking), with the asyncio package (which essentially uses coroutines), or with RQ, a Redis-based asynchronous task queue, among others. Weighing performance, functionality, practicality, decoupling, and scalability, we chose Celery.

Celery follows the producer-consumer model and has three crucial parts: the task issuer, the broker (middleman), and the worker (executor). The issuer publishes a task into the broker's message queue (our project used a Redis database), and a worker executes the task as soon as it sees it in the queue.

27. How is middleware used in Django?

1. First define a middleware factory function and have it return a callable middleware. The factory receives a callable get_response object; the middleware it returns is itself callable and, like a view, takes a request object and returns a response object. Here is an example:

def simple_middleware(get_response):
    # 此处编写的代码仅在Django第一次配置和初始化的时候执行一次。
    def middleware(request):
        # 此处编写的代码会在每个请求处理视图前被调用。
        response = get_response(request)
        # 此处编写的代码会在每个请求处理视图之后被调用。
        return response
    return middleware

2. After defining the middleware, you need to add the registration middleware in the settings.py file.

MIDDLEWARE = [
    ...
    'users.middleware.simple_middleware',  # 添加中间件
]

28. After the project goes online, are there any disaster recovery measures for the server?

Data can be backed up off-site; projects can be deployed to servers on different platforms, etc. to prevent data loss.

29. How to improve concurrency performance?

Answer: render dynamic pages as static ones; add caching; split tables vertically; separate database reads and writes with master-slave replication; shard into multiple databases and tables; read asynchronously; program asynchronously; and so on. Database optimization is itself a way of improving concurrency performance.

30. Use code to implement a simple TCP server?

import socket
# 创建套接字
tcp_server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# 本地的信息
address = ('', 8888)
# 绑定地址
tcp_server_socket.bind(address)
# 设置监听
# 使用socket创建的套接字默认是主动的,使用listen将其变为被动的,这样就可以接收别人的连接了
# 最大等待连接数设置为128,这是一个经验值,前人趟的坑,我们就不要再进去了
tcp_server_socket.listen(128)
# 如果有新的客户端来连接服务器,那么就产生一个新的套接字专门为这个客户端服务
# client_socket用来为这个客户端服务
# tcp_server_socket就可以省下来专门等待其他新客户端的连接
client_socket, clientAddr = tcp_server_socket.accept()
# 接收对方发送过来的数据
recv_data = client_socket.recv(1024)
print('接收到的数据为:', recv_data.decode('gbk'))
# 发送一些数据到客户端
client_socket.send('welcome to 小闫笔记'.encode('gbk'))
# 关闭为这个客户端服务的套接字,只要关闭了,就意味着不能再为这个客户端服务了,如果还需服务,只能再次重新连接。
client_socket.close()

31. How to check the port number of a program in Linux and what command is needed?

netstat -tnulp
------------------------
[ethanyan@localhost ~]$ netstat -tnulp
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN
tcp        0      0 192.168.122.1:53        0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:631           0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN
tcp6       0      0 :::111                  :::*                    LISTEN
tcp6       0      0 :::22                   :::*                    LISTEN
tcp6       0      0 ::1:631                 :::*                    LISTEN
tcp6       0      0 ::1:25                  :::*                    LISTEN
tcp6       0      0 :::3306                 :::*                    LISTEN
udp        0      0 0.0.0.0:53814           0.0.0.0:*       
udp        0      0 0.0.0.0:5353            0.0.0.0:*   
udp        0      0 192.168.122.1:53        0.0.0.0:*
udp        0      0 0.0.0.0:67              0.0.0.0:*
udp        0      0 127.0.0.1:323           0.0.0.0:*
udp6       0      0 ::1:323                 :::*

-t: Display TCP ports.
-u: Display UDP ports.
-l: Show only listening sockets.
-p: Display the process identifier and program name.
-n: Do not resolve names via DNS; display numeric IPs.

32. How to implement singleton mode in Python? Please write down at least two implementation methods?

Singleton Pattern is a commonly used software design pattern. The main purpose of this pattern is to ensure that only one instance of a certain class exists. Singleton objects come in handy when you want only one instance of a certain class to appear in the entire system.

For example, a server program's configuration is stored in a file, and clients read it through an AppConfig class. If the configuration needs to be used in many places while the program runs, an AppConfig instance would be created in each of those places, leaving multiple AppConfig objects in the system and seriously wasting memory, especially when the configuration file is large.

In fact, for a class like AppConfig, we hope that only one instance object will exist during the running of the program.

In Python, we can use a variety of methods to implement the singleton pattern:
1. Use modules; 2. Use __new__; 3. Use decorators; 4. Use metaclasses.

(1) Using modules:

In fact, a Python module is a natural singleton: when a module is imported for the first time, its code runs once and the module object is cached (a .pyc bytecode file is also generated); subsequent imports reuse the cached module instead of executing the code again. Therefore we only need to define the relevant functions and data in a module to obtain a singleton object.

# mysingle.py
class MySingle:
    def foo(self):
        pass

singleton = MySingle()

Save the code above in the file mysingle.py, then use it like this:

from mysingle import singleton
singleton.foo()

(2) Use __new__:

To make sure only one instance of a class ever appears, we can use __new__ to control the instance creation process.

class Singleton(object):
    def __new__(cls):
        # 关键在于这,每一次实例化的时候,我们都只会返回这同一个 instance 对象
        if not hasattr(cls, 'instance'):
            cls.instance = super(Singleton, cls).__new__(cls)
        return cls.instance

obj1 = Singleton()
obj2 = Singleton()

obj1.attr1 = 'value1'
print(obj1.attr1, obj2.attr1)
print(obj1 is obj2)

输出结果:
value1 value1
True

(3) Use decorators:

Decorators can dynamically modify the functionality of a class or function. Here we can likewise use a decorator on a class so that it can only ever produce one instance:

def singleton(cls):
    instances = {}
    def getinstance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]
    return getinstance

@singleton
class MyClass:
    a = 1

c1 = MyClass()
c2 = MyClass()
print(c1 == c2)  # True

Above we defined a decorator singleton that returns an inner function getinstance. That function checks whether the class is already in the instances dict; if not, it stores cls(*args, **kwargs) there under the key cls; otherwise it simply returns instances[cls].

(4) Use metaclass (metaclass):

The metaclass can control the creation process of the class. It mainly does three things:

  • Intercept the creation of the class
  • Modify the class definition
  • Return the modified class
class Singleton2(type):
    def __init__(cls, *args, **kwargs):
        cls._instance = None
        super(Singleton2, cls).__init__(*args, **kwargs)

    def __call__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super(Singleton2, cls).__call__(*args, **kwargs)
        return cls._instance

class Foo(metaclass=Singleton2):
    # 代码执行到这里时,元类中的 __init__ 方法其实已经被执行了,而不是在 Foo 实例化的时候执行,且仅会执行一次。
    # (Python 2 中写作类属性 __metaclass__ = Singleton2)
    pass

foo1 = Foo()
foo2 = Foo()
print(Foo.__dict__)  # 存在一个 _instance 属性来保存实例,不会直接污染 Foo 的实例属性
print(foo1 is foo2)  # True

Advantages and Disadvantages of Singleton Pattern

advantage:

  • Instance Control: The singleton pattern prevents other objects from instantiating their own copies of the singleton object, ensuring that all objects have access to a unique instance.

  • Flexibility: Because the class controls the instantiation process, the class can flexibly change the instantiation process.

shortcoming:

  • Overhead: Although the amount is small, there will still be some overhead if you check to see if an instance of the class exists every time the object requests a reference. This problem can be solved by using static initialization.

  • Possible development confusion: when using singleton objects (especially ones defined in a class library), developers must remember not to instantiate the class in the usual way. Since the library source may be inaccessible, application developers can be surprised to find they cannot instantiate the class directly.

  • Object lifetime: the pattern does not address deleting the single instance. In languages with managed memory (such as those based on the .NET Framework), only the singleton class itself can release the instance, so references held elsewhere can end up dangling.

33. Implement a simple decorator.

def setFunc(func):
    def wrapper(*args, **kwargs):
        print('wrapper context')
        return func(*args, **kwargs)
    return wrapper

34. Restful interface design style?

Answer: Deploy the API under a dedicated domain where possible (such as https://api.ethanyan.com); if the API is very simple and will not grow further, it can live under the main domain (https://www.ethanyan.com/api/). Put the API version number in the URL (it can also go in an HTTP request header). Represent resource paths with nouns in the plural, generally matching database table names. Use the request method to express the action: GET to fetch resources, POST to create, PUT to update, DELETE to delete. Use accurate status codes, such as 201 when data is successfully created, 204 when it is successfully deleted, 403 when the request is in error or restricted, plus the other common codes. Handle errors consistently: for a 4xx status, return the error message using error as the key name. Standardize return payloads as well: a GET returns a single resource object or a list of them, POST returns the newly created object, PUT returns the complete updated object, DELETE returns an empty document. Use hypermedia: include links to related API methods in the results, so users know what to do next without checking the documentation. The returned data format should be JSON wherever possible.

35. List some commonly used default ports?

Answer: The default port of MySQL is 3306, the default port of HTTP is 80, the default port of HTTPS is 443, the default port of Redis is 6379, and the default port of MongoDB is 27017.

36. What is SQL injection, how to prevent it, and how to prevent it in ORM?

Answer: SQL injection abuses otherwise-normal SQL statements to obtain data illegitimately. Typical defences: validate user input, using regular expressions or length limits; escape special characters such as single quotes and --; never build SQL by string concatenation, query the database with parameterized SQL instead (the example below solves injection with parameterization); never use a database connection with administrator privileges, give each application its own connection with limited permissions; do not store secrets directly, protect sensitive data with salted hashing and similar measures; keep application error messages as minimal as possible, ideally wrapping the raw error in a custom message. (Never trust any input from the user. The other side may well be a hacker out to attack you!) The bottom layer of an ORM actually executes SQL in parameterized form, and the ORM interface is an internal encapsulation, so it is theoretically very safe, but nothing is absolute.

Example to illustrate SQL injection
When we write query statements, they may contain placeholders that get replaced with user-supplied data such as a username and password:

select * from user where username = '%s' and password = '%s';

If the user enters certain special characters, serious consequences follow (the database can be dumped and all its data obtained directly). For example, if the username entered is root' or 1 --, splicing it into the SQL above produces the following:

select * from user where username = 'root' or 1 -- ' and password = '%s';

-- in SQL marks the start of a comment

The where condition above becomes username='root' or 1; the password check after -- is commented out and never executed, so only the username is checked. And it does not even matter whether the username is right: with the logical operator or, the 1 means true, so the condition is always true and the query returns results directly.

Scary, isn't it? The above is of course just a simple illustration. You will surely ask: can SQL injection be prevented at all? Yes: by parameterization. Next question, what is parameterization? In Python database programming, we put all the data parameters of a SQL statement into a tuple (or list, or dictionary) and pass it as the second argument of the execute function. The effect is demonstrated below:

from pymysql import *

# 创建数据库连接
db_connect = Connect(host='localhost', port=3306, database='test_db',
                     user='root', password='123123', charset='utf8')

# 获取游标
cur = db_connect.cursor()

# 请输入一个查询的ID
query_id = input('please input ID:')

# 使用参数化来解决SQL注入
# 以字符串形式书写SQL语句,因为SQL语句中也会出现字符串,避免单引号或者双引号的错误,我们直接使用三引号进行书写
sql_str = ''' select * from students where id = %s '''

# 在准备SQL字符串时,不能再直接拼接参数
# 而是将参数做成一个元组,列表,字典,传入到 execute 方法中
# 下面执行SQL语句,并传入元组形式的参数
cur.execute(sql_str, (query_id,))

# 获取所有的数据
result = cur.fetchall()

# 遍历出所有的结果
for t in result:
    print(t)

# 关闭游标
cur.close()

# 关闭数据库
db_connect.close()

37. Deploy related knowledge

The most important knowledge about deployment is divided into two parts, one is Nginx and the other is Docker. These two parts are explained below.

1、Nginx

Nginx is a lightweight server built on an asynchronous framework; it supports high concurrency and processes requests efficiently. We usually use it as a web server, cache server, and reverse proxy server, and it can even serve as a mail server. Besides high concurrency, what are its advantages? Memory consumption is low, and configuration is simple and stable: once deployed on a remote server it largely takes care of itself. For a company that wants to solve more problems with limited resources it is the first choice, because it is cheap and supports many systems. Most importantly, it excels at serving static files, so we often put static files on Nginx to reduce the pressure on the back-end servers.

1.1 Related commands

View Nginx status (active or dead):

systemctl status nginx

Start|Stop|Restart|Reload Nginx server:

systemctl start|stop|restart|reload nginx

In fact, the above command is too troublesome to turn on and off, so you can use the following command instead:

nginx                 # 启动
nginx -s stop|reload  # 停止|重载

Check whether the configuration file meets the syntax requirements (super important. As long as the configuration file changes at work, this is the command to be executed immediately. Restart after checking. Otherwise, an error will occur and it will be difficult to troubleshoot the problem):

nginx -t

View information about the ports occupied by open services under the current system:

netstat -tnulp

Configuration file directory:

/etc/nginx/conf.d

The structure of the file should be understood:

全局配置段
http配置段
    server配置段  # 项目或者应用的网站
        location配置段  # 网站里面的文件url
1.2 Nginx access principle

The browser splits the URL of a request into an address, a port, and a path keyword. For each request, the address locates the server, the port locates the application on that server, and the path keyword is used for location matching.

url:

协议:// 网站地址:端口 (/)路径地址 ? 参数

server configuration section:

server {
    listen 端口;
    server_name 主机名;
    ...
}

location configuration:

location optional_modifier location_match {...}

optional_modifier is the matching condition; location_match is the matching pattern; {...} holds the operations to perform.

One more point about common directives inside location: with alias, the part of the URI matched by the location prefix is replaced by the directory alias specifies, so the request goes straight to that directory (an absolute mapping). With root, the full URI is appended under the directory root specifies, i.e. the file is looked up at root + URI (a relative mapping).

1.3 Forward proxy & reverse proxy

In terms of security, the forward proxy can protect the identity of the client, while the reverse proxy protects the identity of the server.

The VPNs used to get past the Great Firewall are forward proxies, and so are the proxy IP pools we use when crawling; after all, the personal safety of crawler engineers must be protected. So what is a reverse proxy? The Nginx we have just been configuring is one. Forgotten already?

1.4 Nginx proxy module

The key parameter is proxy_pass, which sets the address after the request is redirected, that is, the address of the proxy server and the mapped URI.

server {
    listen 99;
    location / {
        proxy_pass http://192.168.33.24:888/ethanyan/;
    }
}

Pay close attention to the trailing / at the end of the proxy_pass address: with it, the part of the URI matched by the location is replaced before forwarding; without it, the full original URI is appended. The results differ significantly.

1.5 Load balancing (super important)

Above we used proxy_pass to implement an Nginx reverse proxy that forwards requests to the backend. When a website's traffic grows, one server becomes overwhelmed, and the answer is load balancing. Simply put: prepare a few more servers, as simple and crude as that. Load balancing uses Nginx's upstream module, which defines a list of back-end service addresses, each specified with a server directive. A series of properties can also set each host's status within the balancing group, as listed below.

down            当前主机故障时,直接进行隔离
backup          后备主机,当线上主机故障或者访问量剧增服务器繁忙时,才开启
max_fails       允许请求的最大失败数,默认是1,配合下一个参数使用
fail_timeout    经历max_fails次失败后,暂停服务的时间,默认为10s

Finally, let me talk about a common sense, but super important command:

curl [option] [url]
如:curl http://www.ethanyan.com
# 执行之后请求的页面内容就显示在屏幕上了
# 作为一个运维人员,或者一个后端服务器工程师,这个命令如果不晓得就说不过去了
1.6 Nginx scheduling algorithm

Built-in strategies:

Round robin (polling): the default algorithm, forwarding requests to the servers one after another in order (rain and dew shared evenly). Weighted round robin: assign each server a weight; the larger the value, the higher the probability of being chosen, suitable when back-end servers have unequal performance. ip_hash: distribute requests by the hash of the client IP, so the same IP always lands on the same server, which effectively solves session sharing for dynamic pages.

Third-party algorithms:

fair: allocates requests according to the servers' response times, handing each request to the server currently responding with the least delay. url_hash: allocates requests by the hash of the URL, so the same URL is always directed to the same back-end server, which improves the efficiency of back-end cache servers.

1.7 Log

Log address: /var/log/nginx

You can also customize the log and configure it in the following file:

/etc/nginx/nginx.conf

2、Docker

When you mention Docker, what words pop into your mind? Virtualization, containers, and so on. If so, you have grasped its essence. It has no fixed definition; someone once described it as "a technical means to solve production problems quickly", which is literary but obscure.

Docker is in effect a container that completely encapsulates the development environment, and it ends the blame game between development engineers and operations: what development built is exactly what testing sees. As for deployment, why hand-configure every machine for load balancing? Don't exhaust yourself: ship the Docker image over, download it, start it, and it is running immediately.

2.1 Related operations

Start docker:

systemctl start docker

Check the status of Docker:

systemctl status docker

Docker service command:

systemctl start|stop|restart docker

Delete command:

Really looking for it? When it comes to delete commands, I advise you to look them up at the moment you need them, and use the lookup time to think twice.

Search for images:

docker search [image_name]

It looks in the local repository first; if the image is not found there, it searches the remote registry.

Get the image:

docker pull [image_name]

View the image:

docker images
# 查看一个镜像
docker images [image_name]

View image history information:

docker history nginx

Shows the image's build history, including the commands used and the size of each generated layer.

Rename and label:

docker tag [old_image]:[old_version] [new_image]:[new_version]

Delete image:

docker rmi [IMAGE ID]或:docker rmi [image]:[image_version]

Docker's home directory:

/var/lib/docker

Export the Docker image and back it up:

docker save -o [包文件] [镜像]
如:docker save -o nginx.tar ethanyan-nginx

Import image:

docker load < [image.tar_name]
docker load --input [image.tar_name]

View running containers:

docker ps

View all containers, including stopped ones:

docker ps -a

Start a container:

docker run <参数,可选> [docker_image] [执行的命令]

# 让Docker容器在后台以守护形式运行
docker run -d [docker_image]

# 启动已经终止的容器
docker start [container_id]

Create and enter the container:

docker run --name [container_name] -it [docker_image] /bin/bash

--name    给容器定义一个名字
-i        让容器的标准输入保持打开
-t        让docker分配一个伪终端,并绑定在容器的标准输入上

Enter the container:

docker exec -it <容器id> /bin/bash

Close the container:

docker stop [container_id]

Create an image based on a container:

docker commit -m <注释> -a <作者> [container_id] [new_image]:[new_version]

View all container details:

docker inspect [container_name]

Check the container running log:

docker logs [container_id]
2.2 Private registry deployment

1. Download the registry image:

docker pull registry

2. Start the container:

docker run -d -p 5000:5000 --restart=always registry

--restart sets when the container should be restarted automatically (always = restart whenever it stops)

3. Check the container effect:

curl 127.0.0.1:5000/v2/_catalog

4. Configure container permissions:

vim /etc/docker/daemon.json

{
    "registry-mirrors": ["http://74f21445.m.daocloud.io"],
    "insecure-registries": ["192.168.8.14:5000"]
}

The private warehouse IP address is the IP of the host machine, and there must be double quotes on both sides of the IP.

5. Restart the docker service:

systemctl restart docker
systemctl status docker

Effect view:

docker start <container_id>
docker tag ubuntu-mini 192.168.33.24:5000/ubuntu-16.04-mini
docker push 192.168.33.24:5000/ubuntu-16.04-mini
docker pull 192.168.33.24:5000/ubuntu-16.04-mini

If the version number is not specified when labeling, the default is latest.

38. The underlying principle of Celery

Celery provides a task decorator that adds a delay method to the decorated function (delay saves the task's name and arguments into a Redis list).

Celery's Redis message queue uses the lpush and brpop operations of the Redis list type. The task issuer enqueues tasks with lpush, and workers dequeue and execute them in order with the blocking brpop. lpush inserts elements at the left (head) of the list and brpop takes them from the right (tail), so tasks come out in the order they were added. The brpop command:

brpop <list>.... <timeout>

The Redis Brpop command removes and gets the last element of the list. If the list has no elements, it will block the list until the wait times out or a popable element is found.

If no element can be popped within the given time, it returns nil along with the elapsed waiting time; otherwise it returns a two-element list: the first element is the key the value was popped from, the second is the popped value.

39. Database transactions

A transaction (Transaction) is a series of SQL statements executed as a single basic unit of work: either all of them execute, or none of them do. Transactions have four properties (ACID): atomicity, consistency, isolation, and durability.

A simple example (three steps packaged into one transaction, if any one fails, all must be rolled back):

1. 检查支票账户的余额高于或者等于200美元。
2. 从支票账户余额中减去200美元。
3. 在储蓄帐户余额中增加200美元。

1.Atomicity

A transaction must be regarded as an indivisible, minimal unit of work. All operations in the whole transaction either commit successfully or fail and roll back together; it is impossible for a transaction to execute only some of its operations. This is atomicity.

2. Consistency

The database always transitions from one consistent state to another. (In the previous example, consistency ensures that even if the system crashes while those statements execute, the checking account does not lose $200: since the transaction was never committed, none of its changes are saved to the database.)

3.Isolation

Generally speaking, modifications made by one transaction are not visible to other transactions until they are finally committed. (In the previous example, when a transaction was not completed and another account summary program started running, it saw that the checking account balance had not been subtracted by $200.)

4. Durability

Once a transaction is committed, its changes are permanently saved to the database. (Even if the system crashes at this time, the modified data will not be lost.)

39.1. Transaction operations

Open a transaction (after opening, modification commands keep their changes in a local cache rather than in the physical table):

begin;或:start transaction;

Commit the transaction (maintain data changes in the cache to the physical table):

commit;

Roll back the transaction (abandoning the changed data in the cache means that the transaction execution failed and should return to the state before starting the transaction):

rollback;

40. MySQL database index

Everyone probably already knows what a database index is, and everyone probably asks why indexes are created at all. Fine, let me say it briefly:

A database index can be understood as a sorted data structure in the database. It exists to assist in quickly querying and updating data in database tables. Optimize query efficiency. (It’s just like nonsense. Who doesn’t know that the index is just like the syllable index and radical search table in front of Xinhua Dictionary...)

So what is the principle of indexing? When is the index created? What are the indexes? Have you thought about this? It's okay if you don't know. I don't know either (whether I will be beaten to death...).

MySQL indexes use data structures such as B+ trees and hash buckets, but B+ trees are the mainstream. So why are B+ trees a good fit for database indexes?

1.B+ tree reduces the number of IO reads and writes.

The internal nodes of a B+ tree hold no pointers to the records' actual data, so they are smaller than a B-tree's internal nodes. If all the keys of one internal node are stored in the same disk block, the block can hold more keys, more keys are read into memory per IO, and the number of IO reads and writes falls correspondingly.

2. B+ tree query efficiency is stable.

Looking up any key walks a path of the same length from root to leaf, so every lookup costs the same.

3. A B+ tree can traverse the entire tree just by walking its leaf nodes (the bottom-level nodes with no children), which is exactly what the database's range queries need; B-trees do not support this kind of operation.

When should we build indexes, and when should we build less or no indexes?

1. If the table has too few records, do not create an index, because creating an index table will increase the number of query steps and slow down the processing;

2. Tables that are frequently inserted, deleted, or modified should be indexed as little as possible, because the maintenance of index tables will also reduce performance;

3. For those fields where the data is repeated and evenly distributed, for example, a field has only True and False data, but there are too many records (assuming 1 million rows), such indexing will not improve the query speed;

4. Do not create too many fields in an index, as it will increase the time for data modification, insertion, and deletion.

5. For tables with millions or tens of millions of rows, believe me, an index makes a qualitative difference.

6. Do not index fields that never appear in where conditions; that only inflates the index table.

40.1 Statement to create index

1.1 ALTER TABLE
1. Create a normal index

alter table <table_name> add index <index_name> (`字段名`);

2. Create multiple indexes

alter table <table_name> add index <index_name> (`column`,`column1`,`column_N`.......);

3. Create a primary key index

alter table <table_name> add primary key (`字段名`);

4. Create a unique index

alter table <table_name> add unique (`字段名`);

5. Create a full-text index

alter table <table_name> add fulltext (`字段名`);

1.2 CREATE INDEX
1. Add ordinary index

create index <index_name> on table_name (`字段名`)

2. Add UNIQUE index

create unique index <index_name> on <table_name> (`字段名`)

The index name in CREATE INDEX must be specified, and only ordinary and UNIQUE indexes can be added, and PRIMARY KEY indexes cannot be added.

40.2. Delete index

drop index <index_name> on <table_name>;
alter table <table_name> drop index <index_name>;
alter table <table_name> drop primary key;

41. Algorithm

1. Five characteristics of algorithms

Inputs: The algorithm has 0 or more inputs.

Output: The algorithm has at least one output.

Finiteness: The algorithm will automatically end after a limited number of steps without looping infinitely, and each step can be completed within an acceptable time.

Deterministic: Each step in the algorithm has a definite meaning, and there will be no ambiguity.

Feasibility: Each step of the algorithm is feasible, which means that each step can be executed a limited number of times.

2. The idea of ​​bubble sorting

The process of bubble sorting is like when we drink soda, those small bubbles pop up from bottom to top little by little, and finally reach the top. This is just a visual analogy, let me illustrate with a practical example. If there is a list in which the numbers are arranged out of order, the result to be achieved through bubbling is to sort the numbers in the list from small to large.

So how to achieve it? We can compare the first and second numbers on the left in the list first, and arrange the smaller ones on the left; then compare the second and third numbers, arrange the smaller ones on the left, and then compare The third and fourth... After the first round of comparing the numbers in the list, the largest number was placed at the end of the list. Then repeat the above steps, but the largest number at the end will no longer participate in the comparison. After one round of comparison, the effect of sorting the numbers in the list from small to large will be achieved. Does this mean that the smallest numbers come out bit by bit from the back to the front?

The optimal time complexity of bubbling is O(n), and the worst time complexity is O(n^2). The space complexity is O(1).

Then use the following code to implement a bubble sort:

def bubble_sort(alist):
    for j in range(len(alist)-1, 0, -1):
        for i in range(j):
            if alist[i] > alist[i+1]:
                alist[i], alist[i+1] = alist[i+1], alist[i]

alist = [23,13,1,3,5,2,1,7]
bubble_sort(alist)
print(alist)
------结果--------
[1, 1, 2, 3, 5, 7, 13, 23]

Among them, range(len(alist)-1,0,-1) may not be well understood by everyone. Let me give an example below:

for i in range(3, 0, -1):
    print(i, end=',')

The output result is:

3,2,1,

From the example we can see that range(3,0,-1) produces the numbers of (0,3] in reverse order: since range is closed on the start and open on the stop, 0 itself is never produced.

3. The idea of ​​quick sorting

The method of quick sort is similar to bubble sort and also belongs to exchange sort. That is, by constantly comparing the sizes of elements and constantly exchanging the positions of elements, the sorting effect is achieved. So why is it called quick sort? Because it's fast! (...looks like I need a beating)

Let's take a brief look at quick sort. It first picks a pivot (generally the first element of the list), then uses the pivot to partition the list into two parts: the elements larger than the pivot go to its right, and the elements smaller go to its left. After the first round of partitioning, the left and right parts each go through the same steps, partitioning again and again, until the elements of the list are finally in order and the sort ends.

Did you find a problem after seeing the above steps? Isn't this our recursive thought? Yes, we will use recursive method to implement quick sort later.

The optimal time complexity of quick sort is O(nlogn), the worst time complexity is O(n^2), and the space complexity is O(logn). It is unstable.

The following code implements a quick sort:

# ==============快速排序==================
def quick_sort(alist, start, end):
    # 递归的退出条件
    if start >= end:
        return

    # 设定起始元素为要寻找位置的基准元素
    mid = alist[start]

    # low为序列左边的,由左向右移动的游标
    low = start
    # high为序列右边的,由右向左移动的游标
    high = end

    while low < high:
        # 如果low与high未重合,high指向的元素比基准元素大,则high向左移动
        while low < high and alist[high] >= mid:
            high -= 1
        # 当high指向的元素比基准元素小了
        # 将high指向的元素放到low的位置上
        alist[low] = alist[high]

        # 如果low与high未重合,low指向的元素比基准元素小,则low向右移动
        while low < high and alist[low] < mid:
            low += 1
        # 当low指向的元素比基准元素大了
        # 将low指向的元素放到high的位置上
        alist[high] = alist[low]

    # 退出循环后,low与high重合,此时所指位置为基准元素的正确位置
    # 将基准元素放到该位置
    alist[low] = mid

    # 对基准元素左边的子序列进行快速排序
    quick_sort(alist, start, low-1)
    # 对基准元素右边的子序列进行快速排序
    quick_sort(alist, low+1, end)

alist = [54,26,93,17,77,31,44,55,20]
quick_sort(alist, 0, len(alist)-1)
print(alist)

There is no concept of pointers in python, and the cursor in the above code is similar to the effect of pointers.

The method used this time is similar to the "digging method" in C language. Of course, there are many other methods, you can refer to relevant information to learn.

4. Insertion sort

Insertion sort is a simple and intuitive sorting method. I wanted to say nonsense (insertion sort is sorting through insertion), but after thinking about it, I held back. What is the idea of ​​insertion sort? Now listen to me slowly.

There is an unordered list, let us sort the elements in it from small to large. Using insertion sort, the area where the first element from left to right is located is called the ordered area, and the area where the other elements are located is called the unordered area. Then take the elements in the unordered area from left to right, take out an element and put it in the appropriate position in the ordered area (for example, take a 3 in the unordered area, and there are two numbers 1 and 4 in the ordered area) , then we place 3 between 1 and 4). Continuously take from the unordered area and insert into the appropriate position in the ordered area, until finally there are no values ​​in the unordered area, and the list becomes an ordered list.

The optimal time complexity is O(n), and the worst time complexity is O(n^2), which is stable.

Then use code to implement an insertion sort:

def insert_sort(alist):
    # 从第二个位置,即下标为1的元素开始向前插入
    for i in range(1, len(alist)):
        # 从第i个元素开始向前比较,如果小于前一个元素,交换位置
        for j in range(i, 0, -1):
            if alist[j] < alist[j-1]:
                alist[j], alist[j-1] = alist[j-1], alist[j]
            else:
                break  # 前面已有序,提前结束;这也是最优情况为O(n)的原因

5. Selection sort

With the foundation of the above algorithm, it is not that difficult to understand selection sorting.

Again take an unordered list to be sorted from smallest to largest. With selection sort, we first pick the maximum value from the list and swap it with the value at the end of the list: the last element is now the maximum, has found its final home, and no longer takes part in sorting (its region is called the ordered area; the remaining elements form the unordered area). We then pick the maximum of the remaining elements and place it at the proper position in the ordered area, and repeat until all elements are in the ordered area, yielding the ordered list we wanted.

The optimal time complexity is O(n^2), and the worst time complexity is O(n^2); it is unstable.

Code:

def selection_sort(alist):
    n = len(alist)
    # 需要进行n-1次选择操作
    for i in range(n-1):
        # 记录最小位置
        min_index = i
        # 从i+1位置到末尾选择出最小数据
        for j in range(i+1, n):
            if alist[j] < alist[min_index]:
                min_index = j
        # 如果选择出的数据不在正确位置,进行交换
        if min_index != i:
            alist[i], alist[min_index] = alist[min_index], alist[i]

6. Hill sorting

Shell sort is a type of insertion sort, also known as diminishing increment sort, and is a more efficient improvement of straight insertion sort. It is a non-stable sorting algorithm. The method is named after D.L. Shell, who proposed it in 1959. Shell sort groups records by a certain increment (gap) between subscripts and straight-insertion-sorts each group; as the increment gradually shrinks, each group contains more and more elements, and when the increment reaches 1 the whole list is one group and the algorithm terminates.

The optimal time complexity varies depending on the step sequence, and the worst time complexity is O(n^2), which is unstable.

After reading this, you must have ten thousand grass-mud horses galloping in your heart, and you must want to say, "Speak humanly!" 』Okay, let’s use an example to illustrate the idea of ​​​​Hill sorting...

The basic idea of Shell sort: lay the array out as a table and insertion-sort the columns separately. Repeat this process, each time with longer columns (a smaller step size, hence fewer columns), until the table has only one column. Converting the array into a table only helps us understand the algorithm; the algorithm itself still sorts the array in place.

For example, suppose we have the numbers [13,14,94,33,82,25,59,94,65,23,45,27,73,25,39,10]. If we start sorting with a step size of 5, we can describe the algorithm better by laying the numbers out as a table of 5 columns, like this:

13 14 94 33 82
25 59 94 65 23
45 27 73 25 39
10

Then we sort each column:

10 14 73 25 23
13 27 94 33 39
25 59 94 65 82
45

When the above four rows of numbers are connected in order, we get: [10,14,73,25,23,13,27,94,33,39,25,59,94,65,82,45]. At this time, 10 has been moved to the correct position, and then sorted in steps of 3:

10 14 73
25 23 13
27 94 33
39 25 59
94 65 82
45

Sort the columns again so that they look like this:

10 14 13
25 23 33
27 25 59
39 65 73
45 94 82
94

Then connect the above 6 rows of numbers together: [10,14,13,25,23,33,27,25,59,39,65,73,45,94,82,94]. Then we sort with a step size of 1, which is a simple insertion sort. The final result is:

[10, 13, 14, 23, 25, 25, 27, 33, 39, 45, 59, 65, 73, 82, 94, 94]

We use code to implement a Hill sort:

def shell_sort(alist):
    n = len(alist)
    # 初始步长
    gap = n // 2
    while gap > 0:
        # 按步长进行插入排序
        for i in range(gap, n):
            j = i
            # 插入排序
            while j >= gap and alist[j-gap] > alist[j]:
                alist[j-gap], alist[j] = alist[j], alist[j-gap]
                j -= gap
        # 得到新的步长
        gap = gap // 2

alist = [13,14,94,33,82,25,59,94,65,23,45,27,73,25,39,10]
shell_sort(alist)
print(alist)

7. Bucket sorting

When the range of values ​​in the list is too large, or is not an integer, you can use bucket sorting to solve the problem.

The basic idea of ​​bucket sorting is to divide a large interval into n sub-intervals of the same size, called buckets. Distribute n records into various buckets. If more than one record is assigned to the same bucket, sorting within the bucket is required. Finally, the records in each bucket are listed in order to obtain an ordered sequence.

In fact it prepares many "buckets" in advance, then drops the numbers of the unordered list into the buckets according to their value ranges. If a bucket receives more than one number, the bucket itself is sorted; then all the numbers are taken out bucket by bucket, in order within each bucket. This is a typical space-for-time trade: fewer comparisons, high efficiency and short running time, but a lot of memory used. A sketch follows.

8. Merge sort

Merge sort adopts the idea of ​​divide and conquer, first recursively decomposes the unordered list, and then merges the list. Let's first take a look at how to merge two ordered lists:

1. We have two lists:

list1 = [3,5,7,8]
list2 = [1,4,9,10]

2. In order to merge, we need to create another empty list.

list = []

3. Then compare the two ordered lists list1 and list2 from left to right, which one of the two lists has the smaller value, and then copy it into the new list:

# 一开始新列表是空的
['3',5,7,8] ['1',4,9,10] []
# 然后两个指针分别指向第一个元素,进行比较,显然,1比3小,所以把1复制进新列表中:
['3',5,7,8] [1,'4',9,10] [1]
# list2的指针后移,再进行比较,这次是3比较小:
[3,'5',7,8] [1,'4',9,10] [1,3]
# 同理,我们一直比较到两个列表中有某一个先到末尾为止,在我们的例子中,list1先用完。
[3,5,7,8] [1,4,'9',10] [1,3,4,5,7,8]
# 最后把list2中剩余的元素复制进新列表即可。
[1,3,4,5,7,8,9,10]

In order to highlight the comparison process, we put quotation marks around the number during comparison. In fact, it is of type int. Since the premise is that both lists are ordered, the whole process is very fast.

The above is the main idea and implementation of merge sort. Next we fully describe merge sort:

There is a list; we split it in half and repeat the process on each half, the problem shrinking until we are down to single numbers (a single number counts as sorted). Then, much like the merging above, we merge pairs, then merge the merged pairs, until everything is merged back into one ordered list. As the saying goes, "long divided, the world must unite; long united, it must divide". Got the idea now?

The optimal time complexity is O(nlogn), and the worst time complexity is O(nlogn), which is stable.

Code:

def merge_sort(alist):
    if len(alist) <= 1:
        return alist
    # 二分分解
    num = len(alist)//2
    left = merge_sort(alist[:num])
    right = merge_sort(alist[num:])
    # 合并
    return merge(left, right)

def merge(left, right):
    '''合并操作,将两个有序数组left[]和right[]合并成一个大的有序数组'''
    # left与right的下标指针
    l, r = 0, 0
    result = []
    while l < len(left) and r < len(right):
        if left[l] < right[r]:
            result.append(left[l])
            l += 1
        else:
            result.append(right[r])
            r += 1
    result += left[l:]
    result += right[r:]
    return result

alist = [54,26,93,17,77,31,44,55,20]
sorted_alist = merge_sort(alist)
print(sorted_alist)
---------------------------
[17, 20, 26, 31, 44, 54, 55, 77, 93]

9. Counting and sorting

How do you sort a list of integers whose values fall within a fairly small range? Counting sort, whose performance can even beat the O(nlogn) sorts. Amazing, right?

It determines each element's correct position through list subscripts. Suppose we have a list of 20 random integers, each between 0 and 10. With counting sort we create a new list of length 11 whose elements, at subscripts 0 through 10, all start at 0. We then traverse the unordered list and, for each integer, add 1 to the element of the new list at the subscript equal to that integer (for example, for the integer 2, the element at subscript 2 goes from 0 to 1; for another 2, from 1 to 2; and so on until every integer in the unordered list has been visited).

Each subscript's value in the new list now records how many times that integer appears in the unordered list. Finally we traverse the new list and output each subscript as many times as its count value; when the traversal finishes, the output is the sorted sequence.

Not suitable for counting sort:

1. When the gap between the maximum value and the minimum value in the list is too large;

2. When the elements in the list are not integers;

If the size of the original list is N and the difference between the maximum value and the minimum value is M, what are the time complexity and space complexity of counting sort?

Answer: The time complexity is O(N+M), and the space complexity is O(M) if only the size of the statistical list is considered.

42. Data structures and algorithms

1. Linked list

A linked list (Linked list) is a common basic data structure. It is a linear list, but unlike a sequential list it does not store data contiguously: each node (data storage unit) stores the position information (i.e. the address) of the next node.

Sequence table: Store elements sequentially in a continuous storage area, and the sequential relationship between elements is naturally represented by their storage order.

For example: four friends (B is A’s friend, C is B’s friend, and D is C’s friend. It’s quite embarrassing...) went to the cinema to buy tickets, and there happened to be four seats in a row. They were very happy, and then Naturally, A is next to B, and B is next to C... just sitting there, happily watching "Farewell My Concubine". This is the sequence table.

The next day, they wanted to watch a movie again, but unfortunately, there were only a few seats left. They discussed it and decided to buy it. After all, "Flying Life" was quite good, and then they took the tickets and separated. Sit down. Due to the awkward relationship mentioned above, A has B's contact information in his mobile phone, and B's mobile phone has C's contact information... Although they are not next to each other, they all have the contact information of their friends and know where their friends are sitting. This is the linked list.

Linked lists are further divided into one-way linked lists, doubly linked lists, and one-way circular linked lists. Let’s take a look at what they look like below.

1.1 One-way linked list

A one-way linked list, also called a singly linked list, is the simplest form of linked list. Each of its nodes contains two fields: an information field (element field) and a link field. The link points to the next node in the list, and the link field of the last node points to a null value. Take our earlier example of the four awkward friends sitting apart: if A has B's phone number but B's phone doesn't have A's, and B has C's number but C doesn't have B's... that is, everyone can be found only by starting from A, then that is a one-way linked list.

The table element field elem is used to store specific data.

The link field next is used to store the position of the next node (identification in python). That is the mobile phone that stores their contact information in the example.

The variable p points to the position of the head node (first node) of the linked list. Starting from p, any node in the list can be found.

1.1.1 Operation of one-way linked list

is_empty() Whether the linked list is empty
length() The length of the linked list
travel() Traverse the entire linked list
add(item) Add an element to the head of the linked list
append(item) Add an element to the tail of the linked list
insert(pos, item) Add an element to a specified position
remove(item) Delete Node
search(item) finds whether the node exists

1.1.2 Comparison between linked lists and sequence lists

The advantage of the linked list is that four friends can still watch movies even if they are not seated together, and the use of storage space is relatively flexible. But the disadvantage is also obvious, that is, if they want to contact others, they need to use their mobile phones to save their friends’ mobile phone numbers, and saving mobile phone numbers wastes the phone’s memory (a 16G mobile phone cannot afford to hurt it). The linked list adds the pointer field of the node, which consumes a lot of space. Moreover, if they sit together, they can find whoever they want to find directly. However, if they sit separately, they cannot search randomly and can only search one after another starting from A. The linked list loses the advantage of random reading from the sequential list.

1.2 Doubly linked list

A more complex type of linked list is a "doubly linked list" or "double-sided linked list". Each node has two links: one pointing to the previous node, which points to a null value when this node is the first node; and the other pointing to the next node, which points to a null value when this node is the last node.

When you see the one-way linked list example, you must be thinking: these four idiots, A's mobile phone stores B's mobile phone number, but why doesn't B store A's? Maybe they heard you scolding them. Now B also saves A's mobile phone number, C also saves B's mobile phone number, and D also saves C's mobile phone number.

We are in a state of confusion now, a bit chaotic. Now the four friends are in this situation: A and B are each other's friends, B and C are each other's friends, C and D are each other's friends, and the friends have each other's mobile phone numbers (A, B , C, and D are 4 people, yes...). They didn't buy tickets for connecting seats, so they sat separately. Now their status is a two-way linked list.

1.2.1 Operation of doubly linked list
is_empty() Whether the linked list is empty
length() The length of the linked list
travel() Traverse the linked list
add(item) Add an element to the head of the linked list
append(item) Add an element to the tail of the linked list
insert(pos, item) Add an element at a specified position
remove(item) Delete a node
search(item) Find whether a node exists
1.3 One-way circular linked list

A variant of the singly linked list is a one-way circular linked list. The next field of the last node in the linked list is no longer None, but points to the head node of the linked list.

The same four friends turn out to be fair-weather brothers. B deleted A's number (B thought: it wastes the memory of my 16G Huawei phone; it's enough that I can be reached); C thought the same and deleted B's number; coincidentally, D deleted C's too. They are back to the state of a one-way linked list (A has B's contact, B has C's, C has D's). After watching so many movies together, D fell in love with A (a sweet girl), asked for A's number and saved it. Hahaha, B deserves to stay single for deleting a girl's number (laughing in a cute piggy voice...).

Now their state is a one-way circular linked list.

2. Binary tree

Everyone must be familiar with the binary tree, which is a tree structure in which each node has at most two subtrees. Usually subtrees are called "left subtree" and "right subtree". The following is a binary tree.

(binary tree diagram omitted)

Properties of binary trees

Property 1: There are at most 2^(i-1) nodes on the i-th level of a binary tree (i>0)

Property 2: A binary tree with depth k has at most 2^k - 1 nodes (k>0)

Property 3: For any binary tree, if the number of leaf nodes is N0 and the total number of nodes with degree 2 is N2, then N0=N2+1;

Property 4: The depth of a complete binary tree with n nodes is ⌈log2(n+1)⌉ (log rounded up)

Property 5: If the nodes of a complete binary tree are numbered from top to bottom and left to right starting from 1, then for the node numbered i, its left child is numbered 2i, its right child is numbered 2i+1, and its parent is numbered ⌊i/2⌋ (when i=1 the node is the root and has no parent)

(1) Complete binary tree: if the height of the tree is h, every level from 1 to h-1 holds the maximum number of nodes, and the leaf nodes on level h are packed to the left.

(2) Full binary tree: a binary tree in which every non-leaf node has both a left and a right child, and all leaves are on the bottom level.

2.1 Calculation of breadth and depth
import queue

class Node:
    def __init__(self, value=None, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def treeDepth(tree):
    """Depth = 1 + the depth of the deeper subtree (recursive)."""
    if tree is None:
        return 0
    leftDepth = treeDepth(tree.left)
    rightDepth = treeDepth(tree.right)
    return max(leftDepth, rightDepth) + 1

def treeWidth(tree):
    """Width = the largest number of nodes on any single level (level-order scan)."""
    if tree is None:
        return 0
    maxwidth = 0
    q = queue.Queue()
    q.put(tree)
    while not q.empty():
        curwidth = q.qsize()              # number of nodes on the current level
        maxwidth = max(maxwidth, curwidth)
        for _ in range(curwidth):         # dequeue this level, enqueue the next
            tmp = q.get()
            if tmp.left:
                q.put(tmp.left)
            if tmp.right:
                q.put(tmp.right)
    return maxwidth

if __name__ == '__main__':
    root = Node('D', Node('B', Node('A'), Node('C')), Node('E', right=Node('G', Node('F'))))
    depth = treeDepth(root)
    width = treeWidth(root)
    print('depth:', depth)  # 4
    print('width:', width)  # 3
2.2 Binary tree traversal

Tree traversal is an important operation on trees. Traversal means accessing the information of all nodes in the tree, that is, visiting each node exactly once. The two important traversal modes are depth-first traversal and breadth-first traversal: depth-first is usually implemented with recursion and breadth-first with a queue. In general, most algorithms that can be written with recursion can also be written with an explicit stack.

2.2.1 Depth-first traversal

For a binary tree, depth-first search (DFS) traverses the nodes along the tree's depth, searching each branch as deeply as possible.

Depth-first traversal has three important variants, commonly used to visit the nodes of a tree, which differ only in the order in which each node is visited: preorder traversal, inorder traversal and postorder traversal. Let's define them and then look at the code.

Preorder traversal: In preorder traversal, we first visit the root node, then recursively use preorder traversal to access the left subtree, and then recursively use preorder traversal to access the right subtree.

Root node->left subtree->right subtree

def preorder(self, root):
    """Recursive preorder traversal"""
    if root is None:
        return
    print(root.elem)
    self.preorder(root.lchild)
    self.preorder(root.rchild)

In-order traversal: In in-order traversal, we recursively use in-order traversal to access the left subtree, then visit the root node, and finally recursively use in-order traversal to access the right subtree.

Left subtree->root node->right subtree

def inorder(self, root):
    """Recursive inorder traversal"""
    if root is None:
        return
    self.inorder(root.lchild)
    print(root.elem)
    self.inorder(root.rchild)

Post-order traversal: In post-order traversal, we first recursively use post-order traversal to visit the left and right subtrees, and finally visit the root node.

Left subtree->right subtree->root node

def postorder(self, root):
    """Recursive postorder traversal"""
    if root is None:
        return
    self.postorder(root.lchild)
    self.postorder(root.rchild)
    print(root.elem)

2.2.2 Breadth-first traversal

Breadth-first traversal starts from the root of the tree and visits the nodes of the whole tree level by level, from top to bottom and from left to right.

def breadth_travel(self, root):
    """Level-order traversal of the tree using a queue"""
    if root is None:
        return
    queue = []
    queue.append(root)
    while queue:
        node = queue.pop(0)
        print(node.elem, end=" ")
        if node.lchild is not None:
            queue.append(node.lchild)
        if node.rchild is not None:
            queue.append(node.rchild)

3.Stack

A stack, in some places called a pushdown stack, is a container that can store, access and delete data elements. Its defining characteristic is that adding data (push) and taking data out (pop) are allowed only at one end of the container, called the top of the stack (top). There is no notion of position: the element that can be accessed or deleted at any moment is always the one stored most recently, which fixes a default access order. It is just like the list in Redis: lpush pushes elements in, and lrange then returns them in reverse insertion order. Forgot? No problem, let me tell you a little story.

If you live in Beijing, you must know the pain of squeezing into the subway at morning and evening rush hour, right? That sour taste. Sometimes, having finally reached my station, I try to get off but get pushed right back in!!! A secret: when I board, I always lean on the railing by the door, afraid I won't be able to get out, hahahaha. But I digress; this isn't about life hacks, it's about stacks.

As a loose example: if a subway car opened only one door at each station, it would be a stack. Everyone squeezes in through it, and the last one in must be the first one out (whether or not it's their stop).

3.1 Stack operation

Stack() creates a new empty stack
push(item) pushes a new element item onto the top of the stack
pop() pops the top element off the stack
peek() returns the top element of the stack
is_empty() checks whether the stack is empty
size() returns the number of elements in the stack
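These operations map directly onto a Python list if we treat the list's tail as the stack top (a minimal sketch; the class is my own illustration):

class Stack(object):
    """A stack backed by a Python list; the list tail is the stack top."""
    def __init__(self):
        self.__items = []

    def push(self, item):
        self.__items.append(item)    # add to the top

    def pop(self):
        return self.__items.pop()    # remove and return the top element

    def peek(self):
        return self.__items[-1]      # look at the top element without removing it

    def is_empty(self):
        return len(self.__items) == 0

    def size(self):
        return len(self.__items)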

43、@classmethod和@staticmethod?

@classmethod marks a class method. The first parameter of a class method must be the class object, conventionally named cls; through cls the method can access class attributes, other class methods and so on, and it may also modify class attributes. Class methods can be called through both the class and its instances. A method decorated with @staticmethod is a static method: it receives neither self nor cls, and it too can be called through the class or an instance, but if it references a class attribute it must do so explicitly through the class.
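A small illustration of both decorators (the example class is mine):

class Dog(object):
    count = 0  # class attribute

    def __init__(self, name):
        self.name = name
        Dog.count += 1

    @classmethod
    def how_many(cls):
        # cls is the class itself, so class attributes are reachable directly
        return cls.count

    @staticmethod
    def is_valid_name(name):
        # no self/cls here; a class attribute would have to be written as Dog.count
        return isinstance(name, str) and len(name) > 0

d = Dog('spot')
print(Dog.how_many())           # 1 -- callable on the class...
print(d.how_many())             # 1 -- ...or on an instance
print(Dog.is_valid_name('x'))   # True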

44. Let’s talk about metaclasses in python

Answer: Generally we define classes in code and use the defined classes to create instances. With metaclasses there is one more step: define the metaclass, use the metaclass to create a class, and then use that class to create instances. The main purpose of a metaclass is to change a class automatically at the moment it is created.
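A classic toy example of "changing the class as it is created": a metaclass that upper-cases attribute names (a sketch for illustration, not from the original answer):

class UpperAttrMeta(type):
    """Upper-case all non-dunder attribute names at class-creation time."""
    def __new__(mcs, name, bases, attrs):
        upper_attrs = {
            (k if k.startswith('__') else k.upper()): v
            for k, v in attrs.items()
        }
        return super().__new__(mcs, name, bases, upper_attrs)

class Foo(metaclass=UpperAttrMeta):
    bar = 'bip'

print(hasattr(Foo, 'bar'))  # False -- renamed when the class was created
print(Foo.BAR)              # 'bip'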

45. With open execution principle

Answer: The with statement uses a context manager. A context manager works through two Python methods: __enter__ and __exit__.

A simple understanding of a context manager is an operating environment. During file operations, files need to be opened and closed. When files are read and written, they are in the context of file operations, that is, in the file operation environment.

The __enter__ method runs when the with statement executes and generally handles setup before the operation, such as creating and initializing objects; for with open, that is opening the file. The __exit__ method runs after the code in the with block finishes and generally handles cleanup, such as closing files or database connections; for with open, that is closing the file.

Customize a context manager to simulate the file opening process:

class MyOpen(object):

    def __init__(self, file, mode):
        self.__file = file
        self.__mode = mode

    def __enter__(self):
        print('__enter__ run... open the file')
        self.__handle = open(self.__file, self.__mode)
        return self.__handle

    def __exit__(self, exc_type, exc_val, exc_tb):
        print('__exit__ run... close the file')
        self.__handle.close()


with MyOpen('test', 'w') as f:
    f.write('欢迎来到 <亦向枫的博客>')
    print('over')
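For completeness, the standard library's contextlib offers a shorter route to the same behavior (an equivalent sketch, not part of the original answer):

from contextlib import contextmanager

@contextmanager
def my_open(file, mode):
    handle = open(file, mode)   # runs on entry, like __enter__
    try:
        yield handle            # the value bound by `as`
    finally:
        handle.close()          # runs on exit, like __exit__

with my_open('test', 'w') as f:
    f.write('hello')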

46. What is the difference between status codes 301 and 302?

Answer: 301 is a permanent redirect and 302 is a temporary redirect.

47. The underlying implementation principle of lists in python

The bottom layer of the Python list is a sequential table. So what is a sequential table? It is one way of storing a linear table. And what is a linear table? (So many questions...) A linear table is a collection that stores elements together with their order relation, not just the elements themselves.

To understand the underlying implementation, let's first understand the sequential table. Suppose the class teacher collects information about the students in the class and, once done, posts a table at the front of the classroom recording how many students the classroom can hold and how many it currently has. The classroom plus this table form a sequential table.

The complete information of a sequential table has two parts: one is the collection of elements in the table; the other is the bookkeeping needed for correct operation, i.e. information about the table as a whole, mainly the capacity of the element storage area and the number of elements currently in the table.

After understanding the sequence table, let's take a look at how the sequence table is implemented. There are two ways to implement sequence tables, namely integrated structure and separated structure.


The integrated structure means the table at the front of the classroom is glued on: it forms one piece with the classroom.

In an integrated structure, the units that store table information and the element storage area are arranged in a continuous manner in a storage area, and the two parts of data form a complete sequence table object.

The separated structure means the table sits on the class teacher's desk, while the classroom is... wherever it is (love it or not, the classroom is there...). If the classroom changes, the teacher just updates the information on the table. With the integrated structure, hahaha, you'd have to make a whole new table; you can't walk to Class 2's classroom every day just to read the class information...

The above involves replacing the element storage area, i.e. changing the data area: say the classroom moves from the South Campus to the North Campus. In the integrated case, that precious table (the teacher absent-mindedly wrote it on a piece of Qianlong-era papyrus; there is no other copy in the world) is stuck to the wall, so the classroom must move together with the paper (a bit violent, I know; don't ask why we don't just peel off the wall skin... I'm willful, I'll move the whole thing, hum). That is what replacing the element storage area means for the integrated structure. And the separated one? The classroom moves to the North Campus while the teacher's office stays in the South Campus; the table is still on the desk, and the teacher only has to wave a hand and change "Class 1, South Campus" to "Class 1, North Campus".

In the integrated structure, since the table-information area and the data area are stored together contiguously, replacing the data area means moving everything as a whole: the entire sequential table object (the area holding the table's structural information) changes.

In the separated structure, replacing the data area only requires updating the data-area link in the table-information area; the sequential table object itself stays put.

The Python list is a dynamic sequential table implemented with the separated technique. Random access and update by subscript have time complexity O(1), which is very fast, precisely because the elements of a sequential table live in one contiguous storage area. And to allow elements to be appended arbitrarily while the id of the list object stays unchanged, only the separated implementation works: the class teacher's table stays on the teacher's desk, unchanged, and when the class grows too big for the room, the classroom can be swapped freely, no violent whole-building move required.
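The claim about the unchanged id is easy to verify (a quick check of my own, not from the original):

nums = []
before = id(nums)
for i in range(100000):
    nums.append(i)         # triggers several reallocations of the element area
print(id(nums) == before)  # True -- the list object itself never moves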

48. Given a list, use sort to deduplicate it, starting from the last element

ids = [1,4,3,3,4,2,3,4,5,6,1]
ids.reverse()
news_ids = list(set(ids))
news_ids.sort(key=ids.index)
print(news_ids)
----------------
[1, 6, 5, 4, 3, 2]

The sort() function

Syntax:

list.sort(self,key=None,reverse=False)

Used to sort the original list.

key — specifies a function of one argument used to extract a comparison key from each element. In the code above, the key is the element's index in the (reversed) list, which produces exactly the order the question requires.

reverse — True sorts in descending order, False (the default) in ascending order.

49. Determine whether a string is a palindrome

Let’s first explain what a palindrome string is, which can be simply understood as a symmetric string. The following are palindrome strings:

'a' is a palindrome of one element; 'abccba' has an even number of elements and is symmetric; 'abcdcba' has an odd number of elements and is symmetric around the middle element.

def is_palindrom(s):
    """Recursively check whether a string is a palindrome"""
    if len(s) < 2:
        return True
    if s[0] == s[-1]:
        return is_palindrom(s[1:-1])
    else:
        return False
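Since Python strings support slicing, a non-recursive check is even shorter (an equivalent alternative, not the original answer):

def is_palindrome(s):
    """A string is a palindrome iff it equals its own reverse."""
    return s == s[::-1]

print(is_palindrome('abccba'))   # True
print(is_palindrome('abcdcba'))  # True
print(is_palindrome('abc'))      # False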

50. Map and reduce in python?

map() is a Python built-in higher-order function. It takes a function f and a list, applies f to each element of the list in turn, and returns the results as a new list (in Python 3 it returns an iterator). reduce() takes similar arguments, a function f and a list, but behaves differently: f must accept two parameters, and reduce() calls f cumulatively across the elements of the list, returning a single final value. In Python 3, reduce must be imported from functools.
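A short demonstration of both (written for Python 3):

from functools import reduce  # in Python 3, reduce lives in functools

nums = [1, 2, 3, 4, 5]

doubled = map(lambda x: x * 2, nums)      # an iterator in Python 3
print(list(doubled))                      # [2, 4, 6, 8, 10]

total = reduce(lambda x, y: x + y, nums)  # ((((1+2)+3)+4)+5)
print(total)                              # 15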

51. Polymorphism

Answer: the type at definition time differs from the type at run time; that is polymorphism. In Python, typing is weak in this respect: what matters is whether the object has the expected attributes and methods, not whether its type is "correct". If it has them, it is deemed suitable (duck typing).
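A small duck-typing illustration (the example classes are mine):

class Duck(object):
    def speak(self):
        return 'quack'

class Person(object):
    def speak(self):
        return 'hello'

def make_it_speak(obj):
    # no type check: anything with a speak() method is acceptable
    print(obj.speak())

make_it_speak(Duck())    # quack
make_it_speak(Person())  # hello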

52. Transaction isolation levels.

MySQL has four main transaction isolation levels:

Serializable: transactions execute strictly one after another.
Repeatable read: whether or not other transactions modify and commit the data, the values seen within this transaction never change.
Read committed: once another transaction commits a modification, this transaction can read the modified value.
Read uncommitted: as soon as another transaction modifies data, this transaction sees the new value, even before it is committed.

MySQL uses Repeatable read by default.

53. How to use ORM in Django?

The ORM framework is embedded in Django. There is no need to program directly to the database. Instead, model classes are defined, and the addition, deletion, modification, and query operations of the data table are completed through the model classes and objects.

The steps for database development with Django are as follows:

1. Configure database connection information.

2. Define the model class in models.py.

3. Migration.

4. Complete data addition, deletion, modification and query operations through classes and objects.

When defining a model class we inherit from models.Model, which lives in django.db. For query operations we import the model classes and perform create, delete, update and query operations through the classes and their objects.

54. Multiple inheritance writing methods and inheritance order issues.

Answer: When using multiple inheritance, our way of writing is:

class Son(Master, Father):    
	pass

In Python we mostly use new-style classes, and with multiple inheritance we write it as above.

With multiple inheritance, the subclass inherits all attributes and methods of its parent classes. If several parents define attributes or methods with the same name, those of the first listed parent are used by default. The lookup order can be inspected through the MRO (method resolution order), e.g. the class's __mro__ attribute or its mro() method.
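Continuing the Son(Master, Father) example above, the order can be seen directly (a quick check):

class Master(object):
    def skill(self):
        return 'cooking'

class Father(object):
    def skill(self):
        return 'fishing'

class Son(Master, Father):
    pass

print(Son().skill())  # 'cooking' -- Master is first in the MRO
print(Son.__mro__)
# (<class 'Son'>, <class 'Master'>, <class 'Father'>, <class 'object'>)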

55. Destructor

Answer: the destructor runs when an object's life cycle ends, for example when the function the object lives in has finished and the program is ending; the system executes it automatically. In Python, __del__ is called automatically when an object's reference count drops to 0; __del__ is the destructor.

56. Inheritance, when executing the destructor, should the parent class be executed first, or the subclass first?

Answer: When destructing, the destructor of the subclass will be called first, and then the destructor of the parent class will be called.

When initializing a subclass, the constructor of the parent class is automatically called first, and then the constructor of the subclass is called.

57. When inheriting, will all methods be inherited?

Answer: the textbook answer is no, with constructors and destructors as the examples, and that is true in languages like C++. In Python, however, __init__ and __del__ are ordinary methods looked up through the MRO, so a subclass does inherit them unless it overrides them.

58. What is the execution result of range(0,20)[2:-2]?

Range in python3 returns an iterable object, while in python2 it returns a list.

Result: range(0,20)[2:-2] evaluates to range(2, 18).

59. Write the execution result of the following code

[1,2,3,5,6,7,8][2:-2]

The result is:

[3,5,6]

60. Remove duplicates from a list and sort it in original order

alist = [2,5,6,3,2,6,3,2,8]
new_list = list(set(alist))
new_list.sort(key=alist.index)
print(new_list)
# Result: [2, 5, 6, 3, 8]

61. Briefly talk about ACID and explain each characteristic.

Answer: ACID refers to the four properties of transactions: atomicity, consistency, isolation and durability. Atomicity means a transaction must be treated as an indivisible minimal unit of work: either all of its operations commit successfully, or all of them fail and roll back. Consistency means the database always moves from one consistent state to another; if a transaction is not committed because some step in the middle failed, none of the modifications made in the transaction are saved to the database. Isolation means that modifications made by a transaction are invisible to other transactions until it finally commits. Durability means that once a transaction commits, its modifications are stored permanently in the database.

62. Two mainstream engines of MySQL, and introduce their differences.

Answer: the two mainstream engines are InnoDB and MyISAM. InnoDB supports transactions, foreign-key constraints and row locks (for example, a select ... for update statement triggers a row lock, and what is locked is the index entry, not the record itself); it has been MySQL's default engine since version 5.5. MyISAM supports neither transactions nor foreign keys, and was the default engine before 5.5. MyISAM stores the table's row count, so counting the rows simply reads the stored number, while InnoDB does not store it and must scan the whole table. When clearing a table, InnoDB deletes row by row, while MyISAM rebuilds the table. InnoDB suits applications with frequent modification and high safety requirements; MyISAM suits query-dominated applications. In our project we used InnoDB.

63. If a list is passed into a function, will the global list be modified after being modified in the function?

Answer: it will be modified. The list is passed into the function by reference, so modifying it inside the function is the same as modifying the outer list.

a = [1, 2, 3]

def fuc(a):
    a.append(1)
    print(a)

fuc(a)

The result is:

[1, 2, 3, 1]

64. You must be able to handwrite classics such as quick sort and bubble sort!!! Not just Tencent; every company asks!!!
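For reference, minimal handwritten versions of both (one simple variant each; the quick sort shown is the extra-space version, not the in-place one):

def bubble_sort(alist):
    """Bubble sort in place: float the largest remaining element to the end."""
    n = len(alist)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if alist[j] > alist[j + 1]:
                alist[j], alist[j + 1] = alist[j + 1], alist[j]
    return alist

def quick_sort(alist):
    """Quick sort: partition around a pivot, then sort each side recursively."""
    if len(alist) < 2:
        return alist
    pivot = alist[0]
    less = [x for x in alist[1:] if x <= pivot]
    greater = [x for x in alist[1:] if x > pivot]
    return quick_sort(less) + [pivot] + quick_sort(greater)

print(bubble_sort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
print(quick_sort([5, 3, 8, 1, 9, 2]))   # [1, 2, 3, 5, 8, 9]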

65. Written test questions

1.1 Hash tree

Hash tree is a data structure specially designed to optimize query efficiency. This tree has extremely high query efficiency. In terms of query alone, it is faster than a binary sort tree.

The hash tree is built by the prime number resolution algorithm, so let's first look at the prime number resolution algorithm.

A prime number is divisible only by 1 and itself, so 2 is the smallest prime. A paper once stated the prime resolution theorem roughly as: n distinct primes can "distinguish" a number of consecutive integers equal to the product of those primes, where "distinguish" means that no two of those consecutive integers share exactly the same sequence of remainders modulo the n primes.

Recall that the int type can represent at most about 2.1 billion values, while the number of integers that the first 10 primes can distinguish is 2*3*5*7*11*13*17*19*23*29 = 6,469,693,230, which exceeds 2.1 billion. That means any int can be located within at most 10 divisions, which is why lookups are so fast.

If that left you dizzy, let me explain this tree with a small example.

How many remainders can dividing a number by 2 give? Two, 0 and 1, so the second level of the hash tree has two branches, and a query decides which branch to take by the remainder. First take the key modulo 2 and pick the branch by the remainder; at the next level take the key modulo 3; then modulo 5, and so on, until the key is found at the end of its remainder path, or an empty slot shows it is absent. That is a hash tree.
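A rough sketch of that description (my own illustration under the assumptions above: keys are stored at the first free node along their remainder path, and depth is capped by the ten primes):

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

class HashTreeNode(object):
    def __init__(self):
        self.key = None
        self.occupied = False
        self.children = {}  # remainder -> child node

def insert(root, key):
    """Place key at the first free node along its remainder path (duplicates not handled)."""
    node, level = root, 0
    while node.occupied:
        r = key % PRIMES[level]                            # branch by remainder
        node = node.children.setdefault(r, HashTreeNode())
        level += 1
    node.key, node.occupied = key, True

def search(root, key):
    """Follow the same remainder path; an empty slot means the key is absent."""
    node, level = root, 0
    while node is not None and node.occupied:
        if node.key == key:
            return True
        node = node.children.get(key % PRIMES[level])
        level += 1
    return False

root = HashTreeNode()
for k in [12, 25, 8, 37]:
    insert(root, k)
print(search(root, 25), search(root, 99))  # True False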

1.2 The difference between linked list and sequence list

The elements of a sequential table are stored in adjacent, contiguous locations, making it a structure with fast access to any element. The storage size of a sequential table must be specified before use, and once the memory is allocated it cannot easily change. The advantage is fast lookup; the drawback is that inserting or deleting anywhere but the tail is troublesome. A sequential table consists of two parts: the element collection and the information about the table as a whole (mainly the capacity of the element storage area and the element count). The implementation may be integrated or separated; as the names suggest, they differ in how those two parts are laid out. The Python list is implemented as a separated sequential table.

Building a sequential table requires knowing the data size in advance in order to request contiguous storage, and expansion means relocating the data, so it is not very flexible. To use the computer's memory fully and manage it dynamically, we use linked lists. The linked list, a basic data structure, differs from the sequential table in that it describes the relation between elements with pointers: the memory holding the elements need not be contiguous, and each node stores only its element plus a pointer to the next node. The advantages are obvious, but so is the drawback: to find an element, you must start from the head and walk the chain.

1.3 Use code to implement a deadlock

If a thread acquires a lock but terminates unexpectedly without releasing it, other threads can never acquire the lock and wait forever: that is a deadlock. Below we simulate a deadlock with a few simple lines of code:

import threading
import time

# Create a mutex
lock = threading.Lock()

# Fetch a value by index; the lock ensures only one thread reads at a time
def get_value(index):
    # Acquire the lock
    lock.acquire()
    print(threading.current_thread())
    my_list = [3, 6, 8, 1]
    # Check whether the index is out of range
    if index >= len(my_list):
        print("Index out of range:", index)
        # Deliberate bug: returning here skips lock.release(), so every
        # subsequent thread blocks forever on lock.acquire() -- a deadlock
        return
    value = my_list[index]
    print(value)
    time.sleep(0.2)
    # Release the lock
    lock.release()


if __name__ == '__main__':
    # Simulate many threads performing the fetch
    for i in range(30):
        sub_thread = threading.Thread(target=get_value, args=(i,))
        sub_thread.start()

Once a deadlock occurs, the program stops responding while still holding resources. To avoid this, the lock must be released in the right place (for example, in a finally block).

66. How to find the nearest common ancestor of two leaf nodes in a sorted binary tree

A binary search tree has the property that nodes in the left subtree are smaller than their parent and nodes in the right subtree are larger. Using this, we compare starting from the root. If the current node's value is greater than both nodes' values, the lowest common ancestor must be in its left subtree, so we move to the left child. If the current node's value is smaller than both, the lowest common ancestor must be in the right subtree, so we move to the right child. The first node found this way, from the top down, whose value lies between the two input values is the answer.

After understanding this, we can use python code to implement it:

class TreeNode(object):
    def __init__(self, item):
        self.item = item
        self.lchild = None
        self.rchild = None

def getCommonAncestor(root, node1, node2):
    while root:
        if root.item > node1.item and root.item > node2.item:
            root = root.lchild
        elif root.item < node1.item and root.item < node2.item:
            root = root.rchild
        else:
            return root
    return None

67. Find all python files in the current directory of linux.

Answer: In Linux, we use the find command to find files. As we all know, python files end with .py, so we can find them according to the following command.

find ./ -name "*.py"

68. Decorator

Answer: A decorator is to add new functions to a function without changing the original function code. Next we implement a universal decorator.

def decorator(func):
    def wrapper(*args, **kwargs):
        print('wrapper context')
        return func(*args, **kwargs)
    return wrapper
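Usage, plus one practical refinement: functools.wraps keeps the decorated function's metadata intact (the refinement is standard practice, not part of the original answer):

import functools

def decorator(func):
    @functools.wraps(func)          # preserves func.__name__, docstring, etc.
    def wrapper(*args, **kwargs):
        print('wrapper context')
        return func(*args, **kwargs)
    return wrapper

@decorator
def add(a, b):
    return a + b

print(add(1, 2))     # prints 'wrapper context', then 3
print(add.__name__)  # 'add', thanks to functools.wraps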

69. Generator

Answer: a generator is a special kind of iterator. It can be driven with the next() function or a for loop without our writing __iter__() and __next__() ourselves, which is more convenient. There are two common ways to create a generator. One is to change the [] of a list comprehension to ():

my_generator = (i * 2 for i in range(5))

We print and discover that the prompt is a generator object:

<generator object <genexpr> at 0x7f8971f48a40>

The other way is the yield keyword. Let's take the famous Fibonacci sequence as an example:

def fibonacci(num):
    a = 0
    b = 1
    current_index = 0
    print("--11---")
    while current_index < num:
        result = a
        a, b = b, a + b
        current_index += 1
        print("--22---")
        yield result
        print("--33---")

70. Implement a one-way linked list

First implement a node:

class SingleNode(object):
    """Node of a singly linked list"""
    def __init__(self, item):
        # item holds the data element
        self.item = item
        # next identifies the next node
        self.next = None

Then implement a singly linked list:

class SingleLinkList(object):
    """Singly linked list"""

    def __init__(self):
        self.__head = None

    def is_empty(self):
        """Check whether the list is empty"""
        return self.__head is None

    def length(self):
        """Length of the list"""
        # cur initially points at the head node
        cur = self.__head
        count = 0
        # the tail node points to None; loop until the tail is passed
        while cur is not None:
            count += 1
            # move cur one node forward
            cur = cur.next
        return count

    def travel(self):
        """Traverse the list"""
        cur = self.__head
        while cur is not None:
            print(cur.item, end=" ")
            cur = cur.next
        print("")

Add elements to the head:

def add(self, item):
    """Insert an element at the head"""
    # first create a node holding the item value
    node = SingleNode(item)
    # point the new node's next link at the head node, i.e. where __head points
    node.next = self.__head
    # point the list head __head at the new node
    self.__head = node

Add elements at the end:

def append(self, item):
    """Append an element at the tail"""
    node = SingleNode(item)
    # if the list is empty, point __head at the new node
    if self.is_empty():
        self.__head = node
    # otherwise walk to the tail and point the tail node's next at the new node
    else:
        cur = self.__head
        while cur.next is not None:
            cur = cur.next
        cur.next = node

Add an element at a specified position:

def insert(self, pos, item):
    """Insert an element at the given position"""
    # if pos is before the first element, insert at the head
    if pos <= 0:
        self.add(item)
    # if pos is beyond the tail, append at the tail
    elif pos > (self.length() - 1):
        self.append(item)
    # otherwise find the given position
    else:
        node = SingleNode(item)
        count = 0
        # pre will point at position pos-1, one before the insertion point;
        # start from the head and move it into place
        pre = self.__head
        while count < (pos - 1):
            count += 1
            pre = pre.next
        # first point the new node's next at the node currently at pos
        node.next = pre.next
        # then point the previous node's next at the new node
        pre.next = node

Delete node:

def remove(self, item):
    """Delete a node"""
    cur = self.__head
    pre = None
    while cur is not None:
        # found the element
        if cur.item == item:
            # if the very first node is the one to delete
            if not pre:
                # point the head at the node after the head node
                self.__head = cur.next
            else:
                # point the previous node's next at the node after the deleted one
                pre.next = cur.next
            break
        else:
            # keep moving along the list
            pre = cur
            cur = cur.next

Find if a node exists:

def search(self, item):
    """Check whether a node exists; return True or False"""
    cur = self.__head
    while cur is not None:
        if cur.item == item:
            return True
        cur = cur.next
    return False

71. Definition and implementation of heap and stack

Answer: the stack is a storage structure implementing "first in, last out" (equivalently "last in, first out"). The heap is a tree-shaped, partially ordered data structure, often used to implement priority queues and the like.

A heap can be understood simply as a special complete binary tree: its nodes fill in from left to right, and the leaves of the last level are packed to the left (so if a node has no left child, it has no right child either); and every node's value is less than (min-heap) or greater than (max-heap) the values of its children.
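In Python, the heapq module provides a ready-made min-heap on top of a plain list (a quick illustration):

import heapq

nums = [5, 3, 8, 1, 9, 2]
heapq.heapify(nums)         # rearrange the list into a min-heap in place
print(nums[0])              # 1 -- the smallest element sits at the root

heapq.heappush(nums, 0)     # push keeps the heap property
print(heapq.heappop(nums))  # 0 -- pop always returns the current minimum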

72. What is 2msl? Why do this?

2MSL is twice the MSL (maximum segment lifetime). TCP's TIME_WAIT state is also called the 2MSL wait state: the end that performs the active close enters TIME_WAIT after sending the final ACK of the four-way close (the fourth segment, acknowledging the peer's FIN), and must stay in that state for twice the MSL. The main purpose of waiting 2MSL is to cope with the final ACK being lost: the peer will time out and resend its FIN (the third segment of the close), and the actively closing end can then resend the ACK. While the connection is in TIME_WAIT, the port pair cannot be reused until the 2MSL period ends, and any late segments that arrive during it are discarded. In practice, the SO_REUSEADDR socket option can be set to reuse the port without waiting for the full 2MSL.

73. Big data file reading:

① Use a generator;

② Iterate over the file object itself: for line in file;

74. The difference between iterators and generators:

answer:

(1) An iterator is the more abstract concept: any object whose class has a __next__ method (next in Python 2) and an __iter__ method that returns itself. Container objects such as strings, lists, dicts and tuples are convenient to traverse with a for loop; behind the scenes the for statement calls the built-in iter() on the container, which returns an iterator object defining __next__ to access the elements one by one. When there are no more elements, __next__ raises a StopIteration exception.

(2) A generator is a simple, powerful tool for creating iterators. It is written like a regular function but uses the yield statement whenever it wants to return data. Each call to next() resumes the generator from where it left off (it remembers the last statement executed and all its local values).

Difference: a generator can do everything an iterator can, but the __iter__() and __next__() methods are created automatically, so generators are especially concise as well as efficient; using a generator expression instead of a list comprehension also saves memory. Besides the automatically created methods and saved program state, a StopIteration exception is raised automatically when the generator terminates.

75. The role and function of decorators:

A decorator is a special syntax in Python used to add extra functionality to a function or class without modifying the decorated object's source code. Decorators can extend, modify, or wrap behavior while leaving the original function or class unchanged.

  1. Extend the functions of a function or class: Through decorators, you can add new functions to the original function or class without changing it, such as logging, performance analysis, exception handling, etc.
  2. Code reuse: By encapsulating common functional logic into decorators, it can be reused in multiple functions or classes. This can avoid code duplication and improve code maintainability.
  3. Modify the behavior of a function or class: Decorators can modify the input, output, execution process and other behaviors of a function or class to meet specific needs, such as parameter verification, cached data, permission checking, etc.
  4. AOP programming: Decorators can separate cross-cutting concerns (such as logging, transaction management) from core business logic, implement aspect-oriented programming (AOP), and improve the readability and maintainability of the code.
  5. Code injection: Some decorators can dynamically inject additional code into the decorated function or class to achieve some specific requirements, such as singleton mode, registered functions, etc.

76. Talk about your understanding of synchronous, asynchronous, blocking and non-blocking:

Synchronous: one thread serves one socket at a time; usable with UDP, for example.
Asynchronous: use an I/O multiplexing model such as select or an event loop, so that one thread monitors many sockets.
Blocking: socket operations such as send and recv do not return until they complete; the call blocks in the meantime.

Non-blocking: the operation returns immediately without blocking, so the return value alone no longer tells you whether the operation finished.

77. Let’s briefly talk about GIL:

Global Interpreter Lock (global interpreter lock)

Python code execution is controlled by the Python virtual machine (also called the interpreter main loop; this refers to CPython). Python was designed from the start so that only one thread executes in the interpreter's main loop at a time, i.e. at any moment only one thread runs in the interpreter. Access to the Python virtual machine is controlled by the global interpreter lock (GIL), which guarantees that only one thread is running at a time.

In a multi-threaded environment, the Python virtual machine executes as follows:

  1. Set up the GIL

  2. Switch in a thread to run

  3. Run for a specified number of bytecode instructions, or until the thread voluntarily gives up control (e.g. by calling time.sleep(0))

  4. Put the thread back to sleep

  5. Unlock the GIL

  6. Repeat the steps above

When external code is called (e.g. a C/C++ extension function), the GIL remains locked until that function returns (since no Python bytecode runs in the meantime, no thread switch happens).

78. Let’s briefly talk about the difference between python2 and python3:

1 Py3 uses UTF-8 encoding by default; Python 2 uses ASCII

2 Removed <>, all replaced with !=

3 Integer division returns a floating point number. To get an integer result, use //

4 Remove the print statement and add the print() function to achieve the same function. The same is true for the exec statement, which has been changed to the exec() function.

5 Changed the behavior of the ordering comparison operators: e.g. for x < y, when the types of x and y cannot be compared, a TypeError is raised instead of an arbitrary bool being returned

6 The input function has changed, raw_input has been deleted and replaced with input

7 Removed tuple parameter unpacking; you can no longer define a function like def f(a, (b, c)): pass

8 Py3.X has removed the long type. Now there is only one integer type - int, but it behaves like the 2.X version of long

9 Added a new bytes type, corresponding to the octet string of version 2.X

10 The next() method of iterators was renamed __next__(), and the built-in function next() was added to call an iterator's __next__() method

11 Added two decorators, @abstractmethod and @abstractproperty, to make it more convenient to write abstract methods (properties).

12 All exceptions inherit from BaseException, and StandardError was removed

13 Removed the sequence behavior and .message attribute of the exception class

14 Use the raise Exception(args) syntax instead of raise Exception, args

15 The cPickle module has been removed and can be replaced by the pickle module. Eventually we will have a transparent and efficient module.

16 Removed imageop module

17 Removed the audiodev, Bastion, bsddb185, exceptions, linuxaudiodev, md5, MimeWriter, mimify, popen2, rexec, sets, sha, stringold, strop, sunaudiodev, timing and xmllib modules

18 Removed the bsddb module (released separately)

19 Removed new module

20 xrange() renamed to range()

79. find and grep?:

The grep command is a powerful text search tool. The grep search content string can be a regular expression, allowing pattern search on text files. If a matching pattern is found, grep prints all lines containing the pattern.

find is usually used to search for files that meet the conditions in a specific directory, or it can also be used to search for files owned by a specific user.

80. What should I do if the online service may hang up due to various reasons?

Supervisor, a powerful background process management tool under Linux

After each file modification, execute service supervisord restart in Linux.

81. How to improve the operating efficiency of python:

  1. Use Appropriate Data Structures and Algorithms: Choosing appropriate data structures and algorithms can significantly improve program efficiency. Understand the performance characteristics of different data structures and algorithms and choose the solution that best fits your problem.
  2. Reduce the number of loops and iterations: Avoid unnecessary loops and iterations, and try to use built-in functions and library functions to handle set operations, list derivation, etc. to improve efficiency.
  3. Use generators and iterators: Using generators and iterators can reduce memory usage and improve program efficiency when processing large amounts of data.
  4. Take advantage of concurrency and parallelism: For situations where a large number of independent tasks need to be processed, concurrent and parallel programming techniques such as multi-threading, multi-processing, or asynchronous programming can be used to increase the concurrency and efficiency of the program.
  5. Cache calculation results: Cache the results of repeated calculations to avoid repeated calculations and improve program running efficiency.
  6. Use built-in and library functions: Python has many efficient built-in and library functions, and familiarity with and full use of these functions can improve program efficiency.
  7. Use appropriate data types: Choosing appropriate data types can reduce memory usage and improve computing efficiency. For example, for numerical calculations, you can use the array (ndarray) type provided by the NumPy library.
  8. Avoid excessive object copying: Object copying operations in Python are relatively expensive. Try to avoid unnecessary object copying to improve program efficiency.
  9. Use C extension or JIT compilation: For key parts with higher performance requirements, you can use C extension modules or use JIT (just-in-time compilation) technology, such as Numba, PyPy, etc., to improve running speed.
  10. Write optimized code: Pay attention to writing concise and efficient code to reduce unnecessary operations, function calls and memory allocations.

82. How to use yield in python?

Answer: yield, simply put, makes a function a generator: the function remembers where in its body it left off the last time it yielded, and the next call on the generator resumes from that position.

83. How does python manage memory?

1. Garbage collection: unlike C++ or Java, Python does not require declaring a variable's type before assigning to it. In Python, an object's type and memory are determined at runtime, which is also why Python is called a dynamically typed language.

2. Reference counting: Python manages memory much like Windows kernel objects do. Every object maintains a count of the references pointing at it. When a variable is bound to an object, the object's reference count is 1 (certain other situations also increase it); the system maintains these counts automatically and scans them periodically, and when an object's count drops to 0, the object is reclaimed.

3. Memory pool mechanism: Python's memory architecture is pyramid-shaped. Layers -1 and -2 are operated mainly by the operating system;

Layer 0 is the C memory allocation and release functions such as malloc and free;

Layers 1 and 2 are the memory pool, implemented by Python interface functions such as PyMem_Malloc; small objects (under 256 bytes, see below) are allocated directly at this layer;

Layer 3 is the top layer: our direct manipulation of Python objects.

In C, if malloc and free are called frequently, performance problems will occur. In addition, frequent allocation and release of small blocks of memory will cause memory fragmentation. The main tasks of Python here are:

If the requested allocation is between 1 and 256 bytes, Python's own memory management system is used; otherwise malloc is called directly.

malloc is still called to obtain memory, but each time a large block of 256 KB is allocated at once.

Memory handed out through the pool is eventually returned to the pool rather than released with C's free, so it can be reused the next time. For simple immutable objects such as numbers, strings and tuples (tuples cannot be changed), assigning variable A to another variable B makes both names refer to the same memory; but when A is then rebound to a new value, a new object is allocated for A, and the addresses of A and B are no longer the same.

84. Describe the differences between arrays, linked lists, queues, and stacks?

Arrays and linked lists describe how data is stored: an array stores data in contiguous space, while a linked list can store data in non-contiguous space.

Queues and stacks describe how data is accessed: a queue is first in, first out, while a stack is last in, first out. Both queues and stacks can be implemented with either arrays or linked lists.

85. In Django, when a user logs into application server A (enters the login state), and then the next request is proxied by nginx to application server B, what will be the impact?

If the session data logged in by the user on application server A is not shared to application server B, the previous login status will be lost.

86. How to solve the cross-domain request problem in Django (principle)

Enable the CSRF middleware

Use POST requests

Use a verification code

Add the {% csrf_token %} tag to the form

87. Please explain or describe the architecture of Django

The Django framework follows the MVC design and has a proper name: MVT

M is spelled out as Model, which has the same function as M in MVC. It is responsible for data processing and has an embedded ORM framework.

V is spelled out as View, which has the same function as C in MVC. It receives HttpRequest, processes business, and returns HttpResponse.

T is spelled out as Template, which has the same function as V in MVC. It is responsible for encapsulating and constructing the HTML to be returned, and has a template engine embedded in it.

88. How to sort data query results in Django, how to do descending order, how to do query if it is greater than a certain field

Sort with order_by().

For descending order, prefix the sort field name with -.

To query on a field greater than some value: filter(field__gt=value) (note the double underscore).
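Put together, for a hypothetical model Book with price and pub_date fields (the model is an assumption for illustration):

Book.objects.order_by('price')       # ascending by price
Book.objects.order_by('-price')      # descending: note the leading '-'
Book.objects.filter(price__gt=50)    # rows where price > 50
Book.objects.filter(price__gt=50).order_by('-pub_date')  # combined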

89. Tell me about Django, the role of MIDDLEWARES middleware?

Answer: Middleware is a processing process between request and response processing. It is relatively lightweight and globally changes the input and output of Django.

90. What do you know about Django?

Django is taking a big and comprehensive direction. It is most famous for its fully automated management backend: just use the ORM and make simple object definitions, and it can automatically generate a database structure and a full-featured management backend.

Django's built-in ORM is highly coupled with the other modules in the framework: an application must use the built-in ORM, or it loses the many ORM-based conveniences the framework provides. In theory you can swap its ORM module out, but that is like gutting and renovating a fully decorated house; it would be easier to start fresh from a bare shell.

The selling point of Django is its ultra-high development efficiency, but its performance expansion is limited; projects using Django need to be refactored after the traffic reaches a certain scale to meet performance requirements.

Django is suitable for small and medium-sized websites, or as a tool for large websites to quickly implement product prototypes.

The design philosophy of Django templates is to completely separate code and styles; Django fundamentally eliminates the possibility of coding and processing data in templates.

91. How do you implement Django redirection? What status code is used?

Use HttpResponseRedirect, or the redirect() and reverse() shortcuts.

Status codes: 302 (temporary) and 301 (permanent).

92. Forward proxy and reverse proxy of nginx?

Answer: a forward proxy is a server between the client and the origin server. To fetch content from the origin server, the client sends a request to the proxy naming the target (the origin server); the proxy forwards the request to the origin server and returns the obtained content to the client. The client must be specially configured to use a forward proxy.

A reverse proxy is the opposite: to the client it looks like the origin server itself, and the client needs no special configuration. The client sends an ordinary request for content in the reverse proxy's namespace; the reverse proxy decides where to forward the request (which origin server) and returns the obtained content to the client as if it were its own.

93. What is the core of Tornado?

The core of Tornado is the two modules ioloop and iostream. The former provides an efficient I/O event loop, and the latter encapsulates a non-blocking socket. By adding network I/O events to ioloop, using non-blocking sockets, and matching the corresponding callback functions, you can achieve the coveted efficient asynchronous execution.

94. Django itself provides runserver, why can't it be used for deployment?

The runserver method is a running method often used when debugging Django. It uses the WSGI Server that comes with Django to run. It is mainly used in testing and development, and the runserver is also opened in a single process.

uWSGI is a web server that implements the WSGI protocol, the uwsgi protocol, HTTP and others. Note that uwsgi is a communication protocol, while uWSGI is the web server implementing the uwsgi and WSGI protocols. uWSGI offers very fast performance, low memory usage and multi-app management, and paired with Nginx it forms a production environment that isolates user requests from the application code for real deployment. By comparison, it supports higher concurrency, makes multi-process management easier, takes advantage of multiple cores, and improves performance.

95. What is the difference between GET request and POST?

  • The semantics of GET is to obtain a specified resource from the server; the resource can be static text, a page, a picture, video and so on. GET parameters are generally written in the URL, and URLs may contain only ASCII characters, so GET parameters allow only ASCII; browsers also limit URL length (the HTTP protocol itself places no limit on it). For example, when you open this article, the browser sends a GET request to the server, and the server returns all the article's text and resources.

  • The semantics of POST is to process the specified resource according to the request payload (message body); the exact processing depends on the resource type. POST data is generally carried in the message body, whose format can be anything the client and server agree on, and browsers do not limit the body size. For example, when you type a comment at the bottom of this article and click "Submit", the browser performs a POST request, puts your comment text into the message body, splices on the POST request headers, and sends it to the server over TCP.

  • For a GET request, the request data is appended to the URL: a ? separates the URL from the data, and multiple parameters are joined with &. URL encoding is ASCII-based, not Unicode, meaning all non-ASCII characters must be encoded before transmission.

  • POST request: a POST request places the request data in the body of the HTTP packet, e.g. item=bandsaw as the actual transmitted data.
    Therefore, data from GET requests is exposed in the address bar, while POST data is not.

  • The HTTP specification limits neither URL length nor transmitted data size. In actual development, however, specific browsers and servers do limit URL length, so the data transferred by a GET request is constrained by it.

  • For POST, since the data does not travel in the URL, there is in theory no limit; in practice, though, every server caps the size of POST submissions, with Apache, IIS and others each having their own configuration.

  • POST is more secure than GET. "Secure" here means genuinely safer, unlike the "safe method" sense of GET above, which only means not modifying server data. For example, logging in through a GET request exposes the username and password in the URL; since the login page may be cached by the browser and others may view the browser history, the credentials are easily obtained. Data submitted via GET may also enable cross-site request forgery (CSRF) attacks.

  • GET is more efficient than POST.

    POST request process:

    • 1. The browser requests a tcp connection (first handshake)
    • 2. The server agrees to make a tcp connection (second handshake)
    • 3. The browser sends the ACK of the third handshake together with the POST request headers (this segment is small, so HTTP sends its first data here)
    • 4. The server returns a 100 Continue response
    • 5. The browser starts sending data
    • 6. The server returns a 200 ok response

    GET request process:

    • 1. The browser requests a tcp connection (first handshake)
    • 2. The server agrees to make a tcp connection (second handshake)
    • 3. The browser sends the ACK of the third handshake together with the GET request headers and data (this segment is small, so HTTP sends its first data here)
    • 4. The server returns a 200 OK response

96. Let’s talk about what is a GIL lock?

1. Definition of GIL

GIL (Global Interpreter Lock) is a mechanism in the CPython interpreter to ensure that only one thread can execute Python bytecode at a time. The GIL is implemented by having a mutex at the interpreter level, which means that at any given point in time, only one thread can execute Python bytecode and manipulate Python objects.

Python's GIL is a special lock. It is not a lock provided by the operating system, but a lock provided by the Python interpreter. When the Python interpreter creates a thread, it automatically creates a GIL associated with it. When multiple threads run simultaneously, only one thread can obtain the GIL and thus execute Python bytecode. Other threads must wait for the GIL to be released before they can execute. While this mechanism ensures thread safety, it also causes performance problems in Python multi-threaded programs.

It is important to note that the GIL constrains only threads executing Python bytecode within one interpreter. Threads running outside the interpreter, for example native threads created by C extensions that do not touch Python objects, or threads belonging to other processes, can run in parallel unaffected by the GIL. But all threads created in the same interpreter that run Python code are restricted by it, and only one of them runs Python code at a time.

It should be noted that although GIL is one of the performance issues of Python's multi-threaded programs, it does not mean that Python cannot use multi-threading. For I/O-intensive tasks, Python's multi-threading model can bring performance improvements. But for CPU-intensive tasks, using multi-threading does not improve performance, but may lead to performance degradation. At this time, you can consider using multi-process or asynchronous programming to improve performance.

2. Mechanism of action of GIL

GIL was introduced to solve the thread safety problem of the CPython interpreter. Since CPython's memory management is not thread-safe, if multiple threads execute Python bytecode at the same time, data races and memory errors may result. To solve this problem, the GIL was introduced and ensured that only one thread could execute Python bytecode at a time, thereby eliminating race conditions.

Specifically, the GIL prevents other threads from executing Python bytecode by acquiring and locking the global interpreter lock before executing Python bytecode. Once a thread acquires the GIL, it will monopolize the interpreter and release the GIL after executing a certain number of bytecodes or time slices, giving other threads the opportunity to acquire the GIL and execute the bytecodes. This process is repeated across multiple threads to achieve multi-threaded execution.

3. The impact of GIL on multi-threaded programming

3.1 CPU-intensive tasks will not receive true parallel acceleration

Since only one thread can execute Python bytecode at a time, multi-threading cannot truly achieve parallel acceleration for CPU-intensive tasks. Even if multiple threads are used, only one thread can execute the bytecode, and the remaining threads are blocked by the GIL and cannot fully utilize the computing power of the multi-core CPU.

import threading

def count_up():
    count = 0
    for _ in range(100000000):
        count += 1

t1 = threading.Thread(target=count_up)
t2 = threading.Thread(target=count_up)

t1.start()
t2.start()

t1.join()
t2.join()

In the above code, t1 and t2 execute the count_up function respectively, which performs 100 million self-increment operations. However, in the CPython interpreter, due to the existence of GIL, only one thread can actually perform the auto-increment operation, so multi-threading cannot speed up the execution time of this task.

3.2 I/O-intensive tasks can gain certain concurrency advantages

For I/O-intensive tasks, since the thread will release the GIL while waiting for the I/O operation to complete, multi-threading can play a certain concurrency advantage. In the process of waiting for I/O, other threads can obtain GIL and execute Python bytecode, thereby improving the execution efficiency of the overall program.

import threading
import requests

def fetch_url(url):
    response = requests.get(url)
    print(response.status_code)

urls = [
    'https://www.example1.com',
    'https://www.example2.com',
    'https://www.example3.com',
]

threads = [threading.Thread(target=fetch_url, args=(url,)) for url in urls]

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()

In the above code, multiple threads issue HTTP requests concurrently, and each thread releases the GIL while waiting for its request to complete. The waiting periods therefore overlap, and multiple network requests proceed concurrently.

3.3 Data sharing between threads requires attention to synchronization

Even with the GIL, multiple threads accessing shared data at the same time need a synchronization mechanism to avoid data races and inconsistency.

import threading

count = 0

def increment():
    global count
    for _ in range(100000):
        count += 1  # not atomic: load, add, store; a thread switch can occur in between

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)

t1.start()
t2.start()

t1.join()
t2.join()

print("Final count:", count)

In the above code, two threads increment the shared variable count concurrently, which can cause a race condition. The GIL does not make count += 1 atomic: the statement compiles to several bytecodes (load, add, store), and the interpreter may switch threads between them, so updates can be lost and the final count can come out smaller than expected.

To avoid this race condition, a thread lock (Lock) can be used for synchronization:

import threading

count = 0
lock = threading.Lock()

def increment():
    global count
    for _ in range(100000):
        lock.acquire()
        count += 1
        lock.release()

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)

t1.start()
t2.start()

t1.join()
t2.join()

print("Final count:", count)

By introducing a lock, we ensure that only one thread at a time can read and modify the shared variable count, avoiding the race condition and producing the correct final count.
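As a side note, the explicit acquire()/release() pair is usually written in the equivalent context-manager form, which also releases the lock if an exception occurs inside the critical section:

def increment():
    global count
    for _ in range(100000):
        with lock:  # acquired on entry, released on exit, even on exceptions
            count += 1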

4. GIL guidelines

1. The currently executing thread must hold the GIL.
2. A thread releases the GIL when it performs I/O, blocks, or its time slice expires. (Python 3.x uses a timer: once a thread has run for the switch interval, it is asked to release the GIL; in Python 2.x, the GIL was instead released every 100 "ticks" of bytecode.)
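In Python 3, the threshold mentioned above is the interpreter's switch interval, which can be inspected and tuned through the sys module; a quick check (the printed value may differ by build):

import sys

print(sys.getswitchinterval())  # default is 0.005 seconds (5 ms)
sys.setswitchinterval(0.001)    # ask the interpreter to consider a thread switch more often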

5. Advantages and disadvantages of GIL

Advantages:

Threads within a process are not independent: threads in the same process share its data. When several threads access the same data resource, a "competition" state can occur, where the data is operated on by multiple threads at once and ends up corrupted; that is thread unsafety. A mutex is therefore introduced to guarantee that a critical piece of code and the shared data it touches are executed completely by one thread from beginning to end, which is exactly what the GIL provides.

Disadvantages:

Within a single process, opening multiple threads achieves only concurrency, not parallelism, sacrificing execution efficiency. Because of the GIL, multi-threading is not suitable for compute-intensive tasks and is better suited to IO-intensive tasks. Common IO-intensive tasks: network IO (fetching web page data), disk operations (reading and writing files), and keyboard input.

6. To avoid the impact of Python’s GIL lock, you can consider the following methods:

  1. Use multiple processes. Python's multi-process model can avoid GIL restrictions, and multiple processes can execute Python code in parallel. However, communication and data sharing between multiple processes need to be achieved through some additional means, such as pipes, shared memory, sockets, etc.
  2. Use third-party extension modules. Some third-party extension modules, such as NumPy, Pandas, etc., use underlying libraries written in C language when performing computationally intensive tasks, and these libraries are not restricted by the GIL. Therefore, using these extension modules can improve the performance of Python programs.
  3. Use asynchronous programming. Asynchronous programming is a non-blocking programming model that can execute multiple tasks in a single thread, thus avoiding the limitations of the GIL. There are many asynchronous programming frameworks for Python, such as asyncio, Tornado, Twisted, etc.
  4. Use multi-threading together with a process pool. Threads handle the I/O-intensive parts while a process pool handles the compute-intensive parts; combining the two improves the processing speed of both kinds of task.

Note that the appropriate method should be chosen for the specific situation. For example, when handling a large number of I/O operations, using multiple processes may degrade performance because switching between processes is expensive; asynchronous programming may be the better choice there. For compute-intensive tasks, multiple processes are usually better, because the computation can run in parallel across processes.
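As a minimal sketch of method 1 (the function and work sizes here are illustrative), multiprocessing sidesteps the GIL because each worker process runs its own interpreter with its own GIL:

from multiprocessing import Pool

def count_up(n):
    count = 0
    for _ in range(n):
        count += 1
    return count

if __name__ == '__main__':
    # Each worker runs in its own process with its own GIL,
    # so the two tasks can execute on two CPU cores in parallel.
    with Pool(processes=2) as pool:
        results = pool.map(count_up, [50_000_000, 50_000_000])
    print(results)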

97. Briefly describe the differences and application scenarios of processes, threads, and coroutines?

1. Concept:

1. Process
  • Process: a running program or piece of code is a process; code that is not running is just a program. The process is the smallest unit of resource allocation in the system. Each process has its own memory space, so data is not shared between processes, and the overhead is high.
  • A process is a program with certain independent functions performing a run over some data set; it is an independent unit of resource allocation and scheduling in the system. Each process has its own independent memory space, and different processes communicate through inter-process communication. Since processes are relatively heavyweight and occupy independent memory, the context-switch overhead between processes (stack, registers, virtual memory, file handles, etc.) is large, but processes are relatively stable and safe.
  • A process has its own independent heap and stack, neither of which is shared; processes are scheduled by the operating system.
2. Thread
  • Thread: the smallest unit of scheduled execution, also called an execution path. It cannot exist independently; it depends on the existence of a process. A process has at least one thread, called the main thread. Multiple threads share memory (shared data and global variables), which improves a program's running efficiency.
  • A thread is an entity of a process and the basic unit of CPU scheduling and dispatch; it is a basic unit, smaller than a process, that can run independently. A thread itself owns essentially no system resources, only what is indispensable at run time (a program counter, a set of registers and a stack), but it shares all the resources owned by the process with the other threads belonging to the same process. Threads communicate mainly through shared memory; context switching between them is fast and the resource overhead is small, but compared with processes they are less stable and more prone to losing data.
  • A thread has its own independent stack, but the heap is shared; standard threads are scheduled by the operating system.
3. Coroutine
  • Coroutine: a lightweight user-space thread whose scheduling is controlled by the user. It has its own register context and stack, switching involves essentially no kernel overhead, and switches are flexible.
  • A coroutine is a lightweight user-space thread whose scheduling is entirely controlled by the user. A coroutine owns its own register context and stack. When a coroutine switch occurs, the register context and stack are saved elsewhere; when the coroutine is switched back, the previously saved register context and stack are restored. Operating directly on the stack means there is essentially no kernel-switch overhead, and global variables can be accessed without locks, so context switching is very fast.
  • Coroutines share the heap but not the stack; coroutines are scheduled explicitly by the programmer in the coroutine's code.

2. Differences:

1. Processes compared with threads

A thread is an execution unit within a process, a schedulable entity inside the process. The differences between threads and processes:

  1. Address space: a thread is an execution unit within a process; a process contains at least one thread, and its threads share the process's address space, while each process has its own independent address space.
  2. Resource ownership: the process is the unit that allocates and owns resources; threads within the same process share that process's resources.
  3. The thread is the basic unit of processor scheduling, while the process is the basic unit of resource allocation.
  4. Both can execute concurrently.
  5. Each independent thread has a program entry point, a sequential execution sequence and an exit, but a thread cannot execute on its own; it must live inside an application, which provides execution control for its multiple threads.

2. Coroutines compared with threads

  1. A thread can run multiple coroutines, and a process can also own multiple coroutines on its own. (Note that coroutines by themselves still run on a single core; it is by combining coroutines with multiple processes that Python can make use of a multi-core CPU.)
  2. Threads and processes are synchronous, preemptively scheduled mechanisms, whereas coroutines are asynchronous and cooperatively scheduled.
  3. A coroutine retains the state of its last call: each re-entry is equivalent to resuming the state of the previous call.

3. The use of processes, threads, and coroutines in Python

1. Multiprocessing generally uses the multiprocessing library to take advantage of multi-core CPUs and is mainly used for CPU-intensive programs; it can also implement producer-consumer setups. The advantage of multiple processes is that the crash of one child process does not affect the other child processes or the main process; the disadvantage is that you cannot start too many processes at once, or the system's resource scheduling, especially CPU usage and load, suffers badly.

2. Multi-threading generally uses the threading library for IO-intensive concurrent operations. The advantages of multi-threading are fast switching and low resource consumption, but if one thread crashes, all threads are affected, so it is not stable enough.

3. Coroutines generally use the gevent library, though it is rather troublesome to use and therefore not used much; coroutines are used far more in tornado. Coroutines allow tornado to be single-threaded yet asynchronous, and are even said to solve the C10K problem, so coroutines are most commonly used in web applications.

Note: IO-intensive work generally uses multi-threading or multi-processing; CPU-intensive work generally uses multi-processing; when non-blocking asynchronous concurrency is emphasized, coroutines are generally used. Sometimes these need to be combined, for example multi-process plus a thread pool, or other combinations.

98. There is a conflict in the git merge file, how to deal with it?

  1. When git merge reports a conflict, follow the prompts to find the conflicting files and resolve the conflicts; the conflicting regions in a file are marked with <<<<<<<, ======= and >>>>>>>.
  2. After editing, run git add <conflicted-file> for each resolved file.
  3. Run git commit. Note: without the -m option this opens a vim-like editing interface; just tidy the conflict-related lines of the prepared merge message, save, and push directly, since the merge operation has just been performed.

99. Python producer consumer model

Introduction to the producer-consumer model

1. Why use the producer-consumer model?

The producer is the task that produces data, and the consumer is the task that processes it. In concurrent programming, if the producer is fast and the consumer slow, the producer has to wait for the consumer to finish before it can produce more data; likewise, if the consumer's processing capacity exceeds the producer's, the consumer has to wait for the producer. The producer-consumer model was introduced to solve this problem.

2. What is the producer-consumer model

The producer-consumer pattern removes the strong coupling between producers and consumers by means of a container. Producers and consumers do not communicate with each other directly; they communicate through a blocking queue. After producing data, a producer does not wait for a consumer to process it but throws it straight onto the blocking queue, and a consumer does not ask a producer for data but takes it straight from the queue. The blocking queue acts as a buffer, balancing the processing capacity of producers and consumers.

The producer-consumer model is a concurrent programming model that is mainly used to solve resource sharing and communication problems in multi-threads or multi-processes. Using the producer-consumer model can bring the following benefits:

  1. Decoupling producers and consumers: Communication between producers and consumers occurs through message queues or buffers, without direct interaction with each other. This allows producers and consumers to evolve and debug independently, improving the flexibility and maintainability of the code.
  2. Concurrency and flow control: Producers and consumers can execute concurrently, improving system throughput and resource utilization. At the same time, flow control can be achieved by controlling the speed of producers and consumers to prevent resource overload.
  3. Buffering and asynchronous processing: By introducing buffers, the speed difference between producers and consumers can be balanced, thereby improving system performance. Producers can put data into buffers, and consumers can take data out of buffers for processing. This mechanism enables asynchronous processing and decoupling.
  4. Distributed computing: The producer-consumer model can be used to build task scheduling, message passing and other mechanisms in distributed systems. Producers and consumers on different nodes can communicate through message queues or shared storage to achieve distributed computing and collaborative processing.

In general, the producer-consumer model provides an elegant and scalable way to handle resource sharing and communication, improving the concurrency, responsiveness and scalability of a system and making program design more flexible and maintainable.

Producer-consumer model implementations

1. Use threads and locks to implement the producer-consumer model:
import threading
import time

buffer = []  # shared buffer
lock = threading.Lock()  # lock object guarding the buffer

# Producer function
def producer():
    for i in range(10):
        item = f'item {i}'
        print(f'produced {item}')
        with lock:
            buffer.append(item)  # add the item to the buffer under the lock
        time.sleep(1)

# Consumer function
def consumer():
    while True:  # loops forever; a real program would stop via a sentinel or a daemon thread
        with lock:
            if buffer:
                item = buffer.pop(0)  # take an item from the buffer under the lock
                print(f'consumed {item}')
        time.sleep(2)

# Create producer and consumer threads
producer_thread = threading.Thread(target=producer)
consumer_thread = threading.Thread(target=consumer)

# Start the threads
producer_thread.start()
consumer_thread.start()

# Wait for the threads to finish
producer_thread.join()
consumer_thread.join()

In this example, we use a list as the shared buffer and a lock object to make access to the buffer thread-safe.
In the producer function we take the lock with "with lock" and then append an item to the buffer. In the consumer function we likewise take the lock and then pop an item from the buffer to consume it.

By using threads and a lock, we implement the producer-consumer pattern: the producer thread produces items and puts them into the buffer, the consumer thread takes items out of the buffer and consumes them, and the lock guarantees thread safety.

2. Use queues to implement the producer-consumer model:
import threading
import queue
import time

buffer = queue.Queue(5)  # shared buffer with a capacity of 5

# Producer function
def producer():
    for i in range(10):
        item = f'item {i}'
        print(f'produced {item}')
        buffer.put(item)  # put the item into the buffer (blocks while the queue is full)
        time.sleep(1)

# Consumer function
def consumer():
    while True:
        item = buffer.get()  # take an item from the buffer (blocks while the queue is empty)
        print(f'consumed {item}')
        time.sleep(2)

# Create producer and consumer threads
producer_thread = threading.Thread(target=producer)
consumer_thread = threading.Thread(target=consumer)

# Start the threads
producer_thread.start()
consumer_thread.start()

# Wait for the threads to finish
producer_thread.join()
consumer_thread.join()

In this example, we use queue.Queue as the shared buffer.

In the producer function, we use buffer.put(item) to put the item into the buffer. In the consumer function, we use buffer.get() to take out items from the buffer for consumption.

By using a queue, we do not need to manage the buffer size and synchronization manually; we rely instead on the queue's built-in thread safety. Producer threads can put items into the queue and consumer threads can take items out of it without conflict.

By creating producer and consumer threads and starting them, we implement the functionality of the producer consumer pattern.
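One detail worth noting: because the consumer loops forever, consumer_thread.join() never returns in this demo. A common shutdown pattern (a sketch reusing the queue from the example above, with None as a sentinel) lets the consumer exit cleanly:

SENTINEL = None

def producer():
    for i in range(10):
        buffer.put(f'item {i}')
        time.sleep(1)
    buffer.put(SENTINEL)  # tell the consumer that nothing more is coming

def consumer():
    while True:
        item = buffer.get()
        if item is SENTINEL:
            break  # clean exit, so consumer_thread.join() can return
        print(f'consumed {item}')
        time.sleep(2)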

3. Use condition variables to implement the producer-consumer model:
import threading
import time

buffer = []  # shared buffer
buffer_size = 5  # buffer capacity

# Condition variables. Both must share one underlying lock: notify() may only be
# called while holding the condition's lock, and both conditions guard the same buffer.
lock = threading.Lock()
buffer_not_full = threading.Condition(lock)   # "buffer is not full" condition
buffer_not_empty = threading.Condition(lock)  # "buffer is not empty" condition

# Producer function
def producer():
    for i in range(10):
        item = f'item {i}'
        with buffer_not_full:
            while len(buffer) >= buffer_size:
                buffer_not_full.wait()  # wait until the buffer is no longer full
            buffer.append(item)  # add the item to the buffer
            print(f'produced {item}')
            buffer_not_empty.notify()  # wake a waiting consumer thread
        time.sleep(1)

# Consumer function
def consumer():
    while True:
        with buffer_not_empty:
            while len(buffer) == 0:
                buffer_not_empty.wait()  # wait until the buffer is no longer empty
            item = buffer.pop(0)  # take an item from the buffer
            print(f'consumed {item}')
            buffer_not_full.notify()  # wake a waiting producer thread
        time.sleep(2)

# Create producer and consumer threads
producer_thread = threading.Thread(target=producer)
consumer_thread = threading.Thread(target=consumer)

# Start the threads
producer_thread.start()
consumer_thread.start()

# Wait for the threads to finish
producer_thread.join()
consumer_thread.join()

In this example, we use two condition variables, buffer_not_full and buffer_not_empty, built on one shared lock, to control the execution of the producer and consumer threads.

In the producer function, buffer_not_full.wait() blocks while the buffer is full, so the producer thread waits. Once there is room, the producer adds an item to the buffer and calls buffer_not_empty.notify() to wake a waiting consumer thread.

In the consumer function, buffer_not_empty.wait() blocks while the buffer is empty, so the consumer thread waits. Once an item is available, the consumer takes it from the buffer and calls buffer_not_full.notify() to wake a waiting producer thread.

By using condition variables, we can precisely control the execution of producer and consumer threads, and perform wait and wake-up operations at the appropriate time to realize the functions of the producer-consumer pattern.

4. Use semaphores to implement the producer-consumer model:
import threading
import time

buffer = []  # shared buffer
max_size = 5  # maximum buffer capacity

producer_sem = threading.Semaphore(max_size)  # producer semaphore: counts free slots
consumer_sem = threading.Semaphore(0)  # consumer semaphore: counts available items

# Producer function
def producer():
    for i in range(10):
        item = f'item {i}'
        producer_sem.acquire()  # take a free slot (blocks while the buffer is full)
        buffer.append(item)  # add the item to the buffer
        print(f'produced {item}')
        consumer_sem.release()  # signal that one item is available
        time.sleep(1)

# Consumer function
def consumer():
    while True:
        consumer_sem.acquire()  # take an available item (blocks while the buffer is empty)
        item = buffer.pop(0)  # remove the item from the buffer
        print(f'consumed {item}')
        producer_sem.release()  # signal that one slot is free
        time.sleep(2)

# Create producer and consumer threads
producer_thread = threading.Thread(target=producer)
consumer_thread = threading.Thread(target=consumer)

# Start the threads
producer_thread.start()
consumer_thread.start()

# Wait for the threads to finish
producer_thread.join()
consumer_thread.join()

In this example, we use two semaphores, producer_sem and consumer_sem, to control the execution of producer and consumer threads.

In the producer function, we use producer_sem.acquire() to obtain the producer semaphore. When the buffer is full, the producer thread will wait here. When there is space in the buffer, the producer thread adds the item to the buffer and uses consumer_sem.release() to release the consumer semaphore to wake up a waiting consumer thread.

In the consumer function, we use consumer_sem.acquire() to obtain the consumer semaphore. When the buffer is empty, the consumer thread will wait here. When there is an item in the buffer, the consumer thread takes out the item from the buffer and uses producer_sem.release() to release the producer semaphore to wake up a waiting producer thread.

By using semaphores, we can control the pacing and ordering of the producer and consumer threads to implement the producer-consumer pattern. Note that with a single producer and a single consumer the two semaphores are sufficient; with several producers or consumers, the list operations themselves also need a mutex, as in the sketch below.
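A sketch of that extra mutex (producer_step and consumer_step are illustrative helpers, not part of the example above):

mutex = threading.Lock()

def producer_step(item):
    producer_sem.acquire()   # wait for a free slot
    with mutex:              # serialize access to the shared list
        buffer.append(item)
    consumer_sem.release()   # signal that one item is available

def consumer_step():
    consumer_sem.acquire()   # wait for an available item
    with mutex:
        item = buffer.pop(0)
    producer_sem.release()   # signal that one slot is free
    return item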

5. Use coroutines to implement the producer-consumer model:
import asyncio

buffer = []  # shared buffer

# Producer coroutine
async def producer():
    for i in range(10):
        item = f'item {i}'
        print(f'produced {item}')
        buffer.append(item)  # add the item to the buffer
        await asyncio.sleep(1)

# Consumer coroutine
async def consumer():
    while True:  # never exits, so this demo runs until interrupted
        await asyncio.sleep(0)  # yield control so other coroutines can run
        if buffer:
            item = buffer.pop(0)  # take an item from the buffer
            print(f'consumed {item}')
        await asyncio.sleep(2)

# Run both coroutines on the event loop
async def main():
    await asyncio.gather(producer(), consumer())

asyncio.run(main())

In this example, we use two coroutines, producer and consumer, to complete the functions of producer and consumer.

In the producer coroutine, we append each item to the buffer and use await asyncio.sleep(1) to simulate the delay of producing the next one. After each item is added, we print the produced item.

In the consumer coroutine, we use await asyncio.sleep(0) to yield the CPU so that other coroutines get a chance to run. We then check whether the buffer holds an item to consume; if so, we take it out of the buffer and print it. After consuming an item, we wait again for a while.

By using the asyncio library, we can easily create and manage coroutines, and use the event loop to schedule the execution of coroutines. This approach enables efficient asynchronous programming and is suitable for handling IO-intensive tasks.
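In practice, asyncio also ships its own queue, which gives the coroutine version the same blocking-buffer behavior as the thread-based queue example; a minimal sketch:

import asyncio

async def main():
    q = asyncio.Queue(maxsize=5)  # awaitable queue with built-in back-pressure

    async def producer():
        for i in range(10):
            await q.put(f'item {i}')  # suspends while the queue is full
            print(f'produced item {i}')
        await q.put(None)  # sentinel: nothing more is coming

    async def consumer():
        while True:
            item = await q.get()  # suspends while the queue is empty
            if item is None:
                break
            print(f'consumed {item}')

    await asyncio.gather(producer(), consumer())

asyncio.run(main())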

6. Function yield method:
import time

# The producer generates items
def producer():
    for i in range(10):
        item = f'item {i}'
        print(f'produced {item}')
        yield item  # pause here until the consumer asks for the next item
        time.sleep(1)

# The consumer consumes items
def consumer(products):
    for product in products:
        print(f'consumed {product}')
        time.sleep(2)

# Create the producer generator
products = producer()

# Consume the items
consumer(products)

In this example, the producer function is a generator function that generates items via a yield statement and returns them. Generator functions can pause execution each time yield is called, and return the yielded value. In this example, the producer function generates items one at a time and pauses until consumed by the consumer.

The consumer function consumer accepts the produced items as its parameter and consumes them one by one. In this example, we pass the generator object directly to the consumer function, which iterates over the generator and consumes each item in turn.

By using a generator function, we get a pull-style producer-consumer pattern: the producer pauses after yielding each item, and the consumer pulls items as it needs them, avoiding the extra complexity a buffer can bring.
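A related yield-based variant turns the consumer into the coroutine and pushes items into it with send(); a short sketch:

def consumer():
    while True:
        item = yield  # pause here until send() delivers an item
        print(f'consumed {item}')

c = consumer()
next(c)  # prime the generator so it runs to the first yield
for i in range(3):
    c.send(f'item {i}')  # hand each item directly to the paused consumer
c.close()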

100. Detailed explanation of each layer of OSI seven-layer protocol

The OSI seven-layer model (Open System Interconnection reference model) is a standard communication protocol architecture proposed by the International Organization for Standardization (ISO). It divides the functions of a computer network into seven abstraction layers, each with its own communication role. From bottom to top, the seven layers are as follows:

1. Physical Layer

The physical layer is the protocol layer at the bottom of the communication system, and is mainly responsible for converting bit streams into physical signals that can be transmitted on physical media. The physical layer mainly involves issues such as the physical medium of data transmission, mechanical and electrical characteristics, interface standards, transmission rate and data transmission distance.

2. Data Link Layer

The data link layer is built on top of the physical layer. It frames the raw bit stream, defines how network interfaces on the same local network communicate, and specifies how transmission errors on the link are detected and corrected. The data link layer generally consists of two sublayers: Logical Link Control (LLC) and Media Access Control (MAC).

3. Network Layer

The network layer is mainly used for data transmission and routing selection between different networks, and communication is realized through network addresses. The main protocols include IP (Internet Protocol) protocol, ICMP (Internet Control Message Protocol) protocol, IGMP (Internet Group Management Protocol) protocol, OSPF (Open Shortest Path First) protocol, etc.

4. Transport Layer

The transport layer is responsible for managing the quality of network communication and can provide good data transmission services for applications. The main protocols are TCP (Transmission Control Protocol) and UDP (User Datagram Protocol).

5. Session Layer

The session layer is mainly responsible for establishing, managing and terminating session connections, and providing end-to-end data transmission flow control and synchronization services. The main protocols include RPC (Remote Procedure Call) protocol, NCP (NetWare Core Protocol) protocol, etc.

6. Presentation Layer

The presentation layer handles the representation of the exchanged data, such as data compression, encryption and decryption. Typical representations and formats at this layer include ASCII, EBCDIC, JPEG, GIF, etc.

7. Application Layer

The application layer is mainly used to provide network services to users. For example: FTP (File Transfer Protocol) protocol, HTTP (Hyper Text Transfer Protocol) protocol, SMTP (Simple Mail Transfer Protocol) protocol, SSH (Secure Shell Protocol) protocol, DNS (Domain Name System) protocol, etc.

The functions of the seven-layer computer network protocol:

The seven-layer protocol of computer network is a layered communication protocol architecture. Each layer defines different functions to facilitate the development and implementation of standardization and interoperability of network communication. The main functions of each layer are as follows:

1. Physical Layer: Transmits bit streams and defines the electrical and physical characteristics that can be transmitted.

2. Data Link Layer: transmits frames; defines the frame format, detects and controls errors, and handles media access control.

3. Network Layer: transmits packets; implements logical addressing, routing and data transfer between different networks.

4. Transport Layer: End-to-end transmission, providing reliable transmission services, such as TCP protocol.

5. Session Layer: Establishes, manages and terminates session connections, and provides end-to-end data transmission flow control and synchronization services.

6. Presentation Layer: handles the representation of exchanged data, such as data compression, encryption, decryption and other operations.

7. Application Layer: Provides specific application services, such as file transfer, email, Web browsing, etc.

101. Briefly describe the differences, advantages, disadvantages and usage scenarios between TCP and UDP?

1. Summary of the differences between TCP and UDP:

1. TCP is connection-oriented (like dialing to establish a connection before a phone call); UDP is connectionless, i.e. no connection needs to be established before sending data.

2. TCP provides a reliable service: data sent over a TCP connection arrives error-free, without loss or duplication, and in order. UDP is best-effort delivery and does not guarantee reliable delivery. TCP achieves reliable transmission through checksums, retransmission control, sequence numbers, sliding windows and acknowledgements; for example, it retransmits lost packets and restores the order of out-of-order segments.

3. UDP has better real-time behavior and works more efficiently than TCP, making it suitable for communication or broadcasting that demands high-speed transmission and low latency.

4. Each TCP connection is strictly point-to-point; UDP supports one-to-one, one-to-many, many-to-one and many-to-many communication.

5. TCP demands more system resources; UDP demands fewer.

2. Advantages and disadvantages of TCP and UDP

  • Advantages of TCP:
    Reliable and stable. TCP's reliability shows in the three-way handshake used to establish a connection before any data is transmitted, in the acknowledgement, window, retransmission and congestion-control mechanisms used during transmission, and in the connection teardown after the data has been transferred, which saves system resources.
  • Disadvantages of TCP:
    Slow, inefficient, high system-resource usage, and easy to attack. TCP must establish a connection before transferring data, which costs time, and during transfer the acknowledgement, retransmission and congestion mechanisms also consume a lot of time; in addition, every device must maintain all of its transport connections, and each connection occupies CPU, memory and other hardware resources. Moreover, because TCP has acknowledgement and three-way-handshake mechanisms, it is relatively easy to exploit for DoS, DDoS, CC and similar attacks.
  • Advantages of UDP:
    Fast, and slightly safer than TCP. UDP has none of TCP's handshake, acknowledgement, window, retransmission or congestion-control mechanisms; it is a stateless transport protocol, so it transfers data very quickly. Lacking those mechanisms, UDP also exposes fewer loopholes for attackers to exploit than TCP does.
  • Disadvantages of UDP:
    UDP still cannot avoid attacks entirely; it is unreliable and unstable, because it lacks TCP's reliability mechanisms, and when network quality is poor, packets are easily lost during transmission.

When should you use TCP:

When there are requirements on communication quality, for example the entire data must reach the other side accurately. This is typical of reliability-sensitive applications: QQ, browsers, HTTP, HTTPS, file transfer protocols such as FTP, and mail protocols such as POP and SMTP.

When should you use UDP:

When communication quality requirements are lower and the network communication needs to be as fast as possible, UDP can be used, for example QQ voice, QQ video, and TFTP.
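The difference is visible in Python's socket API: TCP must establish a connection before data flows, while UDP simply addresses each datagram (a sketch that assumes some server is listening locally on port 9000):

import socket

# TCP: connection-oriented; the three-way handshake happens inside connect()
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.connect(('127.0.0.1', 9000))
tcp.sendall(b'hello over tcp')
tcp.close()

# UDP: connectionless; no handshake, each datagram carries its destination address
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.sendto(b'hello over udp', ('127.0.0.1', 9000))
udp.close()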

102. Three handshakes and four waves

Three-way handshake process:

Professional version:
  • 1. First, the client sends a segment with the SYN flag and a randomly generated sequence number, say seq=100, to the server.
  • 2. After receiving it, the server returns a segment carrying its own SYN (seq=200) and ACK=101 (the client's sequence number + 1).
  • 3. The client then sends a segment with the ACK flag set and ACK=201 (the server's sequence number + 1). At this point the three-way handshake is complete, and the client begins to send data to the server.
Plain-language version:
  • 1. The client initiates a request to the server: I want to communicate with you, are you ready?
  • 2. After receiving the request, the server responds to the client: I'm OK, are you ready?
  • 3. The client politely replies to the server again: Ready, let's start communicating!
Phone version:

The whole process is exactly the same as making a phone call:

  • 1 Hello, are you there?
  • 2 Here, can you hear what I said?
  • 3 Well, you can hear it (please start your performance next)
Supplement: SYN: synchronization / connection request; ACK: acknowledgement / response.

Four wave process:

Since a TCP connection communicates in both directions (full duplex), each direction must be closed separately; the four waves below are simply a verbal description of how this is done. Either the client or the server may start the disconnection.

  • 1. The client sends a segment with the FIN flag to the server, requesting that the communication be closed.
  • 2. After receiving it, the server responds with ACK, agreeing to close the client's side of the connection.
  • 3. The server then sends its own segment with the FIN flag to the client, likewise requesting that the communication be closed.
  • 4. The client responds to the server with ACK, agreeing to close the server's side of the connection.

103. The life cycle of Django request refers to:

What happens in the Django background between when the user enters the URL on the browser and when the user sees the web page.

To put it bluntly, it is the execution track Django follows from the moment a request arrives to the moment the response leaves.

A complete Django life cycle:

  1. After the user sends a request from the client, the data is first parsed and encapsulated according to the HTTP protocol.
  2. The request then reaches Nginx (nginx listens on a port of the public IP; after receiving the request, if it is for a static resource, nginx fetches the resource and returns it to the user directly; if it is dynamic, nginx forwards the request to uWSGI, usually over the uwsgi protocol).
  3. After uWSGI receives the request, it converts it from the HTTP protocol to the WSGI protocol and communicates with Django.
  4. At this point the request truly reaches the backend and passes through Django's first stage: the middleware. (Middleware is, simply put, processing that Django adds on the way in and on the way out; every request and response passes through it first, so middleware can be understood as an additional functional component Django provides.)
  5. After the middleware, the request reaches Django's second stage: the routing layer (urls.py), which matches the request path against the configured URL patterns.
  6. The matched address leads to Django's third stage: the view layer (views.py), where the corresponding view function or view-class attribute is found.
  7. The view then goes to the fourth stage: the model layer (models.py) fetches data from the database through ORM operations and returns it to the view layer (views.py), which processes the data (serialization and deserialization).
  8. The processed data is handed to Django's fifth stage: the template layer (Templates). The template layer renders the data, which then passes back through the view layer, the routing layer, the middleware, the uWSGI server and the Nginx proxy.
  9. Finally, the rendered data is returned to the client for display.

  1. The browser initiates a request to the django server

  2. Routing system (url.py)

    1. Match url through regular expression
    2. Hand over to the view system (views.py), call the corresponding method to process the data
  3. middleware

    1. Take effect globally (all methods need to be processed by middleware first)
  4. View system (views.py)

1. Distinguish the handling according to the request method (GET/POST) where necessary
    2. Data processing
    3. Return results
  5. templates

    1. Return directly to the page
    2. Return to page after data processing
  6. Return to the browser layer by layer

To put it simply, the Django life cycle is: front-end request -> nginx -> uWSGI -> middleware -> URL routing -> view -> ORM -> data returned to the view -> the view renders the data into the template to produce a string -> middleware -> uWSGI -> nginx -> front-end rendering.

The life cycle of Django requests is mainly divided into the following four stages:

  1. Initialization of WSGI applications
  2. URL route matching
  3. Execution of Django View functions
  4. Return response

Next, we will explain these four stages in detail one by one.

  1. Initialization of WSGI applications

When the WSGI server starts a Django application, it will call Django's own WSGI application processor to load the application into memory and set some global variables in the memory, such as settings, middleware, etc. These global variables can be used both by the Django application and by the application's middleware.

Among them, settings is the most important global object in a Django application. It contains all of the application's configuration apart from URL routing, such as database connection settings, template engine settings and the debug switch. middleware components provide additional functionality and can extend the request and response around the view function.

The following is a code snippet for initializing a WSGI application:

import django
from django.core.handlers.wsgi import WSGIHandler

def get_wsgi_application():
    django.setup(set_prefix=False)  # load settings and populate the app registry
    return WSGIHandler()

The get_wsgi_application() function is used to instantiate a WSGI application object and return the WSGI request handler object. Django performs important initialization operations here.

  2. URL route matching

Once the WSGI application is set up in memory, it can serve views, i.e. map URLs to their corresponding handlers. In Django, URLs are defined and managed in a file called urls.py. When a client request arrives, its path is matched against the patterns in the urlpatterns variable and dispatched to the corresponding callable view function.

The following is an example of a URL mapper:

from django.urls import path
from . import views

urlpatterns = [
    path('about/', views.about),
    path('contact/', views.contact),
]

Here, the URL "/about/" and "/contact/" paths will use views.about and views.contact functions to handle the request.

When matching URLs, Django tries each URL rule in order and uses the first one that matches. If no ordinary rule matches the requested URL path, you can define a catch-all rule using the <path:...> converter inside the route string passed to the path() function.

urlpatterns = [
    path('about/', views.about),
    path('contact/', views.contact),
    path('<path:slug>/', views.page_not_found),
]

In this example, Django first tries to match "/about/" and "/contact/"; any request that matches neither of them falls through to the catch-all pattern and is handled by the views.page_not_found function.

  3. Execution of Django View functions

If Django can correctly map the request URL to a view function, the view function is executed. View functions are the controllers of Django's MVC (strictly, MTV) model; they handle the request and return a response.

Here is an example of a view function:

from django.shortcuts import render

def index(request):
    return render(request, 'index.html')

In this example, the index view function will render the index.html template and return the result to the client as an HTTP response.

Django passes the request object as the first argument to the view function; this object provides methods and attributes for accessing the request data. View functions can also access request parameters and project-level objects such as settings.
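For instance, a view can branch on request.method and read query-string parameters from request.GET (a hypothetical view for illustration):

from django.http import JsonResponse

def search(request):
    if request.method == 'GET':
        keyword = request.GET.get('q', '')  # query-string parameter ?q=...
        return JsonResponse({'keyword': keyword})
    return JsonResponse({'error': 'GET only'}, status=405)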

  4. Return response

When the view function has handled the request and generated a response, Django sends the response back to the client (usually the browser). Django can generate text-based responses (HTML, JSON, XML, etc.) or non-text responses (for example, a file response).

Django provides a set of response classes and allows HTTP response headers to be customized. By default, Django uses HttpResponse, which accepts the following two kinds of parameters:

  • content (required) - A string generated by the view function.
  • content_type (optional) - The content type of this response, such as 'text/html' or 'application/json'.
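A short sketch of constructing an HttpResponse and setting a custom response header (the view and header value are illustrative):

from django.http import HttpResponse

def index(request):
    response = HttpResponse('<h1>hello</h1>', content_type='text/html')
    response['Cache-Control'] = 'no-store'  # headers are set by item assignment
    return response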
