Python algorithm interview questions

These are interview questions I collected while working as a technical interviewer. They cover Python basics, machine learning, NLP, CV, deep learning frameworks, Linux, and YOLO; ask candidates the questions that match their specialty.
They are mostly standard interview talking points. I collect and record them here so I can reuse them in the future.

interview questions

Python basics:

1. Sum the integers from 1 to 100 in one line of code.
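A minimal one-liner using the built-in sum() and range():

total = sum(range(1, 101))  # 5050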

2. Talk about Python's GIL.
The GIL is Python's global interpreter lock. When multiple threads run inside one process, only the thread that currently holds the interpreter lock can execute Python code; the other threads in the process must wait until it finishes. If the running thread hits a time-consuming (blocking) operation, the interpreter lock is released so that other threads can run. Therefore, with multithreading, threads still run one after another rather than truly at the same time.
With multiprocessing, each process is allocated its own resources by the operating system, which is equivalent to each process having its own Python interpreter, so multiple processes really can run simultaneously. The drawback is that processes have a larger system resource overhead.
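An illustrative sketch (names and sizes are made up) of the usual consequence: CPU-bound work does not speed up with threads because of the GIL, but does with processes.

import time
from threading import Thread
from multiprocessing import Process

def cpu_bound(n=10_000_000):
    # pure-Python loop; threads running this contend for the GIL
    total = 0
    for i in range(n):
        total += i
    return total

def run(label, worker_cls):
    start = time.time()
    jobs = [worker_cls(target=cpu_bound) for _ in range(4)]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
    print(label, "took", round(time.time() - start, 2), "s")

if __name__ == "__main__":
    run("threads", Thread)        # roughly serial because of the GIL
    run("processes", Process)     # can use multiple CPU cores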

3. In one sentence: what kind of language can use decorators?
Any language in which functions can be passed as arguments (i.e. functions are first-class objects) can use decorators.
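A minimal decorator sketch (the timer name is just for illustration):

import time
import functools

def timer(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        print(func.__name__, "took", round(time.time() - start, 4), "s")
        return result
    return wrapper

@timer
def work():
    time.sleep(0.1)

work()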

4. Briefly describe opening and processing files with the with statement. What does it do for us?
Exceptions may occur while opening a file for reading or writing. If we use the plain open() style, we need try/except/finally to handle them, and whatever happens we must call f.close() in the finally block to close the file. The with statement performs that finally f.close() for us.
(It can also run other custom setup/teardown logic; if you are interested, study the context-manager protocol behind the with statement.)
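A short sketch comparing the two styles (the filename is hypothetical):

# manual cleanup
f = open("data.txt")
try:
    content = f.read()
finally:
    f.close()

# with a context manager, close() is called automatically,
# even if an exception is raised inside the block
with open("data.txt") as f:
    content = f.read()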

5. The difference between copy and deepcopy in Python
1. Copying an immutable data type (a number, a string, a tuple) gives the same address whether you use copy or deepcopy: the id of the copy equals the id of the original, just as with plain "=" assignment.
2. For mutable objects (lists and dictionaries) there are two cases. First case: the object has no nested mutable sub-objects. Changing the original does not affect the shallow copy, and changing the shallow copy does not affect the original; the id of the shallow copy differs from that of the original. Second case: the object contains nested mutable sub-objects (for example, an element of a list is itself a list). Changing that nested sub-object in the original also changes the shallow copy. A deep copy (deepcopy) is completely independent, including inner lists and dictionaries.
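A short sketch of the nested-object case:

import copy

original = [1, [2, 3]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)

original[1].append(4)     # mutate the nested list
print(shallow)            # [1, [2, 3, 4]] -- shares the inner list with the original
print(deep)               # [1, [2, 3]]    -- fully independent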



6. Please change [i for i in range(3)] to a generator
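Replacing the square brackets with parentheses turns the list comprehension into a generator expression:

gen = (i for i in range(3))
print(next(gen))    # 0
print(list(gen))    # [1, 2]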

9. What is the difference between static methods and class methods?
A Python class is largely syntactic sugar: writing a function inside a class is not fundamentally different from writing it outside; the real difference is the first parameter. An instance method takes the instance as its first parameter (self), a class method takes the class as its first parameter (cls), and a static method takes no extra implicit parameter, which is how the three are distinguished.
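A minimal sketch of the three kinds of methods (the class is made up):

class Demo:
    def instance_method(self):      # receives the instance as the first argument
        return ("instance", self)

    @classmethod
    def class_method(cls):          # receives the class as the first argument
        return ("class", cls)

    @staticmethod
    def static_method():            # receives no implicit first argument
        return "static"

d = Demo()
print(d.instance_method()[0], Demo.class_method()[0], Demo.static_method())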

1. Briefly describe the three major characteristics of object-oriented programming.
Encapsulation: putting a set of data attributes and methods into one container, the object, so that callers access them through the object with the "." operator.
Inheritance: a subclass can inherit the data attributes and methods of its parent class, and can use or override them.
Polymorphism: in Python, polymorphism means giving multiple classes with similar data attributes and methods a unified naming convention (the same interface), which improves code consistency for developers and makes the code easier for callers to understand.

Python machine learning:

1. What is the difference between supervised and unsupervised learning?
Supervised learning: learn from labeled training samples so as to classify or predict data outside the training set as well as possible. (LR, SVM, BP, RF, GBDT)
Unsupervised learning: train on unlabeled samples in order to discover structural knowledge in them. (KMeans, DL)

4. The difference between linear and nonlinear classifiers, and their pros and cons
If the model is a linear function of its parameters and has a linear decision surface, it is a linear classifier; otherwise it is not.
Common linear classifiers: LR, Bayesian classification, single-layer perceptron, linear regression.
Common nonlinear classifiers: decision tree, RF, GBDT, multi-layer perceptron.
SVM can be either (it depends on whether a linear kernel or a Gaussian kernel is used).
Linear classifiers are fast and easy to implement, but they may not fit the data very well.
Nonlinear classifiers are more complicated to implement, but have stronger fitting ability.

5. Why do some machine learning models need to normalize the data?
Normalization limits the data to be processed (through some transformation) to a required range.

1) After normalization, gradient descent finds the optimal solution faster. The contour lines become rounder, so gradient descent converges more quickly; without normalization the descent path tends to zigzag and converges slowly or not at all.

2) Turning dimensioned features into dimensionless ones can improve accuracy. Some classifiers need to compute distances between samples (e.g. Euclidean distance), such as KNN. If one feature has a very large value range, the distance is dominated by that feature, which may contradict the actual situation (for example, the feature with the small range may actually be the more important one). Models such as logistic regression also assume that the data follows a normal distribution.

7. Which machine learning algorithms do not need to be normalized?
Probabilistic models do not need normalization, because they care about the distribution of the variables and the conditional probabilities between them rather than the variables' raw values, e.g. decision trees and RF. Optimization- or distance-based methods such as AdaBoost, GBDT, XGBoost, SVM, LR, KNN and KMeans do need normalization.

8. The difference between standardization and normalization
In simple terms, standardization processes the data along the columns of the feature matrix (per feature): it converts the feature values of the samples onto the same scale by computing z-scores. Normalization processes the data along the rows of the feature matrix (per sample): its purpose is to give sample vectors a common standard when computing dot-product similarity or other kernel functions, i.e. they are all converted into "unit vectors".
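A small scikit-learn sketch (assuming scikit-learn is installed; the data is made up) showing the column-wise vs row-wise distinction:

import numpy as np
from sklearn.preprocessing import StandardScaler, Normalizer

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

X_std = StandardScaler().fit_transform(X)   # per column: zero mean, unit variance
X_norm = Normalizer().fit_transform(X)      # per row: each sample scaled to unit L2 norm

print(X_std.mean(axis=0))                   # approximately [0, 0]
print(np.linalg.norm(X_norm, axis=1))       # [1, 1, 1]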

15. Gradient boosting algorithm (GBM) and random forest are both tree-based algorithms. What is the difference between them?
The most fundamental difference is that random forest uses bagging to make predictions while GBM uses boosting. In bagging, n samples are drawn from the dataset by random sampling, a single learning algorithm is trained on each sample, and the resulting predictions are combined by voting or averaging; bagging runs in parallel. Boosting, after the first round of predictions, increases the weight of misclassified examples so they can be corrected in subsequent rounds; this sequential process of giving high weight to misclassified predictions continues until a stopping criterion is reached. Random forest improves accuracy mainly by reducing variance: the generated trees are decorrelated to maximize the variance reduction. GBM, on the other hand, improves accuracy by reducing both the bias and the variance of the model.

NLP interview questions

6: The difference between CRF and HMM.
1. The HMM model relies on the hidden Markov assumption while CRF does not, so HMM is much faster to compute than a CRF model and suits scenarios with high demands on prediction speed. 2. Also because of the Markov assumption, when a hidden state in the sequence depends on more than just the previous state, the accuracy of HMM drops sharply, whereas CRF has no such limitation and its accuracy is noticeably higher than HMM's.

7: How is a CRF trained? What is the loss function?
(1) The CRF layer can learn constraint rules from the training data.
(2) The CRF layer can add constraints to the final predicted labels to ensure that the predicted label sequence is valid.
These constraints are learned automatically by the CRF layer during training, which greatly reduces the probability of producing invalid label sequences at prediction time.
Loss function:
The CRF loss function is built from the score of the true path and the total score of all possible paths.
(1) For the output label sequence y corresponding to the input sequence X, define a score (essentially the cumulative sum of the emission scores and the transition scores). (2) Use the softmax function to define a probability for each label sequence y; in training it is only necessary to maximize the likelihood p(y|X), and in practice the log-likelihood is used.
(3) The true path should have the highest score among all possible paths. During training, the model parameters are updated again and again to keep increasing the share of the total score taken by the true path.
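For reference, the standard linear-chain CRF formulation behind this description (E is the emission score and T the transition score; the notation is not from the original text):

score(X, y) = sum_{i=1..n} E(x_i, y_i) + sum_{i=2..n} T(y_{i-1}, y_i)

P(y | X) = exp(score(X, y)) / sum_{y'} exp(score(X, y'))

Loss = -log P(y_true | X) = log( sum_{y'} exp(score(X, y')) ) - score(X, y_true)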


5: How does the batch_size parameter affect the training of the model? Convergence speed? Overfitting? How did you solve the corresponding problems, and with what methods?
Because the gradient is averaged over the batch, a large batch_size reduces training time and improves stability. However, too large a batch_size hurts generalization. Within a certain range, increasing the batch size helps convergence stability, but as it keeps growing, model performance drops and convergence slows down. Through its effect on the number of training steps, a small batch_size increases the number of iterations and reaches the fitting point early; if the epochs are not over and the model keeps learning from the training data, it easily overfits the original data.
Solutions: add BN layers, add dropout, or change the gradient-descent optimization method.

4: We keep saying ReLU is better than sigmoid, but can you tell me what is wrong with ReLU? Please explain it from the underlying mathematics.
Answer: Although ReLU is better than sigmoid, ReLU also has an unavoidable drawback, known as the "dying ReLU" problem: during training some neurons "die", i.e. they stop outputting anything other than 0. In some cases you may find that half of the network's neurons have died, especially when using a large learning rate. During training, if a neuron's weights are updated such that the weighted sum of its inputs becomes negative, it starts outputting 0. Once this happens, since the gradient of ReLU is 0 for negative inputs, the neuron can only keep outputting 0.

3: Please explain the difference between BERT's attention and seq2seq's attention.
Answer: BERT is built on the Transformer, so its attention mechanism is multi-head self-attention. Compared with traditional RNN-style models it is more efficient, can be parallelized, captures long-distance semantic and structural dependencies at the same time, and extracts richer features. In seq2seq, all the information on the encoder side is compressed into a fixed-length semantic vector, and this single fixed vector represents the whole encoder side. This not only loses information but also prevents the decoder from focusing on the more important parts while decoding, and the processing cannot be parallelized.

2: Why can BERT handle polysemy (the same word with multiple meanings)? How does it do that?
Answer: When the same word is converted into BERT's input, its embedding vector is the same, but after passing through BERT's multi-layer Transformer encoders the attention it receives differs: different sentence contexts lead to different output vectors for the same word, which is how the polysemy problem is handled.

(1) Introduce dropout, why it can prevent overfitting

6. Which of the following techniques can be used to calculate the distance between two word vectors?
B. Euclidean Distance
C. Cosine Similarity
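A small NumPy sketch computing both metrics (the example vectors are made up):

import numpy as np

v1 = np.array([0.1, 0.3, 0.5])
v2 = np.array([0.2, 0.1, 0.4])

euclidean = np.linalg.norm(v1 - v2)
cosine = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

print(euclidean, cosine)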

56. What are stop words in NLP?
Stop words are the common words that appear in a sentence; they act as connectors and help keep the sentence grammatically correct, but carry little content of their own. In simple terms, the words filtered out before natural language data is processed are called stop words, and removing them is a common preprocessing step.

59. When processing text data, what advantages does an RNN have over a CNN?
Traditional text-processing approaches generally use a TF-IDF vector as the feature input, which loses the order of the words in the input text.
A CNN generally receives a fixed-length vector as input and converts the original input into a fixed-length representation through sliding windows plus pooling. This captures some local features in the text, but long-distance dependencies between two words are hard to learn.

An RNN handles variable-length, ordered text input well: it encodes the useful information read so far into its state variables, giving it a certain memory ability.
In a text-classification task, the activation function f can be Tanh or ReLU, and g can be Softmax.
By continually minimizing the loss (the distance between the output y and the true class), the recurrent network can be trained to predict the text category accurately. Compared with a CNN, an RNN can model sequence information and therefore often gives more accurate results.

60. Why does an RNN suffer from vanishing or exploding gradients? What improvements exist?
An RNN is trained with the BPTT (Back Propagation Through Time) algorithm, which is really a simple variant of BP. The original intention of the RNN design was to capture dependencies between distant inputs, but training with BPTT fails to capture long-distance dependencies; this stems from the vanishing-gradient problem in deep neural networks.

As the prediction error is backpropagated through each layer of the network, when the largest eigenvalue of the Jacobian matrix is greater than 1, the gradient at each layer grows exponentially the farther it is from the output, causing gradient explosion. Conversely, if the largest eigenvalue is less than 1, the gradient shrinks exponentially, causing gradient vanishing. Vanishing gradients mean that adding more layers no longer improves prediction: only the few layers close to the output really learn, so the RNN can hardly learn long-distance dependencies in the input sequence.


Gradient explosion can be mitigated by gradient clipping: when the norm of the gradient exceeds a given threshold, the gradient is rescaled proportionally. The vanishing-gradient problem requires improving the model itself. Deep residual networks improve feedforward networks: residual learning alleviates vanishing gradients and makes it possible to learn deeper representations. For RNNs, models such as LSTM and its variant GRU add gating mechanisms and greatly alleviate the loss caused by vanishing gradients.

In CNNs, using the ReLU activation function also effectively mitigates vanishing gradients and gives better convergence speed and results.
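A self-contained PyTorch sketch of gradient clipping in one RNN training step (the network sizes, data and max_norm value are all illustrative):

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 2)
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.1)
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 10, 8)             # dummy batch: (batch, seq_len, features)
y = torch.randint(0, 2, (4,))         # dummy labels

out, _ = rnn(x)
loss = criterion(head(out[:, -1]), y) # classify from the last time step

optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(rnn.parameters(), max_norm=5.0)  # rescale if the norm exceeds 5
optimizer.step()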

64. Among the common probabilistic graphical models, which ones are generative models and which ones are discriminative models?

First we need to be clear about the difference between generative and discriminative models.
Suppose the set of observable variables is X, the set of variables to predict is Y, and the set of other variables is Z. A generative model models the joint probability distribution P(X, Y, Z); given the observed set X, the inference about Y is obtained by computing marginal distributions.

A discriminative model directly models the conditional probability distribution P(Y, Z | X); marginalizing out the irrelevant variables Z then gives the prediction for Y. The common probabilistic graphical models include naive Bayes, the maximum-entropy model, Bayesian networks, hidden Markov models, conditional random fields, pLSA, LDA, etc.

Among them, naive Bayes, Bayesian networks, pLSA and LDA are generative.
The maximum-entropy model is discriminative.
Hidden Markov models and conditional random fields model sequence data; the hidden Markov model is generative, and the conditional random field is discriminative.

CV interview questions

3. Is a neural network a generative model or a discriminative model?
It is a discriminative model: it directly outputs the class label, or the class posterior probability p(y|x).

5. What are the main methods of model compression?
(1) At the model-structure level: model pruning, model distillation, NAS (automatically learned architectures), etc.
(2) Quantization of model parameters, e.g. reducing numerical precision to FP16.
Note: many pruning ideas appear in lightweight networks; for example, MobileNetV3 uses group convolutions and depthwise separable convolutions and redesigns the computationally expensive layers at the end of the network.
Model distillation is a form of transfer learning.
Reducing the parameter count is also reflected in MobileNetV3, which reduces the number of convolution kernels at the head of the network.

11. The principle of dropout, why can it prevent overfitting?
The principle of dropout is to set the activation of each neuron to zero with a certain probability during the forward pass, which makes the model generalize better.

Forward: during training, sample a 0/1 mask matrix from a Bernoulli distribution, multiply it element-wise with the input to get the dropped-out result, then divide by (1 - p); at test time dropout is not used.
Backward: during training, gradients are computed according to the same mask; at test time there is no dropout.

Why can dropout prevent overfitting?
1. Dropout is effectively an averaging-based ensemble (ensembles come in two flavors, averaging and voting). Temporarily dropping part of the neurons with some probability is like training many different sub-networks.
2. Dropout also removes co-adaptation between neurons, so the network's output does not depend on the fixed contribution of particular hidden units, which makes the model more robust.
3. It is analogous to biological evolution: changes in the environment do not have a devastating effect on the species.
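A small NumPy sketch of inverted dropout at training time (p is the drop probability; the input is made up):

import numpy as np

def dropout_forward(x, p=0.5, training=True):
    if not training:
        return x                               # no dropout at test time
    mask = (np.random.rand(*x.shape) > p)      # Bernoulli 0/1 mask
    return x * mask / (1.0 - p)                # rescale so the expected value is unchanged

x = np.ones((2, 4))
print(dropout_forward(x, p=0.5))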

15. Briefly describe data augmentation methods.
They are mainly divided into offline augmentation and online augmentation.
Offline augmentation means the dataset is processed and stored locally in advance.
Online augmentation: flip (horizontal, vertical), rotate, scale, crop, translate, add noise, etc.; a small sketch follows below.
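A minimal online-augmentation pipeline with torchvision (assuming torchvision is installed; the parameter values are illustrative):

from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])
# augmented = train_transform(pil_image)   # applied on the fly to each PIL image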

17. How to understand convolution, pooling, and fully connected layers.
The role of convolution: it acts like a filter, extracting different features of the image and producing feature maps.
The role of the activation function: introducing non-linearity.
The role of pooling: 1. Reduce the size of the feature maps, make the features more manageable, and reduce the number of parameters, thereby controlling overfitting and increasing the network's robustness to slightly transformed images; 2. Provide a degree of invariance, including translation, rotation and scale; 3. It makes the gradient sparse and loses information, which is why GANs usually use strided convolutions for downsampling instead of pooling.
The role of the fully connected layer: classification and regression on the extracted features.

18. The function of the convolution kernel of 1x1 size.
By controlling the number of kernels, it can increase or reduce the channel dimension, thereby reducing the number of model parameters.
It allows normalization (BN) and additional non-linearity (ReLU) to be applied across channels.
It fuses features from different channels.

19. Characteristics of common activation functions
Activation functions are divided into two categories: saturating and non-saturating.
Representative saturating activation functions are sigmoid and tanh. Their characteristics: slow convergence and prone to vanishing gradients.
The characteristics of non-saturating activation functions: fast convergence, suppression of vanishing gradients, and suppression of overfitting.
sigmoid: computationally expensive; the gradient vanishes; the outputs are not zero-centered, which shifts the data distribution.
tanh: computationally expensive; vanishing gradients are less severe than with sigmoid.
ReLU: simple to compute; effectively prevents vanishing and exploding gradients; but neurons can die.
LeakyReLU: solves the dying-neuron problem, but introduces an extra parameter a.
ELU: avoids dying neurons and is continuous everywhere, which speeds up SGD, but is more expensive to compute.

23. Why is gradient descent, rather than Newton's method, usually used in machine learning?
1. Newton's method requires second-order derivatives (the Hessian); in high dimensions this matrix is very large, so both computation and storage are a problem.
2. With small batches, Newton's method's curvature estimate is too noisy.
3. For non-convex objective functions, Newton's method is easily attracted to saddle points and local maxima.

32. SGD, Momentum, Adagard, Adam
First, because of pathological curvature, training slows down and can get stuck in local minima, which is why the following optimizers exist.
See my blog post for details.
1. During training, SGD can easily pick mislabeled samples, or samples very different from normal data, and the gradient computed from such data has a large deviation, so SGD behaves very randomly during training. SGD performs only one update at a time, has no redundancy, and new samples can be added online, but its updates can oscillate severely.
Hence mini-batch gradient descent.
2. In the plain stochastic gradient algorithm, the step size of every step is fixed, whereas in the momentum method how far each step goes depends not only on the current gradient but also on the accumulated past velocity; the velocity v accumulates the gradients of the previous rounds. Momentum accelerates SGD and suppresses oscillation: dimensions whose gradient direction does not change are updated faster, and dimensions whose gradient direction keeps changing are updated more slowly, which speeds up convergence and reduces oscillation, but it requires some prior knowledge.
3. The earlier stochastic gradient and momentum algorithms use a single global learning rate, and all parameters are updated at the same pace. AdaGrad is actually very simple: it accumulates the squares of the historical gradients of each dimension and divides the update by that accumulated value. AdaGrad therefore makes larger updates for low-frequency parameters and smaller updates for high-frequency parameters, so it performs well on sparse data. Its disadvantage is that its learning rate keeps shrinking and eventually becomes very small.
4. Although AdaGrad has good theoretical properties, it does not perform very well in practice, fundamentally because the learning rate decays too quickly as training progresses. RMSProp introduces a decay factor on top of AdaGrad.
5. Momentum accelerates the search toward the minimum, while RMSProp damps the search in oscillating directions. As its name suggests, Adam is a refined combination of momentum and RMSProp, and it is currently the most popular optimization method in deep learning.

If the data is sparse, use an adaptive method such as Adagrad, Adam or RMSprop. In most cases Adam achieves good results, but with good initialization SGD can converge faster and also find the minimum.
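A small PyTorch sketch showing how these optimizers are instantiated and swapped (the model and hyperparameters are illustrative):

import torch
import torch.nn as nn

model = nn.Linear(10, 2)

opt_sgd = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)   # SGD with momentum
opt_adagrad = torch.optim.Adagrad(model.parameters(), lr=0.01)
opt_rmsprop = torch.optim.RMSprop(model.parameters(), lr=0.001, alpha=0.99)
opt_adam = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))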

PyTorch interview questions

Eight. What is the difference between train and eval in PyTorch?
(1) model.train() is used during training: Dropout is active and BatchNorm uses the statistics of the current batch.
(2) model.eval() is used for validation and testing: Dropout is disabled and BatchNorm uses its running statistics instead of batch statistics.

Note that neither mode turns off gradient computation; to skip gradient tracking during inference, wrap the forward pass in torch.no_grad().

8. What visualization tools are used

9. How to use the GPU to train the model
Using the GPU to accelerate a model is very simple: just move the model and the data to the GPU. The core code is only a few lines:
# define the model
...
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device) # move the model to cuda
# train the model
...
features = features.to(device) # move data to cuda
labels = labels.to(device) # or labels = labels.cuda() if torch.cuda.is_available() else labels

10. How to check GPU usage, storage status
nvidia-smi

11. The difference between concat and stack in tensor operations.
Similar to NumPy, torch.cat and torch.stack can be used to merge multiple tensors: cat joins them along an existing dimension, while stack creates a new dimension; torch.split (or torch.chunk) can be used to split one tensor into several.
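A quick sketch of the difference (shapes shown in the comments):

import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 3)

cat_dim0 = torch.cat([a, b], dim=0)     # shape (4, 3): joins along an existing dimension
stacked = torch.stack([a, b], dim=0)    # shape (2, 2, 3): inserts a new dimension

print(cat_dim0.shape, stacked.shape)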

12. How to deal with low GPU usage during neural network training.

75. Commonly used functions in PyTorch
1. DataLoader() to load data
2. Initialize weights and biases
3. torch.optim optimization methods and learning-rate schedulers
4. Loss functions in torch.nn
5. Conv, pooling, ReLU, dropout, FC layers, etc.
6. Backpropagation: loss.backward()
7. Save a model with torch.save, load a model with torch.load

TensorFlow interview questions

The difference between TensorFlow 1.x and 2.x

(Figure: comparison of TensorFlow 1.x and 2.x; the original image is unavailable.)

Autograph is a mechanism of TensorFlow 2.0.

Dynamic computation graphs are relatively inefficient to execute.

You can use the @tf.function decorator to convert an ordinary Python function into the static-graph construction code that corresponds to TensorFlow 1.0.

In TensorFlow 1.0, using a computation graph takes two steps: first define the graph, then execute it in a session.

In TensorFlow 2.0, if the computation graph is used in the Autograph style, the first step (defining the graph) becomes defining a function, and the second step (executing the graph) becomes calling that function.

There is no need to use sessions any more; everything is as natural as ordinary Python syntax.

In practice, we usually debug the code with the dynamic graph first, and then use @tf.function to switch to Autograph where higher performance is needed.
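A minimal sketch of this workflow (assuming TensorFlow 2.x is installed; the function is made up):

import tensorflow as tf

@tf.function                    # traced into a static graph on first call
def scaled_sum(x, y):
    return tf.reduce_sum(x + y) * 2.0

print(scaled_sum(tf.constant([1.0, 2.0]), tf.constant([3.0, 4.0])))   # tf.Tensor(20.0, ...)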

What is the difference between tf.print() and print()

The Autograph mechanism can convert dynamic graphs into static computation graphs, combining the benefits of execution efficiency and coding convenience.

Of course, the code that the Autograph mechanism can convert is not unconstrained; there are coding rules to follow, otherwise the conversion may fail or not behave as expected.

1. Functions decorated with @tf.function should use TensorFlow functions rather than other Python functions as much as possible. For example, use tf.print instead of print, tf.range instead of range, and tf.constant(True) instead of True.

(Explanation: Python functions are only used during the tracing stage, when the static graph is created. Ordinary Python functions cannot be embedded into the static computation graph, so when the function is called again after the graph has been built, these Python functions are not executed, whereas TensorFlow functions can be embedded into the graph. Using ordinary Python functions therefore makes the output under eager execution (before adding @tf.function) inconsistent with the output under static-graph execution (after adding @tf.function).)

2. Avoid defining tf.Variable inside a function decorated with @tf.function.

(Explanation: If tf.Variable is defined inside the function, then under eager execution a new tf.Variable is created every time the function is called, but under static-graph execution the variable is created only once, when the Python code is first traced to build the graph. This makes the output of eager execution before adding @tf.function inconsistent with that of static-graph execution after adding it; in practice TensorFlow usually reports an error in this case.)

3. Functions decorated with @tf.function cannot modify Python data-structure variables outside the function, such as lists or dictionaries.

(Explanation: The static computation graph is compiled into C++ code and executed in the TensorFlow kernel. Python lists, dictionaries and other data structures cannot be embedded into the graph; they can only be read while the graph is being created and cannot be modified while the graph is executing.)

Linux interview questions

6. List several commonly used Linux commands.
Answer:

List files: ls [options -a -l]

Create and remove directories: mkdir, rmdir

Show the last lines of a file: tail, e.g. tail -n 1000 shows the last 1000 lines

Create an archive: tar -cvf (extract with tar -xvf)

Create a compressed archive: tar -zcvf

Search for a string: grep

Show the current directory: pwd; create an empty file: touch

Editors: vim, vi

7. Command to view processes: ps

20. How do you make a command run in the background?
Answer:

Generally, append & to the end of the command so that the program runs in the background (the space before & is optional).

22. The different meanings of "<", "<<", ">>" and ">".

23. How to stop the process.

project related questions

1. In the face detection system, every time a new person is added the model needs to be retrained, which hurts efficiency. Do you have any good ideas?

11. Given a training dataset with 1000 columns and 1 million rows, this dataset is based on a classification problem. The manager asks you to reduce the dimensionality of this dataset to reduce model computation time, but your machine has limited memory. what will you do? (You are free to make all kinds of practical assumptions.)
Your interviewer should be well aware of how difficult it is to handle high-dimensional data with limited memory. Here are some approaches you can use:

1. Since our RAM is small, first close the other programs running on the machine, including the web browser, to make as much memory available as possible.
2. We can randomly sample the dataset, i.e. create a smaller dataset with, say, 1000 variables and 300,000 rows, and do the computation on that.
3. To reduce dimensionality, we can separate the numerical and categorical variables and remove the correlated ones. For numerical variables we use correlation analysis; for categorical variables we can use the chi-square test.
4. In addition, we can use PCA (principal component analysis) and pick the components that explain the greatest variance in the dataset.
5. Using online learning algorithms such as Vowpal Wabbit (available in Python) is a good option.
6. Building a linear model with stochastic gradient descent is also helpful.
7. We can also use our understanding of the business to estimate how strongly each predictor affects the response variable. However, this is a subjective approach, and failing to identify useful predictors may cause a significant loss of information.

python development interview questions

1. Django commands to create a project

django-admin startproject <project name>

python manage.py startapp <app name>

3. Understanding of MVC and MVT

M: Model, which interacts with the database.

V: View, which is responsible for generating the HTML pages.

C: Controller, which receives requests, processes them, interacts with M and V, and returns a response.

MVT:

M: Model, with the same role as M in MVC: it interacts with the database.

V: View, with the same role as C in MVC: it receives requests, processes them, interacts with M and T, and returns a response.

T: Template, with the same role as V in MVC: it generates the HTML pages.

10. Application scenarios of the three major Python web frameworks

django:

It is mainly used for rapid development; its highlight is developing quickly and saving cost, and its normal concurrency is only around ten thousand. If you want high concurrency you need to do secondary development on Django, for example stripping the heavy parts out of the framework, writing your own socket layer for HTTP communication, using pure C/C++ at the bottom layer for efficiency, and dropping the ORM in favor of your own database-access layer, because although the ORM operates the database in an object-oriented way, its efficiency is low and it relies on foreign keys for join queries between tables.

flask:

Lightweight; mainly used to write API services with a separated front end and back end, improving development efficiency. Flask itself is like a kernel: almost every other feature needs an extension (mail via Flask-Mail, user authentication via Flask-Login), implemented by third-party packages. For example, Flask extensions can add an ORM, form validation tools, file upload, authentication, and so on. Flask has no default database; you can choose MySQL or a NoSQL database.

Its WSGI toolkit is Werkzeug (which provides routing) and its template engine is Jinja2; these two are the core of the Flask framework. Django is the most famous Python framework, and there are also frameworks such as Flask and Tornado. Although Flask is not the most famous, it should be regarded as one of the most flexible, which is why developers love it.

Tornado:

Tornado is open-source web-server software. It differs markedly from today's mainstream web-server frameworks (including most Python frameworks): it is a non-blocking server, and it is quite fast.

Thanks to its non-blocking approach and its use of epoll, Tornado can handle thousands of connections per second, which makes it an ideal framework for real-time web services.

YOLO interview questions

YOLO's idea: take the entire image as input and directly regress the bounding-box positions and their categories at the output layer.

Difference from RCNN and Faster R-CNN: they also take the whole image as input, but Faster R-CNN keeps RCNN's proposal + classifier idea and merely moves the proposal-extraction step into the CNN, whereas YOLO uses direct regression.

1. Introduce YOLO and explain why YOLO is so fast.
YOLO is the pioneering work among single-stage detection algorithms. The original YOLOv1 was improved directly on top of an image-classification network; it abandoned the RPN step of two-stage detectors and performs classification and box regression directly on the input image, so it is much faster than two-stage object detectors, although its accuracy was originally much lower. After iterating to the current V4 and V5 versions, YOLO's accuracy has become comparable to or even better than two-stage detectors while remaining very fast, making it one of the most popular algorithms in industry. The core idea of YOLO is to divide the feature map extracted by the backbone into an S x S grid; the grid cell containing the center of an object is responsible for predicting that object's confidence, class, and box coordinates.

2. Introduce the principles of NMS and IoU.
NMS stands for non-maximum suppression; as the name implies, it suppresses elements that are not maxima. In object detection, when the boxes predicted by the model are parsed, there are usually many of them, and many duplicates locate the same object. The role of NMS is to remove these duplicate boxes so as to obtain the real detections. NMS uses IoU, a standard measure of how well the prediction matches the ground truth: the higher the overlap, the higher the value. IoU is computed as the area of the intersection of the two regions divided by the area of their union.

In NMS, the predicted boxes are first sorted by confidence; the box with the highest confidence is compared against the remaining boxes using IoU. When the IoU is greater than a certain threshold, the two boxes are considered to detect the same object, and the one with the lower confidence is eliminated. This comparison is repeated in turn, and in the end all the final predicted boxes are obtained.
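A short NumPy sketch of IoU and a simple NMS loop (boxes are [x1, y1, x2, y2]; the 0.5 threshold is illustrative):

import numpy as np

def iou(box_a, box_b):
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)          # intersection area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)        # intersection over union

def nms(boxes, scores, thresh=0.5):
    order = np.argsort(scores)[::-1]                       # indices sorted by confidence, high to low
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(best)
        # keep only boxes that do not overlap the best box too much
        order = np.array([i for i in order[1:] if iou(boxes[best], boxes[i]) <= thresh], dtype=int)
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))   # [0, 2] -- the second box is suppressed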

3. Do you have any good solutions or techniques for small-object detection?
Image pyramids and multi-scale sliding-window detection (MTCNN);
multi-scale feature fusion (FPN, PAN, ASFF, etc.);
increasing the training and inference image resolution;
super-resolution: enlarge the image and then detect.

Origin blog.csdn.net/qq_58832911/article/details/128817297