Machine Learning: Summary of Interview Questions (1)

The following questions are from the WeChat public account "Artificial Intelligence Headlines".
  1. You mentioned in your resume that you have built a document mining system. What work did you do? Is it possible to implement document clustering using the LDA technique from topic modeling?
  2. Suppose you have hundreds of megabytes of data files, including PDF files, text files, images, scanned PDF files, etc. Please propose a classification scheme.
  3. How do you read the content of scanned PDF files or written documents in image format?
  4. Why is Naive Bayes called "naive"?
  5. Please describe the Naive Bayes classifier in detail.
  6. What is deep learning? What is the difference between deep learning and machine learning? In unsupervised learning, how to do document clustering?
  7. How to find files related to certain query statements/searches?
  8. Explain TF-IDF technology.
  9. From my experience, the TF-IDF technique doesn't work well for document classification or clustering. How would you improve it?
  10. What is a Long Short Term Memory (LSTM) neural network? Explain how it works.
  11. What is a word2vec model?
  12. Explain mutable and immutable objects in Python.
  13. What data structures have you used in Python?
  14. How to handle multi-class classification problems with imbalanced datasets?
  15. How do you identify the language of a text sentence?
  16. How to represent Chinese or Japanese characters (logograms)?
  17. How to design a chatbot? (I was out of ideas, but I tried to answer this with intent and feedback based on TF-IDF similarity.)
  18. Can a chatbot be designed using recurrent neural networks to respond to incoming questions with intents and answers?
  19. Suppose you design a chatbot using a recurrent neural network or a long short-term memory neural network on the Reddit dataset, and it provides 10 possible replies. How do you choose the best reply, or how do you discard the other replies?
  20. Explain how support vector machines (SVMs) learn nonlinear boundaries.
  21. What are precision and recall? In medical diagnosis, which do you think is more important?
  22. Explain precision and recall.
  23. How to draw receiver operating characteristic curve (ROC curve)? What does the area under the ROC curve mean?
  24. How to plot the ROC curve for a multi-class classification task?
  25. List other metrics for multiclass classification tasks.
  26. What are sensitivity and specificity?
  27. What does "random" in random forest mean?
  28. How to do text classification?
  29. How can you be sure that a text has been learned? Is it impossible to achieve without the TF-IDF technique? (I replied that I would use an n-gram model (n = 1, 2, 3, 4) and use the TF-IDF technique to create a long count vector.)
  30. What else can you do with machine learning? (I suggested a combination of a long short-term memory neural network and word2vec, or a 1D recurrent neural network combined with word2vec, for classification. But the interviewer wanted to improve the machine-learning-based algorithm.)
  31. How does a neural network learn nonlinear shapes when it consists of linear nodes? Why is it able to learn nonlinear boundaries?
  32. When training a decision tree, what are its parameters?
  33. What is the splitting criterion used to split at a given node of a decision tree?
  34. What is the formula for calculating the Gini index (Gini impurity)?
  35. What is the formula for calculating entropy?
  36. How does a decision tree decide at which feature a split must be made?
  37. How is information gain calculated mathematically?
  38. Briefly describe the advantages of random forests.
  39. Briefly describe the boosting algorithm.
  40. How does gradient boosting work?
  41. Briefly describe the working principle of AdaBoost algorithm.
  42. Which kernels are used in SVM? What are the optimization techniques of SVM?
  43. How does SVM learn hyperplanes? Discuss the details of its mathematical operations.
  44. Talk about unsupervised learning? What algorithms are there?
  45. How to define the value of K in K-Means clustering algorithm?
  46. List at least 3 ways to define K in the K-Means clustering algorithm.
  47. Other than that, what clustering algorithms do you know?
  48. Introduce the DBSCAN algorithm.
  49. Briefly describe the working principle of Hierarchical Agglomerative clustering.
  50. Explain the principal component analysis algorithm (PCA), and briefly describe the mathematical steps of using the PCA algorithm.
  51. What are the disadvantages of using the PCA algorithm?
  52. Explain how convolutional neural networks work, including the details of their implementation.
  53. Explain backpropagation in Convolutional Neural Networks.
  54. How do you deploy machine learning models?
  55. Most of the time we have to use C++ to build a machine learning model from scratch. Can you do this?
  56. What is the range of the sigmoid function?
  57. Name the package in scikit-learn that implements logistic regression.
  58. What are the mean and variance of the standard normal distribution?
  59. What data structures do you use in Python?
  60. What are the methods of text classification? How would you do the classification?
  61. Explain the TF-IDF technique and its shortcomings. How can the shortcomings of TF-IDF be overcome?
  62. What are bigrams and trigrams? Using a text sentence, explain the TF-IDF technique applied to bigrams and trigrams.
  63. Give an example to illustrate the applications of word2vec.
  64. How to design a neural network? How to achieve "depth"? This is a basic neural network problem.
  65. Briefly describe how LSTM works. How does it remember text?
  66. What is Naive Bayes Classifier?
  67. What is the probability of tossing a coin 10 times and getting heads 4 times?
  68. How to get the index of an element in a Python list?
  69. How to merge two pandas datasets?
  70. You need to model fraudulent activity from user behavior. How would you solve this problem? This could be an anomaly detection problem or a classification problem!
  71. Decision tree or random forest, which one do you prefer?
  72. What is the difference between logistic regression and random forest?
  73. Would you use decision trees or random forests to solve classification problems? What are the advantages of random forests?
  74. In an imbalanced dataset, what model would you choose: Random Forest or Boosting? Why?
  75. What boosting techniques do you know?
  76. Using supervised learning to solve a classification problem, which model would you choose? Let's say there are 40-50 categories!
  77. How do you use ensemble techniques?
  78. Briefly describe how support vector machines (SVMs) work.
  79. What is a kernel? Explain briefly.
  80. How to implement nonlinear regression?
  81. What is Lasso Regression and Ridge Regression?
  82. You mentioned on your resume that you have done speaker recognition from speech. Specifically, what is your implementation method?
  83. What are Mel-frequency cepstral coefficients (MFCCs)?
  84. What is a Gaussian mixture model and how does it accomplish clustering?
  85. How does expectation-maximization (EM) work? Describe its implementation steps.
  86. How are the probabilities in the GMM model calculated?
  87. How did you perform MAP adaptation for the GMM-UBM technique when doing speaker recognition?
  88. Talk about the i-vector technique you used.
  89. When analyzing context, what are the main factors?
  90. What is the difference between JFA and i-vector? Why choose i-vector over JFA?
  91. Have you ever used the PLDA i-vector technique?
  92. Have you read Baidu's Deep Speaker paper?
  93. If you have two models to choose from, what is your basis for choosing? (Exploring techniques for model selection)
  94. Briefly describe the mathematical working principle of the Bayesian information criterion (BIC) and the Akaike information criterion (AIC).
  95. How do the Bayesian information criterion and the Akaike information criterion work?
  96. What should you do if data in the MFCC feature vector matrix is missing?
  97. How to do speech recognition? What features are used?
  98. Is your classifier a classifier for speech and music, or a classifier for speech and non-speech?
  99. How are deep neural networks used in speech analysis?
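
Two of the questions above have short computational answers, so a minimal Python sketch may help when preparing. It assumes question 67 means a fair coin and exactly 4 heads, which the question does not state; the helper name `binomial_pmf` and the example list `models` are hypothetical and chosen only for illustration.

```python
from math import comb


def binomial_pmf(n: int, k: int, p: float = 0.5) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Question 67, assuming a fair coin and "exactly 4 heads":
# C(10, 4) / 2**10 = 210 / 1024
prob_four_heads = binomial_pmf(10, 4)

# Question 68: list.index returns the position of the first match
# and raises ValueError when the element is absent.
models = ["svm", "lstm", "gmm"]  # hypothetical example list
lstm_position = models.index("lstm")
```

Under these assumptions `prob_four_heads` is 210/1024, roughly 0.205. If the interviewer means "at least 4 heads", the same helper can be summed over k = 4 to 10.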
