Power Academy Machine Learning 365 Days Special Training Camp

General questions

Apple

1. Suppose you have millions of users, each with hundreds of transactions across thousands of products. How would you group these users into meaningful segments?

Microsoft

2. Please describe a project you participated in and explain what made it unique.

3. How do you handle high-cardinality categorical features?

4. How would you summarize a Twitter feed?

5. What steps would you take to clean data before applying machine learning algorithms to it?

6. How do you measure the distance between data points?
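One possible starting point is a sketch of three common distance measures in plain Python. Which one fits depends on the data. For example, Euclidean distance usually calls for scaling the features first, and cosine distance suits sparse text vectors. The function names are illustrative, not from any particular library.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Straight-line (L2) distance; sensitive to feature scale
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    # Sum of absolute coordinate differences (L1)
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # 1 - cosine similarity; compares direction only, ignoring magnitude
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norms == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norms


print(euclidean([0, 0], [3, 4]))  # 5.0
print(manhattan([0, 0], [3, 4]))  # 7
```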

7. Please define variance.
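For reference, these are the standard definitions: the variance of a random variable, the population variance of N values with mean μ, and the sample variance with Bessel's correction (dividing by n − 1).

```latex
\begin{aligned}
\operatorname{Var}(X) &= \mathbb{E}\!\left[(X - \mu)^2\right] = \mathbb{E}[X^2] - \mu^2, \qquad \mu = \mathbb{E}[X] \\
\sigma^2 &= \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 \\
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
\end{aligned}
```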

8. Please describe the difference between a box plot and a histogram, and give use cases for each.

Twitter

9. What features would you use to build a recommendation algorithm for users?

Uber

10. Choose a product or app that you really like and talk about how you plan to improve it.

11. How do you find anomalies in a distribution?

12. If a particular trend in the distribution is caused by an anomaly, how would you investigate it?

13. How do you assess Uber's impact on traffic and driving conditions?

14. What metrics would you use to track whether Uber's paid advertising actually acquires new customers? How would you calculate an ideal customer acquisition cost?

LinkedIn

15. Big data engineer: Can you explain what REST is?

Machine learning questions

Google

16. Why do you use feature selection?

17. If two predictors are highly correlated, what is the effect on the logistic regression coefficients? What happens to the confidence intervals of those coefficients?

18. What is the difference between a Gaussian mixture model and K-Means?

19. How do you pick k for K-Means?

20. When should you apply a Gaussian mixture model?

21. Assuming the true labels are known, how do you evaluate the performance of a clustering model?

Microsoft

22. Give an example of a machine learning project you are proud of.

23. Describe any machine learning algorithm.

24. Describe how Gradient Boosting works.

25. Data mining: Describe the decision tree model.

26. Data mining: What is a neural network?

27. Explain the bias-variance tradeoff.

28. How do you deal with imbalanced binary classification?

29. What is the difference between L1 and L2 regularization?

Uber

30. What features would you use to predict whether an Uber driver will accept a ride request? What supervised learning algorithm would you use to solve this problem? How would you compare the results of different algorithms?

LinkedIn

31. Name and describe three different kernel functions and the conditions under which each one applies.
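Here is a sketch of three widely used kernels written directly from their formulas. The comments give the usual rules of thumb for when each applies, which are guidelines rather than strict conditions. The parameter defaults (degree=3, c=1.0, gamma=1.0) are illustrative assumptions.

```python
import math
from typing import Sequence


def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    # K(x, y) = x . y; common when data is (nearly) linearly separable,
    # e.g. high-dimensional sparse text features
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x: Sequence[float], y: Sequence[float],
                      degree: int = 3, c: float = 1.0) -> float:
    # K(x, y) = (x . y + c)^d; models feature interactions up to degree d
    return (linear_kernel(x, y) + c) ** degree


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    # K(x, y) = exp(-gamma * ||x - y||^2); a flexible default for nonlinear boundaries
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


print(linear_kernel([1, 2], [3, 4]))  # 11
print(polynomial_kernel([1, 2], [3, 4], degree=2))  # 144.0
```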

32. Describe a method used in machine learning.

33. How do you deal with sparse data?

IBM

  1. How do you prevent overfitting?

  2. How do you deal with outliers in the data?

  3. How do you analyze the predictive performance of a regression model, compared with that of a classification model?

  4. How do you evaluate a logistic regression model compared with a simple linear regression model?

  5. What is the difference between supervised learning and unsupervised learning?

  6. What is cross-validation? Why use cross-validation?
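Here is a minimal sketch of k-fold cross-validation, assuming contiguous folds and no shuffling. In practice you would normally shuffle first, or stratify for classification. The "model" is a hypothetical baseline that predicts the training-fold mean, so the focus stays on the splitting logic.

```python
from typing import List, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds; each fold is the validation set once."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    # The first n % k folds get one extra element so every index is used once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, val))
        start += size
    return splits


def cross_val_mse(y: List[float], k: int) -> float:
    """Average validation MSE of a baseline that predicts the training-fold mean."""
    fold_errors = []
    for train, val in k_fold_indices(len(y), k):
        mean = sum(y[i] for i in train) / len(train)  # "fit" on training folds only
        fold_errors.append(sum((y[i] - mean) ** 2 for i in val) / len(val))
    return sum(fold_errors) / len(fold_errors)
```

The point of averaging over folds is that every observation is used for validation exactly once. This gives a less noisy performance estimate than a single train/test split.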

  7. What is the name of the matrix used to evaluate a predictive model?
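The usual answer is the confusion matrix. Here is a small sketch that builds one, using the common convention of rows as true labels and columns as predicted labels (some tools transpose this). It also derives precision and recall for a chosen positive class. The function names are illustrative.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> List[List[int]]:
    # counts[(t, p)] = number of samples with true label t predicted as p
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def precision_recall(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                     positive: Hashable = 1) -> Tuple[float, float]:
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


print(confusion_matrix([1, 0, 1, 1], [1, 0, 0, 1], labels=[0, 1]))  # [[1, 0], [1, 2]]
```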

  8. What is the relationship between logistic regression coefficients and the odds ratio?
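In the standard logit model, each coefficient is a log odds ratio. Raising x_j by one unit while holding the other predictors fixed multiplies the odds by e^{β_j}.

```latex
\begin{aligned}
\log\frac{p(x)}{1 - p(x)} &= \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k \\
\frac{\operatorname{odds}(x_j + 1)}{\operatorname{odds}(x_j)} &= e^{\beta_j}
\end{aligned}
```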

  9. What is the relationship between Principal Component Analysis (PCA) and Linear & Quadratic Discriminant Analysis (LDA & QDA)?

  10. If you have a categorical dependent variable, and a mixture of categorical and continuous independent variables, what algorithm, method, or tool would you use for analysis?

  11. Business Analysis: What is the difference between logistic and linear regression? How do you avoid local minima?

Salesforce

  1. What data and models would you use to measure attrition/churn? How would you measure the performance of the model?

  2. Please try to explain a machine learning algorithm to non-technical people.

Capital One

  1. How would you develop a model to predict credit card fraud?

  2. How do you deal with missing or bad data?

  3. How do you derive new features from existing ones?

  4. If you had only 100 data points for predicting customer gender, what problems might arise in your predictions?

  5. Given two years of transaction records, what features would you use to predict credit risk?

  6. Please design an AI program that can play Tic-tac-toe.
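One classic design is exhaustive minimax search, since the game tree is small enough to solve completely. The sketch below assumes a board stored as a 9-character string of "X", "O" and " ", indexed row by row. X maximizes the score and O minimizes it, and memoization keeps the search fast. All names are illustrative.

```python
from functools import lru_cache
from typing import Optional, Tuple

# The 8 winning lines on a 3x3 board indexed 0..8 row by row.
LINES = ((0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6))


def winner(board: str) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board: str, player: str) -> Tuple[int, Optional[int]]:
    """Score the position with `player` to move: +1 X wins, -1 O wins, 0 draw.

    Returns (score, best move index), or (score, None) if the game is over.
    """
    w = winner(board)
    if w is not None:
        return (1 if w == "X" else -1), None
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0, None  # board full with no winner: draw
    opponent = "O" if player == "X" else "X"
    best_score, best_move = (-2, None) if player == "X" else (2, None)
    for m in moves:
        child = board[:m] + player + board[m + 1:]
        score, _ = minimax(child, opponent)
        if (player == "X" and score > best_score) or (player == "O" and score < best_score):
            best_score, best_move = score, m
    return best_score, best_move


def choose_move(board: str, player: str) -> Optional[int]:
    """AI entry point: index (0..8) of an optimal move for `player`."""
    return minimax(board, player)[1]


print(choose_move("XX OO    ", "X"))  # 2 (completes the top row)
```

With perfect play from both sides the empty board evaluates to 0, a draw. A production game AI might add move ordering or symmetry reduction, but plain memoized minimax is already fast for 3x3.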

Zillow

  1. Please explain what overfitting is and how to avoid it.

  2. Why does an SVM need to maximize the margin between support vectors?

Hadoop

Twitter

  1. How would you use Map/Reduce to split a very large graph into smaller pieces and parallelize the edge computations as the data changes rapidly and dynamically?

  2. Data engineer: You are given a follower list in the format 123 345, 234 678, 345 123, ..., where the first column is the follower ID and the second column is the followee ID. The goal is to find all mutual following pairs (123 and 345 in the example above). How would you solve this with Map/Reduce when the list does not fit in memory?
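One common approach keys every edge by its unordered pair of IDs, so that (a, b) and (b, a) reach the same reducer. The reducer then emits the pair if both directions were seen. The sketch simulates the map, shuffle and reduce phases in memory. In a real Hadoop job, the framework performs the shuffle and each reducer sees only one key's values, at most two distinct directions, so per-key memory stays tiny. The sample edges are taken from the example in the question, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Edge = Tuple[int, int]  # (follower_id, followee_id)


def map_edge(edge: Edge) -> Iterator[Tuple[Edge, Edge]]:
    # Key by the unordered pair so both directions of a relationship meet.
    follower, followee = edge
    if follower != followee:
        yield (min(follower, followee), max(follower, followee)), edge


def reduce_pair(key: Edge, directions: Iterable[Edge]) -> Iterator[Edge]:
    # Mutual iff both directions were observed (duplicate edges collapse in the set).
    if len(set(directions)) == 2:
        yield key


def mutual_pairs(edges: Iterable[Edge]) -> List[Edge]:
    # In-memory stand-in for the framework's shuffle: group mapper output by key.
    groups: Dict[Edge, List[Edge]] = defaultdict(list)
    for edge in edges:
        for key, value in map_edge(edge):
            groups[key].append(value)
    result: List[Edge] = []
    for key, values in sorted(groups.items()):
        result.extend(reduce_pair(key, values))
    return result


print(mutual_pairs([(123, 345), (234, 678), (345, 123)]))  # [(123, 345)]
```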

Capital One

  1. Data Engineer: What is Hadoop serialization?


  2. Explain a simple Map/Reduce problem.
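The textbook example is word count. This sketch runs the mapper and reducer locally, with a dictionary standing in for the shuffle/sort step that Hadoop would perform between the two phases. The tokenization (lowercase, split on whitespace) is a simplifying assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated token, lowercased.
    for word in line.lower().split():
        yield word, 1


def reducer(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    # All counts for one word arrive together; add them up.
    return word, sum(counts)


def run_word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Local stand-in for the shuffle/sort that groups mapper output by key.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for word, one in mapper(line):
            grouped[word].append(one)
    return dict(reducer(word, counts) for word, counts in sorted(grouped.items()))


print(run_word_count(["the cat", "The dog"]))  # {'cat': 1, 'dog': 1, 'the': 2}
```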

Hive

LinkedIn

  1. Data Engineer: Please write a Hive UDF that outputs sentiment scores. For example, if good = 1, bad = -1, and average = 0, then a restaurant review of "good food, bad service" might get a score of 1 + (-1) = 0.
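A Hive UDF is typically written in Java against Hive's UDF API, and that wrapper is not shown here. This is only a Python sketch of the scoring logic such a UDF might contain. The lexicon comes from the example in the question. The tokenizer is an assumption, and negation ("not good") is deliberately not handled.

```python
import re
from typing import Dict, Optional

# Lexicon taken from the example in the question.
LEXICON: Dict[str, int] = {"good": 1, "bad": -1, "average": 0}


def sentiment_score(review: Optional[str],
                    lexicon: Dict[str, int] = LEXICON) -> Optional[int]:
    """Sum the lexicon score of every word; unknown words count as 0."""
    if review is None:
        return None  # pass missing input through, like SQL NULL
    words = re.findall(r"[a-z']+", review.lower())
    return sum(lexicon.get(w, 0) for w in words)


print(sentiment_score("good food, bad service"))  # 0
```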

Spark

Capital One

  1. Data Engineer: Please explain how RDDs work in Spark, using Scala.

Statistics and probability questions

Google

  1. Please explain cross-validation to a non-technical person.


  2. Please describe a non-normal probability distribution and how it is applied.

Microsoft

  1. Data mining: What is heteroskedasticity, and how do you address it?

Twitter

  1. Given Twitter user data, how would you measure engagement?

Uber

  1. What time series forecasting techniques do you know?

  2. Explain principal component analysis (PCA) and the equations it uses.
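The core PCA equations are shown below in one common notation. The notation is an assumption, not taken from the question: X is the n × p data matrix, x̄ the vector of column means, and w_j the unit-length component directions. The components are the eigenvectors of the sample covariance matrix, ordered by eigenvalue.

```latex
\begin{aligned}
\tilde{X} &= X - \mathbf{1}\,\bar{x}^{\top} && \text{(center each column)} \\
\Sigma &= \tfrac{1}{n-1}\,\tilde{X}^{\top}\tilde{X} && \text{(sample covariance)} \\
w_1 &= \arg\max_{\lVert w \rVert = 1} \; w^{\top}\Sigma\, w && \text{(direction of maximum variance)} \\
\Sigma\, w_j &= \lambda_j w_j, \quad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p && \text{(eigendecomposition)} \\
Z &= \tilde{X}\, W_k, \quad W_k = [\,w_1, \dots, w_k\,] && \text{(projection onto the top } k \text{ components)} \\
\text{explained variance ratio}_j &= \lambda_j \Big/ \textstyle\sum_{i=1}^{p} \lambda_i
\end{aligned}
```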

  3. How do you address multicollinearity?

  4. Please write the equation for optimizing ad spending on Twitter and Facebook.

Facebook

  1. If you draw two cards from a deck, what is the probability that they are the same suit?
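Assuming a standard 52-card deck, the closed form follows from one observation: whatever the first card is, 12 of the remaining 51 cards share its suit. The brute-force check below enumerates every unordered pair to confirm it.

```python
from fractions import Fraction
from itertools import combinations


def same_suit_probability() -> Fraction:
    # Enumerate all C(52, 2) unordered pairs from a standard 52-card deck.
    deck = [(rank, suit) for suit in range(4) for rank in range(13)]
    pairs = list(combinations(deck, 2))
    same = sum(1 for a, b in pairs if a[1] == b[1])
    return Fraction(same, len(pairs))


print(same_suit_probability(), Fraction(12, 51))  # 4/17 4/17
```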

IBM

  1. What are p-values and confidence intervals?

Capital One

  1. Data Analyst: Suppose you have 70 red marbles, and the ratio of green to red marbles is 2:7. How many green marbles are there?
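The arithmetic, with g green marbles and r = 70 red marbles:

```latex
\frac{g}{r} = \frac{2}{7}, \quad r = 70 \;\Longrightarrow\; g = 70 \cdot \frac{2}{7} = 20
```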

  2. What should the distribution of daily commute traffic in New York City look like?

  3. For a fair die, which is most likely: at least one 6 in 6 throws, at least two 6s in 12 throws, or at least 100 6s in 600 throws?
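This is closely related to the classic Newton-Pepys problem. Assuming independent throws of a fair die, each count of sixes is binomial with p = 1/6, so each probability is one minus the binomial CDF. The sketch computes all three directly.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 1 / 6) -> float:
    """P(X >= k) for X ~ Binomial(n, p), via 1 - P(X <= k - 1)."""
    return 1.0 - sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k))


p1 = prob_at_least(1, 6)      # at least one 6 in 6 throws
p2 = prob_at_least(2, 12)     # at least two 6s in 12 throws
p3 = prob_at_least(100, 600)  # at least 100 6s in 600 throws
print(round(p1, 3), round(p2, 3), round(p3, 3))
```

The first event is the most likely (about 0.665), followed by the second (about 0.619). The third is only a little above 0.5. Intuitively, more throws concentrate the count of sixes around its mean, so "at least the mean" approaches one half.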

PayPal

  1. What is the Central Limit Theorem, how is it proved, and what are its applications?

Programming and algorithms

Google

  1. Data Analyst: Please write a program to determine the height of an arbitrary binary tree.
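Here is a sketch under one common convention: height is counted in nodes, so an empty tree has height 0 and a single node has height 1. Some texts count edges instead, which gives one less. The recursive version is the usual answer. The level-order version gives the same result without hitting Python's recursion limit on very deep, degenerate trees. The Node class is a hypothetical minimal type.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int = 0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def height(root: Optional[Node]) -> int:
    """Height counted in nodes: empty tree -> 0, single node -> 1."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


def height_iterative(root: Optional[Node]) -> int:
    """Same result via level-order traversal; one loop iteration per tree level."""
    if root is None:
        return 0
    depth = 0
    level = deque([root])
    while level:
        depth += 1
        for _ in range(len(level)):  # consume exactly the nodes of the current level
            node = level.popleft()
            if node.left is not None:
                level.append(node.left)
            if node.right is not None:
                level.append(node.right)
    return depth


tree = Node(1, Node(2, Node(4)), Node(3))
print(height(tree), height_iterative(tree))  # 3 3
```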

Microsoft

  1. Please write a function that checks whether a word is a palindrome.
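A two-pointer sketch that compares characters from both ends toward the middle. Making the check case-insensitive and skipping non-alphanumeric characters are assumptions. For a strict comparison of a single word, `word == word[::-1]` is enough.

```python
def is_palindrome(word: str) -> bool:
    """Case-insensitive palindrome check that ignores non-alphanumeric characters."""
    chars = [c.lower() for c in word if c.isalnum()]
    i, j = 0, len(chars) - 1
    while i < j:  # walk inward from both ends
        if chars[i] != chars[j]:
            return False
        i += 1
        j -= 1
    return True


print(is_palindrome("Racecar"), is_palindrome("hello"))  # True False
```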

Twitter

  1. Please construct a power set.
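One standard construction is iterative: start from the empty set, then for each element add a copy of every existing subset with that element appended. Each step doubles the number of subsets, giving 2^n in total. Enumerating the bitmasks 0 to 2^n − 1 is an equivalent alternative.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def power_set(items: Sequence[T]) -> List[List[T]]:
    """All subsets of `items`; each new element doubles the subsets built so far."""
    subsets: List[List[T]] = [[]]
    for x in items:
        # The comprehension is built from the current list before it is extended.
        subsets += [s + [x] for s in subsets]
    return subsets


print(power_set([1, 2, 3]))
# [[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]
```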


  2. How do you find the median of a very large dataset?
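When the data does not fit in memory but can be re-read, one option is a two-pass bucket approach. The first pass only builds a histogram of bucket counts. The second pass keeps just the values in the bucket (or two adjacent buckets) that contains the middle rank. The sketch assumes integer data within a known range [lo, hi], and `make_stream` is a hypothetical callable that returns a fresh iterator, for example by reopening a file. If the median bucket is itself too large, the same idea can be repeated inside it.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def two_pass_median(make_stream: Callable[[], Iterable[int]],
                    lo: int, hi: int, num_buckets: int = 1024) -> float:
    """Median of integers assumed to lie in [lo, hi], reading the data twice.

    Memory is O(num_buckets) plus the contents of the median bucket(s).
    Values outside [lo, hi] are not supported.
    """
    width = (hi - lo) // num_buckets + 1  # guarantees every value maps to a bucket
    counts = [0] * num_buckets

    # Pass 1: histogram only; no values are stored.
    n = 0
    for x in make_stream():
        counts[(x - lo) // width] += 1
        n += 1
    if n == 0:
        raise ValueError("median of empty data")

    def locate(rank: int) -> Tuple[int, int]:
        # Map a 0-based global rank to (bucket index, rank inside that bucket).
        seen = 0
        for b, c in enumerate(counts):
            if seen + c > rank:
                return b, rank - seen
            seen += c
        raise AssertionError("rank out of range")

    # Middle rank(s): identical for odd n, adjacent for even n.
    targets = [locate((n - 1) // 2), locate(n // 2)]

    # Pass 2: keep only values from the bucket(s) holding the middle rank(s).
    kept: Dict[int, List[int]] = {b: [] for b, _ in targets}
    for x in make_stream():
        b = (x - lo) // width
        if b in kept:
            kept[b].append(x)
    for values in kept.values():
        values.sort()

    (b1, r1), (b2, r2) = targets
    return (kept[b1][r1] + kept[b2][r2]) / 2


print(two_pass_median(lambda: [4, 1, 3, 2], 0, 10, num_buckets=4))  # 2.5
```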


Source: blog.csdn.net/weixin_52772147/article/details/112981997