Machine learning / recommendation systems / recommendation system algorithm engineer interview guide

Interview guide

  • 1, machine learning / recommendation systems / recommendation system algorithm engineer interview skills map
  • 2, interview questions on knowledge, tools, logic, and business
  • 3, resume writing and recruitment needs
  • 4, recommended books and websites

1, machine learning / recommendation systems / recommendation system algorithm engineer interview skills map

Setting specific job requirements aside and looking at the question from a slightly higher vantage point, the technical competence of a machine learning / recommendation system algorithm engineer can be broken down into four areas: knowledge, tools, logic, and business.

In terms of minimum requirements, an algorithm engineer needs a fairly well-rounded skill set. An algorithm engineer must first be a competent "engineer" and must also be able to improve and implement algorithms on that foundation. By comparison, big data engineers focus more on big data tools and platforms, while researchers stand out more in knowledge and logic.

The requirements above are general and apply regardless of field. A role in a specific area, such as recommendation system engineer, calls for some more specific competencies:

Knowledge: your grasp of ML theory (40%)

  • Machine learning fundamentals plus deep learning knowledge
  • Mainstream models such as CTR prediction: the principles and technical details of recommendation models

Tools: the frameworks and tools used to apply ML knowledge to real business problems (30%)

  • For example coding ability, Spark, TensorFlow, and serving tools

Logic: fundamental algorithmic reasoning (10%)

  • Common basic algorithm problems, logic questions, the evolutionary relationships between models, and overall reasoning ability

Business: a deep understanding of the industry's business model, and the ability to identify algorithm improvements from it (20%)

  • For example, building models for article recommendation scenarios and requirements, understanding business trends, and setting model objectives according to the business

2.1 recommendation system written test and interview questions

11. If you had to replace or improve XGBoost with another model, how would you do it and why? (business + logic + knowledge)

1, collaborative filtering (item-based and user-based): explain the principles of ItemCF, UserCF, and SVD matrix factorization.

2, which of the following recommendation methods gives the most diverse results? (B)

A. Content-based recommendation

B. User-based collaborative filtering recommendation

C. Item-based collaborative filtering recommendation

D. Popular-item recommendation

3, describe the principles of at least two recommendation algorithms you are familiar with (for example collaborative filtering, matrix factorization, etc.)

4, collaborative filtering is widely used in recommendation systems and includes memory-based collaborative filtering, model-based collaborative filtering, and hybrid models. Which of the following statements is incorrect? (C)

  • User-based and item-based collaborative filtering are both memory-based collaborative filtering
  • A hybrid model combines the strengths of both approaches and usually outperforms either one
  • Memory-based collaborative filtering handles the cold start problem better
  • Memory-based collaborative filtering is relatively simple to implement, and new data can be added easily

5, collaborative filtering is a classic recommendation algorithm, divided into user-based and item-based collaborative filtering. Its essence is to predict a user's interests by computing the similarity between users or between items, and then to recommend related items to the user. Use this knowledge to answer the following questions:

(1) There are five users A, B, C, D, E and three items X, Y, Z. By analysing each user's shopping history on the site and their profile tags, an interest index is obtained for each user and item, shown in the table below:

User  X  Y  Z
A     3  4  3
B     2  4  4
C     3  5  4
D     2  2  3
E     4  1  4

A camera is now to be recommended to user E. The interest scores of A, B, C, D for the three cameras M, N, O are known:

User  M  N  O
A     3  4  3
B     5  1  2
C     2  5  5
D     4  2  3

Give the best order in which to recommend the cameras to E, with a detailed solution.

(2) The problem above shows that collaborative filtering relies heavily on users' historical data. What are good solutions to the cold start problem?
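For part (1) of question 5, one possible way to work the answer is user-based collaborative filtering: measure how similar E is to each of A to D on the X/Y/Z interest table, then weight each neighbour's camera scores by that similarity. The sketch below assumes cosine similarity and uses all four neighbours; the question fixes neither choice, and Pearson correlation or a top-K neighbourhood could change the result. The names `interest`, `camera_scores`, and `rank_cameras` are illustrative.

```python
import math

# Interest index of each user in items X, Y, Z (first table of question 5).
interest = {
    "A": [3, 4, 3],
    "B": [2, 4, 4],
    "C": [3, 5, 4],
    "D": [2, 2, 3],
    "E": [4, 1, 4],
}

# Known camera scores for M, N, O (second table of question 5).
camera_scores = {
    "A": {"M": 3, "N": 4, "O": 3},
    "B": {"M": 5, "N": 1, "O": 2},
    "C": {"M": 2, "N": 5, "O": 5},
    "D": {"M": 4, "N": 2, "O": 3},
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_cameras(target="E"):
    """Rank cameras by the similarity-weighted sum of the neighbours' scores."""
    sims = {u: cosine(interest[target], interest[u]) for u in camera_scores}
    totals = {}
    for user, scores in camera_scores.items():
        for camera, score in scores.items():
            totals[camera] = totals.get(camera, 0.0) + sims[user] * score
    ranking = sorted(totals, key=totals.get, reverse=True)
    return sims, totals, ranking


sims, totals, ranking = rank_cameras()
print("similarity to E:", sims)
print("weighted scores:", totals)
print("recommended order:", ranking)
```

Under these assumptions D is E's most similar user and the cameras rank M, O, N. A written answer would show the same two steps by hand: the similarity of E to each user, then the weighted score of each camera.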

6, in video recommendation, recommendations that are too narrowly focused tend to hurt the user experience, so the system introduces some randomness to give users a sense of surprise and discovery. Suppose that in one recommendation scenario the computed match scores of videos A and B for the current user are 0.8 and 0.2. The system draws A's final score uniformly at random from 0 to 0.8 and B's final score uniformly at random from 0 to 0.2. What is the probability that B's final score is greater than A's? (B)

A. 1/2  B. 1/8  C. 1/16  D. 1/4
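The answer to question 6 can be checked with a short derivation, assuming the two final scores are drawn independently (the question implies this but does not state it). Write $S_A \sim U(0, 0.8)$ and $S_B \sim U(0, 0.2)$ for the final scores:

```latex
\begin{aligned}
P(S_B > S_A)
  &= \int_0^{0.2} P(S_A < b)\, f_{S_B}(b)\, db \\
  &= \int_0^{0.2} \frac{b}{0.8} \cdot \frac{1}{0.2}\, db \\
  &= \frac{1}{0.16} \cdot \frac{0.2^2}{2} = \frac{1}{8}
\end{aligned}
```

This matches the second option, 1/8.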

7, collaborative filtering is widely used in recommendation systems and includes memory-based collaborative filtering, model-based collaborative filtering, and hybrid models. Which of the following statements is correct?

  • Model-based collaborative filtering handles sparse data better
  • Collaborative filtering models do not need item content information
  • Memory-based collaborative filtering handles the cold start problem better
  • Memory-based collaborative filtering is relatively simple to implement, and new data can be added easily

8, explain matrix factorization
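As a talking point for question 8, here is a minimal sketch of matrix factorization trained with stochastic gradient descent. It approximates each rating r(u, i) by the dot product of a user vector and an item vector and minimizes squared error with L2 regularization. The data reuses the interest table from question 5; the hyperparameters (`k`, `lr`, `reg`, `epochs`) and all function names are illustrative assumptions, not values from the source.

```python
import random

# (user index, item index, rating) triples from the question 5 interest table:
# users A..E are rows 0..4, items X, Y, Z are columns 0..2.
TABLE = [[3, 4, 3], [2, 4, 4], [3, 5, 4], [2, 2, 3], [4, 1, 4]]
RATINGS = [(u, i, r) for u, row in enumerate(TABLE) for i, r in enumerate(row)]


def init_factors(n_rows, k, rng):
    """Small random latent vectors, one per user or item."""
    return [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_rows)]


def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item factors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def rmse(ratings, P, Q):
    """Root mean squared error over the observed ratings."""
    sq = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (sq / len(ratings)) ** 0.5


def train_mf(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=500, seed=0):
    """SGD on squared error with L2 regularization of both factor matrices."""
    rng = random.Random(seed)
    P = init_factors(n_users, k, rng)
    Q = init_factors(n_items, k, rng)
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                p_uf, q_if = P[u][f], Q[i][f]
                P[u][f] += lr * (err * q_if - reg * p_uf)
                Q[i][f] += lr * (err * p_uf - reg * q_if)
    return P, Q


P0, Q0 = train_mf(RATINGS, 5, 3, epochs=0)  # untrained baseline
P, Q = train_mf(RATINGS, 5, 3)
print("RMSE before training:", rmse(RATINGS, P0, Q0))
print("RMSE after training: ", rmse(RATINGS, P, Q))
```

In a real system only a sparse subset of the matrix is observed, and the trained factors are used to score the unobserved user-item pairs.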

9, briefly describe word2vec; discuss how to set the sliding window size, the number of negative samples, and the sampling ratio; explain how to evaluate the quality of the learned embeddings

10, what categories can recommendation algorithms be divided into? (1) content-based; (2) collaborative filtering: memory-based (user-based, item-based) and model-based (MF)

11, derive logistic regression (LR)
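For question 11, a compact version of the usual derivation is sketched below. It assumes binary labels $y_i \in \{0, 1\}$, a bias term absorbed into $w$, and maximum likelihood trained by gradient ascent with step size $\eta$; the notation is chosen here and is not from the source.

```latex
\begin{aligned}
p_i &= \sigma(w^\top x_i) = \frac{1}{1 + e^{-w^\top x_i}},
  \qquad \sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr) \\
\ell(w) &= \sum_{i=1}^{n} \bigl[\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\bigr] \\
\nabla_w \ell(w)
  &= \sum_{i=1}^{n} \left( \frac{y_i}{p_i} - \frac{1 - y_i}{1 - p_i} \right) p_i (1 - p_i)\, x_i
   = \sum_{i=1}^{n} (y_i - p_i)\, x_i \\
w &\leftarrow w + \eta \sum_{i=1}^{n} (y_i - p_i)\, x_i
\end{aligned}
```

Minimizing the negative log-likelihood (cross-entropy loss) gives the same update with the sign of the gradient flipped.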

12, how is a graph stored? Using that structure, implement depth-first and breadth-first traversal of the graph: depth-first traversal is implemented with a stack, breadth-first traversal with a queue
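A minimal sketch for question 12, assuming the graph is stored as an adjacency list (a dict mapping each vertex to its neighbours). An adjacency matrix is the other common choice. The example graph and function names are illustrative.

```python
from collections import deque

# Adjacency list: each vertex maps to the list of its neighbours.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}


def dfs(graph, start):
    """Iterative depth-first traversal using an explicit stack."""
    visited, order = set(), []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push neighbours in reverse so they are popped in listed order.
        stack.extend(reversed(graph[node]))
    return order


def bfs(graph, start):
    """Breadth-first traversal using a FIFO queue."""
    visited, order = {start}, [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


print(dfs(graph, "A"))  # ['A', 'B', 'D', 'C']
print(bfs(graph, "A"))  # ['A', 'B', 'C', 'D']
```

The only structural difference between the two traversals is the container: a stack (last in, first out) goes deep first, a queue (first in, first out) visits level by level.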

13, describe your work in detail and draw the overall architecture

14, do you know random forests? Do you know the sampling method they use (sampling with replacement)? Given n balls, sample n times with replacement. As n tends to infinity, what is the probability that a given ball is never drawn?
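For question 14, assuming n independent draws with replacement from n balls (the bootstrap setting used by random forests), a fixed ball is missed on one draw with probability 1 - 1/n, so it is never drawn with probability (1 - 1/n)^n, which tends to 1/e ≈ 0.368 as n grows. The short check below is only a numerical illustration; `prob_never_drawn` is a name made up here.

```python
import math


def prob_never_drawn(n):
    """Probability that one fixed ball is missed in all n draws with replacement."""
    return (1 - 1 / n) ** n


for n in (10, 1_000, 1_000_000):
    print(n, prob_never_drawn(n))
print("limit 1/e =", math.exp(-1))
```

This is why roughly 36.8% of the training samples are left out of each bootstrap sample and can serve as out-of-bag data.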

15, how do you extract keywords? Have you improved TF-IDF? How? How does it differ from TextRank?
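As background for question 15, here is a minimal TF-IDF sketch. It assumes documents are already tokenized, uses term frequency normalized by document length, and uses the plain IDF `log(N / df)` without smoothing; real libraries often use smoothed variants. The toy documents and function names are illustrative.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_keywords(doc_weights, k=2):
    """The k terms with the highest TF-IDF weight in one document."""
    return sorted(doc_weights, key=doc_weights.get, reverse=True)[:k]


docs = [
    "spark streaming processes click logs".split(),
    "spark sql builds offline features".split(),
    "tensorflow trains the ranking model".split(),
]
weights = tf_idf(docs)
print(top_keywords(weights[0]))
```

Because "spark" appears in two of the three documents, it gets a lower weight than terms unique to one document. TextRank, by contrast, ranks words with a graph built from co-occurrence inside a single document and needs no corpus statistics.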

16, what are the formulas for UserCF and ItemCF? How do their principles differ? How do they differ from content-based recommendation?
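For question 16, one common textbook formulation for implicit feedback is sketched below; the source does not say which variant it expects, so treat the notation as an assumption. Here $N(u)$ is the set of items user $u$ has interacted with, $N(i)$ the set of users who have interacted with item $i$, $S(u, K)$ the $K$ users most similar to $u$, $S(j, K)$ the $K$ items most similar to $j$, and $r_{vi}$ the interest of user $v$ in item $i$.

```latex
% UserCF: similarity between users u and v, then the score of item i for user u
w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}}, \qquad
p(u, i) = \sum_{v \in S(u, K) \cap N(i)} w_{uv}\, r_{vi}

% ItemCF: similarity between items i and j, then the score of item j for user u
w_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)|\,|N(j)|}}, \qquad
p(u, j) = \sum_{i \in N(u) \cap S(j, K)} w_{ji}\, r_{ui}
```

UserCF recommends what similar users liked, ItemCF recommends items similar to what the user already liked; both use only behaviour data, whereas content-based recommendation compares item attributes.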

2.2 machine learning questions

Key topics:

  • Linear regression, logistic regression
  • Decision tree algorithms: decision trees, random forest, GBDT, XGBoost
  • Clustering algorithms
  • Neural networks: basic principles
  • Optimization: regularization, gradient descent, etc.

1. Explain the principle of GBDT (knowledge)

2. How is the feature for splitting a tree node selected? (knowledge)

3. Write the formulas for the Gini index and information gain, and give an example (knowledge)
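For question 3, the Gini index of a node is 1 - Σ p_k² and information gain is the parent's entropy minus the size-weighted entropy of the children, where p_k is the share of class k in the node. The sketch below computes both for a made-up split of ten samples; the data and function names are illustrative.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index: 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy in bits: -sum_k p_k * log2(p_k)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# A balanced parent node split into two mostly pure children.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print("Gini(parent):", gini(parent))
print("Gini(left):  ", gini(left))
print("Information gain of the split:", information_gain(parent, [left, right]))
```

Here the parent has Gini 0.5 and entropy 1 bit, each child has Gini 0.32, and the split gains about 0.278 bits. Roughly speaking, ID3 picks splits by information gain, while CART classification trees use the Gini index.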

4. What is the difference between a classification tree and a regression tree? (knowledge)

5. Compare GBDT with Random Forest, and use the comparison to explain model bias and variance (knowledge)

7. What experience do you have tuning XGBoost parameters? (tools)

8. How is regularization implemented in XGBoost? (tools)

9. How is parallelization implemented in XGBoost? (tools)

10. Why does predicting stock price movements usually lead to severe overfitting? (business)

1. What is the definition of the softmax function? (knowledge)
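For reference, softmax maps a vector of scores z to probabilities softmax(z)_i = exp(z_i) / Σ_j exp(z_j). A minimal sketch follows; subtracting the maximum before exponentiating is a standard numerical-stability trick that does not change the result. The function is written in plain Python for illustration.

```python
import math


def softmax(logits):
    """Softmax with the maximum subtracted first for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
print(softmax([1000.0, 1000.0]))  # no overflow thanks to the shift
```

The outputs are positive, sum to 1, and preserve the ordering of the inputs.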

2. Why do gradients vanish in neural networks? (knowledge)

3. What are the common activation functions, and what are their characteristics? (knowledge)

4. Pick an activation function and derive its gradient descent update. (knowledge + logic)

5. What is the attention mechanism? (knowledge)

6. How did Alibaba introduce the attention mechanism into its recommendation models? (knowledge + business)

7. What business logic motivates the attention mechanism in DIN? (business)

8. DIN embeds both users and items; explain clearly two embedding methods you know. (knowledge)

10. How would you serve a deep learning model such as DIN? (tools + business)

For more interview questions and related material, see the book "Hundred face machine learning" (百面机器学习).

2.3 frameworks

  • Big data frameworks: Spark, HBase, Hive, Kafka
  • Deep learning framework: TensorFlow

2.4 Business Process

  • Project Summary

3, resume and project writing guide

Second, skills (add, modify, or delete selectively according to the target role)

1, machine learning: models, algorithm theory, feature processing

2, deep learning and recommendation fundamentals

3, big data frameworks and databases

  • Proficient with the NumPy scientific computing library, the Pandas data analysis package, and the Matplotlib visualization library for acquiring, processing, cleaning, visualizing, and structuring datasets.
  • Familiar with the Scikit-learn machine learning framework, with a command of K-nearest neighbors, linear regression, logistic regression, Ridge regression, Lasso regression, decision trees, naive Bayes, SVM, and K-Means.
  • Familiar with PCA for dimensionality reduction.
  • Proficient in feature engineering (RFE, chi2).
  • Familiar with basic dataset preprocessing (missing value handling, normalization, and standardization).
  • Familiar with the causes of and remedies for underfitting and overfitting.
  • Proficient in applying grid search, cross-validation, and the confusion matrix for hyperparameter tuning and model evaluation.
  • Familiar with popular ensemble classification algorithms such as RF (bagging) and GBDT (boosting).
  • Familiar with setting up and developing on Hadoop and its components (YARN, HDFS, MapReduce);
  • Proficient with the Flume data collection tool;
  • Familiar with databases such as HBase and MySQL, and with writing Hive SQL;
  • Familiar with the Kafka message queue;
  • Familiar with the architecture and use of Spark, Spark SQL, and Spark Streaming;
  • Proficient with Linux and common shell commands; able to set up a development environment on Linux;
  • Able to use the Sqoop data migration tool;
  • Familiar with the deep learning framework TensorFlow.

Third, writing the project description (add or delete selectively according to the characteristics of the project):

1, project description

Sample:

The Dark Horse headline recommendation system is built on a huge user base and a massive article corpus. It uses the Lambda architecture to integrate real-time and offline computing, with a distributed environment to increase computing capacity. Flume collects user behaviour such as clicks, views, and favourites, which is used to build user profiles and article profiles stored in an HDFS cluster. Spark SQL computes features offline to build a Hive feature center, stored in an HBase cluster. Recommendations are produced with machine learning and deep learning algorithms such as ALS, LR, and Wide & Deep, so that each user sees a personalized feed ("a thousand faces for a thousand people").

Project Description: This project is a personalized recommendation system. It relies mainly on offline recommendation, supplemented by real-time recommendation, and combines collaborative filtering with content-based recommendation to improve the user experience and increase user stickiness and time spent. The main parts include business data processing, book profile modeling, log data processing, and real-time recommendation.

Project Description: The platform ("Love is still home") provides users with topic discussions, store services, customer surveys, and community features. Its advertising recommendation system is designed to raise the user conversion rate, increase company revenue, and improve the user experience. The main work includes building an ALS model for item recall, CTR prediction based on logistic regression, caching of offline-processed data, and real-time recommendation.

Project Description: The project collects user behaviour data, such as what users frequently listen to and follow, together with users' age distribution and terminal devices, builds a profile for each user, and completes the recommendation system by training models. The goal is more accurate push notifications that do not interfere with the business and that increase user stickiness.

With the growing number of video sites, movie recommendation is a way to better meet users' expectations of the experience. Based on movie genre and ratings, plus user information such as gender and age, multi-dimensional data analysis is used to recommend movies users are likely to enjoy and movies of the same type, building a personalized recommendation system that gives users a good experience.

Recommendation results and user behaviour data are classified and analysed to keep improving the user profiles;

A typical social network friend recommendation project. From the user's point of view, the system helps users quickly find like-minded people who are likely to become friends. Friend recommendation strategies include second-degree friends, shared circles, shared interests, and visitors to the personal profile; non-personalized recommendations cover influencers, circles, and tags. Both new and existing users can quickly build their own social circle in the system, which increases user stickiness.

The project is a typical e-commerce recommendation project. The system mainly recommends products to users, helping them find what they want faster and surfacing products they are likely to buy. Personalized recommendation applies collaborative filtering to user behaviour, including browsing history, favourites, shares, and purchases, with each behaviour given a corresponding weight; non-personalized recommendation uses recent sales rankings, favourite counts, and historical sales, with recommended products filtered by the user's default shipping address.

2, project architecture and technical points

Project architecture: Flume + Kafka + HDFS + Spark Streaming + Spark SQL + HBase + TensorFlow

1, use Flume to collect log data and write the collected user behaviour data to HDFS

2, Flume writes the collected user behaviour data to HDFS;

3, Kafka receives the user behaviour logs collected by Flume into a message queue;

4, Spark Streaming processes the click logs from Kafka in real time, updating features and the recall set in real time;

1. Clickstream logs are collected with Flume, and static MySQL data is imported with Sqoop;

2. The data is stored in HDFS and a Hive data warehouse is built on it;

3. Spark reads the data files and processes the data;

4. The results are finally saved to HBase and Redis

5, Spark writes the offline files in TFRecords format

6, the model is trained with the TensorFlow Estimator

7, the model is deployed with TF Serving

10, article keywords are extracted and constructed with TF-IDF and TextRank

11, text vectors are computed with word2vec

12, tags are stored in HBase; new tags are compared with historical tags and merged using a decay coefficient

3, project business direction:

Social recommendation, e-commerce product recommendation, news feed recommendation.

4, recommended books and websites

Books

"Recommendation systems and deep learning"

"Hundred face machine learning"

"Machine learning" - Zhou Zhihua

Websites and papers:

Follow the latest recommendation system papers to keep up with how models are evolving, and learn a few new architectures each year.

Interview questions and recommended communities:

Origin blog.csdn.net/qq_35456045/article/details/104793376