Table of Contents
1. Overview and deployment of the recommendation system
2. Feature analysis for a news recommendation system
3. Analysis and selection of collaborative filtering algorithms
4. System evaluation and safety
One: Overview and Deployment of the Recommendation System
First, be clear about what a recommendation system is, or rather, what kind of problem it solves.
A news recommendation system addresses the relationship among information, users, and the environment. It jointly analyzes user features, environment features, and article features, and recommends the most suitable and effective content to each user.
Positioning of the recommendation system within the service platform
As the saying goes, you cannot make bricks without straw. Not only news recommendation systems but almost all artificial intelligence models depend on the support of big data components.
Building a "thousand users, thousand faces" (fully personalized) recommendation system requires big data support. Possible components include Spark and Hadoop, along with supporting components such as Elasticsearch (ES) and Kafka. I will not analyze them in detail here. Each has strengths and weaknesses, and the choice should follow the actual business environment: the goal is not the best tool, but the most suitable one.
The rough position of the recommendation system within the overall system is as follows.
Core modules of the recommendation system
The system is a closed loop. Collected data first goes through ETL and then through feature engineering. The resulting features are passed to the recommendation algorithms for training and prediction, which generate recommendation results. These results are delivered to users, and user feedback flows back in as new data, so the model is continuously optimized and updated. This is shown in the figure.
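To make the loop concrete, here is a minimal Python sketch under heavy simplifying assumptions. The raw data is a hypothetical click log of (user_id, article_id, clicked) tuples, and the "feature" is each article's impression and click counts. The "model" is only a smoothed click-through rate. A real system would run these steps on the big data components mentioned above, and all function names here are illustrative.

```python
from collections import defaultdict


def etl(raw_events):
    """ETL: keep only well-formed (user_id, article_id, clicked) events."""
    return [e for e in raw_events if len(e) == 3 and isinstance(e[2], bool)]


def build_features(events):
    """Feature engineering: impressions and clicks per article."""
    stats = defaultdict(lambda: [0, 0])  # article_id -> [impressions, clicks]
    for _user, article, clicked in events:
        stats[article][0] += 1
        stats[article][1] += int(clicked)
    return stats


def train(stats, prior=1.0):
    """Training: a smoothed click-through rate per article (toy model)."""
    return {a: (clicks + prior) / (views + 2 * prior)
            for a, (views, clicks) in stats.items()}


def recommend(model, seen, k=2):
    """Prediction: the top-k articles the user has not seen yet."""
    candidates = [a for a in model if a not in seen]
    return sorted(candidates, key=lambda a: (-model[a], a))[:k]


def closed_loop(log, feedback_batches):
    """Retrain after each batch of user feedback is appended to the log."""
    model = train(build_features(etl(log)))
    for batch in feedback_batches:
        log = log + batch                        # feedback becomes new data
        model = train(build_features(etl(log)))  # the model is updated
    return model
```

Each pass through `closed_loop` repeats ETL, feature building, and training on the grown log. This is the "continuously optimized and updated" step in miniature.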
Analysis of common features
Handling features well improves recommendation accuracy. Features generally include relevance features, environment features, popularity (heat) features, and collaborative features. Which ones to use depends on the specific business logic.
Recall and ranking strategy
How recall results are generated must be considered together with ranking. I personally recommend running multiple recall channels. This avoids the limitations of any single algorithm and effectively prevents recommendations from becoming ever narrower and more monotonous.
Concretely, assign different weights to the results of each recall channel, then down-weight them by time decay, and finally apply a penalty to hot (trending) items.
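Here is a minimal sketch of this fusion step, assuming each recall channel returns candidates with its own relevance score. The channel weights, the 24-hour half-life for exponential time decay, and the hot-item threshold and penalty factor are all hypothetical parameters, not values from this article. When an article arrives from several channels, this sketch keeps its best score; summing the scores is an equally plausible choice.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    article_id: str
    channel: str       # recall channel that produced it (hypothetical names, e.g. "cf", "hot")
    score: float       # the channel's own relevance score, assumed non-negative
    age_hours: float   # time since publication
    popularity: int    # e.g. recent click count


def merge_recall(candidates, channel_weights, half_life_hours=24.0,
                 hot_threshold=1000, hot_penalty=0.5):
    """Fuse multi-channel recall results into one ranked list of article ids."""
    best = {}
    for c in candidates:
        s = channel_weights.get(c.channel, 0.0) * c.score    # per-channel weight
        s *= 0.5 ** (c.age_hours / half_life_hours)           # exponential time decay
        if c.popularity >= hot_threshold:
            s *= hot_penalty                                  # hot-item penalty
        # The same article may come from several channels; keep its best score.
        best[c.article_id] = max(best.get(c.article_id, 0.0), s)
    return sorted(best, key=lambda a: (-best[a], a))
```

Applying decay and the hot penalty after channel weighting means a fresh, niche article from a low-weight channel can still outrank an old or heavily trending one. This is the diversity effect the multi-channel strategy aims for.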
Two: Feature Analysis for News Recommendation
Feature extraction for news recommendation inevitably relies on NLP as its philosopher's stone.
Applications of text analysis in news recommendation
User interests: tag users who like Games-related articles with Games.
Content recommendation: recommend content related to [the film] to users who like [the film].
Channel generation: classify [Finance] articles into the [CNBC] channel.
Text features
News is a special kind of product that demands real-time content, so new articles have little historical consumption data. Without text features, cold start becomes a problem.
Explicit features generally include manual categories, entity words, and keywords. Implicit features can be obtained with methods such as LSA.
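As one illustration of an explicit keyword feature, the sketch below scores keywords with TF-IDF. TF-IDF is a swapped-in standard technique, not one this article prescribes. The sketch also assumes the text is already tokenized; Chinese news would first need word segmentation.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k TF-IDF keywords for each tokenized document."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each word once per doc
    result = []
    for tokens in docs:
        tf = Counter(tokens)
        # term frequency * inverse document frequency; words in every doc score 0
        scores = {w: (c / len(tokens)) * math.log(n / df[w]) for w, c in tf.items()}
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result.append(ranked[:top_k])
    return result
```

Words that are frequent in one article but rare across the corpus rank highest. These keywords can then serve as article tags for interest matching and channel classification.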
Value of text features:
Fine-grained cold-start ability, for example relating [Forbidden City Travel Guide] to [Beijing Travel Tips].
Without text features, search engines cannot work.
Models for text features:
I personally recommend BERT. It has achieved outstanding results in question answering, sentiment analysis, named entity recognition, spam filtering, document clustering, and other tasks, all of which help with news recommendation.
News business needs: keyword extraction, topic extraction, and subsequent filtering of junk news.
In addition, LDA topic models work well for recommending similar articles and are worth considering.
Three: Analysis and Selection of Collaborative Filtering Algorithms
Typical recommendation algorithms
This article focuses on recommender systems based on collaborative filtering and does not compare or explain other algorithms.
Comparison of the two classic collaborative filtering algorithms
Collaborative filtering can be divided into user-based and item-based collaborative filtering. Which one to use should be decided by analyzing your own usage scenario; the two are compared below.
Item-based collaborative filtering
Scenario: mainly e-commerce and other item-centric applications.
Characteristics: the number of users is greater than the number of items, so recommendations are made by analyzing items.
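The item-based approach can be sketched in a few lines. This sketch assumes implicit binary feedback (a user either read an item or did not) and uses cosine similarity between items' user sets. Both are common choices, but they are assumptions here, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def item_similarity(user_items):
    """Cosine similarity between items over binary user-interaction vectors."""
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)
    sim = defaultdict(dict)
    for i in item_users:
        for j in item_users:
            if i != j:
                common = len(item_users[i] & item_users[j])
                if common:
                    sim[i][j] = common / math.sqrt(len(item_users[i]) * len(item_users[j]))
    return sim


def recommend_items(user, user_items, sim, k=3):
    """Score unseen items by summed similarity to items the user already has."""
    seen = user_items[user]
    scores = defaultdict(float)
    for i in seen:
        for j, s in sim.get(i, {}).items():
            if j not in seen:
                scores[j] += s
    return sorted(scores, key=lambda j: (-scores[j], j))[:k]
```

Because the item-item similarity table depends only on the relatively small, stable item catalog, it can be precomputed offline. This is why the item-based variant suits e-commerce.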
User-based collaborative filtering
Scenario: news and other content-based platforms.
Characteristics: the amount of content is far greater than the number of users, so recommendations are made by analyzing users.
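A minimal user-based sketch follows, assuming explicit ratings stored as `{user: {item: rating}}` and cosine similarity between users. The neighbor count and the similarity-weighted score are illustrative choices, not settings from this article.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    common = a.keys() & b.keys()
    dot = sum(a[i] * b[i] for i in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def user_based_recommend(target, ratings, n_neighbors=2, k=3):
    """Recommend items rated by the users most similar to `target`."""
    mine = ratings[target]
    neighbors = sorted(
        ((cosine(mine, r), u) for u, r in ratings.items() if u != target),
        reverse=True,
    )[:n_neighbors]
    scores = {}
    for sim, u in neighbors:
        if sim <= 0:
            continue  # ignore users with no overlap
        for item, r in ratings[u].items():
            if item not in mine:
                # weight each neighbour's rating by how similar they are
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]
```

On a news platform, the set of users changes more slowly than the flood of new articles. Comparing users therefore lets a fresh article be recommended as soon as similar users have read it.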
User profile
User-based collaborative filtering requires some understanding of users. For in-depth analysis, consider building user profiles that cover:
Demographic attributes - who the user is (gender, age, etc.).
Interest preferences - personal preferences, brand preferences.
Social attributes - social activity.
Consumption attributes - consumer demand, spending habits.
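A user profile can be sketched as a simple data structure. The field names below are hypothetical placeholders for the four dimensions listed above. The interest counter follows the tagging idea from the text-analysis section: each article the user reads adds weight to that article's tags.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserProfile:
    """Hypothetical user profile covering the four dimensions above."""
    user_id: str
    gender: Optional[str] = None                          # demographic
    age: Optional[int] = None                             # demographic
    interests: Counter = field(default_factory=Counter)   # interest tags -> weight
    social_activity: int = 0                              # social, e.g. shares + comments
    spending: float = 0.0                                 # consumption, e.g. total paid

    def record_read(self, article_tags):
        """Add one unit of interest for every tag on an article the user read."""
        self.interests.update(article_tags)

    def top_interests(self, k=3):
        """Most-read tags first; ties broken alphabetically."""
        ranked = sorted(self.interests.items(), key=lambda kv: (-kv[1], kv[0]))
        return [tag for tag, _ in ranked[:k]]
```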
Common methods for computing similarity
To sum up the above: collaborative filtering ultimately computes, for a given user or item, which other users or items are most similar to it. It then recommends related content based on those similar users or items.
This "similarity" is generally computed with one of the following methods:
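Euclidean distance: $d(x,y)=\sqrt{\sum_i (x_i-y_i)^2}$; the smaller the distance, the more similar the vectors.
Pearson correlation coefficient: $r(x,y)=\frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2}\,\sqrt{\sum_i (y_i-\bar{y})^2}}$, ranging from -1 to 1.
Cosine similarity: $\cos(x,y)=\frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}}$.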
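The three measures translate directly into code for dense, equal-length vectors. Turning a Euclidean distance into a similarity via 1 / (1 + d) is a common convention assumed here, not something this article specifies.

```python
import math


def euclidean_similarity(x, y):
    """Map Euclidean distance into (0, 1]; identical vectors give 1.0."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (1.0 + d)


def pearson(x, y):
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0  # constant vector -> 0.0


def cosine(x, y):
    """Cosine of the angle between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0
```

For x = [1, 2, 3] and y = [2, 4, 6], cosine and Pearson both give 1, but the Euclidean similarity is well below 1. Cosine ignores scale, and Pearson also removes each user's average rating level. This is why Pearson is often preferred when users rate on different personal scales.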
Comparative analysis:
Only two recommended libraries are compared here; other options are not covered.
Spark ML vs. Mahout:
Aspect | Spark ML | Mahout
Language | Scala, Python, Java | Java
Positioning | Machine learning library of Spark | Java library
Scope | Conventional machine learning algorithms such as CF, plus feature engineering and data processing | CF, clustering, classification
Here I personally recommend Spark ML: its documentation is comprehensive and easy to consult, and it is more mature.
Four: System Evaluation and Safety
Factors that may affect evaluation results, and points to watch:
Improvements to the recommendation system architecture, improvements to the recall model, additional recommendation features, and algorithm parameter tuning.
When building a system, what matters is not only whether it works, but how well it performs in practice and whether each improvement actually helps.
Note: track both long-term and short-term metrics, watch for interaction effects between changes, and isolate the statistics when necessary.
A/B test evaluation:
Just as the recommendation system runs as a closed loop, A/B testing is a closed-loop workflow. Analyze the test data, feed improvement ideas into new tests, adopt what works, and learn from what does not.
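As one way to analyze the test data, the sketch below compares click-through rates between a control bucket (A) and a treatment bucket (B) with a two-sided two-proportion z-test. This article does not name a metric or a statistical test. CTR and this test are assumptions, and the normal approximation needs reasonably large samples.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference in CTR between buckets A and B."""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)  # CTR under "no difference"
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value
```

For example, 100 clicks in 1,000 views against 130 clicks in 1,000 views gives z of about 2.1 and p of about 0.035. Under a 0.05 threshold, the change would be kept; otherwise it would be recorded as a lesson.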
A/B test implementation principles
Put simply: follow the single-variable principle and state the expected goal clearly. Then, with only that one variable changed, run a gray-scale (gradual rollout) test and correct the plan based on the feedback.
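A gray-scale rollout needs a stable way to decide which users see the change. A common approach, assumed here rather than taken from this article, hashes the user id together with the experiment name into 100 buckets. The first N buckets receive the treatment, so raising N widens the rollout without reshuffling existing users. The experiment name in the usage comment is hypothetical.

```python
import hashlib


def bucket(user_id, experiment, buckets=100):
    """Deterministically map a user to a bucket in [0, buckets) for one experiment."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % buckets


def assign_variant(user_id, experiment, treatment_percent):
    """Users in the first `treatment_percent` buckets get the single changed variable."""
    return "treatment" if bucket(user_id, experiment) < treatment_percent else "control"


# Usage (hypothetical experiment name): start at 5%, then widen to 20%;
# users already in treatment at 5% stay in treatment at 20%.
# assign_variant("user-42", "rerank-v2", 5)
```

Salting the hash with the experiment name keeps assignments independent across experiments. This helps isolate the statistics when several tests run at once.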
Risk identification and content security:
A news recommendation system must also review news content while recommending it. It applies a do-not-recommend filtering strategy to low-quality content so that the recommended news is effective and useful. Points to note include the following:
Filter low-quality content (e.g., content that users have rated as poor quality).
Filter illegal content.
Identify and filter vulgar content.
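The three filters can be chained as a final gate before results reach users. In the sketch below, the rating threshold, the minimum vote count, and the substring blocklist are hypothetical stand-ins. Production systems would typically use trained text classifiers for illegal and vulgar content rather than keyword matching.

```python
from dataclasses import dataclass


@dataclass
class Article:
    article_id: str
    text: str
    avg_rating: float  # mean user quality rating, e.g. on a 1-5 scale
    num_ratings: int


def is_recommendable(article, blocked_terms, min_rating=2.5, min_votes=20):
    """Return False if the article should be withheld from recommendation."""
    # 1. Low-quality filter: only trust the average once enough users have rated it.
    if article.num_ratings >= min_votes and article.avg_rating < min_rating:
        return False
    # 2 and 3. Illegal or vulgar content: naive match against lowercase blocklist terms.
    lowered = article.text.lower()
    return not any(term in lowered for term in blocked_terms)


def filter_candidates(articles, blocked_terms):
    """Keep only the candidates that pass every filter, preserving order."""
    return [a for a in articles if is_recommendable(a, blocked_terms)]
```

The minimum vote count keeps a handful of early bad ratings from burying a new article. This matters for news, where fresh content has little feedback at first.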