Sharing ideas on news recommendation based on collaborative filtering

Contents

  1. Recommendation system overview and deployment

  2. Analysis of news recommendation features

  3. Analysis of collaborative filtering algorithms

  4. System evaluation and safety

             

 

 

One: Recommendation system overview and deployment

    First, be clear about what a recommendation system is, or in other words, what kind of problem it solves.

        A news recommendation system deals with the relationship between news, users, and the environment. As the figure shows, it analyzes user features, environment features, and article features together, and recommends the most suitable and effective content to the user.

              

 

 

    Where the recommendation system sits in the service platform

        As the saying goes, you cannot make bricks without straw. This applies not only to news recommendation systems: almost every artificial intelligence model depends on the support of big data components.

        Building a recommendation system that shows "a thousand faces to a thousand users" requires big data support. The components involved may include Spark and Hadoop, plus supporting components such as ES and Kafka. I will not analyze them in detail here. Each has strengths and weaknesses, and the choice should follow the actual business environment: pick the most suitable one, not the "best" one.

        The rough position of the recommendation system within the whole system is as follows.

             

 

     Core modules of the recommendation system

         The system is a closed loop. It starts with data collection; the collected material goes through an ETL process and then feature engineering; the features are handed to the recommendation algorithm for training and prediction, which generates the recommendation results; the results are delivered to the user, and the user's feedback becomes new data that keeps updating and optimizing the model. As shown in the figure.

            

 

      Analysis of common features

          Feature processing helps recommendation accuracy. Features generally include relevance features, environment features, popularity features, and collaborative features. Which features to use depends on the specific business logic.

              

 

     Recall and ranking strategy

         How to recall candidates and how to rank them both need consideration. My personal advice is to run multiple recall channels. This avoids the limitations of a single algorithm and effectively addresses the problem of recommendations becoming narrower and more monotonous over time.

         Specifically, assign weights to the results recalled through different channels, then lower those weights with time decay at re-ranking time, which also penalizes stale hot items.
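
         The weighting and time decay described above can be sketched roughly as follows. This is only a minimal illustration, not the implementation of any particular system: the channel names, the channel weights, and the 24-hour half-life are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical channel weights; real values would come from tuning.
CHANNEL_WEIGHTS = {"user_cf": 1.0, "content": 0.8, "hot": 0.5}
HALF_LIFE_HOURS = 24.0  # assumed: an item's score halves every 24 hours


def time_decay(age_hours, half_life=HALF_LIFE_HOURS):
    """Exponential decay factor in (0, 1]; 1.0 for a brand-new item."""
    return 0.5 ** (age_hours / half_life)


def merge_recalls(recalls, age_hours, top_n=3):
    """Merge several recall lists into one ranked list.

    recalls:   {channel: [(item_id, raw_score), ...]}
    age_hours: {item_id: hours since publication}
    """
    merged = defaultdict(float)
    for channel, items in recalls.items():
        weight = CHANNEL_WEIGHTS.get(channel, 0.0)
        for item_id, score in items:
            merged[item_id] += weight * score
    # Apply the decay once per item, after the channel scores are summed.
    ranked = [(item, s * time_decay(age_hours[item])) for item, s in merged.items()]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


recalls = {
    "user_cf": [("n1", 0.9), ("n2", 0.6)],
    "content": [("n2", 0.7), ("n3", 0.5)],
    "hot": [("n4", 1.0)],
}
age_hours = {"n1": 48.0, "n2": 0.0, "n3": 24.0, "n4": 72.0}
result = merge_recalls(recalls, age_hours)
```

         In this toy data the fresh item n2, found by two channels, ranks first, while the three-day-old hot item n4 drops out of the top three.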

      

 

 

 Two: Analysis of news recommendation features

    The nature of news recommendation means that NLP is its indispensable philosopher's stone.

     Applications of text analysis in news recommendation

            User interests: tag users who like game-related articles with a [Games] label.

            Content recommendation: recommend [Film] articles to users who like [Film].

            Channel generation: classify [Finance] articles into the [Finance] channel.

     Text features

            News is a special kind of product: content has to be timely, so there is little historical consumption data for any given article. Without text features, the cold-start problem cannot be solved.

            In general, use explicit features such as manual classification, entity words, and keywords, and use implicit features for part of the work (e.g., LSA).

              

 

 

      Value of text features:

              Fine-grained start-up capability, for example: [Forbidden City travel guide] and [Beijing travel guide].

              Without text features, search engines cannot work.

 

      Models for text features:

              Personally I recommend BERT. BERT has achieved outstanding results in question answering, sentiment analysis, named entity recognition, spam filtering, document clustering, and other tasks, all of which help with news recommendation.

              What news recommendation needs from it: extracting keywords, capturing topics, and later filtering out junk news.
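
              To make the keyword step concrete, here is a minimal sketch that uses plain TF-IDF instead of BERT, simply because it fits in a few lines of standard-library Python. The sample sentences and the whitespace tokenization are assumptions for illustration; real news text (especially Chinese) would need a proper tokenizer and a stop-word list.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=2):
    """Return the top_k highest TF-IDF terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    keywords = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by score (descending), then alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords


docs = [
    "stock market rises stock traders cheer".split(),
    "football team wins league cup".split(),
    "market traders watch football cup".split(),
]
keywords = tfidf_keywords(docs)
```

              Terms that are frequent in one article but rare across the collection score highest, so "stock" comes out as the first keyword of the first sample article.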

                  

 

         

            In addition, article topics generated by an LDA topic model give good results for similar-article recommendation and are worth considering.

                  

 

 

Three: Analysis and choice of collaborative filtering algorithms

      Typical recommendation algorithms

          This article focuses on recommender systems based on collaborative filtering and does not compare or explain other algorithms.

          

 

 

      Comparison of the two classical collaborative filtering algorithms

          Collaborative filtering can be divided into user-based collaborative filtering and item-based collaborative filtering. Which one to use depends on your own analysis of the usage scenario. The two are compared later in this section.

            

 

     Item-based collaborative filtering

          Scenario: e-commerce and other platforms where items are the main focus.

          Features: the number of users is greater than the number of items, so recommendations are made by analyzing items.
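
          A minimal sketch of the item-based idea, assuming binary "liked" data and cosine similarity between items. The user names, item names, and function names are hypothetical, and a production system would precompute the item similarities offline rather than on every call.

```python
import math
from collections import defaultdict


def item_similarities(user_items):
    """Cosine similarity between items, from the set of items each user liked."""
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)
    sims = defaultdict(dict)
    for a in item_users:
        for b in item_users:
            if a == b:
                continue
            common = len(item_users[a] & item_users[b])
            if common:
                sims[a][b] = common / math.sqrt(len(item_users[a]) * len(item_users[b]))
    return sims


def recommend_items(user, user_items, top_n=2):
    """Score unseen items by their similarity to items the user already liked."""
    sims = item_similarities(user_items)
    seen = user_items[user]
    scores = defaultdict(float)
    for item in seen:
        for other, sim in sims[item].items():
            if other not in seen:
                scores[other] += sim
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


user_items = {
    "alice": {"phone", "case"},
    "bob": {"phone", "case", "charger"},
    "carol": {"phone", "charger"},
    "dave": {"book"},
}
recs = recommend_items("alice", user_items)
```

          Here "charger" is recommended to alice because it co-occurs with both items she already liked, while "book" shares no users with them and is never scored.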

              

 

    User-based collaborative filtering

        Scenario: news and other content-based platforms.

        Features: the amount of content is far greater than the number of users, so content is recommended by analyzing users.
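
        The user-based variant can be sketched in the same spirit: find the users whose reading history is most similar, then recommend what they read. This is an illustrative sketch only; the reading data, the neighbourhood size k, and the function names are assumptions.

```python
import math
from collections import defaultdict


def user_similarity(a_items, b_items):
    """Cosine similarity between two users' sets of read articles."""
    if not a_items or not b_items:
        return 0.0
    return len(a_items & b_items) / math.sqrt(len(a_items) * len(b_items))


def recommend_for_user(user, reads, k=2, top_n=2):
    """Recommend articles read by the k most similar users but not by `user`."""
    sims = [
        (user_similarity(reads[user], items), other)
        for other, items in reads.items()
        if other != user
    ]
    neighbours = sorted(sims, key=lambda p: (-p[0], p[1]))[:k]
    scores = defaultdict(float)
    for sim, other in neighbours:
        if sim == 0.0:
            continue  # a neighbour with no overlap carries no signal
        for article in reads[other] - reads[user]:
            scores[article] += sim
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_n]


reads = {
    "u1": {"a1", "a2"},
    "u2": {"a1", "a2", "a3"},
    "u3": {"a2", "a4"},
    "u4": {"a5"},
}
recs = recommend_for_user("u1", reads)
```

        Because only user-to-user similarity is needed, a newly published article can be recommended as soon as a few similar users have read it, which suits fast-moving news content.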

            

 

     User portrait

         User-based collaborative filtering requires some understanding of the user. For a deeper analysis, consider building user portraits.

        Demographic attributes - who the user is (gender, age, etc.).

        Interest preferences - personal preferences, brand preferences.

        Social attributes - social activity.

        Consumption attributes - consumer needs, spending habits.

              

 

 

      Several ways to compute similarity

          To sum up the material above: collaborative filtering ultimately computes how similar a user or item is to other users or items, and then recommends related content based on the similar users or items.

          This "similarity" is generally computed with one of the following methods:

          Euclidean distance:

          

 

          Pearson correlation coefficient:

          

          Cosine similarity:
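
          The three measures named above can be written as plain functions over two equal-length rating vectors. This is a sketch using the standard textbook definitions; the sample vectors are made up for illustration. Note that Euclidean distance is a distance (smaller means more similar), while the other two are similarities (larger means more similar).

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two vectors; 0 means identical."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_correlation(x, y):
    """Correlation in [-1, 1] after removing each vector's own mean."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / (norm_x * norm_y)


def cosine_similarity(x, y):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


# Two users who rank the same three articles in the same order,
# but one of them rates everything one point lower.
u = [5.0, 3.0, 4.0]
v = [4.0, 2.0, 3.0]
distance = euclidean_distance(u, v)
pearson = pearson_correlation(u, v)
cosine = cosine_similarity(u, v)
```

          For these two users the Pearson coefficient is 1 (up to floating-point rounding) because it ignores the constant offset in their rating habits, whereas the Euclidean distance still reports a gap of the square root of 3.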

          

 

     Comparative analysis:

          Here is a comparison of the two recommendation algorithms. I will not repeat the other details.

          

 

 

 

    Spark ML and Mahout: comparison and choice

        Spark ML:

        Language: Scala, Python, Java

        Positioning: the machine learning library in Spark

        Field: conventional machine learning algorithms such as CF, feature engineering, data processing

        Mahout:

        Language: Java

        Positioning: Java library

        Field: CF, clustering, classification

        Personally I recommend Spark ML here, because its development documentation is comprehensive and easy to consult, and it is more mature.

 

 Four: System evaluation and safety

    Factors that may affect results, and key points of evaluation:

        Improvements to the recommendation system architecture, improvements to the recall model, additional recommendation features, and algorithm parameter tuning.

        Whether a system has been built successfully depends not only on whether it works, but also on how well it performs in practice and whether each change actually improves it.

        Note: consider both long-term and short-term metrics, watch for interaction effects between changes, and isolate the statistics if necessary.

 

 

    A/B test evaluation:

        Just as the recommendation system runs as a closed loop, A/B testing is a closed-loop workflow. Analyze the test data, propose improvement ideas, and feed them back into the test. Apply what proves effective to the product, and learn from what does not.

        

 

 

    A/B test implementation principles

         Put simply, follow the single-variable principle: be clear about the expected goal, change only one variable in pursuit of that goal, run a grayscale (gradual rollout) test, and correct the plan based on the feedback.
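
         One common way to run such a grayscale test is to assign users to groups by hashing their ID, so that a fixed share of traffic sees the changed variable and every user stays in the same group across visits. The sketch below is a minimal illustration under that assumption; the experiment name, the 10% share, and the function names are hypothetical.

```python
import hashlib


def assign_bucket(user_id, experiment, treatment_share=0.1):
    """Deterministically assign a user to 'treatment' or 'control'.

    Hashing experiment + user_id keeps each user in the same group on every
    visit and gives different experiments independent assignments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Map the first 8 hex digits to a number in [0, 1).
    position = int(digest[:8], 16) / 0x100000000
    return "treatment" if position < treatment_share else "control"


def click_through_rate(clicks, impressions):
    """Clicks divided by impressions; 0.0 when nothing was shown."""
    return clicks / impressions if impressions else 0.0


users = [f"user{i}" for i in range(1000)]
groups = [assign_bucket(u, "recall_weights_v2") for u in users]
treatment_count = groups.count("treatment")
```

         Raising treatment_share step by step widens the rollout, and comparing a metric such as click_through_rate between the two groups shows the effect of the single changed variable.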

        

 

    Risk identification and content safety:

        While recommending, a news recommendation system also needs to check the news content and apply a filter-and-do-not-recommend strategy to low-quality content, so that the recommended news is effective and useful. Pay attention to points such as the following:

        Filter low-quality content out of recommendations (e.g., content that users have marked as poor quality).

        Illegal content filtering.

        Vulgar content identification and filtering.
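
        The three checks above can be chained into one filter that runs before candidates reach the ranking stage. This is a deliberately simplified sketch: the term lists are placeholders, the 30% threshold is an assumption, and real systems usually rely on trained classifiers and human review rather than keyword matching.

```python
from dataclasses import dataclass

# Placeholder terms for illustration only; not a real blocklist.
ILLEGAL_TERMS = {"illegal_term"}
VULGAR_TERMS = {"vulgar_term"}
MAX_LOW_QUALITY_RATIO = 0.3  # assumed threshold


@dataclass
class Article:
    article_id: str
    text: str
    low_quality_flags: int = 0  # times users marked it as poor quality
    impressions: int = 0


def rejection_reason(article):
    """Return why an article must not be recommended, or None if it is fine."""
    words = set(article.text.lower().split())
    if words & ILLEGAL_TERMS:
        return "illegal"
    if words & VULGAR_TERMS:
        return "vulgar"
    if (article.impressions
            and article.low_quality_flags / article.impressions > MAX_LOW_QUALITY_RATIO):
        return "low_quality"
    return None


def filter_candidates(articles):
    """Keep only the articles that pass every check."""
    return [a for a in articles if rejection_reason(a) is None]


candidates = [
    Article("a1", "markets rally today", low_quality_flags=1, impressions=100),
    Article("a2", "contains illegal_term here"),
    Article("a3", "ordinary text", low_quality_flags=40, impressions=100),
    Article("a4", "some vulgar_term inside"),
]
kept = filter_candidates(candidates)
```

        Returning a reason instead of a bare boolean makes it easier to log why each article was dropped and to audit the filter later.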

 

 

 

 

  This is my general understanding of such a system. Some of the pictures were taken from the web because my own drawings were really ugly. I also welcome your advice and criticism on anything I have misunderstood; it is much appreciated.


Origin www.cnblogs.com/Hsir/p/11440983.html