Music recommendation system based on Hadoop big data

1 Introduction

Today I would like to introduce a graduation design project that I helped a former student complete: a music recommendation system based on Hadoop big data.

1.4 Hadoop advantages (4 high)

1.5 Hadoop composition (interview focus)

1.5.1 Overview of HDFS architecture

Hadoop Distributed File System, HDFS for short, is a distributed file system.

(1) NameNode (nn): Stores the metadata of files, such as file name, file directory structure, file attributes (creation time, number of replicas, file permissions), the block list of each file, and the DataNodes where each block is located.

(2) DataNode (dn): Stores the file block data and the checksum of the block data in the local file system.



(3) Secondary NameNode (2nn): Backs up NameNode metadata at regular intervals.


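Simply put, the NameNode acts as a directory or index that records where data is stored on each DataNode, while the DataNodes are where the data is actually stored. The Secondary NameNode (2nn) is like the boss's secretary: it backs up part of the data, not all of it.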
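
The NameNode's role as an index can be illustrated with a small plain-Java sketch. This is a toy model only: the class `NameNodeSketch`, its methods, and the path and block names are hypothetical and are not part of the real HDFS API. It shows that the NameNode can answer "which DataNodes hold this file's blocks" from metadata alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the metadata a NameNode keeps; not the real HDFS classes.
class NameNodeSketch {
    // File path -> ordered list of block IDs that make up the file.
    private final Map<String, List<String>> fileToBlocks = new LinkedHashMap<>();
    // Block ID -> DataNodes that hold a replica of that block.
    private final Map<String, List<String>> blockToDataNodes = new LinkedHashMap<>();

    // Registers one block of a file together with the DataNodes storing it.
    void addBlock(String path, String blockId, List<String> dataNodes) {
        fileToBlocks.computeIfAbsent(path, k -> new ArrayList<>()).add(blockId);
        blockToDataNodes.put(blockId, new ArrayList<>(dataNodes));
    }

    // Looks up block locations for a file without reading any file contents.
    Map<String, List<String>> locate(String path) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        List<String> blocks = fileToBlocks.getOrDefault(path, Collections.<String>emptyList());
        for (String blockId : blocks) {
            result.put(blockId, blockToDataNodes.get(blockId));
        }
        return result;
    }

    public static void main(String[] args) {
        NameNodeSketch nn = new NameNodeSketch();
        // Hypothetical file split into two blocks, each with three replicas.
        nn.addBlock("/music/plays.log", "blk_1", Arrays.asList("dn1", "dn2", "dn3"));
        nn.addBlock("/music/plays.log", "blk_2", Arrays.asList("dn2", "dn3", "dn4"));
        System.out.println(nn.locate("/music/plays.log"));
    }
}
```

A client would then contact the listed DataNodes directly to read the block data; the sketch leaves that step out.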

1.5.2 Overview of YARN architecture

Yet Another Resource Negotiator, YARN for short, is the resource manager of Hadoop.



1.5.3 Overview of MapReduce architecture

MapReduce divides the calculation process into two stages: Map and Reduce.

(1) The Map stage processes input data in parallel.

(2) The Reduce stage summarizes the Map results.
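
The two stages can be imitated in plain Java to make the data flow concrete. The sketch below is an assumption-laden illustration, not the project's actual job: it uses no Hadoop classes, runs sequentially on one machine, and the input format `userId,songId` and the class name `PlayCountSketch` are invented for the example. It counts how many times each song was played.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of the Map and Reduce stages; no Hadoop classes are used.
class PlayCountSketch {
    // Map stage: one input line "userId,songId" becomes one (songId, 1) pair.
    static Map.Entry<String, Integer> map(String line) {
        String songId = line.split(",")[1].trim();
        return new SimpleEntry<>(songId, 1);
    }

    // Reduce stage: sum all counts that share the same key.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Shuffle step: group the map output by key before reducing.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Integer> kv = map(line);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("u1,songA", "u2,songA", "u1,songB");
        System.out.println(run(log)); // prints {songA=2, songB=1}
    }
}
```

In a real Hadoop job the `map` calls would run in parallel on different nodes and the framework would handle the grouping step; here both are done in a single loop for readability.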

1.5.4 Relationship between HDFS, YARN and MapReduce


The user submits a task, which is sent to the ResourceManager. The ResourceManager finds a NodeManager node, opens a Container on it, and places the task's App Mstr in that Container. The App Mstr then applies to the ResourceManager for the resources it needs. The ResourceManager checks which nodes have free resources and allocates them, and the App Mstr starts computing tasks (MapTask) on the allocated nodes. This is the Map stage of MapReduce. The Reduce stage follows, and its results are returned to the corresponding nodes. This is the relationship between the three components.

Recommender systems are widely used on all kinds of websites to provide personalized recommendations for users. A recommender system needs some historical user data and generally consists of three parts: basic data, the recommendation algorithm system, and the front-end display. The basic data covers many dimensions, including user visits, browsing, orders, favorites, historical order information, and reviews. The recommendation algorithm system is mainly a recommendation model composed of multiple algorithms chosen for different recommendation needs. The front-end display mainly responds to the client system and returns the relevant recommendation information for display.

Basic data mainly includes:

  1. Metadata of the item or content to be recommended, such as keywords, genetic description, etc.;
  2. Basic information of system users, such as gender, age, etc.
  3. The user's preference for items or information, depending on the application itself, may include the user's rating of the item, the record of the user viewing the item, the user's purchase record, etc.

In fact, this user preference information can be divided into two categories:

  • Explicit user feedback: feedback that users provide deliberately, in addition to naturally browsing or using the website, such as ratings or comments on items.
  • Implicit user feedback: data generated as a by-product of using the website that implicitly reflects the user's preference for an item, such as viewing an item's information.

Explicit user feedback can accurately reflect the user's real preference for items, but it requires extra effort from the user. Implicit user behavior, after some analysis and processing, can also reflect the user's preference, but the data is less accurate and the analysis of some behaviors contains a lot of noise. As long as the right behavior features are selected, implicit feedback can also give good results, although the right features may differ greatly between applications. For example, on e-commerce websites, purchase behavior is an implicit feedback that represents user preferences well.
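
One common way to use implicit feedback is to turn behavior events into a numeric preference score. The sketch below is only an illustration of that idea for a music site: the event types (`play`, `favorite`, `skip`), their weights, and the class `FeedbackSketch` are all hypothetical and would need tuning for a real application.

```java
import java.util.HashMap;
import java.util.Map;

// Converts implicit behavior events into preference scores (illustrative weights only).
class FeedbackSketch {
    // Hypothetical weights: how strongly each event type signals preference.
    static final Map<String, Double> WEIGHTS = new HashMap<>();
    static {
        WEIGHTS.put("play", 1.0);
        WEIGHTS.put("favorite", 3.0);
        WEIGHTS.put("skip", -1.0);
    }

    // Each event is {userId, songId, eventType}; the result maps "userId:songId" to a score.
    static Map<String, Double> score(String[][] events) {
        Map<String, Double> scores = new HashMap<>();
        for (String[] e : events) {
            double w = WEIGHTS.getOrDefault(e[2], 0.0); // unknown event types add nothing
            scores.merge(e[0] + ":" + e[1], w, Double::sum);
        }
        return scores;
    }

    public static void main(String[] args) {
        String[][] events = {
            {"u1", "s1", "play"},
            {"u1", "s1", "favorite"},
            {"u1", "s2", "skip"},
        };
        System.out.println(score(events));
    }
}
```

Choosing which events to count and how to weight them is exactly the "behavior feature" selection the paragraph above refers to.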


2. Classification of recommendation engines

The classification of recommendation engines can be distinguished according to many indicators:

  1. Distinguish according to target users: According to this indicator, it can be divided into recommendation engines based on public behavior and personalized recommendation engines.
  • The recommendation engine based on public behavior gives each user the same recommendation. These recommendations can be statically set manually by the system administrator, or current popular items calculated based on the feedback statistics of all users of the system.
  • The personalized recommendation engine gives different users more accurate recommendations according to their tastes and preferences. To do so, the system needs to understand the content to be recommended and the characteristics of the users, or, based on the social network, find users with the same preferences as the current user.

This is the most basic classification of recommendation engines. In fact, most of the recommendation engines discussed by people are personalized recommendation engines, because fundamentally speaking, only personalized recommendation engines are more intelligent information discovery processes.

  2. Distinguish according to data sources: Recommendations are mainly made based on the correlation between data, because most recommendation engines work by recommending from sets of similar items or similar users.
  • Discovering the relevance of users from the basic information of system users is called Demographic-based Recommendation.
  • Discovering the relevance of items or content from the metadata of the recommended items or content is called Content-based Recommendation.
  • Discovering the relevance of the items or content themselves, or the relevance of users, from the users' preferences for items or information is called Collaborative Filtering-based Recommendation.
  3. Distinguish according to the recommendation model: In a system with a large number of items and users, the calculation load of the recommendation engine is very large. To achieve real-time recommendations, a recommendation model must be established. Recommendation models can be divided into the following kinds:
  • Based on the items and users themselves: this recommendation engine treats each user and each item as an independent entity and predicts each user's preference for each item. This information is often described by a two-dimensional matrix. Since the number of items a user is interested in is far smaller than the total number of items, such a model contains a large amount of empty data; that is, the two-dimensional matrix is usually a large sparse matrix. To reduce the amount of calculation, we can cluster items and users and then record and calculate the preference of a class of users for a class of items, but such a model loses some recommendation accuracy.
  • Rule-based Recommendation: The mining of association rules is already a classic problem in data mining, mainly to mine the dependencies of some data. The typical scenario is the "shopping basket problem". By mining, we can find out which items are often purchased at the same time, or which other items users usually buy after purchasing some items. After we mine out these association rules, we can make recommendations to users based on these rules.
  • Model-based Recommendation: This is a typical machine learning problem. Existing user preferences can be used as training samples to train a model that predicts user preferences; when users later enter the system, recommendations are calculated based on this model. The problem with this method is how to feed the user's real-time or recent preference information back into the trained model so as to improve the accuracy of the recommendation.
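
For the rule-based case in the list above, the strength of a rule "users who buy a also buy b" is commonly measured by its confidence: the share of baskets containing a that also contain b. The following is a minimal sketch under that assumption; the class `RuleSketch` and the basket contents are invented for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Computes the confidence of an association rule a -> b over a list of baskets.
class RuleSketch {
    // confidence(a -> b) = (baskets containing both a and b) / (baskets containing a)
    static double confidence(List<Set<String>> baskets, String a, String b) {
        int withA = 0;
        int withBoth = 0;
        for (Set<String> basket : baskets) {
            if (basket.contains(a)) {
                withA++;
                if (basket.contains(b)) {
                    withBoth++;
                }
            }
        }
        return withA == 0 ? 0.0 : (double) withBoth / withA;
    }

    public static void main(String[] args) {
        List<Set<String>> baskets = Arrays.asList(
            new HashSet<>(Arrays.asList("bread", "milk")),
            new HashSet<>(Arrays.asList("bread", "butter")),
            new HashSet<>(Arrays.asList("bread", "milk", "butter")),
            new HashSet<>(Arrays.asList("milk")));
        // Two of the three baskets with bread also contain milk.
        System.out.println(confidence(baskets, "bread", "milk"));
    }
}
```

A rule with high confidence (and enough supporting baskets) can then be used directly: when a user picks a, recommend b.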

In fact, very few current recommendation systems use only one recommendation strategy. Generally, different strategies are used in different scenarios to achieve the best results. For example, Amazon's recommendations combine recommendations based on the user's own purchase history, recommendations based on the items the user is currently browsing, and currently popular items based on public preferences, each shown to users in a different area of the page, so that users can find the items they are truly interested in.

3. Common recommendation algorithms

So far, in the personalized recommendation system, collaborative filtering technology is the most successful technology. At present, many large websites at home and abroad apply this technology to recommend content more intelligently for users.

3.1. User-based collaborative filtering algorithm

The first generation of collaborative filtering technology is the user-based collaborative filtering algorithm, which has achieved great success in recommendation systems but has its own limitations. The algorithm first calculates the similarity between users (birds of a feather flock together), and then recommends to user B the items purchased by a similar user A. More formally, the algorithm uses the nearest-neighbor method to find a set of neighbors for a user. The users in this set have preferences similar to the user's, and the algorithm predicts the user's preferences from those of the neighbors.

There are two problems with user-based recommendation logic: cold start and a huge amount of computation. A user-based algorithm can only recommend items that some user has already selected (purchased). On large e-commerce websites there are so many products that a considerable number of them have never been purchased by anyone, so they have no chance of being recommended. This problem is called the "cold start" of collaborative filtering. In addition, user similarity is calculated by comparing the historical behavior records of the target user with those of every other user. For an e-commerce website with tens of millions of active users, each calculation for a single user involves hundreds of millions of operations. Although a clustering algorithm can group users first, the amount of computation is still large.
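
The user-to-user comparison described above can be sketched as follows. The text does not say which similarity measure is used, so cosine similarity is an assumption here, and the class `UserCfSketch` and its method names are hypothetical. The loop in `nearestNeighbor` also makes the cost visible: finding one user's neighbor means comparing against every other user.

```java
import java.util.HashMap;
import java.util.Map;

// User-based collaborative filtering sketch: find the most similar user by cosine similarity.
class UserCfSketch {
    // Cosine similarity between two users' rating vectors (item -> rating).
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0;
        double normA = 0;
        double normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other; // only items rated by both users contribute
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the ID of the user most similar to target, or null if there is no other user.
    static String nearestNeighbor(String target, Map<String, Map<String, Double>> ratings) {
        String best = null;
        double bestSim = -1;
        for (String other : ratings.keySet()) {
            if (other.equals(target)) {
                continue;
            }
            double sim = cosine(ratings.get(target), ratings.get(other));
            if (sim > bestSim) {
                bestSim = sim;
                best = other;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> ratings = new HashMap<>();
        ratings.put("u1", new HashMap<>());
        ratings.put("u2", new HashMap<>());
        ratings.put("u3", new HashMap<>());
        ratings.get("u1").put("songA", 5.0);
        ratings.get("u1").put("songB", 3.0);
        ratings.get("u2").put("songA", 5.0);
        ratings.get("u2").put("songB", 3.0);
        ratings.get("u2").put("songC", 4.0);
        ratings.get("u3").put("songC", 5.0);
        System.out.println(nearestNeighbor("u1", ratings)); // prints u2
    }
}
```

Items that the neighbor rated highly and the target user has not seen (here `songC`) become the recommendation candidates.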


3.2. Item-based collaborative filtering algorithm

The second generation of collaborative filtering technology is the item-based collaborative filtering algorithm, which works much like the user-based one. It uses all users' preferences for items or information to find the similarity between items, and then recommends similar items to a user based on that user's historical preferences. Put simply, if several products are often purchased together, they can be considered similar. Their names may be unrelated and their attributes may differ greatly, but the model still judges them to be similar from the purchase data alone. This may seem surprising, but it works.

For example, suppose user A likes item A and item C, user B likes item A, item B and item C, and user C likes item A. From these users' historical preferences, it can be inferred that item A and item C are similar: people who like item A also like item C. Based on this data, user C is likely to like item C as well, so the system recommends item C to user C.
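
The example above can be reproduced with a simple co-occurrence count. This is a sketch, assuming that "similar" means "liked together by the same user" and that the candidate with the most co-occurrence votes wins; the class `ItemCfSketch` and the user and item labels are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Item-based collaborative filtering sketch using co-occurrence counts.
class ItemCfSketch {
    // Returns the unseen item most often liked together with the user's items, or null if none.
    static String recommend(String user, Map<String, Set<String>> likes) {
        Set<String> own = likes.getOrDefault(user, Collections.<String>emptySet());
        Map<String, Integer> votes = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : likes.entrySet()) {
            if (e.getKey().equals(user)) {
                continue;
            }
            Set<String> theirs = e.getValue();
            for (String liked : own) {
                if (!theirs.contains(liked)) {
                    continue;
                }
                // Every unseen item liked together with `liked` gets one co-occurrence vote.
                for (String candidate : theirs) {
                    if (!own.contains(candidate)) {
                        votes.merge(candidate, 1, Integer::sum);
                    }
                }
            }
        }
        String best = null;
        int bestVotes = 0;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() > bestVotes) {
                bestVotes = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> likes = new HashMap<>();
        likes.put("userA", new HashSet<>(Arrays.asList("A", "C")));
        likes.put("userB", new HashSet<>(Arrays.asList("A", "B", "C")));
        likes.put("userC", new HashSet<>(Arrays.asList("A")));
        System.out.println(recommend("userC", likes)); // prints C
    }
}
```

Item C co-occurs with item A for two users (A and B) while item B co-occurs only once, so C is recommended to user C, matching the reasoning in the example.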


In fact, the item-based collaborative filtering mechanism is a strategy that Amazon developed as an improvement on the user-based mechanism. On most websites the number of items is far smaller than the number of users, and the number of items and the similarities between them are relatively stable, so the item-based mechanism performs better in real time than the user-based one. But not all scenarios are like this. In some news recommendation systems, the number of items (news articles) may exceed the number of users, and news is updated very quickly, so the item similarities are unstable. The choice of recommendation strategy therefore depends heavily on the specific application scenario.

2 Design outline

This system is built with Java, modeled with UML, designed around the Spring Boot framework, and uses a MySQL database to store data.

The functions of this system mainly include:

  1. User registration and login
  2. Information maintenance
  3. Member search
  4. Personalized recommendation
  5. Information management by the administrator
  6. Music management
  7. Collaborative filtering recommendation
  8. Data analysis with Hadoop
  9. Data computation with MapReduce

3 Key technologies of the system

Development uses Spring Boot, Vue, MySQL, MyBatis, TypeScript, HTML, CSS, JavaScript, etc.

4 Development Tools

Development tools mainly include: IntelliJ IDEA, JDK 1.8, Maven, MySQL 5.7, Navicat, etc.

5 Code display

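import java.io.IOException;
// javax.servlet is assumed here because the project targets JDK 1.8
import javax.servlet.http.HttpServletRequest;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Scope;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

// Strategy, StrategyService, User, ObjDat and JsonResult are the project's own classes.
@RequestMapping("/strategy")
@RestController
@Scope("prototype")
public class StrategyController {

    @Autowired
    private StrategyService strategyService;

    // Upload directory, read from the web.upload-path property
    @Value("${web.upload-path}")
    private String path;

    // Paged query; page - 1 shifts the 1-based request page for the service call
    @RequestMapping("/findPage")
    public ObjDat<Strategy> findPage(Strategy strategy,
                                     @RequestParam(value = "page", defaultValue = "1") int page,
                                     @RequestParam(value = "limit", defaultValue = "10") int limit) {
        return strategyService.findPage(strategy, page - 1, limit);
    }

    // Edits a strategy; requires a logged-in user in the session
    @RequestMapping("/edit")
    public JsonResult edit(HttpServletRequest request, Strategy strategy) throws IOException {
        User user = (User) request.getSession().getAttribute("user");
        if (user == null) {
            return JsonResult.error("请登录"); // "Please log in"
        }
        String str = strategyService.edit(request, strategy);
        // The service returns "成功" ("success") on success; constant-first equals is null-safe
        if ("成功".equals(str)) {
            return JsonResult.success("操作成功"); // "Operation succeeded"
        } else {
            return JsonResult.error("操作失败"); // "Operation failed"
        }
    }
}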

6 System function description

Project function demo


Origin blog.csdn.net/qq_42135426/article/details/128472541