1 Introduction

With the rapid development of the Internet, the volume of network data keeps growing, and we have entered the era of big data. Users are presented with a huge amount of information and products at once, which creates a serious problem: information overload. Personalized recommendation is an effective way to address it. Collaborative filtering, which recommends items to a user based on the preferences of a group of users, is currently the most widely used personalized recommendation method. Traditional stand-alone collaborative filtering algorithms, however, can no longer meet the efficiency and computational demands of processing massive data. The development of cloud computing offers a new research direction for recommendation algorithms, so big data technology can be brought in to address problems such as algorithm scalability.

This paper studies and implements a clustering-based collaborative filtering recommendation algorithm on top of Hadoop big data processing technology, and applies it to a movie data set. It covers the two main frameworks of Hadoop, classic clustering algorithms, and the basic concepts of recommendation algorithms. It then proposes a distributed clustering collaborative filtering recommendation algorithm based on Hadoop to address the data sparsity and scalability problems of collaborative filtering. To deal with sparsity, the initial data is preprocessed with matrix decomposition, and the preprocessed data is then used to build a clustering model with a clustering algorithm.
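To make the idea of recommending "based on group preferences" concrete, the following is a minimal sketch of how two users' ratings can be compared. The paper does not state which similarity measure it uses; cosine similarity is assumed here, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: names are hypothetical and cosine similarity is an assumption.
final class CosineSimilaritySketch {

    // Cosine similarity between two users' sparse rating vectors (movieId -> rating).
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other; // only movies rated by both users contribute
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        // A user with no ratings has no defined similarity; treat it as 0.
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    private static double norm(Map<Integer, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<Integer, Double> alice = new HashMap<>();
        alice.put(1, 5.0);
        alice.put(2, 3.0);
        Map<Integer, Double> bob = new HashMap<>();
        bob.put(1, 4.0);
        bob.put(3, 2.0);
        System.out.println(cosine(alice, bob));
    }
}
```

Users whose similarity is high form the "group" whose ratings drive the recommendation. Because most users rate only a few movies, many pairs share no rated movie at all, which is the sparsity problem the preprocessing step is meant to reduce.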

2 Design outline

1 Introduction

The design and implementation of recommendation systems face the same big data problems. Much of the existing user data comes from website logs, and for e-commerce websites with heavy traffic the volume of this data is very large. Processing user information and product data on a single machine is usually infeasible, so both the algorithms and the data storage of a recommendation system need a distributed framework. Among the big data processing frameworks that have emerged, Hadoop is undoubtedly one of the most popular, built around the HDFS distributed storage framework and MapReduce.

2 Hadoop platform research

Hadoop is a distributed system infrastructure and a software platform that makes it easier to develop and run applications that process large-scale data. Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data, which makes it suitable for applications with large data sets. HDFS also relaxes some POSIX requirements so that data in the file system can be accessed as a stream.

HDFS adopts the master-slave architecture mode. There are two kinds of nodes in HDFS: a name node (namenode) and multiple data nodes (datanode).

The name node manages the namespace of the file system and maintains the file system tree together with all the files and directories in it. It also records which data nodes hold each data block of each file; this block location information is rebuilt every time the system starts. A client accesses the file system by contacting the name node, obtaining the locations of the required data on the corresponding data nodes, and then reading from those nodes. As a result, application code does not need to know which name node or data nodes are involved or where they are located.
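The bookkeeping described above can be pictured with a toy in-memory model. This is only a sketch of the idea and not the Hadoop API: the class `ToyNameNode`, its methods, and the sample path and node names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of name node metadata; not the Hadoop API. All names are hypothetical.
final class ToyNameNode {
    // Namespace: file path -> ordered list of block IDs.
    private final Map<String, List<String>> fileToBlocks = new HashMap<>();
    // Block ID -> data nodes that hold a replica of the block.
    private final Map<String, List<String>> blockToDataNodes = new HashMap<>();

    // Records that a block of the given file is stored on the given data nodes.
    void addBlock(String path, String blockId, List<String> dataNodes) {
        fileToBlocks.computeIfAbsent(path, p -> new ArrayList<>()).add(blockId);
        blockToDataNodes.put(blockId, new ArrayList<>(dataNodes));
    }

    // What a client asks before reading: for each block of the file, where its replicas live.
    List<List<String>> locate(String path) {
        List<List<String>> locations = new ArrayList<>();
        for (String blockId : fileToBlocks.getOrDefault(path, new ArrayList<>())) {
            locations.add(blockToDataNodes.get(blockId));
        }
        return locations;
    }

    public static void main(String[] args) {
        ToyNameNode nameNode = new ToyNameNode();
        nameNode.addBlock("/ratings/part-0", "blk_1", Arrays.asList("dn1", "dn2"));
        nameNode.addBlock("/ratings/part-0", "blk_2", Arrays.asList("dn2", "dn3"));
        // The client would now read blk_1 from dn1 or dn2, then blk_2 from dn2 or dn3.
        System.out.println(nameNode.locate("/ratings/part-0"));
    }
}
```

The name node only answers "where is the data"; the file contents themselves are read from the data nodes.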

3 Hybrids of Recommendation Algorithms

Since every recommendation method has its own advantages and disadvantages, hybrid recommendation is often used in practice. The most widely studied and applied hybrid is the combination of content-based recommendation and collaborative filtering.
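The text does not say how the two recommenders are combined. One common option is a weighted hybrid, sketched below under that assumption; the class name, method name, and weight parameter are hypothetical.

```java
// Illustrative weighted hybrid; the weighting scheme and all names are assumptions.
final class WeightedHybridSketch {

    // Blends a content-based score and a collaborative filtering score for one (user, item) pair.
    // cfWeight is the share given to collaborative filtering and must lie in [0, 1].
    static double hybridScore(double contentScore, double cfScore, double cfWeight) {
        if (cfWeight < 0.0 || cfWeight > 1.0) {
            throw new IllegalArgumentException("cfWeight must be in [0, 1]");
        }
        return cfWeight * cfScore + (1.0 - cfWeight) * contentScore;
    }

    public static void main(String[] args) {
        // A new item with few ratings: lean on the content-based score.
        System.out.println(hybridScore(0.9, 0.1, 0.2));
        // A well-rated item: lean on collaborative filtering.
        System.out.println(hybridScore(0.4, 0.8, 0.8));
    }
}
```

Lowering the collaborative filtering weight for users or items with few ratings is one way a hybrid can soften the cold-start weakness of collaborative filtering.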

4 Design of hybrid recommendation system based on Hadoop platform

The whole system is built on Hadoop, a distributed big data computing system composed mainly of a Master node and DataNode nodes. The Master node manages the entire distributed system, including the coordination of MapReduce tasks and metadata management (metadata is usually managed by the Master but can also be hosted separately), while the DataNodes store data and execute Map and Reduce tasks. The working mechanism of Hadoop was explained in Chapter 3. The other modules are implemented on top of Hadoop, as shown in Figure 5-6. When a module starts processing or accessing data (reading HDFS or HBase data), it first sends a request to the Master, and it interacts with the DataNodes after the Master has handled the request.

The algorithms inside the recommendation engine usually face large volumes of data and can be decomposed into parallel computations. Each algorithm is split into a Map phase and a Reduce phase, and its keys and values are designed accordingly. The design of each specific algorithm is described in the implementation part of the recommendation system. Hadoop is normally invoked through the client library that ships with it. The client submits a request to the Master, the Map tasks are divided according to the input data, and each Map task is assigned to a node to run. When the Map tasks finish, the Reduce tasks pull the corresponding intermediate results and continue the computation, producing the final result on completion.
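The Map/Reduce split and the key/value design can be illustrated with a small in-memory simulation. It does not use the Hadoop API; the job (average rating per movie), the input format `userId,movieId,rating`, and all names are assumptions chosen for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory imitation of a MapReduce job; not the Hadoop API. All names are hypothetical.
final class MapReduceSketch {

    // Map: one input line "userId,movieId,rating" -> (key = movieId, value = rating).
    static Map.Entry<String, Double> map(String line) {
        String[] fields = line.split(",");
        return new SimpleEntry<>(fields[1], Double.parseDouble(fields[2]));
    }

    // Reduce: all ratings collected for one movieId -> their average.
    static double reduce(List<Double> ratings) {
        double sum = 0.0;
        for (double r : ratings) {
            sum += r;
        }
        return sum / ratings.size();
    }

    // Driver: stands in for the framework's shuffle, which groups map output by key.
    static Map<String, Double> run(List<String> lines) {
        Map<String, List<Double>> shuffled = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Double> kv = map(line);
            shuffled.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, List<Double>> e : shuffled.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("u1,m1,4.0", "u2,m1,2.0", "u1,m2,5.0");
        System.out.println(run(lines)); // prints {m1=3.0, m2=5.0}
    }
}
```

In a real cluster the calls to `map` run on different nodes over different input splits, and each Reduce task pulls only the keys assigned to it. Choosing the key (here the movie ID) determines which records end up in the same Reduce call.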

5 Conclusion

The design of the Hadoop-based recommendation system is guided by several software design patterns, such as the Strategy pattern and the Factory pattern. These are described mainly in the implementation part of the algorithm. In addition, the whole system has a layered design.

In the implementation part, the paper focuses on the data preprocessing module and the recommendation engine, and uses the Strategy pattern to make the recommendation engine extensible. It also describes the implementation of each recommendation engine in detail.
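A minimal sketch of how the Strategy pattern can keep a recommendation engine extensible is given below. The paper's actual classes are not shown in this section, so the interface `RecommendStrategy`, the engine class, and the sample strategy are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; a sketch of the pattern, not the paper's code.
interface RecommendStrategy {
    List<String> recommend(String userId, int topN);
}

// One interchangeable strategy: recommend the most popular items to everyone.
final class PopularityStrategy implements RecommendStrategy {
    private final List<String> popularItems;

    PopularityStrategy(List<String> popularItems) {
        this.popularItems = popularItems;
    }

    @Override
    public List<String> recommend(String userId, int topN) {
        int n = Math.max(0, Math.min(topN, popularItems.size()));
        return new ArrayList<>(popularItems.subList(0, n));
    }
}

// The engine depends only on the interface, so new algorithms can be registered without changing it.
final class RecommendEngine {
    private final Map<String, RecommendStrategy> strategies = new HashMap<>();

    void register(String name, RecommendStrategy strategy) {
        strategies.put(name, strategy);
    }

    List<String> recommend(String strategyName, String userId, int topN) {
        RecommendStrategy strategy = strategies.get(strategyName);
        if (strategy == null) {
            throw new IllegalArgumentException("Unknown strategy: " + strategyName);
        }
        return strategy.recommend(userId, topN);
    }
}

final class StrategyPatternDemo {
    public static void main(String[] args) {
        RecommendEngine engine = new RecommendEngine();
        engine.register("popular", new PopularityStrategy(Arrays.asList("m1", "m2", "m3")));
        System.out.println(engine.recommend("popular", "u1", 2)); // prints [m1, m2]
    }
}
```

Adding a collaborative filtering or content-based engine then means writing one more `RecommendStrategy` implementation and registering it under a new name.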

The following aspects remain open for further research:

(1) Cold start. User registration data and product data could be used to make recommendations with a content-based recommendation system, but the specific implementation still needs to be studied.

(2) Real-time performance. It was not considered when making recommendations for users. Future work will focus on how to support real-time recommendation based on extracted log data.

3 Key technologies of the system

The system is developed with Spring Boot, Vue, MySQL, MyBatis, TypeScript, HTML, CSS, JavaScript, and related technologies.

4 Development Tools

The main development tools are IntelliJ IDEA, JDK 1.8, Maven, MySQL 5.7, and Navicat.

5 Code display

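// Spring and servlet imports. The project classes (Strategy, StrategyService, User,
// ObjDat, JsonResult) come from the application's own packages.
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Scope;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import javax.servlet.http.HttpServletRequest;
import java.io.IOException;

@RequestMapping("/strategy")
@RestController
@Scope("prototype")
public class StrategyController {

    @Autowired
    private StrategyService strategyService;

    // Value of the web.upload-path configuration property.
    @Value("${web.upload-path}")
    private String path;

    // Returns one page of strategies. The request's page number defaults to 1
    // and is passed to the service as page - 1.
    @RequestMapping("/findPage")
    public ObjDat<Strategy> findPage(Strategy strategy,
                                     @RequestParam(value = "page", defaultValue = "1") int page,
                                     @RequestParam(value = "limit", defaultValue = "10") int limit) {
        return strategyService.findPage(strategy, page - 1, limit);
    }

    // Edits a strategy. Requires a logged-in user in the HTTP session.
    @RequestMapping("/edit")
    public JsonResult edit(HttpServletRequest request, Strategy strategy) throws IOException {
        User user = (User) request.getSession().getAttribute("user");
        if (user == null) {
            return JsonResult.error("Please log in");
        }
        // The service reports success with the literal "成功" ("success"), so that value is kept.
        String str = strategyService.edit(request, strategy);
        if ("成功".equals(str)) { // constant-first comparison avoids a NullPointerException
            return JsonResult.success("Operation succeeded");
        } else {
            return JsonResult.error("Operation failed");
        }
    }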