Java学习(Day 27)

学习来源:日撸 Java 三百行(51-60天,kNN 与 NB)_闵帆的博客-CSDN博客

基于 M-distance 的推荐

一、算法概述

所谓 M-distance, 就是根据平均分来计算两个用户 (或项目) 之间的距离.

令项目的 j j j 的平均分为 x . j x_{.j} x.j

采用 item-based recommendation, 则第 j j j 个项目关于第 i i i 个用户的邻居项目集合为

N i j = { 1 ≤ j ′ ≤ m ∣ j ′ ≠ j , p i j ′ ≠ 0 , ∣ r . j ‾ − r . j ′ ‾ ∣ < δ } N_{ij} = \{ 1 \le {j}' \le m | {j}' \neq j, p_{i{j}'} \neq 0, \left | \overline {r_{.j}} - \overline {r_{.{j}'}} \right | < \delta \} Nij={ 1jmj=j,pij=0,r.jr.j<δ}

i i i 个用户对 j j j 个项目的评分预测为

p i j = ∑ j ′ ∈ N i j r i j ′ ∣ N i j ∣ p_{ij} = \frac{\textstyle \sum_{ {j}' \in N_{ij}}r_{i{j}'}} {\left | N_{ij} \right |} pij=NijjNijrij

二、算法特点

邻居不用 k k k 控制. 距离小于 radius ( 即 δ \delta δ ) 的都是邻居. 使用 M-distance 时, 这种方式效果更好.

使用 leave-one-out 的测试方式, 很高效的算法才能使用这种方式. 这种算法的特点就是只用一个做为测试, 其他全部用于训练.

三、算法图解

u i ( i = 0 , 1 , 2 , 3 , 4 ) u_i (i=0,1,2,3,4) ui(i=0,1,2,3,4) 表示不同用户, m i ( i = 0 , 1 , 2 , 3 , 4 ) m_i (i=0,1,2,3,4) mi(i=0,1,2,3,4) 表示不同电影.

表格中的数字表示不同用户对不同电影的评分.

n u m num num 表示对于某一个电影已有数据的个数, s u m sum sum 表示对某个电影已有数据的总和.

r ‾ \overline{r} r 表示平均评分, 即 ( s u m n u m \frac{sum}{num} numsum)

图中 0 占大多数, 表示还未能看过电影. 我们就是需要对这些数据进行预测.

在预测时只取差值小于 δ \delta δ 的那一列数据. 如果列中对需推荐用户的值为 0 则不带入计算.

四、代码分析

1. 数据集分析

评分表 (用户, 项目, 评分) 的压缩方式给出
前几行数据为:

0,0,5
0,1,3
0,2,4
0,3,3
0,4,3
0,5,5
0,6,4
…
1,0,4
1,9,2
1,12,4

其中, “0,2,4” 表示用户 0 对项目 2 的评分为 4. 用户 1 对项目 1、2 等的评分没有, 表示没看过该电影. 在用户数、项目数很多时, 必须使用压缩存储.

2. 完整代码

package knn;

import java.io.*;

/**
 * Recommendation with M-distance.
 *
 * @author Shihuai Wen Email: [email protected].
 */
public class MBR {
    
    
    /**
     * Default rating for 1-5 points.
     */
    public static final double DEFAULT_RATING = 3.0;

    /**
     * The total number of users.
     */
    private int numUsers;

    /**
     * The total number of items.
     */
    private int numItems;

    /**
     * The total number of ratings (non-zero values)
     */
    private int numRatings;

    /**
     * The predictions.
     */
    private double[] predictions;

    /**
     * Compressed rating matrix. User-item-rating triples.
     */
    private int[][] compressedRatingMatrix;

    /**
     * The degree of users (how many item he has rated).
     */
    private int[] userDegrees;

    /**
     * The average rating of the current user.
     */
    private double[] userAverageRatings;

    /**
     * The degree of users (how many item he has rated).
     */
    private int[] itemDegrees;

    /**
     * The average rating of the current item.
     */
    private double[] itemAverageRatings;

    /**
     * The first user start from 0. Let the first user has x ratings, the second
     * user will start from x.
     */
    private int[] userStartingIndices;

    /**
     * Number of non-neighbor objects.
     */
    private int numNonNeighbors;

    /**
     * The radius (delta) for determining the neighborhood.
     */
    private double radius;

    /**
     * ************************
     * Construct the rating matrix.
     *
     * @param paraFilename the rating filename.
     * @param paraNumUsers       number of users
     * @param paraNumItems       number of items
     * @param paraNumRatings     number of ratings
     * ************************
     */
    public MBR(String paraFilename, int paraNumUsers, int paraNumItems, int paraNumRatings) throws Exception {
    
    
        // Step 1. Initialize these arrays
        numItems = paraNumItems;
        numUsers = paraNumUsers;
        numRatings = paraNumRatings;

        userDegrees = new int[numUsers];
        userStartingIndices = new int[numUsers + 1];
        userAverageRatings = new double[numUsers];
        itemDegrees = new int[numItems];
        compressedRatingMatrix = new int[numRatings][3];
        itemAverageRatings = new double[numItems];

        predictions = new double[numRatings];

        System.out.println("Reading " + paraFilename);

        // Step 2. Read the data file.
        File tempFile = new File(paraFilename);
        if (!tempFile.exists()) {
    
    
            System.out.println("File " + paraFilename + " does not exists.");
            System.exit(0);
        } // Of if
        BufferedReader tempBufReader = new BufferedReader(new FileReader(tempFile));
        String tempString;
        String[] tempStrArray;
        int tempIndex = 0;
        userStartingIndices[0] = 0;
        userStartingIndices[numUsers] = numRatings;
        while ((tempString = tempBufReader.readLine()) != null) {
    
    
            // Each line has three values
            tempStrArray = tempString.split(",");
            compressedRatingMatrix[tempIndex][0] = Integer.parseInt(tempStrArray[0]);
            compressedRatingMatrix[tempIndex][1] = Integer.parseInt(tempStrArray[1]);
            compressedRatingMatrix[tempIndex][2] = Integer.parseInt(tempStrArray[2]);

            userDegrees[compressedRatingMatrix[tempIndex][0]]++;
            itemDegrees[compressedRatingMatrix[tempIndex][1]]++;

            if (tempIndex > 0) {
    
    
                // Starting to read the data of a new user.
                if (compressedRatingMatrix[tempIndex][0] != compressedRatingMatrix[tempIndex - 1][0]) {
    
    
                    userStartingIndices[compressedRatingMatrix[tempIndex][0]] = tempIndex;
                } // Of if
            } // Of if
            tempIndex++;
        } // Of while
        tempBufReader.close();

        double[] tempUserTotalScore = new double[numUsers];
        double[] tempItemTotalScore = new double[numItems];
        for (int i = 0; i < numRatings; i++) {
    
    
            tempUserTotalScore[compressedRatingMatrix[i][0]] += compressedRatingMatrix[i][2];
            tempItemTotalScore[compressedRatingMatrix[i][1]] += compressedRatingMatrix[i][2];
        } // Of for i

        for (int i = 0; i < numUsers; i++) {
    
    
            userAverageRatings[i] = tempUserTotalScore[i] / userDegrees[i];
        } // Of for i
        for (int i = 0; i < numItems; i++) {
    
    
            itemAverageRatings[i] = tempItemTotalScore[i] / itemDegrees[i];
        } // Of for i
    }// Of the first constructor

    /**
     * ************************
     * Set the radius (delta).
     *
     * @param paraRadius The given radius.
     * ************************
     */
    public void setRadius(double paraRadius) {
    
    
        if (paraRadius > 0) {
    
    
            radius = paraRadius;
        } else {
    
    
            radius = 0.1;
        } // Of if
    }// Of setRadius

    /**
     * ************************
     * Leave-one-out prediction. The predicted values are stored in predictions.
     *
     * @see #predictions
     * ************************
     */
    public void leaveOneOutPrediction() {
    
    
        double tempItemAverageRating;
        // Make each line of the code shorter.
        int tempUser, tempItem, tempRating;
        System.out.println("\r\nLeaveOneOutPrediction for radius " + radius);

        numNonNeighbors = 0;
        for (int i = 0; i < numRatings; i++) {
    
    
            tempUser = compressedRatingMatrix[i][0];
            tempItem = compressedRatingMatrix[i][1];
            tempRating = compressedRatingMatrix[i][2];

            // Step 1. Recompute average rating of the current item.
            tempItemAverageRating = (itemAverageRatings[tempItem] * itemDegrees[tempItem] - tempRating)
                    / (itemDegrees[tempItem] - 1);

            // Step 2. Recompute neighbors, at the same time obtain the ratings
            // Of neighbors.
            int tempNeighbors = 0;
            double tempTotal = 0;
            int tempComparedItem;
            for (int j = userStartingIndices[tempUser]; j < userStartingIndices[tempUser + 1]; j++) {
    
    
                tempComparedItem = compressedRatingMatrix[j][1];
                if (tempItem == tempComparedItem) {
    
    
                    continue;// Ignore itself.
                } // Of if

                if (Math.abs(tempItemAverageRating - itemAverageRatings[tempComparedItem]) < radius) {
    
    
                    tempTotal += compressedRatingMatrix[j][2];
                    tempNeighbors++;
                } // Of if
            } // Of for j

            // Step 3. Predict as the average value of neighbors.
            if (tempNeighbors > 0) {
    
    
                predictions[i] = tempTotal / tempNeighbors;
            } else {
    
    
                predictions[i] = DEFAULT_RATING;
                numNonNeighbors++;
            } // Of if
        } // Of for i
    }// Of leaveOneOutPrediction

    /**
     * ************************
     * Compute the MAE based on the deviation of each leave-one-out.
     *
     * @author Shihuai Wen
     * ************************
     */
    public double computeMAE() {
    
    
        double tempTotalError = 0;
        for (int i = 0; i < predictions.length; i++) {
    
    
            tempTotalError += Math.abs(predictions[i] - compressedRatingMatrix[i][2]);
        } // Of for i

        return tempTotalError / predictions.length;
    }// Of computeMAE

    /**
     * ************************
     * Compute the MAE based on the deviation of each leave-one-out.
     *
     * @author Shihuai Wen
     * ************************
     */
    public double computeRSME() {
    
    
        double tempTotalError = 0;
        for (int i = 0; i < predictions.length; i++) {
    
    
            tempTotalError += (predictions[i] - compressedRatingMatrix[i][2])
                    * (predictions[i] - compressedRatingMatrix[i][2]);
        } // Of for i

        double tempAverage = tempTotalError / predictions.length;

        return Math.sqrt(tempAverage);
    }// Of computeRSME

    /**
     * ************************
     * The entrance of the program.
     *
     * @param args Not used now.
     * ************************
     */
    public static void main(String[] args) {
    
    
        try {
    
    
            MBR tempRecommender = new MBR("D:/Work/sampledata/movielens-943u1682m.txt", 943, 1682, 100000);

            for (double tempRadius = 0.2; tempRadius < 0.6; tempRadius += 0.1) {
    
    
                tempRecommender.setRadius(tempRadius);

                tempRecommender.leaveOneOutPrediction();
                double tempMAE = tempRecommender.computeMAE();
                double tempRSME = tempRecommender.computeRSME();

                System.out.println("Radius = " + tempRadius + ", MAE = " + tempMAE + ", RSME = " + tempRSME
                        + ", numNonNeighbors = " + tempRecommender.numNonNeighbors);
            } // Of for tempRadius
        } catch (Exception ee) {
    
    
            System.out.println("Error occurred in main \r\n" + ee);
        } // Of try
    }// Of main
}// Of class MBR

3. 运行截图

总结

大道至简, 越是简单的算法可能会解决更多的问题.

不过代码中的数组太过于多, 阅读起来存在一定的障碍.

运行结果中出现了 0.1 + 0.2 = 0.30000000000000004 , 浮点数的奥秘.

猜你喜欢

转载自blog.csdn.net/qq_44309220/article/details/124559886