EachMovie data set description

From: ( http://www.research.digital.com/SRC/eachmovie/ )
EachMovie collaborative filtering data set

Contents

Introduction
Terms of usage
Schema
Obtaining the data set

Introduction

The DEC Systems Research Center ran the EachMovie recommendation service for 18 months to experiment with a collaborative filtering algorithm. During that time, some 72916 users entered a total of 2811983 numeric ratings for 1628 different movies (films and videos). We are making this preference data set available, with all user identification removed, so that other collaborative filtering researchers can use it to test their algorithms.
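As a rough illustration of how sparse this preference data is, the three counts quoted above can be combined into a density and into mean ratings per user and per movie. This is only back-of-the-envelope arithmetic on those figures; the variable names are illustrative.

```python
# Counts quoted in the introduction above.
NUM_USERS = 72916
NUM_RATINGS = 2811983
NUM_MOVIES = 1628

# Fraction of all possible (user, movie) pairs that actually carry a rating.
density = NUM_RATINGS / (NUM_USERS * NUM_MOVIES)

# Mean number of ratings entered per user and received per movie.
ratings_per_user = NUM_RATINGS / NUM_USERS
ratings_per_movie = NUM_RATINGS / NUM_MOVIES

print(f"density: {density:.2%}")
print(f"ratings per user: {ratings_per_user:.1f}")
print(f"ratings per movie: {ratings_per_movie:.1f}")
```

Only a few percent of the possible user/movie pairs are rated, which is the usual situation a collaborative filtering algorithm has to cope with.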

If you are interested in the design of our system, you can read the Each to Each Programmer's Reference Manual written by Paul McJones and John DeTreville.

Terms of usage

Copyright © Digital Equipment Corporation 1997.

The preference data set was compiled by Digital Equipment Corporation using our collaborative filtering technology. Digital is making the data set available for use under the terms that apply to this Digital web site (see Legal) including the following terms:

1. All information is provided "AS IS". Digital makes no warranties or representations with respect to the completeness or accuracy of the information or otherwise. DIGITAL DISCLAIMS ALL WARRANTIES WITH REGARD TO THE INFORMATION, INCLUDING ANY IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.

2. In no event shall Digital be liable for damages, and in particular Digital shall not be liable for special, indirect, consequential, or incidental damages, or damages for lost profits, loss of revenue, or loss of use, arising out of or related to the information or the use or dissemination thereof, whether such damages arise in contract, negligence, tort, under statute, in equity, at law or otherwise.

3. The user may use the information only for research purposes which are non-commercial and non-revenue bearing. Any published research results or other publications resulting from use of the information shall credit Digital Equipment Corporation as the provider of the data. The user agrees to provide Digital with a copy of any such publication using any of the contact names provided at this web site. The user may make copies of the data set as needed for internal use only for the preceding purposes. All such copies shall duplicate Digital's copyright notice and this notice.

Schema

The data set is available as eachmoviedata.tar.gz (zipped tab-separated-value text files, 17632000 bytes compressed). There are three tables, one per file:

Person (person.txt) provides optional, unaudited demographic data supplied by each person:

     ID: Number -- primary key
     Age: Number
     Gender: Text -- one of "M", "F"
     Zip_Code: Text

Movie (movie.txt) provides descriptive information about each movie:

     ID: Number -- primary key
     Name: Text
     PR_URL: Text -- URL of studio PR site
     IMDb_URL: Text -- URL of Internet Movie Database entry
     Theater_Status: Text -- either "old" or "current"
     Theater_Release: Date/Time
     Video_Status: Text -- either "old" or "current"
     Video_Release: Date/Time
     Action, Animation, Art_Foreign, Classic, Comedy, Drama, Family, Horror, Romance, Thriller: Yes/No

IMDb URLs are provided by courtesy of Internet Movie Database.

The theater and video status and release dates were (approximately) correct in the San Francisco bay area as of September 15, 1997, when EachMovie was terminated.
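A minimal sketch of reading one of these tables, shown here for the Person table. It assumes the files are plain tab-separated text with no header row and with columns in the order listed in the schema; neither assumption is confirmed by this description. The helper name `read_table` and the sample rows are hypothetical, and all values are kept as strings because the on-disk encoding of the Date/Time and Yes/No fields is not specified.

```python
import csv
import io

# Column order taken from the Person schema above; that the file stores
# columns in this order, without a header row, is an assumption.
PERSON_COLUMNS = ["ID", "Age", "Gender", "Zip_Code"]


def read_table(lines, columns):
    """Parse tab-separated lines into a list of dicts keyed by column name.

    All values are left as strings; missing trailing fields become None.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if not fields:  # skip blank lines
            continue
        padded = fields + [None] * (len(columns) - len(fields))
        rows.append(dict(zip(columns, padded)))
    return rows


# Made-up sample rows, not taken from the real person.txt.
sample = io.StringIO("1\t35\tM\t94301\n2\t\t\t\n")
people = read_table(sample, PERSON_COLUMNS)
print(people[0]["Gender"])
print(people[1]["Age"] == "")
```

For the real files, an open file object such as `open("person.txt", newline="")` would be passed in place of the sample. Because the demographic fields are optional, empty strings should be expected and handled before converting Age to a number.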

Vote (vote.txt) is the actual rating data:

     Person_ID: Number
     Movie_ID: Number
     Score: Number -- 0 <= Score <= 1
     Weight: Number -- 0 < Weight <= 1
     Modified: Date/Time

Score is the rating provided by this person for this movie. The zero-to-five star rating used externally on EachMovie is mapped linearly to the interval [0,1]. Here's a histogram of the Score values:

     Score Count
     0 347191
     0.2 150495
     0.4 339718
     0.6 701236
     0.8 761676
     1.0 511667
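The linear mapping can be inverted to recover the star scale, and the histogram can be checked against the total number of ratings. The sketch below hard-codes the counts from the table above; the function name `score_to_stars` is illustrative.

```python
# Histogram of Score values as listed above.
SCORE_COUNTS = {
    0.0: 347191,
    0.2: 150495,
    0.4: 339718,
    0.6: 701236,
    0.8: 761676,
    1.0: 511667,
}


def score_to_stars(score):
    """Invert the linear map from [0, 1] back to the 0-5 star scale."""
    # round() absorbs floating-point error such as 0.6 * 5 != 3 exactly.
    return round(score * 5)


stars_counts = {score_to_stars(s): n for s, n in SCORE_COUNTS.items()}
total = sum(SCORE_COUNTS.values())

print(stars_counts)
print(total)
```

The total comes to 2811983, which matches the number of ratings quoted in the introduction.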
     
Weight is only relevant in the case of a Score of zero, in which case it distinguishes whether the person rated a movie as zero stars (weight = 1) or "sounds awful" (weight < 1). (Most "sounds awful" weights are 0.2, but for historical reasons about 10% are 0.5.) The idea behind "sounds awful" was to let a user indicate he never planned to see a movie (hence we would omit it from future lists of predictions). Our collaborative filtering algorithm treated such a declaration as less authoritative than a regular rating of zero stars.
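The Score and Weight rule can be sketched as a small classifier, assuming both fields have already been parsed to floats. The function name and the returned labels are illustrative, and ignoring Weight for nonzero scores simply follows the statement that Weight is only relevant when Score is zero.

```python
def classify_vote(score, weight):
    """Label a vote using the Score/Weight rules described above.

    Returns "sounds awful" for a zero score with weight below 1,
    otherwise "N stars" on the external zero-to-five scale.
    """
    if not 0 <= score <= 1 or not 0 < weight <= 1:
        raise ValueError("score or weight outside the documented range")
    if score == 0 and weight < 1:
        return "sounds awful"
    return f"{round(score * 5)} stars"


print(classify_vote(0.0, 1.0))  # explicit zero-star rating
print(classify_vote(0.0, 0.2))  # "sounds awful" declaration
print(classify_vote(0.8, 1.0))  # ordinary rating
```

Researchers who want only genuine ratings might filter out the "sounds awful" rows before training, since those users did not claim to have seen the movie.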

Given our site design, there is no way to know whether the person had seen the movie in a theater or on video.

Obtaining the data set

If you have read the terms above, and agree to them, contact

Steve Glassman
<[email protected]>
1 650 853-2166
Compaq Systems Research Center
130 Lytton Avenue
Palo Alto, CA 94301
by telephone or email. He will give you a password for downloading the data. You may also send copies of your publications involving this data (see term 3 above) to Steve.

Legal


Developed by Digital Equipment Corporation.
Copyright © Digital Equipment Corporation, 1997.
The DIGITAL logo is a trademark of Digital Equipment Corporation.

All other trademarks are the property of their respective owners.

kumpf last updated Jul 30, 1999


Reposted from: http://www.douban.com/note/502794377/


Reposted from blog.csdn.net/qq_21280629/article/details/49591807