Road of big data [CHAPTER 14]: Data Mining - recommendation algorithm (Mahout tool)

Data mining --- recommendation algorithm (Mahout tool)

I. Introduction

  • Apache top-level project (2010.4)
  • Open Source Machine Learning Library on Hadoop
  • Scalable extension
  • Java library
  • The recommendation engine (collaborative filtering), clustering and classification

 

Second, the introduction of machine learning

  • The problem is usually the property of these types of problems
  • Classification
  • Regression
  • Clustering Problem
  • Recommended questions

Third, the installation method

3.1 download Mahout

wget http://archive.apache.org/dist/mahout/0.9/mahout-distribution-0.9.tar.gz

3.2 unzip

tar -zxvf  mahout-distribution-0.9.tar.gz

Fourth, configure the environment variables

4.1 Configuration mahout environment variables

# set mahout environment
export MAHOUT_HOME=/usr/local/src/mahout-distribution-0.9
export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
export PATH=$MAHOUT_HOME/conf:$MAHOUT_HOME/bin:$PATH

Mahout Hadoop configuration required 4.2 Environment Variables

# set hadoop environment
export HADOOP_HOME=/usr/local/src/hadoop-1.2.1
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_HOME_WARN_SUPPRESS=not_null

Fifth, verify successful

Direct command execution mahout

Supported algorithm list

Sixth, prepare data

Data format:
 1,100001,5 
1,100002,3 
1,100003,4 
1,100004,3 
1,100005,3 
1,100007,4 
1,100008,1 
1,100009,5 
1,1000011,2

Seven, training

INPUT="/movie_lens.data"
TMP_DIR="/mahout_temp"
OUTPUT="/cf_mahout_output"
MAHOUT_CMD="/usr/local/src/mahout-distribution-0.9/bin/mahout“
$MAHOUT_CMD itemsimilarity
-i $INPUT
-o $OUTPUT
--maxSimilaritiesPerItem 1000
--threshold 0.0000001
--similarityClassname SIMILARITY_COSINE
--tempDir $TMP_DIR

 

 Eight output

 

Guess you like

Origin www.cnblogs.com/hackerer/p/11480119.html