Counting UV with a MapReduce Program

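UV (unique visitors) is the number of distinct visitors to a site within one day, identified by cookie. Each client computer that accesses the website counts as one visitor, so UV can be understood as the number of computers that visited the site. The website determines the identity of a visiting computer through the cookie stored on it. If a user changes IP address but visits the same website without clearing cookies, the UV count in the site's statistics does not change. If the user does not save cookies, clears cookies, or switches to a different device, the count increases by 1. A client that visits multiple times between 00:00 and 24:00 is counted as a single visitor.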
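
The counting rule above can be sketched in plain Java, without Hadoop. This is only an illustration under stated assumptions: the `Visit` class, its fields, and the `countDailyUv` method are hypothetical names, and each visit is assumed to carry a day string, a cookie ID, and an IP address.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class DailyUvSketch {

    /** Hypothetical visit record: day of the visit, cookie ID of the client, client IP. */
    static final class Visit {
        final String day;
        final String cookieId;
        final String ip;

        Visit(String day, String cookieId, String ip) {
            this.day = day;
            this.cookieId = cookieId;
            this.ip = ip;
        }
    }

    /** Counts distinct cookie IDs per day; the IP address is deliberately ignored. */
    static Map<String, Integer> countDailyUv(List<Visit> visits) {
        Map<String, Set<String>> cookiesByDay = new TreeMap<>();
        for (Visit v : visits) {
            cookiesByDay.computeIfAbsent(v.day, d -> new HashSet<>()).add(v.cookieId);
        }
        Map<String, Integer> uvByDay = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : cookiesByDay.entrySet()) {
            uvByDay.put(e.getKey(), e.getValue().size());
        }
        return uvByDay;
    }

    public static void main(String[] args) {
        List<Visit> visits = Arrays.asList(
                new Visit("2020-01-01", "cookie-A", "10.0.0.1"),
                new Visit("2020-01-01", "cookie-A", "10.0.0.2"),  // same cookie, new IP: still one visitor
                new Visit("2020-01-01", "cookie-B", "10.0.0.1"),  // different cookie: a second visitor
                new Visit("2020-01-02", "cookie-A", "10.0.0.1")); // new day: counted again
        System.out.println(countDailyUv(visits)); // prints {2020-01-01=2, 2020-01-02=1}
    }
}
```

Keeping one set of cookie IDs per day is what makes repeat visits collapse into a single visitor. The MapReduce job in this article applies the same idea, with the set built per city rather than per day.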

1. Write the WebLogUVMapper class file

package com.huadian.webloguvs;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class WebLogUVMapper extends Mapper<LongWritable, Text, Text, Text> {

    private Text outputKey = new Text();
    private Text outputValue = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split each line into its tab-separated fields
        String line = value.toString();
        String[] items = line.split("\t");

        /*
         * (1) Each record has 36 fields. If the split array has fewer than 36 elements,
         *     the record is dirty data and is discarded.
         * (2) If the visitor ID (guid, index 5) is blank, the record is discarded.
         * City ID: index 24
         * Output: (city ID, guid)
         */
        if (items.length >= 36) {
            if (StringUtils.isBlank(items[5])) {
                return;
            }
            outputKey.set(items[24]);
            outputValue.set(items[5]);
            context.write(outputKey, outputValue);
        } else {
            return;
        }
    }
}
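
The filtering rule in the mapper can be tried out without a cluster. The sketch below repeats the same checks in plain Java. It assumes, as the mapper code and the reducer comment suggest, that index 5 holds the visitor guid and index 24 the city ID. The class name `RecordFilterSketch` and the helpers `toCityGuid` and `sampleLine` are hypothetical, and `trim().isEmpty()` is only a rough stand-in for `StringUtils.isBlank`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Map;
import java.util.Optional;

class RecordFilterSketch {
    static final int FIELD_COUNT = 36; // number of fields a clean record is expected to have
    static final int GUID_INDEX = 5;   // assumed position of the visitor guid
    static final int CITY_INDEX = 24;  // assumed position of the city ID

    /** Returns (city ID, guid) for a clean record, or empty if the record should be dropped. */
    static Optional<Map.Entry<String, String>> toCityGuid(String line) {
        String[] items = line.split("\t");
        if (items.length < FIELD_COUNT) {
            return Optional.empty(); // too few fields: dirty data
        }
        String guid = items[GUID_INDEX];
        if (guid.trim().isEmpty()) {
            return Optional.empty(); // blank guid: the visitor cannot be identified
        }
        return Optional.of(new SimpleEntry<>(items[CITY_INDEX], guid));
    }

    /** Builds a synthetic 36-field line for experimenting; every other field is a placeholder. */
    static String sampleLine(String guid, String city) {
        String[] fields = new String[FIELD_COUNT];
        Arrays.fill(fields, "x");
        fields[GUID_INDEX] = guid;
        fields[CITY_INDEX] = city;
        return String.join("\t", fields);
    }

    public static void main(String[] args) {
        System.out.println(toCityGuid(sampleLine("guid1", "city7"))); // kept
        System.out.println(toCityGuid(sampleLine("", "city7")));      // dropped: blank guid
        System.out.println(toCityGuid("only\ttwo"));                  // dropped: too few fields
    }
}
```

Note that `String.split("\t")` discards trailing empty strings, so a record whose last fields are empty yields fewer than 36 elements and is treated as dirty data by this check.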

2. Write the WebLogUVMapReduce class file

package com.huadian.webloguvs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;


public class WebLogUVMapReduce extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {

        // Create the job
        Job job = Job.getInstance(this.getConf(), "WebLogUVMapReduce");
        // Set the main class the job runs from
        job.setJarByClass(WebLogUVMapReduce.class);

        // Configure the job
        // a. input
        Path inputPath = new Path(args[0]);
        FileInputFormat.setInputPaths(job, inputPath);

        // b. map
        job.setMapperClass(WebLogUVMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setNumReduceTasks(2);

        // c. reduce
        job.setReducerClass(WebLogUVReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // d. output
        Path outputPath = new Path(args[1]);

        // If the output directory already exists, delete it first
        FileSystem hdfs = FileSystem.get(this.getConf());
        if (hdfs.exists(outputPath)) {
            hdfs.delete(outputPath, true);
        }
        FileOutputFormat.setOutputPath(job, outputPath);

        // Submit the job and wait for it to finish
        boolean isSuccess = job.waitForCompletion(true);

        return isSuccess ? 0 : 1;
    }


    public static void main(String[] args) {
        Configuration configuration = new Configuration();
        // Signature: public static int run(Configuration conf, Tool tool, String[] args)
        try {
            int status = ToolRunner.run(configuration, new WebLogUVMapReduce(), args);
            System.exit(status);
        } catch (Exception e) {
            e.printStackTrace();
            // Report failure to the caller instead of exiting with status 0
            System.exit(1);
        }
    }
}

3. Write the WebLogUVReducer class file

package com.huadian.webloguvs;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;


public class WebLogUVReducer extends Reducer<Text,Text,Text,Text> {
    private Text outputValue = new Text(  );
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws InterruptedException, IOException {
        // key: city; values: <guid1, guid1, guid2, guid3>
        // Hadoop reuses the same Text instance for every value in the iterable,
        // so store a String copy of the content rather than the Text reference.
        Set<String> set = new HashSet<String>();
        for (Text value : values) {
            set.add(value.toString());
        }
        // The number of distinct guids is the UV count for this city
        outputValue.set(String.valueOf(set.size()));
        context.write(key, outputValue);
    }
}
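
One detail matters when reducer values are collected into a set: Hadoop reuses a single `Text` instance while iterating over `values`. If you store the reference itself instead of a copy of its content, the set ends up holding the same object several times. The sketch below imitates this in plain Java. `MutableValue` is a hypothetical stand-in written for this example, not Hadoop's `Text`, and both method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ReusedValueSketch {

    /** Hypothetical mutable holder that is overwritten for each value, like a reused buffer. */
    static final class MutableValue {
        private String content = "";

        void set(String s) { content = s; }

        @Override public boolean equals(Object o) {
            return o instanceof MutableValue && ((MutableValue) o).content.equals(content);
        }
        @Override public int hashCode() { return content.hashCode(); }
        @Override public String toString() { return content; }
    }

    /** Stores the shared reference; returns the contents seen when the set is read back. */
    static Set<String> contentsWhenStoringReference(List<String> guids) {
        MutableValue reused = new MutableValue();
        Set<MutableValue> set = new HashSet<>();
        for (String g : guids) {
            reused.set(g);   // the same instance is overwritten on every iteration
            set.add(reused); // the set keeps a reference, not a snapshot
        }
        Set<String> seen = new TreeSet<>();
        for (MutableValue v : set) {
            seen.add(v.toString());
        }
        return seen;
    }

    /** Stores an immutable String copy of each value instead. */
    static Set<String> contentsWhenStoringCopy(List<String> guids) {
        MutableValue reused = new MutableValue();
        Set<String> set = new TreeSet<>();
        for (String g : guids) {
            reused.set(g);
            set.add(reused.toString()); // snapshot of the current content
        }
        return set;
    }

    public static void main(String[] args) {
        List<String> guids = Arrays.asList("guid1", "guid1", "guid2", "guid3");
        System.out.println(contentsWhenStoringReference(guids)); // prints [guid3]
        System.out.println(contentsWhenStoringCopy(guids));      // prints [guid1, guid2, guid3]
    }
}
```

Copying each value into a `String` before adding it to the set keeps the distinct count independent of how the framework recycles its value objects.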

double_lifly
