Big Data: Computing Average Opening and Closing Stock Prices

I. Overview: the experiment runs in an Ubuntu virtual machine, using Hadoop, Eclipse, and MapReduce.

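1. A brief introduction to the MapReduce model:

•MapReduce abstracts a complex parallel computation running on a large cluster into two functions: Map (the map task: splitting and mapping) and Reduce (the reduce task: shuffling and reducing).

•Programming is easy: without mastering the details of distributed parallel programming, you can run your own program on a distributed system and process massive amounts of data.

•MapReduce follows a "divide and conquer" strategy: a large dataset stored in a distributed file system is cut into many independent splits, which multiple Map tasks can process in parallel.

•A core design idea of MapReduce is "move computation to the data" rather than "move data to the computation", because moving data incurs heavy network transfer overhead.

•The classic MapReduce framework (MRv1) uses a Master/Slave architecture with one Master and several Slaves. The Master runs the JobTracker and each Slave runs a TaskTracker. (From Hadoop 2 onward, YARN's ResourceManager and NodeManager take over these roles.)

•The Hadoop framework is implemented in Java, but MapReduce applications do not have to be written in Java.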
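
To make the model above concrete, here is a minimal in-memory sketch of the Map, shuffle, and Reduce steps in plain Java. It does not use Hadoop at all. The class `MiniMapReduce`, its methods, and the sample records are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of the Map -> shuffle -> Reduce flow; no Hadoop involved.
class MiniMapReduce {

    // Map step: turn one input line "symbol price" into a (symbol, price) pair.
    static Map.Entry<String, Double> map(String line) {
        String[] parts = line.trim().split("\\s+");
        return Map.entry(parts[0], Double.parseDouble(parts[1]));
    }

    // Shuffle step: group all mapped values by key (TreeMap keeps keys sorted).
    static Map<String, List<Double>> shuffle(List<Map.Entry<String, Double>> pairs) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (Map.Entry<String, Double> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce step: collapse all values of one key into their average.
    static double reduce(List<Double> values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Runs the three steps in sequence over a list of input lines.
    static Map<String, Double> run(List<String> lines) {
        List<Map.Entry<String, Double>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.add(map(line));
        }
        Map<String, Double> result = new TreeMap<>();
        shuffle(pairs).forEach((key, values) -> result.put(key, reduce(values)));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input records, invented for this example.
        List<String> lines = List.of("AAA 10.0", "BBB 4.0", "AAA 20.0", "BBB 6.0");
        System.out.println(run(lines)); // prints {AAA=15.0, BBB=5.0}
    }
}
```

In a real cluster the map calls run in parallel on different nodes and the shuffle moves data over the network; here everything happens sequentially in one process.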

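2. The whole process in detail

•One map task is created for each split, and it runs the map function on every record in that split.

•Having many splits is beneficial: each split takes far less time to process than the whole input, and because splits are processed in parallel, smaller splits give better load balancing.

•Splits should not be too small, however. When they are, the overhead of managing splits and creating map tasks starts to dominate the total job execution time.

•For most jobs, the best split size equals the size of one HDFS block (64 MB by default in Hadoop 1.x, 128 MB in Hadoop 2.x and later).

•A map task writes its output to the local disk of the node it runs on, not to HDFS.

•Local disk is chosen over HDFS to avoid the replication that every HDFS write triggers.

•Map output is intermediate output: reduce tasks process it to produce the final output.

•Once the job completes, the map output can be thrown away, so storing it in HDFS with replication would be overkill.

•If a node fails before its map output has been consumed by the reduce task, Hadoop reruns the map task on another node and recreates the map output.

•Reduce tasks do not benefit from data locality. The output of every map task is fed to the reduce task, so map output is transferred across the network to the machine where the reduce task is running.

•On that machine the outputs are merged and then passed to the user-defined reduce function.

•Unlike map output, reduce output is stored in HDFS (the first replica is stored on the local node and the other replicas on off-rack nodes). Writing the reduce output therefore consumes network bandwidth.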
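
The trade-off around split size can be illustrated with simple arithmetic. The sketch below assumes a simplified model in which every split is exactly one block, so the number of map tasks is the file size divided by the block size, rounded up. Hadoop's real `FileInputFormat` applies additional rules (for example, it may let the last split be slightly larger), so treat the numbers as approximate. The class `SplitMath` is hypothetical.

```java
// Simplified model: one split per HDFS block, one map task per split.
class SplitMath {

    static final long MB = 1024L * 1024L;

    // Ceiling division: how many blocks are needed to hold fileBytes.
    static long splitCount(long fileBytes, long blockBytes) {
        if (fileBytes <= 0) {
            return 0;
        }
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long oneGb = 1024 * MB;
        // The same 1 GB file needs twice as many map tasks with 64 MB blocks.
        System.out.println(splitCount(oneGb, 64 * MB));   // prints 16
        System.out.println(splitCount(oneGb, 128 * MB));  // prints 8
        // A 200 MB file spills into a second 128 MB block.
        System.out.println(splitCount(200 * MB, 128 * MB)); // prints 2
    }
}
```

Halving the block size doubles the number of map tasks, which improves parallelism but also doubles the per-task scheduling overhead described above.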

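II. Practical application (stock statistics)

1. Contents of the stock data folder to be processed (export)

2. Partial contents of one of the files

3. Code walkthrough:

(1) The map function (splitting):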

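 public static class Map extends Mapper<Object,Text,Text,Text>{
        private Text text = new Text();   // reused output value
        private Text keys = new Text();   // reused output key: the stock code
        private int no = 0;               // lines seen so far by this map task

        // Assumes each file is a single split, so one map task reads it from its first line.
        @Override
        public void map(Object key,Text value,Context context)throws IOException,InterruptedException{
            String line = value.toString();
            this.no +=1;
            System.out.println(this.no+line);             // debug: echo the raw line
            String[] lines = line.split("\\s+");          // split columns on whitespace
            for(int i =0;i<lines.length;i++){
                System.out.print(lines[i]+" ~~");         // debug: show each column
            }
            if(this.no == 1){
                // The first token of the first line is used as the stock code.
                this.keys.set("Stock code: "+lines[0]);
            }
            if(this.no > 2){
                // Skip the first two lines; keep only complete 7-column rows.
                if(lines.length == 7){
                    // Emit first column + opening price (column 1) + closing price (column 4).
                    this.text.set(lines[0]+"+"+lines[1]+"+"+lines[4]);
                    System.out.println(this.no+"---->"+lines[0]+"+"+lines[1]+"+"+lines[4]);
                    context.write(this.keys, this.text);
                }
            }
        }
}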
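
The parsing step of the mapper can be tried outside Hadoop. The sketch below assumes a 7-column row whose first field is a date, second field is the opening price, and fifth field is the closing price. The sample row is invented, and the class `LineParseDemo` and method `toRecord` are hypothetical names. Unlike the mapper above, it also trims the line first so leading whitespace does not produce an empty first column.

```java
// Standalone version of the mapper's row parsing, for experimentation.
class LineParseDemo {

    // Returns "first+open+close" for a 7-column row, or null for any other line.
    static String toRecord(String line) {
        String[] cols = line.trim().split("\\s+");
        if (cols.length != 7) {
            return null; // header lines and malformed rows are skipped
        }
        return cols[0] + "+" + cols[1] + "+" + cols[4];
    }

    public static void main(String[] args) {
        // Hypothetical data row, invented for this example.
        String row = "2016-01-04 102.61 105.37 102.00 105.35 67649400 98.74";
        System.out.println(toRecord(row));              // prints 2016-01-04+102.61+105.35
        System.out.println(toRecord("Date Open High")); // prints null
    }
}
```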

(2) The reduce function (reducing):

public static class Reduce extends Reducer<Text,Text,Text,Text>{
        private Text text = new Text();   // reused output value
        @Override
        public void reduce(Text key,Iterable<Text> values,Context context) throws IOException, InterruptedException{
                double sum1 = 0.0;   // running sum of opening prices (scaled by 100)
                double sum2 = 0.0;   // running sum of closing prices (scaled by 100)
                int n = 0;           // number of records for this stock
                System.out.println("...................start"+key.toString());
                Iterator<Text> $it = values.iterator();
                while($it.hasNext()){
                    String record =$it.next().toString();
                    System.out.println(n);
                    System.out.println("Raw record: "+record);
                    n++;
                    System.out.println("Iteration "+n);
                    // A record looks like "first+open+close"; "[+]" matches a literal plus sign.
                    String []result = record.split("[+]");
                    System.out.println(Double.valueOf(result[1])+" "+Double.valueOf(result[2]));
                    sum1 +=(Double.valueOf(result[1])*100);
                    sum2 +=(Double.valueOf(result[2])*100);
                    System.out.println(sum1/100+" "+sum2/100);
                }
                System.out.println("Final sums: "+sum1/100+" "+sum2/100);
                // Undo the scaling and divide by the record count to get the averages.
                double openPrise = sum1/(100*n);
                double closePrise = sum2/(100*n);
                // Round both averages to two decimal places.
                openPrise = (double)Math.round(openPrise*100)/100;
                closePrise = (double)Math.round(closePrise*100)/100;
                System.out.println("Averages: "+openPrise+" "+closePrise);
                String result ="Average opening price: "+Double.toString(openPrise)+",   average closing price: "+Double.toString(closePrise);
                this.text.set(result);
                context.write(key, this.text);
        }
    }
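
The averaging and rounding done by the reducer can be checked in isolation. The following sketch assumes records in the same "first+open+close" layout that the mapper emits. The sample records are invented, and `AverageDemo`, `round2`, and `averages` are hypothetical names.

```java
import java.util.List;

// Standalone version of the reducer's arithmetic: average, then round to 2 decimals.
class AverageDemo {

    // Rounds to two decimal places, the same way the reducer does.
    static double round2(double v) {
        return Math.round(v * 100) / 100.0;
    }

    // Returns {average opening price, average closing price} for the given records.
    static double[] averages(List<String> records) {
        double openSum = 0.0;
        double closeSum = 0.0;
        for (String record : records) {
            String[] fields = record.split("[+]"); // literal '+' separator
            openSum += Double.parseDouble(fields[1]);
            closeSum += Double.parseDouble(fields[2]);
        }
        int n = records.size();
        return new double[] { round2(openSum / n), round2(closeSum / n) };
    }

    public static void main(String[] args) {
        // Hypothetical records, invented for this example.
        List<String> records = List.of(
                "2016-01-04+10.00+11.00",
                "2016-01-05+10.50+11.50",
                "2016-01-06+11.00+12.00");
        double[] avg = averages(records);
        System.out.println(avg[0] + " " + avg[1]); // prints 10.5 11.5
    }
}
```

Note that an average cannot be computed by averaging partial averages, so this reducer logic is not suitable for use as a combiner as written.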

4. Partial results

5. Complete code:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class  Data{

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // "fs.default.name" is the older, deprecated name of "fs.defaultFS".
        conf.set("fs.default.name", "hdfs://localhost:9000");
        String[] otherArgs = (new GenericOptionsParser(conf,args)).getRemainingArgs();
        if(otherArgs.length<2){
            System.err.println("Usage: Data <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf,"Data");
        job.setJarByClass(Data.class);
        job.setMapperClass(Data.Map.class);
        System.out.println("Mapper over");
        job.setReducerClass(Data.Reduce.class);
        System.out.println("Reduce over");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        System.out.println("all over");
        // Every argument except the last is an input path; the last is the output path.
        for(int i = 0;i<otherArgs.length-1;i++){
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length-1]));
        System.exit(job.waitForCompletion(true)?0:1);
    }
 
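    public static class Map extends Mapper<Object,Text,Text,Text>{
        private Text text = new Text();   // reused output value
        private Text keys = new Text();   // reused output key: the stock code
        private int no = 0;               // lines seen so far by this map task

        // Assumes each file is a single split, so one map task reads it from its first line.
        @Override
        public void map(Object key,Text value,Context context)throws IOException,InterruptedException{
            String line = value.toString();
            this.no +=1;
            System.out.println(this.no+line);             // debug: echo the raw line
            String[] lines = line.split("\\s+");          // split columns on whitespace
            for(int i =0;i<lines.length;i++){
                System.out.print(lines[i]+" ~~");         // debug: show each column
            }
            if(this.no == 1){
                // The first token of the first line is used as the stock code.
                this.keys.set("Stock code: "+lines[0]);
            }
            if(this.no > 2){
                // Skip the first two lines; keep only complete 7-column rows.
                if(lines.length == 7){
                    // Emit first column + opening price (column 1) + closing price (column 4).
                    this.text.set(lines[0]+"+"+lines[1]+"+"+lines[4]);
                    System.out.println(this.no+"---->"+lines[0]+"+"+lines[1]+"+"+lines[4]);
                    context.write(this.keys, this.text);
                }
            }
        }
    }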
    
    public static class Reduce extends Reducer<Text,Text,Text,Text>{
        private Text text = new Text();   // reused output value
        @Override
        public void reduce(Text key,Iterable<Text> values,Context context) throws IOException, InterruptedException{
                double sum1 = 0.0;   // running sum of opening prices (scaled by 100)
                double sum2 = 0.0;   // running sum of closing prices (scaled by 100)
                int n = 0;           // number of records for this stock
                System.out.println("...................start"+key.toString());
                Iterator<Text> $it = values.iterator();
                while($it.hasNext()){
                    String record =$it.next().toString();
                    System.out.println(n);
                    System.out.println("Raw record: "+record);
                    n++;
                    System.out.println("Iteration "+n);
                    // A record looks like "first+open+close"; "[+]" matches a literal plus sign.
                    String []result = record.split("[+]");
                    System.out.println(Double.valueOf(result[1])+" "+Double.valueOf(result[2]));
                    sum1 +=(Double.valueOf(result[1])*100);
                    sum2 +=(Double.valueOf(result[2])*100);
                    System.out.println(sum1/100+" "+sum2/100);
                }
                System.out.println("Final sums: "+sum1/100+" "+sum2/100);
                // Undo the scaling and divide by the record count to get the averages.
                double openPrise = sum1/(100*n);
                double closePrise = sum2/(100*n);
                // Round both averages to two decimal places.
                openPrise = (double)Math.round(openPrise*100)/100;
                closePrise = (double)Math.round(closePrise*100)/100;
                System.out.println("Averages: "+openPrise+" "+closePrise);
                String result ="Average opening price: "+Double.toString(openPrise)+",   average closing price: "+Double.toString(closePrise);
                this.text.set(result);
                context.write(key, this.text);
        }
    }
}
