[java] Data processing


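Background:

I have a set of temperature-versus-time data covering 30 days, in the following format:

Details: 8k+ records over 30 days, 260+ per day, with each temperature reading timestamped to the second.

The task is to find the pattern in this data so that, given a specific time, the temperature at that time can be predicted.

Approach: first plot the data as a time series chart and see what patterns emerge.

The code is as follows: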

import java.awt.Font;
import java.io.BufferedReader;
import java.io.FileReader;

import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartPanel;
import org.jfree.chart.JFreeChart;
import org.jfree.chart.axis.DateAxis;
import org.jfree.chart.axis.ValueAxis;
import org.jfree.chart.plot.XYPlot;
import org.jfree.data.time.Day;
import org.jfree.data.time.Hour;
import org.jfree.data.time.Minute;

import org.jfree.data.time.Second;
import org.jfree.data.time.TimeSeries;
import org.jfree.data.time.TimeSeriesCollection;
import org.jfree.data.xy.XYDataset;


public class TimeSeriesChart {
    ChartPanel frame1;  
    public TimeSeriesChart(){  
        XYDataset xydataset = createDataset();  
        JFreeChart jfreechart = ChartFactory.createTimeSeriesChart("temperature-time", "time", "temperature",xydataset, true, true, true);  
        XYPlot xyplot = (XYPlot) jfreechart.getPlot();  
        DateAxis dateaxis = (DateAxis) xyplot.getDomainAxis();  
        frame1=new ChartPanel(jfreechart,true); 
        
        // Font of the time (domain) axis label
        dateaxis.setLabelFont(new Font("黑体",Font.BOLD,14));
        // Font of the tick labels on the time axis
        dateaxis.setTickLabelFont(new Font("宋体",Font.BOLD,12));
        // Font of the temperature (range) axis label
        ValueAxis rangeAxis=xyplot.getRangeAxis();
        rangeAxis.setLabelFont(new Font("黑体",Font.BOLD,15));
        // Font of the legend entries
        jfreechart.getLegend().setItemFont(new Font("黑体", Font.BOLD, 15));
        // Font of the chart title
        jfreechart.getTitle().setFont(new Font("宋体",Font.BOLD,20));
  
    }   
    private static XYDataset createDataset()
    {
        TimeSeries timeseries = new TimeSeries("Temperature over time");
        // try-with-resources closes the reader even if a row fails to parse
        try (BufferedReader reader = new BufferedReader(new FileReader("C:\\Users\\lichaoxing\\Desktop\\52001848#2018-07-01-00-00-00_2018-07-31-00-00-00.csv")))
        {
            reader.readLine(); // skip the first line (presumably the header row)
            String line = null;
            //int i = 0;
            while((line=reader.readLine())!=null)
            {
                String item[] = line.split(","); // CSV fields are comma-separated
                double value = Double.parseDouble(item[2]); // third column: temperature
                String time = item[4].replace("\"", "");    // fifth column: quoted "yyyy-MM-dd HH:mm:ss" timestamp
                String tmp_split1[] = time.split(" ");           // [date, time of day]
                String tmp_split2[] = tmp_split1[0].split("-");  // [year, month, day]
                String tmp_split3[] = tmp_split1[1].split(":");  // [hour, minute, second]

                // Day takes (day, month, year); the finer periods are built on top of it
                Day day = new Day(Integer.parseInt(tmp_split2[2]), Integer.parseInt(tmp_split2[1]), Integer.parseInt(tmp_split2[0]));
                Hour hour = new Hour(Integer.parseInt(tmp_split3[0]), day);
                Minute minute = new Minute(Integer.parseInt(tmp_split3[1]), hour);
                Second second = new Second(Integer.parseInt(tmp_split3[2]), minute);

                timeseries.add(second, value);

                  //if(i++ > 260)
                  //    break;

            }
        }
        catch(Exception e)
        {
            e.printStackTrace();
        }
        TimeSeriesCollection timeseriescollection = new TimeSeriesCollection();
        timeseriescollection.addSeries(timeseries);

        return timeseriescollection;
     }
    public ChartPanel getChartPanel()
    {  
        return frame1;  
              
    }  
}
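// Separate source file (tmp.java): opens a window that shows the chart
import java.awt.GridLayout;
import javax.swing.JFrame;

public class tmp {

    public static void main(String[] args)throws Exception
    {

        JFrame frame=new JFrame("Chart");
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE); // exit the JVM when the window is closed
        frame.setLayout(new GridLayout(1,1,10,10));
        /* add the line chart */
        frame.add(new TimeSeriesChart().getChartPanel());
        frame.setBounds(50, 50, 800, 600);
        frame.setVisible(true);
    }
}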

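This produces the following time series chart:

Analysis: apart from a few outlier points, the curve looks very smooth, but the chart does not show what happens within each day. Since the temperature follows essentially the same pattern every day, I added an early-termination condition to the while loop (the commented-out code above) to look at a single day.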
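As an alternative to stopping after a fixed number of rows, the rows could be selected by calendar date. The following is only a sketch: it assumes the timestamp column has the quoted `yyyy-MM-dd HH:mm:ss` layout that the chart code splits by hand, and the class `DayFilter`, the method `onDay`, and the sample timestamps are made up for illustration.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DayFilter {
    // Assumed timestamp layout of the fifth CSV column.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    /** Returns the parsed timestamps (quotes stripped) that fall on the given day. */
    static List<LocalDateTime> onDay(List<String> rawTimestamps, LocalDate day) {
        List<LocalDateTime> kept = new ArrayList<>();
        for (String raw : rawTimestamps) {
            LocalDateTime t = LocalDateTime.parse(raw.replace("\"", "").trim(), FORMAT);
            if (t.toLocalDate().equals(day)) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample values, not taken from the real data set.
        List<String> sample = Arrays.asList(
                "\"2018-07-01 00:05:12\"",
                "\"2018-07-01 23:59:59\"",
                "\"2018-07-02 00:00:03\"");
        System.out.println(onDay(sample, LocalDate.of(2018, 7, 1)));
    }
}
```

Filtering by date does not depend on how many records a day happens to contain, which varies here (260+ per day).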

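Analysis: with only one day plotted, the data is much clearer. It looks roughly sinusoidal, and if high accuracy is not required it can simply be treated as periodic.

So I take a segment containing one trough (a local minimum) as one period. Such a segment can be approximated by a quadratic function, so the next step is to fit that quadratic using polynomial fitting.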
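To make the fitting step concrete, here is a minimal plain-Java sketch of a least-squares quadratic fit via the normal equations. It is a stand-in for illustration, not the method used in this post (the listing further down relies on `PolynomialCurveFitter` from Apache Commons Math). The class name `QuadraticFit` and the sample points are assumptions, and x is assumed to be in hours so that the sums of x^4 stay moderate.

```java
class QuadraticFit {
    /** Least-squares fit of y = coef[0] + coef[1]*x + coef[2]*x^2. */
    static double[] fit(double[] x, double[] y) {
        double[] s = new double[5]; // s[k] = sum of x^k
        double[] t = new double[3]; // t[k] = sum of x^k * y
        for (int i = 0; i < x.length; i++) {
            double p = 1.0;
            for (int k = 0; k < 5; k++) {
                s[k] += p;
                if (k < 3) t[k] += p * y[i];
                p *= x[i];
            }
        }
        // Augmented matrix of the 3x3 normal equations.
        double[][] a = new double[3][4];
        for (int r = 0; r < 3; r++) {
            for (int c = 0; c < 3; c++) a[r][c] = s[r + c];
            a[r][3] = t[r];
        }
        // Gaussian elimination with partial pivoting.
        for (int col = 0; col < 3; col++) {
            int pivot = col;
            for (int r = col + 1; r < 3; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            double[] swap = a[col]; a[col] = a[pivot]; a[pivot] = swap;
            if (Math.abs(a[col][col]) < 1e-12) {
                throw new IllegalArgumentException("need at least 3 distinct x values");
            }
            for (int r = col + 1; r < 3; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c < 4; c++) a[r][c] -= f * a[col][c];
            }
        }
        // Back-substitution.
        double[] coef = new double[3];
        for (int r = 2; r >= 0; r--) {
            double sum = a[r][3];
            for (int k = r + 1; k < 3; k++) sum -= a[r][k] * coef[k];
            coef[r] = sum / a[r][r];
        }
        return coef;
    }

    public static void main(String[] args) {
        // Made-up points shaped like a trough; x in hours, y in temperature units.
        double[] hours = {0, 1, 2, 3, 4, 5, 6};
        double[] temps = {27.0, 25.4, 24.3, 24.0, 24.2, 25.5, 27.1};
        double[] c = fit(hours, temps);
        System.out.println(c[0] + " + " + c[1] + "*x + " + c[2] + "*x^2");
    }
}
```

A positive `coef[2]` corresponds to a trough-shaped segment, which is the shape this approach relies on.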

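The method: estimate the period from two consecutive occurrences of a local minimum (the two minimum values will most likely differ, but that does not matter, since they are only used to get a rough period).

The code is as follows: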

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.commons.math3.fitting.PolynomialCurveFitter;
import org.apache.commons.math3.fitting.WeightedObservedPoints;

public class predict_temperature 
{
       
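    // Scans forward from the reader's current position and returns
    // {sequence number, timestamp, sequence number, timestamp} for the rows just past
    // two consecutive extrema of the same kind: minima if flag > 0 (data initially
    // falling), maxima otherwise.
    private static String[] observed_data(double flag, BufferedReader reader) throws Exception
    {
        String line = null;
        String[] i_want = new String[4];
        if(flag > 0)
        {
            double tmp = 1000;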
            while((line=reader.readLine())!=null)
            {  
                String item[] = line.split(","); 
                double value = Double.parseDouble(item[2]);
                if(value <= tmp)
                    tmp = value;
                else 
                {
                    i_want[0] = item[0];
                    i_want[1] = item[4].replace("\"", "");
                    break;
                }
            }
            while((line=reader.readLine())!=null)
            {  
                String item[] = line.split(","); 
                double value = Double.parseDouble(item[2]);
                if(value >= tmp)
                    tmp = value;
                else 
                    break;
            }
            while((line=reader.readLine())!=null)
            {  
                String item[] = line.split(","); 
                double value = Double.parseDouble(item[2]);
                if(value <= tmp)
                    tmp = value;
                else
                {
                    i_want[2] = item[0];
                    i_want[3] = item[4].replace("\"", "");
                    break;
                }
            }
        }
        else
        {
            double tmp = -1000;
            while((line=reader.readLine())!=null)
            {  
                String item[] = line.split(","); 
                double value = Double.parseDouble(item[2]);
                if(value >= tmp)
                    tmp = value;
                else 
                {
                    i_want[0] = item[0];
                    i_want[1] = item[4].replace("\"", "");
                    break;
                }
            }
            while((line=reader.readLine())!=null)
            {  
                String item[] = line.split(","); 
                double value = Double.parseDouble(item[2]);
                if(value <= tmp)
                    tmp = value;
                else 
                    break;
            }
            while((line=reader.readLine())!=null)
            {  
                String item[] = line.split(","); 
                double value = Double.parseDouble(item[2]);
                if(value >= tmp)
                    tmp = value;
                else
                {
                    i_want[2] = item[0];
                    i_want[3] = item[4].replace("\"", "");
                    break;
                }
            }
        }
        return i_want;
        
    }
    
    public static void main(String[] args) throws Exception
    {
        
        WeightedObservedPoints points = new WeightedObservedPoints();

        // Arguments: args[0] = CSV file, args[1] = date (yyyy/MM/dd), args[2] = time (HH:mm:ss)
        String input_time = args[1] + " " + args[2];
        File file = new File(args[0]);
        double time_diff = 0;

        BufferedReader reader = new BufferedReader(new FileReader(file));
        reader.readLine(); // skip the first line (presumably the header row)
        reader.mark((int)file.length()); // remember this position so the data can be re-read below

        /* compute the period: the first two temperatures tell whether the data starts out rising or falling */
        double compare_item1 = Double.parseDouble(reader.readLine().split(",")[2]);
        double compare_item2 = Double.parseDouble(reader.readLine().split(",")[2]);
        String[] cycle_result = observed_data(compare_item1 - compare_item2, reader);
        int start_num = Integer.parseInt(cycle_result[0]);
        int end_num = Integer.parseInt(cycle_result[2]);
        SimpleDateFormat tmp_day = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date start_now = tmp_day.parse(cycle_result[1]);
        Date end_now = tmp_day.parse(cycle_result[3]);
        /* period length in number of records */
        int cycle = end_num - start_num;
        reader.reset(); // back to the first data row
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        SimpleDateFormat input_time_format = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
        Date input_time_ = input_time_format.parse(input_time);
        Date start_time = null;
        int i = 0;
        String line = null;
        String time = null;
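        while((line=reader.readLine())!=null)
        {
            String item[] = line.split(",");
            time = item[4];
            double value = Double.parseDouble(item[2]);
            time = time.replace("\"", "");
            Date now = sdf.parse(time);
            if(i == 0)
                start_time = now;
            // x value: milliseconds elapsed since the first record
            double offset = (now.getTime() - start_time.getTime());
            points.add(offset, value);
            // only the first period (plus a couple of rows) is used for the fit
            if(i++ > cycle)
                break;

        }
        // Map the requested time into the first period: offset from the first record,
        // modulo the period duration. floorMod keeps the result non-negative even
        // when the requested time lies before the first record.
        time_diff = Math.floorMod(input_time_.getTime() - start_time.getTime(), end_now.getTime() - start_now.getTime());

        PolynomialCurveFitter fitter = PolynomialCurveFitter.create(2); // degree-2 polynomial
        double[] result = fitter.fit(points.toList()); // coefficients, constant term first

        double result_time = result[2] * time_diff * time_diff + result[1]* time_diff + result[0];
        System.out.println(result_time);
        reader.close();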
    }
}

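Let me explain the observed_data method.

Since it is not known in advance whether the data starts out rising or falling, two consecutive temperatures are read first to determine the current direction. These are the two lines that do it: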

double compare_item1 = Double.parseDouble(reader.readLine().split(",")[2]);
double compare_item2 = Double.parseDouble(reader.readLine().split(",")[2]);

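The idea for finding the period is simple: first find a local minimum (or maximum) and record its sequence number and time.

Continuing forward along the curve, the next turning point must be a local maximum (or minimum). It is only an intermediate point, so nothing is recorded.

Continuing further leads to another local minimum (or maximum), whose sequence number and time are recorded as well.

The difference between the two recorded positions then gives both the number of points in a period and the period's duration.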
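The turning-point walk described above can also be sketched on an in-memory array, which makes the three phases easier to see than in the reader-based version. This is an illustrative rewrite under assumptions, not the code from this post: the names `PeriodFinder`, `endOfRun` and `periodInSamples` are hypothetical, the sample values are made up, and it records the index of each extremum itself.

```java
class PeriodFinder {
    /** Index of the last sample of the monotone run that starts at `from`. */
    static int endOfRun(double[] v, int from, boolean falling) {
        int i = from;
        while (i + 1 < v.length && (falling ? v[i + 1] <= v[i] : v[i + 1] >= v[i])) {
            i++;
        }
        return i;
    }

    /**
     * Number of samples between two successive extrema of the same kind,
     * or -1 if the data ends before the second one is confirmed.
     */
    static int periodInSamples(double[] v) {
        if (v.length < 2) return -1;
        boolean falling = v[1] < v[0];              // initial direction from the first two samples
        int first = endOfRun(v, 0, falling);        // first minimum (or maximum): recorded
        int middle = endOfRun(v, first, !falling);  // opposite extremum: only passed through
        int second = endOfRun(v, middle, falling);  // next extremum of the first kind: recorded
        if (second + 1 >= v.length) return -1;      // run was cut off by the end of the data
        return second - first;
    }

    public static void main(String[] args) {
        // Made-up, perfectly smooth samples.
        double[] temps = {26, 25, 24, 23, 24, 25, 26, 25, 24, 23, 24};
        System.out.println(periodInSamples(temps));
    }
}
```

Like the original approach, this assumes smooth data: a single noisy sample in the middle of a run would be mistaken for a turning point.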

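For prediction, take the difference between the requested time and the start time of the first day (the first record), reduce it modulo the period duration to map it into the first period, and substitute the remainder into the fitted function to get an approximate value.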
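The mapping and evaluation step can be sketched as two small helpers. The names `PhaseMapper`, `offsetInPeriod` and `evaluate` are hypothetical, and the 24-hour period and the coefficients in `main` are made-up illustration values. One detail worth noting: Java's `%` returns a negative remainder for a negative dividend, so this sketch uses `Math.floorMod`, which always lands in `[0, period)`.

```java
class PhaseMapper {
    /** Offset of the target time within one period, measured from the start; always in [0, period). */
    static long offsetInPeriod(long targetMillis, long startMillis, long periodMillis) {
        if (periodMillis <= 0) {
            throw new IllegalArgumentException("period must be positive");
        }
        return Math.floorMod(targetMillis - startMillis, periodMillis);
    }

    /** Evaluates c[0] + c[1]*x + c[2]*x^2 in Horner form. */
    static double evaluate(double[] c, double x) {
        return c[0] + x * (c[1] + x * c[2]);
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        long period = 24 * hour;                    // assumed period, for illustration only
        double[] coef = {27.0, -2.0, 0.3};          // made-up coefficients, x in hours
        long offset = offsetInPeriod(50 * hour, 0L, period);
        System.out.println(evaluate(coef, offset / (double) hour));
    }
}
```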

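At this point the temperature can be predicted. For example, with the time arguments configured as:

The actual observed value is:

The predicted result is:

As can be seen, the result is acceptable (although at some time points the error is between 1 and 2).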


End of this section.


Reposted from www.cnblogs.com/xinglichao/p/9498358.html