Python data analysis project in practice

A ready-made case study

Python data analysis: a stock data analysis case

Steps:

  1. Prepare the data
  2. Visualize and review the data
  3. Process the data (make the series stationary)
  4. Determine the model order from the ACF and PACF
  5. Fit an ARIMA model
  6. Forecast
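    # Author: python分享站
    # Link: https://www.zhihu.com/question/280744341/answer/1651341817
    # Source: Zhihu (知乎)
    # Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, credit the source.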
    
    import datetime

    import matplotlib.pyplot as plt
    import pandas as pd
    import pandas_datareader.data as web
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
    # The old statsmodels.tsa.arima_model.ARIMA is deprecated and no longer
    # usable in recent statsmodels releases; this is its replacement
    from statsmodels.tsa.arima.model import ARIMA

    plt.style.use('ggplot')     # set the plot theme

    # Let matplotlib render Chinese text (only needed if labels contain Chinese)
    plt.rcParams['font.sans-serif'] = ['SimHei']  # default font
    plt.rcParams['axes.unicode_minus'] = False  # keep the minus sign '-' from showing as a box in saved figures


    def run_main():
        """Main function."""
        # 1. Prepare the data
        # Start date of the analysis
        start_date = datetime.datetime(2009, 1, 1)
        # End date of the analysis
        end_date = datetime.datetime(2019, 4, 1)
        # Stock code
        stock_code = '600519.SS'    # Kweichow Moutai (Shanghai Stock Exchange)

        # Note: the 'yahoo' source depends on Yahoo's site and may fail
        # with some pandas-datareader versions
        stock_df = web.DataReader(stock_code, 'yahoo', start_date, end_date)
        # Preview the data
        print(stock_df.head())

        # 2. Visualize the data
        plt.plot(stock_df['Close'])
        plt.title('Daily closing price')
        plt.show()

        # Resample to weekly frequency (weeks end on Monday)
        stock_s = stock_df['Close'].resample('W-MON').mean()
        stock_train = stock_s['2014':'2018']
        plt.plot(stock_train)
        plt.title('Weekly mean closing price')
        plt.show()

        # Inspect the ACF
        plot_acf(stock_train, lags=20)
        plt.title('ACF of the stock price')
        plt.show()

        # Inspect the PACF
        plot_pacf(stock_train, lags=20)
        plt.title('PACF of the stock price')
        plt.show()

        # 3. Process the data: make the series stationary
        # Only a simple first-order difference is taken here; there are
        # other ways to make a time series stationary
        stock_diff = stock_train.diff()
        diff = stock_diff.dropna()
        print(diff.head())
        print(diff.dtype)

        plt.figure()
        plt.plot(diff)
        plt.title('First-order difference')
        plt.show()

        plot_acf(diff, lags=20)
        plt.title('ACF of the first-order difference')
        plt.show()

        plot_pacf(diff, lags=20)
        plt.title('PACF of the first-order difference')
        plt.show()

        # 4. Choose the order from the ACF and PACF and build the model
        model = ARIMA(stock_train, order=(1, 1, 1), freq='W-MON')
        # Fit the model
        arima_result = model.fit()
        print(arima_result.summary())

        # 5. Forecast
        # The weekly index is labelled on Mondays, so start and end are the
        # first and last Mondays of the January to March 2019 window.
        # The new ARIMA class returns predictions on the original price scale.
        pred_vals = arima_result.predict(start='2019-01-07', end='2019-03-25',
                                         dynamic=False)
        print(pred_vals)

        # 6. Visualize the forecast
        stock_forecast = pd.concat([stock_s, pred_vals], axis=1, keys=['original', 'predicted'])

        plt.figure()
        plt.plot(stock_forecast)
        plt.title('Actual vs. predicted')
        plt.savefig('./stock_pred.png', format='png')
        plt.show()


    if __name__ == '__main__':
        run_main()
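
To see what the weekly resampling and first-order differencing steps do without downloading anything, here is a minimal sketch on a small synthetic series. The dates and prices are made up for illustration (they are not Moutai data), and the names `close`, `weekly`, `diff`, and `restored` are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up "closing prices": 20 business days starting Monday 2018-01-01,
# rising by 1 per day (illustrative values, not real stock data)
days = pd.bdate_range("2018-01-01", periods=20)
close = pd.Series(np.arange(100.0, 120.0), index=days)

# Weekly mean, with each week ending (and labelled) on a Monday
weekly = close.resample("W-MON").mean()

# First-order difference: week-over-week change; the leading NaN is dropped
diff = weekly.diff().dropna()

# Differencing discards only the starting level, so a cumulative sum
# plus the first weekly value recovers the rest of the weekly series
restored = diff.cumsum() + weekly.iloc[0]

print(weekly)
print(diff)
print(restored)
```

The last step hints at why a model fitted with d=1 can still report forecasts on the original price scale: the predicted differences are accumulated back onto the last known level.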

Book recommendation

 

This book is almost a must-read for getting started with data analysis. It mainly covers three Python libraries: NumPy (arrays), pandas (data analysis), and Matplotlib (plotting).

Whenever I ran into statistics and probability concepts in the book that I did not understand, I went back and learned them at that point.
Many people learn in the wrong order: they study statistics and probability first, and only then learn the data analysis tools (Excel, Python, R).
In the end they complain that it is too difficult and give up.
The learning method can be improved. Why does the usual order fail?
There are two reasons:
1) Many statistics and probability courses dwell on complex mathematical formulas but say little about how the ideas are applied in real life. As a result, you learn a lot but also forget a lot.
2) Most statistics and probability knowledge is theoretical groundwork. If you do not practice it with data analysis tools (Excel, Python, R), it is very hard to make it stick.
For example, suppose you have learned the theory of quartiles but do not know any data analysis tool. Naturally, you will not know how to use quartiles in practice.
But if you do know a data analysis tool, computing the quartiles takes a single line of code. Seeing it work is exciting, and that excitement raises your interest in learning.
So my suggestion is: first learn the basic data analysis methods, then fill in statistics and probability concepts as you encounter them, applying each one with a data analysis tool while you learn it.
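
As a rough illustration of the quartile point above, here is a minimal sketch using pandas. The sample values and variable names are made up for the example. `Series.quantile` uses linear interpolation by default, which is one of several common quartile conventions.

```python
import pandas as pd

# Hypothetical sample data, chosen only to keep the result easy to check
scores = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

# The "one line": first quartile, median, and third quartile
quartiles = scores.quantile([0.25, 0.5, 0.75])

print(quartiles)
```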

Video recommendation

Python-Data Analysis Training Camp [Project]_哔哩哔哩_bilibili

Origin blog.csdn.net/wshyb0314/article/details/130268125