Drawing several common time series plots with Python

Time series data is a collection of observations arranged in time order, with each observation corresponding to a specific point in time. This kind of data is important in many fields, such as finance, economics, and climate science. Analyzing time series data helps us grasp underlying patterns and discover trends, seasonal fluctuations, and other important information.

Time series analysis is a set of techniques for evaluating time series data in order to determine meaningful statistics and other properties of the data. It is often used to study market trends and economic cycles, but it applies to any time series with repeating patterns.

Visualization is essential for extracting insight from time series data: it helps us understand complex relationships and make informed decisions. This article shows how to draw several common time series plots with Python.

Dataset

Dataset address: https://github.com/jbrownlee/Datasets/blob/master/monthly-sunspots.csv
Dataset variables: the dataset consists of two columns, "Month" and "Sunspots", and covers 1749 to 1983. Each row records the number of sunspots observed on the Sun in that month.

1. Statsmodels library

This article uses Statsmodels, a powerful statistical analysis library for Python. It provides hypothesis testing, regression analysis, time series analysis, and more, and it works well with libraries such as NumPy and Pandas. At the time of writing it supports Python 3.8, 3.9 and 3.10.

Installation

Anaconda

conda install -c conda-forge statsmodels

PyPI (pip)

pip install statsmodels

If the download fails or is too slow, use a mirror in mainland China, such as the Tsinghua University PyPI mirror:

pip install statsmodels -i https://pypi.tuna.tsinghua.edu.cn/simple

Install from source

You need a C compiler to build statsmodels from source. If you are building from the GitHub repository rather than a source release, you also need Cython. On Windows, follow the statsmodels installation documentation to set up a C compiler.

If your system already has pip, compiler and git installed, you can try:

pip install git+https://github.com/statsmodels/statsmodels

Dependencies

Python >= 3.8

NumPy >= 1.18

SciPy >= 1.4

Pandas >= 1.0

Patsy >= 0.5.2
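
After installing, it can be useful to check which versions are actually present. The following is a minimal sketch using only the standard library; the helper name installed_version and the REQUIRED list are illustrative choices, and the package names are assumed to be the PyPI distribution names.

```python
from importlib import metadata

# Packages from the dependency list above (assumed PyPI distribution names).
REQUIRED = ["statsmodels", "numpy", "scipy", "pandas", "patsy"]


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in REQUIRED:
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Compare the printed versions against the minimums listed above before running the examples.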

Example

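import statsmodels.formula.api as smf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data: y = 2x + 1 plus standard normal noise
data = pd.DataFrame({"X": np.arange(10, 100, 0.5)})
data["Y"] = 2 * data["X"] + 1 + np.random.randn(len(data))

# Fit an ordinary least squares model and print the summary
mod = smf.ols("Y ~ X", data).fit()
print(mod.summary())

# Look up the coefficients by name rather than by position
intercept = mod.params["Intercept"]
slope = mod.params["X"]

# Scatter plot of the data with the fitted line in red
data.plot(x="X", y="Y", kind="scatter", figsize=(8, 5))
plt.plot(data["X"], intercept + slope * data["X"], "r")
plt.text(10, 38, f"y = {slope:.4f} * x {intercept:+.4f}")
plt.title("linear regression")
plt.show()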

The output of print(mod.summary()):

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      Y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 4.744e+05
Date:                Mon, 13 Nov 2023   Prob (F-statistic):          7.49e-307
Time:                        11:25:51   Log-Likelihood:                -256.52
No. Observations:                 180   AIC:                             517.0
Df Residuals:                     178   BIC:                             523.4
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      1.0831      0.176      6.157      0.000       0.736       1.430
X              1.9993      0.003    688.746      0.000       1.994       2.005
==============================================================================
Omnibus:                        0.781   Durbin-Watson:                   2.104
Prob(Omnibus):                  0.677   Jarque-Bera (JB):                0.462
Skew:                           0.075   Prob(JB):                        0.794
Kurtosis:                       3.198   Cond. No.                         141.
==============================================================================

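To make the fitted coefficients less of a black box, here is a minimal pure-Python sketch of the closed-form least-squares formulas for a single predictor. The function name ols_fit is hypothetical, and the data is a noise-free version of the example above, so the result should recover y = 2x + 1. It is an illustration of the estimate, not a description of how statsmodels computes it internally.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Noise-free version of the example data: y = 2x + 1
xs = [10 + 0.5 * i for i in range(180)]
ys = [2 * x + 1 for x in xs]
intercept, slope = ols_fit(xs, ys)
print(f"y = {slope:.4f} * x {intercept:+.4f}")
```

With noise added, as in the statsmodels example, the estimates land close to 2 and 1 rather than exactly on them.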

Time plot

The code is as follows:

import pandas as pd
import matplotlib.pyplot as plt

# Load the monthly sunspots dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-sunspots.csv"
data = pd.read_csv(url, parse_dates=['Month'], index_col='Month')
print(data)

# Time plot
plt.figure(figsize=(7, 5))
plt.plot(data.index, data['Sunspots'], marker='o', linestyle='-', markersize=5)
plt.xlabel('Date')
plt.ylabel('Number of Sunspots')
plt.title('Monthly Sunspots Time Plot')
plt.grid(True)
plt.show()


Line chart

A line chart is a common way of visualizing data. It connects a series of data points to form one or more line segments to show the trend of data over time or other variables. Line charts are usually used to display changes in time series data or other ordered data, such as stock prices, temperature changes, sales data, etc.

In a line chart, each data point is usually represented as a marker (such as a dot, a square, etc.), and adjacent data points are connected by straight line segments. The x-axis of a line chart usually represents time or other ordinal variable, while the y-axis represents the data value to be displayed. By observing the shape, trend, and fluctuations of the line chart, we can draw some useful information and conclusions.

For example, in stock market analysis, line charts can show the trend of stock prices to help investors judge where a stock is heading and when to buy or sell; in meteorology, line charts can show how measurements such as temperature and rainfall change over time, helping people understand climate change and forecast the weather. Line charts can also show relationships and comparisons between variables by drawing several lines, such as sales in different regions or the popularity of different products.

The line chart is an intuitive and simple way of visualizing data that helps us understand trends and patterns and make more informed decisions.

The code is as follows:

import pandas as pd
import matplotlib.pyplot as plt

# Load the monthly sunspots dataset
# (assumes monthly-sunspots.csv has been downloaded to the working directory)
data = pd.read_csv("monthly-sunspots.csv", parse_dates=['Month'], index_col='Month')
print(data)

# Line chart
plt.figure(figsize=(7, 5))
plt.plot(data)
plt.xlabel('Date')
plt.ylabel('Number of Sunspots')
plt.title('Monthly Sunspots Line Plot')
plt.grid(True)
plt.show()


Seasonal plot

Seasonal plots are a way of visualizing time series data that show recurring patterns over specific time intervals (such as years, months, or days). They are commonly used to observe and analyze seasonal changes in time series data, such as climate, sales, and population dynamics.

Seasonal plots usually place the position within the seasonal period (for example, the month of the year) on the x-axis and the observed values on the y-axis. To show seasonal changes more clearly, different colors or markers can be used for the data points of each season. Trend lines or smooth curves can also be fitted to the data points to help identify long-term trends in the seasonal pattern.

Seasonal plots are used in fields such as climatology, sales analysis, and demographics. For example, in climatology they can show seasonal changes in temperature and rainfall; in sales analysis they can show how product sales vary by season; in demographics they can show how population size changes at different times of the year.

Seasonal plots are a powerful visualization tool that helps us understand and analyze seasonal changes in time series data.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the monthly sunspots dataset
# (assumes monthly-sunspots.csv has been downloaded to the working directory)
data = pd.read_csv("monthly-sunspots.csv", parse_dates=['Month'], index_col='Month')
print(data)

# Seasonal plot: average sunspot count for each calendar month
# (errorbar=None requires seaborn >= 0.12; older versions use ci=None)
plt.figure(figsize=(7, 5))
sns.lineplot(x=data.index.month, y=data['Sunspots'], errorbar=None)
plt.xlabel('Month')
plt.ylabel('Number of Sunspots')
plt.title('Seasonal Plot')
plt.xticks(range(1, 13), labels=[
    'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
plt.grid(True)
plt.show()

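When several observations share the same x value, seaborn's lineplot aggregates them, by default with the mean, so the seasonal plot above is effectively a plot of per-month averages. The following is a minimal pure-Python sketch of that grouping step. The function name monthly_means and the toy observations are hypothetical and are not real sunspot counts.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy observations (first-of-month date, value), not real sunspot counts.
observations = [
    (date(2000, 1, 1), 10.0), (date(2000, 2, 1), 20.0), (date(2000, 3, 1), 30.0),
    (date(2001, 1, 1), 14.0), (date(2001, 2, 1), 22.0), (date(2001, 3, 1), 36.0),
]


def monthly_means(rows):
    """Average the values of all observations that fall in the same calendar month."""
    buckets = defaultdict(list)
    for day, value in rows:
        buckets[day.month].append(value)
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}


means = monthly_means(observations)
for month, mean in means.items():
    print(month, mean)
```

Each calendar month ends up with one averaged value, regardless of how many years contributed to it.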

Histograms and Density Plots

Histograms and density plots are two types of graphics used to visualize the distribution of data.

A histogram is a bar chart that shows the frequency distribution of the data. The range of values is divided into intervals (bins); the width of each bar is the range of values its bin covers, and its height is the number of observations that fall in that bin. By observing the histogram, we can understand the central tendency of the data, its dispersion, and possible outliers. When drawing a histogram, you can choose different colors and bin widths to improve readability.

A density plot shows the estimated probability density of the data as a smooth curve. Unlike a histogram, it shows how densely the data points are packed in each range rather than raw counts. Density plots are especially useful for large datasets or continuous variables. By looking at a density plot, we can see how the data is distributed and whether it tends to be concentrated or dispersed.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the monthly sunspots dataset
# (assumes monthly-sunspots.csv has been downloaded to the working directory)
data = pd.read_csv("monthly-sunspots.csv", parse_dates=['Month'], index_col='Month')
print(data)

# Histogram and density plot
plt.figure(figsize=(7, 5))
sns.histplot(data['Sunspots'], kde=True)
plt.xlabel('Number of Sunspots')
plt.ylabel('Frequency')
plt.title('Histogram and Density Plot')
plt.grid(True)
plt.show()

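The bar heights in a histogram come from a simple binning step. The following is a minimal pure-Python sketch of equal-width binning and of rescaling counts to a density. The function names and the toy values are hypothetical, the sketch assumes the values are not all identical, and it does not cover the smooth kernel density curve drawn by kde=True.

```python
def histogram(values, bins):
    """Count values in `bins` equal-width intervals spanning [min, max].

    Assumes the values are not all equal, so the bin width is non-zero.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin rather than a bin of its own.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts, width


def to_density(counts, width):
    """Rescale counts so that the bar areas (height * width) sum to 1."""
    total = sum(counts)
    return [c / (total * width) for c in counts]


# Hypothetical toy data, not real sunspot counts.
values = [0, 1, 2, 2, 3, 5, 8, 9, 10, 10]
counts, width = histogram(values, bins=5)
density = to_density(counts, width)
print(counts, width)
print(density)
```

Counts answer "how many observations fall in this bin", while densities make histograms with different bin widths or sample sizes comparable.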

Autocorrelation plot

An autocorrelation plot (correlogram) is a two-dimensional stem plot: the x-axis shows the lag and the y-axis shows the autocorrelation coefficient at that lag. It is an important statistical tool for analyzing recurring patterns and cyclical trends in time series data. By observing the autocorrelation plot, we can see how strongly the series resembles earlier versions of itself at each lag, and use this to guide predictions or explanations.

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Load the monthly sunspots dataset
# (assumes monthly-sunspots.csv has been downloaded to the working directory)
data = pd.read_csv("monthly-sunspots.csv", parse_dates=['Month'], index_col='Month')
print(data)

# Autocorrelation plot
# plot_acf creates its own figure, so plt.figure() is not needed here
plot_acf(data['Sunspots'], lags=50)
plt.xlabel('Lags')
plt.ylabel('Autocorrelation')
plt.title('Autocorrelation Plot')
plt.grid(True)
plt.show()

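Each stem in the autocorrelation plot is a sample autocorrelation coefficient. The following is a minimal pure-Python sketch of the usual estimator, which divides the lagged covariance by the overall variance. The function name autocorrelation and the toy series are hypothetical; the series repeats every 4 steps, so the coefficient is strongly positive at lags 4 and 8.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation at `lag`, using the overall mean and variance."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return numer / denom


# Hypothetical toy series that repeats every 4 steps.
series = [1.0, 3.0, 5.0, 3.0] * 6
acf = [autocorrelation(series, k) for k in range(9)]
for k, r in enumerate(acf):
    print(k, round(r, 3))
```

The coefficient at lag 0 is always 1, and the peaks shrink slightly at higher lags because fewer pairs of observations overlap.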

PACF plot

The PACF plot shows the partial autocorrelation function. The partial autocorrelation at lag k measures the correlation between a time series and a copy of itself lagged by k steps, after controlling for the effects of all shorter lags.

In the PACF plot, the x-axis represents the lag and the y-axis represents the partial autocorrelation coefficient. As in the ACF plot, each lag has a vertical line whose height is the coefficient, and a shaded band around zero (as drawn by statsmodels) marks the confidence interval.

Reading the ACF and PACF plots together suggests candidate models for the data. If the ACF tails off gradually while the PACF cuts off after lag p, the data may fit an AR(p) model. If the ACF cuts off after lag q while the PACF tails off, the data may fit an MA(q) model. If both the ACF and the PACF tail off, the data may fit an ARMA model.

The PACF plot is an important tool for analyzing time series data and helps us understand the internal structure and relationships in the data.

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_pacf

# Load the monthly sunspots dataset
# (assumes monthly-sunspots.csv has been downloaded to the working directory)
data = pd.read_csv("monthly-sunspots.csv", parse_dates=['Month'], index_col='Month')
print(data)

# PACF plot
# plot_pacf creates its own figure, so plt.figure() is not needed here
plot_pacf(data['Sunspots'], lags=50)
plt.xlabel('Lags')
plt.ylabel('Partial Autocorrelation')
plt.title('Partial Autocorrelation Function (PACF) Plot')
plt.grid(True)
plt.show()
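
One standard way to obtain partial autocorrelations from autocorrelations is the Durbin-Levinson recursion. The following is a minimal pure-Python sketch of it; the function name pacf_from_acf is hypothetical, and this is an illustration rather than a description of what plot_pacf does internally. The two inputs are theoretical autocorrelations: an AR(1) process with coefficient 0.5, whose PACF cuts off after lag 1, and an MA(1) process with coefficient 0.5, whose PACF tails off.

```python
def pacf_from_acf(rho):
    """Partial autocorrelations for lags 1..len(rho)-1 via Durbin-Levinson.

    `rho` is a list of autocorrelations with rho[0] == 1.
    """
    max_lag = len(rho) - 1
    pacf = []
    prev = []  # prev[j - 1] holds phi_{k-1, j}
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
        else:
            numer = rho[k] - sum(prev[j - 1] * rho[k - j] for j in range(1, k))
            denom = 1.0 - sum(prev[j - 1] * rho[j] for j in range(1, k))
            phi_kk = numer / denom
        # phi_{k, j} = phi_{k-1, j} - phi_kk * phi_{k-1, k-j}
        current = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)]
        current.append(phi_kk)
        pacf.append(phi_kk)
        prev = current
    return pacf


# AR(1) with coefficient 0.5: rho_k = 0.5 ** k
ar1_pacf = pacf_from_acf([0.5 ** k for k in range(5)])
# MA(1) with coefficient 0.5: rho_1 = 0.5 / (1 + 0.5 ** 2) = 0.4, rho_k = 0 for k >= 2
ma1_pacf = pacf_from_acf([1.0, 0.4, 0.0, 0.0, 0.0])
print("AR(1):", [round(v, 4) for v in ar1_pacf])
print("MA(1):", [round(v, 4) for v in ma1_pacf])
```

The AR(1) result is non-zero only at lag 1, while the MA(1) result shrinks gradually with alternating sign, which matches the cut-off and tail-off patterns used for model identification.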

Polar plot

A polar plot is a graphic used to represent directionality and distance information, often used to represent data in geographic information systems. In a polar coordinate chart, each point has a distance and angle information relative to the pole, which can be used to represent data such as longitude, latitude, direction, and altitude in geographic information.

In compass-style polar plots, 0 degrees usually points north and the angle increases clockwise. Matplotlib's default polar axes instead place 0 degrees on the right (east) and increase counterclockwise. The distance is measured from the pole at the origin and extends outward in all directions, in whatever unit is needed.

Polar plots can be used in various fields such as physics, engineering, geophysics, etc. In geophysics, polar coordinate diagrams can be used to represent seismic data, geomagnetic data, etc. to better analyze and study the physical properties of the earth.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Load the monthly sunspots dataset
# (assumes monthly-sunspots.csv has been downloaded to the working directory)
data = pd.read_csv("monthly-sunspots.csv", parse_dates=['Month'], index_col='Month')
print(data)

# Extract the month number from the index
data['Month_Num'] = data.index.month

# Group by month and compute the average number of sunspots for each month
monthly_average = data.groupby('Month_Num')['Sunspots'].mean()

# Angles (theta) and radii for the polar plot; endpoint=False keeps
# December from landing on the same angle as January
theta = np.linspace(0, 2 * np.pi, len(monthly_average), endpoint=False)
radii = monthly_average.values

# Polar plot; repeat the January value at 2*pi so the curve closes
plt.figure(figsize=(7, 5))
plt.polar(np.append(theta, 2 * np.pi), np.append(radii, radii[0]))
plt.title('Polar Plot of Monthly Average Sunspots')
plt.xticks(theta, ['Jan', 'Feb', 'Mar', 'Apr', 'May',
        'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])

# Set the radial axis limits to fit the data
plt.ylim(0, radii.max() + 10)
plt.show()

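The mapping from month to angle is plain arithmetic. The following is a minimal standard-library sketch of it; the function names month_angles and to_cartesian are hypothetical. It assumes the mathematical convention of 0 radians on the positive x-axis with angles increasing counterclockwise, and it spaces the 12 months over [0, 2*pi) so that no two months share an angle.

```python
import math

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def month_angles(n_months=12):
    """Evenly spaced angles in radians on [0, 2*pi), one per month."""
    return [2 * math.pi * i / n_months for i in range(n_months)]


def to_cartesian(theta, radius):
    """Convert a polar point to (x, y): 0 rad on the positive x-axis, counterclockwise."""
    return radius * math.cos(theta), radius * math.sin(theta)


angles = month_angles()
for name, theta in zip(MONTHS, angles):
    print(f"{name}: {math.degrees(theta):.0f} degrees")
```

If the full circle from 0 to 2*pi inclusive were divided into 12 points instead, the first and last months would land on the same spoke.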

Moving average chart

The moving average chart is a commonly used technical analysis tool that smooths out short-term fluctuations by averaging a fixed number of consecutive data points, such as prices, to reveal the underlying trend. A moving average chart often draws several moving averages with different periods to reflect the average cost and trend of the market over different time spans.

Moving average charts can be used in trading markets such as stocks, futures, and foreign exchange to help traders better grasp market trends and price fluctuations. In the moving average chart, moving averages of different periods are represented by lines of different colors to facilitate comparison and analysis by traders.

By looking at moving average charts, traders can spot trends and turning points in the market, as well as price fluctuations. For example, when the short-term moving average crosses above the long-term moving average, it may indicate that the market is about to rise; while when the short-term moving average crosses below the long-term moving average, it may indicate that the market is about to fall. In addition, traders can also analyze the market's support and resistance levels, as well as the duration and strength of the trend, by observing moving averages of different periods.

Moving average charts are an important technical analysis tool that can help traders better understand market trends and price fluctuations, allowing them to make more informed trading decisions.

import pandas as pd
import matplotlib.pyplot as plt

# Load the monthly sunspots dataset
# (assumes monthly-sunspots.csv has been downloaded to the working directory)
data = pd.read_csv("monthly-sunspots.csv", parse_dates=['Month'], index_col='Month')
print(data)

# Moving average plot
plt.figure(figsize=(7, 5))
values = data['Sunspots']

# 7-month moving average (the data is monthly, so window=7 spans 7 months)
rolling_mean = values.rolling(window=7).mean()
plt.plot(values, label='Original')
plt.plot(rolling_mean, label='7-month Moving Average', color='red')
plt.xlabel('Date')
plt.ylabel('Value')
plt.title('Moving Average Plot')
plt.legend()
plt.grid(True)
plt.show()

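The rolling mean and the crossover idea described above can be sketched in a few lines of pure Python. The function names rolling_mean and crossovers and the toy price list are hypothetical. The rolling mean is a trailing window that yields None until the window is full, similar in spirit to the NaN values pandas produces at the start of a rolling window.

```python
def rolling_mean(values, window):
    """Trailing moving average; None until `window` observations are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


def crossovers(short_ma, long_ma):
    """Return (index, 'up' or 'down') where the short average crosses the long one.

    Moving off an exact tie is also reported, because the previous difference is zero.
    """
    events = []
    prev_diff = None
    for i, (s, l) in enumerate(zip(short_ma, long_ma)):
        if s is None or l is None:
            continue
        diff = s - l
        if prev_diff is not None:
            if prev_diff <= 0 < diff:
                events.append((i, "up"))
            elif prev_diff >= 0 > diff:
                events.append((i, "down"))
        prev_diff = diff
    return events


# Hypothetical toy prices: a decline followed by a recovery.
prices = [5, 4, 3, 2, 1, 2, 3, 4, 5, 6]
short_ma = rolling_mean(prices, window=2)
long_ma = rolling_mean(prices, window=4)
events = crossovers(short_ma, long_ma)
print(events)
```

The short average reacts to the recovery sooner than the long one, which is why it crosses above it shortly after the low point.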

Reference article: https://mp.weixin.qq.com/s/4tZan5-1X94oITmCCLshOw


Origin blog.csdn.net/hhhhhhhhhhwwwwwwwwww/article/details/134370908