Time series analysis examines data points collected over time in order to identify patterns and trends that can inform forecasts. We have introduced many time series libraries before, but new libraries and updates appear constantly, so this article shares seven commonly used Python libraries for time series problems: tsfresh, autots, darts, atspy, kats, sktime, and greykite.
1. Tsfresh
Tsfresh specializes in time series feature extraction and selection. It automatically extracts a large number of features from time series data and identifies the most relevant ones. Tsfresh supports multiple time series formats, and its features can feed tasks such as classification, clustering, and regression.
import pandas as pd
from tsfresh import extract_features
from tsfresh.utilities.dataframe_functions import impute, make_forecasting_frame

# Assume we have a time series dataset `data` with columns "time" and "value"
data = pd.read_csv('data.csv')

# We will use the last 10 points to predict the next point
df_shift, y = make_forecasting_frame(data["value"], kind="value", max_timeshift=10, rolling_direction=1)

# Extract features with tsfresh; `impute` replaces NaN and infinite feature values
X = extract_features(df_shift, column_id="id", column_sort="time", column_value="value", impute_function=impute)
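To make the "last 10 points predict the next point" framing concrete, here is a minimal plain-Python sketch of the idea. It is not tsfresh's implementation: the helpers `make_windows` and `window_features` are hypothetical names introduced for illustration, and the handful of features is far smaller than what tsfresh extracts.

```python
def make_windows(values, max_timeshift):
    """Pair each target value with the (up to) `max_timeshift` values before it."""
    windows, targets = [], []
    for t in range(1, len(values)):
        start = max(0, t - max_timeshift)
        windows.append(values[start:t])   # history available at time t
        targets.append(values[t])         # the value we want to predict
    return windows, targets


def window_features(window):
    """A few hand-written summary features for one window."""
    n = len(window)
    mean = sum(window) / n
    variance = sum((v - mean) ** 2 for v in window) / n
    return {"mean": mean, "variance": variance, "last": window[-1], "length": n}


series = [1.0, 2.0, 4.0, 7.0, 11.0]
windows, targets = make_windows(series, max_timeshift=3)
feature_rows = [window_features(w) for w in windows]
```

Each entry of `feature_rows` lines up with the entry of `targets` at the same position, which is the shape a regression model expects.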
2. AutoTS
autots is another Python library for time series forecasting:
- It provides various algorithms for univariate and multivariate time series forecasting, including ARIMA, ETS, Prophet, and DeepAR.
- It automatically builds an ensemble from the best models.
- It provides prediction intervals with upper and lower bounds.
- It preprocesses data by learning the best NaN imputation and outlier removal.
from autots.datasets import load_monthly

df_long = load_monthly(long=True)

from autots import AutoTS

model = AutoTS(
    forecast_length=3,
    frequency='infer',
    ensemble='simple',
    max_generations=5,
    num_validations=2,
)
model = model.fit(df_long, date_col='datetime', value_col='value', id_col='series_id')

# Print the description of the best model
print(model)
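The idea behind a simple model ensemble can be sketched in a few lines of plain Python: average the point forecasts of several candidate models step by step. This only illustrates the concept and does not reproduce how AutoTS builds its ensembles; the function `average_ensemble` and the model names are hypothetical.

```python
def average_ensemble(forecasts):
    """Average point forecasts from several models, step by step."""
    horizons = {len(f) for f in forecasts.values()}
    if len(horizons) != 1:
        raise ValueError("all forecasts must cover the same horizon")
    n_models = len(forecasts)
    # zip(*...) groups the values that belong to the same forecast step
    return [sum(step) / n_models for step in zip(*forecasts.values())]


# Hypothetical 3-step forecasts from three candidate models
candidate_forecasts = {
    "model_a": [10.0, 12.0, 14.0],
    "model_b": [12.0, 12.0, 16.0],
    "model_c": [11.0, 15.0, 15.0],
}
ensemble = average_ensemble(candidate_forecasts)
```

Averaging tends to smooth out the errors of individual models, which is why even this plain form is a common baseline.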
3. Darts
darts offers a variety of time series forecasting models, including ARIMA, Prophet, several variants of exponential smoothing, and deep learning models such as LSTM, GRU, and TCN. Darts also has built-in methods for cross-validation, hyperparameter tuning, and feature engineering.
A key feature of darts is probabilistic forecasting. Instead of a single point forecast per time step, a model can generate a distribution of possible outcomes, which gives a fuller picture of the uncertainty in the forecast.
import pandas as pd
import matplotlib.pyplot as plt
from darts import TimeSeries
from darts.models import ExponentialSmoothing

# Read data
df = pd.read_csv("AirPassengers.csv", delimiter=",")

# Create a TimeSeries, specifying the time and value columns
series = TimeSeries.from_dataframe(df, "Month", "#Passengers")

# Set aside the last 36 months as a validation series
train, val = series[:-36], series[-36:]

# Fit an exponential smoothing model, and make a (probabilistic)
# prediction over the validation series’ duration
model = ExponentialSmoothing()
model.fit(train)
prediction = model.predict(len(val), num_samples=1000)

# Plot the median, 5th and 95th percentiles
series.plot()
prediction.plot(label="forecast", low_quantile=0.05, high_quantile=0.95)
plt.legend()
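What the plotted band means can be shown without any library: a probabilistic forecast is a set of sampled future paths, and the median and the 5th and 95th percentiles are computed per time step across those samples. The sketch below uses only the standard library; `quantile` and `summarize_samples` are hypothetical helpers, the sample paths are synthetic, and none of this is darts' own code.

```python
import random


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list (0 <= q <= 1)."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def summarize_samples(sample_paths, low=0.05, high=0.95):
    """Return a (low, median, high) tuple for every forecast step."""
    summary = []
    for step_values in zip(*sample_paths):  # all samples for one time step
        ordered = sorted(step_values)
        summary.append((quantile(ordered, low), quantile(ordered, 0.5), quantile(ordered, high)))
    return summary


# Synthetic stand-in for 1000 sampled forecast paths over 3 steps
rng = random.Random(0)
sample_paths = [[100 + 2 * step + rng.gauss(0, 5) for step in range(3)] for _ in range(1000)]
bands = summarize_samples(sample_paths)
```

Each tuple in `bands` holds the lower bound, median, and upper bound for one forecast step, which is the information the shaded region in the plot conveys.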
4. AtsPy
With atspy, one can simply load the data and specify the models to test, as shown in the code below.
# Importing packages
import pandas as pd
from atspy import AutomatedModel

# Reading data
data = pd.read_csv("AirPassengers.csv", delimiter=",")

# Preprocessing data (assumes the month column holds strings such as "1949-01")
data.columns = ['month', 'Passengers']
data['month'] = pd.to_datetime(data['month'], format='%Y-%m')
data.index = data.month
df_air = data.drop(['month'], axis=1)

# Select the models you want to run:
models = ['ARIMA', 'Prophet']
run_models = AutomatedModel(df=df_air, model_list=models, forecast_len=10)
This package provides a set of fully automated models, including the ARIMA and Prophet models selected above.
5. Kats
Kats (Kit to Analyze Time Series) is a Python library developed by Facebook (now Meta). The three core features of this library are:
- Model forecasting: Kats provides a complete set of forecasting tools, including 10+ individual forecasting models, ensembles, meta-learning models, backtesting, hyperparameter tuning, and empirical prediction intervals.
- Detection: Kats supports functions for detecting various patterns in time series data, including seasonality, anomalies, change points, and slow trend changes.
- Feature extraction and embedding: the time series feature (TSFeature) extraction module in Kats can generate 65 features with clear statistical definitions, which can be used in most machine learning (ML) models, such as classification and regression.
# pip install kats
import pandas as pd
from kats.consts import TimeSeriesData
from kats.models.prophet import ProphetModel, ProphetParams

# Read data; header=0 replaces the file's own header row with our column names
air_passengers_df = pd.read_csv("AirPassengers.csv", header=0, names=["time", "passengers"])

# Convert to TimeSeriesData object
air_passengers_ts = TimeSeriesData(air_passengers_df)

# Create a model param instance
params = ProphetParams(seasonality_mode='multiplicative')

# Create a prophet model instance
m = ProphetModel(air_passengers_ts, params)

# Fit model simply by calling m.fit()
m.fit()

# Make prediction for next 30 months
forecast = m.predict(steps=30, freq="MS")
forecast.head()
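Of the three feature areas listed above, change point detection is easy to illustrate by hand. The sketch below is a bare-bones CUSUM-style search in plain Python: it accumulates deviations from the overall mean and reports where the cumulative sum is most extreme. It is a conceptual illustration only; Kats ships its own detectors, which this does not reproduce, and `cusum_change_point` is a hypothetical name.

```python
def cusum_change_point(values):
    """Return the index after which the mean appears to shift most strongly.

    Returns None when the series shows no deviation from its mean at all.
    """
    mean = sum(values) / len(values)
    running, best_index, best_magnitude = 0.0, None, 0.0
    # The last point is skipped: a change "after" it cannot be observed
    for index, value in enumerate(values[:-1]):
        running += value - mean
        if abs(running) > best_magnitude:
            best_index, best_magnitude = index, abs(running)
    return best_index


series = [10, 11, 9, 10, 10, 20, 21, 19, 20, 20]
change_after = cusum_change_point(series)
```

A production detector would also test whether the shift is statistically significant rather than always returning the largest one.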
6. Sktime
sktime is a time series analysis library that is built on top of scikit-learn and follows a similar API, which makes it easy to move between the two libraries. Here is an example of using sktime for time series classification:
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sktime.datasets import load_arrow_head
# Recent sktime releases expose the classifier here; older releases used
# sktime.classification.compose
from sktime.classification.interval_based import TimeSeriesForestClassifier

# Load ArrowHead dataset
X, y = load_arrow_head(return_X_y=True)

# Split data into train and test sets (scikit-learn's splitter works on sktime data)
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Create and fit a time series forest classifier
classifier = TimeSeriesForestClassifier(n_estimators=100)
classifier.fit(X_train, y_train)

# Predict labels for the test set
y_pred = classifier.predict(X_test)

# Print classification report
print(classification_report(y_test, y_pred))
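A time series forest summarizes intervals of each series with simple statistics (mean, standard deviation, and slope) and trains decision trees on those summaries. The following plain-Python sketch computes the three statistics for one interval. It is a hedged illustration of the idea rather than sktime's implementation, and `interval_features` is a hypothetical helper.

```python
def interval_features(series, start, end):
    """Mean, standard deviation, and least-squares slope of series[start:end]."""
    segment = series[start:end]
    n = len(segment)
    mean = sum(segment) / n
    std = (sum((v - mean) ** 2 for v in segment) / n) ** 0.5
    # Slope of the best-fit line through the points (0, v0), (1, v1), ...
    x_mean = (n - 1) / 2
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    if denominator == 0:
        slope = 0.0  # a single point has no trend
    else:
        slope = sum((x - x_mean) * (v - mean) for x, v in enumerate(segment)) / denominator
    return mean, std, slope


series = [1.0, 3.0, 5.0, 7.0, 4.0, 4.0]
rising = interval_features(series, 0, 4)  # steadily increasing part
flat = interval_features(series, 4, 6)    # constant part
```

In the full algorithm the intervals are drawn at random for each tree, so different trees look at different parts of the series.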
7. Greykite
Greykite is a time series forecasting library released by LinkedIn. The library can handle complex time series data and provides a range of capabilities including automated feature engineering, exploratory data analysis, predictive pipelines, and model tuning.
from greykite.common.data_loader import DataLoader
from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum

# Defines inputs
df = DataLoader().load_bikesharing().tail(24*90)  # Input time series (pandas.DataFrame)
config = ForecastConfig(
    metadata_param=MetadataParam(time_col="ts", value_col="count"),  # Column names in `df`
    model_template=ModelTemplateEnum.AUTO.name,  # AUTO model configuration
    forecast_horizon=24,  # Forecasts 24 steps ahead
    coverage=0.95,  # 95% prediction intervals
)

# Creates forecasts
forecaster = Forecaster()
result = forecaster.run_forecast_config(df=df, config=config)

# Accesses results
result.forecast  # Forecast with metrics, diagnostics
result.backtest  # Backtest with metrics, diagnostics
result.grid_search  # Time series CV result
result.model  # Trained model
result.timeseries  # Processed time series with plotting functions
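The backtest and time series CV results mentioned in the comments rest on one rule: a model is always evaluated on data that comes after its training data. A minimal sketch of expanding-window splits in plain Python follows. The function `expanding_window_splits` and its parameters are hypothetical and do not mirror Greykite's configuration options.

```python
def expanding_window_splits(n_points, horizon, n_splits):
    """Yield (train_indices, test_indices) pairs with a growing training window."""
    for split in range(n_splits, 0, -1):
        # Work backwards from the end so the last split tests on the newest data
        test_end = n_points - (split - 1) * horizon
        test_start = test_end - horizon
        if test_start <= 0:
            raise ValueError("not enough data for the requested splits")
        yield list(range(test_start)), list(range(test_start, test_end))


splits = list(expanding_window_splits(n_points=10, horizon=2, n_splits=3))
for train_idx, test_idx in splits:
    print(f"train on 0..{train_idx[-1]}, test on {test_idx}")
```

Unlike shuffled k-fold cross-validation, no test index ever precedes a training index, so the evaluation never leaks future information into the past.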
Summary
These time series libraries serve two main purposes: generating features and integrating multiple time series forecasting models. Between them they cover both univariate and multivariate data, so which one to use depends on your specific needs and usage habits.
https://avoid.overfit.cn/post/45451d119a154aeba72bf8dd3eaa9496
Author: Joanna