Introduction and code samples of 7 latest time series analysis libraries

Time series analysis examines data points collected over time in order to identify patterns and trends that can inform forecasts. We have introduced many time series analysis libraries before, but new libraries and updates keep appearing, so this article shares 7 commonly used Python libraries for time series problems: tsfresh, AutoTS, darts, AtsPy, Kats, sktime, and Greykite.

1、Tsfresh

Tsfresh is a powerful library for time series feature extraction and selection. It is designed to automatically extract a large number of features from time series data and identify the most relevant ones. Tsfresh supports multiple time series formats, and the extracted features can be used in tasks such as classification, clustering, and regression.

 import pandas as pd
 from tsfresh import extract_features
 from tsfresh.utilities.dataframe_functions import impute, make_forecasting_frame
 
 # Assume we have a time series dataset `data` with columns "time" and "value"
 data = pd.read_csv('data.csv')
 
 # We will use the last 10 points to predict the next point
 df_shift, y = make_forecasting_frame(data["value"], kind="value", max_timeshift=10, rolling_direction=1)
 
 # Extract features using tsfresh; `impute` replaces NaN/inf values in the result
 X = extract_features(df_shift, column_id="id", column_sort="time", column_value="value", impute_function=impute)
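
To make the windowing step concrete, here is a minimal pure-Python sketch of what a forecasting frame contains: each window holds up to the previous `max_timeshift` values and is paired with the next value as its target. The helper name `make_windows` is hypothetical and is not part of tsfresh; it only illustrates the idea under that assumption.

```python
def make_windows(values, max_timeshift):
    """Pair each point (from the second one on) with the window of values before it.

    Hypothetical helper for illustration only; not part of tsfresh.
    """
    windows, targets = [], []
    for t in range(1, len(values)):
        start = max(0, t - max_timeshift)
        windows.append(values[start:t])  # up to max_timeshift past values
        targets.append(values[t])        # the value to predict
    return windows, targets


series = [1, 2, 3, 4, 5, 6]
windows, targets = make_windows(series, max_timeshift=3)
for window, target in zip(windows, targets):
    print(window, "->", target)
```

Features would then be computed per window, and a model would learn to map those features to the target.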

2、AutoTS

AutoTS is another Python library for time series forecasting:

  • Provides various algorithms for univariate and multivariate time series forecasting, including ARIMA, ETS, Prophet, and DeepAR.
  • Automatically builds ensembles of the best models.
  • Provides prediction intervals with upper and lower bounds.
  • Handles messy data by learning optimal NaN imputation and outlier removal.

 from autots import AutoTS
 from autots.datasets import load_monthly
 
 # Load a sample dataset in long format (one row per series and timestamp)
 df_long = load_monthly(long=True)
 
 model = AutoTS(
     forecast_length=3,
     frequency='infer',
     ensemble='simple',
     max_generations=5,
     num_validations=2,
 )
 model = model.fit(df_long, date_col='datetime', value_col='value', id_col='series_id')
 
 # Print the description of the best model
 print(model)
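
The ensemble idea can be sketched without AutoTS: average the point forecasts of several models. The two forecasters below (`naive_forecast`, `mean_forecast`) and the `simple_ensemble` helper are hypothetical stand-ins, not AutoTS models, and this shows only one basic way to average forecasts rather than how AutoTS implements its ensembles.

```python
from statistics import mean


def naive_forecast(history, horizon):
    """Repeat the last observed value (hypothetical stand-in model)."""
    return [history[-1]] * horizon


def mean_forecast(history, horizon):
    """Repeat the historical mean (hypothetical stand-in model)."""
    return [mean(history)] * horizon


def simple_ensemble(history, horizon, models):
    """Average the forecasts of all models at each future step."""
    forecasts = [model(history, horizon) for model in models]
    return [mean(step_values) for step_values in zip(*forecasts)]


history = [10, 12, 14, 16]
ensemble = simple_ensemble(history, 3, [naive_forecast, mean_forecast])
print(ensemble)
```

Averaging tends to smooth out the errors of individual models, which is one reason ensembles are often more robust than any single model.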

3、Darts

Darts offers a variety of time series forecasting models, including ARIMA, Prophet, several variants of exponential smoothing, and deep learning models such as LSTM, GRU, and TCN. Darts also has built-in methods for backtesting, hyperparameter tuning, and feature engineering.

A key feature of Darts is probabilistic forecasting: instead of only a single point forecast for each time step, it can generate a distribution of possible outcomes, giving a more complete picture of the uncertainty in the forecast.

 import pandas as pd
 import matplotlib.pyplot as plt
 
 from darts import TimeSeries
 from darts.models import ExponentialSmoothing
 
 # Read data
 df = pd.read_csv("AirPassengers.csv", delimiter=",")
 
 # Create a TimeSeries, specifying the time and value columns
 series = TimeSeries.from_dataframe(df, "Month", "#Passengers")
 
 # Set aside the last 36 months as a validation series
 train, val = series[:-36], series[-36:]
 
 # Fit an exponential smoothing model, and make a (probabilistic)
 # prediction over the validation series' duration
 model = ExponentialSmoothing()
 model.fit(train)
 prediction = model.predict(len(val), num_samples=1000)
 
 # Plot the median, 5th and 95th percentiles
 series.plot()
 prediction.plot(label="forecast", low_quantile=0.05, high_quantile=0.95)
 plt.legend()
 plt.show()
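
To show what the quantile band summarizes, here is a minimal standard-library sketch that reduces a set of sampled outcomes to the 5th percentile, median, and 95th percentile at each step. The Gaussian samples are made-up stand-ins for model output (an assumption for illustration), and none of this uses Darts itself.

```python
import random
from statistics import quantiles

random.seed(0)

# Hypothetical sampled forecasts: 1000 simulated outcomes for each of 3 future steps
num_samples, horizon = 1000, 3
samples = [
    [random.gauss(100 + 5 * step, 10) for _ in range(num_samples)]
    for step in range(horizon)
]

for step, draws in enumerate(samples, start=1):
    # quantiles(n=20) returns 19 cut points at 5%, 10%, ..., 95%
    cuts = quantiles(draws, n=20)
    low, median, high = cuts[0], cuts[9], cuts[18]
    print(f"step {step}: 5%={low:.1f}  median={median:.1f}  95%={high:.1f}")
```

The wider the gap between the low and high quantiles, the more uncertain the forecast is at that step.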

4、AtsPy

With AtsPy, one can simply load the data and specify the models to test, as shown in the code below.

 # Importing packages
 import pandas as pd
 from atspy import AutomatedModel
 
 # Reading data
 data = pd.read_csv("AirPassengers.csv", delimiter=",")
 
 # Preprocessing data: rename columns, parse dates, and index by month
 # (the format assumes month strings such as "1949-01")
 data.columns = ['month', 'Passengers']
 data['month'] = pd.to_datetime(data['month'], format='%Y-%m')
 data.index = data.month
 df_air = data.drop(['month'], axis=1)
 
 # Select the models you want to run:
 models = ['ARIMA', 'Prophet']
 run_models = AutomatedModel(df=df_air, model_list=models, forecast_len=10)

This package provides a set of fully automated models, including the ARIMA and Prophet models used above.

5、Kats

Kats (Kit to Analyze Time Series) is a Python library developed by Facebook (now Meta). The three core features of this library are:

Model Forecasting: Provides a complete set of forecasting tools, including 10+ individual forecasting models, ensembles, meta-learning models, backtesting, hyperparameter tuning, and empirical forecasting intervals.

Detection: Kats supports functions for detecting various patterns in time series data, including seasonality, anomalies, change points, and slow trend changes.

Feature extraction and embedding: The time series feature (TSFeature) extraction module in Kats can generate 65 features with clear statistical definitions, which can be used as inputs to most machine learning (ML) models, for example in classification and regression.

 # pip install kats
 
 import pandas as pd
 from kats.consts import TimeSeriesData
 from kats.models.prophet import ProphetModel, ProphetParams
 
 # Read data; header=0 replaces the file's own header row with our column names
 air_passengers_df = pd.read_csv("AirPassengers.csv", header=0, names=["time", "passengers"])
 
 # Convert to TimeSeriesData object
 air_passengers_ts = TimeSeriesData(air_passengers_df)
 
 # Create a model param instance
 params = ProphetParams(seasonality_mode='multiplicative')
 
 # Create a prophet model instance
 m = ProphetModel(air_passengers_ts, params)
 
 # Fit model simply by calling m.fit()
 m.fit()
 
 # Make prediction for next 30 months
 forecast = m.predict(steps=30, freq="MS")
 forecast.head()
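
The detection feature described above can be illustrated with a very simple change point search: try every split of the series and keep the one where two constant-mean segments fit best. The helper `find_mean_shift` is hypothetical and is not Kats's algorithm; it is only a sketch of the general idea, assuming a single shift in the mean.

```python
from statistics import mean


def find_mean_shift(values, min_size=2):
    """Return the index that best splits `values` into two constant-mean segments.

    Hypothetical helper for illustration only; not how Kats detects change points.
    """
    best_index, best_cost = None, float("inf")
    for i in range(min_size, len(values) - min_size + 1):
        left, right = values[:i], values[i:]
        left_mean, right_mean = mean(left), mean(right)
        # Total squared error when each segment is described by its own mean
        cost = (sum((v - left_mean) ** 2 for v in left)
                + sum((v - right_mean) ** 2 for v in right))
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index


values = [10, 11, 9, 10, 10, 20, 21, 19, 20, 20]
change_index = find_mean_shift(values)
print(change_index)  # index where the second segment starts
```

Real detectors add statistical tests and handle multiple change points, but the underlying question is the same: where does the behavior of the series change?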

6、Sktime

sktime is a time series analysis library built on top of scikit-learn. It follows a similar API, which makes it easy to move between the two libraries. Here is an example of how to use sktime for time series classification:

 from sklearn.metrics import classification_report
 from sklearn.model_selection import train_test_split
 from sktime.datasets import load_arrow_head
 # In older sktime releases this class was imported from sktime.classification.compose
 from sktime.classification.interval_based import TimeSeriesForestClassifier
 
 # Load ArrowHead dataset
 X, y = load_arrow_head(return_X_y=True)
 
 # Split data into train and test sets
 X_train, X_test, y_train, y_test = train_test_split(X, y)
 
 # Create and fit a time series forest classifier
 classifier = TimeSeriesForestClassifier(n_estimators=100)
 classifier.fit(X_train, y_train)
 
 # Predict labels for the test set
 y_pred = classifier.predict(X_test)
 
 # Print classification report
 print(classification_report(y_test, y_pred))

7、GreyKite

Greykite is a time series forecasting library released by LinkedIn. The library can handle complex time series data and provides a range of capabilities including automated feature engineering, exploratory data analysis, predictive pipelines, and model tuning.

 from greykite.common.data_loader import DataLoader
 from greykite.framework.templates.autogen.forecast_config import ForecastConfig
 from greykite.framework.templates.autogen.forecast_config import MetadataParam
 from greykite.framework.templates.forecaster import Forecaster
 from greykite.framework.templates.model_templates import ModelTemplateEnum
 
 # Defines inputs
 df = DataLoader().load_bikesharing().tail(24*90)  # Input time series (pandas.DataFrame)
 config = ForecastConfig(
     metadata_param=MetadataParam(time_col="ts", value_col="count"),  # Column names in `df`
     model_template=ModelTemplateEnum.AUTO.name,  # AUTO model configuration
     forecast_horizon=24,   # Forecasts 24 steps ahead
     coverage=0.95,         # 95% prediction intervals
 )
 
 # Creates forecasts
 forecaster = Forecaster()
 result = forecaster.run_forecast_config(df=df, config=config)
 
 # Accesses results
 result.forecast     # Forecast with metrics, diagnostics
 result.backtest     # Backtest with metrics, diagnostics
 result.grid_search  # Time series CV result
 result.model        # Trained model
 result.timeseries   # Processed time series with plotting functions

Summary

These time series libraries serve two main purposes: generating features and integrating multiple time series forecasting models. Whether the data is univariate or multivariate, they can meet most needs, but which one to use depends on your specific requirements and usage habits.

https://avoid.overfit.cn/post/45451d119a154aeba72bf8dd3eaa9496

Author: Joanna
