2023 Certification Cup Little America Competition (Question A): Sunspot Prediction | Modeling analysis: Senior Lulu walks the team through the ideas and code for the whole paper

I am Senior Lulu, studying at Shanghai Jiao Tong University. So far, I have helped 200+ people complete modeling and idea building~
Let's take a look at Question A of the Certification Cup!
The complete content can be obtained at the end of the article!

Question restatement

Question A (MCM): Sunspot prediction

Sunspots are phenomena on the Sun's photosphere that appear as spots temporarily darker than the surrounding area. They are regions of reduced surface temperature caused by concentrations of magnetic flux that inhibit convection. Sunspots usually occur in active regions, often in pairs of opposite magnetic polarity. Their number varies with the roughly 11-year solar cycle. An individual sunspot or group of sunspots may last from a few days to a few months before eventually decaying. Sunspots expand and contract as they move across the Sun's surface, with diameters ranging from 16 kilometers (10 miles) [1] to 160,000 kilometers (100,000 miles). Some larger sunspots can even be seen from Earth without a telescope [2]. When they first appear, they may travel at relative speeds, or proper motions, of a few hundred meters per second.

Solar cycles typically last about 11 years, with their length ranging from just under 10 to just over 12 years. The highest point of sunspot activity in a cycle is called solar maximum, and the lowest point is called solar minimum. This cycle also affects other solar activities and is related to the changing polarity of the sun's magnetic field.

Sunspot numbers also change over longer periods of time. For example, during the period known as the modern maximum, from 1900 to 1958, sunspot counts showed an upward trend; during the following 60 years, the trend was mainly downward [3]. Overall, the Sun was last as active as it was during the modern maximum more than 8,000 years ago [4].

Because of the correlation of sunspots with other solar activity, they can be used to help predict space weather, the state of the ionosphere, and conditions related to shortwave radio propagation or satellite communications. Although several models based on time series analysis, spectral analysis and neural networks have been used to predict sunspot activity, the results have generally been poor. This may be related to the fact that most predictive models are phenomenological at the data level. While we generally know the length of the solar activity cycle, the cycle is not completely stable, the maximum intensity of activity changes over time, and the timing and duration of the peak are difficult to predict accurately.

We are tasked with predicting sunspots and often need to average the results on a monthly basis. You and your team have been asked to develop sound mathematical models that predict sunspots with as much confidence as possible. Relevant observational data are publicly available at many observatories and space science research institutions, including observations of historical sunspot numbers, sunspot areas, and other indicators that may be relevant.

Specific tasks include:

  1. Please predict the start and end times of the current solar cycle and the next solar cycle;
  2. Please predict the occurrence time and duration of solar maximum in the next solar cycle;
  3. Predict the number and area of sunspots for the current solar cycle and the next solar cycle, and explain the reliability of the model in your paper.
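Since the problem asks for monthly averages, daily observations can be aggregated to monthly means before any modeling. The following is only a minimal sketch with pandas; the column names `date` and `spots` and the synthetic values are assumptions for illustration, not the competition data.

```python
import numpy as np
import pandas as pd

# Hypothetical daily record: 90 days starting 2023-01-01 (synthetic values).
daily = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=90, freq="D"),
    "spots": np.arange(90, dtype=float),
})

# Average by calendar month and label each month by its first day.
# The result uses the 'ds' / 'y' column names that Prophet expects.
monthly = (
    daily.set_index("date")["spots"]
    .resample("MS")
    .mean()
    .rename("y")
    .rename_axis("ds")
    .reset_index()
)
print(monthly)
```

The resulting frame has one row per month, so it can be passed on to the later modeling steps without further renaming.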

Question one

Use Prophet to solve the problem of sunspot periodic prediction.
Prophet is an open source time series forecasting tool developed by Facebook, designed to simplify the forecasting process of time series data. Prophet is particularly suitable for data with seasonal and holiday effects, such as sales data, weather data, etc. Here are some of Prophet’s key features and benefits of using it:

  1. Seasonality and holiday models: Prophet is able to automatically handle strong seasonal and holiday effects, making it particularly suitable for time series data that contain these characteristics. Users can improve model accuracy by adding custom holiday effects.

  2. Handling missing data: Prophet can effectively handle missing values in the data, which is very helpful for common problems in practical applications.

  3. Interpretability: Prophet generates highly interpretable models that provide detailed information on trends, seasonality, and holiday effects. This allows users to understand how the model makes its predictions.

  4. Quick modeling: Prophet is relatively simple to use and does not require users to have in-depth knowledge of complex time series modeling techniques. This allows even users without professional background to get started quickly.

  5. Flexible trend model: Prophet adopts a flexible trend model that can adapt to changes in data in different time periods. Users can choose to add or remove trend components according to actual conditions.

  6. Visualization Tools: Prophet provides a wealth of tools for visualizing forecast results, including charts for trends, seasonal components, and uncertainty bounds.

  7. Open source and flexibility: Prophet is an open source tool and users are free to use and modify it, extending and adjusting it to their needs.

The main goal of Prophet is to enable more people to perform time series analysis and forecasting effectively by providing simple yet powerful time series forecasting tools. It is designed to make time series modeling more intuitive and easy to understand, thereby lowering the barrier to entry and enabling a wider range of users to benefit from it.
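One point worth keeping in mind: Prophet's built-in seasonal components are daily, weekly and yearly, so a cycle of roughly 11 years has to be registered explicitly with `add_seasonality`. The sketch below shows one possible way to do that. The component name `solar_cycle`, the fixed 11-year period, the Fourier order of 5 and the helper names are all assumptions made for illustration; because the real cycle length varies between about 10 and 12 years, a fixed period is only an approximation.

```python
# Assumed average cycle length in days (the real length varies between cycles).
SOLAR_CYCLE_DAYS = 11 * 365.25


def build_cycle_model():
    """Return an unfitted Prophet model with a custom ~11-year seasonality."""
    # Imported inside the helper so it can be defined where Prophet is absent.
    from prophet import Prophet

    # Built-in seasonalities are switched off here (an assumption of this sketch).
    model = Prophet(
        yearly_seasonality=False,
        weekly_seasonality=False,
        daily_seasonality=False,
    )
    # period is given in days; fourier_order controls how flexible the shape is.
    model.add_seasonality(
        name="solar_cycle", period=SOLAR_CYCLE_DAYS, fourier_order=5
    )
    return model


def forecast_months(model, df, months):
    """Fit on a monthly 'ds'/'y' frame and forecast `months` steps ahead."""
    model.fit(df)
    future = model.make_future_dataframe(periods=months, freq="MS")
    return model.predict(future)
```

With monthly data, `forecast_months(build_cycle_model(), df, 264)` would extend the forecast by 22 years, enough to cover the remainder of the current cycle and the next one.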

Modeling ideas

  1. Install Prophet: First, make sure you have Prophet installed. You can install via pip using the following command:

    pip install prophet
    
  2. Import the necessary libraries: Import the required libraries and modules in Python:

    import pandas as pd
    # The package installed by "pip install prophet" is imported as "prophet"
    # (older releases used the name "fbprophet").
    from prophet import Prophet
    from prophet.diagnostics import cross_validation
    from prophet.diagnostics import performance_metrics
    from prophet.plot import plot_cross_validation_metric
    import matplotlib.pyplot as plt
    
  3. Prepare data: Prepare historical sunspot activity data in the format required by Prophet, with the columns "ds" (date timestamp) and "y" (target value):

    # Assume the CSV file has two columns: the date and the sunspot count
    df = pd.read_csv("your_data.csv")
    df.columns = ['ds', 'y']
    
  4. Create and fit the Prophet model: Use Prophet to create the model and fit the training data:

    model = Prophet()
    model.fit(df)
    
  5. Generate future time points: Generate future time points for prediction:

    future = model.make_future_dataframe(periods=365)  # 365 future time points (daily frequency by default)
    
  6. Make a prediction: Use the model to make a prediction:

    forecast = model.predict(future)
    
  7. Model performance evaluation: Use cross-validation to evaluate model performance and calculate the root mean square error (RMSE):

    df_cv = cross_validation(model, initial='730 days', period='180 days', horizon='365 days')
    df_p = performance_metrics(df_cv)
    print(df_p['rmse'].values[0])
    

    Here, initial is the length of the initial training window, period is the spacing between successive cutoff dates as the window rolls forward, and horizon is how far into the future each forecast extends.

  8. Visualized results: Use the visualization tools provided by Prophet to view the prediction results:

    fig = model.plot(forecast)
    plt.show()
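The steps above produce a forecast curve, but not the cycle start and end times themselves. One possible way to extract them, sketched below, is to smooth the monthly series and treat the local minima of the smoothed curve as cycle boundaries. The 13-month centred window, the minimum gap of 96 months, the helper name `find_cycle_minima` and the synthetic 11-year sinusoid are all assumptions for illustration; they are not part of Prophet.

```python
import numpy as np
import pandas as pd


def find_cycle_minima(series, window=13, min_gap=96):
    """Return timestamps of local minima of the smoothed monthly series.

    window  -- width of the centred rolling mean in months (assumed value)
    min_gap -- minimum number of months between accepted minima (assumed value)
    """
    smooth = series.rolling(window, center=True).mean().dropna()
    values = smooth.to_numpy()
    minima = []
    last_pos = None
    for i in range(1, len(values) - 1):
        is_local_min = values[i] <= values[i - 1] and values[i] < values[i + 1]
        if is_local_min and (last_pos is None or i - last_pos >= min_gap):
            minima.append(smooth.index[i])
            last_pos = i
    return minima


# Synthetic monthly series with an exact 132-month (11-year) period.
idx = pd.date_range("1990-01-01", periods=400, freq="MS")
months = np.arange(400)
y = pd.Series(50 - 50 * np.cos(2 * np.pi * months / 132), index=idx)

minima = find_cycle_minima(y)
print(minima)
```

In practice the input would be the historical series followed by the forecast `yhat` values; consecutive minima then give the estimated start and end of each cycle.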
    

Question two

When using Prophet to predict the occurrence time and duration of solar maximum, the following modeling ideas can be used:

  1. Data loading and preparation:

    • Load historical sunspot count data into a Pandas data frame, making sure the data contains a date timestamp ('ds') and a target value ('y', sunspot count).
    • Exploratory Data Analysis (EDA): Visualize historical data to understand trends, seasonality, and other characteristics of sunspot counts.
  2. Add holiday effects:

    • Add appropriate holiday effects based on the periodicity of solar maxima. The sunspot cycle is the roughly 11-year Schwabe cycle, and one way to capture this feature is to add custom holidays to the model.
  3. Create and fit the Prophet model:

    • Initialize the Prophet model and set parameters as needed, such as yearly_seasonality, to consider annual seasonality.
    • Fit the model using historical data.
  4. Generate future time points:

    • Use the make_future_dataframe function to generate future time points for prediction. The forecast period should be long enough to include the expected solar maximum.
  5. Make a prediction:

    • Use the fitted model to predict future sunspot counts. Prophet returns a predicted value (yhat) with uncertainty bounds for each time point; the time of solar maximum is then the time at which the predicted count peaks.
  6. Results interpretation and visualization:

    • Analyze the prediction results generated by Prophet, paying attention to the occurrence time and duration of solar maximum.
    • Use visualization tools to visualize forecast results, including historical data, forecast values, and uncertainty bounds.
  7. Model evaluation:

    • Evaluate the model using techniques such as cross-validation to check the accuracy of the solar maximum predictions.
  8. Adjust models and parameters:

    • Based on the results of model evaluation, adjust model parameters and add or modify holiday effects to improve forecast accuracy.
  9. Generate a paper or report:

    • Explain the model's predictions, including estimates of time of occurrence and duration, in a report or paper. Discuss the reliability, strengths, and possible limitations of the model.

The following is an example of Python code based on the above ideas:

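    import pandas as pd
    from prophet import Prophet
    import matplotlib.pyplot as plt

    # 1. Load the data
    df = pd.read_csv("your_data.csv")
    df.columns = ['ds', 'y']

    # 2. Add holiday effects
    # Add custom holiday effects as appropriate for your data.
    # Placeholder: supply a DataFrame with 'holiday' and 'ds' columns here;
    # None means that no holiday effects are used.
    holidays = None

    # 3. Create and fit the Prophet model
    model = Prophet(yearly_seasonality=True, holidays=holidays)  # adjust parameters as needed
    model.fit(df)

    # 4. Generate future time points
    future = model.make_future_dataframe(periods=365)  # 365 future time points (daily by default)

    # 5. Make a prediction
    forecast = model.predict(future)

    # 6. Interpret and visualize the results
    fig = model.plot_components(forecast)  # plot the trend, seasonality and other components
    plt.show()

    # 7. Model evaluation
    # Evaluate the model, for example with cross-validation

    # 8. Adjust the model and parameters
    # Adjust the model and its parameters based on the evaluation results

    # 9. Generate a paper or report
    # Explain the predictions in the report, including the estimated
    # occurrence time and duration of solar maximum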

The above code assumes some characteristics of the data, such as the fact that the data contains ‘ds’ (date timestamp) and ‘y’ (sunspot count).
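The forecast table does not label solar maximum directly, so its time and duration have to be derived from the `yhat` column. A minimal sketch of one way to do this follows: the peak is the time of the largest predicted value, and the duration is the contiguous span around the peak where the prediction stays above a fraction of the peak value. The 0.9 cut-off, the helper name `solar_max_window` and the synthetic bell-shaped forecast are assumptions for illustration; the article does not fix a definition of "duration".

```python
import numpy as np
import pandas as pd


def solar_max_window(forecast, frac=0.9):
    """Return (peak_time, start, end) of the span around the peak where
    yhat stays at or above frac * peak value (frac is an assumed cut-off)."""
    fc = forecast.sort_values("ds").reset_index(drop=True)
    peak_pos = int(fc["yhat"].idxmax())
    threshold = frac * fc.loc[peak_pos, "yhat"]
    above = (fc["yhat"] >= threshold).to_numpy()

    # Walk outwards from the peak while the forecast stays above the threshold.
    start = peak_pos
    while start > 0 and above[start - 1]:
        start -= 1
    end = peak_pos
    while end < len(fc) - 1 and above[end + 1]:
        end += 1
    return fc.loc[peak_pos, "ds"], fc.loc[start, "ds"], fc.loc[end, "ds"]


# Synthetic stand-in for a Prophet forecast: one cycle peaking at month 60.
ds = pd.date_range("2030-01-01", periods=132, freq="MS")
m = np.arange(132)
demo = pd.DataFrame({"ds": ds, "yhat": 100 * np.exp(-((m - 60) / 20.0) ** 2)})

peak, start, end = solar_max_window(demo)
print(peak.date(), start.date(), end.date())
```

Applied to a real forecast, the input should be restricted to the rows belonging to the next cycle so that an earlier, higher peak in the history is not picked up instead.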

Question three

Predicting sunspot number and area can be divided into the following steps, in which a separate Prophet model is fitted for the number and for the area:

1. Data collection and preparation:

  • Collect historical data containing sunspot count and area, making sure the data contains date timestamp ('ds'), sunspot count ('y_count') and sunspot area ('y_area').
  • Exploratory Data Analysis (EDA): Visualize historical data to understand trends, seasonality, and other characteristics of sunspot numbers and areas.

2. Feature engineering:

  • If other relevant features are present (e.g. solar radiation, solar activity index, etc.), consider adding these features to the model to improve prediction performance.

3. Create and fit the Prophet model (number):

  • Initialize the model using Prophet and set yearly_seasonality to True to consider annual seasonality.
  • If desired, add any custom holiday effects, such as special dates related to the solar cycle.
  • Fit the model on the historical sunspot number data.

4. Generate future time points (number):

  • Use the make_future_dataframe function to generate future time points.

5. Make quantity forecasts:

  • Use the fitted quantity model to predict the number of sunspots.

6. Create and fit the Prophet model (area):

  • Repeat steps 3-5, but this time use sunspot area data.

7. Results Interpretation and Visualization:

  • Analyze the prediction results generated by Prophet and focus on the number and area of sunspots during the solar cycle.
  • Use visualization tools to visualize historical data, forecast values, and uncertainty bounds.
  • You can use the plot_components function to view components such as trend and seasonality.

8. Model evaluation:

  • Both models are evaluated using techniques such as cross-validation.
  • Focus on the accuracy of predictions of sunspot number and area.

9. Generate a paper or report:

  • Explain the model's predictions in a report or paper, including estimates of sunspot number and area for the solar cycle.
  • Discuss the reliability, strengths, and possible limitations of the model.

10. Summary:

  • Summarize the prediction results of the two models, noting their performance and reliability in predicting the number and area of sunspots.
  • Emphasize the potential applications and contributions of the models in solar cycle prediction.
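For the evaluation step, the error metrics can also be computed by hand on a held-out part of the series, which makes them easier to explain in the paper. The following is a minimal sketch with NumPy; the function names and the sample values are made up for illustration.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


# Made-up held-out observations and forecasts, for illustration only.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 36.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred))
```

Because RMSE squares the errors, it penalizes a few large misses (for example around the peak of a cycle) more heavily than MAE, so reporting both gives a fuller picture.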

Feasibility of the model

The feasibility of a model refers to the applicability and effectiveness of the model in solving the problem. For the problem of predicting the number and area of sunspots, the Prophet model has several characteristics and advantages that support its feasibility:

  1. Handling Seasonality and Cyclicity: The Prophet model is specifically designed to handle time series data with seasonality and cyclicality. Since sunspot number and area are related to the solar activity cycle, Prophet's built-in seasonality analysis helps better capture these cyclical changes.

  2. Consider special dates and holiday effects: By allowing users to customize holiday effects, the Prophet model can flexibly deal with the impact of special dates on sunspot activity. This is of great significance for the prediction of special events in the solar cycle, such as solar storms.

  3. Handling missing data and outliers: Prophet can handle missing values in the data and is also robust to some outliers. Both are common in real data, as sunspot observations may be affected by the observing equipment or other factors.

  4. Interpretability: Forecasts generated by Prophet include trend, seasonality, and other components that help explain the model’s forecasts. This is an important advantage for solar activity researchers and decision-makers in forecasting applications.

  5. Flexibility: Prophet models can be tuned and optimized relatively easily. Users can adjust parameters and add holiday effects or other special events to improve model performance based on the specific requirements of the problem.
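On the missing-data point, the gaps first have to be marked as missing so that they are not fitted as real counts. The sketch below assumes that missing observations are encoded with the sentinel value -1, as some sunspot archives do; check the documentation of your own data source, since this encoding and the sample values are assumptions.

```python
import pandas as pd

# Made-up monthly records; -1 is assumed to be the archive's missing-value code.
raw = pd.DataFrame({
    "ds": pd.date_range("2020-01-01", periods=5, freq="MS"),
    "y": [12.0, -1.0, 15.0, -1.0, 20.0],
})

# Replace the sentinel with NaN so it is treated as missing, not as a count.
clean = raw.assign(y=raw["y"].where(raw["y"] >= 0))
print(clean)
```

Rows whose `y` is NaN can stay in the frame: Prophet leaves them out when fitting and still produces predictions for those dates.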

    import pandas as pd
    from prophet import Prophet
    from prophet.diagnostics import cross_validation, performance_metrics
    from prophet.plot import plot_cross_validation_metric, plot_components
    import matplotlib.pyplot as plt

    # 1. Load and prepare the data
    df = pd.read_csv("your_data.csv")
    df.columns = ['ds', 'y_count', 'y_area']

    # 2. Create and fit the Prophet model (number)
    # Prophet requires the target column to be named 'y', so rename it first.
    model_count = Prophet(yearly_seasonality=True)
    model_count.fit(df[['ds', 'y_count']].rename(columns={'y_count': 'y'}))

    # 3. Generate future time points (number)
    future_count = model_count.make_future_dataframe(periods=365)  # 365 future time points

    # 4. Make the prediction (number)
    forecast_count = model_count.predict(future_count)

    # 5. Create and fit the Prophet model (area)
    model_area = Prophet(yearly_seasonality=True)
    model_area.fit(df[['ds', 'y_area']].rename(columns={'y_area': 'y'}))

    # 6. Generate future time points (area)
    future_area = model_area.make_future_dataframe(periods=365)  # 365 future time points

    # 7. Make the prediction (area)
    forecast_area = model_area.predict(future_area)

    # 8. Interpret and visualize the results (number)
    fig_count = model_count.plot(forecast_count)
    plt.title('Sunspot Count Forecast')
    plt.show()

    # Interpret and visualize the results (area)
    fig_area = model_area.plot(forecast_area)
    plt.title('Sunspot Area Forecast')
    plt.show()

    # 9. Cross-validation and performance evaluation
    df_cv_count = cross_validation(model_count, initial='730 days', period='180 days', horizon='365 days')
    df_cv_area = cross_validation(model_area, initial='730 days', # see the full version
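The frames returned by `cross_validation` contain the actual value `y` together with `yhat`, `yhat_lower` and `yhat_upper`, which can be used to say something concrete about reliability. One simple measure, sketched below, is the empirical coverage: the share of actual values that fall inside the forecast interval. The helper name `interval_coverage` and the sample rows are made up for illustration.

```python
import pandas as pd


def interval_coverage(df_cv):
    """Fraction of actual values lying inside the forecast uncertainty interval."""
    inside = (df_cv["y"] >= df_cv["yhat_lower"]) & (df_cv["y"] <= df_cv["yhat_upper"])
    return float(inside.mean())


# Made-up rows shaped like the output of Prophet's cross_validation.
demo_cv = pd.DataFrame({
    "y":          [50.0, 80.0, 120.0, 60.0],
    "yhat_lower": [40.0, 85.0, 100.0, 30.0],
    "yhat_upper": [70.0, 110.0, 140.0, 55.0],
})
print(interval_coverage(demo_cv))
```

Prophet's default interval width is 80%, so an empirical coverage far below 0.8 suggests that the stated uncertainty bounds are too narrow for this data.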

For more information, you can click on the business card below, and let Senior Lulu lead you on the road to winning the Certification Cup! Stay tuned to see what our efforts will bring! Remember to follow Senior Lulu!


Origin blog.csdn.net/Tech_deer/article/details/134730274