Hertz Stock Quantitative Trading Software: Neural Network Experiments (Part 5): Normalizing the Input Parameters Passed to the Neural Network

Overview

After some reflection on the results of previous experiments, I started thinking about how to improve the training effectiveness and profitability of the intelligent system we had previously developed.

Today I will emphasize the importance of signals, i.e. the data passed to a neural network for analysis and prediction of future outcomes. This is probably the most important component of a neural network, and I want to convey to my readers why understanding signals matters, so that you can avoid frustrating conclusions such as "I used the most advanced library and it didn't work". In the previous article we used indicator values in some interesting ways. Hertz Quantitative Trading Software will now attempt to normalize those indicator values before passing them to the network.

As usual, I'll try to explain everything in detail without overcomplicating it. I think everyone can figure it out.

The importance of normalizing input before passing it to neural networks

Normalization of the input is an important step in the data preparation phase for training a neural network. This process brings the input data into a certain range, which helps improve the stability and speed of training convergence.

In this article we will examine why normalization is an important step in neural network training and which normalization methods can be used.

What is input normalization?

Normalization of the input involves transforming the input data so that it falls within a certain range of values. The two main normalization methods are by mean and standard deviation (Z-normalization) and by minimum and maximum values (min-max normalization).

Z-normalization uses the mean and standard deviation to center and scale the data: subtract the mean from each value and divide by the standard deviation. Min-max normalization uses the minimum and maximum values to scale the data into a given range.
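Written as formulas, with mean and std_dev denoting the average and standard deviation of the data set, and x_min and x_max its smallest and largest values:

x_z = (x - mean) / std_dev

x_norm = (x - x_min) / (x_max - x_min)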

Why is normalization of input important?

Normalization of the input is important to improve the stability and rate of training convergence. If the input data is not normalized, some parameters may have a large range of values, which may cause problems with neural network training. For example, gradients can become too large or too small, leading to optimization problems and poor prediction accuracy.

Normalization also speeds up the training process, since it can improve the convergence of the optimization algorithm. Properly normalized data also helps avoid overfitting problems that can occur when the input data lacks representativeness.

What normalization methods can be used?

The normalization method may vary depending on the data type and the problem we are trying to solve. For example, the most common normalization methods for images are Z-normalization and min-max normalization. However, for other types of data, such as audio signals or text data, other normalization methods may be more effective.

For example, for audio signals, maximum-amplitude normalization is often used, where all signal values are scaled into the range -1 to 1. For text data, it can be useful to normalize by the number of words or characters in a sentence.
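As a sketch of the audio case, maximum-amplitude normalization divides every sample by the largest absolute value in the signal (a minimal C++ illustration; the function name normalize_amplitude is mine, not from any audio library):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Scale all samples so the largest absolute amplitude becomes 1,
// mapping the whole signal into the range [-1, 1].
std::vector<double> normalize_amplitude(const std::vector<double>& samples)
{
    double peak = 0.0;
    for (double s : samples)
        peak = std::max(peak, std::fabs(s));

    std::vector<double> out;
    out.reserve(samples.size());
    for (double s : samples)
        out.push_back(peak > 0.0 ? s / peak : 0.0);  // guard against an all-zero signal
    return out;
}
```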

Furthermore, in some cases it is useful to normalize not only the input data but also the target variables. For example, in a regression problem, if the target variable has a large numerical range, it may be useful to normalize the target variable to improve training stability and prediction accuracy.
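A minimal C++ sketch of this idea, using a hypothetical TargetScaler helper of my own: it z-normalizes the targets before training and maps predictions back to the original scale afterwards.

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Z-normalize regression targets, remembering the mean and standard
// deviation so model predictions can be mapped back to the original scale.
struct TargetScaler
{
    double mean = 0.0;
    double sd   = 1.0;

    void fit(const std::vector<double>& y)
    {
        mean = std::accumulate(y.begin(), y.end(), 0.0) / y.size();
        double var = 0.0;
        for (double v : y)
            var += (v - mean) * (v - mean);
        sd = std::sqrt(var / y.size());
    }

    // to the normalized training scale
    double transform(double y) const { return (y - mean) / sd; }
    // back to the original units
    double inverse(double y_norm) const { return y_norm * sd + mean; }
};
```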

Normalization of input is an important step in the data preparation phase for training a neural network. This process allows us to bring the data input into a certain range, which helps improve the stability and speed of training convergence. Depending on the data type and the problem we are trying to solve, different normalization methods can be used. Furthermore, in some cases it is useful to normalize not only the input data but also the target variables.

Normalization methods

Minimum-Maximum Normalization

In machine learning, normalization is an important data preprocessing step that improves stability and the rate of training convergence. One of the most common normalization methods is min-max normalization, which fits data values into the range of 0 to 1. In this article, we will look at how to apply min-max normalization to time series.

A time series is a sequence of values measured at different points in time. Examples of time series include stock prices, temperature readings, or the number of items sold. Time series can be used to predict future values, analyze trends and patterns, or detect anomalies.

Time series may have different numerical ranges and may vary unevenly over time. For example, stock prices can range widely and fluctuate based on seasonality, news, and other factors. In order to effectively analyze and predict time series, it is necessary to bring the numbers into a certain range.

The min-max normalization method scales values using the minimum and maximum of the data set, mapping them into the range of 0 to 1. The min-max normalization formula is as follows:

x_norm = (x - x_min) / (x_max - x_min)

where x is the data value, x_min is the minimum value in the entire data set, x_max is the maximum value in the entire data set, and x_norm is the normalized value.

Applying the min-max normalization method to a time series helps bring the data into a standard range of values and simplifies analysis. For example, if we have temperature data ranging from -30 to +30 degrees, we can apply min-max normalization to bring the values into the range of 0 to 1. This allows us to compare values over different time periods and identify trends and anomalies.
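For instance, a reading of +10 degrees in that series maps to:

x_norm = (10 - (-30)) / (30 - (-30)) = 40 / 60 ≈ 0.67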

However, when applying min-max normalization to a time series, it is necessary to consider the characteristics of the method and its impact on the data. First, min-max normalization can lose information about the distribution of values within the range. For example, if the data set contains outliers or extreme values, they will map to exactly 0 or 1 and compress all remaining values into a narrow band, distorting the analysis. In such cases, other normalization methods, such as Z-normalization, may be preferable.

Secondly, the dynamics of the data need to be considered. If the data does not change uniformly over time, global normalization may distort temporal patterns and anomalies. In this case, we can apply local normalization, where the minimum and maximum values are determined separately for each group of data within a specific time interval.

Third, the impact of the sample on the analysis results needs to be considered. If the sample of observations is unbalanced or contains peaks, normalization may lead to erroneous conclusions. In this case, alternative preprocessing methods can be used, such as peak removal or data smoothing.

To sum up, min-max normalization is one of the most common normalization methods in machine learning and can be effectively applied to time series to bring values into a standard range. However, when applying this method, it is necessary to consider the characteristics of the data and, where needed, apply other processing methods to avoid distortions in time-series analysis and forecasting.

Example:

int OnInit()
  {
// declare and initialize the array
   double data_array[] = {1.2, 2.3, 3.4, 4.5, 5.6};

// ArrayMinimum()/ArrayMaximum() return the INDEX of the extreme element,
// so read the actual minimum and maximum values from the array
   double min_value = data_array[ArrayMinimum(data_array)];
   double max_value = data_array[ArrayMaximum(data_array)];

// create a dynamic array to store the normalization result
   double norm_array[];
   ArrayResize(norm_array, ArraySize(data_array));

// normalize the array
   for(int i = 0; i < ArraySize(data_array); i++)
      norm_array[i] = (data_array[i] - min_value) / (max_value - min_value);

// display the result
   for(int i = 0; i < ArraySize(data_array); i++)
     {
      Print("Source array: ", data_array[i]);
      Print("Min-Max normalization result: ", norm_array[i]);
     }

   return(INIT_SUCCEEDED);
  }

This code creates a data_array array containing five floating-point numbers. Since ArrayMinimum() and ArrayMaximum() return the indices of the smallest and largest elements, the code reads min_value and max_value from the array at those indices. It then creates a dynamic array named norm_array, fills each element with (data_array[i] - min_value) / (max_value - min_value), and finally calls the Print() function to display the results.

result (values rounded for readability):

Source array: 1.2    Min-Max normalization result: 0.00
Source array: 2.3    Min-Max normalization result: 0.25
Source array: 3.4    Min-Max normalization result: 0.50
Source array: 4.5    Min-Max normalization result: 0.75
Source array: 5.6    Min-Max normalization result: 1.00

Z-normalization

Time series are an important tool for data analysis, especially in fields such as economics, finance, meteorology, and materials science. One of the main time series preprocessing methods is Z-normalization, which helps improve the quality of data analysis.

Z-normalization is a method of centering and scaling a time series. It transforms the series so that its mean equals zero and its standard deviation equals one. This is useful for comparing time series and removing the effects of seasonality and trends.

The Z-normalization process of time series includes the following steps:

  1. Calculate the average of a time series.
  2. Calculate the standard deviation of a time series.
  3. For each element of the time series, calculate the difference between its value and the time series mean.
  4. Divide each difference by the standard deviation.

The resulting values ​​have a mean of 0 and a standard deviation of 1.
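The four steps above can be sketched in C++ as follows (a minimal illustration, not tied to any particular library):

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Z-normalization: center the series on its mean, then scale by its
// standard deviation, so the result has mean 0 and standard deviation 1.
std::vector<double> z_normalize(const std::vector<double>& x)
{
    // steps 1-2: mean and standard deviation of the series
    double mean = std::accumulate(x.begin(), x.end(), 0.0) / x.size();
    double var = 0.0;
    for (double v : x)
        var += (v - mean) * (v - mean);
    double sd = std::sqrt(var / x.size());

    // steps 3-4: subtract the mean, divide by the standard deviation
    std::vector<double> out;
    out.reserve(x.size());
    for (double v : x)
        out.push_back((v - mean) / sd);
    return out;
}
```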

Benefits of Z-normalization:

  1. Improve the quality of data analysis. Z-normalization helps remove the effects of seasonality and trends, thereby improving the quality of data analysis.
  2. Easy to use. Z-normalization is easy to use and can be applied to different types of time series.
  3. Can be used to compare time series. Z-normalization allows time series to be compared with each other because it removes the effects of different scales and units of measurement.

However, Z-normalization also has some limitations:

  1. It is not suitable for time series containing extreme values. If the time series contains extreme values, Z-normalization may skew the results.
  2. It does not work for non-stationary time series. If the time series is non-stationary (i.e. has a trend or seasonality), Z-normalization can remove these characteristics, which may lead to incorrect data analysis.
  3. Normal distribution is not guaranteed. Z-normalization can help normalize the distribution of a time series, but it does not guarantee that the distribution is completely normal.

Despite these limitations, Z-normalization is an important time series preprocessing technique that can help improve the quality of data analysis. It can be used in various fields, including economics, finance, meteorology, and materials science.

For example, in economics and finance, Z-normalization can be used to compare the performance of different assets or portfolios and analyze risk and volatility.

In meteorology, Z-normalization helps remove seasonality and trends from the analysis of weather data such as temperature or precipitation.

In materials science, Z-normalization can be used to analyze time series of material properties, such as thermal expansion or magnetism.

To sum up, Z-normalization is an important time series preprocessing technique that helps improve the quality of data analysis in various fields. Despite its limitations, it is easy to use and can be applied to different types of time series.

Origin blog.csdn.net/herzqthz/article/details/133683845