See how TensorFlow, the #1 deep learning framework, performs time series prediction!

Abstract: In 2017, TensorFlow held the top spot in deep learning framework attention rankings by a wide margin. This article introduces the application of TensorFlow to time series forecasting through a small example.


TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the edges represent the multi-dimensional data arrays (tensors) that flow between them. Its flexible architecture lets you deploy computation across a variety of platforms: one or more CPUs or GPUs in desktops, servers, mobile devices, and more. TensorFlow was originally developed by researchers and engineers in the Google Brain team (part of Google's Machine Intelligence research organization) for machine learning and deep neural network research, but the generality of the system makes it applicable to a wide range of other fields as well.

Time series analysis has important applications in econometrics and financial analysis, but it can also be applied to understanding trends, making decisions, and reacting to changes in behavioral patterns. For example, a MapR Converged Data Platform customer, a major oil and gas supplier, places sensors on wells, sends the data to MapR Streams, and then uses it to trend well conditions such as volume and temperature. In finance, time series analysis is used to forecast stock prices, asset prices, and commodity prices. Econometricians have long used the autoregressive integrated moving average (ARIMA) model for univariate forecasting.

The ARIMA model has been used for decades and is well understood. However, with the rise of machine learning and, more recently, deep learning, other approaches are being explored and put to use.

Deep learning (DL) is a branch of machine learning based on a set of algorithms that attempts to model high-level abstractions in data by using an artificial neural network (ANN) architecture composed of multiple nonlinear transformations. One of the more popular DL neural networks is the recurrent neural network (RNN). RNNs are a class of neural networks that depend on the sequential nature of their inputs. Such inputs can be text, speech, or time series, where the occurrence of an element in the sequence depends on the elements that came before it. For example, if someone writes "grocery", the next word in the sentence is far more likely to be "store" than "school". Given that sequence, an RNN would likewise predict "store" rather than "school".

Artificial Neural Networks

In fact, it turns out that while neural networks can be intimidating structures, the mechanism that makes them work is surprisingly simple: stochastic gradient descent. For each parameter in our network (such as a weight or a bias), all we have to do is compute the derivative of the loss with respect to that parameter and nudge it a little in the opposite direction.

ANNs use a method called backpropagation to adjust and optimize their results (at its heart, backpropagation is just the chain rule applied repeatedly, which is the part most people find tangled). Backpropagation is a two-step process: the input is fed into the neural network via forward propagation, multiplied with (initially random) weights and biases, and transformed through an activation function. The depth of your neural network depends on how much your input needs to be transformed. Once forward propagation is complete, the backpropagation step adjusts for the error by computing the partial derivatives of the weights that produced it. Once the weights are adjusted, the model repeats the forward and backward steps to minimize the error rate until convergence. In the image below you can see an ANN with only one hidden layer, so backpropagation does not need to perform multiple gradient descent calculations.

[Figure: an artificial neural network with a single hidden layer]
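
To make these mechanics concrete, here is a minimal NumPy sketch (not the article's code) of the full cycle for a one-hidden-layer network: forward propagation, backpropagation via the chain rule, and the small gradient descent nudge applied to every weight and bias.

```python
# A minimal sketch of forward propagation, backpropagation, and gradient
# descent for a one-hidden-layer network (illustrative only).
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(4, 3)    # 4 samples, 3 features
y = rng.rand(4, 1)    # regression targets

W1, b1 = rng.randn(3, 5) * 0.1, np.zeros(5)   # input -> hidden (5 units)
W2, b2 = rng.randn(5, 1) * 0.1, np.zeros(1)   # hidden -> output
lr = 0.1                                      # learning rate

for step in range(1000):
    # Forward propagation: multiply by weights, add biases, apply activation.
    h = np.tanh(X.dot(W1) + b1)
    y_hat = h.dot(W2) + b2
    loss = np.mean((y_hat - y) ** 2)          # mean squared error

    # Backpropagation: the chain rule gives the partial derivative of the
    # loss with respect to every weight and bias.
    d_y_hat = 2 * (y_hat - y) / len(X)
    dW2, db2 = h.T.dot(d_y_hat), d_y_hat.sum(axis=0)
    d_h = d_y_hat.dot(W2.T) * (1 - h ** 2)    # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T.dot(d_h), d_h.sum(axis=0)

    # Gradient descent: nudge each parameter a little against its gradient.
    W1, b1 = W1 - lr * dW1, b1 - lr * db1
    W2, b2 = W2 - lr * dW2, b2 - lr * db2

print("final MSE:", loss)
```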

Recurrent Neural Networks

Recurrent neural networks (RNNs) are called recurrent because they perform the same computation on every element of the input sequence. RNNs are becoming very popular because of their wide range of uses. They can analyze time series data, such as stock prices, and produce forecasts. In autonomous driving systems, they can predict car trajectories and help avoid accidents. They can take sentences, documents, or audio samples as input, and they are also applied in natural language processing (NLP) systems such as machine translation, speech-to-text, and sentiment analysis.

[Figure: a recurrent neural network unrolled over time steps]

The image above shows an example RNN architecture. x_t is the input at time step t; for example, x_1 might be the first price of a stock in time period 1. s_t is the hidden state at time step t, computed from the previous hidden state and the input at the current step using an activation function; s_{t-1} is normally initialized to zero. o_t is the output at step t; for example, if we wanted to predict the next value in the sequence, it would be a vector of probabilities across our time series.

The hidden layer of an RNN carries the hidden state, or memory, of previous inputs, capturing what has been seen so far. The value of the hidden state at any point in time is a function of the hidden state at the previous time step and the input at the current time step. RNNs have a different structure than ANNs and use backpropagation through time (BPTT) to compute the gradients after each iteration.
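
In symbols, the recurrence described above is commonly written as (standard RNN notation, not from the original article):

s_t = f(U·x_t + W·s_{t-1}),    o_t = g(V·s_t)

where f is the activation function (for example, tanh or ReLU), g is an output transformation such as softmax, and U, W, and V are the weight matrices learned during training.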

A small example:

This example was done on a small 3-node MapR cluster. It uses the following:

Python 3.5
TensorFlow 1.0.1
Red Hat 6.9
If you use Anaconda, make sure you can install TensorFlow 1.0.1 on your local machine: this code will not work on TensorFlow versions below 1.0. As long as the TensorFlow version matches, you can develop and run on a local machine and then transfer the code to the cluster. Other deep learning libraries to consider include MXNet, Caffe2, Torch, and Theano. Keras is another deep learning library that provides a Python wrapper over TensorFlow or Theano.
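
A quick way to verify that your environment meets this requirement (a trivial check, not part of the original article):

```python
# Sanity check that the installed TensorFlow is at least 1.0, since the
# code in this article relies on APIs introduced in TF 1.0.
import tensorflow as tf
print(tf.__version__)   # should print 1.0.1 (or another >= 1.0 release)
```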


MapR offers integration with the ever-popular Jupyter Notebook (Zeppelin is another option). What we will show here is the tail end of the data pipeline. The real value of running RNN time series models in a distributed environment lies in the data pipelines you can build to push aggregated series data into a format that can be fed into a TensorFlow computational graph.

If I'm aggregating network flows from multiple devices (IDS, syslog, etc.) and I want to predict future network traffic patterns, I could use MapR Streams to build a real-time data pipeline that aggregates this data into a queue feeding my TensorFlow model. For this example I'm using only one node of the cluster, but I could install TensorFlow on the other two nodes and run three TF models with different hyperparameters.

For this example, I generated some dummy data.

[Code screenshot: generating the dummy time series]

[Plot: the generated dummy time series]
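
The exact generation code appears only as an image above; a sketch of one way to produce a comparable dummy series (the random-walk form and the seed are assumptions, but the 209-observation length comes from the article) might look like this:

```python
# A sketch of generating dummy time series data: a noisy random walk
# of 209 observations (the walk form and seed are assumptions).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(1)
ts = pd.Series(np.random.normal(0, 1, 209).cumsum() + 50)

ts.plot(figsize=(10, 4), title="Dummy time series")
plt.show()
```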

We have 209 observations in our data. I want to make sure that I have the same number of observations for each batch of inputs.

What we see is that our training dataset consists of 10 batches of 20 observations each. Each observation is a sequence of a single value.

[Code screenshot: slicing the series into training batches and a test set]
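
Since the original slicing code is also an image, here is a sketch under the layout just described: 10 training batches of 20 single-value observations, targets shifted one step ahead, and the final 20 periods held out for testing (variable names such as x_batches and X_test are mine):

```python
# A sketch of slicing the 209-point series into the layout described above.
num_periods = 20          # observations per batch
f_horizon = 1             # forecast one period ahead

TS = ts.values
train_len = len(TS) - (len(TS) % num_periods)             # 200 of 209 points

x_batches = TS[:train_len].reshape(-1, num_periods, 1)    # shape (10, 20, 1)
y_batches = TS[1:train_len + f_horizon].reshape(-1, num_periods, 1)

# Test inputs are the 20 periods before the last point; test targets are
# the final 20 periods of the series.
X_test = TS[-(num_periods + f_horizon):-f_horizon].reshape(1, num_periods, 1)
Y_test = TS[-num_periods:].reshape(1, num_periods, 1)
```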

Now that we have our data, let's create a TensorFlow graph that will perform the computation.

[Code screenshot: building the TensorFlow graph]

There's a lot going on here. For example, we specify the number of periods we use to forecast, and we define our placeholder variables. We initialize our RNN cell (size 100) and choose the type of activation function we want. ReLU stands for "rectified linear unit" and is the default activation function here, but it can be swapped for sigmoid, hyperbolic tangent (tanh), and so on if desired.

We want our output to be in the same format as our input so we can compare our results using a loss function. Here we use mean squared error (MSE), since this is a regression problem and our goal is to minimize the difference between actual and predicted values. If we were dealing with a classification problem, we might use cross-entropy. With the loss function defined, we can define the training operation in TensorFlow that will optimize our network of inputs and outputs. To perform the optimization we use the Adam optimizer, a good general-purpose optimizer that implements gradient descent via backpropagation.
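
Because the graph code was shown only as a screenshot, here is a sketch of a TF 1.x graph matching the description above. The hidden size (100), ReLU activation, MSE loss, and Adam optimizer come from the text; the 0.001 learning rate is an assumption.

```python
# A sketch (not the article's exact code) of the TensorFlow graph described
# above: a basic RNN cell of 100 units with ReLU, MSE loss, Adam optimizer.
import tensorflow as tf

num_periods = 20   # observations per batch, matching the training layout
inputs = 1         # each observation is a single value
hidden = 100       # size of the RNN cell
output = 1

tf.reset_default_graph()

# Placeholders for input sequences and their one-step-ahead targets.
X = tf.placeholder(tf.float32, [None, num_periods, inputs])
y = tf.placeholder(tf.float32, [None, num_periods, output])

# A basic RNN cell with ReLU activation, unrolled over the time steps.
cell = tf.contrib.rnn.BasicRNNCell(num_units=hidden, activation=tf.nn.relu)
rnn_output, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)

# Project the hidden states down to one output per time step so that the
# output has the same shape as the input.
stacked_rnn_output = tf.reshape(rnn_output, [-1, hidden])
stacked_outputs = tf.layers.dense(stacked_rnn_output, output)
outputs = tf.reshape(stacked_outputs, [-1, num_periods, output])

# Mean squared error loss, minimized with the Adam optimizer.
loss = tf.reduce_mean(tf.square(outputs - y))
training_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)

init = tf.global_variables_initializer()
```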

Now it's time to run this model on our training data.

[Code screenshot: the training session and its MSE output]

We specify the number of iterations (epochs) that our batch training sequence will loop over. Then we create our graph object (tf.Session()) and initialize the data to be fed into the model as we traverse the epochs. The abbreviated output shows the MSE after every 100 epochs. As the model feeds the data forward and runs backpropagation, it adjusts the weights applied to the inputs and runs another training epoch, and our MSE keeps improving (decreasing). Finally, once training is complete, the model takes the learned parameters and applies them to the test data to produce the predicted output, Y.
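
As a sketch of that session (reusing the placeholder and batch names from the earlier sketches; the 1,000-epoch count is an assumption consistent with printing the MSE every 100 epochs):

```python
# A sketch of the training loop described above (epoch count assumed).
epochs = 1000

with tf.Session() as sess:
    init.run()
    for ep in range(epochs):
        sess.run(training_op, feed_dict={X: x_batches, y: y_batches})
        if ep % 100 == 0:
            mse = loss.eval(feed_dict={X: x_batches, y: y_batches})
            print(ep, "\tMSE:", mse)
    # Once training is done, apply the learned weights to the test data.
    y_pred = sess.run(outputs, feed_dict={X: X_test})
```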

Let's see how far our predictions are from reality. For our test data, we focus on the final 20 periods of the full 209 observations.

[Code screenshot: plotting forecast vs. actual]

[Plot: predicted vs. actual values for the final 20 periods]
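
A sketch of how such a comparison plot could be produced from the predictions (again assuming the variable names from the earlier sketches):

```python
# A sketch of plotting forecast against actual for the final 20 periods.
import matplotlib.pyplot as plt

plt.title("Forecast vs Actual")
plt.plot(Y_test.ravel(), "b-", marker="o", label="Actual")
plt.plot(y_pred.ravel(), "r-", marker=".", label="Forecast")
plt.legend(loc="upper left")
plt.xlabel("Time Periods")
plt.show()
```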

It looks like there is some room for improvement here. This could be done by changing the number of hidden neurons or increasing the number of epochs. Optimizing our model is a process of trial and error, but we're off to a good start. This is random data, so we shouldn't expect great results, but applying this model to a real time series might put some competitive pressure on the ARIMA model.

With the advent of RNNs (and deep learning in general), data scientists have more options for solving more interesting problems. A question many data scientists face is: once we have optimized a model, how do we automate its runs? Having a platform like MapR enables this, since you can build, train, test, and optimize your models in a big data environment. In this example we used only 10 training batches. If my data allowed me to use hundreds of batches, not just 10, I believe I could definitely improve this model. Once I did, I could package it into an automation script to run on a single node, a GPU node, or in a Docker container. This is the power of doing data science and deep learning on a converged data platform.

Hope the above article can help you understand TensorFlow.

More reading:

Read the blog "TensorFlow on MapR Tutorial: A Perfect Place to Start"
Read the blog "Deep Learning: What Are My Options?"
Read the blog "Scalable Machine Learning on the MapR Fusion Data Platform via SparkR and H2O"

This article was recommended by @爱可可-爱生活, a teacher at Beiyou (Beijing University of Posts and Telecommunications), and was translated and organized by the Aliyun Yunqi Community.

The original article was titled "Applying Deep Learning to Time Series Forecasting with TensorFlow", by Justin Brandenburg. Translator: Yuan Hu. Reviewer: Dong Zhaonan. The translation is published for sharing purposes; if you believe it infringes the original author's copyright, please contact the community at [email protected].
