Machine Learning Based on C# - Deep Belief Networks


We've all heard of deep learning, but how many people know what deep belief networks are? Let's begin answering that question in this chapter. A deep belief network is a very advanced form of machine learning, and one whose meaning is evolving rapidly. As a machine learning developer, it is important to have some familiarity with the concept, so that when you come across it (or it comes across you!) it won't be a stranger.

In machine learning, a deep belief network is technically a deep neural network. We should point out what we mean by depth: when we talk about deep learning or deep belief, we mean that the network is composed of multiple layers (of hidden units). In a deep belief network, there are connections between the layers, but not between the neurons within each layer. A deep belief network can be trained with unsupervised learning to probabilistically reconstruct the inputs to the network. The layers then act as "feature detectors" that can recognize or classify things such as images, letters, and so on.

In this chapter, we will cover the following topics:

Restricted Boltzmann Machine

Creating and training a deep belief network in C#


Restricted Boltzmann Machine

One popular way of constructing a deep belief network is to compose it as a layered collection of Restricted Boltzmann Machines (RBMs). These RBMs function as autoencoders, with each hidden layer serving as the visible layer for the layer above it. The deep belief network trains the RBMs layer by layer in a pre-training phase, and a feed-forward network is then used for the fine-tuning stage. The first step of training is to learn a layer of features from the visible units. The next step is to take the activations of the previously trained features and treat them as a new set of visible units. We then repeat the process so that more features are learned in the second hidden layer, and we continue in this way for all of the hidden layers.

We should point out two things here.

First, we should briefly explain what an autoencoder is. Autoencoders are at the heart of feature learning. They encode the input (typically compressing the input vector down to its important features) and reconstruct the data via unsupervised learning.

Second, we should note that stacking RBMs into a deep belief network is only one way to approach this problem. Stacking Rectified Linear Units (ReLUs) with dropout and training, followed by backpropagation, has once again become state of the art. I say once again because 30 years ago the supervised approach was the way to go. Rather than letting the algorithm look at all the data and decide which features are interesting, sometimes we humans can actually find the features we want better ourselves.

I think the two most important features of deep belief networks are as follows:

- There is an efficient, layer-by-layer procedure for learning the top-down, generative weights. These weights determine how the variables in one layer depend on the variables in the layer above.

- Once learning is complete, the values of the variables in every layer can easily be inferred by a single bottom-up pass that starts from an observed data vector in the bottom layer and uses the generative weights in the reverse direction.

With that said, let's now turn to RBMs and Boltzmann machines in general.

A Boltzmann machine is a type of recurrent neural network with binary units and undirected edges between those units. Undirected means that the edges (or links) are bidirectional; they do not point in any particular direction.

The following is an undirected graph with undirected edges:

Boltzmann machines were among the first neural networks capable of learning internal representations, and given enough time they can solve difficult problems. However, they do not scale well, which brings us to our next topic, RBMs.

RBMs were introduced to deal with the inability of Boltzmann machines to scale. They have hidden layers, with connections between hidden units restricted but not connections outside of those units, and this helps with learning efficiency. More formally, we have to dig into a bit of graph theory to explain this properly.

The neurons of an RBM must form a bipartite graph, a more advanced form of graph: a pair of nodes, one from each of the two groups of units (the visible layer and the hidden layer), may have a symmetric connection between them, but there can be no connections between nodes within a group. A bipartite graph, sometimes called a biograph, is one whose vertices are split into two disjoint sets such that no two vertices within the same set are adjacent.

Here's a good example that will help you visualize this topic.

Note that nodes within the same group are not connected to each other (red on the left, black on the right), but that there are connections between the two groups:

More formally, an RBM is what is known as a symmetric bipartite graph. This is because the inputs from all visible nodes are passed to all of the hidden nodes. We say symmetric because each visible node is connected with a hidden node.
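
To make the symmetric bipartite structure concrete, here is a minimal C# sketch (illustrative only, not SharpRBM code) of how an RBM's connections could be stored: one weight for every visible/hidden pair, and no storage at all for visible-visible or hidden-hidden links, because those connections do not exist. All names here are made up for the sketch.

// Illustrative RBM connection storage.
public class SimpleRbmLayer
{
  public int VisibleCount { get; }
  public int HiddenCount { get; }

  // Weights[v, h] is the symmetric connection between visible unit v and hidden unit h.
  // There is deliberately no structure for connections within a layer.
  public float[,] Weights { get; }

  public SimpleRbmLayer(int visibleCount, int hiddenCount)
  {
    VisibleCount = visibleCount;
    HiddenCount = hiddenCount;
    Weights = new float[visibleCount, hiddenCount];
  }
}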

Let's assume that our RBM is being shown images of cats and dogs, and that we have two output nodes, one for each animal. On the forward learning pass, our RBM asks itself: "Given the pixels I am seeing, should I send a stronger weight signal to the cat node or to the dog node?" On the backward pass, it wonders: "Being a dog, which distribution of pixels should I expect to see?" That, my friends, is today's lesson on joint probability: the simultaneous probability of X given a and of a given X. In our case, this joint probability is expressed as the weights between the two layers, and it is an important aspect of RBMs.

Let's now talk about reconstruction, which is an important part of what RBMs do. In the example we have been discussing, we are learning which groups of pixels occur (that is, are turned on) for a set of images. When a hidden-layer node is activated by a significant weight (however that is decided), it represents the co-occurrence of important things, in our case a dog or a cat. Pointy ears + round face + small eyes might be what we are looking for in a cat. A long tail + big ears + a big nose might make the image a dog. These activations represent what our RBM "thinks" the original data looks like. In effect, we are reconstructing the original data.

We should also quickly point out that an RBM has two biases rather than one. This is very important, because it is what distinguishes it from other autoencoding algorithms. The hidden bias helps the RBM produce the activations we need on the forward pass, while the visible-layer bias helps it learn the correct reconstructions on the backward pass. The hidden bias matters because its main job is to ensure that some nodes fire no matter how sparse our data might be. You will see later how this affects what a deep belief network dreams about.
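
As a rough illustration of the two biases, here is a small, self-contained sketch of a forward pass that uses the hidden bias and a backward (reconstruction) pass that uses the visible bias. The sigmoid squashing of binary units is a standard choice; treat the details as assumptions rather than SharpRBM's actual implementation.

using System;

public static class RbmPasses
{
  static float Sigmoid(float x) => 1f / (1f + (float)Math.Exp(-x));

  // Forward pass: visible -> hidden. The hidden bias is added so that some
  // hidden units can still activate even when the input is very sparse.
  public static float[] HiddenProbabilities(float[] visible, float[,] weights, float[] hiddenBias)
  {
    var hidden = new float[hiddenBias.Length];
    for (int h = 0; h < hidden.Length; h++)
    {
      float sum = hiddenBias[h];
      for (int v = 0; v < visible.Length; v++)
        sum += visible[v] * weights[v, h];
      hidden[h] = Sigmoid(sum);
    }
    return hidden;
  }

  // Backward pass: hidden -> visible. The same weights are reused (symmetry),
  // but this time the visible bias shapes the reconstruction.
  public static float[] Reconstruct(float[] hidden, float[,] weights, float[] visibleBias)
  {
    var visible = new float[visibleBias.Length];
    for (int v = 0; v < visible.Length; v++)
    {
      float sum = visibleBias[v];
      for (int h = 0; h < hidden.Length; h++)
        sum += hidden[h] * weights[v, h];
      visible[v] = Sigmoid(sum);
    }
    return visible;
  }
}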

Layering

Once our RBM understands the structure of the input data as it relates to the activations of the first hidden layer, the data is passed one layer down the network. The first hidden layer then becomes the new visible layer. The activations we created in the hidden layer now become our inputs. They are multiplied by the weights of the new hidden layer to produce another set of activations.

This process continues through all of the hidden layers in our network. The hidden layer becomes the visible layer, we have another hidden layer whose weights we will use, and we repeat. Each new hidden layer results in adjusted weights, until we reach the point where we can recognize the input from the layer before it.
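
The layering idea boils down to a simple loop: the activations produced by one trained layer are fed in as the visible input of the next. The following self-contained sketch shows only that data flow; the Layer type and FeedUpward name are invented for the example and are not SharpRBM code.

using System;
using System.Collections.Generic;

public static class LayerStacking
{
  // One illustrative layer: a visible-by-hidden weight matrix plus a hidden bias.
  public sealed class Layer
  {
    public float[,] Weights;   // [visible, hidden]
    public float[] HiddenBias; // one entry per hidden unit
  }

  static float Sigmoid(float x) => 1f / (1f + (float)Math.Exp(-x));

  // Push an input vector up through every layer in turn. After each layer,
  // its hidden activations become the "visible" input of the layer above it.
  public static float[] FeedUpward(float[] input, IReadOnlyList<Layer> layers)
  {
    float[] current = input;
    foreach (Layer layer in layers)
    {
      var next = new float[layer.HiddenBias.Length];
      for (int h = 0; h < next.Length; h++)
      {
        float sum = layer.HiddenBias[h];
        for (int v = 0; v < current.Length; v++)
          sum += current[v] * layer.Weights[v, h];
        next[h] = Sigmoid(sum);
      }
      current = next; // the hidden layer becomes the new visible layer
    }
    return current;
  }
}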

To explain this in a little more detail, what we have just described is technically called unsupervised, greedy, layer-wise training. No input is required to improve the weights of each layer, which means that no external influence of any kind is involved. This further means that we should be able to use our algorithm to train on unsupervised data it has never seen before.

As we keep emphasizing, the more data we have, the better our results! As each layer gets better and more accurate, we are in a much better position to improve our learning through each hidden layer, and the weights do the job of guiding us toward the correct image classification along the way.

But while we are discussing reconstruction, we should point out that each time a number (weight) involved in the reconstruction is non-zero, it is an indication that our RBM has learned something from the data. In a sense, you can treat the numbers being returned the same way you would a percentage indicator: the higher the number, the more confident the algorithm is in what it is seeing. Remember, we have a master dataset we are trying to get back to, and we have a reference dataset we use for our reconstructions. As our RBM iterates through each image, it does not yet know which image it is dealing with; that is exactly what it is trying to determine.

Let's take a moment to clarify something. When we say that we use a greedy algorithm, what we really mean is that our RBM takes the shortest path to the best result. It samples randomly selected pixels from the images it sees and tests which ones lead it toward the correct answer.

The RBM tests each hypothesis against the master dataset (our test set), which represents our correct end goal. Remember, each image is just a set of pixels that we are trying to classify, and those pixels hold the features and characteristics of the data. For example, one pixel may have a different shade of brightness, where darker pixels might indicate borders, lighter pixels might indicate numbers, and so on.

But what happens when things don't go our way? What happens if what we learned at a given step was not correct? If that happens, it means our algorithm guessed wrong. Our recourse is simply to go back and try again. This is not as bad, or as time-consuming, as it might appear.

Of course there is a time cost for a wrong hypothesis, but the end goal is that we must improve our learning efficiency and reduce the error at each phase. Each incorrectly weighted connection gets penalized, just as it would in reinforcement learning: those connections decrease in weight and are no longer as strong. Hopefully the next pass improves accuracy while decreasing error, and the larger the weight, the larger its influence.

Suppose we are classifying images of digits, digit by digit. Some images will have curves, such as 2, 3, 6, 8, 9, and so on. Other digits, such as 1, 4, and 7, will not. Knowledge like this is very important, because our RBM uses it to keep improving its learning and reducing error. If we think we are dealing with the digit 2, then the weights on the path leading to that conclusion will be heavier than those on the other paths. This is an extreme oversimplification, but hopefully it is enough to help you understand what we are about to embark on.
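
To give a flavour of how wrong connections are weakened and right ones strengthened, here is a heavily simplified, contrastive-divergence-style weight update for a single example. This is a sketch of the general idea only; it is not the update rule SharpRBM uses, and all names are invented.

public static class WeightUpdateExample
{
  // Weights grow where the data and its hidden activations agree (positive phase)
  // and shrink where the reconstruction and its activations agree (negative phase).
  public static void UpdateWeightsCd1(
    float[,] weights,
    float[] data, float[] hiddenFromData,            // positive phase
    float[] reconstruction, float[] hiddenFromRecon, // negative phase
    float learningRate)
  {
    for (int v = 0; v < data.Length; v++)
    {
      for (int h = 0; h < hiddenFromData.Length; h++)
      {
        float positive = data[v] * hiddenFromData[h];
        float negative = reconstruction[v] * hiddenFromRecon[h];
        weights[v, h] += learningRate * (positive - negative);
      }
    }
  }
}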

When we put all of this together, we now have the theoretical framework for a deep belief network. Although we have gone deeper into theory here than in other chapters, as you watch our sample program work it will all start to make sense, and you will be better prepared to use it in an application and to understand what is happening behind the scenes.

To demonstrate deep belief networks and RBMs, we will use SharpRBM, an excellent open-source package written by Mattia Fagerlund. This software is an incredible contribution to the open-source community, and I have no doubt you will spend hours, if not days, working with it. It comes with some incredible demos; in this chapter, we will use the letter-classification demo.

The screenshot below shows our deep belief test application. Have you ever wondered what a computer dreams about when it sleeps?

In the upper-left corner of the program is the area where we designate which layer to train. We have three hidden layers, and all of them need to be trained properly before testing. We can train them one at a time, starting with the first layer. The more training, the better your system:

Below the training options is our progress section. As we train, all of the relevant information, such as the generation, reconstruction error, detector error, and learning rate, is displayed here:

Next is our feature detector drawing area which, if the Draw checkbox is selected, updates itself throughout the training process:

When you start training a layer, you will notice that the feature detectors and reconstructions are basically empty. They continually refine themselves as training progresses. Remember, we are reconstructing something we already know to be true! As training continues, the reconstructed digits become clearer and clearer, and so do our feature detectors:

Here is a snapshot of the application during training. As you can see, it is on generation 31, and the reconstructed digits are becoming quite clear.

They are still not complete or correct, but you can see how much progress we are making:

Computer dreaming?

What does a computer dream about when it sleeps? For us, the intuition is that the dream is a feature that lets us see what the computer is thinking about during the reconstruction phase. As the program tries to reconstruct our digits, the feature detectors themselves take on various forms throughout the process, and it is these forms that we display in the dream window (indicated by the red circle):

We have spent a lot of time looking at screenshots of the application; I think it's time to look at the code. Let's start with how the DeepBeliefNetwork object itself is created:

DeepBeliefNetwork = new DeepBeliefNetwork(28 * 29, 500, 500, 1000);
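
Presumably the arguments are the number of visible units (one per input pixel) followed by the sizes of the three hidden layers; that interpretation is an assumption on my part, but naming the values makes the intent easier to read:

int visibleUnits = 28 * 29;   // input size, taken from the original call
int hiddenLayer1 = 500;       // assumed: first hidden layer size
int hiddenLayer2 = 500;       // assumed: second hidden layer size
int hiddenLayer3 = 1000;      // assumed: third (top) hidden layer size

DeepBeliefNetwork = new DeepBeliefNetwork(visibleUnits, hiddenLayer1, hiddenLayer2, hiddenLayer3);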

Once this has been created, we need to create our network trainer, which we do based on the weights of the layer we are training:

DeepBeliefNetworkTrainer trainer = new DeepBeliefNetworkTrainer(DeepBeliefNetwork, DeepBeliefNetwork?.LayerWeights?[layerId], inputs);
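
Because each trainer is tied to the weights of a single layer, training the whole stack means creating one trainer per layer, from the bottom up. The loop below is only a sketch of that idea: the float[][] input type and the GetActivationsForLayer helper are hypothetical stand-ins for however the demo actually produces the next layer's inputs.

// Hypothetical layer-by-layer training loop (illustrative only).
float[][] layerInputs = rawImagePixels;   // assumed: one float[] per training image

for (int layerId = 0; layerId < 3; layerId++)
{
  DeepBeliefNetworkTrainer layerTrainer = new DeepBeliefNetworkTrainer(
    DeepBeliefNetwork, DeepBeliefNetwork?.LayerWeights?[layerId], layerInputs);

  TrainNetwork(layerTrainer); // the training loop shown below

  // Hypothetical helper: push the inputs through the freshly trained layer so
  // that its activations become the inputs of the next layer up.
  layerInputs = GetActivationsForLayer(DeepBeliefNetwork, layerId, layerInputs);
}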

Both of these objects are used inside our main TrainNetwork loop, which is where most of the activity in the application happens. This loop continues until it is told to stop:

private void TrainNetwork(DeepBeliefNetworkTrainer trainer)
{
  try
  {
    Stopping = false;
    ClearBoxes();
    _unsavedChanges = true;
    int generation = 0;
    SetThreadExecutionState(EXECUTION_STATE.ES_CONTINUOUS | EXECUTION_STATE.ES_SYSTEM_REQUIRED);
    
    while (Stopping == false)
    {
      Stopwatch stopwatch = Stopwatch.StartNew();
      TrainingError error = trainer?.Train();
      label1.Text = string.Format(
        "Gen {0} ({4:0.00} s): ReconstructionError={1:0.00}, DetectorError={2:0.00}, LearningRate={3:0.0000}",
        generation, error.ReconstructionError,
        error.FeatureDetectorError,
        trainer.TrainingWeights.AdjustedLearningRate,
        stopwatch.ElapsedMilliseconds / 1000.0);
      Application.DoEvents();
      ShowReconstructed(trainer);
      ShowFeatureDetectors(trainer);
      Application.DoEvents();
      if (Stopping)
      {
        break;
      }
      generation++;
    }
    DocumentDeepBeliefNetwork();
  }
  finally
  {
    SetThreadExecutionState(EXECUTION_STATE.ES_CONTINUOUS);
  }
}

In the preceding code, we highlighted the trainer.Train() method, which is an array-based learning algorithm that looks like this:

public TrainingError Train()
{
  TrainingError trainingError = null;
  if (_weights != null)
  {
    ClearDetectorErrors(_weights.LowerLayerSize,
    _weights.UpperLayerSize);
    float reconstructionError = 0;
    ParallelFor(MultiThreaded, 0, _testCount,
      testCase =>
      {
        float errorPart = TrainOnSingleCase(_rawTestCases, _weights?.Weights, _detectorError,
          testCase, _weights.LowerLayerSize, _weights.UpperLayerSize, _testCount);
        lock (_locks?[testCase % _weights.LowerLayerSize])
        {
          reconstructionError += errorPart;
        }
      }
    );
    float epsilon = _weights.GetAdjustedAndScaledTrainingRate(_testCount);
    UpdateWeights(_weights.Weights, _weights.LowerLayerSize, _weights.UpperLayerSize,
      _detectorError, epsilon);
    trainingError = new TrainingError(_detectorError.Sum(val => Math.Abs(val)), reconstructionError);
    _weights?.RegisterLastTrainingError(trainingError);
    return trainingError;
  }
  return trainingError;
}
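
A quick note on the ParallelFor call above: it is a helper from the demo code, and conceptually it simply chooses between a plain sequential loop and Parallel.For. A minimal sketch of such a helper, under that assumption, could look like this (the real SharpRBM helper may differ):

using System;
using System.Threading.Tasks;

public static class ParallelHelper
{
  // Run body(i) for every i in [fromInclusive, toExclusive), either on one
  // thread or via Parallel.For, depending on the multiThreaded flag.
  public static void ParallelFor(bool multiThreaded, int fromInclusive, int toExclusive, Action<int> body)
  {
    if (multiThreaded)
    {
      Parallel.For(fromInclusive, toExclusive, body);
    }
    else
    {
      for (int i = fromInclusive; i < toExclusive; i++)
      {
        body(i);
      }
    }
  }
}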

The Train() method uses this parallel processing (the highlighted portion) to train each single case in parallel. It is responsible for handling the changes to the input and hidden layers, as we discussed at the beginning of the chapter, and it relies on the TrainOnSingleCase function, which looks like this:

private float TrainOnSingleCase(float[] rawTestCases, float[] weights, float[] detectorErrors, int testCase,
int lowerCount, int upperCount, int testCaseCount)
{
  float[] model = new float[upperCount];
  float[] reconstructed = new float[lowerCount];
  float[] reconstructedModel = new float[upperCount];
  int rawTestCaseOffset = testCase * lowerCount;
  ActivateLowerToUpperBinary(rawTestCases, lowerCount,
  rawTestCaseOffset, model, upperCount, weights); // Model
  ActivateUpperToLower(reconstructed, lowerCount, model,
    upperCount, weights); // Reconstruction
  ActivateLowerToUpper(reconstructed, lowerCount, 0,
    reconstructedModel, upperCount, weights); // Reconstruction model
  return AccumulateErrors(rawTestCases, lowerCount, rawTestCaseOffset, model, upperCount,
    reconstructed, reconstructedModel, detectorErrors); // Accumulate detector errors
}

Finally, we accumulate errors as we go, which is the difference between what our model believes it should see and what it actually produced.

Obviously, the lower the error rate the better, and the more accurate our image reconstructions will be. The AccumulateErrors function looks like this:

private float AccumulateErrors(float[] rawTestCases, int lowerCount, int rawTestCaseOffset, float[] model,
int upperCount, float[] reconstructed, float[] reconstructedModel, float[] detectorErrors)
{
  float reconstructedError = 0;
  float[] errorRow = new float[upperCount];
  for (int lower = 0; lower < lowerCount; lower++)
  {
    int errorOffset = upperCount * lower;
    for (int upper = 0; upper < upperCount; upper++)
    {
      errorRow[upper] =
        rawTestCases[rawTestCaseOffset + lower] * model[upper]     // what the model should believe
        - reconstructed[lower] * reconstructedModel[upper];        // what the model actually believes
    }
    lock (_locks[lower])
    {
      for (int upper = 0; upper < upperCount; upper++)
      {
        detectorErrors[errorOffset + upper] -= errorRow[upper];
      }
    }
    reconstructedError += Math.Abs(rawTestCases[rawTestCaseOffset + lower] - reconstructed[lower]);
  }
  return reconstructedError;
}
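
The accumulated detector errors, together with the scaled learning rate epsilon, are what Train() then hands to UpdateWeights. That method is not shown in this chapter, so the following is only a guess at its general shape, assuming a plain gradient-style step over the flattened weight array (momentum, weight decay, and so on are left out):

public static class UpdateWeightsSketch
{
  // Hypothetical sketch: subtract each accumulated detector error, scaled by the
  // learning rate, from the corresponding weight. The real SharpRBM code may differ.
  public static void UpdateWeights(float[] weights, int lowerCount, int upperCount,
    float[] detectorErrors, float epsilon)
  {
    for (int lower = 0; lower < lowerCount; lower++)
    {
      int offset = upperCount * lower;
      for (int upper = 0; upper < upperCount; upper++)
      {
        weights[offset + upper] -= epsilon * detectorErrors[offset + upper];
      }
    }
  }
}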

Summary

In this chapter, we learned about RBMs, a bit of graph theory, and how to create and train a deep belief network in C#. I encourage you to experiment with the code, train the network layers to different thresholds, and watch how the computer dreams during reconstruction. Remember, the more you train the better, so spend time on each layer to make sure it has enough data to do an accurate job of reconstruction.

Warning: if you enable drawing of the feature detectors and reconstructed inputs, performance drops dramatically.

If you are training your layers, you may want to train them without the visualizations first to cut down the time required. Trust me, if you train every layer to a high number of iterations, the visualizations will make it feel like it takes an eternity! Save your network as you go.

In the next chapter, we will learn about micro-benchmarking and use one of the most powerful open-source micro-benchmarking toolkits ever created!
