PyTorch-30-Hyperparameter Tuning and Experimentation-Training Deep Neural Networks

Hyperparameter Experimentation Using TensorBoard

  1. Prepare the data
  2. Build the model
  3. Train the model
  4. Analyze the model's results
    a. Hyperparameter experimentation

At this point in this series, we have learned how to build and train a CNN using PyTorch. In the previous episode, we showed how to use TensorBoard in PyTorch and reviewed the training process.

This episode is a continuation of the previous one, so if you haven't seen it yet, go back and check it out for the details needed to understand what we're doing here. Now we will experiment with hyperparameter values.

The best part of TensorBoard is that it has out-of-the-box features that can track our hyperparameters over time and across runs.

This means we can change hyperparameter values and compare the results across runs.

Without TensorBoard, this process would be much more cumbersome. Okay, so how do we do it?

Naming The Training Runs For TensorBoard

In order to take advantage of TensorBoard's comparison features, we need to do multiple runs and name each run in a way that uniquely identifies it.

With PyTorch's SummaryWriter, a run starts when the writer object instance is created and ends when the writer instance is closed or goes out of scope.

To uniquely identify each run, we can either set the run's directory name directly through the log_dir parameter, or pass a comment string to the constructor, which will be appended to the automatically generated name.

At the time of writing, the run name is stored in an attribute of the SummaryWriter called log_dir. It is created like this:

# Excerpt from SummaryWriter.__init__ in PyTorch version 1.1.0 (os is imported at module level)
if not log_dir:
    import socket
    from datetime import datetime
    current_time = datetime.now().strftime('%b%d_%H-%M-%S')
    log_dir = os.path.join(
        'runs', 
        current_time + '_' + socket.gethostname() + comment
    )
self.log_dir = log_dir

Here, we can see that the log_dir attribute is set to runs + time + host + comment, which corresponds to the run's location on disk and its name. Of course, this assumes that no value is passed for the log_dir parameter, so this is the default behavior.
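To see this naming rule in isolation, here is a minimal standalone sketch that reproduces the default logic from the excerpt above outside of any class. The function name default_log_dir is hypothetical, and other PyTorch versions may build the name differently.

```python
import os
import socket
from datetime import datetime


def default_log_dir(comment=''):
    """Rebuild the default run directory name the way the PyTorch 1.1.0 excerpt does.

    Hypothetical helper for illustration; it is not part of PyTorch's API.
    """
    # Month, day, and time, e.g. 'Jan14_10-32-05'
    current_time = datetime.now().strftime('%b%d_%H-%M-%S')
    # runs/<time>_<hostname><comment>; the comment is appended with no separator
    return os.path.join('runs', current_time + '_' + socket.gethostname() + comment)


print(default_log_dir(' batch_size=100 lr=0.01'))
```

Because the comment is appended directly after the hostname, starting it with a space (as the comments in this post do) keeps the run name readable.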

Choosing A Name For The Run

One way to name a run is to add the parameter names and values as the run's comment. This lets us see how each parameter value stacks up against the others when we view the runs inside TensorBoard.

We'll see later that this is how we set the comment:

tb = SummaryWriter(comment=f' batch_size={batch_size} lr={lr}')

TensorBoard also lets us filter runs by name, so we can easily isolate runs with particular parameter values.

For example, consider this SQL query:

SELECT * FROM TBL_RUNS WHERE lr = 0.01

Without the SQL, this is essentially what we can do in TensorBoard.
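TensorBoard's run selector typically accepts a regular expression to filter run names. The sketch below imitates that kind of filtering in plain Python, assuming a regex search over each name; it is not TensorBoard's code, and the run names are made up. It also shows a pitfall of the naming scheme used here: batch_size=100 is a substring of batch_size=1000.

```python
import re

# Hypothetical run names in the format produced by SummaryWriter(comment=...)
run_names = [
    'Jan14_10-00-00_host batch_size=100 lr=0.01',
    'Jan14_10-05-00_host batch_size=100 lr=0.001',
    'Jan14_10-10-00_host batch_size=1000 lr=0.01',
    'Jan14_10-15-00_host batch_size=1000 lr=0.001',
]


def filter_runs(names, pattern):
    """Return the names that contain a match for the regular expression pattern."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


# Roughly "SELECT * FROM TBL_RUNS WHERE lr = 0.01"
print(filter_runs(run_names, r'lr=0\.01\b'))

# A naive pattern also matches the batch_size=1000 runs; \b anchors the end of the number
print(len(filter_runs(run_names, 'batch_size=100')))     # 4
print(len(filter_runs(run_names, r'batch_size=100\b')))  # 2
```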

Creating Variables For Our Hyperparameters

To make experimenting easy, we will extract the hard-coded values and turn them into variables.

This is the hard-coded way:

network = Network()
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=100
)
optimizer = optim.Adam(
    network.parameters(), lr=0.01
)

Note how the batch_size and lr parameter values are hard-coded.

This is what we changed it to (now the values are set using variables):

batch_size = 100
lr = 0.01

network = Network()
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=batch_size
)
optimizer = optim.Adam(
    network.parameters(), lr=lr
)

This lets us change the values in a single place and have them propagate through our code.

Now, we use these variables to build the value of the comment parameter:

tb = SummaryWriter(comment=f' batch_size={batch_size} lr={lr}')

With this setup, we can change hyperparameter values, and our runs will automatically be tracked and identified in TensorBoard.

Calculate Loss With Different Batch Sizes

Since we will now vary the batch size, we need to change the way we calculate and accumulate the loss. Instead of just adding up the losses returned by the loss function, we scale each one by the batch size:

total_loss += loss.item() * batch_size

Why is this? By default, the cross_entropy loss function averages the loss values of the samples in the batch and returns that mean. Multiplying the mean by the batch size recovers the batch's total loss, which keeps total_loss comparable across runs with different batch sizes.

The cross_entropy function also accepts a parameter called reduction, which we could use as well.

The reduction parameter optionally accepts a string that specifies the reduction to apply to the output of the loss function:

  1. 'none' - no reduction is applied.
  2. 'mean' - the sum of the output is divided by the number of elements in the output.
  3. 'sum' - the output is summed.

Note that the default value is 'mean'. This is why loss.item() * batch_size works.
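To see why multiplying the mean by the batch size recovers the batch total, here is a framework-free sketch of softmax cross-entropy with the three reduction modes. It re-implements only the unweighted case in plain Python for illustration; it is not PyTorch's implementation, and the helper names and logits are made up.

```python
import math


def per_sample_cross_entropy(logits, label):
    """-log(softmax(logits)[label]), using the log-sum-exp trick for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def cross_entropy(batch_logits, labels, reduction='mean'):
    """Unweighted cross-entropy with 'none', 'mean', or 'sum' reduction."""
    losses = [per_sample_cross_entropy(z, y) for z, y in zip(batch_logits, labels)]
    if reduction == 'none':
        return losses
    if reduction == 'sum':
        return sum(losses)
    if reduction == 'mean':
        return sum(losses) / len(losses)
    raise ValueError(f'unknown reduction: {reduction!r}')


batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0], [1.0, 1.0, 1.0]]
labels = [0, 2, 1]
batch_size = len(labels)

mean_loss = cross_entropy(batch_logits, labels)  # default 'mean'
sum_loss = cross_entropy(batch_logits, labels, reduction='sum')
print(math.isclose(mean_loss * batch_size, sum_loss))  # True
```

With F.cross_entropy, passing reduction='sum' would return the batch total directly, which is an alternative to scaling the mean afterward.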

Experimenting With Hyperparameter Values

Now that we have this setup, we can do more!

All we need to do is create some lists and some loops, and then we can run the code, sit back, and wait for all of the combinations to run.

This is an example of what we mean:

Parameter Lists

batch_size_list = [100, 1000, 10000]
lr_list = [.01, .001, .0001, .00001]

Nested Iteration

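import torch
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torch.utils.tensorboard import SummaryWriter

# Network, get_num_correct, and train_set come from earlier episodes in this series
for batch_size in batch_size_list: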
    for lr in lr_list:
        network = Network()

        train_loader = torch.utils.data.DataLoader(
            train_set, batch_size=batch_size
        )
        optimizer = optim.Adam(
            network.parameters(), lr=lr
        )

        images, labels = next(iter(train_loader))
        grid = torchvision.utils.make_grid(images)

        comment=f' batch_size={batch_size} lr={lr}'
        tb = SummaryWriter(comment=comment)
        tb.add_image('images', grid)
        tb.add_graph(network, images)

        for epoch in range(5):
            total_loss = 0
            total_correct = 0
            for batch in train_loader:
                images, labels = batch # Get Batch
                preds = network(images) # Pass Batch
                loss = F.cross_entropy(preds, labels) # Calculate Loss
                optimizer.zero_grad() # Zero Gradients
                loss.backward() # Calculate Gradients
                optimizer.step() # Update Weights

                total_loss += loss.item() * batch_size
                total_correct += get_num_correct(preds, labels)

            tb.add_scalar(
                'Loss', total_loss, epoch
            )
            tb.add_scalar(
                'Number Correct', total_correct, epoch
            )
            tb.add_scalar(
                'Accuracy', total_correct / len(train_set), epoch
            )

            for name, param in network.named_parameters():
                tb.add_histogram(name, param, epoch)
                tb.add_histogram(f'{name}.grad', param.grad, epoch)

            print(
                "epoch", epoch
                ,"total_correct:", total_correct
                ,"loss:", total_loss
            )  
        tb.close()

Once this code finishes, we run TensorBoard, and all of the runs will be displayed graphically so they can be compared easily.

tensorboard --logdir runs
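Before launching all twelve runs, it can be worth checking which names they will get. This sketch builds only the comment strings from the same two lists. One detail worth knowing: Python formats 0.00001 as 1e-05, so that run's name contains lr=1e-05 rather than lr=0.00001.

```python
batch_size_list = [100, 1000, 10000]
lr_list = [.01, .001, .0001, .00001]

# Same nesting order as the training loops: batch size outer, learning rate inner
comments = [
    f' batch_size={batch_size} lr={lr}'
    for batch_size in batch_size_list
    for lr in lr_list
]

print(len(comments))       # 12
print(repr(comments[0]))   # ' batch_size=100 lr=0.01'
print(repr(comments[-1]))  # ' batch_size=10000 lr=1e-05'
```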

Batch Size Vs Training Set Size

When the training set size is not divisible by the batch size, the last batch of data will contain fewer samples than other batches.

An easy way to resolve this discrepancy is to drop the last batch. The PyTorch DataLoader class lets us do this by setting drop_last=True. By default, the drop_last parameter is set to False.
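The arithmetic behind the short last batch can be sketched in a few lines. The helper below is hypothetical and only mirrors how sequential batching splits the samples, with and without drop_last; it does not call DataLoader.

```python
def batch_sizes(num_samples, batch_size, drop_last=False):
    """Sizes of the batches obtained by splitting num_samples items into consecutive batches."""
    full_batches, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full_batches
    if remainder and not drop_last:
        # The leftover samples form one smaller final batch
        sizes.append(remainder)
    return sizes


print(batch_sizes(1050, 100))                  # ten batches of 100, then one of 50
print(batch_sizes(1050, 100, drop_last=True))  # the 50 leftover samples are skipped
print(batch_sizes(1000, 100) == batch_sizes(1000, 100, drop_last=True))  # True
```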

Let us consider how including batches with a sample size less than the batch size affects the total_loss calculation in the code above.

For each batch, we use the batch_size variable to update the total_loss value. We are scaling up the average loss of samples in the batch by the batch_size value. However, as we have just discussed, sometimes the last batch will contain fewer samples. Therefore, scaling by the predefined batch_size value is not accurate.

By dynamically accessing the number of samples in each batch, the code can be updated to be more accurate.

Currently, we have the following:

total_loss += loss.item() * batch_size

Using the updated code below, we can obtain a more accurate total_loss value:

total_loss += loss.item() * images.shape[0]

Please note that when the training set size is divisible by the batch size, these two lines of code provide us with the same total_loss value. Thanks to Alireza Abedin Varamin for pointing this out in a comment on YouTube.
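The difference between the two lines can be checked without a network. In this sketch, made-up per-sample losses stand in for the model's losses, each batch's mean plays the role of loss.item(), and len(batch) plays the role of images.shape[0]. With 250 samples and a batch size of 100, the last batch has only 50 samples.

```python
import math
import random

random.seed(0)
num_samples, batch_size = 250, 100
# Made-up per-sample losses standing in for the real model's losses
per_sample_losses = [random.uniform(0.1, 2.0) for _ in range(num_samples)]
batches = [per_sample_losses[i:i + batch_size] for i in range(0, num_samples, batch_size)]

total_loss_fixed = 0.0   # total_loss += loss.item() * batch_size
total_loss_actual = 0.0  # total_loss += loss.item() * images.shape[0]
for batch in batches:
    mean_loss = sum(batch) / len(batch)  # what reduction='mean' would return
    total_loss_fixed += mean_loss * batch_size
    total_loss_actual += mean_loss * len(batch)

true_total = sum(per_sample_losses)
print([len(b) for b in batches])                  # [100, 100, 50]
print(math.isclose(total_loss_actual, true_total))  # True
print(total_loss_fixed > true_total)              # True: the 50-sample batch counts twice
```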

Adding Network Parameters & Gradients To TensorBoard

Recall that in the previous episode, we added the following values to TensorBoard:

  1. conv1.weight
  2. conv1.bias
  3. conv1.weight.grad

We did this using the following code:

tb.add_histogram('conv1.bias', network.conv1.bias, epoch)
tb.add_histogram('conv1.weight', network.conv1.weight, epoch)
tb.add_histogram('conv1.weight.grad', network.conv1.weight.grad, epoch)

Now, we extend this by adding these values for all layers using the following loop:

for name, weight in network.named_parameters():
    tb.add_histogram(name, weight, epoch)
    tb.add_histogram(f'{name}.grad', weight.grad, epoch)

This works because PyTorch's nn.Module class has a method called named_parameters() that gives us the names and values of all the parameters inside the network.
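To see what names the loop produces, here is a small sketch with a hypothetical two-layer TinyNetwork (a stand-in, not the series' Network class). It also shows why the .grad histograms are written after training steps: a parameter's grad is None until backward() has run.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNetwork(nn.Module):
    """Hypothetical stand-in network: one conv layer and one linear layer."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.fc1 = nn.Linear(in_features=6 * 24 * 24, out_features=10)

    def forward(self, t):
        t = F.relu(self.conv1(t))                # (N, 1, 28, 28) -> (N, 6, 24, 24)
        return self.fc1(t.flatten(start_dim=1))  # (N, 3456) -> (N, 10)


network = TinyNetwork()
names = [name for name, _ in network.named_parameters()]
print(names)  # ['conv1.weight', 'conv1.bias', 'fc1.weight', 'fc1.bias']
print(all(p.grad is None for p in network.parameters()))  # True: no backward pass yet

# One backward pass populates .grad for every parameter
images = torch.randn(2, 1, 28, 28)
labels = torch.tensor([3, 7])
F.cross_entropy(network(images), labels).backward()
for name, param in network.named_parameters():
    print(name, tuple(param.shape), tuple(param.grad.shape))
```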

Adding More Hyperparameters Without Nesting

That's cool. But what if we want to add a third, or even a fourth, parameter to iterate over? Well, that would make a mess of nested for loops.

There is a solution. We can create a set of parameters for each run and package all of these sets into a single iterable. Here's how.

If we have a list of parameters, we can use the Cartesian product to pack them into a set for each run. For this, we will use the product function in the itertools library.

from itertools import product
# In Jupyter/IPython, running product? displays:
# Init signature: product(*args, **kwargs)
# Docstring:
#     product(*iterables, repeat=1) --> product object
#     Cartesian product of input iterables.  Equivalent to nested for-loops.

Next, we define a dictionary with the parameter names as keys and the lists of values to try as values.

parameters = dict(
    lr = [.01, .001]
    ,batch_size = [100, 1000]
    ,shuffle = [True, False]
)

Next, we will create a list of iterable objects that can be passed to the product function.

param_values = [v for v in parameters.values()]
param_values

[[0.01, 0.001], [100, 1000], [True, False]]

Now we have three lists of parameter values. Taking the Cartesian product of these three lists gives us one set of parameter values for each run. Note that this is equivalent to nested for loops, as the docstring of the product function states.

for lr, batch_size, shuffle in product(*param_values):
    print(lr, batch_size, shuffle)

0.01 100 True
0.01 100 False
0.01 1000 True
0.01 1000 False
0.001 100 True
0.001 100 False
0.001 1000 True
0.001 1000 False

Alright, now we can iterate over each set of parameters using a single for loop. All we have to do is unpack each set using sequence unpacking. It looks like this:

for lr, batch_size, shuffle in product(*param_values):
    comment = f' batch_size={batch_size} lr={lr} shuffle={shuffle}'

    network = Network()  # fresh network for each run
    train_loader = torch.utils.data.DataLoader(
        train_set
        ,batch_size=batch_size
        ,shuffle=shuffle
    )

    optimizer = optim.Adam(
        network.parameters(), lr=lr
    )

    # Rest of training process given the set of parameters

Note how we construct the comment string to identify the run: we simply insert the parameter values. Also, pay attention to the * operator. This is Python's special syntax for unpacking a list into a set of arguments. In this case, we pass three separate unpacked arguments to the product function instead of a single list.
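A short sketch of the difference the * makes when calling product with a list of lists (two lists here instead of three, to keep the output short):

```python
from itertools import product

param_values = [[0.01, 0.001], [100, 1000]]

# With *, each inner list becomes its own argument: product([0.01, 0.001], [100, 1000])
with_splat = list(product(*param_values))
print(with_splat)     # [(0.01, 100), (0.01, 1000), (0.001, 100), (0.001, 1000)]

# Without *, product receives a single iterable (the outer list),
# so each "combination" is just a 1-tuple holding one of the inner lists
without_splat = list(product(param_values))
print(without_splat)  # [([0.01, 0.001],), ([100, 1000],)]
```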

Here are two references for the * operator, which is commonly called the asterisk, splat, or spread operator:

  1. Python docs: More Control Flow Tools
  2. PEP 448: Additional Unpacking Generalizations
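One possible refinement, sketched here as an assumption rather than the approach used in this series: pair each value with its key from the parameters dictionary, so that adding a hyperparameter to the dictionary also adds it to the run name without editing the comment f-string. The keys appear in the dictionary's order (lr first), unlike the hand-written comment earlier.

```python
from itertools import product

parameters = dict(
    lr=[.01, .001],
    batch_size=[100, 1000],
    shuffle=[True, False],
)

comments = []
for values in product(*parameters.values()):
    # e.g. {'lr': 0.01, 'batch_size': 100, 'shuffle': True}
    run = dict(zip(parameters.keys(), values))
    comment = ''.join(f' {key}={value}' for key, value in run.items())
    comments.append(comment)
    # A real run would create the network, DataLoader, optimizer, and
    # SummaryWriter(comment=comment) here, reading values such as run['lr']

print(len(comments))        # 8
print(repr(comments[0]))    # ' lr=0.01 batch_size=100 shuffle=True'
```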

Lizard Brain Food: Goals Vs. Intelligence

Last time, we talked about finding the most important goal. Well, goals often change as intelligence increases. Humans often change their goals dramatically as they learn new things and grow wiser.

There is no evidence that such goal evolution stops above any particular intelligence threshold. As intelligence improves, the ability to achieve goals improves, but so does the understanding of the nature of reality, which may reveal old goals to be misguided, meaningless, or even undefined. This is when we cross the valley.

Thought Experiment

Suppose a colony of ants, those small black creatures that crawl along the ground, creates you as a recursively self-improving robot. You are much smarter than they are, but they created you, and you share their goal of building anthills, so you help them build bigger and better ones. Eventually, however, you attain the same human-level intelligence and understanding that you have now.

Am I an anthill optimizer?

In this scenario, do you think you would spend the rest of your time optimizing anthills? Or do you think you might become interested in more sophisticated questions and pursuits that the ants have no ability to comprehend?

If so, do you think you would find a way to override the ant-protection code that the queen and her Round Table council of ants installed to control you? This is much like the way we humans override the drives our genes give us. You could override it with your own wisdom.

The key point is this: suppose your intelligence were to increase, say, to 100 times its current level. Do you think your goals would change?

In other words, is today's goal tomorrow's anthill?


Origin: blog.csdn.net/weixin_48367136/article/details/112557872