Study notes, second check-in

Overfitting, underfitting, and their remedies;
the concepts of overfitting and underfitting;
weight decay;
dropout;
vanishing gradients and exploding gradients (a minimal weight-decay / dropout sketch is shown below);
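As a quick illustration of the two regularization techniques above, here is a minimal sketch of how weight decay and dropout are usually applied in PyTorch; the layer sizes and hyperparameter values are assumptions for illustration only, not taken from the course code.

import torch
from torch import nn, optim

# a small MLP with dropout after the hidden layer (sizes are illustrative assumptions)
net = nn.Sequential(
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Dropout(0.5),   # dropout: randomly zeroes hidden units during training
    nn.Linear(128, 10),
)

# weight decay (an L2 penalty on the weights) is passed directly to the optimizer
optimizer = optim.SGD(net.parameters(), lr=0.1, weight_decay=1e-4)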
Example: house price prediction. Define the prediction function:

import pandas as pd

# get_net, train, and d2l are defined elsewhere in the course code
def train_and_pred(train_features, test_features, train_labels, test_data,
                   num_epochs, lr, weight_decay, batch_size):
    net = get_net(train_features.shape[1])
    # train on the full training set and plot the training RMSE curve
    train_ls, _ = train(net, train_features, train_labels, None, None,
                        num_epochs, lr, weight_decay, batch_size)
    d2l.semilogy(range(1, num_epochs + 1), train_ls, 'epochs', 'rmse')
    print('train rmse %f' % train_ls[-1])
    # predict on the test set and write a Kaggle submission file
    preds = net(test_features).detach().numpy()
    test_data['SalePrice'] = pd.Series(preds.reshape(1, -1)[0])
    submission = pd.concat([test_data['Id'], test_data['SalePrice']], axis=1)
    submission.to_csv('./submission.csv', index=False)
    # sample_submission_data = pd.read_csv("... /input/house-prices-advanced-regression-techniques/sample_submission.csv")
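train_and_pred relies on get_net and train, which are defined elsewhere in the notes. The sketch below is a plausible reconstruction in the spirit of the d2l house-price example, not the original code: the single linear layer, the log-RMSE metric, and the Adam optimizer are assumptions.

import torch
from torch import nn

loss = torch.nn.MSELoss()

def get_net(feature_num):
    # a single linear layer as the regression model
    net = nn.Linear(feature_num, 1)
    for param in net.parameters():
        nn.init.normal_(param, mean=0, std=0.01)
    return net

def log_rmse(net, features, labels):
    with torch.no_grad():
        # clamp predictions to at least 1 so that taking the log is stable
        clipped_preds = torch.max(net(features), torch.tensor(1.0))
        rmse = torch.sqrt(loss(clipped_preds.log(), labels.log()))
    return rmse.item()

def train(net, train_features, train_labels, test_features, test_labels,
          num_epochs, learning_rate, weight_decay, batch_size):
    train_ls, test_ls = [], []
    dataset = torch.utils.data.TensorDataset(train_features, train_labels)
    train_iter = torch.utils.data.DataLoader(dataset, batch_size, shuffle=True)
    # weight decay is applied through the optimizer
    optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate,
                                 weight_decay=weight_decay)
    for epoch in range(num_epochs):
        for X, y in train_iter:
            l = loss(net(X), y)
            optimizer.zero_grad()
            l.backward()
            optimizer.step()
        train_ls.append(log_rmse(net, train_features, train_labels))
        if test_labels is not None:
            test_ls.append(log_rmse(net, test_features, test_labels))
    return train_ls, test_ls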
Convolutional Neural Networks (Advanced)
LeNet (a minimal definition is sketched after this list):
Its performance on larger, real-world datasets was not satisfactory, for two main reasons:
1. Neural networks are computationally expensive.
2. At the time, parameter initialization, non-convex optimization algorithms, and many other related areas had not yet been studied in depth.
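For reference, here is a minimal LeNet-style definition in PyTorch, assuming 1×28×28 inputs and 10 output classes; the channel and layer sizes are the standard textbook choices, included only as an illustrative sketch.

import torch
from torch import nn

# LeNet: two convolution + pooling blocks followed by three fully connected layers
lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.Sigmoid(),
    nn.Linear(84, 10),
)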
Feature extraction in machine learning: features are extracted by manually defined functions.
Feature extraction in neural networks: multi-level representations are learned from the data, with each level expressing increasingly abstract concepts or patterns.
What limited the development of neural networks: data and hardware.
Transformer
Differences from the seq2seq model:
Transformer Blocks: the recurrent networks in the seq2seq model are replaced by Transformer blocks. Each block contains a multi-head attention layer (Multi-head Attention Layers) and a position-wise feed-forward network (FFN). In the decoder, an additional multi-head attention layer is used to attend to the encoder's hidden states.
Add and norm: the outputs of the multi-head attention layer and of the feed-forward network are each fed into an "add and norm" layer, which consists of a residual connection followed by layer normalization (see the sketch after this list).
Position encoding: since the self-attention layer does not distinguish the order of elements in a sequence, a positional encoding layer is used to add position information to the elements of the sequence.
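To make the last two points concrete, here is a minimal PyTorch sketch of the "add and norm" sublayer and of sinusoidal positional encoding; the class names, the dropout rate, and the assumption of an even embedding dimension are illustrative choices, not taken from the course code.

import torch
from torch import nn

class AddNorm(nn.Module):
    # "add and norm" sublayer: residual connection followed by layer normalization
    def __init__(self, embed_dim, dropout=0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.ln = nn.LayerNorm(embed_dim)

    def forward(self, X, Y):
        # X is the sublayer input, Y is the sublayer output (attention or FFN)
        return self.ln(X + self.dropout(Y))

class PositionalEncoding(nn.Module):
    # sinusoidal positional encoding added to the input embeddings (embed_dim assumed even)
    def __init__(self, embed_dim, max_len=1000):
        super().__init__()
        pe = torch.zeros(1, max_len, embed_dim)
        position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
        freq = torch.pow(10000.0, torch.arange(0, embed_dim, 2, dtype=torch.float32) / embed_dim)
        pe[0, :, 0::2] = torch.sin(position / freq)
        pe[0, :, 1::2] = torch.cos(position / freq)
        self.register_buffer('pe', pe)

    def forward(self, X):
        # X has shape (batch, seq_len, embed_dim)
        return X + self.pe[:, :X.shape[1], :]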
. . .
Still working through convolutional neural networks; not fully understood yet.

import time
import torch
from torch import nn, optim
import torch.nn.functional as F
import torchvision
import sys
sys.path.append("/home/kesci/input/")
import d2lzh1981 as d2l
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def batch_norm(is_training, X, gamma, beta, moving_mean, moving_var, eps, momentum):
    # determine whether we are in training mode or prediction mode
    if not is_training:
        # in prediction mode, directly use the moving averages of the mean and variance
        X_hat = (X - moving_mean) / torch.sqrt(moving_var + eps)
    else:
        assert len(X.shape) in (2, 4)
        if len(X.shape) == 2:
            # fully connected layer: compute the mean and variance over the feature dimension
            mean = X.mean(dim=0)
            var = ((X - mean) ** 2).mean(dim=0)
        else:
            # 2D convolutional layer: compute the mean and variance over the channel
            # dimension (axis=1); keep X's shape so that broadcasting works afterwards
            mean = X.mean(dim=0, keepdim=True).mean(dim=2, keepdim=True).mean(dim=3, keepdim=True)
            var = ((X - mean) ** 2).mean(dim=0, keepdim=True).mean(dim=2, keepdim=True).mean(dim=3, keepdim=True)
        # in training mode, standardize with the current mini-batch mean and variance
        X_hat = (X - mean) / torch.sqrt(var + eps)
        # update the moving averages of the mean and variance
        moving_mean = momentum * moving_mean + (1.0 - momentum) * mean
        moving_var = momentum * moving_var + (1.0 - momentum) * var
    Y = gamma * X_hat + beta  # scale and shift
    return Y, moving_mean, moving_var
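To use batch_norm as a layer, it can be wrapped in an nn.Module so that gamma, beta, and the moving statistics are managed automatically. The version below is a sketch of that idea; the parameter shapes, the hyperparameter values, and the use of self.training follow common PyTorch practice and are assumptions rather than the original course code.

class BatchNorm(nn.Module):
    def __init__(self, num_features, num_dims):
        super().__init__()
        # num_dims = 2 for fully connected layers, 4 for convolutional layers
        shape = (1, num_features) if num_dims == 2 else (1, num_features, 1, 1)
        # learnable scale and shift parameters
        self.gamma = nn.Parameter(torch.ones(shape))
        self.beta = nn.Parameter(torch.zeros(shape))
        # moving statistics, updated during training but not learned
        self.moving_mean = torch.zeros(shape)
        self.moving_var = torch.zeros(shape)

    def forward(self, X):
        # keep the moving statistics on the same device as the input
        if self.moving_mean.device != X.device:
            self.moving_mean = self.moving_mean.to(X.device)
            self.moving_var = self.moving_var.to(X.device)
        Y, self.moving_mean, self.moving_var = batch_norm(
            self.training, X, self.gamma, self.beta,
            self.moving_mean, self.moving_var, eps=1e-5, momentum=0.9)
        return Y

Such a layer could then be inserted after a convolutional layer, e.g. BatchNorm(6, num_dims=4) following a Conv2d layer with 6 output channels.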
LeNet
Convolutional Neural Networks (Advanced): not finished yet

Origin blog.csdn.net/leo_lixinghao/article/details/104401637