[Handwritten series] Analysis reference: NumPy handwritten multi-layer neural network

Foreword

        Since the original assignment requires implementing quite a few functions, this article does not explain the algorithm principles first, but instead pastes the finished code directly for your reference. The experiment is built according to the specification in the following article:

Numpy-For-MNN: http://t.csdn.cn/xtvYV


Contents

Foreword

Provide finished code files

File acquisition:

File structure:

1. Data preprocessing

preprocess.py

2. One-hot encoding

onehot.py

3. Core abstraction

core.py

4. Network layer

layers.py

5. Activation function

activations.py

☆6. Model (functions to fill in)

model.py

☆ def batch_step() parsing

☆ class SequentialModel in assignment.py

7. Loss function

losses.py

8. Optimization function

optimizers.py

9. Accuracy metric

metrics.py

10. Training and testing

def get_simple_model_components() in assignment.py

def get_advanced_model_components() in assignment.py

11. Visualize the results

visualize.py

12. Call the code written in the previous 11 steps to train and test the model

assignment.py


Provide finished code files

File acquisition:

Link: https://pan.baidu.com/s/1Fw_7thL5PxR79zI6XbpnYQ 
Extraction code: txqe 

File structure:

| - hw2 

        | - code

                | - Beras

                        | - 8 .py files implementing the required functionality

                | - assignment.py

                | - preprocess.py

                | - visualize.py

        | - data

                | - mnist

                        | - Four dataset files

                | - Iris (can be ignored, not used in this experiment)

1. Data preprocessing

        This file is provided with the experiment. Its main job is to read the MNIST dataset from the four .gz files in ../data/mnist/ and produce the Train and Test inputs and labels (2*2 = four arrays).

preprocess.py

import gzip
import pickle

import numpy as np

"""
TODO: 
Same as HW1. Feel free to copy and paste your old implementation here.
It's a good time to vectorize it, while you're at it!
No need to include CIFAR-specific methods.
"""

def get_data_MNIST(subset, data_path="../data", is_reshape=True):
    """
    :param subset: string indicating whether we want the training or testing data 
        (only accepted values are 'train' and 'test')
    :param data_path: directory containing the training and testing inputs and labels
    :return: NumPy array of inputs (float32) and labels (uint8)
    """
    ## http://yann.lecun.com/exdb/mnist/
    subset = subset.lower().strip()
    assert subset in ("test", "train"), f"unknown data subset {subset} requested"
    inputs_file_path, labels_file_path, num_examples = {
        "train": ("train-images-idx3-ubyte.gz", "train-labels-idx1-ubyte.gz", 60000),
        "test": ("t10k-images-idx3-ubyte.gz", "t10k-labels-idx1-ubyte.gz", 10000),
    }[subset]
    inputs_file_path = f"{data_path}/mnist/{inputs_file_path}"
    labels_file_path = f"{data_path}/mnist/{labels_file_path}"

    ## TODO: read the image file and normalize, flatten, and type-convert image
    with open(inputs_file_path, 'rb') as f, gzip.GzipFile(fileobj=f) as bytestream:
        buf = bytestream.read(num_examples*28*28 + 16)
        dt = np.dtype(np.uint8)
        temp = np.frombuffer(buf, dtype=dt) 
        image = temp[16:]
        if is_reshape:
            image = image.reshape((num_examples,28*28))
        else:
            image = image.reshape((num_examples, 28, 28, 1))
        image = (image / 255.0).astype(np.float32)  # normalize to [0, 1]; docstring promises float32
    print(image.shape)

    ## TODO: read the label file
    with open(labels_file_path, 'rb') as f, gzip.GzipFile(fileobj=f) as bytestream:
        buf = bytestream.read(num_examples + 8)
        dt = np.dtype(np.uint8)
        temp = np.frombuffer(buf, dtype=dt) 
        label = temp[8:]

    return image, label
    
## THE REST ARE OPTIONAL!

'''
def shuffle_data(image_full, label_full, seed):
    
    pass
    
def get_subset(image_full, label_full, class_list=list(range(10)), num=100):
    pass
'''
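As a quick sanity check of the loader (a minimal sketch, assuming you run it from the code/ directory and the four MNIST .gz files sit in ../data/mnist/):

import preprocess

# Load both splits and confirm the expected shapes
train_x, train_y = preprocess.get_data_MNIST("train", "../data")
test_x,  test_y  = preprocess.get_data_MNIST("test",  "../data")
print(train_x.shape, train_y.shape)   # (60000, 784) (60000,)
print(test_x.shape,  test_y.shape)    # (10000, 784) (10000,)
print(train_x.min(), train_x.max())   # pixel values normalized to [0, 1]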

2. One-hot encoding

        This file implements one-hot encoding. The parts that need to be handwritten are as follows:

● fit(): [TODO] In this function, you take in the data, store its unique labels in self.uniq, and create a dictionary with the labels as keys and their corresponding one-hot encodings as values. Hint: you might want to check np.eye() for the one-hot encodings. Eventually, you will store the dictionary in self.uniq2oh.

● forward(): In this function, we pass in a vector containing all the actual labels of the training set, call fit() to fill the uniq2oh dictionary with the unique labels and their corresponding one-hot encodings, and then use it to return an array of one-hot encoded labels for each label in the training set.

This function is already filled out for you!

● inverse(): In this function, we invert the one-hot encodings back to the actual labels.

This has already been done for you.

For example, if we have labels X and Y one-hot encoded as [1,0] and [0,1], we would have {X: [1,0], Y: [0,1]}.

For MNIST, you will have 10 labels, so your dictionary should have 10 entries!

onehot.py

import numpy as np

from .core import Callable


class OneHotEncoder(Callable):
    """
    One-Hot Encodes labels. First takes in a candidate set to figure out what elements it
    needs to consider, and then one-hot encodes subsequent input datasets in the
    forward pass.

    SIMPLIFICATIONS:
     - Implementation assumes that entries are individual elements.
     - Forward will call fit if it hasn't been done yet; most implementations will just error.
     - keras does not have OneHotEncoder; has LabelEncoder, CategoricalEncoder, and to_categorical()
    """

    def fit(self, data):
        """
        Fits the one-hot encoder to a candidate dataset. Said dataset should contain
        all encounterable elements.

        :param data: 1D array containing labels.
            For example, data = [0, 1, 3, 3, 1, 9, ...]
        """
        ## TODO: Fetch all the unique labels and create a dictionary with
        ## the unique labels as keys and their one hot encodings as values
        ## HINT: look up np.eye() and see if you can utilize it!

        ## HINT: Wouldn't it be nice if we just gave you the implementation somewhere...

        self.uniq = np.unique(data)  # all the unique labels from `data`
        self.uniq2oh = {}  # a lookup dictionary with labels and corresponding encodings
        eye = np.eye(len(self.uniq))
        for i in range(len(self.uniq)):
            self.uniq2oh[self.uniq[i]] = eye[i]
        

    def forward(self, data):
        if not hasattr(self, "uniq2oh"):
            self.fit(data)
        return np.array([self.uniq2oh[x] for x in data])

    def inverse(self, data):
        assert hasattr(self, "uniq"), \
            "forward() or fit() must be called before attempting to invert"
        return np.array([self.uniq[x == 1][0] for x in data])
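A tiny usage sketch (illustration only, assuming the Beras package layout used later in assignment.py): fit on the labels, encode them, then invert the encoding:

import numpy as np
from Beras.onehot import OneHotEncoder

labels = np.array([0, 2, 1, 2])
ohe = OneHotEncoder()
encoded = ohe(labels)          # forward() calls fit() automatically on first use
print(encoded)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
print(ohe.inverse(encoded))    # [0 2 1 2]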

3. Core abstraction

        This file is the given code for the experiment, no need to modify it.

core.py

from abc import ABC, abstractmethod  # # For abstract method support
from typing import Tuple

import numpy as np


## DO NOT MODIFY THIS CLASS
class Callable(ABC):
    """
    Callable Sub-classes:
     - CategoricalAccuracy (./metrics.py)       - TODO
     - OneHotEncoder       (./preprocess.py)    - TODO
     - Diffable            (.)                  - DONE
    """

    def __call__(self, *args, **kwargs) -> np.array:
        """Lets `self()` and `self.forward()` be the same"""
        return self.forward(*args, **kwargs)

    @abstractmethod
    def forward(self, *args, **kwargs) -> np.array:
        """Pass inputs through function. Can store inputs and outputs as instance variables"""
        pass


## DO NOT MODIFY THIS CLASS
class Diffable(Callable):
    """
    Diffable Sub-classes:
     - Dense            (./layers.py)           - TODO
     - LeakyReLU, ReLU  (./activations.py)      - TODO
     - Softmax          (./activations.py)      - TODO
     - MeanSquaredError (./losses.py)           - TODO
    """

    """Stores whether the operation being used is inside a gradient tape scope"""
    gradient_tape = None  ## All-instance-shared variable

    def __init__(self):
        """Is the layer trainable"""
        super().__init__()
        self.trainable = True  ## self-only instance variable

    def __call__(self, *args, **kwargs) -> np.array:
        """
        If there is a gradient tape scope in effect, perform AND RECORD the operation.
        Otherwise... just perform the operation and don't let the gradient tape know.
        """
        if Diffable.gradient_tape is not None:
            Diffable.gradient_tape.operations += [self]
        return self.forward(*args, **kwargs)

    @abstractmethod
    def input_gradients(self: np.array) -> np.array:
        """Returns gradient for input (this part gets specified for all diffables)"""
        pass

    def weight_gradients(self: np.array) -> Tuple[np.array, np.array]:
        """Returns gradient for weights (this part gets specified for SOME diffables)"""
        return ()

    def compose_to_input(self, J: np.array) -> np.array:
        """
        Compose the inputted cumulative jacobian with the input jacobian for the layer.
        Implemented with batch-level vectorization.

        Requires `input_gradients` to provide either batched or overall jacobian.
        Assumes input/cumulative jacobians are matrix multiplied
        """
        #  print(f"Composing to input in {self.__class__.__name__}")
        ig = self.input_gradients()
        batch_size = J.shape[0]
        n_out, n_in = ig.shape[-2:]
        j_new = np.zeros((batch_size, n_out), dtype=ig.dtype)
        for b in range(batch_size):
            ig_b = ig[b] if len(ig.shape) == 3 else ig
            j_new[b] = ig_b @ J[b]
        return j_new

    def compose_to_weight(self, J: np.array) -> list:
        """
        Compose the inputted cumulative jacobian with the weight jacobian for the layer.
        Implemented with batch-level vectorization.

        Requires `weight_gradients` to provide either batched or overall jacobian.
        Assumes weight/cumulative jacobians are element-wise multiplied (w/ broadcasting)
        and the resulting per-batch statistics are averaged together for avg per-param gradient.
        """
        # print(f'Composing to weight in {self.__class__.__name__}')
        assert hasattr(
            self, "weights"
        ), f"Layer {self.__class__.__name__} cannot compose along weight path"
        J_out = []
        ## For every weight/weight-gradient pair...
        for w, wg in zip(self.weights, self.weight_gradients()):
            batch_size = J.shape[0]
            ## Make a cumulative jacobian which will contribute to the final jacobian
            j_new = np.zeros((batch_size, *w.shape), dtype=wg.dtype)
            ## For every element in the batch (for a single batch-level gradient updates)
            for b in range(batch_size):
                ## If the weight gradient is a batch of transform matrices, get the right entry.
                ## Allows gradient methods to give either batched or non-batched matrices
                wg_b = wg[b] if len(wg.shape) == 3 else wg
                ## Update the batch's Jacobian update contribution
                j_new[b] = wg_b * J[b]
            ## The final jacobian for this weight is the average gradient update for the batch
            J_out += [np.mean(j_new, axis=0)]
        ## After the new jacobian is computed for each weight set, return the list of gradient updates
        return J_out


class GradientTape:

    def __init__(self):
        ## Log of operations that were performed inside tape scope
        self.operations = []

    def __enter__(self):
        # When tape scope is entered, let Diffable start recording to self.operation
        Diffable.gradient_tape = self
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # When tape scope is exited, stop letting Diffable record
        Diffable.gradient_tape = None

    def gradient(self) -> list:
        """Get the gradient from first to last recorded operation"""
        ## TODO:
        ##
        ##  Compute weight gradients for all operations.
        ##  If the model has trainable weights [w1, b1, w2, b2] and ends at a loss L.
        ##  the model should return: [dL/dw1, dL/db1, dL/dw2, dL/db2]
        ##
        ##  Recall that self.operations is populated by Diffable class instances...
        ##
        ##  Start from the last operation and compute jacobian w.r.t input.
        ##  Continue to propagate the cumulative jacobian through the layer inputs
        ##  until all operations have been differentiated through.
        ##
        ##  If an operation that has weights is encountered along the way,
        ##  compute the weight gradients and add them to the return list.
        ##  Remember to check if the layer is trainable before doing this though...

        grads = []
        return grads
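Note that gradient() is left as a TODO here; in this walkthrough the backward pass is done manually in batch_step() instead. Still, the tape scope itself works as given, and a minimal illustration (assuming the Beras package layout) of how operations get recorded looks like this:

import numpy as np
from Beras.core import GradientTape
from Beras.layers import Dense

x = np.ones((2, 4))
layer = Dense(4, 3)
with GradientTape() as tape:
    out = layer(x)              # Diffable.__call__ appends `layer` to tape.operations
print(len(tape.operations))     # 1 -- the Dense layer was recorded inside the scope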

4. Network layer

        This layer imitates Dense in Keras, and the functions to handwrite are:

● forward() : [TODO] Implement forward pass and return output.

● weight_gradients() : [TODO] Computes the gradients with respect to the weights and biases. This will be used to optimize the layer.

● input_gradients() : [TODO] Computes the gradients with respect to the layer inputs. This will be used to propagate the gradient to the previous layers.

● _initialize_weight() : [TODO] 

Initializes the weight values for the dense layer. By default, all weights are initialized to zero (which is generally a bad idea, by the way). You also need to support more elaborate options (when the initializer is set to normal, xavier, or kaiming). Follow Keras' math assumptions!

〇 Normal: self-explanatory, a unit normal distribution.

〇 Xavier Normal: based on keras.GlorotNormal.

〇 Kaiming He Normal: based on keras.HeNormal.

When implementing these, you may find np.random.normal helpful. The assignment writeup explains why these different initialization methods are necessary, but for more details, check out this site! Feel free to add more initializer options! A short sketch of the Keras formulas follows below.
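For reference, here is a minimal sketch of what the three non-zero initializers look like under Keras' conventions (GlorotNormal uses both fan-in and fan-out, HeNormal uses fan-in only). The helper name init_weights is made up for illustration, and the exact formulas are taken from the Keras documentation rather than from this assignment:

import numpy as np

def init_weights(initializer, fan_in, fan_out):
    # Illustrative sketch of the Keras-style normal initializers
    if initializer == "normal":        # unit normal
        std = 1.0
    elif initializer == "xavier":      # keras.initializers.GlorotNormal
        std = np.sqrt(2.0 / (fan_in + fan_out))
    elif initializer == "kaiming":     # keras.initializers.HeNormal
        std = np.sqrt(2.0 / fan_in)
    else:
        raise ValueError(f"unknown initializer {initializer}")
    return np.random.normal(0.0, std, size=(fan_in, fan_out))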

layers.py

import numpy as np

from .core import Diffable


class Dense(Diffable):

    # https://towardsdatascience.com/weight-initialization-in-neural-networks-a-journey-from-the-basics-to-kaiming-954fb9b47c79

    def __init__(self, input_size, output_size, learning_rate=0.01, initializer="kaiming"):
        super().__init__()
        self.w, self.b = self.__class__._initialize_weight(
            initializer, input_size, output_size
        )
        self.weights = [self.w, self.b]
        self.learning_rate = learning_rate
        self.inputs  = None
        self.outputs = None

    def forward(self, inputs):
        """Forward pass for a dense layer! Refer to lecture slides for how this is computed."""
        self.inputs = inputs

        # TODO: implement the forward pass and return the outputs
        self.outputs = np.matmul(inputs, self.w) + self.b
        return self.outputs

    def weight_gradients(self, eta):
        """Calculating the gradients wrt weights and biases!"""
        # TODO: Implement calculation of gradients
        wgrads = np.dot(self.inputs.T, eta)
        bgrads = np.sum(eta, axis=0)
        return wgrads, bgrads

    def input_gradients(self, eta):
        """Calculating the gradients wrt inputs!"""
        # TODO: Implement calculation of gradients
        inputgrads = np.dot(eta, self.w.T)
        # NOTE: in this walkthrough the layer also applies a simple SGD-style update
        # to its own weights and bias here, using its own learning_rate
        wgrads, bgrads = self.weight_gradients(eta)
        self.w = self.w - self.learning_rate*wgrads
        self.b = self.b - self.learning_rate*bgrads
        return inputgrads

    @staticmethod
    def _initialize_weight(initializer, input_size, output_size):
        """
        Initializes the values of the weights and biases. The bias weights should always start at zero.
        However, the weights should follow the given distribution defined by the initializer parameter
        (zero, normal, xavier, or kaiming). You can do this with an if statement
        cycling through each option!

        Details on each weight initialization option:
            - Zero: Weights and biases contain only 0's. Generally a bad idea since the gradient update
            will be the same for each weight so all weights will have the same values.
            - Normal: Weights are initialized according to a normal distribution.
            - Xavier: Goal is to initialize the weights so that the variance of the activations are the
            same across every layer. This helps to prevent exploding or vanishing gradients. Typically
            works better for layers with tanh or sigmoid activation.
            - Kaiming: Similar purpose as Xavier initialization. Typically works better for layers
            with ReLU activation.
        """
        initializer = initializer.lower()
        assert initializer in (
            "zero",
            "normal",
            "xavier",
            "kaiming",
        ), f"Unknown dense weight initialization strategy '{initializer}' requested"
        io_size = (input_size, output_size)

        # TODO: Implement default assumption: zero-init for weights and bias
        initial_b = np.zeros((1,output_size))
        if initializer=="zero":
            initial_w = np.zeros(io_size)
        # TODO: Implement remaining options (normal, xavier, kaiming initializations). Note that
        # strings must be exactly as written in the assert above
        elif initializer=="normal":
            initial_w = np.random.randn(input_size, output_size)
            
        elif initializer=="xavier":
            initial_w = np.random.randn(input_size, output_size) * np.sqrt(1 / output_size)
        
        elif initializer=="kaiming":
            initial_w = np.random.randn(input_size, output_size) * np.sqrt(2 / output_size)

        return initial_w, initial_b
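A quick shape check for the layer (illustration only):

import numpy as np
from Beras.layers import Dense

layer = Dense(784, 10, initializer="kaiming")
x = np.random.rand(32, 784)               # a fake batch of 32 flattened images
print(layer.forward(x).shape)             # (32, 10)

eta = np.ones((32, 10))                   # a fake upstream gradient
print(layer.input_gradients(eta).shape)   # (32, 784); also updates w and b internally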

5. Activation function

        This file implements the LeakyReLU and Softmax activation functions; we handwrite their forward propagation [def forward] and backpropagation [def input_gradients]:

● LeakyReLU()

        〇 forward() : [TODO] Given input x, compute and return LeakyReLU(x).

        〇 input_gradients() : [TODO] Compute and return the gradient of LeakyReLU with respect to its inputs.

● Softmax():(2470 ONLY)

        〇 forward(): [TODO] Given input x, compute and return Softmax(x). Make sure you're using a stable softmax, i.e. subtract the max of all entries to prevent overflow/underflow issues.

        〇 input_gradients(): [TODO] The partial derivative of Softmax() with respect to its input.

activations.py

import numpy as np

from .core import Diffable


class LeakyReLU(Diffable):
    def __init__(self, alpha=0.3):
        super().__init__()
        self.alpha = alpha
        self.inputs = None
        self.outputs = None

    def forward(self, inputs):
        # TODO: Given an input array `x`, compute LeakyReLU(x)
        self.inputs = inputs
        # Element-wise: keep positive entries, scale non-positive entries by alpha
        self.outputs = np.where(inputs > 0, inputs, inputs * self.alpha)
        return self.outputs

    def input_gradients(self, eta):
        # TODO: Compute and return the gradients
        # The derivative is 1 for positive inputs and alpha otherwise
        return eta * np.where(self.inputs > 0, 1.0, self.alpha)

    def compose_to_input(self, J):
        # TODO: Maybe you'll want to override the default?
        return super().compose_to_input(J)


class ReLU(LeakyReLU):
    def __init__(self):
        super().__init__(alpha=0)


class Softmax(Diffable):
    def __init__(self):
        super().__init__()
        self.inputs = None
        self.outputs = None

    def forward(self, inputs):
        """Softmax forward pass!"""
        # TODO: Implement
        # HINT: Use stable softmax, which subtracts maximum from
        # all entries to prevent overflow/underflow issues
        self.inputs = inputs
        # Subtract the row-wise max for numerical stability
        z = inputs - np.max(inputs, axis=-1, keepdims=True)
        numerator = np.exp(z)
        denominator = np.sum(numerator, axis=-1, keepdims=True)
        self.outputs = numerator / denominator
        return self.outputs

    def input_gradients(self, eta):
        """Softmax backprop!"""
        # NOTE: the softmax backward pass is folded into CrossEntropyLoss (see losses.py),
        # so the incoming gradient is passed through unchanged here.
        return eta
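A small numeric check of the activations above (illustration only):

import numpy as np
from Beras.activations import LeakyReLU, Softmax

act = LeakyReLU(alpha=0.3)
x = np.array([[-2.0, 0.5, 3.0]])
print(act.forward(x))                         # [[-0.6  0.5  3. ]]
print(act.input_gradients(np.ones_like(x)))   # [[0.3 1.  1. ]]

sm = Softmax()
print(sm.forward(x).sum(axis=-1))             # [1.] -- each row sums to one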

☆6. Model (functions to fill in)

        This file hand-implements the equivalent of Keras' Sequential model. SequentialModel inherits from the Model class, so we first implement the Model class as follows:

● compile() : Initializes the model's optimizer, loss function and accuracy function, which are passed in as parameters for the SequentialModel instance to use.

● fit() : Trains the model to associate inputs with outputs. Training is repeated for each epoch, and the data is batched according to the parameters. It also computes batch_metrics, epoch_metrics and the aggregated agg_metrics, which can be used to track the model's training progress.

● evaluate() : [TODO] Evaluates the performance of the final model, using the same metrics, during the testing phase. It is almost identical to the fit() function (think about what differs between training and testing).

● call() : [TODO] Hint: what does it mean to call a sequential model? Remember that a sequential model is a stack of layers, and each layer has one input vector and one output vector. You can implement this in the SequentialModel class in assignment.py.

● batch_step() : [TODO] You will see fit() calling this function for each batch. First compute the model's predictions for the input batch. During the training phase, you need to compute gradients and update your weights according to the optimizer you are using. For backpropagation during training, you would use GradientTape from the core abstraction (core.py) to record operations and intermediate values, then use the model's optimizer to apply the gradients to the model's trainable variables. Finally, compute and return the loss and accuracy for the batch. You can implement this in the SequentialModel class in assignment.py.

 model.py 

from abc import ABC, abstractmethod
from collections import defaultdict

import numpy as np

from .core import Diffable


def print_stats(stat_dict, b=None, b_num=None, e=None, avg=False):
    """
    Given a dictionary of names statistics and batch/epoch info,
    print them in an appealing manner. If avg, display stat averages.
    """
    title_str = " - "
    if e is not None:
        title_str += f"Epoch {e+1:2}: "
    if b is not None:
        title_str += f"Batch {b+1:3}"
        if b_num is not None:
            title_str += f"/{b_num}"
    if avg:
        title_str += f"Average Stats"
    print(f"\r{title_str} : ", end="")
    op = np.mean if avg else lambda x: x
    print({k: np.round(op(v), 4) for k, v in stat_dict.items()}, end="")
    print("   ", end="" if not avg else "\n")
    

def update_metric_dict(super_dict, sub_dict):
    """
    Appends the average of the sub_dict metrics to the super_dict's metric list
    """
    for k, v in sub_dict.items():
        super_dict[k] += [np.mean(v)]


class Model(ABC):
    ###############################################################################################
    ## BEGIN GIVEN

    def __init__(self, layers):
        """
        Initialize all trainable parameters and take layers as inputs
        """
        # Initialize all trainable parameters
        assert all([issubclass(layer.__class__, Diffable) for layer in layers])
        self.layers = layers[:-1]  # NOTE: the final Softmax layer is dropped; CrossEntropyLoss (losses.py) applies softmax itself
        self.trainable_variables = []
        for layer in layers:
            if hasattr(layer, "weights") and layer.trainable:
                for weight in layer.weights:
                    self.trainable_variables += [weight]

    def compile(self, optimizer, loss_fn, acc_fn):
        """
        "Compile" the model by taking in the optimizers, loss, and accuracy functions.
        In more optimized DL implementations, this will have more involved processes
        that make the components extremely efficient but very inflexible.
        """
        self.optimizer = optimizer
        self.compiled_loss = loss_fn
        self.compiled_acc = acc_fn

    def fit(self, x, y, epochs, batch_size):
        """
        Trains the model by iterating over the input dataset and feeding input batches
        into the batch_step method with training. At the end, the metrics are returned.
        """
        agg_metrics = defaultdict(lambda: [])
        batch_num = x.shape[0] // batch_size
        for e in range(epochs):
            epoch_metrics = defaultdict(lambda: [])
            for b, b1 in enumerate(range(batch_size, x.shape[0] + 1, batch_size)):
                b0 = b1 - batch_size
                batch_metrics = self.batch_step(x[b0:b1], y[b0:b1], training=True)
                update_metric_dict(epoch_metrics, batch_metrics)
                print_stats(batch_metrics, b, batch_num, e)
            update_metric_dict(agg_metrics, epoch_metrics)
            print_stats(epoch_metrics, e=e, avg=True)
        return agg_metrics

    def evaluate(self, x, y, batch_size):
        """
        X is the dataset inputs, Y is the dataset labels.
        Evaluates the model by iterating over the input dataset in batches and feeding input batches
        into the batch_step method. At the end, the metrics are returned. Should be called on
        the testing set to evaluate accuracy of the model using the metrics output from the fit method.

        NOTE: This method is almost identical to fit (think about how training and testing differ --
        the core logic should be the same)
        """
        # TODO: Implement evaluate similarly to fit.
        agg_metrics = defaultdict(lambda: [])
        batch_num = x.shape[0] // batch_size
        for e in range(1):
            epoch_metrics = defaultdict(lambda: [])
            for b, b1 in enumerate(range(batch_size, x.shape[0] + 1, batch_size)):
                b0 = b1 - batch_size
                batch_metrics = self.batch_step(x[b0:b1], y[b0:b1], training=False)
                update_metric_dict(epoch_metrics, batch_metrics)
                print_stats(batch_metrics, b, batch_num, e)
            update_metric_dict(agg_metrics, epoch_metrics)
            print_stats(epoch_metrics, e=e, avg=True)
        
        return agg_metrics

    @abstractmethod
    def call(self, inputs):
        """You will implement this in the SequentialModel class in assignment.py"""
        return

    @abstractmethod
    def batch_step(self, x, y, training=True):
        """You will implement this in the SequentialModel class in assignment.py"""
        return

☆ def batch_step() parsing:

 y_pre = self.call(x)

: The predicted values are obtained from one forward pass through the network.

 loss = self.compiled_loss.forward(y_pre, y)

: The predicted values and the true values are passed into the loss function's forward pass to obtain the loss value.

acc = self.compiled_acc(y_pre, y)

: The predicted values and the true values are passed into the accuracy function to obtain the accuracy value.

What backpropagation means for each component:

Activation function : the input from the previous layer is transformed by the layer and then passed through the activation function to produce the output. Common activation functions include sigmoid, tanh, ReLU, etc.

Loss function : a way of measuring the difference between the network's predicted values and the actual values. Common loss functions include the mean squared error loss, the cross-entropy loss, the smooth L1 loss used in regression, etc.

Optimization function : determines how the loss signal is propagated from the output layer of the network back toward the front. Examples include basic gradient descent, stochastic gradient descent, batch gradient descent, gradient descent with momentum, Adagrad, Adadelta, Adam, etc.

Loss function

eta = self.compiled_loss.input_gradients()

: The gradient is obtained from the backward pass of the loss function.

Network layers and activation functions

for layer in self.layers[::-1]:

        eta = layer.input_gradients(eta)

: Backpropagates the gradients to each network layer.

Optimization function

 if training:

            self.optimizer.apply_gradients(self.trainable_variables[0], self.trainable_variables[1])

: If training, the trainable variables are handed to the optimizer's apply_gradients(), which in the Keras-style workflow would carry the loss signal from the output layer back to the front. (Note that in this implementation the weights and biases have already been updated inside each layer's input_gradients() during the backward loop above.)

☆ class SequentialModel in assignment.py

class SequentialModel(Beras.Model):
    """
    Implemented in Beras/model.py

    def __init__(self, layers):
    def compile(self, optimizer, loss_fn, acc_fn):
    def fit(self, x, y, epochs, batch_size):
    def evaluate(self, x, y, batch_size):           ## <- TODO
    """

    def call(self, inputs):
        """
        Forward pass in sequential model. It's helpful to note that layers are initialized in Beras.Model, and
        you can refer to them with self.layers. You can call a layer by doing var = layer(input).
        """
        # TODO: The call function!
        for layer in self.layers:
            inputs = layer.forward(inputs)
        return inputs

    def batch_step(self, x, y, training=True):
        """
        Computes loss and accuracy for a batch. This step consists of both a forward and backward pass.
        If training=false, don't apply gradients to update the model! 
        Most of this method (forward, loss, applying gradients)
        will take place within the scope of Beras.GradientTape()
        """
        # TODO: Compute loss and accuracy for a batch.
        # If training, then also update the gradients according to the optimizer
        y_pre = self.call(x)
        loss = self.compiled_loss.forward(y_pre, y)
        acc = self.compiled_acc(y_pre, y)

        eta = self.compiled_loss.input_gradients()
        # backwarding...
        for layer in self.layers[::-1]:
            #print(type(layer))
            eta = layer.input_gradients(eta)

        if training:
            self.optimizer.apply_gradients(self.trainable_variables[0], self.trainable_variables[1])
        return {"loss": loss, "acc": acc}

7. Loss function

        This is one of the most critical parts of model training. In this assignment, instead of implementing the MSE (mean squared error) loss described in the handout, I chose the CrossEntropyLoss, because in my experiments the results with the other loss functions were not satisfactory.

Note: The backward pass of SoftMax is generally performed together with the CrossEntropyLoss, so there is no need to fill in the backpropagation part of SoftMax.

● forward() : [TODO] Write a function that computes and returns the mean squared error given the predicted and actual labels (this is the original assignment's wording for MeanSquaredError; the code below implements CrossEntropyLoss instead).

Hint: What is MSE? The mean squared error is the average of the squared differences between the predicted and actual values.

● input_gradients() : [TODO] Compute and return the gradients, using the formula obtained by differentiating the loss.

losses.py 

import numpy as np
from .core import Diffable
from abc import ABCMeta, abstractmethod

class CrossEntropyLoss(Diffable):
    def __init__(self):
        super().__init__()
        self.classifier = Softmax()

    def input_gradients(self):
        return self.grad

    def forward(self, a, y):
        a = self.classifier.forward(a)
        self.grad = a - y
        loss = -1 * np.einsum('ij,ij->', y, np.log(a), optimize=True) / y.shape[0]
        return loss

class Layer(metaclass=ABCMeta):

    @abstractmethod
    def forward(self, *args):
        pass

    @abstractmethod
    def backward(self, *args):
        pass
    
class Softmax(Layer):
    def forward(self, x):
        v = np.exp(x - x.max(axis=-1, keepdims=True))    
        return v / v.sum(axis=-1, keepdims=True)
    
    def backward(self, eta):
        pass
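The note above, that the softmax backward pass is folded into the cross-entropy gradient (a - y), can be verified numerically. A small single-sample check (illustration only, using a local softmax and loss rather than the classes above):

import numpy as np

def softmax(z):
    v = np.exp(z - z.max(axis=-1, keepdims=True))
    return v / v.sum(axis=-1, keepdims=True)

def ce_loss(z, y):
    return -np.sum(y * np.log(softmax(z)))

rng = np.random.default_rng(0)
z = rng.normal(size=(1, 10))            # fake logits for one sample
y = np.eye(10)[[3]]                     # fake one-hot label

analytic = softmax(z) - y               # the gradient CrossEntropyLoss stores in self.grad

numeric = np.zeros_like(z)
eps = 1e-6
for j in range(z.shape[1]):
    zp, zm = z.copy(), z.copy()
    zp[0, j] += eps
    zm[0, j] -= eps
    numeric[0, j] = (ce_loss(zp, y) - ce_loss(zm, y)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))   # ~1e-10, i.e. the two gradients agree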

8. Optimization function

        For the MNIST dataset, RMSProp alone is completely sufficient, so this article only implements this one optimizer.

● RMSProp :  [TODO]  Root Mean Square Propagation.

optimizers.py

from collections import defaultdict
import numpy as np

class RMSProp:
    def __init__(self, learning_rate, beta=0.9, epsilon=1e-6):
        self.learning_rate = learning_rate

        self.beta = beta
        self.epsilon = epsilon

        self.v = defaultdict(lambda: 0)

    def apply_gradients(self, weights, grads):
        # TODO: Implement RMSProp optimization
        # Refer to the lab on Optimizers for a better understanding!
        # Keep a running average of the squared gradients, then scale the step by its root
        self.mean_square = self.v['mean_square']
        self.mean_square = self.beta*self.mean_square + (1-self.beta)*(grads)**2
        self.v['mean_square'] = self.mean_square
        # NOTE: this reassignment rebinds the local name `weights` to a new array and does not
        # modify the passed-in array in place; in this walkthrough the actual parameter updates
        # happen inside Dense.input_gradients()
        weights = weights - self.learning_rate/(np.sqrt(self.mean_square) + self.epsilon)*grads
        return 
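For comparison, a more conventional per-parameter RMSProp looks roughly like the sketch below (the class name RMSPropSketch is made up; it is not the class used in this article). It keeps one running average per parameter and updates each array in place so the layers actually see the new values:

import numpy as np
from collections import defaultdict

class RMSPropSketch:
    """Illustrative per-parameter RMSProp, not the optimizer used above."""
    def __init__(self, learning_rate, beta=0.9, epsilon=1e-6):
        self.learning_rate = learning_rate
        self.beta = beta
        self.epsilon = epsilon
        self.v = defaultdict(lambda: 0)   # one running average of squared grads per parameter

    def apply_gradients(self, weights, grads):
        # `weights` and `grads` are parallel lists of numpy arrays
        for i, (w, g) in enumerate(zip(weights, grads)):
            self.v[i] = self.beta * self.v[i] + (1 - self.beta) * g ** 2
            # In-place update so the layer's arrays are actually modified
            w -= self.learning_rate * g / (np.sqrt(self.v[i]) + self.epsilon)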

9. Accuracy metric

        This file simply implements a categorical accuracy metric for measuring model accuracy:

● forward() :  [TODO]  Returns the model's classification accuracy given the predicted probabilities and the true labels. You should return the proportion of predicted labels that equal the true labels, where the predicted label for an image is the label with the highest probability. Refer to the web or the lecture slides for the classification accuracy formula!

metrics.py

import numpy as np

from .core import Callable


class CategoricalAccuracy(Callable):
    def forward(self, probs, labels):
        """Categorical accuracy forward pass!"""
        super().__init__()
        # TODO: Compute and return the categorical accuracy of your model given the output probabilities and true labels
        probsArg = np.argmax(probs, axis=1)
        labelsArg = np.argmax(labels, axis=1)
        
        return sum(probsArg==labelsArg)/len(labels)
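A quick sanity check (illustration only): two of the three predictions below match their labels, so the accuracy is 2/3:

import numpy as np
from Beras.metrics import CategoricalAccuracy

acc_fn = CategoricalAccuracy()
probs  = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])   # predicted class probabilities
labels = np.array([[0, 1], [1, 0], [1, 0]])               # one-hot true labels
print(acc_fn(probs, labels))   # 0.666...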
         

10. Training and testing

        Two models are built, imitating Keras:

● A simple model in get_simple_model_components(), with at most one Diffable layer (e.g. Dense from ./layers.py) and one activation function (from ./activations.py). This option is given to you by default; you can change it if you want, but the autograder will evaluate the original one!

● A slightly more complex model in get_advanced_model_components(), with two or more Diffable layers and two or more activation functions. We recommend using the Adam optimizer for this model with a fairly low learning rate.

def get_simple_model_components() in assignment.py

def get_simple_model_components():
    """
    Returns a simple single-layer model.
    """
    ## DO NOT CHANGE IN FINAL SUBMISSION

    from Beras.activations import Softmax
    from Beras.layers import Dense
    from Beras.metrics import CategoricalAccuracy
    from Beras.optimizers import BasicOptimizer, RMSProp
    from Beras.losses import CrossEntropyLoss, MeanSquaredError, CategoricalCrossentropy

    # TODO: create a model and compile it with layers and functions of your choice
    model = SequentialModel([Dense(784, 10), Softmax()])
    model.compile(
        optimizer=RMSProp(0.02),
        loss_fn=CrossEntropyLoss(),
        acc_fn=CategoricalAccuracy(),
    )
    return SimpleNamespace(model=model, epochs=10, batch_size=100)

def get_advanced_model_components() in assignment.py

def get_advanced_model_components():
    from Beras.activations import Softmax, LeakyReLU
    from Beras.layers import Dense
    from Beras.metrics import CategoricalAccuracy
    from Beras.losses import CrossEntropyLoss, MeanSquaredError, CategoricalCrossentropy
    from Beras.optimizers import BasicOptimizer, RMSProp
    from Beras.batchnorm import BatchNorm
    """
    Returns a multi-layered model with more involved components.
    """
    # TODO: create/compile a model with layers and functions of your choice.
    model = SequentialModel([Dense(784, 398), BatchNorm(398), LeakyReLU(0), Dense(398, 10), Softmax()])
    model.compile(
        optimizer=RMSProp(0.02),
        loss_fn=CrossEntropyLoss(),
        acc_fn=CategoricalAccuracy(),
    )
    return SimpleNamespace(model=model, epochs=12, batch_size=100)

11. Visualize the results

   The visualize_metrics method is provided for you to visualize how your loss and accuracy change over the course of training, using matplotlib.

visualize.py

import matplotlib.pyplot as plt
import numpy as np


def visualize_metrics(losses=[], accuracies=[]):
    """
    param losses: a 1D array of loss values
    param accuracies: a 1D array of accuracy values

    Displays a plot with loss and accuracy values on the y-axis and batch number/epoch number on the
    x-axis
    """
    if not losses or not accuracies:
        return print("Must provide a list of losses/accuracies to visualize")
    x = np.arange(1, max(len(losses), len(accuracies)) + 1)
    plt.plot(x, losses)
    plt.plot(x, accuracies)
    plt.ylabel("Loss/Acc Value")
    plt.show()


def visualize_images(model, train_inputs, train_labels_ohe, num_searching=500):
    """
    param model: a neural network model (i.e. SequentialModel)
    param train_inputs: sample training inputs for the model to predict
    param train_labels_ohe: one-hot encoded training labels corresponding to train_inputs

    Displays 10 sample outputs the model correctly classifies and 10 sample outputs the model
    incorrectly classifies
    """

    rand_idx = np.random.choice(len(train_inputs), num_searching)
    rand_batch = train_inputs[rand_idx]
    probs = model.call(rand_batch)

    pred_classes = np.argmax(probs, axis=1)
    true_classes = np.argmax(train_labels_ohe[rand_idx], axis=1)

    right_idx = np.where(pred_classes == true_classes)
    wrong_idx = np.where(pred_classes != true_classes)

    right = np.reshape(rand_batch[right_idx], (-1, 28, 28))
    wrong = np.reshape(rand_batch[wrong_idx], (-1, 28, 28))

    right_pred_labels = true_classes[right_idx]
    wrong_pred_labels = pred_classes[wrong_idx]

    assert len(right) >= 10, f"Found less than 10 correct predictions!"
    assert len(wrong) >= 10, f"Found less than 10 incorrect predictions!"

    fig, axs = plt.subplots(2, 10)
    fig.suptitle("Classigications\n(PL = Predicted Label)")

    subsets = [right, wrong]
    pred_labs = [right_pred_labels, wrong_pred_labels]

    for r in range(2):
        for c in range(10):
            axs[r, c].imshow(subsets[r][c], cmap="Greys")
            axs[r, c].set(title=f"PL: {pred_labs[r][c]}")
            plt.setp(axs[r, c].get_xticklabels(), visible=False)
            plt.setp(axs[r, c].get_yticklabels(), visible=False)
            axs[r, c].tick_params(axis="both", which="both", length=0)

    plt.show()

12. Call the code written in the previous 11 steps to train and test the model

assignment.py

from types import SimpleNamespace

import Beras
import numpy as np

class SequentialModel(Beras.Model):
    """
    Implemented in Beras/model.py

    def __init__(self, layers):
    def compile(self, optimizer, loss_fn, acc_fn):
    def fit(self, x, y, epochs, batch_size):
    def evaluate(self, x, y, batch_size):           ## <- TODO
    """

    def call(self, inputs):
        """
        Forward pass in sequential model. It's helpful to note that layers are initialized in Beras.Model, and
        you can refer to them with self.layers. You can call a layer by doing var = layer(input).
        """
        # TODO: The call function!
        for layer in self.layers:
            inputs = layer.forward(inputs)
        return inputs

    def batch_step(self, x, y, training=True):
        """
        Computes loss and accuracy for a batch. This step consists of both a forward and backward pass.
        If training=false, don't apply gradients to update the model! 
        Most of this method (forward, loss, applying gradients)
        will take place within the scope of Beras.GradientTape()
        """
        # TODO: Compute loss and accuracy for a batch.
        # If training, then also update the gradients according to the optimizer
        y_pre = self.call(x)
        loss = self.compiled_loss.forward(y_pre, y)
        acc = self.compiled_acc(y_pre, y)

        eta = self.compiled_loss.input_gradients()
        # backwarding...
        for layer in self.layers[::-1]:
            #print(type(layer))
            eta = layer.input_gradients(eta)

        if training:
            self.optimizer.apply_gradients(self.trainable_variables[0], self.trainable_variables[1])
        return {"loss": loss, "acc": acc}

def get_simple_model_components():
    """
    Returns a simple single-layer model.
    """
    ## DO NOT CHANGE IN FINAL SUBMISSION

    from Beras.activations import Softmax
    from Beras.layers import Dense
    from Beras.metrics import CategoricalAccuracy
    from Beras.optimizers import BasicOptimizer, RMSProp
    from Beras.losses import CrossEntropyLoss, MeanSquaredError, CategoricalCrossentropy

    # TODO: create a model and compile it with layers and functions of your choice
    model = SequentialModel([Dense(784, 10), Softmax()])
    model.compile(
        optimizer=RMSProp(0.02),
        loss_fn=CrossEntropyLoss(),
        acc_fn=CategoricalAccuracy(),
    )
    return SimpleNamespace(model=model, epochs=10, batch_size=100)

def get_advanced_model_components():
    from Beras.activations import Softmax, LeakyReLU
    from Beras.layers import Dense
    from Beras.metrics import CategoricalAccuracy
    from Beras.losses import CrossEntropyLoss, MeanSquaredError, CategoricalCrossentropy
    from Beras.optimizers import BasicOptimizer, RMSProp
    from Beras.batchnorm import BatchNorm
    """
    Returns a multi-layered model with more involved components.
    """
    # TODO: create/compile a model with layers and functions of your choice.
    model = SequentialModel([Dense(784, 398), BatchNorm(398), LeakyReLU(0), Dense(398, 10), Softmax()])
    model.compile(
        optimizer=RMSProp(0.02),
        loss_fn=CrossEntropyLoss(),
        acc_fn=CategoricalAccuracy(),
    )
    return SimpleNamespace(model=model, epochs=12, batch_size=100)

if __name__ == "__main__":
    """
    Read in MNIST data and initialize/train/test your model.
    """
    from Beras.onehot import OneHotEncoder
    import preprocess

    ## Read in MNIST data,
    train_inputs, train_labels = preprocess.get_data_MNIST("train", "../data")
    test_inputs,  test_labels  = preprocess.get_data_MNIST("test",  "../data")

    ## TODO: Use the OneHotEncoder class to one hot encode the labels
    # ohe = lambda x: 0  ## placeholder function: returns zero for a given input
    ohe = OneHotEncoder()
    ohe.fit(train_labels)
    ## Get your model to train and test
    simple = False
    args = get_simple_model_components() if simple else get_advanced_model_components()
    model = args.model

    ## REMINDER: Threshold of accuracy: 
    ##  1470: >85% on testing accuracy from get_simple_model_components
    ##  2470: >95% on testing accuracy from get_advanced_model_components

    # TODO: Fit your model to the training input and the one hot encoded labels
    # Remember to pass all the arguments that SequentialModel.fit() requires
    # such as number of epochs and the batch size
    print('---------------------------[[[Train]]]]---------------------------')
    train_agg_metrics = model.fit(
        train_inputs, 
        ohe(train_labels), 
        epochs     = args.epochs, 
        batch_size = args.batch_size
    )
    print('-------------------------------------------------------------------')
    ## Feel free to use the visualize_metrics function to view your accuracy and loss.
    ## The final accuracy returned during evaluation must be > 80%.

    # from visualize import visualize_images, visualize_metrics
    # visualize_metrics(train_agg_metrics["loss"], train_agg_metrics["acc"])
    # visualize_images(model, train_inputs, ohe(train_labels))

    ## TODO: Evaluate your model using your testing inputs and one hot encoded labels.
    ## This is the number you will be using!
    print('---------------------------[[[Evaluate]]]---------------------------')
    test_agg_metrics = model.evaluate(test_inputs, ohe(test_labels), batch_size=100)
    print('Testing Performance:', test_agg_metrics)
    print('-----------------------------------------------------------------')

I consider this to be only a barely passing piece of homework, and the answers provided are for reference only. I wish you all a good time!


Origin: blog.csdn.net/qq_51831335/article/details/127356284