Preface

Because the original assignment requires implementing a large number of functions, this article does not explain the principles of the algorithm first; it simply posts the finished code for reference. The experiment follows the setup described in this article:

NumPy for MNN http://t.csdn.cn/xtvYV

Analysis reference: a hand-written multilayer neural network in NumPy

Preface

Completed code files

Getting the files:

File structure:

1. Data preprocessing

preprocess.py

2. One-hot encoding

onehot.py

3. Core abstractions

core.py

4. Network layers

layers.py

5. Activation functions

activations.py

☆ 6. Filling in the model

model.py

☆ Analysis of def batch_step():

☆ class SequentialModel in assignment.py

7. Loss function

losses.py

8. Optimizer

optimizer.py

9. Accuracy metric

metrics.py

10. Training and testing

def get_simple_model() in assignment.py

get_advanced_model() in assignment.py

11. Visualizing the results

visualize.py

12. Calling the code written in the previous 11 steps to train and test the model

assignment.py

Completed code files

Getting the files:

Link: https://pan.baidu.com/s/1Fw_7thL5PxR79zI6XbpnYQ
Extraction code: txqe

File structure:

| - hw2
    | - code
        | - Beras
            | - 8 .py files that implement the functions required by the assignment
        | - assignment.py
        | - preprocess.py
        | - visualize.py
    | - data
        | - mnist
            | - four dataset files
        | - Iris (can be ignored; not used in this experiment)

1. Data preprocessing

This file is supplied with the assignment. Its main job is to read the MNIST dataset from the four .gz files in ../data/mnist/ and return the training and test sets (two splits, each with an image file and a label file: 2 x 2 = 4 files).

preprocess.py

import gzip

import numpy as np

"""
TODO: 
Same as HW1. Feel free to copy and paste your old implementation here.
It's a good time to vectorize it, while you're at it!
No need to include CIFAR-specific methods.
"""

def get_data_MNIST(subset, data_path="../data", is_reshape=True):
    """
    :param subset: string indicating whether we want the training or testing data 
        (only accepted values are 'train' and 'test')
    :param data_path: directory containing the training and testing inputs and labels
    :return: NumPy array of inputs (float32) and labels (uint8)
    """
    ## http://yann.lecun.com/exdb/mnist/
    subset = subset.lower().strip()
    assert subset in ("test", "train"), f"unknown data subset {subset} requested"
    inputs_file_path, labels_file_path, num_examples = {
        "train": ("train-images-idx3-ubyte.gz", "train-labels-idx1-ubyte.gz", 60000),
        "test": ("t10k-images-idx3-ubyte.gz", "t10k-labels-idx1-ubyte.gz", 10000),
    }[subset]
    inputs_file_path = f"{data_path}/mnist/{inputs_file_path}"
    labels_file_path = f"{data_path}/mnist/{labels_file_path}"

    ## TODO: read the image file and normalize, flatten, and type-convert image
    with open(inputs_file_path, 'rb') as f, gzip.GzipFile(fileobj=f) as bytestream:
        buf = bytestream.read(num_examples*28*28 + 16)
        dt = np.dtype(np.uint8)
        temp = np.frombuffer(buf, dtype=dt) 
        image = temp[16:]
        if is_reshape:
            image = image.reshape((num_examples,28*28))
        else:
            image = image.reshape((num_examples, 28, 28, 1))
        # Normalize to [0, 1] and return float32, as promised in the docstring
        image = (image / 255.0).astype(np.float32)
    print(image.shape)

    ## TODO: read the label file
    with open(labels_file_path, 'rb') as f, gzip.GzipFile(fileobj=f) as bytestream:
        buf = bytestream.read(num_examples + 8)
        dt = np.dtype(np.uint8)
        temp = np.frombuffer(buf, dtype=dt) 
        label = temp[8:]

    return image, label
    
## THE REST ARE OPTIONAL!

'''
def shuffle_data(image_full, label_full, seed):
    
    pass
    
def get_subset(image_full, label_full, class_list=list(range(10)), num=100):
    pass
'''
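
The byte offsets used in get_data_MNIST come from the IDX layout: the image file starts with a 16-byte header and the label file with an 8-byte header, followed by raw uint8 values. As a minimal sketch (the helper name parse_idx_images and the in-memory test file are made up for illustration and are not part of the assignment code), the header can also be skipped with the offset argument of np.frombuffer:

```python
import gzip
import io
import struct

import numpy as np


def parse_idx_images(raw_bytes, num_examples, rows=28, cols=28):
    """Hypothetical helper: decode an IDX3 image buffer into float32 rows in [0, 1]."""
    # offset=16 skips the magic number, image count, row count and column count
    pixels = np.frombuffer(raw_bytes, dtype=np.uint8,
                           count=num_examples * rows * cols, offset=16)
    return (pixels.reshape(num_examples, rows * cols) / 255.0).astype(np.float32)


# Build a tiny fake IDX3 file in memory (2 images of 28x28) to exercise the parser
header = struct.pack(">IIII", 2051, 2, 28, 28)  # big-endian header, 16 bytes
body = (np.arange(2 * 28 * 28) % 256).astype(np.uint8).tobytes()
compressed = gzip.compress(header + body)

with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as stream:
    images = parse_idx_images(stream.read(), num_examples=2)

print(images.shape, images.dtype)
```

The same idea applies to the label file with offset=8 and no reshape.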

2. One-hot encoding

This file implements one-hot encoding. The parts that have to be written by hand are:

● fit(): [TODO] In this function, take the data, store its unique labels in self.uniq, and build a dictionary with the labels as keys and their one-hot encodings as values. Hint: np.eye() is worth a look for the one-hot encoding. The dictionary ends up in self.uniq2oh.

● forward(): In this function, we pass in a vector containing all the ground-truth training labels and call fit() to fill the uniq2oh dictionary with the unique labels and their one-hot encodings. The dictionary is then used to return an array of one-hot encoded labels, one for each label in the training set.

This function is already filled in for you!

● inverse(): In this function, we convert one-hot encodings back into the actual labels.

This has already been done for you.

For example, if we have labels X and Y, encoded as [1,0] and [0,1], the dictionary would be {X: [1,0], Y: [0,1]}.

For MNIST there are 10 labels, so your dictionary should have 10 entries!
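
The dictionary described above can be sketched outside the class as follows. This is only an illustration with a made-up label vector; the variable names mirror self.uniq and self.uniq2oh, and the decoding step uses np.argmax rather than the boolean mask used in inverse().

```python
import numpy as np

labels = np.array([3, 0, 2, 3, 0])

# Unique labels in sorted order; row i of the identity matrix encodes uniq[i]
uniq = np.unique(labels)
eye = np.eye(len(uniq))
uniq2oh = {label: eye[i] for i, label in enumerate(uniq)}

# Encode: look up every label. Decode: pick the label at the position of the 1
encoded = np.array([uniq2oh[label] for label in labels])
decoded = uniq[np.argmax(encoded, axis=1)]

print(encoded)
print(decoded)
```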

onehot.py

import numpy as np

from .core import Callable


class OneHotEncoder(Callable):
    """
    One-Hot Encodes labels. First takes in a candidate set to figure out what elements it
    needs to consider, and then one-hot encodes subsequent input datasets in the
    forward pass.

    SIMPLIFICATIONS:
     - Implementation assumes that entries are individual elements.
     - Forward will call fit if it hasn't been done yet; most implementations will just error.
     - keras does not have OneHotEncoder; has LabelEncoder, CategoricalEncoder, and to_categorical()
    """

    def fit(self, data):
        """
        Fits the one-hot encoder to a candidate dataset. Said dataset should contain
        all encounterable elements.

        :param data: 1D array containing labels.
            For example, data = [0, 1, 3, 3, 1, 9, ...]
        """
        ## TODO: Fetch all the unique labels and create a dictionary with
        ## the unique labels as keys and their one hot encodings as values
        ## HINT: look up np.eye() and see if you can utilize it!

        ## HINT: Wouldn't it be nice if we just gave you the implementation somewhere...

        self.uniq = np.unique(data)  # all the unique labels from `data`
        self.uniq2oh = {}  # a lookup dictionary with labels and corresponding encodings
        eye = np.eye(len(self.uniq))
        for i in range(len(self.uniq)):
            self.uniq2oh[self.uniq[i]] = eye[i]
        

    def forward(self, data):
        if not hasattr(self, "uniq2oh"):
            self.fit(data)
        return np.array([self.uniq2oh[x] for x in data])

    def inverse(self, data):
        assert hasattr(self, "uniq"), \
            "forward() or fit() must be called before attempting to invert"
        return np.array([self.uniq[x == 1][0] for x in data])

3. Core abstractions

This file is provided with the assignment and does not need to be modified.

core.py

from abc import ABC, abstractmethod  # # For abstract method support
from typing import Tuple

import numpy as np


## DO NOT MODIFY THIS CLASS
class Callable(ABC):
    """
    Callable Sub-classes:
     - CategoricalAccuracy (./metrics.py)       - TODO
     - OneHotEncoder       (./preprocess.py)    - TODO
     - Diffable            (.)                  - DONE
    """

    def __call__(self, *args, **kwargs) -> np.array:
        """Lets `self()` and `self.forward()` be the same"""
        return self.forward(*args, **kwargs)

    @abstractmethod
    def forward(self, *args, **kwargs) -> np.array:
        """Pass inputs through function. Can store inputs and outputs as instance variables"""
        pass


## DO NOT MODIFY THIS CLASS
class Diffable(Callable):
    """
    Diffable Sub-classes:
     - Dense            (./layers.py)           - TODO
     - LeakyReLU, ReLU  (./activations.py)      - TODO
     - Softmax          (./activations.py)      - TODO
     - MeanSquaredError (./losses.py)           - TODO
    """

    """Stores whether the operation being used is inside a gradient tape scope"""
    gradient_tape = None  ## All-instance-shared variable

    def __init__(self):
        """Is the layer trainable"""
        super().__init__()
        self.trainable = True  ## self-only instance variable

    def __call__(self, *args, **kwargs) -> np.array:
        """
        If there is a gradient tape scope in effect, perform AND RECORD the operation.
        Otherwise... just perform the operation and don't let the gradient tape know.
        """
        if Diffable.gradient_tape is not None:
            Diffable.gradient_tape.operations += [self]
        return self.forward(*args, **kwargs)

    @abstractmethod
    def input_gradients(self: np.array) -> np.array:
        """Returns gradient for input (this part gets specified for all diffables)"""
        pass

    def weight_gradients(self: np.array) -> Tuple[np.array, np.array]:
        """Returns gradient for weights (this part gets specified for SOME diffables)"""
        return ()

    def compose_to_input(self, J: np.array) -> np.array:
        """
        Compose the inputted cumulative jacobian with the input jacobian for the layer.
        Implemented with batch-level vectorization.

        Requires `input_gradients` to provide either batched or overall jacobian.
        Assumes input/cumulative jacobians are matrix multiplied
        """
        #  print(f"Composing to input in {self.__class__.__name__}")
        ig = self.input_gradients()
        batch_size = J.shape[0]
        n_out, n_in = ig.shape[-2:]
        j_new = np.zeros((batch_size, n_out), dtype=ig.dtype)
        for b in range(batch_size):
            ig_b = ig[b] if len(ig.shape) == 3 else ig
            j_new[b] = ig_b @ J[b]
        return j_new

    def compose_to_weight(self, J: np.array) -> list:
        """
        Compose the inputted cumulative jacobian with the weight jacobian for the layer.
        Implemented with batch-level vectorization.

        Requires `weight_gradients` to provide either batched or overall jacobian.
        Assumes weight/cumulative jacobians are element-wise multiplied (w/ broadcasting)
        and the resulting per-batch statistics are averaged together for avg per-param gradient.
        """
        # print(f'Composing to weight in {self.__class__.__name__}')
        assert hasattr(
            self, "weights"
        ), f"Layer {self.__class__.__name__} cannot compose along weight path"
        J_out = []
        ## For every weight/weight-gradient pair...
        for w, wg in zip(self.weights, self.weight_gradients()):
            batch_size = J.shape[0]
            ## Make a cumulative jacobian which will contribute to the final jacobian
            j_new = np.zeros((batch_size, *w.shape), dtype=wg.dtype)
            ## For every element in the batch (for a single batch-level gradient updates)
            for b in range(batch_size):
                ## If the weight gradient is a batch of transform matrices, get the right entry.
                ## Allows gradient methods to give either batched or non-batched matrices
                wg_b = wg[b] if len(wg.shape) == 3 else wg
                ## Update the batch's Jacobian update contribution
                j_new[b] = wg_b * J[b]
            ## The final jacobian for this weight is the average gradient update for the batch
            J_out += [np.mean(j_new, axis=0)]
        ## After new jacobian is computed for each weight set, return the list of gradient updatates
        return J_out


class GradientTape:

    def __init__(self):
        ## Log of operations that were performed inside tape scope
        self.operations = []

    def __enter__(self):
        # When tape scope is entered, let Diffable start recording to self.operation
        Diffable.gradient_tape = self
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # When tape scope is exited, stop letting Diffable record
        Diffable.gradient_tape = None

    def gradient(self) -> list:
        """Get the gradient from first to last recorded operation"""
        ## TODO:
        ##
        ##  Compute weight gradients for all operations.
        ##  If the model has trainable weights [w1, b1, w2, b2] and ends at a loss L.
        ##  the model should return: [dL/dw1, dL/db1, dL/dw2, dL/db2]
        ##
        ##  Recall that self.operations is populated by Diffable class instances...
        ##
        ##  Start from the last operation and compute jacobian w.r.t input.
        ##  Continue to propagate the cumulative jacobian through the layer inputs
        ##  until all operations have been differentiated through.
        ##
        ##  If an operation that has weights is encountered along the way,
        ##  compute the weight gradients and add them to the return list.
        ##  Remember to check if the layer is trainable before doing this though...

        grads = []
        return grads

4. Network layers

This layer mimics Dense in Keras. The following functions have to be written by hand:

● forward(): [TODO] Implement the forward pass and return the outputs.

● weight_gradients(): [TODO] Compute the gradients with respect to the weights and biases. These are used to optimize the layer.

● input_gradients(): [TODO] Compute the gradients with respect to the layer inputs. These are used to propagate the gradient to the preceding layers.

● _initialize_weight(): [TODO]

Initialize the weight values for the dense layer. By default, all weights are initialized to zero (which, by the way, is generally a bad idea). You should also allow for more sophisticated options (when the initializer is set to normal, xavier, or kaiming). Follow the mathematical assumptions of Keras!

○ Normal: the unit normal distribution, which is self-explanatory.

○ Xavier Normal: based on keras.GlorotNormal.

○ Kaiming He Normal: based on keras.HeNormal.

You may find np.random.normal useful when implementing these. The assignment handout explains why these different initialization methods are needed; for more details, see this site. Feel free to add more initialization options!
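
As a rough sketch of the three non-zero options, the standard deviations below follow the Keras definitions: Glorot normal uses sqrt(2 / (fan_in + fan_out)) and He normal uses sqrt(2 / fan_in). Keras samples from a truncated normal; using a plain normal here is a simplification, and the helper name init_weights is hypothetical.

```python
import numpy as np


def init_weights(initializer, fan_in, fan_out, rng):
    """Hypothetical helper: sample a (fan_in, fan_out) weight matrix."""
    if initializer == "zero":
        return np.zeros((fan_in, fan_out))
    if initializer == "normal":
        std = 1.0
    elif initializer == "xavier":   # Glorot normal
        std = np.sqrt(2.0 / (fan_in + fan_out))
    elif initializer == "kaiming":  # He normal
        std = np.sqrt(2.0 / fan_in)
    else:
        raise ValueError(f"unknown initializer '{initializer}'")
    # Plain (untruncated) normal: a simplification of what Keras does
    return rng.normal(loc=0.0, scale=std, size=(fan_in, fan_out))


rng = np.random.default_rng(0)
w_xavier = init_weights("xavier", 784, 100, rng)
w_kaiming = init_weights("kaiming", 784, 100, rng)
print(w_xavier.std(), w_kaiming.std())
```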

layers.py

import numpy as np

from .core import Diffable


class Dense(Diffable):

    # https://towardsdatascience.com/weight-initialization-in-neural-networks-a-journey-from-the-basics-to-kaiming-954fb9b47c79

    def __init__(self, input_size, output_size, learning_rate=0.01, initializer="kaiming"):
        super().__init__()
        self.w, self.b = self.__class__._initialize_weight(
            initializer, input_size, output_size
        )
        self.weights = [self.w, self.b]
        self.learning_rate = learning_rate
        self.inputs  = None
        self.outputs = None

    def forward(self, inputs):
        """Forward pass for a dense layer! Refer to lecture slides for how this is computed."""
        self.inputs = inputs

        # TODO: implement the forward pass and return the outputs
        self.outputs = np.matmul(inputs, self.w) + self.b
        return self.outputs

    def weight_gradients(self, eta):
        """Calculating the gradients wrt weights and biases!"""
        # TODO: Implement calculation of gradients
        wgrads = np.dot(self.inputs.T, eta)
        bgrads = np.sum(eta, axis=0)
        return wgrads, bgrads

    def input_gradients(self, eta):
        """Calculating the gradients wrt inputs!"""
        # TODO: Implement calculation of gradients
        inputgrads = np.dot(eta, self.w.T)
        wgrads, bgrads = self.weight_gradients(eta)
        self.w = self.w - self.learning_rate*wgrads
        self.b = self.b - self.learning_rate*bgrads
        return inputgrads

    @staticmethod
    def _initialize_weight(initializer, input_size, output_size):
        """
        Initializes the values of the weights and biases. The bias weights should always start at zero.
        However, the weights should follow the given distribution defined by the initializer parameter
        (zero, normal, xavier, or kaiming). You can do this with an if statement
        cycling through each option!

        Details on each weight initialization option:
            - Zero: Weights and biases contain only 0's. Generally a bad idea since the gradient update
            will be the same for each weight so all weights will have the same values.
            - Normal: Weights are initialized according to a normal distribution.
            - Xavier: Goal is to initialize the weights so that the variance of the activations are the
            same across every layer. This helps to prevent exploding or vanishing gradients. Typically
            works better for layers with tanh or sigmoid activation.
            - Kaiming: Similar purpose as Xavier initialization. Typically works better for layers
            with ReLU activation.
        """
        initializer = initializer.lower()
        assert initializer in (
            "zero",
            "normal",
            "xavier",
            "kaiming",
        ), f"Unknown dense weight initialization strategy '{initializer}' requested"
        io_size = (input_size, output_size)

        # TODO: Implement default assumption: zero-init for weights and bias
        initial_b = np.zeros((1,output_size))
        if initializer=="zero":
            initial_w = np.zeros(io_size)
        # TODO: Implement remaining options (normal, xavier, kaiming initializations). Note that
        # strings must be exactly as written in the assert above
        elif initializer=="normal":
            initial_w = np.random.randn(input_size, output_size)
            
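        elif initializer=="xavier":
            # Glorot/Xavier normal (as in Keras): stddev = sqrt(2 / (fan_in + fan_out))
            initial_w = np.random.randn(input_size, output_size) * np.sqrt(2 / (input_size + output_size))
        
        elif initializer=="kaiming":
            # He/Kaiming normal (as in Keras): stddev = sqrt(2 / fan_in)
            initial_w = np.random.randn(input_size, output_size) * np.sqrt(2 / input_size)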

        return initial_w, initial_b
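
The gradient formulas used in weight_gradients() and input_gradients() can be checked numerically. The sketch below is independent of the Dense class: it rebuilds the same matrix expressions on small random arrays and compares single entries with a central finite difference. All names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))      # batch of 4 inputs with 3 features
w = rng.normal(size=(3, 2))      # weights of a 3 -> 2 dense layer
b = np.zeros((1, 2))
eta = rng.normal(size=(4, 2))    # upstream gradient dL/d(outputs)


def scalar_loss(x, w, b):
    # A loss whose gradient with respect to the layer outputs is exactly `eta`
    return np.sum((x @ w + b) * eta)


# Analytic gradients, same formulas as in the Dense layer
w_grad = x.T @ eta
b_grad = eta.sum(axis=0)
x_grad = eta @ w.T

# Central finite difference for one weight entry and one input entry
eps = 1e-6
w_plus, w_minus = w.copy(), w.copy()
w_plus[1, 0] += eps
w_minus[1, 0] -= eps
w_numeric = (scalar_loss(x, w_plus, b) - scalar_loss(x, w_minus, b)) / (2 * eps)

x_plus, x_minus = x.copy(), x.copy()
x_plus[2, 1] += eps
x_minus[2, 1] -= eps
x_numeric = (scalar_loss(x_plus, w, b) - scalar_loss(x_minus, w, b)) / (2 * eps)

print(w_grad[1, 0], w_numeric)
print(x_grad[2, 1], x_numeric)
```

Note that these weight gradients are summed over the batch rather than averaged, which matches the listing above.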

5. Activation functions

This file implements the LeakyReLU and Softmax activation functions. Both the forward pass [def forward] and the backward pass [def input_gradients] are written by hand:

● LeakyReLU()

○ forward(): [TODO] Given an input x, compute and return LeakyReLU(x).

○ input_gradients(): [TODO] Compute and return the partial derivatives of LeakyReLU with respect to its input.

● Softmax(): (2470 ONLY)

○ forward(): [TODO] Given an input x, compute and return Softmax(x). Make sure you use a stable softmax, i.e. subtract the maximum from all terms, to avoid overflow/underflow problems.

○ input_gradients(): [TODO] Partial derivatives of Softmax() with respect to its input.
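
The two forward passes and the LeakyReLU derivative can be sketched as plain functions. This is an illustration under the definitions above, not the class-based code of the assignment; the function names are hypothetical.

```python
import numpy as np


def leaky_relu(x, alpha=0.3):
    # Element-wise: keep positive values, scale the rest by alpha
    return np.where(x > 0, x, alpha * x)


def leaky_relu_grad(x, alpha=0.3):
    # The derivative is 1 for positive inputs and alpha otherwise
    return np.where(x > 0, 1.0, alpha)


def stable_softmax(x):
    # Subtracting the row maximum leaves the result unchanged but keeps exp() finite
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)


x = np.array([[-2.0, 0.5], [1000.0, 1001.0]])
print(leaky_relu(x))
print(stable_softmax(x))
```

Without the shift, np.exp(1000.0) overflows to inf and the second row would come out as nan.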

activations.py

import numpy as np

from .core import Diffable


class LeakyReLU(Diffable):
    def __init__(self, alpha=0.3):
        super().__init__()
        self.alpha = alpha
        self.inputs = None
        self.outputs = None

    def forward(self, inputs):
        # TODO: Given an input array `x`, compute LeakyReLU(x)
        self.inputs = inputs
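        # Your code here:
        # Element-wise: keep positive inputs, scale the others by alpha
        self.outputs = np.where(inputs > 0, inputs, inputs * self.alpha)
        return self.outputs

    def input_gradients(self, eta):
        # TODO: Compute and return the gradients
        # The slope is 1 where the input was positive and alpha elsewhere
        return eta * np.where(self.inputs > 0, 1.0, self.alpha)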

    def compose_to_input(self, J):
        # TODO: Maybe you'll want to override the default?
        return super().compose_to_input(J)


class ReLU(LeakyReLU):
    def __init__(self):
        super().__init__(alpha=0)


class Softmax(Diffable):
    def __init__(self):
        super().__init__()
        self.inputs = None
        self.outputs = None

    def forward(self, inputs):
        """Softmax forward pass!"""
        # TODO: Implement
        # HINT: Use stable softmax, which subtracts maximum from
        # all entries to prevent overflow/underflow issues
        self.inputs = inputs
        # Your code here:
        z = inputs - np.max(inputs, axis=-1,keepdims=True)
        numerator = np.exp(z)
        # Normalize each row separately, not over the whole batch
        denominator = np.sum(numerator, axis=-1, keepdims=True)
        self.outputs = numerator/denominator
        return self.outputs

    def input_gradients(self, etc):
        """Softmax backprop!"""
        # TODO: Compute and return the gradients
        
        return etc
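
Softmax.input_gradients() above simply passes the gradient through, which relies on softmax being paired with a cross-entropy loss whose combined gradient is (softmax output minus target). A small numerical sketch of why that holds, for a single example with made-up values:

```python
import numpy as np


def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()


z = np.array([1.0, -0.5, 2.0])   # raw scores for one example
y = np.array([0.0, 0.0, 1.0])    # one-hot target

s = softmax(z)
# Softmax Jacobian: J[i, j] = d s_i / d z_j = s_i * (delta_ij - s_j)
jacobian = np.diag(s) - np.outer(s, s)
# Gradient of the cross-entropy -sum(y * log(s)) with respect to s
dloss_ds = -y / s
# Chain rule; the Jacobian is symmetric, so J @ dloss_ds gives dL/dz
dloss_dz = jacobian @ dloss_ds

print(dloss_dz, s - y)
```

Both printed vectors agree, so the full Jacobian never has to be formed when the two operations are fused.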

☆ 6. Filling in the model

This file hand-writes the SequentialModel class, modeled on the sequential model in Keras. SequentialModel inherits from the Model class, so we first implement Model as follows:

● compile(): Initializes the model's optimizer, loss function, and accuracy function, which are passed in as parameters for the SequentialModel instance to use.

● fit(): Trains the model to associate inputs with outputs. Training repeats for every epoch, and the data is batched according to the parameters. It also computes batch_metrics, epoch_metrics, and the aggregated agg_metrics, which can be used to track the model's training progress.

● evaluate(): [TODO] Evaluates the performance of the final model in the testing phase, using the metrics mentioned above. It is almost identical to fit() (think about what differs between training and testing).

● call(): [TODO] Hint: what does it mean to call a sequential model? Remember that a sequential model is a stack of layers, and each layer has exactly one input vector and one output vector. You can do this in the SequentialModel class in assignment.py.

● batch_step(): [TODO] You will see fit() calling this function for every batch. It first computes the model's predictions for the input batch. During the training phase, you need to compute gradients and update the weights according to the optimizer you are using. For backpropagation during training, you will use GradientTape from the core abstractions (core.py) to record operations and intermediate values. You will then use the model's optimizer to apply the gradients to the model's trainable variables. Finally, compute and return the loss and accuracy for the batch. You can do this in the SequentialModel class in assignment.py.

model.py

from abc import ABC, abstractmethod
from collections import defaultdict

import numpy as np

from .core import Diffable


def print_stats(stat_dict, b=None, b_num=None, e=None, avg=False):
    """
    Given a dictionary of names statistics and batch/epoch info,
    print them in an appealing manner. If avg, display stat averages.
    """
    title_str = " - "
    if e is not None:
        title_str += f"Epoch {e+1:2}: "
    if b is not None:
        title_str += f"Batch {b+1:3}"
        if b_num is not None:
            title_str += f"/{b_num}"
    if avg:
        title_str += f"Average Stats"
    print(f"\r{title_str} : ", end="")
    op = np.mean if avg else lambda x: x
    print({k: np.round(op(v), 4) for k, v in stat_dict.items()}, end="")
    print("   ", end="" if not avg else "\n")
    

def update_metric_dict(super_dict, sub_dict):
    """
    Appends the average of the sub_dict metrics to the super_dict's metric list
    """
    for k, v in sub_dict.items():
        super_dict[k] += [np.mean(v)]


class Model(ABC):
    ###############################################################################################
    ## BEGIN GIVEN

    def __init__(self, layers):
        """
        Initialize all trainable parameters and take layers as inputs
        """
        # Initialize all trainable parameters
        assert all([issubclass(layer.__class__, Diffable) for layer in layers])
        self.layers = layers[:-1]
        self.trainable_variables = []
        for layer in layers:
            if hasattr(layer, "weights") and layer.trainable:
                for weight in layer.weights:
                    self.trainable_variables += [weight]

    def compile(self, optimizer, loss_fn, acc_fn):
        """
        "Compile" the model by taking in the optimizers, loss, and accuracy functions.
        In more optimized DL implementations, this will have more involved processes
        that make the components extremely efficient but very inflexible.
        """
        self.optimizer = optimizer
        self.compiled_loss = loss_fn
        self.compiled_acc = acc_fn

    def fit(self, x, y, epochs, batch_size):
        """
        Trains the model by iterating over the input dataset and feeding input batches
        into the batch_step method with training. At the end, the metrics are returned.
        """
        agg_metrics = defaultdict(lambda: [])
        batch_num = x.shape[0] // batch_size
        for e in range(epochs):
            epoch_metrics = defaultdict(lambda: [])
            for b, b1 in enumerate(range(batch_size, x.shape[0] + 1, batch_size)):
                b0 = b1 - batch_size
                batch_metrics = self.batch_step(x[b0:b1], y[b0:b1], training=True)
                update_metric_dict(epoch_metrics, batch_metrics)
                print_stats(batch_metrics, b, batch_num, e)
            update_metric_dict(agg_metrics, epoch_metrics)
            print_stats(epoch_metrics, e=e, avg=True)
        return agg_metrics

    def evaluate(self, x, y, batch_size):
        """
        X is the dataset inputs, Y is the dataset labels.
        Evaluates the model by iterating over the input dataset in batches and feeding input batches
        into the batch_step method. At the end, the metrics are returned. Should be called on
        the testing set to evaluate accuracy of the model using the metrics output from the fit method.

        NOTE: This method is almost identical to fit (think about how training and testing differ --
        the core logic should be the same)
        """
        # TODO: Implement evaluate similarly to fit.
        agg_metrics = defaultdict(lambda: [])
        batch_num = x.shape[0] // batch_size
        for e in range(1):
            epoch_metrics = defaultdict(lambda: [])
            for b, b1 in enumerate(range(batch_size, x.shape[0] + 1, batch_size)):
                b0 = b1 - batch_size
                batch_metrics = self.batch_step(x[b0:b1], y[b0:b1], training=False)
                update_metric_dict(epoch_metrics, batch_metrics)
                print_stats(batch_metrics, b, batch_num, e)
            update_metric_dict(agg_metrics, epoch_metrics)
            print_stats(epoch_metrics, e=e, avg=True)
        
        return agg_metrics

    @abstractmethod
    def call(self, inputs):
        """You will implement this in the SequentialModel class in assignment.py"""
        return

    @abstractmethod
    def batch_step(self, x, y, training=True):
        """You will implement this in the SequentialModel class in assignment.py"""
        return
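
The batch indexing shared by fit() and evaluate() is easy to misread, so here is the same loop in isolation on a toy array. With 10 examples and a batch size of 4, only two full batches are produced and the trailing partial batch is skipped.

```python
import numpy as np

x = np.arange(10).reshape(10, 1)   # 10 toy examples
batch_size = 4

batches = []
# b1 is the exclusive end index of each batch; the range stops before a partial batch
for b, b1 in enumerate(range(batch_size, x.shape[0] + 1, batch_size)):
    b0 = b1 - batch_size
    batches.append((b, b0, b1))

print(batches)   # examples 8 and 9 are never visited
```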

☆ Analysis of def batch_step():

 y_pre = self.call(x)
: the forward pass through the network produces the predicted values.

 loss = self.compiled_loss.forward(y_pre, y)
: the predicted and true values are fed into the loss function, whose forward pass yields the loss value.

acc = self.compiled_acc(y_pre, y)
: the predicted and true values are fed into the accuracy function, whose forward pass yields the accuracy.

What each function contributes to backpropagation:

Activation function: after the input from the previous layer has been transformed by the current layer, the output is obtained by passing it through the non-linear activation function. Common activation functions include sigmoid, tanh, ReLU, etc.

Loss function: a way of measuring the difference between the network's predicted output and the true value. Common loss functions include the least-squares loss, the cross-entropy loss, the smooth L1 loss used in regression, etc.

Optimizer: the rule that turns the gradients propagated back from the loss at the outermost layer into updates of the weights. Examples include basic gradient descent, stochastic gradient descent, batch gradient descent, gradient descent with momentum, Adagrad, Adadelta, Adam, etc.

Loss function
eta = self.compiled_loss.input_gradients()
: backpropagating through the loss function gives the initial gradient.

Layers and activation functions
for layer in self.layers[::-1]:

        eta = layer.input_gradients(eta)
: the gradient is propagated backward through each layer of the network.

Optimizer
 if training:

            self.optimizer.apply_gradients(self.trainable_variables[0], self.trainable_variables[1])
: after one forward pass and one backward pass, the weights and biases are handed to the optimizer.

☆ class SequentialModel in assignment.py

class SequentialModel(Beras.Model):
    """
    Implemented in Beras/model.py

    def __init__(self, layers):
    def compile(self, optimizer, loss_fn, acc_fn):
    def fit(self, x, y, epochs, batch_size):
    def evaluate(self, x, y, batch_size):           ## <- TODO
    """

    def call(self, inputs):
        """
        Forward pass in sequential model. It's helpful to note that layers are initialized in Beras.Model, and
        you can refer to them with self.layers. You can call a layer by doing var = layer(input).
        """
        # TODO: The call function!
        for layer in self.layers:
            inputs = layer.forward(inputs)
        return inputs

    def batch_step(self, x, y, training=True):
        """
        Computes loss and accuracy for a batch. This step consists of both a forward and backward pass.
        If training=false, don't apply gradients to update the model! 
        Most of this method (forward, loss, applying gradients)
        will take place within the scope of Beras.GradientTape()
        """
        # TODO: Compute loss and accuracy for a batch.
        # If training, then also update the gradients according to the optimizer
        y_pre = self.call(x)
        loss = self.compiled_loss.forward(y_pre, y)
        acc = self.compiled_acc(y_pre, y)

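        if training:
            # Backward pass. Dense.input_gradients() also applies the weight update,
            # so it must only run while training; otherwise evaluate() would
            # change the weights.
            eta = self.compiled_loss.input_gradients()
            for layer in self.layers[::-1]:
                eta = layer.input_gradients(eta)
            # NOTE: this call passes the first layer's (w, b), not (weights, grads);
            # the effective update has already been applied in the loop above.
            self.optimizer.apply_gradients(self.trainable_variables[0], self.trainable_variables[1])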
        return {"loss": loss, "acc": acc}
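
To see the whole forward, loss, backward, update cycle in one place, here is a self-contained sketch of a batch step for a single dense layer with softmax and cross-entropy on toy data. It does not use the Beras classes; the params dictionary, the toy dataset, and the plain gradient-descent update are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 2 well-separated classes in 2-D, with one-hot labels
x = np.vstack([rng.normal(-2.0, 0.5, size=(20, 2)), rng.normal(2.0, 0.5, size=(20, 2))])
y = np.zeros((40, 2))
y[:20, 0] = 1.0
y[20:, 1] = 1.0

params = {"w": rng.normal(scale=0.1, size=(2, 2)), "b": np.zeros((1, 2))}
learning_rate = 0.1


def batch_step(x, y, training=True):
    # Forward pass: dense layer followed by a row-wise stable softmax
    logits = x @ params["w"] + params["b"]
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    loss = -np.sum(y * np.log(probs + 1e-12)) / len(x)
    acc = np.mean(probs.argmax(axis=-1) == y.argmax(axis=-1))
    if training:
        # Backward pass: softmax + cross-entropy gives (probs - y) / batch size
        eta = (probs - y) / len(x)
        w_grad = x.T @ eta
        b_grad = eta.sum(axis=0, keepdims=True)
        # Plain gradient-descent update, applied only while training
        params["w"] -= learning_rate * w_grad
        params["b"] -= learning_rate * b_grad
    return {"loss": loss, "acc": acc}


history = [batch_step(x, y, training=True) for _ in range(50)]
final = batch_step(x, y, training=False)
print(history[0]["loss"], final["loss"], final["acc"])
```

Calling batch_step with training=False leaves params untouched, which is the property evaluate() depends on.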

7. Función de pérdida

Este es uno de los aspectos más críticos del entrenamiento de modelos. En esta tarea, en lugar de implementar el MSE o la función de pérdida de error cuadrático medio como se describe en el experimento, elegí la función de pérdida CrossEntropyLoss . Porque después de los experimentos, el efecto de las otras dos funciones de pérdida no es satisfactorio.

Nota: Generalmente, la propagación hacia atrás de SoftMax se realiza junto con la función de pérdida CrossEntropyLoss, por lo tanto, no complete la parte de propagación de dirección de SoftMax.

● forward() : [TODO] Escriba una función que calcule y devuelva la media dado el error cuadrático de las etiquetas predichas y reales.

Sugerencia: ¿Qué es MSE? El error cuadrático medio es la diferencia entre el valor pronosticado y el valor real, dadas las etiquetas pronosticadas y reales.

● input_gradients() : [TODO] Calcular y devolver gradientes. Use una fórmula que derive estos gradientes por diferenciación.

losses.py

from abc import ABCMeta, abstractmethod

import numpy as np

from .core import Diffable

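class CrossEntropyLoss(Diffable):
    def __init__(self):
        # Softmax is applied inside the loss so both can be differentiated together.
        self.classifier = Softmax()

    def input_gradients(self):
        # Gradient cached by the most recent forward() call.
        return self.grad

    def forward(self, a, y):
        a = self.classifier.forward(a)
        # Combined softmax + cross-entropy gradient w.r.t. the pre-softmax inputs.
        self.grad = a - y
        # Sum of -y * log(a) over all entries, averaged over the batch.
        loss = -1 * np.einsum('ij,ij->', y, np.log(a), optimize=True) / y.shape[0]
        return loss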

class Layer(metaclass=ABCMeta):

    @abstractmethod
    def forward(self, *args):
        pass

    @abstractmethod
    def backward(self, *args):
        pass
    
class Softmax(Layer):
    def forward(self, x):
        v = np.exp(x - x.max(axis=-1, keepdims=True))    
        return v / v.sum(axis=-1, keepdims=True)
    
    def backward(self, eta):
        pass
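The reason Softmax and the cross-entropy loss are differentiated together is that their combined gradient with respect to the pre-softmax inputs collapses to softmax(z) - y. The sketch below checks that identity against central finite differences. All function names here are illustrative. Because the loss is averaged over the batch, the analytic gradient carries a 1/batch_size factor; the class above stores a - y without that factor, which differs only by a constant scale.

```python
import numpy as np


def softmax(z):
    # Subtract the row maximum for numerical stability.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy_from_logits(z, y):
    """Mean cross entropy over the batch, computed from raw logits."""
    return -np.sum(y * np.log(softmax(z))) / y.shape[0]


def grad_wrt_logits(z, y):
    """Analytic gradient of the mean loss: (softmax(z) - y) / batch_size."""
    return (softmax(z) - y) / y.shape[0]


rng = np.random.default_rng(0)
z = rng.normal(size=(3, 4))
y = np.eye(4)[[0, 2, 1]]  # one-hot labels for classes 0, 2, 1

# Central finite differences as an independent check.
eps = 1e-6
numeric = np.zeros_like(z)
for i in range(z.shape[0]):
    for j in range(z.shape[1]):
        zp, zm = z.copy(), z.copy()
        zp[i, j] += eps
        zm[i, j] -= eps
        numeric[i, j] = (cross_entropy_from_logits(zp, y)
                         - cross_entropy_from_logits(zm, y)) / (2 * eps)

matches = np.allclose(numeric, grad_wrt_logits(z, y), atol=1e-6)
print(matches)
```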

8. Optimizer

For the MNIST dataset, RMSProp alone is entirely sufficient, so this article implements only this optimizer.

● RMSProp : [TODO] Root mean square propagation.

optimizers.py

from collections import defaultdict
import numpy as np

class RMSProp:
    def __init__(self, learning_rate, beta=0.9, epsilon=1e-6):
        self.learning_rate = learning_rate

        self.beta = beta
        self.epsilon = epsilon

        self.v = defaultdict(lambda: 0)

    def apply_gradients(self, weights, grads):
        # TODO: Implement RMSProp optimization
        # Refer to the lab on Optimizers for a better understanding!
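        # Exponential moving average of the squared gradients.
        self.mean_square = self.beta * self.v['mean_square'] + (1 - self.beta) * grads ** 2
        self.v['mean_square'] = self.mean_square
        # Update in place: `weights = weights - ...` would only rebind the local
        # name and leave the caller's array unchanged (assumes a NumPy array).
        weights -= self.learning_rate / (np.sqrt(self.mean_square) + self.epsilon) * grads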
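To see the RMSProp rule in isolation, here is a small standalone sketch that minimizes a quadratic. rmsprop_update is a hypothetical helper, not the Beras class, and it assumes the weights and the running average are NumPy arrays that can be modified in place.

```python
import numpy as np


def rmsprop_update(w, grad, mean_square, lr=0.01, beta=0.9, eps=1e-6):
    """One RMSProp step; `w` and `mean_square` are modified in place."""
    mean_square *= beta
    mean_square += (1.0 - beta) * grad ** 2
    w -= lr * grad / (np.sqrt(mean_square) + eps)
    return w


# Minimize f(w) = sum(w**2), whose gradient is 2 * w.
w = np.array([1.0, -2.0, 3.0])
mean_square = np.zeros_like(w)
start = np.sum(w ** 2)
for _ in range(1000):
    rmsprop_update(w, 2.0 * w, mean_square)
print(start, np.sum(w ** 2))
```

Because each step is normalized by the running root mean square of the gradient, every coordinate moves by roughly the learning rate per step regardless of the gradient's magnitude, so the iterate ends up hovering close to the minimum rather than settling exactly on it.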

9. Accuracy metric

This article implements a single accuracy metric to measure the model's performance:

● forward() : [TODO] Return the model's classification accuracy given the predicted probabilities and the true labels. It should return the proportion of predicted labels that equal the true labels, where the predicted label for an image is the label with the highest probability. See the web or the lecture slides for the math behind classification accuracy!

metrics.py

import numpy as np

from .core import Callable


class CategoricalAccuracy(Callable):
    def forward(self, probs, labels):
        """Categorical accuracy forward pass!"""
        # Fraction of samples whose highest-probability class equals the true class.
        predicted = np.argmax(probs, axis=1)
        actual = np.argmax(labels, axis=1)
        return np.mean(predicted == actual)
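A small worked example of the same computation, written as a plain function (categorical_accuracy is an illustrative name) so that it runs without the Beras package:

```python
import numpy as np


def categorical_accuracy(probs, labels):
    """Fraction of rows whose arg-max prediction matches the one-hot label."""
    predicted = np.argmax(probs, axis=1)
    actual = np.argmax(labels, axis=1)
    return np.mean(predicted == actual)


probs = np.array([
    [0.7, 0.2, 0.1],    # predicts class 0
    [0.1, 0.3, 0.6],    # predicts class 2
    [0.2, 0.5, 0.3],    # predicts class 1
    [0.4, 0.35, 0.25],  # predicts class 0
])
labels = np.eye(3)[[0, 2, 2, 0]]  # true classes 0, 2, 2, 0
print(categorical_accuracy(probs, labels))  # 0.75
```

Three of the four predictions match the true class, so the accuracy is 0.75.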

10. Training and testing

Two models are built in the style of Keras (in the code below the functions are named get_simple_model_components() and get_advanced_model_components()):

● A simple model in get_simple_model() with at most one Diffable layer (e.g. Dense in ./layers.py) and one activation function (in ./activations.py). This one is provided by default, though you can change it if you want. The autograder will evaluate the original one!

● A slightly more complex model in get_advanced_model(), with two or more Diffable layers and two or more activation functions. We recommend the Adam optimizer for this model, with a fairly low learning rate.

def get_simple_model() in assignment.py

def get_simple_model_components():
    """
    Returns a simple single-layer model.
    """
    ## DO NOT CHANGE IN FINAL SUBMISSION

    from Beras.activations import Softmax
    from Beras.layers import Dense
    from Beras.metrics import CategoricalAccuracy
    from Beras.optimizers import BasicOptimizer, RMSProp
    from Beras.losses import CrossEntropyLoss, MeanSquaredError, CategoricalCrossentropy

    # TODO: create a model and compile it with layers and functions of your choice
    model = SequentialModel([Dense(784, 10), Softmax()])
    model.compile(
        optimizer=RMSProp(0.02),
        loss_fn=CrossEntropyLoss(),
        acc_fn=CategoricalAccuracy(),
    )
    return SimpleNamespace(model=model, epochs=10, batch_size=100)

get_advanced_model() in assignment.py

def get_advanced_model_components():
    """
    Returns a multi-layered model with more involved components.
    """
    from Beras.activations import Softmax, LeakyReLU
    from Beras.layers import Dense
    from Beras.metrics import CategoricalAccuracy
    from Beras.losses import CrossEntropyLoss, MeanSquaredError, CategoricalCrossentropy
    from Beras.optimizers import BasicOptimizer, RMSProp
    from Beras.batchnorm import BatchNorm
    # TODO: create/compile a model with layers and functions of your choice.
    model = SequentialModel([Dense(784, 398), BatchNorm(398), LeakyReLU(0), Dense(398, 10), Softmax()])
    model.compile(
        optimizer=RMSProp(0.02),
        loss_fn=CrossEntropyLoss(),
        acc_fn=CategoricalAccuracy(),
    )
    return SimpleNamespace(model=model, epochs=12, batch_size=100)
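Both functions return epochs and batch_size alongside the model, and these values are later passed to fit(). The sketch below shows one way such a training loop might consume them. It is an assumption about the general shape of the loop, not the actual code in Beras/model.py; iterate_minibatches, fit_sketch and dummy_step are hypothetical names.

```python
import numpy as np


def iterate_minibatches(x, y, batch_size):
    """Yield consecutive (x, y) slices; the last batch may be smaller."""
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]


def fit_sketch(batch_step, x, y, epochs, batch_size):
    """Call `batch_step` on every mini-batch and average the metrics per epoch."""
    history = {"loss": [], "acc": []}
    for _ in range(epochs):
        losses, accs = [], []
        for xb, yb in iterate_minibatches(x, y, batch_size):
            metrics = batch_step(xb, yb, training=True)
            losses.append(metrics["loss"])
            accs.append(metrics["acc"])
        history["loss"].append(float(np.mean(losses)))
        history["acc"].append(float(np.mean(accs)))
    return history


# Dummy step standing in for SequentialModel.batch_step.
def dummy_step(xb, yb, training=True):
    return {"loss": float(len(xb)), "acc": 1.0}


x = np.zeros((250, 784))
y = np.zeros((250, 10))
history = fit_sketch(dummy_step, x, y, epochs=2, batch_size=100)
print(history)
```

With 250 samples and a batch size of 100, each epoch sees batches of 100, 100 and 50 samples, so a loop like this has to tolerate a short final batch.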

11. Visualizing the results

The visualize_metrics function is provided to plot, with matplotlib, how your loss and accuracy change over time.

visualize.py

import matplotlib.pyplot as plt
import numpy as np


def visualize_metrics(losses=(), accuracies=()):
    """
    param losses: a 1D array of loss values
    param accuracies: a 1D array of accuracy values

    Displays a plot with loss and accuracy values on the y-axis and batch number/epoch number on the
    x-axis
    """
    # len() works for lists and arrays; truth-testing a multi-element NumPy array raises ValueError.
    if len(losses) == 0 or len(accuracies) == 0:
        print("Must provide a list of losses/accuracies to visualize")
        return
    # Each series gets its own x range, so differing lengths still plot.
    plt.plot(np.arange(1, len(losses) + 1), losses, label="loss")
    plt.plot(np.arange(1, len(accuracies) + 1), accuracies, label="accuracy")
    plt.ylabel("Loss/Acc Value")
    plt.legend()
    plt.show()


def visualize_images(model, train_inputs, train_labels_ohe, num_searching=500):
    """
    param model: a neural network model (i.e. SequentialModel)
    param train_inputs: sample training inputs for the model to predict
    param train_labels_ohe: one-hot encoded training labels corresponding to train_inputs

    Displays 10 sample outputs the model correctly classifies and 10 sample outputs the model
    incorrectly classifies
    """

    rand_idx = np.random.choice(len(train_inputs), num_searching)
    rand_batch = train_inputs[rand_idx]
    probs = model.call(rand_batch)

    pred_classes = np.argmax(probs, axis=1)
    true_classes = np.argmax(train_labels_ohe[rand_idx], axis=1)

    right_idx = np.where(pred_classes == true_classes)
    wrong_idx = np.where(pred_classes != true_classes)

    right = np.reshape(rand_batch[right_idx], (-1, 28, 28))
    wrong = np.reshape(rand_batch[wrong_idx], (-1, 28, 28))

    right_pred_labels = true_classes[right_idx]
    wrong_pred_labels = pred_classes[wrong_idx]

    assert len(right) >= 10, "Found fewer than 10 correct predictions!"
    assert len(wrong) >= 10, "Found fewer than 10 incorrect predictions!"

    fig, axs = plt.subplots(2, 10)
    fig.suptitle("Classifications\n(PL = Predicted Label)")

    subsets = [right, wrong]
    pred_labs = [right_pred_labels, wrong_pred_labels]

    for r in range(2):
        for c in range(10):
            axs[r, c].imshow(subsets[r][c], cmap="Greys")
            axs[r, c].set(title=f"PL: {pred_labs[r][c]}")
            plt.setp(axs[r, c].get_xticklabels(), visible=False)
            plt.setp(axs[r, c].get_yticklabels(), visible=False)
            axs[r, c].tick_params(axis="both", which="both", length=0)

    plt.show()

12. Call the code written in the previous 11 steps to train and test the model

assignment.py

from types import SimpleNamespace

import Beras
import numpy as np

class SequentialModel(Beras.Model):
    """
    Implemented in Beras/model.py

    def __init__(self, layers):
    def compile(self, optimizer, loss_fn, acc_fn):
    def fit(self, x, y, epochs, batch_size):
    def evaluate(self, x, y, batch_size):           ## <- TODO
    """

    def call(self, inputs):
        """
        Forward pass in sequential model. It's helpful to note that layers are initialized in Beras.Model, and
        you can refer to them with self.layers. You can call a layer by doing var = layer(input).
        """
        # TODO: The call function!
        for layer in self.layers:
            inputs = layer.forward(inputs)
        return inputs

    def batch_step(self, x, y, training=True):
        """
        Computes loss and accuracy for a batch. This step consists of both a forward and backward pass.
        If training=false, don't apply gradients to update the model! 
        Most of this method (forward, loss, applying gradients)
        will take place within the scope of Beras.GradientTape()
        """
        # TODO: Compute loss and accuracy for a batch.
        # If training, then also update the gradients according to the optimizer
        y_pre = self.call(x)
        loss = self.compiled_loss.forward(y_pre, y)
        acc = self.compiled_acc(y_pre, y)

        eta = self.compiled_loss.input_gradients()
        # backwarding...
        for layer in self.layers[::-1]:
            #print(type(layer))
            eta = layer.input_gradients(eta)

        if training:
            self.optimizer.apply_gradients(self.trainable_variables[0], self.trainable_variables[1])
        return {"loss": loss, "acc": acc}

def get_simple_model_components():
    """
    Returns a simple single-layer model.
    """
    ## DO NOT CHANGE IN FINAL SUBMISSION

    from Beras.activations import Softmax
    from Beras.layers import Dense
    from Beras.metrics import CategoricalAccuracy
    from Beras.optimizers import BasicOptimizer, RMSProp
    from Beras.losses import CrossEntropyLoss, MeanSquaredError, CategoricalCrossentropy

    # TODO: create a model and compile it with layers and functions of your choice
    model = SequentialModel([Dense(784, 10), Softmax()])
    model.compile(
        optimizer=RMSProp(0.02),
        loss_fn=CrossEntropyLoss(),
        acc_fn=CategoricalAccuracy(),
    )
    return SimpleNamespace(model=model, epochs=10, batch_size=100)

def get_advanced_model_components():
    """
    Returns a multi-layered model with more involved components.
    """
    from Beras.activations import Softmax, LeakyReLU
    from Beras.layers import Dense
    from Beras.metrics import CategoricalAccuracy
    from Beras.losses import CrossEntropyLoss, MeanSquaredError, CategoricalCrossentropy
    from Beras.optimizers import BasicOptimizer, RMSProp
    from Beras.batchnorm import BatchNorm
    # TODO: create/compile a model with layers and functions of your choice.
    model = SequentialModel([Dense(784, 398), BatchNorm(398), LeakyReLU(0), Dense(398, 10), Softmax()])
    model.compile(
        optimizer=RMSProp(0.02),
        loss_fn=CrossEntropyLoss(),
        acc_fn=CategoricalAccuracy(),
    )
    return SimpleNamespace(model=model, epochs=12, batch_size=100)

if __name__ == "__main__":
    """
    Read in MNIST data and initialize/train/test your model.
    """
    from Beras.onehot import OneHotEncoder
    import preprocess

    ## Read in MNIST data,
    train_inputs, train_labels = preprocess.get_data_MNIST("train", "../data")
    test_inputs,  test_labels  = preprocess.get_data_MNIST("test",  "../data")

    ## TODO: Use the OneHotEncoder class to one hot encode the labels
    # ohe = lambda x: 0  ## placeholder function: returns zero for a given input
    ohe = OneHotEncoder()
    ohe.fit(train_labels)
    ## Get your model to train and test
    simple = False
    args = get_simple_model_components() if simple else get_advanced_model_components()
    model = args.model

    ## REMINDER: Threshold of accuracy: 
    ##  1470: >85% on testing accuracy from get_simple_model_components
    ##  2470: >95% on testing accuracy from get_advanced_model_components

    # TODO: Fit your model to the training input and the one hot encoded labels
    # Remember to pass all the arguments that SequentialModel.fit() requires
    # such as number of epochs and the batch size
    print('---------------------------[[[Train]]]]---------------------------')
    train_agg_metrics = model.fit(
        train_inputs, 
        ohe(train_labels), 
        epochs     = args.epochs, 
        batch_size = args.batch_size
    )
    print('-------------------------------------------------------------------')
    ## Feel free to use the visualize_metrics function to view your accuracy and loss.
    ## The final accuracy returned during evaluation must be > 80%.

    # from visualize import visualize_images, visualize_metrics
    # visualize_metrics(train_agg_metrics["loss"], train_agg_metrics["acc"])
    # visualize_images(model, train_inputs, ohe(train_labels))

    ## TODO: Evaluate your model using your testing inputs and one hot encoded labels.
    ## This is the number you will be using!
    print('---------------------------[[[Evaluate]]]---------------------------')
    test_agg_metrics = model.evaluate(test_inputs, ohe(test_labels), batch_size=100)
    print('Testing Performance:', test_agg_metrics)
    print('-----------------------------------------------------------------')

I consider this a barely passing submission (not good enough), and the answers provided are for reference only. Best wishes to everyone!
