Python machine learning - stochastic gradient descent

Previously we implemented Adaline with gradient descent. That method uses all of the training samples to update the weight vector, so it is also called batch gradient descent. Now suppose we have a very large data set, say one million samples: with batch gradient descent every single weight update has to run through all one million samples, so training takes a long time and is very inefficient. What we want is a way to keep using gradient descent without having to touch every sample on every weight update, and that is exactly the idea behind stochastic gradient descent.
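
For comparison, a single weight update in batch gradient descent looks like the following minimal sketch (the function name and the eta default are illustrative; X, y, and the weight layout with w[0] as the bias follow the same conventions as the AdalineSGD class further down):

import numpy as np

def batch_update(w, X, y, eta=0.01):
    """One batch gradient descent step: every sample contributes to the update."""
    output = np.dot(X, w[1:]) + w[0]     # net input for all samples at once
    errors = y - output                  # error of every sample
    w[1:] += eta * X.T.dot(errors)       # update uses the whole training set
    w[0] += eta * errors.sum()
    cost = (errors ** 2).sum() / 2.0     # sum-of-squared-errors cost
    return w, cost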

Stochastic gradient descent updates the weight vector using one training sample at a time:
\[\Delta w = \eta (y^{(i)} - \phi(z^{(i)})) x^{(i)}\]
This method usually converges faster than batch gradient descent because the weight vector is updated far more frequently. Since each update is based on a single sample rather than on all samples, the updates are also noisier, which helps the algorithm avoid getting stuck in shallow local minima. Two details matter when using it: the samples must be picked in random order, so the whole training set should be shuffled before every epoch, and the learning rate need not stay fixed during training; letting it decrease as the number of iterations grows helps convergence, as in the sketch below.
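
The implementation below keeps eta fixed, but a decreasing learning rate is easy to add. A minimal sketch of such an annealing schedule (the function name and the constants c1, c2 are illustrative, not part of the original code):

def annealed_eta(epoch, c1=1.0, c2=10.0):
    """Learning rate that shrinks as the epoch number grows."""
    return c1 / (epoch + c2)

# eta for the first three epochs: 0.0909..., 0.0833..., 0.0769...
for epoch in range(1, 4):
    print(epoch, annealed_eta(epoch))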

So far we have batch gradient descent, which uses all the samples, and stochastic gradient descent, which uses a single sample. A compromise between the two is called mini-batch learning: each weight update uses a small subset (a mini-batch) of the training samples, as sketched below.
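
A minimal sketch of one epoch of mini-batch updates, assuming the same weight layout as the AdalineSGD class below (w[0] is the bias) and an illustrative batch size of 32:

import numpy as np

def mini_batch_epoch(w, X, y, eta=0.01, batch_size=32):
    """One epoch of mini-batch learning: shuffle, then update once per batch."""
    idx = np.random.permutation(len(y))
    for start in range(0, len(y), batch_size):
        batch = idx[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        errors = yb - (np.dot(Xb, w[1:]) + w[0])   # errors for this batch only
        w[1:] += eta * Xb.T.dot(errors)
        w[0] += eta * errors.sum()
    return w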

Next, we implement Adaline using stochastic gradient descent.

import numpy as np
from numpy.random import seed
class AdalineSGD(object):
    """ADAptive LInear NEuron classifier.

    Parameters
    ----------
    eta:float
        Learning rate (between 0.0 and 1.0)
    n_iter:int
        Passes over the training dataset.

    Attributes
    ----------
    w_: 1d-array
        weights after fitting.
    cost_: list
        Average cost (sum of squared errors) per sample in every epoch.
    shuffle:bool(default: True)
        Shuffle training data every epoch
        if True to prevent cycles.
    random_state: int(default: None)
        Set random state for shuffling
        and initializing the weights.

    """

    def __init__(self, eta=0.01, n_iter=10, shuffle=True, random_state=None):
        self.eta = eta
        self.n_iter = n_iter
        self.w_initialized = False
        self.shuffle = shuffle
        if random_state:
            seed(random_state)

    def fit(self, X, y):
        """Fit training data.

        :param X:{array-like}, shape=[n_samples, n_features]
        :param y: array-like, shape=[n_samples]
        :return:
        self:object

        """

        self._initialize_weights(X.shape[1])
        self.cost_ = []

        for i in range(self.n_iter):
            if self.shuffle:
                X, y = self._shuffle(X, y)
            cost = []
            for xi, target in zip(X, y):
                cost.append(self._update_weights(xi, target))
            avg_cost = sum(cost)/len(y)
            self.cost_.append(avg_cost)
        return self
    
    def partial_fit(self, X, y):
        """Fit training data without reinitializing the weights."""
        if not self.w_initialized:
            self._initialize_weights(X.shape[1])
        if y.ravel().shape[0] > 1:
            for xi, target in zip(X, y):
                self._update_weights(xi, target)
        else:
            self._update_weights(X, y)
        return self
    
    def _shuffle(self, X, y):
        """Shuffle training data"""
        r = np.random.permutation(len(y))
        return X[r], y[r]
    
    def _initialize_weights(self, m):
        """Initialize weights to zeros"""
        self.w_ = np.zeros(1 + m)
        self.w_initialized = True
    
    def _update_weights(self, xi, target):
        """Apply Adaline learning rule to update the weights"""
        output = self.net_input(xi)
        error = (target - output)
        self.w_[1:] += self.eta * xi.dot(error)
        self.w_[0] += self.eta * error
        cost = 0.5 * error ** 2
        return cost
    
    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.w_[1:]) + self.w_[0]
    
    def activation(self, X):
        """Computer linear activation"""
        return self.net_input(X)
    
    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.activation(X) >= 0.0, 1, -1)

In the _shuffle method, numpy.random.permutation is called to generate a random permutation of the sample indices (0 to 99 for the 100 samples used here). Indexing the feature matrix and the label vector with this permutation shuffles the order of the samples.
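
The class also provides partial_fit, which updates the weights without reinitializing them. This is useful for online learning, where new training data keeps arriving after the initial fit. A hypothetical usage sketch (ada is the classifier trained below; X_new and y_new are illustrative placeholders for newly arrived, already standardized samples):

# After the initial fit, update the model on new data without starting over
ada.partial_fit(X_new, y_new)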

Now we can start training:
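
The training and plotting code below assumes that X_std, y, and plot_decision_region are already defined, as in the previous batch gradient descent post. If they are not, the standardized Iris features (sepal length and petal length of the first 100 samples) can be prepared with a sketch like this, here using scikit-learn's bundled copy of the Iris data:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[:100, [0, 2]]                  # sepal length, petal length
y = np.where(iris.target[:100] == 0, -1, 1)  # Iris-setosa = -1, versicolor = 1

X_std = np.copy(X)                           # standardize each feature
X_std[:, 0] = (X[:, 0] - X[:, 0].mean()) / X[:, 0].std()
X_std[:, 1] = (X[:, 1] - X[:, 1].mean()) / X[:, 1].std()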

ada = AdalineSGD(n_iter=15, eta=0.01, random_state=1)
ada.fit(X_std, y)

After training, we plot the decision boundary and the average cost per epoch:

plot_decision_region(X_std, y, classifier=ada)
plt.title('Adaline - Stochastic Gradient Descent')
plt.xlabel('sepal length [standardized]')
plt.ylabel('petal length [standardized]')
plt.legend(loc = 'upper left')
plt.show()
plt.plot(range(1, len(ada.cost_) + 1), ada.cost_, marker='o')
plt.xlabel('Epochs')
plt.ylabel('Average Cost')
plt.show()

As the figures show, the average cost drops quickly, and after about 15 epochs the decision boundary is very similar to the one obtained with batch gradient descent Adaline.
