News Categorization: A Multi-Class Classification Problem

Earlier we mentioned three very classic problems:

  1. Binary classification (judging whether a movie review is positive or negative)
  2. Multi-class classification (sorting news articles by topic)
  3. Regression (estimating house prices from real-estate data)

In the last article we covered the first of these, binary classification; in this one we cover the multi-class problem. If you have not read the previous article, please read it first, otherwise much of what follows will be hard to understand. Now let's begin today's topic:

The practical setting is this: Reuters newswires are divided into 46 mutually exclusive topics, each article belongs to exactly one of them, and our job is to classify the news automatically. This is not the same as the previous black-or-white, either-or judgment; instead we consider, for every article, the probability that it belongs to each of the topics. With a little thought you will find that, although this problem differs from the last one, it has even more in common with it, and we only need to make a few changes for the specific circumstances.
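
Before going through those changes, here is a quick sanity check of the setup. It is a minimal sketch, separate from the full program at the end of the article: it just loads the same Keras Reuters dataset and confirms that every label is a single integer between 0 and 45, i.e. one of 46 topics (the sizes in the comments are what the standard split returns).

import numpy as np
from keras.datasets import reuters

(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)
print(len(train_data), len(test_data))             # 8982 training and 2246 test newswires
print(np.min(train_labels), np.max(train_labels))  # labels run from 0 to 45: 46 topics in total

The specific changes are explained below; the parts that are the same as before are only described briefly, so if anything is unclear please refer to the previous article: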

  1. The data can be prepared as before: each article is a sequence of word indices, and we turn it into something the network can consume with the same one-hot (multi-hot) vectorization. The difference this time is on the label side: the labels also need to be encoded, because the result is no longer one of two values, so each label becomes a tensor as well (a 46-dimensional one-hot vector).

  2. We still use relu-activated intermediate layers, but the projected space can no longer be 16-dimensional; here it becomes 64. The reason is that there are too many possible outcomes: squeezing information about 46 classes through a 16-dimensional layer would lose too much of it during training and accuracy would drop noticeably, so the layers here have 64 units.

  3. As for the loss function, the binary_crossentropy from the previous article is no longer suitable; for multi-class classification we use categorical_crossentropy (or its variant sparse_categorical_crossentropy), but note that the two expect the labels in different formats, as shown in the sketch after this list.

  4. We still train for 20 epochs at first. As before, overfitting appears, but this time it shows up after the ninth epoch, so we change the number of epochs to nine and retrain the network. The figures are placed before the code, and you can check them there:

    1. The last layer of the network should not use sigmoid activation but softmax, so that it outputs a probability distribution over the categories. This is different from the binary case.
  5. Finally, we train the network; it reaches roughly 80% accuracy.

  6. That could be the end, but two points are worth a look. If we classify the articles with a random classifier, the accuracy is about 19%, whereas in the earlier binary problem a random classifier could reach 50%. Also, the final result for each test sample is a 46-dimensional vector holding the probability of each category; these probabilities add up to 1 (because of floating-point precision, the sum may in fact deviate from 1 by a tiny amount), and the category with the highest probability is our predicted category. Both checks are already in the code below, which you can consult.
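
To make the loss-function point from item 3 concrete, here is a minimal sketch of the two equivalent ways Keras lets you pair labels with a multi-class loss. The label values are toy examples, and the compile/fit calls are shown commented out because no model is built in this snippet; to_categorical is the built-in helper that does the same job as the to_one_hot function in the full code below.

import numpy as np
from keras.utils import to_categorical

labels = np.array([3, 4, 3, 16])                         # integer topic indices (toy values)
one_hot_labels = to_categorical(labels, num_classes=46)  # shape (4, 46), a single 1. per row

# Option 1: one-hot labels + categorical_crossentropy (what the full code below uses)
# model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(x, one_hot_labels, ...)

# Option 2: keep the integer labels + sparse_categorical_crossentropy
# model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# model.fit(x, labels, ...)

Either way the network, the softmax output layer and the resulting accuracy are the same; only the label format changes.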

There is nothing else new; with this brief introduction done, the figures and the code are given below:

[Figure: training and validation loss per epoch]
[Figure: training and validation accuracy per epoch]

#!/usr/bin/env python3

import copy

import numpy as np
from keras import layers
from keras import models
from keras.datasets import reuters
# import matplotlib.pyplot as plt  # only needed if the plotting code below is uncommented
# from keras.utils import to_categorical  # Keras' built-in equivalent of the to_one_hot helper below


def classify():
    (train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)
    # print('Training set size:', len(train_data))
    # print('Test set size:', len(test_data))
    # print(train_data[10])

    # Decode a newswire back into text
    # word_index = reuters.get_word_index()
    # reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
    # indices are offset by 3 because 0, 1 and 2 are reserved for padding / start-of-sequence / unknown
    # decoded_newswire = ' '.join([reverse_word_index.get(i - 3, '?') for i in train_data[0]])
    # print(decoded_newswire)

    # Print the class index of one sample's label
    # print(train_labels[3])

    x_train = vectorize_sequences(train_data)
    x_test = vectorize_sequences(test_data)

    # Alternatively, convert the labels directly into integer tensors (for sparse_categorical_crossentropy)
    # y_train = np.array(train_labels)
    # y_test = np.array(test_labels)

    one_hot_train_labels = to_one_hot(train_labels)
    one_hot_test_labels = to_one_hot(test_labels)
    # one_hot_train_labels = to_categorical(train_labels)
    # one_hot_test_labels = to_categorical(test_labels)

    model = models.Sequential()
    model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(46, activation='softmax'))

    model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
    # model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics=['acc'])
    # model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])

    x_val = x_train[:1000]
    partial_x_train = x_train[1000:]
    y_val = one_hot_train_labels[:1000]
    partial_y_train = one_hot_train_labels[1000:]

    history = model.fit(partial_x_train, partial_y_train, epochs=9, batch_size=512, validation_data=(x_val, y_val))

    # Plot the training and validation results
    # loss = history.history['loss']
    # val_loss = history.history['val_loss']
    # epochs = range(1, len(loss) + 1)
    # plt.plot(epochs, loss, 'bo', label='Training loss')
    # plt.plot(epochs, val_loss, 'b', label='Validation loss')
    # plt.title('Training and validation loss')
    # plt.xlabel('Epochs')
    # plt.ylabel('Loss')
    # plt.legend()
    # plt.show()

    # plt.clf()
    # acc = history.history['acc']
    # val_acc = history.history['val_acc']
    # plt.plot(epochs, acc, 'bo', label='Training accuracy')
    # plt.plot(epochs, val_acc, 'b', label='Validation accuracy')
    # plt.title('Training and validation accuracy')
    # plt.xlabel('Epochs')
    # plt.ylabel('Accuracy')
    # plt.legend()
    # plt.show()

    results = model.evaluate(x_test, one_hot_test_labels)
    # roughly 80% test accuracy
    print(results)

    # For comparison: a random classifier
    test_labels_copy = copy.copy(test_labels)
    np.random.shuffle(test_labels_copy)
    hits_array = np.array(test_labels) == np.array(test_labels_copy)
    # roughly 19% accuracy
    print(float(np.sum(hits_array)) / len(test_labels))

    # Predictions on the test set
    predictions = model.predict(x_test)
    print(predictions[10].shape)       # (46,): one probability per topic
    print(np.sum(predictions[10]))     # the probabilities sum to roughly 1
    print(np.argmax(predictions[10]))  # index of the most likely topic, i.e. the predicted class


def vectorize_sequences(sequences, dimension=10000):
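    # Multi-hot encode each sequence of word indices into a 10000-dimensional vector of 0s and 1s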
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results


def to_one_hot(labels, dimension=46):
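    # One-hot encode the integer class labels into 46-dimensional vectors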
    results = np.zeros((len(labels), dimension))
    for i, label in enumerate(labels):
        results[i, label] = 1.
    return results


if __name__ == "__main__":
    classify()
