one_hot=True编码的过程及意义

见以下代码

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
BATCH_SIZE=10

mnist =input_data.read_data_sets('MNIST_data/')
xs,ys = mnist.train.next_batch(BATCH_SIZE)
print(ys.shape)
print(ys)
print('-------------编码后--------------')
mnist =input_data.read_data_sets('MNIST_data/',one_hot = True)
xs,ys = mnist.train.next_batch(BATCH_SIZE)
# print(xs.shape)
print(ys.shape)
print(ys)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
(10,)
[7 0 8 2 4 5 7 8 6 2]
-------------编码后--------------
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
(10, 10)
[[ 0.  0.  1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  1.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
 [ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]]

pandas中one-hot编码的神坑
机器学习中,经常会用到one-hot编码。pandas中已经提供了这一函数。
但是这里有一个神坑,得到的one-hot编码数据类型是uint8,进行数值计算时会溢出!!!

import pandas as pd
import numpy as np
a = [1, 2, 3, 1]
one_hot = pd.get_dummies(a)
print(one_hot.dtypes)
print(one_hot)
print(-one_hot)
1    uint8
2    uint8
3    uint8
dtype: object
   1  2  3
0  1  0  0
1  0  1  0
2  0  0  1
3  1  0  0
     1    2    3
0  255    0    0
1    0  255    0
2    0    0  255
3  255    0    0

猜你喜欢

转载自blog.csdn.net/weiyumeizi/article/details/81502471