TensorFlow 卷积神经网络系列案例(1):猫狗识别 https://blog.csdn.net/duan_zhihua/article/details/81156693
TensorFlow 系列案例(2):自然语言处理-TensorFlow + Word2Vechttps://blog.csdn.net/duan_zhihua/article/details/81257323
TensorFlow 系列案例(3): 使用TensorFlow DNN分类器对数据进行分类
步骤:
- 分别读入iris训练数据集、测试数据集。
- 使用tf.contrib.learn.DNNClassifier算法对数据集进行分类。
- 使用classifier.fit训练模型。
- 评估分类准确度。
- 使用新的样本数据进行预测。
iris_training.csv训练数据如下:第一行为标题,第一个数字120为数据的行数,第二个数字4为特征维度。
120,4,setosa,versicolor,virginica
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,2.5,4.5,1.7,2
4.9,3.1,1.5,0.1,0
5.7,3.8,1.7,0.3,0
4.4,3.2,1.3,0.2,0
5.4,3.4,1.5,0.4,0
6.9,3.1,5.1,2.3,2
6.7,3.1,4.4,1.4,1
5.1,3.7,1.5,0.4,0
5.2,2.7,3.9,1.4,1
6.9,3.1,4.9,1.5,1
5.8,4.0,1.2,0.2,0
5.4,3.9,1.7,0.4,0
7.7,3.8,6.7,2.2,2
6.3,3.3,4.7,1.6,1
6.8,3.2,5.9,2.3,2
7.6,3.0,6.6,2.1,2
iris_test.csv数据文件:测试数据如下:第一行为标题,第一个数字30为数据的行数,第二个数字4为特征维度。
30,4,setosa,versicolor,virginica
5.9,3.0,4.2,1.5,1
6.9,3.1,5.4,2.1,2
5.1,3.3,1.7,0.5,0
6.0,3.4,4.5,1.6,1
5.5,2.5,4.0,1.3,1
6.2,2.9,4.3,1.3,1
5.5,4.2,1.4,0.2,0
6.3,2.8,5.1,1.5,2
iris.py代码:
# -*- coding: utf-8 -*-
# 引入必要的module
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import urllib
import numpy as np
import tensorflow as tf
# Data sets
IRIS_TRAINING = "iris_training.csv"
IRIS_TRAINING_URL = "http://download.tensorflow.org/data/iris_training.csv"
IRIS_TEST = "iris_test.csv"
IRIS_TEST_URL = "http://download.tensorflow.org/data/iris_test.csv"
def main():
# If the training and test sets aren't stored locally, download them.
if not os.path.exists(IRIS_TRAINING):
raw = urllib.urlopen(IRIS_TRAINING_URL).read()
with open(IRIS_TRAINING, "w") as f:
f.write(raw)
if not os.path.exists(IRIS_TEST):
raw = urllib.urlopen(IRIS_TEST_URL).read()
with open(IRIS_TEST, "w") as f:
f.write(raw)
# Load datasets.
training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
filename=IRIS_TRAINING,
target_dtype=np.int,
features_dtype=np.float32)
test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
filename=IRIS_TEST,
target_dtype=np.int,
features_dtype=np.float32)
# Specify that all features have real-value data
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=4)]
# Build 3 layer DNN with 10, 20, 10 units respectively.
classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,hidden_units=[10, 20, 10],n_classes=3,model_dir="/tmp/iris_model")
# Define the training inputs
def get_train_inputs():
x = tf.constant(training_set.data)
y = tf.constant(training_set.target)
return x, y
# Fit model.
classifier.fit(input_fn=get_train_inputs, steps=2000)
# Define the test inputs
def get_test_inputs():
x = tf.constant(test_set.data)
y = tf.constant(test_set.target)
return x, y
# Evaluate accuracy.
accuracy_score = classifier.evaluate(input_fn=get_test_inputs,steps=1)["accuracy"]
print("\nTest Accuracy: {0:f}\n".format(accuracy_score))
# Classify two new flower samples.
def new_samples():
return np.array(
[[6.4, 3.2, 4.5, 1.5],
[5.8, 3.1, 5.0, 1.7]], dtype=np.float32)
predictions = list(classifier.predict(input_fn=new_samples))
print(
"New Samples, Class Predictions: {}\n"
.format(predictions))
if __name__ == "__main__":
main()
运行结果为:
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into ./tmp/iris_model\model.ckpt.
INFO:tensorflow:loss = 1.27656, step = 1
INFO:tensorflow:global_step/sec: 1038.9
INFO:tensorflow:loss = 0.189959, step = 101 (0.097 sec)
INFO:tensorflow:global_step/sec: 1172.9
INFO:tensorflow:loss = 0.0868506, step = 201 (0.086 sec)
INFO:tensorflow:global_step/sec: 1133.76
INFO:tensorflow:loss = 0.0672543, step = 301 (0.089 sec)
INFO:tensorflow:global_step/sec: 1028.17
INFO:tensorflow:loss = 0.0576761, step = 401 (0.095 sec)
INFO:tensorflow:global_step/sec: 1061.02
INFO:tensorflow:loss = 0.051202, step = 501 (0.094 sec)
INFO:tensorflow:global_step/sec: 997.134
INFO:tensorflow:loss = 0.0487256, step = 601 (0.101 sec)
INFO:tensorflow:global_step/sec: 959.174
INFO:tensorflow:loss = 0.0473462, step = 701 (0.105 sec)
INFO:tensorflow:global_step/sec: 905.811
INFO:tensorflow:loss = 0.0449771, step = 801 (0.108 sec)
INFO:tensorflow:global_step/sec: 811.094
INFO:tensorflow:loss = 0.0430157, step = 901 (0.123 sec)
INFO:tensorflow:global_step/sec: 959.593
INFO:tensorflow:loss = 0.0419328, step = 1001 (0.105 sec)
INFO:tensorflow:global_step/sec: 1095.99
INFO:tensorflow:loss = 0.0405608, step = 1101 (0.090 sec)
INFO:tensorflow:global_step/sec: 1120.6
INFO:tensorflow:loss = 0.0401995, step = 1201 (0.090 sec)
INFO:tensorflow:global_step/sec: 997.35
INFO:tensorflow:loss = 0.0391046, step = 1301 (0.100 sec)
INFO:tensorflow:global_step/sec: 968.277
INFO:tensorflow:loss = 0.038355, step = 1401 (0.103 sec)
INFO:tensorflow:global_step/sec: 1159.71
INFO:tensorflow:loss = 0.0374698, step = 1501 (0.086 sec)
INFO:tensorflow:global_step/sec: 1173.35
INFO:tensorflow:loss = 0.0361673, step = 1601 (0.085 sec)
INFO:tensorflow:global_step/sec: 1216.27
INFO:tensorflow:loss = 0.0356515, step = 1701 (0.081 sec)
INFO:tensorflow:global_step/sec: 1231.28
INFO:tensorflow:loss = 0.0347832, step = 1801 (0.082 sec)
INFO:tensorflow:global_step/sec: 1146.37
INFO:tensorflow:loss = 0.034383, step = 1901 (0.086 sec)
INFO:tensorflow:Saving checkpoints for 2000 into ./tmp/iris_model\model.ckpt.
INFO:tensorflow:Loss for final step: 0.0331372.
WARNING:tensorflow:From g:\ProgramData\Anaconda3\lib\site-packages\tensorflow\contrib\learn\python\learn\estimators\head.py:625: scalar_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.scalar. Note that tf.summary.scalar uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, passing a tensor or list of tags to a scalar summary op is no longer supported.
INFO:tensorflow:Starting evaluation at 2018-08-02-13:22:35
INFO:tensorflow:Restoring parameters from ./tmp/iris_model\model.ckpt-2000
INFO:tensorflow:Evaluation [1/1]
INFO:tensorflow:Finished evaluation at 2018-08-02-13:22:35
INFO:tensorflow:Saving dict for global step 2000: accuracy = 0.966667, global_step = 2000, loss = 0.0624721
Test Accuracy: 0.966667
WARNING:tensorflow:From g:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py:347: calling DNNClassifier.predict (from tensorflow.contrib.learn.python.learn.estimators.dnn) with outputs=None is deprecated and will be removed after 2017-03-01.
Instructions for updating:
Please switch to predict_classes, or set `outputs` argument.
INFO:tensorflow:Restoring parameters from ./tmp/iris_model\model.ckpt-2000
New Samples, Class Predictions: [1, 2]
iris.py代码中的load_csv_with_header读入数据文件,解析返回data、target;其中target标签列是数据文件的最后一列。
load_csv_with_header的源代码:
@deprecated(None, 'Use tf.data instead.')
def load_csv_with_header(filename,
target_dtype,
features_dtype,
target_column=-1):
"""Load dataset from CSV file with a header row."""
with gfile.Open(filename) as csv_file:
data_file = csv.reader(csv_file)
header = next(data_file)
n_samples = int(header[0])
n_features = int(header[1])
data = np.zeros((n_samples, n_features), dtype=features_dtype)
target = np.zeros((n_samples,), dtype=target_dtype)
for i, row in enumerate(data_file):
target[i] = np.asarray(row.pop(target_column), dtype=target_dtype)
data[i] = np.asarray(row, dtype=features_dtype)
return Dataset(data=data, target=target)
real_valued_column方法源代码:
def real_valued_column(column_name,
dimension=1,
default_value=None,
dtype=dtypes.float32,
normalizer=None):
"""Creates a `_RealValuedColumn` for dense numeric data.
Args:
column_name: A string defining real valued column name.
dimension: An integer specifying dimension of the real valued column.
The default is 1.
default_value: A single value compatible with dtype or a list of values
compatible with dtype which the column takes on during tf.Example parsing
if data is missing. When dimension is not None, a default value of None
will cause tf.parse_example to fail if an example does not contain this
column. If a single value is provided, the same value will be applied as
the default value for every dimension. If a list of values is provided,
the length of the list should be equal to the value of `dimension`.
Only scalar default value is supported in case dimension is not specified.
dtype: defines the type of values. Default value is tf.float32. Must be a
non-quantized, real integer or floating point type.
normalizer: If not None, a function that can be used to normalize the value
of the real valued column after default_value is applied for parsing.
Normalizer function takes the input tensor as its argument, and returns
the output tensor. (e.g. lambda x: (x - 3.0) / 4.2). Note that for
variable length columns, the normalizer should expect an input_tensor of
type `SparseTensor`.
Returns:
A _RealValuedColumn.
Raises:
TypeError: if dimension is not an int
ValueError: if dimension is not a positive integer
TypeError: if default_value is a list but its length is not equal to the
value of `dimension`.
TypeError: if default_value is not compatible with dtype.
ValueError: if dtype is not convertible to tf.float32.
"""
if dimension is None:
raise TypeError("dimension must be an integer. Use the "
"_real_valued_var_len_column for variable length features."
"dimension: {}, column_name: {}".format(dimension,
column_name))
if not isinstance(dimension, int):
raise TypeError("dimension must be an integer. "
"dimension: {}, column_name: {}".format(dimension,
column_name))
if dimension < 1:
raise ValueError("dimension must be greater than 0. "
"dimension: {}, column_name: {}".format(dimension,
column_name))
if not (dtype.is_integer or dtype.is_floating):
raise ValueError("dtype must be convertible to float. "
"dtype: {}, column_name: {}".format(dtype, column_name))
if default_value is None:
return _RealValuedColumn(column_name, dimension, default_value, dtype,
normalizer)
if isinstance(default_value, int):
if dtype.is_integer:
default_value = ([default_value for _ in range(dimension)] if dimension
else [default_value])
return _RealValuedColumn(column_name, dimension, default_value, dtype,
normalizer)
if dtype.is_floating:
default_value = float(default_value)
default_value = ([default_value for _ in range(dimension)] if dimension
else [default_value])
return _RealValuedColumn(column_name, dimension, default_value, dtype,
normalizer)
if isinstance(default_value, float):
if dtype.is_floating and (not dtype.is_integer):
default_value = ([default_value for _ in range(dimension)] if dimension
else [default_value])
return _RealValuedColumn(column_name, dimension, default_value, dtype,
normalizer)
if isinstance(default_value, list):
if len(default_value) != dimension:
raise ValueError(
"The length of default_value must be equal to dimension. "
"default_value: {}, dimension: {}, column_name: {}".format(
default_value, dimension, column_name))
# Check if the values in the list are all integers or are convertible to
# floats.
is_list_all_int = True
is_list_all_float = True
for v in default_value:
if not isinstance(v, int):
is_list_all_int = False
if not (isinstance(v, float) or isinstance(v, int)):
is_list_all_float = False
if is_list_all_int:
if dtype.is_integer:
return _RealValuedColumn(column_name, dimension, default_value, dtype,
normalizer)
elif dtype.is_floating:
default_value = [float(v) for v in default_value]
return _RealValuedColumn(column_name, dimension, default_value, dtype,
normalizer)
if is_list_all_float:
if dtype.is_floating and (not dtype.is_integer):
default_value = [float(v) for v in default_value]
return _RealValuedColumn(column_name, dimension, default_value, dtype,
normalizer)
raise TypeError("default_value must be compatible with dtype. "
"default_value: {}, dtype: {}, column_name: {}".format(
default_value, dtype, column_name))
dnn.py源代码:
class DNNClassifier(estimator.Estimator):
"""A classifier for TensorFlow DNN models.
Example:
```python
sparse_feature_a = sparse_column_with_hash_bucket(...)
sparse_feature_b = sparse_column_with_hash_bucket(...)
sparse_feature_a_emb = embedding_column(sparse_id_column=sparse_feature_a,
...)
sparse_feature_b_emb = embedding_column(sparse_id_column=sparse_feature_b,
...)
estimator = DNNClassifier(
feature_columns=[sparse_feature_a_emb, sparse_feature_b_emb],
hidden_units=[1024, 512, 256])
# Or estimator using the ProximalAdagradOptimizer optimizer with
# regularization.
estimator = DNNClassifier(
feature_columns=[sparse_feature_a_emb, sparse_feature_b_emb],
hidden_units=[1024, 512, 256],
optimizer=tf.train.ProximalAdagradOptimizer(
learning_rate=0.1,
l1_regularization_strength=0.001
))
# Input builders
def input_fn_train: # returns x, y (where y represents label's class index).
pass
estimator.fit(input_fn=input_fn_train)
def input_fn_eval: # returns x, y (where y represents label's class index).
pass
estimator.evaluate(input_fn=input_fn_eval)
def input_fn_predict: # returns x, None
# predict_classes returns class indices.
estimator.predict_classes(input_fn=input_fn_predict)
```
If the user specifies `label_keys` in constructor, labels must be strings from
the `label_keys` vocabulary. Example:
```python
label_keys = ['label0', 'label1', 'label2']
estimator = DNNClassifier(
feature_columns=[sparse_feature_a_emb, sparse_feature_b_emb],
hidden_units=[1024, 512, 256],
label_keys=label_keys)
def input_fn_train: # returns x, y (where y is one of label_keys).
pass
estimator.fit(input_fn=input_fn_train)
def input_fn_eval: # returns x, y (where y is one of label_keys).
pass
estimator.evaluate(input_fn=input_fn_eval)
def input_fn_predict: # returns x, None
# predict_classes returns one of label_keys.
estimator.predict_classes(input_fn=input_fn_predict)
```
Input of `fit` and `evaluate` should have following features,
otherwise there will be a `KeyError`:
* if `weight_column_name` is not `None`, a feature with
`key=weight_column_name` whose value is a `Tensor`.
* for each `column` in `feature_columns`:
- if `column` is a `SparseColumn`, a feature with `key=column.name`
whose `value` is a `SparseTensor`.
- if `column` is a `WeightedSparseColumn`, two features: the first with
`key` the id column name, the second with `key` the weight column name.
Both features' `value` must be a `SparseTensor`.
- if `column` is a `RealValuedColumn`, a feature with `key=column.name`
whose `value` is a `Tensor`.
"""
......
类似的,对于一些数据分类的案例,可以使用机器学习挖掘算法进行分类;也可使用深度学习进行分类,只需模仿使用鸢尾花数据集的数据源文件格式,稍微修改上述的代码,就可以体验深度学习的分类算法了。
学到很多东西的诀窍,就是一下子不要学很多。——洛克