Author: chen_h
WeChat & QQ: 862251340
WeChat public account: coderpai
I plan to work through the Python API in TensorFlow now, which should make later study easier.
This chapter introduces the API for sparse tensors.
Sparse tensor representation
For data that is sparse in multiple dimensions, TensorFlow provides the SparseTensor representation. Contrast this with IndexedSlices, which is efficient for tensors that are sparse only in their first dimension and dense along all the others.
class tf.SparseTensor
Explanation: This class represents a sparse tensor.
TensorFlow represents a sparse tensor as three dense tensors: indices, values, and dense_shape. In the Python interface, the three tensors are collected into one SparseTensor class for ease of use. If you have separate indices, values, and dense_shape tensors, wrap them in a SparseTensor object before passing them to the operations below.
Specifically, the sparse tensor SparseTensor(indices, values, dense_shape) consists of:
- indices: A two-dimensional tensor of type int64 with shape [N, ndims].
- values: A one-dimensional tensor of any type with shape [N].
- dense_shape: A one-dimensional tensor of type int64 with shape [ndims].
Here N is the number of values in the sparse tensor and ndims is its number of dimensions.
The dense tensor corresponding to a SparseTensor satisfies:
dense.shape = dense_shape
dense[tuple(indices[i])] = values[i]
By convention, indices should be sorted in row-major order (equivalently, lexicographic order on the tuples indices[i]). This ordering is not enforced when a SparseTensor is constructed, but most operations assume it. If the ordering is wrong, tf.sparse_reorder returns a correctly ordered version.
For example:
SparseTensor(values=[1, 2], indices=[[0, 0], [1, 2]], shape=[3, 4])
Then the dense tensor is:
[[1, 0, 0, 0]
[0, 0, 2, 0]
[0, 0, 0, 0]]
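The relationship between the three component tensors and the dense tensor can be sketched in plain Python. This is only an illustration under stated assumptions: `dense_to_sparse` is a hypothetical helper written for this example, not part of TensorFlow, and it handles only rectangular two-dimensional lists.

```python
def dense_to_sparse(dense, zero=0):
    """Return (indices, values, shape) for a rectangular 2-D list of lists.

    Hypothetical helper for illustration; not a TensorFlow function.
    """
    indices, values = [], []
    for r, row in enumerate(dense):
        for c, item in enumerate(row):
            if item != zero:
                indices.append([r, c])  # visited in row-major order
                values.append(item)
    shape = [len(dense), len(dense[0]) if dense else 0]
    return indices, values, shape


dense = [[1, 0, 0, 0],
         [0, 0, 2, 0],
         [0, 0, 0, 0]]
indices, values, shape = dense_to_sparse(dense)

# The invariant from the text: dense[tuple(indices[i])] == values[i]
for (r, c), v in zip(indices, values):
    assert dense[r][c] == v

print(indices, values, shape)
```

Because the loop walks the rows in order, the resulting indices come out already sorted in the conventional order.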
tf.SparseTensor.__init__(indices, values, shape)
Explanation: This constructor builds a SparseTensor.
Input parameters:
- indices: A two-dimensional tensor of type int64 with shape [N, ndims].
- values: A one-dimensional tensor of any type with shape [N].
- shape: A one-dimensional tensor of type int64 with shape [ndims]; this is the dense_shape described above.
Output parameters:
* A SparseTensor.
tf.SparseTensor.indices
Explanation: This property returns the indices of the non-zero values in the represented dense tensor.
Example of use:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
a = tf.SparseTensor(indices=[[0, 1], [1, 2]], values=[1, 2], shape=[3, 4])
b = a.indices
sess = tf.Session()
print sess.run(a)
print sess.run(b)
sess.close()
Output parameters:
* A two-dimensional tensor of type int64 with shape [N, ndims], where N is the number of non-zero values in the sparse tensor and ndims is its rank.
tf.SparseTensor.values
Explanation: This property returns the non-zero values of the represented dense tensor.
Example of use:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
a = tf.SparseTensor(indices=[[0, 1], [1, 2]], values=[1, 2], shape=[3, 4])
b = a.values
sess = tf.Session()
print sess.run(a)
print sess.run(b)
sess.close()
Output parameters:
* A one-dimensional tensor, the data type is arbitrary.
tf.SparseTensor.dtype
Explanation: This property returns the data type of the elements in the sparse tensor.
Example of use:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
a = tf.SparseTensor(indices=[[0, 1], [1, 2]], values=tf.constant([1, 2]), shape=[3, 4])
b = a.dtype
sess = tf.Session()
print b
sess.close()
Output parameters:
- Returns the type of the elements in the tensor.
tf.SparseTensor.shape
Explanation: This property returns the dense shape of the sparse tensor.
Example of use:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
a = tf.SparseTensor(indices=[[0, 1], [1, 2]], values=tf.constant([1, 2]), shape=[3, 4])
b = a.shape
sess = tf.Session()
print sess.run(b)
sess.close()
Output parameters:
- Returns the dense shape of the sparse tensor.
tf.SparseTensor.graph
Explanation: This property returns the graph that contains this sparse tensor.
Example of use:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
a = tf.SparseTensor(indices=[[0, 1], [1, 2]], values=tf.constant([1, 2]), shape=[3, 4])
b = a.graph
sess = tf.Session()
print b
sess.close()
Output parameters:
- Return a graph containing this sparse tensor.
class tf.SparseTensorValue
Explanation: This class holds the value of a sparse tensor as the tuple (indices, values, shape), so its fields can be read by name or by position.
Example of use:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
a = tf.SparseTensorValue(indices=[[0, 1], [1, 2]], values=tf.constant([1, 2]), shape=[3, 4])
sess = tf.Session()
print a
print a[0]
print a[1]
print a[2]
sess.close()
tf.SparseTensorValue.indices
Explanation: This field returns the positions (indices) of the values present in the sparse tensor.
Example of use:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
a = tf.SparseTensorValue(indices=[[0, 1], [1, 2]], values=tf.constant([1, 2]), shape=[3, 4])
sess = tf.Session()
print a.indices
sess.close()
Output parameters:
- Returns where values exist in a sparse tensor.
tf.SparseTensorValue.shape
Explanation: This field returns the dense shape of the sparse tensor.
Example of use:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
a = tf.SparseTensorValue(values=tf.constant([1, 2]), indices=[[0, 1], [1, 2]], shape=[3, 4])
sess = tf.Session()
print a.shape
sess.close()
Output parameters:
- Returns the dense shape of the sparse tensor.
tf.SparseTensorValue.values
Explanation: This field returns the values stored in the sparse tensor.
Example of use:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
a = tf.SparseTensorValue(values=tf.constant([1, 2]), indices=[[0, 1], [1, 2]], shape=[3, 4])
sess = tf.Session()
print sess.run(a.values) # a.values is a tensor here, so evaluate it with sess.run()
sess.close()
Output parameters:
- Returns the elements in a sparse tensor.
Converting sparse tensors to dense tensors
TensorFlow provides conversion operations between sparse and dense tensors.
tf.sparse_to_dense(sparse_indices, output_shape, sparse_values, default_value, name=None)
Explanation: This function converts a sparse representation into a dense tensor. Specifically, it builds a dense tensor dense with shape output_shape such that:
# If sparse_indices is scalar
dense[i] = (i == sparse_indices ? sparse_values : default_value)
# If sparse_indices is a vector, then for each i
dense[sparse_indices[i]] = sparse_values[i]
# If sparse_indices is an n by d matrix, then for each i in [0, n)
dense[sparse_indices[i][0], ..., sparse_indices[i][d-1]] = sparse_values[i]
All other entries of dense are set to default_value, a scalar that is 0 unless you specify another value.
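The matrix case of the rule above (an n by d index matrix with d = 2) can be sketched in plain Python. This is a minimal illustration, not the TensorFlow implementation: `sparse_to_dense_2d` is a hypothetical name, and the sketch assumes every index lies inside output_shape.

```python
def sparse_to_dense_2d(sparse_indices, output_shape, sparse_values, default_value=0):
    """Scatter sparse_values into a 2-D list pre-filled with default_value.

    Hypothetical helper for illustration; not a TensorFlow function.
    """
    rows, cols = output_shape
    # Start with every cell holding the padding value.
    dense = [[default_value] * cols for _ in range(rows)]
    # dense[sparse_indices[i][0], sparse_indices[i][1]] = sparse_values[i]
    for (r, c), value in zip(sparse_indices, sparse_values):
        dense[r][c] = value
    return dense


dense = sparse_to_dense_2d(
    sparse_indices=[[1, 2], [2, 1]],
    output_shape=[3, 3],
    sparse_values=[2, 3],
    default_value=1,
)
for row in dense:
    print(row)
```

The arguments mirror those of the usage example that follows, so the printed rows can be compared with what the session returns.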
Example of use:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
a = tf.sparse_to_dense(sparse_indices = [[1,2],[2,1]], output_shape = [3,3],
sparse_values = [2,3], default_value = 1)
sess = tf.Session()
print sess.run(a)
sess.close()
Input parameters:
- sparse_indices: A Tensor of type int32 or int64. It is 0-dimensional, one-dimensional, or two-dimensional; sparse_indices[i] holds the complete index at which sparse_values[i] is placed.
- output_shape: A Tensor of the same type as sparse_indices. It is one-dimensional and gives the shape of the dense output tensor.
- sparse_values: A Tensor. It is one-dimensional, and each element is the value for the corresponding index in sparse_indices.
- default_value: A Tensor of the same type as sparse_values. It is a scalar and is the value assigned to every position not listed in sparse_indices.
- name: (optional) A name for the operation.
Output parameters:
- A Tensor of the same type as sparse_values. The dense tensor has shape output_shape.
tf.sparse_tensor_to_dense(sp_input, default_value, name=None)
Explanation: This function converts a SparseTensor into a dense tensor.
This operation is a convenient way to convert sparse tensors into dense tensors.
For example, if sp_input has shape [3, 5] and the following non-empty values:
[0, 1]: a
[0, 3]: b
[2, 0]: c
and default_value is x, then the output is a dense tensor of shape [3, 5] with the following contents:
[[x a x b x]
[x x x x x]
[c x x x x]]
Example of use:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
a = tf.SparseTensor(indices = [[0, 1], [0, 3], [2, 0]], values=[1,2,3], shape=[3, 5])
b = tf.sparse_tensor_to_dense(a, default_value = 11)
sess = tf.Session()
print sess.run(b)
sess.close()
Input parameters:
- sp_input: A SparseTensor.
- default_value: A scalar. The value assigned to every position not listed in sp_input.
- name: (optional) A name prefix for the returned tensors.
Output parameters:
- A dense tensor with shape sp_input.shape. Positions listed in sp_input hold the corresponding values, and all other positions hold default_value.
Exceptions:
- TypeError: If sp_input is not a SparseTensor.
tf.sparse_to_indicator(sp_input, vocab_size, name=None)
Explanation: This function converts a SparseTensor of ids into a dense boolean indicator tensor.
The last dimension of sp_input is discarded and replaced with the values of sp_input. If sp_input.shape = [D0, D1, D2, ..., Dn, K], where K is the last dimension, then output.shape = [D0, D1, D2, ..., Dn, vocab_size], where:
output[d_0, d_1, ..., d_n, sp_input[d_0, d_1, ..., d_n, k]] = True
and all other entries of output are False.
For example, if sp_input.shape = [2, 3, 4] and the non-empty values are:
[0, 0, 0]: 0
[0, 1, 0]: 10
[1, 0, 3]: 103
[1, 1, 2]: 112
[1, 1, 3]: 113
[1, 2, 1]: 121
and vocab_size = 200, then output.shape = [2, 3, 200], and every value in output is False except at the following locations:
(0, 0, 0), (0, 1, 10), (1, 0, 103), (1, 1, 112), (1, 1, 113), (1, 2, 121).
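The example above can be reproduced with a short plain-Python sketch. It is only an illustration of the rule, restricted to rank-3 inputs; `sparse_to_indicator_3d` is a hypothetical name chosen here, not the TensorFlow function itself.

```python
def sparse_to_indicator_3d(indices, values, shape, vocab_size):
    """Replace the last dimension of a rank-3 sparse input with vocab_size slots.

    Hypothetical helper for illustration; not a TensorFlow function.
    """
    d0, d1, _last = shape
    output = [[[False] * vocab_size for _ in range(d1)] for _ in range(d0)]
    for (i, j, _k), value in zip(indices, values):
        if not 0 <= value < vocab_size:
            raise ValueError("every value must lie in [0, vocab_size)")
        # The position k inside the last dimension is dropped; the value
        # itself becomes the index into the new last dimension.
        output[i][j][value] = True
    return output


indices = [[0, 0, 0], [0, 1, 0], [1, 0, 3], [1, 1, 2], [1, 1, 3], [1, 2, 1]]
values = [0, 10, 103, 112, 113, 121]
output = sparse_to_indicator_3d(indices, values, shape=[2, 3, 4], vocab_size=200)

true_locations = [
    (i, j, v)
    for i, plane in enumerate(output)
    for j, row in enumerate(plane)
    for v, flag in enumerate(row)
    if flag
]
print(true_locations)
```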
Example of use:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
a = tf.SparseTensor(indices = [[0, 1], [0, 3], [2, 0]], values=[1,2,3], shape=[3, 5])
b = tf.sparse_to_indicator(a, 10)
sess = tf.Session()
print sess.run(b)
sess.close()
Input parameters:
- sp_input: A SparseTensor of type int32 or int64.
- vocab_size: The new size of the last dimension, with 0 <= sp_input.values < vocab_size for every value.
- name: (optional) A name prefix for the returned tensors.
Output parameters:
- A dense boolean indicator tensor.
Exceptions:
- TypeError: If sp_input is not a SparseTensor.
Operations on sparse tensors
TensorFlow provides some operations on sparse tensors.
tf.sparse_concat(concat_dim, sp_inputs, name=None)
Explanation: This function concatenates a list of SparseTensors along the specified dimension.
Conceptually, each sparse tensor is treated as a dense tensor, the dense tensors are concatenated along the specified dimension, and the result is converted back into a sparse tensor.
All inputs must have the same shape except along the concatenation dimension, and the lists of indices, values, and shapes must have the same length.
The output shape matches the input shapes, except along the concatenation dimension, where it is the sum of the input sizes along that dimension.
The output elements are re-sorted so that the resulting sparse tensor keeps its indices in order.
This operation runs in O(M log M) time, where M is the total number of non-empty values across all inputs.
For example, if concat_dim = 1 and the inputs are:
sp_inputs[0]: shape = [2, 3]
[0, 2]: "a"
[1, 0]: "b"
[1, 1]: "c"
sp_inputs[1]: shape = [2, 4]
[0, 1]: "d"
[0, 2]: "e"
Then the output data is:
shape = [2, 7]
[0, 2]: "a"
[0, 4]: "d"
[0, 5]: "e"
[1, 0]: "b"
[1, 1]: "c"
Graphically, this is equivalent to:
[    a] concat [  d e  ] = [    a   d e  ]
[b c  ]        [       ]   [b c          ]
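The concatenation rule can be sketched in plain Python by shifting each input's indices along the concatenation dimension and then sorting. This is a rough illustration under simplifying assumptions: `sparse_concat_sketch` is a hypothetical helper, each input is a plain (indices, values, shape) triple, and shape compatibility is not checked.

```python
def sparse_concat_sketch(concat_dim, inputs):
    """Concatenate (indices, values, shape) triples along concat_dim.

    Hypothetical helper for illustration; not a TensorFlow function.
    """
    out_shape = list(inputs[0][2])
    out_shape[concat_dim] = 0
    entries = []
    for indices, values, shape in inputs:
        offset = out_shape[concat_dim]  # total size of the inputs seen so far
        for index, value in zip(indices, values):
            shifted = list(index)
            shifted[concat_dim] += offset
            entries.append((shifted, value))
        out_shape[concat_dim] += shape[concat_dim]
    # Sorting the M collected entries is the O(M log M) step.
    entries.sort(key=lambda entry: entry[0])
    return [e[0] for e in entries], [e[1] for e in entries], out_shape


sp0 = ([[0, 2], [1, 0], [1, 1]], ["a", "b", "c"], [2, 3])
sp1 = ([[0, 1], [0, 2]], ["d", "e"], [2, 4])
out_indices, out_values, out_shape = sparse_concat_sketch(1, [sp0, sp1])
print(out_shape)
for index, value in zip(out_indices, out_values):
    print(index, value)
```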
Example of use:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
a = tf.SparseTensor(indices = [[0, 1], [0, 3], [2, 0]], values=[1,2,3], shape=[3, 5])
aa = tf.SparseTensor(indices = [[1, 1], [1, 3], [2, 1]], values=[11,12,13], shape=[3, 5])
b = tf.sparse_concat(0, [a, aa])
sess = tf.Session()
print sess.run(b)
print sess.run(tf.sparse_tensor_to_dense(b))
sess.close()
Input parameters:
- concat_dim: The dimension to concatenate along.
- sp_inputs: A list of SparseTensors to concatenate.
- name: (optional) A name prefix for the returned tensors.
Output parameters:
- A SparseTensor containing the concatenated result.
Exceptions:
- TypeError: If sp_inputs is not a list of SparseTensors.
tf.sparse_reorder(sp_input, name=None)
Explanation: This function reorders the elements of a SparseTensor so that their indices are sorted from smallest to largest.
Reordering does not affect the shape of the SparseTensor.
For example, if sp_input has shape [4, 5] and the following indices / values:
[0, 3]: b
[0, 1]: a
[3, 1]: d
[2, 0]: c
Then the output SparseTensor still has shape [4, 5], with the following indices / values:
[0, 1]: a
[0, 3]: b
[2, 0]: c
[3, 1]: d
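The reordering amounts to a lexicographic sort of the index tuples with the values carried along. A minimal plain-Python sketch, assuming the sparse tensor is given as two parallel lists (`sparse_reorder_sketch` is a hypothetical name, not the TensorFlow function):

```python
def sparse_reorder_sketch(indices, values):
    """Sort entries by index; the dense shape is untouched, so it is not needed.

    Hypothetical helper for illustration; not a TensorFlow function.
    """
    # Python compares lists element by element, which is lexicographic order.
    order = sorted(range(len(indices)), key=lambda i: indices[i])
    return [indices[i] for i in order], [values[i] for i in order]


indices = [[0, 3], [0, 1], [3, 1], [2, 0]]
values = ["b", "a", "d", "c"]
sorted_indices, sorted_values = sparse_reorder_sketch(indices, values)
for index, value in zip(sorted_indices, sorted_values):
    print(index, value)
```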
Example of use:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
a = tf.SparseTensor(indices = [[2, 1], [0, 3], [2, 0]], values=[1,2,3], shape=[3, 5])
b = tf.sparse_reorder(a)
sess = tf.Session()
print sess.run(b)
sess.close()
Input parameters:
- sp_input: A SparseTensor.
- name: (optional) A name prefix for the returned tensors.
Output parameters:
- A SparseTensor with the same shape and data type as the input, with its entries sorted by index.
Exceptions:
- TypeError: If sp_input is not a SparseTensor.
tf.sparse_retain(sp_input, to_retain, name=None)
Explanation: This function retains only the specified non-empty elements of a SparseTensor.
For example, if sp_input has shape [4, 5] and the following 4 non-empty values:
[0, 1]: a
[0, 3]: b
[2, 0]: c
[3, 1]: d
and to_retain = [True, False, False, True], then the output is a SparseTensor of shape [4, 5] that keeps the following two non-empty values:
[0, 1]: a
[3, 1]: d
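In plain Python, the same selection is a filter over the parallel index and value lists. The sketch below is illustrative only; `sparse_retain_sketch` is a hypothetical helper, and the dense shape is omitted because it does not change.

```python
def sparse_retain_sketch(indices, values, to_retain):
    """Keep entry i only when to_retain[i] is True.

    Hypothetical helper for illustration; not a TensorFlow function.
    """
    if len(to_retain) != len(values):
        raise ValueError("to_retain needs one flag per non-empty value")
    kept = [(idx, val) for idx, val, keep in zip(indices, values, to_retain) if keep]
    return [idx for idx, _ in kept], [val for _, val in kept]


indices = [[0, 1], [0, 3], [2, 0], [3, 1]]
values = ["a", "b", "c", "d"]
kept_indices, kept_values = sparse_retain_sketch(
    indices, values, [True, False, False, True])
print(kept_indices, kept_values)
```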
Example of use:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
a = tf.SparseTensor(indices = [[2, 1], [0, 3], [2, 0]], values=[1,2,3], shape=[3, 5])
b = tf.sparse_retain(a, [False, False, True])
sess = tf.Session()
print sess.run(b)
sess.close()
Input parameters:
- sp_input: A SparseTensor containing N non-empty elements.
- to_retain: A boolean vector of length N containing M True values.
Output parameters:
- A SparseTensor with the same shape as the input and M non-empty values, namely those at the positions where to_retain is True.
Exceptions:
- TypeError: If sp_input is not a SparseTensor.
tf.sparse_fill_empty_rows(sp_input, default_value, name=None)
Explanation: This function fills the empty rows of a two-dimensional SparseTensor with a default value.
If a row contains no elements, default_value is inserted at coordinate [row, 0] for that row.
For example, suppose sp_input has shape [5, 6] and the following non-empty values:
[0, 1]: a
[0, 3]: b
[2, 0]: c
[3, 1]: d
Rows 1 and 4 (counting from 0) contain no values, so default_value is inserted at coordinates [1, 0] and [4, 0], giving:
[0, 1]: a
[0, 3]: b
[1, 0]: default_value
[2, 0]: c
[3, 1]: d
[4, 0]: default_value
Note that the input may have empty columns at the end; this has no effect on the operation.
The output SparseTensor is sorted in ascending index order and has the same shape as the input.
This operation also returns a boolean indicator vector, in which a True entry means that default_value was added to that row:
empty_row_indicator[i] = True iff row i was an empty row.
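The filling rule and the indicator vector can be sketched together in plain Python. This is a minimal illustration under stated assumptions: `fill_empty_rows_sketch` is a hypothetical helper, the input is a two-dimensional sparse tensor given as (indices, values, shape), and "x" stands in for default_value.

```python
def fill_empty_rows_sketch(indices, values, shape, default_value):
    """Insert default_value at [row, 0] for every row without entries.

    Hypothetical helper for illustration; not a TensorFlow function.
    """
    num_rows = shape[0]
    occupied = {row for row, _col in indices}
    empty_row_indicator = [row not in occupied for row in range(num_rows)]
    entries = list(zip(indices, values))
    entries += [([row, 0], default_value)
                for row in range(num_rows) if empty_row_indicator[row]]
    entries.sort(key=lambda entry: entry[0])  # keep the output in index order
    out_indices = [e[0] for e in entries]
    out_values = [e[1] for e in entries]
    return out_indices, out_values, empty_row_indicator


filled_indices, filled_values, empty_rows = fill_empty_rows_sketch(
    indices=[[0, 1], [0, 3], [2, 0], [3, 1]],
    values=["a", "b", "c", "d"],
    shape=[5, 6],
    default_value="x",
)
for index, value in zip(filled_indices, filled_values):
    print(index, value)
print(empty_rows)
```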
Example of use:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
a = tf.SparseTensor(indices = [[2, 1], [0, 3], [2, 0]], values=[1,2,3], shape=[6, 5])
b, bb = tf.sparse_fill_empty_rows(a, 10)
sess = tf.Session()
print sess.run(b)
print '----'
print sess.run(bb)
sess.close()
Input parameters:
- sp_input: A SparseTensor with shape [N, M].
- default_value: The value to insert into empty rows; it has the same type as sp_input.
- name: (optional) A name prefix for the returned tensors.
Output parameters:
- sp_ordered_output: A SparseTensor with shape [N, M] in which every empty row has been filled with default_value.
- empty_row_indicator: A boolean vector of length N; entry i is True if row i was filled with default_value.
Exceptions:
- TypeError: If sp_input is not a SparseTensor.