spark-mllib: dense vectors and sparse vectors

MLlib supports local vectors and local matrices stored on a single machine, as well as distributed matrices backed by one or more RDDs.
Local vectors and local matrices are simple data models that serve as public interfaces. The underlying linear algebra operations are provided by Breeze.
A training example used in supervised learning is called a "labeled point" in MLlib.

Vectors, matrices, and labeled points are therefore the underlying data models of spark-mllib, and understanding them is the foundation for learning spark-mllib.
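As a small illustration of a training example paired with its label, the sketch below builds two labeled points. It assumes spark-mllib is on the classpath; the object name LabeledPointSketch and the label values are made up for this example.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Hypothetical wrapper object, used only to hold the example values.
object LabeledPointSketch {
  // A label (Double) paired with a dense feature vector.
  val positive: LabeledPoint = LabeledPoint(1.0, Vectors.dense(1.0, 0.0, 3.0))
  // The same features stored sparsely, with a different label.
  val negative: LabeledPoint = LabeledPoint(0.0, Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0)))
}
```

The features field of each labeled point is an MLlib local vector, so it can be dense or sparse.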

Local vector

A local vector has integer-typed, 0-based indices and double-typed values, and is stored on a single machine. MLlib supports two types of local vectors: dense vectors and sparse vectors.
A dense vector is backed by a double array that holds its entry values, while a sparse vector is backed by two parallel arrays: an array of indices and an array of values.

For example, the vector (1.0, 0.0, 3.0) can be represented in dense format as [1.0, 0.0, 3.0],
or in sparse format as (3, [0, 2], [1.0, 3.0]). Here 3 is the vector size and the vector's indices are 0, 1, 2: the element at index 0 is 1.0, the element at index 2 is 3.0, and the element at index 1 takes the default value 0.0.
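The two layouts can be sketched in plain Scala without Spark. This is an illustration of the idea only, not MLlib's implementation; the object SparseLayoutSketch and its helper names are hypothetical.

```scala
// Hypothetical helper object for illustration; not part of MLlib.
object SparseLayoutSketch {
  // Sparse layout as described above: (size, indices, values).
  type Sparse = (Int, Array[Int], Array[Double])

  // Keep only the nonzero entries of a dense array.
  def toSparse(dense: Array[Double]): Sparse = {
    val nonzero = dense.zipWithIndex.filter { case (v, _) => v != 0.0 }
    (dense.length, nonzero.map(_._2), nonzero.map(_._1))
  }

  // Rebuild the dense array; indices that are not listed stay at 0.0.
  def toDense(sparse: Sparse): Array[Double] = {
    val (size, indices, values) = sparse
    val dense = new Array[Double](size) // a new Double array is filled with 0.0
    indices.zip(values).foreach { case (i, v) => dense(i) = v }
    dense
  }
}
```

Calling SparseLayoutSketch.toSparse(Array(1.0, 0.0, 3.0)) yields size 3, indices [0, 2] and values [1.0, 3.0], and toDense reverses the conversion.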

As this shows, a sparse vector does not store the entries that have no value (they implicitly take the default), so it saves space and keeps the data set small. A dense vector stores every element: where an index has no value, the default value is stored in its place. The advantage is that the representation is clear, but the data set is relatively large.
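A rough way to see the space trade-off is to count payload bytes only. The sketch below assumes 8 bytes per Double value and 4 bytes per Int index and ignores object and array headers, so the numbers are estimates rather than measured sizes; the object name VectorStorageSketch is hypothetical.

```scala
// Hypothetical helper for a back-of-the-envelope estimate; not part of MLlib.
object VectorStorageSketch {
  // Dense: one 8-byte Double per element, whether or not it is zero.
  def denseBytes(size: Int): Long = 8L * size

  // Sparse: one 4-byte Int index plus one 8-byte Double per nonzero entry.
  def sparseBytes(nonzeros: Int): Long = (4L + 8L) * nonzeros
}
```

Under these assumptions, a vector of size 1,000,000 with 1,000 nonzero entries needs about 8,000,000 bytes dense and about 12,000 bytes sparse. For the tiny vector (1.0, 0.0, 3.0) both come to 24 bytes, so the sparse format only pays off when fewer than roughly two thirds of the entries are nonzero.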

The base class of local vectors is org.apache.spark.mllib.linalg.Vector, and Spark provides two implementations: DenseVector and SparseVector. Spark officially recommends using the factory methods in org.apache.spark.mllib.linalg.Vectors to create local vectors.
See the API documentation for Vector.scala and Vectors.scala for a detailed description.

Here I use the spark mllib API to define a dense vector and two sparse vectors:

import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Create a dense vector (1.0, 0.0, 3.0).
val dv: Vector = Vectors.dense(1.0, 0.0, 3.0)

// Create a sparse vector (1.0, 0.0, 3.0) by specifying its indices and values corresponding to nonzero entries.
val sv1: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

// Create a sparse vector (1.0, 0.0, 3.0) by specifying its nonzero entries as (index, value) pairs.
val sv2: Vector = Vectors.sparse(3, Seq((0, 1.0), (2, 3.0)))
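Once created, both kinds of vector can be read through the same Vector interface. The sketch below assumes spark-mllib is on the classpath and a Spark version that provides toSparse on Vector; the object LocalVectorInspection and the describe helper are hypothetical names for this example.

```scala
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector, Vectors}

// Hypothetical wrapper object, used only to hold the example.
object LocalVectorInspection {
  val dv: Vector = Vectors.dense(1.0, 0.0, 3.0)
  val sv: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

  // Describe the concrete storage behind a Vector.
  def describe(v: Vector): String = v match {
    case d: DenseVector =>
      s"dense, values=${d.values.mkString(",")}"
    case s: SparseVector =>
      s"sparse, size=${s.size}, indices=${s.indices.mkString(",")}, values=${s.values.mkString(",")}"
  }
}
```

With these definitions, sv(1) returns the default 0.0 even though index 1 is not stored, sv.toArray and dv.toArray hold the same elements, and dv.toSparse keeps only indices 0 and 2.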

Note: Scala refers to scala.collection.immutable.Vector by default, so you must explicitly import org.apache.spark.mllib.linalg.Vector to make sure you are using the MLlib Vector.
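If both Vector types are needed in the same file, one option is to rename the MLlib type on import. This is a minimal sketch assuming spark-mllib is on the classpath; the alias MLVector and the object VectorNameSketch are arbitrary names chosen for this example.

```scala
import org.apache.spark.mllib.linalg.{Vector => MLVector, Vectors}

// Hypothetical wrapper object, used only to hold the example values.
object VectorNameSketch {
  // With the rename import, the plain name Vector still refers to the Scala collection.
  val scalaVec: Vector[Double] = Vector(1.0, 0.0, 3.0)

  // The alias MLVector refers to org.apache.spark.mllib.linalg.Vector.
  val mlVec: MLVector = Vectors.dense(1.0, 0.0, 3.0)
}
```

Here scalaVec is an immutable Scala collection, while mlVec is the MLlib local vector that MLlib algorithms expect.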

Origin www.cnblogs.com/liuys635/p/12209935.html