After VectorAssembler combines multiple feature columns into one vector column, Spark stores each row in the more compact representation, so a row with many zero values is stored as a sparse vector (SparseVector).
However, the subsequent calculations here need dense vectors, so the sparse vectors have to be converted to dense vectors (DenseVector).
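To make the difference between the two representations concrete, here is a minimal plain-Python sketch. It is only an illustration and not Spark's implementation: a sparse vector is modelled as a size plus the indices and values of its non-zero entries, and the helper name `sparse_to_dense` is hypothetical.

```python
def sparse_to_dense(size, indices, values):
    """Expand a (size, indices, values) sparse description into a full list."""
    dense = [0.0] * size            # start with all zeros
    for i, v in zip(indices, values):
        dense[i] = v                # fill in the stored non-zero entries
    return dense


# A 5-element vector whose only non-zero entries are at positions 1 and 4.
dense = sparse_to_dense(5, [1, 4], [3.0, 7.0])
```

The sparse form stores only two index/value pairs, while the dense form stores all five elements, zeros included.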
1. First, use VectorAssembler to combine the required columns into a single vector column.
2. Then convert the DataFrame to an RDD and use a map operation to convert each element of the feature column to a DenseVector.