Solr DocValues

Sorting and faceting are much more efficient when the field values of each document are stored as DocValues.

Traditionally, Solr builds its index as an inverted index: it first builds a term list, and each term then maps to a list of documents. This structure makes queries very fast, because the term-to-document-list mapping is already prepared.
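The term-to-document-list idea can be sketched in a few lines of Python. This is only a toy illustration, not how Lucene stores its index; the corpus and the function name build_inverted_index are made up for the example.

```python
from collections import defaultdict

# Hypothetical toy corpus: document ID -> value of a single field.
docs = {
    0: "apple",
    1: "samsung",
    2: "apple",
    3: "belkin",
}

def build_inverted_index(docs):
    """Map each term to the ascending list of document IDs that contain it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        index[docs[doc_id]].append(doc_id)
    return dict(index)

inverted = build_inverted_index(docs)

# Answering a term query is a single dictionary lookup.
print(inverted["apple"])  # [0, 2]
```

Going the other way, from a document ID to its value, would require scanning every term, which is why this layout suits queries better than sorting or faceting.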

 

For other search features, such as sorting, faceting, and highlighting, the inverted index is not very efficient. For faceting, for example, the terms of every document in the result set must be looked up and the document IDs collected in order to build the facet list. All of this happens in memory, so when the amount of data is large it slows things down and takes up valuable memory.

 

Starting with Lucene 4.0, DocValues were introduced as a column-oriented storage structure. It is built at index time as a document-to-value mapping. This reduces the memory requirements of the fieldCache and makes faceting, sorting, and grouping faster.
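A rough way to picture the document-to-value mapping is a column indexed by document ID. The sketch below is a simplification under that assumption (a plain Python list standing in for the on-disk column); the names manu_column, facet_counts, and sort_by_field are hypothetical.

```python
from collections import Counter

# Hypothetical column for one field: position = document ID, entry = field value.
manu_column = ["apple", "samsung", "apple", "belkin"]

def facet_counts(column, matching_doc_ids):
    """Count field values over the matching documents, one lookup per document."""
    return Counter(column[doc_id] for doc_id in matching_doc_ids)

def sort_by_field(column, matching_doc_ids):
    """Order matching documents by field value, breaking ties by document ID."""
    return sorted(matching_doc_ids, key=lambda doc_id: (column[doc_id], doc_id))

hits = [3, 2, 1, 0]  # document IDs matched by some query
counts = facet_counts(manu_column, hits)
ordered = sort_by_field(manu_column, hits)

print(counts["apple"])  # 2
print(ordered)          # [0, 2, 3, 1]
```

Both operations touch only the column entries of the matching documents, without consulting the term list at all.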

 

  1. Enabling DocValues
Just add docValues="true" to the field (or field type) definition in schema.xml. For example:

<field name="manu_exact" type="string" indexed="false" stored="false" docValues="true" />

If the index was built before docValues was enabled on the field, the data must be re-indexed before DocValues can be used.

 

DocValues only work with specific field types, and the field type determines which Lucene docValues type is used. The available Solr field types are:

  • StrField and UUIDField.
      • If the field is single-valued (i.e., multiValued is false), Lucene will use the SORTED type.
      • If the field is multi-valued, Lucene will use the SORTED_SET type.
  • Any Trie* numeric fields, date fields, and EnumField.
      • If the field is single-valued (i.e., multiValued is false), Lucene will use the NUMERIC type.
      • If the field is multi-valued, Lucene will use the SORTED_SET type.

Storing multi-valued DocValues as SORTED_SET has two consequences:
  1. Values are returned in sorted order, not in the order of the original input.
  2. When the same value occurs more than once, it is returned only once.
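
The two effects on multi-valued fields can be mimicked in plain Python. This only imitates the visible behavior described above, not Lucene's actual storage format; as_sorted_set is a hypothetical helper, and Python's default string ordering is assumed as the sort order.

```python
def as_sorted_set(values):
    """Mimic the visible effects of SORTED_SET: drop duplicates, then sort."""
    return sorted(set(values))

# Values in the order they were supplied for one multi-valued field.
indexed = ["red", "blue", "red", "green"]
returned = as_sorted_set(indexed)

print(returned)  # ['blue', 'green', 'red']
```

If the input order or repeated values matter to the application, the field would have to be stored as well, since they cannot be recovered from this representation.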

 

  2. Retrieving fields with DocValues
Fields defined with stored="true" are returned in search results. In addition, useDocValuesAsStored controls whether field values can be returned from DocValues: if useDocValuesAsStored="true", docValues fields that are not set to stored="true" are returned as well.
