Solr schema writing guidelines

  1. Effect of uniqueKey: when a uniqueKey is configured and a doc is added, a doc whose uniqueKey matches an earlier doc overwrites that earlier doc.

If no uniqueKey is configured, nothing is overwritten. Updates also locate the doc to change by its uniqueKey.

So whenever docs are updated, it is recommended to configure a uniqueKey: the configuration is more complete, and the data is easier to troubleshoot.

The corresponding id field must be stored = true, indexed = true. The recommended type is int or long, not string.

If you need to sort by id, use TrieLongField; otherwise the ids are sorted in text order rather than numeric order.
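The overwrite rule and the sort-order problem can be illustrated with a small Python sketch. This is a toy model built on a plain dict, not Solr's implementation; the helper `add_doc` and the sample docs are hypothetical names chosen for illustration.

```python
def add_doc(index, doc, unique_key="id"):
    """Add doc to a dict-based toy index; a doc with the same key replaces the old one."""
    index[doc[unique_key]] = doc
    return index


index = {}
add_doc(index, {"id": 7, "title": "first version"})
add_doc(index, {"id": 7, "title": "second version"})  # same id: overwrites the first
add_doc(index, {"id": 9, "title": "another doc"})

# Sorting the same ids as numbers and as text gives different orders.
ids_as_numbers = [10, 9, 2]
ids_as_text = [str(i) for i in ids_as_numbers]
numeric_order = sorted(ids_as_numbers)
text_order = sorted(ids_as_text)
```

Here `numeric_order` is the order a numeric id type would give, while `text_order` shows why a string id sorts "10" before "2".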

  2. defaultSearchField: as the name implies, this is the field that is searched by default at query time. Its effect shows up as follows (here the defaultSearchField is title):

queryStr = content:abc 123 is equivalent to queryStr = content:abc title:123

queryStr = 123 is equivalent to queryStr = title:123

That is, whenever a query term does not name a field, it is searched in the defaultSearchField.

Because it is the defaultSearchField, that field must be indexed = true.

Note the difference between content:abc 123, content:"abc 123", and content:(abc 123): the first searches content for abc and the default field for 123, the second searches content for the phrase "abc 123", and the third searches content for the terms abc and 123.
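To make the three query forms concrete, here is a minimal Python sketch that expands a query string into (field, kind, text) clauses. It is a simplified stand-in for a real query parser, not Solr's code: the regular expression, the function `expand`, and the default field `title` are assumptions for illustration, and operators such as AND, OR, or boosts are ignored.

```python
import re

# Optional "field:" prefix, then a quoted phrase, a parenthesised group, or a bare term.
TOKEN = re.compile(r'(?:(\w+):)?(?:"([^"]*)"|\(([^)]*)\)|(\S+))')


def expand(query, default_field="title"):
    """Return a list of (field, kind, text) clauses for a simple query string."""
    clauses = []
    for field, phrase, group, term in TOKEN.findall(query):
        field = field or default_field  # unfielded clauses fall back to the default
        if phrase:
            clauses.append((field, "phrase", phrase))
        elif group:
            clauses.extend((field, "term", t) for t in group.split())
        else:
            clauses.append((field, "term", term))
    return clauses


mixed = expand("content:abc 123")
phrase = expand('content:"abc 123"')
grouped = expand("content:(abc 123)")
```

In this sketch only the first form sends 123 to the default field; the quoted and parenthesised forms keep both words in content.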

  3. All of int, sint, tint, long, slong, tlong, float, sfloat, tfloat, double, sdouble, and tdouble

do not support tokenization and do not need it. There is no sshort or tshort, only short. Each of these basic types holds a single value, so there is nothing to tokenize.

On int, long, and float fields, positionIncrementGap = 100 should not appear.

tlong, tint, and tdouble have the precisionStep, positionIncrementGap, and sortMissingLast = "true" attributes.

  4. Tokenization configuration

Every TextField can be tokenized.

Every TextField can be used for faceting.

Setting omitTermFreqAndPositions = "true" only has an effect on a TextField; it leaves term frequency and position information out of the index.

  5. omitNorms = "true" removes the parameter through which the length of this field affects the score, so the same term scores the same regardless of field length. According to Shannon's theory,

the longer the text a word appears in, or the more often it occurs, the less information it carries. If omitNorms = false, then

when doc1 contains a short text such as "Taobao" and doc2 a longer one such as "Hangzhou Taobao Network Limited", a query that hits Taobao scores doc1 higher than doc2.

Note: as soon as a single field has omitNorms = "false", space for norms is effectively reserved for every field, even where the norm content is empty.

So omitNorms only helps the index when every field sets omitNorms = "true".
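The length effect can be sketched in Python, assuming the classic Lucene-style length norm of 1 / sqrt(number of terms in the field) and ignoring boosts, term frequency, idf, and the lossy encoding of stored norms. The two sample texts and the helper names are assumptions for illustration only.

```python
import math


def length_norm(field_text):
    """Classic TF-IDF style length norm: 1 / sqrt(number of terms)."""
    return 1.0 / math.sqrt(len(field_text.split()))


def field_weight(field_text, omit_norms):
    """With norms omitted, field length no longer changes the weight."""
    return 1.0 if omit_norms else length_norm(field_text)


doc1 = "Taobao"
doc2 = "Hangzhou Taobao Network Limited"

with_norms = (field_weight(doc1, False), field_weight(doc2, False))
without_norms = (field_weight(doc1, True), field_weight(doc2, True))
```

With norms kept, the one-term field gets a larger weight than the four-term field; with norms omitted, both get the same weight.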

  6. required="true"

This attribute means that once a field in the schema has required = "true", the field must not be empty when the index is built; only then is the doc considered complete.

The current dump center assigns "" to null values, so a field is never left without a value. Even so, the schema should still set this attribute where,

logically, certain fields must always be present.

  7. multiValued="true"

This setting does not say whether a field holds one term or many. Even with multiValued = false, a text field can hold a very long piece of text, which means many terms.

What multiValued = "true" really means is this: when a doc is passed in for indexing, a field with multiValued = "true" can have content added to it repeatedly. Within one doc, the same key can then carry more than one value.

Under normal circumstances, as with a map, each key is unique, and the same key never appears with several different values.

In addition, a field configured with multiValued = "true" is returned in a hit document as a list rather than a single object.

On the index-building side of the current search platform, whether or not multiValued is configured has no influence on the dump process; it only decides whether a hit returns a list or a single object.

A deeper note: when the index is built with multiValued = "true", a new field instance is in effect created for each value, so the same field name can occur multiple times.

A query then searches every occurrence of that field name, which costs some retrieval performance; the cost becomes noticeable as the number of occurrences grows.
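The difference in shape between single-valued and multiValued fields can be modelled with a short Python sketch. This is a toy model of how key/value pairs collapse into one doc, not Solr's indexing code; `build_doc` and the field names `id` and `tag` are hypothetical.

```python
def build_doc(pairs, multi_valued):
    """Collapse (name, value) pairs into one doc; multiValued fields collect a list."""
    doc = {}
    for name, value in pairs:
        if name in multi_valued:
            doc.setdefault(name, []).append(value)  # keep adding to the same field
        elif name in doc:
            raise ValueError(f"field {name!r} is not multiValued but got a second value")
        else:
            doc[name] = value  # single object, map-like: one value per key
    return doc


doc = build_doc(
    [("id", 1), ("tag", "solr"), ("tag", "lucene")],
    multi_valued={"tag"},
)
```

In this model `doc["tag"]` comes back as a list while `doc["id"]` stays a single object, which mirrors the return shapes described above.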

8. Special delimiters

For requests to split on characters such as # ; : and the like, it is recommended to convert them all to spaces and tokenize on whitespace. Whitespace tokenization is native to the system and performs better.

There is no need to write and deploy custom code just for a #.
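One way to do the conversion before the data reaches the index is a small preprocessing step. The Python sketch below assumes the delimiters are exactly # ; and : and that the step runs on the client side; the function name `normalize_delimiters` is hypothetical.

```python
import re

# Assumed delimiter set; extend the character class if other separators occur.
DELIMITERS = re.compile(r"[#;:]+")


def normalize_delimiters(text):
    """Replace custom delimiters with spaces and collapse repeated whitespace."""
    return " ".join(DELIMITERS.sub(" ", text).split())


normalized = normalize_delimiters("red#green;blue:dark  grey")
tokens = normalized.split()  # what a whitespace tokenizer would then see
```

After this step a plain whitespace tokenizer is enough, so no custom tokenizer has to be deployed.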

  9. Sorting, range, and general queries

Sorting should use a numeric type; the trie types are recommended, and the older sortable types are also supported.

Range queries should use a numeric type; the trie types are recommended.

For general queries over a combination of several numbers, it is recommended to store the numbers as characters separated by spaces; arrays of numeric types are currently not supported.

  10. date, tdate, and other date types

When configuring date, tdate, and similar types, pay attention to the time format.

Also, we do not recommend storing the date directly; store a difference instead, as an int or similar type.

Depending on the precision chosen, the number of terms the field uses grows linearly, which is quite a serious matter.

The long tail consumes a very large amount of memory.

Linear growth of terms in the index is a very bad thing, and there is currently no special optimization for handling the long tail.
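Storing a difference instead of the date itself can be sketched in Python as follows. The reference point `BASE`, the helper `to_offset`, and the sample timestamps are assumptions for illustration; the point is only that a coarser unit yields far fewer distinct values, and therefore fewer terms.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical reference point; any fixed instant agreed on by indexer and client works.
BASE = datetime(2010, 1, 1, tzinfo=timezone.utc)


def to_offset(ts, unit_seconds=86400):
    """Return whole units elapsed since BASE (days by default) as an int."""
    return int((ts - BASE).total_seconds() // unit_seconds)


# 72 hourly timestamps spanning three days.
stamps = [BASE + timedelta(hours=h) for h in range(72)]
day_terms = {to_offset(ts) for ts in stamps}        # distinct values at day precision
second_terms = {to_offset(ts, 1) for ts in stamps}  # distinct values at second precision
```

At day precision the 72 timestamps collapse to 3 distinct values, while at second precision every timestamp stays distinct.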

11. Advanced practice

Check the quality of your schema yourself.

Once the schema configuration is complete, you can test it with terminatorquickstart and then use the luke tool to inspect the index structure.

This may turn up some problems, and there may be many places where the structure can be optimized.

Origin blog.51cto.com/14333512/2404878