Solr schema.xml Configuration File Explained

<?xml version="1.0" encoding="UTF-8" ?>

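<!--
This is the Solr schema file. It should be named "schema.xml" and placed in the solrhome/core/conf directory,
or somewhere the classloader of the Solr webapp can find it.

Performance note: the following can improve performance.

Set stored="false" on fields that only need to be searched and never returned.
Set indexed="false" on fields that are only returned and never searched.
Remove all unneeded copyField declarations.
For the best index size and search performance, set indexed="false" on all general text fields, use copyField to copy them into a single catch-all field, and search on that field.

Run the JVM in server mode and use a higher logging level to avoid logging every request.
-->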
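<!-- A minimal sketch of the first two tips above. The field names
     "search_only_note" and "display_only_url" are hypothetical and are
     not part of this schema; "string" is assumed to be an untokenized
     solr.StrField type.
     Example:
<field name="search_only_note" type="string" indexed="true" stored="false"/>
<field name="display_only_url" type="string" indexed="false" stored="true"/>
     The first field can be queried but is never returned; the second is
     returned with results but cannot be searched.
-->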

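<!-- Field names should consist of alphanumeric or underscore characters only and should not start with a digit. This is not currently strictly enforced, but other field names will not have first-class support from all components, and backward compatibility is not guaranteed. Names with both leading and trailing underscores (e.g. _version_) are reserved. -->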
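<!--

In this data_driven_schema_configs configset, the following three fields are required:
id, _version_, and _text_. All other fields can be removed or modified, and added manually
to the XML as needed.
Note that many dynamic fields are also defined; you can use them to specify a
field's type through field naming conventions (see below).
WARNING: the _text_ catch-all field will significantly increase your index size.
If you don't need it, consider removing it and the corresponding copyField directive. -->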

<field name="id" type="long" indexed="true" stored="true" required="true"/>
<!-- Regular fields -->
<field name="informer_id" type="long" indexed="true" stored="false"/>
<field name="phone_number" type="string" indexed="true" stored="false"/>

<field name="title" type="string" indexed="true" stored="true" />
<field name="content" type="string" indexed="true" stored="true" />
<field name="latitude" type="string" indexed="true" stored="true" />
<field name="longitude" type="string" indexed="true" stored="true" />
<field name="attachment" type="string" indexed="true" stored="true" />

<field name="clue_status" type="int" indexed="true" stored="true" />
<field name="del_flag" type="int" indexed="true" stored="true" />
<field name="gmt_create" type="date" indexed="true" stored="true" />
<field name="create_uid" type="long" indexed="true" stored="true" />
<field name="gmt_modified" type="date" indexed="true" stored="true" />
<field name="modified_uid" type="long" indexed="true" stored="true" />
<!-- Reserved fields -->
<!--<field name="id" type="string" indexed="true" stored="true"  multiValued="false" />-->
<field name="_version_" type="long" indexed="true" stored="false"/>
<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>

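<!-- Copy fields -->
<!-- It is recommended to define a copy field that gathers all full-text fields into a single field so they can be searched together.
    Keep in mind: if you copy a single source field and that field is multi-valued, the destination field must be multi-valued as well. If you copy several source fields and any of them is multi-valued, the destination must be multi-valued too. In practice, a destination that receives values from more than one source field has to be multi-valued, since each source contributes its own value.
-->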
<copyField source="*" dest="_text_"/>
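<!-- As an alternative to the catch-all rule above, individual fields can
     be copied explicitly. This is only a sketch: it assumes that "title"
     and "content" are the fields worth searching as full text, and the
     maxChars limit of 30000 is an arbitrary illustrative value.
     Example:
<copyField source="title" dest="_text_"/>
<copyField source="content" dest="_text_" maxChars="30000"/>
     Because _text_ uses text_general, the copied values are tokenized at
     index time even if the source fields use an untokenized type (which
     "string" is assumed to be here).
-->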

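<!-- Dynamic fields -->
<!-- Dynamic fields are configured with the same attributes as regular fields. The main difference is that the name attribute may contain a wildcard: with name="*_i", for example, any field whose name ends in _i matches. This way documents containing fields without an explicit declaration can still be indexed. -->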

<dynamicField name="*_i"  type="int"    indexed="true"  stored="true"/>
<dynamicField name="*_is" type="ints"    indexed="true"  stored="true"/>
<dynamicField name="*_s"  type="string"  indexed="true"  stored="true" />
<dynamicField name="*_ss" type="strings"  indexed="true"  stored="true"/>
<dynamicField name="*_l"  type="long"   indexed="true"  stored="true"/>
<dynamicField name="*_ls" type="longs"   indexed="true"  stored="true"/>
<dynamicField name="*_t"   type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_b"  type="boolean" indexed="true" stored="true"/>
<dynamicField name="*_bs" type="booleans" indexed="true" stored="true"/>
<dynamicField name="*_f"  type="float"  indexed="true"  stored="true"/>
<dynamicField name="*_fs" type="floats"  indexed="true"  stored="true"/>
<dynamicField name="*_d"  type="double" indexed="true"  stored="true"/>
<dynamicField name="*_ds" type="doubles" indexed="true"  stored="true"/>
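
<!-- To illustrate the naming convention, here is a sketch of an XML update
     message (sent to Solr, not placed in this file). The names "views_i",
     "tags_ss" and "summary_t" are made up for the example; each one is
     matched by its suffix against the patterns above.
     Example:
<add>
  <doc>
    <field name="id">1</field>
    <field name="views_i">42</field>
    <field name="tags_ss">solr</field>
    <field name="tags_ss">schema</field>
    <field name="summary_t">Some free text to be tokenized</field>
  </doc>
</add>
     Here views_i is handled as "int", tags_ss as the multi-valued "strings"
     type, and summary_t as "text_general".
-->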

<!-- Field types -->
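<!--

  StrField: an untokenized string field. It supports docValues, but once docValues is enabled the field must be single-valued and must either be required or have a default value.
  BoolField: a boolean field holding true/false.
  TrieIntField, TrieFloatField, TrieLongField, TrieDoubleField: the default numeric field types. The precisionStep attribute matters mainly for numeric range queries: the smaller the value, the more tokens are generated per value at index time, which enlarges the index on disk but speeds up range queries. The positionIncrementGap attribute sets the gap between successive values of a multi-valued field; it has no effect on single-valued fields.
  TrieDateField: a date field type. Unfortunately it only accepts dates in the form 1995-12-31T23:59:59Z, which is inconvenient, so I wrote a custom TrieCNDateField type that supports the yyyy-MM-dd HH:mm:ss format commonly used in China. See my previous blog post for the source code.
  BinaryField: a field for binary data, which must be Base64-encoded before Solr can index it.
  RandomSortField: use this type when you need pseudo-random ordering.
  TextField: the most commonly used type. It is tokenized, so it normally needs an analyzer configuration. How to configure the IK analyzer for it is not covered here.
-->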

<!--
  Default numeric field types. For faster range queries, consider the tint/tfloat/tlong/tdouble types.
  These fields support docValues, but should be single-valued.
-->
<fieldType name="int" class="solr.TrieIntField" docValues="true" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" docValues="true" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" docValues="true" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" docValues="true" precisionStep="0" positionIncrementGap="0"/>

<fieldType name="ints" class="solr.TrieIntField" docValues="true" precisionStep="0" positionIncrementGap="0" multiValued="true"/>
<fieldType name="floats" class="solr.TrieFloatField" docValues="true" precisionStep="0" positionIncrementGap="0" multiValued="true"/>
<fieldType name="longs" class="solr.TrieLongField" docValues="true" precisionStep="0" positionIncrementGap="0" multiValued="true"/>
<fieldType name="doubles" class="solr.TrieDoubleField" docValues="true" precisionStep="0" positionIncrementGap="0" multiValued="true"/>

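<!--
  Numeric field types indexed at several levels of precision (precisionStep="8") to speed up range queries.
-->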
<fieldType name="tint" class="solr.TrieIntField" docValues="true" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" docValues="true" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tlong" class="solr.TrieLongField" docValues="true" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" docValues="true" precisionStep="8" positionIncrementGap="0"/>

<fieldType name="tints" class="solr.TrieIntField" docValues="true" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
<fieldType name="tfloats" class="solr.TrieFloatField" docValues="true" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
<fieldType name="tlongs" class="solr.TrieLongField" docValues="true" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
<fieldType name="tdoubles" class="solr.TrieDoubleField" docValues="true" precisionStep="8" positionIncrementGap="0" multiValued="true"/>

<!-- Date field types.
     Note: values must use the form 1995-12-31T23:59:59Z. -->
<fieldType name="date" class="solr.TrieDateField" docValues="true" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="dates" class="solr.TrieDateField" docValues="true" precisionStep="0" positionIncrementGap="0" multiValued="true"/>
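
<!-- A sketch of how a value for the "gmt_create" field declared earlier
     might be submitted in an XML update message. The timestamp is
     illustrative; it follows the 1995-12-31T23:59:59Z pattern, i.e. UTC
     with a trailing Z.
     Example:
<add>
  <doc>
    <field name="id">1</field>
    <field name="gmt_create">2016-08-01T10:30:00Z</field>
  </doc>
</add>
     A value such as "2016-08-01 10:30:00" would not be accepted by
     TrieDateField and has to be converted by the client first.
-->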

<!-- A trie-based date field for faster date range queries and date faceting. -->
<fieldType name="tdate" class="solr.TrieDateField" docValues="true" precisionStep="6" positionIncrementGap="0"/>

<fieldType name="tdates" class="solr.TrieDateField" docValues="true" precisionStep="6" positionIncrementGap="0" multiValued="true"/>


<!-- Binary data type. The data should be sent and retrieved as Base64-encoded strings. -->
<fieldType name="binary" class="solr.BinaryField"/>

<!-- The "RandomSortField" is not used to store or search any
     data.  You can declare fields of this type in your schema
     to generate pseudo-random orderings of your docs for sorting 
     or function purposes.  The ordering is generated based on the field
     name and the version of the index. As long as the index version
     remains unchanged, and the same field name is reused,
     the ordering of the docs will be consistent.  
     If you want different pseudo-random orderings of documents,
     for the same version of the index, use a dynamicField and
     change the field name in the request.
 -->
<fieldType name="random" class="solr.RandomSortField" indexed="true" />
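
<!-- One possible way to use this type, modelled on the stock Solr example
     schema (it is not declared in the schema shown here):
<dynamicField name="random_*" type="random"/>
     A request could then sort with, for instance, sort=random_1234 desc,
     and change the numeric suffix (an arbitrary seed chosen by the client)
     to obtain a different ordering.
-->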

<!-- solr.TextField allows the specification of custom text analyzers
     specified as a tokenizer and a list of token filters. Different
     analyzers may be specified for indexing and querying.

     The optional positionIncrementGap puts space between multiple fields of
     this type on the same document, with the purpose of preventing false phrase
     matching across fields.

     For more info on customizing your analyzer chain, please see
     http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
 -->

<!-- One can also specify an existing Analyzer class that has a
     default constructor via the class attribute on the analyzer element.
     Example:
<fieldType name="text_greek" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
</fieldType>
-->

<!-- A text field that only splits on whitespace for exact matching of words -->
<dynamicField name="*_ws" type="text_ws"  indexed="true"  stored="true"/>
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- A general text field that has reasonable, generic
     cross-language defaults: it tokenizes with StandardTokenizer,
       removes stop words from case-insensitive "stopwords.txt"
       (empty by default), and down cases.  At query time only, it
       also applies synonyms.
  -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- A text field with defaults appropriate for English: it
     tokenizes with StandardTokenizer, removes English stop words
     (lang/stopwords_en.txt), down cases, protects words from protwords.txt, and
     finally applies Porter's stemming.  The query time analyzer
     also applies synonyms from synonyms.txt. -->
<dynamicField name="*_txt_en" type="text_en"  indexed="true"  stored="true"/>
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
        />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
      -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
    />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
      -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<!-- A text field with defaults appropriate for English, plus
     aggressive word-splitting and autophrase features enabled.
     This field is just like text_en, except it adds
     WordDelimiterFilter to enable splitting and matching of
     words on case-change, alpha numeric boundaries, and
     non-alphanumeric chars.  This means certain compound word
     cases will work, for example query "wi fi" will match
     document "WiFi" or "wi-fi".
-->
<dynamicField name="*_txt_en_split" type="text_en_splitting"  indexed="true"  stored="true"/>
<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
    />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
    />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Less flexible matching, but fewer false matches.  Probably not ideal for product names,
     but may be good for SKUs.  Can insert dashes in the wrong place and still match. -->
<dynamicField name="*_txt_en_split_tight" type="text_en_splitting_tight"  indexed="true"  stored="true"/>
<fieldType name="text_en_splitting_tight" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <!-- this filter can remove any duplicate tokens that appear at the same position - sometimes
         possible with WordDelimiterFilter in conjunction with stemming. -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Just like text_general except it reverses the characters of
       each token, to enable more efficient leading wildcard queries.
-->
<dynamicField name="*_txt_rev" type="text_general_rev"  indexed="true"  stored="true"/>
<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<dynamicField name="*_phon_en" type="phonetic_en"  indexed="true"  stored="true"/>
<fieldType name="phonetic_en" stored="false" indexed="true" class="solr.TextField" >
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
</fieldType>

<!-- lowercases the entire field value, keeping it as a single token.  -->
<dynamicField name="*_s_lower" type="lowercase"  indexed="true"  stored="true"/>
<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

<!-- 
  Example of using PathHierarchyTokenizerFactory at index time, so
  queries for paths match documents at that path, or in descendent paths
-->
<dynamicField name="*_descendent_path" type="descendent_path"  indexed="true"  stored="true"/>
<fieldType name="descendent_path" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
  </analyzer>
</fieldType>
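
<!-- A sketch of the behaviour described above, using a hypothetical field
     "doc_descendent_path" that is matched by the dynamic field pattern.
     Example update message (sent to Solr, not placed in this file):
<add>
  <doc>
    <field name="id">1</field>
    <field name="doc_descendent_path">/books/tech/solr</field>
  </doc>
</add>
     At index time the value is split into the tokens /books, /books/tech
     and /books/tech/solr, so a query for /books/tech matches this document
     while a query for /books/other does not.
-->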

<!--
  Example of using PathHierarchyTokenizerFactory at query time, so
  queries for paths match documents at that path, or in ancestor paths
-->
<dynamicField name="*_ancestor_path" type="ancestor_path"  indexed="true"  stored="true"/>
<fieldType name="ancestor_path" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
  </analyzer>
</fieldType>

<!-- since fields of this type are by default not stored or indexed,
     any data added to them will be ignored outright.  --> 
<fieldType name="ignored" stored="false" indexed="false" docValues="false" multiValued="true" class="solr.StrField" />

<!-- This point type indexes the coordinates as separate fields (subFields)
  If subFieldType is defined, it references a type, and a dynamic field
  definition is created matching *___<typename>.  Alternately, if 
  subFieldSuffix is defined, that is used to create the subFields.
  Example: if subFieldType="double", then the coordinates would be
    indexed in fields myloc_0___double,myloc_1___double.
  Example: if subFieldSuffix="_d" then the coordinates would be indexed
    in fields myloc_0_d,myloc_1_d
  The subFields are an implementation detail of the fieldType, and end
  users normally should not need to know about them.
 -->
<dynamicField name="*_point" type="point"  indexed="true"  stored="true"/>
<fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>

<!-- A specialized field for geospatial search. If indexed, this fieldType must not be multivalued. -->
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
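
<!-- A sketch of how the separate "latitude" and "longitude" string fields
     could instead be represented as one geospatial field. The field name
     "clue_location" is hypothetical. LatLonType keeps its two coordinates
     in sub-fields named with subFieldSuffix, so a matching dynamic field is
     assumed to be needed as well:
<field name="clue_location" type="location" indexed="true" stored="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
     Values are supplied as "lat,lon", for example 30.27,120.15.
-->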

<!-- An alternative geospatial field type new to Solr 4.  It supports multiValued and polygon shapes.
  For more information about this and other Spatial fields new to Solr 4, see:
  http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
-->
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="true" distErrPct="0.025" maxDistErr="0.001" distanceUnits="kilometers" />

<!-- Money/currency field type. See http://wiki.apache.org/solr/MoneyFieldType
    Parameters:
      defaultCurrency: Specifies the default currency if none specified. Defaults to "USD"
      precisionStep:   Specifies the precisionStep for the TrieLong field used for the amount
      providerClass:   Lets you plug in other exchange provider backend:
                       solr.FileExchangeRateProvider is the default and takes one parameter:
                         currencyConfig: name of an xml file holding exchange rates
                       solr.OpenExchangeRatesOrgProvider uses rates from openexchangerates.org:
                         ratesFileLocation: URL or path to rates JSON file (default latest.json on the web)
                         refreshInterval: Number of minutes between each rates fetch (default: 1440, min: 60)
-->
<fieldType name="currency" class="solr.CurrencyField" precisionStep="8" defaultCurrency="USD" currencyConfig="currency.xml" />
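
<!-- A hypothetical field using this type (the name "reward_amount" is made
     up for illustration):
<field name="reward_amount" type="currency" indexed="true" stored="true"/>
     Values are assumed to be written as amount,currencyCode, for example
     10.50,USD; when the code is omitted the defaultCurrency applies.
-->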
       
<!-- some examples for different languages (generally ordered by ISO code) -->
<!-- Armenian -->
<dynamicField name="*_txt_hy" type="text_hy"  indexed="true"  stored="true"/>
<fieldType name="text_hy" class="solr.TextField" positionIncrementGap="100">
  <analyzer> 
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_hy.txt" />
    <filter class="solr.SnowballPorterFilterFactory" language="Armenian"/>
  </analyzer>
</fieldType>
