Defining Elasticsearch mappings


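Example 1:
An order number such as ATTS000928732 should not be tokenized, so set index: not_analyzed.
If the order number consists only of digits, e.g. 63745345637, analyzing it is acceptable.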

PUT /order_v5
{
  "settings": {
     // Use 10 shards. A shard is loosely comparable to one partition of a partitioned database table, though the analogy may not be exact.
     "number_of_shards": 10
  }, 
  "mappings": {
    "trades": {
      "_id": {
        "path": "id"
      },
      "properties": {
        "id": {
          // id: auto-increment number
          // Requirement: must be searchable
          "type": "integer",
          "store": true
        },
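        "name": {
          // Name: e.g. 佳洁士 (Crest), Johnson's baby body wash, 100W LED energy-saving lamp, outdoor multi-purpose folding chair
          // Requirement: match on keywords, e.g. Crest + toothpaste or toothbrush; Johnson's + body wash; LED + energy-saving + 100W; outdoor + folding chair
          // Conclusion: if the field is analyzed, a brand name such as "佳洁士" may be split into several tokens; if it is not analyzed, the user's input has to match the stored value exactly. Start with the default (analyzed) and see how it behaves.
          "type": "string"
        },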
        "brand": { // Brand: e.g. PG, P&G, 宝洁集团 (P&G Group), 宝洁股份 (P&G Co.), 联想集团 (Lenovo Group), 联想电脑 (Lenovo Computers)
          "type": "string"
        },
        "orderNo": { // Order number, e.g. ATTS000928732
          "type": "string",
          "index":  "not_analyzed"
        },
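        "description": {
          // Description: e.g. "2015 rose-scented Johnson's baby body wash, 550 ml, free shipping"
          // Search: highlighting is required, hence store: true. Keyword weight: body wash -> {Johnson's + body wash or rose + body wash or 550 ml + body wash or body wash + free shipping -> {2015 + rose scent...}}
          // Setting: must be analyzed, and the analysis has to be tuned carefully
          "type": "string",
          "store": true
        },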
        "date": {
          "type": "date"
        },
        "city": {
          "type": "string"
        },
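        "qty": {
          // Quantity: float; the analyzed/not_analyzed index setting has no effect on numeric fields
          "type": "float"
        },
        "price": {
          // Price: float; the analyzed/not_analyzed index setting has no effect on numeric fields
          "type": "float"
        }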
      }
    }
  }
}
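
The reasoning behind not_analyzed for orderNo can be illustrated with a small simulation. This is only a sketch: toy_analyze is a hypothetical stand-in that lowercases and splits on non-alphanumeric characters, which roughly approximates a default analyzer but is not Elasticsearch's implementation. It shows why an exact lookup for ATTS000928732 can miss on an analyzed field while an all-digit order number still matches.

```python
import re


def toy_analyze(text):
    """Hypothetical analyzer: lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def index_terms(value, analyzed):
    """Terms stored in the (simulated) inverted index for one field value."""
    return set(toy_analyze(value)) if analyzed else {value}


def term_matches(query_term, value, analyzed):
    """A term lookup compares the query term verbatim with the indexed terms."""
    return query_term in index_terms(value, analyzed)


# Mixed-case order number: only the not_analyzed field keeps the exact value.
exact_hit = term_matches("ATTS000928732", "ATTS000928732", analyzed=False)
analyzed_hit = term_matches("ATTS000928732", "ATTS000928732", analyzed=True)

# All-digit order number: analysis leaves it unchanged, so either setting works.
digits_hit = term_matches("63745345637", "63745345637", analyzed=True)
```

Under these assumptions the analyzed field stores atts000928732, so a verbatim lookup for the upper-case value fails, whereas the digit-only value survives analysis intact.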

Example 2
Defining the mapping
When adding a mapping to an index, the analyzers can be specified like this:
{
   "page":{
      "properties":{
         "title":{
            "type":"string",
            "indexAnalyzer":"ik",
            "searchAnalyzer":"ik"
         },
         "content":{
            "type":"string",
            "indexAnalyzer":"ik",
            "searchAnalyzer":"ik"
         }
      }
   }
}
indexAnalyzer is the analyzer used at index time; searchAnalyzer is the analyzer used at search time.

The equivalent mapping built in Java:

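import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

// Builds the same "page" mapping as the JSON above.
// Note: jsonBuilder() throws IOException, so call this from code that handles it.
XContentBuilder content = XContentFactory.jsonBuilder().startObject()
        .startObject("page")
          .startObject("properties")
            .startObject("title")
              .field("type", "string")
              .field("indexAnalyzer", "ik")
              .field("searchAnalyzer", "ik")
            .endObject()
            .startObject("content")
              .field("type", "string")
              .field("indexAnalyzer", "ik")
              .field("searchAnalyzer", "ik")
            .endObject()
          .endObject()
        .endObject()
      .endObject();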

To test the analyzer, call the API below. indexname is the name of an index; any existing index will do.
http://localhost:9200/indexname/_analyze?analyzer=ik&text=测试elasticsearch分词器
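
When the request is issued from code rather than pasted into a browser, the non-ASCII text parameter needs to be percent-encoded. The following is a minimal sketch using only the Python standard library; the helper name analyze_url is made up for illustration, and the host, index name, and analyzer are simply the values from the URL above.

```python
from urllib.parse import urlencode


def analyze_url(host, index, analyzer, text):
    """Build the _analyze URL with a properly percent-encoded query string."""
    query = urlencode({"analyzer": analyzer, "text": text})
    return f"http://{host}/{index}/_analyze?{query}"


url = analyze_url("localhost:9200", "indexname", "ik", "测试elasticsearch分词器")
```

The resulting string can be passed to any HTTP client; urlencode encodes the text as UTF-8 by default.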

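A typical Elasticsearch mapping configuration and query example

Elasticsearch mapping configuration example
Suppose we are building a search engine for Chinese-language news. Each news item has five fields: title, content, author, category, and publish date.
We want to provide basic features such as searching over title and content, sorting, highlighting, statistics, and filtering.
ES offers the smartcn Chinese analysis plugin; for testing, the IK analysis plugin is recommended.
In the body below, properties holds the mapping itself, with the five fields.
type gives the field type. The content and title fields need to be analyzed and highlighted, so they are given an analyzer and have term_vector enabled.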
{
"news": {
    "properties": {
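      "content": {# content
        "type": "string", # field type
        "store": "no", # whether the field is stored
        "term_vector": "with_positions_offsets",# enable term vectors, used for highlighting
        "index_analyzer": "ik",# analyzer used at index time
        "search_analyzer": "ik"# analyzer used at search time
      },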
      },
      "title": {
        "type": "string",
        "store": "no",
        "term_vector": "with_positions_offsets",
        "index_analyzer": "ik",
        "search_analyzer": "ik",
        "boost": 5
      },
      "author": {
        "type": "string",
        "index": "not_analyzed"# this field is not analyzed
      },
      "publish_date": {
        "type": "date",
        "format": "yyyy/MM/dd",
        "index": "not_analyzed"# this field is not analyzed
      },
      "category": {
        "type": "string",
        "index": "not_analyzed"# this field is not analyzed
      }
    }
}
}
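
The # annotations in the mapping above are explanatory only and would not parse as JSON. One way to produce the same body without them is to build it in code and serialize it. This is a sketch under the assumption that the five fields and their settings are exactly those listed above; the helper names analyzed_text and not_analyzed are made up for illustration.

```python
import json


def analyzed_text(analyzer="ik", boost=None):
    """Analyzed string field with term vectors enabled for highlighting."""
    field = {
        "type": "string",
        "store": "no",
        "term_vector": "with_positions_offsets",
        "index_analyzer": analyzer,
        "search_analyzer": analyzer,
    }
    if boost is not None:
        field["boost"] = boost
    return field


def not_analyzed(field_type="string", **extra):
    """Field indexed as a single exact value, with optional extra settings."""
    return {"type": field_type, "index": "not_analyzed", **extra}


news_mapping = {
    "news": {
        "properties": {
            "content": analyzed_text(),
            "title": analyzed_text(boost=5),
            "author": not_analyzed(),
            "publish_date": not_analyzed("date", format="yyyy/MM/dd"),
            "category": not_analyzed(),
        }
    }
}

# Comment-free JSON text that could be sent as the mapping request body.
mapping_json = json.dumps(news_mapping, indent=2)
```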

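Query example. The request body consists of several parts:

pagination: from/size; returned fields: fields; sorting: sort; query: query; filtering: filter; highlighting: highlight; statistics: facets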
{
"from": 0,
"size": 10,
"fields": [
    "title",
    "content",
    "publish_date",
    "category",
    "author"
],
"sort": [
    {
      "publish_date": {
        "order": "asc"
      }
    },
    "_score"
],
"query": {
    "bool": {
      "should": [
        {
          "term": {
            "title": "中国"
          }
        },
        {
          "term": {
            "content": "中国"
          }
        }
      ]
    }
},
"filter": {
    "range": {
      "publish_date": {
        "from": "2010/07/01",
        "to": "2010/07/21",
        "include_lower": true,
        "include_upper": false
      }
    }
},
"highlight": {
    "pre_tags": [
      "<tag1>",
      "<tag2>"
    ],
    "post_tags": [
      "</tag1>",
      "</tag2>"
    ],
    "fields": {
      "title": {},
      "content": {}
    }
},
"facets": {
    "cate": {
      "terms": {
        "field": "category"
      }
    }
}
}
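The response contains each of the requested parts.
Note that facets are computed over the documents matched by the query, while filter only narrows the returned hits, so filter does not affect the facets. To count only the documents that match a filter, use a filter facet.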
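
The interaction between query, filter, and facets can be sketched with a small in-memory simulation. This is not how Elasticsearch is implemented; the search function and the sample documents are hypothetical and only model the semantics described above: facet counts come from the query matches, and the filter is applied to the hits afterwards.

```python
from collections import Counter


def search(docs, query, post_filter, facet_field):
    """Return (hits, facets): facets ignore post_filter, hits respect it."""
    matched = [d for d in docs if query(d)]
    facets = dict(Counter(d[facet_field] for d in matched))
    hits = [d for d in matched if post_filter(d)]
    return hits, facets


# Hypothetical sample data, using the yyyy/MM/dd date format from the mapping.
docs = [
    {"title": "中国经济", "category": "finance", "publish_date": "2010/07/05"},
    {"title": "中国足球", "category": "sports", "publish_date": "2010/08/01"},
    {"title": "世界杯", "category": "sports", "publish_date": "2010/07/10"},
]

hits, facets = search(
    docs,
    query=lambda d: "中国" in d["title"],
    # Fixed-width dates compare correctly as strings; upper bound is exclusive.
    post_filter=lambda d: "2010/07/01" <= d["publish_date"] < "2010/07/21",
    facet_field="category",
)
```

Here the August article is counted in the sports facet even though the date filter removes it from the hits, which mirrors the behavior the note above describes.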
