Deleting a field in Elasticsearch

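Elasticsearch is a document store, so once a field's mapping has been set it cannot simply be changed or removed in place. In practice you build a new index and copy the data into it. I found plenty of examples online, but many of them only tell half the story.
For example, the article "elasticsearch mapping 添加编辑删除字段" (adding, editing and deleting fields in an Elasticsearch mapping) says you must strip the field's data out of the exported JSON before importing it again, and then stops there. That is genuinely frustrating.
The article "elasticsearch更改mapping(不停服务重建索引)" (changing an Elasticsearch mapping by rebuilding the index without stopping the service) gave me the general idea. But when it came to iterating over all of the data in the old index, the link it pointed to was broken, so I couldn't tell how it was actually done. Annoying, isn't it? None of this is really complicated; it only looks that way until you know how, and there is nothing to gain by keeping it to yourself.
Taking a different approach, I looked up "Elasticsearch索引管理-reindex重建索引" (Elasticsearch index management: rebuilding an index with reindex), and things started to fall into place.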
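For reference, Elasticsearch 5.x also has a server-side _reindex API that can copy an index and drop a field in a single request, as an alternative to the client-side copy in step 3. The helper below is a minimal sketch that builds such a request body; the function name and the placeholder field `old_field` are my own, and it assumes only plain top-level field names. Releases before 5.6 expect the script key `inline` rather than `source`.

```python
import re

# Only plain top-level field names are accepted, so nothing needs escaping
_SAFE_FIELD = re.compile(r'[A-Za-z0-9_@-]+')


def reindex_body(source_index, dest_index, drop_fields):
    """Build a body for POST _reindex that copies every document from
    source_index to dest_index and removes drop_fields from each _source."""
    if not drop_fields:
        raise ValueError('drop_fields must not be empty')
    for field in drop_fields:
        if not _SAFE_FIELD.fullmatch(field):
            raise ValueError('unsupported field name: %r' % (field,))
    # One Painless remove() call per field, e.g. ctx._source.remove('old_field');
    script = ' '.join("ctx._source.remove('%s');" % f for f in drop_fields)
    return {
        'source': {'index': source_index},
        'dest': {'index': dest_index},
        # Releases before 5.6 expect the key "inline" instead of "source"
        'script': {'lang': 'painless', 'source': script},
    }
```

With the 5.x/6.x Python client this could be sent as `es.reindex(body=reindex_body('edata', 'edata_new', ['old_field']))`. Create the destination index with its new mapping first (step 2); otherwise it is created with dynamic mapping. Documents that lack the field are not a problem, because `remove()` on a missing key simply returns null.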
1. Create an index alias
edata is the index to be changed; give it the alias edata_v1.

POST _aliases
{
  "actions": [
    { "add": {
        "alias": "edata_v1",
        "index": "edata"
    }}
  ]
}
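Before moving on, it can help to confirm which index the alias really points to. `GET _alias/edata_v1` returns JSON shaped like `{"edata": {"aliases": {"edata_v1": {}}}}`; the small helper below (a hypothetical name, not part of any library) extracts the index names from such a response.

```python
def indices_for_alias(get_alias_response, alias):
    """Return the sorted names of indices that carry `alias`, given the JSON
    from GET _alias/<alias>, shaped {index: {"aliases": {alias: {...}}}}."""
    return sorted(
        index
        for index, info in get_alias_response.items()
        # Filtering on the alias also makes this work on the output of GET _alias
        if alias in info.get('aliases', {})
    )
```

With the 5.x/6.x Python client, which returns a plain dict, this would be `indices_for_alias(es.indices.get_alias(name='edata_v1'), 'edata_v1')`, and after step 1 it should give `['edata']`.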

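2. Create the new index
Note the difference between text and keyword here. A keyword field is indexed as one exact value without being analyzed (tokenized). A text field is analyzed into individual terms for full-text search. The text/keyword split is new in Elasticsearch 5.x, where it replaced the old string type. Since I started using ES with version 5, I won't go into the history here.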
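To make the difference concrete, here is a toy sketch of the terms each field type would index. It only approximates the standard analyzer for Latin text (the real one follows Unicode word-boundary rules and, for example, splits CJK text into single characters); both function names are illustrative.

```python
import re


def keyword_terms(value):
    """keyword: the whole value is indexed as a single exact term."""
    return [value]


def text_terms(value):
    """text (rough approximation of the standard analyzer): split the value
    into words and lowercase each one."""
    return [word.lower() for word in re.findall(r'\w+', value)]
```

So a term query for `hello` can match a text field holding "Hello World", while a keyword field only matches the exact string "Hello World".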

PUT edata_new
{
  "mappings": {
    "title" : {
      "properties": {
        "name" : {
          "type": "text"
        },
        "test1" : {
          "type": "keyword"
        },
        "test2" : {
          "type": "text"
        },
        "test3":{
          "type": "keyword"
        },
        "source" : {
          "type": "keyword"
        },
        "date" : {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}
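The "deletion" itself happens here: the new mapping is simply the old one without the unwanted field. If you would rather derive it than retype it, a sketch like the one below could work. The helper name and the `old_field` example are hypothetical, and it only handles top-level fields.

```python
import copy


def drop_fields(properties, fields_to_drop):
    """Return a copy of a mapping's "properties" dict without the given
    top-level fields, leaving the original dict untouched."""
    missing = set(fields_to_drop) - set(properties)
    if missing:
        raise KeyError('fields not in mapping: %s' % sorted(missing))
    new_props = copy.deepcopy(properties)
    for field in fields_to_drop:
        del new_props[field]
    return new_props
```

On a 5.x/6.x cluster the old properties sit at `es.indices.get_mapping(index='edata')['edata']['mappings']['title']['properties']`, and the result can go into the `properties` block of `PUT edata_new`.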

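3. Iterate over the old index and write the data into the new one
The code below relies on knowing the document count in advance. In practice rows=1000 turned out to be too small: with this logic, writing 10,000 documents into ES took under 6 seconds, so the batch size can be raised. Once the data grew to around 2 million documents, writes began to slow down, but we still hadn't hit the limit, so it could go higher; exactly how high is something to find out by testing. Also note that from/size paging is capped by the index setting index.max_result_window (10,000 by default). Paging through millions of documents this way therefore requires raising that setting or switching to a scroll.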
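Because every request in the loop is a from/size page, the plan can be checked before anything runs. The sketch below (a hypothetical helper) lists the `from` offsets needed for a given total and batch size. It refuses any plan whose last page would pass the limit, since Elasticsearch rejects a search when from + size exceeds index.max_result_window.

```python
def page_offsets(total, rows, max_result_window=10000):
    """List the "from" offsets needed to page through `total` documents
    `rows` at a time; fail early if the last page would exceed the window."""
    if rows <= 0:
        raise ValueError('rows must be positive')
    offsets = list(range(0, total, rows))
    if offsets and offsets[-1] + rows > max_result_window:
        raise ValueError(
            'from + size would reach %d, above max_result_window=%d; '
            'raise the setting or use a scroll instead'
            % (offsets[-1] + rows, max_result_window))
    return offsets
```

For example, `page_offsets(2500, 1000)` gives `[0, 1000, 2000]`, while 2 million documents at 1,000 per page would fail unless `max_result_window` is passed a correspondingly raised value.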

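import datetime

from elasticsearch import helpers


class CustomerMigrator(object):
    # Illustrative wrapper: the original showed only these methods.
    # `es` is an elasticsearch.Elasticsearch client (5.x/6.x, which still accept doc_type).
    def __init__(self, es):
        self.es = es

    def update_customer(self, start=0, rows=1000):
        # Copy page `start` of edata into edata_new. Fields not listed in
        # _source below are dropped, which is how the unwanted field is deleted.
        results = self.get_customer(start, rows)
        actions = []
        for result in results:
            tm = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
            source = result['_source']
            # Only documents that contain test1 are copied
            if 'test1' in source:
                action = {
                    '_index': 'edata_new',
                    '_type': 'title',
                    '_id': result['_id'],
                    '_source': {
                        'test1': str(source['test1']),
                        'name': str(source['name']),
                        'test2': str(source['test2']),
                        'test3': str(source['test3']),
                        'date': tm,
                        'source': '01'
                    }
                }
                actions.append(action)
        if actions:
            # helpers.bulk returns (number of successful actions, list of errors)
            success, msg = helpers.bulk(self.es, actions)
            return success, msg
        return 0, "nothing to update on this page"

    def get_customer(self, start=0, rows=1000):
        # from/size paging: the search fails once from + size exceeds
        # index.max_result_window (10000 by default)
        body = {
            "query": {
                "match_all": {}
            },
            "from": start * rows,
            "size": rows
        }
        results = self.es.search(index='edata', doc_type='title', body=body)
        return results['hits']['hits']

    def get_count(self):
        body = {
            "query": {
                "match_all": {}
            }
        }
        result = self.es.count(index='edata', doc_type='title', body=body)
        return result['count']

    def migrate(self, rows=1000):
        # Walk every page, using the known document count to know when to stop
        total = self.get_count()
        for page in range((total + rows - 1) // rows):
            success, msg = self.update_customer(page, rows)
            print('page {}: {} indexed, {}'.format(page, success, msg))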
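Paging with from/size is what runs into the 10,000-document window. One alternative (a swapped-in technique, not the method used above) is to stream the old index with a scroll through `elasticsearch.helpers.scan` and feed a generator of actions straight into `helpers.bulk`. The generator below is a sketch; its name and parameters are my own. It keeps only the listed fields and can add fixed extra values such as `source: '01'`.

```python
def copy_actions(hits, dest_index, keep_fields, doc_type=None, extra=None):
    """Turn search hits into bulk index actions for dest_index, keeping only
    keep_fields from each _source; every other field is dropped."""
    for hit in hits:
        source = hit['_source']
        new_source = {f: source[f] for f in keep_fields if f in source}
        if extra:
            # Fixed values added to every document, e.g. {'source': '01'}
            new_source.update(extra)
        action = {'_index': dest_index, '_id': hit['_id'], '_source': new_source}
        if doc_type is not None:
            action['_type'] = doc_type  # only meaningful on 5.x/6.x
        yield action
```

Usage against a live cluster might look like `helpers.bulk(es, copy_actions(helpers.scan(es, index='edata', query={'query': {'match_all': {}}}), 'edata_new', ['name', 'test1', 'test2', 'test3'], doc_type='title', extra={'source': '01'}))`. Because `scan` uses a scroll, there is no need to know the count or raise `max_result_window`.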

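4. Switch the alias to the new index
Remove the edata_v1 alias from edata and point it at edata_new instead. Both actions go in a single _aliases request, which Elasticsearch applies atomically, so the data can be switched without stopping ES. This only works, however, if clients access the data through the alias.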

POST _aliases
{
    "actions": [
        { "remove": {
            "alias": "edata_v1",
            "index": "edata"
        }},
        { "add": {
            "alias": "edata_v1",
            "index": "edata_new"
        }}
    ]
}
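If the switch is scripted rather than typed into the console, the body can be generated. The function below is a hypothetical helper that builds the same remove-plus-add body for any alias. Keeping both actions in one request is what makes the switch atomic.

```python
def swap_alias_actions(alias, old_index, new_index):
    """Body for POST _aliases that moves `alias` from old_index to new_index
    in a single request, so the two actions are applied together."""
    if old_index == new_index:
        raise ValueError('old and new index must differ')
    return {
        'actions': [
            {'remove': {'alias': alias, 'index': old_index}},
            {'add': {'alias': alias, 'index': new_index}},
        ]
    }
```

With the 5.x/6.x Python client: `es.indices.update_aliases(body=swap_alias_actions('edata_v1', 'edata', 'edata_new'))`.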

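If clients reference the index by its real name rather than through an alias, you also need the following steps. Once you have confirmed that edata_new holds a complete copy, delete the edata index, because Elasticsearch will not create an alias with the same name as an existing index. Then create an alias called edata that points at edata_new. Between the two requests the name edata briefly resolves to nothing, so do this when a short gap is acceptable. After that the problem is solved for good. Seen this way, the right way to use ES is to always go through index aliases.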

DELETE edata

POST _aliases
{
  "actions": [
    { "add": {
        "alias": "edata",
        "index": "edata_new"
    }}
  ]
}
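The order of these two requests matters, so a scripted version might first compute the plan. The sketch below uses a hypothetical function that returns the requests as (method, path, body) tuples. It only adds the DELETE when a concrete index with the target name still exists.

```python
def point_name_at_new_index(name, new_index, existing_indices):
    """Return the ordered requests (method, path, body) that make `name`
    resolve to new_index. An alias cannot share its name with an existing
    index, so a concrete index called `name` is deleted first."""
    steps = []
    if name in set(existing_indices):
        # Between this request and the next, `name` briefly resolves to nothing
        steps.append(('DELETE', name, None))
    steps.append(('POST', '_aliases',
                  {'actions': [{'add': {'alias': name, 'index': new_index}}]}))
    return steps
```

Each step corresponds to a console request; with the Python client they map to `es.indices.delete(index=...)` and `es.indices.update_aliases(body=...)`.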
