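Contents

1 Creating a document

1.1 Specifying the id manually
1.2 Letting Elasticsearch generate the id

2 Retrieving a document

2.1 Fetching a document by id
2.2 Controlling the response with _source

3 Modifying a document

3.1 Full replacement of a document
3.2 Forcing a create

4 Deleting a document
Copyright notice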

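1 Creating a document

1.1 Specifying the id manually

(1) When to use it:

This approach is typical when importing data from another system into ES: the unique identifier the data already has in the source system is reused as the document id in ES.

If data is written to ES as soon as it is produced, specifying the id manually is generally not appropriate.

(2) Syntax:

PUT index/type/id

(3) Example: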

PUT employee/developer/1
{
    "name": "healchow", 
    "e_id": 5220
}

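(4) Response:

{
    "_index": "employee",
    "_type": "developer",
    "_id": "1",
    "_version": 1,			// current version number; concurrency control is based on this field
    "result": "created",
    "_shards": {
        "total": 2,			// number of shard copies the write targets (primary plus replicas)
        "successful": 1,		// number of shard copies on which the write succeeded
        "failed": 0			// number of shard copies on which the write failed
    },
    "created": true
}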
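
As a rough sketch of how an application might issue the request above, the Python snippet below builds (but does not send) the PUT request using only the standard library. The base URL `http://localhost:9200` and the helper name `build_index_request` are assumptions made for illustration; the article itself only shows the console syntax.

```python
import json
import urllib.request

# Assumed address of a local Elasticsearch node; adjust for your cluster.
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, body):
    """Build a PUT index/type/id request without sending it (hypothetical helper)."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_index_request("employee", "developer", 1, {"name": "healchow", "e_id": 5220})
print(req.get_method(), req.full_url)
# To send it against a running node: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the URL layout (index, then type, then id) easy to inspect before anything reaches the cluster.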

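1.2 Letting Elasticsearch generate the id

(1) When to use it:

When ES is the primary data store and the application writes its data directly into ES, automatic id generation is a good fit.

When many nodes generate large volumes of data concurrently, auto-generated ids are also safer, because they do not collide.

(2) Syntax:

POST index/type

(3) Example: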

POST employee/developer
{
    "name": "shoufeng",
    "sex": "male",
    "age": 20
}

(4) Response:

{
    "_index": "employee",
    "_type": "developer",
    "_id": "AWcwx9fjbCKddg1Db2e2",	// auto-generated id, 20 characters long
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "created": true
}

Auto-generated ids are 20-character, URL-safe, Base64-encoded GUID strings. Ids generated in parallel on multiple nodes of a distributed system do not conflict.
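
To see why such an id comes out at exactly 20 URL-safe characters, here is a minimal Python sketch. It is not Elasticsearch's actual generator; it only assumes that the id is the URL-safe Base64 encoding of 15 bytes, and the function name `make_url_safe_id` is hypothetical.

```python
import base64
import secrets


def make_url_safe_id():
    """Return a 20-character URL-safe Base64 id (illustrative, not ES's algorithm)."""
    # 15 bytes = 120 bits, and Base64 carries 6 bits per character,
    # so the encoding is exactly 20 characters with no '=' padding.
    raw = secrets.token_bytes(15)
    return base64.urlsafe_b64encode(raw).decode("ascii")


doc_id = make_url_safe_id()
print(doc_id, len(doc_id))
```

The URL-safe alphabet replaces `+` and `/` with `-` and `_`, so the id can appear in a path such as index/type/id without escaping.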

2 Retrieving a document

2.1 Fetching a document by id

The type (developer below) does not have to be specified in the request; _all can be used in its place.

// Request:
GET employee/developer/1

// Response:
{
    "_index": "employee",
    "_type": "developer",
    "_id": "1",
    "_version": 1,
    "found": true,
    "_source": {				// the original JSON body of the document
        "name": "healchow",
        "sex": "male",
        "age": 20
    }
}
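
A client usually cares about two parts of this response: `found` and `_source`. The sketch below, with the hypothetical helper `extract_source`, parses a response body and returns the document only when it was found. The sample strings are hand-written copies of the response shape shown above (without the `//` comment, which is not valid JSON), not output captured from a cluster.

```python
import json


def extract_source(response_text):
    """Return the _source dict if the document was found, else None."""
    response = json.loads(response_text)
    if not response.get("found", False):
        return None
    return response.get("_source")


sample = """
{
    "_index": "employee",
    "_type": "developer",
    "_id": "1",
    "_version": 1,
    "found": true,
    "_source": {"name": "healchow", "sex": "male", "age": 20}
}
"""
missing = '{"_index": "employee", "_type": "developer", "_id": "2", "found": false}'

print(extract_source(sample))
print(extract_source(missing))
```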

2.2 Controlling the response with _source

(1) Return only the _source content:

GET employee/developer/1/_source

(2) Omit the _source field:

GET employee/developer/1?_source=false

(3) Filter the contents of _source:

// _source_include and _source_exclude accept the * wildcard
GET employee/developer/1?_source_include=name,age&_source_exclude=sex

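(4) Filter with the fields parameter:

// Note: ES 5.x and later reject fields on this API; use stored_fields (which returns only fields mapped with "store": true) or _source filtering instead
GET employee/developer/1?fields=name,age

Other query operations will be covered in later articles.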
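
The include/exclude filtering in (3) can be approximated locally to check what a given pattern combination would keep. This is a simplified simulation for flat documents only, assuming that a field is kept when it matches at least one include pattern (or no include patterns are given) and matches no exclude pattern. The name `filter_source` is hypothetical, and `fnmatchcase` accepts more wildcard syntax than the `*` that Elasticsearch supports.

```python
from fnmatch import fnmatchcase


def filter_source(source, include=None, exclude=None):
    """Apply include/exclude patterns to the top-level keys of a _source dict."""
    result = {}
    for key, value in source.items():
        # With no include patterns, every field starts out selected.
        included = not include or any(fnmatchcase(key, p) for p in include)
        excluded = bool(exclude) and any(fnmatchcase(key, p) for p in exclude)
        if included and not excluded:
            result[key] = value
    return result


doc = {"name": "healchow", "sex": "male", "age": 20}
print(filter_source(doc, include=["name", "age"], exclude=["sex"]))
print(filter_source(doc, include=["*e"]))
```

Both calls keep `name` and `age` and drop `sex`: the first by listing the fields explicitly, the second because only `name` and `age` end in `e`.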

3 Modifying a document

3.1 Full replacement of a document

A full replacement modifies the document that has the specified id:

// Same syntax as creation:
PUT employee/developer/1
{
    "name": "healchow", 
    "e_id": 5220
}

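How it works:

① If the specified document id does not exist, the request is a create;

② If the specified document id already exists, the request is a full replacement: the JSON content of the old document is replaced;

③ An inverted index in Lucene is immutable once written, so to change a document's content ES reindexes the whole document, replacing everything in the old one;

④ ES marks the old document as deleted and creates a new document from the submitted request. Once enough documents have been marked as deleted, ES removes them automatically in the background (this happens as segments are merged).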
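
The four points above can be modelled with a toy in-memory index. This is only a sketch of the observable behaviour (created vs. updated, version increments, old copies marked as deleted), not of how Elasticsearch or Lucene store data; the class name `TinyIndex` and its attributes are hypothetical.

```python
class TinyIndex:
    """Toy in-memory model of create-or-replace semantics (not real ES code)."""

    def __init__(self):
        self.live = {}      # doc id -> (version, body)
        self.deleted = []   # old copies marked as deleted, awaiting cleanup

    def put(self, doc_id, body):
        if doc_id not in self.live:
            self.live[doc_id] = (1, dict(body))
            return {"_id": doc_id, "_version": 1, "result": "created"}
        old_version, old_body = self.live[doc_id]
        # The old copy is never edited in place: it is marked deleted
        # and a complete new copy takes its place.
        self.deleted.append((doc_id, old_version, old_body))
        new_version = old_version + 1
        self.live[doc_id] = (new_version, dict(body))
        return {"_id": doc_id, "_version": new_version, "result": "updated"}


index = TinyIndex()
first = index.put("1", {"name": "healchow", "sex": "male", "age": 20})
second = index.put("1", {"name": "healchow", "e_id": 5220})
print(first["result"], second["result"], index.live["1"])
```

Note that after the second call the `sex` and `age` fields are gone: a full replacement does not merge with the old body.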

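3.2 Forcing a create

Consider this scenario:

We do not know whether a given document already exists in the index, because other users may be adding documents concurrently.

To keep a create from silently turning into a full replacement and losing data, we can force the request to be a create.

Example: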

PUT employee/developer/1?op_type=create
{
    "name": "shou feng",
    "age": 20
}
// Alternatively: PUT employee/developer/1/_create

// The response reports a conflict:
{
    "error": {
        "root_cause": [
            {
                "type": "version_conflict_engine_exception",
                "reason": "[developer][1]: version conflict, document already exists (current version [1])",
                "index_uuid": "qg6MLZLhQUGCIPbmPcfvfg",
                "shard": "3",
                "index": "employee"
            }
        ],
        "type": "version_conflict_engine_exception",
        "reason": "[developer][1]: version conflict, document already exists (current version [1])",
        "index_uuid": "qg6MLZLhQUGCIPbmPcfvfg",
        "shard": "3",
        "index": "employee"
    },
    "status": 409
}

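Why the conflict occurs: ES applies optimistic concurrency control based on each document's _version. A forced create succeeds only if no document with that id exists yet; because document 1 already exists (current version 1), the request is rejected with a 409 version_conflict_engine_exception.
After a conflict we know the document is already in the index, and we can add the new document again under a different id.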
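
The difference between a plain PUT and a forced create can be sketched with a small simulation. The function `create_only` is hypothetical and models only the outcome: 201 when the id is free, 409 when it is taken. It does not reproduce the real error body shown above.

```python
def create_only(store, doc_id, body):
    """Toy model of op_type=create: succeed only if doc_id is not in store."""
    if doc_id in store:
        version = store[doc_id]["_version"]
        return 409, {
            "type": "version_conflict_engine_exception",
            "reason": f"document already exists (current version [{version}])",
        }
    store[doc_id] = {"_version": 1, "_source": dict(body)}
    return 201, {"_id": doc_id, "_version": 1, "result": "created"}


store = {}
status_first, _ = create_only(store, "1", {"name": "healchow", "e_id": 5220})
status_second, error = create_only(store, "1", {"name": "shou feng", "age": 20})
print(status_first, status_second, error["type"])
# The existing document is untouched after the rejected create.
print(store["1"]["_source"])
```

A caller that receives 409 can then decide whether to pick a new id, as suggested above, or to treat the existing document as authoritative.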

4 Deleting a document

Syntax:
```
DELETE index/type/id
```
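ES deletes documents lazily:

A document is not physically removed right away; it is only marked as deleted. Once enough documents have been marked, ES removes them automatically in the background.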
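
The mark-then-purge idea can be illustrated with a toy store. Everything here is an assumption made for the sketch: the class name `LazyDeleteStore`, the fixed `purge_threshold`, and purging inside `delete` itself. Elasticsearch decides when to reclaim space through its own background merging, not through a simple counter like this.

```python
class LazyDeleteStore:
    """Toy model of mark-then-purge deletion (threshold is hypothetical)."""

    def __init__(self, purge_threshold=3):
        self.docs = {}              # doc id -> body, still physically present
        self.tombstones = set()     # ids marked as deleted
        self.purge_threshold = purge_threshold

    def index(self, doc_id, body):
        self.docs[doc_id] = body
        self.tombstones.discard(doc_id)

    def delete(self, doc_id):
        if doc_id not in self.docs or doc_id in self.tombstones:
            return "not_found"
        self.tombstones.add(doc_id)     # mark only; the data stays stored
        if len(self.tombstones) >= self.purge_threshold:
            self._purge()
        return "deleted"

    def get(self, doc_id):
        if doc_id in self.tombstones:
            return None                 # marked documents are invisible to reads
        return self.docs.get(doc_id)

    def _purge(self):
        for doc_id in self.tombstones:
            del self.docs[doc_id]       # physical removal happens here
        self.tombstones.clear()


store = LazyDeleteStore(purge_threshold=2)
store.index("1", {"name": "healchow"})
store.index("2", {"name": "shoufeng"})
store.delete("1")
print(store.get("1"), len(store.docs))  # document 1 is hidden but still stored
store.delete("2")
print(len(store.docs))                  # threshold reached, marked documents purged
```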

Copyright notice

Author: ma_shoufeng (马瘦风)

Source: 马瘦风's blog on CSDN

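Copyright of this article belongs to the author. Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be placed prominently on the page; otherwise the author reserves the right to pursue legal action.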

ES for Beginners 16 - CRUD Operations on Documents in Elasticsearch