Operating Elasticsearch (ES) from Python

1. Preparation

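1.1 Install the elasticsearch client package. This example installs version 7.9.1. Choose a client whose major version matches your Elasticsearch server (a 7.x client for a 7.x cluster).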

pip3 install elasticsearch==7.9.1

2. Connecting to ES

After installing the elasticsearch package, you can connect to ES from Python:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://192.168.1.1:9200",http_auth=('username', 'password'), timeout=20)

If you have several ES nodes, pass their addresses as a Python list (not as a string containing a list):

es = Elasticsearch(['http://192.168.1.1:9200', 'http://192.168.1.2:9200'], http_auth=('username', 'password'), timeout=20)
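Hard-coding the addresses and password in the script is fine for a demo. A minimal sketch of one alternative, assuming the node list and credentials are kept in environment variables; the variable names ES_HOSTS, ES_USER and ES_PASSWORD and the helper es_config_from_env are hypothetical choices for this example:

```python
import os


def es_config_from_env(default_timeout=20):
    """Build keyword arguments for Elasticsearch(...) from environment variables.

    ES_HOSTS, ES_USER and ES_PASSWORD are hypothetical variable names.
    ES_HOSTS holds a comma-separated list of node URLs.
    """
    raw_hosts = os.environ.get("ES_HOSTS", "http://localhost:9200")
    # Split on commas, trimming spaces and dropping empty entries
    hosts = [h.strip() for h in raw_hosts.split(",") if h.strip()]
    config = {"hosts": hosts, "timeout": default_timeout}
    user = os.environ.get("ES_USER")
    password = os.environ.get("ES_PASSWORD")
    if user and password:
        # Only send basic-auth credentials when both values are set
        config["http_auth"] = (user, password)
    return config
```

The result would be used as es = Elasticsearch(**es_config_from_env()); the 7.x client accepts hosts, http_auth and timeout as keyword arguments.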

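3. Creating an index

Once connected, create an index with the call below. ignore=400 tells the client not to raise an exception on an HTTP 400 response, which is what ES returns when the index already exists: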

result = es.indices.create(index='user_info',ignore=400)
print(result)

A successful call returns:

{'acknowledged': True, 'shards_acknowledged': True, 'index': 'user_info'}
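Without an explicit mapping, ES infers field types from the documents it receives (dynamic mapping); the SQL output later in this article shows that name and add_time ended up as text. If you want fixed types, you can pass settings and mappings when creating the index. A sketch, assuming the field types below suit your data; user_info_index_body is a hypothetical helper:

```python
def user_info_index_body(shards=1, replicas=1):
    """Request body for es.indices.create with an explicit mapping (a sketch).

    The field types are assumptions based on the sample documents in this article.
    """
    return {
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        "mappings": {
            "properties": {
                "id": {"type": "long"},
                "name": {"type": "keyword"},   # whole value, for exact matching
                "sex": {"type": "keyword"},
                "age": {"type": "integer"},
                "hobby": {"type": "text"},     # analyzed, so match finds single hobbies
                "add_time": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
            }
        },
    }
```

It would be passed as es.indices.create(index='user_info', body=user_info_index_body(), ignore=400). Because of ignore=400 nothing happens if the index already exists, so the mapping only takes effect on a newly created index. The outputs shown in this article were produced with dynamic mapping.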

4. Inserting data

After creating the index, you can insert documents into it. If the data comes from a MySQL table with an auto-increment ID, you can reuse that ID as the document ID:

data = {'id': 1, 'name': 'zhangsan', 'sex': '男', 'age': 30, 'hobby': 'basketball,football', 'add_time': '2023-01-01 10:00:00'}
result = es.create(index='user_info', id=data['id'], body=data)
print(result)

A successful insert returns:

{'_index': 'user_info', '_type': '_doc', '_id': '1', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 2, 'failed': 0}, '_seq_no': 12, '_primary_term': 1}

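Here the _id of the new document is the same as the id field. Note that es.create fails with a conflict error (HTTP 409) if a document with that ID already exists.
Next, insert several records in a loop: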

dataList = [
    {'id': 2, 'name': 'lisi', 'sex': '男', 'age': 20, 'hobby': 'football', 'add_time': '2023-01-11 10:00:00'},
    {'id': 3, 'name': 'wanger', 'sex': '男', 'age': 40, 'hobby': 'swim,football', 'add_time': '2023-02-01 10:00:00'},
    {'id': 4, 'name': 'hanmei', 'sex': '女', 'age': 18, 'hobby': 'run,football', 'add_time': '2023-02-03 10:00:00'},
    {'id': 5, 'name': 'lily', 'sex': '女', 'age': 25, 'hobby': 'badminton', 'add_time': '2023-02-10 10:00:00'},
]

for d in dataList:
    es.create(index='user_info', id=d['id'], body=d)
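Calling es.create once per record sends one HTTP request per document. For larger batches the client provides elasticsearch.helpers.bulk, which takes an iterable of action dicts. A sketch of building those actions from a list like dataList; build_bulk_actions is a hypothetical helper that uses the create operation when a record has an id (mirroring es.create) and the index operation otherwise (mirroring es.index):

```python
def build_bulk_actions(index, docs, id_field="id"):
    """Turn a list of dicts into action dicts for elasticsearch.helpers.bulk."""
    actions = []
    for doc in docs:
        if doc.get(id_field) is not None:
            # 'create' reports an existing ID as a conflict instead of overwriting it
            actions.append({"_op_type": "create", "_index": index,
                            "_id": doc[id_field], "_source": doc})
        else:
            # 'index' without _id lets ES generate the ID
            actions.append({"_op_type": "index", "_index": index, "_source": doc})
    return actions
```

After from elasticsearch import helpers, the call would be helpers.bulk(es, build_bulk_actions('user_info', dataList)), which returns the number of successful actions and a list of errors. By default it raises BulkIndexError if any action fails.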

If the table has no auto-increment ID, use es.index without an id and let ES generate one:

data = {'name': 'liudehua', 'sex': '男', 'age': 30, 'hobby': 'sing,acting', 'add_time': '2023-01-01 10:00:00'}
result = es.index(index='user_info', body=data)
print(result)

The result is shown below. Because no id was passed, the _id was generated automatically by ES:

{'_index': 'user_info', '_type': '_doc', '_id': 'U8EtgoYB3nwluujTOwxY', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 2, 'failed': 0}, '_seq_no': 17, '_primary_term': 1}

5. Querying data

Once data is inserted, you can query it. Here are some commonly used examples.

5.1 Get a document by its ID

result = es.get(index='user_info', id=2)
print(result)

Result:

{'_index': 'user_info', '_type': '_doc', '_id': '2', '_version': 1, '_seq_no': 13, '_primary_term': 1, 'found': True, '_source': {'id': 2, 'name': 'lisi', 'sex': '男', 'age': 20, 'hobby': 'football', 'add_time': '2023-01-11 10:00:00'}}
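es.get raises elasticsearch.NotFoundError when the ID does not exist. If you would rather get None back, one option is to pass ignore=404 and check the found flag of the response. A minimal sketch; source_or_none is a hypothetical helper:

```python
def source_or_none(get_response):
    """Return the document body from an es.get response, or None if not found."""
    if get_response.get("found"):
        return get_response["_source"]
    # With ignore=404 a missing document comes back as {'found': False, ...}
    return None
```

For example, source_or_none(es.get(index='user_info', id=99, ignore=404)) would return None if no document has ID 99.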

5.2 Exact match on a field value (term query)

body = {
    "query": {
        "term": {
            "name": "liudehua"
        }
    }
}
result = es.search(index='user_info', body=body)
print(result)

Result:

{'took': 0, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 1, 'relation': 'eq'}, 'max_score': 1.540445, 'hits': [{'_index': 'user_info', '_type': '_doc', '_id': 'U8EtgoYB3nwluujTOwxY', '_score': 1.540445, '_source': {'name': 'liudehua', 'sex': '男', 'age': 30, 'hobby': 'sing,acting', 'add_time': '2023-01-01 10:00:00'}}]}}
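A term query is not analyzed, but the name field here was mapped dynamically as text, so its indexed terms come from the standard analyzer. The query above matches only because liudehua is a single lowercase word; a value such as 'Liu Dehua' would be indexed as the terms liu and dehua and would not match. With default dynamic mapping each string field also gets a keyword sub-field that holds the whole original value. A sketch of a helper that targets it; exact_term_query is hypothetical, and the .keyword sub-field is an assumption about your mapping:

```python
def exact_term_query(field, value, use_keyword_subfield=True):
    """Build a term query body that compares against the whole field value.

    Assumes default dynamic mapping, where a string field such as "name"
    also has a "name.keyword" sub-field of type keyword.
    """
    target = f"{field}.keyword" if use_keyword_subfield else field
    return {"query": {"term": {target: value}}}
```

Usage would be es.search(index='user_info', body=exact_term_query('name', 'liudehua')). Numeric fields such as id have no keyword sub-field, so pass use_keyword_subfield=False for them.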

5.3 Full-text match on a field value (match query)

# Find users whose hobbies include football
body = {
    "query": {
        "match": {
            "hobby": "football"
        }
    }
}
result = es.search(index='user_info', body=body)
print(result)

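This returns four records, because the standard analyzer used for the dynamically mapped hobby field splits a value such as basketball,football into the separate terms basketball and football: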

{'took': 1, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 4, 'relation': 'eq'}, 'max_score': 0.52827823, 'hits': [{'_index': 'user_info', '_type': '_doc', '_id': '2', '_score': 0.52827823, '_source': {'id': 2, 'name': 'lisi', 'sex': '男', 'age': 20, 'hobby': 'football', 'add_time': '2023-01-11 10:00:00'}}, {'_index': 'user_info', '_type': '_doc', '_id': '1', '_score': 0.4084168, '_source': {'id': 1, 'name': 'zhangsan', 'sex': '男', 'age': 30, 'hobby': 'basketball,football', 'add_time': '2023-01-01 10:00:00'}}, {'_index': 'user_info', '_type': '_doc', '_id': '3', '_score': 0.4084168, '_source': {'id': 3, 'name': 'wanger', 'sex': '男', 'age': 40, 'hobby': 'swim,football', 'add_time': '2023-02-01 10:00:00'}}, {'_index': 'user_info', '_type': '_doc', '_id': '4', '_score': 0.4084168, '_source': {'id': 4, 'name': 'hanmei', 'sex': '女', 'age': 18, 'hobby': 'run,football', 'add_time': '2023-02-03 10:00:00'}}]}}
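The documents themselves are nested under hits.hits in the response. A small sketch that extracts the total and each hit's ID, score and source; summarize_hits is a hypothetical helper. Keep in mind that a search returns at most 10 hits unless you pass size, for example es.search(index='user_info', body=body, size=100).

```python
def summarize_hits(search_response):
    """Return (total, [(id, score, source), ...]) from an es.search response."""
    hits = search_response["hits"]
    total = hits["total"]
    # ES 7.x reports total as {"value": n, "relation": "eq"}; 6.x used a plain int
    if isinstance(total, dict):
        total = total["value"]
    rows = [(h["_id"], h["_score"], h["_source"]) for h in hits["hits"]]
    return total, rows
```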

5.4 Check whether a document exists

Sometimes you only need to know whether a document exists. es.exists returns True or False directly:

exist = es.exists(index="user_info",id=2)

print(exist)

True

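5.5 Custom SQL queries

You can also write the query in SQL and run it with es.sql.query. This relies on the Elasticsearch SQL feature, which ships with the default distribution but not with the OSS build: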

body = {
    'query': 'select * from user_info where age > 25'
}
result = es.sql.query(body=body)
print(result)

Result (id is None for liudehua because that document was inserted without an id field):

{'columns': [{'name': 'add_time', 'type': 'text'}, {'name': 'age', 'type': 'long'}, {'name': 'hobby', 'type': 'text'}, {'name': 'id', 'type': 'long'}, {'name': 'name', 'type': 'text'}, {'name': 'sex', 'type': 'text'}], 'rows': [['2023-01-01 10:00:00', 30, 'basketball,football', 1, 'zhangsan', '男'], ['2023-02-01 10:00:00', 40, 'swim,football', 3, 'wanger', '男'], ['2023-01-01 10:00:00', 30, 'sing,acting', None, 'liudehua', '男']]}
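The SQL response keeps column metadata separate from row values. A sketch that zips them into one dict per row; sql_rows_to_dicts is a hypothetical helper. For large results ES SQL returns rows page by page together with a cursor, and this helper only handles the page it is given.

```python
def sql_rows_to_dicts(sql_response):
    """Pair each row of an es.sql.query response with its column names."""
    names = [col["name"] for col in sql_response["columns"]]
    return [dict(zip(names, row)) for row in sql_response["rows"]]
```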

ES supports many more complex query scenarios. For details, see the official client documentation: https://elasticsearch-py.readthedocs.io/en/7.9.1/

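6. Updating data

ES can also update existing documents. With es.update, the fields under "doc" are merged into the stored document (a partial update):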

# Change the hobbies of the user whose id is 1
body = {
    "doc": {
        "hobby": "swim,game"
    }
}
es.update(index='user_info', id=1, body=body)

Fetching the user with id=1 again (for example with es.get(index='user_info', id=1)) shows that hobby is now swim,game and _version has increased to 2:

{'_index': 'user_info', '_type': '_doc', '_id': '1', '_version': 2, '_seq_no': 18, '_primary_term': 1, 'found': True, '_source': {'id': 1, 'name': 'zhangsan', 'sex': '男', 'age': 30, 'hobby': 'swim,game', 'add_time': '2023-01-01 10:00:00'}}
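es.update raises NotFoundError if the ID does not exist. The update API also accepts doc_as_upsert, which creates the document from doc in that case. A sketch of a body builder covering both forms; partial_update_body is a hypothetical helper:

```python
def partial_update_body(fields, upsert=False):
    """Build the body for es.update: only the listed fields are changed.

    With upsert=True, "doc_as_upsert" asks ES to create the document
    from `fields` when the ID does not exist yet.
    """
    body = {"doc": dict(fields)}
    if upsert:
        body["doc_as_upsert"] = True
    return body
```

Usage would be es.update(index='user_info', id=1, body=partial_update_body({'hobby': 'swim,game'}, upsert=True)).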

7. Deleting data

7.1 Delete by document ID

If you know a document's ID, you can delete it by that ID:

result = es.delete(index='user_info',id=1)
print(result)

The deletion succeeds and returns:

{'_index': 'user_info', '_type': '_doc', '_id': '1', '_version': 3, 'result': 'deleted', '_shards': {'total': 2, 'successful': 2, 'failed': 0}, '_seq_no': 19, '_primary_term': 1}

Searching for the user with id=1 again returns no hits:

body = {
    "query": {
        "term": {
            "id": "1"
        }
    }
}
result = es.search(index='user_info', body=body)
print(result)

{'took': 126, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 0, 'relation': 'eq'}, 'max_score': None, 'hits': []}}

7.2 Delete by query

body = {
    "query": {
        "match": {
            "name": "liudehua"
        }
    }
}

result = es.delete_by_query(index='user_info', body=body)
print(result)

On success, the following is returned:

{'took': 3218, 'timed_out': False, 'total': 1, 'deleted': 1, 'batches': 1, 'version_conflicts': 0, 'noops': 0, 'retries': {'bulk': 0, 'search': 0}, 'throttled_millis': 0, 'requests_per_second': -1.0, 'throttled_until_millis': 0, 'failures': []}
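The response reports how many documents were deleted and lists any failures, so it is worth checking those fields. A minimal sketch; deleted_count is a hypothetical helper. If other writes may touch the same documents, es.delete_by_query also accepts conflicts='proceed' so that version conflicts are counted instead of aborting the request.

```python
def deleted_count(response):
    """Return how many documents a delete_by_query call removed.

    Raises RuntimeError if the request timed out or reported failures.
    """
    if response.get("timed_out") or response.get("failures"):
        raise RuntimeError(f"delete_by_query did not finish cleanly: {response.get('failures')}")
    return response["deleted"]
```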

7.3 Clear all data in an index

For a small index, you can delete all of its documents like this:

body = {
    "query": {
        "match_all": {}
    }
}

es.delete_by_query(index='user_info', body=body)

For a large index, it is better to delete the whole index and create it again, because delete_by_query has to find and delete every matching document.
