How Elasticsearch (Lucene) stores and writes index data

Storing the inverted index: segments (a Lucene feature)
In Lucene: a Lucene index consists of multiple segments.
In Elasticsearch: an index consists of multiple shards (primary shards and their replicas); each shard is itself a Lucene index made up of multiple segments.
The segment is the smallest unit of storage in Elasticsearch (index data is stored segment by segment), and segments are designed to be immutable.
Add: when new documents are indexed, they are stored in a newly created segment.
Delete: because segments are read-only, deletions are recorded in a separate .del file in the index, which stores the IDs of deleted documents. A deleted document can still be matched during a query, but it is filtered out when the query results are merged; it is only physically removed when segments are merged.
Update: a combination of delete and add (the old version is marked as deleted and the new version is written to a new segment).
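
The add/delete/update rules above can be illustrated with a toy model. This is a minimal sketch, not Lucene's implementation: the names `Segment` and `ToyIndex` are hypothetical, documents are assumed to be `(doc_id, text)` pairs, a Python set of `(segment number, position)` pairs stands in for the .del file, and "search" is just whitespace tokenization.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    # Immutable: a tuple of (doc_id, text) pairs that never changes after creation.
    docs: tuple


@dataclass
class ToyIndex:
    segments: list = field(default_factory=list)
    # Hypothetical stand-in for the .del file: (segment number, position) of deleted docs.
    deleted: set = field(default_factory=set)

    def add(self, docs):
        # New data is always written into a brand-new segment.
        self.segments.append(Segment(tuple(docs)))

    def delete(self, doc_id):
        # Segments are read-only, so existing copies are only marked as deleted.
        for seg_no, seg in enumerate(self.segments):
            for pos, (did, _) in enumerate(seg.docs):
                if did == doc_id:
                    self.deleted.add((seg_no, pos))

    def update(self, doc_id, text):
        # Update = mark the old version deleted + add the new version.
        self.delete(doc_id)
        self.add([(doc_id, text)])

    def search(self, term):
        # Deleted docs still match inside their segment,
        # but are filtered out when the per-segment results are combined.
        hits = []
        for seg_no, seg in enumerate(self.segments):
            for pos, (did, text) in enumerate(seg.docs):
                if term in text.split() and (seg_no, pos) not in self.deleted:
                    hits.append(did)
        return hits


if __name__ == "__main__":
    idx = ToyIndex()
    idx.add([(1, "quick brown fox"), (2, "lazy dog")])
    idx.update(1, "quick red fox")
    print(idx.search("quick"))  # → [1]
    print(idx.search("brown"))  # → []
```

Note that after the update the first segment still physically contains the old text of document 1; only the deletion marker hides it.
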
Advantages of segment immutability

  • No locks are needed, because existing segments are never modified in place.
  • Memory can be used effectively: since a segment never changes, once it is loaded into memory it does not need to be updated there. As long as there is enough memory, segments can stay resident long term, which greatly improves query performance.
  • Updates and additions are applied incrementally (by writing new segments), which is lightweight and performs well.

Disadvantages of segment immutability

  • Deleted data is not removed immediately, so some space is wasted.
  • Frequent updates involve a large number of deletions, which can waste a lot of space.
  • The number of segments can become very large, consuming many file handles on the server, and query performance degrades as the number of segments increases.
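
Segment merging is what eventually reclaims the wasted space and keeps the segment count down. The function below is a hypothetical sketch under the same toy assumptions (each segment a tuple of `(doc_id, text)` pairs, deletions a set of `(segment number, position)` pairs standing in for .del). It merges every segment into one for simplicity; Lucene's real merge policies instead choose which segments to merge.

```python
def merge_segments(segments, deleted):
    """Merge immutable segments into one new segment (toy model).

    segments: list of tuples of (doc_id, text) pairs
    deleted:  set of (segment number, position) pairs marking deleted docs
    Returns (new_segments, new_deleted).
    """
    merged = []
    for seg_no, seg in enumerate(segments):
        for pos, doc in enumerate(seg):
            # Deleted docs are physically dropped only here, at merge time.
            if (seg_no, pos) not in deleted:
                merged.append(doc)
    # The old segments are discarded as a whole: one new segment replaces them,
    # and there is nothing left to mark as deleted.
    return [tuple(merged)], set()


if __name__ == "__main__":
    segments = [((1, "a"), (2, "b")), ((3, "c"),), ((1, "a2"),)]
    print(merge_segments(segments, {(0, 0)}))
    # → ([((2, 'b'), (3, 'c'), (1, 'a2'))], set())
```

Because the inputs are never modified, readers can keep using the old segments until the merged one is ready, which is the same immutability property listed among the advantages.
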

Write path for new data


The purpose of this process is to improve write performance (data is flushed to disk asynchronously).

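1. Each new document is written to the in-memory index buffer and, at the same time, to the transaction log (translog). The translog protects data that is still only in memory from being lost, somewhat like a redo log.

2. A refresh operation runs either when the index buffer is full (by default it may use 10% of the JVM heap, set by indices.memory.index_buffer_size) or once per second (configured by index.refresh_interval). The refresh writes the contents of the index buffer into a new segment and clears the buffer. Newly written data only becomes searchable after this refresh, so there is a delay of about one second, which is why Elasticsearch is called a near-real-time search engine.

3. At the same time, the new segment is written to the file system cache (memory) and opened for search.

4. A flush operation writes the segments to disk (by default performed every 30 minutes).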

  A flush consists of:

  • Performing one refresh
  • fsync: writing the segments to disk
  • Clearing the corresponding translog
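
Steps 1 to 4 can be sketched as a toy model of a single shard. This is a hypothetical illustration, not the Elasticsearch API: `ToyShard` and its methods are invented names, the 1-second refresh timer and the periodic flush are replaced by explicit method calls, `buffer_limit` is a document count standing in for the 10%-of-heap limit, and the translog is a plain list that is assumed to be durable (it survives the simulated crash).

```python
class ToyShard:
    """Hypothetical model of one shard's write path (not the real ES API)."""

    def __init__(self, buffer_limit=3):
        self.buffer_limit = buffer_limit  # stands in for the 10%-of-heap limit
        self.index_buffer = []     # in memory, not searchable yet
        self.translog = []         # assumed durable record of un-flushed writes
        self.cached_segments = []  # refreshed: searchable, but not fsynced
        self.disk_segments = []    # flushed: fsynced to disk

    def index(self, doc):
        # Step 1: write to the index buffer and the translog at the same time.
        self.index_buffer.append(doc)
        self.translog.append(doc)
        # Step 2 (size trigger): a full buffer forces a refresh.
        if len(self.index_buffer) >= self.buffer_limit:
            self.refresh()

    def refresh(self):
        # Steps 2-3: turn the buffer into a new segment in the file system
        # cache, open it for search, and clear the buffer.
        if self.index_buffer:
            self.cached_segments.append(tuple(self.index_buffer))
            self.index_buffer = []

    def flush(self):
        # Step 4: refresh once, "fsync" cached segments to disk,
        # then clear the translog.
        self.refresh()
        self.disk_segments.extend(self.cached_segments)
        self.cached_segments = []
        self.translog = []

    def crash_and_recover(self):
        # A crash loses everything held only in memory...
        self.index_buffer = []
        self.cached_segments = []
        # ...but writes since the last flush are still in the translog,
        # so they are replayed into a new segment.
        if self.translog:
            self.cached_segments.append(tuple(self.translog))

    def search(self):
        # Only refreshed or flushed segments are visible (near-real-time).
        return [doc for seg in self.disk_segments + self.cached_segments for doc in seg]


if __name__ == "__main__":
    shard = ToyShard(buffer_limit=3)
    shard.index("a")
    shard.index("b")
    print(shard.search())  # → [] (not refreshed yet)
    shard.refresh()
    print(shard.search())  # → ['a', 'b']
    shard.index("c")
    shard.flush()
    shard.index("d")
    shard.refresh()
    shard.crash_and_recover()
    print(shard.search())  # → ['a', 'b', 'c', 'd']
```

The model shows why the translog can be cleared at flush time: once segments are fsynced, nothing before that point needs to be replayed.
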

Source: www.cnblogs.com/zxporz/p/11672695.html