$ db_set 42 '{"name":"San Francisco","attractions":["Exploratorium"]}'
$ db_get 42
{"name":"San Francisco","attractions":["Exploratorium"]}
$ cat database
123456,{"name":"London","attractions":["Big Ben","London Eye"]}
42,{"name":"San Francisco","attractions":["Golden Gate Bridge"]}
42,{"name":"San Francisco","attractions":["Exploratorium"]}
- The db_set function actually has pretty good performance for something so simple, because appending to a file is generally very efficient. Many databases internally use a log in the same way -- an append-only sequence of records, which may be binary or human-readable.
- The db_get function has terrible performance once the database holds a large number of records: it must scan the entire file, so a key lookup is O(n).
Index
- objective -- to efficiently find the value for a particular key in DB
- the general idea -- to keep some additional metadata on the side, which acts as a signpost and helps you to locate the data you want
- An index is an additional structure derived from the primary data. Maintaining it incurs overhead, especially on writes, because the index also needs to be updated every time data is written.
- for writes, it's hard to beat simply appending to a file, which is the simplest possible write operation => best write performance
Trade-off in storage system
Well-chosen indexes speed up read queries, but every index slows down writes. For this reason, databases don't index everything by default; they require you to choose indexes manually, using your knowledge of the application's typical query patterns.
3.1.1 Hash indexes
Key-value stores are similar to the dictionary type found in most programming languages, which is usually implemented as a hash map (hash table).
Use in-memory hash map to index data on disk
Suppose our data storage consists only of appending to a file. Then the simplest indexing strategy is: keep an in-memory hash map where every key is mapped to a byte offset in the data file -- the location at which the value can be found.
- When you append a new key-value to the file, you also update the hash map
- When you want to look up a value, use the hash map to find the offset in the data file, seek to that location, and read the value
This is essentially what Bitcask (the default storage engine in Riak) does.
- it offers high-performance reads and writes.
- all the keys have to fit in the available RAM, since the hash map is kept completely in memory
- the values can be loaded from disk with just one disk seek -- or with no disk I/O at all if that part of the data file is already in the filesystem cache
- this pattern is well suited to situations where the value for each key is updated frequently: a lot of writes, but not too many distinct keys ==> a large number of writes per key, and it's feasible to keep all keys in memory
How do we avoid eventually running out of disk space?
Break the log into segments: close a segment file when it reaches a certain size, and direct subsequent writes to a new segment file. Then perform compaction on these segments.
Compaction -- throwing away duplicate keys in the log, and keeping only the most recent update for each key
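Compaction over one segment can be sketched like this (the function name and the list-of-lines representation are illustrative):

```python
def compact(segment_lines):
    """Keep only the most recent value for each key in a segment.

    segment_lines: records in append order, each formatted "key,value".
    """
    latest = {}
    for line in segment_lines:
        key, _, value = line.partition(",")
        latest[key] = value      # later records overwrite earlier ones
    return [f"{k},{v}" for k, v in latest.items()]
```

Applied to the `cat database` output above, compaction would drop the Golden Gate Bridge record for key 42 and keep only the Exploratorium one.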
- Each segment has its own in-memory hash table
- To find the value for a key:
- check the most recent segment’s hash map;
- if the key is not present we check the second-most-recent segment;
- and so on...
- lookups don’t need to check many hash maps, because the merging process keeps the number of segments small
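The segment-by-segment lookup above can be sketched as follows (here each segment is represented just by its in-memory hash map, ordered newest first; names are illustrative):

```python
def lookup(key, segment_maps):
    """Find a key across segments.

    segment_maps: per-segment hash maps, most recent segment first.
    Check the newest segment's map; fall through to older ones on a miss.
    """
    for seg in segment_maps:
        if key in seg:
            return seg[key]      # first hit is the most recent value
    return None                  # key not present in any segment
```

Because newer segments are checked first, a stale value in an old, not-yet-compacted segment can never shadow a more recent update.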
Reference
Designing Data-Intensive Applications by Martin Kleppmann
Reproduced at: https://www.jianshu.com/p/938386352816