ToplingDB SidePlugin Configuration System

ToplingDB configuration system

For the design motivation of the ToplingDB configuration system, please refer to Motivation To Solution.

1. Overview

The ToplingDB configuration system uses json/yaml to define configuration items, and brings all meta objects of ToplingDB/RocksDB into this configuration system. Overall, it achieves the following goals:

1. Cover all configuration requirements of ToplingDB/RocksDB
2. Seamless plug-in: third-party modules (such as ToplingZipTable) can be plugged in without modifying user code
3. Visualization: display the internal state of the engine through a Web Service (related documents)
   * Monitoring: export engine metrics to Prometheus through the Web Service, then visualize them with grafana
4. Simplify multi-language bindings (only the conf object needs to be bound)

2. Detailed introduction

The root configuration objects of ToplingDB/RocksDB are DBOptions and ColumnFamilyOptions (CFOptions for short). The additional Options object is the combination of DBOptions and CFOptions (it inherits from both).

DBOptions and CFOptions contain second-level configuration objects, and some second-level objects further contain third-level configuration objects. In json, each of these objects is defined as a sub-object of a first-level json object named after its base class. In addition, there are several special first-level json objects (http, setenv, permissions, databases, open). A json object can refer to other json objects, and these references are converted into reference relationships between C++ objects.

2.1. json configuration example

{
    "http": {
      "document_root": "/path/to/dbname",
      "listening_ports": "8081"
    },
    "setenv": {
      "DictZipBlobStore_zipThreads": 8,
      "StrSimpleEnvNameNotOverwrite": "StringValue",
      "IntSimpleEnvNameNotOverwrite": 16384,
      "OverwriteThisEnv": { "overwrite": true,
        "value": "overwrite is default to false, can be manually set to true"
      }
    },
    "permissions": { "web_compact": true },
    "Cache": {
      "lru_cache": {
        "class": "LRUCache",
        "params": {
          "capacity": "4G", "num_shard_bits": -1, "high_pri_pool_ratio": 0.5,
          "strict_capacity_limit": false, "use_adaptive_mutex": false,
          "metadata_charge_policy": "kFullChargeCacheMetadata"
        }
      }
    },
    "WriteBufferManager" : {
      "wbm": {
        "class": "Default",
        "params": {
          "//comment": "share mem budget with cache object ${lru_cache}",
          "buffer_size": "512M", "cache": "${lru_cache}"
        }
      }
    },
    "Statistics": { "stat": "default" },
    "TableFactory": {
      "bb": {
        "class": "BlockBasedTable",
        "params": { "block_cache": "${lru_cache}" }
      },
      "fast": {
        "class": "ToplingFastTable",
        "params": { "indexType": "MainPatricia" }
      },
      "zip": {
        "class": "ToplingZipTable",
        "params": {
          "localTempDir": "/dev/shm/tmp",
          "sampleRatio": 0.01, "entropyAlgo": "kNoEntropy"
        }
      },
      "dispatch" : {
        "class": "DispatcherTable",
        "params": {
          "default": "bb",
          "readers": { "ToplingFastTable": "fast", "ToplingZipTable": "zip" },
          "level_writers": ["fast", "fast", "bb", "zip", "zip", "zip", "zip"]
        }
      }
    },
    "CFOptions": {
      "default": {
         "max_write_buffer_number": 4, "write_buffer_size": "128M",
         "target_file_size_base": "16M", "target_file_size_multiplier": 2,
         "table_factory": "dispatch", "ttl": 0
      }
    },
    "databases": {
      "db1": {
        "method": "DB::Open",
        "params": {
          "options": {
            "write_buffer_manager": "${wbm}",
            "create_if_missing": true, "table_factory": "dispatch"
          }
        }
      },
      "db_mcf": {
        "method": "DB::Open",
        "params": {
          "db_options": {
            "create_if_missing": true,
            "create_missing_column_families": true,
            "write_buffer_manager": "${wbm}",
            "allow_mmap_reads": true
          },
          "column_families": {
            "default": "$default",
            "custom_cf" : {
              "max_write_buffer_number": 4,
              "target_file_size_base": "16M",
              "target_file_size_multiplier": 2,
              "table_factory": "dispatch", "ttl": 0
            }
          },
          "path": "'dbname' passed to Open. If not defined, use 'db_mcf' here"
        }
      }
    },
    "open": "db_mcf"
  }

2.2. Special objects

2.2.1. http

In this example, the first json subobject is:

"http": {
     "document_root": "/", "listening_ports": "8081"
  }

This http object defines the configuration of the Http Web Server used for Web presentation. For the complete http parameters, please refer to the CivetWeb UserManual.

2.2.2. setenv

"setenv": {
     "DictZipBlobStore_zipThreads" : 8
   }

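Each sub-object of setenv defines an environment variable. As the full example in 2.1 shows, the value can also be written as an object with "value" and "overwrite" members, where "overwrite" defaults to false.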
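
The overwrite behaviour can be modelled with a short Python sketch. It is only an illustration, not ToplingDB's implementation: the helper name apply_setenv is hypothetical, and it assumes that "overwrite" has the same meaning as the overwrite flag of POSIX setenv(), i.e. a variable that is already set is left alone unless "overwrite" is true.

```python
import os

def apply_setenv(setenv, environ=None):
    """Apply a "setenv" json object to an environment mapping (illustrative model)."""
    env = os.environ if environ is None else environ
    for name, spec in setenv.items():
        if isinstance(spec, dict):
            # Object form: {"overwrite": <bool>, "value": <value>}
            value = spec["value"]
            overwrite = bool(spec.get("overwrite", False))
        else:
            # Plain form: the json value itself, overwrite stays false
            value, overwrite = spec, False
        # Assumption: same meaning as the overwrite flag of POSIX setenv()
        if overwrite or name not in env:
            env[name] = str(value)
    return env

# Usage: a plain dict stands in for the process environment
env = {"StrSimpleEnvNameNotOverwrite": "already set", "OverwriteThisEnv": "old"}
apply_setenv({
    "DictZipBlobStore_zipThreads": 8,
    "StrSimpleEnvNameNotOverwrite": "StringValue",
    "OverwriteThisEnv": {"overwrite": True, "value": "new"},
}, env)
print(env)
```

Passing a dict instead of os.environ keeps the sketch free of side effects on the running process.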

2.2.3. permissions

Each child object of permissions defines a permission.

2.2.4. databases

Multiple database objects can be defined under databases. Database objects fall into two categories:

1. DB containing only the default ColumnFamily (DB)
2. DB containing multiple ColumnFamily (DB_MultiCF)

These two types of databases are distinguished by whether they contain a column_families child object. Even if a database actually has only one ColumnFamily, if it defines that ColumnFamily in a column_families child object, it is still a DB_MultiCF.

The database object is opened by the function specified by method. The method is overloaded in the C++ code, and it is also overloaded in the json: the same method is overloaded for DB and DB_MultiCF respectively.
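
As a rough illustration of this rule, the following Python sketch classifies a database definition. The function name database_kind is hypothetical, and the sketch assumes that column_families sits under "params", as it does in the example in 2.1.

```python
def database_kind(db_conf):
    """Return "DB_MultiCF" if the definition has a column_families child object, else "DB"."""
    params = db_conf.get("params", {})
    # Only the presence of column_families matters, not how many
    # column families it actually lists.
    return "DB_MultiCF" if "column_families" in params else "DB"

# Trimmed-down versions of the two databases from the example
db1 = {"method": "DB::Open",
       "params": {"options": {"create_if_missing": True}}}
db_mcf = {"method": "DB::Open",
          "params": {"db_options": {"create_if_missing": True},
                     "column_families": {"default": "$default"}}}

for name, conf in (("db1", db1), ("db_mcf", db_mcf)):
    print(name, database_kind(conf))
```

Both definitions use the same method string "DB::Open"; the kind returned here is what would select between the two overloads.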

2.2.5. open

Although we can define multiple databases in json, in many cases we only open one of them. When the OpenDB api is called without a database name, the open object specifies which database to open. When the OpenDB api is called with a db name, the open object is ignored.
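
The selection rule can be sketched in Python as follows. This only models the rule described above; pick_database is a hypothetical helper, not the OpenDB api itself.

```python
def pick_database(conf, name=None):
    """Return the name of the database definition that would be opened."""
    # An explicit name wins; "open" is consulted only when no name is given.
    chosen = name if name is not None else conf.get("open")
    if chosen is None:
        raise KeyError('no database name given and no "open" object defined')
    if chosen not in conf.get("databases", {}):
        raise KeyError(f"database not defined: {chosen}")
    return chosen

conf = {
    "databases": {"db1": {"method": "DB::Open"},
                  "db_mcf": {"method": "DB::Open"}},
    "open": "db_mcf",
}
print(pick_database(conf))          # falls back to the "open" object
print(pick_database(conf, "db1"))   # explicit name, "open" is ignored
```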

2.3. General Objects

Among the first-level objects of json, all except the special objects above are general objects, and the name of each first-level general object is the class name of the base class of such objects in ToplingDB/RocksDB. Take "Cache", "Statistics" and "TableFactory" in the example: each of these first-level objects acts as a container, in which each sub-object defines a real C++ object. Each such "container" is equivalent to a namespace, and objects in different namespaces can have the same name.

The json object for a C++ object contains its class name and parameters, expressed by "class" and "params" respectively. Careful readers will notice that the json for the object named "stat" is just the string "default". This is a shorthand: a class without parameters can be defined directly by the string of its class name (here "default" is the registered class name used by stat, and the corresponding C++ class is StatisticsImpl). Such an object can of course also be defined by a complete regular json object containing "class" and "params".
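
The equivalence between the shorthand and the regular form can be illustrated with a small Python sketch. The helper normalize_definition is hypothetical, and treating missing "params" as an empty object is an assumption made for the illustration.

```python
def normalize_definition(definition):
    """Expand the string shorthand into the regular {"class", "params"} form."""
    if isinstance(definition, str):
        # Shorthand: the string is the registered class name, no parameters
        return {"class": definition, "params": {}}
    # Regular form; assumption: a missing "params" means no parameters
    return {"class": definition["class"],
            "params": definition.get("params", {})}

statistics = {"stat": "default"}
cache = {"lru_cache": {"class": "LRUCache", "params": {"capacity": "4G"}}}

print(normalize_definition(statistics["stat"]))
print(normalize_definition(cache["lru_cache"]))
```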

DBOptions and CFOptions are special general objects: because their "class" is fixed, "class" and "params" are omitted, and the members of "params" are promoted directly to the outer layer.

2.4. Object references

In C++, one object refers to another object through a pointer. In json, this is done through object names. The formal, complete form of an object reference is "${varname}", and the simplified forms are "$varname" or "varname". The bare "varname" form can be ambiguous, because a json string may also express a "class_name". We handle it as follows: first check whether the string names a defined object; if so, treat it as "varname", otherwise treat it as "class_name".
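
The lookup order can be sketched in Python as below. This is an illustrative model, not ToplingDB's code: the function resolve and the namespace argument are hypothetical, and the sketch assumes that names consist of word characters and that a reference is looked up among the objects defined for the expected base class.

```python
import re

# Matches the full form "${varname}" or the short form "$varname"
_REF = re.compile(r"\$\{(\w+)\}|\$(\w+)")

def resolve(value, namespace):
    """Classify a json string as ("varname", name) or ("class_name", name)."""
    m = _REF.fullmatch(value)
    if m:
        # Explicit reference: the name must be a defined object
        name = m.group(1) or m.group(2)
        if name not in namespace:
            raise KeyError(f"undefined object: {name}")
        return ("varname", name)
    if value in namespace:
        # Bare string that names a defined object: treat as a reference
        return ("varname", value)
    # Otherwise the bare string is taken as a class name
    return ("class_name", value)

table_factories = {"bb": {}, "fast": {}, "zip": {}, "dispatch": {}}
print(resolve("${dispatch}", table_factories))
print(resolve("$dispatch", table_factories))
print(resolve("dispatch", table_factories))
print(resolve("BlockBasedTable", table_factories))
```

Using the explicit "${varname}" form avoids the ambiguity entirely, at the cost of a slightly longer string.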

2.4.1. Embedded objects

Instead of defining named objects and then referencing them by name, we can also define inline objects, as in the example:

"custom_cf" : {
      "max_write_buffer_number": 4,
      "target_file_size_base": "16M",
      "target_file_size_multiplier": 2,
      "table_factory": "dispatch", "ttl": 0
  }

"custom_cf" could be defined as a reference to a CFOptions object, but here it is more convenient and concise to define it as an inline object.

2.4.2. CFOptions::ttl

There is no ttl member in CFOptions, but we define ttl for it in json. This is because the "method" of database can be specified as "DB::Open" and many other functions:

"DB::OpenForReadOnly" // equivalent to defining read_only: true in params
  "DBWithTTL::Open" // requires CFOptions::ttl
  "TransactionDB::Open"
  "OptimisticTransactionDB::Open"
  "BlobDB::Open"

Users can also extend and define their own Open, for example: MyCustomDB::Open.

2.5. DispatcherTable

"dispatch" : {
    "class": "DispatcherTable",
    "params": {
      "default": "bb",
      "readers": {"ToplingFastTable": "fast", "ToplingZipTable": "zip"},
      "level_writers": ["fast", "fast", "bb", "zip", "zip", "zip", "zip"]
    }
  }

As the name suggests, DispatcherTable is used for dispatching and scheduling the actual Table (SST). For users, the most important setting is level_writers, which specifies the Table to use at each level.

default defines the fallback used when level < 0 (level is a member of TableBuilderOptions), or when creating a builder through level_writers fails.

readers defines the mapping from class_name to varname. Internally, loading a Table is implemented through DispatcherTable::NewTableReader. As a dispatcher, it naturally needs to know what kind of Table the loaded Table is. This is distinguished by TableMagicNumber, which is statically determined at compile time. However, TableFactory objects are created at runtime, and each concrete TableFactory class can have multiple objects (with different params), so we need to specify which TableFactory object each TableFactory class uses for loading.

In this DispatcherTable definition, L0~L1 use fast, L2 uses bb, and L3~L6 use zip.
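
The role of default, readers and level_writers can be modelled with a short Python sketch. It is a simplified illustration under stated assumptions, not the DispatcherTable implementation: pick_writer and pick_reader are hypothetical names, a level beyond the end of level_writers is assumed to fall back to default, and a class_name missing from readers is likewise assumed to fall back to default. The sketch does not model the fallback on builder creation failure.

```python
DISPATCH_PARAMS = {
    "default": "bb",
    "readers": {"ToplingFastTable": "fast", "ToplingZipTable": "zip"},
    "level_writers": ["fast", "fast", "bb", "zip", "zip", "zip", "zip"],
}

def pick_writer(params, level):
    """Return the varname of the TableFactory used to write an SST at this level."""
    writers = params.get("level_writers", [])
    if 0 <= level < len(writers):
        return writers[level]
    # level < 0 uses default; falling back for levels past the end of
    # the list is an assumption of this sketch
    return params["default"]

def pick_reader(params, class_name):
    """Return the varname of the TableFactory used to load a Table of this class."""
    # Assumption: classes not listed in readers are loaded by default
    return params["readers"].get(class_name, params["default"])

for level in range(-1, 7):
    print(level, pick_writer(DISPATCH_PARAMS, level))
print(pick_reader(DISPATCH_PARAMS, "ToplingZipTable"))
```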

3. Yaml file

Users who are familiar with Kubernetes may prefer Yaml. As a configuration file, Yaml is more readable, and the ToplingDB configuration system also supports Yaml.

4. Several actual configuration files

Json	Yaml	Description
etcd_dcompaction.json	yaml	With distributed compaction
lcompact_sample.json	yaml	Without distributed compaction
todis-community.json	yaml	todis Community Edition (no performance components)
todis-enterprise.json	yaml	todis Enterprise Edition (with performance components)
