Summary: Grafana Mimir Research

1. Background

A single Prometheus instance has limited availability and reliability and cannot store data beyond what one node can hold.

  • Solve business problems
    • For example: in the current QKE setup, each cluster + project pair runs its own Prometheus instance, so when an application is deployed across multiple clusters, querying its data means searching three separate Prometheus servers and then aggregating the results, which is troublesome. This problem does not exist with Mimir.

Reference: Remote Read of Prometheus Cluster Solution (CSDN blog)

  • Some workloads cannot be split, yet their data volume is still very large.

  • Data collection and querying are completely independent tasks, yet in a single instance the two often compete with each other for resources.

2. Core advantages of Grafana Mimir

1. Ease of maintenance

One of Grafana Mimir's core strengths is that it is easy to install and maintain. The project's extensive documentation, tutorials, and deployment tools make getting started quick and easy. Grafana Mimir's monolithic mode means only one binary needs to be run, with no additional dependencies. Additionally, the best-practice dashboards, alerts, and runbooks packaged with Grafana Mimir make it easy to monitor the health of the system and keep it running smoothly.

2. Scalability

At the same time, Grafana Mimir's horizontally scalable architecture enables it to handle large amounts of time series data. Internal tests have shown that the system can handle up to 1 billion active time series, enabling massive scalability. This means that Grafana Mimir can run across multiple machines and thus be able to process orders of magnitude more time series than a single Prometheus instance.

3. Global view

In addition to this, another key advantage of Grafana Mimir is its ability to provide a global view of metrics. The system enables users to run queries that aggregate series from multiple Prometheus instances, providing a comprehensive view of all systems. The query engine also extensively parallelizes query execution, allowing even the highest cardinality queries to execute extremely fast.

4. Data persistence

Grafana Mimir uses object storage for long-term data storage, taking advantage of this ubiquitous, cost-effective, high-durability technology. The system is compatible with multiple object storage implementations, including AWS S3, Google Cloud Storage , Azure Blob Storage, OpenStack Swift, and any S3-compatible object storage. This provides users with an inexpensive, durable way to store metrics for long-term analysis.

5. High availability through replication

High availability is another key feature of Grafana Mimir. The system replicates incoming metrics, ensuring that no data is lost in the event of a machine failure. Its horizontally scalable architecture also means it can be restarted, upgraded or downgraded with zero downtime, ensuring no interruption to metric ingestion or querying.

6. Native multi-tenancy

Finally, Grafana Mimir's native multi-tenancy allows data and query isolation for independent teams or business units, enabling these groups to share the same cluster. Advanced throttling and quality of service controls ensure capacity is shared fairly among tenants, making it an excellent choice for larger organizations with multiple teams and departments.

2. Why is Prometheus a single instance?

Prometheus has always emphasized that it only does core functions.

In addition, Prometheus benefits from the features and advantages of the Go language, enabling it to capture and store more data at a lower cost, while Java-based big data projects such as Elasticsearch or Cassandra consume more resources to process the same amount of data. In other words, a single-instance, non-scalable Prometheus is powerful enough to meet the needs of most users.

Things to look into:
1. Check the Prometheus data source Grafana uses for Mimir and see which query address is configured — is it the query-frontend component, and on which port? You can also press F12 to inspect the query requests being made (see the provisioning sketch after these notes).
* For persisting data to local disk, VictoriaMetrics is the better choice.
* For persisting data to object storage, Thanos is more popular, while Grafana Mimir has more potential.
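
Regarding the first note above: what the data source points at can also be seen in Grafana's provisioning files. A minimal sketch of such an entry, assuming the nginx gateway in front of the query-frontend is exposed on port 9009; the address and tenant ID are placeholders taken from the test setup later in these notes, not verified values:

# grafana/provisioning/datasources/mimir.yaml (sketch; URL, port and tenant are assumptions)
apiVersion: 1
datasources:
  - name: Mimir
    type: prometheus                    # Mimir is queried through its Prometheus-compatible API
    access: proxy
    url: http://10.41.26.131:9009/prometheus
    jsonData:
      httpHeaderName1: X-Scope-OrgID    # tenant header, needed when multi-tenancy is enabled
    secureJsonData:
      httpHeaderValue1: demo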

Object storage and local disk storage

Does Mimir need etcd?

3. Grafana Mimir distributed architecture

The distributed architecture of Grafana Mimir can refer to the following diagram:

From the figure above we can see that Mimir has 7 different components, which gives the first impression of a complex system. Thankfully, the Helm chart makes things easier and also provides resource allocation recommendations based on the workload.

For an analysis of what the different components in the architecture do, you can explore the following values file and the source code:

serviceMonitor:
  enabled: true
 
# disabled, external ruler is used
ruler:
  enabled: false
 
# disabled, external alertmanager is used
alertmanager:
  enabled: false
 
# disabled, blocks_storage is used
minio:
  enabled: false
 
compactor:
  nodeSelector:
    app: mimir
  tolerations:
    - key: app
      value: compactor
      operator: Equal
      effect: NoSchedule
  persistentVolume:
    storageClass: standard-rwo
    size: 50Gi
  resources:
    limits:
      cpu: 1200m
      memory: 2Gi
    requests:
      cpu: 1200m
      memory: 2Gi
 
distributor:
  extraArgs:
    distributor.ingestion-rate-limit: "10000000000000"
  replicas: 5
  nodeSelector:
    app: mimir
  tolerations:
    - key: app
      value: distributor
      operator: Equal
      effect: NoSchedule
  resources:
    limits:
      memory: 4Gi
      cpu: 2
    requests:
      memory: 4Gi
      cpu: 2
 
ingester:
  extraArgs:
    ingester.max-global-series-per-user: "0"
    ingester.max-global-series-per-metric: "0"
  nodeSelector:
    app: mimir
  tolerations:
    - key: app
      value: ingester
      operator: Equal
      effect: NoSchedule
  persistentVolume:
    size: 150Gi
    storageClass: standard-rwo
  replicas: 5
  resources:
    limits:
      memory: 25Gi
      cpu: 4
    requests:
      memory: 25Gi
      cpu: 4
 
chunks-cache:
  nodeSelector:
    app: mimir
  enabled: true
  replicas: 2
 
index-cache:
  nodeSelector:
    app: mimir
  enabled: true
  replicas: 3
 
metadata-cache:
  nodeSelector:
    app: mimir
  enabled: true
 
results-cache:
  nodeSelector:
    app: mimir
  enabled: true
 
overrides_exporter:
  nodeSelector:
    app: mimir
  replicas: 1
  resources:
    limits:
      memory: 256Mi
    requests:
      cpu: 100m
      memory: 128Mi
 
querier:
  extraArgs:
    querier.max-fetched-chunks-per-query: "8000000"
  replicas: 4
  nodeSelector:
    app: mimir
  tolerations:
    - key: app
      operator: Equal
      value: querier
      effect: NoSchedule
  resources:
    limits:
      memory: 24Gi
      cpu: 2
    requests:
      memory: 24Gi
      cpu: 2
 
query_frontend:
  replicas: 1
  nodeSelector:
    app: mimir
  tolerations:
    - key: app
      operator: Equal
      value: query-frontend
      effect: NoSchedule
  resources:
    limits:
      memory: 6Gi
      cpu: 2
    requests:
      memory: 6Gi
      cpu: 2
 
store_gateway:
  persistentVolume:
    size: 50Gi
  replicas: 1
  nodeSelector:
    app: mimir
  tolerations:
    - key: app
      operator: Equal
      value: store-gateway
      effect: NoSchedule
  resources:
    limits:
      cpu: 1
      memory: 6Gi
    requests:
      cpu: 1
      memory: 6Gi
 
mimir:
  structuredConfig:
    limits:
      out_of_order_time_window: 1h
    blocks_storage:
      backend: gcs
      gcs:
        bucket_name: <bucket_name>
        service_account: |
          {<secret>}
metaMonitoring:
  serviceMonitor:
    enabled: true

Another diagram:

This one shows the role of memberlist more intuitively:


4. High reliability of data capture: HATracker

Reference: Mimir Speed Experience (Part 4): High Reliability of Data Capture (InfoQ Writing Community)

In the Prometheus ecosystem, it is necessary to make not only data storage but also data scraping (Agent/Collector) highly reliable. Usually a similar strategy is adopted: multiple Prometheus instances are set up with the same scrape configuration, and each instance pushes the results it scrapes to the same storage backend via remote write.

Although this makes scraping highly reliable, if the data is not deduplicated (within one scrape cycle the distributor only needs to forward the data scraped by a single Prometheus), the metrics scraped by every Prometheus replica will all be forwarded to the ingesters. This multiplies the amount of data the ingesters write and the amount of block compaction the compactor performs, which wastes resources.

So how do we deduplicate? In Mimir/Cortex this is mainly done with the distributor's HATracker function.

The basic logic of HATracker:

The general process is as follows:

  • Start two Prometheus Agents that scrape metrics from the same application.

  • The two Prometheus Agents inject the replication-group labels {cluster: team1, replica: replica1/replica2} through global external_labels, so all metrics carry this information.

  • With the HATracker function enabled in Mimir, the distributor elects one Prometheus Agent based on the cluster and replica labels of the incoming metrics. If replica1 is elected, the result is written to the KV store (currently only consul and etcd are supported).

  • When the distributor nodes receive data forwarded by an Agent, they check whether it comes from replica1: if so, it is forwarded (with the replica label removed from the forwarded data); if not, it is discarded.

When the replica1 agent fails for longer than the ha_tracker_failover_timeout configured for HATracker, a re-election is triggered: replica2 becomes the elected replica and is recorded in the KV store. From then on, all data from the replica2 agent is forwarded, and after replica1 recovers, the data it reports is simply discarded.

It can be seen that with HATracker, data from the same cluster is deduplicated in the distributor, so only one agent's data is written to the ingesters.

How to configure

Modify prometheus.yaml and add external_labels as follows:

# replica 1
global:
  external_labels:
    cluster: team1
    __replica__: replica1

# replica 2
global:
  external_labels:
    cluster: team1
    __replica__: replica2

Note: we can simply start the instances in Prometheus Agent mode, and the replica information can be injected through environment variables.

Modify the mimir/cortex distributor configuration, enable ha_tracker, and configure the corresponding kvstore.

limits:
  accept_ha_samples: true

distributor:
  ha_tracker:
    enable_ha_tracker: true
    kvstore:
      store: consul
      consul:
        host: consul:8500

Summary

When implementing a highly available Prometheus solution, we must consider not only the reliability of data storage but also the reliability of data scraping. The HATracker function of Mimir/Cortex deduplicates the data scraped by multi-replica agents, avoiding doubled write volume and block compaction, and saving a great deal of resources.

4. Planning Grafana Mimir capacity

1. Estimate the required CPU and memory:

Distributor

  • CPU: 1 core per 25,000 samples per second.
  • Memory: 1GB per 25,000 samples per second.

Ingester

  • CPU: 1 core per 300,000 series in memory
  • Memory: 2.5GB per 300,000 series in memory
  • Disk space: 5GB per 300,000 series in memory

Query-frontend

  • CPU: 1 core per 250 queries per second
  • Memory: 1GB per 250 queries per second

Querier

  • CPU: 1 core per 10 queries per second
  • Memory: 1GB per 10 queries per second

Store-gateway

  • CPU: 1 core per 10 queries per second
  • Memory: 1GB per 10 queries per second
  • Disk: 13GB per 1 million active series
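
A rough worked example applying the rules of thumb above (hypothetical workload, purely back-of-the-envelope: 1 million active series, replication factor 3, about 100,000 samples/s ingested and about 10 queries/s):

# Hypothetical sizing sketch, not a real configuration file
distributors:      # 100,000 samples/s / 25,000 per core -> 4 cores, 4 GB
  cpu: 4
  memory: 4GB
ingesters:         # 1M series x 3 replicas = 3M in-memory series / 300,000 -> 10 cores, 25 GB RAM, 50 GB disk in total
  cpu: 10
  memory: 25GB
  disk: 50GB
query_frontend:    # 10 queries/s is far below 250 queries/s per core -> 1 core, 1 GB is plenty
  cpu: 1
  memory: 1GB
queriers:          # 10 queries/s / 10 per core -> 1 core, 1 GB
  cpu: 1
  memory: 1GB
store_gateway:     # 10 queries/s -> 1 core, 1 GB; 13 GB disk per 1M active series
  cpu: 1
  memory: 1GB
  disk: 13GB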

8. Performance test

1. Performance and resource requirements

Who is stronger? Performance test between Grafana Mimir and VictoriaMetrics

According to Grafana Labs' tests, Mimir can scale to 1 billion active time series and an ingestion rate of 50 million samples/second. That benchmark required running a cluster with 7,000 CPU cores and 30 TiB of memory, which is the largest and most expensive public benchmark for a time series database that I have heard of. Benchmarks of this scale are not easy to reproduce; luckily, in most cases users have much less demanding workloads that are relatively easy to emulate.

Recommendations for large workloads call for approximately 140 CPUs and 800GB of memory for 10 million active time series.

But for a simple benchmark, this is obviously too demanding, so I started with the recommended configuration for a small workload of 1 million active time series, with resource requirements of about 30 CPUs and 200GB of memory.

2. Results

VictoriaMetrics consumes 1.7 times less CPU on average than Mimir.

VictoriaMetrics uses about 5x less memory than Mimir

9. Prometheus writes time series data to Mimir

1. Configure remote_write on the Prometheus side so that it pushes samples to Mimir.
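
A minimal prometheus.yml sketch for this; the address reuses the test VM from the querier example later in these notes, and the tenant header is only needed when multi-tenancy is enabled — both are assumptions to adapt:

remote_write:
  - url: http://10.41.26.131:9009/api/v1/push    # Mimir's remote-write (push) endpoint
    headers:
      X-Scope-OrgID: demo                        # tenant ID; omit when multitenancy_enabled is false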

10. Test environment deployment: virtual machine

1. Test environment deployment: virtual machine

Documentation: Play with Grafana Mimir | Grafana Labs

1. Upgrade python to 3.x, because docker-compose has version requirements

2. Install docker-compose

  • Remove old version of docker-compose: sudo rm /usr/local/bin/docker-compose
  • Install docker-compose: sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
  • Change the permissions of the docker-compose file to be executable: sudo chmod +x /usr/local/bin/docker-compose
  • restart terminal
  • Run the docker-compose version command again

3. Deploy mimir

Execute the following command:

git clone https://github.com/grafana/mimir.git
cd mimir
cd docs/sources/mimir/tutorials/play-with-grafana-mimir/
docker-compose up -d

docker-compose down && docker-compose up -d

Running the command deploys the following services:
  • Grafana Mimir
    • Three instances of Mimir in monolithic mode to provide high availability, with multi-tenancy enabled (tenant ID: demo).
  • Minio
    • S3-compatible persistent storage for blocks, rules, and alerts.
  • Prometheus
    • Prometheus scrapes Grafana Mimir's own metrics and writes them back to Grafana Mimir, to verify that ingested metrics remain available.
  • Grafana
    • Includes a pre-installed data source to query Grafana Mimir and a pre-installed dashboard for monitoring Grafana Mimir.
  • Load balancer
    • A simple NGINX based load balancer exposing Grafana Mimir endpoints on hosts.

Access services:

 
 

2. Deployment mode

Deployment modes are divided into monolithic mode, microservices mode, and read-write mode.

  • Monolithic mode runs all required components in a single process and is the default mode of operation; you can select it explicitly with -target=all.
    • Currently I have deployed Mimir on the virtual machine in monolithic mode, i.e. a single mimir process.
  • Microservices mode: each component runs as its own process.
  • Read-write mode
    • A middle ground between monolithic and microservices mode: components are grouped by function, with the read-path components running in a read process and the write-path components in a write process.
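
For reference, a minimal sketch of a monolithic-mode configuration as I understand it (filesystem storage, single instance, no multi-tenancy; paths and values are placeholders, not a production config):

# mimir.yaml — minimal monolithic sketch
target: all                      # equivalent to -target=all
multitenancy_enabled: false

blocks_storage:
  backend: filesystem            # object storage (S3/GCS/...) would be used in production
  filesystem:
    dir: /data/mimir/blocks
  tsdb:
    dir: /data/mimir/tsdb

# start with: mimir --config.file=./mimir.yaml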

11. Tenant management

We all know that Mimir's multi-tenancy is soft isolation: the data of different tenants shares the cluster's hardware resources, and the write/query throughput of a Mimir cluster is ultimately limited.

Therefore, we need fine-grained management of the different tenants' data. There are two main purposes:

  • Rate limiting: configure different write/query QPS limits for different tenants, so that resource contention between tenants does not affect each other.

  • Functional configuration: mainly the tenant's shard size, the maximum retention period of stored data, the time window for accepting out-of-order writes, and so on. These settings not only affect the functionality and performance seen by the tenant, but also help reduce storage costs. A runtime-configuration sketch follows below.
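
A sketch of what such per-tenant management looks like as a runtime configuration file, based on the per-tenant limit names I believe Mimir uses; tenant IDs and values are made up, and the file would be loaded with -runtime-config.file:

# runtime.yaml — per-tenant overrides (sketch)
overrides:
  team-a:
    ingestion_rate: 100000                   # write limit, samples/s
    max_global_series_per_user: 1500000      # cap on active series for this tenant
    compactor_blocks_retention_period: 30d   # how long this tenant's blocks are retained
    out_of_order_time_window: 1h             # accepted window for out-of-order writes
  team-b:
    ingestion_rate: 20000
    max_global_series_per_user: 300000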

In Mimir/Cortex, by default, one piece of data will be written from the distributor to three different ingester nodes. The purpose of this is to achieve high reliability of data storage through multi-copy replication.

For HATracker, the data of the same cluster is deduplicated in the distributor, so that only one agent's data is written to the ingester.

Fine-grained tenant management through runtime configuration: https://xie.infoq.cn/article/bd811ca1741a790b5dd5dd766

12. Visual editing of alarm rules: mimir

13. Component introduction

1. Introduction

Mimir optional components: alertmanager, ruler, overrides-exporter, query-scheduler

Mimir required components: compactor, distributor, ingester, querier, query-frontend, store-gateway

For a detailed introduction to the role of each component, refer to:
Using Grafana Mimir to realize cloud-native monitoring and alarm visualization
https://mp.weixin.qq.com/s?__biz=MzAxODcyNjEzNQ==&mid=2247570788&idx=3&sn=464f12dca1150cb5ce9190246a021b93&chksm=9bd278fcaca5f1eab4247d7875557932eb0b390aae3aa0a7bfc60627d4f6a6c1e5e88a718b17&scene=27#wechat_redirect

2. Introduction to caching components

When Grafana Mimir is deployed in a production environment, cache-related components such as chunks-cache, index-cache, metadata-cache, and results-cache are deployed as well. Their functions are as follows:

1. chunks-cache: used to cache data chunks, which are the segmented storage units of time series data. By caching chunks, the system reduces accesses to the backend storage when serving high-frequency queries, improving query performance and reducing latency.

  • This cache component stores the actual time series data chunks. It sits between the query engine and the storage layer: when a user requests data for a certain time range, the query engine first checks the chunks-cache. If the chunk is in the cache it is returned directly; on a cache miss, the chunk is fetched from the storage layer and added to the chunks-cache so that it can be served from the cache next time.

2. index-cache: used to cache the index information of time series data, such as metric labels and metric names. The index cache helps find the relevant series quickly and narrow the query down without scanning the entire dataset.

  • This cache component stores the index information of time series data; this index helps narrow the range of data that has to be scanned when fetching data. The index-cache also interacts with the query engine and the storage layer: when executing a query, the engine needs the index to locate the actual data, so it first checks the index-cache. On a hit it uses the cached index to fetch the corresponding data from the storage layer; on a miss it fetches the index from the storage layer and adds it to the index-cache, so subsequent accesses are served from the cache.

3. metadata-cache: caches the metadata of metrics and labels. The metadata cache speeds up exploratory queries, such as listing all available metrics or filtering metrics by label, improving dashboard loading speed and responsiveness.

  • This cache component is responsible for storing metadata (metrics and labels). Likewise, it interacts with the query engine and the storage layer. When a user's query includes label filters, the query engine needs detailed information about the metric and its labels to complete the query, so it first checks the metadata-cache. On a hit, the metadata is returned directly; on a miss, it is fetched from the storage layer and added to the metadata-cache so that subsequent accesses are served from the cache.

4. results-cache: used to cache query results. By caching the results of previous queries, the same data can be returned quickly on subsequent accesses, reducing system load and query latency.

Deploying these cache components in a production environment helps improve system performance and scalability. Note that depending on your usage scenario and underlying storage system, these caching components may need to be tuned for optimal performance.
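
For completeness, a sketch of how these caches are wired into Mimir's configuration; the field names follow the Mimir config reference as I recall it and the addresses assume the memcached services created by the Helm chart, so both should be verified against the deployed version:

blocks_storage:
  bucket_store:
    chunks_cache:
      backend: memcached
      memcached:
        addresses: dns+mimir-chunks-cache:11211
    index_cache:
      backend: memcached
      memcached:
        addresses: dns+mimir-index-cache:11211
    metadata_cache:
      backend: memcached
      memcached:
        addresses: dns+mimir-metadata-cache:11211
frontend:
  results_cache:
    backend: memcached
    memcached:
      addresses: dns+mimir-results-cache:11211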

14. distributor (data distributor)

The distributor is a stateless component that receives time series data from Prometheus or Grafana Agent. It validates the data for correctness and ensures it is within the configured limits for the given tenant. The distributor then divides the data into batches and sends them in parallel to multiple ingesters, sharding the series across the ingesters and replicating each series according to the configured replication factor. By default, the configured replication factor is 3.

How it works
Validation
The distributor validates the data it receives before writing it to the ingesters. Because a single request can contain valid and invalid metrics, samples, metadata, and exemplars, the distributor only passes valid data on to the ingesters and does not include invalid data in its requests to them. If a request contains invalid data, the distributor returns a 400 HTTP status code and the details appear in the response body; the details about the first invalid piece of data are usually logged by the sender, whether that is Prometheus or a Grafana agent. Distributor validation includes the following checks:
* Metric metadata and labels conform to the Prometheus exposition format.
* Metric metadata (name, help and unit) is no longer than the length defined by -validation.max-metadata-length.
* The number of labels per metric is not higher than -validation.max-label-names-per-series.
* Each metric label name is no longer than -validation.max-length-label-name.
* Each metric label value is no longer than -validation.max-length-label-value.
* Each sample timestamp is no newer than -validation.create-grace-period.
* Each exemplar has a timestamp and at least one non-empty label name and value pair.
* Each exemplar has no more than 128 labels.
Rate limiting
The distributor applies two different types of rate limits to each tenant:
* Request rate
The maximum number of requests per second that each tenant can send across the Grafana Mimir cluster.

* Ingestion rate
The maximum number of samples per second that each tenant can ingest across the Grafana Mimir cluster. If either of these rates is exceeded, the distributor drops the request and returns an HTTP 429 response code.
Internally, these limits are implemented using a local rate limiter in each distributor. Each distributor's local rate limiter is configured with limit/N, where N is the number of healthy distributor replicas; if the number of replicas changes, the distributors automatically adjust the request and ingestion rate limits. Because these limits are enforced by each distributor's local rate limiter, they require write requests to be distributed evenly across the distributor pool. The limits can be set with the following parameters:
-distributor.request-rate-limit
-distributor.request-burst-size
-distributor.ingestion-rate-limit
-distributor.ingestion-burst-size
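
The same limits can also be expressed per tenant in the limits block of the YAML configuration; the field names below are assumed from the flag names and should be checked against the configuration reference:

limits:
  request_rate: 100             # -distributor.request-rate-limit (requests/s per tenant)
  request_burst_size: 200       # -distributor.request-burst-size
  ingestion_rate: 100000        # -distributor.ingestion-rate-limit (samples/s per tenant)
  ingestion_burst_size: 200000  # -distributor.ingestion-burst-size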


Highly available tracker
Remote-write senders (such as Prometheus) can be configured in pairs, which means that even if one of the senders is down for maintenance or unavailable due to a failure, metrics continue to be scraped and written to Grafana Mimir. We refer to this configuration as a high-availability (HA) pair. The distributor includes an HA tracker: when it is enabled, the distributor deduplicates the incoming series coming from a Prometheus HA pair. This lets you have multiple HA replicas of the same Prometheus server write the same series to Mimir, and have that series deduplicated in the Mimir distributor.


Sharding and replication
The distributor shards and replicates incoming series across the ingesters. You can configure how many ingester replicas each series is written to with -ingester.ring.replication-factor; the replication factor defaults to 3. Distributors use consistent hashing together with the configured replication factor to determine which ingesters receive a given series. Sharding and replication use the ingesters' hash ring. For each incoming series, the distributor computes a hash from the metric name, labels, and tenant ID; the computed hash is called a token. The distributor looks the token up in the hash ring to determine which ingesters the series should be written to.

15. ingester (data receiver)

The ingester is a stateful component that, on the write path, writes incoming series to long-term storage and, on the read path, returns the series samples needed to answer queries.

How it works

Incoming series from the distributors are not immediately written to long-term storage; they are kept in the ingester's memory or offloaded to the ingester's disk. Eventually all series are written to disk and periodically uploaded to long-term storage (every two hours by default). Therefore, when executing a query on the read path, the querier may need to fetch samples both from the ingesters and from long-term storage. Any Mimir component that calls the ingesters first looks up the ingesters registered in the hash ring to determine which ones are available. Each ingester can be in one of the following states:
pending
joining
active
leaving
unhealthy

Write de-amplification
Ingesters store the most recently received samples in memory in order to perform write de-amplification. If the ingesters wrote every received sample to long-term storage immediately, the system would be very hard to scale because of the enormous pressure this would put on long-term storage. For this reason, the ingesters batch and compress samples in memory and periodically upload them to long-term storage. Write de-amplification is the main source of Mimir's low total cost of ownership (TCO).
Ingester failure and data loss
If an ingester process crashes or exits abruptly, any in-memory series that have not yet been uploaded to long-term storage can be lost. There are ways to mitigate this failure mode:
* Replication
* Write-ahead log (WAL)
* Write-behind log (WBL), out-of-order when enabled

zone-aware replication

Zone-aware replication ensures that the ingester replicas of a given time series are spread across different zones. Zones can represent logical or physical failure domains, for example different data centers. Spreading replicas across multiple zones prevents data loss and service interruption in the event of a zone-wide outage.

16. querier (data querier)

1. Introduction

The querier is a stateless component on the read path that evaluates PromQL expressions by fetching time series and labels: it retrieves recently written data from the ingesters and older data from long-term storage via the store-gateway component.
How it works

To find the correct blocks at query time, the querier needs an up-to-date view of the bucket in long-term storage. The querier only needs metadata from the bucket, which includes the minimum and maximum sample timestamps within each block. The querier keeps its bucket view up to date in one of the following ways:
Periodically download the bucket index (default)
Periodically scan the bucket

2. Example

Address: http://10.41.26.131:9009/prometheus/api/v1/query?query=up

3. Query sharding

Mimir supports query sharding; the schematic diagram is as follows:


17. query-frontend (query front end)

The query-frontend is a stateless component that provides the same API as the querier and can be used to speed up the read path. The query-frontend is not required, but we recommend deploying it. When it is deployed, query requests should be sent to the query-frontend instead of to the queriers. Queriers are still needed in the cluster to execute the queries: the query-frontend holds incoming queries in an internal queue, and the queriers act as workers that pull jobs from the queue, execute them, and return the results to the query-frontend for aggregation. To connect the queriers to the query-frontend, use the -querier.frontend-address configuration. For high availability it is recommended to deploy at least 2 query-frontends.


Splitting

The query-frontend can split long-range queries into multiple queries. By default, the split interval is 24 hours. The query-frontend executes these partial queries in parallel on downstream queriers and combines the results. Splitting prevents large multi-day or multi-month queries from causing out-of-memory errors in the queriers and speeds up query execution.

store-gateway (data storage gateway)
The store-gateway component is stateful; it queries blocks from long-term storage. On the read path, the querier and the ruler use the store-gateway when processing queries, whether the query comes from a user or from a rule being evaluated. To find the right blocks at query time, the store-gateway needs an up-to-date view of the buckets in long-term storage.

18. Alertmanager

Mimir Alertmanager adds multi-tenancy support and horizontal scalability to the Prometheus Alertmanager. The Mimir Alertmanager is an optional component that accepts alert notifications from the Mimir ruler component, deduplicates and groups them, and routes them to notification channels such as email, PagerDuty, or OpsGenie.

Configure alert rules on grafana

The Alertmanager data storage location specifies:

Interface configuration alarm rules:

19. query-scheduler

The query-scheduler is an optional stateless component that keeps a queue of queries to be executed and distributes the workload among the available queriers.

20. overrides-exporter

Mimir supports configuring resource usage limits per tenant to prevent a single tenant from using too many resources.

The overrides-exporter component exposes these limits as Prometheus metrics, so that operators can see how close each tenant's resource usage is to its configured limits.

For example, utilization alerts can be configured on top of these metrics, so that a tenant is notified in time when its usage approaches a limit and can react promptly (see the sketch below).
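
A sketch of such a utilization alert as a Prometheus/Mimir rule; the metric names (cortex_limits_overrides from the overrides-exporter, cortex_ingester_active_series from the ingesters) and the replication factor of 3 are assumptions to verify against the actual deployment:

groups:
  - name: mimir-tenant-utilization
    rules:
      - alert: TenantNearSeriesLimit
        # active series are counted once per ingester replica, hence the division by 3
        expr: |
          (sum by (user) (cortex_ingester_active_series) / 3)
            / on (user)
          max by (user) (cortex_limits_overrides{limit_name="max_global_series_per_user"})
            > 0.8
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Tenant {{ $labels.user }} is above 80% of its active-series limit"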

21. store-gateway

The store-gateway component also needs to store data.

Curious: why does the store-gateway also need to store data? Looking at its directory structure, it is not the same as the ingester's, as shown below:

The function of index-header is to store the index information of each block locally for quick query.

See the documentation for details: Grafana Mimir binary index-header | Grafana Mimir documentation

Explanation: To query series inside blocks from object storage, the store-gateway must obtain information about each block index. To obtain the required information, the store-gateway builds an index-header for each block and stores it on local disk.

It can also be seen from Mimir's architecture diagram that the store-gateway does interact with the index-cache; presumably both the local disk and the index-cache hold a copy.

By the way, let's take a look at the storage of other components:

Compactor components:

 Choose one to go in and see:

The compactor directory appears to record index information (it is probably needed when querying from object storage S3; note that the compactor never queries data from the ingesters, it only interacts with the object store). I don't know yet why there are chunks here; this will have to be verified against the source code later.

 Other folders seem to be for tenants, choose one to go in and have a look:

Looking at the storage structure, it records which blocks each tenant has, plus the meta information of each block (the storage time and time range; presumably each compaction actually specifies which time range to compact, to be verified in the source code).

 

Look at AlertManager again:

It records plugin information; I don't know yet what the other directories are used for, this needs to be verified.

22. bucket index

Grafana Mimir bucket index | Grafana Mimir documentation

The bucket index is a per-tenant file that contains a list of blocks and block deletion markers in the storage.

The bucket index is stored in the backend object store, updated periodically by the compactor component , and used by the store-gateway.

23. Configuration modification

Specify the configuration file:
/usr/local/mimir/mimir-darwin-amd64 --config.file /usr/local/mimir/mimir.yaml


Mimir alertmanager:
$ mimirtool alertmanager load ./alertmanager.yaml --address http://127.0.0.1:8080 --id anonymous

The following is the alert-rule front end; alert rules can be created through the UI:
Think about it: is the query side also going through Grafana?


Multi-tenancy configuration:
change multitenancy_enabled: true in the configuration file
and upload the Alertmanager configuration file (the --id value is the tenant/instance ID, which can be customized)

What does mimirtool do? Research
$ mimirtool alertmanager load ./alertmanager.yaml --address http://127.0.0.1:8080 --id instance_id
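
For reference, a minimal sketch of what the ./alertmanager.yaml loaded above could contain; this is just a standard Alertmanager configuration, and the webhook URL is a placeholder:

route:
  receiver: default-webhook
  group_by: ['alertname']
receivers:
  - name: default-webhook
    webhook_configs:
      - url: http://example.com/alert-hook   # placeholder notification endpoint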


Find some WeChat or QQ communication groups

The support for OpenTelemetry is mainly about the OTLP protocol: after an application collects data with the OpenTelemetry SDK, it does not have to report to an OpenTelemetry Collector, it can also push directly to Mimir, so users do not need to run an extra collection service.
Since Mimir natively supports the OTLP protocol, we can use the OpenTelemetry SDK to export everything uniformly as OTLP. Whether the data is pushed to Mimir directly or forwarded as OTLP through the Collector, there is only ever one data format on the application side.
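
A sketch of pushing metrics to Mimir through an OpenTelemetry Collector, assuming Mimir's OTLP ingestion endpoint lives under /otlp (the otlphttp exporter appends /v1/metrics itself); address and tenant are placeholders:

receivers:
  otlp:
    protocols:
      grpc:
      http:
exporters:
  otlphttp:
    endpoint: http://10.41.26.131:9009/otlp
    headers:
      X-Scope-OrgID: demo        # tenant header when multi-tenancy is enabled
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlphttp]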

Try to configure the target, and then see the effect of the configuration

Through Mimir, it is very convenient to make Prometheus rule evaluation and alert management highly available. It not only provides the out-of-the-box mimirtool for day-to-day operations, but also integrates with Grafana dashboards, so tenants can easily manage their configuration through Grafana. In this way, Grafana remains a completely stateless service, while the persistence and consistency of the rule and alerting configuration data are guaranteed by Mimir.
 

30. Prometheus remote-read principle

remote_read allows Prometheus to read time series data from remote storage, extending its local storage.

When Prometheus responds to a query request, the query is processed by fanoutStorage;

  • fanoutStorage includes localStorage (local TSDB) and remoteStorage (remote storage), both of which implement the query interface;
  • localStorage performs local queries;
  • remoteStorage executes remote queries via HTTP;
  • Merge the above two query results and return them to the client;

The following are remote_write and remote_read configured on prometheus.

remote_write: to mimir

remote_read: read from mimir
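
A sketch of the corresponding remote_read entry, assuming Mimir exposes the Prometheus remote-read API under its /prometheus prefix; whether the tenant header is needed (and supported for remote_read by your Prometheus version) should be verified:

remote_read:
  - url: http://10.41.26.131:9009/prometheus/api/v1/read
    read_recent: true            # also read ranges that local storage should already cover
    headers:
      X-Scope-OrgID: demo        # tenant header; remove if not supported/needed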

31. hash ring

1. Introduction

In Grafana Mimir, the hash ring is mainly used for sharding and replication operations. Its function is to share work in a consistent manner among multiple copies of a component, so that other components can determine the address with which to communicate.

A hash ring works by first hashing the workload or data to be shared, and then using the hash result to determine which ring member owns that data.

Grafana Mimir uses the fnv32a hash function, which returns 32-bit unsigned integers with values between 0 and (2^32)-1. This value is called a token; it is used as an ID for the data and determines its position on the hash ring. This makes it possible to determine independently which Grafana Mimir instance is the authoritative owner of specific data.

For example, series are sharded across multiple ingester components. The token for a given series is computed by hashing all of its labels plus the tenant ID. The ingester instance that owns the series is the one that owns the token range containing the series' token.

To divide the possible set of tokens (2^32) among the available instances in the cluster, all running Grafana Mimir components (such as ingesters) join a hash ring. A hash ring is a data structure that divides the token space into ranges and assigns each range to a given Grafana Mimir ring member.

At startup, the instance generates a random token value and registers it with the ring. The value registered with each instance determines which instance owns a given token. The token's owner is the instance with the smallest registered value greater than the looked-up token (wrapping around to zero when (2^32)-1 is reached).

To replicate data across multiple instances, Grafana Mimir traverses the ring clockwise to find replicas, starting with the authoritative owner of the data. The next instance found during traversal of the ring will receive the copied data.

2. Components that use a hash ring

There are several Grafana Mimir components that need a hash ring. Each of the following components builds an independent hash ring:

3. Shuffle Sharding

4. Functions built using hash rings

Grafana Mimir mainly uses hash rings for sharding and replication. Functions built using hash rings:

  • Service discovery: instances can discover each other by looking up who is registered in the ring.
  • Heartbeat: instances periodically send a heartbeat to the ring to indicate that they are up and running. An instance is considered unhealthy if no heartbeat has been received from it for some period of time.
  • Zone-aware replication: replication of data across failure domains, which can optionally be enabled in Grafana Mimir. For more information, see Configuring zone-aware replication.
  • Shuffle sharding: Grafana Mimir optionally supports shuffle sharding in multi-tenant clusters to reduce the blast radius of outages and better isolate tenants. For more information, see Configuring shuffle sharding.

32. Some concepts

sample: a single data point, i.e. the value of a certain metric + label combination at a given moment.

series: a unique metric + label combination (the equivalent of the counter concept in hubble).

active series: series that are still actively receiving samples.

Reference documents:

Using Grafana Mimir to Realize Cloud Native Monitoring Alarm Visualization


Comparison of mainstream Prometheus long-term storage solutions: https://kubesphere.io/zh/blogs/prometheus-storage/


Mimir Speed Experience (Part 3): Fine-grained tenant management through runtime configuration: https://xie.infoq.cn/article/bd811ca1741a790b5dd5dd766


Mimir Speed Experience (Part 2): Using Grafana agent to achieve multi-tenant data capture: https://xie.infoq.cn/article/d7df312ab5c58d82b913f7445


This article takes you to understand the past and present of Grafana's latest open source project Mimir: https://xie.infoq.cn/article/2723176da5693f6085c6b1e78


Mimir Speed Experience (Part 4): High reliability of data capture: https://xie.infoq.cn/article/7ae17cc1a6c04b391d872e85a


Mimir Quick Experience (Part 6): Rule Evaluation and Alarm Management: https://xie.infoq.cn/article/1a5a4de7f9fd2c3183e6ec475


Mimir Speed Experience (Part 5): Writing native OTLP data: https://xie.infoq.cn/article/5ea9846b1f7d7b0446c3f4fa3


Mimir source code analysis (1): Challenges brought about by simultaneous placement of massive series chunks: https://xie.infoq.cn/article/14ad0bd19a8256fabf3a4807d


Introduction to Mimir: https://www.jianshu.com/p/c807611ecc9c


Mimir query main process source code: https://www.jianshu.com/p/f8d30a338854


Prometheus performance tuning - what is a high cardinality problem and how to solve it: https://www.jianshu.com/p/cf37c3e0fb92


Grafana Mimir Research: https://zhuanlan.zhihu.com/p/547989004


An article to understand Prometheus's long-term storage mainstream solution: https://zhuanlan.zhihu.com/p/564705066


Optimization of Grafana Mimir in massive time series indicators (CSDN)


How to streamline Prometheus metrics and storage usage:
https://blog.csdn.net/jh035512/article/details/127956841


https://www.oschina.net/news/189046/grafana-mimir
Grafana Labs released a high-performance open source time series database: Grafana Mimir


Grafana open source Prometheus long-term storage project Mimir
https://my.oschina.net/u/4197945/blog/5510473


Prometheus performance tuning - horizontal sharding
https://my.oschina.net/u/6187054/blog/5599509


Get started with Grafana Mimir in minutes:
https://grafana.com/blog/2022/04/15/video-get-started-with-grafana-mimir-in-minutes/


Video: How to migrate to Grafana Mimir in less than 4 minutes:
https://grafana.com/blog/2022/04/25/video-how-to-migrate-to-grafana-mimir-in-less-than-4-minutes/


https://grafana.com/docs/mimir/v2.7.x/references/learning-resources/
An article to understand Grafana Mimir: https://xie.infoq.cn/article/1a5a4de7f9fd2c3183e6ec475
 
