Apache Doris detailed tutorial (3)

7. Monitoring and alarming

Doris can be monitored with Prometheus (metrics collection) and Grafana (visualization). Just download the latest versions from their official websites.

Prometheus official website download: https://prometheus.io/download/

Grafana official website download: https://grafana.com/grafana/download

Doris exposes its monitoring data through the HTTP interfaces of the FE and BE, as key-value text; each key may also carry labels to distinguish different series. Once Doris is set up, you can view the monitoring data in a browser through the following interfaces.

Frontend: fe_host:fe_http_port/metrics, e.g. http://zuomm01:8030/metrics

Backend: be_host:be_web_server_port/metrics, e.g. http://zuomm01:8040/metrics
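The key-value text these endpoints return follows the Prometheus exposition format, so it is easy to inspect programmatically. A minimal parsing sketch (the metric name in the sample line is illustrative, not an exact Doris metric name):

```python
import re

def parse_metric_line(line):
    """Parse one line of Prometheus exposition text into (name, labels, value)."""
    m = re.match(r'^(\w+)(?:\{(.*)\})?\s+(\S+)$', line)
    if not m:
        return None  # comment line, HELP/TYPE line, or unsupported shape
    name, label_str, value = m.groups()
    labels = {}
    if label_str:
        for key, val in re.findall(r'(\w+)="([^"]*)"', label_str):
            labels[key] = val
    return name, labels, float(value)

# A line shaped like the ones the FE /metrics endpoint returns:
line = 'doris_fe_query_total{user="root"} 42'
print(parse_metric_line(line))
```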

The overall monitoring architecture: Prometheus scrapes the metrics interfaces of the FE and BE nodes and stores the time series, and Grafana reads from Prometheus to render dashboards.

7.1. Prometheus

1. Upload prometheus-2.26.0.linux-amd64.tar.gz and decompress it

 tar -zxvf prometheus-2.26.0.linux-amd64.tar.gz 

2. Configure prometheus.yml

Configure two target groups, one for the FEs and one for the BEs, using a `group` label to distinguish them. If there are multiple clusters, add another `job_name` entry with the same kind of configuration.

vi prometheus.yml

scrape_configs:
  - job_name: 'prometheus_doris'
    static_configs:
      - targets: ['zuomm01:8030','zuomm02:8030','zuomm03:8030']
        labels:
          group: fe
      - targets: ['zuomm01:8040','zuomm02:8040','zuomm03:8040']
        labels:
          group: be

3. Start prometheus

nohup /opt/app/prometheus-2.26.0.linux-amd64/prometheus --web.listen-address="0.0.0.0:8181" & 

This runs Prometheus in the background with its web port set to 8181. After startup it begins collecting data, which is stored in the data directory.

4. Visit

http://zuomm01:8181

Click Status -> Targets in the navigation bar to see the monitoring host nodes of all grouped jobs. Under normal circumstances, all nodes should be UP, indicating that data collection is normal. Click on an Endpoint to see the current monitoring values.
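The same target health check can be done programmatically through Prometheus's /api/v1/targets HTTP API. A small sketch, run here against a trimmed, hand-written sample of the API's JSON response rather than a live server:

```python
import json

def down_targets(payload):
    """Return the scrape URLs of active targets whose health is not 'up'."""
    return [t["scrapeUrl"]
            for t in payload["data"]["activeTargets"]
            if t["health"] != "up"]

# A trimmed sample of what GET /api/v1/targets returns (fields abbreviated):
sample = json.loads('''
{
  "status": "success",
  "data": {
    "activeTargets": [
      {"scrapeUrl": "http://zuomm01:8030/metrics",
       "labels": {"group": "fe"}, "health": "up"},
      {"scrapeUrl": "http://zuomm02:8040/metrics",
       "labels": {"group": "be"}, "health": "down"}
    ]
  }
}
''')
print(down_targets(sample))  # only the BE on zuomm02 is down in this sample
```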

7.2. Grafana

1. Upload grafana-7.5.2.linux-amd64.tar.gz and decompress it

tar -zxvf grafana-7.5.2.linux-amd64.tar.gz 

2. Configure conf/defaults.ini

vi conf/defaults.ini
http_addr = zuomm01
http_port = 8182

3. Start

nohup /opt/app/grafana-7.5.2/bin/grafana-server &

Access http://zuomm01:8182 through the browser and log in; Grafana's default username and password are both admin. Then configure the data source.

Add a data source: click the gear (Configuration) icon in the left navigation, open Data Sources, and add Prometheus, pointing its URL at the Prometheus server (http://zuomm01:8181).

Add a dashboard:

Template download address: https://grafana.com/grafana/dashboards/9734/revisions

Download the template file (doris-overview_rev4.json), then open Dashboards -> Manage -> Import and import the downloaded Doris template.

8. Backup and Restore

Doris supports backing up the current data, as files, to a remote storage system through a broker. The backed-up data can later be restored from the remote storage into any Doris cluster with the restore command. This enables periodic snapshot backups of data, as well as data migration between clusters.

8.1. Backup principle

A backup operation uploads the data of the specified table or partition, in the form of the files Doris stores, directly to the remote repository. When a user submits a BACKUP request, the system performs the following steps internally:
1. Snapshot and snapshot upload

The snapshot phase takes a snapshot of the specified table's or partition's data files; all subsequent backup work operates on the snapshot, so changes to the table, data loads, and other operations after the snapshot no longer affect the backup result. A snapshot only creates hard links to the current data files and therefore takes very little time. Once the snapshot is complete, the snapshot files are uploaded one by one; the upload is performed concurrently by each Backend.
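The reason a hard-link snapshot is both instant and stable is that the data files are written once and replaced rather than modified in place. The same idea can be demonstrated with ordinary files (a generic illustration, not Doris code):

```python
import os
import tempfile

# A snapshot of an immutable file can simply be a hard link: no data is
# copied, and later "changes" (which replace the file rather than rewrite
# it) leave the linked snapshot untouched.
workdir = tempfile.mkdtemp()
data_file = os.path.join(workdir, "segment_v1.dat")
snap_file = os.path.join(workdir, "snapshot_segment.dat")

with open(data_file, "w") as f:
    f.write("original rows")

os.link(data_file, snap_file)      # instant "snapshot", no data copied

# Simulate a compaction: write a new file and atomically swap it in.
new_file = os.path.join(workdir, "segment_v2.dat")
with open(new_file, "w") as f:
    f.write("compacted rows")
os.replace(new_file, data_file)    # the live file now points at new data

with open(snap_file) as f:
    print(f.read())                # the snapshot still sees "original rows"
```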

2. Metadata preparation and upload
After the data-file snapshots are uploaded, the Frontend first writes the corresponding metadata to a local file, then uploads that file to the remote repository through the broker, completing the backup job.

8.2. Restore principle

A restore operation takes a backup that already exists in a remote repository and restores its contents into the local cluster. When a user submits a RESTORE request, the system performs the following steps:
1. Create the corresponding metadata locally
First, the structures to be restored (tables, partitions, and so on) are created in the local cluster. Once created, the tables are visible but not yet accessible.

2. Local snapshot
This step takes a snapshot of the tables created in the previous step. It is actually an empty snapshot (the newly created tables have no data); its purpose is to create the corresponding snapshot directories on the Backends, which will later receive the snapshot files downloaded from the remote repository.

3. Download the snapshot
The snapshot files in the remote repository are downloaded into the snapshot directories created in the previous step. This step is performed concurrently by each Backend.

4. Make the snapshot effective
After the snapshots are downloaded, each snapshot file is mapped into the metadata of the local table. The snapshots are then reloaded to take effect, completing the restore.

Key points

1. Backup and restore operations can currently only be performed by users with ADMIN privileges.
2. Only one backup or restore job may run within a database at a time.
3. Both backup and restore operate at partition granularity at the finest. When a table holds a large amount of data, it is recommended to operate partition by partition to reduce the cost of retrying after a failure.

8.3. Backup example

1. Create a remote repository

-- 1. Start HDFS
-- 2. Start the broker

CREATE REPOSITORY `hdfs_test_backup`  -- name of the remote repository
WITH BROKER `broker_name`
ON LOCATION "hdfs://linux01:8020/tmp/doris_backup"  -- storage path
PROPERTIES (
 "username" = "root",
 "password" = ""
);

2. Perform backup

BACKUP SNAPSHOT [db_name].{snapshot_name} 
TO `repository_name` 
ON ( -- which tables/partitions to back up
 `table_name` [PARTITION (`p1`, ...)], 
 ... 
) 
PROPERTIES ("key"="value", ...);
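Because the statement is plain SQL, scripted backups can assemble it from a table/partition mapping. A minimal sketch with a hypothetical helper (this is just string templating of the syntax above, not a Doris API):

```python
def build_backup_sql(db, snapshot, repo, tables):
    """Build a BACKUP SNAPSHOT statement from a {table: [partitions]} mapping.

    An empty partition list means the whole table is backed up.
    """
    parts = []
    for table, partitions in tables.items():
        if partitions:
            plist = ", ".join("`%s`" % p for p in partitions)
            parts.append("`%s` PARTITION (%s)" % (table, plist))
        else:
            parts.append("`%s`" % table)
    return ("BACKUP SNAPSHOT %s.%s TO `%s` ON (%s)"
            % (db, snapshot, repo, ", ".join(parts)))

sql = build_backup_sql("test", "event_info_log_snapshot",
                       "hdfs_test_backup", {"event_info_log": []})
print(sql)
```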

3. View backup tasks

mysql> SHOW BACKUP from test \G;
*************************** 1. row ***************************
               JobId: 13300
        SnapshotName: event_info_log_snapshot
              DbName: test
               State: FINISHED
          BackupObjs: [default_cluster:test.event_info_log]
          CreateTime: 2022-11-27 21:29:56
SnapshotFinishedTime: 2022-11-27 21:30:00
  UploadFinishedTime: 2022-11-27 21:30:06
        FinishedTime: 2022-11-27 21:30:13
     UnfinishedTasks: 
            Progress: 
          TaskErrMsg: 
              Status: [OK]
             Timeout: 86400
1 row in set (0.02 sec)

4. View snapshots in the remote repository

SHOW SNAPSHOT ON `repo_name`
[WHERE SNAPSHOT = "snapshot" [AND TIMESTAMP = "backup_timestamp"]];


mysql> SHOW SNAPSHOT ON hdfs_test_backup;
+-------------------------+---------------------+--------+
| Snapshot                | Timestamp           | Status |
+-------------------------+---------------------+--------+
| event_info_log_snapshot | 2022-11-27-21-29-56 | OK     |
+-------------------------+---------------------+--------+
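The Timestamp column uses the yyyy-MM-dd-HH-mm-ss form that RESTORE's backup_timestamp property expects; converting it to a standard datetime, e.g. for pruning old snapshots in a script, is straightforward:

```python
from datetime import datetime

def parse_backup_timestamp(ts):
    """Parse the 'yyyy-MM-dd-HH-mm-ss' timestamps shown by SHOW SNAPSHOT."""
    return datetime.strptime(ts, "%Y-%m-%d-%H-%M-%S")

ts = parse_backup_timestamp("2022-11-27-21-29-56")
print(ts.isoformat())  # 2022-11-27T21:29:56
```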

5. Cancel backup

CANCEL BACKUP FROM test; 

8.4. Recovery example

Restores data previously backed up with the BACKUP command into the specified database. This is an asynchronous operation; after the statement is submitted successfully, check progress with the SHOW RESTORE command.

  • Only OLAP tables can be restored
  • Multiple tables can be restored in one statement; they must match the tables in the corresponding backup

Notes:

1. There can only be one executing BACKUP or RESTORE task under the same database.

2. The ON clause identifies the tables and partitions to restore. If no partition is specified, all partitions of the table are restored by default. The specified tables and partitions must already exist in the repository backup.

3. A backed-up table can be restored under a new name with the AS clause, but the new name must not already exist in the database. Partition names cannot be changed.

4. You can restore a backed-up table over an existing table with the same name in the database, but the two table structures must be completely consistent. The table structure includes: table name, columns, partitions, rollups, and so on.

5. You can restore only some partitions of a table; the system checks whether the partition Range or List matches.

6. PROPERTIES currently supports the following properties:

  • "backup_timestamp" = "2018-05-04-16-45-08": specifies which time version of the backup to restore; required. The value can be obtained with SHOW SNAPSHOT ON repo_name;
  • "replication_num" = "3": specifies the number of replicas for the restored table or partition; the default is 3. When restoring into an existing table or partition, the replica count must match that of the existing table or partition, and there must be enough hosts to hold the replicas.
  • "timeout" = "3600": task timeout in seconds; the default is one day.

For example, restore the table event_info_log from snapshot event_info_log_snapshot in repository hdfs_test_backup into database test, using time version 2022-11-27-21-29-56:

-- Create the table:
create table event_info
(
user_id varchar(20),
event_id varchar(20),
event_action varchar(20),
event_time datetime
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 1;


-- specify the target database and the snapshot name
RESTORE SNAPSHOT test.event_info_log_snapshot
-- from which snapshot in which repository
FROM `hdfs_test_backup` -- repository name
ON ( `event_info_log` )  -- the table to restore
PROPERTIES
(
-- specify the time version
 "backup_timestamp" = "2022-11-27-21-29-56"
);

View recovery tasks

mysql> SHOW RESTORE from test \G;
*************************** 1. row ***************************
                        JobId: 13316
                        Label: event_info_log_snapshot
                    Timestamp: 2022-11-27-21-29-56
                       DbName: default_cluster:test
                        State: FINISHED
                    AllowLoad: false
               ReplicationNum: 3
            ReplicaAllocation: tag.location.default: 3
               ReserveReplica: false
ReserveDynamicPartitionEnable: false
                  RestoreObjs: {
  "name": "event_info_log_snapshot",
  "database": "test",
  "backup_time": 1669555796781,
  "content": "ALL",
  "olap_table_list": [
    {
      "name": "event_info_log",
      "partition_names": [
        "event_info_log"
      ]
    }
  ],
  "view_list": [],
  "odbc_table_list": [],
  "odbc_resource_list": []
}
                   CreateTime: 2022-11-27 21:55:27
             MetaPreparedTime: 2022-11-27 21:55:28
         SnapshotFinishedTime: 2022-11-27 21:55:31
         DownloadFinishedTime: 2022-11-27 21:55:37
                 FinishedTime: 2022-11-27 21:55:43
              UnfinishedTasks: 
                     Progress: 
                   TaskErrMsg: 
                       Status: [OK]
                      Timeout: 86400
1 row in set (0.01 sec)

Cancel restore

CANCEL RESTORE FROM db_name;

8.5. Delete the remote repository

DROP REPOSITORY `repo_name`;  

Deleting a repository only removes the repository's mapping in Doris; it does not delete the actual backup data. After deletion, you can map to the same repository again by specifying the same broker and LOCATION.

Original text: https://mp.weixin.qq.com/s/C9P8Zoyw6MdTt9BNEcL0MA

Origin blog.csdn.net/qq_44787816/article/details/134770398