7. Monitoring and alarming
Doris can be monitored with Prometheus (collection) and Grafana (visualization). Simply download the latest versions from the official websites.
Prometheus official website download: https://prometheus.io/download/
Grafana official website download: https://grafana.com/grafana/download
Doris exposes monitoring data through the HTTP interfaces of FE and BE, as plain text in key-value form; a key may carry different labels to distinguish series. Once Doris is set up, you can view the monitoring data in a browser through the following interfaces:
Frontend: fe_host:fe_http_port/metrics, e.g. http://zuomm01:8030/metrics
Backend: be_host:be_web_server_port/metrics, e.g. http://zuomm01:8040/metrics
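The metrics endpoints return plain text in the Prometheus exposition format. As a minimal, hypothetical sketch (the sample metric names below are illustrative, not actual Doris output), such key-value lines can be parsed like this:

```python
# Minimal parser for Prometheus text-format metrics, such as those served
# at fe_host:fe_http_port/metrics. The sample payload below is illustrative;
# real Doris metric names and values will differ.
import re

SAMPLE = """\
# HELP doris_fe_connection_total total connections
# TYPE doris_fe_connection_total gauge
doris_fe_connection_total 5
doris_fe_query_total{type="internal"} 42
"""

def parse_metrics(text):
    """Return a dict mapping 'name{labels}' to a float value."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip HELP/TYPE comment lines
            continue
        m = re.match(r'^(\S+?(?:\{[^}]*\})?)\s+(\S+)$', line)
        if m:
            metrics[m.group(1)] = float(m.group(2))
    return metrics

print(parse_metrics(SAMPLE))
```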
The entire monitoring architecture is as shown below
7.1 Prometheus
1. Upload prometheus-2.26.0.linux-amd64.tar.gz and decompress it
tar -zxvf prometheus-2.26.0.linux-amd64.tar.gz
2. Configure prometheus.yml
Configure two sets of targets, FE and BE, and use labels to mark each group. If there are multiple clusters, add another `- job_name` entry with the same structure.
vi prometheus.yml
scrape_configs:
  - job_name: 'prometheus_doris'
    static_configs:
      - targets: ['zuomm01:8030','zuomm02:8030','zuomm03:8030']
        labels:
          group: fe
      - targets: ['zuomm01:8040','zuomm02:8040','zuomm03:8040']
        labels:
          group: be
3. Start prometheus
nohup /opt/app/prometheus-2.26.0.linux-amd64/prometheus --web.listen-address="0.0.0.0:8181" &
This command will run Prometheus in the background and specify its web port as 8181. After startup, data collection begins and the data is stored in the data directory.
4. Visit
http://zuomm01:8181
Click Status -> Targets in the navigation bar to see the monitoring host nodes of all grouped jobs. Under normal circumstances, all nodes should be UP, indicating that data collection is normal. Click on an Endpoint to see the current monitoring values.
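Besides the Targets page, Prometheus also serves an HTTP query API at /api/v1/query. A small sketch building such an instant-query URL (the host and port match this setup; `up{group="fe"}` uses the `group` label defined in prometheus.yml above):

```python
# Build an instant-query URL for the Prometheus HTTP API (/api/v1/query).
# The host/port match the setup in this section; the PromQL expression
# queries the built-in `up` metric for the FE group defined above.
from urllib.parse import urlencode

def instant_query_url(base, promql):
    """Return the GET URL for an instant PromQL query."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"

url = instant_query_url("http://zuomm01:8181", 'up{group="fe"}')
print(url)
```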
7.2 Grafana
1. Upload grafana-7.5.2.linux-amd64.tar.gz and decompress it
tar -zxvf grafana-7.5.2.linux-amd64.tar.gz
2. Configure conf/defaults.ini
vi defaults.ini
http_addr = zuomm01
http_port = 8182
3. Start
nohup /opt/app/grafana-7.5.2/bin/grafana-server &
Access http://zuomm01:8182 through the browser and configure the data source. The default Grafana username and password are both admin.
4. Add a data source
In the left sidebar, open the gear (Configuration) menu, choose Data Sources, and add Prometheus with its address (http://zuomm01:8181).
5. Add a dashboard
Template download address: https://grafana.com/grafana/dashboards/9734/revisions
Open Dashboards -> Manage -> Import and upload the downloaded Doris template file doris-overview_rev4.json.
8. Backup and Restore
Doris supports backing up current data in the form of files to remote storage systems through brokers. You can then use the restore command to restore data from the remote storage system to any Doris cluster. Through this function, Doris supports regular snapshot backup of data. You can also use this function to migrate data between different clusters.
8.1. Backup principle
The backup operation uploads the data of the specified table or partition, in the form of the files Doris stores it in, directly to the remote warehouse. When a user submits a BACKUP request, the system performs the following steps internally:
1. Snapshot and snapshot upload
The snapshot phase takes a snapshot of the data files of the specified table or partition; all subsequent backup work is done on this snapshot, so imports, schema changes, and other operations on the table after the snapshot no longer affect the backup result. A snapshot only creates hard links to the current data files and therefore takes very little time. Once the snapshot is complete, the snapshot files are uploaded one by one; the upload is performed concurrently by the Backends.
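The hard-link trick the snapshot phase relies on can be demonstrated outside Doris: a hard link is just a second directory entry pointing at the same inode, so creating one copies no data. A small illustrative sketch (the file names are made up, not Doris paths):

```python
# Demonstrate why snapshotting via hard links is cheap: a hard link is a
# second name for the same inode, so no data is copied when it is created.
# (Doris can rely on this because its data segment files are written once
# and not modified in place.) File names here are purely illustrative.
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    data = os.path.join(d, "segment.dat")
    snap = os.path.join(d, "segment.dat.snapshot")
    with open(data, "w") as f:
        f.write("original data")
    os.link(data, snap)  # the "snapshot": instant, no data copied
    same_inode = os.stat(data).st_ino == os.stat(snap).st_ino
    print(same_inode)    # both names point at one inode
```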
2. Metadata preparation and upload
After the data file snapshots are uploaded, the Frontend first writes the corresponding metadata to a local file, then uploads this metadata file to the remote warehouse through the broker, which completes the backup job.
8.2. Restore principle
The recovery operation needs to specify the backup data that already exists in a remote warehouse, and then restore the contents of this backup to the local cluster. When the user submits a Restore request, the system will perform the following operations:
1. Create the corresponding metadata locally
This step first creates the corresponding structures, such as tables and partitions, in the local cluster. After creation, the tables are visible but not yet accessible.
2. Local snapshot
This step is to take a snapshot of the table created in the previous step. This is actually an empty snapshot (because the newly created table has no data). Its main purpose is to generate the corresponding snapshot directory on Backend, which is used to later receive the snapshot files downloaded from the remote warehouse.
3. Download snapshot
The snapshot file in the remote warehouse will be downloaded to the corresponding snapshot directory generated in the previous step. This step is completed concurrently by each Backend.
4. Make the snapshot effective
After the snapshots are downloaded, each snapshot must be mapped to the metadata of the corresponding local table. The snapshots are then reloaded to take effect, which completes the restore operation.
Key points
1. Backup and restore operations may currently only be performed by users with ADMIN privileges.
2. Within one database, only one backup or restore job may run at a time.
3. Both backup and restore operate at a minimum granularity of the partition. For tables with a large amount of data, it is recommended to operate partition by partition to reduce the cost of retries after a failure.
8.3. Backup example
1. Create a remote warehouse path
-- 1. Start HDFS
-- 2. Start the broker
CREATE REPOSITORY `hdfs_test_backup` -- name of the remote warehouse
WITH BROKER `broker_name`
ON LOCATION "hdfs://linux01:8020/tmp/doris_backup" -- storage path
PROPERTIES (
"username" = "root",
"password" = ""
);
2. Perform backup
BACKUP SNAPSHOT [db_name].{snapshot_name}
TO `repository_name`
ON ( -- which data in the table
`table_name` [PARTITION (`p1`, ...)],
...
)
PROPERTIES ("key"="value", ...);
3. View backup tasks
mysql> SHOW BACKUP from test \G;
*************************** 1. row ***************************
JobId: 13300
SnapshotName: event_info_log_snapshot
DbName: test
State: FINISHED
BackupObjs: [default_cluster:test.event_info_log]
CreateTime: 2022-11-27 21:29:56
SnapshotFinishedTime: 2022-11-27 21:30:00
UploadFinishedTime: 2022-11-27 21:30:06
FinishedTime: 2022-11-27 21:30:13
UnfinishedTasks:
Progress:
TaskErrMsg:
Status: [OK]
Timeout: 86400
1 row in set (0.02 sec)
4. View snapshots in the remote warehouse
SHOW SNAPSHOT ON `repo_name`
[WHERE SNAPSHOT = "snapshot" [AND TIMESTAMP = "backup_timestamp"]];
mysql> SHOW SNAPSHOT ON hdfs_test_backup;
+-------------------------+---------------------+--------+
| Snapshot | Timestamp | Status |
+-------------------------+---------------------+--------+
| event_info_log_snapshot | 2022-11-27-21-29-56 | OK |
+-------------------------+---------------------+--------+
5. Cancel backup
CANCEL BACKUP FROM test;
8.4. Recovery example
Restore data previously backed up with the BACKUP command into the specified database. This command is asynchronous; after it is submitted successfully, check the progress with the SHOW RESTORE command.
- Only OLAP tables can be restored.
- Multiple tables can be restored at once; the tables must match those in the corresponding backup.
Notes:
1. There can only be one executing BACKUP or RESTORE task under the same database.
2. The ON clause identifies the tables and partitions to restore. If no partition is specified, all partitions of the table are restored by default. The specified tables and partitions must already exist in the warehouse backup.
3. A backed-up table in the warehouse can be restored under a new name with the AS clause, but the new name must not already exist in the database. Partition names cannot be changed.
4. A backed-up table can be restored to replace an existing table with the same name in the database, provided the two table structures are completely consistent. Table structure includes the table name, columns, partitions, rollups, and so on.
5. You can restore only some partitions of a backed-up table; the system checks whether the partition Range or List matches.
6. PROPERTIES currently supports the following properties:
- "backup_timestamp" = "2018-05-04-16-45-08": which time version of the backup to restore; required. This value can be obtained with `SHOW SNAPSHOT ON repo_name;`.
- "replication_num" = "3": the number of replicas for the restored table or partition; the default is 3. When restoring into an existing table or partition, the replica count must equal that of the existing one, and there must be enough hosts to hold the replicas.
- "timeout" = "3600": task timeout in seconds; the default is one day.
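The backup_timestamp value follows a year-month-day-hour-minute-second pattern. A quick sketch parsing it (the format string is inferred from the examples in this document):

```python
# Parse the backup_timestamp format shown by SHOW SNAPSHOT and used by
# RESTORE, e.g. "2022-11-27-21-29-56". The "%Y-%m-%d-%H-%M-%S" format
# string is inferred from the examples in this document.
from datetime import datetime

def parse_backup_timestamp(ts):
    """Return a datetime for a backup_timestamp string."""
    return datetime.strptime(ts, "%Y-%m-%d-%H-%M-%S")

print(parse_backup_timestamp("2022-11-27-21-29-56"))
```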
Example: restore the table event_info_log from the snapshot event_info_log_snapshot in warehouse hdfs_test_backup into database test, using time version "2022-11-27-21-29-56":
-- Create the table:
create table event_info
(
user_id varchar(20),
event_id varchar(20),
event_action varchar(20),
event_time datetime
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 1;
-- the target database and the snapshot name
RESTORE SNAPSHOT test.event_info_log_snapshot
-- from which snapshot in which warehouse
FROM `hdfs_test_backup` -- warehouse name
ON ( `event_info_log` ) -- the table to restore
PROPERTIES
(
-- specify the time version
"backup_timestamp"='2022-11-27-21-29-56'
);
View recovery tasks
mysql> SHOW RESTORE from test \G;
*************************** 1. row ***************************
JobId: 13316
Label: event_info_log_snapshot
Timestamp: 2022-11-27-21-29-56
DbName: default_cluster:test
State: FINISHED
AllowLoad: false
ReplicationNum: 3
ReplicaAllocation: tag.location.default: 3
ReserveReplica: false
ReserveDynamicPartitionEnable: false
RestoreObjs: {
"name": "event_info_log_snapshot",
"database": "test",
"backup_time": 1669555796781,
"content": "ALL",
"olap_table_list": [
{
"name": "event_info_log",
"partition_names": [
"event_info_log"
]
}
],
"view_list": [],
"odbc_table_list": [],
"odbc_resource_list": []
}
CreateTime: 2022-11-27 21:55:27
MetaPreparedTime: 2022-11-27 21:55:28
SnapshotFinishedTime: 2022-11-27 21:55:31
DownloadFinishedTime: 2022-11-27 21:55:37
FinishedTime: 2022-11-27 21:55:43
UnfinishedTasks:
Progress:
TaskErrMsg:
Status: [OK]
Timeout: 86400
1 row in set (0.01 sec)
Cancel restore
CANCEL RESTORE FROM db_name;
8.5. Delete the remote warehouse
DROP REPOSITORY `repo_name`;
Deleting a warehouse only removes its mapping in Doris; the actual data in the warehouse is not deleted. After deletion, you can map to the same warehouse again by specifying the same broker and LOCATION.
Original text: https://mp.weixin.qq.com/s/C9P8Zoyw6MdTt9BNEcL0MA