Data Governance Content

Source: the content of this article comes from the Bilibili uploader Yuxing, https://space.bilibili.com/405479587

1. Model:
Because the business expanded rapidly in its early stage, metadata was not well controlled, which left a large number of non-compliant models by the time the business matured.
Solution: data standards and metadata supplementation
Construction control:
Model review for large requirements, plus regular scans for: irregular cross-layer references, models that depend directly on the ods layer across layers, empty tables, tables that are no longer updated, etc.
Take chimney models offline promptly: switch over or retire chimney tables in time to improve the reuse rate of core data models
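
The regular scan for irregular layer references can be sketched roughly as follows. This is only a minimal illustration, assuming that the layer can be read from a table-name prefix (ods, dwd, dm, the layers named in this article) and that lineage is available as (upstream, downstream) pairs. The function names and the two rule labels are hypothetical, not part of any platform.

```python
# Assumed layer order, inferred from the table-name prefix (hypothetical naming convention).
LAYER_ORDER = {"ods": 0, "dwd": 1, "dm": 2}


def layer_of(table):
    """Return the layer prefix of a table name such as 'dwd_order_detail'."""
    return table.split("_", 1)[0]


def find_irregular_references(edges):
    """Flag lineage edges (upstream, downstream) that break the layering rules.

    'reverse'   : the downstream table sits in a lower layer than its upstream.
    'cross_ods' : a table above dwd reads directly from the ods layer.
    """
    issues = []
    for upstream, downstream in edges:
        up = LAYER_ORDER[layer_of(upstream)]
        down = LAYER_ORDER[layer_of(downstream)]
        if down < up:
            issues.append((upstream, downstream, "reverse"))
        elif up == 0 and down > 1:
            issues.append((upstream, downstream, "cross_ods"))
    return issues


# Example lineage with made-up table names.
sample_edges = [
    ("ods_order", "dwd_order"),   # normal: ods feeds dwd
    ("ods_order", "dm_sales"),    # dm skips dwd and reads ods directly
    ("dm_sales", "dwd_order"),    # lower layer depends on a higher layer
]
```

Calling `find_irregular_references(sample_edges)` returns the two offending edges together with the rule each one breaks, which can then feed the model review.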

2. Resources
Storage:
Over the course of business development, many useless tables have piled up waiting to be taken offline, along with tables whose lifecycle is set too long, and none of this has been cleaned up. Sort out the models that have gone unused or unreferenced for a long time, tables whose lifecycle does not conform to the current standard, non-partitioned tables, empty tables, and problems with file counts and file formats (pulled out through the data lineage model or the platform)

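	Solution:
	   Set a reasonable table lifecycle
	   Take tables that have gone unreferenced or unused for a long time offline
	   Optimize compression and storage formats: ods uses zlib compression, dwd uses parquet+snappy, and dm is planned to move from parquet+snappy to parquet+zstd
	   Regular scans: empty tables, table formats, tables without a lifecycle, tables without partitions
	   Switch data formats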
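
A regular storage scan of this kind might look like the sketch below. It assumes table metadata has already been exported from the metadata platform into simple records. The `TableMeta` fields and the thresholds (90 days unused, 365-day lifecycle cap, parquet as the expected format) are illustrative assumptions, not values from the source.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class TableMeta:
    name: str
    row_count: int
    partitioned: bool
    lifecycle_days: Optional[int]  # None means no lifecycle has been set
    last_access: date              # last read or downstream reference
    file_format: str


def scan_table(t: TableMeta, today: date, unused_days: int = 90,
               max_lifecycle_days: int = 365,
               expected_format: str = "parquet") -> List[str]:
    """Return the governance findings for one table (empty list means healthy)."""
    findings = []
    if t.row_count == 0:
        findings.append("empty_table")
    if not t.partitioned:
        findings.append("not_partitioned")
    if t.lifecycle_days is None:
        findings.append("no_lifecycle")
    elif t.lifecycle_days > max_lifecycle_days:
        findings.append("lifecycle_too_long")
    if (today - t.last_access).days > unused_days:
        findings.append("long_unused")
    if t.file_format != expected_format:
        findings.append("unexpected_format")
    return findings
```

Tables that come back with `long_unused` or `empty_table` are candidates for going offline, while `no_lifecycle` and `unexpected_format` point to rectification work rather than removal.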
	   
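 Compute:
		Sort out tasks with data skew, high resource consumption, excessively long run times, empty runs, and so on (pulled from the meta model or the platform)
		1. Based on the storage cleanup above, take the corresponding compute tasks offline
		2. Find the root cause of tasks that run too long or consume too many resources
		3. Where poorly planned scheduling times cause high resource consumption in the early-morning window, move task schedules earlier or later so that resources are allocated and used sensibly
		4. Tasks with low data value, chimney-style development, or ineffective monitoring items should be taken offline promptly, or their fields migrated to core tables
		5. Plan the core tasks and assign execution priorities so that non-core tasks run later
		6.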
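
The triage of compute tasks described above could be sketched like this. It is a rough illustration under stated assumptions: each job is a plain dict with hypothetical keys (`output_rows`, `runtime_min`, `subtask_seconds`, `core`, `priority`), skew is approximated by the ratio of the slowest parallel sub-task to the median one, and the thresholds are made up.

```python
from statistics import median


def skew_ratio(subtask_seconds):
    """Slowest sub-task duration divided by the median; large values suggest data skew."""
    if not subtask_seconds:
        return 1.0
    mid = median(subtask_seconds)
    if mid == 0:
        return 1.0
    return max(subtask_seconds) / mid


def triage_job(job, max_runtime_min=120, skew_threshold=5.0):
    """Return governance flags for one scheduled job."""
    flags = []
    if job["output_rows"] == 0:
        flags.append("empty_run")
    if job["runtime_min"] > max_runtime_min:
        flags.append("long_running")
    if skew_ratio(job["subtask_seconds"]) >= skew_threshold:
        flags.append("data_skew")
    return flags


def run_order(jobs):
    """Core jobs first, then by ascending priority number, so non-core jobs run later."""
    return sorted(jobs, key=lambda j: (not j["core"], j["priority"]))
```

The flags only point at candidates. Finding the actual cause of a long or skewed run still needs a look at the job itself.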
		
Small-file governance: Spark 3, regular scans, merging
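
The scan-then-merge step can be approximated with a small planner like the one below. It assumes the file sizes of a partition (in MB) have already been listed from the file system. The 32 MB "small" cutoff and the 128 MB target size are illustrative defaults, not values from the source, and the merge itself (for example rewriting the partition with fewer output files) is left out.

```python
import math


def compaction_plan(file_sizes_mb, small_mb=32, target_mb=128):
    """Decide whether a partition needs merging and how many output files to aim for."""
    small = [s for s in file_sizes_mb if s < small_mb]
    total = sum(file_sizes_mb)
    # At least one output file, otherwise enough files to hold the data at the target size.
    target_files = max(1, math.ceil(total / target_mb))
    return {
        "small_files": len(small),
        "total_files": len(file_sizes_mb),
        "target_files": target_files,
        "needs_merge": len(small) > 0 and len(file_sizes_mb) > target_files,
    }
```

A partition made of 200 files of 1 MB each would be planned down to 2 files, while a partition that already holds two 128 MB files is left alone.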

How to evaluate the quality of a data warehouse
From a technical perspective, a data warehouse should meet requirements on cost, quality, and efficiency and provide security capabilities. From a business perspective, it should support business development and cover as many business scenarios as possible, so that data can be obtained in a timely manner and meets the business's data needs

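1. Data quality
   Evaluation criteria: accuracy, timeliness, consistency, process completeness
   Process:
		Before the fact: prevention through monitoring. After the fact: review incidents and improve DQC rules and alerts
2. Model construction
	Evaluation criteria: degree of standardization, metadata completeness, reuse, stability, extensibility, reasonableness
3. Data security
     Evaluation criteria: whether roles and permissions are separated, permission control, whether tables are classified by sensitivity level, whether externally shared data is masked
4. Cost and performance
      Evaluation criteria: whether unused tables and tasks are taken offline promptly, whether table lifecycles are reasonable, number of data-skewed tasks, number of tasks with excessively long run times, empty-run tasks, tables with too many small files, cost management
5. User experience of data consumption
6. Data asset coverage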
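
For the data quality item, a DQC rule of the monitoring kind could look something like the sketch below. It is a minimal example, assuming daily row counts and a null count for a key column are already collected. The function name, alert labels, and thresholds (1% null rate, 30% day-over-day change) are hypothetical.

```python
def check_partition(rows_today, rows_yesterday, null_count,
                    max_null_rate=0.01, max_change=0.3):
    """Return a list of alert labels for one daily partition (empty list means pass)."""
    if rows_today == 0:
        # Timeliness / completeness: nothing has landed for today.
        return ["no_data"]
    alerts = []
    # Accuracy: too many nulls in the checked column.
    if null_count / rows_today > max_null_rate:
        alerts.append("null_rate_high")
    # Consistency: row count moved too much against the previous day.
    if rows_yesterday > 0:
        change = abs(rows_today - rows_yesterday) / rows_yesterday
        if change > max_change:
            alerts.append("row_count_fluctuation")
    return alerts
```

Rules like this cover the preventive side. After an incident, the review step would add or tighten rules and thresholds based on what was missed.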

Origin blog.csdn.net/weixin_43015677/article/details/132211164