Integrated Metadata Management Platform - OpenMetadata Getting Started Guide

Hello everyone, I am Dugufeng, a former port coal worker who now leads big data work at a state-owned enterprise and runs the official account "Big Data Flow". Over the last two years, driven by my company's needs and the direction big data is taking, I have been learning about data governance. Today I will share an integrated metadata management platform: OpenMetadata.

This document is compiled from the official website and my own hands-on practice. For follow-up documents, please follow the official account Big Data Flow, which will continue to post updates~

This article has four parts: open source metadata management platforms, an introduction to OpenMetadata, the installation process, and a function demonstration.

1. Open Source Metadata Management Platform

Metadata management is the starting point for an enterprise's data governance program, and new metadata management tools and platforms keep appearing.

There are many open source metadata management platforms. They are tools for collecting, storing, and managing metadata, and they provide a scalable way to organize and maintain metadata about data. Here are some common ones:

  1. Apache Atlas: Apache Atlas is an open source big data metadata management and data governance platform designed to help organizations collect, organize and manage metadata information of data. It provides rich metadata models and search functions, and can be integrated with various data storage and processing platforms.

  2. LinkedIn DataHub: LinkedIn DataHub is LinkedIn's open source metadata search and discovery platform. It provides a centralized metadata repository for managing and browsing metadata information of various types of datasets and data assets.

  3. Amundsen: Amundsen is Lyft's open source data discovery and metadata management platform. It provides a user-friendly interface that enables users to search, browse and contribute metadata information of datasets. Amundsen also supports integration with other data tools and platforms.

  4. Metacat: Metacat is Netflix's open source data discovery and metadata management platform. It provides a unified interface to find and browse metadata information of various datasets, and supports integration with other data tools and services.

These open source metadata management platforms provide various functions, such as metadata storage, search, browsing, data asset relationship management, data lineage tracking, etc., to help organizations better manage and utilize metadata information of data.

OpenMetadata, the platform introduced today, aims to provide a metadata management standard so that metadata can be managed better.

2. Introduction to OpenMetadata

OpenMetadata is an all-in-one platform for data discovery, data lineage, data quality, observability, governance, and team collaboration. It is one of the fastest growing open source projects, with a vibrant community and adoption by numerous companies across various industry verticals. OpenMetadata is powered by a centralized metadata store based on open metadata standards and APIs, supports connectors for various data services, and enables end-to-end metadata management, so you can unlock the value of your data assets.

At the time of writing, OpenMetadata has 2.5k stars on GitHub and has just released version 1.1.

For readers with network access problems, reply "OpenMetadata1.1" to the Big Data Flow official account to download the source code and installation package. The download is valid for one month.


OpenMetadata includes the following:

  • Metadata Schema - Defines the core abstraction and vocabulary of metadata using a schema of types, entities, and relationships between entities. This is the basis for open metadata standards. Extensibility for entities and types with custom properties is also supported.

  • Metadata Store - Stores a metadata graph that connects data assets, user and tool generated metadata.

  • Metadata API - APIs for producing and consuming metadata, built on the metadata schema, for the user interface and for integrating tools, systems, and services.

  • Ingestion Framework - Pluggable framework for integrating tools and ingesting metadata into the metadata store, with about 55 connectors. The ingestion framework supports well-known data warehouses such as Google BigQuery, Snowflake, Amazon Redshift, and Apache Hive; databases such as MySQL, Postgres, Oracle, and MSSQL; dashboard services such as Tableau, Superset, and Metabase; messaging services such as Kafka and Redpanda; and pipeline services such as Airflow, Glue, Fivetran, and Dagster.

  • OpenMetadata User Interface - A single place for users to discover and collaborate on all data.
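
To make the schema and store components above more concrete, here is a minimal Python sketch of a metadata graph: typed entities joined by named relationships. The class names, relationship labels, and sample assets are hypothetical illustrations, not OpenMetadata's actual schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A node in the metadata graph (hypothetical shape, not OpenMetadata's)."""
    entity_type: str  # e.g. "table", "dashboard", "user"
    name: str


class MetadataGraph:
    """Stores entities and directed, labelled relationships between them."""

    def __init__(self):
        # source entity -> list of (relation, target entity)
        self._edges = defaultdict(list)

    def relate(self, source, relation, target):
        self._edges[source].append((relation, target))

    def related(self, source, relation):
        """Return the targets reachable from source via one relation edge."""
        return [t for r, t in self._edges[source] if r == relation]


# Hypothetical sample assets.
orders = Entity("table", "shop.orders")
sales = Entity("dashboard", "Weekly Sales")
alice = Entity("user", "alice")

graph = MetadataGraph()
graph.relate(alice, "owns", orders)   # user-generated metadata
graph.relate(orders, "feeds", sales)  # tool-generated lineage

print(graph.related(orders, "feeds"))
```

Keeping relationships as labelled edges is what lets one store answer both ownership questions and lineage questions about the same asset.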


Core functions

  • Data Collaboration - Get event notifications through activity feeds. Use webhooks to send alerts and notifications. Add an announcement to notify the team of upcoming changes. Add a task to request a description or glossary term approval workflow. Add user mentions and collaborate using conversation threads.

  • Data Quality and Profiler - Standardized tests and data quality metadata. Group related tests into test suites. Supports custom SQL data quality tests. An interactive dashboard lets you drill down into the details.

  • Data Lineage - Supports rich column-level lineage. Efficiently filter queries to extract lineage. Manually edit lineages as needed and connect entities using the no-code editor.

  • Comprehensive roles and policies - handle complex access control use cases and hierarchical teams.

  • Connectors - Supports 55 connectors to various databases, dashboards, pipelines, and messaging services.

  • Glossary - Add a controlled vocabulary to describe important concepts and terms within your organization. Add glossaries, terms, tags, descriptions and reviewers.

  • Data Security - Supports Google, Okta, custom OIDC, Auth0, Azure, Amazon Cognito, and OneLogin as identity providers for SSO. Additionally, AWS SSO and Google SAML-based authentication are supported.
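
The custom SQL data quality testing mentioned in the list above can be pictured with a small standalone sketch. It assumes one common convention, namely that a test passes when its query returns no offending rows. The table, the column names, and the `run_sql_test` helper are hypothetical, and the sketch uses SQLite from the Python standard library rather than OpenMetadata itself.

```python
import sqlite3


def run_sql_test(conn, name, query):
    """Run a check query; the test passes when it returns no offending rows."""
    failures = conn.execute(query).fetchall()
    return {"test": name, "passed": not failures, "failed_rows": len(failures)}


# Hypothetical sample data standing in for a real warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 19.99), (2, -5.00), (3, 42.50)],
)

# A small test suite: each entry is (test name, query selecting bad rows).
suite = [
    ("amount_not_negative", "SELECT id FROM orders WHERE amount < 0"),
    ("id_not_null", "SELECT id FROM orders WHERE id IS NULL"),
]

results = [run_sql_test(conn, name, query) for name, query in suite]
for result in results:
    print(result)
```

Grouping the checks in a list mirrors the idea of a test suite: related tests run together and report one result each.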

3. Installation process

This section uses the Docker installation method, which takes only a few minutes.

First, check the Python version.

python3 --version

Python 3.7, 3.8, or 3.9 is required.

Check the Docker version.

docker --version

Docker 20.10.0 or later is required. Then check the Docker Compose version.

docker compose version

Docker Compose 2.1.1 or later is required.
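
The three checks above can also be scripted. The following is a minimal sketch, assuming the version commands print an x.y.z number somewhere in their output. The helper names and the sample output strings are illustrative only, and the minimum versions are the ones quoted in this article.

```python
import re
import sys

# Minimum versions quoted in this article.
MIN_DOCKER = (20, 10, 0)
MIN_COMPOSE = (2, 1, 1)
SUPPORTED_PYTHON = {(3, 7), (3, 8), (3, 9)}


def parse_version(text):
    """Extract the first x.y.z version number from a command's output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text, minimum):
    """True when the version found in text is at least the minimum."""
    return parse_version(text) >= minimum


def python_supported(version_info=sys.version_info):
    """True when the interpreter's major.minor is one of the listed versions."""
    return tuple(version_info[:2]) in SUPPORTED_PYTHON


# Sample strings in the general shape these commands print (illustrative).
print(meets_minimum("Docker version 20.10.17, build 100c701", MIN_DOCKER))
print(meets_minimum("Docker Compose version v2.0.1", MIN_COMPOSE))
print(python_supported((3, 9, 16)))
```

Comparing tuples of integers avoids the trap of comparing version strings, where "20.9.0" would sort after "20.10.0".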

Create a working directory and change into it.

mkdir openmetadata-docker && cd openmetadata-docker

Create a virtual environment.

python3 -m venv env

Activate the virtual environment.

source env/bin/activate

Upgrade pip and setuptools.

pip3 install --upgrade pip setuptools

Install the OpenMetadata ingestion package with its Docker extra.

pip3 install --upgrade "openmetadata-ingestion[docker]"

Verify that the installation succeeded.

metadata docker --help
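
If the command above is not found, the usual cause is that the virtual environment is not active in the current shell. The sketch below uses only the Python standard library to check both conditions; the helper names are hypothetical, and `metadata` is the command name used in this article.

```python
import shutil
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv-style environment."""
    return sys.prefix != sys.base_prefix


def find_cli(command="metadata"):
    """Return the full path of a command on PATH, or None if it is missing."""
    return shutil.which(command)


print("virtual environment active:", in_virtualenv())
path = find_cli()
if path is None:
    print("metadata CLI not found; is the virtual environment activated?")
else:
    print("metadata CLI found at", path)
```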

Start the containers.

metadata docker --start

Alternatively, start with PostgreSQL as the backend database.

metadata docker --start -db postgres

Once the containers are up, open the UI in a browser.

http://localhost:8585

Success!
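
The containers can take a while to become ready after the start command returns, so the page may not load immediately. Here is a minimal polling sketch using only the Python standard library. The function names, the retry count, and the delay are assumptions chosen for illustration, and the demonstration uses a stand-in probe so that it runs without a live server.

```python
import time
import urllib.error
import urllib.request


def http_ok(url):
    """Return True when the URL answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError):
        return False


def wait_for_ui(url="http://localhost:8585", attempts=30, delay=10.0, probe=http_ok):
    """Poll the URL until probe(url) succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Demonstration with a stand-in probe that succeeds on its third call.
calls = []


def fake_probe(url):
    calls.append(url)
    return len(calls) >= 3


print(wait_for_ui(delay=0.0, probe=fake_probe))
```

Against a running deployment, calling `wait_for_ui()` with the defaults polls http://localhost:8585 for roughly five minutes before giving up.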


4. Function demonstration

Home page

Multilingual support

Overview page

Data quality monitoring page

Data assets

Business glossary

Data source configuration

To be continued~

For more on big data, data governance, and artificial intelligence, please follow Big Data Flow.


Origin blog.csdn.net/xiangwang2206/article/details/131693173