Revealed! Why does the design of the Apache SeaTunnel Catalog make it so much simpler for users to get started?

A Catalog provides metadata about databases and tables, together with the information required to access the data. It also offers a unified API to manage metadata, verify connections, and make metadata available to Sources (data sources), Sinks (data sinks), and the Web interface.

Catalog lets users reference metadata that already exists in their data systems and automatically maps it to the corresponding SeaTunnel metadata. In short, Catalog greatly simplifies the steps needed to start using SeaTunnel with existing systems and significantly improves the user experience.

The Importance of Catalog Features

Many existing features are already built on Catalog. For example, CDC (change data capture) multi-table synchronization uses Catalog to obtain the list of tables and their fields.

Apache SeaTunnel is also designing a feature called SaveMode. It is implemented by each connector and defines how existing table structures and data in the target table are handled. This feature is built on Catalog as well.

How is the Catalog designed, and how do you implement a new one? The following sections explain in detail.

Catalog API

Initialization

Note: The catalog name (catalogName) is not currently used; it is expected to be passed to the web backend for saving and querying.

Java
public interface CatalogFactory extends Factory {
    String factoryIdentifier();

    OptionRule optionRule();

    Catalog createCatalog(String catalogName, ReadonlyConfig options);
}

public interface Catalog extends AutoCloseable {
    void open() throws CatalogException;

    void close() throws CatalogException;
}
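The lifecycle implied by these two interfaces can be sketched as follows: a factory creates the catalog, the caller opens it, and try-with-resources closes it. This is only an illustration under stated assumptions. The interfaces below are simplified stand-ins declared locally (options are a plain Map instead of ReadonlyConfig, and no checked exceptions are declared), and InMemoryCatalog and InMemoryCatalogFactory are hypothetical names that do not exist in SeaTunnel.

```java
import java.util.Map;

public class CatalogLifecycleSketch {

    /** Simplified stand-in for the Catalog interface; lifecycle methods only. */
    public interface Catalog extends AutoCloseable {
        void open();

        @Override
        void close();
    }

    /** Simplified stand-in for CatalogFactory; options are a plain Map (assumption). */
    public interface CatalogFactory {
        String factoryIdentifier();

        Catalog createCatalog(String catalogName, Map<String, String> options);
    }

    /** Hypothetical catalog that only records whether it is currently open. */
    public static class InMemoryCatalog implements Catalog {
        public final String name;
        public boolean opened;

        public InMemoryCatalog(String name) {
            this.name = name;
        }

        @Override
        public void open() {
            opened = true; // a real catalog would connect to the data system here
        }

        @Override
        public void close() {
            opened = false; // a real catalog would release its connection here
        }
    }

    /** Hypothetical factory that ignores the options and returns an in-memory catalog. */
    public static class InMemoryCatalogFactory implements CatalogFactory {
        @Override
        public String factoryIdentifier() {
            return "InMemory";
        }

        @Override
        public Catalog createCatalog(String catalogName, Map<String, String> options) {
            return new InMemoryCatalog(catalogName);
        }
    }

    public static void main(String[] args) {
        CatalogFactory factory = new InMemoryCatalogFactory();
        InMemoryCatalog catalog =
                (InMemoryCatalog) factory.createCatalog("demo", Map.of("username", "test"));
        // Because Catalog extends AutoCloseable, try-with-resources calls close() for us.
        try (Catalog c = catalog) {
            c.open();
            System.out.println("open inside block: " + catalog.opened);
        }
        System.out.println("open after block: " + catalog.opened);
    }
}
```

Extending AutoCloseable is what allows a catalog to be used in try-with-resources, so the connection is released even when a metadata call fails.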

Database operations

java
public interface Catalog extends AutoCloseable {
    // --------------------------------------------------------------------------------------------
    // Database
    // --------------------------------------------------------------------------------------------

    String getDefaultDatabase() throws CatalogException;

    boolean databaseExists(String databaseName) throws CatalogException;

    List<String> listDatabases() throws CatalogException;

    void createDatabase(String databaseName, boolean ignoreIfExists)
            throws DatabaseAlreadyExistException, CatalogException;

    void dropDatabase(String databaseName, boolean ignoreIfNotExists)
            throws DatabaseNotExistException, CatalogException;
}
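To make the ignoreIfExists and ignoreIfNotExists flags concrete, here is a minimal in-memory sketch of the database operations. The flag semantics are inferred from the parameter names and the declared exceptions rather than taken from SeaTunnel's source, the class DatabaseOpsSketch is hypothetical, and the two exception classes are local unchecked stand-ins for the ones named in the interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class DatabaseOpsSketch {

    /** Local stand-in for DatabaseAlreadyExistException (unchecked for brevity). */
    public static class DatabaseAlreadyExistException extends RuntimeException {
        public DatabaseAlreadyExistException(String databaseName) {
            super("Database already exists: " + databaseName);
        }
    }

    /** Local stand-in for DatabaseNotExistException (unchecked for brevity). */
    public static class DatabaseNotExistException extends RuntimeException {
        public DatabaseNotExistException(String databaseName) {
            super("Database does not exist: " + databaseName);
        }
    }

    // Sorted set so that listDatabases() returns names in a stable order.
    private final Set<String> databases = new TreeSet<>();

    public boolean databaseExists(String databaseName) {
        return databases.contains(databaseName);
    }

    public List<String> listDatabases() {
        return new ArrayList<>(databases);
    }

    public void createDatabase(String databaseName, boolean ignoreIfExists) {
        // Set.add returns false when the name was already present.
        if (!databases.add(databaseName) && !ignoreIfExists) {
            throw new DatabaseAlreadyExistException(databaseName);
        }
    }

    public void dropDatabase(String databaseName, boolean ignoreIfNotExists) {
        // Set.remove returns false when the name was not present.
        if (!databases.remove(databaseName) && !ignoreIfNotExists) {
            throw new DatabaseNotExistException(databaseName);
        }
    }

    public static void main(String[] args) {
        DatabaseOpsSketch catalog = new DatabaseOpsSketch();
        catalog.createDatabase("db", false);
        catalog.createDatabase("db", true);     // duplicate is silently ignored
        catalog.dropDatabase("missing", true);  // missing database is silently ignored
        System.out.println(catalog.listDatabases());
    }
}
```

With the flags set to true, both calls become idempotent, which is convenient for jobs that may be restarted.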

Table operations

java
public interface Catalog extends AutoCloseable {
    // --------------------------------------------------------------------------------------------
    // Table
    // --------------------------------------------------------------------------------------------

    List<String> listTables(String databaseName)
            throws CatalogException, DatabaseNotExistException;

    boolean tableExists(TablePath tablePath) throws CatalogException;

    CatalogTable getTable(TablePath tablePath)
            throws CatalogException, TableNotExistException;

    void createTable(TablePath tablePath, CatalogTable table, boolean ignoreIfExists)
            throws TableAlreadyExistException, DatabaseNotExistException, CatalogException;

    void dropTable(TablePath tablePath, boolean ignoreIfNotExists)
            throws TableNotExistException, CatalogException;
}
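The table operations can be sketched the same way. In this hypothetical in-memory version, the nested TablePath is a simplified local stand-in (a database name plus a table name), a list of column names stands in for CatalogTable, and standard JDK exceptions replace the SeaTunnel exception types. None of this is SeaTunnel's actual implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TableOpsSketch {

    /** Simplified stand-in for TablePath: a database name plus a table name. */
    public static final class TablePath {
        public final String database;
        public final String table;

        public TablePath(String database, String table) {
            this.database = database;
            this.table = table;
        }

        public String fullName() {
            return database + "." + table;
        }
    }

    // database -> (table -> column names); the column list stands in for CatalogTable.
    private final Map<String, Map<String, List<String>>> store = new LinkedHashMap<>();

    public void createDatabase(String databaseName) {
        store.putIfAbsent(databaseName, new LinkedHashMap<>());
    }

    public List<String> listTables(String databaseName) {
        return new ArrayList<>(tablesOf(databaseName).keySet());
    }

    public boolean tableExists(TablePath path) {
        Map<String, List<String>> tables = store.get(path.database);
        return tables != null && tables.containsKey(path.table);
    }

    public List<String> getTable(TablePath path) {
        if (!tableExists(path)) {
            throw new IllegalStateException("Table does not exist: " + path.fullName());
        }
        return store.get(path.database).get(path.table);
    }

    public void createTable(TablePath path, List<String> columns, boolean ignoreIfExists) {
        Map<String, List<String>> tables = tablesOf(path.database);
        if (tables.containsKey(path.table)) {
            if (ignoreIfExists) {
                return; // keep the existing definition untouched
            }
            throw new IllegalStateException("Table already exists: " + path.fullName());
        }
        tables.put(path.table, List.copyOf(columns));
    }

    public void dropTable(TablePath path, boolean ignoreIfNotExists) {
        if (!tableExists(path)) {
            if (ignoreIfNotExists) {
                return;
            }
            throw new IllegalStateException("Table does not exist: " + path.fullName());
        }
        store.get(path.database).remove(path.table);
    }

    // Looks up a database and fails if it is missing (stand-in for DatabaseNotExistException).
    private Map<String, List<String>> tablesOf(String databaseName) {
        Map<String, List<String>> tables = store.get(databaseName);
        if (tables == null) {
            throw new IllegalArgumentException("Database does not exist: " + databaseName);
        }
        return tables;
    }

    public static void main(String[] args) {
        TableOpsSketch catalog = new TableOpsSketch();
        catalog.createDatabase("db");
        TablePath orders = new TablePath("db", "orders");
        catalog.createTable(orders, List.of("id", "name"), false);
        System.out.println(catalog.listTables("db") + " -> " + catalog.getTable(orders));
    }
}
```

A consumer such as multi-table synchronization would call listTables and getTable in this fashion to discover tables and their field lists.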

Here is an implemented example.

MySQL Catalog

How to use MySQL Catalog:

  • username [String] The username to use when connecting to the database server.
  • password [String] The password to use when connecting to the database server.
  • base-url [String] The URL must contain the database, e.g. "jdbc:mysql://localhost:5432/db" or "jdbc:mysql://localhost:5432/db?useSSL=true".
  • table-names [List] List of database table names to capture. The table name needs to include the database name, for example: database_name.table_name.
  • database-pattern [String] Regular expression of database names to capture.
  • table-pattern [String] Regular expression of database table names to capture. The table name needs to include the database name, for example: database_.*\.table_.*
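As an illustration of how a table-pattern applies to full names of the form database_name.table_name, the sketch below filters a list of names with java.util.regex. The helper class and method are hypothetical, and requiring the whole name to match (rather than a substring) is an assumption, not confirmed SeaTunnel behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class TablePatternSketch {

    /** Returns the full names ("database.table") that match the entire pattern. */
    public static List<String> select(List<String> fullNames, String tablePattern) {
        Pattern pattern = Pattern.compile(tablePattern);
        List<String> selected = new ArrayList<>();
        for (String name : fullNames) {
            // matches() requires the whole name to match, not just a substring.
            if (pattern.matcher(name).matches()) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> tables = List.of(
                "seatunnel-test.orders_1",
                "seatunnel-test.orders_22",
                "seatunnel-test.customers",
                "other.orders_1");
        // The dot is escaped so it matches only the separator between database and table.
        System.out.println(select(tables, "seatunnel-test\\.orders_\\d+"));
        // prints [seatunnel-test.orders_1, seatunnel-test.orders_22]
    }
}
```

Escaping the dot matters: an unescaped "." would match any character in that position, not just the separator.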

Configuration file

conf
[source/sink] {
  [connector-factory-id] {
    catalog {
      factory = "MySQL"
      username = "test"
      password = "123456"
      base-url = "jdbc:mysql://localhost:5432/db"
      table-names = [ "db.table" ]
    }
  }
}

How to use Catalog

Connectors that support Catalog expose a catalog option that configures which Catalog to use:

Example

conf
env {
  "job.mode"=STREAMING
  "job.name"="cdc_mysql_to_mysql"
  "checkpoint.interval"="2000"
  "custom_parameters"=""
}

source {
  MySQL-CDC {
    parallelism = 1
    catalog {
      factory = "MySQL"
      # By default, the Catalog uses the connector options that have the same names
    }
    username = "mysqluser"
    password = "mysqlpw"
    database-names = ["seatunnel-test"]
    table-pattern = "seatunnel-test\\.orders_\\d+"
    base-url = "jdbc:mysql://localhost:54508/seatunnel-test"
  }
}

sink {
  jdbc {
    url = "jdbc:mysql://localhost:4000/test"
    driver = "com.mysql.cj.jdbc.Driver"
    catalog {
      factory = "MySQL"
      username = "root"
      password = ""
      base-url = "jdbc:mysql://localhost:4000/test"
      table-pattern = "seatunnel-test2\\.orders_\\d+"
    }
    user = "root"
    password = ""
    query = "insert into sink(age, name) values(?,?)"
  }
}
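The comment inside the source's catalog block says that, by default, the Catalog uses the connector options that have the same names. One way to picture that fallback is sketched below: connector options act as defaults, and anything set explicitly in the catalog block overrides them. This is a guess at the idea, not SeaTunnel's actual resolution logic, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CatalogOptionFallbackSketch {

    /**
     * Builds the effective catalog options: start from the connector's options,
     * then let keys set in the catalog block override them (assumed behavior).
     */
    public static Map<String, String> resolve(
            Map<String, String> connectorOptions, Map<String, String> catalogOptions) {
        Map<String, String> effective = new LinkedHashMap<>(connectorOptions);
        effective.putAll(catalogOptions);
        return effective;
    }

    public static void main(String[] args) {
        // Source side: the catalog block only names the factory, so the rest falls back.
        Map<String, String> source = resolve(
                Map.of("username", "mysqluser",
                        "base-url", "jdbc:mysql://localhost:54508/seatunnel-test"),
                Map.of("factory", "MySQL"));
        System.out.println(source.get("factory") + " as " + source.get("username"));

        // Sink side: the catalog block sets its own username, which takes precedence.
        Map<String, String> sink = resolve(
                Map.of("user", "root"),
                Map.of("factory", "MySQL", "username", "root"));
        System.out.println(sink.get("factory") + " as " + sink.get("username"));
    }
}
```

This would explain why the source's catalog block in the example needs only factory, while the JDBC sink, whose own options use different names (url, user), spells out username and base-url inside the catalog block.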

Future plans

So far, Catalog has been implemented for only some connectors. We plan to extend it to more connectors so that they, too, can support features such as SaveMode and automatic table creation.

Apache SeaTunnel is a distributed, high-performance, scalable data integration platform for synchronizing and transforming massive volumes of data, both offline and in real time.

We sincerely welcome more people to join us!

We believe that, guided by The Apache Way and its principles such as "Community Over Code", "Open and Cooperation" (open collaboration), "Meritocracy", and "diversity and consensus decision-making", we will build a more diverse and inclusive community and together drive the technological progress that the spirit of open source makes possible!

We sincerely invite all partners who are interested in making local open source global to join the family of SeaTunnel contributors and build open source together!

Origin blog.csdn.net/weixin_54625990/article/details/131251681