Databend Open Source Weekly Issue 107

Databend is a modern cloud data warehouse designed for flexibility and efficiency, built to support your large-scale analytics needs. It is free and open source. Try the cloud service right away: https://app.databend.cn.

What's On In Databend

Explore what's new in Databend this week and discover a Databend that better fits your needs.

Understanding Connection Parameters

Connection parameters are the authentication and configuration details required to establish a connection to an external storage service supported by Databend, such as Amazon S3. They are enclosed in parentheses and consist of key-value pairs separated by commas or spaces. They are used when creating a stage, running COPY INTO, and querying external files.

The following SQL statement shows how to use connection parameters to create a Stage with S3 as the underlying storage.

CREATE STAGE my_s3_stage
URL = 's3://load/files/'
CONNECTION = (
    ACCESS_KEY_ID = '<your-access-key-id>',
    SECRET_ACCESS_KEY = '<your-secret-access-key>'
);
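
The same CONNECTION clause can also be attached directly to a COPY INTO statement. The following is a minimal sketch rather than a definitive recipe: it assumes a hypothetical target table my_table whose schema matches Parquet files under the same bucket path as above, and the credentials are placeholders.

```sql
-- Hypothetical example: my_table and the file pattern are assumptions,
-- not names taken from the Databend release notes.
COPY INTO my_table
FROM 's3://load/files/'
CONNECTION = (
    ACCESS_KEY_ID = '<your-access-key-id>',
    SECRET_ACCESS_KEY = '<your-secret-access-key>'
)
PATTERN = '.*[.]parquet'
FILE_FORMAT = (TYPE = PARQUET);
```

Creating a stage once and reusing it is usually preferable when the same location is loaded repeatedly, since the credentials then live in one place.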

If you would like to learn more, please review the resources listed below.

Hive Catalog supports configuring storage parameters

Over the past week, Databend introduced storage parameter options for Hive Catalog, so a catalog can be configured with its own storage service instead of relying on the storage backend of the Default Catalog.

The following example shows how to create a Hive Catalog with MinIO as the underlying storage service:

CREATE CATALOG hive_ctl
TYPE = HIVE
CONNECTION = (
    ADDRESS = '127.0.0.1:9083'
    URL = 's3://warehouse/'
    AWS_KEY_ID = 'admin'
    AWS_SECRET_KEY = 'password'
    ENDPOINT_URL = 'http://localhost:9000/'
);

If you would like to learn more, please review the resources listed below.

Code Corner

Let's explore code snippets or projects in Databend and the surrounding ecosystem.

Speed up Git dependency downloads with gitoxide

gitoxide is a high-performance, modern Git implementation written in Rust. With cargo's unstable gitoxide feature, cargo uses the gitoxide crate instead of git2 to perform various Git operations, which can make downloading the crates-index and Git dependencies several times faster.

Databend recently enabled this feature in CI for cargo {build | clippy | test}. You can also try adding the -Zgitoxide option when developing locally to speed up the build process:

cargo -Zgitoxide=fetch,shallow-index,shallow-deps build

If you would like to learn more, please review the resources listed below.

Highlights

Here are some notable events that you might find interesting.

  • The VALUES clause can now be used on its own, without being combined with SELECT.
  • Support for modifying default values when altering columns.
  • Virtual column support for tables in Parquet format.
  • Support for automatic reclustering of tables after write operations (COPY INTO and REPLACE INTO).

What's Up Next

We are always open to cutting-edge technologies and innovative ideas, and you are welcome to join the community and breathe life into Databend.

Enhance infer_schema to support file paths

Currently, Databend supports querying files both by file path and from a stage, for example:

select * from 'fs:///home/...';
select * from 's3://bucket/...';
select * from @stage;

However, infer_schema currently supports only files located in a stage:

select * from infer_schema(location=>'@stage/...');

Running inference on files at any other path fails with an error:

select * from infer_schema(location =>'fs:///home/...'); -- this will panic.

We would like to unify the behavior of the infer_schema function so that it can infer the schema of files in any location, making it more useful.

Issue #12458 | Feature: infer_schema support normal file path

If you are interested in this topic, you can try to resolve some of the issues or take part in discussions and PR reviews. Alternatively, you can click https://link.databend.rs/im-feeling-lucky to pick a random issue. Good luck!

Changelog

Head over to the changelog for Databend's daily builds to stay up to date on developments.

Address: https://github.com/datafuselabs/databend/releases

Contributors

Many thanks to the contributors for their excellent work this week.

Connect With Us

Databend is an open source, flexible, low-cost data warehouse built on object storage that can also perform real-time analysis. We look forward to hearing from you. Let's explore cloud-native data warehouse solutions together and build a new generation of open source Data Cloud.


Origin my.oschina.net/u/5489811/blog/10100917