Databend Open Source Weekly Issue 113

Databend is a modern, free and open source cloud data warehouse, designed for flexibility and efficiency to support your large-scale analysis needs. Try the cloud service now: https://app.databend.cn .

What's On In Databend

Explore what's new in Databend this week and find the Databend that fits you best.

Importing data into a table with extra columns

By default, COPY INTO imports data into a table by matching the order of the fields in the file with the corresponding columns in the table. The key is to ensure that the data in the file and the table are properly aligned.

If the number of columns in the table is greater than the number of fields in the file, you can manually specify the columns to be imported to ensure alignment.

When importing a data file in CSV format, if the number of columns in the table is greater than the number of fields in the file, and the extra columns are at the end of the table, you can import the data using the FILE_FORMAT option ERROR_ON_COLUMN_COUNT_MISMATCH.
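As a rough illustration of the alignment rule described above, the following Rust sketch models positional matching between file fields and table columns. It is not Databend's implementation: `Column`, `align_row`, and the `error_on_column_count_mismatch` parameter (named after the option) are hypothetical, and the sketch assumes that trailing columns without a matching field take the column's default value when the mismatch check is turned off. Records with more fields than the table has columns are simply rejected here, since that case is not covered above.

```rust
/// Hypothetical model of a table column: its name and the value used
/// when the file provides no field for it.
#[derive(Debug, Clone)]
pub struct Column {
    pub name: &'static str,
    pub default: &'static str,
}

/// Align one CSV record with the table columns by position.
///
/// Assumption for this sketch: when `error_on_column_count_mismatch` is
/// false and the record is shorter than the table, the trailing columns
/// are filled with their defaults.
pub fn align_row(
    columns: &[Column],
    fields: &[&str],
    error_on_column_count_mismatch: bool,
) -> Result<Vec<String>, String> {
    // Extra fields are out of scope for this sketch, so always reject them.
    if fields.len() > columns.len() {
        return Err(format!(
            "record has {} fields but the table has only {} columns",
            fields.len(),
            columns.len()
        ));
    }
    // Fewer fields than columns is an error only when the check is enabled.
    if fields.len() < columns.len() && error_on_column_count_mismatch {
        return Err(format!(
            "expected {} fields, found {}; first unmatched column is '{}'",
            columns.len(),
            fields.len(),
            columns[fields.len()].name
        ));
    }
    // Match fields to columns by position; pad the tail with defaults.
    Ok(columns
        .iter()
        .enumerate()
        .map(|(i, col)| fields.get(i).copied().unwrap_or(col.default).to_string())
        .collect())
}
```

With a three-column table and a two-field record, the call succeeds and pads the last column when the flag is false, and returns an error when the flag is true.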

If you would like to learn more, check out the resources listed below.

Code Corner

Let’s explore code snippets or projects in Databend and the surrounding ecosystem.

Design a read strategy for ParquetReader

Using the arrow-rs API directly causes some problems. When we try to prefetch data for prewhere and topk pushdown, we cannot reuse previously deserialized data blocks, and supporting reuse logic on top of the existing implementation would be very complicated.

To improve the read logic for row groups and reuse the data prefetched in the previous stage, we have extensively restructured the relevant logic and introduced read policies to decouple it.

NoPrefetchPolicy

There is no prefetch phase. Databend directly reads, deserializes, and outputs the data blocks you need.

PredicateAndTopkPolicy

During the prefetch phase, Databend prefetches the columns required by prewhere and topk. They are deserialized into a DataBlock and evaluated to produce a RowSelection. The DataBlock is then split by batch size, and the resulting blocks are stored in an in-memory VecDeque.

In the final stage, the remaining columns are read for the rows specified by the RowSelection and output as DataBlocks in batches. These are then merged with the prefetched data and projected according to output_schema to get the result data blocks.

TopkOnlyPolicy

Similar to PredicateAndTopkPolicy, but only topk is considered during the prefetch phase.
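The two-stage flow of PredicateAndTopkPolicy can be sketched in a few lines of Rust. This is a simplified illustration and not Databend's code: `Block` stands in for DataBlock, a plain `Vec<usize>` of row indices stands in for RowSelection, and `prefetch` and `read_remaining` are hypothetical names. Only a predicate is evaluated (topk is omitted), columns are plain `i64` vectors, and the final projection by output_schema is left out.

```rust
use std::collections::VecDeque;

/// Stand-in for a DataBlock: a batch of rows stored column by column.
#[derive(Debug, Clone, PartialEq)]
pub struct Block {
    pub columns: Vec<Vec<i64>>,
}

/// Prefetch stage: evaluate the predicate on the prefetched column, record
/// which rows pass (a stand-in for RowSelection), then split the surviving
/// values by batch size and queue the resulting blocks in memory.
pub fn prefetch(
    predicate_col: &[i64],
    predicate: impl Fn(i64) -> bool,
    batch_size: usize,
) -> (Vec<usize>, VecDeque<Block>) {
    assert!(batch_size > 0, "batch_size must be positive");
    let selection: Vec<usize> = predicate_col
        .iter()
        .enumerate()
        .filter(|(_, v)| predicate(**v))
        .map(|(i, _)| i)
        .collect();
    let selected: Vec<i64> = selection.iter().map(|&i| predicate_col[i]).collect();
    let queue: VecDeque<Block> = selected
        .chunks(batch_size)
        .map(|chunk| Block { columns: vec![chunk.to_vec()] })
        .collect();
    (selection, queue)
}

/// Final stage: read only the selected rows of the remaining column, batch
/// by batch, and merge each batch with the matching prefetched block so the
/// prefetched column is not deserialized a second time.
pub fn read_remaining(
    remaining_col: &[i64],
    selection: &[usize],
    mut prefetched: VecDeque<Block>,
    batch_size: usize,
) -> Vec<Block> {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut output = Vec::new();
    for rows in selection.chunks(batch_size) {
        // Batches line up because both stages chunk the same selection.
        let mut block = prefetched
            .pop_front()
            .expect("one prefetched block per batch");
        block.columns.push(rows.iter().map(|&i| remaining_col[i]).collect());
        output.push(block);
    }
    output
}
```

In this sketch, a TopkOnlyPolicy would differ only in what the prefetch stage evaluates, and a NoPrefetchPolicy would skip `prefetch` entirely and read every column in one pass.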

If you would like to learn more, check out the resources listed below.

Highlights

Here are some noteworthy items. Maybe you can find something of interest.

  • Add Spill related information to Query log.
  • Support using COPY INTO to export data into compressed files.
  • Introduce the GET /v1/background/:tenant/background_tasks HTTP API to query background tasks.
  • Read Example 4: Filtering Files with Pattern to learn how to use patterns to filter files.

What's Up Next

We are always open to cutting-edge technologies and innovative ideas, and welcome you to join the community and inject vitality into Databend.

Fix issues detected by SQLsmith

Since the introduction of SQLsmith testing last month, a total of about 40 problems have been detected. Databend Labs is working on fixing these issues to improve system stability in various scenarios.

We hope you will take part in this work too. Some of the issues are simple tasks involving type conversion and special value handling, and can be solved by referring to earlier fixes.

Issues | Found by SQLsmith

If you are interested in this topic, you can try to solve some of the problems or participate in discussions and PR reviews. Alternatively, you can click on https://link.databend.rs/im-feeling-lucky to pick a random issue. Good luck!

New Contributors

Meet new people in the community. Databend is a better place because of you.

  • @zenus fixed an issue where COPY INTO did not detect a pattern mismatch during execution, #13010 .

Changelog

Check out the changelog for Databend's daily builds to stay up to date on the latest developments.

Address: https://github.com/datafuselabs/databend/releases

Contributors

A big thank you to the contributors for their great work this week.

Connect With Us

Databend is an open source, flexible, low-cost, next-generation data warehouse built on object storage that can also perform real-time analysis. We look forward to your interest and to exploring cloud native data warehouse solutions together to create a new generation of open source Data Cloud.


Origin my.oschina.net/u/5489811/blog/10116013