Databend is a modern cloud data warehouse, designed for flexibility and efficiency to support your large-scale analytics needs. It is free and open source. Try the cloud service now: https://app.databend.cn
What's On In Databend
Explore what is new in Databend this week and discover a Databend that is closer to your needs.
Data import into table with extra columns
By default, COPY INTO imports data into the table by matching the order of the fields in the file with the corresponding columns in the table. The key is to ensure that the fields in the file and the columns in the table are properly aligned.
If the number of columns in the table is greater than the number of fields in the file, you can manually specify the columns to be imported to ensure alignment.
When importing a data file in CSV format, if the table has more columns than the file has fields and the extra columns are at the end of the table, you can use the FILE_FORMAT option ERROR_ON_COLUMN_COUNT_MISMATCH to import the data.
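The alignment rule described above can be modeled with a small Rust sketch. This is an illustration only, not Databend's implementation: `align_record`, its parameters, and the choice to fill trailing columns with per-column default values are assumptions made for this example. The `error_on_mismatch` flag loosely mirrors ERROR_ON_COLUMN_COUNT_MISMATCH.

```rust
/// Hypothetical model of aligning one CSV record with a table's columns.
/// `column_defaults` holds one (assumed) default value per table column.
/// `error_on_mismatch` loosely mirrors ERROR_ON_COLUMN_COUNT_MISMATCH.
pub fn align_record(
    fields: &[&str],
    column_defaults: &[&str],
    error_on_mismatch: bool,
) -> Result<Vec<String>, String> {
    let n_cols = column_defaults.len();

    // Strict mode: any difference in count is rejected.
    if error_on_mismatch && fields.len() != n_cols {
        return Err(format!(
            "column count mismatch: file has {} fields, table has {} columns",
            fields.len(),
            n_cols
        ));
    }

    // This sketch only covers the case from the text: the table has
    // extra columns at the end. Surplus fields are out of scope here.
    if fields.len() > n_cols {
        return Err("more fields than columns: not covered by this sketch".to_string());
    }

    // Fields are matched to columns by position...
    let mut row: Vec<String> = fields.iter().map(|f| f.to_string()).collect();
    // ...and the trailing columns fall back to their defaults (assumption).
    row.extend(column_defaults[fields.len()..].iter().map(|d| d.to_string()));
    Ok(row)
}
```

With three table columns and a two-field record, the strict call returns an error, while the relaxed call returns a three-value row whose last value is the third column's default.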
If you would like to learn more, check out the resources listed below.
Code Corner
Let’s explore code snippets or projects in Databend and the surrounding ecosystem.
Design a read strategy for ParquetReader
There are some problems with using the arrow-rs API directly. When we try to prefetch data for prewhere and topk pushdown, we cannot reuse previously deserialized data blocks, and supporting reuse logic on the existing implementation would be very complicated.
To improve the read logic for row groups and reuse the data prefetched in an earlier stage, we have extensively restructured the relevant logic and introduced read policies that decouple the different strategies.
NoPrefetchPolicy
There is no prefetch phase. The required data blocks are read, deserialized, and output directly.
PredicateAndTopkPolicy
During the prefetch phase, the columns required by prewhere and topk are prefetched. They are deserialized into a DataBlock and evaluated to produce a RowSelection. The DataBlock is then split by batch size and the results are stored in an in-memory VecDeque.
In the final stage, the remaining columns are read according to the RowSelection and output as DataBlocks in batches. These are then merged with the prefetched data and projected according to output_schema to get the result data block.
TopkOnlyPolicy
Similar to PredicateAndTopkPolicy, but only topk is considered during the prefetch phase.
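The split between a no-prefetch path and a prefetch-then-fetch path can be sketched in Rust as follows. This is a toy model under stated assumptions, not Databend's code: `RowGroup`, `ReadPolicy`, `NoPrefetch`, and `PredicatePrefetch` are hypothetical names, columns are plain `Vec<i64>` values rather than Parquet pages, and the row selection is a simple list of row indices. Topk handling is left out for brevity.

```rust
use std::collections::VecDeque;

/// Toy row group: column-major i64 data, all columns the same length.
pub struct RowGroup {
    pub columns: Vec<Vec<i64>>,
}

/// Hypothetical interface: each call yields the next batch of output rows.
pub trait ReadPolicy {
    fn read_block(&mut self) -> Option<Vec<Vec<i64>>>;
}

/// No prefetch phase: read the output columns batch by batch.
pub struct NoPrefetch<'a> {
    rg: &'a RowGroup,
    output: Vec<usize>,
    batch: usize,
    next_row: usize,
}

impl<'a> NoPrefetch<'a> {
    pub fn new(rg: &'a RowGroup, output: Vec<usize>, batch: usize) -> Self {
        Self { rg, output, batch: batch.max(1), next_row: 0 }
    }
}

impl ReadPolicy for NoPrefetch<'_> {
    fn read_block(&mut self) -> Option<Vec<Vec<i64>>> {
        let total = self.rg.columns.first().map_or(0, |c| c.len());
        if self.next_row >= total {
            return None;
        }
        let end = (self.next_row + self.batch).min(total);
        let rows: Vec<Vec<i64>> = (self.next_row..end)
            .map(|r| self.output.iter().map(|&c| self.rg.columns[c][r]).collect())
            .collect();
        self.next_row = end;
        Some(rows)
    }
}

/// Prefetch the predicate column, keep matching rows, fetch the rest later.
pub struct PredicatePrefetch<'a> {
    rg: &'a RowGroup,
    pred_col: usize,
    output: Vec<usize>,
    /// Prefetched batches of (row index, predicate column value) pairs.
    prefetched: VecDeque<Vec<(usize, i64)>>,
}

impl<'a> PredicatePrefetch<'a> {
    pub fn new(
        rg: &'a RowGroup,
        pred_col: usize,
        pred: impl Fn(i64) -> bool,
        output: Vec<usize>,
        batch: usize,
    ) -> Self {
        // Prefetch phase: read the predicate column and compute the row selection.
        let selected: Vec<(usize, i64)> = rg.columns[pred_col]
            .iter()
            .copied()
            .enumerate()
            .filter(|&(_, v)| pred(v))
            .collect();
        // Split by batch size and keep the pieces in memory.
        let prefetched: VecDeque<Vec<(usize, i64)>> =
            selected.chunks(batch.max(1)).map(|c| c.to_vec()).collect();
        Self { rg, pred_col, output, prefetched }
    }
}

impl ReadPolicy for PredicatePrefetch<'_> {
    fn read_block(&mut self) -> Option<Vec<Vec<i64>>> {
        let batch = self.prefetched.pop_front()?;
        // Final stage: read the remaining columns only for selected rows, reuse
        // the prefetched predicate value, and project in output order.
        let rows: Vec<Vec<i64>> = batch
            .iter()
            .map(|&(r, v)| {
                self.output
                    .iter()
                    .map(|&c| if c == self.pred_col { v } else { self.rg.columns[c][r] })
                    .collect()
            })
            .collect();
        Some(rows)
    }
}
```

Because both types implement the same trait, a caller can keep calling `read_block` until it returns `None` without knowing which policy is in use. A topk-only variant would follow the same shape as `PredicatePrefetch` with a different prefetch step.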
If you would like to learn more, check out the resources listed below.
Highlights
Here are some noteworthy items. Maybe you will find something of interest.
- Added spill-related information to the query log.
- Added support for using COPY INTO to export data into compressed files.
- Introduced the GET /v1/background/:tenant/background_tasks HTTP API to query background tasks.
- Read Example 4: Filtering Files with Pattern to learn how to use patterns to filter files.
What's Up Next
We are always open to cutting-edge technologies and innovative ideas, and welcome you to join the community and inject vitality into Databend.
Fix issues detected by SQLsmith
Since the introduction of SQLsmith testing last month, a total of about 40 problems have been detected. Databend Labs is working on fixing these issues to improve system stability in various scenarios.
We hope you will take part in this work too. Some of the issues are simple tasks involving type conversion and special value handling, and can be fixed by referring to earlier fixes.
If you are interested in this topic, you can try to solve some of the problems or participate in discussions and PR reviews. Alternatively, you can click on https://link.databend.rs/im-feeling-lucky to pick a random issue. Good luck!
New Contributors
Meet new people in the community. Databend is a better place because of you.
Changelog
Check out the changelog for Databend's daily builds to stay up to date on the latest developments.
Address: https://github.com/datafuselabs/databend/releases
Contributors
A big thank you to the contributors for their great work this week.
Connect With Us
Databend is an open source, flexible, low-cost data warehouse built on object storage that can also perform real-time analysis. We look forward to your interest and to exploring cloud native data warehouse solutions together to create a new generation of open source Data Cloud.