Apache Doris (incubating) 1.0 Release is officially released!

Dear community friends, after several months, we are happy to announce that Apache Doris (incubating) has ushered in the official release of the 1.0 Release version on April 18, 2022 ! This is the first 1-bit version of Apache Doris since it was incubated by the Apache Foundation, and it is also the version with the largest refactoring of the core code of Apache Doris so far ! There are 114 contributors who  have submitted over 660 optimizations and fixes for Apache Doris, thank you to everyone who makes Apache Doris even better!  

In version 1.0, we introduced important functions such as vectorized execution engine, Hive external table, Lateral View syntax and Table Function table function, Z-Order data index, Apache SeaTunnel plug-in, etc., and added support for synchronous update and deletion of data in Flink CDC. Support, optimize many problems in the process of data import and query, and comprehensively enhance the query performance, ease of use, stability and other special effects of Apache Doris. Welcome to download and use!

Special thanks

Every day that has not been published, there are countless contributors behind it , who dare not stop for half a minute. Here we would like to especially thank the small partners from the SIG (Special Interest Group) such as the vectorized execution engine, query optimizer, and visual operation and maintenance platform . Since the establishment of the Apache Doris Community SIG Group in August 2021, dozens of people from more than ten companies including Baidu, Meituan, Xiaomi, JD, Shuhai, ByteDance, Tencent, NetEase, Alibaba, PingCAP, Nebula Graph, etc. Contributors joined the SIG as the first members, and for the first time completed the development of such major functions as the vectorized execution engine, query optimizer, and Doris Manager visual monitoring operation and maintenance platform in the form of open source collaboration of special groups, which lasted more than half a year, Dozens of technical research and sharing, nearly 100 remote meetings, hundreds of commits submitted, and more than 100,000 lines of code involved . It is because of their contributions that version 1.0 came out. Let us once again express our most sincere thanks for their hard work!

At the same time, the number of Apache Doris contributors has exceeded 300 (click to review: Community News | Apache Doris welcomes its 300th contributor! ), the number of monthly active contributors has exceeded 60, and the average weekly average in recent weeks The number of commits submitted has also exceeded 80, and the scale and activity of developers gathered by the community have been greatly improved. We are very much looking forward to having more small partners participate in the community contribution, and work with us to build Apache Doris into the world's top analytical database. We also look forward to all our small partners to gain valuable growth with us.   If you want to participate in the community, please contact us via the developer email [email protected] .

Important update 

In the past, the SQL execution engine of Apache Doris was designed based on the row-based memory format and the traditional volcano model. There was unnecessary overhead in performing SQL operator and function operations, which led to the limited efficiency of the Apache Doris execution engine, which did not Adapt to the architecture of modern CPUs. The goal of the vectorized execution engine is to replace the current row-based SQL execution engine of Apache Doris, fully release the computing power of modern CPUs, break through the performance limitations on the SQL execution engine, and exert extreme performance.

Based on the characteristics of modern CPUs and the execution characteristics of the volcano model, the vectorized execution engine redesigned the SQL execution engine in the columnar storage system:

  • Reorganize the data structure of memory, replace Tuple with Column, improve Cache affinity, branch prediction and prefetch memory friendliness during calculation

  • The type judgment is performed in batches. In this batch, the type determined during the type judgment is used, and the virtual function cost of the type judgment of each row is allocated to the batch level.

  • Through batch-level type judgment, virtual function calls are eliminated, allowing the compiler to have the opportunity for function inlining and SIMD optimization

Thereby, the efficiency of the CPU during SQL execution is greatly improved, and the performance of the SQL query is improved.

In Apache Doris version 1.0, enabling the vectorized execution engine with   set batch_size = 4096   and  set enable_vectorized_engine = true  can significantly improve query performance in most cases. Under the SSB and OnTime standard test data sets, the overall performance of the two scenarios of multi-table association and wide-column query are improved by 3 times and 2.6 times respectively. 

   Lateral View Syntax  [Experimental ]

Through Lateral View syntax, we can use Table Function table functions such as explode_bitmap, explode_split, explode_jaon_array, etc., to expand bitmap, String or Json Array from one column into multiple rows, so that the expanded data can be further processed (such as Filter, Join, etc.) .

   Hive Appearance  [Experimental]

Hive External Table provides users with the ability to directly access Hive tables through Doris. External tables save the tedious data import work. Doris's own OLAP capabilities can be used to solve data analysis problems of Hive tables. The current version supports connecting Hive data sources to Doris, and supports federated queries through data in Doris and Hive data sources for more complex analysis operations.

   Support Z-Order data sorting format

Apache Doris data is sorted and stored according to the prefix column, so when the prefix query condition is included, fast data search can be performed on the sorted data, but if the query condition is not a prefix column, the data sorting feature cannot be used for fast data search. The above problems can be solved by Z-Order Indexing. In version 1.0, we added the Z-Order data sorting format, which can play a good filtering effect in the scenario of multi-column query of kanban type, and accelerate the filtering performance of non-prefix column conditions .

   Support for Apache SeaTunnel (incubating) plugin

Apache SeaTunnel is a high-performance distributed data integration framework built on Apache Spark and Apache Flink. In the 1.0 version of Apache Doris, we have added the SaeTunnel plug-in, users can use Apache SeaTunnel for synchronization and ETL between multiple data sources.

   new function

More bitmap functions are supported, see the function manual for details:

  • bitmap_max

  • bitmap_and_not

  • bitmap_and_not_count

  • bitmap_has_all

  • bitmap_and_count

  • bitmap_or_count

  • bitmap_xor_count

  • bitmap_subset_limit

  • sub_bitmap

Support national secret algorithm SM3/SM4;

Note : The functions marked  [Experimental] above  are experimental functions. We will continue to optimize and iterate on the above functions in subsequent versions, and further improve them in subsequent versions. If you have any questions or comments during use, please feel free to contact us.

 Important optimization 

   Function optimization

  • Reduce the number of segment files generated when importing in large batches to reduce the compaction pressure.

  • Data is transmitted through the attachment function of BRPC to reduce serialization and deserialization overhead in the query process.

  • Supports direct return of HLL/BITMAP type binary data for external analysis of services.

  • Optimize and reduce the probability of OVERCROWDED and NOT_CONNECTED errors in BRPC, and enhance system stability .

  • Improve the fault tolerance of imports.

  • Supports updating and deleting data synchronously through Flink CDC.

  • Support adaptive Runtime Filter.

  • Significantly reduces the memory footprint of insert into operations

 

   Usability improvements

  • Routine Load supports displaying status such as the number of current offset delays.

  • Added statistics on peak memory usage of queries in FE audit log.

  • Added missing version information to Compaction URL results to facilitate troubleshooting.

  • Supports marking BEs as non-queryable or non-importable to quickly screen problem nodes.

   Important bug fixes

  • Fixed several query errors.

  • Fixed some scheduling logic issues with Broker Load.

  • Fixed an issue where metadata could not be loaded due to the STREAM keyword.

  • Fixed Decommission not executing correctly.

  • Fix the problem that -102 error may occur when operating Schema Change operation in some cases.

  • Fix the problem that using String type may cause BE to crash in some cases.

   other

Add Minidump function;

   Supplementary Instructions

The Doris Manger visual monitoring operation and maintenance platform will also release version 1.0 in the near future. It has now entered the Release process. The release announcement will be released independently in the future. Please continue to pay attention.

Download and use

   Download and use

http://doris.apache.org/zh-CN/downloads/downloads.html

   Upgrade Instructions

You can directly upgrade from Apache Doris 0.15.0 or 0.15.x release version to 1.0 Release version, please refer to the documentation for the upgrade process:

http://doris.apache.org/zh-CN/installing/upgrade.html

   Changelog

For detailed Release Note, please check the link:

https://github.com/apache/incubator-doris/issues/8549

   Feedback

If you encounter any usage issues, please feel free to contact us via the GitHub Discussion forum or the Dev mailing group .

GitHub forum: https://github.com/apache/incubator-doris/discussions

Dev mailing group: [email protected]

 Thanks

The release of Apache Doris(incubating) 1.0 Release version is inseparable from the support of all community users. I would like to express my gratitude to all community contributors who participated in version design, development, testing, and discussion.

They are:

@924060929

@ adonis0147

@ Aiden-Dong

@aihai

@airborne12

@Alibaba-HZY

@amosbird

@arthuryangcs

@awakeljw

@bingzxy

@BiteTheDDDDt

@blackstar-baba

@caiconghui

@CalvinKirs

@cambyzju

@caoliang-web

@ccoffline

@chaplinthink

@ chovy-3012

@ChPi

@DarvenDuan

@dataalive

@dataroaring

@ dh-cloud

@dohongdayi

@dongweizhao

@drgnchan

@e0c9

@EmmyMiao87

@englefly

@eyesmoons

@freemandealer

@Gabriel39

@gaodayue

@GoGoWen

@Gongruixiao

@gwdgithubnom

@HappenLee

@Henry2SS

@ hf200012

@hyoung

@jacktengg

@jackwener

@JNSimba

@Keysluomo

@kezhenxu94

@killxdcj

@lihuigang

@littleeleventhwolf

@liutang123

@liuzhuang2017

@lonre

@lovingfeel

@luozenglin

@luzhijing

@MeiontheTop

@mh-boy

@morningman

@mrhhsg

@Myasuka

@nimuyuhan

@obobj

@pengxiangyu

@qidaye

@qzsee

@renzhimin7

@Royce33

@SleepyBear96

@smallhibiscus

@sodamnsure

@spaces-X

@sparklezzz

@stalary

@steadyBoy

@tarepanda1024

@THUMarkLau

@tianhui5

@tinkerrrr

@ucasfl

@Userwhite

@vinson0526

@wangbo

@wangshuo128

@wangyf0555

@weajun

@weizuo93

@whutpencil

@WindyGao

@ wunan1210

@xiaokang

@xiaokangguo

@xiedeyantu

@xinghuayu007

@xingtanzjr

@xinyiZzz

@xtr1993

@xu20160924

@xuliuzhe

@ xuzifu666

@xy720

@yangzhg

@yiguolei

@yinzhijian

@yjant

@zbtzbtzbt

@zenoyang

@zh0122

@TeamGirl333

@zhannngchen

@zhengshengjun

@zhengshiJ

@ZhikaiZuo

@ztgoto

@zuochunwei


Related Links:

Apache Doris official website:

http://doris.incubator.apache.org

Apache Doris Github:

https://github.com/apache/incubator-doris

Apache Doris developer mailing group:

[email protected] 

Guess you like

Origin www.oschina.net/news/191810/apache-doris-1-0-release