Amoro Trial & Contribution Activities | October Community Selection Announced


Amoro is a lakehouse management system built on open data lake table formats such as Apache Iceberg. It provides a set of pluggable data self-optimizing mechanisms and management services, aiming to bring users an out-of-the-box lakehouse experience.

The Amoro open source community launched a trial and contribution activity for the new version on August 10, 2023. The trial activity is designed to help users get started with Amoro faster, while also collecting users' usage scenarios during the trial and uncovering optimizations and new feature requirements for the project. The contribution activity aims to get more developers deeply involved in contributing to Amoro, making the community more diverse and supporting its long-term development.

So far, 27 users have taken part in the trial activity and 20 developers in the contribution activity. Thank you to every participant for your enthusiasm and selfless dedication; your efforts are an important support in helping Amoro keep moving forward. The community has also prepared gifts for participants who made outstanding contributions. The statistics for this round cover October 1, 2023 to October 30, 2023. In that period, 3 trial users submitted trial feedback, and the community selected 2 MVCs (Most Valuable Contributors) from all contributors.

 

01 October MVC

zhongqishang, Amoro Committer

Profile: Zhong Qishang (Github ID: zhongqishang) from Qichacha has been contributing to the Amoro community since November 2022 and has contributed 29 PRs (Pull Requests) so far. During the October contribution activity, he resolved the problem that optimization could be too slow or run out of memory when the Optimizer automatically optimizes a table with too many Iceberg equality delete files, which greatly improves the stability of the Optimizer. He also improved the display of the Optimizing page in the table details on the Dashboard, making it easier for users to view Optimizing task details.

Personal introduction: I come from the Big Data Architecture Department of Qichacha. I have made some small contributions to Apache Flink, Flink CDC, and Debezium before, but was not heavily involved. This is my first time participating deeply in an open source project.

Community experience: In 2022, when the company was planning to introduce the Iceberg data lake internally, we came across the open source project Amoro. Amoro solved our Iceberg compaction problem very well. As we rolled it out, my community contributions grew from simple typo fixes at the beginning to merge performance optimizations, improvements to the Planner, and more.

Message to the community: Thanks to open source, which saves us from reinventing the wheel; thanks to the community for providing such an excellent project as Amoro; and thanks to community members for their guidance and suggestions, on Amoro and beyond. Over the past year or so, the Amoro community has been thriving. Let's keep working together.

huyuanfeng2018, Amoro Contributor

Profile: Hu Yuanfeng (Github ID: huyuanfeng2018) from Huya has been contributing to the Amoro community since July 2023 and has contributed 12 PRs (Pull Requests) so far. During the October contribution activity, he added support for displaying Iceberg Tags and Branches. He also took part in developing Amoro's metrics feature and provided metrics related to table Optimizing.

Personal introduction: I come from Huya’s big data platform team and am mainly responsible for real-time computing and data lake construction.

Community experience: In July 2023, while looking for a solution that could manage Iceberg tables well and merge their files in a friendly way, we came across Amoro and decided to try using it to manage our Iceberg tables. With the help of the Amoro community, we not only succeeded in managing our Iceberg tables with Amoro, but also took a deep part in developing some community features. We made optimizations to reduce the memory usage of AMS, helped fix multiple bugs, and discussed and made suggestions on several features planned by the community.

Message to the community: I hope that the Amoro community, like the data lake itself, will keep developing well, embrace more changes and challenges, and continue to innovate and make breakthroughs along the way. I also hope that more and more developers will join the community to solve problems in more scenarios and work with us to make the Amoro community stronger and better!

02 Trial user feedback

During the trial activity, three users from Zhejiang Telecom, Jiuzhang Data, and Multipoint DMALL submitted trial feedback to the community.

Zhejiang Telecom:

Zhejiang Telecom uses Amoro to meet its production need for automatic optimization of Iceberg lakehouse tables. To improve the timeliness of data warehouse data, after the system was migrated to the cloud, the Iceberg format was introduced so that offline transfers and scans of the production (teledb) source database would not affect database performance, and business data is written to Iceberg tables through NetEase Youshu real-time transmission. While using Iceberg, the team found that Iceberg's native Spark compaction failed with OOM and other errors because there were too many eq-delete files. After adopting Amoro, its self-optimizing function handles the small file problem of Iceberg tables in a timely manner and improves read performance while keeping the tables available.
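To illustrate why a large backlog of eq-delete files is costly, here is a minimal Python sketch of how equality deletes are applied at read time and what compaction achieves. It is a simplified toy model, not Amoro's or Iceberg's actual implementation: the `DataFile` and `EqDeleteFile` classes and the `read_table` and `compact` functions are hypothetical names introduced only for this example, and partitioning is ignored. The one rule it relies on comes from the Iceberg table spec: an equality delete applies to data files whose data sequence number is strictly lower than the delete's.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    seq: int    # data sequence number of the commit that added the file
    rows: list  # rows as dicts, e.g. {"id": 1}


@dataclass
class EqDeleteFile:
    seq: int    # data sequence number of the commit that added the delete
    keys: set   # values of the equality column whose rows are deleted


def read_table(data_files, eq_deletes, key="id"):
    """Return live rows, applying every applicable equality delete."""
    live = []
    for f in data_files:
        # The reader must hold all applicable delete keys in memory;
        # this set grows with the number of unmerged eq-delete files.
        deleted = set()
        for d in eq_deletes:
            if d.seq > f.seq:  # applies only to strictly older data
                deleted |= d.keys
        live.extend(r for r in f.rows if r[key] not in deleted)
    return live


def compact(data_files, eq_deletes, key="id"):
    """Toy full compaction: rewrite live rows into one file, drop deletes."""
    rows = read_table(data_files, eq_deletes, key)
    new_seq = max(f.seq for f in list(data_files) + list(eq_deletes))
    # All older data files were rewritten, so no delete is still needed.
    return [DataFile(seq=new_seq, rows=rows)], []


data = [DataFile(1, [{"id": 1}, {"id": 2}]), DataFile(3, [{"id": 2}])]
deletes = [EqDeleteFile(2, {2})]
compacted_data, compacted_deletes = compact(data, deletes)
```

In this toy, the delete with sequence number 2 removes `id = 2` from the first file but not from the file written later. After compaction the same rows are readable without consulting any delete file, which is why merging frequently keeps both read cost and memory use down.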

Jiuzhang Data:

Jiuzhang Data is building a data lake that unifies streaming and batch processing on Amoro's Mixed-Iceberg format. Data is synchronized into the lake through Flink CDC, and the Mixed-Iceberg table format guarantees the primary key uniqueness of the ingested data. While building the test scenarios, the team discovered and reported multiple problems with the Mixed-Iceberg format in production scenarios and worked with community developers to troubleshoot and locate them, providing valuable experience for the stability of the Mixed-Iceberg format in production. So far, more than 1K ODS tables have been connected and tested, and real-time data synchronization and concurrent data backfill have been verified. The Iceberg Catalog is used to read Mixed-Iceberg tables for low-latency BI reports and other scenarios. In the future, the team looks forward to completing the construction of a real-time lakehouse system that unifies streaming and batch.

Multipoint DMALL:

In the context of cloud transformation, Multipoint DMALL introduced Iceberg tables to address the pain points of Hive tables in terms of timeliness and table schema changes. Amoro provides production-grade operation and maintenance management for Iceberg tables, reducing the cost of manually scheduling batch jobs to merge files and expire data for a large number of Iceberg tables. In addition, Multipoint DMALL implemented scheduling of the Amoro Optimizer on the Spark engine, so that a resident Flink Optimizer does not keep occupying resources in scenarios where updates are infrequent, and made full use of Spark's dynamic resource allocation (DRA) feature to further reduce resource consumption.
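As a rough illustration of the DRA setup described above, the Python sketch below assembles a `spark-submit` command line with Spark's dynamic allocation properties enabled. The `spark.dynamicAllocation.*` properties are standard Spark configuration keys; everything else is an assumption made for the example, including the helper names `dynamic_allocation_conf` and `spark_submit_args`, the executor limits, and the `optimizer-job.jar` artifact name, none of which reflect Multipoint DMALL's actual deployment.

```python
def dynamic_allocation_conf(min_executors=0, max_executors=10,
                            idle_timeout="60s"):
    """Spark properties that let executors scale with the workload."""
    return {
        "spark.dynamicAllocation.enabled": "true",
        # Shuffle tracking allows DRA without an external shuffle service.
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        # Idle executors are released after this timeout.
        "spark.dynamicAllocation.executorIdleTimeout": idle_timeout,
    }


def spark_submit_args(app, conf):
    """Build the argument list for a spark-submit invocation."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# "optimizer-job.jar" is a hypothetical artifact name used for illustration.
args = spark_submit_args("optimizer-job.jar",
                         dynamic_allocation_conf(max_executors=4))
```

With `minExecutors` set to 0, a job submitted this way can release all of its executors while idle, which is the resource saving a scheduled Spark job offers over a resident optimizer when tables are rarely updated.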

03 Welcome to try and contribute

The trial and contribution activities will continue until December 2023. Each month, the community will tally the previous month's trials and contributions. Participants who submit valid trial feedback will receive a community merchandise gift pack, and the monthly MVC (Most Valuable Contributor) will receive a pair of AirPods prepared by the community.

If you would also like to try Amoro or contribute, you can find the community contact information on Github and sign up. A dedicated community mentor will help you complete the version trial and project contributions.

Github: https://github.com/NetEase/amoro


Origin my.oschina.net/u/6895272/blog/10320887