A look at the new trends in data applications | The 8th Tencent Cloud Techo TVP Developer Summit concluded successfully

Introduction

In the data-driven era, how to effectively utilize big data has become an important issue in various industries. With the vigorous development of emerging technologies such as cloud computing and artificial intelligence, data technology continues to grow and present new trends and characteristics. How can enterprises grasp the new context of data technology to gain insights into the value behind the data?

On August 19, 2023, the 8th Techo TVP Developer Summit, "Data Drives Intelligence, Intelligence Empowers the Future", hosted by Tencent Cloud TVP, came to a successful conclusion. The summit brought together six leading experts from the data technology industry, who gave talks and held in-depth exchanges on the latest progress, trends, and innovative applications of data technology, offering developers inspiration in both ideas and practice.

▲Opening speech by Lu Dongming, founder of "Mingshuo Sanxing" and Tencent Cloud TVP

The summit was hosted by Tencent Cloud TVP Lu Dongming, who is also the founder and host of "Mingshuo Sanxing", an interview column focused on big data and AI, and is known as "Uncle Ming". He opened the summit by borrowing the classic line from "A Tale of Two Cities" by the British novelist Dickens: this is the best era for China's data technology, and it is also the worst era. It is the most prosperous moment in the history of database and big data technology in China, but the dazzling range of technology stacks and constantly changing products has also brought unprecedented challenges to developers and enterprises. Faced with so many database technologies, enterprises and developers need to think clearly about how to choose among them and how to combine them to meet different goals.

Four major trends in data platforms under the democratization of AI

▲A speech by Shi Kai, author and founder of "Lean Data Methodology" and TVP of Tencent Cloud

Mr. Shi Kai, author and founder of "Lean Data Methodology" and Tencent Cloud TVP, gave a talk titled "Four Major Trends in Data Platforms under the Democratization of AI".

Mr. Shi pointed out that we are moving rapidly from the era of "data democratization" into the era of "AI democratization". In the era of data democratization, everyone can be empowered by data and obtain real-time feedback and analytical insight by applying and analyzing it. With the emergence of ChatGPT, the era of AI democratization is approaching quickly. In the future, artificial intelligence will benefit everyone, but it will also bring a huge challenge to enterprise data platforms: the contradiction between endlessly growing demand for data applications and limited, fragmented data productivity. At the same time, the emergence of large models has given data practitioners new room for imagination. Everyone hopes that AI can assist data production and data analysis and accelerate the generation of value from data sources.

To this end, Teacher Shi proposed four major trends in the future development of data platforms:

  • The value of the data platform becomes explicit. As enterprises invest more and more in data, more of them expect data to generate value for the business directly. This brings a new challenge for the data platform: how to link the value of the data platform directly to business value;
  • The data platform architecture is modernized. The data platform will develop in the direction of integrated analysis, ease of use, trustworthiness, and decentralization. New data architecture practices represented by Data Fabric/Data Mesh are gradually emerging;
  • The data value chain is empowered by AIGC. The data platform will integrate new AIGC technologies to eliminate waste in the enterprise data production value chain;
  • AIGC capabilities are platform-oriented and service-oriented. AIGC will become a capability that enterprises can use and adjust at any time. Generally, enterprises do not need to build their own large models, but should focus on how to integrate the capabilities of large models to deepen the value mining of data.

At the end of his talk, Mr. Shi Kai summed up: "Digital transformation originates from problems, starts with the business, is built on data, lands in scenarios, is measured by value, and ends with the organization." No matter how the data platform evolves, the core question enterprises care about is how to grow out of the business, irrigate it with data, land in concrete scenarios, and ultimately deliver business value, driving the overall digitalization of the enterprise.

Both cost and ease of use: the evolution of Tencent Cloud ES cloud-native serverless

▲ Gao Pan, Director of Tencent Cloud ES R&D, gave a speech

From technical imagination, the summit then returned to the reality of enterprise data governance. As extensive growth gives way to intensive growth, reducing costs, increasing efficiency, and improving data efficiency have become the focus of enterprises and developers. Mr. Gao Pan, R&D Director of Tencent Cloud ES, gave a talk titled "Both Cost and Ease of Use: The Evolution of Tencent Cloud ES Cloud-Native Serverless".

Mr. Gao Pan explained that Tencent Cloud ES is a one-stop, fully managed ELKB service native to Tencent Cloud. Built on open source ES, it includes a self-developed kernel reworked for cost, performance, stability, and scalability, achieving a cost reduction of 50% to 80%, a 3x to 10x improvement in query performance, a 2x improvement in write performance, an SLA of 99.99%, and a more than 10x improvement in scalability.

Tencent Cloud Big Data ES serves many scenarios, of which logs are the most common and the largest. Because the value density of logs is relatively low while their scale is usually large, enterprises focus on cost control in log scenarios. Tencent Cloud Big Data ES has therefore made many cost-related optimizations, significantly reducing access costs, operation and maintenance costs, and resource costs through technologies such as link integration, index autonomy, and storage-compute separation.

With the cost problem addressed, Mr. Gao Pan also wants to keep improving ease of use and provide users with a one-stop big data analysis service. Although vendors offer PaaS-style ES services built on the underlying ES kernel, users still have to spend effort on operation and maintenance work such as cluster creation, data link configuration, and index lifecycle management. He and his team therefore improved on the Tencent Cloud PaaS version of the ES service and launched a Serverless ES service that requires no attention to clusters or nodes and no operation and maintenance. Cost has been optimized further: unlike the per-node billing of the original PaaS service, Serverless charges by write volume and query volume, which is true pay-as-you-go. For stability, clusters and indexes are operated and tuned in a unified way in the background, avoiding failures caused by improper use. The service is also 100% compatible with the open source ES API and with the ELK ecosystem.
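The difference between per-node billing and billing by write and query volume can be made concrete with a little arithmetic. The sketch below is only an illustration: the function names, the GB unit, and every price and workload figure are hypothetical and are not taken from Tencent Cloud's actual pricing.

```python
def per_node_cost(nodes, hours, price_per_node_hour):
    """PaaS-style bill: pay for every provisioned node-hour, busy or idle."""
    return nodes * hours * price_per_node_hour


def usage_cost(write_gb, query_gb, price_per_write_gb, price_per_query_gb):
    """Serverless-style bill: pay only for the data written and queried."""
    return write_gb * price_per_write_gb + query_gb * price_per_query_gb


# Hypothetical workload and unit prices, for illustration only.
node_bill = per_node_cost(nodes=3, hours=720, price_per_node_hour=2)
usage_bill = usage_cost(write_gb=100, query_gb=50,
                        price_per_write_gb=3, price_per_query_gb=1)
print(node_bill, usage_bill)
```

Under per-node billing the cost is fixed by the provisioned capacity, so a lightly used cluster still pays for idle nodes. Under usage-based billing the cost follows the workload, which is why it tends to suit bursty or low-volume log scenarios.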

Build an enterprise-level real-time data warehouse: Create a stable and reliable data warehouse TCHouse-D based on Apache Doris

▲ Li De, technical director of Tencent Cloud Doris R&D, gave a speech

Apache Doris is a well-known open source data warehouse project of the ASF and has won the favor of many developers for its simplicity, ease of use, and flexibility. Mr. Li De, technical director of Tencent Cloud Doris R&D and a PMC member of the Apache Doris community, gave a talk titled "Building an Enterprise-level Real-time Data Warehouse: Building a Stable and Reliable Data Warehouse TCHouse-D Based on Apache Doris".

Mr. Li De began with a brief introduction to Tencent Cloud big data TCHouse-D. TCHouse-D is a real-time data warehouse service built by Tencent Cloud on Apache Doris. It is 100% compatible with Apache Doris and with the MySQL protocol, and supports business scenarios such as concurrent multidimensional analysis, interactive analysis, real-time data warehousing, and federated lakehouse analysis. It is simple and easy to use, elastically scalable, safe and reliable, ecosystem-compatible, and comprehensive in function. He then shared his understanding of what an enterprise-level, real-time updatable data warehouse requires:

  • Real-time writes and real-time inserts, deletes, updates, and queries: data can be written in real time and in batches, changes are visible in real time, and the warehouse can connect to real-time systems such as Flink and Kafka;
  • Real-time synchronization of data changes: support for full-database and incremental synchronization, automatic rate adjustment under streaming-write backpressure, and real-time, non-blocking automatic synchronization of table structure changes;
  • Enterprise-level stability and reliability: complete authentication, permission, and auditing functions, complete monitoring, alerting, and inspection, fully managed service, and high availability for reads and writes.

TCHouse-D is designed strictly to these standards. To support real-time writes and real-time inserts, deletes, updates, and queries, it draws on the pre-aggregation model of Google Mesa, and the storage engine supports fast data import through an LSM-like data structure. For real-time synchronization, MySQL Binlog can be synchronized in real time, and full-database, incremental, and field changes can be synchronized automatically. In addition, a two-phase commit makes Exactly Once semantics achievable. As a cloud product, TCHouse-D has invested heavily in stability. It supports a two-level alerting system for operations staff and users, scheduled inspections, automatic rate limiting under real-time write backpressure, and tablet and compaction health checks. Mechanisms such as the role-based permission system, whitelists, and double backup of metadata further ensure that the service is safe and reliable.

To the audience's anticipation, Mr. Li De shared the future plans for TCHouse-D: features such as hot/cold data tiering, compute nodes, cross-cluster synchronous replication, and storage-compute separation are under development and are expected to launch in Q4 of this year or early next year.
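The combination of a pre-aggregation model and an LSM-like structure can be illustrated with a toy store. This is a minimal sketch, not the Doris or TCHouse-D storage engine: the class name, the SUM aggregation, and the flush threshold are all assumptions made for illustration. Writes land in an in-memory table, full tables are flushed as immutable sorted runs, and compaction merges runs by aggregating rows that share a key.

```python
from collections import defaultdict


class ToyAggregateStore:
    """Toy LSM-like store whose rows with the same key are summed."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit
        self.memtable = defaultdict(int)  # mutable, in-memory buffer
        self.runs = []                    # immutable sorted runs, newest last

    def write(self, key, value):
        # Writes are cheap: aggregate in memory, flush when the buffer is full.
        self.memtable[key] += value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the buffer into a sorted, immutable run.
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = defaultdict(int)

    def compact(self):
        # Merge all runs into one, pre-aggregating rows that share a key.
        merged = defaultdict(int)
        for run in self.runs:
            for key, value in run:
                merged[key] += value
        self.runs = [sorted(merged.items())] if merged else []

    def read(self, key):
        # A read aggregates across the buffer and every run.
        total = self.memtable.get(key, 0)
        for run in self.runs:
            total += dict(run).get(key, 0)
        return total


store = ToyAggregateStore(memtable_limit=2)
store.write("page_a", 1)
store.write("page_b", 2)   # buffer reaches the limit and is flushed
store.write("page_a", 5)   # stays in the new buffer
print(store.read("page_a"))
```

The design choice this shows is the trade-off typical of LSM-style engines: imports are fast because they only append new runs, while background compaction keeps the number of runs, and therefore the read cost, under control.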

DataOps Exploration: Apache Top Ten DataOps Project Selection Analysis

▲ Speech by Guo Wei, Apache Software Foundation Member and Tencent Cloud TVP

In the field of big data, enterprises often focus on the results of data extraction and efficient mining, but they have only scratched the surface of the closed-loop process of data generation, storage, integration, circulation, and regeneration. Mr. Guo Wei, Apache Software Foundation Member and Tencent Cloud TVP, gave a talk titled "DataOps Exploration: Apache Top Ten DataOps Project Selection Analysis".

To help everyone understand DataOps more intuitively, Mr. Guo summed it up as the entire closed-loop process of data: store data in a database, build dashboards, integrate the data into a data lake and build data models, then do mining, and finally predict results and generate new data. In 2019, Gartner divided IT technology into three eras: IT craftsmanship, IT industrialization, and IT digitalization. Mr. Guo pointed out that with the rapid advance of AI technology and the emergence of large models, we are facing a fourth era, the era of IT intelligence, and DataOps will likewise develop from BI toward AI. He then gave a detailed introduction to and selection analysis of ten popular ASF DataOps open source projects, including Apache SeaTunnel, Apache Airflow, Apache DolphinScheduler, and Apache NiFi, to help enterprises and developers find the projects that suit them and build their own DataOps platforms smoothly.

On the collision between large models and DataOps and the future trends everyone is interested in, Mr. Guo said that it is a general trend for enterprises to retrain their own models from open source large models. He used a case video, "Train your own private ChatGPT for the price of a cup of Starbucks coffee", to demonstrate vividly that training large models is feasible. The ultimate goal of DataOps is to make data generation faster, and combining large models with DataOps is something every company and every individual should boldly try.

Finally, Mr. Guo looked to the future. The essence of Ops is to improve the efficiency of people: between business and technology, between design and R&D, and between people at different levels. He believes that "ChatGPT-like" applications will also emerge in the DataOps field, allowing everyone to understand data through natural language.

The architecture and implementation practice of Tencent Cloud Intelligent Storage in AIGC scenarios

▲Wang Miao, head of Tencent Cloud Intelligent Storage R&D, gave a speech

AIGC is currently an important application scenario for large models and is sought after by many industries. Some institutions predict that AIGC will become a trillion-dollar market within 5 to 10 years. Mr. Wang Miao, head of intelligent storage R&D at Tencent Cloud, gave a talk titled "Architecture and Implementation Practice of Tencent Cloud Intelligent Storage in AIGC Scenarios", which covered in detail the technical architecture and main capabilities of Tencent Cloud Intelligent Storage, as well as the specific problems it can help enterprises solve in AIGC scenarios.

Mr. Wang Miao first described the technical architecture of the intelligent storage system across the access layer, logic processing layer, data processing layer, storage layer, and underlying basic services. He then summarized the core elements of the AIGC scenario: content generation, content security, and content intelligence. Around these three elements, Tencent Cloud provides an end-to-end intelligent storage solution that covers every stage of the AIGC workflow, from data collection, data preprocessing, feature engineering, and model training to inference, content review, and content intelligence.

In Tencent Cloud's intelligent storage solution, COS serves as the unified storage base of the data lake. For the training stage, which demands high bandwidth, it provides the data accelerators GooseFS and GooseFSx, which greatly improve data read and write efficiency and ease of access through distributed acceleration and rich protocol support. For content security, Tencent Cloud provides an integrated, input-to-output storage content security solution, using customized models built on the rich content review capabilities of Data Wanxiang and tailored to the particular needs of AIGC. On copyright protection, Mr. Wang Miao explained the technical principle behind Data Wanxiang's digital watermark function: using the discrete Fourier transform, pictures and video frames are converted into the frequency domain, the digital watermark information is embedded during the transformation, and the result is converted back, producing a hidden watermark that protects the copyright of digital products. In addition, AIGC output has to be distributed. Tencent Cloud Intelligent Storage provides the Jiezhi compression service, which can reduce the size of JPG and PNG images by more than 50% without changing the image format, saving distribution traffic.

Finally, Mr. Wang Miao shared a customer case from the text-to-image field. The Tencent Cloud Intelligent Storage team helped the customer deploy GooseFS on its training nodes, building TB/s-level throughput that greatly improved training efficiency and the customer's model iteration speed. After the business went live and faced massive volumes of requests and AIGC output, the customer used Data Wanxiang's AIGC automatic review function to review text and images tens of millions of times a day, addressing its content security problems. For image distribution, the combination of AVIF adaptation and Jiezhi compression delivers the smallest suitable image to each platform, cutting image download bandwidth by 50%, saving operating costs and improving access speed.
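The frequency-domain watermarking idea can be sketched with NumPy's FFT routines. This is a simplified illustration under stated assumptions, not the algorithm used by the Tencent Cloud product: the slot positions `ROW0` and `COL`, the `strength` value, the grayscale test image, and the non-blind extraction (which needs the original image) are all choices made here for clarity.

```python
import numpy as np

# Hypothetical mid-frequency slots used to carry the watermark bits.
ROW0, COL = 5, 7


def embed_watermark(image, bits, strength=50.0):
    """Embed bits by nudging selected DFT coefficients up or down."""
    spectrum = np.fft.rfft2(image.astype(np.float64))  # to frequency domain
    for k, bit in enumerate(bits):
        spectrum[ROW0 + k, COL] += strength if bit else -strength
    # Back to the spatial domain; irfft2 guarantees a real-valued image.
    return np.fft.irfft2(spectrum, s=image.shape)


def extract_watermark(watermarked, original, n_bits):
    """Recover bits from the sign of the spectral difference (non-blind)."""
    diff = np.fft.rfft2(watermarked) - np.fft.rfft2(original.astype(np.float64))
    return [int(diff[ROW0 + k, COL].real > 0) for k in range(n_bits)]


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
bits = [1, 0, 1, 1, 0, 0, 1, 0]

marked = embed_watermark(image, bits)
recovered = extract_watermark(marked, image, len(bits))
print(recovered == bits, np.abs(marked - image).max())
```

Because each bit only shifts one frequency coefficient slightly, the change is spread thinly over all pixels, which is what makes the watermark hard to see. A production scheme would additionally have to survive rounding to 8-bit pixels, compression, and cropping, and would usually extract the watermark without access to the original image.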

Round table dialogue session

▲Round table dialogue session

After the expert talks, which were packed with useful content, came the round table session specially planned for this summit. Unlike in the past, this round table took the form of a debate chaired by Uncle Ming. The five guests, Shi Kai, Gao Pan, Li De, Guo Wei, and Wang Miao, argued their positions on each topic and offered different views and unique insights. Opposing views collided on almost every question, making for a lively session. The audience was engrossed and also got a look at how these experts think.

With the era of universal AI coming, will big data become more prosperous?

Mr. Shi Kai, Mr. Gao Pan, and Mr. Wang Miao took the affirmative side. They all believe that AI will make every industry more prosperous, that data volumes will grow dramatically, and that the market will place higher demands on the computing power and efficiency of big data, which will in turn drive technological renewal and push big data to a higher level.

Mr. Li De took the opposite view. He began by asking, "Were operating systems more popular 20 years ago or now?" He believes that when AI has iterated to full maturity, databases and big data will be hidden behind applications, and people's direct demand for databases or big data may decrease. Mr. Guo Wei agreed, saying that in the future big data will become infrastructure and all business logic will be handled by large AI models.

The host, Uncle Ming, also shared his view. In his opinion, our understanding and exploration of data are not yet deep enough. As AI develops, data needs are changing too, and new data types or new data characteristics are likely to appear, so data engineers may then face completely new challenges. The evolution runs from Test to Text, to Image, and then to Video, and there is a lot of room for imagination about what comes after video.

Is the successful path for the future development of China's data technology "big and comprehensive" or "small and beautiful"?

Mr. Wang Miao prefers small and beautiful. He believes that companies in some vertical scenarios have deep professional domain knowledge and, by combining it with big data technology, can respond quickly to the needs of their vertical fields. He also suggested that small and beautiful companies stand on the shoulders of giants, using open source technology or cloud services for the underlying technology and concentrating their energy and resources on launching their products quickly. Mr. Gao Pan sees the question as one of division of labor. Small and beautiful companies should dig deep in their own fields, improve their own products, and then cooperate with large companies; large and comprehensive cloud vendors should do the integration well and provide customers with complete solutions.

Mr. Guo Wei, Mr. Shi Kai, and Mr. Li De believe it is better to be big and comprehensive. Mr. Guo Wei pointed out that the needs of Party A companies are diverse: 20% of companies choose to assemble their own stack from small and beautiful single-purpose tools, while 80% may rely more on one-stop solutions. Mr. Shi Kai said that in today's fierce market, companies that do not become big and comprehensive may struggle to survive. There is an information gap between Party A and Party B in their understanding of technology and business goals, so a database product company needs to present itself as big and comprehensive and emphasize the advantages of its own products in order to gain industry recognition. Mr. Li De holds a similar view. To him, small and beautiful is an ideal vision, while big and comprehensive is the realistic path. For business success, product positioning and marketing are very important, and many small and beautiful companies are not as good at positioning and publicity as the big and comprehensive companies that become household names.

Uncle Ming said that small and beautiful companies are the root of innovation and that he hopes to see them succeed. However, big and comprehensive companies have more advantages in integrating resources and controlling costs, so in the current business environment they are more likely to succeed.

In the era of multiple weapons, what are the "weapons" that help developers improve their combat effectiveness?

Mr. Gao Pan offered his suggestions from the perspective of the weapons themselves. Although technical products are numerous, developers only need to pick one well-recognized product in each field and study it in depth according to their own scenario needs, such as Spark for offline scenarios, MySQL or PG for TP scenarios, and ES or Doris for AP scenarios. The remaining products can then be learned by analogy.

Mr. Shi Kai believes that the more technology is everywhere, the more important it is to maintain core competencies. He proposed three abilities that developers need: learning ability, logical ability, and communication ability. Learning ability ensures faster growth, logical ability helps solve problems better, and communication ability creates a good atmosphere and environment, allowing one to go further, more steadily, and faster.

Mr. Li De also shared three abilities. The first is the ability to use tools, such as ChatGPT and mature "wheels" and other tools or components, to meet business needs. The second is participating in open source: learning from and studying open source code leads to faster progress. The last is the ability to summarize. Summarizing is a process of forcing yourself to think, and being good at it raises the level of your thinking.

Participating in open source is also one of Mr. Guo Wei's suggestions for developers. In addition, he reminded developers to pay attention to large models, especially private models, whose performance in assisted programming will exceed expectations. A deep understanding of business processes and requirements is also often what distinguishes excellent programmers from ordinary ones. To become an excellent developer, it is not enough to write code; you must understand the business and take part in the business process in order to handle business needs well.

Mr. Wang Miao emphasized that developers need business awareness. When designing architecture and selecting technology, they should use that awareness to weigh the input-output ratio and decide whether something should be done and how many resources it deserves. This is an essential quality for developers who want to grow into well-rounded talent.

Finally, the host, Uncle Ming, summed up his advice for participants in three words: "difference", "reason", and "speaking". "Difference" means both being different and changing. In the current era of serious homogeneity, developers must seek to be different, watch how the market changes, and seize the opportunity in the next cycle. "Reason" stands for understanding: understanding a system and understanding a business will become more and more important. "Speaking" stands for persuasion. Truly successful developers often end up leading teams, and persuasion is essential on that path.

Conclusion

▲ Summit site

Watch the big names talk and learn about the digital future. At this point, this summit has officially come to an end. During the summit, six experts opened their minds and had in-depth exchanges on the latest progress and future trends of data technology. They not only brought the trend prospects of data technology, but also shared practical experience that can be implemented.

In the future, Tencent Cloud TVP will keep pace with the times, stay true to its original intention of "impacting the world with technology", and continue to create the "most informative, interesting and useful" developer summits for developers. Let us look forward together to the next Techo TVP Developer Summit.


Origin blog.csdn.net/QcloudCommunity/article/details/132877674