The AI and big data platform in the large model era accelerates the emergence of innovation

The rise of large models and Model-as-a-Service (MaaS) has defined a new, model-centered AI development paradigm. The surging demand for computing power behind it poses new challenges to the AI engineering stack. An AI and big data platform for the large model era must combine computing efficiency, development efficiency, and processing efficiency to support business innovation. On October 31, at the 2023 Yunqi Conference, Wang Junhua, Vice President of Alibaba Cloud and head of the Alibaba Cloud Computing Platform Division, announced an upgraded release of Alibaba Cloud's AI and big data platform to serve business innovation across industries in the large model era.

High-performance AI infrastructure maximizes computing efficiency

According to OpenAI's estimates, the computing power used to train leading AI models has been growing roughly tenfold per year, and demand continues to explode. Wang Junhua explained that the PAI Lingjun intelligent computing cluster has been deeply optimized in networking, storage, and scheduling. It adopts the new-generation HPN 7.0 AI cluster network architecture and a storage-compute separation architecture, and scales to clusters of up to 100,000 GPUs, allowing a very large cluster to operate like a single computer.

PAI Lingjun, an intelligent computing service that integrates software and hardware, provides stable and efficient support for large-scale deep learning training. The linear scaling efficiency of large model training tasks reaches 96%, and training resources can be reduced by more than 50%. For stability, PAI Lingjun ships with AIMaster, an elastic fault-tolerant training framework, and EasyCkpt, which automatically saves and restores model checkpoints, allowing thousand-GPU-scale tasks to run stably for more than three weeks.
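The fault-tolerance idea behind checkpointing tools like EasyCkpt — periodically persisting training state so a failed job can resume rather than restart from scratch — can be sketched in plain Python. The `TrainState` class and file layout here are illustrative assumptions, not EasyCkpt's actual API:

```python
import json
import os
import tempfile

class TrainState:
    """Minimal stand-in for training state (step counter + parameters)."""
    def __init__(self, step=0, params=None):
        self.step = step
        self.params = params or {"w": 0.0}

def save_checkpoint(state, path):
    # Write atomically: dump to a temp file, then rename, so a crash
    # mid-write never leaves a corrupt checkpoint behind.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump({"step": state.step, "params": state.params}, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    if not os.path.exists(path):
        return TrainState()            # no checkpoint yet: start fresh
    with open(path) as f:
        d = json.load(f)
    return TrainState(d["step"], d["params"])

def train(total_steps, ckpt_path, ckpt_every=10):
    state = load_checkpoint(ckpt_path)  # resume if a checkpoint exists
    while state.step < total_steps:
        state.step += 1
        state.params["w"] += 0.1        # stand-in for a real update step
        if state.step % ckpt_every == 0:
            save_checkpoint(state, ckpt_path)
    return state
```

A real system checkpoints sharded GPU state asynchronously to keep training stalls short; the atomic-rename pattern above is the part that carries over directly.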

For large model inference, PAI systematically combines joint model-system optimization, runtime optimization, and LLM-specific optimizations, increasing the inference throughput of large language models by 3.5x while significantly reducing latency. Single-GPU inference supports context lengths of up to 280K tokens, and such ultra-long contexts will further promote the emergence of LLM capabilities.

Multiple, more flexible AI development modes to support diverse needs

As new demands keep emerging, AI developers and their development needs are becoming increasingly segmented. The release of the artificial intelligence platform PAI 4.0 comprehensively lowers the barrier to large-model AI development, provides end-to-end support for these needs, and improves development efficiency.

Whether they are deep learning developers who need to define model structures and development workflows themselves, teams running massive large-scale computing tasks, or business algorithm teams that need to connect training and inference quickly and efficiently, all of them can do their R&D on PAI. It bundles the popular computing frameworks, open-source models, and development scenarios, supporting one-stop development and deployment.

PAI Lingji provides developers with cloud API services for both developing applications on top of models and calling already-developed models, letting them quickly integrate large model capabilities into their own businesses and applications. On the PAI Lingji platform, developers can find not only the Tongyi series of large models (including Tongyi Qianwen and Tongyi Wanxiang) but also leading models from across the industry, including ChatGLM, Baichuan, and Stable Diffusion.

Wang Junhua announced that these models are now open to developers through a unified API and SDK on PAI Lingji. With just a few lines of code, developers can integrate the capabilities of these different large models into their own applications.
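What such a unified interface buys developers is one request shape for many models. The sketch below is hypothetical — `UnifiedClient`, `build_request`, and the request fields are illustrative assumptions, not the actual PAI Lingji SDK:

```python
class UnifiedClient:
    """Hypothetical sketch of a unified multi-model client."""
    SUPPORTED = {"qwen-turbo", "chatglm", "baichuan", "stable-diffusion"}

    def __init__(self, api_key):
        self.api_key = api_key

    def build_request(self, model, prompt, **params):
        # One request shape for every model: the platform, not the
        # caller, absorbs per-model differences behind this interface.
        if model not in self.SUPPORTED:
            raise ValueError(f"unknown model: {model}")
        return {
            "model": model,
            "input": {"prompt": prompt},
            "parameters": params,
            "headers": {"Authorization": f"Bearer {self.api_key}"},
        }

client = UnifiedClient(api_key="sk-...")
req = client.build_request("qwen-turbo", "Write a haiku about clouds.",
                           temperature=0.7)
```

Swapping "qwen-turbo" for "chatglm" or "baichuan" changes nothing else in the calling code, which is the point of a unified SDK.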

Efficient data services improve large model quality as big data and AI integrate deeply

In a typical machine learning workflow, about 80% of R&D time goes into data preparation. Data quality determines how well a large model performs, making data processing and analysis ever more important. Big data is part of the AI infrastructure: Alibaba Cloud provides a full suite of products covering data accumulation, cleaning, modeling, computation, and serving, to cut the time spent on data preparation during AI development.
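The kind of work that dominates that 80% — deduplication, dropping incomplete records, normalizing fields — can be illustrated with a minimal, dependency-free cleaning pass. The record layout (`id`/`text` fields) is an assumption made for illustration:

```python
def clean_records(records):
    """Dedupe, drop incomplete rows, and normalize text fields."""
    seen = set()
    cleaned = []
    for rec in records:
        # Drop records missing required fields.
        if not rec.get("id") or not rec.get("text"):
            continue
        text = " ".join(rec["text"].split())   # collapse whitespace
        key = (rec["id"], text.lower())        # dedupe case-insensitively
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"id": rec["id"], "text": text})
    return cleaned
```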

At the same time, big data and AI are integrating more deeply. MaxCompute, Alibaba Cloud's self-developed big data processing platform, has fully upgraded its DataFrame capabilities and released MaxFrame, a distributed computing framework that is 100% compatible with data processing interfaces such as pandas. With a single line of code changed, native pandas code runs as MaxFrame distributed computation, connecting data management, large-scale data processing and analysis, and ML development into one pipeline. This erases the boundary between big data and AI development and greatly improves development efficiency.
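Since MaxFrame advertises pandas compatibility, the workload it distributes looks like ordinary pandas code. The snippet below is plain pandas; the claimed one-line MaxFrame swap is shown only as a comment, and the exact MaxFrame module path there is an assumption:

```python
import pandas as pd
# Per the compatibility claim, the same code would run distributed by
# changing only this import, e.g. something like:
#   import maxframe.dataframe as pd   # assumed module name, illustrative

df = pd.DataFrame({
    "user": ["a", "a", "b", "b", "c"],
    "clicks": [3, 1, 4, 1, 5],
})

# A typical feature-engineering step before ML training:
# aggregate per-user behavior into one row per user.
features = (df.groupby("user", as_index=False)["clicks"]
              .sum()
              .rename(columns={"clicks": "total_clicks"}))
```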

In an AI era driven by large models, AI scenarios place ever higher demands on data freshness. The new-generation real-time lakehouse solution built on Flink and Paimon gives users one-stop data ingestion into the lake, real-time processing, and exploratory analysis, extending Flink's real-time computing capabilities into data lake scenarios while accelerating AI applications.

DashVector, a fully managed vector retrieval service, was officially released. Built on Proxima, the high-performance vector retrieval engine Alibaba Cloud has developed in-house over eight years, it provides a cloud-native, fully managed vector search service with horizontal scalability. Hologres, OpenSearch, and Elasticsearch have each upgraded their vector capabilities to improve performance in different scenarios. The newly released DataWorks Copilot combines the big data platform's unified metadata, unified scheduling, unified data integration, and unified data modeling with large-model AI capabilities, fully integrating AI with business to create new value.
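What a vector retrieval service does at its core — rank stored embeddings by similarity to a query vector — can be sketched in a few lines of dependency-free Python. This is a brute-force sketch; a production engine like Proxima uses approximate-nearest-neighbor indexes instead of scanning everything:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class TinyVectorIndex:
    """Brute-force stand-in for a vector retrieval service."""
    def __init__(self):
        self.items = {}                # id -> embedding

    def upsert(self, doc_id, vector):
        self.items[doc_id] = vector

    def query(self, vector, top_k=3):
        # Score every stored vector; a real index prunes this search
        # with an ANN structure rather than scoring all items.
        scored = [(cosine(vector, v), doc_id)
                  for doc_id, v in self.items.items()]
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:top_k]]
```

In retrieval-augmented LLM applications, the query vector is the embedding of a user's question and the stored vectors are embeddings of documents, which is why vector services are being upgraded across Hologres, OpenSearch, and Elasticsearch alike.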

After upgrading its overall big data and AI product capabilities for the large model era, Wang Junhua announced that these products have gone fully serverless, committed to giving customers cost-effective products that work out of the box and are paid for on demand. As AI infrastructure for the large model era, Alibaba Cloud's AI and big data platform will continue to invest R&D resources to serve business innovation across industries.


Origin my.oschina.net/u/5583868/blog/10140082