Tencent open-sources its Hunyuan text-to-image model: same architecture as Sora, free for commercial use

On May 14, Tencent announced that its Hunyuan text-to-image model has been fully upgraded and open sourced. The release is available on Hugging Face and GitHub and includes the model weights, inference code, and model algorithms. It is free for commercial use by enterprises and individual developers.

This is the industry's first Chinese-native open source text-to-image model built on the DiT architecture. It supports bilingual Chinese and English input and understanding and has 1.5 billion parameters. The upgraded Hunyuan text-to-image model adopts the same DiT architecture as Sora, so it can not only generate images from text but also serve as a foundation for multimodal visual generation such as video.

Evaluation data shows that the latest Tencent Hunyuan text-to-image model far outperforms the open source Stable Diffusion models, making it the best open source text-to-image model currently available, and that its overall capabilities are at an internationally leading level.

 

A self-developed, new-generation text-to-image model

The strong performance of large models depends on a leading technical architecture. The upgraded Tencent Hunyuan text-to-image model adopts the new DiT architecture (DiT, Diffusion with Transformer), a diffusion model built on the Transformer architecture. The same architecture is a key technology behind Sora and Stable Diffusion 3.

In the past, visual generation diffusion models were mainly based on the U-Net architecture. As parameter counts have grown, however, diffusion models based on the Transformer architecture have shown better scalability, which helps further improve generation quality and efficiency. Tencent Hunyuan was among the first in the industry to explore combining a large language model with the DiT structure for text-to-image generation. In July 2023, the Tencent Hunyuan text-to-image team settled on the DiT architecture as its direction and began developing a new generation of models. At the beginning of this year, the Hunyuan text-to-image model was fully upgraded to the DiT architecture.
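To make the Transformer-based design more concrete, the sketch below shows the patch-to-token step that DiT-style models generally rely on: the latent image is cut into small tiles, and each tile becomes one token for the Transformer. The grid size, patch size, and function names are illustrative assumptions, not Hunyuan's actual configuration or code.

```python
from typing import List

Grid = List[List[float]]


def patchify(latent: Grid, patch: int) -> List[List[float]]:
    """Split a 2-D latent grid into non-overlapping patch x patch tiles.

    Each tile is flattened in row-major order into one token.
    """
    height, width = len(latent), len(latent[0])
    if height % patch or width % patch:
        raise ValueError("latent size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token = [latent[top + dy][left + dx]
                     for dy in range(patch)
                     for dx in range(patch)]
            tokens.append(token)
    return tokens


def token_count(height: int, width: int, patch: int) -> int:
    """Number of tokens the Transformer sees for a given latent size."""
    return (height // patch) * (width // patch)


# Hypothetical 4 x 4 latent with values 0..15, split into 2 x 2 patches.
latent = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
tokens = patchify(latent, patch=2)
```

Because the sequence length grows with the square of the latent side length, a larger image simply means more tokens for the same Transformer blocks, which is one reason this family of models is considered easy to scale.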

Building on the DiT architecture, the Tencent Hunyuan team optimized the model's long-text understanding at the algorithm level. The model supports input of up to 256 characters, an industry-leading level. The team also implemented multi-turn image generation and dialogue at the algorithm level: starting from an initial generated image, users can adjust it through natural language descriptions until they reach a more satisfactory result.

Native Chinese support is another highlight of Tencent's Hunyuan text-to-image model. Previously, the core datasets of mainstream open source models such as Stable Diffusion were mainly in English, and these models had limited understanding of Chinese language, food, culture, and customs. The Hunyuan text-to-image model is the first Chinese-native DiT model with bilingual Chinese and English understanding and generation capabilities. It performs well at generating Chinese elements such as ancient poetry, slang, traditional architecture, and Chinese food.

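Evaluation results show that the overall visual generation quality of the new-generation Tencent Hunyuan text-to-image model has improved by more than 20% over the previous generation. Semantic understanding, image texture, and realism have all improved, with significant gains in scenarios such as multi-turn dialogue, multiple subjects, Chinese elements, and realistic portrait generation.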

 

Fully open source to benefit the industry

Tencent's Hunyuan text-to-image capability is already widely used across many businesses and scenarios, including creative material production, product image synthesis, and game art. At the beginning of this year, Tencent Advertising released Tencent Advertising Miaosi, a one-stop AI advertising creative platform based on Tencent's Hunyuan model. It provides advertisers with multi-scenario creative tools such as text-to-image, image-to-image, and product background synthesis, effectively improving advertising production and delivery efficiency. More than 20 media outlets, including CCTV News, Xinhua Daily, Shenzhen Special Economic Zone Daily, Southern Metropolis Daily, and Yangcheng Evening News, have also used Tencent Hunyuan text-to-image generation for news content production.

Lu Qinglin, head of Tencent's text-to-image team, said: "Tencent's Hunyuan text-to-image research and development philosophy is pragmatic: it comes from practice and goes back into practice. By fully open sourcing the latest generation of the model, we hope to share Tencent's practical experience and research results in text-to-image generation with the industry, enrich the Chinese text-to-image open source ecosystem, jointly build the next generation of the visual generation open source ecosystem, and help accelerate the development of the large model industry."

With Tencent's open source text-to-image model, developers and enterprises can run inference directly without retraining and can build their own AI painting applications and services on top of the Hunyuan text-to-image model, saving considerable manpower and computing power. The transparent, open algorithm also helps ensure the security and reliability of the model.

At the same time, an open and cutting-edge Hunyuan text-to-image foundation model helps build a Chinese-centered text-to-image open source ecosystem alongside the English open source community dominated by Stable Diffusion. It also encourages a more diverse set of native plug-ins and promotes the research, development, and application of image generation technology for Chinese culture.

Tencent has long embraced open source and has open sourced more than 170 high-quality projects. All of them originate from Tencent's real business scenarios and cover core business areas such as WeChat, Tencent Cloud, Tencent Games, Tencent AI, and Tencent Security. To date, these projects have received follows and likes from more than 470,000 developers on GitHub.


Origin my.oschina.net/u/6852546/blog/11114841