Exploration and practice of AIGC technology in the Taotao Xiu scenario




This article describes the explosive growth of AIGC-related fields and discusses the design ideas and technical solutions behind Taotao Xiu (AI Buyer Show). It covers key technologies such as image generation, realistic likeness generation, background replacement, and chaining models into a pipeline. It also walks through how to use Taotao Xiu, the problems we encountered, and how we handled them. Finally, it looks ahead to future AIGC trends, including a better model usage experience, automated material generation, and exploration of new product forms.


Introduction


AIGC-related fields have grown explosively. In image AI, professional tools such as Midjourney and Stable Diffusion (SD) have appeared, and consumer apps for generating AI portraits, such as Miaoya Camera and Meitu Xiuxiu, have also launched.


Given Taobao's existing base of users and merchants, we can design an AI tool that connects the two around the theme of AI-generated product endorsements by users. It helps merchants make product displays more attractive and, at the same time, enhances the user experience through personalized, creative AI-generated images.


Against this background, our product team designed an immersive generative product, Taotao Xiu (also called AI Buyer Show). Users upload their photos to create product endorsements that "look like me, only better". Combined with interactive gameplay, it can spark users' creative interest and improve business metrics.


Keywords: innovative image AI applications, easy user creation, content sharing, and integration with merchant brands.


Technical research

The full product is fairly complex; here we focus on the AIGC-related technical capabilities. The buyer-show product design depends on the following AIGC capabilities:

  1. Material generation: generate high-quality material templates showing the products users want to endorse, so that the final results look better;

  2. User image generation: combine material templates with user photos to generate pictures featuring the user;

  3. Background generation and replacement: on top of the user pictures, swap in different background styles for more variety (still being planned and not yet online; stay tuned).


The related product flow is outlined below. This is a fairly early version; some of the plans in the research diagram have since been adjusted, but the overall process is similar:

  1. Generate material -> Configure template -> Generate user endorsement images -> Produce the final images;

  2. In parallel, we are considering generating some video content.
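The flow above is a chain of stages in which each stage consumes the previous one's output. A minimal sketch of that chaining, assuming every stage is a plain function over a shared record; `Asset`, the stage functions, and their string outputs are hypothetical stand-ins for real model calls, not the team's actual services:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Asset:
    """Hypothetical record handed from one stage to the next."""
    prompt: str
    user_photo: str
    material: str = ""
    template: str = ""
    endorsement: str = ""
    log: List[str] = field(default_factory=list)


Stage = Callable[[Asset], Asset]


def generate_material(a: Asset) -> Asset:
    # Stand-in for a text-to-image call (Midjourney, Wanxiang, ...).
    a.material = f"material({a.prompt})"
    a.log.append("generate_material")
    return a


def configure_template(a: Asset) -> Asset:
    # Stand-in for an operator selecting the material as a template.
    a.template = f"template({a.material})"
    a.log.append("configure_template")
    return a


def generate_endorsement(a: Asset) -> Asset:
    # Stand-in for merging the template with the user's photo (e.g. face swap).
    a.endorsement = f"endorsement({a.template}, {a.user_photo})"
    a.log.append("generate_endorsement")
    return a


def run_pipeline(asset: Asset, stages: List[Stage]) -> Asset:
    """Feed each stage's output into the next, in order."""
    for stage in stages:
        asset = stage(asset)
    return asset


result = run_pipeline(
    Asset(prompt="suitcase, man, airport", user_photo="me.jpg"),
    [generate_material, configure_template, generate_endorsement],
)
print(result.endorsement)
```

Keeping stages interchangeable means one model can be swapped for another (for example Wanxiang instead of Midjourney for material generation) without touching the rest of the chain.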



▐Comparison of image generation solutions  


When the requirement is a combination of (real person + scene + product category), we generate good sample material images for users to work from, using several models along the way.


Looking back, I think these models should be compared along several dimensions:

  1. Accuracy (ease of use): how consistent the generated image is with the prompt.

  2. Scalability (API access and automation): whether the model supports API access. With an API, tasks can run automatically and free up operations staff, which affects speed and efficiency.

  3. Success rate: roughly how many images must be generated to get one usable photo, and whether that rate is within an acceptable range.
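To make the success-rate dimension concrete: if each generated image is independently usable with probability p (a simplifying assumption), the number of tries until the first usable image follows a geometric distribution with mean 1/p. A minimal sketch using the success rates quoted in the comparison below; the function names are illustrative:

```python
import math


def expected_attempts(p: float) -> float:
    """Mean number of generations until the first usable image (geometric distribution)."""
    return 1.0 / p


def attempts_for_confidence(p: float, q: float = 0.95) -> int:
    """Smallest n such that P(at least one usable image in n tries) >= q."""
    return math.ceil(math.log(1 - q) / math.log(1 - p))


# Success rates from this article (lower bound used for Wanxiang's 10~50% range).
rates = {"Midjourney": 0.5, "Tongyi Wanxiang (low end)": 0.1, "Stable Diffusion": 0.01}
for name, p in rates.items():
    print(f"{name}: ~{expected_attempts(p):.0f} tries on average, "
          f"{attempts_for_confidence(p)} tries for a 95% chance of one usable image")
```

Under these assumptions, Midjourney needs about 5 tries for a 95% chance of a usable image, while Stable Diffusion at 1% needs 299, which is why success rate weighs so heavily once generation is scaled up.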


We used the following prompt:
An ultra-realistic photograph captured with the aesthetics of an iPhone camera, portraying a modern Chinese woman in a distinctive location in Shanghai. The woman is sitting on a wooden bench, the backdrop is softly blurred showcasing the city's unique architecture. The park is filled with lush greenery and vibrant flowers, exuding tranquility. Soft sunlight bathes the woman's visage and hair, creating a subtle and natural glow. The image, shot in high resolution with a 750:1200 aspect ratio, exudes the character's authentic charm and elegance.

Results from the different models:


Model: Midjourney

Features: high ease of use; no scalability; success rate as high as 50%.

Advantages: high generation quality, good photorealistic results, and the ability to generate complex images.

Disadvantages: access restrictions; no API, so it cannot be integrated directly with our system; rate limits, with a single user generally able to generate only once per minute.

[Figure: sample renderings]

Bad cases: very few; the differences are mostly about style, camera angle, and similar details.

Model: Tongyi Wanxiang
Features: high ease of use; high scalability; medium success rate of 10% to 50%.
Advantages: internal product; supports API access; easy to use.
Disadvantages: results are slightly worse for photorealistic scenes, though still acceptable; relatively expensive, with a single image priced at 0.16 yuan on the official website.
[Figure: sample renderings]
Bad cases: the face is sometimes deformed.
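The 0.16 yuan list price is per generated image, not per usable image. Dividing by the success rate gives the expected spend per usable image. A minimal sketch, assuming independent attempts and the 10% to 50% success range quoted above; `cost_per_usable_image` is an illustrative name:

```python
def cost_per_usable_image(price_per_image: float, success_rate: float) -> float:
    """Expected spend to obtain one usable image, assuming independent attempts."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return price_per_image / success_rate


PRICE_YUAN = 0.16  # official-website price per generated image, as quoted above
for rate in (0.5, 0.1):
    cost = cost_per_usable_image(PRICE_YUAN, rate)
    print(f"success rate {rate:.0%}: ~{cost:.2f} yuan per usable image")
```

At those rates this works out to roughly 0.32 to 1.60 yuan per usable image, before any manual screening cost.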


Model: Stable Diffusion
Features: low ease of use; high scalability; low success rate of about 1%.
Advantages: open source; supports custom models and self-hosted deployment; results can be very good after tuning.
Disadvantages: hard to use; prompts are difficult to tune and good results take more time; a given model tends to handle only one type of subject well, and combining it with specific product categories or scenes causes bigger problems.
[Figure: sample renderings]
In practice, the results were not very good.
Bad cases: the failure rate is still relatively high.


Model: DALL·E
Features: DALL·E 3 currently does not produce good photorealistic results, so we set it aside for now; DALL·E 2 is somewhat poor at following the prompt.
Advantages: supports API access; can generate high-resolution images; follows prompts with relatively high fidelity.
Disadvantages: access restrictions; photorealistic results are still close to unusable.
[Figure: sample renderings]

Bad cases: in our scenario, its images of real people generally feel like bad cases.

Model: Duiyou
Features: no obvious advantage over the models above in style, size, or generation speed.
Advantages: internal product; results are acceptable and follow the prompt reasonably well.
Disadvantages: there is no team we could contact, and the official website offers no API; it also deforms when combined with specific product categories; limited styles; limited image sizes.
[Figure: sample renderings]


Overall conclusions:

  1. Midjourney produces the best results, but its workflow requires continuous manual involvement, which means high time costs.

  2. Between Wanxiang and Stable Diffusion, Wanxiang gives better results; to scale up, consider Wanxiang.

  3. SD performs worse in general scenarios, but it offers comprehensive customization capabilities.


Based on their respective characteristics, the models are summarized as follows:

[Table: summary of each model's characteristics]


▐Comparison of realistic likeness generation solutions  


To give users a stronger sense of involvement, the generated pictures need to carry the user's own likeness. Our algorithm colleagues investigated two approaches, digital clones and face swapping; the results were roughly as follows:

[Figure: digital clone vs. face swap results]


Taking into account resource constraints and the quality of the underlying material, the face-swap step uses the mainstream Roop model.


▐Background replacement solution (under testing)  


Currently the only available approach is SD inpainting: first cut out the person using SemanticGuidedHumanMatting, then fill in a new background. Because the background style is generated, the prompts may not cover every scene, and there are certain restrictions on the input images, so the results are somewhat unpredictable.
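The cut-out-then-fill flow can be sketched as two array operations around the inpainting call. A minimal NumPy sketch, assuming the matting model returns an H x W alpha matte in [0, 1] (1 = person); the SD inpainting call itself is omitted, `generated_bg` stands in for its output, and the function names are hypothetical. In common SD inpainting setups, white mask pixels are regenerated and black pixels are kept, which is the convention used here:

```python
import numpy as np


def inpaint_mask_from_matte(alpha: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binary inpainting mask: 255 (white) marks background pixels to regenerate."""
    # alpha is an H x W matte in [0, 1] where 1 means "person" (assumed matting output).
    return np.where(alpha < threshold, 255, 0).astype(np.uint8)


def composite(person_rgb: np.ndarray, new_bg_rgb: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Blend the original person back over the generated background using the soft matte."""
    a = alpha[..., None].astype(np.float32)  # H x W x 1, broadcasts over the RGB channels
    out = a * person_rgb.astype(np.float32) + (1.0 - a) * new_bg_rgb.astype(np.float32)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


# Toy 2 x 2 example: left column is person (alpha 1.0 and 0.5), right column is background.
alpha = np.array([[1.0, 0.0], [0.5, 0.0]])
person = np.full((2, 2, 3), 200, dtype=np.uint8)
generated_bg = np.zeros((2, 2, 3), dtype=np.uint8)  # stand-in for the SD inpainting output
mask = inpaint_mask_from_matte(alpha)
blended = composite(person, generated_bg, alpha)
```

Pasting the person back with the soft matte keeps their pixels intact even if the inpainting model drifts near the edges, and the fractional alpha values give a smooth transition at hair and clothing boundaries.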


We are still experimenting to find which form works best.


Some limitations:

  1. The person should not take up too little of the frame; the background area should not be too large.

  2. The person should not be holding anything, and should not be leaning on or sitting on objects such as sofas; otherwise the output will also contain strange content.
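The first limitation can be enforced as a cheap pre-check before spending GPU time on inpainting. A minimal sketch, assuming the same alpha matte as input and a hypothetical 15% minimum person area; the second limitation (hand-held objects, leaning on furniture) would need an object or pose detector and is not covered here:

```python
import numpy as np

MIN_PERSON_FRACTION = 0.15  # hypothetical cut-off; would need tuning on real uploads


def person_fraction(alpha: np.ndarray, threshold: float = 0.5) -> float:
    """Share of pixels the matte labels as person."""
    return float((alpha >= threshold).mean())


def accept_for_background_swap(alpha: np.ndarray) -> bool:
    """Reject uploads where the person is too small relative to the background."""
    return person_fraction(alpha) >= MIN_PERSON_FRACTION


tiny = np.zeros((10, 10))
tiny[0, 0] = 1.0      # person covers 1% of the frame
large = np.zeros((10, 10))
large[:5, :] = 1.0    # person covers 50% of the frame
```

Rejecting such inputs up front, with a prompt asking the user for a closer photo, is cheaper than generating a background and discarding a strange result afterwards.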



▐Chaining models into a pipeline  


Below is how Wanxiang performs in one scenario, from initial generation to the final result.

Target scene: Suitcase - Man - Airport

Adjusted prompt: An Instagram-style portrait that serves as a luggage advertisement featuring a 20-year-old Chinese boy. He's sitting inside an airport with a suitcase next to him, holding a cup of coffee. The background is the airport, creating a high-end atmosphere. You can see the boy's complete face and facial features. He's posing dynamically and relaxed, creating a sophisticated composition, shot using a film camera, 8k


Four images generated at random with Tongyi Wanxiang. (With a good prompt, the success rate feels acceptable; judge the results for yourself.)
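Because each target scene is a (product category + person + scene) triple, the adjusted prompt can be turned into a reusable template so operators only fill in slots. A minimal sketch; `SceneSpec`, `TEMPLATE`, and `build_prompt` are hypothetical names, and the skeleton is adapted from the prompt above rather than a tuned production prompt:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneSpec:
    product: str  # product category, e.g. "luggage"
    person: str   # who appears in the shot
    scene: str    # where the shot takes place
    action: str   # what the person is doing with the product


# Skeleton adapted from the adjusted prompt above; illustrative, not a tuned prompt.
TEMPLATE = (
    "An Instagram-style portrait that serves as a {product} advertisement "
    "featuring {person}. {action}. The background is {scene}, creating a "
    "high-end atmosphere. The complete face and facial features are visible. "
    "Shot using a film camera, 8k"
)


def build_prompt(spec: SceneSpec) -> str:
    """Fill the (product category + person + scene) slots into the template."""
    return TEMPLATE.format(product=spec.product, person=spec.person,
                           scene=spec.scene, action=spec.action)


prompt = build_prompt(SceneSpec(
    product="luggage",
    person="a 20-year-old Chinese boy",
    scene="the airport",
    action="He's sitting inside an airport with a suitcase next to him, holding a cup of coffee",
))
print(prompt)
```

Templating the parts that proved reliable (framing, camera, resolution) and varying only the slots makes it easier to batch-generate many scene combinations through an API such as Wanxiang's.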



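Using Taotao Xiu AIGC


  1. Search for 【淘淘秀】 (Taotao Xiu) in the Taobao app.

  2. Tap 【淘淘秀】 to enter the mini program.

  3. Tap Start My Endorsement and upload your own photo.

  4. Your endorsement photos are generated.

  5. Publish the endorsement photos you like to the Square, or keep them private.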


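[Screenshot: searching for 【淘淘秀】]

[Screenshot: entering 【淘淘秀】, starting an endorsement, and uploading a photo]
[Screenshot: generating the endorsement]
[Screenshot: choosing an endorsement]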

Problems and solutions


Some problems we ran into when applying AIGC, and how we handled them:


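Problem 1: The models produce poor results in certain scenarios.

Solution: Bring in the external Midjourney, with images produced and imported manually. Where an internal model can generate the content, use the internal model for batch generation, combining multiple models.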


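Problem 2: Online generation is unstable in quality and consumes a lot of resources.

Solution: Generate offline and screen manually. Pre-generating content reduces resource consumption and makes content quality more consistent.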
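Offline generation with manual screening amounts to a review queue between batch jobs and online traffic. A minimal sketch, assuming each generated image has an ID and a reviewer makes a yes/no call; `MaterialPool` and `Status` are hypothetical names:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class MaterialPool:
    """Offline-generated images awaiting or past manual review."""
    items: Dict[str, Status] = field(default_factory=dict)

    def add_batch(self, image_ids: List[str]) -> None:
        # Output of an offline batch job; nothing is served until reviewed.
        for image_id in image_ids:
            self.items.setdefault(image_id, Status.PENDING)

    def review(self, image_id: str, approved: bool) -> None:
        if self.items.get(image_id) is not Status.PENDING:
            raise ValueError(f"{image_id} is not awaiting review")
        self.items[image_id] = Status.APPROVED if approved else Status.REJECTED

    def servable(self) -> List[str]:
        # Online traffic only ever reads approved, pre-generated material.
        return sorted(i for i, s in self.items.items() if s is Status.APPROVED)


pool = MaterialPool()
pool.add_batch(["a", "b", "c"])
pool.review("a", approved=True)
pool.review("b", approved=False)
```

Serving only from the approved set decouples online latency and GPU load from generation, and gives every user-facing image a human check.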


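Problem 3: Every newly deployed model required writing a new set of TPP code.

Solution: Use vipserver for model lookup and invocation, and write a single gateway for model calls that combines rate limiting and queuing to balance system load and speed up deployment. At first we assumed only TPP could reach the machines hosting the models; later we found that once the IP address is known, an application can call the model service directly, so the TPP layer could be dropped.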
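A minimal sketch of such a gateway, assuming a static registry in place of real vipserver lookups and an injected `send` function in place of the actual RPC; all class and method names are hypothetical. It routes by model name with round-robin over instances, applies a per-model token bucket, and parks overflow in a bounded queue:

```python
import itertools
import time
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple


class TokenBucket:
    """Allows `rate` requests per second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ModelGateway:
    """Single entry point for all model services: lookup, rate limiting, queuing."""

    def __init__(self, registry: Dict[str, List[str]], rate: float, capacity: int,
                 max_queue: int, send: Callable[[str, str], str],
                 clock: Callable[[], float] = time.monotonic):
        # registry: model name -> instance addresses (stand-in for a vipserver lookup)
        self.endpoints = {m: itertools.cycle(addrs) for m, addrs in registry.items()}
        self.buckets = {m: TokenBucket(rate, capacity, clock) for m in registry}
        self.queue: Deque[Tuple[str, str]] = deque()
        self.max_queue = max_queue
        self.send = send  # hypothetical transport, e.g. an HTTP call to the model service

    def submit(self, model: str, payload: str) -> Optional[str]:
        """Call the model now if allowed; otherwise queue it. Returns None if queued."""
        if model not in self.endpoints:
            raise KeyError(f"unknown model: {model}")
        if self.buckets[model].try_acquire():
            return self.send(next(self.endpoints[model]), payload)
        if len(self.queue) >= self.max_queue:
            raise RuntimeError("gateway queue full")
        self.queue.append((model, payload))
        return None

    def drain(self) -> List[str]:
        """Retry queued requests whose model has capacity again, preserving order."""
        results: List[str] = []
        pending: Deque[Tuple[str, str]] = deque()
        while self.queue:
            model, payload = self.queue.popleft()
            if self.buckets[model].try_acquire():
                results.append(self.send(next(self.endpoints[model]), payload))
            else:
                pending.append((model, payload))
        self.queue = pending
        return results
```

Because every model goes through the same `submit` path, deploying a new model means adding a registry entry rather than writing another TPP integration.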


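Problem 4: How to use the generated content.

Solution: Build supporting tools for content export, content retrieval (image search), and content annotation to meet the needs of different scenarios.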
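Image retrieval over generated assets is commonly done by embedding each image into a vector and ranking by cosine similarity. A minimal sketch, assuming embeddings already come from some image encoder (the article does not say which); the asset IDs and vectors are made up for illustration:

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(query: Sequence[float], index: Dict[str, Sequence[float]],
           k: int = 3) -> List[Tuple[str, float]]:
    """Return the k asset IDs whose embeddings are most similar to the query."""
    scored = [(asset_id, cosine(query, vec)) for asset_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


index = {
    "airport_suitcase_01": [0.9, 0.1, 0.0],
    "beach_hat_07": [0.0, 0.2, 0.9],
    "airport_coffee_03": [0.8, 0.3, 0.1],
}
hits = search([1.0, 0.2, 0.0], index, k=2)
```

A linear scan is fine for a few thousand assets; a larger library would typically move to an approximate nearest-neighbour index.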


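Outlook

The first phase, roughly one month, focused on development and launch. Looking ahead, there are several plans and ideas worth trying:

  1. Improve the model usage experience, both in the back office and for users. At present we only ensure the functionality exists; there is still a lot of room to improve how operators can step in and better guide the models in producing material. On the user side, model generation should become better and more stable.
  2. Automated material generation: explore whether, once a content goal is set, models can generate content automatically, increasing the scale and richness of the content.
  3. Product form exploration: from images to video, from images to stories, or adding music, and so on; try some of these forms to find more interesting and engaging products.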

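Beyond the product itself, some changes seem imminent. As models improve in performance and quality, reliance on professional content creators will keep decreasing and content production will become ever more efficient. The internet will carry more and more AI-generated content, with personalized material for each individual, freeing people's imagination...
Of course, an excess of content will have some side effects, but in the end things will surely move in a good direction.

With more and more innovative AI products appearing, we consolidated all the AIGC capabilities covered in this article into an AI platform during this development, so that model capabilities can be reused. Teams interested in similar capabilities are welcome to get in touch, and together we can explore more of what AI makes possible.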

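About the team


The Da Taobao Technology User Operations Platform technical team is a young, user-centered, technology-driven team actively exploring AI. We are committed to improving the user experience across the entire lifecycle through technical innovation and to continuously creating value for users. With innovation as one of our core values, we encourage team members to keep exploring, experimenting, and innovating in their work, to advance industry technology and improve the user experience.

We focus not only on today's industry-leading technology but even more on researching and applying future technology, especially exploring and practicing AI. Team members actively take part in academic research and technical communities, continually exploring new technical directions and solutions. Taking a systematic approach, we build industry-leading user-growth infrastructure; platforms such as our external media placement platform, ABTest platform, and user operations platform power user growth across Alibaba Group, processing hundreds of billions of data records per day at tens of millions of QPS.

On the user growth technology team, we offer a "growth hacker" geek culture and a wide range of roles. If you have a strong interest in AI technology and enjoy exploring, experimenting, and innovating, you are welcome to join us and help drive the application and development of AI in the industry.

Résumé submission email: [email protected]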




This article was shared from the WeChat official account Da Taobao Technology (AlibabaMTT).


Origin my.oschina.net/u/4662964/blog/10149405