AIGC-related fields have grown explosively. For image generation, professional tools such as Midjourney and Stable Diffusion (SD) have emerged, and consumer apps for generating virtual portraits, such as Miaoya Camera and Meitu Xiuxiu, have launched.
Given Taobao's existing base of users and merchants, we considered designing an AI tool that connects the two around the theme of AI-generated product endorsements by users. It helps merchants make their product displays more attractive, while personalized, creative virtual images improve the user experience.
Against this background, our product team designed an immersive generative product, Taotao Xiu (also called AI Buyer Show). Users upload photos to create product endorsements featuring someone who "looks like me, but better than me." Combined with interactive gameplay, it aims to spark users' creative interest and improve business metrics.
Keywords: innovative image AI applications, easy user creation, content sharing, and integration with merchant brands.
The full product is more complex; here we focus mainly on the AIGC-related technical capabilities. The buyer-show product design depends on the following AIGC capabilities:
Material generation: generate high-quality material templates, i.e. material showing the products users will endorse, so that the final results look better;
User image generation: combine material templates with user photos to generate images featuring the user;
Background generation and replacement: starting from the user images, replace the background style to add variety (still being planned and not yet live, but covered here).
The product pipeline dates from a fairly early stage, and some parts of the plan in our investigation diagram have since been adjusted, but the overall flow is similar:
Generate material -> Configure template -> Generate user endorsement image -> Publish;
We are also considering generating some kinds of video content.
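The four-stage flow above can be sketched as a chain of plain functions, each feeding the next. This is a minimal illustration rather than the team's implementation: the class and function names are hypothetical, model calls are replaced by placeholder URI strings, and the final stage is modeled as publishing the result, which is our reading of that step.

```python
from dataclasses import dataclass


@dataclass
class Material:
    """A generated material image (placeholder: a URI string)."""
    scene: str
    image_uri: str


@dataclass
class Template:
    """A configured template that ties a material to a product."""
    material: Material
    product_id: str


@dataclass
class Endorsement:
    """A user endorsement image derived from a template and a user photo."""
    template: Template
    user_photo: str
    image_uri: str
    published: bool = False


def generate_material(scene: str) -> Material:
    # Placeholder for an image-model call (e.g. Midjourney or Tongyi Wanxiang).
    return Material(scene=scene, image_uri=f"material://{scene}")


def configure_template(material: Material, product_id: str) -> Template:
    # Operators attach the generated material to a concrete product.
    return Template(material=material, product_id=product_id)


def generate_endorsement(template: Template, user_photo: str) -> Endorsement:
    # Placeholder for swapping the user's face into the template image.
    uri = f"endorsement://{template.product_id}/{user_photo}"
    return Endorsement(template=template, user_photo=user_photo, image_uri=uri)


def publish(endorsement: Endorsement) -> Endorsement:
    # Mark the result as delivered to users.
    endorsement.published = True
    return endorsement


def run_pipeline(scene: str, product_id: str, user_photo: str) -> Endorsement:
    material = generate_material(scene)
    template = configure_template(material, product_id)
    return publish(generate_endorsement(template, user_photo))
```

Keeping each stage a separate function mirrors the fact that the stages run at different times: materials and templates are prepared in advance, while endorsements are generated per user.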
▐Comparison of image generation solutions
The requirement was to generate good sample material images of the form (real person + scene + product category) for users to use, and we tried several models along the way.
Looking back, it is worth comparing these models along several dimensions:
Accuracy (ease of use): how consistent the generated image is with the prompt.
Scalability (API access and automation): whether the model supports API access. With an API, tasks can run without manual operation, which affects speed and efficiency.
Success rate: roughly what fraction of generated images is usable, and whether that fraction is acceptable.
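Success rate translates directly into generation cost: if a fraction p of images is usable, you need on average 1/p generations per usable image (the mean of a geometric distribution). A minimal sketch, assuming success is estimated from a manually reviewed sample; the function names are hypothetical.

```python
import math


def success_rate(usable: int, reviewed: int) -> float:
    """Fraction of manually reviewed images judged usable."""
    if reviewed <= 0:
        raise ValueError("reviewed must be positive")
    return usable / reviewed


def expected_attempts(p: float) -> float:
    """Mean number of generations until the first usable image (geometric distribution)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return 1 / p


def attempts_for_confidence(p: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one usable image in n tries) >= confidence."""
    if not 0 < p < 1:
        raise ValueError("p must be in (0, 1)")
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))
```

With the roughly 50% success rate reported for Midjourney, `expected_attempts(0.5)` is 2.0 and `attempts_for_confidence(0.5)` is 5: five tries give at least a 95% chance of one usable image.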
Results by model:
Model: Midjourney
Features: high ease of use; no scalability; success rate up to about 50%.
Advantages: high generation quality, realistic photographic results, and the ability to generate complex images.
Disadvantages: access restrictions; no API, so it cannot be integrated directly with our system; rate limits, with a single user generally able to generate only once per minute.
Sample results:
Bad cases:
Faces are sometimes deformed.
The failure rate is still relatively high.
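The rate limit and success rate above combine into a rough throughput estimate. A sketch under stated assumptions: each generation is treated as yielding `images_per_job` candidates that independently pass review with probability `success_rate`, and the parameter names and account count are hypothetical. It gives an order-of-magnitude figure, not a measurement.

```python
def usable_per_hour(jobs_per_minute: float, success_rate: float,
                    images_per_job: int = 1, accounts: int = 1) -> float:
    """Expected usable images per hour for a rate-limited generator.

    Assumes every candidate image passes review independently
    with probability `success_rate`.
    """
    if not 0 <= success_rate <= 1:
        raise ValueError("success_rate must be in [0, 1]")
    return jobs_per_minute * 60 * images_per_job * success_rate * accounts


def hours_needed(target_usable: int, jobs_per_minute: float, success_rate: float,
                 images_per_job: int = 1, accounts: int = 1) -> float:
    """Expected hours of generation needed to collect `target_usable` usable images."""
    rate = usable_per_hour(jobs_per_minute, success_rate, images_per_job, accounts)
    if rate == 0:
        raise ValueError("throughput is zero")
    return target_usable / rate
```

With one generation per minute, 50% success, and one candidate per generation, `usable_per_hour(1, 0.5)` is 30.0. This excludes the manual prompting and screening time, which is Midjourney's real cost.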
Overall conclusion:
Midjourney produces the best results, but its process requires continuous manual involvement, which means high time costs.
Between Tongyi Wanxiang and Stable Diffusion, Wanxiang gives better results, so it is the one to consider for scaling up;
SD performs worse in general scenarios, but it offers comprehensive customization capabilities.
Based on their respective characteristics, they are summarized as follows:
▐Comparison of solutions for generating user-likeness images
How can the generated images carry the user's own features so that users feel more involved? Our algorithm team investigated two approaches, digital clones and face swapping. The rough results are as follows:
Considering resource costs and the quality of the underlying materials, the face-swapping pipeline uses Roop, the mainstream model.
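Face swapping needs one usable source face from the user's photo. Roop relies on InsightFace for face analysis; the selection rule below (take the largest confident face, and reject photos with no face or a face too small to swap well) is an illustrative assumption, not Roop's own logic. Detections are modeled as plain bounding boxes so the sketch runs without model files, and the thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Face:
    """A detected face: box corners (x1, y1, x2, y2) plus a detector score."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float

    @property
    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def pick_source_face(faces: Sequence[Face], image_w: int, image_h: int,
                     min_area_ratio: float = 0.01,
                     min_score: float = 0.5) -> Optional[Face]:
    """Return the largest confident face, or None if the photo should be rejected."""
    candidates = [f for f in faces if f.score >= min_score]
    if not candidates:
        return None  # no reliable face detected
    best = max(candidates, key=lambda f: f.area)
    if best.area / float(image_w * image_h) < min_area_ratio:
        return None  # face too small to swap convincingly
    return best
```

Rejecting a photo up front lets the app ask the user for a better one instead of producing a poor endorsement.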
▐Background replacement (under testing)
Currently the only available option is SD inpainting: extract the person with SemanticGuidedHumanMatting, then fill in the background. Because the background style is generated, the prompts may not cover every scene, and input images are subject to certain restrictions, so the results are somewhat unpredictable.
We are still exploring which form works best.
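The matting-then-inpainting step hinges on turning the person's alpha matte into an inpainting mask. In diffusers' `StableDiffusionInpaintPipeline`, white pixels in `mask_image` are repainted and black pixels are preserved, so the mask is the binarized, inverted matte. A minimal pure-Python sketch, assuming the matting model returns per-pixel alpha in [0, 1] (1 = person); the 0.5 threshold is a hypothetical default.

```python
from typing import List

Matte = List[List[float]]  # alpha in [0, 1]; 1 = person
Mask = List[List[int]]     # 255 = repaint (background), 0 = keep (person)


def matte_to_inpaint_mask(alpha: Matte, threshold: float = 0.5) -> Mask:
    """Invert and binarize a human-matting alpha map into an inpainting mask.

    Pixels belonging to the person (alpha >= threshold) are kept (0);
    everything else is marked for regeneration (255).
    """
    return [[0 if a >= threshold else 255 for a in row] for row in alpha]
```

Real code would operate on image arrays rather than nested lists, but the inversion is the same.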
Some limitations:
The person should not occupy too small a part of the image, i.e. the background area should not be too large.
The person should not hold objects, sit, or lean on things such as sofas; these cases also produce strange generated content.
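The first limitation can be screened for automatically before generation by measuring how much of the frame the person occupies. A sketch, assuming the matting step returns an alpha map as nested lists of values in [0, 1]; the 15% minimum is a hypothetical threshold. The second limitation (held objects, sitting, leaning) is not captured by this check.

```python
from typing import List


def person_fraction(alpha: List[List[float]], threshold: float = 0.5) -> float:
    """Fraction of pixels whose alpha marks them as part of the person."""
    total = sum(len(row) for row in alpha)
    if total == 0:
        raise ValueError("empty matte")
    person = sum(1 for row in alpha for a in row if a >= threshold)
    return person / total


def accept_for_background_swap(alpha: List[List[float]],
                               min_fraction: float = 0.15) -> bool:
    """Reject photos where the person is too small, i.e. the background dominates."""
    return person_fraction(alpha) >= min_fraction
```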
▐Chaining the models end to end
Target scene: Suitcase - Man - Airport
Adjusted prompt: An Instagram-style portrait that serves as a luggage advertisement featuring a 20-year-old Chinese boy. He's sitting inside an airport with a suitcase next to him, holding a cup of coffee. The background is the airport, creating a high-end atmosphere. You can see the boy's complete face and facial features. He's posing dynamically and relaxed, creating a sophisticated composition, shot using a film camera, 8k
Generate four random images with Tongyi Wanxiang. (With a good prompt, the success rate feels acceptable; you can judge the results for yourself.)
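The prompt above follows the (person + scene + product) pattern, so it can be templated when producing many materials. A minimal sketch with hypothetical field names: the fixed phrases come from the prompt above, while splitting it into slots is our assumption. It keeps the clause requiring a fully visible face, which the later face-swapping step plausibly depends on.

```python
from dataclasses import dataclass


@dataclass
class SceneSpec:
    product: str  # e.g. "luggage"
    person: str   # e.g. "a 20-year-old Chinese boy"
    scene: str    # e.g. "an airport"
    action: str   # e.g. "sitting with a suitcase next to him, holding a cup of coffee"


STYLE = ("You can see the person's complete face and facial features. "
         "Dynamic, relaxed pose, sophisticated composition, shot using a film camera, 8k")


def build_prompt(spec: SceneSpec) -> str:
    """Fill a fixed Instagram-style advertising template from a scene spec."""
    return (f"An Instagram-style portrait that serves as a {spec.product} advertisement "
            f"featuring {spec.person}. The person is {spec.action} in {spec.scene}. "
            f"The background is {spec.scene}, creating a high-end atmosphere. {STYLE}")
```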
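To try it, search for 【淘淘秀】 (Taotao Xiu) in the Taobao app.
Tap 【淘淘秀】 to open the corresponding mini program.
Start your endorsement and upload your own photo.
The app generates your endorsement photos.
Publish the endorsement photos you like to the Plaza, or keep them private.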
Screenshots: Search for 【淘淘秀】 | Start endorsing and upload a photo | Generate the endorsement | Choose an endorsement
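▐Problems and solutions
Some problems we ran into when applying AIGC, and how we handled them:
Problem 1: The models performed poorly in specific scenarios.
Solution: Bring in external Midjourney for manual generation and import. Where internal models can produce the content, use them for batch generation, combining multiple models.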
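Problem 2: Online generation was unstable and resource-intensive.
Solution: Generate offline and screen manually. Pre-generating content reduces resource consumption and makes quality more consistent.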
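Offline generation with manual screening can be organized as a batch-then-review loop: generate candidates ahead of time, store them as pending, and serve only what a reviewer has approved. A sketch with a hypothetical in-memory store; `generate` is any caller-supplied function standing in for the model, returning a candidate id.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ReviewQueue:
    """Pre-generated candidates awaiting manual review, keyed by candidate id."""
    items: Dict[str, Status] = field(default_factory=dict)

    def batch_generate(self, prompts: List[str], generate: Callable[[str, int], str],
                       per_prompt: int = 4) -> None:
        # Runs offline, so model cost is paid ahead of time, not per user request.
        for prompt in prompts:
            for i in range(per_prompt):
                self.items[generate(prompt, i)] = Status.PENDING

    def review(self, candidate_id: str, ok: bool) -> None:
        self.items[candidate_id] = Status.APPROVED if ok else Status.REJECTED

    def servable(self) -> List[str]:
        # Only human-approved content reaches users, which keeps quality consistent.
        return sorted(k for k, s in self.items.items() if s is Status.APPROVED)
```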
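Problem 3: Every model deployment required writing a separate set of TPP code.
Solution: Use vipserver to locate and invoke models, and write a single model-invocation gateway that uses rate limiting and queuing to balance system load and speed up deployment. We initially assumed that only TPP could reach the machines where the models are deployed; later we found that once the IP is known, applications can call the model service directly, so we dropped the TPP layer.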
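A single gateway with per-model rate limiting and a queue can replace one TPP integration per model. The sketch below is an assumption-laden illustration: a dict stands in for vipserver service lookup (we do not reproduce vipserver's API), `call` is any function that sends a request to a resolved address, and the token-bucket parameters are hypothetical. Requests over the limit stay queued rather than being rejected, and are dispatched as tokens refill.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class TokenBucket:
    """Token bucket: refills `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ModelGateway:
    """Routes requests by model name; holds them in a queue while a model is at its limit."""

    def __init__(self, registry: Dict[str, str], call: Callable[[str, str], str],
                 buckets: Dict[str, TokenBucket]):
        self.registry = registry  # model name -> address (stand-in for service discovery)
        self.call = call          # (address, payload) -> result
        self.buckets = buckets
        self.queue: Deque[Tuple[str, str]] = deque()

    def submit(self, model: str, payload: str) -> None:
        if model not in self.registry:
            raise KeyError(f"unknown model: {model}")
        self.queue.append((model, payload))

    def drain(self) -> List[str]:
        """Dispatch queued requests that have capacity; keep the rest queued, in order."""
        results: List[str] = []
        waiting: Deque[Tuple[str, str]] = deque()
        while self.queue:
            model, payload = self.queue.popleft()
            if self.buckets[model].try_acquire():
                results.append(self.call(self.registry[model], payload))
            else:
                waiting.append((model, payload))
        self.queue = waiting
        return results
```

Because the registry resolves a model name to an address, the application calls the model service directly, matching the observation that the TPP layer is unnecessary once the IP is known.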
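Problem 4: How to make use of the generated content.
Solution: Build supporting tools for the content, such as export, retrieval (image search), and annotation, to meet the needs of different scenarios.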
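The article does not say how image retrieval is implemented. One common approach, shown here as an assumption, is to embed each image with a vision model (for example a CLIP-style encoder) and rank stored images by cosine similarity to a query embedding. The sketch takes embeddings as given; producing them is out of scope, and the function names are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def search(index: Dict[str, Sequence[float]], query: Sequence[float],
           k: int = 5) -> List[Tuple[str, float]]:
    """Return the k image ids most similar to the query, best first."""
    scored = ((cosine(vec, query), image_id) for image_id, vec in index.items())
    return [(image_id, score) for score, image_id in heapq.nlargest(k, scored)]
```

A brute-force scan like this is adequate for a modest material library; larger collections would typically use an approximate nearest-neighbour index.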
▐Looking ahead
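- Improve the model experience, both in the back office and for users: so far we have only made sure the functionality exists, and there is much more to do so that operators can step in and better guide the models in producing material. On the user side, generation should also become better and more stable.
- Automated material generation: see whether, once a content goal is set, models can generate content automatically to increase its scale and richness.
- Product form exploration: from images to video, from images to stories, or adding music; we will decide which forms to try in search of more interesting and engaging products.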
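The Taobao Technology (大淘宝技术) user operations platform team is a young, user-centered, technology-driven team actively exploring AI. We are committed to improving the user experience across the entire lifecycle through technical innovation and continually creating value for users. With innovation as one of our core values, we encourage team members to keep exploring, experimenting, and innovating to advance industry technology and improve the user experience.
We focus not only on today's industry-leading technologies but also on researching and applying future technologies, especially in AI. Team members take an active part in academic research and technical communities, continually exploring new technical directions and solutions. Working systematically, we build industry-leading user-growth infrastructure: platforms such as our external media placement platform, ABTest platform, and user operations platform power user growth across Alibaba Group, processing on the order of a hundred billion data records per day with call QPS in the tens of millions.
On the user growth technology team, we offer a "growth hacker" geek culture and a wide range of roles. If you have a strong interest in AI and enjoy exploring, experimenting, and innovating, you are welcome to join us and help advance the application and development of AI in the industry.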
Résumé submission email: [email protected]
This article is shared from the WeChat official account 大淘宝技术 (AlibabaMTT).