Applying AI Image-Generation Aesthetics in Practice at Taobao





This article describes how to formulate and apply aesthetic standards to evaluate and improve the quality of AI-generated images, particularly in e-commerce. The work is divided into four steps: formulating aesthetic standards, training an aesthetic model, applying the aesthetic model, and upgrading the Taobao style model.



Definition and analysis of aesthetics


  1. Image quality standards: Within the modern design framework, image quality standards are largely unified. They center on skill and technique, and extend to the quality evaluation of pictures, paintings, photographs, and other images. On top of this, each medium of image-making adds requirements that emphasize its own characteristics.
  2. Image content standards: The quality of expression at the level of ideas is far more open-ended, and image quality standards may be deliberately broken to serve what the content needs to express. These standards are usually defined and interpreted by authoritative figures in the field, such as critics or judges.


Aesthetics Project Goals


  1. Step 1 - Formulate aesthetic standards: Formulate AI image-generation standards and AI style standards through joint research with professors at the China Academy of Art, emphasizing professionalism, relevance, objectivity, and authority.

  2. Step 2 - Train the aesthetic model: Train an aesthetic judgment model on the AI aesthetic standards so that the machine can judge and score images automatically.

  3. Step 3 - Apply the aesthetic model: Use the aesthetic model's capabilities to guide the optimization and upgrading of Taobao's AI image generation model.

  4. Step 4 - Upgrade the Taobao style model: Build a Taobao style model library based on the style standards so that merchants have rich and varied style models to choose from, creating a distinctive Taobao style model.


Step One: Develop Aesthetic Standards


The framework of criteria is defined from the components of an image, with additional attention to the characteristics specific to AI generation:

Image components: subject form / environment / composition / light and shadow / texture

AI generation characteristics: element authenticity and scene plausibility

AI aesthetic standards: 5 criteria, 19 standards


Step Two: Train the Aesthetic Model


  1. Aesthetic model goal: Improve the accuracy with which the machine automatically scores and judges images.

  2. Accuracy: Each image is scored both by the aesthetic model and by human raters, and accuracy is the rate at which the machine and human scores agree.
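
The accuracy definition above can be sketched as follows. This is a minimal illustration, assuming that agreement means the machine score equals the human score on the 1-5 scale; the article does not state the exact matching rule, and the function name `agreement_rate` and its `tolerance` parameter are hypothetical.

```python
def agreement_rate(human_scores, machine_scores, tolerance=0):
    """Fraction of images whose machine score agrees with the human score.

    tolerance=0 requires an exact match on the 1-5 scale; a larger value
    also counts near-misses as agreement (an assumption, not from the article).
    """
    if len(human_scores) != len(machine_scores):
        raise ValueError("score lists must have the same length")
    if not human_scores:
        raise ValueError("need at least one scored image")
    matches = sum(
        1 for h, m in zip(human_scores, machine_scores) if abs(h - m) <= tolerance
    )
    return matches / len(human_scores)


# Illustrative scores only, not data from the article.
human = [5, 4, 2, 3, 1]
machine = [5, 3, 2, 3, 2]
exact = agreement_rate(human, machine)          # exact-match agreement
within_one = agreement_rate(human, machine, 1)  # agreement within one point
```

Reporting both an exact and a within-one-point rate can help separate outright misjudgments from borderline cases between adjacent levels.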


▐Model approach



Our AI aesthetic evaluation model uses multi-modal aesthetic pre-training followed by multi-task fine-tuning. This approach has the following advantages:

  1. The model has few parameters, so it trains and iterates quickly and runs inference fast. It can rapidly screen for high-aesthetic images and can also compare the output quality of different generation models, reducing manual annotation and review costs.

  2. Compared with models that output only an aesthetic score, our model also outputs the abnormal attributes of a generated image, which makes it more interpretable.

  3. The abnormal attributes it outputs can serve as a pre-discriminator for image repair, and can also be used to label abnormal generated images for optimizing the generation model.


▐Training process  


We developed scoring specifications from the aesthetic standards and established a 5-point scoring rule. Designers label images against it to accumulate high-quality AI training data:
  1. Formulate scoring rules: a scoring specification for AI-generated images (5 levels) and a screening rule for original images (3 levels).
  2. Aesthetic evaluation of original model photos (photos of human fashion models): Based on image-quality preferences covering the model, environment, composition, light and shadow, texture, and so on, we trained a dedicated aesthetic model that sorts original model photos into aesthetic tiers. Low-aesthetic types that can be filtered out include blurry images, images with white borders or overlaid text, incomplete or cropped faces, heavily occluded bodies, poor backgrounds, and poor overall aesthetics.
  3. Aesthetic evaluation of AIGC-generated images: This evaluation mainly targets generated images that contain people. It looks at two aspects, the plausibility of the image and how well its elements blend together, and derives the scoring rules from the 5 criteria and 19 standards. It also labels the abnormal attributes of each generated image. The abnormal attributes the model currently supports are poor fusion between person and background (people floating in the air, poor background texture, etc.), hand abnormalities, facial abnormalities, limb abnormalities, and other abnormalities. The output aesthetic score ranges from 1 to 5.

Figure: Images at different aesthetic scores, as predicted by the AIGC generated-image aesthetic evaluation model


Sound training: multiple rounds of cross-checking between humans and the machine ensure high-quality data.

  1. Round 1 - scoring test: Take the average score of 3 people when accumulating data, to keep scoring objective. Where the scores differ, re-examine the specific issues behind the difference and verify again, so that different people interpret the standards consistently and stably (5-point scale).

  2. Round 2 - AI scoring verification: Take the average score of 3 people and compare it with the machine's score. Where the scores differ, re-examine the specific issues behind the difference to determine whether the problem lies with the humans or the machine, so that the two gradually converge and the machine's understanding stays accurate. (This round starts once the first version of the AI judgment model is available.)
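
The second round can be pictured as building a review queue of images where the machine and the averaged human score disagree. The sketch below is an illustration only: the `max_gap` threshold, the function name, and the sample data are assumptions, since the article does not say how large a difference triggers re-examination.

```python
from statistics import mean


def review_queue(human_ratings, machine_scores, max_gap=1.0):
    """List images whose machine score is far from the mean of the human ratings.

    human_ratings maps image id -> list of scores from the human raters.
    machine_scores maps image id -> the model's score.
    max_gap is a hypothetical threshold for "needs re-examination".
    """
    flagged = []
    for image_id, ratings in human_ratings.items():
        human_avg = mean(ratings)
        if abs(human_avg - machine_scores[image_id]) > max_gap:
            flagged.append((image_id, round(human_avg, 2), machine_scores[image_id]))
    return flagged


# Illustrative data: three raters per image, as in the process above.
human_ratings = {"img_a": [4, 5, 4], "img_b": [2, 2, 3], "img_c": [3, 3, 3]}
machine_scores = {"img_a": 4, "img_b": 4, "img_c": 3}
flagged = review_queue(human_ratings, machine_scores)
```

Each flagged image would then be discussed to decide whether the raters or the model misread the standard.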


▐Technical framework

  1. Aesthetic evaluation of AIGC-generated images: The designers' 5-point aesthetic criteria are mapped to five quality levels. We also analyzed the generated data and grouped the attribute reasons into "normal" plus five abnormal attributes: poor fusion of person and background, hand abnormality, facial collapse, body abnormality, and other abnormalities. The quality level and the attribute reasons are combined into an aesthetic evaluation prompt, which serves as input to the multi-modal pre-trained model. The loss function combines an aesthetic score regression loss with a multi-label classification loss over the attribute reasons.

  2. Aesthetic evaluation of original model photos: CLIP shows good zero-shot ability to separate good from bad images along aesthetic dimensions such as image quality, color, lighting, composition, and abstract concepts. In the pre-training stage we therefore strengthen the backbone's aesthetic representations by distilling CLIP's image encoder. In the fine-tuning stage, the improved backbone predicts a normalized aesthetic score. The loss function is a weighted combination of L1 loss and binary cross-entropy loss, which improves the model's performance and robustness. Once training is complete, choosing different score thresholds sorts the model photos into aesthetic tiers.
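
As a rough illustration of the second item, the sketch below combines L1 and binary cross-entropy losses on normalized scores and then assigns tiers by threshold. It is written in plain Python for readability rather than in a training framework. The loss weights, the tier thresholds, and all function names are assumptions, since the article does not give the actual values.

```python
import math


def l1_loss(pred, target):
    """Mean absolute error between predicted and target scores."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy; predictions and targets lie in [0, 1]."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(pred)


def combined_loss(pred, target, l1_weight=1.0, bce_weight=1.0):
    """Weighted sum of the two losses; the weights here are placeholders."""
    return l1_weight * l1_loss(pred, target) + bce_weight * bce_loss(pred, target)


def aesthetic_tier(score, thresholds=(0.4, 0.7)):
    """Map a normalized score to a tier; the thresholds are hypothetical."""
    low, high = thresholds
    if score < low:
        return "low"
    if score < high:
        return "medium"
    return "high"


pred = [0.8, 0.3]    # normalized scores predicted by the model
target = [1.0, 0.0]  # normalized labels
loss = combined_loss(pred, target)
tiers = [aesthetic_tier(s) for s in (0.1, 0.5, 0.82)]
```

One plausible reason for pairing the two terms is that the L1 term keeps the predicted score close to the label in absolute terms, while the cross-entropy term penalizes confident mistakes more sharply.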


▐Testing phase   


Based on test results, we determine whether each problem lies with the machine or with the human raters, and keep tuning the model's accuracy throughout the process.
  1. Tuning for generality: We tested Taobao's in-house [Qianniu Intelligent Model] and third-party models from outside Taobao on the Qianniu platform. Evaluated on the same type of model photos, the aesthetic model proved compatible with both, but the results differed noticeably. Digging into specific image issues, we found that the quality of the uploaded original image affects accuracy. To keep comparisons fair, a standard for test image sets needs to be established.
  2. Authenticity test of machine scoring: Accuracy fluctuates somewhat from week to week, so we build a standard test set suited to the model's current state. A standard test set of 1,200 images is scored both by the AI and by people. Because the difficulty of the original image affects the AI's judgment, the test set is split into three levels (easy, medium, and hard) in a 1:1:1 ratio.
  3. Rigor test of machine scoring: The tuned scoring model automatically scores newly generated images, and its scores are compared with human scores.
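
The 1:1:1 split of the standard test set can be sketched as a stratified sample. This is a minimal illustration that assumes each candidate image already carries a `difficulty` label; the field name, the function name, and the synthetic pool are hypothetical.

```python
import random


def build_test_set(pool, total=1200, levels=("easy", "medium", "hard"), seed=0):
    """Draw an equal number of images from each difficulty level."""
    per_level = total // len(levels)
    rng = random.Random(seed)  # fixed seed keeps the test set reproducible
    selected = []
    for level in levels:
        candidates = [item for item in pool if item["difficulty"] == level]
        if len(candidates) < per_level:
            raise ValueError(f"not enough '{level}' images: {len(candidates)}")
        selected.extend(rng.sample(candidates, per_level))
    return selected


# Synthetic pool for illustration: 3,000 images, evenly labeled.
pool = [
    {"id": i, "difficulty": ("easy", "medium", "hard")[i % 3]} for i in range(3000)
]
test_set = build_test_set(pool)
counts = {
    level: sum(1 for item in test_set if item["difficulty"] == level)
    for level in ("easy", "medium", "hard")
}
```

Fixing the seed keeps the same images in the set from week to week, so that accuracy fluctuations reflect the model rather than the sample.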


Step Three: Apply the Aesthetic Model


Goal: Use the aesthetic model to raise the good-image rate of Taobao's large AI models.


▐Aesthetic model version 1.0 - applying generated-image evaluation capability


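  1. Goal: Use the aesthetic model to evaluate Taobao's generation model, score each image and identify its problems, and repair the problems identified.
  2. Judgment capability: The model scores images (1-5 points), separates good images from bad ones, and informs subsequent optimization of the generation model.
  3. Recognition capability: Five key image attributes can currently be reported: (1) hand abnormalities, (2) person not blending with the background, (3) facial abnormalities, (4) body abnormalities, (5) others.
  4. Repair capability: Drawing good hands has long been a difficulty when AIGC generates people. Hands have many degrees of freedom and complex, varied poses, occupy a small part of the image, and contain a lot of detail, so the success rate for drawing hands is low. In real business scenarios in particular, when the hands in a user-uploaded image are indistinct or are holding an object, the generation model often fails to learn accurate hand details while swapping the model or the background, and draws poor hands. We explored a new hand repair approach. The AI aesthetic evaluation model identifies abnormally generated hands. For each abnormal hand, a 3D hand reconstruction model preserves the correct number of fingers and the hand shape while adaptively producing the gesture the image requires. Building on our internal base model and incorporating text embeddings, the hand is then redrawn as a normal hand according to the reconstructed pose. After repeated parameter tuning and scene adaptation, the approach achieved a repair success rate above 50% in tests on business data, which can substantially raise the overall good-image rate. Examples of hand repair are as follows: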
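
The way scoring, attribute recognition, and repair fit together can be sketched as a simple routing step. Everything in this sketch is hypothetical: the `Evaluation` structure, the `good_threshold` value, and the rule that only images whose sole abnormality is the hand are sent to repair are illustrative assumptions, not the team's actual pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class Evaluation:
    """Output of the aesthetic model for one generated image."""
    score: int                                    # 1-5 aesthetic score
    abnormal: list = field(default_factory=list)  # e.g. ["hand", "face"]


def triage(evaluation, good_threshold=4):
    """Decide what to do with a generated image.

    Returns "repair_hand" when the hand is the only abnormality,
    "reject" for other abnormalities or low scores, otherwise "accept".
    """
    if evaluation.abnormal == ["hand"]:
        return "repair_hand"
    if evaluation.abnormal or evaluation.score < good_threshold:
        return "reject"
    return "accept"


decisions = [
    triage(Evaluation(5)),
    triage(Evaluation(2, ["hand"])),
    triage(Evaluation(2, ["hand", "face"])),
    triage(Evaluation(3)),
]
```

Routing on the abnormal attribute rather than on the score alone is what lets a repair step target a specific defect instead of discarding the whole image.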


▐Aesthetic model version 2.0 - applying original-image evaluation capability


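  1. Goal: Tune the Taobao base model. The current original-image dataset is mixed and uneven in quality, so it needs effective screening and optimization.

  2. Background: The original-image dataset currently comes from two main sources: Visual China (视觉中国) and Taobao model photos.
    Visual China's photographs are mainly supplied as illustrations for news articles, so many of them portray people and scenes in distinctive ways to build a narrative. Taobao model photos have already been post-processed by merchants, and some of the retouching, for example of the models, is quite exaggerated.

  3. Screen for high-quality original images: Use the original-image evaluation model to select high-quality photographs and improve the datasets used to tune the in-house model and other models, raising the good-image rate of generation. (Problems such as cluttered multi-person shots, cluttered backgrounds, and weak scene fusion can be improved.)
    Collect professional original photographs: The design team is currently collecting high-quality model photography.

  4. Use the version 1.0 AI aesthetic evaluation model to influence the generation model so that it adaptively aligns with human preferences: AI aesthetic evaluation can guide diffusion-based generation models. The guidance should not only push the model toward high-aesthetic images but also lower the probability of low-aesthetic ones. To achieve this, we use the AI aesthetic evaluation model to attach abnormal-attribute labels to low-aesthetic, abnormally generated images. This strengthens the generation model's ability to learn the concept of an abnormal image, so that such outputs can be avoided at inference time.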
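
One way to picture the labeling step is to append abnormal-attribute tags to the training caption of each low-aesthetic image. The sketch below is an assumption-heavy illustration: the tag wording, the caption format, and the idea of reusing the same tags as a negative prompt at inference are all hypothetical, since the article only says that such outputs can be avoided at inference time.

```python
# Hypothetical tag text for the five abnormal attributes named in the article.
ABNORMAL_TAGS = {
    "hand": "abnormal hand",
    "face": "abnormal face",
    "body": "abnormal body",
    "fusion": "poor person-background fusion",
    "other": "other abnormality",
}


def tag_caption(caption, abnormal_attributes):
    """Append abnormal-attribute tags to a training caption."""
    tags = [ABNORMAL_TAGS[attr] for attr in abnormal_attributes]
    return ", ".join([caption] + tags)


def negative_prompt():
    """Join all tags so they can be steered away from at inference."""
    return ", ".join(ABNORMAL_TAGS.values())


tagged = tag_caption("a model standing in a garden", ["hand", "fusion"])
clean = tag_caption("a model standing in a garden", [])
avoid = negative_prompt()
```

Images without abnormalities keep their original caption, so the tags only ever co-occur with the defects they name.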


Step Four: Upgrade the Taobao Style Model


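Goal: Build a style model with distinctive Taobao characteristics.
Summarizing the style standards: The style framework has been set. Because the volume of content is large, we will work with graduate students from our university-industry partnership to fill in the style content step by step according to our requirements.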

▐Background on styles


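  1. Style choices are currently not rich enough, and generated scenes and people are concentrated in a few specific types. Styles were originally defined by enumeration. For example, generated background scenes are basically swimming pools, gardens, shopping malls, beaches, forests, and snowy mountains.

  2. Because of where the original images come from, the regional character of the scenes is mostly Western, for example Southeast Asian beaches, European gardens, American shopping malls, American swimming pools, and Nordic snowy mountains.

  3. Because styles are enumerated, the tool offers too many options and the experience is complicated. Merchants have trouble choosing and resort to trial and error.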


▐Style framework


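  1. Map styles to the five criteria of the aesthetic standards, and enumerate subdivisions of each criterion to serve as combination factors.
  2. Style types fall into three categories: platform brand styles, trending styles, and classic art styles.
  3. Combine factors according to style tendencies to form a diverse set of style combinations.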
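
The factor-combination idea can be sketched with a Cartesian product over per-criterion options. The factor names and values below are invented for illustration; the article does not list the actual subdivisions.

```python
from itertools import product

# Hypothetical factors for three of the five criteria.
factors = {
    "environment": ["courtyard", "studio"],
    "light": ["soft daylight", "neon"],
    "texture": ["film grain", "clean"],
}


def combine(factors):
    """Build every style as one choice per factor."""
    names = list(factors)
    return [
        dict(zip(names, combo))
        for combo in product(*(factors[name] for name in names))
    ]


styles = combine(factors)
```

Compared with enumerating whole scenes, a small number of factors yields many styles, and a style tendency can be expressed by restricting the options of one or two factors.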


▐Applying the style standards


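Styles are applied through the front-end AI product. Based on feedback from user usage data, styles are ranked and retired, gradually accumulating the styles that merchants need.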
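
Ranking and retiring styles from usage data might look like the sketch below. The adoption-rate metric, the `keep_top` and `min_adoption` parameters, and the sample numbers are all assumptions made for illustration; the article does not describe the actual ranking signal.

```python
def rank_and_retire(usage, keep_top=3, min_adoption=0.05):
    """Rank styles by adoption rate and retire the weak ones.

    usage maps style name -> (times_shown, times_chosen).
    A style is kept if it is in the top `keep_top` by adoption rate
    and its rate is at least `min_adoption`.
    """
    rates = {
        style: (chosen / shown if shown else 0.0)
        for style, (shown, chosen) in usage.items()
    }
    ranked = sorted(rates, key=rates.get, reverse=True)
    kept = [style for style in ranked[:keep_top] if rates[style] >= min_adoption]
    retired = [style for style in ranked if style not in kept]
    return kept, retired


# Illustrative numbers only.
usage = {
    "beach": (1000, 20),
    "courtyard": (500, 100),
    "studio": (800, 240),
    "mall": (600, 6),
}
kept, retired = rank_and_retire(usage)
```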


Follow-up plans


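  1. Aesthetic standards: Publish the Taobao AI aesthetic standards, completed jointly with the China Academy of Art.

  2. Style standards: Refine the style standards and establish a style system unique to Taobao, while testing it on the product side.

  3. Product capability: Release the AI PaaS product capability, deploy it together with the Qianniu product team, and offer it as a service to the group's in-house AI and to third-party AI, improving compatibility at the same time.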


This article is shared from the WeChat official account 大淘宝技术 (AlibabaMTT).

Origin my.oschina.net/u/4662964/blog/11054257