FaceChain V2: an open-source human AIGC application platform

1. Overview:

Facechain is a deep-learning toolchain for creating personal digital portraits. Users need to provide as little as one photo to obtain a digital avatar of their own likeness, and by combining different style models and photo templates they can generate personal portrait works beyond imagination. Even more interesting, facechain also integrates speaker-video and virtual try-on functions, making the digital avatar more vivid and realistic and opening up more business value and deployment scenarios.

Since the first open-source v1 release of facechain in August, the team has mainly done the following: 1.) promoted community development (including but not limited to producing live/recorded teaching videos, bringing training courses into universities, and building a developer community); 2.) promoted application development (including but not limited to an AI-photo charity program for the elderly, the DashScope (Lingji) API, and the Wanxiang photo-studio application); 3.) iterated on core functions: one-shot training, the unlimited style plan, SDXL for finer image quality, and new functions such as virtual try-on, speaker video, and animation stylization. Relevant materials are as follows:

a.) Open-source project: https://github.com/modelscope/facechain (FaceChain is a deep-learning toolchain for generating your Digital-Twin)

b.) Paper: https://arxiv.org/abs/2308.14256

c.) Free online experience: FaceChain character photo generation on Tongyi Wanxiang (Alibaba Cloud's AI creative painting platform)

The facechain team has a strong foundation in human perception and understanding technology. This year it has two accepted papers: TransFace (ICCV 2023): https://github.com/modelscope/facechain/tree/main/face_module/TransFace and DamoFD (ICLR 2023): https://github.com/modelscope/facechain/tree/main/face_module/DamoFD, with more work in the pipeline. By investing in human perception and understanding, the team aims to build a more convenient and more expressive framework that upgrades perception and understanding technology for the AIGC era and further promotes the development of human AIGC applications.

Firstly, I will introduce the basic function optimizations brought by facechain v2; secondly, the expanded functions of facechain v2; and finally, the future plan for the facechain v3 version.

2. Function optimization:

1.) One-shot training:

To achieve one-shot training capability as far as possible, facechain v2 focuses on three aspects: a.) reducing the distribution space of training samples, b.) providing a better training starting point through pretraining, and c.) finding suitable LoRA training hyperparameters. Through a large number of experiments, a relatively stable one-shot training capability has been developed: in 80% of cases, users can complete the finetuning of the corresponding character LoRA by uploading a single picture, obtaining an exclusive personal image model and greatly reducing training costs. For training-based portrait generation, facechain has for the first time reduced the training cost to 1/10 of that of SOTA commercial applications, achieving near one-shot training. The corresponding results are shown below:
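To make the recipe concrete, the following is a minimal sketch of the character-LoRA fine-tuning idea using diffusers plus peft. This is not facechain's actual training entry point (see the repo's train scripts); the base-model ID, LoRA rank, and target modules below are illustrative assumptions.

```python
# Minimal sketch of character-LoRA fine-tuning with diffusers + peft.
# NOT facechain's actual entry point: base model, rank, and target
# modules here are illustrative assumptions.
import torch
from diffusers import StableDiffusionPipeline
from peft import LoraConfig, get_peft_model

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
unet = pipe.unet

# Inject low-rank adapters into the UNet attention projections; a small
# rank keeps trainable parameters (and the risk of overfitting to a
# single photo) low.
lora_cfg = LoraConfig(
    r=4,
    lora_alpha=16,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
unet = get_peft_model(unet, lora_cfg)
unet.print_trainable_parameters()

# The training loop (noise-prediction MSE on face-cropped, augmented
# views of the single input photo) is omitted; facechain additionally
# narrows the sample distribution and starts from a face-aware
# pretrained initialization, per the description above.
```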

In addition, facechain is developing a training-free ID-preserving character generation method. Internal experiments already significantly exceed the performance of IP-Adapter. It is expected to be released in the facechain v3 version, referred to as facechain-FaceAdapter technology.
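facechain-FaceAdapter itself is not yet released; for context, the following sketch shows the IP-Adapter baseline it is compared against, using the public diffusers integration (a recent diffusers version is assumed; the input photo path is a placeholder).

```python
# Sketch of the training-free IP-Adapter baseline via diffusers.
# Requires a recent diffusers release with IP-Adapter support; the
# input photo path is a placeholder.
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin"
)
pipe.set_ip_adapter_scale(0.6)  # how strongly the reference face steers generation

face = load_image("user_photo.png")  # placeholder input
image = pipe(
    prompt="portrait photo, studio lighting",
    ip_adapter_image=face,
    num_inference_steps=30,
).images[0]
image.save("id_conditioned_portrait.png")
```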

2.) Unlimited Style Plan:

Compared with the initial facechain v1 release, the v2 version adds hundreds of exquisite styles, and the key point is that they are all free. On many picture/video sharing sites there are already numerous videos showing how to use facechain to generate beautiful photos for free, and facechain has become a powerful tool for free, high-quality photo production. In addition, many freelancers use facechain to provide photo services, and many developers and companies are integrating the facechain API. As part of the unlimited style plan, facechain expects to provide a convenient, highly available one-click style-training interface in the v3 version, referred to as facechain-StyleMaker technology. A selection of the open-source free styles is shown below:
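At inference time, a chosen style is combined with the user's trained character LoRA. Below is a hedged sketch of that combination using diffusers' multi-adapter LoRA loading (a recent diffusers with the peft backend is assumed; paths and adapter names are placeholders, not facechain's file layout).

```python
# Sketch: stacking a character LoRA with a style LoRA at inference time.
# Paths and adapter names are placeholders; requires diffusers with the
# peft backend for multi-adapter support.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

pipe.load_lora_weights("./character_lora", adapter_name="identity")
pipe.load_lora_weights("./style_lora", adapter_name="style")
# Weight identity above style so the face stays recognizable.
pipe.set_adapters(["identity", "style"], adapter_weights=[1.0, 0.6])

image = pipe(
    "portrait of a person, watercolor style", num_inference_steps=30
).images[0]
image.save("styled_portrait.png")
```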

3.) SDXL photo texture:

Facechain v2 integrates the powerful text-to-image model SDXL 1.0, the new-generation text-to-image model released by Stability AI. Rigorous experimental verification has shown SDXL surpassing the various versions of the Stable Diffusion models and rivaling the commercial-grade text-to-image model Midjourney. With the support of SDXL, the texture of facechain's portrait generation has made a qualitative leap. The following pictures are facechain generation results based on SDXL:

It can be seen that: 1) in terms of detail, the generated images are more delicate and textured; 2) in terms of background blur, the bokeh is more dynamic and layered; 3) in terms of facial expression, the subject's expression is more natural and expressive, with a gentler, friendlier smile. The combination of facechain v2 and SDXL opens up a high-quality open-source AI photo experience for users. Of course, there is still room for optimization before professional-level photographic lighting effects are reached; this part calls for training more dedicated LoRA special-effects models, and more and more photography-lighting LoRAs should appear in the future. To further improve photo quality, facechain expects to collect more camera-effect LoRAs and other effective special-effects solutions in the v3 version, referred to as facechain-SpecialEffects technology.
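For reference, here is a minimal sketch of driving SDXL 1.0 through the diffusers library; the model ID is the public Stability AI release, and facechain layers its character LoRA and post-processing on top of a pipeline like this.

```python
# Minimal sketch: text-to-image generation with SDXL 1.0 via diffusers.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

image = pipe(
    prompt="studio portrait photo of a person, 85mm lens, soft bokeh background",
    negative_prompt="lowres, bad anatomy",
    height=1024, width=1024,  # SDXL is trained at 1024x1024
    num_inference_steps=30,
).images[0]
image.save("sdxl_portrait.png")
```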

3. Function expansion:

1.) Virtual try-on:

Virtual try-on is a long-standing topic; related research and development discussion followed the rise of the e-commerce industry. Because of its what-you-see-is-what-you-get nature, it can further improve the clothing-purchase experience: it provides product-display services for merchants and a first-person try-on experience for buyers, giving it both B-side and C-side user value. With the rise of AIGC, virtual try-on has also achieved certain breakthroughs. Facechain v2 adds a virtual try-on function; renderings are shown below:

Depending on whether the clothes need to be deformed during generation, virtual try-on can be divided into deformation ID-preserving and non-deformation ID-preserving variants. The non-deformation (partially ID-preserving) variant has been open-sourced in facechain v2. The deformation ID-preserving try-on technology has been submitted to CVPR and is expected to be open-sourced in facechain v3, referred to as facechain-TryOn technology.

2.) Speaker video:

Speaker generation aims to animate a given portrait so that its lip movements are highly consistent with the driving audio, which is crucial for digital-human applications. Facechain v2 integrates the mainstream open-source algorithm SadTalker. Compared with alternatives such as Wav2Lip and video-retalking, SadTalker can control head posture, facial expressions, and even blink frequency, producing more vivid talking videos. Beyond the original driving functionality, facechain v2's speaker-video module also supports GFPGAN as post-processing to improve generation quality. For audio input, three options are supported: 1) TTS synthesis, 2) microphone recording, and 3) local file upload, so users can choose whichever suits their needs. In addition, users can select one of their previously generated portrait photos as the driving image, connecting the portrait-photo and speaker-generation functions and meeting users' diverse generation needs. The entire processing pipeline is shown in the figure:
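As an illustration of the driving step, the sketch below shells out to SadTalker's public inference.py with GFPGAN enhancement enabled; the flags follow the SadTalker repository's CLI, while the file paths are placeholders.

```python
# Sketch: driving a portrait with SadTalker's public CLI, using GFPGAN
# as the post-processing enhancer. Paths are placeholders; run from a
# SadTalker checkout with its dependencies installed.
import subprocess

subprocess.run([
    "python", "inference.py",
    "--source_image", "portrait_photo.png",  # e.g. a facechain-generated photo
    "--driven_audio", "speech.wav",          # TTS output, mic recording, or local file
    "--enhancer", "gfpgan",                  # face-restoration post-processing
    "--result_dir", "./talking_videos",
], check=True)
```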

In the facechain v3 version, character video generation will be the most important direction for application updates. The facechain team will release the facechain-video function in v3, covering MagicTalker, MagicSinger, MagicLife, MagicDay, MagicMove, and other video functions.

3.) Animation stylization:

Animation stylization converts the person in an input picture into a two-dimensional virtual image and returns the cartoonized result. Facechain v2 integrates the DCT-Net portrait cartoonization model. DCT-Net provides 2D-anime, 3D, hand-drawn, sketch, and art-style face conversion; currently facechain supports only 2D and 3D anime face generation. DCT-Net has low requirements on training-data scale: given a small number of target-style samples, it can learn the mapping from the original style to the target style while retaining the content of the original image. At the same time, DCT-Net not only delivers better facial style-transfer quality and generalization, but can also stylize full-body images of people. Its core idea of "calibrate first, then generate" aligns the target-style domain formed by a few samples with the original domain and uses this to assist the network, so the model better learns the mapping between the original and target styles; a geometric expansion module then relaxes spatial constraints so that the style transfer is more accurate without losing the original image content. The network pipeline is as follows:
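DCT-Net is available as a ready-made ModelScope pipeline; the following sketch assumes the publicly released 2D cartoon compound model.

```python
# Sketch: DCT-Net portrait cartoonization through ModelScope's pipeline
# API, assuming the public 2D cartoon compound model.
import cv2
from modelscope.outputs import OutputKeys
from modelscope.pipelines import pipeline

stylizer = pipeline(
    "image-portrait-stylization",
    model="damo/cv_unet_person-image-cartoon_compound-models",
)
result = stylizer("input_photo.png")  # local path or URL
cv2.imwrite("cartoon.png", result[OutputKeys.OUTPUT_IMG])
```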

4. Future planning:

The facechain v3 version will continue to push on both function optimization and function expansion: a.) at the function-optimization level, it will focus on zero-shot and human-AIGC solutions under the RLHF framework, which will not only raise the upper bound of quality but also significantly reduce computing requirements, down to the CPU level; b.) at the function-expansion level, it will focus on facechain-video character video generation, including but not limited to MagicTalker, MagicSinger, MagicLife, MagicDay, MagicMove, and other character video functions. In addition, the team will continue to build AIGC-friendly human perception and understanding technology, creating more convenient perception and understanding infrastructure for human AIGC applications.

If you are like-minded and want to work together, you can contact the facechain team.
