Amazing imagination! From a single sentence, AI can dream up animation clips

Yue Paihuai, from Aofei Temple
Produced by QbitAI | Public account QbitAI


Have you seen The Flintstones?

Careful, this question gives away your age.

For the record: The Flintstones is an animated sitcom that first aired in 1960. On Douban, its first season holds an 8.7 rating from some 22,000 users.

Now, this imaginative cartoon has been used to train an AI with an equally amazing imagination. How amazing? Everyone who has seen it says it's jaw-dropping~


Given just a script, a text description, the AI can "imagine" a short animation. Note: these little animated clips are brand-new versions you have never seen before.

Here's how generation works: following the description, the AI finds the corresponding elements in the original cartoon and extracts them, then adjusts their size, proportion, position, angle, props, foreground, background, and so on, and stitches them back together~

Let's go straight to the demos.

To explain: Fred, Wilma, and the rest are the protagonists of the cartoon.

script:

Fred, wearing a red hat, is walking in the living room.

Here is the AI-generated video:

(GIF: generated clip)

script:

Betty and Wilma chat in the living room, sitting on the sofa and trading remarks back and forth.

video:

(GIF: generated clip)

script:

Fred is thinking and talking to himself while driving.

video:

(GIF: generated clip)

script:

Betty is on the phone in the kitchen.

video:

(GIF: generated clip)

How about that? Pretty impressive, right?

The video below packs in even more highlights.


Datasets and Models

How does the AI do it? In short, the researchers first built a dataset from The Flintstones, containing 25,000 cartoon clips, each 75 frames (about three seconds) long.

Each clip is densely annotated.

The annotations include the name of the scene and the main characters: Fred, Wilma, and so on. Infrequent supporting characters get simple manual labels: policeman, old man in red, and the like.
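For illustration, a single clip's annotation might be represented like this. The field names here are hypothetical, not the dataset's actual schema:

```python
# Hypothetical annotation record for one 75-frame clip; the field names are
# illustrative, not the dataset's actual schema.
annotation = {
    "clip_id": "s01e05_01234",           # made-up identifier
    "setting": "living room",
    "description": "Fred is wearing a red hat and is walking in the living room.",
    "characters": ["Fred"],              # recurring main characters
    "extras": ["old man in red"],        # rare roles, annotated manually
    "num_frames": 75,                    # about three seconds of animation
}

# Downstream code can then filter clips by character or setting:
has_fred = "Fred" in annotation["characters"]
```

Dense annotations like this are what let the retrieval step later match a script phrase to concrete clips.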

Then the frames are segmented and reconstructed with the help of the SLIC algorithm (Simple Linear Iterative Clustering), the GrabCut automatic image-segmentation algorithm, the PatchMatch algorithm, and others.

(Figure: dataset segmentation and reconstruction)

After this series of processing steps, the raw material the AI can work with is ready.
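To give a feel for the segmentation step, here is a toy SLIC-style superpixel sketch: k-means clustering over (y, x, intensity) features. This is only a rough approximation in plain NumPy, not the real SLIC algorithm, and nothing like the paper's full SLIC + GrabCut + PatchMatch pipeline:

```python
import numpy as np

def slic_like_superpixels(img, n_segments=2, n_iters=5, spatial_weight=0.5):
    """Toy SLIC-style segmentation: k-means over (y, x, intensity) features.

    A rough illustration of superpixel clustering, not the actual SLIC
    algorithm used in the paper's pipeline.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.stack([ys.ravel() * spatial_weight,
                      xs.ravel() * spatial_weight,
                      img.ravel().astype(float)], axis=1)
    # Seed one center per intensity quantile so clusters start well separated.
    order = np.argsort(feats[:, 2])
    seeds = order[np.linspace(0, len(order) - 1, n_segments).astype(int)]
    centers = feats[seeds].copy()
    for _ in range(n_iters):
        # Assign every pixel to its nearest center, then recompute centers.
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_segments):
            if np.any(labels == k):
                centers[k] = feats[labels == k].mean(axis=0)
    return labels.reshape(h, w)

# A tiny test image: a bright square on a dark background.
img = np.zeros((8, 8))
img[2:6, 2:6] = 255.0
labels = slic_like_superpixels(img, n_segments=2)
# Pixels inside the square share a label; the background gets another one.
inside_same = labels[3, 3] == labels[4, 4]
separated = labels[3, 3] != labels[0, 0]
```

In the real pipeline, regions like these are the units that get cut out, recolored, and recombined into new frames.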

Of course, the centerpiece is the AI model itself.

The model is called CRAFT (Composition, Retrieval and Fusion Network). Structurally, it looks like this:

(Figure: the CRAFT model architecture)

It consists of three main parts: the Layout Composer, the Entity Retriever, and the Background Retriever.

When "imagining" an animation clip, CRAFT starts from an empty video and, following the script, adds the entities in the scene one by one. The Entity and Background Retrievers search the dataset for suitable material, while the Layout Composer adjusts positions and scales.

Finally, all of this is fused into a brand-new clip.

(Figure: the Layout Composer)

The figure above shows how the Layout Composer works.

Of course, there is plenty of math and experimentation in between. If you are interested in those details, go straight to the paper.

Paper: https://arxiv.org/abs/1804.03608

The research comes from several scholars at AI2, UIUC, and other institutions.

Not perfect yet

Of course, at this stage the research is far from airtight.

For one thing, the reconstructed frames are still fairly rough, with clearly visible collage seams.

Also, the AI sometimes goes wrong in understanding the script or reconstructing the video.

For example: getting a pose wrong (standing instead of sitting), holding the phone receiver in the wrong place, background and character motion out of sync, and so on. And then there is the case below.

Script: Wilma is talking to Fred, who sits at the dining-room table reading a book. Fred is absorbed in his book and doesn't hear what Wilma is saying.

(GIF: generated clip)

Look closely and you'll find the two characters' roles are reversed.

And it gets worse.

For extremely complex scenes, for example ones containing three or more rarely seen entities, the clips CRAFT dreams up are nothing short of a "disaster."

Like this.

(GIF: generated clip)

Still, the significance of this work lies in the AI's understanding of text, and the video generation built on top of it. Everything still has room to improve.

Looking further ahead, perhaps the animation studios of the future will no longer have roomfuls of brilliant animators, but instead AI that can rapidly generate cartoons.

We're hiring

QbitAI is recruiting editors and reporters, based in Zhongguancun, Beijing. We look forward to talented, passionate people joining us! For details, reply with the word "招聘" ("recruitment") in the QbitAI official-account chat window.


QbitAI · Signed author on Toutiao

վ'ᴗ' ի Tracking new developments in AI technology and products

