CVPR 2023 | AI that can imitate handwriting and create exclusive fonts for you!


Reprinted from: Heart of the Machine

Researchers from South China University of Technology, the National University of Singapore, The Hong Kong Polytechnic University, and Pazhou Laboratory have jointly proposed an interesting handwritten character generation method: given only a small number of reference samples, it imitates the user's writing style and then generates handwritten characters of arbitrary textual content in that style.

Research background of handwriting imitation AI

As the saying goes, handwriting reveals the person. Compared with rigid standard typefaces, handwriting better reflects the writer's personal character. Many people have surely imagined having a handwritten font of their own to use in social apps and better express their personal style.

However, unlike the English alphabet, Chinese has an extremely large number of characters, so creating one's own exclusive font is very expensive. For example, the newly released national standard GB18030-2022 character set contains more than 80,000 Chinese characters. Reportedly, a blogger on a video site spent 18 hours writing more than 7,000 Chinese characters, wearing out a full 13 pens along the way and leaving his hands numb!

These problems led the authors to ask: can an automatic text generation model be designed to cut the high cost of creating an exclusive font? To this end, the researchers set out to build an AI that imitates handwriting: given only a small number of handwriting samples from a user (about 10), it extracts the writing style they contain (character size, slant, aspect ratio, stroke length and curvature, etc.) and reproduces that style to synthesize further characters, efficiently producing a complete handwritten font for the user.


Furthermore, the authors chose the model's input and output modalities from the perspectives of application value and user experience: 1. Online handwriting in sequence form contains richer information than offline handwriting in image form (the precise positions and writing order of the trajectory points, as shown in the figure below), so setting the model's output to online text opens up broader applications, such as robotic writing and calligraphy education. 2. In daily life it is more convenient to photograph offline text with a phone than to capture online text with devices such as tablets and styluses, so setting the model's input to offline text makes the model easier to use.

[Figure: online handwriting (trajectory-point sequence) vs. offline handwriting (image)]
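To make the two modalities concrete, online handwriting is typically stored as a time-ordered sequence of pen trajectory points rather than as pixels. The sketch below shows one common point format (offsets plus pen state); it illustrates the general convention, not necessarily the exact format SDT uses.

```python
from dataclasses import dataclass

@dataclass
class PenPoint:
    """One step of an online handwriting trajectory."""
    dx: float        # x offset from the previous point
    dy: float        # y offset from the previous point
    pen_down: bool   # True while the pen touches the writing surface

# A character is an ordered point sequence; an offline sample would
# instead be a rendered bitmap, losing the order and pen-state info.
stroke = [PenPoint(0.0, 0.0, True), PenPoint(1.5, -0.3, True),
          PenPoint(0.8, 0.1, False)]
```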

To sum up, the goal of this paper is a stylized online handwriting generation method: the model not only imitates the writing style contained in offline text provided by the user, but also generates controllable online handwriting according to the user's needs.


  • Paper: https://arxiv.org/abs/2303.14736

  • Code: https://github.com/dailenson/SDT

Main challenges

To achieve this goal, the researchers analyzed two key questions: 1. Since a user can provide only a few character samples, can the user's unique writing style be learned from so few references? In other words, is it feasible to replicate a writing style from a small number of samples? 2. The generated text must be controllable in both style and content, so once the user's writing style has been learned, how can it be efficiently combined with the textual content to generate handwriting that matches the user's expectations? Let's look at how SDT (Style Disentangled Transformer), proposed at CVPR 2023, addresses these two problems.

Solution

Research motivation. The researchers observed that personal handwriting usually exhibits two kinds of style: 1. The handwriting of the same writer shares an overall stylistic commonality, and these commonalities differ from writer to writer. Since this characteristic can distinguish writers, the researchers call it the writer style. 2. Beyond the overall commonality, different characters from the same writer show inconsistencies in their details. For example, the characters "黑" (hei) and "杰" (jie) share the same four-dot radical, yet the writer renders that radical slightly differently in each character, reflected in stroke length, position, and curvature. The researchers call this subtle stylistic pattern on glyphs the glyph style. Inspired by these observations, SDT aims to disentangle the writer style and the glyph style from individual handwriting, hoping to better imitate the style of the user's handwriting.


After learning the style information, SDT departs from previous handwriting generation methods, which simply concatenate style and content features: it uses the content features as query vectors to adaptively capture the style information, achieving an efficient fusion of style and content and generating handwriting that matches the user's expectations.


Method framework. The overall framework of SDT, shown in the figure below, comprises three parts: a dual-branch style encoder, a content encoder, and a transformer decoder. First, the paper proposes two complementary contrastive learning objectives that guide the writer branch and the glyph branch of the style encoder to extract their respective styles. Then, SDT uses the transformer's multi-head attention to dynamically fuse the style features with the content features extracted by the content encoder, progressively synthesizing online handwritten text.

[Figure: overall framework of SDT]
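As a rough sketch of how the three components might connect, here is a minimal PyTorch skeleton; the depths, widths, five-dimensional point output, and the concatenated decoder memory are all simplifying assumptions, not the released implementation (see the repository above for the real code).

```python
import torch
import torch.nn as nn

class SDTSketch(nn.Module):
    """Rough data flow of the three SDT components (illustrative only)."""
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        def enc():
            return nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), 2)
        self.writer_branch = enc()    # overall writer style
        self.glyph_branch = enc()     # fine-grained glyph style
        self.content_encoder = enc()  # content of the characters to write
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True), 2)
        self.to_point = nn.Linear(d_model, 5)  # e.g. (dx, dy, pen states)

    def forward(self, style_seq, content_seq, prev_point_emb):
        # style_seq: features of the user's reference samples, (B, S, D)
        writer_style = self.writer_branch(style_seq)
        glyph_style = self.glyph_branch(style_seq)
        content = self.content_encoder(content_seq)   # (B, C, D)
        # simplified: one concatenated memory; the paper instead fuses the
        # two styles sequentially via cross-attention (see part (c) below)
        memory = torch.cat([writer_style, glyph_style, content], dim=1)
        return self.to_point(self.decoder(prev_point_emb, memory))
```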

(a) Writer-style contrastive learning. SDT proposes a supervised contrastive learning objective (WriterNCE) for writer-style extraction: it pulls together character samples belonging to the same writer and pushes apart samples belonging to different writers, explicitly guiding the writer branch to focus on the stylistic commonalities in an individual's handwriting.
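The paper's exact WriterNCE formulation is not reproduced here; below is a minimal sketch of a supervised contrastive loss in the same spirit, assuming L2-normalized writer-branch embeddings and integer writer IDs as labels.

```python
import torch
import torch.nn.functional as F

def writer_contrastive_loss(features, writer_ids, temperature=0.1):
    """Supervised contrastive loss over writer identities.
    features: (N, D) embeddings from the writer branch.
    writer_ids: (N,) integer writer labels."""
    z = F.normalize(features, dim=1)                 # compare in cosine space
    sim = z @ z.t() / temperature                    # (N, N) pairwise logits
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))  # exclude self-pairs
    pos_mask = (writer_ids.unsqueeze(0) == writer_ids.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # average log-likelihood over each anchor's same-writer positives
    pos_counts = pos_mask.sum(1).clamp(min=1)
    loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_counts
    return loss[pos_mask.any(1)].mean()              # anchors with >=1 positive
```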

(b) Glyph-style contrastive learning. To learn the more fine-grained glyph style, SDT proposes an unsupervised contrastive learning objective (GlyphNCE) that maximizes the mutual information between different views of the same character, encouraging the glyph branch to focus on the detailed patterns within characters. Specifically, as shown in the figure below, two independent samplings are performed on the same handwritten character to obtain a pair of positive views containing stroke-detail information, while negative views are sampled from other characters. Each sampling randomly selects a small number of patches as a new view of the original sample's details; the patches are drawn from a uniform distribution, preventing any region of a character from being over-sampled. To better guide the glyph branch, the sampling acts directly on the feature sequence output by the glyph branch.

[Figure: sampling positive and negative views for glyph-style contrastive learning]
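Again as an illustrative sketch rather than the authors' code: the view construction can be mimicked by uniformly sampling small subsets of the glyph branch's output feature sequence, then contrasting the two views of the same character against views of other characters with an InfoNCE-style objective.

```python
import torch
import torch.nn.functional as F

def sample_view(feat_seq, k):
    """Uniformly sample k vectors from a (L, D) glyph feature sequence
    and mean-pool them into a single view embedding."""
    idx = torch.randperm(feat_seq.size(0), device=feat_seq.device)[:k]
    return feat_seq[idx].mean(dim=0)

def glyph_contrastive_loss(feat_seqs, k=8, temperature=0.1):
    """InfoNCE between two sampled views of each character; the other
    characters in the batch serve as negatives.
    feat_seqs: list of (L_i, D) feature sequences, one per character."""
    v1 = F.normalize(torch.stack([sample_view(f, k) for f in feat_seqs]), dim=1)
    v2 = F.normalize(torch.stack([sample_view(f, k) for f in feat_seqs]), dim=1)
    logits = v1 @ v2.t() / temperature   # (B, B); diagonal = positive pairs
    targets = torch.arange(len(feat_seqs), device=logits.device)
    return F.cross_entropy(logits, targets)
```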

(c) Fusing style and content information. Having obtained the two style features, how can they be efficiently fused with the content encoding learned by the content encoder? To this end, at each decoding step t, SDT takes the content feature q as the initial element and combines it with the trajectory points output before step t to form a new content context. The content context then serves as the query vector and the style information as the key and value vectors; under the cross-attention mechanism, the content context dynamically aggregates the two kinds of style information in sequence.
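A minimal sketch of such a fusion step, assuming both style features and the content context have already been projected to a common model width; the layer sizes and the writer-then-glyph attention order are assumptions for illustration.

```python
import torch.nn as nn

class StyleFusionDecoderLayer(nn.Module):
    """One decoding layer of the fusion scheme sketched above: the content
    context (content feature q plus previously generated trajectory points)
    queries the writer style, then the glyph style, via cross-attention."""
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.writer_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.glyph_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, content_ctx, writer_style, glyph_style):
        # content_ctx: (B, T, D), content feature followed by the embedded
        # trajectory points generated so far (causally masked in practice)
        x = content_ctx + self.self_attn(content_ctx, content_ctx, content_ctx)[0]
        # content context as query; style features as key & value
        x = x + self.writer_attn(x, writer_style, writer_style)[0]
        x = x + self.glyph_attn(x, glyph_style, glyph_style)[0]
        return x + self.ffn(x)
```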

Experiments

Quantitative evaluation. SDT achieves the best performance on Chinese, Japanese, Indic, and English datasets; on the style score metric in particular, it marks a major step beyond the previous SOTA methods.


Qualitative evaluation. For Chinese generation, compared with previous methods, the handwritten characters generated by SDT avoid structural collapse and faithfully reproduce the user's writing style. Thanks to glyph-style learning, SDT also renders the stroke details of characters well.


SDT also performs well on other languages. In Indic-script generation especially, existing mainstream methods tend to produce broken characters, whereas SDT maintains correct character content.


Effect of different modules on performance. As shown in the table below, the modules proposed in this paper work synergistically and effectively improve the imitation of the user's handwriting. Specifically, adding the writer style improves SDT's imitation of a character's overall style, such as slant and aspect ratio, while adding the glyph style improves the stroke details of the generated characters. Compared with the simple fusion strategies of existing methods, SDT's adaptive dynamic fusion strategy substantially improves character generation across all metrics.

[Table: ablation study of the proposed modules]

Visual analysis of the two styles. Applying the Fourier transform to the two style features yields the spectra below. The writer style contains more low-frequency components, while the glyph style concentrates on high-frequency components. Intuitively, low-frequency components capture an object's overall outline, while high-frequency components attend to its fine details. This finding further validates and explains the effectiveness of decoupling the two writing styles.

[Figure: frequency spectra of the writer-style and glyph-style features]
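The frequency analysis itself is easy to reproduce on any 2-D feature map; below is a minimal sketch, with random arrays standing in for the actual SDT style features (whose shapes are not specified in this post).

```python
import numpy as np

def log_spectrum(feature_map):
    """Centered log-magnitude spectrum of a 2-D map: energy near the
    center is low frequency (overall outline); energy far from the
    center is high frequency (fine detail)."""
    f = np.fft.fftshift(np.fft.fft2(feature_map))
    return np.log1p(np.abs(f))

# random arrays stand in for the actual (unspecified) style features
writer_feat = np.random.rand(32, 32)
glyph_feat = np.random.rand(32, 32)
print(log_spectrum(writer_feat).shape, log_spectrum(glyph_feat).shape)
```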

Outlook

With handwriting-imitation AI, you can create your own exclusive font and better express your personal style on social platforms!

