The practice of AI text creation and publishing on Baidu App

Author | Content Ecosystem Team

Introduction

Large language models (LLMs) refer to language models containing tens of billions (or more) of parameters, which are often trained on large-scale data sets to improve their performance and generalization capabilities. After the content creation tool is connected to Wenxinyiyan's AI capabilities, it can provide users with more intelligent and personalized services, helping authors reduce the difficulty and cost of creation, improve creation efficiency, and better create their own works. This article briefly describes the basic process of AI text creation based on the Wenxin Yiyan model. It is also an initial attempt to combine content creation with AI. As the innovative application of generative AI continues to advance, updates based on pictures and videos will be released one after another. Many unique gameplays, so stay tuned!

The full text is 4732 words and the estimated reading time is 12 minutes.

01 background

With the rapid development of science and technology, large model technology in the field of artificial intelligence has become increasingly attractive. In the wide application of large models, we see its great potential and value in various scenarios. The development of large model technology not only promotes innovation and change in various industries, but is also changing our understanding and expectations of artificial intelligence.

Large language models (LLM) refer to language models containing tens of billions (or more) of parameters. These models are usually trained on large-scale data sets to improve their performance and generalization capabilities. The emergence of large-scale models has benefited from improvements in computing power and increased availability of data, allowing researchers to build more complex and powerful models to solve a variety of real-world problems.

Common large models such as OpenAi's ChatGPT and Baidu's Wenxinyiyan can better understand and generate natural language. By training on large amounts of text data, they are able to capture grammatical, semantic and contextual information in language. This makes them more accurate and smooth when answering questions, providing explanations, generating text, and conducting conversations. And once it was launched, it attracted widespread attention from the society. The technical development of large language models has had an important impact on the entire AI community and has completely changed the way we develop and use AI algorithms.

After dynamic posting on Baidu App is connected to Wenxinyiyan's AI capabilities, it can provide users with more intelligent and personalized services, helping authors reduce the difficulty and cost of creation, improve creation efficiency, and better create their own content.

02 Project introduction

The figure below shows an example of AI text creation in Baidu App - Publishing News.

Insert image description here

Content input

Insert image description here

AI writing poetry

Insert image description here

AI poetry writing completed

2.1 Overall architecture

picture

The AI ​​creation of Baidu App Dynamic Publisher is based on the auxiliary creation function of Baidu Wenxinyiyan, which can intelligently generate copywriting based on user input, rewriting, and some prompt words.

The overall business is divided into three layers:

1. The top layer is the business layer, including mobile AI-assisted creation, AI notes, etc. implemented on the client/H5/mini program;

2. The middle is the strategy layer, which provides prompt template configuration capabilities, input and output policy control capabilities, configuration information management capabilities, etc.;

3. The bottom layer provides basic services such as Wen Xin Yi Yan and risk control services.

2.2 The whole process

picture

The business layer calls the Baijiahao Creative Brain service to obtain account permissions, function permissions, prompt template configuration and other information, and displays function types according to permissions, such as daily updates, AI poetry writing, travel and travel, etc.; the user enters copy and accesses the Creative Brain copy. The interface is generated, and the creative brain verifies the account, permissions, and template information. After the permission verification passes, it accesses the risk control vocabulary and performs risk control management on the input information. Finally, the creative brain carries prompts and user input questions to access the Yiyan service, and obtains the AI Polished copywriting.

03 Key technologies

3.1 Prompt

Prompt (prompt word) is an instruction or question input to a large machine learning model to guide it to generate corresponding answers or output. It is typically a text string that describes the required information or task and provides context and guidance to the model to produce the desired response. For example, when conducting dialogue, text generation or other tasks based on the Wenxin Yiyan large model, Prompt plays a role in guiding the model to generate output. Well-designed prompts can help the model understand the user's intentions more accurately and generate relevant and useful answers, which has real industrial/social value. A good, high-quality prompt needs to give full play to our understanding and imagination, and mobilize the capabilities of large models, thereby improving our work efficiency.

3.1.1 Basic definition of Prompt

Prompt (prompt word), in short, is the text description that drives the large model to express.

Prompt formula = task + generation body + details (optional) + form (optional).

Task : The type of task you want the model to complete, such as writing a poem

Subject : generated object, such as writing a poem about summer

Details : Whether to include detailed output such as expressions

Format : typesetting, content style

Usually high-quality prompts meet the following three points:

Clear expression : easy to understand, concise and clear expression, not only can the model generate good content, but ordinary people can also understand the meaning.

Strong universality : on similar tasks, good results can still be achieved even after changing the subject word.

Stable generation : In the case of the same prompt word, the content generated multiple times is stable enough.

Quality prompts + large models = quality content.

3.1.2 Prompt configuration

Baidu App Dynamic Publisher provides a variety of AI creation functions. Each capability will have a built-in description to facilitate users to ask questions and get copywriting that meets their expectations. We provide the following categories:

Full-text continuation : Large models are required to use concise language to continue writing for users.

Full text rewriting : The big model is required to be an article rewriting assistant, correcting language errors and polishing the user's content

Daily dynamics : The large model is required to write a short daily dynamics based on user input.

AI Poetry : Asking a large model to create short poems for a topic entered by the user

Good product recommendation : The large model is required to use a lively language style and write a good product recommendation copy

Travel and travel : Ask the big model to write an essay about travel and travel to help users express their experiences.

3.2 Risk control

Add defensive instructions to prompts to prevent malicious prompt injection, manipulation of prompt words, and inducing the model to return unexpected results. Therefore, we will go through the following preventive measures throughout the entire production process:

  • Enter the content, return the content, access the content, risk control capabilities, and pass the risk control vocabulary;

  • One word returns to clear the screen and clears the user input content when the safe word list is hit;

  • Record user information and ban high-risk accounts based on the risk of input and output content;

  • Regularly inspect historical records for analysis.

3.3 SSE protocol

In order to improve the fluency and response speed of chat, Wen Xinyiyan adopted SSE as the server push technology. It allows the server to send events to the client. Compared with WebSockets or long and short polling technology, SSE provides a simpler way to implement push.

The following content needs to be added to the response header:

Content-Type: text/event-stream; charset=utf-8
Cache-Control: no-cache, no-transform
Connection: keep-alive
X-Accel-Buffering: no

The client establishes a connection with the server through an HTTP GET request and specifies receiving text/event-stream type data. After receiving the request, the server does not return a response immediately, but keeps the connection open and generates a reply based on the user's input. After the server generates a reply, it sends the reply to the client as an event and keeps the connection open, waiting for the next input. After receiving the event, the client parses the data in the event and displays it on the chat interface. The client and Servers can continue to exchange data through the same connection until the client closes the connection or the server encounters an exception.

Compared with other solutions, SSE is simpler to use and does not require adding any new components. You only need to use existing back-end languages ​​and frameworks. SSE completely reuses the existing HTTP protocol and therefore can run directly on existing proxy servers and authentication technologies. SSE provides a native EventSource object on the browser side, which can easily monitor and process events sent by the server. SSE supports disconnection reconnection and message tracking functions, which can ensure data integrity and consistency.

3.4 Gradient flow display component

The client will perform streaming display based on the data returned by the server. It is divided into the following stages: initial waiting, displaying, and displaying end. During several state transitions, there are also different changes in the display of styles.

picture

Initial waiting : the cursor needs to be displayed and flashed.

Displaying : The text is required to be displayed one by one with the cursor flashing.

End of display : The complete copy needs to be displayed and the cursor hidden.

3.4.1 Custom TextView

Taking Android as an example, first initialize the two states of waiting and display, which looks very similar to Android's EditText component. However, the EditText component needs to handle combined events such as cursor display, focus acquisition, keyboard pop-up, and shielded editing. The function is not pure. There may also be adaptation issues. Considering this, I decided to implement it by inheriting TextView and customizing View.

We only need to consider three points, one is word-for-word display, the second is text gradient, and the third is cursor.

display verbatim

  • We can use a custom Handler timer to continuously intercept and display the text.

text gradient

  • Referring to the commonly used ForegroundColorSpan source code, you will find that ForegroundColorSpan inherits CharacterStyle and implements UpdateAppearance, and finally changes the text color by rewriting the corresponding updateDrawState method. Similarly, we can also achieve a gradient effect of text color by setting the brush to a gradient color and giving the starting and ending positions of the gradient color. The gradient of the brush Paint can be realized through the LinearGradient in the commonly used API.
override fun updateDrawState(tp: TextPaint?) {
    tp ?: return
    val leadingWidth = tp.measureText(containingText, 0, gradientStart)
    val gradientWidth = tp.measureText(containingText, gradientStart,
            gradientEnd)
    val lineGradient = LinearGradient(
            leadingWidth,
            0f,
            gradientWidth,
            0f,
            intArrayOf(startColorInt, endColorInt),
            floatArrayOf(0f, 1f),
            Shader.TileMode.CLAMP
    )
    tp.shader = lineGradient
}

cursor

  • Cursor addition: Consider how to display the cursor at the end of the copy every time? Here we refer to ReplacementSpan, also through a customized Span. There are two methods in ReplacementSpan: getSize() and draw() methods. getSize() The return value of this method will be used as the width of the text to be replaced. Draw() draws the cursor we need into the canvas in this method. We only need to draw a rounded rectangle of appropriate size in draw(). as cursor.
override fun getSize(paint: Paint, text: CharSequence?,
                  start: Int, end: Int, fm: Paint.FontMetricsInt?): Int {
    return paint.measureText(" ").toInt()
}

override fun draw(canvas: Canvas, text: CharSequence?, start: Int, end: Int, x: Float,
                  top: Int, y: Int, bottom: Int, paint: Paint) {
    canvas.drawRoundRect(x,
            top.toFloat(), x + width, bottom.toFloat(), rx, ry, cursorPaint)
}
  • Cursor flashing: Here you can use ValueAnimator animation to update the Alpha value and set the Alpha value to the transparency of the brush.
override fun draw(canvas: Canvas, text: CharSequence?, start: Int, end: Int, x: Float,
                  top: Int, y: Int, bottom: Int, paint: Paint) {
    cursorPaint.alpha = (alpha * 255).toInt().coerceAtMost(255)
    canvas.drawRoundRect(x,
            top.toFloat(), x + width, bottom.toFloat(), rx, ry, cursorPaint)
}

04 Summary

This article briefly describes the basic process of AI text creation based on large models such as Wenxin Yiyan. It is also an initial attempt to combine content creation with AI. As the innovative application of generative AI continues to advance, images and videos based on images and videos will be released one after another. Stay tuned for more special gameplay!

——END——

Recommended reading

DeeTune: Design and application of Baidu network framework based on eBPF

Baidu self-developed high-performance ANN search engine, open source

Storage solutions as products - Midgard exploration

Development history of Baidu vertical offline computing system

MMKV application optimization practice of Dujia Editing App

Guess you like

Origin blog.csdn.net/lihui49/article/details/132824542