Building an Immersive AI Text Editor: Design Principles and Ideas Behind the Open Source 3B Editor

Building on the immersive AI experience design work we did for AutoDev and the IDE, we have begun building an AI-native text editor to explore an immersive authoring experience. It targets documentation scenarios such as requirements writing and architecture documents, to accelerate the daily work of the many roles involved in software development.

GitHub: https://github.com/unit-mesh/3b (The project is still in the AI experience design stage, and no model has been wired up yet. If you have a model, you are welcome to connect it yourself and sponsor the project.)

Online preview: https://editor.unitmesh.cc/

Introduction


Over the past few months, we have been exploring what the best Copilot-style tools look like. Along the way, the AutoDev tool we built has become the open source AI-assisted IDE plugin that best fits the Copilot definition: designed around developers' daily activities and enhanced with generative AI to create an immersive coding experience.

In daily work, writing code is only one of our many tasks. We also do a great deal of writing, such as requirements documents, architecture documents, and development documents. These documentation tasks need similar AI-assisted immersive authoring tools.

As a frequent writer and the author of several technical books, I have been building and rebuilding editors that suit me:

  • Based on Electron and a micro-frontend architecture: Phodit (https://github.com/phodal/phodit)

  • Knowledge management tool built in Rust: Quake (https://github.com/phodal/quake)

  • Markdown rendering tool for WeChat Official Accounts: MD (https://github.com/phodal/md)

That is why I am probably a semi-expert in this area. For me, the problem domain has shifted from building a writing tool to: how do we combine knowledge management + intelligence enhancement + search enhancement to build a better authoring experience?

Different AI scenarios for text content creation


Writing is part of my wife's daily work, but her field is Yue Opera, a traditional form of Chinese theatre. She often wants to take advantage of AI, but it tends to backfire: it is hard to obtain sufficiently accurate professional knowledge from generative AI such as Bing, Baidu Wenxin Yiyan, or ChatGPT.

AI-Assisted Writing: Inspire Content with Generative AI

ChatGPT is quite good at writing content from an audience's perspective. However, once generative AI joins the writing process, some inexplicable factual problems appear:

Yue Opera "New Longmen Inn" is a modern Yue opera TV series jointly produced by Zhejiang Yue Opera Troupe and Zhejiang TV Station. It is adapted from the film "Crouching Tiger, Hidden Dragon" by the famous director Ang Lee. The TV series premiered on Zhejiang Satellite TV in December 2022. It aroused strong social response, with ratings reaching new highs and more than 1 billion online views, making it a phenomenal work.

So at this point we can only borrow the basic logic of its writing, and use that creative logic to extend our own thinking about cultural work.

AI-Assisted Writing: Putting Facts into Context

To avoid these domain-specific factual problems, we often write with the help of search engines. Generative AI tools such as Baidu Wenxin Yiyan and Bing can pull in real-time web results to build better context, so we get better facts:

The Yue Opera "New Longmen Inn" is a new national-style and environmental Yue opera jointly produced by Zhejiang Xiaobaihua Yue Theater, Baiyue Cultural Creativity Co., Ltd., and Taihaoxi Ultimate Performing Arts Culture Communication (Shanghai) Co., Ltd. The play premiered on March 28, 2023 at the Butterfly Theater in Hangzhou, Zhejiang. The plot tells the story of the middle of the Ming Dynasty, when the eunuchs were in power. Cao Shaoqin, the governor of Dongchang, killed Yang Yuxuan, the Minister of War, and wanted to trap Yang Yuxuan's old subordinate Zhou Huai'an by hunting down his descendants...

As you can see, given this context, generative AI can produce much more accurate facts.

AI-Assisted Writing: Associating Context with Historical Content

A few years ago, knowledge management tools such as Notion were all the rage, and for good reason: knowledge is connected. When writing an article, we need some historical material as context to lay the groundwork, for example a passage on cultural confidence and self-improvement:

As an important part of Chinese traditional culture, Yue Opera has irreplaceable cultural value. In the process of innovation and development, Yue Opera must adhere to cultural self-confidence, carry forward the essence of traditional culture, and allow the audience to feel the charm of traditional Chinese culture while appreciating art.

Depending on the background and the occasion, this "historical material" needs to be recombined for different scenarios.

AI-Assisted Writing: Content Polishing and Content Optimization

Finally, once we have a first draft, we need to check whether there are any problems of expression and incorporate the leadership's ideas to improve the paragraphs, so that the content better matches the speaker's intent.

So we can choose a different tone and let generative AI help us optimize the content:

Finally, once the first draft is completed, we need to review whether there are any omissions in its expression and accept the leadership's suggestions to further optimize the content and ensure that it is more in line with the speaker's intention.

This type of work is very tedious and repetitive.

AI-assisted requirements writing


Back in the field of software development, I believe the pain point for most people is that requirements are not written clearly. Unclear requirements lead to problems in the subsequent implementation, and a lot of time is wasted reworking them.

With the development experience gained from AutoDev, and given the current limits of both commercial and open source models at home and abroad, a requirements tool should focus on assisting the refinement process after a requirements draft has been generated. Recall AutoDev's capabilities and workflow for coding:

  • Read and analyze historical code (optional).

  • Generate preliminary skeleton code (optional).

  • Perform automated code completion.

  • Generate automated tests and documentation.

  • Refactor and optimize code.

Therefore, when we develop a similar requirements writing tool, we need to implement the following functions:

  • Historical requirements clarification (optional). Obtain historical requirements from code and release documents as part of the feature context.

  • Draft generation. Generate a requirements outline that fits the content.

  • Sub-item refinement. Based on the requirement and feature specifications, generate the content each sub-item needs, such as flow charts.

  • Requirements review. Check whether anything has been left out of the requirements.

This also means that, just like content writing, we need to integrate AI into the entire cycle of writing requirements.
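As a rough illustration (not from the 3B codebase), this cycle can be modelled as explicit stages, each of which can later be mapped to its own prompt and context:

// Minimal sketch: the requirement-writing stages described above as data.
enum RequirementStage {
  ClarifyHistory = 'clarify-history', // optional: mine historical requirements from code and release documents
  DraftOutline = 'draft-outline',     // generate a requirements outline that fits the content
  RefineSubItem = 'refine-sub-item',  // expand a sub-item, e.g. with a flow chart
  Review = 'review',                  // check the requirements for omissions
}

interface RequirementTask {
  stage: RequirementStage;
  input: string;        // the current draft or sub-item text
  history?: string[];   // related historical requirements, when available
}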

Design principles for immersive AI


Given all the context we have covered, I am sure you are on the same page as me: following Microsoft's Copilot-style approach of enhancing existing tools is the most realistic path. So how should we think about the problem?

When we move from the development side to the requirements side, the first question to consider is: is there an editor that is easy to extend?

3B Editor: Rethinking AI Editors

Based on the thinking above, we started to create a "new" content editor, built, of course, on top of open source editors. While designing it, we looked at a range of existing editors:

  • Notion. My main writing tool this year, although the desktop version is rather laggy.

  • Jira AI. As a requirements assistant, Jira demonstrates a very good editing experience.

  • Microsoft Word. The world's most famous veteran editor.

  • Some other AI editors.

We also drew on a series of AI IDE experiences (JetBrains AI Assistant, GitHub Copilot, Bloop, Cursor) accumulated while developing AutoDev. Since we define 3B as an immersive AI editor, AI must be at your fingertips throughout the editor, while still allowing a great deal of customization.

In the current version of the 3B editor, the main focus is on three design principles:

  • Intelligent embedding. Deeply integrate AI with the user interface, introducing AI models at various points in the editor UI for a more intuitive and intelligent interaction experience.

  • Local optimization. By introducing a local inference model, we strive to provide an efficient and smooth writing experience in the user's local environment.

  • Contextual flexibility. Through the context API, users are given tools such as custom prompts and predefined contexts to more flexibly shape the editing environment.

There may be other points that I forgot to consider.

Principle 1. Intelligent Embedding

adb3ef7f7794f314bbf6ea1b7d2ab6a1.png

As we describe in the documentation, we are committed to intelligent embedding that gives users a seamless, AI-native UI experience. Here are the five ways to trigger AI-enhanced writing:

  1. Toolbar button: by clicking the dedicated AI button on the toolbar, users can intuitively trigger AI commands, making the AI functions directly controllable while editing.

  2. Slash command: users can trigger AI commands simply by pressing the / key, a quick and convenient way to interact with the AI.

  3. Custom input box: by pressing Control + / (Windows/Linux) or Command + / (macOS), users open a custom AI input box, letting them write their own AI instructions and giving them a stronger sense of control over the AI.

  4. Text-selection bubble menu: when text is selected, a bubble menu appears next to the selection from which the corresponding AI functions can be triggered, an intuitive and lightweight operation.

  5. Inline completion: by pressing Control + \ (Windows/Linux) or Command + \ (macOS), users trigger inline completion, a more fine-grained and efficient way to invoke the AI.

As local reasoning capabilities continue to improve, the editor will automatically trigger more advanced AI functions more intelligently, bringing users a more natural and intelligent writing experience.
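As a minimal sketch (assuming a plain browser environment, and not the actual 3B implementation), the shortcut-based triggers above could be wired up roughly like this:

type AiTrigger = 'SLASH_COMMAND' | 'CUSTOM_INPUT_BOX' | 'INLINE_COMPLETION';

// Hypothetical hook supplied by the editor integration.
function fireAiTrigger(trigger: AiTrigger): void {
  console.log(`AI trigger fired: ${trigger}`);
}

const isMac = typeof navigator !== 'undefined' && /Mac/.test(navigator.platform);

document.addEventListener('keydown', (event: KeyboardEvent) => {
  const modifier = isMac ? event.metaKey : event.ctrlKey;
  if (modifier && event.key === '/') {
    // Control + / (Windows/Linux) or Command + / (macOS): open the custom AI input box.
    event.preventDefault();
    fireAiTrigger('CUSTOM_INPUT_BOX');
  } else if (modifier && event.key === '\\') {
    // Control + \ or Command + \: request an inline completion at the cursor.
    event.preventDefault();
    fireAiTrigger('INLINE_COMPLETION');
  } else if (event.key === '/') {
    // A bare "/" typically opens the slash menu; a real editor also checks the cursor context first.
    fireAiTrigger('SLASH_COMMAND');
  }
});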

Principle 2. Local optimization


Our earlier article "Exploring the Transformation of Interactive Experience and Edge Intelligent Infrastructure" introduced some of the trends we see in the size and speed of large models. So we need to optimize with two things in mind: giving priority to local (on-device) models, and optimizing for local execution. These optimizations take many forms:

  • Semantic search enhancement. Since historical documents are a key input to writing, we need to vectorize them locally and run vector searches over them, similar to the architecture introduced in "Semantic Search Design for a Codebase AI Assistant".

  • Local grammar checking. There are already plenty of tools for this, which I believe everyone is familiar with.

  • Text prediction. As with code, single-line and multi-line completion can greatly improve writing; the challenge is building a suitably small model for it.

All of these simply supplement our writing with extra material, making it easier to provide more context.
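As a minimal sketch of the semantic-search idea, assuming document chunks have already been embedded locally (the names below are illustrative, not the 3B API), a related-chunk lookup can simply rank chunks by cosine similarity:

interface Chunk {
  text: string;
  embedding: number[];
}

// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}

// Return the k historical chunks most relevant to the current paragraph.
function searchSimilarChunks(query: number[], chunks: Chunk[], k = 3): Chunk[] {
  return [...chunks]
    .sort((x, y) => cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding))
    .slice(0, k);
}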

Principle 3. Contextual flexibility


In AutoDev we provide powerful customization, from personal custom documents and specifications to Team Prompts for teams. The same applies to writing: different scenarios call for different AI capabilities.

  • Context as variables. Unlike AutoDev, every piece of context in 3B is a variable, namely $beforeCursor, $afterCursor, $selection, $similarChunk, and $relatedChunk, which you can combine in your own prompts.

  • Customizable prompts. In 3B, the AI functions provided by the system are just configuration, and you can override them.

  • Extensible variables (implementation in progress). Writing requires a range of supplementary information, such as background knowledge and material from the Internet, which should be sent to the large model as part of the context.

With this flexibility, power users have significant control over the AI's generative capabilities.
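As a minimal sketch of how these variables might be filled in (only the variable names come from 3B; the helper below is illustrative), a prompt template can be rendered by simple substitution:

type ContextVariables = {
  beforeCursor: string;
  afterCursor: string;
  selection: string;
  similarChunk: string;
  relatedChunk: string;
};

// Replace every $variable occurrence in a user-defined prompt template.
function renderPrompt(template: string, context: ContextVariables): string {
  return template.replace(
    /\$(beforeCursor|afterCursor|selection|similarChunk|relatedChunk)/g,
    (_match, name) => context[name as keyof ContextVariables] ?? ''
  );
}

// Example: a custom "polish" prompt that only uses the current selection.
const prompt = renderPrompt(
  'You are an assistant helping to polish sentences.\n###$selection###',
  { beforeCursor: '', afterCursor: '', selection: 'Yue Opera has irreplaceable cultural value.', similarChunk: '', relatedChunk: '' }
);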

How is 3B implemented?


Finally, let's get back to the more challenging technical part. All of the complexity above means we are still designing the interactions rather than rushing to wire up an online LLM (PS: mostly, in fact, because there are not enough people to build it).

Technology selection

There are already plenty of AI editors on the market, so the technology choices do not differ much:

  • Base editor, ProseMirror / Tiptap: ProseMirror provides excellent flexibility, letting you customize all kinds of capabilities, and has a rich ecosystem of AI extension components.

  • Local inference, EdgeInfer: EdgeInfer is a local inference component we built with Rust + ONNX Runtime, able to run across platforms such as the browser, mobile, and desktop.

  • Desktop framework, Tauri: it lets us make full use of the Rust infrastructure while also running on the desktop.

See our Roadmap for details: https://github.com/unit-mesh/3b/issues/1.

Data-driven context

Since we only need to interact with the AI and the editor, we only need to abstract these two parts. The corresponding implementation is to use a data structure as the abstraction of the API, as follows:

{
  name: 'Polish',
  i18Name: true,
  template: `You are an assistant helping to polish sentence. Output in markdown format. \n ###${DefinedVariable.SELECTION}###`,
  facetType: FacetType.BUBBLE_MENU,
  outputForm: OutputForm.STREAMING,
}

Providing the system's AI capabilities through configuration gives us a lot of flexibility. Of course, it also means the complexity of the system increases.

In the above example configuration:

  • name and i18Name determine the AI capability name displayed to the user and whether it is internationalized.

  • template contains a series of variables that are converted into the AI prompt.

  • facetType defines the type of interaction with the user, such as toolbar, slash menu, bubble menu, etc.

  • outputForm determines how the returned content should be output.

Of course, there is other configuration information to help developers better customize the system's capabilities.
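To make the flow concrete, here is a minimal sketch of how such a configuration entry might be consumed; the type and function names are assumptions for illustration, not the actual 3B implementation:

interface FacetAction {
  name: string;
  template: string;
  facetType: 'TOOLBAR_MENU' | 'SLASH_COMMAND' | 'BUBBLE_MENU';
  outputForm: 'STREAMING' | 'TEXT';
}

// Fill the template, call the model, and stream the result back into the editor.
async function runFacetAction(
  action: FacetAction,
  variables: Record<string, string>,
  callLlm: (prompt: string) => AsyncIterable<string>,
  onChunk: (text: string) => void,
): Promise<void> {
  const prompt = action.template.replace(/\$(\w+)/g, (_m, name) => variables[name] ?? '');
  for await (const chunk of callLlm(prompt)) {
    onChunk(chunk);
  }
}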

Other

Tool metrics

For an immersive AI tool, we also need to think carefully about evaluation metrics. Many organizations and teams use acceptance rate as the metric for a tool, but that is not entirely reasonable: acceptance rate mostly reflects the quality of the model and of the AI engineering, and among heavy users the acceptance rate is bound to be high. For an immersive AI tool, frequency of use is a better signal, that is, how readily people reach for the AI. How to separate model metrics from tool metrics is a very interesting topic.

Why is it called 3B?

Because, as the Chinese meme goes, "2B youths have lots of fun."

Summary

We have only just started digging this hole; everyone is welcome to contribute.

GitHub: https://github.com/unit-mesh/3b (The project is still in the AI experience design stage, and no model has been wired up yet. If you have a model, you are welcome to connect it yourself and sponsor the project.)

Online preview: https://editor.unitmesh.cc/

Originally published at blog.csdn.net/gmszone/article/details/134635976