Dify.AI User Meetup: Dify Product Planning and Frequently Asked Questions on Putting LLM Applications into Production

On the evening of July 22, the Dify team and its users held an impromptu but high-quality online exchange. The meeting focused on Dify's product roadmap, users' exploration and understanding of LLMs, and the problems and confusions users have run into while working with Dify. We believe it offers useful ideas and references for anyone building applications on LLMs or on Dify. If you missed it, we have compiled the key questions and discussion points below for your reference (each Question was raised by a user; each Answer reflects the Dify team's understanding and response).


About Dify Product Planning

Since launch, Dify has attracted the attention and affection of many developers; more than 30,000 applications (cloud version alone) have been created on the platform. We hope to keep meeting the needs of users' application scenarios on the product side, so that everyone can truly put their applications to work in real business settings. Zhang Luyu, founder of Dify, first shared the recent product plans that users care about most:

About Model Support

At present, the supported models are mainly the OpenAI and Claude series; in our testing they deliver the best results, so they were supported first. Other models are either not yet production-ready or are held back by compliance and similar issues. Given users' interest in a wider range of models, we will add a batch of new models next week, including commercial models such as Ali Tongyi, Wenxin Yiyan, and Xunfei Xinghuo Cognition, as well as open-source models such as LLaMA. Dify will also do its best to be compatible with models users build themselves: you can host them on the Hugging Face Hub, and Dify will support connecting to them in the future.


Plugin Ecosystem Planning

There are many plugin solutions on the market today; broadly, Dify supports its own platform's native plugins plus external plugins following the OpenAI standard. Popular plugins such as web-search plugins, including those related to domestic security and compliance review, will also be included as native plugins. The existing dataset functionality may also become a native plugin. We will soon launch a general Agent Chat mode for invoking plugins and collaborating within a team, and the orchestration features may be adjusted accordingly later.

Q1: Have you considered building an application marketplace, where users can publish their applications publicly or vote for other people's applications?

A1: Dify positions itself as a technology-stack service provider, not an application marketplace. Its mission is to lower the threshold of application development and improve development efficiency, which is quite different from what a general application marketplace does. We have seen many application marketplaces and application directories in China, but the compliance risk is high, and the application-marketplace space has not yet matured. Dify is, however, happy to share good applications and templates.


Q2: What does the upcoming Agent look like?

A2: It will be rolled out in several stages:

The first stage is the tool version, focused on invoking tools: it can search the web, call APIs, and so on. A demo version will first appear as a general-purpose application in the Workspace (a sketch of the general idea follows after the stages below).

The second stage is an autonomous agent, similar to AutoGPT. We already studied this back in February. It is conceptually very interesting, but it cannot yet be put into production: compute consumption is very high and the practical results still need to be validated, so it will not be prioritized for the time being.
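To make the "tool version" idea concrete, here is a minimal sketch of an agent that decides when to invoke a tool, using OpenAI function calling (openai-python 0.x API). The web_search function and its body are hypothetical placeholders, and this is an illustration of the general pattern, not Dify's actual implementation:

```python
# A minimal tool-invoking agent sketch using OpenAI function calling.
# web_search is a hypothetical placeholder, not a real search backend.
import json
import openai

def web_search(query: str) -> str:
    # Placeholder: a real agent would call a search API here.
    return f"Top results for: {query}"

functions = [{
    "name": "web_search",
    "description": "Search the web for up-to-date information.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}]

messages = [{"role": "user", "content": "What is Dify?"}]
resp = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613", messages=messages, functions=functions
)
msg = resp["choices"][0]["message"]
if msg.get("function_call"):
    # The model asked to use the tool; run it and feed the result back.
    args = json.loads(msg["function_call"]["arguments"])
    result = web_search(**args)
    messages += [msg, {"role": "function", "name": "web_search", "content": result}]
    final = openai.ChatCompletion.create(model="gpt-3.5-turbo-0613", messages=messages)
    print(final["choices"][0]["message"]["content"])
```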

Q3: As individuals, it is not easy for us to apply for LLM API access. Can Dify provide quota purchase services?

A3: Dify plans to offer purchase channels for OpenAI and Claude usage quotas directly in the product interface; after a user makes a purchase, the quota will be credited automatically. The second situation is that some developers have already bought API keys through various domestic proxy channels, and we will support those as well.

Dify.AI / LLM Application Q&A

At the meeting, the Dify team also discussed in depth the problems users have run into over recent months while putting applications built on Dify into production. For ease of reading, the content below is organized as a Q&A.

Q1: When will finer manual adjustment of the dataset segmentation feature be available? Can custom split boundaries be implemented?

A1: Dify's segmentation plans call for relatively large changes. From the very beginning, Dify's premise was that fine-tuning models at the current stage of the technology is not something most users can apply directly, so we made embeddings and datasets a top priority, and they are indeed among the most-used features. Because we are a developer product, a toB product, we require content to be controllable and precise. As large numbers of PDFs and web-page data sources get connected, very precise control over segmentation will inevitably be needed, and we have been optimizing recently (for example, database and algorithm optimizations). But the more pressing problem is this: if you have a 100,000-word document, the system has a strategy that pre-cleans it and splits it into, say, 1,000 segments of roughly 300-500 characters each, yet this still clearly falls short of the answer quality people want from the AI. We are currently working on two solutions to this problem:

  1. Dify will soon support a QA-format data structure. That is, when a user uploads text and it is split into 1,000 segments, the system will then help generate ten or more QA pairs per segment. Once these QA pairs are generated, they can basically cover the questions users will ask, and when the model does embedding and matching, it can match questions against questions, which works markedly better. A segment's QA pairs will also support manual editing, which alleviates some of the problems (a sketch of the idea follows after this list).

  2. We will redesign the segmentation interface you see today. Users will be able to see the full-text effect on one screen; for example, after uploading a PDF, a mouse drag-and-select operation will allow relatively fine manual segmentation, and if you are unhappy with a section you can adjust it by hand.
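As promised above, here is a rough sketch of the QA-pair idea: generate questions for each text segment, embed them, and match a user query against those questions rather than against raw passages. The prompt wording and model names are illustrative assumptions, not Dify's actual implementation:

```python
# Sketch: QA-pair generation plus question-to-question embedding matching.
import numpy as np
import openai

def generate_qa_pairs(segment: str) -> str:
    # Ask the model to produce QA pairs covering one segment (illustrative prompt).
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": "Generate 3 question-answer pairs covering this text:\n" + segment,
        }],
    )
    return resp["choices"][0]["message"]["content"]

def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

# Question-to-question matching: both sides of the comparison are questions,
# which tends to score higher than matching a question to a long passage.
user_q = embed("How do I reset my password?")
stored_q = embed("How can a user reset their password?")
similarity = user_q @ stored_q / (np.linalg.norm(user_q) * np.linalg.norm(stored_q))
print(f"cosine similarity: {similarity:.3f}")
```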

Q2: Why does text generation sometimes break through its limits, with answers that do not match the prompt? (For example, a prompt requires the model to answer only questions about one topic and never about another, but in conversation the AI may still go off track.) Are there any plans in the direction of long-term memory?

A2: Anyone using any large model, including ChatGPT, can run into this. You give an instruction, what we call the pre-prompt: the model answers the first questions well, but after enough back-and-forth it loses control. This comes from the technical principles of large language models: they generate text, and their responses tend to weight the most recently seen text. So even with a prompt-level restriction in place, the model is influenced by later input as it generates, and the answer can drift away from the restriction. Most mainstream models do not fully solve this today; improving it may require vendors to adjust the weight of system-level prompts or to fine-tune the model. In the future, Dify will adopt a check-and-reset approach: have the machine summarize the prior conversation into a condensed prompt to serve as memory, add the system prompt back on top, and then regenerate the answer. There are many detailed engineering issues here. With Dify as a product, we hope to absorb as much of this engineering as possible in the application-building process, because the optimal solutions and best practices come from repeated experimentation; there is no one fixed recipe, since every model behaves differently.
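A minimal sketch of the check-and-reset idea described above, assuming an OpenAI chat model: condense the conversation so far into a short memory, then rebuild the context with the system prompt restated on top. The prompts are illustrative assumptions:

```python
# Sketch: summarize-and-reset memory to keep prompt restrictions in force.
import openai

SYSTEM_PROMPT = "You are a support assistant. Only answer billing questions."

def compress_history(history: list[dict]) -> str:
    transcript = "\n".join(f'{m["role"]}: {m["content"]}' for m in history)
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": "Summarize this conversation in under 100 words:\n" + transcript,
        }],
    )
    return resp["choices"][0]["message"]["content"]

def answer(history: list[dict], user_input: str) -> str:
    memory = compress_history(history)
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},  # restrictions restated
        {"role": "system", "content": "Conversation so far: " + memory},
        {"role": "user", "content": user_input},
    ]
    resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    return resp["choices"][0]["message"]["content"]
```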

By life cycle, there are several kinds of long-term memory: the first spans the entire conversation; the second is pinned to the user; the third is adding filter conditions to the dataset. If your dataset is very large, you can filter retrieval by the user's ID, category, or subscribed commercial products, much like a WHERE clause in SQL.
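A toy illustration of that third kind, under the assumption of a metadata-tagged vector store: restrict candidate segments by a metadata field before ranking by similarity, just like a SQL WHERE clause. The in-memory list stands in for a real vector database:

```python
# Sketch: WHERE-style metadata pre-filter, then cosine-similarity ranking.
import numpy as np

documents = [
    {"text": "Enterprise plan FAQ", "vec": np.random.rand(8), "product": "enterprise"},
    {"text": "Free plan FAQ", "vec": np.random.rand(8), "product": "free"},
]

def search(query_vec: np.ndarray, product: str, top_k: int = 1):
    # Filter on metadata first (the WHERE clause), then rank the remainder.
    candidates = [d for d in documents if d["product"] == product]
    scored = sorted(
        candidates,
        key=lambda d: -(query_vec @ d["vec"])
        / (np.linalg.norm(query_vec) * np.linalg.norm(d["vec"])),
    )
    return [d["text"] for d in scored[:top_k]]

print(search(np.random.rand(8), product="enterprise"))
```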

Q3: Why does Dify have only a single pre-prompt input, unlike the GPT API, which has three roles: system, user, and assistant? And why do I get different outputs when I ask the same question in other tools (such as ChatGPT) and in Dify?

A3: Dify's first stage is about lowering the threshold and making things easy to use, so we simplified many engineering concepts. Take the system-versus-user prompt question: we tested the effect of both placements in advance and baked in the approach that performed best, so you basically only need to write one prompt. But as models continue to diverge, the same prompt may behave differently across them, so the prompt orchestration page will later iterate toward an advanced version for these complex cases, letting users see the full chain of the assembled prompt.

There are two likely reasons for the different outputs. The first is that the model parameters are set differently. The second is that in some scenarios Dify splices your prompt and dataset content together into a more complex prompt, so it is no longer the same prompt you originally tested in other tools. The purpose is to use the dataset to make replies better fit actual needs.
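A simplified picture of that second reason, assuming an invented template (this is not Dify's actual template): the pre-prompt, retrieved dataset segments, and the user question are spliced into one larger prompt, so the model sees more than just what you typed:

```python
# Sketch: how a platform may assemble your prompt with retrieved context.
PRE_PROMPT = "You are a product support assistant."
retrieved_segments = ["Refunds are processed within 7 days.", "Contact support via email."]
user_question = "How long do refunds take?"

final_prompt = (
    f"{PRE_PROMPT}\n\n"
    "Use the following context to answer:\n"
    + "\n".join(f"- {seg}" for seg in retrieved_segments)
    + f"\n\nQuestion: {user_question}"
)
print(final_prompt)  # what the model actually sees, not just your prompt
```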

Q4: A conversation involves several components: first the prompt, second the dataset, third the context, and fourth the user's specific question. Can the weights of these components in the AI's actual reply be configured?

A4: The application orchestration you see does indeed combine a lot of pieces, and we have a token allocation ratio across them; what you call weight is, in our terms, the token allocation ratio: within a model's limited token budget, which part gets more and which gets less. If you want to adjust these weights, you can look at the open-source code. We may expose this ability to adjust input allocation in the future to improve flexibility; right now it is a fairly simple packaged default.
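A back-of-the-envelope sketch of what a token allocation ratio means in practice. The ratios below are made-up examples, not Dify's actual defaults:

```python
# Sketch: splitting a model's context window across prompt components.
CONTEXT_WINDOW = 4096      # e.g. gpt-3.5-turbo
RESERVED_COMPLETION = 512  # tokens reserved for the model's answer

budget = CONTEXT_WINDOW - RESERVED_COMPLETION
ratios = {"pre_prompt": 0.15, "dataset_context": 0.55, "history": 0.25, "query": 0.05}

allocation = {part: int(budget * share) for part, share in ratios.items()}
print(allocation)
# Each part is truncated to its allocation before the final prompt is assembled.
```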

Q5: In AI customer service scenarios, how can the business flow of proactively collecting user information in conversation (for example, guiding users to provide an order number or phone number) be implemented without the dialogue going off track?

A5: On the one hand, this is still a model problem; switching to GPT-4 helps. On the other hand, write the restrictions more explicitly and turn the temperature parameter down to the minimum, and the AI will basically answer strictly from the dataset; otherwise it behaves more open-endedly. After Dify's Agent is released, there will be a "human-assisted dialogue" capability: if the AI judges that the context the user has given is incomplete, it will proactively return the question and ask the user to supply the rest, so the AI in turn asks users to supplement information. For more complex business flows, let users provide information through forms and similar channels, then call an API to fetch it before generating the reply.
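A small sketch of the advice above, assuming the OpenAI chat API: spell the restrictions out explicitly in the system prompt and set temperature to 0, so the model sticks to the script and asks for missing fields like the order number. The rules text is an invented example:

```python
# Sketch: explicit restrictions + minimum temperature for customer service.
import openai

SYSTEM = (
    "You are a customer service agent for an online store.\n"
    "Rules:\n"
    "1. Only discuss orders and shipping; politely refuse other topics.\n"
    "2. Before answering about an order, you must have the order number "
    "and the customer's phone number. If either is missing, ask for it."
)

resp = openai.ChatCompletion.create(
    model="gpt-4",
    temperature=0,  # minimize randomness so the rules are followed
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Where is my package?"},
    ],
)
print(resp["choices"][0]["message"]["content"])  # should ask for the order number
```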

Q6: Dify parses document-type data such as PDF, TXT, and CSV. Some cases mix documents with relational data, such as the financial reports and quarterly reports published by listed companies. If I want to use that data as a corpus to build an application in the financial domain, can documents and relational data be supported at the same time?

A6: Dify considered handling tabular data in its initial design, but we found the results poor: the redundant information in that format lowered the hit rate. To solve this, we are considering preprocessing structured data such as Excel before retrieval to improve the results. In the future, we may also partner with a company that specializes in database work to build a Dify plugin that solves data problems through structured-data processing.
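One plausible form of that preprocessing, under an invented schema: flatten each spreadsheet row into a self-contained sentence before embedding, so a retrieval hit carries its own column context instead of a bare cell value:

```python
# Sketch: turning tabular rows into self-describing text chunks.
import csv
import io

CSV_DATA = """company,quarter,revenue_musd
Acme,2023Q1,120
Acme,2023Q2,135
"""

chunks = []
for row in csv.DictReader(io.StringIO(CSV_DATA)):
    chunks.append(
        f"{row['company']} reported revenue of {row['revenue_musd']} million USD "
        f"in {row['quarter']}."
    )
print(chunks)
# Each sentence is then embedded and indexed like any other text segment.
```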

On using APIs to query databases: we previously designed a language specifically for making short, precise API calls tailored to LLMs. You can think of it as a compressed version of Swagger, only about 50% of the original character length. The essence of an API call is attaching POST parameters and field comments to a URL; it is not fundamentally different from writing code, just different syntax. If your model is not very strong, there is theoretically no essential difference in quality, so if Dify can connect to the database directly, the result will be better than calling an API. API calling has its own problems; for example, the plugin specifications the AI consumes today take up many tokens, which makes the returned results limited and slow.
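A quick way to see the token cost this refers to is to count how many tokens a verbose OpenAPI-style description consumes versus a compressed one, for instance with the tiktoken library. Both spec strings below are invented examples, not Dify's compressed language:

```python
# Sketch: comparing the token footprint of verbose vs. compressed API specs.
import tiktoken

verbose_spec = (
    '{"paths": {"/orders/{id}": {"get": {"summary": "Get an order by id", '
    '"parameters": [{"name": "id", "in": "path", "required": true, '
    '"schema": {"type": "string"}}]}}}}'
)
compressed_spec = "GET /orders/{id} -> order; id:str required"

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
for name, spec in [("verbose", verbose_spec), ("compressed", compressed_spec)]:
    print(name, len(enc.encode(spec)), "tokens")
```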

Q7: How can one build an AI virtual-teacher product that integrates with games and supports interaction? (The plan is an AI virtual teacher that interacts with students in text and offers learning suggestions. Follow-up goals are richer interaction forms, such as supporting image interaction, giving feedback on students' work, and designing personalized study plans. Ultimately, it should be embedded in a game scene, with students learning knowledge points through interactions with game NPCs to make learning more fun.)

A7: The complexity of an AI virtual-teacher product lies in the range of fields involved: natural language processing, game development, and interaction design. Applying large models here also faces challenges, such as handling multi-modal input and long dialogues. So when building such a product, implement the features step by step according to your actual situation and the current level of the technology, and plan and iterate the product as the underlying technology advances.

(End)

The above are the key takeaways we compiled for your reference. Although the event was organized on short notice on a Saturday night, the quality of the exchange was very high, and we believe the issues discussed can inspire everyone. This small online exchange also received support from many users, so we will hold similar learning-and-exchange activities from time to time, helping everyone make progress together on the road to putting LLM applications into production. Stay tuned!

If you like Dify, welcome to:

  • Contribute code on GitHub and build a better Dify with us;

  • Share Dify and your experience with your friends through online and offline activities and social media;

  • Give us a ⭐️ on GitHub.

