Taotian Group released its top ten challenges for large model applications

New technologies represented by AI are becoming a new driving force for global business development. Since last year, Taotian Group has launched a series of in-depth collaborations with universities in key areas of AI.

Recently, Taotian Group has distilled concrete problems from its foundation models and e-commerce application scenarios into ten challenging propositions for large model applications, open to university faculty and students and to society at large, and everyone is welcome to work on them together. If any of the propositions below interests you, please click to participate in the challenge. We have prepared generous rewards and welcome your participation!


1. Basic model propositions

1.1 Professional-domain pre-trained models

✪ 1.1.1 Technical background

Unlike ordinary LLM-based chatbots, which mainly need to sustain fluent dialogue, e-commerce business places much higher demands on the professionalism of a large model's answers. The model must give professional, accurate, and up-to-date answers in the e-commerce domain, and this ability is a weak point of large models trained mainly on general corpora.

Retrieval augmentation is an effective way to obtain real-time information, but beyond fetching basic facts, the model also needs specialized information-processing capabilities and domain knowledge in vertical fields. These capabilities are hard to improve significantly through downstream optimization such as SFT; they must be built up during pre-training.

✪ 1.1.2 Technical challenges

  1. How to obtain and construct training data for professional fields so that the model can learn professional knowledge that is difficult to cover with general data.

  2. The amount of professional data is usually limited. How can it be used efficiently while avoiding catastrophic forgetting in the model?

  3. How to balance professional capabilities and general capabilities? Unlike purely domain-specific models (such as Code Llama), our goal is a general model with strong capabilities in professional fields.

✪ 1.1.3 Technical requirements

  1. A pipeline for collecting and processing training data in professional fields.

  2. Data strategies and training strategies for building capability in professional fields: strengthening the model's professionalism in a specific field without degrading its other capabilities (a minimal data-mixing sketch follows).
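
As a starting point for the second requirement, one widely used heuristic is to mix a "replay" fraction of general-corpus data into the domain corpus during continued pre-training, which is commonly reported to mitigate forgetting. The sketch below is a minimal illustration under that assumption; the ratio and toy corpora are invented, not part of the proposition.

```python
import random

def mixed_batches(domain_docs, general_docs, replay_ratio=0.25,
                  batch_size=8, seed=0):
    """Yield continued-pretraining batches that interleave domain documents
    with a replay fraction of general-corpus documents to reduce forgetting.

    replay_ratio is illustrative; in practice it is tuned by monitoring
    both domain benchmarks and general benchmarks.
    """
    rng = random.Random(seed)
    while True:
        batch = []
        for _ in range(batch_size):
            pool = general_docs if rng.random() < replay_ratio else domain_docs
            batch.append(rng.choice(pool))
        yield batch

# Toy usage: roughly 2 of 8 documents per batch come from the general corpus.
domain = ["[e-commerce] OLED vs. LCD laptop screens ...",
          "[e-commerce] matching lipstick shades to skin tone ..."]
general = ["[web] encyclopedia article ...", "[web] news report ..."]
first_batch = next(mixed_batches(domain, general))
```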

1.2 Decomposed reasoning and Q&A for complex tasks

✪ 1.2.1 Technical background

In e-commerce, there is often a gap between how users express their needs and the literal attributes of products. For example, a user who wants to send a gift on September 10 implies two layers of logic: September 10 is Teacher's Day (in China), and such gifts are usually given to teachers; the user's actual need is therefore most likely a Teacher's Day gift for a teacher. Likewise, a user who asks for a refrigerator for a rental apartment is implicitly asking for one that is cheap and takes up little space.

To understand user intent accurately, the model needs a certain amount of common sense about daily life and consumption, and it must use that common sense to reason over the user's expression and align natural language with the objective attributes of products.

At present, large models generally learn domain knowledge from specialized corpora; for example, adding textbooks, question banks, and similar material can greatly improve a model's academic subject abilities. Common sense about daily life, however, is too broad to enumerate and summarize explicitly. Training on general corpora gives the model some common sense, but the boundaries of that capability are uncontrollable and hard to iterate on and improve.

✪ 1.2.2 Technical challenges

  1. The model needs to learn a great deal of common sense about daily life and consumption, which is difficult to express explicitly as triples or declarative statements. When a conventionally trained model falls short of business needs, how can its capability be improved?

  2. Given basic common sense, can the model use it to draw correct inferences? Real problems may involve more than one reasoning step.

  3. The model must judge on its own whether inference is needed and when the inference should stop, unlike mathematical reasoning, which has clear goals.

✪ 1.2.3 Technical requirements

  1. A set of methods to improve the model's mastery of knowledge during pre-training, SFT, and other training stages.

  2. Alignment methods that let the model apply the knowledge it has mastered to analyze, understand, and reason about user needs (see the prompting sketch below).
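
To make the Teacher's Day example concrete, here is a minimal prompting sketch of decomposed reasoning: the model is asked to surface its implicit common-sense steps before mapping the request to product attributes. The prompt wording and the `chat` parameter (any chat-completion function) are illustrative assumptions, not the proposition's required method.

```python
# Minimal decomposed-reasoning sketch; `chat` is a hypothetical stand-in
# for any text-in, text-out chat-completion function.

DECOMPOSE_PROMPT = """You are an e-commerce shopping assistant.
User request: "{query}"
Step 1: List the common-sense facts the request implies.
Step 2: Infer the user's actual shopping need from those facts.
Step 3: Output the product attributes (category, occasion, constraints)."""

def understand_request(chat, query: str) -> str:
    return chat(DECOMPOSE_PROMPT.format(query=query))

# For "I want to send a gift on September 10", step 1 should surface
# "September 10 is Teacher's Day (in China)" and "such gifts usually go
# to teachers" before any product attributes are produced.
```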

1.3 Long-context technology for large models

✪ 1.3.1 Technical background

In business applications of large models, some scenarios run into long-text problems. For example, in retrieval-augmented multi-turn dialogue, if the retrieved information used in each turn must be retained, the context length of the whole conversation grows quickly. Likewise, in tool-calling scenarios, prompts, tool parameters, data formats, tool results, and so on occupy a great deal of context space. For reasons such as training cost, the pre-training stage usually does not use an overly long context window. This raises the proposition of using small-scale continued training, positional interpolation, and similar means to give an already pre-trained model a longer context window (a minimal interpolation sketch follows).
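
One published instance of the "interpolation" mentioned above is RoPE position interpolation (Chen et al., 2023), which rescales position indices so that a longer context falls inside the positional range seen during pre-training, followed by a small amount of continued training. Below is a minimal sketch of just the index rescaling, with illustrative dimensions; it is not Taotian's method.

```python
import torch

def rope_angles(head_dim: int, max_pos: int, base: float = 10000.0,
                scale: float = 1.0) -> torch.Tensor:
    """Return RoPE rotation angles of shape (max_pos, head_dim // 2).

    scale < 1 implements position interpolation: position indices are
    shrunk so a context 1/scale times longer than the trained window
    still maps into the positional range seen during pre-training.
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(max_pos).float() * scale
    return torch.outer(positions, inv_freq)

# Illustrative numbers: extend a model trained on a 4k window to 16k.
angles = rope_angles(head_dim=128, max_pos=16384, scale=4096 / 16384)
```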

✪ 1.3.2 Technical challenges

  1. A model whose context window is extended after the fact performs worse than a model natively trained with long sequences. How can this gap be narrowed?

  2. Methods such as sparsification can equip the model with long-sequence processing capabilities during pre-training, but they may hurt model quality. How can sequence length, training cost, and model quality be balanced?

✪ 1.3.3 Technical requirements

  1. A set of methods to extend the sequence length of an existing pre-trained model.

  2. Methods to support long sequences during pre-training that balance sequence length, training cost, and model quality.

1.4 Overcoming knowledge hallucination

✪ 1.4.1 Technical background

Large models have strong memorization and can retain knowledge seen during training. In actual business, however, we found that large models cannot guarantee the accuracy of their output. For example, if you ask a large model for the specifications of a non-existent mobile phone, it will still give a confident-sounding answer. This kind of knowledge hallucination poses great risks to e-commerce businesses.

From the perspective of information compression, a large model cannot losslessly memorize all the information in its training corpus. Overcoming knowledge hallucination can be divided into two directions:

  • Let the model remember the knowledge it has seen more accurately.

  • Let the model know that it "doesn't know".

✪ 1.4.2 Technical challenges

  1. How to make the model memorize knowledge more accurately. For example, if the training corpus contains a specification library of 3C and digital products, how can the probability of the model fabricating answers to related questions be reduced?

  2. How to let the model understand its own knowledge boundaries and give a clear refusal to questions that exceed them.

✪ 1.4.3 Technical requirements

  1. In pre-training, explore and design training strategies and data strategies so that the model can more accurately memorize the knowledge existing in the corpus.

  2. Through SFT, RLHF, and other means, enable the model to give clear refusals to questions outside its knowledge boundary (a refusal-data sketch follows).
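
As a minimal illustration of the second requirement, refusal behavior is often instilled with SFT pairs in which questions outside the knowledge boundary map to explicit refusals while answerable questions keep direct answers. The example pairs and the file name below are invented for illustration.

```python
import json

# Hypothetical SFT pairs: an answerable question keeps a direct answer,
# while a question about a non-existent product maps to a refusal.
sft_examples = [
    {"prompt": "What is the screen size of the iPhone 15 Pro?",
     "response": "The iPhone 15 Pro has a 6.1-inch display."},
    {"prompt": "What are the specs of the Huawei Mate 99 Ultra?",
     "response": "I'm not aware of a phone called the Huawei Mate 99 Ultra, "
                 "so I can't provide its specifications."},
]

# Write the pairs in the JSONL format commonly used for SFT datasets.
with open("refusal_sft.jsonl", "w", encoding="utf-8") as f:
    for ex in sft_examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```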

1.5 Path decision-making in tool use

✪ 1.5.1 Technical background

Large models accumulate rich world knowledge during pre-training, giving them powerful reasoning and decision-making capabilities in complex general interaction environments. However, for tasks in specific fields, such as Taobao e-commerce, that require domain-specific knowledge and decision logic, large models still have certain limitations.

These problems can only be solved with more professional tools or domain knowledge. Large models therefore need the ability to call various professional tools so as to support real-world tasks more comprehensively and accurately. For example, by calling a text-to-image tool, the large model extends its capability boundary by generating the tool's descriptive prompt; by calling the "Taobao Product Search" tool, it brings more professional, accurate, and timely product knowledge into its answers.

Tool-learning agent development: (figure omitted)

Types of tools: (figure omitted)

✪ 1.5.2 Technical challenges

  1. How to make full use of the model's intent-understanding and reasoning capabilities to select the right tools and provide an interpretable tool-calling path?

  2. A tool's results may come back as documents, tables, structured data, and more. How can the model make full use of this returned knowledge and summarize and integrate it into a response that meets the user's need?

✪ 1.5.3 Technical requirements

  1. Optimize the large model's ability to understand, select, and call tools and to plan tool-calling paths, solving problems with the fewest calls possible.

  2. Study how to enable large models to accept and understand different forms of tool output, such as documents, tables, human feedback, and even images, and use them to improve the model's final results (a ReAct-style sketch follows).
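
The references for this proposition include ReAct [04], whose interleaved thought/action/observation loop is one common way to obtain an interpretable tool-calling path. Below is a minimal illustrative loop; the output conventions and the `taobao_search` stub are assumptions, not Taotian's implementation.

```python
# Minimal ReAct-style loop (after Yao et al. [04]); `llm` is any text-in,
# text-out completion function, and the tool names are illustrative stubs.

def react(llm, question: str, tools: dict, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript + "Thought:")   # model reasons, then acts
        transcript += f"Thought:{step}\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "Action:" in step:                 # e.g. "Action: taobao_search red lipstick"
            name, _, arg = step.split("Action:", 1)[1].strip().partition(" ")
            tool = tools.get(name, lambda a: f"unknown tool: {name}")
            transcript += f"Observation: {tool(arg)}\n"  # feed result back
    return "No answer within the step budget."

tools = {"taobao_search": lambda q: f"top Taobao products for '{q}' ..."}
```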

2. Large model e-commerce application propositions

2.1 Information utilization in retrieval augmentation for large models

✪ 2.1.1 Technical background

In e-commerce applications, we introduce retrieval augmentation to address knowledge hallucination in the e-commerce setting. For example, for the user question "computers costing around 2,000 yuan", we can query Taobao's product library for computers whose real-time prices are around 2,000 yuan and feed the relevant product information to the model, thereby giving the user a more accurate reply. But in introducing retrieval augmentation, we also discovered the following problems:

  1. Because databases are organized in different ways, retrieving the e-commerce information needed to solve a problem can also pull in a large amount of irrelevant information, so the model must extract and summarize the information it actually needs. For example, for the real-time question "Is there lipstick in Li Jiaqi's live broadcast tonight?", retrieval can return the product list for tonight's broadcast, but the model must determine whether lipstick is on that list.

  2. Retrieved business knowledge cannot be guaranteed to be completely correct. The model needs to distinguish correct from incorrect knowledge based on the retrieved text and the user's question, so as to answer correctly.

✪ 2.1.2 Technical challenges

  1. How to reason over a large amount of retrieved information, summarize what is relevant to the question, and use it to generate a reasonable response.

  2. How to enable the model to refuse to use incorrect retrieved information.

✪ 2.1.3 Technical requirements

  1. A retrieval-augmented SFT training method that gives the model both capabilities: summarizing and reasoning over retrieved information, and rejecting incorrect retrieved information (see the sketch below).
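
As one illustrative way to exercise both capabilities at once, retrieved passages can be framed as untrusted evidence that the model must cross-check against the question before answering. The prompt and the `search`/`chat` parameters below are assumptions for illustration, not the required training method.

```python
# Illustrative retrieval-augmented answering that treats evidence as
# untrusted; `search` and `chat` stand in for a product-search API and a
# chat-completion API.

RAG_PROMPT = """Answer the question using ONLY the evidence below.
Passages may be irrelevant or wrong: ignore any passage that conflicts with
the question, and reply "I cannot verify this" if nothing supports an answer.

Question: {question}
Evidence:
{evidence}
Answer:"""

def answer(chat, search, question: str, k: int = 5) -> str:
    passages = search(question, top_k=k)
    evidence = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return chat(RAG_PROMPT.format(question=question, evidence=evidence))
```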

2.2 Tool calling with vague intentions

✪ 2.2.1 Technical background

We have designed different solutions that handle different user intents through different tool-calling paths.

For example, if a user asks "How should I choose a dress?", we first call the "Decision Factor" tool to obtain the dimensions that matter when choosing a dress, and then let the large model give the user reasonable suggestions based on those dimensions.

Another example is the question "Recommend a long dress costing more than 200 yuan." We first search Taobao products and then let the model make recommendations based on the search results.

However, in some cases, such as "recommend a dress", we cannot tell whether the user first wants to know how to choose a dress or simply wants products matching "dress"; the user's true intent is ambiguous. Under such vague intents, the large model needs to learn the correct decision path for solving the user's problem from massive user behavior data, rather than following a fixed set of tools and paths as it would under a clear intent.

✪ 2.2.2 Technical challenges

  1. Users often do not know their own true intent when shopping. How can the model learn, from massive user behavior data, to make statistical decisions under vague intents in the e-commerce domain?

✪ 2.2.3 Technical requirements

  1. Through techniques such as RLHF and CT (continued training), establish a tool-learning method for vague intents that lets large models understand the vague intents of real users (a minimal sketch follows).
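
As a minimal illustration of learning decision paths from behavior data, each candidate tool-calling path can be treated as an arm whose engagement rate is estimated per intent cluster from logged interactions. All names and the reward definition below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical logs: (intent_cluster, tool_path, user_engaged_afterwards).
logs = [
    ("dress_vague", "advise_then_recommend", 1),
    ("dress_vague", "search_directly", 0),
    ("dress_vague", "search_directly", 1),
]

def preferred_path(logs, cluster: str) -> str:
    """Pick the tool path with the highest smoothed engagement rate."""
    stats = defaultdict(lambda: [0, 0])       # path -> [engaged, shown]
    for c, path, engaged in logs:
        if c == cluster:
            stats[path][0] += engaged
            stats[path][1] += 1
    # Laplace smoothing keeps rarely tried paths from being ruled out early.
    return max(stats, key=lambda p: (stats[p][0] + 1) / (stats[p][1] + 2))

print(preferred_path(logs, "dress_vague"))    # -> advise_then_recommend
```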

2.3 Multi-objective professional e-commerce RLHF

✪ 2.3.1 Technical background

After instruction fine-tuning, a pre-trained large model begins to show problem-solving ability, but at this point it is not yet fully aligned with human needs and values, so reinforcement learning from human feedback (RLHF) is used to align it with people. RLHF has two main steps:

  • Training a reward model;

  • Using the reward model to judge the quality of the model's outputs in the current state and updating the model parameters accordingly.

When constructing data for training the reward model, multiple responses to the same question must be ranked manually. The most common practice today is a pairwise [chosen]/[rejected] judgment between two responses, which lets the same amount of human labor cover more questions and increases their diversity. To avoid relying too heavily on human labor, some researchers replace the "H" in RLHF with AI, i.e., RLAIF; currently the most commonly used AI is GPT-4.
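
For reference, the standard objective trained on such [chosen]/[rejected] pairs (as in InstructGPT [03] and Llama 2 [02]) is the pairwise Bradley-Terry loss, -log sigmoid(r_chosen - r_rejected). A minimal PyTorch sketch, where the scalar rewards are assumed to come from a reward-model head:

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(r_chosen: torch.Tensor,
                         r_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: -log sigmoid(r_chosen - r_rejected).

    r_chosen and r_rejected are the scalar rewards the model assigns to
    the preferred and dispreferred responses for the same prompt.
    """
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy batch of reward scores; in practice each score is read from a reward
# head over the final token of the prompt+response sequence.
loss = pairwise_reward_loss(torch.tensor([1.2, 0.3]), torch.tensor([0.4, 0.9]))
```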

Whether the judge is a human or an AI, ranking is relatively easy for general-domain questions, but it becomes much harder for professional questions in e-commerce verticals.

For example:

  • "How about Sony's 70200 gm second generation?" Students who are not in the "3C digital" field may not have heard of this product, so they cannot sort the two different results;

  • "How to choose a lipstick shade?" Students who are not in the field of "beauty" cannot make correct judgments on different answers.

Beyond "3C and digital" and "beauty", e-commerce verticals span many other professional fields, such as "clothing", "sports and outdoor", "mother and baby", and "health". These specialized domains make it challenging to build training data for reward models, and e-commerce RLHF must be optimized holistically for professionalism, accuracy, comprehensiveness, and depth.

✪ 2.3.2 Technical challenges

  1. Automatically build millions of reward-model training examples across the different professional fields of e-commerce;

  2. RLHF in the e-commerce domain needs to be optimized for more objectives, such as professionalism, accuracy, comprehensiveness, and depth.

✪ 2.3.3 Technical requirements

  1. Data level: a methodology for constructing training data for reward models in e-commerce vertical professional fields;

  2. Model level: RLHF method for multi-objective optimization.

2.4 E-commerce query understanding based on generative large models

✪ 2.4.1 Technical background

In e-commerce search, the user enters a query and the system returns suitable products by understanding it. Generally speaking, there are natural differences between the query the user types and the product title the merchant writes: queries are short and read fluently, while titles are long and stuffed with keywords. We need query rewriting to bridge the semantic gap between query and title, and we expect the rewritten query to surface incremental products beyond what the original query retrieves.

Conventional query-rewriting methods include collaborative filtering based on behavior data and similarity-based discrimination over semantic data; they have limited ability to understand natural-language queries and struggle to match long-tail queries to suitable results. Generative rewriting based on large models can use the knowledge inside the model to understand the query better and directly generate suitable rewrites for long-tail queries.

✪ 2.4.2 Technical challenges

  1. In typical large model applications, the system returns the knowledge contained in the model to the user as an answer. In e-commerce query understanding, however, the model must not only use that knowledge to generate a rewrite relevant to the query, but also ensure the rewrite brings incremental products into the search system;

  2. The queries users enter in e-commerce search are complex and changeable. Building a complete understanding of a user's needs from the multiple queries they enter is another challenge.

✪ 2.4.3 Technical requirements

  1. In the e-commerce scenario, build query-rewriting technology that deeply understands queries in the search system and can bring in incremental products;

  2. By modeling user behavior, deepen the understanding of user needs and build personalized query rewriting based on query context (see the prompting sketch below).
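
As a minimal illustration of generative rewriting for long-tail queries, a constrained prompt can push the model toward title-style, retrieval-friendly variants while carrying session context. The prompt and the `chat` parameter are illustrative assumptions, not the proposition's required method.

```python
# Illustrative generative query rewriting; `chat` stands in for any
# chat-completion function.

REWRITE_PROMPT = """You rewrite e-commerce search queries.
Produce {n} keyword-style variants that match how merchants title products,
preserving the user's intent and constraints (price, size, style).
Recent queries this session: {history}
Query: {query}
Rewrites (one per line):"""

def rewrite_query(chat, query: str, history=(), n: int = 3):
    prompt = REWRITE_PROMPT.format(
        n=n, history="; ".join(history) or "none", query=query)
    return [line.strip() for line in chat(prompt).splitlines() if line.strip()]

# e.g. "fridge for a rental flat" might yield "mini fridge small apartment"
# or "compact single-door refrigerator", surfacing incremental products.
```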

2.5 Cognitive recommendation based on the common-sense knowledge of large models

✪ 2.5.1 Technical background

In the e-commerce recommendation scenario, we hope to use Alibaba's self-developed large models to build a next-generation paradigm for user-interest recommendation based on large models, meeting users' diverse shopping needs.

Traditional recommendation algorithms rely heavily on user behavior, finding similar products or similar people through behavioral signals. This approach is very efficient but prone to repetition, and pushing short-term efficiency easily hurts overall discovery. More and more users complain, "Why do you keep recommending this product after I clicked on it once?" or "Seeing the same things every day is so boring." We therefore need ways to break the repetition and give users results that are "unexpected yet reasonable".

And this is exactly where large models excel. We hope to introduce world knowledge and reasoning capabilities through large models, build recommendations consistent with human cognition, enhance the discovery value of results, and improve the existing data feedback loop.
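
As a small illustration of "unexpected yet reasonable", a large model can expand a user's recent interests into adjacent categories, each justified by a one-line piece of common sense that downstream retrieval can then use. The prompt and the `chat` parameter are illustrative assumptions.

```python
# Illustrative common-sense interest expansion for recommendation;
# `chat` stands in for any chat-completion function.

EXPAND_PROMPT = """A shopper recently engaged with: {items}.
Suggest {n} adjacent product categories they have NOT engaged with, each
with a one-line common-sense reason (e.g. camping tent -> portable power
station, because campsites lack power outlets).
Output one "category | reason" per line."""

def expand_interests(chat, recent_items, n: int = 5):
    reply = chat(EXPAND_PROMPT.format(items=", ".join(recent_items), n=n))
    return [tuple(part.strip() for part in line.split("|", 1))
            for line in reply.splitlines() if "|" in line]
```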

✪ 2.5.2 Technical challenges

  1. Express users' e-commerce shopping needs in a reasonable way, aligning the semantic space of natural language with structured e-commerce data;

  2. Apply SFT and RLHF to mainstream large models while balancing quality and serving performance;

  3. Produce increments on top of traditional recommendation results; the added results must be discovery-oriented and surprising while delivering a real improvement in user experience.

✪ 2.5.3 Technical requirements

  1. Explore ultra-large-scale language models, covering data construction, fine-tuning, alignment, and other model-optimization work related to large language model applications;

  2. Apply large language models in Taobao recommendation scenarios, including logical reasoning, intelligent content understanding, and product creative generation;

  3. Establish a new generation of cognitive recommendation algorithms and, combined with product innovation in interactive recommendation, enhance user experience and the long-term value of homepage recommendations.

References

For path decision-making in tool use

[01] Mialon, G., Dessì, R., Lomeli, M., Nalmpantis, C., Pasunuru, R., Raileanu, R., Rozière, B., Schick, T., Dwivedi-Yu, J., Celikyilmaz, A., Grave, E., LeCun, Y., & Scialom, T. (2023). Augmented Language Models: a Survey. 1–33.

[02] Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., Chen, Z., Tang, J., Chen, X., Lin, Y., Zhao, W. X., Wei, Z., & Wen, J.-R. (2023). A Survey on Large Language Model based Autonomous Agents.

[03] Qin, Y., Hu, S., Lin, Y., Chen, W., Ding, N., Cui, G., Zeng, Z., Huang, Y., Xiao, C., Han, C., Fung, Y. R., Su, Y., Wang, H., Qian, C., Tian, R., Zhu, K., Liang, S., Shen, X., Xu, B., … Sun, M. (2023). Tool Learning with Foundation Models. 1–75.

[04] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. 1–33.

[05] Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., & Scialom, T. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools.

[06] Qin, Y., Liang, S., Ye, Y., Zhu, K., Yan, L., Lu, Y., Lin, Y., Cong, X., Tang, X., Qian, B., Zhao, S., Tian, R., Xie, R., Zhou, J., Gerstein, M., Li, D., Liu, Z., & Sun, M. (2023). ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs. 4, 1–23.

For multi-objective professional e-commerce RLHF

[01] RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback (2023)

[02] Llama 2: Open Foundation and Fine-Tuned Chat Models (2023)

[03] Training language models to follow instructions with human feedback (2022)

If you are interested in these propositions, please click to participate in the challenge. We have prepared generous rewards and welcome your participation!
