Building the Next Generation Knowledge Base of Enterprises Based on Intelligent Search and Large Models: Practical Scenarios in Manufacturing/Finance/Education/Medical Industries...

Thank you for reading the blog series "Building the Next Generation Knowledge Base of Enterprises Based on Intelligent Search and Large Models". The series consists of five articles that systematically introduce how large language models can empower traditional knowledge base scenarios and help industry customers reduce costs and increase efficiency. The articles are listed below:

▪ Part 1 "Introduction to Typical Practical Scenarios and Core Components"

▪ Part 2 "Hands-on Rapid Deployment Guide"

▪ Part 3 "LangChain Integration and Its Application in E-commerce"

▪ Part 4 "Practical Scenarios in Manufacturing/Finance/Education/Medical and Other Industries" (this article)

▪ Part 5 "Integrating with Amazon Kendra" (coming soon)

Background

The first three articles in this series introduced the core components, the quick deployment guide, and LangChain integration with its application scenarios in e-commerce. This article goes deeper into specific industries and introduces scenarios and implementation cases:

▪ General scenario: Q&A based on the enterprise's internal knowledge base, such as IT/HR information

▪ Manufacturing industry: Q&A on the equipment maintenance knowledge base and after-sales customer service

▪ Financial industry: intelligent customer service and intelligent report generation

▪ Education industry: intelligent Q&A robots for students and schools

▪ Medical industry: information retrieval of medical papers

The architecture of the solution is shown below:

881a5dedb88cff88327b6c08df634b41.png

General scenario: Q&A based on the enterprise's internal knowledge base

For example, Q&A on IT/HR information

In this scenario, enterprises can build a corporate knowledge base from IT manuals, employee handbooks, sales manuals, and similar documents. The users are internal employees; the knowledge base helps them find information faster and thereby work more efficiently.

When an employee asks a question related to the employee handbook, the search engine first retrieves the relevant passages. The LLM then extracts, filters, and summarizes those passages and gives a direct answer to the question.
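The retrieve-then-summarize flow described above can be sketched in a few lines. This is a minimal illustration, not the solution's implementation: the word-overlap retriever stands in for the real search engine, `llm` is any callable you supply, and the names `retrieve`, `build_prompt`, `answer`, and `HANDBOOK` (with its made-up sentences) are assumptions introduced here.

```python
import re
from typing import Callable, List, Set

# Made-up handbook sentences, used only to exercise the sketch.
HANDBOOK = [
    "Employees receive 10 days of annual leave after one year of service.",
    "Office hours are 9:00 to 18:00 on working days.",
    "Laptops are replaced every three years by the IT department.",
]


def tokenize(text: str) -> Set[str]:
    """Lowercased word set; a toy stand-in for a search engine's analyzer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, passages: List[str], top_k: int = 2) -> List[str]:
    """Rank passages by word overlap with the question; drop non-matches."""
    q = tokenize(question)
    scored = [(len(q & tokenize(p)), p) for p in passages]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def build_prompt(question: str, context: List[str]) -> str:
    """Put the retrieved passages in front of the question for the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the passages below.\n"
        f"Passages:\n{joined}\n"
        f"Question: {question}"
    )


def answer(question: str, passages: List[str], llm: Callable[[str], str]) -> str:
    """Retrieve first, then let the LLM extract, filter, and summarize."""
    return llm(build_prompt(question, retrieve(question, passages)))
```

Passing `llm=lambda prompt: prompt` returns the assembled prompt itself, which is a convenient way to inspect what the model would receive before wiring in a real model endpoint.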

Example 1: Query annual leave time

6fab0d7ef21d9d2a05488dd4071ef34e.png

Example 2: Query commute time

39b54c65e6289804d58ee3bc714cd08c.png

Manufacturing Industry

Industry Scenario

Manufacturing is a relatively traditional industry with a large body of documents accumulated over its history. However, because most enterprises are still in the early stages of digital transformation, they cannot use these documents effectively. Their main need is therefore an enterprise-level knowledge base platform that puts scattered documents to work to improve operational efficiency. For example, as the industry has developed, enterprises pay increasing attention to equipment maintenance and upkeep. A question answering system built on an equipment maintenance knowledge base can provide real-time maintenance guidance and help operators and maintenance personnel resolve faults and technical problems. After-sales customer service is equally critical to quality customer support: equipment failures and technical issues can have a significant impact on customers' production lines, so responding to and resolving issues quickly is essential.

Customers choose this solution for three reasons:

1. In manufacturing, many documents contain specialized knowledge, and every description must be rigorous. The hallucination problem of large language models leads to unreliable output and introduces additional uncontrollable risks;

2. Every answer must be traceable to a specific source, to avoid deviations in the content generated by the large language model;

3. There is a large amount of sensitive data, including maintenance records and mechanical design drawings. Calling a large language model through a third-party API may leak this data, creating compliance violations and security risks.

Typical usage scenarios are Q&A on the equipment maintenance knowledge base and after-sales customer service.

Industry Scenario Practice

Equipment maintenance knowledge base Q&A and after-sales customer service

In this scenario, enterprises can build a knowledge base from historical maintenance records (failure symptoms, failure causes), maintenance manuals, user manuals, and similar documents. The users are front-line maintenance engineers or after-sales customer service staff. By combining retrieval with a large language model, the system can take the failure symptom a user describes and directly give a specific cause analysis.

Example 1: Equipment maintenance scenario - asking why a part has rusted (Chinese scenario)

745705c1fcce6078951e1a72995f8bdd.png

Example 2: Product after-sales scenario - asking what a certain indicator light status means (English scenario)

10478d79798574804355a3671a4011b9.png

Data source: Midea Dishwasher Product Brochure

https://www.midea.com/content/dam/midea-aem/us/dishwashers/built-in-dishwashers/45-dba-dishwasher-with-extended-dry-in-stainless-steel-mdt24h3ast/Dish%20MDT24H3AST%20UM_FINAL.pdf

Financial Industry

Industry Scenario

The financial industry comprises several sub-verticals: banking, insurance, capital markets, and payments. With a knowledge base built on intelligent search and large models, banks can answer customer questions quickly and accurately and provide personalized financial product recommendations and investment advice. Insurance institutions can help users quickly find insurance products that suit their needs and understand insurance terms and claims procedures. Capital market participants can help investors quickly obtain and understand information such as market dynamics, company financial data, and analysis reports. Payment institutions can establish an intelligent customer service system that helps users quickly solve payment-related problems.

Customers choose this solution for three reasons:

1. All descriptions in the financial industry must be rigorous, and the data must be accurate. The hallucination problem of large language models leads to unreliable output, which can seriously damage the corporate image and cause customer loss;

2. Financial institutions (such as banks and insurers) provide consulting services, and their responses must be traceable to a specific source. Content about laws and regulations in particular must be completely consistent with the regulatory documents;

3. Financial data contains a large amount of sensitive information, including transactions, corporate revenue, internal assets, and personal information. Using a public large language model may inadvertently leak this data, creating compliance violations and security risks.

Typical usage scenarios are intelligent customer service and intelligent report generation.

Industry Scenario Practice

Intelligent customer service

Intelligent customer service has a wide range of applications and scenarios in the financial industry, including:

▪ Product and service consultation: Intelligent customer service helps customers look up and understand the products and services that financial institutions provide. Using natural language processing and machine learning, it can answer questions about financial product features, interest rates, fees, and so on, and provide customers with personalized product consultation.

▪ Transaction guidance and operation support: Intelligent customer service can guide customers through financial transactions such as transfers, deposits, and purchases of wealth management products. Customers interact with it to obtain step-by-step guidance, which makes transactions more convenient and accurate.

▪ Complaints and problem solving: Intelligent customer service can handle customer complaints and problems and provide corresponding solutions. By analyzing and classifying customer problems, it can answer common questions quickly and hand more complex problems over to human customer service, improving both problem-solving efficiency and customer satisfaction.

Example 1: Consultation on financial products

When a user asks a question about the analysis of financial product revenue data (as in the example below), the search engine retrieves the relevant corpus and passes it to the large language model as input for summarization.

9329df59bda8a927b72d10368ff70cac.png

Example 2: Consultation on financial expertise

Some financial knowledge (such as GDRs and depositary receipts) is highly specialized and difficult to understand. Traditional customer service staff often cannot quickly digest and organize this material and draw conclusions from it to respond to such inquiries, which results in a poor user experience. At the same time, answers on specialized topics must come from accurate and rigorous materials, so the source of the reference material is also an important indicator in this scenario. The intelligent search and large model solution improves the quality of content summaries and lists clear data sources, accurate to the sentence and paragraph of the document.

6cb42ff97942c8f21ea231a10800a328.png
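Listing sources "accurate to the sentences and paragraphs of the document" requires the index to remember where each sentence came from. The sketch below is one hedged way to do that: it splits documents on blank lines and sentence-ending punctuation and ranks sentences by word overlap. The splitting rules, the `Citation` type, the function names, and the sample text are all assumptions for illustration, not the solution's actual indexing logic.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Citation:
    doc_id: str
    paragraph: int  # 1-based position of the paragraph in the document
    sentence: int   # 1-based position of the sentence in the paragraph
    text: str


def words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def index_documents(docs: Dict[str, str]) -> List[Citation]:
    """Split every document into sentences and remember their location."""
    records = []
    for doc_id, body in docs.items():
        paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
        for p_no, para in enumerate(paragraphs, start=1):
            sentences = re.split(r"(?<=[.!?])\s+", para)
            for s_no, sent in enumerate(sentences, start=1):
                records.append(Citation(doc_id, p_no, s_no, sent.strip()))
    return records


def find_sources(query: str, records: List[Citation], top_k: int = 3) -> List[Citation]:
    """Return the best-matching sentences together with their locations."""
    q = words(query)
    scored = [(len(q & words(r.text)), r) for r in records]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [r for _, r in scored[:top_k]]


def format_citation(c: Citation) -> str:
    return f"[{c.doc_id}, paragraph {c.paragraph}, sentence {c.sentence}] {c.text}"


# Illustrative text only; "gdr_primer.txt" is a hypothetical file name.
DOCS = {
    "gdr_primer.txt": (
        "A global depositary receipt is a certificate issued by a depositary bank. "
        "It represents shares in a foreign company.\n\n"
        "Holders may receive dividends through the depositary bank."
    )
}
```

Calling `format_citation` on each hit from `find_sources` yields a line that can be appended under the generated answer, so a reader can check the claim against the original paragraph.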

Intelligent Report Generation

In the financial industry, and in capital markets especially, analysts at securities firms and at secondary market institutions must read and analyze large amounts of data and reports while also producing many types of reports, such as industry research, individual stock analysis, market analysis and outlook, and investment proposal analysis. They face the following pain points:

▪ Time pressure: Analysts usually need to complete a large amount of report writing work in a short period of time to meet the needs of customers and markets for instant information. This puts time pressure on them, which can lead to compromised quality and depth of reporting.

▪ Data collation and processing: Writing reports requires analysts to collect, collate and process large amounts of market data, financial data and news information from various sources. Manually processing and organizing this data can be time-consuming, labor-intensive, and error-prone.

▪ Analysis and interpretation of complex data: Analysts need to understand and interpret complex financial data, financial indicators, and market trends in depth. This requires extensive research and analytical work in order to provide an accurate and comprehensive analysis and assessment.

▪ Information acquisition and update: Analysts need to constantly track and acquire the latest market information, industry trends, and company announcements. Acquiring and updating information can be difficult and time-consuming, especially when the sources are numerous and dispersed.

▪ Language and reporting style: Writing high-quality reports requires good language skills and a clear reporting style. Analysts may struggle with language and with conveying complex financial concepts and data to readers concisely and clearly.

The intelligent search and large model solution can ease these pain points in two areas: collating and understanding information, and generating a basic report.

The following example uses crude oil, a commodity. The task is to write a report on "risks brought about by rising crude oil":

a5904c159888bc11b881c6a2d524079e.png

6830d2217b192b1d5bd1f584db3dbc5a.png

The user submits task guidelines, including (but not limited to): 1) a task description; 2) the format, title, and paragraphs required in the article; 3) the content and subject of each subsection. The intelligent search engine first retrieves relevant content from the loaded data, passes it to the large language model, and asks the model to generate output according to the guidelines. The resulting report serves as a first draft that the report writing and analysis team can refine, which improves report production efficiency.
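One way to picture the task guidelines is as a small structured object that is merged with the retrieved content into a single prompt. The sketch below assumes such a structure; the `ReportGuideline` fields, the prompt wording, and the section names in the usage note are hypothetical, and the actual prompt format used by the solution is not given in this article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReportGuideline:
    task: str                 # 1) task description
    title: str                # 2) title required in the article
    sections: List[str]       # 3) subsection subjects, in order
    output_format: str = "plain text"  # 2) required format


def build_report_prompt(guide: ReportGuideline, retrieved: List[str]) -> str:
    """Merge the guideline and the retrieved passages into one LLM prompt."""
    lines = [
        f"Task: {guide.task}",
        f"Title: {guide.title}",
        f"Format: {guide.output_format}",
        "Sections (write them in this order):",
    ]
    lines += [f"{i}. {name}" for i, name in enumerate(guide.sections, start=1)]
    lines.append("Use only the reference material below:")
    # Number the passages so the draft can point back to its sources.
    lines += [f"[{i}] {text}" for i, text in enumerate(retrieved, start=1)]
    return "\n".join(lines)
```

For the crude oil example, a guideline might use the report title quoted above with hypothetical sections such as "Market overview", "Key risks", and "Outlook", while `retrieved` holds whatever passages the search engine returns from the loaded data.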

Data sources:

1) Financial reports;

https://data.eastmoney.com/notices/stock/600519.html

2) Public sample data on specialized financial knowledge;

https://www.kwm.com/cn/zh/insights/latest-thinking/a-preliminary-study-on-the-operation-mechanism-and-issuance-of-global-depositary-receipts.html

3) Reference data for report generation (commodities);

https://oil.chem99.com/

Education Industry

Industry Scenario

The industry scenarios for education and smart education products can be viewed from two perspectives: that of schools and teachers, and that of students and parents.

Schools/teachers: This solution provides innovative online education tools, such as AI Class Bot, that help schools and training institutions quickly set up online courses, improve teaching quality and efficiency, and save teaching resources and costs. It also reduces the burden of curriculum design and tutoring, broadens teaching content and formats, and strengthens teaching innovation and competitiveness.

Students/parents: An intelligent tutoring system built on this solution can generate learning content and methods suited to each student's level and progress, and adaptively generate questions and explanations of different difficulties and types, achieving individualized teaching and adaptive education. An intelligent Q&A system between schools and parents can also be built to help parents understand their children's learning situation and needs and to provide more learning support and guidance.

In the education industry, customers choose this solution for three reasons:

1. Course content can be imported into the knowledge base quickly and conveniently, and the large language model turns it into a course question-answering robot. Combined with digital human technology, the solution can also provide multi-round dialogue, making the educational process more engaging.

2. The user feedback function, implemented with AI/ML technology, lets each student's positive feedback adjust the weights of search results in real time. This optimizes the student's own knowledge base model and supports the goal of adaptive learning.

3. Information the school already holds and information scattered across the Internet, including various unstructured and semi-structured data, can be unified in the knowledge base so that parents can find the information they want more quickly.
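The feedback-driven weighting in point 2 can be illustrated with a small per-user reranker. This is only a sketch under stated assumptions: the article does not describe how feedback changes the weights, so the multiplicative `boost`, the in-memory storage, and the `FeedbackRanker` class are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class FeedbackRanker:
    """Boost search results a given user has marked as helpful."""

    def __init__(self, boost: float = 0.2) -> None:
        self.boost = boost  # hypothetical weight added per positive vote
        self._likes: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def record_positive(self, user_id: str, doc_id: str) -> None:
        """Store one positive vote from this user for this document."""
        self._likes[user_id][doc_id] += 1

    def rerank(self, user_id: str, results: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
        """Scale each base score by the user's own votes, then sort descending."""
        likes = self._likes.get(user_id, {})
        adjusted = [
            (doc_id, score * (1.0 + self.boost * likes.get(doc_id, 0)))
            for doc_id, score in results
        ]
        return sorted(adjusted, key=lambda item: item[1], reverse=True)
```

Because votes are stored per user, one student's feedback changes only that student's ranking, which matches the idea of each student optimizing their own knowledge base model.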

Typical scenarios are Q&A robots for students and Q&A robots for schools.

Industry Scenario Practice

Example 1: Q&A robot for students (AI Class Bot) - an AI customer service robot for learning English words

For English word learning, we import the existing FAQ knowledge base into the solution. The knowledge base file contains many customer problems and solutions encountered while learning English words. The data is imported into the knowledge base system through the knowledge base upload function.

In this example, the customer service robot must answer only within the scope of the knowledge base. If a question falls outside that scope, it must reply "this question cannot be answered based on known knowledge"; in other words, it must avoid the large language model hallucination problem. An ordinary large language model answers user questions with a degree of creativity, and the temperature parameter controls how creative it is. But even a very low temperature does not guarantee that the model will not improvise an answer.

To meet this requirement, the solution adds a confidence (evidence) check: it calculates the similarity between the answer given by the large language model and both the user's question and the knowledge base search results. When the confidence is too low, the robot declines to answer the question, as shown below:

3232c04bd10befb2b5eb585aacbb8136.png
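A confidence check of this kind can be sketched as follows. The article does not say which similarity measure, combination rule, or threshold the solution uses, so everything here is an assumption: bag-of-words cosine similarity stands in for whatever embedding model is deployed, the two similarities are simply averaged, and `threshold=0.3` is an arbitrary illustrative value.

```python
import math
import re
from collections import Counter
from typing import List

FALLBACK = "This question cannot be answered based on known knowledge."


def bow(text: str) -> Counter:
    """Bag-of-words vector; a toy stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def confidence(question: str, answer: str, passages: List[str]) -> float:
    """Average of answer-to-question and best answer-to-evidence similarity."""
    ans = bow(answer)
    to_question = cosine(ans, bow(question))
    to_evidence = max((cosine(ans, bow(p)) for p in passages), default=0.0)
    return (to_question + to_evidence) / 2


def guarded_answer(question: str, answer: str, passages: List[str],
                   threshold: float = 0.3) -> str:
    """Return the LLM answer only if it is supported well enough."""
    if confidence(question, answer, passages) >= threshold:
        return answer
    return FALLBACK
```

An answer that shares no vocabulary with the question or the retrieved passages scores 0 and is replaced by the fallback sentence. In practice the threshold would need tuning against real questions, because a value that is too high also rejects correct answers.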

Some questions are within the scope of the knowledge base, and the Q&A robot can answer them, as shown in the figure below:

702beeca04ddff9a67a5887b346b0f4f.png

Example 2: Q&A robot for schools (AI School Bot) - a Q&A robot for school applications

Parents of students facing the high school and college entrance examinations tend to be anxious. They need more information about schools so they can weigh it against their children's academic performance and choose a suitable school and a future major. The following is a Q&A scenario about middle school information. We imported data from only a few international schools into the knowledge base. We want the Q&A robot to answer questions within the scope of the knowledge base and to give the confidence of each answer, as shown below:

b7b4c58e6eef121e6fca35b9e34b46c9.png

When inquiring about the curriculum information of an international school, the Q&A robot will answer as follows:

408e4e07599b079a33371e3d158ffddf.png

Medical Industry

Industry Scenario

The medical industry holds a large number of documents, including sensitive information such as drug clinical research data, patient health data, and drug research experiment data, as well as many public datasets such as genetic data and medical papers. However, although the industry has a long history, many hospitals and enterprises are still in the early stages of digital transformation and face problems such as large data volumes, inconsistent formats, and data that is difficult to read and understand. Lowering the threshold for using medical data has long been an important direction in the digital transformation of healthcare. Specifically:

▪ Drug R&D: Provide pharmaceutical companies with a drug design knowledge base that integrates public papers and internal documents on drug design. Researchers can use keywords to quickly look up pharmacological activity, site of action, toxicology, applicable pathology, and other information, which helps companies iterate faster, improve R&D efficiency, reduce R&D costs, and raise the overall project success rate.

▪ Medical knowledge base: Integrate data sources such as FAQ consultation data, drug instructions, patient medical records, medical guidelines, medical books, medical papers, professional websites, and expert input to build a private "disease-symptoms-drugs-diagnosis-population" knowledge base, together with a knowledge base-based virtual assistant that acts as an intelligent medical and health expert.

A typical scenario is information retrieval of medical papers.

Industry Scenario Practice

Medical Papers Information Retrieval

In this demonstration, we selected the most commonly used NCBI dataset from the public datasets of Amazon Cloud Technology and took a 2023 subset as sample data for testing.

For the convenience of testing, we cleaned a set of papers related to blood diseases to serve as the test dataset and asked common questions about blood diseases. The platform recalls content from the dataset and generates a response according to the prompt. Because the medical papers are all in English, we used an open source large model that is strong in English for this test.

1a7821f2aa83d2cc7fb4cac0944f1cc9.png

Because the papers vary widely and differ in aspects such as their history, in practice you may need to try different keywords and phrasings to recall the results best suited to your scenario.

For information that is not in the knowledge base, the platform replies "Not found answer" or "I don't know". This ensures that in medical and life science scenarios the platform does not reply with invalid data when information is unconfirmed.

e7f99af406907bdc824a05186b6bf61c.png

Because the volume of paper data is huge and old and new data may conflict, we recommend cleaning papers, internal research data, and any other data you plan to use once in advance, according to your situation and the needs of your scenario. For example, keep only the latest data. This makes the recalled data a better fit for your needs.
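As one concrete form of the "keep the latest data" cleanup suggested above, the sketch below collapses multiple versions of the same record to the most recent one before loading. The `Record` fields and the identifiers in the usage note are hypothetical; real paper metadata would need to be mapped onto them, and other cleanup rules may matter more for your data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Record:
    record_id: str   # hypothetical stable identifier shared by all versions
    updated: date    # date of this version
    text: str


def keep_latest(records: Iterable[Record]) -> List[Record]:
    """Keep only the most recently updated version of each record."""
    latest: Dict[str, Record] = {}
    for rec in records:
        current = latest.get(rec.record_id)
        if current is None or rec.updated > current.updated:
            latest[rec.record_id] = rec
    # Sort by identifier so the output order is deterministic.
    return sorted(latest.values(), key=lambda r: r.record_id)
```

For instance, if "paper-001" appears once dated 2021 and once dated 2023, only the 2023 version reaches the knowledge base, so recall cannot surface the superseded text.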

Data source: public dataset

https://registry.opendata.aws/ncbi-pmc/

Summary

In the daily use of large language models, two prominent problems cannot be ignored:

▪ Hallucination

▪ Data Leakage

Hallucination is a fundamental problem in natural language processing: the output of a text generation model contains content that conflicts with the input facts, so the results may include fictional or fabricated facts. Data leakage occurs when users, deliberately or inadvertently, pass sensitive data involving business secrets, personal privacy, or enterprise management into the large language models available on the market. The architecture of this solution can effectively address both problems.

In summary, combining intelligent search with large language models makes it possible to build generative AI (GAI) applications for different industry scenarios:

▪ Manufacturing industry: An intelligent enterprise knowledge base integrates the industry's varied data and, with AI technology, quickly turns it into credible and accurate knowledge resources for internal employees, greatly improving the efficiency with which they obtain professional information.

▪ Financial industry: The solution targets scenarios in the financial industry that previously could only be handled manually, uses AI/ML technology to reduce costs and increase efficiency for financial customers, and helps customers explore more business scenarios, accelerating AI/ML-driven business innovation while keeping data secure, reliable, and under control.

▪ Education industry: GAI technology can generate learning content and methods suited to each student's level and progress, and adaptively generate questions and explanations of different difficulties and types, achieving individualized teaching and adaptive education.

▪ Medical industry: Generative AI technology makes it easier for life science workers to obtain and extract the clinical and scientific information they need from massive amounts of knowledge, so as to better protect everyone's health.

About the authors

Xiong Junfeng

Industry Solution Architect at Amazon Cloud Technology, focusing on AI/ML, manufacturing, and healthcare. Previously worked at Tencent for three years, responsible for the product architecture design and image processing algorithm development of the National Medical Imaging Cloud Platform. Research interests include large language models and computer vision. Has published 13 papers in SCI journals and international conferences as first author and applied for 11 invention patents as first inventor.

Chen Weiming

Financial Industry Solution Architect at Amazon Cloud Technology, with more than 10 years of experience in team management, development, and delivery at foreign banks. Focuses on building and implementing solutions for the financial industry (capital markets, insurance, banking). Previously led an intelligent financial crime risk management and transaction anti-fraud platform, with work spanning machine learning, artificial intelligence, DevOps, cloud computing, and cloud native architecture design and implementation.

Shi Feng

Senior Solution Architect at Amazon Cloud Technology, responsible for the design and implementation of solutions in the education and transportation industries. He worked at Alibaba Cloud for six years, leading the technical teams of large-scale projects in transportation, government, the Olympic Games, sports media, and other areas, and at IBM and Oracle for 10 years. He has rich practical experience across industries in cloud computing, big data, artificial intelligence, the Internet of Things, and the Metaverse.

Qian Kai

Solution Architect Manager, responsible for consulting and architecture design of Amazon Cloud Technology's cloud computing solutions, with extensive experience in cloud operation and maintenance services, HPC, and related areas. Before joining Amazon Cloud Technology, he worked at HP and Citrix for many years and is familiar with traditional virtualization, virtual desktops, Microsoft Windows, and other products.

Zhao Anbei

Solution Architect at Amazon Cloud Technology, responsible for solution consulting and design based on the Amazon cloud platform, and a member of the machine learning TFC. He has extensive practical experience in data processing and modeling, with a special focus on the engineering and application of machine learning in the medical field.

Origin blog.csdn.net/u012365585/article/details/132033669