Large Language Models, Part 6 - Enterprise Private Deployment Architecture for LLMs

In the first half of 2023, building large language model (LLM) applications on top of hosted APIs (such as OpenAI's) greatly shaped the software field.
LangChain and LlamaIndex played an important role in this trend. In the second half of 2023, fine-tuning (or instruction tuning) became far more accessible, and incorporating it into LLMOps workflows has essentially become standard practice in the industry. This trend is driven mainly by the following: 1. thanks to newer fine-tuning methods, Llama 2 can be fine-tuned on a single T4 card, which was unimaginable before; 2. fine-tuning makes it possible to process confidential data inside the company; 3. after fine-tuning, a model has the potential to exceed ChatGPT and GPT-4 on certain specific tasks. LLMOps mainly includes:

  1. LLM fine-tuning: since the release of LLaMA, instruction fine-tuning has become increasingly popular;
  2. Building LLM frameworks: libraries such as LangChain and LlamaIndex handle this, letting you query vector databases, improve the model's memory, or provide various tools;
  3. Inference optimization techniques: as LLMs grow in size, it becomes increasingly important to apply optimization techniques so that models can run inference efficiently. Techniques include weight quantization (4-bit, 3-bit), pruning, knowledge distillation, etc.;
  4. LLM deployment: these models can be deployed locally (e.g., with llama.cpp) or in the cloud (e.g., with Hugging Face's Text Generation Inference or vLLM).
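To make the quantization idea in point 3 concrete, here is a minimal sketch of symmetric 4-bit weight quantization: each weight is mapped to a signed integer in [-8, 7] plus a per-tensor scale. This is illustrative only; production systems use libraries such as bitsandbytes or GPTQ-style methods with per-group scales.

```python
# Toy symmetric 4-bit quantization sketch (illustrative, not a real library).

def quantize_4bit(weights):
    """Map float weights to 4-bit signed codes in [-8, 7] plus one scale."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 7.0                    # one scale for the whole tensor
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_4bit(q, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [qi * scale for qi in q]

weights = [0.12, -0.07, 0.5, -0.49, 0.0]
q, scale = quantize_4bit(weights)
recovered = dequantize_4bit(q, scale)
# The rounding error is bounded by half a quantization step (scale / 2).
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(weights, recovered))
```

The memory win comes from storing the codes in 4 bits each (two per byte) instead of 16 or 32 bits per weight; the scale is a small per-tensor (or per-group) overhead.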

Data security is something every company has to take seriously, yet to improve productivity and reduce costs, companies must also adopt the tools that new technologies bring. Private deployment therefore remains very attractive: a large language model combined with the company's own data can greatly improve productivity.

How can you improve an LLM application when the pretrained LLM does not perform as expected or desired? At present there are roughly two approaches: Retrieval-Augmented Generation (RAG) and model fine-tuning.
RAG integrates retrieval (search) capabilities into LLM text generation. It combines a retrieval system, which fetches relevant document fragments from a large corpus, with an LLM, which uses the information in those fragments to generate answers. Essentially, RAG helps the model "look up" external information to improve its responses. LangChain and LlamaIndex follow the RAG approach.
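The RAG flow above can be sketched end to end in a few lines. This toy version uses bag-of-words cosine similarity as a stand-in for real embeddings, and a `fake_llm` lambda as a stand-in for a real model call; the corpus strings are invented illustration data.

```python
# Minimal RAG sketch: retrieve the most relevant document, then build the
# augmented prompt an LLM would receive. `fake_llm` stands in for a real
# model call (an API or a locally deployed model).
from collections import Counter
import math

def embed(text):
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=1):
    qv = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(qv, embed(d)), reverse=True)
    return ranked[:k]

def rag_answer(query, corpus, llm):
    context = "\n".join(retrieve(query, corpus))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm(prompt)

corpus = [
    "The VPN gateway is vpn.example.internal and requires 2FA.",
    "Vacation requests are filed through the HR portal.",
]
fake_llm = lambda prompt: prompt.splitlines()[1]  # echoes the retrieved context
print(rag_answer("how do I connect to the VPN", corpus, fake_llm))
```

In a real deployment, `embed` would be a sentence-embedding model, the corpus would live in a vector database, and `llm` would call the privately deployed model.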

Issues to be addressed when deploying a private LLM

For end users, what matters in an enterprise's private LLM deployment is the access method and the source of the accessed content:

  • Based on an open-source or self-developed large language model with SFT, the model is used inside the company in the form of an API, an app, or a web plug-in;
  • A knowledge graph/database is built from internal company data plus public industry-related data, and the large language model consults this knowledge graph/database to provide more accurate answers.

The system block diagram of enterprise private deployment is as follows: [Figure: system block diagram of enterprise private deployment]

Correspondingly, the following five aspects need to be considered:

  1. Model training and tuning: train and tune large language models to improve their performance and accuracy.
  2. Dataset cleaning and preprocessing: clean and preprocess the raw dataset to produce a dataset suitable for training large language models.
  3. Model deployment and management: deploy the trained large language model to the production environment, then manage and maintain it.
  4. Performance optimization and scaling: optimize and scale large language models to improve their efficiency and scalability.
  5. Security and privacy protection: secure large language models against issues such as sensitive-information disclosure and hacker attacks.

For model training, there was first Hugging Face's TRL, then Microsoft's DeepSpeed, and continuous improvements keep emerging.
Meanwhile, the data is in the enterprise's hands, and the data-cleaning pipelines used for web-scraped text are not suitable for enterprise data. For enterprises, security and privacy permissions are a big deal. This article first takes a look at knowledge graphs and vector databases.
In traditional relational databases, data is usually organized as tables. The AI era, however, has brought a large amount of unstructured data, including images, audio, and text. Storing such data in tabular form is a poor fit; instead, machine learning models convert it into vector representations of "features". Vector databases emerged to solve the storage and processing of these vectors.

The basis of vector databases is data indexing. With techniques such as inverted-file indexing, vector databases can efficiently perform similarity searches by grouping and indexing vector features. Vector quantization techniques map high-dimensional vectors into lower-dimensional spaces, reducing storage and computation requirements. Using these indexing techniques, vector databases can efficiently support operations such as similarity calculation, nearest-neighbor search, and cluster analysis.
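The "grouping and indexing" idea can be sketched in miniature: assign each vector to its nearest centroid, and at query time scan only the closest group rather than the whole collection. The centroids here are fixed by hand for simplicity; real inverted-file indexes learn them with k-means.

```python
# Inverted-file (IVF)-style indexing sketch: vectors are bucketed under
# their nearest centroid; a query scans only the closest bucket.
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_index(vectors, centroids):
    index = {i: [] for i in range(len(centroids))}
    for v in vectors:
        nearest = min(range(len(centroids)), key=lambda i: dist(v, centroids[i]))
        index[nearest].append(v)
    return index

def search(query, index, centroids):
    nearest = min(range(len(centroids)), key=lambda i: dist(query, centroids[i]))
    bucket = index[nearest]                    # scan only one bucket
    return min(bucket, key=lambda v: dist(query, v)) if bucket else None

centroids = [(0.0, 0.0), (10.0, 10.0)]
vectors = [(0.5, 0.2), (1.0, 1.2), (9.5, 10.1), (10.2, 9.8)]
index = build_index(vectors, centroids)
print(search((9.9, 9.9), index, centroids))    # a vector from the far cluster
```

The trade-off is recall versus speed: scanning one bucket may miss a true nearest neighbor that fell into another bucket, which is why real systems probe several buckets.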

Current large models, trained on massive data, pose several challenges to databases:

  • Accommodate large amounts of data: large-scale generative AI models require large amounts of training data to capture complex semantic and contextual information, so data volumes are exploding. Vector databases, as capable data managers, play a vital role in efficiently handling such volumes.
  • Enable accurate similarity search and matching: text generated by large generative AI models often requires similarity search and matching to provide precise replies, recommendations, or matching results. Traditional keyword-based search may fall short on complex semantics and context. Vector databases shine here, providing high relevance and effectiveness for these tasks.
  • Support multi-modal data processing: large generative AI models go beyond text and can process multi-modal data such as images and speech. As systems capable of storing and processing multiple data types, vector databases effectively support the storage, indexing, and querying of multi-modal data, enhancing their versatility.
    Some familiar databases already support vector-database features.
    SQLite: SQLite is a lightweight embedded database that supports storing large text, binary and multimedia data, and can be queried through SQL statements. SQLite is widely used in mobile applications, but its query performance may be affected by data volume and query complexity.
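SQLite has no built-in vector type, but a common pattern (shown here as a sketch, not a recommendation of a specific extension) is to store embeddings as BLOBs and compute similarity in application code; this is fine for small datasets, while extensions or a dedicated vector database serve better at scale.

```python
# Store embeddings as BLOBs in SQLite and do brute-force cosine similarity
# in application code. The rows and 2-D vectors are toy illustration data.
import sqlite3, struct, math

def pack(vec):
    return struct.pack(f"{len(vec)}f", *vec)

def unpack(blob):
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT, emb BLOB)")
rows = [("reset your password", [0.9, 0.1]), ("office lunch menu", [0.1, 0.9])]
for text, emb in rows:
    conn.execute("INSERT INTO docs (text, emb) VALUES (?, ?)", (text, pack(emb)))

def most_similar(query_vec):
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))
    best = max(conn.execute("SELECT text, emb FROM docs"),
               key=lambda row: cos(query_vec, unpack(row[1])))
    return best[0]

print(most_similar([1.0, 0.0]))   # -> "reset your password"
```

Because every query scans the whole table, latency grows linearly with row count, which is exactly the problem the indexing techniques above address.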

Realm: Realm is a mobile database that supports storage and management of structured and unstructured data, and provides high-performance query and data synchronization functions. Realm supports the use of large language models in mobile applications and can support large datasets through its sharding capabilities.

Realm Database: Realm Database is a cloud database launched by Realm, which supports seamless integration with Realm mobile database and provides cloud data storage and management functions. Realm Database also supports the use of large language models in mobile applications, and can support large datasets through its sharding capabilities.

Mobile databases such as SQLite, Realm, and Realm Database can all support large language models, but the specific support methods and performance may vary. When choosing a database, you need to consider factors such as data volume, query complexity, performance, and security to choose the database system that best suits your needs.

Neo4j is a Graph Database Management System (GDMS) that uses a graph model to store and manage data. Neo4j can be used to store and manage complex relational networks, such as social networks, supply chain networks, and knowledge graphs. Neo4j supports fast graph query and analysis, making it easy to discover relationships and patterns in data.
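The graph-query style Neo4j enables can be illustrated with a small in-memory stand-in. The Cypher line in the comment is an assumed example of what the equivalent query might look like; the triples and relation names (`SUPPLIES`, `PART_OF`) are invented for illustration.

```python
# In-memory sketch of knowledge-graph queries. In Neo4j, the one-hop query
# might look like this Cypher (illustrative):
#   MATCH (c {name: 'AcmeCorp'})-[:SUPPLIES]->(p) RETURN p
triples = [
    ("AcmeCorp", "SUPPLIES", "Widgets"),
    ("AcmeCorp", "LOCATED_IN", "Berlin"),
    ("Widgets", "PART_OF", "Gadgets"),
]

def neighbors(subject, relation):
    """Return all objects linked from `subject` via `relation`."""
    return [o for s, r, o in triples if s == subject and r == relation]

def two_hop(subject, rel1, rel2):
    """Follow two relations, e.g. what AcmeCorp's products are part of."""
    return [o2 for o1 in neighbors(subject, rel1) for o2 in neighbors(o1, rel2)]

print(neighbors("AcmeCorp", "SUPPLIES"))            # ['Widgets']
print(two_hop("AcmeCorp", "SUPPLIES", "PART_OF"))   # ['Gadgets']
```

Multi-hop traversals like `two_hop` are where graph databases outperform relational joins, and they are how an LLM can be grounded in an enterprise knowledge graph.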

MongoDB is a Document-based Database Management System (DBMS) that uses a document model to store and manage data. MongoDB can be used to store and manage various types of data, including structured, semi-structured, and unstructured data. MongoDB is a widely used database system with strong data type support, automatic indexing, high availability, and scalability.

LangChain

LangChain is among the best tools for combining vector databases, vector search, and LLMs. The modules it supports are developing rapidly, and it is likely to remain a top choice for large language model applications (supporting both research and production).
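LangChain's core idea is composing prompt templates, retrievers, and models into pipelines. Since its real API evolves quickly, here is a plain-Python sketch of that composition pattern only; it mirrors the idea, not LangChain's actual classes.

```python
# Plain-Python sketch of the pipeline pattern LangChain popularized:
# composable steps (template -> model -> parser) chained together.

class Chain:
    def __init__(self, *steps):
        self.steps = steps

    def invoke(self, value):
        for step in self.steps:       # pass each step's output to the next
            value = step(value)
        return value

template = lambda q: f"You are a helpful assistant. Question: {q}"
fake_model = lambda prompt: f"ANSWER: ({prompt})"   # stand-in for an LLM call
parser = lambda text: text.removeprefix("ANSWER: ")

chain = Chain(template, fake_model, parser)
print(chain.invoke("What is a vector database?"))
```

Swapping `fake_model` for a call to a privately deployed model, and inserting a retrieval step before the template, turns this skeleton into the RAG architecture described earlier.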

llama_index

Unleash the power of LLMs over your data.

Origin blog.csdn.net/shichaog/article/details/132513385