AI search of private data using LangChain and Elasticsearch

All the code for this blog post can be downloaded from GitHub: liu-xiao-guo/python-vector-private

In this blog post I will delve into the deep waters of artificial intelligence and vector embeddings. ChatGPT is eye-opening, but there is one major problem: it is a closed, hosted system. After two decades of living in a world transformed by large dot-com companies, we worry about our private information, and even our knowledge, becoming the property of others simply because we use the Internet. As participants in an economy built on competition, we have a deep distrust of knowledge and data being concentrated in the hands of companies with a history of anti-competitive behavior.

So the question at hand is: can I run a local large language model and generative AI chat on my laptop without using a cloud service? This article shows how to deploy a model locally and use Elasticsearch for vector search.

What am I going to build?

In short: I will build an AI chatbot that "knows" things its pre-trained neural network does not, by combining an LLM with a vector store.

I'll start with a common tutorial-style project that is best described as "talking to my books." I didn't invent this; a quick search turned up several other approaches, which generally rely on the paid OpenAI API.

I'll start with the first piece of private data I want to search: a book that can be downloaded directly from the Internet. I converted it into a simple .txt file and saved it as data/sample.txt in the project.

How this will work

Key steps:

  • The text is split into small, paragraph-sized chunks, enough to hold the answers to questions about the book.
  • Each chunk is passed to the sentence transformer, which generates a dense vector representing its semantic meaning in the form of a vector embedding.
  • The chunks and their vector embeddings are stored in Elasticsearch.
  • When a question is asked, a vector is created for the question text and Elasticsearch is queried to find the chunk of text semantically closest to the question; presumably that text contains the answer.
  • A prompt is composed for the LLM that "stuffs in" the retrieved chunk as additional contextual knowledge.
  • The LLM creates an answer to the question (a short sketch of this flow follows this list).
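In pseudocode, and ignoring the setup that is shown piece by piece later in this post, the last three steps boil down to something like this (a sketch; vector_store and llm_chain are placeholders for the objects built below):

# sketch: retrieve the chunk closest to the question, then let the LLM answer with it as context
similar_docs = vector_store.similarity_search(question)
best_chunk = similar_docs[0].page_content
answer = llm_chain.run(context=best_chunk, question=question)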

The reason the text needs to be divided into chunks of a limited size is as follows:

In this example, I will use the sentence-transformers/all-mpnet-base-v2 model. According to the model description, the maximum input length of this model is 384 tokens.

In other words, we need to divide the book into chunks of no more than 384 tokens.
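If you want to double check that a chunk fits, here is a small sketch of my own (not part of the project code) that counts tokens with the model's tokenizer:

from transformers import AutoTokenizer

# load the same tokenizer the sentence transformer uses
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-mpnet-base-v2")

chunk = "Some paragraph-sized chunk of the book ..."
num_tokens = len(tokenizer.encode(chunk))
print(f"{num_tokens} tokens - fits in the model: {num_tokens <= 384}")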

Langchain makes everything easy

There is a great Python library called Langchain that not only contains utilities that make working with transformers and vector stores simple, but also makes them somewhat interchangeable. Langchain has more advanced ways of working with LLMs than what I'm using here, but this is a good first test.

Many Langchain tutorials require a paid OpenAI account. OpenAI is great and probably leads the LLM quality race right now, but for all the reasons mentioned above I'll use free Hugging Face models and Elasticsearch.

Get some offline models

You must create an account on Hugging Face and obtain an API key. After that, you can pull models down programmatically using the huggingface_hub Python library or Langchain. Before running the code, you must first type the following command in your terminal:

export HUGGINGFACEHUB_API_TOKEN="YOUR TOKEN"
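Alternatively (my own sketch, not from the original project), the token can be set from Python before the Langchain and Hugging Face modules are used:

import os

# replace the placeholder with your own Hugging Face token
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "YOUR TOKEN"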

We use the following models in our example: sentence-transformers/all-mpnet-base-v2 for the embeddings and google/flan-t5-large for the LLM.

Install

If you have not installed your own Elasticsearch and Kibana yet, please refer to the official installation guides.

During installation, we follow the Elastic Stack 8.x installation guide. By default, access to the Elasticsearch cluster is secured with HTTPS.

Generate certificate

In order for the Python application to access Elasticsearch, we use the following commands to generate a PEM certificate:

$ pwd
/Users/liuxg/elastic/elasticsearch-8.10.0
$ ./bin/elasticsearch-keystore list
keystore.seed
xpack.security.http.ssl.keystore.secure_password
xpack.security.transport.ssl.keystore.secure_password
xpack.security.transport.ssl.truststore.secure_password
$ ./bin/elasticsearch-keystore show xpack.security.http.ssl.keystore.secure_password
GcOUL8b2RxKooxJU-VymFg
$ openssl pkcs12 -in ./config/certs/http.p12 -cacerts -out ./python_es_client.pem
Enter Import Password:
Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:
Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:
$ ls
LICENSE.txt          bin                  jdk.app              modules
NOTICE.txt           config               lib                  plugins
README.asciidoc      data                 logs                 python_es_client.pem

We copy the python_es_client.pem file generated above to the root directory of the application. The directory structure of the entire application is as follows:

$ tree -L 3
.
├── README.md
├── app-book.py
├── data
│   └── sample.txt
├── lib_book_parse.py
├── lib_embeddings.py
├── lib_llm.py
├── lib_vectordb.py
├── python_es_client.pem
├── requirements.txt
└── simple.cfg

Configuration items

As shown above, we have a configuration file called simple.cfg:

simple.cfg

ES_SERVER: "localhost" 
ES_PASSWORD: "vXDWYtL*my3vnKY9zCfL"
ES_FINGERPRINT: "e2c1512f617f432ddf242075d3af5177b28f6497fecaaa0eea11429369bb7b00"

We need to configure ES_SERVER according to the address of our Elasticsearch server, and we need the password of the elastic superuser, which is shown when Elasticsearch is installed. Of course, you can also use another user's credentials; if so, you need to make the corresponding configuration and code changes.

You can also get the fingerprint from Kibana's configuration file config/kibana.yml:

Run the project

Before running the project, you need to install the dependencies:

python3 -m venv env
source env/bin/activate
python3 -m pip install --upgrade pip
pip install -r requirements.txt

Create an embedding model

lib_embeddings.py

## for embeddings
from langchain.embeddings import HuggingFaceEmbeddings

def setup_embeddings():
    # Huggingface embedding setup
    print(">> Prep. Huggingface embedding setup")
    model_name = "sentence-transformers/all-mpnet-base-v2"
    return HuggingFaceEmbeddings(model_name=model_name)
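As a quick sanity check (my own sketch, not part of the project code), you can embed a sentence and inspect the size of the resulting vector; all-mpnet-base-v2 produces 768-dimensional embeddings:

from lib_embeddings import setup_embeddings

hf = setup_embeddings()
vector = hf.embed_query("Tell me about the book")
print(len(vector))   # expect 768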

Create vector storage

lib_vectordb.py

import os
from config import Config

## for vector store
from langchain.vectorstores import ElasticVectorSearch

def setup_vectordb(hf,index_name):
    # Elasticsearch URL setup
    print(">> Prep. Elasticsearch config setup")

    
    with open('simple.cfg') as f:
        cfg = Config(f)
    
    endpoint = cfg['ES_SERVER']
    username = "elastic"
    password = cfg['ES_PASSWORD']
    
    ssl_verify = {
        "verify_certs": True,
        "basic_auth": (username, password),
        "ca_certs": "./python_es_client.pem",
    }

    url = f"https://{username}:{password}@{endpoint}:9200"

    return ElasticVectorSearch( embedding = hf, 
                                elasticsearch_url = url, 
                                index_name = index_name, 
                                ssl_verify = ssl_verify), url
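Used together with the embedding helper, the vector store can be created like this (a usage sketch; the index name is just an example):

from lib_embeddings import setup_embeddings
from lib_vectordb import setup_vectordb

hf = setup_embeddings()
index_name = "book_index"            # example index name
db, url = setup_vectordb(hf, index_name)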

Create an offline LLM using a prompt template with context and question variables

lib_llm.py

## for conversation LLM
from langchain import PromptTemplate, HuggingFaceHub, LLMChain
from langchain.llms import HuggingFacePipeline
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, AutoModelForSeq2SeqLM


def make_the_llm():
    # Get Offline flan-t5-large ready to go, in CPU mode
    print(">> Prep. Get Offline flan-t5-large ready to go, in CPU mode")
    model_id = 'google/flan-t5-large'  # go for a smaller model if you don't have the VRAM
    tokenizer = AutoTokenizer.from_pretrained(model_id) 
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id) #load_in_8bit=True, device_map='auto'
    pipe = pipeline(
        "text2text-generation",
        model=model, 
        tokenizer=tokenizer, 
        max_length=100
    )
    local_llm = HuggingFacePipeline(pipeline=pipe)
    # template_informed = """
    # I know the following: {context}
    # Question: {question}
    # Answer: """

    template_informed = """
    I know: {context}
    when asked: {question}
    my response is: """

    prompt_informed = PromptTemplate(template=template_informed, input_variables=["context", "question"])

    return LLMChain(prompt=prompt_informed, llm=local_llm)
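A quick usage sketch (the context and question strings are purely illustrative):

from lib_llm import make_the_llm

llm_chain_informed = make_the_llm()
response = llm_chain_informed.run(
    context="Although it was not yet late, the sky was dark when I turned into Laundress Passage.",
    question="What was the sky like?")
print(response)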

Load books

Below is my chunking and vector storage code. It requires the composed Elasticsearch URL, the Hugging Face embedding model, the vector database, and the name of the target index in Elasticsearch.

lib_book_parse.py


from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import TextLoader

## for vector store
from langchain.vectorstores import ElasticVectorSearch
from elasticsearch import Elasticsearch
from config import Config

with open('simple.cfg') as f:
    cfg = Config(f)

fingerprint = cfg['ES_FINGERPRINT']
endpoint = cfg['ES_SERVER']
username = "elastic"
password = cfg['ES_PASSWORD']
ssl_verify = {
    "verify_certs": True,
    "basic_auth": (username, password),
    "ca_certs": "./python_es_client.pem",
}

url = f"https://{username}:{password}@{endpoint}:9200"

def parse_book(filepath):
    loader = TextLoader(filepath)
    documents = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=384, chunk_overlap=0)
    docs = text_splitter.split_documents(documents)
    return docs

def loadBookBig(filepath, url, hf, db, index_name):    
    
    es = Elasticsearch( [ url ], 
                       basic_auth = ("elastic", cfg['ES_PASSWORD']), 
                       ssl_assert_fingerprint = fingerprint, 
                       http_compress = True  )
    
    ## Parse the book if necessary
    if not es.indices.exists(index=index_name):
        print(f'\tThe index: {index_name} does not exist')
        print(">> 1. Chunk up the Source document")
        
        docs = parse_book(filepath)
        
        # print(docs)

        print(">> 2. Index the chunks into Elasticsearch")
        
        elastic_vector_search= ElasticVectorSearch.from_documents( docs,
                                embedding = hf, 
                                elasticsearch_url = url, 
                                index_name = index_name, 
                                ssl_verify = ssl_verify)   
    else:
        print("\tLooks like the book is already loaded, let's move on")

Tie everything together with a question loop

After parsing the book, the main control loop looks like this:

# ## how to ask a question
def ask_a_question(question):
    # print("The Question at hand: "+question)

    ## 3. get the relevant chunk from Elasticsearch for a question
    # print(">> 3. get the relevant chunk from Elasticsearch for a question")
    similar_docs = db.similarity_search(question)
    print(f'The most relevant passage: \n\t{similar_docs[0].page_content}')

    ## 4. Ask Local LLM context informed prompt
    # print(">> 4. Asking The Book ... and its response is: ")
    informed_context= similar_docs[0].page_content
    response = llm_chain_informed.run(context=informed_context,question=question)
    return response


# # The conversational loop

print(f'I am the book, "{bookName}", ask me any question: ')

while True:
    command = input("User Question>> ")
    response = ask_a_question(command)
    print(f"\n\n I think the answer is : {response}\n")

Run results

We can run the application with the following command:

python3 app-book.py 

Some of the questions I asked are:

  • when was it? Although it was not yet late, the sky was dark when I turned into Laundress Passage.
  • what will I send to meet you from the half past four arrival at Harrogate Station?
  • what do I make all the same and put a cup next to him on the desk?
  • How long did I sit on the stairs after reading the letter?

Hooray! We have completed the Q&A system. It perfectly answers what's in my book. Don’t you think it’s amazing :)

Originally published at: blog.csdn.net/UbuntuTouch/article/details/132979480