1. Description

LangChain makes it easier to connect the pieces of an LLM application. It is a framework for developing applications powered by language models: it combines models, prompts, memory, and external data as composable building blocks, which goes beyond what a direct call to a model API provides.

In the previous blog, we discussed the modules present in LangChain in detail.

In this part, we implement LangChain to build a bot over custom data: we combine memory, prompt templates, and chains, and then wrap the result in a web application.

Chinmai Balelau

2. Let's start with the import

Import the LangChain and OpenAI components needed for the LLM parts. If any of these packages are missing, install them first.

#    IMPORTS
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI  # chat model used in the steps below
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import ConversationalRetrievalChain
from langchain.vectorstores import Chroma, ElasticVectorSearch, Pinecone, Weaviate, FAISS
from PyPDF2 import PdfReader
from langchain import OpenAI, VectorDBQA
from langchain.prompts import PromptTemplate
from langchain.chains import ConversationChain

from langchain.document_loaders import TextLoader
from langchain.chains.question_answering import load_qa_chain
from langchain import LLMChain
# from langchain import retrievers
import langchain
from langchain.chains.conversation.memory import ConversationBufferMemory

PyPDF2 is for reading and manipulating PDFs. LangChain also offers several types of memory, each with a specific job, such as ConversationBufferMemory and ConversationBufferWindowMemory. The next blog in this series is dedicated to memory, so I'll cover the details there.
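To give a rough feel for the difference between the two memory types named above: a buffer memory keeps the whole conversation, while a window memory keeps only the last k exchanges. The sketch below is plain Python, not LangChain code; the class name `WindowMemory` and its methods are hypothetical and only illustrate the idea.

```python
from collections import deque


class WindowMemory:
    """Toy illustration: keep only the most recent k (user, bot) exchanges."""

    def __init__(self, k):
        # A deque with maxlen drops the oldest item when a new one is added.
        self.turns = deque(maxlen=k)

    def add(self, user_message, bot_answer):
        self.turns.append((user_message, bot_answer))

    def as_list(self):
        return list(self.turns)


memory = WindowMemory(k=2)
memory.add("Hi", "Hello!")
memory.add("What is LangChain?", "A framework for LLM apps.")
memory.add("Thanks", "You're welcome.")
print(memory.as_list())  # only the last two exchanges remain
```

A smaller window means fewer tokens sent to the model on each call, at the cost of the bot forgetting older turns.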

3. Let's set up the environment.

You probably know how to get an OpenAI API key. But just in case:

Go to the OpenAI API page.
Click on "Create new key".
This will be your API key. Paste it below.

import os
os.environ["OPENAI_API_KEY"] = "sk-YOUR API KEY"
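Hard-coding the key is fine for a quick test, but it is easy to leak when you share the notebook. One possible alternative, assuming you exported the key in your shell beforehand, is to read it from the environment and fail early if it is missing. The helper name `require_api_key` is my own and is not part of LangChain or the OpenAI library.

```python
import os


def require_api_key(name="OPENAI_API_KEY"):
    """Return the key stored in the given environment variable, or fail clearly."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running the app.")
    return key
```

Calling `require_api_key()` at the top of the script gives a readable error right away instead of a failure deep inside the first model call.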

Which model should you use? Davinci, Babbage, Curie, or Ada? One based on GPT-3, GPT-3.5, or GPT-4? There are many models, each suited to different tasks. Some are cheaper, and some are more accurate. We'll cover the models in detail in the fourth blog in this series.

For simplicity, we will use a low-cost model, "gpt-3.5-turbo". Temperature is a parameter that controls how random the answer is: the larger the temperature value, the more varied the answers, while a value of 0 makes the output close to deterministic.

llm = ChatOpenAI(temperature=0,model_name="gpt-3.5-turbo")
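To see why a larger temperature gives more random answers, here is a small sketch of the textbook "softmax with temperature" formula. It is an illustration under my own assumptions, not OpenAI's actual sampling code, and the scores in `logits` are made up.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature almost all of the probability goes to the top-scoring token, so the same token is picked nearly every time. At a high temperature the probabilities flatten out, so sampling picks the other tokens more often.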

You can add your own data here, in formats such as PDF, text, Word documents, or CSV. Depending on your data format, comment or uncomment the following code.

# Custom data
from langchain.document_loaders import DirectoryLoader
pdf_loader = PdfReader(r'Your PDF location')  # PyPDF2 reader, not a LangChain loader

# text_loader = DirectoryLoader('./Reports/', glob="**/*.txt")
# word_loader = DirectoryLoader('./Reports/', glob="**/*.docx")

We cannot send all the data at once. Instead, we split the data into chunks and create an embedding for each chunk. If you don't know what an embedding is, here is a short explanation.

Embeddings capture the meaning and context of text in the form of numeric vectors. An embedding model produces them using its learned parameters, or weights, so that texts with similar meanings end up with vectors that are close together.

This is how an embedding is created. I took these screenshots from CODEBASIC, which is a good channel for learning about LLMs [source: here].

Simply put, an embedding is a way to represent text as a vector of numbers. This allows language models to work with the meaning of words and phrases and to perform tasks such as text classification, summarization, and translation. In layman's terms, an embedding is a way of turning words into numbers. This is done by training machine learning models on large text corpora. The model learns to associate each word with a vector of numbers that represents the word's meaning and its relationship to other words.
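To make "turning words into numbers" concrete, here is a toy sketch. It uses simple word counts instead of a trained model, so it only captures word overlap, not real meaning; OpenAIEmbeddings returns learned dense vectors instead. The function names `toy_embed` and `cosine_similarity` are my own.

```python
import math
from collections import Counter


def toy_embed(text, vocabulary):
    """Count how often each vocabulary word occurs in the text (bag of words)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either one is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


sentences = [
    "the cat sat on the mat",
    "a cat lay on the rug",
    "stock prices fell sharply today",
]
vocabulary = sorted({word for s in sentences for word in s.split()})
vectors = [toy_embed(s, vocabulary) for s in sentences]

print(cosine_similarity(vectors[0], vectors[1]))  # shares "cat", "on", "the"
print(cosine_similarity(vectors[0], vectors[2]))  # no words in common
```

The two cat sentences get a higher similarity score than the cat sentence and the stock sentence. A real embedding model goes further and also places sentences close together when they mean similar things but share no words.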

Source: official LangChain blog

Let's do the exact same thing as represented in the image above.

# Preprocessing of the file: concatenate the text of every PDF page

raw_text = ''
for page in pdf_loader.pages:
    text = page.extract_text()
    if text:
        raw_text += text

# print(raw_text[:100])

# Split the text into chunks of about 1000 characters with 200 characters of overlap
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
)
texts = text_splitter.split_text(raw_text)
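To see what chunk size and chunk overlap mean, here is a simplified sketch that slides a fixed-size window over the text. It is not how CharacterTextSplitter works internally (that one first splits on the separator and then merges the pieces); the function `chunk_with_overlap` is my own illustration.

```python
def chunk_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Split text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in chunk_with_overlap(sample, chunk_size=10, chunk_overlap=4):
    print(chunk)
```

Each chunk starts with the last four characters of the previous one. The overlap helps when a sentence falls on a chunk boundary, because it still appears whole in at least one chunk.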

When a user submits a query, a similarity search is run against the vector store, and the most relevant chunks are retrieved and passed to the LLM. The LLM then uses the retrieved content to produce a formatted response for the user.
I recommend digging further into the concepts of vector stores and embeddings to deepen your understanding.
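The search step can be sketched in a few lines. This is a toy version under my own assumptions: the vectors are hand-written stand-ins for real embeddings, and the score is a plain dot product, whereas a real vector store uses an optimized index and its own distance measure. The function `top_k_chunks` is hypothetical.

```python
def top_k_chunks(query_vector, chunk_vectors, chunks, k=2):
    """Return the k chunks whose vectors score highest against the query."""
    scored = []
    for vector, chunk in zip(chunk_vectors, chunks):
        score = sum(q * v for q, v in zip(query_vector, vector))  # dot product
        scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


# Hand-written 3-dimensional vectors standing in for real embeddings.
chunks = ["refund policy", "shipping times", "office address"]
chunk_vectors = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.0, 0.1, 0.9]]
query_vector = [0.8, 0.3, 0.0]  # pretend embedding of a question about refunds

print(top_k_chunks(query_vector, chunk_vectors, chunks, k=2))
```

Only the chunks returned here are sent to the LLM as context, which is why the quality of the embeddings matters so much for the final answer.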

embeddings = OpenAIEmbeddings()
# vectorstore = Chroma.from_documents(documents, embeddings)
vectorstore = FAISS.from_texts(texts, embeddings)

The embeddings are stored in a vector store. There are many options that work for us, such as Pinecone and FAISS. Let's use FAISS here.

prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say GTGTGTGTGTGTGTGTGTG, don't try to make up an answer.
{context}
Question: {question}
Helpful Answer:"""
QA_PROMPT = PromptTemplate(
    template=prompt_template, input_variables=['context',"question"]
)
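At run time, the two placeholders in the template above are filled with the retrieved chunks and the user's question. A rough equivalent in plain Python, assuming the chunks are simply joined with blank lines (the helper `fill_prompt` and the sample text are made up for illustration):

```python
template = (
    "Use the following pieces of context to answer the question at the end.\n"
    "{context}\n"
    "Question: {question}\n"
    "Helpful Answer:"
)

retrieved_chunks = ["Chunk one of the PDF.", "Chunk two of the PDF."]  # made-up text


def fill_prompt(template, chunks, question):
    """Join the chunks and substitute them, with the question, into the template."""
    return template.format(context="\n\n".join(chunks), question=question)


print(fill_prompt(template, retrieved_chunks, "What does chunk two say?"))
```

Printing the filled prompt like this is a handy way to check what the model actually receives when an answer looks wrong.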

You can write your own prompts to refine queries and answers. With the prompt written, let's link it to the final chain.

Let's create the last chain, which ties together everything we built before. We use ConversationalRetrievalChain here. It lets us converse with the bot much as we would with a person, because it takes the previous chat history into account.

qa = ConversationalRetrievalChain.from_llm(ChatOpenAI(temperature=0.8), vectorstore.as_retriever(),qa_prompt=QA_PROMPT)
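The chain does not store the conversation by itself; we pass the history in on every call. As far as I understand, the chain uses that history to rewrite a follow-up question into a standalone one before searching the vector store. The sketch below only illustrates how the history could be flattened into text for that step; the function `format_chat_history` and the "Human"/"Assistant" labels are my own illustration, not LangChain's exact code.

```python
def format_chat_history(history):
    """Flatten (user_message, bot_answer) pairs into one block of text."""
    lines = []
    for user_message, bot_answer in history:
        lines.append(f"Human: {user_message}")
        lines.append(f"Assistant: {bot_answer}")
    return "\n".join(lines)


history = [
    ("What is the report about?", "It covers quarterly sales."),
    ("Which quarter?", "The third quarter."),
]
follow_up = "How did it compare to the previous one?"

# Text a model could use to rewrite the follow-up as a standalone question.
print(format_chat_history(history) + "\nFollow-up question: " + follow_up)
```

Without the history, the follow-up "How did it compare to the previous one?" would retrieve poor matches, because the word "it" says nothing about quarterly sales.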

We will use Gradio to create a simple web application. You can also use Streamlit or any other frontend technology. There are also many free deployment options, such as deploying on Hugging Face or on localhost, which we can do later.

# Front end web app
import gradio as gr
with gr.Blocks() as demo:
    gr.Markdown("## Grounding DINO ChatBot")
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    clear = gr.Button("Clear")

    def user(user_message, history):
        history = history or []
        # The chain expects the history as (question, answer) tuples
        chat_history = [tuple(turn) for turn in history]
        # Get response from QA chain
        response = qa({"question": user_message, "chat_history": chat_history})
        # Append user message and response to chat history
        history.append((user_message, response["answer"]))
        print(history)
        # Clear the textbox and show the updated conversation
        return gr.update(value=""), history

    msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False)
    clear.click(lambda: None, None, chatbot, queue=False)

if __name__ == "__main__":
    demo.launch(debug=True)

This code launches the web app at a local URL, where you can ask a question and see the response. In the console of your IDE, you'll also see the chat history as it grows.

A snapshot of LangChain [Image credit: Author]
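If you want to try the chat handler logic without an API key or a browser, one option is to swap the real chain for a stand-in. The sketch below assumes the chain is any callable that takes a dict and returns a dict with an "answer" key; `make_handler` and `fake_chain` are hypothetical names of mine, and the handler returns an empty string instead of a Gradio update.

```python
def make_handler(chain):
    """Build a chat handler around any callable that maps a dict to {"answer": ...}."""
    def handler(user_message, history):
        history = list(history or [])
        pairs = [tuple(turn) for turn in history]  # (question, answer) tuples
        response = chain({"question": user_message, "chat_history": pairs})
        history.append((user_message, response["answer"]))
        return "", history  # empty string clears the textbox
    return handler


def fake_chain(inputs):
    """Hypothetical stand-in for the real chain: echoes the question back."""
    return {"answer": f"You asked: {inputs['question']}"}


handler = make_handler(fake_chain)
cleared_textbox, history = handler("hello", [])
print(history)
```

Once the handler behaves as expected with the stand-in, you can pass the real `qa` chain to the same function.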

That's enough for today. This was a simple introduction to linking the different modules and using them to build the final chain. You can do a lot by tweaking the modules and the code. I would say that play is the highest form of research!

In the next blog, I will introduce memory and models in LangChain: how models are chosen, how memory contributes, and more. Stay tuned, and get in touch with me with any suggestions or questions.

4. If you find this article insightful

It turns out that "generosity makes you a happier person", so if you liked this article, please give it a clap. If you found this article insightful, please follow me on LinkedIn and Medium. You can also subscribe to be notified when I publish articles. Let's create a community! Thank you for your support!

【LangChain Concept】Understanding Language Chains️: Part 2
