How to build a chatbot out of your website content :: Päpper's Machine Learning Blog

In a previous blog entry, we used langchain to make a Q&A bot out of the content of your website.

The GitHub repository containing the code for both the previous blog entry and this one can be found here.

It was trending on Hacker News on March 22nd, and you can check out the discussion here.

This blog post builds on the previous entry and turns it into a chatbot that you can ask questions interactively, similar to how ChatGPT works.

We already created the document embeddings of our website content and saved them in a file called faiss_store.pkl, so we'll assume that this file already exists.

Framing our chatbot

Given a question from the user, we combine it with the previous conversation to form a standalone question.

This is necessary so that the previous context is taken into account: a follow-up such as "How do I create one?" only makes sense in light of what was discussed before.

To do so, we use the CONDENSE_QUESTION_PROMPT template below.

In addition, we need a template that primes the chatbot for the topics it should answer. In my case, the template primes it to answer machine learning and technical questions:

from langchain.prompts.prompt import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import ChatVectorDBChain

# Prompt that turns the chat history plus a follow-up into a standalone question
_template = """Given the following conversation and a follow up question,
rephrase the follow up question to be a standalone question.
Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)

# Prompt that primes the bot for its topics and answers from the retrieved excerpts
template = """You are an AI assistant for answering questions about machine learning
and technical blog posts. You are given the following extracted parts of 
a long document and a question. Provide a conversational answer.
If you don't know the answer, just say "Hmm, I'm not sure.".
Don't try to make up an answer. If the question is not about
machine learning or technical topics, politely inform them that you are tuned
to only answer questions about machine learning and technical topics.
Question: {question}
=========
{context}
=========
Answer in Markdown:"""
QA = PromptTemplate(template=template, input_variables=["question", "context"])


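def get_chain(vectorstore):
    # temperature=0 reduces randomness in the model's answers
    llm = OpenAI(temperature=0)
    # The chain condenses the question, searches the vector store
    # and then answers using the QA prompt and the retrieved excerpts
    qa_chain = ChatVectorDBChain.from_llm(
        llm,
        vectorstore,
        qa_prompt=QA,
        condense_question_prompt=CONDENSE_QUESTION_PROMPT,
    )
    return qa_chain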
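To see roughly what the condense step sends to the language model, the sketch below fills a copy of the condense template with a short chat history. It assumes the history is rendered as alternating "Human:"/"Assistant:" lines; this layout and the helpers `format_chat_history` and `build_condense_prompt` are illustrative assumptions, not necessarily LangChain's exact internal formatting. No model is called here.

```python
# Hypothetical sketch: build the text the LLM receives in the "condense question" step.
# The template copies the one above; the Human/Assistant layout is an assumption.
CONDENSE_TEMPLATE = """Given the following conversation and a follow up question,
rephrase the follow up question to be a standalone question.
Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""


def format_chat_history(chat_history):
    """Render (question, answer) pairs as alternating Human/Assistant lines."""
    lines = []
    for human, ai in chat_history:
        lines.append(f"Human: {human}")
        lines.append(f"Assistant: {ai}")
    return "\n".join(lines)


def build_condense_prompt(chat_history, question):
    """Fill the template; an LLM would then reply with a standalone question."""
    return CONDENSE_TEMPLATE.format(
        chat_history=format_chat_history(chat_history),
        question=question,
    )


history = [("What is a FAISS store?", "It is an index for fast similarity search over embeddings.")]
print(build_condense_prompt(history, "How do I create one?"))
```

Given such a prompt, the model might rephrase the follow-up as something like "How do I create a FAISS store?", which can then be used for the search on its own.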

Running the chatbot

Now we can write a small app which uses get_chain with our templates and the previously generated embeddings (faiss_store.pkl):

import pickle

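if __name__ == "__main__":
    # Load the document embeddings created in the previous post
    with open("faiss_store.pkl", "rb") as f:
        vectorstore = pickle.load(f)
    qa_chain = get_chain(vectorstore)
    # (question, answer) pairs, passed to the chain on every turn
    chat_history = []
    print("Chat with the Paepper.com bot:")
    while True:  # runs until interrupted, e.g. with Ctrl+C
        print("Your question:")
        question = input()
        result = qa_chain({"question": question, "chat_history": chat_history})
        # Remember this turn so that follow-up questions can refer to it
        chat_history.append((question, result["answer"]))
        print(f"AI: {result['answer']}")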

This works as follows: the chain takes the question and the previous chat history and first rephrases them into a standalone question that takes the context into account. It then searches your FAISS store for documents that are semantically similar to this question, since they are likely candidates for answering it.
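The search step can be pictured as a nearest-neighbour lookup over embedding vectors. The sketch below is a brute-force stand-in for the FAISS index, ranking documents by squared Euclidean (L2) distance, similar to what a flat L2 FAISS index computes. The toy 3-dimensional vectors and the `top_k` helper are made up for illustration; real embeddings would come from the embedding model used to build faiss_store.pkl.

```python
# Hypothetical brute-force version of the similarity search a FAISS index performs.
# Toy 3-d vectors stand in for real embeddings of document excerpts.


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k documents closest to the query, nearest first."""
    ranked = sorted(range(len(doc_vecs)), key=lambda i: squared_l2(query_vec, doc_vecs[i]))
    return ranked[:k]


docs = [
    "FAISS stores embeddings for similarity search.",
    "PyTorch is a deep learning framework.",
    "A vector store lets you look up semantically similar text.",
]
doc_vecs = [
    [0.9, 0.1, 0.0],
    [0.0, 0.2, 0.9],
    [0.8, 0.3, 0.1],
]
query_vec = [0.88, 0.15, 0.02]  # pretend embedding of "How do I search my embeddings?"

for i in top_k(query_vec, doc_vecs, k=2):
    print(docs[i])
```

FAISS performs the same kind of ranking but is designed to do it efficiently over many high-dimensional vectors.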

Finally, the most relevant document excerpts together with the question are sent to the OpenAI API to retrieve the answer.
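This final step amounts to inserting the retrieved excerpts into the {context} slot of the QA template and sending the result to the model. The sketch below builds that prompt with a shortened copy of the template; joining the excerpts with blank lines is an assumption about how they are combined, and `build_qa_prompt` is a hypothetical helper. The actual OpenAI API call is left out.

```python
# Hypothetical sketch of the final prompt: standalone question plus retrieved excerpts.
# Shortened version of the QA template above; some instructions are omitted.
QA_TEMPLATE = """You are an AI assistant for answering questions about machine learning
and technical blog posts. Provide a conversational answer.
If you don't know the answer, just say "Hmm, I'm not sure.".
Question: {question}
=========
{context}
=========
Answer in Markdown:"""


def build_qa_prompt(question, excerpts):
    """Insert the question and the retrieved excerpts into the template."""
    context = "\n\n".join(excerpts)  # assumption: excerpts separated by blank lines
    return QA_TEMPLATE.format(question=question, context=context)


excerpts = [
    "FAISS stores embeddings for similarity search.",
    "A vector store lets you look up semantically similar text.",
]
prompt = build_qa_prompt("How do I search my embeddings?", excerpts)
print(prompt)  # this string is what would be sent to the OpenAI API
```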

For more details about the approach and the full source code, check out my previous blog post as well as this GitHub repository.
