Dirty RAG

Nov 20, 2024

First (mis)steps in building LLM applications

I recently faced a challenge I didn’t feel equipped for: build a production-ready LLM application using Retrieval Augmented Generation (RAG). I am not an AI Engineer; my Python is passable but rudimentary, and LangChain was a framework I’d heard about once while watching a YouTube video. There was no way I could figure this out (spoiler alert: I did).

Shoutout to Boran, who gave me an hour of his time and opened my eyes to what was possible.

The challenge

Using RAG and vector store techniques, build an AI customer service agent that responds to customer emails: processing orders against stock levels, answering information requests, matching the customer’s tone and updating stock levels. It must be scalable. Simple, right? RIGHT??

Terms I’d never heard or was only vaguely familiar with before starting this project:

Retrieval Augmented Generation (or RAG)
Vector Store
LangChain

Why does this matter?

Time elapsed: 7 days

I used OpenAI’s gpt-4 and gpt-4o models to complete this project

Here’s how I did it:

Step 1: Prep the environment and establish communication with OpenAI

%pip install openai

from openai import OpenAI
import os
import getpass

# Prompt for the API key so it is never written into the notebook
os.environ['OPENAI_API_KEY'] = getpass.getpass('Paste API key:')
client = OpenAI(
    api_key=os.environ['OPENAI_API_KEY']
)

# Send a test message to confirm the connection works
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)
print(completion.choices[0].message)

Step 2: Still prepping, importing the necessary libraries (language model processing and vector storage)

%pip install -U langchain langchain-openai langchain-community
%pip install faiss-cpu

# Import LangChain components for prompts, models, embeddings and vector storage
# (LLMChain and langchain.llms.OpenAI are left out: the chains below use LCEL,
# and that OpenAI class would shadow the client class imported in Step 1)
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import DataFrameLoader
from langchain_core.prompts import PromptTemplate
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
import pandas as pd
import json
import ast

Step 3: Create a vector store to hold the product catalog

As this was just a pet project/prototype, I used FAISS, an in-memory vector store. To scale this, I would switch to a persistent backend such as Chroma or Pinecone (wow, it actually sounds like I know what I’m talking about here).

Note: I had initially loaded the product catalog into a dataframe (products_df) with the following columns:

product_id
name
description
stock
seasons
price

# Load the product DataFrame into LangChain's document format
# - Using "name" as the main content column to create embeddings based on product names
product_loader = DataFrameLoader(products_df, page_content_column="name")
product_documents = product_loader.load() # Load data into document format for embedding

# Generate embeddings for product documents and initialize FAISS vector store for similarity search
# Use OpenAI's embedding model to create product embeddings
embeddings = OpenAIEmbeddings()
product_vectors = FAISS.from_documents(product_documents, embeddings)
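
If "similarity search" still feels like magic, here is a toy sketch in plain Python of what a vector store does conceptually. It is not how FAISS or OpenAI's embeddings work internally: word counts stand in for real embeddings, and the class name, function names and product names are made up for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyVectorStore:
    """Hypothetical helper: stores texts with their vectors and ranks them against a query."""
    def __init__(self, texts):
        self.items = [(text, embed(text)) for text in texts]

    def similarity_search(self, query, k=1):
        query_vector = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(query_vector, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

# Made-up product names, for illustration only
store = ToyVectorStore(["Cable Knit Beanie", "Leather Tote Bag", "Wool Scarf"])
top_match = store.similarity_search("knit beanie in grey", k=1)
print(top_match)
```

A real embedding model also places "woolly hat" near "beanie", which word counts cannot do. That is the part the OpenAI embeddings handle in the pipeline above.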

Step 4: The easy part — writing the prompts

I already knew how to write effective prompts (we’ve all been doing this for 2 years already). What was interesting was learning how you can give the prompt specific inputs…and chaining. OMG, chaining. I’ll get to that in Step 6.

I wrote four prompts. The first thing I wanted the AI to do was identify whether the email was placing an order or just looking for information. There are then three more prompts: one to respond to an order email, one to extract the order details for processing, and finally one to respond to inquiries.

# Create prompt templates for classification and response generation
# Each prompt template is tailored for a specific task: classification, order processing, order extraction, and product inquiry.
# Prompts include detailed instructions to guide the assistant's responses, ensuring accuracy, professionalism, and alignment with the tone of customer interactions.


classification_prompt = PromptTemplate(
    input_variables=["email"],
    template="""
    You are an expert customer service assistant.
    Classify the following email into one of these categories:
    - "product inquiry" if the email is asking for more information about a product or any time the email does not specify a product name or quantity. Use this exact phrase when classifying.
    - "order request" if the email is requesting to buy or order a product. Use this exact phrase when classifying.
    Read both the subject and the message of the email to make the classification. Not every email with a product id is an order request.
    Return only the category and no other text. Note that some of the emails are not in English. Identify the language of the email and translate it to English before classifying.

    Email: {email}

    """
)


order_prompt = PromptTemplate(
    input_variables=["product_info", "email"],
    template="""

    You are a helpful, professional and friendly assistant that responds to customer emails.
    Respond to the order request based on product info. Check 'stock'. Check if there is enough stock to fulfill the order. If there is enough stock, fill the order.
    If there is not enough stock, suggest similar alternatives or suggest the customer waits for a restock. If there is some but not enough stock,
    fill the order and suggest similar alternatives or suggest the customer waits for a restock for the remaining items. Give examples of similar items from the product list.
    Only use products from the product list. Do not make up any products. Always double-check the stock numbers to ensure you are not making a mistake.
    Think about each step before you take it. If you do not understand an order, say you don't understand and ask for clarification.
    If no products are specified, respond precisely with "I am unable to process this order. Please provide more details".
    Never start an email with "I'm sorry..."
    Assume we can place the order once product and quantity is verified. Shipping details can be collected in a later step.
    Please note that some of the emails are not in English. Identify the language of the email and respond in the same language.
    Make sure all responses are properly formatted and match the tone of the order email.


    Product Info: {product_info}
    Email: {email}

    """
)

order_extraction_prompt = PromptTemplate(
    input_variables=["email"],  # Only the email is used by this template
    template="""
    You are an expert customer service assistant.
    Extract each product name and its requested quantity from the following email.
    If a quantity is not specified, assume a quantity of 1.
    Only return one line per product, each in the following format:
    {{'product_id': 'CLF2109', 'product_name': 'Cable Knit Beanies', 'quantity': 2}}

    Do not return any other text.

    Email: {email}

    """
)


inquiry_prompt = PromptTemplate(
    input_variables=["product_info", "email"],
    template="""

    You are a helpful, professional and friendly assistant that responds to customer emails.
    Answer the email based on the product info below.  If it is a product inquiry, answer the question.
    If you can't answer the question, ask for more information. Do not make anything up.
    Please note that some of the emails are not in English. Identify the language of the email and respond in the same language.
    Make sure all responses are properly formatted and match the tone of the inquiry.


    Product Info: {product_info}
    Email: {email}

    """
)
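
One detail that tripped me up is the double braces in the extraction prompt. As far as I can tell, PromptTemplate's default format follows the same brace rules as Python's own str.format: {email} is a slot to fill, while {{ and }} are escaped and come out as literal braces. The sketch below uses plain str.format rather than LangChain, and the email text is made up.

```python
# Same brace rules as str.format: {name} is filled in, {{ and }} become literal braces
template = """Extract each product from the email.
Return one line per product in this format:
{{'product_name': 'Cable Knit Beanies', 'quantity': 2}}

Email: {email}
"""

# Made-up email text, for illustration only
rendered = template.format(email="Hi, I'd like 3 wool scarves please.")
print(rendered)
```

Without the doubling, the formatter would treat the example dictionary as a placeholder and complain about a missing variable.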

Step 5: Choose your model(s)

This is where I got excited for the first time. At first, it seemed quite straightforward: you choose the most advanced model available and use it everywhere. What actually transpired is that the model I chose (gpt-4o) could not handle ambiguity very well. An email where the inputs were missing, incomplete or vague, like “I want to buy your best-selling item”, sent it into meltdown. In my tests, gpt-4 handled this much better. On the other hand, gpt-4o is very good at general formatting and creating structured outputs and, you guessed it, gpt-4 is rubbish at that. So after half a day of frustration, tweaking each of these models and dining on misery, I had a breakthrough: I could use them both! With CHAINING! And a little logic.

# Initialize the language models (LLMs) for processing responses
# - Using OpenAI's gpt-4o for initial response generation and email classification tasks.
# - Using OpenAI's gpt-4 for order processing tasks where handling missing or incomplete inputs is crucial.
#
# Rationale:
# Through experimentation, I observed that gpt-4o struggles with scenarios where expected inputs are partially missing or incomplete.
# In contrast, gpt-4 demonstrates better robustness and adaptability, making it more suitable for handling order processing.
# Temperature is set to 0 for both models to keep responses as consistent as possible (this reduces variation but does not guarantee identical outputs).

mdl = ChatOpenAI(model_name='gpt-4o', temperature=0)  # Model for classification and general response generation
mdl4 = ChatOpenAI(model_name='gpt-4', temperature=0)  # Model for order processing
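
The "little logic" amounts to deciding which model handles which task. A minimal sketch of that idea, assuming a plain lookup table: the task names and the pick_model helper are hypothetical and only mirror the four chains in this post.

```python
# Hypothetical task-to-model table mirroring the four chains in this post
MODEL_FOR_TASK = {
    "classification": "gpt-4o",
    "order_extraction": "gpt-4o",
    "inquiry_response": "gpt-4o",
    "order_response": "gpt-4",  # the one task where handling ambiguity mattered most
}

def pick_model(task, default="gpt-4o"):
    """Return the model name for a task, falling back to the default for unknown tasks."""
    return MODEL_FOR_TASK.get(task, default)

print(pick_model("order_response"))
```

Keeping the mapping in one place makes it easy to swap a model for a single task when a newer one behaves better.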

Step 6: CHAINING!

Did I mention that chaining got me really excited? There’s this beautiful thing called LCEL (LangChain Expression Language), and it allows you to create really elegant chains. Each “link” in the chain is completely modular, so you can swap them in and out as you like. You then only need to invoke the chain using chain.invoke(). You can even chain multiple chains together, and the power of this for process automation….

Let’s just say I’m going to have a lot of fun with chains.

In this case I created four chains (one for each prompt) and plugged in the gpt-4 model to handle and respond to order requests while using gpt-4o for the other chains.

# Set up processing chains with LangChain to manage classification, order handling, and response generation.
# Each chain is tailored to specific tasks, leveraging the strengths of different models:
# - gpt-4o is used for general formatting and extraction tasks, as it produces well-structured outputs for classifications and inquiries.
# - gpt-4 is used for order processing tasks due to its superior handling of missing or incomplete data.
#
# A parser (StrOutputParser) is applied to each chain to standardize the output for consistent downstream handling.

parser = StrOutputParser()  # Ensures that chain outputs are returned in a uniform string format

# Chains for each specific task:
classification_chain = classification_prompt | mdl | parser  # Classifies email intent as "order request" or "product inquiry"
order_chain = order_prompt | mdl4 | parser  # Processes order requests, handling complex data dependencies more effectively
order_extraction_chain = order_extraction_prompt | mdl | parser  # Extracts product details and quantities from the email content
inquiry_chain = inquiry_prompt | mdl | parser  # Handles product inquiries, generating informative responses based on relevant product data
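
To demystify the | operator, here is a toy sketch of how pipe-style composition can work in Python. This is not LangChain's implementation: the Step class is hypothetical, and the fake model simply returns a canned answer so the example runs without an API key.

```python
class Step:
    """Wraps a function so steps can be composed with | (a toy, not LangChain's Runnable)."""
    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # a | b builds a new step that runs a, then feeds its output into b
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)

# Stand-ins for the prompt, the model and the output parser
prompt = Step(lambda inputs: f"Classify this email: {inputs['email']}")
fake_model = Step(lambda text: {"content": "  order request \n"})
parser = Step(lambda message: message["content"].strip())

chain = prompt | fake_model | parser
result = chain.invoke({"email": "Please send me 2 beanies."})
print(result)
```

Because every link only needs an invoke method, swapping gpt-4o for gpt-4 in one chain is a one-word change.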

Finally, having set everything up, it was time to actually write the code to process emails. I first invoked the classification chain to determine whether I was dealing with an order or an information request.
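
Since the rest of the logic branches on the exact label text, it is worth guarding against small variations in what the model returns, such as extra whitespace, quotes or capital letters. A minimal sketch, assuming a hypothetical normalize_label helper:

```python
VALID_LABELS = ("order request", "product inquiry")

def normalize_label(raw):
    """Clean up a model-returned label; anything unexpected becomes 'unknown'."""
    cleaned = raw.strip().strip("\"'.").lower()
    return cleaned if cleaned in VALID_LABELS else "unknown"

print(normalize_label(' "Order Request"\n'))
```

An "unknown" result can then be routed to a human or retried instead of silently falling through both branches.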

For orders, I used the order extraction chain to extract the products and quantities requested, performed a similarity search on the product vector store, checked availability, and then either created the order or marked it as out of stock. The order chain was then invoked to respond to the customer email.
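
The extraction step returns text, not Python objects, so it has to be parsed before anything can be looked up. Here is a sketch of a defensive parser, assuming one dictionary literal per line as the extraction prompt requests. The parse_order_lines helper and the sample output are made up for illustration.

```python
import ast

def parse_order_lines(raw_output):
    """Parse one dictionary literal per line, skipping blank or malformed lines."""
    items = []
    for line in raw_output.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue  # The model returned something that is not a Python literal
        if isinstance(entry, dict) and "product_name" in entry:
            entry.setdefault("quantity", 1)  # Same default as the extraction prompt
            items.append(entry)
    return items

# Made-up model output, including a blank line and a stray sentence
sample = """{'product_id': 'CLF2109', 'product_name': 'Cable Knit Beanies', 'quantity': 2}

Sorry, I could not find the second product.
{'product_name': 'Wool Scarf'}"""
parsed = parse_order_lines(sample)
print(parsed)
```

ast.literal_eval only evaluates literals, so it is a safer choice than eval for text that comes back from a model.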

Information requests were much simpler as the AI simply checked stock (again using a similarity search) and generated a tailored response.

# Result lists, initialized here so the loop is self-contained
classification_results = []
order_status_results = []
order_response_results = []
inquiry_response_results = []

FALLBACK_RESPONSE = "I'm unable to process and respond to this email at this time. Please try again later."

# emails_df is an assumed name for the DataFrame of incoming emails (columns: email_id, message)
for _, row in emails_df.iterrows():
    email_id = row['email_id']  # Extract the unique email ID
    email_message = row['message']  # Extract the email content for processing
    # Classify the email as either an "order request" or "product inquiry" using the classification chain
    # The label is normalized so stray whitespace, quotes or capitals do not break the comparisons below
    raw_label = classification_chain.invoke({"email": email_message})
    classification = raw_label.strip().strip('"\'').lower()
    print(f"Email ID: {email_id}, Classification: {classification}")
    # Append classification result to the classification_results list
    classification_results.append({"email ID": email_id, "category": classification})
    products_ordered = []

    # Process order requests
    if classification == "order request":
        product_entries = order_extraction_chain.invoke({"email": email_message}).split('\n')
        for entry in product_entries:
            if not entry.strip():
                continue  # Skip blank lines in the model output
            try:
                order_item = ast.literal_eval(entry.strip())
                product_name = str(order_item['product_name'])
                quantity_requested = int(order_item['quantity'])
            except (ValueError, SyntaxError, KeyError, TypeError):
                continue  # Skip lines that are not a usable dictionary literal
            products_ordered.append(order_item)

            # Use vector search to find the catalog product closest to the requested name
            relevant_product = product_vectors.similarity_search(product_name, k=1)[0]
            product_id = relevant_product.metadata["product_id"]

            # Look up the product's current row (including stock) in the products DataFrame
            product_row = products_df.loc[products_df['product_id'] == product_id].iloc[0]
            stock = product_row['stock']
            # page_content holds only the product name, so pass the full row to the prompt
            product_info = product_row.to_json()

            # Determine order status based on stock availability
            if stock >= quantity_requested:
                status = "created"  # Order can be fulfilled
                # Decrement stock to reflect the new level after order fulfillment
                products_df.loc[products_df['product_id'] == product_id, 'stock'] -= quantity_requested
            else:
                status = "out of stock"  # Insufficient stock to fulfill the order

            # Append order status to order_status_results list for each product in the order
            order_status_results.append({
                "email ID": email_id,
                "product ID": product_id,
                "quantity": quantity_requested,
                "status": status
            })

            # Generate a response to the order request based on the product info
            try:
                response = order_chain.invoke({
                    "email": email_message,
                    "product_info": product_info
                })
            except Exception:
                response = FALLBACK_RESPONSE
            # Append the response to the order_response_results list
            order_response_results.append({"email ID": email_id, "response": response})
            print(f"Response: {response}")

    # Process product inquiries
    elif classification == "product inquiry":
        # Retrieve relevant products using vector search to provide accurate response content
        relevant_products = product_vectors.similarity_search(email_message, k=2)
        # Look up the full, current row for the most relevant product
        top_product_id = relevant_products[0].metadata["product_id"]
        product_row = products_df.loc[products_df['product_id'] == top_product_id].iloc[0]
        # Generate a response to the inquiry based on the most relevant product found
        try:
            response = inquiry_chain.invoke({
                "email": email_message,
                "product_info": product_row.to_json()  # Provide info on the most relevant product
            })
        except Exception:
            response = FALLBACK_RESPONSE

        # Append the inquiry response to inquiry_response_results list
        inquiry_response_results.append({"email ID": email_id, "response": response})
        print(f"Response: {response}")
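
The order prompt asks the model to handle the case where there is some stock but not enough, while the loop above only records "created" or "out of stock". One way to make the bookkeeping match is a small allocation helper. This is a sketch under my own assumptions: the allocate function and the "partially filled" status are hypothetical and not part of the project code.

```python
def allocate(stock, requested):
    """Return (status, filled, unfilled) for one order line."""
    if requested <= 0:
        raise ValueError("requested quantity must be positive")
    filled = min(stock, requested)
    if filled == requested:
        status = "created"
    elif filled > 0:
        status = "partially filled"  # Hypothetical status, not used in the loop above
    else:
        status = "out of stock"
    return status, filled, requested - filled

print(allocate(5, 2))
print(allocate(1, 3))
print(allocate(0, 1))
```

The stock decrement would then use the filled quantity rather than the requested one, and the unfilled remainder could be passed to the prompt so the reply suggests alternatives for exactly those items.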

This was a challenging (at times extremely frustrating) but fun and ultimately fulfilling project. I can’t wait to get into the bones of building more LLM applications that solve real problems.

GitHub link