A modular, latency-optimized, retrieval-augmented customer support agent built using LangGraph, LLMs, and a Chroma vector database.
The system intelligently adapts responses based on the user's persona, decides when human escalation is required, and retrieves relevant knowledge when necessary.
This project demonstrates how to build a production-style AI support agent with:
- Persona-aware responses
- Retrieval-Augmented Generation (RAG)
- Structured LLM outputs
- Pydantic-validated state
- Multi-query retrieval
- Persistent graph state using SQLite
- Latency-optimized architecture
The system detects the user's persona and adapts its response style accordingly.
Supported personas include:
- technical_expert
- frustrated_user
- business_executive
Persona classification ensures responses match the user's technical depth and tone.
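The classifier's structured output can be sketched as a small Pydantic model; the field names and confidence bounds here are illustrative, not the project's actual schema:

```python
from typing import Literal

from pydantic import BaseModel, Field

# Illustrative schema for the persona classifier's structured output.
# Field names and the confidence range are assumptions for this sketch.
class PersonaClassification(BaseModel):
    persona: Literal["technical_expert", "frustrated_user", "business_executive"]
    confidence: float = Field(ge=0.0, le=1.0)

result = PersonaClassification(persona="frustrated_user", confidence=0.92)
```

Constraining `persona` to a `Literal` means any off-list label from the LLM fails validation instead of silently entering the graph state.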
The agent automatically decides whether a query should be escalated to human support.
Examples of escalation triggers:
- User frustration
- Billing or account issues
- Requests for refunds or complaints
- Complex issues beyond documentation
If escalation is required, the system routes the query to a HumanSupport node.
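In LangGraph terms this routing is a conditional edge. The decision function can be sketched as below; the state keys are assumptions, while the node names follow the design described in this README:

```python
# Illustrative routing function for a LangGraph conditional edge.
# State keys ("escalate", "needs_retrieval") are assumptions for this sketch.
def route_after_triage(state: dict) -> str:
    if state.get("escalate", False):
        return "HumanSupport"    # frustration, billing, refunds, complex issues
    if state.get("needs_retrieval", False):
        return "Retrieve"        # knowledge-based query -> RAG pipeline
    return "GeneralAnswer"       # simple query answered directly

destination = route_after_triage({"escalate": True})  # -> "HumanSupport"
```

In the real graph, a function like this would be registered with `add_conditional_edges` so the framework dispatches to the matching node.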
For knowledge-based queries, the system retrieves relevant documents and generates grounded responses.
Pipeline:
Query → Retriever → Context Documents → LLM → Final Answer
The retriever uses ChromaDB with embeddings to search documentation efficiently.
The system uses a MultiQueryRetriever to improve retrieval quality.
Instead of searching with a single query, the retriever:
- Generates multiple reformulated versions of the question
- Searches the vector database for each variation
- Combines the retrieved documents
This improves recall and ensures relevant documents are not missed.
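The combine step can be sketched as deduplicating the union of results across query variants. This is a simplified stand-in for what `MultiQueryRetriever` does internally, not its actual implementation:

```python
# Simplified sketch of the multi-query combine step: after searching the
# vector store once per reformulated query, merge the result lists and
# deduplicate by content while preserving first-seen order.
def combine_results(results_per_query: list[list[str]]) -> list[str]:
    seen: set[str] = set()
    combined: list[str] = []
    for docs in results_per_query:
        for doc in docs:
            if doc not in seen:
                seen.add(doc)
                combined.append(doc)
    return combined

hits = combine_results([["doc_a", "doc_b"], ["doc_b", "doc_c"]])
# hits -> ["doc_a", "doc_b", "doc_c"]
```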
The initial version of the chatbot required multiple LLM calls:
- Persona detection
- Escalation decision
- Retrieval decision
This increased latency.
The optimized architecture introduces a Triage Node that performs all decisions in one structured LLM call.
The triage output includes:
- Persona
- Confidence score
- Escalation decision
- Retrieval requirement
This significantly reduces latency and token usage.
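A single structured triage output can be sketched as one Pydantic model bundling all three decisions; the field names are illustrative. With LangChain, such a schema is typically bound to the model via `with_structured_output`:

```python
from typing import Literal

from pydantic import BaseModel, Field

# Illustrative schema bundling every triage decision into one LLM call.
# Field names are assumptions for this sketch.
class TriageResult(BaseModel):
    persona: Literal["technical_expert", "frustrated_user", "business_executive"]
    confidence: float = Field(ge=0.0, le=1.0)
    escalate: bool
    needs_retrieval: bool

# With LangChain, the schema would typically be bound like:
#   structured_llm = llm.with_structured_output(TriageResult)
#   triage = structured_llm.invoke(triage_prompt)
triage = TriageResult(persona="technical_expert", confidence=0.87,
                      escalate=False, needs_retrieval=True)
```

One validated object then drives all downstream routing, replacing three separate LLM round-trips.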
The system uses structured outputs instead of free text responses.
Outputs are validated using Pydantic schemas such as:
- PersonaClassification
- EscalationDecision
- RetrievalDecision
- TriageResult
This ensures reliable parsing and robust routing logic.
All LangGraph state is validated using Pydantic models.
Example state variables include:
- query
- persona
- chat_history
- context
- answer
- escalation_decision
- retrieval_decision
Using Pydantic ensures type safety and catches invalid state early, before it can cause runtime errors deeper in the graph.
The system includes an SQLite-based checkpointer that persists graph state.
Benefits:
- Resumable executions
- Debugging capability
- Conversation persistence
- Reproducible runs
The checkpointer uses LangGraph's SqliteSaver.
Checkpoint files are stored in:
checkpoints/langgraph.db
The project is organized into modular components to improve maintainability.
Key modules include:
- configuration
- prompts
- schemas
- retriever
- data ingestion
- graph logic
- agent implementation
Each module has a clearly defined responsibility.
Customer_Support_Chatbot
│
├── my_project
│ │
│ ├── checkpoints
│ │ └── langgraph.db
│ │
│ ├── files
│ │ └── documents for ingestion
│ │
│ ├── rag
│ │
│ │ ├── core
│ │ │ ├── config.py
│ │ │ ├── prompts.py
│ │ │ ├── schemas.py
│ │ │ └── sqlite_checkpointer.py
│ │ │
│ │ ├── graph
│ │ │ ├── agent.py
│ │ │ └── chatbot1.py
│ │ │
│ │ └── load_retrieve
│ │ ├── data_ingestion.py
│ │ └── retriever.py
│ │
│ └── vectorstore
│ └── chroma.sqlite3
│
└── myenv
The triage node determines:
- persona
- escalation
- retrieval requirement
This reduces API calls and improves performance.
```
git clone <repo-url>
cd Customer_Support_Chatbot
python -m venv myenv
```

Activate the environment:

Windows:
```
myenv\Scripts\activate
```

Linux / Mac:
```
source myenv/bin/activate
```

Install dependencies:
```
pip install -r requirements.txt
```

Create a .env file with your API keys.
Example:
```
GOOGLE_API_KEY=your_key
HUGGINGFACE_API_KEY=your_key
```
Run the ingestion script to create the vector database.
```
python rag/load_retrieve/data_ingestion.py
```

This will:
- Load documents from /files
- Split them into chunks
- Generate embeddings
- Store them in ChromaDB
```
python rag/graph/agent.py
```

This starts the LangGraph chatbot pipeline.
Central configuration for:
- model selection
- vector database paths
- checkpoint paths
- file locations
Contains all prompt templates including:
- persona detection
- escalation detection
- retrieval decision
- triage prompt
- answer generation
Defines Pydantic models for:
- structured LLM outputs
- LangGraph state
- triage results
- persona classification
Creates an SQLite checkpointer for LangGraph using:
SqliteSaver
Ensures graph state persistence.
Main optimized LangGraph agent.
Implements:
- Triage node
- Retrieve node
- Answer node
- GeneralAnswer node
- HumanSupport node
This version minimizes LLM calls.
Earlier reference implementation.
Uses multiple nodes:
- PersonaDetection
- EscalationDecision
- RetrievalDecision
Kept for comparison with the optimized design.
Creates the MultiQueryRetriever using:
- Chroma vector store
- embedding models
- LLM query expansion
Responsible for:
- loading documents
- splitting into chunks
- embedding generation
- storing in ChromaDB
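The splitting step can be sketched as fixed-size chunks with overlap; the sizes below are illustrative, and real ingestion would typically use a LangChain text splitter instead:

```python
# Simplified fixed-size chunker with overlap, illustrating the split step.
# chunk_size and overlap values are illustrative, not the project's settings.
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "".join(str(i % 10) for i in range(1200))
chunks = split_text(text)  # 3 chunks; consecutive chunks share 50 characters
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk.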
This project demonstrates several production-grade AI system patterns:
- LangGraph workflow orchestration
- Retrieval-Augmented Generation
- Persona-aware conversational agents
- Structured LLM outputs
- Pydantic validation
- Persistent state with checkpointers
- Latency optimization strategies
- Modular AI system design
These patterns are commonly used in enterprise AI support systems.
Possible extensions include:
- Fuller HumanSupport integration, tailored to deployment requirements
- Web-search fallback
- Streaming responses
- Tool calling
- Conversation memory summarization
- Analytics and monitoring
- Vector database alternatives
- Hybrid search (BM25 + embeddings)
- UI interface (Streamlit or FastAPI)
This project is intended for educational and research purposes.

