Skip to content

Latest commit

 

History

History

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 

README.md

Agentic RAG with Vertex AI Vector Search 2.0

This project is a sample implementation of an Agentic RAG using the Agent Development Kit (ADK), with Vertex AI Vector Search 2.0 as the unified vector store.

Key Features of Vector Search 2.0

Vector Search 2.0 is Google Cloud's fully managed, self-tuning vector database built on Google's ScaNN (Scalable Nearest Neighbors) algorithm.

  • Unified Data Storage: Store both vector embeddings and user data together (no separate database needed)
  • Auto-Embeddings: Automatically generate semantic embeddings using Vertex AI embedding models
  • Built-in Full Text Search: Provides built-in full-text search without needing to generate sparse embeddings
  • Hybrid Search: Combine semantic and keyword search with intelligent RRF ranking
  • Zero Indexing to Billion-Scale: Start immediately with kNN, then scale to billions with ANN indexes

Project Structure

rag-with-vectorsearch-2.0/
├── rag_with_vectorsearch_2_0/       # ADK Agent directory
│   ├── .env.example
│   ├── agent.py
│   ├── tools.py                     # Vector Search 2.0 Hybrid Search
│   └── requirements.txt             # Agent dependencies
├── data_ingestion/                  # Collection creation & Ingestion
│   ├── .env.example
│   ├── create_vector_search_collection.py
│   ├── ingest.py
│   └── requirements.txt             # Data ingestion dependencies
├── source_documents/                # Source documents for RAG
└── README.md

Architecture Pattern: Unified Vector Store with Auto-Embeddings

Unlike the traditional pattern that requires a separate document store (e.g., Firestore), Vector Search 2.0 stores both vectors and data together, simplifying the architecture.

How It Works

  1. Create Collection: Define data schema and vector schema with auto-embedding configuration
  2. Ingest Data: Store documents as Data Objects with data and empty vectors (auto-generated)
  3. Hybrid Search: Query using both Semantic Search and Text Search combined with RRF
  4. Retrieve Data: Get results directly from search response (no secondary lookup needed)

Architecture Diagram

+--------------+    (1) Query      +----------------------------+
|              | ----------------> |        Agentic RAG         |
|  User/Client | <---------------- |(Cloud Run, Agent Engine...)| 
|              | (4) Final Result  +----------------------------+
+--------------+                          |            ^
                            (2) Hybrid    |            | (3) Return results
                               Search     v            |     with data
                              +----------------------------------+
                              |   Vertex AI Vector Search 2.0    |
                              |   (Collection with Auto-Embed)   |
                              |   - Semantic Search (Dense)      |
                              |   - Text Search (Keyword)        |
                              |   - RRF Ranking                  |
                              +----------------------------------+

Prerequisites

Before you begin, you need to have an active Google Cloud project.

1. Configure your Google Cloud project

First, authenticate with Google Cloud:

gcloud auth application-default login

Set up your project and enable the necessary APIs:

# Set your project ID and location
export GOOGLE_CLOUD_PROJECT=$(gcloud config get-value project)
export GOOGLE_CLOUD_LOCATION="us-central1"

# Enable the required APIs
gcloud services enable \
  vectorsearch.googleapis.com \
  aiplatform.googleapis.com \
  cloudresourcemanager.googleapis.com

2. Grant Agent Engine permissions (for deployment)

To allow the deployed Agent Engine to access your Vector Search collection:

# Get your project number
export PROJECT_NUMBER=$(gcloud projects describe $GOOGLE_CLOUD_PROJECT --format="value(projectNumber)")

# Grant the Vertex AI User role
gcloud projects add-iam-policy-binding $GOOGLE_CLOUD_PROJECT \
    --member="serviceAccount:service-${PROJECT_NUMBER}@gcp-sa-aiplatform-re.iam.gserviceaccount.com" \
    --role="roles/aiplatform.user"

# Grant the Vertex AI Vector Search Viewer role
gcloud projects add-iam-policy-binding $GOOGLE_CLOUD_PROJECT \
    --member="serviceAccount:service-${PROJECT_NUMBER}@gcp-sa-aiplatform-re.iam.gserviceaccount.com" \
    --role="roles/vectorsearch.viewer"

Setup

1. Install Dependencies

This project uses uv to manage the Python virtual environment.

Create and activate the virtual environment:

# Navigate to the project directory
cd rag-with-vectorsearch-2.0

# Create the virtual environment
uv venv

# Activate the virtual environment (macOS/Linux)
source .venv/bin/activate
# Activate the virtual environment (Windows)
.venv\Scripts\activate

Install dependencies:

# Install agent dependencies
uv pip install -r rag_with_vectorsearch_2_0/requirements.txt

# Install data ingestion script dependencies
uv pip install -r data_ingestion/requirements.txt

2. Create Vector Search Collection (with ANN Index)

Run the collection creation script. This creates a Collection with auto-embedding configuration and an ANN index for production-ready search.

cd data_ingestion

python create_vector_search_collection.py \
  --project_id=$GOOGLE_CLOUD_PROJECT \
  --location=$GOOGLE_CLOUD_LOCATION \
  --collection_name="rag-collection"

Available options:

  • --embedding_model: Embedding model (default: gemini-embedding-001)
  • --embedding_dim: Embedding dimensions (default: 768)
  • --no-index: Skip ANN index creation
  • --wait-for-index: Wait for ANN index creation to complete

Note: Collection creation is immediate, but ANN index creation takes 5-30 minutes.

3. Ingest Documents

Run the data ingestion script to load documents from source_documents/:

cd data_ingestion

python ingest.py \
  --project_id=$GOOGLE_CLOUD_PROJECT \
  --location=$GOOGLE_CLOUD_LOCATION \
  --collection_name="rag-collection" \
  --source_dir="../source_documents"

This script:

  • Loads .md and .txt files from the source directory
  • Splits documents into chunks (1000 chars, 100 overlap)
  • Creates Data Objects with auto-generated embeddings

4. Run the Agent Locally

Before running the agent, create a .env file:

cp rag_with_vectorsearch_2_0/.env.example rag_with_vectorsearch_2_0/.env

Edit the .env file with your configuration:

GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=us-central1
VECTOR_SEARCH_COLLECTION_NAME=rag-collection

Using the Command-Line Interface (CLI):

adk run rag_with_vectorsearch_2_0

Using the Web Interface:

adk web

Screenshot:

ADK Web Interface for RAG with Vector Search 2.0

Deployment

The RAG agent can be deployed to Vertex AI Agent Engine using the adk deploy command.

1. Set Environment Variables

export GOOGLE_CLOUD_PROJECT=$(gcloud config get-value project)
export GOOGLE_CLOUD_LOCATION="us-central1"

2. Deploy the Agent

adk deploy cloud_run \
  --project=$GOOGLE_CLOUD_PROJECT \
  --region=$GOOGLE_CLOUD_LOCATION \
  rag_with_vectorsearch_2_0

Or deploy to Agent Engine:

adk deploy agent_engine \
  --project=$GOOGLE_CLOUD_PROJECT \
  --region=$GOOGLE_CLOUD_LOCATION \
  --staging_bucket="gs://your-staging-bucket" \
  rag_with_vectorsearch_2_0

When the deployment finishes, it will print a line like this:

Successfully created remote agent: projects/<PROJECT_NUMBER>/locations/<PROJECT_LOCATION>/reasoningEngines/<AGENT_ENGINE_ID>

Make a note of the AGENT_ENGINE_ID. You will need it to interact with your deployed agent.

Testing the Deployed Agent

Once your agent is deployed, you can use the provided Jupyter notebook to verify its functionality.

  1. Navigate to the notebooks/ directory.
  2. Open test_vector_search_2_0_agent_on_agent_engine.ipynb.
  3. Follow the instructions in the notebook to:
    • Configure your Project ID and Location.
    • Connect to the deployed agent using your unique AGENT_ENGINE_ID.
    • Test stateful sessions and real-time streaming queries.

This notebook provides a convenient way to interact with your agent in a stateful manner and serves as a starting point for building your own client applications.

References