Athena is an end-to-end framework for benchmarking Retrieval-Augmented Generation (RAG) pipelines. It provides modular interfaces for embedding generation, vector database search, and LLM evaluation, wired together by a single configuration file and an orchestration script that collects timing and accuracy metrics.
Published at IISWC 2025: Athena: A Plug-and-Play Advisor for Retrieval-Augmented Generation using VectorDB.
```
.
├── config.yaml                      # Single config file for the entire pipeline
├── run_pipeline.py                  # End-to-end orchestration script
├── metrics.py                       # MetricsCollector (timing + accuracy)
│
├── emb/
│   ├── emb_server_vllm.py           # Launch the vLLM embedding server
│   └── emb_query.py                 # Client: fetch_embeddings()
│
├── llm/
│   ├── llm_server_vllm.py           # Launch a vLLM completion server
│   ├── llm_server_ollama.py         # Launch an Ollama server
│   └── llm_eval.py                  # LLM class + bulk_eval()
│
├── milvus/
│   └── milvus_tasks_backup.py       # MilvusInterface
│
├── postgres/
│   └── postgres_tasks_backup_v2.py  # PostgresInterface
│
└── profile/
    ├── gpu_power.py                 # nvidia-smi GPU power logger
    └── uprof_script.sh              # AMD uProf CPU profiler
```
Install the Python dependencies:

```
pip install pymilvus vllm requests numpy pyyaml pandas evaluate rouge_score
```

Milvus itself should be run via Docker:

```
# follow https://milvus.io/docs/install_standalone-docker.md
docker compose up -d
```

Embedding server (vLLM, port 8000):

```
python emb/emb_server_vllm.py
```

LLM server (pick one):

```
python llm/llm_server_vllm.py      # vLLM, port 8001
python llm/llm_server_ollama.py    # Ollama, port 11434
```

Both server scripts read `embedding_model` / `llm_model` from `config.yaml`.
Edit config.yaml to point at your collection, models, and input files. See Configuration Reference below or the full docs.
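For a quick smoke test, the two input files can be generated in a few lines of Python. The questions and answers below are toy examples; the only structural requirement is one ground-truth answer per question, in the same order:

```python
import json

questions = [
    {"question": "What is the capital of France?"},
    {"question": "Who wrote Hamlet?"},
]
answers = ["Paris", "Shakespeare"]

# The pipeline pairs questions and answers by position
assert len(questions) == len(answers)

with open("prompts.json", "w") as f:
    json.dump(questions, f, indent=2)
with open("answers.json", "w") as f:
    json.dump(answers, f, indent=2)
```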
```
python run_pipeline.py                   # uses ./config.yaml
python run_pipeline.py --config my.yaml  # custom config
```

The pipeline will:

- Embed all questions via the embedding server
- Search Milvus for relevant documents
- Generate answers with the LLM and compute ROUGE scores
- Write results to `output_file` (CSV) and, optionally, `metrics_output` (JSON)
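For intuition on the accuracy metric: ROUGE-1 is essentially a unigram-overlap F1 between prediction and reference. The pipeline itself uses the `evaluate` / `rouge_score` packages (which also handle stemming and other details), but a back-of-envelope sketch of the idea is:

```python
def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1, the core idea behind ROUGE-1 (illustrative only)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    # Count each unigram at most as often as it appears in both texts
    overlap = sum(min(pred.count(t), ref.count(t)) for t in set(pred))
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the capital is paris", "paris"))  # 0.4
```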
Configuration Reference (`config.yaml`):

```yaml
# Embedding
embedding_model: "infly/INF-retriever-v1-1.5b"
emb_api_url: "http://localhost:8000/v1/embeddings"

# Milvus
milvus_uri: "http://localhost:19530"
milvus_token: "root:Milvus"
collection_name: "my_collection"
vector_field: "embedding"
search_limit: 5
search_params: {"metric_type": "COSINE", "params": {"ef": 64}}
output_fields: ["id", "text"]

# LLM
llm_model: "llama3"
llm_provider: "ollama"   # "ollama" or "vllm"
batch_size: 4

# Input/output
input_file: "prompts.json"        # [{"question": "..."}, ...]
groundtruth_file: "answers.json"  # ["answer1", "answer2", ...]
output_file: "results.csv"

# Metrics
collect_metrics: true
metrics_output: "metrics.json"
```

Full field descriptions are in the docs.
`prompts.json` — array of objects, each with a `"question"` key:

```json
[
  {"question": "What is the capital of France?"},
  {"question": "Who wrote Hamlet?"}
]
```

`answers.json` — array of ground-truth answer strings, one per question:

```json
["Paris", "Shakespeare"]
```

`results.csv` — one row per question:
| question | prediction | reference | rouge-1 | rouge-L | latency_first_token |
|---|---|---|---|---|---|
`metrics.json` — pipeline timing and accuracy summary:

```json
{
  "embed_latency": 1.23,
  "search_latency": 0.45,
  "llm_batch_latency": 8.91,
  "end_to_end_latency": 10.59,
  "avg_rouge1": 0.61,
  "avg_rougeL": 0.58
}
```

Using the components from Python — embeddings:

```python
from emb.emb_query import fetch_embeddings

embeddings = fetch_embeddings(
    ["Text 1", "Text 2"],
    model="infly/INF-retriever-v1-1.5b",
    api_url="http://localhost:8000/v1/embeddings",
)  # returns np.ndarray of shape (2, dim)
```

Milvus search:

```python
from milvus.milvus_tasks_backup import MilvusInterface

db = MilvusInterface(uri="http://localhost:19530", token="root:Milvus")
results = db.search(
    collection_name="my_collection",
    anns_field_name="embedding",
    emb_in=embeddings.tolist(),
    limit=5,
    search_params={"metric_type": "COSINE", "params": {"ef": 64}},
    output_fields=["id", "text"],
)
```

LLM evaluation:

```python
from llm.llm_eval import LLM, bulk_eval

llm = LLM(model="llama3", provider="ollama")
df = bulk_eval(llm, questions, retrieved_docs, references=answers, batch_size=4)
```

Metrics collection:

```python
from metrics import MetricsCollector

metrics = MetricsCollector(enabled=True)
with metrics.timed("my_stage"):
    do_work()
metrics.save("metrics.json")
```

Set `enabled=False` to disable all collection with zero overhead.
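The timing interface above can be approximated in a few lines with `contextlib`. This is a sketch of the pattern, not Athena's actual `metrics.py`:

```python
import json
import time
from contextlib import contextmanager

class TimingCollector:
    """Minimal stand-in for the MetricsCollector timing pattern (illustrative)."""

    def __init__(self, enabled: bool = True):
        self.enabled = enabled
        self.timings = {}

    @contextmanager
    def timed(self, stage: str):
        if not self.enabled:
            yield  # disabled: no clock reads, no bookkeeping
            return
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings[stage] = time.perf_counter() - start

    def save(self, path: str):
        if self.enabled:
            with open(path, "w") as f:
                json.dump(self.timings, f, indent=2)
```

The `try/finally` ensures a stage's duration is recorded even if the timed block raises.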
GPU power draw (NVIDIA):

```
MODEL=infly/INF-retriever-v1-1.5b DURATION_SEC=60 python profile/gpu_power.py
```

CPU profiling (AMD):

```
./profile/uprof_script.sh <output_dir>
```

Citation:

```bibtex
@INPROCEEDINGS{11241995,
  author={Liang, Ning and Wenz, Fabian and Giceva, Jana and Wills, Lisa Wu},
  booktitle={2025 IEEE International Symposium on Workload Characterization (IISWC)},
  title={Athena: A Plug-and-Play Advisor for Retrieval-Augmented Generation using VectorDB},
  year={2025},
  pages={28-41},
  doi={10.1109/IISWC66894.2025.00013}
}
```