A production-style Retrieval-Augmented Generation (RAG) implementation in .NET using PostgreSQL pgvector, .NET Aspire, and pluggable LLM providers (Ollama / OpenAI).
This project demonstrates how to build a semantic search + RAG pipeline using modern .NET practices:
- Vector similarity search with pgvector
- LLM-based answer generation (Ollama / OpenAI)
- Clean Architecture (Domain, Application, Infrastructure)
- .NET Aspire orchestration (Postgres + Ollama + API)
- Provider-agnostic design (LLM + vector store)
## Architecture

```text
User Query
    ↓
Embedding Service (Ollama / OpenAI)
    ↓
Vector Store (pgvector)
    ↓
Top-K Retrieved Documents
    ↓
LLM (RAG Prompt)
    ↓
Generated Answer
```
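Each stage of the flow maps to one of the abstractions in `Application/Abstractions`. A rough sketch of their shape — the signatures are illustrative and the actual interfaces may differ:

```csharp
// Illustrative shapes only - the real interfaces live in
// Application/Abstractions and may use different signatures.
public interface IEmbeddingService
{
    Task<float[]> GenerateAsync(string text, CancellationToken ct = default);
}

public interface IVectorStore
{
    Task<IReadOnlyList<RetrievedDocument>> SearchAsync(
        float[] queryEmbedding, int topK = 2, CancellationToken ct = default);
}

public interface ILLMService
{
    Task<string> GenerateAsync(string prompt, CancellationToken ct = default);
}
```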
## Project Structure

```text
DotNetRagPgvector/
│
├── AppHost/            # Aspire orchestration
├── Api/                # Minimal API (entry point)
├── Application/        # Use cases + abstractions
│   ├── Abstractions/   # IEmbeddingService, IVectorStore, ILLMService
│   ├── Services/       # RagService, IngestionService
│   └── DTOs/           # UserInput
│
├── Domain/             # Core models
│   └── Models/
│       ├── Document
│       └── RetrievedDocument
│
├── Infrastructure/     # Implementations
│   ├── Entities/       # EF Core entities - DesignPatterns
│   ├── Persistence/    # EF Core + pgvector
│   ├── VectorStores/   # PgVectorStore
│   ├── Embeddings/     # Ollama / OpenAI
│   └── LLM/            # Ollama / OpenAI
```
## Features

- ✅ Semantic search using embeddings
- ✅ Retrieval-Augmented Generation (RAG)
- ✅ pgvector integration with PostgreSQL
- ✅ Clean architecture with strict separation of concerns
- ✅ Pluggable LLM providers (Ollama / OpenAI)
- ✅ Aspire-based container orchestration
- ✅ Scalar UI for API testing
## Tech Stack

- .NET 10
- ASP.NET Core Minimal APIs
- Entity Framework Core
- PostgreSQL + pgvector
- .NET Aspire
- Ollama (local LLM)
- OpenAI (optional)
## Prerequisites

- .NET 10 SDK
- Docker Desktop (or Podman - set the Aspire container runtime to `podman`)
- .NET Aspire workload

## Getting Started

Install the Aspire workload:

```bash
dotnet workload install aspire
```

Run the AppHost:

```bash
dotnet run --project AppHost
```

This starts:
- PostgreSQL (pgvector)
- Ollama
- API
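The AppHost composition is roughly along these lines — the resource names and the community Ollama hosting package are assumptions, and the actual `AppHost/Program.cs` may differ:

```csharp
// Illustrative Aspire composition - resource names are assumptions.
var builder = DistributedApplication.CreateBuilder(args);

var db = builder.AddPostgres("postgres")   // pgvector-enabled Postgres container
                .AddDatabase("ragdb");

var ollama = builder.AddOllama("ollama");  // CommunityToolkit.Aspire.Hosting.Ollama

builder.AddProject<Projects.Api>("api")
       .WithReference(db)
       .WithReference(ollama);

builder.Build().Run();
```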
Navigate to:

```text
http://localhost:<api-port>/scalar
```
## Example Query

```json
{
  "query": "How to handle distributed transactions?"
}
```

Other queries to try:

- How to prevent cascading failures?
- How to scale read-heavy systems?
- How do microservices communicate asynchronously?
## Configuration

```json
{
  "AI": {
    "Provider": "Ollama",
    "Ollama": {
      "BaseUrl": "http://localhost:11434",
      "EmbeddingModel": "nomic-embed-text",
      "ChatModel": "phi4-mini"
    },
    "OpenAI": {
      "ApiKey": "",
      "EmbeddingModel": "text-embedding-3-small",
      "ChatModel": "gpt-4o-mini"
    }
  }
}
```

Switch between Ollama and OpenAI:

```json
"Provider": "Ollama"
// or
"Provider": "OpenAI"
```

No code changes required.
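Under the hood, this presumably comes down to a configuration check at the composition root. A hypothetical sketch — the concrete service class names are assumptions:

```csharp
// Hypothetical provider switch in the API composition root;
// the concrete class names are illustrative.
var provider = builder.Configuration["AI:Provider"];

if (string.Equals(provider, "OpenAI", StringComparison.OrdinalIgnoreCase))
{
    builder.Services.AddSingleton<IEmbeddingService, OpenAIEmbeddingService>();
    builder.Services.AddSingleton<ILLMService, OpenAILLMService>();
}
else
{
    builder.Services.AddSingleton<IEmbeddingService, OllamaEmbeddingService>();
    builder.Services.AddSingleton<ILLMService, OllamaLLMService>();
}
```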
## Vector Search

- Uses PostgreSQL with pgvector
- Embeddings stored in a `vector` column
- Cosine similarity used for retrieval:

```csharp
var results = await dbContext.DesignPatterns
    .OrderBy(x => x.Embedding.CosineDistance(questionVector))
    .Take(2)
    .ToListAsync();
```

Generated SQL:

```sql
ORDER BY "Embedding" <=> @queryVector
LIMIT 2
```

## Data Seeding

- Automatic on startup
- Uses a real-world architecture patterns dataset
- Embeddings generated via the selected provider
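For reference, the backing schema is roughly equivalent to the following. The table and column names are illustrative — the real schema comes from EF Core migrations — and `vector(768)` assumes `nomic-embed-text`'s 768-dimensional output:

```sql
-- Illustrative schema; the actual table is created by EF Core migrations.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE "DesignPatterns" (
    "Id"        uuid PRIMARY KEY,
    "Content"   text NOT NULL,
    "Embedding" vector(768)   -- dimension must match the embedding model
);
```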
## RAG Pipeline

```csharp
// 1. Embed the query
var queryEmbedding = await _embedding.GenerateAsync(query);

// 2. Retrieve relevant documents
var docs = await _vectorStore.SearchAsync(queryEmbedding);

// 3. Build the prompt
// 4. Generate the response using the LLM
```

## Extensibility

You can easily extend:
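Steps 3 and 4 might be fleshed out like this — the prompt wording and the `_llm` field and method name are illustrative:

```csharp
// 3. Build a grounded prompt from the retrieved documents
//    (prompt wording is illustrative)
var context = string.Join("\n\n", docs.Select(d => d.Content));
var prompt = $"""
    Answer the question using only the context below.
    If the context is insufficient, say so.

    Context:
    {context}

    Question:
    {query}
    """;

// 4. Generate the response using the configured LLM provider
var answer = await _llm.GenerateAsync(prompt);
```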
Vector stores:

- pgvector ✅
- Pinecone (future)
- FAISS (future)

LLM providers:

- Ollama ✅
- OpenAI ✅
- Azure OpenAI (easy to add)
## Design Principles

- Domain is persistence-agnostic
- Infrastructure handles external dependencies
- Application orchestrates the RAG pipeline
- API acts as composition root
## Future Enhancements

- Hybrid search (BM25 + vector)
- HNSW indexing for pgvector
- Streaming LLM responses
- Evaluation & benchmarking
- Metadata filtering
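On the HNSW item: pgvector supports approximate indexes natively, so the index would be created roughly like this (table and column names are illustrative, following the examples above):

```sql
-- Illustrative HNSW index for cosine distance (requires pgvector >= 0.5.0).
CREATE INDEX ON "DesignPatterns"
USING hnsw ("Embedding" vector_cosine_ops);
```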
## Contributing

Contributions are welcome! Feel free to open issues or PRs.
## License

MIT License
This repository demonstrates a production-ready, provider-agnostic RAG architecture in .NET.
## Acknowledgments

- pgvector
- .NET Aspire
- Ollama