Build a complete data platform in 30 minutes with DuckDB + DuckLake + Marimo.
What you get: Enterprise lakehouse capabilities that normally cost $50K+/year, running on your laptop for $0.
```bash
# Start the lakehouse
docker-compose up -d

# Wait 30 seconds for services to start, then visit:
# 📊 Marimo Notebooks: http://localhost:2718
# 🗄️ MinIO Console: http://localhost:9001 (minioadmin/minioadmin)
```

- DuckDB with DuckLake - the "SQLite for analytics", now with snapshots
- PostgreSQL - The metadata brain
- MinIO - S3-compatible storage that runs locally
- Marimo - Python notebooks that actually work
- Full ACID transactions - Multiple users, no conflicts (see the sketch below)
- Time travel queries - Rollback experiments instantly
- Vector search ready - VSS + FTS extensions loaded
- Prototyping AI features without burning warehouse credits
- Data team sandboxes with proper isolation
- Local development that scales to production
- Learning modern data stack without complexity
- Cost optimization - offload expensive ad-hoc queries
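A minimal sketch of the "multiple users, no conflicts" claim above: two independent connections writing to the same DuckLake table, with abandoned work rolled back cleanly. It assumes each connection has already attached the catalog as `lakehouse` (see the ATTACH sketch further down); the `events` table is illustrative.

```python
import duckdb

# Two users, two independent connections to the same lakehouse
alice = duckdb.connect()
bob = duckdb.connect()
for conn in (alice, bob):
    conn.execute("USE lakehouse;")  # assumes the DuckLake catalog is attached as "lakehouse"

alice.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER, note VARCHAR);")

# Alice commits her experiment: it becomes a new snapshot visible to everyone
alice.execute("BEGIN;")
alice.execute("INSERT INTO events VALUES (1, 'alice-experiment');")
alice.execute("COMMIT;")

# Bob abandons his: the rollback never touches the shared table
bob.execute("BEGIN;")
bob.execute("INSERT INTO events VALUES (2, 'bob-experiment');")
bob.execute("ROLLBACK;")
```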
```
ducklake-sandbox/
├── docker-compose.yml          # Main orchestration
├── Dockerfile.marimo           # Marimo environment
├── requirements.txt            # Python dependencies
├── init-scripts/
│   └── 01-init-ducklake.sql    # Postgres setup
├── notebooks/
│   └── ducklake_demo.py        # Demo notebook
└── .env.example                # Configuration template
```
| Service | Port | Purpose |
|---|---|---|
| Marimo | 2718 | Interactive notebooks |
| PostgreSQL | 5432 | Metadata catalog |
| MinIO | 9000/9001 | Object storage + UI |
- MinIO: `minioadmin` / `minioadmin`
- PostgreSQL: `postgres` / `ducklake123`
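The demo notebook's `USE lakehouse;` call below assumes the DuckLake catalog has already been attached to your DuckDB connection. Here is a minimal sketch of that attachment using the credentials above; the database name `ducklake`, the bucket name `lakehouse`, and the in-network hostnames `postgres`/`minio` are assumptions that may differ from the shipped init scripts:

```python
import duckdb

conn = duckdb.connect()
conn.execute("INSTALL ducklake; LOAD ducklake;")
conn.execute("INSTALL httpfs; LOAD httpfs;")

# Point DuckDB's S3 layer at the local MinIO container
conn.execute("""
    CREATE SECRET minio (
        TYPE S3,
        KEY_ID 'minioadmin',
        SECRET 'minioadmin',
        ENDPOINT 'minio:9000',
        URL_STYLE 'path',
        USE_SSL false
    );
""")

# Attach the DuckLake catalog: metadata lives in Postgres, data files in MinIO
conn.execute("""
    ATTACH 'ducklake:postgres:dbname=ducklake host=postgres user=postgres password=ducklake123'
        AS lakehouse (DATA_PATH 's3://lakehouse/');
""")
conn.execute("USE lakehouse;")
```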
```python
# In a Marimo notebook
import duckdb

conn = duckdb.connect()

# Connect to your lakehouse
# (assumes the DuckLake catalog is attached as "lakehouse"; see the ATTACH sketch above)
conn.execute("USE lakehouse;")

# Load from anywhere
conn.execute("CREATE TABLE my_data AS SELECT * FROM 'path/to/data.csv';")

# Save current state
conn.execute("SELECT ducklake_snapshot('lakehouse');")

# Query historical data
conn.execute("SELECT * FROM my_table FOR SNAPSHOT 1;")

# Vector search ready
conn.execute("CREATE INDEX my_vectors ON my_table USING HNSW (embedding);")

# Full-text search (DuckDB's FTS extension builds its index via a PRAGMA;
# 'id' stands in for the table's document key column)
conn.execute("PRAGMA create_fts_index('my_table', 'id', 'text_column');")
```
```bash
# Production-like setup
docker-compose -f docker-compose.yml -f docker-compose.prod.yml up

# Add more compute nodes
docker-compose scale marimo=3
```
```yaml
# Use real S3 in production
environment:
  - AWS_ACCESS_KEY_ID=your_real_key
  - AWS_SECRET_ACCESS_KEY=your_real_secret
  - AWS_ENDPOINT_URL=https://s3.amazonaws.com
```

```sql
-- Each user gets an isolated sandbox
ATTACH 'ducklake:postgres:...' AS user_sandbox_alice (DATA_PATH 's3://team-bucket/alice/');
ATTACH 'ducklake:postgres:...' AS user_sandbox_bob (DATA_PATH 's3://team-bucket/bob/');
```

This same setup runs on any cloud provider:
- AWS: RDS Postgres + S3 + ECS/EKS
- GCP: Cloud SQL + GCS + GKE
- Azure: PostgreSQL + Blob Storage + AKS
- Any VPS: Same docker-compose on $5/month server
No vendor lock-in, no surprise bills.
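If teammates attach their sandboxes from notebooks, a tiny helper keeps the per-user `DATA_PATH` convention above consistent. The DSN and bucket here are placeholders, not values from this repo:

```python
import duckdb

# Placeholder catalog DSN and bucket -- swap in your real values
CATALOG_DSN = "dbname=ducklake host=postgres user=postgres password=ducklake123"
TEAM_BUCKET = "s3://team-bucket"

def attach_user_sandbox(conn: duckdb.DuckDBPyConnection, user: str) -> None:
    """Attach an isolated DuckLake sandbox whose data lives under the user's own prefix."""
    conn.execute(
        f"ATTACH 'ducklake:postgres:{CATALOG_DSN}' AS user_sandbox_{user} "
        f"(DATA_PATH '{TEAM_BUCKET}/{user}/');"
    )

conn = duckdb.connect()
conn.execute("INSTALL ducklake; LOAD ducklake;")
attach_user_sandbox(conn, "alice")
conn.execute("USE user_sandbox_alice;")
```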
Found a bug? Want to add features? PRs welcome!
Common additions:
- dbt integration
- Streamlit dashboard
- Airflow scheduling
- Prometheus monitoring
If this saved you from expensive warehouse bills, consider giving it a star!
Questions? Open an issue or discussion.
Built with ❤️ by the data community. From laptop to lakehouse in minutes.