37 changes: 31 additions & 6 deletions README.md
@@ -2,6 +2,12 @@

oss-data-analyst is an intelligent AI agent that converts natural language questions into SQL queries and provides data analysis. Built with the Vercel AI SDK, it features multi-phase reasoning (planning, building, execution, reporting) and streams results in real-time.

> **Note**: This is a fork of the original [vercel-labs/oss-data-analyst](https://github.com/vercel-labs/oss-data-analyst) template with modifications to make it more suitable for demonstration purposes. Key changes include:
>
> - **Emphasis on local databases**: Optimized for SQLite-based demos without requiring cloud database setup
> - **Prompt engineering improvements**: Enhanced system prompts for better query understanding and result interpretation
> - **Streamlined setup**: Simplified configuration and dependencies for quick local deployment

> **Note**: This is a reference architecture. The semantic catalog and schemas included are simplified examples for demonstration purposes. Production implementations should use your own data models and schemas.

## Features
@@ -25,29 +31,36 @@ oss-data-analyst is an intelligent AI agent that converts natural language quest
### Installation

1. **Clone the repository**

```bash
git clone https://github.com/vercel/oss-data-analyst.git
cd oss-data-analyst
```

2. **Install dependencies**

```bash
pnpm install
```

3. **Set up environment variables**

```bash
cp env.local.example .env.local
```

Edit `.env.local` and add your Vercel AI Gateway key

4. **Initialize the database**

```bash
pnpm initDatabase
```

This creates a SQLite database with sample data (Companies, People, Accounts)

5. **Run the development server**

```bash
pnpm dev
```
@@ -67,43 +80,51 @@ pnpm start
This repository includes a sample database schema with three main entities to demonstrate oss-data-analyst's capabilities:

### **Companies**

Represents organizations in your database. Each company has:

- Basic information (name, industry, employee count)
- Business metrics (founded date, status)
- Example: Technology companies, Healthcare organizations, etc.

### **Accounts**

Represents customer accounts or subscriptions tied to companies. Each account includes:

- Account identification (account number, status)
- Financial metrics (monthly recurring value, contract details)
- Relationship to parent company
- Example: Active subscriptions with monthly values ranging from $10k-$50k

### **People**

Represents individual employees or contacts within companies. Each person has:

- Personal information (name, email)
- Employment details (department, title, salary)
- Relationship to their company
- Example: Engineers, Sales representatives, Managers across different departments


## How It Works

oss-data-analyst uses a multi-phase agentic workflow:

1. **Planning Phase**

- Analyzes natural language query
- Searches semantic catalog for relevant entities
- Identifies required data and relationships
- Generates execution plan

2. **Building Phase**

- Constructs SQL query from plan
- Validates syntax and security policies
- Optimizes query structure
- Finds join paths between tables

3. **Execution Phase**

- Estimates query cost
- Executes SQL against database
- Handles errors with automatic repair
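
The phases listed in this hunk, together with the reporting phase named in the README introduction, can be sketched as a small pipeline. This is a minimal illustration only: `Phases`, `runAnalysis`, and `maxRepairs` are hypothetical names, not the repository's API, and the "automatic repair" is modeled simply as rebuilding the SQL with the previous error message as extra input.

```typescript
// Hypothetical phase signatures; the real agent's tools are not shown in this diff.
interface Phases {
  plan: (question: string) => string[]; // entities relevant to the question
  build: (entities: string[], lastError?: string) => string; // SQL text
  execute: (sql: string) => unknown[]; // result rows; throws on failure
  report: (question: string, rows: unknown[]) => string; // final answer
}

// Runs plan -> build -> execute -> report, rebuilding the SQL after a failed execution.
function runAnalysis(question: string, phases: Phases, maxRepairs = 2): string {
  const entities = phases.plan(question);
  let lastError: string | undefined;

  for (let attempt = 0; attempt <= maxRepairs; attempt++) {
    // On a retry, the previous error is passed back so the builder can adjust the query.
    const sql = phases.build(entities, lastError);
    try {
      const rows = phases.execute(sql);
      return phases.report(question, rows);
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  throw new Error(`query failed after ${maxRepairs + 1} attempts: ${lastError}`);
}
```

Passing the phases in as plain functions keeps the control flow testable without a model or a database behind it.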
@@ -120,6 +141,7 @@ oss-data-analyst uses a multi-phase agentic workflow:
### Customizing Prompts

Modify system prompts in `src/lib/prompts/`:

- `planning.ts` - Planning phase behavior
- `building.ts` - SQL generation logic
- `execution.ts` - Query execution handling
@@ -136,25 +158,28 @@ Try asking oss-data-analyst (using the sample database):
- "What is the total revenue for Active accounts?"
- "How many people work in Engineering?"

-## Using with Production Databases
+## Using with Your Own Data

-The default setup uses SQLite for demonstration. To use with Snowflake or other databases:
+The application uses SQLite as its database. To use your own data:

-1. Update `src/lib/oss-data-analyst-agent-advanced.ts` to import from `./tools/execute` instead of `./tools/execute-sqlite`
-2. Configure your database credentials in `.env.local`
-3. Update the semantic catalog in `src/lib/semantic/` with your schema definitions
+1. Modify the database seed script at `scripts/seed-database.ts` with your data
+2. Run `pnpm initDatabase` to reinitialize the database
+3. Update the semantic catalog in `src/semantic/` with your schema definitions matching your data structure
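
As an illustration of the first step, a seed script typically turns in-memory rows into parameterized `INSERT` statements. The sketch below is an assumption about what such a helper could look like; the contents of `scripts/seed-database.ts` are not part of this diff, and `buildInsert` and the `companies` column names are hypothetical.

```typescript
type Value = string | number | null;
type Row = Record<string, Value>;

// Hypothetical helper: builds one parameterized multi-row INSERT.
// Column names are taken from the first row; missing values become NULL.
function buildInsert(table: string, rows: Row[]): { sql: string; params: Value[] } {
  if (rows.length === 0) {
    throw new Error("no rows to insert");
  }
  const columns = Object.keys(rows[0]);
  const rowPlaceholder = "(" + columns.map(() => "?").join(", ") + ")";
  const placeholders = rows.map(() => rowPlaceholder).join(", ");

  const params: Value[] = [];
  for (const row of rows) {
    for (const column of columns) {
      params.push(row[column] ?? null);
    }
  }

  const quotedColumns = columns.map((c) => `"${c}"`).join(", ");
  return {
    sql: `INSERT INTO "${table}" (${quotedColumns}) VALUES ${placeholders}`,
    params,
  };
}

// Example rows; the column names are illustrative, not the repository's schema.
const example = buildInsert("companies", [
  { name: "Acme", industry: "Technology", employee_count: 120 },
  { name: "Medix", industry: "Healthcare", employee_count: 45 },
]);
```

The resulting `sql` and `params` can be handed to whichever SQLite driver the seed script uses. Placeholders keep the data values out of the SQL text itself.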

## Troubleshooting

**Database Not Found**

- Run `pnpm initDatabase` to create and seed the database
- Check that `data/oss-data-analyst.db` exists

**AI Gateway API Errors**

- Verify your API key is valid in `.env.local`
- Check API rate limits and credits

**Build Errors**

- Run `pnpm install` to update dependencies
- Check TypeScript errors with `pnpm run type-check`
- Clear `.next` folder and rebuild
22 changes: 8 additions & 14 deletions env.local.example
@@ -6,20 +6,14 @@ NEXT_PUBLIC_API_URL=http://localhost:3000
# Development
NODE_ENV=development

-# AI Provider (OpenAI)
-OPENAI_API_KEY=your_openai_api_key_here
+# AI Provider
+# Create an API key: https://vercel.com/d?to=%2F%5Bteam%5D%2F%7E%2Fai%2Fapi-keys
+AI_GATEWAY_API_KEY=your_ai_gateway_api_key_here

-# Snowflake Database Configuration
-SNOWFLAKE_ACCOUNT=your_account.region
-SNOWFLAKE_USERNAME=your_username
-SNOWFLAKE_PASSWORD=your_password
-# OR use private key authentication (optional)
-# SNOWFLAKE_PRIVATE_KEY=your_private_key_here
-
-SNOWFLAKE_ROLE=your_role
-SNOWFLAKE_WAREHOUSE=your_warehouse
-SNOWFLAKE_DATABASE=your_database
-SNOWFLAKE_SCHEMA=your_schema
+# SQLite Database Configuration
+# The application uses a local SQLite database located at data/oss-data-analyst.db
+# Run 'pnpm run initDatabase' to initialize and seed the database

# Security - Allowed schemas for queries (comma-separated)
-ALLOWED_SCHEMAS=analytics,crm
+# SQLite uses "main" as the default schema
+ALLOWED_SCHEMAS=main
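
A comma-separated allow-list like `ALLOWED_SCHEMAS` is usually parsed once and checked against each table a query references. The following sketch shows one way that check could work. The function names are hypothetical, and treating unqualified table names as belonging to `main` is an assumption based on the comment above, not confirmed behavior of the application.

```typescript
// Hypothetical helper: parse a comma-separated ALLOWED_SCHEMAS value into a set.
function parseAllowedSchemas(raw: string | undefined): Set<string> {
  const names = (raw ?? "")
    .split(",")
    .map((s) => s.trim().toLowerCase())
    .filter((s) => s.length > 0);
  return new Set(names);
}

// Hypothetical check: "schema.table" uses its explicit schema,
// while a bare "table" is assumed to live in `defaultSchema`.
function isTableAllowed(
  qualifiedName: string,
  allowed: Set<string>,
  defaultSchema = "main",
): boolean {
  const parts = qualifiedName.split(".");
  const schema = parts.length > 1 ? parts[0] : defaultSchema;
  return allowed.has(schema.toLowerCase());
}

const allowedSchemas = parseAllowedSchemas("main");
const bareTableOk = isTableAllowed("companies", allowedSchemas);
const otherSchemaOk = isTableAllowed("analytics.events", allowedSchemas);
```

With `ALLOWED_SCHEMAS=main`, the bare `companies` table passes and `analytics.events` is rejected, which matches the move from the old `analytics,crm` value to a single SQLite schema.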
6 changes: 1 addition & 5 deletions package.json
@@ -94,7 +94,7 @@
"lucide-react": "^0.468.0",
"motion": "^12.23.24",
"nanoid": "^5.1.6",
-"next": "^15.0.3",
+"next": "15.0.5",
"next-themes": "^0.4.6",
"node-sql-parser": "^5.3.10",
"pg": "^8.16.3",
@@ -106,8 +106,6 @@
"react-resizable-panels": "^3.0.6",
"react-syntax-highlighter": "^15.6.6",
"react-typed": "^2.0.12",
-"recharts": "2.15.4",
-"snowflake-sdk": "^2.1.3",
"sonner": "^2.0.7",
"streamdown": "^1.4.0",
"tailwind-merge": "^3.3.1",
@@ -116,8 +114,6 @@
"tw-animate-css": "^1.4.0",
"use-stick-to-bottom": "^1.1.1",
"vaul": "^1.1.2",
-"vega": "^6.2.0",
-"vega-lite": "^6.4.1",
"zod": "^4.1.12"
},
"packageManager": "pnpm@8.15.0",