121 changes: 121 additions & 0 deletions ingestion/examples/README.md
@@ -0,0 +1,121 @@
# 🤖 OpenMetadata AI SDK Recipes
### By Baibhav Prateek | OpenMetadata Hackathon 2026

## 🎯 Problem Statement
Most data teams struggle with poor metadata quality: tables without
descriptions, no owners assigned, and no easy way to explore the
data catalog in natural language. This project tackles all three
problems using AI.

## 💡 Solution
Three ready-to-use Jupyter notebooks that demonstrate how to combine
OpenMetadata's REST API with AI to build powerful metadata workflows.

---

## 📓 Notebooks

### 1. 🏥 Metadata Health Report (`metadata_health_report.ipynb`)
**Problem it solves:** Data teams have no visibility into how well
their metadata is documented.

**What it does:**
- Connects to OpenMetadata and fetches all tables
- Checks which tables are missing descriptions and owners
- Calculates an overall health score (0-100)
- Generates visual charts showing coverage
- Exports results to CSV for further analysis

**Sample Output:**
```
==================================================
📊 MY OPENMETADATA HEALTH REPORT
Total Tables Analyzed : 50
✅ Have Description : 24 (48%)
❌ Missing Description : 26 (52%)
✅ Have Owner : 0 (0%)
❌ Missing Owner : 50 (100%)
Overall Health Score : 24/100
Status : 🔴 CRITICAL
```
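
The 24/100 score above is consistent with averaging description coverage (48%) and owner coverage (0%). A minimal sketch of that scoring rule, assuming each table object from `/api/v1/tables` may carry optional `description` and `owners` fields (`health_score` is an illustrative helper, not the notebook's exact code):

```python
# Illustrative health score: average of description and owner coverage,
# scaled to 0-100. Assumes the table dicts mirror the OpenMetadata
# /api/v1/tables response shape (optional "description" and "owners").
def health_score(tables):
    total = len(tables)
    if total == 0:
        return 0
    with_desc = sum(1 for t in tables if t.get("description"))
    with_owner = sum(1 for t in tables if t.get("owners"))
    return round(100 * (with_desc + with_owner) / (2 * total))

tables = [
    {"description": "orders fact table", "owners": []},
    {"description": "", "owners": [{"name": "alice"}]},
    {"description": "", "owners": []},
]
print(health_score(tables))  # → 33 (1/3 documented, 1/3 owned)
```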

---

### 2. 🔗 AI Template (`langchain_openmetadata_template.ipynb`)
**Problem it solves:** Developers need a reusable starting point
for building AI-powered data catalog applications.

**What it does:**
- Provides a clean, reusable template connecting Groq AI to OpenMetadata
- Fetches real metadata context from OpenMetadata
- Uses LLaMA 3.3 70B to answer questions about your data
- Anyone can customize this template for their own use case

**Sample Questions it answers:**
- "Which tables look incomplete or poorly documented?"
- "What kind of organization does this data belong to?"
- "Which tables should a new data analyst explore first?"
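
The context-injection pattern behind these answers can be sketched without the Groq call: fetch live table names first, then embed them in the prompt so the model answers from current metadata rather than memory. The prompt wording mirrors the template notebook; `build_prompt` is an illustrative helper:

```python
# Sketch of the template's context-injection step: real table names
# fetched from OpenMetadata are interpolated into the prompt before
# it is sent to the LLM.
def build_prompt(question, table_names):
    return (
        "You are a helpful data catalog assistant.\n"
        f"You have access to OpenMetadata with these tables: {table_names}\n\n"
        f"User question: {question}\n\n"
        "Answer helpfully and concisely."
    )

prompt = build_prompt("Which tables cover orders?", ["raw_orders", "dim_date"])
print("raw_orders" in prompt)  # → True
```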

---

### 3. 🤖 AI Agent (`openmetadata_ai_agent.ipynb`)
**Problem it solves:** Users have to know exactly what to search
for in their data catalog. This agent makes it conversational.

**What it does:**
- Intelligent agent that automatically decides how to search
- Has 3 tools: get_tables, search_tables, get_databases
- AI decides which tool to use based on your question
- Returns human-friendly answers with full reasoning shown

**Sample Interaction:**
```
❓ User: Find tables related to orders
🧠 Agent thinking...
🔧 Agent decided to use: search_tables: orders
📦 Data fetched: ['raw_orders', 'fact_orders', 'orders'...]
🤖 Answer: Found several order-related tables...
```
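
The tool-routing loop described above can be sketched with the LLM's choice replaced by a simple keyword stub. The tool names mirror the README (`get_tables`, `search_tables`, `get_databases`); the dispatch logic and stub tools are illustrative, not the notebook's actual implementation:

```python
# Stubbed agent loop: pick a tool from the question, run it, report.
# A real agent would ask the LLM which tool to call; here a keyword
# check stands in for that decision.
def route(question):
    q = question.lower()
    if "database" in q:
        return "get_databases", None
    if "find" in q or "search" in q:
        # crude keyword extraction: use the last word as the search term
        return "search_tables", q.split()[-1]
    return "get_tables", None

def run_agent(question, tools):
    name, arg = route(question)
    result = tools[name](arg) if arg is not None else tools[name]()
    return f"Used {name}; found {len(result)} results"

# Stub tools standing in for the OpenMetadata API calls
tools = {
    "get_tables": lambda: ["dim_date", "fact_orders"],
    "search_tables": lambda q: [t for t in ("raw_orders", "fact_orders", "dim_date") if q in t],
    "get_databases": lambda: ["ecommerce"],
}
print(run_agent("Find tables related to orders", tools))  # → Used search_tables; found 2 results
```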

---

## 🚀 Quick Start

### Prerequisites
```bash
pip install openmetadata-ingestion groq requests pandas matplotlib jupyter
```

Comment on lines +80 to +84 (Copilot AI, Apr 18, 2026):

The README lists dependencies that are not used by these notebooks (e.g., google-genai, and openmetadata-ingestion even though the notebooks call the REST API via requests). Please align the installation instructions with the actual imports/usage, or refactor the notebooks to use the OpenMetadata Python SDK (openmetadata-ingestion / metadata.sdk) as advertised in the PR description.

### Setup
1. Get your OpenMetadata token from your profile page
2. Get a free Groq API key from console.groq.com
3. Open any notebook and replace the placeholders in Cell 1:
```python
GROQ_API_KEY = "your_groq_api_key_here"
TOKEN = "your_openmetadata_token_here"
```
4. Run all cells in order!
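
Before running a full notebook, the credentials can be sanity-checked with a small helper. `check_token` is hypothetical (not part of the notebooks); it uses only the standard library so it runs without extra installs, and the endpoint and Bearer-header shape match what the notebooks use:

```python
# Hypothetical credential check against the OpenMetadata REST API.
import urllib.error
import urllib.request

def auth_headers(token):
    # Header shape used throughout these notebooks
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

def check_token(base_url, token):
    """Return True if the token can list one table (makes a network call)."""
    req = urllib.request.Request(f"{base_url}/api/v1/tables?limit=1",
                                 headers=auth_headers(token))
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.URLError:
        return False

print(auth_headers("demo-token")["Authorization"])  # → Bearer demo-token
```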

---

## 🛠️ Technologies Used
- **OpenMetadata REST API** — metadata fetching and search
- **Groq AI (LLaMA 3.3 70B)** — natural language processing
- **Python** — core language
- **Pandas** — data analysis
- **Matplotlib** — visualization
- **Jupyter Notebooks** — interactive environment

## 🎯 Impact
These notebooks help data teams:
- **Identify** poorly documented tables instantly
- **Explore** their data catalog using natural language
- **Build** AI-powered metadata applications faster

## 📁 File Structure
```
ingestion/examples/
├── metadata_health_report.ipynb            # Health scoring notebook
├── langchain_openmetadata_template.ipynb   # AI template notebook
├── openmetadata_ai_agent.ipynb             # AI agent notebook
├── requirements.txt                        # Dependencies
└── README.md                               # This file
```

## 🔗 Related Issue
This submission is for issue #26646 — Metadata AI SDK Starter
Templates / Recipes
210 changes: 210 additions & 0 deletions ingestion/examples/langchain_openmetadata_template.ipynb
@@ -0,0 +1,210 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "2d542bc1-1752-4bbe-9cac-c88548ce6393",
"metadata": {},
"source": [
"# LangChain + OpenMetadata Template\n",
"### Built by Baibhav Prateek | OpenMetadata Hackathon 2026\n",
"\n",
"## What is this?\n",
"A reusable template that connects AI to OpenMetadata.\n",
Comment on lines +8 to +12 (Copilot AI, Apr 18, 2026):

This notebook is titled "LangChain + OpenMetadata Template", but the code shown here uses requests + groq directly and does not use LangChain. Please either implement a minimal LangChain chain/agent example (and add the needed dependency) or rename the notebook so the title matches the contents.

"Anyone can use this as a starting point for their own\n",
"AI-powered data catalog applications.\n",
"\n",
"## How to use this template:\n",
"1) Add your API keys\n",
"2) Run all cells in order\n",
"3) Ask your own questions\n",
"4) Customize the questions for your use case\n",
"\n",
"## Technologies used:\n",
"1) OpenMetadata API for metadata\n",
"2) Groq AI (LLaMA 3) for natural language processing\n",
"3) Python requests for API calls"
Comment on lines +8 to +25 (Copilot AI, Apr 18, 2026):

The title/README call this a "LangChain + OpenMetadata Template", but the notebook does not import or use LangChain at all (it directly calls Groq and requests). Either update the implementation to actually use LangChain primitives (e.g., LLM/Prompt/Tool abstractions), or rename the notebook and its description to avoid misleading users.

]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ac2f9ec7-80b3-4b2d-89ae-3bf237059733",
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"import json\n",
"from groq import Groq\n",
"\n",
"# Your credentials\n",
"GROQ_API_KEY = \"your_groq_api_key_here\"\n",
"BASE_URL = \"https://sandbox.open-metadata.org\"\n",
"TOKEN = \"your_openmetadata_token_here\"\n",
"\n",
"HEADERS = {\n",
" \"Authorization\": f\"Bearer {TOKEN}\",\n",
" \"Content-Type\": \"application/json\"\n",
"}\n",
"\n",
"# Initialize Groq client\n",
"client = Groq(api_key=GROQ_API_KEY)\n",
"\n",
"print(\"✅ Setup complete!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cfa44929-6991-432b-8455-071cf8a12fe0",
"metadata": {},
"outputs": [],
"source": [
"# Functions to fetch data from OpenMetadata with error handling\n",
"def get_tables(limit=10):\n",
" try:\n",
" response = requests.get(\n",
" f\"{BASE_URL}/api/v1/tables\",\n",
" headers=HEADERS,\n",
" params={\"limit\": limit}\n",
" )\n",
" if response.status_code != 200:\n",
" print(f\"❌ Error: {response.status_code}\")\n",
" return []\n",
" return response.json().get(\"data\", [])\n",
" except Exception as e:\n",
" print(f\"❌ Error fetching tables: {e}\")\n",
" return []\n",
"\n",
"def get_databases():\n",
" try:\n",
" response = requests.get(\n",
" f\"{BASE_URL}/api/v1/databases\",\n",
" headers=HEADERS,\n",
" params={\"limit\": 20}\n",
" )\n",
" if response.status_code != 200:\n",
" print(f\"❌ Error: {response.status_code}\")\n",
" return []\n",
" return response.json().get(\"data\", [])\n",
" except Exception as e:\n",
" print(f\"❌ Error fetching databases: {e}\")\n",
" return []\n",
"\n",
"def search_assets(query):\n",
" try:\n",
" response = requests.get(\n",
" f\"{BASE_URL}/api/v1/search/query\",\n",
" headers=HEADERS,\n",
" params={\"q\": query, \"index\": \"table_search_index\", \"limit\": 5}\n",
" )\n",
" if response.status_code != 200:\n",
" print(f\"❌ Error: {response.status_code}\")\n",
" return []\n",
" return response.json().get(\"hits\", {}).get(\"hits\", [])\n",
" except Exception as e:\n",
" print(f\"❌ Error searching: {e}\")\n",
" return []\n",
"\n",
"print(\"✅ Helper functions ready!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ddbb5ecf-d621-43a7-a5b7-03ac2cdec978",
"metadata": {},
"outputs": [],
"source": [
"# This function connects AI with OpenMetadata:\n",
"# Step 1: Fetch real tables from OpenMetadata\n",
"# Step 2: Pass that information to the AI as context\n",
"# Step 3: The AI uses that context to answer the question\n",
"# This way the AI always has up-to-date information\n",
"\n",
"def ask_ai(question):\n",
" # Fetch context from OpenMetadata\n",
" tables = get_tables(limit=10)\n",
" table_names = [t.get(\"name\", \"\") for t in tables]\n",
" \n",
" # Build prompt\n",
" prompt = f\"\"\"You are a helpful data catalog assistant.\n",
"You have access to OpenMetadata with these tables: {table_names}\n",
"\n",
"User question: {question}\n",
"\n",
"Answer helpfully and concisely.\"\"\"\n",
"\n",
" response = client.chat.completions.create(\n",
" model=\"llama-3.3-70b-versatile\",\n",
" messages=[{\"role\": \"user\", \"content\": prompt}]\n",
" )\n",
" return response.choices[0].message.content\n",
"\n",
"# Test it!\n",
"answer = ask_ai(\"How many tables do we have and what are some of their names?\")\n",
Copilot AI, Apr 18, 2026:

ask_ai() only fetches limit=10 tables and then prompts the model with that subset, but the demo question asks "How many tables do we have…". With the current logic the answer can never reflect the full catalog and may be misleading. Please either (a) change the question/output wording to reflect the limited sample, or (b) fetch/paginate all tables (or at least a larger configurable sample) when the question is about totals.

Suggested change:
"answer = ask_ai(\"How many tables do we have and what are some of their names?\")\n",
"answer = ask_ai(\"From the fetched sample of up to 10 tables, how many tables are listed and what are some of their names?\")\n",

"print(\"🤖 AI says:\")\n",
"print(answer)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "72b75316-6857-4018-99b9-c75f29071e4a",
"metadata": {},
"outputs": [],
"source": [
"# Interactive Q&A session\n",
"questions = [\n",
" \"Which tables look incomplete or poorly documented?\",\n",
" \"What kind of organization does this data belong to?\",\n",
" \"If you were a new data analyst, which tables would you explore first?\",\n",
"]\n",
"\n",
"print(\"=\" * 60)\n",
"print(\" 🤖 OpenMetadata AI Assistant Demo\")\n",
"print(\"=\" * 60)\n",
"\n",
"for question in questions:\n",
" print(f\"\\n❓ Question: {question}\")\n",
" print(\"-\" * 40)\n",
" answer = ask_ai(question)\n",
" print(f\"🤖 Answer: {answer}\")\n",
" print()\n",
"\n",
"print(\"=\" * 60)\n",
"print(\" 🤖 OpenMetadata AI Template Demo\")\n",
"print(\" Built for OpenMetadata Hackathon 2026\")\n",
"print(\"=\" * 60)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dbcaac82-2132-47e9-8721-f384270685ad",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.9"
Copilot AI, Apr 18, 2026:

Notebook metadata declares Python 3.13.9, but this repository explicitly supports Python 3.9–3.11 for ingestion/SDK code. Please update the notebook kernel/language metadata to a supported version (e.g., 3.11) to avoid misleading users and compatibility issues.

Suggested change:
"version": "3.13.9"
"version": "3.11"

Copilot AI, Apr 18, 2026:

Notebook metadata lists Python 3.13.9, which is outside the ingestion module's supported versions (e.g., ingestion/noxfile.py lists 3.10–3.12). Please update the kernel/language metadata to a supported version to reduce confusion when users run these notebooks.

Suggested change:
"version": "3.13.9"
"version": "3.11.0"

}
Comment on lines +200 to +206 (Copilot AI, Apr 18, 2026):

Notebook metadata indicates it was created with Python 3.13.9. OpenMetadata ingestion/examples are expected to run on supported Python versions (e.g., repo notebooks under examples/python-sdk/... use 3.11.x), so this kernel/version metadata is likely to mislead users and can break dependencies. Please re-save the notebook using a supported Python kernel (3.9–3.11) so language_info.version matches the supported runtime.
},
"nbformat": 4,
"nbformat_minor": 5
}