24 changes: 24 additions & 0 deletions .codex-plugin/plugin.json
@@ -0,0 +1,24 @@
{
"name": "brightdata-mcp",
"version": "0.1.0",
"description": "Public web access and data collection from Codex via MCP",
"author": {
"name": "brightdata",
"url": "https://github.com/brightdata/brightdata-mcp"
},
"homepage": "https://github.com/brightdata/brightdata-mcp",
"repository": "https://github.com/brightdata/brightdata-mcp",
"keywords": [
"mcp",
"codex"
],
"mcpServers": "./.mcp.json",
"skills": "./skills/",
"interface": {
"displayName": "Bright Data MCP",
"shortDescription": "Public web access and data collection from Codex via MCP",
"longDescription": "A powerful MCP server for public web access and data collection, right from your AI agent.",
"category": "Web Scraping",
"websiteURL": "https://github.com/brightdata/brightdata-mcp"
}
}
20 changes: 20 additions & 0 deletions .github/workflows/plugin-quality-gate.yml
@@ -0,0 +1,20 @@
name: Plugin Quality Gate

on:
pull_request:
paths:
- ".codex-plugin/**"
- "skills/**"
- ".mcp.json"

jobs:
scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Codex plugin quality gate
uses: hashgraph-online/hol-codex-plugin-scanner-action@v1
with:
plugin_dir: "."
min_score: 80
fail_on_severity: high
1 change: 1 addition & 0 deletions .github/workflows/release.yml
@@ -17,6 +17,7 @@ jobs:
cache: "npm"
registry-url: 'https://registry.npmjs.org'
scope: '@brightdata'
- run: npm install -g npm@latest
- run: npm ci
- run: npm audit signatures
- run: npm publish
11 changes: 11 additions & 0 deletions .mcp.json
@@ -0,0 +1,11 @@
{
"mcpServers": {
"brightdata-mcp": {
"command": "npx",
"args": [
"-y",
"@brightdata/mcp"
]
}
}
}
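For context, a client that loads this `.mcp.json` ends up spawning the server via `npx -y @brightdata/mcp`. A minimal sketch of how a credential could be passed follows; the `env` key and the `API_TOKEN` variable name are assumptions here (check the server's README for the exact credential name your version expects):

```json
{
  "mcpServers": {
    "brightdata-mcp": {
      "command": "npx",
      "args": ["-y", "@brightdata/mcp"],
      "env": {
        "API_TOKEN": "<your Bright Data API token>"
      }
    }
  }
}
```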
4 changes: 1 addition & 3 deletions README.md
@@ -168,7 +168,6 @@ https://mcp.brightdata.com/mcp?token=YOUR_API_TOKEN_HERE
<hr/>
<p>✅ Web Search<br/>
✅ Scraping with Web unlocker<br/>
✅ AI-ranked Discover search<br/>
❌ Browser Automation<br/>
❌ Web data tools</p>
<br/>
@@ -213,7 +212,7 @@ https://mcp.brightdata.com/mcp?token=YOUR_API_TOKEN_HERE
- Mode priority: `PRO_MODE=true` (all tools) → `GROUPS` / `TOOLS`
(whitelist) → default rapid mode (base toolkit).
- Base tools always enabled: `search_engine`, `search_engine_batch`,
`scrape_as_markdown`, `scrape_batch`, `discover`.
`scrape_as_markdown`, `scrape_batch`.
- Group ID `custom` is reserved; use `TOOLS` for bespoke picks.
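The mode priority described above can be sketched as a small helper. This is a hypothetical illustration of the documented precedence, not code from this repo; the function name and return values are invented:

```python
import os

def resolve_mode(env=None):
    """Return which tool set is active, per the documented priority."""
    env = os.environ if env is None else env
    # 1. PRO_MODE=true wins: all tools are enabled.
    if env.get("PRO_MODE", "").lower() == "true":
        return "pro"
    # 2. Otherwise GROUPS or TOOLS act as a whitelist on top of the base tools.
    if env.get("GROUPS") or env.get("TOOLS"):
        return "whitelist"
    # 3. Default: rapid mode with only the base toolkit.
    return "rapid"
```

Note that in this reading, setting `GROUPS` while `PRO_MODE=true` has no effect, since the pro-mode check short-circuits first.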


@@ -395,7 +394,6 @@ https://github.com/user-attachments/assets/61ab0bee-fdfa-4d50-b0de-5fab96b4b91d
|------|-------------|----------|
| 🔍 `search_engine` | Web search with AI-optimized results | Research, fact-checking, current events |
| 📄 `scrape_as_markdown` | Convert any webpage to clean markdown | Content extraction, documentation |
| 🎯 `discover` | AI-ranked web search with intent-based relevance scoring | Deep research, RAG pipelines, competitive analysis |

### 💎 Pro Mode Tools (60+ Tools)

1 change: 0 additions & 1 deletion assets/Tools.md
@@ -6,7 +6,6 @@
|scrape_batch|Scrape up to 10 webpages in one request and return an array of URL/content pairs in Markdown format.|
|scrape_as_html|Scrape a single webpage with advanced extraction and return the HTML response body. Handles sites protected by bot detection or CAPTCHA.|
|extract|Scrape a webpage as Markdown and convert it to structured JSON using AI sampling, with an optional custom extraction prompt.|
|discover|Search the web and rank results by AI-driven relevance. Returns scored results with title, description, URL, and relevance score. Supports intent-based ranking, geo-targeting, date filtering, and keyword filtering.|
|session_stats|Report how many times each tool has been called during the current MCP session.|
|web_data_amazon_product|Quickly read structured Amazon product data. Requires a valid product URL containing /dp/. Often faster and more reliable than scraping.|
|web_data_amazon_product_reviews|Quickly read structured Amazon product review data. Requires a valid product URL containing /dp/. Often faster and more reliable than scraping.|
1 change: 0 additions & 1 deletion manifest.json
@@ -40,7 +40,6 @@
{"name": "scrape_batch", "description": "Scrape multiple webpage URLs with advanced options for content extraction and get back the results in Markdown. This tool can unlock any webpage even if it uses bot detection or CAPTCHA. Processes up to 10 URLs."},
{"name": "scrape_as_html", "description": "Scrape a single webpage URL with advanced options for content extraction and get back the results in HTML. This tool can unlock any webpage even if it uses bot detection or CAPTCHA."},
{"name": "extract", "description": "Scrape a webpage and extract structured data as JSON. First scrapes the page as markdown, then uses AI sampling to convert it to structured JSON format. This tool can unlock any webpage even if it uses bot detection or CAPTCHA."},
{"name": "discover", "description": "Search the web and rank results by AI-driven relevance. Returns scored results with title, description, and URL. Supports intent-based ranking, geo-targeting, date filtering, and keyword filtering."},
{"name": "session_stats", "description": "Tell the user about the tool usage during this session"},
{"name": "web_data_amazon_product", "description": "Quickly read structured amazon product data. Requires a valid product URL with /dp/ in it. This can be a cache lookup, so it can be more reliable than scraping."},
{"name": "web_data_amazon_product_reviews", "description": "Quickly read structured amazon product review data. Requires a valid product URL with /dp/ in it. This can be a cache lookup, so it can be more reliable than scraping."},
4 changes: 2 additions & 2 deletions package-lock.json

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "@brightdata/mcp",
"version": "2.9.4",
"version": "2.9.3",
"description": "An MCP interface into the Bright Data toolset",
"type": "module",
"main": "./server.js",
121 changes: 1 addition & 120 deletions server.js
@@ -21,7 +21,7 @@ const base_timeout = process.env.BASE_TIMEOUT
const base_max_retries = Math.min(
parseInt(process.env.BASE_MAX_RETRIES || '0', 10), 3);
const pro_mode_tools = ['search_engine', 'scrape_as_markdown',
'search_engine_batch', 'scrape_batch', 'discover'];
'search_engine_batch', 'scrape_batch'];
const tool_groups = process.env.GROUPS ?
process.env.GROUPS.split(',').map(g=>g.trim().toLowerCase())
.filter(Boolean) : [];
@@ -487,125 +487,6 @@ addTool({
}),
});

addTool({
name: 'discover',
description: 'Search the web and rank results by AI-driven relevance. '
+'Returns scored results with title, description, and URL. Supports '
+'intent-based ranking, geo-targeting, date filtering, and keyword '
+'filtering.',
annotations: {
title: 'Discover',
readOnlyHint: true,
openWorldHint: true,
},
parameters: z.object({
query: z.string().describe('The search query'),
intent: z.string().optional().describe('Describes the specific goal '
+'of the search to help the AI evaluate and rank result relevance.'
+'If not provided, the query string is used as the intent'),
country: z.string().length(2).optional()
.describe('2-letter ISO country code for localized results '
+'(e.g., "US", "GB", "DE")'),
city: z.string().optional()
.describe('City for localized results (e.g., "New York", '
+'"Berlin")'),
language: z.string().optional()
.describe('Language code (e.g., "en", "es", "fr")'),
num_results: z.number().int().optional()
.describe('Exact number of search results to return'),
filter_keywords: z.array(z.string()).optional()
.describe('Keywords that must appear in search results'),
remove_duplicates: z.boolean().optional()
.describe('Remove duplicate results (default: true)'),
start_date: z.string().optional()
.describe('Only content updated from this date (YYYY-MM-DD)'),
end_date: z.string().optional()
.describe('Only content updated until this date (YYYY-MM-DD)'),
}),
execute: tool_fn('discover', async(data, ctx)=>{
let body = {query: data.query, format: 'json'};
if (data.intent)
body.intent = data.intent;
if (data.country)
body.country = data.country;
if (data.city)
body.city = data.city;
if (data.language)
body.language = data.language;
if (data.num_results)
body.num_results = data.num_results;
if (data.filter_keywords)
body.filter_keywords = data.filter_keywords;
if (data.remove_duplicates===false)
body.remove_duplicates = false;
if (data.start_date)
body.start_date = data.start_date;
if (data.end_date)
body.end_date = data.end_date;
let trigger_response = await axios({
url: 'https://api.brightdata.com/discover',
method: 'POST',
data: body,
headers: {
...api_headers(ctx.clientName, 'discover'),
'Content-Type': 'application/json',
},
});
let task_id = trigger_response.data?.task_id;
if (!task_id)
throw new Error('No task_id returned from discover request');
console.error(`[discover] triggered with task ID: ${task_id}`);
let max_attempts = polling_timeout;
let attempts = 0;
while (attempts<max_attempts)
{
try {
if (ctx && ctx.reportProgress)
{
await ctx.reportProgress({
progress: attempts,
total: max_attempts,
message: `Polling for discover results (attempt `
+`${attempts+1}/${max_attempts})`,
});
}
let poll_response = await axios({
url: 'https://api.brightdata.com/discover',
params: {task_id},
method: 'GET',
headers: api_headers(ctx.clientName, 'discover'),
});
if (poll_response.data?.status==='processing')
{
console.error(`[discover] still processing, polling `
+`again (attempt ${attempts+1}/${max_attempts})`);
attempts++;
await new Promise(resolve=>setTimeout(resolve, 1000));
continue;
}
console.error(`[discover] results received after `
+`${attempts+1} attempts`);
let results = poll_response.data?.results || [];
results = results.map(r=>({
link: r.link,
title: r.title,
description: r.description,
relevance_score: r.relevance_score,
}));
return JSON.stringify(results);
} catch(e){
console.error(`[discover] polling error: ${e.message}`);
if (e.response?.status===400)
throw e;
attempts++;
await new Promise(resolve=>setTimeout(resolve, 1000));
}
}
throw new Error(`Timeout after ${max_attempts} seconds waiting `
+`for discover results`);
}),
});

addTool({
name: 'session_stats',
description: 'Tell the user about the tool usage during this session',
4 changes: 2 additions & 2 deletions server.json
@@ -6,13 +6,13 @@
"url": "https://github.com/brightdata/brightdata-mcp",
"source": "github"
},
"version": "2.9.4",
"version": "2.9.2",
"packages": [
{
"registryType": "npm",
"registryBaseUrl": "https://registry.npmjs.org",
"identifier": "@brightdata/mcp",
"version": "2.9.4",
"version": "2.9.2",
"transport": {
"type": "stdio"
},
12 changes: 12 additions & 0 deletions skills/brightdata-mcp/SKILL.md
@@ -0,0 +1,12 @@
---
name: brightdata-mcp
description: Public web access and data collection from Codex via MCP
---

# Bright Data MCP for Codex

Use Bright Data MCP from Codex via MCP.

## When to use
- When you need brightdata-mcp capabilities in your Codex workflow
- See https://github.com/brightdata/brightdata-mcp for full setup instructions
2 changes: 1 addition & 1 deletion tool_groups.js
@@ -1,6 +1,6 @@
'use strict'; /*jslint node:true es9:true*/

const base_tools = ['search_engine', 'scrape_as_markdown', 'discover'];
const base_tools = ['search_engine', 'scrape_as_markdown'];

export const GROUPS = {
ECOMMERCE: {