Network Discovery MCP - User Guide

A modular, containerized network discovery service for automated network mapping and analysis. Network Discovery MCP automatically discovers network devices, collects configurations, and generates interactive topology visualizations starting from a single seed device.

Overview

Network Discovery MCP provides automated network infrastructure discovery and mapping. The system supports two discovery approaches:

Discovery Method 1: Seed Device Discovery

Connects to a single "seed" network device
Automatically discovers the network topology through:
- Interface information (subnets, VRFs)
- Routing tables (known networks)
- ARP tables (active hosts)
- CDP/LLDP neighbors (directly connected devices)
Scans all discovered IP addresses for reachability
Identifies device vendors and models
Collects device configurations
Generates interactive network topology visualizations

Use this method when you have network access to the devices and want the topology discovered automatically.
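To make the seed method more concrete, the sketch below shows one way the four data sources could be merged into a single de-duplicated scan list. It is an illustration only: `merge_targets` and its input shapes are hypothetical and are not taken from the service's code.

```python
import ipaddress


def merge_targets(interface_subnets, routed_networks, arp_hosts, neighbor_ips):
    """Combine seed-device data into sorted, de-duplicated subnets and hosts.

    Hypothetical helper for illustration; not the service's implementation.
    """
    # Subnets come from interface and routing data. strict=False lets an
    # interface address such as 192.168.1.1/24 map to its network.
    subnets = {
        ipaddress.ip_network(entry, strict=False)
        for entry in [*interface_subnets, *routed_networks]
    }
    # Individual hosts come from ARP entries and CDP/LLDP neighbors.
    hosts = {ipaddress.ip_address(entry) for entry in [*arp_hosts, *neighbor_ips]}

    def by_version(item):
        # Sort IPv4 before IPv6 so mixed input never compares across versions.
        return (item.version, item)

    return {
        "subnets": [str(n) for n in sorted(subnets, key=by_version)],
        "hosts": [str(h) for h in sorted(hosts, key=by_version)],
    }
```

Sets remove addresses that several sources report, which keeps the later scan from probing the same host twice.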

Discovery Method 2: Direct IP/Subnet Scanning

Accepts a list of IP addresses or subnets directly
Scans the provided addresses for reachability
Identifies device vendors and models
Collects device configurations
Generates network topology visualizations

Use this method when you already have a source of truth (such as NetBox, a spreadsheet, or an IPAM system) with device IPs and want to skip topology discovery.

Operating Modes

The service can operate in two modes:

REST API Mode: Traditional HTTP API for integration with scripts, tools, and CI/CD pipelines
MCP Mode: Model Context Protocol interface for direct AI agent integration

Both modes provide identical functionality with different integration patterns.

Features

Core Discovery Features

Automated Network Mapping: Discover entire network topology from a single seed device
Intelligent Multi-Vendor Support: Automatically detects device OS after login (Cisco, Juniper, Arista, Palo Alto, Fortinet, Huawei)
Enhanced Device Fingerprinting: Improved vendor identification with support for Arista EOS, Juniper JUNOS, and Palo Alto PAN-OS
Parallel Operations: High-performance scanning and configuration collection
Configuration Management: Securely collect and store device configurations with vendor-specific commands
Interactive Visualizations: Generate interactive HTML network topology maps with color-coded device status
Batfish Integration: Advanced network analysis and validation

Reliability Features

Credential Validation: Test credentials before starting expensive operations (can save 30 minutes or more when the credentials are wrong)
Job Resume: Resume failed jobs without re-doing completed work (critical for large networks)
Retry Logic: Automatic retry with exponential backoff for transient failures
Timeout Handling: Configurable timeouts prevent hanging operations
Graceful Shutdown: Clean shutdown handling for containerized environments
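As a rough illustration of the retry behavior described above, here is a minimal sketch of retrying with exponential backoff. The function name, default delays, and the choice of retryable exceptions are assumptions made for the example; the service's actual settings may differ.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=4, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    Hypothetical helper; the defaults are illustrative, not the service's.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # The delay doubles on each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # A little random jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.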

Monitoring and Observability

System Health Checks: Monitor system resources before starting scans
Job Statistics: Detailed progress tracking and success metrics
Failure Analysis: Intelligent recommendations for troubleshooting
Historical Tracking: Track job history and success rates over time

Intelligent Features

Automatic OS Detection: Detects device operating system after SSH login (eliminates need for platform parameter)
Vendor-Specific Commands: Automatically selects correct commands for each vendor
Fingerprint Correction: Corrects misidentifications by validating OS post-authentication
Fallback Mechanisms: Uses fingerprint data if OS detection fails

Security Features

Secure Credential Handling: Passwords masked in logs and representations
Atomic File Operations: Prevent data corruption during writes
Container Isolation: Runs in isolated container environments
Optional HTTPS: Support for encrypted MCP communications
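For readers unfamiliar with atomic file operations, the sketch below shows the usual pattern: write to a temporary file in the same directory, then rename it over the target. `atomic_write_json` is a hypothetical name, and this is a generic illustration rather than the service's own code.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Write JSON so that readers never observe a partially written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomically swap the new file into place
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process dies mid-write, the previous version of the file is still intact because only the temporary file was touched.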

Getting Started

Prerequisites

Docker and Docker Compose installed
Network access to devices you want to discover
Valid credentials for at least one "seed" device

Quick Start

Clone the repository:

git clone https://github.com/username/network-discovery-mcp.git
cd network-discovery-mcp

Choose your deployment mode:

For REST API Mode:

docker compose up -d

For MCP Mode (AI Agents):

docker compose -f docker-compose.mcp.yml up -d

Verify the service is running:

REST API:

curl http://localhost:8000/health

MCP:

curl http://localhost:8080/mcp

Start a discovery (REST API example):

curl -X POST http://localhost:8000/v1/seed \
  -H "Content-Type: application/json" \
  -d '{
    "seed_host": "192.168.1.1",
    "credentials": {
      "username": "admin",
      "password": "your_password"
    },
    "methods": ["interfaces", "routing", "arp", "cdp"]
  }'

Note: The platform parameter (such as cisco_ios) is optional. The system detects the device OS after login and selects the appropriate commands. You can still provide platform as a fallback.

The response includes a job_id that you can use to track progress and retrieve results.
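If you prefer Python to curl, the same request can be built with the standard library. This is a sketch under assumptions: the helper names `build_seed_request` and `send` are invented for the example, and placing platform inside credentials follows the other examples in this guide.

```python
import json
import urllib.request


def build_seed_request(base_url, seed_host, username, password,
                       methods=("interfaces", "routing", "arp", "cdp"),
                       platform=None):
    """Build (but do not send) a POST /v1/seed request. Hypothetical helper."""
    credentials = {"username": username, "password": password}
    if platform:
        # Optional fallback; left out by default so the OS is auto-detected.
        credentials["platform"] = platform
    payload = {
        "seed_host": seed_host,
        "credentials": credentials,
        "methods": list(methods),
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/seed",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the service running, `send(build_seed_request("http://localhost:8000", "192.168.1.1", "admin", "your_password"))["job_id"]` would yield the job ID to use in later steps.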

Deployment Modes

REST API Mode (Default)

REST API mode provides traditional HTTP endpoints for programmatic access.

Use this mode when:

Integrating with existing tools and scripts
Running from CI/CD pipelines
Building custom applications
You need language-agnostic HTTP access

Starting REST API mode:

docker compose up -d

Accessing the API:

API Base URL: http://localhost:8000
API Documentation: http://localhost:8000/docs
Health Check: http://localhost:8000/health

Configuration file: docker-compose.yml

MCP Mode (AI Agent Integration)

MCP mode provides a Model Context Protocol interface for AI agent integration.

Use this mode when:

Integrating with AI agents (Claude, GPT, etc.)
Building autonomous network discovery workflows
You want AI-driven network operations

Starting MCP mode:

docker compose -f docker-compose.mcp.yml up -d

Accessing the MCP server:

MCP Endpoint: http://localhost:8080/mcp
Health Check: http://localhost:8080/health

Configuration file: docker-compose.mcp.yml

MCP with HTTPS

Some AI agent frameworks (like certain Claude implementations or enterprise AI platforms) require HTTPS connections and will not accept HTTP. If your AI agent requires HTTPS, you need to configure SSL certificates.

When to use HTTPS:

Your AI agent framework refuses HTTP connections
Your AI agent is running on a different network and requires encryption
Corporate security policies mandate encrypted communications
You're exposing the MCP server to the internet

Prerequisites:

SSL certificate file (fullchain.pem or certificate.crt)
SSL private key file (privkey.pem or private.key)

Step 1: Prepare Your Certificates

Option A - Using Let's Encrypt certificates:

# If using Let's Encrypt/Certbot, certificates are typically at:
# Certificate: /etc/letsencrypt/live/yourdomain.com/fullchain.pem
# Private key: /etc/letsencrypt/live/yourdomain.com/privkey.pem

Option B - Using self-signed certificates (for testing):

# Generate self-signed certificate
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout privkey.pem \
  -out fullchain.pem \
  -subj "/CN=localhost"

Option C - Using corporate certificates:

# Use certificates provided by your organization
# Ensure you have both the certificate and private key files
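Whichever option you choose, it can help to confirm that the certificate and private key actually belong together before mounting them. The sketch below uses Python's standard `ssl` module for that; `check_cert_pair` is a hypothetical helper name, and the check runs on the host, not inside the container.

```python
import ssl


def check_cert_pair(cert_path, key_path):
    """Return True if the certificate and private key load together."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        # load_cert_chain fails if a file is missing, unreadable,
        # or if the private key does not match the certificate.
        context.load_cert_chain(certfile=cert_path, keyfile=key_path)
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        print(f"Certificate check failed: {exc}")
        return False
    return True
```

For example, `check_cert_pair("fullchain.pem", "privkey.pem")` should return True for the self-signed pair generated in Option B.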

Step 2: Modify docker-compose.mcp.yml

Edit the docker-compose.mcp.yml file to mount your certificates:

services:
  network-discovery-mcp:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: network-discovery-mcp
    environment:
      - ENABLE_MCP=true
      - TRANSPORT=https                    # Set transport to https
      - PORT=8080
      - HOST=0.0.0.0
      - ARTIFACT_DIR=/artifacts
      - BATFISH_HOST=batfish
      - LOG_LEVEL=info
    ports:
      - "8080:8080"                         # HTTP port (optional, for health checks)
      - "443:443"                           # HTTPS port (required)
    volumes:
      - ./artifacts:/artifacts
      # Mount your SSL certificates (REQUIRED for HTTPS):
      - /path/to/your/fullchain.pem:/certs/fullchain.pem:ro
      - /path/to/your/privkey.pem:/certs/privkey.pem:ro
    depends_on:
      - batfish
    networks:
      - discovery-network

  batfish:
    image: batfish/batfish:latest
    container_name: batfish
    ports:
      - "9996:9996"
      - "9997:9997"
    networks:
      - discovery-network

networks:
  discovery-network:
    driver: bridge

Important changes:

Change TRANSPORT from http to https
Add port mapping for 443:443
Add (or uncomment) the certificate volume mounts
Replace /path/to/your/ with the actual path to your certificates

Step 3: Start the Service

# Start with HTTPS enabled
docker compose -f docker-compose.mcp.yml up -d

# Check logs to verify HTTPS is working
docker compose -f docker-compose.mcp.yml logs network-discovery-mcp

# You should see logs indicating nginx started with HTTPS on port 443

Step 4: Verify HTTPS is Working

# Test HTTPS endpoint
curl https://localhost/mcp

# If using self-signed certificates, use -k to skip verification
curl -k https://localhost/mcp

# You should get a response from the MCP server

Step 5: Configure Your AI Agent

Update your AI agent configuration to use the HTTPS endpoint:

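# Example: Connecting AI agent to HTTPS MCP server
import ssl

import mcp

# With valid certificates
client = mcp.Client("https://your-server.com/mcp")

# With self-signed certificates (development only): disable verification
# through the public ssl API rather than the private _create_unverified_context()
context = ssl.create_default_context()
context.check_hostname = False
context.verify_mode = ssl.CERT_NONE
client = mcp.Client("https://localhost/mcp", ssl_context=context)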

Troubleshooting HTTPS

Problem: "Certificate verification failed"

If using self-signed certificates, your AI agent may reject them. Solutions:

For testing, disable certificate verification in your agent (not recommended for production)
Add the self-signed certificate to your system's trusted certificates
Use properly signed certificates from Let's Encrypt or a CA

Problem: "Connection refused on port 443"

Check that:

# 1. Container is running
docker compose -f docker-compose.mcp.yml ps

# 2. Port 443 is mapped
docker compose -f docker-compose.mcp.yml port network-discovery-mcp 443

# 3. Nginx is running with HTTPS
docker compose -f docker-compose.mcp.yml logs network-discovery-mcp | grep nginx

# 4. Certificates are mounted correctly
docker exec network-discovery-mcp ls -la /certs/

Problem: "Cannot find certificates"

The container looks for certificates at:

/certs/fullchain.pem
/certs/privkey.pem

Verify they're mounted:

docker exec network-discovery-mcp ls -la /certs/
# Should show both files

Example: Complete HTTPS Setup with Let's Encrypt

# 1. Obtain Let's Encrypt certificates (on host machine)
sudo certbot certonly --standalone -d your-domain.com

# 2. Update docker-compose.mcp.yml with certificate paths
# Edit the volumes section:
volumes:
  - ./artifacts:/artifacts
  - /etc/letsencrypt/live/your-domain.com/fullchain.pem:/certs/fullchain.pem:ro
  - /etc/letsencrypt/live/your-domain.com/privkey.pem:/certs/privkey.pem:ro

# 3. Ensure TRANSPORT is set to https
# environment:
#   - TRANSPORT=https

# 4. Start the service
docker compose -f docker-compose.mcp.yml up -d

# 5. Test HTTPS connection
curl https://your-domain.com/mcp

# 6. Configure AI agent to use HTTPS endpoint
# In your AI agent config:
# mcp_server_url: https://your-domain.com/mcp

HTTP vs HTTPS Decision Matrix

Scenario	Use HTTP	Use HTTPS
Testing locally with AI agent on same machine	Yes	No
AI agent framework requires HTTPS	No	Yes
Exposing MCP server over network	No	Yes
Corporate security policy	No	Yes
AI agent on different network	No	Yes
Development/testing environment	Yes	Optional
Production environment	No	Yes

Summary:

HTTP (port 8080) is simpler for local testing
HTTPS (port 443) is required for AI agent frameworks that don't accept HTTP
The service automatically detects mounted certificates and enables HTTPS
Both HTTP and HTTPS can run simultaneously (useful for health checks on port 8080)

Testing Your MCP Server

You can test the MCP server using standard HTTP tools:

# Test HTTP MCP server
curl http://localhost:8080/mcp

# Test HTTPS MCP server
curl https://localhost/mcp --insecure

# Check available tools (pretty print)
curl http://localhost:8080/mcp | jq '.tools'

# Verify specific tool availability
curl http://localhost:8080/mcp | jq '.tools[] | select(.name=="run_network_discovery")'

You can also use the MCP Inspector tool from the official MCP SDK if you have it installed locally, or integrate directly with AI agent frameworks like Claude Desktop, Cline, or other MCP-compatible clients.
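The jq checks above can also be done in Python, for example at the start of an automation script. The sketch assumes, as the jq filters do, that the endpoint returns JSON with a `tools` list whose entries have a `name` field; `missing_tools` is a hypothetical helper.

```python
def missing_tools(listing, required):
    """Return the required tool names that the server listing does not offer."""
    available = {tool.get("name") for tool in listing.get("tools", [])}
    # Keep the caller's order so the report is easy to read.
    return [name for name in required if name not in available]
```

A script might fetch the listing once and stop early if `missing_tools(listing, ["seed_device", "scan_targets", "fingerprint_devices", "collect_device_configs"])` returns anything.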

Single Container Deployment

For advanced users who want more control or don't need Batfish:

REST API Mode:

docker run -d \
  -p 8000:8000 \
  -e ARTIFACT_DIR=/data \
  -v /path/to/artifacts:/data \
  ghcr.io/username/network-discovery-mcp:latest

MCP Mode (HTTP):

docker run -d \
  -p 8080:8080 \
  -e ENABLE_MCP=true \
  -e TRANSPORT=http \
  -e ARTIFACT_DIR=/data \
  -v /path/to/artifacts:/data \
  ghcr.io/username/network-discovery-mcp:latest

MCP Mode (HTTPS):

docker run -d \
  -p 443:443 -p 8080:8080 \
  -e ENABLE_MCP=true \
  -e TRANSPORT=https \
  -e ARTIFACT_DIR=/data \
  -v /path/to/artifacts:/data \
  -v /etc/ssl/certs/fullchain.pem:/certs/fullchain.pem:ro \
  -v /etc/ssl/private/privkey.pem:/certs/privkey.pem:ro \
  ghcr.io/username/network-discovery-mcp:latest

Note: Running without the Batfish container disables topology visualization and analysis features.

Using the REST API

The REST API provides programmatic access to all network discovery functionality.

Complete Discovery Workflow

Here's a complete example of discovering a network using the REST API:

Step 1: Validate Credentials (Recommended)

Test credentials before starting the discovery to avoid wasting time:

curl -X POST http://localhost:8000/v1/credentials/validate \
  -H "Content-Type: application/json" \
  -d '{
    "seed_host": "192.168.1.1",
    "username": "admin",
    "password": "cisco123"
  }'

Note: The platform parameter is optional. The system will automatically detect the OS after connecting.

Response:

{
  "valid": true,
  "latency_ms": 2340,
  "detected_vendor": "Cisco",
  "detected_model": "IOS-XE",
  "can_read_config": true
}

If the credentials are invalid, the response contains an error with suggestions, so you find out before spending time on discovery.
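A script can turn this response into a go/no-go decision. The sketch below reads the fields shown in the example response; the `error` field name follows the MCP example later in this guide, while `validation_problems` and the latency threshold are assumptions for illustration.

```python
def validation_problems(response, max_latency_ms=5000):
    """List reasons not to start discovery; an empty list means go ahead."""
    problems = []
    if not response.get("valid", False):
        problems.append(response.get("error", "credentials were rejected"))
    elif not response.get("can_read_config", False):
        # Login works, but config collection would fail later.
        problems.append("account cannot read the device configuration")
    if response.get("latency_ms", 0) > max_latency_ms:
        problems.append("device responds slowly; consider longer timeouts")
    return problems
```

For the example response above, the function returns an empty list, so discovery can proceed.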

Step 2: Seed from a Device

Start discovery from a seed device:

curl -X POST http://localhost:8000/v1/seed \
  -H "Content-Type: application/json" \
  -d '{
    "seed_host": "192.168.1.1",
    "credentials": {
      "username": "admin",
      "password": "cisco123"
    },
    "methods": ["interfaces", "routing", "arp", "cdp"]
  }'

Note: The platform is detected automatically. You can optionally provide "platform": "cisco_ios" as a fallback in case detection fails.

Response:

{
  "job_id": "net-disc-20251102-123456",
  "status": "completed",
  "targets_count": 256,
  "targets_path": "/artifacts/net-disc-20251102-123456/targets.json"
}

Save the job_id for subsequent operations.

Step 3: Scan Discovered Targets

Scan the discovered targets for reachability:

curl -X POST http://localhost:8000/v1/scan \
  -H "Content-Type: application/json" \
  -d '{
    "job_id": "net-disc-20251102-123456",
    "ports": [22, 443],
    "concurrency": 200
  }'

Response:

{
  "job_id": "net-disc-20251102-123456",
  "status": "completed",
  "hosts_scanned": 256,
  "hosts_reachable": 42
}

Step 4: Fingerprint Devices

Identify vendors and models:

curl -X POST http://localhost:8000/v1/fingerprint \
  -H "Content-Type: application/json" \
  -d '{
    "job_id": "net-disc-20251102-123456"
  }'

Response:

{
  "job_id": "net-disc-20251102-123456",
  "status": "completed",
  "hosts_fingerprinted": 42,
  "identified_count": 38
}

Step 5: Collect Configurations

Collect device configurations (platform auto-detected):

curl -X POST http://localhost:8000/v1/state/collect \
  -H "Content-Type: application/json" \
  -d '{
    "job_id": "net-disc-20251102-123456",
    "credentials": {
      "username": "admin",
      "password": "cisco123"
    },
    "concurrency": 50
  }'

Note: The system automatically detects each device's OS after login and runs the appropriate commands:

Cisco: show running-config
Arista: show running-config
Juniper: show configuration | display set
Palo Alto: show config running
Fortinet: show full-configuration
Huawei: display current-configuration

Response:

{
  "job_id": "net-disc-20251102-123456",
  "status": "completed",
  "device_count": 42,
  "success_count": 38,
  "failed_count": 4
}
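The vendor-to-command mapping listed in the note above can be pictured as a simple lookup with a fallback to fingerprint data. This is only a sketch: the dictionary keys, the `config_command` name, and the fallback order are assumptions, and only the commands themselves come from this guide.

```python
# Commands as listed in this guide; the lowercase keys are an assumption.
CONFIG_COMMANDS = {
    "cisco": "show running-config",
    "arista": "show running-config",
    "juniper": "show configuration | display set",
    "paloalto": "show config running",
    "fortinet": "show full-configuration",
    "huawei": "display current-configuration",
}


def config_command(detected_vendor, fingerprint_vendor=None):
    """Pick the config command, preferring post-login detection."""
    for candidate in (detected_vendor, fingerprint_vendor):
        if not candidate:
            continue
        # Normalise "Palo Alto" -> "paloalto", "Cisco" -> "cisco", and so on.
        key = candidate.strip().lower().replace(" ", "")
        if key in CONFIG_COMMANDS:
            return CONFIG_COMMANDS[key]
    raise LookupError("no known configuration command for this device")
```

Devices that match neither source raise an error, which corresponds to the failed_count entries in the response above.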

Step 6: Generate Topology Visualization

Build Batfish snapshot and generate visualization:

# Build Batfish snapshot
curl -X POST http://localhost:8000/v1/batfish/build \
  -H "Content-Type: application/json" \
  -d '{
    "job_id": "net-disc-20251102-123456"
  }'

# Load into Batfish
curl -X POST http://localhost:8000/v1/batfish/load \
  -H "Content-Type: application/json" \
  -d '{
    "job_id": "net-disc-20251102-123456"
  }'

# Generate visualization
curl -X GET "http://localhost:8000/v1/batfish/topology/html?job_id=net-disc-20251102-123456" \
  -o network_topology.html

# Open in browser (macOS; on Linux use xdg-open instead)
open network_topology.html

Step 7: Check Job Status

Monitor progress at any time:

curl http://localhost:8000/v1/status/net-disc-20251102-123456

Response:

{
  "job_id": "net-disc-20251102-123456",
  "seeder": {
    "status": "completed",
    "targets_count": 256,
    "completed_at": "2025-11-02T12:34:56Z"
  },
  "scanner": {
    "status": "completed",
    "hosts_scanned": 256,
    "hosts_reachable": 42,
    "completed_at": "2025-11-02T12:38:45Z"
  },
  "fingerprinter": {
    "status": "completed",
    "hosts_fingerprinted": 42,
    "completed_at": "2025-11-02T12:40:22Z"
  },
  "state_collector": {
    "status": "completed",
    "device_count": 42,
    "success_count": 38,
    "failed_count": 4,
    "completed_at": "2025-11-02T12:55:33Z"
  }
}
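The completed_at timestamps in a status response like the one above can be used to estimate how long each phase took. The sketch assumes the phases run back to back in the order shown, so each duration is measured from the previous phase's completion; `phase_durations` is a hypothetical helper.

```python
from datetime import datetime

# Phase names as they appear in the status response.
PHASES = ["seeder", "scanner", "fingerprinter", "state_collector"]


def parse_timestamp(value):
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also accepts it before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def phase_durations(status):
    """Seconds between consecutive phase completion times."""
    durations = {}
    previous = None
    for phase in PHASES:
        completed_at = status.get(phase, {}).get("completed_at")
        if completed_at is None:
            break  # this phase (and everything after it) has not finished
        current = parse_timestamp(completed_at)
        if previous is not None:
            durations[phase] = (current - previous).total_seconds()
        previous = current
    return durations
```

The seeder has no entry because the response does not include a start time for it.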

Handling Failures: Job Resume

If a job fails partway through (e.g., during config collection), you can resume it without re-doing completed work:

# Check for resumable jobs
curl http://localhost:8000/v1/jobs/resumable

# Resume a specific job
curl -X POST http://localhost:8000/v1/jobs/resume \
  -H "Content-Type: application/json" \
  -d '{
    "job_id": "net-disc-20251102-123456",
    "credentials": {
      "username": "admin",
      "password": "cisco123",
      "platform": "cisco_ios"
    }
  }'

The resume operation:

Detects which phases completed successfully
Skips completed work
Retries only failed or incomplete phases
Preserves all successful results

This is critical for large networks where a transient failure shouldn't require restarting the entire discovery process.
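To illustrate the idea behind resuming, the sketch below works out which phases still need to run from a status response. It assumes the phase names from the status example and that later phases depend on earlier ones; `phases_to_rerun` is hypothetical and not the service's resume logic.

```python
# Phase order as shown in the job status response.
PHASE_ORDER = ["seeder", "scanner", "fingerprinter", "state_collector"]


def phases_to_rerun(status):
    """Return the phases that still need to run, in execution order."""
    for index, phase in enumerate(PHASE_ORDER):
        if status.get(phase, {}).get("status") != "completed":
            # Each phase consumes the previous phase's output, so everything
            # from the first unfinished phase onward has to run.
            return PHASE_ORDER[index:]
    return []  # every phase completed; nothing to resume
```

For a job that failed during fingerprinting, this returns the fingerprinter and state_collector phases and leaves the seeding and scanning results untouched.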

Alternative: Scan from Subnets Directly

If you don't need seed-based discovery and already have a list of IP addresses or subnets (from NetBox, IPAM, spreadsheet, etc.), you can scan them directly:

curl -X POST http://localhost:8000/v1/scan/from-subnets \
  -H "Content-Type: application/json" \
  -d '{
    "job_id": "manual-scan-001",
    "subnets": ["192.168.1.0/24", "10.0.0.0/24"],
    "ports": [22, 443],
    "concurrency": 200
  }'

Then continue with fingerprinting and config collection as normal.
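Because the subnet list usually comes from another system, it can be worth validating it before submitting the scan. The sketch below uses Python's standard `ipaddress` module; the helper names are hypothetical, and keeping single hosts as bare IPs mirrors the examples in this guide rather than a documented API requirement.

```python
import ipaddress


def normalize_targets(entries):
    """Validate IPs and subnets, returning de-duplicated canonical strings."""
    normalized = []
    for entry in entries:
        # strict=False maps a host address such as 192.168.1.10/24 to its
        # enclosing network; malformed input raises ValueError.
        network = ipaddress.ip_network(entry.strip(), strict=False)
        if network.prefixlen == network.max_prefixlen:
            text = str(network.network_address)  # single host: keep a bare IP
        else:
            text = str(network)
        if text not in normalized:
            normalized.append(text)
    return normalized


def address_count(targets):
    """Total number of addresses the scan would cover."""
    return sum(ipaddress.ip_network(t).num_addresses for t in targets)
```

Checking `address_count` first gives a quick sense of scan size before choosing a concurrency value.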

Working with Source of Truth Systems

If you maintain an inventory in a source of truth system (NetBox, phpIPAM, etc.), you can extract the IP addresses and scan them directly without seed device discovery.

Example: Using NetBox as Source of Truth

# 1. Query NetBox for device IPs (adjust the URL and token; only the first page of results is read)
DEVICE_IPS=$(curl -s -H "Authorization: Token YOUR_TOKEN" \
  https://netbox.example.com/api/dcim/devices/ | \
  jq -r '.results[] | select(.primary_ip != null) | .primary_ip.address' | \
  cut -d'/' -f1)

# 2. Convert to JSON array
SUBNETS=$(echo "$DEVICE_IPS" | jq -R -s -c 'split("\n")[:-1]')

# 3. Start scan with those IPs
curl -X POST http://localhost:8000/v1/scan/from-subnets \
  -H "Content-Type: application/json" \
  -d "{
    \"job_id\": \"netbox-scan\",
    \"subnets\": $SUBNETS,
    \"ports\": [22, 443],
    \"concurrency\": 200
  }"

# 4. Continue with fingerprinting
curl -X POST http://localhost:8000/v1/fingerprint \
  -H "Content-Type: application/json" \
  -d '{"job_id": "netbox-scan"}'

# 5. Collect configs
curl -X POST http://localhost:8000/v1/state/collect \
  -H "Content-Type: application/json" \
  -d '{
    "job_id": "netbox-scan",
    "credentials": {
      "username": "admin",
      "password": "your_password",
      "platform": "cisco_ios"
    }
  }'

# 6. Generate topology
curl -X POST http://localhost:8000/v1/batfish/build \
  -H "Content-Type: application/json" \
  -d '{"job_id": "netbox-scan"}'

curl -X POST http://localhost:8000/v1/batfish/load \
  -H "Content-Type: application/json" \
  -d '{"job_id": "netbox-scan"}'

curl -X GET "http://localhost:8000/v1/batfish/topology/html?job_id=netbox-scan" \
  -o topology.html

Example: Using a CSV File with IP Addresses

# 1. Convert CSV to subnet list
# Assuming devices.csv has a header row and "ip_address" as its first column
SUBNETS=$(tail -n +2 devices.csv | cut -d',' -f1 | jq -R -s -c 'split("\n")[:-1]')

# 2. Start scan
curl -X POST http://localhost:8000/v1/scan/from-subnets \
  -H "Content-Type: application/json" \
  -d "{
    \"job_id\": \"csv-scan\",
    \"subnets\": $SUBNETS,
    \"ports\": [22, 443]
  }"
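The shell version depends on the position of the column. If the column order in your CSV can vary, reading it by header name is more robust. A minimal sketch in Python, assuming the same "ip_address" header (`ips_from_csv` is a hypothetical helper):

```python
import csv


def ips_from_csv(lines, column="ip_address"):
    """Extract non-empty values of one column from CSV lines with a header row.

    `lines` can be an open file or any iterable of text lines.
    """
    reader = csv.DictReader(lines)
    values = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:  # skip rows where the address is missing
            values.append(value)
    return values
```

With a file on disk, `ips_from_csv(open("devices.csv", newline=""))` produces the list to pass as "subnets".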

AI Agent Example with Source of Truth

import mcp
import requests

# Connect to MCP server
client = mcp.Client("http://localhost:8080/mcp")

# 1. Retrieve device IPs from NetBox
netbox_response = requests.get(
    "https://netbox.example.com/api/dcim/devices/",
    headers={"Authorization": "Token YOUR_TOKEN"}
)
device_ips = [d["primary_ip"]["address"].split("/")[0] 
              for d in netbox_response.json()["results"] 
              if d.get("primary_ip")]

# 2. Scan the devices from NetBox
result = client.call_tool("scan_from_subnets", {
    "job_id": "netbox-discovery",
    "subnets": device_ips,
    "ports": [22, 443]
})

# 3. Continue with fingerprinting and config collection
client.call_tool("fingerprint_devices", {"job_id": "netbox-discovery"})
client.call_tool("collect_device_configs", {
    "job_id": "netbox-discovery",
    "credentials": {
        "username": "admin",
        "password": "password",
        "platform": "cisco_ios"
    }
})

# 4. Generate topology
client.call_tool("build_batfish_snapshot", {"job_id": "netbox-discovery"})
client.call_tool("load_batfish_snapshot", {"job_id": "netbox-discovery"})
topology = client.call_tool("generate_topology_visualization", {"job_id": "netbox-discovery"})

print(f"Topology visualization saved to: {topology['path']}")

This approach is ideal when:

You already maintain device inventory in another system
You want to validate your source of truth data
You need to discover only specific devices, not the entire network
You want to avoid CDP/LLDP-based discovery

Using with AI Agents (MCP)

The MCP interface provides tools that AI agents can use to discover and analyze networks autonomously.

MCP Workflow Example

Here's how an AI agent would interact with the MCP server:

import mcp

# Connect to MCP server
client = mcp.Client("http://localhost:8080/mcp")

# Step 1: Validate credentials first
validation = client.call_tool("validate_device_credentials", {
    "seed_host": "192.168.1.1",
    "username": "admin",
    "password": "cisco123",
    "platform": "cisco_ios"
})

if not validation["valid"]:
    print(f"Invalid credentials: {validation['error']}")
    print(f"Suggestion: {validation['suggestion']}")
    exit(1)

print("Credentials validated successfully!")

# Step 2: Check system health
health = client.call_tool("check_system_health", {})
if not health["ready_for_scan"]:
    print(f"System not ready: {health['issues']}")
    exit(1)

# Step 3: Seed from device
result = client.call_tool("seed_device", {
    "seed_host": "192.168.1.1",
    "credentials": {
        "username": "admin",
        "password": "cisco123",
        "platform": "cisco_ios"
    },
    "methods": ["interfaces", "routing", "arp", "cdp"]
})
job_id = result["job_id"]

# Step 4: Scan targets
client.call_tool("scan_targets", {
    "job_id": job_id,
    "ports": [22, 443]
})

# Step 5: Get job statistics
stats = client.call_tool("get_job_stats", {
    "job_id": job_id
})
print(f"Found {stats['results']['scanning']['reachable_hosts']} devices")

# Step 6: Fingerprint devices
client.call_tool("fingerprint_devices", {
    "job_id": job_id
})

# Step 7: Collect configs
result = client.call_tool("collect_device_configs", {
    "job_id": job_id,
    "credentials": {
        "username": "admin",
        "password": "cisco123",
        "platform": "cisco_ios"
    }
})

# Step 8: Generate topology
client.call_tool("build_batfish_snapshot", {"job_id": job_id})
client.call_tool("load_batfish_snapshot", {"job_id": job_id})
viz = client.call_tool("generate_topology_visualization", {"job_id": job_id})

print(f"Topology saved to: {viz['path']}")

# Step 9: Get final statistics
final_stats = client.call_tool("get_job_stats", {"job_id": job_id})
print("Discovery complete!")
print(f"- Devices discovered: {final_stats['results']['scanning']['reachable_hosts']}")
print(f"- Vendors identified: {final_stats['results']['vendors']}")
print(f"- Configs collected: {final_stats['results']['config_collection']['configs_collected']}")

AI Agent Best Practices

1. Always Validate Credentials First

Before starting expensive discovery operations, validate credentials:

# GOOD: Validate first (10 seconds)
validation = client.call_tool("validate_device_credentials", {...})
if validation["valid"]:
    # Proceed with discovery
    client.call_tool("seed_device", {...})

# BAD: Skip validation, waste 30 minutes on wrong credentials
client.call_tool("seed_device", {...})  # Fails after 30 min

2. Check System Health Before Large Scans

import time

health = client.call_tool("check_system_health", {})
if health["cpu"]["usage_percent"] > 80:
    print("System busy, waiting...")
    time.sleep(60)
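A single wait may not be enough on a busy host. The sketch below polls until CPU usage drops or a check limit is reached. It takes the health check as a callable so it does not depend on a particular client; `wait_until_idle` and its defaults are assumptions, while the `cpu.usage_percent` field follows the snippet above.

```python
import time


def wait_until_idle(get_health, cpu_limit=80, poll_seconds=60,
                    max_checks=10, sleep=time.sleep):
    """Poll get_health() until CPU usage is at or below cpu_limit.

    Returns True when the system is idle enough, False if it never was.
    """
    for _ in range(max_checks):
        health = get_health()
        if health["cpu"]["usage_percent"] <= cpu_limit:
            return True
        sleep(poll_seconds)  # system busy: wait before checking again
    return False
```

With the client from the examples above, the call would look like `wait_until_idle(lambda: client.call_tool("check_system_health", {}))`.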

3. Resume Failed Jobs

# Check for failed jobs
resumable = client.call_tool("list_resumable_jobs", {})

if resumable["count"] > 0:
    # Offer to resume
    client.call_tool("resume_failed_job", {
        "job_id": resumable["resumable_jobs"][0]["job_id"],
        "username": "admin",
        "password": "cisco123"
    })

4. Use Job Statistics for Reporting

stats = client.call_tool("get_job_stats", {"job_id": job_id})

# Generate user-friendly report
print(f"""
Network Discovery Complete:
- Devices discovered: {stats['results']['scanning']['reachable_hosts']}
- Identification rate: {stats['results']['fingerprinting']['identification_rate'] * 100:.1f}%
- Vendor breakdown:
  - Cisco: {stats['results']['vendors'].get('cisco', 0)}
  - Juniper: {stats['results']['vendors'].get('juniper', 0)}
  - Arista: {stats['results']['vendors'].get('arista', 0)}
- Configs collected: {stats['results']['config_collection']['configs_collected']}
""")

GitHub Actions Integration

The repository includes workflows for running network discovery from GitHub Actions.

Setup

Add your device credentials as GitHub Secrets:
- Go to Settings > Secrets and variables > Actions
- Add DEVICE_USERNAME
- Add DEVICE_PASSWORD

Add the workflow by creating .github/workflows/discover-network.yml:

name: Discover Network

on:
  workflow_dispatch:
    inputs:
      seed_host:
        description: "Seed device IP or hostname"
        required: true
        default: "192.168.1.1"
      platform:
        description: "Device platform"
        required: true
        default: "cisco_ios"
        type: choice
        options:
          - cisco_ios
          - cisco_nxos
          - juniper_junos
          - arista_eos

jobs:
  discover:
    runs-on: ubuntu-latest
    
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      
      - name: Start containers
        run: docker compose up -d
      
      - name: Wait for service
        run: |
          timeout 60 bash -c 'until curl -sf http://localhost:8000/health; do sleep 2; done'
      
      - name: Validate credentials
        run: |
          curl -X POST http://localhost:8000/v1/credentials/validate \
            -H "Content-Type: application/json" \
            -d '{
              "seed_host": "${{ github.event.inputs.seed_host }}",
              "username": "${{ secrets.DEVICE_USERNAME }}",
              "password": "${{ secrets.DEVICE_PASSWORD }}",
              "platform": "${{ github.event.inputs.platform }}"
            }'
      
      - name: Run discovery
        run: |
          JOB_ID=$(curl -X POST http://localhost:8000/v1/seed \
            -H "Content-Type: application/json" \
            -d '{
              "seed_host": "${{ github.event.inputs.seed_host }}",
              "credentials": {
                "username": "${{ secrets.DEVICE_USERNAME }}",
                "password": "${{ secrets.DEVICE_PASSWORD }}",
                "platform": "${{ github.event.inputs.platform }}"
              },
              "methods": ["interfaces", "routing", "arp", "cdp"]
            }' | jq -r '.job_id')
          
          echo "JOB_ID=$JOB_ID" >> $GITHUB_ENV
          
          # Scan
          curl -X POST http://localhost:8000/v1/scan \
            -H "Content-Type: application/json" \
            -d "{\"job_id\": \"$JOB_ID\", \"ports\": [22, 443]}"
          
          # Fingerprint
          curl -X POST http://localhost:8000/v1/fingerprint \
            -H "Content-Type: application/json" \
            -d "{\"job_id\": \"$JOB_ID\"}"
          
          # Collect configs
          curl -X POST http://localhost:8000/v1/state/collect \
            -H "Content-Type: application/json" \
            -d "{
              \"job_id\": \"$JOB_ID\",
              \"credentials\": {
                \"username\": \"${{ secrets.DEVICE_USERNAME }}\",
                \"password\": \"${{ secrets.DEVICE_PASSWORD }}\",
                \"platform\": \"${{ github.event.inputs.platform }}\"
              }
            }"
          
          # Generate topology
          curl -X POST http://localhost:8000/v1/batfish/build \
            -H "Content-Type: application/json" \
            -d "{\"job_id\": \"$JOB_ID\"}"
          
          curl -X POST http://localhost:8000/v1/batfish/load \
            -H "Content-Type: application/json" \
            -d "{\"job_id\": \"$JOB_ID\"}"
          
          curl -X GET "http://localhost:8000/v1/batfish/topology/html?job_id=$JOB_ID" \
            -o topology.html
      
      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: network-discovery-results
          path: |
            topology.html
            /tmp/network_discovery_artifacts/${{ env.JOB_ID }}/

Running the Workflow

Go to Actions tab in GitHub
Select "Discover Network"
Click "Run workflow"
Enter seed device IP and platform
View results in the workflow artifacts

API Reference

Core Discovery Endpoints

Method	Endpoint	Description
POST	`/v1/seed`	Start seeder from a device, creates targets.json
POST	`/v1/scan`	Scan targets from existing job
POST	`/v1/scan/from-subnets`	Scan specific subnets directly
POST	`/v1/scan/add-subnets`	Add subnets to existing job
POST	`/v1/fingerprint`	Fingerprint discovered devices
POST	`/v1/state/collect`	Collect device configurations
POST	`/v1/state/update/{hostname}`	Re-collect single device config

Credential Management Endpoints

Method	Endpoint	Description
POST	`/v1/credentials/validate`	Validate credentials before discovery
POST	`/v1/credentials/validate/batch`	Validate against multiple devices

Job Management Endpoints

Method	Endpoint	Description
GET	`/v1/status/{job_id}`	Get job status and progress
GET	`/v1/jobs/resumable`	List jobs that can be resumed
POST	`/v1/jobs/resume`	Resume a failed job

Batfish Endpoints

Method	Endpoint	Description
POST	`/v1/batfish/build`	Build Batfish snapshot from configs
POST	`/v1/batfish/load`	Load snapshot into Batfish
GET	`/v1/batfish/topology`	Get topology in JSON format
GET	`/v1/batfish/topology/html`	Generate interactive HTML visualization
GET	`/v1/batfish/networks`	List all networks
GET	`/v1/batfish/networks/{name}/snapshots`	List snapshots for network
POST	`/v1/batfish/networks/{name}/snapshot/{snapshot}`	Set current snapshot

Data Retrieval Endpoints

Method	Endpoint	Description
GET	`/v1/scan/{job_id}`	Get scan results
GET	`/v1/scan/{job_id}/reachable`	Get only reachable hosts
GET	`/v1/fingerprint/{job_id}`	Get fingerprinting results
GET	`/v1/state/{hostname}`	Get device configuration
GET	`/v1/artifacts/{job_id}/{filename}`	Get any artifact file

System Endpoints

Method	Endpoint	Description
GET	`/health`	Basic health check
GET	`/ready`	Readiness check with validation
GET	`/docs`	Interactive API documentation

MCP Tools Reference

Seeder Tools

seed_device

Start network discovery from a seed device
Parameters: seed_host, credentials, methods, job_id
Returns: job_id, status, targets_count

get_targets

Retrieve targets collected from seed device
Parameters: job_id
Returns: targets array with IPs and subnets

Scanner Tools

scan_targets

Scan targets for open management ports
Parameters: job_id, ports, concurrency
Returns: hosts_scanned, hosts_reachable

scan_from_subnets

Scan specific subnets directly (no seeding required)
Parameters: job_id, subnets, ports, concurrency
Returns: hosts_scanned, hosts_reachable

get_reachable_hosts

Get only reachable hosts from scan results
Parameters: job_id
Returns: reachable hosts array

Fingerprinter Tools

fingerprint_devices

Identify device vendors and models
Parameters: job_id, snmp_community (optional)
Returns: hosts_fingerprinted, identified_count

get_fingerprint_results

Get fingerprinting results
Parameters: job_id
Returns: detailed fingerprint data with vendors

Config Collector Tools

collect_device_configs

Collect device configurations in parallel
Parameters: job_id, credentials, concurrency
Returns: device_count, success_count, failed_count

get_device_config

Get configuration for specific device
Parameters: job_id, hostname
Returns: device configuration JSON

Batfish Tools

build_batfish_snapshot

Build Batfish snapshot from collected configs
Parameters: job_id
Returns: snapshot_dir, device_count

load_batfish_snapshot

Load snapshot into Batfish for analysis
Parameters: job_id
Returns: network_name, snapshot_name

get_topology

Get network topology in JSON format
Parameters: job_id OR network_name + snapshot_name
Returns: topology graph with nodes and edges

generate_topology_visualization

Generate interactive HTML visualization
Parameters: job_id OR network_name + snapshot_name
Returns: path to HTML file

Credential Validation Tools

validate_device_credentials

Quick credential test (5-15 seconds)
Parameters: seed_host, username, password, platform
Returns: valid, latency_ms, vendor, platform_correct, error, suggestion
Use this BEFORE starting expensive discovery operations

validate_credentials_multiple

Test credentials against multiple devices in parallel
Parameters: devices array, username, password, concurrency
Returns: total_devices, valid_count, invalid_count, success_rate, per-device results
Useful for testing credentials across different device types
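
To make the summary fields concrete, here is a rough sketch of how they could be derived from per-device results. The result shape (a list of dicts with a `valid` flag) and the choice to express `success_rate` as a percentage are assumptions; the tool's real output may differ.

```python
def summarize_validation(results):
    """Aggregate per-device results into the summary fields named above.

    `results` is assumed to be a list of dicts, each with a boolean "valid" key.
    """
    total = len(results)
    valid = sum(1 for r in results if r.get("valid"))
    return {
        "total_devices": total,
        "valid_count": valid,
        "invalid_count": total - valid,
        # Percentage is an assumption; the service might report a 0..1 fraction.
        "success_rate": round(100.0 * valid / total, 1) if total else 0.0,
        "results": results,
    }
```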

Job Resume Tools

resume_failed_job

Resume failed job without re-doing completed work
Parameters: job_id, phase (optional), credentials
Returns: resumed_from, phases_executed, summary
Critical for large networks - saves hours on partial failures

list_resumable_jobs

Get list of jobs that can be resumed
Parameters: none
Returns: resumable_jobs array with failed_phases and completed_phases
Useful for proactively offering to resume failed jobs

Monitoring Tools

check_system_health

Check system resources before starting scans
Parameters: none
Returns: CPU, memory, disk usage, ready_for_scan status
Use before large scans to ensure system capacity

get_job_stats

Get detailed statistics for a job
Parameters: job_id
Returns: module statuses, scan results, vendor breakdown, timing
Use for progress monitoring and reporting

get_recent_job_history

Analyze recent job executions
Parameters: hours (default: 24), limit (default: 50)
Returns: success_rate, failed_jobs, health assessment
Use for identifying trends and recurring issues

get_system_recommendations

Get intelligent recommendations and warnings
Parameters: none
Returns: warnings, recommendations, quick_actions
Use for proactive issue detection

Artifact Tools

get_artifact_content

Retrieve any artifact file from job directory
Parameters: job_id, filename
Returns: file content (text or base64-encoded)
Supports HTML, JSON, text, and binary files

Configuration

Environment Variables

Core Settings

ARTIFACT_DIR: Directory for job artifacts (default: /tmp/network_discovery_artifacts)
DEFAULT_PORTS: Comma-separated ports to scan (default: 22,443)
DEFAULT_CONCURRENCY: Parallel operation limit (default: 200)
CONNECT_TIMEOUT: Connection timeout in seconds (default: 1.5)
LOG_LEVEL: Logging verbosity (default: info)
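
As an illustration of how these variables and their defaults fit together, the sketch below reads them into a typed object. `CoreSettings` and `load_core_settings` are hypothetical names and do not necessarily match the project's own `config.py`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreSettings:
    artifact_dir: str
    default_ports: tuple
    default_concurrency: int
    connect_timeout: float
    log_level: str


def load_core_settings(environ=os.environ):
    """Read the core settings, falling back to the defaults listed above."""
    ports = environ.get("DEFAULT_PORTS", "22,443")
    return CoreSettings(
        artifact_dir=environ.get("ARTIFACT_DIR", "/tmp/network_discovery_artifacts"),
        # "22,443" becomes (22, 443); empty entries are skipped.
        default_ports=tuple(int(p) for p in ports.split(",") if p.strip()),
        default_concurrency=int(environ.get("DEFAULT_CONCURRENCY", "200")),
        connect_timeout=float(environ.get("CONNECT_TIMEOUT", "1.5")),
        log_level=environ.get("LOG_LEVEL", "info"),
    )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check in isolation.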

Batfish Settings

BATFISH_HOST: Batfish server hostname (default: batfish)
BATFISH_PORT: Batfish server port (default: 9996)

Server Settings

HOST: Server bind address (default: 0.0.0.0)
PORT: Server port (default: 8000 in REST API mode, 8080 in MCP mode)

MCP Settings

ENABLE_MCP: Enable MCP mode (default: false)
TRANSPORT: Transport type, either http or https (default: http)
BASE_PATH: URL base path for proxied deployments (default: "")

Docker Compose Configuration

The service comes with two Docker Compose files:

docker-compose.yml (REST API Mode)

services:
  network-discovery:
    environment:
      - ARTIFACT_DIR=/artifacts
      - DEFAULT_PORTS=22,443
      - DEFAULT_CONCURRENCY=200
      - BATFISH_HOST=batfish
    ports:
      - "8000:8000"
    volumes:
      - ./artifacts:/artifacts

docker-compose.mcp.yml (MCP Mode)

services:
  network-discovery-mcp:
    environment:
      - ENABLE_MCP=true
      - TRANSPORT=http
      - PORT=8080
      - ARTIFACT_DIR=/artifacts
      - BATFISH_HOST=batfish
    ports:
      - "8080:8080"
    volumes:
      - ./artifacts:/artifacts

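You can customize these files, or override values from the shell. Shell overrides only take effect for values the Compose file reads through ${VAR} substitution: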

# Override concurrency
DEFAULT_CONCURRENCY=400 docker compose up -d

# Use custom artifact directory
ARTIFACT_DIR=/var/network-discovery docker compose up -d

Architecture

System Components

Network Discovery MCP consists of six main modules:

Seeder: Connects to seed device and discovers network topology
- Collects interface information
- Retrieves routing tables
- Gathers ARP entries
- Discovers neighbors via CDP/LLDP
- Outputs: targets.json, device_states/{host}.json
Scanner: Probes discovered targets for reachability
- Parallel port scanning with configurable concurrency
- Tests management ports (SSH, HTTPS, etc.)
- Outputs: ip_scan.json, reachable.json
Fingerprinter: Identifies device vendors and models
- Banner analysis
- SNMP probing (optional)
- Pattern matching
- Outputs: fingerprints.json
Config Collector: Retrieves device configurations
- Multi-vendor support (Cisco, Juniper, Arista, etc.)
- Parallel collection with retry logic
- Credential validation
- Outputs: state/{hostname}.json
Batfish Loader: Prepares network for analysis
- Converts configs to Batfish format
- Builds network snapshots
- Loads into Batfish for analysis
- Outputs: batfish_snapshot/configs/{hostname}.cfg
Topology Visualizer: Generates interactive visualizations
- Queries Batfish for topology data
- Generates D3.js force-directed graphs
- Outputs: topology.html, topology.json
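
The Scanner's "parallel port scanning with configurable concurrency" can be pictured as a bounded set of TCP connection attempts. The following is a minimal asyncio sketch of that technique, not the project's `scanner.py`; the function names and the `probe_fn` parameter are hypothetical, and the defaults mirror `DEFAULT_PORTS`, `DEFAULT_CONCURRENCY`, and `CONNECT_TIMEOUT`.

```python
import asyncio


async def probe(host, port, timeout):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout=timeout
        )
    except (OSError, asyncio.TimeoutError):
        return False
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass
    return True


async def scan(targets, ports=(22, 443), concurrency=200, timeout=1.5, probe_fn=probe):
    """Probe every (host, port) pair with at most `concurrency` probes in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def bounded(host, port):
        async with sem:  # blocks once `concurrency` probes are already running
            return host, port, await probe_fn(host, port, timeout)

    results = await asyncio.gather(*(bounded(h, p) for h in targets for p in ports))
    # A host counts as reachable if any of its probed ports accepted a connection.
    reachable = sorted({h for h, _, ok in results if ok})
    return results, reachable
```

Calling `asyncio.run(scan(["192.168.1.1", "192.168.1.2"]))` returns the full per-port results and the list of reachable hosts, loosely corresponding to ip_scan.json and reachable.json.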

Data Flow

[Seed Device] 
    ↓
[Seeder] → targets.json
    ↓
[Scanner] → ip_scan.json, reachable.json
    ↓
[Fingerprinter] → fingerprints.json
    ↓
[Config Collector] → state/*.json
    ↓
[Batfish Loader] → batfish_snapshot/configs/*.cfg
    ↓
[Topology Visualizer] → topology.html
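
Because each stage leaves a characteristic artifact behind, the contents of a job directory give a rough picture of how far a run progressed. The sketch below infers that from the files in the data flow above. The phase names are illustrative, and file presence is only a heuristic: a partially collected `state/` directory still matches, so status.json remains the better source for real decisions.

```python
from pathlib import Path

# Phase names are illustrative; artifact patterns come from the data flow above.
PHASES = [
    ("seeder", "targets.json"),
    ("scanner", "reachable.json"),
    ("fingerprinter", "fingerprints.json"),
    ("config_collector", "state/*.json"),
    ("batfish_loader", "batfish_snapshot/configs/*.cfg"),
    ("topology_visualizer", "topology.html"),
]


def completed_phases(job_dir):
    """Return the leading run of phases whose output artifact exists."""
    job_dir = Path(job_dir)
    done = []
    for name, pattern in PHASES:
        if not any(job_dir.glob(pattern)):
            break  # later phases depend on this one, so stop at the first gap
        done.append(name)
    return done


def next_phase(job_dir):
    """Return the first phase without output, or None when all are present."""
    done = completed_phases(job_dir)
    return PHASES[len(done)][0] if len(done) < len(PHASES) else None
```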

Directory Structure

network-discovery-mcp/
├── network_discovery/
│   ├── __main__.py              # Entry point (REST API or MCP mode)
│   ├── api.py                   # FastAPI REST endpoints
│   ├── mcp_server.py            # MCP tools implementation
│   ├── seeder.py                # Network discovery seeding
│   ├── scanner.py               # Port scanning
│   ├── fingerprinter.py         # Device identification
│   ├── config_collector.py      # Configuration collection
│   ├── batfish_loader.py        # Batfish integration
│   ├── topology_visualizer.py   # Visualization generation
│   ├── credential_validator.py  # Credential testing
│   ├── job_resume.py            # Job resumption logic
│   ├── metrics.py               # System monitoring
│   ├── artifacts.py             # File I/O operations
│   ├── config.py                # Configuration management
│   └── workers.py               # Async task coordination
├── tests/                       # Unit tests
├── Dockerfile                   # Container image definition
├── docker-compose.yml           # REST API deployment
├── docker-compose.mcp.yml       # MCP deployment
└── requirements.txt             # Python dependencies

Artifact Storage Structure

Each discovery job creates a directory structure:

{ARTIFACT_DIR}/{job_id}/
├── targets.json                 # Discovered network targets
├── device_states/
│   └── {hostname}.json          # Per-device state from seeder
├── ip_scan.json                 # Full scan results
├── reachable.json               # Filtered reachable hosts
├── fingerprints.json            # Device identification results
├── state/
│   ├── {hostname1}.json         # Device configurations
│   ├── {hostname2}.json
│   └── ...
├── batfish_snapshot/
│   └── configs/
│       ├── {hostname1}.cfg      # Batfish-format configs
│       ├── {hostname2}.cfg
│       └── ...
├── topology.json                # Topology graph data
├── topology.html                # Interactive visualization
├── status.json                  # Job status tracking
└── error.json                   # Error details (if any)

All file writes are atomic (write to .tmp, then rename) to prevent corruption.
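
The write-then-rename pattern can be sketched in Python as follows. This is a generic illustration of the technique rather than the project's `artifacts.py`; `atomic_write_json` is a hypothetical name, and it uses a uniquely named temporary file instead of a fixed `.tmp` suffix on the target name.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Write JSON to a temp file in the target directory, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        # os.replace is atomic on POSIX when both paths share a filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the previous file untouched and clean up the partial temp file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Readers of `path` see either the old complete file or the new complete file, never a half-written one, which is why the temporary file must live in the same directory as the target.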

Troubleshooting

Service Won't Start

Problem: Container fails to start

docker compose up -d
# Error: port already in use

Solution: Check for port conflicts

# Check what's using port 8000
lsof -i :8000

# Use different port
PORT=8001 docker compose up -d

Authentication Failures

Problem: All devices fail authentication

Error: Authentication failed for all devices

Solution: Validate credentials first

# Test credentials before discovery
curl -X POST http://localhost:8000/v1/credentials/validate \
  -H "Content-Type: application/json" \
  -d '{
    "seed_host": "192.168.1.1",
    "username": "admin",
    "password": "your_password",
    "platform": "cisco_ios"
  }'

# Check response for specific error
{
  "valid": false,
  "error": "Authentication failed - invalid username or password",
  "suggestion": "Verify credentials for user 'admin' on this device"
}

Job Failures During Config Collection

Problem: Job fails after collecting some configs

Status: 200/450 configs collected, then failure

Solution: Resume the job instead of restarting

# List resumable jobs
curl http://localhost:8000/v1/jobs/resumable

# Resume the failed job
curl -X POST http://localhost:8000/v1/jobs/resume \
  -H "Content-Type: application/json" \
  -d '{
    "job_id": "net-disc-123",
    "credentials": {"username": "admin", "password": "pass"}
  }'

# This retries only the 250 devices whose configs were not collected

Slow Discovery

Problem: Discovery taking too long

Solution: Check system resources

# Check system health
curl http://localhost:8000/health

# Check specific job statistics
curl http://localhost:8000/v1/status/{job_id}

# Reduce concurrency if CPU high
curl -X POST http://localhost:8000/v1/scan \
  -H "Content-Type: application/json" \
  -d '{"job_id": "...", "concurrency": 50}'

Batfish Connection Issues

Problem: Topology generation fails

Error: Cannot connect to Batfish

Solution: Verify Batfish container

# Check Batfish is running
docker compose ps

# Check Batfish logs
docker compose logs batfish

# Restart if needed
docker compose restart batfish

# Wait for Batfish to initialize
sleep 30

Large Network Timeouts

Problem: Timeouts on large subnets

Solution: Increase timeouts or reduce scope

# Increase timeout
CONNECT_TIMEOUT=5.0 docker compose up -d

# Or scan in smaller batches
curl -X POST http://localhost:8000/v1/scan/from-subnets \
  -H "Content-Type: application/json" \
  -d '{
    "job_id": "batch1",
    "subnets": ["192.168.1.0/24"],
    "concurrency": 100
  }'
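
Splitting a large range into smaller batches can be scripted with Python's standard `ipaddress` module. The sketch below is a hedged example: the helper names are hypothetical, the `batchN` job IDs simply extend the example above, and the payload fields are the ones shown in that curl call.

```python
import ipaddress


def split_subnet(cidr, new_prefix=24):
    """Split a subnet into smaller blocks; already-small subnets are returned unchanged."""
    network = ipaddress.ip_network(cidr, strict=False)
    if network.prefixlen >= new_prefix:
        return [str(network)]
    return [str(block) for block in network.subnets(new_prefix=new_prefix)]


def batch_payloads(cidr, job_prefix="batch", new_prefix=24, concurrency=100):
    """Build one /v1/scan/from-subnets request body per block."""
    return [
        {"job_id": f"{job_prefix}{i}", "subnets": [block], "concurrency": concurrency}
        for i, block in enumerate(split_subnet(cidr, new_prefix), start=1)
    ]
```

This version creates a separate job per batch. To keep everything in one job, the later batches could instead be sent to `/v1/scan/add-subnets`.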

View Logs

For debugging, check container logs:

# REST API mode
docker compose logs -f network-discovery

# MCP mode
docker compose -f docker-compose.mcp.yml logs -f network-discovery-mcp

# Just errors
docker compose logs network-discovery | grep ERROR

# Specific time range
docker compose logs --since 30m network-discovery

Get Help

If you encounter issues not covered here:

Check logs: docker compose logs network-discovery
Review job status: curl http://localhost:8000/v1/status/{job_id}
Check system recommendations: Use get_system_recommendations MCP tool
Review artifact files in {ARTIFACT_DIR}/{job_id}/

Additional Resources

API Documentation: http://localhost:8000/docs (interactive Swagger UI)
GitHub Releases: https://github.com/username/network-discovery-mcp/releases (version history and changelogs)

License

This project is licensed under the terms specified in the LICENSE file.
