AgentUnit supports multiple AI platforms through dedicated adapters. Each adapter provides seamless integration with platform-specific features while maintaining consistency through the AgentUnit interface.
## AutoGen AG2

AG2 (formerly AutoGen) is an advanced multi-agent conversation framework that grew out of Microsoft's AutoGen project. The AG2Adapter enables AgentUnit to orchestrate and test complex agent conversations.
```bash
# Install AutoGen AG2
pip install autogen-ag2

# Configure API keys
export OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_API_KEY="your-azure-key"  # Optional, for Azure OpenAI
```

```python
from agentunit import Scenario
from agentunit.adapters import AG2Adapter

# Basic configuration
config = {
    "model": "gpt-4",
    "temperature": 0.7,
    "max_turns": 10,
    "timeout": 60
}
adapter = AG2Adapter(config)
```

```python
# Advanced AG2 configuration
advanced_config = {
    "model": "gpt-4",
    "model_config": {
        "temperature": 0.7,
        "max_tokens": 1000,
        "top_p": 0.9
    },
    "agents": {
        "user_proxy": {
            "name": "UserProxy",
            "system_message": "You are a helpful assistant.",
            "human_input_mode": "NEVER",
            "max_consecutive_auto_reply": 3
        },
        "assistant": {
            "name": "Assistant",
            "system_message": "You are an AI assistant specialized in problem solving.",
            "llm_config": {
                "model": "gpt-4",
                "temperature": 0.5
            }
        }
    },
    "conversation_config": {
        "max_turns": 10,
        "silent": False,
        "cache_seed": None
    }
}
adapter = AG2Adapter(advanced_config)
```

```python
# Create a scenario with the AG2 adapter
scenario = Scenario(
    name="multi_agent_conversation",
    adapter=adapter,
    dataset_source="conversation_prompts.json"
)

# Run the scenario
result = await scenario.run()
print(f"Success rate: {result.success_rate:.2%}")
```

Key features:

- Group Chat: Support for multi-agent group conversations
- Code Execution: Safe code execution with Docker
- Function Calling: Tool use and function execution
- Memory: Conversation history and context management
- Workflow: Complex multi-step agent workflows
Best practices:

- Agent Design: Create specialized agents with clear roles
- Turn Limits: Set appropriate max_turns to prevent infinite loops
- Error Handling: Implement robust error handling for network issues
- Resource Management: Use timeouts and cleanup for long-running conversations
- Testing: Start with simple two-agent conversations before complex groups
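The turn-limit and timeout practices above can be sketched independently of AG2. This is a minimal, illustrative harness: `run_turn` is a hypothetical callable standing in for one conversational exchange, not an AG2 or AgentUnit API.

```python
import time

def run_conversation(run_turn, max_turns=10, timeout=60):
    """Drive a conversation until it finishes, hits max_turns, or times out."""
    deadline = time.monotonic() + timeout
    transcript = []
    for turn in range(max_turns):
        if time.monotonic() > deadline:
            # Record the timeout instead of hanging indefinitely
            transcript.append(("system", "timeout"))
            break
        message, done = run_turn(turn)
        transcript.append(("agent", message))
        if done:
            break
    return transcript

# Example: a scripted exchange that terminates on the third turn
replies = ["analyzing", "drafting", "TERMINATE"]
log = run_conversation(lambda t: (replies[t], replies[t] == "TERMINATE"), max_turns=10)
```

Even when the underlying framework enforces its own limits, an outer cap like this keeps a misconfigured scenario from looping forever.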
## OpenAI Swarm

OpenAI Swarm is an experimental framework for multi-agent orchestration. The SwarmAdapter provides integration with Swarm's lightweight agent coordination.
```bash
# Install OpenAI Swarm
pip install openai-swarm

# Configure API key
export OPENAI_API_KEY="your-api-key"
```

```python
from agentunit.adapters import SwarmAdapter

# Basic Swarm configuration
config = {
    "model": "gpt-4",
    "temperature": 0.7,
    "max_turns": 5
}
adapter = SwarmAdapter(config)
```
```python
# Advanced Swarm configuration with custom agents
advanced_config = {
    "model": "gpt-4-turbo",
    "agents": {
        "coordinator": {
            "name": "Coordinator",
            "instructions": "You coordinate between other agents to solve complex tasks.",
            "functions": ["transfer_to_analyst", "transfer_to_executor"]
        },
        "analyst": {
            "name": "Analyst",
            "instructions": "You analyze problems and provide detailed insights.",
            "functions": ["analyze_data", "generate_insights"]
        },
        "executor": {
            "name": "Executor",
            "instructions": "You execute tasks based on analysis and coordination.",
            "functions": ["execute_plan", "report_results"]
        }
    },
    "handoff_config": {
        "max_handoffs": 10,
        "context_variables": {}
    }
}
adapter = SwarmAdapter(advanced_config)
```

```python
# Define agent functions
def transfer_to_analyst():
    """Transfer the conversation to the analyst agent."""
    return agents["analyst"]  # assumes an `agents` registry of Agent objects

def analyze_data(data: str):
    """Analyze the provided data."""
    return f"Analysis of {data}: [detailed analysis]"

# Create scenario with custom functions
scenario = Scenario(
    name="swarm_coordination",
    adapter=adapter,
    dataset_source="coordination_tasks.json"
)
result = await scenario.run()
```

Key features:

- Agent Handoffs: Seamless transfer between specialized agents
- Function Calling: Rich function execution with context passing
- Context Variables: Shared state across agent interactions
- Lightweight Design: Minimal overhead for simple multi-agent tasks
Best practices:

- Agent Specialization: Create focused agents with specific expertise
- Handoff Strategy: Design clear handoff conditions and logic
- Function Design: Keep functions simple and well-documented
- Context Management: Use context variables effectively for state sharing
- Testing: Test individual agents before complex orchestrations
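The handoff strategy above can be sketched as a minimal dispatcher. The agents here are plain callables standing in for Swarm `Agent` objects, and `max_handoffs` mirrors the `handoff_config` setting; this is an illustration of the pattern, not the Swarm runtime.

```python
def run_with_handoffs(agents, start, task, context=None, max_handoffs=10):
    """Route a task through agents until one produces a result.

    Each agent returns either {"handoff_to": name} or {"result": value};
    `context` is shared mutable state, like Swarm's context_variables.
    """
    context = context or {}
    current = start
    for _ in range(max_handoffs):
        outcome = agents[current](task, context)
        if "handoff_to" in outcome:
            current = outcome["handoff_to"]
        else:
            return current, outcome["result"]
    raise RuntimeError("max_handoffs exceeded")

agents = {
    "coordinator": lambda task, ctx: {"handoff_to": "analyst"},
    "analyst": lambda task, ctx: {"result": f"analysis of {task}"},
}
final_agent, result = run_with_handoffs(agents, "coordinator", "Q3 data")
```

Bounding the handoff chain is the key design choice: a cycle between two agents otherwise never terminates.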
## LangSmith

LangSmith provides observability and evaluation for LLM applications. The LangSmithAdapter enables comprehensive monitoring and debugging of agent interactions.
```bash
# Install LangSmith
pip install langsmith

# Configure API key and project
export LANGCHAIN_API_KEY="your-api-key"
export LANGCHAIN_PROJECT="your-project-name"
export LANGCHAIN_TRACING_V2=true
```

```python
from agentunit.adapters import LangSmithAdapter

# Basic LangSmith configuration
config = {
    "project_name": "agentunit-testing",
    "trace_level": "INFO",
    "auto_eval": True
}
adapter = LangSmithAdapter(config)
```
```python
# Advanced LangSmith configuration
advanced_config = {
    "project_name": "production-monitoring",
    "session_name": "multi-agent-testing",
    "trace_config": {
        "trace_level": "DEBUG",
        "include_metadata": True,
        "trace_sampling_rate": 1.0
    },
    "evaluation_config": {
        "auto_eval": True,
        "eval_chains": ["correctness", "helpfulness", "conciseness"],
        "custom_evaluators": ["domain_accuracy", "response_quality"]
    },
    "feedback_config": {
        "collect_user_feedback": True,
        "feedback_scale": "thumbs",
        "automatic_scoring": True
    }
}
adapter = LangSmithAdapter(advanced_config)
```

```python
# Create scenario with LangSmith monitoring
scenario = Scenario(
    name="monitored_conversation",
    adapter=adapter,
    dataset_source="evaluation_dataset.json"
)

# Run with automatic tracing
result = await scenario.run()

# Access detailed traces
for run in result.runs:
    trace_url = run.metadata.get("langsmith_trace_url")
    print(f"View trace: {trace_url}")
```

Key features:

- Automatic Tracing: Full execution traces with timing and metadata
- Evaluation Chains: Built-in and custom evaluation metrics
- Dataset Management: Manage test datasets and examples
- Feedback Collection: User feedback and human evaluation
- Performance Analytics: Latency, cost, and usage analytics
Best practices:

- Project Organization: Use clear project and session names
- Trace Sampling: Adjust sampling rates for production vs development
- Custom Evaluators: Create domain-specific evaluation metrics
- Dataset Curation: Maintain high-quality evaluation datasets
- Continuous Monitoring: Set up alerts for performance degradation
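One way to apply different sampling rates per environment is to hash each run id, so a given run is consistently traced or skipped across processes. This is an illustrative sketch of the idea, not the LangSmith SDK's sampling implementation.

```python
import hashlib

def should_trace(run_id: str, sampling_rate: float) -> bool:
    """Deterministically decide whether to trace a run at the given rate."""
    if sampling_rate >= 1.0:
        return True
    if sampling_rate <= 0.0:
        return False
    # Hash the run id into a bucket in [0, 10000); trace the low buckets.
    digest = int(hashlib.sha256(run_id.encode()).hexdigest(), 16)
    return (digest % 10_000) < sampling_rate * 10_000

# Development traces everything; production might sample 10%
assert should_trace("run-42", 1.0)
```

Hash-based sampling beats `random.random()` here because re-evaluating the same run never flips the decision.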
## AgentOps

AgentOps provides production monitoring and observability for AI agents. The AgentOpsAdapter enables real-time monitoring and performance analytics.
```bash
# Install AgentOps
pip install agentops

# Configure API key
export AGENTOPS_API_KEY="your-api-key"
```

```python
from agentunit.adapters import AgentOpsAdapter

# Basic AgentOps configuration
config = {
    "environment": "production",
    "auto_start_session": True,
    "capture_video": False
}
adapter = AgentOpsAdapter(config)
```
```python
# Advanced AgentOps configuration
advanced_config = {
    "environment": "production",
    "session_config": {
        "auto_start_session": True,
        "session_tags": ["agentunit", "multi-agent", "testing"],
        "capture_video": True,
        "capture_screenshots": True
    },
    "monitoring_config": {
        "track_llm_calls": True,
        "track_agent_actions": True,
        "track_tools": True,
        "track_errors": True
    },
    "analytics_config": {
        "cost_tracking": True,
        "performance_metrics": True,
        "usage_analytics": True
    },
    "alerting_config": {
        "error_threshold": 0.05,
        "latency_threshold": 5000,  # milliseconds
        "cost_threshold": 100.0  # US dollars
    }
}
adapter = AgentOpsAdapter(advanced_config)
```

```python
# Create scenario with AgentOps monitoring
scenario = Scenario(
    name="production_monitoring",
    adapter=adapter,
    dataset_source="production_cases.json"
)

# Run with full monitoring
result = await scenario.run()

# Access monitoring data
session_url = result.metadata.get("agentops_session_url")
print(f"View session: {session_url}")
```

Key features:

- Real-time Monitoring: Live dashboard with agent activity
- Video Capture: Visual recordings of agent interactions
- Cost Tracking: Detailed cost analysis and budgeting
- Error Detection: Automatic error detection and alerting
- Performance Analytics: Latency, throughput, and efficiency metrics
Best practices:

- Environment Tags: Use clear environment and session tags
- Video Capture: Enable for critical test scenarios
- Cost Monitoring: Set appropriate cost thresholds and alerts
- Error Tracking: Monitor error rates and implement auto-recovery
- Performance Optimization: Use analytics to identify bottlenecks
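The thresholds in `alerting_config` above can be evaluated with a simple checker. The metric names here (`error_rate`, `p95_latency_ms`, `session_cost_usd`) are illustrative stand-ins, not AgentOps fields.

```python
def check_alerts(metrics, config):
    """Return the names of thresholds a session's metrics have breached."""
    alerts = []
    if metrics["error_rate"] > config["error_threshold"]:
        alerts.append("error_rate")
    if metrics["p95_latency_ms"] > config["latency_threshold"]:
        alerts.append("latency")
    if metrics["session_cost_usd"] > config["cost_threshold"]:
        alerts.append("cost")
    return alerts

config = {"error_threshold": 0.05, "latency_threshold": 5000, "cost_threshold": 100.0}
metrics = {"error_rate": 0.08, "p95_latency_ms": 1200, "session_cost_usd": 42.0}
triggered = check_alerts(metrics, config)  # only error_rate exceeds its threshold
```

Keeping the thresholds in configuration rather than code makes it easy to run stricter limits in staging than in production.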
## Weights & Biases (Wandb)

Weights & Biases (Wandb) provides experiment tracking and model management. The WandbAdapter enables comprehensive experiment tracking for agent testing and evaluation.
```bash
# Install Wandb
pip install wandb

# Log in to Wandb
wandb login
```

```python
from agentunit.adapters import WandbAdapter

# Basic Wandb configuration
config = {
    "project": "agentunit-experiments",
    "entity": "your-team",
    "job_type": "evaluation"
}
adapter = WandbAdapter(config)
```
```python
# Advanced Wandb configuration
advanced_config = {
    "project": "multi-agent-research",
    "entity": "ai-research-team",
    "run_config": {
        "job_type": "hyperparameter-sweep",
        "group": "model-comparison",
        "tags": ["multi-agent", "gpt-4", "evaluation"]
    },
    "logging_config": {
        "log_frequency": 1,
        "log_gradients": False,
        "log_parameters": True,
        "log_artifacts": True
    },
    "experiment_config": {
        "save_code": True,
        "monitor_system": True,
        "track_env": True
    },
    "hyperparameters": {
        "model": "gpt-4",
        "temperature": 0.7,
        "max_tokens": 1000,
        "num_agents": 3
    }
}
adapter = WandbAdapter(advanced_config)
```

```python
import wandb

# Create scenario with Wandb tracking
scenario = Scenario(
    name="experiment_tracking",
    adapter=adapter,
    dataset_source="research_dataset.json"
)

# Run experiment with full tracking
result = await scenario.run()

# Log custom metrics
wandb.log({
    "success_rate": result.success_rate,
    "avg_response_time": result.avg_response_time,
    "total_cost": result.total_cost
})

# Save artifacts
wandb.save("results.json")
wandb.save("conversation_logs.txt")
```

Key features:

- Experiment Tracking: Complete experiment history and comparison
- Hyperparameter Sweeps: Automated hyperparameter optimization
- Artifact Management: Version control for datasets and models
- Collaborative Features: Team sharing and collaboration tools
- Rich Visualizations: Charts, tables, and custom visualizations
Best practices:

- Project Organization: Use clear project and experiment naming
- Hyperparameter Tracking: Log all relevant configuration parameters
- Artifact Versioning: Version datasets and model checkpoints
- Collaborative Workflows: Share experiments with team members
- Custom Metrics: Define domain-specific evaluation metrics
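The custom metrics logged above can be aggregated from per-run records before calling `wandb.log`. The run dicts here are illustrative stand-ins for AgentUnit result objects, not a documented schema.

```python
def summarize_runs(runs):
    """Aggregate per-run records into metrics suitable for wandb.log."""
    n = len(runs)
    return {
        "success_rate": sum(r["success"] for r in runs) / n,
        "avg_response_time": sum(r["latency_s"] for r in runs) / n,
        "total_cost": sum(r["cost_usd"] for r in runs),
    }

runs = [
    {"success": True, "latency_s": 1.2, "cost_usd": 0.03},
    {"success": False, "latency_s": 2.0, "cost_usd": 0.05},
]
summary = summarize_runs(runs)  # success_rate is 0.5 for this data
```

Computing a single summary dict first keeps each experiment's logged metrics consistent across runs and easy to compare in the Wandb UI.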
## Adapter Comparison

| Feature | AG2 | Swarm | LangSmith | AgentOps | Wandb |
|---|---|---|---|---|---|
| Multi-Agent | ✅ Advanced | ✅ Lightweight | ❌ Monitoring | ❌ Monitoring | ❌ Tracking |
| Observability | — | — | ✅ Advanced | ✅ Advanced | ✅ Experiments |
| Production Ready | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | — |
| Cost Tracking | ❌ No | ❌ No | — | ✅ Advanced | — |
| Real-time Monitor | ❌ No | ❌ No | ✅ Yes | ✅ Yes | — |
| Collaboration | — | — | ✅ Yes | ✅ Yes | ✅ Advanced |

(— indicates a cell not specified in the source material.)
Recommended usage:

- Development: Use AG2 or Swarm for agent development
- Testing: Use LangSmith for evaluation and debugging
- Production: Use AgentOps for monitoring and alerting
- Research: Use Wandb for experiment tracking
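The stage-to-adapter guidance above can be encoded as a simple lookup; the stage names come from the list, while the function itself is just an illustrative convention, not an AgentUnit API.

```python
# Map each development stage to the adapters recommended above
RECOMMENDED = {
    "development": ["ag2", "swarm"],
    "testing": ["langsmith"],
    "production": ["agentops"],
    "research": ["wandb"],
}

def adapters_for(stage: str):
    """Return the recommended adapter names for a stage, or raise on typos."""
    try:
        return RECOMMENDED[stage]
    except KeyError:
        raise ValueError(f"unknown stage: {stage}") from None
```

Failing loudly on an unknown stage catches misspelled environment names before a scenario silently runs without monitoring.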
```python
# Environment-specific configurations
configs = {
    "development": {
        "ag2": {"model": "gpt-3.5-turbo", "max_turns": 5},
        "langsmith": {"trace_level": "DEBUG"}
    },
    "production": {
        "ag2": {"model": "gpt-4", "max_turns": 10},
        "agentops": {"capture_video": True},
        "langsmith": {"trace_level": "INFO"}
    }
}
```

```python
try:
    result = await scenario.run()
except AdapterError as e:
    # Handle adapter-specific errors
    logger.error(f"Adapter error: {e}")
except TimeoutError as e:
    # Handle timeout errors
    logger.error(f"Scenario timeout: {e}")
```

Performance tips:

- Use connection pooling for multiple scenarios
- Implement caching for repeated operations
- Monitor resource usage across adapters
- Set appropriate timeouts and limits
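A small TTL cache covers the "caching for repeated operations" tip above. This is a generic sketch rather than an AgentUnit API: cached entries expire after `ttl_seconds` so stale responses are eventually refreshed.

```python
import time
from functools import wraps

def ttl_cache(ttl_seconds=300):
    """Cache a function's results by positional args, expiring after a TTL."""
    def decorator(fn):
        store = {}
        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit and now - hit[0] < ttl_seconds:
                return hit[1]  # fresh cached value
            value = fn(*args)
            store[args] = (now, value)
            return value
        return wrapper
    return decorator

calls = []

@ttl_cache(ttl_seconds=60)
def fetch(prompt):
    calls.append(prompt)  # track how often the underlying call happens
    return f"response:{prompt}"

fetch("hello")
fetch("hello")  # second call is served from the cache
```

For nondeterministic LLM calls, cache only operations where replaying an identical response is acceptable, such as dataset loading or evaluation of fixed prompts.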
Each adapter exposes its platform's distinctive capabilities through the same AgentUnit interface; pick the adapter that matches your stage of development, and combine them as your workflow moves from development through testing to production.