
Add AIREV-Agent-0.8B v2: Sub-billion parameter model for BFCL V4#1319

Open
mk42-ai wants to merge 1 commit into ShishirPatil:main from mk42-ai:airev-agent-0.8b-v2

Conversation


@mk42-ai mk42-ai commented Apr 3, 2026

Model

AIREV-Agent-0.8B: a 752M-parameter model fine-tuned for agentic tool calling.

Training Pipeline

  1. SFT on 50K Claude Opus 4.6-generated BFCL-format samples with chain-of-thought reasoning
  2. AutoResearch — Karpathy-style automated hyperparameter discovery (112 experiments, 4 GPUs) found optimal GRPO config: lr=2e-6, 24 generations, temp=0.6, format_bonus=0.1
  3. GRPO with AutoResearch-optimized config on 43K clean training samples (14 hours, single H100)
  4. Targeted SFT on multi-turn, memory, and web_search categories using real BFCL function schemas
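The GRPO reward with the `format_bonus=0.1` term found by AutoResearch can be sketched as follows. This is a hypothetical reconstruction, not the PR's actual reward code: the correctness signal, the `reward` function name, and the exact bracket regex are assumptions based on the bracket call format described under "Prompt Mode" below.

```python
import re

# Hypothetical bracket-call pattern, assuming the [func_name(params)]
# format this PR describes for prompt-based function calling.
BRACKET_CALL = re.compile(r"^\[\w+\(.*\)\]$")

def reward(completion: str, is_correct: bool, format_bonus: float = 0.1) -> float:
    """Sketch of a GRPO reward: 1.0 for a correct tool call, plus a small
    bonus when the completion already matches the expected bracket format."""
    score = 1.0 if is_correct else 0.0
    if BRACKET_CALL.match(completion.strip()):
        score += format_bonus
    return score
```

Under this shaping, a correct, well-formatted call scores 1.1, a correct but unbracketed call 1.0, and a wrong but well-formatted call only the 0.1 bonus, nudging the policy toward parseable output without rewarding format over correctness.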

Evaluation

All 20 BFCL V4 categories were evaluated. Results were generated using transformers inference at temperature=0.6, with chain-of-thought reasoning emitted via the model's reasoning tokens.

Prompt Mode

This model uses prompt-based function calling (not native FC mode), with the BFCL system prompt and the bracket call format [func_name(params)].
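A bracket-format completion like the above can be recovered into a function name and arguments with a small parser. This is a minimal sketch using Python's `ast` module; BFCL's own prompt-mode parser may differ, and the single-call, keyword-arguments-only assumption is mine.

```python
import ast

def parse_bracket_call(text: str):
    """Parse a bracket-format call such as [func_name(params)] into
    (name, kwargs). Hypothetical helper; assumes one call with literal
    keyword arguments only."""
    inner = text.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError(f"not a bracket call: {text!r}")
    # Parse the call expression inside the brackets.
    call = ast.parse(inner[1:-1], mode="eval").body
    if not isinstance(call, ast.Call):
        raise ValueError("expected a function call")
    name = call.func.id
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return name, kwargs
```

Using `ast.literal_eval` on the argument values keeps the parser safe against arbitrary code in model output, unlike a plain `eval`.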

Hardware

Trained on a single NVIDIA H100 80GB GPU. Total training time: ~24 hours (SFT + GRPO + targeted SFT).

Model: airev-ai/AIREV-Agent-0.8B (0.8B params, Qwen3.5-0.8B base)
Training: SFT on 50K Claude Opus data + GRPO with AutoResearch-optimized hyperparameters
Architecture: Gated Delta Network (GDN), 262K context
License: Apache 2.0
