Time: 45 minutes | Difficulty: Intermediate
Reflection enables agents to evaluate their own outputs, identify issues, and iteratively improve results. This meta-cognitive capability is key to building high-quality, self-correcting AI systems.
By the end of this tutorial, you'll be able to:
- Implement reflection loops for self-evaluation
- Build agents that critique their own work
- Use iterative refinement to improve outputs
- Define quality criteria for different tasks
- Apply reflection to code, writing, and decisions
- Combine reflection with other patterns
- Understand when reflection adds value vs overhead
We'll implement reflection agents that:
- Generate - Create initial output
- Reflect - Evaluate quality and identify issues
- Refine - Improve based on reflection
- Iterate - Repeat until a quality threshold is met
- Compare - Show before/after improvements
Make sure you have:
- Completed Tutorial 9: Plan-and-Execute
- Understanding of quality assessment
- PHP 8.1+ installed
- Claude PHP SDK configured
Reflection is the ability to examine and evaluate one's own outputs, thoughts, and processes. In AI agents, reflection enables:
- Self-evaluation - Assess quality of outputs
- Error detection - Find mistakes and issues
- Iterative improvement - Refine through multiple passes
- Learning - Understand what works and what doesn't
Without Reflection:
Task: Write a function to reverse a string
Output: function reverse($s) { return strrev($s); }
Done!
With Reflection:
Task: Write a function to reverse a string
Generate:
function reverse($s) { return strrev($s); }
Reflect:
- Uses built-in function (good)
- No input validation (issue)
- No documentation (issue)
- No edge case handling (issue)
Refine:
/**
 * Reverses a string safely
 * @param string|null $s Input string
 * @return string Reversed string
 */
function reverse(?string $s): string {
    if ($s === null || $s === '') {
        return '';
    }
    return strrev($s);
}
Better!
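A second reflection pass over the refined version could still flag a real issue: `strrev()` reverses bytes, not characters, so multibyte UTF-8 text such as "héllo" comes back garbled. Below is a minimal sketch of that next refinement. It assumes the input is UTF-8 and splits the string into grapheme clusters with PCRE's `\X` before reversing, so accented and combined characters stay intact:

```php
<?php
declare(strict_types=1);

/**
 * Reverses a UTF-8 string by grapheme cluster instead of by byte.
 *
 * @param string|null $s Input string (assumed to be UTF-8)
 * @return string Reversed string
 * @throws InvalidArgumentException if $s is not valid UTF-8
 */
function reverse(?string $s): string
{
    if ($s === null || $s === '') {
        return '';
    }
    // \X matches one extended grapheme cluster; the /u modifier enables UTF-8 mode.
    // preg_match_all() returns false on invalid UTF-8 when /u is set.
    if (preg_match_all('/\X/u', $s, $matches) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return implode('', array_reverse($matches[0]));
}

echo reverse('héllo'), "\n"; // olléh
```

This is the pattern the tutorial describes, applied once more: the reflection names a concrete issue (byte-wise reversal), and the refinement fixes only that issue.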
The core pattern:
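// generate(), reflect(), extractScore(), extractIssues() and refine() are
// placeholders for the model calls and parsing steps; they are not defined here.
function reflectionLoop(string $task, array $criteria, int $maxIterations = 3, int $qualityThreshold = 8): string
{
    $output = generate($task);
    for ($iteration = 1; $iteration <= $maxIterations; $iteration++) {
        // Reflect: score the current output against the criteria
        $reflection = reflect($output, $criteria);
        $score = extractScore($reflection);
        if ($score >= $qualityThreshold) {
            echo "Quality threshold reached!\n";
            break;
        }
        // Refine: fix only the issues the reflection identified
        $issues = extractIssues($reflection);
        $output = refine($output, $issues);
    }
    return $output;
}
Define what "good" means for your task: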
$criteria = [
    'correctness' => [
        'weight' => 0.4,
        'description' => 'Is the solution correct and accurate?'
    ],
    'completeness' => [
        'weight' => 0.3,
        'description' => 'Are all requirements addressed?'
    ],
    'clarity' => [
        'weight' => 0.2,
        'description' => 'Is it easy to understand?'
    ],
    'efficiency' => [
        'weight' => 0.1,
        'description' => 'Is it reasonably optimal?'
    ]
];
Different types of reflection questions:
Quality Assessment:
"Evaluate this output on a scale of 1-10 for:
- Correctness (1-10)
- Completeness (1-10)
- Clarity (1-10)
Overall score and reasoning?"
Issue Identification:
"Review this carefully and identify:
1. Errors or mistakes
2. Missing information
3. Unclear explanations
4. Potential improvements"
Comparative Analysis:
"Compare this output to best practices:
- What aligns with standards?
- What deviates from best practices?
- What could be better?"
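The quality-assessment prompt returns one rating per criterion, and the `$criteria` array assigns each criterion a weight. Here is a minimal sketch that combines the two into a single score. It assumes the reflection lists ratings on lines such as `Correctness: 8` or `- Completeness (1-10): 6`, a format you would need to request explicitly in the prompt. `parseCriterionScores()` and `weightedScore()` are hypothetical helpers, not part of any SDK:

```php
<?php
declare(strict_types=1);

/**
 * Pulls "Name: N" ratings (1-10) for each known criterion out of reflection text.
 * Criteria missing from the text are left out of the result.
 */
function parseCriterionScores(string $reflection, array $criteria): array
{
    $scores = [];
    foreach (array_keys($criteria) as $name) {
        // Line starts with the criterion name (after any bullet), then anything up to ':' and a number
        $pattern = '/^\W*' . preg_quote($name, '/') . '\b[^:\n]*:\s*(\d{1,2})\b/im';
        if (preg_match($pattern, $reflection, $m)) {
            $scores[$name] = max(1, min(10, (int) $m[1])); // clamp to the 1-10 scale
        }
    }
    return $scores;
}

/**
 * Weighted average of the scores that were found. Dividing by the weights actually
 * present keeps a missing criterion from silently dragging the total toward zero.
 */
function weightedScore(array $scores, array $criteria): float
{
    $total = 0.0;
    $weightSum = 0.0;
    foreach ($scores as $name => $score) {
        $weight = $criteria[$name]['weight'] ?? 0.0;
        $total += $weight * $score;
        $weightSum += $weight;
    }
    return $weightSum > 0 ? $total / $weightSum : 0.0;
}

// Same weights as the $criteria example (descriptions omitted for brevity)
$criteria = [
    'correctness'  => ['weight' => 0.4],
    'completeness' => ['weight' => 0.3],
    'clarity'      => ['weight' => 0.2],
    'efficiency'   => ['weight' => 0.1],
];
$reflection = "- Correctness: 8\n- Completeness (1-10): 6\n- Clarity: 9\n- Efficiency: 7";
printf("%.1f\n", weightedScore(parseCriterionScores($reflection, $criteria), $criteria)); // 7.5
```

Once there is a single weighted number, it can be compared directly against a quality threshold in the core loop.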
Fix specific issues:
$refinementPrompt = "Improve this output by:\n";
foreach ($issues as $issue) {
    $refinementPrompt .= "- {$issue['type']}: {$issue['description']}\n";
}
$refinementPrompt .= "\nOriginal output:\n{$output}";
A complete reflect-and-refine loop, which stops when the score reaches 9/10 or the iteration limit is hit:
function reflectAndRefine($client, $task, $initialOutput, $maxIterations = 3) {
    $output = $initialOutput;
    $history = [];
    for ($i = 0; $i < $maxIterations; $i++) {
        echo "Iteration " . ($i + 1) . "\n";
        echo str_repeat("-", 60) . "\n";
        // Reflect: ask for strengths, issues, fixes and a machine-readable score
        $reflectionPrompt = "Task: {$task}\n\n" .
            "Current output:\n{$output}\n\n" .
            "Evaluate this output:\n" .
            "1. What's working well?\n" .
            "2. What issues exist?\n" .
            "3. How can it be improved?\n" .
            "4. Overall quality score (1-10)\n\n" .
            "End with a final line in the form 'SCORE: N'.";
        $reflection = $client->messages()->create([
            'model' => 'claude-sonnet-4-5',
            'max_tokens' => 1024,
            'messages' => [[
                'role' => 'user',
                'content' => $reflectionPrompt
            ]]
        ]);
        // extractTextContent() is a helper not defined in this tutorial; it returns a response's text
        $reflectionText = extractTextContent($reflection);
        echo "Reflection:\n{$reflectionText}\n\n";
        // Extract the score from the 'SCORE: N' line, clamped to 1-10; fall back to 5 if it is missing
        $score = preg_match('/SCORE:\s*(\d{1,2})/i', $reflectionText, $matches)
            ? max(1, min(10, (int)$matches[1]))
            : 5;
        $history[] = [
            'iteration' => $i + 1,
            'output' => $output,
            'reflection' => $reflectionText,
            'score' => $score
        ];
        if ($score >= 9) {
            echo "Quality threshold reached (score: {$score}/10)!\n";
            break;
        }
        // Refine: feed the reflection back in and ask for an improved version
        $refinementPrompt = "Task: {$task}\n\n" .
            "Current output:\n{$output}\n\n" .
            "Reflection:\n{$reflectionText}\n\n" .
            "Improve the output based on the reflection. " .
            "Address the identified issues and return only the improved output.";
        $refined = $client->messages()->create([
            'model' => 'claude-sonnet-4-5',
            'max_tokens' => 2048,
            'messages' => [[
                'role' => 'user',
                'content' => $refinementPrompt
            ]]
        ]);
        $output = extractTextContent($refined);
        echo "Refined output:\n{$output}\n\n";
    }
    // If the loop ends because $maxIterations ran out, the last refined output is returned unscored
    return ['final_output' => $output, 'history' => $history];
}
Reflection applied to code generation:
function generateCodeWithReflection($client, $requirement) {
    // Generate (generateCode() and containsIssues() are helpers not defined in this tutorial)
    $code = generateCode($client, $requirement);
    // Reflect: a separate review pass with a reviewer persona
    $review = $client->messages()->create([
        'model' => 'claude-sonnet-4-5',
        'max_tokens' => 1024,
        'system' => 'You are an expert code reviewer.',
        'messages' => [[
            'role' => 'user',
            'content' => "Review this code:\n\n{$code}\n\n" .
                "Check for:\n" .
                "- Security issues\n" .
                "- Performance problems\n" .
                "- Code quality\n" .
                "- Best practices\n" .
                "- Edge cases"
        ]]
    ]);
    // Refine only if the review found issues
    $reviewText = extractTextContent($review);
    if (containsIssues($reviewText)) {
        $improved = $client->messages()->create([
            'model' => 'claude-sonnet-4-5',
            'max_tokens' => 2048,
            'messages' => [[
                'role' => 'user',
                'content' => "Original code:\n{$code}\n\n" .
                    "Review:\n{$reviewText}\n\n" .
                    "Fix the identified issues. Return only the corrected code."
            ]]
        ]);
        $code = extractTextContent($improved);
    }
    return $code;
}
Reflection applied to writing, with several critique-and-revise rounds:
function writeEssayWithReflection($client, $topic, $iterations = 3) {
    // Initial draft
    $essay = $client->messages()->create([
        'model' => 'claude-sonnet-4-5',
        'max_tokens' => 2048,
        'messages' => [[
            'role' => 'user',
            'content' => "Write a short essay about: {$topic}"
        ]]
    ]);
    $draft = extractTextContent($essay);
    // Iterative refinement: a fixed number of critique/revise rounds
    for ($i = 0; $i < $iterations; $i++) {
        // Critique
        $critique = $client->messages()->create([
            'model' => 'claude-sonnet-4-5',
            'max_tokens' => 1024,
            'system' => 'You are a writing instructor.',
            'messages' => [[
                'role' => 'user',
                'content' => "Critique this essay:\n\n{$draft}\n\n" .
                    "Evaluate:\n" .
                    "- Argument strength\n" .
                    "- Evidence quality\n" .
                    "- Structure and flow\n" .
                    "- Clarity and style\n" .
                    "- Specific improvements needed"
            ]]
        ]);
        $feedback = extractTextContent($critique);
        // Revise
        $revision = $client->messages()->create([
            'model' => 'claude-sonnet-4-5',
            'max_tokens' => 2048,
            'messages' => [[
                'role' => 'user',
                'content' => "Essay:\n{$draft}\n\n" .
                    "Feedback:\n{$feedback}\n\n" .
                    "Revise the essay to address the feedback. " .
                    "Return only the revised essay."
            ]]
        ]);
        $draft = extractTextContent($revision);
    }
    return $draft;
}
Reflection applied to decisions, using a devil's-advocate critic:
function makeDecisionWithReflection($client, $question, $options) {
    // Initial decision
    $optionsList = implode("\n", array_map(
        fn($o) => "- {$o}",
        $options
    ));
    $decision = $client->messages()->create([
        'model' => 'claude-sonnet-4-5',
        'max_tokens' => 1024,
        'messages' => [[
            'role' => 'user',
            'content' => "Decision: {$question}\n\n" .
                "Options:\n{$optionsList}\n\n" .
                "Make a recommendation with reasoning."
        ]]
    ]);
    $recommendation = extractTextContent($decision);
    // Reflect on decision
    $reflection = $client->messages()->create([
        'model' => 'claude-sonnet-4-5',
        'max_tokens' => 1536,
        'system' => "You are a devil's advocate. Question decisions critically.",
        'messages' => [[
            'role' => 'user',
            'content' => "Decision question: {$question}\n\n" .
                "Recommendation: {$recommendation}\n\n" .
                "Analyze:\n" .
                "- What are the risks?\n" .
                "- What was overlooked?\n" .
                "- What are alternative views?\n" .
                "- Could this backfire?"
        ]]
    ]);
    $critique = extractTextContent($reflection);
    // Revise decision
    $final = $client->messages()->create([
        'model' => 'claude-sonnet-4-5',
        'max_tokens' => 1536,
        'messages' => [[
            'role' => 'user',
            'content' => "Original recommendation: {$recommendation}\n\n" .
                "Critical analysis: {$critique}\n\n" .
                "Provide a final, balanced recommendation " .
                "that addresses the critiques."
        ]]
    ]);
    return extractTextContent($final);
}
Evaluate different dimensions separately:
function multiAspectReflection($client, $output) {
    $aspects = [
        'technical' => 'Evaluate technical correctness and accuracy',
        'clarity' => 'Evaluate clarity and understandability',
        'completeness' => 'Evaluate whether all parts are addressed',
        'style' => 'Evaluate adherence to style guidelines'
    ];
    $scores = [];
    // One focused call per aspect, so each score is judged independently
    foreach ($aspects as $aspect => $criteria) {
        $evaluation = $client->messages()->create([
            'model' => 'claude-sonnet-4-5',
            'max_tokens' => 512,
            'messages' => [[
                'role' => 'user',
                'content' => "Output: {$output}\n\n{$criteria}\n\n" .
                    "Reply with only an integer score from 1 to 10."
            ]]
        ]);
        $text = extractTextContent($evaluation);
        // Take the first integer in the reply, clamped to 1-10; default to 5 if none is found
        $scores[$aspect] = preg_match('/\d+/', $text, $matches)
            ? max(1, min(10, (int)$matches[0]))
            : 5;
    }
    return $scores;
}
Generate multiple variants and compare:
function comparativeReflection($client, $task) {
    // Generate 3 variants of the same task
    $variants = [];
    for ($i = 0; $i < 3; $i++) {
        $response = $client->messages()->create([
            'model' => 'claude-sonnet-4-5',
            'max_tokens' => 1024,
            'messages' => [['role' => 'user', 'content' => $task]]
        ]);
        $variants[] = extractTextContent($response);
    }
    $variantText = "Variant 1:\n{$variants[0]}\n\n" .
        "Variant 2:\n{$variants[1]}\n\n" .
        "Variant 3:\n{$variants[2]}\n\n";
    // Compare variants
    $comparison = $client->messages()->create([
        'model' => 'claude-sonnet-4-5',
        'max_tokens' => 2048,
        'messages' => [[
            'role' => 'user',
            'content' => "Task: {$task}\n\n" .
                $variantText .
                "Compare these variants. " .
                "Which is best and why? " .
                "How can the best be improved further?"
        ]]
    ]);
    $analysis = extractTextContent($comparison);
    // Synthesize the best version. The task and variants are sent again because each
    // API call is stateless: the model only sees what is in this request.
    $best = $client->messages()->create([
        'model' => 'claude-sonnet-4-5',
        'max_tokens' => 2048,
        'messages' => [[
            'role' => 'user',
            'content' => "Task: {$task}\n\n" .
                $variantText .
                "Analysis: {$analysis}\n\n" .
                "Create the best possible version " .
                "incorporating insights from all variants."
        ]]
    ]);
    return extractTextContent($best);
}
Increase critique depth each iteration:
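function deepReflection($client, $output, $task) {
    $levels = [
        1 => 'Quick surface-level review',
        2 => 'Detailed analysis of key aspects',
        3 => 'Expert-level deep critique'
    ];
    $refined = $output;
    foreach ($levels as $level => $instruction) {
        $reflection = $client->messages()->create([
            'model' => 'claude-sonnet-4-5',
            'max_tokens' => 1024 * $level, // deeper levels get a larger token budget
            'messages' => [[
                'role' => 'user',
                'content' => "Task: {$task}\n\n" .
                    "Output: {$refined}\n\n" .
                    "Level {$level} review: {$instruction}\n\n" .
                    "Give your critique, then a line 'IMPROVED VERSION:' " .
                    "followed by only the improved output."
            ]]
        ]);
        $text = extractTextContent($reflection);
        // Keep only the improved version so the critique is not fed into the next level
        $pos = strripos($text, 'IMPROVED VERSION:');
        $refined = $pos === false
            ? $text
            : trim(substr($text, $pos + strlen('IMPROVED VERSION:')));
    }
    return $refined;
}
Quality thresholds on the 1-10 scale:
$thresholds = [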
    'minimum' => 6,     // Below this = major issues
    'acceptable' => 7,  // Okay to use
    'good' => 8,        // High quality
    'excellent' => 9    // Outstanding
];
Stopping rules for the reflection loop:
$config = [
    'max_iterations' => 3,     // Hard limit
    'target_score' => 8,       // Stop if reached
    'min_improvement' => 0.5,  // Stop if progress stalls
    'timeout_seconds' => 300   // Time limit
];
Keeping reflection within a cost budget:
function managedReflection($client, $task, $budget) {
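    // generate(), reflect(), refine() and estimateCost() are placeholders;
    // estimateCost() is assumed to return the dollar cost of producing the given text.
    $output = generate($client, $task);
    $cost = estimateCost($output); // the initial generation counts toward the budget too
    $iterations = 0;
    while ($cost < $budget && $iterations < 5) { // hard cap of 5 iterations
        $reflection = reflect($client, $output);
        $cost += estimateCost($reflection);
        if ($cost >= $budget) {
            break; // no budget left for a refinement pass
        }
        // A refinement can still push the total slightly past $budget
        $output = refine($client, $output, $reflection);
        $cost += estimateCost($output);
        $iterations++;
    }
    return ['output' => $output, 'cost' => $cost, 'iterations' => $iterations];
}
API design review:
// Generate API design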
// Reflect on consistency, REST principles, security
// Refine based on best practices
Test generation:
// Generate test cases
// Reflect on coverage, edge cases, maintainability
// Add missing tests
Documentation:
// Write documentation
// Reflect on clarity, completeness, examples
// Improve based on feedback
Query optimization:
// Write query
// Reflect on performance, indexes, complexity
// Optimize based on analysis
Good Use Cases:
✅ Code quality is critical (production systems)
✅ Output will be used by others (documentation, APIs)
✅ Errors are costly (financial, safety-critical)
✅ Learning/improvement over time is valuable
✅ Multiple quality dimensions matter
Poor Use Cases:
❌ Simple, straightforward tasks
❌ First-pass exploratory work
❌ Time/cost very constrained
❌ Output is temporary/disposable
❌ Quality bar is low
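Whether time or cost is "very constrained" is easier to judge with a number in hand. Without an early stop, a run makes one generation call plus one reflection call and one refinement call per iteration. Below is a rough sketch of that arithmetic. `callCost()` and `reflectionRunCost()` are hypothetical helpers, and the per-million-token prices are illustrative values that you should replace with current pricing for your model. Real token counts are reported in the `usage` field of each Messages API response:

```php
<?php
declare(strict_types=1);

// Illustrative USD prices per million tokens; verify against current pricing for your model.
const INPUT_PRICE_PER_MTOK = 3.00;
const OUTPUT_PRICE_PER_MTOK = 15.00;

/** Dollar cost of a single API call, given its input and output token counts. */
function callCost(int $inputTokens, int $outputTokens): float
{
    return ($inputTokens * INPUT_PRICE_PER_MTOK + $outputTokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000;
}

/**
 * Cost of a generate/reflect/refine run when every iteration runs (no early stop):
 * one generation call plus a reflection and a refinement call per iteration,
 * assuming every call uses roughly the same number of tokens.
 */
function reflectionRunCost(int $maxIterations, int $inputTokensPerCall, int $outputTokensPerCall): float
{
    $calls = 1 + 2 * $maxIterations;
    return $calls * callCost($inputTokensPerCall, $outputTokensPerCall);
}

// A ~1,500-token prompt with a ~800-token reply
printf('Single pass: $%.4f' . PHP_EOL, callCost(1500, 800));          // Single pass: $0.0165
printf('3 rounds:    $%.4f' . PHP_EOL, reflectionRunCost(3, 1500, 800)); // 3 rounds:    $0.1155
```

Three rounds mean seven calls instead of one. In practice, reflection and refinement prompts also carry the previous output and critique, so their input token counts tend to exceed those of the first call.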
Track improvement metrics:
$metrics = [
    'initial_score' => 6,
    'final_score' => 9,
    'improvement' => 3,
    'iterations' => 2,
    'time_seconds' => 45,
    'cost_dollars' => 0.08,
    'value_gained' => 'high'
];
Before moving on, make sure you understand:
- Generate-Reflect-Refine loop structure
- How to define quality criteria
- Different types of reflection prompts
- Targeted refinement techniques
- When reflection adds value vs overhead
- How to set iteration limits and thresholds
- Multi-aspect and comparative reflection
- Cost-benefit trade-offs
You've mastered Reflection and Self-Critique! But what if we need multiple specialized agents working together?
Tutorial 11: Hierarchical Agents →
Learn how to build master-worker agent hierarchies for complex tasks!
Run the complete working example:
php tutorials/10-reflection/reflection_agent.php
The script demonstrates:
- ✅ Generate-Reflect-Refine loops
- ✅ Quality assessment with scoring
- ✅ Iterative improvement cycles
- ✅ Code review with reflection
- ✅ Convergence detection
- ✅ Multi-round refinement
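The convergence detection listed above is not shown in the tutorial text. Here is a minimal, self-contained sketch that uses the stopping rules from the `$config` example (`max_iterations`, `target_score`, `min_improvement`, `timeout_seconds`). It takes generate, reflect and refine as callables, so the loop can be tested with stubs instead of API calls. `runReflectionLoop()` is a hypothetical name, and the reflect callable is assumed to return a 1-10 score:

```php
<?php
declare(strict_types=1);

/**
 * Generate-Reflect-Refine loop with explicit stopping rules (a sketch).
 *
 * $generate(): string                       produces the first draft
 * $reflect(string $output): int|float       scores an output on a 1-10 scale
 * $refine(string $output, $score): string   returns an improved output
 */
function runReflectionLoop(callable $generate, callable $reflect, callable $refine, array $config): array
{
    $start = microtime(true);
    $output = $generate();
    $score = $reflect($output);
    $scores = [$score];
    $iterations = 0;
    $lastImprovement = INF;

    while (true) {
        // Check every stopping rule before paying for another refine/reflect pair
        if ($score >= $config['target_score'])             { $reason = 'target_score'; break; }
        if ($iterations >= $config['max_iterations'])      { $reason = 'max_iterations'; break; }
        if ($lastImprovement < $config['min_improvement']) { $reason = 'stalled'; break; }
        if (microtime(true) - $start >= $config['timeout_seconds']) { $reason = 'timeout'; break; }

        $candidate = $refine($output, $score);
        $candidateScore = $reflect($candidate);
        $scores[] = $candidateScore;
        $iterations++;
        $lastImprovement = $candidateScore - $score;

        // Keep the refinement only if it scored at least as well as the current output
        if ($candidateScore >= $score) {
            $output = $candidate;
            $score = $candidateScore;
        }
    }

    return ['output' => $output, 'score' => $score, 'scores' => $scores,
            'iterations' => $iterations, 'stop_reason' => $reason];
}

$config = ['max_iterations' => 3, 'target_score' => 8, 'min_improvement' => 0.5, 'timeout_seconds' => 300];

// Stubs in place of API calls: each refinement appends '+', and the stub reviewer
// scores 6 plus one point per '+'.
$result = runReflectionLoop(
    fn() => 'draft',
    fn(string $out) => 6 + substr_count($out, '+'),
    fn(string $out, $score) => $out . '+',
    $config
);
echo $result['stop_reason'], ' after ', $result['iterations'], " refinements\n"; // target_score after 2 refinements
```

Rejecting a refinement that scores lower keeps the loop from drifting away from a better earlier draft. A regressing round also falls below `min_improvement`, so the loop stops instead of spending more calls.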
- Reflection improves quality - Self-evaluation catches issues early
- Iterate toward quality - Multiple rounds often beat a single pass, though gains usually shrink with each round
- Define clear criteria - Know what "good" looks like
- Target improvements - Fix specific issues, don't regenerate blindly
- Balance cost vs quality - Every iteration adds cost, and extra rounds pay off only while scores are still improving
- Combine with other patterns - Reflection + ReAct, Reflection + Planning
- Not always needed - Simple tasks don't benefit from reflection
- Measure improvement - Track before/after to validate value
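Measuring improvement needs little extra code with `reflectAndRefine()`, because its `history` array already records one score per reflection pass. Here is a minimal sketch that assumes that array shape and the `$thresholds` bands defined earlier, and derives the before/after fields of the `$metrics` example. Time and cost would have to be tracked separately. `summarizeHistory()` and `scoreBand()` are hypothetical helpers. Note that `history` holds the last *recorded* score, so an unscored final refinement is not reflected in it:

```php
<?php
declare(strict_types=1);

/** Maps a 1-10 score onto the named bands from the $thresholds example. */
function scoreBand(int $score, array $thresholds): string
{
    arsort($thresholds); // check the highest band first
    foreach ($thresholds as $band => $minimum) {
        if ($score >= $minimum) {
            return $band;
        }
    }
    return 'below_minimum';
}

/** Summarizes a reflectAndRefine() history: first vs. last recorded score and the final band. */
function summarizeHistory(array $history, array $thresholds): array
{
    if ($history === []) {
        throw new InvalidArgumentException('History is empty');
    }
    $initial = $history[array_key_first($history)]['score'];
    $final = $history[array_key_last($history)]['score'];

    return [
        'initial_score' => $initial,
        'final_score' => $final,
        'improvement' => $final - $initial,
        'iterations' => count($history), // number of reflection passes
        'final_band' => scoreBand($final, $thresholds),
    ];
}

$thresholds = ['minimum' => 6, 'acceptable' => 7, 'good' => 8, 'excellent' => 9];
$history = [
    ['iteration' => 1, 'score' => 6],
    ['iteration' => 2, 'score' => 9],
];
print_r(summarizeHistory($history, $thresholds));
```

Logging this summary for every run makes it possible to check whether reflection is actually paying for itself on a given task type.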
- Reflexion: Language Agents with Verbal Reinforcement Learning - Shinn et al., 2023
- Self-Refine: Iterative Refinement with Self-Feedback - Madaan et al., 2023
- Constitutional AI: Harmlessness from AI Feedback - Bai et al., 2022
- Tutorial 5: Advanced ReAct - Combines reflection with ReAct
- Tutorial 8: Tree of Thoughts - Explores alternatives
- Tutorial 9: Plan-and-Execute - Systematic execution
Try implementing reflection for:
- Code Review - Generate, review security/performance, improve
- Writing - Draft → Critique → Revise (3 rounds)
- Design - Propose solution → Challenge assumptions → Refine
- Testing - Generate tests → Check coverage → Add missing cases
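For the code-review exercise, note that `generateCodeWithReflection()` calls a `containsIssues()` helper that the tutorial does not define. Guessing from free-form review text is unreliable. One option is to add an instruction such as "Finish with a final line: VERDICT: PASS or VERDICT: FAIL" to the review prompt and then parse that verdict. This is a sketch under that assumption, not the tutorial's own helper:

```php
<?php
declare(strict_types=1);

/**
 * Decides whether a code review asks for changes.
 *
 * Assumes the review prompt asked for a final "VERDICT: PASS" or "VERDICT: FAIL" line.
 * The last verdict in the text wins (the reviewer may quote the format earlier);
 * if none is found, it returns true so the code still gets a refinement pass.
 */
function containsIssues(string $reviewText): bool
{
    if (!preg_match_all('/VERDICT:\s*(PASS|FAIL)\b/i', $reviewText, $matches)) {
        return true;
    }
    return strtoupper(end($matches[1])) === 'FAIL';
}

var_dump(containsIssues("No injection risks found.\nVERDICT: PASS")); // bool(false)
var_dump(containsIssues("Unescaped SQL on line 4.\nVERDICT: FAIL"));  // bool(true)
```

Defaulting to `true` when no verdict is found accepts an occasional unnecessary refinement in exchange for never skipping a review that did find problems.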
Issue: Reflection doesn't improve output
- Solution: Make criteria more specific, provide examples of good/bad
Issue: Too many iterations without convergence
- Solution: Set a realistic target score, stop when improvement falls below a minimum, limit iterations, check criteria validity
Issue: High cost for marginal improvement
- Solution: Reduce iterations, raise the minimum-improvement threshold or lower the target score so the loop stops sooner, use a cheaper model
Issue: Reflection identifies same issues repeatedly
- Solution: Be more explicit in refinement prompts, provide templates
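Repeated issues can also be spotted automatically. This sketch assumes each reflection has already been reduced to a list of issue strings, for example by the `extractIssues()` step from the core pattern. It normalizes each issue and reports the ones that reappear across rounds, so they can be called out explicitly in the next refinement prompt. `normalizeIssue()` and `findRepeatedIssues()` are hypothetical helpers. Exact matching after normalization is a deliberately crude similarity test, and it only handles ASCII wording:

```php
<?php
declare(strict_types=1);

/** Lowercases an issue and collapses punctuation/whitespace so near-identical wording compares equal. */
function normalizeIssue(string $issue): string
{
    return trim((string) preg_replace('/[^a-z0-9]+/', ' ', strtolower($issue)));
}

/**
 * @param array<int, array<int, string>> $rounds Issues reported in each reflection round, oldest first
 * @return array<string, int> Normalized issue => number of rounds it appeared in (2+ only)
 */
function findRepeatedIssues(array $rounds): array
{
    $counts = [];
    foreach ($rounds as $issues) {
        // Count each issue at most once per round
        foreach (array_unique(array_map('normalizeIssue', $issues)) as $issue) {
            $counts[$issue] = ($counts[$issue] ?? 0) + 1;
        }
    }
    return array_filter($counts, fn(int $n) => $n >= 2);
}

$rounds = [
    ['Missing input validation.', 'No docblock'],
    ['missing input validation', 'Unclear variable names'],
    ['Missing input-validation!', 'No docblock'],
];
print_r(findRepeatedIssues($rounds));
```

The repeated entries can then go at the top of the refinement prompt, for example as "These issues were raised in earlier rounds and are still unresolved: ...", which is one concrete way to make the refinement prompt more explicit.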