docs: add scientific references to blog posts

Your Name · Your Name · commit 37f47e3f1c64 · 2025-11-24T20:41:03.000-03:00
diff --git a/docs/_posts/2025-11-14-context-engineering-for-real-codebases.md b/docs/_posts/2025-11-14-context-engineering-for-real-codebases.md
@@ -165,7 +165,7 @@ Chuchu's multi-agent architecture is designed around this principle:
 - Routes to appropriate specialized agent
 
 **Query Agent** (reasoning model)
-- Research and codebase analysis
+- Research and codebase analysis[^1]
 - Reads files, searches patterns
 - Compacts findings into structured output
 - Fresh context for each analysis
@@ -322,4 +322,10 @@ But the foundation is always the same: **manage your context window like your pr
 
 ---
 
+## References
+
+[^1]: Lewis, P., Perez, E., Piktus, A., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. *NeurIPS 2020*. https://arxiv.org/abs/2005.11401
+
+---
+
 *Have questions about context engineering? Join the discussion in [GitHub Discussions](https://github.com/jadercorrea/chuchu/discussions)*
diff --git a/docs/_posts/2025-11-19-model-performance-benchmarks.md b/docs/_posts/2025-11-19-model-performance-benchmarks.md
@@ -11,7 +11,8 @@ tags: [benchmarks, performance, models, comparison]
 
 *Updated January 2025*
 
-**Important**: AI models evolve rapidly.
+**Important**: AI models evolve rapidly. Evaluate your models on established coding benchmarks such as HumanEval[^1], SWE-bench[^2], and LiveCodeBench[^3], and keep your setup current by:
+
 1. Testing models with your specific workload
 2. Checking [Groq configurations]({% post_url 2025-11-15-groq-optimal-configs %}) for current recommendations
 3. Exploring [OpenRouter guide]({% post_url 2025-11-16-openrouter-multi-provider %}) for latest models
@@ -102,3 +103,11 @@ chu models search --agent editor openrouter
 ```
 
 See our [detailed configuration guides]({% post_url 2025-11-15-groq-optimal-configs %}) for setup instructions and cost breakdowns.
+
+## References
+
+[^1]: Chen, M., Tworek, J., Jun, H., et al. (2021). Evaluating Large Language Models Trained on Code. *arXiv preprint arXiv:2107.03374*. https://arxiv.org/abs/2107.03374
+
+[^2]: Jimenez, C. E., Yang, J., Wettig, A., et al. (2024). SWE-bench: Can Language Models Resolve Real-World GitHub Issues? *ICLR 2024*. https://arxiv.org/abs/2310.06770
+
+[^3]: Jain, N., Han, K., Gu, A., et al. (2024). LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code. *arXiv preprint arXiv:2403.07974*. https://arxiv.org/abs/2403.07974
diff --git a/docs/_posts/2025-11-20-advanced-context-management.md b/docs/_posts/2025-11-20-advanced-context-management.md
@@ -13,7 +13,7 @@ One of the biggest challenges in AI coding is the **Context Window**.
 
 ## How Chuchu Manages Context
 
-Chuchu uses **Retrieval-Augmented Generation (RAG)** to fetch only relevant information:
+Chuchu uses **Retrieval-Augmented Generation (RAG)**[^1] to fetch only relevant information:
 
 1.  **Project Map**: The `project_map` tool generates a tree-like view of your project structure in ~500 tokens, giving the model a "mental map" of where things are.
 
@@ -91,6 +91,10 @@ Each command starts with fresh context, preventing pollution.
 
 **Adaptive context**: Dynamic context window management based on task complexity and available token budget.
 
+## References
+
+[^1]: Lewis, P., Perez, E., Piktus, A., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. *NeurIPS 2020*. https://arxiv.org/abs/2005.11401
+
 ## Related Posts
 
 - [Context Engineering for Real Codebases]({% post_url 2025-11-14-context-engineering-for-real-codebases %})
diff --git a/docs/_posts/2025-11-22-ml-powered-intelligence.md b/docs/_posts/2025-11-22-ml-powered-intelligence.md
@@ -84,7 +84,7 @@ User Input
     ↓
 TF-IDF Vectorization (1-3 grams)
     ↓
-Logistic Regression
+Logistic Regression[^1]
     ↓
 Confidence Score
     ↓
@@ -292,7 +292,7 @@ $ chu chat
 Pure ML would be faster but less accurate.
 Pure LLM would be more accurate but slower and expensive.
 
-**Hybrid ML + LLM** gives you the best of both worlds:
+**Hybrid ML + LLM**[^2] gives you the best of both worlds:
 - Fast path for confident decisions (80-90% of requests)
 - Smart fallback for edge cases
 - Configurable balance between speed and accuracy
@@ -355,6 +355,12 @@ But the foundation is here today: fast, cheap, accurate routing powered by embed
 
 *Have questions about the ML system? Check out the [full documentation](../ml-features) or ask in [GitHub Discussions](https://github.com/jadercorrea/chuchu/discussions)!*
 
+## References
+
+[^1]: Fan, R. E., Chang, K. W., Hsieh, C. J., Wang, X. R., & Lin, C. J. (2008). LIBLINEAR: A library for large linear classification. *Journal of Machine Learning Research*, 9(Aug), 1871-1874. https://www.jmlr.org/papers/v9/fan08a.html
+
+[^2]: Teerapittayanon, S., McDanel, B., & Kung, H. T. (2016). BranchyNet: Fast inference via early exiting from deep neural networks. *ICPR 2016*. https://arxiv.org/abs/1709.01686
+
 ## See Also
 
 - [Full ML Features Documentation](../ml-features) - Technical deep dive
diff --git a/docs/_posts/2025-11-23-future-of-ai-pair-programming.md b/docs/_posts/2025-11-23-future-of-ai-pair-programming.md
@@ -38,9 +38,15 @@ Imagine this workflow:
 
 ## Chuchu's Roadmap
 
-We are building towards Phase 3.
+We are building towards Phase 3, inspired by recent advances in multi-agent systems[^1][^2].
 -   **Memory**: Long-term memory of your coding style and architectural decisions.
 -   **Proactivity**: Agents that run in the background, running tests and fixing lint errors before you even see them.
 -   **Collaboration**: Agents that can comment on PRs and discuss architecture with other agents.
 
 The goal is not to replace the developer, but to elevate them. You become the **Architect**, and AI becomes your **Engineering Team**.
+
+## References
+
+[^1]: Qian, C., Cong, X., Yang, C., et al. (2023). Communicative Agents for Software Development. *arXiv preprint arXiv:2307.07924*. https://arxiv.org/abs/2307.07924
+
+[^2]: Hong, S., Zheng, X., Chen, J., et al. (2023). MetaGPT: Meta Programming for Multi-Agent Collaborative Framework. *arXiv preprint arXiv:2308.00352*. https://arxiv.org/abs/2308.00352
diff --git a/docs/_posts/2025-11-24-complete-workflow-guide.md b/docs/_posts/2025-11-24-complete-workflow-guide.md
@@ -28,10 +28,10 @@ Traditional AI coding assistants give you code immediately. Sometimes that works
 ❌ No incremental verification  
 ❌ No way to course-correct  
 
-Chuchu's workflow solves this:
+Chuchu's workflow[^1] solves this:
 
 ✅ Research phase builds context  
-✅ Planning ensures coherent approach  
+✅ Planning ensures coherent approach[^2]  
 ✅ Implementation is incremental and verified  
 ✅ You control the pace (interactive or autonomous)
 
@@ -240,4 +240,12 @@ Implementation itself works for any language (LLM-based), but build/test verific
 
 ---
 
+## References
+
+[^1]: Beck, K. (2003). *Test-Driven Development: By Example*. Addison-Wesley Professional. ISBN: 978-0321146533
+
+[^2]: Fowler, M. (2018). *Refactoring: Improving the Design of Existing Code* (2nd ed.). Addison-Wesley Professional. ISBN: 978-0134757599
+
+---
+
 **Questions or issues?** [Open an issue on GitHub](https://github.com/jadercorrea/chuchu/issues)