Replies: 22 comments 20 replies
I'll try to respond later in more detail. But one critical question for me is what we are trying to achieve with this repo. Is it merely a rich collection of how people have used Mesa, or should each example demonstrate some aspect of the Mesa library in greater detail than can be done by the curated examples in the core repo? Clarifying what we try to achieve with mesa-examples first will probably help answer some of the more detailed questions raised by @EwoutH. If, as @EwoutH also suggests, each example should demonstrate something genuinely useful about Mesa, then it is essential that the claimed contribution be clear. Moreover, that can be grounds to remove examples that merely duplicate rather than add.
This comment was marked as off-topic.
Great to see this formalized; a lot of these pain points are exactly what I ran into building the four LLM examples (#360, #363, #372, #378).

**On example lifecycle.** The incubator → verified → showcase progression makes sense. One thing I'd add: for LLM examples specifically, "verified" is tricky because they require API keys and are non-deterministic. Maybe LLM examples need a slightly different verification standard, e.g. structural checks (does it initialize, does the step loop complete with a mock) rather than output reproducibility.

**On README as mini-paper.** I started doing this naturally for the LLM examples: each has a background section, model description, and key findings (e.g. LLM agents producing weaker segregation in Schelling, or flatter epidemic curves in SIR due to behavioral heterogeneity). A standardized template would have saved time and made them more consistent across examples.

**On CI for LLM examples.** The biggest gap I see is that LLM examples can't run in standard CI without API keys. One approach: a lightweight "smoke test" mode where models run with mock LLM responses, testing that the Mesa machinery works even if the LLM outputs are canned. This keeps CI useful without requiring secrets in the pipeline.

**On metadata.** YAML feels right: it's already familiar in the Python ecosystem and readable without tooling. For required fields, I'd keep it minimal: title, authors, domain, mesa_version_min, complexity. Everything else optional.
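A minimal sketch of what such a smoke test could look like. The `MockLLM` class and the `llm=` constructor argument are hypothetical, not the examples' actual wiring; the point is only to illustrate the structural-check idea:

```python
class MockLLM:
    """Canned stand-in for a real LLM client; no API key, fully deterministic."""

    def __init__(self, response="stay"):
        self.response = response
        self.calls = 0

    def complete(self, prompt):
        # Record the call so the smoke test can assert the LLM was exercised.
        self.calls += 1
        return self.response


def smoke_test(model_factory, steps=5):
    """Structural check: the model builds and steps N times with a mock LLM.

    Verifies the Mesa machinery (init, step loop) without asserting anything
    about the canned LLM output itself.
    """
    model = model_factory(llm=MockLLM())
    for _ in range(steps):
        model.step()
    return model
```

Output reproducibility is deliberately out of scope here; the test passes as long as the model initializes and the step loop completes with canned responses.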
Hey @EwoutH, my approach solves almost all of the problems you've mentioned so far, except for the discoverability problem.

My approach includes:

Key benefits of this approach:

Here's my plan's workflow chart:
Following up on my earlier comment: after reading @quaquel's question and the responses so far, I want to engage more concretely with the open questions.

**On @quaquel's question: what is mesa-examples for?** I think the answer is both, and the lifecycle system is precisely how to hold both without contradiction. The incubator tier is the rich collection: open to anyone, low bar, shows how people use Mesa in practice. Verified and above are the curated layer: editorially selected, demonstrating something genuinely useful about Mesa APIs or ABM methodology. The status makes the distinction visible without excluding anyone. The criterion for promotion from incubator → verified could be: does this example demonstrate a Mesa feature or modeling pattern not already covered by another verified example? That gives a concrete, reviewable bar and creates grounds to decline promotion (not rejection of the PR, just of the status upgrade) when there is genuine overlap.

**On where status labels live (responding to @EwoutH).** A metadata file is the right answer: it's the only persistent, version-controlled, machine-readable place. I'd propose a minimal schema:

```yaml
title: LLM Schelling Segregation
authors:
  - abhinavk0220
domain:
  - social-dynamics
  - segregation
complexity: intermediate   # beginner / intermediate / advanced
mesa_version_min: "3.0"
status: incubator          # incubator / verified / showcase / deprecated
owner: null                # required for verified and above
llm_required: false        # flag for examples needing API keys
```

GitHub labels can mirror the metadata status as a view, helpful for filtering PRs, but the metadata file stays the source of truth.

**On CI-derived vs author-declared compatibility (responding to @quaquel).** The author declares `mesa_version_min`; CI verifies it. The two are complementary rather than competing.

**On CI for LLM examples.** Standard CI cannot run LLM examples without API keys, but the Mesa machinery can still be tested. A mock LLM responder that returns a fixed string lets you verify: does the model initialize correctly, does the step loop complete?

**On preventing perfunctory peer reviews.** The review template should require the reviewer to complete a sentence: "I ran the model for N steps and observed [specific emergent behavior]." Without evidence of actually running it, the review is flagged as incomplete by the PR template checklist. This is low-friction (one sentence) but creates accountability: you cannot fake having run a model without knowing what it does. For LLM examples, the bar adjusts: "I ran the model with [model name] for N steps and the agents produced responses consistent with [expected reasoning pattern]."

**On the minimum viable README for incubator.** Two sections required: (1) what does this model do, in one paragraph, and (2) how to run it, with exact commands. Everything else (background, results, references) is required for verified, encouraged for incubator. This keeps the entry bar low while ensuring even the simplest submission is usable by a newcomer.

**On ownership commitment.** A realistic ownership commitment for verified examples: respond within two weeks when a CI-opened issue tags you, and either fix it or explicitly hand off. That is it: no ongoing maintenance required beyond being reachable. If an owner goes silent for 30 days on a tagged issue, the example is flagged for adoption and demoted if nobody picks it up.

Happy to prototype this.
From fixing PR #383 (aco_tsp path bug) and reviewing PR #382 (hex_snowflake visualization failure), I can confirm the silent breakage problem is real. Both examples failed for different reasons — one a path issue, one a removed API — and neither was visible without actually running them locally.
Hi @EwoutH, thank you for writing this up so clearly — it maps almost exactly to the problems I have been thinking about while preparing my GSoC proposal for this project.
Coming at this from the peer review side: I've reviewed six PRs this week and a pattern kept coming up, reviewers reading the code but not running it. Two PRs had runtime-breaking bugs that were invisible from the diff alone but immediately obvious on execution. I agree with @abhinavk0220 that requiring one sentence about observed behavior when running the model is the right fix.

On the README-as-mini-paper idea: one concrete CI addition nobody has mentioned is testing that README code snippets actually execute. PR #389 had a quick-start snippet referencing three model attributes that don't exist in model.py. That's a whole class of breakage CI could catch automatically.

One open question I'd love a steer on: for LLM examples, are mesa-llm abstractions or direct API calls the preferred pattern going forward? Happy to align my work with whatever direction makes most sense for the repo.
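A sketch of what such a snippet check could look like: extract each fenced python block from a README and try to execute it, collecting failures instead of stopping at the first. This is an illustration, not an existing CI script; real snippets would need dependency handling and a richer execution namespace:

```python
import re

# Build the fence marker programmatically to match fenced python blocks.
TICKS = "`" * 3
FENCE_RE = re.compile(TICKS + r"python\n(.*?)" + TICKS, re.DOTALL)


def check_readme_snippets(readme_text):
    """Execute every fenced python snippet found in a README.

    Returns a list of (snippet_index, error_repr) pairs so CI can report
    all broken snippets at once.
    """
    failures = []
    for i, code in enumerate(FENCE_RE.findall(readme_text)):
        try:
            # Fresh namespace per snippet: snippets must be self-contained.
            exec(compile(code, f"<snippet {i}>", "exec"), {})
        except Exception as exc:
            failures.append((i, repr(exc)))
    return failures
```

A quick-start snippet referencing a nonexistent attribute would surface here as a `NameError` or `AttributeError` tied to its snippet index.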
I was thinking a bit about the documentation, mini-paper, and metadata. Ideally it's one thing that is as easy to read and write for humans as for machines. Rather than maintaining a separate metadata file, the README itself could carry the metadata as frontmatter:

```markdown
---
title: LLM Schelling Segregation
authors:
  - abhinavk0220
domain:
  - social-dynamics
  - segregation
complexity: intermediate
mesa_version_min: "3.0"
status: incubator
owner: null
keywords: [LLM, segregation, behavioral heterogeneity]
---

## Abstract

One-paragraph summary of what this model does and why it's interesting.

## Model Description

Agents, rules, space, parameters...

## How to Run

Exact commands to get it working.

## Results & Discussion

...
```

This is a well-established pattern (Hugo, Jekyll, Quarto, and Pandoc all use it), so contributors will likely recognize it, and tooling already exists. One file means no drift between metadata and documentation.
This is a much cleaner design — one file, no drift, familiar pattern. The frontmatter approach also makes the CI validation script simpler: parse the frontmatter block, validate required fields for the declared status level, then treat everything below the closing `---` as the human-readable documentation. No separate file to keep in sync. For CI purposes, should `entry_point` be derivable from convention (e.g., always `run.py` or `app.py`) rather than declared in frontmatter? That would reduce the required fields further.
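To make the validation step concrete, a minimal validator along these lines could look as follows. The per-status required-field sets and the flat `key: value` parsing are simplifying assumptions; a real script would hand the frontmatter block to a YAML parser to handle lists like `authors`:

```python
REQUIRED = {
    # Assumed field sets per status tier; the actual policy is still open.
    "incubator": {"title", "authors"},
    "verified": {"title", "authors", "mesa_version_min", "owner"},
    "showcase": {"title", "authors", "mesa_version_min", "owner"},
}


def parse_frontmatter(text):
    """Split a README into (metadata dict, body).

    Handles flat `key: value` pairs only; nested YAML would need a real parser.
    """
    if not text.startswith("---\n"):
        raise ValueError("README has no frontmatter block")
    head, _, body = text[4:].partition("\n---\n")
    meta = {}
    for line in head.splitlines():
        if ":" in line and not line.startswith((" ", "-")):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip() or None
    return meta, body


def missing_fields(meta):
    """Fields the declared status tier requires but the frontmatter lacks."""
    status = meta.get("status") or "incubator"
    present = {key for key, value in meta.items() if value}
    return sorted(REQUIRED.get(status, set()) - present)
```

CI would fail the check when `missing_fields` returns anything non-empty, with the list itself as the error message.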
Hey @EwoutH, I have a suggestion about the labels. Right now you are thinking of using this pattern:

This doesn't seem very user-friendly; it's complex from the perspective of someone without a CS background. I think we should keep things as simple as possible for users. The pattern I suggest is this:

Let me know if you think the same way.
The frontmatter approach makes sense from a contributor perspective. When I submitted PR #383, there was no standard structure — a single README with frontmatter would have made the contribution process clearer and reduced the chance of drift between metadata and docs.
Hey @EwoutH @quaquel, while experimenting with my ideas I noticed that managing the dependencies of different models in CI can get messy, so I had an idea.

I have documented the experiment and workflow that I tested, and I'll provide the link here so it's easier to review the approach and results.
Hi @EwoutH, thanks for sharing your vision for this. I'm working on these pillars: Automation and CI, Metadata & Discoverability, and Ownership & Graceful Degradation.

We can tie the CI, the CODEOWNERS file, and the metadata together to automate this. Let me know what you think.
@Nandha-kumar-S raises the core issue here — the goal of CI shouldn't be to enforce a shared environment, but to verify that each example works correctly within its own declared dependencies. The natural solution is a per-example `requirements.txt`. This also makes the declared Mesa compatibility testable in practice.

From fixing #383 and reviewing other examples locally, I think "runs for N steps without error" is the minimum viable CI check — that single gate would have caught most of the silent breakage I've seen. Convention over configuration for entry points (defaulting to `app.py`, then `run.py`) keeps the required metadata minimal.

Happy to prototype the matrix workflow — this is something I've been thinking through for my GSoC proposal on this project.
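One way to wire the per-example idea into CI is a small setup script that discovers every example directory declaring its own `requirements.txt` and emits a JSON matrix in the `{"include": [...]}` shape a GitHub Actions workflow can feed into `strategy.matrix`. A sketch, assuming a flat `examples/<name>/requirements.txt` layout (the layout and function name are assumptions):

```python
import json
from pathlib import Path


def build_matrix(examples_root):
    """JSON matrix of example dirs that declare their own requirements.txt.

    Each entry becomes one isolated CI job: install that example's pinned
    dependencies, then run its entry point.
    """
    include = [
        {"example": req.parent.name}
        for req in sorted(Path(examples_root).glob("*/requirements.txt"))
    ]
    return json.dumps({"include": include})
```

A setup job would run this once, write the result to `$GITHUB_OUTPUT`, and fan out one install-and-run job per example, so no example's dependencies ever touch another's.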
The per-example isolation point is exactly right — a matrix job where each example installs its own declared dependencies is the clean solution. It also makes the frontmatter mesa_version_min field meaningful: CI uses it as the floor for the test matrix, so you know the example passes on Mesa 3.x and you can track exactly where it breaks.
@aniketgit-hub101 the auto-generation idea is a good addition — it removes the last manual step for contributors. One thing worth thinking through: auto-generating from imports gives you direct dependencies, but not pinned versions. So CI would install the latest by default, which brings back the version-conflict problem unless we also pin at generation time. A practical approach: auto-generate the requirements.txt on first contribution with pinned versions from the contributor's environment (something like `pip freeze` scoped to the example's imports). The frontmatter mesa_version_min then acts as a sanity check — if the auto-generated requirements.txt pins Mesa 3.x but mesa_version_min says 4.0, CI can flag the inconsistency automatically.
That's the right refinement — pinned versions from the contributor's environment solve the reproducibility problem cleanly. The `pip freeze`-scoped-to-imports approach is practical: contributors get a working lockfile automatically, and the rare case where they need to adjust it is explicit rather than hidden.
The single-file frontmatter approach is the right call: less surface area for drift, familiar to anyone who has used Hugo/Jekyll, and the CI validation logic becomes trivially simple: parse frontmatter, validate required fields against status level, done. To make this concrete, here is what the frontmatter would look like for one of my existing LLM examples (#363):

```yaml
---
title: LLM Schelling Segregation
authors:
  - abhinavk0220
domain: [social-dynamics, segregation]
complexity: intermediate
mesa_version_min: "3.0"
status: incubator
owner: null
llm_required: true
entry_point: app.py
---
```

**On entry_point: convention vs. declared (responding to @aniketgit-hub101's question).** Convention-first: CI looks for `app.py`, then `run.py`, and consults a declared `entry_point` only for unconventional layouts.

**On review depth scaling with status level.** This is the open question nobody has answered concretely yet, and I have direct experience from the four LLM PRs I submitted (#360, #363, #372, #378). The key insight is that the checklist scales with status level, not the process: same three-stage flow (self-review → peer → maintainer) at every tier, just with more boxes ticked as the bar rises. This keeps the contribution path predictable for everyone.
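The convention-first resolution could be sketched like this; the lookup order (`app.py`, then `run.py`, then a declared `entry_point`) follows the comment, while the function name and signature are assumptions:

```python
from pathlib import Path

# Conventional entry-point names, checked in order before any declaration.
CONVENTIONAL_NAMES = ("app.py", "run.py")


def resolve_entry_point(example_dir, declared=None):
    """Convention first; a frontmatter-declared entry_point is a fallback
    for unconventional layouts only."""
    example_dir = Path(example_dir)
    for name in CONVENTIONAL_NAMES:
        candidate = example_dir / name
        if candidate.is_file():
            return candidate
    if declared:
        candidate = example_dir / declared
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no entry point found in {example_dir}")
```

With this resolution order, most examples never need the `entry_point` field at all, which keeps the required frontmatter minimal.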
The convention-first entry_point resolution (app.py → run.py → declared) is the right call — it matches how contributors actually structure examples without requiring them to know the system exists.
Wanted to share a concrete output that connects back to this discussion. I built the Ratchet Effect model (@EwoutH's issue #249, open since March 2025) — PR #458. Remote work is the domain, demonstrating path dependency through asymmetric lock-in dynamics. The README uses the frontmatter schema proposed here:

```yaml
---
title: Ratchet Effect — Remote Work
authors:
  - abhinavk0220
domain: [labor-economics, behavioral, social-dynamics]
complexity: intermediate
mesa_version_min: "3.0"
status: incubator
owner: abhinavk0220
llm_required: false
entry_point: app.py
---
```

A few things came up in practice while writing it. Happy to use PR #458 as a guinea pig for iterating on the contribution process and metadata system, if useful.
Update — Validator 2 (Declared Environment) prototype complete

Since submitting my GSoC proposal on March 31, I've completed the prototype for Validator 2, which was the remaining open piece at the time of submission. Validator 2 now:

Both validators are working end-to-end. You can find the updated prototype here: https://github.com/Tushar1733/mesa-examples/blob/main/scripts/declared_validate_examples.py

Please comment below with any concerns. @EwoutH @quaquel
Explored with, and final post written up by, Claude 4.6 Opus.
What we're trying to do
Mesa-examples has been neglected since the core examples moved to the main repo. Examples break silently, documentation quality varies wildly, and new users struggle to find what they need. We want to turn mesa-examples into a well-maintained, discoverable, and contributor-friendly collection that stays healthy as Mesa evolves.
Goals
Key directions
Example lifecycle and status
It might be useful for every example to have an explicit status that reflects its maturity: something like incubator (works but not yet polished), verified (reviewed, documented, actively maintained), and showcase (editorially selected as exemplary), plus deprecated for examples that are no longer maintained. Status makes quality visible to users and sets clear expectations for contributors.
Open questions:
Ownership
Every verified example could have an explicit owner: the person responsible for keeping it healthy. Not doing all the work, but responding when something needs attention. Incubator examples wouldn't require an owner, keeping the barrier to entry low. When an owner steps away, there should be a clear process: flag it, seek adoption, demote if nobody picks it up.
Open questions:
A structured contribution process
A structured contribution process could offer a gentler path towards high-quality examples. It could consist of three review stages for PRs: author self-review (working through a checklist, demonstrating understanding), peer/collaborator review (running the model, exploring behavior, asking substantive questions), and maintainer approval (confirming the process was followed, making the editorial call). This distributes load and builds community skills — especially important in the era of AI-generated code, where self-review is how contributors demonstrate they understand what they're submitting. See the review guidelines for an initial draft policy.
Open questions:
README as mini-paper
Example READMEs could follow a structure inspired by academic papers: abstract, background, model description (agents, rules, space, parameters), how to run, results and discussion, and references. This serves multiple audiences: users browsing the gallery, learners working through an example, contributors using it as a template, and academics wanting something citable. The completeness of these sections can scale with example status.
Open questions:
Metadata
Each example could carry a small metadata file enabling machine-readable discoverability and automated validation. We've been converging on fields like title, abstract, authors, domain, complexity, keywords, and Mesa version compatibility. The guiding principle: require only what you must, automate what you can, and never let information go silently stale.
Open questions:
Automation and CI
Automated validation is how we could keep the collection healthy without constant maintainer attention. This means CI on PRs (does it run, is metadata valid), scheduled CI against current Mesa (catch breakage early), and pre-release testing against Mesa release candidates. When something breaks, it should become a visible, tracked issue — not something hiding in a log.
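As an illustration of the "visible, tracked issue" idea, a scheduled job could pair a subprocess smoke run with a helper that shapes failures into issue payloads. Both function names and the issue format are hypothetical, and examples that launch a visualization server would additionally need a headless or step-limited mode:

```python
import subprocess
import sys


def smoke_run(entry_point, timeout=120):
    """Run one example's entry point in a fresh subprocess.

    Returns (ok, stderr): ok is True only on a zero exit code, and stderr
    carries the traceback CI should surface on failure.
    """
    proc = subprocess.run(
        [sys.executable, str(entry_point)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode == 0, proc.stderr


def failure_issue(example, stderr):
    """Shape a failed run into the payload a scheduled workflow would use
    to open (or update) a tracked issue instead of burying the log."""
    return {
        "title": f"[scheduled-ci] {example} fails against current Mesa",
        "body": "Automated run failed. Last stderr output:\n\n" + stderr[-2000:],
    }
```

The workflow would call `smoke_run` for each example in the matrix and, on failure, post `failure_issue(...)` through the GitHub API so breakage shows up where maintainers and owners actually look.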
Open questions:
What we're looking for in proposals
A strong proposal doesn't need to address all of the above. It should pick a coherent subset, demonstrate understanding of the tradeoffs involved, and show concrete thinking about implementation. We value proposals that are honest about what's hard and what they don't know, over ones that present everything as solved.
We're particularly interested in: