Performance optimization #72

Open

Julian-Patzner wants to merge 60 commits into main from performance_optimization

Conversation


@Julian-Patzner Julian-Patzner commented Mar 20, 2026

Refactor: Major Performance, Memory, and Parallelization Optimizations

Overview

This PR introduces a massive suite of performance optimizations aimed at significantly reducing memory allocations, eliminating lock contention during multi-threaded execution, ensuring type stability, and accelerating overall simulation and post-processing runtimes.

The core changes include replacing locked logging with lock-free thread-local buffers, transitioning to mutating functions for contact sampling, extensively rewriting the post-processing and analysis pipelines to use in-place operations, and introducing an optimized "dormant state" mechanism for simulations that reach a steady state early, allowing them to bypass heavy epidemiological loops while safely processing future triggers.


Key Changes & Features

1. Simulation Acceleration (Dormant State Optimization)

This PR introduces a highly optimized "Dormant State" to dramatically accelerate the simulation during the tail-end of a pandemic, bypassing heavy calculations when the disease has died out.

  • Steady-State Detection (is_dormant): The simulation actively monitors for a complete mathematical freeze. It checks that the macroscopic disease state is clear (0 exposed, 0 infectious, 0 quarantined) and peeks at the event_queue and tick_triggers to guarantee no external forces are scheduled to act on the current tick.
  • Dormant Stepping (step!): When the simulation detects it is asleep, the step! function instantly bypasses the computationally heavy $O(N)$ loops. It maintains the normal progression of the simulation timeline, guaranteeing that all scheduled events, background processes, and user-defined hooks (such as stepmod and custom loggers) execute uninterrupted.
  • $O(1)$ State Duplication (copy_last_log_state): During a dormant tick, the expensive $O(N)$ log_stepinfo population scan is entirely skipped. Instead, the simulation instantly duplicates the previous tick's state into the loggers in $O(1)$ time.
  • Data Integrity & Wake-Up: The system guarantees zero data loss. The moment a scheduled event or trigger is set to fire (e.g., a newly imported case), is_dormant naturally evaluates to false for that exact tick, and the simulation automatically resumes the full physics loops to accurately capture the state change.
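The detection-plus-fast-path logic described above can be sketched as follows. This is a minimal illustration, not the actual GEMS implementation: `SimState`, `scheduled_ticks`, and all field names here are hypothetical stand-ins.

```julia
# Hypothetical, simplified stand-in for the simulation state; the real
# GEMS struct and field names differ.
struct SimState
    tick::Int16
    exposed::Int
    infectious::Int
    quarantined::Int
    scheduled_ticks::Vector{Int16}  # sorted ticks of pending events/triggers
end

# Dormant only if the epidemic is mathematically frozen AND nothing is
# scheduled to fire on the current tick.
is_dormant(s::SimState) =
    s.exposed == 0 && s.infectious == 0 && s.quarantined == 0 &&
    isempty(searchsorted(s.scheduled_ticks, s.tick))

# The step function branches: O(1) log duplication when dormant,
# full O(N) epidemiological loops otherwise.
function step!(s::SimState, heavy_loops!::Function, copy_last_log_state!::Function)
    is_dormant(s) ? copy_last_log_state!(s) : heavy_loops!(s)
end
```

The wake-up behavior falls out naturally: as soon as the current tick appears among the scheduled ticks, `is_dormant` returns `false` for exactly that tick and the full loops run again.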

2. Lock-Free Parallel Execution & Logging

All event loggers (InfectionLogger, VaccinationLogger, DeathLogger, TestLogger, PoolTestLogger, SeroprevalenceLogger, QuarantineLogger, StateLogger) have been entirely rewritten to eliminate ReentrantLock bottlenecks during multi-threaded execution.

  • Thread-Local Storage: Data is now pushed directly to thread-specific vectors (Vector{Vector{T}} sized to Threads.maxthreadid()).
  • Atomic Counters: Threads.Atomic is now used to safely generate unique IDs across threads without blocking.
  • Data Aggregation: The dataframe() and save() functions now dynamically flatten (vcat) the thread-local arrays upon extraction.
  • Atomic Setting States: Replaced isactive::Bool and explicit locks with isactive::Threads.Atomic{Bool} on all Setting structs.
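The lock-free pattern behind these loggers can be sketched like this. The struct and function names are illustrative, not the actual GEMS logger API; `Threads.maxthreadid()` requires Julia ≥ 1.9.

```julia
using Base.Threads

# Sketch of the lock-free logging pattern: one sub-vector per thread plus
# an atomic counter for unique IDs. Names are hypothetical.
struct EventLogger
    events::Vector{Vector{Int}}   # one sub-vector per thread
    next_id::Threads.Atomic{Int}
end

EventLogger() = EventLogger([Int[] for _ in 1:Threads.maxthreadid()],
                            Threads.Atomic{Int}(1))

# Each thread pushes only to its own sub-vector, so no lock is needed.
function log_event!(lg::EventLogger)
    id = Threads.atomic_add!(lg.next_id, 1)   # returns the previous value
    push!(lg.events[Threads.threadid()], id)
    return id
end

# Aggregation flattens the thread-local vectors only on extraction.
all_events(lg::EventLogger) = reduce(vcat, lg.events)
```

Note that indexing by `Threads.threadid()` is only safe when tasks do not migrate between threads mid-iteration (e.g., under `@threads :static`); that constraint is inherent to the thread-local-storage pattern.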

3. Zero-Allocation Simulation Loops & Structs

  • Thread-Local Buffers: Added present_buffers and contact_buffers to the Simulation struct. These are pre-allocated, thread-specific vectors used during the infection spread phase to prevent continuous array allocations.
  • Mutating Core Methods: Introduced sample_contacts!, individuals!, and present_individuals! which accept a pre-allocated vector to mutate rather than returning newly allocated arrays. Backwards-compatible non-mutating wrappers have been retained.
  • Manual Cumulative Sums: Replaced Categorical distribution allocations in AgeBasedProgressionAssignment with a fast, manual cumulative sum over the stratification matrix that avoids heap allocations.
  • Strict Disease Progression Typing: Refactored the DiseaseProgression constructor to require strictly typed Int16 positional arguments, ensuring type stability at the allocation boundary during the infect! phase.
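Two of these patterns can be sketched briefly: the mutating-buffer contact sampler and the manual cumulative-sum draw that replaces a `Categorical` allocation. Both functions below use hypothetical names and signatures, not the actual GEMS ones.

```julia
using Random

# Sketch of the mutating-buffer pattern: the caller owns a pre-allocated
# buffer; we empty! and refill it instead of allocating a new vector.
function sample_contacts!(buf::Vector{Int}, rng::Xoshiro, present::Vector{Int}, n::Int)
    empty!(buf)
    for _ in 1:n
        push!(buf, present[rand(rng, 1:length(present))])
    end
    return buf
end

# Sketch of drawing a category via a manual cumulative sum over one row of
# a stratification matrix, avoiding a heap-allocated Categorical(...).
function draw_index(rng::Xoshiro, weights::AbstractVector{<:Real})
    r = rand(rng) * sum(weights)
    acc = zero(r)
    for (i, w) in enumerate(weights)
        acc += w
        r <= acc && return i
    end
    return length(weights)   # numerical-safety fallback
end
```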

4. Fast Post-Processing & Analysis Refactoring

  • In-Place DataFrame Ops: Post-processing methods now extensively use mutating operations (leftjoin!, transform!, select!, rename!) and views (subset with view=true) to calculate SI, effective R, and demographics without repeatedly allocating large DataFrames.
  • Fast Contact Surveys: Refactored contact_samples to use pre-allocated column vectors instead of dynamically appending rows to a DataFrame.
  • Precomputed Relationships: household_attack_rates now uses standard vectors and pre-calculated dictionaries for parent->child relationships, removing an expensive nested DataFrame loop. r0_per_county now precomputes secondary cases in a single pass.
  • Binary Search & Views: Optimized rolling_observed_SI to use binary search (searchsortedlast, searchsortedfirst) and array views (@view) instead of sequential lookups (findfirst).
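The binary-search-plus-view idea can be sketched as follows; this is an illustration of the technique, not the actual rolling_observed_SI code, and assumes the tick column is sorted ascending.

```julia
# Extract a rolling window [lo, hi] from a tick-sorted column without
# copying: two O(log n) binary searches plus a view.
function window_view(ticks::Vector{Int}, values::Vector{Float64}, lo::Int, hi::Int)
    first_i = searchsortedfirst(ticks, lo)  # first index with ticks[i] >= lo
    last_i  = searchsortedlast(ticks, hi)   # last index with ticks[i] <= hi
    return @view values[first_i:last_i]     # no allocation of a new array
end
```

Compared with a sequential `findfirst` scan, each lookup drops from O(n) to O(log n), and the `@view` avoids materializing the window.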

5. Type Stability and Dynamic Dispatch Removal

  • Parameterization of Settings: Replaced dynamic settingtype::DataType arguments with parameterized settingtype::Type{T} where {T <: Setting} across analysis functions (setting_age_contacts, contact_samples). Inner simulation loops now explicitly cast retrieved settings (e.g., s = raw_s::settingtype) to enforce type stability and eliminate dynamic dispatch overhead.
  • Concrete RNGs: Replaced generic AbstractRNG arguments with concrete Xoshiro references (and added DEFAULT_GEMS_RNG) across all transmission, progression, and initialization functions.
  • Function Barriers: Introduced internal function barriers (e.g., _get_size, _delete_dangling_for_type!) to handle abstract vector iteration with type-stable, concrete property access.
  • Static Setting Tuples: Added settings_tuple(individual) to replace dynamic settings() calls. It returns a statically typed tuple of setting pairs (Type, ID), heavily speeding up iteration over an individual's active settings.
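The function-barrier pattern mentioned above can be sketched like this. The `Setting`/`Household`/`Office` types below are local toy stand-ins for illustration only (GEMS defines its own setting hierarchy), and `_get_size` here is a simplified analogue.

```julia
# Toy stand-ins: an abstract supertype with two concrete subtypes.
abstract type Setting end
struct Household <: Setting; members::Vector{Int}; end
struct Office    <: Setting; members::Vector{Int}; end

# Inner barrier: by the time this function runs, `s` has a concrete type,
# so the `s.members` access compiles type-stably.
_get_size(s::Setting) = length(s.members)

# Outer loop over a Vector{Setting} (abstract element type): each call to
# _get_size crosses the barrier into type-stable code.
total_size(settings::Vector{Setting}) = sum(_get_size, settings)
```

The outer loop still performs one dynamic dispatch per element, but all the work inside the barrier runs on concretely typed, specialized code.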

6. Initialization & Memory Footprint Improvements

  • Optimized Settings Creation: Rewrote settings_from_population using exact pre-allocation (construct_and_add_settings!) and Counting Sort, making the sorting and grouping of populations into settings substantially faster (Counting Sort runs in linear time versus the O(n log n) of comparison sorts).
  • Fast Population Parsing: Refactored the Population(df::DataFrame) constructor to extract DataFrame columns into a strongly-typed NamedTuple before the loop, bypassing slow row-by-row DataFrame indexing.
  • Precomputed AGS: Added precompute_ags! to calculate and statically assign the AGS (Amtlicher Gemeindeschlüssel) field for all ContainerSettings during initialization, preventing recursive dynamic lookups during runtime.
  • Comorbidities Bitmask: Replaced the heap-allocated comorbidities::Vector{Bool} in the Individual struct with a primitive UInt16 bitmask. This eliminates a vector allocation for every single agent, reducing the simulation's baseline memory footprint and improving cache locality.
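The bitmask idea can be sketched with a few accessor functions; these names are illustrative, not the actual GEMS API. Bit i-1 of the `UInt16` is set iff comorbidity i is present, so up to 16 comorbidities fit into two inline bytes instead of a heap-allocated `Vector{Bool}`.

```julia
# Set bit i (1-based) in the comorbidity mask.
set_comorbidity(mask::UInt16, i::Int)  = mask | (UInt16(1) << (i - 1))

# Query bit i.
has_comorbidity(mask::UInt16, i::Int)  = (mask >> (i - 1)) & 0x1 == 1

# Population count gives the number of comorbidities in one instruction.
count_comorbidities(mask::UInt16)      = count_ones(mask)
```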

Unit-Testing

  • All tests updated to handle flattened thread-local logger vectors.
  • Added specific test suites for settings_tuple outputs, Buffer-aware contact sampling, and recursive setting activation.
  • Added extensive testing for the Simulation Acceleration module, ensuring is_dormant accurately evaluates state, logger updates correctly trip the wake-up triggers, and catch_up_logs! accurately backfills skipped ticks.

Julian-Patzner and others added 30 commits March 24, 2026 09:32
…replaced Random.default_rng() with const Xoshiro
…, barrier function for dynamic dispatching in spread_infection!
@Julian-Patzner Julian-Patzner marked this pull request as draft March 25, 2026 07:54
@Julian-Patzner Julian-Patzner marked this pull request as ready for review March 25, 2026 08:37
@Julian-Patzner Julian-Patzner marked this pull request as draft March 25, 2026 09:49
@Julian-Patzner Julian-Patzner marked this pull request as ready for review March 26, 2026 00:07
Contributor

@JohannesPonge JohannesPonge left a comment


What an update! Great work, Julian! I have added some comments and suggestions that went through my head while reviewing the updates. Half of them can probably be dismissed now that the PR description is there. My major points are:

  • can we still use RNGs other than Xoshiro?
  • Does the user still get the keyword-based interfaces for all places where you changed the default function signatures? (usability is extremely important)
  • Better too many / too-detailed docstrings than none or too little info
  • The fast forwarding is smart, but I would put it into a dedicated StopCriterion
  • ByRow operations used to have the nasty property that they loop over entries and don't benefit from vectorization. If that hasn't changed, I'd recommend reverting this to vector-based DataFrame manipulation.
  • One more overall comment: When you did your performance optimization, did you also check this for single-CPU run time? I just want to make sure that the code is not getting slower for people who don't use threaded Julia. This concern is related to the previous item. Parallelizing a loop instead of broadcasting may increase performance with enough cores but decrease performance in single-core applications.

Thanks!

Comment thread src/initialization/start_conditions/sc_RegionalSeeds.jl Outdated

"""
sample_contacts(contact_sampling_method::ContactSamplingMethod, setting::Setting, individual::Individual, tick::Int16; rng::AbstractRNG = Random.default_rng())::ErrorException
sample_contacts!(indivs::Vector{Individual}, contact_sampling_method::ContactSamplingMethod, setting::Setting, individual_index::Int, present_inds::Vector{Individual}, tick::Int16, replace::Bool, rng::Xoshiro)::ErrorException
Contributor


what do all the added parameters do? The docstrings are missing that. Also, this is a public interface for external users to extend the framework. Are all these additional arguments really necessary? It's adding a lot more complexity if people would like to add their own contact sampling mechanics. We usually tried to keep these things as simple as possible. If we really need this added complexity, would it be possible to put it into a wrapper (or keyword argument), so the user doesn't have to deal with all these cryptic arguments?

Contributor


this comment applies to all contact sampling functions now.

Contributor


Also, I saw that the rng types are now Xoshiro by default. What if a user would like to use different rng? Would that still be possible?

Collaborator Author


I specified Xoshiro because abstract parameters and keyword arguments lead to a lot of heap allocations (which the GC has to clean up). If you say this is important, I will think of a solution.

Collaborator Author


We could parameterize the Simulation struct (e.g., Simulation{R <: AbstractRNG}) to support multiple RNGs without losing performance, but it feels unnecessary right now. We can easily extend it later if we ever actually need to swap it out.
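The parameterization proposed in this comment would look roughly like the sketch below. `Simulation` here is a bare stand-in for the real GEMS struct, shown only to illustrate why a type parameter keeps the RNG field concrete.

```julia
using Random

# Storing the RNG as a type parameter keeps the field concretely typed
# (no dynamic dispatch on access) while still accepting any AbstractRNG.
struct Simulation{R<:AbstractRNG}
    rng::R
end

# sim.rng is concretely typed inside this function for each instantiation.
draw(sim::Simulation) = rand(sim.rng)

sim_xoshiro = Simulation(Xoshiro(42))          # Simulation{Xoshiro}
sim_mt      = Simulation(MersenneTwister(42))  # Simulation{MersenneTwister}
```

Each concrete `R` gets its own specialized method, so swapping RNGs later costs nothing at runtime.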

Comment thread src/initialization/start_conditions/sc_RegionalSeeds.jl
offset = gems_rand(rng, 1:length(present_inds)-1)
contact_index = mod(individual_index + offset - 1, length(present_inds)) + 1
return [present_inds[contact_index]]
push!(indivs, present_inds[contact_index])
Contributor


does that mean that the user has to handle the indivs vector content? Also, is there a tutorial on the new structure, or has the old tutorial been updated? Or didn't we have one yet?

Comment thread src/methods/contact_sampling_methods.jl
Comment thread src/structs/entities/settingscontainers.jl

# Add all setting types to the container
add_types!(cntnr, [s for s in prov_settingtypes if s in concrete_subtypes(Setting)])
add_types!(cntnr, [s for s in prov_settingtypes if s <: Setting && isconcretetype(s)])
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend using is_subtype(type::Symbol, parent::DataType). We have that function in the utils.jl and it also makes sure that it doesn't break if GEMS is used as a dependency, because that sometimes causes types to not be Household but GEMS.Household and thus breaking some of these subtype functions. I have spent hours tracking down these problems.

Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This problem comes from using strings here, which also causes a lot of GC time because strings are heap-allocated. The issue you are describing cannot happen here, as "s" does not eval to a string or a symbol. For example, "isconcretetype(eval(GEMS.Household))" returns "true". I changed "prov_settingtypes" to be declared as "DataType[]" to make this more obvious.

Comment thread src/structs/simulation.jl Outdated
Comment thread src/structs/simulation.jl
Comment thread test/simulationtest.jl Outdated