Performance optimization #72

Open

Julian-Patzner wants to merge 60 commits into main from performance_optimization

Conversation


@Julian-Patzner Julian-Patzner commented Mar 20, 2026

Refactor: Major Performance, Memory, and Parallelization Optimizations

Overview

This PR introduces a massive suite of performance optimizations aimed at significantly reducing memory allocations, eliminating lock contention during multi-threaded execution, ensuring type stability, and accelerating overall simulation and post-processing runtimes.

The core changes include replacing locked logging with lock-free thread-local buffers, transitioning to mutating functions for contact sampling, extensively rewriting the post-processing and analysis pipelines to use in-place operations, and introducing an optimized "dormant state" mechanism for simulations that reach a steady state early, allowing them to bypass heavy epidemiological loops while safely processing future triggers.


Key Changes & Features

1. Simulation Acceleration (Dormant State Optimization)

This PR introduces a highly optimized "Dormant State" to dramatically accelerate the simulation during the tail-end of a pandemic, bypassing heavy calculations when the disease has died out.

  • Steady-State Detection (is_dormant): The simulation actively monitors for a complete mathematical freeze. It checks that the macroscopic disease state is clear (0 exposed, 0 infectious, 0 quarantined) and peeks at the event_queue and tick_triggers to guarantee no external forces are scheduled to act on the current tick.
  • Dormant Stepping (step!): When the simulation detects it is asleep, the step! function instantly bypasses the computationally heavy $O(N)$ loops. It maintains the normal progression of the simulation timeline, guaranteeing that all scheduled events, background processes, and user-defined hooks (such as stepmod and custom loggers) execute uninterrupted.
  • $O(1)$ State Duplication (copy_last_log_state): During a dormant tick, the expensive $O(N)$ log_stepinfo population scan is entirely skipped. Instead, the simulation instantly duplicates the previous tick's state into the loggers in $O(1)$ time.
  • Data Integrity & Wake-Up: The system guarantees zero data loss. The moment a scheduled event or trigger is set to fire (e.g., a newly imported case), is_dormant naturally evaluates to false for that exact tick, and the simulation automatically resumes the full physics loops to accurately capture the state change.
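The detection-plus-fast-path logic described above can be sketched as follows. This is a minimal illustration, not the actual GEMS implementation: `SimState`, `scheduled_ticks`, and all field names here are hypothetical stand-ins.

```julia
# Hypothetical, simplified stand-in for the simulation state; the real
# GEMS struct and field names differ.
struct SimState
    tick::Int16
    exposed::Int
    infectious::Int
    quarantined::Int
    scheduled_ticks::Vector{Int16}  # sorted ticks of pending events/triggers
end

# Dormant only if the epidemic is mathematically frozen AND nothing is
# scheduled to fire on the current tick.
is_dormant(s::SimState) =
    s.exposed == 0 && s.infectious == 0 && s.quarantined == 0 &&
    isempty(searchsorted(s.scheduled_ticks, s.tick))

# The step function branches: O(1) log duplication when dormant,
# full O(N) epidemiological loops otherwise.
function step!(s::SimState, heavy_loops!::Function, copy_last_log_state!::Function)
    is_dormant(s) ? copy_last_log_state!(s) : heavy_loops!(s)
end
```

The wake-up behavior falls out naturally: as soon as the current tick appears among the scheduled ticks, `is_dormant` returns `false` for exactly that tick and the full loops run again.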

2. Lock-Free Parallel Execution & Logging

All event loggers (InfectionLogger, VaccinationLogger, DeathLogger, TestLogger, PoolTestLogger, SeroprevalenceLogger, QuarantineLogger, StateLogger) have been entirely rewritten to eliminate ReentrantLock bottlenecks during multi-threaded execution.

  • Thread-Local Storage: Data is now pushed directly to thread-specific vectors (Vector{Vector{T}} sized to Threads.maxthreadid()).
  • Atomic Counters: Threads.Atomic is now used to safely generate unique IDs across threads without blocking.
  • Data Aggregation: The dataframe() and save() functions now dynamically flatten (vcat) the thread-local arrays upon extraction.
  • Atomic Setting States: Replaced isactive::Bool and explicit locks with isactive::Threads.Atomic{Bool} on all Setting structs.
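The lock-free pattern behind these loggers can be sketched like this. The struct and function names are illustrative, not the actual GEMS logger API; `Threads.maxthreadid()` requires Julia ≥ 1.9.

```julia
using Base.Threads

# Sketch of the lock-free logging pattern: one sub-vector per thread plus
# an atomic counter for unique IDs. Names are hypothetical.
struct EventLogger
    events::Vector{Vector{Int}}   # one sub-vector per thread
    next_id::Threads.Atomic{Int}
end

EventLogger() = EventLogger([Int[] for _ in 1:Threads.maxthreadid()],
                            Threads.Atomic{Int}(1))

# Each thread pushes only to its own sub-vector, so no lock is needed.
function log_event!(lg::EventLogger)
    id = Threads.atomic_add!(lg.next_id, 1)   # returns the previous value
    push!(lg.events[Threads.threadid()], id)
    return id
end

# Aggregation flattens the thread-local vectors only on extraction.
all_events(lg::EventLogger) = reduce(vcat, lg.events)
```

Note that indexing by `Threads.threadid()` is only safe when tasks do not migrate between threads mid-iteration (e.g., under `@threads :static`); that constraint is inherent to the thread-local-storage pattern.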

3. Zero-Allocation Simulation Loops & Structs

  • Thread-Local Buffers: Added present_buffers and contact_buffers to the Simulation struct. These are pre-allocated, thread-specific vectors used during the infection spread phase to prevent continuous array allocations.
  • Mutating Core Methods: Introduced sample_contacts!, individuals!, and present_individuals! which accept a pre-allocated vector to mutate rather than returning newly allocated arrays. Backwards-compatible non-mutating wrappers have been retained.
  • Manual Cumulative Sums: Replaced Categorical distribution allocations in AgeBasedProgressionAssignment with a fast, manual cumulative sum over the stratification matrix that avoids heap allocations.
  • Strict Disease Progression Typing: Refactored the DiseaseProgression constructor to require strictly typed Int16 positional arguments, ensuring type stability at the allocation boundary during the infect! phase.
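Two of these patterns can be sketched briefly: the mutating-buffer contact sampler and the manual cumulative-sum draw that replaces a `Categorical` allocation. Both functions below use hypothetical names and signatures, not the actual GEMS ones.

```julia
using Random

# Sketch of the mutating-buffer pattern: the caller owns a pre-allocated
# buffer; we empty! and refill it instead of allocating a new vector.
function sample_contacts!(buf::Vector{Int}, rng::Xoshiro, present::Vector{Int}, n::Int)
    empty!(buf)
    for _ in 1:n
        push!(buf, present[rand(rng, 1:length(present))])
    end
    return buf
end

# Sketch of drawing a category via a manual cumulative sum over one row of
# a stratification matrix, avoiding a heap-allocated Categorical(...).
function draw_index(rng::Xoshiro, weights::AbstractVector{<:Real})
    r = rand(rng) * sum(weights)
    acc = zero(r)
    for (i, w) in enumerate(weights)
        acc += w
        r <= acc && return i
    end
    return length(weights)   # numerical-safety fallback
end
```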

4. Fast Post-Processing & Analysis Refactoring

  • In-Place DataFrame Ops: Post-processing methods now extensively use mutating operations (leftjoin!, transform!, select!, rename!) and views (subset with view=true) to calculate SI, effective R, and demographics without repeatedly allocating large DataFrames.
  • Fast Contact Surveys: Refactored contact_samples to use pre-allocated column vectors instead of dynamically appending rows to a DataFrame.
  • Precomputed Relationships: household_attack_rates now uses standard vectors and pre-calculated dictionaries for parent->child relationships, removing an expensive nested DataFrame loop. r0_per_county now precomputes secondary cases in a single pass.
  • Binary Search & Views: Optimized rolling_observed_SI to use binary search (searchsortedlast, searchsortedfirst) and array views (@view) instead of sequential lookups (findfirst).
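The binary-search-plus-view idea can be sketched as follows; this is an illustration of the technique, not the actual rolling_observed_SI code, and assumes the tick column is sorted ascending.

```julia
# Extract a rolling window [lo, hi] from a tick-sorted column without
# copying: two O(log n) binary searches plus a view.
function window_view(ticks::Vector{Int}, values::Vector{Float64}, lo::Int, hi::Int)
    first_i = searchsortedfirst(ticks, lo)  # first index with ticks[i] >= lo
    last_i  = searchsortedlast(ticks, hi)   # last index with ticks[i] <= hi
    return @view values[first_i:last_i]     # no allocation of a new array
end
```

Compared with a sequential `findfirst` scan, each lookup drops from O(n) to O(log n), and the `@view` avoids materializing the window.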

5. Type Stability and Dynamic Dispatch Removal

  • Parameterization of Settings: Replaced dynamic settingtype::DataType arguments with parameterized settingtype::Type{T} where {T <: Setting} across analysis functions (setting_age_contacts, contact_samples). Inner simulation loops now explicitly cast retrieved settings (e.g., s = raw_s::settingtype) to enforce type stability and eliminate dynamic dispatch overhead.
  • Concrete RNGs: Replaced generic AbstractRNG arguments with concrete Xoshiro references (and added DEFAULT_GEMS_RNG) across all transmission, progression, and initialization functions.
  • Function Barriers: Introduced internal function barriers (e.g., _get_size, _delete_dangling_for_type!) to handle abstract vector iteration with type-stable, concrete property access.
  • Static Setting Tuples: Added settings_tuple(individual) to replace dynamic settings() calls. It returns a statically typed tuple of setting pairs (Type, ID), heavily speeding up iteration over an individual's active settings.
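The function-barrier pattern mentioned above can be sketched like this. The `Setting`/`Household`/`Office` types below are local toy stand-ins for illustration only (GEMS defines its own setting hierarchy), and `_get_size` here is a simplified analogue.

```julia
# Toy stand-ins: an abstract supertype with two concrete subtypes.
abstract type Setting end
struct Household <: Setting; members::Vector{Int}; end
struct Office    <: Setting; members::Vector{Int}; end

# Inner barrier: by the time this function runs, `s` has a concrete type,
# so the `s.members` access compiles type-stably.
_get_size(s::Setting) = length(s.members)

# Outer loop over a Vector{Setting} (abstract element type): each call to
# _get_size crosses the barrier into type-stable code.
total_size(settings::Vector{Setting}) = sum(_get_size, settings)
```

The outer loop still performs one dynamic dispatch per element, but all the work inside the barrier runs on concretely typed, specialized code.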

6. Initialization & Memory Footprint Improvements

  • Optimized Settings Creation: Rewrote settings_from_population using exact pre-allocation (construct_and_add_settings!) and Counting Sort, making the sorting and grouping of populations into settings substantially faster (Counting Sort runs in linear time versus the O(n log n) of comparison sorts).
  • Fast Population Parsing: Refactored the Population(df::DataFrame) constructor to extract DataFrame columns into a strongly-typed NamedTuple before the loop, bypassing slow row-by-row DataFrame indexing.
  • Precomputed AGS: Added precompute_ags! to calculate and statically assign the AGS (Amtlicher Gemeindeschlüssel) field for all ContainerSettings during initialization, preventing recursive dynamic lookups during runtime.
  • Comorbidities Bitmask: Replaced the heap-allocated comorbidities::Vector{Bool} in the Individual struct with a primitive UInt16 bitmask. This eliminates a vector allocation for every single agent, reducing the simulation's baseline memory footprint and improving cache locality.
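The bitmask idea can be sketched with a few accessor functions; these names are illustrative, not the actual GEMS API. Bit i-1 of the `UInt16` is set iff comorbidity i is present, so up to 16 comorbidities fit into two inline bytes instead of a heap-allocated `Vector{Bool}`.

```julia
# Set bit i (1-based) in the comorbidity mask.
set_comorbidity(mask::UInt16, i::Int)  = mask | (UInt16(1) << (i - 1))

# Query bit i.
has_comorbidity(mask::UInt16, i::Int)  = (mask >> (i - 1)) & 0x1 == 1

# Population count gives the number of comorbidities in one instruction.
count_comorbidities(mask::UInt16)      = count_ones(mask)
```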

Unit-Testing

  • All tests updated to handle flattened thread-local logger vectors.
  • Added specific test suites for settings_tuple outputs, Buffer-aware contact sampling, and recursive setting activation.
  • Added extensive testing for the Simulation Acceleration module, ensuring is_dormant accurately evaluates state, logger updates correctly trip the wake-up triggers, and catch_up_logs! accurately backfills skipped ticks.

Julian-Patzner and others added 30 commits March 24, 2026 09:32
…replaced Random.default_rng() with const Xoshiro
…, barrier function for dynamic dispatching in spread_infection!
@Julian-Patzner Julian-Patzner marked this pull request as draft March 25, 2026 07:54
@Julian-Patzner Julian-Patzner marked this pull request as ready for review March 25, 2026 08:37
@Julian-Patzner Julian-Patzner marked this pull request as draft March 25, 2026 09:49
@Julian-Patzner Julian-Patzner marked this pull request as ready for review March 26, 2026 00:07
Contributor

@JohannesPonge JohannesPonge left a comment


What an update! Great work, Julian! I have added some comments and suggestions that went through my head while reviewing the updates. Half of them can probably be dismissed now that the PR description is there. My major points are:

  • can we still use RNGs other than Xoshiro?
  • Does the user still get the keyword-based interfaces for all places where you changed the default function signatures? (usability is extremely important)
  • Better too many / too-detailed docstrings than none or too little info
  • The fast forwarding is smart, but I would put it into a dedicated StopCriterion
  • ByRow operations used to have the nasty property that they loop over entries and don't benefit from vectorization. If that hasn't changed, I'd recommend reverting this to vector-based DataFrame manipulation.
  • One more overall comment: When you did your performance optimization, did you also check this for single-CPU run time? I just want to make sure that the code is not getting slower for people who don't use threaded Julia. This concern is related to the previous item. Parallelizing a loop instead of broadcasting may increase performance with enough cores but decrease performance in single-core applications.

Thanks!

Comment thread src/initialization/start_conditions/sc_RegionalSeeds.jl Outdated

"""
sample_contacts(contact_sampling_method::ContactSamplingMethod, setting::Setting, individual::Individual, tick::Int16; rng::AbstractRNG = Random.default_rng())::ErrorException
sample_contacts!(indivs::Vector{Individual}, contact_sampling_method::ContactSamplingMethod, setting::Setting, individual_index::Int, present_inds::Vector{Individual}, tick::Int16, replace::Bool, rng::Xoshiro)::ErrorException
Contributor


what do all the added parameters do? The docstrings are missing that. Also, this is a public interface for external users to extend the framework. Are all these additional arguments really necessary? It's adding a lot more complexity if people would like to add their own contact sampling mechanics. We usually tried to keep these things as simple as possible. If we really need this added complexity, would it be possible to put it into a wrapper (or keyword argument), so the user doesn't have to deal with all these cryptic arguments?

Contributor


this comment applies to all contact sampling functions now.

Contributor


Also, I saw that the rng types are now Xoshiro by default. What if a user would like to use different rng? Would that still be possible?

Collaborator Author


I specified Xoshiro because abstract parameters and keyword arguments lead to a lot of heap allocations (which the GC has to clean up). If you say this is important, I will think of a solution.

Collaborator Author


We could parameterize the Simulation struct (e.g., Simulation{R <: AbstractRNG}) to support multiple RNGs without losing performance, but it feels unnecessary right now. We can easily extend it later if we ever actually need to swap it out.
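The parameterization proposed in this comment would look roughly like the sketch below. `Simulation` here is a bare stand-in for the real GEMS struct, shown only to illustrate why a type parameter keeps the RNG field concrete.

```julia
using Random

# Storing the RNG as a type parameter keeps the field concretely typed
# (no dynamic dispatch on access) while still accepting any AbstractRNG.
struct Simulation{R<:AbstractRNG}
    rng::R
end

# sim.rng is concretely typed inside this function for each instantiation.
draw(sim::Simulation) = rand(sim.rng)

sim_xoshiro = Simulation(Xoshiro(42))          # Simulation{Xoshiro}
sim_mt      = Simulation(MersenneTwister(42))  # Simulation{MersenneTwister}
```

Each concrete `R` gets its own specialized method, so swapping RNGs later costs nothing at runtime.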

Comment thread src/initialization/start_conditions/sc_RegionalSeeds.jl
offset = gems_rand(rng, 1:length(present_inds)-1)
contact_index = mod(individual_index + offset - 1, length(present_inds)) + 1
return [present_inds[contact_index]]
push!(indivs, present_inds[contact_index])
Contributor


does that mean that the user has to handle the indivs vector content? Also, is there a tutorial on the new structure, or has the old tutorial been updated? Or didn't we have one yet?

Comment thread src/methods/contact_sampling_methods.jl
Comment thread src/structs/entities/settingscontainers.jl

# Add all setting types to the container
add_types!(cntnr, [s for s in prov_settingtypes if s in concrete_subtypes(Setting)])
add_types!(cntnr, [s for s in prov_settingtypes if s <: Setting && isconcretetype(s)])
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend using is_subtype(type::Symbol, parent::DataType). We have that function in the utils.jl and it also makes sure that it doesn't break if GEMS is used as a dependency, because that sometimes causes types to not be Household but GEMS.Household and thus breaking some of these subtype functions. I have spent hours tracking down these problems.

Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This problem comes from using strings here, which also causes a lot of GC time because strings are heap-allocated. The issue you are describing cannot happen here, as "s" does not eval to a string or a symbol. For example, "isconcretetype(eval(GEMS.Household))" returns "true". I changed "prov_settingtypes" to be declared as "DataType[]" to make this more obvious.

Comment thread src/structs/simulation.jl Outdated
Comment thread src/structs/simulation.jl
Comment thread test/simulationtest.jl Outdated