…replaced Random.default_rng() with const Xoshiro
…, barrier function for dynamic dispatching in spread_infection!
…basically the same compared to single threaded post processing
…ty-populations in contact_survey.jl
JohannesPonge
left a comment
What an update! Great work, Julian! I have added some comments and suggestions that went through my head reviewing the updates. Half of it can probably be removed after there's the PR description. My major points are:
- Can we still use RNGs other than Xoshiro?
- Does the user still get the keyword-based interfaces for all places where you changed the default function signatures? (usability is extremely important)
- Better too many/too detailed docstrings than none or too little info.
- The fast forwarding is smart, but I would put it into a dedicated StopCriterion.
- The ByRows used to have this nasty property that they loop over entries and don't benefit from vectorization. If that hasn't changed, I'd recommend reverting this to vector-based dataframe manipulation.
- One more overall comment: When you did your performance optimization, did you also check this for single-CPU run time? I just want to make sure that the code is not getting slower for people who don't use threaded Julia. This concern is related to the previous item. Parallelizing a loop instead of broadcasting may increase performance with enough cores but decrease performance in single-core applications.
Thanks!
  """
- sample_contacts(contact_sampling_method::ContactSamplingMethod, setting::Setting, individual::Individual, tick::Int16; rng::AbstractRNG = Random.default_rng())::ErrorException
+ sample_contacts!(indivs::Vector{Individual}, contact_sampling_method::ContactSamplingMethod, setting::Setting, individual_index::Int, present_inds::Vector{Individual}, tick::Int16, replace::Bool, rng::Xoshiro)::ErrorException
what do all the added parameters do? The docstrings are missing that. Also, this is a public interface for external users to extend the framework. Are all these additional arguments really necessary? It's adding a lot more complexity if people would like to add their own contact sampling mechanics. We usually tried to keep these things as simple as possible. If we really need this added complexity, would it be possible to put it into a wrapper (or keyword argument), so the user doesn't have to deal with all these cryptic arguments?
this comment applies to all contact sampling functions now.
Also, I saw that the RNG types are now Xoshiro by default. What if a user would like to use a different RNG? Would that still be possible?
I specified Xoshiro because abstract parameters and keyword arguments lead to a lot of heap allocations (which the GC has to clean up). If you say this is important, I will think of a solution.
We could parameterize the Simulation struct (e.g., Simulation{R <: AbstractRNG}) to support multiple RNGs without losing performance, but it feels unnecessary right now. We can easily extend it later if we ever actually need to swap it out.
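A minimal sketch of the parameterization mentioned above, assuming a stripped-down stand-in for the real struct. `SketchSimulation`, its fields, and `draw_contact` are hypothetical names used only for illustration, not the GEMS API. The point is that the RNG field keeps a concrete type per instance, so the compiler can still specialize, while other RNGs remain usable.

```julia
using Random

# Hypothetical stand-in for the Simulation struct; field names are illustrative only.
struct SketchSimulation{R<:AbstractRNG}
    rng::R
    n_individuals::Int
end

# Convenience constructor that defaults to Xoshiro (the seed is arbitrary).
SketchSimulation(n::Int) = SketchSimulation(Xoshiro(42), n)

# The field type is concrete for every instance, so this call is type-stable.
draw_contact(sim::SketchSimulation) = rand(sim.rng, 1:sim.n_individuals)

sim_default = SketchSimulation(100)                  # SketchSimulation{Xoshiro}
sim_mt = SketchSimulation(MersenneTwister(1), 100)   # SketchSimulation{MersenneTwister}
```

Both instances go through the same `draw_contact` method, and Julia compiles one specialization per RNG type.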
  offset = gems_rand(rng, 1:length(present_inds)-1)
  contact_index = mod(individual_index + offset - 1, length(present_inds)) + 1
- return [present_inds[contact_index]]
+ push!(indivs, present_inds[contact_index])
does that mean that the user has to handle the indivs vector content? Also, is there a tutorial on the new structure, or has the old tutorial been updated? Or didn't we have one yet?
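To make the buffer question concrete, here is a hedged sketch of the pattern in the hunk above, using plain integers instead of `Individual` and `rand` instead of `gems_rand`. The function names are hypothetical. The mutating variant appends to a caller-owned vector, and a thin non-mutating wrapper hides the buffer from users who do not care about allocations.

```julia
using Random

# Mutating variant: appends one contact to `out` and never picks the individual itself.
function sketch_sample_contact!(out::Vector{Int}, present::Vector{Int},
                                self_index::Int, rng::AbstractRNG)
    n = length(present)
    n < 2 && return out                        # nobody else is present
    offset = rand(rng, 1:n-1)                  # shift by 1..n-1 positions
    contact_index = mod(self_index + offset - 1, n) + 1
    push!(out, present[contact_index])
    return out
end

# Non-mutating wrapper: allocates a fresh vector so the caller needs no buffer.
sketch_sample_contact(present::Vector{Int}, self_index::Int, rng::AbstractRNG) =
    sketch_sample_contact!(Int[], present, self_index, rng)

rng = Xoshiro(7)
present = [11, 22, 33, 44]
buffer = Int[]
sketch_sample_contact!(buffer, present, 2, rng)   # buffer now holds one contact
```

Because the offset is drawn from `1:n-1` and applied modulo `n`, the selected index can never equal `self_index`. In this pattern the caller empties the buffer with `empty!` between uses.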
  # Add all setting types to the container
- add_types!(cntnr, [s for s in prov_settingtypes if s in concrete_subtypes(Setting)])
+ add_types!(cntnr, [s for s in prov_settingtypes if s <: Setting && isconcretetype(s)])
I recommend using is_subtype(type::Symbol, parent::DataType). We have that function in the utils.jl and it also makes sure that it doesn't break if GEMS is used as a dependency, because that sometimes causes types to not be Household but GEMS.Household and thus breaking some of these subtype functions. I have spent hours tracking down these problems.
This problem comes from using strings here, which also causes a lot of gc time because strings are heap allocated. The issue you are describing can not happen here, as "s" does not eval to a string nor a symbol. For example, "isconcretetype(eval(GEMS.Household))" returns "true". I changed "prov_settingtypes" to be declared as "DataType[]" to make this more obvious
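A small sketch of the type-based filter discussed here, using made-up types (`FilterSetting`, `FilterHousehold`, and so on are hypothetical, not GEMS types). Since the check works on type objects rather than names, module qualification of the printed name plays no role.

```julia
# Hypothetical type hierarchy for illustration only.
abstract type FilterSetting end
abstract type FilterContainer <: FilterSetting end
struct FilterHousehold <: FilterSetting end
struct FilterOffice <: FilterSetting end

# Candidate types, declared as DataType[] as in the comment above.
provided = DataType[FilterHousehold, FilterContainer, FilterOffice, Int]

# Keep only concrete subtypes of the abstract setting type.
concrete = [s for s in provided if s <: FilterSetting && isconcretetype(s)]
```

Here `FilterContainer` is dropped because it is abstract and `Int` is dropped because it is not a subtype.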
…tate. During this state, a lightweight step function is executed, and the simulation automatically wakes up and backfills logs when an epidemiological change occurs.
Refactor: Major Performance, Memory, and Parallelization Optimizations
Overview
This PR introduces a massive suite of performance optimizations aimed at significantly reducing memory allocations, eliminating lock contention during multi-threaded execution, ensuring type stability, and accelerating overall simulation and post-processing runtimes.
The core changes include replacing locked logging with lock-free thread-local buffers, transitioning to mutating functions for contact sampling, extensively rewriting the post-processing and analysis pipelines to use in-place operations, and introducing an optimized "dormant state" mechanism for simulations that reach a steady state early, allowing them to bypass heavy epidemiological loops while safely processing future triggers.
Key Changes & Features
1. Simulation Acceleration (Dormant State Optimization)
This PR introduces a highly optimized "Dormant State" to dramatically accelerate the simulation during the tail-end of a pandemic, bypassing heavy calculations when the disease has died out.
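As a rough illustration of the idea, the sketch below models the dormancy check and the cheap logging path on a toy state. Every name in it (`SketchSim`, `sketch_is_dormant`, `sketch_step!`, the fields) is an assumption made for this example and does not mirror the actual implementation in the PR.

```julia
# Hypothetical, simplified stand-in for the simulation state.
mutable struct SketchSim
    exposed::Int
    infectious::Int
    quarantined::Int
    scheduled_ticks::Set{Int}     # ticks that have queued events or triggers
    log::Vector{NTuple{3,Int}}    # one (exposed, infectious, quarantined) entry per tick
end

# Dormant only if nothing is epidemiologically active and nothing is scheduled for this tick.
sketch_is_dormant(sim::SketchSim, tick::Int) =
    sim.exposed == 0 && sim.infectious == 0 && sim.quarantined == 0 &&
    !(tick in sim.scheduled_ticks)

function sketch_step!(sim::SketchSim, tick::Int)
    if sketch_is_dormant(sim, tick) && !isempty(sim.log)
        push!(sim.log, sim.log[end])   # cheap path: copy the previous tick's log entry
    else
        # Full path: a real simulation would run its spread and progression loops here.
        push!(sim.log, (sim.exposed, sim.infectious, sim.quarantined))
    end
    return sim
end

sim = SketchSim(0, 0, 0, Set([3]), NTuple{3,Int}[])
sketch_step!(sim, 1)
sketch_step!(sim, 2)   # dormant: reuses the entry logged at tick 1
```

A tick with a scheduled event (tick 3 in this toy setup) is never treated as dormant, so the full path runs again.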
- …is_dormant): The simulation actively monitors for a complete mathematical freeze. It checks that the macroscopic disease state is clear (0 exposed, 0 infectious, 0 quarantined) and peeks at the event_queue and tick_triggers to guarantee no external forces are scheduled to act on the current tick.
- …step!): When the simulation detects it is asleep, the step! function instantly bypasses the computationally heavy … stepmod and custom loggers) execute uninterrupted.
- …copy_last_log_state): During a dormant tick, the expensive log_stepinfo population scan is entirely skipped. Instead, the simulation instantly duplicates the previous tick's state into the loggers in … is_dormant naturally evaluates to false for that exact tick. The simulation automatically resumes the full physics loops to accurately capture the state change.
2. Lock-Free Parallel Execution & Logging
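The lock-free logging pattern covered in this section can be sketched as follows. This is an assumption-laden toy, not the PR's logger code: `SketchLogger`, `log_event!`, and `flatten` are hypothetical names, and `@threads :static` is used so that each iteration stays on one thread while it indexes its buffer by `threadid()`.

```julia
using Base.Threads

# Hypothetical logger with one buffer per thread and an atomic ID counter.
struct SketchLogger
    buffers::Vector{Vector{Int}}
    next_id::Atomic{Int}
end

SketchLogger() = SketchLogger([Int[] for _ in 1:Threads.maxthreadid()], Atomic{Int}(0))

function log_event!(logger::SketchLogger)
    id = atomic_add!(logger.next_id, 1) + 1   # atomic_add! returns the previous value
    push!(logger.buffers[threadid()], id)     # no lock: each thread owns its buffer
    return id
end

# Merge the thread-local buffers only when the data is extracted.
flatten(logger::SketchLogger) = reduce(vcat, logger.buffers)

logger = SketchLogger()
@threads :static for _ in 1:1000
    log_event!(logger)
end
ids = flatten(logger)
```

The flattened result is not ordered by ID, so a consumer that needs ordering has to sort after extraction.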
All event loggers (InfectionLogger, VaccinationLogger, DeathLogger, TestLogger, PoolTestLogger, SeroprevalenceLogger, QuarantineLogger, StateLogger) have been entirely rewritten to eliminate ReentrantLock bottlenecks during multi-threaded execution.
- …Vector{Vector{T}} sized to Threads.maxthreadid()).
- Threads.Atomic is now used to safely generate unique IDs across threads without blocking.
- dataframe() and save() functions now dynamically flatten (vcat) the thread-local arrays upon extraction.
- …isactive::Bool and explicit locks with isactive::Threads.Atomic{Bool} on all Setting structs.
3. Zero-Allocation Simulation Loops & Structs
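One technique from this section, drawing a category with a manual cumulative sum instead of building a distribution object, might look roughly like this. `sketch_sample_index` is a hypothetical helper and the matrix layout (one probability row per stratum) is an assumption made for the example.

```julia
using Random

# Draw an index according to `probs` without allocating a distribution object.
function sketch_sample_index(rng::AbstractRNG, probs::AbstractVector{<:Real})
    u = rand(rng)          # uniform draw in [0, 1)
    acc = 0.0
    for i in eachindex(probs)
        acc += probs[i]
        u < acc && return i
    end
    return lastindex(probs)   # guard against rounding when probs sum to slightly less than 1
end

rng = Xoshiro(1)
strat = [0.2 0.5 0.3;    # assumed layout: one row of category probabilities per stratum
         0.0 1.0 0.0]
category = sketch_sample_index(rng, @view strat[2, :])   # always 2 for this row
```

Passing a view of the matrix row avoids copying the row before sampling.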
- …present_buffers and contact_buffers to the Simulation struct. These are pre-allocated, thread-specific vectors used during the infection spread phase to prevent continuous array allocations.
- …sample_contacts!, individuals!, and present_individuals! which accept a pre-allocated vector to mutate rather than returning newly allocated arrays. Backwards-compatible non-mutating wrappers have been retained.
- …Categorical distribution allocations in AgeBasedProgressionAssignment with a fast, manual cumulative sum over the stratification matrix that avoids heap allocations.
- …DiseaseProgression constructor to require strictly typed Int16 positional arguments, ensuring type stability right at the allocation boundary during the infect! phase.
4. Fast Post-Processing & Analysis Refactoring
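The binary-search approach named in this section can be illustrated on a sorted vector of ticks. `sketch_window` is a hypothetical helper, not a function from the PR. It locates a closed tick window with two binary searches and returns a view instead of a copy.

```julia
# Return a view of all entries of `sorted_ticks` that fall into the closed window [lo, hi].
function sketch_window(sorted_ticks::Vector{Int}, lo::Int, hi::Int)
    first_idx = searchsortedfirst(sorted_ticks, lo)   # first index with value >= lo
    last_idx = searchsortedlast(sorted_ticks, hi)     # last index with value <= hi
    return @view sorted_ticks[first_idx:last_idx]     # empty view if nothing matches
end

ticks = [1, 3, 3, 7, 10, 15]
window = sketch_window(ticks, 3, 10)
```

Each lookup costs O(log n) comparisons, whereas a `findfirst` scan is linear in the vector length.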
- …leftjoin!, transform!, select!, rename!) and views (subset with view=true) to calculate SI, effective R, and demographics without repeatedly allocating large DataFrames.
- …contact_samples to use pre-allocated column vectors instead of dynamically appending rows to a DataFrame.
- …household_attack_rates now uses standard vectors and pre-calculated dictionaries for parent->child relationships, removing an expensive nested DataFrame loop. r0_per_county now precomputes secondary cases in a single pass.
- …rolling_observed_SI to use binary search (searchsortedlast, searchsortedfirst) and array views (@view) instead of sequential lookups (findfirst).
5. Type Stability and Dynamic Dispatch Removal
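A hedged sketch of the `Type{T}` parameterization and the explicit cast described in this section, on made-up types. `BarrierSetting`, `BarrierHousehold`, `BarrierOffice`, and `sketch_total_size` are hypothetical names used only to show the pattern.

```julia
# Hypothetical setting hierarchy for illustration only.
abstract type BarrierSetting end
struct BarrierHousehold <: BarrierSetting
    size::Int
end
struct BarrierOffice <: BarrierSetting
    size::Int
end

# Taking the type as Type{T} lets the compiler specialize the method on T.
function sketch_total_size(settings::Vector{BarrierSetting}, ::Type{T}) where {T<:BarrierSetting}
    total = 0
    for raw_s in settings
        raw_s isa T || continue
        s = raw_s::T        # type assertion: field access below is on a concrete type
        total += s.size
    end
    return total
end

settings = BarrierSetting[BarrierHousehold(3), BarrierOffice(10), BarrierHousehold(2)]
household_total = sketch_total_size(settings, BarrierHousehold)
```

With a plain `settingtype::DataType` argument, the method would not be specialized per setting type, which is the overhead this pattern is meant to avoid.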
- …settingtype::DataType arguments with parameterized settingtype::Type{T} where {T <: Setting} across analysis functions (setting_age_contacts, contact_samples). Inner simulation loops now explicitly cast retrieved settings (e.g., s = raw_s::settingtype) to enforce type stability and eliminate dynamic dispatch overhead.
- …AbstractRNG arguments with concrete Xoshiro references (and added DEFAULT_GEMS_RNG) across all transmission, progression, and initialization functions.
- …_get_size, _delete_dangling_for_type!) to handle abstract vector iteration with type-stable, concrete property access.
- …settings_tuple(individual) to replace dynamic settings() calls. It returns a statically typed tuple of setting pairs (Type, ID), heavily speeding up iteration over an individual's active settings.
6. Initialization & Memory Footprint Improvements
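The bitmask representation mentioned in this section can be sketched with a few helpers. The names `set_flag`, `has_flag`, and `pack_flags` are hypothetical and the flag order is an arbitrary choice for the example.

```julia
# Set or test flag number i (1-based) in a UInt16 mask.
set_flag(mask::UInt16, i::Integer) = mask | (UInt16(1) << (i - 1))
has_flag(mask::UInt16, i::Integer) = ((mask >> (i - 1)) & UInt16(1)) == UInt16(1)

# Pack a Bool vector with at most 16 entries into a single UInt16.
function pack_flags(flags::AbstractVector{Bool})
    length(flags) <= 16 || throw(ArgumentError("at most 16 flags fit into a UInt16"))
    mask = UInt16(0)
    for (i, f) in enumerate(flags)
        f && (mask = set_flag(mask, i))
    end
    return mask
end

mask = pack_flags([true, false, true])   # bits 1 and 3 set
```

The mask is stored inline as two bytes, whereas a `Vector{Bool}` field points to a separately allocated array.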
- …settings_from_population using exact pre-allocation (construct_and_add_settings!) and Counting Sort, sorting and grouping populations into settings substantially faster.
- …Population(df::DataFrame) constructor to extract DataFrame columns into a strongly-typed NamedTuple before the loop, bypassing slow row-by-row DataFrame indexing.
- …precompute_ags! to calculate and statically assign the AGS (Amtlicher Gemeindeschlüssel) field for all ContainerSettings during initialization, preventing recursive dynamic lookups during runtime.
- …comorbidities::Vector{Bool} in the Individual struct with a primitive UInt16 bitmask. This eliminates a vector allocation for every single agent, reducing the simulation's baseline memory footprint and improving cache locality.
Unit-Testing
- …settings_tuple outputs, buffer-aware contact sampling, and recursive setting activation.
- …is_dormant accurately evaluates state, logger updates correctly trip the wake-up triggers, and catch_up_logs! accurately backfills skipped ticks.