HIT-Leiden Explained for Software Developers

HIT-Leiden is an algorithm for efficiently maintaining Leiden communities in large dynamic graphs.

Why this algorithm exists

Classic Leiden is great for static graphs, but production systems usually deal with dynamic edges (arrivals/removals) and need fast updates. HIT-Leiden keeps Leiden’s quality goals while avoiding full recomputation on every change.

If you like mental models: this is Leiden with incremental deltas + hierarchy-aware maintenance + mode-dependent execution (deterministic vs throughput).

Core concepts (developer-friendly)

Graph: $G=(V,E)$
Delta graph: $\Delta G$ (edge updates to apply)
Level-(p) supergraph: $G^p$
Community map: $f^p(\cdot)$
Refined community map: $g^p(\cdot)$
Subcommunity map: $s^p(\cdot)$
Resolution: $\gamma$ (modularity trade-off knob)

In implementation terms, these are mostly arrays/vectors indexed by node or supernode.

Modularity definitions (what we optimize)

HIT-Leiden uses modularity as its objective during movement/refinement decisions.

Global modularity

For an undirected weighted graph, modularity is:

$$ Q = \frac{1}{2m}\sum_{i,j}\left(A_{ij} - \gamma\frac{k_i k_j}{2m}\right),\delta(c_i,c_j) $$

Where:

$A_{ij}$ is edge weight between nodes $i$ and $j$,
$k_i$ is weighted degree of node $i$,
$2m = \sum_i k_i$,
$c_i$ is the community label for node $i$,
$\delta(c_i,c_j)=1$ if same community, else $0$.

Intuition: higher $Q$ means more internal edge weight than expected under a degree-preserving random baseline.

Local move gain used in `inc-movement`

For a node $v$ moving from current community $C$ to candidate $C'$, the code evaluates a local gain of the form:

$$ \Delta Q(v: C \to C') = \frac{w(v,C') - w(v,C)}{2m}

\gamma,\frac{k_v\left(d(C) - k_v - d(C')\right)}{(2m)^2} $$

Where:

$w(v,C)$ is total edge weight from $v$ to nodes in community $C$,
$k_v$ is degree of $v$,
$d(C)=\sum_{u\in C}k_u$ is total degree mass of community $C$.

Moves are applied only when $\Delta Q > 0$ (and in throughput mode, batch guards apply as well).

Reporting note

The run-level quality score (compute_modularity) reports standard modularity from the final partition. Movement/refinement steps still use the configured resolution parameter $\gamma$ for local decisions.

End-to-end flow

Build/load a graph (GraphInput -> InMemoryGraph).
Choose mode:
- Deterministic: stable, single-thread style logic.
- Throughput: parallel movement/refinement with guardrails.
For each hierarchy level $p$:
- Apply updates: $G^p \leftarrow G^p \oplus \Delta G^p$.
- Run incremental movement.
- Run incremental refinement.
- Run incremental aggregation (unless skipped).
Run deferred update from top level down.
Output final map from refined level-1 assignment.

Algorithm walkthrough (1–6) in practical terms

1) Leiden baseline (Algorithm 1)

At each level:

move nodes to improve modularity,
refine communities to preserve connectivity quality,
aggregate into a coarser graph for the next level.

Then project results back to original nodes.

2) Incremental movement (Algorithm 2)

This is the local re-assignment step triggered by changes.

What it does:

Build active set $A$ from changed edges in $\Delta G$:
- positive inter-community changes activate endpoints,
- negative intra-community changes activate endpoints.
Pop active nodes and evaluate best destination community via modularity gain $\Delta Q$.
Move only when gain is positive.
Record changed nodes ($B$) and refinement-affected nodes ($K$).
Re-activate neighbors when local structure changed.

3) Incremental refinement (Algorithm 3)

Refinement keeps communities structurally healthy.

What it does:

Split disconnected components into new subcommunities.
For singleton refined nodes, evaluate merge targets in the same parent community.
Use T-filter style check (only consider targets where removing target subcommunity is not quality-improving).
Accept best positive-gain merge.

4) Incremental aggregation (Algorithm 4)

Converts refinement changes into next-level supergraph deltas.

What it does:

Start with lifted edge deltas from node space to subcommunity space.
For refined nodes, subtract old subcommunity edge contributions and add new ones.
Transfer self-loop contributions when needed.
Compress duplicate superedges by summing weights.

5) Deferred update (Algorithm 5)

After per-level operations, propagate changes top-down:

update $f^p$ from $f^{p+1}$ using subcommunity links,
propagate changed-node sets to lower levels.

6) HIT-Leiden orchestration (Algorithm 6)

At each level $p$:

Apply graph delta to current supergraph.
Run inc-movement -> returns changed/affected sets.
Run inc-refinement -> returns refined set.
If not last level, run inc-aggregation to produce $\Delta G^{p+1}$.
After loop, run deferred updates and finalize output.

Throughput mode behavior (implemented today)

Throughput mode is not just “run in parallel”; it adds correctness/quality protections.

T1. Decoupled parallel movement selection

Parallel workers generate candidate moves first, then a global selector keeps only compatible moves.

Rules:

maintain emitter set $E_{emit}$ and acceptor set $E_{acc}$,
reject move if it emits from an acceptor or accepts into an emitter,
enforce one move per node.

This reduces conflicting batch moves that can harm quality.

T2. Monotonicity guard

Let selected batch be $\hat{M}$. Compute total gain: $$ \Sigma\Delta Q(\hat{M}) = \sum_{m\in\hat{M}}\Delta Q(m) $$ If total gain is non-positive, skip applying that batch.

T3. Aggregation skip optimization

Skip aggregation when both are true:

no delta edges at level $p$,
no refined nodes at level $p$.

Formally: $$ \operatorname{skip}(p) = (|\Delta G^p|=0) \land (|R^p|=0) $$

Throughput safety gate in multilevel movement

The standalone multilevel movement path keeps parallel movement disabled by a safety gate until quality parity is fully acceptable.

Rust implementation crosswalk (quick reference)

Core state

Concept	Rust symbol
input graph $G$	`GraphInput`, `InMemoryGraph`
dynamic updates $\Delta G$	`delta_g`, `delta_graph`, `current_delta`
final map $f(\cdot)$	`state.node_to_comm`
per-level map $f^p$	`state.community_mapping_per_level[p]`
per-level refined map $g^p$	`state.refined_community_mapping_per_level[p]`
previous/current subcommunity maps	`previous_subcommunity_mapping_per_level[p]`, `current_subcommunity_mapping_per_level[p]`

Main algorithm entry points

Spec step	Rust function
HIT-Leiden loop	`hit_leiden(...)`
movement	`inc_movement(...)`
refinement	`inc_refinement(...)`
aggregation	`inc_aggregation(...)`
deferred update	`def_update(...)`
multilevel static-style path	`multilevel_leiden(...)`

Throughput-specific symbols

Throughput concept	Rust symbol
move candidate record	`MoveCandidate { node, from_comm, to_comm, node_degree, gain }`
shard proposal stage	`execute_shard(...)`
batch orchestrator	`inc_movement_parallel(...)`
decoupling selector	`select_decoupled_moves(...)`
selected moves $\hat{M}$	`selected_moves`
emitter / acceptor sets	`emitters`, `acceptors`
one-move-per-node guard	`moved_nodes`
total selected gain	`total_gain`
aggregation skip predicate	`should_skip_aggregation(...)`

Practical notes for contributors

If you change movement semantics, verify both quality and speed.
Throughput-mode changes should include tests for:
- conflict handling,
- monotonicity guard behavior,
- aggregation skip correctness.
Keep docs and symbol crosswalks updated when renaming key structs/functions.

In short: the math spec defines the contract, and this guide tells you how to reason about that contract when writing or reviewing Rust code.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HIT-Leiden Explained for Software Developers

Why this algorithm exists

Core concepts (developer-friendly)

Modularity definitions (what we optimize)

Global modularity

Local move gain used in `inc-movement`

Reporting note

End-to-end flow

Algorithm walkthrough (1–6) in practical terms

1) Leiden baseline (Algorithm 1)

2) Incremental movement (Algorithm 2)

3) Incremental refinement (Algorithm 3)

4) Incremental aggregation (Algorithm 4)

5) Deferred update (Algorithm 5)

6) HIT-Leiden orchestration (Algorithm 6)

Throughput mode behavior (implemented today)

T1. Decoupled parallel movement selection

T2. Monotonicity guard

T3. Aggregation skip optimization

Throughput safety gate in multilevel movement

Rust implementation crosswalk (quick reference)

Core state

Main algorithm entry points

Throughput-specific symbols

Practical notes for contributors

FilesExpand file tree

hit_leiden_explained.md

Latest commit

History

hit_leiden_explained.md

File metadata and controls

HIT-Leiden Explained for Software Developers

Why this algorithm exists

Core concepts (developer-friendly)

Modularity definitions (what we optimize)

Global modularity

Local move gain used in inc-movement

Reporting note

End-to-end flow

Algorithm walkthrough (1–6) in practical terms

1) Leiden baseline (Algorithm 1)

2) Incremental movement (Algorithm 2)

3) Incremental refinement (Algorithm 3)

4) Incremental aggregation (Algorithm 4)

5) Deferred update (Algorithm 5)

6) HIT-Leiden orchestration (Algorithm 6)

Throughput mode behavior (implemented today)

T1. Decoupled parallel movement selection

T2. Monotonicity guard

T3. Aggregation skip optimization

Throughput safety gate in multilevel movement

Rust implementation crosswalk (quick reference)

Core state

Main algorithm entry points

Throughput-specific symbols

Practical notes for contributors

Local move gain used in `inc-movement`