[WIP] Replace rbind.data.frame with data.table for efficient row binding #1798

Closed
Claude wants to merge 1 commit into main from claude/replace-rbind-with-data-table


@Claude Claude AI commented Mar 4, 2026

Thanks for assigning this issue to me. I'm starting to work on it and will keep this PR's description up to date as I form a plan and make progress.

Original prompt

This section details the original issue you should resolve

<issue_title>Replace rbind.data.frame with data.table for efficient row binding</issue_title>
<issue_description>## Issue Overview

Replace inefficient rbind.data.frame operations with data.table::rbindlist for binding many small data frames in population simulation result processing.

Current Problem

File: R/utilities-simulation-results.R:94-97

allIndividualProperties <- do.call(
  rbind.data.frame,
  c(individualPropertiesCache, stringsAsFactors = FALSE)
)

Issues:

  • rbind.data.frame is slow for binding many small data frames
  • Creates intermediate copies during the binding process
  • For 1000 individuals: combines 1000 separate list structures into one data frame
  • Potential O(n²) complexity in worst case due to repeated memory allocations

Proposed Solution

Option 1: Use data.table for efficient row binding

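# The call is namespace-qualified, so library(data.table) is not needed
# (and should not be called from package code).
# rbindlist() returns a data.table; as.data.frame() keeps the existing
# data.frame output format.
allIndividualProperties <- as.data.frame(
  data.table::rbindlist(
    individualPropertiesCache,
    use.names = TRUE
  )
)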

Option 2: Pre-allocate and fill by columns

nRows <- length(individualIds) * valueLength
allIndividualProperties <- data.frame(
  IndividualId = integer(nRows),
  Time = numeric(nRows),
  stringsAsFactors = FALSE
)
# Pre-allocate columns for each covariate
for (covariateName in covariateNames) {
  allIndividualProperties[[covariateName]] <- numeric(nRows)
}
# Fill in values more efficiently
rowIdx <- 1
for (individualIndex in seq_along(individualIds)) {
  endIdx <- rowIdx + valueLength - 1
  allIndividualProperties$IndividualId[rowIdx:endIdx] <- individualIds[individualIndex]
  # ... fill other columns
  rowIdx <- endIdx + 1
}
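
The index-based loop above can often be replaced by vectorized column construction. The following is a minimal sketch, assuming every individual shares the same time grid; `timeValues` and the sample IDs are hypothetical and not taken from the package, while `individualIds` and `valueLength` follow the names used in the issue.

```r
# Hypothetical inputs for illustration only
individualIds <- c(0L, 1L, 2L)
timeValues <- c(0, 0.5, 1) # assumed: one time grid shared by all individuals
valueLength <- length(timeValues)

# Build each column in a single vectorized call instead of filling row ranges
allIndividualProperties <- data.frame(
  IndividualId = rep(individualIds, each = valueLength),
  Time = rep(timeValues, times = length(individualIds)),
  stringsAsFactors = FALSE
)
```

Covariate columns could be added the same way with `rep(..., each = valueLength)`, provided each covariate has one value per individual.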

Expected Impact

  • Priority: HIGH
  • Estimated Impact: 40-60% reduction in data frame construction time
  • Effort: Low
  • Testing: Benchmark with various population sizes (100, 1000, 10000 individuals)
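
The benchmark mentioned in the testing bullet could be sketched roughly as follows. `makeCache()` and its column layout are hypothetical stand-ins for the real cache, and the timings depend on the machine, so this only shows how a comparison might be set up. The `data.table` branch is skipped when the package is not installed.

```r
# Hypothetical helper: a cache of n one-row data frames (shape assumed, for timing only)
makeCache <- function(n) {
  lapply(seq_len(n), function(i) {
    data.frame(IndividualId = i, Weight = 70 + i %% 10, stringsAsFactors = FALSE)
  })
}

# 10000L can be appended to the sizes, but the base R variant may take a while
for (n in c(100L, 1000L)) {
  cache <- makeCache(n)

  baseTime <- system.time(
    baseResult <- do.call(rbind.data.frame, c(cache, stringsAsFactors = FALSE))
  )[["elapsed"]]
  message(sprintf("n = %d, rbind.data.frame: %.3f s", n, baseTime))

  if (requireNamespace("data.table", quietly = TRUE)) {
    dtTime <- system.time(
      dtResult <- data.table::rbindlist(cache, use.names = TRUE)
    )[["elapsed"]]
    message(sprintf("n = %d, rbindlist:        %.3f s", n, dtTime))
  }
}
```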

Implementation Notes

  • Affects all population simulation result processing
  • Verify that data.table is available as a dependency or add it
  • Ensure backward compatibility of output format
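
One way to address the dependency and output-format notes together is a small wrapper that falls back to base R when `data.table` is unavailable and always returns a plain data frame. This is a sketch only: `bindIndividualProperties()` is a hypothetical name, not an existing function in the package, and the sample cache is made up. If `data.table` is declared under `Imports` in DESCRIPTION, the fallback branch would be unnecessary.

```r
# Hypothetical helper, not part of the package
bindIndividualProperties <- function(cache) {
  if (requireNamespace("data.table", quietly = TRUE)) {
    # rbindlist() returns a data.table; convert so callers still get a data.frame
    return(as.data.frame(data.table::rbindlist(cache, use.names = TRUE)))
  }
  # Fallback: the original base R approach
  do.call(rbind.data.frame, c(cache, stringsAsFactors = FALSE))
}

# Usage with a made-up cache of two one-row data frames
exampleCache <- list(
  data.frame(IndividualId = 1L, Weight = 70, stringsAsFactors = FALSE),
  data.frame(IndividualId = 2L, Weight = 80, stringsAsFactors = FALSE)
)
result <- bindIndividualProperties(exampleCache)
```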

Parent Issue

This is part of the comprehensive performance optimization analysis in #1765


Parent Issue: #1765</issue_description>

Comments on the Issue (you are @claude[agent] in this section)

@Claude Claude AI assigned Claude and PavelBal Mar 4, 2026
Copilot stopped work on behalf of PavelBal due to an error March 4, 2026 16:43
@PavelBal PavelBal closed this Mar 5, 2026
@PavelBal PavelBal deleted the claude/replace-rbind-with-data-table branch March 5, 2026 13:49
