Description
Optimize repeated pattern matching and string operations by caching results, reducing redundant regex calls, and using more efficient string manipulation methods.
Current Implementation Issues
R/MappedData.R - Nested gsub() calls
Lines 153-155: Nested string replacements
- Pattern: gsub("_x", "", gsub("_y", "", ...))
- Creates intermediate strings unnecessarily
- Can be combined into single regex or cached
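A minimal sketch of the combined form, assuming the nested calls only strip literal "_x" and "_y" suffixes. The `cols` vector is a hypothetical stand-in for the real input at lines 153-155.

```r
# Hypothetical column names standing in for the real input
cols <- c("value_x", "value_y", "time", "error_x")

# Original form: two passes, one intermediate character vector
nested <- gsub("_x", "", gsub("_y", "", cols))

# Single pass using a character class
combined <- gsub("_[xy]", "", cols)

# Both forms agree on this input
identical(nested, combined)
```

The two forms can differ when removing "_y" creates a new "_x" (for example "a__yx"), so the change should be checked against the actual inputs before it is applied.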
Line 464: gsub() in loop
- gsub("y", private$direction, ...) called repeatedly in loop
- Same pattern replacement done multiple times
- Should cache result or move outside loop
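Moving the call outside the loop could look like the sketch below, assuming the replaced string does not depend on the loop variable. The names `direction`, `template`, and `groups` are hypothetical; `direction` plays the role of `private$direction`.

```r
# Hypothetical stand-ins for the values used at line 464
direction <- "x"
template <- "ymin"
groups <- c("a", "b", "c")

# Compute the loop-invariant replacement once, before the loop
aesthetic <- gsub("y", direction, template)

labels <- character(length(groups))
for (i in seq_along(groups)) {
  # Only the per-iteration work remains inside the loop
  labels[i] <- paste(groups[i], aesthetic, sep = ":")
}
labels
```

If the input to gsub() does change on each iteration, this hoisting does not apply and vectorizing the call over all inputs is the alternative.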
R/utilities-defaults.R - Repeated pattern replacements
Lines 322-327: Multiple gsub() calls
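- gsub() called multiple times with pattern "ospsuite.plots.geom"
- Same pattern used repeatedly
- Could store the pattern in a single constant and, if the match is meant to be literal, pass fixed = TRUE (the unescaped dots otherwise match any character)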
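A sketch of the constant-plus-literal-match approach, assuming the prefix is meant to be matched literally. The option names and the `GEOM_PREFIX` constant are hypothetical and only illustrate the shape of the change.

```r
# Hypothetical option names; the real ones are in R/utilities-defaults.R
option_names <- c(
  "ospsuite.plots.geomExampleA",
  "ospsuite.plots.geomExampleB"
)

# Define the prefix once instead of repeating the literal
GEOM_PREFIX <- "ospsuite.plots.geom"

# fixed = TRUE treats the dots literally and skips regex processing
short_names <- gsub(GEOM_PREFIX, "", option_names, fixed = TRUE)

# As a regex, "." matches any character, so this unrelated string is also hit
regex_hit <- gsub(GEOM_PREFIX, "", "ospsuiteXplotsXgeomA")
fixed_hit <- gsub(GEOM_PREFIX, "", "ospsuiteXplotsXgeomA", fixed = TRUE)
```

Here `regex_hit` is stripped down to "A" while `fixed_hit` is left unchanged, which is the behavioral difference to keep in mind when switching.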
R/utilities_export.R - Sequential grepl() calls
Lines 166-171: Multiple pattern checks
- Multiple grepl() calls in sequence for pattern matching
- Each call scans the entire string
- Could combine patterns or use single regex with alternatives
Line 327: gsub() in loop
- Character replacement inside loop
- Same substitution applied repeatedly
R/utilities.R - Redundant string operations
Lines 177-180: Multiple trimws() calls
- trimws(label) called twice
- trimws(unit) called twice
- Should cache trimmed values
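Caching the trimmed values might look like the following sketch. The function `build_axis_label` and its bracket formatting are hypothetical and do not reproduce the actual code at lines 177-180.

```r
# Hypothetical helper; the real function lives in R/utilities.R
build_axis_label <- function(label, unit) {
  # Trim each input once and reuse the result below
  label <- trimws(label)
  unit <- trimws(unit)

  if (nzchar(unit)) {
    paste0(label, " [", unit, "]")
  } else {
    label
  }
}

build_axis_label("  Concentration ", " mg/l ")
```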
Suggested Implementation
1. Cache String Operations
Before:
for (item in items) {
  cleaned <- gsub("pattern", "replacement", item)
  process(cleaned)
}
After:
cleaned_items <- gsub("pattern", "replacement", items) # Vectorized
for (cleaned in cleaned_items) {
  process(cleaned)
}
2. Combine Multiple Patterns
Before:
if (grepl("pattern1", text)) { }
if (grepl("pattern2", text)) { }
After:
if (grepl("pattern1|pattern2", text)) { } # Only valid when both branches perform the same action
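The equivalence can be checked on a small input before refactoring. In this sketch the file names and extensions are hypothetical and chosen only for illustration.

```r
# Hypothetical file names, for illustration only
files <- c("plot.png", "table.csv", "figure.pdf", "notes.txt")

# Sequential form: two scans per element
is_image_seq <- grepl("\\.png$", files) | grepl("\\.pdf$", files)

# Combined form: one scan using alternation
is_image_alt <- grepl("\\.(png|pdf)$", files)

# Both forms give the same logical vector
identical(is_image_seq, is_image_alt)
```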
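3. Vectorize Regex Calls (base R has no user-facing pre-compiled regex objects)
Before:
for (text in texts) {
  matched <- grepl("complex.*pattern", text) # Overwritten on every iteration
}
After:
matched <- grepl("complex.*pattern", texts) # Vectorized: one call, one logical per element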
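A minimal sketch of the vectorized form. grepl() accepts a character vector and returns one logical per element, whereas gregexpr() returns a list of match positions, so grepl() is the direct replacement for the loop. The `texts` vector is hypothetical.

```r
# Hypothetical input strings
texts <- c("complex test pattern", "no match here", "complexpattern")

# Loop form that keeps every result instead of only the last one
matched_loop <- logical(length(texts))
for (i in seq_along(texts)) {
  matched_loop[i] <- grepl("complex.*pattern", texts[i])
}

# Vectorized form: one call for the whole vector
matched_vec <- grepl("complex.*pattern", texts)

identical(matched_loop, matched_vec)
```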
4. Use stringi for Performance
Consider using the stringi package for complex string operations. It is often faster than base R on large inputs, at the cost of an additional dependency.
Expected Benefits
- Faster string processing (up to 10x for large datasets)
- Reduced redundant regex compilation
- Lower CPU usage
- Better scalability for text-heavy operations
Implementation Notes
Files to Modify
R/MappedData.R
R/utilities-defaults.R
R/utilities_export.R
R/utilities.R
Testing
- Run existing unit tests
- Add tests for edge cases in string operations
- Verify all pattern replacements work correctly
- Test with special characters and Unicode
- Consider performance benchmarks for large datasets