Commit 72ff465

[Fix] --json-report: field parity, unified JSON escaping, O(N) list rendering at scale; issue #482
[Fix] --json-report list: reports[] entries now include a "path" field, matching the text-mode list output; active[] entries gain the lifecycle schema (eta, workers, sig_version, progress{}) so --json-report list and -L/--list-active emit the same shape; stopped[] entries gain an elapsed field; all string fields in active[] and stopped[] are now JSON-escaped via the shared _json_escape_string helper (previously only "path" was escaped, so stage, engine, stopped_hr, stages, and sig_version could emit invalid JSON if scan.meta contained quotes, backslashes, or control characters); issue #482
[Fix] --json-report list: scaling regression at large session counts. Before the fix, the function effectively hung at ~20K indexed sessions because of two compounding costs: (1) _seen_ids was built as a whitespace-joined string with glob-pattern dedup (O(N^2)), and (2) one subshell was forked per report for path escaping. Dedup now uses a function-scoped local -A associative array (O(N)), and the reports[]/legacy hot loops use the new _json_escape_var out-parameter helper instead of $() command substitution. Measured: 20K reports 82s -> 1.7s; 50K reports ~1.7s (linear). Regression guard: tests/31-json-report.bats test 27 runs a 10K-entry synthetic index under a 30s timeout
[Change] Lifecycle JSON (-L --format json): "scan_id" is now the canonical field name; "scanid" remains as a deprecated alias for one release cycle and will be removed in v2.1.0. Consumers should switch to "scan_id"
[Change] Lifecycle JSON (-L --format json): "workers" field type normalized to an unquoted number to match the field's underlying integer value
[Change] _json_escape_string: promoted from lmd_hook.sh (optional sub-lib) to the lmd.lib.sh shared utilities so all JSON emitters (scan list, active lifecycle, post-scan hook, and ELK dispatch) share one helper. Three call sites in lmd_alert.sh that previously used _alert_json_escape (from the vendored alert_lib) now use the project-owned helper; vendored library coupling is reduced to the lib's internal self-use. A sibling out-parameter helper, _json_escape_var (sets _JSON_ESC_OUT), is also defined for hot loops that must avoid a subshell fork per iteration
[New] tests/31-json-report.bats: four regression cases covering reports[] path parity (issue #482), JSON validity with special-character paths, total_quarantined field presence, and linear scaling at 10K sessions
[Change] tests/46-post-scan-hook.bats: J-15 structural guard now points to lmd.lib.sh (the new home for _json_escape_string)
1 parent 9a80cb2 commit 72ff465

8 files changed

+251
-56
lines changed


CHANGELOG

Lines changed: 32 additions & 0 deletions
@@ -152,6 +152,38 @@ v2.0.1 | Mar 25 2026:
 
 -- Bug Fixes --
 
+[Fix] --json-report list: reports[] entries now include a "path" field, matching
+the text-mode list output; active[] entries gain the lifecycle schema
+(eta, workers, sig_version, progress{}) so --json-report list and
+-L/--list-active emit the same shape; stopped[] entries gain an elapsed
+field; all string fields in active[] and stopped[] are now JSON-escaped
+via the shared _json_escape_string helper (previously only "path" was
+escaped — stage, engine, stopped_hr, stages, sig_version could emit
+invalid JSON if scan.meta contained quotes, backslashes, or control
+characters); issue #482
+[Change] Lifecycle JSON (-L --format json): "scan_id" is now the canonical
+field name; "scanid" remains as a deprecated alias for one release cycle
+and will be removed in v2.1.0. Consumers should switch to "scan_id"
+[Change] Lifecycle JSON (-L --format json): "workers" field type normalized to
+unquoted number to match the field's underlying integer value
+[Change] _json_escape_string: promoted from lmd_hook.sh (optional sub-lib) to
+lmd.lib.sh shared utilities so all JSON emitters — scan list, active
+lifecycle, post-scan hook, and ELK dispatch — share one helper. Three
+call sites in lmd_alert.sh previously using _alert_json_escape (from the
+vendored alert_lib) now use the project-owned helper; vendored library
+coupling reduced to the lib's internal self-use. A sibling out-parameter
+helper _json_escape_var (sets _JSON_ESC_OUT) is also defined for hot
+loops that must avoid a subshell fork per iteration
+[Fix] --json-report list: scaling regression at large session counts. Pre-fix
+the function exhibited effective hang at ~20K indexed sessions from two
+compounding costs: (1) _seen_ids built as a whitespace-joined string
+that grew O(N^2) with glob-pattern dedup, and (2) one subshell fork per
+report for path escaping. Dedup now uses a function-scoped local -A
+associative array (O(N) total), and the reports[]/legacy hot loops use
+the new _json_escape_var out-parameter helper instead of $() command
+substitution. Measured: 20K reports 82s -> 1.7s; 50K reports now ~1.7s
+(linear). Regression guard: tests/31-json-report.bats test 27 runs a
+10K-entry synthetic index under a 30s timeout
 [Fix] clamav_linksigs: guard mktemp staging failure to prevent writing signature
 files into the filesystem root. When mktemp -d fails, the empty _staging
 variable caused "cp -f ... /" as root; pr#478
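The quoted-to-unquoted change to "workers" that is visible in this commit's diff is in the stopped[] printf of _lmd_render_json_list (lmd_alert.sh). A minimal before/after sketch of that one field follows; old_workers_json and new_workers_json are hypothetical wrappers around the removed and added lines, not functions from the project. A consumer that compared the field as a string sees "4" become 4, and the "-" placeholder become 0.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers; the bodies mirror the removed and added
# stopped[] lines in _lmd_render_json_list.

old_workers_json() {
    local _meta_workers="$1"
    # Removed form: value quoted; empty collapses to the "-" placeholder string
    printf '"workers": "%s"' "${_meta_workers:--}"
}

new_workers_json() {
    local _meta_workers="$1"
    # Added form: empty or "-" collapses to 0; value emitted as a JSON number
    local _jl_sworkers="${_meta_workers:-0}"; [ "$_jl_sworkers" = "-" ] && _jl_sworkers=0
    printf '"workers": %s' "$_jl_sworkers"
}

# Compare both forms for a real count, the placeholder, and an empty value
for w in 4 - ""; do
    printf '%-16s | %s\n' "$(old_workers_json "$w")" "$(new_workers_json "$w")"
done
```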

CHANGELOG.RELEASE

Lines changed: 32 additions & 0 deletions
@@ -152,6 +152,38 @@ v2.0.1 | Mar 25 2026:
 
 -- Bug Fixes --
 
+[Fix] --json-report list: reports[] entries now include a "path" field, matching
+the text-mode list output; active[] entries gain the lifecycle schema
+(eta, workers, sig_version, progress{}) so --json-report list and
+-L/--list-active emit the same shape; stopped[] entries gain an elapsed
+field; all string fields in active[] and stopped[] are now JSON-escaped
+via the shared _json_escape_string helper (previously only "path" was
+escaped — stage, engine, stopped_hr, stages, sig_version could emit
+invalid JSON if scan.meta contained quotes, backslashes, or control
+characters); issue #482
+[Change] Lifecycle JSON (-L --format json): "scan_id" is now the canonical
+field name; "scanid" remains as a deprecated alias for one release cycle
+and will be removed in v2.1.0. Consumers should switch to "scan_id"
+[Change] Lifecycle JSON (-L --format json): "workers" field type normalized to
+unquoted number to match the field's underlying integer value
+[Change] _json_escape_string: promoted from lmd_hook.sh (optional sub-lib) to
+lmd.lib.sh shared utilities so all JSON emitters — scan list, active
+lifecycle, post-scan hook, and ELK dispatch — share one helper. Three
+call sites in lmd_alert.sh previously using _alert_json_escape (from the
+vendored alert_lib) now use the project-owned helper; vendored library
+coupling reduced to the lib's internal self-use. A sibling out-parameter
+helper _json_escape_var (sets _JSON_ESC_OUT) is also defined for hot
+loops that must avoid a subshell fork per iteration
+[Fix] --json-report list: scaling regression at large session counts. Pre-fix
+the function exhibited effective hang at ~20K indexed sessions from two
+compounding costs: (1) _seen_ids built as a whitespace-joined string
+that grew O(N^2) with glob-pattern dedup, and (2) one subshell fork per
+report for path escaping. Dedup now uses a function-scoped local -A
+associative array (O(N) total), and the reports[]/legacy hot loops use
+the new _json_escape_var out-parameter helper instead of $() command
+substitution. Measured: 20K reports 82s -> 1.7s; 50K reports now ~1.7s
+(linear). Regression guard: tests/31-json-report.bats test 27 runs a
+10K-entry synthetic index under a 30s timeout
 [Fix] clamav_linksigs: guard mktemp staging failure to prevent writing signature
 files into the filesystem root. When mktemp -d fails, the empty _staging
 variable caused "cp -f ... /" as root; pr#478

files/internals/lmd.lib.sh

Lines changed: 19 additions & 0 deletions
@@ -155,6 +155,25 @@ get_filestat() {
 md5_hash="${_md5out%% *}"
 }
 
+# Backslash MUST be escaped first — subsequent substitutions insert \ characters
+# that would otherwise get double-escaped.
+#
+# Two call patterns:
+# $(_json_escape_string "$x") — readable; forks a subshell (use in cold paths)
+# _json_escape_var "$x"; use "$_JSON_ESC_OUT" — zero forks (hot loops)
+_json_escape_var() {
+_JSON_ESC_OUT="${1//\\/\\\\}"
+_JSON_ESC_OUT="${_JSON_ESC_OUT//\"/\\\"}"
+_JSON_ESC_OUT="${_JSON_ESC_OUT//$'\t'/\\t}"
+_JSON_ESC_OUT="${_JSON_ESC_OUT//$'\r'/\\r}"
+_JSON_ESC_OUT="${_JSON_ESC_OUT//$'\n'/\\n}"
+}
+
+_json_escape_string() {
+_json_escape_var "$1"
+printf '%s' "$_JSON_ESC_OUT"
+}
+
 ## Source vendored libraries
 
 if [ -f "$tlog_lib" ]; then
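A standalone sketch of the two call patterns named in the new comment. The helper bodies are copied from the added lines so the script runs on its own; demo_stage, cold, hot and wrong are illustrative names only. Reversing the first two substitutions shows what the backslash-first invariant prevents: the backslash inserted before each quote gets doubled, and the resulting \\" ends the JSON string early.

```shell
#!/usr/bin/env bash
# Helpers copied from the lmd.lib.sh hunk so this sketch runs standalone.
_json_escape_var() {
    _JSON_ESC_OUT="${1//\\/\\\\}"               # backslash first (invariant)
    _JSON_ESC_OUT="${_JSON_ESC_OUT//\"/\\\"}"   # then double quote
    _JSON_ESC_OUT="${_JSON_ESC_OUT//$'\t'/\\t}"
    _JSON_ESC_OUT="${_JSON_ESC_OUT//$'\r'/\\r}"
    _JSON_ESC_OUT="${_JSON_ESC_OUT//$'\n'/\\n}"
}

_json_escape_string() {
    _json_escape_var "$1"
    printf '%s' "$_JSON_ESC_OUT"
}

# Illustrative scan.meta value: two quotes, a backslash and a newline
demo_stage=$'scan "a"\\b\nnext'

# Cold path: readable, but each $() forks a subshell
cold=$(_json_escape_string "$demo_stage")

# Hot path: no fork; copy the global before the next call overwrites it
_json_escape_var "$demo_stage"
hot="$_JSON_ESC_OUT"

# Wrong order (quote first, then backslash) doubles the inserted backslash
wrong="${demo_stage//\"/\\\"}"
wrong="${wrong//\\/\\\\}"

printf '{"stage": "%s"}\n' "$hot"
```

Because _JSON_ESC_OUT is a single global, a caller that needs several escaped values at once has to copy each result into its own variable before the next call.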

files/internals/lmd_alert.sh

Lines changed: 52 additions & 19 deletions
@@ -304,9 +304,9 @@ _lmd_elk_post_hits() {
 [ -z "$_sig" ] && continue
 [[ "$_sig" == "#"* ]] && continue # skip header
 [ "$_quarpath" = "-" ] && _quarpath=""
-_sig=$(_alert_json_escape "$_sig")
-_filepath=$(_alert_json_escape "$_filepath")
-_hash=$(_alert_json_escape "${_hash:--}")
+_sig=$(_json_escape_string "$_sig")
+_filepath=$(_json_escape_string "$_filepath")
+_hash=$(_json_escape_string "${_hash:--}")
 $curl --output /dev/null --silent --show-error \
 -XPOST "$elk_url" \
 -H 'Content-Type: application/json' \
@@ -717,15 +717,28 @@ _lmd_render_json_list() {
 _lifecycle_read_meta "$_jl_scanid" || continue
 [ "$_first_active" != "1" ] && printf ","
 _first_active=0
-local _jl_path="${_meta_path//\\/\\\\}"
-_jl_path="${_jl_path//\"/\\\"}"
+local _jl_path _jl_engine _jl_sig
+_jl_path=$(_json_escape_string "$_meta_path")
+_jl_engine=$(_json_escape_string "${_meta_engine:--}")
+_jl_sig=$(_json_escape_string "${_meta_sig_version:--}")
 local _jl_pid="${_meta_pid:-0}"; [ "$_jl_pid" = "-" ] && _jl_pid=0
 local _jl_files="${_meta_total_files:-0}"; [ "$_jl_files" = "-" ] && _jl_files=0
 local _jl_hits="${_meta_hits:-0}"; [ "$_jl_hits" = "-" ] && _jl_hits=0
 local _jl_elapsed="${_meta_elapsed:-0}"; [ "$_jl_elapsed" = "-" ] && _jl_elapsed=0
-printf '\n {"scan_id": "%s", "state": "%s", "pid": %s, "path": "%s", "engine": "%s", "total_files": %s, "hits": %s, "elapsed": %s}' \
+# Live elapsed for running scans (matches _lifecycle_render_json_active)
+if [ "$_jl_elapsed" = "0" ] && [ -n "$_meta_started" ] && [ "$_meta_started" != "0" ]; then
+_jl_elapsed="$(( $(command date +%s) - _meta_started ))"
+fi
+local _jl_workers="${_meta_workers:-0}"; [ "$_jl_workers" = "-" ] && _jl_workers=0
+local _jl_prog_pos="${_meta_progress_pos:-0}"; [ "$_jl_prog_pos" = "-" ] && _jl_prog_pos=0
+local _jl_prog_total="${_meta_progress_total:-0}"; [ "$_jl_prog_total" = "-" ] && _jl_prog_total=0
+local _jl_eta
+_jl_eta=$(_lifecycle_compute_eta "${_meta_engine:-}" "$_jl_elapsed" "$_jl_prog_pos" "$_jl_prog_total")
+printf '\n {"scan_id": "%s", "state": "%s", "pid": %s, "path": "%s", "engine": "%s", "total_files": %s, "hits": %s, "elapsed": %s, "eta": %s, "workers": %s, "sig_version": "%s", "progress": {"position": %s, "total": %s}}' \
 "$_jl_scanid" "$_jl_state" "$_jl_pid" "$_jl_path" \
-"${_meta_engine:--}" "$_jl_files" "$_jl_hits" "$_jl_elapsed"
+"$_jl_engine" "$_jl_files" "$_jl_hits" "$_jl_elapsed" \
+"$_jl_eta" "$_jl_workers" "$_jl_sig" \
+"$_jl_prog_pos" "$_jl_prog_total"
 ;;
 esac
 done
@@ -743,20 +756,27 @@ _lmd_render_json_list() {
 _lifecycle_read_meta "$_jl_stopped_sid" || continue
 [ "$_first_stopped" != "1" ] && printf ","
 _first_stopped=0
-local _jl_sp="${_meta_path//\\/\\\\}"
-_jl_sp="${_jl_sp//\"/\\\"}"
+local _jl_sp _jl_sstage _jl_shr
+_jl_sp=$(_json_escape_string "$_meta_path")
+_jl_sstage=$(_json_escape_string "${_meta_stage:--}")
+_jl_shr=$(_json_escape_string "${_meta_stopped_hr:-unknown}")
 local _jl_sfiles="${_meta_total_files:-0}"; [ "$_jl_sfiles" = "-" ] && _jl_sfiles=0
 local _jl_shits="${_meta_hits:-0}"; [ "$_jl_shits" = "-" ] && _jl_shits=0
-printf '\n {"scan_id": "%s", "stage": "%s", "total_files": %s, "hits": %s, "workers": "%s", "stopped_hr": "%s", "path": "%s"}' \
-"$_jl_stopped_sid" "${_meta_stage:--}" "$_jl_sfiles" "$_jl_shits" \
-"${_meta_workers:--}" "${_meta_stopped_hr:-unknown}" "$_jl_sp"
+local _jl_selapsed="${_meta_elapsed:-0}"; [ "$_jl_selapsed" = "-" ] && _jl_selapsed=0
+local _jl_sworkers="${_meta_workers:-0}"; [ "$_jl_sworkers" = "-" ] && _jl_sworkers=0
+printf '\n {"scan_id": "%s", "stage": "%s", "total_files": %s, "hits": %s, "elapsed": %s, "workers": %s, "stopped_hr": "%s", "path": "%s"}' \
+"$_jl_stopped_sid" "$_jl_sstage" "$_jl_sfiles" "$_jl_shits" \
+"$_jl_selapsed" "$_jl_sworkers" "$_jl_shr" "$_jl_sp"
 fi
 done
 
 printf '\n ],\n "reports": ['
 local _first=1
 local _index_file="$sessdir/session.index"
-local _seen_ids=""
+# O(N) dedup between index and legacy passes; local -A is function-scoped
+# (bash 4.0+) and does not leak. String concat + glob match was O(N²) and
+# produced a visible hang on large indexes (~20K+ sessions).
+local -A _seen_ids
 
 # Rebuild index from TSV files if missing (first call on upgraded server)
 if [ ! -f "$_index_file" ]; then
@@ -775,7 +795,7 @@ _lmd_render_json_list() {
 _ix_path="$_ix_tot_quar"
 _ix_tot_quar="0"
 fi
-_seen_ids="$_seen_ids $_ix_scanid"
+_seen_ids["$_ix_scanid"]=1
 if [ "$_first" != "1" ]; then printf ","; fi
 _first=0
 printf '\n {'
@@ -792,8 +812,15 @@ _lmd_render_json_list() {
 local _jquar="${_ix_tot_quar:-0}"
 [ "$_jquar" = "-" ] && _jquar="0"
 printf '"total_quarantined": %s, ' "$_jquar"
-if [ "$_ix_elapsed" = "-" ]; then printf '"elapsed_seconds": null'
-else printf '"elapsed_seconds": %s' "$_ix_elapsed"; fi
+if [ "$_ix_elapsed" = "-" ]; then printf '"elapsed_seconds": null, '
+else printf '"elapsed_seconds": %s, ' "$_ix_elapsed"; fi
+if [ -z "$_ix_path" ] || [ "$_ix_path" = "-" ]; then
+printf '"path": null'
+else
+# Out-param form avoids a subshell fork per report
+_json_escape_var "$_ix_path"
+printf '"path": "%s"' "$_JSON_ESC_OUT"
+fi
 printf '}'
 done < "$_index_file"
 fi
@@ -805,12 +832,12 @@ _lmd_render_json_list() {
 [ -f "$_file" ] || continue
 case "$_file" in *.tsv.*|*.hits.*) continue ;; esac
 local _sid="${_file##*session.}"
-case "$_seen_ids" in *" $_sid"*) continue ;; esac # skip if already in index
+[ -n "${_seen_ids[$_sid]:-}" ] && continue # skip if already in index
 # Clear vars before parsing (prevent stale data from prior iteration)
-scanid="" scan_start_hr="" scan_end_hr="" scan_et="" tot_files="" tot_hits="" tot_cl=""
+scanid="" scan_start_hr="" scan_end_hr="" scan_et="" tot_files="" tot_hits="" tot_cl="" hrspath=""
 _parse_session_metadata "$_file"
 [ -z "$scanid" ] && continue
-_seen_ids="$_seen_ids $scanid"
+_seen_ids["$scanid"]=1
 if [ "$_first" != "1" ]; then printf ","; fi
 _first=0
 printf '\n {'
@@ -827,6 +854,12 @@ _lmd_render_json_list() {
 printf '"total_quarantined": null, '
 if [ -z "$scan_et" ]; then printf '"elapsed_seconds": null, '
 else printf '"elapsed_seconds": %s, ' "$scan_et"; fi
+if [ -z "$hrspath" ]; then
+printf '"path": null, '
+else
+_json_escape_var "$hrspath"
+printf '"path": "%s", ' "$_JSON_ESC_OUT"
+fi
 printf '"source": "legacy"'
 printf '}'
 done
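A minimal sketch of the two dedup strategies in _lmd_render_json_list, reduced to plain ID lists. render_ids and the session IDs are hypothetical stand-ins for the index pass and the legacy pass. With the old string form, each membership test scans every ID appended so far, which is the source of the quadratic cost the commit describes; the associative array replaces that scan with a hash-table lookup. The old glob *" $_sid"* is also delimited only at the front, so it matches any recorded ID that merely starts with $_sid; the last lines show that prefix match.

```shell
#!/usr/bin/env bash
# Sketch of the index-then-legacy dedup; IDs are made up.

render_ids() {
    # $1: space-separated IDs from the index pass
    # $2: space-separated IDs from the legacy pass
    local -A _seen_ids        # function-scoped associative array (bash 4.0+)
    local _id
    for _id in $1; do         # unquoted on purpose: split the ID list
        _seen_ids["$_id"]=1
        printf '%s index\n' "$_id"
    done
    for _id in $2; do
        # Hash lookup instead of re-scanning every ID seen so far
        [ -n "${_seen_ids[$_id]:-}" ] && continue
        _seen_ids["$_id"]=1
        printf '%s legacy\n' "$_id"
    done
}

render_ids "1001 1002" "1002 1003"

# Old membership test: the glob has no trailing delimiter, so ID 123
# counts as seen when only 12345 was recorded
old_seen_ids=" 12345"
_sid=123
case "$old_seen_ids" in
    *" $_sid"*) prefix_hit=1 ;;
    *) prefix_hit=0 ;;
esac
echo "prefix_hit=$prefix_hit"
```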

files/internals/lmd_hook.sh

Lines changed: 0 additions & 12 deletions
@@ -16,18 +16,6 @@ _LMD_HOOK_LOADED=1
 # shellcheck disable=SC2034
 LMD_HOOK_VERSION="1.0.0"
 
-# _json_escape_string str — JSON-escape via bash param expansion (portable: $sed is "sed -E" on FreeBSD)
-# Backslash MUST be escaped first — invariant — else subsequent \t \r \n \" get re-escaped
-_json_escape_string() {
-local _in="$1"
-_in="${_in//\\/\\\\}"
-_in="${_in//\"/\\\"}"
-_in="${_in//$'\t'/\\t}"
-_in="${_in//$'\r'/\\r}"
-_in="${_in//$'\n'/\\n}"
-printf '%s' "$_in"
-}
-
 # _scan_hook_validate hook_path — security validation (root-owned, not world-writable); returns 0/1
 _scan_hook_validate() {
 local _hook_path="$1"

files/internals/lmd_lifecycle.sh

Lines changed: 27 additions & 21 deletions
@@ -384,6 +384,21 @@ _lifecycle_list_stopped() {
 return 0
 }
 
+# _lifecycle_compute_eta engine elapsed prog_pos prog_total — echo ETA seconds, "null", or 0
+# clamav/clamdscan/yara → "null" (engines lack per-file progress data);
+# native with progress data → remaining seconds; otherwise "0".
+_lifecycle_compute_eta() {
+local _engine="$1" _elapsed="$2" _pp="$3" _pt="$4"
+case "$_engine" in
+clamav|clamdscan|yara) printf 'null'; return 0 ;;
+esac
+if [ "$_pp" -gt 0 ] 2>/dev/null && [ "$_pt" -gt 0 ] 2>/dev/null && [ "$_elapsed" -gt 0 ] 2>/dev/null; then
+printf '%d' "$(( (_elapsed * _pt / _pp) - _elapsed ))"
+else
+printf '0'
+fi
+}
+
 # _lifecycle_render_json_active scanids — JSON array output (no jq dependency)
 _lifecycle_render_json_active() {
 local _ids="$1"
@@ -402,15 +417,11 @@ _lifecycle_render_json_active() {
 printf ',\n'
 fi
 
-# JSON-escape strings (backslash and double-quote)
-local _j_path="${_meta_path//\\/\\\\}"
-_j_path="${_j_path//\"/\\\"}"
-
-local _j_stages="${_meta_stages//\\/\\\\}"
-_j_stages="${_j_stages//\"/\\\"}"
-
-local _j_sig="${_meta_sig_version//\\/\\\\}"
-_j_sig="${_j_sig//\"/\\\"}"
+local _j_path _j_stages _j_sig _j_engine
+_j_path=$(_json_escape_string "$_meta_path")
+_j_stages=$(_json_escape_string "${_meta_stages:-}")
+_j_sig=$(_json_escape_string "${_meta_sig_version:-}")
+_j_engine=$(_json_escape_string "${_meta_engine:--}")
 
 local _i_pid="${_meta_pid:-0}"
 local _i_total="${_meta_total_files:-0}"
@@ -433,26 +444,21 @@ _lifecycle_render_json_active() {
 [ "$_i_prog_pos" = "-" ] && _i_prog_pos=0
 [ "$_i_prog_total" = "-" ] && _i_prog_total=0
 
+local _eta_val
+_eta_val=$(_lifecycle_compute_eta "${_meta_engine:-}" "$_i_elapsed" "$_i_prog_pos" "$_i_prog_total")
+
 printf ' {\n'
+# scan_id: canonical field name (v2.0.1+); scanid retained for one release
+# cycle for backward compat with existing consumers and removed in v2.1.0.
+printf ' "scan_id": "%s",\n' "$_scanid"
 printf ' "scanid": "%s",\n' "$_scanid"
 printf ' "state": "%s",\n' "$_state"
 printf ' "pid": %s,\n' "$_i_pid"
 printf ' "path": "%s",\n' "$_j_path"
-printf ' "engine": "%s",\n' "${_meta_engine:--}"
+printf ' "engine": "%s",\n' "$_j_engine"
 printf ' "total_files": %s,\n' "$_i_total"
 printf ' "hits": %s,\n' "$_i_hits"
 printf ' "elapsed": %s,\n' "$_i_elapsed"
-# ETA: null for engines without per-file progress (clamav, yara);
-# computed for native engine when progress data is available
-local _eta_val="0"
-case "${_meta_engine:-}" in
-clamav|clamdscan|yara) _eta_val="null" ;;
-*)
-if [ "$_i_prog_pos" -gt 0 ] && [ "$_i_prog_total" -gt 0 ] && [ "$_i_elapsed" -gt 0 ]; then
-_eta_val=$(( (_i_elapsed * _i_prog_total / _i_prog_pos) - _i_elapsed ))
-fi
-;;
-esac
 printf ' "eta": %s,\n' "$_eta_val"
 printf ' "workers": %s,\n' "$_i_workers"
 printf ' "stages": "%s",\n' "$_j_stages"
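A standalone copy of the new _lifecycle_compute_eta, with comments added, followed by three worked calls. The engine name native is a stand-in: any value other than clamav, clamdscan or yara takes the arithmetic branch. The estimate is a linear extrapolation (elapsed * prog_total / prog_pos, minus the time already spent), and because shell arithmetic divides before subtracting, the result is truncated to whole seconds. The 2>/dev/null guards let a "-" placeholder fall through to "0" instead of printing a test error.

```shell
#!/usr/bin/env bash
# Copy of _lifecycle_compute_eta from the hunk above.
# Arguments: engine elapsed prog_pos prog_total
_lifecycle_compute_eta() {
    local _engine="$1" _elapsed="$2" _pp="$3" _pt="$4"
    case "$_engine" in
        # External engines report no per-file progress
        clamav|clamdscan|yara) printf 'null'; return 0 ;;
    esac
    # Non-numeric input makes [ fail with status 2; the error text is discarded
    if [ "$_pp" -gt 0 ] 2>/dev/null && [ "$_pt" -gt 0 ] 2>/dev/null && [ "$_elapsed" -gt 0 ] 2>/dev/null; then
        # Projected total time minus time already spent
        printf '%d' "$(( (_elapsed * _pt / _pp) - _elapsed ))"
    else
        printf '0'
    fi
}

# 25 of 100 files after 60s: 60*100/25 - 60 = 240 - 60 = 180 seconds left
eta_native=$(_lifecycle_compute_eta native 60 25 100)
eta_clam=$(_lifecycle_compute_eta clamav 60 25 100)
# "-" placeholder from scan.meta: numeric test fails quietly
eta_unknown=$(_lifecycle_compute_eta native 60 - 100)

# printf reuses the format for each argument
printf '"eta": %s\n' "$eta_native" "$eta_clam" "$eta_unknown"
```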
