Commit 6a32df1

[Fix] pass-2 session glob skips legacy .html artifacts; issue #483
[Fix] Three glob-filter sites (_lmd_render_json_list in lmd_alert.sh; _resolve_latest_session_id and _view_session_list pass 2 in lmd_session.sh) used session.[0-9]* to enumerate legacy plaintext sessions and excluded only the .tsv./.hits. variants, so they also picked up stale session.N.html artifacts left by pre-on-demand-HTML code paths. _parse_session_metadata then slurped multi-MB HTML files line-by-line searching for FILE HIT LIST markers that never appear in HTML. Local reproducer with 3 HTML artifacts (one 23 MB, 491k lines): maldet -e list --format json took 12.4s before the fix and 0.5s after (24x). The filter now excludes *.html at all three sites.

[New] tests/31-json-report.bats: regression test 33 fabricates a session.NNN.html with embedded "SCAN ID:" / "STARTED:" lines (pre-fix, _parse_session_metadata would parse these and emit them as a report) and asserts their absence from the output. Guards the whole class: the same filter bug was repeated across the text, json, and latest-resolution paths.

[Change] CHANGELOG + CHANGELOG.RELEASE: v2.0.1 Bug Fixes entry.
1 parent a1efba5 commit 6a32df1
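
The effect of the filter change can be illustrated with a minimal bash sketch. It is not taken from the patch: the temporary directory, the list_sessions helper, its pre/post argument, and the plaintext session ID are assumptions made for illustration. Only the glob, the case patterns, and the .html file name (from the regression test) come from the commit.

```shell
#!/usr/bin/env bash
# Sketch only: a simplified stand-in for the three patched loops, not a copy of them.

sessdir="$(mktemp -d)"                          # throwaway directory (assumption)
: > "$sessdir/session.200101-0100.11111"        # hypothetical legacy plaintext session
: > "$sessdir/session.200102-0100.88882.html"   # stale HTML artifact (sid from test 33)

# list_sessions is a hypothetical helper: "pre" applies the old filter,
# any other argument applies the filter with the added *.html pattern.
list_sessions() {
    local mode="$1" _file
    for _file in "$sessdir"/session.[0-9]*; do
        [ -f "$_file" ] || continue
        if [ "$mode" = "pre" ]; then
            case "$_file" in *.tsv.*|*.hits.*) continue ;; esac
        else
            case "$_file" in *.tsv.*|*.hits.*|*.html) continue ;; esac
        fi
        # Print what the caller would treat as the session ID.
        printf '%s\n' "${_file##*session.}"
    done
}

pre_fix="$(list_sessions pre)"
post_fix="$(list_sessions post)"

printf 'pre-fix:\n%s\n' "$pre_fix"
printf 'post-fix:\n%s\n' "$post_fix"

rm -rf "$sessdir"
```

Run under bash, the pre-fix listing includes 200102-0100.88882.html alongside the plaintext session, while the post-fix listing contains only 200101-0100.11111. This is why the glob alone is not enough: session.[0-9]* matches any suffix after the leading digit, so the case filter has to name every non-plaintext variant.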

5 files changed, +46 -4 lines changed

CHANGELOG

Lines changed: 7 additions & 0 deletions
@@ -339,6 +339,13 @@ v2.0.1 | Mar 25 2026:
 [New] --json-report list: active[] and stopped[] entries now include
 started and started_epoch for consistency with reports[]; stopped[]
 also gains stopped_epoch alongside existing stopped_hr; issue #483
+[Fix] --report list / --json-report list / -e newest: pass-2 glob now
+skips legacy session.*.html artifacts (caught pre-fix by
+session.[0-9]* glob). _parse_session_metadata was slurping
+multi-MB HTML files line-by-line searching for break markers that
+never appear in HTML, producing 12s+ hangs on installs carrying
+pre-on-demand-HTML legacy artifacts. Fix at 3 sites in lmd_alert.sh
++ lmd_session.sh
 
 v1.6.6.1 | Feb 25 2025:
 [Fix] find_recentopts incorrectly escaping find options to the right of ( -mtime .. -ctime ); previously normalized by eval; issue #440, pr#442

CHANGELOG.RELEASE

Lines changed: 7 additions & 0 deletions
@@ -339,3 +339,10 @@ v2.0.1 | Mar 25 2026:
 [New] --json-report list: active[] and stopped[] entries now include
 started and started_epoch for consistency with reports[]; stopped[]
 also gains stopped_epoch alongside existing stopped_hr; issue #483
+[Fix] --report list / --json-report list / -e newest: pass-2 glob now
+skips legacy session.*.html artifacts (caught pre-fix by
+session.[0-9]* glob). _parse_session_metadata was slurping
+multi-MB HTML files line-by-line searching for break markers that
+never appear in HTML, producing 12s+ hangs on installs carrying
+pre-on-demand-HTML legacy artifacts. Fix at 3 sites in lmd_alert.sh
++ lmd_session.sh

files/internals/lmd_alert.sh

Lines changed: 1 addition & 1 deletion
@@ -855,7 +855,7 @@ _lmd_render_json_list() {
 local _file
 for _file in "$sessdir"/session.[0-9]*; do
 [ -f "$_file" ] || continue
-case "$_file" in *.tsv.*|*.hits.*) continue ;; esac
+case "$_file" in *.tsv.*|*.hits.*|*.html) continue ;; esac
 local _sid="${_file##*session.}"
 [ -n "${_seen_ids[$_sid]:-}" ] && continue # skip if already in index
 # Clear vars before parsing (prevent stale data from prior iteration)

files/internals/lmd_session.sh

Lines changed: 2 additions & 3 deletions
@@ -143,9 +143,8 @@ _session_legacy_check() {
 local _latest _latest_file=""
 for _latest in "$sessdir"/session.[0-9]*; do
 [ -f "$_latest" ] || continue
-# Skip .tsv. and .hits. variants
 case "$_latest" in
-*.tsv.*|*.hits.*) continue ;;
+*.tsv.*|*.hits.*|*.html) continue ;;
 esac
 _latest_file="$_latest"
 done
@@ -382,7 +381,7 @@ view_report() {
 # Pass 2: Legacy plaintext session files (skip if TSV exists)
 for file in "$sessdir"/session.[0-9]*; do
 [ -f "$file" ] || continue
-case "$file" in *.tsv.*|*.hits.*) continue ;; esac
+case "$file" in *.tsv.*|*.hits.*|*.html) continue ;; esac
 local _sid="${file##*session.}"
 case "$_seen_ids" in *" $_sid"*) continue ;; esac
 _meta=$(command grep -E "^SCAN ID|^(TOTAL )?FILES|^(TOTAL )?HITS|^(TOTAL )?CLEANED|^TIME:|^STARTED:|^ELAPSED|^PATH" "$file")

tests/31-json-report.bats

Lines changed: 29 additions & 0 deletions
@@ -674,3 +674,32 @@ EOF
 [[ "$output" =~ \"started_epoch\":[[:space:]]+$_started_epoch ]]
 [[ "$output" =~ \"stopped_epoch\":[[:space:]]+$_stopped_epoch ]]
 }
+
+# --- Test 33: pass-2 glob skips legacy .html artifacts ---
+# Regression guard: pre-fix the session.[0-9]* glob caught stale
+# session.N.html files left behind by pre-on-demand-HTML code paths.
+# _parse_session_metadata would slurp multi-MB HTML files line-by-line
+# looking for break markers that don't exist in HTML, producing a 12s+
+# hang on installs with such artifacts. Fix: exclude *.html from pass 2
+# in lmd_alert.sh, lmd_session.sh:148 (_resolve_latest_session_id), and
+# lmd_session.sh:385 (text list pass 2).
+@test "--json-report list skips legacy session.*.html artifacts" {
+local sessdir="$LMD_INSTALL/sess"
+local sid="200102-0100.88882"
+# Fabricate an HTML artifact that LOOKS like a session (has SCAN ID
+# and STARTED lines). Pre-fix: _parse_session_metadata parses these,
+# emits a report[] entry. Post-fix: *.html filter skips it entirely.
+cat > "$sessdir/session.$sid.html" <<EOF
+<!DOCTYPE html><html><body>
+<p>SCAN ID: $sid</p>
+<p>STARTED: Jan 02 2020 01:00:00 +0000</p>
+<p>TOTAL FILES: 1</p>
+<p>TOTAL HITS: 0</p>
+</body></html>
+EOF
+run timeout 10 maldet --json-report list
+rm -f "$sessdir/session.$sid.html"
+assert_success
+# HTML scanid must NOT appear in reports[] — proves the glob filter caught it.
+refute_output --partial "\"$sid\""
+}
