Commit 2e2261a

joechenrh and claude committed
test(dm): escalate to SIGKILL in wait_process_exit after 120s timeout
dm-master.test occasionally hangs on SIGHUP (stuck in etcd compaction or DDL coordination), causing cleanup_process to fail with "didn't exit after 120 seconds" and killing the entire test group. Instead of failing, escalate to SIGKILL so cleanup proceeds and the next test can start.

Affects both classic and next-gen CI. Seen in ha_cases, ha_cases3, and other HA tests where multiple masters are started/stopped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 2caf0e5 commit 2e2261a

1 file changed

Lines changed: 9 additions & 2 deletions

dm/tests/_utils/wait_process_exit

@@ -16,5 +16,12 @@ while [ $WAIT_COUNT -lt 120 ]; do
 	((WAIT_COUNT++))
 done
 
-echo "process $process didn't exit after 120 seconds, current processlist: $(ps aux | grep $process | grep -v 'grep' | grep -v 'wait_process_exit')"
-exit 1
+echo "process $process didn't exit after 120 seconds, escalating to SIGKILL"
+ps aux | grep $process | grep -v 'grep' | grep -v 'wait_process_exit' | awk '{print $2}' | xargs -r kill -9 2>/dev/null || true
+sleep 2
+if ps aux | grep $process | grep -v 'grep' | grep -v 'wait_process_exit' >/dev/null 2>&1; then
+	echo "process $process still alive after SIGKILL"
+	exit 1
+fi
+echo "process $process killed with SIGKILL"
+exit 0
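
The change finds the process by matching its name in `ps aux` output, which matches any command line containing that string. As a rough illustration of the same wait-then-SIGKILL escalation keyed on a PID instead, here is a minimal sketch. It assumes the caller already knows the PID. `wait_or_kill` and its parameters are hypothetical names, not part of this commit or of the DM test utilities.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the commit): wait for a PID to exit and
# escalate to SIGKILL if it is still alive once the timeout has elapsed.
wait_or_kill() {
    local pid=$1
    local timeout=${2:-120}   # seconds to wait before escalating
    local waited=0

    # kill -0 delivers no signal; it only reports whether the PID exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "process $pid didn't exit after ${timeout} seconds, escalating to SIGKILL"
            kill -9 "$pid" 2>/dev/null || true
            sleep 2   # give the kernel and the parent time to reap the process
            if kill -0 "$pid" 2>/dev/null; then
                echo "process $pid still alive after SIGKILL"
                return 1
            fi
            echo "process $pid killed with SIGKILL"
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0   # process exited on its own within the timeout
}
```

A caller could use it as `wait_or_kill "$pid" 120 || exit 1`. Note that a killed process that has not yet been reaped by its parent still answers `kill -0`, so the final check can report "still alive" for a zombie.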
