Commit 3a8f1e3

Fix zombie process race condition in sidecar container (#1806)
* Fix zombie process race condition in sidecar container

  Replace custom reapZombies implementation with tini as PID 1 to eliminate race condition causing "waitid: no child processes" errors during template operations.

  Problem:
  - reapZombies() used syscall.Wait4(-1, ...) which reaps ANY child process
  - This interfered with normal parent-child process waiting in CmdExec.Run
  - Race condition: reapZombies could reap ytt/vendir/imgpkg processes before their actual parent (sidecar process) could wait for them
  - Result: cmd.Wait() failed with ECHILD ("waitid: no child processes")

  Solution:
  - Install and use tini as proper PID 1 init system in Dockerfile
  - Remove problematic reapZombies function entirely
  - tini correctly handles only orphaned processes, not normal children
  - Eliminates race condition while maintaining proper zombie cleanup

  Changes:
  - Dockerfile: Install tini package and set as entrypoint
  - sidecarexec.go: Remove reapZombies function and unused imports
  - deployment.yml: Add documentation comment about tini configuration

  This fixes intermittent failures during PackageRepository reconciliation and other template-heavy operations under concurrent load.

  Fixes: Race condition between zombie reaper and command execution
  Made-with: Cursor
  Signed-off-by: Marin Dzhigarov <m.dzhigarov@gmail.com>
  Made-with: Cursor

* Fix ytt template comment syntax in deployment.yml

  Use ytt-specific comment syntax (#!) instead of regular comments (#) to avoid template compilation errors.

  Signed-off-by: Marin Dzhigarov <m.dzhigarov@gmail.com>
  Made-with: Cursor

---------

Signed-off-by: Marin Dzhigarov <m.dzhigarov@gmail.com>
1 parent 84c5e28 commit 3a8f1e3

3 files changed: 6 additions and 23 deletions

Dockerfile

Lines changed: 4 additions & 3 deletions

@@ -20,8 +20,8 @@ RUN --mount=type=cache,target=/root/.cache/go-build \
 # --- run image ---
 FROM photon:5.0
 
-# Install openssh for git
-RUN tdnf install -y git openssh-clients
+# Install openssh for git and tini for proper PID 1 handling
+RUN tdnf install -y git openssh-clients tini
 
 # Create the kapp-controller user in the root group, the home directory will be mounted as a volume
 RUN echo "kapp-controller:x:1000:0:/home/kapp-controller:/usr/sbin/nologin" > /etc/passwd
@@ -36,4 +36,5 @@ COPY --from=deps /workspace/out/* ./
 # Run as kapp-controller by default, will be overridden to a random uid on OpenShift
 USER 1000
 ENV PATH="/:${PATH}"
-ENTRYPOINT ["/kapp-controller"]
+# Use tini as PID 1 to properly handle zombie processes and signals
+ENTRYPOINT ["tini", "--", "/kapp-controller"]

cmd/controller/sidecarexec.go

Lines changed: 1 addition & 20 deletions

@@ -4,20 +4,16 @@
 package main
 
 import (
-	"syscall"
-	"time"
-
 	"carvel.dev/kapp-controller/pkg/exec"
 	"carvel.dev/kapp-controller/pkg/sidecarexec"
-	"github.com/go-logr/logr"
 	"sigs.k8s.io/controller-runtime/pkg/log/zap"
 )
 
 func sidecarexecMain() {
 	mainLog := zap.New(zap.UseDevMode(false)).WithName("kc-sidecarexec")
 	mainLog.Info("start sidecarexec", "version", Version)
 
-	go reapZombies(mainLog)
+	// Note: Zombie reaping is now handled by tini as PID 1
 
 	localCmdRunner := exec.NewPlainCmdRunner()
 	opts := sidecarexec.ServerOpts{
@@ -36,18 +32,3 @@ func sidecarexecMain() {
 		mainLog.Error(err, "Serving RPC")
 	}
 }
-
-func reapZombies(log logr.Logger) {
-	log.Info("starting zombie reaper")
-
-	for {
-		var status syscall.WaitStatus
-
-		pid, _ := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
-		if pid <= 0 {
-			time.Sleep(1 * time.Second)
-		} else {
-			continue
-		}
-	}
-}

config/config/deployment.yml

Lines changed: 1 addition & 0 deletions

@@ -67,6 +67,7 @@ spec:
 - name: kapp-controller-sidecarexec
   image: kapp-controller
   args: ["--sidecarexec"]
+  #! tini is already configured as ENTRYPOINT in Dockerfile
   resources:
     requests:
       cpu: 120m
