# SIGUSR1/SIGUSR2 handler clobbering breaking JVM processes #161

## Problem

`libvgpu.so` registers signal handlers for `SIGUSR1` and `SIGUSR2` using `signal()`, which overwrites any previously installed handlers without saving them. This causes JVM processes to crash with `SIGSEGV` in `Monitor::wait()` because the JVM uses `SIGUSR1`/`SIGUSR2` internally for GC safepoints and thread management.
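The difference between clobbering and chaining can be sketched with `sigaction()`, which returns the previous disposition so it can be saved and forwarded to. This is a minimal sketch, not the code in `libvgpu.so`: the names `install_chained_handler`, `chained_handler`, `vgpu_signal_count`, `earlier_handler`, and `demo_chain` are hypothetical, and the counter only stands in for whatever the library actually does on these signals.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical storage for the dispositions that were in place before ours. */
static struct sigaction prev_usr1;
static struct sigaction prev_usr2;

/* Placeholder for the library's own work; illustrative only. */
static volatile sig_atomic_t vgpu_signal_count = 0;

/* Do our own work, then forward the signal to the handler we replaced. */
static void chained_handler(int signo, siginfo_t *info, void *ucontext)
{
    const struct sigaction *prev = (signo == SIGUSR1) ? &prev_usr1 : &prev_usr2;

    vgpu_signal_count++;

    if (prev->sa_flags & SA_SIGINFO) {
        if (prev->sa_sigaction != NULL)
            prev->sa_sigaction(signo, info, ucontext);
    } else if (prev->sa_handler != SIG_DFL && prev->sa_handler != SIG_IGN) {
        prev->sa_handler(signo);
    }
}

/* Save the existing disposition first, then install the chaining handler.
 * Returns 0 on success, -1 on error or for an unsupported signal. */
int install_chained_handler(int signo)
{
    struct sigaction sa;
    struct sigaction *prev;

    if (signo == SIGUSR1)
        prev = &prev_usr1;
    else if (signo == SIGUSR2)
        prev = &prev_usr2;
    else
        return -1;

    if (sigaction(signo, NULL, prev) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = chained_handler;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Stand-in for a handler installed earlier by the host process (e.g. a JVM). */
static volatile sig_atomic_t earlier_calls = 0;
static void earlier_handler(int signo) { (void)signo; earlier_calls++; }

/* Usage sketch: returns 1 when one SIGUSR1 reached both handlers. */
int demo_chain(void)
{
    struct sigaction host;
    int rc;

    memset(&host, 0, sizeof host);
    host.sa_handler = earlier_handler;
    sigemptyset(&host.sa_mask);
    rc = sigaction(SIGUSR1, &host, NULL);
    assert(rc == 0);

    rc = install_chained_handler(SIGUSR1);
    assert(rc == 0);

    raise(SIGUSR1);
    return vgpu_signal_count == 1 && earlier_calls == 1;
}
```

Chaining only helps when the other handler was installed first. If the JVM registers its handlers after the library does, the library's handler is the one that gets replaced. The sketch also assumes `install_chained_handler` is called at most once per signal; a second call would save its own handler as the "previous" one and recurse.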

Observed on HAMi volcano-vgpu nodes (hami-core mode) when running PyTorch jobs with a JVM component. The crash occurs at startup, before CUDA initializes.

Additional risks from the current implementation:
- `libvgpu.so` intercepts `dlsym()`, which the JVM also uses for native library loading
- The `ENSURE_RUNNING()` spin loop can cause a deadlock if a Java thread holding a JVM monitor gets suspended

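For the spin-loop risk, one way to keep a suspended thread from blocking indefinitely is to bound the wait and sleep between polls. The sketch below is an illustration only: it assumes `ENSURE_RUNNING()` busy-waits on some shared status flag, which this issue does not confirm, and `task_running`, `elapsed_ms`, and `wait_until_running` are hypothetical names rather than anything from `libvgpu.so`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical flag standing in for whatever state the real macro polls. */
static atomic_bool task_running = true;

/* Milliseconds elapsed since *start on the monotonic clock. */
static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long)(now.tv_sec - start->tv_sec) * 1000L
         + (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Poll the flag, sleeping 1 ms between checks, and give up after timeout_ms.
 * Returns true if the flag was set, false if the wait timed out. */
bool wait_until_running(long timeout_ms)
{
    struct timespec start;
    const struct timespec pause = { 0, 1000000L }; /* 1 ms */

    assert(timeout_ms >= 0);
    clock_gettime(CLOCK_MONOTONIC, &start);

    while (!atomic_load(&task_running)) {
        if (elapsed_ms(&start) >= timeout_ms)
            return false;           /* caller decides how to proceed */
        nanosleep(&pause, NULL);    /* yield the CPU instead of spinning */
    }
    return true;
}
```

A timeout does not remove the hazard of suspending a thread that holds a JVM monitor. It only turns an unbounded stall into a bounded one that the caller can detect and report.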
