Version
main branch (v1.1.0 actually)
Describe what's wrong
Gravitino-server can hit java.lang.OutOfMemoryError: Metaspace after running for a long time. Once it happens, requests may start failing with 401/500 (often empty/incomplete body), which is misleading because the underlying issue is the JVM OOM.
Error message and/or stacktrace
{
"error": "Unable to process: Received HTTP 500 response with empty body",
"code": 500,
"type": "RESTException",
"detail": "org.apache.gravitino.exceptions.RESTException: Unable to process: Received HTTP 500 response with empty body\n\tat org.apache.gravitino.client.ErrorHandlers$RestErrorHandler.accept(ErrorHandlers.java:1333)\n\tat org.apache.gravitino.client.ErrorHandlers$CatalogErrorHandler.accept(ErrorHandlers.java:549)\n\tat org.apache.gravitino.client.ErrorHandlers$CatalogErrorHandler.accept(ErrorHandlers.java:488)\n\tat
....",
"instance": "demo_catalog2$demo_schema"
}
2026-04-22 15:22:17.699
WARN [Gravitino-webserver-41] [org.apache.gravitino.utils.PrincipalUtils.doAs(PrincipalUtils.java:50)] - doAs method occurs an unexpected error
java.lang.OutOfMemoryError: Metaspace
How to reproduce
When capturing jcmd outputs from an OOM’ed process, we saw many long-lived org.apache.gravitino.hive.client.HiveClientClassLoader instances. Each classloader retains hundreds of classes, consistent with classloader churn + class unloading being blocked.
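For reference, the inspection above can be reproduced with standard JDK tooling against the live server process (`<pid>` is a placeholder for the Gravitino server's process id; exact command availability depends on the JDK version):

```shell
# List running JVMs to find the Gravitino server pid
jcmd -l

# Per-classloader statistics: look for many long-lived
# org.apache.gravitino.hive.client.HiveClientClassLoader entries
jcmd <pid> VM.classloader_stats

# Metaspace usage breakdown, to confirm growth over time
jcmd <pid> VM.metaspace

# Class histogram, to see which classes dominate loaded-class counts
jcmd <pid> GC.class_histogram | head -40
```

Capturing `VM.classloader_stats` at intervals makes the churn visible: a healthy server should reuse a small, stable set of Hive client classloaders, while the failure mode shows the count growing monotonically.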
Additional context
We suspect the issue is Hive client pool cache misses / churn, amplified by frequent token refresh, which repeatedly creates isolated Hive-client classloaders:
- We use a custom cloud IAM-based Hive authenticator which fetches/refreshes a short-lived token and injects it into the Hive client configuration.
- If the token (or config derived from it) participates in the Hive client pool cache key, each refresh produces a new key, causing cache misses and the continuous creation of new HiveClientFactory / HiveClientClassLoader instances.
- Even if old pools are evicted, class unloading may still be blocked by global/static caches, ThreadLocals, shutdown hooks, etc., leading to Metaspace growth and eventually OOM.

This matches the "classloader cannot be reclaimed/unloaded" behavior that is known to happen in the Hive/Hadoop ecosystem when classloader-bound resources are not fully cleaned up.
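The cache-key problem in the first two bullets can be sketched in isolation. This is an illustrative model, not Gravitino's actual code: the class and field names (`CacheKeyChurnDemo`, `POOL_CACHE`, `getOrCreatePool`, `auth.token`) are hypothetical, and a plain `Object` stands in for a HiveClientFactory plus its classloader. The point is that when a short-lived token is part of the cache key, every refresh yields a distinct key, so the cache never hits:

```java
import java.util.HashMap;
import java.util.Map;

public class CacheKeyChurnDemo {
  // Pool cache keyed by the full client config map (hypothetical model).
  static final Map<Map<String, String>, Object> POOL_CACHE = new HashMap<>();

  // Returns a cached pool for this config, or creates a new one on a miss.
  // In the real system a miss would allocate a new HiveClientClassLoader,
  // whose classes occupy Metaspace until the loader is unloaded.
  static Object getOrCreatePool(Map<String, String> conf) {
    return POOL_CACHE.computeIfAbsent(conf, c -> new Object());
  }

  public static void main(String[] args) {
    for (int i = 0; i < 5; i++) {
      Map<String, String> conf = new HashMap<>();
      conf.put("hive.metastore.uris", "thrift://ms:9083"); // stable part of the key
      conf.put("auth.token", "token-" + i); // refreshed token changes the key every time
      getOrCreatePool(conf);
    }
    // Five refreshes -> five distinct keys -> five pools instead of one.
    System.out.println("pools created: " + POOL_CACHE.size());
  }
}
```

A fix along these lines would derive the cache key only from the stable parts of the config (e.g. metastore URI and principal identity) and inject the refreshed token into an existing pooled client, so refreshes reuse the same classloader.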