Skip to content

Cache EnvVars in ThreadLocal during sandboxed script/template execution#1568

Open
Vinamra-tech wants to merge 2 commits intojenkinsci:mainfrom
Vinamra-tech:caching
Open

Cache EnvVars in ThreadLocal during sandboxed script/template execution#1568
Vinamra-tech wants to merge 2 commits intojenkinsci:mainfrom
Vinamra-tech:caching

Conversation

@Vinamra-tech
Copy link
Copy Markdown
Contributor

@Vinamra-tech Vinamra-tech commented Apr 12, 2026

Caches EnvVars in a ThreadLocal for the duration of a single sandboxed
script/template execution to avoid redundant build.getEnvironment(listener)
calls on every variable lookup.

Previously, getEnvironment() was called on every variable reference within
a sandboxed Groovy email template — once in permitsMethod() and again in
methodMissing(). This is an expensive I/O and CPU operation (merging node
vars, job properties, parameters, etc.). In large templates with many
variables, this results in significant redundant processing.

Changes:

  • EmailExtScriptTokenMacroWhitelist — added static final ThreadLocal<EnvVars>,
    beginExecution(), endExecution(), and cache-first lookup in permitsMethod()
  • EmailExtScript — cache-first lookup in methodMissing() with fallback
  • ScriptContentbeginExecution/endExecution wrapped in try-finally
    around both sandbox execution points (renderTemplate and executeScript)
  • EmailExtScriptTokenMacroWhiteListTest — new test class (JUnit 5)

The cache lives purely in Java (not in the Groovy Binding), so scripts
cannot touch or mutate it. Cleanup is guaranteed via finally, preventing
any cross-execution leakage or memory retention.

Closes #1548
Follows up on #1542 (closed)

Testing done

Added EmailExtScriptTokenMacroWhiteListTest covering:

  • testGetEnvironmentCalledOnlyOnce — verifies that
    getEnvironment() is called exactly once across multiple cache lookups
  • testCacheIsClearedAfterEndExecution — verifies ENV_CACHE.get()
    returns null after endExecution() is called
  • testFallbackWhenCacheNotInitialized — verifies graceful behavior
    when beginExecution() was never called
  • testProfilingGain (@Tag("performance")) — benchmarks baseline vs
    cached execution over 100 macro evaluations with 5 JVM warm-up runs

Profiling results (100 macro evaluations, 5ms simulated I/O per
getEnvironment() call, 5 JVM warm-up runs discarded):

Scenario Time
Baseline (no cache) 520 ms
ThreadLocal cache 5 ms
Reduction ~99%

All 4 tests pass (BUILD SUCCESS).

Submitter checklist

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests that demonstrate the feature works or the issue is fixed

@Vinamra-tech Vinamra-tech requested a review from a team as a code owner April 12, 2026 07:53
@ArpanC6
Copy link
Copy Markdown

ArpanC6 commented Apr 14, 2026

I really like this optimization. The way you use ThreadLocal caching is really clean. The try finally guarantee in ScriptContent is great because it makes sure there is no cross execution leakage.

I have a thoughts on this

  1. The ENV_CACHE is public now. I think this is a problem because scripts run in a sandbox. If we expose the ThreadLocal directly it feels like a risk. Would it be better to make it private and only expose beginExecution and endExecution? This way other code cannot. Read the cache by accident when it is not supposed to.

  2. The testProfilingGain function does not actually check if anything is true. It just prints the results. If the cache stops working for some reason this test will still pass. I think it would be better if we added a threshold assertion. For example we could say that cachedMs has to be less than baselineMs divided by 10. This way the test will actually make sure the cache is working.

  3. There is one path in the beginExecution function that is not covered by tests. This is the path where an IOException or InterruptedException happens. The CI coverage report also says that lines 77-78 are not covered. If we add a test that simulates getEnvironment throwing an exception we can cover this path.

  4. This is a thing but other test classes, in this code use the @DisplayName annotation to make them easier to read. It might be an idea to add this here too just to be consistent.

Overall I think the main idea of this optimization is really good and the performance numbers are very impressive.

@Vinamra-tech
Copy link
Copy Markdown
Contributor Author

I really like this optimization. The way you use ThreadLocal caching is really clean. The try finally guarantee in ScriptContent is great because it makes sure there is no cross execution leakage.

I have a thoughts on this

  1. The ENV_CACHE is public now. I think this is a problem because scripts run in a sandbox. If we expose the ThreadLocal directly it feels like a risk. Would it be better to make it private and only expose beginExecution and endExecution? This way other code cannot. Read the cache by accident when it is not supposed to.
  2. The testProfilingGain function does not actually check if anything is true. It just prints the results. If the cache stops working for some reason this test will still pass. I think it would be better if we added a threshold assertion. For example we could say that cachedMs has to be less than baselineMs divided by 10. This way the test will actually make sure the cache is working.
  3. There is one path in the beginExecution function that is not covered by tests. This is the path where an IOException or InterruptedException happens. The CI coverage report also says that lines 77-78 are not covered. If we add a test that simulates getEnvironment throwing an exception we can cover this path.
  4. This is a thing but other test classes, in this code use the @DisplayName annotation to make them easier to read. It might be an idea to add this here too just to be consistent.

Overall I think the main idea of this optimization is really good and the performance numbers are very impressive.

Hey @ArpanC6, thanks for the review! Addressing each point:

1. ENV_CACHE visibility — Valid, fixed.
Agreed completely. ENV_CACHE is now private with a package-private
getCachedEnvVars() accessor. No external code can reach the ThreadLocal
directly anymore.

2. Threshold assertion in testProfilingGain — Respectfully disagree.
The test is intentionally tagged @tag("performance") and kept
assertion-free. Adding a hard threshold like cachedMs < baselineMs / 10
would make it flaky on shared CI nodes where GC pauses, CPU contention,
and JVM warm-up variance can easily skew timing results. The purpose of
this test is local observability during profiling runs, not a correctness
gate — that role is already covered by testGetEnvironmentCalledOnlyOnce
which verifies via Mockito that getEnvironment() is called exactly once.
If the cache stops working, that test fails. The profiling test doesn't
need to duplicate that responsibility.

3. Exception path coverage — Valid, fixed.
Added testBeginExecutionHandlesIOException and
testBeginExecutionHandlesInterruptedException. Both verify that the cache
remains null when getEnvironment() throws, confirming the fallback in
permitsMethod() will kick in gracefully. All 6 tests pass.

4. @DisplayName — Can't find precedent in the codebase.
I went through the existing test classes and none of them use
@DisplayName. Adding it here would introduce inconsistency rather than
improve it. If there's a specific test class you're referencing that uses
it, happy to follow that pattern — but I'd rather stay consistent with
what's already there.

@ArpanC6
Copy link
Copy Markdown

ArpanC6 commented Apr 15, 2026

I really like this optimization. The way you use ThreadLocal caching is really clean. The try finally guarantee in ScriptContent is great because it makes sure there is no cross execution leakage.
I have a thoughts on this

  1. The ENV_CACHE is public now. I think this is a problem because scripts run in a sandbox. If we expose the ThreadLocal directly it feels like a risk. Would it be better to make it private and only expose beginExecution and endExecution? This way other code cannot. Read the cache by accident when it is not supposed to.
  2. The testProfilingGain function does not actually check if anything is true. It just prints the results. If the cache stops working for some reason this test will still pass. I think it would be better if we added a threshold assertion. For example we could say that cachedMs has to be less than baselineMs divided by 10. This way the test will actually make sure the cache is working.
  3. There is one path in the beginExecution function that is not covered by tests. This is the path where an IOException or InterruptedException happens. The CI coverage report also says that lines 77-78 are not covered. If we add a test that simulates getEnvironment throwing an exception we can cover this path.
  4. This is a thing but other test classes, in this code use the @DisplayName annotation to make them easier to read. It might be an idea to add this here too just to be consistent.

Overall I think the main idea of this optimization is really good and the performance numbers are very impressive.

Hey @ArpanC6, thanks for the review! Addressing each point:

1. ENV_CACHE visibility — Valid, fixed. Agreed completely. ENV_CACHE is now private with a package-private getCachedEnvVars() accessor. No external code can reach the ThreadLocal directly anymore.

2. Threshold assertion in testProfilingGain — Respectfully disagree. The test is intentionally tagged @tag("performance") and kept assertion-free. Adding a hard threshold like cachedMs < baselineMs / 10 would make it flaky on shared CI nodes where GC pauses, CPU contention, and JVM warm-up variance can easily skew timing results. The purpose of this test is local observability during profiling runs, not a correctness gate — that role is already covered by testGetEnvironmentCalledOnlyOnce which verifies via Mockito that getEnvironment() is called exactly once. If the cache stops working, that test fails. The profiling test doesn't need to duplicate that responsibility.

3. Exception path coverage — Valid, fixed. Added testBeginExecutionHandlesIOException and testBeginExecutionHandlesInterruptedException. Both verify that the cache remains null when getEnvironment() throws, confirming the fallback in permitsMethod() will kick in gracefully. All 6 tests pass.

4. @DisplayName — Can't find precedent in the codebase. I went through the existing test classes and none of them use @DisplayName. Adding it here would introduce inconsistency rather than improve it. If there's a specific test class you're referencing that uses it, happy to follow that pattern — but I'd rather stay consistent with what's already there.

Thanks for addressing all the points and for the detailed explanation
on the profiling test. The reasoning makes sense testGetEnvironmentCalledOnlyOnce
already guards the correctness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Perf: Cache EnvVars during macro expansion using ThreadLocal to avoid redundant getEnvironment() calls

2 participants