Skip to content

[TINKERPOP 3107] Remove Groovy dependency from default server initialization#3384

Open
Cole-Greer wants to merge 7 commits intomasterfrom
TINKERPOP-3107
Open

[TINKERPOP 3107] Remove Groovy dependency from default server initialization#3384
Cole-Greer wants to merge 7 commits intomasterfrom
TINKERPOP-3107

Conversation

@Cole-Greer
Copy link
Copy Markdown
Contributor

@Cole-Greer Cole-Greer commented Apr 14, 2026

https://issues.apache.org/jira/browse/TINKERPOP-3107

Gremlin Server currently relies on Groovy init scripts for server initialization — binding traversal sources, running lifecycle hooks, and loading data. This PR introduces a Groovy-free initialization path so that Groovy can be disabled by default in all shipped server configs. Groovy remains available as an
opt-in for backward compatibility.

Changes

Three new YAML configuration mechanisms replace the Groovy init scripts:

Auto-created TraversalSources — After graphs are loaded, a default TraversalSource is automatically created for each graph that doesn't have an explicit traversalSources entry. A graph named graph gets g; others get g_. A minimal config with just a graphs section is now fully functional.

Declarative traversalSources — A new YAML section for explicit TraversalSource creation with optional strategy configuration via a Gremlin query:
yaml
traversalSources: {
g: {graph: graph},
gReadOnly: {graph: graph, query: "g.withStrategies(ReadOnlyStrategy)"}}

Java-based lifecycleHooks — A new YAML section for configuring LifeCycleHook implementations via reflection, replacing Groovy-based hook creation:
yaml
lifecycleHooks:

  • className: org.apache.tinkerpop.gremlin.server.util.TinkerFactoryDataLoader
    config: {graph: graph, dataset: modern}

Supporting changes

  • TinkerFactoryDataLoader — built-in LifeCycleHook that loads TinkerFactory sample datasets (modern, classic, crew, grateful, sink, airroutes) without Groovy
  • LifeCycleHook interface — added default void init(Map) for configuration and GraphManager to Context
  • Settings — new TraversalSourceSettings and LifeCycleHookSettings config classes with SnakeYAML type descriptors
  • ServerTestHelper — null-safe for configs without gremlin-groovy in scriptEngines
  • Deprecated the Groovy-based LifeCycleHook/TraversalSource creation path with startup warnings
  • New gremlin-server-airroutes.yaml shipped config

Config migrations

All default configs under gremlin-server/conf/ updated to remove gremlin-groovy from scriptEngines. All Docker integration test configs (docker/gremlin-server/) and JVM test configs migrated to use traversalSources + lifecycleHooks instead of generate-all.groovy. The gremlin-console test infrastructure
similarly migrated.

Deleted files

  • gremlin-server/src/test/scripts/generate-all.groovy — replaced by YAML config
  • gremlin-server/src/test/scripts/test-server-start.groovy — dead code
  • gremlin-console/src/test/resources/.../generate.groovy — replaced by YAML config

Testing

  • TinkerFactoryDataLoaderTest — all 6 datasets, error cases (missing graph, non-TinkerGraph, unknown dataset, missing config)
  • ServerGremlinExecutorTest — auto-creation, explicit suppression, lifecycle hook instantiation, resolveLanguage logic
  • SettingsTest — YAML parsing for traversalSources and lifecycleHooks, defaults when absent
  • GremlinServerConfigIntegrateTest — parameterized smoke test that boots each shipped config and runs a query
  • Existing JVM integration tests exercise the migrated gremlin-server-integration.yaml
  • GLV integration tests exercise the migrated Docker configs

VOTE +1

----
traversalSources: {
g: {graph: graph},
gReadOnly: {graph: graph, query: "g.withStrategies(ReadOnlyStrategy)", language: "gremlin-lang"}}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is "query" the right terminology here? it's not really a functioning query - it's only allowing configuration steps i assume? maybe it's a "sourceConfiguration"?? you use "expression" below - that's better - like, "gremlinExpression"?? "sourceExpression"??

anything stopping someone from doing stuff other than configuration steps? what happens if i use another language and do g.V()....? seems like we might want some controls there if there aren't any already.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change it to "gremlinExpression".

The only control that currently exists is whatever the script engine outputs from that "gremlinExpression" must be castable to TraversalSource. I believe that should block most accidental misuse, without really restricting any legitimate configuration.

I'm personally not as worried about additional enforced restrictions. My assumption is anyone with access to the server config is already a trusted individual. The ability to run a gremlin query during traversal source initialization is already a much weaker tool than the groovy init scripts of old.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the protection of TraversalSource is fine. the issue i had here was definitely with trusted individuals trying to do "interesting" things that we really don't want to support like, g.V().drop() as a "convenient" way to clean a graph for some test rig they got going on. or to load data, or whatever. just don't want to see that kind of usage pop up.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they really wanted to try, they could probably sneak in multiple traversals in that single expression, as long as the last one returned a TraversalSource. I hope that's obviously enough of a bad idea that users aren't drawn to it, especially if we give a more convenient and powerful option via the lifecycle hooks.

g: {graph: graph, query: "g.withStrategies(ReadOnlyStrategy)"}}
----
+
Alternatively, the deprecated `ScriptFileGremlinPlugin` approach can still be used with a Groovy script:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i still wonder if we should document any of the Groovy configuration. Other than to have a callout that says that the groovy stuff from 3.x is still there, I wonder if we still need all this stuff. I don't think we should offer an "Alternatively, " in 4.x.

[[script-execution]]
==== Protecting Script Execution

NOTE: As of 4.0.0, Groovy is no longer required for server initialization. The `scriptEngines` configuration
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pointing back to my last comment, this whole section could just go away rather than muddy up the conversation with callouts and "don't do this, but here it is in case you want to" discussion.


=== Upgrading for Users

==== Gremlin Server Initialization Without Groovy
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is all "what" changed - it's virtually all Reference Documentation. there's nothing about "why". there's nothing about the "benefit". nothing that prepares the reader for what's coming in the "future" with a server without Groovy. get people thinking forward. like, "Auto-created TraversalSources" - like, doesn't that "simplify" the configuration wonderfully? what about the "challenges" - like, how to maintain provider flexibility without a full scripting engine? where are the words that make this major new change have some life?

.map(kv -> (LifeCycleHook) kv.getValue())
.collect(Collectors.toList());

// process declarative traversalSources from YAML config - track which graphs have explicit entries
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we allowing both script and declarative configurations to live in the same yaml? If so, is that a sharp edge that might mess someone up?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't intend for users to mix and match groovy init scripts with the declarative approach, but I hadn't intended to restrict it either. My preference would be for users to completely migrate to the new configs in one atomic step, however I don't see any reason to prevent users from say converting their traversalSource configuration to the declarative system, while retaining an old groovy LifeCycleHook for a while to do data loading.

In the long-term, they will need to move everything to the new system, but it seems reasonable to allow a progressive migration.

I will followup on the order of operations there to see if there are likely to be bad interactions between the 2 systems.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, it's probably fine for them to coexist as long as they don't introduce unexpected problems or have some undocumented behavior. so, if one overwrites the other or something then at least make sure that's clear in the docs. i wouldn't be sad if the server refused to start and folks were forced into one config path or the other, but i don't think it's essential.

@@ -0,0 +1,21 @@
# Licensed to the Apache Software Foundation (ASF) under one
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure i immediately see what this is for in resources but we have a gremlin-server-min.yaml: https://github.com/apache/tinkerpop/blob/master/gremlin-server/conf/gremlin-server-min.yaml

should this one replace that one?

@vkagamlyk
Copy link
Copy Markdown
Contributor

Can you add unit test to verify gremlin-groovy is disabled? something like

 @Test
    public void gremlinLang() {
        final Cluster cluster = Cluster.build().addContactPoint("localhost").port(8182).create();
        final Client client = cluster.connect();

        // should handle traversal
        final GraphTraversalSource g = traversal().withRemote(DriverRemoteConnection.using(cluster, "gmodern"));
        long count = g.V().count().next();
        assertEquals(6, count);

        // should handle script
        final RequestOptions noLang = RequestOptions.build().addAlias("g", "gmodern").create();
        count = client.submit("g.V().count().next()", noLang).one().getLong();
        assertEquals(6, count);

        // should handle script with explicit engine name
        final RequestOptions requestOptions = RequestOptions.build().language("gremlin-lang").addAlias("g", "gmodern").create();
        count = client.submit("g.V().count().next()", requestOptions).one().getLong();
        assertEquals(6, count);

        // should not allow suspicious queries
        try {
            client.submit("2+2", requestOptions).one().getLong();
            fail("should throw exception");
        } catch (final CompletionException e) {
            final Throwable inner = e.getCause();
            assertTrue(inner instanceof ResponseException);
            assertEquals(ResponseStatusCode.SERVER_ERROR_EVALUATION, ((ResponseException) inner).getResponseStatusCode());
        }

        // in gremlin-groovy '1g' is valid BigDecimal value, but in gremlin-lang should be '1m';
        // the easiest way to determine which script engine the request was executed on
        try {
            client.submit("g.inject(1g)", requestOptions).one().getLong();
            fail("should throw exception");
        } catch (final CompletionException e) {
            final Throwable inner = e.getCause();
            assertTrue(inner instanceof ResponseException);
            assertEquals(ResponseStatusCode.SERVER_ERROR_EVALUATION, ((ResponseException) inner).getResponseStatusCode());
        }

        final BigDecimal one = (BigDecimal)client.submit("g.inject(1m)", requestOptions).one().getObject();
        assertEquals(BigDecimal.ONE, one);
    }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants