Skip to content

CSHARP-5655: Support regular expressions in $replaceAll search string and $split delimiter#1955

Open
sanych-sun wants to merge 9 commits intomongodb:mainfrom
sanych-sun:csharp5655
Open

CSHARP-5655: Support regular expressions in $replaceAll search string and $split delimiter#1955
sanych-sun wants to merge 9 commits intomongodb:mainfrom
sanych-sun:csharp5655

Conversation

@sanych-sun
Copy link
Copy Markdown
Member

No description provided.

@sanych-sun sanych-sun requested a review from a team as a code owner April 15, 2026 22:24
@sanych-sun sanych-sun requested review from ajcvickers and Copilot and removed request for Copilot April 15, 2026 22:24
@sanych-sun sanych-sun added the feature Adds new user-facing functionality. label Apr 15, 2026
Copilot AI review requested due to automatic review settings April 15, 2026 22:57
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds LINQ3 translation support for using regular expressions as the $replaceAll.find argument and as the $split delimiter, along with accompanying serializer deduction and test coverage.

Changes:

  • Implemented $replaceAll translation for string.Replace(...) and Regex.Replace(...) (including regex literals and regex values stored as BSON regular expressions).
  • Extended $split translation to support regex delimiters via Regex.Split(...) and regex instance .Split(...) when stored as BSON regular expressions, plus additional string.Split overload handling.
  • Added/updated unit + integration tests and updated expression preprocessing in test helpers.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/MongoDB.Driver.Tests/Linq/TestHelpers.cs Preprocesses test expressions using LinqExpressionPreprocessor before serializer finding/translation.
tests/MongoDB.Driver.Tests/Linq/Linq3Implementation/Translators/…/SplitMethodToAggregationExpressionTranslatorTests.cs New unit tests for $split translation across many string.Split overloads and regex delimiters.
tests/MongoDB.Driver.Tests/Linq/Linq3Implementation/Translators/…/ReplaceMethodToAggregationExpressionTranslatorTests.cs New unit tests for $replaceAll translation including regex-based replace.
tests/MongoDB.Driver.Tests/Linq/Linq3Implementation/SerializerFinders/StringTests.cs Expands serializer-finder coverage for Replace/Split and adds unsupported-case assertions.
tests/MongoDB.Driver.Tests/Linq/Linq3Implementation/SerializerFinders/RegexTests.cs Expands serializer-finder coverage for regex Replace/Split and adds unsupported-case assertions.
tests/MongoDB.Driver.Tests/Linq/Linq3Implementation/Jira/CSharp5928Tests.cs Updates a non-translatable-expression regression to use unsupported Regex.Replace overload and asserts message content.
tests/MongoDB.Driver.Tests/Linq/Integration/StringSplitTests.cs New integration tests validating $split output for supported string.Split shapes.
tests/MongoDB.Driver.Tests/Linq/Integration/StringReplaceTests.cs New integration tests validating $replaceAll for string.Replace (feature-gated).
tests/MongoDB.Driver.Tests/Linq/Integration/RegexSplitTests.cs New integration tests validating regex delimiter support in $split (feature-gated).
tests/MongoDB.Driver.Tests/Linq/Integration/RegexReplaceTests.cs New integration tests validating regex find support in $replaceAll (feature-gated).
src/MongoDB.Driver/Linq/Linq3Implementation/Translators/…/SplitMethodToAggregationExpressionTranslator.cs Adds regex delimiter support and extends handling of additional string.Split overloads.
src/MongoDB.Driver/Linq/Linq3Implementation/Translators/…/ReplaceMethodToAggregationExpressionTranslator.cs New translator implementing $replaceAll for string/regex replace operations.
src/MongoDB.Driver/Linq/Linq3Implementation/Translators/…/MethodCallExpressionToAggregationExpressionTranslator.cs Routes method calls named Replace to the new replace translator.
src/MongoDB.Driver/Linq/Linq3Implementation/SerializerFinders/SerializerFinderVisitMethodCall.cs Deduces serializers for Replace and extends Split deduction to regex splits.
src/MongoDB.Driver/Linq/Linq3Implementation/Reflection/StringMethod.cs Adds Replace reflection metadata and expands split overload sets (char + string separator overloads).
src/MongoDB.Driver/Linq/Linq3Implementation/Reflection/RegexMethod.cs Adds Replace/Split reflection metadata and overload sets for serializer finding/translation.
src/MongoDB.Driver/Linq/Linq3Implementation/Ast/Expressions/AstExpression.cs Adds AstExpression.ReplaceAll(...) factory method.
src/MongoDB.Driver/Core/Misc/Feature.cs Fixes lookup feature typos and introduces ReplaceAll, ReplaceAllWithRegex, and SplitWithRegex feature flags.
Comments suppressed due to low confidence (1)

src/MongoDB.Driver/Linq/Linq3Implementation/Translators/ExpressionToAggregationExpressionTranslators/MethodTranslators/SplitMethodToAggregationExpressionTranslator.cs:73

  • For the string separator overloads, a null or empty separator has special meaning in .NET (and MongoDB $split also rejects empty delimiters). Currently separatorString/separatorStrings[0] are used without validating null/empty, which can generate an invalid $split delimiter or change semantics. Please add explicit validation (e.g., reject null/empty with ExpressionNotSupportedException, or implement equivalent semantics if intended).
                else if (method.IsOneOf(StringMethod.SplitWithStringOverloads))
                {
                    var separatorsExpression = arguments[0];
                    var separatorString = separatorsExpression.GetConstantValue<string>(containingExpression: expression);
                    delimiter = separatorString;
                }
                else if (method.IsOneOf(StringMethod.SplitWithStringsOverloads))
                {
                    var separatorsExpression = arguments[0];
                    var separatorStrings = separatorsExpression.GetConstantValue<string[]>(containingExpression: expression);
                    if (separatorStrings.Length != 1)
                    {
                        goto notSupported;
                    }
                    delimiter = separatorStrings[0];
                }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/MongoDB.Driver.Tests/Linq/TestHelpers.cs Outdated
Comment on lines +36 to +41
if (method.Is(StringMethod.ReplaceWithString))
{
inputAst = ExpressionToAggregationExpressionTranslator.Translate(context, expression.Object).Ast;
findAst = ExpressionToAggregationExpressionTranslator.Translate(context, arguments[0]).Ast;
replacementAst = ExpressionToAggregationExpressionTranslator.Translate(context, arguments[1]).Ast;
}
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For string.Replace(string oldValue, string newValue), .NET treats newValue == null as string.Empty. Translating a null replacement via ExpressionToAggregationExpressionTranslator will emit BSON null, which diverges from .NET semantics and is likely invalid for $replaceAll. Consider normalizing a constant null replacement to "" (or explicitly rejecting null with a clear ExpressionNotSupportedException).

Copilot uses AI. Check for mistakes.
@sanych-sun sanych-sun requested a review from adelinowona April 16, 2026 00:20
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
TestHelpers.MakeLambda<MyModel, string[]>(model => model.StringField.Split(__singleCharsSeparator, 3, StringSplitOptions.RemoveEmptyEntries)),
"{ $slice: [{ $filter : { input : { $split : [{ $getField : { field : 'StringField', input : '$$ROOT' } }, ','] }, as : 'item', cond : { $ne : ['$$item', ''] } } }, 3]}"
],
// TODO: Missed cases when StringSplitOptions.TrimEntries. Currently, it is silently ignored. Need jira ticket for that.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was a ticket filed for this? Also why not do it in the current PR? Because currently I don't think we want to ship this code without TrimEntries or at least something that rejects unknown flag combinations

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll file a ticket for this. String.Split translation is exiting functionality, so I don't want to introduce the changes just in this PR, because it theoretically can break someone's code.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compare to the pre-existing IsMatchMethodToAggregationExpressionTranslator.cs:35-39:

if (inputTranslation.Serializer is IRepresentationConfigurable representationConfigurable &&                                                                                                                     
     representationConfigurable.Representation != BsonType.String)                                                                                                                                                
 {                                                                                                                                                                                                                
     throw new ExpressionNotSupportedException(inputExpression, expression, because: "input expression is not represented as a string");
 } 

The new Replace and Split translators never perform this check. For a field like [BsonRepresentation(BsonType.Int32)] string Code, a user calling d.Code.Replace("1", "2") gets a cryptic server-side error at query time instead of a clear translation-time error naming the representation mismatch.

Should the new code apply the same guard?

Copy link
Copy Markdown
Member Author

@sanych-sun sanych-sun Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will do. BTW, I think we inconsistently checking for the representation in translators. We might want to add this as a best practice for all translators.

var findTranslation = ExpressionToAggregationExpressionTranslator.Translate(context, expression.Object);
if (SerializationHelper.GetRepresentation(findTranslation.Serializer) != BsonType.RegularExpression)
{
throw new ExpressionNotSupportedException(expression);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could benefit from an actionable error message. Maybe add a because context? something like "regex instance not represented as regex"

var delimiterTranslation = ExpressionToAggregationExpressionTranslator.Translate(context, expression.Object);
if (SerializationHelper.GetRepresentation(delimiterTranslation.Serializer) != BsonType.RegularExpression)
{
goto notSupported;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know goto notSupported flows to the generic ExpressionNotSupportedException but maybe some of this exceptions could benefit from a because context for a more actionable error.

public static AstExpression RegexMatch(AstExpression input, string pattern, string options)
=> new AstRegexExpression(AstRegexOperator.Match, input, pattern, options);

public static AstExpression ReplaceAll(AstExpression input, AstExpression find, AstExpression replace)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename replace to replacement for consistency since AstReplaceAllExpression uses that.

{
var options = arguments.Count == 4 ? arguments[3].GetConstantValue<RegexOptions>(expression) : RegexOptions.None;
var pattern = arguments[1].GetConstantValue<string>(expression);
findAst = new BsonRegularExpression(new Regex(pattern, options));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we creating a regex here? There is an existing BsonRegularExpression(string pattern, string options) overload.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because options is enum here, not a string as BsonRegularExpression ctor wants.

@sanych-sun sanych-sun requested a review from adelinowona April 17, 2026 18:41
// validate if replacement string uses substitutions: $1, ${1}, ${name}, etc
// more details on https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference#substitutions
var matches = Regex.Matches(replacement, @"(?<escaped>\$\$)|\$(?<substitutions>\d+|\{[^}]+\}|&|`|'|\+|_)");
for (var i = 0; i < matches.Count; i++)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot use foreach or LINQ here, because in net472 MatchesCollection does not implement IEnumerable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature Adds new user-facing functionality.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants