CSHARP-5655: Support regular expressions in $replaceAll search string and $split delimiter#1955
CSHARP-5655: Support regular expressions in $replaceAll search string and $split delimiter#1955sanych-sun wants to merge 9 commits intomongodb:mainfrom
Conversation
… and $split delimiter
There was a problem hiding this comment.
Pull request overview
Adds LINQ3 translation support for using regular expressions as the $replaceAll.find argument and as the $split delimiter, along with accompanying serializer deduction and test coverage.
Changes:
- Implemented
$replaceAlltranslation forstring.Replace(...)andRegex.Replace(...)(including regex literals and regex values stored as BSON regular expressions). - Extended
$splittranslation to support regex delimiters viaRegex.Split(...)and regex instance.Split(...)when stored as BSON regular expressions, plus additionalstring.Splitoverload handling. - Added/updated unit + integration tests and updated expression preprocessing in test helpers.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/MongoDB.Driver.Tests/Linq/TestHelpers.cs | Preprocesses test expressions using LinqExpressionPreprocessor before serializer finding/translation. |
| tests/MongoDB.Driver.Tests/Linq/Linq3Implementation/Translators/…/SplitMethodToAggregationExpressionTranslatorTests.cs | New unit tests for $split translation across many string.Split overloads and regex delimiters. |
| tests/MongoDB.Driver.Tests/Linq/Linq3Implementation/Translators/…/ReplaceMethodToAggregationExpressionTranslatorTests.cs | New unit tests for $replaceAll translation including regex-based replace. |
| tests/MongoDB.Driver.Tests/Linq/Linq3Implementation/SerializerFinders/StringTests.cs | Expands serializer-finder coverage for Replace/Split and adds unsupported-case assertions. |
| tests/MongoDB.Driver.Tests/Linq/Linq3Implementation/SerializerFinders/RegexTests.cs | Expands serializer-finder coverage for regex Replace/Split and adds unsupported-case assertions. |
| tests/MongoDB.Driver.Tests/Linq/Linq3Implementation/Jira/CSharp5928Tests.cs | Updates a non-translatable-expression regression to use unsupported Regex.Replace overload and asserts message content. |
| tests/MongoDB.Driver.Tests/Linq/Integration/StringSplitTests.cs | New integration tests validating $split output for supported string.Split shapes. |
| tests/MongoDB.Driver.Tests/Linq/Integration/StringReplaceTests.cs | New integration tests validating $replaceAll for string.Replace (feature-gated). |
| tests/MongoDB.Driver.Tests/Linq/Integration/RegexSplitTests.cs | New integration tests validating regex delimiter support in $split (feature-gated). |
| tests/MongoDB.Driver.Tests/Linq/Integration/RegexReplaceTests.cs | New integration tests validating regex find support in $replaceAll (feature-gated). |
| src/MongoDB.Driver/Linq/Linq3Implementation/Translators/…/SplitMethodToAggregationExpressionTranslator.cs | Adds regex delimiter support and extends handling of additional string.Split overloads. |
| src/MongoDB.Driver/Linq/Linq3Implementation/Translators/…/ReplaceMethodToAggregationExpressionTranslator.cs | New translator implementing $replaceAll for string/regex replace operations. |
| src/MongoDB.Driver/Linq/Linq3Implementation/Translators/…/MethodCallExpressionToAggregationExpressionTranslator.cs | Routes method calls named Replace to the new replace translator. |
| src/MongoDB.Driver/Linq/Linq3Implementation/SerializerFinders/SerializerFinderVisitMethodCall.cs | Deduces serializers for Replace and extends Split deduction to regex splits. |
| src/MongoDB.Driver/Linq/Linq3Implementation/Reflection/StringMethod.cs | Adds Replace reflection metadata and expands split overload sets (char + string separator overloads). |
| src/MongoDB.Driver/Linq/Linq3Implementation/Reflection/RegexMethod.cs | Adds Replace/Split reflection metadata and overload sets for serializer finding/translation. |
| src/MongoDB.Driver/Linq/Linq3Implementation/Ast/Expressions/AstExpression.cs | Adds AstExpression.ReplaceAll(...) factory method. |
| src/MongoDB.Driver/Core/Misc/Feature.cs | Fixes lookup feature typos and introduces ReplaceAll, ReplaceAllWithRegex, and SplitWithRegex feature flags. |
Comments suppressed due to low confidence (1)
src/MongoDB.Driver/Linq/Linq3Implementation/Translators/ExpressionToAggregationExpressionTranslators/MethodTranslators/SplitMethodToAggregationExpressionTranslator.cs:73
- For the string separator overloads, a null or empty separator has special meaning in .NET (and MongoDB $split also rejects empty delimiters). Currently separatorString/separatorStrings[0] are used without validating null/empty, which can generate an invalid $split delimiter or change semantics. Please add explicit validation (e.g., reject null/empty with ExpressionNotSupportedException, or implement equivalent semantics if intended).
else if (method.IsOneOf(StringMethod.SplitWithStringOverloads))
{
var separatorsExpression = arguments[0];
var separatorString = separatorsExpression.GetConstantValue<string>(containingExpression: expression);
delimiter = separatorString;
}
else if (method.IsOneOf(StringMethod.SplitWithStringsOverloads))
{
var separatorsExpression = arguments[0];
var separatorStrings = separatorsExpression.GetConstantValue<string[]>(containingExpression: expression);
if (separatorStrings.Length != 1)
{
goto notSupported;
}
delimiter = separatorStrings[0];
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (method.Is(StringMethod.ReplaceWithString)) | ||
| { | ||
| inputAst = ExpressionToAggregationExpressionTranslator.Translate(context, expression.Object).Ast; | ||
| findAst = ExpressionToAggregationExpressionTranslator.Translate(context, arguments[0]).Ast; | ||
| replacementAst = ExpressionToAggregationExpressionTranslator.Translate(context, arguments[1]).Ast; | ||
| } |
There was a problem hiding this comment.
For string.Replace(string oldValue, string newValue), .NET treats newValue == null as string.Empty. Translating a null replacement via ExpressionToAggregationExpressionTranslator will emit BSON null, which diverges from .NET semantics and is likely invalid for $replaceAll. Consider normalizing a constant null replacement to "" (or explicitly rejecting null with a clear ExpressionNotSupportedException).
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| TestHelpers.MakeLambda<MyModel, string[]>(model => model.StringField.Split(__singleCharsSeparator, 3, StringSplitOptions.RemoveEmptyEntries)), | ||
| "{ $slice: [{ $filter : { input : { $split : [{ $getField : { field : 'StringField', input : '$$ROOT' } }, ','] }, as : 'item', cond : { $ne : ['$$item', ''] } } }, 3]}" | ||
| ], | ||
| // TODO: Missed cases when StringSplitOptions.TrimEntries. Currently, it is silently ignored. Need jira ticket for that. |
There was a problem hiding this comment.
Was a ticket filed for this? Also why not do it in the current PR? Because currently I don't think we want to ship this code without TrimEntries or at least something that rejects unknown flag combinations
There was a problem hiding this comment.
I'll file a ticket for this. String.Split translation is exiting functionality, so I don't want to introduce the changes just in this PR, because it theoretically can break someone's code.
There was a problem hiding this comment.
There was a problem hiding this comment.
Compare to the pre-existing IsMatchMethodToAggregationExpressionTranslator.cs:35-39:
if (inputTranslation.Serializer is IRepresentationConfigurable representationConfigurable &&
representationConfigurable.Representation != BsonType.String)
{
throw new ExpressionNotSupportedException(inputExpression, expression, because: "input expression is not represented as a string");
}
The new Replace and Split translators never perform this check. For a field like [BsonRepresentation(BsonType.Int32)] string Code, a user calling d.Code.Replace("1", "2") gets a cryptic server-side error at query time instead of a clear translation-time error naming the representation mismatch.
Should the new code apply the same guard?
There was a problem hiding this comment.
Good point, will do. BTW, I think we inconsistently checking for the representation in translators. We might want to add this as a best practice for all translators.
| var findTranslation = ExpressionToAggregationExpressionTranslator.Translate(context, expression.Object); | ||
| if (SerializationHelper.GetRepresentation(findTranslation.Serializer) != BsonType.RegularExpression) | ||
| { | ||
| throw new ExpressionNotSupportedException(expression); |
There was a problem hiding this comment.
This could benefit from an actionable error message. Maybe add a because context? something like "regex instance not represented as regex"
| var delimiterTranslation = ExpressionToAggregationExpressionTranslator.Translate(context, expression.Object); | ||
| if (SerializationHelper.GetRepresentation(delimiterTranslation.Serializer) != BsonType.RegularExpression) | ||
| { | ||
| goto notSupported; |
There was a problem hiding this comment.
I know goto notSupported flows to the generic ExpressionNotSupportedException but maybe some of this exceptions could benefit from a because context for a more actionable error.
| public static AstExpression RegexMatch(AstExpression input, string pattern, string options) | ||
| => new AstRegexExpression(AstRegexOperator.Match, input, pattern, options); | ||
|
|
||
| public static AstExpression ReplaceAll(AstExpression input, AstExpression find, AstExpression replace) |
There was a problem hiding this comment.
I would rename replace to replacement for consistency since AstReplaceAllExpression uses that.
| { | ||
| var options = arguments.Count == 4 ? arguments[3].GetConstantValue<RegexOptions>(expression) : RegexOptions.None; | ||
| var pattern = arguments[1].GetConstantValue<string>(expression); | ||
| findAst = new BsonRegularExpression(new Regex(pattern, options)); |
There was a problem hiding this comment.
Why are we creating a regex here? There is an existing BsonRegularExpression(string pattern, string options) overload.
There was a problem hiding this comment.
Because options is enum here, not a string as BsonRegularExpression ctor wants.
| // validate if replacement string uses substitutions: $1, ${1}, ${name}, etc | ||
| // more details on https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference#substitutions | ||
| var matches = Regex.Matches(replacement, @"(?<escaped>\$\$)|\$(?<substitutions>\d+|\{[^}]+\}|&|`|'|\+|_)"); | ||
| for (var i = 0; i < matches.Count; i++) |
There was a problem hiding this comment.
Cannot use foreach or LINQ here, because in net472 MatchesCollection does not implement IEnumerable.
No description provided.