2 changes: 1 addition & 1 deletion README.md
@@ -27,7 +27,7 @@ Once connected, you'll have access to these powerful tools:

1. **`get_data_sources`** - List your indexed repositories and workspaces
2. **`semantic_search`** - Canonical semantic search across indexed artifacts
- 3. **`grep_search`** - Exact text or regex search with line-level matches
+ 3. **`grep_search`** - Exact literal or regex text search inside file content, plus literal file-name/path matching (returns files like `Form.xml` even when their content never mentions the name), with line-level previews for content matches
4. **`fetch_artifacts`** - Load the full source for relevant search hits
5. **`get_artifact_relationships`** - Expand call graph, inheritance, and reference relationships for one artifact
6. **`chat`** - Slower synthesized codebase Q&A, typically only after search
46 changes: 46 additions & 0 deletions src/tests/test_response_transformer.py
@@ -354,3 +354,49 @@ def test_grep_unicode_in_line_text(self):
         line = result["results"][0]["matches"][0]["lineText"]
         assert "ТипШтрихкода" in line
         assert "GS1_DataMatrix" in line
+
+    def test_grep_forwards_matched_by_name_flag(self):
+        """Name-only hits must carry matchedByName=True through to the MCP output
+        so LLM agents can distinguish a file-level name match from a content match.
+        Content hits must NOT include the field (backend omits null via
+        JsonIgnoreCondition.WhenWritingNull; the transformer mirrors that)."""
+        response = {
+            "results": [
+                {
+                    "kind": "File",
+                    "identifier": "biterp/.../Ext/Form.xml",
+                    "location": {
+                        "path": "bsl-checks/src/test/resources/checks/VerifyMetadata/CommonForms/Форма/Ext/Form.xml",
+                        "range": {"start": {"line": 1}, "end": {"line": 1}},
+                    },
+                    "matchCount": 0,
+                    "matches": [],
+                    "matchedByName": True,
+                },
+                {
+                    "kind": "File",
+                    "identifier": "biterp/.../renames.txt",
+                    "location": {"path": "renames.txt"},
+                    "matchCount": 2,
+                    "matches": [
+                        {
+                            "lineNumber": 3,
+                            "startColumn": 1,
+                            "endColumn": 9,
+                            "lineText": "Form.xml -> Form2.xml",
+                        }
+                    ],
+                    # matchedByName intentionally absent — backend omits it for content hits
+                },
+            ]
+        }
+
+        result = transform_grep_response(response)
+
+        assert len(result["results"]) == 2
+        name_only, content_hit = result["results"]
+        assert name_only["matchedByName"] is True
+        assert name_only["matchCount"] == 0
+        assert "matches" not in name_only  # transformer only copies matches when non-empty
+        assert "matchedByName" not in content_hit
+        assert content_hit["matchCount"] == 2
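
From the point of view of a caller, the contract this test pins down could be consumed roughly as follows. This is an illustrative sketch only: `split_grep_results` is a hypothetical helper that does not exist in the repository, the sample paths are made up for the example, and it assumes each transformed result is a plain dict that carries `matchedByName` only for name-only hits.

```python
from typing import Any, Dict, List, Tuple


def split_grep_results(
    results: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Hypothetical helper: separate file-name-only hits from content hits."""
    name_only: List[Dict[str, Any]] = []
    content: List[Dict[str, Any]] = []
    for result in results:
        # Name-only hits carry matchedByName=True; content hits omit the key.
        if result.get("matchedByName") is True:
            name_only.append(result)
        else:
            content.append(result)
    return name_only, content


# Sample data invented for illustration, shaped like the transformed output.
sample = [
    {"path": "Ext/Form.xml", "matchCount": 0, "matchedByName": True},
    {
        "path": "renames.txt",
        "matchCount": 2,
        "matches": [{"lineNumber": 3, "lineText": "Form.xml -> Form2.xml"}],
    },
]
by_name, by_content = split_grep_results(sample)
```

Checking for the key rather than for `matchCount == 0` keeps the caller aligned with the field the test treats as the signal for a file-level hit.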
26 changes: 21 additions & 5 deletions src/tools/search.py
@@ -239,16 +239,20 @@ async def grep_search(
regex: bool = False,
) -> Dict[str, Any]:
"""
- Search indexed code by exact text or regex — finds code containing
- a specific string.
+ Search indexed code by exact text or regex — matches file content
+ and, for literal queries, also file names/paths.

Use this when you know WHAT TEXT to look for: an identifier, an error
- message, a config key, a literal string that must appear in the source.
+ message, a config key, or a file whose name you know (even if nothing
+ inside the file references that name — 1C `Form.xml`, `.mdo`, config
+ XML, media files, etc.).

**When to use grep_search:**
- Specific identifiers: class/function/variable names, domain events
(e.g. `RepositoryDeleted`, `handlePayment`, `AUTH_PROVIDERS`)
- Literal strings: error messages, URLs, config keys, file paths
+ - File names whose content may never contain their own name
+ (e.g. `Form.xml`, `schema.graphql`, `appsettings.json`)
- Import paths, TODO/FIXME comments, annotations
- Regex patterns: `def test_.*async`, `Status\\.(Alive|Failed)`
- Finding ALL occurrences of a known symbol across the codebase
@@ -276,16 +280,23 @@ async def grep_search(
max_results: Maximum number of results to return (1–500).

regex: If True, treat `query` as a regex pattern. Default: False (literal).
+ **Regex currently matches file content only** — file-name/path
+ matching is literal-substring only. This is a known limitation.

Returns:
{"results": [...], "hint": "..."}

Each result contains:
- path: file path
- identifier: pass to `fetch_artifacts` for full source
- - matchCount: total matches in this file
+ - matchCount: total matches in this file (0 for file-name-only hits)
- matches: array of line-level hits, each with:
- lineNumber, startColumn, endColumn, lineText
+ - matchedByName: present and `true` only when the artifact matched
+ by its file name/path and has no content match. In that case
+ `matches` is empty and `location.line` defaults to 1 as a
+ file-level reference — do NOT interpret `location.line` as an
Comment on lines +297 to +298
medium

The docstring refers to `location.line`, but the `transform_grep_response` function (via `_build_result_dict`) flattens the nested backend structure into `startLine` and `endLine` in the final output. To avoid confusing LLM agents that rely on this documentation to parse the tool response, the docstring should use the actual field names present in the transformed JSON.

Suggested change
- `matches` is empty and `location.line` defaults to 1 as a
- file-level reference — do NOT interpret `location.line` as an
+ `matches` is empty and `startLine` defaults to 1 as a
+ file-level reference — do NOT interpret `startLine` as an

+ actual line match. Content-match results omit this field.

The `hint` reminds you that line previews are evidence only — load
full source via `fetch_artifacts` or local `Read()` before reasoning.
@@ -295,7 +306,12 @@ async def grep_search(
grep_search(query="ConnectionString",
data_sources=["backend"])

- 2. Regex search for test methods:
+ 2. Find a file by name (returns the file even if nothing inside
+ it references `Form.xml`):
+ grep_search(query="Form.xml",
+ data_sources=["biterp-bsl"])
+
+ 3. Regex search for test methods (content only):
grep_search(query="def test_.*auth",
data_sources=["backend"],
extensions=[".py"],
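
The split between literal and regex behaviour described in the docstring can be modelled with a small sketch. This is a toy model, not the backend's implementation: `toy_grep` is a hypothetical function, the case-sensitive substring comparison on paths is an assumption the source does not state, and files are represented as an in-memory dict of path to content with invented sample data.

```python
import re
from typing import Any, Dict, List


def toy_grep(files: Dict[str, str], query: str, regex: bool = False) -> List[Dict[str, Any]]:
    """Toy model: literal queries match content and paths; regex matches content only."""
    pattern = re.compile(query if regex else re.escape(query))
    results: List[Dict[str, Any]] = []
    for path, content in files.items():
        match_count = len(pattern.findall(content))
        if match_count:
            # Content hit: the matchedByName key is left out entirely.
            results.append({"path": path, "matchCount": match_count})
        elif not regex and query in path:
            # Name-only hit: no content match, literal substring of the path.
            results.append({"path": path, "matchCount": 0, "matchedByName": True})
    return results


# Invented sample data for illustration.
files = {
    "CommonForms/Ext/Form.xml": "<form/>",
    "renames.txt": "Form.xml -> Form2.xml",
}
literal_hits = toy_grep(files, "Form.xml")
regex_hits = toy_grep(files, r"Form\d\.xml", regex=True)
```

In this model the regex query never returns `CommonForms/Ext/Form.xml`, which mirrors the limitation the docstring calls out: path matching only happens for literal queries.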
6 changes: 6 additions & 0 deletions src/utils/response_transformer.py
@@ -95,6 +95,12 @@ def transform_grep_response(grep_results: Dict[str, Any]) -> Dict[str, Any]:
             item["matches"] = [
                 _build_match_dict(match) for match in result["matches"]
             ]
+        # Forward matchedByName only when the backend set it (name-only hits).
+        # The backend omits the field for content matches via System.Text.Json
+        # WhenWritingNull, so `get("matchedByName")` is None/missing for those
+        # and we skip it here to keep the happy path free of an extra key.
+        if result.get("matchedByName"):
+            item["matchedByName"] = True
         formatted_results.append(item)

     if not formatted_results:
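
The forwarding rule added to `transform_grep_response` can be looked at in isolation. The following is a minimal sketch, not the repository's code: `forward_matched_by_name` is a hypothetical helper introduced only for illustration, and it assumes the backend result and the output item are plain dicts. Because the check relies on truthiness, a missing key, an explicit `None`, and `False` are all dropped in the same way.

```python
from typing import Any, Dict


def forward_matched_by_name(result: Dict[str, Any], item: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: copy matchedByName to the output only when it is truthy."""
    if result.get("matchedByName"):
        item["matchedByName"] = True
    return item


# Name-only hit: the flag is forwarded.
name_only = forward_matched_by_name({"matchedByName": True}, {"matchCount": 0})
# Content hit: the backend omitted the field, so the output omits it too.
content_hit = forward_matched_by_name({"matchCount": 2}, {"matchCount": 2})
# Explicit null is treated the same as a missing key.
explicit_null = forward_matched_by_name({"matchedByName": None}, {"matchCount": 1})
```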