🐛 Adding benchmark test, and updating to remove idx file#941
🐛 Adding benchmark test, and updating to remove idx file#941shawn-hurley merged 2 commits intokonveyor:mainfrom
Conversation
WalkthroughRefactors Java external provider Maven index lookup from binary-entry parsing to line/string-based lookup; adds Makefile targets to extract and benchmark Maven index files; updates .gitignore; and adds/adjusts unit tests and a benchmark for constructing artifacts from SHA and Gradle dependency parsing. Changes
Sequence Diagram(s)sequenceDiagram
actor Caller
participant Search
participant SearchIndex
participant DataFile
participant BuildArtifact
rect rgb(240,240,255)
Note over SearchIndex: Old (binary-entry) — removed
Caller->>Search: search(key)
Search->>SearchIndex: searchIndex(indexFile, key)
SearchIndex->>SearchIndex: read fixed-size binary entry
Search->>DataFile: read bytes at offset/length
Search->>BuildArtifact: buildArtifact(key, bytes)
BuildArtifact-->>Caller: artifact
end
rect rgb(230,255,230)
Note over SearchIndex: New (line/string) — current
Caller->>Search: search(key)
Search->>SearchIndex: searchIndex(dataFile, key)
SearchIndex->>DataFile: scan/read lines, find matching line
SearchIndex-->>Search: return matched line (string)
Search->>BuildArtifact: buildArtifact(key, valueString)
BuildArtifact-->>Caller: artifact
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
external-providers/java-external-provider/pkg/java_external_provider/dependency_test.go (1)
3-790: Tests for active implementation functions must be re-enabled with justification.Your review comment is fully justified. I've confirmed critical findings:
What I verified:
parseMavenDepLines(line 913) andparseGradleDependencyOutput(line 559) are active, functional implementations independency.go- All tests for these functions are disabled in
dependency_test.govia block comment (/* ... */)- No explanation exists anywhere in the code for why tests are disabled
- No TODO/FIXME comments explain this decision
The problem:
This creates untested code paths for critical dependency parsing functionality. The block comment at line 3 of the test file provides no rationale, making it impossible to determine if this is:
- Temporary work-in-progress
- Broken tests needing fixes
- Intentional but undocumented
Required action:
The developer must either:
- Re-enable and fix the tests if they broke during refactoring
- Add an explanatory comment (if intentionally disabled) with a tracking issue reference and timeline
- Document why this test coverage gap is acceptable
The current state—active implementation with disabled tests and no explanation—is not acceptable for production code.
🧹 Nitpick comments (1)
Makefile (1)
98-103: Consider adding error handling for container cleanup.The
extract-maven-index-filestarget runs cleanup commands with|| true, which is good. However, if thepodman cpcommands fail, the temporary container may not be cleaned up properly.Consider using a trap or ensuring cleanup runs even on failure:
extract-maven-index-files: podman run --name temp-jdtls -d quay.io/konveyor/jdtls-server-base:latest podman cp temp-jdtls:/usr/local/etc/maven-index.txt $(PWD)/external-providers/java-external-provider/pkg/java_external_provider/testdata/ || (podman stop temp-jdtls; podman rm temp-jdtls; exit 1) podman cp temp-jdtls:/usr/local/etc/maven-index.idx $(PWD)/external-providers/java-external-provider/pkg/java_external_provider/testdata/ || (podman stop temp-jdtls; podman rm temp-jdtls; exit 1) podman stop temp-jdtls || true podman rm temp-jdtls || trueHowever, this is optional—the current approach with
|| trueis reasonable for a developer-facing target.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
external-providers/java-external-provider/pkg/java_external_provider/testdata/should_find_in_index.jaris excluded by!**/*.jarexternal-providers/java-external-provider/pkg/java_external_provider/testdata/will_not_find.jaris excluded by!**/*.jar
📒 Files selected for processing (6)
.gitignore(1 hunks)Makefile(1 hunks)external-providers/java-external-provider/pkg/java_external_provider/dependency_test.go(3 hunks)external-providers/java-external-provider/pkg/java_external_provider/util.go(4 hunks)external-providers/java-external-provider/pkg/java_external_provider/util_benchmark_test.go(1 hunks)external-providers/java-external-provider/pkg/java_external_provider/util_test.go(2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-30T12:11:45.673Z
Learnt from: pranavgaikwad
PR: konveyor/analyzer-lsp#859
File: external-providers/java-external-provider/pkg/java_external_provider/dependency.go:694-694
Timestamp: 2025-07-30T12:11:45.673Z
Learning: In the Java external provider dependency walker (external-providers/java-external-provider/pkg/java_external_provider/dependency.go), errors from toDependency function calls should be ignored as they are not considered important by the maintainers.
Applied to files:
external-providers/java-external-provider/pkg/java_external_provider/dependency_test.go
🔇 Additional comments (2)
.gitignore (1)
28-29: LGTM! Properly excluding generated Maven index files.The addition of
maven-index.txtandmaven-index.idxto.gitignoreis appropriate, as these files are extracted at build/test time via the newextract-maven-index-filesMakefile target and should not be committed to version control.external-providers/java-external-provider/pkg/java_external_provider/util.go (1)
700-721: Fix typo in buffer size comment and clarify external dependency format assumptions.The
readKeyAtfunction already documents the 500-byte buffer assumption at line 706. However, the comment contains a typo ("firt" → "first") and doesn't clarify that the 206-byte line length assumption depends on the Maven index file format remaining stable. Since index files are loaded externally and not created by this code, document that:
- The format must remain: space-separated key-value pairs, with lines ≤ 206 bytes
- This requires validating against the Maven central index format specification
Fix the typo in the inline comment at line 706 and consider adding a comment noting that line length validation happens implicitly via the
len(parts) != 2check at line 720 (malformed or oversized lines will fail there).
| }, | ||
| { | ||
| name: "NotInIndex", | ||
| jarFile: "testdata/will_not_find.jar.jar", |
There was a problem hiding this comment.
Fix duplicate .jar extension in test file path.
The filename has a double .jar extension: will_not_find.jar.jar. This appears to be a typo and should likely be will_not_find.jar.
Apply this diff:
{
name: "NotInIndex",
- jarFile: "testdata/will_not_find.jar.jar",
+ jarFile: "testdata/will_not_find.jar",
mavenIndexPath: "testdata",
},📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| jarFile: "testdata/will_not_find.jar.jar", | |
| { | |
| name: "NotInIndex", | |
| jarFile: "testdata/will_not_find.jar", | |
| mavenIndexPath: "testdata", | |
| }, |
🤖 Prompt for AI Agents
In
external-providers/java-external-provider/pkg/java_external_provider/util_benchmark_test.go
around line 27, the test uses a filename with a duplicated “.jar” extension
("will_not_find.jar.jar"); update the string to remove the extra extension so it
reads "will_not_find.jar" to correct the typo and ensure the test path is valid.
| } | ||
| } | ||
|
|
||
| // BenchmarkConstructArtifactFromSHA benchmarks the constructArtifactFromSHA function// with different scenarios to measure performance characteristics. |
There was a problem hiding this comment.
Fix the comment to match the function name.
The comment references BenchmarkConstructArtifactFromSHA, but this is a test function named TestConstructArtifactFromSHA.
Apply this diff:
-// BenchmarkConstructArtifactFromSHA benchmarks the constructArtifactFromSHA function// with different scenarios to measure performance characteristics.
+// TestConstructArtifactFromSHA tests the toDependency function with different scenarios.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // BenchmarkConstructArtifactFromSHA benchmarks the constructArtifactFromSHA function// with different scenarios to measure performance characteristics. | |
| // TestConstructArtifactFromSHA tests the toDependency function with different scenarios. |
🤖 Prompt for AI Agents
In
external-providers/java-external-provider/pkg/java_external_provider/util_test.go
around line 104, the top-of-function comment incorrectly references
BenchmarkConstructArtifactFromSHA for a test function; update the comment to
reference TestConstructArtifactFromSHA (or change it to a neutral description)
so it matches the function name and purpose.
| val, err := toDependency(context.Background(), log, tc.jarFile, tc.mavenIndexPath) | ||
| log.Info("here", "val", fmt.Sprintf("%#v", val), "err", err) | ||
| if err != nil && !tc.shouldFind { | ||
| return | ||
| } | ||
| if err != nil { | ||
| t.Fail() | ||
| } | ||
| if !tc.shouldFind { | ||
| t.Fail() | ||
| } | ||
| if !reflect.DeepEqual(val, tc.value) { | ||
| t.Fail() | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Add descriptive failure messages to test assertions.
The test lacks descriptive failure messages, making debugging difficult when tests fail. The error handling logic also appears incomplete—both test cases have shouldFind: true, but the conditional branches suggest support for shouldFind: false.
Apply this diff to improve test clarity:
t.Run(tc.name, func(t *testing.T) {
val, err := toDependency(context.Background(), log, tc.jarFile, tc.mavenIndexPath)
- log.Info("here", "val", fmt.Sprintf("%#v", val), "err", err)
- if err != nil && !tc.shouldFind {
- return
- }
if err != nil {
- t.Fail()
- }
- if !tc.shouldFind {
- t.Fail()
+ if tc.shouldFind {
+ t.Fatalf("expected to find artifact but got error: %v", err)
+ }
+ return
}
if !reflect.DeepEqual(val, tc.value) {
- t.Fail()
+ t.Errorf("artifact mismatch:\ngot: %#v\nwant: %#v", val, tc.value)
}
})📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| val, err := toDependency(context.Background(), log, tc.jarFile, tc.mavenIndexPath) | |
| log.Info("here", "val", fmt.Sprintf("%#v", val), "err", err) | |
| if err != nil && !tc.shouldFind { | |
| return | |
| } | |
| if err != nil { | |
| t.Fail() | |
| } | |
| if !tc.shouldFind { | |
| t.Fail() | |
| } | |
| if !reflect.DeepEqual(val, tc.value) { | |
| t.Fail() | |
| } | |
| val, err := toDependency(context.Background(), log, tc.jarFile, tc.mavenIndexPath) | |
| if err != nil { | |
| if tc.shouldFind { | |
| t.Fatalf("expected to find artifact but got error: %v", err) | |
| } | |
| return | |
| } | |
| if !reflect.DeepEqual(val, tc.value) { | |
| t.Errorf("artifact mismatch:\ngot: %#v\nwant: %#v", val, tc.value) | |
| } |
🤖 Prompt for AI Agents
external-providers/java-external-provider/pkg/java_external_provider/util_test.go
around lines 145 to 158: the test uses bare t.Fail() calls and unclear control
flow for tc.shouldFind, so replace those with descriptive assertions and clearer
branching: when toDependency returns an error, if tc.shouldFind then fail the
test with t.Fatalf including tc.name and the error, otherwise return to succeed;
when no error and tc.shouldFind is false fail with a message showing unexpected
value; when values differ fail with a message showing expected vs actual and
tc.name; include fmt.Sprintf or %#+v representations in the messages so failures
are actionable.
6acb929 to
5ca4f3d
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
external-providers/java-external-provider/pkg/java_external_provider/util.go (1)
655-690: Critical: Binary search implementation is fundamentally broken.The implementation uses
fi.Size()(file size in bytes) as the search space forsort.Search, butreadKeyAttreats positions as byte offsets into a text file with variable-length lines. This is incorrect because:
sort.Search(n, ...)expectsnto be the number of elements (lines)- The code sets
nto the file size in bytes (line 660)sort.Searchpasses values 0 to n-1 (byte positions) to the comparison functionreadKeyAtseeks to byte positioniand attempts to read lines, but byte positions don't align with line boundariesThis will produce incorrect results: the search may fail to find keys that exist, or return wrong matches. Binary search requires either:
- Option 1: Count lines first, search by line index, and maintain a line offset map
- Option 2: Use sequential line-by-line search for unsorted text files
- Option 3: Keep the sorted binary format with fixed-size records
Since the PR goal is removing the binary
.idxfile, and given the performance benchmarks show acceptable results, consider Option 2 (sequential scan) as the simplest correct implementation.Additionally, line 680 returns the wrong error variable (
errinstead ofsearchErr), preventing proper error propagation:if searchErr != nil { - return "", err + return "", searchErr }external-providers/java-external-provider/pkg/java_external_provider/util_benchmark_test.go (1)
27-27: Fix duplicate.jarextension in test file path.The filename has a double
.jarextension which appears to be a typo.- jarFile: "testdata/will_not_find.jar.jar", + jarFile: "testdata/will_not_find.jar",
🧹 Nitpick comments (1)
external-providers/java-external-provider/pkg/java_external_provider/util.go (1)
706-706: Fix typo in comment."firt byte" should be "first byte".
- // For now test with 500 bytes (largest line is 206, so worst case i is firt byte in that line, so 206 * 2 is what we want in the buffer, or 412 so 500 is a bit extra + // For now test with 500 bytes (largest line is 206, so worst case i is first byte in that line, so 206 * 2 is what we want in the buffer, or 412 so 500 is a bit extra
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
external-providers/java-external-provider/pkg/java_external_provider/testdata/should_find_in_index.jaris excluded by!**/*.jarexternal-providers/java-external-provider/pkg/java_external_provider/testdata/will_not_find.jaris excluded by!**/*.jar
📒 Files selected for processing (6)
.gitignore(1 hunks)Makefile(1 hunks)external-providers/java-external-provider/pkg/java_external_provider/dependency_test.go(1 hunks)external-providers/java-external-provider/pkg/java_external_provider/util.go(4 hunks)external-providers/java-external-provider/pkg/java_external_provider/util_benchmark_test.go(1 hunks)external-providers/java-external-provider/pkg/java_external_provider/util_test.go(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- Makefile
- external-providers/java-external-provider/pkg/java_external_provider/util_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-30T12:11:45.673Z
Learnt from: pranavgaikwad
PR: konveyor/analyzer-lsp#859
File: external-providers/java-external-provider/pkg/java_external_provider/dependency.go:694-694
Timestamp: 2025-07-30T12:11:45.673Z
Learning: In the Java external provider dependency walker (external-providers/java-external-provider/pkg/java_external_provider/dependency.go), errors from toDependency function calls should be ignored as they are not considered important by the maintainers.
Applied to files:
external-providers/java-external-provider/pkg/java_external_provider/dependency_test.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: test
- GitHub Check: test (windows-latest)
- GitHub Check: test (macos-latest)
- GitHub Check: benchmark (windows-latest, windows)
🔇 Additional comments (1)
.gitignore (1)
28-29: LGTM: Properly ignoring generated test artifacts.The Maven index files align with the new Makefile targets and should not be committed.
* Wanted to compare the concepts in this PR to main w/ a benchmark * removes the need for the .idx file. Signed-off-by: Shawn Hurley <shawn@hurley.page>
Signed-off-by: Shawn Hurley <shawn@hurley.page>
7951250 to
1d30e1e
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
external-providers/java-external-provider/pkg/java_external_provider/util_benchmark_test.go (1)
31-34: Fix the NotInIndex fixture path.The benchmark still points at
will_not_find.jar.jar, so it opens the wrong file and never exercises the “missing in index” jar. Please drop the duplicate extension.{ name: "NotInIndex", - jarFile: "testdata/will_not_find.jar.jar", + jarFile: "testdata/will_not_find.jar", mavenIndexPath: "testdata", },
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
external-providers/java-external-provider/pkg/java_external_provider/testdata/last_jar_in_file.jaris excluded by!**/*.jarexternal-providers/java-external-provider/pkg/java_external_provider/testdata/should_find_in_index.jaris excluded by!**/*.jarexternal-providers/java-external-provider/pkg/java_external_provider/testdata/will_not_find.jaris excluded by!**/*.jar
📒 Files selected for processing (7)
.gitignore(1 hunks)Makefile(1 hunks)external-providers/java-external-provider/pkg/java_external_provider/dependency.go(1 hunks)external-providers/java-external-provider/pkg/java_external_provider/dependency_test.go(21 hunks)external-providers/java-external-provider/pkg/java_external_provider/util.go(4 hunks)external-providers/java-external-provider/pkg/java_external_provider/util_benchmark_test.go(1 hunks)external-providers/java-external-provider/pkg/java_external_provider/util_test.go(2 hunks)
✅ Files skipped from review due to trivial changes (1)
- .gitignore
🚧 Files skipped from review as they are similar to previous changes (3)
- external-providers/java-external-provider/pkg/java_external_provider/dependency.go
- Makefile
- external-providers/java-external-provider/pkg/java_external_provider/util_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-30T12:11:45.673Z
Learnt from: pranavgaikwad
PR: konveyor/analyzer-lsp#859
File: external-providers/java-external-provider/pkg/java_external_provider/dependency.go:694-694
Timestamp: 2025-07-30T12:11:45.673Z
Learning: In the Java external provider dependency walker (external-providers/java-external-provider/pkg/java_external_provider/dependency.go), errors from toDependency function calls should be ignored as they are not considered important by the maintainers.
Applied to files:
external-providers/java-external-provider/pkg/java_external_provider/util_benchmark_test.goexternal-providers/java-external-provider/pkg/java_external_provider/util.goexternal-providers/java-external-provider/pkg/java_external_provider/dependency_test.go
🧬 Code graph analysis (1)
external-providers/java-external-provider/pkg/java_external_provider/dependency_test.go (1)
provider/provider.go (1)
DepDAGItem(483-483)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: test
- GitHub Check: test (windows-latest, windows, amd64, windows-X64)
- GitHub Check: benchmark (windows-latest, windows)
Summary:
For inside the index:
The new approach generally takes slightly fewer iterations but takes 4100 more nanoseconds per iteration.
The new approach allocates more when found in the index and therefore has more Bytes allocated on average (20,080B).
None of these numbers looks egregiously worse to me, so we shouldn't keep an extra 900 MB in the base image.
Please run it yourselves and let me know if you see something different.
Benchmark Results:
goos: darwin
goarch: arm64
pkg: github.com/konveyor/analyzer-lsp/external-providers/java-external-provider/pkg/java_external_provider
cpu: Apple M4 Pro
BenchmarkConstructArtifactFromSHA/InIndex-14 31057 192109 ns/op 35208 B/op 48 allocs/op
BenchmarkConstructArtifactFromSHA/NotInIndex-14 11598850 474.1 ns/op 80 B/op 2 allocs/op
goos: darwin
goarch: arm64
pkg: github.com/konveyor/analyzer-lsp/external-providers/java-external-provider/pkg/java_external_provider
cpu: Apple M4 Pro
BenchmarkConstructArtifactFromSHA/InIndex-14 30781 196209 ns/op 55288 B/op 140 allocs/op
BenchmarkConstructArtifactFromSHA/NotInIndex-14 12938926 491.1 ns/op 80 B/op 2 allocs/op