Add support for pointing indexes in ocaml-index#2051
Add support for pointing indexes in ocaml-index#2051Tim-ats-d wants to merge 23 commits intoocaml:mainfrom
Conversation
art-w
left a comment
There was a problem hiding this comment.
@voodoos raised another issue: Since the indexes are not self-contained anymore, it's possible that a pointing index targets another index file that has been removed or updated. If we don't check for that and follow pointers to arbitrary locations, some very bad things could happen!
I think I could make some kind of defensive programming and simply fail if the pointed-to index file no longer exists. What do you think? |
0908c3a to
8032c54
Compare
8032c54 to
bc62cca
Compare
art-w
left a comment
There was a problem hiding this comment.
It's looking good! I only have some minor requests to improve the id checking logic :)
|
Here are some benchmark results from Hyperfine I got running Without pointing index: With pointing-index enabled: Complete command I used: |
|
Concerning reading performance, here are some other metrics. I ran With pointing-index enabled: Without: So reading performances are mostly similar. |
|
My last commit should address your two last review comments @art-w |
art-w
left a comment
There was a problem hiding this comment.
Thanks! I left some additional comments because I'm still not convinced that the store id validation is correct... In fact it might be easier to add the id field in the store type directly (instead of On_disk), since it's as relevant as the filename to identify the expected index file.
I also did not see the code to report a message to users that their build is out of date? (when we run into an invalid store file)
| | Serialized of { loc : int } | ||
| | Serialized_reused of { loc : int } | ||
| | On_disk of { store : store; loc : int; schema : 'a schema } | ||
| | On_disk_ptr of { filename : string; loc : int; id : int } |
There was a problem hiding this comment.
(cc @voodoos) Do we know if the filenames are absolute or relative? (I wonder if there's a risk that ocaml-index could be called from different folders, such that the pointed filenames are later not found because of the relative paths interpretation)
There was a problem hiding this comment.
There are relative. We can maybe use Unix.realpath on such filename.
There was a problem hiding this comment.
The filenames in the locations stored in the indexes are relative unless --root is passed to the indexer.
In dune they should always be relative to the root of the build context (like _build/default).
| | Serialized of { loc : int } | ||
| | Serialized_reused of { loc : int } | ||
| | On_disk of { store : store; loc : int; schema : 'a schema } | ||
| | On_disk_ptr of { filename : string; loc : int; id : int } |
There was a problem hiding this comment.
The filenames in the locations stored in the indexes are relative unless --root is passed to the indexer.
In dune they should always be relative to the root of the build context (like _build/default).
There was a problem hiding this comment.
Thanks, looks fine to me. Just one worry about the new exception handling when reading the indexes for occurrences. Especially since it's a point in time were indexes are much more likely to be out of sync than during aggregation at build time.
Also, it's all or nothing, but for user queries it would be great to still be able to provide partial results if some of the files are out-of-sync. Would that be hard to do ? If so let's just add a comment about it around the place where we read indexes for occurrences.
|
And this definitely deserves a changelog entry. |
This reverts commit 853a38b.
|
Thanks, can you rebase ? there are some merge conflicts. |
This PR adds support for pointing to an index file that already exists on disk.
This mechanism results in significantly smaller index files and fewer write operations.