Skip to content

Commit c15bf32

Browse files
ryan-williamsclaude
andcommitted
Use \/\-prefixed shorthand for out-of-dvc-dir deps (vs \../../\ chains)
When a .dvc file's dep lives outside the .dvc file's directory, write it as \`/repo-root-relative\` instead of \`../../../foo/bar\`. Reads already accept both forms (since the relative-paths spec landed), so this is purely a write-path change. Benefits: - Easier to scan deeply-nested .dvc files - /-prefixed paths are stable: moving a .dvc file doesn't break them - Visually distinct from same-dir deps (foo/bar vs /foo/bar) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 3dadb9d commit c15bf32

File tree

3 files changed

+140
-12
lines changed

3 files changed

+140
-12
lines changed

specs/done/dep-path-shorthand.md

Lines changed: 107 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,107 @@
1+
# Repo-root-absolute dep path shorthand on write
2+
3+
## Problem
4+
5+
When a `.dvc` file depends on artifacts outside its own directory, DVX writes
6+
the dep path via `os.path.relpath()`, producing `../`-laden strings:
7+
8+
```yaml
9+
# www/public/njsp/csvs.dvc
10+
deps:
11+
../../../njsp/data/crashes.parquet: e9799868d9114846a7052100454cde51
12+
```
13+
14+
These `../../../` prefixes are hard to scan at a glance, change meaning based
15+
on which `.dvc` file they live in, and get noisier the deeper the `.dvc` file
16+
is nested.
17+
18+
A shorter form is already supported on **read** (`src/dvx/run/dvc_files.py`
19+
lines 108-126): a leading `/` in a dep path is treated as repo-root-absolute,
20+
so the above is equivalent to:
21+
22+
```yaml
23+
deps:
24+
/njsp/data/crashes.parquet: e9799868d9114846a7052100454cde51
25+
```
26+
27+
But `_relativize_dep_paths` (same file, L129-153) is purely `os.path.relpath`
28+
based, so DVX writes back the `../` form and hand-edited `/`-shorthand paths
29+
revert on the next `dvx run` that rewrites the file.
30+
31+
## Proposal
32+
33+
In `_relativize_dep_paths`, prefer the repo-root-absolute shorthand for any
34+
dep whose relpath from `dvc_dir` would contain `..`.
35+
36+
### Write rule
37+
38+
For each `(dep_path, hash)`:
39+
40+
- If `dep_path` is inside `dvc_dir` (i.e. `relpath(dep_path, dvc_dir)` has no
41+
`..`): write as-is relative (current behavior).
42+
- e.g. `njsp/data/refresh.dvc` with dep `njsp/data/FAUQStats2024.xml`
43+
→ writes as `FAUQStats2024.xml` (dvc-dir-relative, no `/` prefix).
44+
- If `dep_path` is outside `dvc_dir` (relpath would start with `../`): write
45+
as `/<dep_path>` (repo-root-absolute).
46+
- e.g. `www/public/njsp/csvs.dvc` with dep `njsp/data/crashes.parquet`
47+
→ writes as `/njsp/data/crashes.parquet` (not `../../../njsp/data/...`).
48+
49+
### Rationale
50+
51+
- Humans scanning `.dvc` files in deeply-nested directories don't have to
52+
count `../` segments to locate the dep.
53+
- Moving a `.dvc` file doesn't invalidate dep paths that point outside its
54+
directory — `/path` is stable, `../../path` is not.
55+
- No migration needed for older files: reads continue to accept both forms.
56+
- The `/` prefix is visually distinct from the bare `foo/bar` form used for
57+
same-directory deps, making it easy to tell at a glance whether a dep
58+
lives inside or outside the stage's own dir.
59+
60+
### Edge cases
61+
62+
- `dvc_dir == "."` (stage file is at repo root): paths are already
63+
repo-root-relative; don't add `/` prefix. Leave as-is.
64+
- `dvc_dir` absolute (unusual; current code preserves this): don't modify.
65+
- Windows: `os.path.relpath` may raise `ValueError` across drives; current
66+
code preserves the raw path in that case. The new rule doesn't apply
67+
there (no meaningful "repo root").
68+
69+
## Opt-out
70+
71+
Add a `run.path_style` config key in `.dvx/config.yml` (or `dvx.yml`):
72+
73+
```yaml
74+
run:
75+
path_style: repo_root # default: prefer / shorthand for out-of-dir deps
76+
# path_style: relative # old behavior: always os.path.relpath
77+
```
78+
79+
Only needed if there's demand for the old behavior (e.g. to minimize diff
80+
churn on initial rollout). Probably unnecessary.
81+
82+
## Migration
83+
84+
No forced migration. Over time, as DVX rewrites each `.dvc` file in a given
85+
repo, deps get normalized to the new shorthand. Downstream projects that
86+
prefer an immediate bulk conversion can do it with a short script (find
87+
every `.dvc` file, re-resolve each dep to repo-root-relative via the existing
88+
`_resolve_dep_paths`, then write back via the new `_relativize_dep_paths`).
89+
90+
## Test plan
91+
92+
- New `test_dvc_files` cases covering:
93+
- Dep inside `dvc_dir`: writes dvc-dir-relative (no leading `/`).
94+
- Dep outside `dvc_dir`: writes `/repo-root-relative`.
95+
- `dvc_dir == "."`: writes repo-root-relative (no leading `/`).
96+
- Roundtrip: resolve(relativize(deps)) == resolve(original_deps) for any
97+
supported write form.
98+
- Existing tests in `tests/test_run_dvc_files.py` for the read path should
99+
keep passing unchanged.
100+
101+
## Out of scope
102+
103+
- Changing the resolve (read) path. The leading-`/` form is already accepted
104+
and that doesn't need to change.
105+
- Converting `outs` paths — those live within `dvc_dir` by convention.
106+
- Source project: `hccs/crashes` will adopt this shorthand via its own spec
107+
(see `specs/dvx-dep-path-shorthand.md` there) once the DVX change lands.

src/dvx/run/dvc_files.py

Lines changed: 11 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -127,10 +127,15 @@ def _resolve_dep_paths(deps: dict[str, str], dvc_dir: Path) -> dict[str, str]:
127127

128128

129129
def _relativize_dep_paths(deps: dict[str, str], dvc_dir: Path) -> dict[str, str]:
130-
"""Convert repo-root-relative dep paths to .dvc-dir-relative.
130+
"""Convert repo-root-relative dep paths to .dvc-file-relative form.
131131
132-
Inverse of ``_resolve_dep_paths``. If dvc_dir is "." (repo root),
133-
paths are returned unchanged.
132+
Two output forms:
133+
- Deps inside ``dvc_dir``: written .dvc-dir-relative (no leading ``/``).
134+
- Deps outside ``dvc_dir``: written ``/repo-root-relative`` shorthand,
135+
which avoids ``../../../`` clutter for deeply-nested .dvc files.
136+
137+
Inverse of ``_resolve_dep_paths`` (which accepts both forms).
138+
Returns paths unchanged if dvc_dir is "." or absolute.
134139
"""
135140
dvc_dir_str = str(dvc_dir)
136141
# Only relativize if dvc_dir is a relative subdirectory (not "." or absolute)
@@ -143,13 +148,9 @@ def _relativize_dep_paths(deps: dict[str, str], dvc_dir: Path) -> dict[str, str]
143148
# Strip dvc_dir prefix to make relative
144149
result[dep_path[len(dvc_dir_str) + 1:]] = dep_hash
145150
else:
146-
# Outside dvc_dir — use os.path.relpath
147-
try:
148-
rel = os.path.relpath(dep_path, dvc_dir_str)
149-
result[rel] = dep_hash
150-
except ValueError:
151-
# Different drives on Windows, keep absolute
152-
result[dep_path] = dep_hash
151+
# Outside dvc_dir — use /-prefixed repo-root-absolute shorthand
152+
# instead of ../../../ chains
153+
result["/" + dep_path] = dep_hash
153154
return result
154155

155156

tests/test_run_dvc_files.py

Lines changed: 22 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1065,18 +1065,38 @@ def test_resolve_dep_paths_repo_root():
10651065

10661066

10671067
def test_relativize_dep_paths():
1068-
"""Repo-root-relative paths are converted to .dvc-dir-relative."""
1068+
"""Deps inside dvc_dir → relative; outside → /-prefixed shorthand."""
10691069
from dvx.run.dvc_files import _relativize_dep_paths
10701070

10711071
deps = {"njsp/data/crashes.parquet": "abc123", "www/index.html": "def456"}
10721072
rel = _relativize_dep_paths(deps, Path("njsp/data"))
10731073

10741074
assert rel == {
10751075
"crashes.parquet": "abc123",
1076-
"../../www/index.html": "def456",
1076+
"/www/index.html": "def456",
10771077
}
10781078

10791079

1080+
def test_relativize_dep_paths_repo_root():
1081+
"""When dvc_dir is '.', paths are unchanged (no /-prefix added)."""
1082+
from dvx.run.dvc_files import _relativize_dep_paths
1083+
1084+
deps = {"data.txt": "abc", "subdir/file.txt": "def"}
1085+
rel = _relativize_dep_paths(deps, Path("."))
1086+
assert rel == deps
1087+
1088+
1089+
def test_relativize_dep_paths_deeply_nested():
1090+
"""Deep .dvc files use /-prefix instead of long ../../../ chains."""
1091+
from dvx.run.dvc_files import _relativize_dep_paths
1092+
1093+
deps = {"njsp/data/crashes.parquet": "abc"}
1094+
rel = _relativize_dep_paths(deps, Path("www/public/njsp"))
1095+
1096+
# Should be /-prefixed, not ../../../njsp/data/crashes.parquet
1097+
assert rel == {"/njsp/data/crashes.parquet": "abc"}
1098+
1099+
10801100
def test_relative_paths_roundtrip():
10811101
"""Write then read with subdirectory .dvc preserves dep paths."""
10821102
from dvx.run.dvc_files import _relativize_dep_paths, _resolve_dep_paths

0 commit comments

Comments
 (0)