Persists `user_agent` in `.dvc` dep so `dvx update` reuses it
for HEAD/GET requests. Needed for sites with bot protection
(e.g. Cloudflare).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
`dvx import-url --git` sends `User-Agent: dvx/0.1`, which sites with bot protection reject with a 403 (e.g. `njsp.njoag.gov`, which uses Cloudflare). A browser-like User-Agent works, as `curl -H "User-Agent: Mozilla/5.0 ..."` shows.

## Proposed behavior

### 1. CLI flag: `--user-agent` / `-A`

```bash
dvx import-url --git -A "Mozilla/5.0" \
  https://njsp.njoag.gov/.../2024-UCR.xlsx \
  -o crime/2024-UCR.xlsx
```
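A minimal sketch of the flag, assuming an `argparse`-style parser. dvx's actual CLI framework may differ, and `build_parser` is a hypothetical name. If the flag is omitted, `user_agent` stays `None`, so the existing default header can be kept:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser for `dvx import-url`, limited to the options in the example above.
    parser = argparse.ArgumentParser(prog="dvx import-url")
    parser.add_argument("url")
    parser.add_argument("-o", "--out")
    parser.add_argument("--git", action="store_true")
    # New option: the value is sent verbatim as the User-Agent header.
    parser.add_argument("-A", "--user-agent", dest="user_agent", default=None)
    return parser


args = build_parser().parse_args(
    ["--git", "-A", "Mozilla/5.0",
     "https://njsp.njoag.gov/.../2024-UCR.xlsx",
     "-o", "crime/2024-UCR.xlsx"]
)
print(args.user_agent)  # Mozilla/5.0
```

Using `-A` as the short form matches curl's own short flag for `--user-agent`.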
### 2. Stored in `.dvc` file

The User-Agent is needed for subsequent `dvx update` calls too, so persist it in the `.dvc` deps:

```yaml
deps:
- path: https://njsp.njoag.gov/.../2024-UCR.xlsx
  checksum: '"etag"'
  size: 204114
  mtime: '2026-02-24T00:00:00+00:00'
  user_agent: 'Mozilla/5.0 (compatible; dvx/0.1)'
outs:
- md5: e2154bc8...
  path: 2024-UCR.xlsx
  meta:
    git_tracked: true
```
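On the write side, this sketch shows one way the dep entry could be assembled. It assumes deps are plain dicts before YAML serialization, and `make_url_dep` is a hypothetical helper. Writing the key only when the flag was passed leaves `.dvc` files for ordinary imports unchanged:

```python
from typing import Any, Optional


def make_url_dep(path: str, checksum: str, size: int, mtime: str,
                 user_agent: Optional[str] = None) -> dict[str, Any]:
    """Build a hypothetical `deps` entry; `user_agent` is added only when given."""
    dep: dict[str, Any] = {"path": path, "checksum": checksum, "size": size, "mtime": mtime}
    if user_agent is not None:
        # Omitting the key keeps .dvc files for plain imports identical to before.
        dep["user_agent"] = user_agent
    return dep


dep = make_url_dep("https://njsp.njoag.gov/.../2024-UCR.xlsx", '"etag"', 204114,
                   "2026-02-24T00:00:00+00:00", "Mozilla/5.0 (compatible; dvx/0.1)")
```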
`dvx update` reads `user_agent` from the dep and uses it for HEAD/GET requests.
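Here is a sketch of the read side that uses only the standard library's `urllib.request`. dvx's real HTTP client may differ, `build_request` and `DEFAULT_USER_AGENT` are hypothetical names, and `https://example.com/...` is a placeholder URL. When a dep has no stored value, the sketch falls back to the current `dvx/0.1` header, so existing `.dvc` files keep working:

```python
import urllib.request
from typing import Any

DEFAULT_USER_AGENT = "dvx/0.1"  # current default header, per the issue text


def build_request(dep: dict[str, Any], method: str = "HEAD") -> urllib.request.Request:
    """Build a HEAD/GET request for a URL dep, reusing its stored User-Agent if present."""
    user_agent = dep.get("user_agent", DEFAULT_USER_AGENT)
    return urllib.request.Request(dep["path"], method=method,
                                  headers={"User-Agent": user_agent})


dep = {"path": "https://example.com/data.xlsx",
       "user_agent": "Mozilla/5.0 (compatible; dvx/0.1)"}
req = build_request(dep)
# urllib stores header names via str.capitalize(), hence the "User-agent" lookup key.
print(req.get_method(), req.get_header("User-agent"))  # HEAD Mozilla/5.0 (compatible; dvx/0.1)
```

Nothing is sent here. The request object would be passed to `urllib.request.urlopen` for the actual HEAD or GET.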