Description
After importing a WordPress site via the `/_emdash/api/import/wordpress/*` endpoints, images embedded in post content still reference the original WordPress domain when their URLs include a WordPress-generated size suffix (e.g. `.../image-1024x695.png`).
The `rewrite-urls` endpoint normalizes query strings in `getBaseUrl()` but does not strip WordPress's `-NNNxNNN` size suffix, so variant URLs in content do not match the `urlMap` keys built from `<wp:attachment_url>` (which lists only originals).
Looking into this I noticed:
- WXR's `<wp:attachment_url>` contains only the original URL. `_wp_attachment_metadata` (PHP-serialized postmeta) does contain all the generated variant filenames, but that data is not read during media import.
- Post content references the variant URLs directly.
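For reference, here is a rough sketch of how the variant URLs could be derived once `_wp_attachment_metadata` is decoded. It assumes the PHP-serialized value has already been deserialized into an object with WordPress's usual `file` / `sizes[*].file` shape; the interface names and `variantUrls()` are hypothetical, not EmDash code. It relies on WordPress storing every generated size in the same directory as the original, so only the last path segment changes.

```typescript
// Hypothetical shape of a deserialized `_wp_attachment_metadata` value.
// Assumes some PHP-unserialize step has already run; EmDash does not do this today.
interface WpAttachmentSize {
  file: string; // bare filename, e.g. "foo-1024x695.png"
  width: number;
  height: number;
  "mime-type"?: string;
}

interface WpAttachmentMetadata {
  file: string; // path relative to the uploads dir, e.g. "2024/05/foo.png"
  width: number;
  height: number;
  sizes?: Record<string, WpAttachmentSize>;
}

/**
 * Derive the absolute URL of every generated size from the original
 * <wp:attachment_url>. Variants share the original's directory, so we keep
 * everything up to the last "/" and swap in each size's filename.
 */
function variantUrls(attachmentUrl: string, meta: WpAttachmentMetadata): string[] {
  const base = new URL(attachmentUrl);
  const dir = base.pathname.slice(0, base.pathname.lastIndexOf("/") + 1);
  return Object.values(meta.sizes ?? {}).map((size) => {
    const u = new URL(base.href);
    u.pathname = dir + size.file;
    u.search = ""; // drop any ?ver=... carried over from the original
    u.hash = "";
    return u.href;
  });
}
```

Each returned URL could then become its own `urlMap` key, either pointing at a separately imported variant (option 2 below) or at the imported original.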
I'm not sure whether the current behavior is intentional (e.g. a design choice to avoid serving larger originals in place of pre-resized variants) or simply not yet handled.
Possible fixes (if a fix is welcome):
- Minimal: strip `-NNNxNNN` before the extension inside `getBaseUrl()` and allow the same pattern in the regex built in `rewriteStringUrls()`. Variant URLs in content are then rewritten to the imported original. Simple, but serves a larger file where a pre-resized variant was used.
- Thorough: parse `_wp_attachment_metadata` during analyze/media, import each variant as a separate media item, and map variant URLs individually.
I have a working local patch for option 1 I can turn into a PR if that direction is acceptable.
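To make option 1 concrete, here is a minimal sketch of the lookup side written as standalone helpers rather than a diff against EmDash. `stripQuery`, `stripSizeSuffix`, `buildIndex` and `findTarget` are hypothetical names; this is not the local patch mentioned above, and it does not touch the regex in `rewriteStringUrls()`. It tries an exact (query-stripped) match first and only then falls back to the size-stripped original.

```typescript
// Hypothetical helpers sketching option 1; NOT EmDash's getBaseUrl()/findMatchingUrl().

// WordPress's "-<width>x<height>" suffix immediately before the file extension.
const WP_SIZE_SUFFIX = /-\d+x\d+(?=\.[A-Za-z0-9]+$)/;

// Drop query string and fragment (roughly what getBaseUrl() is described as doing).
function stripQuery(raw: string): string {
  const u = new URL(raw);
  u.search = "";
  u.hash = "";
  return u.href;
}

// Additionally drop one trailing size suffix: foo-1024x695.png -> foo.png
function stripSizeSuffix(raw: string): string {
  const u = new URL(stripQuery(raw));
  u.pathname = u.pathname.replace(WP_SIZE_SUFFIX, "");
  return u.href;
}

// Index urlMap by query-stripped original URL. Keys are deliberately NOT
// size-stripped, so an original whose name already ends in -NNNxNNN still
// matches itself exactly.
function buildIndex(urlMap: Record<string, string>): Map<string, string> {
  const index = new Map<string, string>();
  for (const [from, to] of Object.entries(urlMap)) {
    index.set(stripQuery(from), to);
  }
  return index;
}

// Exact match first, then fall back to the size-stripped original.
function findTarget(contentUrl: string, index: Map<string, string>): string | null {
  let exact: string;
  try {
    exact = stripQuery(contentUrl);
  } catch {
    return null; // relative or malformed URL: leave it untouched
  }
  return index.get(exact) ?? index.get(stripSizeSuffix(exact)) ?? null;
}
```

Because only the content URL is size-stripped, an original uploaded as e.g. `banner-1920x1080.jpg` still matches itself, and its variant `banner-1920x1080-1024x576.jpg` falls back to it. The regex in `rewriteStringUrls()` would still need to accept the suffix for these URLs to reach the lookup at all.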
Steps to reproduce
Note: I haven't verified this through the admin UI "WordPress Import" wizard directly. The reproduction below uses the same underlying API endpoints that the wizard calls in sequence, so the same code path is exercised.
- Start a fresh EmDash dev server (`emdash@0.4.0`).
- Run `dev-bypass?token=1` to get a PAT.
- `POST /_emdash/api/import/wordpress/analyze` with a WXR file where at least one post embeds a resized image (e.g. a WordPress export containing a post with `<img src=".../foo-1024x695.png">` while the corresponding attachment URL is `.../foo.png`).
- `POST /_emdash/api/import/wordpress/prepare` with the analyze output.
- `POST /_emdash/api/import/wordpress/execute` with the WXR and a basic config.
- `POST /_emdash/api/import/wordpress/media` with `attachments` from the analyze output; capture the returned `urlMap`.
- `POST /_emdash/api/import/wordpress/rewrite-urls` with that `urlMap`.
- Inspect a post that used a variant URL: its `content.asset.url` still points to `https://<wp-domain>/.../foo-1024x695.png`.
Expected: the URL is rewritten to the imported EmDash media URL.
Actual: the URL is left unchanged.
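To check the result without inspecting each post by hand, a small helper can list URLs that still point at the WordPress origin and carry a size suffix. This is a hypothetical sketch, not part of EmDash; it assumes you pass the post body as a string (raw HTML, or `JSON.stringify()` of structured content) plus the WordPress origin, such as `https://wp.example.com`.

```typescript
// Hypothetical verification helper: find URLs on the original WordPress origin
// that still end in a -NNNxNNN size suffix, i.e. ones rewrite-urls skipped.
function findUnrewrittenVariants(content: string, wpOrigin: string): string[] {
  // Escape regex metacharacters in the origin (e.g. the dots in the hostname).
  const escaped = wpOrigin.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Origin, then a path that cannot cross whitespace, quotes, or tag delimiters,
  // ending in "-<w>x<h>.<ext>".
  const pattern = new RegExp(`${escaped}/[^\\s"'<>)]*?-\\d+x\\d+\\.[A-Za-z0-9]+`, "g");
  // De-duplicate: the same variant often appears in several places.
  return Array.from(new Set(content.match(pattern) ?? []));
}
```

An empty result after `rewrite-urls` would match the expected behavior; on an affected site it returns the skipped variant URLs.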
Environment
- emdash: 0.4.0
- @emdash-cms/cloudflare: 0.4.0
- astro: 6.1.6
- Node.js: 22.22.2
- OS: Linux (Docker sandbox)
- Template: `starter-cloudflare`
Logs / error output
No errors are emitted. The endpoint returns `success: true` with `updated: 8` / `urlsRewritten: 8` on a sample site where ~60 posts reference size variants. The non-matching URLs are silently skipped because `findMatchingUrl()` returns `null`.