Conversation
dahlia
left a comment
There was a problem hiding this comment.
Great, thanks for your work! Since this is a new feature, could you change the target of this pull request to branch main instead of stable?
dc1f74a to
5a06003
Compare
|
and, after some effort, rebased onto main... |
dahlia
left a comment
There was a problem hiding this comment.
Great work, thanks for your contribution! I left several review comments.
Also, could you add a changelog item to the CHANGES.md document?
| ); | ||
| } | ||
|
|
||
| const key = medium.thumbnailUrl.replace(STORAGE_URL_BASE as string, ""); |
There was a problem hiding this comment.
I think the key derivation is off for S3-style storage URLs. If STORAGE_URL_BASE is something like https://bucket.example.com, replacing the prefix here leaves /media/.../thumbnail.webp, but the object was uploaded under media/.../thumbnail.webp without the leading slash. disk.delete() will miss the real object, so the job can look successful without actually reclaiming anything. Could we derive the key from the URL pathname and normalize the leading slash away before deleting?
There was a problem hiding this comment.
Hmm, not sure about this. I did try this on a filesystem-based install in my home network and also my productive instance which uses minio, and it worked on both, but my ASSETS_URL_BASE there is https://minio-hollo.x27.one/hollo-live/ - with trailing slash and a sub directory.
Wouldn't the best approach be to normalize the STORAGE_URL_BASE to include (or not include) a trailing slash (maybe in the storage part of the codebase?). Seems simpler than trying to reverse what media/${id}/thumbnail.webp does in media#uploadThumbnail with string manipulation.
There was a problem hiding this comment.
I think normalizing STORAGE_URL_BASE would help, but I'd still prefer to derive the delete key from the URL pathname here.
The worker needs the exact storage key that was uploaded, and simple prefix replacement feels a bit fragile once STORAGE_URL_BASE can vary by trailing slash or include a subpath. Parsing the URL and normalizing from there seems safer to me.
|
|
||
| const disk = drive.use(); | ||
| await disk.delete(key); | ||
| await db |
There was a problem hiding this comment.
After the file is removed we still keep serving medium.thumbnailUrl everywhere. serializeMedium() and the web UI both use it as-is, so any cleaned item will point clients at a dead local thumbnail and show a broken preview. I think we need to either clear or fall back the thumbnail URL here, or teach the renderers to ignore thumbnails once thumbnailCleaned is true.
There was a problem hiding this comment.
Yeah, this is something that needs figuring out before this is fully released - for my immediate needs breaking thumbnails felt file, but we will need actual handling for it.
In the mastodon API, preview URL is declared as nullable, so we can send that to the clients, if thumbnailCleaned is set (and probably also clear the meta:small properties) - they should know their preferred way of handling missing thumbnails: https://docs.joinmastodon.org/entities/MediaAttachment/#preview_url
For the post renderer that is built into hollo, I am not sure. The non-thumbnail url could be fallen back on, but that may be an external resource and could still disappear, or we could use a placeholder?
There was a problem hiding this comment.
That makes sense to me.
For the Mastodon API, returning preview_url: null when thumbnailCleaned is set sounds like the right direction, and clearing the related thumbnail metadata at the same time would keep the response consistent.
For Hollo's own renderer, I don't think we need a perfect UX solution in this PR, but we do need to stop emitting a dead local thumbnail URL. Even a simple first pass such as “don't render the thumbnail when it has been cleaned” would be better than serving broken images.
I decided to implement a way to clean up old thumbnails from storage.
Idea here:
This is being done in a new page in the admin area titled "Thumbnail Cleanup":

The media table entry stays intact, so that the information of media existing does not get lost. In addition, some clients seem to (be able to) ignore thumbnails.
I have already tested it with both a locally hosted testing environment and my productive instance, on which it let me reclaim over 3GB out of 10GB storage used according to minio.
Related issue: #409