IMDB scraper broken: AWS WAF blocks all HTML requests, migration to API endpoints needed (#1966)

## Problem

IMDB has enabled AWS WAF JavaScript challenges on all `www.imdb.com` HTML endpoints. Non-browser HTTP clients (including MediaElch) receive HTTP 202 with an empty response body. The response header `x-amzn-waf-action: challenge` confirms the block.
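A client can recognize this block from the response alone. The following is a minimal sketch in Python (MediaElch itself is C++, so this only illustrates the check). The function name `is_waf_challenge` is hypothetical, and the only signals it uses are the ones described above: the `x-amzn-waf-action: challenge` header and HTTP 202 with an empty body.

```python
from typing import Mapping


def is_waf_challenge(status: int, headers: Mapping[str, str], body: bytes) -> bool:
    """Return True when a response looks like an AWS WAF challenge, not a real page."""
    # HTTP header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value.strip().lower() for name, value in headers.items()}
    if lowered.get("x-amzn-waf-action") == "challenge":
        return True
    # Fallback for the symptom reported in this issue: 202 with an empty body.
    return status == 202 and not body.strip()


if __name__ == "__main__":
    print(is_waf_challenge(202, {"x-amzn-waf-action": "challenge"}, b""))
    print(is_waf_challenge(200, {"Content-Type": "text/html"}, b"<html></html>"))
```

Detecting the challenge explicitly would let the scraper report "blocked by IMDB" instead of silently returning no results.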

This affects **all** HTML-based IMDB functionality:
- Search (`/find?q=...`) — no results
- Title page (`/title/ttXXXX/`) — no details
- Reference page (`/title/ttXXXX/reference/`) — no additional data

The issue has been reported before as intermittent (#1952), but as of March 20, 2026 it appears to be **permanent**. The current IMDB scraper is completely non-functional.

## Working alternatives

Two IMDB API endpoints remain accessible and return JSON directly (no HTML parsing needed):

### 1. Suggest API (for search)
- **URL:** `https://v3.sg.media-imdb.com/suggestion/x/{query}.json`
- **Method:** GET, no authentication
- **Returns:** IMDB ID, title, year, type (movie/tv/short), poster URL, top cast
- **Example:** Searching "Inception" returns `tt1375666`, year 2010, type "movie", poster, cast
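As a rough illustration of what a Suggest API parser could look like, here is a sketch in Python. The URL pattern comes from the bullet above. The JSON field names (`d`, `id`, `l`, `y`, `qid`, `s`, `i.imageUrl`) are assumptions about the response shape that this issue does not confirm, so check them against a real response. The helper names `suggest_url` and `parse_suggestions` are hypothetical.

```python
import json
from urllib.parse import quote

SUGGEST_BASE = "https://v3.sg.media-imdb.com/suggestion/x/"


def suggest_url(query: str) -> str:
    # Percent-encode the query so spaces and slashes cannot alter the URL path.
    return f"{SUGGEST_BASE}{quote(query.strip(), safe='')}.json"


def parse_suggestions(payload: str) -> list[dict]:
    """Extract title hits from a Suggest API response body (assumed field names)."""
    results = []
    for item in json.loads(payload).get("d", []):
        imdb_id = item.get("id", "")
        if not imdb_id.startswith("tt"):
            continue  # keep titles only; skip entries whose ID is not tt...
        results.append({
            "id": imdb_id,
            "title": item.get("l"),        # assumed: display title
            "year": item.get("y"),         # assumed: release year
            "type": item.get("qid"),       # assumed: type such as "movie"
            "poster": (item.get("i") or {}).get("imageUrl"),
            "cast": item.get("s"),         # assumed: comma-separated top cast
        })
    return results


if __name__ == "__main__":
    # Hand-written sample in the assumed shape, not a captured response.
    sample = '{"d": [{"id": "tt1375666", "l": "Inception", "y": 2010, "qid": "movie"}]}'
    print(suggest_url("Inception"))
    print(parse_suggestions(sample))
```

Missing fields come back as `None` rather than raising an error, which matters because search results are often incomplete.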

### 2. GraphQL API (for details)
- **URL:** `https://graphql.imdb.com/`
- **Method:** POST with JSON body, no authentication
- **Returns:** Virtually all title metadata — ratings, plot, genres, runtime, cast, crew, Metacritic score, etc.
- **Example query:**
```graphql
{ title(id: "tt1375666") {
    titleText { text }
    releaseYear { year }
    ratingsSummary { aggregateRating voteCount }
    plot { plotText { plainText } }
    genres { genres { text } }
    metacritic { metascore { score } }
    runtime { seconds }
} }
```
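The query above can be sent as a JSON POST body. The sketch below, in Python, builds such a request without sending it. It assumes the usual GraphQL-over-HTTP convention: a `{"query": ...}` request body and a `{"data": {...}}` response envelope. The helper names `build_title_request` and `dig` are hypothetical.

```python
import json
import re
from urllib.request import Request

GRAPHQL_URL = "https://graphql.imdb.com/"

# Same fields as the example query in this issue; %s is replaced by the title ID.
QUERY_TEMPLATE = """{ title(id: "%s") {
    titleText { text }
    releaseYear { year }
    ratingsSummary { aggregateRating voteCount }
    plot { plotText { plainText } }
    genres { genres { text } }
    metacritic { metascore { score } }
    runtime { seconds }
} }"""


def build_title_request(imdb_id: str) -> Request:
    # Validate the ID first, because it is interpolated into the query text.
    if not re.fullmatch(r"tt\d+", imdb_id):
        raise ValueError(f"not an IMDB title ID: {imdb_id!r}")
    body = json.dumps({"query": QUERY_TEMPLATE % imdb_id}).encode("utf-8")
    return Request(GRAPHQL_URL, data=body,
                   headers={"Content-Type": "application/json"}, method="POST")


def dig(data, *path):
    """Walk nested dicts, returning None as soon as a key is missing or null."""
    for key in path:
        if not isinstance(data, dict):
            return None
        data = data.get(key)
    return data


if __name__ == "__main__":
    request = build_title_request("tt1375666")
    print(request.get_method(), request.full_url)
    # Hand-written sample in the assumed envelope, not a captured response.
    sample = {"data": {"title": {"titleText": {"text": "Inception"}, "metacritic": None}}}
    print(dig(sample, "data", "title", "titleText", "text"))
    print(dig(sample, "data", "title", "metacritic", "metascore", "score"))
```

A null-tolerant accessor like `dig` is useful here because optional fields such as the Metacritic score may be absent for many titles.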

### Note on terms of use
The GraphQL API response includes a disclaimer: *"Public, commercial, and/or non-private use of the IMDb data provided by this API is not allowed."* MediaElch is LGPL-licensed and non-commercial, but maintainers should weigh this restriction before adopting the API.

## Affected code

- `src/scrapers/imdb/ImdbApi.cpp` — URL construction, HTTP requests
- `src/scrapers/imdb/ImdbSearchPage.cpp` — search result parsing (HTML-based)
- `src/scrapers/imdb/ImdbJsonParser.cpp` — title detail parsing from `__NEXT_DATA__`
- `src/scrapers/imdb/ImdbReferencePage.cpp` — reference page parsing
- All movie and TV scraper jobs that depend on these classes

## Proposed approach

Replace the HTML-based scraper with API-based requests:
1. **Search:** Replace `ImdbSearchPage` with Suggest API parser
2. **Details:** Replace `ImdbJsonParser` + `ImdbReferencePage` with GraphQL API queries
3. **Preserve the existing interface** — `ImdbApi` remains the entry point, only the internal implementation changes
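The third point can be pictured as a facade whose backends are swappable. The sketch below is in Python purely for illustration. `ImdbApiFacade`, `search`, and `load_title` are hypothetical names and do not mirror the actual C++ signatures of `ImdbApi`.

```python
from typing import Callable, Optional


class ImdbApiFacade:
    """Hypothetical stand-in for the entry point that callers depend on."""

    def __init__(self,
                 search_backend: Callable[[str], list],
                 details_backend: Callable[[str], Optional[dict]]):
        # Backends are injected, so replacing HTML parsing with API calls
        # does not change what callers see.
        self._search = search_backend
        self._details = details_backend

    def search(self, query: str) -> list:
        return self._search(query)

    def load_title(self, imdb_id: str) -> Optional[dict]:
        return self._details(imdb_id)


if __name__ == "__main__":
    # Stub backends standing in for Suggest API and GraphQL implementations.
    api = ImdbApiFacade(
        search_backend=lambda query: [{"id": "tt1375666", "title": query}],
        details_backend=lambda imdb_id: {"id": imdb_id, "year": 2010},
    )
    print(api.search("Inception"))
    print(api.load_title("tt1375666"))
```

With this shape, the movie and TV scraper jobs would keep calling the same entry point while only the injected backends change.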

This would also resolve or improve several existing issues:
- #1881 (tags and fields broken — likely HTML parsing issue)
- #1774 (episodes 51+ — could use GraphQL for bulk episode data)
- #605 (full actor list — GraphQL can return complete cast)
- #1497 (wrong TV ratings — GraphQL returns structured rating data)

Closing PRs #1955 and #1956 as they are based on the now-blocked HTML approach.

Analyzed with AI assistance (Claude Code / Opus 4.6).
