
fix(connect): locale-agnostic connection detection (fixes German / supersedes #319) #352

Open
Dominien wants to merge 798 commits into stickerdaniel:main from Dominien:fix/locale-agnostic-connection-detection

Conversation

@Dominien

Summary

`connect_with_person` currently returns `connect_unavailable` on every profile for users whose LinkedIn UI is not in English. On my German account every invite fails with `LinkedIn did not expose a usable Connect action for this profile.` even though the profile clearly renders a `Vernetzen` button.

Root cause

`core/browser.py` launches Playwright with `locale="en-US"`. That option only drives `navigator.language` + the `Accept-Language` header — LinkedIn ignores both for logged-in users and serves the UI in the language chosen under Settings → Account preferences → Display language on the account itself. So a US-locale browser can still render:

| Account language | Connect | Follow | Pending | More |
|---|---|---|---|---|
| English (en) | Connect | Follow | Pending | More |
| German (de) | Vernetzen | Folgen | Ausstehend | Mehr |
| French (fr) | Se connecter | Suivre | En attente | Plus |

`scraping/connection.py` only matched the English strings, so every non-English profile fell through to `unavailable`.

Changes

Kept `locale="en-US"` as a best-effort hint and added small per-locale tables in `scraping/connection.py` so detection works regardless of what LinkedIn actually renders:

  • `STATE_BUTTON_MAP_BY_LOCALE` — click labels keyed by locale + state. Ships `en`, `de`, `fr`. New locales = one line.
  • `_DETECTION_LABELS_BY_LOCALE` — labels used while parsing the action area.
  • `_SECTION_HEADINGS_BY_LOCALE` — per-locale section headings that end the action area (`About` / `Info` / `Infos` / …).
  • `_FIRST_DEGREE_MARKERS` — `· 1st` / `· 1.` / `· 1er`.
  • `_MORE_ARIA_LABELS` — `More` / `Mehr` / `Plus`.
  • `all_button_texts(state)` helper so locale-agnostic call sites (like the More menu) can build their locator text lists.
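
As a rough illustration of the table shape listed above, the sketch below shows one way a locale-keyed label table and an `all_button_texts` helper could fit together. The labels come from the language table in this description; the dictionary layout, the set of states included, and the helper body are assumptions and not the actual code in `scraping/connection.py`.

```python
# Illustrative sketch only: layout and helper body are assumed, not copied from the PR.
STATE_BUTTON_MAP_BY_LOCALE: dict[str, dict[str, str]] = {
    "en": {"connectable": "Connect", "follow_only": "Follow", "pending": "Pending"},
    "de": {"connectable": "Vernetzen", "follow_only": "Folgen", "pending": "Ausstehend"},
    "fr": {"connectable": "Se connecter", "follow_only": "Suivre", "pending": "En attente"},
}


def all_button_texts(state: str) -> list[str]:
    """Return the label for ``state`` in every locale that defines one."""
    return [
        labels[state]
        for labels in STATE_BUTTON_MAP_BY_LOCALE.values()
        if state in labels
    ]


# A locale-agnostic call site can match any of these labels.
print(all_button_texts("connectable"))
```

With this shape, supporting another language would mean adding one more entry to the dictionary, which is the "one line" property the bullets describe.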

`detect_connection_state` now returns `tuple[ConnectionState, str]` where the second element is the detected locale, so the caller picks the right button without needing its own language guess. `connect_with_person` looks up `STATE_BUTTON_MAP_BY_LOCALE[locale][state]`; `_open_more_menu` builds its aria-label selector and menu-item regex dynamically from the locale table.
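
A simplified sketch of the tuple return and the caller-side lookup might look like the following. The function name `detect_state_sketch`, the whole-line matching rule, and the reduced set of states are hypothetical simplifications; the real detector also handles first-degree markers, incoming requests, and action-area extraction.

```python
# Hypothetical, simplified detector: label table and matching rule are assumptions.
_LABELS: dict[str, dict[str, str]] = {
    "en": {"pending": "Pending", "connectable": "Connect", "follow_only": "Follow"},
    "de": {"pending": "Ausstehend", "connectable": "Vernetzen", "follow_only": "Folgen"},
    "fr": {"pending": "En attente", "connectable": "Se connecter", "follow_only": "Suivre"},
}


def detect_state_sketch(action_area: str) -> tuple[str, str]:
    """Return (state, locale) for the first locale whose labels match a line."""
    lines = {line.strip() for line in action_area.splitlines()}
    for locale, labels in _LABELS.items():
        # Check states in priority order: pending, then connectable, then follow.
        for state in ("pending", "connectable", "follow_only"):
            if labels[state] in lines:
                return state, locale
    # No locale matched: fall back to "unavailable" with a default locale.
    return "unavailable", "en"


state, locale = detect_state_sketch("Jane Doe\n\nVernetzen\nMehr")
button_text = _LABELS[locale][state]  # the label a caller would click
print(state, locale, button_text)
```

Returning the locale alongside the state means the caller never has to guess the language again before choosing a button label.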

Why one PR for EN + DE + FR

#319 already proposed French support using an `is_french: bool` flag. That pattern doesn't generalize — adding German would require an `is_german: bool`, then Italian, Spanish, Dutch, etc. This PR introduces a single locale-keyed table so every new language is a one-line change in one file, and covers French as well to avoid leaving two overlapping PRs open. Credit to @vrpctaywal for pinning down the French case and the `_contains` helper shape.

Tests

  • Updated the 9 existing `TestDetectConnectionState` cases for the new tuple return value.
  • Added 6 German cases (connectable, follow_only, pending, incoming_request, already_connected, action-area cut-at-`Info`).
  • Added 5 French cases (parity with #319, "feat: add French locale support for connection state detection").
  • All 383 tests pass (`uv run pytest`).
  • `uv run ruff check`, `uv run ruff format --check`, `uv run pre-commit run` — all clean.

Test plan

  • `TestDetectConnectionState` — 20 unit tests cover EN + DE + FR
  • Full test suite — 383 passed
  • Manual: `connect_with_person` on a German profile with a `Vernetzen` button (verified locally — previously returned `connect_unavailable`, now sends the invite)
  • Manual: `connect_with_person` on a French profile with `Se connecter`
  • Manual: `connect_with_person` on a profile with only `Folgen` → goes through More menu

🤖 Generated with Claude Code

stickerdaniel and others added 30 commits March 5, 2026 15:22
- Use fixed 25-per-page offset instead of dynamic ID count
- Read "Page X of Y" from pagination state to cap pagination
- Add soft rate-limit retry via _extract_search_page helper
- Use keyword arguments in tool wrapper for clarity
- Stop on page 0 when no job IDs found (avoid useless page 1)
- Fix test_stops_at_total_pages to use distinct IDs per page so
  only the total_pages guard stops pagination
Add date_posted, job_type, experience_level, work_type, easy_apply,
and sort_by filters to search_jobs with human-readable normalization.
Fix Greptile review: always log no-results break, move _PAGE_SIZE to
module level, add Field(ge=1, le=10) on max_pages, skip ID extraction
on empty text.

Resolves: stickerdaniel#174
Use _normalize_csv for job_type to preserve raw commas in multi-value
filters and add human-readable names (full_time, contract, etc.).
Break early when _extract_search_page returns _RATE_LIMITED_MSG to
avoid extracting IDs from unreliable DOM state. Remove redundant
truthiness check now guarded by the early break.
Move _normalize_csv out of _build_job_search_url to module level for
reusability. Wait for job card links before sidebar scrolling to handle
async rendering. Document DOM-independence principle in CONTRIBUTING.md
and AGENTS.md.
The pagination state element has display:none so innerText cannot
capture it. Document why the class-based selector is necessary and
that it degrades gracefully to max_pages if LinkedIn renames it.
Use direct .get() lookup for date_posted and sort_by (single-select
filters). Remove unreachable _RATE_LIMITED_MSG check after early break.
Query _get_total_search_pages only once per search to avoid repeated
evaluate() calls when the element is absent.
Apply quote_plus to date_posted and sort_by passthrough values to
prevent malformed URLs from unexpected input. Use consistent 1-indexed
page numbers in all debug log messages.
Warn when search page rate-limit retry also fails. Add console.debug
in scroll_job_sidebar when no scrollable container is found.
Skip sidebar scrolling when <main> is absent to avoid 5s timeout on
edge-case pages. Fix off-by-one in total_pages log message. Add
page count assertion to test_deduplication_across_pages.
Append text to page_texts before breaking on no new IDs so the LLM
can read LinkedIn's feedback (e.g. "No jobs found") instead of
receiving empty sections.
Add await_count == 2 assertion to test_page_texts_joined_with_separator
matching the pattern already used in test_deduplication_across_pages.
Switch from innerText to textContent in _get_total_search_pages
so the "Page X of Y" text is readable regardless of CSS visibility.
- Replace console.debug in scroll_job_sidebar JS with sentinel return
  so the message is logged via Python logger instead
- Wrap _get_total_search_pages in its own try/except to prevent an
  exception from discarding already-fetched page text and job IDs
- Inline offset calculation into URL ternary for clarity
- Add debug log when sidebar container is found but no new content
  loads (scrolled == 0)
- Add debug log when <main> is absent and body fallback is used on
  search pages
- Use -2 sentinel for "job card link vanished" vs -1 for "no
  scrollable container" vs 0 for "no new content loaded"
- Return {source, text} from search page JS evaluate so the body
  fallback log fires based on actual DOM state, not the pre-evaluate
  wait_for_selector flag
- Add URL sanity check before _extract_job_ids to prevent extracting
  IDs from a stale page after a swallowed navigation failure
- Add test_no_ids_on_first_page_captures_text to pin the behavior
  where non-empty text with zero job IDs is returned in sections
- Change total_pages mock to None in test_pagination_uses_fixed_page_size
  since max_pages=2 caps the loop before total_pages is relevant
…uard

- Move _NOISE_MARKERS comment to directly precede the list it describes
- Log when <main> appears after wait_for_selector timeout but before
  evaluate (sidebar scroll skipped on late-appearing element)
- Add test_url_redirect_skips_id_extraction to exercise the URL
  sanity guard that prevents extracting IDs from a stale/redirect page
Capture _get_total_search_pages mock in test_stops_at_total_pages
and verify await_count == 1 to pin the query-once optimization.
…ols_add_job_ids_sidebar_scrolling_and_pagination_to_search_jobs

feat(tools): add job IDs, sidebar scrolling, and pagination to search_jobs
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…l commands (stickerdaniel#202)

<!-- greptile_comment -->

<h3>Greptile Summary</h3>

This PR adds a "Verifying Bug Reports" section to `AGENTS.md` with step-by-step `curl` commands for testing the MCP server end-to-end via HTTP transport. The `SESSION_ID` extraction via `grep`/`awk`/`tr -d '\r'` is correct and properly handles Windows-style line endings in curl header output.

However, the **server startup command blocks the terminal** — without `&` or an explicit note to use a separate shell, developers or agents following the script linearly will never reach the `curl` commands.

<h3>Confidence Score: 4/5</h3>

- Safe to merge once the server startup command is backgrounded or explicit terminal-switching instructions are added.
- The change is documentation-only and does not affect runtime code. The session-ID extraction logic is correct. The primary issue is a usability blocker: the server startup command blocks the terminal, preventing the documented workflow from executing end-to-end in a single shell. This is straightforward to fix with `&` or an explicit note.
- AGENTS.md — specifically the server startup command (line 138) needs to either background the process or include explicit instructions to use a separate terminal.

<sub>Last reviewed commit: e8e8eb9</sub>

> Greptile also left **1 inline comment** on this PR.

<!-- /greptile_comment -->
Activity feed pages lazy-load post content after tab headers render.
Add wait_for_function check and slower scroll params for /recent-activity/
URLs so posts section returns actual content instead of just tab headers.

Resolves: stickerdaniel#201
…ity-feed-posts-empty

fix(scraping): Wait for activity feed content before extracting
github-actions Bot and others added 26 commits April 6, 2026 08:53
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [docker/login-action](https://redirect.github.com/docker/login-action) ([changelog](https://redirect.github.com/docker/login-action/compare/b45d80f862d83dbcd57f89517bcf500b2ab88fb2..4907a6ddec9925e35a0a9e82d7399ccc52663121)) | action | digest | `b45d80f` → `4907a6d` |
| ghcr.io/astral-sh/uv | final | digest | `c4f5de3` → `90bbb3c` |

---

### Configuration

📅 **Schedule**: Branch creation - "before 6am on Monday" (UTC), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

👻 **Immortal**: This PR will be recreated if closed unmerged. Get [config help](https://redirect.github.com/renovatebot/renovate/discussions) if that's undesired.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/github/stickerdaniel/linkedin-mcp-server).
…olve_ty_type-checker_errors_blocking_ci

fix: Resolve ty type-checker errors blocking CI
…prove_release_notes_with_direct_download_link_and_changelog_config

docs: Improve release notes with direct download link and changelog config
…tools

- Use --no-editable --no-dev --compile-bytecode in builder stage
- Remove git (no git-based deps exist)
- Remove COPY . /app/ and /app/data from runtime stage
- Bump requires-python to >=3.12,<3.15 (all 371 tests pass on 3.14.3)
- Add Python 3.14 classifier
- Restore README.md in Docker context for clean wheel builds
- Add Renovate rule to block automatic Python Docker tag bumps
…r-security

chore: optimize Dockerfile security and multi-stage build
…ump_version_to_4.8.3

chore: Bump version to 4.8.3
- Remove redundant text.length > 200 guard from details-page wait condition; startsWith checks are sufficient and the length threshold would cause 10s timeouts on legitimately short sections
…aping_wait_for_detail_panel_before_extracting_experience_sections

fix(scraping): wait for detail panel before extracting experience sections
- Fix off-by-one in MID line causing box misalignment with double-width emoji
- Update comment to say Unicode box-drawing instead of ASCII
…dd_dynamic_ascii_download_button_to_release_notes

style: Add dynamic ASCII download button to release notes
…me-status-sync

docs(readme): sync tool status table
- keep only tool-specific issue links in Features & Tool Status
- add send_message issue stickerdaniel#344 and mark unaffected tools as working

Closes stickerdaniel#346
…me-tool-status-sync

docs(readme): sync tool status table
…unts

`connect_with_person` currently returns `connect_unavailable` on every
profile for users whose LinkedIn UI is not in English (e.g. German,
French). The root cause sits upstream of the parser: Playwright's
`locale="en-US"` context option (core/browser.py) only drives
`navigator.language` and the Accept-Language header; LinkedIn ignores
both for logged-in sessions and serves the UI in the language chosen
under Settings → Display language on the account. A US-locale browser
thus still renders `Vernetzen` / `Folgen` for a German account,
`Se connecter` / `Suivre` for French, and the current detector — which
only matches the English `Connect` / `Follow` / `Pending` / `Accept`
strings — falls through to `unavailable`.

This change keeps `locale="en-US"` as a hint and adds explicit per-
locale tables so detection works regardless of what LinkedIn actually
renders:

  - `STATE_BUTTON_MAP_BY_LOCALE` — button text to click, keyed by
    locale + state. Ships `en`, `de`, `fr`. New locales = one line.
  - `_DETECTION_LABELS_BY_LOCALE` — labels used while parsing the
    action area.
  - `_SECTION_HEADINGS_BY_LOCALE` — per-locale section headings that
    mark the end of the action area (About / Info / Infos / …).
  - `_FIRST_DEGREE_MARKERS` — `· 1st` / `· 1.` / `· 1er`.
  - `_MORE_ARIA_LABELS` — `More` / `Mehr` / `Plus` for the three-dot
    menu button.

`detect_connection_state` now returns `tuple[ConnectionState, str]`
where the second element is the detected locale, so callers pick the
right button label without needing their own language guess.
`connect_with_person` threads the locale through to
`STATE_BUTTON_MAP_BY_LOCALE[locale][state]`, and `_open_more_menu`
builds its aria-label selector and menu-item regex from
`_MORE_ARIA_LABELS` + `all_button_texts("connectable")` so it works on
any supported locale.

Tests: updated the 9 existing `TestDetectConnectionState` cases for
the new tuple return, added 6 German cases (including an action-area
cut-at-`Info` regression) and 5 French cases (parity with stickerdaniel#319). All
383 tests pass, ruff + ruff-format + ty + pre-commit clean.

This supersedes stickerdaniel#319 (French-only) with a generalized fix; credit to
@vrpctaywal for identifying the French case and the `_contains`
helper shape.
@greptile-apps
Contributor

greptile-apps Bot commented Apr 13, 2026

Greptile Summary

Replaces the hard-coded English-only button strings in connection.py with locale-keyed tables (EN/DE/FR), changes detect_connection_state to return (state, locale), and threads that locale through connect_with_person so the right on-screen label is clicked regardless of the account's display language. The design is clean, well-tested, and easy to extend for additional locales.

Confidence Score: 5/5

Safe to merge — all findings are P2 suggestions; the core locale-detection and button-click logic is correct.

All three inline comments are P2 (style/theoretical edge cases). The detection loop, locale-keyed button map, and extractor integration are logically sound. Test coverage is comprehensive across EN/DE/FR with only one missing French edge-case test. No P0/P1 issues found.

No files require special attention; the two minor concerns in connection.py (· 1. marker anchoring and "Plus" aria-label breadth) are low-risk in practice.

Important Files Changed

| Filename | Overview |
|---|---|
| linkedin_mcp_server/scraping/connection.py | Core locale-detection logic: well-structured locale tables, correct detection loop order; minor concerns around the unpinned · 1. German marker and the broad "Plus" aria-label fragment. |
| linkedin_mcp_server/scraping/extractor.py | Correctly unpacks the new (state, locale) tuple and uses STATE_BUTTON_MAP_BY_LOCALE[locale][state] for button clicks; _open_more_menu builds a multi-locale selector but the "Plus" substring could match unintended French buttons. |
| tests/test_scraping.py | Good coverage of EN/DE/FR states; only gap is a missing test_incoming_request_fr case to match German parity. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[profile_text] --> B{· 1st / · 1. / · 1er\nin first 300 chars?}
    B -- yes --> C[return already_connected, en]
    B -- no --> D[_extract_action_area\ncut at section heading]
    D --> E{Try EN labels}
    E -- Pending found --> F[return pending, en]
    E -- Accept+Ignore found --> G[return incoming_request, en]
    E -- Connect found --> H[return connectable, en]
    E -- Follow found --> I[return follow_only, en]
    E -- no match --> J{Try DE labels}
    J -- Ausstehend found --> K[return pending, de]
    J -- Annehmen+Ignorieren --> L[return incoming_request, de]
    J -- Vernetzen found --> M[return connectable, de]
    J -- Folgen found --> N[return follow_only, de]
    J -- no match --> O{Try FR labels}
    O -- En attente found --> P[return pending, fr]
    O -- Accepter+Ignorer --> Q[return incoming_request, fr]
    O -- Se connecter found --> R[return connectable, fr]
    O -- Suivre found --> S[return follow_only, fr]
    O -- no match --> T[return unavailable, en]
    H & M & R --> U[connect_with_person\nSTATE_BUTTON_MAP_BY_LOCALE\nlookup locale+state]
    G & L & Q --> U
    I & N & S --> V{_open_more_menu\nfinds Connect variant?}
    V -- yes --> U
    V -- no --> W[return follow_only result]
    U --> X[click_button_by_text\nbutton_text, scope]
Prompt To Fix All With AI
This is a comment left during a code review.
Path: tests/test_scraping.py
Line: 843-868

Comment:
**Missing `incoming_request` test for French locale**

The French test block covers `already_connected`, `connectable`, `follow_only`, `pending`, and the heading-cut case — but `incoming_request` is absent. German has `test_incoming_request_de` and English has `test_incoming_request`. Adding a French case would verify that `Accepter` + `Ignorer` together trigger `("incoming_request", "fr")`, the same parity the PR explicitly aims for with #319.

```python
def test_incoming_request_fr(self):
    text = "Jane Doe\n\n--\n\nParis\n\nAccepter\nIgnorer\nPlus"
    assert detect_connection_state(text) == ("incoming_request", "fr")
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: linkedin_mcp_server/scraping/connection.py
Line: 79-83

Comment:
**German degree marker `· 1.` is an unpinned substring**

`· 1.` (4 characters) would match any string containing that sequence in the first 300 chars — including `· 1.5K followers` or `· 1.000 Kontakte` if LinkedIn ever renders follower/connection counts with the `·` (U+00B7) separator near the top of the extracted text. The English marker `· 1st` and French `· 1er` are terminated by a letter that prevents this drift. Adding a trailing space or newline anchor reduces the risk:

```python
_FIRST_DEGREE_MARKERS: tuple[str, ...] = (
    "\u00b7 1st",   # en
    "\u00b7 1. ",   # de – "· 1. Grades"; trailing space guards against "· 1.5K"
    "\u00b7 1er",   # fr
)
```

(The trailing space works because the degree badge text is always followed by a space before "Grades" / "Kontakt".)

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: linkedin_mcp_server/scraping/connection.py
Line: 85-88

Comment:
**`"Plus"` in `_MORE_ARIA_LABELS` could match non-More buttons in French**

Playwright's `aria-label*='Plus'` is a case-sensitive *substring* match. On a French profile LinkedIn renders several buttons whose accessible name contains "plus": "Afficher plus" (show-more toggles), "Ajouter une section" expand controls, etc. If any of those appear in `main` before the three-dot menu button, `more_btn.first.click()` fires on the wrong element. The English `"More"` and German `"Mehr"` are less ambiguous, but "Plus" is common enough in French UI strings to cause spurious clicks.

One mitigation is to also require `aria-label` to *end with* the label (e.g. `aria-label$='Plus'`). Note that this matches a bare "Plus" but not "Plus d'options", which starts with the label and would need `aria-label^='Plus'` instead:

```python
more_selector = ", ".join(
    f"main button[aria-label$='{label}']" for label in _MORE_ARIA_LABELS
)
```

Alternatively, keep substring matching but place the French entry last so a false positive only fires if neither the English nor German More button was found first.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(connect): locale-agnostic detection ..."

Comment thread tests/test_scraping.py
Comment on lines +843 to +868
    # --- French locale (parity with existing PR #319) ---------------------

    def test_already_connected_fr(self):
        text = "Laurent Delade\n\n· 1er\n\nChannel Account Manager\n\nMessage\nPlus"
        assert detect_connection_state(text) == ("already_connected", "en")

    def test_connectable_fr(self):
        text = (
            "Laurent Delade\n\n· 2e\n\nChannel Account Manager\n\n"
            "Se connecter\nEnregistrer\nPlus"
        )
        assert detect_connection_state(text) == ("connectable", "fr")

    def test_follow_only_fr(self):
        text = "Dragan Radulović\n\n· 3e\n\nPresident\n\nSuivre\nEnregistrer\nPlus"
        assert detect_connection_state(text) == ("follow_only", "fr")

    def test_pending_fr(self):
        text = "Jane Doe\n\n· 2e\n\nEngineer\n\nEn attente\nPlus"
        assert detect_connection_state(text) == ("pending", "fr")

    def test_action_area_cuts_at_french_heading(self):
        text = "Name\n\nSe connecter\nPlus\nInfos\n\nSuivre\nSe connecter"
        area = _extract_action_area(text)
        assert "Infos" not in area
        assert "Se connecter" in area

P2 Missing incoming_request test for French locale

The French test block covers already_connected, connectable, follow_only, pending, and the heading-cut case — but incoming_request is absent. German has test_incoming_request_de and English has test_incoming_request. Adding a French case would verify that Accepter + Ignorer together trigger ("incoming_request", "fr"), the same parity the PR explicitly aims for with #319.

def test_incoming_request_fr(self):
    text = "Jane Doe\n\n--\n\nParis\n\nAccepter\nIgnorer\nPlus"
    assert detect_connection_state(text) == ("incoming_request", "fr")

Comment on lines +79 to +83
_FIRST_DEGREE_MARKERS: tuple[str, ...] = (
    "\u00b7 1st",  # en
    "\u00b7 1.",  # de (Kontakt 1. Grades)
    "\u00b7 1er",  # fr
)

P2 German degree marker · 1. is an unpinned substring

· 1. (4 characters) would match any string containing that sequence in the first 300 chars — including · 1.5K followers or · 1.000 Kontakte if LinkedIn ever renders follower/connection counts with the · (U+00B7) separator near the top of the extracted text. The English marker · 1st and French · 1er are terminated by a letter that prevents this drift. Adding a trailing space or newline anchor reduces the risk:

_FIRST_DEGREE_MARKERS: tuple[str, ...] = (
    "\u00b7 1st",   # en
    "\u00b7 1. ",   # de – "· 1. Grades"; trailing space guards against "· 1.5K"
    "\u00b7 1er",   # fr
)

(The trailing space works because the degree badge text is always followed by a space before "Grades" / "Kontakt".)
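
To make the concern concrete, the sketch below compares the bare substring marker with the suggested trailing-space variant and with a regex alternative that rejects a following digit. The sample strings are hypothetical (the follower-count case is the one named in this comment), and the regex is an alternative anchoring idea, not something proposed in the PR. One thing it shows is that a trailing-space marker would miss a badge that is followed directly by a newline, so whether it is safe depends on how the extracted text actually renders.

```python
import re

BARE_MARKER = "\u00b7 1."    # marker as in the reviewed snippet
SPACE_MARKER = "\u00b7 1. "  # variant with the suggested trailing space
# Hypothetical alternative: "· 1." that is not followed by another digit.
ANCHORED_MARKER = re.compile(re.escape("\u00b7 1.") + r"(?!\d)")

# Hypothetical profile text fragments.
badge_newline = "Max Mustermann\n\n\u00b7 1.\n\nIngenieur"
badge_inline = "Max Mustermann \u00b7 1. Grades"
follower_count = "Max Mustermann\n\n\u00b7 1.5K followers"

for text in (badge_newline, badge_inline, follower_count):
    print(
        BARE_MARKER in text,
        SPACE_MARKER in text,
        bool(ANCHORED_MARKER.search(text)),
    )
```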


Comment on lines +85 to +88
# aria-label fragments for the profile's "More" (three-dot) menu button.
# LinkedIn renders these per the account's display language; Playwright's
# ``aria-label*=`` selector treats each entry as a case-sensitive substring.
_MORE_ARIA_LABELS: tuple[str, ...] = ("More", "Mehr", "Plus")

P2 "Plus" in _MORE_ARIA_LABELS could match non-More buttons in French

Playwright's aria-label*='Plus' is a case-sensitive substring match. On a French profile LinkedIn renders several buttons whose accessible name contains "plus": "Afficher plus" (show-more toggles), "Ajouter une section" expand controls, etc. If any of those appear in main before the three-dot menu button, more_btn.first.click() fires on the wrong element. The English "More" and German "Mehr" are less ambiguous, but "Plus" is common enough in French UI strings to cause spurious clicks.

One mitigation is to also require aria-label to end with the label (e.g. aria-label$='Plus'). Note that this matches a bare "Plus" but not "Plus d'options", which starts with the label and would need aria-label^='Plus' instead:

more_selector = ", ".join(
    f"main button[aria-label$='{label}']" for label in _MORE_ARIA_LABELS
)

Alternatively, keep substring matching but place the French entry last so a false positive only fires if neither the English nor German More button was found first.
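
The difference between the attribute operators can be checked without a browser. The sketch below builds the selector string for a given operator and emulates the substring (`*=`), prefix (`^=`), and suffix (`$=`) semantics with plain string methods. The helper `build_selector`, the `MATCHERS` table, and the sample labels are hypothetical; the labels are the ones quoted in this comment, not verified LinkedIn strings.

```python
MORE_ARIA_LABELS = ("More", "Mehr", "Plus")


def build_selector(operator: str) -> str:
    """Join one attribute selector per label, e.g. with operator '*=' or '$='."""
    return ", ".join(
        f"main button[aria-label{operator}'{label}']" for label in MORE_ARIA_LABELS
    )


# Pure-Python stand-ins for the case-sensitive CSS attribute operators.
MATCHERS = {
    "*=": lambda value, fragment: fragment in value,  # substring
    "^=": str.startswith,                             # prefix
    "$=": str.endswith,                               # suffix
}

sample_labels = ["Plus", "Plus d'options", "Afficher plus"]
for operator, matches in MATCHERS.items():
    hits = [label for label in sample_labels if matches(label, "Plus")]
    print(operator, hits)

print(build_selector("$="))
```

Because the comparison is case-sensitive, a lowercase "plus" inside a label is not matched by any of the three operators, and the suffix form only accepts labels that end in the fragment.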

