
Add search_people_with_past_company tool for advanced people filtering #205

Open
guykwan wants to merge 580 commits into stickerdaniel:main from guykwan:main

Conversation


guykwan commented Mar 6, 2026

Summary

This PR introduces a new tool search_people_with_past_company that enables advanced people search with filtering by past companies and current job titles.

New Feature: search_people_with_past_company

Use Cases

  • Find founders who previously worked at major tech companies
  • Identify executives with experience at specific companies
  • Build talent pools based on company background

Parameters

  • keywords (required): Search keywords (e.g., "founder", "CEO")
  • location (optional): Location filter (e.g., "Beijing", "San Francisco")
  • past_companies (optional): Comma-separated company names (e.g., "Alibaba,ByteDance,Tencent")
  • current_title (optional): Current job title filter (e.g., "founder", "CEO")
  • max_results (optional): Maximum results (default: 10)

Example

```bash
mcporter call linkedin.search_people_with_past_company \
    keywords="founder" \
    location="Beijing" \
    past_companies="Alibaba,ByteDance" \
    current_title="founder"
```
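A call like this returns a dictionary of roughly the following shape (field names are taken from the sequence diagram below; the values here are invented for illustration):

```python
# Illustrative response shape only — values are made up for the example.
{
    "search_url": "https://www.linkedin.com/search/results/people/?...",
    "total_checked": 12,
    "filters": {
        "past_companies": ["Alibaba", "ByteDance"],
        "current_title": "founder",
    },
    "matching_profiles": [...],  # profiles matching every filter
    "partial_matches": [...],    # profiles matching only some filters
}
```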

Implementation

  • Two-step search: basic search + detailed profile filtering
  • Rate limiting protection (1.5s delay between profiles)
  • Progress reporting during search
  • Returns both full and partial matches
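In outline, the tool behaves like the sketch below. This is a simplified reconstruction from the bullets above and the inline review comments, not the PR's actual code; `extractor.search_people` and `extractor.scrape_person(..., requested=...)` follow the signatures quoted in those comments, while `extract_profile_urls` stands in for the PR's helper:

```python
import asyncio
from typing import Any


def extract_profile_urls(search_result: dict[str, Any]) -> list[str]:
    """Stand-in for the PR's URL-extraction helper (see review comments below)."""
    return []  # the real helper must read hrefs from the DOM, not innerText


async def search_people_with_past_company(
    extractor: Any,  # assumed LinkedInExtractor-like object
    keywords: str,
    location: str | None = None,
    past_companies: str | None = None,
    current_title: str | None = None,
    max_results: int = 10,
) -> dict[str, Any]:
    """Simplified sketch of the two-step search; not the PR's actual code."""
    wanted = [c.strip().lower() for c in (past_companies or "").split(",") if c.strip()]

    # Step 1: basic people search, then collect candidate profile URLs.
    search = await extractor.search_people(keywords, location)
    urls = extract_profile_urls(search)

    matching, partial = [], []
    # Step 2: fetch each candidate's experience section and apply the filters.
    for url in urls[: max_results * 3]:  # over-fetch to survive filtering
        username = url.rstrip("/").split("/in/")[-1]
        profile = await extractor.scrape_person(username, requested={"experience"})
        experience = profile["sections"].get("experience", "").lower()

        company_hits = [c for c in wanted if c in experience]
        title_ok = current_title is None or current_title.lower() in experience
        entry = {"username": username, "url": url, "matched_companies": company_hits}

        if len(company_hits) == len(wanted) and title_ok:
            matching.append(entry)  # full match: every requested filter satisfied
        elif company_hits:
            partial.append(entry)   # partial match: some past companies found

        if len(matching) >= max_results:
            break
        await asyncio.sleep(1.5)  # rate-limiting delay between profile fetches

    return {"matching_profiles": matching, "partial_matches": partial}
```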

Changes

  • Added search_people_with_past_company() tool
  • Added 5 helper functions for URL extraction and profile parsing
  • Added asyncio import

Testing

  • ✅ Code syntax validated
  • ✅ No breaking changes
  • ✅ Follows existing patterns

Related

Useful for talent acquisition, investment research, and competitive intelligence.

Greptile Summary

This PR introduces a search_people_with_past_company tool that performs a two-step search: first fetching LinkedIn people search results, then iterating each profile to filter by past company and current title. Unfortunately, the implementation has several blocking issues that prevent it from functioning at all.

Key issues found:

  • Wrong file location: The new file is created at tools/person.py (repository root) instead of linkedin_mcp_server/tools/person.py (the actual module). The server only imports from linkedin_mcp_server.tools.person, so the new tool is never registered.
  • Wrong keyword argument: extractor.scrape_person(username, requested_sections={"experience"}) uses a non-existent parameter name — the actual parameter is requested. This raises a TypeError on every profile fetch.
  • URL extraction is fundamentally broken: _extract_profile_urls searches for full https://linkedin.com/in/... URLs inside innerText, but extract_page returns plain text (no HTML). Profile URLs are only in href attributes and are never printed as visible text, so this function always returns an empty list.
  • username field always None: scrape_person returns {"url": ..., "sections": ...} — no "username" key — so every matched profile's username field will be None.
  • Non-deterministic profile ordering: Use of set() in _extract_profile_urls loses LinkedIn's relevance-ranked ordering.
  • Style issues: a non-English inline comment, and import re placed inside function bodies rather than at module level.

Confidence Score: 1/5

  • Not safe to merge — the new tool is unreachable due to wrong file placement and contains multiple critical runtime errors.
  • Three independent blocking defects (wrong module path, wrong keyword argument, broken URL extraction from plain text) each individually prevent the feature from functioning. The tool is effectively dead code in its current form.
  • tools/person.py — all changes are in this single file, which needs to be moved to linkedin_mcp_server/tools/person.py and the logic bugs fixed before any of the new functionality can work.

Important Files Changed

| Filename | Overview |
|---|---|
| tools/person.py | New file added at the wrong path (root-level `tools/` instead of `linkedin_mcp_server/tools/`), making the new tool completely unreachable. Contains multiple critical bugs: wrong keyword argument name on `scrape_person`, URL extraction from `innerText` that will always return empty, and `username` always being `None` in the output. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant MCP as MCP Server
    participant Tool as search_people_with_past_company
    participant Extractor as LinkedInExtractor

    Client->>MCP: call search_people_with_past_company(keywords, location, past_companies, current_title)
    MCP->>Tool: invoke
    Tool->>Extractor: search_people(keywords, location)
    Extractor-->>Tool: {url, sections: {search_results: innerText}}
    Note over Tool: _extract_profile_urls(innerText)<br/>⚠️ Always returns [] — URLs not in plain text
    loop For each profile URL (up to max_results × 3)
        Tool->>Extractor: scrape_person(username, requested_sections={"experience"})<br/>⚠️ TypeError: wrong kwarg name (should be 'requested')
        Extractor-->>Tool: {url, sections: {experience: text}}
        Note over Tool: _parse_profile_for_filters()<br/>profile_result.get("username") → None always
        alt matches_all
            Tool->>Tool: append to matching_profiles
        else matches_partial
            Tool->>Tool: append to partial_matches
        end
        Note over Tool: asyncio.sleep(1.5)
    end
    Tool-->>Client: {search_url, total_checked, filters, matching_profiles, partial_matches}
```

Last reviewed commit: 0192ab1

Greptile also left 7 inline comments on this PR.


stickerdaniel and others added 30 commits January 12, 2026 15:10
…iel#91)

> [!NOTE]
> Automates Docker Hub page updates during releases.
> 
> - Adds `Update Docker Hub description` step in `release.yml` using `peter-evans/dockerhub-description@v5` with repo credentials and `readme-filepath` pointing to `docs/docker-hub.md`
> - Introduces `docs/docker-hub.md` containing concise image description, features, and quick-start instructions (cookie auth and uvx session mount)
> 
…kerdaniel#92)

> [!NOTE]
> Improves documentation for authentication, session handling, and Docker usage across `README.md` and `docs/docker-hub.md`.
> 
> - **Security**: Adds warning that `~/.linkedin-mcp/session.json` contains sensitive auth data
> - **Auth/session flow**: Promotes `--get-session` for browser login, clarifies captcha/2FA handling, and points users to uvx to resolve challenges
> - **Docker guidance**: Clearly states `--get-session`/`--no-headless` aren’t available in Docker; provides two auth options (mount session or pass `li_at` cookie) with examples and notes
> - **DXT and local setup**: Updates steps to create session first, then run; simplifies notes and troubleshooting; separates login vs scraping issues
> - **Copy/consistency**: Tightens wording, aligns CLI options and examples, fixes/updates links and formatting
> 
> [!NOTE]
> Minor release version bump.
> 
> - Updates project version in `pyproject.toml` from `2.1.1` to `2.1.2`
> - Syncs `uv.lock` to reflect the new package version
> 
…ependencies

chore(deps): pin dependencies
Updated instructions to use an incognito tab for obtaining the 'li_at' cookie.
…sh-setup-bun-digest

chore(deps): update oven-sh/setup-bun digest to db6bcf6
…sh-setup-bun-digest

chore(deps): update oven-sh/setup-bun digest to 3d26778
…opics-claude-code-action-digest

chore(deps): update anthropics/claude-code-action digest to a017b83
Move semantic validation (ranges, positive values) from loaders to
schema classes. Add BrowserConfig.validate() for viewport, timeout,
and slow_mo validation. Call validate() at end of load_config().

- Add new env vars: TIMEOUT, USER_AGENT, HOST, PORT, HTTP_PATH, SLOW_MO, VIEWPORT
- Add --linkedin-cookie CLI argument
- Fix --viewport default to None (was overwriting env vars)
- Change viewport CLI error from warning to ConfigurationError
…iel#99)

Move semantic validation (ranges, positive values) from loaders to
schema classes. Add BrowserConfig.validate() for viewport, timeout,
and slow_mo validation. Call validate() at end of load_config().

- Add new env vars: TIMEOUT, USER_AGENT, HOST, PORT, HTTP_PATH, SLOW_MO, VIEWPORT
- Add --linkedin-cookie CLI argument
- Fix --viewport default to None (was overwriting env vars)
- Change viewport CLI error from warning to ConfigurationError

---

> [!NOTE]
> Shifts semantic validation from loaders into `BrowserConfig.validate()` and `AppConfig.validate()`, with a final `config.validate()` call in `load_config()`.
> 
> - Adds env vars: `TIMEOUT`, `USER_AGENT`, `HOST`, `PORT`, `HTTP_PATH`, `SLOW_MO`, `VIEWPORT`; removes `DEFAULT_TIMEOUT`
> - Adds CLI: `--linkedin-cookie`; sets `--viewport` default to `None` and raises `ConfigurationError` on bad format
> - Validates and parses integers for `TIMEOUT`, `PORT`, `SLOW_MO`; rejects invalid `TRANSPORT`
> - Keeps loaders focused on reading values; schema enforces ranges/format (viewport, timeout, slow_mo, port, path)
> 
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| ghcr.io/astral-sh/uv | final | pinDigest |  → `9a23023` |
| python | stage | pinDigest |  → `4a3ceab` |

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

👻 **Immortal**: This PR will be recreated if closed unmerged. Get [config help](https://redirect.github.com/renovatebot/renovate/discussions) if that's undesired.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/github/stickerdaniel/linkedin-mcp-server).
stickerdaniel and others added 25 commits March 5, 2026 19:07
Use direct .get() lookup for date_posted and sort_by (single-select
filters). Remove unreachable _RATE_LIMITED_MSG check after early break.
Query _get_total_search_pages only once per search to avoid repeated
evaluate() calls when the element is absent.
Apply quote_plus to date_posted and sort_by passthrough values to
prevent malformed URLs from unexpected input. Use consistent 1-indexed
page numbers in all debug log messages.
Warn when search page rate-limit retry also fails. Add console.debug
in scroll_job_sidebar when no scrollable container is found.
Skip sidebar scrolling when <main> is absent to avoid 5s timeout on
edge-case pages. Fix off-by-one in total_pages log message. Add
page count assertion to test_deduplication_across_pages.
Append text to page_texts before breaking on no new IDs so the LLM
can read LinkedIn's feedback (e.g. "No jobs found") instead of
receiving empty sections.
Add await_count == 2 assertion to test_page_texts_joined_with_separator
matching the pattern already used in test_deduplication_across_pages.
Switch from innerText to textContent in _get_total_search_pages
so the "Page X of Y" text is readable regardless of CSS visibility.
- Replace console.debug in scroll_job_sidebar JS with sentinel return
  so the message is logged via Python logger instead
- Wrap _get_total_search_pages in its own try/except to prevent an
  exception from discarding already-fetched page text and job IDs
- Inline offset calculation into URL ternary for clarity
- Add debug log when sidebar container is found but no new content
  loads (scrolled == 0)
- Add debug log when <main> is absent and body fallback is used on
  search pages
- Use -2 sentinel for "job card link vanished" vs -1 for "no
  scrollable container" vs 0 for "no new content loaded"
- Return {source, text} from search page JS evaluate so the body
  fallback log fires based on actual DOM state, not the pre-evaluate
  wait_for_selector flag
- Add URL sanity check before _extract_job_ids to prevent extracting
  IDs from a stale page after a swallowed navigation failure
- Add test_no_ids_on_first_page_captures_text to pin the behavior
  where non-empty text with zero job IDs is returned in sections
- Change total_pages mock to None in test_pagination_uses_fixed_page_size
  since max_pages=2 caps the loop before total_pages is relevant
…uard

- Move _NOISE_MARKERS comment to directly precede the list it describes
- Log when <main> appears after wait_for_selector timeout but before
  evaluate (sidebar scroll skipped on late-appearing element)
- Add test_url_redirect_skips_id_extraction to exercise the URL
  sanity guard that prevents extracting IDs from a stale/redirect page
Capture _get_total_search_pages mock in test_stops_at_total_pages
and verify await_count == 1 to pin the query-once optimization.
…ols_add_job_ids_sidebar_scrolling_and_pagination_to_search_jobs

feat(tools): add job IDs, sidebar scrolling, and pagination to search_jobs
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…l commands (stickerdaniel#202)


Greptile Summary

This PR adds a "Verifying Bug Reports" section to `AGENTS.md` with step-by-step `curl` commands for testing the MCP server end-to-end via HTTP transport. The `SESSION_ID` extraction via `grep`/`awk`/`tr -d '\r'` is correct and properly handles Windows-style line endings in curl header output.

However, the **server startup command blocks the terminal** — without `&` or an explicit note to use a separate shell, developers or agents following the script linearly will never reach the `curl` commands.

Confidence Score: 4/5

- Safe to merge once the server startup command is backgrounded or explicit terminal-switching instructions are added.
- The change is documentation-only and does not affect runtime code. The session-ID extraction logic is correct. The primary issue is a usability blocker: the server startup command blocks the terminal, preventing the documented workflow from executing end-to-end in a single shell. This is straightforward to fix with `&` or an explicit note.
- AGENTS.md — specifically the server startup command (line 138) needs to either background the process or include explicit instructions to use a separate terminal.

Last reviewed commit: e8e8eb9

> Greptile also left **1 inline comment** on this PR.

Activity feed pages lazy-load post content after tab headers render.
Add wait_for_function check and slower scroll params for /recent-activity/
URLs so posts section returns actual content instead of just tab headers.

Resolves: stickerdaniel#201
…ity-feed-posts-empty

fix(scraping): Wait for activity feed content before extracting
Comment thread tools/person.py
@@ -0,0 +1,347 @@
"""

**File placed in wrong directory — tool never registered**

This file is added to `tools/person.py` at the repository root, but the MCP server imports from `linkedin_mcp_server.tools.person` (see `linkedin_mcp_server/server.py` line 20):

```python
from linkedin_mcp_server.tools.person import register_person_tools
```

The actual module that is loaded and registered is at `linkedin_mcp_server/tools/person.py`. This new file at `tools/person.py` is never imported by anything, so `search_people_with_past_company` will never be registered as an MCP tool and is completely dead code. The new tool and helper functions need to be added to `linkedin_mcp_server/tools/person.py` instead.

Comment thread tools/person.py
Comment on lines +218 to +222

```python
await ctx.report_progress(
    progress=30 + int((idx / len(profile_urls)) * 60),
    total=100,
    message=f"Checking profile {idx + 1}/{len(profile_urls[:max_results * 3])}: {username}"
```

**Wrong keyword argument name causes `TypeError` at runtime**

`scrape_person` is defined with the parameter name `requested` (see `linkedin_mcp_server/scraping/extractor.py` line 254):

```python
async def scrape_person(self, username: str, requested: set[str]) -> dict[str, Any]:
```

Calling it with the keyword argument `requested_sections` will raise a `TypeError: scrape_person() got an unexpected keyword argument 'requested_sections'` at runtime, causing every profile check to fail.

Suggested change:

```python
                    profile_result = await extractor.scrape_person(
                        username, requested={"experience"}
                    )
```

Comment thread tools/person.py
Comment on lines +196 to +198
```python
)

# Extract profile URLs from search results
```

**URL extraction from `innerText` will always return an empty list**

`extractor.search_people()` calls `extract_page()`, which returns `main.innerText` — plain text with no HTML markup. LinkedIn profile URLs (e.g. `https://www.linkedin.com/in/username`) are rendered as hyperlinks in the DOM, not printed as visible text. They will never appear in the `innerText` string, so `_extract_profile_urls` will always return `[]`, meaning the second-step filtering never runs and the function always returns zero matches.

To reliably extract profile URLs, the extractor would need to read `href` attributes directly from the DOM (similar to how `_extract_job_ids` does it via `page.evaluate`) rather than parsing plain text.
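A minimal sketch of that approach, assuming a Playwright `page` handle is reachable from the extractor (the function name and selector here are illustrative, not the repo's actual API):

```python
async def extract_profile_urls_from_dom(page) -> list[str]:
    """Read profile links from href attributes instead of parsing innerText."""
    hrefs: list[str] = await page.evaluate(
        """() => Array.from(document.querySelectorAll('a[href*="/in/"]'))
                      .map(a => a.href)"""
    )
    # Strip query strings, then deduplicate while preserving LinkedIn's
    # relevance-ranked ordering (dict.fromkeys keeps insertion order).
    urls = [h.split("?")[0] for h in hrefs if "/in/" in h]
    return list(dict.fromkeys(urls))
```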

Comment thread tools/person.py
Comment on lines +270 to +272
```python
def _extract_profile_urls(search_text: str) -> list[str]:
    """Extract LinkedIn profile URLs from search results text."""
    import re
```

**`profile_result.get("username")` always returns `None`**

`scrape_person` returns `{"url": ..., "sections": ...}` — there is no `"username"` key in its return dict. This means every profile in `matching_profiles` and `partial_matches` will have `"username": None`, making it impossible for callers to look up or identify the matching profiles.

Suggested change:

```python
        "username": url.split("/in/")[-1].rstrip("/") if url else None,
```

Or more cleanly, pass the `username` variable (already extracted on line 210) into `_parse_profile_for_filters`.

Comment thread tools/person.py
Comment on lines +290 to +294
```python
) -> dict[str, Any]:
    """Parse profile result and check if it matches filters."""
    sections = profile_result.get("sections", {})
    experience_text = sections.get("experience", "")
    main_text = sections.get("main", "")
```

**Non-deterministic URL ordering from `set()` deduplication**

`_extract_profile_urls` returns `[f"https://linkedin.com/in/{username}" for username in set(matches)]`. The `set` conversion removes duplicates but destroys the original ordering from the search results page (where LinkedIn orders results by relevance). Each call may iterate profiles in a different order, producing inconsistent results. Use `dict.fromkeys` to preserve insertion order while deduplicating:

```python
    seen = dict.fromkeys(matches)
    return [f"https://linkedin.com/in/{username}" for username in seen]
```

Comment thread tools/person.py

```python
except Exception as e:
    raise_tool_error(e, "search_people")  # NoReturn
```

**Non-English inline comment**

The comment `# 更长超时,因为需要获取多个档案` ("longer timeout, because multiple profiles need to be fetched") is in Chinese. The rest of the codebase uses English exclusively for comments and documentation. Please translate this to English to keep the codebase consistent:

```python
        timeout=TOOL_TIMEOUT_SECONDS * 3,  # Longer timeout because multiple profiles need to be fetched
```

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment thread tools/person.py
Comment on lines +287 to +295
```python
    profile_result: dict[str, Any],
    past_company_list: list[str],
    current_title: str | None,
) -> dict[str, Any]:
    """Parse profile result and check if it matches filters."""
    sections = profile_result.get("sections", {})
    experience_text = sections.get("experience", "")
    main_text = sections.get("main", "")
```

**`import re` inside function body**

`re` is imported inside both `_extract_profile_urls` (line 288) and `_extract_username_from_url` (line 298). While Python caches module imports, the convention in this codebase (and generally) is to place all imports at the top of the module. Move `import re` to the module-level imports alongside `import asyncio` and `import logging`.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
