Commit 24b4047

feat: refactor Telegram whitelist logic and add comprehensive test coverage

- Extract Telegram path/username whitelisting into separate WHITELISTED_TELEGRAM_PATHS constant
  - Separate domain-based (t.me domain) from path-based (channel username) whitelisting
  - Enable more granular control over allowed Telegram channels for new user probation
- Improve is_url_whitelisted() function in anti_spam.py
  - Add explicit handling for t.me and telegram.me hosts with path-based validation
  - Case-insensitive path matching for Telegram channel names
  - Reject root paths (/) and empty paths
- Add comprehensive pytest coverage for URL whitelist validation
  - 48 test cases covering 487 lines of test code
  - Increases anti_spam.py coverage from 35% to 54%
  - Test categories:
    * Telegram link validation (protocols, message links, case sensitivity, ports)
    * Domain whitelist coverage (GitHub, documentation, AI/ML, cloud, package repos)
    * URL parsing edge cases (query parameters, fragments, special characters)
    * Helper function coverage (is_forwarded, has_link, extract_urls, etc.)
- Add AGENTS.md documenting project structure and development workflow
- Update .gitignore to track AGENTS.md
1 parent 0c1fbc8 commit 24b4047

5 files changed

Lines changed: 815 additions & 2 deletions

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -8,4 +8,4 @@ __pycache__/
 .pytest_cache/
 data/
 .vscode
-AGENTS.md
+# AGENTS.md

AGENTS.md

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+# AGENTS.md - PythonID Telegram Bot
+
+## Commands
+
+```bash
+# Install dependencies
+uv sync
+
+# Run tests
+uv run pytest
+
+# Run a single test file
+uv run pytest tests/test_check.py
+
+# Run a single test function
+uv run pytest tests/test_check.py::TestHandleCheckCommand::test_check_command_non_admin
+
+# Run tests with coverage
+uv run pytest --cov=bot --cov-report=term-missing
+
+# Run the bot
+uv run pythonid-bot
+```
+
+## Architecture
+
+- **src/bot/**: Main application package
+  - **main.py**: Entry point with JobQueue integration — register new handlers here
+  - **config.py**: Pydantic settings (`get_settings()` cached via `lru_cache`)
+  - **constants.py**: Centralized message templates and utilities
+  - **handlers/**: Telegram update handlers (message.py, dm.py, captcha.py, verify.py, anti_spam.py, topic_guard.py, check.py)
+  - **services/**: Business logic (user_checker.py, scheduler.py, bot_info.py, telegram_utils.py, captcha_recovery.py)
+  - **database/**: SQLModel schemas (models.py) and SQLite operations (service.py) — use `get_database()` singleton
+- **tests/**: pytest-asyncio tests with mocked telegram API
+- **data/bot.db**: SQLite database (auto-created via `SQLModel.metadata.create_all`)
+
+## Code Style
+
+- **Python 3.11+** with type hints; imports grouped: stdlib → third-party → local
+- **Async/await**: All handlers are async functions
+- **PTB v20+**: Use `ContextTypes.DEFAULT_TYPE` for context type hints, not legacy `Dispatcher`/`Updater`
+- **SQLModel**: Use `session.exec(select(Model).where(...)).first()` syntax; no Alembic migrations
+- **Logging**: Use `logfire` for structured logging, not `print()` or stdlib `logging`
+- **Error handling**: Catch specific exceptions (e.g., `TimedOut`), log errors, return gracefully
+- **No comments**: Avoid inline comments unless code is complex
+- **Docstrings**: Module-level docstrings required, function docstrings for public APIs
+
+## Testing
+
+- **Async mode**: `asyncio_mode = auto` in pyproject.toml — do NOT use `@pytest.mark.asyncio` decorators
+- **Fixtures**: Check existing fixtures in test files (`mock_update`, `mock_context`, `mock_settings`)
+- **Mocking**: Use `AsyncMock` and `MagicMock` for telegram API; no real network calls
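
Reviewer note: the Testing guidance in AGENTS.md (async tests without decorators, `AsyncMock`/`MagicMock` in place of the Telegram API) could look roughly like the sketch below. `handle_ping` and `make_mock_update` are hypothetical names invented for illustration. They are not part of this repository, and the project's own fixtures (`mock_update`, `mock_context`) may be built differently. Only the standard library is used here.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def handle_ping(update, context):
    """Hypothetical handler (not from the repo): replies 'pong'."""
    await update.message.reply_text("pong")


def make_mock_update(text: str) -> MagicMock:
    """Stand-in for a Telegram update whose reply_text can be awaited."""
    update = MagicMock()
    update.message.text = text
    # reply_text is a coroutine in the real API, so it must be an AsyncMock.
    update.message.reply_text = AsyncMock()
    return update


async def test_handle_ping_replies():
    update = make_mock_update("/ping")
    context = MagicMock()
    await handle_ping(update, context)
    # No network call happens; only the mock records the awaited call.
    update.message.reply_text.assert_awaited_once_with("pong")


# Outside pytest, the coroutine can be driven directly.
asyncio.run(test_handle_ping_replies())
```

With `asyncio_mode = auto`, pytest-asyncio should collect `test_handle_ping_replies` as written, which is why AGENTS.md asks contributors not to add `@pytest.mark.asyncio`.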

src/bot/constants.py

Lines changed: 258 additions & 1 deletion
@@ -266,6 +266,263 @@ def format_hours_display(hours: int) -> str:
     "core.telegram.org",
 
     # Indonesian Tech Communities
-    "t.me",
     "dicoding.com",
 ])
+
+# Whitelisted Telegram paths/usernames for new user probation
+# Only these specific t.me paths are allowed (exact match on first path segment)
+# e.g., "PythonID" allows "t.me/PythonID", "t.me/PythonID/123", but not "t.me/PythonIDSpam"
+# Values should be lowercased for case-insensitive matching
+WHITELISTED_TELEGRAM_PATHS = frozenset([
+    # Cloud & Platforms
+    "juaragcp",
+    "awsdatausergroupid",
+    "awsusergroupid",
+    "azureindo",
+    "gcpuserid",
+    "gcp_id",
+
+    # AI & Data Science
+    "artificialintelligence_indonesia",
+    "businessintelligenceid",
+    "dataengineeringid",
+    "datascienceindonesia",
+    "iaiforum",
+    "machinelearningid",
+    "nlp_lounge",
+    "pytorchid",
+    "scrapeid",
+    "tableauprofessionals",
+    "tensorflowid",
+
+    # Databases
+    "sqlserverid",
+    "mongodb_id",
+    "mongo_db",
+    "mysqlid",
+    "postgresql_id",
+
+    # General Programming & Developer Groups
+    "bandungdevcom",
+    "belajarcoding",
+    "belajarngodingbareng",
+    "gnurindonesia",
+    "belajargolangmariadb",
+    "belajarhtmlcss",
+    "bogordev",
+    "borneokoding",
+    "tgbotid",
+    "otodidak_ngoding",
+    "crbdev",
+    "codingfess",
+    "cscript",
+    "femalegeek",
+    "freekelasgithub",
+    "frontendid",
+    "gresikdev",
+    "iamindonesia",
+    "idstack",
+    "infotechprogrammer",
+    "itnusantara",
+    "djemberdev",
+    "kabayan_coding",
+    "kelasmobilemalang",
+    "backendid",
+    "komunitasbk",
+    "komunitasrpaindonesia",
+    "kongkowitmedan",
+    "kongkowitpekanbaru",
+    "kotakodebetachat",
+    "kulkultech",
+    "odooindonesia",
+    "pasuruandev",
+    "programersemarangraya",
+    "rantaudev",
+    "santrenkoding",
+    "sarccomuniverse",
+    "sidoarjodev",
+    "sinaudev",
+    "soft_eng_id",
+    "sparkarindonesia",
+    "surabayadev",
+    "lamongandev",
+    "tamankodekode",
+    "tiadevcommunity",
+    "teknologi_umum_v2",
+    "idwordpress",
+    "smk_dev",
+
+    # DevOps & Infrastructure
+    "ansibleid",
+    "cloudcomputingindonesia",
+    "dockeridn",
+    "iddevops",
+    "kubernetesindonesia",
+    "okdindonesia",
+    "devopsjogja",
+
+    # Firebase
+    "firebaseindonesia",
+
+    # FreeBSD
+    "setanmerahid",
+
+    # Game Development
+    "gamerang",
+    "gdevelopid",
+    "godot_indonesia",
+    "lombokgamedev",
+
+    # IoT
+    "kelasrobotgrup",
+    "arduinoindonesiancommunity",
+    "edukasielektronika",
+    "raspberrypi_id",
+
+    # iOS
+    "ikaskus",
+    "initialestore",
+    "libimobiledevice",
+
+    # Jokes
+    "linux_memes",
+    "programmerjokes",
+
+    # Linux
+    "archlinuxid",
+    "artixlinux_id",
+    "gnulinuxindonesia",
+    "belajarlinuxbareng",
+    "blankonlinux",
+    "centosid",
+    "debianid",
+    "deepin_indonesia",
+    "dotfiles_id",
+    "elementaryid",
+    "fedoraid",
+    "gnomeid",
+    "gnuweeb",
+    "kalilinuxid",
+    "kdeid",
+    "linuxmalang",
+    "linuxjember",
+    "lfsid",
+    "langitketujuh_id",
+    "mint_id",
+    "linuxgroupid",
+    "manjaroid",
+    "nixosid",
+    "opensuse_id",
+    "linuxsolo",
+    "parrotsecurityindonesia",
+    "rhel_id",
+    "ubuntu_indo",
+    "voidlinux_id",
+
+    # macOS
+    "macosid",
+
+    # Office Productivity
+    "excelid",
+    "belajarlibreofficeindonesia",
+
+    # Open Source & Security
+    "osint_indonesia",
+    "doscomedia",
+    "forensicaid",
+    "itsecurityindonesia",
+    "linuxhackingid",
+    "orangsiber",
+    "reversingid",
+    "cybersecurity_id",
+    "hacktheboxindo",
+
+    # Programming Languages (Specific)
+    "dotnetusergroup",
+    "dotnetcore_id",
+    "xamarinindonesia",
+    "androiddevbdg",
+    "androiddevelopernasional",
+    "teknorialcom",
+    "android_lombok",
+    "androiddevsurabaya",
+    "jcomposeindonesia",
+    "androidsemarang",
+    "source_code_android",
+    "yacgroup",
+    "agilecirclesid",
+    "agileindonesia",
+    "assemblyid",
+    "bashidorg",
+    "ccpp_indonesia",
+    "idcplc",
+    "crystalid",
+    "dart_web",
+    "flutter_id",
+    "flutter_jkt",
+    "fluttermakassar",
+    "lombokflutter",
+    "elixir_id",
+    "gophers_id",
+    "golangjogja",
+    "golangsurabaya",
+    "rustacean_id",
+    "jvmindonesia",
+    "adonisid",
+    "angularid",
+    "deno_id",
+    "indonesiaionic",
+    "js_id",
+    "jogjajs",
+    "lombokjs",
+    "nativescript_id",
+    "nestjs_indonesia",
+    "nextjs_id",
+    "nodejsid",
+    "bun_id",
+    "react_idn",
+    "reactnativeindo",
+    "surabayajs",
+    "svelte_id",
+    "vuejsindonesia",
+    "kotlin_crb",
+    "kotlinindonesia",
+    "delphiindonesia",
+    "pascalid",
+    "codeigniterindonesia",
+    "laravelindonesia",
+    "phpidforbusiness",
+    "phpidforstudent",
+    "phpjogloraya",
+    "symfonyid",
+    "botphp",
+    "yiiframeworkindonesia",
+    "bandung_py",
+    "djangoid",
+    "fastapiid",
+    "flaskid",
+    "lombok_py",
+    "mkspy",
+    "pyjogja",
+    "pythonid",  # Duplicate of "pythonid" but kept for completeness of list
+    "python",
+    "pythonlearnerr",
+    "python_learners_group",
+    "surabayapy",
+    "railsid",
+    "ruby_id",
+    "swiftid",
+    "typescriptindonesia",
+    "sapabapindonesia",
+    "gis_id",
+    "leafletid",
+    "qgisindonesia",
+
+    # QA
+    "sqa_id",
+    "qamalang",
+
+    # Text Editors
+    "emacsid",
+    "vimid",
+])
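
Reviewer note: the comment block above the constant states two rules: only the first path segment is matched, and stored values should be lowercase. A minimal sketch of both rules follows. `ALLOWED_PATHS` is a hypothetical three-entry excerpt, and `first_segment`, `is_allowed_path` and `non_lowercase_entries` are illustrative helpers that do not appear in the commit.

```python
# Hypothetical excerpt; the real constant is WHITELISTED_TELEGRAM_PATHS in the diff.
ALLOWED_PATHS = frozenset(["pythonid", "flutter_id", "vimid"])


def first_segment(path: str) -> str:
    """Return the lowercased first path segment, or "" when there is none."""
    return path.strip("/").split("/")[0].lower()


def is_allowed_path(path: str) -> bool:
    """Exact match on the first segment only; prefixes do not count."""
    return first_segment(path) in ALLOWED_PATHS


def non_lowercase_entries(entries: frozenset) -> list:
    """Entries that could never match, because lookups are lowercased first."""
    return sorted(e for e in entries if e != e.lower())


assert is_allowed_path("/PythonID")          # channel root
assert is_allowed_path("/PythonID/123")      # message link in that channel
assert not is_allowed_path("/PythonIDSpam")  # look-alike username
assert not is_allowed_path("/")              # no username at all
assert non_lowercase_entries(ALLOWED_PATHS) == []
```

A check like `non_lowercase_entries` could serve as a cheap guard in the test suite, since a mixed-case entry added to the list by mistake would silently block its own channel.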

src/bot/handlers/anti_spam.py

Lines changed: 17 additions & 0 deletions
@@ -20,6 +20,7 @@
     NEW_USER_SPAM_WARNING,
     RESTRICTED_PERMISSIONS,
     WHITELISTED_URL_DOMAINS,
+    WHITELISTED_TELEGRAM_PATHS,
     format_hours_display,
 )
 from bot.database.service import get_database
@@ -139,6 +140,22 @@ def is_url_whitelisted(url: str) -> bool:
     # Remove port if present
     if ':' in hostname:
         hostname = hostname.rsplit(':', 1)[0]
+
+    # Specific logic for Telegram links
+    # Check against WHITELISTED_TELEGRAM_PATHS instead of WHITELISTED_URL_DOMAINS
+    if hostname in {"t.me", "telegram.me"}:
+        path = parsed.path
+        if not path or path == "/":
+            return False
+
+        # Extract the first segment of the path (the username/channel name)
+        # e.g., "/PythonID/123" -> "pythonid"
+        parts = path.strip("/").split("/")
+        if not parts:
+            return False
+
+        first_segment = parts[0].lower()
+        return first_segment in WHITELISTED_TELEGRAM_PATHS
 
     # Check suffixes of the hostname against the set
     # e.g., "sub.example.github.com" checks: