I build production-ready computer vision and multimodal systems, from research ideas to real-world pipelines.
Currently at 2GIS, working on visual AI solutions. Previously at MTS (video analytics & moderation) and Skoltech (applied ML research).
- 🎥 Video understanding: scene detection, temporal segmentation, content moderation
- 🧠 Multimodal models: CLIP, BLIP, retrieval & captioning systems
- 🔍 Detection & segmentation: YOLO, Mask R-CNN, GroundingDINO, OWLv2
- ⚙️ ML system design: from offline pipelines to real-time inference
- 📦 Production ML: optimization (ONNX, TensorRT), batching, GPU pipelines
Python · PyTorch · OpenCV · MMDetection · Docker · Kubernetes · Kafka · MLflow
- Built a modular video analytics pipeline (scene detection → tracking → captioning → clustering)
- Designed CLIP-based scene grouping with temporal constraints
- Developed content moderation tools (alcohol, smoking, etc.) using multimodal models
- Worked on super-resolution and temporal action segmentation
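
To illustrate the idea behind scene grouping with temporal constraints, here is a minimal sketch. It is not the production code. It assumes each scene is already represented by one embedding vector (plain Python lists stand in for CLIP image embeddings), and the names `cosine`, `group_scenes`, `threshold`, and `window` are hypothetical, chosen only for this example. A scene joins the current group only if it resembles one of that group's most recent scenes, so every group is a contiguous run in time.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_scenes(
    embeddings: Sequence[Sequence[float]],
    threshold: float = 0.8,  # hypothetical similarity cut-off
    window: int = 2,         # hypothetical look-back within the current group
) -> List[List[int]]:
    """Greedily group scene indices into temporally contiguous clusters."""
    groups: List[List[int]] = []
    for i, emb in enumerate(embeddings):
        if groups:
            # Temporal constraint: compare only with the latest scenes
            # of the most recent group, never with older groups.
            recent = groups[-1][-window:]
            if max(cosine(emb, embeddings[j]) for j in recent) >= threshold:
                groups[-1].append(i)
                continue
        groups.append([i])
    return groups


# Toy 2-D "embeddings": scene 4 looks like scene 0 but occurs much later.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.0]]
groups = group_scenes(toy)
print(groups)
```

In the toy input the last scene is visually identical to the first, yet it starts its own group because only the current group is considered. A purely appearance-based clustering would merge the two.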

