Coreference Resolution Module

Part of M-Flow — the cognitive memory engine for AI agents.

A lightweight, rule-based coreference resolution system that replaces pronouns and other referring expressions with their concrete antecedents. M-Flow's preprocessing pipeline uses it to resolve references before retrieval, so that expressions such as "he", "it", and "the company" are replaced with actual names before they enter the knowledge graph.

Supports Chinese (11 pronoun types with semantic role analysis) and English (basic resolution).
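
To make the idea of rule-based replacement concrete, here is a minimal sketch. It is not the module's implementation: the `toy_resolve` function, the `PRONOUN_TYPES` table, and the hand-supplied entity dictionary are all assumptions made for illustration, and the only rule applied is "pick the most recently mentioned entity of a compatible type".

```python
import re

# Hypothetical pronoun table: pronoun -> entity type it may refer to.
PRONOUN_TYPES = {"他": "PER", "她": "PER", "那里": "LOC"}


def toy_resolve(text, entities):
    """Replace pronouns with the most recent compatible entity.

    `entities` maps a surface string to its type, e.g. {"小明": "PER"}.
    A real system would obtain this from NER rather than from the caller.
    """
    last_seen = {}  # entity type -> most recent mention of that type
    out = []
    # Tokenize into known entities, known pronouns, or single characters,
    # trying longer strings first so multi-character forms win.
    vocab = sorted(list(entities) + list(PRONOUN_TYPES), key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, vocab)) + "|.", re.S)
    for tok in pattern.findall(text):
        if tok in entities:
            last_seen[entities[tok]] = tok
            out.append(tok)
        elif tok in PRONOUN_TYPES and PRONOUN_TYPES[tok] in last_seen:
            out.append(last_seen[PRONOUN_TYPES[tok]])
        else:
            out.append(tok)  # ordinary text, or a pronoun with no antecedent
    return "".join(out)


print(toy_resolve("小明去北京。他在那里工作。", {"小明": "PER", "北京": "LOC"}))
# 小明去北京。小明在北京工作。
```

Recency alone ignores gender, number, and verb semantics, which is why the real resolver layers further rules on top.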

✨ Features / 特点

11 Pronoun Types: Recognizes person, possessive, object, location, time, ordinal, event, formal deictic, reflexive, generic, and bound-variable pronouns; the last three are detected but intentionally left unresolved
Semantic Role Analysis: Uses verb semantics to determine correct antecedent (e.g., patient vs. agent)
Stream Processing: Real-time sentence-by-sentence resolution
No Training Required: Pure rule-based, no ML models needed for core functionality
English Support: Includes a basic English coreference module
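
As a rough illustration of semantic role analysis, the sketch below chooses between the agent and the patient of the previous clause based on the predicate that follows the pronoun. The `Event` class, the `PREDICATE_PREFERS` lexicon, and `pick_antecedent` are hypothetical names invented for this example; the module's actual rules and lexicon may differ.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """A previously parsed clause such as 张三打了李四 (agent, verb, patient)."""
    agent: str
    verb: str
    patient: str


# Hypothetical lexicon: which role of the previous event the subject of
# the following predicate most plausibly fills.
PREDICATE_PREFERS = {
    "受伤了": "patient",  # "got hurt": the one acted upon
    "哭了": "patient",    # "cried": the one acted upon
    "道歉了": "agent",    # "apologized": the one who acted
}


def pick_antecedent(prev: Event, predicate: str) -> str:
    # Default to the agent (subject continuity) for unknown predicates.
    role = PREDICATE_PREFERS.get(predicate, "agent")
    return getattr(prev, role)


prev = Event(agent="张三", verb="打", patient="李四")
print(pick_antecedent(prev, "受伤了"))  # 李四
print(pick_antecedent(prev, "道歉了"))  # 张三
```

Both candidates are male and singular here, so gender and number cannot separate them; only the verb semantics can.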

📦 Installation / 安装

From Source / 源码安装

git clone https://github.com/FlowElement-ai/m_flow.git
cd m_flow/coreference
pip install -e .

With Optional Dependencies / 安装可选依赖

# Install with HanLP/LTP for enhanced parsing
pip install -e ".[nlp]"

# Install with development tools
pip install -e ".[dev]"

🚀 Quick Start / 快速开始

Basic Usage / 基础用法

from coreference_module import CoreferenceResolver

resolver = CoreferenceResolver()

text = "小明去北京。他在那里工作。"
resolved, replacements = resolver.resolve_text(text)

print(resolved)
# Output: 小明去北京。小明在北京工作。

print(replacements)
# Output: [
#   {'pronoun': '他', 'replacement': '小明', 'position': 7},
#   {'pronoun': '那里', 'replacement': '北京', 'position': 10}
# ]

# Reset for new document
resolver.reset()

Stream Processing / 流式处理

from coreference_module import StreamCorefSession

session = StreamCorefSession()

# Process sentences one by one
result1, _ = session.add_sentence("张三是医生。")
result2, reps = session.add_sentence("他很忙。")

print(result2)  # 张三很忙。

# Reset for new conversation
session.reset()

Structured Output / 结构化输出

resolver = CoreferenceResolver()
output = resolver.resolve_text_structured("妈妈买了苹果。她说它很甜。")

print(output.resolved_text)      # 妈妈买了苹果。妈妈说苹果很甜。
print(output.replacements)       # List of replacement details
print(output.mentions)           # All detected mentions
print(output.time_extractions)   # Normalized time expressions
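
If the structured result needs to be logged or passed to another service, it can be flattened into JSON. The sketch below relies only on the four attribute names shown above; the `to_json` helper is hypothetical, and the `SimpleNamespace` object is a stand-in whose field contents are made up for the demonstration.

```python
import json
from types import SimpleNamespace


def to_json(output) -> str:
    """Serialize the four documented attributes of a structured result."""
    payload = {
        "resolved_text": output.resolved_text,
        "replacements": output.replacements,
        "mentions": output.mentions,
        "time_extractions": output.time_extractions,
    }
    # default=str is a fallback for values json cannot encode natively,
    # since the element types of these lists are not documented here.
    return json.dumps(payload, ensure_ascii=False, default=str)


# Stand-in object; with the real package, pass the value returned by
# resolver.resolve_text_structured(...) instead.
fake = SimpleNamespace(
    resolved_text="妈妈买了苹果。妈妈说苹果很甜。",
    replacements=[{"pronoun": "她", "replacement": "妈妈"}],
    mentions=["妈妈", "苹果"],
    time_extractions=[],
)
print(to_json(fake))
```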

NER Extraction / 命名实体识别

from coreference_module import NERService

ner = NERService()
result = ner.extract("小明在北京大学读书")

print(result.PER)  # ['小明']
print(result.LOC)  # ['北京大学']

Time Normalization / 时间归一化

from coreference_module import normalize_time
from datetime import datetime

ref_date = datetime(2026, 2, 7)
time_span = normalize_time("昨天", ref_date)

print(time_span.start_dt)   # 2026-02-06 00:00:00
print(time_span.end_dt)     # 2026-02-07 00:00:00
print(time_span.precision)  # DAY
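
The output above suggests that a day-level expression maps to a half-open interval from midnight to the following midnight. A minimal sketch of that arithmetic, using a hypothetical `ToySpan` class and `DAY_OFFSETS` table rather than the module's own types, could look like this:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ToySpan:
    start_dt: datetime
    end_dt: datetime  # exclusive upper bound


# Hypothetical table of relative day expressions and their offsets in days.
DAY_OFFSETS = {"前天": -2, "昨天": -1, "今天": 0, "明天": 1, "后天": 2}


def toy_normalize_day(expr: str, ref: datetime) -> ToySpan:
    """Map a relative day expression to [midnight, next midnight)."""
    offset = DAY_OFFSETS[expr]  # raises KeyError for unsupported expressions
    midnight = ref.replace(hour=0, minute=0, second=0, microsecond=0)
    start = midnight + timedelta(days=offset)
    return ToySpan(start, start + timedelta(days=1))


span = toy_normalize_day("昨天", datetime(2026, 2, 7))
print(span.start_dt)  # 2026-02-06 00:00:00
print(span.end_dt)    # 2026-02-07 00:00:00
```

Truncating the reference date to midnight first means the result does not depend on the time of day at which the text was processed.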

📊 Supported Pronoun Types / 支持的代词类型

| Type | Examples | Description |
| --- | --- | --- |
| Person | 他、她、他们 | Third-person pronouns |
| Possessive | 他的、她的、它的 | Possessive pronouns |
| Object | 它、它们 | Object/inanimate pronouns |
| Location | 这里、那里、那边 | Location pronouns |
| Time | 那时候、当时 | Temporal pronouns |
| Ordinal | 前者、后者 | Ordinal pronouns |
| Event | 这件事、此事 | Event reference pronouns |
| Formal Deictic | 该、上述 | Formal/written deixis |
| Ambiguous | 这个、那个 | Context-dependent pronouns |
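
The table can be read as a lookup from surface form to pronoun type. The sketch below turns it into a dictionary and scans a string with longest-match-first, so that 他的 is reported as a possessive rather than as 他 followed by 的. The `find_pronouns` function is a hypothetical illustration; the module itself works on tokenized input, which avoids false hits such as 该 inside 应该.

```python
# Surface forms copied from the table above.
PRONOUN_TABLE = {
    "Person": ["他", "她", "他们"],
    "Possessive": ["他的", "她的", "它的"],
    "Object": ["它", "它们"],
    "Location": ["这里", "那里", "那边"],
    "Time": ["那时候", "当时"],
    "Ordinal": ["前者", "后者"],
    "Event": ["这件事", "此事"],
    "Formal Deictic": ["该", "上述"],
    "Ambiguous": ["这个", "那个"],
}
LOOKUP = {form: kind for kind, forms in PRONOUN_TABLE.items() for form in forms}
FORMS = sorted(LOOKUP, key=len, reverse=True)  # longest forms first


def find_pronouns(text):
    """Return (offset, form, type) for each pronoun found in `text`."""
    found = []
    i = 0
    while i < len(text):
        for form in FORMS:
            if text.startswith(form, i):
                found.append((i, form, LOOKUP[form]))
                i += len(form)
                break
        else:
            i += 1  # no pronoun starts here; move on one character
    return found


print(find_pronouns("他的书在那里。"))
# [(0, '他的', 'Possessive'), (4, '那里', 'Location')]
```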

❌ Non-Resolution Cases / 不消解的情况

The system deliberately leaves the following pronouns unresolved:

| Case | Example | Reason |
| --- | --- | --- |
| First person | 我、我们 | Speaker reference |
| Second person | 你、您 | Listener reference |
| Reflexive | 自己、本人 | Self-reference |
| Generic | 人家、别人 | Generic reference |
| Bound variable | 每个学生都带了他的书 | Quantifier-bound |
| First sentence | 他很高。 | No antecedent available |
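
These cases amount to a guard that runs before any replacement is attempted. The sketch below shows one way such a guard might be written. The `skip_reason` function, the `QUANTIFIERS` tuple, and the simple "quantifier appears earlier in the sentence" test are assumptions for illustration, not the module's actual logic.

```python
FIRST_PERSON = {"我", "我们"}
SECOND_PERSON = {"你", "您"}
REFLEXIVE = {"自己", "本人"}
GENERIC = {"人家", "别人"}
QUANTIFIERS = ("每个", "每位", "所有")  # assumed list of binding quantifiers


def skip_reason(pronoun, sentence, has_candidates):
    """Return why `pronoun` should be left alone, or None if it may be resolved.

    `has_candidates` says whether any antecedent has been seen so far.
    """
    if pronoun in FIRST_PERSON:
        return "first person"
    if pronoun in SECOND_PERSON:
        return "second person"
    if pronoun in REFLEXIVE:
        return "reflexive"
    if pronoun in GENERIC:
        return "generic"
    idx = sentence.find(pronoun)
    # Crude bound-variable test: a quantifier precedes the pronoun.
    if idx > 0 and any(q in sentence[:idx] for q in QUANTIFIERS):
        return "bound variable"
    if not has_candidates:
        return "no antecedent"
    return None


print(skip_reason("他", "每个学生都带了他的书", True))  # bound variable
print(skip_reason("他", "他很高。", False))             # no antecedent
print(skip_reason("他", "他很忙。", True))              # None
```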

📁 Project Structure / 项目结构

coreference/
├── coreference_module/      # Core Chinese coreference resolution
│   ├── coreference.py       # Main resolver class
│   ├── tokenizer.py         # Chinese tokenizer with NER
│   ├── syntax_adapter.py    # HanLP/LTP adapter
│   ├── time_normalizer.py   # Time expression normalization
│   ├── canonicalizer.py     # Entity canonicalization
│   └── ner_adapter.py       # NER service adapter
├── english_coreference/     # English coreference module
├── tests/                   # Test suite (85 tests)
├── configs/                 # Configuration files
├── pyproject.toml           # Project configuration
├── LICENSE                  # MIT License
└── README.md

🧪 Testing / 测试

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=coreference_module --cov-report=html

Current status: 85 tests, 100% passing

📈 Performance / 性能

| Metric | Value |
| --- | --- |
| Test Cases | 85 |
| Pass Rate | 100% |
| Pronoun Types | 11 |
| Branch Coverage (`_find_replacement`) | 100% |

🔧 Configuration / 配置

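The resolver works out of the box with sensible defaults; configuration files live in configs/. Call reset() between documents so that antecedents from one document do not carry over into the next: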

resolver = CoreferenceResolver()
resolver.reset()  # Reset resolver state for a new document

🌐 English Support / 英文支持

from english_coreference import CoreferenceResolver as EnglishResolver

resolver = EnglishResolver()
text = "John went home. He was tired."
resolved, _ = resolver.resolve_text(text)

print(resolved)  # John went home. John was tired.
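
Because the Chinese and English resolvers live in separate packages, mixed-language input has to be routed to one or the other. The sketch below is one possible approach and not something the package provides: `looks_chinese` and `pick_resolver` are hypothetical helpers that select a resolver by checking for characters in the CJK Unified Ideographs block.

```python
def looks_chinese(text: str) -> bool:
    """True if the text contains any character in U+4E00..U+9FFF."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def pick_resolver(text, zh_resolver, en_resolver):
    """Return the resolver that matches the script of `text`."""
    return zh_resolver if looks_chinese(text) else en_resolver


# Strings stand in for resolver instances so the sketch runs on its own.
print(pick_resolver("John went home. He was tired.", "zh", "en"))  # en
print(pick_resolver("小明去北京。他在那里工作。", "zh", "en"))        # zh
```

A script check like this is deliberately crude; text that mixes both languages in one sentence would need a finer-grained split.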

📄 License / 许可证

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing / 贡献

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

📚 Citation / 引用

If you use this project in your research, please cite:

@software{chinese_coref,
  title = {Chinese Coreference Resolution System},
  author = {Junting Hua},
  year = {2026},
  url = {https://github.com/FlowElement-ai/m_flow}
}

📧 Contact / 联系

GitHub Issues: open an issue in the repository (https://github.com/FlowElement-ai/m_flow)
Email: contact@xinliuyuansu.com
