Commit 28da71b

docs: update README
1 parent: 67281d5

File tree: 2 files changed (+27, -14 lines)


README.md

Lines changed: 10 additions & 9 deletions

```diff
@@ -51,6 +51,7 @@ Furthermore, GraphGen incorporates multi-hop neighborhood sampling to capture co
 After data generation, you can use [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) and [xtuner](https://github.com/InternLM/xtuner) to finetune your LLMs.
 
 ## 📌 Latest Updates
+- 🎉 **2026.04.13**: The paper based on GraphGen, *Knowledge-to-Verification: Exploring RLVR for LLMs in Knowledge-Intensive Domains*, has been accepted to the **ACL 2026** Main Conference!
 - **2026.02.04**: HuggingFace Datasets are now supported as an input data source for data generation.
 - **2026.01.15**: **LLM benchmark synthesis** now supports single/multiple-choice, fill-in-the-blank, and true-or-false questions, ideal for education 🌟🌟
 - **2025.12.26**: Knowledge graph evaluation metrics covering accuracy (entity/relation), consistency (conflict detection), and structural robustness (noise, connectivity, degree distribution)
```
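The HuggingFace Datasets input mentioned in the 2026.02.04 update can presumably be wired up along these lines. This is an illustrative sketch only: the field name `text` and the helper `iter_documents` are assumptions, not GraphGen's actual schema or API, and in practice you would obtain `corpus` via `datasets.load_dataset(...)` rather than the inline stand-in used here to keep the example offline.

```python
# Sketch: feeding a HuggingFace-style corpus into a data-generation pipeline.
# In practice `corpus` would come from datasets.load_dataset("your/dataset");
# we fake the same record shape with plain dicts so the example runs offline.
# The "text" field name is illustrative, not GraphGen's actual schema.
corpus = [
    {"text": "Rice (Oryza sativa) is a staple crop in Asia."},
    {"text": "Photosynthesis converts light energy into chemical energy."},
]

def iter_documents(rows):
    """Yield raw text documents, skipping empty or malformed records."""
    for row in rows:
        text = row.get("text", "").strip()
        if text:
            yield text

docs = list(iter_documents(corpus))
print(len(docs))  # 2
```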
```diff
@@ -79,7 +80,7 @@ Inspired by Kimi-K2's [technical report](https://arxiv.org/pdf/2507.20534) (Impr
 
 **Setup:** Qwen3-0.6B trained from scratch on [SlimPajama-6B](https://huggingface.co/datasets/DKYoon/SlimPajama-6B).
 
-| Method | ARC-E | ARC-C | HellaSwag | GSM8K | TruthfulQA-MC1 | TruthfulQA-MC2 | **Average** |
+| Method | [ARC-E](https://allenai.org/data/arc) | [ARC-C](https://allenai.org/data/arc) | [HellaSwag](https://rowanzellers.com/hellaswag/) | [GSM8K](https://github.com/openai/grade-school-math) | [TruthfulQA-MC1](https://github.com/sylinrl/TruthfulQA) | [TruthfulQA-MC2](https://github.com/sylinrl/TruthfulQA) | **Average** |
 |:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
 | SlimPajama-6B trained for 2 epochs | 25.55 | 21.08 | 24.48 | 0.08 | 24.36 | 49.90 | 24.24 |
 | SlimPajama-6B + Executive-Summary Rephrase trained for 1 epoch | 26.43 | **22.70** | **24.75** | **1.36** | **26.19** | 51.90 | **25.56** (↑1.32) |
```
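As a quick sanity check, the row averages in the table above can be recomputed from the six per-benchmark scores:

```python
# Recompute the averages reported in the pretraining benchmark table above.
baseline = [25.55, 21.08, 24.48, 0.08, 24.36, 49.90]  # SlimPajama-6B, 2 epochs
rephrase = [26.43, 22.70, 24.75, 1.36, 26.19, 51.90]  # + Executive-Summary Rephrase, 1 epoch

def mean(xs):
    return sum(xs) / len(xs)

print(round(mean(baseline), 2))  # 24.24
# mean(rephrase) is ~25.555, which the table rounds up to 25.56;
# the reported gap of 1.32 is the difference of the two rounded averages.
print(mean(rephrase))
```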
```diff
@@ -94,19 +95,19 @@ Here are the post-training results, where **over 50% of the SFT data** comes from GraphGen an
 | Domain | Dataset | Ours | Qwen2.5-7B-Instruct (baseline) |
 |:---------:|:---------------------------------------------------------:|:--------:|:------------------------------:|
 | Plant | [SeedBench](https://github.com/open-sciencelab/SeedBench) | **65.9** | 51.5 |
-| Common | CMMLU | 73.6 | **75.8** |
-| Knowledge | GPQA-Diamond | **40.0** | 33.3 |
-| Math | AIME24 | **20.6** | 16.7 |
-| | AIME25 | **22.7** | 7.2 |
+| Common | [CMMLU](https://github.com/haonan-li/CMMLU) | 73.6 | **75.8** |
+| Knowledge | [GPQA-Diamond](https://github.com/idavidrein/gpqa) | **40.0** | 33.3 |
+| Math | [AIME24](https://artofproblemsolving.com/wiki/index.php/2024_AIME_I) | **20.6** | 16.7 |
+| | [AIME25](https://artofproblemsolving.com/wiki/index.php/2025_AIME_I) | **22.7** | 7.2 |
 
 ### RLVR
-We applied reinforcement learning directly to the Qwen2.5-7B base model without any prior SFT. Here are the results.
+Reinforcement Learning with Verifiable Rewards (RLVR) has shown promising potential for enhancing the reasoning capabilities of large language models (LLMs), but its application to knowledge-intensive domains remains underexplored because high-quality verifiable data is scarce. By leveraging **GraphGen** for automated verifiable data synthesis, we extend RLVR to these broader domains. We applied reinforcement learning directly to the Qwen2.5-7B base model on the synthesized data, without any prior SFT. Here are the results.
 | Domain | Dataset | Ours | Qwen2.5-7B-Instruct (baseline) |
 |:---------:|:---------------------------------------------------------:|:--------:|:------------------------------:|
 | Plant | [SeedBench](https://github.com/open-sciencelab/SeedBench) | **66.8** | 51.5 |
-| Law | LawBench | **55.2** | 54.76 |
-| Medicine | MedQA | **87.1** | 80.7 |
-| General | BBH | **55.3** | 49.6 |
+| Law | [LawBench](https://github.com/open-compass/LawBench) | **55.2** | 54.76 |
+| Medicine | [MedQA](https://github.com/jind11/MedQA) | **87.1** | 80.7 |
+| General | [BBH](https://github.com/suzgunmirac/BIG-Bench-Hard) | **55.3** | 49.6 |
 
 More details can be found at [`examples/generate/generate_masked_fill_in_blank_qa`](./examples/generate/generate_masked_fill_in_blank_qa).
```
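RLVR hinges on the answers to the synthesized questions being mechanically checkable (exact-match fill-in-the-blank, choice letters, true/false). A minimal sketch of such a verifiable reward is shown below; it is illustrative only, not GraphGen's actual reward code, and the `Answer:` marker format is an assumed prompt convention.

```python
# Minimal sketch of a verifiable reward for RLVR on synthesized QA data.
# Illustrative only, not GraphGen's implementation: a rollout earns reward
# 1.0 iff its final answer matches the gold answer after light normalization.
import re

def extract_answer(completion: str) -> str:
    """Pull the text after a final 'Answer:' marker (assumed prompt format);
    fall back to the whole completion when no marker is present."""
    match = re.search(r"answer\s*:\s*(.+)", completion, re.IGNORECASE | re.DOTALL)
    return (match.group(1) if match else completion).strip()

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace/punctuation for exact matching."""
    return re.sub(r"[\s\.\,]+", " ", text.lower()).strip()

def verifiable_reward(completion: str, gold: str) -> float:
    return 1.0 if normalize(extract_answer(completion)) == normalize(gold) else 0.0

print(verifiable_reward("The staple crop is rice. Answer: rice", "Rice"))  # 1.0
print(verifiable_reward("Answer: wheat", "Rice"))                           # 0.0
```

Binary exact-match rewards like this are what make the knowledge-intensive domains in the tables above trainable with RLVR at all: no learned reward model is needed, only the gold answer attached to each synthesized question.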

README_zh.md

Lines changed: 17 additions & 5 deletions

```diff
@@ -53,6 +53,7 @@ GraphGen first builds a fine-grained knowledge graph from the source text, then
 After data generation, you can use [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) and [xtuner](https://github.com/InternLM/xtuner) to finetune your LLMs.
 
 ## 📌 Latest Updates
+- 🎉 **2026.04.13**: The paper based on GraphGen, *Knowledge-to-Verification: Exploring RLVR for LLMs in Knowledge-Intensive Domains*, has been accepted to the **ACL 2026** Main Conference!
 - **2026.02.04**: HuggingFace datasets can now be read directly as input for data generation
 - **2026.01.15**: Synthesize domain-specific benchmark data (single-choice, multiple-choice, fill-in-the-blank, and true-or-false questions) 🌟🌟
 - **2025.12.26**: Introduced knowledge graph evaluation metrics, covering accuracy (entity/relation extraction quality), consistency (conflict detection), and structural robustness (noise ratio, connectivity, degree distribution)
@@ -82,7 +83,7 @@ GraphGen first builds a fine-grained knowledge graph from the source text, then
 
 **Setup:** Qwen3-0.6B trained from scratch on [SlimPajama-6B](https://huggingface.co/datasets/DKYoon/SlimPajama-6B).
 
-| Method | ARC-E | ARC-C | HellaSwag | GSM8K | TruthfulQA-MC1 | TruthfulQA-MC2 | **Average** |
+| Method | [ARC-E](https://allenai.org/data/arc) | [ARC-C](https://allenai.org/data/arc) | [HellaSwag](https://rowanzellers.com/hellaswag/) | [GSM8K](https://github.com/openai/grade-school-math) | [TruthfulQA-MC1](https://github.com/sylinrl/TruthfulQA) | [TruthfulQA-MC2](https://github.com/sylinrl/TruthfulQA) | **Average** |
 |:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
 | SlimPajama-6B trained for 2 epochs | 25.55 | 21.08 | 24.48 | 0.08 | 24.36 | 49.90 | 24.24 |
 | SlimPajama-6B + Executive-Summary Rephrase trained for 1 epoch | 26.43 | **22.70** | **24.75** | **1.36** | **26.19** | 51.90 | **25.56** (↑1.32) |
@@ -97,10 +98,21 @@ GraphGen first builds a fine-grained knowledge graph from the source text, then
 | Domain | Dataset | Ours | Qwen2.5-7B-Instruct (baseline) |
 |:--:|:---------------------------------------------------------:|:--------:|:-----------------------:|
 | Plant | [SeedBench](https://github.com/open-sciencelab/SeedBench) | **65.9** | 51.5 |
-| Common | CMMLU | 73.6 | **75.8** |
-| Knowledge | GPQA-Diamond | **40.0** | 33.3 |
-| Math | AIME24 | **20.6** | 16.7 |
-| | AIME25 | **22.7** | 7.2 |
+| Common | [CMMLU](https://github.com/haonan-li/CMMLU) | 73.6 | **75.8** |
+| Knowledge | [GPQA-Diamond](https://github.com/idavidrein/gpqa) | **40.0** | 33.3 |
+| Math | [AIME24](https://artofproblemsolving.com/wiki/index.php/2024_AIME_I) | **20.6** | 16.7 |
+| | [AIME25](https://artofproblemsolving.com/wiki/index.php/2025_AIME_I) | **22.7** | 7.2 |
+
+### RLVR
+Reinforcement Learning with Verifiable Rewards (RLVR) has shown great potential for enhancing the reasoning capabilities of large language models (LLMs). However, due to the lack of high-quality verifiable data, its application in knowledge-intensive domains remains underexplored. By leveraging **GraphGen** for automated verifiable data synthesis, we extend RLVR to these broader domains. We applied reinforcement learning directly to the Qwen2.5-7B base model on the synthesized data, without any prior SFT. The results are as follows:
+| Domain | Dataset | Ours | Qwen2.5-7B-Instruct (baseline) |
+|:--:|:---------------------------------------------------------:|:--------:|:-----------------------:|
+| Plant | [SeedBench](https://github.com/open-sciencelab/SeedBench) | **66.8** | 51.5 |
+| Law | [LawBench](https://github.com/open-compass/LawBench) | **55.2** | 54.76 |
+| Medicine | [MedQA](https://github.com/jind11/MedQA) | **87.1** | 80.7 |
+| General | [BBH](https://github.com/suzgunmirac/BIG-Bench-Hard) | **55.3** | 49.6 |
+
+More details can be found at [`examples/generate/generate_masked_fill_in_blank_qa`](./examples/generate/generate_masked_fill_in_blank_qa).
 
 ## ⚙️ Support List
 
```