
Commit 83f55b2

docs: update README
1 parent 1c70362 commit 83f55b2

2 files changed: +2 additions, −2 deletions


README.md

Lines changed: 1 addition & 1 deletion

@@ -101,7 +101,7 @@ Here are post-training results where **over 50% of the SFT data** comes from GraphGen and
 | | [AIME25](https://artofproblemsolving.com/wiki/index.php/2025_AIME_I) | **22.7** | 7.2 |

 ### RLVR
-Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated promising potential to enhance the reasoning capabilities of large language models (LLMs). However, its application to knowledge-intensive domains has not been effectively explored due to the scarcity of high-quality verifiable data. By leveraging **GraphGen** for automated verifiable data synthesis, we extend RLVR to these broader domains. We applied reinforcement learning directly to the Qwen2.5-7B base model using the synthesized data, without any prior SFT. Here are the results.
+Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated promising potential to enhance the reasoning capabilities of large language models (LLMs). However, its application to knowledge-intensive domains has not been effectively explored due to the scarcity of high-quality verifiable data. By leveraging **GraphGen** for automated verifiable data synthesis, we extend RLVR to these broader domains. We applied reinforcement learning directly to the Qwen2.5-7B base model using the synthesized data, without any prior SFT ([Code](https://github.com/superfarther/K2V)). Here are the results.
 | Domain | Dataset | Ours | Qwen2.5-7B-Instruct (baseline) |
 |:---------:|:---------------------------------------------------------:|:--------:|:------------------------------:|
 | Plant | [SeedBench](https://github.com/open-sciencelab/SeedBench) | **66.8** | 51.5 |

README_zh.md

Lines changed: 1 addition & 1 deletion

(Translated from Chinese.)

@@ -104,7 +104,7 @@ GraphGen first constructs a fine-grained knowledge graph from the source text, then leverages
 | | [AIME25](https://artofproblemsolving.com/wiki/index.php/2025_AIME_I) | **22.7** | 7.2 |

 ### RLVR
-Reinforcement Learning with Verifiable Rewards (RLVR) has shown great potential for improving the reasoning capabilities of large language models (LLMs). However, due to the lack of high-quality verifiable data, its application to knowledge-intensive domains has not been effectively explored. By leveraging **GraphGen** for automated verifiable data synthesis, we extend RLVR to these broader domains. Using the synthesized data, we apply reinforcement learning directly to the Qwen2.5-7B base model, without any prior SFT. The results are as follows:
+Reinforcement Learning with Verifiable Rewards (RLVR) has shown great potential for improving the reasoning capabilities of large language models (LLMs). However, due to the lack of high-quality verifiable data, its application to knowledge-intensive domains has not been effectively explored. By leveraging **GraphGen** for automated verifiable data synthesis, we extend RLVR to these broader domains. Using the synthesized data, we apply reinforcement learning directly to the Qwen2.5-7B base model, without any prior SFT ([Code](https://github.com/superfarther/K2V)). The results are as follows:
 | Domain | Dataset | Ours | Qwen2.5-7B-Instruct (baseline) |
 |:--:|:---------------------------------------------------------:|:--------:|:-----------------------:|
 | Plant | [SeedBench](https://github.com/open-sciencelab/SeedBench) | **66.8** | 51.5 |
