
Commit 83f55b2

docs: update README
1 parent 1c70362 commit 83f55b2

2 files changed: +2 additions, −2 deletions


README.md

Lines changed: 1 addition & 1 deletion

@@ -101,7 +101,7 @@ Here are post-training results where **over 50% of the SFT data** comes from GraphGen and
 | | [AIME25](https://artofproblemsolving.com/wiki/index.php/2025_AIME_I) | **22.7** | 7.2 |

 ### RLVR
-Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated promising potential to enhance the reasoning capabilities of large language models (LLMs). However, its application to knowledge-intensive domains has not been effectively explored due to the scarcity of high-quality verifiable data. By leveraging **GraphGen** for automated verifiable data synthesis, we extend RLVR to these broader domains. We applied reinforcement learning directly to the Qwen2.5-7B base model using the synthesized data, without any prior SFT. Here are the results.
+Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated promising potential to enhance the reasoning capabilities of large language models (LLMs). However, its application to knowledge-intensive domains has not been effectively explored due to the scarcity of high-quality verifiable data. By leveraging **GraphGen** for automated verifiable data synthesis, we extend RLVR to these broader domains. We applied reinforcement learning directly to the Qwen2.5-7B base model using the synthesized data, without any prior SFT ([Code](https://github.com/superfarther/K2V)). Here are the results.
 | Domain | Dataset | Ours | Qwen2.5-7B-Instruct (baseline) |
 |:---------:|:---------------------------------------------------------:|:--------:|:------------------------------:|
 | Plant | [SeedBench](https://github.com/open-sciencelab/SeedBench) | **66.8** | 51.5 |

README_zh.md

Lines changed: 1 addition & 1 deletion

(Translated from Chinese.)

@@ -104,7 +104,7 @@ GraphGen first constructs a fine-grained knowledge graph from the source text, then leverages
 | | [AIME25](https://artofproblemsolving.com/wiki/index.php/2025_AIME_I) | **22.7** | 7.2 |

 ### RLVR
-Reinforcement Learning with Verifiable Rewards (RLVR) has shown great potential for improving the reasoning capabilities of large language models (LLMs). However, due to the lack of high-quality verifiable data, its application to knowledge-intensive domains has not been effectively explored. By leveraging **GraphGen** for automated verifiable data synthesis, we extend RLVR to these broader domains. Using the synthesized data, we apply reinforcement learning directly to the Qwen2.5-7B base model, without any prior SFT. The results are as follows:
+Reinforcement Learning with Verifiable Rewards (RLVR) has shown great potential for improving the reasoning capabilities of large language models (LLMs). However, due to the lack of high-quality verifiable data, its application to knowledge-intensive domains has not been effectively explored. By leveraging **GraphGen** for automated verifiable data synthesis, we extend RLVR to these broader domains. Using the synthesized data, we apply reinforcement learning directly to the Qwen2.5-7B base model, without any prior SFT ([Code](https://github.com/superfarther/K2V)). The results are as follows:
 | Domain | Dataset | Ours | Qwen2.5-7B-Instruct (baseline) |
 |:--:|:---------------------------------------------------------:|:--------:|:-----------------------:|
 | Plant | [SeedBench](https://github.com/open-sciencelab/SeedBench) | **66.8** | 51.5 |
