TY - GEN
T1 - CP-Search
T2 - 40th AAAI Conference on Artificial Intelligence, AAAI 2026
AU - Wang, Zehua
AU - Li, Shipeng
AU - Tang, Buzhou
N1 - Publisher Copyright:
© 2026, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2026
Y1 - 2026
N2 - Retrieval-Augmented Generation (RAG) has been demonstrated to effectively mitigate the knowledge recency issue in Large Language Models (LLMs) while significantly reducing hallucinations. However, existing RAG methods exhibit insufficient capability in modeling reasoning paths for complex multi-hop reasoning tasks. While Reinforcement Learning (RL) has demonstrated success in enhancing model reasoning ability, Token-level RL frameworks exhibit inherent limitations in maintaining coherent reasoning trajectories. This approach remains susceptible to the compounding accumulation of contextual errors during the retrieval process, ultimately resulting in erroneous output generation. To address this challenge, we propose Chain Progressive Search (CP-Search), a novel two-stage training framework designed to enhance the model’s retrieval capability in complex scenarios. This framework models the entire retrieval process as a Retrieval-level Markov Decision Process, systematically optimizing the model’s retrieval behavior at each step of the chained retrieval. Specifically, CP-Search first constructs a retrieval-cognitive behavioral dataset and employs Supervised Fine-Tuning (SFT) to endow the model with cognitive behaviors for searching. More importantly, by introducing a dense progressive procedural reward in reinforcement learning training, CP-Search significantly improves the model’s reasoning consistency and feedback correction ability in chained retrieval. Experiments conducted on multiple multi-hop datasets demonstrate that CP-Search significantly outperforms existing RAG methods in complex multi-hop reasoning tasks.
AB - Retrieval-Augmented Generation (RAG) has been demonstrated to effectively mitigate the knowledge recency issue in Large Language Models (LLMs) while significantly reducing hallucinations. However, existing RAG methods exhibit insufficient capability in modeling reasoning paths for complex multi-hop reasoning tasks. While Reinforcement Learning (RL) has demonstrated success in enhancing model reasoning ability, Token-level RL frameworks exhibit inherent limitations in maintaining coherent reasoning trajectories. This approach remains susceptible to the compounding accumulation of contextual errors during the retrieval process, ultimately resulting in erroneous output generation. To address this challenge, we propose Chain Progressive Search (CP-Search), a novel two-stage training framework designed to enhance the model’s retrieval capability in complex scenarios. This framework models the entire retrieval process as a Retrieval-level Markov Decision Process, systematically optimizing the model’s retrieval behavior at each step of the chained retrieval. Specifically, CP-Search first constructs a retrieval-cognitive behavioral dataset and employs Supervised Fine-Tuning (SFT) to endow the model with cognitive behaviors for searching. More importantly, by introducing a dense progressive procedural reward in reinforcement learning training, CP-Search significantly improves the model’s reasoning consistency and feedback correction ability in chained retrieval. Experiments conducted on multiple multi-hop datasets demonstrate that CP-Search significantly outperforms existing RAG methods in complex multi-hop reasoning tasks.
UR - https://www.scopus.com/pages/publications/105034848846
U2 - 10.1609/aaai.v40i40.40666
DO - 10.1609/aaai.v40i40.40666
M3 - 会议稿件
AN - SCOPUS:105034848846
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
T3 - Proceedings of the AAAI Conference on Artificial Intelligence
SP - 33755
EP - 33763
BT - Proceedings of the AAAI Conference on Artificial Intelligence
A2 - Koenig, Sven
A2 - Jenkins, Chad
A2 - Taylor, Matthew E.
PB - Association for the Advancement of Artificial Intelligence
Y2 - 20 January 2026 through 27 January 2026
ER -