A Decomposition Perspective to Long-context Reasoning for LLMs

Paper · arXiv 2604.07981 · Published April 9, 2026

Long-context reasoning is essential for complex real-world applications, yet remains a significant challenge for Large Language Models (LLMs). Despite rapid progress in long-context reasoning, current research often overlooks the internal complexity of the long-context reasoning task itself. In this paper, we move beyond this holistic view: we decompose long-context reasoning into a set of fundamental atomic skills and then automatically synthesize a suite of pseudo datasets, each explicitly targeting a specific atomic skill. Our empirical analysis confirms that proficiency in these atomic skills is strongly correlated with general long-context reasoning performance. Building on this insight, we employ reinforcement learning on these pseudo datasets to sharpen the model’s atomic skills, with the aim of boosting its general long-context reasoning ability. Extensive experiments across multiple benchmarks demonstrate the effectiveness of our approach: it outperforms a strong baseline by an average margin of 7.7% (improving from 46.3% to 54.0%) across Loogle, Loong, LongBench-v2, BrowseComp-Long, Ruler-qa2, and MRCR.

In this paper, we propose a paradigm shift from a monolithic view of long-context reasoning to a decomposition perspective. We argue that, from a cognitive standpoint, long-context reasoning is a hierarchical spectrum composed of fundamental atomic skills. For instance, as illustrated in Figure 1, the task of calculating Sanofi’s revenue share growth cannot be solved by simple retrieval. Instead, it necessitates Global Integration to synthesize distributed financial data across different reporting periods (e.g., aggregating figures from H1 2024 and H1 2023), followed by Dynamic State Tracking to execute multi-step reasoning, holding these intermediate values in memory to compute the final percentage increase. We decompose long-context reasoning into five atomic skills: Foundational Retrieval, Anti-Interference, Global Integration, Relational Reasoning, and Dynamic State Tracking (§2.1). Unlike the complex long-context reasoning task, each atomic task is comparatively straightforward; thus, we can relatively easily curate training data for each atomic skill through an anchor-based automatic pipeline with verification (§2.2), as sketched below. Our empirical experiments further demonstrate that these atomic skills are strongly correlated with long-context reasoning ability (§3).
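To make the anchor-based synthesis concrete, the following is a minimal sketch of how a Global Integration pseudo sample could be constructed: two numeric anchor facts are planted far apart in a long filler context, so the gold answer is known by construction and can be verified automatically. The class and function names (e.g., `AtomicSkill`, `synthesize_global_integration`) are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass
from enum import Enum
import random

class AtomicSkill(Enum):
    FOUNDATIONAL_RETRIEVAL = "foundational_retrieval"
    ANTI_INTERFERENCE = "anti_interference"
    GLOBAL_INTEGRATION = "global_integration"
    RELATIONAL_REASONING = "relational_reasoning"
    DYNAMIC_STATE_TRACKING = "dynamic_state_tracking"

@dataclass
class PseudoSample:
    skill: AtomicSkill
    context: str   # long document with planted anchor facts
    question: str
    answer: str    # known from construction, hence automatically verifiable

def synthesize_global_integration(filler_paragraphs: list[str]) -> PseudoSample:
    """Plant two numeric anchors far apart and ask for their aggregate."""
    a, b = random.randint(100, 999), random.randint(100, 999)
    anchor_1 = f"Segment revenue in H1 2023 was {a} million EUR."
    anchor_2 = f"Segment revenue in H1 2024 was {b} million EUR."
    docs = filler_paragraphs[:]
    # Scatter the anchors across the context so answering requires global integration.
    docs.insert(len(docs) // 4, anchor_1)
    docs.insert(3 * len(docs) // 4, anchor_2)
    question = ("By how many million EUR did segment revenue grow "
                "from H1 2023 to H1 2024?")
    return PseudoSample(
        skill=AtomicSkill.GLOBAL_INTEGRATION,
        context="\n\n".join(docs),
        question=question,
        answer=str(b - a),
    )
```

Because every sample carries its gold answer, the verification step reduces to checking a model's response against a label that the generator itself produced.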

This finding indicates that enhancing these atomic skills can ultimately boost an LLM's long-context reasoning performance. Based on the curated datasets for these atomic skills, we then present a highly efficient training strategy: we employ Reinforcement Learning (RL) (Shao et al., 2024; Yu et al., 2025) on the curated set of approximately 4,000 synthetic samples generated through our pipeline. This targeted approach sharpens the model’s atomic capabilities, enabling it to generalize to complex, unseen long-context reasoning tasks. Experimental results on six challenging benchmarks, including Loogle (Li et al., 2024b), Loong (Wang et al., 2024), and LongBench-v2 (Bai et al., 2025), show that our approach significantly improves performance, outperforming strong baselines such as DeepSeek-R1-Distill-32B (DeepSeek, 2025a) by an average margin of 7.7% (improving from 46.3% to 54.0%) (§4).
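As a concrete illustration of the training signal, the snippet below sketches a verifiable exact-match reward over the synthesized gold answers and a GRPO-style group-relative advantage (in the spirit of Shao et al., 2024). It assumes the model is prompted to wrap its final answer in \boxed{...}; the function names and answer format are assumptions for illustration, not the paper's exact setup.

```python
import re

def exact_match_reward(model_answer: str, gold_answer: str) -> float:
    """Binary verifiable reward: 1.0 iff the extracted answer matches the gold label."""
    # Assumes the model is prompted to wrap its final answer in \boxed{...}.
    match = re.search(r"\\boxed\{(.+?)\}", model_answer)
    prediction = match.group(1).strip() if match else model_answer.strip()
    return 1.0 if prediction == gold_answer.strip() else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize each rollout's reward against the
    mean/std of its group (all rollouts sampled for the same prompt)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # avoid division by zero when all rewards are equal
    return [(r - mean) / std for r in rewards]

# Example: four rollouts for one pseudo-sample prompt with gold answer "123".
rollouts = ["\\boxed{123}", "\\boxed{124}", "123", "no idea"]
rewards = [exact_match_reward(ans, "123") for ans in rollouts]
print(group_relative_advantages(rewards))  # correct rollouts get positive advantage
```

The design choice here is that the reward requires no learned judge: because every pseudo sample's answer is generated programmatically, a simple string match suffices to verify rollouts at scale.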