Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation

Paper · arXiv 2505.13792 · Published May 20, 2025

Since smaller language models (SLMs) are computationally more efficient but often under-perform compared to larger models, Knowledge Distillation (KD) methods allow for finetuning these smaller models to improve their final performance. Lately, the intermediate tokens, or so-called 'reasoning' traces, produced by Chain-of-Thought (CoT) prompting or by reasoning models such as DeepSeek R1 have been used as a training signal for KD. However, these reasoning traces are often verbose and difficult to interpret or evaluate. In this work, we aim to address the challenge of evaluating the faithfulness of these reasoning traces and their correlation with the final performance. To this end, we employ a KD method leveraging rule-based problem decomposition. This approach allows us to break down complex queries into structured sub-problems, generating interpretable traces whose correctness can be readily evaluated, even at inference time. Specifically, we demonstrate this approach on Open Book QA, decomposing the problem into a Classification step and an Information Retrieval step, thereby simplifying trace evaluation.

We would ideally want a setting where we can evaluate not only the final solution but also these intermediate traces, which is not possible with CoT- or R1-generated traces. Such a setting would allow us to generate traces with controlled content and structure for distillation, facilitating evaluation of trace accuracy and its correlation with the final solution. To this end, we employ a knowledge distillation method for finetuning smaller language models. Inspired by existing approaches [18, 19], we adopt a problem decomposition technique to first break down the queried problem into sub-problems. We can then obtain solutions for these sub-problems that can be individually evaluated and together utilized as the reasoning trace for knowledge distillation. At inference, this allows us to verify the correctness of both the final solution and the intermediate traces generated by the distilled model.
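The per-step verification described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Task:`/`Facts:`/`Answer:` trace format and the function names are assumptions chosen to show how a structured trace makes each sub-problem individually checkable at inference time.

```python
# Hypothetical sketch: verifying the structured trace emitted by a distilled
# model. The trace format ("Task: ...\nFacts: ...\nAnswer: ...") is an
# illustrative assumption.

def parse_trace(output: str) -> dict:
    """Split a labelled trace into its sub-problem fields."""
    fields = {}
    for line in output.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip().lower()] = value.strip()
    return fields

def verify_trace(output: str, gold_task: str, gold_facts: set[str]) -> dict:
    """Score each intermediate step independently of the final answer."""
    trace = parse_trace(output)
    pred_facts = {f.strip() for f in trace.get("facts", "").split(";") if f.strip()}
    return {
        "task_correct": trace.get("task") == gold_task,          # Classification step
        "facts_recall": len(pred_facts & gold_facts) / max(len(gold_facts), 1),  # IR step
        "answer": trace.get("answer"),                            # final solution
    }

# Example: a trace whose classification and retrieval steps both check out
out = "Task: factoid\nFacts: water boils at 100C\nAnswer: 100C"
scores = verify_trace(out, "factoid", {"water boils at 100C"})
```

Because each field is scored separately, trace accuracy and final-answer accuracy can be correlated across a test set, which is exactly the analysis CoT-style free-form traces do not permit.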

Specifically, we look at the Open Book Question-Answering domain (QA with access to external knowledge), where we decompose the problem into 1) identifying the type of task posed by the question (Classification step), and 2) identifying the relevant set of knowledge (or facts) needed to answer the query (Information Retrieval (IR) step). The Classification and IR sub-tasks mirror the explicit reasoning stages humans use in QA (e.g., "Is this a factoid question?" → "What facts are relevant?"), making traces inherently auditable. By constraining traces to predefined sub-tasks (e.g., classification + IR in our case), we ensure each step aligns with a verifiable reasoning sub-problem, thereby simplifying evaluation. To critically understand the correlation between the performance of these intermediate traces and the final solution, we design two Supervised Fine-Tuning (SFT) experiments on the Llama-3.2-1B-Instruct and Qwen3-1.7B chat models. We first finetune these models using data that consists of both correct intermediate traces and correct solutions.
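To make the two-step decomposition concrete, the following sketch assembles an SFT training target from rule-based Classification and IR steps. The toy rules, stopword list, and the `Task:`/`Facts:`/`Answer:` target format are illustrative assumptions, not the paper's actual decomposition rules.

```python
# Illustrative sketch: building a distillation training example from the
# two rule-based sub-problems (Classification + IR). All rules below are
# toy stand-ins for the paper's rule-based decomposition.

STOPWORDS = {"is", "a", "an", "the", "of", "what", "who", "when", "where", "at"}

def classify_question(question: str) -> str:
    """Classification step: assign a task type to the question (toy rule)."""
    q = question.lower()
    if q.startswith(("who", "what", "when", "where")):
        return "factoid"
    return "other"

def retrieve_facts(question: str, knowledge: list[str]) -> list[str]:
    """IR step: keep facts sharing a content word with the question (toy rule)."""
    q_words = set(question.lower().replace("?", "").split()) - STOPWORDS
    return [f for f in knowledge if q_words & set(f.lower().split())]

def build_sft_target(question: str, knowledge: list[str], answer: str) -> str:
    """Compose the interpretable trace plus final answer as the SFT target."""
    task = classify_question(question)
    facts = retrieve_facts(question, knowledge)
    return f"Task: {task}\nFacts: {'; '.join(facts)}\nAnswer: {answer}"

# Example: the irrelevant fact is filtered out by the IR step
target = build_sft_target(
    "What is the boiling point of water?",
    ["water boils at 100C", "iron is a metal"],
    "100C",
)
```

Because the training targets are generated by deterministic rules, every trace in the SFT data is correct by construction, which is what enables the controlled comparison between trace accuracy and final-answer accuracy.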