Beyond "Not Novel Enough": Enriching Scholarly Critique with LLM-Assisted Feedback

Paper · arXiv 2508.10795 · Published August 14, 2025
Co-Writing Collaboration · Deep Research · Domain Specialization

Novelty assessment is a central yet understudied aspect of peer review, particularly in high-volume fields like NLP where reviewer capacity is increasingly strained. We present a structured approach for automated novelty evaluation that models expert reviewer behavior through three stages: content extraction from submissions, retrieval and synthesis of related work, and structured comparison for evidence-based assessment. Our method is informed by a large-scale analysis of human-written novelty reviews and captures key patterns such as independent claim verification and contextual reasoning. Evaluated on 182 ICLR 2025 submissions with human-annotated reviewer novelty assessments, the approach achieves 86.5% alignment with human reasoning and 75.3% agreement on novelty conclusions, substantially outperforming existing LLM-based baselines. The method produces detailed, literature-aware analyses and improves consistency over ad hoc reviewer judgments. These results highlight the potential for structured LLM-assisted approaches to support more rigorous and transparent peer review without displacing human expertise.
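To make the reported numbers concrete, the sketch below shows one way a label-level agreement rate like the 75.3% figure could be computed against human-annotated conclusions. It is illustrative only: the three-way label set, the exact-match rule, and the `agreement_rate` helper are our assumptions, not details given in the abstract.

```python
# Sketch: label-level agreement between model and human novelty conclusions.
# The label set and exact-match rule are assumptions for illustration.
def agreement_rate(model_labels: list[str], human_labels: list[str]) -> float:
    """Fraction of submissions where model and human reach the same conclusion."""
    assert len(model_labels) == len(human_labels)
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(model_labels)

# Hypothetical labels such as "novel", "incremental", "not novel".
model = ["novel", "incremental", "novel", "not novel"]
human = ["novel", "incremental", "not novel", "not novel"]
print(f"agreement: {agreement_rate(model, human):.1%}")  # agreement: 75.0%
```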

Among peer review tasks, novelty assessment stands out as one of the most problematic (Ernst et al., 2020; Horbach and Halffman, 2018). Novelty assessment requires reviewers to determine whether a submission makes sufficiently original contributions by identifying what specific advances it makes beyond existing work, evaluating whether these advances are significant enough to warrant publication, and verifying that the authors have accurately characterized their contributions relative to prior research. This knowledge-intensive process demands that reviewers maintain comprehensive awareness of related work across their field and precisely distinguish between meaningful innovations and incremental modifications, a task that becomes increasingly difficult as publication rates accelerate and research domains specialize. Overwhelmed reviewers often resort to superficial analyses, producing vague feedback like "not novel enough" without clear justification.

The challenge compounds when reviewers encounter papers outside their specific expertise, leading to either overly conservative rejections or inadequate assessments that fail to catch incremental work (Kuznetsov et al., 2024). Recent advances in large language models present an opportunity to address these novelty assessment challenges at scale. LLMs have demonstrated strong performance across knowledge-intensive tasks (Raiaan et al., 2024), with recent technical advances expanding their capabilities to specialized reasoning and efficient inference (Li et al., 2024a; Zhang et al., 2025).

While recent LLM advances create this opportunity, no existing work specifically addresses novelty assessment as a dedicated task within the peer review process. Prior research incorporates novelty evaluation within idea generation pipelines (Radensky et al., 2025; Lu et al., 2024; Li et al., 2024b), generates full peer reviews in which novelty assessments appear only because they occur in the peer reviews used as training data (Idahl and Ahmadi, 2025; D'Arcy et al., 2024), or adds novelty assessment steps to review synthesis pipelines for improvement (Zhu et al., 2025). However, these approaches either operate on synthetic ideas rather than real research contributions or fail to evaluate novelty assessment capabilities in isolation. This represents a critical gap requiring specialized methodologies for peer review novelty assessment.

To address this gap, we propose an end-to-end novelty assessment pipeline for peer review submissions. Our approach consists of three stages: document processing and content extraction, related work retrieval and ranking, and structured novelty assessment. The final stage implements four sequential steps: selecting novelty-related content from the submission PDF, building a comprehensive understanding of related work from the retrieved papers, comparing the claimed novelty against that analysis, and generating a summary with cited evidence from the comparison. This pipeline operates on real research papers and directly evaluates novelty assessment capabilities, addressing the limitations of existing approaches. Importantly, we conduct the first evaluation of LLMs for novelty assessment using actual human data, including annotated novelty assessment statements, and provide comprehensive evaluation across multiple dimensions.
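The skeleton below illustrates how the three stages and four sequential steps could fit together as a plain orchestration loop. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`extract_text`, `assess_novelty`), the `llm` and `retrieve` callables, and the prompt wording are all hypothetical placeholders.

```python
# Minimal orchestration sketch of the pipeline described above. Every name
# here (extract_text, retrieve, llm, the prompts) is a hypothetical
# placeholder, not the authors' implementation.
from dataclasses import dataclass


def extract_text(pdf_path: str) -> str:
    # Stage 1 placeholder: a real system would parse the submission PDF here,
    # e.g. with a scientific-document parser.
    raise NotImplementedError


@dataclass
class NoveltyReport:
    claims: str          # novelty-related content selected from the submission
    comparison: str      # evidence-based comparison against prior work
    summary: str         # final assessment with cited evidence


def assess_novelty(pdf_path: str, llm, retrieve, top_k: int = 10) -> NoveltyReport:
    # Stage 1: document processing and content extraction.
    text = extract_text(pdf_path)

    # Stage 2: related work retrieval and ranking.
    related = retrieve(text, top_k=top_k)

    # Stage 3, step 1: select novelty-related content from the submission.
    claims = llm(f"List the claimed novel contributions in:\n{text}")

    # Stage 3, step 2: build an understanding of the retrieved related work.
    summaries = [llm(f"Summarize the contributions of:\n{paper}") for paper in related]

    # Stage 3, step 3: compare claimed novelty against that analysis.
    comparison = llm(
        "Compare these claims against prior work and flag overlaps.\n"
        f"Claims:\n{claims}\nPrior work:\n" + "\n".join(summaries)
    )

    # Stage 3, step 4: generate a summary with cited evidence.
    summary = llm(f"Write an evidence-backed novelty assessment, citing sources:\n{comparison}")

    return NoveltyReport(claims=claims, comparison=comparison, summary=summary)
```

In this reading, the `llm` and `retrieve` callables isolate the model and the retrieval backend from the control flow, so each of the four steps remains an auditable, single-purpose call, which mirrors the staged, evidence-citing structure the paper describes.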