Reasoning and Knowledge

Can schema-free graphs objectively evaluate open-ended search?

Can a directed graph with no preset structure capture the complexity of real search outputs while still enabling objective, fine-grained evaluation? This matters because existing evaluation methods buy objectivity at the cost of rigidity, or richness at the cost of subjectivity.

Note · 2026-05-28 · sourced from Deep Research

Open-ended search evaluation faces a dilemma. Fixed-schema scoring (against items, sets, or tables) is objective and stable but cannot represent the complex, irregular knowledge structures that real search produces. Free-text evaluation captures that richness but requires rubric design that is subjective and unstable. VibeSearchBench's resolution is a schema-free ground-truth knowledge graph: a directed graph carries no preset structure, so it can model arbitrary relationships relevant to the search intent, yet because it is a graph it still supports fine-grained, objectively verifiable matching. Each task pairs a user persona with such a graph and is scored through a graph-matching framework, which avoids both horns of the dilemma.
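As a minimal sketch of what objectively verifiable matching over a graph can look like, assume each graph is a set of directed (source, relation, target) triples and that two edges match when they are equal after case and whitespace normalization. This is an illustrative simplification, not VibeSearchBench's actual graph-matching framework; the `Edge` type, `normalize`, `edge_f1`, and the travel-planning example data are all hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical edge representation: (source, relation, target).
Edge = Tuple[str, str, str]


def normalize(edge: Edge) -> Edge:
    """Lowercase each part and collapse whitespace so trivial variants match."""
    source, relation, target = (" ".join(part.lower().split()) for part in edge)
    return (source, relation, target)


def edge_f1(predicted: Iterable[Edge], gold: Iterable[Edge]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 over normalized directed edges."""
    pred = {normalize(e) for e in predicted}
    ref = {normalize(e) for e in gold}
    matched = len(pred & ref)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up ground truth: a small nested graph for a trip-planning intent.
gold = [
    ("trip", "has_stop", "Kyoto"),
    ("Kyoto", "has_hotel", "Hotel A"),
    ("Hotel A", "near", "Station B"),
    ("trip", "has_budget", "2000 USD"),
]

# Made-up model output: two correct edges, one wrong, two missing.
predicted = [
    ("Trip", "has_stop", "kyoto"),
    ("trip", "has_budget", "2000 USD"),
    ("trip", "has_stop", "Osaka"),
]

scores = edge_f1(predicted, gold)
print({name: round(value, 3) for name, value in scores.items()})
# → {'precision': 0.667, 'recall': 0.5, 'f1': 0.571}
```

Because no schema is imposed, the gold edges can use whatever relations the search intent calls for, while the score stays a deterministic function of the two edge sets. A real framework would likely need softer matching for paraphrased nodes and relations, which is where the additional scoring choices come in.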

The pattern generalizes beyond search: whenever the target output is structured but its structure cannot be fixed in advance, a graph ground truth plus graph-matching evaluation offers objectivity without rigidity. The cost is that constructing high-quality ground-truth graphs is labor-intensive (VibeSearchBench's 200 tasks were manually curated), and graph matching introduces its own scoring choices. Even under this method the best model reaches only 30.30 F1, partly because models produce structurally flat graphs; the evaluation is demanding precisely because it is faithful. The result is a reusable template for evaluating any open-ended generation task whose correct answer is a web of relations rather than a list.
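To make "structurally flat" concrete, one possible reading is that a model attaches every entity directly to the root instead of nesting relations. The sketch below measures that as the longest path from a root node. The source does not define flatness this way; `max_depth` and the example graphs are hypothetical illustrations.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple


def max_depth(edges: Iterable[Tuple[str, str]], root: str) -> int:
    """Length in edges of the longest simple path starting at root."""
    children: Dict[str, List[str]] = defaultdict(list)
    for source, target in edges:
        children[source].append(target)

    def depth(node: str, seen: FrozenSet[str]) -> int:
        best = 0
        for child in children.get(node, []):
            if child in seen:
                continue  # skip nodes already on this path to avoid cycles
            best = max(best, 1 + depth(child, seen | {child}))
        return best

    return depth(root, frozenset({root}))


# The same three entities, arranged as a chain versus hung off the root.
nested = [("trip", "Kyoto"), ("Kyoto", "Hotel A"), ("Hotel A", "Station B")]
flat = [("trip", "Kyoto"), ("trip", "Hotel A"), ("trip", "Station B")]

print(max_depth(nested, "trip"))  # → 3
print(max_depth(flat, "trip"))    # → 1
```

Under edge-level matching, the flat graph names all the right entities yet recovers only one of the three nested edges, which is one way a faithful graph evaluation can penalize shallow structure.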


— "VibeSearchBench: Benchmarking Long-horizon Proactive Search in the Wild", https://arxiv.org/abs/2605.27882

Original note title

a schema-free ground-truth knowledge graph enables objective evaluation of open-ended search