Why do LLMs excel at feasible design but struggle with novelty?
When LLMs generate conceptual product designs, they produce solutions that are more implementable and useful than human ones, but less novel. This note explores why domain constraints flip the novelty advantage seen in research ideation.
Expert evaluation comparing LLM-generated conceptual design solutions with crowdsourced ones reveals a profile that INVERTS the research-ideation finding:
- Feasibility: LLMs higher (solutions are more technically implementable)
- Usefulness: LLMs higher (solutions are more relevant to the design prompt)
- Novelty: LLMs lower (solutions are less unique relative to the existing design space)
Few-shot prompting constrains generation further: it makes LLM solutions more similar to the crowdsourced examples (improving quality alignment) but reduces the diversity of solutions the LLM generates.
This inverts the finding in "Why do LLMs generate more novel research ideas than experts?", where LLM research ideas were rated MORE novel but LESS feasible than human expert ideas. The critical variable is domain structure:
- Unconstrained domains (research ideation): LLMs generate without the expert constraints that limit human novelty → MORE novel, LESS feasible
- Constrained domains (conceptual design): feasibility constraints and evaluation criteria push LLMs toward safe, implementable solutions → MORE feasible, LESS novel
The pattern bears on "Can LLMs generate more novel ideas than human experts?": in design, the evaluation criteria are embedded in the prompt (feasibility and usefulness ratings), channeling generation toward conservative solutions. In research, evaluation criteria are absent from the prompt, allowing unconstrained generation.
The few-shot finding connects to "How much does demo position alone affect in-context learning accuracy?": examples constrain not just accuracy but creative scope. Each example narrows the generative space.
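To make "each example narrows the generative space" concrete, here is a minimal sketch of how one might measure that narrowing, assuming an OpenAI-compatible chat API and the sentence-transformers library. The model name, prompt, example solutions, and diversity metric (mean pairwise cosine distance over sentence embeddings) are all illustrative assumptions, not details from the source studies:

```python
# Hypothetical sketch -- not from the source studies. Assumes an
# OpenAI-compatible chat API and the sentence-transformers library;
# the model, prompt, examples, and metric are illustrative.
import itertools

import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
embedder = SentenceTransformer("all-MiniLM-L6-v2")

DESIGN_PROMPT = (
    "Propose one conceptual design for a device that collects ocean microplastics."
)
CROWD_EXAMPLES = [  # stand-ins for crowdsourced design solutions
    "A buoy with a mesh skirt that filters surface water as waves pass through it.",
    "A sail-driven skimmer that funnels debris into a replaceable collection bag.",
]


def generate_solutions(n_shots: int, n_samples: int = 20) -> list[str]:
    """Sample design solutions with the first n_shots examples prepended."""
    shots = "\n".join(f"Example solution: {s}" for s in CROWD_EXAMPLES[:n_shots])
    prompt = f"{shots}\n\n{DESIGN_PROMPT}" if shots else DESIGN_PROMPT
    return [
        client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model choice
            messages=[{"role": "user", "content": prompt}],
            temperature=1.0,
        )
        .choices[0]
        .message.content
        for _ in range(n_samples)
    ]


def diversity(solutions: list[str]) -> float:
    """Mean pairwise cosine distance between embeddings (higher = more diverse)."""
    emb = embedder.encode(solutions, normalize_embeddings=True)
    sims = [float(a @ b) for a, b in itertools.combinations(emb, 2)]
    return 1.0 - float(np.mean(sims))


for k in range(len(CROWD_EXAMPLES) + 1):
    print(f"{k}-shot diversity: {diversity(generate_solutions(k)):.3f}")
```

If the few-shot finding holds, the printed diversity score should fall as the shot count rises, even while individual solutions track the examples' quality more closely.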
The Pron vs Prompt contest (2024) provides complementary evidence from creative writing specifically. In a direct contest between Patricio Pron (an award-winning novelist) and GPT-4, judged by literary critics and scholars using a Boden-inspired creativity rubric across 5,400 manual assessments, the authors found that "LLMs are still far from challenging a top human creative writer." They conclude that "reaching such level of autonomous creative writing skills probably cannot be reached simply with larger language models." This extends the feasible-not-novel pattern beyond design: LLMs generate competent but uncreative output across both design and literary domains. Source: arXiv/Prompts Prompting.
Source: Design Frameworks
Related concepts in this collection
- Why do LLMs generate more novel research ideas than experts?
  LLM-generated research ideas are statistically more novel than those from 100+ expert researchers, but the mechanisms behind this advantage and its practical implications remain unclear. Understanding this paradox could reshape how we use AI in creative knowledge work.
  Relation: inverted in constrained design domains.
- Can LLMs generate more novel ideas than human experts?
  Research shows LLM-generated ideas score higher for novelty than expert-generated ones, yet LLMs avoid the evaluative reasoning that characterizes expert thinking. What explains this apparent contradiction?
  Relation: domain structure determines which side of the dissociation dominates.
- Why do LLMs generate novel ideas from narrow ranges?
  LLM research agents produce individually novel ideas but cluster them in homogeneous sets. This note explores why high average novelty coexists with poor diversity coverage and what it means for automated ideation.
  Relation: diversity collapse occurs in both domains but through different mechanisms.
- How much does demo position alone affect in-context learning accuracy?
  Moving demonstrations from prompt start to end without changing their content produces surprisingly large accuracy swings. Does spatial position in the prompt matter more than what demonstrations actually contain?
  Relation: few-shot constrains creative scope.
- Why does AI writing sound generic despite being grammatically correct?
  Explores whether the robotic quality of AI text stems from grammatical failures or rhetorical ones. Understanding this distinction matters for diagnosing what AI systems actually struggle with in human-like writing.
  Relation: the design-domain inversion may be a manifestation of the grammar-rhetoric gap. In constrained design, LLMs produce structurally sound (grammatically competent) but evaluatively conservative (rhetorically inert) solutions; the same absence of evaluative stance-taking that makes academic writing generic makes design solutions feasible but unoriginal.
Original note title
LLMs generate more feasible and useful but less novel conceptual design solutions than humans — few-shot learning decreases diversity further