Generative Interfaces for Language Models

Paper · arXiv 2508.19227 · Published August 26, 2025
Design Frameworks · Tasks Planning · Work Application Use Cases

Large language models (LLMs) are increasingly seen as assistants, copilots, and consultants, capable of supporting a wide range of tasks through natural conversation. However, most systems remain constrained by a linear request-response format that often makes interactions inefficient in multi-turn, information-dense, and exploratory tasks. To address these limitations, we propose Generative Interfaces for Language Models, a paradigm in which LLMs respond to user queries by proactively generating user interfaces (UIs) that enable more adaptive and interactive engagement. Our framework leverages structured interface-specific representations and iterative refinements to translate user queries into task-specific UIs. For systematic evaluation, we introduce a multidimensional assessment framework that compares generative interfaces with traditional chat-based ones across diverse tasks, interaction patterns, and query types, capturing functional, interactive, and emotional aspects of user experience. Results show that generative interfaces consistently outperform conversational ones, with humans preferring them in over 70% of cases. These findings clarify when and why users favor generative interfaces, paving the way for future advancements in human-AI interaction. Data and code are available at https://github.com/SALT-NLP/GenUI.

A longstanding goal in computing is to design systems that not only respond to users but also adapt by dynamically reshaping interfaces to facilitate users’ interaction and help them achieve their goals (Apple Inc., 1987; Lyytinen & Yoo, 2002). Recent advances in large language models (LLMs) have brought us closer to this vision by enabling flexible natural language understanding. Yet the dominant interaction paradigm, which we call the conversational UI, remains static and linear: most LLM outputs are still rendered as long blocks of text, regardless of task complexity or user preference. This limits the model’s ability to support the diverse ways users seek to learn, explore, and interact. At the same time, state-of-the-art LLMs have shown remarkable capabilities in automatically generating high-quality, functional webpages from sketches, queries, or natural language descriptions (Si et al., 2024; Li et al., 2024; Xiao et al., 2024). Together, these developments raise an exciting research question: How can LLMs go beyond conversational interfaces to enable adaptive, goal-driven interactions that meaningfully serve human needs?

In this work, we introduce Generative Interfaces, a new paradigm that differs from conversational UIs. Rather than delivering static text responses within a predefined chatbot window, Generative Interfaces dynamically create entirely new interface structures that adapt to users’ specific goals and interaction requirements. While recent tools like OpenAI’s Canvas and Claude’s Artifacts enhance user interaction by providing dedicated workspaces for documents, code, and visualizations, our approach extends this vision by supporting deeper engagement and enabling richer, task-specific experiences. For example, as shown in Figure 1, when users pose questions such as “I want to understand neural networks” or “How can I learn piano effectively?”, conversational interfaces typically return long blocks of text. In contrast, Generative Interfaces transform these queries into an interactive neural network animation or a piano practice tool that offers real-time feedback. This paradigm shift presents two key challenges: (1) building the infrastructure to generate user interfaces on the fly in response to users’ queries, and (2) rigorously evaluating whether such generated interfaces actually improve user experience.

To address the first challenge, our framework introduces a structured interface-specific representation coupled with an iterative refinement procedure. The structured representation enables more controllable and interpretable generation by explicitly modeling high-level interaction flows, interface state transitions, and component dependencies, which we formalize using finite state machines (Shehady & Siewiorek, 1997; Wagner et al., 2006). The iterative refinement procedure further enhances output quality by prompting LLMs to generate query-specific evaluation rubrics and repeatedly refine interface candidates through generation-evaluation cycles until the system converges on a polished, context-appropriate solution. To address the second challenge, we establish a systematic evaluation framework for assessing language model interfaces across three key dimensions: functionality, interactivity, and emotional perception (Hartmann et al., 2008; Nielsen et al., 2012; Duan, 2025). Specifically, we construct a diverse prompt suite, User Interface eXperience (UIX), that strategically covers diverse domains and prompt types to reflect real-world usage scenarios (Tamkin et al., 2024). For each user query, we recruit experienced annotators to interact with different interfaces and conduct pairwise comparisons. This comprehensive evaluation not only demonstrates the superior performance of generative interfaces but also reveals when they excel (in structured and information-dense domains) and why users prefer them (through enhanced visual organization, interactivity, and reduced cognitive load).
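The two mechanisms above, a finite-state-machine representation of interface states and a generation-evaluation refinement loop, can be sketched in miniature. This is an illustrative Python sketch only: the class and function names (`InterfaceFSM`, `refine`, etc.) are our assumptions for exposition, not the authors’ actual implementation or API.

```python
from dataclasses import dataclass, field

@dataclass
class InterfaceFSM:
    """Structured interface representation: states are interface views,
    and transitions map (state, user action) pairs to the next view."""
    states: set = field(default_factory=set)
    transitions: dict = field(default_factory=dict)  # (state, action) -> state

    def add_transition(self, src: str, action: str, dst: str) -> None:
        self.states.update({src, dst})
        self.transitions[(src, action)] = dst

    def step(self, state: str, action: str) -> str:
        # Unmodeled actions leave the interface in its current state.
        return self.transitions.get((state, action), state)


def refine(generate, evaluate, max_iters: int = 5, threshold: float = 0.9):
    """Iterative refinement: regenerate an interface candidate from
    evaluator feedback until its rubric score passes `threshold`
    or the iteration budget runs out."""
    feedback, candidate = None, None
    for _ in range(max_iters):
        candidate = generate(feedback)          # LLM-generated interface
        score, feedback = evaluate(candidate)   # query-specific rubric
        if score >= threshold:
            break
    return candidate


# Toy usage: a piano-practice tool with a menu, a practice view,
# and a feedback view (states are hypothetical, for illustration).
fsm = InterfaceFSM()
fsm.add_transition("menu", "start_practice", "practice")
fsm.add_transition("practice", "finish", "feedback")
print(fsm.step("menu", "start_practice"))  # -> practice
```

In the full system, `generate` and `evaluate` would be LLM calls (the evaluator applying rubrics the model itself drafted for the query); here they are placeholders to show the control flow of the generation-evaluation cycle.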

Our main contributions are as follows:

• We propose Generative Interfaces, a paradigm that enables adaptive, goal-driven interactions with LLMs by dynamically generating user interfaces.

• We develop a technical infrastructure with structured representations and iterative refinement, and an evaluation framework that systematically compares generative and conversational interfaces.

• We demonstrate that generative interfaces significantly outperform conversational ones across diverse query types and interaction patterns.