Task Planning
Related topics:
- Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems. To study customer service dialogue systems in more realistic settings, we introduce the Action-Based Conversations Dataset (ABCD), a fully-labeled dataset with over 10K human-to-human dialogues contai…
- AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs. In this paper, we introduce a novel learning paradigm for adaptive Large Language Model (LLM) agents that eliminates the need for fine-tuning the underlying LLMs. Existing approaches are often either …
- AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation. Furthermore, planning ability is a crucial component of an LLM-based agent, involving interaction with the environment and executing actions to complete a planning task, which generally entails achiev…
- Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models. We present Attentive Reasoning Queries (ARQs), a novel structured reasoning approach that significantly improves instruction-following in Large Language Models through domain-specialized reasoning blu…
- Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey. Building autonomous machines that can explore open-ended environments, discover possible interactions and build repertoires of skills is a general objective of artificial intelligence. Developmental a…
- Branch-Solve-Merge Improves Large Language Model Evaluation and Generation. Large Language Models (LLMs) are frequently used for multi-faceted language generation and evaluation tasks that involve satisfying intricate user constraints or taking into account multiple aspects a…
- Bridging the Gulf of Envisioning: Cognitive Design Challenges in LLM Interfaces. Large language models (LLMs) exhibit dynamic capabilities and appear to comprehend complex and ambiguous natural language prompts. However, calibrating LLM interactions is challenging for interface de…
- Can Large Language Models Really Improve by Self-critiquing Their Own Plans? There have been widespread claims about Large Language Models (LLMs) being able to successfully verify or self-critique their candidate solutions in reasoning problems in an iterative mode. Intrigued …
- Can Large Language Models Reason and Plan? Their seeming versatility has however led many researchers to wonder whether they can also do well on planning and reasoning tasks typically associated with System 2 competency. Nothing in the traini…
- Chatbots in Knowledge-Intensive Contexts: Comparing Intent and LLM-Based Systems. We conducted a user study comparing an LLM-based CA to an intent-based system regarding interaction efficiency, user experience, workload, and usability. This revealed that LLM-based CAs exhibited bet…
- CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases. We present CoSQL, a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems. It consists of 30k+ turns plus 10k+ annotated SQL queries, obtained from a Wizard-of-Oz (…
- Conversational Semantic Parsing for Dialog State Tracking. We consider a new perspective on dialog state tracking (DST), the task of estimating a user’s goal through the course of a dialog. By formulating DST as a semantic parsing task over hierarchical repre…
- DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration. Real-time human-artificial intelligence (AI) collaboration is crucial yet challenging, especially when AI agents must adapt to diverse and unseen human behaviors in dynamic scenarios. Existing large l…
- Decision-Oriented Dialogue for Human–AI Collaboration. All these situations share an underlying structured decision problem in the face of uncertainty, where communicating and collaborating with others is often critical to arrive at the best solution. D…
- Dialogue Transformers. Conversational AI assistants promise to help users achieve a task through natural language. Interpreting simple instructions like "please turn on the lights" is relatively straightforward, but to handle…
- Dynamic Planning with a LLM. Conversely, traditional symbolic planners, such as the Fast-Forward planner (Hoffmann and Nebel, 2001) or the BFS(f) planner (Lipovetzky et al., 2014), excel at finding optimal plans efficiently. But …
- Efficient Tool Use with Chain-of-Abstraction Reasoning. To achieve faithful reasoning that aligns with human expectations, large language models (LLMs) need to ground their reasoning to real-world knowledge (e.g., web facts, math and physical rules). Tools…
- Everything Everywhere All At Once: LLMs Can In-Context Learn Multiple Tasks In Superposition. Large Language Models (LLMs) have demonstrated remarkable in-context learning (ICL) capabilities. In this study, we explore a surprising phenomenon related to ICL: LLMs can perform multiple, computati…
- Exploring Autonomous Agents: A Closer Look at Why They Fail When Completing Tasks. Autonomous agent systems powered by Large Language Models (LLMs) have demonstrated promising capabilities in automating complex tasks. However, current evaluations largely rely on success rat…
- Exploring LLMs Applications in Law: A Literature Review on Current Legal NLP Approaches. The integration of Natural Language Processing (NLP) and AI into legal tasks is a natural progression, given the linguistic nature of law. This combination allows for more efficient and accurate analy…
- Federation of Agents: A Semantics-Aware Communication Fabric for Large-Scale Agentic AI. We present Federation of Agents (FoA), a distributed orchestration framework that transforms static multi-agent coordination into dynamic, capability-driven collaboration. FoA introduces Versioned Cap…
- From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models. RAISE, an enhancement of the ReAct framework, incorporates a dual-component memory system, mirroring human short-term and long-term memory, to maintain context and continuity in conversations. It enta…
- Generalization to New Sequential Decision Making Tasks with In-Context Learning. However, the sequential decision making setting poses additional challenges having a lower tolerance for errors since the environment’s stochasticity or the agent’s actions can lead to unseen, and som…
- Generative Interfaces for Language Models. Large language models (LLMs) are increasingly seen as assistants, copilots, and consultants, capable of supporting a wide range of tasks through natural conversation. However, most systems remain cons…
- Graph-enhanced Large Language Models in Asynchronous Plan Reasoning. Planning is a fundamental property of human intelligence. Reasoning about asynchronous plans is challenging since it requires sequential and parallel planning to optimize time costs. Can large languag…
- HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches. Recently, large reasoning models have demonstrated strong mathematical and coding abilities, and deep search leverages their reasoning capabilities in challenging information retrieval tasks. Existing…
- Improving Generalization in Task-oriented Dialogues with Workflows and Action Plans. Task-oriented dialogue is difficult in part because it involves understanding user intent, collecting information from the user, executing API calls, and generating helpful and fluent responses. Howev…
- LESS: Selecting Influential Data for Targeted Instruction Tuning. Instruction tuning has unlocked powerful capabilities in large language models (LLMs), using combined datasets to develop general-purpose chatbots. However, real-world applications often require a spe…
- LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. Large language models (LLMs) have demonstrated remarkable zero-shot generalization abilities: state-of-the-art chatbots can provide plausible answers to many common questions that arise in daily life…
- Large Language Models as Planning Domain Generators. Developing domain models is one of the few remaining places that require manual human labor in AI planning. Thus, in order to make planning more accessible, it is desirable to automate the process of …
- Large Language Models can accomplish Business Process Management Tasks. In this paper, we illustrate how LLMs can be utilized for three BPM tasks that require textual documents as input. For all tasks, we follow the same approach, illustrated in Fig. 1. We start by assem…
- Learning to Map Context-Dependent Sentences to Executable Formal Queries. We propose a context-dependent model to map utterances within an interaction to executable formal queries. To incorporate interaction history, the model maintains an interaction-level encoder that upd…
- Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge. LLM-as-a-Judge models generate chain-of-thought (CoT) sequences intended to capture the step-by-step reasoning process that underlies the final evaluation of a response. However, due to the lack of hu…
- Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning. There is a growing interest in applying pre-trained large language models (LLMs) to planning problems. However, methods that use LLMs directly as planners are currently impractical due to several fact…
- LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries. Tool calling has emerged as a critical capability for AI agents to interact with the real world and solve complex tasks. While the Model Context Protocol (MCP) provides a powerful standardized framewo…
- On the Limits of Innate Planning in Large Language Models. Large language models (LLMs) achieve impressive results on many benchmarks, yet their capacity for planning and stateful reasoning remains unclear. We study these abilities directly, without code exec…
- On the Roles of LLMs in Planning: Embedding LLMs into Planning Graphs. We aim to further study the insight of the planning capability of LLMs by investigating the roles of LLMs in off-the-shelf planning frameworks. To do this, we investigate the effectiveness of embeddin…
- Opportunities for large language models and discourse in engineering design. In this paper, we argue that foundation models such as LLMs can be used for creative reasoning tasks in the engineering design process, complementing and integrating existing computational methods suc…
- Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts. As large language models (LLMs) have shown effectiveness with different prompting methods, such as Chain of Thought and Program of Thought, we find that these methods are strongly complementary …
- PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers. In general, the decision-making task requires performing the following three steps: (1) making a plan for which kind of analysis is needed for decision; (2) retrieving necessary data using queries; (3…
- Planning Like Human: A Dual-process Framework for Dialogue Planning. In proactive dialogue, the challenge lies not just in generating responses but in steering conversations toward predetermined goals, a task where Large Language Models (LLMs) typically struggle due to…
- Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1. OpenAI claims that their recent o1 (Strawberry) model has been specifically constructed and trained to escape the normal limitations of autoregressive LLMs, making it a new kind of model: a Large Reaso…
- Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents. Proactive dialogues serve as a practical yet challenging dialogue problem in the era of large language models (LLMs), where the dialogue policy planning is the key to improving the proactivity of LLMs…
- PolyResponse: A Rank-based Approach to Task-Oriented Dialogue with Application in Restaurant Search and Booking. We present PolyResponse, a conversational search engine that supports task-oriented dialogue. It is a retrieval-based approach that bypasses the complex multi-component design of traditional task-orie…
- Position: LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks. Large Language Models (LLMs), essentially n-gram models on steroids which have been pre-trained on web-scale language corpora (or, effectively, our collective consciousness), have caught the imaginati…
- ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs. We hypothesize that cross-domain generalization arises from shared abstract reasoning prototypes: fundamental reasoning patterns that capture the essence of problems across domains. These prototypes …
- ReAct: Synergizing Reasoning and Acting in Language Models. While large language models (LLMs) have demonstrated impressive performance across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-though…
- Real-World Planning with PDDL+ and Beyond. Real-world applications of AI Planning often require a highly expressive modeling language to accurately capture important intricacies of target systems. Hybrid systems are ubiquitous in the real-worl…
- Reinforced Language Models for Sequential Decision Making. Large Language Models (LLMs) show potential as sequential decision-making agents, but their application is often limited due to a reliance on large, computationally expensive models. This creates a ne…
- SOLOIST: Building Task Bots at Scale with Transfer Learning and Machine Teaching. We present a new method, SOLOIST, that uses transfer learning and machine teaching to build task bots at scale. We parameterize classical modular task-oriented dialog systems using a Transformer-base…
- Semantic Parsing for Task Oriented Dialog using Hierarchical Representations. Previous work on task-oriented intent and slot filling has been restricted to one intent per query and one slot label per token, and thus cannot model complex compositional requests. Alternative …
- TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation. Agents often struggle during task execution due to methodological constraints, such as error propagation and limited adaptability. To address this issue, we propose a multi-agent framework based on dy…
- Task Contamination: Language Models May Not Be Few-Shot Anymore. We find that on datasets released before the LLM training data creation date, LLMs perform surprisingly better than on datasets released after. This strongly indicates that, for many LLMs, there exist…
- Task-Oriented Dialogue as Dataflow Synthesis. We describe an approach to task-oriented dialogue in which dialogue state is represented as a dataflow graph. A dialogue agent maps each user utterance to a program that extends this graph. Programs i…
- Task-Oriented Dialogue with In-Context Learning. We describe a system for building task-oriented dialogue systems combining the in-context learning abilities of large language models (LLMs) with the deterministic execution of business logic. LLMs ar…
- TaskLAMA: Probing the Complex Task Understanding of Language Models. Structured Complex Task Decomposition (SCTD) is the problem of breaking down a complex real-world task (such as planning a wedding) into a directed acyclic graph over individual steps that contribute…
- TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks. To measure the progress of LLM agents on real-world professional tasks, in this paper we introduce TheAgentCompany, an extensible benchmark for evaluating AI agents that …
- Thinking Forward and Backward: Effective Backward Planning with Large Language Models. Large language models (LLMs) have exhibited remarkable reasoning and planning capabilities. Most prior work in this area has used LLMs to reason through steps from an initial to a goal state or criter…
- ToolFlow: Boosting LLM Tool-Calling Through Natural and Coherent Dialogue Synthesis. Supervised fine-tuning (SFT) is a common method to enhance the tool calling capabilities of Large Language Models (LLMs), with the training data often being synthesized. The current data synthesis pro…
- Towards Machine Theory of Mind with Large Language Model-Augmented Inverse Planning. We propose a hybrid approach to machine Theory of Mind (ToM) that uses large language models (LLMs) as a mechanism for generating hypotheses and likelihood functions with a Bayesian inverse planning m…
- Training a Generally Curious Agent. Efficient exploration is essential for intelligent systems interacting with their environment, but existing language models often fall short in scenarios that require strategic information gathering. …
- Tree Search for Language Model Agents. Autonomous agents powered by language models (LMs) have demonstrated promise in their ability to perform decision-making tasks such as web automation. However, a key limitation remains: LMs, primarily…
- TwoStep: Multi-agent Task Planning using Classical Planners and Large Language Models. However, reasoning problems defined in PDDL do not capture temporal aspects of action taking, for example that two agents in the domain can execute an action simultaneously if postconditions of each d…
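Several of the entries above (ReAct, Dynamic Planning with a LLM, PlanRAG) revolve around the same basic control flow: the model alternates reasoning with tool calls until it commits to an answer. A minimal sketch of such a loop, where the `llm` callable, the `Action:`/`Answer:` line format, and the `tools` registry are hypothetical stand-ins rather than any paper's actual API:

```python
def react_loop(question, llm, tools, max_steps=5):
    """Alternate model steps with tool observations until the model
    emits a line starting with "Answer:" (or the step budget runs out)."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # e.g. "Action: search[some query]" or "Answer: ..."
        transcript += step + "\n"
        if step.startswith("Answer:"):
            return step[len("Answer:"):].strip()
        if step.startswith("Action:"):
            # Parse "Action: name[argument]" and feed the tool result back.
            name, _, arg = step[len("Action:"):].strip().partition("[")
            observation = tools[name](arg.rstrip("]"))
            transcript += f"Observation: {observation}\n"
    return None  # no final answer within the step budget

# Scripted demo: a fake "model" that first calls a tool, then answers.
script = iter(["Action: lookup[capital of France]", "Answer: Paris"])
result = react_loop(
    "What is the capital of France?",
    llm=lambda transcript: next(script),
    tools={"lookup": lambda query: "Paris"},
)
print(result)  # prints: Paris
```

In a real system the transcript grows with every thought, action, and observation, which is exactly why the memory and context-management entries above (e.g. RAISE) matter.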