All Papers
- "Is ChatGPT a Better Explainer than My Professor?": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline
- "It doesn't look good for a date": Transforming Critiques into Preferences for Conversational Recommendation Systems
- "My Boyfriend is AI": A Computational Analysis of Human-AI Companionship in Reddit's AI Community
- (QA)²: Question Answering with Questionable Assumptions
- 100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models
- 1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
- 12 New Advanced Types of RAG
- A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy
- A Comment On "The Illusion of Thinking": Reframing the Reasoning Cliff as an Agentic Gap
- A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
- A comprehensive analysis of concept drift locality in data streams
- A Comprehensive Evaluation of Inductive Reasoning Capabilities and Problem Solving in Large Language Models
- A Comprehensive Review of AI-based Intelligent Tutoring Systems: Applications and Challenges
- A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications
- A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models
- A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
- A comprehensive taxonomy of hallucinations in Large Language Models
- A Computational Framework for Behavioral Assessment of LLM Therapists
- A Contextual-Bandit Approach to Personalized News Article Recommendation
- A Conversation is Worth A Thousand Recommendations: A Survey of Holistic Conversational Recommender Systems
- A Decomposition Perspective to Long-context Reasoning for LLMs
- A Domain Specific Modeling Language for Multiagent Systems
- A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models
- A Framework for Collaborating a Large Language Model Tool in Brainstorming for Triggering Creative Thoughts
- A Hybrid Human-AI Approach for Argument Map Creation From Transcripts
- A Hybrid Intelligence Method for Argument Mining
- A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning
- A Little Human Data Goes A Long Way
- A Looming Replication Crisis in Evaluating Behavior in Language Models? Evidence and Solutions
- A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis
- A meta-analysis of the persuasive power of large language models
- A Multi-facet Paradigm to Bridge Large Language Model and Recommendation
- A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity
- A natural language processing approach reveals first-person pronoun usage and non-fluency as markers of therapeutic alliance in psychotherapy
- A Non-Factoid Question-Answering Taxonomy
- A Personalized Recommender System based-on Knowledge Graph Embeddings
- A polar coordinate system represents syntax in large language models
- A Practical Guide for Designing, Developing, and Deploying Production-Grade Agentic AI Workflows
- A Probabilistic Model for Using Social Networks in Personalized Item Recommendation
- A recipe for annotating grounded clarifications
- A ripple in time: a discontinuity in American history
- A Robustness Evaluation Framework for Argument Mining
- A Socially-Aware Conversational Recommender System for Personalized Recipe Recommendations
- A sociotechnical perspective for the future of AI: narratives, inequalities, and human control
- A Survey of Calibration Process for Black-Box LLMs
- A Survey of Continual Reinforcement Learning
- A Survey of Meta-Reinforcement Learning
- A Survey of Reinforcement Learning from Human Feedback
- A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence
- A Survey on Concept Drift Adaptation
- A Survey on Context-Aware Multi-Agent Systems: Techniques, Challenges and Future Directions
- A Survey on Diffusion Language Models
- A Survey on Knowledge Distillation of Large Language Models
- A Survey on Large Language Models for Recommendation
- A Survey on Large Language Models with some Insights on their Capabilities and Limitations
- A Survey on Lexical Ambiguity Detection and Word Sense Disambiguation
- A Survey on LLM Inference-Time Self-Improvement
- A Survey on Post-training of Large Language Models
- A Survey on Proactive Dialogue Systems: Problems, Methods, and Prospects
- A Survey on Prompt Tuning
- A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well?
- A Systematic Review on the Evaluation of Large Language Models in Theory of Mind Tasks
- A Taxonomy of Empathetic Questions in Social Dialogs
- A Tutorial on LLM Reasoning: Relevant Methods behind ChatGPT o1
- A Unified Multi-task Learning Framework for Multi-goal Conversational Recommender Systems
- Abductive Reasoning with the GPT-4 Language Model: Case studies from criminal investigation, medical practice, scientific research
- Abg-CoQA: Clarifying Ambiguity in Conversational Question Answering
- Absolute Zero: Reinforced Self-play Reasoning with Zero Data
- AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions
- ACE: Abstractions for Communicating Efficiently
- Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems
- Activation Steering for Chain-of-Thought Compression
- Active Listening: Personalized Question Generation in Open-Domain Social Conversation with User Model Based Prompting
- Active Retrieval Augmented Generation
- Adam's Law: Textual Frequency Law on Large Language Models
- Adaptation of Agentic AI
- Adapter-based Selective Knowledge Distillation for Federated Multi-domain Meeting Summarization
- Adapting LLM Agents with Universal Feedback in Communication
- Adaptive Learning Systems: Personalized Curriculum Design Using LLM-Powered Analytics
- Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home
- Adding Chit-Chat to Enhance Task-Oriented Dialogues
- Advances and Challenges in Conversational Recommender Systems: A Survey
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling
- Aether Weaver: Multimodal Affective Narrative Co-Generation with Dynamic Scene Graphs
- Affordable AI Assistants with Knowledge Graph of Thoughts
- Agent Development Kit
- Agent Laboratory: Using LLM Agents as Research Assistants
- Agent Learning via Early Experience
- Agent S: An Open Agentic Framework that Uses Computers Like a Human
- Agent Workflow Memory
- Agent-as-a-Judge: Evaluate Agents with Agents
- Agent-Centric Projection of Prompting Techniques and Implications for Synthetic Training Data for Large Language Models
- Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
- AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
- AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
- Agentic AI and the next intelligence explosion
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
- Agentic Reasoning for Large Language Models
- Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research
- Agentic Web: Weaving the Next Web with AI Agents
- AgentRxiv: Towards Collaborative Autonomous Research
- Agents Are Not Enough
- Agents of Chaos
- AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents
- Agreement Tracking for Multi-Issue Negotiation Dialogues
- AI & Human Co-Improvement for Safer Co-Superintelligence
- AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges
- AI Assistance Reduces Persistence and Hurts Independent Performance
- AI Can Learn Scientific Taste
- AI Enters Public Discourse: A Habermasian Assessment Of The Moral Status Of Large Language Models
- AI Meets the Classroom: When Does ChatGPT Harm Learning?
- AI Models Exceed Individual Human Accuracy in Predicting Everyday Social Norms
- AI tutoring outperforms in-class active learning: an RCT introducing a novel research-based design in an authentic educational setting
- AI-Powered (Finance) Scholarship
- AI-Researcher: Autonomous Scientific Innovation
- AInsight: Augmenting Expert Decision-Making with On-the-Fly Insights Grounded in Historical Data
- Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models
- Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
- ALIGN: Prompt-based Attribute Alignment for Reliable, Responsible, and Personalized LLM-based Decision-Making
- Aligning Language Models to Explicitly Handle Ambiguity
- Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning
- All AI Models are Wrong, but Some are Optimal
- AlphaGo Moment for Model Architecture Discovery
- Alternating Recurrent Dialog Model with Large-scale Pre-trained Language Models
- An Automatic Graph Construction Framework based on Large Language Models for Recommendation
- An Emulator for Fine-Tuning Large Language Models using Small Language Models
- An extended framework for characterizing social robots
- An Investigation of Robustness of LLMs in Mathematical Reasoning: Benchmarking with Mathematically-Equivalent Transformation of Advanced Mathematical Problems
- An Overview Of Temporal Commonsense Reasoning and Acquisition
- Anaphora Resolution: The State of the Art
- Answer is All You Need: Instruction-following Text Embedding via Answering the Question
- Answering Questions by Meta-Reasoning over Multiple Chains of Thought
- Apollo's Oracle: Retrieval-Augmented Reasoning in Multi-Agent Debates
- Are Customers Lying to Your Chatbot?
- Are Emergent Abilities in Large Language Models just In-Context Learning?
- Are Emergent Abilities of Large Language Models a Mirage?
- Are LLMs All You Need for Task-Oriented Dialogue?
- Are you in a Masquerade? Exploring the Behavior and Impact of Large Language Model Driven Social Bots in Online Social Networks
- AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
- ARGS: Alignment as Reward-Guided Search
- Argument Quality Assessment in the Age of Instruction-Following Large Language Models
- Argument Summarization and its Evaluation in the Era of Large Language Models
- Argumentative Large Language Models for Explainable and Contestable Decision-Making
- Argunauts: Open LLMs that Master Argument Analysis with Argdown
- Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
- Artifacts as Memory Beyond the Agent Boundary
- Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
- Artificial Intelligence and the Labor Market
- Artificial intelligence is ineffective and potentially harmful for fact checking
- Ask an Expert: Leveraging Language Models to Improve Strategic Reasoning in Goal-Oriented Dialogue Models
- Ask, and it shall be given: Turing completeness of prompting
- Ask-AC: An Initiative Advisor-in-the-Loop Actor-Critic Framework
- Asking Clarifying Questions Based on Negative Feedback in Conversational Search
- Aspect-oriented Opinion Alignment Network for Aspect-Based Sentiment Classification
- Assessing adaptive world models in machines with novel games
- Assessing the Ability of ChatGPT to Screen Articles for Systematic Reviews
- Assessment of Personality Dimensions Across Situations Using Conversational Speech
- ATESA-BÆRT: A Heterogeneous Ensemble Learning Model for Aspect-Based Sentiment Analysis
- Atom of Thoughts for Markov LLM Test-Time Scaling
- Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward
- Attention Mechanisms Perspective: Exploring LLM Processing of Graph-Structured Data
- Attention on the brain
- Attention, Intentions, and the Structure of Discourse
- Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models
- Attribute Controlled Dialogue Prompting
- Auditing language models for hidden objectives
- Augmenting Autotelic Agents with Large Language Models
- Augmenting Netflix Search with In-Session Adapted Recommendations
- AutoCBT: An Autonomous Multi-agent Framework for Cognitive Behavioral Therapy in Psychological Counseling
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
- AutoGLM: Autonomous Foundation Agents for GUIs
- Automated Alignment Researchers: Using large language models to scale scalable oversight
- Automated Design of Agentic Systems
- Automatic Extraction of Metaphoric Analogies from Literary Texts: Task Formulation, Dataset Construction, and Evaluation
- Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data
- Automatic Prompt Optimization with "Gradient Descent" and Beam Search
- Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies
- AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts
- Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey
- AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
- Backtracing: Retrieving the Cause of the Query
- Base Models Know How to Reason, Thinking Models Learn When
- Behavioral Exploration: Learning to Explore via In-Context Adaptation
- Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling
- Benchmarking the Pedagogical Knowledge of Large Language Models
- Better Alignment with Instruction Back-and-Forth Translation
- Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
- Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and Correctness in LLMs
- Beyond "Not Novel Enough": Enriching Scholarly Critique with LLM-Assisted Feedback
- Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
- Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models
- Beyond Answers: How LLMs Can Pursue Strategic Thinking in Education
- Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
- Beyond Brainstorming: What Drives High-Quality Scientific Ideas? Lessons from Multi-Agent Collaboration
- Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
- Beyond Discrete Personas: Personality Modeling Through Journal Intensive Conversations
- Beyond GPT-5: Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing
- Beyond Hallucinations: The Illusion of Understanding in Large Language Models
- Beyond neural scaling laws: beating power law scaling via data pruning
- Beyond Passive Critical Thinking: Fostering Proactive Questioning to Enhance Human-AI Collaboration
- Beyond Preferences in AI Alignment
- Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
- Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
- Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning
- Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens
- Beyond Single Models: Enhancing LLM Detection of Ambiguity in Requests through Debate
- Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
- Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR
- Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think
- Beyond the Surface: Probing the Ideological Depth of Large Language Models
- Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following
- Beyond Turing: Memory-Amortized Inference as a Foundation for Cognitive Computation
- Bigger is not always better: The importance of human-scale language modeling for psycholinguistics
- Bilevel Autoresearch: Meta-Autoresearching Itself
- Boosted Prompt Ensembles for Large Language Models
- Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought
- Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need
- Boundless Socratic Learning with Language Games
- Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond
- Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
- Break It Down: Evidence for Structural Compositionality in Neural Networks
- Break the Chain: Large Language Models Can be Shortcut Reasoners
- Bridging Offline and Online Reinforcement Learning for LLMs
- Bridging the Gulf of Envisioning: Cognitive Design Challenges in LLM Interfaces
- Building a Stronger CASA: Extending the Computers Are Social Actors Paradigm
- Building and Evaluating Open-Domain Dialogue Corpora with Clarifying Questions
- Building Cooperative Embodied Agents Modularly with Large Language Models
- Building Decision Making Models Through Language Model Regime
- Building Machines that Learn and Think with People
- Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning
- Byte Latent Transformer: Patches Scale Better Than Tokens
- Calibrated Recommendations
- CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
- Can AI Explanations Make You Change Your Mind?
- Can AI Have a Personality? Prompt Engineering for AI Personality Simulation: A Chatbot Case Study in Gender-Affirming Voice Therapy Training
- Can Authorship Representation Learning Capture Stylistic Features?
- Can Language Models Recognize Convincing Arguments?
- Can Language Models Represent the Past without Anachronism?
- Can Language Models Serve as Text-Based World Simulators?
- Can Language Models Solve Graph Problems in Natural Language?
- Can Large Language Models Capture Human Annotator Disagreements?
- Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess
- Can Large Language Models do Analytical Reasoning?
- Can large language models explore in-context?
- Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability to Mark Short Answer Questions in K-12 Education
- Can Large Language Models perform Relation-based Argument Mining?
- Can Large Language Models Really Improve by Self-critiquing Their Own Plans?
- Can Large Language Models Reason and Plan?
- Can Large Language Models Transform Computational Social Science?
- Can Large Language Models Understand Context?
- Can Large Reasoning Models Self-Train?
- Can LLM be a Personalized Judge?
- Can LLMs assist with Ambiguity? A Quantitative Evaluation of various Large Language Models on Word Sense Disambiguation
- Can LLMs Follow Simple Rules?
- Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
- Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political Questions
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
- Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
- Can Machines Think Like Humans? A Behavioral Evaluation of LLM-Agents in Dictator Games
- Can robots do therapy?: Examining the efficacy of a CBT bot in comparison with other behavioral intervention technologies in alleviating mental health symptoms
- Can Theoretical Physics Research Benefit from Language Agents?
- Can You Trust LLM Judgments? Reliability of LLM-as-a-Judge
- CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
- Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models
- Causal Claims in Economics
- Causal Sufficiency and Necessity Improves Chain-of-Thought Reasoning
- CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning
- CEO: Corpus-based Open-Domain Event Ontology Induction
- Chain of Draft: Thinking Faster by Writing Less
- Chain of Stance: Stance Detection with Large Language Models
- Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
- Chain of Thoughtlessness? An Analysis of CoT in Planning
- Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models
- Chain-of-Questions Training with Latent Answers for Robust Multistep Question Answering
- Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective
- Chain-of-Retrieval Augmented Generation
- Chain-of-Thought Is Not Explainability
- Chain-of-thought Reasoning Is A Policy Improvement Operator
- Chain-of-Thought Reasoning Without Prompting
- Chain-of-Verification Reduces Hallucination in Large Language Models
- Challenges of Large Language Models for Mental Health Counseling
- Chamain: Harmonizing Character Persona Integrity with Domain-Adaptive Knowledge in Dialogue Generation
- Character is Destiny: Can Role-Playing Language Agents Make Persona-Driven Decisions?
- Characterizing Deep Research: A Benchmark and Formal Definition
- Characterizing Online Discussion Using Coarse Discourse Sequences
- Chatbot vs. Human: The Impact of Responsive Conversational Features on Users’ Responses to Chat Advisors
- Chatbots in Knowledge-Intensive Contexts: Comparing Intent and LLM-Based Systems
- ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
- ChatGPT codes
- ChatGPT Doesn’t Trust Chargers Fans: Guardrail Sensitivity in Context
- ChatGPT is not Enough: Enhancing Large Language Models with Knowledge Graphs for Fact-aware Language Modeling
- ChatGPT Reads Your Tone and Responds Accordingly -- Until It Does Not -- Emotional Framing Induces Bias in LLM Outputs
- ChatGPT: deconstructing the debate and moving it forward
- ChatGPT: towards AI subjectivity
- Checklists Are Better Than Reward Models For Aligning Language Models
- Choosing the Right Weights: Balancing Value, Strategy, and Noise in Recommender Systems
- Circuit Tracing: Revealing Computational Graphs in Language Models
- CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning
- Clarifying the Path to User Satisfaction: An Investigation into Clarification Usefulness
- Classifying YouTube Comments Based on Sentiment and Type of Sentence
- Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data
- CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization
- CloChat: Understanding How People Customize, Interact, and Experience Personas in Large Language Models
- Clustering-based Sampling for Few-Shot Cross-Domain Keyphrase Extraction
- CogBench: a large language model walks into a psychology lab
- Cognitive Architectures for Language Agents
- Cognitive Chain-of-Thought: Structured Multimodal Reasoning about Social Situations
- Cognitive Effects in Large Language Models
- CollabLLM: From Passive Responders to Active Collaborators
- Collaborative Deep Learning for Recommender Systems
- Collaborative Filtering Bandits
- Collaborative Filtering for Implicit Feedback Datasets
- Collaborative Filtering with Temporal Dynamics
- Collaborative Rational Speech Act: Pragmatic Reasoning for Multi-Turn Dialog
- Collaborative Reasoner: Self-Improving Social Agents with Synthetic Conversations
- CoLLM: Integrating Collaborative Embeddings into Large Language Models for Recommendation
- Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning
- Comparing Apples to Apples: Generating Aspect-Aware Comparative Sentences from User Reviews
- Comparing emotion feature extraction approaches for predicting depression and anxiety
- Comparing Human and AI Therapists in Behavioral Activation for Depression: Cross-Sectional Questionnaire Study
- COMPASS: Computational Mapping of Patient-Therapist Alliance Strategies with Language Modeling
- Competitive Programming with Large Reasoning Models
- Complex Logical Instruction Generation
- Complexity-Based Prompting for Multi-Step Reasoning
- Compositional Reasoning with Transformers, RNNs, and Chain of Thought
- Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning
- Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations
- Computational Modelling of Undercuts in Real-world Arguments
- Computational structuralism: Toward a formal theory of meaning in the age of digital intelligence
- Computer says “No”: The Case Against Empathetic Conversational AI
- Conceptual Design Generation Using Large Language Models
- Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
- CONSCENDI: A Contrastive and Scenario-Guided Distillation Approach to Guardrail Models for Virtual Assistants
- Considering the Context to Build Theory in HCI, HRI, and HMC: Explicating Differences in Processes of Communication and Socialization With Social Technologies
- Consistency Training Helps Stop Sycophancy and Jailbreaks
- Consistent Explainers or Unreliable Narrators? Understanding LLM-generated Group Recommendations
- Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
- Content-aware Collaborative Music Recommendation Using Pre-trained Neural Networks
- Context Embeddings for Efficient Answer Generation in RAG
- Context Tuning for Retrieval Augmented Generation
- Context-PEFT: Efficient Multi-Modal, Multi-Task Fine-Tuning
- Continual Instruction Tuning for Large Multimodal Models
- CONTROL PREFIXES for Parameter-Efficient Text Generation
- Controlling Linguistic Style Aspects in Neural Language Generation
- Converging Paradigms: The Synergy of Symbolic and Connectionist AI in LLM-Empowered Autonomous Agents
- Conversation Chronicles: Towards Diverse Temporal and Relational Dynamics in Multi-Session Conversations
- Conversation Derailment Forecasting with Graph Convolutional Networks
- Conversational Alignment with Artificial Intelligence in Context
- Conversational DNA: A New Visual Language for Understanding Dialogue Structure in Human and AI
- Conversational Graph Grounded Policy Learning for Open-Domain Conversation Generation
- Conversational Prompt Engineering
- Conversational Recommendation: A Grand AI Challenge
- Conversational Semantic Parsing for Dialog State Tracking
- Conversations Gone Awry: Detecting Early Signs of Conversational Failure
- CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases
- CoT is Not True Reasoning, It Is Just a Tight Constraint to Imitate: A Theory Perspective
- CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks
- Could you be wrong: Debiasing LLMs using a metacognitive prompt for improving human decision making
- Creativity Has Left the Chat: The Price of Debiasing Language Models
- Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying
- Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate
- Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback
- Critiques of World Models
- CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions
- Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains
- Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs
- Cultural Evolution of Cooperation among LLM Agents
- Cumulated Gain-Based Evaluation of IR Techniques
- Cumulative Reasoning with Large Language Models
- CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning
- Curse of “Low” Dimensionality in Recommender Systems
- DAPIE: Interactive Step-by-Step Explanatory Dialogues to Answer Children’s Why and How Questions
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
- Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents
- DataComp-LM: In search of the next generation of training sets for language models
- DATATALES: Investigating the use of Large Language Models for Authoring Data-Driven Articles
- Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models
- DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations
- Debating with More Persuasive LLMs Leads to More Truthful Answers
- Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning
- Decision Transformer: Reinforcement Learning via Sequence Modeling
- Decision-Oriented Dialogue for Human–AI Collaboration
- Decomposed Prompting: A Modular Approach for Solving Complex Tasks
- Decoupling Knowledge and Reasoning in LLMs: An Exploration Using Cognitive Dual-System Theory
- DEEM: Dynamic Experienced Expert Modeling for Stance Detection
- Deep Interest Network for Click-Through Rate Prediction
- Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference
- Deep Neural Network Approach for the Dialog State Tracking Challenge
- Deep Neural Networks for YouTube Recommendations
- Deep Research: A Systematic Survey
- Deep Researcher with Test-Time Diffusion
- Deep Think with Confidence
- DeepCT-enhanced Lexical Argument Retrieval
- DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL
- DeepGesture: A conversational gesture synthesis system based on emotions and semantics
- DeepNet: Scaling Transformers to 1,000 Layers
- DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
- DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments
- DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research
- DeepSeek-R1 Thoughtology: Let's think about LLM Reasoning
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
- Deflating Deflationism: A Critical Perspective on Debunking Arguments Against LLM Mentality
- DeLLMa: Decision Making Under Uncertainty with Large Language Models
- Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP
- Demystifying Chains, Trees, and Graphs of Thoughts
- Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning
- Dense Retrieval Adaptation using Target Domain Description
- DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents
- Design Principles for Generative AI Applications
- Designing AI Personalities: Enhancing Human-Agent Interaction Through Thoughtful Persona Design
- Detecting Cognitive Distortions from Patient-Therapist Interactions
- Detecting Deception Using Natural Language Processing and Machine Learning in Datasets on COVID-19 and Climate Change
- Detecting hallucinations in large language models using semantic entropy
- Determinants of LLM-assisted Decision-Making
- Detoxify Language Model Step-by-Step
- Developing Effective Educational Chatbots with ChatGPT prompts: Insights from Preliminary Tests in a Case Study on Social Media Literacy
- Development and validation of large language model rating scales for automatically transcribed psychological therapy sessions
- Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time
- Diagnostic Reasoning Prompts Reveal the Potential for Large Language Model Interpretability in Medicine
- Dialog Inpainting: Turning Documents into Dialogs
- Dialoging Resonance: How Users Perceive, Reciprocate and React to Chatbot’s Self-Disclosure in Conversational Recommendations
- Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources
- Dialogue State Tracking with a Language Model using Schema-Driven Prompting
- Dialogue Transformers
- DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs
- DiaSynth: Synthetic Dialogue Generation Framework for Low Resource Dialogue Applications
- Diffusion Language Models Know the Answer Before Decoding
- Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing
- Diffusion Models are Evolutionary Algorithms
- Diffusion-LM Improves Controllable Text Generation
- Diplomat: A Dialogue Dataset for Situated PragMATic Reasoning
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- Disambiguating Anthropomorphism and Anthropomimesis in Human-Robot Interaction
- Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus
- Discourse-Level Representations can Improve Prediction of Degree of Anxiety
- Discovering Latent Concepts Learned in BERT
- Discursive Socratic Questioning: Evaluating the Faithfulness of Language Models’ Understanding of Discourse Relations
- DiscussLLM: Teaching Large Language Models When to Speak
- Dissociating language and thought in large language models
- Distilling LLMs' Decomposition Abilities into Compact Language Models
- Divide-or-Conquer? Which Part Should You Distill Your LLM?
- Do Cognitively Interpretable Reasoning Traces Improve LLM Performance?
- Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
- Do Large Language Models Latently Perform Multi-Hop Reasoning?
- Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
- Do Large Language Models Reason Causally Like Us? Even Better?
- Do large language models resemble humans in language use?
- Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom
- Do LLMs Encode Functional Importance of Reasoning Tokens?
- Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses
- Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models
- Do LLMs produce texts with "human-like" lexical diversity?
- Do LLMs Truly Understand When a Precedent Is Overruled?
- Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
- Do Models Really Learn to Follow Instructions? An Empirical Study of Instruction Tuning
- Do Prompt-Based Models Really Understand the Meaning of Their Prompts?
- Do Response Selection Models Really Know What's Next? Utterance Manipulation Strategies for Multi-turn Response Selection
- Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust
- Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models?
- Do They See What We See?
- Do We Trust ChatGPT as much as Google Search and Wikipedia?
- DOC: Improving Long Story Coherence With Detailed Outline Control
- Does It Make Sense to Speak of Introspection in Large Language Models?
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
- Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook
- Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models
- Doing Personal LAPS: LLM-Augmented Dialogue Construction for Personalized Multi-Session Conversational Search
- Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey
- Domain-specific Question Answering with Hybrid Search
- Don't "Overthink" Passage Reranking: Is Reasoning Truly Necessary?
- Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration
- DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration
- DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research
- DR-HAI: Argumentation-based Dialectical Reconciliation in Human-AI Interactions
- DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models
- Drop the Hierarchy and Roles: How Self-Organizing LLM Agents Outperform Designed Structures
- DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought
- Durably reducing conspiracy beliefs through dialogues with AI
- Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization
- Dynamic Planning with a LLM
- Dynamic Prompting: A Unified Framework for Prompt Tuning
- Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models
- Dynamic Task-Oriented Dialogue: A Comparative Study of Llama-2 and Bert in Slot Value Generation
- Dynamically Expandable Graph Convolution for Streaming Recommendation
- DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation
- Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
- Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions
- Educating LLMs like Human Students: Structure-aware Injection of Domain Knowledge
- Efficient Nearest Neighbor Language Models
- Efficient Reasoning with Balanced Thinking
- Efficient Reasoning with Hidden Thinking
- Efficient Reinforcement Learning via Large Language Model-based Search
- Efficient Streaming Language Models with Attention Sinks
- Efficient Tool Use with Chain-of-Abstraction Reasoning
- Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs
- Eliciting Latent Knowledge from Quirky Language Models
- Eliciting Reasoning in Language Models with Cognitive Tools
- Embarrassingly Shallow Autoencoders for Sparse Data
- Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation
- Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers
- Emergent Hierarchical Reasoning In LLMs Through Reinforcement Learning
- Emergent Introspective Awareness in Large Language Models
- Emerging Properties in Unified Multimodal Pretraining
- EmotionPrompt: Leveraging Psychology for Large Language Models Enhancement via Emotional Stimulus
- Empathetic Persuasion: Reinforcing Empathy and Persuasiveness in Dialogue Systems
- Empathy Through Multimodality in Conversational Interfaces
- Empirical Study of Symmetrical Reasoning in Conversational Chatbots
- Empowering Domain-Specific Language Models with Graph-Oriented Databases: A Paradigm Shift in Performance and Model Maintenance
- Empowering Psychotherapy with Large Language Models: Cognitive Distortion Detection through Diagnosis of Thought Prompting
- Enabling Explainable Recommendation in E-commerce with LLM-powered Product Knowledge Graph
- Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance
- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
- End-to-End Test-Time Training for Long Context
- Energy-Based Transformers are Scalable Learners and Thinkers
- Enhancing AI-Assisted Group Decision Making through LLM-Powered Devil's Advocate
- Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation
- Enhancing Large Language Model Induced Task-Oriented Dialogue Systems Through Look-Forward Motivated Goals
- Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision
- Enhancing Performance on Seen and Unseen Dialogue Scenarios using Retrieval-Augmented End-to-End Task-Oriented System
- Enhancing personalized multi-turn dialogue with curiosity reward
- Enhancing Pipeline-Based Conversational Agents with Large Language Model
- Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy
- Enhancing social cohesion with cooperative bots in societies of greedy, mobile individuals
- Entangled in Representations: Mechanistic Investigation of Cultural Biases in Large Language Models
- Equipping agents for the real world with Agent Skills
- Escaping the Verifier: Learning to Reason via Demonstrations
- Estimating AI productivity gains from Claude conversations
- Evaluating Emotional Nuances In Dialogue Summarization
- Evaluating Large Language Models at Evaluating Instruction Following
- Evaluating Large Language Models in Exercises of UML Class Diagram Modeling
- Evaluating Large Language Models in Theory of Mind Tasks
- Evaluating the Efficacy of Interactive Language Therapy Based on LLM for High-Functioning Autistic Adolescent Psychological Counseling
- Evaluating the psychometric properties of ChatGPT-generated questions
- Evaluating the Therapeutic Alliance With a Free-Text CBT Conversational Agent (Wysa): A Mixed-Methods Study
- Evaluating Theory of Mind and Internal Beliefs in LLM-Based Multi-Agent Systems
- Evaluating Very Long-Term Conversational Memory of LLM Agents
- Evaluation and Benchmarking of LLM Agents: A Survey
- Event-Aware Sentiment Factors from LLM-Augmented Financial Tweets: A Transparent Framework for Interpretable Quant Trading
- Everything Everywhere All at Once: LLMs Can In-Context Learn Multiple Tasks in Superposition
- Evidence of Human-Level Bonds Established With a Digital Conversational Agent: Cross-sectional, Retrospective Observational Study
- EVINCE: Optimizing Multi-LLM Dialogues Using Conditional Statistics and Information Theory
- Evolving Deeper LLM Thinking
- Existential Conversations with Large Language Models: Content, Community, and Culture
- Expanding Explainability: Towards Social Transparency in AI systems
- Expedient Assistance and Consequential Misunderstanding: Envisioning an Operationalized Mutual Theory of Mind
- Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy
- Explainable Compliance Detection with Multi-Hop Natural Language Inference on Assurance Case Structure
- Explainable Multimodal Emotion Reasoning
- Explainable Recommendation with Personalized Review Retrieval and Aspect Learning
- Explainable Recommendations via Attentive Multi-Persona Collaborative Filtering
- Explicit Inductive Inference using Large Language Models
- Exploiting Dialogue Acts and Context to Identify Argumentative Relations in Online Debates
- Exploiting Explainability to Design Adversarial Attacks and Evaluate Attack Resilience in Hate-Speech Detection Models
- Exploring Autonomous Agents: A Closer Look at Why They Fail When Completing Tasks
- Exploring Format Consistency for Instruction Tuning
- Exploring Large Language Models for Knowledge Graph Completion
- Exploring LLMs Applications in Law: A Literature Review on Current Legal NLP Approaches
- Exploring Student-AI Interactions in Vibe Coding
- Exploring the Frontiers of LLMs in Psychological Applications: A Comprehensive Review
- Exploring the Impact of Large Language Models on Recommender Systems: An Extensive Review
- Exploring the Potential of ChatGPT on Sentence Level Relations: A Focus on Temporal, Causal, and Discourse Relations
- Exploring the Potential of Large Language Models in Computational Argumentation
- Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers
- External Model Motivated Agents: Reinforcement Learning for Enhanced Environment Sampling
- Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
- Extracting memorized pieces of (copyrighted) books from open-weight language models
- Extrapolation by Association: Length Generalization Transfer in Transformers
- Extreme Multi-Label Skill Extraction Training using Large Language Models
- Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation
- Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model
- Faith and Fate: Limits of Transformers on Compositionality
- Faithful and Robust LLM-Driven Theorem Proving for NLI Explanations
- Fake News Detectors are Biased against Texts Generated by Large Language Models
- Fast and Slow Learning From Reviews
- Fast, Slow, and Tool-augmented Thinking for LLMs: A Review
- Federation of Agents: A Semantics-Aware Communication Fabric for Large-Scale Agentic AI
- FinCoT: Grounding Chain-of-Thought in Expert Financial Reasoning
- Find the Gap: AI, Responsible Agency and Vulnerability
- Finding Common Ground: Using Large Language Models to Detect Agreement in Multi-Agent Decision Conferences
- Fine-grained Hallucination Detection and Editing for Language Models
- Fine-tuning Language Models for Factuality
- Fine-tuning Large Language Model for Automated Algorithm Design
- Fine-tuning Pre-trained Language Models for Dialogical Argument Mining with Inference Anchoring Theory
- First Try Matters: Revisiting the Role of Reflection in Reasoning Models
- FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
- Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models
- Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities
- FlowReasoner: Reinforcing Query-Level Meta-Agents
- Flows: Building Blocks of Reasoning and Collaborating AI
- Forecasting the presence and intensity of hostility on Instagram using linguistic and social features
- Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning
- FormulaOne: Measuring the Depth of Algorithmic Reasoning Beyond Competitive Programming
- Foundation Priors
- Foundations of Large Language Models
- From Articles to Code: On-Demand Generation of Core Algorithms from Scientific Publications
- From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
- From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step
- From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers
- From Human to Machine Psychology: A Conceptual Framework for Understanding Well-Being in Large Language Models
- From Key Points to Key Point Hierarchy: Structured and Expressive Opinion Summarization
- From Language to Logic: A Bi-Level Framework for Structured Reasoning
- From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models
- From Local to Global: A Graph RAG Approach to Query-Focused Summarization
- From Louvain to Leiden: guaranteeing well-connected communities
- From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?
- From Persona to Person: Enhancing the Naturalness with Multiple Discourse Relations Graph Learning in Personalized Dialogue Generation
- From Prompt Engineering to Prompt Science With Human in the Loop
- From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
- From speaking like a person to being personal: The effects of personalized, regular interactions with conversational agents
- From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs
- From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning
- From Trial-and-Error to Improvement: A Systematic Analysis of LLM Exploration Mechanisms in RLVR
- From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents
- Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report
- Further Explorations on the Use of Large Language Models for Thematic Analysis. Open-Ended Prompts, Better Terminologies and Thematic Maps
- Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce
- GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks
- Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
- GenAI as a Power Persuader: How Professionals Get Persuasion Bombed When They Attempt to Validate LLMs
- Generalization through Memorization: Nearest Neighbor Language Models
- Generalization to New Sequential Decision Making Tasks with In-Context Learning
- Generating Proto-Personas through Prompt Engineering: A Case Study on Efficiency, Effectiveness and Empathy
- Generating Query-Relevant Document Summaries via Reinforcement Learning
- Generative Agent Simulations of 1,000 People
- Generative Agents: Interactive Simulacra of Human Behavior
- Generative AI in Real-World Workplaces
- Generative Interfaces for Language Models
- Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?
- Generator-Retriever-Generator: A Novel Approach to Open-domain Question Answering
- GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning
- GenRec: Large Language Model for Generative Recommendation
- GhostWriter: Augmenting Collaborative Human-AI Writing Experiences Through Personalization and Agency
- GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning
- GHRS: Graph-based Hybrid Recommendation System with Application to Movie Recommendation
- Goal Alignment in LLM-Based User Simulators for Conversational AI
- Goals, Plans, and Action Models
- Going Beyond Local: Global Graph-Enhanced Personalized News Recommendations
- GPT-4 as a Homework Tutor can Improve Student Engagement and Learning Outcomes
- GPT-4 is judged more human than humans in displaced and inverted Turing tests
- Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development
- Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks
- Graph of Thoughts: Solving Elaborate Problems with Large Language Models
- Graph-enhanced Large Language Models in Asynchronous Plan Reasoning
- GRASP: Municipal Budget AI Chatbots for Enhancing Civic Engagement
- Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
- Grounding Gaps in Language Model Generations
- Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
- Grounding Multilingual Multimodal LLMs With Cultural Knowledge
- Grounding ‘Grounding’ in NLP
- Guidance is All You Need: Temperature-Guided Reasoning in Large Language Models
- Guiding Large Language Models via Directional Stimulus Prompting
- H2HTalk: Evaluating Large Language Models as Emotional Companion
- Hallucinating with AI: AI Psychosis as Distributed Delusions
- Hallucination is Inevitable: An Innate Limitation of Large Language Models
- Harnessing Business and Media Insights with Large Language Models
- Has the Creativity of Large-Language Models Peaked? An Analysis of Inter- and Intra-LLM Variability
- Hello Again! LLM-powered Personalized Agent for Long-term Dialogue
- Hierarchical Reasoning Model
- HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches
- HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
- HiTKG: Towards Goal-Oriented Conversations via Multi-Hierarchy Learning
- Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
- Holy Grail 2.0: From Natural Language to Constraint Models
- HonestBait: Forward References for Attractive but Faithful Headline Generation
- Hop, Skip, and Overthink: Diagnosing Why Reasoning Models Fumble during Multi-Hop Analysis
- How AI Impacts Skill Formation
- How do Transformers Learn Implicit Reasoning?
- How Exposed Are UK Jobs to Generative AI? Developing and Applying a Novel Task-Based Index
- How Far Are We from Genuinely Useful Deep Research Agents?
- How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
- How Many Instructions Can LLMs Follow at Once?
- How much do language models memorize?
- How Multimodal LLMs Solve Image Tasks: A Lens on Visual Grounding, Task Reasoning, and Answer Decoding
- How new data permeates LLM knowledge and how to dilute it
- How Should We Meta-Learn Reinforcement Learning Algorithms?
- How susceptible are LLMs to Logical Fallacies?
- How to Correctly do Semantic Backpropagation on Language-based Agentic Systems
- How we built our multi-agent research system
- How well can large language models explain business processes?
- How Projective is Projective Content? Gradience in Projectivity and At-issueness
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
- Human-like Category Learning by Injecting Ecological Priors from Large Language Models into Neural Networks
- Humans learn to prefer trustworthy AI over human partners
- Humans or LLMs as the Judge? A Study on Judgement Biases
- Humans overrely on overconfident language models, across languages
- Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning
- Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing
- Hyperagents
- HyperBandit: Contextual Bandit with Hypernetwork for Time-Varying User Preferences in Streaming Recommendation
- Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models
- I like it... I like it not: Evaluating User Ratings Noise in Recommender Systems
- Identification of Propositional and Illocutionary Relations
- IFEvalCode: Controlled Code Generation
- IMBUE: Improving Interpersonal Effectiveness through Simulation and Just-in-time Feedback with Human-Language Model Interaction
- Implicit Chain of Thought Reasoning via Knowledge Distillation
- Improving Chain-of-Thought Reasoning via Quasi-Symbolic Abstractions
- Improving Conversational Recommender Systems via Transformer-based Sequential Modelling
- Improving Dialog Systems for Negotiation with Personality Modeling
- Improving Document-Level Sentiment Analysis with User and Product Context
- Improving Factuality and Reasoning in Language Models through Multiagent Debate
- Improving Generalization in Task-oriented Dialogues with Workflows and Action Plans
- Improving large language models with concept-aware fine-tuning
- Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards
- Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks
- In-context learning agents are asymmetric belief updaters
- Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems
- Inducing Positive Perspectives with Text Reframing
- Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs
- Inference-Aware Prompt Optimization for Aligning Black-Box Large Language Models
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
- Inference-Time Scaling for Generalist Reward Modeling
- Information-Theoretic Reward Decomposition for Generalizable RLHF
- Informed Named Entity Recognition Decoding For Generative Language Models
- Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey
- InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles
- Insert-expansions For Tool-enabled Conversational Agents
- Inspecting and Editing Knowledge Representations in Language Models
- INSPIRED: Toward Sociable Recommendation Dialog Systems
- Instance-adaptive Zero-shot Chain-of-Thought Prompting
- Instruction Induction: From Few Examples to Natural Language Task Descriptions
- Instruction Tuning for Large Language Models: A Survey
- Integrating Large Language Models and Reinforcement Learning for Non-Linear Reasoning
- IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems
- Intelligent AI Delegation
- Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation
- Intent-calibrated Self-training for Answer Selection in Open-domain Dialogues
- Interacting with Non-Cooperative User: A New Paradigm for Proactive Dialogue Policy
- Interaction Dynamics as a Reward Signal for LLMs
- Interactions with generative AI chatbots: unveiling dialogic dynamics, students’ perceptions, and practical competencies in creative problem-solving
- Interesting Scientific Idea Generation Using Knowledge Graphs and LLMs: Evaluations with 100 Research Group Leaders
- Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation
- Interpretation modeling: Social grounding of sentences by reasoning over their implicit moral judgments
- Interrogator
- Intrinsically Motivated Graph Exploration Using Network Theories of Human Curiosity
- InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models
- Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting
- Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data
- Investigating Gender Bias in Language Models Using Causal Mediation Analysis
- Investigating task-specific prompts and sparse autoencoders for activation monitoring
- Irony in Emojis: A Comparative Study of Human and LLM Interpretation
- Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
- Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?
- Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs
- It’s All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
- J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
- Jamba: A Hybrid Transformer-Mamba Language Model
- JointLK: Joint Reasoning with Language Models and Knowledge Graphs for Commonsense Question Answering
- Jointly Reinforcing Diversity and Quality in Language Model Generations
- Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
- KellyBench: Can Language Models Beat the Market?
- KETOD: Knowledge-Enriched Task-Oriented Dialogue
- KGAT: Knowledge Graph Attention Network for Recommendation
- KiPT: Knowledge-injected Prompt Tuning for Event Detection
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
- Knowledge Distillation for Enhancing Walmart E-commerce Search Relevance Using Large Language Models
- Knowledge Graph Prompting for Multi-Document Question Answering
- Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains
- Knowledge Retrieval Based on Generative AI
- Knowledge-enhanced Mixed-initiative Dialogue System for Emotional Support Conversations
- KoLA: Carefully Benchmarking World Knowledge of Large Language Models
- KTO: Model Alignment as Prospect Theoretic Optimization
- Language Agents as Optimizable Graphs
- Language as a Cognitive Tool to Imagine Goals in Curiosity-Driven Exploration
- Language Model Personalization via Reward Factorization
- Language Modeling by Language Models
- Language Modeling is Compression
- Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
- Language Models are Pragmatic Speakers
- Language models are weak learners
- Language Models Learn to Mislead Humans via RLHF
- Language models show human-like content effects on reasoning tasks
- Language Models’ Hall of Mirrors Problem: Why AI Alignment Requires Peircean Semiosis
- Large Action Models: From Inception to Implementation
- Large Causal Models From Large Language Models
- Large Concept Models: Language Modeling in a Sentence Representation Space
- Large Language Diffusion Models
- Large Language Model Agents Are Not Always Faithful Self-Evolvers
- Large Language Model based Multi-Agents: A Survey of Progress and Challenges
- Large Language Model Guided Tree-of-Thought
- Large Language Model Programs
- Large Language Model Reasoning Failures
- Large Language Model-based Data Science Agent: A Survey
- Large Language Model-Brained GUI Agents: A Survey
- Large Language Models and Knowledge Graphs: Opportunities and Challenges
- Large Language Models are as persuasive as humans, but how? About the cognitive effort and moral-emotional language of LLM arguments
- Large Language Models Are Human-level Prompt Engineers
- Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners
- Large Language Models are Zero-Shot Rankers for Recommender Systems
- Large Language Models as Planning Domain Generators
- Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?
- Large Language Models as Zero-Shot Conversational Recommenders
- Large Language Models can accomplish Business Process Management Tasks
- Large Language Models Can Infer Psychological Dispositions of Social Media Users
- Large language models can segment narrative events similarly to humans
- Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation
- Large Language Models Do Not Simulate Human Psychology
- Large Language Models For Social Networks: Applications, Challenges, And Solutions
- Large Language Models for User Interest Journeys
- Large Language Models Know Your Contextual Search Intent: A Prompting Framework for Conversational Search
- Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities
- Large Language Models Reflect the Ideology of their Creators
- Large Language Models Report Subjective Experience Under Self-Referential Processing
- Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions
- Large language models surpass human experts in predicting neuroscience results
- Large Language Models Think Too Fast To Explore Effectively
- Large Linguistic Models: Investigating LLMs' metalinguistic abilities
- Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency
- Large Multimodal Agents: A Survey
- Large Scale Product Graph Construction for Recommendation in E-commerce
- Latent Collaboration in Multi-Agent Systems
- Latent Skill Discovery for Chain-of-Thought Reasoning
- LatentQA: Teaching LLMs to Decode Activations Into Natural Language
- Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers
- Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments
- Learning "Partner-Aware" Collaborators in Multi-Party Collaboration
- Learning Distributed Representations from Reviews for Collaborative Filtering
- Learning Human-Object Interaction as Groups
- Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries
- Learning Retrieval Augmentation for Personalized Dialogue Generation
- Learning to (Learn at Test Time): RNNs with Expressive Hidden States
- Learning to Ask Appropriate Questions in Conversational Recommendation
- Learning to Ask Critical Questions for Assisting Product Search
- Learning to Discover at Test Time
- Learning To Guide Human Experts Via Personalized Large Language Models
- Learning to Map Context-Dependent Sentences to Executable Formal Queries
- Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge
- Learning to Rank for Recommender Systems
- Learning to Reason for Factuality
- Learning to Reason without External Rewards
- Learning to Relate to Previous Turns in Conversational Search
- Learning To Retrieve Prompts for In-Context Learning
- Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering
- Learning to Select the Relevant History Turns in Conversational Question Answering
- Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs
- Learning Vector-Quantized Item Representation for Transferable Sequential Recommenders
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
- LESS: Selecting Influential Data for Targeted Instruction Tuning
- Lessons Learnt From Consolidating ML Models in a Large Scale Recommendation System
- Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models
- Let Me Think! A Long Chain-of-Thought Can Be Worth Exponentially Many Short Ones
- Let’s Verify Step by Step
- Leveraging Approximate Symbolic Models for Reinforcement Learning via Skill Diversity
- Leveraging Few-Shot Data Augmentation and Waterfall Prompting for Response Generation
- Leveraging Large Language Models in Conversational Recommender Systems
- Leveraging LLMs for KPIs Retrieval from Hybrid Long-Document: A Comprehensive Framework and Dataset
- Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning
- Lexical Entrainment for Conversational Systems
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
- Lil-Bevo: Explorations of Strategies for Training Language Models in More Humanlike Ways
- LIMA: Less Is More for Alignment
- LIMI: Less is More for Agency
- LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling
- Linguistic Alignment in Conversational AI: A Systematic Review of Cognitive-Linguistic Dimensions, Measurements, and User Outcomes (2020–2025)
- Linguistic Blind Spots of Large Language Models
- Linguistic markers of inherently false AI communication and intentionally false human communication: Evidence from hotel reviews
- LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries
- LLaMA-Omni: Seamless Speech Interaction with Large Language Models
- LLM Augmentations to support Analytical Reasoning over Multiple Documents
- LLM Generated Persona is a Promise with a Catch
- LLM Post-Training: A Deep Dive into Reasoning Large Language Models
- LLM Reasoning Is Latent, Not the Chain of Thought
- LLM Strategic Reasoning: Agentic Study through Behavioral Game Theory
- LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
- LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices
- LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback
- LLM-Rec: Personalized Recommendation via Prompting Large Language Models
- LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
- LLMatic: Neural Architecture Search via Large Language Models and Quality Diversity Optimization
- LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools
- LLMs are Frequency Pattern Learners in Natural Language Inference
- LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
- LLMs as Architects and Critics for Multi-Source Opinion Summarization
- LLMs as Method Actors: A Model for Prompt Engineering and Architecture
- LLMs can be Fooled into Labelling a Document as Relevant
- LLMs Can Covertly Sandbag on Capability Evaluations Against Chain-of-Thought Monitoring
- LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!
- LLMs can implicitly learn from mistakes in-context
- LLMs Get Lost In Multi-Turn Conversation
- LLMs Struggle to Reject False Presuppositions when Misinformation Stakes are High
- Local Coherence or Global Validity? Investigating RLVR Traces in Math Domains
- Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning
- Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models
- Logical Reasoning in Large Language Models: A Survey
- LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models
- Long-context LLMs Struggle with Long In-context Learning
- Long-form Factuality in Large Language Models
- Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning
- LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering
- LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs
- Looking beyond the next token
- Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers
- Lost in Inference: Rediscovering the Role of Natural Language Inference for Large Language Models
- Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs
- LR^2Bench: Evaluating Long-chain Reflective Reasoning Capabilities of Large Language Models via Constraint Satisfaction Problems
- LSR: Reinforcement Learning with Supervised Reward Outperforms SFT in Instruction Following
- Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
- Machine ex machina: A Framework Decentering the Human in AI Design Praxis
- Machine gaze in online behavioral targeting: The effects of algorithmic human likeness on social presence and social influence
- Machine Psychology
- Magentic-UI: Towards Human-in-the-loop Agentic Systems
- Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
- Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning
- Making Sense of Memory in AI Agents
- Man vs machine – Detecting deception in online reviews
- MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving
- MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization
- MasRouter: Learning to Route LLMs for Multi-Agent Systems
- Mastering Diverse Domains through World Models
- MatFormer: Nested Transformer for Elastic Inference
- Mathematical methods and human thought in the age of AI
- MCP-Zero: Proactive Toolchain Construction for LLM Agents from Scratch
- Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
- Meanings are like Onions: a Layered Approach to Metaphor Processing
- Measuring Agents in Production
- Measuring Alliance and Symptom Severity in Psychotherapy Transcripts Using Bert Topic Modeling
- Measuring and Mitigating Persona Distortions from AI Writing Assistance
- Measuring Faithfulness in Chain-of-Thought Reasoning
- Measuring Human Preferences in RLHF is a Social Science Problem
- Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models
- Measuring the Value of Social Dynamics in Online Product Ratings Forums
- Mechanisms of Introspective Awareness
- Mechanistic Indicators of Understanding in Large Language Models
- Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications
- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
- MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation
- Memorization and Knowledge Injection in Gated LLMs
- Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
- Memory Sandbox: Transparent and Interactive Memory Management for Conversational Agents
- Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
- Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
- Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving
- Metacognitive Prompting Improves Understanding in Large Language Models
- Metacognitive Retrieval-Augmented Large Language Models
- Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors
- Metadiscursive nouns in academic argument: ChatGPT vs student practices
- MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework
- MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems
- Methodologies for Improving Modern Industrial Recommender Systems
- Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
- Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse
- Minds versus Machines: Rethinking Entailment Verification with Language Models
- MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
- Mindstorms in Natural Language-Based Societies of Mind
- MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
- Mining Hidden Thoughts from Texts: Evaluating Continual Pretraining with Synthetic Data for LLM Reasoning
- Misaligned by Design: Incentive Failures in Machine Learning
- Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
- Mitigating Hallucinations in Large Language Models via Causal Reasoning
- Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just What They Say
- Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models
- MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement
- MLLM-CBench: A Comprehensive Benchmark for Continual Instruction Tuning of Multimodal LLMs with Chain-of-Thought Reasoning Analysis
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
- Model Organisms for Emergent Misalignment
- Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
- Modeling Appropriate Language in Argumentation
- Modeling Code: Is Text All You Need?
- Modeling Interpersonal Linguistic Coordination in Conversations using Word Mover's Distance
- Modeling the Quality of Dialogical Explanations
- MODS: Moderating a Mixture of Document Speakers to Summarize Debatable Queries in Document Collections
- MOMENTS: A Comprehensive Multimodal Benchmark for Theory of Mind
- Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
- Monolith: Real Time Recommendation System With Collisionless Embedding Table
- MoodAngels: A Retrieval-augmented Multi-agent Framework for Psychiatry Diagnosis
- Mostly Exploration-Free Algorithms for Contextual Bandits
- Multi-Agent Collaborative Intelligence: Dual-Dial Control for Reliable LLM Reasoning
- Multi-agent cooperation through in-context co-player inference
- Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation
- Multi-hop Question Answering via Reasoning Chains
- Multi-Task End-to-End Training Improves Conversational Recommendation
- Multi-Token Attention
- Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
- MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
- Natural Emergent Misalignment From Reward Hacking In Production RL
- Navigating the State of Cognitive Flow: Context-Aware AI Interventions for Effective Reasoning Support
- Nested Attention: Semantic-aware Attention Values for Concept Personalization
- Nested Learning: The Illusion of Deep Learning Architecture Expanded
- Nested Learning: The Illusion of Deep Learning Architectures
- Neural Approaches to Conversational AI
- Neural Assistant: Joint Action Prediction, Response Generation, and Latent Knowledge Reasoning
- Neural Collaborative Filtering
- Neural Collaborative Filtering vs. Matrix Factorization Revisited
- Neural Conversation Models and How to Rein Them in: A Survey of Failures and Fixes
- Neural Topic Modeling of Psychotherapy Sessions
- Neuro-Symbolic AI in 2024: A Systematic Review
- NeuroQL: A Neuro-Symbolic Language and Dataset for Inter-Subjective Reasoning
- Neurosymbolic AI - Why, What, and How
- Neutralizing Bias in LLM Reasoning using Entailment Graphs
- News Sentiment Embeddings for Stock Price Forecasting
- News Source Citing Patterns in AI Search Systems
- Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction
- Next Steps for Human-Centered Generative AI: A Technical Perspective
- No that's not what I meant: Handling Third Position Repair in Conversational Question Answering
- Not All Parameters Are Created Equal: Smart Isolation Boosts Fine-Tuning Performance
- NoveltyBench: Evaluating Language Models for Humanlike Diversity
- Octopus v2: On-device language model for super agent
- Octopus v4: Graph of language models
- Off-Policy Evaluation for Large Action Spaces via Policy Convolution
- OMNI-SIMPLEMEM: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory
- Omni-Thinker: Scaling Multi-Task RL in LLMs with Hybrid Reward and Task Scheduling
- OmniParser for Pure Vision Based GUI Agent
- OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
- On Generative Agents in Recommendation
- On Information Distortions in Online Ratings
- On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents
- On the Adaptive Psychological Persuasion of Large Language Models
- On the Binding Problem in Artificial Neural Networks
- On the Conversational Basis of Some Presuppositions
- On the Impact of Fine-Tuning on Chain-of-Thought Reasoning
- On the Limits of Innate Planning in Large Language Models
- On The Persona-based Summarization of Domain-Specific Documents
- On the Reasoning Capacity of AI Models and How to Quantify It
- On the Relationship between Sentence Analogy Identification and Sentence Structure Encoding in Large Language Models
- On the Roles of LLMs in Planning: Embedding LLMs into Planning Graphs
- On the Theoretical Limitations of Embedding-Based Retrieval
- On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
- Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback
- Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models
- Open Problems in Mechanistic Interpretability
- OpenAgents: An Open Platform for Language Agents in the Wild
- OpenAssistant Conversations - Democratizing Large Language Model Alignment
- OpenClaw-RL: Train Any Agent Simply by Talking
- OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
- OpenThoughts: Data Recipes for Reasoning Models
- Operating Multi-Client Influence Networks Across Platforms
- OpinionConv: Conversational Product Search with Grounded Opinions
- Opportunities for large language models and discourse in engineering design
- OptimalThinkingBench: Evaluating Over and Underthinking in LLMs
- Orchestrating Synthetic Data with Reasoning
- Outcome-based Exploration for LLM Reasoning
- Overconfidence in LLM-as-a-Judge: Diagnosis and Confidence-Driven Solution
- Overview of DialAM-2024: Argument Mining in Natural Language Dialogues
- PaperOrchestra: A Multi-Agent Framework for Automated AI Research Paper Writing
- Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
- PATIENT-Ψ: Using Large Language Models to Simulate Patients for Training Mental Health Professionals
- Peer-Preservation in Frontier Models
- PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods
- People cannot distinguish GPT-4 from a human in a Turing test
- Performative Thinking? The Brittle Correlation Between CoT Length and Problem Complexity
- Persistent Pre-Training Poisoning of LLMs
- PersLLM: A Personified Training Approach for Large Language Models
- Persona Generators: Generating Diverse Synthetic Personas at Scale
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models
- Persona-Assigned Large Language Models Exhibit Human-Like Motivated Reasoning
- PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time
- PersonaGym: Evaluating Persona Agents and LLMs
- Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
- Personalization of Large Language Models: A Survey
- Personalized Dialogue Generation with Persona-Adaptive Attention
- Personalized Language Modeling from Personalized Human Feedback
- Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
- PersonaPKT: Building Personalized Dialogue Agents via Parameter-efficient Knowledge Transfer
- Persuasive presuppositions
- PersuasiveToM: A Benchmark for Evaluating Machine Theory of Mind in Persuasive Dialogues
- Perturbation CheckLists for Evaluating NLG Evaluation Metrics
- Pixel-Level Reasoning Segmentation via Multi-turn Conversations
- Pixels, Patterns, but No Poetry: To See The World like Humans
- Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts
- Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1
- Planning Like Human: A Dual-process Framework for Dialogue Planning
- PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers
- Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
- Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents
- Polanyi’s Revenge and AI’s New Romance with Tacit Knowledge
- PolyResponse: A Rank-based Approach to Task-Oriented Dialogue with Application in Restaurant Search and Booking
- POMDP-based Statistical Spoken Dialogue Systems: a Review
- Position: LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
- Position: Towards Bidirectional Human-AI Alignment
- Post-Completion Learning for Language Models
- Post-training for Efficient Communication via Convention Formation
- Post-Training Large Language Models via Reinforcement Learning from Self-Feedback
- PosterMate: Audience-driven Collaborative Persona Agents for Poster Design
- Posting versus Lurking: Communicating in a Multiple Audience Context
- Potemkin Understanding in Large Language Models
- Pragmatic Implicature Processing in ChatGPT
- Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
- Pre-Trained Policy Discriminators are General Reward Models
- Precise Zero-Shot Dense Retrieval without Relevance Labels
- Predictive Preference Learning from Human Interventions
- Preference Discerning with LLM-Enhanced Generative Retrieval
- Prefix-Tuning: Optimizing Continuous Prompts for Generation
- PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts
- Premise Order Matters in Reasoning with Large Language Models
- Premise-Augmented Reasoning Chains Improve Error Identification in Math reasoning with LLMs
- Presuppositions are more persuasive than assertions if addressees accommodate them: Experimental evidence for philosophical reasoning
- Pretrained Language Models as Containers of the Discursive Knowledge
- PRewrite: Prompt Rewriting with Reinforcement Learning
- PRIME: Large Language Model Personalization with Cognitive Memory and Thought Processes
- Pro-Active Systems and Influenceable Users: Simulating Pro-Activity in Task-oriented Dialogues
- Proactive behavior in voice assistants: A systematic review and conceptual model
- Proactive Conversational Agents in the Post-ChatGPT World
- Proactive Conversational Agents with Inner Thoughts
- Proactive Human-Machine Conversation with Explicit Conversation Goals
- Proactive Moderation of Online Discussions: Existing Practices and the Potential for Algorithmic Support
- ProAgent: Building Proactive Cooperative Agents with Large Language Models
- Probing Structured Semantics Understanding and Generation of Language Models via Question Answering
- Probing the Multi-turn Planning Capabilities of LLMs via 20 Question Games
- Problems with Cosine as a Measure of Embedding Similarity for High Frequency Words
- Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
- Process Reward Models That Think
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
- Progress Measures For Grokking Via Mechanistic Interpretability
- Progressive-Hint Prompting Improves Reasoning in Large Language Models
- Prompt Architecture Determines Reasoning Quality: A Variable Isolation Study on the Car Wash Problem
- Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
- Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
- Prompted LLMs as Chatbot Modules for Long Open-domain Conversation
- Prompting and Evaluating Large Language Models for Proactive Dialogues: Clarification, Target-guided, and Non-collaboration
- Prompting Large Language Models for Recommender Systems: A Comprehensive Framework and Empirical Analysis
- Prompting Large Language Models With the Socratic Method
- Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?
- Propositional Interpretability in Artificial Intelligence
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
- ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
- ProsocialDialog: A Prosocial Backbone for Conversational Agents
- ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
- Proxona: Leveraging LLM-Driven Personas to Enhance Creators' Understanding of Their Audience
- PsychAdapter: Adapting LLM Transformers to Reflect Traits, Personality and Mental Health
- Psyche-R1: Towards Reliable Psychological LLMs through Unified Empathy, Expertise, and Reasoning
- Psychological, Relational, and Emotional Effects of Self-Disclosure After Conversations With a Chatbot
- Psychologically Enhanced AI Agents
- Psychotherapy AI Companion with Reinforcement Learning Recommendations and Interpretable Policy Dynamics
- PsyDT: Using LLMs to Construct the Digital Twin of Psychological Counselor with Personalized Counseling Style for Psychological Counseling
- Pushdown Layers: Encoding Recursive Structure in Transformer Language Models
- Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability
- QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration
- Quantifying Controversy on Social Media
- Quantifying Human-AI Synergy
- Quantitative Introspection in Language Models: Tracking Internal States Across Conversation
- Query Rewriting for Retrieval-Augmented Large Language Models
- Query Understanding in the Age of Large Language Models
- QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?
- Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
- R-Zero: Self-Evolving Reasoning LLM from Zero Data
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
- RAG Does Not Work for Enterprises
- RAG-Gym: Systematic Optimization of Language Agents for Retrieval-Augmented Generation
- RAG-R1: Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism
- Ranking Free RAG: Replacing Re-ranking with Selection in RAG for Sensitive Domains
- RARR: Researching and Revising What Language Models Say, Using Language Models
- Re3: Generating Longer Stories With Recursive Reprompting and Revision
- ReAct: Synergizing Reasoning and Acting in Language Models
- Real-time News Story Identification
- Real-Time Procedural Learning From Experience for AI Agents
- Real-World Planning with PDDL+ and Beyond
- ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
- Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models
- Reasoning Can Hurt the Inductive Abilities of Large Language Models
- Reasoning Circuits in Language Models: A Mechanistic Interpretation of Syllogistic Inference
- Reasoning Language Models: A Blueprint
- Reasoning LLMs are Wandering Solution Explorers
- Reasoning Models Are More Easily Gaslighted Than You Think
- Reasoning Models Can Be Effective Without Thinking
- Reasoning Models Don't Always Say What They Think
- Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
- Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
- Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize?
- Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought
- Reasoning to Learn from Latent Thoughts
- Reasoning with Large Language Models, a Survey
- ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering
- Recent Trends in Personalized Dialogue Generation: A Review of Datasets, Methodologies, and Evaluations
- RecExplainer: Aligning Large Language Models for Recommendation Model Interpretability
- Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)
- Recommendation systems and convergence of online reviews: The type of product network matters!
- Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations
- Recommender Systems with Social Regularization
- Recommending What Video to Watch Next: A Multitask Ranking System
- ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs
- Reconciling the accuracy-diversity trade-off in recommendations
- Recursive Introspection: Teaching Language Model Agents How to Self-Improve
- Recursive Language Models
- Reflect then Learn: Active Prompting for Information Extraction Guided by Introspective Confusion
- Reflexion: an autonomous agent with dynamic memory and self-reflection
- Reinforced Language Models for Sequential Decision Making
- Reinforcement Learning be Enough for Thinking?
- Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
- Reinforcement Learning for Optimizing RAG for Domain Chatbots
- Reinforcement Learning for Reasoning in Large Language Models with One Training Example
- Reinforcement Learning with Rubric Anchors
- Reinforcement Pre-Training
- Reinforcing General Reasoning without Verifiers
- RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns
- Representation biases: will we achieve complete understanding by analyzing representations?
- Representation Engineering: A Top-Down Approach to AI Transparency
- Reranking-based Generation for Unbiased Perspective Summarization
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
- Researchy Questions: A Dataset of Multi-Perspective, Decompositional Questions for LLM Web Agents
- Rethinking Conversational Agents in the Era of LLMs: Proactivity, Non-collaborativity, and Beyond
- Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning
- Rethinking Large Language Models in Mental Health Applications
- Rethinking STS and NLI in Large Language Models
- Rethinking Thinking Tokens: LLMs as Improvement Operators
- Rethinking with Retrieval: Faithful Large Language Model Inference
- Retrieval Head Mechanistically Explains Long-Context Factuality
- Retrieval-augmented reasoning with lean language models
- RevCore: Review-augmented Conversational Recommendation
- Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up
- Reverse Thinking Makes LLMs Stronger Reasoners
- Review-LLM: Harnessing Large Language Models for Personalized Review Generation
- Revisiting LLM Reasoning via Information Bottleneck
- Revisiting Prompt Engineering: A Comprehensive Evaluation for LLM-based Personalized Recommendation
- Revisiting RAG Ensemble: A Theoretical and Mechanistic Analysis of Multi-RAG System Collaboration
- Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
- Revolutionizing Mental Health Support: An Innovative Affective Mobile Framework for Dynamic, Proactive, and Context-Adaptive Conversational Agents
- Reward Reasoning Model
- Reward-Robust RLHF in LLMs
- RewardBench: Evaluating Reward Models for Language Modeling
- Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
- ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models
- Rhetoric, Logic, and Dialectic: Advancing Theory-based Argument Quality Assessment in Natural Language Processing
- Rhetorical XAI: Explaining AI’s Benefits as well as its Use via Rhetorical Design
- RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation
- Rise of Machine Agency: A Framework for Studying the Psychology of Human–AI Interaction (HAII)
- RL + Transformer = A General-Purpose Problem Solver
- RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs
- RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
- RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner
- RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
- RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
- RLHF Workflow: From Reward Modeling to Online RLHF
- RLNVR: Reinforcement Learning from Non-Verified Real-World Rewards
- RLP: Reinforcement as a Pretraining Objective
- RLPR: Extrapolating RLVR to General Domains without Verifiers
- RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
- RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents
- RM-R1: Reward Modeling as Reasoning
- Role play with large language models
- RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models
- RouteLLM: Learning to Route LLMs with Preference Data
- rStar2-Agent: Agentic Reasoning Technical Report
- Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
- Rule2Text: Natural Language Explanation of Logical Rules in Knowledge Graphs
- S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
- SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval
- Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
- Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning
- SAND: Boosting LLM Agents with Self-Taught Action Deliberation
- Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
- Scalable Language Models with Posterior Inference of Latent Thought Vectors
- Scalable Neural Contextual Bandit for Recommender Systems
- Scaling can lead to compositional generalization
- Scaling Expert Language Models with Unsupervised Domain Discovery
- Scaling Laws for Neural Language Models
- Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
- Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
- Scaling Synthetic Data Creation with 1,000,000,000 Personas
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
- Schema-learning and rebinding as mechanisms of in-context learning and emergence
- SciTopic: Enhancing Topic Discovery in Scientific Literature through Advanced LLM
- SDPO: Segment-Level Direct Preference Optimization for Social Agents
- SEAL: Self-Evolving Agentic Learning for Conversational Question Answering over Knowledge Graphs
- Search Arena: Analyzing Search-Augmented LLMs
- Search-o1: Agentic Search-Enhanced Large Reasoning Models
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
- Searching for Best Practices in Retrieval-Augmented Generation
- See you soon again, chatbot? A design taxonomy to characterize user-chatbot relationships with different time horizons
- Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
- Seemingly Conscious AI Risks
- Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
- Self Selection and Information Role of Online Product Reviews
- Self-Adapting Language Models
- Self-Adaptive Large Language Model (LLM)-Based Multiagent Systems
- Self-Alignment with Instruction Backtranslation
- Self-consistency Improves Chain Of Thought Reasoning In Language Models
- Self-critiquing models for assisting human evaluators
- Self-Directed Synthetic Dialogues and Revisions Technical Report
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
- Self-distillation Enables Continual Learning
- Self-Evaluation Guided Beam Search for Reasoning
- Self-Improving Model Steering
- Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
- SELF-INSTRUCT: Aligning Language Models with Self-Generated Instructions
- Self-Organizing Graph Reasoning Evolves into a Critical State for Continuous Discovery Through Structural-Semantic Dynamics
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
- Self-Questioning Language Models
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
- Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst
- Self-Refine: Iterative Refinement with Self-Feedback
- Self-reflecting Large Language Models: A Hegelian Dialectical Approach
- Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
- Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution?
- Self-reinforcing cascades: A spreading model for beliefs or products of varying intensity or quality
- Self-Rewarding Language Models
- Self-Rewarding Vision-Language Model via Reasoning Decomposition
- Self-Supervised Models of Speech Infer Universal Articulatory Kinematics
- Self-Taught Evaluators
- Semantic Change Characterization with LLMs using Rhetorics
- Semantic Parsing for Task Oriented Dialog using Hierarchical Representations
- Semantic Specialization for Knowledge-based Word Sense Disambiguation
- Semantic Structure in Large Language Model Embeddings
- Sequence Organization in Interaction: A Primer in Conversation Analysis
- SERL: Self-Examining Reinforcement Learning on Open-Domain
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
- Shaping Explanations: Semantic Reward Modeling with Encoder-Only Transformers for GRPO
- Should Humans Lie to Machines? The Incentive Compatibility of Lasso and General Weighted Lasso
- Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue
- ShowUI: One Vision-Language-Action Model for GUI Visual Agent
- Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds
- Simple Synthetic Data Reduces Sycophancy In Large Language Models
- SimPO: Simple Preference Optimization with a Reference-Free Reward
- Simulacra as conscious exotica
- Simulating Society Requires Simulating Thought
- Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning Under Equal Thinking Token Budgets
- Single-agent or Multi-agent Systems? Why Not Both?
- Situating Recommender Systems in Practice: Towards Inductive Learning and Incremental Updates
- SkillClaw: Let Skills Evolve Collectively with Agentic Evolver
- Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models
- Sleep-time Compute: Beyond Inference Scaling at Test-time
- Small Language Models are the Future of Agentic AI
- Small LLMs Are Weak Tool Learners: A Multi-LLM Agent
- SMILE: Evaluation and Domain Adaptation for Social Media Language Understanding
- Social Responses to Media Technologies in the 21st Century: The Media are Social Actors Paradigm
- Social Robots for Long-Term Interaction: A Survey
- Social Skill Training with Large Language Models
- SocraSynth: Multi-LLM Reasoning with Conditional Statistics
- Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space
- Soft Tokens, Hard Truths
- SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs
- SOLOIST: Building Task Bots at Scale with Transfer Learning and Machine Teaching
- Solving a Million-Step LLM Task with Zero Errors
- SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
- Sources of Hallucination by Large Language Models on Inference Tasks
- SParC: Cross-Domain Semantic Parsing in Context
- Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
- SPICE: Self-Play In Corpus Environments Improves Reasoning
- SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
- Spurious Forgetting in Continual Learning of Language Models
- Spurious Rewards: Rethinking Training Signals in RLVR
- SSRL: Self-Search Reinforcement Learning
- Stance Detection on Social Media with Fine-Tuned Large Language Models
- Statistical and Algorithmic Foundations of Reinforcement Learning
- SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF
- StepWiser: Stepwise Generative Judges for Wiser Reasoning
- Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
- Strategic Reasoning with Language Models
- Stream of Search (SoS): Learning to Search in Language
- Stress Testing Deliberative Alignment for Anti-Scheming Training
- StructGPT: A General Framework for Large Language Model to Reason over Structured Data
- StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
- Structured and Natural Responses Co-generation for Conversational Search
- Study: Large language models can’t effectively recognize users’ motivation, but can support behavior change for those ready to act
- Style Vectors for Steering Generative Large Language Models
- Subliminal Learning: Language models transmit behavioral traits via hidden signals in data
- Summaries, Highlights, and Action items: Design, implementation and evaluation of an LLM-powered meeting recap system
- Supervised Pretraining Can Learn In-Context Reinforcement Learning
- SupervisorBot: NLP-Annotated Real-Time Recommendations of Psychotherapy Treatment Strategies with Deep Reinforcement Learning
- Supporting Physical Activity Behavior Change with LLM-Based Conversational Agents
- Suppressing Pink Elephants with Direct Principle Feedback
- Survey on Evaluation of LLM-based Agents
- Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models
- Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories
- Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians
- SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs
- Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models
- Synthetic Dialogue Dataset Generation using LLM Agents
- System 1 vs. System 2 Thinking
- System 2 Attention (is something you might need too)
- Systematic synthesis of design prompts for large language models in conceptual design
- Tailored Conversations beyond LLMs: A RL-Based Dialogue Manager
- TaleStream: Supporting Story Ideation with Trope Knowledge
- Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs
- Talk like a Graph: Encoding Graphs for Large Language Models
- Talking About Large Language Models
- TarGEN: Targeted Data Generation with Large Language Models
- Target-Guided Open-Domain Conversation
- Task Contamination: Language Models May Not Be Few-Shot Anymore
- Task-Oriented Dialogue as Dataflow Synthesis
- Task-Oriented Dialogue with In-Context Learning
- TaskLAMA: Probing the Complex Task Understanding of Language Models
- TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation
- Teaching Large Language Models to Reason with Reinforcement Learning
- Teaching Probabilistic Logical Reasoning to Transformers
- Tell me about yourself: LLMs are aware of their learned behaviors
- Temporal Self-Rewarding Language Models: Decoupling Chosen-Rejected via Past-Future
- Test-time Prompt Intervention
- Test-Time Scaling with Reflective Generative Model
- TextGrad: Automatic “Differentiation” via Text
- The Abstraction Fallacy: Why AI Can Simulate But Not Instantiate Consciousness
- The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs
- The Architectural Implications of Facebook’s DNN-based Personalized Recommendation
- The Argument Reasoning Comprehension Task: Identification and Reconstruction of Implicit Warrants
- The Art of Scaling Reinforcement Learning Compute for LLMs
- The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models
- The Challenges in Designing a Prevention Chatbot for Eating Disorders: Observational Study
- The Consensus Game: Language Model Generation via Equilibrium Search
- The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think
- The Curse Of Recursion: Training On Generated Data Makes Models Forget
- The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind
- The Demon is in Ambiguity: Revisiting Situation Recognition with Single Positive Multi-Label Learning
- The Digital Therapeutic Alliance and Human-Computer Interaction
- The Digital Therapeutic Alliance: Prospects and Considerations
- The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation
- The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis
- The Emotion-Memory Link: Do Memorability Annotations Matter for Intelligent Systems?
- The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
- The False Promise of Imitating Proprietary LLMs
- The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation
- The Future of AI: Exploring the Potential of Large Concept Models
- The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs
- The Hallucination Tax of Reinforcement Finetuning
- The Hermeneutics of Artificial Text
- The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas
- The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
- The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs
- The Illusion of the Illusion of the Illusion of Thinking
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- The Impact of AI-Generated Text on the Internet
- The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers
- The Incomplete Bridge: How AI Research (Mis)Engages with Psychology
- The Insanity of Relying on Vector Embeddings: Why RAG Fails
- The Invisible Leash: Why RLVR May Not Escape Its Origin
- The Labor Market Effects of Generative Artificial Intelligence
- The Levers of Political Persuasion with Conversational AI
- The LLM Fallacy: Misattribution in AI-Assisted Cognitive Workflows
- The Method of Critical AI Studies, A Propaedeutic
- The Missing Layer of AGI: From Pattern Alchemy to Coordination Physics
- The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning
- The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
- The Moral Turing Test: Evaluating Human-LLM Alignment in Moral Decision-Making
- The Netflix Recommender System: Algorithms, Business Value, and Innovation
- The Partner Modelling Questionnaire: A validated self-report measure of perceptions toward machines as dialogue partners
- The persuasive effects of political microtargeting in the age of generative artificial intelligence
- The Place of Emotion in Argument
- The Prompt Report: A Systematic Survey of Prompting Techniques
- The Return of Pseudosciences in Artificial Intelligence: Have Machine Learning and Deep Learning Forgotten Lessons from Statistics and History?
- The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
- The Serial Scaling Hypothesis
- The social component of the projection behavior of clausal complement contents
- The state of enterprise AI
- The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning
- The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
- The Thin Line Between Comprehension and Persuasion in LLMs
- The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities
- The Unreasonable Ineffectiveness of the Deeper Layers
- The Vanishing Gradient Problem for Stiff Neural Differential Equations
- The Vector Grounding Problem
- TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
- Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models
- Theory of Knowledge Based on the Idea of the Discursive Space
- Theory of Mind abilities of Large Language Models in Human-Robot Interaction: An Illusion?
- Think before you speak: Training Language Models With Pause Tokens
- Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens
- Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods
- Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
- Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate
- Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection
- Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking
- Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory
- Think-on-Graph: Deep and Responsible Reasoning of Large Language Model with Knowledge Graph
- Thinking Assistants: LLM-Based Conversational Assistants that Help Users Think By Asking rather than Answering
- Thinking Augmented Pre-training
- Thinking Forward and Backward: Effective Backward Planning with Large Language Models
- Thinking in Character: Advancing Role-Playing Agents with Role-Aware Reasoning
- Thinking Inside the Mask: In-Place Prompting in Diffusion LLMs
- Thinking LLMs: General Instruction Following with Thought Generation
- Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction
- Thinking—Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender
- Thinkless: LLM Learns When to Think
- Thought Anchors: Which LLM Reasoning Steps Matter?
- Thought Communication in Multiagent Collaboration
- Thought Virus: Viral Misalignment via Subliminal Prompting in Multi-Agent Systems
- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
- Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines
- Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation
- Through the Lens of Human-Human Collaboration: A Configurable Research Platform for Exploring Human-Agent Collaboration
- Tina: Tiny Reasoning Models via LoRA
- Titans: Learning to Memorize at Test Time
- To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
- To Tell The Truth: Language of Deception and Language Models
- TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
- Too Good to be Bad: On the Failure of LLMs to Role-Play Villains
- ToolFlow: Boosting LLM Tool-Calling Through Natural and Coherent Dialogue Synthesis
- Topic Modeling in Embedding Spaces
- Topic Shift Detection for Mixed Initiative Response
- Topic-Guided Conversational Recommender in Multiple Domains
- Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
- Toward Conversational Agents with Context and Time Sensitive Long-term Memory
- Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
- Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks
- Toward understanding and preventing misalignment generalization
- Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models
- Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models
- Towards a Science of Scaling Agent Systems
- Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
- Towards Algorithmic Experience
- Towards Collective Superintelligence, a Pilot Study
- Towards Conversational Recommendation over Multi-Type Dialogs
- Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset
- Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness?
- Towards Healthy AI: Large Language Models Need Therapists Too
- Towards Human-centered Proactive Conversational Agents
- Towards Large Reasoning Models: A Survey on Scaling LLM Reasoning Capabilities
- Towards Machine Theory of Mind with Large Language Model-Augmented Inverse Planning
- Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
- Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
- Towards Question-based Recommender Systems
- Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
- Towards Safe and Honest AI Agents with Neural Self-Other Overlap
- Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
- Towards Understanding Counseling Conversations: Domain Knowledge and Large Language Models
- Train Long, Think Short: Curriculum Learning for Efficient Reasoning
- Training a Generally Curious Agent
- Training Dialogue Systems by AI Feedback for Improving Overall Dialogue Impression
- Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
- Training language models to be warm and empathetic makes them less reliable and more sycophantic
- Training language models to follow instructions with human feedback
- Training Language Models to Self-Correct via Reinforcement Learning
- Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
- Training Large Language Models to Reason in a Continuous Latent Space
- Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning
- Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
- Training-Free Group Relative Policy Optimization
- Transcendence: Generative Models Can Outperform The Experts That Train Them
- Transformer-based cynical expression detection in a corpus of Spanish YouTube reviews
- Transformer2: Self-adaptive LLMs
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
- TREC iKAT 2023: A Test Collection for Evaluating Conversational and Interactive Knowledge Assistants
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- Tree Search for Language Model Agents
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search
- Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in Large Language Models
- Truly Self-Improving Agents Require Intrinsic Metacognitive Learning
- Trust in Human-AI Interaction: Scoping Out Models, Measures, and Methods
- TrustLLM: Trustworthiness in Large Language Models
- Truth or lie: Exploring the language of deception
- TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
- TTRL: Test-Time Reinforcement Learning
- Tube2Vec: Social and Semantic Embeddings of YouTube Channels
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training
- Tuning Language Models by Proxy
- Turiya at DialAM-2024: Inference Anchoring Theory Based LLM Parsers
- Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents
- Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion
- Turning large language models into cognitive models
- Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization
- TwoStep: Multi-agent Task Planning using Classical Planners and Large Language Models
- Typed-RAG: Type-aware Multi-Aspect Decomposition for Non-Factoid Question Answering
- UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity
- Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models
- Uncovering Latent Arguments in Social Media Messaging by Employing LLMs-in-the-Loop Strategy
- Understanding Before Reasoning: Enhancing Chain-of-Thought with Iterative Summarization Pre-Prompting
- Understanding Hidden Computations in Chain-of-Thought Reasoning
- Understanding the Role of User Profile in the Personalization of Large Language Models
- Understanding the Therapeutic Relationship between Counselors and Clients in Online Text-based Counseling using LLMs
- Understanding, explaining, and utilizing medical artificial intelligence
- Unified Conversational Recommendation Policy Learning via Graph-based Reinforcement Learning
- Unifying Large Language Models and Knowledge Graphs: A Roadmap
- Unifying Nearest Neighbors Collaborative Filtering
- UniGraph: Learning a Unified Cross-Domain Foundation Model for Text-Attributed Graphs
- Universe of Thoughts: Enabling Creative Reasoning with Large Language Models
- Unleashing Cognitive Synergy In Large Language Models: A Task-solving Agent Through Multi-persona Self-collaboration
- Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem
- Unlocking Varied Perspectives: A Persona-Based Multi-Agent Framework with Debate-Driven Text Planning for Argument Generation
- Unsupervised Elicitation of Language Models
- Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study
- UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation
- UR2: Unify RAG and Reasoning through Reinforcement Learning
- User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal
- User-Centric Conversational Recommendation with Multi-Aspect User Modeling
- UserBench: An Interactive Gym Environment for User-Centric Agents
- Using Computational Models to Test Syntactic Learnability
- Using Large Language Models to Create AI Personas for Replication and Prediction of Media Effects: An Empirical Test of 133 Published Experimental Research Findings
- Using Large Language Models to Generate, Validate, and Apply User Intent Taxonomies
- Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies
- Using Linguistic Synchrony to Evaluate Large Language Models for Cognitive Behavioral Therapy
- Using LLMs to Discover Legal Factors
- Using Natural Language for Reward Shaping in Reinforcement Learning
- Using Navigation to Improve Recommendations in Real-Time
- Using Topic Models to Identify Clients’ Functioning Levels and Alliance Ruptures in Psychotherapy
- Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
- Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
- Variational Autoencoders for Collaborative Filtering
- VCounselor: A Psychological Intervention Chat Agent Based on a Knowledge-Enhanced Large Language Model
- Verbal lie detection using Large Language Models
- Virtual Assistance in Any Context
- Virtuous Machines: Towards Artificial General Science
- VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
- Voxtral
- Voyager: An Open-Ended Embodied Agent with Large Language Models
- We Are All Creators: Generative AI, Collective Knowledge, and the Path Towards Human-AI Synergy
- We Won't be Missed: Work and Growth in the Era of AGI
- Weak-to-Strong GraphRAG: Aligning Weak Retrievers with Large Language Models for Graph-based Retrieval Augmented Generation
- Web-Browsing LLMs Can Access Social Media Profiles and Infer User Demographics
- Weight-sparse transformers have interpretable circuits
- We’re Afraid Language Models Aren’t Modeling Ambiguity
- What are the Goals of Distributional Semantics?
- What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
- What does it mean to understand language?
- What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity
- What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models
- What is a Discourse Graph?
- What Makes a Good Natural Language Prompt?
- What the F*ck Is Artificial General Intelligence?
- What we talk to when we talk to language models
- When AIs Judge AIs: The Rise of Agent-as-a-Judge Evaluation for LLMs
- When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models
- When Large Language Models are More Persuasive Than Incentivized Humans, and Why
- When Large Language Models contradict humans? Large Language Models’ Sycophantic Behaviour
- When More is Less: Understanding Chain-of-Thought Length in LLMs
- When Prompts Go Wrong: Evaluating Code Model Robustness to Ambiguous, Contradictory, and Incomplete Task Descriptions
- When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection
- When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs
- WHEN TO ACT, WHEN TO WAIT: Modeling Structural Trajectories for Intent Triggerability in Task-Oriented Dialogue
- Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning
- Who’s Afraid of (Left) Hyperstitions
- Why Do Multi-agent LLM Systems Fail?
- Why Do People Rate? Theory and Evidence on Online Ratings
- Why Do Some Language Models Fake Alignment While Others Don't?
- Wide & Deep Learning for Recommender Systems
- Will I Sound Like Me? Improving Persona Consistency in Dialogues through Pragmatic Self-Consciousness
- Witness
- Word Meanings in Transformer Language Models
- Working Alliance Transformer for Psychotherapy Dialogue Classification
- Working with AI: Measuring the Occupational Implications of Generative AI
- Workplace Everyday-Creativity through a Highly-Conversational UI to Large Language Models
- Writing-Zero: Bridge the Gap Between Non-verifiable Tasks and Verifiable Rewards
- You Don't Need Pre-built Graphs for RAG: Retrieval Augmented Generation with Adaptive Reasoning Structures
- Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task
- ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
- Zero-Shot Verification-guided Chain of Thoughts
- ZeroSearch: Incentivize the Search Capability of LLMs without Searching
- “Hello There! Is Now a Good Time to Talk?”: Opportune Moments for Proactive Interactions with Smart Speakers
- “It Felt Like Having a Second Mind”: Investigating Human-AI Co-creativity in Prewriting with Large Language Models
- “Mama Always Had a Way of Explaining Things So I Could Understand”: A Dialogue Corpus for Learning to Construct Explanations
- “Understanding AI”: Semantic Grounding in Large Language Models
- “What do others think?”: Task-Oriented Conversational Modeling with Subjective Knowledge
- LM2: A Simple Society of Language Models Solves Complex Reasoning