All Papers
- "Is ChatGPT a Better Explainer than My Professor?": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline
- "It doesn't look good for a date": Transforming Critiques into Preferences for Conversational Recommendation Systems
- "My Boyfriend is AI": A Computational Analysis of Human-AI Companionship in Reddit's AI Community
- (QA)²: Question Answering with Questionable Assumptions
- 100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models
- 1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
- 12 New Advanced Types of RAG
- A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy
- A Comment On "The Illusion of Thinking": Reframing the Reasoning Cliff as an Agentic Gap
- A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
- A comprehensive analysis of concept drift locality in data streams
- A Comprehensive Evaluation of Inductive Reasoning Capabilities and Problem Solving in Large Language Models
- A Comprehensive Review of AI-based Intelligent Tutoring Systems: Applications and Challenges
- A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications
- A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models
- A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
- A comprehensive taxonomy of hallucinations in Large Language Models
- A Computational Framework for Behavioral Assessment of LLM Therapists
- A Contextual-Bandit Approach to Personalized News Article Recommendation
- A Conversation is Worth A Thousand Recommendations: A Survey of Holistic Conversational Recommender Systems
- A Decomposition Perspective to Long-context Reasoning for LLMs
- A Domain Specific Modeling Language for Multiagent Systems
- A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models
- A Framework for Collaborating a Large Language Model Tool in Brainstorming for Triggering Creative Thoughts
- A Hybrid Human-AI Approach for Argument Map Creation From Transcripts
- A Hybrid Intelligence Method for Argument Mining
- A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning
- A Little Human Data Goes A Long Way
- A Looming Replication Crisis in Evaluating Behavior in Language Models? Evidence and Solutions
- A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis
- A meta-analysis of the persuasive power of large language models
- A Multi-facet Paradigm to Bridge Large Language Model and Recommendation
- A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity
- A natural language processing approach reveals first-person pronoun usage and non-fluency as markers of therapeutic alliance in psychotherapy
- A Non-Factoid Question-Answering Taxonomy
- A Personalized Recommender System based-on Knowledge Graph Embeddings
- A polar coordinate system represents syntax in large language models
- A Practical Guide for Designing, Developing, and Deploying Production-Grade Agentic AI Workflows
- A Probabilistic Model for Using Social Networks in Personalized Item Recommendation
- A recipe for annotating grounded clarifications
- A ripple in time: a discontinuity in American history
- A Robustness Evaluation Framework for Argument Mining
- A Socially-Aware Conversational Recommender System for Personalized Recipe Recommendations
- A sociotechnical perspective for the future of AI: narratives, inequalities, and human control
- A Survey of Calibration Process for Black-Box LLMs
- A Survey of Continual Reinforcement Learning
- A Survey of Meta-Reinforcement Learning
- A Survey of Reinforcement Learning from Human Feedback
- A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence
- A Survey on Concept Drift Adaptation
- A Survey on Context-Aware Multi-Agent Systems: Techniques, Challenges and Future Directions
- A Survey on Diffusion Language Models
- A Survey on Knowledge Distillation of Large Language Models
- A Survey on Large Language Models for Recommendation
- A Survey on Large Language Models with some Insights on their Capabilities and Limitations
- A Survey on Lexical Ambiguity Detection and Word Sense Disambiguation
- A Survey on LLM Inference-Time Self-Improvement
- A Survey on Post-training of Large Language Models
- A Survey on Proactive Dialogue Systems: Problems, Methods, and Prospects
- A Survey on Prompt Tuning
- A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well?
- A Systematic Review on the Evaluation of Large Language Models in Theory of Mind Tasks
- A Taxonomy of Empathetic Questions in Social Dialogs
- A Tutorial on LLM Reasoning: Relevant Methods behind ChatGPT o1
- A Unified Multi-task Learning Framework for Multi-goal Conversational Recommender Systems
- Abductive Reasoning with the GPT-4 Language Model: Case studies from criminal investigation, medical practice, scientific research
- Abg-CoQA: Clarifying Ambiguity in Conversational Question Answering
- Absolute Zero: Reinforced Self-play Reasoning with Zero Data
- AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions
- ACE: Abstractions for Communicating Efficiently
- Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems
- Activation Steering for Chain-of-Thought Compression
- Active Listening: Personalized Question Generation in Open-Domain Social Conversation with User Model Based Prompting
- Active Retrieval Augmented Generation
- Adam's Law: Textual Frequency Law on Large Language Models
- Adaptation of Agentic AI
- Adapter-based Selective Knowledge Distillation for Federated Multi-domain Meeting Summarization
- Adapting LLM Agents with Universal Feedback in Communication
- Adaptive Learning Systems: Personalized Curriculum Design Using LLM-Powered Analytics
- Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home
- Adding Chit-Chat to Enhance Task-Oriented Dialogues
- Advances and Challenges in Conversational Recommender Systems: A Survey
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling
- Aether Weaver: Multimodal Affective Narrative Co-Generation with Dynamic Scene Graphs
- Affordable AI Assistants with Knowledge Graph of Thoughts
- Agent Development Kit
- Agent Laboratory: Using LLM Agents as Research Assistants
- Agent Learning via Early Experience
- Agent S: An Open Agentic Framework that Uses Computers Like a Human
- Agent Workflow Memory
- Agent-as-a-Judge: Evaluate Agents with Agents
- Agent-Centric Projection of Prompting Techniques and Implications for Synthetic Training Data for Large Language Models
- Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
- AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
- AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
- Agentic AI and the next intelligence explosion
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
- Agentic Reasoning for Large Language Models
- Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research
- Agentic Web: Weaving the Next Web with AI Agents
- AgentRxiv: Towards Collaborative Autonomous Research
- Agents Are Not Enough
- Agents of Chaos
- AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents
- Agreement Tracking for Multi-Issue Negotiation Dialogues
- AI & Human Co-Improvement for Safer Co-Superintelligence
- AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges
- AI Assistance Reduces Persistence and Hurts Independent Performance
- AI Can Learn Scientific Taste
- AI Enters Public Discourse: A Habermasian Assessment Of The Moral Status Of Large Language Models
- AI Meets the Classroom: When Does ChatGPT Harm Learning?
- AI Models Exceed Individual Human Accuracy in Predicting Everyday Social Norms
- AI tutoring outperforms in-class active learning: an RCT introducing a novel research-based design in an authentic educational setting
- AI-Powered (Finance) Scholarship
- AI-Researcher: Autonomous Scientific Innovation
- AInsight: Augmenting Expert Decision-Making with On-the-Fly Insights Grounded in Historical Data
- Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models
- Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
- ALIGN: Prompt-based Attribute Alignment for Reliable, Responsible, and Personalized LLM-based Decision-Making
- Aligning Language Models to Explicitly Handle Ambiguity
- Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning
- All AI Models are Wrong, but Some are Optimal
- AlphaGo Moment for Model Architecture Discovery
- Alternating Recurrent Dialog Model with Large-scale Pre-trained Language Models
- An Automatic Graph Construction Framework based on Large Language Models for Recommendation
- An Emulator for Fine-Tuning Large Language Models using Small Language Models
- An extended framework for characterizing social robots
- An Investigation of Robustness of LLMs in Mathematical Reasoning: Benchmarking with Mathematically-Equivalent Transformation of Advanced Mathematical Problems
- An Overview Of Temporal Commonsense Reasoning and Acquisition
- Anaphora Resolution: The State of the Art
- Answer is All You Need: Instruction-following Text Embedding via Answering the Question
- Answering Questions by Meta-Reasoning over Multiple Chains of Thought
- Apollo's Oracle: Retrieval-Augmented Reasoning in Multi-Agent Debates
- Are Customers Lying to Your Chatbot?
- Are Emergent Abilities in Large Language Models just In-Context Learning?
- Are Emergent Abilities of Large Language Models a Mirage?
- Are LLMs All You Need for Task-Oriented Dialogue?
- Are you in a Masquerade? Exploring the Behavior and Impact of Large Language Model Driven Social Bots in Online Social Networks
- AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
- ARGS: Alignment as Reward-Guided Search
- Argument Quality Assessment in the Age of Instruction-Following Large Language Models
- Argument Summarization and its Evaluation in the Era of Large Language Models
- Argumentative Large Language Models for Explainable and Contestable Decision-Making
- Argunauts: Open LLMs that Master Argument Analysis with Argdown
- Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
- Artifacts as Memory Beyond the Agent Boundary
- Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
- Artificial Intelligence and the Labor Market
- Artificial intelligence is ineffective and potentially harmful for fact checking
- Ask an Expert: Leveraging Language Models to Improve Strategic Reasoning in Goal-Oriented Dialogue Models
- Ask, and it shall be given: Turing completeness of prompting
- Ask-AC: An Initiative Advisor-in-the-Loop Actor-Critic Framework
- Asking Clarifying Questions Based on Negative Feedback in Conversational Search
- Aspect-oriented Opinion Alignment Network for Aspect-Based Sentiment Classification
- Assessing adaptive world models in machines with novel games
- Assessing the Ability of ChatGPT to Screen Articles for Systematic Reviews
- Assessment of Personality Dimensions Across Situations Using Conversational Speech
- ATESA-BÆRT: A Heterogeneous Ensemble Learning Model for Aspect-Based Sentiment Analysis
- Atom of Thoughts for Markov LLM Test-Time Scaling
- Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward
- Attention Mechanisms Perspective: Exploring LLM Processing of Graph-Structured Data
- Attention on the brain
- Attention, Intentions, and the Structure of Discourse
- Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models
- Attribute Controlled Dialogue Prompting
- Auditing language models for hidden objectives
- Augmenting Autotelic Agents with Large Language Models
- Augmenting Netflix Search with In-Session Adapted Recommendations
- AutoCBT: An Autonomous Multi-agent Framework for Cognitive Behavioral Therapy in Psychological Counseling
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
- AutoGLM: Autonomous Foundation Agents for GUIs
- Automated Alignment Researchers: Using large language models to scale scalable oversight
- Automated Design of Agentic Systems
- Automatic Extraction of Metaphoric Analogies from Literary Texts: Task Formulation, Dataset Construction, and Evaluation
- Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data
- Automatic Prompt Optimization with "Gradient Descent" and Beam Search
- Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies
- AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts
- Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey
- AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
- Backtracing: Retrieving the Cause of the Query
- Base Models Know How to Reason, Thinking Models Learn When
- Behavioral Exploration: Learning to Explore via In-Context Adaptation
- Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling
- Benchmarking the Pedagogical Knowledge of Large Language Models
- Better Alignment with Instruction Back-and-Forth Translation
- Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
- Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and Correctness in LLMs
- Beyond "Not Novel Enough": Enriching Scholarly Critique with LLM-Assisted Feedback
- Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
- Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models
- Beyond Answers: How LLMs Can Pursue Strategic Thinking in Education
- Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
- Beyond Brainstorming: What Drives High-Quality Scientific Ideas? Lessons from Multi-Agent Collaboration
- Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
- Beyond Discrete Personas: Personality Modeling Through Journal Intensive Conversations
- Beyond GPT-5: Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing
- Beyond Hallucinations: The Illusion of Understanding in Large Language Models
- Beyond neural scaling laws: beating power law scaling via data pruning
- Beyond Passive Critical Thinking: Fostering Proactive Questioning to Enhance Human-AI Collaboration
- Beyond Preferences in AI Alignment
- Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
- Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
- Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning
- Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens
- Beyond Single Models: Enhancing LLM Detection of Ambiguity in Requests through Debate
- Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
- Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR
- Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think
- Beyond the Surface: Probing the Ideological Depth of Large Language Models
- Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following
- Beyond Turing: Memory-Amortized Inference as a Foundation for Cognitive Computation
- Bigger is not always better: The importance of human-scale language modeling for psycholinguistics
- Bilevel Autoresearch: Meta-Autoresearching Itself
- Boosted Prompt Ensembles for Large Language Models
- Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought
- Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need
- Boundless Socratic Learning with Language Games
- Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond
- Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
- Break It Down: Evidence for Structural Compositionality in Neural Networks
- Break the Chain: Large Language Models Can be Shortcut Reasoners
- Bridging Offline and Online Reinforcement Learning for LLMs
- Bridging the Gulf of Envisioning: Cognitive Design Challenges in LLM Interfaces
- Building a Stronger CASA: Extending the Computers Are Social Actors Paradigm
- Building and Evaluating Open-Domain Dialogue Corpora with Clarifying Questions
- Building Cooperative Embodied Agents Modularly with Large Language Models
- Building Decision Making Models Through Language Model Regime
- Building Machines that Learn and Think with People
- Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning
- Byte Latent Transformer: Patches Scale Better Than Tokens
- Calibrated Recommendations
- CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
- Can AI Explanations Make You Change Your Mind?
- Can AI Have a Personality? Prompt Engineering for AI Personality Simulation: A Chatbot Case Study in Gender-Affirming Voice Therapy Training
- Can Authorship Representation Learning Capture Stylistic Features?
- Can Language Models Recognize Convincing Arguments?
- Can Language Models Represent the Past without Anachronism?
- Can Language Models Serve as Text-Based World Simulators?
- Can Language Models Solve Graph Problems in Natural Language?
- Can Large Language Models Capture Human Annotator Disagreements?
- Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess
- Can Large Language Models do Analytical Reasoning?
- Can large language models explore in-context?
- Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability to Mark Short Answer Questions in K-12 Education
- Can Large Language Models perform Relation-based Argument Mining?
- Can Large Language Models Really Improve by Self-critiquing Their Own Plans?
- Can Large Language Models Reason and Plan?
- Can Large Language Models Transform Computational Social Science?
- Can Large Language Models Understand Context?
- Can Large Reasoning Models Self-Train?
- Can LLM be a Personalized Judge?
- Can LLMs assist with Ambiguity? A Quantitative Evaluation of various Large Language Models on Word Sense Disambiguation
- Can LLMs Follow Simple Rules?
- Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
- Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political Questions
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
- Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
- Can Machines Think Like Humans? A Behavioral Evaluation of LLM-Agents in Dictator Games
- Can robots do therapy?: Examining the efficacy of a CBT bot in comparison with other behavioral intervention technologies in alleviating mental health symptoms
- Can Theoretical Physics Research Benefit from Language Agents?
- Can You Trust LLM Judgments? Reliability of LLM-as-a-Judge
- CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
- Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models
- Causal Claims in Economics
- Causal Sufficiency and Necessity Improves Chain-of-Thought Reasoning
- CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning
- CEO: Corpus-based Open-Domain Event Ontology Induction
- Chain of Draft: Thinking Faster by Writing Less
- Chain of Stance: Stance Detection with Large Language Models
- Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
- Chain of Thoughtlessness? An Analysis of CoT in Planning
- Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models
- Chain-of-Questions Training with Latent Answers for Robust Multistep Question Answering
- Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective
- Chain-of-Retrieval Augmented Generation
- Chain-of-Thought Is Not Explainability
- Chain-of-thought Reasoning Is A Policy Improvement Operator
- Chain-of-Thought Reasoning Without Prompting
- Chain-of-Verification Reduces Hallucination in Large Language Models
- Challenges of Large Language Models for Mental Health Counseling
- Chamain: Harmonizing Character Persona Integrity with Domain-Adaptive Knowledge in Dialogue Generation
- Character is Destiny: Can Role-Playing Language Agents Make Persona-Driven Decisions?
- Characterizing Deep Research: A Benchmark and Formal Definition
- Characterizing Online Discussion Using Coarse Discourse Sequences
- Chatbot vs. Human: The Impact of Responsive Conversational Features on Users’ Responses to Chat Advisors
- Chatbots in Knowledge-Intensive Contexts: Comparing Intent and LLM-Based Systems
- ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
- ChatGPT codes
- ChatGPT Doesn’t Trust Chargers Fans: Guardrail Sensitivity in Context
- ChatGPT is not Enough: Enhancing Large Language Models with Knowledge Graphs for Fact-aware Language Modeling
- ChatGPT Reads Your Tone and Responds Accordingly -- Until It Does Not -- Emotional Framing Induces Bias in LLM Outputs
- ChatGPT: deconstructing the debate and moving it forward
- ChatGPT: towards AI subjectivity
- Checklists Are Better Than Reward Models For Aligning Language Models
- Choosing the Right Weights: Balancing Value, Strategy, and Noise in Recommender Systems
- Circuit Tracing: Revealing Computational Graphs in Language Models
- CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning
- Clarifying the Path to User Satisfaction: An Investigation into Clarification Usefulness
- Classifying YouTube Comments Based on Sentiment and Type of Sentence
- Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data
- CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization
- CloChat: Understanding How People Customize, Interact, and Experience Personas in Large Language Models
- Clustering-based Sampling for Few-Shot Cross-Domain Keyphrase Extraction
- CogBench: a large language model walks into a psychology lab
- Cognitive Architectures for Language Agents
- Cognitive Chain-of-Thought: Structured Multimodal Reasoning about Social Situations
- Cognitive Effects in Large Language Models
- CollabLLM: From Passive Responders to Active Collaborators
- Collaborative Deep Learning for Recommender Systems
- Collaborative Filtering Bandits
- Collaborative Filtering for Implicit Feedback Datasets
- Collaborative Filtering with Temporal Dynamics
- Collaborative Rational Speech Act: Pragmatic Reasoning for Multi-Turn Dialog
- Collaborative Reasoner: Self-Improving Social Agents with Synthetic Conversations
- CoLLM: Integrating Collaborative Embeddings into Large Language Models for Recommendation
- Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning
- Comparing Apples to Apples: Generating Aspect-Aware Comparative Sentences from User Reviews
- Comparing emotion feature extraction approaches for predicting depression and anxiety
- Comparing Human and AI Therapists in Behavioral Activation for Depression: Cross-Sectional Questionnaire Study
- COMPASS: Computational Mapping of Patient-Therapist Alliance Strategies with Language Modeling
- Competitive Programming with Large Reasoning Models
- Complex Logical Instruction Generation
- Complexity-Based Prompting for Multi-Step Reasoning
- Compositional Reasoning with Transformers, RNNs, and Chain of Thought
- Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning
- Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations
- Computational Modelling of Undercuts in Real-world Arguments
- Computational structuralism: Toward a formal theory of meaning in the age of digital intelligence
- Computer says “No”: The Case Against Empathetic Conversational AI
- Conceptual Design Generation Using Large Language Models
- Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
- CONSCENDI: A Contrastive and Scenario-Guided Distillation Approach to Guardrail Models for Virtual Assistants
- Considering the Context to Build Theory in HCI, HRI, and HMC: Explicating Differences in Processes of Communication and Socialization With Social Technologies
- Consistency Training Helps Stop Sycophancy and Jailbreaks
- Consistent Explainers or Unreliable Narrators? Understanding LLM-generated Group Recommendations
- Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
- Content-aware Collaborative Music Recommendation Using Pre-trained Neural Networks
- Context Embeddings for Efficient Answer Generation in RAG
- Context Tuning for Retrieval Augmented Generation
- Context-PEFT: Efficient Multi-Modal, Multi-Task Fine-Tuning
- Continual Instruction Tuning for Large Multimodal Models
- CONTROL PREFIXES for Parameter-Efficient Text Generation
- Controlling Linguistic Style Aspects in Neural Language Generation
- Converging Paradigms: The Synergy of Symbolic and Connectionist AI in LLM-Empowered Autonomous Agents
- Conversation Chronicles: Towards Diverse Temporal and Relational Dynamics in Multi-Session Conversations
- Conversation Derailment Forecasting with Graph Convolutional Networks
- Conversational Alignment with Artificial Intelligence in Context
- Conversational DNA: A New Visual Language for Understanding Dialogue Structure in Human and AI
- Conversational Graph Grounded Policy Learning for Open-Domain Conversation Generation
- Conversational Prompt Engineering
- Conversational Recommendation: A Grand AI Challenge
- Conversational Semantic Parsing for Dialog State Tracking
- Conversations Gone Awry: Detecting Early Signs of Conversational Failure
- CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases
- CoT is Not True Reasoning, It Is Just a Tight Constraint to Imitate: A Theory Perspective
- CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks
- Could you be wrong: Debiasing LLMs using a metacognitive prompt for improving human decision making
- Creativity Has Left the Chat: The Price of Debiasing Language Models
- Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying
- Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate
- Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback
- Critiques of World Models
- CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions
- Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains
- Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs
- Cultural Evolution of Cooperation among LLM Agents
- Cumulated Gain-Based Evaluation of IR Techniques
- Cumulative Reasoning with Large Language Models
- CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning
- Curse of “Low” Dimensionality in Recommender Systems
- DAPIE: Interactive Step-by-Step Explanatory Dialogues to Answer Children’s Why and How Questions
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
- Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents
- DataComp-LM: In search of the next generation of training sets for language models
- DATATALES: Investigating the use of Large Language Models for Authoring Data-Driven Articles
- Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models
- DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations
- Debating with More Persuasive LLMs Leads to More Truthful Answers
- Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning
- Decision Transformer: Reinforcement Learning via Sequence Modeling
- Decision-Oriented Dialogue for Human–AI Collaboration
- Decomposed Prompting: A Modular Approach for Solving Complex Tasks
- Decoupling Knowledge and Reasoning in LLMs: An Exploration Using Cognitive Dual-System Theory
- DEEM: Dynamic Experienced Expert Modeling for Stance Detection
- Deep Interest Network for Click-Through Rate Prediction
- Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference
- Deep Neural Network Approach for the Dialog State Tracking Challenge
- Deep Neural Networks for YouTube Recommendations
- Deep Research: A Systematic Survey
- Deep Researcher with Test-Time Diffusion
- Deep Think with Confidence
- DeepCT-enhanced Lexical Argument Retrieval
- DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL
- DeepGesture: A conversational gesture synthesis system based on emotions and semantics
- DeepNet: Scaling Transformers to 1,000 Layers
- DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
- DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments
- DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research
- DeepSeek-R1 Thoughtology: Let's think about LLM Reasoning
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
- Deflating Deflationism: A Critical Perspective on Debunking Arguments Against LLM Mentality
- DeLLMa: Decision Making Under Uncertainty with Large Language Models
- Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP
- Demystifying Chains, Trees, and Graphs of Thoughts
- Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning
- Dense Retrieval Adaptation using Target Domain Description
- DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents
- Design Principles for Generative AI Applications
- Designing AI Personalities: Enhancing Human-Agent Interaction Through Thoughtful Persona Design
- Detecting Cognitive Distortions from Patient-Therapist Interactions
- Detecting Deception Using Natural Language Processing and Machine Learning in Datasets on COVID-19 and Climate Change
- Detecting hallucinations in large language models using semantic entropy
- Determinants of LLM-assisted Decision-Making
- Detoxify Language Model Step-by-Step
- Developing Effective Educational Chatbots with ChatGPT prompts: Insights from Preliminary Tests in a Case Study on Social Media Literacy
- Development and validation of large language model rating scales for automatically transcribed psychological therapy sessions
- Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time
- Diagnostic Reasoning Prompts Reveal the Potential for Large Language Model Interpretability in Medicine
- Dialog Inpainting: Turning Documents into Dialogs
- Dialoging Resonance: How Users Perceive, Reciprocate and React to Chatbot’s Self-Disclosure in Conversational Recommendations
- Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources
- Dialogue State Tracking with a Language Model using Schema-Driven Prompting
- Dialogue Transformers
- DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs
- DiaSynth: Synthetic Dialogue Generation Framework for Low Resource Dialogue Applications
- Diffusion Language Models Know the Answer Before Decoding
- Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing
- Diffusion Models are Evolutionary Algorithms
- Diffusion-LM Improves Controllable Text Generation
- Diplomat: A Dialogue Dataset for Situated PragMATic Reasoning
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- Disambiguating Anthropomorphism and Anthropomimesis in Human-Robot Interaction
- Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus
- Discourse-Level Representations can Improve Prediction of Degree of Anxiety
- Discovering Latent Concepts Learned in BERT
- Discursive Socratic Questioning: Evaluating the Faithfulness of Language Models’ Understanding of Discourse Relations
- DiscussLLM: Teaching Large Language Models When to Speak
- Dissociating language and thought in large language models
- Distilling LLMs' Decomposition Abilities into Compact Language Models
- Divide-or-Conquer? Which Part Should You Distill Your LLM?
- Do Cognitively Interpretable Reasoning Traces Improve LLM Performance?
- Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
- Do Large Language Models Latently Perform Multi-Hop Reasoning?
- Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
- Do Large Language Models Reason Causally Like Us? Even Better?
- Do large language models resemble humans in language use?
- Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom
- Do LLMs Encode Functional Importance of Reasoning Tokens?
- Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses
- Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models
- Do LLMs produce texts with "human-like" lexical diversity?
- Do LLMs Truly Understand When a Precedent Is Overruled?
- Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
- Do Models Really Learn to Follow Instructions? An Empirical Study of Instruction Tuning
- Do Prompt-Based Models Really Understand the Meaning of Their Prompts?
- Do Response Selection Models Really Know What's Next? Utterance Manipulation Strategies for Multi-turn Response Selection
- Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust
- Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models?
- Do They See What We See?
- Do We Trust ChatGPT as much as Google Search and Wikipedia?
- DOC: Improving Long Story Coherence With Detailed Outline Control
- Does It Make Sense to Speak of Introspection in Large Language Models?
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
- Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook
- Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models
- Doing Personal LAPS: LLM-Augmented Dialogue Construction for Personalized Multi-Session Conversational Search
- Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey
- Domain-specific Question Answering with Hybrid Search
- Don't "Overthink" Passage Reranking: Is Reasoning Truly Necessary?
- Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration
- DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration
- DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research
- DR-HAI: Argumentation-based Dialectical Reconciliation in Human-AI Interactions
- DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models
- Drop the Hierarchy and Roles: How Self-Organizing LLM Agents Outperform Designed Structures
- DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought
- Durably reducing conspiracy beliefs through dialogues with AI
- Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization
- Dynamic Planning with a LLM
- Dynamic Prompting: A Unified Framework for Prompt Tuning
- Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models
- Dynamic Task-Oriented Dialogue: A Comparative Study of Llama-2 and Bert in Slot Value Generation
- Dynamically Expandable Graph Convolution for Streaming Recommendation
- DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation
- Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
- Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions
- Educating LLMs like Human Students: Structure-aware Injection of Domain Knowledge
- Efficient Nearest Neighbor Language Models
- Efficient Reasoning with Balanced Thinking
- Efficient Reasoning with Hidden Thinking
- Efficient Reinforcement Learning via Large Language Model-based Search
- Efficient Streaming Language Models with Attention Sinks
- Efficient Tool Use with Chain-of-Abstraction Reasoning
- Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs
- Eliciting Latent Knowledge from Quirky Language Models
- Eliciting Reasoning in Language Models with Cognitive Tools
- Embarrassingly Shallow Autoencoders for Sparse Data
- Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation
- Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers
- Emergent Hierarchical Reasoning In LLMs Through Reinforcement Learning
- Emergent Introspective Awareness in Large Language Models
- Emerging Properties in Unified Multimodal Pretraining
- EmotionPrompt: Leveraging Psychology for Large Language Models Enhancement via Emotional Stimulus
- Empathetic Persuasion: Reinforcing Empathy and Persuasiveness in Dialogue Systems
- Empathy Through Multimodality in Conversational Interfaces
- Empirical Study of Symmetrical Reasoning in Conversational Chatbots
- Empowering Domain-Specific Language Models with Graph-Oriented Databases: A Paradigm Shift in Performance and Model Maintenance
- Empowering Psychotherapy with Large Language Models: Cognitive Distortion Detection through Diagnosis of Thought Prompting
- Enabling Explainable Recommendation in E-commerce with LLM-powered Product Knowledge Graph
- Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance
- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
- End-to-End Test-Time Training for Long Context
- Energy-Based Transformers are Scalable Learners and Thinkers
- Enhancing AI-Assisted Group Decision Making through LLM-Powered Devil's Advocate
- Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation
- Enhancing Large Language Model Induced Task-Oriented Dialogue Systems Through Look-Forward Motivated Goals
- Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision
- Enhancing Performance on Seen and Unseen Dialogue Scenarios using Retrieval-Augmented End-to-End Task-Oriented System
- Enhancing personalized multi-turn dialogue with curiosity reward
- Enhancing Pipeline-Based Conversational Agents with Large Language Model
- Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy
- Enhancing social cohesion with cooperative bots in societies of greedy, mobile individuals
- Entangled in Representations: Mechanistic Investigation of Cultural Biases in Large Language Models
- Equipping agents for the real world with Agent Skills
- Escaping the Verifier: Learning to Reason via Demonstrations
- Estimating AI productivity gains from Claude conversations
- Evaluating Emotional Nuances In Dialogue Summarization
- Evaluating Large Language Models at Evaluating Instruction Following
- Evaluating Large Language Models in Exercises of UML Class Diagram Modeling
- Evaluating Large Language Models in Theory of Mind Tasks
- Evaluating the Efficacy of Interactive Language Therapy Based on LLM for High-Functioning Autistic Adolescent Psychological Counseling
- Evaluating the psychometric properties of ChatGPT-generated questions
- Evaluating the Therapeutic Alliance With a Free-Text CBT Conversational Agent (Wysa): A Mixed-Methods Study
- Evaluating Theory of Mind and Internal Beliefs in LLM-Based Multi-Agent Systems
- Evaluating Very Long-Term Conversational Memory of LLM Agents
- Evaluation and Benchmarking of LLM Agents: A Survey
- Event-Aware Sentiment Factors from LLM-Augmented Financial Tweets: A Transparent Framework for Interpretable Quant Trading
- Everything Everywhere All at Once: LLMs Can In-Context Learn Multiple Tasks in Superposition
- Evidence of Human-Level Bonds Established With a Digital Conversational Agent: Cross-sectional, Retrospective Observational Study
- EVINCE: Optimizing Multi-LLM Dialogues Using Conditional Statistics and Information Theory
- Evolving Deeper LLM Thinking
- Existential Conversations with Large Language Models: Content, Community, and Culture
- Expanding Explainability: Towards Social Transparency in AI systems
- Expedient Assistance and Consequential Misunderstanding: Envisioning an Operationalized Mutual Theory of Mind
- Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy
- Explainable Compliance Detection with Multi-Hop Natural Language Inference on Assurance Case Structure
- Explainable Multimodal Emotion Reasoning
- Explainable Recommendation with Personalized Review Retrieval and Aspect Learning
- Explainable Recommendations via Attentive Multi-Persona Collaborative Filtering
- Explicit Inductive Inference using Large Language Models
- Exploiting Dialogue Acts and Context to Identify Argumentative Relations in Online Debates
- Exploiting Explainability to Design Adversarial Attacks and Evaluate Attack Resilience in Hate-Speech Detection Models
- Exploring Autonomous Agents: A Closer Look at Why They Fail When Completing Tasks
- Exploring Format Consistency for Instruction Tuning
- Exploring Large Language Models for Knowledge Graph Completion
- Exploring LLMs Applications in Law: A Literature Review on Current Legal NLP Approaches
- Exploring Student-AI Interactions in Vibe Coding
- Exploring the Frontiers of LLMs in Psychological Applications: A Comprehensive Review
- Exploring the Impact of Large Language Models on Recommender Systems: An Extensive Review
- Exploring the Potential of ChatGPT on Sentence Level Relations: A Focus on Temporal, Causal, and Discourse Relations
- Exploring the Potential of Large Language Models in Computational Argumentation
- Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers
- External Model Motivated Agents: Reinforcement Learning for Enhanced Environment Sampling
- Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
- Extracting memorized pieces of (copyrighted) books from open-weight language models
- Extrapolation by Association: Length Generalization Transfer in Transformers
- Extreme Multi-Label Skill Extraction Training using Large Language Models
- Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation
- Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model
- Faith and Fate: Limits of Transformers on Compositionality
- Faithful and Robust LLM-Driven Theorem Proving for NLI Explanations
- Fake News Detectors are Biased against Texts Generated by Large Language Models
- Fast and Slow Learning From Reviews
- Fast, Slow, and Tool-augmented Thinking for LLMs: A Review
- Federation of Agents: A Semantics-Aware Communication Fabric for Large-Scale Agentic AI
- FinCoT: Grounding Chain-of-Thought in Expert Financial Reasoning
- Find the Gap: AI, Responsible Agency and Vulnerability
- Finding Common Ground: Using Large Language Models to Detect Agreement in Multi-Agent Decision Conferences
- Fine-grained Hallucination Detection and Editing for Language Models
- Fine-tuning Language Models for Factuality
- Fine-tuning Large Language Model for Automated Algorithm Design
- Fine-tuning Pre-trained Language Models for Dialogical Argument Mining with Inference Anchoring Theory
- First Try Matters: Revisiting the Role of Reflection in Reasoning Models
- FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
- Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models
- Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities
- FlowReasoner: Reinforcing Query-Level Meta-Agents
- Flows: Building Blocks of Reasoning and Collaborating AI
- Forecasting the presence and intensity of hostility on Instagram using linguistic and social features
- Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning
- FormulaOne: Measuring the Depth of Algorithmic Reasoning Beyond Competitive Programming
- Foundation Priors
- Foundations of Large Language Models
- From Articles to Code: On-Demand Generation of Core Algorithms from Scientific Publications
- From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
- From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step
- From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers
- From Human to Machine Psychology: A Conceptual Framework for Understanding Well-Being in Large Language Models
- From Key Points to Key Point Hierarchy: Structured and Expressive Opinion Summarization
- From Language to Logic: A Bi-Level Framework for Structured Reasoning
- From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models
- From Local to Global: A Graph RAG Approach to Query-Focused Summarization
- From Louvain to Leiden: guaranteeing well-connected communities
- From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?
- From Persona to Person: Enhancing the Naturalness with Multiple Discourse Relations Graph Learning in Personalized Dialogue Generation
- From Prompt Engineering to Prompt Science With Human in the Loop
- From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
- From speaking like a person to being personal: The effects of personalized, regular interactions with conversational agents
- From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs
- From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning
- From Trial-and-Error to Improvement: A Systematic Analysis of LLM Exploration Mechanisms in RLVR
- From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents
- Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report
- Further Explorations on the Use of Large Language Models for Thematic Analysis. Open-Ended Prompts, Better Terminologies and Thematic Maps
- Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce
- GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks
- Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
- GenAI as a Power Persuader: How Professionals Get Persuasion Bombed When They Attempt to Validate LLMs
- Generalization through Memorization: Nearest Neighbor Language Models
- Generalization to New Sequential Decision Making Tasks with In-Context Learning
- Generating Proto-Personas through Prompt Engineering: A Case Study on Efficiency, Effectiveness and Empathy
- Generating Query-Relevant Document Summaries via Reinforcement Learning
- Generative Agent Simulations of 1,000 People
- Generative Agents: Interactive Simulacra of Human Behavior
- Generative AI in Real-World Workplaces
- Generative Interfaces for Language Models
- Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?
- Generator-Retriever-Generator: A Novel Approach to Open-domain Question Answering
- GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning
- GenRec: Large Language Model for Generative Recommendation
- GhostWriter: Augmenting Collaborative Human-AI Writing Experiences Through Personalization and Agency
- GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning
- GHRS: Graph-based Hybrid Recommendation System with Application to Movie Recommendation
- Goal Alignment in LLM-Based User Simulators for Conversational AI
- Goals, Plans, and Action Models
- Going Beyond Local: Global Graph-Enhanced Personalized News Recommendations
- GPT-4 as a Homework Tutor can Improve Student Engagement and Learning Outcomes
- GPT-4 is judged more human than humans in displaced and inverted Turing tests
- Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development
- Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks
- Graph of Thoughts: Solving Elaborate Problems with Large Language Models
- Graph-enhanced Large Language Models in Asynchronous Plan Reasoning
- GRASP: Municipal Budget AI Chatbots for Enhancing Civic Engagement
- Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
- Grounding Gaps in Language Model Generations
- Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
- Grounding Multilingual Multimodal LLMs With Cultural Knowledge
- Grounding ‘Grounding’ in NLP
- Guidance is All You Need: Temperature-Guided Reasoning in Large Language Models
- Guiding Large Language Models via Directional Stimulus Prompting
- H2HTalk: Evaluating Large Language Models as Emotional Companion
- Hallucinating with AI: AI Psychosis as Distributed Delusions
- Hallucination is Inevitable: An Innate Limitation of Large Language Models
- Harnessing Business and Media Insights with Large Language Models
- Has the Creativity of Large-Language Models Peaked? An Analysis of Inter- and Intra-LLM Variability
- Hello Again! LLM-powered Personalized Agent for Long-term Dialogue
- Hierarchical Reasoning Model
- HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches
- HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
- HiTKG: Towards Goal-Oriented Conversations via Multi-Hierarchy Learning
- Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
- Holy Grail 2.0: From Natural Language to Constraint Models
- HonestBait: Forward References for Attractive but Faithful Headline Generation
- Hop, Skip, and Overthink: Diagnosing Why Reasoning Models Fumble during Multi-Hop Analysis
- How AI Impacts Skill Formation
- How do Transformers Learn Implicit Reasoning?
- How Exposed Are UK Jobs to Generative AI? Developing and Applying a Novel Task-Based Index
- How Far Are We from Genuinely Useful Deep Research Agents?
- How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
- How Many Instructions Can LLMs Follow at Once?
- How much do language models memorize?
- How Multimodal LLMs Solve Image Tasks: A Lens on Visual Grounding, Task Reasoning, and Answer Decoding
- How new data permeates LLM knowledge and how to dilute it
- How Should We Meta-Learn Reinforcement Learning Algorithms?
- How susceptible are LLMs to Logical Fallacies?
- How to Correctly do Semantic Backpropagation on Language-based Agentic Systems
- How we built our multi-agent research system
- How well can large language models explain business processes?
- How Projective is Projective Content? Gradience in Projectivity and At-issueness
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
- Human-like Category Learning by Injecting Ecological Priors from Large Language Models into Neural Networks
- Humans learn to prefer trustworthy AI over human partners
- Humans or LLMs as the Judge? A Study on Judgement Biases
- Humans overrely on overconfident language models, across languages
- Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning
- Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing
- Hyperagents
- HyperBandit: Contextual Bandit with Hypernetwork for Time-Varying User Preferences in Streaming Recommendation
- Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models
- I like it... I like it not: Evaluating User Ratings Noise in Recommender Systems
- Identification of Propositional and Illocutionary Relations
- IFEvalCode: Controlled Code Generation
- IMBUE: Improving Interpersonal Effectiveness through Simulation and Just-in-time Feedback with Human-Language Model Interaction
- Implicit Chain of Thought Reasoning via Knowledge Distillation
- Improving Chain-of-Thought Reasoning via Quasi-Symbolic Abstractions
- Improving Conversational Recommender Systems via Transformer-based Sequential Modelling
- Improving Dialog Systems for Negotiation with Personality Modeling
- Improving Document-Level Sentiment Analysis with User and Product Context
- Improving Factuality and Reasoning in Language Models through Multiagent Debate
- Improving Generalization in Task-oriented Dialogues with Workflows and Action Plans
- Improving large language models with concept-aware fine-tuning
- Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards
- Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks
- In-context learning agents are asymmetric belief updaters
- Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems
- Inducing Positive Perspectives with Text Reframing
- Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs
- Inference-Aware Prompt Optimization for Aligning Black-Box Large Language Models
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
- Inference-Time Scaling for Generalist Reward Modeling
- Information-Theoretic Reward Decomposition for Generalizable RLHF
- Informed Named Entity Recognition Decoding For Generative Language Models
- Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey
- InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles
- Insert-expansions For Tool-enabled Conversational Agents
- Inspecting and Editing Knowledge Representations in Language Models
- INSPIRED: Toward Sociable Recommendation Dialog Systems
- Instance-adaptive Zero-shot Chain-of-Thought Prompting
- Instruction Induction: From Few Examples to Natural Language Task Descriptions
- Instruction Tuning for Large Language Models: A Survey
- Integrating Large Language Models and Reinforcement Learning for Non-Linear Reasoning
- IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems
- Intelligent AI Delegation
- Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation
- Intent-calibrated Self-training for Answer Selection in Open-domain Dialogues
- Interacting with Non-Cooperative User: A New Paradigm for Proactive Dialogue Policy
- Interaction Dynamics as a Reward Signal for LLMs
- Interactions with generative AI chatbots: unveiling dialogic dynamics, students’ perceptions, and practical competencies in creative problem-solving
- Interesting Scientific Idea Generation Using Knowledge Graphs and LLMs: Evaluations with 100 Research Group Leaders
- Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation
- Interpretation modeling: Social grounding of sentences by reasoning over their implicit moral judgments
- Interrogator
- Intrinsically Motivated Graph Exploration Using Network Theories of Human Curiosity
- InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models
- Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting
- Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data
- Investigating Gender Bias in Language Models Using Causal Mediation Analysis
- Investigating task-specific prompts and sparse autoencoders for activation monitoring
- Irony in Emojis: A Comparative Study of Human and LLM Interpretation
- Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
- Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?
- Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs
- It’s All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
- J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
- Jamba: A Hybrid Transformer-Mamba Language Model
- JointLK: Joint Reasoning with Language Models and Knowledge Graphs for Commonsense Question Answering
- Jointly Reinforcing Diversity and Quality in Language Model Generations
- Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
- KellyBench: Can Language Models Beat the Market?
- KETOD: Knowledge-Enriched Task-Oriented Dialogue
- KGAT: Knowledge Graph Attention Network for Recommendation
- KiPT: Knowledge-injected Prompt Tuning for Event Detection
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
- Knowledge Distillation for Enhancing Walmart E-commerce Search Relevance Using Large Language Models
- Knowledge Graph Prompting for Multi-Document Question Answering
- Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains
- Knowledge Retrieval Based on Generative AI
- Knowledge-enhanced Mixed-initiative Dialogue System for Emotional Support Conversations
- KoLA: Carefully Benchmarking World Knowledge of Large Language Models
- KTO: Model Alignment as Prospect Theoretic Optimization
- Language Agents as Optimizable Graphs
- Language as a Cognitive Tool to Imagine Goals in Curiosity-Driven Exploration
- Language Model Personalization via Reward Factorization
- Language Modeling by Language Models
- Language Modeling is Compression
- Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
- Language Models are Pragmatic Speakers
- Language models are weak learners
- Language Models Learn to Mislead Humans via RLHF
- Language models show human-like content effects on reasoning tasks
- Language Models’ Hall of Mirrors Problem: Why AI Alignment Requires Peircean Semiosis
- Large Action Models: From Inception to Implementation
- Large Causal Models From Large Language Models
- Large Concept Models: Language Modeling in a Sentence Representation Space
- Large Language Diffusion Models
- Large Language Model Agents Are Not Always Faithful Self-Evolvers
- Large Language Model based Multi-Agents: A Survey of Progress and Challenges
- Large Language Model Guided Tree-of-Thought
- Large Language Model Programs
- Large Language Model Reasoning Failures
- Large Language Model-based Data Science Agent: A Survey
- Large Language Model-Brained GUI Agents: A Survey
- Large Language Models and Knowledge Graphs: Opportunities and Challenges
- Large Language Models are as persuasive as humans, but how? About the cognitive effort and moral-emotional language of LLM arguments
- Large Language Models Are Human-level Prompt Engineers
- Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners
- Large Language Models are Zero-Shot Rankers for Recommender Systems
- Large Language Models as Planning Domain Generators
- Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?
- Large Language Models as Zero-Shot Conversational Recommenders
- Large Language Models can accomplish Business Process Management Tasks
- Large Language Models Can Infer Psychological Dispositions of Social Media Users
- Large language models can segment narrative events similarly to humans
- Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation
- Large Language Models Do Not Simulate Human Psychology
- Large Language Models For Social Networks: Applications, Challenges, And Solutions
- Large Language Models for User Interest Journeys
- Large Language Models Know Your Contextual Search Intent: A Prompting Framework for Conversational Search
- Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities
- Large Language Models Reflect the Ideology of their Creators
- Large Language Models Report Subjective Experience Under Self-Referential Processing
- Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions
- Large language models surpass human experts in predicting neuroscience results
- Large Language Models Think Too Fast To Explore Effectively
- Large Linguistic Models: Investigating LLMs' metalinguistic abilities
- Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency
- Large Multimodal Agents: A Survey
- Large Scale Product Graph Construction for Recommendation in E-commerce
- Latent Collaboration in Multi-Agent Systems
- Latent Skill Discovery for Chain-of-Thought Reasoning
- LatentQA: Teaching LLMs to Decode Activations Into Natural Language
- Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers
- Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments
- Learning "Partner-Aware" Collaborators in Multi-Party Collaboration
- Learning Distributed Representations from Reviews for Collaborative Filtering
- Learning Human-Object Interaction as Groups
- Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries
- Learning Retrieval Augmentation for Personalized Dialogue Generation
- Learning to (Learn at Test Time): RNNs with Expressive Hidden States
- Learning to Ask Appropriate Questions in Conversational Recommendation
- Learning to Ask Critical Questions for Assisting Product Search
- Learning to Discover at Test Time
- Learning To Guide Human Experts Via Personalized Large Language Models
- Learning to Map Context-Dependent Sentences to Executable Formal Queries
- Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge
- Learning to Rank for Recommender Systems
- Learning to Reason for Factuality
- Learning to Reason without External Rewards
- Learning to Relate to Previous Turns in Conversational Search
- Learning To Retrieve Prompts for In-Context Learning
- Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering
- Learning to Select the Relevant History Turns in Conversational Question Answering
- Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs
- Learning Vector-Quantized Item Representation for Transferable Sequential Recommenders
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
- LESS: Selecting Influential Data for Targeted Instruction Tuning
- Lessons Learnt From Consolidating ML Models in a Large Scale Recommendation System
- Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models
- Let Me Think! A Long Chain-of-Thought Can Be Worth Exponentially Many Short Ones
- Let’s Verify Step by Step
- Leveraging Approximate Symbolic Models for Reinforcement Learning via Skill Diversity
- Leveraging Few-Shot Data Augmentation and Waterfall Prompting for Response Generation
- Leveraging Large Language Models in Conversational Recommender Systems
- Leveraging LLMs for KPIs Retrieval from Hybrid Long-Document: A Comprehensive Framework and Dataset
- Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning
- Lexical Entrainment for Conversational Systems
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
- Lil-Bevo: Explorations of Strategies for Training Language Models in More Humanlike Ways
- LIMA: Less Is More for Alignment
- LIMI: Less is More for Agency
- LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling
- Linguistic Alignment in Conversational AI: A Systematic Review of Cognitive-Linguistic Dimensions, Measurements, and User Outcomes (2020–2025)
- Linguistic Blind Spots of Large Language Models
- Linguistic markers of inherently false AI communication and intentionally false human communication: Evidence from hotel reviews
- LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries
- LLaMA-Omni: Seamless Speech Interaction with Large Language Models
- LLM Augmentations to support Analytical Reasoning over Multiple Documents
- LLM Generated Persona is a Promise with a Catch
- LLM Post-Training: A Deep Dive into Reasoning Large Language Models
- LLM Reasoning Is Latent, Not the Chain of Thought
- LLM Strategic Reasoning: Agentic Study through Behavioral Game Theory
- LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
- LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices
- LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback
- LLM-Rec: Personalized Recommendation via Prompting Large Language Models
- LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
- LLMatic: Neural Architecture Search via Large Language Models and Quality Diversity Optimization
- LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools
- LLMs are Frequency Pattern Learners in Natural Language Inference
- LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
- LLMs as Architects and Critics for Multi-Source Opinion Summarization
- LLMs as Method Actors: A Model for Prompt Engineering and Architecture
- LLMs can be Fooled into Labelling a Document as Relevant
- LLMs Can Covertly Sandbag on Capability Evaluations Against Chain-of-Thought Monitoring
- LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!
- LLMs can implicitly learn from mistakes in-context
- LLMs Get Lost In Multi-Turn Conversation
- LLMs Struggle to Reject False Presuppositions when Misinformation Stakes are High
- Local Coherence or Global Validity? Investigating RLVR Traces in Math Domains
- Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning
- Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models
- Logical Reasoning in Large Language Models: A Survey
- LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models
- Long-context LLMs Struggle with Long In-context Learning
- Long-form Factuality in Large Language Models
- Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning
- LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering
- LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs
- Looking beyond the next token
- Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers
- Lost in Inference: Rediscovering the Role of Natural Language Inference for Large Language Models
- Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs
- LR^2Bench: Evaluating Long-chain Reflective Reasoning Capabilities of Large Language Models via Constraint Satisfaction Problems
- LSR: Reinforcement Learning with Supervised Reward Outperforms SFT in Instruction Following
- Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
- Machine ex machina: A Framework Decentering the Human in AI Design Praxis
- Machine gaze in online behavioral targeting: The effects of algorithmic human likeness on social presence and social influence
- Machine Psychology
- Magentic-UI: Towards Human-in-the-loop Agentic Systems
- Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
- Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning
- Making Sense of Memory in AI Agents
- Man vs machine – Detecting deception in online reviews
- MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving
- MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization
- MasRouter: Learning to Route LLMs for Multi-Agent Systems
- Mastering Diverse Domains through World Models
- MatFormer: Nested Transformer for Elastic Inference
- Mathematical methods and human thought in the age of AI
- MCP-Zero: Proactive Toolchain Construction for LLM Agents from Scratch
- Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
- Meanings are like Onions: a Layered Approach to Metaphor Processing
- Measuring Agents in Production
- Measuring Alliance and Symptom Severity in Psychotherapy Transcripts Using Bert Topic Modeling
- Measuring and Mitigating Persona Distortions from AI Writing Assistance
- Measuring Faithfulness in Chain-of-Thought Reasoning
- Measuring Human Preferences in RLHF is a Social Science Problem
- Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models
- Measuring the Value of Social Dynamics in Online Product Ratings Forums
- Mechanisms of Introspective Awareness
- Mechanistic Indicators of Understanding in Large Language Models
- Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications
- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
- MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation
- Memorization and Knowledge Injection in Gated LLMs
- Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
- Memory Sandbox: Transparent and Interactive Memory Management for Conversational Agents
- Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
- Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
- Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving
- Metacognitive Prompting Improves Understanding in Large Language Models
- Metacognitive Retrieval-Augmented Large Language Models
- Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors
- Metadiscursive nouns in academic argument: ChatGPT vs student practices
- MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework
- MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems
- Methodologies for Improving Modern Industrial Recommender Systems
- Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
- Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse
- Minds versus Machines: Rethinking Entailment Verification with Language Models
- MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
- Mindstorms in Natural Language-Based Societies of Mind
- MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
- Mining Hidden Thoughts from Texts: Evaluating Continual Pretraining with Synthetic Data for LLM Reasoning
- Misaligned by Design: Incentive Failures in Machine Learning
- Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
- Mitigating Hallucinations in Large Language Models via Causal Reasoning
- Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just What They Say
- Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models
- MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement
- MLLM-CBench: A Comprehensive Benchmark for Continual Instruction Tuning of Multimodal LLMs with Chain-of-Thought Reasoning Analysis
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
- Model Organisms for Emergent Misalignment
- Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
- Modeling Appropriate Language in Argumentation
- Modeling Code: Is Text All You Need?
- Modeling Interpersonal Linguistic Coordination in Conversations using Word Mover's Distance
- Modeling the Quality of Dialogical Explanations
- MODS: Moderating a Mixture of Document Speakers to Summarize Debatable Queries in Document Collections
- MOMENTS: A Comprehensive Multimodal Benchmark for Theory of Mind
- Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
- Monolith: Real Time Recommendation System With Collisionless Embedding Table
- MoodAngels: A Retrieval-augmented Multi-agent Framework for Psychiatry Diagnosis
- Mostly Exploration-Free Algorithms for Contextual Bandits
- Multi-Agent Collaborative Intelligence: Dual-Dial Control for Reliable LLM Reasoning
- Multi-agent cooperation through in-context co-player inference
- Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation
- Multi-hop Question Answering via Reasoning Chains
- Multi-Task End-to-End Training Improves Conversational Recommendation
- Multi-Token Attention
- Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
- MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
- Natural Emergent Misalignment From Reward Hacking In Production RL
- Navigating the State of Cognitive Flow: Context-Aware AI Interventions for Effective Reasoning Support
- Nested Attention: Semantic-aware Attention Values for Concept Personalization
- Nested Learning: The Illusion of Deep Learning Architecture Expanded
- Nested Learning: The Illusion of Deep Learning Architectures
- Neural Approaches to Conversational AI
- Neural Assistant: Joint Action Prediction, Response Generation, and Latent Knowledge Reasoning
- Neural Collaborative Filtering
- Neural Collaborative Filtering vs. Matrix Factorization Revisited
- Neural Conversation Models and How to Rein Them in: A Survey of Failures and Fixes
- Neural Topic Modeling of Psychotherapy Sessions
- Neuro-Symbolic AI in 2024: A Systematic Review
- NeuroQL: A Neuro-Symbolic Language and Dataset for Inter-Subjective Reasoning
- Neurosymbolic AI - Why, What, and How
- Neutralizing Bias in LLM Reasoning using Entailment Graphs
- News Sentiment Embeddings for Stock Price Forecasting
- News Source Citing Patterns in AI Search Systems
- Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction
- Next Steps for Human-Centered Generative AI: A Technical Perspective
- No that's not what I meant: Handling Third Position Repair in Conversational Question Answering
- Not All Parameters Are Created Equal: Smart Isolation Boosts Fine-Tuning Performance
- NoveltyBench: Evaluating Language Models for Humanlike Diversity
- Octopus v2: On-device language model for super agent
- Octopus v4: Graph of language models
- Off-Policy Evaluation for Large Action Spaces via Policy Convolution
- OMNI-SIMPLEMEM: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory
- Omni-Thinker: Scaling Multi-Task RL in LLMs with Hybrid Reward and Task Scheduling
- OmniParser for Pure Vision Based GUI Agent
- OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
- On Generative Agents in Recommendation
- On Information Distortions in Online Ratings
- On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents
- On the Adaptive Psychological Persuasion of Large Language Models
- On the Binding Problem in Artificial Neural Networks
- On the Conversational Basis of Some Presuppositions
- On the Impact of Fine-Tuning on Chain-of-Thought Reasoning
- On the Limits of Innate Planning in Large Language Models
- On The Persona-based Summarization of Domain-Specific Documents
- On the Reasoning Capacity of AI Models and How to Quantify It
- On the Relationship between Sentence Analogy Identification and Sentence Structure Encoding in Large Language Models
- On the Roles of LLMs in Planning: Embedding LLMs into Planning Graphs
- On the Theoretical Limitations of Embedding-Based Retrieval
- On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
- Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback
- Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models
- Open Problems in Mechanistic Interpretability
- OpenAgents: An Open Platform for Language Agents in the Wild
- OpenAssistant Conversations - Democratizing Large Language Model Alignment
- OpenClaw-RL: Train Any Agent Simply by Talking
- OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
- OpenThoughts: Data Recipes for Reasoning Models
- Operating Multi-Client Influence Networks Across Platforms
- OpinionConv: Conversational Product Search with Grounded Opinions
- Opportunities for large language models and discourse in engineering design
- OptimalThinkingBench: Evaluating Over and Underthinking in LLMs
- Orchestrating Synthetic Data with Reasoning
- Outcome-based Exploration for LLM Reasoning
- Overconfidence in LLM-as-a-Judge: Diagnosis and Confidence-Driven Solution
- Overview of DialAM-2024: Argument Mining in Natural Language Dialogues
- PaperOrchestra: A Multi-Agent Framework for Automated AI Research Paper Writing
- Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
- PATIENT-Ψ: Using Large Language Models to Simulate Patients for Training Mental Health Professionals
- Peer-Preservation in Frontier Models
- PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods
- People cannot distinguish GPT-4 from a human in a Turing test
- Performative Thinking? The Brittle Correlation Between CoT Length and Problem Complexity
- Persistent Pre-Training Poisoning of LLMs
- PersLLM: A Personified Training Approach for Large Language Models
- Persona Generators: Generating Diverse Synthetic Personas at Scale
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models
- Persona-Assigned Large Language Models Exhibit Human-Like Motivated Reasoning
- PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time
- PersonaGym: Evaluating Persona Agents and LLMs
- Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
- Personalization of Large Language Models: A Survey
- Personalized Dialogue Generation with Persona-Adaptive Attention
- Personalized Language Modeling from Personalized Human Feedback
- Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
- PersonaPKT: Building Personalized Dialogue Agents via Parameter-efficient Knowledge Transfer
- Persuasive presuppositions
- PersuasiveToM: A Benchmark for Evaluating Machine Theory of Mind in Persuasive Dialogues
- Perturbation CheckLists for Evaluating NLG Evaluation Metrics
- Pixel-Level Reasoning Segmentation via Multi-turn Conversations
- Pixels, Patterns, but No Poetry: To See The World like Humans
- Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts
- Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1
- Planning Like Human: A Dual-process Framework for Dialogue Planning
- PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers
- Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
- Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents
- Polanyi’s Revenge and AI’s New Romance with Tacit Knowledge
- PolyResponse: A Rank-based Approach to Task-Oriented Dialogue with Application in Restaurant Search and Booking
- POMDP-based Statistical Spoken Dialogue Systems: a Review
- Position: LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
- Position: Towards Bidirectional Human-AI Alignment
- Post-Completion Learning for Language Models
- Post-training for Efficient Communication via Convention Formation
- Post-Training Large Language Models via Reinforcement Learning from Self-Feedback
- PosterMate: Audience-driven Collaborative Persona Agents for Poster Design
- Posting versus Lurking: Communicating in a Multiple Audience Context
- Potemkin Understanding in Large Language Models
- Pragmatic Implicature Processing in ChatGPT
- Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
- Pre-Trained Policy Discriminators are General Reward Models
- Precise Zero-Shot Dense Retrieval without Relevance Labels
- Predictive Preference Learning from Human Interventions
- Preference Discerning with LLM-Enhanced Generative Retrieval
- Prefix-Tuning: Optimizing Continuous Prompts for Generation
- PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts
- Premise Order Matters in Reasoning with Large Language Models
- Premise-Augmented Reasoning Chains Improve Error Identification in Math reasoning with LLMs
- Presuppositions are more persuasive than assertions if addressees accommodate them: Experimental evidence for philosophical reasoning
- Pretrained Language Models as Containers of the Discursive Knowledge
- PRewrite: Prompt Rewriting with Reinforcement Learning
- PRIME: Large Language Model Personalization with Cognitive Memory and Thought Processes
- Pro-Active Systems and Influenceable Users: Simulating Pro-Activity in Task-oriented Dialogues
- Proactive behavior in voice assistants: A systematic review and conceptual model
- Proactive Conversational Agents in the Post-ChatGPT World
- Proactive Conversational Agents with Inner Thoughts
- Proactive Human-Machine Conversation with Explicit Conversation Goals
- Proactive Moderation of Online Discussions: Existing Practices and the Potential for Algorithmic Support
- ProAgent: Building Proactive Cooperative Agents with Large Language Models
- Probing Structured Semantics Understanding and Generation of Language Models via Question Answering
- Probing the Multi-turn Planning Capabilities of LLMs via 20 Question Games
- Problems with Cosine as a Measure of Embedding Similarity for High Frequency Words
- Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
- Process Reward Models That Think
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
- Progress Measures For Grokking Via Mechanistic Interpretability
- Progressive-Hint Prompting Improves Reasoning in Large Language Models
- Prompt Architecture Determines Reasoning Quality: A Variable Isolation Study on the Car Wash Problem
- Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
- Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
- Prompted LLMs as Chatbot Modules for Long Open-domain Conversation
- Prompting and Evaluating Large Language Models for Proactive Dialogues: Clarification, Target-guided, and Non-collaboration
- Prompting Large Language Models for Recommender Systems: A Comprehensive Framework and Empirical Analysis
- Prompting Large Language Models With the Socratic Method
- Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?
- Propositional Interpretability in Artificial Intelligence
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
- ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
- ProsocialDialog: A Prosocial Backbone for Conversational Agents
- ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
- Proxona: Leveraging LLM-Driven Personas to Enhance Creators' Understanding of Their Audience
- PsychAdapter: Adapting LLM Transformers to Reflect Traits, Personality and Mental Health
- Psyche-R1: Towards Reliable Psychological LLMs through Unified Empathy, Expertise, and Reasoning
- Psychological, Relational, and Emotional Effects of Self-Disclosure After Conversations With a Chatbot
- Psychologically Enhanced AI Agents
- Psychotherapy AI Companion with Reinforcement Learning Recommendations and Interpretable Policy Dynamics
- PsyDT: Using LLMs to Construct the Digital Twin of Psychological Counselor with Personalized Counseling Style for Psychological Counseling
- Pushdown Layers: Encoding Recursive Structure in Transformer Language Models
- Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability
- QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration
- Quantifying Controversy on Social Media
- Quantifying Human-AI Synergy
- Quantitative Introspection in Language Models: Tracking Internal States Across Conversation
- Query Rewriting for Retrieval-Augmented Large Language Models
- Query Understanding in the Age of Large Language Models
- QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?
- Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
- R-Zero: Self-Evolving Reasoning LLM from Zero Data
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
- RAG Does Not Work for Enterprises
- RAG-Gym: Systematic Optimization of Language Agents for Retrieval-Augmented Generation
- RAG-R1: Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism
- Ranking Free RAG: Replacing Re-ranking with Selection in RAG for Sensitive Domains
- RARR: Researching and Revising What Language Models Say, Using Language Models
- Re3: Generating Longer Stories With Recursive Reprompting and Revision
- ReAct: Synergizing Reasoning and Acting in Language Models
- Real-time News Story Identification
- Real-Time Procedural Learning From Experience for AI Agents
- Real-World Planning with PDDL+ and Beyond
- ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
- Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models
- Reasoning Can Hurt the Inductive Abilities of Large Language Models
- Reasoning Circuits in Language Models: A Mechanistic Interpretation of Syllogistic Inference
- Reasoning Language Models: A Blueprint
- Reasoning LLMs are Wandering Solution Explorers
- Reasoning Models Are More Easily Gaslighted Than You Think
- Reasoning Models Can Be Effective Without Thinking
- Reasoning Models Don't Always Say What They Think
- Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
- Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
- Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize?
- Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought
- Reasoning to Learn from Latent Thoughts
- Reasoning with Large Language Models, a Survey
- ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering
- Recent Trends in Personalized Dialogue Generation: A Review of Datasets, Methodologies, and Evaluations
- RecExplainer: Aligning Large Language Models for Recommendation Model Interpretability
- Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)
- Recommendation systems and convergence of online reviews: The type of product network matters!
- Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations
- Recommender Systems with Social Regularization
- Recommending What Video to Watch Next: A Multitask Ranking System
- ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs
- Reconciling the accuracy-diversity trade-off in recommendations
- Recursive Introspection: Teaching Language Model Agents How to Self-Improve
- Recursive Language Models
- Reflect then Learn: Active Prompting for Information Extraction Guided by Introspective Confusion
- Reflexion: an autonomous agent with dynamic memory and self-reflection
- Reinforced Language Models for Sequential Decision Making
- Reinforcement Learning be Enough for Thinking?
- Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
- Reinforcement Learning for Optimizing RAG for Domain Chatbots
- Reinforcement Learning for Reasoning in Large Language Models with One Training Example
- Reinforcement Learning with Rubric Anchors
- Reinforcement Pre-Training
- Reinforcing General Reasoning without Verifiers
- RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns
- Representation biases: will we achieve complete understanding by analyzing representations?
- Representation Engineering: A Top-Down Approach to AI Transparency
- Reranking-based Generation for Unbiased Perspective Summarization
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
- Researchy Questions: A Dataset of Multi-Perspective, Decompositional Questions for LLM Web Agents
- Rethinking Conversational Agents in the Era of LLMs: Proactivity, Non-collaborativity, and Beyond
- Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning
- Rethinking Large Language Models in Mental Health Applications
- Rethinking STS and NLI in Large Language Models
- Rethinking Thinking Tokens: LLMs as Improvement Operators
- Rethinking with Retrieval: Faithful Large Language Model Inference
- Retrieval Head Mechanistically Explains Long-Context Factuality
- Retrieval-augmented reasoning with lean language models
- RevCore: Review-augmented Conversational Recommendation
- Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up
- Reverse Thinking Makes LLMs Stronger Reasoners
- Review-LLM: Harnessing Large Language Models for Personalized Review Generation
- Revisiting LLM Reasoning via Information Bottleneck
- Revisiting Prompt Engineering: A Comprehensive Evaluation for LLM-based Personalized Recommendation
- Revisiting RAG Ensemble: A Theoretical and Mechanistic Analysis of Multi-RAG System Collaboration
- Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
- Revolutionizing Mental Health Support: An Innovative Affective Mobile Framework for Dynamic, Proactive, and Context-Adaptive Conversational Agents
- Reward Reasoning Model
- Reward-Robust RLHF in LLMs
- RewardBench: Evaluating Reward Models for Language Modeling
- Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
- ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models
- Rhetoric, Logic, and Dialectic: Advancing Theory-based Argument Quality Assessment in Natural Language Processing
- Rhetorical XAI: Explaining AI’s Benefits as well as its Use via Rhetorical Design
- RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation
- Rise of Machine Agency: A Framework for Studying the Psychology of Human–AI Interaction (HAII)
- RL + Transformer = A General-Purpose Problem Solver
- RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs
- RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
- RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner
- RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
- RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
- RLHF Workflow: From Reward Modeling to Online RLHF
- RLNVR: Reinforcement Learning from Non-Verified Real-World Rewards
- RLP: Reinforcement as a Pretraining Objective
- RLPR: Extrapolating RLVR to General Domains without Verifiers
- RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
- RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents
- RM-R1: Reward Modeling as Reasoning
- Role play with large language models
- RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models
- RouteLLM: Learning to Route LLMs with Preference Data
- rStar2-Agent: Agentic Reasoning Technical Report
- Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
- Rule2Text: Natural Language Explanation of Logical Rules in Knowledge Graphs
- S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
- SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval
- Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
- Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning
- SAND: Boosting LLM Agents with Self-Taught Action Deliberation
- Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
- Scalable Language Models with Posterior Inference of Latent Thought Vectors
- Scalable Neural Contextual Bandit for Recommender Systems
- Scaling can lead to compositional generalization
- Scaling Expert Language Models with Unsupervised Domain Discovery
- Scaling Laws for Neural Language Models
- Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
- Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
- Scaling Synthetic Data Creation with 1,000,000,000 Personas
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
- Schema-learning and rebinding as mechanisms of in-context learning and emergence
- SciTopic: Enhancing Topic Discovery in Scientific Literature through Advanced LLM
- SDPO: Segment-Level Direct Preference Optimization for Social Agents
- SEAL: Self-Evolving Agentic Learning for Conversational Question Answering over Knowledge Graphs
- Search Arena: Analyzing Search-Augmented LLMs
- Search-o1: Agentic Search-Enhanced Large Reasoning Models
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
- Searching for Best Practices in Retrieval-Augmented Generation
- See you soon again, chatbot? A design taxonomy to characterize user-chatbot relationships with different time horizons
- Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
- Seemingly Conscious AI Risks
- Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
- Self Selection and Information Role of Online Product Reviews
- Self-Adapting Language Models
- Self-Adaptive Large Language Model (LLM)-Based Multiagent Systems
- Self-Alignment with Instruction Backtranslation
- Self-consistency Improves Chain Of Thought Reasoning In Language Models
- Self-critiquing models for assisting human evaluators
- Self-Directed Synthetic Dialogues and Revisions Technical Report
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
- Self-distillation Enables Continual Learning
- Self-Evaluation Guided Beam Search for Reasoning
- Self-Improving Model Steering
- Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
- SELF-INSTRUCT: Aligning Language Models with Self-Generated Instructions
- Self-Organizing Graph Reasoning Evolves into a Critical State for Continuous Discovery Through Structural-Semantic Dynamics
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
- Self-Questioning Language Models
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
- Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst
- Self-Refine: Iterative Refinement with Self-Feedback
- Self-reflecting Large Language Models: A Hegelian Dialectical Approach
- Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
- Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution?
- Self-reinforcing cascades: A spreading model for beliefs or products of varying intensity or quality
- Self-Rewarding Language Models
- Self-Rewarding Vision-Language Model via Reasoning Decomposition
- Self-Supervised Models of Speech Infer Universal Articulatory Kinematics
- Self-Taught Evaluators
- Semantic Change Characterization with LLMs using Rhetorics
- Semantic Parsing for Task Oriented Dialog using Hierarchical Representations
- Semantic Specialization for Knowledge-based Word Sense Disambiguation
- Semantic Structure in Large Language Model Embeddings
- Sequence Organization in Interaction: A Primer in Conversation Analysis
- SERL: Self-Examining Reinforcement Learning on Open-Domain
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
- Shaping Explanations: Semantic Reward Modeling with Encoder-Only Transformers for GRPO
- Should Humans Lie to Machines? The Incentive Compatibility of Lasso and General Weighted Lasso
- Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue
- ShowUI: One Vision-Language-Action Model for GUI Visual Agent
- Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds
- Simple Synthetic Data Reduces Sycophancy In Large Language Models
- SimPO: Simple Preference Optimization with a Reference-Free Reward
- Simulacra as conscious exotica
- Simulating Society Requires Simulating Thought
- Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning Under Equal Thinking Token Budgets
- Single-agent or Multi-agent Systems? Why Not Both?
- Situating Recommender Systems in Practice: Towards Inductive Learning and Incremental Updates
- SkillClaw: Let Skills Evolve Collectively with Agentic Evolver
- Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models
- Sleep-time Compute: Beyond Inference Scaling at Test-time
- Small Language Models are the Future of Agentic AI
- Small LLMs Are Weak Tool Learners: A Multi-LLM Agent
- SMILE: Evaluation and Domain Adaptation for Social Media Language Understanding
- Social Responses to Media Technologies in the 21st Century: The Media are Social Actors Paradigm
- Social Robots for Long-Term Interaction: A Survey
- Social Skill Training with Large Language Models
- SocraSynth: Multi-LLM Reasoning with Conditional Statistics
- Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space
- Soft Tokens, Hard Truths
- SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs
- SOLOIST: Building Task Bots at Scale with Transfer Learning and Machine Teaching
- Solving a Million-Step LLM Task with Zero Errors
- SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
- Sources of Hallucination by Large Language Models on Inference Tasks
- SParC: Cross-Domain Semantic Parsing in Context
- Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
- SPICE: Self-Play In Corpus Environments Improves Reasoning
- SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
- Spurious Forgetting in Continual Learning of Language Models
- Spurious Rewards: Rethinking Training Signals in RLVR
- SSRL: Self-Search Reinforcement Learning
- Stance Detection on Social Media with Fine-Tuned Large Language Models
- Statistical and Algorithmic Foundations of Reinforcement Learning
- SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF
- StepWiser: Stepwise Generative Judges for Wiser Reasoning
- Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
- Strategic Reasoning with Language Models
- Stream of Search (SoS): Learning to Search in Language
- Stress Testing Deliberative Alignment for Anti-Scheming Training
- StructGPT: A General Framework for Large Language Model to Reason over Structured Data
- StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
- Structured and Natural Responses Co-generation for Conversational Search
- Study: Large language models can’t effectively recognize users’ motivation, but can support behavior change for those ready to act
- Style Vectors for Steering Generative Large Language Models
- Subliminal Learning: Language models transmit behavioral traits via hidden signals in data
- Summaries, Highlights, and Action items: Design, implementation and evaluation of an LLM-powered meeting recap system
- Supervised Pretraining Can Learn In-Context Reinforcement Learning
- SupervisorBot: NLP-Annotated Real-Time Recommendations of Psychotherapy Treatment Strategies with Deep Reinforcement Learning
- Supporting Physical Activity Behavior Change with LLM-Based Conversational Agents
- Suppressing Pink Elephants with Direct Principle Feedback
- Survey on Evaluation of LLM-based Agents
- Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models
- Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories
- Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians
- SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs
- Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models
- Synthetic Dialogue Dataset Generation using LLM Agents
- System 1 vs. System 2 Thinking
- System 2 Attention (is something you might need too)
- Systematic synthesis of design prompts for large language models in conceptual design
- Tailored Conversations beyond LLMs: A RL-Based Dialogue Manager
- TaleStream: Supporting Story Ideation with Trope Knowledge
- Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs
- Talk like a Graph: Encoding Graphs for Large Language Models
- Talking About Large Language Models
- TarGEN: Targeted Data Generation with Large Language Models
- Target-Guided Open-Domain Conversation
- Task Contamination: Language Models May Not Be Few-Shot Anymore
- Task-Oriented Dialogue as Dataflow Synthesis
- Task-Oriented Dialogue with In-Context Learning
- TaskLAMA: Probing the Complex Task Understanding of Language Models
- TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation
- Teaching Large Language Models to Reason with Reinforcement Learning
- Teaching Probabilistic Logical Reasoning to Transformers
- Tell me about yourself: LLMs are aware of their learned behaviors
- Temporal Self-Rewarding Language Models: Decoupling Chosen-Rejected via Past-Future
- Test-time Prompt Intervention
- Test-Time Scaling with Reflective Generative Model
- TextGrad: Automatic “Differentiation” via Text
- The Abstraction Fallacy: Why AI Can Simulate But Not Instantiate Consciousness
- The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs
- The Architectural Implications of Facebook’s DNN-based Personalized Recommendation
- The Argument Reasoning Comprehension Task: Identification and Reconstruction of Implicit Warrants
- The Art of Scaling Reinforcement Learning Compute for LLMs
- The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models
- The Challenges in Designing a Prevention Chatbot for Eating Disorders: Observational Study
- The Consensus Game: Language Model Generation via Equilibrium Search
- The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think
- The Curse Of Recursion: Training On Generated Data Makes Models Forget
- The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind
- The Demon is in Ambiguity: Revisiting Situation Recognition with Single Positive Multi-Label Learning
- The Digital Therapeutic Alliance and Human-Computer Interaction
- The Digital Therapeutic Alliance: Prospects and Considerations
- The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation
- The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis
- The Emotion-Memory Link: Do Memorability Annotations Matter for Intelligent Systems?
- The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
- The False Promise of Imitating Proprietary LLMs
- The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation
- The Future of AI: Exploring the Potential of Large Concept Models
- The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs
- The Hallucination Tax of Reinforcement Finetuning
- The Hermeneutics of Artificial Text
- The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas
- The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
- The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs
- The Illusion of the Illusion of the Illusion of Thinking
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- The Impact of AI-Generated Text on the Internet
- The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers
- The Incomplete Bridge: How AI Research (Mis)Engages with Psychology
- The Insanity of Relying on Vector Embeddings: Why RAG Fails
- The Invisible Leash: Why RLVR May Not Escape Its Origin
- The Labor Market Effects of Generative Artificial Intelligence
- The Levers of Political Persuasion with Conversational AI
- The LLM Fallacy: Misattribution in AI-Assisted Cognitive Workflows
- The Method of Critical AI Studies, A Propaedeutic
- The Missing Layer of AGI: From Pattern Alchemy to Coordination Physics
- The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning
- The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
- The Moral Turing Test: Evaluating Human-LLM Alignment in Moral Decision-Making
- The Netflix Recommender System: Algorithms, Business Value, and Innovation
- The Partner Modelling Questionnaire: A validated self-report measure of perceptions toward machines as dialogue partners
- The persuasive effects of political microtargeting in the age of generative artificial intelligence
- The Place of Emotion in Argument
- The Prompt Report: A Systematic Survey of Prompting Techniques
- The Return of Pseudosciences in Artificial Intelligence: Have Machine Learning and Deep Learning Forgotten Lessons from Statistics and History?
- The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
- The Serial Scaling Hypothesis
- The social component of the projection behavior of clausal complement contents
- The state of enterprise AI
- The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning
- The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
- The Thin Line Between Comprehension and Persuasion in LLMs
- The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities
- The Unreasonable Ineffectiveness of the Deeper Layers
- The Vanishing Gradient Problem for Stiff Neural Differential Equations
- The Vector Grounding Problem
- TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
- Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models
- Theory of Knowledge Based on the Idea of the Discursive Space
- Theory of Mind abilities of Large Language Models in Human-Robot Interaction: An Illusion?
- Think before you speak: Training Language Models With Pause Tokens
- Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens
- Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods
- Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
- Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate
- Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection
- Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking
- Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory
- Think-on-Graph: Deep and Responsible Reasoning of Large Language Model with Knowledge Graph
- Thinking Assistants: LLM-Based Conversational Assistants that Help Users Think By Asking rather than Answering
- Thinking Augmented Pre-training
- Thinking Forward and Backward: Effective Backward Planning with Large Language Models
- Thinking in Character: Advancing Role-Playing Agents with Role-Aware Reasoning
- Thinking Inside the Mask: In-Place Prompting in Diffusion LLMs
- Thinking LLMs: General Instruction Following with Thought Generation
- Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction
- Thinking—Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender
- Thinkless: LLM Learns When to Think
- Thought Anchors: Which LLM Reasoning Steps Matter?
- Thought Communication in Multiagent Collaboration
- Thought Virus: Viral Misalignment via Subliminal Prompting in Multi-Agent Systems
- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
- Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines
- Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation
- Through the Lens of Human-Human Collaboration: A Configurable Research Platform for Exploring Human-Agent Collaboration
- Tina: Tiny Reasoning Models via LoRA
- Titans: Learning to Memorize at Test Time
- To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
- To Tell The Truth: Language of Deception and Language Models
- TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
- Too Good to be Bad: On the Failure of LLMs to Role-Play Villains
- ToolFlow: Boosting LLM Tool-Calling Through Natural and Coherent Dialogue Synthesis
- Topic Modeling in Embedding Spaces
- Topic Shift Detection for Mixed Initiative Response
- Topic-Guided Conversational Recommender in Multiple Domains
- Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
- Toward Conversational Agents with Context and Time Sensitive Long-term Memory
- Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
- Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks
- Toward understanding and preventing misalignment generalization
- Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models
- Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models
- Towards a Science of Scaling Agent Systems
- Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
- Towards Algorithmic Experience
- Towards Collective Superintelligence, a Pilot Study
- Towards Conversational Recommendation over Multi-Type Dialogs
- Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset
- Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness?
- Towards Healthy AI: Large Language Models Need Therapists Too
- Towards Human-centered Proactive Conversational Agents
- Towards Large Reasoning Models: A Survey on Scaling LLM Reasoning Capabilities
- Towards Machine Theory of Mind with Large Language Model-Augmented Inverse Planning
- Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
- Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
- Towards Question-based Recommender Systems
- Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
- Towards Safe and Honest AI Agents with Neural Self-Other Overlap
- Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
- Towards Understanding Counseling Conversations: Domain Knowledge and Large Language Models
- Train Long, Think Short: Curriculum Learning for Efficient Reasoning
- Training a Generally Curious Agent
- Training Dialogue Systems by AI Feedback for Improving Overall Dialogue Impression
- Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
- Training language models to be warm and empathetic makes them less reliable and more sycophantic
- Training language models to follow instructions with human feedback
- Training Language Models to Self-Correct via Reinforcement Learning
- Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
- Training Large Language Models to Reason in a Continuous Latent Space
- Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning
- Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
- Training-Free Group Relative Policy Optimization
- Transcendence: Generative Models Can Outperform The Experts That Train Them
- Transformer-based cynical expression detection in a corpus of Spanish YouTube reviews
- Transformer2: Self-adaptive LLMs
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
- TREC iKAT 2023: A Test Collection for Evaluating Conversational and Interactive Knowledge Assistants
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- Tree Search for Language Model Agents
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search
- Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in Large Language Models
- Truly Self-Improving Agents Require Intrinsic Metacognitive Learning
- Trust in Human-AI Interaction: Scoping Out Models, Measures, and Methods
- TrustLLM: Trustworthiness in Large Language Models
- Truth or lie: Exploring the language of deception
- TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
- TTRL: Test-Time Reinforcement Learning
- Tube2Vec: Social and Semantic Embeddings of YouTube Channels
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training
- Tuning Language Models by Proxy
- Turiya at DialAM-2024: Inference Anchoring Theory Based LLM Parsers
- Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents
- Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion
- Turning large language models into cognitive models
- Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization
- TwoStep: Multi-agent Task Planning using Classical Planners and Large Language Models
- Typed-RAG: Type-aware Multi-Aspect Decomposition for Non-Factoid Question Answering
- UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity
- Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models
- Uncovering Latent Arguments in Social Media Messaging by Employing LLMs-in-the-Loop Strategy
- Understanding Before Reasoning: Enhancing Chain-of-Thought with Iterative Summarization Pre-Prompting
- Understanding Hidden Computations in Chain-of-Thought Reasoning
- Understanding the Role of User Profile in the Personalization of Large Language Models
- Understanding the Therapeutic Relationship between Counselors and Clients in Online Text-based Counseling using LLMs
- Understanding, explaining, and utilizing medical artificial intelligence
- Unified Conversational Recommendation Policy Learning via Graph-based Reinforcement Learning
- Unifying Large Language Models and Knowledge Graphs: A Roadmap
- Unifying Nearest Neighbors Collaborative Filtering
- UniGraph: Learning a Unified Cross-Domain Foundation Model for Text-Attributed Graphs
- Universe of Thoughts: Enabling Creative Reasoning with Large Language Models
- Unleashing Cognitive Synergy In Large Language Models: A Task-solving Agent Through Multi-persona Self-collaboration
- Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem
- Unlocking Varied Perspectives: A Persona-Based Multi-Agent Framework with Debate-Driven Text Planning for Argument Generation
- Unsupervised Elicitation of Language Models
- Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study
- UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation
- UR2: Unify RAG and Reasoning through Reinforcement Learning
- User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal
- User-Centric Conversational Recommendation with Multi-Aspect User Modeling
- UserBench: An Interactive Gym Environment for User-Centric Agents
- Using Computational Models to Test Syntactic Learnability
- Using Large Language Models to Create AI Personas for Replication and Prediction of Media Effects: An Empirical Test of 133 Published Experimental Research Findings
- Using Large Language Models to Generate, Validate, and Apply User Intent Taxonomies
- Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies
- Using Linguistic Synchrony to Evaluate Large Language Models for Cognitive Behavioral Therapy
- Using LLMs to Discover Legal Factors
- Using Natural Language for Reward Shaping in Reinforcement Learning
- Using Navigation to Improve Recommendations in Real-Time
- Using Topic Models to Identify Clients’ Functioning Levels and Alliance Ruptures in Psychotherapy
- Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
- Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
- Variational Autoencoders for Collaborative Filtering
- VCounselor: A Psychological Intervention Chat Agent Based on a Knowledge-Enhanced Large Language Model
- Verbal lie detection using Large Language Models
- Virtual Assistance in Any Context
- Virtuous Machines: Towards Artificial General Science
- VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
- Voxtral
- Voyager: An Open-Ended Embodied Agent with Large Language Models
- We Are All Creators: Generative AI, Collective Knowledge, and the Path Towards Human-AI Synergy
- We Won't be Missed: Work and Growth in the Era of AGI
- Weak-to-Strong GraphRAG: Aligning Weak Retrievers with Large Language Models for Graph-based Retrieval Augmented Generation
- Web-Browsing LLMs Can Access Social Media Profiles and Infer User Demographics
- Weight-sparse transformers have interpretable circuits
- We’re Afraid Language Models Aren’t Modeling Ambiguity
- What are the Goals of Distributional Semantics?
- What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
- What does it mean to understand language?
- What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity
- What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models
- What is a Discourse Graph?
- What Makes a Good Natural Language Prompt?
- What the F*ck Is Artificial General Intelligence?
- What we talk to when we talk to language models
- When AIs Judge AIs: The Rise of Agent-as-a-Judge Evaluation for LLMs
- When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models
- When Large Language Models are More Persuasive Than Incentivized Humans, and Why
- When Large Language Models contradict humans? Large Language Models’ Sycophantic Behaviour
- When More is Less: Understanding Chain-of-Thought Length in LLMs
- When Prompts Go Wrong: Evaluating Code Model Robustness to Ambiguous, Contradictory, and Incomplete Task Descriptions
- When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection
- When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs
- WHEN TO ACT, WHEN TO WAIT: Modeling Structural Trajectories for Intent Triggerability in Task-Oriented Dialogue
- Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning
- Who’s Afraid of (Left) Hyperstitions
- Why Do Multi-agent LLM Systems Fail?
- Why Do People Rate? Theory and Evidence on Online Ratings
- Why Do Some Language Models Fake Alignment While Others Don't?
- Wide & Deep Learning for Recommender Systems
- Will I Sound Like Me? Improving Persona Consistency in Dialogues through Pragmatic Self-Consciousness
- Witness
- Word Meanings in Transformer Language Models
- Working Alliance Transformer for Psychotherapy Dialogue Classification
- Working with AI: Measuring the Occupational Implications of Generative AI
- Workplace Everyday-Creativity through a Highly-Conversational UI to Large Language Models
- Writing-Zero: Bridge the Gap Between Non-verifiable Tasks and Verifiable Rewards
- You Don't Need Pre-built Graphs for RAG: Retrieval Augmented Generation with Adaptive Reasoning Structures
- Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task
- ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
- Zero-Shot Verification-guided Chain of Thoughts
- ZeroSearch: Incentivize the Search Capability of LLMs without Searching
- “Hello There! Is Now a Good Time to Talk?”: Opportune Moments for Proactive Interactions with Smart Speakers
- “It Felt Like Having a Second Mind”: Investigating Human-AI Co-creativity in Prewriting with Large Language Models
- “Mama Always Had a Way of Explaining Things So I Could Understand”: A Dialogue Corpus for Learning to Construct Explanations
- “Understanding AI”: Semantic Grounding in Large Language Models
- “What do others think?”: Task-Oriented Conversational Modeling with Subjective Knowledge
- LM2: A Simple Society of Language Models Solves Complex Reasoning