Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
Large language models (LLMs) excel at complex reasoning tasks such as mathematics and coding, yet they frequently struggle with simple interactive tasks that young children perform effortlessly. This discrepancy highlights a critical gap between declarative knowledge (knowing about something) and procedural knowledge (knowing how to do something). Although traditional reinforcement learning (RL) agents can acquire procedural knowledge through environmental interaction, they often operate as black boxes and require substantial training data. In contrast, LLMs possess extensive world knowledge and reasoning capabilities, but are unable to effectively convert this static knowledge into dynamic decision-making in interactive settings. To address this challenge, we propose Think in Games (TiG), a novel framework that empowers LLMs to develop procedural understanding through direct interaction with game environments, while retaining their inherent reasoning and explanatory abilities. Specifically, TiG reformulates RL-based decision-making as a language modeling task: LLMs generate language-guided policies, which are refined iteratively through online reinforcement learning based on environmental feedback. Our experimental results show that TiG successfully bridges the gap between declarative and procedural knowledge, achieving competitive performance with dramatically lower data and computational demands than conventional RL methods. Moreover, TiG provides step-by-step natural language explanations for its decisions, greatly improving transparency and interpretability in complex interactive tasks.
This brings us back to our central paradox: traditional RL agents know how but cannot explain why, while LLMs know why but cannot execute how. To bridge this gap, we propose Think in Games (TiG), a novel framework that enables LLMs to develop procedural understanding through direct interaction with the game environment while maintaining their natural ability to reason and explain. Specifically, we reformulate the traditional RL decision-making task as a language modeling task: an LLM generates language-guided policies, which are then refined through online reinforcement learning based on direct interaction with game environments. The game environment provides a reward for each action, and the policy model learns from this feedback while generating step-by-step explanations of its reasoning.
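To make this generate-score-refine loop concrete, the following is a minimal Python sketch under stated assumptions, not our actual implementation: `game_env_reset`, `game_env_step`, `llm_generate`, and `llm_update` are hypothetical stubs (a toy single-state environment, canned LLM outputs, and a placeholder update step), and in practice the update would be an online RL step on the LLM's parameters weighted by the environmental reward.

```python
import random

# --- Hypothetical stubs standing in for the real game environment and LLM ---

def game_env_reset():
    """Return an initial game state described in natural language (toy stub)."""
    return "Dragon spawns in 30s; mid tower at 40% HP; enemy jungler missing."

def game_env_step(state, action):
    """Score an action against the environment (toy stub).

    A real environment would execute the macro-level action and return a
    reward from game feedback; here we reward one fixed 'good' action.
    """
    reward = 1.0 if "secure dragon" in action.lower() else 0.0
    next_state = state  # toy environment: the state does not change
    return next_state, reward

def llm_generate(state):
    """Sample a language-guided policy: reasoning followed by an action.

    A real system would prompt an LLM with the state; this stub samples
    from canned outputs to keep the sketch self-contained.
    """
    candidates = [
        "Reasoning: jungler missing, dragon up soon. Action: group and secure dragon.",
        "Reasoning: mid tower is low. Action: push mid tower immediately.",
    ]
    return random.choice(candidates)

def llm_update(samples):
    """Policy refinement step (placeholder).

    In practice this would update the LLM's parameters from reward-weighted
    rollouts (e.g., a policy-gradient step); here we only report the average
    reward of the sampled language-guided policies.
    """
    avg_reward = sum(r for _, _, r in samples) / len(samples)
    print(f"update on {len(samples)} samples, avg reward {avg_reward:.2f}")

# --- TiG-style loop: generate in language, score via the environment, refine ---
for step in range(3):
    state = game_env_reset()
    samples = []
    for _ in range(4):  # sample several language-guided policies per state
        action = llm_generate(state)
        _, reward = game_env_step(state, action)
        samples.append((state, action, reward))
    llm_update(samples)
```

The key point the sketch illustrates is that the policy is expressed entirely in natural language: the environment scores the action, the reward drives refinement, and the accompanying reasoning remains inspectable at every step.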
2.1. Motivation
To enable LLMs to develop a deep, intrinsic understanding of game mechanics, we draw inspiration from the learning processes of expert MOBA players. Expert gameplay in MOBA environments is characterized by macro-level reasoning, which involves devising and executing team-wide strategies, such as objective control, map pressure, and coordinated team maneuvers. Unlike micro-level actions (e.g., precise skill execution), macro-level reasoning prioritizes long-term objectives and team synergy. Our goal is to equip LLMs with these macro-level reasoning capabilities, fostering a comprehensive understanding of game mechanics and enabling generalization across diverse tasks.