Diffusion-Based LLMs
Related topics:
- *A Survey on Diffusion Language Models*: Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterati… (a minimal decoding-loop sketch follows this list)
- *Deep Researcher with Test-Time Diffusion*: Deep research agents, powered by Large Language Models (LLMs), are rapidly advancing; yet, their performance often plateaus when generating complex, long-form research reports using generic test-time …
- *DeepGesture: A conversational gesture synthesis system based on emotions and semantics*: Along with the explosion of large language models, improvements in speech synthesis, advancements in hardware, and the evolution of computer graphics, the current bottleneck in creating digital humans…
- *Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing*: Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to autoregressive (AR) LLMs for text generation, with the potential to decode multiple tokens in a single iteration. How…
- *Diffusion Language Models Know the Answer Before Decoding*: Diffusion language models (DLMs) have recently emerged as an alternative to autoregressive approaches, offering parallel sequence generation and flexible token orders. However, their inference remains…
- *Diffusion Models are Evolutionary Algorithms*: In a convergence of machine learning and biology, we reveal that diffusion models are evolutionary algorithms. By considering evolution as a denoising process and reversed evolution as diffusion, we m…
- *Diffusion-LM Improves Controllable Text Generation*: Controlling the behavior of language models (LMs) without re-training is a major open problem in natural language generation. While recent works have demonstrated successes on controlling simple sente…
- *Large Language Diffusion Models*: Is the autoregressive paradigm the only viable path to achieving the intelligence exhibited by LLMs? We argue that scalability is primarily a consequence of the interplay between Transformers (Vaswan…
- *Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach*: We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling… (see the recurrent-depth sketch after this list)
- *Thinking Inside the Mask: In-Place Prompting in Diffusion LLMs*: Although large language models (LLMs) have achieved remarkable success, their prefix-only prompting paradigm and sequential generation process offer limited flexibility for bidirectional information. D…
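Several of the entries above describe the same core mechanism: generation starts from a fully masked sequence, and a bidirectional model iteratively unmasks the most confident positions, committing several tokens per denoising step instead of one per forward pass. The sketch below is a minimal, self-contained illustration of that loop; the toy model, vocabulary size, and confidence-based unmasking schedule are assumptions for illustration, not any of these papers' actual implementations.

```python
# Minimal sketch of masked-diffusion parallel decoding.
# toy_logits, MASK_ID, and the unmasking schedule are illustrative only.
import numpy as np

VOCAB = 32     # toy vocabulary size
MASK_ID = 0    # reserved [MASK] token id
SEQ_LEN = 16
STEPS = 4      # denoising iterations (fewer steps = faster, coarser)

rng = np.random.default_rng(0)

def toy_logits(tokens: np.ndarray) -> np.ndarray:
    """Stand-in for a bidirectional Transformer: per-position logits.
    A real DLM conditions on the whole partially masked sequence."""
    logits = rng.normal(size=(len(tokens), VOCAB))
    logits[:, MASK_ID] = -np.inf  # never predict the mask token itself
    return logits

def decode(seq_len: int = SEQ_LEN, steps: int = STEPS) -> np.ndarray:
    tokens = np.full(seq_len, MASK_ID)  # start fully masked
    for step in range(steps):
        logits = toy_logits(tokens)
        probs = np.exp(logits - logits.max(-1, keepdims=True))
        probs /= probs.sum(-1, keepdims=True)
        conf = probs.max(-1)   # per-position confidence
        pred = probs.argmax(-1)
        masked = tokens == MASK_ID
        # Unmask the most confident masked positions this round; unlike
        # AR decoding, many tokens are committed in a single iteration.
        k = int(np.ceil(masked.sum() / (steps - step)))
        order = np.argsort(np.where(masked, -conf, np.inf))
        tokens[order[:k]] = pred[order[:k]]
    return tokens

print(decode())
```

Because the cost scales with the number of denoising steps rather than the sequence length, cutting the step count is where the faster-than-AR inference claims in the list come from.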
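The recurrent-depth entry scales test-time compute differently: rather than more denoising steps over tokens, a small block is iterated in latent space, so a harder query can simply receive more iterations at inference time with the same weights. A minimal sketch, assuming a toy two-matrix residual block (all shapes and weights here are illustrative):

```python
# Minimal sketch of recurrent-depth test-time scaling: iterate a shared
# block in latent space; more iterations = a deeper effective network.
import numpy as np

D = 64
rng = np.random.default_rng(1)
W1, W2 = rng.normal(scale=D ** -0.5, size=(2, D, D))

def recurrent_block(state: np.ndarray, x: np.ndarray) -> np.ndarray:
    """One unrolled 'depth' step: mix the latent state with the input."""
    h = np.tanh(state @ W1 + x)
    return state + np.tanh(h @ W2)  # residual update keeps iteration stable

def latent_reasoning(x: np.ndarray, iters: int) -> np.ndarray:
    state = np.zeros_like(x)
    for _ in range(iters):  # iteration count is chosen at inference time
        state = recurrent_block(state, x)
    return state

x = rng.normal(size=D)
shallow = latent_reasoning(x, iters=4)
deep = latent_reasoning(x, iters=32)  # same weights, more compute
```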