Auto-regressive (AR) architectures are constrained by a causal bottleneck: each token can attend only to the tokens generated before it. Diffusion Language Models (DLMs) instead frame text generation as a bidirectional denoising process. We identify ten fundamental challenges, from architectural inertia to latent thinking, that stand between DLMs and their "GPT-4 moment."
Bottlenecks & Scalability
Native designs for non-causal, iterative updates that avoid redundant global re-computation and the breakdown of standard KV caching under bidirectional attention.
Multi-scale tokenization that moves beyond flat BPE to allocate resources between semantic structuring and lexical polishing.
Reducing computational waste in long-sequence pre-training, where only a small fraction of tokens provides gradient feedback (see the training-loss sketch after the challenge list).
Structured mechanisms that account for the interdependencies among functional tokens, as opposed to generic filler words.
Predicting optimal output length dynamically to avoid "hallucinatory padding" or premature truncation.
Curating corpora that highlight structural relationships and multi-point dependencies for bidirectional learning.
Balancing denoising quality against the "iterative tax" of repeated refinement steps during high-throughput inference (see the decoding sketch after the challenge list).
Enabling the model to "re-think" or edit its output, allowing iterative self-correction beyond linear trajectories.
Frameworks where prompts serve as global constraints or skeletal scaffolds rather than simple prefixes.
Collapsing understanding and generation into a single denoising manifold for unified multimodal models, e.g., Vision-Language-Action (VLA).
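To make the pre-training waste concrete, the training-loss sketch below assumes a toy stand-in for the network (random embedding and projection matrices) and illustrative values for `vocab_size`, `seq_len`, and the masking ratios; it is not any specific DLM's training code. Every call still runs a full forward pass over the long sequence, but only the masked positions enter the loss and therefore produce gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, seq_len, d_model = 1000, 4096, 64       # illustrative sizes
MASK_ID = 0                                         # assume id 0 is [MASK]
tokens = rng.integers(1, vocab_size, size=seq_len)  # a "long document"

# Toy stand-in for the DLM: embedding table + output projection, random weights.
emb = rng.normal(size=(vocab_size, d_model))
proj = rng.normal(size=(d_model, vocab_size)) / np.sqrt(d_model)

def masked_denoising_loss(mask_ratio):
    """Cross-entropy over masked positions only -- the masked-diffusion
    pre-training objective. Unmasked positions are forward-pass overhead."""
    mask = rng.random(seq_len) < mask_ratio
    corrupted = np.where(mask, MASK_ID, tokens)
    logits = emb[corrupted] @ proj                   # full-sequence forward pass
    logits -= logits.max(-1, keepdims=True)          # numerical stability
    logp = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
    nll = -logp[np.arange(seq_len), tokens]
    return nll[mask].mean(), int(mask.sum())         # only masked slots give gradients

for ratio in (0.15, 0.5, 0.9):
    loss, n_supervised = masked_denoising_loss(ratio)
    print(f"mask ratio {ratio:.2f}: {n_supervised:4d}/{seq_len} positions supervised, "
          f"loss {loss:.2f}")
```

Draws with a low masking ratio leave most of that forward pass unsupervised, which is exactly the waste described above.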
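The "iterative tax" and the forward-only limitation show up in the standard confidence-based parallel decoder sketched below. The random `denoiser_logits` placeholder, the step budgets, and the commit rule are illustrative assumptions rather than a particular model's sampler; the point is that every refinement step costs one full bidirectional forward pass, and a committed position is never revisited.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, seq_len = 1000, 32
MASK_ID = vocab_size                     # reserve an id outside the vocab for [MASK]

def denoiser_logits(canvas):
    """Placeholder for one bidirectional DLM forward pass (random logits here)."""
    return rng.normal(size=(len(canvas), vocab_size))

def parallel_decode(steps):
    canvas = np.full(seq_len, MASK_ID)
    forward_passes = 0
    for step in range(steps):
        logits = denoiser_logits(canvas)             # full-sequence pass, every step
        forward_passes += 1
        probs = np.exp(logits - logits.max(-1, keepdims=True))
        probs /= probs.sum(-1, keepdims=True)
        conf, pred = probs.max(-1), probs.argmax(-1)
        masked = canvas == MASK_ID
        # Commit the most confident masked positions; committed tokens are final
        # and never revisited -- the forward-only limitation.
        budget = int(np.ceil(masked.sum() / (steps - step)))
        pick = np.argsort(-np.where(masked, conf, -np.inf))[:budget]
        canvas[pick] = pred[pick]
    return canvas, forward_passes

for steps in (4, 16, 32):
    _, passes = parallel_decode(steps)
    print(f"{steps:2d} denoising steps -> {passes} full forward passes for {seq_len} tokens")
```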
The Path Forward
Shifting to diffusion-native ecosystems. We propose stochastic-aware attention and multi-scale tokenizers that simulate hierarchical thought: sculpting global structure before filling in local content.
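As a rough illustration of sculpting global structure before filling in local content, the sketch below first denoises a short canvas of coarse plan tokens and then denoises each fine-grained segment conditioned on its plan token. The two vocabularies, segment sizes, and the `denoise` placeholder (which merely samples random tokens) are assumptions for illustration, not a proposed implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
COARSE_VOCAB, FINE_VOCAB = 64, 1000      # e.g. discourse-level symbols vs. subwords
MASK = -1

def denoise(canvas, vocab, context=None):
    """Placeholder for a full denoising loop of a (coarse or fine) DLM.
    A real model would condition on `context`; this toy samples random tokens."""
    out = canvas.copy()
    masked = canvas == MASK
    out[masked] = rng.integers(0, vocab, size=int(masked.sum()))
    return out

def hierarchical_generate(n_segments=4, tokens_per_segment=8):
    # Stage 1: sculpt global structure -- a short canvas of coarse plan tokens.
    plan = denoise(np.full(n_segments, MASK), COARSE_VOCAB)
    # Stage 2: fill local content -- each segment is denoised given its plan token.
    segments = [
        denoise(np.full(tokens_per_segment, MASK), FINE_VOCAB, context=plan[i])
        for i in range(n_segments)
    ]
    return plan, np.concatenate(segments)

plan, text_tokens = hierarchical_generate()
print("coarse plan :", plan)
print("fine tokens :", text_tokens)
```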
Implementing dynamic masking ratios and speculative denoising. Incorporating EOS-position prediction directly into denoising allows for elastic generation windows.
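A minimal sketch of how EOS-position prediction could create an elastic generation window, assuming a hypothetical length head (`predict_eos_position`) trained alongside the denoiser and a random placeholder for the denoising pass: positions beyond the predicted EOS are fixed to padding and never denoised, so the window neither pads endlessly nor truncates prematurely.

```python
import numpy as np

rng = np.random.default_rng(0)
MAX_LEN, VOCAB = 128, 1000
PAD, EOS, MASK = -2, -1, VOCAB           # sentinel ids for this toy

def predict_eos_position(prompt_tokens):
    """Hypothetical length head trained jointly with the denoiser: returns the
    most likely EOS position for this prompt (random scores here)."""
    scores = rng.normal(size=MAX_LEN - 1)
    return int(np.argmax(scores)) + 1    # at least one content token

def denoise_pass(active_canvas):
    """Placeholder for one denoising pass over the active window.
    A real DLM refines progressively; this toy simply resamples it."""
    return rng.integers(0, VOCAB, size=len(active_canvas))

def elastic_generate(prompt_tokens, steps=4):
    eos_pos = predict_eos_position(prompt_tokens)
    # Elastic window: only positions before the predicted EOS are active;
    # everything after it is padding and is never denoised.
    canvas = np.full(MAX_LEN, PAD)
    canvas[:eos_pos] = MASK
    canvas[eos_pos] = EOS
    for _ in range(steps):
        canvas[:eos_pos] = denoise_pass(canvas[:eos_pos])
    return canvas[:eos_pos + 1]

out = elastic_generate(prompt_tokens=np.array([5, 6, 7]))
print(f"generated {len(out) - 1} content tokens, EOS at position {len(out) - 1}")
```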
Shifting to active remasking: identifying low-confidence regions for immediate re-generation, enabling self-correction that surpasses forward-only limits.
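A toy sketch of active remasking, with a random placeholder standing in for the DLM and an assumed remasking fraction: after each drafting round, the least-confident positions, including tokens committed in earlier rounds, are wiped back to [MASK] and rewritten with full bidirectional context; that revisiting is what the forward-only decoder above cannot do.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, SEQ_LEN = 1000, 32
MASK = VOCAB                             # reserved id for [MASK]

def denoiser_probs(canvas):
    """Placeholder for the DLM forward pass: per-position probabilities
    over the vocabulary (random here)."""
    logits = rng.normal(size=(len(canvas), VOCAB))
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    return probs / probs.sum(-1, keepdims=True)

def decode_with_remasking(rounds=3, remask_frac=0.25):
    canvas = np.full(SEQ_LEN, MASK)
    for r in range(rounds):
        probs = denoiser_probs(canvas)
        conf, pred = probs.max(-1), probs.argmax(-1)
        # Fill every currently masked slot with its most likely token.
        canvas = np.where(canvas == MASK, pred, canvas)
        if r == rounds - 1:
            break
        # Active remasking: wipe the least-confident fraction of the draft,
        # including tokens committed in earlier rounds, so the next round
        # can rewrite them with full bidirectional context.
        k = int(remask_frac * SEQ_LEN)
        worst = np.argsort(conf)[:k]
        canvas[worst] = MASK
        print(f"round {r}: remasked {k} low-confidence positions")
    return canvas

draft = decode_with_remasking()
print("final draft ids:", draft[:8], "...")
```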
Treating understanding (high-noise) and generation (low-noise) as a single continuum. This unified objective collapses the modality gap in VLA models.
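A minimal sketch of the unified objective, under the assumption that understanding-style and generation-style examples share one masked-denoising loss and differ only in the sampled noise level; the placeholder model, the conditioning argument, and the sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 1000
MASK = VOCAB                              # reserved id for [MASK]

def model_logits(corrupted_tokens, condition):
    """Placeholder for a multimodal denoiser conditioned on, e.g., image or
    action features (random logits here)."""
    return rng.normal(size=(len(corrupted_tokens), VOCAB))

def unified_step(target_tokens, condition):
    """One masked-denoising objective for every task. The sampled noise level t
    places the example somewhere on the understanding<->generation continuum,
    but the loss itself never changes."""
    t = rng.uniform(0.05, 1.0)                         # noise level on the shared continuum
    mask = rng.random(len(target_tokens)) < t
    if not mask.any():                                 # guarantee at least one supervised slot
        mask[rng.integers(len(target_tokens))] = True
    corrupted = np.where(mask, MASK, target_tokens)
    logits = model_logits(corrupted, condition)
    logits -= logits.max(-1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
    nll = -logp[np.arange(len(target_tokens)), target_tokens]
    return nll[mask].mean(), t

answer = rng.integers(0, VOCAB, size=16)               # e.g. a caption, answer, or action string
loss, t = unified_step(answer, condition="image_features_placeholder")
print(f"noise level t = {t:.2f}, masked-token loss = {loss:.2f}")
```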