Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge

Tang, Yao; Dong, Li; Hao, Yaru; Dong, Qingxiu; Wei, Furu; Gu, Jiatao

Yao Tang¹, Li Dong², Yaru Hao², Qingxiu Dong², Furu Wei², Jiatao Gu¹

¹University of Pennsylvania ²Microsoft Research

Multiplex Thinking branches by sampling multiple discrete tokens and merges them into one multiplex token at each thinking step.

Multiplex tokens are self-adaptive:
Confident step (low entropy) → tokens agree → a multiplex token ≈ a standard CoT step
Uncertain step (high entropy) → tokens diversify → a multiplex token ≈ multiple next steps
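
The branch-and-merge step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `softmax` and `multiplex_step`, the toy embedding table, and the choice of a plain average as the merge rule are all assumptions made for the example.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def multiplex_step(logits, embeddings, k, rng):
    """Branch: draw k token ids independently from softmax(logits).
    Merge: average their embeddings into one continuous vector.
    (Uniform averaging is an assumption of this sketch.)"""
    probs = softmax(logits)
    ids = rng.choices(range(len(probs)), weights=probs, k=k)
    dim = len(embeddings[0])
    merged = [sum(embeddings[i][d] for i in ids) / k for d in range(dim)]
    return ids, merged

# Toy vocabulary of 3 tokens with 2-dimensional embeddings (hypothetical values).
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rng = random.Random(0)

# Confident step: one logit dominates, so all samples agree.
confident_ids, confident_vec = multiplex_step([50.0, 0.0, 0.0], E, 3, rng)

# Uncertain step: flat logits, so samples may differ and the merge blends them.
uncertain_ids, uncertain_vec = multiplex_step([0.0, 0.0, 0.0], E, 3, rng)
```

In the confident case every sample is the same token, so the merged vector equals that token's embedding, which is the sense in which a multiplex token reduces to a standard CoT step. In the uncertain case the merged vector lies between the sampled embeddings.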

Abstract

Large language models often solve complex reasoning tasks more effectively with Chain-of-Thought (CoT), but at the cost of long, low-bandwidth token sequences. Humans, by contrast, often reason softly by maintaining a distribution over plausible next steps. Motivated by this, we propose Multiplex Thinking, a stochastic soft reasoning mechanism that, at each thinking step, samples K candidate tokens and aggregates their embeddings into a single continuous multiplex token. This preserves the vocabulary embedding prior and the sampling dynamics of standard discrete generation, while inducing a tractable probability distribution over multiplex rollouts. Consequently, multiplex trajectories can be directly optimized with on-policy reinforcement learning (RL). Importantly, Multiplex Thinking is self-adaptive: when the model is confident, the multiplex token is nearly discrete and behaves like standard CoT; when it is uncertain, it compactly represents multiple plausible next steps without increasing sequence length. Across challenging math reasoning benchmarks, Multiplex Thinking consistently outperforms strong discrete CoT and RL baselines from Pass@1 through Pass@1024, while producing shorter sequences.
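
One way to see why the rollout distribution stays tractable: if the K tokens at each step are drawn independently from the model's next-token distribution, the probability of the sampled tuple is a product of ordinary token probabilities, so the log-probability of a rollout is a plain sum. The sketch below assumes exactly that factorization; the function name `multiplex_logprob` and its input layout are hypothetical and not taken from the paper.

```python
import math

def multiplex_logprob(step_probs, step_ids):
    """Log-probability of a multiplex rollout, assuming the K tokens at
    each step are i.i.d. draws from that step's next-token distribution.

    step_probs: one probability vector per thinking step
    step_ids:   one list of K sampled token ids per thinking step
    """
    total = 0.0
    for probs, ids in zip(step_probs, step_ids):
        total += sum(math.log(probs[i]) for i in ids)
    return total

# Two steps, K = 2 samples per step (toy numbers).
probs = [[0.5, 0.5], [0.25, 0.75]]
ids = [[0, 1], [1, 1]]
logp = multiplex_logprob(probs, ids)
```

A scalar log-probability of this kind is what a policy-gradient style RL objective needs, which is consistent with the abstract's claim that multiplex trajectories can be optimized on-policy.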

Better performance, fewer forward passes

Multiplex Thinking improves Pass@k while keeping reasoning compact.

Beat RL on discrete rollouts (Pass@1 → Pass@1024)

Figure: Pass@1 to Pass@1024 results on representative datasets, comparing Multiplex Thinking against baselines.
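
For readers unfamiliar with the metric, Pass@k is the probability that at least one of k sampled responses is correct. A widely used unbiased estimator computes it from n samples of which c are correct. Whether this page's results use this exact estimator is not stated here, so the sketch below is only an illustration of the metric, and the name `pass_at_k` is our own.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased Pass@k estimate from n samples with c correct:
    1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k incorrect samples: any k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 4 samples, 1 correct: a single draw succeeds a quarter of the time.
single = pass_at_k(4, 1, 1)   # 0.25
pair = pass_at_k(4, 1, 2)     # 0.5
```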

Think better, think shorter

Figure: Response length comparison between Multiplex Thinking (K=2/3/4) and discrete RL (K=1). Left: average response length vs. multiplex width K. Right: response length scaling vs. multiplex width K.

Case Study

We visualize a multiplex trajectory where 3 discrete tokens are independently sampled at each position and displayed as Sample 1/2/3. The gray number above each token represents its index in the trajectory. A yellow background indicates all branches sampled the same token, purple indicates two unique tokens, and red indicates three distinct tokens.
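
The colour coding above amounts to counting the distinct tokens among the three samples at each position. A small sketch of that rule follows; the function name `agreement_color` and the example tokens are hypothetical.

```python
def agreement_color(sampled_tokens):
    """Map the three tokens sampled at one position to the highlight colour
    used in the case study: 1 unique -> yellow, 2 -> purple, 3 -> red."""
    if len(sampled_tokens) != 3:
        raise ValueError("expected exactly 3 sampled tokens")
    unique = len(set(sampled_tokens))
    return {1: "yellow", 2: "purple", 3: "red"}[unique]

colors = [
    agreement_color(["so", "so", "so"]),        # all branches agree
    agreement_color(["so", "thus", "so"]),      # two unique tokens
    agreement_color(["so", "thus", "hence"]),   # three distinct tokens
]
```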

BibTeX

@article{tang2026multiplexthinking,
  title   = {Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge},
  author  = {Tang, Yao and Dong, Li and Hao, Yaru and Dong, Qingxiu and Wei, Furu and Gu, Jiatao},
  journal = {arXiv preprint arXiv:2601.08808},
  year    = {2026}
}