Perceptio: Perception Enhanced Vision Language Models via Spatial Token Generation

Yuchen Li, Amanmeet Garg, Shalini Chaudhuri, Rui Zhao, Garin Kessler

March 2026

Perceptio explores perception-enhanced vision-language modeling through spatial token generation for complex 2D and 3D spatial reasoning.

Publication

arXiv preprint, 2026