Perceptio: Perception Enhanced Vision Language Models via Spatial Token Generation

Abstract

Perceptio explores perception-enhanced vision-language modeling through spatial token generation for complex 2D and 3D spatial reasoning.

Publication
arXiv preprint, 2026