Spatial Reasoning

Perceptio: Perception Enhanced Vision Language Models via Spatial Token Generation

Perceptio enhances vision-language models with perception by generating spatial tokens, targeting complex 2D and 3D spatial reasoning.