Reimplementing "VLMs are blind"

// image placeholder — probe examples

Motivation

Modern vision-language models (VLMs) look impressive in demos, yet a line of work on "VLM blindness" shows they routinely fail tasks a child handles: counting overlapping objects, judging relative positions, recognizing simple relationships. This write-up reimplements those probes from scratch and extends them, to build an honest picture of where these models actually break.

What I'm looking at

Reimplementing the core probe suite rather than trusting reported numbers.
Stress-testing counting, spatial reasoning, and attribute binding.
Checking whether failures persist across changes in prompt wording and image resolution.
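
To make the list above concrete, here is a minimal sketch of what one counting probe and its scoring could look like. It is an illustration under stated assumptions, not the original benchmark's generator and not necessarily the code used here: the names `make_circle_probe` and `score_conditions` are hypothetical, the probe is emitted as SVG text (it would still need rasterizing at each target resolution before being shown to a model), and the model answers in the usage example are made up purely to show the arithmetic.

```python
import random
from collections import Counter


def make_circle_probe(n_circles, radius=40, gap=-10, seed=0):
    """Return (svg_text, ground_truth) for a row of outlined circles.

    gap is the space between neighbouring circles in pixels:
    a negative gap makes neighbours overlap, a non-negative gap keeps them apart.
    Hypothetical helper, for illustration only.
    """
    if n_circles < 1:
        raise ValueError("need at least one circle")
    step = 2 * radius + gap  # centre-to-centre distance
    if step <= 0:
        raise ValueError("gap is too negative: circles would not advance")

    rng = random.Random(seed)  # seeded so every probe is reproducible
    colors = ["red", "green", "blue", "orange", "purple"]
    margin = 10
    width = step * (n_circles - 1) + 2 * radius + 2 * margin
    height = 2 * radius + 2 * margin

    shapes = []
    for i in range(n_circles):
        cx = margin + radius + i * step
        cy = margin + radius
        shapes.append(
            f'<circle cx="{cx}" cy="{cy}" r="{radius}" fill="none" '
            f'stroke="{rng.choice(colors)}" stroke-width="3"/>'
        )

    svg = (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(shapes)
        + "</svg>"
    )
    return svg, n_circles


def score_conditions(answers, truth):
    """Summarise one probe's answers across conditions.

    answers maps a condition label, e.g. (prompt_id, resolution), to the
    model's parsed answer. accuracy is the fraction of conditions that match
    the ground truth; agreement is the fraction that gave the most common
    answer, whether or not that answer is correct.
    """
    if not answers:
        raise ValueError("no answers to score")
    n = len(answers)
    accuracy = sum(ans == truth for ans in answers.values()) / n
    modal_answer, modal_count = Counter(answers.values()).most_common(1)[0]
    return {
        "accuracy": accuracy,
        "agreement": modal_count / n,
        "modal_answer": modal_answer,
    }


if __name__ == "__main__":
    svg, truth = make_circle_probe(5, gap=-10, seed=1)
    # Made-up answers for two prompt variants at two resolutions.
    answers = {("p1", 384): 4, ("p1", 768): 5, ("p2", 384): 4, ("p2", 768): 4}
    print(score_conditions(answers, truth))
```

Keeping accuracy and agreement separate is a deliberate choice in this sketch: a model that answers "4" under nearly every prompt and resolution is consistently wrong, which is a different failure from one whose answer shifts whenever the condition changes.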

Findings

// in progress — detailed results, figures, and failure case studies to follow as the analysis matures.