VLM blindness probe
// vision-language models · reimplementation
Reimplementing "VLMs are blind"
A from-scratch reimplementation and extension of the probes showing that large vision-language models fail surprisingly basic visual tasks: counting, spatial reasoning, and simple object relationships. Why do models that "see" so well in demos stumble on tasks a child handles?
Read article →