
Chartographer: Counterfactual Chart Generation for Evaluating Vision-Language Models
Chartographer addresses a blind spot in vision-language model (VLM) evaluation: models can score well on chart QA benchmarks through memorization or statistical shortcuts rather than genuine visual reasoning. By reverse-engineering charts into executable code and generating controlled counterfactual variants, it lets researchers measure whether VLMs actually understand visual semantics or merely exploit dataset artifacts. This matters because it reveals whether leading proprietary and open-source models reason robustly across modalities or only pattern-match on familiar chart structures, with direct consequences for how the field benchmarks visual intelligence.
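
To make the idea of a controlled counterfactual concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Chartographer's actual pipeline: the `BarChartSpec` class, the `to_plot_code`, `answer_argmax`, and `counterfactual` helpers, and the choice of a value shuffle as the edit are all hypothetical. The sketch assumes a chart has already been recovered as a small spec, emits plotting code for it as a string, and produces a variant whose ground-truth answer to "Which category has the highest value?" differs from the original.

```python
from dataclasses import dataclass, replace
import random


@dataclass(frozen=True)
class BarChartSpec:
    """Hypothetical stand-in for a chart recovered as code."""
    title: str
    categories: tuple
    values: tuple


def to_plot_code(spec):
    """Return matplotlib source that would render the spec (not executed here)."""
    return (
        "import matplotlib.pyplot as plt\n"
        f"plt.bar({list(spec.categories)!r}, {list(spec.values)!r})\n"
        f"plt.title({spec.title!r})\n"
        "plt.savefig('chart.png')\n"
    )


def answer_argmax(spec):
    """Ground truth for 'Which category has the highest value?'."""
    return spec.categories[spec.values.index(max(spec.values))]


def counterfactual(spec, seed=0, max_tries=100):
    """Shuffle values across categories until the argmax answer changes.

    Only the value-to-category assignment is edited; title, categories,
    and the multiset of values stay fixed, so the variant is 'controlled'.
    """
    rng = random.Random(seed)
    original_answer = answer_argmax(spec)
    values = list(spec.values)
    for _ in range(max_tries):
        rng.shuffle(values)
        variant = replace(spec, values=tuple(values))
        if answer_argmax(variant) != original_answer:
            return variant
    raise ValueError("no answer-changing variant found")


# Usage: build an original chart and one counterfactual variant.
spec = BarChartSpec("Sales", ("A", "B", "C"), (3, 9, 5))
variant = counterfactual(spec)
original_code = to_plot_code(spec)
variant_code = to_plot_code(variant)
```

A model that answers "B" for both the original and the variant is likely relying on something other than the rendered bars, which is the kind of shortcut the counterfactual comparison is meant to surface.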
