Abstract Summary
Many existing benchmarks focus on simplified chart generation. DV-World instead tests agents in software-grounded settings that require editing spreadsheets, adapting reference visuals, and clarifying ambiguous requests.
- DV-Sheet focuses on spreadsheet charting, repair, and dashboard tasks.
- DV-Evolution tests adapting reference visuals to new data and across frameworks.
- DV-Interact measures clarification and intent alignment.
- Current leading models still score below 50% overall.