SlideAudit: A Dataset and Taxonomy for Automated Evaluation of Presentation Slides
Abstract
Automated evaluation of specific graphic designs like presentation slides is an open problem. We present SlideAudit, a dataset for automated slide evaluation. We collaborated with design experts to develop a thorough taxonomy of slide design flaws. Our dataset comprises 2400 slides collected and synthesized from multiple sources, including a subset intentionally modified with specific design problems. We then fully annotated them using our taxonomy through strictly trained crowdsourcing from Prolific. To evaluate whether AI is capable of identifying design flaws, we compared multiple large language models under different prompting strategies, and with an existing design critique pipeline. We show that AI models struggle to accurately identify slide design flaws, with F1 scores ranging from 0.331 to 0.655. Notably, prompting techniques leveraging our taxonomy achieved the highest performance. We further conducted a remediation study to assess AI's potential for improving slides. Among 82.0% of slides that showed significant improvement, 87.8% of them were improved more with our taxonomy, further demonstrating its utility.