Omni-CLST: Error-aware Curriculum Learning with guided Selective chain-of-Thought for audio questuin answering

Published: Sep 14, 2025

Last Updated: Sep 14, 2025

Authors:Jinghua Zhao, Hang Su, Lichun Fan, Zhenbo Luo, Jian Luan, Hui Wang, Haoqin Sun, Yong Qin

Abstract

We propose Omni-CLST, an error-aware Curriculum Learning framework with guided Selective Chain-of-Thought for audio question answering. The framework efficiently leverages existing high-quality dataset through two key strategies: an error-aware curriculum that organizes samples by difficulty, and a guided thought dropout mechanism that focuses reasoning on challenging cases. Integrated with GRPO training, these strategies enable the model to learn more effectively from informative samples. Experiments on MMAU-mini and MMAR demonstrate that Omni-CLST achieves competitive accuracy (73.80% on MMAU-mini) and establishes a new state of the art (64.30% on MMAR), highlighting its robustness and generalization capability in multimodal audio-language understanding.

Omni-CLST: Error-aware Curriculum Learning with guided Selective chain-of-Thought for audio questuin answering

Abstract

Categories