CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
arXiv preprint arXiv:2412.12932, 2024.
Cheng, Zihui and Chen, Qiguang and Zhang, Jin and Fei, Hao and Feng, Xiaocheng and Che, Wanxiang and Li, Min and Qin, Libo