🚀[2025-05-15]: MMOral-Bench is now available for public evaluation! 😆 More datasets are coming soon!
We introduce MMOral, the first large-scale multimodal instruction dataset and benchmark tailored for panoramic X-ray understanding. MMOral comprises 20,563 annotated panoramic X-rays paired with 1.3 million instruction-following instances, spanning multiple task formats including attribute extraction, report generation, visual question answering, and image-grounded dialogue. Complementing the dataset, MMOral-Bench offers a curated evaluation suite covering five key diagnostic dimensions: condition of teeth, pathological findings, historical treatments, jawbone observations, and clinical summary & recommendations. The benchmark consists of 100 images with 1,100 QA pairs, comprising 500 closed-ended and 600 open-ended questions. All cases in MMOral-Bench are manually selected from MMOral and checked to ensure their quality and reliability. Together, MMOral and MMOral-Bench lay a critical foundation for advancing intelligent dentistry and enabling clinically meaningful multimodal AI.
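To make the task formats above concrete, here is a minimal sketch of what a single instruction-following instance might look like. The field names (`image`, `task`, `question`, `answer`) are illustrative assumptions, not the released MMOral schema.

```python
# Hypothetical instruction-following instance for the VQA task format.
# The actual MMOral record layout may differ; field names here are
# assumptions for illustration only.
instance = {
    "image": "panoramic_00001.png",
    "task": "visual_question_answering",
    "question": "Which teeth show evidence of prior root canal treatment?",
    "answer": "Radiopaque root fillings are visible in the lower molars.",
}

# Other task formats (attribute extraction, report generation,
# image-grounded dialogue) would vary the "task" field and the
# structure of the expected output.
```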
The curation pipeline of the MMOral dataset consists of four sequential steps, as outlined below:
MMOral-Report Visualization Case 1.
MMOral-Report Visualization Case 2.
MMOral-Report Visualization Case 3.
MMOral-Report Visualization Case 4.
MMOral-Report Visualization Case 5.
MMOral-Report Visualization Case 6.
Brief descriptions of the four sub-datasets in MMOral and their corresponding data sizes.
Data statistics distribution and human evaluation results.
MMOral-Bench
We construct MMOral-Bench by curating 500 closed-ended and 600 open-ended QA pairs over 100 images through extensive manual selection and validation. To ensure image quality, we select images from the dataset proposed by Hoang Viet Do because its acquisition process is clearer and more reliable. Moreover, we filter out QA pairs that cannot be answered from the image alone, and incorrect answers are identified and re-annotated. MMOral-Bench covers five clinically grounded dimensions (i.e., Teeth, Patho, HisT, Jaw, and SumRec) and thus comprehensively evaluates the ability of LVLMs to understand and interpret panoramic X-rays. Each QA pair is assigned to one or more diagnostic dimensions based on its clinical intent, enabling multi-dimensional analysis.
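Because each QA pair can belong to several diagnostic dimensions, a single question may contribute to more than one dimension score. The sketch below shows one way such per-dimension aggregation could work for closed-ended QA; the record fields (`dimensions`, `prediction`, `answer`) are assumptions for illustration, not the released benchmark format.

```python
from collections import defaultdict

def per_dimension_accuracy(records):
    """Aggregate closed-ended QA accuracy for each diagnostic dimension.

    A record tagged with multiple dimensions contributes to the score
    of every dimension it is tagged with.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        hit = int(r["prediction"] == r["answer"])
        for dim in r["dimensions"]:
            total[dim] += 1
            correct[dim] += hit
    return {dim: correct[dim] / total[dim] for dim in total}

# Hypothetical records using MMOral-Bench's five dimension labels.
records = [
    {"dimensions": ["Teeth"], "prediction": "B", "answer": "B"},
    {"dimensions": ["Patho", "SumRec"], "prediction": "A", "answer": "C"},
    {"dimensions": ["Teeth", "Jaw"], "prediction": "D", "answer": "D"},
]
print(per_dimension_accuracy(records))
```

Open-ended answers would need a judged score (e.g., model- or human-rated) in place of the exact-match check, but the per-dimension aggregation pattern stays the same.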
Performance comparison on both closed-ended and open-ended QA across multiple LVLMs.
@inproceedings{todo,
title={todo},
author={Xiang Yue and Yuansheng Ni and Kai Zhang and Tianyu Zheng and Ruoqi Liu and Ge Zhang and Samuel Stevens and Dongfu Jiang and Weiming Ren and Yuxuan Sun and Cong Wei and Botao Yu and Ruibin Yuan and Renliang Sun and Ming Yin and Boyuan Zheng and Zhenzhu Yang and Yibo Liu and Wenhao Huang and Huan Sun and Yu Su and Wenhu Chen},
booktitle={Proceedings of CVPR},
year={2024},
}