
Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis


1 Faculty of Dentistry, The University of Hong Kong
2 The Hong Kong University of Science and Technology (GZ)
3 National University of Singapore 4 CVTE 5 Sun Yat-sen University
6 Department of Diagnostic Radiology, The University of Hong Kong
7 Imaging and Interventional Radiology, Faculty of Medicine, The Chinese University of Hong Kong
8 School of Computer Science, Peking University

*Equal Contributors
†Corresponding authors: isjinghao@gmail.com, orionsfan@outlook.com, haotang@pku.edu.cn, hungkfg@hku.hk

Overview of MMOral. It consists of four sub-datasets: MMOral-Attribute, MMOral-Report, MMOral-VQA, and MMOral-Chat. 1) MMOral-Attribute covers 49 categories of anatomical structures within panoramic X-rays. 2) MMOral-Report consists of two types of textual descriptions: location captions and medical reports. 3) MMOral-VQA includes closed-ended and open-ended QA pairs spanning five diagnostic dimensions. 4) MMOral-Chat simulates dialogues between patients and radiology experts about the interpretation of panoramic X-rays.

🔔News

🚀[2025-05-15]: MMOral-Bench is now available for public evaluation! 😆 More datasets are coming soon!

Introduction

We introduce MMOral, the first large-scale multimodal instruction dataset and benchmark tailored for panoramic X-ray understanding. MMOral comprises 20,563 annotated panoramic X-rays paired with 1.3 million instruction-following instances, spanning multiple task formats including attribute extraction, report generation, visual question answering, and image-grounded dialogue. Complementing the dataset, MMOral-Bench offers a curated evaluation suite covering five key diagnostic dimensions: condition of teeth, pathological findings, historical treatments, jawbone observations, and clinical summary & recommendations. The benchmark consists of 100 images with 1,100 QA pairs, comprising 500 closed-ended and 600 open-ended questions. All cases in MMOral-Bench are manually selected and verified from MMOral to ensure quality and reliability. Together, MMOral and MMOral-Bench lay a critical foundation for advancing intelligent dentistry and enabling clinically meaningful multimodal AI.
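To make the task formats concrete, the sketch below shows one plausible shape for a single instruction-following instance. All field names and values here are illustrative assumptions, not the released MMOral schema.

```python
import json

# Hypothetical closed-ended VQA instance; the keys ("image", "task",
# "question", "options", "answer") are assumptions for illustration,
# not the actual MMOral release format.
instance = {
    "image": "panoramic_00001.png",
    "task": "vqa_closed",
    "question": "Is there evidence of a prior root canal treatment?",
    "options": ["Yes", "No"],
    "answer": "Yes",
}

# Instruction datasets of this kind are typically stored as JSON lines,
# one instance per line.
print(json.dumps(instance))
```

Report-generation and dialogue instances would follow the same pattern, with the question/answer fields replaced by a report string or a list of conversation turns.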

MMOral Curation

The curation pipeline of the MMOral dataset consists of four sequential steps, as outlined below:


MMOral-Report Visualization

MMOral Statistics

MMOral-Bench

Overview

We construct MMOral-Bench by curating 500 closed-ended and 600 open-ended QA pairs over 100 images through extensive manual selection and validation. To ensure image quality, we select images from the dataset proposed by Hoang Viet Do, whose acquisition process is clearer and more reliable. Moreover, we filter out QA pairs that cannot be answered from the image alone, and incorrect answers are identified and re-annotated. MMOral-Bench covers five clinically grounded dimensions (i.e., Teeth, Patho, HisT, Jaw, SumRec) and can thus comprehensively evaluate the ability of LVLMs to understand and interpret panoramic X-rays. Each QA pair is assigned to one or more diagnostic dimensions based on its clinical intent, enabling multi-dimensional analysis.
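Because a QA pair can belong to several diagnostic dimensions, one answer may contribute to multiple dimension-level scores. A minimal sketch of that aggregation, assuming each evaluated pair carries a list of dimension tags and a correctness flag (field names are hypothetical):

```python
from collections import defaultdict

# Hypothetical evaluation records: "dims" lists the diagnostic
# dimensions a QA pair is tagged with; "correct" is whether the
# model's answer was judged correct.
qa_results = [
    {"dims": ["Teeth"], "correct": True},
    {"dims": ["Patho", "SumRec"], "correct": False},
    {"dims": ["Jaw"], "correct": True},
    {"dims": ["HisT", "Teeth"], "correct": True},
]

def per_dimension_accuracy(results):
    """Accumulate accuracy separately for every dimension a pair is tagged with."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in results:
        for dim in r["dims"]:
            totals[dim] += 1
            hits[dim] += int(r["correct"])
    return {dim: hits[dim] / totals[dim] for dim in totals}

print(per_dimension_accuracy(qa_results))
```

With this tagging scheme, a pair spanning two dimensions (e.g., a pathology finding that also affects the clinical summary) is counted once in each, so per-dimension totals need not sum to the overall number of QA pairs.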


Performance comparison on both closed-ended and open-ended QA across multiple LVLMs.


Experiment Results

VQA Examples

BibTeX


          @inproceedings{todo,
            title={todo},
            author={todo},
            booktitle={todo},
            year={todo},
          }