Performance of ChatGPT-5.1 in the Diagnostic Evaluation of Apical Lesions on Panoramic Radiographs
Abstract
Objective: The aim of this study was to evaluate the diagnostic performance of ChatGPT-5.1 in determining the presence or absence of apical lesions on panoramic radiographs based on visual input, and to analyze the results separately for the maxilla and the mandible.
Materials and Methods: A total of 207 anonymized panoramic radiographs were retrospectively analyzed. On each radiograph, the region containing an apical lesion was recorded as “lesion-present,” and the contralateral jaw region without an apical lesion on the same radiograph was recorded as “lesion-absent.” Each lesion-present and lesion-absent region was treated as an independent unit of analysis. All evaluations were performed independently by ChatGPT-5.1 using standardized, anatomically restricted prompts that specified the jaw (maxilla/mandible), side (right/left), and anatomical region. Model outputs were classified as true positive, true negative, false positive, or false negative. Sensitivity, specificity, accuracy, and F1 score were calculated overall and for each jaw separately.
Results: The overall sensitivity, specificity, accuracy, and F1 score of ChatGPT-5.1 were 67.15%, 60.87%, 64.01%, and 65.11%, respectively. Tooth-level detection sensitivity was 67.6%. Performance was higher in the mandible than in the maxilla (accuracy: 67.52% vs. 57.14%; tooth-level sensitivity: 69.89% vs. 63.04%).
Conclusion: ChatGPT-5.1 demonstrated moderate diagnostic performance in detecting apical lesions on panoramic radiographs. These findings indicate that the model is not suitable for use as a reliable standalone diagnostic tool.
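To illustrate how the reported metrics follow from the four outcome categories, the Python sketch below labels each region as TP, TN, FP, or FN and computes sensitivity, specificity, accuracy, and F1 score, optionally restricted to one jaw. It is a minimal sketch using the standard confusion-matrix definitions, not the authors' analysis code; the `Region` record layout and the function names `classify`, `metrics`, and `summarize` are hypothetical. The counts in the usage example (TP = 139, FN = 68, TN = 126, FP = 81) are not reported in the abstract. They are back-calculated from the reported percentages under the assumption of 207 lesion-present and 207 lesion-absent regions.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record layout: (jaw, reference_has_lesion, model_says_lesion),
# where jaw is "maxilla" or "mandible".
Region = Tuple[str, bool, bool]


def classify(reference: bool, predicted: bool) -> str:
    """Label one region by comparing the reference standard with the model output."""
    if reference and predicted:
        return "TP"
    if reference and not predicted:
        return "FN"
    if not reference and predicted:
        return "FP"
    return "TN"


def metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Return sensitivity, specificity, accuracy, and F1 score as percentages."""
    sensitivity = tp / (tp + fn)               # lesion-present regions detected
    specificity = tn / (tn + fp)               # lesion-absent regions correctly cleared
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)           # harmonic mean of precision and sensitivity
    values = {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }
    return {name: round(100 * value, 2) for name, value in values.items()}


def summarize(regions: Iterable[Region], jaw: Optional[str] = None) -> Dict[str, float]:
    """Count outcomes, optionally for a single jaw, and compute the metrics."""
    counts = Counter(
        classify(reference, predicted)
        for region_jaw, reference, predicted in regions
        if jaw is None or region_jaw == jaw
    )
    return metrics(counts["TP"], counts["TN"], counts["FP"], counts["FN"])


if __name__ == "__main__":
    # Back-calculated counts (an assumption, see lead-in).
    print(metrics(139, 126, 81, 68))
```

With these back-calculated counts, `metrics(139, 126, 81, 68)` returns 67.15, 60.87, 64.01, and 65.11, matching the reported overall values. Because each metric divides by a class total, `summarize` raises `ZeroDivisionError` for a jaw subset that contains no lesion-present regions or no lesion-absent regions.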
Details
Primary Language: English
Subjects: Oral and Maxillofacial Radiology
Journal Section: Research Article
Publication Date: March 27, 2026
Submission Date: February 4, 2026
Acceptance Date: February 23, 2026
Published in Issue: Year 2026, Volume 29, Number 1