Evaluation of Chat-GPT 5.1 for the Detection of Apical Lesions in Panoramic Radiography

Ezgi Uzun; Burak Kerem Apaydın; İsmail Ongun

doi:10.7126/cumudj.1881678

Abstract

Objective: This study aimed to evaluate the diagnostic performance of ChatGPT-5.1 in determining, from visual input, whether apical lesions are present on panoramic radiographs, and to analyze the results by jaw.

Materials and Methods: A total of 207 anonymized panoramic radiographs were analyzed retrospectively. In each radiograph, the region containing an apical lesion was recorded as “lesion-present,” and the contralateral jaw region without an apical lesion on the same radiograph was recorded as “lesion-absent.” Each lesion-present and lesion-absent region was treated as an independent unit of analysis. All evaluations were performed by ChatGPT-5.1 using standardized, anatomically restricted prompts that explicitly specified the jaw (maxilla/mandible), side (right/left), and anatomical region. Model outputs were classified as true positive, true negative, false positive, or false negative. Sensitivity, specificity, accuracy, and F1 score were calculated overall and for each jaw.

Results: The overall sensitivity, specificity, accuracy, and F1 score of ChatGPT-5.1 were 67.15%, 60.87%, 64.01%, and 65.11%, respectively. Tooth-level detection sensitivity was 67.6%. Mandibular performance was higher than maxillary performance (accuracy: 67.52% vs. 57.14%; tooth-level sensitivity: 69.89% vs. 63.04%).

Conclusion: ChatGPT-5.1 demonstrated moderate diagnostic performance in detecting apical lesions on panoramic radiographs. The findings indicate that the model is not suitable for use as a reliable standalone diagnostic tool.
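The four reported metrics follow directly from the true/false positive and negative counts. The sketch below shows the standard formulas in Python. The counts in it are not reported in the abstract; they are an assumption, back-calculated from the overall percentages on the premise that the 207 radiographs yield 207 lesion-present and 207 lesion-absent units. The names `ConfusionCounts` and `diagnostic_metrics` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Region-level classification counts (hypothetical container)."""
    tp: int  # lesion-present region, model reports a lesion
    fn: int  # lesion-present region, model reports no lesion
    tn: int  # lesion-absent region, model reports no lesion
    fp: int  # lesion-absent region, model reports a lesion


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return sensitivity, specificity, accuracy and F1 as fractions."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "accuracy": (c.tp + c.tn) / total,
        # F1 is the harmonic mean of precision and sensitivity,
        # which simplifies to 2TP / (2TP + FP + FN).
        "f1": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
    }


# ASSUMPTION: counts inferred from the reported overall percentages with
# 207 lesion-present and 207 lesion-absent units; not stated in the source.
overall = ConfusionCounts(tp=139, fn=68, tn=126, fp=81)
metrics = diagnostic_metrics(overall)

for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under these assumed counts, the computed values round to the reported 67.15%, 60.87%, 64.01%, and 65.11%, which suggests the overall figures are region-level metrics over 414 units. The jaw-specific figures could be checked the same way if the per-jaw counts were available.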

Supporting Institution

No financial support was received for this study.

Ethical Statement

This study was conducted in accordance with the ethical principles outlined in the Declaration of Helsinki and was approved by the Non-Interventional Clinical Research Ethics Committee of Pamukkale University Faculty of Medicine (E-60116787-020-797246).

Thanks

No acknowledgements.

Details

Primary Language

English

Subjects

Oral and Maxillofacial Radiology

Journal Section

Research Article

Authors

Ezgi Uzun
0000-0003-3198-8325
Türkiye

Burak Kerem Apaydın *
0000-0003-2621-4704
Türkiye

İsmail Ongun
0000-0003-1546-461X
Türkiye

Publication Date

March 27, 2026

Submission Date

February 4, 2026

Acceptance Date

February 23, 2026

Published in Issue

Year 2026 Volume: 29 Number: 1

DOI

https://doi.org/10.7126/cumudj.1881678

IZ

https://izlik.org/JA76KU77BK

Cite

Uzun E, Apaydın BK, Ongun İ (March 1, 2026). Evaluation of Chat-GPT 5.1 for the Detection of Apical Lesions in Panoramic Radiography. Cumhuriyet Dental Journal 29(1):167–173.