Research Article

Validity and Reliability of Responses to Periodontology Questions by 4 Different Artificial Intelligence Chatbots as Public Information Sources

Volume: 28 Number: 3 September 30, 2025

Abstract

Objectives: To assess the validity and reliability of the answers given by the ChatGPT-4o mini, DeepSeek, Copilot, and Gemini 1.5 Flash chatbots to frequently asked questions in the field of periodontology. Materials and Methods: Questions were selected by a periodontologist from among the questions patients ask most frequently. Each question was posed to each chatbot three times. The answers (n=240) were independently evaluated by two periodontologists on a Likert scale (5=strongly agree; 4=agree; 3=neutral; 2=disagree; 1=strongly disagree). Disagreements in scoring were resolved through evidence-based discussion. In evaluating the validity of the answers, the low threshold was defined as a score of ≥4 for all three answers, and the high threshold as a score of 5 for all three answers. Fisher's exact test was performed to compare the validity of the answers among the chatbots. Cronbach's alpha was computed to evaluate the consistency and reliability of the repeated answers for each chatbot. Results: All four chatbots answered all of the questions. In the low-threshold validity test, ChatGPT scored 100%, DeepSeek and Copilot 95%, and Gemini 65%; Gemini differed significantly from the others (p<0.05). In the high-threshold validity test, ChatGPT scored 80% and DeepSeek 75%, while Copilot and Gemini were significantly lower at 5%. There was no significant difference between ChatGPT and DeepSeek (p>0.05), but both were significantly higher than Copilot and Gemini (p<0.05). All four chatbots reached an acceptable level of reliability (Cronbach's alpha >0.7). Conclusion: ChatGPT and DeepSeek provided more reliable information on periodontology-related topics than Copilot and Gemini.
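The threshold and reliability computations described in the abstract can be sketched as follows; the function names and any example scores are illustrative assumptions, not the study's data or code.

```python
# Minimal sketch of the scoring analysis described in the abstract:
# Cronbach's alpha over the three repeated answers per question, and the
# low/high validity thresholds applied to each question's three scores.
from statistics import variance


def cronbach_alpha(scores):
    """scores: one row per question, one column per repeated answer (Likert 1-5)."""
    k = len(scores[0])  # number of repetitions per question
    item_vars = sum(variance(col) for col in zip(*scores))
    total_var = variance([sum(row) for row in scores])  # variance of per-question totals
    return k / (k - 1) * (1 - item_vars / total_var)


def meets_threshold(answers, low=True):
    """Low threshold: all repetitions scored >=4; high threshold: all scored 5."""
    return all(s >= 4 for s in answers) if low else all(s == 5 for s in answers)
```

With one score matrix per chatbot, an alpha above 0.7 would mark its repeated answers as acceptably consistent, matching the criterion reported in the study.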

Keywords

Ethical Statement

Ethics Approval and Consent to Participate: This study does not require ethics committee approval. No data was collected from participants and no human or animal material was used.


Details

Primary Language

English

Subjects

Periodontics , Dental Public Health

Journal Section

Research Article

Publication Date

September 30, 2025

Submission Date

April 10, 2025

Acceptance Date

June 16, 2025

Published in Issue

Year 2025 Volume: 28 Number: 3

EndNote
Tayman MA (September 1, 2025) Validity and Reliability of Responses to Periodontology Questions by 4 Different Artificial Intelligence Chatbots as Public Information Sources. Cumhuriyet Dental Journal 28 3 390–396.

Cited By

Cumhuriyet Dental Journal (Cumhuriyet Dent J, CDJ) is the official publication of Cumhuriyet University Faculty of Dentistry. CDJ is an international journal dedicated to the latest advancements in dentistry. The aim of this journal is to provide a platform for scientists and academicians all over the world to promote, share, and discuss new issues and developments in different areas of dentistry. The first issue of the Journal of Cumhuriyet University Faculty of Dentistry was published in 1998. In 2010, the journal's name was changed to Cumhuriyet Dental Journal. The journal's publication language is English.


CDJ accepts articles in English. Submitting a paper to CDJ is free of charge, and CDJ does not have article processing charges.

Frequency: Four times a year (March, June, September, and December)

IMPORTANT NOTICE

All users of Cumhuriyet Dental Journal should visit their user home page through the "https://dergipark.org.tr/tr/user" or "https://dergipark.org.tr/en/user" links to complete the missing information shown in blue or yellow warnings and to update their e-mail addresses and other details in the DergiPark system. Otherwise, e-mails from the journal will not be seen or will fall into the SPAM folder. Please fill in all missing parts in the relevant fields.

Please visit the journal's AUTHOR GUIDELINE to see the revised policy and submission rules in effect since 2020.