Research Article

Validity and Reliability of Responses to Periodontology Questions by 4 Different Artificial Intelligence Chatbots as Public Information Sources

Year 2025, Volume: 28 Issue: 3, 390 - 396, 30.09.2025
https://doi.org/10.7126/cumudj.1673333

Abstract

Objectives: To assess the validity and reliability of the answers given by four publicly available artificial intelligence chatbots (ChatGPT-4o mini, DeepSeek, Copilot, and Gemini 1.5 Flash) to frequently asked questions in the field of periodontology.
Materials and Methods: Questions were selected by a periodontologist from those most frequently asked by patients. Each question was posed to each chatbot three times. The answers (n=240) were independently evaluated by two periodontologists on a five-point Likert scale (5=strongly agree; 4=agree; 3=neutral; 2=disagree; 1=strongly disagree). Scoring disagreements were resolved through evidence-based discussion. For the validity of the answers, the low threshold was defined as a score of ≥4 for all three answers, and the high threshold as a score of 5 for all three answers. Fisher's exact test was performed to compare the validity of the answers among the chatbots. Cronbach's alpha was computed to evaluate the consistency and reliability of the repeated answers for each chatbot.
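The scoring procedure described above can be sketched in code. The snippet below is a minimal illustration, not the study's analysis script, and all scores and counts in it are invented for demonstration: it computes the low-threshold validity criterion (all three repeated answers scored ≥4), Cronbach's alpha across the three repetitions, and a Fisher's exact test on a hypothetical 2x2 valid/invalid table for two chatbots.

```python
# Illustrative sketch of the analysis described in Materials and Methods.
# All data here are hypothetical, not the study's actual scores.
import numpy as np
from scipy.stats import fisher_exact

# Likert scores (1-5) for one chatbot: rows = questions, columns = 3 repeats
scores = np.array([
    [5, 5, 4],
    [4, 5, 5],
    [5, 5, 5],
    [3, 4, 4],
])

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# Low-threshold validity: all three repeated answers scored >= 4
low_valid = (scores >= 4).all(axis=1)

# Fisher's exact test on valid/invalid counts for two chatbots
# (counts invented for illustration, e.g. 20 questions per chatbot)
table = [[20, 0],   # chatbot A: valid, invalid
         [13, 7]]   # chatbot B: valid, invalid
odds_ratio, p_value = fisher_exact(table)
```

A result would be considered reliable when `cronbach_alpha` exceeds 0.7, and a between-chatbot difference significant when `p_value` is below 0.05, matching the thresholds used in the abstract.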
Results: All four chatbots answered every question. In the low-threshold validity test, ChatGPT scored 100%, DeepSeek and Copilot 95%, and Gemini 65%; Gemini differed significantly from the others (p<0.05). In the high-threshold validity test, ChatGPT scored 80% and DeepSeek 75%, while Copilot and Gemini were significantly lower at 5%. There was no significant difference between ChatGPT and DeepSeek (p>0.05), but both were significantly higher than Copilot and Gemini (p<0.05). All four chatbots reached an acceptable level of reliability (Cronbach's alpha >0.7).
Conclusion: ChatGPT and DeepSeek provided more reliable information on periodontology-related topics than Copilot and Gemini.

Ethical Statement

Ethics Approval and Consent to Participate: This study did not require ethics committee approval; no data were collected from participants, and no human or animal material was used.

References

  • 1. Bayrakdar İ, Ҫelik Ö, Orhan K, Bilgir E, Odabaş A, Aslan A. Success of artificial intelligence system in determining alveolar bone loss from dental panoramic radiography images. Cumhuriyet Dent J 2020;23(4):318-324.
  • 2. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436-444.
  • 3. Schmidhuber J. Deep learning in neural networks: an overview. Neural Netw 2015; 61:85-117.
  • 4. Ayers JW, Poliak A, Dredze M, Leas EC, Zhu Z, Kelley JB, Faix DJ, Goodman AM, Longhurst CA, Hogarth M, Smith DM. Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum. JAMA Intern Med 2023;183(6):589-596.
  • 5. Safi Z, Abd-Alrazaq A, Khalifa M, Househ M. Technical Aspects of Developing Chatbots for Medical Applications: Scoping Review. J Med Internet Res 2020;22(12):e19127.
  • 6. Burisch C, Bellary A, Breuckmann F, Ehlers J, Thal SC, Sellmann T, Gödde D. ChatGPT-4 Performance on German Continuing Medical Education-Friend or Foe (Trick or Treat)? Protocol for a Randomized Controlled Trial. JMIR Res Protoc 2025;14:e63887.
  • 7. Temsah A, Alhasan K, Altamimi I, Jamal A, Al-Eyadhy A, Malki KH, Temsah MH. DeepSeek in Healthcare: Revealing Opportunities and Steering Challenges of a New Open-Source Artificial Intelligence Frontier. Cureus 2025;17(2):e79221.
  • 8. Hancı V, Ergün B, Gül Ş, Uzun Ö, Erdemir İ, Hancı FB. Assessment of readability, reliability, and quality of ChatGPT®, BARD®, Gemini®, Copilot®, Perplexity® responses on palliative care. Medicine (Baltimore) 2024;103(33):e39305.
  • 9. Wang S, Wang Y, Jiang L, Chang Y, Zhang S, Zhao K, Chen L, Gao C. Assessing the clinical support capabilities of ChatGPT 4o and ChatGPT 4o mini in managing lumbar disc herniation. Eur J Med Res 2025;30(1):45.
  • 10. Mohammad-Rahimi H, Ourang SA, Pourhoseingholi MA, Dianat O, Dummer PMH, Nosrat A. Validity and reliability of artificial intelligence chatbots as public sources of information on endodontics. Int Endod J 2024;57(3):305-314.
  • 11. Johnson AJ, Singh TK, Gupta A, Sankar H, Gill I, Shalini M, Mohan N. Evaluation of validity and reliability of AI Chatbots as public sources of information on dental trauma. Dent Traumatol 2025;41(2):187-193.
  • 12. Bernard A, Langille M, Hughes S, Rose C, Leddin D, Veldhuyzen van Zanten S. A systematic review of patient inflammatory bowel disease information resources on the World Wide Web. Am J Gastroenterol 2007;102(9):2070-2077.
  • 13. Bland JM, Altman DG. Cronbach's alpha. BMJ 1997;314(7080):572.
  • 14. Walker HL, Ghani S, Kuemmerli C, Nebiker CA, Müller BP, Raptis DA, Staubli SM. Reliability of Medical Information Provided by ChatGPT: Assessment Against Clinical Guidelines and Patient Information Quality Instrument. J Med Internet Res 2023;25:e47479.
  • 15. Doshi RH, Amin K, Khosla P, Bajaj S, Chheang S, Forman H. Utilizing large language models to simplify radiology reports: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, Google Bard, and Microsoft Bing. medRxiv 2023:2023.06.04.23290786.
  • 16. Ayers JW, Zhu Z, Poliak A, Leas EC, Dredze M, Hogarth M, Smith DM. Evaluating Artificial Intelligence Responses to Public Health Questions. JAMA Netw Open 2023;6(6):e2317517.
  • 17. Kayaalp ME, Prill R, Sezgin EA, Cong T, Królikowska A, Hirschmann MT. DeepSeek versus ChatGPT: Multimodal artificial intelligence revolutionizing scientific discovery. From language editing to autonomous content generation-Redefining innovation in research and practice. Knee Surg Sports Traumatol Arthrosc 2025;33(5):1553-1556.
  • 18. Thelwall M. Is Google Gemini better than ChatGPT at evaluating research quality?. Journal of Data and Information Science 2025;10(2):1-5.
  • 19. Bhardwaz S, Kumar J. An extensive comparative analysis of chatbot technologies -ChatGPT, Google BARD and Microsoft Bing. In: 2023 2nd international conference on applied artificial intelligence and computing (ICAAIC). Salem, India: IEEE 2023:673-679.
  • 20. Kerbage A, Burke CA, Rouphael C. Artificial Intelligence Chatbots in Healthcare: Navigating Accuracy, Privacy, and Global Applicability. Clin Gastroenterol Hepatol 2024;22(10):2158-2159.
There are 20 citations in total.

Details

Primary Language English
Subjects Periodontics, Dental Public Health
Journal Section Research Article
Authors

Mahmure Ayşe Tayman 0000-0001-8924-6725

Submission Date April 10, 2025
Acceptance Date June 16, 2025
Publication Date September 30, 2025
Published in Issue Year 2025 Volume: 28 Issue: 3

Cite

EndNote Tayman MA (September 1, 2025) Validity and Reliability of Responses to Periodontology Questions by 4 Different Artificial Intelligence Chatbots as Public Information Sources. Cumhuriyet Dental Journal 28 3 390–396.

Cumhuriyet Dental Journal (Cumhuriyet Dent J, CDJ) is the official publication of Cumhuriyet University Faculty of Dentistry. CDJ is an international journal dedicated to the latest advancements in dentistry. Its aim is to provide a platform for scientists and academicians worldwide to promote, share, and discuss new issues and developments in the various fields of dentistry. The first issue of the Journal of Cumhuriyet University Faculty of Dentistry was published in 1998; in 2010, the journal's name was changed to Cumhuriyet Dental Journal. The journal's publication language is English.


CDJ accepts articles in English. Submitting a paper to CDJ is free of charge, and CDJ does not levy article processing charges.

Frequency: Four times a year (March, June, September, and December)

IMPORTANT NOTICE

All users of Cumhuriyet Dental Journal should visit their user home page via "https://dergipark.org.tr/tr/user" or "https://dergipark.org.tr/en/user" to complete any missing information shown in blue or yellow warnings and to update their e-mail addresses in the DergiPark system. Otherwise, e-mails from the journal may not be seen or may fall into the SPAM folder. Please fill in all missing parts in the relevant fields.

Please visit the journal's AUTHOR GUIDELINE to see the revised policies and submission rules in effect since 2020.