<?xml version="1.0" encoding="UTF-8"?>
<article xsi:noNamespaceSchemaLocation="http://jats.nlm.nih.gov/publishing/1.1/xsd/JATS-journalpublishing1-mathml3.xsd" dtd-version="1.1" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <front>
        <journal-meta>
            <journal-title-group>
                <journal-title>Contemporary Education and Teaching Research</journal-title>
            </journal-title-group>
            <issn media_type="print">2737-4203</issn>
            <issn media_type="electronic">2737-4335</issn>
            <publisher>
                <publisher-name>BONI FUTURE DIGITAL PUBLISHING CO.,LIMITED</publisher-name>
            </publisher>
            <url>https://ojs.bonfuturepress.com/index.php/CETR/article/view/1877</url>
            <volume>6</volume>
            <issue>8</issue>
            <year>2025</year>
            <published-time>2025-08-25</published-time>
            <title>Rapid Evolution of Large Language Models in Medical Education: Comparative Performance of ChatGPT-3.5, ChatGPT-5, and DeepSeek on Medical Microbiology MCQs</title>
            <author>Malik Sallam, Amal Irshaid, Johan Snygg, Rula Albadri, Mohammed Sallam</author>
            <abstract>Rapid advances in large language models (LLMs) warrant specialty-specific benchmarking to assess their educational potential and limitations. We evaluated three generative artificial intelligence (genAI) models, the newly released ChatGPT-5, DeepSeek-R1, and the earlier ChatGPT-3.5, on 80 multiple-choice questions (MCQs) from a medical microbiology course examination, weighted for midterm and final components. Items were classified according to the revised Bloom’s taxonomy. Model performance was compared with that of more than 150 Doctor of Dental Surgery students. Content quality was assessed independently by two consultants in clinical microbiology using the validated CLEAR tool, modified to assess the completeness, accuracy, and relevance of AI-generated content. The mean total scores were 80.5 for ChatGPT-3.5, 96.0 for ChatGPT-5, and 95.5 for DeepSeek-R1, versus a student mean of 86.21/100. ChatGPT-5 and DeepSeek-R1 significantly outperformed ChatGPT-3.5 in completeness and accuracy scores, with no differences between the two. ChatGPT-5 maintained high accuracy across lower- and higher-order cognitive domains of Bloom’s taxonomy, whereas DeepSeek-R1 showed a significant drop on higher-order items. For ChatGPT-3.5, items answered incorrectly had longer answer-choice word counts. CLEAR scores were significantly higher for correct versus incorrect responses in all models (p &lt; 0.001). This study showed that currently available LLMs such as ChatGPT-5 and DeepSeek-R1 can exceed average student performance in medical microbiology while providing high-quality explanations. Regular benchmarking is essential to ensure responsible integration of genAI into educational, pedagogical, and assessment tools.</abstract>
            <keywords>ChatGPT-5, artificial intelligence, large language models, medical education, medical microbiology, assessment</keywords>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.61360/BoniCETR252018770801</article-id>
        </article-meta>
    </front>
    <tbody>
        <back>
            <sec/>
        </back>
    </tbody>
</article>