Comparative Performance of State-of-the-Art LLMs on the KDLE: A 2025 Benchmark Study

Journal article

Taejun Kim et al. · Elsevier · 2026

Supplementary material available

Author / responsible party

Taejun Kim et al.

Publisher

Elsevier

Year

2026

ISSN

0020-6539

Language

English

Abstract

Introduction and aims: To evaluate the diagnostic and reasoning capabilities of 4 state-of-the-art large language models (LLMs) on the Korean Dental Licensing Examination (KDLE) and to assess their potential as educational tools in dentistry.

Methods: Four LLMs (ChatGPT-4o, Claude-4 Opus, Gemini 2.5 Pro, and DeepSeek-V3) were evaluated using official KDLE question sets from 2024 and 2025 (n = 642 questions total). The primary endpoint was overall accuracy across all items, with modality-level and subject-wise analyses conducted as secondary and exploratory assessments. Questions covered 13 dental subjects and included both text-only and image-based items. Performance was analyzed using Cochran's Q test for overall comparisons, McNemar's test for pairwise contrasts, and Cohen's kappa for inter-model agreement. Statistical significance was set at p < .05.

Results: All LLMs exceeded the passing threshold of 180 points. ChatGPT-4o (mean score: 251.5), Claude-4 Opus (mean score: 256.5), and Gemini 2.5 Pro (mean score: 270.5) achieved performance approaching or exceeding that of student examinees, while DeepSeek-V3 underperformed (mean score: 218.5) despite passing. Overall performance differed significantly among models (Q = 116.40, p < .001); all pairwise differences were significant except that between ChatGPT-4o and Claude-4 Opus (p > .05). All models performed better on text-only than on image-based questions. LLMs consistently outperformed students in Oral Biology but underperformed in Oral and Maxillofacial Radiology. Cohen's kappa revealed substantial inter-model agreement (κ = 0.631 to 0.778).

Conclusion: Contemporary LLMs demonstrate competent performance on standardized dental licensing examinations, with 3 models achieving near-human competency. However, persistent limitations in visual interpretation and clinical reasoning suggest their role should remain supplementary to human expertise in dental education and practice.

Clinical Relevance: While LLMs show promise as educational tools for exam preparation and knowledge reinforcement, their limitations in visual interpretation and integrative clinical reasoning necessitate continued human oversight in clinical decision-making contexts.
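
Illustrative note: the three statistics named in the Methods can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' analysis code. The per-item correctness vectors (`model_a`, `model_b`, `model_c`) are hypothetical toy data, the McNemar statistic is computed without continuity correction (the abstract does not say which variant was used), and kappa is computed on correct/incorrect outcomes, which may differ from how the study defined agreement.

```python
import math

# Hypothetical per-item results (1 = correct, 0 = incorrect); NOT study data.
model_a = [1, 1, 1, 1, 1, 1, 0, 0]
model_b = [1, 1, 1, 1, 0, 0, 0, 0]
model_c = [1, 1, 0, 0, 0, 0, 0, 1]


def cochran_q(results):
    """Cochran's Q for k related binary samples (one 0/1 list per model)."""
    k = len(results)
    n_items = len(results[0])
    col_totals = [sum(model) for model in results]  # correct answers per model
    row_totals = [sum(model[i] for model in results) for i in range(n_items)]
    grand = sum(col_totals)
    denom = k * grand - sum(r * r for r in row_totals)
    if denom == 0:
        # Every item was answered identically by all models; Q is undefined.
        return 0.0
    return (k - 1) * (k * sum(c * c for c in col_totals) - grand ** 2) / denom


def mcnemar_chi2(a, b):
    """McNemar chi-square (no continuity correction) and its p-value, df = 1."""
    only_a = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    only_b = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    if only_a + only_b == 0:
        return 0.0, 1.0  # no discordant pairs
    stat = (only_a - only_b) ** 2 / (only_a + only_b)
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    labels = set(a) | set(b)
    expected = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if expected == 1:
        return 1.0  # both raters used a single identical label
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    print("Cochran's Q:", cochran_q([model_a, model_b, model_c]))
    print("McNemar (A vs B):", mcnemar_chi2(model_a, model_b))
    print("Cohen's kappa (A vs B):", cohen_kappa(model_a, model_b))
```

Cochran's Q is referred to a chi-square distribution with k - 1 degrees of freedom, where k is the number of models (3 for the 4 models in the study). With many discordant pairs the uncorrected and corrected McNemar statistics give similar results, but for small counts an exact binomial version is usually preferred.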

How to cite

APA 7

Kim, T., et al. (2026). Comparative Performance of State-of-the-Art LLMs on the KDLE: A 2025 Benchmark Study. https://doi.org/10.1016/j.identj.2026.109466

MLA

Kim, Taejun, et al. "Comparative Performance of State-of-the-Art LLMs on the KDLE: A 2025 Benchmark Study." 2026. https://doi.org/10.1016/j.identj.2026.109466.

Chicago

Kim, Taejun, et al. 2026. "Comparative Performance of State-of-the-Art LLMs on the KDLE: A 2025 Benchmark Study." https://doi.org/10.1016/j.identj.2026.109466.

Harvard

Kim, T. et al. 2026, Comparative Performance of State-of-the-Art LLMs on the KDLE: A 2025 Benchmark Study, Elsevier, available at: https://doi.org/10.1016/j.identj.2026.109466 [Accessed 29 Jun. 2026].

Subjects

Large language models; Artificial intelligence; Dentistry; Examination questions
