Bibliographic record · Lookup and access
Article

Lightweight String Similarity Approaches for Duplicate Detection in Academic Titles

Fahrudin Mukti Wibowo et al. · MMU Press · 2025

Supplementary material available

DOAJ (Directory of Open Access Journals)
The link points to associated material such as appendices, tables, data, or a companion page. It is not marked as a book or full text.

Abstract

This study addresses the critical challenge of detecting duplicate final year project (FYP) titles in academic institutions, where minor variations like reordering, synonyms, and paraphrasing often obscure plagiarism. We systematically evaluate four string similarity algorithms (Jaro-Winkler, Levenshtein Edit Distance, TF-IDF with Cosine Similarity, and Jaccard Similarity) using a synthetic dataset of 250 title pairs representing common duplication patterns. Our experiments reveal that character-based methods (Jaro-Winkler and Edit Distance) achieve perfect detection (F1-score = 1.0) for literal matches, including typographical variations and phrase reordering. At the same time, TF-IDF demonstrates strong semantic capability (F1-score = 0.95), albeit with some false positives. Jaccard Similarity performs poorly (Recall = 0.40) due to its inability to handle paraphrased content. The analysis of score distributions shows a clear separation between duplicates and non-duplicates for character-based approaches, compared to significant overlap in set-based methods. Based on these findings, we propose a practical two-stage screening framework: initial high-confidence filtering using Jaro-Winkler (threshold > 0.9) followed by semantic validation with TF-IDF (threshold > 0.8). This hybrid approach offers institutions an effective balance between accuracy and computational efficiency for title screening. This study contributes by demonstrating how existing string similarity techniques can be orchestrated into a lightweight, two-stage screening framework tailored for academic title duplication, balancing accuracy with deployment feasibility in institutional settings. Future work should explore multilingual extensions and validation with real-world title datasets to further enhance the practical applicability of these findings.
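
The two-stage screening described in the abstract could be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation. The function names (`normalize`, `jaro_winkler`, `tfidf_cosine`, `screen_pair`), the lowercasing step, the smoothed IDF weighting, the sample titles, and the rule that a pair is flagged as soon as either stage exceeds its threshold are all hypothetical. Only the two thresholds (0.9 and 0.8) are taken from the abstract.

```python
import math
from collections import Counter


def normalize(title):
    """Lowercase and collapse whitespace (an assumed preprocessing step)."""
    return " ".join(title.lower().split())


def jaro(s1, s2):
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        # Look for the first unused equal character inside the match window.
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq1 = [c for c, u in zip(s1, used1) if u]
    seq2 = [c for c, u in zip(s2, used2) if u]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, scale=0.1):
    """Jaro score boosted for a shared prefix of up to four characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * scale * (1 - score)


def tfidf_cosine(a, b, corpus):
    """Cosine similarity of TF-IDF vectors; IDF is fitted on corpus + the pair."""
    docs = [normalize(t).split() for t in list(corpus) + [a, b]]
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF: one common variant, assumed here (the abstract gives no formula).
    idf = {tok: math.log((1 + n) / (1 + c)) + 1 for tok, c in df.items()}
    u, v = ({tok: tf * idf[tok] for tok, tf in Counter(doc).items()}
            for doc in docs[-2:])
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def screen_pair(a, b, corpus, jw_threshold=0.9, tfidf_threshold=0.8):
    """Return (is_duplicate, stage). The way the stages combine is an assumption."""
    if jaro_winkler(normalize(a), normalize(b)) > jw_threshold:
        return True, "jaro-winkler"
    if tfidf_cosine(a, b, corpus) > tfidf_threshold:
        return True, "tf-idf"
    return False, "none"


# Hypothetical titles, invented for illustration only.
corpus = [
    "Smart Attendance System Using Face Recognition",
    "Blockchain Voting Platform for Student Elections",
    "IoT Based Greenhouse Monitoring",
]
print(screen_pair(corpus[0], "Smart Attendance System Using Face Recogniton", corpus))
print(screen_pair(corpus[0], corpus[1], corpus))
```

Running the cheap character-based check first means the TF-IDF vectors are only computed for pairs the first stage does not already flag, which is one plausible reading of the "lightweight" goal stated in the abstract.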

How to cite

APA 7

Wibowo, F. M., et al. (2025). Lightweight String Similarity Approaches for Duplicate Detection in Academic Titles. https://journals.mmupress.com/index.php/jiwe/article/view/2107

MLA

Wibowo, Fahrudin Mukti, et al. "Lightweight String Similarity Approaches for Duplicate Detection in Academic Titles." 2025. https://journals.mmupress.com/index.php/jiwe/article/view/2107.

Chicago

Wibowo, Fahrudin Mukti, et al. 2025. "Lightweight String Similarity Approaches for Duplicate Detection in Academic Titles." https://journals.mmupress.com/index.php/jiwe/article/view/2107.

Harvard

Wibowo, F. M. et al. 2025, Lightweight String Similarity Approaches for Duplicate Detection in Academic Titles, MMU Press, available at: https://journals.mmupress.com/index.php/jiwe/article/view/2107 [Accessed 24 Jun. 2026].

Resource details

Title
Lightweight String Similarity Approaches for Duplicate Detection in Academic Titles
Author / contributors
Fahrudin Mukti Wibowo et al.
Publisher
MMU Press
Publication year
2025
ISSN
2821-370X
Language
English
