Error Pattern Discovery in Spellchecking Using Multi-Class Confusion Matrix Analysis for the Croatian Language

Original scientific paper

Mladen Sokele

Publication type journal article

Type original scientific paper

Year 2024

Journal Computers (Basel)

Volume 13

Issue 2

Pages 39, 23

DOI 10.3390/computers13020039

EISSN 2073-431X

Status published

Abstract

This paper introduces a novel approach to the creation and application of confusion matrices for error pattern discovery in spellchecking for the Croatian language. The experimental dataset has been derived from a corpus of mistyped words and user corrections collected since 2008 using the Croatian spellchecker available at ispravi.me. The important role of confusion matrices in enhancing the precision of spellcheckers, particularly within the diverse linguistic context of the Croatian language, is investigated. Common causes of spelling errors, emphasizing the challenges posed by diacritic usage, have been identified and analyzed. This research contributes to the advancement of spellchecking technologies and provides a more comprehensive understanding of linguistic details, particularly in languages with diacritic-rich orthographies, like Croatian. The presented user-data-driven approach demonstrates the potential for custom spellchecking solutions, especially considering the ever-changing dynamics of language use in digital communication.
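The abstract does not spell out how the confusion matrices are built, so the following is only a minimal sketch of one plausible reading: counting character substitutions between a mistyped word and its user correction. The function name `substitution_confusions`, the restriction to equal-length word pairs, and the sample pairs are all assumptions made for illustration; none of them come from the paper or from the ispravi.me corpus.

```python
import unicodedata
from collections import Counter


def substitution_confusions(pairs):
    """Count (intended_char, typed_char) cells from (typed, corrected) word pairs.

    Assumption: only equal-length pairs are compared, position by position.
    Insertions and deletions would need an alignment step and are skipped here.
    """
    counts = Counter()
    for typed, corrected in pairs:
        # Normalize so that diacritic letters are single code points.
        typed = unicodedata.normalize("NFC", typed)
        corrected = unicodedata.normalize("NFC", corrected)
        if len(typed) != len(corrected):
            continue
        for typed_char, intended_char in zip(typed, corrected):
            counts[(intended_char, typed_char)] += 1
    return counts


# Hypothetical (typed, corrected) pairs, invented for this sketch.
sample_pairs = [
    ("cetiri", "četiri"),
    ("kuca", "kuća"),
    ("zivot", "život"),
    ("sto", "što"),
    ("kuča", "kuća"),
]

confusions = substitution_confusions(sample_pairs)

# Off-diagonal cells are the actual errors; diagonal cells are correct keystrokes.
errors = {cell: n for cell, n in confusions.items() if cell[0] != cell[1]}

for (intended_char, typed_char), n in sorted(errors.items()):
    print(f"intended {intended_char!r} typed as {typed_char!r}: {n}")
```

In this toy sample every off-diagonal cell involves a diacritic letter (č, ć, š, ž), which is the kind of pattern the abstract highlights. A real analysis would also have to handle insertions, deletions, and transpositions.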

Keywords

natural language processing; spellchecking; confusion matrix; Zipf–Mandelbrot law; spelling errors; language properties

Other publications

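Automatizacija dostupnosti virtualnog okruženja pomoću PowerCLI [Automating virtual environment availability using PowerCLI]

Usporedba performansi i implementacije prosljeđivanja grafičkih kartica na hyper-v i kvm hipervizorima [Comparison of the performance and implementation of graphics card passthrough on Hyper-V and KVM hypervisors]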

SMB Over QUIC: A Performance Evaluation