Croatian Text Summarizer (CROSUM)

Category: conference contribution (in proceedings)
Type: original scientific paper
Year: 2005
Parent publication: Proceedings of the 27th International Conference on Information Technology Interfaces (ITI)
Pages: pp. 651-657
Status: published

Abstract

The paper describes automatic summarization of scientific papers in the Croatian language. The goal of CROSUM is to generate extracts that have a high percentage of extract-worthiness and are about the same size as the author's abstract. This preliminary research shows that extracts generated using the lemmatized word-form dictionary do not differ substantially from extracts generated using the non-lemmatized word-form dictionary. The research led us to the conclusion that we should develop a technique for identifying cue phrases from a training corpus, or some linguistic technique, in order to improve text summarization for the Croatian language.

Keywords

Text summarizer; Croatian language; extract; inflected language; tf-idf words