Tehničko veleučilište u Zagrebu · Zagreb

A Method for Estimating Variations in Speech Tempo from Recorded Speech

conference presentation abstract

Category conference contribution (in proceedings)
Type conference presentation abstract
Year 2019
Parent publication MIPRO 2019
Pages pp. 1277-1282
ISSN 1847-3946
EISSN 1847-3946
Status published

Abstract

In this paper we describe a method for measuring variations in speech tempo from speech samples recorded from Croatian news channels, where the text of what was spoken is available through subtitles. For speech recognition we use a feed-forward neural network trained on about 150 seconds of speech. To extract word boundaries, we created a speech-to-text aligner capable of finding an acceptable match between the text and the sequence of phonemes classified by the neural network. The aligner takes into consideration certain categories of phonemes for which the neural network has higher accuracy. Preliminary experiments show an average alignment error of about one to three phonemes, depending on the speaker, the content, and the recording quality.
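
The abstract does not give the aligner's algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes the classifier emits one phoneme label per fixed-length frame, and it aligns those labels to the phoneme sequence expected from the subtitle text with a simple monotonic dynamic program. The names `RELIABLE`, `frame_cost`, `align`, and `tempo`, the cost values, and the 10 ms frame length are all hypothetical; the higher mismatch penalty for "reliable" phonemes is one possible way to favour categories the network classifies more accurately.

```python
from typing import List, Tuple

# Hypothetical category the classifier is assumed to recognise more accurately.
RELIABLE = set("aeiou")

def frame_cost(observed: str, expected: str) -> float:
    """Cost of assigning a classified frame label to an expected phoneme."""
    if observed == expected:
        return 0.0
    # Assumption: a mismatch involving a reliable category is penalised more.
    if observed in RELIABLE or expected in RELIABLE:
        return 2.0
    return 1.0

def align(frames: List[str], phonemes: List[str]) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame_exclusive) for each expected phoneme."""
    n, m = len(frames), len(phonemes)
    if m == 0 or n < m:
        raise ValueError("need at least one frame per expected phoneme")
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    # stay[i][j] is True when frame i continues the phoneme of frame i - 1.
    stay = [[True] * m for _ in range(n)]
    cost[0][0] = frame_cost(frames[0], phonemes[0])
    for i in range(1, n):
        for j in range(min(i + 1, m)):
            same = cost[i - 1][j]
            prev = cost[i - 1][j - 1] if j > 0 else inf
            stay[i][j] = same <= prev
            cost[i][j] = min(same, prev) + frame_cost(frames[i], phonemes[j])
    # Backtrack from the last frame and last phoneme to recover boundaries.
    spans = []
    j, end = m - 1, n
    for i in range(n - 1, 0, -1):
        if not stay[i][j]:
            spans.append((i, end))
            end = i
            j -= 1
    spans.append((0, end))
    spans.reverse()
    return spans

def tempo(spans: List[Tuple[int, int]], frame_s: float = 0.01) -> float:
    """Phonemes per second over the aligned stretch (frame length assumed)."""
    duration = (spans[-1][1] - spans[0][0]) * frame_s
    return len(spans) / duration

frames = ["d", "d", "a", "a", "a", "m", "n", "n"]  # made-up classifier output
spans = align(frames, ["d", "a", "n"])
print(spans, tempo(spans))
```

In this toy input the misclassified frame `m` is absorbed by the neighbouring `n` rather than by the vowel, because a mismatch against a reliable category costs more. Computing `tempo` over successive word-level spans would give a local tempo curve whose changes reflect variations in speaking rate.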

Keywords

speech recognition, text-to-speech alignment, speech tempo, neural network