Segmentation of Speech and Humming in Vocal Input

Loading...
Thumbnail Image
Date
2012-09
ORCID
Advisor
Referee
Mark
Journal Title
Journal ISSN
Volume Title
Publisher
Společnost pro radioelektronické inženýrství
Abstract
Non-verbal vocal interaction (NVVI) is an interaction method in which sounds other than speech produced by a human are used, such as humming. NVVI complements traditional speech recognition systems with continuous control. In order to combine the two approaches (e.g. "volume up, mmm") it is necessary to perform a speech/NVVI segmentation of the input sound signal. This paper presents two novel methods of speech and humming segmentation. The first method is based on classification of MFCC and RMS parameters using a neural network (MFCC method), while the other method computes volume changes in the signal (IAC method). The two methods are compared using a corpus collected from 13 speakers. The results indicate that the MFCC method outperforms IAC in terms of accuracy, precision, and recall.
Description
Citation
Radioengineering. 2012, vol. 21, č. 3, s. 923-929. ISSN 1210-2512
http://www.radioeng.cz/fulltexts/2012/12_03_0923_0929.pdf
Document type
Peer-reviewed
Document version
Published version
Date of access to the full text
Language of document
en
Study field
Comittee
Date of acceptance
Defence
Result of defence
Document licence
Creative Commons Attribution 3.0 Unported License
http://creativecommons.org/licenses/by/3.0/
DOI
Collections
Citace PRO