Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Electrical and Computer Engineering

2005

Speech recognition

Articles 1 - 2 of 2

Full-Text Articles in Computer Engineering

Time-Domain Isolated Phoneme Classification Using Reconstructed Phase Spaces, Michael T. Johnson, Richard J. Povinelli, Andrew C. Lindgren, Jinjin Ye, Xiaolin Liu, Kevin M. Indrebo Jul 2005

Electrical and Computer Engineering Faculty Research and Publications

This paper introduces a novel time-domain approach to modeling and classifying speech phoneme waveforms. The approach is based on statistical models of reconstructed phase spaces, which offer significant theoretical benefits as representations that are known to be topologically equivalent to the state dynamics of the underlying production system. The lag and dimension parameters of the reconstruction process for speech are examined in detail, comparing common estimation heuristics for these parameters with corresponding maximum likelihood recognition accuracy over the TIMIT data set. Overall accuracies are compared with a Mel-frequency cepstral baseline system across five different phonetic classes within TIMIT, and a …
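The reconstructed phase space mentioned in this abstract is conventionally built by time-delay embedding, where each point stacks lagged copies of the waveform. The following is a minimal sketch under that assumption, not the authors' implementation; the function name `delay_embed` and the example values of `dim` and `lag` are hypothetical and chosen only for illustration.

```python
def delay_embed(signal, dim, lag):
    """Time-delay embedding of a 1-D signal (illustrative sketch).

    Returns a list of points, where the point for sample index n is
    (x[n], x[n - lag], ..., x[n - (dim - 1) * lag]).
    `dim` is the embedding dimension and `lag` is the time lag in samples.
    """
    if dim < 1 or lag < 1:
        raise ValueError("dim and lag must be positive integers")
    # The first index for which all lagged samples exist.
    start = (dim - 1) * lag
    return [
        tuple(signal[n - k * lag] for k in range(dim))
        for n in range(start, len(signal))
    ]


# Toy input standing in for a sampled phoneme waveform.
points = delay_embed([0, 1, 2, 3, 4, 5, 6, 7], dim=3, lag=2)
for p in points:
    print(p)
```

A statistical model (for example, a mixture density) could then be fitted to the resulting point cloud per phoneme class; that modeling step is outside the scope of this sketch.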


Capacity And Complexity Of HMM Duration Modeling Techniques, Michael T. Johnson May 2005

Electrical and Computer Engineering Faculty Research and Publications

The ability of a standard hidden Markov model (HMM) or expanded state HMM (ESHMM) to accurately model duration distributions of phonemes is compared with specific duration-focused approaches such as semi-Markov models or variable transition probabilities. It is demonstrated that either a three-state ESHMM or a standard HMM with an increased number of states is capable of closely matching both Gamma distributions and duration distributions of phonemes from the TIMIT corpus, as measured by Bhattacharyya distance to the true distributions. Standard HMMs are easily implemented with off-the-shelf tools, whereas duration models require substantial algorithmic development and have higher computational costs when …
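To make the comparison in this abstract concrete, the sketch below computes the duration distribution implied by a simple left-to-right chain of HMM states and the Bhattacharyya distance between two discrete distributions. It rests on stated assumptions rather than the paper's setup: every state in the chain shares one self-loop probability and there are no skip transitions (which gives a negative binomial duration law), and the distributions are truncated at a maximum duration. The names `chain_duration_pmf` and `bhattacharyya_distance` and all parameter values are hypothetical.

```python
import math


def chain_duration_pmf(n_states, self_loop, max_d):
    """Duration pmf for d = 1..max_d of an n-state left-to-right chain.

    Assumes every state has the same self-loop probability and there are
    no skip transitions, so the total duration is negative binomial.
    With n_states == 1 this reduces to the geometric distribution of a
    single HMM state. The list is truncated at max_d, so it sums to
    slightly less than 1.
    """
    a = self_loop
    return [
        math.comb(d - 1, n_states - 1) * a ** (d - n_states) * (1 - a) ** n_states
        if d >= n_states
        else 0.0
        for d in range(1, max_d + 1)
    ]


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance -ln(sum_i sqrt(p_i * q_i)) for discrete pmfs."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return math.inf if bc == 0 else -math.log(bc)


# Compare a one-state and a three-state chain with the same mean duration
# (mean = n_states / (1 - self_loop), here 6 frames for both).
one_state = chain_duration_pmf(1, 1 - 1 / 6, max_d=200)
three_state = chain_duration_pmf(3, 0.5, max_d=200)
print(bhattacharyya_distance(one_state, three_state))
```

In an evaluation like the one described, the second argument would be an empirical duration histogram rather than another model; the model-to-model comparison here only illustrates the calculation.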