Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

Electrical and Computer Engineering

Marquette University

Electrical and Computer Engineering Faculty Research and Publications

Speech recognition

Articles 1 - 6 of 6

Full-Text Articles in Engineering

RNN Language Model With Word Clustering And Class-Based Output Layer, Yongzhe Shi, Wei-Qiang Zhang, Jia Liu, Michael T. Johnson Jul 2013

The recurrent neural network language model (RNNLM) has shown significant promise for statistical language modeling. In this work, a new class-based output layer method is introduced to further improve the RNNLM. In this method, word class information is incorporated into the output layer by using the Brown clustering algorithm to estimate a class-based language model. Experimental results show that the new output layer with word clustering not only noticeably improves convergence but also reduces perplexity and word error rate in large vocabulary continuous speech recognition.
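
As a rough illustration of what a class-based output layer computes, the sketch below uses the generic factorization P(word | history) = P(class | history) × P(word | class, history). It is not the authors' implementation: the toy vocabulary, the hand-written class assignment (standing in for Brown clusters), and the raw scores are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_factored_prob(word, word_to_class, class_scores, word_scores):
    """Return P(class | history) * P(word | class, history) for one word."""
    cls = word_to_class[word]

    # First softmax: distribution over word classes.
    classes = sorted(class_scores)
    class_probs = dict(zip(classes, softmax([class_scores[c] for c in classes])))

    # Second softmax: only over the words that share the target word's class.
    members = sorted(w for w, c in word_to_class.items() if c == cls)
    member_probs = dict(zip(members, softmax([word_scores[w] for w in members])))

    return class_probs[cls] * member_probs[word]


# Hypothetical toy data: two classes standing in for word clusters.
word_to_class = {"cat": 0, "dog": 0, "runs": 1, "sleeps": 1}
class_scores = {0: 1.0, 1: 0.0}
word_scores = {"cat": 0.5, "dog": 0.5, "runs": 2.0, "sleeps": 0.0}

probs = {
    w: class_factored_prob(w, word_to_class, class_scores, word_scores)
    for w in word_to_class
}
total_prob = sum(probs.values())
```

Because each word's probability needs one softmax over classes and one over the members of a single class, the normalization cost scales with the number of classes plus the class size rather than with the full vocabulary.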


Efficient Embedded Speech Recognition For Very Large Vocabulary Mandarin Car-Navigation Systems, Yanmin Qian, Jia Liu, Michael T. Johnson Aug 2009

Automatic speech recognition (ASR) for a very large vocabulary of isolated words is a difficult task on a resource-limited embedded device. This paper presents a novel fast decoding algorithm for a Mandarin speech recognition system which can simultaneously process hundreds of thousands of items and maintain high recognition accuracy. The proposed algorithm constructs a semi-tree search network based on Mandarin pronunciation rules, which avoids duplicate syllable matching and redundant memory use. Based on a two-stage fixed-width beam-search baseline system, the algorithm employs a variable beam-width pruning strategy and a frame-synchronous word-level pruning strategy to significantly reduce recognition time. This algorithm …


Minimum Mean-Squared Error Estimation Of Mel-Frequency Cepstral Coefficients Using A Novel Distortion Model, Kevin M. Indrebo, Richard J. Povinelli, Michael T. Johnson Oct 2008

In this paper, a new method for statistical estimation of Mel-frequency cepstral coefficients (MFCCs) in noisy speech signals is proposed. Previous research has shown that model-based feature domain enhancement of speech signals for use in robust speech recognition can improve recognition accuracy significantly. These methods, which typically work in the log spectral or cepstral domain, must face the high complexity of distortion models caused by the nonlinear interaction of speech and noise in these domains. In this paper, an additive cepstral distortion model (ACDM) is developed, and used with a minimum mean-squared error (MMSE) estimator for recovery of MFCC features …


Sub-Banded Reconstructed Phase Spaces For Speech Recognition, Kevin M. Indrebo, Richard J. Povinelli, Michael T. Johnson Jul 2006

A novel method combining filter banks and reconstructed phase spaces is proposed for the modeling and classification of speech. Reconstructed phase spaces, which are based on dynamical systems theory, have advantages over spectral-based analysis methods in that they can capture nonlinear or higher-order statistics. Recent work has shown that the natural measure of a reconstructed phase space can be used for modeling and classification of phonemes. In this work, sub-banding of speech, which has been examined for recognition of noise-corrupted speech, is studied in combination with phase space reconstruction. This sub-banding, which is motivated by empirical psychoacoustical studies, is shown …


Time-Domain Isolated Phoneme Classification Using Reconstructed Phase Spaces, Michael T. Johnson, Richard J. Povinelli, Andrew C. Lindgren, Jinjin Ye, Xiaolin Liu, Kevin M. Indrebo Jul 2005

This paper introduces a novel time-domain approach to modeling and classifying speech phoneme waveforms. The approach is based on statistical models of reconstructed phase spaces, which offer significant theoretical benefits as representations that are known to be topologically equivalent to the state dynamics of the underlying production system. The lag and dimension parameters of the reconstruction process for speech are examined in detail, comparing common estimation heuristics for these parameters with corresponding maximum likelihood recognition accuracy over the TIMIT data set. Overall accuracies are compared with a Mel-frequency cepstral baseline system across five different phonetic classes within TIMIT, and a …
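
For readers unfamiliar with the lag and dimension parameters mentioned above, the following is a minimal sketch of a standard time-delay embedding, the usual way a reconstructed phase space is built from a one-dimensional signal. The function name `delay_embed` and the forward-lag convention are assumptions for illustration; the paper's own estimation heuristics and statistical models are not reproduced here.

```python
def delay_embed(signal, dimension, lag):
    """Map a 1-D signal to points (x[n], x[n + lag], ..., x[n + (dimension - 1) * lag])."""
    if dimension < 1 or lag < 1:
        raise ValueError("dimension and lag must be positive integers")

    # Number of complete embedded points that fit inside the signal.
    n_points = len(signal) - (dimension - 1) * lag
    if n_points <= 0:
        raise ValueError("signal is too short for this dimension and lag")

    return [
        tuple(signal[start + k * lag] for k in range(dimension))
        for start in range(n_points)
    ]


# Toy signal: with dimension 3 and lag 2, each point spans 5 samples.
points = delay_embed([0, 1, 2, 3, 4, 5], dimension=3, lag=2)
```

Each embedded point is one row of the reconstructed trajectory; a classifier would then model the distribution of these points for each phoneme class.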


Capacity And Complexity Of HMM Duration Modeling Techniques, Michael T. Johnson May 2005

The ability of a standard hidden Markov model (HMM) or expanded state HMM (ESHMM) to accurately model duration distributions of phonemes is compared with specific duration-focused approaches such as semi-Markov models or variable transition probabilities. It is demonstrated that either a three-state ESHMM or a standard HMM with an increased number of states is capable of closely matching both Gamma distributions and duration distributions of phonemes from the TIMIT corpus, as measured by Bhattacharyya distance to the true distributions. Standard HMMs are easily implemented with off-the-shelf tools, whereas duration models require substantial algorithmic development and have higher computational costs when …
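
To make the comparison measure concrete, here is a small sketch under two textbook assumptions: a single HMM state with self-loop probability a has a geometric duration distribution, and the Bhattacharyya distance between two discrete distributions p and q is -ln Σ sqrt(p_i q_i). The helper names and the example values are hypothetical, not taken from the paper.

```python
import math


def geometric_duration_pmf(self_loop_prob, max_duration):
    """Duration pmf of one HMM state: P(d) = a**(d - 1) * (1 - a), d = 1..max_duration."""
    a = self_loop_prob
    return [a ** (d - 1) * (1.0 - a) for d in range(1, max_duration + 1)]


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -math.log(coefficient)


# Two single-state duration models with different self-loop probabilities.
short_state = geometric_duration_pmf(0.5, max_duration=60)
long_state = geometric_duration_pmf(0.9, max_duration=60)

same = bhattacharyya_distance([0.5, 0.5], [0.5, 0.5])
different = bhattacharyya_distance(short_state, long_state)
```

A distance near zero means the modeled duration distribution closely matches the reference; larger values indicate a poorer match.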