Open Access. Powered by Scholars. Published by Universities.®

Electrical and Computer Engineering Commons

Articles 1 - 8 of 8

Full-Text Articles in Electrical and Computer Engineering

Speech Enhancement Using Bayesian Estimators Of The Perceptually-Motivated Short-Time Spectral Amplitude (STSA) With Chi Speech Priors, Marek B. Trawicki, Michael T. Johnson Feb 2014

Electrical and Computer Engineering Faculty Research and Publications

In this paper, the authors propose new perceptually-motivated Weighted Euclidean (WE) and Weighted Cosh (WCOSH) estimators that utilize more appropriate Chi statistical models for the speech prior with Gaussian statistical models for the noise likelihood. Whereas the perceptually-motivated WE and WCOSH cost functions emphasized spectral valleys rather than spectral peaks (formants) and indirectly accounted for auditory masking effects, the incorporation of the Chi distribution statistical models demonstrated distinct improvement over the Rayleigh statistical models for the speech prior. The estimators incorporate both weighting law and shape parameters on the cost functions and distributions. Performance is evaluated in terms of the …


Distributed Multichannel Speech Enhancement Based On Perceptually-Motivated Bayesian Estimators Of The Spectral Amplitude, Marek B. Trawicki, Michael T. Johnson Jun 2013

Electrical and Computer Engineering Faculty Research and Publications

In this study, the authors propose multichannel weighted Euclidean (WE) and weighted cosh (WCOSH) cost function estimators for speech enhancement in the distributed microphone scenario. The goal of the work is to illustrate the advantages of utilising additional microphones and modified cost functions for improving signal-to-noise ratio (SNR) and segmental SNR (SSNR) along with log-likelihood ratio (LLR) and perceptual evaluation of speech quality (PESQ) objective metrics over the corresponding single-channel baseline estimators. As with their single-channel counterparts, the perceptually-motivated multichannel WE and WCOSH estimators are functions of a weighting law parameter, which influences attention of the noisy spectral amplitude through …


Distributed Multichannel Speech Enhancement With Minimum Mean-Square Error Short-Time Spectral Amplitude, Log-Spectral Amplitude, And Spectral Phase Estimation, Marek B. Trawicki, Michael T. Johnson Feb 2012

Electrical and Computer Engineering Faculty Research and Publications

In this paper, the authors present optimal multichannel frequency domain estimators for minimum mean-square error (MMSE) short-time spectral amplitude (STSA), log-spectral amplitude (LSA), and spectral phase estimation in a widely distributed microphone configuration. The estimators utilize Rayleigh and Gaussian statistical models for the speech prior and noise likelihood with a diffuse noise field for the surrounding environment. Based on the Signal-to-Noise Ratio (SNR) and Segmental Signal-to-Noise Ratio (SSNR) along with the Log-Likelihood Ratio (LLR) and Perceptual Evaluation of Speech Quality (PESQ) as objective metrics, the multichannel LSA estimator decreases background noise and speech distortion and increases speech quality compared to …


Optimal Distributed Microphone Phase Estimation, Marek B. Trawicki, Michael T. Johnson Apr 2009

Dr. Dolittle Project: A Framework for Classification and Understanding of Animal Vocalizations

This paper presents a minimum mean-square error spectral phase estimator for speech enhancement in the distributed multiple microphone scenario. The estimator uses Gaussian models for both the speech and noise priors under the assumption of a diffuse incoherent noise field representing ambient noise in a widely dispersed microphone configuration. Experiments demonstrate significant benefits of using the optimal multichannel phase estimator as compared to the noisy phase of a reference channel.


Auditory Coding Based Speech Enhancement, Yao Ren, Michael T. Johnson Apr 2009

Dr. Dolittle Project: A Framework for Classification and Understanding of Animal Vocalizations

This paper demonstrates a speech enhancement system based on an efficient auditory coding approach, coding of time-relative structure using spikes. The spike coding method can more compactly represent the non-stationary characteristics of speech signals than the Fourier transform or wavelet transform. Enhancement is accomplished through the use of MMSE thresholding on the spike code. Experimental results show that, compared with the spectral domain logSTSA filter, the proposed approach is better at suppressing noise in high noise situations, with fewer musical artifacts, in both subjective spectrogram evaluation and objective SSNR improvement.


Minimum Mean-Squared Error Estimation Of Mel-Frequency Cepstral Coefficients Using A Novel Distortion Model, Kevin M. Indrebo, Richard J. Povinelli, Michael T. Johnson Oct 2008

Electrical and Computer Engineering Faculty Research and Publications

In this paper, a new method for statistical estimation of Mel-frequency cepstral coefficients (MFCCs) in noisy speech signals is proposed. Previous research has shown that model-based feature domain enhancement of speech signals for use in robust speech recognition can improve recognition accuracy significantly. These methods, which typically work in the log spectral or cepstral domain, must face the high complexity of distortion models caused by the nonlinear interaction of speech and noise in these domains. In this paper, an additive cepstral distortion model (ACDM) is developed, and used with a minimum mean-squared error (MMSE) estimator for recovery of MFCC features …


An Improved SNR Estimator For Speech Enhancement, Yao Ren, Michael T. Johnson Mar 2008

Dr. Dolittle Project: A Framework for Classification and Understanding of Animal Vocalizations

In this paper, we propose an MMSE a priori SNR estimator for speech enhancement. This estimator has similar benefits to the well-known decision-directed approach, but does not require an ad-hoc weighting factor to balance the past a priori SNR and current ML SNR estimate with smoothing across frames. Performance is evaluated in terms of estimation error and segmental SNR using the standard logSTSA speech enhancement method. Experimental results show that, in contrast with the decision-directed estimator and ML estimator, the proposed SNR estimator can help enhancement algorithms preserve more weak speech information and efficiently suppress musical noise.
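For context, the decision-directed baseline mentioned in this abstract can be sketched as follows. This is a minimal illustration of the well-known decision-directed recursion, not the MMSE estimator the paper proposes. The function name, the constant noise variance, the Wiener gain used in place of a logSTSA gain, the first-frame initialization, and the `alpha` and `xi_min` defaults are all assumptions made for the example.

```python
def decision_directed_snr(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Decision-directed a priori SNR estimate for one frequency bin.

    noisy_power: list of |Y(l)|^2 values, one per frame (hypothetical input)
    noise_power: noise variance, assumed known and constant for simplicity
    alpha:       the ad-hoc weighting factor the abstract refers to
    xi_min:      lower bound on the estimate (assumed value)
    """
    xi_estimates = []
    prev_clean_power = None
    for y2 in noisy_power:
        gamma = y2 / noise_power            # a posteriori SNR
        ml_term = max(gamma - 1.0, 0.0)     # ML estimate of the a priori SNR
        if prev_clean_power is None:
            xi = ml_term                    # first frame: no history (assumption)
        else:
            # Weighted sum of the past clean-speech estimate and the current ML term
            xi = alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_term
        xi = max(xi, xi_min)
        gain = xi / (1.0 + xi)              # Wiener gain as a simple stand-in
        prev_clean_power = (gain ** 2) * y2 # estimated clean power for next frame
        xi_estimates.append(xi)
    return xi_estimates
```

A larger `alpha` smooths the estimate across frames and reduces musical noise, at the cost of slower tracking of speech onsets. That trade-off is what the fixed weighting factor controls.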


Speech Signal Enhancement Through Adaptive Wavelet Thresholding, Michael T. Johnson, Xiaolong Yuan, Yao Ren Feb 2007

Electrical and Computer Engineering Faculty Research and Publications

This paper demonstrates the application of the Bionic Wavelet Transform (BWT), an adaptive wavelet transform derived from a non-linear auditory model of the cochlea, to the task of speech signal enhancement. Results, measured objectively by Signal-to-Noise Ratio (SNR) and Segmental SNR (SSNR) and subjectively by Mean Opinion Score (MOS), are given for additive white Gaussian noise as well as four different types of realistic noise environments. Enhancement is accomplished through the use of thresholding on the adapted BWT coefficients, and the results are compared to a variety of speech enhancement techniques, including Ephraim-Malah filtering, iterative Wiener filtering, and spectral …
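
Several abstracts in this listing report Segmental SNR (SSNR) as an objective metric. A minimal sketch of how such a metric is commonly computed is shown below. The non-overlapping frames, the frame length, the clamping range of -10 to 35 dB, and the skipping of silent frames are assumptions based on common practice, not details taken from these papers.

```python
import math

def segmental_snr(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Average per-frame SNR in dB between a clean and an enhanced signal.

    clean, enhanced: equal-length lists of samples (hypothetical inputs)
    frame_len:       samples per non-overlapping frame (assumed value)
    lo, hi:          clamping limits in dB for each frame (assumed values)
    """
    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in s)
        error_energy = sum((x - y) ** 2 for x, y in zip(s, e))
        if signal_energy == 0.0:
            continue                        # skip silent frames
        if error_energy == 0.0:
            snr_db = hi                     # perfect reconstruction: use upper limit
        else:
            snr_db = 10.0 * math.log10(signal_energy / error_energy)
        frame_snrs.append(min(max(snr_db, lo), hi))
    if not frame_snrs:
        raise ValueError("no usable frames")
    return sum(frame_snrs) / len(frame_snrs)
```

Averaging per-frame values in dB, rather than computing one global ratio, keeps loud segments from dominating the score, so errors in quiet speech regions still count.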