Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons



Full-Text Articles in Engineering

Engineering: Beyond Ears In Pre-College Years, Uchechukwu O. Ofoegbu, Ananth N. Iyer, John Helferty, Joseph Fischgrund Jun 2007


Ananth N Iyer

A 12-week program was developed in which electrical engineering concepts, in the form of robotics projects, are taught to students at a secondary educational institution for the deaf and hearing impaired. The robotics course was originally designed for, and has been taught for about a decade to, freshmen at the Temple University College of Engineering. The objectives of this project range from eliminating existing boundaries of engineering education to increasing the anticipation of success amongst the physically impaired. A prior breakthrough in the extension of engineering education beyond assumed “limits” was achieved when a young man who was both sight and …


Speaker Recognition In Adverse Conditions, Ananth N. Iyer, Uchechukwu O. Ofoegbu, Robert E. Yantorno, Stanley J. Wenndt Mar 2007


Ananth N Iyer

Recognizing speakers from their voices is a challenging area of research with several practical applications. Presently, speaker verification (SV) systems achieve a high level of accuracy under ideal conditions, such as when there is ample data to build speaker models and when speaker verification is performed in the presence of little or no interference. In general, these systems assume that the features extracted from the data follow a particular parametric probability density function (pdf), i.e., Gaussian or a mixture of Gaussians, where a form of the pdf is imposed on the speech data rather than determining the underlying structure of …


Unsupervised Indexing Of Noisy Conversations With Short Speaker Utterances, Uchechukwu O. Ofoegbu, Ananth N. Iyer, Robert E. Yantorno, Stanley J. Wenndt Mar 2007


Ananth N Iyer

Two speaker indexing systems for conversations are presented in this paper. The first method involves indexing two-speaker conversations. In this method, two reference models are judiciously chosen from the conversation such that they represent the two different speakers. Models are then matched to the reference speakers using distance-based comparisons. The second technique is based on first determining the number of participants in the conversation using a speaker count method termed the “Residual Ratio Algorithm” (RRA), and then indexing based on this count. The RRA involves an elimination process in which speech segments matching a chosen set of reference models are …
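
The first method in this abstract (two reference models, then distance-based matching) can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the references are chosen or which distance is used, so this example assumes each segment is summarized by one feature vector, picks the two most distant segments as references, and uses Euclidean distance. The function name `index_two_speakers` is hypothetical.

```python
from itertools import combinations
from math import dist


def index_two_speakers(segments):
    """Label each segment 0 or 1 by its nearest reference segment.

    segments: list of equal-length feature vectors, one per speech segment.
    The reference pair is the two most distant segments. That rule is an
    assumption made for this sketch, not the paper's selection rule.
    """
    if len(segments) < 2:
        raise ValueError("need at least two segments")
    # Hypothetical reference choice: the farthest-apart pair of segments.
    ref_a, ref_b = max(
        combinations(range(len(segments)), 2),
        key=lambda pair: dist(segments[pair[0]], segments[pair[1]]),
    )
    labels = []
    for seg in segments:
        d_a = dist(seg, segments[ref_a])
        d_b = dist(seg, segments[ref_b])
        # Distance-based comparison: assign to the closer reference.
        labels.append(0 if d_a <= d_b else 1)
    return labels


# Invented 2-D feature vectors for five segments of one conversation.
segments = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9), (0.0, 0.2)]
labels = index_two_speakers(segments)
```

A real system would compare statistical models of each segment rather than single vectors; the sketch only shows the reference-then-match structure.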


Blind Speaker Clustering, Ananth N. Iyer, Uchechukwu O. Ofoegbu, Robert E. Yantorno, Brett Y. Smolenski Dec 2006


Ananth N Iyer

A novel approach to performing speaker clustering in telephone conversations is presented in this paper. The method is based on a simple observation that the distance between populations of feature vectors extracted from different speakers is greater than a preset threshold. This observation is incorporated into the clustering problem by the formulation of a constrained optimization problem. A modified c-means algorithm is designed to solve the optimization problem. Another key aspect in speaker clustering is to determine the number of clusters, which is either assumed or expected as an input in traditional methods. The proposed method does not require such …


Generic Modeling Applied To Speaker Count, Ananth N. Iyer, Uchechukwu O. Ofoegbu, Robert E. Yantorno, Brett Y. Smolenski Dec 2006


Ananth N Iyer

The problem of determining the number of speakers participating in a conversation and building their models in short conversations, within an unknown group of speakers, is addressed in this paper. The lack of information about the number of speakers and the unavailability of sufficient data present a challenging task of efficiently estimating the speaker model parameters. The proposed method uses a novel generic speaker identification (GSID) system as a guide in the model building process. The GSID system is designed to perform speaker identification where the speaker associated with the test data may not be enrolled. The models in the GSID …


Detection Of A Third Speaker In Telephone Conversations, Uchechukwu O. Ofoegbu, Ananth N. Iyer, Robert E. Yantorno, Stanley J. Wenndt Sep 2006


Ananth N Iyer

Differentiating speakers participating in telephone conversations is a challenging task in speech processing because only short consecutive utterances can be examined for each speaker. Research has shown that, given only brief utterances (1 second or less), humans can recognize speakers with an accuracy of about 54% on average. The task becomes even more challenging when no information about the speakers is known a priori. In this paper, a technique for determining whether there are two or three speakers participating in a telephone conversation is presented. This approach assumes no knowledge or information about any of the participating speakers. The technique …


A Novel Approach To Automated Source Separation In Multispeaker Environment, Robert M. Nickel, Ananth N. Iyer May 2006


Ananth N Iyer

We are proposing a new approach to the solution of the cocktail party problem (CPP). The goal of the CPP is to isolate the speech signals of individuals who are concurrently talking while being recorded with a properly positioned microphone array. The new approach provides a powerful yet simple alternative to commonly used methods for the separation of speakers. It is based on the observation that the estimation of the signal transfer matrix between speakers and microphones is significantly simplified if one can assure that during certain periods of the conversation only one speaker is active while all other speakers …
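
The observation about single-speaker periods can be made concrete under a simplifying assumption that the abstract does not state: instantaneous (non-convolutive) mixing, where the microphone vector is x(t) = a · s(t) while only one speaker is active. In that case every microphone signal is a scaled copy of the first, and a least-squares ratio recovers that speaker's column of the transfer matrix up to scale. The function name `estimate_mixing_column` is hypothetical, and this is a sketch of the idea rather than the authors' method.

```python
def estimate_mixing_column(frames):
    """Estimate one column of an instantaneous mixing matrix, up to scale.

    frames: list of per-sample microphone vectors recorded while a single
    speaker is active. Under the assumed model x(t) = a * s(t), the
    least-squares ratio of each microphone to microphone 0 gives a / a[0].
    """
    ref = [frame[0] for frame in frames]
    energy = sum(r * r for r in ref)
    if energy == 0:
        raise ValueError("reference microphone is silent")
    n_mics = len(frames[0])
    return [
        sum(frame[m] * r for frame, r in zip(frames, ref)) / energy
        for m in range(n_mics)
    ]


# Invented example: three microphones, one active speaker.
true_column = [2.0, 1.0, -0.5]
source = [1.0, -2.0, 0.5, 3.0]
frames = [[a * s for a in true_column] for s in source]
column = estimate_mixing_column(frames)  # proportional to true_column
```

Repeating the estimate over periods in which each speaker talks alone would fill in the matrix column by column; detecting those periods is the hard part and is not covered here.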


Emotion Detection From Infant Facial Expressions And Cries, Pritam Pal, Ananth N. Iyer, Robert E. Yantorno May 2006


Ananth N Iyer

A new system for interpreting infant cries from the infant's facial image and cry sounds is presented in this paper. The system is designed to analyze the facial image and sound of the crying infant to derive the reason why the infant is crying. The image and the sound represent the same cry event. The image processing module determines the state of certain facial features, certain combinations of which determine the reason for crying. The sound processing module analyzes the data for the fundamental frequency and the first two formants and uses k-means clustering to determine the reason of the …
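
The clustering step of the sound processing module can be sketched as plain k-means over three-dimensional feature vectors (fundamental frequency, first formant, second formant). The feature values below are invented for illustration, the initial centroids are simply the first k points so that the run is reproducible, and the mapping from clusters to cry reasons is not shown because the abstract does not describe it.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Plain k-means; returns (centroids, labels).

    Initial centroids are the first k points, a deterministic choice made
    only to keep this sketch reproducible.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep as is
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical (F0, F1, F2) values in Hz for four cry segments.
cries = [(350, 1100, 3000), (520, 1500, 3400), (360, 1120, 3050), (510, 1480, 3380)]
centroids, labels = kmeans(cries, k=2)
```

In practice the three features have different ranges, so they would usually be normalized before clustering.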


Speaker Identification Using Usable Speech Concept, Ananth N. Iyer, Brett Y. Smolenski, Robert E. Yantorno, Jashmin K. Shah, Edward J. Cupples, Stanley J. Wenndt Sep 2004


Ananth N Iyer

Most signal processing involves processing a signal without concern for the quality or information content of that signal. In speech processing, speech is processed on a frame-by-frame basis, usually only with concern that the frame is either speech or silence. However, knowing how reliable the information is in a frame of speech can be very important and useful. This is where usable speech detection and extraction can play a very important role. The usable speech frames can be defined as frames of speech that contain higher information content compared to unusable frames with reference to a particular application. We have …


Robust Speaker Verification With Principal Pitch Components, Robert M. Nickel, Sachin P. Oswal, Ananth N. Iyer Sep 2004


Ananth N Iyer

We are presenting a new method that improves the accuracy of text-dependent speaker identification systems. The new method exploits a set of novel speech features that is derived from a principal component analysis (PC) of voiced speech segments. The new PC features are only weakly correlated with the corresponding cepstral features. A distance measure that combines both cepstral and PC pitch features provides a discriminative power that cannot be achieved with cepstral features alone. It is well known that the discriminative power of cepstral features declines if the dimensionality of the feature space is increased beyond its optimal value. …


Sequential k-NN Pattern Recognition For Usable Speech Classification, Jashmin K. Shah, Brett Y. Smolenski, Robert E. Yantorno, Ananth N. Iyer Sep 2004


Ananth N Iyer

The accuracy of speech processing techniques degrades when operating in a co-channel environment. Co-channel speech occurs when more than one person is talking at the same time. The idea of usable speech segmentation is to identify and extract those portions of co-channel speech that are minimally degraded but still useful for speech processing applications such as speaker identification. Usable speech measures are features that are extracted from the co-channel signal to distinguish between usable and unusable speech. In this paper, a new usable speech extraction technique is presented. The new method extracts features recursively and variable-length segmentation is performed …


Usable Speech Detection Using A Context Dependent Gaussian Mixture Model Classifier, Robert E. Yantorno, Brett Y. Smolenski, Ananth N. Iyer, Jashmin K. Shah May 2004


Ananth N Iyer

Speech that is corrupted by nonstationary interference, but contains segments that are still usable for applications such as speaker identification or speech recognition, is referred to as "usable" speech. A common example of nonstationary interference occurs when there is more than one person talking at the same time, which is known as co-channel speech. In general, the above speech processing applications do not work in co-channel environments; however, they can work on the extracted usable segments. Unfortunately, currently available usable speech measures only detect about 75% of the total available usable speech. The first reason for this high error stems …


Structural Usable Speech Measure Using LPC Residual, Ananth N. Iyer, Melinda Gleiter, Brett Y. Smolenski, Robert E. Yantorno Dec 2003


Ananth N Iyer

In an operational environment, speech is degraded by many kinds of interference. The operation of many speech processing techniques is plagued by such interference. Usable speech extraction is a novel concept of processing degraded speech data. The idea of usable speech is to identify and extract portions of degraded speech that are considered useful for various speech processing systems. The performance reduction of speaker identification systems under degraded conditions and the use of the usable speech concept to improve performance have been demonstrated in previous work. A new usable speech measure, based on the structure of Linear Predictive Coding (LPC) residual …