Entire DC Network | Open Access Articles | Digital Commons Network™

Blind Speaker Clustering, Ananth N. Iyer, Uchechukwu O. Ofoegbu, Robert E. Yantorno, Brett Y. Smolenski Dec 2006

Ananth N Iyer

A novel approach to performing speaker clustering in telephone conversations is presented in this paper. The method is based on a simple observation that the distance between populations of feature vectors extracted from different speakers is greater than a preset threshold. This observation is incorporated into the clustering problem by the formulation of a constrained optimization problem. A modified c-means algorithm is designed to solve the optimization problem. Another key aspect in speaker clustering is to determine the number of clusters, which is either assumed or expected as an input in traditional methods. The proposed method does not require such …
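
The idea of clustering under a minimum-distance constraint, without fixing the number of clusters in advance, can be illustrated with a minimal sketch. This is not the authors' modified c-means algorithm, which the abstract does not spell out. It assumes Euclidean distance between centroids as a stand-in for the distance between feature populations, and the function name `threshold_c_means` and its parameters are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def threshold_c_means(points, threshold, max_iter=50):
    # Assumption: start with every point as its own centroid, so the
    # number of clusters is discovered rather than supplied.
    centroids = [tuple(p) for p in points]
    for _ in range(max_iter):
        # Constraint step: keep a centroid only if it lies more than
        # `threshold` away from every centroid already kept.
        kept = []
        for c in centroids:
            if all(dist(c, k) > threshold for k in kept):
                kept.append(c)
        # Assignment step: each point joins its nearest kept centroid.
        groups = [[] for _ in kept]
        for p in points:
            nearest = min(range(len(kept)), key=lambda i: dist(p, kept[i]))
            groups[nearest].append(p)
        # Update step: recompute centroids, dropping empty groups.
        new_centroids = [mean(g) for g in groups if g]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
              for p in points]
    return centroids, labels

# Synthetic 2-D points standing in for speaker feature vectors.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centroids, labels = threshold_c_means(data, threshold=2.0)
```

With these synthetic points the sketch settles on two clusters. If `max_iter` is reached before convergence, the distance constraint is not guaranteed to hold for the returned centroids.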

Generic Modeling Applied To Speaker Count, Ananth N. Iyer, Uchechukwu O. Ofoegbu, Robert E. Yantorno, Brett Y. Smolenski Dec 2006

Ananth N Iyer

The problem of determining the number of speakers participating in a conversation and building their models in short conversations, within an unknown group of speakers, is addressed in this paper. The lack of information about the number of speakers and the unavailability of sufficient data present a challenging task of efficiently estimating the speaker model parameters. The proposed method uses a novel generic speaker identification (GSID) system as a guide in the model building process. The GSID system is designed to perform speaker identification where the speaker associated with the test data may not be enrolled. The models in the GSID …

Detection Of A Third Speaker In Telephone Conversations, Uchechukwu O. Ofoegbu, Ananth N. Iyer, Robert E. Yantorno, Stanley J. Wenndt Sep 2006

Ananth N Iyer

Differentiating speakers participating in telephone conversations is a challenging task in speech processing because only short consecutive utterances can be examined for each speaker. Research has shown that, given only brief utterances (1 second or less), humans can recognize speakers with an accuracy of about 54% on average. The task becomes even more challenging when no information about the speakers is known a priori. In this paper, a technique for determining whether there are two or three speakers participating in a telephone conversation is presented. This approach assumes no knowledge or information about any of the participating speakers. The technique …

A Novel Approach To Automated Source Separation In Multispeaker Environment, Robert M. Nickel, Ananth N. Iyer May 2006

Ananth N Iyer

We propose a new approach to the solution of the cocktail party problem (CPP). The goal of the CPP is to isolate the speech signals of individuals who are concurrently talking while being recorded with a properly positioned microphone array. The new approach provides a powerful yet simple alternative to commonly used methods for the separation of speakers. It is based on the observation that the estimation of the signal transfer matrix between speakers and microphones is significantly simplified if one can ensure that during certain periods of the conversation only one speaker is active while all other speakers …
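
Why single-speaker periods simplify the estimate can be shown with a small sketch. It assumes an instantaneous (non-convolutive) mixing model, which is a simplification not stated in the abstract, and the function names `estimate_column` and `separate_two` are hypothetical. Under that assumption, while only one speaker is active every microphone signal is a scaled copy of the same source, so one column of the transfer matrix can be read off as a least-squares gain relative to a reference microphone.

```python
def estimate_column(frames):
    """Estimate one transfer-matrix column from single-speaker frames.

    frames: list of per-microphone sample vectors captured while only
    one speaker is active. Gains are expressed relative to microphone 0
    (an assumption of this sketch).
    """
    ref_energy = sum(f[0] * f[0] for f in frames)
    n_mics = len(frames[0])
    return [sum(f[m] * f[0] for f in frames) / ref_energy
            for m in range(n_mics)]

def separate_two(frame, col_a, col_b):
    """Invert a 2x2 mixture given two columns normalised to [1, gain]."""
    a1, b1 = col_a[1], col_b[1]
    det = b1 - a1  # must be non-zero: the speakers need distinct gains
    s1 = (b1 * frame[0] - frame[1]) / det
    s2 = (frame[1] - a1 * frame[0]) / det
    return s1, s2

# Synthetic example: speaker A alone, then speaker B alone, then both.
only_a = [[s * 1.0, s * 0.5] for s in (1.0, -2.0, 3.0)]
only_b = [[s * 1.0, s * 2.0] for s in (2.0, 1.0, -1.0)]
col_a = estimate_column(only_a)
col_b = estimate_column(only_b)
mixed = [3.0 + (-1.0), 0.5 * 3.0 + 2.0 * (-1.0)]
s1, s2 = separate_two(mixed, col_a, col_b)
```

The recovered sources are scaled by each speaker's gain at the reference microphone, so the sketch recovers the signals only up to that factor.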

Emotion Detection From Infant Facial Expressions And Cries, Pritam Pal, Ananth N. Iyer, Robert E. Yantorno May 2006

Ananth N Iyer

A new system for interpreting infant cries from facial images and cry sounds is presented in this paper. The system is designed to analyze the facial image and sound of a crying infant to derive the reason why the infant is crying. The image and the sound represent the same cry event. The image processing module determines the state of certain facial features, certain combinations of which determine the reason for crying. The sound processing module analyzes the data for the fundamental frequency and the first two formants and uses k-means clustering to determine the reason of the …
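
The k-means step on acoustic features can be sketched as follows. This is plain Lloyd's k-means, not the paper's implementation. The feature triples (fundamental frequency, first formant, second formant, in Hz) are made-up illustrative values rather than data from the paper, and the deterministic initialisation from the first `k` points is an assumption chosen to keep the example reproducible.

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on a list of equal-length feature tuples."""
    # Assumption: initialise centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to the centroid with the smallest squared distance.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Move each centroid to the mean of its members, if it has any.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Hypothetical (F0, F1, F2) measurements in Hz for four cry segments.
cries = [
    (350.0, 1100.0, 2900.0),
    (600.0, 1500.0, 3300.0),
    (360.0, 1120.0, 2950.0),
    (610.0, 1480.0, 3250.0),
]
centroids, labels = kmeans(cries, k=2)
```

In practice the three features would likely be normalised first, since the formant values otherwise dominate the distance; the sketch leaves them raw for brevity.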
