Physical Sciences and Mathematics | Open Access Articles

AutoMS: Automatic Model Selection For Novelty Detection With Error Rate Control, Yifan Zhang, Haiyan Jiang, Haojie Ren, Changliang Zou, Dejing Dou Dec 2022

Machine Learning Faculty Publications

Given an unsupervised novelty detection task on a new dataset, how can we automatically select a “best” detection model while simultaneously controlling the error rate of the best model? For novelty detection analysis, numerous detectors have been proposed to detect outliers on a new unseen dataset based on a score function trained on available clean data. However, due to the absence of labeled anomalous data for model evaluation and comparison, there is a lack of systematic approaches that are able to select the “best” model/detector (i.e., the algorithm as well as its hyperparameters) and achieve certain error rate control simultaneously. …

Distance Based Image Classification: A Solution To Generative Classification’s Conundrum?, Wen-Yan Lin, Siying Liu, Bing Tian Dai, Hongdong Li Sep 2022

Research Collection School Of Computing and Information Systems

Most classifiers rely on discriminative boundaries that separate instances of each class from everything else. We argue that discriminative boundaries are counter-intuitive, as they define semantics by what-they-are-not, and should be replaced by generative classifiers, which define semantics by what-they-are. Unfortunately, generative classifiers are significantly less accurate. This may be caused by the tendency of generative models to focus on easy-to-model semantic generative factors and ignore non-semantic factors that are important but difficult to model. We propose a new generative model in which semantic factors are accommodated by shell theory’s [25] hierarchical generative process and non-semantic factors by …

Einstein-Roscoe Regression For The Slag Viscosity Prediction Problem In Steelmaking, Hiroto Saigo, Dukka Kc, Noritaka Saito Apr 2022

Michigan Tech Publications

In classical machine learning, regressors are trained without attempting to gain insight into the mechanism connecting inputs and outputs. The natural sciences, however, are interested in finding a robust, interpretable function for the target phenomenon that can return predictions even outside the training domain. This paper focuses on the viscosity prediction problem in steelmaking and proposes Einstein-Roscoe regression (ERR), which learns the coefficients of the Einstein-Roscoe equation and is able to extrapolate to unseen domains. In addition, it is often the case in the natural sciences that some measurements are unavailable or more expensive than others due to physical constraints. To this …

Mixture Models In Machine Learning, Soumyabrata Pal Mar 2022

Doctoral Dissertations

Modeling with mixtures is a powerful method in the statistical toolkit that can be used for representing the presence of sub-populations within an overall population. In many applications ranging from financial models to genetics, a mixture model is used to fit the data. The primary difficulty in learning mixture models is that the observed data set does not identify the sub-population to which an individual observation belongs. Despite being studied for more than a century, the theoretical guarantees of mixture models remain unknown for several important settings. In this thesis, we look at three groups of problems. The first part …

Split Classification Model For Complex Clustered Data, Katherine Gerot Mar 2022

Honors Theses

Classification in high-dimensional data has generated tremendous interest in a multitude of fields. Data in higher dimensions often reside in a non-Euclidean metric space. This prevents Euclidean-based classification methodologies, such as regression, from reliably modeling the data. Many proposed models rely on computationally complex embeddings to convert the data to a more usable format. Others, namely the Support Vector Machine, rely on kernel manipulation to implicitly describe the "feature space" and arrive at a non-linear decision boundary. The methodology proposed in this paper seeks to classify complex data in a computationally simple and explainable manner.
