Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computer Sciences

Statistics

Articles 1 - 21 of 21

Full-Text Articles in Physical Sciences and Mathematics

AutoMS: Automatic Model Selection For Novelty Detection With Error Rate Control, Yifan Zhang, Haiyan Jiang, Haojie Ren, Changliang Zou, Dejing Dou Dec 2022

Machine Learning Faculty Publications

Given an unsupervised novelty detection task on a new dataset, how can we automatically select a “best” detection model while simultaneously controlling the error rate of the best model? For novelty detection analysis, numerous detectors have been proposed to detect outliers on a new unseen dataset based on a score function trained on available clean data. However, due to the absence of labeled anomalous data for model evaluation and comparison, there is a lack of systematic approaches that are able to select the “best” model/detector (i.e., the algorithm as well as its hyperparameters) and achieve certain error rate control simultaneously. …
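A common generic way to attach error-rate control to a trained novelty detector is to turn its scores into conformal p-values, using scores on held-out clean (calibration) data, and then apply the Benjamini-Hochberg procedure to control the false discovery rate. The sketch below shows that pipeline for a single detector only; it does not perform AutoMS's model selection, and the function names and the convention that higher scores mean "more anomalous" are assumptions. Each p-value is (1 + number of calibration scores at least as large as the test score) / (n + 1).

```python
from typing import List, Sequence

def conformal_pvalues(calib_scores: Sequence[float],
                      test_scores: Sequence[float]) -> List[float]:
    """Conformal p-values, assuming higher scores mean "more anomalous".

    calib_scores come from held-out clean (inlier) data not used to train
    the detector; p = (1 + #{calib >= test}) / (n + 1).
    """
    n = len(calib_scores)
    return [(1 + sum(c >= t for c in calib_scores)) / (n + 1) for t in test_scores]

def benjamini_hochberg(pvalues: Sequence[float], alpha: float) -> List[int]:
    """Indices flagged as novelties by the Benjamini-Hochberg step-up rule at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    k = 0  # largest rank whose p-value passes its threshold alpha * rank / m
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])
```

For example, with calibration scores 0.1 through 0.9 and test scores [0.25, 5.0, 6.0], the two extreme test points receive p-value 0.1 and are the only ones flagged at alpha = 0.2.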


Distance Based Image Classification: A Solution To Generative Classification’s Conundrum?, Wen-Yan Lin, Siying Liu, Bing Tian Dai, Hongdong Li Sep 2022

Research Collection School Of Computing and Information Systems

Most classifiers rely on discriminative boundaries that separate instances of each class from everything else. We argue that discriminative boundaries are counter-intuitive because they define semantics by what-they-are-not, and that they should be replaced by generative classifiers, which define semantics by what-they-are. Unfortunately, generative classifiers are significantly less accurate. This may be caused by the tendency of generative models to focus on easy-to-model semantic generative factors and to ignore non-semantic factors that are important but difficult to model. We propose a new generative model in which semantic factors are accommodated by shell theory’s [25] hierarchical generative process and non-semantic factors by …


Einstein-Roscoe Regression For The Slag Viscosity Prediction Problem In Steelmaking, Hiroto Saigo, Dukka Kc, Noritaka Saito Apr 2022

Michigan Tech Publications

In classical machine learning, regressors are trained without attempting to gain insight into the mechanism connecting inputs and outputs. The natural sciences, however, are interested in finding a robust, interpretable function for the target phenomenon that can return predictions even outside the training domains. This paper focuses on the viscosity prediction problem in steelmaking and proposes Einstein-Roscoe regression (ERR), which learns the coefficients of the Einstein-Roscoe equation and is able to extrapolate to unseen domains. Moreover, in the natural sciences some measurements are often unavailable or more expensive than others due to physical constraints. To this …
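The Einstein-Roscoe equation is often written as η = η_l (1 − aφ)^(−n), where η_l is the viscosity of the liquid phase, φ is the solid fraction, and a and n are coefficients (Roscoe's classic values are a = 1.35 and n = 2.5). As a minimal sketch, assuming this form and a known η_l, the coefficients can be fitted by a grid search over a combined with a closed-form least-squares estimate of n in log space. This is an illustrative fitting scheme with hypothetical function names, not the ERR training procedure from the paper, which the abstract does not detail.

```python
import math
from typing import Sequence, Tuple

def einstein_roscoe(eta_liquid: float, phi: float, a: float, n: float) -> float:
    """Assumed Einstein-Roscoe form: eta = eta_liquid * (1 - a*phi) ** (-n)."""
    return eta_liquid * (1.0 - a * phi) ** (-n)

def fit_einstein_roscoe(phis: Sequence[float], ratios: Sequence[float],
                        a_grid: Sequence[float]) -> Tuple[float, float]:
    """Fit (a, n) from relative viscosities ratios = eta / eta_liquid.

    Taking logs gives log(ratio) = n * (-log(1 - a*phi)), which is linear in n
    for fixed a, so n has a closed-form least-squares solution per grid value.
    """
    ys = [math.log(r) for r in ratios]
    best = None  # (sse, a, n)
    for a in a_grid:
        if any(a * phi >= 1.0 for phi in phis):
            continue  # (1 - a*phi) must stay positive
        xs = [-math.log(1.0 - a * phi) for phi in phis]
        sxx = sum(x * x for x in xs)
        if sxx == 0.0:
            continue
        n = sum(x * y for x, y in zip(xs, ys)) / sxx
        sse = sum((y - n * x) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, a, n)
    if best is None:
        raise ValueError("no admissible value of a in a_grid")
    return best[1], best[2]
```

Because the fitted equation keeps its physical form, it can be evaluated at solid fractions outside the fitted range, which is the kind of extrapolation the abstract emphasizes.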


Split Classification Model For Complex Clustered Data, Katherine Gerot Mar 2022

Honors Theses

Classification in high-dimensional data has generated tremendous interest in a multitude of fields. Data in higher dimensions often reside in non-Euclidean metric spaces. This prevents Euclidean-based classification methods, such as regression, from reliably modeling the data. Many proposed models rely on computationally complex embeddings to convert the data into a more usable format. Others, notably the Support Vector Machine, rely on kernel manipulation to implicitly describe the "feature space" and arrive at a non-linear decision boundary. The methodology proposed in this paper seeks to classify complex data in a relatively computationally simple and explainable manner.


Data, Stats, Go: Navigating The Intersections Of Cataloging, E-Resource, And Web Analytics Reporting, Rachel S. Evans, Wendy Moore, Jessica Pasquale, Andre Davison Jul 2020

Presentations

Do you trudge through gathering statistics at fiscal or calendar year-end? Do you wonder why you track certain things, thinking many seem outdated or irrelevant? Many places seem to keep counting certain statistics because "that's what they've always done." How do you integrate e-resource counts with physical counts and reconcile the variations between them (updated e-resources versus re-cataloged physical items)? What about repository downloads and other web traffic? The quantity of statistics that libraries track is staggering and keeps growing. This program will encourage attendees to stop and evaluate what data they are gathering and why, and help identify possible alternatives to …


Deal: Differentially Private Auction For Blockchain Based Microgrids Energy Trading, Muneeb Ul Hassan, Mubashir Husain Rehmani, Jinjun Chen Mar 2020

Publications

Modern smart homes are being equipped with renewable energy resources that can produce their own electric energy. From time to time, these smart homes or microgrids are also capable of supplying energy to other houses, buildings, or the energy grid when self-produced renewable energy is available. Therefore, research has been carried out to develop optimal trading strategies, and many recent technologies are also being used in combination with microgrids. One such technology is blockchain, which works over a decentralized distributed ledger. In this paper, we develop a blockchain-based approach for microgrid energy auctions. To make this auction more …


Outlier Profiles Of Atomic Structures Derived From X-Ray Crystallography And From Cryo-Electron Microscopy, Lin Chen, Jing He, Angelo Facchiano Jan 2020

Computer Science Faculty Publications

Background: As more protein atomic structures are determined from cryo-electron microscopy (cryo-EM) density maps, validation of such structures is an important task. Methods: We applied a histogram-based outlier score (HBOS) to six sets of cryo-EM atomic structures and five sets of X-ray atomic structures, including one derived from X-ray data with better than 1.5 Å resolution. The cryo-EM data sets contain structures released by December 2016 and those released between 2017 and 2019, derived at resolutions of 0–4 Å and 4–6 Å, respectively. Results: The distribution of HBOS values in five sets of X-ray structures shows that HBOS is sensitive in distinguishing …
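HBOS, as usually described, scores each observation by building one histogram per feature, normalizing each histogram so that its tallest bin has height 1, and summing log(1 / height) over features, so points that fall in sparsely populated bins score high. The sketch below implements that idea with equal-width bins in plain Python. It is a generic illustration: the abstract does not say which structural features were scored or how they were binned, so `n_bins` and the row-per-observation input layout are assumptions.

```python
import math
from typing import List, Sequence

def hbos_scores(data: Sequence[Sequence[float]], n_bins: int = 10) -> List[float]:
    """Histogram-based outlier scores; larger values indicate more outlying rows."""
    n_features = len(data[0])
    scores = [0.0] * len(data)
    for j in range(n_features):
        column = [row[j] for row in data]
        lo, hi = min(column), max(column)
        width = (hi - lo) / n_bins or 1.0  # constant feature: everything lands in bin 0
        # Equal-width bin index for each value; the maximum value goes in the last bin.
        bins = [min(int((v - lo) / width), n_bins - 1) for v in column]
        counts = [0] * n_bins
        for b in bins:
            counts[b] += 1
        peak = max(counts)
        for i, b in enumerate(bins):
            height = counts[b] / peak  # tallest bin normalized to height 1
            scores[i] += math.log(1.0 / height)
    return scores
```

Summing logs treats features as independent, which keeps HBOS fast (one pass per feature) at the cost of missing outliers that are unusual only in combinations of features.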


9th Annual Postdoctoral Science Symposium, University Of Texas MD Anderson Cancer Center Postdoctoral Association Sep 2019

Annual Postdoctoral Science Symposium Abstracts

The mission of the Annual Postdoctoral Science Symposium (APSS) is to provide a platform for talented postdoctoral fellows throughout the Texas Medical Center to present their work to a wider audience. The MD Anderson Postdoctoral Association convened its inaugural Annual Postdoctoral Science Symposium (APSS) on August 4, 2011.

The APSS provides a professional venue for postdoctoral scientists to develop, clarify, and refine their research through formal reviews and critiques from faculty and other postdoctoral scientists. Additionally, attendees discuss current research on a broad range of subjects while promoting academic interactions and enrichment and developing new collaborations.


Automated Trading Systems Statistical And Machine Learning Methods And Hardware Implementation: A Survey, Boming Huang, Yuziang Huan, Li Da Xu, Lirong Zheng, Zhuo Zou Jan 2019

Information Technology & Decision Sciences Faculty Publications

Automated trading, also known as algorithmic trading, is a method of using a predesigned computer program to submit a large number of trading orders to an exchange. It is essentially a real-time decision-making system that falls within the scope of Enterprise Information Systems (EIS). With the rapid development of telecommunication and computer technology, the mechanisms underlying automated trading systems have become increasingly diversified. Considerable effort has been exerted by both academia and trading firms toward mining potential factors that may generate significantly higher profits. In this paper, we review studies on trading systems built using various methods and …


On The Performance Of Some Poisson Ridge Regression Estimators, Cynthia Zaldivar Mar 2018

FIU Electronic Theses and Dissertations

Multiple regression models play an important role in analyzing and making predictions about data. Prediction accuracy becomes lower when two or more explanatory variables in the model are highly correlated. One solution is to use ridge regression. The purpose of this thesis is to study the performance of available ridge regression estimators for Poisson regression models in the presence of moderately to highly correlated variables. As performance criteria, we use mean square error (MSE), mean absolute percentage error (MAPE), and percentage of times the maximum likelihood (ML) estimator produces a higher MSE than the ridge regression estimator. A Monte Carlo …
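Two of the performance criteria can be made concrete for a Monte Carlo study: the simulated MSE of an estimator is the average squared Euclidean distance between its estimates and the true coefficient vector across replications, and the third criterion is the percentage of replications in which the ML estimate lies farther from the truth than the ridge estimate. The sketch below assumes the estimates are already available as lists of coefficient vectors; it does not implement any particular ridge estimator from the thesis, and the function names are hypothetical.

```python
from typing import List, Sequence

def squared_error(estimate: Sequence[float], true_beta: Sequence[float]) -> float:
    """Squared Euclidean distance ||beta_hat - beta||^2 for one replication."""
    return sum((b_hat - b) ** 2 for b_hat, b in zip(estimate, true_beta))

def simulated_mse(estimates: List[Sequence[float]], true_beta: Sequence[float]) -> float:
    """Average squared error of an estimator across Monte Carlo replications."""
    return sum(squared_error(e, true_beta) for e in estimates) / len(estimates)

def pct_ml_worse(ml_estimates: List[Sequence[float]],
                 ridge_estimates: List[Sequence[float]],
                 true_beta: Sequence[float]) -> float:
    """Percentage of replications where the ML squared error exceeds the ridge one."""
    worse = sum(
        squared_error(m, true_beta) > squared_error(r, true_beta)
        for m, r in zip(ml_estimates, ridge_estimates)
    )
    return 100.0 * worse / len(ml_estimates)
```

Pairing the ML and ridge estimates by replication matters for the third criterion: both must be computed from the same simulated data set for the comparison to be meaningful.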


Best Practice To Accurately Collect, Review, And Present eResource Data That Is Utilized By Libraries In A University Setting, Matt W. Marcukaitis Dec 2017

Student Scholarship – Computer Science

This report compares a manual process with a semi-automated process (SUSHI) for collecting eResource statistical data for a university library. Along the way, the discussion shows why an organization needs statistical data and explains the reason for using the open-source software CORAL to help with the SUSHI process. The report concludes with a recommendation on how to proceed based on the findings.


Forest Understory Trees Can Be Segmented Accurately Within Sufficiently Dense Airborne Laser Scanning Point Clouds, Hamid Hamraz, Marco A. Contreras, Jun Zhang Jul 2017

Computer Science Faculty Publications

Airborne laser scanning (LiDAR) point clouds over large forested areas can be processed to segment individual trees and subsequently extract tree-level information. Existing segmentation procedures typically detect more than 90% of overstory trees, yet they barely detect 60% of understory trees because of the occlusion effect of higher canopy layers. Although understory trees provide limited financial value, they are an essential component of ecosystem functioning by offering habitat for numerous wildlife species and influencing stand development. Here we model the occlusion effect in terms of point density. We estimate the fractions of points representing different canopy layers (one overstory and …


Homogenization Of Plastic Deformation In Heterogeneous Lamella Structures, Rui Yuan, Irene J. Beyerlein, Caizhi Zhou Jul 2017

Materials Science and Engineering Faculty Research & Creative Works

It has been shown that unlike its constituent nanocrystalline (NC) phase, a heterogeneous lamella (HL) composite comprising NC and coarse-grain layers exhibits greatly improved ductility. To understand the origin of this enhancement, we present a 3D discrete dislocation, crystal plasticity finite element model to study the development of strains across this microstructure. Here we show that the HL structure homogenizes the plastic strains in the NC layer, weakening the effect of strain concentrations. These findings can provide valuable insight into the effects of material length scales on material instabilities, which is needed to design heterogeneous structures with superior properties.


SIAM Data Mining "Brings It" To Annual Meeting, Jeremy Kepner, Sanjukta Bhowmick, Aydın Buluç, Rajmonda Caceres, R. Jordan Crouser, Vijay Gadepally, Ben Miller, Jennifer Webster Jan 2017

Computer Science: Faculty Publications

The Data Mining Activity Group is one of SIAM's most vibrant and dynamic activity groups. To better share our enthusiasm for data mining with the broader SIAM community, our activity group organized six minisymposia at the 2016 Annual Meeting. These minisymposia included 48 talks organized by 11 SIAM members on:
- GraphBLAS (Aydın Buluç)
- Algorithms and statistical methods for noisy network analysis (Sanjukta Bhowmick & Ben Miller)
- Inferring networks from non-network data (Rajmonda Caceres, Ivan Brugere & Tanya Y. Berger-Wolf)
- Visual analytics (Jordan Crouser)
- Mining in graph data (Jennifer Webster, Mahantesh Halappanavar & Emilie Hogan)
- …


Statistics In League Of Legends: Analyzing Runes For Last-Hitting, Brian M. Hook May 2016

Mathematics: Student Scholarship & Creative Works

While other sports have statisticians to evaluate players and their stats, electronic sports need statisticians to evaluate different parts of the game. League of Legends is the most popular esport and is the focus of this discussion. The mechanic of interest here is runes, which boost a player's in-game stats, for example by adding extra damage. Using Python, we measure the effectiveness of these runes by examining gold efficiency, help with last-hitting, and extra damage dealt.


Quantification Of The Statistical Effects Of Spatiotemporal Processing Of Nontask fMRI Data, M. Muge Karaman, Andrew S. Nencka, Iain P. Bruce, Daniel B. Rowe Nov 2014

Mathematics, Statistics and Computer Science Faculty Research and Publications

Nontask functional magnetic resonance imaging (fMRI) has become one of the most popular noninvasive areas of brain mapping research for neuroscientists. In nontask fMRI, various sources of “noise” corrupt the measured blood oxygenation level-dependent signal. Many studies have aimed to attenuate the noise in reconstructed voxel measurements through spatial and temporal processing operations. While these solutions make the data more “appealing,” many commonly used processing operations induce artificial correlations in the acquired data. As a result, it becomes increasingly difficult to derive the true underlying covariance structure once the data have been processed. As the goal of nontask fMRI studies …
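The claim that smoothing induces artificial correlations can be checked with a short calculation. Assuming independent, equal-variance noise and a linear, shift-invariant smoothing kernel k (a hypothetical 1-D setting, far simpler than real spatiotemporal fMRI processing), two smoothed samples `lag` positions apart have correlation Σ_i k_i k_{i+lag} / Σ_i k_i². For a three-point moving average, neighbors share two of their three inputs, giving a correlation of 2/3 even though the raw noise is uncorrelated. The function name is illustrative.

```python
from typing import Sequence

def induced_correlation(kernel: Sequence[float], lag: int) -> float:
    """Correlation between smoothed values `lag` samples apart when the input is white noise.

    Each output is y_t = sum_i kernel[i] * x_{t-i}; for i.i.d. x with variance
    sigma^2, cov(y_t, y_{t+lag}) = sigma^2 * sum_i kernel[i] * kernel[i + lag]
    and var(y_t) = sigma^2 * sum_i kernel[i] ** 2, so sigma^2 cancels.
    Assumes the kernel is not all zeros.
    """
    lag = abs(lag)
    if lag >= len(kernel):
        return 0.0  # no shared inputs, so no induced correlation
    energy = sum(k * k for k in kernel)
    overlap = sum(kernel[i] * kernel[i + lag] for i in range(len(kernel) - lag))
    return overlap / energy
```

With `kernel = [1/3, 1/3, 1/3]`, lags 1 and 2 give 2/3 and 1/3, and lags of 3 or more give 0, so the correlation structure seen after processing partly reflects the kernel rather than the brain.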


ShopProfiler: Profiling Shops With Crowdsourcing Data, Xiaonan Guo, Eddie C. L. Chan, Ce Liu, Kaishun Wu, Siyuan Liu, Lionel Ni May 2014

Research Collection School Of Computing and Information Systems

Sensing data from mobile phones enable exciting and profitable applications. Recent research focuses on sensing indoor environments, but it either suffers from inaccuracy because of the limited reachability of human traces or requires human intervention to perform sophisticated tasks. In this paper, we present ShopProfiler, a shop profiling system based on crowdsourced data. First, we extract customer movement patterns from traces. Second, we improve the accuracy of the building floor plan by adopting a gradient-based approach and then localize shops through a WiFi heat map. Third, we categorize shops by designing an SVM classifier in shop space to support multi-label classification. Finally, we infer brand …


Big Data: Immediate Opportunities And Longer Term Challenges, Jens Pohl, Kym Jason Pohl Jul 2013

Collaborative Agent Design (CAD) Research Center

The transformation of words, locations, and human interactions into digital data forms the basis of trend detection and information extraction opportunities that can be automated with the increasing availability of relatively inexpensive computer storage and processing technology. Trend detection, which focuses on what, is facilitated by the ability to apply analytics to an entire corpus of data instead of a random sample. Since the corpus essentially includes all data within a population, there is no need to apply the precautions that traditional statistical analysis requires to ensure the representativeness of a sample. Several examples are …


A Bayesian Secondary Analysis In An Asthma Study, Samuel P. Wilcock, Vernon M. Chinchilli, Stephen P. Peters Jun 2011

ACMS Conference Proceedings 2011

A recent study published in the New England Journal of Medicine by the Asthma Clinical Research Network (ACRN) compared three different treatments for their effectiveness in treating adults with uncontrolled asthma. This paper will describe the study design and its results, then detail the beginnings of a secondary analysis using Bayesian methods to estimate the parameters of interest. The methods will be explained, and the preliminary estimates given and contextualized. The paper will conclude with a discussion of the next steps and the goals for further analysis of the data in this study.


Application Of A Data Mining Framework For The Identification Of Agricultural Production Areas In WA, Yunous Vagh, Leisa Armstrong, Dean Diepeveen Jan 2010

Research outputs pre 2011

This paper proposes a data mining framework for the identification of agricultural production areas in WA. The data mining (DM) framework was developed with the aim of enhancing the analysis of agricultural datasets compared to currently used statistical methods. The DM framework is a synthesis of different technologies brought together to enhance the interrogation of these datasets. The DM framework is based on the data, information, knowledge and wisdom continuum as a horizontal axis, with DM and online analytical processing (OLAP) forming the vertical axis. In addition, the DM framework incorporates aspects of data warehousing phases, …


The Impact Of Directionality In Predications On Text Mining, Gondy Leroy, Marcelo Fiszman, Thomas C. Rindflesch Jan 2008

CGU Faculty Publications and Research

The number of publications in biomedicine is increasing enormously each year. To help researchers digest the information in these documents, text mining tools are being developed that present co-occurrence relations between concepts. Statistical measures are used to mine interesting subsets of relations. We demonstrate how directionality of these relations affects interestingness. Support and confidence, simple data mining statistics, are used as proxies for interestingness metrics. We first built a test bed of 126,404 directional relations extracted from biomedical abstracts, which we represent as graphs containing a central starting concept and 2 rings of associated relations. We manipulated directionality in four …
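Support and confidence follow the standard association-rule definitions: the support of an itemset is the fraction of transactions containing it, and the confidence of a rule X ⇒ Y is support(X ∪ Y) / support(X). Because the denominator depends only on the antecedent, confidence is asymmetric, which illustrates how reversing a relation's direction can change its score. The sketch below uses a toy, hypothetical set of concept sets rather than the authors' test bed of extracted relations, and the function names are illustrative.

```python
from typing import AbstractSet, Sequence

def support(transactions: Sequence[AbstractSet[str]], itemset: AbstractSet[str]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions: Sequence[AbstractSet[str]],
               antecedent: AbstractSet[str],
               consequent: AbstractSet[str]) -> float:
    """Confidence of the rule antecedent => consequent (0.0 if the antecedent never occurs)."""
    base = support(transactions, antecedent)
    return support(transactions, antecedent | consequent) / base if base else 0.0
```

For the toy transactions `[{"A", "B"}, {"A"}, {"A"}, {"A", "B"}, {"B"}]`, both directions share support 0.4, yet A ⇒ B has confidence 0.5 while B ⇒ A has confidence 2/3.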