Open Access. Powered by Scholars. Published by Universities.®

Categorical Data Analysis Commons

275 Full-Text Articles 386 Authors 67,053 Downloads 60 Institutions

All Articles in Categorical Data Analysis

Multi-Linear Algebraic Eigendecompositions And Their Application In Data Science, Randy Hoover, Kyle Caudle Dr., Karen Braman Dr. 2019 SDSMT

SDSU Data Science Symposium

Multi-dimensional data analysis has seen increased interest in recent years. With more and more data arriving as 2-dimensional arrays (images) as opposed to 1-dimensional arrays (signals), new methods for dimensionality reduction, data analysis, and machine learning have been pursued. Most notable among these are the Canonical Decomposition/Parallel Factors (commonly referred to as CP) and Tucker decompositions (commonly regarded as a higher-order SVD: HOSVD). In the current research we present an alternate method for computing singular value and eigenvalue decompositions on multi-way data through an algebra of circulants and illustrate their application to two well-known machine learning methods: Multi-Linear Principal ...


An Evaluation Of Training Size Impact On Validation Accuracy For Optimized Convolutional Neural Networks, Jostein Barry-Straume, Adam Tschannen, Daniel W. Engels, Edward Fine 2019 Southern Methodist University

SMU Data Science Review

In this paper, we present an evaluation of training size impact on validation accuracy for an optimized Convolutional Neural Network (CNN). CNNs are currently the state-of-the-art architecture for object classification tasks. We used Amazon’s machine learning ecosystem to train and test 648 models to find the optimal hyperparameters with which to apply a CNN to the Fashion-MNIST (Modified National Institute of Standards and Technology) dataset. We were able to realize a validation accuracy of 90% by using only 40% of the original data. We found that hidden layers appear to have had zero impact on validation accuracy, whereas the ...


Comparisons Of Performance Between Quantum And Classical Machine Learning, Christopher Havenstein, Damarcus Thomas, Swami Chandrasekaran 2019 Southern Methodist University

SMU Data Science Review

In this paper, we present a performance comparison of machine learning algorithms executed on traditional and quantum computers. Quantum computing has the potential to achieve incredible results for certain types of problems, and we explore whether it can be applied to machine learning. First, we identified quantum machine learning algorithms that had reproducible code and classical machine learning counterparts. Then, we found relevant data sets with which we tested the performance of the comparable quantum and classical machine learning algorithms. We evaluated performance by algorithm execution time and accuracy. We found that quantum variational support vector machines in some cases had higher ...


Application Of Bradford’s Law Of Scattering On Research Publication In Astronomy & Astrophysics Of India, Satish Kumar, Senthilkumar R. 2018 Bharathiar University, Coimbatore & IIT(ISM) Dhanbad

Library Philosophy and Practice (e-journal)

The present study examines the application of Bradford’s law of scattering to research articles published in the field of Astronomy & Astrophysics by Indian scientists during 1988-2017. The bibliographic data were retrieved from the Web of Science (WoS) bibliographic database for different periods of time. A total of 18,877 journal articles were published by Indian scientists in the field of Astronomy & Astrophysics during 1988-2017; these were retrieved and analyzed separately for blocks of 10 years as well as for the consolidated 30 years. The core journal of the field was identified. The Bradford law ...


Instances Of Influenza In The United States Visualized, Parth Patel 2018 CUNY New York City College of Technology

Publications and Research

The Tycho Project collects large data sets related to healthcare, in particular the instance counts and geographical information of diseases. We look at the instance counts and locations of influenza from 1919 to 1951 across the United States. We hope to find seasonal and geographical insight into the spread of the disease.


Role Of Misclassification Estimates In Estimating Disease Prevalence And A Non-Linear Approach To Study Synchrony Using Heart Rate Variability In Chickens, Dola Pathak 2018 University of Nebraska-Lincoln

Dissertations and Theses in Statistics

Infectious disease assays can be imperfect. When estimating disease prevalence, these imperfections are accounted for by incorporating assay sensitivity and specificity into point and variance estimates. Unfortunately, these accuracy measures are often treated as fixed constants, rather than acknowledging that they are estimates from an assay validation process. The purpose of this study is to show the detrimental effect of not taking into account this sampling variability when samples are obtained through group testing (aka, pooled testing). We show that confidence interval coverage can dramatically decline as the sample size increases for the main sample of interest. As a remedy ...


Text Analytics Approach To Extract Course Improvement Suggestions From Students’ Feedback, Swapna Gottipati, Venky Shankararaman, Jeff Rongsheng Lin 2018 Singapore Management University

Research Collection School Of Information Systems

In academic institutions, it is normal practice that at the end of each term, students are required to complete a questionnaire that is designed to gather students’ perceptions of the instructor and their learning experience in the course. Students’ feedback includes numerical answers to Likert scale questions and textual comments to open-ended questions. Within the textual comments given by the students are embedded suggestions. A suggestion can be explicit or implicit. Any suggestion provides useful pointers on how the instructor can further enhance the student learning experience. However, it is tedious to manually go through all the qualitative comments and ...


Seasonal Warranty Prediction Based On Recurrent Event Data, Qianqian Shan, Yili Hong, William Q. Meeker Jr. 2018 Iowa State University

Statistics Preprints

Warranty return data from repairable systems, such as vehicles, usually result in recurrent event data. The non-homogeneous Poisson process (NHPP) model is used widely to describe such data. Seasonality in the repair frequencies and other variabilities, however, complicate the modeling of recurrent event data. Not much work has been done to address the seasonality, and this paper provides a general approach for the application of NHPP models with dynamic covariates to predict seasonal warranty returns. A hierarchical clustering method is used to stratify the population into groups that are more homogeneous than the overall population. The stratification facilitates ...


VPSearch: Achieving Verifiability For Privacy-Preserving Multi-Keyword Search Over Encrypted Cloud Data, Zhiguo Wan, Robert H. Deng 2018 Singapore Management University

Research Collection School Of Information Systems

Although cloud computing offers elastic computation and storage resources, it poses challenges on verifiability of computations and data privacy. In this work we investigate verifiability for privacy-preserving multi-keyword search over outsourced documents. As the cloud server may return incorrect results due to system faults or incentive to reduce computation cost, it is critical to offer verifiability of search results and privacy protection for outsourced data at the same time. To fulfill these requirements, we design a Verifiable Privacy-preserving keyword Search scheme, called VPSearch, by integrating an adapted homomorphic MAC technique with a privacy-preserving multi-keyword search scheme. The proposed scheme enables the client to ...


Blockchain Based Efficient And Robust Fair Payment For Outsourcing Services In Cloud Computing, Yinghui Zhang, Robert H. Deng, Ximeng Liu, Dong Zheng 2018 Singapore Management University

Research Collection School Of Information Systems

As an attractive business model of cloud computing, outsourcing services usually involve online payment and security issues. The mutual distrust between users and outsourcing service providers may severely impede the wide adoption of cloud computing. Nevertheless, most existing payment solutions only consider a specific type of outsourcing service and rely on a trusted third-party to realize fairness. In this paper, in order to realize secure and fair payment of outsourcing services in general without relying on any third-party, trusted or not, we introduce BCPay, a blockchain based fair payment framework for outsourcing services in cloud computing. We first present the ...


Minimizing The Perceived Financial Burden Due To Cancer, Hassan Azhar, Zoheb Allam, Gino Varghese, Daniel W. Engels, Sajiny John 2018 Southern Methodist University

SMU Data Science Review

In this paper, we present a regression model that predicts the perceived financial burden that a cancer patient experiences in the treatment and management of the disease. Cancer patients do not fully understand the burden associated with the cost of cancer, and their lack of understanding can increase the difficulties associated with living with the disease, in particular coping with the cost. The relationships between demographic characteristics and financial burden were examined in order to better understand the characteristics of a cancer patient and their burden, while all subsets regression was used to determine the best predictors of financial burden. Age ...


Cryptocurrency Price Prediction Using Tweet Volumes And Sentiment Analysis, Jethin Abraham, Daniel Higdon, John Nelson, Juan Ibarra 2018 Southern Methodist University

SMU Data Science Review

In this paper, we present a method for predicting changes in Bitcoin and Ethereum prices utilizing Twitter data and Google Trends data. Bitcoin and Ethereum, the two largest cryptocurrencies in terms of market capitalization, represent over $160 billion in combined value. However, both Bitcoin and Ethereum have experienced significant price swings on both daily and long term valuations. Twitter is increasingly used as a news source influencing purchase decisions by informing users of the currency and its increasing popularity. As a result, quickly understanding the impact of tweets on price direction can provide a purchasing and selling advantage to ...


EFVS Effects On Pilot Performance, Michael Campbell, Nsikak Udo-Imeh, Steven J. Landry 2018 Purdue University

The Summer Undergraduate Research Fellowship (SURF) Symposium

Flight tests have been conducted at Purdue University using a computer-based flying simulator in an attempt to determine and measure the effects of Enhanced Flight Vision Systems (EFVS) on the performance of pilots during landing. Knowledge of these effects could help guide future design and implementation of EFVS in modern commercial aircraft, and further increase pilots’ ability to control the aircraft in low-visibility conditions. The problem that has faced researchers in the past has revolved around the difficulty in interpreting the data which is generated by these tests. The difficulty in making a generalized conclusion based on the large amount ...


Generalized Non-Inferential Approach To Modeling Restricted Discrete Choice For The Case Of The Spatial Random Utility, Elena Labzina 2018 Washington University in St Louis

Arts & Sciences Electronic Theses and Dissertations

The multinomial logistic regression model (MNL) is a powerful and easily tractable way of measuring the probabilistic impact of input variables on individual categorical choices. Crucially, the standard MNL assumes that all subjects of the study have the same choice sets. However, especially in political science and economics, this condition is frequently violated. Probably the most vivid example of varying choice sets (VCS) is partially contested elections. Furthermore, the MNL implicitly implies the Independence of Irrelevant Alternatives (IIA) assumption by requiring i.i.d. errors, which contrasts the MNL with the multinomial probit (MNP) and mixed logit (MXL ...


Pretrial Release And Failure-To-Appear In McLean County, IL, Jonathan Monsma 2018 Illinois State University

Stevenson Center for Community and Economic Development to Stevenson Center for Community and Economic Development—Student Research

Actuarial risk assessment tools increasingly have been employed in jurisdictions across the U.S. to assist courts in the decision of whether someone charged with a crime should be detained or released prior to their trial. These tools should be continually monitored and researched by independent third parties to ensure that they are being administered properly and used in the most proficient way so as to provide socially optimal results. McLean County, Illinois began using the Public Safety Assessment-Court™ (PSA-Court or simply PSA) risk assessment tool in 2016. This study culls data from the McLean County ...


Data Center Application Security: Lateral Movement Detection Of Malware Using Behavioral Models, Harinder Pal Singh Bhasin, Elizabeth Ramsdell, Albert Alva, Rajiv Sreedhar, Medha Bhadkamkar 2018 Southern Methodist University, Dallas, Texas

SMU Data Science Review

Data center security traditionally is implemented at the external network access points, i.e., the perimeter of the data center network, and focuses on preventing malicious software from entering the data center. However, these defenses do not cover all possible entry points for malicious software, and they are not 100% effective at preventing infiltration through the connection points. Therefore, security is required within the data center to detect malicious software activity including its lateral movement within the data center. In this paper, we present a machine learning-based network traffic analysis approach to detect the lateral movement of malicious software within ...


Predicting Game Day Outcomes In National Football League Games, Josh Klein, Anna Frowein, Chris Irwin 2018 Southern Methodist University

SMU Data Science Review

In this paper, we present a model for predicting the game day outcomes of National Football League games. Three of the most popular sources for game day predictions are analyzed for comparison. Player data and outcomes from previous games are used, but we also incorporate several weather factors into our models. Over 1,700 games were incorporated, and three separate models were created using simple regression, principal component analysis, and a recursive model. We also discuss the ethicality of using data science techniques by individuals with the knowledge in order to gain an advantage over a population lacking this specialized ...


A Bayesian Latent Variable Model Of User Preferences With Item Context, Aghiles Salah, Hady Wirawan Lauw 2018 Singapore Management University

Research Collection School Of Information Systems

Personalized recommendation has proven to be very promising in modeling the preference of users over items. However, most existing work in this context focuses primarily on modeling user-item interactions, which tend to be very sparse. We propose to further leverage the item-item relationships that may reflect various aspects of items that guide users’ choices. Intuitively, items that occur within the same “context” (e.g., browsed in the same session, purchased in the same basket) are likely related in some latent aspect. Therefore, accounting for the item’s context would complement the sparse user-item interactions by extending a user’s preference ...


Examining Multimorbidities Using Association Rule Learning, Kaylee Dudley 2018 Brigham Young University

Undergraduate Honors Theses

All insurance companies, regardless of the kind of insurance they offer, do their best to predict the future by comparing current to historical information. Any statistically significant correlation, regardless of expectations and hidden factors, can help to actuarially model future behavior. Using deidentified data from over 6 million health insurance policies over one year, we looked for any significant groupings of medical issues. The medical issues are defined based on the commercial “Episode Treatment Groups” (ETGs) classification, and our claims contain 347 different ETGs. We performed different kinds of analysis, including Bayesian posterior cluster analysis, k-means cluster analysis, and association ...


Under The Influence, Leonardo Cavicchio 2018 Bryant University

Honors Projects in Mathematics

The purpose of this Honors Capstone entitled Under the Influence is to assess the validity of claims concerning the possible influence of roommates on one another, concerning alcohol on college campuses. This will be done by examining data collected in a prior study conducted over a two-year period. This analysis will focus on how alcohol consumption changes in correlation with the personality factors of roommates over an extended period of time. This secondary analysis of de-identified data will focus on primary and secondary subquestions. The primary question that will be addressed with the data set collected from the University of ...

