Open Access. Powered by Scholars. Published by Universities.®

Categorical Data Analysis Commons

254 Full-Text Articles 337 Authors 67,053 Downloads 54 Institutions

All Articles in Categorical Data Analysis

254 full-text articles. Page 1 of 9.

Under The Influence, Leonardo Cavicchio 2018 Bryant University

Honors Projects in Mathematics

The purpose of this Honors Capstone, entitled Under the Influence, is to assess the validity of claims that roommates influence one another's alcohol use on college campuses. This will be done by examining data collected in a prior study conducted over a two-year period. This analysis will focus on how alcohol consumption changes in correlation with the personality factors of roommates over an extended period of time. This secondary analysis of de-identified data will focus on primary and secondary subquestions. The primary question that will be addressed with the data set collected from the University of ...


Understanding Natural Keyboard Typing Using Convolutional Neural Networks On Mobile Sensor Data, Travis Siems 2018 Southern Methodist University

Computer Science and Engineering Theses and Dissertations

Mobile phones and other devices with embedded sensors are becoming increasingly ubiquitous. Audio and motion sensor data may be able to detect information that we did not think possible. Some researchers have created models that can predict computer keyboard typing from a nearby mobile device; however, certain limitations to their experiment setup and methods compelled us to be skeptical of the models’ realistic prediction capability. We investigate the possibility of understanding natural keyboard typing from mobile phones by performing a well-designed data collection experiment that encourages natural typing and interactions. This data collection helps capture realistic vulnerabilities of the security ...


Default Priors For The Intercept Parameter In Logistic Regressions, Philip S. Boonstra, Ryan P. Barbaro, Ananda Sen 2018 The University Of Michigan

The University of Michigan Department of Biostatistics Working Paper Series

In logistic regression, separation refers to the situation in which a linear combination of predictors perfectly discriminates the binary outcome. Because finite-valued maximum likelihood parameter estimates do not exist under separation, Bayesian regressions with informative shrinkage of the regression coefficients offer a suitable alternative. Little attention has been given to whether and how to shrink the intercept parameter. Based upon classical studies of separation, we argue that efficiency in estimating regression coefficients may vary with the intercept prior. We adapt alternative prior distributions for the intercept that downweight implausibly extreme regions of the parameter space, rendering less sensitivity to separation ...
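The definition of separation above can be made concrete with a small check. The sketch below is illustrative only and is not taken from the working paper: it assumes a single numeric predictor, and the function name `is_completely_separated` is hypothetical. With one predictor, complete separation means that some threshold on the predictor splits the two outcome classes perfectly.

```python
def is_completely_separated(x, y):
    """Return True if a single threshold on x perfectly splits y == 0 from y == 1.

    Illustrative helper (hypothetical name); handles one predictor only.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        raise ValueError("both outcome classes must be present")
    # Separated if every class-0 value lies strictly below every class-1 value,
    # or strictly above every class-1 value.
    return max(x0) < min(x1) or max(x1) < min(x0)


separated = is_completely_separated([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])
overlapping = is_completely_separated([1.0, 2.0, 3.0, 4.0], [0, 1, 0, 1])
```

In the first call the likelihood keeps increasing as the slope grows, which is why no finite maximum likelihood estimate exists; in the second the classes overlap and the usual estimate is finite.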


Building A Better Risk Prevention Model, Steven Hornyak 2018 Houston County Schools

National Youth-At-Risk Conference Savannah

This presentation chronicles the work of Houston County Schools in developing a risk prevention model built on more than ten years of longitudinal student data. In its second year of implementation, Houston At-Risk Profiles (HARP) has proven effective in identifying those students most in need of support and linking them to interventions and supports that lead to improved outcomes and significantly reduce the risk of failure.


Comparing Various Machine Learning Statistical Methods Using Variable Differentials To Predict College Basketball, Nicholas Bennett 2018 The University of Akron

Honors Research Projects

The purpose of this Senior Honors Project is to research, study, and demonstrate newfound knowledge of various machine learning statistical techniques that are not covered in the University of Akron’s statistics major curriculum. This report will be an overview of three machine-learning methods that were used to predict NCAA Basketball results, specifically, the March Madness tournament. The variables used for these methods, models, and tests will include numerous variables kept throughout the season for each team, along with a couple variables that are used by the selection committee when tournament teams are being picked. The end goal is to ...


Campus Climate Sexual Assault Survey (2015) Analysis, Felicia Rosin 2018 The University of Akron

Honors Research Projects

The issue of sexual assault has garnered widespread attention in recent years, as is evident from the growing number of high-profile cases and mainstream social movements. With this increasingly bright spotlight, it is no surprise that The University of Akron has interest in improving the sexual violence education programs offered to students. In 2015, the university conducted a survey to gather information on the campus climate surrounding sexual assault. This project takes a deeper look at the data gathered in an attempt to pinpoint areas that require the university’s attention. The analysis covers topics identified by Dean of ...


Exploring Quantitative Timed Up And Go Sensor Data With Statistical Learning Techniques, Anthony Wright 2018 University of Windsor

Major Papers

Injuries and hospitalizations due to accidental falls among seniors represent a major expense for the Canadian public health system. It is highly desirable to be able to predict risk of falls for senior individuals in order to place them in prevention programs. Recently, sensor technologies have been used to predict risk of falls and levels of frailty of individuals. A commonly used test for assessing risk of falls is known as QTUG (Quantitative 'Timed Up and Go'). The QTUG data often consist of a small set of survey answers about the individuals' historic variables (e.g., number of falls in ...


Using Data Analytics For Discovering Library Resource Insights – Case From Singapore Management University, Ning LU, Rui SONG, Dina HENG, Swapna GOTTIPATI, Chee Hsien Aaron (ZHENG Zhixian) TAY, Aaron TAY 2017 Singapore Management University

Research Collection School Of Information Systems

Library resources are critical in supporting teaching, research and learning processes. Several universities have employed online platforms and infrastructure to provide online services to students, faculty and staff. To provide efficient services by understanding and predicting user needs, libraries are looking into the area of data analytics. Library analytics at Singapore Management University is a project committed to providing an interface for data-intensive project collaboration, while supporting one of the library’s key pillars: its commitment to collaborate on initiatives with SMU Communities and external groups. In this paper, we study the transaction logs for user behavior analysis ...


Data-Adaptive Kernel Support Vector Machine, Xin Liu 2017 The University of Western Ontario

Electronic Thesis and Dissertation Repository

In this thesis, we propose the data-adaptive kernel Support Vector Machine (SVM), a new method with a data-driven scaling kernel function based on real data sets. This two-stage approach of kernel function scaling can enhance the accuracy of a support vector machine, especially when the data are imbalanced. Following the standard SVM procedure in the first stage, the proposed method locally adapts the kernel function to data locations based on the skewness of the class outcomes. In the second stage, the decision rule is constructed with the data-adaptive kernel function and is used as the classifier. This process enlarges ...


Data Envelopment Analysis Using Glpkapi In R, Konrad Miziolek, Jordan Beary, Shreyas Vasanth, Surekha Chanamolu, Rudraxi Mitra 2017 Portland State University

Engineering and Technology Management Student Projects

The work done here is primarily a wrapper function written to separate some of the more difficult-to-use glpkAPI functionality from the end-user. The user, when prompted, selects the appropriate configuration of the .mod file to the task (for example, output-oriented CRS), and the data file, as a .dat. The function then loads the required glpkAPI library, and carries forward the model. It allocates the problem and workspace, reads the model file and data file the user selects, builds the problem, and solves it. The function returns primal values, and, if dual = TRUE is selected, also returns dual weights.


How Singapore Investors Can Profit From Unstructured Data, Clarence GOH 2017 Singapore Management University

Research Collection School Of Accountancy

Data that is collected in the business environment can be structured or unstructured. In general, structured data refers to information which is highly organised and which can easily be stored in rows and columns within database systems. On the other hand, unstructured data does not have a strict data structure, and is also not organised in a pre-defined manner.


Improving The Accuracy For The Long-Term Hydrologic Impact Assessment (L-Thia) Model, Anqi Zhang, Lawrence Theller, Bernard A. Engel 2017 Purdue University

The Summer Undergraduate Research Fellowship (SURF) Symposium

Urbanization increases runoff by changing land use types from less impervious to impervious covers. Improving the accuracy of a runoff assessment model, the Long-Term Hydrologic Impact Assessment (L-THIA) Model, can help us to better evaluate the potential uses of Low Impact Development (LID) practices aimed at reducing runoff, as well as to identify appropriate runoff and water quality mitigation methods. Several versions of the model have been built over time, and inconsistencies have been introduced between the models. To improve the accuracy and consistency of the model, the equations and parameters (primarily curve numbers in the case of this model ...


Integrating Apache Spark And R For Big Data Analytics On Solving Geographic Problems, Mengqi ZHANG, Tin Seong KAM 2017 Singapore Management University

Research Collection School Of Information Systems

With the advent of digital technology and smart devices, a flood of digital data is being generated every day. This huge amount of data not only records historical activities but also provides valuable future information for organizations and businesses. However, the true value of these data will not be fully appreciated until they have been processed and analyzed, and the analysis results have been communicated to decision makers in a business-friendly manner. In view of this need, big data has been one of the major research focuses in the academic research community especially in the field of computer science and the software vendor as well as the big data ...


Burden Of Atopic Dermatitis In The United States: Analysis Of Healthcare Claims Data In The Commercial, Medicare, And Medi-Cal Databases, Sulena Shrestha, Raymond Miao, Li Wang, Jingdong Chao, Huseyin Yuce, Wenhui Wei 2017 STATinMED Research/SIMR, Inc.

Publications and Research

Comparative data on the burden of atopic dermatitis (AD) in adults relative to the general population are limited. We performed a large-scale evaluation of the burden of disease among US adults with AD relative to matched non-AD controls, encompassing comorbidities, healthcare resource utilization (HCRU), and costs, using healthcare claims data. The impact of AD disease severity on these outcomes was also evaluated.


Mining Of Primary Healthcare Patient Data With Selective Multimorbid Diseases, Annette Megerdichian Azad 2017 The University of Western Ontario

Electronic Thesis and Dissertation Repository

Despite a large volume of research on the prognosis, diagnosis and overall burden of multimorbidity, very little is known about socio-demographic characteristics of multimorbid patients. This thesis aims to analyze the socio-demographic characteristics of patients with multiple chronic conditions (multimorbidity), focusing on patient groups sharing the same combination of diseases. Several methods were explored to analyze the co-occurrence of multiple chronic diseases as well as the associations between socio-demographics and chronic conditions. These methods include disease pair distributions over gender, age groups and income level quintiles, Multimorbidity Coefficients for measuring the concurrence of disease pairs and triples, and k-modes clustering ...


Now You See It, Now You Don't! A Study Of Content Modification Behavior In Facebook, Fuxiang CHEN, Ee-peng LIM 2017 Singapore Management University

Research Collection School Of Information Systems

Social media, as a major platform to disseminate information, has changed the way users and communities contribute content. In this paper, we aim to study content modifications on public Facebook pages operated by news media, community groups, and bloggers. We also study the possible reasons behind them, and their effects on user interaction. We conducted a detailed study of Content Censorship (CC) and Content Edit (CE) in Facebook using a detailed longitudinal dataset consisting of 57 public Facebook pages over 3 weeks covering 145,955 posts and 9,379,200 comments. We detected many CC and CE activities between 28 ...


Statistically Analyzing Assembly Line Processing Times Through Incorporation Of Product Variation, Kyle Rehr, Matthew Farr 2017 Murray State University

Scholars Week

Timing methods and performance metrics are important in the heavily industrialized world we live in. Industrial plants use metrics to measure quality of production, help make decisions, and drive the strategy of the organization. However, there are many factors to be considered when measuring performance based on a metric; of these, we will be analyzing the importance of product variation. We will be analyzing assembly line timings, whilst controlling for product variance, to show how much differences between products matter to one’s ability to predict performance. In addition, we will be analyzing the current “statistical” methods used by an ...


Efficient Motif Discovery In Spatial Trajectories Using Discrete Fréchet Distance, Bo TANG, Man Lung YIU, Kyriakos MOURATIDIS, Kai WANG 2017 Singapore Management University

Research Collection School Of Information Systems

The discrete Fréchet distance (DFD) captures perceptual and geographical similarity between discrete trajectories. It has been successfully adopted in a multitude of applications, such as signature and handwriting recognition, computer graphics, as well as geographic applications. Spatial applications, e.g., sports analysis, traffic analysis, etc. require discovering the pair of most similar subtrajectories, be they parts of the same or of different input trajectories. The identified pair of subtrajectories is called a motif. The adoption of DFD as the similarity measure in motif discovery, although semantically ideal, is hindered by the high computational complexity of DFD calculation. In this paper, we propose a suite of novel lower ...
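To illustrate why DFD calculation is costly, here is a minimal sketch of the standard dynamic-programming computation of the discrete Fréchet distance between two trajectories. It is the textbook quadratic-time recurrence, not the lower-bound techniques the paper proposes; the function name `discrete_frechet` and the use of Euclidean distance between points are assumptions made for this example.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def discrete_frechet(p, q):
    """Discrete Fréchet distance between point sequences p and q.

    Textbook dynamic program; runs in O(len(p) * len(q)) time and space.
    """
    n, m = len(p), len(q)
    if n == 0 or m == 0:
        raise ValueError("trajectories must be non-empty")
    # ca[i][j] holds the DFD between the prefixes p[:i+1] and q[:j+1].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Best of: advance along p, advance along both, advance along q.
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


# Two parallel horizontal trajectories one unit apart.
upper = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
lower = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
gap = discrete_frechet(lower, upper)
```

Because a single pairwise computation already fills a full table, evaluating it for every candidate pair of subtrajectories quickly becomes expensive, which is the bottleneck the abstract refers to.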


Are You Ready? Data Analytics Is Reshaping The Work Of Accountants, Clarence GOH 2017 Singapore Management University

Research Collection School Of Accountancy

According to the 2016 State of Analytics and Data Science report published by data analytics firm Mu Sigma, 65% of senior business leaders surveyed in the United States believe that data analytics has influenced their business in a positive way.

