Open Access. Powered by Scholars. Published by Universities.®

Categorical Data Analysis Commons

257 Full-Text Articles, 347 Authors, 67,053 Downloads, 55 Institutions

All Articles in Categorical Data Analysis

Data Center Application Security: Lateral Movement Detection Of Malware Using Behavioral Models, Harinder Pal Singh Bhasin, Elizabeth Ramsdell, Albert Alva, Rajiv Sreedhar, Medha Bhadkamkar 2018 Southern Methodist University, Dallas, Texas

SMU Data Science Review

Data center security traditionally is implemented at the external network access points, i.e., the perimeter of the data center network, and focuses on preventing malicious software from entering the data center. However, these defenses do not cover all possible entry points for malicious software, and they are not 100% effective at preventing infiltration through the connection points. Therefore, security is required within the data center to detect malicious software activity including its lateral movement within the data center. In this paper, we present a machine learning-based network traffic analysis approach to detect the lateral movement of malicious software within ...


Predicting Game Day Outcomes In National Football League Games, Josh Klein, Anna Frowein, Chris Irwin 2018 Southern Methodist University

SMU Data Science Review

In this paper, we present a model for predicting the game day outcomes of National Football League games. Three of the most popular sources of game day predictions are analyzed for comparison. Player data and outcomes from previous games are used, and we also incorporate several weather factors into our models. More than 1,700 games are incorporated, and three separate models are created using simple regression, principal component analysis, and a recursive model. We also discuss the ethics of individuals with data science knowledge using these techniques to gain an advantage over a population lacking this specialized ...


Examining Multimorbidities Using Association Rule Learning, Kaylee Dudley 2018 Brigham Young University

Undergraduate Honors Theses

All insurance companies, regardless of the kind of insurance they offer, do their best to predict the future by comparing current to historical information. Any statistically significant correlation, regardless of expectations and hidden factors, can help to actuarially model future behavior. Using deidentified data from over 6 million health insurance policies over one year, we looked for any significant groupings of medical issues. The medical issues are defined based on the commercial “Episode Treatment Groups” (ETGs) classification, and our claims contain 347 different ETGs. We performed different kinds of analysis, including Bayesian posterior cluster analysis, k-means cluster analysis, and association ...
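The support, confidence and lift measures behind association rule learning can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's analysis: the condition names below are hypothetical stand-ins for ETG codes, the data are made up, only single-item rules (A → B) are enumerated, and the thresholds `min_support` and `min_confidence` are arbitrary choices.

```python
from itertools import combinations

# Hypothetical toy data: each set holds the condition groups observed for one
# policy. These labels are illustrative stand-ins, not real ETG codes.
policies = [
    {"asthma", "allergy"},
    {"asthma", "allergy", "eczema"},
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension", "obesity"},
    {"asthma", "eczema"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def pair_rules(transactions, min_support=0.2, min_confidence=0.6):
    """Enumerate single-item rules lhs -> rhs as (lhs, rhs, support, confidence, lift)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s_ab = support({a, b}, transactions)
        if s_ab < min_support:
            continue  # the pair co-occurs too rarely to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            conf = s_ab / support({lhs}, transactions)
            if conf >= min_confidence:
                # lift > 1 means the pair co-occurs more often than independence predicts
                lift = conf / support({rhs}, transactions)
                rules.append((lhs, rhs, s_ab, conf, lift))
    return rules

for lhs, rhs, s, c, lift in pair_rules(policies):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}, lift={lift:.2f}")
```

On real claims data with hundreds of condition groups, a dedicated frequent-itemset algorithm such as Apriori would replace the brute-force pair loop.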


Text Analytics Approach To Extract Course Improvement Suggestions From Students’ Feedback, Swapna Gottipati, Venky Shankararaman, Jeff Rongsheng Lin 2018 Singapore Management University

Research Collection School Of Information Systems

In academic institutions, it is normal practice that at the end of each term, students are required to complete a questionnaire that is designed to gather students’ perceptions of the instructor and their learning experience in the course. Students’ feedback includes numerical answers to Likert scale questions and textual comments to open-ended questions. Within the textual comments given by the students are embedded suggestions. A suggestion can be explicit or implicit. Any suggestion provides useful pointers on how the instructor can further enhance the student learning experience. However, it is tedious to manually go through all the qualitative comments and ...


Understanding Natural Keyboard Typing Using Convolutional Neural Networks On Mobile Sensor Data, Travis Siems 2018 Southern Methodist University

Computer Science and Engineering Theses and Dissertations

Mobile phones and other devices with embedded sensors are becoming increasingly ubiquitous. Audio and motion sensor data may be able to detect information that we did not think possible. Some researchers have created models that can predict computer keyboard typing from a nearby mobile device; however, certain limitations to their experiment setup and methods compelled us to be skeptical of the models’ realistic prediction capability. We investigate the possibility of understanding natural keyboard typing from mobile phones by performing a well-designed data collection experiment that encourages natural typing and interactions. This data collection helps capture realistic vulnerabilities of the security ...


A Convolutional Neural Network Model For Species Classification Of Camera Trap Images, Annie Casey 2018 Boise State University

Mathematics Undergraduate Theses

The overall purpose of this study was to use machine learning to automate the manual process of tagging the species found in camera trap images. The basic design of this study was to implement a Convolutional Neural Network model in Python, using the Keras and TensorFlow modules, that learns to recognize patterns in images in order to classify which species is in a given image and label it accordingly. Results of the analysis highlight the importance of a large sample size, the degree of accuracy according to various arguments in the model, the effectiveness of multiple layers that include Max Pooling, and ...


Under The Influence, Leonardo Cavicchio 2018 Bryant University

Honors Projects in Mathematics

The purpose of this Honors Capstone, entitled Under the Influence, is to assess the validity of claims concerning the possible influence of college roommates on one another's alcohol consumption. This will be done by examining data collected in a prior study conducted over a two-year period. This analysis will focus on how alcohol consumption changes in correlation with the personality factors of roommates over an extended period of time. This secondary analysis of de-identified data will focus on primary and secondary subquestions. The primary question that will be addressed with the data set collected from the University of ...


Default Priors For The Intercept Parameter In Logistic Regressions, Philip S. Boonstra, Ryan P. Barbaro, Ananda Sen 2018 The University Of Michigan

The University of Michigan Department of Biostatistics Working Paper Series

In logistic regression, separation refers to the situation in which a linear combination of predictors perfectly discriminates the binary outcome. Because finite-valued maximum likelihood parameter estimates do not exist under separation, Bayesian regressions with informative shrinkage of the regression coefficients offer a suitable alternative. Little attention has been given to whether and how to shrink the intercept parameter. Based upon classical studies of separation, we argue that efficiency in estimating regression coefficients may vary with the intercept prior. We adapt alternative prior distributions for the intercept that downweight implausibly extreme regions of the parameter space, making estimation less sensitive to separation ...
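Under complete separation the log-likelihood keeps rising as the slope grows, so no finite maximizer exists. The sketch below checks this numerically on a hypothetical, perfectly separated toy data set, then adds a normal log-prior on the slope to show how shrinkage restores a finite optimum. The prior (mean 0, `prior_sd=2.5`) is an arbitrary illustrative choice on the slope, not one of the intercept priors the paper proposes.

```python
import math

# Hypothetical toy data with complete separation: y = 0 whenever x < 0 and
# y = 1 whenever x > 0, so the predictor x discriminates y perfectly.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def log_likelihood(beta0, beta1):
    """Bernoulli log-likelihood for logit P(y = 1) = beta0 + beta1 * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = beta0 + beta1 * x
        # log P(y=1) = log_sigmoid(eta); log P(y=0) = log_sigmoid(-eta)
        total += log_sigmoid(eta) if y == 1 else log_sigmoid(-eta)
    return total

def log_posterior(beta0, beta1, prior_sd=2.5):
    """Log-likelihood plus a normal(0, prior_sd) log-prior on the slope, up to a constant."""
    return log_likelihood(beta0, beta1) - beta1 ** 2 / (2 * prior_sd ** 2)

for slope in (1.0, 5.0, 25.0, 100.0):
    print(f"slope={slope:6.1f}  loglik={log_likelihood(0.0, slope):.6g}  "
          f"logpost={log_posterior(0.0, slope):.6g}")
```

The log-likelihood approaches 0 from below without ever reaching it, while the penalized objective eventually falls as the slope grows, which is why shrinkage priors yield finite estimates.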


Building A Better Risk Prevention Model, Steven Hornyak 2018 Houston County Schools

National Youth-At-Risk Conference Savannah

This presentation chronicles the work of Houston County Schools in developing a risk prevention model built on more than ten years of longitudinal student data. In its second year of implementation, the Houston At-Risk Profiles (HARP) model has proven effective in identifying the students most in need of support and linking them to interventions and supports that lead to improved outcomes and significantly reduce the risk of failure.


Comparing Various Machine Learning Statistical Methods Using Variable Differentials To Predict College Basketball, Nicholas Bennett 2018 The University of Akron

Honors Research Projects

The purpose of this Senior Honors Project is to research, study, and demonstrate newfound knowledge of various machine learning statistical techniques that are not covered in the University of Akron’s statistics major curriculum. This report will be an overview of three machine-learning methods that were used to predict NCAA Basketball results, specifically the March Madness tournament. The inputs to these methods, models, and tests will include numerous variables kept throughout the season for each team, along with a couple of variables that the selection committee uses when picking tournament teams. The end goal is to ...


Campus Climate Sexual Assault Survey (2015) Analysis, Felicia Rosin 2018 The University of Akron

Honors Research Projects

The issue of sexual assault has garnered widespread attention in recent years, as is evident from the growing number of high-profile cases and mainstream social movements. With this increasingly bright spotlight, it is no surprise that The University of Akron has an interest in improving the sexual violence education programs offered to students. In 2015, the university conducted a survey to gather information on the campus climate surrounding sexual assault. This project takes a deeper look at the data gathered in an attempt to pinpoint areas that require the university’s attention. The analysis covers topics identified by Dean of ...


Exploring Quantitative Timed Up And Go Sensor Data With Statistical Learning Techniques, Anthony Wright 2018 University of Windsor

Major Papers

Injuries and hospitalizations due to accidental falls among seniors represent a major expense for the Canadian public health system. It is highly desirable to be able to predict risk of falls for senior individuals in order to place them in prevention programs. Recently, sensor technologies have been used to predict risk of falls and levels of frailty of individuals. A commonly used test for assessing risk of falls is known as QTUG (Quantitative Timed Up and Go). The QTUG data often consist of a small set of survey answers about the individuals' historic variables (e.g., number of falls in ...


Penalized Mixed-Effects Ordinal Response Models For High-Dimensional Genomic Data In Twins And Families, Amanda E. Gentry 2018 Virginia Commonwealth University

Theses and Dissertations

The Brisbane Longitudinal Twin Study (BLTS) was being conducted in Australia and was funded by the US National Institute on Drug Abuse (NIDA). Adolescent twins were sampled as part of this study and surveyed about their substance use as part of the Pathways to Cannabis Use, Abuse and Dependence project. The methods developed in this dissertation were designed for the purpose of analyzing a subset of the Pathways data that includes demographics, cannabis use metrics, personality measures, and imputed genotypes (SNPs) for 493 complete twin pairs (986 subjects). The primary goal was to determine what combination of SNPs and ...


Using Data Analytics For Discovering Library Resource Insights – Case From Singapore Management University, Ning Lu, Rui Song, Dina Heng, Swapna Gottipati, Chee Hsien Aaron (Zheng Zhixian) Tay, Aaron Tay 2017 Singapore Management University

Research Collection School Of Information Systems

Library resources are critical in supporting teaching, research and learning processes. Several universities have employed online platforms and infrastructure to enable online services for students, faculty and staff. To provide efficient services by understanding and predicting user needs, libraries are looking into the area of data analytics. Library analytics at Singapore Management University is a project committed to providing an interface for data-intensive project collaboration, while supporting one of the library’s key pillars: its commitment to collaborate on initiatives with SMU communities and external groups. In this paper, we study the transaction logs for user behavior analysis ...


Data-Adaptive Kernel Support Vector Machine, Xin Liu 2017 The University of Western Ontario

Electronic Thesis and Dissertation Repository

In this thesis, we propose the data-adaptive kernel Support Vector Machine (SVM), a new method with a data-driven scaling kernel function based on real data sets. This two-stage approach to kernel function scaling can enhance the accuracy of a support vector machine, especially when the data are imbalanced. After the standard SVM procedure is applied in the first stage, the proposed method locally adapts the kernel function to data locations based on the skewness of the class outcomes. In the second stage, the decision rule is constructed with the data-adaptive kernel function and is used as the classifier. This process enlarges ...
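The general idea of rescaling a kernel around a first-stage SVM solution can be sketched with a conformal transformation, K̃(x, x′) = D(x) D(x′) K(x, x′), which keeps the kernel positive semidefinite for any real-valued D. This is a generic illustration under stated assumptions, not the thesis's skewness-based rule: the Gaussian-bump factor `conformal_factor`, its width `tau`, the RBF parameter `gamma`, and the support vectors below are all hypothetical.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def conformal_factor(x, support_vectors, tau=1.0):
    """Hypothetical scaling D(x): large near first-stage support vectors, small far away."""
    return sum(
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, s)) / (2 * tau ** 2))
        for s in support_vectors
    )

def adapted_kernel(x, y, support_vectors, gamma=1.0, tau=1.0):
    """Conformally scaled kernel D(x) * D(y) * K(x, y)."""
    return (conformal_factor(x, support_vectors, tau)
            * conformal_factor(y, support_vectors, tau)
            * rbf(x, y, gamma))

# Stage 1 (assumed done elsewhere): a standard SVM returned these support vectors.
svs = [(0.0, 0.0), (1.0, 1.0)]
# Stage 2: the adapted kernel would be handed back to an SVM solver for refitting.
print(adapted_kernel((0.2, 0.1), (0.9, 1.1), svs))
```

Scaling up the kernel near the first-stage boundary magnifies distances in that region, which is the intuition behind this family of two-stage methods.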


Data Envelopment Analysis Using Glpkapi In R, Konrad Miziolek, Jordan Beary, Shreyas Vasanth, Surekha Chanamolu, Rudraxi Mitra 2017 Portland State University

Engineering and Technology Management Student Projects

The work done here is primarily a wrapper function written to shield the end user from some of the more difficult-to-use glpkAPI functionality. The user, when prompted, selects the .mod file with the configuration appropriate to the task (for example, output-oriented CRS) and the data file, as a .dat. The function then loads the required glpkAPI library and runs the model: it allocates the problem and workspace, reads the model file and data file the user selects, builds the problem, and solves it. The function returns primal values and, if dual = TRUE is selected, also returns dual weights.
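With one input and one output, the constant-returns-to-scale (CRS, or CCR) DEA linear program has a closed form: a unit's input-oriented efficiency is its output/input ratio divided by the best ratio observed, and under CRS the output-oriented score is the reciprocal. The Python sketch below uses that shortcut with hypothetical decision-making units. It does not call GLPK or reproduce the R wrapper, and cases with several inputs or outputs still need an LP solver such as the one glpkAPI wraps.

```python
# Hypothetical decision-making units (DMUs): (name, input, output).
dmus = [("A", 2.0, 4.0), ("B", 4.0, 6.0), ("C", 5.0, 10.0), ("D", 3.0, 3.0)]

def crs_efficiency(units):
    """CCR (constant returns to scale) efficiency for one input and one output.

    In this special case the DEA linear program reduces to comparing each
    unit's output/input ratio with the best observed ratio.
    Returns {name: (input_oriented_theta, output_oriented_phi)}.
    """
    best = max(out / inp for _, inp, out in units)
    result = {}
    for name, inp, out in units:
        theta = (out / inp) / best  # 1.0 means the unit lies on the CRS frontier
        result[name] = (theta, 1.0 / theta)  # under CRS, phi is the reciprocal of theta
    return result

for name, (theta, phi) in crs_efficiency(dmus).items():
    print(f"{name}: theta={theta:.3f}, phi={phi:.3f}")
```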


How Singapore Investors Can Profit From Unstructured Data, Clarence Goh 2017 Singapore Management University

Research Collection School Of Accountancy

Data that is collected in the business environment can be structured or unstructured. In general, structured data refers to information which is highly organised and which can easily be stored in rows and columns within database systems. On the other hand, unstructured data does not have a strict data structure, and is also not organised in a pre-defined manner.


Improving The Accuracy For The Long-Term Hydrologic Impact Assessment (L-Thia) Model, Anqi Zhang, Lawrence Theller, Bernard A. Engel 2017 Purdue University

The Summer Undergraduate Research Fellowship (SURF) Symposium

Urbanization increases runoff by changing land use types from less impervious to impervious covers. Improving the accuracy of a runoff assessment model, the Long-Term Hydrologic Impact Assessment (L-THIA) Model, can help us to better evaluate the potential uses of Low Impact Development (LID) practices aimed at reducing runoff, as well as to identify appropriate runoff and water quality mitigation methods. Several versions of the model have been built over time, and inconsistencies have been introduced between the models. To improve the accuracy and consistency of the model, the equations and parameters (primarily curve numbers in the case of this model ...
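Curve numbers are the key input to the standard SCS (NRCS) Curve Number runoff equation, Q = (P − Ia)² / (P − Ia + S), with S = 1000/CN − 10 and Ia = 0.2S in inches. Long-term runoff models of this kind apply that equation to daily rainfall. A minimal Python sketch follows. It is not the L-THIA code, and the storm depth and curve numbers in the usage loop are hypothetical.

```python
def scs_runoff(p_in, cn, ia_ratio=0.2):
    """Direct runoff depth Q (inches) for a rainfall depth P (inches).

    Uses the SCS Curve Number method: S = 1000 / CN - 10 is the potential
    maximum retention and Ia = ia_ratio * S is the initial abstraction
    (0.2 is the conventional ratio).
    """
    if not 0 < cn <= 100:
        raise ValueError("curve number must be in (0, 100]")
    s = 1000.0 / cn - 10.0
    ia = ia_ratio * s
    if p_in <= ia:
        return 0.0  # all rainfall is abstracted before runoff begins
    return (p_in - ia) ** 2 / (p_in - ia + s)

# Hypothetical 2-inch storm: more impervious cover (higher CN) yields more runoff.
for cn in (70, 85, 98):
    print(cn, round(scs_runoff(2.0, cn), 3))
```

Because runoff depends strongly on CN, inconsistent curve number tables between model versions translate directly into inconsistent runoff estimates.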


Integrating Apache Spark And R For Big Data Analytics On Solving Geographic Problems, Mengqi Zhang, Tin Seong Kam 2017 Singapore Management University

Research Collection School Of Information Systems

With the advent of digital technology and smart devices, a flood of digital data is being generated every day. This huge amount of data not only records historical activities but also provides valuable information about the future for organizations and businesses. However, the true value of these data will not be fully appreciated until they have been processed and analyzed, and the analysis results have been communicated to decision makers in a business-friendly manner. In view of this need, big data has been one of the major research foci in the academic research community, especially in the field of computer science, and among software vendors as well as the big data ...


Burden Of Atopic Dermatitis In The United States: Analysis Of Healthcare Claims Data In The Commercial, Medicare, And Medi-Cal Databases, Sulena Shrestha, Raymond Miao, Li Wang, Jingdong Chao, Huseyin Yuce, Wenhui Wei 2017 STATinMED Research/SIMR, Inc.

Publications and Research

Comparative data on the burden of atopic dermatitis (AD) in adults relative to the general population are limited. We performed a large-scale evaluation of the burden of disease among US adults with AD relative to matched non-AD controls, encompassing comorbidities, healthcare resource utilization (HCRU), and costs, using healthcare claims data. The impact of AD disease severity on these outcomes was also evaluated.


Digital Commons powered by bepress