Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 8 of 8

Full-Text Articles in Physical Sciences and Mathematics

Principal Component Analysis For Predicting The Party Of The Legislators, Afsana Mimi Dec 2020

Publications and Research

In Spring 2020, I did a project, "Decision Tree Predicting the Party of Legislators," and constructed a decision tree model to predict legislators' parties based on their votes. We also used this model to identify legislators who frequently voted against their parties. We used the legislators' roll call votes from the Office of the Clerk, U.S. House of Representatives data sets (categorical values), collected in 2018 and 2019. In this new project, we study the 2018 and 2019 vote data using Principal Component Analysis (PCA). The goal is to find a (compressed) model using unsupervised learning to distinguish the legislators' parties, and PCA …
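
The PCA step this abstract describes can be illustrated with a minimal sketch. It is not the author's code: the +1/-1 encoding of votes, the toy data, and the names `first_principal_component` and `toy_votes` are all assumptions made for illustration. The sketch finds the leading principal component by power iteration and uses the sign of each legislator's score as an unsupervised grouping.

```python
def first_principal_component(rows, iterations=100):
    """Return (component, scores) for the leading principal component."""
    n, d = len(rows), len(rows[0])
    # Center each column (one column per roll call vote).
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]

    def project(vector):
        # Score of every legislator along the given direction.
        return [sum(x[j] * vector[j] for j in range(d)) for x in centered]

    # Power iteration on X^T X, computed as X^T (X v) to avoid forming the matrix.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        scores = project(v)
        w = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(d)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v, project(v)


def toy_votes():
    """Hypothetical data: two parties of 8 legislators, 8 votes each.

    Votes are encoded +1 (yea) / -1 (nay), an assumed encoding.
    Each legislator breaks with the party line on exactly one vote.
    """
    party_line = [1, 1, -1, -1, 1, -1, 1, -1]
    rows = []
    for sign in (1, -1):  # the second party votes the opposite way
        for i in range(len(party_line)):
            row = [sign * vote for vote in party_line]
            row[i] = -row[i]  # a single defection
            rows.append(row)
    return rows


votes = toy_votes()
component, scores = first_principal_component(votes)
# The sign of the first-component score splits the toy legislators into two groups.
predicted_group = [score > 0 for score in scores]
```

On this toy data the first component alone separates the two groups; real roll call data would likely need more components and a library implementation.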


Open Data, Collaborative Working Platforms, And Interdisciplinary Collaboration: Building An Early Career Scientist Community Of Practice To Leverage Ocean Observatories Initiative Data To Address Critical Questions In Marine Science, Robert M. Levine, Kristen E. Fogaren, Johna E. Rudzin, Christopher J. Russoniello, Dax C. Soule, Justine M. Whitaker Dec 2020

Publications and Research

Ocean observing systems are well-recognized as platforms for long-term monitoring of near-shore and remote locations in the global ocean. High-quality observatory data is freely available and accessible to all members of the global oceanographic community—a democratization of data that is particularly useful for early career scientists (ECS), enabling ECS to conduct research independent of traditional funding models or access to laboratory and field equipment. The concurrent collection of distinct data types with relevance for oceanographic disciplines including physics, chemistry, biology, and geology yields a unique incubator for cutting-edge, timely, interdisciplinary research. These data are both an opportunity and an incentive …


A Data Exploration Of Jeopardy! From 1984 To The Present, Brian S. Hamilton Sep 2020

Dissertations, Theses, and Capstone Projects

The gameshow Jeopardy! has been around in its current iteration—hosted by Alex Trebek—since 1984. During this time, it has accumulated data on clues, contestants, and possible strategies on how to win. Using a crowd-sourced archive called J! Archive, this project seeks to find trends in the topics that the game covers and take a deeper look into the performance of its contestants. It employs topic modeling, a text-analysis method, to organize the hundreds of thousands of archived clues and statistical analysis to rate the performance of contestants by gender. Using web-based visualization tools, the data is shown in an …


Machine Learning Applications For Drug Repurposing, Hansaim Lim Sep 2020

Dissertations, Theses, and Capstone Projects

The cost of bringing a drug to market is astounding and the failure rate is intimidating. Drug discovery has had limited success under the conventional reductionist one-drug-one-gene-one-disease paradigm, where a single disease-associated gene is identified and a molecular binder to the specific target is subsequently designed. Under this simplistic paradigm of drug discovery, a drug molecule is assumed to interact only with the intended on-target. However, small-molecule drugs often interact with multiple targets, and those off-target interactions are not considered under the conventional paradigm. As a result, drug-induced side effects and adverse reactions are often neglected …


Sensor Data Analysis In Smart Buildings, Manuel A. Mane Penton May 2020

Publications and Research

Data analysis and machine learning are poised to advance the current technology infrastructure by addressing technological and economic demands present mainly in developed cities like New York. This research proposes a machine learning (ML) based solution to alleviate one of the main issues that large buildings such as CUNY campuses have: wasted energy. Readings from deployed sensors, such as CO2, humidity, and temperature sensors, can be analyzed to estimate occupancy in a specific room and in the building in general. The outcome of this research established a relationship between the …


Using Data Mining To Identify The Most Influential Factors In Training Results, Xiaoqing Wu, Daanial Ahmad May 2020

Publications and Research

Data science is used as a tool to find hidden facts in data. We want to find out which factors, such as ‘AGE’, ‘TAX’, ‘PUPIL-TEACHER RATIO’, and ‘PER-CAPITA INCOME’, contribute the most to housing prices. To answer this question, we studied the “Boston Houses Prices” dataset. By applying Lasso regression (a data mining technique) to this dataset, we identified the influential factors in the linear model. In conclusion, we found six inputs that contributed the most to house prices, as follows: (i) CRIM-per …
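
How Lasso regression singles out influential inputs, as this abstract describes, can be illustrated with a minimal coordinate-descent sketch. It is not the authors' code and does not use the Boston data: the toy features, the penalty `alpha = 0.1`, and the function names are assumptions. It also assumes centered features and target, so no intercept is fitted. The L1 penalty shrinks every coefficient and sets weak ones exactly to zero, which is what makes the surviving inputs readable as the influential factors.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, sweeps=100):
    """Minimize (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1 over w."""
    n, d = len(X), len(X[0])
    weights = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                others = sum(X[i][k] * weights[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - others)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            weights[j] = soft_threshold(rho, alpha) / scale
    return weights


# Hypothetical toy data: three centered, mutually orthogonal features.
col0 = [1, 1, 1, 1, -1, -1, -1, -1]
col1 = [1, 1, -1, -1, 1, 1, -1, -1]
col2 = [1, -1, 1, -1, 1, -1, 1, -1]
features = [list(row) for row in zip(col0, col1, col2)]
# The target depends strongly on the first two features, barely on the third.
target = [3 * a - 2 * b + 0.05 * c for a, b, c in zip(col0, col1, col2)]

weights = lasso_coordinate_descent(features, target, alpha=0.1)
# The weak third feature is dropped: its coefficient is set to exactly 0.0.
selected = [j for j, w in enumerate(weights) if w != 0.0]
```

A larger `alpha` drops more inputs; in practice the penalty is usually chosen by cross-validation.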


Philosophical Perspectives, Jochen Albrecht Apr 2020

Publications and Research

This entry follows in the footsteps of Anselin’s famous 1989 NCGIA working paper entitled “What is special about spatial?” (a report that is very timely again in an age when non-spatial data scientists are ignorant of the special characteristics of spatial data), where he outlines three unrelated but fundamental characteristics of spatial data. In a similar vein, I am going to discuss some philosophical perspectives that are unrelated to each other and could warrant individual entries in this Body of Knowledge. The first concerns the notions of space and time and how they have evolved in …


Edge Device Speaker Verification, Thomas P. Duffy Jan 2020

Dissertations and Theses

The continued shrinking of processors and other physical hardware, in concert with the development of embeddable machine learning frameworks, has enabled new use cases that place machine learning directly in the “wild”. Speaker verification systems have long been deployed to perform inference on systems with significant computational resources. More recently, these systems have been built for smaller, cheaper devices which can be placed in people's homes or other edge locations. Here, we aim to demonstrate that a reasonably accurate, generalizable, text-independent speaker verification system can be built, trained, and, ultimately, deployed onto a microcontroller with as a …