Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 31 - 56 of 56

Full-Text Articles in Physical Sciences and Mathematics

Addressing The Learning Loss During The Covid-19 Pandemic Through The Adaptation Of Virtual Platforms, Nazrul I. Khandaker, Anika Nawar Mayeesha, Violeta Escandon Correa, Toralv Munro, Andrew Singh, Matthew Khargie, Ality Aghedo, Jasmin Budhan, Krishna Mahabir, Belal A. Sayeed Oct 2021

Publications and Research

The York College-hosted NASA MUREP Aerospace Academy (MAA) has always played a pivotal role in minimizing summer learning loss, a problem that was heightened during the pandemic. Support from AT&T, Con Edison, and NASA enabled the MAA program at York College to offer virtual STEM education with an earth science concentration to more than 1,000 underserved K-12 students from the community last summer, including 160 high school students. Two factors made this endeavor fruitful: additional time to engage in STEM lessons and increased self-motivation to complete assigned tasks successfully. Students built partnerships and resolved technical issues with …


Piecewise Linear Manifold Clustering, Artyom Diky Sep 2021

Dissertations, Theses, and Capstone Projects

This work studies the application of topological analysis to non-linear manifold clustering. A novel method that exploits the clustering structure of the data makes it possible to generate a topological representation of the point dataset. An analysis of the topological construction under different simulated conditions is performed to explore the capabilities and limitations of the method, and it demonstrates statistically significant improvements in performance. Furthermore, we introduce a new information-theoretical validation measure for clustering that exploits geometrical properties of clusters to estimate clustering compressibility, enabling evaluation of clustering goodness-of-fit without any prior information about true class assignments. We show how the new validation measure, when …


Detecting Stance On Covid-19 Vaccine In A Polarized Media, Rodica Ceslov Sep 2021

Dissertations, Theses, and Capstone Projects

The growing polarization in the United States has been widely reported. Media coverage plays an important role in shaping public opinion and influences public debates on complex and unfamiliar topics. Political polarization and conflict between opposing viewpoints can bring some benefits to individuals and society. However, recent research has primarily highlighted the negative consequences of polarization, which has reached an all-time high. One such complex and unfamiliar topic is the Covid-19 vaccine, which was developed in record time; the public learned about its safety and possible risks through media coverage.

In this capstone, we examine U.S. news media coverage on the …


Making Space For Unquantifiable Data: Hand-Drawn Data Visualization, Eva Sibinga Sep 2021

Dissertations, Theses, and Capstone Projects

This project makes space for personal “data” around labor and care, prompting users to consider the concrete and abstract (quantifiable and unquantifiable) forms that labor and care take in their lives. The interactive, subjective data visualization uses hand-drawn visual elements to emphasize that data about care and human interaction will always be ambiguous and complex, may never be satisfactorily or universally quantified, and will always be out of reach of perfect categorization.

The project provides an alternative to prescriptive truth-telling with data. Instead of using a dataset to provide data-driven answers and insights to users, the interactive …


Teaching Machine Learning For The Physical Sciences: A Summary Of Lessons Learned And Challenges, Viviana Acquaviva Aug 2021

Publications and Research

This paper summarizes some challenges encountered and best practices established in several years of teaching Machine Learning for the Physical Sciences at the undergraduate and graduate levels. I discuss motivations for teaching ML to physicists and desirable properties of pedagogical materials, such as accessibility, relevance, and likeness to real-world research problems, and I give examples of components of teaching units.


Content Analysis Of Two-Year And Four-Year Data Science Programs In The United States, Elizabeth Milonas, Duo Li, Qiping Zhang Jul 2021

Publications and Research

Data has grown exponentially in the last decade, and this growth has created vast challenges for both business and IT domains (Hassan & Liu, 2019). It has also given rise to the Data Science field, which has itself grown exponentially in the last few years (Hassan & Liu, 2019; Song & Zhu, 2016). Data Science has its origins in the statistics and mathematics domains (Cao, 2017b) but is now considered a multidisciplinary field (Aasheim et al., 2015). Data Science requires knowledge of data analytics, programming, systems, applications, informatics, computing, communication, management, and sociology (Aasheim et al., 2015; …


Using Data Science To Create An Impact On A City Life And To Encourage Students From Underserved Communities To Get Into Stem, Elena Filatova, Deborah Hecht Jul 2021

Publications and Research

In this paper, we introduce a novel methodology for teaching Data Science. Our methodology is built around the profile of the student body at our college, an urban commuter school and a Hispanic-Serving Institution (HSI) whose students are 34% Hispanic and 29% Black. Sixty-one percent of our students come from households with an income of less than $30,000. Thus, many of our students come from communities that are underrepresented in STEM fields and in decision-making positions in government (at the city, state, and national levels). However, in our methodology, we want to flip the situation …


Learn Biologically Meaningful Representation With Transfer Learning, Di He Jun 2021

Dissertations, Theses, and Capstone Projects

Machine learning has made significant contributions to bioinformatics and computational biology. In particular, supervised learning approaches have been widely used to solve problems such as biomarker identification and drug response prediction. However, because comprehensively labeled and clean data are of limited availability, constructing predictive models in supervised settings is not always desirable or possible, especially with data-hungry, red-hot learning paradigms such as deep learning. Hence, there is an urgent need to develop new approaches that can leverage more readily available unlabeled data to drive successful machine learning applications in this area.

In my dissertation, …


An Empirical Study Of Refactorings And Technical Debt In Machine Learning Systems, Yiming Tang, Raffi T. Khatchadourian, Mehdi Bagherzadeh, Rhia Singh, Ajani Stewart, Anita Raja May 2021

Publications and Research

Machine Learning (ML) systems, i.e., those with ML capabilities, including Deep Learning (DL) systems, are pervasive in today’s data-driven society. Such systems are complex; they are composed of ML models and many subsystems that support learning processes. As with other complex systems, ML systems are prone to classic technical debt issues, especially when they are long-lived, but they also exhibit debt specific to these systems. Unfortunately, there is a gap in knowledge about how ML systems actually evolve and are maintained. In this paper, we fill this gap by studying refactorings, i.e., source-to-source semantics-preserving program transformations, performed in real-world, open-source …


Estimation Of The Planetary Boundary Layer Height: Part 1: Global Radar Wind Profiler Network Data; Part 2: A Comparison To Ceilometer Data, Holly Josephs May 2021

Theses and Dissertations

Two methods for estimating the planetary boundary layer height, an algorithm that identifies a maximum in the backscatter and a covariance wavelet transform method, are explored and applied to global radar wind profiler network data and ceilometer data, respectively. The objective of the study is to establish that these data sources and algorithms can be used to estimate planetary boundary layer heights so that global studies can make use of the estimates. Data from the global network of wind profilers required significant restructuring and quality control before they could be used in the present study. The maximum backscatter identification algorithm was …
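The covariance wavelet transform method can be sketched as follows, under stated assumptions. A common formulation (assumed here; the thesis may differ in details such as the dilation, search range, or preprocessing) convolves a backscatter profile f(z) with a Haar step function of dilation a, W_f(a, b) = (1/a) ∫ f(z) h((z − b)/a) dz, where h is +1 just below height b and −1 just above it, and takes the boundary layer top to be the height b that maximizes W_f. The profile below is synthetic, not ceilometer data, and the function names are hypothetical.

```python
import numpy as np

def haar_covariance_transform(z, f, a):
    """W_f(a, b) for every candidate height b in z, using a Haar step of width a:
    +1 on [b - a/2, b], -1 on (b, b + a/2], 0 elsewhere."""
    dz = z[1] - z[0]  # assumes uniformly spaced heights
    w = np.empty_like(f, dtype=float)
    for i, b in enumerate(z):
        lower = (z >= b - a / 2) & (z <= b)
        upper = (z > b) & (z <= b + a / 2)
        w[i] = (f[lower].sum() - f[upper].sum()) * dz / a
    return w

def pbl_height(z, f, a):
    """Boundary-layer height estimate: the height that maximizes the wavelet covariance."""
    return z[np.argmax(haar_covariance_transform(z, f, a))]

# Hypothetical synthetic profile: strong backscatter below 1000 m, weak above.
z = np.arange(0.0, 3001.0, 10.0)
f = np.where(z < 1000.0, 1.0, 0.2)
print(pbl_height(z, f, a=200.0))  # expected to land near 1000 m
```

A sharp drop in backscatter produces a large positive W_f at that height; the dilation a trades sensitivity to small steps against robustness to noise.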


Application Of Randomness In Finance, Jose Sanchez, Daanial Ahmad, Satyanand Singh May 2021

Publications and Research

Brownian motion, also known as the Wiener process, can be thought of as a random walk. In our project, we briefly discussed the fluctuations of financial indices and related them to Brownian motion and the modeling of stock prices.
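As an illustration, not the authors' code: a Brownian path can be approximated by summing independent Gaussian increments with variance dt, and a widely used stock-price model built on it is geometric Brownian motion, S(t) = S0 exp((μ − σ²/2)t + σW(t)). Whether the project uses this exact model is not stated in the excerpt; the drift `mu`, volatility `sigma`, step count, and function names below are hypothetical.

```python
import math
import random

def brownian_path(n_steps, t_max=1.0, seed=None):
    """Approximate a standard Brownian motion W(t) on [0, t_max]
    by summing independent Gaussian increments of variance dt."""
    rng = random.Random(seed)
    dt = t_max / n_steps
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w

def gbm_prices(s0, mu, sigma, n_steps, t_max=1.0, seed=None):
    """Geometric Brownian motion: S(t) = S0 * exp((mu - sigma^2/2) t + sigma W(t))."""
    w = brownian_path(n_steps, t_max, seed)
    dt = t_max / n_steps
    return [s0 * math.exp((mu - 0.5 * sigma ** 2) * (i * dt) + sigma * w_i)
            for i, w_i in enumerate(w)]

# Hypothetical parameters: one year of daily steps, 5% drift, 20% volatility.
prices = gbm_prices(s0=100.0, mu=0.05, sigma=0.2, n_steps=252, seed=42)
print(f"start {prices[0]:.2f}, end {prices[-1]:.2f}")
```

The exponential form keeps simulated prices positive, which a plain additive random walk does not guarantee.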


Discovering Kepler’s Third Law From Planetary Data, Boyan Kostadinov, Satyanand Singh May 2021

Publications and Research

In this data-inspired project, we illustrate how Kepler’s Third Law of Planetary Motion can be discovered by fitting a power model, using regression modeling, to real planetary data obtained from NASA. Because the power model can be linearized, linear regression can be used to fit the model parameters to the data; we also show how a non-linear regression can be implemented using the R programming language. Our work also illustrates how the linear least squares fit of the power model can be implemented in Desmos, which could serve as the computational foundation for this project at a lower …
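The linearization step can be sketched as follows. Taking logarithms of the power model T = c·a^k gives log T = log c + k log a, so ordinary least squares on the log-transformed data recovers the exponent k and the coefficient c. The project itself uses NASA data in R and Desmos; this Python sketch instead uses approximate textbook values for semi-major axis (AU) and orbital period (years), so the numbers are illustrative only. Kepler's third law predicts k ≈ 1.5 and, in these units, c ≈ 1.

```python
import math

# Approximate semi-major axes (AU) and sidereal periods (years); illustrative values.
planets = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.86),
    "Saturn": (9.537, 29.46),
}

def fit_power_model(xs, ys):
    """Fit y = c * x**k by linear least squares on (log x, log y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    sxx = sum((u - mx) ** 2 for u in lx)
    k = sxy / sxx                # slope in log-log space is the exponent
    c = math.exp(my - k * mx)    # intercept, back-transformed to the coefficient
    return c, k

a_vals, t_vals = zip(*planets.values())
c, k = fit_power_model(a_vals, t_vals)
print(f"T ≈ {c:.3f} * a^{k:.3f}")
```

Fitting in log space weights relative rather than absolute errors, which is one reason results can differ slightly from a direct non-linear fit.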


A New Feature Selection Method Based On Class Association Rule, Sami A. Al-Dhaheri Feb 2021

Dissertations, Theses, and Capstone Projects

Feature selection is a key process for supervised learning algorithms. It involves discarding irrelevant attributes from the training dataset from which the models are derived. One of the vital feature selection approaches is Filtering, which often uses mathematical models to compute the relevance of each feature in the training dataset and then sorts the features in descending order of their computed scores. However, most Filtering methods face several challenges, including, but not limited to, considering only feature-class correlation when defining a feature’s relevance and not recommending which subset of features to retain. Leaving this decision to the end-user may …
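A generic Filtering step, score each feature independently and then sort by score in descending order, can be sketched as below. The relevance score here is the absolute Pearson correlation between a feature and the class label; this is exactly the kind of feature-class-only criterion the thesis identifies as a limitation, so the sketch shows the baseline, not the proposed class-association-rule method. The toy dataset and function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def filter_rank(rows, labels, names):
    """Score each feature by |corr(feature, label)| and sort in descending order."""
    scores = {
        name: abs(pearson([row[j] for row in rows], labels))
        for j, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: three features, binary class label.
names = ["f1", "f2", "f3"]
rows = [[1, 5, 0], [2, 3, 0], [3, 4, 1], [4, 4, 1], [5, 5, 1]]
labels = [0, 0, 1, 1, 1]
for name, score in filter_rank(rows, labels, names):
    print(f"{name}: {score:.3f}")
```

Note that the ranking alone does not say how many features to keep; a cutoff still has to be chosen, which is the gap the abstract points out.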


Public Interest Technology – Exploring Covid-19 Health Data, Sarah Zelikovitz Jan 2021

Open Educational Resources

This module is part of an Introduction to Data Science course that covers the different parts of the data science process: data acquisition, cleaning, exploratory data analysis, and modeling. The COVID-19 pandemic has created much interest in public health data, as well as in visualization of all types of data. Public health data poses a set of challenges unique to health data, such as HIPAA laws and real-time data collection. With COVID-19, these challenges are particularly amplified, as data collection practices and the statistics collected change constantly in response to feedback from labs, hospitals, drug companies, and …


GOES-R Supervised Machine Learning, Ronald Adomako Jan 2021

Dissertations and Theses

The GOES-R series is a product line of four satellites, two of which are currently in orbit (GOES-16 “East” and GOES-17 “West”). GOES-17 is susceptible to a Loop-Heat-Pipe (LHP) phenomenon: during the fall and spring seasons, there are times of day when some of the infrared bands of the Advanced Baseline Imager (ABI) record inaccurate readings. This results from a combination of astronomical behavior and the position of GOES-17. The calibration issue occurs when the LHP fails to radiate the Sun’s heat out of the ABI. Predictive Calibration (pCal) is an algorithm developed by instrument vendors for the National Oceanic and Atmospheric Administration (NOAA) …


Principal Component Analysis For Predicting The Party Of The Legislators, Afsana Mimi Dec 2020

Publications and Research

In Spring 2020, I completed a project, "Decision Tree Predicting the Party of Legislators," in which I constructed a decision tree model to predict legislators' parties based on their votes. We also used this model to identify legislators who frequently voted against their parties. We used the legislators' roll call votes from the Office of the Clerk, U.S. House of Representatives data sets (categorical values), collected in 2018 and 2019. In this new project, we study the 2018 and 2019 vote data using Principal Component Analysis (PCA). The goal is to find a (compressed) model using unsupervised learning to distinguish the legislators' parties, and PCA …
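PCA on roll-call data can be sketched by encoding each vote numerically (here, as an assumption, yea = +1 and nay = −1), centering each column, and projecting legislators onto the leading principal components obtained from a singular value decomposition. The small vote matrix below is synthetic, not House data. If votes split along party lines, the sign of each legislator's first-component score separates the two blocs.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project the rows of X onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)                # center each roll-call column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)        # fraction of variance per component
    return Xc @ Vt[:n_components].T, explained[:n_components]

# Synthetic votes: rows = legislators, columns = roll calls (+1 yea, -1 nay).
votes = np.array([
    [ 1,  1, -1,  1, -1],
    [ 1,  1, -1,  1,  1],
    [ 1,  1, -1, -1, -1],
    [-1, -1,  1, -1,  1],
    [-1, -1,  1,  1,  1],
    [-1, -1,  1, -1, -1],
], dtype=float)
scores, explained = pca_scores(votes)
print(np.round(scores[:, 0], 2), np.round(explained, 2))
```

The overall sign of a principal component is arbitrary, so only the relative signs of the scores carry meaning.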


Open Data, Collaborative Working Platforms, And Interdisciplinary Collaboration: Building An Early Career Scientist Community Of Practice To Leverage Ocean Observatories Initiative Data To Address Critical Questions In Marine Science, Robert M. Levine, Kristen E. Fogaren, Johna E. Rudzin, Christopher J. Russoniello, Dax C. Soule, Justine M. Whitaker Dec 2020

Publications and Research

Ocean observing systems are well-recognized as platforms for long-term monitoring of near-shore and remote locations in the global ocean. High-quality observatory data is freely available and accessible to all members of the global oceanographic community—a democratization of data that is particularly useful for early career scientists (ECS), enabling ECS to conduct research independent of traditional funding models or access to laboratory and field equipment. The concurrent collection of distinct data types with relevance for oceanographic disciplines including physics, chemistry, biology, and geology yields a unique incubator for cutting-edge, timely, interdisciplinary research. These data are both an opportunity and an incentive …


A Data Exploration Of Jeopardy! From 1984 To The Present, Brian S. Hamilton Sep 2020

Dissertations, Theses, and Capstone Projects

The gameshow Jeopardy! has been around in its current iteration—hosted by Alex Trebek—since 1984. During this time, it has accumulated data on clues, contestants, and possible strategies on how to win. Using a crowd-sourced archive called J! Archive, this project seeks to find trends in the topics that the game covers and take a deeper look into the performance of its contestants. It employs topic modeling, a text-analysis method, to organize the hundreds of thousands of archived clues and statistical analysis to rate the performance of contestants by gender. Using web-based visualization tools, the data is shown in an …


Machine Learning Applications For Drug Repurposing, Hansaim Lim Sep 2020

Dissertations, Theses, and Capstone Projects

The cost of bringing a drug to market is astounding, and the failure rate is intimidating. Drug discovery has had limited success under the conventional reductionist one-drug-one-gene-one-disease paradigm, in which a single disease-associated gene is identified and a molecular binder to that specific target is subsequently designed. Under this simplistic paradigm, a drug molecule is assumed to interact only with its intended target. However, small-molecule drugs often interact with multiple targets, and those off-target interactions are not considered under the conventional paradigm. As a result, drug-induced side effects and adverse reactions are often neglected …


Sensor Data Analysis In Smart Buildings, Manuel A. Mane Penton May 2020

Publications and Research

Data analysis and Machine Learning are poised to transform the current technology infrastructure by addressing technological and economic demands, mainly in developed cities like New York. This research proposes a machine learning (ML) based solution to alleviate one of the main issues that large buildings such as CUNY campuses face: wasted energy resources. Data from the readings of deployed sensors, such as CO2, humidity, and temperature sensors, can be used to estimate the occupancy of a specific room and of the building in general. The outcome of this research established a relationship between the …


Using Data Mining To Identify The Most Influential Factors In Training Results, Xiaoqing Wu, Daanial Ahmad May 2020

Publications and Research

Data Science is used as a tool to find hidden facts in data. We want to find out which factors, such as ‘AGE’, ‘TAX’, ‘PUPIL-TEACHER RATIO’, and ‘PER-CAPITA INCOME’, contribute the most to housing prices. To answer this question, we studied the “Boston Houses Prices” dataset. By applying Lasso Regression (a data mining technique) to this dataset, we identified the influential factors in the linear model. In conclusion, we found six inputs that contributed the most to house prices, as follows: (i) CRIM-per …
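Lasso regression minimizes (1/2n)‖y − Xw‖² + α‖w‖₁; the L1 penalty drives the coefficients of weak predictors exactly to zero, so the surviving nonzero coefficients identify the influential inputs. The sketch below implements cyclic coordinate descent with soft-thresholding in NumPy on synthetic data; it does not reproduce the Boston dataset or the authors' tooling, and `alpha`, the iteration count, and the function names are illustrative assumptions.

```python
import numpy as np

def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; exactly zero inside [-alpha, alpha]."""
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)

def lasso_cd(X, y, alpha, n_iter=500):
    """Minimize (1/2n)||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.
    Assumes the columns of X and y are centered, so no intercept is fitted."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual that excludes feature j's current contribution.
            r = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w

# Synthetic data: only the first two of five standardized features matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(200)
y = y - y.mean()
w = lasso_cd(X, y, alpha=0.1)
print(np.round(w, 3))  # the last three coefficients are expected to be exactly 0
```

Standardizing the inputs first matters here: the penalty treats all coefficients alike, so features on larger scales would otherwise be penalized less.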


Philosophical Perspectives, Jochen Albrecht Apr 2020

Publications and Research

This entry follows in the footsteps of Anselin’s famous 1989 NCGIA working paper entitled “What is special about spatial?” (a report that is very timely again in an age when non-spatial data scientists are ignorant of the special characteristics of spatial data), in which he outlines three unrelated but fundamental characteristics of spatial data. In a similar vein, I discuss some philosophical perspectives that are internally unrelated to each other and could each warrant an individual entry in this Body of Knowledge. The first concerns the notions of space and time and how they have evolved in …


Edge Device Speaker Verification, Thomas P. Duffy Jan 2020

Dissertations and Theses

The continued shrinking of processors and other physical hardware, in concert with the development of embeddable machine learning frameworks, has enabled new use cases that place machine learning directly in the “wild”. For a long time, speaker verification systems have been deployed to perform inference on machines with significant computational resources. More recently, these systems have been built for smaller, cheaper devices that can be placed in people's homes or other edge locations. Here, we aim to demonstrate that a reasonably accurate, generalizable, text-independent speaker verification system can be built, trained, and, ultimately, deployed onto a microcontroller with as a …


On Properties Of Distance-Based Entropies On Fullerene Graphs, Modjtaba Ghorbani, Matthias Dehmer, Mina Rajabi-Parsa, Abbe Mowshowitz, Frank Emmert-Streib May 2019

Publications and Research

In this paper, we study several distance-based entropy measures on fullerene graphs. These include the topological information content of a graph $I_a(G)$, a degree-based entropy measure, the eccentric-entropy $I_{fs}(G)$, the Hosoya entropy $H(G)$, and, finally, the radial centric information entropy $H_{ecc}$. We compare these measures on two infinite classes of fullerene graphs, denoted by $A_{12n+4}$ and $B_{12n+6}$. We have chosen these measures because they are easily computable and capture meaningful graph properties. To demonstrate the utility of these measures, we investigate the Pearson correlation between them on the fullerene graphs.
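Comparing entropy measures via Pearson correlation can be sketched as below: compute two Shannon-type graph entropies for each graph in a family, then correlate the two series. The definitions are assumptions for illustration: each uses p(v) = x(v) / Σ x(u), where x is the vertex degree or the vertex eccentricity, and the paper's exact definitions of its measures may differ. Paths and cycles stand in for the fullerene families, which are not constructed here; all function names are hypothetical.

```python
import math
from collections import deque

def eccentricities(adj):
    """Eccentricity of each vertex via breadth-first search (graph assumed connected)."""
    ecc = {}
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        ecc[s] = max(dist.values())
    return ecc

def shannon_entropy(values):
    """Entropy (natural log) of the distribution p_i = x_i / sum(x)."""
    total = sum(values)
    return -sum((x / total) * math.log(x / total) for x in values if x > 0)

def degree_entropy(adj):
    return shannon_entropy([len(nbrs) for nbrs in adj.values()])

def eccentric_entropy(adj):
    return shannon_entropy(list(eccentricities(adj).values()))

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def cycle(n):
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def path(n):
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

# Compare the two measures over a small illustrative family (paths P_4 .. P_12).
family = [path(n) for n in range(4, 13)]
deg = [degree_entropy(g) for g in family]
ecc = [eccentric_entropy(g) for g in family]
print(f"Pearson r = {pearson(deg, ecc):.3f}")
```

For vertex-transitive graphs such as cycles, every vertex has the same degree and eccentricity, so both entropies reduce to log n; differences between measures only appear on less symmetric graphs.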


Content Analysis Of Data Science Graduate Programs In The U.S., Duo Li, Elizabeth Milonas, Qiping Zhang Jul 2017

Publications and Research

Data science is an emerging academic field (Paul & Aithal, 2018), which has its origins in “Big Data/Cloud Computing” and complexity science domains. Data Science is about managing large and complex data (Big Data management) and analytics technologies (Paul & Aithal, 2018). Data, technology, and people are the three pillars of data science. In addition, Data Science is composed of three key areas: analytics, infrastructure, and data curation (Tang & Sae-Lim, 2016). Stanton (2012) defined data science as “an emerging area of work concerned with the collection, preparation, analysis, visualization, management, and preservation of large collections of information (Song & …


Time Series Analysis For Psychological Research: Examining And Forecasting Change, Andrew T. Jebb, Louis Tay, Wei Wang, Qiming Huang Jun 2015

Publications and Research

Psychological research has increasingly recognized the importance of integrating temporal dynamics into its theories, and innovations in longitudinal designs and analyses have allowed such theories to be formalized and tested. However, psychological researchers may be relatively unequipped to analyze such data, given its many characteristics and the general complexities involved in longitudinal modeling. The current paper introduces time series analysis to psychological research, an analytic domain that has been essential for understanding and predicting the behavior of variables across many diverse fields. First, the characteristics of time series data are discussed. Second, different time series modeling techniques are surveyed that …
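As one concrete instance of the kind of forecasting technique such surveys cover, simple exponential smoothing updates a level ℓ_t = α·y_t + (1 − α)·ℓ_{t−1} and uses the last level as a flat forecast for future periods. Whether and how the paper presents this particular method is not shown in the excerpt; the series, the smoothing constant α, and the function names below are hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """Return smoothed levels: level[t] = alpha*y[t] + (1 - alpha)*level[t-1]."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    levels = [series[0]]  # initialize the level with the first observation
    for y in series[1:]:
        levels.append(alpha * y + (1 - alpha) * levels[-1])
    return levels

def forecast(series, alpha, horizon):
    """Flat h-step-ahead forecast: every future value equals the last level."""
    return [simple_exponential_smoothing(series, alpha)[-1]] * horizon

# Hypothetical weekly ratings for one participant.
ratings = [5.0, 6.0, 5.5, 7.0, 6.5, 6.0, 7.5]
print(forecast(ratings, alpha=0.3, horizon=3))
```

A larger α makes the level react faster to recent observations; a smaller α averages over a longer history. This method suits series without trend or seasonality, which other techniques handle explicitly.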