Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons


Articles 1 - 25 of 25

Full-Text Articles in Physical Sciences and Mathematics

Reading PDFs Using Adversarially Trained Convolutional Neural Network Based Optical Character Recognition, Michael B. Brewer, Michael Catalano, Yat Leung, David Stroud Dec 2020


SMU Data Science Review

A common problem that has plagued companies for years is digitizing documents and making use of the data contained within. Optical Character Recognition (OCR) technology has flooded the market, but companies still face challenges productionizing these solutions at scale. Although these technologies can identify and recognize the text on the page, they fail to classify the data to the appropriate datatype in an automated system that uses OCR technology as its data mining process. The research contained in this paper presents a novel framework for the identification of datapoints on check stub images by utilizing generative adversarial networks (GANs) to …


Applying The Data: Predictive Analytics In Sport, Anthony Teeter, Margo Bergman Nov 2020


Access*: Interdisciplinary Journal of Student Research and Scholarship

The history of wagering predictions and their impact on wide-reaching disciplines such as statistics and economics dates to at least the 1700s, if not before. Predicting the outcomes of sports is a multibillion-dollar business that capitalizes on these tools but is in constant development with the addition of big data analytics methods. Sportsline.com, a popular website for fantasy sports leagues, provides odds predictions in multiple sports, produces proprietary computer models of both winning and losing teams, and provides specific point estimates. To test likely candidates for inclusion in these prediction algorithms, the authors developed a computer model, and test …


An Analysis Of Technological Components In Relation To Privacy In A Smart City, Kayla Rutherford, Ben Lands, A. J. Stiles Nov 2020


James Madison Undergraduate Research Journal (JMURJ)

A smart city is an interconnection of technological components that store, process, and wirelessly transmit information to enhance the efficiency of applications and the individuals who use those applications. Over the course of the 21st century, it is expected that an overwhelming majority of the world’s population will live in urban areas and that the number of wireless devices will increase. The resulting increase in wireless data transmission means that the privacy of data will be increasingly at risk. This paper uses a holistic problem-solving approach to evaluate the security challenges posed by the technological components that make up a …


Cash Flow Forecasting Using Probabilistic Neural Networks, Marwan Ashour Nov 2020


Journal of the Arab American University مجلة الجامعة العربية الامريكية للبحوث

This paper aimed to compare modern methods of cash flow forecasting with traditional ones. Specifically, the researcher compared Probabilistic Neural Networks with the Transfer Function. Cash flow forecasting is now very important, helping upper management plan, control, assess performance, and make decisions. In this paper, Artificial Neural Networks were used to diagnose the nature of the cash flow for the next period and then forecast the cash flow. The experiment was conducted at the General Company for Electricity Distribution in Baghdad. …


Fall 2020 Oct 2020


In The Loop

Studio CDM Documents Remote Initiatives; "Tom of Your Life" Film Release; Animation Jam Goes Virtual; DePaul Experimental Film Showcase 2020; Trackmania Soundtrack; Alumni Games at Pixel Pop; Alumnus Commemorates St. Vincent de Paul; Cybersecurity Champion Alina Kuzmenkova; Walking the Walk: Youth programs at CDM express DePaul’s Vincentian values; Fair Treatment: Three initiatives address racial inequity in health care; They've Got You Covered: A School of Design instructor leads a cottage industry of makers protecting essential workers from the novel coronavirus; Meet Would-Be Hot Topic Influencer Vera Drew; Data Detectives: CDM helps Chicago track the racial proportions of its COVID-19 cases


Extraction D’Information À Partir Des Sites Web En Arabe Basée Sur Une Méthode À Base Des Règles, Moustafa Alhajj, Amani Sabra Oct 2020


Al Jinan الجنان

This article describes a tool that uses language engineering to extract information from Arabic-language websites. This information will help web archivists create archival records for the sites. An archival record template is proposed, the goal being to fill in this record automatically. To recognize and classify text segments, the contextual exploration method proposed by Desclés is used; linguistic markers and rules are defined on the basis of a synthetic study of the specific features of the Arabic language. A corpus of more than 1,300 Arabic-language websites has …


Data Is Personal: We Should Treat It As Such, Kaleb Dunn Sep 2020


Student Papers in Public Policy

The rise of the internet as a fact of daily life is the defining element of the modern age. Widespread use of the internet has fundamentally altered entire industries, and much of American life has migrated online. Dating is augmented by online dating; shopping by online shopping; television by internet streaming.

The digitization of American life has brought with it considerable benefits, including great convenience and innumerable efficiencies, but it has not come without a cost. Although there are many business models used by internet companies, many of the now-largest companies in the world have converged on one entity upon …


Implement Multi-Factor Authentication On All Federal Systems Now, Megan Walsh Sep 2020


Student Papers in Public Policy

The White House Office of Management and Budget recorded 31,107 information security incidents in fiscal year 2018. The most common attacks to gain access to a user’s login credentials were e-mail/phishing, web-based attack, and brute force entering of username/password combinations. Given this high number of incidents, strong reliance on computers for everyday business, and common attacks that target passwords, information security should be a priority for information technology administrators working in federal agencies.


Removing Racially Biased Algorithms In Policing, Andie Lee Sep 2020


Student Papers in Public Policy

Local police departments use algorithm-based programs to do police work and predict crime. Technology has created the police tactic of predictive crime prevention. Police work, however, requires social skills, assessment of the environment, and most importantly human interaction. Automated policing lacks these characteristics. Moreover, the algorithms used to make crime predictions and risk assessments have disproportionately affected minorities.


The Case For Online Ranked-Choice Voting, Rayyan Khan Sep 2020


Student Papers in Public Policy

Maine was the first to embrace ranked-choice voting on a statewide level in 2018, using it for all state and general elections. Maine voters will be the first to use ranked-choice voting in a presidential election in 2020. This system differs from traditional voting in that voters rank candidates rather than choose just one. Supporters of ranked-choice voting tout it as a better model for accurately representing the values of the voting population; however, a study conducted in San Francisco details a potential shortfall referred to as “ballot fatigue” that the theoretically-ideal system may face as it struggles to deal …
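To make the ranking mechanism concrete, here is a minimal sketch of an instant-runoff tally, the counting rule commonly used with ranked-choice ballots. It is not taken from the paper: the function name `instant_runoff`, the ballot format (a list of candidate names in preference order), and the alphabetical tie-break are all assumptions chosen for illustration. Ballots whose ranked candidates have all been eliminated simply stop counting in later rounds.

```python
from collections import Counter

def instant_runoff(ballots):
    """Return the instant-runoff winner, or None if no ballot ranks anyone.

    `ballots` is a list of lists; each inner list ranks candidates from
    most to least preferred (hypothetical format for this sketch).
    """
    remaining = {candidate for ballot in ballots for candidate in ballot}
    while remaining:
        tally = Counter()
        for ballot in ballots:
            # Count each ballot for its highest-ranked surviving candidate.
            for choice in ballot:
                if choice in remaining:
                    tally[choice] += 1
                    break
        active = sum(tally.values())
        if active == 0:
            return None
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > active:
            # Strict majority of the ballots still in play.
            return leader
        # Eliminate the weakest candidate; ties are broken alphabetically,
        # which is an assumption of this sketch, not a legal rule.
        fewest = min(tally[c] for c in remaining)
        loser = min(c for c in remaining if tally[c] == fewest)
        remaining.remove(loser)
    return None

ballots = [["A", "B"], ["A", "C"], ["B", "C"], ["C", "B"], ["C", "B"]]
print(instant_runoff(ballots))
```

In this example no candidate holds a majority in the first round, so the last-place candidate is eliminated and that ballot transfers to its next ranked choice.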


Topic Modeling To Understand Technology Talent, Chad Madding, Allen Ansari, Chris Ballenger, Aswini Thota Sep 2020


SMU Data Science Review

Attracting technology talent in today’s hiring climate is more complicated than ever. Recruiting for technology talent in non-technology industries is even more challenging. This intense hiring landscape is motivating companies not only to attract the right talent but also to create a culture that can retain and grow that talent. In this paper, we develop algorithms and present insights that use data provided in reviews to glean information employers can use to address or even change their priorities to meet the demands of an ever-changing job market. The core of our research is to investigate and attribute the role of …


Cover Song Identification - A Novel Stem-Based Approach To Improve Song-To-Song Similarity Measurements, Lavonnia Newman, Dhyan Shah, Chandler Vaughn, Faizan Javed Sep 2020


SMU Data Science Review

Music is incorporated into our daily lives, whether intentionally or unintentionally. It evokes responses and behavior, so much so that there is an entire field of study dedicated to the psychology of music. Music creates the mood for dancing, exercising, creative thought, or even relaxation. It is a powerful tool that can be used in various venues and through advertisements to influence and guide human reactions. Music is also often "borrowed" in the industry today. The practices of sampling and remixing music in the digital age have made cover song identification an active area of research. While most of this research is focused …


Time Series Analysis Of Offshore Buoy Light Detection And Ranging (LIDAR) Windspeed Data, Aditya Garapati, Charles J. Henderson, Carl Walenciak, Brian T. Waite Sep 2020


SMU Data Science Review

In this paper, modeling techniques for the forecasting of wind speed using historical values observed by Light Detection and Ranging (LIDAR) sensors in an offshore context are described. Both univariate time series and multivariate time series modeling techniques leveraging meteorological data collected simultaneously with the LIDAR data are evaluated for potential contributions to predictive ability. Accurate and timely ability to predict wind values is essential to the effective integration of wind power into existing power grid systems. It allows for both the management of rapid ramp-up / down of base production capacity due to highly variable wind power inputs and …


Toxic Language Detection Using Robust Filters, Deepti Kunupudi, Shantanu Godbole, Pankaj Kumar, Suhas Pai Sep 2020


SMU Data Science Review

Social networks sometimes become a medium for threats, insults, and other types of cyberbullying. A large number of people are involved in online social networks. Hence, the protection of network users from anti-social behavior is a critical activity [19]. One of the significant tasks of such activity is the detection of toxic language. Abusive or toxic language in user-generated online content has become an issue of increasing importance in recent years. Most current commercial methods use blacklists and regular expressions; however, these measures fall short when contending with more subtle, lesser-known examples of hate speech, profanity, or swearing [6]. Abusive language classification has …


Reducing Age Bias In Machine Learning: An Algorithmic Approach, Adriana Solange Garcia De Alford, Steven K. Hayden, Nicole Wittlin, Amy Atwood Sep 2020


SMU Data Science Review

In this paper, we study the prevalence of bias in machine learning; we explore the life cycle phases where bias is potentially introduced into a machine learning model; and lastly, we present how adversarial learning can be leveraged to measure unwanted bias and unfair behavior from a machine learning algorithm. This study focuses particularly on the topics of age bias in predicting employee attrition and presents a practical approach for how adversarial learning can be successful in mitigating age bias. To measure bias, we calculate group fairness metrics across five-year age groups and evaluate fairness between a baseline predictive model …
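As a rough illustration of measuring group fairness across five-year age groups, the sketch below buckets records by age and compares the rate of positive predictions per bucket. The abstract does not say which fairness metrics the authors use; the positive-rate comparison (often called statistical or demographic parity) and the names `positive_rate_by_age_group` and `parity_gap` are assumptions made for this example.

```python
from collections import defaultdict

def positive_rate_by_age_group(ages, predictions, width=5):
    """Share of positive (1) predictions within each age bucket.

    Buckets are keyed by their lower bound, e.g. 30 covers ages 30-34
    when width is 5. Inputs are parallel lists (hypothetical format).
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for age, pred in zip(ages, predictions):
        bucket = (age // width) * width
        totals[bucket] += 1
        positives[bucket] += pred
    return {b: positives[b] / totals[b] for b in sorted(totals)}

def parity_gap(rates):
    """Largest difference in positive rate between any two groups."""
    return max(rates.values()) - min(rates.values())

rates = positive_rate_by_age_group([22, 24, 31, 33], [1, 0, 1, 1])
print(rates, parity_gap(rates))
```

A gap near zero suggests the model predicts the positive class at similar rates across age groups; a large gap flags groups worth inspecting further.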


Forecasting Spare Parts Sporadic Demand Using Traditional Methods And Machine Learning - A Comparative Study, Bhuvana Adur Kannan, Ganesh Kodi, Oscar Padilla, Dough Gray, Barry C. Smith Sep 2020


SMU Data Science Review

Sporadic demand presents a particular challenge to traditional time forecasting methods. In the past 50 years, there have been developments, such as the Croston Model [3], which have improved forecast performance. With the rise of Machine Learning (ML), there is abundant research in the field of applying ML algorithms to predict sporadic demand [8][12][9]. However, most existing research has analyzed this problem from the demand side [17]. In this paper, we tackle this predictive analytics challenge from the supply side. We perform a comparative analysis utilizing a spare parts demand dataset from an Original Equipment Manufacturer (OEM). Since traditional measurements …
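For readers unfamiliar with the Croston Model mentioned above, the following is a minimal sketch of its core idea: smooth the nonzero demand sizes and the intervals between them separately, then forecast demand per period as their ratio. The initialization from the first nonzero observation and the default `alpha` are assumptions of this sketch (implementations differ on both), and the code is not drawn from the paper.

```python
def croston(demand, alpha=0.1):
    """Per-period demand forecast using Croston's method.

    `demand` is a sequence of non-negative numbers, mostly zeros for
    sporadic items. Returns 0.0 if no demand was ever observed.
    """
    size = None          # smoothed nonzero demand size
    interval = None      # smoothed number of periods between demands
    periods_since = 1    # periods elapsed since the last nonzero demand
    for d in demand:
        if d > 0:
            if size is None:
                # Initialize from the first observed demand (assumption).
                size, interval = float(d), float(periods_since)
            else:
                size += alpha * (d - size)
                interval += alpha * (periods_since - interval)
            periods_since = 1
        else:
            periods_since += 1
    if size is None:
        return 0.0
    return size / interval

print(croston([0, 0, 4, 0, 0, 4, 0, 0, 4], alpha=0.5))
```

With a demand of 4 units arriving every third period, the estimate settles at 4/3 units per period, which is the long-run average rate.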


Floor Regularization And Investigation Of Transfer Learning Through Sharing Of Probability Distribution Parameters, Daniel Byrne, Stacey Smith, Joanna Duran, John Santerre Sep 2020


SMU Data Science Review

In this work we introduce a simple new regularization technique, aptly named Floor, which drops low-weight connections on every forward pass whenever they fall below a specified event horizon threshold. We compare the regular Dropout and Floor algorithms side by side on identical network architectures. We report similar or improved regularization with the Floor algorithm, whether used in place of regular Dropout or in concert with it.

In this paper we also describe our research into transfer learning by sharing of probability distribution parameters in which we investigated methods of transferring Gaussian prior parameters derived from the …
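The Floor idea described above can be sketched for a single dense layer as follows. This is only an illustration under stated assumptions: the abstract does not specify how "low weight" is measured, so the sketch compares the absolute value of each weight with the threshold, and the names `floor_mask` and `dense_forward` are hypothetical. The stored weights are left untouched, so a connection can return in a later pass if training moves it back above the threshold.

```python
def floor_mask(weights, threshold):
    """Zero out connections whose magnitude is below the threshold."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]

def dense_forward(x, weights, bias, threshold):
    """Forward pass of one dense layer with Floor-style masking.

    `weights` has one row per output unit; the mask is recomputed on
    every call, mirroring "on every forward pass" in the abstract.
    """
    masked = floor_mask(weights, threshold)
    return [sum(xi * w for xi, w in zip(x, row)) + b
            for row, b in zip(masked, bias)]

weights = [[0.5, 0.01], [-0.02, -0.8]]
print(dense_forward([1.0, 2.0], weights, [0.0, 0.0], threshold=0.05))
```

Unlike Dropout, which removes units at random, this masking is deterministic given the current weights.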


The Transcript Profile Changes With Developmental Maturation Of Fetal Lung Type 2 Cells: An Analysis Of RNAseq Data, Heber C. Nielsen, Volodymyr Orlov, Rebecca Holsapple, Monnie McGee Aug 2020


SMU Data Science Review

In this paper, we utilize next-generation sequencing (NGS) data from the LungMap project to identify and characterize the developmental RNA transcriptome in alveolar epithelial type II cells of embryonic mouse lungs of gestational ages embryonic days 16 (E16) and 18 (E18). Late gestation lung cellular maturation is necessary for survival at birth. Using R and the BioConductor packages for RNAseq analysis, we analyze changes in the mouse lung RNA transcriptome as this maturation process takes place. We particularly identify the cluster of genes whose expression changes markedly between immature (E16) and mature (E18) lungs which can be used to define …


Forecasting Power Consumption In Pennsylvania During The COVID-19 Pandemic: A SARIMAX Model With External COVID-19 And Unemployment Variables, Jackson Au, Javier Saldaña Jr., Ben Spanswick, John Santerre Aug 2020


SMU Data Science Review

In this paper, we present how electrical consumption can reveal insight into the spread of the novel COVID-19 pandemic. We analyze electrical power consumption data provided by PPL Electric Utilities, the Department of Labor’s unemployment claims, and COVID-19 cases and deaths for the State of Pennsylvania to study the impact of the pandemic on the infrastructure. Using a SARIMA model as our benchmark, we analyzed the use of a SARIMAX model to forecast power consumption in Pennsylvania 14 days ahead. Our work quantifies and illuminates the effect that the strict legislation passed to minimize the spread of COVID-19 had on power consumption. …


Compressed DNA Representation For Efficient AMR Classification, John Partee, Robert Hazell, Anjli Solsi, John Santerre Aug 2020


SMU Data Science Review

In this paper, we explore a representation methodology for the compression of DNA isolates. Using lossless string compression via tokenization of frequently repeated segments of DNA, we reduce the length of the isolates to be counted as k-mers for classification. With this new representation, we apply a previously established feature sampling method to dramatically reduce the feature space. In understanding the genetic diversity, we also look at conserving biological function across these spaces. Using a random forest model we were able to predict the resistance or susceptibility of bacteria with 85-90% accuracy, with a 30-50% reduction in overall isolate length, …
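Since the abstract relies on counting k-mers as classification features, a minimal sketch of that step may help. It shows plain k-mer counting on an uncompressed string only; the tokenization-based compression and the feature sampling method the authors describe are not reproduced here, and the function name `count_kmers` is an assumption.

```python
from collections import Counter

def count_kmers(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

counts = count_kmers("ACGTACG", 3)
print(counts)
```

Each distinct k-mer becomes one feature, so shortening the sequences or sampling the k-mers directly reduces the size of the feature space a classifier has to handle.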


Spoken Language Recognition On Open-Source Datasets, Brady Arendale, Samira Zarandioon, Ryan Goodwin, Douglas Reynolds Aug 2020


SMU Data Science Review

The field of speaker and language recognition is constantly being researched and developed, but much of this research is done on private or expensive datasets, making the field more inaccessible than many other areas of machine learning. In addition, many papers make performance claims without comparing their models to other recent research. With the recent development of public multilingual speech corpora such as Mozilla's Common Voice as well as several single-language corpora, we now have the resources to attempt to address both of these problems. We construct an eight-language dataset from Common Voice and a Google Bengali corpus as well …


Predicting Attrition - A Driver For Creating Value, Realizing Strategy, And Refining Key HR Processes, Kevin Mendonsa, Maureen Stolberg, Vivek Viswanathan, Scott Crum Aug 2020


SMU Data Science Review

Talent is the most important asset for every organization's success. While attrition (or churn) and turnover can refer to both employees and customers, this paper will focus on employee attrition only. Many organizations accept attrition as an inevitable cost of doing business and do nothing to adopt or implement mitigating strategies to combat it. World-class companies, on the other hand, take deliberate measures to understand, control, and mitigate attrition (turnover) at every stage. Unmitigated attrition can have a devastating effect on an organization's bottom line and market value. In addition, the "invisible" costs of low employee morale, reduced employee …


An Effective Method For Attribute Subset Selection, Considering The Resource In Pattern Recognition, Bakhtiyorjon Bakirovich Akbaraliev Aug 2020


Chemical Technology, Control and Management

An analytical method for determining informative sets of features (INP) is developed, taking into account the resource for criteria based on the use of a measure of dispersion of classified objects. The areas of existence of the solution are defined. The statements and properties for the Fischer-type information criterion are proved, using which the proposed analytical method for determining the INP guarantees optimal results in the sense of maximizing the selected functional. The appropriateness of choosing this type of informative criterion is justified. A method for transforming attributes is proposed. The universality of the method in relation to the type …


Human Trafficking In Nepal: Can Big Data Help?, Shushant Khanal Aug 2020


Undergraduate Research Journal

This paper provides an overview of human trafficking in Nepal, identifies strategies implemented by the country's government to handle the problem, and considers the possibilities of using big data as a solution to the problem of human trafficking in Nepal. Big data may be defined as the collection of a large volume of data from the past that is processed using machine learning and artificial intelligence to find a common pattern. The use of big data in tackling the problem of human trafficking is not new in developed countries like the United States, but it is still a foreign idea …


Prediction Of Feed Utilization Performance In Clarias Gariepinus Using Multiple Linear Regression In Machine Learning, Adekunle Oluwatosin Familusi Jun 2020


Journal of Bioresource Management

Machine learning models can be used to make predictions about nutrient utilization performance indexes using available proximate analysis data on feed composition. Data from similar experiments on nutrient utilization performance were used to fit a multiple linear regression model for the prediction of four performance indexes. A relationship between Specific Growth Rate and percentage inclusion, with a strength of 0.57, was noted, along with a negative relationship between protein efficiency and protein content. A negative relationship between Nitrogen Free Extract (NFE) and Protein Efficiency Ratio (PER) at NFE content ≥25% was observed. PER was predicted with 85% accuracy, while Weight Gain …