Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

1,033 Full-Text Articles 2,216 Authors 177,939 Downloads 155 Institutions

All Articles in Data Science

1,033 full-text articles. Page 1 of 51.

DeepHTLV: A Deep Learning Framework For Detecting Human T-Lymphotrophic Virus 1 Integration Sites, Johnathan Jia 2023 The Texas Medical Center Library

The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences Dissertations and Theses (Open Access)

In the 1980s, researchers found the first human oncogenic retrovirus, human T-lymphotrophic virus type 1 (HTLV-1). Since then, HTLV-1 has been identified as the causative agent behind several diseases such as adult T-cell leukemia/lymphoma (ATL) and HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP). As part of its normal replication cycle, the viral RNA genome is converted into DNA and integrated into the host genome. With several hundreds to thousands of unique viral integration sites (VISs) distributed with indeterminate preference throughout the genome, detection of HTLV-1 VISs is a challenging task. Experimental studies typically use molecular biology …


Named Entity Recognition From Biomedical Text, Maged Guirguis 2023 American University in Cairo

Theses and Dissertations

As vast amounts of unstructured data are becoming available digitally, computer-based methods to extract relevant and meaningful information are needed. Named entity recognition (NER) is the task of identifying text spans that mention named entities and classifying them into predefined categories. Despite the existence of numerous well-established NER methods, the biomedical domain remains under-studied. The objective of this research is to identify an efficient technique for NER tasks on biomedical data. This is achieved by investigating deep learning technologies, namely the pre-trained BERT [1] model and its variants SciBERT [2] and BioBERT [3]. Preprocessing the data before passing …
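To make the task definition above concrete, here is a minimal sketch of how token-level tags can be decoded into labeled entity spans. The BIO tagging scheme, the example sentence, the category names, and the function name `bio_to_spans` are illustrative assumptions and are not taken from the thesis; a BERT-style model would typically produce the tags, which are hard-coded here.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Decode BIO tags into (entity_text, category) pairs.

    "B-X" opens a span of category X, "I-X" continues an open span of the
    same category, and anything else closes the open span.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; flush any span that is still open.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens = [token]
            current_label = tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" tag (or an inconsistent "I-" tag) closes the open span.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None

    if current_tokens:
        spans.append((" ".join(current_tokens), current_label))
    return spans


# Hypothetical example sentence and tags.
tokens = ["Aspirin", "reduces", "heart", "attack", "risk"]
tags = ["B-Chemical", "O", "B-Disease", "I-Disease", "O"]
print(bio_to_spans(tokens, tags))
# → [('Aspirin', 'Chemical'), ('heart attack', 'Disease')]
```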


Session 11: Skip-GCN: A Framework For Hierarchical Graph Representation Learning, Jackson Cates, Justin Lewis, Randy Hoover, Kyle Caudle 2023 SDSMT

SDSU Data Science Symposium

Recently there has been high demand for the representation learning of graphs. Graphs are a complex data structure that contains both topology and features. There are several application domains for graphs, such as infectious disease contact tracing and social media network interactions. The literature describes several methods that represent nodes in an embedding space, allowing classical techniques to perform node classification and prediction. One such method is the graph convolutional neural network, which aggregates the features of each node's neighbors to create the embedding. Another method, Walklets, takes advantage of the topological information stored in a graph …
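The neighbor aggregation step described above can be sketched as follows. This is a simplified illustration, not the Skip-GCN method: it averages each node's features with those of its neighbors (a self-loop is assumed), and it omits the learned weight matrix and nonlinearity of a full graph convolutional layer. The function name `aggregate_neighbors` and the toy graph are hypothetical.

```python
from typing import Dict, List


def aggregate_neighbors(
    adjacency: Dict[int, List[int]],
    features: Dict[int, List[float]],
) -> Dict[int, List[float]]:
    """Mean-aggregate the features of each node and its neighbors."""
    embeddings: Dict[int, List[float]] = {}
    for node, neighbors in adjacency.items():
        # Include the node itself (self-loop) so its own features are kept.
        group = [node] + list(neighbors)
        dim = len(features[node])
        embeddings[node] = [
            sum(features[n][d] for n in group) / len(group) for d in range(dim)
        ]
    return embeddings


# Toy path graph 0 - 1 - 2 with two features per node.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
embeddings = aggregate_neighbors(adjacency, features)
print(embeddings[0])  # → [0.5, 0.5]
print(embeddings[2])  # → [0.5, 1.0]
```

Stacking several such steps lets information from more distant nodes reach each embedding, which is the usual motivation for multi-layer graph convolutional networks.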


2D Respiratory Sound Analysis To Detect Lung Abnormalities, Rafia Sharmin Alice, KC Santosh 2023 University of South Dakota

SDSU Data Science Symposium

Abstract. In this paper, we analyze deep visual features from 2D data representation(s) of the respiratory sound to detect evidence of lung abnormalities. The primary motivation behind this is that visual cues are more important in decision-making than raw data (lung sound). Early detection and prompt treatment are essential for any possible future respiratory disorders, and respiratory sound is proven to be one of the biomarkers. In contrast to state-of-the-art approaches, we aim at understanding and analyzing visual features using our deep learning models tailored to Convolutional Neural Networks (CNN), where we consider all possible 2D data such as Spectrogram, Mel-frequency Cepstral Coefficients …


Temporal Tensor Factorization For Multidimensional Forecasting, Jackson Cates, Karissa Scipke, Randy Hoover, Kyle Caudle 2023 SDSMT

SDSU Data Science Symposium

In the era of big data, there is a need for forecasting high-dimensional time series that might be incomplete, sparse, and/or nonstationary. The current research aims to solve this problem for two-dimensional data through a combination of temporal matrix factorization (TMF) and low-rank tensor factorization. From this method, we propose an expansion of TMF to two-dimensional data: temporal tensor factorization (TTF). The current research aims to interpolate missing values via low-rank tensor factorization, which produces a latent space of the original multilinear time series. We then can perform forecasting in the latent space. We present experimental results of the proposed …


Social Impacts Of Robotics On The Labor And Employment Market, Kelvin Espinal 2023 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Robotics have been introduced into the workplace to perform tasks that human beings have traditionally fulfilled. Complementing or substituting human labor with robotics eliminates human involvement in functions involving hazardous environments, heavy lifting, toxic substances, and repetitive low-level tasks. Robots are also meant to be more efficient and cost-effective, saving money, time, and labor. However, since the introduction of robotics in the workforce, there has been societal opposition to this branch of technology for fear of losing employment, wages, and purpose.

Previous studies have reported an overarching societal fear that adopting robotics in the workplace and industry …


A Bidirectional Deep LSTM Machine Learning Method For Flight Delay Modelling And Analysis, Desmond B. Bisandu, Irene Moulitsas 2023 Cranfield University

National Training Aircraft Symposium (NTAS)

Flight delays can be prevented by providing a reference point from an accurate prediction model because predicting flight delays is a problem with a specific space. Only a few algorithms consider the mutual correlation of predicted classes during flight delay classification or prediction modelling tasks. None of these existing methods works for all scenarios. Therefore, the need to investigate the performance of more models in solving the problem of flight delay is vast and rapidly increasing. This paper presents the development and evaluation of LSTM and BiLSTM models by comparing them for flight delay prediction. The LSTM does the feature extraction …


Integrated Organizational Machine Learning For Aviation Flight Data, Michael J. Pritchard, Paul Thomas, Eric Webb, Jon Martin, Austin Walden 2023 Kansas State University

National Training Aircraft Symposium (NTAS)

An increased availability of data and computing power has allowed organizations to apply machine learning techniques to various fleet monitoring activities. Additionally, our ability to acquire aircraft data has increased due to the miniaturization of small form factor computing machines. Aircraft data collection processes contain many data features in the form of multivariate time-series (continuous, discrete, categorical, etc.) which can be used to train machine learning models. Yet, three major challenges still face many flight organizations: 1) integration and automation of data collection frameworks, 2) data cleanup and preparation, and 3) embedded machine learning frameworks. Data cleanup and preparation has …


Visual Analytics And Modeling Of Materials Property Data, Diwas Bhattarai 2023 Louisiana State University and Agricultural and Mechanical College

LSU Doctoral Dissertations

Due to significant advancements in experimental and computational techniques, materials data are abundant. Facilitating data-driven research calls for a system for managing and sharing data, supported by a set of tools for effective data analysis and modeling. Generally, a given material property M can be considered as a multivariate data problem. The dimensions of M are the values of the property itself, the conditions (pressure P, temperature T, and multi-component composition X) that control the concerned property, and relevant metadata I (source, date).

Here we present a comprehensive database considering both experimental and computational sources …


Crow Search Algorithm With Time Varying Flight Length Strategies For Feature Selection, Mohammed Abdullahi, Abdulhameed Adamu, Ibrahim Hayatu Hassan 2023 Ahmadu Bello University, Zaria, Nigeria

Future Computing and Informatics Journal

Feature Selection (FS) is an efficient technique used to get rid of irrelevant, redundant and noisy attributes in high-dimensional datasets while increasing the efficacy of machine learning classification. The crow search algorithm (CSA) is a modest and efficient metaheuristic algorithm which has been used to overcome several FS issues. The flight length (fl) parameter in CSA governs crows' search ability. In CSA, fl is set to a fixed value. As a result, the CSA is plagued by the problem of becoming trapped in local minima. This article suggests a remedy to this issue by bringing five new concepts of time dependent fl …
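The idea of replacing a fixed flight length with a time-dependent one can be sketched as below. The abstract does not say which five schedules the article proposes, so the linearly decreasing schedule `linear_flight_length`, its bounds (2.5 down to 0.5), the awareness probability `ap`, and the continuous test objective are all assumptions made for illustration. The sketch also works in a continuous search space; applying it to feature selection would need an additional binarization step that is not shown.

```python
import random
from typing import Callable, List, Tuple


def linear_flight_length(t: int, max_iter: int,
                         fl_max: float = 2.5, fl_min: float = 0.5) -> float:
    """Assumed schedule: fl decreases linearly from fl_max to fl_min."""
    return fl_max - (fl_max - fl_min) * t / max_iter


def crow_search(objective: Callable[[List[float]], float], dim: int,
                n_crows: int = 20, max_iter: int = 200, ap: float = 0.1,
                lower: float = -5.0, upper: float = 5.0,
                seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `objective` with a crow search using a time-varying fl."""
    rng = random.Random(seed)
    positions = [[rng.uniform(lower, upper) for _ in range(dim)]
                 for _ in range(n_crows)]
    memory = [p[:] for p in positions]            # best position seen per crow
    memory_fit = [objective(p) for p in positions]

    for t in range(max_iter):
        fl = linear_flight_length(t, max_iter)    # replaces the fixed fl
        for i in range(n_crows):
            j = rng.randrange(n_crows)            # crow i follows crow j
            if rng.random() >= ap:
                # Crow j is unaware: move toward its remembered position.
                r = rng.random()
                candidate = [x + r * fl * (m - x)
                             for x, m in zip(positions[i], memory[j])]
            else:
                # Crow j is aware: crow i is sent to a random position.
                candidate = [rng.uniform(lower, upper) for _ in range(dim)]
            # Accept only candidates that stay inside the search bounds.
            if all(lower <= x <= upper for x in candidate):
                positions[i] = candidate
                fit = objective(candidate)
                if fit < memory_fit[i]:
                    memory[i] = candidate[:]
                    memory_fit[i] = fit

    best = min(range(n_crows), key=lambda k: memory_fit[k])
    return memory[best], memory_fit[best]


def sphere(x: List[float]) -> float:
    """Simple test objective with its minimum at the origin."""
    return sum(v * v for v in x)


best_position, best_value = crow_search(sphere, dim=2)
print(best_position, best_value)
```

A larger fl early on favors exploration, because a crow can overshoot the position it is following, while a smaller fl later favors local refinement. That trade-off is the usual rationale for a decaying schedule.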


Application Of Sentiment Analysis And Machine Learning Techniques To Predict Daily Cryptocurrency Price Returns, Edward Wu 2023 Claremont Colleges

CMC Senior Theses

This paper examines the effects of social media sentiment relating to Bitcoin on the daily price returns of Bitcoin and other popular cryptocurrencies by utilizing sentiment analysis and machine learning techniques to predict daily price returns. Many investors think that social media sentiment affects cryptocurrency prices. However, the results of this paper find that social media sentiment relating to Bitcoin does not add significant predictive value to forecasting daily price returns for each of the six cryptocurrencies used for analysis and that machine learning models that do not assume linearity between the current day price return and previous daily price …


Defining The "Quadruple-A" Player: What Makes A Baseball Player Succeed In The Minor Leagues And Fail In The Major Leagues?, Sam Bogen 2023 Claremont Colleges

CMC Senior Theses

The "Quadruple-A" player is defined as one who is too good to play in Triple-A (the league one step down from Major League Baseball) but not good enough to play consistently in Major League Baseball. This thesis paper attempts to explain the phenomenon of the "Quadruple-A" player. Using Triple-A data from 2013-2022 and Major League data from the "Statcast Era" (2015-2022), I build logistic and linear regression models to predict Major League success based on Triple-A performance data as well as Major League Statcast data, discovering that statistics related to how a player hits the ball such as the speed …


Integrated Machine Learning And Optimization Approaches, Dogacan Yilmaz 2022 New Jersey Institute of Technology

Dissertations

This dissertation focuses on the integration of machine learning and optimization. Specifically, novel machine learning-based frameworks are proposed to help solve a broad range of well-known operations research problems to reduce the solution times. The first study presents a bidirectional Long Short-Term Memory framework to learn optimal solutions to sequential decision-making problems. Computational results show that the framework significantly reduces the solution time of benchmark capacitated lot-sizing problems without much loss in feasibility and optimality. Also, models trained using shorter planning horizons can successfully predict the optimal solution of the instances with longer planning horizons. For the hardest data set, …


Digital Technology Enables Construction Of National Governance Modernization, Yue HAO, Kaihua CHEN, Jin KANG, Xiaoguang YANG, Chao ZHANG, Xiaolong ZHENG 2022 Xidian University, Xi'an 710126, China

Bulletin of Chinese Academy of Sciences (Chinese Version)

As digital technologies continue to be integrated into the whole process of economic and social development, promoting the modernization of digital technology-enabled national governance systems and capabilities has become an important way to seize the strategic initiative in the future world competitive landscape, and has attracted the attention of countries around the world. The rapid development of digital technologies such as big data collection, storage, processing, and analysis is constantly optimizing the organizational system structure of national governance, upgrading and perfecting the quality and methods of national governance personnel, and accelerating the process of making national governance efficient, scientific, intelligent …


Big Data Technology Enabling Legal Supervision, Qingjie LIU, Shuo LIU, Yirong WU, Yueqiang WENG, Yihao WEN, Ming LI 2022 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China

Bulletin of Chinese Academy of Sciences (Chinese Version)

Legal supervision plays an important role in the national governance system and capacity. In the era of digital revolution, the rapid development of digital procuratorial work with big data legal supervision as the core promotes the reshaping of the legal supervision and governance system. In this study, the inherent need of legal supervision for active prosecution in the new era, and the innovative role of new public interest litigation in comprehensive social governance, are first analyzed. Then, the core meaning and reshaping role of big-data-enabling-legal-supervision and supervision-promoting-national-governance of digital prosecution are discussed. After summarizing the practical experiences and challenges of big …


Deepening Digital Technologies To Enable Modernization Of China’s Governance Of Health, Tara Qia SUN, Xia FENG, Yuntao LONG, Zongben XU 2022 School of Public Policy and Management, University of Chinese Academy of Sciences, Beijing 100049, Chin

Bulletin of Chinese Academy of Sciences (Chinese Version)

One significant goal of science and technology innovation is to set our sights on the health and safety of the people. The rapid development of digital technologies offers multiple potential paths to achieve the modernization of China's health governance. First, we examine the role of digital technologies in enabling multiple stakeholders (i.e., hospitals, doctors, government, and social groups) to improve the supply capacity, inclusiveness, fairness, friendliness, and convenience of health services. Second, we explore the four key issues of using digital technologies to enable the governance of health: construction of digital health infrastructures, the factors affecting the adoption of digital technologies, …


Strengthen Fundamental Role Of Data Element Governance In National Governance Modernization, Kaihua CHEN, Zhuo FENG, Rui GUO, Yue HAO, Jin KANG, Xiaoguang YANG, Chao ZHANG, Binbin ZHAO 2022 School of Public Policy and Management, University of Chinese Academy of Sciences, Beijing 100049, China Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190, China

Bulletin of Chinese Academy of Sciences (Chinese Version)

Data element governance is a key factor to promote the modernization of national governance in the digital era. By strengthening the deep integration of data factors and national governance, a new model of data-driven national governance can be formed, and the national governance can be made more scientific, refined, intelligent, and efficient. The US and European countries have continuously strengthened the top-level system design, technological innovation application, collaborative governance mechanism, and global governance cooperation of data element governance, which has effectively improved the level of data element governance and provided experience for China. Nevertheless, due to the virtuality of data …


Strategic Perspective Of Leveraging New Generation Information Technology To Enable Modernization Of Emergency Management, Haibo ZHANG, Xinyu DAI, Depei QIAN, Jian LYU 2022 School of Government, Nanjing University, Nanjing 210023, China

Bulletin of Chinese Academy of Sciences (Chinese Version)

The application and development of new generation information technology is a vital support for realizing the modernization of emergency management. At present, new generation information technologies such as big data and artificial intelligence have been widely used in natural disasters, safe production, and other fields. They have improved the monitoring and early warning, regulation and law enforcement, command and decision support, rescue, and social mobilization capabilities of governments, promoted the level of intrinsic safety of enterprises, provided important support for the precise prevention and control of COVID-19, and increased the efficiency of China’s emergency management and sense …


Digital Technology Enables Modernization Of National Statistics, Zongben XU, Yanyun ZHAO, Liping ZHU, Guang CHEN, Hongyun ZHANG 2022 School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, China

Bulletin of Chinese Academy of Sciences (Chinese Version)

The modernization of national statistics is part of the modernization of national governance. Digital technology has provided power for the transformation of statistical production mode, the improvement of statistical productivity, and the reconstruction of statistical production relations. Digital technology has become an important prerequisite for the promotion of statistical modernization reform. This study summarizes the international experience of digital technology enabling government statistics, the top-level design of national statistical legal system, and the importance of digital technology in promoting the modernization of statistics. This study also analyzes the main challenges existing in the current national statistics and data work. Finally, …


Fairness And Privacy In Machine Learning Algorithms, Neha Bhargava 2022 Kennesaw State University

Master of Science in Computer Science Theses

Roughly 2.5 quintillion bytes of data are generated daily in this digital era. Manual processing of such huge amounts of data to extract useful information is nearly impossible, but machine learning algorithms, with their ability to process enormous data in a fast, cost-effective, and scalable way, have proven to be a preferred choice to glean useful insights and solve business problems in many domains. With this widespread use of machine learning algorithms, there have always been concerns about the ethical issues that may arise from the use of this modern technology. While achieving high accuracies, …

