- Institution
- City University of New York (CUNY) (5)
- Wright State University (5)
- New Jersey Institute of Technology (2)
- Nova Southeastern University (2)
- University of Kentucky (2)
- Florida International University (1)
- Kennesaw State University (1)
- Louisiana State University (1)
- Munster Technological University (1)
- Old Dominion University (1)
- The University of Southern Mississippi (1)
- University of Louisville (1)
- University of Massachusetts Amherst (1)
- University of Massachusetts Boston (1)
- University of Nebraska at Omaha (1)
- University of New Haven (1)
- University of New Orleans (1)
- Walden University (1)
- Keyword
- Bioinformatics (8)
- Gene-sequencing file formats (4)
- Perl (4)
- Machine learning (3)
- Alignment (2)
- Computational biology (2)
- Deep learning (2)
- Gene ontology (2)
- INDELs (2)
- Linux file system (2)
- Machine Learning (2)
- SNPs (2)
- Variant Calling (2)
- 1.2 COMPUTER AND INFORMATION SCIENCE (1)
- 1.6 BIOLOGICAL SCIENCES (1)
- Ab Initio Protein Structure Prediction (1)
- Artificial intelligence (1)
- Betti Numbers (1)
- Big data (1)
- Biomedical Relation Prediction (1)
- C-SSRS (1)
- Cancer (1)
- Cause of Death (1)
- Cloud computing (1)
- Clustering (1)
- Cognitive rehabilitation (1)
- Computational Drug Repositioning (1)
- Computational model (1)
- Conformational Ensemble Generator (1)
- Crisis Management (1)
- Publication
- Kno.e.sis Publications (5)
- Open Educational Resources (5)
- CCE Theses and Dissertations (2)
- Dissertations (2)
- Master's Theses (2)
- Computer Science Theses & Dissertations (1)
- Department of Computer Science Publications (1)
- Doctoral Dissertations (1)
- Electronic Theses and Dissertations (1)
- FIU Electronic Theses and Dissertations (1)
- Graduate Doctoral Dissertations (1)
- Kentucky Injury Prevention and Research Center Faculty Publications (1)
- LSU Doctoral Dissertations (1)
- Master of Science in Computer Science Theses (1)
- Theses and Dissertations--Computer Science (1)
- UNO Student Research and Creative Activity Fair (1)
- University of New Orleans Theses and Dissertations (1)
- Walden Dissertations and Doctoral Studies (1)
Articles 1 - 29 of 29
Full-Text Articles in Computer Sciences
Cancer Risk Prediction With Whole Exome Sequencing And Machine Learning, Abdulrhman Fahad M Aljouie
Dissertations
Accurate cancer risk and survival time prediction are important problems in personalized medicine, where disease diagnosis and prognosis are tailored to individuals based on their genetic material. Cancer risk prediction supports informed decisions about regular screening, which helps detect disease at an early stage and therefore increases the probability of successful treatment. Cancer risk prediction is a challenging problem: lifestyle, environment, family history, and genetic predisposition are some of the factors that influence disease onset. Cancer risk prediction based on predisposing genetic variants has been studied extensively. Most studies have examined the predictive ability of variants in known …
Game-Assisted Rehabilitation For Post-Stroke Survivors, Hee-Tae Jung
Doctoral Dissertations
Stroke is a leading cause of permanent impairment among its survivors. Although patients need to go through intensive, longitudinal rehabilitation to regain the function they had before the stroke, they often show poor engagement in and adherence to rehabilitation therapies, which hampers their recovery. As a means to enhance stroke survivors' motivation, engagement, and adherence to intensive and longitudinal rehabilitation, the use of games in stroke rehabilitation has received attention from the research and clinical communities. To realize this, it is important to take a holistic, end-to-end research approach that encompasses 1) the development of game technologies that are not only entertaining but also …
Enhancing Timeliness Of Drug Overdose Mortality Surveillance: A Machine Learning Approach, Patrick J. Ward, Peter J. Rock, Svetla Slavova, April M. Young, Terry L. Bunn, Ramakanth Kavuluru
Kentucky Injury Prevention and Research Center Faculty Publications
BACKGROUND: Timely data is key to effective public health responses to epidemics. Drug overdose deaths are identified in surveillance systems through ICD-10 codes present on death certificates. ICD-10 coding takes time, but free-text information is available on death certificates prior to ICD-10 coding. The objective of this study was to develop a machine learning method to classify free-text death certificates as drug overdoses to provide faster drug overdose mortality surveillance.
METHODS: Using 2017–2018 Kentucky death certificate data, free-text fields were tokenized and features were created from these tokens using natural language processing (NLP). Word, bigram, and trigram features were created …
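The word, bigram, and trigram features described above can be sketched in a few lines of Python. This is a minimal illustration, assuming lowercase tokens split on non-alphanumeric characters; the study's actual tokenizer, feature weighting, and classifier are not described in this excerpt, and `tokenize` and `ngram_features` are hypothetical names.

```python
from __future__ import annotations

import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on runs of non-alphanumeric characters (an assumption)."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def ngram_features(text: str, max_n: int = 3) -> Counter:
    """Count word (n=1), bigram (n=2) and trigram (n=3) features in one free-text field."""
    tokens = tokenize(text)
    features: Counter = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the field.
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


if __name__ == "__main__":
    print(ngram_features("Acute fentanyl intoxication; acute fentanyl toxicity").most_common(3))
```

Each death certificate then becomes a sparse bag of n-gram counts that a standard text classifier could consume.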
Texture-Based Deep Neural Network For Histopathology Cancer Whole Slide Image (WSI) Classification, Nelson Zange Tsaku
Master of Science in Computer Science Theses
Automatic analysis of histopathological Whole Slide Images (WSIs) for cancer classification has gained attention alongside advancements in microscopic imaging techniques. However, manual examination and diagnosis with WSIs is time-consuming and tiresome. Recently, deep convolutional neural networks have succeeded in histopathological image analysis. In this paper, we propose a novel cancer texture-based deep neural network (CAT-Net) that learns scalable texture features from histopathological WSIs. The innovation of CAT-Net is twofold: (1) capturing invariant spatial patterns with dilated convolutional layers and (2) reducing model complexity while improving performance. Moreover, CAT-Net can provide discriminative texture patterns formed on cancerous regions of histopathological …
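CAT-Net's dilated convolutions operate on 2-D image data; the 1-D sketch below only illustrates why dilation helps. It assumes stride 1, no padding, and cross-correlation (the convention in most deep learning frameworks). The function names are hypothetical, and this is not CAT-Net's architecture.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D cross-correlation with dilation.

    Tap j of the kernel reads signal[i + j * dilation], so a k-tap kernel spans
    (k - 1) * dilation + 1 input positions without adding any weights.
    """
    span = (len(kernel) - 1) * dilation + 1
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of stride-1 convolutions with the given dilations."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

With a 3-tap kernel, three stacked layers with dilations 1, 2 and 4 cover 15 input positions, versus 7 without dilation, using the same number of weights.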
Effective Statistical Energy Function Based Protein Un/Structure Prediction, Avdesh Mishra
University of New Orleans Theses and Dissertations
Proteins are an important component of living organisms, composed of one or more polypeptide chains, each containing hundreds or even thousands of amino acids of 20 standard types. The structure a protein adopts from its sequence determines crucial functions such as initiating metabolic reactions, DNA replication, cell signaling, and transporting molecules. In the past, proteins were considered to always have a well-defined stable shape (structured proteins); however, it has recently been shown that there exist intrinsically disordered proteins (IDPs), which lack a fixed or ordered 3D structure, have dynamic characteristics, and therefore exist in multiple states. Based on …
Designing And Sample Size Calculation In Presence Of Heterogeneity In Biological Studies Involving High-Throughput Data., Sudhir Srivastava
Electronic Theses and Dissertations
Experimental design and sample-size determination are important for conducting high-throughput biological experiments such as proteomics experiments and RNA-Seq expression studies, and thus for better understanding the complex mechanisms underlying various biological processes. Variation in the biological data or in the technical approaches to data collection leads to heterogeneity among the samples under study. We examined the issues of technical and biological heterogeneity in depth. Quantitative measurements based on liquid chromatography (LC) coupled with mass spectrometry (MS) often suffer from missing values (MVs) and data heterogeneity. We considered a proteomics data set generated from human kidney …
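As a point of reference for the sample-size question, and not the heterogeneity-aware method of this dissertation (which the excerpt does not describe), the textbook per-group sample size for detecting a mean difference δ between two groups with common standard deviation σ is n = 2(z_{1-α/2} + z_{1-β})² σ²/δ². A minimal Python sketch, assuming normally distributed measurements and a two-sided test; the function name is hypothetical:

```python
import math
from statistics import NormalDist


def per_group_sample_size(delta, sigma, alpha=0.05, power=0.80):
    """Samples per group for a two-sided, two-sample z-test of a mean difference delta.

    n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2, rounded up.
    Assumes equal group sizes and a known common standard deviation sigma.
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)
```

For δ = σ, α = 0.05 and 80% power this gives 16 samples per group, and halving δ roughly quadruples n. High-throughput studies typically also tighten α to account for the many features tested at once.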
High Performance Computing Techniques To Better Understand Protein Conformational Space, Arpita Joshi
Graduate Doctoral Dissertations
This thesis presents an amalgamation of high performance computing techniques to gain better insight into protein molecular dynamics. Key aspects of protein function and dynamics can be learned from their conformational space. Datasets that represent the complex nuances of a protein molecule are high dimensional, so efficient dimensionality reduction becomes indispensable for analyzing such large datasets. Dimensionality reduction forms a substantial portion of this work, and its application has been explored for other datasets as well. The work begins with the parallelization of a known non-linear feature reduction algorithm called Isomap. The code for the algorithm was re-written in C …
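Isomap replaces Euclidean distances with geodesic distances measured along a k-nearest-neighbour graph and then embeds those distances with classical multidimensional scaling. The serial Python sketch below covers only the first two steps (graph construction and Floyd-Warshall shortest paths). It is an illustration, not the thesis's parallel C implementation, and it assumes distinct input points; the function names are hypothetical.

```python
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph as a dense matrix of edge lengths (inf = no edge).

    Assumes distinct points, so each point's closest entry (distance 0) is itself.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        neighbours = sorted(range(n), key=lambda j: dist[i][j])[1:k + 1]
        for j in neighbours:
            graph[i][j] = graph[j][i] = dist[i][j]
    return graph


def geodesic_distances(graph):
    """All-pairs shortest paths (Floyd-Warshall), approximating geodesic distances."""
    n = len(graph)
    d = [row[:] for row in graph]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return d
```

On points lying along an L-shaped curve, the geodesic distance between the two ends follows the curve, while the straight-line distance cuts the corner. The O(n³) shortest-path step is one reason Isomap is a natural target for parallelization.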
IAMHAPPY: Towards An IoT Knowledge-Based Cross-Domain Well-Being Recommendation System For Everyday Happiness, Amelia Gyrard, Amit Sheth
Kno.e.sis Publications
Nowadays, healthy lifestyle, fitness, and diet habits have become central applications in our daily life. Positive psychological states such as well-being and happiness are what people ultimately aspire to in their everyday lives (even without being aware of it). Wearable devices are increasingly employed to support well-being and fitness. These devices produce physiological signals that machines analyze to understand emotions and physical state. Internet of Things (IoT) technology connects (wearable) devices to the Internet to easily access and process data, even using Web technologies (aka the Web of Things).
We design IAMHAPPY, an innovative IoT-based well-being recommendation system to encourage every …
High-Performance Computing Frameworks For Large-Scale Genome Assembly, Sayan Goswami
LSU Doctoral Dissertations
Genome sequencing technology has witnessed tremendous progress in throughput and cost per base pair, resulting in an explosion in the size of data. Typical de Bruijn graph-based assembly tools demand a lot of processing power and memory and cannot assemble big datasets unless running on a scaled-up server with terabytes of RAM or a scaled-out cluster with several dozen nodes. In the first part of this work, we present a distributed next-generation sequencing (NGS) assembler called Lazer that achieves both scalability and memory efficiency by using partitioned de Bruijn graphs. By enhancing the memory-to-disk swapping and reducing the …
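In a de Bruijn graph, each k-mer in a read contributes an edge from its (k-1)-prefix to its (k-1)-suffix. The sketch below builds such a graph and splits it across workers by hashing each source node. Hash partitioning is an assumption made for illustration; the excerpt does not say how Lazer partitions its graph, and all function names are hypothetical.

```python
import zlib
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it, one edge per k-mer occurrence."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def partition_of(node, num_partitions):
    """Deterministic hash partitioning, so every worker maps a node to the same partition."""
    return zlib.crc32(node.encode()) % num_partitions


def partitioned_graph(reads, k, num_partitions):
    """Split the adjacency lists into num_partitions disjoint pieces keyed by source node."""
    parts = [defaultdict(list) for _ in range(num_partitions)]
    for src, dsts in de_bruijn_edges(reads, k).items():
        parts[partition_of(src, num_partitions)][src].extend(dsts)
    return parts
```

CRC32 is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would send the same node to different partitions on different workers.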
Model-Based Deep Autoencoders For Characterizing Discrete Data With Application To Genomic Data Analysis, Tian Tian
Dissertations
Deep learning techniques have achieved tremendous success in a wide range of real applications in recent years. For dimension reduction, deep neural networks (DNNs) provide a natural choice for parameterizing a non-linear function that maps the original high-dimensional data to a lower-dimensional latent space. An autoencoder is a kind of DNN used to learn efficient feature representations in an unsupervised manner. Deep autoencoders have been widely explored and applied to the analysis of continuous data, but they remain understudied for characterizing discrete data. This dissertation focuses on developing model-based deep autoencoders for modeling discrete data. A motivating example of …
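A model-based autoencoder replaces the usual squared-error reconstruction loss with the negative log-likelihood of a distribution suited to the data. For count data, the simplest such choice is the Poisson distribution; the sketch below assumes Poisson for illustration only, since the excerpt does not say which distributions the dissertation uses. In a full model, `mu` would be the decoder's (positive) output, and `poisson_nll` is a hypothetical name.

```python
import math


def poisson_nll(x, mu, eps=1e-8):
    """Negative log-likelihood of observed counts x under Poisson means mu, summed over entries.

    Per entry: mu - x * log(mu) + log(x!), where log(x!) = lgamma(x + 1).
    """
    total = 0.0
    for xi, mi in zip(x, mu):
        mi = max(mi, eps)  # guard against log(0) for a zero predicted mean
        total += mi - xi * math.log(mi) + math.lgamma(xi + 1)
    return total
```

Unlike squared error, this loss scales the penalty to the predicted mean, which matters for sparse, over-dispersed counts such as sequencing reads.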
Simplicity DiffExpress: A Bespoke Cloud-Based Interface For RNA-Seq Differential Expression Modeling And Analysis, Cintia C. Palu, Marcelo Ribeiro-Alves, Yanxin Wu, Brendan Lawlor, Pavel V. Baranov, Brian Kelly, Paul Walsh
Department of Computer Science Publications
One of the key challenges for transcriptomics-based research is not only processing large data but also modeling the complexity of features that are sources of variation across samples, which is required for accurate statistical analysis. Therefore, our goal is to give wet-lab researchers access to bioinformatics tools, enhancing their ability to explore biological aspects and validate hypotheses with robust analysis. In this context, user-friendly interfaces can enable researchers to apply computational biology methods without requiring bioinformatics expertise. Such bespoke platforms can improve the quality of the findings by allowing the researcher to …
Designing Computational Biology Workflows With Perl - Part 1, Esma Yildirim
Open Educational Resources
This material introduces the structure of the Linux file system and demonstrates how to use commands to communicate with the operating system through a Terminal program. Basic Perl program structure and Perl's system() function are discussed. A brief introduction to gene-sequencing terminology and file formats is given.
Designing Computational Biology Workflows With Perl - Part 1, Esma Yildirim
Open Educational Resources
This material introduces the AWS console interface and describes how to create an instance on AWS with the provided VMI and connect to that machine instance using the SSH protocol. Once connected, students are required to write a script that enters the data folder, which contains gene-sequencing input files, and prints the first five lines of each file remotely. The same exercise can be done if the VMI is installed on a local machine using virtualization software (e.g., Oracle VirtualBox). In this case, the Terminal program of the VMI can be used for the exercise.
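The exercise's core task (print the first five lines of each input file) is sketched below in Python rather than the course's Perl. The function name, the folder name `data` in the usage example, and the assumption that the input files are uncompressed text are all hypothetical.

```python
import os
import sys


def head_of_each_file(folder, n=5, out=None):
    """Print the first n lines of every regular file in folder, in name order.

    Assumes plain-text files; gzip-compressed FASTQ would need the gzip module.
    """
    if out is None:
        out = sys.stdout
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories
        print(f"==> {name} <==", file=out)  # header line, in the style of `head`
        with open(path, errors="replace") as handle:
            for line_number, line in enumerate(handle):
                if line_number >= n:
                    break
                print(line.rstrip("\n"), file=out)
```

For example, `head_of_each_file("data")` prints each file name followed by its first five lines, similar to running `head -n 5` on every file in the folder.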
Designing Computational Biology Workflows With Perl - Part 2, Esma Yildirim
Open Educational Resources
This material introduces the AWS console interface and describes how to create an instance on AWS with the provided VMI and connect to that machine instance using the SSH protocol. Once connected, students are required to write a script that automates creating VCF files for two different E. coli sample genomes from the FASTA and FASTQ files in the input folder of the virtual machine. The same exercise can be done if the VMI is installed on a local machine using virtualization software (e.g., Oracle VirtualBox). In this case, the Terminal program of the …
Designing Computational Biology Workflows With Perl - Part 2, Esma Yildirim
Open Educational Resources
This material briefly reintroduces the DNA double helix structure, explains SNP and INDEL mutations in genes, and describes the FASTA, FASTQ, BAM, and VCF file formats. It also explains the index creation, alignment, sorting, duplicate marking, and variant calling steps of a simple preprocessing workflow, and how to write a Perl script that automates the execution of these steps on a virtual machine image.
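The workflow steps listed above (index creation, alignment, sorting, duplicate marking, variant calling) can be chained by a script. The course does this in Perl; the Python sketch below is an analogue that builds the shell commands and, unless `dry_run` is set, runs them in order and stops at the first failure, much as one would check the return value of Perl's `system()`. The specific tools and flags (BWA, SAMtools, Picard, BCFtools) are assumptions, not necessarily those used in the material, and the function names and file-naming scheme are hypothetical.

```python
import shlex
import subprocess


def sample_commands(ref, sample, fastqs):
    """Commands for one sample: align, sort, mark duplicates, call variants.

    Tools and flags are illustrative assumptions (BWA, SAMtools, Picard, BCFtools).
    """
    q = shlex.quote
    reads = " ".join(q(f) for f in fastqs)  # one FASTQ (single-end) or two (paired-end)
    sam, bam = q(sample + ".sam"), q(sample + ".sorted.bam")
    dedup, vcf = q(sample + ".dedup.bam"), q(sample + ".vcf")
    return [
        f"bwa mem {q(ref)} {reads} > {sam}",
        f"samtools sort -o {bam} {sam}",
        f"java -jar picard.jar MarkDuplicates I={bam} O={dedup} M={q(sample + '.dup_metrics.txt')}",
        f"bcftools mpileup -f {q(ref)} {dedup} | bcftools call -mv -Ov -o {vcf}",
    ]


def run_workflow(ref, samples, dry_run=True):
    """Index the reference once, then process each (sample_name, fastq_list) pair.

    With dry_run=True the commands are only returned. Otherwise each runs through
    the shell and check=True raises CalledProcessError at the first non-zero exit
    status (for a pipeline, the shell reports the status of its last command).
    """
    commands = [f"bwa index {shlex.quote(ref)}", f"samtools faidx {shlex.quote(ref)}"]
    for sample, fastqs in samples:
        commands += sample_commands(ref, sample, fastqs)
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, shell=True, check=True)
    return commands
```

For two samples, one might call `run_workflow("ref.fa", [("sample1", ["s1_R1.fq", "s1_R2.fq"]), ("sample2", ["s2_R1.fq", "s2_R2.fq"])])` to inspect the commands before executing them with `dry_run=False`.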
GOGO: An Improved Algorithm To Measure The Semantic Similarity Between Gene Ontology Terms, Chenguang Zhao
Master's Theses
Measuring the semantic similarity between Gene Ontology (GO) terms is an essential step in functional bioinformatics research. We implemented a software tool named GOGO for calculating the semantic similarity between GO terms. GOGO combines the advantages of both information-content-based and hybrid methods, such as Resnik’s and Wang’s methods. Moreover, GOGO is relatively fast and does not need to calculate information content (IC) from a large gene annotation corpus, yet it retains the advantage of using IC. This is achieved by considering the number of child nodes in the GO directed acyclic graphs when calculating the semantic contribution of an ancestor node …
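GOGO's abstract names Wang's method as one of its foundations. The sketch below implements Wang's original scheme: each ancestor's semantic contribution to a query term decays by an edge weight (0.8 for is_a, 0.6 for part_of) along the strongest path, and similarity compares the contributions of shared ancestors. GOGO's child-count-based adjustment of these weights is not reproduced, since its formula is not given in this excerpt; the toy DAG and function names are hypothetical.

```python
# Wang et al. edge weights; GOGO adjusts them using child counts (not reproduced here).
WANG_WEIGHTS = {"is_a": 0.8, "part_of": 0.6}


def s_values(term, parents, weights=WANG_WEIGHTS):
    """Semantic contribution S(t) of term and each of its ancestors t.

    parents maps a GO term to a list of (parent_term, relation) pairs.
    S(term) = 1; S(t) = max over children c of t (within term's ancestor DAG) of w * S(c).
    """
    s = {term: 1.0}
    stack = [term]
    while stack:
        child = stack.pop()
        for parent, relation in parents.get(child, []):
            candidate = weights[relation] * s[child]
            if candidate > s.get(parent, 0.0):  # keep the strongest path; re-propagate if improved
                s[parent] = candidate
                stack.append(parent)
    return s


def wang_similarity(a, b, parents, weights=WANG_WEIGHTS):
    """Sum of shared-ancestor contributions divided by the two terms' total semantic values."""
    sa, sb = s_values(a, parents, weights), s_values(b, parents, weights)
    shared = sa.keys() & sb.keys()
    return sum(sa[t] + sb[t] for t in shared) / (sum(sa.values()) + sum(sb.values()))
```

For sibling terms C and D that share parent A under root R, the shared contributions of A and R give a similarity of 2.88 / 4.88, or about 0.59.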
Designing Computational Biology Workflows With Perl - Part 1 & 2, Esma Yildirim
Open Educational Resources
This manual guides the instructor through combining the partial files of the virtual machine image to construct the sequencer.ova file. It is accompanied by the partial files of the virtual machine image.
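Recombining a split image amounts to concatenating the parts in order. A minimal Python sketch, assuming the parts share a prefix matched by a pattern such as `sequencer.ova.part*` with zero-padded suffixes, so that name order equals part order; the actual part names in the manual are not given here, and `join_parts` is a hypothetical name.

```python
import glob
import shutil


def join_parts(pattern, output_path, chunk_size=1 << 20):
    """Concatenate the files matching pattern, in sorted name order, into output_path.

    Returns the list of parts used. The output path must not itself match pattern.
    """
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out, chunk_size)  # stream in 1 MiB chunks
    return parts
```

On Linux, `cat` with a shell glob achieves the same result.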
A Machine Learning Technology For Rapid Detection Of Carbon Nanotubes/DNA Hybridization In Biosensor Healthcare Applications, Steven K. Ang
Master's Theses
In molecular biology, the term “DNA hybridization” generally refers to the process of forming a double-stranded nucleic acid by joining two complementary strands of DNA. The degree of genetic similarity of the DNA resulting from hybridization can be detected either by using the chemical characteristics of DNA samples or by utilizing reliable biosensors that transform the chemical characteristics into a source of electrical measurements. In past research on such sensors, known as DNA Hybridization Detection Systems, the thermal and electrical characteristics of carbon nanotubes are utilized to detect whether hybridization takes place. However, human interpretation of …
Highly Accurate Fragment Library For Protein Fold Recognition, Wessam Elhefnawy
Computer Science Theses & Dissertations
Proteins play a crucial role in living organisms as they perform many vital tasks in every living cell. Knowledge of protein folding has a deep impact on understanding the heterogeneity and molecular functions of proteins. Such information leads to crucial advances in drug design and disease understanding. Fold recognition is a key step in the protein structure discovery process, especially when traditional computational methods fail to yield convincing structural homologies. In this work, we present a new protein fold recognition approach using machine learning and data mining methodologies.
First, we identify a protein structural fragment library (Frag-K) composed of a …
Computational Analysis Of Large-Scale Trends And Dynamics In Eukaryotic Protein Family Evolution, Joseph Boehm Ahrens
FIU Electronic Theses and Dissertations
The myriad protein-coding genes found in present-day eukaryotes arose from a combination of speciation and gene duplication events, spanning more than one billion years of evolution. Notably, as these proteins evolved, the individual residues at each site in their amino acid sequences were replaced at markedly different rates. The relationship between protein structure, protein function, and site-specific rates of amino acid replacement is a topic of ongoing research. Additionally, there is much interest in the different evolutionary constraints imposed on sequences related by speciation (orthologs) versus sequences related by gene duplication (paralogs). A principal aim of this dissertation is to …
Data Analytics Pipeline For RNA Structure Analysis Via SHAPE, Quinn Nelson
UNO Student Research and Creative Activity Fair
Coxsackievirus B3 (CVB3) is a cardiovirulent enterovirus from the family Picornaviridae. The RNA genome houses an internal ribosome entry site (IRES) in the 5’ untranslated region (5’UTR) that enables cap-independent translation. Ample evidence suggests that the structure of the 5’UTR is a critical element for virulence. We probe RNA structure in solution using base-specific modifying agents such as dimethyl sulfate as well as backbone targeting agents such as N-methylisatoic anhydride used in Selective 2’-Hydroxyl Acylation Analyzed by Primer Extension (SHAPE). We have developed a pipeline that merges and evaluates base-specific and SHAPE data together with statistical analyses that provides confidence …
Question Answering For Suicide Risk Assessment Using Reddit, Amanuel Alambo, Usha Lokala, Ugur Kursuncu, Krishnaprasad Thirunarayan, Amelia Gyrard, Randon S. Welton, Jyotishman Pathak, Amit P. Sheth
Kno.e.sis Publications
Mental Health America designed ten questionnaires that are used to determine the risk of mental disorders. They are also commonly used by Mental Health Professionals (MHPs) to assess suicidality. Specifically, the Columbia Suicide Severity Rating Scale (C-SSRS), a widely used suicide assessment questionnaire, helps MHPs determine the severity of suicide risk and offer appropriate treatment. A major challenge in suicide treatment is the social stigma that makes patients reluctant to discuss their conditions with an MHP, which leads to inaccurate assessment and treatment. On the other hand, the same patient is comfortable freely discussing his/her mental …
Gene Ontology-Guided Force-Directed Visualization Of Protein Interaction Networks, James Lowell King
CCE Theses and Dissertations
Protein interaction data is being generated at unprecedented rates thanks to advancements in high-throughput techniques such as mass spectrometry and DNA microarrays. Biomedical researchers, operating under budgetary constraints, have found it difficult to scale their efforts to keep up with the ever-increasing amount of available data. They often lack the resources and manpower required to analyze the data using existing methodologies. These research deficiencies impede our ability to understand diseases, delay the advancement of clinical therapeutics, and ultimately cost lives.
One of the most commonly used techniques to analyze protein interaction data is the construction and visualization of …
Citationally Enhanced Semantic Literature Based Discovery, John David Fleig
CCE Theses and Dissertations
We are living in the age of information. The ever-increasing flow of data and publications poses a monumental bottleneck to scientific progress because, despite its remarkable abilities, the human mind is woefully inadequate at processing such a vast quantity of multidimensional information. The small bits of flotsam and jetsam that we leverage belie the amount of useful information beneath the surface. It is imperative that automated tools exist to better search, retrieve, and summarize this content. Combinations of document indexing and search engines can quickly find you a document whose content best matches your query - if …
Empathi: An Ontology For Emergency Managing And Planning About Hazard Crisis, Manas Gaur, Kaeedeh Shekarpour, Amelia Gyrard, Amit P. Sheth
Kno.e.sis Publications
In the domain of emergency management during hazard crises, having sufficient situational awareness information is critical. It requires capturing and integrating information from sources such as satellite images, local sensors and social media content generated by local people.
A major obstacle to capturing, representing, and integrating such heterogeneous and diverse information is the lack of an ontology that properly conceptualizes this domain and aggregates and unifies datasets. Thus, in this paper, we introduce the empathi ontology, which conceptualizes the core concepts describing the domain of emergency managing and planning of hazard crises.
Although empathi has a coarse-grained view, it considers the necessary …
Adaptive Knowledge Networks: A Time Capsule, Swati Padhee, Anurag Illendula, Amit Sheth, Krishnaprasad Thirunarayan, Valerie L. Shalin
Kno.e.sis Publications
❖ Real-world events are dynamic in nature:
  - Periodic events, e.g., the US Presidential Election
  - Non-periodic events, e.g., Cyclone Idai
❖ Need for real-time predictive analysis, trend analysis, spatio-temporal decision making, and public opinion analysis of events.
❖ The current state of the art curates dynamic knowledge graphs from structured text.
❖ We propose creating an Adaptive Knowledge Network from incoming real-time multimodal spatio-temporally evolving data.
Relation Prediction Over Biomedical Knowledge Bases For Drug Repositioning, Mehmet Bakal
Theses and Dissertations--Computer Science
Identifying new potential treatment options for medical conditions that cause human disease burden is a central task of biomedical research. Since not all candidate drugs can be tested in animal and clinical trials, in vitro approaches are first attempted to identify promising candidates. Likewise, identifying other essential relations (e.g., causation, prevention) between biomedical entities is also critical to understand biomedical processes. Hence, it is crucial to develop automated relation prediction systems that can yield plausible biomedical relations to expedite the discovery process. In this dissertation, we demonstrate three approaches to predict treatment relations between biomedical entities for the drug repositioning task …
Automatic Identification Of Individual Drugs In Death Certificates, Soon Jye Kho, Amit Sheth, Olivier Bodenreider
Kno.e.sis Publications
Background:
Establishing trends in drug overdoses requires identifying the individual drugs mentioned in death certificates, which is not supported by International Classification of Diseases coding. However, identifying drug mentions in the literal portion of death certificates remains challenging due to the variability of drug names.
Objectives:
To automatically identify individual drugs in death certificates.
Methods:
We use RxNorm to collect variants for drug names (generic names, synonyms, brand names) and we algorithmically generate common misspellings. We use this automatically compiled list to identify drug mentions from 703,106 death certificates and compare the performance of our automated approach to that of …
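The lookup idea can be sketched as follows: expand each canonical drug name with its known variants, add automatically generated misspellings, and match tokens in the free text against the resulting lexicon. Here the variants are supplied by hand rather than fetched from RxNorm, and misspellings are limited to single deletions and adjacent transpositions; both are assumptions, since the excerpt does not describe the study's misspelling generator. Only single-word names are matched, and all function names are hypothetical.

```python
import re


def single_edit_variants(word):
    """Misspellings limited to one deleted letter or one swap of adjacent letters."""
    variants = set()
    for i in range(len(word)):
        variants.add(word[:i] + word[i + 1:])
    for i in range(len(word) - 1):
        variants.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    variants.discard(word)
    return variants


def build_lexicon(drug_names):
    """Map names, variants and generated misspellings to a canonical drug name.

    drug_names maps a canonical name to its known variants (e.g. brand names).
    Exact names are added first so a generated misspelling never overrides one.
    """
    lexicon = {}
    for canonical, synonyms in drug_names.items():
        for name in [canonical, *synonyms]:
            lexicon.setdefault(name.lower(), canonical)
    for canonical, synonyms in drug_names.items():
        for name in [canonical, *synonyms]:
            for typo in single_edit_variants(name.lower()):
                lexicon.setdefault(typo, canonical)
    return lexicon


def find_drugs(text, lexicon):
    """Return the canonical drugs whose name or misspelling appears as a token in text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {lexicon[t] for t in tokens if t in lexicon}
```

For instance, with a lexicon built from "fentanyl" (brand "Duragesic") and "morphine", the misspelled mention "fentanl" and the swapped "mrophine" both resolve to their canonical names.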
Exploring Strategies To Integrate Disparate Bioinformatics Datasets, Charbel Bader Fakhry
Walden Dissertations and Doctoral Studies
Distinct bioinformatics datasets make it challenging for bioinformatics specialists to locate the required datasets and unify their format for result extraction. The purpose of this single case study was to explore strategies to integrate distinct bioinformatics datasets. The technology acceptance model was used as the conceptual framework to understand the perceived usefulness and ease of use of integrating bioinformatics datasets. The population of this study included bioinformatics specialists of a research institution in Lebanon that has strategies to integrate distinct bioinformatics datasets. The data collection process included interviews with 6 bioinformatics specialists and reviewing 27 organizational documents relating to integrating …