Open Access. Powered by Scholars. Published by Universities.®

Bioinformatics Commons

Articles 1 - 30 of 38

Full-Text Articles in Bioinformatics

The Role Of Machine Learning And Network Analyses In Understanding Microbial Composition In An Experimental Prairie, Ali Eastman Oku Jan 2023

Graduate Research Theses & Dissertations

Machine learning and network analyses are powerful modern tools that can process and map out connections among large amounts of ecological data from complex environmental communities. Random forests, an ensemble machine learning algorithm, are particularly powerful as they can capture complex patterns in data while remaining easily interpretable. These tools are especially useful in experimental settings where different types of data are collected. The aim of this study was to demonstrate the utility of machine learning models and network analyses for analyzing diverse ecological data from dynamic plant-soil microbial communities in a prairie ecosystem. Our experimental system is an experimental prairie …


The Genetics Of Skin Cancer: What Genes Drive The Development Of Basal Cell Carcinoma, Squamous Cell Carcinoma, And Melanoma?, Cassandra Poole, Abagail Pack, Elizabeth Whitehead, Virginia Marshall Oct 2022

Spring Showcase for Research and Creative Inquiry

Skin cancer is one of the most common forms of cancer worldwide. The American Academy of Dermatology estimates that 9500 people in the United States are diagnosed with skin cancer every day, and that 1 in 5 Americans will be diagnosed with skin cancer by age 70. With such a high prevalence of disease, understanding how skin cancer develops and how it can be treated is extremely important. This project aims to analyze the genes involved in the development of the three most common forms of skin cancer: basal cell carcinoma, squamous cell carcinoma, and melanoma.


Subjective Information And Survival In A Simulated Biological System, Tyler S. Barker, Massimiliano Pierobon, Peter J. Thomas Apr 2022

School of Computing: Faculty Publications

Information transmission and storage have gained traction as unifying concepts to characterize biological systems and their chances of survival and evolution at multiple scales. Despite the potential for an information-based mathematical framework to offer new insights into life processes and ways to interact with and control them, the main legacy is that of Shannon, where a purely syntactic characterization of information scores systems on the basis of their maximum information efficiency. The latter metrics seem not entirely suitable for biological systems, where transmission and storage of different pieces of information (carrying different semantics) can result in different chances of survival. …


Predicting Gene Function Of Unknown Yeast ORFs Through Phylogenetic Comparative Analysis, Lewis Barr Jan 2022

Graduate Research Showcase

Yeast (Saccharomyces cerevisiae) has been an instrumental model system for an extraordinarily diverse array of research applications for over a century. The S. cerevisiae genome was fully sequenced in 1996, and, as a result, 6,753 potential proteins were identified. These putative proteins were established by investigating likely open reading frames within the genome. Over the past few decades, nearly 5,000 open reading frames (ORFs) and their expressed proteins have been described, and the remaining undefined open reading frames are labeled as open reading frames of unknown function (ORFans). To better understand the remaining gaps within the S. …


Construction And Analysis Of Three Multi-Partite Synthetic Microbial Communities, Alexander J. Lazzara, Jacob K. Fanning May 2021

Honors Theses

Microbial communities are of interest to molecular biologists hoping to understand the nature of metabolic interactions between co-existing, or possibly mutualistic, organisms. These interactions are ubiquitous in nature, but the molecular mechanisms involved remain challenging to study and are not well understood. Here, we design three tri-partite microbial circuits based on possible interactions among the involved microbes, which are discussed and may suggest mutualistic interactions. The carbon and nitrogen molecular pathways and the intracellular metabolism of each microbe are discussed. We present minimal growth media that will ensure that organisms utilize available resources, which may originate from metabolic processes in neighboring microbes, simulating a …


The Role Of Software Engineering In Bioinformatics, Brendan Sean Lawlor Jan 2021

Theses

This thesis proposes that by applying state-of-the-art software engineering tools, techniques and frameworks to currently recognised challenges in bioinformatics, improved outcomes can be attained in that field. It begins by decomposing software engineering into two categories, namely process and architecture, and choosing two key challenges in the practice of bioinformatics: reproducibility and scalability. The body of the thesis is an exploration of the intersection between these two software engineering categories and these two bioinformatics challenges. The question is asked: Can best practices in professional software engineering be applied to address key issues in the bioinformatics domain, creating positive outcomes? And …


Composition And Homology In The Taxonomic Classification Of Escherichia Coli, Tanya Irani Jan 2021

Theses and Dissertations (Comprehensive)

As new techniques have been introduced, specifically the possibility of complete genome sequencing, better methods of defining bacterial species have also been proposed. One of the most recently proposed methods, using bioinformatic techniques, is to calculate the average nucleotide identity (ANI) between the homologous genome segments of different isolates. Another method for species discrimination that has been tested successfully is the similarity of DNA compositional signatures. However, in a recent update, DNA signatures split the available Escherichia coli complete genomes into three groups. To check if this result was consistent with such genomes belonging to different species, we tested methods …


A Genomic Analysis Of Bobcat Populations In North America With A Comparison To The Canada Lynx: An Assessment Of Local Adaptation To Unique Ecoregions And Phylogeography, Jennifer C. Broderick May 2020

Electronic Theses and Dissertations

Bobcats (Lynx rufus) are an ecologically and genetically diverse species with a large contiguous range throughout North America. The species not only has a wide array of phenotypic variation compared to other mammals, but shows marked adaptability across ecozones with differing ecological influences. It is these various selective pressures in distinctive parts of the continent that have likely led to localized adaptations within the bobcat metapopulations. The species is also marked by its ability to maintain connectivity and populations in anthropogenically developed areas, an advantage it has over other felids, including its close relative the Canada lynx ( …


Bioinformatics II, BIO 3352, Course Outline, Eugenia G. Giannopoulou May 2019

Open Educational Resources

This course is a continuation of Bioinformatics I. Topics include gene expression, microarrays, next-generation sequencing methods, RNA-seq, large genomic projects, protein structure and stability, protein folding, and computational structure prediction of proteins; proteomics; and protein-nucleic acid interactions. The lab component includes R-based statistical data analysis on large datasets, introduction to big data analysis tools, protein visualization software, internet-based tools and high-level programming languages.


Identification And Folding Of miRNA Through Machine Learning, Xavier Pellow Jun 2018

The International Student Science Fair 2018

At the time of writing, there is no efficient way to accurately identify miRNA or predict its structure without the use of a lab. The purpose of this work is to provide a framework that allows for efficient identification of mature miRNA and folding of pre-miRNA using a feedforward neural network (FFNN) and probabilistic context-free grammar (PCFG) parsing, respectively. After training, the FFNN achieved an accuracy of 98%. Across all control cases using high-confidence miRNA, the PCFG returned folded structures that matched the canonical structures with an accuracy of 81%. The results of this work indicate …


Newall Glacier Nucleic Acid Analysis, Shannon Turner Apr 2018

Honors Projects

The Newall Glacier is located in Antarctica between Mount Newall and Mount Weyant, at approximately 77°30′S, 162°50′E. Having existed for millions of years, and being rarely touched by human populations, glaciers are a major source of information on climate and life in the past. During the past five decades, a multi-country team of scientists has collaborated to drill into many of Antarctica’s glaciers and ice fields, removing ice cores for scientific investigation. The ice core section chosen for this project was drilled from the Newall Glacier in 1988 and spans depths from 100.670 to 101.000 m. The purpose of …


Software Development For Genome Sequence Analysis, David Farr May 2017

Symposium Of University Research and Creative Expression (SOURCE)

The cost of genome sequencing has decreased rapidly, expanding availability for many biological applications (Muir 2016). For example, researchers can now obtain genome sequences from multiple populations under different types of selection. Comparison of these sequences allows for identification of chromosome regions and specific genes associated with adaptive evolution (Kelly 2013). As an increasing number of researchers engage in this type of inquiry, many have created in-house computer scripts to analyze the raw sequence data (e.g., Kelly 2013), creating a gap in both continuity and standardization.

Using a test dataset and preliminary results from an ongoing artificial selection experiment in …


1. Types Of Alignment: Presentations & Demos Assignment, Sarah O'Leary-Driscoll Oct 2016

Sequence Alignments

Pairwise Alignment: DNA

Pairwise Alignment: Protein

Multiple Sequence Alignment: DNA

Multiple Sequence Alignment: Protein


2016-01-A3dsrinp-Csc-Sta-Cmb-522-Bps-542, Raymond Pulver, Neal Buxton, Xiaodong Wang, John Lucci, Jean Yves Hervé, Lenore Martin May 2016

Bioinformatics Software Design Projects

Cholesterol is carried through the bloodstream by lipoproteins. There are two types of lipoproteins: low density lipoprotein, or LDL, and high density lipoprotein, or HDL. LDL cholesterol is considered “bad” cholesterol because it can form plaque and hard deposits that clog arteries and make them less flexible. A heart attack or stroke will happen if a hard deposit blocks a narrowed artery. HDL cholesterol helps to carry LDL from the arteries back to the liver.

Traditionally, particle counts of LDL and HDL have played an important role in understanding and predicting heart disease risk. But recent research suggested that …


Sequencing Techniques: A Comparison Assignment, Sarah O'Leary-Driscoll Oct 2015

Sequencing & Genome Mining

With your partner, create some sort of visual (table, map, chart, other, ask me!) that compares the main types of sequencing that we discussed, as well as two of the techniques considered 'next generation'.


Discussion Questions: Genome Mining, Sarah O'Leary-Driscoll Oct 2015

Sequencing & Genome Mining

No abstract provided.


Alignment Information, Sarah O'Leary-Driscoll Oct 2015

Sequence Alignments

Pairwise DNA alignment is frequently used to identify similar regions that will show how two sequences have functional or structural similarities. It can also be used to show how exons and introns change between different sequences and whether they have an effect on the final structure of the RNA after the DNA is processed within a cell.


Alignment Outline, Sarah O'Leary-Driscoll Oct 2015

Sequence Alignments

No abstract provided.


2: Sequence Alignment Practice Activity, Sarah O'Leary-Driscoll Oct 2015

Sequence Alignments

Now that you have learned how to do the four basic sequence alignments (pairwise and multiple for both nucleotide and protein sequences), select a gene/protein (it may be one that you've used before) and run each of these alignments.


Pt. 2: Presentation / Paper Guidelines, Sarah O'Leary-Driscoll Oct 2015

Research Project

The presentations for your project should follow the same format that the paper would, but in a much more abbreviated form; aim for 5-7 minutes.


Project Guidelines, Sarah O'Leary-Driscoll Oct 2015

Research Project

No abstract provided.


Pt. 1: Research Question & Background, Sarah O'Leary-Driscoll Oct 2015

Research Project

No abstract provided.


Primer Design Activity, Sarah O'Leary-Driscoll Oct 2015

Primer Design

No abstract provided.


Obtaining Genomic Sequence Practice, Sarah O'Leary-Driscoll Oct 2015

Introduction to NCBI

No abstract provided.


DNA Timeline And Poster Project, Sarah O'Leary-Driscoll Oct 2015

Genomics: Past & Future

The DNA timeline goes through many of the major discoveries that have driven our understanding of genetics since Mendel. Pick two scientists and create a PowerPoint slide poster (to be printed out on regular printer sized paper) that covers the following:


3: Genomics: Past & Future Bibliography, Sarah O'Leary-Driscoll Oct 2015

Genomics: Past & Future

No abstract provided.


Future Of Genomics: Presentations, Sarah O'Leary-Driscoll Oct 2015

Genomics: Past & Future

In his testimony to a House of Representatives sub-committee on health, director of the National Human Genome Research Institute, Francis S. Collins, said that the future of genomics had three main focal points:

"Genomics to Biology: The human genome sequence provides foundational information that now will allow development of a comprehensive catalog of all of the genome's components, determination of the function of all human genes, and deciphering of how genes and proteins work together in pathways and networks.

Genomics to Health: Completion of the human genome sequence offers a unique opportunity to understand the role of genetic factors in …


Database/Resource Acronyms, Sarah O'Leary-Driscoll Oct 2015

Course Information

No abstract provided.


What Is Bioinformatics?, Sarah O'Leary-Driscoll Oct 2015

Course Information

Bioinformatics has evolved into a full-fledged multidisciplinary subject that integrates developments in information and computer technology as applied to Biotechnology and Biological Sciences. Bioinformatics uses computer software tools for database creation, data management, data warehousing, data mining and global communication networking. Bioinformatics is the recording, annotation, storage, analysis, and searching/retrieval of nucleic acid sequence (genes and RNAs), protein sequence and structural information. This includes databases of the sequences and structural information as well as methods to access, search, visualize and retrieve the information. Bioinformatics concerns the creation and maintenance of databases of biological information whereby researchers can both access existing information …


Comprehensive Course Syllabus, Sarah O'Leary-Driscoll Oct 2015

Course Information

The bioinformatics seminar is focused on developing an understanding of the principles behind genomic analyses, developing skills using the different available bioinformatics programs, and becoming aware of the past developments and current research avenues that benefit from these types of analyses.