Open Access. Powered by Scholars. Published by Universities.®

Genomics Commons

Articles 1 - 13 of 13

Full-Text Articles in Genomics

Analysis Of Subtelomeric Rextal Assemblies Using Quast, Tunazzina Islam, Desh Ranjan, Mohammad Zubair, Eleanor Young, Ming Xiao, Harold Riethman Jan 2021

Computer Science Faculty Publications

Genomic regions of high segmental duplication content and/or structural variation have led to gaps and misassemblies in the human reference sequence, and are refractory to assembly from whole-genome short-read datasets. Human subtelomere regions are highly enriched in both segmental duplication content and structural variations, and as a consequence are both impossible to assemble accurately and highly variable from individual to individual. Recently, we developed a pipeline for improved region-specific assembly called Regional Extension of Assemblies Using Linked-Reads (REXTAL). In this study, we evaluate REXTAL and genome-wide assembly (Supernova) approaches on 10X Genomics linked-reads data sets partitioned and barcoded using the …


Mrub_3019 Casa Gene Is An Ortholog To E. Coli B2760, Kelsey Heiland, Dr. Lori Scott Feb 2019

Meiothermus ruber Genome Analysis Project

This research is part of the Meiothermus ruber genome annotation project, which aims to predict gene function with various bioinformatics tools. We investigated the function of Mrub_3019, which encodes the CasA protein involved in the multi-subunit effector complex for the CRISPR-Cas immunity system, and predicted it to be an ortholog of E. coli K12 MG1655 b2760 (casA). We predicted that Mrub_3019 encodes the protein CasA, which is involved in PAM recognition in the CRISPR interference pathway. Foreign DNA will bind to CasA, which signals Cas3 for helicase-mediated DNA degradation. Our hypothesis is supported by low E-values for pairwise alignment in NCBI …


Mrub_3015 Is Orthologous To The B2757 Gene Found In Escherichia Coli Coding For Casd, Ramona Collins, Dr. Lori Scott Feb 2019

Meiothermus ruber Genome Analysis Project

This project is part of the Meiothermus ruber genome analysis project, which uses a collection of online bioinformatics tools to predict gene function. We investigated the biological function of the gene Mrub_3015, which we hypothesize is a component of the CRISPR-Cas prokaryotic defense system. We predict that Mrub_3015 (DNA coordinates 3055550...3056245) encodes the CRISPR-associated protein cas5, which is integral in maintaining the crRNA-DNA structure, keeping the complex from base pairing with the target phage DNA. Our hypothesis is supported by identical hits for Mrub_3015 and b2757 to the KEGG, Pfam, TIGRfam, CDD and PDB databases as well as a …


Mrub_3018 Is Orthologous To E. Coli B2759 (Casb), Kyle Parker, Dr. Lori Scott Feb 2019

Meiothermus ruber Genome Analysis Project

This project is part of the Meiothermus ruber genome analysis project, which uses a collection of online bioinformatics tools to predict gene function. We studied the biological activity of the Mrub_3018 gene, which we hypothesize is orthologous to E. coli gene b2759. We predicted that Mrub_3018 (DNA coordinates 3057916…3058524) encodes the protein CasB. CasB is a protein in the CRISPR CASCADE that will function as a structural protein. When the rest of the proteins form an “S” formation, CasB will connect the front and back of the “S”, creating a backbone for the structure. It will help bind DNA …


Mrub_3014 Is Orthologous To B2756, Samir Abdelkarim, Dr. Lori Scott Jan 2019

Meiothermus ruber Genome Analysis Project

This project is part of the Meiothermus ruber genome analysis project, which uses a collection of online bioinformatics tools to predict gene function. We investigated the biological function of the gene Mrub_3014, which we hypothesize is a component of the CRISPR-Cas prokaryotic defense system. We predict that Mrub_3014 (DNA coordinates 3054943..3055575) encodes the CRISPR-associated protein Cse3/CasE, which functions as an endonuclease. Our hypothesis is supported by identical hits for Mrub_3014 and b2756 to the KEGG, Pfam, TIGRfam, CDD and PDB databases, as well as a low E-value for a pairwise NCBI BLAST comparison. Both protein products are predicted to be localized …


M. Ruber Mrub_3013 Is Orthologous To E. Coli B2755, Laura Butcher, Dr. Lori Scott Jan 2019

Meiothermus ruber Genome Analysis Project

This project is part of the Meiothermus ruber genome analysis project, which uses a collection of online bioinformatics tools to predict gene function. We investigated the biological function of the gene with the M. ruber locus tag Mrub_3013, which we hypothesize is orthologous to b2755 in E. coli K12 MG1655 (a.k.a. Cas1) and is a component of the CRISPR-Cas prokaryotic defense system in M. ruber. We predict that Mrub_3013 (DNA coordinates 3,053,978-3,054,940) encodes the protein Cas1, which, as part of the CRISPR-Cas system, selects and cuts the foreign …


Mrub_3020, A Paralog Of Mrub_1489, Is Orthologous To E. Coli Casc (Locus Tag B2761), Alfred Dei-Ampeh, Dr. Lori Scott Jan 2019

Meiothermus ruber Genome Analysis Project

This project is part of the Meiothermus ruber genome analysis project, which uses a collection of online bioinformatics tools to predict gene function. We investigated the biological functions of two genes: mrub_3020 and mrub_1489. We make two hypotheses in this investigation: a) mrub_3020 is orthologous to the gene b2761 in E. coli K12 MG1655 (a.k.a. casC); b) mrub_1489 is a paralog of mrub_3020. We also predict that the two genes encode unique proteins: mrub_3020, with DNA coordinates 3060491…3063190, encodes a CRISPR-associated helicase (Cas3) that supports the Cascade complex of the CRISPR-Cas adaptive immune system …


Mrub_1325, Mrub_1326, Mrub_1327, And Mrub_1328 Are Orthologs Of B_3454, B_3455, B_3457, B_3458, Respectively Found In Escherichia Coli Coding For A Branched Chain Amino Acid Atp Binding Cassette (Abc) Transporter System, Bennett Tomlin, Adam Buric, Dr. Lori Scott Jan 2018

Meiothermus ruber Genome Analysis Project

In this project we investigated the biological function of the genes Mrub_1325, Mrub_1326, Mrub_1327, and Mrub_1328 (KEGG map number 02010). We predict these genes encode components of a Branched Chain Amino Acid ATP Binding Cassette (ABC) transporter: 1) Mrub_1325 (DNA coordinates 1357399-1358130 on the reverse strand) encodes the ATP-binding domain; 2) Mrub_1326 (DNA coordinates 1358127-1359899 on the reverse strand) encodes the ATP-binding domain and permease domain; 3) Mrub_1327 (DNA coordinates 1359899-1360930 on the reverse strand) encodes a permease domain; and 4) Mrub_1328 (DNA coordinates 1711022-1712185 on the reverse strand) encodes the substrate-binding domain. This system is not predicted to …


Development, Evaluation, And Application Of A Novel Error Correction Method For Next Generation Sequencing Data, Isaac Akogwu Dec 2017

Dissertations

Rapid advances in sequencing technologies and the wide availability of data due to the decreasing cost of next-generation sequencing (NGS) have given scientists the opportunity to address a wide variety of evolutionary and biological issues. NGS uses massively parallel technology to accelerate the process at the expense of accuracy and read length in comparison to earlier Sanger methods. Therefore, computational limitations exist in how much analysis and information can be gleaned from the data without performing some form of error correction.

The error correction process is laborious and consumes substantial computational resources. Despite the existence of many NGS data error correction …


Discovery And Validation Of Information Theory-Based Transcription Factor And Cofactor Binding Site Motifs., Ruipeng Lu, Eliseos J Mucaki, Peter K Rogan Mar 2017

Biochemistry Publications

Data from ChIP-seq experiments can derive the genome-wide binding specificities of transcription factors (TFs) and other regulatory proteins. We analyzed 765 ENCODE ChIP-seq peak datasets of 207 human TFs with a novel motif discovery pipeline based on recursive, thresholded entropy minimization. This approach, while obviating the need to compensate for skewed nucleotide composition, distinguishes true binding motifs from noise, quantifies the strengths of individual binding sites based on computed affinity and detects adjacent cofactor binding sites that coordinate with the targets of primary, immunoprecipitated TFs. We obtained contiguous and bipartite information theory-based position weight matrices (iPWMs) for 93 sequence-specific TFs, …
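As a rough illustration of what an information theory-based position weight matrix measures, the sketch below computes the information content of each position in a set of aligned binding sites as 2 bits minus the Shannon entropy of the observed base frequencies. This is the generic textbook calculation, not the authors' recursive, thresholded entropy minimization pipeline; the function name `column_information` and the example counts are hypothetical, and no small-sample correction is applied.

```python
import math

BASES = "ACGT"

def column_information(counts):
    """Return the information content (in bits) of each alignment column.

    counts: list of dicts mapping each base in "ACGT" to its observed count
            at that position (hypothetical input format for this sketch).
    Information is 2 - H, where H is the Shannon entropy of the column.
    """
    info = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES)
        if total == 0:
            raise ValueError("column has no observations")
        entropy = 0.0
        for b in BASES:
            n = column.get(b, 0)
            if n > 0:  # 0 * log(0) is treated as 0
                p = n / total
                entropy -= p * math.log2(p)
        info.append(2.0 - entropy)
    return info

# Hypothetical counts for a three-position site
example = [
    {"A": 10, "C": 0, "G": 0, "T": 0},  # fully conserved position
    {"A": 5, "C": 5, "G": 0, "T": 0},   # two equally likely bases
    {"A": 3, "C": 3, "G": 3, "T": 3},   # no preference
]
print(column_information(example))
```

Summing the per-position values gives the total information of the motif; a conserved position contributes close to 2 bits and an uninformative one close to 0.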


Characterization Of Somatically-Eliminated Genes During Development Of The Sea Lamprey (Petromyzon Marinus), Stephanie A. Bryant Jan 2016

Theses and Dissertations--Biology

The sea lamprey (Petromyzon marinus) undergoes programmed genome rearrangements (PGRs) during early development that facilitate the elimination of ~20% of the genome from the somatic cell lineage, resulting in distinct somatic and germline genomes. To improve our understanding of the evolutionary/developmental logic of PGR, we generated computational predictions to identify candidate germline-specific genes within a transcriptomic dataset derived from adult germline and the embryonic stages encompassing PGR. Validation studies identified 44 germline-specific genes and characterized patterns of transcription and DNA loss during early embryogenesis. Expression analyses reveal that several of these genes are differentially expressed during early embryogenesis …


An Exploration Of The Phylogenetic Placement Of Recently Discovered Ultrasmall Archaeal Lineages, Jeffrey M. O'Brien Aug 2015

Honors Scholar Theses

In recent years, several new clades within the domain Archaea have been discovered. This is due in part to microbiological sampling of novel environments, and the increasing ability to detect and sequence uncultivable organisms through metagenomic analysis. These organisms share certain features, such as small cell size and streamlined genomes. Reduction in genome size can present difficulties to phylogenetic reconstruction programs. Since there is less genetic data to work with, these organisms often have missing genes in concatenated multiple sequence alignments. Evolutionary biologists have not reached a consensus on the placement of these lineages in the archaeal evolutionary tree. There …


Addressing The Black Box Phenomenon Of Genome Sequencing And Assembly, Brandon Carter May 2015

Senior Honors Projects, 2010-2019

Genomics, a study of all genetic material in an organism, is a new discipline having a great impact on medicine, agriculture, and environmental phenomena. Most undergraduate faculty members were not formally trained in genomics and must retool themselves in order to stay current with these evolving technologies. Advances in sequencing technology have resulted in an explosion of “big data” that can only be managed and analyzed using digital methods. Multiple complex computer programs are required to teach students the concepts using hands-on methods. These programs are challenging to use, especially since the same faculty members lacking genomics training were not …