Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network


San Jose State University

Master's Projects

Theses/Dissertations

2023

Bioinformatics


Characterizing Sequencing Artifacts, Kathy Thanh Lam Jan 2023

Next Generation Sequencing (NGS) introduces artifactual variants through library preparation methods and errors, which affect the accuracy of variant calling. Whole Exome Sequencing (WES) data from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database were processed. Comparing single nucleotide polymorphism (SNP) calls against Genome in a Bottle (GIAB) provided labels that were used to build machine learning (ML) models. The left and right flanking regions (LSEQ and RSEQ) of each SNP were extracted. Nucleotide frequency, k-mers of size 4 and their counts, largest homopolymer size, largest palindrome size, and largest hairpin loop size were computed …
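
The flanking-sequence features named in the abstract can be illustrated with a minimal Python sketch. This is not the author's implementation: the function names, the feature-key format, and the choice of overlapping k-mers are assumptions made for illustration. Palindrome and hairpin loop sizes are left out because the abstract does not say how they were defined.

```python
from collections import Counter
from itertools import groupby


def nucleotide_frequency(seq):
    """Fraction of each base (A, C, G, T) in seq; 0.0 for an empty sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    n = len(seq)
    return {base: (counts[base] / n if n else 0.0) for base in "ACGT"}


def kmer_counts(seq, k=4):
    """Counts of overlapping k-mers (assumption: a sliding window of step 1)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def longest_homopolymer(seq):
    """Length of the longest run of one repeated base."""
    return max((len(list(run)) for _, run in groupby(seq.upper())), default=0)


def flank_features(lseq, rseq):
    """Flat feature dict for one SNP (hypothetical key naming scheme)."""
    features = {}
    for name, seq in (("LSEQ", lseq), ("RSEQ", rseq)):
        for base, freq in nucleotide_frequency(seq).items():
            features[f"{name}_freq_{base}"] = freq
        for kmer, count in kmer_counts(seq).items():
            features[f"{name}_kmer_{kmer}"] = count
        features[f"{name}_max_homopolymer"] = longest_homopolymer(seq)
    return features
```

Calling `flank_features(lseq, rseq)` for each SNP yields one row of a feature table. 4-mers absent from a given flank would need to be filled with zero before the rows are passed to a model.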