Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 8 of 8

Full-Text Articles in Physical Sciences and Mathematics

Model-Based Deep Autoencoders For Clustering Single-Cell RNA Sequencing Data With Side Information, Xiang Lin Dec 2023

Dissertations

Clustering analysis has been conducted extensively in single-cell RNA sequencing (scRNA-seq) studies. scRNA-seq can profile the activities of tens of thousands of genes within a single cell. Thousands or tens of thousands of cells can be captured simultaneously in a typical scRNA-seq experiment. Biologists would like to cluster these cells to explore and elucidate cell types or subtypes. Numerous methods have been designed for clustering scRNA-seq data. Yet single-cell technologies have developed so fast in the past few years that existing methods have not kept up with these rapid changes and fail to fully fulfil their potential. For instance, besides profiling transcription …


Machine Learning And Network Embedding Methods For Gene Co-Expression Networks, Niloofar Aghaieabiane May 2023

Dissertations

High-throughput technologies such as DNA microarrays and RNA-seq are used to measure the expression levels of large numbers of genes simultaneously. To support the extraction of biological knowledge, individual gene expression levels are transformed into Gene Co-expression Networks (GCNs). GCNs are analyzed to discover gene modules. GCN construction and analysis has been a well-studied topic for nearly two decades. While new types of sequencing and the corresponding data are now available, the software package WGCNA and its most recent variants are still widely used, contributing to biological discovery.

The discovery of biologically significant modules of genes from raw expression data is …


Global Optimization Algorithms For Image Registration And Clustering, Cuicui Zheng Aug 2020

Dissertations

Global optimization is a classical problem of finding the minimum or maximum value of an objective function. It has applications in many areas, such as biological image analysis, chemistry, mechanical engineering, financial analysis, deep learning and image processing. For practical applications, it is important to understand the efficiency of global optimization algorithms. This dissertation develops and analyzes some new global optimization algorithms and applies them to practical problems, mainly for image registration and data clustering.

First, the dissertation presents a new global optimization algorithm which approximates the optimum using only function values. The basic idea is to use the points …


Semantics And Result Disambiguation For Keyword Search On Tree Data, Cem Aksoy Jan 2016

Dissertations

Keyword search is a popular technique for searching tree-structured data (e.g., XML, JSON) on the web because it frees the user from learning a complex query language and the structure of the data sources. However, the convenience of keyword search comes with drawbacks. The imprecision of keyword queries usually produces a very large number of results, of which only a few are relevant to the query. Multiple previous approaches have tried to address this problem. Some of them exploit structural and semantic properties of the tree data in order to filter out irrelevant results while others use a …


Approximate String Matching Methods For Duplicate Detection And Clustering Tasks, Oleksandr Rudniy Jan 2012

Dissertations

Approximate string matching methods are utilized by a vast number of duplicate detection and clustering applications in various knowledge domains. The application area is expected to grow due to the recent significant increase in the amount of digital data and knowledge sources. Despite the large number of existing string similarity metrics, there is a need for more precise approximate string matching methods to improve the efficiency of computer-driven data processing, thus decreasing labor-intensive human involvement.

This work introduces a family of novel string similarity methods, which outperform a number of effective well-known and widely used string similarity functions. The new …


Modeling Of Flexible Drug-Like Molecules: QSAR Of GBR 12909 Analog DAT/SERT Selectivity, Kathleen Mary Gilbert May 2005

Dissertations

The dopamine reuptake inhibitor GBR 12909 and related dialkyl piperazine and piperidine analogs have been studied as agonist substitution therapies acting on the dopamine transporter (DAT) to treat cocaine addiction. Undesirable binding to the serotonin transporter (SERT) can vary greatly depending on the specific substituents on the molecule. This study uses Comparative Molecular Field Analysis (CoMFA) and Comparative Molecular Similarity Indices Analysis (CoMSIA) techniques to determine a stable and predictive model for DAT/SERT selectivity for a set of flexible GBR 12909 analogs.

Families of analogs were constructed from six pairs of naphthyl-substituted piperazine and piperidine templates identified by hierarchical clustering as …


High-Dimensional Indexing Methods Utilizing Clustering And Dimensionality Reduction, Lijuan Zhang May 2005

Dissertations

The emergence of novel database applications has resulted in the prevalence of a new paradigm for similarity search. These applications include multimedia databases, medical imaging databases, time series databases, DNA and protein sequence databases, and many others. Features of data objects are extracted and transformed into high-dimensional data points. Searching for objects becomes a search on points in the high-dimensional feature space. The dissimilarity between two objects is determined by the distance between two feature vectors. Similarity search is usually implemented as nearest neighbor search in feature vector spaces. The cost of processing k-nearest neighbor (k-NN) queries via a sequential …


Efficient Similarity Search In High-Dimensional Data Spaces, Yue Li May 2004

Dissertations

Similarity search in high-dimensional data spaces is a popular paradigm for many modern database applications, such as content-based image retrieval, time series analysis in financial and marketing databases, and data mining. Objects are represented as high-dimensional points or vectors based on their important features. Object similarity is then measured by the distance between feature vectors, and similarity search is implemented via range queries or k-Nearest Neighbor (k-NN) queries.

Implementing k-NN queries via a sequential scan of large tables of feature vectors is computationally expensive. Building multi-dimensional indexes on the feature vectors for k-NN search also tends to be unsatisfactory …