Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Articles 1 - 9 of 9

Full-Text Articles in Computer Sciences

Mining Subgroups From Temporal Data : From The Parts To The Whole, Alexander Gorovits May 2021

Legacy Theses & Dissertations (2009 - 2024)

A variety of dynamic systems can be broken down into potentially overlapping subcomponents with varying temporal behavior, ranging from communities in networks, to clusters of trajectories in spatiotemporal data, to co-evolving subsets within multivariate time series. Using explicit regularization on various temporal behaviors within a tensor factorization framework, I demonstrate means to mine these subgroups along with their temporal activities, and show how doing so yields information about the overall systems. Additionally, I adapt this notion of temporal communities to the spatiotemporal setting to develop a reinforcement learning approach for optimizing coordinated communication between independent agents.
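The abstract does not give the model itself, but the general idea of regularizing temporal behavior inside a tensor factorization can be illustrated. The following is a minimal sketch, not the thesis's method, assuming a nonnegative CP (PARAFAC) decomposition of a node × node × time tensor fitted by projected gradient descent, with a first-difference smoothness penalty on the time factor as one example of a temporal regularizer. The function names `cp_reconstruct` and `temporal_cp` and all hyperparameters are hypothetical.

```python
import numpy as np


def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CP factors: X[i, j, t] = sum_r A[i, r] B[j, r] C[t, r]."""
    return np.einsum("ir,jr,tr->ijt", A, B, C)


def temporal_cp(X, rank, smooth=1.0, lr=0.01, n_iter=500, seed=0):
    """Hypothetical sketch: nonnegative CP with a temporal smoothness penalty.

    Minimizes 0.5 * ||X - [[A, B, C]]||^2 + 0.5 * smooth * ||D C||^2, where D
    takes first differences along time, so each component's activity profile
    (one column of C) is encouraged to change gradually.
    """
    rng = np.random.default_rng(seed)
    I, J, T = X.shape
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((T, rank))
    D = np.diff(np.eye(T), axis=0)  # (T-1) x T first-difference operator
    losses = []
    for _ in range(n_iter):
        R = cp_reconstruct(A, B, C) - X  # residual tensor
        losses.append(0.5 * np.sum(R ** 2) + 0.5 * smooth * np.sum((D @ C) ** 2))
        # Gradients of the objective, all computed from the same residual.
        gA = np.einsum("ijt,jr,tr->ir", R, B, C)
        gB = np.einsum("ijt,ir,tr->jr", R, A, C)
        gC = np.einsum("ijt,ir,jr->tr", R, A, B) + smooth * (D.T @ (D @ C))
        # Projected gradient step: clip at zero to keep the factors nonnegative.
        A = np.maximum(A - lr * gA, 0.0)
        B = np.maximum(B - lr * gB, 0.0)
        C = np.maximum(C - lr * gC, 0.0)
    return A, B, C, losses


# Usage: two planted communities ({0, 1} and {2, 3}) whose activity shifts over 6 steps.
A0 = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
C0 = np.array([[1, 0], [1, 0.2], [0.8, 0.5], [0.5, 0.8], [0.2, 1], [0, 1]])
X = cp_reconstruct(A0, A0, C0)
A, B, C, losses = temporal_cp(X, rank=2)
```

Raising `smooth` trades reconstruction accuracy for smoother activity profiles; other temporal behaviors (for example, periodicity) would call for a different penalty in place of the first-difference matrix `D`.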


Structured Data Mining Networks, Time Series, And Time Series Of Networks, Lin Zhang Dec 2020

Legacy Theses & Dissertations (2009 - 2024)

The rate at which data is generated in modern applications has created an unprecedented demand for novel methods to effectively and efficiently extract insightful patterns. Methods aware of known domain-specific structure in the data tend to be advantageous. In particular, a joint temporal and networked view of observations offers a holistic lens on many real-world systems. Example domains abound: the activity of social network users, gene interactions over time, the temporal load of infrastructure networks, and others. Existing analysis and mining approaches for such data exhibit limited quality and scalability due to their sensitivity to noise, missing observations, and the need …


Efficient Algorithms For Mining Healthcare Data :, Yan Hu Jan 2019

Legacy Theses & Dissertations (2009 - 2024)

Data-Driven Healthcare (DDH) is defined as the use of available medical big data to provide the best and most personalized care, and is believed to be one of the most promising directions for transforming healthcare. Healthcare data include claims and cost data, clinical data, pharmaceutical R&D data, patient behavior and sentiment data, and health data on the web. There has been a remarkable upsurge in the adoption of healthcare data over the past several years; in particular, such data have been used for medical concept extraction, patient trajectory modeling, and disease inference, among other tasks.


An Efficient System For Subgraph Discovery, Aparna Joshi Jan 2018

Legacy Theses & Dissertations (2009 - 2024)

Subgraph discovery in a single data graph (finding subsets of vertices and edges that satisfy a user-specified criterion) is an essential and general graph analytics operation with a wide spectrum of applications. Depending on the criterion, subgraphs of interest may correspond to cliques of friends in social networks, interconnected entities in RDF data, or frequent patterns in protein interaction networks, to name a few. Existing systems usually examine a large number of subgraphs while employing many computers, and often produce an enormous result set of subgraphs. How can we enable fast discovery of only the most relevant subgraphs while minimizing the computational requirements?
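To make the operation concrete, here is a deliberately naive sketch of subgraph discovery with a pluggable criterion. It is not the system described in the thesis: it enumerates every k-vertex subset, keeps the connected induced subgraphs, and tests a user-supplied predicate, with clique membership as the example criterion. This exhaustive enumeration is the kind of cost the abstract says existing systems incur. The helper names (`discover`, `is_clique`, and so on) are hypothetical.

```python
from itertools import combinations


def is_connected(adj, nodes):
    """Depth-first search restricted to `nodes`; True if they form one component."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def induced_edges(adj, nodes):
    """Edges of the subgraph induced by `nodes`, each as an ordered pair (u, v) with u < v."""
    return {(u, v) for u, v in combinations(sorted(nodes), 2) if v in adj[u]}


def discover(adj, k, criterion):
    """Yield every connected k-vertex induced subgraph that satisfies `criterion`.

    Brute force: examines all C(n, k) vertex subsets of the undirected graph `adj`.
    """
    for nodes in combinations(sorted(adj), k):
        if is_connected(adj, nodes):
            edges = induced_edges(adj, nodes)
            if criterion(nodes, edges):
                yield nodes


def is_clique(nodes, edges):
    """Example criterion: every pair of vertices is adjacent."""
    n = len(nodes)
    return len(edges) == n * (n - 1) // 2


# Usage: a triangle 1-2-3 with a pendant vertex 4 attached to vertex 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
triangles = list(discover(adj, 3, is_clique))
```

Swapping `is_clique` for another predicate (for example, a minimum edge density) changes what counts as a subgraph of interest without touching the enumeration loop, which mirrors the abstract's notion of a user-specified criterion.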


Clinical Information Extraction From Unstructured Free-Texts, Mingzhe Tao Jan 2018

Legacy Theses & Dissertations (2009 - 2024)

Information extraction (IE) is a fundamental component of natural language processing (NLP) that provides a deeper understanding of texts. In the clinical domain, documents prepared by medical experts (e.g., discharge summaries, drug labels, medical history records) contain a significant amount of clinically relevant information that is crucial to the overall well-being of patients. Unfortunately, in many cases, this information is presented in an unstructured format, predominantly free text, making it inaccessible to computerized methods. Automatic extraction of this information can improve accessibility. However, the presence of synonymous expressions, medical acronyms, misspellings, negated phrases, and ambiguous terminologies makes automatic extraction …


Topic Analysis And Application Using Nonnegative Matrix Factorizations (NMF), Xin Wang Jan 2015

Legacy Theses & Dissertations (2009 - 2024)

Managing a large and growing amount of information is a central goal of modern computer science. Data repositories of texts, images, and videos have become widely accessible, necessitating good methods of retrieval, organization, and exploration.


Graph Mining And Module Detection In Protein-Protein Interaction Networks, Ru Shen Jan 2014

Legacy Theses & Dissertations (2009 - 2024)

Graphs are intuitive representations of relational data. Graphs have been widely used to represent the biological molecular networks that operate in living systems. In the study of systems biology, using graph mining techniques and graph-theory-based algorithms to …


Roughened Random Forests For Binary Classification, Kuangnan Xiong Jan 2014

Legacy Theses & Dissertations (2009 - 2024)

Binary classification plays an important role in many decision-making processes. Random forests can build a strong ensemble classifier by combining weaker classification trees that are de-correlated. The strength and correlation among individual classification trees are the key factors that contribute to the ensemble performance of random forests. We propose roughened random forests, a new set of tools which show further improvement over random forests in binary classification. Roughened random forests modify the original dataset for each classification tree and further reduce the correlation among individual classification trees. This data modification process is composed of artificially imposing missing data that are …
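The abstract is cut off before it says how the artificially imposed missing values are handled, so the sketch below fills that step with an explicit assumption: each masked entry is replaced by the mean of the unmasked entries in its column. It shows only the per-tree data modification. In a full ensemble, each roughened copy would then be bootstrap-resampled and used to grow one classification tree, as in an ordinary random forest (not shown). The helper names `roughen` and `roughened_datasets` are hypothetical.

```python
import random
from statistics import mean


def roughen(X, missing_rate, rng):
    """Return a copy of X (a list of rows) with a random fraction of entries
    marked missing and then filled in.

    ASSUMPTION: the fill rule (mean of the column's unmasked entries) is
    illustrative only; the abstract does not state the thesis's rule.
    """
    n_rows, n_cols = len(X), len(X[0])
    # Independently decide for every cell whether to impose a missing value.
    mask = [[rng.random() < missing_rate for _ in range(n_cols)] for _ in range(n_rows)]
    result = [list(row) for row in X]
    for j in range(n_cols):
        observed = [X[i][j] for i in range(n_rows) if not mask[i][j]]
        # Fall back to the full column mean if every entry in the column was masked.
        fill = mean(observed) if observed else mean(X[i][j] for i in range(n_rows))
        for i in range(n_rows):
            if mask[i][j]:
                result[i][j] = fill
    return result


def roughened_datasets(X, n_trees, missing_rate, seed=0):
    """Yield one independently roughened copy of X per tree in the ensemble."""
    rng = random.Random(seed)
    for _ in range(n_trees):
        yield roughen(X, missing_rate, rng)


# Usage: three roughened copies of a tiny two-feature dataset.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
copies = list(roughened_datasets(X, n_trees=3, missing_rate=0.25, seed=42))
```

Because every tree would see a differently perturbed copy of the data, the trees' splits can diverge more than under bootstrap resampling alone, which is the intuition behind the extra de-correlation the abstract describes.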


Bootstrapping Events And Relations From Text, Ting Liu Jan 2009

Legacy Theses & Dissertations (2009 - 2024)

Information Extraction (IE) is a technique for automatically extracting structured data from text documents. One of the key analytical tasks is the extraction of important and relevant information from textual sources. While information is plentiful and readily available from the Internet, news services, media, etc., extracting the critical nuggets that matter to business or to national security is a cognitively demanding and time-consuming task. Intelligence and business analysts spend many hours poring over endless streams of text documents, pulling out references to entities of interest (people, locations, organizations) as well as their relationships as reported in text. Such extracted "information …