Computer Sciences | Open Access Articles | Digital Commons Network™

Drip - Data Rich, Information Poor: A Concise Synopsis Of Data Mining, Muhammad Obeidat, Max North, Lloyd Burgess, Sarah North

Faculty and Research Publications

As data production grows exponentially at drastically lower cost, data mining to extract and discover valuable information is becoming increasingly important. To be useful in any business or industry, data must be capable of supporting sound decision-making and plausible prediction. The purpose of this paper is to provide a concise but broad synopsis of the technology and theory of data mining, giving a better understanding of the methods by which massive data can be transformed into meaningful information.


Hypotheses Generation As Supervised Link Discovery With Automated Class Labeling On Large-Scale Biomedical Concept Networks, Jayasimha R. Katukuri, Ying Xie, Vijay Raghavan, Ashish Gupta

Faculty and Research Publications

Computational approaches to generating hypotheses from biomedical literature have been studied intensively in recent years. Nevertheless, automatically discovering novel, cross-silo biomedical hypotheses from large-scale literature repositories remains a challenge. To address this challenge, we first model a biomedical literature repository as a comprehensive network of biomedical concepts and formulate hypothesis generation as a process of link discovery on the concept network. We extract the relevant information from the biomedical literature corpus and generate a concept network and a concept-author map on a cluster using the MapReduce framework. We extract a set of heterogeneous features such as random …
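To make the idea of a concept network concrete, the following is a minimal single-process sketch of the map and reduce steps that count how often two concepts co-occur in the same document. It is not the authors' pipeline: the function names, the toy documents, and the use of plain co-occurrence counts as edge weights are all assumptions made for illustration, and a real deployment would run the same two steps on a cluster.

```python
from collections import Counter
from itertools import combinations


def map_document(concepts):
    """Map step: emit one ((concept_a, concept_b), 1) record per co-occurring pair."""
    unique = sorted(set(concepts))  # sort so each pair has one canonical order
    return [(pair, 1) for pair in combinations(unique, 2)]


def reduce_counts(records):
    """Reduce step: sum the counts emitted for each concept pair."""
    weights = Counter()
    for pair, count in records:
        weights[pair] += count
    return weights


def build_concept_network(documents):
    """Run the map step over every document, then reduce to edge weights."""
    records = []
    for concepts in documents:
        records.extend(map_document(concepts))
    return reduce_counts(records)


# Hypothetical toy corpus: each document is the list of concepts found in it.
documents = [
    ["aspirin", "inflammation", "pain"],
    ["aspirin", "pain"],
    ["inflammation", "cytokine"],
]
network = build_concept_network(documents)
```

In this toy network, `aspirin` and `cytokine` never co-occur but share the neighbor `inflammation`, which is the kind of missing link that a link-discovery model would score as a candidate hypothesis.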


An Approach To Nearest Neighboring Search For Multi-Dimensional Data, Yong Shi, Li Zhang, Lei Zhu

Faculty and Research Publications

Finding nearest neighbors in large multi-dimensional data has long been a research interest in the data mining field. In this paper, we present our continuing research on similarity search problems. Previously, we explored the meaning of K nearest neighbors from a new perspective in PanKNN [20], which redefines the distances between data points and a given query point Q and efficiently and effectively selects the data points closest to Q. It can be applied in various data mining fields. Many real data sets contain irrelevant or obstacle information that greatly affects the effectiveness …
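For readers unfamiliar with the underlying problem, the sketch below shows the conventional brute-force K nearest neighbor search that approaches such as PanKNN set out to rethink. It is a baseline only, assuming ordinary Euclidean distance; it does not implement PanKNN's redefined distances or any obstacle handling, and the function name and sample points are hypothetical.

```python
import heapq
import math


def k_nearest(points, query, k):
    """Return the k points closest to `query` under Euclidean distance.

    Brute force: every point is compared against the query, so the cost
    grows linearly with the size of the data set.
    """
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


# Hypothetical two-dimensional data points and a query point Q.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0)]
query = (1.2, 0.9)
nearest = k_nearest(points, query, k=2)
```

Because the distance is computed over all dimensions at once, a few irrelevant dimensions can dominate the result, which is one motivation for redefining how closeness to Q is measured.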


An Attempt To Find Neighbors, Yong Shi, Ryan Rosenblum

Faculty and Research Publications

In this paper, we present our continuing research on similarity search problems. Previously, we proposed PanKNN [18], a novel technique that explores the meaning of K nearest neighbors from a new perspective, redefines the distances between data points and a given query point Q, and efficiently and effectively selects the data points closest to Q. It can be applied in various data mining fields. Here, we present our approach to solving the similarity search problem in the presence of obstacles. We apply the concept of obstacle points and process similarity search problems in a different way. …

