Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Articles 1 - 9 of 9

Full-Text Articles in Computer Sciences

Application Of Information-Theoretic Data Mining Techniques In A National Ambulatory Practice Outcomes Research Network, Adam Wright, Thomas N. Ricciardi, Martin Zwick Oct 2005

Systems Science Faculty Publications and Presentations

The Medical Quality Improvement Consortium (MQIC) data warehouse contains de-identified data on more than 3.6 million patients, including their problem lists, test results, procedures and medication lists. This study applies reconstructability analysis (RA), an information-theoretic data mining technique, to the MQIC data warehouse to empirically identify risk factors for various complications of diabetes, including myocardial infarction and microalbuminuria. The risk factors identified match those reported in the literature, demonstrating the utility of the MQIC data warehouse for outcomes research and of RA as a technique for mining clinical data warehouses.


Keynote: The Use Of Meta-Heuristic Algorithms For Data Mining, Dr. Beatrize De La Iglesia, A. Reynolds Aug 2005

International Conference on Information and Communication Technologies

In this paper we explore the application of powerful optimisers known as metaheuristic algorithms to problems within the data mining domain. We introduce some well-known data mining problems, and show how they can be formulated as optimisation problems. We then review the use of metaheuristics in this context. In particular, we focus on the task of partial classification and show how multi-objective metaheuristics have produced results that are comparable to the best known techniques but more scalable to large databases. We conclude by reinforcing the importance of research in the areas of metaheuristics for optimisation and data mining. The combination …


A Dynamic Weight Assignment Approach For IR Systems, M. Shoaib, Prof Dr. Abad Ali Shah, A. Vashishta Aug 2005

International Conference on Information and Communication Technologies

In an IR system, weights are assigned to the extracted keywords to support partial matching and to compute rankings. The weight assignment technique is determined by the IR model on which the system is built. Currently suggested weight assignment techniques are static: once a weight is assigned to a keyword, it remains unchanged for the life span of the IR system. In this paper, we suggest a dynamic weight assignment technique. This technique can be used by any IR model that supports partial matching.


Using Agents For Unification Of Information Extraction And Data Mining, Sharjeel Imtiaz, Azmat Hussain, Dr. Sikandar Hiyat Aug 2005

International Conference on Information and Communication Technologies

Early work on the unification of information extraction and data mining has been motivational, stating the problem rather than solving it. This paper proposes a solution framework for unification using intelligent agents. A relation manager agent extracts features using a cross-feedback approach and also provides a unified undirected graphical handle. An RPM agent, an approach to minimizing loop-back, proposes pooling and model utilization with common parameters for both text-level and entity-level abstractions.


Social Network Discovery By Mining Spatio-Temporal Events, Hady Lauw, Ee Peng Lim, Hwee Hwa Pang, Teck-Tim Tan Jul 2005

Research Collection School Of Computing and Information Systems

Knowing the patterns of relationships in a social network is very useful for law enforcement agencies investigating collaborations among criminals, for businesses exploiting relationships to sell products, and for individuals who wish to network with others. After all, it is not just what you know, but also whom you know, that matters. However, finding out who is related to whom on a large scale is a complex problem. Asking every single individual would be impractical, given the huge number of individuals and the changing dynamics of relationships. Recent advances in technology have allowed more data about the activities of individuals …


Efficient Generation Of Social Network Data From Computer-Mediated Communication Logs, Jason Wei Sung Yee Mar 2005

Theses and Dissertations

The insider threat poses a significant risk to any network or information system. A general definition of the insider threat is an authorized user performing unauthorized actions, a broad definition with no specifications on severity or action. While limited research has been able to classify and detect insider threats, it is generally understood that insider attacks are planned, and that there is a time period in which the organization's leadership can intervene and prevent the attack. Previous studies have shown that the person's behavior will generally change, and it is possible that social network analysis could be used to observe …


Pattern Discovery In Structural Databases With Applications To Bioinformatics, Sen Zhang Jan 2005

Dissertations

Frequent structure mining (FSM) aims to discover and extract patterns frequently occurring in structural data such as trees and graphs. FSM finds many applications in bioinformatics, XML processing, Web log analysis, and so on. In this thesis, two new FSM techniques are proposed for finding patterns in unordered labeled trees. Such trees can be used to model evolutionary histories of different species, among others.

The first FSM technique finds cousin pairs in the trees. A cousin pair is a pair of nodes sharing the same parent, the same grandparent, or the same great-grandparent, etc. Given a tree T, our …
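The cousin-pair definition above can be illustrated with a minimal sketch. It is not the algorithm from the thesis, whose description is truncated here, but a brute-force enumeration written only to make the definition concrete. The function name `cousin_pairs`, the parent-map representation, and the example tree are all hypothetical, and node identifiers are assumed to be unique (node labels are left out).

```python
from itertools import combinations


def cousin_pairs(parent):
    """Enumerate pairs of nodes that share the same k-th ancestor.

    `parent` maps each node to its parent; the root maps to None.
    Returns a set of (u, v, k) triples where k is the smallest
    generation at which u and v share an ancestor
    (k=1: same parent, k=2: same grandparent, and so on).
    Hypothetical helper for illustration; runs in quadratic time.
    """
    def ancestors(node):
        # Chain of ancestors, nearest first: [parent, grandparent, ...]
        chain = []
        node = parent[node]
        while node is not None:
            chain.append(node)
            node = parent[node]
        return chain

    chains = {node: ancestors(node) for node in parent}
    pairs = set()
    for u, v in combinations(sorted(parent), 2):
        # Compare the k-th ancestor of u with the k-th ancestor of v.
        for k, (a, b) in enumerate(zip(chains[u], chains[v]), start=1):
            if a == b:
                pairs.add((u, v, k))
                break
    return pairs


# Hypothetical example tree: r has children a and b;
# a has children c and d; b has child e.
example_tree = {"r": None, "a": "r", "b": "r", "c": "a", "d": "a", "e": "b"}
example_pairs = cousin_pairs(example_tree)
```

In this example, `c` and `d` share a parent, while `c` and `e` share only a grandparent, so they are reported with k=1 and k=2 respectively. Nodes at different depths below their common ancestor, such as `a` and `c`, are not paired under this reading of the definition.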


Group And Topic Discovery From Relations And Text, Xuerui Wang, Natasha Mohanty, Andrew McCallum Jan 2005

Andrew McCallum

We present a probabilistic generative model of entity relationships and textual attributes that simultaneously discovers groups among the entities and topics among the corresponding text. Block-models of relationship data have been studied in social network analysis for some time. Here we simultaneously cluster in several modalities at once, incorporating the words associated with certain relationships. Significantly, joint inference allows the discovery of groups to be guided by the emerging topics, and vice-versa. We present experimental results on two large data sets: sixteen years of bills put before the U.S. Senate, comprising their corresponding text and voting records, and 43 years …


On The Optimization Of Visualizations Of Complex Phenomena, Donald H. House, Althea D. Bair, Colin Ware Jan 2005

Center for Coastal and Ocean Mapping

The problem of perceptually optimizing complex visualizations is a difficult one, involving perceptual as well as aesthetic issues. In our experience, controlled experiments are quite limited in their ability to uncover interrelationships among visualization parameters, and thus may not be the most useful way to develop rules-of-thumb or theory to guide the production of high-quality visualizations. In this paper, we propose a new experimental approach to optimizing visualization quality that integrates some of the strong points of controlled experiments with methods more suited to investigating complex highly-coupled phenomena. We use human-in-the-loop experiments to search through visualization parameter space, generating large …