A Survey Of Schema Matching Research, 2012 University of Massachusetts Boston
A Survey Of Schema Matching Research, Roger Blake
Roger H. Blake
Schema matching is the process of developing semantic matches between two or more schemas. The purpose of schema matching is generally either to merge two or more databases, or to enable queries on multiple, heterogeneous databases to be formulated on a single schema (Doan and Halevy 2005). This paper develops a taxonomy of schema matching approaches, classifying them by the combination of schema matching technique and the type of data that technique uses. Schema matching techniques are categorized as being based on rules, learning, or ontology, and the type of data used is categorized as being based …
Ethical Considerations For Virtual Worlds, 2012 Appalachian State University
Ethical Considerations For Virtual Worlds, Alanah Mitchell, Deepak Khazanchi
Information Systems and Quantitative Analysis Faculty Proceedings & Presentations
Metaverses, like Second Life and Teleplace, and the inherent technology capabilities that they offer continue to be of interest for researchers, practitioners, and educators. Due to this trend, and the uncertainty regarding immersive virtual experiences as contrasted with face-to-face experiences, there is a need to further understand the ethical challenges associated with this virtual context. This paper presents a starting point for discussing ethics in virtual worlds. Specifically, we review virtual worlds and their unique technology capabilities as well as the ethical considerations that arise due to these unique capabilities.
Using Attribute Behavior Diversity To Build Accurate Decision Tree Committees For Microarray Data, 2012 Wright State University - Main Campus
Using Attribute Behavior Diversity To Build Accurate Decision Tree Committees For Microarray Data, Qian Han, Guozhu Dong
Kno.e.sis Publications
DNA microarrays (gene chips), frequently used in biological and medical studies, measure the expressions of thousands of genes per sample. Using microarray data to build accurate classifiers for diseases is an important task. This paper introduces an algorithm, called Committee of Decision Trees by Attribute Behavior Diversity (CABD), to build highly accurate ensembles of decision trees for such data. Since a committee's accuracy is greatly influenced by the diversity among its member classifiers, CABD uses two new ideas to "optimize" that diversity, namely (1) the concept of attribute behavior–based similarity between attributes, and (2) …
Confidence-Aware Graph Regularization With Heterogeneous Pairwise Features, 2012 Singapore Management University
Confidence-Aware Graph Regularization With Heterogeneous Pairwise Features, Yuan Fang, Bo-June Paul Hsu, Kevin Chen-Chuan Chang
Research Collection School Of Computing and Information Systems
Conventional classification methods tend to focus on features of individual objects, while missing out on potentially valuable pairwise features that capture the relationships between objects. Although recent developments in graph regularization exploit this aspect, existing works generally assume only a single kind of pairwise feature, which is often insufficient. We observe that multiple, heterogeneous pairwise features can often complement each other and are generally more robust in modeling the relationships between objects. Furthermore, as some objects are easier to classify than others, objects with higher initial classification confidence should be weighted more heavily when classifying related but more ambiguous objects, an …
Collective Churn Prediction In Social Network, 2012 Singapore Management University
Collective Churn Prediction In Social Network, Richard J. Oentaryo, Ee-Peng Lim, David Lo, Feida Zhu, Philips K. Prasetyo
Research Collection School Of Computing and Information Systems
In service-based industries, churn poses a significant threat to the integrity of the user communities and the profitability of the service providers. As such, research on churn prediction methods has been actively pursued, involving either intrinsic, user profile factors or extrinsic, social factors. However, existing approaches often address each type of factor separately, thus lacking a comprehensive view of churn behaviors. In this paper, we propose a new churn prediction approach based on collective classification (CC), which accounts for both the intrinsic and extrinsic factors by utilizing the local features of, and dependencies among, individuals during prediction steps. We evaluate our …
A Secure And Efficient Discovery Service System In EPCglobal Network, 2012 Singapore Management University
A Secure And Efficient Discovery Service System In EPCglobal Network, Jie Shi, Yingjiu Li, Robert H. Deng
Research Collection School Of Computing and Information Systems
In recent years, the Internet of Things (IoT) has drawn considerable attention from the industrial and research communities. Due to the vast amount of data generated through IoT devices and users, there is an urgent need for an effective search engine to help us make sense of this massive amount of data. With this motivation, we begin our initial work on developing a secure and efficient search engine (SecDS) based on EPC Discovery Services (EPCDS) for the EPCglobal network, an integral part of IoT. SecDS is designed to provide a bridge between different partners of supply chains to share information while …
Boosting Multi-Kernel Locality-Sensitive Hashing For Scalable Image Retrieval, 2012 Nanyang Technological University
Boosting Multi-Kernel Locality-Sensitive Hashing For Scalable Image Retrieval, Hao Xia, Steven C. H. Hoi, Pengcheng Wu, Rong Jin
Research Collection School Of Computing and Information Systems
Similarity search is a key challenge for multimedia retrieval applications where data are usually represented in high-dimensional space. Among various algorithms proposed for similarity search in high-dimensional space, Locality-Sensitive Hashing (LSH) is the most popular one, which recently has been extended to Kernelized Locality-Sensitive Hashing (KLSH) by exploiting kernel similarity for better retrieval efficacy. Typically, KLSH works only with a single kernel, which is often limited in real-world multimedia applications, where data may originate from multiple resources or can be represented in several different forms. For example, in content-based multimedia retrieval, a variety of features can be extracted to represent …
Modeling Concept Dynamics For Large Scale Music Search, 2012 Singapore Management University
Modeling Concept Dynamics For Large Scale Music Search, Jialie Shen, Hwee Hwa Pang, Meng Wang, Shuicheng Yan
Research Collection School Of Computing and Information Systems
Continuing advances in data storage and communication technologies have led to an explosive growth in digital music collections. To cope with their increasing scale, we need effective Music Information Retrieval (MIR) capabilities like tagging, concept search and clustering. Integral to MIR is a framework for modelling music documents and generating discriminative signatures for them. In this paper, we introduce a multimodal, layered learning framework called DMCM. Distinguished from the existing approaches that encode music as an ensemble of order-less feature vectors, our framework extracts from each music document a variety of acoustic features, and translates them into low-level encodings over …
A Non-Parametric Visual-Sense Model Of Images: Extending The Cluster Hypothesis Beyond Text, 2012 Singapore Management University
A Non-Parametric Visual-Sense Model Of Images: Extending The Cluster Hypothesis Beyond Text, Kong-Wah Wan, Ah-Hwee Tan, Joo-Hwee Lim, Liang-Tien Chia
Research Collection School Of Computing and Information Systems
The main challenge of a search engine is to find information that is relevant and appropriate. However, this can become difficult when queries are issued using ambiguous words. Rijsbergen first hypothesized a clustering approach for web pages wherein closely associated pages are treated as a semantic group with the same relevance to the query (Rijsbergen 1979). In this paper, we extend Rijsbergen’s cluster hypothesis to multimedia content such as images. Given a user query, the polysemy in the returned image set is related to the many possible meanings of the query. We develop a method to cluster the polysemous images …
Shortest Path Computation With No Information Leakage, 2012 Singapore Management University
Shortest Path Computation With No Information Leakage, Kyriakos Mouratidis, Man Lung Yiu
Research Collection School Of Computing and Information Systems
Shortest path computation is one of the most common queries in location-based services (LBSs). Although particularly useful, such queries raise serious privacy concerns. Exposing to a (potentially untrusted) LBS the client’s position and her destination may reveal personal information, such as social habits, health condition, shopping preferences, lifestyle choices, etc. The only existing method for privacy-preserving shortest path computation follows the obfuscation paradigm; it prevents the LBS from inferring the source and destination of the query with a probability higher than a threshold. This implies, however, that the LBS still deduces some information (albeit not exact) about the client’s location …
Presynaptic Learning And Memory With A Persistent Firing Neuron And A Habituating Synapse: A Model Of Short Term Persistent Habituation, 2012 Singapore Management University
Presynaptic Learning And Memory With A Persistent Firing Neuron And A Habituating Synapse: A Model Of Short Term Persistent Habituation, Kiruthika Ramanathan, Ning Ning, Dhiviya Dhanasekar, Guoqi Li, Luping Shi, Prahlad Vadakkepat
Research Collection School Of Computing and Information Systems
Our paper explores the interaction of persistent firing axonal and presynaptic processes in the generation of short term memory for habituation. We first propose a model of a sensory neuron whose axon is able to switch between passive conduction and persistent firing states, thereby triggering short term retention to the stimulus. Then we propose a model of a habituating synapse and explore all nine of the behavioral characteristics of short term habituation in a two neuron circuit. We couple the persistent firing neuron to the habituation synapse and investigate the behavior of short term retention of habituating response. Simulations show …
Online Feature Selection For Mining Big Data, 2012 Singapore Management University
Online Feature Selection For Mining Big Data, Steven C. H. Hoi, Jialei Wang, Peilin Zhao, Rong Jin
Research Collection School Of Computing and Information Systems
Most studies of online learning require accessing all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications when data instances are of high dimensionality or when it is expensive to acquire the full set of attributes/features. To address this limitation, we investigate the problem of Online Feature Selection (OFS), in which the online learner is only allowed to maintain a classifier involving a small and fixed number of features. The key challenge of Online Feature Selection is how to make accurate predictions using a small and fixed number of active features. …
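As a rough illustration of the constraint described in the abstract above (a classifier restricted to a small, fixed number of features), the following minimal sketch combines a perceptron-style update with a truncation step that keeps only the largest-magnitude weights. The function names, the `budget` and `lr` parameters, and the truncation rule are assumptions made for illustration; the abstract does not specify the authors' algorithm, and this sketch reads every feature of each instance when updating.

```python
def truncate(weights, budget):
    """Keep the `budget` largest-magnitude weights and zero out the rest."""
    if budget >= len(weights):
        return list(weights)
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(ranked[:budget])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


def online_sparse_perceptron(stream, dim, budget, lr=1.0):
    """Hypothetical online learner with at most `budget` non-zero weights.

    `stream` yields (x, y) pairs, where x is a list of `dim` floats
    and y is a label in {-1, +1}.
    """
    weights = [0.0] * dim
    mistakes = 0
    for x, y in stream:
        score = sum(w * xi for w, xi in zip(weights, x))
        if y * score <= 0:  # misclassified (or undecided): update the model
            mistakes += 1
            weights = [w + lr * y * xi for w, xi in zip(weights, x)]
            # Enforce the feature budget after every update (assumed rule).
            weights = truncate(weights, budget)
    return weights, mistakes


if __name__ == "__main__":
    data = [([1.0, 0.1, 0.0], 1), ([-1.0, 0.2, 0.1], -1), ([1.0, -0.1, 0.3], 1)]
    model, errors = online_sparse_perceptron(data, dim=3, budget=1)
    print(model, errors)
```

Truncating after each update keeps the model size fixed at all times, at the cost of possibly discarding a feature that would have become useful later.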
Common Criteria Meets Realpolitik Trust, Alliances, And Potential Betrayal, 2012 University of Texas at Dallas
Common Criteria Meets Realpolitik Trust, Alliances, And Potential Betrayal, Jan Kallberg
Jan Kallberg
Common Criteria for Information Technology Security Evaluation has the ambition to be a global standard for IT-security certification. The issued certifications are mutually recognized between the signatories of the Common Criteria Recognition Arrangement. The key element in any form of mutual relationship is trust. A question raised in this paper is how far trust can be maintained in Common Criteria when additional signatories enter with geopolitical interests that conflict with those of earlier signatories. Other issues raised are control over production and the lack of a permanent organization in the Common Criteria, which leads to concerns about the ability to oversee actual compliance. As …
Data Mining Of Protein Databases, 2012 University of Nebraska-Lincoln
Data Mining Of Protein Databases, Christopher Assi
Department of Computer Science and Engineering: Dissertations, Theses, and Student Research
Data mining of protein databases poses special challenges because many protein databases are non-relational, whereas most data mining and machine learning algorithms assume the input data to be a relational database. Protein databases are non-relational mainly because they often contain set data types. We developed new data mining algorithms that can restructure non-relational protein databases so that they become relational and amenable to various data mining and machine learning tools. We applied the new restructuring algorithms to a pancreatic protein database. After the restructuring, we also applied two classification methods, decision tree and SVM classifiers, and compared their …
Enhancing Access Privacy Of Range Retrievals Over B+Trees, 2012 Singapore Management University
Enhancing Access Privacy Of Range Retrievals Over B+Trees, Hwee Hwa Pang, Jilian Zhang, Kyriakos Mouratidis
Kyriakos MOURATIDIS
Users of databases that are hosted on shared servers cannot take for granted that their queries will not be disclosed to unauthorized parties. Even if the database is encrypted, an adversary who is monitoring the I/O activity on the server may still be able to infer some information about a user query. For the particular case of a B+-tree that has its nodes encrypted, we identify properties that enable the ordering among the leaf nodes to be deduced. These properties allow us to construct adversarial algorithms to recover the B+-tree structure from the I/O traces generated by range queries. Combining …
Heuristic Algorithms For Balanced Multi-Way Number Partitioning, 2012 Singapore Management University
Heuristic Algorithms For Balanced Multi-Way Number Partitioning, Jilian Zhang, Kyriakos Mouratidis, Hwee Hwa Pang
Kyriakos MOURATIDIS
Balanced multi-way number partitioning (BMNP) seeks to split a collection of numbers into subsets with (roughly) the same cardinality and subset sum. The problem is NP-hard, and there are several exact and approximate algorithms for it. However, existing exact algorithms solve only the simpler, balanced two-way number partitioning variant, whereas the most effective approximate algorithm, BLDM, may produce widely varying subset sums. In this paper, we introduce the LRM algorithm that lowers the expected spread in subset sums to one third that of BLDM for uniformly distributed numbers and odd subset cardinalities. We also propose Meld, a novel strategy for …
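To make the problem statement in the abstract above concrete, here is a minimal sketch of a naive greedy baseline for balanced multi-way partitioning. It is not BLDM, LRM, or Meld, whose details the abstract does not give; the function names and the round-based assignment rule are assumptions chosen only to show what "same cardinality and subset sum" means in practice.

```python
def balanced_partition(numbers, k):
    """Naive greedy baseline: split `numbers` into k subsets whose sizes
    differ by at most one, while trying to keep the subset sums close."""
    subsets = [[] for _ in range(k)]
    sums = [0] * k
    ordered = sorted(numbers, reverse=True)
    # Handle the sorted numbers in rounds of k, so every subset receives
    # at most one number per round (this keeps the cardinalities balanced).
    for start in range(0, len(ordered), k):
        round_items = ordered[start:start + k]
        # Subsets with the smallest running sum get the largest numbers.
        order = sorted(range(k), key=lambda i: sums[i])
        for idx, value in zip(order, round_items):
            subsets[idx].append(value)
            sums[idx] += value
    return subsets


def spread(subsets):
    """Difference between the largest and smallest subset sum."""
    totals = [sum(s) for s in subsets]
    return max(totals) - min(totals)


if __name__ == "__main__":
    parts = balanced_partition([8, 7, 6, 5, 4, 3, 2, 1], 2)
    print(parts, spread(parts))
```

The `spread` helper measures the quantity the abstract refers to as the spread in subset sums; a baseline like this gives no guarantee on it, which is the gap that specialized heuristics aim to close.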
Twitris+: Social Media Analytics Platform For Effective Coordination, 2012 Wright State University - Main Campus
Twitris+: Social Media Analytics Platform For Effective Coordination, Gary Alan Smith, Amit P. Sheth, Ashutosh Sopan Jadhav, Hemant Purohit, Lu Chen, Michael Cooney, Pavan Kapanipathi, Pramod Anantharam, Pramod Koneru, Wenbo Wang
Kno.e.sis Publications
Twitris+ is a Semantic Social Media analytics platform that provides technologies for analyzing large-scale social media streams across Spatio-Temporal-Thematic (STT) and People-Content-Network (PCN) dimensions. It provides holistic situational awareness from one interface and enables organizational actors to engage in well-coordinated ways for desired tasks during emergency response.
Embracing Analytics For A Better Competitive Edge, 2012 Singapore Management University
Embracing Analytics For A Better Competitive Edge, Tin Seong Kam
Research Collection School Of Computing and Information Systems
No abstract provided.
Feature-Based Opinion Mining And Ranking, 2012 San Jose State University
Feature-Based Opinion Mining And Ranking, Magdalini Eirinaki, S. Pisal, J. Singh
Magdalini Eirinaki
The proliferation of blogs and social networks presents a new set of challenges and opportunities in the way information is searched and retrieved. Even though facts still play a very important role when information is sought on a topic, opinions have become increasingly important as well. Opinions expressed in blogs and social networks are playing an important role in influencing everything from the products people buy to the presidential candidate they support. Thus, there is a need for a new type of search engine which will not only retrieve facts, but will also enable the retrieval of opinions. Such a search …