Open Access. Powered by Scholars. Published by Universities.®
Databases and Information Systems Commons™
Articles 1 - 30 of 57
Full-Text Articles in Databases and Information Systems
Measuring Qualities Of Articles Contributed By Online Communities, Ee Peng Lim, Ba-Quy Vuong, Hady W. Lauw, Aixin Sun
Research Collection School Of Computing and Information Systems
Using open source Web editing software (e.g., wiki), online community users can now easily edit, review and publish articles collaboratively. While much useful knowledge can be derived from these articles, content users and critics are often concerned about their qualities. In this paper, we develop two models, namely the basic model and the peer review model, for measuring the qualities of these articles and the authorities of their contributors. We represent collaboratively edited articles and their contributors in a bipartite graph. While the basic model measures an article's quality using both the authorities of contributors and the amount of contribution from each …
Rapid Identification Of Column Heterogeneity, Bing Tian Dai, Nick Koudas, Beng Chin Ooi, Divesh Srivastava, Suresh Venkatasubramanian
Research Collection School Of Computing and Information Systems
No abstract provided.
Clique Percolation For Finding Naturally Cohesive And Overlapping Document Clusters, Wei Gao, Kam-Fai Wong, Yunqing Xia, Ruifeng Xu
Research Collection School Of Computing and Information Systems
Techniques for finding document clusters mostly depend on models that impose strong explicit and/or implicit a priori assumptions. As a consequence, the clustering effects tend to be unnatural and stray away from the intrinsic grouping nature of a document collection. We apply a novel graph-theoretic technique called Clique Percolation Method (CPM) for document clustering. In this method, a process of enumerating highly cohesive maximal document cliques is performed in a random graph, where those strongly adjacent cliques are mingled to form naturally overlapping clusters. Our clustering results can unveil the inherent structural connections of the underlying data. Experiments show that CPM …
Towards Effective Content-Based Music Retrieval With Multiple Acoustic Feature Composition, Jialie Shen, John Shepherd, Ann H. H. Ngu
Research Collection School Of Computing and Information Systems
In this paper, we present a new approach to constructing music descriptors to support efficient content-based music retrieval and classification. The system applies multiple musical properties combined with a hybrid architecture based on principal component analysis (PCA) and a multilayer perceptron neural network. This architecture enables straightforward incorporation of multiple musical feature vectors, based on properties such as timbral texture, pitch, and rhythm structure, into a single low-dimensioned vector that is more effective for classification than the larger individual feature vectors. The use of supervised training enables incorporation of human musical perception that further enhances the classification process. We compare …
Query-Based Watermarking For XML Data, Xuan Zhou, Hwee Hwa Pang, Kian-Lee Tan
Research Collection School Of Computing and Information Systems
As an increasing amount of XML data is exchanged over the internet, copyright protection of this type of data is becoming an important requirement for many applications. In this paper, we introduce a rights protection scheme for XML data based on digital watermarking. One of the main challenges for watermarking XML data is that the data could be easily reorganized by an adversary in an attempt to destroy any embedded watermark. To overcome this challenge, we propose a query-based watermarking scheme, which creates queries to identify available watermarking capacity, such that watermarks could be recovered from reorganized data through query rewriting. The …
Continuous Monitoring Of kNN Queries In Wireless Sensor Networks, Yuxia Yao, Xueyan Tang, Ee Peng Lim
Research Collection School Of Computing and Information Systems
Wireless sensor networks have been widely used for civilian and military applications, such as environmental monitoring and vehicle tracking. In these applications, continuous query processing is often required, and efficient evaluation of such queries is a critical requirement. Due to the limited power supply of sensor nodes, energy efficiency is a major performance measure in such query evaluation. In this paper, we focus on continuous kNN query processing. We observe that centralized data storage and monitoring schemes do not favor energy efficiency. We therefore propose a localized scheme to monitor long-running nearest neighbor queries in sensor networks. …
Towards Effective Content-Based Music Retrieval With Multiple Acoustic Feature Combination, Jialie Shen, John Shepherd, Ann H. H. Ngu
Research Collection School Of Computing and Information Systems
In this paper, we present a new approach to constructing music descriptors to support efficient content-based music retrieval and classification. The system applies multiple musical properties combined with a hybrid architecture based on principal component analysis (PCA) and a multilayer perceptron neural network. This architecture enables straightforward incorporation of multiple musical feature vectors, based on properties such as timbral texture, pitch, and rhythm structure, into a single low-dimensioned vector that is more effective for classification than the larger individual feature vectors. The use of supervised training enables incorporation of human musical perception that further enhances the classification process. We compare …
On The Lower Bound Of Local Optimums In K-Means Algorithms, Zhenjie Zhang, Bing Tian Dai, Anthony K.H. Tung
Research Collection School Of Computing and Information Systems
No abstract provided.
Designing Web Sites For Customer Loyalty Across Business Domains: A Multilevel Analysis, S. Mithas, Narayanasamy Ramasubbu, M. S. Krishnan, C. Fornell
Research Collection School Of Computing and Information Systems
Web sites are important components of Internet strategy for organizations. This paper develops a theoretical model for understanding the effect of Web site design elements on customer loyalty to a Web site. We show the relevance of the business domain of a Web site to gain a contextual understanding of the relative importance of Web site design elements. We use a hierarchical linear modeling approach to model multilevel and cross-level interactions that have not been explicitly considered in previous research. By analyzing data on more than 12,000 online customer surveys for 43 Web sites in several business domains, we find that …
A Model For Anticipatory Event Detection, Qi He, Kuiyu Chang, Ee Peng Lim
Research Collection School Of Computing and Information Systems
Event detection is a very important area of research that discovers new events reported in a stream of text documents. Previous research in event detection has largely focused on finding the first story and tracking the events of a specific topic. A topic is simply a set of related events defined by user-supplied keywords with no associated semantics and little domain knowledge. We therefore introduce the Anticipatory Event Detection (AED) problem: given some user-preferred event transition in a topic, detect the occurrence of the transition for the stream of news covering the topic. We confine the events to …
Understanding User Perceptions On Usefulness And Usability Of An Integrated Wiki-G-Portal, Yin-Leng Theng, Yuanyuan Li, Ee Peng Lim, Zhe Wang, Dion Hoe-Lian Goh, Chew-Hung Chang, Kalyani Chatterjea, Jun Zhang
Research Collection School Of Computing and Information Systems
This paper describes a pilot study on Wiki-G-Portal, a project integrating Wikipedia, an online encyclopedia, into G-Portal, a Web-based digital library of geography resources. Initial findings from the pilot study suggest positive perceptions of the usefulness and usability of Wiki-G-Portal, as well as of subjects' attitude and intention to use it.
Integration Of Wikipedia And A Geography Digital Library, Ee Peng Lim, Zhe Wang, Darwin Sadeli, Yuanyuan Li, Chew-Hung Chang, Kalyani Chatterjea, Dion Hoe-Lian Goh, Yin-Leng Theng, Jun Zhang, Aixin Sun
Research Collection School Of Computing and Information Systems
In this paper, we address the problem of integrating Wikipedia, an online encyclopedia, and G-Portal, a web-based digital library, in the geography domain. The integration facilitates the sharing of data and services between the two web applications that are of great value in learning. We first present an overall system architecture for supporting such an integration and address the metadata extraction problem associated with it. In metadata extraction, we focus on extracting and constructing metadata for geo-political regions, namely cities and countries. Some empirical performance results will be presented. The paper will also describe the adaptations of G-Portal and Wikipedia …
Audio Similarity Measure By Graph Modeling And Matching, Yuxin Peng, Chong-Wah Ngo, Cuihua Fang, Xiaoou Chen, Jianguo Xiao
Research Collection School Of Computing and Information Systems
This paper proposes a new approach for the similarity measure and ranking of audio clips by graph modeling and matching. Instead of using frame-based or salient-based features to measure the acoustical similarity of audio clips, segment-based similarity is proposed. The novelty of our approach lies in two aspects: segment-based representation, and the similarity measure and ranking based on four kinds of similarity factors. In segment-based representation, segments not only capture the change property of an audio clip, but also keep and present the change relation and temporal order of audio features. In the similarity measure and ranking, four kinds of similarity …
Extracting Link Chains Of Relationship Instances From A Website, Myo-Myo Naing, Ee Peng Lim, Roger Hsiang-Li Chiang
Research Collection School Of Computing and Information Systems
Web pages from a Web site can often be associated with concepts in an ontology, and pairs of Web pages also can be associated with relationships between concepts. With such associations, the Web site can be searched, browsed, or even reorganized based on the concept and relationship labels of its Web pages. In this article, we study the link chain extraction problem that is critical to the extraction of Web pages that are related. A link chain is an ordered list of anchor elements linking two Web pages related by some semantic relationship. We propose a link chain extraction method …
Service Pattern Discovery Of Web Service Mining In Web Service Registry-Repository, Qianhui Althea Liang, Jen-Yao Chung, Steven M. Miller, Yang Ouyang
Research Collection School Of Computing and Information Systems
This paper presents and elaborates the concept of Web service usage patterns and pattern discovery through service mining. We define three different levels of service usage data: i) user request level, ii) template level and iii) instance level. At each level, we investigate patterns of service usage data and the discovery of these patterns. An algorithm for service pattern discovery at the template level is presented. We show the system architecture of a service-mining enabled service registry repository. Web service patterns, pattern discovery and pattern mining support the discovery and composition of complex services, which in turn supports the application …
Natural Document Clustering By Clique Percolation In Random Graphs, Wei Gao, Kam-Fai Wong
Research Collection School Of Computing and Information Systems
Document clustering techniques mostly depend on models that impose explicit and/or implicit a priori assumptions as to the number, size, and disjunction characteristics of clusters, and/or the probability distribution of clustered data. As a result, the clustering effects tend to be unnatural and stray away more or less from the intrinsic grouping nature among the documents in a corpus. We propose a novel graph-theoretic technique called Clique Percolation Clustering (CPC). It models clustering as a process of enumerating adjacent maximal cliques in a random graph that unveils the inherent structure of the underlying data, in which we unleash the commonly practiced constraints in …
Fast Tracking Of Near-Duplicate Keyframes In Broadcast Domain With Transitivity Propagation, Chong-Wah Ngo, Wan-Lei Zhao, Yu-Gang Jiang
Research Collection School Of Computing and Information Systems
The identification of near-duplicate keyframe (NDK) pairs is a useful task for a variety of applications such as news story threading and content-based video search. In this paper, we propose a novel approach for the discovery and tracking of NDK pairs and threads in the broadcast domain. The detection of NDKs in a large data set is a challenging task because, as the data set grows linearly, the computational cost grows quadratically, and so does the number of false alarms. This paper explores the symmetric and transitive nature of near-duplicates for the effective …
CUHK At ImageCLEF 2005: Cross-Language And Cross Media Image Retrieval, Steven Hoi, Jianke Zhu, Michael R. Lyu
Research Collection School Of Computing and Information Systems
In this paper, we describe our studies of cross-language and cross-media image retrieval at ImageCLEF 2005. This is the first participation of our CUHK (The Chinese University of Hong Kong) group at ImageCLEF. The task in which we participated is the “bilingual ad hoc retrieval” task. There are three major focuses and contributions in our participation. The first is the empirical evaluation of language models and smoothing strategies for cross-language image retrieval. The second is the evaluation of cross-media image retrieval, i.e., combining text and visual contents for image retrieval. The last is the evaluation of bilingual image retrieval …
Multi-Learner Based Recursive Supervised Training, Laxmi R. Iyer, Kiruthika Ramanathan, Sheng-Uei Guan
Research Collection School Of Computing and Information Systems
In this paper, we propose the multi-learner based recursive supervised training (MLRT) algorithm, which uses the existing framework of recursive task decomposition, by training the entire dataset, picking out the best learnt patterns, and then repeating the process with the remaining patterns. Instead of having a single learner to classify all datasets during each recursion, an appropriate learner is chosen from a set of three learners, based on the subset of data being trained, thereby avoiding the time overhead associated with the genetic algorithm learner utilized in previous approaches. In this way MLRT seeks to identify the inherent characteristics of …
Three Architectures For Trusted Data Dissemination In Edge Computing, Shen-Tat Goh, Hwee Hwa Pang, Robert H. Deng, Feng Bao
Research Collection School Of Computing and Information Systems
Edge computing pushes application logic and the underlying data to the edge of the network, with the aim of improving availability and scalability. As the edge servers are not necessarily secure, there must be provisions for users to validate the results—that values in the result tuples are not tampered with, that no qualifying data are left out, that no spurious tuples are introduced, and that a query result is not actually the output from a different query. This paper aims to address the challenges of ensuring data integrity in edge computing. We study three schemes that enable users to check …
Wireless Indoor Positioning System With Enhanced Nearest Neighbors In Signal Space Algorithm, Quang Tran, Juki Wirawan Tantra, Ah-Hwee Tan, Kin-Choong Yow, Dongyu Qiu
Research Collection School Of Computing and Information Systems
With the rapid development and wide deployment of wireless Local Area Networks (WLANs), WLAN-based positioning systems employing signal-strength-based techniques have become an attractive solution for location estimation in indoor environments. In recent years, a number of such systems have been presented, and most of them use the common Nearest Neighbor in Signal Space (NNSS) algorithm. In this paper, we propose an enhancement to the NNSS algorithm. We analyze the enhancement to show its effectiveness. The performance of the enhanced NNSS algorithm is evaluated with different values of the parameters. Based on the performance evaluation and analysis, we recommend some …
Masking Page Reference Patterns In Encryption Databases On Untrusted Storage, Xi Ma, Hwee Hwa Pang, Kian-Lee Tan
Research Collection School Of Computing and Information Systems
To support ubiquitous computing, the underlying data have to be persistent and available anywhere, anytime. The data thus have to migrate from devices that are local to individual computers to shared storage volumes that are accessible over open networks. This potentially exposes the data to heightened security risks. In particular, the activity on a database exhibits regular page reference patterns that could help attackers learn logical links among physical pages and then launch additional attacks. We propose two countermeasures to mitigate the risk of attacks initiated through analyzing the shared storage server’s activity for those page patterns. The first countermeasure relocates …
Discovering Image-Text Associations For Cross-Media Web Information Fusion, Tao Jiang, Ah-Hwee Tan
Research Collection School Of Computing and Information Systems
The diverse and distributed nature of the information published on the World Wide Web has made it difficult to collate and track information related to specific topics. Whereas most existing work on web information fusion has focused on multiple document summarization, this paper presents a novel approach for discovering associations between images and text segments, which subsequently can be used to support cross-media web content summarization. Specifically, we employ a similarity-based multilingual retrieval model and adopt a vague transformation technique for measuring the information similarity between visual features and textual features. The experimental results on a terrorist domain document set …
Continuous Nearest Neighbor Monitoring In Road Networks, Kyriakos Mouratidis, Man Lung Yiu, Dimitris Papadias, Nikos Mamoulis
Research Collection School Of Computing and Information Systems
Recent research has focused on continuous monitoring of nearest neighbors (NN) in highly dynamic scenarios, where the queries and the data objects move frequently and arbitrarily. All existing methods, however, assume the Euclidean distance metric. In this paper we study k-NN monitoring in road networks, where the distance between a query and a data object is determined by the length of the shortest path connecting them. We propose two methods that can handle arbitrary object and query moving patterns, as well as fluctuations of edge weights. The first one maintains the query results by processing only updates that may invalidate …
Learning The Unified Kernel Machines For Classification, Steven C. H. Hoi, Michael R. Lyu, Edward Y. Chang
Research Collection School Of Computing and Information Systems
Kernel machines have been shown to be state-of-the-art learning techniques for classification. In this paper, we propose a novel general framework for learning Unified Kernel Machines (UKM) from both labeled and unlabeled data. Our proposed framework integrates supervised learning, semi-supervised kernel learning, and active learning in a unified solution. Within this framework, we focus on designing a new semi-supervised kernel learning method, i.e., Spectral Kernel Learning (SKL), which is built on the principles of kernel target alignment and unsupervised kernel design. Our algorithm is related to an equivalent quadratic programming problem that can be efficiently …
An Energy-Efficient And Access Latency Optimized Indexing Scheme For Wireless Data Broadcast, Yuxia Yao, Xueyan Tang, Ee Peng Lim, Aixin Sun
Research Collection School Of Computing and Information Systems
Data broadcast is an attractive data dissemination method in mobile environments. To improve energy efficiency, existing air indexing schemes for data broadcast have focused on reducing tuning time only, i.e., the duration that a mobile client stays active in data accesses. On the other hand, existing broadcast scheduling schemes have aimed at reducing access latency through nonflat data broadcast to improve responsiveness only. Not much work has addressed the energy efficiency and responsiveness issues concurrently. This paper proposes an energy-efficient indexing scheme called MHash that optimizes tuning time and access latency in an integrated fashion. MHash reduces tuning time by …
Collaborative Image Retrieval Via Regularized Metric Learning, Luo Si, Rong Jin, Steven C. H. Hoi, Michael R. Lyu
Research Collection School Of Computing and Information Systems
In content-based image retrieval (CBIR), relevant images are identified based on their similarities to query images. Most CBIR algorithms are hindered by the semantic gap between the low-level image features used for computing image similarity and the high-level semantic concepts conveyed in images. One way to reduce the semantic gap is to utilize the log data of users' feedback that CBIR systems have collected over time, an approach also called “collaborative image retrieval.” In this paper, we present a novel metric learning approach, named “regularized metric learning,” for collaborative image retrieval, which learns a distance metric by exploring …
Bias And Controversy: Beyond The Statistical Deviation, Hady W. Lauw, Ee Peng Lim, Ke Wang
Research Collection School Of Computing and Information Systems
In this paper, we investigate how deviation in evaluation activities may reveal bias on the part of reviewers and controversy on the part of evaluated objects. We focus on a 'data-centric approach' where the evaluation data is assumed to represent the 'ground truth'. The standard statistical approaches take evaluation and deviation at face value. We argue that attention should be paid to the subjectivity of evaluation, judging the evaluation score not just on 'what is being said' (deviation), but also on 'who says it' (reviewer) as well as on 'whom it is said about' (object). Furthermore, we observe that bias …
A Hybrid Architecture Combining Reactive Plan Execution And Reactive Learning, Samin Karim, Liz Sonenberg, Ah-Hwee Tan
Research Collection School Of Computing and Information Systems
Developing software agents has been complicated by the problem of how knowledge should be represented and used. Many researchers have observed that agents need not use complex representations, and that in many cases it suffices to use “the world” as their representation. However, the problem of introspection, both by the agents themselves and by (human) domain experts, requires a knowledge representation with a higher level of abstraction that is more ‘understandable’. Learning and adaptation in agents has traditionally required knowledge to be represented at an arbitrary, low level of abstraction. We seek to create an agent that has the capability …
Extraction Of Coherent Relevant Passages Using Hidden Markov Models, Jing Jiang, Chengxiang Zhai
Research Collection School Of Computing and Information Systems
In information retrieval, retrieving relevant passages, as opposed to whole documents, not only directly benefits the end user by filtering out the irrelevant information within a long relevant document, but also improves retrieval accuracy in general. A critical problem in passage retrieval is to extract coherent relevant passages accurately from a document, which we refer to as passage extraction. While much work has been done on passage retrieval, the passage extraction problem has not been seriously studied. Most existing work tends to rely on presegmenting documents into fixed-length passages, which are unlikely to be optimal because the length of a relevant passage …