Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computer Sciences

2002

Research Collection School Of Computing and Information Systems

Data sequence encoding

Full-Text Articles in Physical Sciences and Mathematics

Fast Filter-And-Refine Algorithms For Subsequence Selection, Beng-Chin Ooi, Hwee Hwa Pang, Hao Wang, Limsoon Wong, Cui Yu, Jul 2002

Large sequence databases, such as protein, DNA and gene sequences in biology, are becoming increasingly common. An important operation on a sequence database is approximate subsequence matching, where all subsequences that are within some distance from a given query string are retrieved. This paper proposes a filter-and-refine algorithm that enables efficient approximate subsequence matching in large DNA sequence databases. It employs a bitmap indexing structure to condense and encode each data sequence into a shorter index sequence. During query processing, the bitmap index is used to filter out most of the irrelevant subsequences, and false positives are removed in the …
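
The general filter-and-refine pattern described in the abstract can be illustrated with a minimal sketch. This is not the paper's bitmap index: the filter below is a swapped-in character-count comparison, and the distance is assumed to be Hamming distance over fixed-length windows, since the abstract does not specify either. The function names (`filter_and_refine`, `count_gap`, `hamming`) are hypothetical. The filter relies on one fact: a single mismatch changes the L1 gap between two character-count vectors by at most 2, so any window whose gap exceeds 2k cannot be within distance k and can be discarded without an exact comparison.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_gap(a: Counter, b: Counter) -> int:
    """L1 distance between two character-count vectors."""
    return sum(abs(a[c] - b[c]) for c in set(a) | set(b))


def filter_and_refine(data: str, query: str, k: int):
    """Return (offset, window) pairs whose Hamming distance to query is <= k.

    Assumption: windows have the same length as the query. The cheap
    count-based filter stands in for the paper's bitmap index.
    """
    m = len(query)
    if m == 0 or m > len(data):
        return []
    q_counts = Counter(query)
    w_counts = Counter(data[:m])  # counts for the first window
    matches = []
    for i in range(len(data) - m + 1):
        if i > 0:
            # Slide the window one position: drop the character that
            # left on the left, add the one that entered on the right.
            w_counts[data[i - 1]] -= 1
            w_counts[data[i + m - 1]] += 1
        # Filter step: cheap lower bound, never rejects a true match.
        if count_gap(q_counts, w_counts) <= 2 * k:
            window = data[i:i + m]
            # Refine step: exact check removes the false positives.
            if hamming(window, query) <= k:
                matches.append((i, window))
    return matches


if __name__ == "__main__":
    print(filter_and_refine("ACGTACGTTTGA", "ACGA", 1))
```

Because the filter is only a lower bound on the true distance, it can let false positives through but never drops a real match; the refine step then pays the exact-comparison cost only for the surviving candidates.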