Computer Sciences | Open Access Articles | Digital Commons Network™

Ensemble Of Feature Selection Techniques For High Dimensional Data, Sri Harsha Vege May 2012

Masters Theses & Specialist Projects

Data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large amounts of data stored in databases, data warehouses, or other information repositories. Feature selection is an important preprocessing step of data mining that helps increase the predictive performance of a model. The main aim of feature selection is to choose a subset of features with high predictive information and eliminate irrelevant features with little or no predictive information. A single feature selection technique may settle on a locally optimal feature subset.

In this thesis we propose an ensemble approach for feature selection, where multiple …
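The excerpt is cut off before the thesis describes how its ensemble is built, so the following is only a minimal sketch of one common way to combine several feature rankers: mean rank aggregation. The function names, the feature names, and the scores are all hypothetical and are not taken from the thesis.

```python
from statistics import mean


def rank_features(scores):
    """Map each feature to its rank, with rank 1 for the highest score."""
    ordered = sorted(scores, key=lambda feature: scores[feature], reverse=True)
    return {feature: position for position, feature in enumerate(ordered, start=1)}


def ensemble_select(score_tables, k):
    """Average each feature's rank across all rankers and keep the k best.

    Assumes every score table covers the same set of features.
    """
    rankings = [rank_features(scores) for scores in score_tables]
    features = list(rankings[0])
    mean_rank = {f: mean(ranking[f] for ranking in rankings) for f in features}
    # Ties on mean rank are broken alphabetically so the result is deterministic.
    return sorted(features, key=lambda f: (mean_rank[f], f))[:k]


# Hypothetical relevance scores produced by three different rankers.
score_tables = [
    {"age": 0.90, "income": 0.40, "zip": 0.10, "tenure": 0.70},
    {"age": 0.60, "income": 0.80, "zip": 0.20, "tenure": 0.70},
    {"age": 0.80, "income": 0.30, "zip": 0.05, "tenure": 0.90},
]

selected = ensemble_select(score_tables, k=2)
print(selected)
```

Averaging ranks rather than raw scores keeps rankers with different score scales comparable, which is one reason this aggregation is often used as a baseline.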

Empirical Methods For Predicting Student Retention- A Summary From The Literature, Matt Bogard May 2011

Economics Faculty Publications

The vast majority of the literature related to the empirical estimation of retention models includes a discussion of the theoretical retention framework established by Bean, Braxton, Tinto, Pascarella, Terenzini and others (see Bean, 1980; Bean, 2000; Braxton, 2000; Braxton et al., 2004; Chapman and Pascarella, 1983; Pascarella and Terenzini, 1978; St. John and Cabrera, 2000; Tinto, 1975). This body of research provides a starting point for the consideration of which explanatory variables to include in any model specification, as well as identifying possible data sources. The literature separates itself into two major camps including research related to the hypothesis testing …

Efficient Schema Extraction From A Collection Of XML Documents, Vijayeandra Parthepan May 2011

Masters Theses & Specialist Projects

The eXtensible Markup Language (XML) has become the standard format for data exchange on the Internet, providing interoperability between different business applications. Such wide use results in large volumes of heterogeneous XML data, i.e., XML documents conforming to different schemas. Although schemas are important in many business applications, they are often missing in XML documents. In this thesis, we present a suite of algorithms that are effective in extracting schema information from a large collection of XML documents. We propose using the cost of NFA simulation to compute the Minimum Description Length to rank the inferred schemas. We also studied …
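To make the idea of extracting schema information from schema-less documents concrete, here is a minimal sketch that summarizes, for each element, which child elements occur and how often. It is deliberately much simpler than what the abstract describes: it ignores child order and does not involve NFA simulation or description-length ranking. The function name, the occurrence markers, and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def infer_content_models(documents):
    """Summarize child occurrences per element tag across XML documents.

    Markers follow DTD conventions: "" exactly once, "?" optional,
    "+" one or more, "*" zero or more. Child order is not modeled.
    """
    occurrences = defaultdict(int)  # tag -> number of elements seen
    child_counts = defaultdict(lambda: defaultdict(list))  # tag -> child -> counts

    for text in documents:
        root = ET.fromstring(text)
        for element in root.iter():
            occurrences[element.tag] += 1
            per_element = Counter(child.tag for child in element)
            for child_tag, count in per_element.items():
                child_counts[element.tag][child_tag].append(count)

    models = {}
    for tag, total in occurrences.items():
        model = {}
        for child_tag, counts in child_counts[tag].items():
            optional = len(counts) < total  # missing from some parents
            repeated = max(counts) > 1      # appears more than once somewhere
            if repeated:
                model[child_tag] = "*" if optional else "+"
            else:
                model[child_tag] = "?" if optional else ""
        models[tag] = model
    return models


# Hypothetical documents that share no declared schema.
documents = [
    "<book><title>A</title><author>X</author><author>Y</author></book>",
    "<book><title>B</title><author>Z</author><year>2011</year></book>",
]

models = infer_content_models(documents)
print(models["book"])
```

A summary like this is only a starting point; several different schemas can describe the same documents, which is why a ranking criterion such as the one named in the abstract is needed to choose among them.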

Automatically Extract Information From Web Documents, Dipesh Sharma Dec 2007

Masters Theses & Specialist Projects

The Internet can be regarded as a reservoir of useful information in textual form: product catalogs, airline schedules, stock market quotations, weather forecasts, and so on. There has been much interest in building systems that gather such information on a user's behalf. But because these information resources are formatted differently, mechanically extracting their content is difficult. Systems using such resources typically rely on hand-coded wrappers, that is, customized procedures for information extraction. Structured data objects are a very important type of information on the Web. Such data objects are often records from underlying databases that are displayed in Web pages using fixed templates. …
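As an illustration of what a hand-coded wrapper looks like, the sketch below extracts records from a page that renders each record with the same fixed template. The HTML template, the pattern, and the function name are hypothetical; the example shows the kind of site-specific procedure the abstract refers to, not the automated approach the thesis goes on to develop.

```python
import re

# The pattern is tied to one specific (hypothetical) page template,
# which is exactly what makes hand-coded wrappers brittle.
ROW_PATTERN = re.compile(
    r'<li class="product">\s*'
    r'<span class="name">(?P<name>[^<]+)</span>\s*'
    r'<span class="price">\$(?P<price>[0-9]+\.[0-9]{2})</span>\s*'
    r"</li>"
)


def extract_products(html):
    """Return (name, price) tuples for every row matching the template."""
    return [
        (match.group("name").strip(), float(match.group("price")))
        for match in ROW_PATTERN.finditer(html)
    ]


page = """
<ul>
  <li class="product"><span class="name">Kettle</span> <span class="price">$24.99</span></li>
  <li class="product"><span class="name">Toaster</span> <span class="price">$31.50</span></li>
</ul>
"""

records = extract_products(page)
print(records)
```

If the site changes its markup, for example by renaming the `product` class, the wrapper silently returns nothing, which is the maintenance burden that motivates automatic extraction.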

