Full-Text Articles in Computer Sciences
Ensemble Of Feature Selection Techniques For High Dimensional Data, Sri Harsha Vege
Masters Theses & Specialist Projects
Data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships from large amounts of data stored in databases, data warehouses, or other information repositories. Feature selection is an important preprocessing step of data mining that helps increase the predictive performance of a model. The main aim of feature selection is to choose a subset of features with high predictive information and eliminate irrelevant features with little or no predictive information. Using a single feature selection technique may generate local optima.
In this thesis we propose an ensemble approach for feature selection, where multiple …
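The abstract is cut off before it describes how the ensemble is built, so the following is only a minimal sketch of one common way to combine feature selectors: score every feature with several criteria, convert the scores to ranks, and keep the features with the best average rank. The two scorers, the function names, and the mean-rank aggregation are illustrative assumptions, not the method proposed in the thesis.

```python
from statistics import mean

def rank_features(scores):
    """Convert scores (higher is better) into ranks, where rank 0 is the best feature."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    ranks = [0] * len(scores)
    for position, j in enumerate(order):
        ranks[j] = position
    return ranks

def abs_correlation(values, labels):
    """Hypothetical scorer 1: absolute Pearson correlation with the label."""
    mv, ml = mean(values), mean(labels)
    cov = sum((v - mv) * (l - ml) for v, l in zip(values, labels))
    var_v = sum((v - mv) ** 2 for v in values)
    var_l = sum((l - ml) ** 2 for l in labels)
    if var_v == 0 or var_l == 0:
        return 0.0
    return abs(cov) / (var_v * var_l) ** 0.5

def mean_difference(values, labels):
    """Hypothetical scorer 2: gap between class means (assumes binary 0/1 labels)."""
    pos = [v for v, l in zip(values, labels) if l == 1]
    neg = [v for v, l in zip(values, labels) if l == 0]
    if not pos or not neg:
        return 0.0
    return abs(mean(pos) - mean(neg))

def ensemble_select(columns, labels, k, scorers=(abs_correlation, mean_difference)):
    """Return the indices of the k features with the best average rank across scorers."""
    all_ranks = [
        rank_features([scorer(col, labels) for col in columns])
        for scorer in scorers
    ]
    avg_rank = [mean([ranks[j] for ranks in all_ranks]) for j in range(len(columns))]
    return sorted(range(len(columns)), key=lambda j: avg_rank[j])[:k]

# Toy data: each inner list is one feature column; labels are binary.
columns = [[0, 0, 1, 1], [1, 0, 1, 0], [0.1, 0.2, 0.9, 0.8]]
labels = [0, 0, 1, 1]
selected = ensemble_select(columns, labels, k=2)
```

Averaging ranks rather than raw scores keeps criteria with different scales from dominating one another, which is one reason rank aggregation is a frequent choice for this kind of ensemble.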
Empirical Methods For Predicting Student Retention- A Summary From The Literature, Matt Bogard
Economics Faculty Publications
The vast majority of the literature related to the empirical estimation of retention models includes a discussion of the theoretical retention framework established by Bean, Braxton, Tinto, Pascarella, Terenzini and others (see Bean, 1980; Bean, 2000; Braxton, 2000; Braxton et al., 2004; Chapman and Pascarella, 1983; Pascarella and Terenzini, 1978; St. John and Cabrera, 2000; Tinto, 1975). This body of research provides a starting point for deciding which explanatory variables to include in any model specification, as well as for identifying possible data sources. The literature separates itself into two major camps including research related to the hypothesis testing …
Efficient Schema Extraction From A Collection Of Xml Documents, Vijayeandra Parthepan
Masters Theses & Specialist Projects
The eXtensible Markup Language (XML) has become the standard format for data exchange on the Internet, providing interoperability between different business applications. Such wide use results in large volumes of heterogeneous XML data, i.e., XML documents conforming to different schemas. Although schemas are important in many business applications, they are often missing from XML documents. In this thesis, we present a suite of algorithms that are effective in extracting schema information from a large collection of XML documents. We propose using the cost of NFA simulation to compute the Minimum Description Length (MDL), which is used to rank the inferred schemas. We also studied …
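To make the idea of extracting schema information from schema-less documents concrete, here is a deliberately simple sketch. It does not implement the NFA-based cost or the ranking described in the abstract; it only records, for each element name, which child elements appear and whether they appear in every instance. The function name `infer_child_map` and the "required"/"optional" labels are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def infer_child_map(documents):
    """Summarise which child tags occur under each element tag.

    Returns {tag: {child_tag: "required" | "optional"}}, where "required"
    means the child appeared in every observed instance of the parent tag.
    """
    seen = defaultdict(int)  # tag -> number of instances observed
    # tag -> child tag -> number of parent instances containing that child
    child_counts = defaultdict(lambda: defaultdict(int))

    for text in documents:
        root = ET.fromstring(text)
        for element in root.iter():
            seen[element.tag] += 1
            # Count each distinct child tag once per parent instance.
            for child_tag in {child.tag for child in element}:
                child_counts[element.tag][child_tag] += 1

    return {
        tag: {
            child: ("required" if n == count else "optional")
            for child, n in child_counts[tag].items()
        }
        for tag, count in seen.items()
    }

# Two toy documents that follow slightly different structures.
docs = [
    "<book><title>A</title><author>X</author></book>",
    "<book><title>B</title></book>",
]
schema = infer_child_map(docs)
```

A summary like this ignores child order and repetition, which is exactly the information a regular-expression or automaton-based content model would capture.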
Automatically Extract Information From Web Documents, Dipesh Sharma
Masters Theses & Specialist Projects
The Internet can be considered a reservoir of useful information in textual form: product catalogs, airline schedules, stock market quotations, weather forecasts, etc. There has been much interest in building systems that gather such information on a user's behalf. But because these information resources are formatted differently, mechanically extracting their content is difficult. Systems using such resources typically rely on hand-coded wrappers, which are customized procedures for information extraction. Structured data objects are a very important type of information on the Web. Such data objects are often records from underlying databases, displayed in Web pages using fixed templates. …
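As an illustration of what a hand-coded wrapper looks like, the sketch below extracts records from a page that renders database rows through one fixed template. The HTML template, the pattern, and the function name `extract_records` are all hypothetical; they are not taken from the thesis.

```python
import re

# Hand-written pattern tied to one hypothetical page template.
ROW_PATTERN = re.compile(
    r'<tr><td class="name">(?P<name>[^<]+)</td>'
    r'<td class="price">\$(?P<price>[0-9.]+)</td></tr>'
)

def extract_records(page):
    """Return one dict per table row that matches the fixed template."""
    return [
        {"name": m.group("name").strip(), "price": float(m.group("price"))}
        for m in ROW_PATTERN.finditer(page)
    ]

page = (
    "<table>"
    '<tr><td class="name">Widget</td><td class="price">$9.50</td></tr>'
    '<tr><td class="name">Gadget</td><td class="price">$12.00</td></tr>'
    "</table>"
)
records = extract_records(page)
```

The wrapper works only while the site keeps this exact markup; any change to the template requires rewriting the pattern by hand, which is the maintenance cost that motivates automatic extraction.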