Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 12 of 12

Full-Text Articles in Physical Sciences and Mathematics

Data Mining Revision Controlled Document History Metadata For Automatic Classification, Dustin Maass Dec 2013

Theses and Dissertations

Version-controlled documents provide a complete history of the changes made to them, recording what was changed, who made each change, and more. Using cluster analysis and several sets of manipulated data, this research examines the revision history of Wikipedia in an attempt to find language-independent patterns that could assist automatic page classification software. Applying this cluster analysis to two sample data sets, we found no conclusive evidence that such patterns exist. Our work on the software, however, does provide a foundation for more possible types of …


Analysis Of Polarimetric Synthetic Aperture Radar And Passive Visible Light Polarimetric Imaging Data Fusion For Remote Sensing Applications, Sanjit Maitra Dec 2013

Theses

The recent launch of spaceborne (TerraSAR-X, RADARSAT-2, ALOS-PALSAR, RISAT) and airborne (SIRC, AIRSAR, UAVSAR, PISAR) polarimetric radar sensors, which can image day and night in almost all weather conditions, has made polarimetric synthetic aperture radar (PolSAR) image interpretation and analysis an active area of research. PolSAR image classification is sensitive to object orientation and scattering properties. In recent years, significant work has been done in many areas, including agriculture, forestry, oceanography, geology, and terrain analysis. Visible light passive polarimetric imaging has also emerged as a powerful tool in remote sensing for enhanced information extraction. The intensity image provides information …


Extraction And Classification Of Drug-Drug Interaction From Biomedical Text Using A Two-Stage Classifier, Majid Rastegar-Mojarad Dec 2013

Theses and Dissertations

One of the critical causes of medical errors is drug-drug interaction (DDI), which occurs when one drug increases or decreases the effect of another drug. We propose a machine learning system to extract and classify drug-drug interactions from the biomedical literature, using the annotated corpus from the DDIExtraction-2013 shared task challenge. Our approach applies a two-stage classifier to handle the highly unbalanced class distribution in the corpus. The first stage performs binary classification of drug pairs as interacting or non-interacting, and the second stage further classifies interacting pairs into one of four interaction types: advise, effect, …
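
The cascade structure of the two-stage classifier can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's system: each drug pair is represented only by its sentence text, and `toy_stage1` and `toy_stage2` are hypothetical keyword rules standing in for the trained first- and second-stage models. Only the two interaction types named in the abstract (advise, effect) appear.

```python
from typing import Callable, Sequence

# Hypothetical signatures: a drug pair is represented here only by its sentence text.
BinaryClassifier = Callable[[str], bool]  # stage 1: interacting vs. non-interacting
TypeClassifier = Callable[[str], str]     # stage 2: interaction type label

NEGATIVE = "none"  # label for pairs rejected by stage 1


def two_stage_classify(pairs: Sequence[str],
                       stage1: BinaryClassifier,
                       stage2: TypeClassifier) -> list[str]:
    """Run stage 2 only on the pairs that stage 1 predicts as interacting."""
    labels = []
    for pair in pairs:
        labels.append(stage2(pair) if stage1(pair) else NEGATIVE)
    return labels


# Toy keyword rules standing in for trained models (hypothetical, illustration only).
def toy_stage1(sentence: str) -> bool:
    return "increase" in sentence or "avoid" in sentence


def toy_stage2(sentence: str) -> str:
    return "advise" if "avoid" in sentence else "effect"


if __name__ == "__main__":
    sentences = [
        "Drug A may increase the sedative effect of drug B.",
        "Patients should avoid combining drug C with drug D.",
        "Drug E and drug F were both administered.",
    ]
    print(two_stage_classify(sentences, toy_stage1, toy_stage2))  # prints ['effect', 'advise', 'none']
```

Because the second stage only sees pairs the first stage accepts, it is trained and applied on a much less skewed label distribution, which is the imbalance argument the abstract gives for splitting the task.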


Regularization Methods For Predicting An Ordinal Response Using Longitudinal High-Dimensional Genomic Data, Jiayi Hou Nov 2013

Theses and Dissertations

Ordinal scales are commonly used to measure health status and disease-related outcomes in hospital settings as well as in translational medical research. Notable examples include cancer staging, a five-category ordinal scale indicating tumor size, node involvement, and likelihood of metastasizing, and the Glasgow Coma Scale (GCS), an ordinal measure that gives a reliable and objective assessment of a patient's level of consciousness. In addition, repeated measurements are common in clinical practice for tracking and monitoring the progression of complex diseases. Classical ordinal modeling methods based on the likelihood approach have contributed to the analysis of data in …
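
One classical likelihood-based model for an ordinal response is the cumulative logit (proportional odds) model, P(Y <= k | x) = sigmoid(theta_k - x.beta). The sketch below is an illustration of that standard model, not the regularized longitudinal method of the thesis: it turns ordered cutpoints theta and a linear predictor eta = x.beta into category probabilities. The cutpoints and predictor values are invented; four cutpoints give five categories, matching the cancer-staging example.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cumulative_logit_probs(eta: float, cutpoints: list[float]) -> list[float]:
    """Category probabilities under P(Y <= k) = sigmoid(theta_k - eta).

    eta is the linear predictor x . beta; cutpoints must be strictly increasing.
    Returns len(cutpoints) + 1 probabilities that sum to 1.
    """
    if any(a >= b for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    # Cumulative probabilities P(Y <= k); the last category closes at 1.
    cum = [sigmoid(t - eta) for t in cutpoints] + [1.0]
    # Successive differences give P(Y = k).
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


if __name__ == "__main__":
    # Hypothetical five-category outcome (four cutpoints) and linear predictor.
    p = cumulative_logit_probs(eta=0.5, cutpoints=[-2.0, -0.5, 1.0, 2.5])
    print([round(v, 3) for v in p], round(sum(p), 6))
```

Raising `eta` lowers every cumulative probability and so shifts mass toward the higher categories. A regularized fit would typically add a penalty on beta to the log-likelihood built from these probabilities.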


Assessment And Prediction Of Cardiovascular Status During Cardiac Arrest Through Machine Learning And Dynamical Time-Series Analysis, Sharad Shandilya Jul 2013

Theses and Dissertations

In this work, new methods of feature extraction, feature selection, stochastic data characterization/modeling, variance reduction, and measures for parametric discrimination are proposed. These methods have implications for data mining, machine learning, and information theory. A novel decision-support system is developed to guide intervention during cardiac arrest. The models are built upon knowledge extracted with signal-processing, non-linear dynamic, and machine-learning methods. The proposed ECG characterization, combined with information extracted from PetCO2 signals, shows viability for decision support in clinical settings. The approach, which focuses on integrating multiple features through machine learning techniques, is well suited to the inclusion of multiple physiologic …


Geometric Approach To Support Vector Machines Learning For Large Datasets, Robert Strack May 2013

Theses and Dissertations

The dissertation introduces Sphere Support Vector Machines (SphereSVM) and Minimal Norm Support Vector Machines (MNSVM) as new, fast classification algorithms that use geometric properties of the underlying classification problems to efficiently obtain models describing the training data. SphereSVM combines the minimal enclosing ball approach, state-of-the-art nearest point problem solvers, and probabilistic techniques. Together, these significantly speed up the SVM training phase while reaching practically the same accuracy as other classification models on several large real-world data sets, within the strict validation frame of a double (nested) …
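
The minimal enclosing ball (MEB) idea can be illustrated with the classic Bădoiu-Clarkson core-set iteration, which repeatedly moves the center toward the farthest point. This is a generic sketch, not SphereSVM itself: it works on raw points rather than in a kernel-induced feature space, omits the nearest point solvers and probabilistic techniques, and the default `epsilon` is an arbitrary assumption.

```python
import math
from typing import Sequence

Point = Sequence[float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def approx_meb(points: Sequence[Point], epsilon: float = 0.05) -> tuple[list[float], float]:
    """(1 + epsilon)-approximate minimal enclosing ball via the Badoiu-Clarkson iteration.

    Start at any input point; at step i move the center a fraction 1/(i + 1) of the way
    toward the current farthest point. After ceil(1/epsilon^2) steps the center lies within
    epsilon * R* of the optimal center, so the returned radius is at most (1 + epsilon) * R*.
    """
    if not points:
        raise ValueError("need at least one point")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    center = [float(v) for v in points[0]]
    for i in range(1, math.ceil(1.0 / epsilon ** 2) + 1):
        far = max(points, key=lambda p: dist(center, p))  # farthest point from current center
        center = [c + (f - c) / (i + 1) for c, f in zip(center, far)]
    # Smallest radius, about this center, that encloses every point.
    radius = max(dist(center, p) for p in points)
    return center, radius


if __name__ == "__main__":
    square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
    c, r = approx_meb(square)
    print([round(v, 3) for v in c], round(r, 4))  # optimum: center (1, 1), radius sqrt(2)
```

The appeal for large data sets is that each step only needs a farthest-point query, and the number of steps depends on `epsilon` rather than on the number of points.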


Integrative Biomarker Identification And Classification Using High Throughput Assays, Pan Tong May 2013

Dissertations & Theses (Open Access)

It is well accepted that tumorigenesis is a multi-step process involving aberrant functioning of genes regulating cell proliferation, differentiation, apoptosis, genome stability, angiogenesis, and motility. To obtain a full understanding of tumorigenesis, it is necessary to collect information on all aspects of cell activity. Recent advances in high-throughput technologies allow biologists to generate massive amounts of data, more than might have been imagined decades ago. These advances have made it possible to launch comprehensive projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), which systematically characterize the molecular fingerprints of cancer cells using gene expression, methylation, copy number, microRNA and SNP microarrays …


Classification Of Satellite Time Series-Derived Land Surface Phenology Focused On The Northern Fertile Crescent, Brian Embree Bunker May 2013

Graduate Theses and Dissertations

Land surface phenology describes events in a seasonal vegetation cycle and can be used in a variety of applications, from predicting the onset of future drought conditions, to revealing potential limits of historical dry farming, to guiding more accurate dating of archeological sites. Traditional methods of monitoring vegetation phenology use data collected in situ. However, vegetation health indices derived from satellite remote sensor data, such as the normalized difference vegetation index (NDVI), have been used as a proxy for vegetation phenology because of their repeated acquisition and broad area coverage. Land surface phenology is accessible in the NDVI satellite record when images …
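
NDVI itself is a simple band ratio, NDVI = (NIR - Red) / (NIR + Red), computed from near-infrared and red reflectance. The sketch below computes it for a short per-pixel time series and takes the date of maximum NDVI as a crude peak-of-season marker. The reflectance values are invented for illustration, and the peak rule is a simplification, not the classification method developed in the thesis.

```python
from typing import Sequence


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index for one pixel: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + Red must be non-zero")
    return (nir - red) / total


def ndvi_series(nir: Sequence[float], red: Sequence[float]) -> list[float]:
    """NDVI for each acquisition date of one pixel's time series."""
    if len(nir) != len(red):
        raise ValueError("band series must have the same length")
    return [ndvi(n, r) for n, r in zip(nir, red)]


def peak_index(series: Sequence[float]) -> int:
    """Index of the maximum NDVI value (a crude 'peak of season' marker)."""
    return max(range(len(series)), key=lambda i: series[i])


if __name__ == "__main__":
    # Hypothetical surface reflectances for one pixel at six dates in a growing season.
    nir = [0.25, 0.32, 0.45, 0.50, 0.38, 0.27]
    red = [0.15, 0.12, 0.08, 0.06, 0.10, 0.14]
    values = ndvi_series(nir, red)
    print([round(v, 2) for v in values], "peak at date", peak_index(values))
```

Healthy green vegetation reflects strongly in the near infrared and absorbs red light, so NDVI rises through green-up and falls at senescence; that seasonal curve is what makes the index usable as a phenology proxy.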


Enhancement Of Random Forests Using Trees With Oblique Splits, Andrejus Parfionovas May 2013

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Statistical classification is widely used in many areas where there is a need to make a data-driven decision, or to classify complicated cases or objects. For instance: disease diagnostics (is a patient sick or healthy, based on the blood test results?); weather forecasting (will there be a storm tomorrow, based on today's atmospheric pressure, air temperature, and wind velocity?); speech recognition (what was said over the phone, based on the caller's voice level and articulation?); spam detection (can unsolicited commercial e-mails be identified by their content?); and so on.

Classification trees …


Context Aware Privacy Preserving Clustering And Classification, Nirmal Thapa Jan 2013

Theses and Dissertations--Computer Science

Data are valuable assets to any organization or individual. Data are sources of useful information, which is a big part of decision making. All sectors have the potential to benefit from information; commerce, health, and research are some of the fields that have benefited from data. On the other hand, the availability of data makes it easy for anyone to exploit them, and in many cases the data are private and confidential. It is therefore necessary to preserve the confidentiality of the data. We study two categories of privacy: Data Value Hiding and Data Pattern Hiding. Privacy is a huge concern …


A Convex Optimization Algorithm For Sparse Representation And Applications In Classification Problems, Reinaldo Sanchez Arias Jan 2013

Open Access Theses & Dissertations

In pattern recognition and machine learning, a classification problem refers to finding an algorithm for assigning given input data to one of several categories. Many natural signals are sparse or compressible in the sense that they have short representations when expressed in a suitable basis. Motivated by the recent successful development of algorithms for sparse signal recovery, we apply the selective nature of sparse representation to perform classification. Any test sample is represented in an overcomplete dictionary with the training samples as base elements. A given test sample can be expressed as a linear combination of only those training …
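
The general sparse-representation classification idea can be sketched as follows: represent the test sample over a dictionary whose columns are training samples, then assign the class whose coefficients alone reconstruct it best (the common class-residual rule). The l1-regularized problem, min 0.5*||A x - y||^2 + lam*||x||_1, is solved here with plain ISTA (iterative soft-thresholding), a standard convex solver swapped in for illustration; it is not the algorithm developed in the thesis, and `lam`, `iters`, and the toy data are arbitrary assumptions.

```python
import math
from typing import Sequence


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def normalize(v: Sequence[float]) -> list[float]:
    """Scale a vector to unit Euclidean norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v] if n > 0 else list(v)


def ista(columns: list[list[float]], y: Sequence[float], lam: float, iters: int = 500) -> list[float]:
    """Minimize 0.5*||A x - y||^2 + lam*||x||_1, with A given column-wise.

    Step size 1/||A||_F^2 is a safe choice: the Frobenius norm bounds the spectral norm.
    """
    step = 1.0 / sum(a * a for col in columns for a in col)
    x = [0.0] * len(columns)
    for _ in range(iters):
        # Residual r = A x - y.
        r = [-yi for yi in y]
        for j, col in enumerate(columns):
            if x[j] != 0.0:
                r = [ri + x[j] * ci for ri, ci in zip(r, col)]
        # Gradient step A^T r followed by soft-thresholding.
        x = [soft_threshold(x[j] - step * sum(c * ri for c, ri in zip(col, r)), step * lam)
             for j, col in enumerate(columns)]
    return x


def src_classify(train: Sequence[tuple[Sequence[float], str]], y: Sequence[float], lam: float = 0.01) -> str:
    """Pick the class whose coefficients alone give the smallest reconstruction residual."""
    columns = [normalize(v) for v, _ in train]
    labels = [lab for _, lab in train]
    y = normalize(y)
    x = ista(columns, y, lam)
    best_label, best_res = labels[0], math.inf
    for c in sorted(set(labels)):
        recon = [0.0] * len(y)
        for j, col in enumerate(columns):
            if labels[j] == c:  # keep only this class's coefficients
                recon = [ri + x[j] * ci for ri, ci in zip(recon, col)]
        res = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, recon)))
        if res < best_res:
            best_label, best_res = c, res
    return best_label


if __name__ == "__main__":
    # Hypothetical toy training set: two classes in three dimensions.
    train = [([1.0, 0.0, 0.0], "a"), ([0.9, 0.1, 0.0], "a"),
             ([0.0, 1.0, 0.0], "b"), ([0.1, 0.9, 0.05], "b")]
    print(src_classify(train, [0.95, 0.05, 0.0]), src_classify(train, [0.05, 1.0, 0.0]))
```

The l1 penalty is what makes the representation selective: most coefficients are driven exactly to zero, so the nonzero ones tend to concentrate on training samples from the test sample's own class.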


Instrument And Method Development For Single-Cell Classification Using Fluorescence Imaging Multivariate Optical Computing, Joseph Swanstrom Jan 2013

Theses and Dissertations

Multivariate optical computing (MOC) is an all-optical approach to predictive spectroscopy that uses multivariate calibration and spectral pattern recognition techniques while operating in a simple filter photometer instrument, removing the need for expensive instrumentation and post-processing of spectral data. This is accomplished with specially designed interference filters called multivariate optical elements (MOEs). MOC can provide analytical solutions for applications requiring low-cost, rugged, and simple-to-operate instrumentation for use in remote and hazardous environments such as open ocean waters. These instrument specifications are central to developing a method for classifying phytoplankton in their natural environment. Phytoplankton are photosynthetic single …