Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

1,728 Full-Text Articles 3,525 Authors 509,760 Downloads 199 Institutions

All Articles in Data Science

1,728 full-text articles. Page 85 of 85.

Hyperspectral Image Classification Using A Spectral-Spatial Sparse Coding Model, Ender Oguslu, Guoqing Zhou, Jiang Li, Lorenzo Bruzzone (Ed.) 2013 Old Dominion University

Electrical & Computer Engineering Faculty Publications

We present a sparse-coding-based spectral-spatial classification model for hyperspectral image (HSI) datasets. The proposed method consists of an efficient sparse coding method in which the l1/lq-regularized multi-class logistic regression technique is used to achieve a compact representation of hyperspectral image pixels for land cover classification. We applied the proposed algorithm to an HSI dataset collected at the Kennedy Space Center and compared it to a recently proposed method, the Gaussian process maximum likelihood (GP-ML) classifier. Experimental results show that the proposed method can achieve significantly better performance than the GP-ML classifier when training data …


Why Police Learn From Third-Party Data, Randall K. Johnson 2013 University of Missouri - Kansas City, School of Law

Faculty Works

This essay argues that third-party data collection, particularly of administrative complaints and departmental audit information, holds greater promise than lawsuit data collection. It does so by asserting that third-party data collection is more useful for three reasons. First, third-party data collection prevents manipulation by individual police officers and law enforcement agencies. Second, it assures that police behavioral trends are actually identified. Lastly, third-party data collection helps to deter published § 1983 cases. The essay, however, only models and tests the final claim.


Divad: A Dynamic And Interactive Visual Analytical Dashboard For Exploring And Analyzing Transport Data, Tin Seong KAM, Ketan BARSHIKAR, Shaun Jun Hua TAN 2012 Singapore Management University

Research Collection School Of Computing and Information Systems

Advances in location-based data collection technologies such as GPS and RFID, and the rapid reduction of their costs, provide us with a huge and continuously increasing amount of data about the movement of vehicles, people and goods in an urban area. This explosive growth of geospatially-referenced data has far outpaced the planner’s ability to utilize and transform the data into insightful information, thus creating an adverse impact on the return on the investment made to collect and manage this data. Addressing this pressing need, we designed and developed DIVAD, a dynamic and interactive visual analytics dashboard to allow city planners …


Maximizing Network Lifetime On The Line With Adjustable Sensing Ranges, Amotz Bar-Noy, Ben Baumer 2012 The Doctorate-Granting Institution of the City University of New York

Statistical and Data Sciences: Faculty Publications

Given n sensors on a line, each of which is equipped with a unit battery charge and an adjustable sensing radius, what schedule will maximize the lifetime of a network that covers the entire line? Trivially, any reasonable algorithm is at least a 1/2-approximation, but we prove tighter bounds for several natural algorithms. We focus on developing a linear time algorithm that maximizes the expected lifetime under a random uniform model of sensor distribution. We demonstrate one such algorithm that achieves an average-case approximation ratio of almost 0.9. Most of the algorithms that we consider come from a family based …


Parsing The Relationship Between Baserunning And Batting Abilities Within Lineups, Ben S. Baumer, James Piette, Brad Null 2012 The Doctorate-Granting Institution of the City University of New York

Statistical and Data Sciences: Faculty Publications

A baseball team's offensive prowess is a function of two types of abilities: batting and baserunning. While each has been studied extensively in isolation, the effects of their interaction are not well understood. We model offensive output as a scalar function f of an individual player's batting and baserunning profile z. Each of these profiles is in turn estimated from Retrosheet data using hierarchical Bayesian models. We then use the SimulOutCome simulation engine as a method to generate values of f(z) over a fine grid of points. Finally, for each of several methods of taking the extra base, we graphically …


Age Composition And Distribution Of Red Drum (Sciaenops Ocellatus) In Offshore Waters Of The North Central Gulf Of Mexico: An Evaluation Of A Stock Under A Federal Harvest Moratorium, Sean P. Powers, Crystal Hightower, J. Marcus Drymon, Matthew W. Johnson 2012 University of South Alabama

University Faculty and Staff Publications

Because of a lack of fishery-dependent data, assessment of the recovery of fish stocks that undergo the most aggressive form of management, namely harvest moratoriums, remains a challenge. Large schools of red drum (Sciaenops ocellatus) were common along the northern Gulf of Mexico until the late 1980s, when increased fishing effort quickly depleted the stock. After 24 years of harvest moratorium on red drum in federal waters, the stock is in need of reassessment; however, fishery-dependent data are not available in federal waters and fishery-independent data are limited. We document the distribution, age composition, growth, and condition of …


Generating A Close-To-Reality Synthetic Population Of Ghana, Tyler Frazier, Andreas Alfons 2012 William & Mary

Arts & Sciences Articles

The purpose of this research is to generate a close-to-reality synthetic human population for use in a geosimulation of urban dynamics. Two commonly accepted approaches to generating synthetic human populations are Iterative Proportional Fitting (IPF) and Resampling with Replacement. While these methods are effective at reproducing one instance of the probability model describing the survey, it is an instance with extremely small variability amongst subgroups and is very unlikely to be the real population. IPF and Resampling with Replacement also rely on pure replication of units from the underlying sample which can increase unrealistic model behavior. In this work we …


Real-Time Anomaly Detection In Full Motion Video, Glenn Konowicz, Jiang Li, Donnie Self (Ed.) 2012 Old Dominion University

Electrical & Computer Engineering Faculty Publications

Improvements in sensor technology such as charge-coupled devices (CCDs), together with steady incremental improvements in storage capacity, have made the recording and storage of video more prevalent and less costly than ever before. However, the improvements in the ability to capture and store a wide array of video have required additional manpower to translate these raw data sources into useful information. We propose an algorithm for automatically detecting anomalous movement patterns within full motion video, thus reducing the amount of human intervention required to make use of these new data sources. The proposed algorithm tracks all of the objects …


Model Individualization For Real-Time Operator Functional State Assessment, Guangfan Zhang, Roger Xu, Wei Wang, Aaron A. Pepe, Feng Li, Jiang Li, Frederick McKenzie, Tom Schnell, Nick Anderson, Dean Heitkamp 2012 Intelligent Automation, Inc.

Electrical & Computer Engineering Faculty Publications

Proper assessment of Operator Functional State (OFS) and appropriate workload modulation offer the potential to improve mission effectiveness and aviation safety in both overload and under-load conditions. Although a wide range of research has been devoted to building OFS assessment models, most of the models are based on group statistics and little or no research has been directed towards model individualization, i.e., tuning the group statistics based model for individual pilots. Moreover, little emphasis has been placed on monitoring whether the pilot is disengaged during low workload conditions. The primary focus of this research is to provide a real-time engagement …


Networks - Ii: A Survey Of Data Management Issues & Frameworks For Mobile Ad Hoc Networks, Noman Islam, Zubair A. Shaikh 2011 National University of Computer and Emerging Sciences, Karachi, Pakistan

International Conference on Information and Communication Technologies

Data management is the execution of a pool of activities on a set of data to satisfy the end user's data requirements. MANET is an emerging discipline of computer networks in which a group of roaming hosts spontaneously establishes the network among themselves. The employment of data management in MANET can engender a number of useful applications. However, data management in MANET is a taxing job as it requires deliberation on a number of research issues (e.g., knowledge representation, knowledge discovery, caching, and security). This paper provides a detailed account of the data management problem and its issues, …


Mapsnap System To Perform Vector-To-Raster Fusion, Boris Kovalerchuk, Peter Doucette, Gamal Seedahmed, Jerry Tagestad, Sergei Kovalerchuk, Brian Graff 2011 Central Washington University

All Faculty Scholarship for the College of the Sciences

As the availability of geospatial data increases, there is a growing need to match these datasets together. However, since these datasets often vary in their origins and spatial accuracy, they frequently do not correspond well to each other, which creates multiple problems. To accurately align vectors with imagery, analysts currently either: 1) manually move the vectors, 2) perform a labor-intensive spatial registration of vectors to imagery, 3) move imagery to vectors, or 4) redigitize the vectors from scratch and transfer the attributes. All of these are time-consuming and labor-intensive operations. Automated matching and fusing of vector datasets has been a subject …


Semi-Automatic Management Of Knowledge Bases Using Formal Ontologies, Andreas Textor 2011 Department of Computing, Cork Institute of Technology, Cork, Ireland.

Theses

This thesis presents an approach that deals with the ever-growing amount of data in knowledge bases, especially concerning knowledge interoperability and formal representation of domain knowledge. There are multiple issues that must be addressed with current systems. A multitude of different formats, sources and tools exist in a domain, and it is desirable to develop their use further towards a standardised environment. Such an environment should support both the representation and processing of data from this domain, and the connection to other domains, where necessary. In order to manage large amounts of data, it should be possible to perform whatever …


Extreme Data Mining: Inference From Small Datasets, Răzvan Andonie 2010 Central Washington University

All Faculty Scholarship for the College of the Sciences

Neural networks have been applied successfully in many fields. However, satisfactory results can only be obtained under large-sample conditions. With small training sets, performance may be poor, or the learning task may not be accomplished at all. This deficiency severely limits the applications of neural networks. The main reason why small datasets cannot provide enough information is that gaps exist between samples, and even the domain of the samples cannot be ensured. Several computational intelligence techniques have been proposed to overcome the limits of learning from small datasets.

We have the following goals: i. To …


Agnostic Science. Towards A Philosophy Of Data Analysis, Domenico Napoletani, Marco Panza, Daniele C. Struppa 2010 George Mason University

MPP Published Research

In this paper we will offer a few examples to illustrate the orientation of contemporary research in data analysis and we will investigate the corresponding role of mathematics. We argue that the modus operandi of data analysis is implicitly based on the belief that if we have collected enough and sufficiently diverse data, we will be able to answer most relevant questions concerning the phenomenon itself. This is a methodological paradigm strongly related, but not limited to, biology, and we label it the microarray paradigm. In this new framework, mathematics provides powerful techniques and general ideas which generate new …

