Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

University of Nebraska - Lincoln

Articles 31 - 50 of 50

Full-Text Articles in Databases and Information Systems

Suddenly...I'm Consulting On Data Management Plans! Data Management Plan Consultant Checklist, Kiyomi D. Deards Oct 2013

University of Nebraska-Lincoln Libraries: Conference Presentations and Speeches

This webinar will outline the most important questions to ask, and the best resources available, for those who "suddenly" will be consulting on data management plans.


Segmenting Tables Via Indexing Of Value Cells By Table Headers, Sharad C. Seth, George Nagy Aug 2013

CSE Conference and Workshop Papers

Correct segmentation of a web table into its component regions is the essential first step to understanding tabular data. Our algorithmic solution to the segmentation problem relies on the property that strings defining row and column header paths uniquely index each data cell in the table. We segment the table using only “logical layout analysis” without resorting to any appearance features or natural language understanding. We start with a CSV table that preserves the 2-dimensional structure and contents of the original source table (e.g., an HTML table) but not font size, font weight, and color. The indexing property of …
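The indexing property mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the header regions have already been found, and the function name `index_cells` and the toy table are hypothetical. It only checks that each pair of row and column header paths points to exactly one data cell.

```python
from typing import Dict, List, Tuple

HeaderPath = Tuple[str, ...]


def index_cells(
    row_headers: List[HeaderPath],
    col_headers: List[HeaderPath],
    values: List[List[str]],
) -> Dict[Tuple[HeaderPath, HeaderPath], str]:
    """Map each (row header path, column header path) pair to its data cell.

    Raises ValueError if two cells share the same pair of header paths,
    meaning the candidate segmentation violates the indexing property.
    """
    index: Dict[Tuple[HeaderPath, HeaderPath], str] = {}
    for r, row_path in enumerate(row_headers):
        for c, col_path in enumerate(col_headers):
            key = (row_path, col_path)
            if key in index:
                raise ValueError(f"header paths do not uniquely index a cell: {key}")
            index[key] = values[r][c]
    return index


# Hypothetical toy table: one-level row headers, two-level column headers.
row_headers = [("2012",), ("2013",)]
col_headers = [("Sales", "Q1"), ("Sales", "Q2")]
values = [["10", "12"], ["11", "15"]]

index = index_cells(row_headers, col_headers, values)
print(index[(("2013",), ("Sales", "Q2"))])
```

A candidate segmentation whose header paths repeat (for example, two rows both labeled "2012") fails the check, which is one way such a property could be used to reject a wrong split between header and data regions.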


Data Mining The Functional Characterizations Of Proteins To Predict Their Cancer-Relatedness, Peter Revesz, Christopher Assi Feb 2013

School of Computing: Faculty Publications

This paper considers two types of protein data. The first is data about protein function, described in a number of ways, such as GO terms and PFAM families. The second is data about whether individual proteins are experimentally associated with cancer through an anomalous elevation or lowering of their expression within cancerous cells. We combine these two types of protein data and test whether the first type of data, that is, the functional descriptors, can predict the second type of data, that is, cancer-relatedness. By using data mining and machine learning, we derive a classifier algorithm that using only GO term and PFAM family …


Biodiversity Heritage Library, Smithsonian Institution Libraries, Deanna Marcum Jan 2013

Copyright, Fair Use, Scholarly Communication, etc.

The Biodiversity Heritage Library (BHL), created in 2006, is the result of a collaboration of ten natural history museum and botanical garden libraries seeking to digitize core taxonomic literature and to make it free and openly available throughout the world. Today, the BHL includes fifteen member institutions whose efforts have shaped a collection of over 60,000 titles. It is supported through a combination of membership dues, in-kind support from member institutions, contributions from the user community, and direct support from the Smithsonian Institution Libraries, and it reaches tens of thousands of users each year. While managing the complex partnership has …


Data Mining Of Pancreatic Cancer Protein Databases, Peter Revesz, Christopher Assi Dec 2012

CSE Conference and Workshop Papers

Data mining of protein databases poses special challenges because many protein databases are non-relational whereas most data mining and machine learning algorithms assume the input data to be a type of relational database that is also representable as an ARFF file. We developed a method to restructure protein databases so that they become amenable to various data mining and machine learning tools. Our restructuring method enabled us to apply both decision tree and support vector machine classifiers to a pancreatic protein database. The SVM classifier that used both GO term and PFAM families to characterize proteins gave …


Data Mining Of Protein Databases, Christopher Assi Jul 2012

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Data mining of protein databases poses special challenges because many protein databases are non-relational whereas most data mining and machine learning algorithms assume the input data to be a relational database. Protein databases are non-relational mainly because they often contain set data types. We developed new data mining algorithms that can restructure non-relational protein databases so that they become relational and amenable to various data mining and machine learning tools. We applied the new restructuring algorithms to a pancreatic protein database. After the restructuring, we also applied two classification methods, decision tree and SVM classifiers, and compared their …
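One common way to make a set-valued attribute relational is to give every distinct set member its own 0/1 column. The sketch below illustrates that general idea only. It is not the restructuring algorithm from the thesis, and the function name `flatten_set_attribute`, the protein identifiers, and the descriptor labels are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def flatten_set_attribute(
    records: Dict[str, Set[str]],
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Turn a set-valued attribute into one 0/1 column per distinct value."""
    # Collect every value that occurs in any record; sort for a stable column order.
    columns = sorted(set().union(*records.values()))
    rows = {
        key: [1 if value in values else 0 for value in columns]
        for key, values in records.items()
    }
    return columns, rows


# Hypothetical records: protein identifiers mapped to sets of descriptor labels.
proteins = {
    "protein_1": {"term_a", "term_c"},
    "protein_2": {"term_b"},
}

columns, rows = flatten_set_attribute(proteins)
print(columns)
for protein_id, row in rows.items():
    print(protein_id, row)
```

Each output row has the same fixed width, so it could be written out as one line of a flat table and passed to tools that expect a fixed set of attributes.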


An Approach To The Virtual Flora Of Mongolia – From A Data Repository To An Expert System, Http://Greif.Uni-Greifswald.De/Floragreif/, Jörg Hartleib, Martin Schnittler, Sabrina Rilke, Anne Zemmrich, Bernd Bobertz, Ulrike Najmi, Reinhard Zölitz, Susanne Starke Jan 2012

Erforschung biologischer Ressourcen der Mongolei / Exploration into the Biological Resources of Mongolia, ISSN 0440-1298

FloraGREIF is an internet-accessible information system providing taxonomic, phytogeographic and ecological information on Mongolia’s flora in terms of descriptions, high-resolution plant images and an interactive WebGIS application. Organised along an updated checklist of the approx. 3000 Mongolian vascular plants that serves as a taxonomic backbone, information is split into the taxon level, referring to plant species, and the record level, referring to a record or a collected plant specimen. At the latter level, images of living plants, scans of herbarium sheets, habitat photos and further notes can be found. Both data levels are linked by the name of the respective …


A Study Of Correlations Between The Definition And Application Of The Gene Ontology, Yuji Mo Dec 2011

Computer and Electronics Engineering: Dissertations, Theses, and Student Research

When using the Gene Ontology (GO), nucleotide and amino acid sequences are annotated by terms in a structured and controlled vocabulary organized into relational graphs. The usage of the vocabulary (GO terms) in the annotation of these sequences may diverge from the relations defined in the ontology. We measure the consistency of the use of GO terms by comparing GO's defined structure to the terms' application. To do this, we first use synthetic data with different characteristics to understand how these characteristics influence the correlation values determined by various similarity measures. Using these results as a baseline, we found that …


Propeller: A Scalable Metadata Organization For A Versatile Searchable File System, Lei Xu, Hong Jiang, Xue Liu, Lei Tian, Yu Hua, Jian Hu Mar 2011

CSE Technical Reports

The exponentially increasing amount of data in file systems has made it increasingly important for users, administrators and applications to be able to quickly retrieve files using file-search services, instead of relying on the standard file system API to traverse the hierarchical namespaces. The quality of the file-search services is significantly affected by the file-indexing overhead, the file-search performance and the accuracy of search results. Unfortunately, the existing file-search solutions either are so poorly scalable that their performance degrades unacceptably when the systems scale up, or incur such long crawling delays that they produce unacceptably inaccurate results. We believe that …


Rapport: Semantic-Sensitive Namespace Management In Large-Scale File Systems, Yu Hua, Hong Jiang, Yifeng Zhu, Dan Feng Apr 2010

CSE Technical Reports

Explosive growth in the volume and complexity of data exacerbates the key challenge of managing data effectively and efficiently in a way that fundamentally improves the ease and efficacy of their use. Existing large-scale file systems rely on a hierarchically structured namespace that leads to severe performance bottlenecks and renders it impossible to support real-time queries on multi-dimensional attributes. This paper proposes a novel semantic-sensitive scheme, called Rapport, to provide dynamic and adaptive namespace management and support complex queries. The basic idea is to build files’ namespace by utilizing their semantic correlation and exploiting dynamic evolution of attributes to support namespace management. …


GenBank, Dennis A. Benson, Ilene Karasch-Mizrachi, David J. Lipman, James Ostell, Eric W. Sayers Jan 2010

Harold W. Manter Laboratory: Library Materials

GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 380,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system that integrates data …


Temporal Data Classification Using Linear Classifiers, Peter Revesz, Thomas Triplet Sep 2009

CSE Conference and Workshop Papers

Data classification is usually based on measurements recorded at the same time. This paper considers temporal data classification where the input is a temporal database that describes measurements over a period of time in history while the predicted class is expected to occur in the future. We describe a new temporal classification method that improves the accuracy of standard classification methods. The benefits of the method are tested on weather forecasting using the meteorological database from the Texas Commission on Environmental Quality.


Using GIS To Locate Areas For Growing Quality Coffee In Honduras, Ellen Mickle Apr 2009

Department of Environmental Studies: Undergraduate Student Theses

Small-scale coffee producers worldwide remain vulnerable to price fluctuations after the 1999-2003 coffee crisis. One way to increase small-scale farmer economic resilience is to produce a more expensive product, such as quality coffee. There is growing demand in coffee-producing and coffee-importing countries for user-friendly tools that facilitate the marketing of quality coffee. The purpose of this study is to develop a prototypical quality coffee marketing tool in the form of a GIS model that identifies regions for producing quality coffee in a country not usually associated with quality coffee, Honduras. Maps of areas for growing quality coffee were produced …


Classification And Cluster Analysis Of Complex Time-Of-Flight Secondary Ion Mass Spectrometry For Biological Samples, Stephen E. Reichenbach, Xue Tian, Qingping Tao, Alex Henderson Jan 2009

CSE Conference and Workshop Papers

Identifying and separating subtly different biological samples is one of the most critical tasks in biological analysis. Time-of-flight secondary ion mass spectrometry (ToF-SIMS) is becoming a popular and important technique in the analysis of biological samples, because it can detect molecular information and characterize chemical composition. ToF-SIMS spectra of biological samples are enormously complex, with large mass ranges and many peaks. As a result, classification and cluster analysis are challenging. This study presents a new classification algorithm, the most similar neighbor with a probability-based spectrum similarity measure (MSN-PSSM), which uses all the information in the entire ToF-SIMS …


Getting Started With pRPL, Qingfeng Guan Apr 2008

School of Natural Resources: Faculty Publications

pRPL is an open-source general-purpose parallel Raster Processing programming Library developed by Qingfeng Guan, in the Department of Geography, University of California, Santa Barbara. pRPL encapsulates complex parallel computing utilities and routines specifically for raster processing (e.g., raster data decomposition, distribution and gathering among multiple processors, inter-processor communication and data exchange), and provides an easy-to-use interface for users to parallelize almost any raster processing algorithm with any arbitrary neighborhood (or moving window) configuration. pRPL enables the implementation of parallel raster-processing algorithms without requiring a deep understanding of parallel computing and programming, and thus greatly reduces the development complexity. Moreover, even …


Adaptive Interpolation Algorithms For Temporal-Oriented Datasets, Jun Gao Jun 2006

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Spatiotemporal datasets can be classified into two categories, temporal-oriented and spatial-oriented datasets, depending on whether missing spatiotemporal values are closer to the values of their temporal or spatial neighbors. We present an adaptive spatiotemporal interpolation model that can estimate the missing values in both categories of spatiotemporal datasets. The key parameters of the adaptive spatiotemporal interpolation model can be adjusted based on experience.


Comparison Of MODIS And AVHRR 16-Day Normalized Difference Vegetation Index Composite Data, Kevin P. Gallo, Lei Ji, Brad Reed, John Dwyer, Jeffrey Eidenshink Jan 2004

School of Natural Resources: Faculty Publications

Normalized difference vegetation index (NDVI) data derived from visible and near-infrared data acquired by the MODIS and AVHRR sensors were compared over the same time periods and a variety of land cover classes within the conterminous USA. The relationship between the AVHRR-derived NDVI values and those of future sensors is critical to continued long-term monitoring of land surface properties. The results indicate that the 16-day composite values are quite similar over the 23 intervals of 2001 that were analyzed, and a linear relationship exists between the NDVI values from the two sensors. The composite AVHRR NDVI data were …
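For readers unfamiliar with the index, NDVI is conventionally computed from near-infrared and visible red reflectance as (NIR - Red) / (NIR + Red). The sketch below shows that standard formula only. The function name `ndvi` and the reflectance values are illustrative assumptions, not data or processing steps from the study.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        # The index is undefined when both bands are zero.
        raise ValueError("NDVI is undefined when NIR + Red equals zero")
    return (nir - red) / total


# Illustrative reflectance values (hypothetical, not taken from either sensor).
dense_vegetation = ndvi(nir=0.5, red=0.1)
bare_surface = ndvi(nir=0.3, red=0.3)
print(round(dense_vegetation, 3), round(bare_surface, 3))
```

For non-negative reflectances the result lies between -1 and 1, with higher values typically indicating denser green vegetation.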


Digitization In An Archival Environment, Sally Mckay Jan 2003

E-JASL 1999-2009 (Volumes 1-10)

Introduction

Cultural institutions such as museums, libraries, archives, and historical societies house remarkable collections of cultural artifacts. It is the responsibility of the staff working for those institutions to preserve, protect and provide responsible stewardship for the materials, and to the best of their ability, provide continued long-term access (Russell, 2000).

Advances in technology allow institutions to provide expanded access and education; however, there are important priorities that must be addressed prior to embarking on a digital conversion project.

Digitization in an archival environment includes taking a physical object or analog item, such as an art object, a tape recording, …


Cataloging Expert Systems: Optimism And Frustrated Reality, William Olmstadt Feb 2000

E-JASL 1999-2009 (Volumes 1-10)

There is little question that computers have profoundly changed how information professionals work. The process of cataloging and classifying library materials was one of the first activities transformed by information technology. The introduction of the MARC format in the 1960s and the creation of national bibliographic utilities in the 1970s had a lasting impact on cataloging. In the 1980s, the affordability of microcomputers made the computer accessible for cataloging, even to small libraries. This trend toward automating library processes with computers parallels a broader societal interest in the use of computers to organize and store information. Following World War II, …


The Computer As A Collection Management Tool, Suzanne B. Mclaren, Hugh H. Genoways, Duane A. Schlitter Jan 1987

University of Nebraska State Museum: Mammalogy Papers

Since the mid-1960s, discussion of computer use for information retrieval in museum collections has usually focused on research potential. Much attention has been given to the idea of networking and the ability to access data across great distances. However, the potential for collection management usage has also proven to be a legitimate rationale for computerization. Numerous aspects of collection management are discussed for which the computer may be employed. Topics include creating cross-reference files, updating taxonomic and geographic information, pinpointing mismatched specimens, locating lost and uncataloged material, controlling loan procedures, producing accession files for insurance purposes, curating all or part …