Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

4,481 Full-Text Articles 4,914 Authors 1,167,439 Downloads 160 Institutions

All Articles in Databases and Information Systems

4,481 full-text articles. Page 1 of 160.

Seasonal Warranty Prediction Based On Recurrent Event Data, Qianqian Shan, Yili Hong, William Q. Meeker Jr. 2019 Iowa State University

William Q Meeker

Warranty return data from repairable systems, such as vehicles, usually result in recurrent event data. The non-homogeneous Poisson process (NHPP) model is widely used to describe such data. Seasonality in repair frequencies and other sources of variability, however, complicate the modeling of recurrent event data. Little work has been done to address seasonality, and this paper provides a general approach for applying NHPP models with dynamic covariates to predict seasonal warranty returns. A hierarchical clustering method is used to stratify the population into groups that are more homogeneous than the overall population. The stratification facilitates ...
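To make the NHPP idea concrete, here is a minimal sketch that simulates recurrent events from an NHPP whose intensity varies seasonally, using the standard Lewis-Shedler thinning method. This is not the authors' model: the sinusoidal intensity, its parameters (`base`, `amplitude`, `period`), and the monthly time unit are hypothetical choices for illustration, and the paper's dynamic covariates and clustering are not represented.

```python
import math
import random

def seasonal_intensity(t, base=2.0, amplitude=1.5, period=12.0):
    """Hypothetical seasonal rate (events per month) at time t in months."""
    return base + amplitude * math.sin(2 * math.pi * t / period)

def simulate_nhpp(intensity, horizon, rate_max, rng):
    """Lewis-Shedler thinning: simulate an NHPP on [0, horizon].

    Requires intensity(t) <= rate_max for every t in [0, horizon].
    """
    events = []
    t = 0.0
    while True:
        # Propose the next arrival of a homogeneous Poisson process with rate rate_max
        t += rng.expovariate(rate_max)
        if t > horizon:
            return events
        # Keep the proposal with probability intensity(t) / rate_max
        if rng.random() < intensity(t) / rate_max:
            events.append(t)

rng = random.Random(42)
returns = simulate_nhpp(seasonal_intensity, horizon=36.0, rate_max=3.5, rng=rng)
print(len(returns), "simulated repair events over 36 months")
```

Here `rate_max=3.5` is the peak of the assumed intensity (2.0 + 1.5), which is what thinning requires; over three full periods the expected event count is 2.0 × 36 = 72.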


The Effect Of Conversational Agent Skill On User Behavior During Deception, Ryan M. Schuetzler, G. Mark Grimes, Justin Scott Giboney 2019 University of Nebraska at Omaha

Information Systems and Quantitative Analysis Faculty Publications

Conversational agents (CAs) are an integral component of many personal and business interactions. Many recent advancements in CA technology have attempted to make these interactions more natural and human-like. However, it is currently unclear how human-like traits in a CA impact the way users respond to questions from the CA. In some applications where CAs may be used, detecting deception is important. Design elements that make CA interactions more human-like may induce undesired strategic behaviors from human deceivers to mask their deception. To better understand this interaction, this research investigates the effect of conversational skill—that is, the ability of ...


Automated Knowledge Extraction From Archival Documents, Khalil Malki 2019 Atlanta University Center

Electronic Theses & Dissertations Collection for Atlanta University & Clark Atlanta University

Traditional archival media such as paper, film, and photographs contain a vast store of knowledge. Much of this knowledge is applicable to current business and scientific problems and offers solutions; consequently, there is value in extracting it. While the content can be extracted manually, doing so is not feasible for large knowledge repositories because of the cost and time involved. In this thesis, we develop a system that can extract such knowledge automatically from large repositories. A Graphical User Interface that permits users to indicate the location of the knowledge components (indexes) is developed, and software features that permit ...


The Rise Of Citizen Science In Health And Biomedical Research, Andrea Wiggins, John Wilbanks 2019 University of Nebraska at Omaha

Information Systems and Quantitative Analysis Faculty Publications

Citizen science models of public participation in scientific research represent a growing area of opportunity for health and biomedical research, as well as new impetus for more collaborative forms of engagement in large-scale research. However, this also surfaces a variety of ethical issues that both fall outside of and build upon the standard human subjects concerns in bioethics. This article provides background on citizen science, examples of current projects in the field, and discussion of established and emerging ethical issues for citizen science in health and biomedical research.


Random Convolutional Coding For Robust And Straggler Resilient Distributed Matrix Computation, Anindya B. Das, Aditya Ramamoorthy, Namrata Vaswani 2019 Iowa State University

Aditya Ramamoorthy

Distributed matrix computations (matrix-vector and matrix-matrix multiplications) are at the heart of several tasks within the machine learning pipeline. However, distributed clusters are well-recognized to suffer from the problem of stragglers (slow or failed nodes). Prior work in this area has presented straggler mitigation strategies based on polynomial evaluation/interpolation. However, such approaches suffer from numerical problems (blow up of round-off errors) owing to the high condition numbers of the corresponding Vandermonde matrices. In this work, we introduce a novel solution approach that relies on embedding distributed matrix computations into the structure of a convolutional code. This simple innovation allows ...
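The abstract attributes the numerical trouble of polynomial-based straggler codes to ill-conditioned Vandermonde matrices. The sketch below illustrates that general effect only (it is not the paper's convolutional-code construction): it recovers the coefficients of the all-ones polynomial from its values at the real nodes 1..n, an illustrative assumption, and reports how far the recovered coefficients drift as n grows.

```python
def vandermonde(nodes):
    """Square Vandermonde matrix with V[i][j] = nodes[i] ** j."""
    n = len(nodes)
    return [[float(x) ** j for j in range(n)] for x in nodes]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def recovery_error(n):
    """Max coefficient error when interpolating 1 + x + ... + x^(n-1) at nodes 1..n."""
    v = vandermonde(list(range(1, n + 1)))
    b = [sum(row) for row in v]  # polynomial values at each node
    coeffs = solve(v, b)
    return max(abs(c - 1.0) for c in coeffs)

for n in (4, 12, 20):
    print(n, recovery_error(n))
```

With a handful of nodes the coefficients come back essentially exactly; with twenty real nodes the error grows by many orders of magnitude, which is the kind of round-off blow-up the abstract describes.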


Inferring Behavioral Specifications From Large-Scale Repositories By Leveraging Collective Intelligence, Hridesh Rajan, Tien N. Nguyen, Gary T. Leavens, Robert Dyer 2019 Iowa State University

Hridesh Rajan

Despite their proven benefits, useful, comprehensible, and efficiently checkable specifications are not widely available. This is primarily because writing useful, non-trivial specifications from scratch is too hard and too time-consuming, and requires expertise that is not broadly available. Furthermore, the lack of specifications for widely-used libraries and frameworks, caused by the high cost of writing specifications, tends to have a snowball effect. Core libraries lack specifications, which makes specifying applications that use them expensive. To contain the skyrocketing development and maintenance costs of high assurance systems, this self-perpetuating cycle must be broken. The labor cost of specifying programs can be significantly ...


Declarative Visitors To Ease Fine-Grained Source Code Mining With Full History On Billions Of AST Nodes, Robert Dyer, Tien N. Nguyen, Hridesh Rajan 2019 Iowa State University

Hridesh Rajan

Software repositories contain a vast wealth of information about software development. Mining these repositories has proven useful for detecting patterns in software development, testing hypotheses for new software engineering approaches, etc. Specifically, mining source code has yielded significant insights into software development artifacts and processes. Unfortunately, mining source code at large scale remains a difficult task. Previous approaches had to either limit the scope of the projects studied, limit the scope of the mining task to be more coarse-grained, or sacrifice studying the history of the code due to both human and computational scalability issues. In this paper we address ...
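Fine-grained source code mining with visitors means walking every AST node and reacting only to the node types of interest. The paper's own declarative visitor language is not shown in this abstract, so the sketch below is only an analogy using Python's standard `ast.NodeVisitor` on a single, hypothetical source snippet; it has none of the paper's scale or version history.

```python
import ast
from collections import Counter

class CallCounter(ast.NodeVisitor):
    """Counts function and method calls by callee name across a Python AST."""

    def __init__(self):
        self.calls = Counter()

    def visit_Call(self, node):
        # Plain calls f(...) are Name nodes; method calls obj.f(...) are Attribute nodes
        if isinstance(node.func, ast.Name):
            self.calls[node.func.id] += 1
        elif isinstance(node.func, ast.Attribute):
            self.calls[node.func.attr] += 1
        # Keep descending so nested calls such as f(g(x)) are also counted
        self.generic_visit(node)

source = """
import os
def load(path):
    with open(path) as fh:
        return fh.read().split()
print(len(load(os.path.join("a", "b"))))
"""

counter = CallCounter()
counter.visit(ast.parse(source))
print(counter.calls.most_common())
```

The visitor declares behavior only for `Call` nodes; all other node types fall through to the default traversal, which is the property that keeps such mining queries short.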


An Architecture For Blockchain-Based Collaborative Signature-Based Intrusion Detection System, Daniel Laufenberg 2019 Kennesaw State University

Master of Science in Information Technology Theses

Collaborative intrusion detection systems (CIDSs), in which IDS hosts work with each other and share resources, have been proposed to cope with increasingly sophisticated cyberattacks. Despite promising benefits such as expanded signature databases and alert data from multiple sites, trust management and consensus building remain challenges for a CIDS to work effectively. Blockchain technology, with its built-in immutability and consensus-building capability, provides a viable solution to these issues. In this paper, we introduce an architecture for a blockchain-enabled signature-based collaborative IDS, discuss the implementation strategy of the proposed architecture, and develop a prototype using Hyperledger ...
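The immutability the abstract relies on comes from each block committing to the hash of its predecessor. The sketch below shows only that hash-chaining property on a single node, with no consensus, networking, or Hyperledger API involved; the block fields and the Snort-style rule strings are hypothetical.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder predecessor hash for the first block

def block_hash(index, prev_hash, payload):
    """SHA-256 over a canonical JSON encoding of the block contents."""
    data = json.dumps({"index": index, "prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

def append_block(chain, payload):
    """Append a block that commits to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({"index": index, "prev": prev_hash, "payload": payload,
                  "hash": block_hash(index, prev_hash, payload)})

def verify(chain):
    """Recompute every hash and link; editing any block breaks verification."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else GENESIS_PREV
        if block["prev"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["prev"], block["payload"]):
            return False
    return True

ledger = []
append_block(ledger, {"sid": 1001, "rule": "alert tcp any any -> any 23"})
append_block(ledger, {"sid": 1002, "rule": "alert tcp any any -> any 445"})
print(verify(ledger))  # True
ledger[0]["payload"]["rule"] = "pass tcp any any -> any 23"  # tamper with a shared signature
print(verify(ledger))  # False
```

In a collaborative IDS, the consensus layer decides which blocks are appended; the chaining shown here is what makes a silently edited signature detectable afterwards.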


Social Media Text Mining Framework For Drug Abuse: An Opioid Crisis Case Analysis, Tareq Nasralah 2019 Dakota State University

Masters Theses & Doctoral Dissertations

Social media is considered a promising and viable source of data for gaining insights into various disease conditions, patients’ attitudes and behaviors, and medications. The daily use of social media provides new opportunities for analyzing several aspects of communication. Social media as a big data source can be used to recognize communication and behavioral themes of problematic use of prescription drugs. Mining and analyzing such media pose challenges and limitations with respect to topic deduction and data quality. There is a need for a structured approach to efficiently and effectively analyze social media content related to drug abuse in ...


Securing Messaging Services Through Efficient Signcryption With Designated Equality Test, Yujue Wang, Hwee Hwa Pang, Robert H. Deng, Yong Ding, Qianhong Wu, Bo Qin 2019 Guilin University of Electronic Technology

Research Collection School Of Information Systems

To address security and privacy issues in messaging services, we present a public key signcryption scheme with designated equality test on ciphertexts (PKS-DET) in this paper. The scheme enables a sender to simultaneously encrypt and sign (signcrypt) messages, and to designate a tester to perform equality tests on ciphertexts, i.e., to determine whether two ciphertexts signcrypt the same underlying plaintext message. We introduce the PKS-DET framework, present a concrete construction and formally prove its security against three types of adversaries, representing two security requirements on message confidentiality against outsiders and the designated tester, respectively, and a requirement on message ...


A Hidden Markov Model For Matching Spatial Networks, Benoit Costes, Julien Perret 2019 The University of Maine

Journal of Spatial Information Science

Datasets of the same geographic space at different scales and temporalities are increasingly abundant, paving the way for new scientific research. These datasets require data integration, which implies linking homologous entities in a process called data matching; despite a substantial literature, data matching remains challenging because of data imperfections and heterogeneities. In this paper, we present an approach for matching spatial networks based on a hidden Markov model (HMM) that takes full advantage of the underlying topology of networks. The approach is assessed using four heterogeneous datasets (streets, roads, railway, and hydrographic networks), showing that the HMM algorithm ...
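HMM-based matching typically needs the most likely sequence of hidden states given the observations, and the Viterbi algorithm is the standard decoder for that; the abstract does not say which decoding procedure the authors use. In the sketch below, treating hidden states as candidate edges of a reference network is an illustrative assumption, and the state names (`e1`, `e2`), observation labels, and probabilities are toy values.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for an HMM, computed in log space."""
    # best[t][s] = (log-prob of the best path ending in s at step t, predecessor state)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
             for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        layer = {}
        for s in states:
            # Pick the predecessor that maximizes the path score into state s
            score, arg = max((prev[r][0] + math.log(trans_p[r][s]), r) for r in states)
            layer[s] = (score + math.log(emit_p[s][obs]), arg)
        best.append(layer)
    # Backtrack from the highest-scoring final state
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for layer in reversed(best[1:]):
        state = layer[state][1]
        path.append(state)
    return path[::-1]

# Toy model: hypothetical candidate edges e1/e2 observed through features o1..o3
states = ["e1", "e2"]
start_p = {"e1": 0.6, "e2": 0.4}
trans_p = {"e1": {"e1": 0.7, "e2": 0.3}, "e2": {"e1": 0.4, "e2": 0.6}}
emit_p = {"e1": {"o1": 0.5, "o2": 0.4, "o3": 0.1},
          "e2": {"o1": 0.1, "o2": 0.3, "o3": 0.6}}
print(viterbi(["o1", "o2", "o3"], states, start_p, trans_p, emit_p))  # ['e1', 'e1', 'e2']
```

Working in log space turns products of probabilities into sums, which avoids underflow on the long observation sequences a network traversal can produce.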


Evaluating Existing Manually Constructed Natural Landscape Classification With A Machine Learning-Based Approach, Rok Ciglic, Erik Strumbelj, Rok Cesnovar, Mauro Hrvatin, Drago Perko 2019 University of Ljubljana

Journal of Spatial Information Science

Some landscape classifications officially determine financial obligations; thus, they must be objective and precise. We presume it is possible to quantitatively evaluate existing manually constructed classifications and correct them if necessary. One option for achieving this goal is a machine learning method. With (re)modeling of the landscape classification and an explanation of its structure, we can add quantitative proof to its original (qualitative) description. The main objectives of the paper are to evaluate the consistency of the existing manually constructed natural landscape classification with a machine learning-based approach and to test the newly developed general black-box explanation method in ...


Discovery Of Topological Constraints On Spatial Object Classes Using A Refined Topological Model, Ivan Majic, Elham Naghizade, Stephan Winter, Martin Tomko 2019 The University of Melbourne

Journal of Spatial Information Science

In a typical data collection process, a surveyed spatial object is annotated upon creation, and is classified based on its attributes. This annotation can also be guided by textual definitions of objects. However, interpretations of such definitions may differ among people and thus result in subjective and inconsistent classification of objects. This problem becomes even more pronounced when cultural and linguistic differences are taken into account. As a solution, this paper investigates the role of topology as the defining characteristic of a class of spatial objects. We propose a data mining approach based on frequent itemset mining to learn patterns in ...
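Frequent itemset mining finds sets of items that co-occur in at least a minimum number of transactions. The sketch below uses Apriori, one standard algorithm for this; the abstract does not say which algorithm the authors use. Each transaction stands for one spatial object and its items are hypothetical topological-relation labels.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a, b in combinations(list(frequent), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))}
        frequent = {}
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count >= min_support:
                frequent[c] = count
        result.update(frequent)
        k += 1
    return result

# Hypothetical objects, each described by the topological relations it exhibits
objects = [
    {"touches_road", "within_park"},
    {"touches_road", "within_park", "crosses_river"},
    {"touches_road"},
    {"within_park", "crosses_river"},
]
patterns = apriori(objects, min_support=2)
for itemset, count in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is what keeps Apriori tractable: here the triple is discarded without counting because its subset `{touches_road, crosses_river}` occurs only once.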


Parallel Streaming Random Sampling, Kanat Tangwongsan, Srikanta Tirthapura 2019 Mahidol University International College

Srikanta Tirthapura

This paper investigates parallel random sampling from a potentially unending data stream whose elements are revealed in a series of element sequences (minibatches). While sampling from a stream has been studied extensively in the sequential setting, little has been explored in the parallel context; prior parallel random-sampling algorithms focus on the static batch model. We present parallel algorithms for minibatch-stream sampling in two settings: (1) sliding window, which draws samples from a prespecified number of most-recently observed elements, and (2) infinite window, which draws samples from all the elements received. Our algorithms are computationally and memory efficient: their work matches the fastest sequential ...
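For the infinite-window setting, one well-known way to keep a uniform sample of size k without replacement is priority (bottom-k) sampling: tag every element with an independent uniform random key and keep the k smallest keys seen so far. This is not the paper's algorithm; it is shown because "k smallest of a union" merges cleanly, which hints at why minibatches can be handled in parallel. The loop below is sequential, and the integer stream is hypothetical.

```python
import heapq
import random

def tag_minibatch(batch, rng):
    """Attach an independent uniform random priority to each element."""
    return [(rng.random(), x) for x in batch]

def merge_samples(sample, tagged, k):
    """Keep the k elements with the smallest priorities seen so far.

    Taking the k smallest of a union is associative, so tagging and
    selecting within one minibatch could be split across workers and merged.
    """
    return heapq.nsmallest(k, sample + tagged)

rng = random.Random(7)
k = 5
sample = []
stream = [list(range(i, i + 100)) for i in range(0, 1000, 100)]  # ten minibatches
for batch in stream:
    sample = merge_samples(sample, tag_minibatch(batch, rng), k)
print(sorted(x for _, x in sample))
```

Because the keys are i.i.d. uniform, every k-subset of the elements seen so far is equally likely to hold the k smallest keys, so the retained set is a uniform sample without replacement.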


Data Mining And Machine Learning To Improve Northern Florida’s Foster Care System, Daniel Oldham, Nathan Foster, Mihhail Berezovski 2019 Embry-Riddle Aeronautical University, Daytona Beach

Beyond: Undergraduate Research Journal

The purpose of this research project is to use statistical analysis, data mining, and machine learning techniques to determine identifiable factors in child welfare service records that could lead to a child entering the foster care system multiple times. This would allow us to predict a case’s outcome accurately based on these factors. We were provided with eight years of data in the form of multiple spreadsheets from Partnership for Strong Families (PSF), a child welfare services organization based in Gainesville, Florida, which is contracted by the Florida Department of Children and Families (DCF). This data contained ...


Encoding Invariances In Deep Generative Models, Viraj Shah, Ameya Joshi, Sambuddha Ghosal, Balaji Pokuri, Soumik Sarkar, Baskar Ganapathysubramanian, Chinmay Hegde 2019 Iowa State University

Baskar Ganapathysubramanian

Reliable training of generative adversarial networks (GANs) typically requires massive datasets to model complicated distributions. However, in several applications, training samples obey invariances that are known a priori; for example, in complex physics simulations, the training data obey universal laws encoded as well-defined mathematical equations. In this paper, we propose a new generative modeling approach, InvNet, that can efficiently model data spaces with known invariances. We devise an adversarial training algorithm to encode them into the data distribution. We validate our framework in three experimental settings: generating images with fixed motifs; solving nonlinear partial differential equations (PDEs); and ...


Parallel Streaming Random Sampling, Kanat Tangwongsan, Srikanta Tirthapura 2019 Mahidol University International College

Electrical and Computer Engineering Publications

This paper investigates parallel random sampling from a potentially unending data stream whose elements are revealed in a series of element sequences (minibatches). While sampling from a stream has been studied extensively in the sequential setting, little has been explored in the parallel context; prior parallel random-sampling algorithms focus on the static batch model. We present parallel algorithms for minibatch-stream sampling in two settings: (1) sliding window, which draws samples from a prespecified number of most-recently observed elements, and (2) infinite window, which draws samples from all the elements received. Our algorithms are computationally and memory efficient: their work matches the fastest sequential ...


Healthcare IT In Skilled Nursing And Post-Acute Care Facilities: Reducing Hospital Admissions And Re-Admissions, Improving Reimbursement And Improving Clinical Operations, Scott L. Hopes 2019 University of South Florida

Scott Hopes

Health information technology (HIT), which includes electronic health record (EHR) systems and clinical data analytics, has become a major component of all health care delivery and care management. The adoption of HIT by physicians, hospitals, post-acute care organizations, pharmacies and other health care providers has been accepted as a necessary (and recently, a government required) step toward improved quality, care coordination and reduced costs: “Better coordination of care provides a path to improving communication, improving quality of care, and reducing unnecessary emergency room use and hospital readmissions. LTPAC providers play a critical role in achieving these goals” (HealthIT.gov, 2013 ...


Examining MEDLINE Search Query Reproducibility And Resulting Variation In Search Results, C. Sean Burns, Robert M. Shapiro II, Tyler Nix, Jeffrey T. Huber 2019 University of Kentucky

C. Sean Burns

The MEDLINE database is publicly available through the National Library of Medicine’s PubMed, but the data file itself is also licensed to a number of vendors, who may offer their versions to institutional and other parties as part of a database platform. These vendors provide their own interface to the MEDLINE file and offer other technologies that attempt to make their version useful to subscribers. However, little is known about how vendor platforms ingest and interact with MEDLINE data files, or how these changes influence the construction of search queries and the results they produce. This poster presents a ...


Encoding Invariances In Deep Generative Models, Viraj Shah, Ameya Joshi, Sambuddha Ghosal, Balaji Pokuri, Soumik Sarkar, Baskar Ganapathysubramanian, Chinmay Hegde 2019 Iowa State University

Mechanical Engineering Publications

Reliable training of generative adversarial networks (GANs) typically requires massive datasets to model complicated distributions. However, in several applications, training samples obey invariances that are known a priori; for example, in complex physics simulations, the training data obey universal laws encoded as well-defined mathematical equations. In this paper, we propose a new generative modeling approach, InvNet, that can efficiently model data spaces with known invariances. We devise an adversarial training algorithm to encode them into the data distribution. We validate our framework in three experimental settings: generating images with fixed motifs; solving nonlinear partial differential equations (PDEs); and ...


Digital Commons powered by bepress