Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

2017

Big data

Articles 1 - 26 of 26

Full-Text Articles in Physical Sciences and Mathematics

Remote Sensing Of Forests Using Discrete Return Airborne Lidar, Hamid Hamraz, Marco A. Contreras Dec 2017

Forestry and Natural Resources Faculty Publications

Airborne discrete return light detection and ranging (LiDAR) point clouds covering forested areas can be processed to segment individual trees and retrieve their morphological attributes. Segmenting individual trees in natural deciduous forests, however, remains a challenge because of the complex and multi-layered canopy. In this chapter, we present (i) a robust segmentation method that avoids a priori assumptions about the canopy structure, (ii) a vertical canopy stratification procedure that improves segmentation of understory trees, (iii) an occlusion model for estimating the point density of each canopy stratum, and (iv) a distributed computing approach for efficient processing at the forest level. …


Some Dimension Reduction Strategies For The Analysis Of Survey Data, Jiaying Weng, Derek S. Young Dec 2017

Statistics Faculty Publications

In the era of big data, researchers interested in developing statistical models are challenged with how to achieve parsimony. Usually, some sort of dimension reduction strategy is employed. Classic strategies are often in the form of traditional inference procedures, such as hypothesis testing; however, the increase in computing capabilities has led to the development of more sophisticated methods. In particular, sufficient dimension reduction has emerged as an area of broad and current interest. While these types of dimension reduction strategies have been employed for numerous data problems, they are scarcely discussed in the context of analyzing survey data. This …
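To make the idea of sufficient dimension reduction concrete, here is a minimal sketch of sliced inverse regression (SIR), one classic SDR estimator. The abstract does not say which estimators the authors use, so SIR is a swapped-in illustration, and `sir_directions` with its `n_slices` and `n_dirs` parameters is a hypothetical helper. The sketch also ignores survey weights and design features, which a real survey analysis would have to account for.

```python
import numpy as np


def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Sliced inverse regression: estimate n_dirs unit-length directions b
    such that y depends on X (approximately) only through X @ b."""
    n, p = X.shape
    # Standardize the predictors: Z = (X - mean) @ Sigma^{-1/2}
    mu = X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ inv_sqrt
    # Slice the observations by their sorted response values
    slices = np.array_split(np.argsort(y), n_slices)
    # Weighted covariance matrix of the per-slice means of Z
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # eigh returns ascending eigenvalues, so reverse to take the leading ones,
    # then map the directions back to the original X scale
    _, v = np.linalg.eigh(M)
    top = v[:, ::-1][:, :n_dirs]
    B = inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)


# Illustrative (hypothetical) data: y depends on X only through X @ beta
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
y = X @ beta + 0.1 * rng.normal(size=2000)
b = sir_directions(X, y)[:, 0]
```

With a response that depends on a single linear combination, the estimated direction should line up closely with `beta`. The sign of an SIR direction is arbitrary, so compare absolute inner products.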


Visual Logging Framework Using ELK Stack, Ravi Nishant Dec 2017

Computer Science and Engineering Theses

Logging is the process of storing information for future reference and audit purposes. In software applications, logging plays a critical role as a development utility and helps ensure code quality. It enables developers and support professionals to observe an application’s functionality and understand any issues with it. Data logging is widely used in scientific experiments and analytical systems; systems that rely heavily on it include weather reporting services, digital advertising, search engines, and space exploration systems, to name a few. Although data logging increases the productivity and efficiency of a software system, …
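An ELK stack (Elasticsearch, Logstash, Kibana) is easiest to feed when applications emit structured, one-event-per-line logs. The sketch below is one common way to do that with Python's standard `logging` module; it is an assumption for illustration, not the framework built in the thesis, and `JsonFormatter`, `make_json_logger`, and the service name are hypothetical. The `@timestamp` field name follows the Logstash convention for event time.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record):
        payload = {
            # Logstash conventionally stores the event time in "@timestamp"
            "@timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def make_json_logger(name, stream):
    """Return a logger that writes JSON lines to `stream` (a file, buffer, etc.)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # replace handlers left over from earlier calls
    logger.propagate = False     # do not also emit through the root logger
    return logger


# Usage: write one structured event for a hypothetical service into a buffer
buffer = io.StringIO()
log = make_json_logger("checkout-service", buffer)
log.info("order %s placed", "A-1001")
```

In a real deployment the stream would be a log file tailed by a shipper such as Filebeat, so that each line arrives in Elasticsearch as a searchable document that Kibana can chart.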


The Graph Database: Jack Of All Trades Or Just Not SQL?, George F. Hurlburt, Maria R. Lee, George K. Thiruvathukal Nov 2017

Computer Science: Faculty Publications and Other Works

This special issue of IT Professional focuses on the graph database. The graph database, a relatively new phenomenon, is well suited to the burgeoning information era in which we are increasingly becoming immersed. Here, the guest editors briefly explain how a graph database works, its relation to the relational database management system (RDBMS), and its quantitative and qualitative pros and cons, including how graph databases can be harnessed in a hybrid environment. They also survey the excellent articles submitted for this special issue.


Migrating From SQL To NoSQL Database: Practices And Analysis, Fatima Jamal Al Shekh Yassin Nov 2017

Accounting Dissertations

Most enterprises that deal with big data are moving toward NoSQL data structures to represent their data. Converting existing SQL structures to NoSQL structures is an important task that must guarantee both better performance and accurate data. The main objective of this thesis is to highlight the most suitable NoSQL structure for migrating from a relational database, in terms of high performance in reading data. Different combinations of NoSQL structures were tested and compared with the SQL structure to determine the best design to use. For the SQL structure, we used the MySQL data that …
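A common way to convert a one-to-many relational relationship into a document-oriented NoSQL structure is to embed the child rows inside the parent document. The sketch below shows that pattern on hypothetical `customers` and `orders` tables; `embed_children` is a hypothetical helper, and embedding is only one candidate design, not necessarily the structure the thesis found best.

```python
from collections import defaultdict


def embed_children(parents, children, parent_key, foreign_key, field):
    """Denormalize a one-to-many relationship: nest each parent's child rows
    inside the parent document, dropping the now-redundant foreign key."""
    grouped = defaultdict(list)
    for row in children:
        child = {k: v for k, v in row.items() if k != foreign_key}
        grouped[row[foreign_key]].append(child)
    documents = []
    for row in parents:
        doc = dict(row)
        doc[field] = grouped.get(row[parent_key], [])
        documents.append(doc)
    return documents


# Hypothetical rows, as they might be read from two relational tables
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 5.0},
    {"order_id": 11, "customer_id": 1, "total": 7.5},
]
docs = embed_children(customers, orders, "id", "customer_id", "orders")
```

Embedding lets a single document read return a customer together with all of their orders, with no join, which suits read-heavy workloads. The trade-off is larger documents, and duplicated data if orders must also be stored or queried on their own.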


The Transformation Of Science With HPC, Big Data, And AI, Jeffrey Kirk Oct 2017

Commonwealth Computational Summit

High performance computing has matured into an indispensable tool not only for academic research and government labs and agencies but also for many industry sectors: energy, manufacturing, healthcare, financial services, even digital content creation. More recently, the advent of Big Data has enabled the use of HPC techniques for large-scale data analysis, expanding the scope and reach of HPC into more research and enterprise use cases. Since 2012, a new regime of data-driven analytics, deep learning, has erupted in popularity, fueled both by massive performance increases in HPC technologies and by the explosive rate of …


Harnessing The Data Revolution, Chaitan Baru Oct 2017

Commonwealth Computational Summit

Harnessing Data for 21st Century Science and Engineering (aka Harnessing the Data Revolution, HDR) is one of NSF's six "Big Research Ideas," aimed at supporting fundamental research in data science and engineering; developing a cohesive, federated approach to the research data infrastructure needed to power this revolution; and developing a 21st-century data-capable workforce. HDR will enable new modes of data-driven discovery allowing researchers to ask and answer new questions in frontier science and engineering, generate new knowledge and understanding by working with domain experts, and accelerate discovery and innovation. This initiative builds on NSF's history of data science investments. …


Feature Space Augmentation: Improving Prediction Accuracy Of Classical Problems In Cognitive Science And Computer Vision, Piyush Saxena Oct 2017

Dissertations (1934 -)

The prediction accuracy in many classical problems across multiple domains has seen a rise since computational tools such as multi-layer neural nets and complex machine learning algorithms have become widely accessible to the research community. In this research, we take a step back and examine the feature space in two problems from very different domains. We show that novel augmentation to the feature space yields higher performance. Emotion Recognition in Adults from a Control Group: The objective is to quantify the emotional state of an individual at any time using data collected by wearable sensors. We define emotional state as …


Vungle Inc. Improves Monetization Using Big-Data Analytics, Bert De Reyck, Ioannis Fragkos, Yael Grushka-Cockayne, Casey Lichtendahl, Hammond Guerin, Andre Kritzer Oct 2017

Research Collection Lee Kong Chian School Of Business

The advent of big data has created opportunities for firms to customize their products and services to unprecedented levels of granularity. Using big data to personalize an offering in real time, however, remains a major challenge. In the mobile advertising industry, once a customer enters the network, an ad-serving decision must be made in a matter of milliseconds. In this work, we describe the design and implementation of an ad-serving algorithm that incorporates machine-learning methods to make personalized ad-serving decisions within milliseconds. We developed this algorithm for Vungle Inc., one of the largest global mobile ad networks. Our approach also …


VetCompass Australia: A National Big Data Collection System For Veterinary Science, Paul McGreevy, Peter Thomson, Navneet K. Dhand, David Raubenheimer, Sophie Masters, Caroline S. Mansfield, Timothy Baldwin, Ricardo J. Soares Magalhaes, Jacquie Rand, Peter Hill, Anne Peaston, James Gilkerson, Martin Combs, Shane Raidal, Peter Irwin, Peter Irons, Richard Squires, David Brodbelt, Jeremy Hammond Sep 2017

Paul McGreevy, PhD

VetCompass Australia is veterinary medical records-based research coordinated with the global VetCompass endeavor to maximize its quality and effectiveness for Australian companion animals (cats, dogs, and horses). Bringing together all seven Australian veterinary schools, it is the first nationwide surveillance system collating clinical records on companion-animal diseases and treatments. The VetCompass data service collects and aggregates real-time clinical records for researchers to interrogate, delivering sustainable and cost-effective access to data from hundreds of veterinary practitioners nationwide. Analysis of these clinical records will reveal geographical and temporal trends in the prevalence of inherited and acquired diseases, identify frequently prescribed treatments, revolutionize clinical …


Augmenting Amdahl's Second Law: A Theoretical Model To Build Cost-Effective Balanced HPC Infrastructure For Data-Driven Science, Arghya Kusum Das, Jaeki Hong, Sayan Goswami, Richard Platania, Kisung Lee, Wooseok Chang, Seung Jong Park, Ling Liu Sep 2017

Computer Science Faculty Research & Creative Works

High-performance analysis of big data demands more computing resources, forcing similar growth in computation cost. The challenge for HPC system designers is therefore to provide not only high performance but high performance at lower cost. For high-performance yet cost-effective cyberinfrastructure, we propose a new system model that augments Amdahl's second law for balanced systems to optimize the price-performance ratio. We express the optimal balance among CPU speed, I/O bandwidth, and DRAM size (i.e., Amdahl's I/O number and memory number) in terms of application characteristics and hardware cost. Considering Xeon processors and recent hardware prices, we show that a system needs almost 0.17 GBPS of I/O bandwidth and 3 GB of DRAM per …
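The paper builds on Amdahl's classic balanced-system rules of thumb, which say a balanced machine has roughly one bit of I/O per second and one byte of memory for every instruction per second. The sketch below computes just those two classic ratios; `amdahl_numbers` and the node figures are hypothetical, and the paper's own cost-aware model is not reproduced here.

```python
def amdahl_numbers(instructions_per_sec, io_bytes_per_sec, memory_bytes):
    """Classic Amdahl balance ratios; a 'balanced' system has both close to 1."""
    # I/O number: bits of I/O per second per instruction per second
    io_number = io_bytes_per_sec * 8 / instructions_per_sec
    # Memory number: bytes of memory per instruction per second
    memory_number = memory_bytes / instructions_per_sec
    return io_number, memory_number


# Hypothetical node: 1e9 instructions/s, 125 MB/s of I/O (1 Gbit/s), 1 GB of DRAM
io_n, mem_n = amdahl_numbers(1e9, 125e6, 1e9)
print(io_n, mem_n)  # 1.0 1.0
```

Doubling the instruction rate while leaving I/O and DRAM unchanged halves both ratios, which is the kind of imbalance the paper's model weighs against hardware cost.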


Analysis Of Security In Big Data Related To Healthcare, Isabel De La Torre, Begoña García-Zapirain, Miguel López-Coronado Sep 2017

Journal of Digital Forensics, Security and Law

Big data facilitates the processing and management of huge amounts of data. In health, the main information source is the electronic health record, with others being the Internet and social media. Health-related data is stored in big data systems and shared via electronic means. Why are criminal organisations interested in this data? These organisations can blackmail people with information related to their health condition or sell the information to marketing companies, among other uses. This article analyses healthcare-related big data security and proposes different solutions. Various techniques are available to help preserve privacy, such as data modification techniques, cryptographic …


Security And The Transnational Information Polity, Michael M. Losavio, Adel Said Elmaghraby Sep 2017

Journal of Digital Forensics, Security and Law

Global information and communications technologies create criminal opportunities in which criminal violation and physical proximity are decoupled. As in all our endeavors, the good become the prey of the bad. Murderous and venal exploitation of ICT has followed from the inception of the Internet, threatening all the good it brings and the trust we need so badly as a people. As the work continues to expand the implementation of Smart Cities and the Internet of Things, there will be more opportunities for exploitation of these technologies. We examine the social and liberty risks our data and technology-driven responses may entail.


A Big Data Analytics Method For Tourist Behaviour Analysis, Shah Jahan Miah, Huy Quan Vu, John Gammack, Michael McGrath Sep 2017

All Works

© 2016 Elsevier B.V. Big data generated across social media sites has created numerous opportunities for bringing more insights to decision-makers. Few studies on big data analytics, however, have demonstrated support for strategic decision-making. Moreover, a formal method for analysing social media-generated big data for decision support is yet to be developed, particularly in the tourism sector. Using a design science research approach, this study aims to design and evaluate a ‘big data analytics’ method to support strategic decision-making in tourism destination management. Using geotagged photos uploaded by tourists to the photo-sharing social media site, Flickr, the applicability of …


OkCupid Data For Introductory Statistics And Data Science Courses, Albert Y. Kim, Adriana Escobedo-Land Aug 2017

Mathematics Sciences: Faculty Publications

We present a data set consisting of user profile data for 59,946 San Francisco users of OkCupid, a free online dating website, from June 2012. The data set includes typical user information, lifestyle variables, and text responses to 10 essay questions. We present four example analyses suitable for use in undergraduate introductory probability and statistics and data science courses that use R. The statistical and data science concepts covered include basic data visualization, exploratory data analysis, multivariate relationships, text analysis, and logistic regression for prediction.


Distributed Knowledge Discovery For Diverse Data, Hossein Hamooni Jul 2017

Computer Science ETDs

In the era of new technologies, computer scientists deal with massive datasets hundreds of terabytes in size. Smart cities, social networks, health care systems, large sensor networks, and similar sources are constantly generating new data. Extracting knowledge from big datasets is non-trivial because traditional data mining algorithms are impractical to run on datasets of this size. Distributed systems help address this problem but introduce new challenges in designing scalable algorithms. The transition from traditional algorithms to ones that can run on a distributed platform should be done carefully. Researchers should design modern distributed algorithms based on the …


Performance Optimization And Energy Efficiency Of Big-Data Computing Workflows, Tong Shu Jul 2017

Dissertations

Next-generation e-science is producing colossal amounts of data, now frequently termed Big Data, on the order of terabytes at present and petabytes or even exabytes in the foreseeable future. These scientific applications typically feature data-intensive workflows composed of moldable parallel computing jobs, such as MapReduce, with intricate inter-job dependencies. The granularity of task partitioning in each moldable job of such big data workflows has a significant impact on workflow completion time, energy consumption, and, if executed in clouds, financial cost; this impact remains largely unexplored. This dissertation conducts an in-depth investigation into the properties of moldable jobs and provides an …


An MRQL Visualizer Using JSON Integration, Rohit Bhawal May 2017

Computer Science and Engineering Theses

In today’s world, where there is no limit to the amount of data being collected from IoT devices, social media platforms, and other big data applications, systems are needed that can process this data efficiently and effortlessly. Analyzing the data to identify trends, detect patterns, and find other valuable information is critical for any business application. When analyzed data is presented in a visual format such as graphs, it becomes easier to grasp difficult concepts or identify new patterns. MRQL is an SQL-like query language for large-scale data analysis built on top of Apache Hadoop, Spark, Flink, and Hama which …


Big Data Analytics In Computational Biology And Bioinformatics, Kevin Byron Apr 2017

Dissertations

Big data analytics in computational biology and bioinformatics refers to an array of operations including biological pattern discovery, classification, prediction, inference, clustering as well as data mining in the cloud, among others. This dissertation addresses big data analytics by investigating two important operations, namely pattern discovery and network inference.

The dissertation starts by focusing on biological pattern discovery at a genomic scale. Research reveals that the secondary structure in non-coding RNA (ncRNA) is more conserved during evolution than its primary nucleotide sequence. Using a covariance model approach, the stems and loops of an ncRNA secondary structure are represented as a …


A Secure And Efficient ID-Based Aggregate Signature Scheme For Wireless Sensor Networks, Limin Shen, Jianfeng Ma, Ximeng Liu, Fushan Wei, Meixia Miao Apr 2017

Research Collection School Of Computing and Information Systems

Secure and efficient big data aggregation methods are highly attractive in wireless sensor network (WSN) research. In practice, WSNs have been broadly applied, for example in target tracking and remote environmental monitoring. However, data can easily be compromised by a wide range of attacks, such as data interception and data tampering. In this paper, we focus mainly on data integrity protection and give an identity-based aggregate signature (IBAS) scheme with a designated verifier for WSNs. Leveraging the advantages of aggregate signatures, our scheme can not only preserve data integrity but also reduce bandwidth and …


Ten Simple Rules For Responsible Big Data Research, Matthew Zook, Solon Barocas, Danah Boyd, Kate Crawford, Emily Keller, Seeta Peña Gangadharan, Alyssa Goodman, Rachelle Hollander, Barbara A. Koenig, Jacob Metcalf, Arvind Narayanan, Alondra Nelson, Frank Pasquale Mar 2017

Geography Faculty Publications

No abstract provided.


Hadoop Framework Implementation And Performance Analysis On A Cloud, Göksu Zekiye Özen, Mehmet Tekerek, Rayimbek Sultanov Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

The Hadoop framework uses the MapReduce programming paradigm to process big data by distributing data across a cluster and aggregating the results. MapReduce is one of the methods used to process big data hosted on large clusters. In this method, jobs are processed by dividing them into small pieces and distributing those pieces over the nodes. Parameters such as the method of distribution over nodes, the number of jobs run in parallel, and the number of nodes in the cluster affect the execution time of jobs. The aim of this paper is to determine how the numbers of nodes, maps, and reduces affect the performance of …
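The divide, map, and aggregate steps described above can be illustrated with the classic word-count example. This is a single-process sketch of the map, shuffle, and reduce phases, not Hadoop's Java API; the function names are hypothetical, and `n_maps` only mimics the number of map tasks that Hadoop would schedule across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: aggregate each key's values (here, by summing counts)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, n_maps):
    # Divide the input into n_maps splits; on a cluster each would run on some node
    splits = [lines[i::n_maps] for i in range(n_maps)]
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    return reduce_phase(shuffle(mapped))


counts = word_count(["big data big", "data on a cluster"], n_maps=2)
```

The result does not depend on how the input is split, which is what allows the framework to vary the number of maps, reduces, and nodes freely; the paper studies how those choices affect execution time.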


Semantic Inference On Clinical Documents: Combining Machine Learning Algorithms With An Inference Engine For Effective Clinical Diagnosis And Treatment, Shuo Yang, Ran Wei, Jingzhi Guo, Lida Xu Jan 2017

Information Technology & Decision Sciences Faculty Publications

Clinical practice calls for reliable diagnosis and optimized treatment. However, human errors in health care remain a severe issue even in industrialized countries. The application of clinical decision support systems (CDSS) casts light on this problem. However, despite great improvements in CDSS over the past several years, challenges to their wide-scale application remain, including: 1) decision making in CDSS is complicated by the complexity of data on human physiology and pathology, which could make the whole process more time-consuming when big data related to patients are loaded; and 2) information incompatibility among different health information systems (HIS) …


Proactive IT Incident Prevention: Using Data Analytics To Reduce Service Interruptions, Mark G. Malley Jan 2017

Walden Dissertations and Doctoral Studies

The cost of resolving user requests for IT assistance rises annually. Researchers have demonstrated that data warehouse analytic techniques can improve service, but they have not established the benefit of using global organizational data to reduce reported IT incidents. The purpose of this quantitative, quasi-experimental study was to examine the extent to which IT staff use of organizational knowledge generated from data warehouse analytical measures reduces the number of IT incidents over a 30-day period, as reported by global users of IT within an international pharmaceutical company headquartered in Germany. Organizational learning theory was used to approach the theorized relationship …


Special Issue: Neutrosophic Theories Applied In Engineering, Florentin Smarandache, Jun Ye Jan 2017

Branch Mathematics and Statistics Faculty and Staff Publications

Neutrosophic sets and logic are generalizations of fuzzy and intuitionistic fuzzy sets and logic. Neutrosophic sets and logic are gaining significant attention in solving many real-life decision-making problems that involve uncertainty, impreciseness, vagueness, incompleteness, inconsistency, and indeterminacy. They have been applied in computational intelligence, multiple criteria decision making, image processing, medical diagnoses, etc. This Special Issue presents original research papers that report on state-of-the-art and recent advancements in neutrosophic sets and logic in soft computing, artificial intelligence, big and small data mining, decision-making problems, and practical achievements.


The Habits Of Highly Effective Researchers: An Empirical Study, Subhajit Datta, Partha Basuchowdhuri, Surajit Acharya, Subhashis Majumder Jan 2017

Research Collection School Of Computing and Information Systems

Interest in the habits of influential individuals cuts across domains. As researchers, we are intrigued by why few attain significant eminence in their fields whereas many operate in obscurity. An empirical examination of this question has been made possible by the recent availability of large-scale publication data. In this paper, we use information from the AMiner Paper Citation and Author Collaboration Networks to discern factors that relate to the impact of influential researchers across five domains in the computing discipline. We propose and apply a novel algorithm to identify influential vertices in co-authorship networks built from total corpora of 100,000+ papers …