Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

2016

Big data

Articles 1 - 30 of 64

Full-Text Articles in Entire DC Network

Big Data, Technical Communication, And The Smart City, Jordan Frith Dec 2016

Publications

Big data is one of the most hyped buzzwords in both academia and industry. This article makes an early contribution to research on big data by situating data theoretically as a historical object and arguing that much of the discourse about the supposed transparency and objectivity of big data ignores the crucial roles of interpretation and communication. To set forth that analysis, this article engages with recent discussion of big data and “smart” cities to show the communicative practices operating behind the scenes of large data projects and relate those practices to the profession of technical communication.


Group Transformation And Identification With Kernel Methods And Big Data Mixed Logistic Regression, Chao Pan Dec 2016

Open Access Dissertations

Exploratory Data Analysis (EDA) is a crucial step in the life cycle of data analysis. Exploring data with effective methods reveals the main characteristics of the data and provides guidance for model building. The goal of this thesis is to develop effective and efficient methods for data exploration in the regression setting.

First, we propose to use optimal group transformations as a general approach for exploring the relationship between predictor variables X and the response Y. This approach can be considered an automatic procedure to identify the best characteristic of P(Y|X) under which the relationship …


Big Data And Organizational Impacts: A Study Of Big Data Ventures, Taha Havakhor Dec 2016

Graduate Theses and Dissertations

New information technology (IT) ventures are at the forefront of developing IT innovations. In spite of their importance in the advancement of IT and the unique survival risks that distinguish them from established firms, the organizational literature on IT has mostly overlooked new IT ventures. Specifically, the Big Data industry is a context where new IT ventures actively change the landscape of IT innovations. However, less is known about the factors influencing the economic success of Big Data ventures (BDVs), as well as of the established firms that invest in them. To shed light on these factors, three essays are designed …


Data Mining In Social Networks, Usha Singh Dec 2016

Culminating Projects in Information Assurance

The objective of the study is to examine the idea of Big Data and its applications in data mining. The volume of data in the world grows every year and turns into big data. These large data sets can be analyzed using a number of data mining tasks. In short, Big Data can be regarded as an “asset”, and data mining is a technique employed to extract useful results from it. This paper implements an HACE algorithm that analyzes the structure of big data and presents an efficient data mining technique. This framework model incorporates a mixture of information …


Improving Big Data Processing Time, Ruqia Maihveen Lnu Dec 2016

Culminating Projects in Mechanical and Manufacturing Engineering

The process of storing and processing massive amounts of data (big data) in a traditional database is expensive and takes a long time to obtain the desired results. This project addresses these problems, as faced by an organization, by implementing the Hadoop framework, which stores huge data sets on distributed clusters and performs parallel data processing to achieve results quickly. Hadoop uses commodity hardware to store the data, making it cost-effective, and provides data security by replicating the data sets. The main goals of the project were to improve the performance of processing huge data …


Decision Modeling For Housing And Community Development: A Methodology For Evidence-Based Urban And Regional Planning, Michael P. Johnson Jr. Nov 2016

Michael P. Johnson

Urban community development corporations (CDCs) and other local institutions routinely face challenging problems in housing and economic development that require substantial expertise in data analytics and decision modeling. While CDC employees have significant experience in diverse application areas, they often face limitations in acquiring, analyzing, and sharing data, and in using these data to solve decision problems whose solutions can generate novel strategies to address localized needs. Recent research, inspired by local responses to the housing foreclosure crisis and developed in cooperation with Boston-area CDCs, has resulted in a collection of analytic methods and applications that can assist CDCs and similar organizations …


The Organization Of The Future And The Marketing Function: Marketers' Competencies In The Era Of Information Technology, Mario V. González Fuentes Nov 2016

School of Business Faculty Research

The past two decades, and the technological advancements experienced throughout them, have left marketers with a new context that has provided new business opportunities. This new context has prompted a change in the focus of the marketing function and demanded a shift in marketing imperatives and competencies. This chapter provides a comprehensive review of the technological changes experienced by the marketing function in a company, as documented by both scholars and practitioners. It also provides a thorough discussion of the ongoing academic debate regarding the new set of technical skills that have defined employability in marketing circles for the past couple …


Parallelization Of Push-Based System For Molecular Simulation Data Analysis With GPU, Iliiazbek Akhmedov Oct 2016

USF Tampa Graduate Theses and Dissertations

Modern simulation systems generate large amounts of data, which consequently have to be analyzed in a timely fashion. Traditional database management systems follow the principle of pulling the needed data, processing it, and then returning the results. This approach is then optimized by means of caching, storing data in different structures, or sacrificing some precision in the results to make it faster. When it comes to queries that require analysis of the whole dataset, this design has the following disadvantages: considerable overhead from traditional random disk I/O while reading from the simulation output files …


Automatic Scaling Hadoop In The Cloud For Efficient Process Of Big Geospatial Data, Zhenlong Li, Chaowei Yang, Kai Liu, Fei Hu, Baoxuan Jin Sep 2016

Faculty Publications

Efficient processing of big geospatial data is crucial for tackling global and regional challenges such as climate change and natural disasters, but it is challenging not only due to the massive data volume but also due to the intrinsic complexity and high dimensions of the geospatial datasets. While traditional computing infrastructure does not scale well with the rapidly increasing data volume, Hadoop has attracted increasing attention in geoscience communities for handling big geospatial data. Recently, many studies were carried out to investigate adopting Hadoop for processing big geospatial data, but how to adjust the computing resources to efficiently handle the …


Large-Scale Computational Screening And Machine Learning Approaches To Drug Discovery, Bryce K. Allen Sep 2016

Open Access Dissertations

Biological information continues to grow exponentially, fueled by massive data generation projects such as the Human Genome Project, The Cancer Genome Atlas (TCGA), and the Library of Integrated Network-based Cellular Signatures (LINCS). Unprecedented amounts and varieties of data (big data) have the potential to bring enormous scientific advances. Such data-driven research relies on advanced computational approaches for data integration and analysis. While bioinformatics encompasses many fields, the focus of my research has been to predict small molecule chemicals that interact with protein targets of interest and could, ultimately, become therapeutically useful drugs. Drug resistance in newly diagnosed tumors is often …


Using Statistical Analysis To Improve Data Partitioning In Algorithms For Data Parallel Processing Implementation, Manuel E. Hidalgo Murillo Sep 2016

Theses

In multiprocessor systems, data parallelism is the execution of the same task on data distributed across multiple processors. It involves splitting the data set into smaller data partitions or batches. The process of splitting the data among the different processors is called “Data Partitioning”, and it is an important factor in the efficiency of data parallel processing implementations. Data partitioning influences the workload in each processing unit and the network traffic between processes. Poor partition quality can lead to serious performance problems. This research presents a data partitioning method that can be used to improve the performance of data parallel …


Data Mining Twitter For Cancer, Diabetes, And Asthma Insights, Kimberly Chulis Aug 2016

Open Access Dissertations

Twitter may be a data resource to support healthcare research, but the literature on the potential of Twitter data for healthcare is still limited. The purpose of this study was to contrast the processes by which a large collection of unstructured disease-related tweets could be converted into structured data for further analysis. This was done with the objective of gaining insights into the content and behavioral patterns associated with disease-specific communications on Twitter. Twelve months of Twitter data related to cancer, diabetes, and asthma were collected to form a baseline dataset containing over 34 million tweets. As …


Plasma Volume Hematocrit (PVH): “Big Data” Applied To Physiology Enabled By A New Algorithm, Paul Washington Dent Aug 2016

Dissertations - ALL

This work describes the ongoing noninvasive in vivo analysis of blood, along with the in vitro validation of the algorithm. The blood is treated as two components, red blood cells and plasma, both of which cause elastic emission (from Mie and Rayleigh scattering) and inelastic emission (from fluorescence and Raman emission). The algorithm describes the linear relationship between the volume fractions of red blood cells and plasma and the elastic and inelastic emissions, expressed as two independent equations. These equations are used to calculate the hematocrit, which is defined as the volume fraction of red blood cells …


Big Data And Predictive Analytics For Supply Chain And Organizational Performance, A Gunasekaran, T Papadopoulos, R Dubey, Sf Wamba, Sj Childe, B Hazen, S Akter Jul 2016

Plymouth Business School

Scholars acknowledge the importance of big data and predictive analytics (BDPA) in achieving business value and firm performance. However, the impact of BDPA assimilation on supply chain performance (SCP) and organizational performance (OP) has not been thoroughly investigated. To address this gap, this paper draws on the resource-based view. It conceptualizes assimilation as a three-stage process (acceptance, routinization, and assimilation) and identifies the influence of resources (connectivity and information sharing), under the mediation effect of top management commitment, on big data assimilation (capability), SCP, and OP. The findings suggest that connectivity and information sharing under the mediation effect of top management …


The Future Is In The Numbers: The Power Of Predictive Analysis In The Biomedical Educational Environment, Charles A. Gullo PhD Jul 2016

Charles Gullo

Biomedical programs have a potential treasure trove of data they can mine to assist admissions committees in identifying students who are likely to do well, and to help educational committees identify students who are likely to do poorly on standardized national exams and who may need remediation. In this article, we provide a step-by-step approach that schools can use to generate data that are useful for predicting the future performance of current students in any given program. We discuss the use of linear regression analysis as the means of generating those data and highlight some of the …


The Future Is In The Numbers: The Power Of Predictive Analysis In The Biomedical Educational Environment, Charles A. Gullo PhD Jun 2016

Biochemistry and Microbiology

Biomedical programs have a potential treasure trove of data they can mine to assist admissions committees in identifying students who are likely to do well, and to help educational committees identify students who are likely to do poorly on standardized national exams and who may need remediation. In this article, we provide a step-by-step approach that schools can use to generate data that are useful for predicting the future performance of current students in any given program. We discuss the use of linear regression analysis as the means of generating those data and highlight some of the …


Fast Computation On Processing Data Warehousing Queries On GPU Devices, Sam Cyrus Jun 2016

USF Tampa Graduate Theses and Dissertations

Current database management systems use Graphics Processing Units (GPUs) as dedicated accelerators to process each individual query, which results in underutilization of the GPU. When a single-query data warehousing workload was run on an open source GPU query engine, the utilization of the main GPU resources was found to be less than 25%. This low utilization in turn leads to low system throughput. To resolve this problem, this paper proposes transferring all of the desired data into the GPU's global memory and keeping it there until all queries are executed as one batch. The PCIe transfer time from CPU …


Deduplication On Encrypted Big Data In Cloud, Zheng Yan, Wenxiu Ding, Xixun Yu, Haiqi Zhu, Deng, Robert H. Jun 2016

Research Collection School Of Computing and Information Systems

Cloud computing offers a new way of service provision by re-arranging various resources over the Internet. The most important and popular cloud service is data storage. In order to preserve the privacy of data holders, data are often stored in the cloud in encrypted form. However, encrypted data introduce new challenges for cloud data deduplication, which is becoming crucial for big data storage and processing in the cloud. Traditional deduplication schemes cannot work on encrypted data. Existing solutions for encrypted data deduplication suffer from security weaknesses, and they cannot flexibly support data access control and revocation. Therefore, few of them can be readily …


How Long Will This Live? Discovering The Lifespans Of Software Engineering Ideas, Subhajit Datta, Santonu Sarkar, A. S. M Sajeev Jun 2016

Research Collection School Of Computing and Information Systems

We all want to be associated with long-lasting ideas, as originators or, at least, as expositors. For a tyro researcher or a seasoned veteran, knowing how long an idea will remain interesting in the community is critical in choosing and pursuing research threads. In the physical sciences, the notion of half-life is often invoked to quantify decaying intensity. In this paper, we study a corpus of 19,000+ papers written by 21,000+ authors across 16 software engineering publication venues from 1975 to 2010, to empirically determine the half-life of software engineering research topics. In the absence of any consistent and well-accepted …


The Law And Policy Of People Analytics, Matthew T. Bodie, Miriam A. Cherry, Marcia L. McCormick, Jintong Tang May 2016

AI-DR Collection

Leading technology companies such as Google and Facebook have been experimenting with people analytics, a new data-driven approach to human resources management. People analytics is just one example of the new phenomenon of “big data,” in which analyses of huge sets of quantitative information are used to guide decisions. Applying big data to the workplace could lead to more effective outcomes, as in the Moneyball example, where the Oakland Athletics baseball franchise used statistics to assemble a winning team on a shoestring budget. Data may help firms determine which candidates to hire, how to help workers improve job performance, and …


Factors Affecting Big Data Technology Adoption, Nayem Rahman May 2016

Student Research Symposium

With advances in computer science, hardware and software engineering, and computing power, and later with the advent of the internet, social networking tools, and other sources such as sensors, data growth has increased significantly. These data, called big data, are mostly unstructured, generated in large volumes, and need to be captured in near real time. To handle big data, a completely new set of tools and technologies has emerged. I have studied the big data literature to identify the factors that might influence big data adoption. I was able to list quite a few factors or attributes that …


Phoenix And Hive As Alternatives To RDBMS, Diana Ornelas May 2016

Computer Science Graduate Projects and Theses

There are many small and medium businesses with mid-sized data sets that would like to implement low-budget data management systems that will perform well within their existing budget and scale as more data accumulates. One solution is to choose one of the many high-performing and cost-effective Big Data management systems, such as Hive and Phoenix. Another option is to use parallel database management systems, which are high-performance alternatives but are expensive and can be complicated to implement. The purpose of this project was to compare Hive and Phoenix with MySQL to see whether either is viable …


Mobile Big Data Analytics Using Deep Learning And Apache Spark, Mohammad Abu Alsheikh, Dusit Niyato, Shaowei Lin, Hwee-Pink Tan, Zhu Han May 2016

Research Collection School Of Computing and Information Systems

The proliferation of mobile devices, such as smartphones and Internet of Things gadgets, has given rise to the recent mobile big data era. Collecting mobile big data is unprofitable unless suitable analytics and learning methods are utilized to extract meaningful information and hidden patterns from data. This article presents an overview and brief tutorial on deep learning in mobile big data analytics and discusses a scalable learning framework over Apache Spark. Specifically, distributed deep learning is executed as an iterative MapReduce computation on many Spark workers. Each Spark worker learns a partial deep model on a partition of the overall mobile, …


GIS And Location-Based Crime Risk Analysis: Summer Internship With Location, Inc., Zhilan Deng May 2016

Sustainability and Social Justice

My internship with Location, Inc. took place from May 20th, 2015 to August 24th, 2015. I worked with one direct supervisor, Jonathan Glick, as well as the CEO of Location, Inc., Andrew Schiller. I had four main responsibilities during the summer: 1) collecting U.S. crime point data; 2) geocoding and processing crime point data; 3) collecting and processing Canadian crime statistics and demographic data; 4) updating school performance data and U.S. crime statistics. This report includes an introduction to Location, Inc., where I did my internship, the details of my responsibilities at Location, Inc., and my assessment to Location, …


Cognitive Big Data Analytics And Persuasive Social Influence Diffusion, Eman Ahmed Ghanim Abukhousa May 2016

Dissertations

Current demands in local and global economies and the pursuit of competitiveness are calling for data-driven strategies. Data-driven solutions analyze trends, make predictions about future events, and prescribe what to do next in an actionable manner. However, cognitive and behavioral data are distinguished by their multiplicity and rapid changes to meet evolving and dynamic goals of individuals. This research work is concerned with the utility of analytical solutions to synthesize and influence cognitive and behavioral adoption. We propose a multidimensional data model to identify and extract cognitive indicators for analysis and persuasive interventions. The process starts by discovering behavioral features …


Why We Need Multiple Archives, Michael L. Nelson, Herbert Van De Sompel Apr 2016

Computer Science Presentations

PDF of a PowerPoint presentation from the Coalition for Networked Information (CNI) Spring 2016 Membership Meeting in San Antonio, Texas, April 3, 2016. Also available on SlideShare.


Big Data, Patents, And The Future Of Medicine, W. Nicholson Price II Apr 2016

Articles

Big data has tremendous potential to improve health care. Unfortunately, intellectual property law isn’t ready to support that leap. In the next wave of data-driven medicine, black-box medicine, researchers use sophisticated algorithms to examine huge troves of health data, finding complex, implicit relationships and making individualized assessments for patients. Black-box medicine offers potentially immense benefits but also requires substantial investment. Firms must develop new datasets, models, and validations, which are all nonrivalrous information goods with significant spillovers, requiring incentives for welfare-optimizing investment. Current intellectual property law fails to provide adequate incentives for black-box medicine. The Supreme Court …


Precision Agriculture And Big Farm Data: Producer Adoption And Opinions, Michael H. Castle, Bradley D. Lubben, Joe D. Luck Apr 2016

UCARE Research Products

Using scarce resources to feed an ever-increasing world population in a climate of increasingly volatile commodity prices has charged producers with the task of becoming more efficient. The answer to these problems may lie in technological advancements, through the use of precision agriculture and the “big” data these technologies are capable of producing. These technologies are expected to have an enormous impact that could effectively allow farmers to produce more with less. As such, research regarding producer adoption and opinions of the technology is of great relevance. Furthermore, there is great debate over the data produced by these technologies; with the …


Storage Management Of Data-Intensive Computing Systems, Yiqi Xu Mar 2016

FIU Electronic Theses and Dissertations

Computing systems are becoming increasingly data-intensive because of the explosion of data and the need to process it, and storage management is critical to application performance in such data-intensive computing systems. However, existing resource management frameworks in these systems lack support for storage management, which causes unpredictable performance degradation when applications are under I/O contention. Storage management of data-intensive systems is a challenging problem because I/O resources cannot be easily partitioned and distributed storage systems require scalable management. This dissertation presents solutions to these challenges for typical data-intensive systems, including high-performance computing (HPC) systems and big-data …


Securing Big Data Provenance For Auditors: The Big Data Provenance Black Box As Reliable Evidence, Deniz Appelbaum Mar 2016

Department of Accounting and Finance Faculty Scholarship and Creative Works

The purpose of this article is to highlight a main issue regarding reliable audit evidence derived from Big Data: that of secure data provenance. Traditionally, audit evidence external to the client has been regarded as superior to other forms of evidence. However, external “messy” Big Data sources that may be material to aspects of the audit may lack provenance and verifiability. That is, the origins of the data may be unclear and its log files incomplete. According to the standards, such evidence should be considered less reliable as audit evidence. External auditors, as outsiders of the client, …