Open Access. Powered by Scholars. Published by Universities.®

Social and Behavioral Sciences Commons

Articles 1 - 30 of 72

Full-Text Articles in Social and Behavioral Sciences

APEX2S: A Two-Layer Machine Learning Model For Discovery Of Host-Pathogen Protein-Protein Interactions On Cloud-Based Multiomics Data, Huaming Chen, Jun Shen, Lei Wang, Chi-Hung Chi Jan 2020

Faculty of Engineering and Information Sciences - Papers: Part B

No abstract provided.


Refinement And Augmentation For Data In Micro Learning Activity With An Evolutionary Rule Generators, Geng Sun, Jiayin Lin, Tingru Cui, Jun Shen, Dongming Xu, Mahesh Kayastha Jan 2020

Faculty of Engineering and Information Sciences - Papers: Part B

Improving both the quantity and quality of existing data is at the center of research on adaptive micro open learning. To address this research gap, our work targets the current scarcity of both data and rules that represent open learning activities. An evolutionary rule generator is constructed, which consists of an outer loop and an inner loop. The outer loop runs a genetic algorithm (GA) to produce association rules that can be effective in the micro open learning scenario from a small amount of available data sources, while the inner loop optimizes generated candidates by taking into account …


A New Data Driven Long-Term Solar Yield Analysis Model Of Photovoltaic Power Plants, Biplob Ray, Rakibuzzaman Shah, Md Rabiul Islam, Syed Islam Jan 2020

Faculty of Engineering and Information Sciences - Papers: Part B

Historical data offers a wealth of knowledge to users. However, it is often so restrictively mammoth that the information cannot be fully extracted, synthesized, and analyzed efficiently for an application such as the forecasting of variable generator outputs. Moreover, the accuracy of the prediction method is vital. Therefore, a trade-off between accuracy and efficacy is required for the data-driven energy forecasting method. It has been identified that the hybrid approach may outperform the individual technique in minimizing the error, while being challenging to synthesize. A hybrid deep learning-based method is proposed for the output prediction of the solar photovoltaic systems (i.e. proposed PV …


On Masking And Releasing Smart Meter Data At Micro-Level: The Multiplicative Noise Approach, John Brackenbury, P. Y. O'Shaughnessy, Yan-Xia Lin Jan 2020

Faculty of Engineering and Information Sciences - Papers: Part B

Smart meter electricity data presents privacy risks when malicious agents gain insight into private information, including residents’ lifestyle and daily habits. When allowing access to record-level data, we apply the multiplicative noise method to mask individual smart meter data, which simultaneously aims to minimise disclosure of a dwelling’s consumption signal to any third party and to enable accurate estimation of the sum of a cluster of households. Three testing criteria are introduced to measure the performance of the multiplicative noise masking approach relevant to the smart meter data. We propose a novel ‘Twin Uniform’ noise distribution and derive relevant theoretical results. …


A Framework Towards Data Analysis On Host-Pathogen Protein-Protein Interactions, Huaming Chen, Jun Shen, Lei Wang, Jiangning Song Jan 2020

Faculty of Engineering and Information Sciences - Papers: Part B

With the rapid development of high-throughput technologies, systems biology is now embracing a great opportunity made possible by the increased accumulation of data available online. Biological data analytics is considered a critical means of contributing to a better understanding of such data through extraction of the latent features, relationships and the associated mechanisms. Therefore, it is important to evaluate how to involve data analytics from both computational and biological perspectives in practice. This paper has investigated interaction relationships in the proteomics area, which provide insights into the critical molecular processes within infection mechanisms. Specifically, we focused on host–pathogen protein–protein …


Data Privacy And System Security For Banking And Financial Services Industry Based On Cloud Computing Infrastructure, Abhishek Mahalle, Jianming Yong, Xiaohui Tao, Jun Shen Jan 2018

Faculty of Engineering and Information Sciences - Papers: Part B

No abstract provided.


Fast Multi-Resource Allocation With Patterns In Large Scale Cloud Data Center, Jiyuan Shi, Junzhou Luo, Fang Dong, Jiahui Jin, Jun Shen Jan 2018

Faculty of Engineering and Information Sciences - Papers: Part B

How to achieve fast and efficient resource allocation is an important optimization problem of resource management in cloud data centers. On one hand, in order to ensure the user experience of resource requesting, the system has to achieve fast resource allocation to process resource requests in a timely manner; on the other hand, in order to ensure the efficiency of resource allocation, how to allocate multi-dimensional resource requests to servers needs to be optimized, such that servers' resource utilization can be improved. However, most existing approaches focus on finding the mapping of each specific resource request to each specific server. This …


Exploring The Potential Of Big Data On The Health Care Delivery Value Chain (CDVC): A Preliminary Literature And Research Agenda, William J. Tibben, Samuel Fosso Wamba Jan 2018

Faculty of Engineering and Information Sciences - Papers: Part B

Big data analytics (BDA) is emerging as a game changer in healthcare. While the practitioner literature has been speculating on the high potential of BDA in transforming the healthcare sector, few rigorous empirical studies have been conducted by scholars to assess the real potential of BDA. Drawing on the health care delivery value chain (CDVC) and an extensive literature review, this exploratory study aims to discuss current peer-reviewed articles dealing with BDA across the CDVC and discuss future research directions.


Data Fusion For MaaS: Opportunities And Challenges, Jianqing Wu, Luping Zhou, Chen Cai, Jun Shen, S K. Lau, Jianming Yong Jan 2018

Faculty of Engineering and Information Sciences - Papers: Part B

No abstract provided.


Identity-Based Remote Data Integrity Checking With Perfect Data Privacy Preserving For Cloud Storage, Yong Yu, Man Ho Au, Giuseppe Ateniese, Xinyi Huang, Willy Susilo, Yuanshun Dai, Geyong Min Jan 2017

Faculty of Engineering and Information Sciences - Papers: Part A

Remote data integrity checking (RDIC) enables a data storage server, say a cloud server, to prove to a verifier that it is actually storing a data owner's data honestly. To date, a number of RDIC protocols have been proposed in the literature. However, most of the constructions suffer from the issue of requiring complex key management. That is, they rely on the expensive public key infrastructure (PKI), which might hinder the deployment of RDIC in practice. In this paper, we propose a new construction of an identity-based (ID-based) RDIC protocol by making use of a key-homomorphic cryptographic primitive to reduce the system …


Sharing Social Network Data: Differentially Private Estimation Of Exponential Family Random-Graph Models, Vishesh Karwa, Pavel N. Krivitsky, Aleksandra B. Slavkovic Jan 2017

Faculty of Engineering and Information Sciences - Papers: Part A

Motivated by a real life problem of sharing social network data that contain sensitive personal information, we propose a novel approach to release and analyse synthetic graphs to protect privacy of individual relationships captured by the social network while maintaining the validity of statistical results. A case-study using a version of the Enron e-mail corpus data set demonstrates the application and usefulness of the proposed techniques in solving the challenging problem of maintaining privacy and supporting open access to network data to ensure reproducibility of existing studies and discovering new scientific insights that can be obtained by analysing such data. …


Collaborative Data Analytics Towards Prediction On Pathogen-Host Protein-Protein Interactions, Huaming Chen, Jun Shen, Lei Wang, Jiangning Song Jan 2017

Faculty of Engineering and Information Sciences - Papers: Part B

Nowadays, more and more data are being sequenced and accumulated in systems biology, which brings data analytics researchers into a brand new era, namely 'big data', in which to extract the inner relationships and knowledge from the huge amount of data.


Towards Massive Data And Sparse Data In Adaptive Micro Open Educational Resource Recommendation: A Study On Semantic Knowledge Base Construction And Cold Start Problem, Geng Sun, Tingru Cui, Ghassan Beydoun, Shiping Chen, Fang Dong, Dongming Xu, Jun Shen Jan 2017

Faculty of Engineering and Information Sciences - Papers: Part B

Micro learning through open educational resources (OERs) is becoming increasingly popular. However, the adaptive micro learning support offered by current OER platforms remains inadequate. To address this, our smart system, Micro Learning as a Service (MLaaS), aims to deliver personalized OERs with micro learning to satisfy learners' real-time needs.


Cost-Effective Big Data Mining In The Cloud: A Case Study With K-Means, Qiang He, Xiaodong Zhu, Dongwei Li, Shuliang Wang, Jun Shen, Yun Yang Jan 2017

Faculty of Engineering and Information Sciences - Papers: Part B

Mining big data often requires tremendous computational resources. This has become a major obstacle to broad applications of big data analytics. Cloud computing allows data scientists to access computational resources on-demand for building their big data analytics solutions in the cloud.


Metamorphic Testing For Adobe Data Analytics Software, Darryl C. Jarman, Zhiquan Zhou, Tsong Yueh Chen Jan 2017

Faculty of Engineering and Information Sciences - Papers: Part B

It is challenging to test data analytics software because a test oracle might not be available. This study reports our experience of applying metamorphic testing to Adobe's data analytics software that is used for anomaly detection in a set of time series data. We make use of geometric transformations to build metamorphic relations and generate simple time series data as the source test cases. The results of this study show that metamorphic testing is highly effective for both verification and validation purposes. An investigation of the issues detected during metamorphic testing revealed three bugs in the software under test.
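
The idea in the abstract above can be illustrated with a minimal sketch. It is not the software or the relations studied in the paper: `detect_anomalies` is a hypothetical z-score detector standing in for the system under test, and the two geometric transformations (translation and reflection) are assumed examples of metamorphic relations. With no test oracle available, the check compares the detector's output on a source series with its output on transformed follow-up series.

```python
from statistics import mean, pstdev


def detect_anomalies(series, threshold=2.0):
    """Hypothetical detector under test: flag indices whose z-score exceeds threshold."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return []
    return [i for i, x in enumerate(series) if abs(x - mu) / sigma > threshold]


def translate(series, offset):
    """Geometric transformation: shift every point vertically by a constant."""
    return [x + offset for x in series]


def reflect(series):
    """Geometric transformation: mirror the series about the horizontal axis."""
    return [-x for x in series]


def check_metamorphic_relations(series):
    """Assumed relations: translation and reflection must not change which points are flagged."""
    source = detect_anomalies(series)
    return {
        "translation": detect_anomalies(translate(series, 100.0)) == source,
        "reflection": detect_anomalies(reflect(series)) == source,
    }


# Simple source test case: a flat series with one obvious spike at index 4.
series = [10.0, 11.0, 9.0, 10.0, 50.0, 10.0, 11.0, 9.0]
results = check_metamorphic_relations(series)
```

A relation that evaluates to `False` signals a possible defect even though the correct set of anomalies is never specified, which is what makes the approach usable without an oracle.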


Towards Cost Reduction In Cloud-Based Workflow Management Through Data Replication, Fei Xie, Jun Yan, Jun Shen Jan 2017

Faculty of Engineering and Information Sciences - Papers: Part B

No abstract provided.


Text Data Mining Of Aged Care Accreditation Reports To Identify Risk Factors In Medication Management In Australian Residential Aged Care Homes, Tao Jiang, Siyu Qian, David M. Hailey, Jun Ma, Ping Yu Jan 2017

Faculty of Engineering and Information Sciences - Papers: Part B

This study aimed to identify risk factors in medication management in Australian residential aged care (RAC) homes. Only 18 out of 3,607 RAC homes failed the aged care accreditation standard in medication management between 7th March 2011 and 25th March 2015. Text data mining methods were used to analyse the reasons for failure. This led to the identification of 21 risk indicators for an RAC home to fail in medication management. These indicators were further grouped into ten themes: overall medication management, medication assessment, ordering, dispensing, storage, stock and disposal, administration, incident report, monitoring, staff and resident satisfaction. The …


Predictive Inference For Big, Spatial, Non-Gaussian Data: MODIS Cloud Data And Its Change-Of-Support, Aritra Sengupta, Noel A. Cressie, Brian H. Kahn, Richard Frey Jan 2016

Faculty of Engineering and Information Sciences - Papers: Part A

Remote sensing of the earth with satellites yields datasets that can be massive in size, nonstationary in space, and non-Gaussian in distribution. To overcome computational challenges, we use the reduced-rank spatial random effects (SRE) model in a statistical analysis of cloud-mask data from NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) instrument on board NASA's Terra satellite. Parameterisations of cloud processes are the biggest source of uncertainty and sensitivity in different climate models' future projections of Earth's climate. An accurate quantification of the spatial distribution of clouds, as well as a rigorously estimated pixel-scale clear-sky-probability process, is needed to establish reliable estimates …


Public Cloud Data Auditing With Practical Key Update And Zero Knowledge Privacy, Yong Yu, Yannan Li, Man Ho Au, Willy Susilo, Kim-Kwang Raymond Choo, Xinpeng Zhang Jan 2016

Faculty of Engineering and Information Sciences - Papers: Part A

Data integrity is extremely important for cloud based storage services, where cloud users no longer have physical possession of their outsourced files. A number of data auditing mechanisms have been proposed to solve this problem. However, how to update a cloud user's private auditing key (as well as the authenticators those keys are associated with) without the user's re-possession of the data remains an open problem. In this paper, we propose a key-updating and authenticator-evolving mechanism with zero-knowledge privacy of the stored files for secure cloud data auditing, which incorporates zero knowledge proof systems, proxy re-signatures and homomorphic linear authenticators. …


Two-Factor Data Security Protection Mechanism For Cloud Storage System, Joseph K. Liu, Kaitai Liang, Willy Susilo, Jianghua Liu, Yang Xiang Jan 2016

Faculty of Engineering and Information Sciences - Papers: Part A

In this paper, we propose a two-factor data security protection mechanism with factor revocability for cloud storage systems. Our system allows a sender to send an encrypted message to a receiver through a cloud storage server. The sender only needs to know the identity of the receiver but no other information (such as its public key or its certificate). The receiver needs to possess two things in order to decrypt the ciphertext. The first is his/her secret key stored in the computer. The second is a unique personal security device which connects to the computer. It is impossible …


A Data-Driven Predictive Model For Residential Mobility In Australia - A Generalised Linear Mixed Model For Repeated Measured Binary Data, Mohammad-Reza Namazi-Rad, Payam Mokhtarian, Nagesh Shukla, Albert Munoz Jan 2016

Faculty of Engineering and Information Sciences - Papers: Part A

Household relocation modelling is an integral part of the Government planning process as residential movements influence the demand for community facilities and services. This study will address the problem of modelling residential relocation choice by estimating a logit-link class model. The proposed model estimates the probability of an event which triggers household relocation. The attributes considered in this study are: requirement for bedrooms, employment status, income status, household characteristics, and tenure (i.e. duration living at the current location). Accurate prediction of household relocations for population units should rely on real world observations. In this study, a longitudinal survey data gathered …


Towards Data Analytics Of Pathogen-Host Protein-Protein Interaction: A Survey, Huaming Chen, Jun Shen, Lei Wang, Jiangning Song Jan 2016

Faculty of Engineering and Information Sciences - Papers: Part A

"Big Data" is immersed in many disciplines, including computer vision, economics, online resources, bioinformatics and so on. An increasing amount of research is being conducted on data mining and machine learning for uncovering and predicting related domain knowledge. Protein-protein interaction is one of the main areas in bioinformatics as it is the basis of the biological functions. However, most pathogen-host protein-protein interactions, which would be able to reveal many more infection mechanisms between pathogen and host, still await further investigation. Considering a decent feature representation of pathogen-host protein-protein interactions (PHPPI), currently there is not a well structured database for research purposes, not …


A Bottom-Up Data Collection Methodology For Characterising The Residential Building Stock In Australia, Clayton McDowell, Georgios Kokogiannakis, Paul Cooper, Michael P. Tibbs Jan 2016

Faculty of Engineering and Information Sciences - Papers: Part B

In Australia, the majority of the current residential building stock has been constructed with little regard to energy consumption or thermal comfort. With only 1-2% of Australia's building stock being replaced each year, retrofitting solutions are necessary if residential energy consumption is to be reduced. Australia's records of the characteristics of its current building stock are minimal and outdated, and thus need to be renewed to enable the evaluation of retrofit upgrade strategies. This paper presents a methodology and results of a bottom-up data collection tool that captured building and occupant characteristics from 200 elderly low income …


ArcGIS V.10 Landslide Susceptibility Data Mining Add-In Tool Integrating Data Mining And GIS Techniques To Model Landslide Susceptibility, Darshika Palamakumbure, David Stirling, Phillip N. Flentje, Robin N. Chowdhury Jan 2015

Faculty of Engineering and Information Sciences - Papers: Part A

Landslide susceptibility modeling is an essential early step towards managing landslide risk. A minimum of $4.8 million is lost due to landslide related damages every year in the Illawarra region of Australia. At present, data mining and knowledge discovery techniques are becoming popular in building landslide susceptibility models due to their enhanced predictive performance. Until now, the lack of tools for undertaking data extraction and making predictions has limited the applicability of this novel technique in landslide model building. This paper discusses the development of the LSDM (Landslide Susceptibility Data Mining) toolbar which was designed to utilize machine learning techniques …


ViVaMBC: Estimating Viral Sequence Variation In Complex Populations From Illumina Deep-Sequencing Data Using Model-Based Clustering, Bie M. P Verbist, Lieven Clement, Joke Reumers, Kim Thys, Alexander Vapirev, Willem Talloen, Yves Wetzels, Joris Meys, Jeroen Aerssens, Luc Bijnens, Olivier Thas Jan 2015

Faculty of Engineering and Information Sciences - Papers: Part A

Background: Deep-sequencing allows for an in-depth characterization of sequence variation in complex populations. However, technology associated errors may impede a powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores which are derived from a quadruplet of intensities, one channel for each nucleotide type for Illumina sequencing. The highest intensity of the four channels determines the base that is called. Mismatch bases can often be corrected by the second best base, i.e. the base with the second highest intensity in the quadruplet. A virus variant model-based clustering method, ViVaMBC, is presented that explores quality scores and second …


Recent Advances In Security And Privacy In Big Data, Yong Yu, Yi Mu, Giuseppe Ateniese Jan 2015

Faculty of Engineering and Information Sciences - Papers: Part A

Big data has become an important topic in science, engineering, medicine, healthcare, finance, business and ultimately society itself. Big data refers to the massive amount of digital information stored or transmitted in computer systems. Approximately 2.5 quintillion bytes of data are created every day. Almost 90% of the data in the world today were created in the last two years alone. Security and privacy issues become more critical due to large volumes and variety, data hosted in large-scale cloud infrastructures, diversity of data sources and formats, the streaming nature of data acquisition and high volume inter-cloud migration. In large-scale cloud …


Searchable Attribute-Based Mechanism With Efficient Data Sharing For Secure Cloud Storage, Kaitai Liang, Willy Susilo Jan 2015

Faculty of Engineering and Information Sciences - Papers: Part A

To date, the growth of electronic personal data has led to a trend in which data owners prefer to remotely outsource their data to clouds to enjoy high-quality retrieval and storage services without worrying about the burden of local data management and maintenance. However, securely sharing and searching the outsourced data is a formidable task, which may easily incur the leakage of sensitive personal information. Efficient data sharing and searching with security is of critical importance. This paper, for the first time, proposes a searchable attribute-based proxy re-encryption system. When compared to existing systems only supporting either searchable attribute-based …


Tour-Based Travel Mode Choice Estimation Based On Data Mining And Fuzzy Techniques, Nagesh Shukla, Jun Ma, Rohan Wickramasuriya, Nam N. Huynh, Pascal Perez Jan 2015

Faculty of Engineering and Information Sciences - Papers: Part A

No abstract provided.


Identity-Based Secure Distributed Data Storage Schemes, Jinguang Han, Willy Susilo, Yi Mu Jan 2014

Faculty of Engineering and Information Sciences - Papers: Part A

Secure distributed data storage can shift the burden of maintaining a large number of files from the owner to proxy servers. Proxy servers can convert encrypted files for the owner to encrypted files for the receiver without the necessity of knowing the content of the original files. In practice, the original files will be removed by the owner for the sake of space efficiency. Hence, the issues on confidentiality and integrity of the outsourced data must be addressed carefully. In this paper, we propose two identity-based secure distributed data storage (IBSDDS) schemes. Our schemes can capture the following properties: (1) …


Data Scientists As Game Changers In Big Data Environments, Akemi T. Chatfield, Vivian N. Shlemoon, Wilbur Redublado, Faizur Rahman Jan 2014

Faculty of Engineering and Information Sciences - Papers: Part A

The potential power of big data to generate insights and create new forms of value in ways which transform organizations and society has been observed by big data-driven organizations and big data experts. Despite the recent sensational declaration of data scientist as "the sexiest job of the 21st century", however, there has been a lack of rigorous studies of what a data scientist is, and what job skill requirements this hottest job title may need. In order to address this gap, we systematically examine relevant source material to extract definitions and categorize them with a classification scheme developed …