Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Data mining

2017

Articles 1 - 25 of 25

Full-Text Articles in Computer Sciences

On Analyzing Job Hop Behavior And Talent Flow Networks, Richard J. Oentaryo, Xavier Jayaraj Siddarth Ashok, Ee-Peng Lim, Philips Kokoh Prasetyo Nov 2017

Research Collection School Of Computing and Information Systems

Analyzing job hopping behavior is important for the understanding of job preference and career progression of working individuals. When analyzed at the workforce population level, job hop analysis helps to gain insights into talent flow and organization competition. Traditionally, surveys are conducted on job seekers and employers to study job behavior. While surveys are good at getting direct user input to specially designed questions, they are often not scalable and timely enough to cope with the fast-changing job landscape. In this paper, we present a data science approach to analyze job hops performed by about 490,000 working professionals located in a city using their publicly shared profiles. We develop several …


Machine Learning In XENON1T Analysis, Dillon A. Davis, Rafael F. Lang, Darryl P. Masson Aug 2017

The Summer Undergraduate Research Fellowship (SURF) Symposium

In the process of analyzing large amounts of quantitative data, it can be quite time-consuming and challenging to uncover populations of interest contained amongst the background data. Therefore, the ability to partially automate the process while gaining additional insight into the interdependencies of key parameters via machine learning seems quite appealing. As of now, the primary means of reviewing the data is by manually plotting data in different parameter spaces to recognize key features, which is slow and error-prone. In this experiment, many well-known machine learning algorithms were applied to a dataset to attempt to semi-automatically identify known populations, …


Analyzing The Relationship Between Human Behavior And Indoor Air Quality, Beiyu Lin, Yibo Huangfu, Nathan Lima, Bertram Jobson, Max Kirk, Patrick O’Keeffe, Shelley N. Pressley, Von Walden, Brian Lamb, Diane J. Cook Aug 2017

Computer Science Faculty Publications and Presentations

In the coming decades, as we experience global population growth and global aging issues, there will be corresponding concerns about the quality of the air we experience inside and outside buildings. Because we can anticipate that there will be behavioral changes that accompany population growth and aging, we examine the relationship between home occupant behavior and indoor air quality. To do this, we collect both sensor-based behavior data and chemical indoor air quality measurements in smart home environments. We introduce a novel machine learning-based approach to quantify the correlation between smart home features and chemical measurements of air quality, and …


Distributed Knowledge Discovery For Diverse Data, Hossein Hamooni Jul 2017

Computer Science ETDs

In the era of new technologies, computer scientists deal with massive datasets hundreds of terabytes in size. Smart cities, social networks, health care systems, large sensor networks, etc. are constantly generating new data. It is non-trivial to extract knowledge from big datasets because traditional data mining algorithms are impractical to run on such big datasets. However, distributed systems have helped address this problem while introducing new challenges in designing scalable algorithms. The transition from traditional algorithms to the ones that can be run on a distributed platform should be done carefully. Researchers should design the modern distributed algorithms based on the …


OS2: Oblivious Similarity Based Searching For Encrypted Data Outsourced To An Untrusted Domain, Zeeshan Pervez, Mahmood Ahmad, Asad Masood Khattak, Naeem Ramzan, Wajahat Ali Khan Jul 2017

All Works

© 2017 Pervez et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Public cloud storage services are becoming prevalent, and myriad data sharing, archiving and collaborative services have emerged which harness the pay-as-you-go business model of public cloud. To ensure privacy and confidentiality, encrypted data is often outsourced to such services, which further complicates the process of accessing relevant data by using search queries. Search over encrypted data schemes solve this problem by …


Mining Capstone Project Wikis For Knowledge Discovery, Swapna Gottipati, Venky Shankararaman, Melvrivk Goh Jul 2017

Research Collection School Of Computing and Information Systems

Wikis are widely used collaborative environments as sources of information and knowledge. They help students collaborate and share information among members, and they enable collaborative learning. In particular, Wikis play an important role in capstone projects. Wikis aid in various project-related tasks and help to organize and share information. Mining project Wikis is critical to understanding students' learning and the latest trends in industry. Mining Wikis is useful to educationists and academicians for decision-making about how to modify the educational environment to improve students' learning. The main challenge is that the content or data in project Wikis …


Mining Diverse Consumer Preferences For Bundling And Recommendation, Ha Loc Do Jul 2017

Dissertations and Theses Collection

That consumers share similar tastes on some products does not guarantee their agreement on other products. Therefore, both similarity and difference should be taken into account for a more rounded view of consumer preferences. This manuscript focuses on mining this diversity of consumer preferences from two perspectives, namely 1) between consumers and 2) between products. Diversity of preferences between consumers is studied in the context of recommendation systems. In some preference models, measuring similarities in preferences between two consumers plays the key role. These approaches assume two consumers would share a certain degree of similarity on any product, ignoring the fact …


Multi-Agent Simulation Of The Battle Of Ankara, 1402, Ruili Tang Jun 2017

Honors Theses

In 1402, north of the city of Ankara, Turkey, a battle between the Ottoman Empire and the Tamerlane Empire decided the fate of Europe and Asia. Although historians largely agree on the general battle procedure, the details are still open to dispute. Several factors may have contributed to the Ottoman defeat, such as the overwhelming size of Tamerlane's army, poisoned water, the tactical formations of the military units, and betrayal by the Tartar cavalry in the Ottoman left wing. The approach is divided into two stages: the simulation stage, which provides data to analyze the complex interactions of autonomous agents, and the …


Mining Of Primary Healthcare Patient Data With Selective Multimorbid Diseases, Annette Megerdichian Azad May 2017

Electronic Thesis and Dissertation Repository

Despite a large volume of research on the prognosis, diagnosis and overall burden of multimorbidity, very little is known about socio-demographic characteristics of multimorbid patients. This thesis aims to analyze the socio-demographic characteristics of patients with multiple chronic conditions (multimorbidity), focusing on patient groups sharing the same combination of diseases. Several methods were explored to analyze the co-occurrence of multiple chronic diseases as well as the associations between socio-demographics and chronic conditions. These methods include disease pair distributions over gender, age groups and income level quintiles, Multimorbidity Coefficients for measuring the concurrence of disease pairs and triples, and k-modes clustering …


Aspect Discovery From Product Reviews, Ying Ding May 2017

Dissertations and Theses Collection

With the rapid development of online shopping sites and social media, product reviews are accumulating. These reviews contain information that is valuable to both businesses and customers. Businesses can easily obtain a large amount of feedback on their products, which is difficult to achieve through traditional customer surveys. Customers can learn more about the products they are interested in by reading reviews, which may be difficult without online reviews. However, this accumulation has made it impossible to read all reviews. It is necessary to develop automated techniques to efficiently process them. One of the most …


DPWeka: Achieving Differential Privacy In Weka, Srinidhi Katla May 2017

Graduate Theses and Dissertations

Organizations belonging to the government, commercial, and non-profit industries collect and store large amounts of sensitive data, which include medical, financial, and personal information. They use data mining methods to formulate business strategies that yield high long-term and short-term financial benefits. While analyzing such data, the private information of the individuals present in the data must be protected for moral and legal reasons. Current practices such as redacting sensitive attributes, releasing only the aggregate values, and query auditing do not provide sufficient protection against an adversary armed with auxiliary information. In the presence of additional background information, the privacy protection …


High Utility Itemsets Identification In Big Data, Ashish Tamrakar May 2017

UNLV Theses, Dissertations, Professional Papers, and Capstones

High utility itemset mining is an important data mining problem which considers profit factors besides quantity from the transactional database. It helps find the most valuable products/items that are difficult to track using only the frequent data mining set. An item that has a high-profit value might be rare in the transactional database despite its tremendous importance. While there are many existing algorithms which generate comparatively large candidate sets while finding high utility itemsets, the major focus is to reduce the computational time significantly with the introduction of pruning strategies. Another aspect of high utility itemset mining is to compute …
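As a rough illustration of the contrast the abstract draws between frequency and profit, the sketch below computes both the support and the utility of an itemset using the standard textbook definition (quantity times unit profit, summed over the transactions that contain every item). The data, the unit profits, and the function names `itemset_utility` and `support` are hypothetical, and the sketch does not attempt the pruning strategies or the algorithm of the thesis itself.

```python
# Hypothetical toy data: each transaction maps an item to the quantity bought.
transactions = [
    {"bread": 2, "caviar": 1},
    {"bread": 5},
    {"bread": 1, "milk": 2},
    {"caviar": 1, "milk": 1},
]

# Hypothetical unit profit for each item.
unit_profit = {"bread": 1, "milk": 2, "caviar": 50}


def support(itemset, transactions):
    """Count the transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if all(item in t for item in itemset))


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over transactions containing the whole itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


if __name__ == "__main__":
    for items in ({"bread"}, {"caviar"}, {"bread", "caviar"}):
        print(sorted(items),
              "support:", support(items, transactions),
              "utility:", itemset_utility(items, transactions, unit_profit))
```

In this toy data, bread appears in more transactions than caviar but contributes far less profit, which is the kind of rare but valuable item that frequency-only mining would tend to miss.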


Who Will Leave The Company?: A Large-Scale Industry Study Of Developer Turnover By Mining Monthly Work Report, Lingfeng Bao, Zhenchang Xing, Xin Xia, David Lo, Shanping Li May 2017

Research Collection School Of Computing and Information Systems

Software developer turnover has become a big challenge for information technology (IT) companies. The departure of key software developers might cause a big loss to an IT company since they also depart with important business knowledge and critical technical skills. Understanding developer turnover is very important for IT companies to retain talented developers and reduce the loss due to developers' departure. Previous studies mainly perform qualitative observations or simple statistical analysis of developers' activity data to understand developer turnover. In this paper, we investigate whether we can predict the turnover of software developers in non-open source companies by automatically analyzing monthly …


Peeking Into The Other Half Of The Glass: Handling Polarization In Recommender Systems, Mahsa Badami May 2017

Electronic Theses and Dissertations

This dissertation is about filtering and discovering information online while using recommender systems. In the first part of our research, we study the phenomenon of polarization and its impact on filtering and discovering information. Polarization is a social phenomenon with serious real-life consequences, particularly on social media. Thus, it is important to understand how machine learning algorithms, especially recommender systems, behave in polarized environments. We study polarization within the context of the users' interactions with a space of items and how this affects recommender systems. We first formalize the concept of polarization based on item ratings and then relate …


Development And Evaluation Of Machine Learning Algorithms For Biomedical Applications, Turki Talal Turki Apr 2017

Dissertations

Gene network inference and drug response prediction are two important problems in computational biomedicine. The former helps scientists better understand the functional elements and regulatory circuits of cells. The latter helps a physician gain full understanding of the effective treatment on patients. Both problems have been widely studied, though current solutions are far from perfect. More research is needed to improve the accuracy of existing approaches.

This dissertation develops machine learning and data mining algorithms, and applies these algorithms to solve the two important biomedical problems. Specifically, to tackle the gene network inference problem, the dissertation proposes (i) new techniques …


Big Data Analytics In Computational Biology And Bioinformatics, Kevin Byron Apr 2017

Dissertations

Big data analytics in computational biology and bioinformatics refers to an array of operations including biological pattern discovery, classification, prediction, inference, clustering as well as data mining in the cloud, among others. This dissertation addresses big data analytics by investigating two important operations, namely pattern discovery and network inference.

The dissertation starts by focusing on biological pattern discovery at a genomic scale. Research reveals that the secondary structure in non-coding RNA (ncRNA) is more conserved during evolution than its primary nucleotide sequence. Using a covariance model approach, the stems and loops of an ncRNA secondary structure are represented as a …


Statistical Learning Methods For Mining Marketing And Biological Data, Jie Zhang Apr 2017

Dissertations

Nowadays, the value of data has been broadly recognized and emphasized. More and more decisions are made based on data and analysis rather than solely on experience and intuition. With the fast development of networking, data storage, and data collection capacity, data have increased dramatically in industry, science and engineering domains, which brings both great opportunities and challenges. To take advantage of the data flood, new computational methods are in demand to process, analyze and understand these datasets.

This dissertation focuses on the development of statistical learning methods for online advertising and bioinformatics to model real world data with temporal …


Data Mining By Grid Computing In The Search For Extrasolar Planets, Oisin Creaner [Thesis] Jan 2017

Doctoral

A system is presented here to provide improved precision in ensemble differential photometry. This is achieved by using the power of grid computing to analyse astronomical catalogues. This produces new catalogues of optimised pointings for each star, which maximise the number and quality of reference stars available. Astronomical phenomena such as exoplanet transits and small-scale structure within quasars may be observed by means of millimagnitude photometric variability on the timescale of minutes to hours. Because of atmospheric distortion, ground-based observations of these phenomena require the use of differential photometry whereby the target is compared with one or more reference stars. …


DTreeSim: A New Approach To Compute Decision Tree Similarity Using Re-Mining, Gözde Bakirli, Derya Birant Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

A number of recent studies have used a decision tree approach as a data mining technique; some of them needed to evaluate the similarity of decision trees to compare the knowledge reflected in different trees or datasets. There have been multiple perspectives and multiple calculation techniques to measure the similarity of two decision trees, such as using a simple formula or an entropy measure. The main objective of this study is to compute the similarity of decision trees using data mining techniques. This study proposes DTreeSim, a new approach that applies multiple data mining techniques (classification, sequential pattern mining, and …


Discovering The Relationships Between Yarn And Fabric Properties Using Association Rule Mining, Pelin Yildirim, Derya Birant, Tuba Alpyildiz Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

Investigation of the effects of yarn parameters on fabric quality and finding important parameters to achieve desired fabric properties are important issues for the design process with the aim to meet the needs of the textile industry and the consumer for complex and specific requirements of functionality. Despite many statistical and mathematical studies that predict and reveal specific properties of utilized yarn and fabric materials, a number of challenges continue to exist when evaluated in many perspectives, such as discovering complex relationships among material properties in data. Data mining plays an important role in discovering hidden patterns from fabric data …


So What Are You Going To Do With That? The Promises And Pitfalls Of Massive Data Sets, Sigrid Anderson Cordell, Melissa Gomis Jan 2017

UNL Libraries: Faculty Publications

This article takes as its case study the challenge of data sets for text mining, sources that offer tremendous promise for digital humanities (DH) methodology but present specific challenges for humanities scholars. These text sets raise a range of issues: What skills do you train humanists to have? What is the library’s role in enabling and supporting use of those materials? How do you allocate staff? Who oversees sustainability and data management? By addressing these questions through a specific use case scenario, this article shows how these questions are central to mapping out future directions for a range of library …


Mining Data On Traumatic Brain Injury With Reconstructability Analysis, Martin Zwick, Nancy Carney, Rosemary Nettleton Jan 2017

Systems Science Faculty Publications and Presentations

This paper reports the analysis of data on traumatic brain injury using a probabilistic graphical modeling technique known as reconstructability analysis (RA). The analysis shows the flexibility, power, and comprehensibility of RA modeling, which is well-suited for mining biomedical data. One finding of the analysis is that education is a confounding variable for the Digit Symbol Test in discriminating the severity of concussion; another - and anomalous - finding is that previous head injury predicts improved performance on the Reaction Time test. This analysis was exploratory, so its findings require follow-on confirmatory tests of their generalizability.


SIAM Data Mining "Brings It" To Annual Meeting, Jeremy Kepner, Sanjukta Bhowmick, Aydın Buluç, Rajmonda Caceres, R. Jordan Crouser, Vijay Gadepally, Ben Miller, Jennifer Webster Jan 2017

Computer Science: Faculty Publications

The Data Mining Activity Group is one of SIAM's most vibrant and dynamic activity groups. To better share our enthusiasm for data mining with the broader SIAM community, our activity group organized six minisymposia at the 2016 Annual Meeting. These minisymposia included 48 talks organized by 11 SIAM members on - GraphBLAS (Aydın Buluç) - Algorithms and statistical methods for noisy network analysis (Sanjukta Bhowmick & Ben Miller) - Inferring networks from non-network data (Rajmonda Caceres, Ivan Brugere & Tanya Y. Berger-Wolf) - Visual analytics (Jordan Crouser) - Mining in graph data (Jennifer Webster, Mahantesh Halappanavar & Emilie Hogan) - …


An Ant Colony Optimization Algorithm-Based Classification For The Diagnosis Of Primary Headaches Using A Website Questionnaire Expert System, Ufuk Çelik, Nilüfer Yurtay Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

The purpose of this research was to evaluate the classification accuracy of the ant colony optimization algorithm for the diagnosis of primary headaches using a website questionnaire expert system that was completed by patients. This cross-sectional study was conducted in 850 headache patients who randomly applied to hospitals in three cities in Turkey, with the assistance of a neurologist in each city. The patients filled in a detailed web-based headache questionnaire. Finally, the neurologists' diagnosis results were compared with the classification results of an ant colony optimization-based classification algorithm. The ant colony algorithm for diagnosis classified patients with 96.9412% overall accuracy. …


Proposing A New Clustering Method To Detect Phishing Websites, Morteza Arab, Mohammad Karim Sohrabi Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

Phishing websites are fake sites developed by ill-intentioned people to imitate real, legitimate websites. Most of these web pages closely resemble the originals visually in order to deceive victims. The victims of phishing websites may give their bank accounts, passwords, credit card numbers, and other important information to the designers and owners of phishing websites. The increasing number of phishing websites has become a great challenge in e-business in general and in electronic banking specifically. In the present study, a novel framework based on model-based clustering is introduced to fight against phishing websites. First, a model is …