Articles 1 - 17 of 17
Full-Text Articles in Computer Sciences
Hierarchical Aggregation Of Multidimensional Data For Efficient Data Mining, Safaa Khalil Alwajidi
Dissertations
Big data analysis is essential for many smart applications in areas such as connected healthcare, intelligent transportation, human activity recognition, and environmental and climate change monitoring. Traditional data mining algorithms do not scale well to big data due to the enormous number of data points and the velocity of their generation. Mining and learning from big data require time- and memory-efficient techniques, albeit at the cost of a possible loss in accuracy. This research focuses on the mining of big data using aggregated data as input. We developed a data structure that is used to aggregate data at multiple resolutions. …
Structured Data Mining Networks, Time Series, And Time Series Of Networks, Lin Zhang
Legacy Theses & Dissertations (2009 - 2024)
The rate at which data is generated in modern applications has created an unprecedented demand for novel methods to effectively and efficiently extract insightful patterns. Methods aware of known domain-specific structure in the data tend to be advantageous. In particular, a joint temporal and networked view of observations offers a holistic lens to many real-world systems. Example domains abound: the activity of social network users, gene interactions over time, the temporal load of infrastructure networks, and others. Existing analysis and mining approaches for such data exhibit limited quality and scalability due to their sensitivity to noise, missing observations, and the need …
Hybrid Deep Neural Networks For Mining Heterogeneous Data, Xiurui Hou
Dissertations
In the era of big data, the rapidly growing flood of data represents an immense opportunity. New computational methods are desired to fully leverage the potential that exists within massive structured and unstructured data. However, decision-makers are often confronted with multiple diverse heterogeneous data sources. The heterogeneity includes different data types, different granularities, and different dimensions, posing a fundamental challenge in many applications. This dissertation focuses on designing hybrid deep neural networks for modeling various kinds of data heterogeneity.
The first part of this dissertation concerns modeling diverse data types, the first kind of data heterogeneity. Specifically, image data and …
Mobile Location Data Analytics, Privacy, And Security, Yunhe Feng
Doctoral Dissertations
Mobile location data are ubiquitous in the digital world. People intentionally and unintentionally generate large amounts of location data when connecting to cellular networks or sharing posts on social networks. As mobile devices outdoors normally communicate with nearby cell towers, it is reasonable to infer human locations based on cell tower coordinates. Many social networking platforms, such as Twitter, allow users to optionally geo-tag their posts, publishing personal locations to friends or everyone. These location data are particularly useful for understanding mobile usage behaviors and human mobility patterns. Meanwhile, the public expresses great concern about the privacy and security of …
Detecting Undisclosed Paid Editing In Wikipedia, Nikesh Joshi
Boise State University Theses and Dissertations
Wikipedia is a free online encyclopedia based on open collaboration. The website has millions of pages that are maintained by thousands of volunteer editors. It is part of Wikipedia’s fundamental principles that pages are written with a neutral point of view and are maintained by volunteer editors for free, with well-defined guidelines in order to avoid or disclose any conflict of interest. However, there have been several known incidents where editors intentionally violated such guidelines in order to get paid (or even to extort money) for maintaining promotional spam articles without disclosing such information.
This thesis addresses for the first time the …
Prevalence, Contents And Automatic Detection Of Kl-Satd, Leevi Rantala, Mika Mantyla, David Lo
Research Collection School Of Computing and Information Systems
When developers use keywords such as TODO and FIXME in source code comments to describe self-admitted technical debt (SATD), we refer to it as Keyword-Labeled SATD (KL-SATD). We study KL-SATD from 33 software repositories with 13,588 KL-SATD comments. We find that the median percentage of KL-SATD comments among all comments is only 1.52%. We find that KL-SATD comment contents include words expressing code changes and uncertainty, such as remove, fix, maybe and probably. This makes them different compared to other comments. KL-SATD comment contents are similar to manually labeled SATD comments of prior work. Our machine learning classifier using logistic …
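As a rough illustration of what keyword labeling could look like in practice, the sketch below flags comments that contain TODO or FIXME and computes the median share of flagged comments across repositories. It is not the authors' pipeline: the function names, the case-sensitive whole-word match, and the sample comments are all assumptions made for illustration, and the keyword list contains only the two keywords the abstract names.

```python
import re
from statistics import median

# Assumption: only the two keywords named in the abstract are used here.
KEYWORDS = ("TODO", "FIXME")
# Case-sensitive, whole-word match (an illustrative choice, not from the paper).
_PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b")


def is_kl_satd(comment: str) -> bool:
    """Return True if the comment contains one of the keywords."""
    return _PATTERN.search(comment) is not None


def kl_satd_percentage(comments) -> float:
    """Percentage of comments in one repository that are keyword-labeled."""
    if not comments:
        return 0.0
    flagged = sum(1 for c in comments if is_kl_satd(c))
    return 100.0 * flagged / len(comments)


def median_percentage(repos) -> float:
    """Median of the per-repository percentages (repos: name -> comments)."""
    return median(kl_satd_percentage(c) for c in repos.values())


# Hypothetical sample data, invented for this example.
sample_repos = {
    "repo_a": [
        "# TODO: remove this hack",
        "# compute total",
        "# FIXME maybe wrong",
        "# helper",
    ],
    "repo_b": ["// returns id", "// todo list widget"],
}
```

Calling `median_percentage(sample_repos)` takes the median of the per-repository percentages, which mirrors the kind of statistic the abstract reports; the lowercase "todo" in `repo_b` is deliberately not matched under the case-sensitive assumption.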
Joint Lattice Of Reconstructability Analysis And Bayesian Network General Graphs, Marcus Harris, Martin Zwick
Systems Science Faculty Publications and Presentations
This paper integrates the structures considered in Reconstructability Analysis (RA) and those considered in Bayesian Networks (BN) into a joint lattice of probabilistic graphical models. This integration and associated lattice visualizations are done in this paper for four variables, but the approach can easily be expanded to more variables. The work builds on the RA work of Klir (1985), Krippendorff (1986), and Zwick (2001), and the BN work of Pearl (1985, 1987, 1988, 2000), Verma (1990), Heckerman (1994), Chickering (1995), Andersson (1997), and others. The RA four variable lattice and the BN four variable lattice partially overlap: there are ten …
Reconstructability Analysis & Its Occam Implementation, Martin Zwick
Systems Science Faculty Publications and Presentations
This talk will describe Reconstructability Analysis (RA), a probabilistic graphical modeling methodology deriving from the 1960s work of Ross Ashby and developed in the systems community in the 1980s and afterwards. RA, based on information theory and graph theory, resembles and partially overlaps Bayesian networks (BN) and log-linear techniques, but also has some unique capabilities. (A paper explaining the relationship between RA and BN will be given in this special session.) RA is designed for exploratory modeling although it can also be used for confirmatory hypothesis testing. In RA modeling, one either predicts some DV from a set of IVs …
Probabilistic Value Selection For Space Efficient Model, Gunarto Sindoro Njoo, Baihua Zheng, Kuo-Wei Hsu, Wen-Chih Peng
Research Collection School Of Computing and Information Systems
An alternative to current mainstream preprocessing methods is proposed: Value Selection (VS). Unlike existing methods such as feature selection, which removes features, and instance selection, which eliminates instances, value selection eliminates values (with respect to each feature) in the dataset with two purposes: reducing the model size and preserving its accuracy. Two probabilistic methods based on an information-theoretic metric are proposed: PVS and P+VS. Extensive experiments on benchmark datasets of various sizes are reported. The results are compared with existing preprocessing methods such as feature selection, feature transformation, and instance selection. Experiment results …
Detecting Credit Card Fraud: An Analysis Of Fraud Detection Techniques, William Lovo
Senior Honors Projects, 2020-current
Advancements in the modern age have brought many conveniences, one of those being credit cards. Providing an individual the ability to hold their entire purchasing power in the form of pocket-sized plastic cards has made credit cards the preferred method to complete financial transactions. However, these systems are not infallible and may provide criminals and other bad actors the opportunity to abuse them. Financial institutions and their customers lose billions of dollars every year to credit card fraud. To combat this issue, fraud detection systems are deployed to discover fraudulent activity after it has occurred. Such systems rely on advanced …
A Studying Of Webcontent Mining Tools, Rasha Hani Salman, Mahmood Zaki, Nadia A. Shiltag
Al-Qadisiyah Journal of Pure Science
The web today has become an archive of information in many forms, such as text, sound, video, graphics, and multimedia. Over time, the World Wide Web has become crowded with diverse data, making data extraction a burdensome process. Web mining uses various data mining strategies to extract useful information from page content and web hyperlinks. The fundamental uses of web content mining are to gather, organize, and classify information, providing the best data available on the web to the user who needs it. WCM tools are needed to examine HTML documents, content and …
Developing Big Data Projects In Open University Engineering Courses: Lessons Learned, Juan A. Lara, Aurea Anguera De Sojo, Shadi Aljawarneh, Robert P. Schumaker, Bassam Al-Shargabi
Computer Science Faculty Publications and Presentations
Big Data courses in which students are asked to carry out Big Data projects are becoming more frequent as part of university engineering curricula. In these courses, instructors and students face a series of special characteristics, difficulties, and challenges that are important to know about beforehand, so that the lecturer can better plan the subject and manage the teaching methods in order to prevent students' academic dropout and low performance. The goal of this research is to approach this problem by sharing the lessons learned in the process of teaching e-learning courses where students are required to develop …
A Framework For Online Social Network Volatile Data Analysis: A Case For The Fast Fashion Industry, Anoud Bani-Hani, Feras Al-Obeidat, Elhadj Benkhelifa, Oluwasegun Adedugbe
All Works
Consumer satisfaction is important for any business, as it has been shown to be a major factor in consumer loyalty. Identifying satisfaction with products is also important, as it allows businesses to alter production plans based on the level of consumer satisfaction for a product. With consumer satisfaction data being very volatile for some products due to a short requirement period for such products, current consumer satisfaction must be identified within a shorter period before the data becomes obsolete. The fast fashion industry, which is part of the fashion industry, is adopted as a case study in this research. …
Analysis And Optimization Of Combustion Characteristics Of Cement Kiln Cooperatively Disposing Domestic Refuse, Jingbing Wu, Hanqing Tang, Xu Jun
Journal of System Simulation
Because traditional methods can hardly analyze the complex combustion characteristics of a cement kiln mixed with domestic refuse, a data mining technology is introduced. A domestic cement plant is selected as the object, and its operating data and relevant parameters are collected. The influence coefficient of each parameter on coal consumption and NOx emission is analyzed using the Stability Selection algorithm. The mathematical model of coal consumption and NOx emission is established with the Random Forest algorithm, and the key optimization parameters and their optimal values are obtained by the K-means clustering algorithm. The result shows that this method …
Asking Questions Is Easy, Asking Great Questions Is Hard: Constructing Effective Stack Overflow Questions, Jane W. Hsieh
Honors Papers
This paper explores and seeks to improve the ways in which Stack Overflow question posts can elicit answers. Using statistical data analysis approaches and reviews of existing literature, we pinpoint three key factors that are found in many previously successful/answerable questions. We then present a prototypical sidebar for the ask page that leverages these factors to dynamically (1) evaluate the quality of questions in construction, (2) display answer previews of relevant questions, and (3) scaffold the identified factors to subsequent askers during their question development processes.
Multitask-Based Association Rule Mining, Pelin Yildirim Taşer, Kökten Ulaş Birant, Derya Birant
Turkish Journal of Electrical Engineering and Computer Sciences
Recently, there has been a growing interest in association rule mining (ARM) in various fields. However, standard ARM algorithms fail to discover rules for multitask problems as they do not consider task-oriented investigation and, therefore, they ignore the correlation among the tasks. Considering this situation, this paper proposes a novel algorithm, named multitask association rule miner (MTARM), that aims to jointly discover rules by considering multiple tasks. This paper also introduces two novel concepts: single-task rule and multiple-task rule. In the first phase of the proposed approach, highly frequent local rules (single-task rules) are explored for each task separately and …
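For readers unfamiliar with ARM, the sketch below computes the textbook support and confidence of a rule and checks, per task, whether the rule clears given thresholds. This is not MTARM: the abstract is truncated before it explains how single-task rules are combined, so the per-task check here is only a hypothetical stand-in for the "explored for each task separately" step, and all function names, thresholds, and transactions are invented for illustration.

```python
def support(transactions, itemset) -> float:
    """Fraction of transactions that contain every item in itemset."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent) -> float:
    """support(antecedent and consequent) / support(antecedent)."""
    s_antecedent = support(transactions, antecedent)
    if s_antecedent == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / s_antecedent


def rule_holds_per_task(tasks, antecedent, consequent,
                        min_support, min_confidence):
    """For each task's transactions, test the rule against both thresholds."""
    both = set(antecedent) | set(consequent)
    return {
        name: (support(tx, both) >= min_support
               and confidence(tx, antecedent, consequent) >= min_confidence)
        for name, tx in tasks.items()
    }


# Hypothetical transactions for two tasks, invented for this example.
tasks = {
    "task_a": [{"bread", "milk"}, {"bread", "butter"},
               {"bread", "milk", "butter"}, {"milk"}],
    "task_b": [{"bread"}, {"milk"}],
}
```

With these data, `rule_holds_per_task(tasks, {"bread"}, {"milk"}, 0.4, 0.6)` reports the rule bread → milk as holding in `task_a` but not in `task_b`, which is the kind of task-by-task difference a multitask approach would need to reconcile.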
Searching For Needles In The Cosmic Haystack, Thomas Ryan Devine
Graduate Theses, Dissertations, and Problem Reports
Searching for pulsar signals in radio astronomy data sets is a difficult task. The data sets are extremely large, approaching the petabyte scale, and are growing larger as instruments become more advanced. Big Data brings with it big challenges. Processing the data to identify candidate pulsar signals is computationally expensive and must utilize parallelism to be scalable. Labeling benchmarks for supervised classification is costly. To compound the problem, pulsar signals are very rare, e.g., only 0.05% of the instances in one data set represent pulsars. Furthermore, there are many different approaches to candidate classification with no consensus on a best …