Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 23 of 23
Full-Text Articles in Physical Sciences and Mathematics
Open Data, Collaborative Working Platforms, And Interdisciplinary Collaboration: Building An Early Career Scientist Community Of Practice To Leverage Ocean Observatories Initiative Data To Address Critical Questions In Marine Science, Robert M. Levine, Kristen E. Fogaren, Johna E. Rudzin, Christopher J. Russoniello, Dax C. Soule, Justine M. Whitaker
Publications and Research
Ocean observing systems are well-recognized as platforms for long-term monitoring of near-shore and remote locations in the global ocean. High-quality observatory data is freely available and accessible to all members of the global oceanographic community—a democratization of data that is particularly useful for early career scientists (ECS), enabling ECS to conduct research independent of traditional funding models or access to laboratory and field equipment. The concurrent collection of distinct data types with relevance for oceanographic disciplines including physics, chemistry, biology, and geology yields a unique incubator for cutting-edge, timely, interdisciplinary research. These data are both an opportunity and an incentive …
Structured Data Mining Networks, Time Series, And Time Series Of Networks, Lin Zhang
Legacy Theses & Dissertations (2009 - 2024)
The rate at which data is generated in modern applications has created an unprecedented demand for novel methods to effectively and efficiently extract insightful patterns. Methods aware of known domain-specific structure in the data tend to be advantageous. In particular, a joint temporal and networked view of observations offers a holistic lens to many real-world systems. Example domains abound: activity of social network users, gene interactions over time, a temporal load of infrastructure networks, and others. Existing analysis and mining approaches for such data exhibit limited quality and scalability due to their sensitivity to noise, missing observations, and the need …
Hierarchical Aggregation Of Multidimensional Data For Efficient Data Mining, Safaa Khalil Alwajidi
Dissertations
Big data analysis is essential for many smart applications in areas such as connected healthcare, intelligent transportation, human activity recognition, environment, and climate change monitoring. Traditional data mining algorithms do not scale well to big data due to the enormous number of data points and the velocity of their generation. Mining and learning from big data require time- and memory-efficient techniques, albeit at the cost of a possible loss in accuracy. This research focuses on the mining of big data using aggregated data as input. We developed a data structure that is to be used to aggregate data at multiple resolutions. …
Hybrid Deep Neural Networks For Mining Heterogeneous Data, Xiurui Hou
Dissertations
In the era of big data, the rapidly growing flood of data represents an immense opportunity. New computational methods are desired to fully leverage the potential that exists within massive structured and unstructured data. However, decision-makers are often confronted with multiple diverse heterogeneous data sources. The heterogeneity includes different data types, different granularities, and different dimensions, posing a fundamental challenge in many applications. This dissertation focuses on designing hybrid deep neural networks for modeling various kinds of data heterogeneity.
The first part of this dissertation concerns modeling diverse data types, the first kind of data heterogeneity. Specifically, image data and …
Predicting Attrition - A Driver For Creating Value, Realizing Strategy, And Refining Key Hr Processes, Kevin Mendonsa, Maureen Stolberg, Vivek Viswanathan, Scott Crum
SMU Data Science Review
Talent is the most important asset for every organization's success. While attrition (or churn) and turnover can refer to both employees and customers, this paper will focus on employee attrition only. Many organizations accept attrition as an inevitable cost of doing business and do nothing to adopt or implement mitigating strategies to combat it. World-class companies, on the other hand, take deliberate measures to understand, control, and mitigate attrition (turnover) at every stage. Unmitigated attrition can have a devastating effect on an organization's bottom line and market value. In addition, the “invisible” costs of low employee morale, reduced employee …
Mobile Location Data Analytics, Privacy, And Security, Yunhe Feng
Doctoral Dissertations
Mobile location data are ubiquitous in the digital world. People intentionally and unintentionally generate large amounts of location data when connecting to cellular networks or sharing posts on social networks. As mobile devices normally choose to communicate with nearby cell towers outdoors, it is reasonable to infer human locations based on cell tower coordinates. Many social networking platforms, such as Twitter, allow users to geo-tag their posts optionally, publishing personal locations to friends or everyone. These location data are particularly useful for understanding mobile usage behaviors and human mobility patterns. Meanwhile, the public expresses great concern about the privacy and security of …
Prevalence, Contents And Automatic Detection Of Kl-Satd, Leevi Rantala, Mika Mantyla, David Lo
Research Collection School Of Computing and Information Systems
When developers use different keywords such as TODO and FIXME in source code comments to describe self-admitted technical debt (SATD), we refer to it as Keyword-Labeled SATD (KL-SATD). We study KL-SATD from 33 software repositories with 13,588 KL-SATD comments. We find that the median percentage of KL-SATD comments among all comments is only 1.52%. We find that KL-SATD comment contents include words expressing code changes and uncertainty, such as remove, fix, maybe and probably. This makes them different compared to other comments. KL-SATD comment contents are similar to manually labeled SATD comments of prior work. Our machine learning classifier using logistic …
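As a rough illustration of what "keyword-labeled" means in the abstract above, the following minimal sketch flags comments that contain a marker keyword and reports their share of all comments. It is not the authors' tooling: the keyword list is limited to the two keywords the abstract names, and the function names and sample comments are hypothetical.

```python
import re
from typing import Sequence

# Assumption: only the two keywords named in the abstract; a real study may use more.
KEYWORDS = ("TODO", "FIXME")
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b")


def is_kl_satd(comment: str) -> bool:
    """Return True if the comment contains one of the keywords as a whole word."""
    return KEYWORD_RE.search(comment) is not None


def kl_satd_percentage(comments: Sequence[str]) -> float:
    """Percentage of comments that are keyword-labeled; 0.0 for an empty input."""
    if not comments:
        return 0.0
    flagged = sum(is_kl_satd(c) for c in comments)
    return 100.0 * flagged / len(comments)


# Hypothetical sample comments, made up for this example.
comments = [
    "# TODO: remove this workaround once the parser is fixed",
    "# compute the running total",
    "# FIXME maybe this should be a set",
    "# returns the cached value",
]
print(kl_satd_percentage(comments))  # 50.0
```

The match is case-sensitive and uses word boundaries, so a comment such as "todos are listed here" is not flagged; whether that is the right choice depends on the coding conventions of the repositories being analyzed.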
Detecting Undisclosed Paid Editing In Wikipedia, Nikesh Joshi
Boise State University Theses and Dissertations
Wikipedia is a free and open-collaboration based online encyclopedia. The website has millions of pages that are maintained by thousands of volunteer editors. It is part of Wikipedia’s fundamental principles that pages are written with a neutral point of view and are maintained by volunteer editors for free with well-defined guidelines in order to avoid or disclose any conflict of interest. However, there have been several known incidents where editors intentionally violate such guidelines in order to get paid (or even extort money) for maintaining promotional spam articles without disclosing such information.
This thesis addresses for the first time the …
Joint Lattice Of Reconstructability Analysis And Bayesian Network General Graphs, Marcus Harris, Martin Zwick
Systems Science Faculty Publications and Presentations
This paper integrates the structures considered in Reconstructability Analysis (RA) and those considered in Bayesian Networks (BN) into a joint lattice of probabilistic graphical models. This integration and associated lattice visualizations are done in this paper for four variables, but the approach can easily be expanded to more variables. The work builds on the RA work of Klir (1985), Krippendorff (1986), and Zwick (2001), and the BN work of Pearl (1985, 1987, 1988, 2000), Verma (1990), Heckerman (1994), Chickering (1995), Andersson (1997), and others. The RA four variable lattice and the BN four variable lattice partially overlap: there are ten …
Reconstructability Analysis & Its Occam Implementation, Martin Zwick
Systems Science Faculty Publications and Presentations
This talk will describe Reconstructability Analysis (RA), a probabilistic graphical modeling methodology deriving from the 1960s work of Ross Ashby and developed in the systems community in the 1980s and afterwards. RA, based on information theory and graph theory, resembles and partially overlaps Bayesian networks (BN) and log-linear techniques, but also has some unique capabilities. (A paper explaining the relationship between RA and BN will be given in this special session.) RA is designed for exploratory modeling although it can also be used for confirmatory hypothesis testing. In RA modeling, one either predicts some DV from a set of IVs …
Probabilistic Value Selection For Space Efficient Model, Gunarto Sindoro Njoo, Baihua Zheng, Kuo-Wei Hsu, Wen-Chih Peng
Research Collection School Of Computing and Information Systems
An alternative to current mainstream preprocessing methods is proposed: Value Selection (VS). Unlike existing methods such as feature selection, which removes features, and instance selection, which eliminates instances, value selection eliminates values (with respect to each feature) in the dataset with two purposes: reducing the model size and preserving its accuracy. Two probabilistic methods based on an information-theoretic metric are proposed: PVS and P + VS. Extensive experiments on benchmark datasets of various sizes are presented. The results are compared with existing preprocessing methods such as feature selection, feature transformation, and instance selection methods. Experiment results …
Forecasting Daily Stock Market Return With Multiple Linear Regression, Shengxuan Chen
Mathematics Senior Capstone Papers
The purpose of this project is to use data mining and big data analytic techniques to forecast daily stock market return with multiple linear regression. Using mathematical and statistical models to analyze the stock market is important and challenging. The accuracy of the final results relies on the quality of the input data and the validity of the methodology. In this report, data on eleven financial and economic features are observed and recorded on each trading day over a 5-year period. After preprocessing the raw data with statistical methods, we use multiple linear regression to predict the daily return …
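For readers unfamiliar with the technique named in the abstract above, here is a minimal sketch of multiple linear regression fitted by ordinary least squares via the normal equations. It is not the author's code: the two features, the data, and all function names are made up for illustration, and the paper's eleven features and preprocessing are not reproduced.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear(features, targets):
    """Least-squares coefficients [intercept, b1, ..., bk] from (X^T X) b = X^T y."""
    rows = [[1.0] + list(f) for f in features]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, feature_row):
    """Apply fitted coefficients to one row of feature values."""
    return coef[0] + sum(b * x for b, x in zip(coef[1:], feature_row))


# Synthetic, noise-free data generated from y = 0.5 + 2*x1 - 1*x2 (not from the paper).
features = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0)]
targets = [0.5 + 2.0 * x1 - 1.0 * x2 for x1, x2 in features]

coef = fit_linear(features, targets)
print([round(c, 6) for c in coef])
```

Because the synthetic targets are exactly linear in the features, the fitted coefficients recover the generating values up to floating-point error. Real market data are noisy, so the fit there would only approximate any underlying relationship.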
Detecting Credit Card Fraud: An Analysis Of Fraud Detection Techniques, William Lovo
Senior Honors Projects, 2020-current
Advancements in the modern age have brought many conveniences, one of those being credit cards. Providing an individual the ability to hold their entire purchasing power in the form of pocket-sized plastic cards has made credit cards the preferred method to complete financial transactions. However, these systems are not infallible and may provide criminals and other bad actors the opportunity to abuse them. Financial institutions and their customers lose billions of dollars every year to credit card fraud. To combat this issue, fraud detection systems are deployed to discover fraudulent activity after it has occurred. Such systems rely on advanced …
Novel Inference Methods For Generalized Linear Models Using Shrinkage Priors And Data Augmentation., Arinjita Bhattacharyya
Electronic Theses and Dissertations
Generalized linear models have broad applications in biostatistics and sociology. In a regression setup, the main target is to find a relevant set of predictors out of a large collection of covariates. Sparsity is the assumption that only a few of these covariates in a regression setup have a meaningful correlation with an outcome variate of interest. Sparsity is incorporated by regularizing the irrelevant slopes towards zero without changing the relevant predictors and keeping the resulting inferences intact. Frequentist variable selection and sparsity are addressed by popular techniques like Lasso and Elastic Net. Bayesian penalized regression can tackle the curse of …
A Studying Of Webcontent Mining Tools, Rasha Hani Salman, Mahmood Zaki, Nadia A. Shiltag
Al-Qadisiyah Journal of Pure Science
The web today has become an archive of information in many forms, such as text, sound, video, graphics, and multimedia. Over time, the World Wide Web has become crowded with diverse data, making the extraction of virtual data a burdensome process. Web mining uses various data mining strategies to extract helpful information from page content and web hyperlinks. The fundamental uses of web content mining are to gather, organize, classify, and provide the best data available on the web for the user who needs it. WCM tools are needed to examine HTML documents, content and …
Developing Big Data Projects In Open University Engineering Courses: Lessons Learned, Juan A. Lara, Aurea Anguera De Sojo, Shadi Aljawarneh, Robert P. Schumaker, Bassam Al-Shargabi
Computer Science Faculty Publications and Presentations
Big Data courses in which students are asked to carry out Big Data projects are becoming more frequent as part of university engineering curricula. In these courses, instructors and students face a series of special characteristics, difficulties, and challenges that are important to know about beforehand, so that the lecturer can better plan the subject and manage the teaching methods in order to prevent students' academic dropout and low performance. The goal of this research is to approach this problem by sharing the lessons learned in the process of teaching e-learning courses where students are required to develop …
A Framework For Online Social Network Volatile Data Analysis: A Case For The Fast Fashion Industry, Anoud Bani-Hani, Feras Al-Obeidat, Elhadj Benkhelifa, Oluwasegun Adedugbe
All Works
Consumer satisfaction is important for any business, as it has been shown to be a major factor in consumer loyalty. Identifying satisfaction with products is also important, as it allows businesses to alter production plans based on the level of consumer satisfaction for a product. With consumer satisfaction data being very volatile for some products due to a short requirement period for such products, current consumer satisfaction must be identified within a shorter period before the data becomes obsolete. The fast fashion industry, which is part of the fashion industry, is adopted as a case study in this research. …
Analysis And Optimization Of Combustion Characteristics Of Cement Kiln Cooperatively Disposing Domestic Refuse, Jingbing Wu, Hanqing Tang, Xu Jun
Journal of System Simulation
Because traditional methods can hardly analyze the complex combustion characteristics of a cement kiln mixed with domestic refuse, a data mining technology is introduced. A domestic cement plant is selected as the object, and its operating data and relevant parameters are collected. The influence coefficient of each parameter on coal consumption and NOx emission is analyzed by using the Stability Selection algorithm. The mathematical model of coal consumption and NOx emission is established with the Random Forest algorithm, and the key optimization parameters and their optimal values are obtained by the K-means clustering algorithm. The result shows that this method …
A Review Of Drought Monitoring Using Remote Sensing And Data Mining Methods, R. Inoubli, A.B. Abbes, I.R. Farah, V. Singh, T. Tadesse, A.Z. Abiy
School of Natural Resources: Faculty Publications
No abstract provided.
Asking Questions Is Easy, Asking Great Questions Is Hard: Constructing Effective Stack Overflow Questions, Jane W. Hsieh
Honors Papers
This paper explores and seeks to improve the ways in which Stack Overflow question posts can elicit answers. Using statistical data analysis approaches and reviews of existing literature, we pinpoint three key factors that are found in many previously successful/answerable questions. We then present a prototypical sidebar for the ask page that leverages these factors to dynamically (1) evaluate the quality of questions in construction, (2) display answer previews of relevant questions, and (3) scaffold the identified factors to subsequent askers during their question development processes.
A Direct Data-Cluster Analysis Method Based On Neutrosophic Set Implication, Florentin Smarandache, Sudan Jha, Gyanendra Prasad Joshi, Lewis Nkenyereya, Dae Wan Kim
Branch Mathematics and Statistics Faculty and Staff Publications
Raw data are classified using clustering techniques in a reasonable manner to create disjoint clusters. A lot of clustering algorithms based on specific parameters have been proposed to access a high volume of datasets. This paper focuses on cluster analysis based on neutrosophic set implication, i.e., a k-means algorithm with a threshold-based clustering technique. This algorithm addresses the shortcomings of the k-means clustering algorithm by overcoming the limitations of the threshold-based clustering algorithm. To evaluate the validity of the proposed method, several validity measures and validity indices are applied to the Iris dataset (from the University of California, Irvine, Machine …
Multitask-Based Association Rule Mining, Pelin Yildirim Taşer, Kökten Ulaş Birant, Derya Birant
Turkish Journal of Electrical Engineering and Computer Sciences
Recently, there has been a growing interest in association rule mining (ARM) in various fields. However, standard ARM algorithms fail to discover rules for multitask problems as they do not consider task-oriented investigation and, therefore, they ignore the correlation among the tasks. Considering this situation, this paper proposes a novel algorithm, named multitask association rule miner (MTARM), that tends to jointly discover rules by considering multiple tasks. This paper also introduces two novel concepts: single-task rule and multiple-task rule. In the first phase of the proposed approach, highly frequent local rules (single-task rules) are explored for each task separately and …
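To make the notion of a per-task ("single-task") rule in the abstract above more concrete, the following sketch computes the standard support and confidence of one candidate rule separately for each task. It is not the MTARM algorithm; the task names, transactions, and function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    if not transactions:
        return 0.0
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent ∪ consequent) / support(antecedent); 0.0 if antecedent never occurs."""
    base = support(transactions, antecedent)
    if base == 0.0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical tasks, each with its own made-up transaction list.
tasks = {
    "store_a": [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}],
    "store_b": [{"bread", "milk"}, {"bread", "milk"}, {"butter"}, {"bread"}],
}

# Evaluate the candidate rule {bread} -> {milk} independently within each task.
for name, transactions in tasks.items():
    s = support(transactions, {"bread", "milk"})
    c = confidence(transactions, {"bread"}, {"milk"})
    print(name, s, round(c, 3))
```

A rule that clears chosen support and confidence thresholds within one task would be a local candidate. How such local rules are combined across tasks is specific to the paper and is not sketched here.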
Searching For Needles In The Cosmic Haystack, Thomas Ryan Devine
Graduate Theses, Dissertations, and Problem Reports
Searching for pulsar signals in radio astronomy data sets is a difficult task. The data sets are extremely large, approaching the petabyte scale, and are growing larger as instruments become more advanced. Big Data brings with it big challenges. Processing the data to identify candidate pulsar signals is computationally expensive and must utilize parallelism to be scalable. Labeling benchmarks for supervised classification is costly. To compound the problem, pulsar signals are very rare, e.g., only 0.05% of the instances in one data set represent pulsars. Furthermore, there are many different approaches to candidate classification with no consensus on a best …