Open Access. Powered by Scholars. Published by Universities.®
Articles 61 - 90 of 125
Full-Text Articles in Computer Sciences
Spatio-Temporal Patterns Of Gps Trajectories Using Association Rule Mining, Vivek Kumar Sharma
Computer Science and Engineering Theses
The availability of location-tracking technologies such as GPS and cellular networks makes it possible to log the locations of a person or device automatically. This creates spatio-temporal datasets of users' movements, with features such as the latitude and longitude of a particular location on a specific day and time. With the help of these features, different patterns of user movement can be collected, queried, and analyzed. In this research work, we focus on users' movement patterns and frequent movements of users at a particular place, day, or time interval. To achieve this, we used association rule mining based on the Apriori algorithm to find interesting movement …
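As a rough illustration of the Apriori-style association rule mining this abstract mentions, the sketch below finds frequent itemsets level by level and then derives rules from them. It is not the thesis's implementation: the function names, the `place=`/`day=`/`slot=` item encoding, and the thresholds are all hypothetical choices made for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates, then prune any
        # candidate with an infrequent (k-1)-subset (the Apriori property).
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        current = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) triples above the threshold."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical movement log: one transaction per visit.
visits = [
    {"place=campus", "day=Mon", "slot=morning"},
    {"place=campus", "day=Mon", "slot=morning"},
    {"place=gym", "day=Mon", "slot=evening"},
    {"place=campus", "day=Tue", "slot=morning"},
]
frequent = frequent_itemsets(visits, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.9)
```

With this toy log, a rule such as "morning visits imply the campus" surfaces because every morning visit in the sample is to the campus; real trajectory data would first need latitude/longitude points discretized into places and time slots.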
Identifying Terrorist Affiliations Through Social Network Analysis Using Data Mining Techniques, Govand A. Ali
Information Technology Master Theses
In a technologically enabled world, local ideologically inspired warfare becomes global all too quickly. Specifically, terrorist groups like Al Qaeda and ISIS (Daesh) have successfully used modern computing technology and social networking environments to broadcast their message, recruit new members, and plot attacks. This is especially true for such platforms as Twitter and encrypted mobile apps like Telegram or the clandestine Alrawi. As early detection of such activity is crucial to attack prevention, data mining techniques have become increasingly important in the fight against the spread of global terrorist activity. This study employs data mining tools to mine Twitter for …
Unsupervised Learning Framework For Large-Scale Flight Data Analysis Of Cockpit Human Machine Interaction Issues, Abhishek B. Vaidya
Open Access Theses
As the level of automation within an aircraft increases, the interactions between the pilot and autopilot play a crucial role in its proper operation. Issues with human-machine interactions (HMI) have been cited as one of the main causes behind many aviation accidents. Due to the complexity of such interactions, it is challenging to identify all possible situations and develop the necessary contingencies. In this thesis, we propose a data-driven analysis tool to identify potential HMI issues in large-scale Flight Operational Quality Assurance (FOQA) datasets. The proposed tool is developed using a multi-level clustering framework, where a set of basic …
Novel Machine Learning Methods For Modeling Time-To-Event Data, Bhanukiran Vinzamuri
Wayne State University Dissertations
Predicting time-to-event from longitudinal data where different events occur at different time points is an extremely important problem in several domains such as healthcare, economics, social networks and seismology, to name a few. A unique challenge in this problem involves building predictive models from right censored data (also called survival data). Right censoring occurs when the event of interest for an instance is not observed within a given observation time window; such instances are considered to be right censored. Effective models for predicting time-to-event labels from such right censored data with good accuracy can have a significant impact in these …
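To make the notion of right censoring concrete, here is a minimal sketch of the Kaplan-Meier estimator, a standard textbook baseline for survival data. It is not one of the methods proposed in the dissertation; the function name and the sample follow-up times are assumptions made purely for illustration.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct time with at least one event.

    durations[i] is the follow-up time of subject i; observed[i] is True if
    the event occurred at that time and False if the subject was right
    censored (follow-up ended before the event was seen).
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        events = sum(1 for d, o in zip(durations, observed) if d == t and o)
        leaving = sum(1 for d in durations if d == t)
        if events:
            # Multiply by the fraction of at-risk subjects surviving time t.
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Censored subjects leave the risk set without lowering S(t).
        at_risk -= leaving
    return curve


# Hypothetical cohort: subjects 2 and 5 are right censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [True, False, True, True, False])
```

Censored subjects still count toward the at-risk denominator up to the time they drop out, which is how the estimator uses their partial information instead of discarding them.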
Speaker Identification In Live Events Using Twitter, Minumol Joseph
Computer Science and Engineering Theses
The prevalence of social media has given rise to a new research area. Data from social media is now being used in research to gather deeper insights into many different fields. Twitter is one of the most popular microblogging websites. Users express themselves on a variety of different topics in 140 characters or less. Oftentimes, users “tweet” about issues and subjects that are gaining in popularity, a great example being politics. Any development in politics frequently results in a tweet of some form. The research which follows focuses on identifying a speaker’s name at a live event by collecting and …
Clustering-Based Personalization, Seyed Nima Mirbakhsh
Electronic Thesis and Dissertation Repository
Recommendation systems have emerged over the last decade as one of the key parts of the e-commerce ecosystem. Businesses offer a wide variety of items and content through different channels such as the Internet, smart TVs, digital screens, etc. For some businesses, the number of these items runs into the millions. Therefore, users can have trouble finding the products that they are looking for. Recommendation systems address this problem by providing powerful methods which enable users to filter through a large information and product space based on their preferences. Moreover, users have different preferences. Thus, businesses can employ recommendation …
Data Mining In Computational Proteomics And Genomics, Yang Song
Dissertations
This dissertation addresses data mining in bioinformatics by investigating two important problems, namely peak detection and structure matching. Peak detection is useful for biological pattern discovery while structure matching finds many applications in clustering and classification.
The first part of this dissertation focuses on elastic peak detection in 2D liquid chromatographic mass spectrometry (LC-MS) data used in proteomics research. These data can be modeled as a time series, in which the X-axis represents time points and the Y-axis represents intensity values. A peak occurs in a set of 2D LC-MS data when the sum of the intensity values in a …
Novel Computational Methods For Transcript Reconstruction And Quantification Using Rna-Seq Data, Yan Huang
Theses and Dissertations--Computer Science
The advent of RNA-seq technologies provides an unprecedented opportunity to precisely profile the mRNA transcriptome of a specific cell population. It helps reveal the characteristics of the cell under a particular condition such as a disease. It is now possible to discover mRNA transcripts not cataloged in existing databases, in addition to assessing the identities and quantities of the known transcripts in a given sample or cell. However, each sequence read obtained from an RNA-seq experiment is only a short fragment of the original transcript. How to recapitulate the mRNA transcriptome from short RNA-seq reads remains a challenging problem. We …
Indirect Association Rule Mining For Crime Data Analysis, Riley Englin
EWU Masters Thesis Collection
Crime data analysis is difficult to undertake. There are continuous efforts to analyze crime and determine ways to combat crime but that task is a complex one. Additionally, the nature of a domestic violence crime is hard to detect and even more difficult to predict. Recently police have taken steps to better classify domestic violence cases. The problem is that there is nominal research into this category of crime, possibly due to its sensitive nature or lack of data available for analysis, and therefore there is little known about these crimes and how they relate to others. The objectives of …
Topic Analysis And Application Using Nonnegative Matrix Factorizations (Nmf), Xin Wang
Legacy Theses & Dissertations (2009 - 2024)
Managing large and growing amounts of information is a central goal of modern computer science. Data repositories of texts, images and videos have become widely accessible, thus necessitating good methods of retrieval, organization and exploration.
Pattern Mining And Events Discovery In Molecular Dynamics Simulations Data, Shobhit Sandesh Shakya
LSU Doctoral Dissertations
Molecular dynamics simulation is widely used to calculate and understand a wide range of properties of materials. Much research effort has focused on simulation techniques, but relatively little work has been done on methods for analyzing the simulation results. Large-scale simulations usually generate massive amounts of data, which makes manual analysis infeasible, particularly when it is necessary to look into the details of the simulation results. In this dissertation, we propose a system that uses computational methods to automatically analyze simulation data, which represent atomic position-time series. The system identifies, in an automated fashion, the …
Social Fingerprinting: Identifying Users Of Social Networks By Their Data Footprint, Denise Koessler Gosnell
Doctoral Dissertations
This research defines, models, and quantifies a new metric for social networks: the social fingerprint. Just as one's fingers leave behind a unique trace in a print, this dissertation introduces and demonstrates that the manner in which people interact with other accounts on social networks creates a unique data trail. Accurate identification of a user's social fingerprint can address the growing demand for improved techniques in unique user account analysis, computational forensics and social network analysis.
In this dissertation, we theorize, construct and test novel software and methodologies which quantify features of social network data. All approaches and methodologies are …
A Knowledge Discovery Approach For The Detection Of Power Grid State Variable Attacks, Nathan Wallace
Doctoral Dissertations
As the level of sophistication in power system technologies increases, the number of system state parameters being recorded also increases. This data not only provides an opportunity for monitoring and diagnostics of a power system, but it also creates an environment wherein security can be maintained. Being able to extract relevant information from this pool of data is one of the key challenges yet to be overcome in the smart grid. The potential exists for the creation of innovative power grid cybersecurity applications, which harness the information gained from advanced analytics. Such analytics can be based on the extraction …
Ranking-Based Approaches For Localizing Faults, Lucia Lucia
Dissertations and Theses Collection (Open Access)
A fault is the root cause of program failures where a program behaves differently from the intended behavior. Finding or localizing faults is often laborious (especially so for complex programs), yet it is an important task in the software lifecycle. An automated technique that can accurately and quickly identify the faulty code is greatly needed to alleviate the costs of software debugging. Many fault localization techniques assume that faults are localizable, i.e., each fault manifests only in a single or a few lines of code that are close to one another. To verify this assumption, we study how faults spread …
Corl8: A System For Analyzing Diagnostic Measures In Wireless Sensor Networks, Loren Klingman
All Theses
Due to an increasing demand to monitor the physical world, researchers are deploying wireless sensor networks more than ever before. These networks comprise a large number of sensors integrated with small, low-power wireless transceivers used to transmit data to a central processing and storage location. These devices are often deployed in harsh, volatile locations, which increases their failure rate and decreases the rate at which packets can be successfully transmitted. Existing sensor debugging tools, such as Sympathy and EmStar, rely on add-in network protocols to report status information, and to collectively diagnose network problems. Some protocols rely on a central …
On Predicting User Affiliations Using Social Features In Online Social Networks, Minh Thap Nguyen
Dissertations and Theses Collection (Open Access)
User profiling, such as user affiliation prediction in online social networks, is a challenging task with many important applications in targeted marketing and personalized recommendation. The research task here is to predict some user affiliation attributes that suggest user participation in different social groups.
Graph Mining And Module Detection In Protein-Protein Interaction Networks, Ru Shen
Legacy Theses & Dissertations (2009 - 2024)
Graphs are intuitive representations of relational data. Graphs have been widely used to represent biological molecular networks that operate in the living systems. In the study of systems biology, using graph mining techniques and graph-theory-based algorithms to
Roughened Random Forests For Binary Classification, Kuangnan Xiong
Legacy Theses & Dissertations (2009 - 2024)
Binary classification plays an important role in many decision-making processes. Random forests can build a strong ensemble classifier by combining weaker classification trees that are de-correlated. The strength and correlation among individual classification trees are the key factors that contribute to the ensemble performance of random forests. We propose roughened random forests, a new set of tools which show further improvement over random forests in binary classification. Roughened random forests modify the original dataset for each classification tree and further reduce the correlation among individual classification trees. This data modification process is composed of artificially imposing missing data that are …
Multi-Threaded Implementation Of Association Rule Mining With Visualization Of The Pattern Tree, Eera Gupta
LSU Master's Theses
Motor vehicle fatalities per 100,000 population in the United States were reported to be 10.69% in 2012, according to the NHTSA (National Highway Traffic Safety Administration). The fatality rate increased by 0.27% in 2012 compared to the rate in 2011. According to the reports, many factors are involved in the drastic increase in the fatality rate, such as driving under the influence, texting while driving, and various weather phenomena. Decision makers need to analyze the factors contributing to the increase in the accident rate to take implied measures. Current methods used to perform the data analysis …
Knowledge Extraction From Survey Data Using Neural Networks, Imran Ahmed Khan
Computer Science Theses
Surveys are an important tool for researchers. Survey attributes are typically discrete data measured on a Likert scale. Collected responses from the survey contain an enormous amount of data. It is increasingly important to develop powerful means for clustering such data and knowledge extraction that could help in decision-making. The process of clustering becomes complex if the number of survey attributes is large. Another major issue in Likert-Scale data is the uniqueness of tuples. A large number of unique tuples may result in a large number of patterns and that may increase the complexity of the knowledge extraction process. Also, …
Localizing State-Dependent Faults Using Associated Sequence Mining, Shaimaa Ali
Electronic Thesis and Dissertation Repository
In this thesis we developed a new fault localization process to localize faults in object-oriented software. The process is built upon the "Encapsulation" principle and aims to locate state-dependent discrepancies in the software's behavior. We experimented with the proposed process on 50 seeded faults in 8 subject programs, and were able to locate the faulty class in 100% of the cases when objects with constant states were taken into consideration, while we missed 24% of the faults when these objects were not considered. We also developed a customized data mining technique, "associated sequence mining", to be used in …
Document Collection Visualization And Clustering Using An Atom Metaphor For Display And Interaction, Khanh V. Nghi
Theses and Dissertations - UTB/UTPA
Visual data mining has proven to be of high value in exploratory data analysis and data mining because it provides intuitive feedback on data analysis and supports decision-making activities. Several visualization techniques have been developed for cluster discovery, such as Grand Tour, HD-Eye, Star Coordinates, etc. They are very useful tools that visualize data in 2D or 3D; however, they are not simple for untrained users. This thesis proposes a new approach to building a 3D clustering visualization system for document clustering using the k-means algorithm. A cluster will be represented by a neutron (centroid) and …
On Identifying Critical Nuggets Of Information During Classification Task, David Sathiaraj
LSU Doctoral Dissertations
In large databases, there may exist critical nuggets - small collections of records or instances that contain domain-specific important information. This information can be used for future decision making such as labeling of critical, unlabeled data records and improving classification results by reducing false positive and false negative errors. In recent years, data mining efforts have focussed on pattern and outlier detection methods. However, not much effort has been dedicated to finding critical nuggets within a data set. This work introduces the idea of critical nuggets, proposes an innovative domain-independent method to measure criticality, suggests a heuristic to reduce the …
A Novel Computational Framework For Transcriptome Analysis With Rna-Seq Data, Yin Hu
Theses and Dissertations--Computer Science
The advance of high-throughput sequencing technologies and their application to mRNA transcriptome sequencing (RNA-seq) have enabled comprehensive and unbiased profiling of the landscape of transcription in a cell. In order to address the current limitations in the accuracy and scalability of transcriptome analysis, a novel computational framework has been developed on large-scale RNA-seq datasets with no dependence on transcript annotations. Directly from raw reads, a probabilistic approach is first applied to infer the best transcript fragment alignments from paired-end reads. Empowered by the identification of alternative splicing modules, this framework then performs precise and efficient differential analysis at automatically detected …
Exploring The Learnability Of Numeric Datasets, Di Lin
LSU Doctoral Dissertations
When doing classification, it has often been observed that datasets may exhibit different levels of difficulty with respect to how accurately they can be classified. That is, there are some datasets which can be classified very accurately by many classification algorithms, and there also exist other datasets that no classifier can classify with high accuracy. Based on this observation, we try to address the following problems: a) what are the factors that make a dataset easy or difficult to classify accurately? b) how to use such factors to predict the difficulties of unclassified datasets? and c) how to …
Adaptive Grid Based Localized Learning For Multidimensional Data, Sheetal Saini
Doctoral Dissertations
Rapid advances in data-rich domains of science, technology, and business have amplified the computational challenges of "Big Data" synthesis necessary to slow the widening gap between the rate at which the data is being collected and analyzed for knowledge. This has led to the renewed need for efficient and accurate algorithms, frameworks, and algorithmic mechanisms essential for knowledge discovery, especially in the domains of clustering, classification, dimensionality reduction, feature ranking, and feature selection. However, data mining algorithms are frequently challenged by the sparseness due to the high dimensionality of the datasets in such domains which is particularly detrimental to the …
Semi-Automatic Simulation Initialization By Mining Structured And Unstructured Data Formats From Local And Web Data Sources, Olcay Sahin
Computational Modeling & Simulation Engineering Theses & Dissertations
Initialization is one of the most important processes for obtaining successful results from a simulation. However, initialization is a challenge when 1) a simulation requires hundreds or even thousands of input parameters or 2) the simulation must be re-initialized due to different initial conditions or runtime errors. These challenges lead to the modeler spending more time initializing a simulation and may lead to errors due to poor input data.
This thesis proposes two semi-automatic simulation initialization approaches that provide initialization using data mining from structured and unstructured data formats from local and web data sources. First, the System Initialization with Retrieval (SIR) …
A Confidence-Prioritization Approach To Data Processing In Noisy Data Sets And Resulting Estimation Models For Predicting Streamflow Diel Signals In The Pacific Northwest, Nathaniel Lee Gustafson
Theses and Dissertations
Streams in small watersheds are often known to exhibit diel fluctuations, in which streamflow oscillates on a 24-hour cycle. Streamflow diel fluctuations, which we investigate in this study, are an informative indicator of environmental processes. However, in Environmental Data sets, as well as many others, there is a range of noise associated with individual data points. Some points are extracted under relatively clear and defined conditions, while others may include a range of known or unknown confounding factors, which may decrease those points' validity. These points may or may not remain useful for training, depending on how much uncertainty they …
Data Mining Of Tetraloop-Tetraloop Receptors In Rna Xml Files, Sinan Ramazanoglu
Theses
RNA (ribonucleic acid) motifs are tertiary structures that play an important role in the folding mechanism of the RNA molecule. The overall function of an RNA motif depends on the specific bp (base pair) sequence that constitutes its secondary structure. Data mining is a novel method for both discovering potential tertiary structures within DNA (deoxyribonucleic acid), RNA, and protein molecules and storing the information in databases. The RNA motif of interest is the tetraloop-tetraloop receptor, which is composed of a highly conserved 11 nt (nucleotide) sequence and a tetraloop with the generic form of GNRA (where N = any base …
Analysis And Characterization Of Author Contribution Patterns In Open Source Software Development, Quinn Carlson Taylor
Theses and Dissertations
Software development is a process fraught with unpredictability, in part because software is created by people. Human interactions add complexity to development processes, and collaborative development can become a liability if not properly understood and managed. Recent years have seen an increase in the use of data mining techniques on publicly-available repository data with the goal of improving software development processes, and by extension, software quality. In this thesis, we introduce the concept of author entropy as a metric for quantifying interaction and collaboration (both within individual files and across projects), present results from two empirical observational studies of open-source …
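One plausible reading of an entropy-based collaboration metric is the Shannon entropy of each author's share of the lines in a file. The sketch below assumes that definition; the truncated abstract does not spell out the thesis's exact formula, and the function name and per-line author attribution input are hypothetical.

```python
import math
from collections import Counter


def author_entropy(line_authors):
    """Shannon entropy (in bits) of the per-author share of lines in a file.

    line_authors holds one author name per line, for example as produced by
    a blame-style attribution of the file's current contents.
    """
    counts = Counter(line_authors)
    total = sum(counts.values())
    # Sum of p * log2(1 / p) over authors, where p is the author's line share.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


solo = author_entropy(["alice"] * 10)                # single author: 0 bits
pair = author_entropy(["alice"] * 5 + ["bob"] * 5)   # even split: 1 bit
```

Under this assumed definition, a file written by a single author scores 0, and the score grows as contributions are spread more evenly across more authors.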