Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Articles 1 - 10 of 10

Full-Text Articles in Computer Engineering

Information-Theoretic Model Diagnostics (Infomod), Armin Esmaeilzadeh May 2023

UNLV Theses, Dissertations, Professional Papers, and Capstones

Model validation is a critical step in the development, deployment, and governance of machine learning models. During the validation process, the predictive power of a model is measured on unseen datasets with a variety of metrics such as Accuracy and F1-Scores for classification tasks. Although the most used metrics are easy to implement and understand, they are aggregate measures over all the segments of heterogeneous datasets, and therefore, they do not identify the performance variation of a model among different data segments. The lack of insight into how the model performs over segments of unseen datasets has raised significant challenges …
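The gap between an aggregate metric and per-segment behavior can be illustrated with a minimal sketch. This is not the Infomod method from the thesis; the function names, the toy labels, and the segment tags `"a"` and `"b"` are assumptions chosen only to show how a single accuracy figure can hide a segment where the model fails.

```python
from collections import defaultdict

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def accuracy_by_segment(y_true, y_pred, segments):
    """Accuracy computed separately for each segment tag."""
    buckets = defaultdict(lambda: ([], []))
    for t, p, s in zip(y_true, y_pred, segments):
        buckets[s][0].append(t)
        buckets[s][1].append(p)
    return {s: accuracy(ts, ps) for s, (ts, ps) in buckets.items()}

# Hypothetical data: six rows in segment "a", two rows in segment "b".
y_true = [1, 1, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 1]
segments = ["a"] * 6 + ["b"] * 2

overall = accuracy(y_true, y_pred)
per_segment = accuracy_by_segment(y_true, y_pred, segments)
print(overall, per_segment)
```

In this toy data the aggregate accuracy looks acceptable, while every prediction in segment "b" is wrong, which is the kind of variation an aggregate metric does not surface.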


Preprocessing Of Astronomical Images From The NEOWISE Survey For Near-Earth Asteroid Detection, Rachel Meyer Apr 2022

Scholar Week 2016 - present

Asteroid detection is a common field in astronomy for planetary defense which requires observations from survey telescopes to detect and classify different objects. The amount of data collected each night is increasing as better designed telescopes are created each year. This amount is quickly becoming unmanageable and many researchers are looking for ways to better process this data. The dominant solution is to implement computer algorithms to automatically detect these sources and to use Machine Learning in order to create a more efficient and accurate classifier. In the past there has been a focus on larger asteroids that create streaks …


Multilateration Index, Chip Lynch Aug 2021

Electronic Theses and Dissertations

We present an alternative method for pre-processing and storing point data, particularly for Geospatial points, by storing multilateration distances to fixed points rather than coordinates such as Latitude and Longitude. We explore the use of this data to improve query performance for some distance related queries such as nearest neighbor and query-within-radius (i.e. “find all points in a set P within distance d of query point q”). Further, we discuss the problem of “Network Adequacy” common to medical and communications businesses, to analyze questions such as “are at least 90% of patients living within 50 miles of a covered emergency …
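One way stored distances to fixed points can help a query-within-radius search is through the triangle inequality: if a point p is within distance d of q, then for every fixed point a, |dist(p, a) - dist(q, a)| <= d. The sketch below assumes a flat Euclidean plane rather than geospatial coordinates, and the names `build_index`, `within_radius`, and the anchor positions are hypothetical. It is not taken from the dissertation.

```python
import math

def dist(a, b):
    """Euclidean distance between two 2-D points (a planar simplification)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def build_index(points, anchors):
    """Store, for each point, its distances to the fixed anchor points."""
    return [tuple(dist(p, a) for a in anchors) for p in points]

def within_radius(points, index, anchors, q, d):
    """Return points within distance d of q, plus how many exact checks ran."""
    q_dists = [dist(q, a) for a in anchors]
    hits, checked = [], 0
    for p, p_dists in zip(points, index):
        # Triangle inequality: a gap larger than d to any anchor rules p out.
        if any(abs(pd - qd) > d for pd, qd in zip(p_dists, q_dists)):
            continue
        checked += 1
        if dist(p, q) <= d:
            hits.append(p)
    return hits, checked

# Hypothetical anchors and points.
anchors = [(0, 0), (10, 0), (0, 10)]
points = [(1, 1), (2, 2), (9, 9), (5, 5), (1.5, 1)]
index = build_index(points, anchors)

hits, checked = within_radius(points, index, anchors, q=(1, 1), d=1.0)
brute_force = [p for p in points if dist(p, (1, 1)) <= 1.0]
print(hits, checked)
```

The pruning step only compares precomputed numbers, so the exact distance is computed for the candidates that survive it rather than for every stored point.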


Semantic, Integrated Keyword Search Over Structured And Loosely Structured Databases, Xinge Lu Dec 2020

Dissertations

Keyword search has been seen in recent years as an attractive way for querying data with some form of structure. Indeed, it allows simple users to extract information from databases without mastering a complex structured query language and without having knowledge of the schema of the data. It also allows for integrated search of heterogeneous data sources. However, as keyword queries are ambiguous and not expressive enough, keyword search cannot scale satisfactorily on big datasets and the answers are, in general, of low accuracy. Therefore, flat keyword search alone cannot efficiently return high quality results on large data with structure. …


Predicting CO And NOx Emissions From Gas Turbines: Novel Data And A Benchmark PEMS, Heysem Kaya, Pinar Tüfekci, Erdinç Uzun Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

Predictive emission monitoring systems (PEMS) are important tools for validation and backing up of costly continuous emission monitoring systems used in gas-turbine-based power plants. Their implementation relies on the availability of appropriate and ecologically valid data. In this paper, we introduce a novel PEMS dataset collected over five years from a gas turbine for the predictive modeling of the CO and NOx emissions. We analyze the data using a recent machine learning paradigm, and present useful insights about emission predictions. Furthermore, we present a benchmark experimental procedure for comparability of future works on the data.


Performance Analysis Of Java Persistence API Providers, Besart Pllana Oct 2018

UBT International Conference

Nowadays, fast and accurate access to data is very important. Usually data is managed and processed through software applications. In recent years, the programming model preferred by most application developers has been Object Oriented Programming (OOP), in which data is represented through objects. These data must be persistent and therefore need to be stored, and storage can be done on a variety of databases. The most common databases are Relational Database Management Systems (RDBMS). While persistence of objects in RDBMS is limited by object-relational mismatch, which is the inconsistency of the direct interaction between two components based on different approaches, OOP …


Sort Vs. Hash Join On Knights Landing Architecture, Victor L. Pan, Felix Lin Aug 2018

The Summer Undergraduate Research Fellowship (SURF) Symposium

With the increasing amount of information stored, there is a need for efficient database algorithms. One of the most important database operations is “join”. This involves combining columns from two tables and grouping common values in the same row in order to minimize redundant data. The two main algorithms used are hash join and sort merge join. Hash join builds a hash table to allow for faster searching. Sort merge join first sorts the two tables to make it more efficient when comparing values. There has been a lot of debate over which approach is superior. At first, hash join …
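The two join strategies named above can be sketched on small in-memory tables. This is a minimal single-threaded illustration of an equi-join on the first field of each row, not the Knights Landing implementation studied in the project; the table contents and function names are assumptions.

```python
from collections import defaultdict

def hash_join(left, right):
    """Build a hash table on the left rows, then probe it with the right rows."""
    table = defaultdict(list)
    for key, value in left:
        table[key].append(value)
    return [(key, lv, rv) for key, rv in right for lv in table.get(key, [])]

def sort_merge_join(left, right):
    """Sort both inputs by key, then merge them with two cursors."""
    left = sorted(left, key=lambda row: row[0])
    right = sorted(right, key=lambda row: row[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and pair them up.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out

# Hypothetical tables: (user_id, name) and (user_id, item).
users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(2, "book"), (1, "pen"), (2, "lamp"), (4, "mug")]

hashed = sorted(hash_join(users, orders))
merged = sorted(sort_merge_join(users, orders))
print(hashed)
```

Both functions return the same set of matched rows. They differ in how the work is organized: the hash join pays for building and probing a table, while the sort merge join pays for sorting and then scans each input once.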


Querying And Visualization Of Moving Objects Using Constraint Databases, Semere M. Woldemariam Jul 2017

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Good querying and visualization of moving objects and their trajectories is still an open problem. This thesis investigates three types of moving objects. First, projectiles, whose parabolic motion is difficult to represent. Second, moving objects that slide down a slope. The representation of these objects is challenging because of their accelerating motion. Third, the motion of migrating animals. The motion of migrating animals is challenging because it also involves some spatio-temporal interpolation. The thesis shows a solution to these problems using ideas from physics and an implementation in the MLPQ constraint databases system. The MLPQ implementation enables several complex spatio-temporal …


Data Mining Of Protein Databases, Christopher Assi Jul 2012

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Data mining of protein databases poses special challenges because many protein databases are non-relational whereas most data mining and machine learning algorithms assume the input data to be a relational database. Protein databases are non-relational mainly because they often contain set data types. We developed new data mining algorithms that can restructure non-relational protein databases so that they become relational and amenable to various data mining and machine learning tools. We applied the new restructuring algorithms to a pancreatic protein database. After the restructuring, we also applied two classification methods, decision tree and SVM classifiers, and compared their …


Provable De-Anonymization Of Large Datasets With Sparse Dimensions, Anupam Datta, Divya Sharma, Arunesh Sinha Apr 2012

Research Collection School Of Computing and Information Systems

There is a significant body of empirical work on statistical de-anonymization attacks against databases containing micro-data about individuals, e.g., their preferences, movie ratings, or transaction data. Our goal is to analytically explain why such attacks work. Specifically, we analyze a variant of the Narayanan-Shmatikov algorithm that was used to effectively de-anonymize the Netflix database of movie ratings. We prove theorems characterizing mathematical properties of the database and the auxiliary information available to the adversary that enable two classes of privacy attacks. In the first attack, the adversary successfully identifies the individual about whom she possesses auxiliary information (an isolation attack). In the second attack, the adversary learns additional …