Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Articles 1 - 6 of 6

Full-Text Articles in Computer Engineering

Design And Analysis Of Content Caching Systems, Anirudh Sabnis Aug 2023

Doctoral Dissertations

Caching is a simple yet powerful technique that has had a significant impact on improving the performance of various computer systems. From internet content delivery to CPUs, domain name systems, and database systems, caching has played a pivotal role in making these systems faster and more efficient. The basic idea behind caching is to store frequently accessed data locally, so that future requests for that data can be served more quickly. For example, a Content Delivery Network (CDN) like Akamai deploys thousands of edge caches across the globe, so that end-user requests can be served from a nearby cache, rather …


Data Scarcity In Event Analysis And Abusive Language Detection, Sheikh Muhammad Sarwar Oct 2022

Doctoral Dissertations

Lack of data is almost always the cause of the suboptimal performance of neural networks. Even though data-scarce scenarios can be simulated for any task by assuming limited access to training data, we study two problem areas where data scarcity is a practical challenge: event analysis and abusive content detection. Journalists, social scientists and political scientists need to retrieve and analyze event mentions in unstructured text to compute useful statistical information to understand society. We claim that it is hard to specify an information need about events using a keyword-based representation and propose a Query by Example (QBE) setting for event …


Improving And Understanding Data Quality In Large-Scale Data Systems, Xiaolan Wang Mar 2019

Doctoral Dissertations

Systems and applications rely heavily on data, which makes data quality a critical factor in how well they function. In turn, low-quality data can be incredibly costly and disruptive, leading to loss of revenue, incorrect conclusions, and misguided policy decisions. Improving data quality is far more than purging datasets of errors; it is more important to improve the processes that produce the data, to collect good data sources that are used for generating the data, and to truly understand the quality of the data. Therefore, the objective of this thesis is to improve and understand data quality from the above aspects. …


A Probabilistic Software Framework For Scalable Data Storage And Integrity Check, Sisi Xiong Aug 2017

Doctoral Dissertations

Data has overwhelmed the digital world in terms of volume, variety and velocity. Data-intensive applications are facing unprecedented challenges. On the other hand, computation resources, such as memory, are in short supply compared to the scale of the data. However, in certain applications, large amounts of data must be processed in a time-efficient manner. Probabilistic approaches are compromises among these three concerns (large amounts of data, limited computation resources, and high time efficiency): they cannot guarantee 100% correctness, but their error rates are predictable and adjustable depending on available computation resources and time constraints. …


Achieving High Reliability And Efficiency In Maintaining Large-Scale Storage Systems Through Optimal Resource Provisioning And Data Placement, Lipeng Wan Aug 2016

Doctoral Dissertations

With the explosive increase in the amount of data being generated by various applications, large-scale distributed and parallel storage systems have become common data storage solutions and have been widely deployed and used in both industry and academia. While these high-performance storage systems significantly accelerate data storage and retrieval, they also raise critical issues in system maintenance and management. In this dissertation, I propose three methodologies to address three of these critical issues.

First, I develop an optimal resource management and spare provisioning model to minimize the impact of component failures and ensure a highly operational experience …


Universal Schema For Knowledge Representation From Text And Structured Data, Limin Yao Mar 2015

Doctoral Dissertations

In data integration we transform information from a source into a target schema. A general problem in this task is loss of fidelity and coverage: the source expresses more knowledge than can fit into the target schema, or knowledge that is hard to fit into any schema at all. This problem is taken to an extreme in information extraction (IE), where the source is natural language, one of the most expressive forms of knowledge representation. To address this issue, one can either automatically learn a latent schema emergent in text (a brittle and ill-defined task), or manually define schemas. …